A framework for comparing wines?

Hey Berserkers! For the last 18 months I have been building a wine platform that has a large amount of data on ~55,000 wines including metadata, descriptions on thousands of wines/producers, critic scores, and live pricing from a network of retailers. I am now building the front end that will organize this data to help users better discover and find wines they like.

With that in mind, I’ve come up with a set of analytics for every wine: five metrics displayed on a radar chart so wines can easily be visualized and compared. It’s a rough draft and I would be very grateful for any thoughts or feedback on this initial set of metrics.

Value - This starts as a basic QPR that compares price to average critic score, but then ranks the wine against a blend of all wines and (if enough data) its regional peers to give a percentile rating. So a high “value” Beaujolais will have better QPR than other Beaujolais.
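For illustration, here is a minimal Python sketch of how a Value percentile like this might be computed. The function names, the 50/50 global/regional blend, and the 20-peer minimum are my assumptions, not the actual implementation:

```python
# Sketch of a Value metric: raw QPR ranked as a percentile against all
# wines, blended with a regional percentile when enough peers exist.
# Weights and thresholds below are illustrative assumptions.
from bisect import bisect_left

def qpr(avg_score: float, price: float) -> float:
    """Raw quality-price ratio: more score per dollar is better."""
    return avg_score / price

def percentile(value: float, peers: list[float]) -> float:
    """Fraction of peers with a QPR strictly below `value` (0..1)."""
    ranked = sorted(peers)
    return bisect_left(ranked, value) / len(ranked)

def value_metric(avg_score, price, all_qprs, regional_qprs, min_peers=20):
    raw = qpr(avg_score, price)
    global_pct = percentile(raw, all_qprs)
    if len(regional_qprs) >= min_peers:  # only trust regions with enough data
        regional_pct = percentile(raw, regional_qprs)
        return 0.5 * global_pct + 0.5 * regional_pct  # illustrative 50/50 blend
    return global_pct
```

So a high-value Beaujolais would score well both against the whole database and against other Beaujolais, as described above.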

Popularity - Over time, this will be a ranking of relative interest (page views, engagement, searches) for the wine on my website. But since it’s so new and I don’t have that data, I’m currently using a proxy based on a variety of factors that simulate popularity.

Prestige - This is a producer-level metric that applies to all the wines they make. It’s a weighted blend of critic reviews across their portfolio, review coverage (total number of reviews compared against all other producers), rating consistency across all wines (standard deviation among ratings), and popularity within my database. This rating is then normalized 0-1 across all producers that have enough data points to generate a ranking. Ultimately it asks: how well-known, durable, and consistent has this producer been over time?
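As a sketch, the Prestige blend described above could look something like this. The 0.4/0.2/0.2/0.2 weights and the scaling constants are illustrative assumptions, not the actual values:

```python
# Sketch of a producer-level Prestige score: weighted blend of average
# rating, review coverage, rating consistency, and popularity, then
# min-max normalized 0-1 across producers. Weights are assumptions.
from statistics import mean, pstdev

def prestige_raw(ratings: list[float], max_reviews: int, popularity: float) -> float:
    avg = mean(ratings) / 100                       # average critic score, 0-1
    coverage = len(ratings) / max_reviews           # share of the deepest coverage
    consistency = 1 - min(pstdev(ratings) / 10, 1)  # lower std dev -> higher score
    return 0.4 * avg + 0.2 * coverage + 0.2 * consistency + 0.2 * popularity

def normalize(raw_scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize raw prestige scores 0-1 across all producers."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    span = (hi - lo) or 1.0
    return {p: (s - lo) / span for p, s in raw_scores.items()}
```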

Availability - I’m ingesting inventory / price data from a number of retailers in my network and based on that, wines with more available offers get a higher score.

Critic Scores - Simply the normalized average score for this wine across all critics and all vintages, where an average of 75 points maps to 0 and an average of 98 points maps to 1.
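The mapping described is a simple min-max scaling; a sketch (the clamping behavior is my assumption):

```python
def critic_score_metric(avg_points: float, lo: float = 75.0, hi: float = 98.0) -> float:
    """Linear min-max scaling: 75 points -> 0.0, 98 points -> 1.0, clamped to [0, 1]."""
    return max(0.0, min(1.0, (avg_points - lo) / (hi - lo)))
```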

A few notes…

  • In my testing, this has been a great starting point. It’s really fun, for example, to use this for “Similar Wines” because it surfaces wines not just in the same style or price range but with the same “shape,” which makes it far more useful.

  • All things being equal, it seems that a more “full” chart represents a “better” wine, but this isn’t really designed to be a judgment on good or bad; it’s more of a tool for comparison. For example, high availability may signal that you can easily buy more if you like it, while low availability may indicate something scarce and worth collecting; it depends on what you want.

  • I don’t think any of these metrics mean anything on their own. With a field like wine that has an overwhelming amount of data, this is just an attempt to create a standardized way to search / compare / discover new things.

I’m curious what you guys think of this methodology. Are these the best metrics on which to compare wines? Do any seem not very useful? What factors am I missing? I can think of no group of people whose thoughts on this topic I’d be more interested in hearing, so thank you for your help!


note the “flat top” (high in all metrics except value) shared among Biondi-Santi’s similar wines…

We’re clearly not the target audience.

The first thing that jumps out at me is that the unevenness of data across wines/producers/regions could lead to some very funky outcomes.

3 Likes

Hi Rahsaan, thanks for the reply. Could you elaborate on both points?

These are basically advanced filters that can surface interesting wines you may not have seen before. How are you not an audience for that?

In what specific ways do you see data being uneven across regions, and what types of funky outcomes could you imagine as a result? Most of these metrics are calculated among regional peers, so even if the charts end up meaning something different for Sta. Rita Hills than for Tuscany, they stay relevant within regions.

A bigger issue I’m having so far is not uneven or “funky” outcomes but simply not having enough data; what’s there, though, seems quite meaningful so far.

Most people on here think they know wine better than critics. To the extent we know our own palates very well, we are correct. Look at the threads, we are constantly telling people not to pay too much attention to critics.

We also have very idiosyncratic definitions of value. Look into the Burgundy threads.

Most people here think scores/points are silly and not meaningful.

Critics are not created equal. Not all regions are covered (equally) by critics. Not all critics use the same metrics (some would never give above 95 to a village-level wine; others give everything above a 95). You can take care of some of that with statistical adjustments, but the fundamental data issue is that not all wines will have the same depth of scores. So some wines may appear more highly rated or better value just because of quirks in which data were available. Certainly not for reasons that would lead folks on here to seek them out.

The popularity measure is also prone to valuing wines for reasons other than what we would find interesting.

That said, there is probably a market for the product.

You might start in a particular region where you could get full coverage of all the relevant wines?

That makes sense and I totally agree regarding critics. However, data can behave differently at scale. For example, individual stock pickers might all be idiots but collectively the market ends up being right. So any individual JS 100 might be silly, but across many thousands of scores interesting things do emerge.

While critic scores do contribute to 3 of my metrics, they are only one piece of data (other than “Critic Scores” obviously)… and I think it works. Even if we all think we know better (and we do), it’s still kind of useful information.

I have not found depth of data in certain regions to be a huge issue so far, other than the fact that some wines may not appear in top listings, but that will only get better over time as I add more retailer inventory and data.

I also agree that popularity is maybe the most difficult one, partly because if I do rely on my own traffic/engagement data, there are lots of ways that ends up biased and not reflective of true relative interest. So I will keep working on improving or replacing that metric.

Thank you, those are great insights!

A few questions I have:

Let’s say a vintage/style goes bonkers and suddenly becomes the it thing. Obviously some producers are timeless, but others gain prestige/popularity rapidly. How would this account for wines that have a sudden spike? Would it show high popularity but low prestige if the producer wasn’t super well known or was new on the scene?

Availability is a weird one because you mentioned it may vary from location to location. Would you be able to account for differences from one location to another? For instance, could the wheel change based on what continent or country you’re in?

I mentioned above that popularity probably needs more nuance, but I think the “correct” outcome in that situation would be for popularity to rise quickly but prestige to rise more modestly and slowly. Prestige would go up because coverage and high scores go up, but it also takes into account consistency and track record, so that would dampen its rise. Such producers wouldn’t have the long-term track record of consistent greatness and the high level of professional coverage that, say, Vega Sicilia, Cheval Blanc, the first growths, etc. have.

Availability right now is definitely the simplest to calculate: it’s just the number of retail listings relative to all other wines. But normalizing that by country or region (i.e., how available this Rioja is vs. other Riojas) makes a lot more sense. Bordeaux dominates availability, so that shouldn’t penalize other regions. That’s a great idea, thank you.
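A region-relative availability score like the one discussed could be sketched as follows; the function name and the percentile-style approach are my assumptions:

```python
# Sketch of region-relative availability: a wine's retail listing count
# ranked against peers from the same region rather than the whole database,
# so a widely stocked region like Bordeaux doesn't dominate the scale.
def availability_metric(listings: int, regional_listings: list[int]) -> float:
    """Fraction of regional peers with the same number of listings or fewer (0..1)."""
    if not regional_listings:
        return 0.0
    return sum(1 for n in regional_listings if n <= listings) / len(regional_listings)
```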

Having nodes like prestige, popularity, and critic scores will simply turn this tool into a circular, self-fulfilling prophecy. People who are buying based on those things already know those things. That’s how they got here. That sort of information is trivial to find.

You need to decide who this tool is for. Is it for wine geeks that read this site? Is it for ‘prestige buyers’ who are looking to invest in show off labels? Is it for newbs looking for must try wines?

There needs to be information concerning things like style, user value, and grape/region for it to truly help someone who doesn’t already know what the “prestige” is.

5 Likes

You need to familiarize yourself with RTP Richard Latham’s work…

8 Likes

The market is about which companies will be profitable. Wine points are different. Once you get past the basic distinction between quality wines and flawed wines, everything else is subjective taste preference. It doesn’t matter how many critics give a wine 100 points if I don’t like the style/terroir. Wine geeks know that, which is why we disregard critic scores as a general rule (except maybe for specific wines/critics where we’re in sync). But if you’re aiming at a different audience (i.e. not us), then sure.

It’s also the reason ‘crowd-sourced’ points like Cellartracker are cacophonous garbage.

I find that hard to believe. Maybe if you focus on general public regions like Napa, Bordeaux and the Southern Rhone. But wine geeks focus on Anjou, Savoie, Baden, which don’t get much critic attention.

4 Likes

This is probably targeted at more “leisurely wine lovers” than many people here, but here are my 2 cents: the metrics are interesting, but they are irrelevant to me when I want to compare wines or look for new ones. I would rather care about style than popularity, availability, etc…

2 Likes

Thanks for the responses, guys. I think there’s a bit of confusion because I framed my question wrong and wasn’t clear enough about the purpose of quantifying this data into these metrics. It’s not about helping people find high-prestige wines (“oh, this wine is high prestige, I better buy it!”); it’s about building quantified profiles of wines so that comparisons can be made more deeply than on any other platform (which can only search by price, style, region, and variety).

With these metrics… plus live pricing and availability… plus written profiles on the wine and producers, you can do things that aren’t possible anywhere else. For example, you can do a natural-language AI search for, “I had an Ardal RDD last night, help me find more wines like this” and while any other platform would just search for Tempranillo around $35 or whatever, having these extra dimensions makes this search much more powerful because it can do vector searches on the “shapes” or profiles of wines in addition to price, style, and variety. It will almost certainly surface things you’ve never heard of and may want to explore further.
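As a rough sketch of the “shape” comparison described above, assuming each wine is reduced to its 5-metric vector, cosine similarity is one common choice (the helper names are hypothetical, and a real system would combine this with price/style/variety filters):

```python
# Sketch of "shape" similarity: cosine similarity between 5-metric vectors
# (value, popularity, prestige, availability, critic score), each 0-1.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similar_wines(target: list[float], catalog: dict[str, list[float]], top_n: int = 5):
    """catalog maps wine name -> metric vector; returns the most similar names first."""
    ranked = sorted(catalog.items(), key=lambda kv: cosine(target, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

In practice a vector search over these profiles would sit alongside, not replace, the usual price/style/region filters.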

The other use case for this data is trends. By calculating this data regularly across tens of thousands of wines and hundreds of thousands of retail offers, it can highlight interesting trends and shifts in the wine market over time that might be of value.

So I totally agree with AndyK that I wouldn’t care so much about how popular a wine is when doing research for my own purchases. And I think my question was definitely framed wrong; it’s less about “what data points do you care about” and more about how to best quantify or group whatever data is available on wines into the most useful buckets.

If you are using critic scores to engineer three of your features, how are you not concerned about propagating noise at worst, and at best multicollinearity with a weak signal? I would rethink the utility of assigned or perceived attributes that are not intrinsic to a wine and how or where it was made.

3 Likes

Preferences are so individual. Two people who line up on one style from one region can completely disagree on another style or region.

I’ll point out that in my decades-long regular blind tasting group, we’d typically have 6 of 8 wines getting both first- and last-place votes. In talking about the wines after the results, it’s clear preferences go far beyond simple things like style. Any individual aspect, out of thousands of potential ones in a given wine, can be absolutely off-putting or appealing to someone. Our sensitivity to smells depends, in part, on the number of each of the 120 olfactory receptor types we have, which can be zero for some. Sense-memory associations have an impact too.

A couple of decades ago, Wine Spectator did a blind pairs tasting of Oregon Pinot vs. Red Burgundy with their respective critics. On top of a good number of wrong guesses about which was which, their ratings sometimes differed notably; in one case, by 25 points. I think the publisher learned that dispelling the illusion of authoritative objectivity wasn’t in their best interest.

I agree this seems too biased toward the 100-point wine scale. There are a lot of wines out there that don’t get reviewed and would be completely out of place in this data set.

2 Likes

I’m genuinely curious: for a user query like that, what sort of results would you want returned that are not Tempranillos around $35? Even the example you gave in the original post is all Sangiovese around the same price point (with the exception of one Supertuscan).

I do think we can do much better than the status quo for wine recommendations but I’m not convinced that the simple 5-dimensional vector search is the best way to implement that. It is perhaps more interpretable but I don’t think it will necessarily yield higher quality recommendations.

The use of the radar plots is interesting. The critic score and QPR seem to be the most useful of the five variables. I am wondering if you could somehow display taste dimensions for the other three variables?

Another observation from tasting with a group that allows open discussion (not the illusion of consensus from domineering alpha douchebag bloviating) and voting results. Some wines do draw broad agreement across palates, other wines are notably polarizing.

One example: culty wines are often extreme in style. Their customers, the market for that sort of wine, self-select. Reviews might all be glowing, yet a random person trying that sort of wine might find it highly off-putting. Some people love wines with a lot of brett or VA that others abhor. Some grape varieties are polarizing.

1 Like

I think you should track as many metrics as you can for each wine and let users decide which attributes they want to compare on the graph.

1 Like