I’ve got a bit of an issue with wine scores, which I think most can agree range from subjective to deeply flawed. I just never loved them and find I get very little value out of them.
It struck me that it might be possible to apply an Elo-based scoring system to wines, in a similar manner to how chess players are ranked, predicated on the idea that you can compare two wines far more easily than you can assign a score on an arbitrary scale.
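For anyone unfamiliar with the mechanics, the standard Elo update works roughly like this (a minimal sketch; the K-factor of 32 and starting rating of 1500 are common chess defaults used here for illustration, not details from the app):

```python
# Minimal sketch of a standard Elo update applied to a wine pairing.
# K=32 and a 1500 starting rating are illustrative assumptions only.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Model's probability that wine A 'wins' the comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, result_a: float,
           k: float = 32.0) -> tuple[float, float]:
    """result_a is 1.0 if A is preferred, 0.0 if B is, 0.5 for a draw."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (result_a - e_a)
    new_b = rating_b + k * ((1.0 - result_a) - (1.0 - e_a))
    return new_a, new_b

# Two equally rated wines: preferring A moves it up by K/2 here.
a, b = update(1500.0, 1500.0, 1.0)  # → (1516.0, 1484.0)
```

The nice property for this use case is that the update only ever needs a pairwise preference, never a number on a scale.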
I wrote an article trying to unpack the idea. Not saying it’s the answer, just exploring whether it could be useful or totally misguided.
Here’s the write-up if anyone wants to take a look:
As part of messing around with it, I built a little app for myself to test the concept. A couple of friends wanted to try it, so I tidied it up and put it on the App Store. Not trying to “promote” anything, just sharing in case anyone here is curious how the comparisons play out in practice.
I earn nothing from this (in fact it cost me a bit to put together); I’m just super curious to hear what you guys think of the concept. Mods, if this runs afoul of any policy, my apologies, feel free to nuke it, but I think it might make for interesting discussion.
My initial reaction is skepticism. It’s not clear what it means to ‘enjoy’ one wine more than another independent of specific contexts. It also seems to suffer from the same problem as CellarTracker ratings: if it is let loose on the masses, you will get lots of ‘data’ that are impossible to interpret and are mostly just noise.
Another interesting decision: “Also, when a new vintage is added of an existing wine, it will inherit the Elo score of the most recent vintage in the system.”
I see why this is a simplifying move that makes sense computationally, but could lead to funny outcomes for wine.
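The inheritance rule quoted above is simple to express; here's a hedged sketch of what I take it to mean, where the fallback rating of 1500 for a brand-new wine is my assumption, not something stated in the article:

```python
# Sketch of the vintage-inheritance rule: a new vintage starts from the
# most recent existing vintage's Elo score. The 1500 default for a wine
# with no prior vintages is an illustrative assumption.

def initial_rating(vintage_ratings: dict[int, float],
                   default: float = 1500.0) -> float:
    """vintage_ratings maps vintage year -> current Elo score."""
    if not vintage_ratings:
        return default
    latest = max(vintage_ratings)  # most recent vintage year on record
    return vintage_ratings[latest]

# A new 2022 vintage would inherit the 2020 score here.
rating = initial_rating({2018: 1540.0, 2020: 1585.0})  # → 1585.0
```

The “funny outcomes” worry is visible in the sketch: a brilliant 2020 hands its full rating to a possibly mediocre 2022 on day one.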
Valid criticisms of the existing approach to points.
One bit I’ll challenge (even though I believe it is likely true):
The difference between a critic’s score of 93, 94 or 95, is more likely a reflection of mood, time pressure and unconscious bias than any defined qualitative experience.
It’s an assertion without data to back that up. The only critic I’m aware of who used to do anything to check this (and it doesn’t address the whole issue), was (Daniel) Rogov, an Israeli wine critic. He asked his assistant to plant duplicates in every blind tasting, and if his results for those two wines were far apart (>4 points from memory), he’d scrap the whole tasting results. In practice, with the usual compressed scale, that’s a big window to hit. You might be able to dig out those conversations on Wine Lovers Discussion Group (WLDG), perhaps contacting Robin Garr if Rogov’s old sub-site has gone.
One key element that could be a big advantage of your approach is the varying approaches to scoring. 20-point and 100-point scales could both be used as inputs, and by calibrating each taster’s use of their preferred scale, you can also isolate the significant differences in how the scales are used, e.g. Suckling’s generous application of the 100-point scale vs. some of the tougher-scoring amateurs for whom 90 means they think very highly of the wine.
As someone who studied stats at Uni, I see merit in treating it more like multivariate analysis (i.e. not just analysing the wine, but in the context of the taster).
There are still issues for sure:
Crowdsourced data is influenced by critics, hence not as independent as it looks
Critics throwing 100 point scores out for impact (readers gravitate to those scores, so they get fed them)
Palates change over time, as do people’s attitudes to scores (some becoming more generous, notably critics; others less so)
Pride. Who is confident enough to say they were wrong in an earlier tasting note? There appears to be a tendency to stick closely to previous scores.
Wine variability, especially under cork
Other variables as mentioned of mood, setting, weather, palate fatigue, food, other wines tasted adjacent to this one etc.
Subtle wines getting smothered in a lineup of bigger / more intense wines (and the reverse also true, of bigger wines appearing clumsy in a lineup of more elegant wines)
However… many of those feel likely to be acceptable if the sample size is sufficiently big, which ideally leaves you with two key variables: the wine and the wine taster. Isolating other variables feels impossible.
So yes, this is a shortcoming for sure. I’ve set it up so that new wines you rank are compared to the victor of your last pairing. Your personal current favourite, as it were. This I’m sure can be improved on, but it was a relatively simple starting point, as most wines are indeed drunk one at a time. It does create a recency issue if your favourite wine was drunk some time ago. That said, this is a proof of concept meant to spark discussion, and flaws in the app are sure to be discovered and hopefully subject to future improvement as a result.
All excellent points! If only everyone were as stringent in their tasting practices (and ethics) as Rogov, we’d likely have a better impression of scores. The main problem as far as I see it remains one of interpreting scales (20-point, 100-point, 5 stars etc.) in relation to perceived wine quality. Eliminating the need for such interpretation allows the focus to be solely on wine quality, and might just make the resulting score more reliable (provided sufficient sample size).
This is kind of a dealbreaker for me, unfortunately, and I suspect it’s part of the reason that point scoring was developed in the first place: it is geared towards rating each bottle on its own merits, not against another completely random wine.
Furthermore, what if I started the night with a champagne, and then moved on to a Cab Franc for dinner? If I want to rank each bottle, what criteria do I use? Does the Cab suffer because I liked the champagne more, even though the red was delicious, not flawed, went perfectly with my meal and was very enjoyable, and I simply prefer champagne as a matter of personal taste? The premise that every wine can be compared 1:1 with every other is simply not true for me, and it also removes objectivity, which is something I’d guess most reviewers pride themselves on.
Lastly, while taste memories can be very profound (I have many), they are not exactly accurate. Way too much subjectivity in being able to rate a wine against your own memory of a previous wine.
I am also an (extremely bad) chess player, and while the Elo rating system works well for that game, it was developed for a game that is always a head-to-head contest. Wine is almost never a head-to-head contest for me, so this system just doesn’t make sense for someone like me to use.
I mean fair enough. The thing is, any wine is scored with the premise that there is an underlying objective quality that can be judged, which in many ways is unrelated to style. An excellent Champagne will be a better wine than a mediocre red. While each individual assessment is subjective, however, one would imagine that an objectively better wine will tend to outcompete (on average) wines that are not as good. This allows for a comparison of wines on their merits as experienced by the individual, adjusted for the experiences of peers, without the interference of differing interpretations of the scale to be used. As such, I’m not really seeing how it removes objectivity.
Not claiming the methodology is perfect of course, it’s not, but I think the results stand a better chance of actually reflecting the quality of the wine in this way. If a comparison is impossible or seems unfair, a wine can also draw, which also generates data influencing the scores of the respective wines, relative to their past performance.
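On the draw point: under the usual Elo rules a draw is scored 0.5 for each side, so it still generates information whenever the two wines entered with different ratings. A small sketch (the numbers are invented for illustration):

```python
# Sketch of how an Elo draw moves ratings: each wine is credited 0.5,
# so the higher-rated wine loses a little ground to the lower-rated one.
# Ratings and K=32 here are illustrative assumptions.

def elo_draw(r_a: float, r_b: float, k: float = 32.0) -> tuple[float, float]:
    e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (0.5 - e_a), r_b + k * (0.5 - (1.0 - e_a))

# The 1600-rated wine drifts down slightly, the 1500 wine drifts up.
a_new, b_new = elo_draw(1600.0, 1500.0)
```

So a “couldn’t separate them” verdict between an underdog and a favourite quietly narrows the gap, which is exactly the behaviour you’d want from that data.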
Flawed assumption. Almost all of the wines in any reputable wine store (not the big-box places) are high-quality wines. The differences are mainly about personal palate preferences for style, and the context where they will be opened.
But what is the goal of this app? To help novices make better purchasing decisions?
In my example, the two wines would have scored the same, let’s say 93 pts, but I preferred the champagne because I like champagne more than I like Cab Franc. Why does the Cab’s rating suffer, and wouldn’t a point scoring system be more objective in that sense? You are scoring the wine based on its comparisons to the other wines of the style and region that you (presumably a wine critic, not just some guy who drinks) have tasted over your career, and you are considering all of that when you score.
Also just want to note that I don’t think the 100-point system is great or even the best way to rate wines. Personally, I’ve never assigned a point value to any wine and I have no interest in doing that, nor do I use critics’ scores to decide which wines to drink.
If the goal is to provide better information to wine novices, I have long thought that understanding a few basic parameters would help them make better choices, i.e. do they want a wine that is high or low in alcohol/body, acid, tannin, oak? Ideally they would communicate those preferences to a human sommelier or retail worker, but they could also be measured in a crude way in an app to help them distinguish among options on a list, based on what they have liked in the past.
Indeed it is flawed (and for clarity, I do not apply scores to wines).
However, is it potentially less flawed than the current mish-mash of tasters and scoring methodologies? Yes, the potential is absolutely there. Not enough for me to consider assigning scores, as I find it a flawed process even for personal consumption, and all the more so if anyone else were to consider my scores useful.
Where it could be more useful is in evening out the ludicrously high scores of some critics, with those that try to utilise more of the available scale.
Palate preference is one key hurdle this won’t on its own resolve. For that, a different approach might analyse scores for different wines and develop a palate alignment profile, assessing the correlation between different tasters. Where there is good correlation, that person’s TNs will be far more relevant and useful than those of someone with little (or indeed negative) correlation. Such a system applied to an app like CellarTracker could propose ‘tasting buddies’: people you may never have met, but who drink similar wines, and indeed enjoy (and dislike) similar wines. A truly massive data exercise, but one that might interest someone to undertake.
Also absolutely useful to ask the question about the aim. Simply trying to even out the variable nature of scoring methodologies seems a valid aim, and if you acknowledge that it still doesn’t resolve palate variability, then that’s a good, honest aim.
Wouldn’t that suggest that all scores (whether critic or community generated) are purely subjective? If so, that is certainly not how wine scores are treated, or how they’re conveyed to consumers who read scores on shelf talkers or in media.
The aim of the app is merely a proof of concept, to spark discussion about possible ways to improve the way we go about communicating the qualities of a wine while recognising that we are not likely to get rid of scoring altogether. There is undoubtedly marketing power in scores, so why not strive for accuracy to the extent possible?
That’s it really. The aim is to see if we can improve on something that many freely acknowledge is flawed, yet don’t currently have any alternatives for. This is just a proof of concept to see if it can be done differently. It’s not going to take the variability of independent tasters out of the picture. We are too varied for that and will never reach true consensus on matters of taste, yet scores are often portrayed as hard data even though they’re really not. This way removes one variable (the arbitrary scale and the assignment of a wine’s qualities to discrete units) that gets in the way of the primary question people want answered when using scores: which wine is better?
Of course they are! Nobody serious ever suggested otherwise. These critics aren’t getting Objective Truth that descends from the heavens.
Critics of all kinds are experts in their field, whether wine, food, music, literature, theater, cinema, etc etc. They communicate their subjective evaluations to people who do not have the same time to devote to the field. A casual consumer will look at the rough scores to see if the wine/movie/book/etc is ‘good’ or ‘bad’ and that’s probably enough for them. A more educated consumer will read the critics’ commentary to see if the style suits their preferences.
If you want a system to determine which wine is “better” then first you need to answer the question “better for what”? That is where your analogy with chess/sports falls down, because there are winners in chess/sports. But there is no equivalent in wine/food/books, etc, where it is all about which one people prefer. And those preferences cannot be separated from subjective taste and context.
That said, there is probably room for more systematic generalizations to help the newbs. As noted before, you could think about preferences for different characteristics of wine (alcohol, acid, tannin, etc), and the context/sub-genres (garden aperitif, steakhouse dinner, etc etc).