What would happen if a wine critic were asked to duplicate a blind tasting on a group of wines? Would the scores be the same? For example, a critic reviews (again, blind) 50 wines on day 1. On day 2, they are given exactly the same wines to review and score. Would the scores/reviews be consistent?
My guess is that they would not be consistent. Numerous justifications come to mind, all perfectly reasonable. But then I wonder how much weight can be given to such reviews if they are not repeatable.
I suspect that very few, if any, tasters (“pro” or otherwise) can consistently award comparable scores to the very same wines when they unwittingly encounter them among a slew of others.
Thomas Pellechia’s Riesling taste off (organized by Thomas and John Zuccarino) for primarily amateurs disclosed how much context affected one’s perception and scoring. Of course, one can always contrive a ‘stacked deck’ tasting situation to prove whatever theory one espouses.
In any event, I haven’t read about very many pros successfully subjecting themselves to repeatability surveys under well-controlled conditions. Have others found any such surveys?
Wow, that would be ultimately cool, but I can’t see it happening. A critic who does this risks losing his or her reputation, as the notes/ratings could be highly variable from the non-blind to the blind, but it would be so fantastic even for us non-ratings buyers.
I agree, Jay. Reduce the total sample size and stick in a few duplicates (or triplicates) and I think you’ll find plenty of inconsistency…for many reasons. Even more variability if you repeat the same tasting one or two weeks later at a different time of day (e.g. morning vs. evening). Despite the shortage of clothes on the critic emperors, consumers have to trust somebody as there’s just too damn much homework otherwise.
The studies I’ve seen done on this (admittedly this goes back 25 years or more) suggested that the best one could reasonably hope for–meaning highly trained professionals on a good day, tasting a limited number of wines in a controlled environment–was about +/-7.5% of the point scale used.
So, as an example, a professional using a 20-pt system would, in top form, have a degree of accuracy of approx +/-1.5 pts for any point score given.
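For anyone who wants to run the arithmetic on another scale, here is a minimal sketch. The function name `score_tolerance` is made up for illustration, and the 0.075 default is just the +/-7.5% figure quoted above. Applying it to a 100-pt scale is purely an assumption on my part, since the studies mentioned were discussed in terms of the 20-pt scale.

```python
def score_tolerance(scale_points: float, relative_error: float = 0.075) -> float:
    """Return the +/- band, in points, implied by a relative error on a scale.

    scale_points   -- size of the scoring scale (e.g. 20 for a 20-pt system)
    relative_error -- fraction of the scale; 0.075 is the +/-7.5% quoted above
    """
    return scale_points * relative_error


# The 20-pt case from the post above.
print(round(score_tolerance(20), 2))

# Hypothetical: the same figure carried over to a 100-pt scale (an assumption).
print(round(score_tolerance(100), 2))
```

The first call reproduces the +/-1.5 pts figure. The second is only as good as the assumption that the same percentage applies to the full 100 points.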
I referenced the 20-pt scale because the studies I mentioned were discussed while I was at UCD and we were using (for a brief time, at least) a 20-pt scale.
The British wine writers also seem to favor a 20-pt scale.
It’s true! How many ratings below 80 points have you seen published? I would guess this is roughly 0.5% of all ratings, or less.
Jay makes an EXCELLENT point regarding the justifications, and I thought of the very same thing, but just as he said, that would put the original reviews (and any reviews, frankly) in question, as those same justifications would apply.
I have never participated in this type of tasting. However, when two glasses of the same wine were served in a 30-wine session, I was able to rate them within ±2 points. What would be truly interesting, to remove variables, is to serve fifteen wines from the same region twice, single blind. I am certain that most professional critics would agree to participate.
I can’t tell you how many times I’ve served a magnum into two different glasses during a flight or evening. Likewise, I can’t tell you how many times I’ve heard totally different views of the “two” wines.
But Ray, with age, different bottles of the same wine show quite differently. I used to belong to a wine and food society that cellared wines to serve when mature at their dinners. I was often on the sommelier’s committee, opening the bottles from the same case to see if they were sound and then decanting them. The variation among wines drawn from the same case was often astonishing. It’s one of the reasons why I think little is gained by giving tasting notes on older bottles.
Though I fully agree it is quite an impossible task for a single taster, I have to tell you about the experience of the GJE in this matter.
In 1997, we did a tasting of top Bordeaux at the Dutournier restaurant in Paris; then, 2 days later, I flew to Singapore with the same wines bought at the same place, though served in a different order, at the Raffles (and, of course, always blind) with 7 of my top GJE members.
Well, the results were stunning: the first 10 crus came out in nearly the exact same order, and likewise the last 10, with some movement in the middle.
Of course, you may say that this is easier with a group than with individuals. Yes indeed. But many times, I serve the same wine in the morning and in the afternoon, and usually the difference is ± 1 point for the best tasters.
But, fundamentally, do not expect a human palate to be able to score identically a wine which is served in different conditions: time, temperature, surroundings and so many other factors to consider.
This is the reason why I much prefer a journalist to teach me about the STYLE of the producer (finesse against power, for example) so I have a good idea about the styles of his wines.
While I agree that there can be bottle variation within the same case of wine especially with older wines, there is no variation when both glasses served come out of the same magnum. I always decant the mags and then pour the wine back and forth into two separate decanters so when I hear “the top half of a magnum tastes different than the bottom half” I can make that excuse moot as well.
I have the utmost respect for Mr. Mauss and his organizing GJE evaluations. On the other hand, I posted these sentiments on eBlob a while back.
Please don’t interpret the following as a swipe against GJE, but Matt Kramer has suggested in his book Making Sense of Wine that wine tasters in general are susceptible to what he calls the “low-cut dress syndrome”. Kramer goes on to revisit his discussion with a GJE organizer (François Mauss, if I recall correctly), who acknowledged that certain tasting trends in a European tasting that were duplicated in New York City lent credence to Kramer’s proposition. Perhaps no palate is infallible?
The UCDavis 20-pt system was chosen on the basis of studies that indicated that trained tasters, using the Davis system, could only reliably distinguish 20 degrees of gradation in quality.
A very important point to make is that the UCDavis system is not a hedonic scoring system, which is a very much different type of beast. Refer to Amerine & Roessler (I believe) for a good discussion on scoring systems.
Tom
Kevin and Francois, I’m a fan of the GJE and similar tastings. I agree that the scores generated by an experienced group, in a controlled setting, can reduce rating uncertainty.
Do you think the ratings would be as reproducible if the 30 wines had no limits with respect to region? Vintage (see Claude’s post)? How about combined with no limit on varietals or blends? Random serving orders?
I guess the point is…is a single critic’s score truly reliable for a given wine? A rating of 86 on one day and 90 on the next could be the difference between the discount rack and sales by the pallet.