Search issue: Accent marks

I just figured out today that accent marks throw off searches. For example, searching for Chateau Blah-Blah and searching for Château Blah-Blah return two different sets of results. Is there any software fix for this?



Most software doesn’t handle this well. I actually go to rather great lengths in CT to make search accent-insensitive.

Eric, I worked on the initial playlists for Brasilian music for the rollout of iTunes and this was a NIGHTMARE, as the database would have as many as five different spellings of most of the classic stuff (all of which had been taken from liner notes, many using SOME accents but omitting ç, for instance). Making a playlist of 500 classic songs should have taken about an hour but took at least 8.

Will software EVER catch up to this, and what do the Vietnamese or Scandinavians do, for God’s sake?

You just need to listen to more Lynyrd Skynyrd

Turn it up.

Roberto, there are very good tools to allow a developer to make the software just “do the right thing”, but it takes a very concerted effort on the part of the developer. In the case of wine, for the “Chateau” versus “Château” example alone, it is entirely clear that it is important and necessary to handle it. The way that most sites deal with it is by just flattening out all extended characters which is a shame…
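
For anyone curious what "doing the right thing" might look like, here is a rough sketch in Python, not how this site or CT actually works. The idea is to match on a folded (accent-stripped, case-folded) copy of each title while still showing the original spelling, so nothing gets flattened in the display. The `fold` and `search` helpers and the sample titles are made up for illustration.

```python
import unicodedata

def fold(text):
    """Return a matching key: accents stripped and case folded (hypothetical helper)."""
    # NFD splits "â" into a plain "a" followed by a combining circumflex.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def search(titles, query):
    """Match on folded keys, but return the titles exactly as they were entered."""
    needle = fold(query)
    return [title for title in titles if needle in fold(title)]

# Made-up sample data.
titles = ["Château Blah-Blah 2005", "Chateau Blah-Blah 2009", "Domaine Example"]

for hit in search(titles, "Chateau Blah-Blah"):
    print(hit)
```

With this approach both spellings of the query find both Blah-Blah entries, and each result keeps whatever accents it was typed with. A real site would more likely store the folded key in its index or use the database's own accent-insensitive collation rather than fold every title on each search.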

For this site, I might suggest to Todd that there may be an “accent sensitivity” setting that can be perhaps flipped. If not then we are SOL.

Guys, one of my favorite songs EVER:


Saw them MANY times before Ronnie died…

Excellent song.

fuckin’ A

An advanced search option would be much appreciated, too. It’s very hard to pull up some threads easily.

Ah, the problem of different characters and character sets. I encounter this every day as a journalist working on an electronic product. Cut and paste a word with an apostrophe and it renders on the product as a question mark.

I remember when the euro was introduced and the European edition of the Wall Street Journal kept printing figures like &&E256 – some errant code that was supposed to generate the euro symbol.

This isn;t a lower versus ASCII issue or codepage issue. It's just that the software assumes that “â != a” when, for our purposes, they really are the same.
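
To make the “â != a” point concrete, here is a minimal sketch in Python of an accent-insensitive comparison. It assumes that "the same for our purposes" means "the same once accent marks and case are ignored"; the function names are invented for this example.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove combining accent marks, leaving the base letters."""
    # Decompose each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is nonzero for combining marks, so filter those out.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_ignoring_accents(a: str, b: str) -> bool:
    """Compare two strings while ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()

# A plain comparison treats the two letters as different characters.
print("â" == "a")
# The accent-insensitive comparison treats them as equal.
print(same_ignoring_accents("â", "a"))
print(same_ignoring_accents("Château", "chateau"))
```

One caveat: this only handles letters that Unicode decomposes into a base letter plus a mark. Letters such as ø or đ have no such decomposition, so they would need an explicit mapping table if they should match o or d.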

Was that semi-colon a subtle commentary on the problem, Eric? :wink:

I realize the search problem is not a character set issue. But for the native-English-speaking layman it comes down to the same thing: Why are there so many different kinds of A’s and C’s, etc., when there’s just one of each on my keyboard?