Today, I wanted to improve our blog-title-to-permalink function so that (French) accented characters are not simply stripped but converted to their unaccented equivalents. For example, "é" would be converted to "e".
After some googling and (slightly) tweaking what I found, here is the function I use:
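# Python 2: identity mapping for bytes 0-191, ASCII stand-ins for 192-255 (À to ÿ)
noaccents_table = ''.join(map(chr, range(192))) + \
    "AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"

def latin1_to_ascii(u_str):
    # Characters outside latin1 become '?' because of the 'replace' error handler
    return u_str.encode('latin1', 'replace').translate(noaccents_table)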
As you can see, it takes a unicode string as its argument. Here is how you use it:
>>> latin1_to_ascii(u'évidemment')
'evidemment'
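The function above relies on Python 2 byte strings. For anyone on Python 3, a minimal sketch of the same table-based idea might look like the following. The `NOACCENTS_TABLE` name and the final `decode` step are my own assumptions, not part of the original function:

```python
# Python 3 sketch of the same lookup-table approach (an adaptation, not the original code).
# Bytes 0-191 map to themselves; bytes 192-255 (À to ÿ in latin-1) map to ASCII stand-ins.
NOACCENTS_TABLE = bytes(range(192)) + (
    b"AAAAAAACEEEEIIIIDNOOOOOxOUUUUYTsaaaaaaaceeeeiiiidnooooo/ouuuuyty"
)


def latin1_to_ascii(text):
    """Replace accented latin-1 letters in a str with unaccented ones."""
    # Characters that latin-1 cannot represent become '?' ('replace' handler).
    raw = text.encode("latin-1", "replace")
    # bytes.translate expects a 256-byte table, which is what we built above.
    # Decode as latin-1 because bytes 128-191 are passed through unchanged.
    return raw.translate(NOACCENTS_TABLE).decode("latin-1")


print(latin1_to_ascii("évidemment"))  # evidemment
```

Note that, as in the original, symbols in the 128-191 range (such as "©") are left untouched, so the result is not guaranteed to be pure ASCII.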
Note for later: if I ever need to do it in a more generalized way (not only for latin1), the iconv module (http://pypi.python.org/pypi/iconv) might (or might not) be useful.
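For the more general case, one commonly used standard-library route (instead of iconv) is Unicode decomposition with the `unicodedata` module. Here is a rough sketch in Python 3; `strip_accents` is a hypothetical name, and this is just one possible approach, not something I have run against our permalinks:

```python
import unicodedata


def strip_accents(text):
    """Remove combining accents from a str, for any script that decomposes."""
    # NFKD splits "é" into "e" followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("évidemment"))  # evidemment
```

One caveat: letters that have no Unicode decomposition, such as "ø" or "ß", pass through unchanged, so a small manual table would still be needed for those.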
Comments
Because I personally find that:
http://xxx/evidemment looks better than http://xxx/%C3%A9videmment
But browsers now display such links with the encoded characters converted back.
I didn't want to go through the trouble of detecting the URL's encoding or handling its decoding, potentially accounting for varying behaviors across browsers.