Translation table to map Latin-1 to ASCII?
reageer.in at de.nieuwsgroep
Sun Jan 26 13:54:53 CET 2003
>> Can anyone point me to a translation table for string.translate
>> to map Latin-1 (ISO 8859-1) to ASCII such that \"e maps to e
>The translation table would depend on what you want to use it for; you
>are smashing 256 different characters into 128, so you have a choice
>of losing information or emitting at least two characters per input
I want to use it for log file analysis and statistical reporting
of an ht://Dig search engine. I've configured the search engine
itself to use the 'accents' algorithm: "This algorithm will
treat all accented letters as equivalent to their unaccented
counterparts."
Now I want to mimic this behavior in the log analyzer that I'm
writing in Python.
For example, if someone from Germany searches for \"ubersetzung
and someone from Holland searches for ubersetzung I want those
to count as two searches for the same search phrase in the
>What do you want to do with all the non-alphabetic characters?
Perhaps I should rephrase the challenge: I want to map Latin-1
to Latin-1, 1-to-1 for most characters, but with accented
letters mapped to their unaccented counterparts.
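
To make that concrete, here is a minimal Python 3 sketch of the idea for a handful of characters. It is not the ht://Dig table: ACCENT_MAP and normalize are hypothetical names, and the character subset is hand-picked purely for illustration.

```python
from collections import Counter

# Hypothetical, hand-picked subset: a-umlaut, o-umlaut, u-umlaut, e-acute,
# e-grave, e-circumflex, a-grave, c-cedilla mapped to their base letters.
ACCENT_MAP = str.maketrans("\xe4\xf6\xfc\xe9\xe8\xea\xe0\xe7", "aoueeeac")

def normalize(query):
    """Return the query with the accented letters listed above unaccented."""
    return query.translate(ACCENT_MAP)

# Two log entries that should be counted as the same search phrase.
queries = ["\xfcbersetzung", "ubersetzung"]
counts = Counter(normalize(q) for q in queries)
print(counts["ubersetzung"])
```

Every character not listed in the map passes through unchanged, which is the "1-to-1 for most characters" part.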
>I trust this isn't a Python specific question i.e. if someone gave you
>a translation table to be used in C or some other language, you'd be
>able to Pythonise it.
Sure, and I can also write it myself or get it from ht://Dig's
source code. But I'm lazy and I was just hoping that someone
would have a similar translation table for Python's
string.translate lying around :-)
I found the following solution in this group's archive on
Google. If only I understood how it works :-)
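import unicodedata, string

def maketable():
    # build iso-latin-1 to "undotted" ascii translation table
    table = range(256)
    for i in table:
        # e.g. "0065 0301" for e-acute; empty if there is no decomposition
        x = unicodedata.decomposition(unichr(i))
        # skip tagged decompositions such as "<compat> ..."
        if x and x[0] == "0":
            # keep only the first code point, i.e. the base letter
            table[i] = int(x.split()[0], 16)
    return string.join(map(chr, table), "")

text = "text with accented characters"
noaccents = maketable()
undotted_text = string.translate(text, noaccents)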
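
As far as I can tell, the trick is that unicodedata.decomposition returns a string of hex code points for each accented letter, with the base letter first and the combining accent after it. Below is a rough sketch of the same idea in Python 3, where unichr and string.translate no longer exist. The name make_table is my own, and building a dict for str.translate is an assumption about how one would port it, not part of the original posting.

```python
import unicodedata

def make_table():
    """Map each accented Latin-1 code point to its base letter's code point."""
    table = {}
    for i in range(256):
        # For e-acute (0xE9) this is "0065 0301": "e" plus a combining acute.
        decomp = unicodedata.decomposition(chr(i))
        # Tagged decompositions such as "<compat> 03BC" start with "<",
        # so checking for a leading "0" skips them, as the original does.
        if decomp and decomp[0] == "0":
            # The first code point is the unaccented base letter.
            table[i] = int(decomp.split()[0], 16)
    return table

no_accents = make_table()
print("\xfcbersetzung".translate(no_accents))
```

Characters that have no decomposition, such as the sharp s (0xDF) or the o with stroke (0xF8), are absent from the table and pass through unchanged, so they would need separate handling if they matter for the statistics.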
What do you want to learn? http://www.leren.nl