convert Unicode filenames to good-looking ASCII
Peter Otten
__peter__ at web.de
Thu May 6 12:34:42 EDT 2010
coldpizza wrote:
> Hello,
>
> I need to convert accented Unicode characters in the names of some
> audio files to similar-looking ASCII characters. The following code
> seems to work on Windows:
>
> import os
> import sys
> import glob
>
> EXT = '*.*'
>
> lst_uni = glob.glob(unicode(EXT))
>
> os.system('chcp 437')
> lst_asci = glob.glob(EXT)
> print sys.stdout.encoding
>
> for i in range(len(lst_asci)):
>     try:
>         os.rename(lst_uni[i], lst_asci[i])
>     except Exception as e:
>         print e
>
> On Windows it converts most of the accented characters from the latin1
> encoding. This does not work on Linux since the script relies on 'chcp'.
>
> The questions are (1) *why* does it work on Windows, and (2) what is
> the proper and portable way to convert Unicode characters to
> similar-looking plain ASCII characters?
>
> That is, how do I properly do this kind of conversion?
> ü > u
> é > e
> â > a
> ä > a
> à > a
> á > a
> ç > c
> ê > e
> ë > e
> è > e
>
> Is there any other way apart from creating my own char replacement
> table?
>>> from unicodedata import normalize
>>> s = u"""ü > u
... é > e
... â > a
... ä > a
... à > a
... á > a
... ç > c
... ê > e
... ë > e
... è > e
... """
>>> print normalize("NFD", s).encode("ascii", "ignore")
u > u
e > e
a > a
a > a
a > a
a > a
c > c
e > e
e > e
e > e
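
For the file names in the original question, the same normalize-then-encode
idea could be wrapped up roughly as follows. This is only a sketch in
Python 3 syntax (the session above is Python 2); to_ascii() and
plan_renames() are names made up for this example, not part of any library.
Note that characters without a canonical decomposition (for example "ø")
are dropped rather than replaced, so the result should be checked before
anything is actually renamed.

```python
import unicodedata


def to_ascii(text):
    """Strip accents: decompose with NFD, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def plan_renames(names):
    """Return (old, new) pairs for the names that would change.

    Hypothetical helper: a name is skipped when nothing survives the
    conversion or when the ASCII version is already taken, so that no
    existing file would be overwritten.
    """
    taken = set(names)
    plan = []
    for old in names:
        new = to_ascii(old)
        if new == old:
            continue  # already plain ASCII
        if not new or new in taken:
            continue  # empty result or name collision
        taken.add(new)
        plan.append((old, new))
    return plan
```

The pairs returned by plan_renames(os.listdir(directory)) could then be
passed to os.rename(), joining each name with the directory path first.
Keeping the planning step free of file system calls makes it easy to
inspect the proposed names before touching any files.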