convert Unicode filenames to good-looking ASCII

Peter Otten __peter__ at web.de
Thu May 6 12:34:42 EDT 2010


coldpizza wrote:

> Hello,
> 
> I need to convert accented Unicode characters in the names of some
> audio files to similar-looking ASCII characters. The following code
> seems to work on Windows:
> 
> import os
> import sys
> import glob
> 
> EXT = '*.*'
> 
> lst_uni = glob.glob(unicode(EXT))
> 
> os.system('chcp 437')
> lst_asci = glob.glob(EXT)
> print sys.stdout.encoding
> 
> for i in range(len(lst_asci)):
>     try:
>         os.rename(lst_uni[i], lst_asci[i])
>     except Exception as e:
>         print e
> 
> On Windows it converts most of the accented characters in the Latin-1
> range. It does not work on Linux because it relies on 'chcp'.
> 
> The questions are (1) *why* does it work on Windows, and (2) what is
> the proper and portable way to convert Unicode characters to
> similar-looking plain ASCII characters?
> 
> That is, how do I properly do this kind of conversion?
>  ü  > u
>  é  > e
>  â  > a
>  ä  > a
>  à  > a
>  á  > a
>  ç  > c
>  ê  > e
>  ë  > e
>  è  > e
> 
> Is there any other way apart from creating my own char replacement
> table?

>>> from unicodedata import normalize
>>> s = u"""ü  > u
...  é  > e
...  â  > a
...  ä  > a
...  à  > a
...  á  > a
...  ç  > c
...  ê  > e
...  ë  > e
...  è  > e
... """
>>> print normalize("NFD", s).encode("ascii", "ignore")
u  > u
 e  > e
 a  > a
 a  > a
 a  > a
 a  > a
 c  > c
 e  > e
 e  > e
 e  > e
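
The session above works because NFD splits each accented letter into a base letter followed by a combining mark, and the "ignore" error handler then drops everything that is not ASCII, including those marks. Here is a minimal sketch of the same idea for Python 3; the helper name strip_accents is made up for illustration and is not part of unicodedata.

```python
import unicodedata


def strip_accents(text):
    """Decompose accented letters, then drop every non-ASCII code point."""
    # NFD turns e.g. U+00FC (u with diaeresis) into "u" + U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # "ignore" silently discards whatever cannot be encoded as ASCII,
    # which removes the combining marks and anything else non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("ü é â ä à á ç ê ë è"))   # u e a a a a c e e e
print(strip_accents("straße"))                # strae
```

Note the second call: letters that have no decomposition (ß, ø, æ and so on) are not replaced by a lookalike but simply disappear, so a small hand-written table is probably still needed for those. Using "NFKD" instead of "NFD" additionally decomposes compatibility characters such as the "ﬁ" ligature.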

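Applied to the original problem of renaming files, the same normalization can be used without 'chcp', so it should not depend on the platform. The following is only a sketch for Python 3 under a few assumptions: the function names ascii_name and rename_to_ascii are hypothetical, only one directory is processed, and a file is skipped when the ASCII name is empty or already taken.

```python
import os
import unicodedata


def ascii_name(name):
    """Return name with accents stripped and other non-ASCII characters dropped."""
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rename_to_ascii(directory):
    """Rename entries in directory to ASCII names; return (old, new) pairs."""
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = ascii_name(old)
        if not new or new == old:
            continue  # nothing left after stripping, or already ASCII
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            continue  # assumption: never overwrite an existing file
        os.rename(src, dst)
        renamed.append((old, new))
    return renamed
```

Calling rename_to_ascii(".") would process the current directory. Returning the list of pairs makes it easy to print or log what was changed before trusting the result on a real music collection.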


