convert Unicode filenames to good-looking ASCII

coldpizza vriolk at gmail.com
Thu May 6 12:55:16 EDT 2010


Cool! Thanks to both Iliya and Peter!

On May 6, 7:34 pm, Peter Otten <__pete... at web.de> wrote:
> coldpizza wrote:
> > Hello,
>
> > I need to convert accented Unicode characters in the names of some audio
> > files to similar-looking ASCII characters. The following code seems to
> > work on Windows:
>
> > import os
> > import sys
> > import glob
>
> > EXT = '*.*'
>
> > lst_uni = glob.glob(unicode(EXT))
>
> > os.system('chcp 437')
> > lst_asci = glob.glob(EXT)
> > print sys.stdout.encoding
>
> > for i in range(len(lst_asci)):
> >     try:
> >         os.rename(lst_uni[i], lst_asci[i])
> >     except Exception as e:
> >         print e
>
> > On Windows it converts most of the accented characters from the Latin-1
> > encoding. This does not work on Linux, since it relies on the Windows-only
> > 'chcp' command.
>
> > The questions are (1) *why* does it work on Windows, and (2) what is
> > the proper, portable way to convert Unicode characters to similar-looking
> > plain ASCII characters?
>
> > That is, how do I properly do this kind of conversion?
> >  ü  > u
> >  é  > e
> >  â  > a
> >  ä  > a
> >  à  > a
> >  á  > a
> >  ç  > c
> >  ê  > e
> >  ë  > e
> >  è  > e
>
> > Is there any other way apart from creating my own char replacement
> > table?
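For comparison, the hand-built replacement table the question hopes to avoid might look like this in Python 3 (the thread itself uses Python 2). `TABLE` and `replace_accents` are illustrative names, and the table covers only the ten lowercase characters listed above:

```python
# Explicit replacement table covering only the lowercase characters listed
# in the question. Anything not in the table (ñ, ö, É, ...) passes through
# unchanged.
TABLE = str.maketrans({
    "ü": "u", "é": "e", "â": "a", "ä": "a", "à": "a",
    "á": "a", "ç": "c", "ê": "e", "ë": "e", "è": "e",
})


def replace_accents(text):
    """Replace each listed accented character with its plain ASCII letter."""
    return text.translate(TABLE)


print(replace_accents("façade, déjà vu, Müller"))
```

Every further accented letter, including uppercase forms, needs its own entry, which is the maintenance burden the normalization approach in the reply sidesteps.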
>
> >>> from unicodedata import normalize
> >>> s = u"""ü  > u
> ...  é  > e
> ...  â  > a
> ...  ä  > a
> ...  à  > a
> ...  á  > a
> ...  ç  > c
> ...  ê  > e
> ...  ë  > e
> ...  è  > e
> ... """
> >>> print normalize("NFD", s).encode("ascii", "ignore")
> u  > u
>  e  > e
>  a  > a
>  a  > a
>  a  > a
>  a  > a
>  c  > c
>  e  > e
>  e  > e
>  e  > e
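The same NFD trick in Python 3 (where `print` is a function and `str` is already Unicode), wrapped in a hypothetical `to_ascii` helper and applied to the filenames in one directory, might look like the sketch below. It assumes that dropping characters with no ASCII base letter is acceptable: NFD only splits accented letters into a base letter plus combining marks, so letters such as ß, ø or æ, which have no canonical decomposition, are removed rather than transliterated. `asciify_filenames` is likewise an illustrative name; it skips a rename when the ASCII name would be empty, unchanged, or already taken.

```python
import os
import unicodedata


def to_ascii(name):
    """Strip accents: decompose with NFD, then drop every non-ASCII code point.

    Base letters survive (é -> e), but characters with no canonical
    decomposition (ß, ø, æ) are dropped entirely, as are combining marks.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def asciify_filenames(directory):
    """Rename entries in `directory` to ASCII-only names; return (old, new) pairs."""
    renamed = []
    for old in os.listdir(directory):
        new = to_ascii(old)
        # Skip names that are already ASCII or would become empty.
        if not new or new == old:
            continue
        target = os.path.join(directory, new)
        # Never overwrite an existing entry (e.g. both "é.mp3" and "e.mp3").
        if os.path.exists(target):
            continue
        os.rename(os.path.join(directory, old), target)
        renamed.append((old, new))
    return renamed
```

Unlike the `chcp` approach, this does not depend on the console code page, so the same code should behave the same on Windows and Linux; calling `asciify_filenames(".")` would process the current directory.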

More information about the Python-list mailing list