convert Unicode filenames to good-looking ASCII
coldpizza
vriolk at gmail.com
Thu May 6 12:55:16 EDT 2010
Cool! Thanks to both Iliya and Peter!
On May 6, 7:34 pm, Peter Otten <__pete... at web.de> wrote:
> coldpizza wrote:
> > Hello,
>
> > I need to convert accented Unicode chars in the names of some audio files to
> > similar-looking ASCII chars. The following code seems to
> > work on Windows:
>
> > import os
> > import sys
> > import glob
>
> > EXT = '*.*'
>
> > lst_uni = glob.glob(unicode(EXT))
>
> > os.system('chcp 437')
> > lst_asci = glob.glob(EXT)
> > print sys.stdout.encoding
>
> > for i in range(len(lst_asci)):
> >     try:
> >         os.rename(lst_uni[i], lst_asci[i])
> >     except Exception as e:
> >         print e
>
> > On Windows it converts most of the accented chars from the latin1
> > encoding. This does not work on Linux, since it relies on 'chcp', which is a Windows-only command.
>
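A portable sketch of the renaming step, in Python 3 (where `str` is already Unicode). Instead of pairing two `glob` listings, which assumes both calls return the files in the same order, and instead of relying on the Windows-only `chcp` command, it derives each ASCII name directly from the Unicode name using NFD decomposition, the `unicodedata.normalize` technique from the reply quoted below. The helper names `ascii_name` and `rename_to_ascii` and the `dry_run` flag are hypothetical, chosen only for illustration.

```python
import os
import unicodedata


def ascii_name(name):
    """Decompose accented letters (NFD) and drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rename_to_ascii(directory=".", dry_run=True):
    """Rename entries in `directory` to their ASCII form.

    Returns the (old, new) name pairs; with dry_run=True nothing is renamed.
    """
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = ascii_name(old)
        if new == old or not new:
            continue  # already ASCII, or nothing left after stripping
        new_path = os.path.join(directory, new)
        if os.path.exists(new_path):
            # never overwrite an existing file
            print(f"skipping {old!r}: {new!r} already exists")
            continue
        if not dry_run:
            os.rename(os.path.join(directory, old), new_path)
        renamed.append((old, new))
    return renamed


if __name__ == "__main__":
    # dry run on the current directory: only report what would change
    for old, new in rename_to_ascii(".", dry_run=True):
        print(f"{old} -> {new}")
```

Running it with `dry_run=True` first shows what would be renamed; names that would collide with an existing file, or that would become empty after stripping, are skipped rather than overwritten.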
> > The questions are: (1) *why* does it work on Windows, and (2) what is
> > the proper, portable way to convert Unicode characters to similar-looking
> > plain ASCII chars?
>
> > That is, how do I properly do this kind of conversion?
> > ü > u
> > é > e
> > â > a
> > ä > a
> > à > a
> > á > a
> > ç > c
> > ê > e
> > ë > e
> > è > e
>
> > Is there any other way apart from creating my own char replacement
> > table?
>
> >>> from unicodedata import normalize
> >>> s = u"""ü > u
> ... é > e
> ... â > a
> ... ä > a
> ... à > a
> ... á > a
> ... ç > c
> ... ê > e
> ... ë > e
> ... è > e
> ... """
> >>> print normalize("NFD", s).encode("ascii", "ignore")
> u > u
> e > e
> a > a
> a > a
> a > a
> a > a
> c > c
> e > e
> e > e
> e > e
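NFD only helps with letters that decompose into a base letter plus combining marks: "é" becomes "e" followed by U+0301 COMBINING ACUTE ACCENT, and `encode("ascii", "ignore")` then drops the accent. Letters with no such decomposition (ß, æ, œ, ø, ł, đ) are dropped entirely, so "Straße" would come out as "Strae". A small hand-made table is therefore still useful as a first pass. A minimal Python 3 sketch, assuming these particular transliterations are acceptable; the `EXTRA` table and the `to_ascii` name are illustrative, and the table is not exhaustive:

```python
import unicodedata

# Hand-made fallbacks for letters that NFD does not decompose.
# Illustrative only: extend it for the scripts you actually encounter.
EXTRA = str.maketrans({
    "ß": "ss",
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O",
    "ł": "l", "Ł": "L",
    "đ": "d", "Đ": "D",
})


def to_ascii(text):
    # 1. replace letters that have no canonical decomposition
    text = text.translate(EXTRA)
    # 2. split accented letters into base letter + combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # 3. drop everything that is still non-ASCII (the combining marks)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("ü é â ä à á ç ê ë è"))       # u e a a a a c e e e
print(to_ascii("Straße, smørrebrød, Łódź"))  # Strasse, smorrebrod, Lodz
```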
More information about the Python-list mailing list