Managing non-ASCII filenames in Python

pdenize denize.paul at gmail.com
Sun Jul 19 22:57:50 EDT 2009


I created the following filename in Windows just as a test:
“Dönåld’s™ Néphêws” deg°.txt
The quotes are non-ASCII, and there are many non-English characters, a
long hyphen, etc.

Now in DOS I can do a directory listing and it translates them all to
something close:
"Dönåld'sT Néphêws" deg°.txt

I thought the correct way to do this in Python would be to scan the
directory:
  files = os.listdir(os.path.dirname(os.path.realpath(__file__)))

then print the filenames:
  for filename in files:
    print filename
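
One detail that may matter at this step: in Python 2, os.listdir returns byte strings when given a byte-string path and unicode strings when given a unicode path. A minimal sketch of asking for unicode names up front, where list_unicode_names is a hypothetical helper name and not something from the post:

```python
import os
import sys


def list_unicode_names(directory):
    """Return the entries of `directory` as text (unicode) strings."""
    # A byte-string path is decoded first, so that os.listdir is always
    # called with a text path and hands back text names.
    if isinstance(directory, bytes):
        directory = directory.decode(sys.getfilesystemencoding())
    return os.listdir(directory)
```

With names obtained this way, the later .decode() call should no longer be needed, since the names are already unicode.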

but as expected the filename is not correct, so I tried to correct it
using the file system's encoding:

  print filename.decode(sys.getfilesystemencoding())

But I get
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014'
in position 6: character maps to <undefined>
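
Note that this is an *encode* error, which suggests the decode step succeeded and that the failure came when print tried to encode the resulting unicode string for the console. The sketch below checks whether a string can be encoded with a given codec. It assumes cp437 as the console code page, which is a guess (the real one may be cp850, cp1252 or something else), and try_encode is a hypothetical helper name.

```python
def try_encode(text, encoding):
    """Return (ok, detail): whether `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
        return True, ""
    except UnicodeEncodeError as exc:
        # exc.object[exc.start:exc.end] is the slice that could not be encoded
        return False, repr(exc.object[exc.start:exc.end])


# Part of the test filename, written with escapes so the source stays ASCII.
name = u"D\u00f6n\u00e5ld\u2019s\u2122 N\u00e9ph\u00eaws"

print(try_encode(name, "cp437"))   # assumed console code page
print(try_encode(name, "utf-8"))   # UTF-8 can encode any Unicode character
```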

All was working well until these characters came along.

I need to be able to write (a representation) to the screen (and I
don't see why I should not get something as good as DOS shows).
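
One way to always get some representation on the screen is to encode for the console with the "replace" error handler, so that unsupported characters become "?" instead of raising. This is only a sketch: console_safe is a hypothetical helper name and cp437 is an assumed default. As far as I know, the closer substitutions DOS shows (such as "T" for the trademark sign) come from a Windows best-fit mapping that this handler does not reproduce.

```python
def console_safe(text, encoding="cp437"):
    """Return `text` with characters `encoding` cannot represent replaced by '?'."""
    # Encode with replacement, then decode again so the result is still text.
    return text.encode(encoding, "replace").decode(encoding)
```

It could be used as print(console_safe(filename, sys.stdout.encoding or "ascii")), so that the code page in use is picked up at run time when it is known.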

I also need to write it to an XML file in UTF-8,

and write it to a text file and be able to read it back in.
Again I was surprised that this was also difficult: it appears that
the file also wanted ASCII.  Should I have to open the file in binary
for write (I expect so), and if so, what encoding should I write in?
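
On the encoding question, UTF-8 is one reasonable choice because it can represent every Unicode character. The sketch below assumes the names are already unicode strings and uses io.open (available from Python 2.6) for the text file, so the file is opened in text mode with an explicit encoding. For the XML, it asks ElementTree for UTF-8 bytes and writes them in binary mode. The function names and the files/file element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def save_names(path, names):
    """Write one name per line to a UTF-8 text file."""
    # io.open encodes the text on the way out, so no manual .encode() is needed.
    with io.open(path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + u"\n")


def load_names(path):
    """Read the names back as unicode strings."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\n") for line in handle]


def save_names_xml(path, names):
    """Write the names as <file> elements in a UTF-8 encoded XML document."""
    root = ET.Element("files")
    for name in names:
        ET.SubElement(root, "file").text = name
    # tostring with an explicit encoding returns bytes, hence binary mode.
    with open(path, "wb") as handle:
        handle.write(ET.tostring(root, encoding="utf-8"))
```

The point of the design is that encoding happens only at the file boundary, while everything inside the program stays unicode.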

I have been beating myself up with this for weeks, as I get it working
and then come across some other character that causes it all to stop
again.

Please help.


