Managing non-ASCII filenames in Python
pdenize
denize.paul at gmail.com
Sun Jul 19 22:57:50 EDT 2009
I created the following filename in Windows just as a test:
“Dönåld’s™ Néphêws” deg°.txt
The quotes are non-ASCII, and there are many non-English characters, a
long hyphen, etc.
Now in DOS I can do a directory listing, and it translates them all to
something close:
"Dönåld'sT Néphêws" deg°.txt
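What DOS displays is presumably Windows' own "best fit" conversion to the console code page, which Python's codecs do not expose. A rough approximation is sketched below in Python 3. It assumes cp437 as the console code page and uses a small hand-made fallback table; both are assumptions, and `approximate` and `FALLBACKS` are hypothetical names. It keeps every character the code page already has, maps curly quotes and dashes to ASCII, and otherwise falls back to the Unicode compatibility decomposition:

```python
import unicodedata

# Hypothetical fallback table; Windows' real best-fit tables are much larger.
FALLBACKS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}


def approximate(text, encoding="cp437"):
    """Return a version of *text* that can be encoded with *encoding*."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            pass
        else:
            out.append(ch)  # the code page has this character as-is
            continue
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # Compatibility decomposition, e.g. U+2122 TRADE MARK SIGN -> "TM";
        # combining accents are dropped and anything left over becomes "?".
        decomposed = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        out.append(decomposed.encode(encoding, "replace").decode(encoding))
    return "".join(out)
```

With the test filename this gives `"Dönåld'sTM Néphêws" deg°.txt`, which is close to the DOS listing (DOS shows `T` rather than `TM` for the trademark sign).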
I thought the correct way to do this in Python would be to scan the
directory:
import os
import sys
files = os.listdir(os.path.dirname(os.path.realpath(__file__)))
then print the filenames:
for filename in files:
    print filename
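In Python 2, os.listdir() returns unicode names when it is given a unicode path (for example `os.listdir(u'.')`), which avoids decoding by hand. The minimal Python 3 sketch below does the equivalent; the helper name `list_text_names` is hypothetical. `os.fsdecode()` turns a bytes path into `str` using the file system encoding, and a `str` path makes `os.listdir()` return names that are already decoded:

```python
import os


def list_text_names(directory):
    """Return the entries of *directory* as decoded text (str) names."""
    # os.fsdecode() leaves str unchanged and decodes bytes with the file
    # system encoding; given a str path, os.listdir() returns str names.
    return os.listdir(os.fsdecode(directory))
```

Calling `list_text_names(os.path.dirname(os.path.realpath(__file__)))` would list the script's own directory, as in the code above.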
But, as expected, the filename is not correct, so I tried to correct it
using the file system's encoding:
print filename.decode(sys.getfilesystemencoding())
But I get:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 6: character maps to <undefined>
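The traceback is raised by print, not by decode(): the decoded name has to be encoded again for the console, and the console code page (a 'charmap' codec such as cp437) has no slot for U+2014 (em dash). One way to keep printing from failing is to encode with an error handler. A minimal Python 3 sketch, where `console_safe` is a hypothetical helper and cp437 stands in for the console code page:

```python
import sys


def console_safe(text, encoding=None):
    """Return *text* with characters the console cannot show replaced by '?'."""
    # Fall back to ASCII if stdout reports no encoding (e.g. some redirections).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)
```

Using `errors="backslashreplace"` instead would show `\u2014` rather than `?`. Python 3.6 and later also write to the Windows console through its Unicode API (PEP 528), which avoids this particular error when output is not redirected.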
All was working well until these characters came along.
I need to be able to write (a representation of) it to the screen, and I
don't see why I should not get something as good as what DOS shows.
I also need to write it to an XML file in UTF-8, and to write it to a
text file and be able to read it back in.
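For the XML file, `xml.etree.ElementTree` can do the encoding itself when given `encoding="utf-8"`, and parsing decodes it again, so the names only ever exist as text inside the program. A minimal Python 3 sketch; the element and attribute names (`files`, `file`, `name`) and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET


def write_names_xml(names, path):
    """Store each name as an attribute of a <file> element, encoded as UTF-8."""
    root = ET.Element("files")
    for name in names:
        ET.SubElement(root, "file", name=name)
    # encoding="utf-8" makes ElementTree encode the output and write the
    # matching <?xml ... encoding='utf-8'?> declaration.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_names_xml(path):
    """Parse the file back; the parser decodes UTF-8 to str automatically."""
    return [el.get("name") for el in ET.parse(path).getroot().iter("file")]
```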
Again, I was surprised that this was also difficult; it appears that
the file also wanted ASCII. Should I have to open the file in binary
mode for writing (I expect so), and if so, what encoding should I write
in?
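Binary mode is not required: a file opened in text mode with an explicit encoding encodes on write and decodes on read. UTF-8 can represent every Unicode character, so a new character will not break it the way an 8-bit code page does. A minimal sketch with hypothetical helper names, storing one name per line; `io.open` is available from Python 2.6 (where the names must be unicode objects) and is the built-in `open()` in Python 3:

```python
import io


def save_names(names, path):
    """Write one name per line as UTF-8 (assumes no name contains a newline)."""
    with io.open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + u"\n")


def load_names(path):
    """Read back the names written by save_names(), decoding from UTF-8."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]
```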
I have been beating myself up over this for weeks: I get it working,
then come across some other character that causes it all to stop
again.
Please help.
More information about the Python-list mailing list