[Tutor] Not-quite-unicode string, how to convert to ascii?
jtk at yahoo.com
Mon Apr 19 13:34:14 EDT 2004
I have string input in some strange encoding. Some editors (Win32 TextPad)
pick it up as Unicode, but Linux gedit doesn't recognize the encoding and
won't load it as UTF-8. Python's encode('ascii') doesn't even alter the
string. I can see that it is double byte, but which sub-encoding, I have no idea.
Source reads ' Batch '
>>> f = open('input.txt','r')
>>> s = f.read()
>>> s
' \x00 \x00B\x00a\x00t\x00c\x00h\x00 \x00 \x00 \x00'
I was almost tempted to just iterate over the raw string, remove every
'\x00', and leave it at that. The input files are about 180 KB in size.
Can anyone suggest a way to convert the DBCS input to plain ASCII? Thanks.
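
For what it's worth, the repr above shows every character followed by a
'\x00' byte, which is what UTF-16 little-endian looks like for plain ASCII
text. That is only a guess from this one sample, not something the file
confirms. A minimal sketch under that assumption, written for Python 3
(utf16le_to_ascii is a made-up helper name, not a library function):

```python
def utf16le_to_ascii(raw: bytes) -> str:
    """Decode bytes ASSUMED to be UTF-16-LE and reduce them to ASCII."""
    # Drop a stray trailing byte so the decoder only sees whole 2-byte units.
    usable = raw[: len(raw) - len(raw) % 2]
    text = usable.decode("utf-16-le")
    # Some editors write a byte order mark at the start; strip it if present.
    text = text.lstrip("\ufeff")
    # Anything outside ASCII becomes '?' instead of raising an error.
    return text.encode("ascii", errors="replace").decode("ascii")


# The sample bytes from the interpreter session above.
sample = b" \x00 \x00B\x00a\x00t\x00c\x00h\x00 \x00 \x00 \x00"
print(repr(utf16le_to_ascii(sample)))  # prints '  Batch   '
```

For a real file, read it in binary mode, e.g. open('input.txt', 'rb'), and
pass the bytes in. Simply deleting every '\x00' gives the same result only
while the text is pure ASCII; decoding first also copes with characters
whose second byte is not zero.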