[Tutor] Re: How to read unicode strings from a binary file and display them as plain ascii?

Javier Ruere javier at ruere.com.ar
Thu Mar 3 05:42:33 CET 2005


R. Alan Monroe wrote:
>>R. Alan Monroe wrote:
>>
>>>I started writing a program to parse the headers of truetype fonts to
>>>examine their family info. But I can't manage to print out the strings
>>>without the zero bytes in between each character (they display as a
>>>black block labeled 'NUL' in Scite's output pane)
>>>
>>>I tried:
>>>     stuff = f.read(nlength)
>>>     stuff = unicode(stuff, 'utf-8')
> 
> 
>>   If there are embedded 0's in the string, it won't be utf8, it could be 
>>utf16 or 32.
>>   Try:
>>        unicode(stuff, 'utf-16')
>>or
>>        stuff.decode('utf-16')
> 
> 
>>>     print type(stuff), 'stuff', stuff.encode()
>>>This prints:
>>>
>>>     <type 'unicode'> stuff [NUL]C[NUL]o[NUL]p[NUL]y[NUL]r[NUL]i[NUL]g[NUL]
> 
> 
>>   I don't understand what you tried to accomplish here.
> 
> 
> That's evidence of what I failed to accomplish. My expected result
> was to print the word "Copyright" and whatever other strings are
> present in the font, with no intervening NUL characters.

   Oh but why print type(stuff) or 'stuff'?

> Aha, after some trial and error I see that I'm running into an endian
> problem. It's "\x00C" in the file, which needs to be swapped to
> "C\x00". I cheated temporarily by just adding 1 to the file pointer
> :^)

   Ah! Endianness! I completely overlooked this issue! I have lost several 
hours of my life to endian problems.
   Glad to see (in another post) that there is an encoding which explicitly 
handles the endianness of the encoded string.
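
   A minimal sketch of that idea, assuming the strings in the font are 
big-endian UTF-16 (which matches the "\x00C" you found) and using the 
'utf-16-be' codec. The sample bytes are made up to match your output, and 
the snippet is written in current Python syntax; with the Python of this 
thread the equivalent call would be stuff.decode('utf-16-be').

```python
# Hypothetical sample: the bytes as they appear in the file, high byte first.
raw = b"\x00C\x00o\x00p\x00y\x00r\x00i\x00g\x00h\x00t"

# Naming the byte order explicitly avoids relying on a BOM or on the
# machine's native byte order.
text = raw.decode("utf-16-be")
print(text)

# Decoding the same bytes with the opposite byte order succeeds but
# yields different (wrong) characters.
wrong = raw.decode("utf-16-le")
print(wrong == text)

# For plain ASCII display, encode explicitly; "replace" substitutes "?"
# for any character that has no ASCII equivalent.
ascii_text = text.encode("ascii", "replace")
print(ascii_text)
```

   With this there is no need to shift the file pointer by one byte; the 
codec does the byte swapping itself.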

Javier


