[Tutor] Re: How to read unicode strings from a binary file and display them as plain ascii?

Thu Mar 3 05:42:33 CET 2005

R. Alan Monroe wrote:
>>R. Alan Monroe wrote:
>>
>>>I started writing a program to parse the headers of truetype fonts to
>>>examine their family info. But I can't manage to print out the strings
>>>without the zero bytes in between each character (they display as a
>>>black block labeled 'NUL' in Scite's output pane)
>>>
>>>I tried:
>>>     stuff = f.read(nlength)
>>>     stuff = unicode(stuff, 'utf-8')
> 
> 
>>   If there are embedded 0's in the string, it won't be utf-8; it could be
>>utf-16 or utf-32.
>>   Try:
>>        unicode(stuff, 'utf-16')
>>or
>>        stuff.decode('utf-16')
> 
> 
>>>     print type(stuff), 'stuff', stuff.encode()
>>>This prints:
>>>
>>>     <type 'unicode'> stuff [NUL]C[NUL]o[NUL]p[NUL]y[NUL]r[NUL]i[NUL]g[NUL]
> 
> 
>>   I don't understand what you tried to accomplish here.
> 
> 
> That's evidence of what I failed to accomplish. My expected result
> was to print the word "Copyright" and whatever other strings are
> present in the font, with no intervening NUL characters.

   Oh, but why print type(stuff) or 'stuff'?

> Aha, after some trial and error I see that I'm running into an endian
> problem. It's "\x00C" in the file, which needs to be swapped to
> "C\x00". I cheated temporarily by just adding 1 to the file pointer
> :^)

   Ah! Endianness! I completely overlooked this issue! I have lost several
hours of my life to endian problems.
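
   To make the byte-order point concrete, here is a small sketch of how the
same two bytes read under each byte order. The sample bytes are made up to
match the "\x00C" layout you describe, and it is written for a current
Python rather than the interpreter we are using today, so treat it as an
illustration only.

```python
# Hypothetical sample: the first two characters as they sit in the file.
raw = b"\x00C\x00o"

# The first 16-bit unit, interpreted both ways.
pair = raw[0:2]
as_big = int.from_bytes(pair, "big")        # 0x0043, the letter "C"
as_little = int.from_bytes(pair, "little")  # 0x4300, an unrelated character

# Moving the file pointer by one byte realigns the pairs to "C\x00",
# which a little-endian reader then sees as 0x0043.
shifted = raw[1:3]
as_shifted = int.from_bytes(shifted, "little")

print(hex(as_big), hex(as_little), hex(as_shifted))
```

   That is why adding 1 to the file pointer appears to work. It also leaves
one stray byte at the end of the string, so it is only a stopgap.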
   Glad to see (in another post) that there is an encoding which explicitly
handles the endianness of the encoded string.
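
   For the record, a minimal sketch of that approach, assuming the strings
are big-endian UTF-16 as your "\x00C" sample suggests. The helper name
name_bytes_to_ascii is made up for illustration, and the code is written for
a current Python where the file data is a bytes object.

```python
def name_bytes_to_ascii(raw):
    """Decode big-endian UTF-16 bytes and return a plain ASCII string.

    Assumes the data is big-endian UTF-16 without a byte order mark.
    Characters outside ASCII are replaced with '?'.
    """
    text = raw.decode("utf-16-be")
    return text.encode("ascii", "replace").decode("ascii")


# Stand-in for the bytes returned by f.read(nlength).
sample = "Copyright".encode("utf-16-be")
print(name_bytes_to_ascii(sample))
```

   Naming the byte order in the codec means no pointer arithmetic is needed,
and the last character of each string is not lost.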

Javier