[Tutor] Struct and UTF-16

Liam Clarke ml.cyresse at gmail.com
Mon Oct 3 13:03:07 CEST 2005

Hi Kent,

> >         return self.location
> >     def sLoc(self, value):
> >         #Need to coerce data into UTF-16 here
> >         self.mhod.payload = value.encode("UTF-16")
> I'm confused about what sLoc is supposed to do. Shouldn't it be setting self.location? ISTM sLoc should parallel what __init__ does. What is value here? If it is an Mhod then you should just do
>   self.location = unicode(mhod.payload, "UTF-16")
> again.

Sorry, those are some bad typos. Value should be val, but I've changed that anyway.
Yeah, sLoc should set self.location, but I also want it to feed the
"\xff\xfe:\x00i\x00P\x00o\x00d\x00_\x00C" string into the Mhod object
to be written out into a binary file later.

if not isinstance(value, unicode):
    value = unicode(value)

seems to do the job nicely so far, in my limited test runs.

Basically, I want the Mhod object to be naive about the encoding of
the data it's carrying, so I want all encoding/decoding issues handled
at the Song object level.
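
A minimal sketch of that split, written in Python 3 syntax (where the unicode type discussed here is called str and byte strings are bytes). The Mhod and Song classes below are stripped-down, hypothetical stand-ins rather than the real classes from this thread, and the choice of "utf-16-le" is an assumption based on the payload quoted further down:

```python
class Mhod:
    """Hypothetical stand-in: carries raw bytes, knows nothing about encodings."""

    def __init__(self, payload=b""):
        self.payload = payload


class Song:
    """Hypothetical stand-in: owns all encoding and decoding of the location."""

    ENCODING = "utf-16-le"  # assumption, based on the payload shown in this thread

    def __init__(self, mhod):
        self.mhod = mhod

    @property
    def location(self):
        # decode: bytes -> text
        return self.mhod.payload.decode(self.ENCODING)

    @location.setter
    def location(self, value):
        if isinstance(value, bytes):
            # Assumes any byte string handed in is plain ASCII.
            value = value.decode("ascii")
        # encode: text -> bytes, ready to be written to the binary file
        self.mhod.payload = value.encode(self.ENCODING)


song = Song(Mhod())
song.location = ":iPod_Control"
print(repr(song.mhod.payload))
print(song.location)
```

With this layout the Mhod only ever sees bytes, and the one place that needs to know about the encoding is the Song property.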

>   self.payload = unicode(value, 'ascii').encode('UTF-16')
> (though see my previous reply about utf-16 vs utf-16be and utf-16le)

> > >>> x.payload
> >
> > ':\x00i\x00P\x00o\x00d\x00_\x00C\x00o\x00n\x00t\x00r\x00o\x00l
> > \x00:\x00M\x00u\x00s\x00i\x00c\x00:\x00F\x004\x004\x00:\x00L
> > \x00W\x00B\x00R\x00.\x00m\x00p\x003\x00' #Line breaks added.
> This is utf-16le

Hmm... interesting. As I may have mentioned, the knowledge of the file
structure I'm working with is incomplete. Thanks for that.
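
A quick sketch of the difference between the three codec names, in Python 3 syntax. The sample text is just the first few characters of the payload above; nothing here is specific to the iPod file format:

```python
import codecs

text = ":iPod"

with_bom = text.encode("utf-16")    # byte order mark first, then the platform's byte order
little = text.encode("utf-16-le")   # no BOM, low byte first
big = text.encode("utf-16-be")      # no BOM, high byte first

print(repr(with_bom))
print(repr(little))   # b':\x00i\x00P\x00o\x00d\x00'
print(repr(big))      # b'\x00:\x00i\x00P\x00o\x00d'

# Plain "utf-16" always writes a BOM when encoding ...
has_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
print(has_bom)

# ... and strips a leading BOM again when decoding.
round_trip = with_bom.decode("utf-16")
print(round_trip == text)
```

The payload quoted above has no BOM and puts the low byte first, which is why it matches "utf-16-le" rather than plain "utf-16".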

> > Which, as you mention above, is likely to cause me errors. And
> > apparently "Hi Bob!" is an 8 bit string encoded in UTF-16...
> No, what gives you that idea? It is an 8-bit string encoded in ASCII.

Oops, sorry, it's all becoming clear now.

> You are still confused about when to use encode vs decode.
> encode goes *away* from unicode
> decode goes *towards* unicode

I can't believe how dense I am. I think I'll blow that up and stick it
on my wall, for any further Unicode work. (I'd like to blame the fact
that I'm giving up smoking, but it was just me not seeing the wood for
the trees.)
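
The rule above, sketched in Python 3 syntax, where unicode text is str and 8-bit strings are bytes. "Hi Bob!" is the example string from earlier in the thread; the use of "utf-16-le" as the target is an assumption for illustration:

```python
raw = b"Hi Bob!"                    # 8-bit string holding ASCII bytes

text = raw.decode("ascii")          # decode: bytes -> unicode text
utf16 = text.encode("utf-16-le")    # encode: unicode text -> bytes

print(repr(text))     # 'Hi Bob!'
print(repr(utf16))    # b'H\x00i\x00 \x00B\x00o\x00b\x00!\x00'

# Going back towards unicode is a decode again.
print(utf16.decode("utf-16-le") == text)   # True
```

In Python 3 the two directions cannot be mixed up as easily, since str only has encode() and bytes only has decode().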

> The references are good too, particularly
> Roman Czyborra wrote another explanation of Unicode's basic principles; it's at <http://czyborra.com/unicode/characters.html>. Czyborra has written a number of other Unicode-related documents, available from <http://www.czyborra.com>.
> Two other good introductory articles were written by Joel Spolsky <http://www.joelonsoftware.com/articles/Unicode.html> and Jason Orendorff <http://www.jorendorff.com/articles/unicode/>. If this introduction didn't make things clear to you, you should try reading one of these alternate articles before continuing.
> And my own essay has more references at the end:
> http://personalpages.tds.net/~kent37/blog/stories/14.html
> Keep trying, eventually the mists will clear...this is confusing stuff.

Thanks, I've got some light reading for work tomorrow. *grin*

As for the confusion, many thanks for your patience and effort; it's
making a lot more sense now.

I can see potential problems with converting unexpected encodings to
unicode, but according to what I'm reading, that seems to be an issue
that's not easily resolved. So hey, I can rest easy that I'm not
alone, and while I wait for the 0.01% of cases where it throws an
error, I can work out a way to fix it, patent it and try to make
millions by suing anyone who incorporates it into an open-source
project, just like SCO.
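
One possible way to cope with that rare failing case, sketched in Python 3 syntax. The decode_payload helper is hypothetical, the default of "utf-16-le" is an assumption, and falling back to replacement characters is just one option, not a real fix for guessing an unknown encoding:

```python
def decode_payload(raw, encoding="utf-16-le"):
    """Hypothetical helper: return (text, lossy).

    Tries a strict decode first. If that fails, decodes again with
    errors="replace" so bad bytes become U+FFFD, and flags the result
    as lossy so the caller can decide what to do with it.
    """
    try:
        return raw.decode(encoding), False
    except UnicodeDecodeError:
        return raw.decode(encoding, errors="replace"), True


print(decode_payload(b"a\x00b\x00"))   # well-formed UTF-16-LE
print(decode_payload(b"a\x00b"))       # truncated: odd number of bytes
```

Returning the lossy flag keeps the error visible instead of silently writing damaged text back out to the file.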

Thanks once again,

Liam Clarke
