Python parsing iTunes XML/COM

william tanksley wtanksleyjr at gmail.com
Wed Jul 30 20:27:28 CEST 2008


"Jerry Hill" <malaclyp... at gmail.com> wrote:
> william tanksley <wtanksle... at gmail.com> wrote:
> > Here's one example. The others are similar -- they have the same
> > things that look like problems to me.
> > "Buffett Time - Annual Shareholders\xc2\xa0L.mp3"

> > I tried doing track_id.encode("utf-8"), but it doesn't seem to make
> > any difference at all.

> I don't have anything to say about your iTunes problems, but encode()
> is the wrong method to turn a byte string into a unicode string.
> Instead, use decode(), like this:

Awesome... Thank you! I had my mental model of Python turned around
backwards. That's an odd feeling. Okay, so you decode to go from raw
bytes in a given encoding to a unicode string, and you encode to go
from a unicode string to raw bytes in a given encoding. Not what I
thought it was, but that's cool; it makes sense.
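For reference, here is a minimal sketch of the two directions. It is
written for Python 3, where byte strings are bytes and text is str (in
the Python 2 used in this thread they were str and unicode). The byte
string is the one from the debug print quoted above.

```python
# Python 3: bytes.decode() goes bytes -> text, str.encode() goes text -> bytes.
raw = b"Buffett Time - Annual Shareholders\xc2\xa0L.mp3"  # UTF-8 bytes from the thread

text = raw.decode("utf-8")    # the two bytes C2 A0 become one character, U+00A0
back = text.encode("utf-8")   # encoding turns that character back into C2 A0

print(len(raw), len(text))    # the text is one character shorter than the bytes
print(repr(text))
print(back == raw)
```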

At first I thought this fixed my problem, but I had to tweak the
obvious fix to make it work, and I don't understand why.

Fix #1:

track_id = track_id.decode('utf-8')
track_id = url2pathname(urlparse(track_id).path)

That doesn't work -- it produces no error, but the raw bytes show up
as separate characters in the unicode string.

Fix #2:

track_id = url2pathname(urlparse(track_id).path)
track_id = track_id.decode('utf-8')

This one appears to work, although I can't confirm it for sure: all
my debug prints are now correct, but the overall application still
fails the same way it did before I added them. I'm going to spend
some time assuming the problem is elsewhere in my code, since at
least I definitely fixed one serious problem.
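A minimal Python 3 sketch of why the order might matter, under one
assumption: that the Location value iTunes writes is a percent-encoded
file:// URL, so the non-breaking space arrives as the ASCII text %C2%A0.
The URL below is made up. Undoing the %XX escapes first yields UTF-8
bytes that can then be decoded (the Fix #2 order). If the string is
already text when the escapes are undone, each %XX becomes a separate
character; passing encoding="latin-1" to urllib.parse.unquote roughly
imitates what Python 2's unquote did to a unicode string (the Fix #1
order). url2pathname is left out because its behaviour differs by
platform.

```python
from urllib.parse import unquote, unquote_to_bytes, urlparse

# Hypothetical iTunes-style location: assumed to be percent-encoded, hence pure ASCII.
location = "file://localhost/Music/Buffett%20Time%20-%20Annual%20Shareholders%C2%A0L.mp3"
path = urlparse(location).path  # "/Music/Buffett%20Time..." (netloc "localhost" is dropped)

# Fix #2 order: undo the %XX escapes to get raw bytes, then decode those bytes as UTF-8.
good = unquote_to_bytes(path).decode("utf-8")

# Fix #1 order: the value is already text, so each %XX escape becomes one character
# on its own; latin-1 maps byte 0xC2 to U+00C2 and 0xA0 to U+00A0, as Python 2 did.
bad = unquote(path, encoding="latin-1")

print(repr(good))  # one U+00A0 character before "L.mp3"
print(repr(bad))   # U+00C2 followed by U+00A0: the "raw bytes" seen in the debug print
```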

I've got a few questions for Python-XML-Unicode experts...

1. Why does the order of those statements matter?
2. Shouldn't it be more correct to decode BEFORE transforming the
string? Why does that kill the decoding?
3. Why is ElementTree dumping raw bytes on me instead of decoding them
from UTF-8? The XML file has its encoding set to
<?xml version="1.0" encoding="UTF-8"?>, so it seems like it should
know what codec to use.
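On the third question, a small Python 3 sketch under the same
assumption as before. ElementTree does decode the document according
to its XML declaration, so text that is literally non-ASCII in the
file comes back as a proper string. Percent escapes inside a URL,
though, are a separate layer of encoding that the XML parser leaves
alone. The plist-style fragment below is invented for illustration.
(In Python 2, ElementTree may also return pure-ASCII text as a byte
str rather than unicode.)

```python
import xml.etree.ElementTree as ET

# Hypothetical plist-style fragment. The Location value is assumed to be a
# percent-encoded URL (pure ASCII); the Name value contains the real UTF-8
# bytes C2 A0 for a non-breaking space.
doc = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b"<dict>"
       b"<key>Name</key><string>Shareholders\xc2\xa0L</string>"
       b"<key>Location</key><string>file://localhost/Shareholders%C2%A0L.mp3</string>"
       b"</dict>")

root = ET.fromstring(doc)  # the parser decodes the bytes as UTF-8
name, location = (e.text for e in root.iter("string"))

print(repr(name))      # a real U+00A0 character: the XML layer was decoded
print(repr(location))  # still contains %C2%A0: URL escapes are not the XML parser's job
```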

> Jerry

-Wm


