Python parsing iTunes XML/COM

John Machin sjmachin at lexicon.net
Thu Jul 31 22:41:38 CEST 2008


On Jul 31, 11:54 pm, william tanksley <wtanksle... at gmail.com> wrote:
> John Machin <sjmac... at lexicon.net> wrote:
> > william tanksley <wtanksle... at gmail.com> wrote:
> > > "Buffett Time - Annual Shareholders\xc2\xa0L.mp3"
> > > 1. This isn't Unicode; it's missing the u"" (I printed using repr).
> > > 2. It's got the UTF-8 bytes there in the middle.
> > > In addition to the above results,
> > *WHAT* results? I don't see any repr() output, just your
> > interpretation of what you think you saw!
>
> That *is* the repr. I said it's the repr, and it IS. It's not an
> interpretation; it's a screenscrape. Really, truly. If I paste it in
> again it'll look the same.
>
> What do you want? Can I post something that will convince you it's a
> repr?
>

Let's try again:

>> # track_id = url2pathname(urlparse(track_id).path)
>> print repr(track_id)
>> parse_result = urlparse(track_id).path
>> print repr(parse_result)
>> track_id_replacement = url2pathname(parse_result)
>> print repr(track_id_replacement)

> The "important" value here is track_id_replacement; it contains the
> data that's throwing me. It appears that some UTF-8 characters are
> being read as multiple bytes by ElementTree rather than being decoded
> into Unicode.

> Here's one example. The others are similar -- they have the same
> things that look like problems to me.

> "Buffett Time - Annual Shareholders\xc2\xa0L.mp3"

ROTFL! I thought the Buffett thing was a Windows filename! What I was
expecting was THREE lots of repr() output, and I'm quite unused to
seeing repr() output with quotes around it instead of apostrophes; how
did you achieve that?
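
For reference, repr() of a string normally wraps it in apostrophes and only switches to double quotes when the string itself contains an apostrophe and no double quote. A quick sketch of that rule, in Python 3 syntax and with made-up sample strings:

```python
# Sample strings are illustrative only; they are not taken from the poster's data.
plain = "Buffett Time"
with_apostrophe = "Buffett's Time"

plain_repr = repr(plain)                      # no apostrophe inside: apostrophes outside
with_apostrophe_repr = repr(with_apostrophe)  # apostrophe inside: double quotes outside

print(plain_repr)
print(with_apostrophe_repr)
```

A filename with no apostrophe in it would therefore normally come back from repr() in apostrophes.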

So you're saying that track_id_replacement contains UTF-8-encoded bytes.
It is obtained by track_id_replacement = url2pathname(parse_result).
You don't show us what is in parse_result. url2pathname() has nothing
to do with ElementTree. urlparse() has nothing to do with ElementTree.
You have provided no evidence that ElementTree is doing what you
accuse it of.

Please try again. Backtrack in your code to where you are pulling the
url out of an element. Do print repr(some_element.some_attribute).
Show us.
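
A minimal sketch of that kind of stage-by-stage check, assuming Python 3 (the thread itself is Python 2, where print is a statement and url2pathname() works on byte strings). The one-element XML document and the file URL are invented stand-ins for an iTunes library entry, and unquote_to_bytes() is used here in place of url2pathname() so that the raw bytes stay visible:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, unquote_to_bytes

# Hypothetical stand-in for one location entry; not the poster's real data.
XML = (
    "<string>file://localhost/Music/"
    "Buffett%20Time%20-%20Annual%20Shareholders%C2%A0L.mp3</string>"
)

# Stage 1: what the XML parser hands back for the element text.
element = ET.fromstring(XML)
raw_url = element.text
print(repr(raw_url))

# Stage 2: the path component; still percent-encoded at this point.
parse_result = urlparse(raw_url).path
print(repr(parse_result))

# Stage 3: percent-decoding turns %C2%A0 into the two bytes \xc2\xa0.
raw_bytes = unquote_to_bytes(parse_result)
print(repr(raw_bytes))

# Stage 4: decoding those bytes as UTF-8 gives one character, U+00A0.
decoded = raw_bytes.decode("utf-8")
print(repr(decoded))
```

In this made-up case the \xc2\xa0 bytes first appear at the percent-decoding stage, not in what ElementTree returns. Whether the same holds for the original code can only be settled by printing the repr() at each stage there.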


