Python parsing iTunes XML/COM

william tanksley wtanksleyjr at
Wed Jul 30 16:58:36 CEST 2008

Thank you for the response. Here's some more info, including a little
that you didn't ask me for but which might be useful.

John Machin <sjmac... at> wrote:
> william tanksley <wtanksle... at> wrote:
> > To ask another way: how do I convert from a file:// URL to a local
> > path in a standard way, so that filepaths from two different sources
> > will work the same way in a dictionary?
> > The problems occur when the filenames have non-ascii characters in
> > them -- I suspect that the URLs are having some encoding placed on
> > them that Python's decoder doesn't know about.

> # track_id = url2pathname(urlparse(track_id).path)
> print repr(track_id)
> parse_result = urlparse(track_id).path
> print repr(parse_result)
> track_id_replacement = url2pathname(parse_result)
> print repr(track_id_replacement)

The "important" value here is track_id_replacement; it contains the
data that's throwing me. It appears that some UTF-8 characters are
being read as multiple bytes by ElementTree rather than being decoded
into Unicode. Could this be a bug in ElementTree's Unicode support? If
so, can I work around it?

Here's one example. The others are similar -- they have the same
things that look like problems to me.

"Buffett Time - Annual Shareholders\xc2\xa0L.mp3"

Note some problems here:

1. This isn't a Unicode string; the repr shows no u prefix.
2. It's got the UTF-8 bytes there in the middle.

I tried doing track_id.encode("utf-8"), but it doesn't seem to make
any difference at all.
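
For reference, the \xc2\xa0 pair in the example above is how U+00A0 (no-break space) is written in UTF-8, so the conversion that seems to be needed is decode rather than encode. A minimal sketch, assuming the value really is a UTF-8 byte string as its repr suggests (the b prefix needs Python 2.6 or later):

```python
# Assumption: the value is a UTF-8 encoded byte string, as the repr suggests.
raw = b"Buffett Time - Annual Shareholders\xc2\xa0L.mp3"

# decode() turns bytes into a Unicode string; encode() goes the other way.
name = raw.decode("utf-8")

print(repr(name))
```

After decoding, the two bytes collapse into a single character, so the Unicode string is one unit shorter than the byte string.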

Of course, my ultimate goal is to compare the track_id to the track_id
I get from iTunes' COM interface, including hashing to the same value
for dict lookups.
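
One possible shape for that, as a sketch only: reduce every location to a Unicode string before it is used as a dict key. The helper name location_key and the URL below are made up for illustration; only the file name comes from the example above. The sketch is written against Python 3's urllib.parse, where unquote() decodes percent-escapes as UTF-8 by default; on 2.x the unquoted result would still need an explicit .decode("utf-8").

```python
from urllib.parse import urlparse, unquote


def location_key(file_url):
    """Hypothetical helper: reduce a file:// URL to a Unicode path string."""
    path = urlparse(file_url).path
    # unquote() in Python 3 decodes %XX escapes as UTF-8 by default.
    return unquote(path)


# Made-up URL for illustration; only the file name is from the example above.
url = ("file://localhost/C:/Music/"
       "Buffett%20Time%20-%20Annual%20Shareholders%C2%A0L.mp3")
podcasts = {location_key(url): "track from the XML file"}

# A path from another source, already Unicode, finds the same entry.
com_path = u"/C:/Music/Buffett Time - Annual Shareholders\xa0L.mp3"
print(com_path in podcasts)
```

This deliberately skips url2pathname so that it behaves the same on every platform; converting to a native Windows path (drive letter, backslashes) would be a further step applied identically to both sources.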

> and copy/paste the results into your next posting.

In addition to the above results, while trying to get more diagnostic
printouts I got the following warning from Python:

C:\projects\podcasts\podstrand\ UnicodeWarning: Unicode
equal comparison failed to convert both arguments to Unicode -
interpreting them as being unequal
  return track.databaseID == trackLocation

The code that triggered this is as follows:

if trackLocation in self.podcasts:
    track = self.podcasts[trackLocation]
    if trackRelease:
        track.release_date = trackRelease
    elif track.is_podcast:
        print "No release date:", repr(
    # For the sake of diagnostics, try to find the track.
    def track_has_location(track):
        return track.databaseID == trackLocation
    fillers = filter(track_has_location, self.fillers)
    if len(fillers):
    disabled = filter(track_has_location, self.deferred)
    if len(disabled):
    print "Location not known:", repr(trackLocation)
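
The warning looks like what Python 2 emits when a byte string containing non-ASCII bytes is compared with a Unicode string. A sketch of one way around it, assuming the byte strings are UTF-8; to_text and same_location are hypothetical names, not part of the code above:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def same_location(a, b):
    """Compare two locations after coercing both to Unicode."""
    return to_text(a) == to_text(b)


from_xml = b"Buffett Time - Annual Shareholders\xc2\xa0L.mp3"
from_com = u"Buffett Time - Annual Shareholders\xa0L.mp3"
print(same_location(from_xml, from_com))
```

Coercing at the point where the values enter the program, rather than at each comparison, would also keep the dict keys consistent.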

