right curly quote and unicode
leo.kislov at gmail.com
Wed Oct 18 23:44:49 CEST 2006
On 10/17/06, TiNo <tinodb at gmail.com> wrote:
> Hi all,
> I am trying to compare my Itunes Library xml to the actual files on my
> As the xml file is in UTF-8 encoding, I decided to do the comparison of the
> filenames in that encoding.
> It all works, except with one file. It is named 'The Chemical
> Brothers-Elektrobank-04 - Don't Stop the Rock (Electronic Battle Weapon
> Version).mp3'. It goes wrong with the apostrophe in Don't. That is actually
> not an apostrophe, but ASCII char 180: ´
It's actually Unicode char #180, not ASCII. ASCII characters are in
the 0..127 range.
> In the Itunes library it is encoded as: Don%E2%80%99t
Looks like a UTF-8 encoded string that was then URL-encoded (percent-encoded).
> I do some conversions with both the library path names and the folder
> path names. Here is the code:
> (in the comment I display how the Don't part looks. I got this using print
> #Once I have the filenames from the library I clean them using the following
> code (as filenames are in the format '
> filename = urlparse.urlparse(filename)[1:] # u'Don%E2%80%99t' ; side
> question, anybody who knows a way to do this in a more fashionable way?
> filename = urllib.unquote (filename) # u'Don\xe2\x80\x99t'
This doesn't work for me in Python 2.4: unquote expects a str, not
unicode. So it should be:
filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')
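A minimal sketch of that decoding step, written so it should run on both Python 2.6+ and Python 3 (the import fallback and the helper name decode_library_name are illustrative assumptions, not part of the original code). It percent-decodes the name to bytes, decodes those bytes as UTF-8, and then asks unicodedata which character the "apostrophe" really is:

```python
import unicodedata

try:
    from urllib import unquote as unquote_to_bytes   # Python 2: returns a byte str
except ImportError:
    from urllib.parse import unquote_to_bytes        # Python 3: returns bytes


def decode_library_name(quoted):
    """Percent-decode `quoted`, then decode the resulting bytes as UTF-8.

    Hypothetical helper: assumes the percent-encoded text is pure ASCII.
    """
    if not isinstance(quoted, bytes):
        quoted = quoted.encode('ascii')
    return unquote_to_bytes(quoted).decode('utf-8')


name = decode_library_name(u'Don%E2%80%99t')
# The fourth character is the one that looked like an apostrophe.
print(unicodedata.name(name[3]))
```

If the name comes out as RIGHT SINGLE QUOTATION MARK (U+2019), the character in the library is neither the ASCII apostrophe nor char 180.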
> filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
> I get the files in my music folder with the os.walk method and then
> I do:
> filename = os.path.normpath(os.path.join (root,name)) # 'Don\x92t'
> filename = unicode(filename,'latin1') # u'Don\x92t'
> filename = filename.encode('utf-8') # 'Don\xc2\x92t'
> filename = unicode(filename,'latin1') # u'Don\xc2\x92t'
This looks like calling random methods with random parameters :)
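To see where that chain goes wrong, here is a small sketch. It assumes the byte 0x92 in 'Don\x92t' comes from the Windows cp1252 code page, which is a guess based on the quoted output and not something the original post states:

```python
# Bytes as shown in the quoted os.walk comment.
raw = b'Don\x92t'

# latin1 maps byte 0x92 to U+0092, a control character.
as_latin1 = raw.decode('latin1')

# cp1252 (assumed here) maps byte 0x92 to U+2019, the right curly quote.
as_cp1252 = raw.decode('cp1252')

# Re-encoding as UTF-8 shows the difference:
# the latin1 route yields the bytes C2 92 seen in the quoted comments,
# the cp1252 route yields E2 80 99, the same bytes as in Don%E2%80%99t.
print(repr(as_latin1.encode('utf-8')))
print(repr(as_cp1252.encode('utf-8')))
```

Under that assumption, decoding with the wrong codec is what makes the two names differ, and the later encode/decode steps cannot repair it.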
Python can return unicode file names right away; you just need to
pass the input parameters as unicode strings:
[u'alarm', u'ARCSOFT' ...]
So in your case you need to make sure the start directory parameter
for the os.walk function is a unicode string.
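A rough sketch of that idea follows. The helper name walk_as_text and the fallback that decodes a byte path with the filesystem encoding are assumptions for illustration only. On Python 3, os.walk already returns text for a text start directory, so the check mainly matters on Python 2:

```python
import os
import sys

text_type = type(u'')  # unicode on Python 2, str on Python 3


def walk_as_text(start):
    """Return normalized full file paths under `start` as text strings."""
    if not isinstance(start, text_type):
        # Assumption: a byte path is encoded in the filesystem encoding.
        start = start.decode(sys.getfilesystemencoding())
    paths = []
    for root, dirs, names in os.walk(start):
        for name in names:
            paths.append(os.path.normpath(os.path.join(root, name)))
    return paths
```

The resulting paths can then be compared directly with the decoded library names, without any latin1 or utf-8 round trips.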