right curly quote and unicode

TiNo tinodb at gmail.com
Thu Oct 19 15:56:49 EDT 2006


> > That is actually not an apostrophe, but ASCII char 180: ´
>
> It's actually Unicode char #180, not ASCII. ASCII characters are in
> 0..127 range.


Yep, that's what I meant... :D


> > In the iTunes library it is encoded as: Don%E2%80%99t
>
> Looks like a UTF-8 encoded string, then encoded like a URL.


It is. I just found out it is Unicode character U+2019. So in the iTunes
library it is not Unicode char 180 (U+00B4), but it looks exactly the same...
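
A minimal sketch of that check, assuming Python 3 (where urllib.unquote
lives in urllib.parse and decodes the percent-escaped bytes as UTF-8 by
default). The variable names are only for illustration:

```python
from urllib.parse import unquote
import unicodedata

quoted = "Don%E2%80%99t"      # as stored in the iTunes library
decoded = unquote(quoted)     # percent-decode; bytes are read as UTF-8 by default

library_char = decoded[3]     # the apostrophe-like character from the library
accent_char = "\u00b4"        # Unicode char 180, the one discussed above

# Print code point and official Unicode name of both characters.
print(hex(ord(library_char)), unicodedata.name(library_char))
print(hex(ord(accent_char)), unicodedata.name(accent_char))
print(library_char == accent_char)
```

The two characters may look alike in some fonts, but they are different
code points, so a plain string comparison of the two names fails.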


> > I do some conversions with both the library path names and the folder
> > path names. Here is the code:
> > (in the comment I display how the Don't part looks. I got this using
> > print repr(filename))
> > -------------------------------------------------------------
> > #Once I have the filenames from the library I clean them using the
> > #following code (as filenames are in the format
> > #'file://localhost/m:/music/track%20name.mp3')
> >
> > filename = urlparse.urlparse(filename)[2][1:]  # u'Don%E2%80%99t'
> > # (side question: does anybody know a more fashionable way to do this?)
> > filename = urllib.unquote(filename)  # u'Don\xe2\x80\x99t'
>
> This doesn't work for me in python 2.4, unquote expects str type, not
> unicode. So it should be:
>
> filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')


It works for me with Python 2.4.3. It returns a unicode string.
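
For comparison, here is a rough sketch of the same cleanup, assuming
Python 3, where str is already unicode and the encode/decode dance is not
needed. The function name library_url_to_path is made up for this example;
urlparse and unquote are the standard urllib.parse functions:

```python
from urllib.parse import urlparse, unquote

def library_url_to_path(url):
    """Turn a file:// URL from the library into a plain path string."""
    path = urlparse(url).path   # '/m:/music/track%20name.mp3'
    path = unquote(path)        # percent-decode; bytes are read as UTF-8
    return path[1:]             # drop the slash in front of the drive letter

example = library_url_to_path("file://localhost/m:/music/track%20name.mp3")
print(example)
```

Using the .path attribute instead of the [2] index is perhaps the more
fashionable spelling asked about in the quoted code.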

> > filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
> >
> > I get the files in my music folder with the os.walk method and then
> > I do:
> >
> > filename = os.path.normpath(os.path.join (root,name))  # 'Don\x92t'
> > filename = unicode(filename,'latin1') # u'Don\x92t'
> > filename = filename.encode('utf-8') # 'Don\xc2\x92t'
> > filename = unicode(filename,'latin1') # u'Don\xc2\x92t'
>
> This looks like calling random methods with random parameters :)


It is... Well, not totally random. I figured I needed a unicode string to be
able to encode it to utf-8 (otherwise it gives an error). After that it
appears not to be a unicode string anymore (no u in front of it), so I
decided to convert it to unicode again....
It worked, but I now accomplish the same by just the encode line and the
following:

> Python is able to return you unicode file names right away, you just
> need to pass input parameters as unicode strings:
>
> >>> os.listdir(u"/")
> [u'alarm', u'ARCSOFT' ...]
>
> So in your case you need to make sure the start directory parameter
> for walk function is unicode.


That does not matter much for me. Then I will have to convert the path name
to unicode, as it is user input. (ok, it still saves me converting a string
to unicode a thousand times, so I'll do it :)
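
As a small illustration of that rule, assuming Python 3: there the same
idea shows up as "str in, str out; bytes in, bytes out". The temporary
directory and the file name track.mp3 are made up for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create one empty file to walk over.
    open(os.path.join(root, "track.mp3"), "w").close()

    # A str start directory yields str file names.
    text_names = [name for _, _, files in os.walk(root) for name in files]

    # A bytes start directory yields bytes file names.
    byte_names = [name for _, _, files in os.walk(os.fsencode(root)) for name in files]

print(text_names)
print(byte_names)
```

So converting the user-supplied start directory once is enough; every name
coming back from the walk then already has the wanted type.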

Now I know where the problem lies. The character in the actual file path is
U+00B4 (acute accent) and in the iTunes library it is U+2019 (a right curly
quote). Somehow iTunes manages to make these two the same...?
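
One thing that might be worth double-checking, as a sketch only: the repr
shown earlier for the os.walk name contains the byte \x92, not \xb4. If the
file names on disk are encoded in Windows-1252 (an assumption, nothing in
this thread confirms it), that byte decodes to U+2019, while latin1 turns
it into the control character U+0092. Written for Python 3:

```python
import unicodedata

raw = b"Don\x92t"                   # bytes as shown by repr() for the os.walk name

as_latin1 = raw.decode("latin-1")   # what the quoted code did
as_cp1252 = raw.decode("cp1252")    # assumption: names are in Windows-1252

print(hex(ord(as_latin1[3])))       # code point when read as latin1
print(hex(ord(as_cp1252[3])), unicodedata.name(as_cp1252[3]))

# For reference, an acute accent would be a different byte in both encodings.
print("\u00b4".encode("latin-1"), "\u00b4".encode("cp1252"))
```

If that assumption holds, decoding with cp1252 instead of latin1 would give
a name that already matches the library entry.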

As it is the only file that gave me trouble, I changed the accent in the
file name to an apostrophe and re-imported it in iTunes. But I would still
like to hear whether there is a solution for this problem.
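
One possible workaround, sketched here for Python 3 and not taken from
anything iTunes documents: map the apostrophe look-alikes to a plain
apostrophe on both sides before comparing. The table APOSTROPHE_LIKE and
the function comparison_key are hypothetical names, and which characters
belong in the table is a guess:

```python
# Code points that are treated as "the same as an apostrophe" for matching.
APOSTROPHE_LIKE = {
    0x2019: "'",   # right single quotation mark
    0x2018: "'",   # left single quotation mark
    0x00B4: "'",   # acute accent
    0x0060: "'",   # grave accent
}

def comparison_key(path):
    """Return a copy of path that is only meant for comparing, not for opening files."""
    return path.translate(APOSTROPHE_LIKE)

library_name = "Don\u2019t"   # as found in the library
disk_name = "Don\u00b4t"      # as found on disk

print(library_name == disk_name)
print(comparison_key(library_name) == comparison_key(disk_name))
```

The original names should be kept around for actually opening the files;
only the keys are compared.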