urllib unquote providing string mismatch between string found using os.walk (Python3)
Richard Damon
Richard at Damon-Family.org
Sat Dec 21 21:36:16 EST 2019
On 12/21/19 8:25 PM, MRAB wrote:
> On 2019-12-22 00:22, Michael Torrie wrote:
>> On 12/21/19 2:46 PM, Ben Hearn wrote:
>>> These 2 paths look identical, one from the drive & the other from an
>>> xml url:
>>> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
>>> ¡Móchate! _PromoMix_.wav'
>> ^^
>>> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate!
>>> _PromoMix_.wav'
>> ^^
>> They are actually different strings. The name is spelled
>> differently between the two. Móchate vs Móchate (the former seems to
>> be the correct spelling according to my inline spell checker). Is this
>> from your own program? I wonder how it got switched?
>>
> Use the 'ascii' function to see what's different:
>
> >>> ascii(a)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - \\xa1Mo\\u0301chate! _PromoMix_.wav'"
> >>> ascii(b)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - \\xa1M\\xf3chate! _PromoMix_.wav'"
> >>>
It is a Unicode normalization issue. A number of characters can be
'spelled' in more than one way.
For example, ó can be either the single code point U+00F3, or the pair
of code points o (U+006F) followed by U+0301 (the combining acute accent).
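To see the two spellings side by side, here is a small Python 3 sketch
using the standard unicodedata module. The variable names are only for
illustration, and the characters are written as escapes so the
difference is visible in the source.

```python
import unicodedata

# The two spellings of "ó" described above.
composed = "\u00f3"     # U+00F3 LATIN SMALL LETTER O WITH ACUTE
decomposed = "o\u0301"  # "o" followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 1 2
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+006F LATIN SMALL LETTER O
# U+0301 COMBINING ACUTE ACCENT
```

Both render as the same glyph, which is why the two paths look
identical on screen.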
If you want the strings to compare equal, you need to make sure that
both strings have been normalized the same way (both to NFC, or both to
NFD). I believe that macOS always converts file names to NFD when it
uses them; that is the form the first string (a) is in.
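A minimal sketch of the fix, assuming the goal is to match paths decoded
from the XML with urllib.parse.unquote against names returned by
os.walk. The helpers nfc and index_files and the percent-encoded
url_path are hypothetical, made up for illustration; only
unicodedata.normalize, urllib.parse.unquote and os.walk are standard
library calls. NFC is an arbitrary choice here: normalizing both sides
to NFD works just as well, as long as both sides use the same form.

```python
import os
import unicodedata
from urllib.parse import unquote


def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C (composed code points)."""
    return unicodedata.normalize("NFC", s)


def index_files(root: str) -> dict:
    """Map NFC-normalized path -> path exactly as os.walk reported it."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[nfc(full)] = full
    return index


# Hypothetical percent-encoded path, as it might appear in the XML.
url_path = ("/Users/macbookpro/Music/tracks_new/_NS_2018/"
            "J.Staaf%20-%20%C2%A1M%C3%B3chate!%20_PromoMix_.wav")

# a: decomposed form (o + U+0301), like the name found on disk.
a = ("/Users/macbookpro/Music/tracks_new/_NS_2018/"
     "J.Staaf - \u00a1Mo\u0301chate! _PromoMix_.wav")
# b: composed form (U+00F3), which unquote() produces from url_path.
b = unquote(url_path)

print(a == b)            # False: different code point sequences
print(nfc(a) == nfc(b))  # True: same text once both are normalized
```

With the index built once, a lookup such as
index_files(root).get(nfc(unquote(url_path))) would return the path
spelled exactly as it is on disk, or None if there is no match.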
--
Richard Damon