os.walk the apostrophe and unicode
MRAB
python at mrabarnett.plus.com
Sat Jun 24 15:25:53 EDT 2017
On 2017-06-24 19:57, Rod Person wrote:
> Hi,
>
> I'm working on a program that will walk a file system and clean the id3
> tags of mp3 and flac files. Everything is working great until the
> following file is found:
>
> '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>
> for some reason that I can't understand os.walk() returns this file
> name as
>
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>
> which then causes more hell than a little bit for me. I don't
> understand why the apostrophe (') becomes \xe2\x80\x99, or what I can do
> about it.
>
> The script is Python 3, the file system it is running on is a hammer
> filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS which
> runs some kind of Linux, so it is probably ext3/4. The files came from
> various systems (Mac, Windows, FreeBSD).
>
If you treat it as a bytestring b'\xe2\x80\x99' and decode it:
>>> c = b'\xe2\x80\x99'.decode('utf-8')
>>> ascii(c)
"'\\u2019'"
>>> import unicodedata
>>> unicodedata.name(c)
'RIGHT SINGLE QUOTATION MARK'
It's not an apostrophe, it's '\u2019' ('\N{RIGHT SINGLE QUOTATION MARK}').
It looks like the filename is encoded as UTF-8, but Python thinks that
the filesystem encoding is something like Latin-1.
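If that guess is right, the name can be repaired by reversing the wrong decode. The following is only a minimal sketch, assuming the name really was UTF-8 bytes decoded as Latin-1; repair_mojibake is a hypothetical helper name, not something from the standard library, and the filename is shortened for the example.

```python
import sys


def repair_mojibake(name, wrong="latin-1", right="utf-8"):
    """Undo a decode done with the wrong codec (hypothetical helper).

    Assumes the name was UTF-8 on disk but was decoded as Latin-1.
    If the round trip fails, the name is returned unchanged.
    """
    try:
        return name.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


# What Python believes the filesystem encoding is.
print(sys.getfilesystemencoding())

# Shortened version of the name os.walk() returned in the question.
bad = "06 - Todd\xe2\x80\x99s Song.flac"
fixed = repair_mojibake(bad)
print(ascii(fixed))
```

Names that are already correct fall through the except branch or round-trip to themselves, so running this over every name from os.walk() should be harmless. Fixing the locale so that sys.getfilesystemencoding() reports UTF-8 would probably be the cleaner long-term fix.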