os.walk the apostrophe and unicode
Peter Otten
__peter__ at web.de
Sat Jun 24 15:28:45 EDT 2017
Rod Person wrote:
> Hi,
>
> I'm working on a program that will walk a file system and clean the id3
> tags of mp3 and flac files, everything is working great until the
> following file is found
>
> '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>
> for some reason that I can't understand os.walk() returns this file
> name as
>
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>
> which then causes more hell than a little bit for me. I don't
> understand why the apostrophe (') becomes \xe2\x80\x99, or what I can do
> about it.
>>> import unicodedata
>>> b"\xe2\x80\x99".decode("utf-8")
'’'
>>> unicodedata.name(_)
'RIGHT SINGLE QUOTATION MARK'
So the name contains '’' (U+2019) rather than "'" (U+0027); \xe2\x80\x99 is
simply its UTF-8 encoding.
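To check a whole file name rather than a single character, something along
these lines might help. It is only a sketch: describe_non_ascii() is a name I
made up, and it assumes the raw bytes are valid UTF-8, which the output you
posted suggests but does not prove.

```python
import unicodedata


def describe_non_ascii(name):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in name
        if ord(ch) > 127
    ]


# The bytes from the original post, decoded under the UTF-8 assumption.
raw = b"06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac"
print(describe_non_ascii(raw.decode("utf-8")))
```

A name containing only the plain ASCII apostrophe gives an empty list.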
> The script is Python 3, the file system it is running on is a hammer
> filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS which
> runs some kind of Linux so it is probably ext3/4. The files came from
> various systems (Mac, Windows, FreeBSD).
There seems to be a mismatch between the assumed and the actual file system
encoding somewhere in this mix. Is this the only glitch or are there similar
problems with other non-ASCII characters?
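One way to answer that question yourself is to walk the tree in bytes mode, so
that Python does not decode the names at all, and list every name that is not
pure ASCII. The following is a rough sketch, not a tested tool:
find_non_ascii_names() is a hypothetical helper, and trying UTF-8 is just my
guess at the most likely encoding.

```python
import os
import sys


def find_non_ascii_names(top):
    """Yield (raw_path, decoded_name_or_None) for each non-ASCII file name."""
    # Passing bytes makes os.walk() return the names as undecoded bytes.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for raw in filenames:
            if all(byte < 128 for byte in raw):
                continue  # pure ASCII, nothing to report
            try:
                decoded = raw.decode("utf-8")
            except UnicodeDecodeError:
                decoded = None  # not valid UTF-8; some other encoding
            yield os.path.join(dirpath, raw), decoded


# The encoding Python assumes when it converts file names to str.
print(sys.getfilesystemencoding())
```

Looping over find_non_ascii_names("/path/to/music") (a placeholder path) and
printing each pair should show whether the problem is limited to this one
file. If the printed encoding is not utf-8 while the names decode fine as
UTF-8, that would point to the mismatch.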