os.walk the apostrophe and unicode
Peter Otten
__peter__ at web.de
Sat Jun 24 17:17:07 EDT 2017
Rod Person wrote:
> On Sat, 24 Jun 2017 21:28:45 +0200
> Peter Otten <__peter__ at web.de> wrote:
>
>> Rod Person wrote:
>>
>> > Hi,
>> >
>> > I'm working on a program that will walk a file system and clean the
>> > id3 tags of mp3 and flac files. Everything was working great until
>> > the following file was found:
>> >
>> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>> >
>> > For some reason that I can't understand, os.walk() returns this file
>> > name as
>> >
>> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> >
>> > which then causes me more than a little hell. I don't understand
>> > why the apostrophe (') becomes \xe2\x80\x99, or what I can do
>> > about it.
>>
>> >>> import unicodedata
>> >>> b"\xe2\x80\x99".decode("utf-8")
>> '’'
>> >>> unicodedata.name(_)
>> 'RIGHT SINGLE QUOTATION MARK'
>>
>> So it's '’' rather than "'".
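The three bytes are simply the UTF-8 encoding of U+2019. The sketch below shows that relationship and one common way a single character can surface as three odd ones. It assumes, which the thread does not confirm, that the UTF-8 bytes were decoded as Latin-1 somewhere along the way; `repair_latin1_mojibake` is a hypothetical helper name, not something from the original script.

```python
import unicodedata

# The character in the file name is U+2019, not the ASCII apostrophe U+0027.
quote = "\u2019"
print(unicodedata.name(quote))   # RIGHT SINGLE QUOTATION MARK

# UTF-8 stores U+2019 as three bytes; these are the bytes seen in the thread.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes)                # b'\xe2\x80\x99'

# Assumption: the bytes were decoded with Latin-1 somewhere along the way.
# Latin-1 maps every byte to one code point, so one character becomes three.
mojibake = utf8_bytes.decode("latin-1")
print(len(mojibake))             # 3


def repair_latin1_mojibake(text):
    """Hypothetical helper: undo a UTF-8-read-as-Latin-1 mix-up.

    Re-encoding with Latin-1 restores the original bytes, which are then
    decoded correctly. Raises UnicodeEncodeError if text contains characters
    above U+00FF, and UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return text.encode("latin-1").decode("utf-8")


name = "06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac"
# ascii() keeps the output printable on any terminal; this prints
# '06 - Todd\u2019s Song (Post-Spiderland Song in Progress).flac'
print(ascii(repair_latin1_mojibake(name)))
```

The repair only makes sense if Latin-1 really was the wrong step; if the names were decoded some other way, the reverse step has to match that codec instead.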
>>
>> > The script is Python 3; the file system it runs on is a HAMMER
>> > file system on DragonFly BSD. The audio files reside on a QNAP NAS
>> > which runs some kind of Linux, so it's probably ext3/4. The files
>> > came from various systems (Mac, Windows, FreeBSD).
>>
>> There seems to be a mismatch between the assumed and the actual file
>> system encoding somewhere in this mix. Is this the only glitch or are
>> there similar problems with other non-ascii characters?
>>
>
> This is the only glitch in file names so far.
>
Then I'd fix the name manually...
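If renaming by hand is not practical, one way to sidestep the encoding mismatch suspected above is to walk with a bytes path, so that os.walk() returns undecoded names, and then decode them explicitly. This is a minimal sketch that assumes the names on the NAS are UTF-8; `walk_utf8` is a hypothetical name, and `sys.getfilesystemencoding()` is printed only to show what Python would otherwise assume.

```python
import os
import sys
import tempfile


def walk_utf8(top):
    """Hypothetical helper: yield (raw_path, name) for every file under top.

    top is converted to bytes first, so os.walk() returns undecoded bytes and
    the locale-dependent file system encoding is not used for the names.
    raw_path is the bytes path to use for open() or os.rename(); name is the
    file name decoded explicitly as UTF-8 (an assumption about the NAS), with
    invalid bytes kept as lone surrogates instead of raising an error.
    """
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(top)):
        for raw_name in filenames:
            raw_path = os.path.join(dirpath, raw_name)
            yield raw_path, raw_name.decode("utf-8", "surrogateescape")


if __name__ == "__main__":
    print("file system encoding:", sys.getfilesystemencoding())
    # Demo on a throwaway directory holding the UTF-8 bytes from the thread.
    with tempfile.TemporaryDirectory() as tmp:
        raw = b"06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac"
        open(os.path.join(os.fsencode(tmp), raw), "wb").close()
        for raw_path, name in walk_utf8(tmp):
            print(ascii(name))
```

Keeping the raw bytes path for file operations means a wrong guess about the encoding only affects the decoded name used for matching and printing, never which file gets opened or renamed.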