[Tutor] Unicode? UTF-8? UTF-16? WTF-8? ;)

Ray Jones crawlzone at gmail.com
Wed Sep 5 16:04:04 CEST 2012


On 09/05/2012 04:52 AM, Peter Otten wrote:
> Ray Jones wrote:
>
>>
>> But doesn't that entail knowing in advance which encoding you will be
>> working with? How would you automate the process while reading existing
>> files?
> If you don't *know* the encoding you *have* to guess. For instance you could
> default to UTF-8 and fall back to Latin-1 if you get an error. While
> decoding non-UTF-8 data with a UTF-8 decoder is likely to fail, Latin-1 will
> always "succeed" as there is one codepoint associated with every possible
> byte. The result, however, may not make sense. Think
>
> import codecs
>
> # Latin-1 decodes any byte, so even binary image data "reads" as text.
> for line in codecs.open("lol_cat.jpg", encoding="latin1"):
>     print(line.rstrip())
:))
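
The fallback Peter describes can be sketched roughly as follows. This is only
a minimal illustration, assuming Python 3 and that the data is already in
memory as bytes; the function name decode_with_fallback is made up for the
example and is not part of any library.

```python
def decode_with_fallback(data):
    """Guess an encoding: try UTF-8 first, then fall back to Latin-1.

    Returns a (text, encoding_name) tuple so the caller can see which
    guess was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a code point, so this
        # never raises. The result may still be nonsense if the data
        # was really in some other encoding.
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    for raw in (b"caf\xc3\xa9", b"caf\xe9", b"\xb4\xb9"):
        text, encoding = decode_with_fallback(raw)
        print(repr(raw), "->", repr(text), "via", encoding)
```

Returning the encoding name alongside the text makes it visible when the
Latin-1 guess was taken, which is the case where the result deserves suspicion.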

So when glob reads and returns garbled, non-Unicode file
names ....\xb4\xb9.... is it making a guess as to which encoding should be
used for that file name? Does Linux store that information when it saves
the file name? And (most?) importantly, how can I use that fouled-up
file name as an argument in calling Dolphin?
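
One possible approach, sketched here under assumptions: Python 3 on Linux,
where a file name is stored as a plain sequence of bytes with no encoding
recorded alongside it. Giving glob a bytes pattern returns the names as
bytes, untouched, and on POSIX subprocess accepts bytes arguments. The helper
names list_raw_names and open_in_dolphin are hypothetical, and the sketch
assumes a "dolphin" executable is on the PATH.

```python
import glob
import os
import subprocess


def list_raw_names(directory):
    """Return the paths in directory as bytes, exactly as stored on disk."""
    # A bytes pattern makes glob return bytes, so no decoding is attempted.
    pattern = os.path.join(os.fsencode(directory), b"*")
    return sorted(glob.glob(pattern))


def open_in_dolphin(raw_name):
    """Hypothetical helper: hand the undecoded name to Dolphin.

    Assumes a "dolphin" executable is on the PATH. On POSIX, bytes
    arguments are passed to the child process unchanged.
    """
    return subprocess.call(["dolphin", raw_name])
```

If a str is needed for display, os.fsdecode(raw_name) gives one that
os.fsencode can turn back into the original bytes on POSIX systems.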


Ray

