[Tutor] name shortening in a csv module output

Alan Gauld alan.gauld at btinternet.com
Thu Apr 23 23:09:50 CEST 2015


On 23/04/15 19:14, Jim Mooney wrote:
>>
>> By relying on the default when you read it, you're making an unspoken
>> assumption about the encoding of the file.

>
> So is there any way to sniff the encoding, including the BOM (which appears
> to be used or not used randomly for utf-8), so you can then use the proper
> encoding, or do you wander in the wilderness?

Pretty much guesswork.

The move from plain old ASCII to Unicode (and other encodings) has made
the handling of text much more like the handling of binary data. You
have to know the format/encoding to know how to decode binary data.
It's the same with text: if you don't know what produced it, and in
what format, then you have to guess.
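
As a rough illustration of that guessing, here is a minimal sketch of
one way to go about it: look for a BOM first, and if there isn't one,
try a few candidate encodings in turn. The function name
guess_encoding and the fallback list are my own assumptions, not
anything the csv module provides, and a successful decode only shows
that the bytes are valid in that encoding, not that the guess is right.

```python
import codecs

# Known BOMs paired with a codec that consumes the BOM while decoding.
# UTF-32 is tested before UTF-16 because the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, fallbacks=("utf-8", "cp1252")):
    """Return a plausible encoding name for the bytes in raw.

    Hypothetical helper: the fallback order is an assumption and
    should be adjusted to whatever is likely to have produced the file.
    """
    # A BOM, when present, is the only real evidence we get.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: accept the first candidate that decodes without error.
    for name in fallbacks:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # latin-1 maps every byte to a character, so it never fails
    # (which also means it proves nothing).
    return "latin-1"
```

You would read the file in binary mode, pass the bytes (or just the
first few thousand of them) to guess_encoding(), and then reopen the
file with the returned name as the encoding argument.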

There are some things you can do to check your results (such as
spell checking the decoded text), and you can check the characters
against the Unicode mappings to see if the sequences look sane
(for example, a lot of mixed alphabets, like Arabic, Greek and
Latin, suggests you guessed wrong!). But none of it is really
reliable.
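
The mixed-alphabet check could be sketched something like this. The
standard library has no direct "script" property, so this takes the
first word of each letter's Unicode name (LATIN, GREEK, ARABIC...) as
a stand-in for its alphabet. That shortcut, the function names and the
max_scripts threshold are all my own assumptions, and text that
legitimately mixes alphabets will be flagged too.

```python
import unicodedata


def scripts_used(text):
    """Return the set of alphabets seen among the letters in text.

    Approximation: the first word of a character's Unicode name,
    e.g. 'LATIN' from 'LATIN SMALL LETTER A'.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_suspicious(text, max_scripts=1):
    """True if the text mixes more alphabets than max_scripts allows."""
    return len(scripts_used(text)) > max_scripts


# German text written as latin-1 but wrongly read back as iso-8859-7
# (Greek) ends up with Greek letters in the middle of a Latin word.
wrong = "Gr\u00fc\u00dfe".encode("latin-1").decode("iso-8859-7")
print(scripts_used(wrong), looks_suspicious(wrong))
```

Note that a wrong guess between two Latin-based encodings will sail
straight through this test, which is part of why it isn't reliable.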

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



