[Tutor] name shortening in a csv module output

Thu Apr 23 14:05:31 CEST 2015

Jim Mooney wrote:

> ..
> 
>> ï»¿
>>
>> is the UTF-8 BOM (byte order mark) interpreted as Latin 1.
>>
>> If the input is UTF-8 you can get rid of the BOM with
>>
>> with open("data.txt", encoding="utf-8-sig") as csvfile:
>>
> 
> Peter Otten
> 
> I caught the bad arithmetic on name length, but where is the byte order
> mark coming from? 

Did you touch the data with an editor? That might be the culprit.
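
If you would rather check the file than guess, here is a minimal sketch. It
reads the first three bytes in binary mode and compares them with
codecs.BOM_UTF8. The function name starts_with_utf8_bom and whatever path you
pass in are my own placeholders, not anything from your script.

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM bytes."""
    # Binary mode, so no decoding happens and the raw bytes are visible.
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# The same three bytes (EF BB BF) decoded as Latin 1 give the characters ï»¿.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")
```

If starts_with_utf8_bom("data.txt") returns True, the BOM is already in the
file on disk, which would point at an editor or at whatever produced the file.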

> My first line is plain English so far as I can see - no
> umlauts or foreign characters.
> first_name|last_name|email|city|state or region|address|zip
> 
> Is this an artifact of csv module output, or is it the data from
> generatedata.com, which looks global? More likely it means I have to
> figure out Unicode ;'(
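
On the csv question: as far as I know the csv module only sees the text that
the file object hands it, so a BOM glued to the first field would come from how
the file was decoded rather than from csv itself. A small sketch to compare the
two encodings, assuming a pipe-delimited file like yours. read_header is a
made-up helper name, and the path is whatever your file is called.

```python
import csv


def read_header(path, encoding):
    """Return the first row of a pipe-delimited file as a list of strings."""
    # newline="" is what the csv docs recommend when opening files for csv.
    with open(path, encoding=encoding, newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter="|")
        return next(reader)


# Hypothetical usage, assuming data.txt starts with a UTF-8 BOM:
#   read_header("data.txt", "utf-8")      first field keeps a leading "\ufeff"
#   read_header("data.txt", "utf-8-sig")  first field is plain "first_name"
```

With "utf-8" the BOM survives as the invisible character U+FEFF in front of
first_name; with "utf-8-sig" it is dropped while decoding. If the file has no
BOM at all, "utf-8-sig" still reads it as ordinary UTF-8.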