issue in handling CSV data
Piet van Oostrum
piet-l at vanoostrum.org
Tue Sep 10 05:34:12 EDT 2019
Sharan Basappa <sharan.basappa at gmail.com> writes:
>>
>> Note that the commas are within the quotes. I'd say Andrea is correct:
>> This is a tab-separated file, not a comma-separated file. But for some
>> reason all fields except the last end with a comma.
>>
However, genfromtxt is not a full-fledged CSV parser. It does not obey quotes. So the commas inside the quotes ARE treated as separators.
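
To illustrate, here is a small sketch of what a quote-unaware split on ',' does. The sample line is only my guess at what a row of the file looks like (tab-separated, quoted fields, a comma inside the quotes); the values are made up, and plain str.split stands in for any parser that ignores quotes.

```python
# Hypothetical sample row: quoted fields separated by TABs,
# with a comma inside the quotes of every field except the last.
line = '"0x10,"\t"4,"\t"81,"\t"5c"'

# A parser that ignores quotes cuts at every comma,
# including the commas that sit inside the quotes.
naive_fields = line.split(',')

print(naive_fields)
```

The pieces that come out still carry the quotes and TABs, which is the same kind of residue as in the printed array.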
> Hi Peter,
>
> I respectfully disagree that it is not a comma separated. Let me explain why.
> If you look the following line in the code, it specifies comma as the delimiter:
>
> ########################
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> ########################
>
> Now, if you see the print after getting the data, it looks like this:
>
> ##############################
> [['"\t"81' '"\t5c']
> ['"\t"04' '"\t11']
> ['"\t"e1' '"\t17']
> ['"\t"6a' '"\t6c']
> ['"\t"53' '"\t69']
> ['"\t"98' '"\t87']
> ['"\t"5c' '"\t4b']
> ##############################
1) Where did the other fields (address, length) go?
>
> if you observe, the commas have disappeared. That, I think, is because
> it actually treated this as a CSV file.
2) As I said above, if you choose ',' as the separator, the commas will disappear. Similarly, if you choose TAB as the separator, the TABs will disappear. As the format is a strange mixture of the two, you can use either one. But if the file were read with a real CSV reader that obeys the quote convention, then using ',' as the separator would not work. Only TAB would work.
But in both cases you would have to do some pre- or post-processing to get the data as you want them.
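
One possible way to do that, as a sketch only: read the data with the standard csv module using TAB as the delimiter, then strip the trailing comma from each field afterwards. The sample text and the helper name read_rows are my own assumptions, not taken from the actual file.

```python
import csv
import io

# Hypothetical sample data: quoted fields separated by TABs,
# every field except the last ends with a comma inside the quotes.
sample = '"0x10,"\t"4,"\t"81,"\t"5c"\n"0x20,"\t"8,"\t"04,"\t"11"\n'


def read_rows(text):
    """Parse TAB-separated, quoted text and drop trailing commas (post-processing)."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t', quotechar='"')
    return [[field.rstrip(',') for field in row] for row in reader]


rows = read_rows(sample)
print(rows)
```

For a real file, open('constraints.csv', newline='') could be passed to csv.reader in place of the StringIO object, assuming the file really has this layout.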
> Anyway, I am checking to see if I can discard the tabs and process this.
> I will keep everyone posted.
--
Piet van Oostrum <piet-l at vanoostrum.org>
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]