issue in handling CSV data
Sharan Basappa
sharan.basappa at gmail.com
Mon Sep 9 23:07:08 EDT 2019
On Sunday, 8 September 2019 12:45:45 UTC-4, Peter J. Holzer wrote:
> On 2019-09-08 05:41:07 -0700, Sharan Basappa wrote:
> > On Sunday, 8 September 2019 04:56:29 UTC-4, Andrea D'Amore wrote:
> > > On Sun, 8 Sep 2019 at 02:19, Sharan Basappa <sharan.basappa at gmail.com> wrote:
> > > > As you can see, the string "\t"81 is causing the error.
> > > > It seems to be due to char "\t".
> > >
> > > It is not clear what format you expect the file to be in.
> > > You say "it is CSV" so your actual payload seems to be a pair of three
> > > bytes (a tab and two hex digits in ASCII) per line.
> >
> > The issue seems to be the presence of tabs along with the numbers in a single string. So, when I try to convert the strings to numbers, it fails because of the tabs.
> >
> > Here is the hex dump:
> >
> > 22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67
> > 74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a 22
> > 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74
> ...
>
> This looks like this:
>
> "address," "length," "81," 5c
> "address," "length," "04," 11
> "address," "length," "e1," 17
> "address," "length," "6a," 6c
> ...
>
> Note that the commas are within the quotes. I'd say Andrea is correct:
> This is a tab-separated file, not a comma-separated file. But for some
> reason all fields except the last end with a comma.
>
> I would
>
> a) try to convince the person producing the file to clean up the mess
>
> b) if that is not successful, use the csv module to read the file with
> separator tab and then discard the trailing commas.
>
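A minimal sketch of Peter's option (b), assuming every record has the layout shown in the hex dump above: tab-separated, quoted fields, all but the last ending in a stray comma. The sample is the first record of that dump decoded byte for byte. `parse_records` is a hypothetical helper name, and treating the last two fields as hexadecimal is an assumption based on values such as `5c` and `e1`.

```python
import csv
import io

# First record of the hex dump quoted above, decoded byte for byte.
SAMPLE = bytes.fromhex(
    "22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 "
    "74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a"
).decode("ascii")


def parse_records(text):
    """Split tab-separated records and drop each field's trailing comma."""
    # The csv module strips the double quotes; delimiter="\t" splits on tabs.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return [[field.rstrip(",") for field in row] for row in reader]


rows = parse_records(SAMPLE)
print(rows)  # [['address', 'length', '81', '5c']]

# Assumption: the last two columns hold hexadecimal values.
print([(int(row[2], 16), int(row[3], 16)) for row in rows])  # [(129, 92)]
```

For the real file, `csv.reader` can be given the file object from `open('constraints.csv', newline='')` directly instead of the in-memory sample.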
Hi Peter,
I respectfully disagree that it is not comma-separated. Let me explain why.
If you look at the following line in the code, it specifies a comma as the delimiter:
########################
from numpy import genfromtxt
my_data = genfromtxt('constraints.csv', delimiter=',', dtype=None)
########################
Now, if you look at the printed output after loading the data, it looks like this:
##############################
[['"\t"81' '"\t5c']
['"\t"04' '"\t11']
['"\t"e1' '"\t17']
['"\t"6a' '"\t6c']
['"\t"53' '"\t69']
['"\t"98' '"\t87']
['"\t"5c' '"\t4b']
##############################
If you look closely, the commas have disappeared. I think that is because it actually treated this as a CSV file.
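To see where the leftover tabs and quotes in the printed fields come from, here is a rough stand-in for the comma split, using plain `str.split` on the first record from the hex dump. The assumption is that `genfromtxt` splits on the delimiter without treating double quotes specially, which is consistent with the printed output: the last two fields below have exactly the form of the two printed columns.

```python
# First record from the hex dump: tab-separated, quoted fields ending in commas.
LINE = '"address,"\t"length,"\t"81,"\t5c'

# Splitting on commas removes the commas, but every tab and quote stays behind.
fields = LINE.split(",")
print(fields)  # ['"address', '"\t"length', '"\t"81', '"\t5c']
```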
Anyway, I am checking to see if I can discard the tabs and process this.
I will keep everyone posted.
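A minimal sketch of discarding the tabs (and the stray quotes) before converting, using the first three rows of the printed output above as input. `RAW_ROWS` and `clean` are hypothetical names, and reading the values as hexadecimal is an assumption suggested by entries such as `e1` and `5c`.

```python
# First three rows of the printed genfromtxt output above.
RAW_ROWS = [
    ['"\t"81', '"\t5c'],
    ['"\t"04', '"\t11'],
    ['"\t"e1', '"\t17'],
]


def clean(field):
    """Drop the leftover quote and tab characters around a field."""
    return field.strip('"\t')


# Assumption: the values are hexadecimal numbers.
values = [[int(clean(field), 16) for field in row] for row in RAW_ROWS]
print(values)  # [[129, 92], [4, 17], [225, 23]]
```

Depending on the NumPy version and the `encoding` setting, `genfromtxt` may return bytes instead of str; in that case the fields need decoding before they are stripped.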