Reading 'scientific' csv using Pandas?
Thomas Jollans
tjol at tjol.eu
Mon Nov 19 08:08:28 EST 2018
On 2018-11-18 19:22, Martin Schöön wrote:
> Den 2018-11-18 skrev Shakti Kumar <shakti.shrivastava13 at gmail.com>:
>> On Sun, 18 Nov 2018 at 18:18, Martin Schöön <martin.schoon at gmail.com> wrote:
>>>
>>> Now I hit a bump in the road when some of the data is not in plain
>>> decimal notation (xxx,xx) but in 'scientific' (xx,xxxe-xx) notation.
>>>
>>
>> Martin, I believe this should be done by pandas itself while reading
>> the csv file,
>> I took an example in scientific notation and checked this out,
>>
>> my sample.csv file is,
>> col1,col2
>> 1.1,0
>> 10.24e-05,1
>> 9.492e-10,2
>>
> That was a quick answer!
>
> My pandas is up to date.
>
> In your example you use the US convention of using "." for decimals
> and "," to separate data. This works perfectly for me too.
>
> However, my data files use European conventions: decimal "," and TAB
> to separate data:
>
> col1 col2
> 1,1 0
> 10,24e-05 1
> 9,492e-10 2
>
> I use
>
> EUData = pd.read_csv('file.csv', skiprows=1, sep='\t',
>                      decimal=',', engine='python')
>
> to read from such files. This works so-so: 'common floats' (3,1415 etc.)
> work just fine, but 'scientific' values (1,6023e23) do not.
>
> /Martin
>
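This looks like a bug specifically in the 'python' engine: in the session
below, the default engine parses the same file correctly, while
engine='python' leaves the values in scientific notation unconverted.
I suggest you file a bug report at https://github.com/pandas-dev/pandas/issues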
(conda:nb) /tmp
0:jollans at mn70% cat test.csv
Index Value
0 1,674
1 3,48e+3
2 8,1834e-10
3 3984,109
4 2830812370
(conda:nb) /tmp
0:jollans at mn70% ipython
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.1.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd

In [2]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',',
                    sep='\t')
Out[2]:
              Value
Index
0      1.674000e+00
1      3.480000e+03
2      8.183400e-10
3      3.984109e+03
4      2.830812e+09

In [3]: pd.read_csv('test.csv', header=[0], index_col=0, decimal=',',
                    sep='\t', engine='python')
Out[3]:
            Value
Index
0           1.674
1         3,48e+3
2      8,1834e-10
3        3984.109
4      2830812370

In [4]: pd.__version__
Out[4]: '0.23.4'
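
If the 'python' engine is needed for some other reason, one possible
workaround is to convert the decimal commas yourself. The following is
only a minimal sketch using the standard library; the helper names
parse_decimal_comma and read_values are made up for illustration, and it
assumes the data contains no thousands separators.

```python
import csv
import io


def parse_decimal_comma(text):
    """Convert a string such as '3,48e+3' to a float, treating ',' as the decimal mark."""
    # Assumption: ',' only ever appears as the decimal mark, never as a
    # thousands separator.
    return float(text.strip().replace(",", "."))


# Same rows as test.csv in the session above, with TAB as the field separator.
SAMPLE = (
    "Index\tValue\n"
    "0\t1,674\n"
    "1\t3,48e+3\n"
    "2\t8,1834e-10\n"
    "3\t3984,109\n"
    "4\t2830812370\n"
)


def read_values(handle):
    """Return {index: value} from a tab-separated file with decimal commas."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["Index"]): parse_decimal_comma(row["Value"]) for row in reader}


values = read_values(io.StringIO(SAMPLE))
print(values)
```

With pandas, the same function could probably be passed to read_csv as
converters={'Value': parse_decimal_comma} instead of relying on
decimal=',', though I have not checked that against 0.23.4. The simpler
route, as in In [2], is to leave out engine='python'.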
--
Cheers,
Thomas