Silent data corruption in pandas, was Re: Python read text file columnwise
Peter Otten
__peter__ at web.de
Sat Jan 12 05:12:43 EST 2019
Peter Otten wrote:
> shibashibani at gmail.com wrote:
>
>> Hello
>>>
>>> I'm very new in python. I have a file in the format:
>>>
>>> 2018-05-31 16:00:00 28.90 81.77 4.3
>>> 2018-05-31 20:32:00 28.17 84.89 4.1
>>> 2018-06-20 04:09:00 27.36 88.01 4.8
>>> 2018-06-20 04:15:00 27.31 87.09 4.7
>>> 2018-06-28 04.07:00 27.87 84.91 5.0
>>> 2018-06-29 00.42:00 32.20 104.61 4.8
>>
>> I would like to read this file in python column-wise.
> However, in the long term you may be better off with a tool like pandas:
>
> >>> import pandas
> >>> pandas.read_table(
> ...     "seismicity_R023E.txt", sep=r"\s+",
> ...     names=["date", "time", "foo", "bar", "baz"],
> ...     parse_dates=[["date", "time"]]
> ... )
>             date_time    foo     bar  baz
> 0 2018-05-31 16:00:00  28.90   81.77  4.3
> 1 2018-05-31 20:32:00  28.17   84.89  4.1
> 2 2018-06-20 04:09:00  27.36   88.01  4.8
> 3 2018-06-20 04:15:00  27.31   87.09  4.7
> 4 2018-06-28 04:00:00  27.87   84.91  5.0
> 5 2018-06-29 00:00:00  32.20  104.61  4.8
>
> [6 rows x 4 columns]
> >>>
>
> It will be harder in the beginning, but if you work with tabular data
> regularly it will pay off.
After posting the above I noticed that the malformed times in the last two
rows (04.07:00 and 00.42:00 in the input) were silently botched: they came
out as 04:00:00 and 00:00:00. So I just spent an insane amount of time trying
to fix this from within pandas:

import datetime

import numpy
import pandas


def parse_datetime(dt):
    return datetime.datetime.strptime(
        dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
    )


def date_parser(dates, times):
    return numpy.array([
        parse_datetime(date + " " + time)
        for date, time in zip(dates, times)
    ])


df = pandas.read_table(
    "seismicity_R023E.txt", sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    parse_dates=[["date", "time"]], date_parser=date_parser
)
print(df)

There's probably a better way as I am only a determined amateur...
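
One possible alternative, offered only as a sketch: read the date and time
columns as plain strings, repair the "." typo with the vectorized string
methods, and hand the result to pandas.to_datetime() with an explicit format.
The helper name read_seismicity() and the inline SAMPLE text are made up for
illustration (the rows are the ones from the original question), and
"foo", "bar", "baz" are still the placeholder column names used above.

```python
import io

import pandas

# Rows copied from the original question; the last two times use "."
# where ":" was intended.
SAMPLE = """\
2018-05-31 16:00:00 28.90 81.77 4.3
2018-05-31 20:32:00 28.17 84.89 4.1
2018-06-20 04:09:00 27.36 88.01 4.8
2018-06-20 04:15:00 27.31 87.09 4.7
2018-06-28 04.07:00 27.87 84.91 5.0
2018-06-29 00.42:00 32.20 104.61 4.8
"""


def read_seismicity(source):
    """Hypothetical helper: load the table and build a date_time column."""
    df = pandas.read_csv(
        source, sep=r"\s+", header=None,
        names=["date", "time", "foo", "bar", "baz"],
        # Keep both columns as text so pandas does not guess at the times.
        dtype={"date": str, "time": str},
    )
    # Repair "HH.MM:SS" -> "HH:MM:SS" and join date and time into one string.
    stamp = df.pop("date") + " " + df.pop("time").str.replace(
        ".", ":", regex=False
    )
    # With an explicit format, a value that still does not match raises
    # ValueError instead of being converted to something unexpected.
    df.insert(
        0, "date_time",
        pandas.to_datetime(stamp, format="%Y-%m-%d %H:%M:%S"),
    )
    return df


df = read_seismicity(io.StringIO(SAMPLE))
print(df)
```

For the real file, pass "seismicity_R023E.txt" instead of the StringIO
object. The point of the explicit format is that any other kind of malformed
timestamp should fail loudly rather than slip through.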