Preprocessing not quite fixed-width file before parsing
Loris Bennett
loris.bennett at fu-berlin.de
Wed Nov 23 11:00:44 EST 2022
Hi,
I am using pandas to parse a file with the following structure:
Name       fileset    type         KB      quota      limit in_doubt   grace |    files quota limit in_doubt grace
shortname  sharedhome USR    14097664  524288000  545259520        0    none |   107110     0     0        0  none
gracedays  sharedhome USR   774858944  524288000  775946240        0  5 days |  1115717     0     0        0  none
nametoolong sharedhome USR    27418496  524288000  545259520        0    none |    11581     0     0        0  none
I was initially able to use
df = pandas.read_csv(file_name, delimiter=r"\s+")
because all the values for 'grace' were 'none'. Now, however, values
other than 'none' have appeared (e.g. '5 days'), and this fails because
the embedded space is treated as a delimiter.
I can't use
pandas.read_fwf
even with explicit colspecs, because names in the first column that are
too long for the column displace the rest of the data to the right.
The report which produces the file could in fact also generate a
properly delimited CSV file, but I have a lot of historical data in the
readable but poorly parsable format above that I need to deal with.
If I were doing something similar in the shell, I would just pipe the
file through sed or something similar to replace '5 days' with, say, '5_days'.
How could I achieve a similar sort of preprocessing in Python, ideally
without having to generate a lot of temporary files?
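To make the idea concrete, here is a minimal sketch of the sed-like
substitution done in memory, with the result handed to pandas through
io.StringIO so that no temporary file is needed. The regular expression
assumes the only problematic values look like '5 days' or '1 day', and
the names preprocess and read_quota_report are hypothetical, chosen only
for illustration.

```python
import io
import re

# Assumption: the only multi-word values are grace periods such as
# "5 days" or "1 day"; other formats would need a different pattern.
GRACE_RE = re.compile(r"\b(\d+) (days?)\b")


def preprocess(text):
    """Join a grace period into one token, e.g. '5 days' -> '5_days'."""
    return GRACE_RE.sub(r"\1_\2", text)


def read_quota_report(file_name):
    """Hypothetical helper: clean the report in memory, then parse it."""
    import pandas  # imported lazily so preprocess() works without pandas

    with open(file_name) as f:
        cleaned = preprocess(f.read())
    # StringIO wraps the cleaned text in a file-like object, so
    # read_csv can be called exactly as before.
    return pandas.read_csv(io.StringIO(cleaned), delimiter=r"\s+")


sample = "gracedays sharedhome USR 774858944 524288000 775946240 0 5 days | 1115717 0 0 0 none"
print(preprocess(sample))
```

With this, df = read_quota_report(file_name) would replace the original
read_csv call. Whether the underscore form is a convenient value for
later processing is a separate question.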
Cheers,
Loris
--
This signature is currently under constuction.