splitting tables
Bengt Richter
bokr at oz.net
Sat Feb 7 16:56:23 EST 2004
On Sat, 7 Feb 2004 20:08:50 +0000 (UTC), robsom <no.mail at no.mail.it> wrote:
>
>Hi, I have a problem with a small python program I'm trying to write
>and I hope somebody may help me. I'm working on tables of this kind:
>
>CGA 1988 06 21 13 48 G500-050 D 509.62 J.. R1 1993 01 28 00 00 880006
>CGA 1988 06 21 14 04 G500-051 D 550.62 J.. R1 1993 01 28 00 00 880007
>
>I have to read each line of the table and put it into comma-separated
>lists like these for later manipulation:
>
>CGA,1988,06,21,13,48,G500-050,D,509.62,J..,R1,1993,01,28,00,00,880006
>CGA,1988,06,21,14,04,G500-051,D,550.62,J..,R1,1993,01,28,00,00,880007
>
>The 'split' function works pretty well, except when there is an error in
>the original data table. For example, if an element is missing in a line,
>like this:
>
>CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
>CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065
>
>This error happens quite often in my dataset and the tables are too
>large to check for it manually. In this case what I get splitting the
>line string is of course this:
>
>CGA,1990,08,15,13,16,G500-105,D,524.45,J..,R1,1993,01,29,00,00,900069
>CGA,1990,08,16,01,22,D,508.06,J..,R1,1993,01,27,00,00,900065
>
>And when the program tries to work on the second list it stops (of course!).
>Is there any way to avoid this problem? Thanks a lot for any suggestions.
>
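Before changing the parsing, it may help simply to find the damaged lines. A minimal sketch, assuming every well-formed line has exactly 17 whitespace-separated fields as in the examples above; `check_fields` and `EXPECTED_FIELDS` are hypothetical names:

```python
# Assumption: every well-formed line has exactly 17 whitespace-separated fields.
EXPECTED_FIELDS = 17


def check_fields(lines, expected=EXPECTED_FIELDS):
    """Split each line on whitespace and separate good rows from suspect lines.

    Returns (good, bad): good is a list of field lists, bad is a list of
    (line_number, text) tuples for lines with the wrong field count.
    """
    good, bad = [], []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) == expected:
            good.append(fields)
        else:
            bad.append((lineno, line))
    return good, bad


sample = [
    "CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069",
    "CGA 1990 08 16 01 22 D 508.06 J.. R1 1993 01 27 00 00 900065",
]
good, bad = check_fields(sample)
print(','.join(good[0]))
print(bad)
```

Only the field count is checked, so this flags the problem lines but cannot say which field is missing.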
>>> s = """\
... CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069
... CGA 1990 08 16 01 22          D 508.06 J.. R1 1993 01 27 00 00 900065
... """
>>> import re
>>> rxo = re.compile(
... '(...) (....) (..) (..) (..) (..) (........) (.) '
... '(......) (...) (..) (....) (..) (..) (..) (..) (......)'
... )
>>> import csv
>>> import sys
>>> writer = csv.writer(sys.stdout)
>>> for line in s.splitlines(): writer.writerow(*rxo.findall(line))
...
CGA,1990,08,15,13,16,G500-105,D,524.45,J..,R1,1993,01,29,00,00,900069
CGA,1990,08,16,01,22,        ,D,508.06,J..,R1,1993,01,27,00,00,900065
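If the table really is fixed-width (the regex above assumes single-space separators and fixed field widths), plain string slicing does the same job and never fails outright: a blank column or a short line just yields empty fields. A minimal sketch, with the widths read off the regex groups; `WIDTHS` and `split_fixed` are hypothetical names:

```python
# Field widths read off the regex groups above (an assumption about the table
# layout); each field is followed by exactly one separating space.
WIDTHS = [3, 4, 2, 2, 2, 2, 8, 1, 6, 3, 2, 4, 2, 2, 2, 2, 6]


def split_fixed(line, widths=WIDTHS):
    """Cut a fixed-width line into fields by column position.

    Fields are stripped, so a blank column comes back as '' and a line
    that is too short yields '' for the missing trailing fields.
    """
    fields = []
    pos = 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width + 1  # skip the single separating space
    return fields


good = "CGA 1990 08 15 13 16 G500-105 D 524.45 J.. R1 1993 01 29 00 00 900069"
# Second sample line with its 8-character G500-... field left blank.
damaged = "CGA 1990 08 16 01 22 " + " " * 8 + " D 508.06 J.. R1 1993 01 27 00 00 900065"

print(split_fixed(good))
print(split_fixed(damaged))
```

Unlike the regex, slicing silently accepts a truncated line, so if that matters, check len(line) against the expected 69 columns (53 characters of fields plus 16 separators) before splitting.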
To write the CSV lines to a file instead of sys.stdout, substitute (untested)

    open('your_csv_output_file.csv', 'w', newline='')

for sys.stdout in the above (the file has to be opened for writing), and get your
lines from something like this (note chopping off the trailing newline):

    for line in open('your_table_file'):
        line = line.rstrip('\n')

instead of

    for line in s.splitlines():

If some lines may be too short to match, findall returns an empty list, and
unpacking that (with the prefixed *) into writer.writerow's argument list raises
a TypeError, so check for a match before writing.
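Putting the file reading, the CSV writer and the no-match check together, a minimal Python 3 sketch might look like this. It uses rxo.match, which anchors at the start of the line, in place of findall; `RXO`, `matched_rows` and `table_to_csv` are hypothetical names:

```python
import csv
import re
import sys

# Same fixed-width layout as the rxo regex in the session above.
RXO = re.compile(
    r'(...) (....) (..) (..) (..) (..) (........) (.) '
    r'(......) (...) (..) (....) (..) (..) (..) (..) (......)'
)


def matched_rows(lines, bad_lines):
    """Yield the 17 fields of each matching line.

    Lines that do not match (e.g. too short) are appended to bad_lines
    as (line_number, text) instead of raising an error.
    """
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip('\n')
        m = RXO.match(line)
        if m is None:
            bad_lines.append((lineno, line))
            continue
        yield m.groups()


def table_to_csv(src, dst):
    """Copy matching lines from src to dst as CSV; report the rest on stderr.

    Returns the number of lines that did not match.
    """
    bad_lines = []
    # writerows consumes the whole generator, so bad_lines is complete afterwards.
    csv.writer(dst).writerows(matched_rows(src, bad_lines))
    for lineno, line in bad_lines:
        print('line %d: no match: %r' % (lineno, line), file=sys.stderr)
    return len(bad_lines)
```

Usage, with the same placeholder file names as above:

    with open('your_table_file') as src, \
         open('your_csv_output_file.csv', 'w', newline='') as dst:
        table_to_csv(src, dst)

Collecting the bad lines instead of stopping lets the whole table be converted in one pass, with the damaged lines listed for checking afterwards.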
That's it for clp today ;-)
Regards,
Bengt Richter