a re question
James J. Besemer
jb at cascade-sys.com
Mon Sep 9 20:38:52 EDT 2002
Rajarshi Guha wrote:
>Hi,
> I have a file with lines of the format:
>
>001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23
>001 Xya FGh 143557789 7 100 09/05/2002 20:23:23
>
>I am trying to extract the 9 digit field and the single digit field
>immediately after that.
>
Regex is great but on the surface it appears to be overkill for your
application.
I would like to suggest some alternatives not using regex.
(A) IF all the fields are fixed width (up to and including the fields
of interest, but not necessarily the ones following) then you can
extract sub fields by simple indexing into the string.
E.g., assuming a single space or TAB for a separator and that variable
'line' contains one of the above data lines, then something like
    line[14:23]
would extract the larger numeric field. (I may have counted wrong --
you may have to debug that fragment before using it.)
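As a minimal sketch of (A), here is the slice checked against the first sample line, assuming single-space separators exactly as the line appears above. The offset used for the single digit field (24) is my own count and is not taken from your message.

```python
# First sample line from the question, assuming single-space separators.
line = "001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23"

# Slice bounds are character offsets: start is inclusive, end is exclusive.
nine_digits = line[14:23]   # the 9 digit field
one_digit = line[24:25]     # the single digit after it (offset is an assumption)

print(nine_digits, one_digit)
```

Note that the second sample line, as it appears in your message, has fewer characters in front of the number, so these offsets only hold if the real file pads every field to a constant width.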
(B) If the fields are variable width (as your regex suggests) BUT always
separated by spaces or tabs, you can simply split the line into fields:
    fields = line.split()
and then
    fields[4] and fields[5]
would contain the non-whitespace contents of your desired numeric
fields. The split method takes an optional argument to specify a
separator (e.g., a comma) other than whitespace.
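A small sketch of (B) on both sample lines. As posted, the first line has one more name field than the second, so a fixed index from the front would land on different columns. The version below counts from the end instead, which rests on my assumption that exactly three fields (the 100, the date and the time) always follow the single digit.

```python
# Both sample lines from the question, as they appear in the message.
lines = [
    "001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23",
    "001 Xya FGh 143557789 7 100 09/05/2002 20:23:23",
]

def extract(line):
    """Return (nine_digit_field, single_digit_field) from one data line."""
    fields = line.split()  # split on any run of spaces or tabs
    # Assumption: exactly three fields follow the single digit, so negative
    # indexes work even when the number of leading name fields varies.
    return fields[-5], fields[-4]

results = [extract(line) for line in lines]
print(results)
```

If every line really has the same number of leading fields, plain fields[4] and fields[5] would do just as well.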
I expect these alternatives would be faster than regex, though I have
not measured to make sure.
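If you want to check that guess, a rough timing sketch with the standard timeit module might look like the following. The pattern is a hypothetical stand-in, not your actual expression, and the timings will vary by machine and Python version.

```python
import re
import timeit

line = "001 Abc D Efg 123456789 7 100 09/05/2002 20:23:23"

# Hypothetical pattern (not the one from the original question): a field of
# exactly 9 digits, whitespace, then a field consisting of a single digit.
pattern = re.compile(r"(?<!\S)(\d{9})\s+(\d)(?!\S)")

def by_split(text):
    fields = text.split()
    return fields[4], fields[5]

def by_regex(text):
    match = pattern.search(text)
    return match.group(1), match.group(2)

# Both approaches should agree before their speed is compared.
assert by_split(line) == by_regex(line)

split_seconds = timeit.timeit(lambda: by_split(line), number=20_000)
regex_seconds = timeit.timeit(lambda: by_regex(line), number=20_000)
print(f"split: {split_seconds:.4f}s  regex: {regex_seconds:.4f}s")
```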
If I'm mistaken and the fields are all run together, without whitespace
separators, then you're stuck with regex. In that case, however, your
existing expressions will likely need more work before they behave
correctly.
"There's more than one way to do it!"
Regards
--jb
--
James J. Besemer 503-280-0838 voice
2727 NE Skidmore St. 503-280-0375 fax
Portland, Oregon 97211-6557 mailto:jb at cascade-sys.com
http://cascade-sys.com