a re question

David LeBlanc whisper at oz.net
Mon Sep 9 19:28:09 EDT 2002


> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Rajarshi Guha
> Sent: Monday, September 09, 2002 13:11
> To: python-list at python.org
> Subject: a re question
>
>
> Hi,
>   I have a file with lines of the format:
>
> 001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23
> 001 Xya FGh   143557789   7 100 09/05/2002 20:23:23
>
> I am trying to extract the 9 digit field and the single digit field
> immediately after that.
>
> When I use Visual Regexp to try out the regexp
>
> (\d{9,} {3,}\d)
>
> it highlights the 2 fields exactly.
>
> But when I use the following Python code I get None:
>
> >> s='001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23'
> >> p = re.compile(r'(\d{9,} {3,}\d)')
> >> print p.match(s)
> >> None
>
> Could anybody point out where I'm going wrong?
>
> Thanks,

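Yes - match only succeeds if the pattern matches at the very beginning of
the string, and yours starts with '001', not with nine digits. Try p.search
instead, which scans the whole string for the first place the pattern fits.
See python22/doc/lib/matching-searching.html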
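To make the difference concrete, here is a small sketch using your sample line and your original pattern. It is written with Python 3's print() (an assumption on my part; your session uses the older print statement), and the names anchored and found are just illustrative:

```python
import re

s = '001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23'
p = re.compile(r'(\d{9,} {3,}\d)')

# match() tries the pattern only at position 0, where '001 ' is not 9+ digits
anchored = p.match(s)

# search() scans forward until the pattern fits somewhere in the string
found = p.search(s)

print(anchored)        # None
print(found.group(1))  # 123456789   7
```

So the regexp itself was fine; only the choice of method was getting in the way.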

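I would also suggest using \s{3,} instead of the literal space you have, for
clarity in skipping the whitespace between the 9 digit number and the single
digit number. Don't put a space inside the braces: re treats {3, } as literal
text, not as a repeat count.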

You might find putting parens around each element you want to extract a
convenient shortcut; then if you do elts = p.search(s), elts.group(1) will
be the 9 digit number and elts.group(2) will be the single digit. The
pattern would then look like:

r'(\d{9,})\s{3,}(\d)'

You could also do num9, num1 = elts.group(1,2) if you're in a hurry ;)
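Putting that together, a minimal sketch run over both of your sample lines might look like this. The quantifiers are written with no space inside the braces, since re would otherwise take them as literal text. The lines list and the results name are made up for illustration, and print() assumes Python 3:

```python
import re

lines = [
    '001 Abc D Efg 123456789   7 100 09/05/2002 20:23:23',
    '001 Xya FGh   143557789   7 100 09/05/2002 20:23:23',
]

# One group per field: the 9+ digit number, then the single digit after it
p = re.compile(r'(\d{9,})\s{3,}(\d)')

results = []
for line in lines:
    elts = p.search(line)
    if elts is not None:  # skip lines that don't have both fields
        num9, num1 = elts.group(1, 2)
        results.append((num9, num1))

print(results)  # [('123456789', '7'), ('143557789', '7')]
```

Checking for None before calling group() keeps a malformed line from raising AttributeError.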

HTH,

Dave LeBlanc
Seattle, WA USA




