Multiline regex help

Yatima yatima_ at konishi.polis.net
Thu Mar 3 21:03:57 CET 2005


On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <steven.bethard at gmail.com> wrote:
>
> A possible solution, using the re module:
>
> py> s = """\
> ... Gibberish
> ... 53
> ... MoreGarbage
> ... 12
> ... RelevantInfo1
> ... 10/10/04
> ... NothingImportant
> ... ThisDoesNotMatter
> ... 44
> ... RelevantInfo2
> ... 22
> ... BlahBlah
> ... 343
> ... RelevantInfo3
> ... 23
> ... Hubris
> ... Crap
> ... 34
> ... """
> py> import re
> py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo2\n([^\n]*)
> ...                    .*
> ...                    ^RelevantInfo3\n([^\n]*)""",
> ...                re.DOTALL | re.MULTILINE | re.VERBOSE)
> py> score = {}
> py> for info1, info2, info3 in m.findall(s):
> ...     score.setdefault(info1, {})[info3] = info2
> ...
> py> score
> {'10/10/04': {'23': '22'}}
>
> Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE 
> to have ^ apply at the start of each line, and VERBOSE to allow me to 
> write the re in a more readable form.
>
> If I didn't get your dict update quite right, hopefully you can see how 
> to fix it!

Thanks! That was very helpful. Unfortunately, I wasn't completely clear when
describing the problem. Is there any way to extract multiple scores from the
same file, and from multiple files? (I will probably use the "fileinput"
module to deal with multiple files.) So, say I've got:

Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34

SecondSetofGarbage
2423
YouGetThePicture
342342
RelevantInfo1
10/10/04
HoHum
343
MoreStuffNotNeeded
232
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
InsertBoringFillerHere
43234
Stuff
MoreStuff
RelevantInfo2
45
ExcitingIsntIt
324234
RelevantInfo3
60
Lalala

Sorry for the long and painful example input. Notice that the first two
"RelevantInfo1" fields have the same info but that the RelevantInfo2 and
RelevantInfo3 fields have different info. Also, there will be cases where
RelevantInfo3 is the same but RelevantInfo2 is different. What I'm
hoping for is something along the lines of being able to organize it like
so (don't worry about the format of the output -- I'll deal with that
later; "RelevantInfo" shortened to "Info" for readability):

            Info1[0]                    Info1[1]                    Info1[2] ...
Info3[0]    Info2[Info1[0],Info3[0]]    Info2[Info1[1],Info3[0]]    ...
Info3[1]    Info2[Info1[0],Info3[1]]    ...
Info3[2]    Info2[Info1[0],Info3[2]]    ...
...

I don't really care if it's a list, dictionary, array etc. 
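
To make the target concrete, here is a rough sketch of the structure I have in mind, written for current Python 3. It is only a sketch under assumptions: it swaps the greedy ".*" in the pattern above for the non-greedy ".*?" so that each match stops at the nearest following marker, it assumes every record contains all three markers in order, and the names RECORD, collect_scores, collect_from_files and as_grid are made up for illustration. The sample is a condensed version of the input above.

```python
import re

# Same pattern as in the quoted reply, except ".*?" (non-greedy) replaces
# ".*" so one match cannot swallow several records.
# Assumption: every record has Info1, Info2 and Info3, in that order.
RECORD = re.compile(r"""^RelevantInfo1\n([^\n]*)
                        .*?
                        ^RelevantInfo2\n([^\n]*)
                        .*?
                        ^RelevantInfo3\n([^\n]*)""",
                    re.DOTALL | re.MULTILINE | re.VERBOSE)


def collect_scores(text, scores=None):
    """Add every record found in text to scores[info1][info3] = info2."""
    if scores is None:
        scores = {}
    for match in RECORD.finditer(text):
        info1, info2, info3 = match.groups()
        # A repeated (info1, info3) pair overwrites the earlier value.
        scores.setdefault(info1, {})[info3] = info2
    return scores


def collect_from_files(paths):
    """Read each file whole, since the pattern spans several lines."""
    scores = {}
    for path in paths:
        with open(path) as handle:
            collect_scores(handle.read(), scores)
    return scores


def as_grid(scores):
    """Return (columns, rows): one column per Info1, one row per Info3."""
    columns = sorted(scores)  # plain string sort, not date order
    info3_values = sorted({i3 for by_info3 in scores.values() for i3 in by_info3})
    rows = [(i3, [scores[c].get(i3) for c in columns]) for i3 in info3_values]
    return columns, rows


# Condensed version of the sample input from this message.
SAMPLE = """\
Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
44
RelevantInfo2
22
BlahBlah
RelevantInfo3
23
Hubris

SecondSetofGarbage
RelevantInfo1
10/10/04
HoHum
RelevantInfo2
33
RelevantInfo3
44
sdfsdf
RelevantInfo1
10/11/04
Stuff
RelevantInfo2
45
ExcitingIsntIt
RelevantInfo3
60
Lalala
"""

scores = collect_scores(SAMPLE)
columns, rows = as_grid(scores)
print(scores)
for info3, cells in rows:
    print(info3, cells)
```

Cells with no score for a given Info1/Info3 pair come back as None. Because the pattern crosses line boundaries, each file is read in one piece here rather than line by line.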

Thanks again for your help. The multiline option in the re module is very
useful. 
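
For anyone following along, a small sketch of what two of those flags change, using a made-up three-line string (the variable names are mine, not from the code above):

```python
import re

text = "Noise\nRelevantInfo1\n10/10/04\n"

# Without MULTILINE, ^ matches only at the very start of the string.
start_only = re.search(r"^RelevantInfo1", text)
# With MULTILINE, ^ also matches immediately after each newline.
any_line = re.search(r"^RelevantInfo1", text, re.MULTILINE)

# Without DOTALL, "." never matches a newline, so ".*" stays on one line.
one_line = re.search(r"Noise.*RelevantInfo1", text)
# With DOTALL, ".*" can run across line boundaries.
across_lines = re.search(r"Noise.*RelevantInfo1", text, re.DOTALL)

print(start_only, any_line is not None)
print(one_line, across_lines is not None)
```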

Take care.

-- 
Clarke's Conclusion:
	Never let your sense of morals interfere with doing the right thing.


