Multiline regex help
Steven Bethard
steven.bethard at gmail.com
Thu Mar 3 11:54:02 EST 2005
Yatima wrote:
> Hey Folks,
>
> I've got some info in a bunch of files that kind of looks like so:
>
> Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34
>
> and so on...
>
> Anyhow, these "fields" repeat several times in a given file (number of
> repetitions varies from file to file). The number on the line following the
> "RelevantInfo" lines is really what I'm after. Ideally, I would like to have
> something like so:
>
> RelevantInfo1 = 10/10/04 # The variable name isn't actually important
> RelevantInfo3 = 23 # it's just there to illustrate what info I'm
> # trying to snag.
>
> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
A possible solution, using the re module:
py> s = """\
... Gibberish
... 53
... MoreGarbage
... 12
... RelevantInfo1
... 10/10/04
... NothingImportant
... ThisDoesNotMatter
... 44
... RelevantInfo2
... 22
... BlahBlah
... 343
... RelevantInfo3
... 23
... Hubris
... Crap
... 34
... """
py> import re
py> m = re.compile(r"""^RelevantInfo1\n([^\n]*)
...                    .*?   # non-greedy, so a match stays within one record
...                    ^RelevantInfo2\n([^\n]*)
...                    .*?
...                    ^RelevantInfo3\n([^\n]*)""",
...                re.DOTALL | re.MULTILINE | re.VERBOSE)
py> score = {}
py> for info1, info2, info3 in m.findall(s):
...     score.setdefault(info1, {})[info3] = info2
...
py> score
{'10/10/04': {'23': '22'}}
Note that I use DOTALL to allow .* to cross line boundaries, MULTILINE
to have ^ apply at the start of each line, and VERBOSE to allow me to
write the re in a more readable form.
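Since the fields repeat several times per file, here is a rough sketch of how the pattern behaves on two records back to back. The second record's values are made up for illustration, and the names text, build, lazy and greedy are mine. With a non-greedy .*? between the labels, each match stays inside one record; with a greedy .* the single match runs from the first RelevantInfo1 to the last RelevantInfo2 and RelevantInfo3:

```python
import re

# Two records back to back; the second record's values are invented.
text = """\
Gibberish
53
RelevantInfo1
10/10/04
NothingImportant
RelevantInfo2
22
BlahBlah
RelevantInfo3
23
Hubris
RelevantInfo1
10/11/04
Crap
RelevantInfo2
7
RelevantInfo3
42
Crap
34
"""

def build(skip):
    # skip is the regex used to jump between labels:
    # ".*?" (non-greedy) or ".*" (greedy)
    return re.compile(
        r"^RelevantInfo1\n([^\n]*)" + skip +
        r"^RelevantInfo2\n([^\n]*)" + skip +
        r"^RelevantInfo3\n([^\n]*)",
        re.DOTALL | re.MULTILINE)

lazy = build(".*?").findall(text)
greedy = build(".*").findall(text)

print(lazy)    # [('10/10/04', '22', '23'), ('10/11/04', '7', '42')]
print(greedy)  # [('10/10/04', '7', '42')]
```

The greedy result mixes the first record's date with the last record's numbers, which is probably not what you want once a file holds more than one record.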
If I didn't get your dict update quite right, hopefully you can see how
to fix it!
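In case it helps with the dict update, here is one possible variation, only a sketch. It assumes a Python that has collections.defaultdict, and the tuples in matches are hypothetical stand-ins for what findall would return. It also converts the RelevantInfo2 value to an int, which is my guess at what you want to store:

```python
from collections import defaultdict

# Hypothetical (info1, info2, info3) tuples, in the order findall returns them.
matches = [
    ("10/10/04", "22", "23"),
    ("10/10/04", "5", "30"),
    ("10/11/04", "7", "42"),
]

# Missing outer keys get an empty inner dict automatically.
score = defaultdict(dict)
for info1, info2, info3 in matches:
    # Score[RelevantInfo1][RelevantInfo3] = value from RelevantInfo2
    score[info1][info3] = int(info2)

print(dict(score))
# {'10/10/04': {'23': 22, '30': 5}, '10/11/04': {'42': 7}}
```

If the same (RelevantInfo1, RelevantInfo3) pair can show up more than once, the later value overwrites the earlier one, just as with the setdefault version.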
HTH,
STeVe