Multiline regex help

Yatima yatima_ at konishi.polis.net
Thu Mar 3 15:27:57 EST 2005


On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson <kent37 at tds.net> wrote:
>
> Here is a way to create a list of [RelevantInfo, value] pairs:
> import cStringIO
>
> raw_data = '''Gibberish
> 53
> MoreGarbage
> 12
> RelevantInfo1
> 10/10/04
> NothingImportant
> ThisDoesNotMatter
> 44
> RelevantInfo2
> 22
> BlahBlah
> 343
> RelevantInfo3
> 23
> Hubris
> Crap
> 34'''
> raw_data = cStringIO.StringIO(raw_data)
>
> data = []
> for line in raw_data:
>      if line.startswith('RelevantInfo'):
>          key = line.strip()
>          value = raw_data.next().strip()
>          data.append([key, value])
>
> print data
>

Thank you. This isn't exactly what I'm looking for (I wasn't clear in
describing the problem; please see my reply to Steve for what is hopefully
a better explanation), but it does give me a few ideas.
>
>> 
>> Score[RelevantInfo1][RelevantInfo3] = 22 # The value from RelevantInfo2
>
> I'm not sure what you mean by this. Do you want to build a Score dictionary as well?

Sure... Uhhh.. I think. Okay, what I want is some kind of awk-like
associative array because the raw data files will have repeats for certain
field values such that there would be, for example, multiple RelevantInfo2's
and RelevantInfo3's for the same RelevantInfo1 (i.e. on the same date). To
make matters more exciting, there will be multiple RelevantInfo1's (dates)
for the same RelevantInfo3 (e.g. a subject ID). RelevantInfo2 will be the
value for all unique combinations of RelevantInfo1 and RelevantInfo3. There
will be multiple occurrences of these fields in the same file (original data
sample was not very good for this reason) and multiple files as well. The
interesting three fields will always be repeated in the same order although
the amount of irrelevant data in between may vary. So:

RelevantInfo1
10/10/04
<snipped crap>
RelevantInfo2
12
<more snippage>
RelevantInfo3
43
<more snippage>
RelevantInfo1
10/10/04            <- The same as the first occurrence of RelevantInfo1
<snipped>
RelevantInfo2
22
<snipped>
RelevantInfo3
25
<snipped>
RelevantInfo1
10/11/04
<snipped>
RelevantInfo2
34
<snipped>
RelevantInfo3
28
<snipped>
RelevantInfo1
10/12/04
<snipped>
RelevantInfo2
98
<snipped>
RelevantInfo3
25                <- The same as the second occurrence of RelevantInfo3
...

Sorry for the long and tedious "data" example.

Some combinations of RelevantInfo1 and RelevantInfo3 will have no value at
all; hopefully that won't be an issue.
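
A minimal sketch of the awk-like associative array described above, written
for Python 3 rather than the Python 2 of the quoted code. It assumes each
field name sits alone on its own line with its value on the very next line,
and that the three fields always arrive in the order RelevantInfo1,
RelevantInfo2, RelevantInfo3. The names build_scores, FIELDS and sample are
made up for illustration, and values are kept as strings. This is one
possible approach, not necessarily the best one.

```python
from collections import defaultdict

# Placeholder field names from the example data (the real names will differ).
FIELDS = ("RelevantInfo1", "RelevantInfo2", "RelevantInfo3")


def build_scores(lines, scores=None):
    """Fill scores[RelevantInfo1][RelevantInfo3] = RelevantInfo2.

    Pass the same `scores` object back in to accumulate over several files.
    """
    if scores is None:
        scores = defaultdict(dict)
    lines = iter(lines)
    record = {}
    for line in lines:
        name = line.strip()
        if name in FIELDS:
            # Assumption: the value is the line right after the field name.
            record[name] = next(lines, "").strip()
            if name == "RelevantInfo3":
                # RelevantInfo3 closes a record; store it only if complete.
                if len(record) == len(FIELDS):
                    date = record["RelevantInfo1"]
                    subject = record["RelevantInfo3"]
                    scores[date][subject] = record["RelevantInfo2"]
                record = {}
    return scores


sample = """RelevantInfo1
10/10/04
junk
RelevantInfo2
12
RelevantInfo3
43
RelevantInfo1
10/10/04
RelevantInfo2
22
more junk
7
RelevantInfo3
25
RelevantInfo1
10/11/04
RelevantInfo2
34
RelevantInfo3
28
RelevantInfo1
10/12/04
RelevantInfo2
98
RelevantInfo3
25
"""

scores = build_scores(sample.splitlines())
```

For several files, something like `scores = build_scores(open(path), scores)`
in a loop should accumulate everything in one structure. A missing
combination can be checked without creating an empty entry by using
`scores.get(date, {}).get(subject)`, which gives None when there is no value.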

Thanks again for your reply.

Take care.

-- 
"I figured there was this holocaust, right, and the only ones left alive were
 Donna Reed, Ozzie and Harriet, and the Cleavers."
-- Wil Wheaton explains why everyone in "Star Trek: The Next Generation" 
    is so nice



More information about the Python-list mailing list