[Tutor] Help with re.sub()

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Fri Mar 17 06:03:41 CET 2006


> I have a file that is a long list of records (roughly) in the format
>
> objid at objdata
>
> So, for example:
>
> id1 at data1
> id1 at data2
> id1 at data3
> id1 at data4
> id2 at data1
> ....
>
> What I would like to do is run a regular expression against this and
> wind up with:

I'd recommend scratching out the requirement to use regular expressions.
*grin*

I'm actually not certain they're appropriate for this problem; knowing
about data structures such as lists and dictionaries seems more crucial
here.


> Actually, should I be able to do something like that?  If I execute it
> in my debugger, my string gets really funky... like the re is losing
> track of what the groups are... and I end up with a single really long
> string rather than what I expect..

I do not see an obvious regular expression that does what you want.
I'm not saying that no such regex exists (I'd have to think about it a
bit), but that simpler approaches will probably work out better.



Would you mind if we simplify the problem a bit?  Rather than working
directly on files, what if you were working on tuples where the id and the
data portion were already split up for you?

That is, would life be simpler for you if you had a list like:

[('id1', 'data1'),
 ('id1', 'data2'),
 ('id1', 'data3'),
 ('id1', 'data4'),
 ('id2', 'data1'),
 ...]

and, given input like this, you were to try to compute a dictionary
mapping each id to a list of its data?

{ 'id1' : ['data1', 'data2', 'data3', 'data4'],
  'id2' : ['data1'],
  ...}

Would this be something you'd know how to do?
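
In case it helps to see the shape of it, here is a rough sketch of both
steps: splitting each record into a tuple, then collecting the data under
each id.  It assumes the separator is literally " at ", as in the example
above, and the names split_record and group_by_id are made up purely for
illustration.  Treat it as one possible approach, not the only one.

```python
def split_record(line, sep=" at "):
    """Split one 'objid at objdata' record into an (objid, data) tuple.

    The separator is an assumption taken from the example records.
    """
    # Split only on the first separator, in case the data contains it too.
    objid, data = line.strip().split(sep, 1)
    return objid, data


def group_by_id(pairs):
    """Build a dictionary mapping each id to a list of its data."""
    groups = {}
    for objid, data in pairs:
        # setdefault creates the empty list the first time an id is seen.
        groups.setdefault(objid, []).append(data)
    return groups


# Sample records copied from the example; a real program would read
# these lines from the file instead.
lines = [
    "id1 at data1",
    "id1 at data2",
    "id1 at data3",
    "id1 at data4",
    "id2 at data1",
]

pairs = [split_record(line) for line in lines]
grouped = group_by_id(pairs)
print(grouped)
```

Once the records are grouped in a dictionary like this, writing them back
out in whatever format you need is just a loop over its items, with no
regular expression required.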


Best of wishes to you!


