Data type ideas

John Roth johnroth at ameritech.net
Sat Mar 30 07:50:02 EST 2002


This is a sorting problem. If you can't do it in
memory (and I suspect that you probably can't),
look at the Unix sort command.
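Here is a minimal sketch of the sort-then-scan idea in Python. It assumes the records have already been reduced to (person, group) pairs. `group_by_sorting` and the sample `pairs` list are hypothetical names used only for illustration. Flipping each pair to (group, person) and sorting puts all members of a group next to each other, so a single linear pass with `itertools.groupby` can collect them.

```python
from itertools import groupby
from operator import itemgetter


def group_by_sorting(pairs):
    """Yield (group, [people]) for each group using sort + one linear scan.

    `pairs` is an iterable of (person, group) tuples (hypothetical input shape).
    """
    # Flip each record to (group, person) so sorting clusters each group together.
    flipped = sorted((group, person) for person, group in pairs)
    # After sorting, all records for one group are adjacent, so groupby
    # can gather them in a single pass.
    for group, records in groupby(flipped, key=itemgetter(0)):
        yield group, [person for _, person in records]


pairs = [("Person 1", "Group A"), ("Person 1", "Group B"),
         ("Person 2", "Group B"), ("Person 3", "Group A"),
         ("Person 3", "Group C")]
for group, people in group_by_sorting(pairs):
    print(group, people)
```

This version sorts in memory. If the pairs do not fit, the same idea should carry over: write one `group<TAB>person` line per membership to a file, sort that file with the Unix `sort` command, and then run the same `groupby` pass over the sorted file one line at a time.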

John Roth

"Joel Ricker" <joejava at dragoncat.net> wrote in message
news:mailman.1017469191.15784.python-list at python.org...
Hi all, got a new problem :)

I have a tab delimited file of people plus a list of groups they belong
to like so:

Person 1    Group A
            Group B
Person 2    Group B
Person 3    Group A
            Group C
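One way to read this layout, as a sketch. It assumes each line is `person<TAB>group` and that a line with an empty person field (one starting with a tab) continues the previous person. That convention is inferred from the sample above, so adjust it to match the real file. `parse_memberships` is a hypothetical name. Because it is a generator, it works through the file one line at a time and does not load the whole file into memory.

```python
import io


def parse_memberships(lines):
    """Yield (person, group) pairs from the tab-delimited layout.

    Assumes "person<TAB>group" lines, where an empty person field means
    "same person as the previous line".
    """
    current_person = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        person, _, group = line.partition("\t")
        person, group = person.strip(), group.strip()
        if person:
            current_person = person  # a new person starts here
        if current_person is None:
            raise ValueError("continuation line before any person: %r" % line)
        yield current_person, group


# Usage: pass an open file object in place of this in-memory sample.
sample = io.StringIO(
    "Person 1\tGroup A\n"
    "\tGroup B\n"
    "Person 2\tGroup B\n"
    "Person 3\tGroup A\n"
    "\tGroup C\n"
)
print(list(parse_memberships(sample)))
```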

So basically a person can be part of one or more groups.  I'm looking to
process this list so that I can take each group and examine the list of
people in it.  Basically turn the list into:

Group A    Person 1
           Person 3
Group B    Person 1
           Person 2
Group C    Person 3
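Here is a small sketch of writing this group-first layout back out. It takes `{group: [people]}` as input and, mirroring the assumed input convention, starts each continuation line with a tab. `format_groups` is a hypothetical helper.

```python
def format_groups(groups):
    """Render {group: [people]} as group-first, tab-delimited lines.

    The group name appears only on its first line; later lines for the
    same group start with a tab (an assumed convention).
    """
    lines = []
    for group in sorted(groups):
        for i, person in enumerate(groups[group]):
            prefix = group if i == 0 else ""
            lines.append(prefix + "\t" + person)
    return "\n".join(lines)


print(format_groups({"Group A": ["Person 1", "Person 3"],
                     "Group B": ["Person 1", "Person 2"],
                     "Group C": ["Person 3"]}))
```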

The drawback to all this is that the file I'm working with is pretty big:
about 40 MB. Most of the file is extraneous data that I have weeded out
with regular expressions, but it is still a large data file.

My first (naive) approach was to just create a Dict using the name
of the group as the key and a list of people as the value. I learned
that, due to the overhead, that was going to take a lot of memory and
processing time.

It would look something like this:

{"Group A" : ["Person 1", "Person 3"],
 "Group B" : ["Person 1", "Person 2"],
 "Group C" : ["Person 3"]}

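For reference, here is the dict-of-lists approach done in a single pass and sketched with `collections.defaultdict`. The `invert` name and the (person, group) pair input are assumptions.

```python
from collections import defaultdict


def invert(pairs):
    """Build {group: [people]} from (person, group) pairs in one pass."""
    groups = defaultdict(list)  # missing groups start as an empty list
    for person, group in pairs:
        groups[group].append(person)
    return dict(groups)


print(invert([("Person 1", "Group A"), ("Person 1", "Group B"),
              ("Person 2", "Group B"), ("Person 3", "Group A"),
              ("Person 3", "Group C")]))
```

The `defaultdict` only removes the "is this group already a key?" check. Memory use still grows with one list entry per membership.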
My next idea: what about references?  Maybe create a list of people
and a Dict as above whose values are lists of references into the list
of people. But as I learned, you can't take references to simple data
objects (like a subscript of a list).  I could be wrong, but that's what
I gathered.  I tried using a list of integers for the value of the Group
Dict, "pointing" to the list of People:

{"Group A" : [0, 2],
 "Group B" : [0, 1],
 "Group C" : [2] }

["Person 1", "Person 2", "Person 3"]
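Here is a sketch of the integer-index variant that stores the indices compactly, assuming (person, group) pairs as input. `invert_with_indices` is a hypothetical name. Each group's list is replaced by an `array.array('i')`, so every membership is stored as a fixed-size C int instead of a pointer to a Python int object. That storage format may be where the index idea actually saves memory.

```python
from array import array


def invert_with_indices(pairs):
    """Return (people, groups): groups maps group name -> array of indices into people."""
    people = []        # index -> person name, each name stored once
    person_index = {}  # person name -> index
    groups = {}
    for person, group in pairs:
        idx = person_index.get(person)
        if idx is None:
            idx = len(people)
            person_index[person] = idx
            people.append(person)
        members = groups.get(group)
        if members is None:
            # array('i') holds raw C ints (usually 4 bytes each), not objects.
            members = groups[group] = array("i")
        members.append(idx)
    return people, groups


people, groups = invert_with_indices([
    ("Person 1", "Group A"), ("Person 1", "Group B"),
    ("Person 2", "Group B"), ("Person 3", "Group A"),
    ("Person 3", "Group C")])
for name, members in sorted(groups.items()):
    print(name, [people[i] for i in members])
```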

This helped a little, but not much, since it isn't much of a
change from what I had before.
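It may help to know that a Python list already holds references: each element is a pointer to an object, never a copy of it. The duplication in the first approach comes from the fact that every line read from the file produces a new string object, even when the same name has been seen before. The sketch below assumes (person, group) pairs and makes all groups share one string object per name by using `sys.intern` (in Python 2 this was the builtin `intern()`). `invert_shared` is a hypothetical name.

```python
import sys


def invert_shared(pairs):
    """Build {group: [people]} where equal names share one string object.

    sys.intern returns a single canonical object for equal strings, so each
    group list references that object rather than a fresh copy per line.
    """
    groups = {}
    for person, group in pairs:
        person = sys.intern(person)
        group = sys.intern(group)
        groups.setdefault(group, []).append(person)
    return groups


# Build the names at runtime, as a file reader would, so they start as distinct objects.
g = invert_shared([("".join(["Person ", "1"]), "Group A"),
                   ("".join(["Person ", "1"]), "Group B")])
print(g["Group A"][0] is g["Group B"][0])
```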

So what next?  Any ideas that I can use?

Thanks
Joel




