Efficient posting-list
Terry Reedy
tjreedy at udel.edu
Mon Jun 3 18:31:10 CEST 2002
"Shagshag13" <shagshag13 at yahoo.fr> wrote in message
news:adfu52$10t9pp$1 at ID-146704.news.dfncis.de...
> By now i use a node with : 2 int and 1 float. Node objects are handled
> in a python list. ...(i can't use a tuple, i need to update values)
I would think that the data for a given keyword in a given document
would be fixed ... or do docs change without changing id?
> > For pure Python, tuple is most space efficient. If
> > each node really is (integer id, integer count), one could devise a
> > system using a pair of array (module) objects or 2-dimensional
> > numerical python arrays; which store actual integers rather than
> > pointers to int PyObjects. When filled, they would have to be copied
> > to larger arrays in much the same manner as done automatically by
> > Python lists.
>
> So here you think that i should use only one list containing :
>
> [id_x, count_x, float_x, id_x+1, count_x+1, float_x+1, ...]
>
> and so on ?
With modification, yes. To reword a bit: Python is a bit profligate
with space, especially when boxing numbers. A tuple of three numbers plus
the three number objects take up roughly ten times as much space as
the raw numbers themselves. A list is a bit worse. So, to save space
and (usually) time when working with 'lots' of numbers, one needs to
switch to (boxed) arrays of raw (unboxed) numbers. I believe both the
array and numpy modules make such available, but I have not yet worked
with either, so I will let you explore. With your additional
information, I suggest that a node be a list or class instance
wrapping three arrays - idx, countx, floatx - which you copy/extend
as needed.
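
A minimal sketch of that suggestion, assuming the standard-library array
module and three parallel arrays per keyword. The class name PostingList
and the attribute names ids, counts and weights are made up for
illustration (they stand in for idx, countx, floatx above), and the
typecodes 'l' and 'd' are one plausible choice, not the only one.

```python
from array import array


class PostingList:
    """Postings for one keyword, held in three parallel typed arrays."""

    def __init__(self):
        # Each array stores raw C values, not pointers to Python objects.
        self.ids = array('l')      # document ids (C long)
        self.counts = array('l')   # occurrence counts (C long)
        self.weights = array('d')  # per-document float value (C double)

    def append(self, doc_id, count, weight):
        # array.append grows the underlying buffer as needed.
        self.ids.append(doc_id)
        self.counts.append(count)
        self.weights.append(weight)

    def update(self, pos, count, weight):
        # Unlike tuples, arrays can be modified in place.
        self.counts[pos] = count
        self.weights[pos] = weight

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, pos):
        # Rebuild a (id, count, weight) tuple only when one is asked for.
        return self.ids[pos], self.counts[pos], self.weights[pos]


postings = PostingList()
postings.append(12, 3, 0.25)
postings.append(40, 1, 0.5)
postings.update(0, 4, 0.75)
```

Since array objects grow on append and extend much as lists do, the
copying to larger arrays need not be done by hand in this variant.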
Terry J. Reedy