Efficient posting-list

Terry Reedy tjreedy at udel.edu
Mon Jun 3 18:31:10 CEST 2002

"Shagshag13" <shagshag13 at yahoo.fr> wrote in message
news:adfu52$10t9pp$1 at ID-146704.news.dfncis.de...
> By now i use a node with : 2 int and 1 float. Node objects are
> in a python list. ...(i can't use a tuple, i need to update values)

I would think that the data for a given keyword in a given document
would be fixed ... or do docs change without changing id?

> > For pure Python, tuple is most space efficient.  If
> > each node really is (integer id, integer count), one could devise
> > a system using a pair of array (module) objects or 2-dimensional
> > numerical python arrays, which store actual integers rather than
> > pointers to int PyObjects.  When filled, they would have to be
> > copied to larger arrays in much the same manner as done automatically by
> > Python lists.
> So here you think that i should use only one list  containing :
> [id_x, count_x, float_x, id_x+1, count_x+1, float_x+1, ...]
> and so on ?

With modification, yes.  To reword a bit: Python is a bit profligate
with space, especially when boxing numbers.  A tuple of three numbers +
the three number objects take up roughly ten times as much space as
the raw numbers themselves.  A list is a bit worse.  So, to save space
and (usually) time when working with 'lots' of numbers, one needs to
switch to (boxed) arrays of raw (unboxed) numbers.  I believe both the
array and numpy modules make such available but I have not yet worked
with either, so I will let you explore.  With your additional
information, I suggest that a node be a list or class instance
wrapping three arrays - idx, countx, floatx  - which you copy/extend
as needed.
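
A minimal sketch of the three-array node described above, using only the
standard array module.  The class name PostingList, its method names, and
the typecodes ('l' for the two integers, 'd' for the float) are
assumptions made for illustration, not anything from the original code.
array.append grows the underlying buffer itself, so no manual copying to
a larger array is shown here.

```python
from array import array


class PostingList:
    """Postings for one keyword, held in three parallel arrays.

    Position i in each array describes the same document, so the
    triple (ids[i], counts[i], weights[i]) plays the role of one node.
    """

    def __init__(self):
        self.ids = array('l')      # document ids, stored as raw C longs
        self.counts = array('l')   # occurrence counts, raw C longs
        self.weights = array('d')  # per-document float, raw C doubles

    def add(self, doc_id, count, weight):
        # Append one posting; the arrays stay the same length.
        self.ids.append(doc_id)
        self.counts.append(count)
        self.weights.append(weight)

    def update(self, pos, count, weight):
        # Arrays are mutable, so values can be changed in place,
        # which a tuple-based node would not allow.
        self.counts[pos] = count
        self.weights[pos] = weight

    def __len__(self):
        return len(self.ids)

    def __iter__(self):
        # Yield (id, count, weight) triples in insertion order.
        return zip(self.ids, self.counts, self.weights)


postings = PostingList()
postings.add(7, 2, 0.5)
postings.add(9, 1, 0.25)
postings.update(0, 3, 0.75)
```

Each array stores the numbers themselves rather than pointers to int or
float objects, which is where the space saving would come from.  The
actual ratio depends on the Python build and platform, so it is worth
measuring on real data before committing to this layout.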

Terry J. Reedy
