little question

Sat May 25 04:39:56 EDT 2002

Kragen Sitaker <kragen at pobox.com> wrote in message news:<83vg9dl9vh.fsf at panacea.canonical.org>...
> shagshag13 at yahoo.fr (Shagshag) writes:
> > As i'm still a newbie in python, comments are welcome.
> > ...
> > 	documents = {}
> > 
> > 	documents[0] = 'pease porridge hot pease porridge cold'
> > 	documents[1] = 'pease porridge in the pot'
> > 	documents[2] = 'nine days old'
> > 	documents[3] = 'some like it hot some like it cold'
> > 	documents[4] = 'some like it in the pot'
> > 	documents[5] = 'nine days old'
> 
> I'd write this whole stretch here as 
>     documents = ['pease porridge hot pease porridge cold',
>                  'pease porridge in the pot',
>                  'nine days old',
>                  'some like it hot some like it cold',
>                  'some like it in the pot',
>                  'nine days old']
> 
> This turns documents from a dict into a list.

Yes, I missed that, thanks!

> > 		terms = documents[i].split()
> > 		added = []
> 
> I think 'added' should be a dict, at which point you can use not
> added.has_key(t) instead of t not in added (although t not in added
> will still work in recent Pythons) and added[t] = 1 instead of
> added.append(t).  For your sample documents, it doesn't matter, but
> for documents with more than a few hundred words, you'll spend all
> your time scanning the 'added' list otherwise.

OK, thanks again!
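
A minimal sketch of the dict-as-set idea quoted above, assuming a recent
Python where "t not in added" on a dict is a hash lookup (has_key() no
longer exists in Python 3). The name unique_terms is made up for
illustration; the sample text is documents[0].

```python
# Sketch only: 'terms' and 'added' follow the names in the quoted code.
terms = 'pease porridge hot pease porridge cold'.split()

added = {}          # dict used as a set of terms already handled
unique_terms = []   # hypothetical name: terms in first-seen order

for t in terms:
    if t not in added:      # hash lookup instead of scanning a list
        added[t] = 1
        unique_terms.append(t)

print(unique_terms)
```

With a list, each membership test walks the list; with a dict, the cost
of a lookup stays roughly constant as the number of terms grows.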

> 
> > 		for t in terms:
> > 			node = PostingListNode(i, terms.count(t))
> 
> Similarly, terms.count() has to scan terms from beginning to end; I
> would instead use a dict to count the number of times each term is
> found in 'terms' and then actually add things to the invertedIndex
> after the end of this loop, in another loop over the contents of that
> dict.

I understand what you mean, but sorry, I don't see how I should do
that...
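
One possible reading of the suggestion, as a rough sketch rather than
the code from the thread: count each term of a document in a dict
first, then add the postings in a second loop over that dict. Here
plain (doc_id, count) tuples stand in for PostingListNode, and
inverted_index is assumed to be an ordinary dict instead of the
original index class.

```python
documents = ['pease porridge hot pease porridge cold',
             'pease porridge in the pot',
             'nine days old']

# Assumption: term -> list of (doc_id, count) tuples.
inverted_index = {}

for i in range(len(documents)):
    # First pass: count every term of this document once.
    counts = {}
    for t in documents[i].split():
        counts[t] = counts.get(t, 0) + 1

    # Second pass: one posting per distinct term.
    for t, n in counts.items():
        inverted_index.setdefault(t, []).append((i, n))

print(inverted_index['pease'])
```

This also makes the 'added' bookkeeping unnecessary, because each
distinct term appears only once as a key of counts.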

> How about 'return str((self.getDocumentID(), self.getInformation()))'?

Ok, it's shorter.

> Frankly, though, I'd be inclined to just use (documentID, information)
> tuples instead of defining a class.

No, in fact I was thinking of putting an object in "information", so
PostingListNode would be a sort of container.

> Why don't you just store the lists of nodes that are in _container
> directly in the hash instead?  It would shorten add() considerably,
> make get_nodes one line long, and __str__ could just return something
> like str(self._hash).

Here I wish to save memory by having an array of words (or multi-word
terms) and working only with integers. But I'm not really sure how I
should do that, and maybe I missed this point too. Do you know where I
should go to read about this kind of thing?
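
The "array of words plus integers" idea could look roughly like the
sketch below. The class TermTable and its methods are hypothetical
names, not part of the original code, and whether this actually saves
memory in Python depends on the data and would need measuring.

```python
class TermTable:
    """Hypothetical helper: gives each distinct term a small integer id."""

    def __init__(self):
        self._ids = {}      # term -> integer id
        self._terms = []    # integer id -> term

    def term_id(self, term):
        # Return the existing id, or assign the next free one.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def term(self, term_id):
        # Reverse lookup: integer id back to the word.
        return self._terms[term_id]


table = TermTable()
ids = [table.term_id(t) for t in 'nine days old nine'.split()]
print(ids, table.term(1))
```

The rest of the index would then store and compare only the integers,
and go back to the words through table.term() when printing.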

> Your caller here is going to be expecting a list in the normal case.
> Why is it better to return None instead of raising an exception in the
> abnormal case?

Well, for now I don't understand all the ins and outs of
exceptions...

>  I can see that it might be better to return []

I'm going to do that!
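
A small sketch of what returning [] might look like, combined with the
earlier suggestion of keeping the lists of nodes directly in the hash.
The class name InvertedIndex is an assumption; add, get_nodes, __str__
and _hash are the names mentioned in the quoted comments, and tuples
stand in for the nodes.

```python
class InvertedIndex:    # hypothetical class name
    def __init__(self):
        self._hash = {}     # term -> list of nodes

    def add(self, term, node):
        self._hash.setdefault(term, []).append(node)

    def get_nodes(self, term):
        # Unknown terms give an empty list, so callers can always loop.
        return self._hash.get(term, [])

    def __str__(self):
        return str(self._hash)


index = InvertedIndex()
index.add('hot', (0, 1))
index.add('hot', (3, 1))

found = index.get_nodes('hot')
missing = index.get_nodes('lukewarm')

# The exception alternative: plain dict access raises KeyError instead.
try:
    index._hash['lukewarm']
    raised = False
except KeyError:
    raised = True

print(found, missing, raised)
```

With [], a "for node in index.get_nodes(term)" loop simply runs zero
times for an unknown term, so the caller needs no special case; with
None, the same loop would fail with a TypeError.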

Many thanks for all your comments!

S13.

PS: do you think I could post a newer version here for comments? Or,
since it will be a long post, should I avoid it?