little question
Shagshag
shagshag13 at yahoo.fr
Sat May 25 04:39:56 EDT 2002
Kragen Sitaker <kragen at pobox.com> wrote in message news:<83vg9dl9vh.fsf at panacea.canonical.org>...
> shagshag13 at yahoo.fr (Shagshag) writes:
> > As I'm still a newbie in Python, comments are welcome.
> > ...
> > documents = {}
> >
> > documents[0] = 'pease porridge hot pease porridge cold'
> > documents[1] = 'pease porridge in the pot'
> > documents[2] = 'nine days old'
> > documents[3] = 'some like it hot some like it cold'
> > documents[4] = 'some like it in the pot'
> > documents[5] = 'nine days old'
>
> I'd write this whole stretch here as
> documents = ['pease porridge hot pease porridge cold',
>              'pease porridge in the pot',
>              'nine days old',
>              'some like it hot some like it cold',
>              'some like it in the pot',
>              'nine days old']
>
> This turns documents from a dict into a list.
Yes, I missed that. Thanks!
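A minimal sketch of the list version, written in Python 3 syntax. The loop body (counting words per document into a hypothetical 'term_totals' dict) is only illustrative and is not the original indexing code; the point is that the list position doubles as the document ID:

```python
documents = ['pease porridge hot pease porridge cold',
             'pease porridge in the pot',
             'nine days old',
             'some like it hot some like it cold',
             'some like it in the pot',
             'nine days old']

# With a list, a document's ID is just its position, so enumerate()
# supplies the IDs 0, 1, 2, ... with no separate bookkeeping.
term_totals = {}
for doc_id, text in enumerate(documents):
    term_totals[doc_id] = len(text.split())
```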
> > terms = documents[i].split()
> > added = []
>
> I think 'added' should be a dict, at which point you can use not
> added.has_key(t) instead of t not in added (although t not in added
> will still work in recent Pythons) and added[t] = 1 instead of
> added.append(t). For your sample documents, it doesn't matter, but
> for documents with more than a few hundred words, you'll spend all
> your time scanning the 'added' list otherwise.
OK, thanks again!
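A minimal sketch of using a dict as the 'added' table, as suggested above. It is written for Python 3, where dict.has_key() no longer exists and "t not in added" is the way to spell the test; the sample sentence is one of the documents above, and 'unique_terms' is a hypothetical name added for illustration:

```python
terms = 'some like it hot some like it cold'.split()

# 'added' is a dict used as a "seen" table: membership tests are hash
# lookups, whereas "t not in some_list" scans the list from the start.
added = {}
unique_terms = []
for t in terms:
    if t not in added:      # was: not added.has_key(t) in old Pythons
        added[t] = 1
        unique_terms.append(t)
```

In current Pythons a set (added = set(); added.add(t)) does the same job with less noise.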
>
> > for t in terms:
> >     node = PostingListNode(i, terms.count(t))
>
> Similarly, terms.count() has to scan terms from beginning to end; I
> would instead use a dict to count the number of times each term is
> found in 'terms' and then actually add things to the invertedIndex
> after the end of this loop, in another loop over the contents of that
> dict.
I understand what you mean, but I'm sorry, I don't see how I
should do that...
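A minimal sketch of the two-loop approach described above, in Python 3. It uses plain (document ID, count) tuples in place of the original PostingListNode class, and the index layout (term mapped to a list of postings) is an assumption for illustration. Each document is scanned once to fill a 'counts' dict, and only then is one posting per distinct term added, which also makes the separate 'added' check unnecessary:

```python
documents = ['pease porridge hot pease porridge cold',
             'pease porridge in the pot',
             'nine days old']

# Hypothetical index layout: term -> list of (document ID, count) pairs.
inverted_index = {}

for doc_id, text in enumerate(documents):
    # Loop 1: count every term of this document in a single scan.
    counts = {}
    for t in text.split():
        counts[t] = counts.get(t, 0) + 1
    # Loop 2: after counting, add one posting per distinct term.
    for t, n in counts.items():
        inverted_index.setdefault(t, []).append((doc_id, n))
```

In Python 2.7 and 3.x, collections.Counter(text.split()) builds the same 'counts' mapping in one call.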
> How about 'return str((self.getDocumentID(), self.getInformation()))'?
OK, that's shorter.
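A sketch of what that one-line __str__ looks like in context. The class body below is a hypothetical minimal reconstruction containing only what __str__ needs; the getter names come from the quoted suggestion, and the rest of the original class is not shown here:

```python
class PostingListNode:
    """Hypothetical minimal version: just enough for __str__."""

    def __init__(self, document_id, information):
        self._document_id = document_id
        self._information = information

    def getDocumentID(self):
        return self._document_id

    def getInformation(self):
        return self._information

    def __str__(self):
        # Let the tuple do the formatting instead of building the string by hand.
        return str((self.getDocumentID(), self.getInformation()))

node = PostingListNode(0, 2)
```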
> Frankly, though, I'd be inclined to just use (documentID, information)
> tuples instead of defining a class.
No, in fact I was thinking of putting objects in "information", so
PostingListNode would be a sort of container.
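A sketch showing that the tuple approach still works when "information" is an object rather than a number. The TermStats class and its fields are hypothetical stand-ins for whatever the information object would actually hold:

```python
class TermStats:
    """Hypothetical payload standing in for whatever 'information' holds."""

    def __init__(self, count, positions):
        self.count = count
        self.positions = positions

    def __repr__(self):
        return 'TermStats(count=%d, positions=%r)' % (self.count, self.positions)

# A plain tuple can carry any object, so (documentID, information)
# still works when information is itself a container.
posting = (0, TermStats(2, [0, 3]))
doc_id, info = posting      # tuple unpacking replaces the getter methods
```

str(posting) then gives "(0, TermStats(count=2, positions=[0, 3]))" without any extra code.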
> Why don't you just store the lists of nodes that are in _container
> directly in the hash instead? It would shorten add() considerably,
> make get_nodes one line long, and __str__ could just return something
> like str(self._hash).
Here I wanted to save memory by having an array of words (or
multi-word terms) and working only with integers. But I'm not really
sure how I should do that, and maybe I missed this point too. Do you
know where I should go to read about this kind of thing?
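One way to combine both ideas, sketched under assumptions: the lists of postings are stored directly in the hash (as suggested above), but the hash is keyed by small integer term IDs, and each word string is kept once in a 'vocabulary' list. All names here ('term_ids', 'vocabulary', 'term_id') are hypothetical, and how much memory this saves in practice depends on the data and is worth measuring:

```python
documents = ['pease porridge hot pease porridge cold',
             'pease porridge in the pot']

term_ids = {}    # word -> integer ID
vocabulary = []  # integer ID -> word (each word string stored once)
index = {}       # integer term ID -> list of (document ID, count) pairs

def term_id(word):
    """Return the integer ID for word, assigning the next free ID if it is new."""
    if word not in term_ids:
        term_ids[word] = len(vocabulary)
        vocabulary.append(word)
    return term_ids[word]

for doc_id, text in enumerate(documents):
    counts = {}
    for word in text.split():
        tid = term_id(word)
        counts[tid] = counts.get(tid, 0) + 1
    for tid, n in counts.items():
        index.setdefault(tid, []).append((doc_id, n))
```

To print results, translate IDs back with vocabulary[tid].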
> Your caller here is going to be expecting a list in the normal case.
> Why is it better to return None instead of raising an exception in the
> abnormal case?
Well, for now, I don't fully understand the ins and outs of
exceptions...
> I can see that it might be better to return []
I'm going to do that!
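A minimal sketch contrasting the two choices discussed above, in Python 3. The index contents and the name get_nodes_strict are hypothetical; get_nodes returns [] for an unknown term, while the strict variant lets the KeyError propagate so the caller decides what to do:

```python
# Hypothetical index contents: term -> list of (document ID, count) pairs.
inverted_index = {'pease': [(0, 2), (1, 1)]}

def get_nodes(term):
    """Return the postings for term, or [] if the term was never indexed."""
    # dict.get returns the default instead of raising when the key is missing.
    return inverted_index.get(term, [])

def get_nodes_strict(term):
    """Hypothetical variant that treats an unknown term as an error."""
    return inverted_index[term]   # raises KeyError for a missing term

# With [], callers can loop directly; the loop body simply never runs.
total = 0
for doc_id, count in get_nodes('oatmeal'):
    total += count

# With the strict variant, the caller catches the exception explicitly.
try:
    nodes = get_nodes_strict('oatmeal')
except KeyError:
    nodes = []
```

Returning None would force every caller to test "if nodes is not None" before looping; an empty list and an exception both avoid that.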
Many thanks for all your comments!
S13.
PS: Do you think I could post a newer version here for comments? Or,
since it will be a long post, should I avoid that?