[Tutor] dictionary-ness ...

Magnus Lyckå magnus@thinkware.se
Wed May 14 07:37:17 2003


At 00:17 2003-05-14 -0400, David Broadwell wrote:
>As a 'little' project, I'm planning on creating a tool to index all of the
>words used in files on my HDD.
>The best schema I have thought of so far is dictionary storage as a prelim
>stage to database ...

Ok. As long as the keys are strings, you can replace the dictionary with a
shelve. See http://www.python.org/doc/current/lib/module-shelve.html
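
A minimal sketch of what that could look like, assuming you store a list
of file names under each word. The path and the sample file names are
made up for illustration; only shelve.open, item access and close come
from the shelve module itself.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path would do.
shelf_path = os.path.join(tempfile.mkdtemp(), "wordindex")

# A shelf is used like a dictionary, but the keys must be strings.
db = shelve.open(shelf_path)
db["word"] = ["/home/david/notes.txt"]

# Values are pickled on assignment, so fetch the list, change it,
# and store it back instead of mutating it in place.
files = db["word"]
files.append("/home/david/todo.txt")
db["word"] = files
db.close()

# Reopening the shelf shows that the data was kept on disk.
db = shelve.open(shelf_path)
stored = db["word"]
db.close()
```

The store-it-back step matters: without writeback=True, appending to
db["word"] directly changes only a temporary copy.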

Another option would be to use an SQL database, maybe SQLite, but that
means that you have to learn a little SQL as well. You can use a
wrapper like SQLObject to avoid writing a lot of SQL, but you probably
need some SQL knowledge anyway. An SQL database is much more powerful
than a shelve, though: you get a lot of the query functions for free.
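
As a rough sketch of the SQL route, here is one possible layout using the
sqlite3 module from the standard library of current Python versions. The
table name, column names, helper functions and sample rows are all my own
assumptions, not anything from your project.

```python
import sqlite3

# In-memory database for illustration; pass a file name to keep it on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (word, file) pair.
conn.execute("CREATE TABLE occurrence (word TEXT, path TEXT)")

rows = [
    ("word", "/home/david/notes.txt"),
    ("wording", "/home/david/notes.txt"),
    ("word", "/home/david/todo.txt"),
]
conn.executemany("INSERT INTO occurrence (word, path) VALUES (?, ?)", rows)


def files_containing(word):
    """Return the sorted paths of the files in which word occurs."""
    cur = conn.execute(
        "SELECT DISTINCT path FROM occurrence WHERE word = ? ORDER BY path",
        (word,),
    )
    return [path for (path,) in cur]


def words_starting_with(prefix):
    """Return the sorted distinct words that begin with prefix."""
    cur = conn.execute(
        "SELECT DISTINCT word FROM occurrence WHERE word LIKE ? ORDER BY word",
        (prefix + "%",),
    )
    return [word for (word,) in cur]
```

Here files_containing("word") gives both file names, and
words_starting_with("wo") gives ['word', 'wording'], so a prefix lookup
comes from the query rather than from the way the data is stored.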

>Whacking down to the 'words' in the files is trivial and with os.path.walk's
>help, just a few minutes in the interpreter and I can grab the wordlist of
>the files.

So what is the purpose? Do you want to be able to enter a word and
see in what files it occurs, or what?
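
If that is the goal, the whole words can be the keys. A sketch under a
few assumptions of mine: index_words is a hypothetical name, a "word" is
taken to be a run of ASCII letters, and os.walk is used because
os.path.walk no longer exists in Python 3.

```python
import os
import re


def index_words(root):
    """Map each lowercased word to the set of files under root containing it."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are dropped; binary files yield few words.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip it
            # Assumption: a word is one or more ASCII letters.
            for word in re.findall(r"[a-z]+", text.lower()):
                index.setdefault(word, set()).add(path)
    return index
```

Looking up index.get("word", set()) then gives the files in which
"word" occurs, and each key is unique, so there is nothing to clash.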

>Now for the dictionary design, I was planning on as a stage one, the
>dictionary being keyed on the first two letters of the word. No biggie ...
>The real question is, let's say I have a ['word', 'wording'] in the list to
>add to the dictionary, if I'm keying on 'wo' for both of them, I believe I'm
>going to run into key name clashes.

I don't understand the purpose of this. Why key on parts of a
word? Why two letters?

>So I went to the next level with the idea, keep the same key schema, but use
>lists AS my value. Which would look like;
>dict = {['wo',['word','wording]], ...}

No, it would not look like that. It might look like...

dict = {'wo': ['word', 'wording'], ... }

...but I can't see the point in storing anything like that. What is
the purpose of this dictionary?
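
For what it's worth, a dictionary of that shape can be filled without
any key clash by appending to the list stored under each key. This is
only a sketch; group_by_prefix is a hypothetical helper, and the result
is named groups because calling it dict would hide the built-in type.

```python
def group_by_prefix(words, length=2):
    """Group words into lists keyed on their first `length` letters."""
    groups = {}
    for word in words:
        # setdefault creates the empty list the first time a key is seen.
        groups.setdefault(word[:length], []).append(word)
    return groups


groups = group_by_prefix(["word", "wording", "walk"])
# groups == {'wo': ['word', 'wording'], 'wa': ['walk']}
```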

There are several Python-based search engines already. I think
a list was posted recently on this mailing list. Perhaps you
would learn something from studying some of them.


--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program