[Tutor] data structures

Steven D'Aprano steve at pearwood.info
Fri Dec 3 01:20:24 CET 2010


Knacktus wrote:
> On 02.12.2010 02:51, Dana wrote:
>> Hello,
>>
>> I'm using Python to extract words from plain text files. I have a list
>> of words. Now I would like to convert that list to a dictionary of
>> features where the key is the word and the value the number of
>> occurrences in a group of files based on the filename (different files
>> correspond to different categories). What is the best way to represent
>> this data? When I finish I expect to have about 70 unique dictionaries
>> with values I plan to use in frequency distributions, etc. Should I use
>> globally defined dictionaries?

> Depends on what else you want to do with the group of files. If you're 
> expecting some operations on the group's data you should create a class 
> to be able to add some more methods to the data. I would probably go 
> with a class.

Unless you're planning to have multiple "file groups" at once, or 
intending to re-use this code for other modules, using a class is 
probably overkill. This isn't Java where everything has to be a class :)


> But I also think dictionaries can be fine, if you really only need the
> dicts. You could write a function to create those.

Agreed.

One way or the other, a dict {word: count} is the natural data structure 
to use for a concordance. Whether you store those dicts in a class, or 
just operate directly on the dicts, is relatively unimportant.
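
As a rough sketch of the plain-dict approach (the function names and the
sample data below are made up for illustration, and collections.Counter is
just one convenient dict subclass for counting, not the only option), one
dict of {word: count} per category could be built like this:

```python
from collections import Counter


def count_words(words):
    """Return a {word: count} mapping for an iterable of words."""
    return Counter(words)


def counts_by_category(words_by_category):
    """Build one {word: count} dict per category.

    words_by_category is assumed to map a category name (for example,
    one derived from the filename) to the list of words extracted
    from the files in that category.
    """
    return {category: count_words(words)
            for category, words in words_by_category.items()}


# Made-up sample data standing in for words extracted from real files.
sample = {
    "sports": ["ball", "goal", "ball"],
    "politics": ["vote", "ball", "vote", "vote"],
}

counts = counts_by_category(sample)
print(counts["sports"]["ball"])    # 2
print(counts["politics"]["vote"])  # 3
print(counts["sports"]["vote"])    # 0, a Counter returns 0 for unseen words
```

Keeping the per-category dicts inside one outer dict keyed by category
name is one way to avoid defining some 70 separate global names; whether
that suits the rest of the program is a judgment call.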


-- 
Steven


