"Newbie" questions - "unique" sorting ?
jdhunter at ace.bsd.uchicago.edu
Wed Jun 25 05:11:59 CEST 2003
>>>>> "John" == John Fitzsimons <xpm4senn001 at sneakemail.com> writes:
John> (B) I am wanting to sort words (or is that strings ?) into a
John> list from a clipboard and/or file input and/or....
John> (C) To sort out the list of "unique" words/strings.
The classic idiom for getting a unique list is to use a dictionary.
If you have enough memory to do everything in memory, the following
should be quite efficient:
allWords = open('myfile.dat').read().split()
uwords = list(dict([(w, 1) for w in allWords]).keys())
By using list comprehensions to build the dict, as above, you avoid
some of the overhead of a manual loop approach.
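For what it's worth, in a current Python the same idea can be sketched with a set, and sorted() covers the sorting half of the question. The helper name unique_sorted_words is made up for illustration, and splitting on whitespace is only an assumption about what counts as a "word".

```python
def unique_sorted_words(text):
    # split() with no arguments breaks on any run of whitespace
    words = text.split()
    # a set keeps one copy of each word; sorted() returns a new sorted list
    return sorted(set(words))


sample = "the quick brown fox jumps over the lazy dog the end"
print(unique_sorted_words(sample))
```

Note that the sort is case sensitive, so capitalized words come before lowercase ones; passing key=str.lower to sorted() is one way around that if it matters.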
Although this approach favors speed over memory, in my own
experience processing text files, it is the way to go. Very large
text files (you mentioned 50MB) are extremely rare. For example, the
entire King James bible, including html markup, is < 5MB. The
complete works of Shakespeare, including html markup, are < 10MB. So
I think it would be unusual for you to need to process a single text
file larger than 10MB. Unless you have a specific example where you
need to process such extremely large files, I recommend doing as much
as possible in memory.
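If you ever do hit a file that will not fit comfortably in memory, a line-at-a-time variant is a small change. This is only a rough sketch: it assumes the set of unique words still fits in memory even if the whole file does not, and the function name unique_words_from_lines is hypothetical.

```python
import io


def unique_words_from_lines(lines):
    # 'lines' can be any iterable of strings, including an open file,
    # which yields one line at a time instead of loading everything
    seen = set()
    for line in lines:
        seen.update(line.split())
    return sorted(seen)


# StringIO stands in for a real file here so the example is self-contained
fake_file = io.StringIO("to be or not\nto be that is\n")
print(unique_words_from_lines(fake_file))
```

With a real file you would pass the open file object itself, for example inside a "with open('myfile.dat') as fh:" block.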