Orders of magnitude

Dang Griffith noemail at noemail4u.com
Wed Mar 31 17:51:58 CEST 2004

On 30 Mar 2004 06:57:16 -0800, bucknuggets at yahoo.com (Buck Nuggets) wrote:

>Christian Tismer <tismer at stackless.com> wrote in message news:<mailman.86.1080611520.20120.python-list at python.org>...
>> Buck Nuggets wrote:
>> > "Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.38.1080542935.20120.python-list at python.org>...
>> > 
>> > In case you are interested in alternative approaches...here's how I
>> > typically do this:
>> > 
>> > step 1: sort the file using a separate sort utility (unix sort, cygwin
>> > sort, etc)
>> > 
>> > step 2: have a python program read in rows, 
>> >         compare each row to the prior,
>> >         write out only one row for each set
>> Good solution, but wayyyy too much effort.
>> You probably know it:
>> If you are seeking duplicates, and doing it by
>> complete ordering, then you are throwing lots of information
>> away, since you are not searching for neighborship, right?
>> That clearly means: it must be inefficient.
>> No offense, just trying to get you on the right track!
>Ha, that's ok.  I've been doing exactly this kind of thing for over
>twenty years (crusty old database developer).  I think that you will
>find that it is more efficient in both development and run time.  And
>it's simple enough that once you start down this path you won't need
>to brainstorm on how to get it to work.
>Rather than taking 2-18 hours with the previously mentioned solutions
>(which require index-building and 10 million index lookups), you'll
>probably do the entire thing in about 10 minutes (9 minutes to sort
>file + 1 minute to check dups).
From a crusty old unix developer to a crusty old database developer...
Part 2 can be done by piping the output of sort to the 'uniq' program
(available in cygwin and mingw also, I think).

And it's no effort, if it fits the bill.  It may be inefficient with
regard to sorting algorithms, but extremely efficient in terms of
system and developer resources.
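For what it's worth, step 2 is only a few lines of Python.  A minimal sketch,
assuming the input file has already been sorted (e.g. `sort data.txt >
sorted.txt`); the file names are just placeholders:

```python
def write_unique(in_path, out_path):
    """Read sorted rows, compare each to the prior one, and write
    only the first row of each set -- the same job 'uniq' does."""
    with open(in_path) as src, open(out_path, "w") as dst:
        prior = None
        for row in src:
            if row != prior:  # first row of a new set of duplicates
                dst.write(row)
            prior = row
```

Since the file is sorted, duplicates are always adjacent, so one pass and one
row of memory is all it takes.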

