Orders of magnitude

Dang Griffith noemail at noemail4u.com
Wed Mar 31 10:51:58 EST 2004


On 30 Mar 2004 06:57:16 -0800, bucknuggets at yahoo.com (Buck Nuggets)
wrote:

>Christian Tismer <tismer at stackless.com> wrote in message news:<mailman.86.1080611520.20120.python-list at python.org>...
>> Buck Nuggets wrote:
>> 
>> > "Robert Brewer" <fumanchu at amor.org> wrote in message news:<mailman.38.1080542935.20120.python-list at python.org>...
>> > 
>> > In case you are interested in alternative approaches... here's how I
>> > typically do this:
>> > 
>> > step 1: sort the file using a separate sort utility (unix sort, cygwin
>> > sort, etc)
>> > 
>> > step 2: have a python program read in rows, 
>> >         compare each row to the prior,
>> >         write out only one row for each set
>> 
>> Good solution, but wayyyy too much effort.
>> You probably know it:
>> If you are looking for duplicates, and doing it by
>> complete ordering, then you are throwing lots of information
>> away, since you are not looking for neighborship, right?
>> That clearly means: it must be inefficient.
>> 
>> No offense, just trying to get you on the right track!
>
>Ha, that's ok.  I've been doing exactly this kind of thing for over
>twenty years (crusty old database developer).  I think that you will
>find that it is more efficient in both development and run time.  And
>it's simple enough that once you start down this path you won't need
>to brainstorm on how to get it to work.
>
>Rather than taking 2-18 hours with the previously mentioned solutions
>(which require index-building and 10 million index lookups), you'll
>probably do the entire thing in about 10 minutes (9 minutes to sort
>file + 1 minute to check dups).

From a crusty old unix developer to a crusty old database developer...
Part 2 can be done by piping the output of sort to the 'uniq' program
(available in cygwin and mingw also, I think).
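
For anyone who would rather keep step 2 in Python, here is a minimal
sketch of the "compare each row to the prior" pass. It assumes the input
has already been sorted so that identical lines are adjacent, which is
the same assumption 'uniq' makes. The function names and the file names
in the usage note are only illustrative, not anything from the earlier
posts.

```python
import sys


def unique_sorted_lines(lines):
    """Yield each distinct line once, assuming equal lines are adjacent."""
    previous = None
    for line in lines:
        # Emit a line only when it differs from the one just before it.
        if line != previous:
            yield line
        previous = line


def dedup_stream(src, dst):
    """Copy src to dst, dropping adjacent duplicate lines."""
    # Only the previous line is held in memory, so file size does not matter.
    dst.writelines(unique_sorted_lines(src))
```

Calling dedup_stream(sys.stdin, sys.stdout) from a small script (say,
dedup.py) would let it sit at the end of a pipe, along the lines of
"sort bigfile.txt | python dedup.py > unique.txt". Lines are compared
exactly as read, so a final line with no trailing newline will not match
an otherwise identical line that has one.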

And it's no effort, if it fits the bill.  It may be inefficient with
regard to sorting algorithms, but extremely efficient in terms of
system and developer resources.

    --dang


