Python's simplicity philosophy

Erik Max Francis max at
Thu Nov 20 23:17:19 CET 2003

Curt wrote:

> curty at einstein:~$ less uniq.txt
> flirty
> curty
> flirty
> curty
> curty at einstein:~$ uniq uniq.txt
> flirty
> curty
> flirty
> curty
> curty at einstein:~$ sort uniq.txt | uniq
> curty
> flirty
> Maybe my uniq is unique.

No, that's expected behavior, and consistent with what I said.  uniq
doesn't really care whether its input is sorted; it just takes
consecutive sequences of identical lines (identical by the criteria
you specify on the command line) and collapses them into at most one.
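
As a rough illustration in Python (a sketch of the default behavior
only, not how uniq is actually implemented; uniq_lines is a name made up
for this example), the same collapsing of adjacent duplicates can be
written with itertools.groupby:

```python
from itertools import groupby

def uniq_lines(lines):
    """Collapse each run of consecutive identical lines into one line."""
    # groupby only groups *adjacent* equal items; it never looks
    # further back, so no sorting is assumed or required.
    return [key for key, _run in groupby(lines)]

sample = ["flirty", "curty", "flirty", "curty"]
print(uniq_lines(sample))          # no adjacent duplicates, so unchanged
print(uniq_lines(sorted(sample)))  # sorting makes the duplicates adjacent
```

The first call returns the list unchanged; only after sorting do the
duplicates sit next to each other and get collapsed.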

In your sample, there were no consecutive lines that were identical, so
uniq did nothing.  Change the order of them, and despite still being
non-sorted, you'll see that uniq is working:

max at oxygen:~/tmp% cat > uniq.txt
max at oxygen:~/tmp% uniq uniq.txt

The duplicate consecutive "curty" lines got collapsed into one.

>        uniq - remove duplicate lines from a sorted file
>                                             ******

It's true that that's in the one-line description of uniq on some systems,
such as GNU, since that's the most common usage.  But if you look at the
description of what it actually does, you'll see its behavior doesn't
require sorted input:

       Discard  all  but  one  of successive identical lines from
       INPUT (or standard input), writing to OUTPUT (or  standard
       output).

And on some systems, the summary doesn't mention sorting at all; Solaris
8, for instance:

     uniq - report or filter out repeated lines in a file

and sort is only mentioned in the "SEE ALSO" section, nowhere in the
main description.

For an example where uniq would be useful despite the input deliberately
not being sorted, consider processing a log file with a lot of duplicate
entries, and you only want to see the first of each series of
consecutive duplicates.  (This is actually not unheard of; syslog for
instance will do this automatically.)
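
A minimal sketch of that log-file case in Python, assuming made-up log
lines and a hypothetical helper named collapse_runs; it reports each run
of consecutive duplicates once, with a repeat count, much as uniq -c
would:

```python
from itertools import groupby

def collapse_runs(lines):
    """Yield (count, line) for each run of consecutive identical lines."""
    for line, run in groupby(lines):
        yield sum(1 for _ in run), line

# Hypothetical log entries, deliberately left in arrival order.
log = [
    "disk full",
    "disk full",
    "disk full",
    "login ok",
    "disk full",
]

for count, line in collapse_runs(log):
    print(count, line)
```

Note that the final "disk full" is kept as its own entry: sorting the
log first would merge it with the earlier run and lose the ordering
that makes the log meaningful.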

   Erik Max Francis && max at &&
 __ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/  \ 
\__/ Extremes meet.
    -- John Hall Wheelock
