Python's simplicity philosophy
Erik Max Francis
max at alcyone.com
Thu Nov 20 20:02:17 EST 2003
Curt wrote:
> Well, changing the order of the lines in my sample to ensure the
> contiguity of identical entries _is_ sorting.
The file I produced by tweaking your sample wasn't sorted; it just had two
identical adjacent lines. Again, the modified sample was:
max at oxygen:~/tmp% cat > uniq.txt
flirty
curty
curty
flirty
^D
max at oxygen:~/tmp% uniq uniq.txt
flirty
curty
flirty
You don't really think the sequence [flirty, curty, curty, flirty] is
sorted, do you?
> I don't know what else
> one could call that procedure, but "non-sorted" appears to me to be
> a rather provocative description of the modified sample which you were
> constrained to alter in order that it meet a criterion whose existence
> you deny.
man uniq on GNU:
DESCRIPTION
Discard all but one of successive identical lines from
INPUT (or standard input), writing to OUTPUT (or standard
output).
This says nothing about sorting.
man uniq on Solaris 8:
DESCRIPTION
The uniq utility will read an input file comparing adjacent
lines, and write one copy of each input line on the output.
The second and succeeding copies of repeated adjacent input
lines will not be written.
Repeated lines in the input will not be detected if they are
not adjacent.
Neither of these detailed descriptions makes any reference to sorting
whatsoever; uniq acts completely locally and doesn't care whether its
input is sorted or not.
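Since this is the Python list, the adjacent-only behavior described above can be mimicked with itertools.groupby, which likewise groups only consecutive equal items. This is a minimal sketch, not how any uniq is implemented. The function name uniq_lines is hypothetical, and the sketch ignores uniq's options, file handling and trailing newlines:

```python
from itertools import groupby


def uniq_lines(lines):
    """Keep one copy of each run of adjacent identical lines (like plain uniq)."""
    # groupby starts a new group whenever the value changes, so only
    # *adjacent* duplicates are collapsed; the input is never sorted.
    return [line for line, _run in groupby(lines)]


sample = ["flirty", "curty", "curty", "flirty"]
print(uniq_lines(sample))  # ['flirty', 'curty', 'flirty']
```

As with uniq, both non-adjacent "flirty" lines survive, and only the repeated "curty" is collapsed.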
As a more extended example, consider running uniq over a hypothetical
log file:
max at oxygen:~/tmp% cat > uniq2.txt
startup
connect from A
message from A
message from A
message from A
message from A
disconnect from B
mark
mark
connect from B
message from B
disconnect from B
shutdown
^D
max at oxygen:~/tmp% uniq uniq2.txt
startup
connect from A
message from A
disconnect from B
mark
connect from B
message from B
disconnect from B
shutdown
max at oxygen:~/tmp% uniq -c uniq2.txt # to see how many times each line occurs in a row
1 startup
1 connect from A
4 message from A
1 disconnect from B
2 mark
1 connect from B
1 message from B
1 disconnect from B
1 shutdown
I hope you'd agree that this input is obviously not sorted in any way.
Yet uniq works precisely as described.
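For Python readers, the counting behavior of uniq -c can be approximated the same way: itertools.groupby yields each run of consecutive equal lines, and the length of the run is the count. This is a rough sketch under that assumption only. The helper name uniq_count is hypothetical, and the output omits the space padding that uniq -c adds:

```python
from itertools import groupby


def uniq_count(lines):
    """Pair each run of adjacent identical lines with its length (like uniq -c)."""
    # Each group is one run of consecutive equal lines; count its members.
    return [(sum(1 for _ in run), line) for line, run in groupby(lines)]


log = [
    "startup",
    "connect from A",
    "message from A",
    "message from A",
    "message from A",
    "message from A",
    "disconnect from B",
    "mark",
    "mark",
    "connect from B",
    "message from B",
    "disconnect from B",
    "shutdown",
]

for count, line in uniq_count(log):
    print(count, line)
```

The two "disconnect from B" lines are reported separately, each with a count of 1, because they are not adjacent. This matches the uniq -c transcript.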
Yes, feeding uniq sorted input is obviously a common way to use it.
But it is by no means required.
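When global de-duplication is what you actually want, the usual idiom is to sort first (sort | uniq, or sort -u), so that equal lines become adjacent before uniq sees them. Here is a rough Python analogue, again only a sketch with a hypothetical helper name. Note that Python's default string ordering compares code points and may differ from sort's locale-dependent collation:

```python
from itertools import groupby


def sort_uniq(lines):
    """Sort, then collapse adjacent duplicates, so every value appears once."""
    # Sorting makes all equal lines adjacent, so groupby then removes
    # every duplicate rather than just consecutive ones.
    return [line for line, _run in groupby(sorted(lines))]


sample = ["flirty", "curty", "curty", "flirty"]
print(sort_uniq(sample))  # ['curty', 'flirty']
```

The sort step is the only thing that makes the result global. The collapsing step is the same purely local, adjacent-only operation either way.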
--
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \
\__/ We are victims of our circumstance.
-- Sade Adu