Scalable python dict {'key_is_a_string': [count, some_val]}

Paul Rubin no.email at nospam.invalid
Sat Feb 20 01:47:41 EST 2010


krishna <krishna.k.0001 at gmail.com> writes:
> entries from two files match by key) and sorting using the 'sort'
> command. Thus the bottleneck is the 'sort' command.

That is a good approach. The sort command is highly optimized and will
beat any Python program that does something comparable. Set LC_ALL=C if
the file is all ASCII, since that makes sort compare raw bytes and skips
the slow locale-aware collation, so sorting goes even faster.
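If you are driving this from Python, one way to set the locale for just
the sort call is sketched below. The function name and file paths are
made up for illustration, and it assumes a POSIX-style sort command is
on your PATH.

```python
import os
import subprocess

def sort_file_bytewise(in_path, out_path):
    """Sort in_path into out_path with the external sort command.

    Hypothetical helper: runs sort with LC_ALL=C so lines are compared
    as raw bytes instead of with the current locale's collation rules.
    """
    env = dict(os.environ, LC_ALL="C")  # override the locale for this call only
    subprocess.run(["sort", "-o", out_path, in_path], env=env, check=True)
```

Note that in the C locale uppercase ASCII letters sort before lowercase
ones, so the order can differ from what a locale such as en_US gives.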

> By the way, is there a linux command that does the merging part?

sort -m
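A rough sketch of calling it from Python follows. The function name is
hypothetical, and it assumes every input file was already sorted under
the same locale (LC_ALL=C here), since sort -m only merges and does not
re-sort its inputs.

```python
import os
import subprocess

def merge_sorted_files(sorted_paths, out_path):
    """Merge already-sorted files into out_path using 'sort -m'.

    Hypothetical helper: the inputs must have been sorted with the same
    locale that is used for the merge, otherwise the output order is wrong.
    """
    env = dict(os.environ, LC_ALL="C")  # must match the locale used when sorting
    subprocess.run(["sort", "-m", "-o", out_path, *sorted_paths],
                   env=env, check=True)
```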

Note that the sort command already does external sorting, so if you
can just write out one large file and sort it, instead of sorting and
then merging a bunch of smaller files, that may simplify your task.
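Once everything is in one sorted file, equal keys sit on adjacent lines,
so the counting can be done in a single streaming pass without holding a
dict in memory. A minimal sketch, assuming each line looks like
key<TAB>rest (that layout and the function name are my guesses, not
something from your description):

```python
from itertools import groupby

def count_sorted_keys(lines):
    """Yield (key, count) pairs from lines that are already sorted.

    Assumes a hypothetical 'key<TAB>rest' line layout. Because the input
    is sorted, all lines for one key are adjacent and groupby can count
    them while keeping only one key in memory at a time.
    """
    keys = (line.rstrip("\n").split("\t", 1)[0] for line in lines)
    for key, group in groupby(keys):
        yield key, sum(1 for _ in group)
```

You would feed it an open file object for the sorted file, e.g.
count_sorted_keys(open(path)), and write the results out as you go.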


