[Tutor] Sorting a list in an odd order

Kent Johnson kent37 at tds.net
Wed Sep 6 13:51:27 CEST 2006


Matt Williams wrote:
> Dear List,
>
> I've written a small script to extract the definitions from my thesis, 
> and output them as a .tex file, which works ok but I have a small problem.
>
> The input is done by specifying a directory, and using glob to find the 
> ".tex" filenames.
>
> However, I want to process them so that they are arranged in the correct 
> order, which means I need to sort the list of files. Of course, because 
> they aren't named in any (obvious) order, I'm a bit stuck.
>
> I thought about using a dictionary to map names and order: so {"OAF":1, 
> "Valuation":2...etc}, but I don't know how to take it on from 
> here. I was thinking of looking up the filename in the dictionary (using 
> .startswith() to get some basic rough-matching capacity) and then using 
> that to return the order that the files should be handled in.

I'm not entirely sure what you want to do - some example filenames would 
help.

I guess that "OAF" and "Valuation" are prefixes to the filenames so you 
might have files named OAF2 and Valuation4, is that right? Then the sort 
would be by prefix, then by the number following?

If this is correct, I think a list would be more helpful than a 
dictionary since dictionary lookup is always by exact key. I would make 
a helper function that makes a tuple of (index of prefix in key list, 
exact filename). Then use the "key=" parameter to sort on these tuples. 
For example:

# The ordered list of prefixes
In [1]: prefixes = ['OAF', 'ABC', 'Valuation']

# Some filenames
In [2]: filenames = 'ABC3 ABC1 Valuation2 OAF3 Valuation5 OAF2'.split()

# This is the key function, it finds the matching prefix for a name
In [3]: def makekey(name):
   ...:     for i, prefix in enumerate(prefixes):
   ...:         if name.startswith(prefix):
   ...:             return (i, name)
   ...:     return (len(prefixes), name)
   ...:

# Show what makekey does
In [4]: [makekey(name) for name in filenames]
Out[4]:
[(1, 'ABC3'),
 (1, 'ABC1'),
 (2, 'Valuation2'),
 (0, 'OAF3'),
 (2, 'Valuation5'),
 (0, 'OAF2')]

# Sort using makekey
In [5]: filenames.sort(key=makekey)

# It works!
In [6]: filenames
Out[6]: ['OAF2', 'OAF3', 'ABC1', 'ABC3', 'Valuation2', 'Valuation5']
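The same key can be applied directly to the glob results from the original question. The sketch below assumes the prefixes appear at the start of each file's base name, so it matches against os.path.basename() and the directory part of a glob result does not defeat startswith(). The helper name ordered_tex_files and the prefix values are illustrative only.

```python
import glob
import os

# Illustrative ordered list of prefixes (same values as the session above)
prefixes = ['OAF', 'ABC', 'Valuation']

def makekey(path):
    """Sort key: (index of matching prefix, base name).

    Only the base name is matched, so the directory in a glob
    result does not interfere with startswith().
    """
    name = os.path.basename(path)
    for i, prefix in enumerate(prefixes):
        if name.startswith(prefix):
            return (i, name)
    # Names with no known prefix sort after all known ones
    return (len(prefixes), name)

def ordered_tex_files(directory):
    """Hypothetical helper: the .tex files in directory, in prefix order."""
    pattern = os.path.join(directory, '*.tex')
    return sorted(glob.glob(pattern), key=makekey)
```

sorted() returns a new list and leaves the glob result untouched; filenames.sort(key=makekey) in the session above sorts in place instead.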

There are several variations possible depending on what the data looks 
like. If the number part of the filename has varying numbers of digits, 
you will have to convert it to an integer to get the correct sort order; 
as strings, 'ABC10' sorts before 'ABC9'. 
If you have a *lot* of files and prefixes, the lookup of the prefix 
might be too costly (it is a linear search of a list). Then maybe you 
want to use a regular expression to pick off the prefix and look it up 
in a dictionary to get the index.
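A minimal sketch combining both variations, assuming every filename is an alphabetic prefix followed by an optional number (e.g. Valuation12.tex). The names prefix_order and name_re are hypothetical; the regular expression picks off the prefix and digits, the dictionary gives the prefix's position in constant time, and the digits are compared as an integer.

```python
import re

# Hypothetical prefix order; the dict maps prefix -> position for O(1) lookup
prefix_order = {'OAF': 0, 'ABC': 1, 'Valuation': 2}

# Assumed filename shape: letters, then optional digits, e.g. "Valuation12"
name_re = re.compile(r'([A-Za-z]+)(\d*)')

def makekey(name):
    """Sort key: (prefix position, numeric suffix, full name)."""
    m = name_re.match(name)
    if m is None:
        # Name does not start with a letter: sort it after everything else
        return (len(prefix_order), 0, name)
    prefix, digits = m.groups()
    # Unknown prefixes also sort after all known ones
    index = prefix_order.get(prefix, len(prefix_order))
    # Compare the number as an int so 9 sorts before 10
    number = int(digits) if digits else 0
    return (index, number, name)
```

With this key, sorted('ABC10 ABC9 Valuation2 OAF3 Valuation12 OAF2'.split(), key=makekey) gives ['OAF2', 'OAF3', 'ABC9', 'ABC10', 'Valuation2', 'Valuation12']. The trade-off is that the dictionary lookup is by exact key, so a name with more letters right after the prefix (say OAFintro) is not recognised; the startswith() loop above does handle that case.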

I hope I haven't completely misunderstood what you want to do :-)

Kent


