How to sort a list of file paths

Eriksson, John john.eriksson at logica.com
Tue Dec 2 14:43:51 CET 2008


Hi again,

I've updated the example using the ideas and Python tricks from the pages I found via the link you gave me, Chris.

So... for future reference, here's the best (?) way of sorting a list of file names in the correct way (correct for some applications, at least).

Note: For some odd reason I seemed to get the best performance using a function defined with lambda instead of an ordinary def function.
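
If you want to check the lambda-versus-def difference on your own machine, something like the following rough sketch should do. It assumes the standard timeit module; the names lambda_key, def_key and names, and the generated test data, are made up for illustration. Timings will vary between machines and Python versions, so treat any difference with caution.

```python
import re
import timeit

RE_DIGIT = re.compile(r'(\d+)')

# The key function in lambda form, as in the example in this mail.
lambda_key = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

# The same logic written as an ordinary function (hypothetical name).
def def_key(s):
    return [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

# Made-up test data: 1000 file names in reverse numeric order.
names = ["File%d.txt" % n for n in range(1000, 0, -1)]

# Time 20 full sorts with each key function.
t_lambda = timeit.timeit(lambda: sorted(names, key=lambda_key), number=20)
t_def = timeit.timeit(lambda: sorted(names, key=def_key), number=20)
print("lambda: %.4fs  def: %.4fs" % (t_lambda, t_def))
```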

# --- EXAMPLE ---

import re
RE_DIGIT = re.compile(r'(\d+)')
ALPHANUM_KEY = lambda s: [int(g) if g.isdigit() else g for g in RE_DIGIT.split(s)]

file_list = ["File2.txt","File1.txt","File10.txt"]
file_list.sort(key=ALPHANUM_KEY)

# ---------------
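
For anyone who prefers an ordinary function, here is a minimal sketch of the same idea. The names alphanum_key and _DIGITS are made up for illustration. It relies on re.split() with a capturing group returning text chunks at even positions and digit runs at odd positions, so it decides what to convert by position instead of calling isdigit().

```python
import re

_DIGITS = re.compile(r'(\d+)')

def alphanum_key(s):
    """Split s into alternating text and number chunks; numbers become ints."""
    # With a capturing group, re.split() returns text at even indexes and
    # the captured digit runs at odd indexes, so chunk types always line up
    # when two keys are compared element by element.
    return [int(chunk) if i % 2 else chunk
            for i, chunk in enumerate(_DIGITS.split(s))]

file_list = ["File2.txt", "File1.txt", "File10.txt"]
print(sorted(file_list, key=alphanum_key))
# ['File1.txt', 'File2.txt', 'File10.txt']
```

The key for "File10.txt" is ['File', 10, '.txt'], so 10 is compared with 2 as a number and not as text.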

Best Regards
/John


-----Original Message-----
From: cvrebert at gmail.com [mailto:cvrebert at gmail.com] On Behalf Of Chris Rebert
Sent: den 2 december 2008 10:26
To: Eriksson, John
Cc: python-list at python.org
Subject: Re: How to sort a list of file paths

On Tue, Dec 2, 2008 at 12:36 AM, Eriksson, John
<john.eriksson at logica.com> wrote:
> Hi,
>
>
>
> This weekend I had some trouble getting a list of file paths sorted in a
> way that I could use.
>
>
>
> I also found a thread in this mailing list (
> http://mail.python.org/pipermail/python-list/2007-April/433590.html ) and
> realized that others might be interested in a solution.
>
>
>
> So... here is my five cents regarding file path sorting:
>
>
>
> Problem description:
>
>
>
> You have a list containing some file names:
>
>
>
> >>> file_list = ["File2.txt","File1.txt","File10.txt"]
>
>
>
> If you sort this list in the conventional way you end up with a result like:
>
>
>
> >>> file_list.sort()
> >>> print file_list
> ['File1.txt','File10.txt','File2.txt']
>
>
>
> Solution:
>
>
>
> Sort the list by splitting alphas and digits into groups and comparing
> them separately.
>
>
>
> import re
>
> def true_alphanum_cmp(a, b):
>     # Split each name into runs of digits and runs of non-digits.
>     aa = re.findall(r'\d+|\D+', a)
>     bb = re.findall(r'\d+|\D+', b)
>     for i in range(min(len(aa), len(bb))):
>         if aa[i].isdigit() and bb[i].isdigit():
>             # Both chunks are numeric: compare them as integers.
>             c = cmp(int(aa[i]), int(bb[i]))
>         else:
>             c = cmp(aa[i], bb[i])
>         if c != 0:
>             return c
>     # All shared chunks are equal: the name with fewer chunks sorts first.
>     return cmp(len(aa), len(bb))
>
> file_list = ["File2.txt","File1.txt","File10.txt"]
> file_list.sort(true_alphanum_cmp)
>
>
>
> If the formatting in this mail is messed up you can find the example at
> http://arainyday.se/notebook/true_alphanum_cmp.php
>
>
>
> All comments and improvements are welcome!

Sounds like the issue discussed in the post on Coding Horror:
http://www.codinghorror.com/blog/archives/001018.html
It even links to another Python version of the algorithm.

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com

>
>
>
> Best regards
>
> John Eriksson