sorting 1172026 entries
Gary Herron
gary.herron at islandtraining.com
Sun May 6 12:37:53 EDT 2012
On 05/06/2012 09:29 AM, J. Mwebaze wrote:
> sorry see, corrected code
>
>
> for filename in txtfiles:
>     temp=[]
>     f=open(filename)
>     for line in f.readlines():
>         line = line.strip()
>         line=line.split()
>         temp.append((parser.parse(line[0]), float(line[1])))
>     temp=sorted(temp)
>     with open(filename.strip('.txt')+ '.sorted', 'wb') as p:
>         for i, j in temp:
>             p.write('%s %s\n' %(str(i),j))
Don't do
    temp = sorted(temp)
That creates a *new* sorted copy of the list, and the assignment then
frees the original list for deletion and garbage collection.
Instead, do the in-place sort:
    temp.sort()
Same result, less thrashing.
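A minimal sketch of the difference, assuming made-up sample data in place
of the real files (the names base, copy_sorted and original_id are only
for illustration):

```python
import datetime
import random

# Hypothetical stand-in for the (datetime, float) tuples read from a file.
base = datetime.datetime(2012, 5, 6)
temp = [(base + datetime.timedelta(days=random.randrange(1000)), random.random())
        for _ in range(1000)]

original_id = id(temp)

# sorted() builds a second list and leaves temp untouched.
copy_sorted = sorted(temp)

# list.sort() reorders temp itself and returns None.
result = temp.sort()

# temp is still the same list object, now in sorted order.
same_object = id(temp) == original_id
```

Note that temp.sort() returns None, so writing temp = temp.sort() would
throw the list away.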
The in-place sort will make your program slightly more efficient.
HOWEVER, it is not the solution to your week-long sort problem.
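One way to find out where the time really goes is to time the parsing and
the sorting separately. The following is only a sketch under assumptions:
I have not seen your files, so the line format ('YYYY-MM-DDTHH:MM:SS value')
is a guess, the data is generated, and datetime.strptime stands in for
dateutil's parser.parse. The helpers make_lines and timed_load_and_sort
are hypothetical names.

```python
import datetime
import random
import time

STAMP_FORMAT = '%Y-%m-%dT%H:%M:%S'  # assumed format, not taken from the real files


def make_lines(n, seed=0):
    """Generate n fake 'timestamp value' lines for testing."""
    rng = random.Random(seed)
    base = datetime.datetime(2012, 1, 1)
    lines = []
    for _ in range(n):
        stamp = base + datetime.timedelta(seconds=rng.randrange(10 ** 7))
        lines.append('%s %f' % (stamp.strftime(STAMP_FORMAT), rng.random()))
    return lines


def timed_load_and_sort(lines):
    """Parse the lines, sort in place, and report the time of each phase."""
    t0 = time.perf_counter()
    temp = []
    for line in lines:
        fields = line.split()
        stamp = datetime.datetime.strptime(fields[0], STAMP_FORMAT)
        temp.append((stamp, float(fields[1])))
    t1 = time.perf_counter()
    temp.sort()
    t2 = time.perf_counter()
    return temp, t1 - t0, t2 - t1


lines = make_lines(10000)
temp, parse_seconds, sort_seconds = timed_load_and_sort(lines)
print('parse: %.3fs  sort: %.3fs' % (parse_seconds, sort_seconds))
```

Running the same kind of timing on one of your real files should show
which phase is responsible for the delay.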
Gary Herron
>
>
> On Sun, May 6, 2012 at 6:26 PM, J. Mwebaze <jmwebaze at gmail.com
> <mailto:jmwebaze at gmail.com>> wrote:
>
> I have attached one of the files, try to sort and let me know the
> results. Kindly sort by date. Oops, I am told the file exceeds 25M.
>
> below is the code
>
> import glob
> txtfiles =glob.glob('*.txt')
> import dateutil.parser as parser
>
>
> for filename in txtfiles:
> temp=[]
> f=open(filename)
> for line in f.readlines():
> line = line.strip()
> line=line.split()
> temp.append((parser.parse(line[0]), float(line[1])))
> temp=sorted(temp)
> with open(filename.strip('.txt')+ '.sorted', 'wb') as p:
> for i, j in temp:
> p.write('%s %s\n' %(str(i),j))
>
>
> On Sun, May 6, 2012 at 6:21 PM, Devin Jeanpierre
> <jeanpierreda at gmail.com <mailto:jeanpierreda at gmail.com>> wrote:
>
> On Sun, May 6, 2012 at 12:11 PM, J. Mwebaze
> <jmwebaze at gmail.com <mailto:jmwebaze at gmail.com>> wrote:
> > [ (datetime, int) ] * 1172026
>
> I can't duplicate slowness. It finishes fairly quickly here. Maybe
> you could try posting specific code? It might be something else
> that is making your program take forever.
>
> >>> x = [(datetime.datetime.now() +
> datetime.timedelta(random.getrandbits(10)),
> random.getrandbits(32)) for _ in xrange(1172026)]
> >>> random.shuffle(x)
> >>> x.sort()
> >>>
>
> -- Devin
>
>
>
>
> --
> *Mob UG: +256 (0) 70 1735800 | NL +31 (0) 6 852 841 38 | Gtalk:
> jmwebaze | skype: mwebazej | URL: www.astro.rug.nl/~jmwebaze
> <http://www.astro.rug.nl/%7Ejmwebaze>
>
> /* Life runs on code */*
>
>
>
>
>
>
>