[Chicago] is there really no built-in file/iter split() thing?

Kumar McMillan kumar.mcmillan at gmail.com
Sun Dec 2 04:58:50 CET 2007


On Dec 1, 2007 8:17 PM, Carl Karsten <carl at personnelware.com> wrote:
> Kumar McMillan wrote:
> > On Dec 1, 2007 4:59 PM, Carl Karsten <carl at personnelware.com> wrote:
> >>  > Assuming open('big.sql').read().split(';') would be a dumb idea,
> >>
> >> How about we just not assume that?  If it is, let's see the proof so we have a
> >> good idea of how bad it is, which will help gauge how elaborate a workaround is
> >> justified.
> >
> > The file I was parsing was 45M.  If you want to test it on *your*
> > machine, go ahead and post back the results :)  It would be nice to
> > see, actually.  My assumption is that it will try to allocate at least
> > 90M of memory but, yes, it is still just an assumption.
>
> carl@vaio:~$ free -m
>               total       used       free     shared    buffers     cached
> Mem:           376         26        349          0          0          4
> -/+ buffers/cache:         21        354
> Swap:          627         56        570
> carl@vaio:~$ time python
> Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32)
> [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import datetime
>  >>> s=datetime.datetime.now()
>  >>> x='abc;'*(45000000/4)
>  >>> datetime.datetime.now() - s
> datetime.timedelta(0, 9, 131028)
>  >>> len(x)
> 45000000
>  >>> s=datetime.datetime.now()
>  >>> y=x.split(';')
>  >>> datetime.datetime.now() - s
> datetime.timedelta(0, 23, 222340)
>  >>> len(y)
> 11250001
>  >>>
>
> real    2m48.391s
> user    0m4.016s
> sys     0m2.320s
>
> in a 2nd shell, after doing y=...
> carl@vaio:~$ ps vp 7191
>    PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
>   7191 pts/2    S+     0:03    283   985 461442 340092 88.1 python
>
> Anyone know what that means?

You could try using PySizer instead:
http://pysizer.8325.org/
http://pysizer.8325.org/doc/tutorial.html

"PySizer is a memory usage profiler for Python code."

I've never tried using it myself.
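
Failing that, the RSS column in the ps output above is resident memory in kilobytes, so the interpreter was holding roughly 330 MB once y existed. A rough way to see where that goes, without any third-party tool, is to add up object sizes with sys.getsizeof. This is only a sketch: sys.getsizeof needs Python 2.6 or later (so not the 2.5.1 in the transcript), the code is written for Python 3, estimate_split_memory is a made-up helper name, and the numbers it gives depend on the interpreter build.

```python
import sys


def estimate_split_memory(text, sep=";"):
    """Return (bytes used by text, bytes used by text.split(sep)).

    Hypothetical helper for illustration; a rough estimate only.
    """
    parts = text.split(sep)
    # getsizeof on a list counts only the list's own pointer array,
    # so the string objects it refers to have to be added separately.
    # Count each distinct object once, since the interpreter may reuse
    # one object for some repeated pieces (for example the empty string).
    distinct = {id(p): p for p in parts}
    parts_bytes = sys.getsizeof(parts)
    parts_bytes += sum(sys.getsizeof(p) for p in distinct.values())
    return sys.getsizeof(text), parts_bytes


# Same shape as the test string in the transcript, but much smaller.
source_bytes, parts_bytes = estimate_split_memory("abc;" * 100000)
print("source string:", source_bytes, "bytes")
print("split result: ", parts_bytes, "bytes")
```

The point is that every tiny piece carries its own per-object overhead, so the split result can be several times larger than the string it came from.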

>
> Most of the 2m48s was after I hit ^D to exit Python.  Not really sure why that
> would take so much longer than creating y.  I've got too much stuff open on my box
> with 1 GB.
>
>
> Carl K
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago
>

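As for the original question, I don't know of a built-in that does this, but a lazy alternative to open('big.sql').read().split(';') can be sketched as a small generator that reads fixed-size chunks and yields pieces as they complete. The name iter_split and the chunk_size default are my own choices, and the code assumes Python 3 and a text-mode file object. It also knows nothing about SQL, so a ';' inside a string literal would still be treated as a separator.

```python
import io


def iter_split(fileobj, sep=";", chunk_size=64 * 1024):
    """Yield sep-delimited pieces of fileobj without reading it all at once.

    Hypothetical helper; yields the same pieces str.split(sep) would
    return for the whole file contents.
    """
    buf = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pieces = buf.split(sep)
        # The last piece may be incomplete (or end in half a separator),
        # so hold it back until more data arrives.
        buf = pieces.pop()
        for piece in pieces:
            yield piece
    # Whatever is left at end of file is the final piece, possibly empty.
    yield buf


# Small in-memory stand-in for a big .sql file.
demo = io.StringIO("select 1;select 2;")
statements = [s for s in iter_split(demo) if s.strip()]
```

With a real file it would be used as `for statement in iter_split(open('big.sql')): ...`, and memory use stays around one chunk plus the longest single piece rather than the whole file.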
