So I have some data sets of about 160000 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected? Would it be faster if it were loading binary data? -gideon
On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:
So I have some data sets of about 160000 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected? Would it be faster if it were loading binary data?
i have run into this as well. loadtxt uses a python list to accumulate the data it reads in, so once the list grows to about 1/4th of your available memory, the reallocations (one every time a new value is read from the file) start hitting swap instead of main memory, which is ridiculously slow (in fact it makes my system quite unresponsive, with a jumpy cursor). i have rewritten loadtxt to be smarter about allocating memory, but it is slower overall and doesn't support all of the original arguments/options (yet). i have some ideas to make it smarter/more efficient, but haven't had the time to work on them recently. i will send the current version to the list tomorrow when i have access to the system it is on. best wishes, mike
On Sun, 1 Mar 2009 14:29:54 -0500 Michael Gilbert wrote:
i have rewritten loadtxt to be smarter about allocating memory, but it is slower overall and doesn't support all of the original arguments/options (yet).
i had meant to say that my version is slower for smaller data sets (when you aren't close to your main memory limit), but it is orders of magnitude faster for large data sets.
On Sun, Mar 1, 2009 at 11:29 AM, Michael Gilbert <michael.s.gilbert@gmail.com> wrote:
On Sun, 1 Mar 2009 16:12:14 -0500 Gideon Simpson wrote:
So I have some data sets of about 160000 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected? Would it be faster if it were loading binary data?
i have run into this as well. loadtxt uses a python list to accumulate the data it reads in, so once the list grows to about 1/4th of your available memory, the reallocations (one every time a new value is read from the file) start hitting swap instead of main memory, which is ridiculously slow (in fact it makes my system quite unresponsive, with a jumpy cursor). i have rewritten loadtxt to be smarter about allocating memory, but it is slower overall and doesn't support all of the original arguments/options (yet). i have some ideas to make it smarter/more efficient, but haven't had the time to work on them recently.
i will send the current version to the list tomorrow when i have access to the system that it is on.
best wishes, mike

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
to address the slowness, i use wrappers around savetxt/loadtxt that save/load a .npy file along with (or instead of) the .txt file, and the loadtxt wrapper checks whether the .npy is up to date. code here: http://rafb.net/p/dGBJjg80.html. of course it's still slow the first time. i look forward to your speedups. -brentp
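The caching wrapper brentp describes can be sketched roughly like this (function names are illustrative and this is not the code at the pastebin link):

```python
import os
import numpy as np

def save_with_npy(fname, arr, **kwargs):
    """Save arr as a text file, plus a binary .npy copy alongside it."""
    np.savetxt(fname, arr, **kwargs)
    np.save(fname + ".npy", arr)

def load_with_npy(fname, **kwargs):
    """Load the .npy copy if it is at least as new as the text file;
    otherwise parse the text file and refresh the cache."""
    npy = fname + ".npy"
    if os.path.exists(npy) and os.path.getmtime(npy) >= os.path.getmtime(fname):
        return np.load(npy)           # fast path: binary cache hit
    arr = np.loadtxt(fname, **kwargs)  # slow path: parse text once
    np.save(npy, arr)                  # cache for next time
    return arr
```

As noted, the first load still pays the full text-parsing cost; every load after that reads the binary cache.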
On Sun, 1 Mar 2009 14:29:54 -0500, Michael Gilbert wrote:
i will send the current version to the list tomorrow when i have access to the system that it is on.
attached is my current version of loadtxt. like i said, it's slower for small data sets because it reads through the whole data file twice: the first pass is used to figure out how much memory to allocate. i could optimize this by seeking intelligently through the file, but like i said, i haven't had the time to implement it. all of the options should work except for "converters" (i have never used "converters" and couldn't figure out exactly what it does from a quick read-through of the docs). best wishes, mike
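The two-pass approach described here can be sketched as follows (a hypothetical simplification, not the attached code, and it supports far fewer options than numpy.loadtxt):

```python
import numpy as np

def loadtxt_twopass(fname, dtype=float, comments="#", delimiter=None):
    """Two-pass text reader: pass 1 counts rows so the output array can be
    preallocated; pass 2 fills it in place, avoiding list growth entirely."""
    def data_lines(fh):
        # Strip comments and skip blank lines.
        for line in fh:
            line = line.split(comments, 1)[0].strip()
            if line:
                yield line

    # Pass 1: determine the shape without keeping any data around.
    nrows, ncols = 0, 0
    with open(fname) as fh:
        for line in data_lines(fh):
            if nrows == 0:
                ncols = len(line.split(delimiter))
            nrows += 1

    # Pass 2: fill a preallocated array.
    out = np.empty((nrows, ncols), dtype=dtype)
    with open(fname) as fh:
        for i, line in enumerate(data_lines(fh)):
            out[i] = [dtype(x) for x in line.split(delimiter)]
    return out
```

The memory high-water mark is just the final array, at the cost of reading the file twice; the "intelligent seeking" mentioned above would cut down the first pass.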
On Sun, Mar 1, 2009 at 15:12, Gideon Simpson <simpson@math.toronto.edu> wrote:
So I have some data sets of about 160000 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected?
Probably. You don't say exactly what you mean by "slow", so it's difficult to tell. But it is unlikely that you are running into some slow corner case or something that no one else has seen.
Would it be faster if it were loading binary data?
Substantially. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
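To get a concrete sense of the gap, a rough benchmark sketch (file names are illustrative) comparing a text round-trip with a binary one on an array of the size mentioned in the thread:

```python
import time
import numpy as np

data = np.random.rand(160000)  # about the size of the data sets in question

np.savetxt("bench.txt", data)  # text representation
np.save("bench.npy", data)     # binary .npy representation

t0 = time.perf_counter()
from_text = np.loadtxt("bench.txt")
text_s = time.perf_counter() - t0

t0 = time.perf_counter()
from_bin = np.load("bench.npy")
bin_s = time.perf_counter() - t0

print(f"loadtxt: {text_s:.4f}s   np.load: {bin_s:.4f}s")
assert np.allclose(from_text, from_bin)
```

Binary loading is essentially a straight read into memory, while text loading must parse every number, so the difference grows with the data size.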
Gideon Simpson wrote:
So I have some data sets of about 160000 floating point numbers stored in text files. I find that loadtxt is rather slow. Is this to be expected? Would it be faster if it were loading binary data?
Depending on the format, you may be able to use numpy.fromfile, which I suspect would be much faster. It only handles very simple ASCII formats, though. Eric
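A minimal sketch of the fromfile approach for simple whitespace-separated ASCII (file name illustrative):

```python
import numpy as np

np.savetxt("simple.txt", np.arange(6.0))  # one float per line

# With sep set, fromfile parses plain text; sep=" " also swallows
# newlines, so any whitespace-separated layout flattens to a 1-D array.
flat = np.fromfile("simple.txt", dtype=float, sep=" ")
print(flat)  # [0. 1. 2. 3. 4. 5.]
```

Shape information is lost, so for a table you would reshape(nrows, ncols) afterward; fromfile also has no handling for comments, headers, or missing values.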
participants (6)
- Brent Pedersen
- Eric Firing
- Gideon Simpson
- Michael Gilbert
- Michael S. Gilbert
- Robert Kern