
On 11/16/10 8:57 AM, Darren Dale wrote:
In my case, I am making an assumption about the integrity of the file.
That does make things easier, but less universal. I guess this is the whole trade-off about "reusable code". It sure is a lot easier to write code that does the one thing you need than something general purpose.
Anyone know what the advantage of ato* is over scanf()/fscanf()?
Also, why are you doing string parsing rather than parsing the files directly? Wouldn't that be a bit faster?
Rank inexperience, I guess. I don't understand what you have in mind.
If your goal is to read numbers from an ASCII file, you can use fromfile() directly, rather than reading the file (or some of it) into a string and then using fromstring(). Also, in C, you can use fscanf to read the file directly (of course, under the hood it's putting stuff in strings somewhere along the line, but presumably in an optimized way).
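For the ASCII case, a minimal sketch of the two routes being compared (assuming a small whitespace-separated file, here called data.txt, made up for the example):

import numpy as np

# Make a small whitespace-separated text file to parse (made-up data).
with open("data.txt", "w") as f:
    f.write("1.0 2.5 3.75\n4.0 5.5 6.25\n")

# Route 1: parse the file directly, no intermediate Python string.
a = np.fromfile("data.txt", dtype=np.float64, sep=" ")

# Route 2: read the file into a string, then parse the string.
# (Text-mode fromstring is deprecated in newer numpy releases.)
b = np.fromstring(open("data.txt").read(), dtype=np.float64, sep=" ")

print(a)                      # [1.   2.5  3.75 4.   5.5  6.25]
print(np.array_equal(a, b))   # True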
scanf/fscanf don't actually convert strings to numbers, do they?
yes, that's exactly what they do.
http://en.wikipedia.org/wiki/Scanf
The C lib may very well use ato* under the hood.
My idea at this point is to write a function in Cython that takes a file and a numpy dtype, converts the dtype to a scanf format string, then calls fscanf (or scanf) to parse the file. My existing scanner code more or less does that, but the format string is hard-coded for either floats or doubles.
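For illustration, here is a rough Python sketch of just the dtype-to-format-string step; the function name and the conversion table are hypothetical and the table is far from exhaustive. The actual Cython routine would hand the resulting format on to fscanf.

import numpy as np

# Hypothetical mapping from numpy dtypes to scanf conversion specifiers.
_SCANF = {
    np.dtype(np.float32): "%f",
    np.dtype(np.float64): "%lf",
    np.dtype(np.int32):   "%d",
    np.dtype(np.int64):   "%lld",
}

def dtype_to_scanf(dt):
    """Return a scanf-style format string for a plain or structured dtype."""
    dt = np.dtype(dt)
    if dt.names:  # structured dtype: one conversion per field
        return " ".join(_SCANF[dt.fields[name][0]] for name in dt.names)
    return _SCANF[dt]

print(dtype_to_scanf(np.float64))                            # %lf
print(dtype_to_scanf([("x", np.float32), ("n", np.int32)]))  # %f %d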
I've got some C extension code for simple parsing of text files into arrays of floats or doubles (using fscanf). I'd be curious how the performance compares to what you've got. Let me know if you're interested.
I'm curious, yes.
OK -- I'll whip up a test similar to yours -- stay tuned!
-Chris
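For anyone who wants to try the kind of comparison discussed above, a rough timing sketch using only the pure-numpy text readers (bench.txt is a throwaway file generated on the spot; a C-extension or Cython reader would slot into the same loop):

import timeit
import numpy as np

# Throwaway test data: one million doubles, one value per line.
np.savetxt("bench.txt", np.random.random(1_000_000))

setup = "import numpy as np"
readers = [
    "np.fromfile('bench.txt', dtype=np.float64, sep=' ')",
    # Text-mode fromstring is deprecated in newer numpy releases.
    "np.fromstring(open('bench.txt').read(), dtype=np.float64, sep=' ')",
    "np.loadtxt('bench.txt')",
]
for stmt in readers:
    best = min(timeit.repeat(stmt, setup=setup, number=1, repeat=3))
    print("%-15s %.3f s" % (stmt.split("(")[0], best))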