<br><div class="gmail_quote">On Thu, May 5, 2011 at 2:12 PM, Miki Tebeka <span dir="ltr">&lt;<a href="mailto:miki.tebeka@gmail.com">miki.tebeka@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Greetings,<br>
<br>
I'm reading some data from an Avro file using the avro library. It takes about a minute to load 33K objects from the file. This seems very slow to me, especially since the Java version reads the same file in about 1 sec.<br>
</blockquote><div> </div><div>You might want to try one of the Apache Avro mailing lists, listed at <a href="http://avro.apache.org/mailing_lists.html">http://avro.apache.org/mailing_lists.html</a>, as I suspect most Python people use Python's native pickle support instead.<br>
<br>It looks like the Python version of Avro does single-byte-at-a-time I/O for some types, which is almost guaranteed to perform poorly. If you're decoding an 8-byte integer, it's much faster to read all 8 bytes at once and then chop them up, and better still to read a whole buffer at a time and chop that up.<br>
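<br>As a rough illustration (my own sketch, not Avro's actual decoder; the function names and the fixed 8-byte big-endian layout are assumptions made up for the example), here are the three approaches side by side: one read per byte, one read per integer, and one read for the whole buffer:<br>

```python
import io


def read_u64_bytewise(stream):
    """Assemble a big-endian unsigned 64-bit integer with eight 1-byte reads."""
    value = 0
    for _ in range(8):
        # One read() call per byte: this is the slow pattern.
        value = value * 256 + stream.read(1)[0]
    return value


def read_u64_chunk(stream):
    """Read all 8 bytes with a single call, then decode them."""
    return int.from_bytes(stream.read(8), "big")


def read_all_u64(stream):
    """Read the whole stream into one buffer, then chop it into 8-byte integers.

    Assumes the stream length is a multiple of 8.
    """
    buf = stream.read()
    return [int.from_bytes(buf[i:i + 8], "big") for i in range(0, len(buf), 8)]


# Hypothetical sample data: three integers packed back to back.
sample = b"".join(n.to_bytes(8, "big") for n in (1, 33000, 2**40))
```

All three return the same values; the difference is only in how many read calls are made to get there.<br>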
<br>Even in C, byte-at-a-time I/O does not perform well, especially if you use read(), which makes a system call for every byte, rather than fread(), which buffers in user space.<br><br>A related note: Python is often more about programmer efficiency than machine efficiency. With the cost per MIPS going down and the price of programmer time going up, that seems like a good trade-off.<br>
<br></div></div>