[Numpy-discussion] mysql -> record array

nicholas cunliffe ndcunliffe at gmail.com
Fri Nov 17 09:34:05 EST 2006


I think someone needs to sort out the distribution lists; I'm getting lots
of unstructured emails under many different titles.

On 11/17/06, Francesc Altet <faltet at carabos.com> wrote:
> On Thursday 16 November 2006 22:28, Erin Sheldon wrote:
> > Hi Francesc -
> >
> > Unless I missed something, I think what you have
> > shown is that the combination of
> >       (getting data from database into python lists) +
> >       (converting to arrays)
> > is what is taking time.   I would guess the first takes
> > significantly longer than the second.
>
> Seriously, I don't think I have demonstrated anything really solid in
> that regard with so little evidence. But we can try looking for more
> of it :)
>
> For example, I'd split the times in:
>
>     t1 (getting data from database) +
>     t2 (python lists) +
>     t3 (converting to arrays) =
>     tt (total time)
>
> We don't know t1, but we do know tt. Now, we can try to get a guess
> for t2 and t3. Perhaps I'm wrong, but the following could be good
> estimates.
>
> For t2 (creating the python list of tuples):
> In [44]: Timer("[(x,x) for x in np.arange(500000, dtype='float64')]",
>                "import numpy as np").repeat(3,1)
> Out[44]: [0.55968594551086426, 0.48462891578674316, 0.4855189323425293]
>
> For t3 (converting to recarrays):
> In [49]: Timer("np.fromiter(lot, dtype=dtype)",
>                "import numpy as np; "
>                "lot=[(x,x) for x in np.arange(500000, dtype='float64')]; "
>                "dtype=np.dtype([('x', 'float64'), ('y', 'float64')])").repeat(3,1)
> Out[49]: [0.50310707092285156, 0.50920987129211426, 0.50304579734802246]
>
> So, it seems that t2 and t3 are similar and they take approximately 0.5
> seconds each.
>
> Now, let me recall the timings for reading the databases on my
> laptop at work (a Pentium4 @ 2 GHz):
>
> setup SQLite took 23.5661110878 seconds
> retrieve SQLite took 3.26717996597 seconds
> setup PyTables took 0.139157056808 seconds
> retrieve PyTables took 0.13444685936 seconds
>
> So, in our case, tt for SQLite3 was 3.26 seconds. With that, we can
> derive its t1 (getting data from database):
>
> t1 = tt - t2 - t3 =~ 3.26 - 0.5 - 0.5 = 2.26 seconds
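
The subtraction above can be written out as a few lines of Python. This is only a sketch using the figures quoted in the message: the 0.5 s values for t2 and t3 are the rough estimates from the two Timer runs, and `pytables_tt` is just the quoted "retrieve PyTables" time rounded to three decimals.

```python
# Timings quoted in the message, in seconds.
tt = 3.26            # total "retrieve SQLite" time, as used in the message
t2 = 0.5             # building the Python list of tuples (rough estimate)
t3 = 0.5             # converting the list to a record array (rough estimate)

# Whatever is left over is attributed to getting data out of the database.
t1 = tt - t2 - t3

# Compare the remainder against the whole PyTables retrieve time.
pytables_tt = 0.134  # "retrieve PyTables", rounded
ratio = t1 / pytables_tt

print(f"t1 ~= {t1:.2f} s, ratio to PyTables total ~= {ratio:.1f}")
```

Since t2 and t3 were measured on synthetic data rather than on the real query, the remainder is only as good as those two estimates.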
>
> However, this is still far more than tt for PyTables (~ 0.14 sec), so
> I'm not completely sure what's going on. Honestly, I don't think that
> HDF5 (the underlying library for doing I/O in PyTables) would be
> almost 20x faster than SQLite3 for reading purposes. So my guess is
> that there must be more factors contributing to tt for SQLite3 that
> I've not taken into account. Can anyone find them?
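
One way to look for the missing factors is to time the stages directly instead of deriving t1 by subtraction. The following is a minimal sketch, not the benchmark used in this thread: the table name `demo`, the function `time_stages`, and the in-memory database are all assumptions made for illustration, and the split between t1 and t2 is only approximate because the `sqlite3` module builds one tuple per row even when the rows are discarded.

```python
import sqlite3
import time

import numpy as np


def time_stages(n=100_000):
    """Return (t1, t2, t3, arr) for n two-column float rows (hypothetical helper)."""
    con = sqlite3.connect(":memory:")  # in-memory DB, so no disk I/O is measured
    con.execute("CREATE TABLE demo (x REAL, y REAL)")
    con.executemany("INSERT INTO demo VALUES (?, ?)",
                    ((float(i), float(i)) for i in range(n)))
    con.commit()

    dtype = np.dtype([("x", "float64"), ("y", "float64")])

    # t1: run the query and step through the rows without keeping them.
    # Each row is still turned into a tuple by sqlite3, so that cost lands here.
    start = time.perf_counter()
    for _ in con.execute("SELECT x, y FROM demo"):
        pass
    t1 = time.perf_counter() - start

    # t2: extra time needed to keep all rows in a Python list.
    # Estimated as (query + fetchall) minus t1; clamped because it is noisy.
    start = time.perf_counter()
    rows = con.execute("SELECT x, y FROM demo").fetchall()
    t2 = max(time.perf_counter() - start - t1, 0.0)

    # t3: convert the list of tuples into a structured (record-like) array.
    start = time.perf_counter()
    arr = np.fromiter(rows, dtype=dtype, count=len(rows))
    t3 = time.perf_counter() - start

    con.close()
    return t1, t2, t3, arr


if __name__ == "__main__":
    t1, t2, t3, arr = time_stages()
    print(f"t1={t1:.3f}s  t2={t2:.3f}s  t3={t3:.3f}s  rows={len(arr)}")
```

Running the same function against an on-disk database file instead of `":memory:"` would show how much of t1 is disk access rather than SQLite's own row handling.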
>
> Cheers,
>
> --
> >0,0<   Francesc Altet http://www.carabos.com/
> V   V   Cárabos Coop. V. Enjoy Data
>  "-"
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>


