From dlenski at gmail.com  Wed Jun 11 03:06:25 2014
From: dlenski at gmail.com (Daniel Lenski)
Date: Tue, 10 Jun 2014 18:06:25 -0700
Subject: [DB-SIG] how greedy should cursor.fetchmany be?
Message-ID:

I'm writing a DBAPI module for a custom database wrapper. The way I get
data from a query is to ask the underlying database for a chunk of rows,
at which point it spits out an unpredictable number (it's a black box to
me).

I'm trying to figure out how to implement cursor.fetchmany on top of this,
and make it as efficient as possible. There are two limiting cases:

1. The Python caller asks for the next 10 rows, but the DB gives the next
   100 rows. Obviously, I just cache the extra rows in this case.
2. The Python caller asks for the next 100 rows, but the next call to the
   DB only gives me 10 rows.

In case (2), PEP-249 seems to be telling me that I should keep calling the
DB to get more rows:

    The number of rows to fetch per call is specified by the parameter. If
    it is not given, the cursor's arraysize determines the number of rows
    to be fetched. *The method should try to fetch as many rows as
    indicated by the size parameter.* If this is not possible due to the
    specified number of rows not being available, fewer rows may be
    returned.

However, I'd rather implement it slightly differently: "... fetch as many
rows as possible without incurring multiple calls to the underlying DB
engine, but no more than indicated by the size parameter."

Having a loop to make multiple DB calls hidden in the DBAPI module, rather
than in the client code, strikes me as a bad approach because it adds extra
unpredictable latency. A properly-written client will have to do some
looping anyway:

    # client code to get next 10k rows from the cursor
    rows = []
    while len(rows) < 10000:
        # PEP-249 says that fetchmany should do its best to get as many
        # rows as requested
        # ...
        # but that will mean a similar or identical loop in the DBAPI module
        more_rows = cur.fetchmany(10000 - len(rows))
        if not more_rows:
            break
        else:
            rows += more_rows

What do you think? Is there some good reason to adhere strictly to PEP-249?

Thanks,
Dan

From mal at egenix.com  Wed Jun 11 10:55:44 2014
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 11 Jun 2014 10:55:44 +0200
Subject: [DB-SIG] how greedy should cursor.fetchmany be?
In-Reply-To:
References:
Message-ID: <53981990.8030702@egenix.com>

On 11.06.2014 03:06, Daniel Lenski wrote:
> I'm writing a DBAPI module for a custom database wrapper. The way I get
> data from a query is to ask the underlying database for a chunk of rows,
> at which point it spits out an unpredictable number (it's a black box to
> me).
>
> I'm trying to figure out how to implement cursor.fetchmany on top of
> this, and make it as efficient as possible. There are two limiting cases:
>
> 1. The Python caller asks for the next 10 rows, but the DB gives the next
>    100 rows. Obviously, I just cache the extra rows in this case.
> 2. The Python caller asks for the next 100 rows, but the next call to the
>    DB only gives me 10 rows.
>
> In case (2), PEP-249 seems to be telling me that I should keep calling
> the DB to get more rows:
>
>     The number of rows to fetch per call is specified by the parameter.
>     If it is not given, the cursor's arraysize determines the number of
>     rows to be fetched. *The method should try to fetch as many rows as
>     indicated by the size parameter.* If this is not possible due to the
>     specified number of rows not being available, fewer rows may be
>     returned.
>
> However, I'd rather implement it slightly differently: "... fetch as many
> rows as possible without incurring multiple calls to the underlying DB
> engine, but no more than indicated by the size parameter."
>
> Having a loop to make multiple DB calls hidden in the DBAPI module,
> rather than in the client code, strikes me as a bad approach because it
> adds extra unpredictable latency. A properly-written client will have to
> do some looping anyway:
>
>     # client code to get next 10k rows from the cursor
>     rows = []
>     while len(rows) < 10000:
>         # PEP-249 says that fetchmany should do its best to get as many
>         # rows as requested
>         # ... but that will mean a similar or identical loop in the
>         # DBAPI module
>         more_rows = cur.fetchmany(10000 - len(rows))
>         if not more_rows:
>             break
>         else:
>             rows += more_rows
>
> What do you think? Is there some good reason to adhere strictly to
> PEP-249?

Applications will usually regard the case of returning fewer rows than
specified as having hit the end of the result set, and then not call
.fetchmany() again to check for more rows, so I'm not sure whether your
less strict interpretation will be helpful in making the database module
compatible with existing applications.

Note that these applications use this interpretation for the same reason
that you specify: to avoid additional network roundtrips just to check for
an empty list return value to mark the definite end of the result set.

It would probably be better for compatibility reasons to have your module
introduce a special new method (say .fetchchunk()) that works as you
describe, and have .fetchmany() provide the regular DB-API behavior.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 11 2014)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2014-06-09: Released eGenix pyOpenSSL 0.13.3 ...  http://egenix.com/go57
2014-07-02: Python Meeting Duesseldorf ...
                         21 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From tlocke at tlocke.org.uk  Wed Jun 11 07:55:46 2014
From: tlocke at tlocke.org.uk (Tony Locke)
Date: Wed, 11 Jun 2014 06:55:46 +0100
Subject: [DB-SIG] how greedy should cursor.fetchmany be?
In-Reply-To:
References:
Message-ID:

Hi, I would say that a user of your driver would expect the full number of
rows specified in fetchmany to be returned (if the rows are there). So my
view would be to stick to PEP 249 in this case.

Cheers,

Tony.

On 11 June 2014 02:06, Daniel Lenski wrote:
> I'm writing a DBAPI module for a custom database wrapper. The way I get
> data from a query is to ask the underlying database for a chunk of rows,
> at which point it spits out an unpredictable number (it's a black box to
> me).
>
> I'm trying to figure out how to implement cursor.fetchmany on top of
> this, and make it as efficient as possible. There are two limiting cases:
>
> 1. The Python caller asks for the next 10 rows, but the DB gives the next
>    100 rows. Obviously, I just cache the extra rows in this case.
> 2. The Python caller asks for the next 100 rows, but the next call to the
>    DB only gives me 10 rows.
>
> In case (2), PEP-249 seems to be telling me that I should keep calling
> the DB to get more rows:
>
>     The number of rows to fetch per call is specified by the parameter.
>     If it is not given, the cursor's arraysize determines the number of
>     rows to be fetched. The method should try to fetch as many rows as
>     indicated by the size parameter. If this is not possible due to the
>     specified number of rows not being available, fewer rows may be
>     returned.
>
> However, I'd rather implement it slightly differently: "...
> fetch as many rows as possible without incurring multiple calls to the
> underlying DB engine, but no more than indicated by the size parameter."
>
> Having a loop to make multiple DB calls hidden in the DBAPI module,
> rather than in the client code, strikes me as a bad approach because it
> adds extra unpredictable latency. A properly-written client will have to
> do some looping anyway:
>
>     # client code to get next 10k rows from the cursor
>     rows = []
>     while len(rows) < 10000:
>         # PEP-249 says that fetchmany should do its best to get as many
>         # rows as requested
>         # ... but that will mean a similar or identical loop in the
>         # DBAPI module
>         more_rows = cur.fetchmany(10000 - len(rows))
>         if not more_rows:
>             break
>         else:
>             rows += more_rows
>
> What do you think? Is there some good reason to adhere strictly to
> PEP-249?
>
> Thanks,
> Dan
>
> _______________________________________________
> DB-SIG maillist  -  DB-SIG at python.org
> https://mail.python.org/mailman/listinfo/db-sig
>
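[Archive editor's note] To make the two approaches discussed in this thread
concrete, here is a minimal sketch, not taken from any of the posts above.
It assumes a hypothetical black-box source (`UnpredictableSource` and its
`read_chunk()` method are stand-ins for the opaque engine Daniel describes,
not a real API). `fetchmany()` follows the strict PEP 249 reading by looping
over chunks and caching surplus rows, while `fetchchunk()` is the relaxed,
single-call variant Marc-Andre Lemburg suggests adding under a new name:

```python
from collections import deque

class UnpredictableSource:
    """Hypothetical stand-in for the black-box engine: each call returns
    an arbitrarily sized chunk of rows, and [] once exhausted."""
    def __init__(self, rows, chunk_sizes):
        self._rows = list(rows)
        self._sizes = deque(chunk_sizes)  # size each call will hand back

    def read_chunk(self):
        n = self._sizes.popleft() if self._sizes else len(self._rows)
        chunk, self._rows = self._rows[:n], self._rows[n:]
        return chunk

class Cursor:
    arraysize = 1

    def __init__(self, source):
        self._source = source
        self._cache = deque()  # surplus rows from earlier chunks (case 1)

    def _take(self, size):
        # Hand back up to `size` rows from the cache.
        return [self._cache.popleft()
                for _ in range(min(size, len(self._cache)))]

    def fetchmany(self, size=None):
        """Strict PEP 249 reading: keep calling the source until we hold
        `size` rows or the result set is exhausted (case 2)."""
        size = self.arraysize if size is None else size
        while len(self._cache) < size:
            chunk = self._source.read_chunk()
            if not chunk:
                break
            self._cache.extend(chunk)
        return self._take(size)

    def fetchchunk(self, size=None):
        """Relaxed variant proposed in the thread: at most one underlying
        call, never more than `size` rows."""
        size = self.arraysize if size is None else size
        if not self._cache:
            self._cache.extend(self._source.read_chunk())
        return self._take(size)
```

Under this sketch, `Cursor(UnpredictableSource(range(25), [10, 10, 5]))`
satisfies `fetchmany(15)` by looping over two 10-row chunks and caching the
surplus five, whereas `fetchchunk(15)` would have stopped after the first
chunk; this is exactly the compatibility trade-off the replies describe,
since a caller that treats a short `fetchmany()` result as end-of-data
would silently lose rows with the relaxed behavior.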