Psycopg2 pool clarification

israel israel at ravnalaska.net
Sat Jun 10 12:36:55 EDT 2017


On 2017-06-08 19:55, Ian Kelly wrote:
> On Thu, Jun 8, 2017 at 10:47 AM, Israel Brewster 
> <israel at ravnalaska.net> wrote:
>>> On Jun 7, 2017, at 10:31 PM, dieter <dieter at handshake.de> wrote:
>>> 
>>> israel <israel at ravnalaska.net> writes:
>>>> On 2017-06-06 22:53, dieter wrote:
>>>> ...
>>>> As such, using psycopg2's pool is essentially
>>>> worthless for me (plenty of use for it, i'm sure, just not for me/my
>>>> use case).
>>> 
>>> Could you not simply adjust the value for the "min" parameter?
>>> If you want at least "n" open connections, then set "min" to "n".
>> 
>> Well, sure, if I didn't care about wasting resources (which, I guess 
>> many people don't). I could set "n" to some magic number that would 
>> always give "enough" connections, such that my application never has 
>> to open additional connections, then adjust that number every few 
>> months as usage changes. In fact, now that I know how the logic of the 
>> pool works, that's exactly what I'm doing until I am confident that my 
>> caching replacement is solid.
>> 
>> Of course, in order to avoid having to open/close a bunch of 
>> connections during the times when it is most critical - that is, when 
>> the server is under heavy load - I have to set that number arbitrarily 
>> high. Furthermore, that means that much of the time many, if not most, 
>> of those connections would be idle. Each connection uses a certain 
>> amount of RAM on the server, not to mention using up limited 
>> connection slots, so now I've got to think about if my server is sized 
>> properly to be able to handle that load not just occasionally, but 
>> constantly - when reducing server load by reducing the frequency of 
>> connections being opened/closed was the goal in the first place. So 
>> all I've done is trade dynamic load for static load - increasing 
>> performance at the cost of resources, rather than more intelligently 
>> using the available resources. All-in-all, not the best solution, 
>> though it does work. Maybe if load was fairly constant it would make
>> more sense though. So like I said, it doesn't fit *my* use case,
>> which is a number of web apps with varying loads, loads that also
>> vary from day-to-day and hour-to-hour.
>> 
>> On the other hand, a pool that caches connections using the logic I 
>> laid out in my original post would avoid the issue. Under heavy load, 
>> it could open additional connections as needed - a performance penalty 
>> for the first few users over the min threshold, but only the first 
>> few, rather than all the users over a certain threshold ("n"). Those 
>> connections would then remain available for the duration of the load, 
>> so it doesn't need to open/close numerous connections. Then, during 
>> periods of lighter load, the unused connections can drop off, freeing 
>> up server resources for other uses. A well-written pool could even do 
>> something like see that the available connection pool is running low, 
>> and open a few more connections in the background, thus completely 
>> avoiding the connection overhead on requests while never having more 
>> than a few "extra" connections at any given time. Even if you left off
>> the expiration logic, it would still be an improvement, because while
>> unused connections wouldn't drop, the "n" open connections could scale
>> up dynamically until you have "enough" connections, without having to
>> figure out and hard-code that "magic number" of open connections.
>> 
>> Why wouldn't I want something like that? It's not like it's hard to
>> code - took me about an hour and a half to get to a working prototype
>> yesterday. Still need to write tests and add some polish, but it
>> works. Perhaps, though, the common thought is just "throw more
>> hardware at it and keep a lot of connections open at all times"?
>> Maybe I was raised too conservatively, or the company I work for is
>> too poor.... :-D
> 
> Psycopg is first and foremost a database adapter. To quote from the
> psycopg2.pool module documentation, "This module offers a few pure
> Python classes implementing *simple* connection pooling directly in
> the client application" (emphasis added). The advertised list of
> features at http://initd.org/psycopg/features/ doesn't even mention
> connection pooling. In short, you're getting what you paid for.
> 
> It sounds like your needs are beyond what the psycopg2.pool module
> provides.

Quite possible. Thus the reason I was looking for clarification on how 
the module was intended to work - if it doesn't work in the way that I 
want it to, I need to look elsewhere for a solution. My main reason for 
posting this thread was that I was expecting it to work one way, but 
testing showed it working another way, so I was trying to find out if 
that was intentional or user error. Apparently it's intentional, so 
there we go - in its current form at least, my needs are beyond what 
the psycopg2 pool provides. Fair enough.
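For reference, the behavior we've been discussing is easy to
demonstrate (connection parameters here are just illustrative):

    import psycopg2.pool

    # minconn=1, maxconn=5; the pool pre-opens minconn connections
    pool = psycopg2.pool.ThreadedConnectionPool(
        1, 5, dbname="test", user="postgres")

    c1 = pool.getconn()  # takes the one pre-opened connection
    c2 = pool.getconn()  # opens a second connection on demand

    pool.putconn(c1)  # idle count (0) < minconn, so c1 is cached
    pool.putconn(c2)  # idle count now at minconn, so c2 is closed

    print(c1.closed)  # 0 - still open, held in the pool
    print(c2.closed)  # nonzero - physically closed, not cached

    pool.closeall()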

> I suggest looking into a dedicated connection pooler like
> PgBouncer. You'll find that it's much more feature-rich and
> configurable than psycopg2.pool. It's production-ready, unlike your
> prototype. And since it's a proxy, it can take connections from
> multiple client apps and tune the pool to your overall load rather
> than on an app-by-app basis (and thus risk overloading the backend if
> multiple apps unexpectedly peak together).

Very true, and I've looked into that (as well as the related, but more 
basic, PgPool product), but it seems to me that any external proxy 
product like these would defeat *my* purpose for using a pool in the 
first place: avoiding the overhead of making/breaking many connections 
quickly. That is, all you have really done is gone from connecting to 
Postgres to connecting to PgBouncer. You are still making and breaking 
just as many connections. Unless connecting to PgBouncer is 
significantly cheaper than connecting to Postgres? This may well be the 
case, but I haven't yet seen anything to support that. Haven't seen 
anything to refute that either, however :)
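To be clear, the client-side change is trivial either way: psycopg2
would just point at PgBouncer's listening port instead of Postgres
itself. Assuming the conventional setup (PgBouncer on localhost:6432,
proxying to the real server), something like:

    import psycopg2

    # Hypothetical parameters - PgBouncer listening on 6432
    conn = psycopg2.connect(host="127.0.0.1", port=6432,
                            dbname="test", user="postgres")
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
    conn.close()  # tears down only the client/PgBouncer socket; the
                  # pooler's own connection to Postgres stays open

The claimed savings, as I understand it, would come from that last
line - PgBouncer holding its server-side connections to Postgres open
for reuse - but again, I haven't measured it.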

Of course, there may be many other features provided by such tools that 
would make them worthwhile, even for my use case. However, my primary 
goal in using a pool was avoiding the connection overhead with each 
request, so if a tool doesn't do that, then it isn't the right tool for 
me :)

> 
> As for why psycopg2.pool is the way it is, maybe most users don't have
> your situation of serving multiple apps with loads varying on
> different cycles. Most are probably only serving a single app, or if
> serving multiple apps then they likely have common user bases with
> similar peak times. You can't dynamically adjust the amount of RAM in
> your server, so saving resources like RAM at below-peak times only
> matters if you're going to do something else with it. In the scenarios
> I described there isn't much else to do with it, so I can understand
> if saving RAM isn't a priority.

True, but you would still have to deal with the minconn "magic number", 
unless you just adjusted it so high from the start that even if your 
load/use grows over time you never have to mess with it. Even in the 
single-app use case, where you don't care about RAM usage (since there 
is nothing else trying to use the RAM), in order to get maximum benefit 
from a pool you'd have to keep an eye on your usage and make sure it 
never (or rarely) exceeds whatever arbitrary value you have set for 
minconn. Not a big deal, especially if you tune it high to begin with, 
but it is one more thing.
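(For concreteness, the workaround I'm describing is just something
like the following, where the numbers are the "magic" part:)

    import psycopg2.pool

    # Hypothetical magic numbers: minconn set high enough that, with
    # luck, the app never has to open a connection under load.
    pool = psycopg2.pool.ThreadedConnectionPool(
        20, 40, dbname="test", user="postgres")

    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT now()")
            print(cur.fetchone())
    finally:
        pool.putconn(conn)  # cached as long as idle count < minconn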

Honestly, some of that is just a personal issue. I have problems with 
code that is inefficient by design (even if there is nothing to be 
gained from efficiency), or that makes assumptions about things when it 
could be dynamic. I have often coded in such a way that a given value 
can be adjusted dynamically, even when the people giving me the specs 
say it will never change. More than once that has enabled me to respond 
to a "feature request" or change order by saying "it already does that". 
On the other hand, I am probably a poster child for "premature 
optimization", and often have to stop myself from optimizing code just 
because I can, when in reality it is not worth my time. By the same 
token, the idea of wasting resources - even when, as you state, there is 
nothing else to do with them - just rubs me the wrong way. As such, I 
readily acknowledge that some of my requests/statements stem from my own 
personal desires, and not from any actual lack/need in the product.
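For the curious, the rough shape of the caching logic I described is
below. This is just a sketch of the idea, not my actual prototype - it
leaves out the background pre-opening, health checks, and error
handling, and all names and numbers are illustrative:

    import threading
    import time

    import psycopg2


    class ExpiringConnectionPool:
        """Keep at least minconn connections open; open extras on
        demand and close them once they have sat idle too long."""

        def __init__(self, minconn, idle_timeout=300, **connect_kwargs):
            self.minconn = minconn
            self.idle_timeout = idle_timeout
            self.connect_kwargs = connect_kwargs
            self._lock = threading.Lock()
            self._idle = []  # list of (connection, time it went idle)

        def getconn(self):
            with self._lock:
                if self._idle:
                    conn, _ = self._idle.pop()
                    return conn
            # Nothing cached: open a new connection. Only the first few
            # requests over the current threshold pay this penalty.
            return psycopg2.connect(**self.connect_kwargs)

        def putconn(self, conn):
            conn.rollback()  # hand it back in a clean state
            with self._lock:
                self._idle.append((conn, time.monotonic()))
                self._expire_locked()

        def _expire_locked(self):
            # Drop connections idle longer than idle_timeout, but never
            # go below minconn - a floor plus dynamic growth instead of
            # a hard-coded ceiling. Caller must hold self._lock.
            now = time.monotonic()
            fresh = [(c, t) for c, t in self._idle
                     if now - t < self.idle_timeout]
            stale = [(c, t) for c, t in self._idle
                     if now - t >= self.idle_timeout]
            while stale and len(fresh) < self.minconn:
                fresh.append(stale.pop())
            for conn, _ in stale:
                conn.close()
            self._idle = fresh

        def closeall(self):
            with self._lock:
                for conn, _ in self._idle:
                    conn.close()
                self._idle = []

Under load the idle list drains and new connections get opened; once
things quiet down, returns past the minconn floor age out and close,
which is the "drop off during lighter load" behavior from my original
post.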


