[Tutor] How should my code handle db connections? Should my db manager module use OOP?

Thu Aug 27 06:42:17 CEST 2015

Hi there,

> My ongoing project will be centered around an SQLite db.

Not a bad way to start.  There are many possible ways to access SQL 
DBs.  I'll talk about one of my favorites, since I'm a big fan of 
sqlalchemy [0], which provides a broad useful toolkit for dealing 
with SQL DBs and an abstraction layer.

To start, often the question is why any such abstraction tool, given 
the additional complexity of a module, a.k.a. another layer of code?

Briefly, my main two reasons:

   A) abstraction of data model from SQL implementation for the
      Python program (allows switching from SQLite another DBAPI,
      e.g. postgres, later with a minimum effort)
   B) somebody has already implemented the tricky bits, such as ORMs
      (see below), failover, connection pooling (see below) and
      other DB-specific features

> Since almost all data needed by the program will be stored in this 
> db, my thought is that I should create a connection to this db 
> shortly after program startup and keep this connection open until 
> program closure.

That is one possible approach.  But, consider using a "connection 
pooling" technique that somebody else has already implemented and 
tested.  This saves your time for working on the logic of your 
program.

There are many different pooling strategies, which include things 
like "Use only one connection at a time." or "Connect on demand." or 
"Hold a bunch of connections open and let me use one when I need 
one, and I'll release it when I'm done." and even "When the 
connection fails, retry quietly in the background until a successful 
connection can be re-established."

> I am assuming that opening and closing a db connection has enough 
> overhead that I should only do this once.  But I do not *know* 
> that this is true.  Is it?  If not, then the alternative would 
> make more sense, i.e., open and close the db as needed.

Measure, measure, measure.  Profile it before coming to such a 
conclusion.  You may be correct, but, it behooves you to measure. 
(My take on an old computing adage:  Premature optimization can lead 
you down unnecessarily painful or time consuming paths.)

N.B.  Only you (or your development cohort) can anticipate the load 
on the DB, the growth of records (i.e. data set size), the growth of 
the complexity of the project, or the user count.  So, even if the 
measurements tell you one thing, be sure to consider the longer-term 
plan for the data and application.

Also, see Steven D'Aprano's comments about concurrency and other 
ACIDic concerns.

> In the first iteration of my project, my intent is to create and 
> populate the db with tables external to the program.  The program 
> will only add entries to tables, query the db, etc.  That is, the 
> structure of the db will be pre-set outside of the program, and 
> the program will only deal with data interactions with the db.

If the structure of the DB is determined outside the program, 
this sounds like a great reason to use an Object Relational 
Modeler (ORM).  An ORM which supports reflection (sqlalchemy 
does) can create Pythonic objects for you.

> My intent is to make the overall design of the program OO, but I 
> am wondering how to handle the db manager module.  Should I go OO 
> here as well?  With each pertinent method handling a very specific 
> means of interacting with the db?  Or go a procedural route with 
> functions similar to the aforementioned methods?  It is not clear 
> to me that OOP provides a real benefit here, but, then again, I am 
> learning how to OOP during this project as well, so I don't have 
> enough knowledge yet to realistically answer this question.

I'm not sure I can weigh in intelligently here (OOP v. procedural), 
but I'd guess that you could get that Object-Oriented feel by taking 
advantage of an ORM, rather than writing one yourself.  Getting used 
to the idea of an ORM can be tricky, but if you can get reflection 
working [1], I think you will be surprised at how quickly your 
application logic (at the business layer) comes together and you can 
(mostly) stop worrying about things like connection logic and SQL 
statements executing from your Python program [2].

There probably are a few people on this list who have used 
sqlalchemy and are competent to answer it, but if you have questions 
specifically about sqlalchemy, you might find better answers on 
their mailing list [3].

Now, back to the beginnings...a SQLite DB is a fine place to start 
if you have only one thread/user/program accessing the data at any 
time.  Don't host it on a network(ed) file system if you have the 
choice.  If your application grows so much in usage or volume that 
it needs a new and different DB, consider it all a success and 
migrate accordingly.

Best of luck,

-Martin

  [0] http://www.sqlalchemy.org/
  [1] http://docs.sqlalchemy.org/en/rel_1_0/core/reflection.html
  [2] Here, naturally, I'm assuming that you know your way around
      SQL, since you are asserting that the DB already exists, is
      maintained and designed outside of the Python program.
  [3] https://groups.google.com/forum/#!forum/sqlalchemy

-- 
Martin A. Brown
http://linux-ip.net/