[Tutor] How is database creation normally handled?

Sun Sep 10 13:41:32 EDT 2017

On 09/10/2017 02:08 AM, Alan Gauld via Tutor wrote:
> On 10/09/17 01:29, boB Stepp wrote:
>> While reading about SQL, SQLite and the Python module sqlite3, it
>> appears that I could (1) have a program check for the existence of the
>> program's database, and if not found, create it, make the tables,
>> etc.; or, (2) create the database separately and include it with the
>> program. 
> 
> Almost always number 2.
> 
> The slight variant on that is to put the SQL to create the
> tables etc into a sql file and then if a database file does
> not exist create the database by executing the sql script
> (which you can do via the executescript() function)
> 
> One reason for doing it this way is that while SQL is
> standardised the standardisation is mainly in the query
> part of the language. Each database is quite different
> in its data definition language (different data types,
> features like triggers and stored procedures etc).
> 
> So by putting the DDL into a sql file that your program
> just executes the non portable bit gets removed from your
> Python code. You can even have multiple different
> versions of the DDL file, one per database, and  your
> Python code can remain pretty much oblivious to the
> changes (not quite true, but its much easier).
> 
> Populating the data is another matter. I tend to do
> that via a Spreadsheet/csv file and write a loader
> in Python to loop over the data loading it into the
> various tables then setting the table properties/constraints
> as needed. (if you set all constraints before loading the
> data you can get into a kind of data deadlock where
> you can't get any initial data loaded!).

All of this depends on how you intend to use the database.  If it's a
small thing, say to enable experiments, you can certainly build it on
the fly (esp. if you'll never go beyond the sqlite level). I'm generally
agreeing that the "build it outside your program" and "keep the DB code
in SQL" are good ideas.

There are so many different types of databases for different uses. I've
spent a decade and a half on a project (sadly now it's just a trickle of
time to maintain anything that breaks) where we do have a central
"official" database, but want people to be able to do local experiments,
because the "official" database helps describe a standard, making it
update-rarely.  So on any approved change, the database is immediately
dumped, and the version-control copy of that dump has the changes
committed.  With some supporting code (written in Perl, not Python -
hey, I did't start those scripts :)), anybody who wants to experiment
with proposed updates, either to fix a reported bug, or to prototype
changes for the next version, can just pull the repository and go "make
restore" and a local copy of the (mysql) db will be built. So that's
what I just agreed with: the setup process happens outside the code app
which will access the DB; the DB creation _and_ the data population are
both kept entirely in SQL (as a result of using the dumps).  That's
proven to be very workable for that somewhat unusual usage model, but
maybe it's not that unusual - any time you need to start with a
reproducible known state of the DB, that would work pretty well; it
would be a disaster for a DB where there were tons of commits, stuff
changed mostly interactively, etc.