[Tutor] File vs. Database (possible off topic)

Steven D'Aprano steve at pearwood.info
Tue Nov 22 14:13:43 CET 2011


Ken G. wrote:
> It occurred to me last week, while reviewing the files I made using 
> Python, that they could be somewhat similar to a database.
> 
> What would be the difference between Python files and Python databases?  
> Granted, the access in creating them is different, but I really don't see 
> any difference in the format of a file and a database.

A database is essentially a powerful managed service built on top of one 
or more files. There's nothing you can do with a database that you can't 
do with a big set of (say) Windows-style INI files and a whole lot of 
code to manage them. A database does all the management for you, 
handling all the complexity, data integrity, and security, so that you 
don't have to re-invent the wheel. Since database software tends to be 
big and complicated, there is a lot of wheel to be re-invented.

To be worthy of the name "database", the service must abide by the ACID 
principles:

Atomicity
---------

The "all or nothing" principle. Every transaction must either completely 
succeed, or else not make any changes at all. For example, if you wish 
to transfer $100 from account A to account B, it must be impossible for 
the money to be removed from A unless it is put into B. Either both 
operations succeed, or neither.
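
As a rough illustration only, here is a minimal sketch of the transfer
using the sqlite3 module from Python's standard library. The table
name, the account names, the starting balances and the transfer()
helper are all made up for this example, and the CHECK constraint is
just one convenient way to make the second transfer fail part-way
through.

```python
import sqlite3

# Hypothetical schema: one table of account balances, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 150), ("B", 0)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount from source to target, all or nothing."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises an exception.
    with conn:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, target))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, source))

def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, "A", "B", 100)        # succeeds: A has 50, B has 100
try:
    transfer(conn, "A", "B", 100)    # A would go negative, so the CHECK fails
except sqlite3.IntegrityError:
    pass                             # the credit already made to B is undone
print(balances(conn))
```

The second transfer credits B before the debit from A fails, yet B ends
up unchanged: the half-finished transaction is rolled back as a whole.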


Consistency
-----------

Any operation performed by the database must always leave the system in 
a consistent state at the end of the operation. For example, a database 
might have a table of "Money Received" containing $2, $3, $5, $1 and $2, 
and another field "Total" containing $13, and a rule that the Total is 
the sum of the Money Received. It must be impossible for an operation to 
leave the database in an inconsistent state by adding $5 to the Money 
Received table without increasing Total to $18.
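
One way a database can enforce a rule like this is with a trigger. The
following is only a sketch using sqlite3: the table names
(money_received, summary) and the trigger name are invented for the
example, and the trigger covers inserts only, so a real schema would
also need to handle updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE money_received (amount INTEGER NOT NULL);
    CREATE TABLE summary (total INTEGER NOT NULL);
    INSERT INTO summary VALUES (0);

    -- Rule: summary.total is always the sum of money_received.amount.
    CREATE TRIGGER keep_total AFTER INSERT ON money_received
    BEGIN
        UPDATE summary SET total = total + NEW.amount;
    END;
""")

# Record the five payments from the example above.
with conn:
    conn.executemany("INSERT INTO money_received VALUES (?)",
                     [(2,), (3,), (5,), (1,), (2,)])
total_before = conn.execute("SELECT total FROM summary").fetchone()[0]

# Add another $5; the trigger updates the total in the same transaction.
with conn:
    conn.execute("INSERT INTO money_received VALUES (5)")
total_after = conn.execute("SELECT total FROM summary").fetchone()[0]

print(total_before, total_after)
```

Because the trigger runs inside the same transaction as the insert,
there is no point at which the payment is recorded but the total is
stale.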


Isolation
---------

Two transactions must always be independent. Transactions running at the 
same time must not interfere with one another: it must be impossible for 
two of them to update the same field at the same time, as the effect 
would then be unpredictable.
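
A small sketch of one aspect of this, using sqlite3 with two
connections to the same file: changes made by one connection are not
visible to the other until they are committed. The file name and the
counter table are made up, and the behaviour shown assumes SQLite's
default settings; other databases offer different isolation levels.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")    # hypothetical file name
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE counter (value INTEGER)")
    writer.execute("INSERT INTO counter VALUES (1)")
    writer.commit()

    # The writer changes the value but has not committed yet.
    writer.execute("UPDATE counter SET value = 2")
    seen_during = reader.execute("SELECT value FROM counter").fetchall()

    # Only after the commit does the reader see the new value.
    writer.commit()
    seen_after = reader.execute("SELECT value FROM counter").fetchall()

    writer.close()
    reader.close()

print(seen_during, seen_after)
```

The reader never sees the half-finished update: it gets the old value
while the writer's transaction is open and the new value once it has
been committed.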


Durability
----------

Once a transaction is committed, it must remain committed, even if the 
system crashes or the power goes out. Once data is written to disk, 
nothing short of corruption of the underlying bits on the disk should be 
able to hurt the database.
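
As a very mild illustration, the sketch below commits one row, leaves a
second row uncommitted, then closes the connection and reopens the
file. Closing a connection is only a stand-in for a crash (real
durability depends on the database's journal and on the operating
system actually flushing data to disk), and the file and table names
are invented for the example.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ledger.db")    # hypothetical file name

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE payment (amount INTEGER)")
    conn.execute("INSERT INTO payment VALUES (100)")
    conn.commit()                            # committed: must survive
    conn.execute("INSERT INTO payment VALUES (999)")   # never committed
    conn.close()                             # stand-in for a crash

    # Reopen the file as a fresh connection and see what is left.
    conn = sqlite3.connect(path)
    survived = conn.execute("SELECT amount FROM payment").fetchall()
    conn.close()

print(survived)
```

Only the committed row is still there after reopening; the uncommitted
one has vanished without a trace.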


Note that in practice, these four ACID principles may be weakened 
slightly, or a lot, for the sake of speed or convenience, or through 
laziness or mere incompetence. Generally speaking, for any program (not 
just databases!) the rule is:

"Fast, correct, simple... pick any two."

so the smaller, faster, lightweight databases tend to be not quite as 
bullet-proof as the big, heavyweight databases.


Modern databases also generally provide an almost (but not quite) 
standard interface for the user, namely the SQL query language. 
Almost any decent relational database will understand SQL. For example, 
this command:

SELECT * FROM Book WHERE price > 100.00 ORDER BY title;

is SQL to:

* search the database for entries in the Book table
* choose the ones where the price of the book is greater than $100
* sort the results by the book title
* and return the entire record (all fields) for each book

So, broadly speaking, if you learn SQL, you can drive most databases, at 
least well enough to get by.
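
If you want to try the query above from Python, the sqlite3 module in
the standard library is probably the easiest way in. This is just a
sketch: the columns of the Book table and the three sample rows are
invented here, since the query itself only tells us that the table has
a title and a price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout and made-up sample data.
conn.execute("CREATE TABLE Book (title TEXT, author TEXT, price REAL)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("Zen of Databases", "A. Author", 120.00),
    ("Cheap Thrills", "B. Writer", 9.99),
    ("Advanced SQL", "C. Scribe", 150.50),
])
conn.commit()

# The query from the text, unchanged.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title;"
).fetchall()
for row in rows:
    print(row)
```

The cheap book is filtered out, and the two expensive ones come back as
complete rows, sorted by title.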


-- 
Steven


