[CentralOH] Comparing Databases for Test Suite

jep200404 at columbus.rr.com jep200404 at columbus.rr.com
Sat Jan 5 02:52:58 CET 2013

What is a better way of easily comparing the data of some tables 
of two databases? 

I have Django code that modifies a Postgresql database. As part 
of a test suite, I want to compare several tables of the 
modified database to corresponding tables of a known good result 
database, without writing much code. 

Here's what I have so far. It works, but golly it's ugly. 
I use the output of pgdump, of which the order of tables and rows 
is not deterministic, so I use sort to impose the same order on 
both dumps. There are timestamps in the tables, which can not 
match, since the known good and test databases were modified 
at different times, so I use sed to change timestamps to 
a consistent string, 'timestamp-sentinel' that _will_ match. 

[jep200404 at localhost ~]$ cat bin/snapshot 
#!/usr/bin/env sh

# Takes a "snapshot" of a few tables. 

pg_dump -a \
-t file \
-t author \
-t book \
-t article \

[jep200404 at localhost ~]$ 

bin/snapshot known-good-catalog-result >known-good-catalog-result.pgdump
echo 'create database "catalog-test" with template "known-good-catalog-start";' \
| psql -f - -d foo
# foo is an empty database that is not used, 
# except as something to appease psql with. 

django-admin.py mogrify
bin/snapshot catalog-test >catalog-test-result.pgdump

sed -e 's/'"$date_of_known_good"\
' [012][0-9]:[0-5][0-9]:[0-5][0-9].[0-9]*+00/timestamp-sentinel/g' \
<known-good-catalog-result.pgdump | sort >known-good.glop

sed -e 's/'"$today"\
' [012][0-9]:[0-5][0-9]:[0-5][0-9].[0-9]*+00/timestamp-sentinel/g' \
<catalog-test-result.pgdump | sort >test.glop

if cmp known-good.glop test.glop; then
    echo Good!
    rm known-good.glop test.glop
    exit 0
    echo Bad!
    exit 1

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

http://pgfoundry.org/projects/pg-comparator/ looks interesting, 
although I would likely have to write Django code to change the 
timestamps to a single value before comparing. Two sed commands 
are easier. 

Separately and less importantly, can one easily compare the data 
of tables of two databases in Python as opposed to the shell 
stuff I did? 

More information about the CentralOH mailing list