python daemon - compress data and load data into MySQL by pyodbc

Martin P. Hellwig martin.hellwig at dcuktec.org
Thu Sep 3 15:09:55 EDT 2009


MacRules wrote:
<cut>
> What I am looking for is this.
> 
> Oracle DB in data center 1 (LA, west coast)
> MSSQL DB in data center 2 (DC, east coast)
> So network bandwidth is an issue, I prefer to have gzip fist and deliver 
> the data.

If bandwidth is really an issue, you should send compressed delta's.

> 
> I need 2 python daemons or a web service here in the future.
> I will enter the Oracle table name, user id and password.
> So the task is dump out Oracle data (Linux) and insert that to MSSQL.

That is assuming the table is the same and the columns of the table have 
  the same type with the same restrictions and don't even get me started 
about encoding differences.

> 
> 
> I can try first with 1 daemon python. Take the Oracle data file, and let 
> the daemon connects to MSSQL (with pyodbc) and load the data in.

I think that you are underestimating your task, I would recommend to 
'design' your application first using an UML like approach, whether you 
need one, two or bazillion daemons should not be a design start but a 
consequence of the design.

Anyway here is a sum up and some further pointers to take in regard for 
your design:
- Can I do delta's? (if yes how do I calculate them)
- Are the tables comparable in design
   Specifically if there is a limit on the character width, does Oracle 
and MS-SQL think the same about newlines? (one versus two characters)
- How about encoding, are you sure it works out right?
- Is ATOMIC an issue
- Is latency an issue, that is how long may the tables be out of sync
- Can it be manual or is it preferably automatic
   How about a trigger in the database that sets the sync going, if that 
is too much burden how about a trigger that set a flag on the disk and a 
scheduled job that reads that flag first
- Is security an issue, do I need an encrypted channel

In the past I've wrote a program that had the same starting point as you 
had, over time it grew in design to be a datawarehouse push/pull central 
server with all other databases as an agency. The only reason I wrote it 
was because the more standard approach like business objects data 
integrator was just way too expensive and oracles solutions didn't play 
nice with PostgreSQL (though over time this issue seemed to be resolved).

-- 
MPH
http://blog.dcuktec.com
'If consumed, best digested with added seasoning to own preference.'



More information about the Python-list mailing list