[BangPypers] Unstructured data and python

Pradeep Gowda pradeep at btbytes.com
Fri Oct 16 20:47:48 CEST 2009

On Fri, Oct 16, 2009 at 2:31 PM, Carl Trachte <ctrachte at gmail.com> wrote:
> On 10/16/09, Ramdas S <ramdaz at gmail.com> wrote:
>> Has anyone worked on/seen any project which involves migrating unstructured
>> data, mostly text files, to a reasonably indexed database, preferably written
>> in Python or with Python APIs.
>> I am even OK if it's a commercial project.
> FWIW, when I worked in a Microsoft SQL environment, I used DTS with the
> win32com modules for SQL Server 7 or 2000, and SSIS with IronPython for
> later versions.
> It was usually a standard process of gluing together a bunch of data
> in a CSV file with Python, then automating the DTS or SSIS program to
> dump the data to a database table or series of tables.
> You could probably do something similar with MySQL or Postgres.  The
> hard part was always writing the Python to do the situation-specific
> initial crunch of the data.

I believe what you are looking for is an ETL (extraction,
transformation, and loading) application.
It can be as simple as a couple of Python scripts, especially if it is
a one-off job.
If you don't like writing SQL statements yourself, you can use
web.py's db module or SQLAlchemy (more work) to generate them.
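For a one-off job, the loading script itself can stay very small. A minimal sketch, using the stdlib sqlite3 module as a stand-in for MySQL or Postgres (the `contacts` table and its schema are invented for illustration):

```python
# Sketch of the "L" in a one-off ETL: load cleaned records into a table.
# sqlite3 stands in here for MySQL/Postgres; swap in the appropriate
# DB-API driver (MySQLdb, psycopg2, ...) for a real target.
import sqlite3

records = [
    ("Alice", 42, "alice@example.com"),
    ("Bob", 17, "bob@example.com"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, age INTEGER, email TEXT)")

# Parameterized executemany avoids hand-building SQL strings entirely.
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", records)
conn.commit()

count, = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()
print(count)
```

Since all DB-API drivers share this parameterized-query interface, the same script structure carries over to MySQL or Postgres with only the connect call and placeholder style changing.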

If the data loading/cleaning/transformation has to happen on a regular
basis, you may want to investigate something like
http://www.pentaho.com/products/data_integration/. I have had fairly
decent success using the Pentaho Chef suite (link above) for ETL on
telco OLTP data with PostgreSQL as the destination database.


More information about the BangPypers mailing list