[Chicago] Announcing Datahub 0.7

Lukasz Szybalski szybalski at gmail.com
Mon Jan 12 06:13:10 CET 2009


http://lucasmanual.com/mywiki/DataHub


 *Datahub is a tool that speeds up downloading/crawling, parsing,
loading, and visualizing data. It does this by dividing each step
into its own work folder. Each work folder contains sample files that
you can start coding from.
 *Datahub is for people who have found an interesting data source and
want to download it, parse it, load it into a database, provide some
documentation, and visualize it. Datahub speeds up the process by
creating a folder for each of these actions. You create all your
programs from the default base template and move on to analyzing the
data in no time.

How to get started: Datahub is a Python-based tool; here is how to run it.

**Create a Python virtual environment:
virtualenv --no-site-packages datahubENV
source datahubENV/bin/activate


**How to get it:
wget http://launchpad.net/datahub/trunk/0.7/+download/datahub-0.7.tar.gz
tar -xzvf datahub-0.7.tar.gz

** Install it:
cd datahub-0.7/
python setup.py install

**Create your project using the Datahub default templates:

paster create --list-templates
paster create -t datahub

** Where do I start:
The commands above create a project skeleton with 5 folders: crawl
(sample code to download via wget or harvestman), parse (where you
parse the raw data), load (where you load the data into a database
using sqlalchemy or a tool of your choice), hdf5 (convert to hdf5 if
you don't want to use a database), and wiki (provide some
documentation).
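
To give a feel for what the parse and load folders are for, here is a
minimal sketch of those two steps. It is not the code from the Datahub
templates: it uses only the Python standard library (csv and sqlite3
instead of sqlalchemy), and the sample data, the function names, and
the "records" table are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw data, standing in for a file saved by the crawl step.
RAW = "name,value\nalpha,1\nbeta,2\n"


def parse(raw_text):
    """Parse step: turn raw CSV text into a list of (name, value) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["name"], int(row["value"])) for row in reader]


def load(rows, conn):
    """Load step: insert parsed rows into a table and return its row count."""
    # "records" is an illustrative table name, not something Datahub defines.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (name TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


if __name__ == "__main__":
    # An in-memory database keeps the sketch free of side effects.
    connection = sqlite3.connect(":memory:")
    print(load(parse(RAW), connection))
```

Keeping parse and load as separate functions mirrors the folder
layout: you can rerun the load against already parsed data without
downloading or parsing again.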


This is a first release, so feedback is appreciated. Give it a try if
you have some interesting data to deal with.

Thanks,
Lucas

