Using Python for processing of large datasets (convincing management)

William Park opengeometry at NOSPAM.yahoo.ca
Sat Jul 6 13:54:40 EDT 2002


Thomas Jensen <null at obscure.dk.x> wrote:
> Hello group (and list :-),
> 
> I've used Python for several years (and followed this group until about 6
> months ago).  I work in a small company which specialises in collecting
> and processing financial data. Most of our production environment is based
> on Microsoft stuff like ASP/VBScript, VB6, WinNT, MS SQL Server, etc.
> 
> One of the next development tasks is rewriting the nightly processing job
> which is having problems with our ~100mb database (it is written in
> Borland C++, but absolutely not optimized for speed!).
> 
> The goals of the rewritten piece of software would be:
> * Improved speed
> * Improved scalability - parallel processing on multiple machines/CPUs
> * Improved scalability - ability to handle greater databases (>1gb)
> * Ability to calculate only a subset of the data
> 
> Now, instead of rewriting the job in C++, I'd (of course) like to use 
> Python.  However the CEO (small company, told you :-), made a couple of
> somewhat valid points against it.
> 1) He was worried about getting a replacement developer in case I left.
> 2) He said, "Name 3 companies using Python for key functions"
> 3) He was worried about the stability/reliability of python in our 
> production environment (you know, 99.999 % and all that)
> 
> I was hoping someone in this group could help with some really 
> compelling arguments, as I'd really like to use Python for this job.


If your nightly job struggles at 100MB and has to reach 1GB, then I don't
think this is a programming language issue.  Rather, you should look at your
algorithms and data structures.

If your company is a private, for-profit company, then use the money argument:

    - Anyone who knows Python or Unix shell will have the necessary
      analytical skills.  And they can easily be found on
      <comp.lang.python> or <comp.unix.shell>.

    - Scaling to multiple CPUs is an OS issue.  Much easier with Linux (no
      comment on Windows :-)

    - Scaling to GB is an algorithm issue.  Python makes development easier,
      because it's easy to write and read.
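
To make the last two points concrete, here is a rough sketch, not a
definitive design.  It assumes the nightly job can be reduced to totalling an
amount per symbol over two-column "symbol,amount" rows; that input layout and
the names summarize, merge and summarize_parallel are made up for
illustration.  Each chunk is read one row at a time, so memory does not grow
with the row count, and the chunks are handed to separate OS processes using
the standard multiprocessing module.

```python
import csv
from collections import defaultdict
from multiprocessing import Pool


def summarize(lines):
    """Total the amount per symbol, reading one row at a time.

    Memory use grows with the number of distinct symbols, not with the
    number of rows.  Assumes every row is exactly "symbol,amount".
    """
    totals = defaultdict(float)
    for symbol, amount in csv.reader(lines):
        totals[symbol] += float(amount)
    return dict(totals)


def merge(partials):
    """Combine per-chunk totals into one result."""
    merged = defaultdict(float)
    for partial in partials:
        for symbol, total in partial.items():
            merged[symbol] += total
    return dict(merged)


def summarize_parallel(chunks, workers=2):
    """Hand each chunk to a separate OS process, then merge the results."""
    with Pool(workers) as pool:
        return merge(pool.map(summarize, chunks))
```

Here chunks is a list of lists of lines, for example one list per day or per
market.  On Windows, call summarize_parallel from under an
if __name__ == "__main__": guard, because multiprocessing starts its workers
by re-importing the main module there.  Processing only a subset of the data
then amounts to passing only the chunks you want.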

Mostly, he saves money because he will be able to find the right people.  The
fact that they happen to know the right language is just a bonus.

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin


