Using Python for processing of large datasets (convincing management)

Thomas Jensen spam at ob_scure.dk
Sun Jul 7 06:17:10 EDT 2002


Matt Gerrans wrote:
> BCB is great -- and you can still use it for the performance-critical areas
> by creating COM Automation servers which are a snap to call from Python.
> One of the great things about Python is that it works well with C/C++ so
> that you can eat your cake and have it too.

I'll consider it. However, I don't want to rely too much on COM, in case
we migrate to some Unix flavour some day.
Maybe I'll look into writing some of the algorithms as C modules, but I'm
not sure it's worth it (see below).

>>The goals of the rewritten piece of software would be:
>>* Improved speed
> 
> Python is not going to help in this area, unfortunately, unless you are
> talking about improved speed of development!  ;-)

Yes and no. In my estimate, the primary reason for the slowness of the
current job is inefficient data access and data handling.
A few examples:
* Using "SELECT TOP 1 value FROM T_MyTable ORDER BY value DESC" to get
the maximum value.
* Linear searches
* LOTS of SQL calls returning only one row
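
A minimal sketch of how those three points might be addressed, assuming an
in-memory SQLite database as a stand-in for the real one. The table name
T_MyTable and the value column come from the example above; the id column,
the sample data and the function names are hypothetical. Whether MAX()
actually beats the TOP 1 query depends on the database and its indexes, so
this shows the shape of the fix rather than a measured result.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T_MyTable (id INTEGER PRIMARY KEY, value INTEGER)")
    # Hypothetical sample data: 1000 rows with values in the range 0..100.
    rows = [(i, (i * 37) % 101) for i in range(1, 1001)]
    conn.executemany("INSERT INTO T_MyTable (id, value) VALUES (?, ?)", rows)
    return conn


def max_value(conn):
    """Ask for the maximum directly instead of sorting and taking one row."""
    return conn.execute("SELECT MAX(value) FROM T_MyTable").fetchone()[0]


def load_values(conn):
    """Fetch every row in one query and index the result by id.

    This replaces many single-row SQL calls with one call, and the dict
    replaces a linear search with a hash lookup.
    """
    return dict(conn.execute("SELECT id, value FROM T_MyTable"))
```

With this in place, `load_values(conn)[some_id]` is a dictionary lookup
rather than another round trip to the database.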

When the job was originally written, a lot of factors were not known and,
well, it did its job.
However, the amount of data now requires better performance.

>>* Improved scalability - parallel processing on multiple machines/CPUs
> 
> This might be more easily accomplished with Java, depending on exactly how
> you intend to implement it.   Java is probably the best tool for distributed
> processing; in particular JINI is ideal for this kind of thing.

I don't know much Java, I must admit. However, for my needs I believe
XML-RPC will do just fine.
I've looked a little into DCOM and, while I won't use it, I must admit I
like the idea of having the component automatically instantiated on the
remote machine instead of having to have a running server. Hmm, if only
Windows had inetd.

>>* Improved scalability - ability to handle greater databases (>1gb)
>
> This is probably more dependent on your design than the language or platform
> you choose.

Yes, but I also think that Python makes it easier to get the design right.

>>Now, instead of rewriting the job in C++, I'd (of course) like to use
>>Python.
> 
> Naturally!

:-)

>>However the CEO (small company, told you :-), made a couple of somewhat
>>valid points against it.
>>1) He was worried about getting a replacement developer in case I left.
> 
> I don't think that is a problem at all, these days.   I think Python
> developers are becoming pretty ubiquitous.   On top of that, any experienced
> programmer can learn Python in a snap -- it is so engaging that it is fun
> and quick to learn.

I'm happy so many people seem to think that. Last week I sent a Python
example to one of my co-workers and he picked it up immediately (he's a
VB/ASP developer with a little C++ knowledge).
When told about it, most people are somewhat sceptical about using
indentation for program structure (funny, it seems to be the same people
that never indent their code out of laziness or whatever ;-). However,
when they see an actual program, most realize how beneficial it is.

>>2) He said, "Name 3 companies using Python for key functions"
> 
> I'd bet *every* company in the Fortune 500 uses Python for one thing or
> another, whether they know it or not.   Many are probably using it for very
> important functions; they just don't advertise it.   Why should they --
> their business is not about explaining how they accomplish every task, it is
> about doing it.   I have developed Python code for one of the largest of
> them that is very key to their business, but I doubt that the CEO would know
> of it or that the company would tout this fact -- what they care about is
> creating and selling their products.

And that's what they should care about, I guess. It's funny how it works,
isn't it? Our CEO is very worried about all this Open Source stuff (be
it Python, Linux, *BSD, MySQL, whatever). The problem is not the free as
in speech - it's the free as in beer. Many people simply can't believe
that something that is gratis can be any good (which is probably a good
rule of thumb when speaking of material goods like books and such, just
not software).
Being able to buy MySQL might actually be the convincing factor.

[snip]

> I think the most compelling argument you can come up with is to write a demo
> in Python that works on a subset of the data, as you mentioned above.    The
> speed with which you can develop and the quality of the code you develop
> will be the biggest selling factor.

I'm doing it as we speak, shouldn't take long :-)

> Be aware that your demo could also convince you that Python is not the right
> tool for the job as well.   Python is a great tool, but it is not the best
> tool for *every* task.

I believe it's the perfect tool for this case, but if it's not, it's
better to find out now, I think.

-- 
Best Regards
Thomas Jensen
(remove underscore in email address to mail me)



