Perl / python regex / performance comparison
Ciprian Dorin, Craciun
ciprian.craciun at gmail.com
Tue Mar 3 21:47:16 CET 2009
On Tue, Mar 3, 2009 at 7:03 PM, Ivan <ivan at invalid.com> wrote:
> Hello everyone,
> I know this is not a direct python question, forgive me for that, but
> maybe some of you will still be able to help me. I've been told that
> for my application it would be best to learn a scripting language, so
> I looked around and found Perl and Python to be the nicest. Their
> syntax and "way" are not similar, though.
> So, I was wondering, could any of you please elaborate on the
> following, as to ease my dilemma:
> 1. Although it is all relatively similar, there are differences
> between regexes of these two. Which do you believe is the more
> powerful variant (maybe an example) ?
> 2. They are both interpreted languages, and I can't really be sure how
> they measure in speed. In your opinion, for handling large files,
> which is better ?
> (I'm processing files of numerical data of several hundred MB - let's
> say 200 MB - how would Python handle a file of such size, as compared
> to Perl?)
> 3. This last one is somewhat subjective, but what do you think, in the
> future, which will be more useful. Which, in your (humble) opinion
> "has a future" ?
> Thank you for all the info you can spare; I'm especially grateful for
> the time you take in doing so.
> -- Ivan
I can answer your second question (whether Python will handle large
files). In my case I use Python to compute statistics from the trace
files of a genetic algorithm; at the moment that is up to 20 MB for
about 40 files. I do the following:
* use regular expressions to identify each line type and extract the
information (as numbers);
* either compute statistics on the fly, or load the dumped data into
an SQLite3 database (which has grown to a couple of hundred MB);
* everything has worked fine so far.
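The first two steps can be sketched roughly as below. This is only an
illustration: the two trace-line formats, the table layout and the
names parse_line and load_trace are made up for the example, since the
real trace files are not shown here. Only the standard re and sqlite3
modules are used.

```python
import re
import sqlite3

# Hypothetical trace-line formats; the real trace files are not shown in the post.
GENERATION_RE = re.compile(r"^generation\s+(\d+)\s+best=([-+]?\d+(?:\.\d+)?)$")
MUTATION_RE = re.compile(r"^mutation\s+(\d+)\s+rate=([-+]?\d+(?:\.\d+)?)$")
PATTERNS = (("generation", GENERATION_RE), ("mutation", MUTATION_RE))


def parse_line(line):
    """Return (line_type, item_id, value), or None if the line is not recognised."""
    line = line.strip()
    for line_type, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return line_type, int(match.group(1)), float(match.group(2))
    return None


def load_trace(lines, connection):
    """Insert every recognised line into a table; return the number of rows added."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS trace (line_type TEXT, item_id INTEGER, value REAL)"
    )
    # A generator keeps only one parsed line in memory at a time.
    rows = (parsed for parsed in map(parse_line, lines) if parsed is not None)
    before = connection.total_changes
    connection.executemany("INSERT INTO trace VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection.total_changes - before


# Small in-memory demonstration with made-up lines.
connection = sqlite3.connect(":memory:")
sample_lines = ["generation 1 best=0.5\n", "not a trace line\n", "mutation 2 rate=0.01\n"]
inserted = load_trace(sample_lines, connection)
```

For a real file you would pass the open file object instead of the
list, because iterating over a file yields one line at a time and the
whole file never has to sit in memory.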
I've also used Python (more precisely, an application built in Python
with cElementTree?) that took the Wikipedia XML dumps (7GB? I'm not
sure, but a couple of GB) and created a custom-format file, from
which I then tried to create SQL inserts... And everything worked
well. (Of course, all the processing took some time.)
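A multi-GB XML file can be read without loading it whole by parsing
it incrementally. The sketch below is not the application mentioned
above; it assumes a dump made of <page> elements that each contain a
<title>, and the function name iter_page_titles is made up. It uses
xml.etree.ElementTree, because the separate cElementTree module no
longer exists in current Python versions.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Yield the <title> text of each <page>, one page at a time."""
    for event, element in ET.iterparse(source, events=("end",)):
        if local_name(element.tag) != "page":
            continue
        for child in element.iter():
            if local_name(child.tag) == "title":
                yield child.text
                break
        # Discard the page's children and text; the emptied element
        # itself remains attached to its parent.
        element.clear()


# Small demonstration on an in-memory document (assumed layout).
sample = io.BytesIO(
    b"<mediawiki><page><title>A</title><text>x</text></page>"
    b"<page><title>B</title></page></mediawiki>"
)
titles = list(iter_page_titles(sample))
```

The same function accepts a file name or an open binary file, so a
dump on disk is handled the same way as the small sample.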
So my conclusion is that if you keep your in-memory data small and
use the right solution for the problem, you can use Python without
(big) overhead.
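As one way to keep the in-memory data small, statistics can be
updated one value at a time instead of collecting all values first.
The sketch below uses Welford's method for the mean and standard
deviation; that choice, and the names RunningStats and summarise, are
assumptions for illustration and not what my scripts actually do.

```python
import math


class RunningStats:
    """Count, mean and sample standard deviation, updated value by value."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        """Sample standard deviation; 0.0 when fewer than two values were seen."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def summarise(values):
    """Consume any iterable of numbers using constant memory."""
    stats = RunningStats()
    for value in values:
        stats.add(value)
    return stats


# The argument can be a generator, e.g. numbers parsed from a large file.
stats = summarise(float(x) for x in ["2", "4", "4", "4", "5", "5", "7", "9"])
```

Memory use does not depend on how many values are fed in, so a 200 MB
file costs the same here as a 2 KB one, apart from the running time.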
As another side note, I've also used Python (with NumPy) to implement
neural networks (in fact clustering with ART), where I had about 20
thousand training elements (arrays of thousands of elements), and it
worked remarkably well (I would say better than in Java, and comparable
I hope I've helped you,
P.S. If you just need to transform text with a single regular
expression, or you only need regular-expression searching, then just
use sed or grep, as you will not get anything better than them.