RegExp, Python and strings

Tim Peters tim_one at email.msn.com
Sat Jan 8 17:28:06 EST 2000


[Matthias Huening]
> I'm new to PYTHON and I am trying to find out if PYTHON is the
> right language for work with strings (which is what I need most).

In my experience, the best languages for working with strings are SNOBOL4
and Icon (http://www.cs.arizona.edu/icon/).  They're outstanding for
strings, but not so hot for many other things.  Python is not outstanding
for strings, but pretty darn good at darned near everything (incl. strings).

> Therefore I am playing around with the RE-module.
>
> In PERL I can do things like this (in one line):
> $A = "Rossum, Guido van; Harms, Daryl; Python, Franz-Josef";
> $A =~ s/([A-Z])[\w]+(?![ \w\-]+,)([ ;-]|$)/$1.$2/g;
> This RegExp results in: "Rossum, G. van; Harms, D.; Python, F.-J."

As Andrew showed, there's an almost direct translation of this into Python.
If fitting things "in one line" is an important goal, you'll be happier with
Perl; Python doesn't strive to be a one-liner kind of language.
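
A minimal sketch of what such a translation might look like (not necessarily
the one Andrew posted; it assumes a current Python, and the names `names`,
`pattern` and `abbreviated` are made up for illustration):

```python
import re

names = "Rossum, Guido van; Harms, Daryl; Python, Franz-Josef"

# Same pattern as the Perl version; the raw string keeps the backslashes intact.
pattern = re.compile(r"([A-Z])\w+(?![ \w\-]+,)([ ;-]|$)")

# \1 is the captured initial, \2 the separator (empty at end of string).
abbreviated = pattern.sub(r"\1.\2", names)
print(abbreviated)  # Rossum, G. van; Harms, D.; Python, F.-J.
```

It takes a few more lines than the Perl, mostly because the pattern, the
substitution and the assignment are spelled out separately.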

String manipulation is also (in general) quicker using Perl, although that's
subject to change in future releases (in particular, relative to Python
1.5.2, Perl has special speed advantages in running the same regexp
thousands of times over relatively short (say, < 80 chars) strings).

Regexps in Python are also "just another module"; they're not integrated
deeply into the language as they are in Perl.  It's fair to say that string
manipulation in Python gets more effective the more you get away from trying
to "write Perl in Python".  Python's non-regexp string module functions
(like string.find and string.split) are too often overlooked by new
immigrants.
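
As a rough illustration of how far those get you without any regexp, here is
a small sketch.  It assumes a current Python, where the same operations are
spelled as string methods (`s.find`, `s.split`) rather than string module
functions; the variable names are invented for the example:

```python
entry = "Rossum, Guido van"

# find returns the index of the first comma, or -1 if there is none.
comma = entry.find(",")
surname = entry[:comma]
given = entry[comma + 1:].strip()

# split breaks a string on a literal separator and returns a list.
fields = "Rossum, Guido van; Harms, Daryl".split("; ")

print(surname, "|", given)
print(fields)
```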

> Now I am trying to rebuild this in PYTHON, but I can't get it
> to work.  Any hints? Should I keep trying? Or should I stick
> to PERL for those kinds of string-manipulations?

If you're thoroughly happy with Perl, there's no need to switch -- but if
you were, I doubt you'd be asking the question <wink>.  Maybe you could say
more about what you're trying to get *away* from, and we can try to guess
whether you'll be happier with Python on those bases.

While you *can* do the regexp substitution above very much the same way you
do it in Perl, a "native Pythoneer" is likely to think about ways to break
it into smaller problems (e.g., for starters, using string.split to break the
input on semicolons).  In this specific problem that's mostly a matter of
taste, but Python *encourages* decomposition in ways Perl does not (e.g.,
there are no embedded assignments, and in various other ways Python erects
barriers against trying to "do too much" in one line -- you can fight that
with some success, but the result will be strained).
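
One way such a decomposition might go, as a sketch only: it assumes a current
Python (string methods instead of the string module), the helper names
`shorten_word` and `abbreviate` are hypothetical, and it is not guaranteed to
agree with the regexp on every input, only on names shaped like the example:

```python
def shorten_word(word):
    # Abbreviate each hyphen-separated part that starts with a capital
    # letter; leave lowercase particles such as "van" alone.
    parts = word.split("-")
    return "-".join(p[0] + "." if p[:1].isupper() else p for p in parts)


def abbreviate(names):
    result = []
    # Break the input on semicolons, then each entry on its first comma.
    for entry in names.split(";"):
        surname, given = entry.split(",", 1)
        given_short = " ".join(shorten_word(w) for w in given.split())
        result.append(surname.strip() + ", " + given_short)
    return "; ".join(result)


print(abbreviate("Rossum, Guido van; Harms, Daryl; Python, Franz-Josef"))
```

Each step is small enough to test on its own, which is the point of breaking
the problem up this way.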

if-you-"think-in-perl"-perl-is-the-right-language-ly y'rs  - tim





