[Python-Dev] Missing arguments in RE functions

Noam Raphael noamr at myrealbox.com
Tue Sep 7 21:34:11 CEST 2004


Hello,

I've just finished teaching Python to a group of people, and regular
expressions were part of the course. I came across a few missing
features (that is, optional arguments) in the RE functions. I've checked,
and it seems to me that they can be added very easily.

The first missing feature is the "flags" argument in the findall and
finditer functions. Searching for all occurrences of an RE is, of course,
a legitimate action, and I had to use (?s) in my RE instead of passing
re.DOTALL, which, in my opinion, is a lot clearer.
The solution is simple: the functions sub, subn, split, findall and 
finditer all first compile the given RE, with the flags argument set to 
0, and then run the appropriate method. As far as I can see, they could 
all get an additional optional argument, flags=0, and compile the RE 
with it.
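
A minimal sketch of that change, for illustration only. The two
functions below are hypothetical stand-ins defined in the example
itself, not the re module's own code; they assume that each wrapper
simply compiles the pattern with the given flags and delegates to the
compiled object's method:

```python
import re

def findall(pattern, string, flags=0):
    # Hypothetical wrapper: compile with the caller's flags, then delegate.
    return re.compile(pattern, flags).findall(string)

def finditer(pattern, string, flags=0):
    # Same idea for the iterator variant.
    return re.compile(pattern, flags).finditer(string)

text = "<a>\nfoo\n</a>"

# Inline flag inside the pattern (the workaround described above).
inline = findall(r"(?s)<a>.*?</a>", text)

# The same search with the flag passed as an argument.
flagged = findall(r"<a>.*?</a>", text, re.DOTALL)

# Without DOTALL, "." does not match the newlines, so nothing is found.
plain = findall(r"<a>.*?</a>", text)
```

Both spellings find the same match; the second keeps the pattern free
of inline flag syntax.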

The second missing feature is the ability to specify start and end 
indices when doing matches and searches. This feature is available when 
using a compiled RE, but isn't mentioned at all for the module-level
functions. (That's why I didn't even know it was possible until I
checked just now: I naturally assumed that all the functionality was
available through the functions.)
I think these arguments should be added to the functions match, search,
findall and finditer. They aren't documented for the findall and
finditer methods, but I checked, and they seem to work fine.
(In case you are interested in the use case: the exercise was to parse
an XML file. It was done by first matching the beginning of a tag, then
trying to match attributes, and so on, with each match starting from where
the previous successful match ended. Since I didn't know of this feature,
it was done by replacing the original string with a substring after
every match, which is terribly inefficient.)
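
A rough sketch of how the start index helps in that kind of exercise,
using compiled patterns, whose match method accepts a start position.
The patterns, the function name parse_open_tag and the sample input are
all made up for this example, and the "parser" is far too simple for
real XML:

```python
import re

# Hypothetical patterns for the start of a tag and for one attribute.
TAG_OPEN = re.compile(r"<(\w+)")
ATTRIBUTE = re.compile(r'\s+(\w+)="([^"]*)"')

def parse_open_tag(text, pos=0):
    """Return (tag name, attribute dict, end position), or None."""
    m = TAG_OPEN.match(text, pos)
    if m is None:
        return None
    name = m.group(1)
    pos = m.end()
    attrs = {}
    while True:
        # Each match starts where the previous one ended; the string
        # itself is never sliced or copied.
        m = ATTRIBUTE.match(text, pos)
        if m is None:
            break
        attrs[m.group(1)] = m.group(2)
        pos = m.end()
    return name, attrs, pos

doc = '<item id="7" kind="demo">'
result = parse_open_tag(doc)
```

Here the returned position points at the closing ">" of the tag, so the
next step of the parse can continue from there.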

If you approve, I can create a patch in a few minutes and send it.

Have a good day,
Noam Raphael


