[Python-Dev] Should we move to replace re with regex?

Ezio Melotti ezio.melotti at gmail.com
Sun Aug 28 05:59:59 CEST 2011

On Sat, Aug 27, 2011 at 4:56 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sat, 27 Aug 2011 04:37:21 +0300
> Ezio Melotti <ezio.melotti at gmail.com> wrote:
> >
> > I'm not sure it's worth doing an extensive review of the code, a better
> > approach might be to require extensive test coverage  (and a review of
> > tests).  If the code seems well written, commented, documented (I think
> > proper rst documentation is still missing),
>
> Isn't this precisely what a review is supposed to assess?

This can be done without actually knowing and understanding every single
function in the module (I got the impression that someone wants this kind of
review, correct me if I'm wrong).

> > We will get familiar with the code once we start contributing
> > to it and fixing bugs, as it already happens with most of the other
> > modules.
>
> I'm not sure it's a good idea for a module with more than 10000 lines
> of C code (and 4000 lines of pure Python code). This is several times
> the size of multiprocessing. The C code looks very cleanly written, but
> it's still a big chunk of algorithmically sophisticated code.

Even unicodeobject.c is 10k+ lines of C code and I got familiar with (parts
of) it just by fixing bugs in specific functions.
I took a look at the regex code and it seems clear, with enough comments and
several small functions that are easy to follow and understand.
multiprocessing requires good knowledge of a number of concepts and
platform-specific issues that make it more difficult to understand and
maintain (but maybe regex-related concepts seem easier to me because I'm
already familiar with them).

I think it would be good to:
  1) have some document that explains the general design and main (internal)
     functions of the module (e.g. a PEP);
  2) do a review on Rietveld (possibly only of the diff against re, to limit
     the review to the new code), so that people can ask questions, discuss
     and understand the code;
  3) possibly update the document/PEP with the outcome of the Rietveld
     review(s) and/or address the issues discussed (if any);
  4) add documentation for the module and its (public) functions in
     Doc/library (this should be done anyway).

This will ensure that the general quality of the code is good, and when
someone actually has to work on the code, there's enough documentation to
make it possible.

Best Regards,
Ezio Melotti

> Another "interesting" question is whether it's easy to port to the PEP
> 393 string representation, if it gets accepted.
> Regards
> Antoine.