[New-bugs-announce] [issue22434] Use named constants internally in the re module

Serhiy Storchaka report at bugs.python.org
Wed Sep 17 18:09:48 CEST 2014


New submission from Serhiy Storchaka:

The regular expression parser parses a pattern into a tree, marking nodes with string identifiers. The regular expression compiler converts this tree into a plain list of integers, transforming the node identifiers into sequential integers. The resulting list is not human-readable. The proposed patch converts the string constants in the sre_constants module into named integer constants. These constants don't need to be converted to integers, because they already are integers, and when printed they look human-friendly. As a result, the intermediate output of the regular expression compiler is much more readable.
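One way to get constants that already are integers yet print by name is an int subclass that stores a name and overrides __repr__. The following is a minimal sketch under that assumption; the class name NamedIntConstant is hypothetical, and this is not the code in re_named_consts.patch.

```python
class NamedIntConstant(int):
    """Hypothetical int subclass that prints as its symbolic name.

    Illustration only; not the implementation from the patch.
    """

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name  # int subclasses get a __dict__, so this works
        return self

    def __repr__(self):
        return self.name

    __str__ = __repr__


# Values taken from the before/after example in this report.
LITERAL = NamedIntConstant(19, 'LITERAL')
MAXREPEAT = NamedIntConstant(2147483647, 'MAXREPEAT')

code = [LITERAL, 95, MAXREPEAT]
print(code)                      # [LITERAL, 95, MAXREPEAT]
print(LITERAL == 19)             # True: still compares as an ordinary int
print([int(op) for op in code])  # [19, 95, 2147483647]
```

Because the constants are real ints, anything that consumes the compiled code as integers sees the same values; only repr() (and therefore list printing) changes.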

Example.

>>> import re, sre_compile, sre_parse
>>> sre_compile._code(sre_parse.parse('[a-z_][a-z_0-9]+', re.I), re.I)

Before patch:

[17, 4, 0, 2, 2147483647, 16, 7, 27, 97, 122, 19, 95, 0, 29, 16, 1, 2147483647, 16, 11, 10, 0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, 0, 1, 1]

After patch:

[INFO, 4, 0, 2, MAXREPEAT, IN_IGNORE, 7, RANGE, 97, 122, LITERAL, 95, FAILURE, REPEAT_ONE, 16, 1, MAXREPEAT, IN_IGNORE, 11, CHARSET, 0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, FAILURE, SUCCESS, SUCCESS]
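The two lists above can be checked against each other. Assuming the named constants are int subclasses like the hypothetical NamedIntConstant below, and using values inferred only by lining up the two lists in this report (not read from sre_constants), the after-patch list compares equal to the before-patch list element by element; only its printed form differs.

```python
class NamedIntConstant(int):
    """Hypothetical int subclass that prints as its symbolic name."""

    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return self.name


# Values inferred by aligning the "before" and "after" lists in this report.
FAILURE = NamedIntConstant(0, 'FAILURE')
SUCCESS = NamedIntConstant(1, 'SUCCESS')
CHARSET = NamedIntConstant(10, 'CHARSET')
IN_IGNORE = NamedIntConstant(16, 'IN_IGNORE')
INFO = NamedIntConstant(17, 'INFO')
LITERAL = NamedIntConstant(19, 'LITERAL')
RANGE = NamedIntConstant(27, 'RANGE')
REPEAT_ONE = NamedIntConstant(29, 'REPEAT_ONE')
MAXREPEAT = NamedIntConstant(2147483647, 'MAXREPEAT')

before = [17, 4, 0, 2, 2147483647, 16, 7, 27, 97, 122, 19, 95, 0, 29, 16, 1,
          2147483647, 16, 11, 10, 0, 67043328, 2147483648, 134217726,
          0, 0, 0, 0, 0, 1, 1]

after = [INFO, 4, 0, 2, MAXREPEAT, IN_IGNORE, 7, RANGE, 97, 122, LITERAL, 95,
         FAILURE, REPEAT_ONE, 16, 1, MAXREPEAT, IN_IGNORE, 11, CHARSET,
         0, 67043328, 2147483648, 134217726, 0, 0, 0, 0, FAILURE, SUCCESS,
         SUCCESS]

print(after == before)  # True: the same integers, element by element
print(after[:6])        # [INFO, 4, 0, 2, MAXREPEAT, IN_IGNORE]
```

Note that the second 16 in the list (a skip value, not an opcode) stays a plain integer in the after-patch output, which is consistent with only the opcodes and MAXREPEAT being named.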

This patch also affects the debugging output when a regular expression is compiled with re.DEBUG (identifiers are uppercased, and MAXREPEAT is displayed instead of 2147483647 in repeat statements).

Apart from the debugging output, these changes are invisible to ordinary users. They are needed only for developing and debugging the re module itself. The patch does not affect performance and has almost no effect on memory consumption.

----------
components: Regular Expressions
files: re_named_consts.patch
keywords: patch
messages: 227008
nosy: ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Use named constants internally in the re module
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file36642/re_named_consts.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22434>
_______________________________________

