Is there a maximum length of a regular expression in python?
Fredrik Lundh
fredrik at pythonware.com
Wed Jan 18 09:14:52 EST 2006
olekristianvillabo at gmail.com wrote:
> I have a regular expression that is approximately 100k bytes. (It is
> basically a list of all known Norwegian postal numbers and the
> corresponding place with | in between. I know this is not the intended
> use for regular expressions, but it should nonetheless work.)
>
> the pattern is
> ur'(N-|NO-)?(5259 HJELLESTAD|4026 STAVANGER|4027 STAVANGER........|8305
> SVOLVÆR)'
>
> The error message I get is:
> RuntimeError: internal error in regular expression engine
you're most likely exceeding the allowed code size (usually 64k).
however, putting all postal numbers in a single RE is a horrid abuse of the RE
engine. why not just scan for "(N-|NO-)?(\d+)" and use a dictionary to check
if you have a valid match?
import re

# postal number -> place name
postcodes = {
    "5259": "HJELLESTAD",
    # ...
    "9999": "ØSTRE FJORDVIDDA",
}

for m in re.finditer(r"(N-|NO-)?(\d+) ", text):
    prefix, number = m.groups()
    try:
        place = postcodes[number]
    except KeyError:
        continue  # not a known postal number
    if not text.startswith(place, m.end()):
        continue  # the number is not followed by its place name
    # got a match!
    print prefix, number, place
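
for anyone on python 3, here's a minimal self-contained sketch of the same idea, wrapped in a function so it can be tested. the function name find_postcodes and the two-entry sample table are made up for illustration; a real table would hold every postal number.

```python
import re

# hypothetical sample table; a real one would hold every postal number
POSTCODES = {
    "5259": "HJELLESTAD",
    "4026": "STAVANGER",
}

# optional N-/NO- prefix, the digits, then the space before the place name
PATTERN = re.compile(r"(N-|NO-)?(\d+) ")

def find_postcodes(text, postcodes=POSTCODES):
    """Return (prefix, number, place) for each valid postal number in text."""
    matches = []
    for m in PATTERN.finditer(text):
        prefix, number = m.groups()
        place = postcodes.get(number)
        if place is None:
            continue  # digits that are not a known postal number
        if not text.startswith(place, m.end()):
            continue  # known number, but followed by the wrong place name
        matches.append((prefix, number, place))
    return matches

print(find_postcodes("send to NO-5259 HJELLESTAD or 4026 OSLO"))
```

the pattern stays tiny no matter how many postal numbers there are, and each candidate costs one dictionary lookup, so the compiled code size never becomes an issue.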
</F>