[Spambayes] Mail with problem
Tim Stone - Four Stones Expressions
Thu Nov 14 19:48:36 2002
Depending on what kind of regex engine Python has (NFA or DFA) and on how the
HTML parsing regex is written relative to that engine, matching can take an
enormous amount of memory. For example, with a backtracking NFA and a regex
that uses alternation in certain ways, the stack can grow exponentially.
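To illustrate the concern above: Python's re module is a backtracking matcher
(NFA-style, in the terms used here), so some alternations under a repeat make
a failed match very expensive. This is only a minimal sketch, not taken from
tokenizer.py. The pattern and the helper name time_failed_match are made up
for illustration, and it measures elapsed time rather than stack use.

```python
import re
import time

# Alternation under a repeat: a run of "a"s can be split into
# "a" and "aa" pieces in many different ways.
pattern = re.compile(r"(a|aa)+$")

def time_failed_match(n):
    """Return the seconds spent failing to match n 'a's followed by 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    # The trailing "b" makes a match impossible, so the engine has to
    # exhaust every way of splitting the run before giving up.
    assert result is None
    return elapsed

if __name__ == "__main__":
    for n in (10, 15, 20):
        print(n, time_failed_match(n))
```

Timings depend on the machine and the Python version, so compare how they
change as n grows rather than the absolute numbers.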
We may want to take a hard look at tokenizer's html parsing regex. I looked
at it briefly yesterday, but didn't pay much attention.
Tim, do you know if the Python regex engine is NFA or DFA? If it's NFA, is
there a DFA engine we can plug in?
11/14/2002 1:24:58 PM, Tim Peters <email@example.com> wrote:
>> The enclosed file contains a mail which, when received or trained through
>> pop3proxy, gives me the following error:
>> (MacOS 9.1, 24 MB of memory for Python 2.2.1)
>Looks like the regular expression engine runs out of (C) stack space while
>trying to find HTML tags to strip. I don't know enough about Macs to
>suggest something specific, but in general you have to do whatever it takes
>to convince the OS to give the program more stack space to work with.
>Short of that, reducing the instances of "2048" in html_re in tokenizer.py
>should make the problem go away, but since C stack space limits are
>platform-specific, it's impossible to say how small "is safe" for you
>without simply trying it over and over until the error goes away.
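The workaround Tim Peters describes amounts to lowering the upper bound on a
counted repeat. The sketch below is an assumption about the general shape of
such a tag-stripping pattern, not the actual html_re from tokenizer.py. The
names make_html_re, strip_tags and max_tag_len are hypothetical.

```python
import re

def make_html_re(max_tag_len):
    """Build a tag-matching regex whose repeat is capped at max_tag_len.

    Hypothetical stand-in for html_re; the real pattern may differ.
    """
    # "<", not followed by whitespace or another angle bracket, then at
    # most max_tag_len characters that are not ">", then the closing ">".
    return re.compile(r"<(?![\s<>])[^>]{0,%d}>" % max_tag_len, re.DOTALL)

def strip_tags(text, max_tag_len=2048):
    """Replace each tag-like span with a single space."""
    return make_html_re(max_tag_len).sub(" ", text)

if __name__ == "__main__":
    print(repr(strip_tags("<b>hi</b>")))
    # With a smaller cap, an over-long "tag" is left in the text untouched.
    long_tag = "<" + "x" * 300 + ">"
    print(strip_tags(long_tag, max_tag_len=256) == long_tag)
```

A smaller cap limits how far the engine can run while looking for the closing
">", at the cost of leaving unusually long tags in the text.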