Python segfault

Thomas Wouters thomas at xs4all.net
Fri Jun 16 07:33:55 EDT 2000


On Fri, Jun 16, 2000 at 08:29:44AM +0200, Peter Schneider-Kamp wrote:

> > I don't know if anyone has seen this,  but python segfaults when given a
> > rather large input.

> > [pat at perly pat]$ perl -e "print '2+2+' x 10000" > input
> > [pat at perly pat]$ python < input

> Strange. Looks like the limit is exactly at 8192. That makes 16384
> operations the parser (or whatever crashes) can hold.

> Not that the problem is not the length of the input itself:

> $ perl -e "print '22222222 ' x 200000 > input
> $ python < input

> works just fine. I think the problem is the add (or mul) operation.
> So the problem might be in the expression parser.

> Is there anyone with some more insight around?

The problem is not the length of the expression, nor the 'depth' of the
expression (as someone else suggested), but the 'width' of the expression. 

Each expression (a 'node') is parsed into X different expressions (also
nodes) which are the 'children' of the current node. And each of those nodes
is parsed just like the first one, and so on.


So, the expression '2 + 2 + 2 + 2' is parsed somewhat like this:

            '2+2+2+2'
               /|\
              / | \
             / /|\ \
            / / | \ \
           / / /|\ \ \
          / / / | \ \ \
          2 + 2 + 2 + 2

(actual operation to perform ommited, obviously)
Whereas '(2 + 2) + 2 + 2' is parsed like this:

         '(2+2)+2+2'
               |\
              /| \
             / | |\
            /  | | \
         (2+2) 2 + 2
          /|\
         / | \
        2  +  2

The problem is not the 'height' of this graph, or the total number of
'nodes' (which is limited only by memory), but the 'width' of the first
tree. The 'node' struct is defined like this:

typedef struct _node {
        short           n_type;
        char            *n_str;
        short           n_lineno;
        short           n_nchildren;
        struct _node    *n_child;
} node;

and the 'n_nchildren' member is the number of immediate subexpressions (not
grandchildren, only children) -- and it is limited to a short, which is 16
bits on most systems. And the short is signed, which means the maximum
number of children is 32767. This includes the operators itself, so you can
operate on just 16384 items in a single expression. If you try to operate on
more, the short wraps, and the parser tries to index node->n_child[-32768],
which usually segfaults.

If you really hit this limit, you can add a couple of parentheses to solve
the problem ;) But I agree the segfault is not very nice. There should
probably be an overflow detection somewhere in node.c, but I'm not sure what
kind of error it should generate.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!




More information about the Python-list mailing list