[Python-Dev] cpython: replace custom validation logic in the parse module with a simple DFA validator
Benjamin Peterson
benjamin at python.org
Mon Jun 6 02:51:54 EDT 2016
On Sat, Jun 4, 2016, at 17:26, Christian Heimes wrote:
> On 2016-06-02 11:32, benjamin.peterson wrote:
> > https://hg.python.org/cpython/rev/4a9159ea2536
> > changeset: 101601:4a9159ea2536
> > user: Benjamin Peterson <benjamin at python.org>
> > date: Thu Jun 02 11:30:18 2016 -0700
> > summary:
> > replace custom validation logic in the parse module with a simple DFA validator (closes #26526)
> >
> > Patch from A. Skrobov.
> >
> > files:
> > Misc/NEWS | 3 +
> > Modules/parsermodule.c | 2545 +--------------------------
> > 2 files changed, 96 insertions(+), 2452 deletions(-)
> >
> >
> > diff --git a/Misc/NEWS b/Misc/NEWS
> > --- a/Misc/NEWS
> > +++ b/Misc/NEWS
> > @@ -22,6 +22,9 @@
> > Library
> > -------
> >
> > +- Issue #26526: Replace custom parse tree validation in the parser
> > + module with a simple DFA validator.
> > +
> > - Issue #27114: Fix SSLContext._load_windows_store_certs fails with
> > PermissionError
> >
> > diff --git a/Modules/parsermodule.c b/Modules/parsermodule.c
> > --- a/Modules/parsermodule.c
> > +++ b/Modules/parsermodule.c
> > @@ -670,9 +670,75 @@
> >
> >
> > static node* build_node_tree(PyObject *tuple);
> > -static int validate_expr_tree(node *tree);
> > -static int validate_file_input(node *tree);
> > -static int validate_encoding_decl(node *tree);
> > +
> > +static int
> > +validate_node(node *tree)
> > +{
> > + int type = TYPE(tree);
> > + int nch = NCH(tree);
> > + dfa *nt_dfa;
> > + state *dfa_state;
> > + int pos, arc;
> > +
> > + assert(ISNONTERMINAL(type));
> > + type -= NT_OFFSET;
> > + if (type >= _PyParser_Grammar.g_ndfas) {
> > + PyErr_Format(parser_error, "Unrecognized node type %d.", TYPE(tree));
> > + return 0;
> > + }
> > + nt_dfa = &_PyParser_Grammar.g_dfa[type];
> > + REQ(tree, nt_dfa->d_type);
> > +
> > + /* Run the DFA for this nonterminal. */
> > + dfa_state = &nt_dfa->d_state[nt_dfa->d_initial];
> > + for (pos = 0; pos < nch; ++pos) {
> > + node *ch = CHILD(tree, pos);
> > + int ch_type = TYPE(ch);
> > + for (arc = 0; arc < dfa_state->s_narcs; ++arc) {
> > + short a_label = dfa_state->s_arc[arc].a_lbl;
> > + assert(a_label < _PyParser_Grammar.g_ll.ll_nlabels);
> > + if (_PyParser_Grammar.g_ll.ll_label[a_label].lb_type == ch_type) {
> > + /* The child is acceptable; if non-terminal, validate it recursively. */
> > + if (ISNONTERMINAL(ch_type) && !validate_node(ch))
> > + return 0;
> > +
> > + /* Update the state, and move on to the next child. */
> > + dfa_state = &nt_dfa->d_state[dfa_state->s_arc[arc].a_arrow];
> > + goto arc_found;
> > + }
> > + }
> > + /* What would this state have accepted? */
> > + {
> > + short a_label = dfa_state->s_arc->a_lbl;
> > + int next_type;
> > + if (!a_label) /* Wouldn't accept any more children */
> > + goto illegal_num_children;
> > +
> > + next_type = _PyParser_Grammar.g_ll.ll_label[a_label].lb_type;
> > + if (ISNONTERMINAL(next_type))
> > + PyErr_Format(parser_error, "Expected node type %d, got %d.",
> > + next_type, ch_type);
> > + else
> > + PyErr_Format(parser_error, "Illegal terminal: expected %s.",
> > + _PyParser_TokenNames[next_type]);
>
> Coverity doesn't that line:
>
> CID 1362505 (#1 of 1): Out-of-bounds read (OVERRUN)
> 20. overrun-local: Overrunning array _PyParser_TokenNames of 58 8-byte
> elements at element index 255 (byte offset 2040) using index next_type
> (which evaluates to 255).
I don't think this can cause a problem because it doesn't ever come from
user-provided input.
More information about the Python-Dev
mailing list