Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..)

Peter Funk
Thu, 19 Apr 2001 09:11:33 +0200 (MEST)


Edward D. Loper:
> Below is my first attempt at an EBNF-like formalism for these rules.
> IND and DED are indent and dedent (by a single space); I use
> the notation IND[n] to mean n IND tokens.  Note that the rule::

Why don't you simply use INDENT and DEDENT tokens, which may represent
any number of spaces as long as they match up?  Don't forget:
This is Python, and anyone seriously interested in Python should
already be familiar with this concept from the Python Grammar file and
will probably understand it at first glance.

This might help to get rid of your `[n]' meta notation.  In EBNF the
square brackets `[' and `]' are normally used as meta symbols to
enclose optional terms (see below).  So the notation you invented
here is confusing, because it suggests that `IND[n]' is an `IND' token
followed by an optional term `n' ;-).
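To make the INDENT/DEDENT idea concrete: Python's own tokenizer emits one INDENT for each line that opens a deeper block and one DEDENT for each block that closes, whatever the width of the indentation. A small sketch using the standard `tokenize` module (the helper `indent_tokens` and the sample source are made up for illustration):

```python
import io
import tokenize

# Two nested blocks indented by different widths: 4 spaces, then 2 more.
SOURCE = (
    "if a:\n"
    "    if b:\n"
    "      x = 1\n"
    "y = 2\n"
)


def indent_tokens(source):
    """Return the names of the INDENT/DEDENT tokens in source, in order."""
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


print(indent_tokens(SOURCE))
# → ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

The widths (4 and 6 columns) never show up in the token names; all a grammar needs to know is that every INDENT is eventually matched by a DEDENT.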

For your entertainment I'd like to quote a small passage from science
report No. 36 written by Niklaus Wirth, ETH Eidgenössische Technische
Hochschule Zürich, Institut für Informatik, introducing the programming
language MODULA-2 in March 1980:

"""Notation for syntactic description
   To describe the syntax, an Extended Backus-Naur Formalism called EBNF
   is used.
   Each factor F is either a (terminal or non-terminal) symbol, or it is
   of the form [ E ] denoting the union of the set E and the empty sentence,
   or { E } denoting the union of the empty sequence and E, EE, EEE, ...
   Parentheses may be used for grouping terms and factors.
   EBNF is capable of describing its own syntax.  We use it here as an
   example:

       syntax     = { production } .
       production = NTSym "=" expression "." .
       expression = term {"|" term} .
       term       = factor {factor} .
       factor     = TSym | NTSym | "(" expression ")" |
                    "[" expression "]" | "{" expression "}" .
"""
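Written as set equations, the two bracket forms in the passage amount to the following ($\varepsilon$ is the empty sentence, $E^k$ is $k$ concatenated copies of $E$):

```latex
[\,E\,] = E \cup \{\varepsilon\}, \qquad
\{\,E\,\} = \{\varepsilon\} \cup E \cup EE \cup EEE \cup \cdots
          = \bigcup_{k \ge 0} E^{k}, \quad E^{0} = \{\varepsilon\}
```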

As a student I was very impressed by this short and precise description
of the EBNF formalism.
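Because the grammar describes its own syntax, it can also be fed to itself. Below is a rough recursive-descent recognizer with one method per production, run on the productions quoted above. It assumes, since the passage does not say, that NTSym is an identifier and TSym a double-quoted string; `tokenize_ebnf` and `Parser` are hypothetical names for this sketch only.

```python
import re

# Assumed lexical rules (not given in the quoted passage): NTSym is an
# identifier, TSym is a double-quoted string without embedded quotes.
TOKEN_RE = re.compile(r'\s*(?:([A-Za-z_]\w*)|("[^"]*")|([=|()\[\]{}.]))')
BRACKETS = {"(": ")", "[": "]", "{": "}"}  # opener -> matching closer


def tokenize_ebnf(text):
    """Split EBNF text into (kind, value) pairs; kind is 'NT', 'T' or 'PUNCT'."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character at offset {pos}")
        ident, string, punct = m.groups()
        if ident:
            tokens.append(("NT", ident))
        elif string:
            tokens.append(("T", string))
        else:
            tokens.append(("PUNCT", punct))
        pos = m.end()
    return tokens


class Parser:
    """Recognizer with one method per production of Wirth's EBNF grammar."""

    def __init__(self, text):
        self.tokens = tokenize_ebnf(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, value):
        if self.peek() != ("PUNCT", value):
            raise SyntaxError(f"expected {value!r}, got {self.peek()[1]!r}")
        self.i += 1

    # syntax = { production } .   (returns the names of the productions)
    def syntax(self):
        names = []
        while self.peek()[0] == "NT":
            names.append(self.production())
        if self.i != len(self.tokens):
            raise SyntaxError(f"unexpected {self.peek()[1]!r}")
        return names

    # production = NTSym "=" expression "." .
    def production(self):
        name = self.peek()[1]
        self.i += 1
        self.expect("=")
        self.expression()
        self.expect(".")
        return name

    # expression = term {"|" term} .
    def expression(self):
        self.term()
        while self.peek() == ("PUNCT", "|"):
            self.i += 1
            self.term()

    # term = factor {factor} .
    def term(self):
        self.factor()
        while self.peek()[0] in ("NT", "T") or self.peek()[1] in BRACKETS:
            self.factor()

    # factor = TSym | NTSym | "(" expression ")" |
    #          "[" expression "]" | "{" expression "}" .
    def factor(self):
        kind, val = self.peek()
        if kind in ("NT", "T"):
            self.i += 1
        elif kind == "PUNCT" and val in BRACKETS:
            self.i += 1
            self.expression()
            self.expect(BRACKETS[val])
        else:
            raise SyntaxError(f"unexpected {val!r}")


WIRTH_EBNF = '''
syntax     = { production } .
production = NTSym "=" expression "." .
expression = term {"|" term} .
term       = factor {factor} .
factor     = TSym | NTSym | "(" expression ")" |
             "[" expression "]" | "{" expression "}" .
'''

print(Parser(WIRTH_EBNF).syntax())
# → ['syntax', 'production', 'expression', 'term', 'factor']
```

Each { ... } in the grammar turned into a `while` loop and the alternatives of `factor` into an `if`/`elif` chain, which is a good part of why Wirth's notation is so pleasant to implement by hand.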

The most common variations of this notation are to
use `::=', `:=' or `<-' instead of `=' in productions, to use
`(' expression `)?' instead of the square brackets to mark optional
terms, or to use `(' expression `)*' instead of the curly braces to
mark [0..n] repetition (`(' expression `)+' marks [1..n] repetition,
which Wirth writes as E {E}).  For example, the Python Grammar file uses
the asterisk notation for repetitions.  IMO the {} notation as used
by N. Wirth is easier to read.
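The difference between the three postfix operators can be checked with Python's `re` module, which gives `?`, `*` and `+` the same meanings (regular expressions serve here only as an analogy; they are not an EBNF tool):

```python
import re

# fullmatch forces each pattern to account for the whole string.
OPTIONAL = re.compile(r"(ab)?")  # Wirth: [ "ab" ]          zero or one
REPEAT = re.compile(r"(ab)*")    # Wirth: { "ab" }          zero or more
ONE_PLUS = re.compile(r"(ab)+")  # Wirth: "ab" { "ab" }     one or more

for text in ["", "ab", "abab"]:
    print(repr(text),
          bool(OPTIONAL.fullmatch(text)),
          bool(REPEAT.fullmatch(text)),
          bool(ONE_PLUS.fullmatch(text)))
# '' True True False
# 'ab' True True True
# 'abab' False True True
```

Only `?` accepts the empty string without accepting repetition, so it is the one that corresponds to Wirth's square brackets.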

> Anyway, I'm sure I didn't get that quite right, but it's a
> start, anyway.

Yes.  That's fine.  I will try to have a deeper look into it later.

Regards, Peter
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 422...
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)