regex functionality ( was: Re: function attributes (ANN: Introducing PLY-1.0 (Python Lex-Yacc)))

Michael Robin me at mikerobin.com
Wed Jun 20 21:01:57 CEST 2001


This is from the PLY1.0 doc page:
"""
The lexer requires input to be supplied as a single input string.
Since most machines have more than enough memory, this rarely presents
a performance concern. However, it means that the lexer currently
can't be used with streaming data such as open files or sockets. This
limitation is primarily a side-effect of using the re module.
"""

In my limited Python experience I've already run into this
"problem", along with two related features I'd like to see in re:
 (1) An ".append(moreString) / .allDone()" type of functionality, with
eager matching semantics. E.g., I should be able to find out after
each .append() (character-by-character, or whatever granularity is
desired) if the match has already failed (and perhaps if it's
currently succeeding due to "constant-match" or "wildcard"). Perhaps
yield and friends will make state maintenance easier for this
purpose, at least for stuff written in pure Python.
 (2) Partial- or fuzzy-match semantics, or "scoring". This could be as
simple as the number of characters and/or fields matched, or more
complicated, based on caller-provided character classes or other
means. Having the number of characters that matched before the failure
is all I needed at the time, and seems rather useful. (I should RTFM
again and make sure this info isn't already available.)
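
To make (1) and (2) concrete, here is a rough sketch of the interface I have
in mind. It is a hand-rolled toy, not anything the re module provides: the
class name IncrementalMatcher, its methods, and the "pattern as a list of
character classes" representation are all made up for illustration, and it
only handles fixed-length patterns with whole-input matching.

```python
class IncrementalMatcher:
    """Toy matcher for a fixed sequence of character classes.

    Hypothetical illustration only; not part of the re module.
    """

    def __init__(self, classes):
        # classes: list of strings, each listing the characters allowed
        # at that position, e.g. ["ab", "0123456789"].
        self.classes = classes
        self.pos = 0          # number of characters matched so far
        self.failed = False

    def append(self, more):
        """Feed more input; return 'failed', 'matched', or 'partial'."""
        for ch in more:
            if self.failed:
                break
            if self.pos < len(self.classes) and ch in self.classes[self.pos]:
                self.pos += 1
            else:
                # Wrong character, or extra input after a complete match.
                self.failed = True
        return self.status()

    def status(self):
        if self.failed:
            return "failed"
        if self.pos == len(self.classes):
            return "matched"
        return "partial"

    def all_done(self):
        """No more input is coming: only a complete match counts."""
        return self.status() == "matched"


digits = "0123456789"
m = IncrementalMatcher(["ab", digits, digits])
print(m.append("a"))   # still in progress
print(m.append("4"))   # still in progress
print(m.append("x"))   # fails here
print(m.pos)           # characters matched before the failure
```

The caller learns about a failure as soon as the offending character is
appended, and m.pos gives the simple "score" from (2). A real regex engine
would need to carry much more state (alternation, repetition, backtracking),
so this only shows the calling convention, not how it would be implemented.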

I'm sure there's lots of magic in good regex code, so I'll plead
ignorance if these features don't really map to the implementation.

m

------------------------------

"M.-A. Lemburg" <mal at lemburg.com> wrote in message news:<mailman.993044131.3290.python-list at python.org>...
> David Beazley wrote:
> > 
> > M.-A. Lemburg writes:
> >  >
> >  > Just a suggestion: PLY seems to use the same logic for attaching
> >  > grammar snippets to functions as SPARK does. IMHO, this is bad
> >  > design since doc-strings should really only be used for documentation
> >  > and not include vital information for the program logic.
> > 
> > Actually, I thought the doc string hack was one of the neatest
> > programming tricks I've ever seen (which is exactly why I copied it
> > from SPARK).  Why would I write a different documentation string for a
> > grammar rule anyways?  The grammar rule in the docstring not only
> > tells the parser generator what to do, but it precisely documents what
> > the function does at the same time.  I don't know what inspired John
> > to take this approach in SPARK, but it's pure genius if you ask me :-).
> 
> IMHO, doc-strings are there to
> document functions/methods in a human readable way, with additional
> comments and maybe even usage examples. As such they are nice
> to have around in the source, but are not necessarily needed
> for program execution (e.g. python -OO removes them).
> 
> Maybe just me, but I believe that putting program logic into 
> documentation is not a clean design.
> 
> >  > Note that in Python 2.1 we have function attributes which were
> >  > added for exactly this reason, so the doc-string approach is
> >  > not really needed anymore. I'd suggest to move to these for one
> >  > of the next releases.
> > 
> > A fine idea, but the implementation is completely useless because
> > there is no syntactically convenient way to attach the
> > attributes. 
> 
> Ah... so that's what you're after: convenience !
> That, of course, is true. 
> 
> So the conclusion should be: how can we make function attribute 
> assignment more convenient and not why not to use them :-)
> 
> > IMHO, having to type something like this is an even more
> > horrible design than using the docstrings (not to mention that it
> > looks ugly and unnatural):
> > 
> >    def p_expr_plus(t):
> >         t[0] = t[1] + t[3]
> > 
> >    p_expr_plus.grammar = 'expr : expr PLUS expr'
> > 
> > I don't have any plans to abandon the use of doc strings because I
> > like the way that they work. If I were to use function attributes for
> > anything, I would probably use them for some purpose other than
> > grammar specification.  I'd have to think about that however.
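
On the quoted docstring versus function-attribute question, the two styles
can coexist. The sketch below is only an illustration: get_rule is a
hypothetical helper, not part of PLY or SPARK, and the attribute name
"grammar" is simply the one used in the quoted example. It prefers an
explicit attribute and falls back to the docstring, which is None when the
code runs under python -OO.

```python
def p_expr_plus(t):
    'expr : expr PLUS expr'
    t[0] = t[1] + t[3]


def p_expr_minus(t):
    t[0] = t[1] - t[3]

# Function attribute (Python 2.1 and later), attached after the definition.
p_expr_minus.grammar = 'expr : expr MINUS expr'


def get_rule(func):
    """Return the grammar rule for func (hypothetical helper).

    Looks for a 'grammar' attribute first, then falls back to the
    docstring, which is stripped (None) under python -OO.
    """
    rule = getattr(func, "grammar", None)
    if rule is None:
        rule = func.__doc__
    return rule


for f in (p_expr_plus, p_expr_minus):
    print(f.__name__, "->", get_rule(f))
```

This does nothing about the convenience complaint, since the attribute still
has to be assigned after the function body, but it shows that supporting both
conventions costs only a few lines on the tool's side.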


