Re: [Python-ideas] Support parsing stream with `re`

6 Oct 2018

On Sat, Oct 06, 2018 at 02:00:27PM -0700, Nathaniel Smith wrote:
...
> Fortunately, there's an elegant and natural solution: Just save the
> regex engine's internal state when it hits the end of the string, and
> then when more data arrives, use the saved state to pick up the search
> where we left off. Theoretically, any regex engine *could* support
> this – it's especially obvious for DFA-based matchers, but even
> backtrackers like Python's re could support it, basically by making
> the matching engine a coroutine that can suspend itself when it hits
> the end of the input, then resume it when new input arrives. Like, if
> you asked Knuth for the theoretically optimal design for this parser,
> I'm pretty sure this is what he'd tell you to use, and it's what
> people do when writing high-performance HTTP parsers in C.

The message I take from this is:

- regex engines certainly can be written to support streaming data;
- but few of them are;
- and it is exceedingly unlikely that such support could be easily
  (or at all) retrofitted to Python's existing re module.

Perhaps the solution is a lightweight streaming DFA regex parser?
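
To make the idea concrete, here is a minimal sketch of what "save the
state, resume when more data arrives" could look like for a table-driven
DFA. Everything in it is made up for illustration: the StreamingDFA
class, the make_size_line_dfa helper and the pattern it encodes
([0-9]+\r\n, matched against the whole input) are assumptions of mine,
not an existing API, and the transition table is written by hand rather
than compiled from a regex.

```python
class StreamingDFA:
    """Hypothetical sketch of a resumable, table-driven matcher.

    The only state carried between chunks is the current DFA state
    (plus a flag saying whether the match has already failed).
    """

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {state: {byte value: next state}}
        self.state = start
        self.accepting = accepting
        self.dead = False  # True once no match is possible any more

    def feed(self, chunk):
        """Consume one chunk of bytes and report the status so far."""
        for byte in chunk:  # iterating over bytes yields ints
            if self.dead:
                break
            next_state = self.transitions.get(self.state, {}).get(byte)
            if next_state is None:
                self.dead = True
            else:
                self.state = next_state
        return self.status()

    def status(self):
        if self.dead:
            return "fail"
        if self.state in self.accepting:
            return "match"
        return "incomplete"  # need more input before we can decide


def make_size_line_dfa():
    """Build a hand-written DFA for the pattern [0-9]+\\r\\n (illustrative)."""
    digits = b"0123456789"
    transitions = {
        "start": {d: "digits" for d in digits},
        "digits": {**{d: "digits" for d in digits}, ord("\r"): "cr"},
        "cr": {ord("\n"): "done"},
    }
    return StreamingDFA(transitions, start="start", accepting={"done"})
```

Feeding b"12", then b"3\r", then b"\n" to the matcher returned by
make_size_line_dfa() reports "incomplete" for the first two chunks and
"match" for the last, however the input happens to be split. Any input
after the match, or an unexpected byte anywhere, makes it report "fail".
A real implementation would also have to compile the table from a
pattern and handle searching, groups and so on, which is where the hard
part lies.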

Does anyone know whether MRAB's regex library supports this?

https://pypi.org/project/regex/
...
> you can't write efficient
> character-by-character algorithms in Python

I'm sure that Python will never be as efficient as C in that regard 
(although PyPy might argue the point) but is there something we can do 
to ameliorate this? If we could make char-by-char processing only 10 
times less efficient than C instead of 100 times (let's say...) perhaps 
that would help Ram (and you?) with your use-cases?

-- 
Steve

Steven D'Aprano