[Python-ideas] Support parsing stream with `re`

Ram Rachum ram at rachum.com
Sun Oct 7 00:30:32 EDT 2018

Hi Ned! I'm happy to see you here.

I'm doing multi-color 3d-printing. The slicing software generates a GCode
file, which is a text file of instructions for the printer, each command
meaning something like "move the head to coordinates x,y,z while extruding
plastic at a rate of w" and lots of other administrative commands. (Turn
the print fan on, heat the extruder to a certain temperature, calibrate the
printer, etc.)

Here's an example of a simple GCode from a print I did last week:

It's 1.8MB in size. They could get to 1GB for complex prints.

Multi-color printing means that at some points in the print, usually at a
layer change, I'm changing the color. This means I need to insert an M600
command, which tells the printer to pause the print, move the head around,
and give me a prompt to change the filament before continuing printing.

I'm sick of injecting the M600 manually after every print. I've been doing
that for the last year. I'm working on a script so I could say "Insert an
M600 command after layers 5, 88 and 234, and also before process Foo."

The slicer inserts comments saying "; layer 234" or "; process Foo". I want
to identify the entire layer as one match, because within it I want to find
the layer comment and possibly the process comment at the start, the last
retraction command, the first extrusion command in the new layer, etc. So
it's a regex that spans potentially thousands of lines.

Then I'll know just where to put my M600 and how much retraction to do.
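
A minimal sketch of the layer-at-a-time idea, reading the file as a stream
and holding only one layer in memory. The exact comment format
("; layer N" at the start of a line), the helper names iter_layers and
insert_m600, and the sample GCode are assumptions for illustration, not
what any particular slicer or script actually does:

```python
import io
import re

# Assumed marker format: a comment line such as "; layer 234".
LAYER_RE = re.compile(r"^; layer (\d+)\b")

def iter_layers(lines):
    """Yield (layer_number, lines_in_layer), one layer at a time.

    layer_number is None for anything before the first layer comment.
    """
    number, block = None, []
    for line in lines:
        m = LAYER_RE.match(line)
        if m:
            if block:
                yield number, block
            number, block = int(m.group(1)), []
        block.append(line)
    if block:
        yield number, block

def insert_m600(lines, after_layers):
    """Copy the stream through, adding an M600 line after the chosen layers."""
    for number, block in iter_layers(lines):
        yield from block
        if number in after_layers:
            yield "M600\n"

# Hypothetical miniature input standing in for a real GCode file.
sample = io.StringIO(
    "G28\n"
    "; layer 1\n"
    "G1 X1 Y1 E0.5\n"
    "; layer 2\n"
    "G1 X2 Y2 E0.7\n"
)
result = "".join(insert_m600(sample, after_layers={1}))
```

Each block yielded by iter_layers is small enough to search with an
ordinary multi-line regex (for the last retraction, the first extrusion,
and so on), so the whole file never has to be loaded at once.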


On Sat, Oct 6, 2018 at 6:58 PM Ned Batchelder <ned at nedbatchelder.com> wrote:

> On 10/6/18 7:25 AM, Ram Rachum wrote:
>
> "This is a regular expression problem, rather than a Python problem."
>
> Do you have evidence for this assertion, except that other regex
> implementations have this limitation? Is there a regex specification
> somewhere that specifies that streams aren't supported? Is there a
> fundamental reason that streams aren't supported?
>
> "Can the lexing be done on a line-by-line basis?"
>
> For my use case, it unfortunately can't.
>
> You mentioned earlier that your use case doesn't have to worry about the
> "a.*b" problem.  Can you tell us more about your scenario?  How would the
> stream know it had read enough to match or not match?  Perhaps that same
> logic can be used to feed the data in chunks?
> --Ned.
>
> On Sat, Oct 6, 2018 at 1:53 PM Jonathan Fine <jfine2358 at gmail.com> wrote:
>> Hi Ram
>> You wrote:
>> > I'd like to use the re module to parse a long text file, 1GB in size. I
>> > wish that the re module could parse a stream, so I wouldn't have to load
>> > the whole thing into memory. I'd like to iterate over matches from the
>> > stream without keeping the old matches and input in RAM.
>> This is a regular expression problem, rather than a Python problem. A
>> search for
>>     regular expression large file
>> brings up some URLs that might help you, starting with
>> https://stackoverflow.com/questions/23773669/grep-pattern-match-between-very-large-files-is-way-too-slow
>> This might also be helpful
>> https://svn.boost.org/trac10/ticket/11776
>> What will work for your problem depends on the nature of the problem
>> you have. The simplest thing that might work is to iterate over the file
>> line-by-line, and use a regular expression to extract matches from
>> each line.
>> In other words, something like (not tested)
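>>     import re
>>
>>     def helper(lines):
>>         # Yield every match of `pattern` (defined elsewhere), line by line
>>         for line in lines:
>>             yield from re.finditer(pattern, line)
>>
>>     with open('my-big-file.txt') as lines:
>>         for match in helper(lines):
>>             pass  # Do your stuff here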
>> Parsing is not the same as lexing, see
>> https://en.wikipedia.org/wiki/Lexical_analysis
>> I suggest you use regular expressions ONLY for the lexing phase. If
>> you'd like further help, perhaps first ask yourself this. Can the
>> lexing be done on a line-by-line basis? And if not, why not?
>> If line-by-line is not possible, then you'll have to modify the helper.
>> At the end of each line, there'll be a residue / remainder, which
>> you'll have to bring into the next line. In other words, the helper
>> will have to record (and change) the state that exists at the end of
>> each line. A bit like the 'carry' that is used when doing long
>> addition.
>> I hope this helps.
>> --
>> Jonathan
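
The "carry" idea described above could be sketched roughly as follows,
reading fixed-size chunks rather than lines. The function name
finditer_chunks and its chunk_size parameter are hypothetical, and the
sketch rests on a strong assumption: a match that ends before the end of
the buffered text is treated as final (so patterns like "a.*b", anchors,
lookbehind, and empty matches are not handled correctly):

```python
import io
import re

def finditer_chunks(pattern, stream, chunk_size=65536):
    """Yield matched strings from a text stream read chunk by chunk.

    Text after the last accepted match is carried into the next chunk,
    so a match cut in two by a chunk boundary is seen whole next time.
    """
    regex = re.compile(pattern)
    carry = ""
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buffer = carry + chunk
        pos = 0
        for m in regex.finditer(buffer):
            if m.end() == len(buffer) and not at_eof:
                break  # the match may continue in the next chunk
            yield m.group()
            pos = m.end()
        carry = buffer[pos:]  # the residue brought forward
        if at_eof:
            return

# A tiny chunk size forces matches to straddle chunk boundaries.
numbers = list(
    finditer_chunks(r"\d+", io.StringIO("ab 12 345 6789 x"), chunk_size=4)
)
```

Note that the carry keeps growing while no match is found, so memory use
is only bounded when matches occur reasonably often in the input.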
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/