[Python-ideas] Support parsing stream with `re`

Ram Rachum ram at rachum.com
Sun Oct 7 09:11:57 EDT 2018


Hi Cameron,

Thanks for putting in the time to study my problem and sketch a solution.

Unfortunately, it's not helpful. I was developing a solution similar to
yours before I came to the conclusion that a multiline regex would be more
elegant. I find this algorithm to be quite complicated. It's basically a
poor man's regex engine.

I'm more likely to use a shim to make the re package work on streams (like
regexy or reading chunks until I get a match) than to use an algorithm like
that.
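
To make "reading chunks until I get a match" concrete, here is a rough sketch
of such a shim. It is an illustration only, not an existing API: the helper
name iter_stream_matches, the chunk size and the example layer pattern are all
made up for this message. It assumes a text-mode stream and a pattern whose
match end is settled by a delimiter (for example a lazy quantifier followed by
a lookahead). Greedy patterns, lookbehinds and anchors evaluated right after
the buffer has been trimmed may behave differently than they would on the
whole file.

```python
import io
import re


def iter_stream_matches(pattern, stream, chunk_size=64 * 1024):
    """Yield match objects for a compiled `pattern` found in a text stream.

    Hypothetical helper, not part of the `re` module. Match offsets are
    relative to the internal buffer, not to the start of the stream.
    """
    buffer = ""
    eof = False
    while not eof:
        chunk = stream.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True  # do one final scan over what is left
        pos = 0
        while pos <= len(buffer):
            m = pattern.search(buffer, pos)
            if m is None:
                break
            if m.end() == len(buffer) and not eof:
                # The match touches the end of the buffer, so more input
                # could still change it. Read another chunk and retry.
                break
            yield m
            # Step past empty matches so the scan always makes progress.
            pos = m.end() if m.end() > m.start() else m.end() + 1
        # Drop text that has already been matched. If nothing matched,
        # the buffer keeps growing, which is the main cost of this shim.
        buffer = buffer[pos:]


# Example with an assumed comment format: one match per whole layer.
layer_re = re.compile(r"; layer (\d+)\n(.*?)(?=; layer |\Z)", re.DOTALL)
sample = "; layer 1\nG1 X1\n; layer 2\nG1 X2\nG1 X3\n; layer 3\nG1 X4\n"
layers = [
    (m.group(1), m.group(2))
    for m in iter_stream_matches(layer_re, io.StringIO(sample), chunk_size=8)
]
```

The tiny chunk_size in the example is only there to force matches to straddle
chunk boundaries. With a real GCode file a much larger chunk would make sense,
and memory use is bounded by the largest single layer rather than by the file.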


Thanks,
Ram.

On Sun, Oct 7, 2018 at 9:58 AM Cameron Simpson <cs at cskk.id.au> wrote:

> On 07Oct2018 07:30, Ram Rachum <ram at rachum.com> wrote:
> >I'm doing multi-color 3d-printing. The slicing software generates a GCode
> >file, which is a text file of instructions for the printer, each command
> >meaning something like "move the head to coordinates x,y,z while extruding
> >plastic at a rate of w" and lots of other administrative commands. (Turn
> >the print fan on, heat the extruder to a certain temperature, calibrate the
> >printer, etc.)
> >
> >Here's an example of a simple GCode from a print I did last week:
> >https://www.dropbox.com/s/kzmm6v8ilcn0aik/JPL%20Go%20hook.gcode?dl=0
> >
> >It's 1.8MB in size. They could get to 1GB for complex prints.
> >
> >Multi-color printing means that at some points in the print, usually at a
> >layer change, I'm changing the color. This means I need to insert an M600
> >command, which tells the printer to pause the print, move the head around,
> >and give me a prompt to change the filament before continuing printing.
> >
> >I'm sick of injecting the M600 manually after every print. I've been doing
> >that for the last year. I'm working on a script so I could say "Insert an
> >M600 command after layers 5, 88 and 234, and also before process Foo."
> >
> >The slicer inserts comments saying "; layer 234" or "; process Foo". I want
> >to identify the entire layer as one match. That's because I want to find
> >the layer and possibly process at the start, I want to find the last
> >retraction command, the first extrusion command in the new layer, etc. So
> >it's a regex that spans potentially thousands of lines.
> >
> >Then I'll know just where to put my M600 and how much retraction to do
> >afterwards.
>
> Aha.
>
> Yeah, don't use a regexp for "the whole layer". I've fetched your file, and
> it is one instruction or comment per line. This is _easy_ to parse. Consider
> this totally untested sketch:
>
>   import re
>   import sys
>
>   # raw string, so that \d reaches the regex engine unchanged
>   layer_re = re.compile(r'^; layer (\d+), Z = (.*)')
>   with open("JPL.gcode") as gcode:
>     current_layer = None
>     for lineno, line in enumerate(gcode, 1):
>       m = layer_re.match(line)
>       if m:
>         # new layer
>         new_layer = int(m.group(1))
>         new_z = float(m.group(2))
>         if current_layer is not None:
>           # process the saved previous layer
>           ..........
>         current_layer = new_layer
>         accrued_layer = []
>       if current_layer is not None:
>         # we're saving lines for later work
>         accrued_layer.append(line)
>         continue
>       # otherwise, copy the line straight out
>       sys.stdout.write(line)
>
> The idea is that you scan the data on a per-line basis, adjusting some state
> variables as you see important lines.  If you're "saving" a chunk of lines
> such as the instructions in a layer (in the above code: "current_layer is
> not None") you can stuff just those lines into a list for use when complete.
>
> On changes of state you deal with what you may have saved, etc.
>
> But just looking at your examples, you may not need to save anything; just
> insert or append lines during the copy. Example:
>
>   import sys
>
>   with open("JPL.gcode") as gcode:
>     for lineno, line in enumerate(gcode, 1):
>       # pre line actions
>       if line.startswith('; process '):
>         print("M600 instruction...")
>       # copy out the line
>       sys.stdout.write(line)
>       # post line actions
>       if ...
>
> So you don't need to apply a regexp to a huge chunk of file. Process the
> file on an instruction basis and insert/append your extra instructions as
> you see the boundaries of the code you're after.
>
> A minor note. This incantation:
>
>   for lineno, line in enumerate(gcode, 1):
>
> is to make it easy to print error messages which recite the file line
> number to aid debugging. If you don't need that you'd just run with:
>
>   for line in gcode:
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
>
>