Hi Cameron,

Thanks for putting in the time to study my problem and sketch a solution.

Unfortunately, it's not helpful. I was developing a solution similar to yours before I came to the conclusion that a multiline regex would be more elegant. I find this algorithm to be quite complicated. It's basically a poor man's regex engine.

I'm more likely to use a shim to make the re package work on streams (like regexy or reading chunks until I get a match) than to use an algorithm like that. 
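
To show what I mean by reading chunks until I get a match, here is a rough sketch. The names search_stream and chunk_size are made up for illustration, and it assumes the pattern never matches the empty string and that any single match fits in memory. It only trusts a match that ends before the end of the buffer (or at EOF), which is a heuristic and not a substitute for real stream support in re:

```python
import io
import re

def search_stream(pattern, stream, chunk_size=65536):
    """Yield successive matches of pattern from a text stream, read chunk by chunk."""
    regex = re.compile(pattern, re.MULTILINE | re.DOTALL)
    buffer = ""
    eof = False
    while True:
        m = regex.search(buffer)
        # Trust a match only if it ends before the buffer does, or at EOF;
        # otherwise more data might still change where the match ends.
        if m and (eof or m.end() < len(buffer)):
            if m.end() == m.start():
                raise ValueError("pattern must not match the empty string")
            yield m
            buffer = buffer[m.end():]
            continue
        if eof:
            return
        chunk = stream.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True

# Hypothetical usage: one match per layer, each running up to the next layer comment.
layer_pattern = r"^; layer (\d+)\n.*?(?=^; layer |\Z)"
demo = io.StringIO("; layer 1\nG1 X1\n; layer 2\nG1 X2\n")
layer_numbers = [m.group(1) for m in search_stream(layer_pattern, demo, chunk_size=4)]
```

Text before the first match is silently dropped here, so a real script would also have to copy that through.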


On Sun, Oct 7, 2018 at 9:58 AM Cameron Simpson <cs@cskk.id.au> wrote:
On 07Oct2018 07:30, Ram Rachum <ram@rachum.com> wrote:
>I'm doing multi-color 3d-printing. The slicing software generates a GCode
>file, which is a text file of instructions for the printer, each command
>meaning something like "move the head to coordinates x,y,z while extruding
>plastic at a rate of w" and lots of other administrative commands. (Turn
>the print fan on, heat the extruder to a certain temperature, calibrate the
>printer, etc.)
>Here's an example of a simple GCode from a print I did last week:
>It's 1.8MB in size. They could get to 1GB for complex prints.
>Multi-color prints means that at some points in the print, usually in a
>layer change, I'm changing the color. This means I need to insert an M600
>command, which tells the printer to pause the print, move the head around,
>and give me a prompt to change the filament before continuing printing.
>I'm sick of injecting the M600 manually after every print. I've been doing
>that for the last year. I'm working on a script so I could say "Insert an
>M600 command after layers 5, 88 and 234, and also before process Foo."
>The slicer inserts comments saying "; layer 234" or "; process Foo". I want
>to identify the entire layer as one match. That's because I want to find
>the layer and possibly process at the start, I want to find the last
>retraction command, the first extrusion command in the new layer, etc. So
>it's a regex that spans potentially thousands of lines.
>Then I'll know just where to put my M600 and how much retraction to do


Yeah, don't use a regexp for "the whole layer". I've fetched your file, and it
is one instruction or comment per line. This is _easy_ to parse. Consider this
totally untested sketch:

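  import re
  import sys

  # raw string, so that \d reaches the regexp engine untouched
  layer_re = re.compile(r'^; layer (\d+), Z = (.*)')
  with open("JPL.gcode") as gcode:
    current_layer = None
    accrued_layer = []
    for lineno, line in enumerate(gcode, 1):
      m = layer_re.match(line)
      if m:
        # new layer
        new_layer = int(m.group(1))
        new_z = float(m.group(2))
        if current_layer is not None:
          # process the saved previous layer
          # (placeholder: just copy it out unchanged)
          sys.stdout.writelines(accrued_layer)
        current_layer = new_layer
        accrued_layer = []
      if current_layer is not None:
        # we're saving lines for later work
        accrued_layer.append(line)
      else:
        # otherwise, copy the line straight out
        sys.stdout.write(line)
    # the last layer has no following "; layer" line, so emit it here
    sys.stdout.writelines(accrued_layer)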

The idea is that you scan the data on a per-line basis, adjusting some state
variables as you see important lines.  If you're "saving" a chunk of lines such
as the instructions in a layer (in the above code: "current_layer is not None")
you can stuff just those lines into a list for use when complete.

On changes of state you deal with what you may have saved, etc.
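
To make the state tracking concrete, here is a minimal sketch written as a generator that hands over one chunk of lines per layer. The name split_layers is hypothetical, and the header pattern is an assumption based on the "; layer N" comments described above; treat it as lightly tested at best:

```python
import re

# Assumed shape of the slicer's layer comments, e.g. "; layer 234, Z = 46.8"
LAYER_RE = re.compile(r'^; layer (\d+)')

def split_layers(lines):
    """Yield (layer_number, lines) pairs; layer_number is None before the first layer."""
    current_layer = None
    accrued = []
    for line in lines:
        m = LAYER_RE.match(line)
        if m:
            # change of state: hand over whatever was saved so far
            if accrued:
                yield current_layer, accrued
            current_layer = int(m.group(1))
            accrued = []
        accrued.append(line)
    # the final chunk has no following layer comment to trigger it
    if accrued:
        yield current_layer, accrued
```

Each chunk is an ordinary list, so "find the last retraction in this layer" becomes a plain loop over a few thousand lines, never over the whole file.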

But just looking at your examples, you may not need to save anything; just
insert or append lines during the copy. Example:

  import sys

  with open("JPL.gcode") as gcode:
    for lineno, line in enumerate(gcode, 1):
      # pre line actions
      if line.startswith('; process '):
        print("M600 instruction...")
      # copy out the line
      sys.stdout.write(line)
      # post line actions
      if ...

So you don't need to apply a regexp to a huge chunk of file. Process the file
on an instruction basis and insert/append your extra instructions as you see
the boundaries of the code you're after.
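
As a sketch of that insert-while-copying approach applied to your "after layers 5, 88 and 234, and also before process Foo" example: the function name insert_m600, its parameters, and the bare "M600" line are all assumptions for illustration, and it ignores the retraction handling you mentioned:

```python
import re

# Assumed shape of the slicer's layer comments, e.g. "; layer 234, Z = 46.8"
LAYER_RE = re.compile(r'^; layer (\d+)')
PROCESS_PREFIX = '; process '

def insert_m600(lines, after_layers=(), before_processes=(), pause="M600\n"):
    """Copy lines through, yielding a pause line at the requested boundaries."""
    after_layers = set(after_layers)
    before_processes = set(before_processes)
    current_layer = None
    for line in lines:
        m = LAYER_RE.match(line)
        if m:
            # a new layer comment means the previous layer has just ended
            if current_layer in after_layers:
                yield pause
            current_layer = int(m.group(1))
        elif line.startswith(PROCESS_PREFIX):
            if line[len(PROCESS_PREFIX):].strip() in before_processes:
                yield pause
        yield line
    # the last layer ends at end of file
    if current_layer in after_layers:
        yield pause
```

Since it is a generator, it can be fed an open file and its output passed to writelines() on another, so memory use stays flat even for a 1GB file.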

A minor note. This incantation:

  for lineno, line in enumerate(gcode, 1):

is to make it easy to print error messages which recite the file line number to
aid debugging. If you don't need that you'd just run with:

  for line in gcode:
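
For illustration, a small sketch of the kind of error message the line number makes possible. The function name check_layers_ascending and the rule that layer numbers must strictly increase are assumptions made up for this example, not something taken from your file:

```python
import re

# Assumed shape of the slicer's layer comments, e.g. "; layer 234, Z = 46.8"
LAYER_RE = re.compile(r'^; layer (\d+)')

def check_layers_ascending(lines, filename="<gcode>"):
    """Raise ValueError naming the offending line if layer numbers do not increase."""
    previous = None
    for lineno, line in enumerate(lines, 1):
        m = LAYER_RE.match(line)
        if not m:
            continue
        layer = int(m.group(1))
        if previous is not None and layer <= previous:
            raise ValueError(
                f"{filename}:{lineno}: layer {layer} follows layer {previous}"
            )
        previous = layer
```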

Cameron Simpson <cs@cskk.id.au>