<div dir="ltr"><div dir="ltr">Hi Ned! I'm happy to see you here.<div><br></div><div>I'm doing multi-color 3D printing. The slicing software generates a GCode file, which is a text file of instructions for the printer. Most commands mean something like "move the head to coordinates x,y,z while extruding plastic at a rate of w", and there are lots of other administrative commands (turn the print fan on, heat the extruder to a certain temperature, calibrate the printer, etc.).</div><div><br></div><div>Here's an example of a simple GCode file from a print I did last week: <a href="https://www.dropbox.com/s/kzmm6v8ilcn0aik/JPL%20Go%20hook.gcode?dl=0">https://www.dropbox.com/s/kzmm6v8ilcn0aik/JPL%20Go%20hook.gcode?dl=0</a></div><div><br></div><div>It's 1.8MB in size. Files can reach 1GB for complex prints.</div><div><br></div><div>Multi-color printing means that at some points in the print, usually at a layer change, I change the color. To do that, I need to insert an M600 command, which tells the printer to pause the print, move the head around, and prompt me to change the filament before it continues printing.</div><div><br></div><div>I'm sick of injecting the M600 manually after every print. I've been doing that for the last year. I'm working on a script so that I can say "Insert an M600 command after layers 5, 88 and 234, and also before process Foo."</div><div><br></div><div>The slicer inserts comments saying "; layer 234" or "; process Foo". I want to identify the entire layer as one match, because within it I want to find the layer (and possibly process) comment at the start, the last retraction command, the first extrusion command in the new layer, etc.
So it's a regex that spans potentially thousands of lines.</div><div><br></div><div>Then I'll know just where to put my M600 and how much retraction to do afterwards.</div><div><br></div><div><br></div><div>Thanks,</div><div>Ram.</div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Oct 6, 2018 at 6:58 PM Ned Batchelder &lt;<a href="mailto:ned@nedbatchelder.com">ned@nedbatchelder.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div text="#000000" bgcolor="#FFFFFF">

    On 10/6/18 7:25 AM, Ram Rachum wrote:<br>

    <blockquote type="cite">

      
      <div dir="ltr">"This is a regular expression problem, rather than

        a Python problem."

        <div><br>

        </div>

        <div>Do you have evidence for this assertion, except that other

          regex implementations have this limitation? Is there a regex

          specification somewhere that specifies that streams aren't

          supported? Is there a fundamental reason that streams aren't

          supported?</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>"Can the lexing be done on a line-by-line basis?"<br>

        </div>

        <div><br>

        </div>

        <div>For my use case, it unfortunately can't.</div>

      </div>

    </blockquote>

    <br>

    You mentioned earlier that your use case doesn't have to worry about

    the "a.*b" problem.  Can you tell us more about your scenario?  How

    would the stream know it had read enough to match or not match? 

    Perhaps that same logic can be used to feed the data in chunks?<br>

    <br>
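    One way the chunk-feeding idea above might look, as a rough sketch only: it assumes the<br>
    input carries a marker comment such as "; layer 234" at the start of each unit (the format<br>
    mentioned elsewhere in this thread; real slicer output may differ). The names iter_layers<br>
    and LAYER_MARKER are hypothetical, and the sample data is made up.<br>
    <pre>
```python
import io
import re

# Assumed marker format, based on the "; layer 234" comments described in
# this thread; real slicer output may differ.
LAYER_MARKER = re.compile(r";\s*layer\s+(\d+)")


def iter_layers(lines):
    """Yield (layer_number, text) pairs, one layer at a time.

    Lines before the first marker are yielded with layer_number None.
    """
    number, buffer = None, []
    for line in lines:
        marker = LAYER_MARKER.match(line)
        if marker and buffer:
            # A new layer starts here, so the previous chunk is complete.
            yield number, "".join(buffer)
            buffer = []
        if marker:
            number = int(marker.group(1))
        buffer.append(line)
    if buffer:
        # Flush the final layer once the input is exhausted.
        yield number, "".join(buffer)


# Tiny made-up sample standing in for a real GCode file.
sample = io.StringIO(
    "M104 S210\n"
    "; layer 1\n"
    "G1 X10 Y10 E0.5\n"
    "; layer 2\n"
    "G1 X20 Y10 E0.9\n"
)
layers = list(iter_layers(sample))
```
</pre>
    Each yielded chunk is an ordinary string, so a multi-line pattern can be applied to it with<br>
    the existing re module while only one layer is held in memory at a time.<br>
    <br>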

    --Ned.<br>

    <br>

    <blockquote type="cite"><br>

      <div class="gmail_quote">

        <div dir="ltr">On Sat, Oct 6, 2018 at 1:53 PM Jonathan Fine &lt;<a href="mailto:jfine2358@gmail.com" target="_blank">jfine2358@gmail.com</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Ram<br>

          <br>

          You wrote:<br>

          <br>

          > I'd like to use the re module to parse a long text file,

          1GB in size. I<br>

          > wish that the re module could parse a stream, so I

          wouldn't have to load<br>

          > the whole thing into memory. I'd like to iterate over

          matches from the<br>

          > stream without keeping the old matches and input in RAM.<br>

          <br>

          This is a regular expression problem, rather than a Python

          problem. A search for<br>

              regular expression large file<br>

          brings up some URLs that might help you, starting with<br>

          <a href="https://stackoverflow.com/questions/23773669/grep-pattern-match-between-very-large-files-is-way-too-slow" rel="noreferrer" target="_blank">https://stackoverflow.com/questions/23773669/grep-pattern-match-between-very-large-files-is-way-too-slow</a><br>

          <br>

          This might also be helpful<br>

          <a href="https://svn.boost.org/trac10/ticket/11776" rel="noreferrer" target="_blank">https://svn.boost.org/trac10/ticket/11776</a><br>

          <br>

          What will work for your problem depends on the nature of the

          problem<br>

          you have. The simplest thing that might work is to iterate over

          the file<br>

          line-by-line, and use a regular expression to extract matches

          from<br>

          each line.<br>

          <br>

          In other words, something like (not tested)<br>

          <br>

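              import re<br>

          <br>

              pattern = re.compile(r'\d+')  # placeholder: substitute your own pattern<br>

          <br>

              def helper(lines):<br>

                  # Yield every match found on each line, one line at a time<br>

                  for line in lines:<br>

                      yield from pattern.finditer(line)<br>

          <br>

              with open('my-big-file.txt') as lines:<br>

                  for match in helper(lines):<br>

                      print(match.group())  # Do your stuff here<br>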

          <br>

          Parsing is not the same as lexing, see<br>

          <a href="https://en.wikipedia.org/wiki/Lexical_analysis" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Lexical_analysis</a><br>

          <br>

          I suggest you use regular expressions ONLY for the lexing

          phase. If<br>

          you'd like further help, perhaps first ask yourself this. Can

          the<br>

          lexing be done on a line-by-line basis? And if not, why not?<br>

          <br>

          If line-by-line is not possible, then you'll have to modify the

          helper.<br>

          At the end of each line, there'll be a residue / remainder,

          which<br>

          you'll have to bring into the next line. In other words, the

          helper<br>

          will have to record (and change) the state that exists at the

          end of<br>

          each line. A bit like the 'carry' that is used when doing long<br>

          addition.<br>

          <br>
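          A minimal sketch of the 'carry' idea described above, under a loud assumption: a match<br>
          that ends before the end of the text seen so far is taken as final. That does not hold<br>
          for every pattern (the "a.*b" kind in particular), and the carry grows without bound<br>
          if nothing ever matches. The name iter_matches is hypothetical.<br>
          <pre>
```python
import re


def iter_matches(pattern, chunks):
    """Yield matched strings from an iterable of text chunks.

    The unconsumed tail of each chunk is carried into the next one.
    """
    regex = re.compile(pattern)
    carry = ""
    for chunk in chunks:
        carry += chunk
        consumed = 0
        for match in regex.finditer(carry):
            if match.end() == len(carry):
                # The match touches the end of the data seen so far, so
                # later input might extend it. Leave it in the carry.
                break
            yield match.group()
            consumed = match.end()
        # Keep only the text after the last match that was yielded.
        carry = carry[consumed:]
    # No more input: whatever is left can be matched as it stands.
    for match in regex.finditer(carry):
        yield match.group()


# The number 1234 is split across the first two chunks, 56 across the last two.
found = list(iter_matches(r"\d+", ["ab12", "34cd5", "6"]))
# found == ["1234", "56"]
```
</pre>
          Strings are yielded rather than match objects because the offsets in a match object<br>
          would be relative to the carry, not to the whole file.<br>
          <br>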

          I hope this helps.<br>

          <br>

          -- <br>

          Jonathan<br>

          <br>

        </blockquote>

      </div>

      <br>


      <br>

      <pre>_______________________________________________

Python-ideas mailing list

<a class="m_8438104099612928794moz-txt-link-abbreviated" href="mailto:Python-ideas@python.org" target="_blank">Python-ideas@python.org</a>

<a class="m_8438104099612928794moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/python-ideas" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a>

Code of Conduct: <a class="m_8438104099612928794moz-txt-link-freetext" href="http://python.org/psf/codeofconduct/" target="_blank">http://python.org/psf/codeofconduct/</a>

</pre>

    </blockquote>

    <br>

  </div>



</blockquote></div>