[Tutor] implementing sed - termination error

cs at zip.com.au cs at zip.com.au
Wed Nov 2 01:22:21 EDT 2016


On 01Nov2016 20:18, bruce <badouglas at gmail.com> wrote:
>Running a test on a linux box, with python.
>Trying to do a search/replace over a file, for a given string, and
>replacing the string with a chunk of text that has multiple lines.
>
>From the cmdline, using sed, no prob. however, implementing sed, runs
>into issues, that result in a "termination error"

Just terminology: you're not "implementing sed", which is a nontrivial task 
that would involve writing a python program that could do everything sed does.  
You're writing a small python program to call sed to do the work.

Further discussion below.

>The error gets thrown, due to the "\" of the newline. SO, and other
>sites have plenty to say about this, but haven't run across any soln.
>
>The test file contains 6K lines, but, the process requires doing lots
>of search/replace operations, so I'm interested in testing this method
>to see how "fast" the overall process is.
>
>The following psuedo code is what I've used to test. The key point
>being changing the "\n" portion to try to resolved the termination
>error.
>
>import subprocess
>
>ll_="ffdfdfdfghhhh"
>ll2_="12112121212121212"
>hash="aaaaa"
>
>data_=ll_+"\n"+ll2_+"\n"+qq22_
>print data_

Presumably qq22_ is defined in code you have not shown.

>cc='sed -i "s/'+hash+'/'+data_+'/g" '+dname
>print cc
>proc=subprocess.Popen(cc, shell=True,stdout=subprocess.PIPE)
>res=proc.communicate()[0].strip()

There are two fairly large problems with this program. The first is your need 
to embed newlines in the replacement pattern. You have genuine newlines in your 
string, but a sed command would look like this:

  sed 's/aaaaa/ffdfdfdfghhhh\
  12112121212121212\
  qqqqq/g'

so you need to replace the newlines with "backslash and newline".

Fortunately strings have a .replace() method which you can use for this 
purpose. Look it up:

  https://docs.python.org/3/library/stdtypes.html#str.replace

You can use it to make data_ how you want it to be for the command.
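
A minimal sketch of that step, assuming the three-line replacement text from 
the sed example above (the name sed_data is just for illustration):

```python
# The replacement text, with genuine newlines between the lines.
data_ = "ffdfdfdfghhhh\n12112121212121212\nqqqqq"

# sed wants each newline inside the replacement preceded by a backslash,
# so turn every "newline" into "backslash + newline".
sed_data = data_.replace("\n", "\\\n")

print(sed_data)
```

Each line except the last should now end in a backslash, matching the 
hand-written sed command.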

The second problem is that you're then trying to invoke sed by constructing a 
shell command string and handing that to Popen. This means that you need to 
embed shell syntax in that string to quote things like the sed command. All 
very messy.

It is better to _bypass_ the shell and invoke sed directly by leaving out the 
"shell=True" parameter. All the command line (which is the shell) is doing is 
honouring the shell quoting and constructing a sed invocation as distinct 
strings:

  sed
  -i
  s/this/that/g
  filename

You want to do the equivalent in python, something like this:

  sed_argv = [ 'sed', '-i', 's/'+hash+'/'+data_+'/g', dname ]
  proc=subprocess.Popen(sed_argv, stdout=subprocess.PIPE)

See how you're now unconcerned by any difficulties around shell quoting? You're 
now dealing directly in strings.
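
Putting the two fixes together, here is a rough sketch rather than a finished 
program. The function names build_sed_argv and run_sed are made up for this 
example, and it assumes a sed that accepts a bare -i, as GNU sed on Linux does:

```python
import subprocess

def build_sed_argv(pattern, replacement, filename):
    # Newlines in the replacement must be written as backslash + newline.
    sed_replacement = replacement.replace("\n", "\\\n")
    # One list element per argument: no shell, so no shell quoting needed.
    return ["sed", "-i", "s/" + pattern + "/" + sed_replacement + "/g", filename]

def run_sed(pattern, replacement, filename):
    # check_call raises CalledProcessError if sed exits with an error.
    # sed -i edits the file in place, so there is no output to capture.
    subprocess.check_call(build_sed_argv(pattern, replacement, filename))
```

You would then call something like run_sed(hash, data_, dname). Note that 
this still assumes neither string contains a slash or other characters that 
are special to sed.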

There are a few other questions, such as: if you're using sed's -i option, why 
is stdout a pipe? And what if hash or data_ contain slashes, which you are 
using in sed to delimit them?
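
On the slash question, one possible approach is to backslash-escape the 
troublesome characters before building the sed command. This is only a sketch: 
the helper names are invented here, it assumes "/" stays the delimiter and 
that sed is using basic regular expressions (its default), and it does not 
handle newlines in the search text:

```python
import re

def sed_escape_pattern(text):
    # Escape the characters special in a basic regular expression
    # (. [ \ * ^ $) plus the "/" delimiter, so text matches literally.
    return re.sub(r'([\\/.*\[^$])', r'\\\1', text)

def sed_escape_replacement(text):
    # In the replacement, backslash, "/" and "&" are special.
    text = re.sub(r'([\\/&])', r'\\\1', text)
    # Newlines must become backslash + newline, done after the escaping
    # above so these new backslashes are not themselves escaped.
    return text.replace("\n", "\\\n")

print(sed_escape_pattern("a.b/c"))
print(sed_escape_replacement("x/y & z\nnext"))
```

The escaped results can then go into the 's/.../.../g' string in place of the 
raw hash and data_ values.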

Hoping this will help you move forward.

Cheers,
Cameron Simpson <cs at zip.com.au>

