delete from pattern to pattern if it contains match
harirammanohar at gmail.com
Mon Apr 25 02:29:00 EDT 2016
On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
>
> > harirammanohar at gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar at gmail.com writes:
> >>>
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram... at gmail.com wrote:
> >>> >> Hi all,
> >>> >>
> >>> >> can you help me out with the following?
> >>> >>
> >>> >> file:
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> I need to delete each block from <start> to <end> if it contains mango in a file...
> >>> >>
> >>> >> output should be:
> >>> >>
> >>> >> <start>
> >>> >> guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >> orange
> >>> >> fruit
> >>> >> <end>
> >>> >>
> >>> >> Thank you
> >>> >
> >>> > Can anyone guide me? Why is XML tree parsing not working if I have
> >>> > root.tag and root.attrib as mentioned in an earlier post...
> >>>
> >>> Assuming the real data consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>>
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>>
> >>> from io import StringIO
> >>>
> >>> text = '''\
> >>> <start>
> >>> guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>> mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>> orange
> >>> fruit
> >>> <end>
> >>> '''
> >>>
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>>
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>>
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >>
> >> Hi,
> >>
> >> Not working... this is the output I am getting:
> >>
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
>
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
>
> >> <start>
> >> guava
> >> fruit
> >>
> >> <start>
> >> orange
> >> fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
>
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
>
> It was already known that this is not the actual format of the data.
>
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
>
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
>
> >>> hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i.e. at the beginning of all
> > lines except the first) and adds a trailing newline.
>
> Yes, I forgot about the space. Sorry about that.
>
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
>
> print(*record, sep = "", end = "<end>\n")
>
> Or so:
>
> print(*record, sep = "")
> print("<end>")
>
> Or so:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     print("<end>")
>
> Or:
>
> for line in record:
>     print(line.rstrip("\n"))
> else:
>     if record and not record[-1].strip() == "<end>":
>         print("<end>")
>
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
>
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
>
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
> >
> > Even with these changes code still looks somewhat brittle...
>
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
>
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
>
> text = '''<start>
> guava
> fruit
> <end>
> <start>
> mango
> fruit
> <end>
> <start>
> orange
> fruit
> <end>
> '''
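The plan Jussi describes above (collect a whole record first, check whether "mango" occurs in it as a full line, and only then print the record with its markers) can be sketched as follows. This is a minimal sketch, assuming the markers always sit on their own lines and that "contains mango" means a line consisting of exactly "mango". `records` here is a reworked version of the generator from the thread that starts a record only at a `<start>` line and keeps both markers; `contains_line` and `filter_records` are hypothetical helper names.

```python
from io import StringIO

TEXT = '''<start>
guava
fruit
<end>
<start>
mango
fruit
<end>
<start>
orange
fruit
<end>
'''

def records(source, start='<start>', end='<end>'):
    """Yield each start...end group as a list of lines, markers included.

    Assumption: lines outside any group are ignored, and a group that is
    not closed by an end marker (before EOF or the next start) is dropped.
    """
    current = None
    for line in source:
        stripped = line.strip()
        if stripped == start:
            current = [line]          # begin a new record only at a start marker
        elif current is not None:
            current.append(line)
            if stripped == end:
                yield current
                current = None

def contains_line(record, word):
    """True if `word` appears as a whole line of the record."""
    return any(line.strip() == word for line in record)

def filter_records(source, word):
    """Yield the lines of every record that does not contain `word` as a full line."""
    for record in records(source):
        if not contains_line(record, word):
            yield from record

result = ''.join(filter_records(StringIO(TEXT), 'mango'))
print(result, end='')
```

For a real file, an open file object can be passed instead of `StringIO`, e.g. `with open(path) as f: sys.stdout.writelines(filter_records(f, 'mango'))`, where `path` is a placeholder. Because the lines are yielded unchanged, the original newlines are kept and no extra spaces are inserted, which avoids the `print(*record)` issue discussed above.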
Hi Jussi,
I have seen that you have written a definition to fulfill the requirement. Can we do the same thing using an XML parser? I have failed to implement it with Python's XML parser when the file has content like the following:
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
and the entire thing works if it is as below:
<!DOCTYPE web-app
<web-app>
What I observe is that XML tree parsing does not work when the PUBLIC and http lines are present between "<!DOCTYPE web-app" and "<web-app>"...
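Jussi mentions that an XML parser could be a third way if the data really is XML. Below is a minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the real file is otherwise well-formed XML. The `<servlet>`/`<servlet-name>` elements and the `drop_matching` helper are hypothetical stand-ins, since the post does not show the body of the document. With the DOCTYPE shown above, ElementTree parses the document without fetching the DTD URL, so the PUBLIC line by itself would not be expected to break `root.tag`. One thing that does change `root.tag` is a default namespace on the root element (`xmlns="..."`, the URI below is only an example): ElementTree then reports the tag as `{uri}web-app`, and plain names like `'servlet'` stop matching. Whether that applies here cannot be told from the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DOCTYPE is copied from the post,
# the <servlet> elements are invented for illustration.
DOC = '''<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>guava</servlet-name></servlet>
  <servlet><servlet-name>mango</servlet-name></servlet>
  <servlet><servlet-name>orange</servlet-name></servlet>
</web-app>
'''

def drop_matching(root, tag, needle):
    """Remove direct children named `tag` whose text content contains `needle`."""
    # Iterate over a copy: removing from the live element while looping skips items.
    for child in list(root):
        if child.tag == tag and needle in ''.join(child.itertext()):
            root.remove(child)
    return root

root = ET.fromstring(DOC)
print(root.tag)  # web-app
drop_matching(root, 'servlet', 'mango')
print([s.findtext('servlet-name') for s in root.findall('servlet')])
# ['guava', 'orange']

# With a default namespace (example URI), the tag name carries the URI.
ns_root = ET.fromstring('<web-app xmlns="http://java.sun.com/xml/ns/javaee"/>')
print(ns_root.tag)  # {http://java.sun.com/xml/ns/javaee}web-app
```

Note that ElementTree does not keep the DOCTYPE when the tree is written back out (`ET.tostring` or `ElementTree.write`), so it would have to be written to the output by hand, or a library such as lxml used instead.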
More information about the Python-list mailing list