[Tutor] delete strings from specified words

Tue Jan 9 22:45:23 EST 2018

Hi,

First, thank you very much for your reply.

On Tue, Jan 09, 2018 at 10:25:11PM +0000, Alan Gauld via Tutor wrote:
>On 09/01/18 14:20, YU Bo wrote:
>
>> But I am facing an interesting question. I have no idea how to deal with it.
>
>I don't think you have given us enough context to
>be able to help much. We would need some idea of
>the input and output data (both current and desired).
>
>>
>
>It sounds like you are building some kind of pretty printer.
>Maybe you could use Python's pretty-print module as a design
>template? Or maybe even use some of it directly. It just
>depends on your data formats etc.

Yes. I think Python can deal with it directly.

>
>> In fact, this is a patch from lkml. My goal is to design a kernel podcast
>> for myself to focus on what is happening in the kernel.
>
>Sorry, I've no idea what lkml is nor what kernel you are talking about.
>
>Can you show us what you are receiving, what you are
>currently producing and what you are trying to produce?
>
>Some actual code might be an idea too.
>And the python version and OS.

Sorry, I didn't explain it well. Here is my code, though it is terrible.

lkml.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: lkml.py
# Author: Bo Yu

""" This fetches the source of the pages that I want to get

"""
import sys
reload(sys)
sys.setdefaultencoding('utf8')

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

# import my own print function

from get_content import print_content

if __name__ == '__main__':
    comment_url = []
    target = 'https://www.spinics.net/lists/kernel/threads.html'
    req = requests.get(url=target)
    req.encoding = 'utf-8'
    content = req.text
    bf = BeautifulSoup(content ,'lxml') # There is no problem

    context = bf.find_all('strong')
    for ret in context[0:1]:
        for test in ret:
            print '\t'
            x = re.split(' ', str(test))
            y = re.search('"(.+?)"', str(x)).group(1)
            comment_url.append(target.replace("threads.html", str(y)))

    for tmp_target in comment_url:
        print "===This is a new file ==="
        print_content(tmp_target, 'utf-8', 'title')

```
get_content.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: get_content.py

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

def print_content(url, charset, find_id):
    req = requests.get(url=url)
    req.encoding = charset
    content = req.text
    bf = BeautifulSoup(content ,'lxml')
    article_title = bf.find('h1')
    #author = bf.find_all('li')
    commit = bf.find('pre')
    print '\t'
    print article_title.get_text()
    print '\t'
    x = str(commit.get_text())
    print x
```
python --version: Python 2.7.13
OS: debian 9
usage: python lkml.py
output: oh...
https://pastecode.xyz/view/04645424

Please ignore the formatting of my debug prints.

This is my code, and it produces text like the output above.
So, put simply, my question is:
I don't know how to delete everything in a string after a particular word. For example:

```text
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..3b9b238 100644
...
```
I want to delete everything from *diff --git* to the end, because there is too much code there.
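
To make the goal concrete, here is a minimal sketch of that truncation, assuming the commit text is already available as one string (for example, the result of `commit.get_text()`). The function name `strip_diff` and its `marker` parameter are hypothetical, chosen only for illustration. It keeps every line up to, but not including, the first line that starts with the marker:

```python
def strip_diff(text, marker="diff --git"):
    """Return text up to the first line that starts with marker.

    strip_diff and marker are illustrative names, not part of the
    original scripts. If no line starts with marker, the whole text
    is returned (minus trailing whitespace).
    """
    kept = []
    for line in text.splitlines():
        if line.startswith(marker):
            break  # everything from here on is the patch body
        kept.append(line)
    # Drop the blank line(s) that usually precede the diff.
    return "\n".join(kept).rstrip()


sample = """The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..3b9b238 100644
"""

print(strip_diff(sample))
```

Matching only at the start of a line avoids cutting the description short if the words "diff --git" happen to appear in the middle of a sentence. In get_content.py this could presumably be applied as `x = strip_diff(commit.get_text())` before printing.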

Anyway, thanks!
