[Tutor] delete strings from specified words
YU Bo
tsu.yubo at gmail.com
Tue Jan 9 22:45:23 EST 2018
Hi,
First, thank you very much for your reply.
On Tue, Jan 09, 2018 at 10:25:11PM +0000, Alan Gauld via Tutor wrote:
>On 09/01/18 14:20, YU Bo wrote:
>
>> But I am facing an interesting problem, and I have no idea how to deal with it.
>
>I don't think you have given us enough context to
>be able to help much. We would need some idea of
>the input and output data (both current and desired).
>
>>
>
>It sounds like you are building some kind of pretty printer.
>Maybe you could use Python's pretty-print (pprint) module as a design
>template? Or maybe even use some of it directly. It just
>depends on your data formats etc.
Yes. I think Python can deal with it directly.
>
>> In fact, this is a patch from LKML; my goal is to build a kernel podcast
>> for myself so I can keep track of what is happening in the kernel.
>
>Sorry, I've no idea what lkml is nor what kernel you are talking about.
>
>Can you show us what you are receiving, what you are
>currently producing and what you are trying to produce?
>
>Some actual code might be an idea too.
>And the python version and OS.
Sorry, I didn't explain it well. Here is my code, although it is terrible.
lkml.py:
```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: lkml.py
# Author: Bo Yu
""" This is source code in page that i want to get
"""
import sys
reload(sys)
sys.setdefaultencoding('utf8')
import urllib2
from bs4 import BeautifulSoup
import requests
import chardet
import re
# import my own print function
from get_content import print_content
if __name__ == '__main__':
    comment_url = []
    target = 'https://www.spinics.net/lists/kernel/threads.html'
    req = requests.get(url=target)
    req.encoding = 'utf-8'
    content = req.text
    bf = BeautifulSoup(content ,'lxml') # There is no problem
    context = bf.find_all('strong')
    for ret in context[0:1]:
        for test in ret:
            print '\t'
            x = re.split(' ', str(test))
            y = re.search('"(.+?)"', str(x)).group(1)
            comment_url.append(target.replace("threads.html", str(y)))
    for tmp_target in comment_url:
        print "===This is a new file ==="
        print_content(tmp_target, 'utf-8', 'title')
```
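As an aside, the step that turns a link on the index page into a full message URL could perhaps be written without splitting the tag on spaces. This is only a rough sketch: the helper name `message_urls` and the sample HTML fragment are made up for illustration, and I am assuming the links are plain relative `href` values.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

INDEX_URL = 'https://www.spinics.net/lists/kernel/threads.html'


def message_urls(html, base=INDEX_URL):
    """Return absolute URLs for every href="..." found in html.

    Hypothetical helper: it resolves each relative link against the
    index page URL instead of using str.replace on "threads.html".
    """
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [urljoin(base, href) for href in hrefs]


# Made-up fragment; the real page markup may differ.
sample = '<strong><a href="msg0000001.html">[PATCH] example</a></strong>'
print(message_urls(sample))
```

With BeautifulSoup already in use, reading the `href` attribute from each `a` tag would probably be cleaner than the regular expression; the regex is used here only to keep the sketch free of third-party imports.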
get_content.py:
```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: get_content.py
import urllib2
from bs4 import BeautifulSoup
import requests
import chardet
import re
def print_content(url, charset, find_id):
    req = requests.get(url=url)
    req.encoding = charset
    content = req.text
    bf = BeautifulSoup(content ,'lxml')
    article_title = bf.find('h1')
    #author = bf.find_all('li')
    commit = bf.find('pre')
    print '\t'
    print article_title.get_text()
    print '\t'
    x = str(commit.get_text())
    print x
```
python --version: Python 2.7.13
OS: debian 9
usage: python lkml.py
output: oh...
https://pastecode.xyz/view/04645424
Please ignore my debug print formatting.
This is my code, and it produces text like the output above.
So, put simply, my question is:
I don't know how to delete everything after a particular marker string, for example:
```text
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..3b9b238 100644
...
```
I want to delete everything from *diff --git* to the end, because there is too much code there.
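To make the goal concrete, here is a minimal sketch of the behaviour I am after. It is not a confirmed solution from this thread: the helper name `strip_diff` is hypothetical, and it assumes the patch body always begins with a line that starts with `diff --git`.

```python
MARKER = 'diff --git'  # assumption: the patch body starts with this line


def strip_diff(text, marker=MARKER):
    """Return the part of text before the first line starting with marker.

    If no line starts with marker, the text is returned unchanged
    apart from trailing whitespace.
    """
    kept = []
    for line in text.splitlines():
        if line.startswith(marker):
            break  # drop this line and everything after it
        kept.append(line)
    return '\n'.join(kept).rstrip()


sample = (
    "The registers rax, rcx and rdx are touched when controlling IBRS\n"
    "so they need to be saved when they can't be clobbered.\n"
    "diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h\n"
    "index 45a63e0..3b9b238 100644\n"
)
print(strip_diff(sample))
```

In get_content.py this would mean printing `strip_diff(commit.get_text())` instead of `x`. A shorter alternative would be `text.partition('diff --git')[0]`, but that also cuts the text if the marker happens to appear in the middle of a line of the commit message.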
Anyway, thanks!
>
>--
>Alan G
>Author of the Learn to Program web site
>http://www.alan-g.me.uk/
>http://www.amazon.com/author/alan_gauld
>Follow my photo-blog on Flickr at:
>http://www.flickr.com/photos/alangauldphotos
>
>