Extract the “Matrix form” dataset from the BCS website.
hongy...@gmail.com
hongyi.zhao at gmail.com
Thu Dec 22 08:35:04 EST 2022
I want to extract (scrape) the “Matrix form” dataset from the Bilbao Crystallographic Server (BCS) website [1], i.e., the data that appears in the 3rd column of the table.
I tried the following Python code snippet, but I still haven't figured out the trick:
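import requests
import urllib3
from bs4 import BeautifulSoup

# Route traffic through a local SOCKS5 proxy; "socks5h" also resolves DNS
# through the proxy. SOCKS support needs: pip install "requests[socks]"
proxies = {
    'http': 'socks5h://127.0.0.1:18888',
    'https': 'socks5h://127.0.0.1:18888'
}
# verify=False skips TLS certificate checks; silence the resulting warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url = 'https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane'
r = requests.get(url, proxies=proxies, verify=False, timeout=30)
r.raise_for_status()
soup = BeautifulSoup(r.content, features="lxml")
table = soup.find('table')
# find_all() matches tag names; 'id' is an attribute, not a tag, so
# table.find_all('id') always returns an empty list. Fetch the rows instead.
rows = table.find_all('tr')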
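One possible way to get at the 3rd column is sketched below. It swaps BeautifulSoup for the standard library's html.parser so it runs without lxml. It collects the text of every cell in each row of a top-level table, folds any nested table (for example, a matrix rendered as its own small table) into the enclosing cell, and then keeps cell index 2. It assumes, without having checked the page's actual markup, that each general position is one <tr> whose third <td> holds the matrix. The names OuterTableParser and extract_column are hypothetical.

```python
from html.parser import HTMLParser


class OuterTableParser(HTMLParser):
    """Collect cell text from rows of top-level (non-nested) HTML tables.

    Text inside a nested table is folded into the enclosing outer cell,
    joined by single spaces. Missing </td> and </tr> tags are tolerated.
    Rows from every top-level table on the page are collected.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per outer <tr>
        self._depth = 0     # current <table> nesting depth
        self._row = None    # cells of the outer row being built
        self._cell = None   # text fragments of the outer cell being built

    def _close_cell(self):
        if self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._close_row()          # an unclosed previous row ends here
            self._row = []
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()         # an unclosed previous cell ends here
            if self._row is None:      # cell without an explicit <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self._depth == 1:
                self._close_row()      # flush whatever the table left open
            self._depth = max(self._depth - 1, 0)
        elif self._depth == 1 and tag in ("td", "th"):
            self._close_cell()
        elif self._depth == 1 and tag == "tr":
            self._close_row()

    def handle_data(self, data):
        text = data.strip()
        if self._cell is not None and text:
            self._cell.append(text)


def extract_column(html, column=2):
    """Return the text of cell `column` (0-based) from each outer row."""
    parser = OuterTableParser()
    parser.feed(html)
    parser.close()
    return [row[column] for row in parser.rows if len(row) > column]
```

With the snippet above, extract_column(r.text) would return the header text followed by one whitespace-joined string per general position, provided the assumption about the markup holds. If the page wraps the data table inside a layout table, the depth check would need to target the inner level instead.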
My Python environment is as follows:
werner@X10DAi:~$ pyenv shell datasci
(datasci) werner@X10DAi:~$ python --version
Python 3.11.1
Any tips will be appreciated.
[1] https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane
Regards,
Zhao