Extract the “Matrix form” dataset from the BCS website.

hongy...@gmail.com hongyi.zhao at gmail.com
Thu Dec 22 08:35:04 EST 2022


I want to extract (scrape) the “Matrix form” dataset from the BCS website [1], i.e., the data that appears in the third column of the table.

I tried the following Python snippet, but could not figure out how to get at that column:

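import requests
from bs4 import BeautifulSoup

# Route traffic through a local SOCKS5 proxy (needs the requests[socks] extra).
proxies = {
    'http': 'socks5h://127.0.0.1:18888',
    'https': 'socks5h://127.0.0.1:18888'
}

# verify=False skips TLS certificate checks, so silence the resulting warnings.
requests.packages.urllib3.disable_warnings()
r = requests.get('https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane', proxies=proxies, verify=False)
soup = BeautifulSoup(r.content, features="lxml")

# find_all() takes a tag name; 'id' is not an HTML tag, so collect the rows instead.
table = soup.find('table')
rows = table.find_all('tr')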
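
For illustration, here is a minimal sketch of pulling one column out of a table, nested tables included. It deliberately uses only the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages. The names ColumnExtractor, extract_column and SAMPLE_HTML are hypothetical, and the sample markup is made up for the example: it assumes the data table is the outermost table on the page and that each matrix sits in a nested table inside the third cell, which may not match what the BCS page actually serves.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of one column from the outermost table(s) of a page."""

    def __init__(self, column_index):
        super().__init__()
        self.column_index = column_index
        self.table_depth = 0    # nesting level of <table> tags
        self.cell_index = -1    # position of the current cell in the outer row
        self.in_target = False  # True while inside the wanted cell
        self.buffer = []        # text fragments of the wanted cell
        self.values = []        # one entry per outer row

    def _flush(self):
        # Store the collected cell text with whitespace normalised.
        if self.in_target:
            self.values.append(" ".join(" ".join(self.buffer).split()))
        self.in_target = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif self.table_depth == 1 and tag == "tr":
            self._flush()       # tolerate a missing </td> before a new row
            self.cell_index = -1
        elif self.table_depth == 1 and tag in ("td", "th"):
            self._flush()       # tolerate a missing </td> before a new cell
            self.cell_index += 1
            self.in_target = self.cell_index == self.column_index
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif self.table_depth == 1 and tag in ("td", "th"):
            self._flush()

    def handle_data(self, data):
        # Text inside nested tables lands here too while in_target is set.
        if self.in_target:
            self.buffer.append(data)


def extract_column(html, column_index=2):
    """Return the text of the given 0-based column, one string per outer row."""
    parser = ColumnExtractor(column_index)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical sample, NOT the real page source.
SAMPLE_HTML = """
<table>
  <tr><th>No.</th><th>(x,y) form</th><th>Matrix form</th></tr>
  <tr><td>1</td><td>x,y</td>
      <td><table><tr><td>1</td><td>0</td><td>0</td></tr>
                 <tr><td>0</td><td>1</td><td>0</td></tr></table></td></tr>
</table>
"""

if __name__ == "__main__":
    for value in extract_column(SAMPLE_HTML):
        print(value)
```

With the real page one would pass r.text instead of SAMPLE_HTML. If the data table turns out to be wrapped in a layout table, or the matrices are rendered as images rather than text, the depth check and the text collection would have to be adapted after inspecting the actual HTML.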

My Python environment is as follows:

werner@X10DAi:~$ pyenv shell datasci
(datasci) werner@X10DAi:~$ python --version
Python 3.11.1

Any tips will be appreciated.

[1] https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane

Regards,
Zhao