Obtain the query interface URL of the BCS server.

hongy...@gmail.com hongyi.zhao at gmail.com
Tue Sep 13 19:29:16 EDT 2022


On Tuesday, September 13, 2022 at 9:33:20 PM UTC+8, DFS wrote:
> On 9/13/2022 3:46 AM, hongy... at gmail.com wrote: 
> > On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote: 
> >> On 9/12/2022 5:00 AM, hongy... at gmail.com wrote: 
> >>> I want to issue the query from within a script, based on the interface here [1]. For this purpose, the underlying posting URL must be obtained, e.g. the URL corresponding to the "ITA Settings" button, so that I can construct the corresponding query URL and issue the query from the script. 
> >>> 
> >>> However, I did not find the conversion rules from these buttons to the corresponding URL. Any hints for achieving this aim? 
> >>> 
> >>> [1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10 
> >>> 
> >>> Regards, 
> >>> Zhao 
> >> You didn't say what you want to query. Are you trying to download 
> >> entire sections of the Bilbao Crystallographic Server? 
> > 
> > I am engaged in some related research and need some specific data used by the BCS server.
> What specific data? 

All the data corresponding to the complete catalog here:
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
 
> Is it available elsewhere?

The BCS is an internationally recognized, authoritative data source in this field. Data from other places, even where ready-made electronic versions exist, is essentially derived from it and is less comprehensive.

> >> Maybe the admins will give you access to the data. 
> > 
> > I don't think they will provide such convenience to researchers who have no cooperative relationship with them.
> You can try. Tell the admins what data you want, and ask them for the 
> easiest way to get it.
> >> * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen 
> >> brings up the table of space group symbols. 
> >> 
> >> * choose say #7: Pc 
> >> 
> >> * now click ITA Settings, then choose the last entry "P c 1 1" and it 
> >> loads: 
> >> 
> >> https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita 
> > 
> > Not only that, but I want to obtain all such URLs programmatically! 
> > 
> >> You might be able to fool around with that URL and substitute values and 
> >> get back the data you want (in HTML) via Python. Do you really want 
> >> HTML results? 
> >> 
> >> Hit Ctrl+U to see the source HTML of a webpage 
> >> 
> >> Right-click or hit Ctrl + Shift + C to inspect the individual elements 
> >> of the page 
> > 
> > For batch operations, all these manual methods are inefficient.
> Yes, but I don't think you'll be able to retrieve the URLs 
> programmatically. The JavaScript code doesn't put them in the HTML 
> result, except for that one I showed you, which seems like a mistake on 
> their part. 
> 
> So you'll have to figure out the search fields, and your python program 
> will have to cycle through the search values: 
> 
> Sample from above 
> gnum = 007 
> what = gp 
> trmat = b,-a-c,c 
> unconv = P c 1 1 
> from = ita 

The problem is that I must first get all possible combinations of these variables.
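As a starting point, one of those variables does have a known range: space groups are conventionally numbered 1 to 230, and the sample URL zero-pads the number to three digits (gnum=007). A hedged sketch of enumerating parameter combinations (the "what", "trmat" and "unconv" values would still have to be scraped or listed by hand; the placeholders below are taken from the sample URL only):

```python
from itertools import product

# Space groups are conventionally numbered 1-230; the sample URL
# zero-pads to three digits ("007").
gnums = ["%03d" % n for n in range(1, 231)]   # "001" .. "230"

# Illustrative placeholder only -- the real set of "what" values
# (and trmat/unconv) must be discovered separately.
whats = ["gp"]

combos = [{"gnum": g, "what": w, "from": "ita"}
          for g, w in product(gnums, whats)]
```

Each dict in `combos` could then be turned into one query URL.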
 
> wBase = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen" 
> wGnum = "?gnum=" + findgnum 
> wWhat = "&what=" + findWhat 
> wTrmat = "&trmat=" + findTrmat 
> wUnconv = "&unconv=" + findUnconv 
> wFrom = "&from=" + findFrom 
> webpage = wBase + wGnum + wWhat + wTrmat + wUnconv + wFrom 
> 
> Then if that returns a hit, you'll have to parse the resulting HTML and 
> extract the exact data you want. 
> 
> 
> 
> I did something similar a while back using the requests and lxml libraries 
> ---------------------------------------------------------------- 
> import requests 
> from lxml import html 
> 
> #build url 
> wBase = "http://www.usdirectory.com" 
> wForm = "/ypr.aspx?fromform=qsearch" 
> wKeyw = "&qhqn=" + keyw 
> wCityZip = "&qc=" + cityzip 
> wState = "&qs=" + state 
> wDist = "&rg=" + str(miles) 
> wSort = "&sb=a2z" #sort alpha 
> wPage = "&ap=" #used with the results page number 
> webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist 
> 
> #open URL 
> page = requests.get(webpage) 
> tree = html.fromstring(page.content) 
> 
> #no matches 
> matches = tree.xpath('//strong/text()') 
> if passNbr == 1 and ("No results were found" in str(matches)): 
>     print "No results found for that search" 
>     exit(0) 
> ---------------------------------------------------------------- 
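If porting that 2.x snippet to Python 3, print becomes a function; the "no matches" check can even be sketched with only the standard library, parsing a literal HTML string offline (the class name and test string here are illustrative, not from the original code):

```python
from html.parser import HTMLParser

# Stdlib-only stand-in for the lxml xpath '//strong/text()':
# collect the text content of every <strong> element.
class StrongText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        if self.in_strong:
            self.texts.append(data)

page_content = "<html><body><strong>No results were found</strong></body></html>"
parser = StrongText()
parser.feed(page_content)
if "No results were found" in str(parser.texts):
    print("No results found for that search")
```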
> 
> 
> 
> 2.x code file: https://file.io/VdptORSKh5CN 
> 
> 
> 
> > Best Regards, 
> > Zhao


More information about the Python-list mailing list