[Tutor] Request review: A DSL for scraping a web page

Joe Farro joe.farro at gmail.com
Thu Apr 2 16:54:01 CEST 2015


Alan Gauld <alan.gauld <at> btinternet.com> writes:
> DSL?

Good to know the term/acronym (DSL, domain-specific language) is not
ubiquitous. I was going for succinct, possibly too succinct...

> Have you looked at the existing web scraping tools in Python?
> There are several to pick from. They all avoid the kind of mess
> you describe.

I'm familiar with a few of them. I've used Beautiful Soup, PyQuery,
and cssselect. I got frustrated with my scraping code and wrote the
DSL. It's still in the "proving ground" phase. A later post asked
about a real-world sample, so I'm going to work something up.

> And how is that run?
> What is the syntax for the config file?
> It is not self evident. The other example on github is no less obscure.
> I'm sure it means something to you but it is not obvious.

> 
> OK, I see there is much more on the github. Sadly too much for me to 
> plough through just now.

Thanks for looking. 

> I think the main python list is a better bet for feedback on something 
> of this size.

I wasn't sure whether this was at that level. Thanks for the suggestion.





