FYI: Removing posts with All Cap Authors

rurpy at yahoo.com
Sat Mar 4 11:48:03 EST 2017


On Saturday, March 4, 2017 at 9:37:35 AM UTC-7, Wanderer wrote:
> On Saturday, March 4, 2017 at 11:31:13 AM UTC-5, Chris Angelico wrote:
> > On Sun, Mar 5, 2017 at 3:22 AM, Wanderer wrote:
> > > I mostly just lurk and view the post titles to see if something interesting is being discussed. This code gets me a web page without the spam. You need to compile it to a pyc file and create a bookmark. Probably not useful for most people who don't use their browsers the way I do, but here it is.
> > >
> > > # remove authors with mostly caps
> > >
> > > USERAGENTBASE = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:40.0) Gecko/20100101 '
> > > BROWSERPATH = 'C:\\"Program Files"\\Waterfox\\waterfox.exe'
> > > FILENAME = 'C:\\PyStuff\\pygroup.htm'
> > > WEBPAGE = "https://groups.google.com/forum/?_escaped_fragment_=forum/comp.lang.python"
> > >
> > 
> > Interesting. Any particular reason to screen-scrape Google Groups
> > rather than start with the netnews protocol? You can get a
> > machine-readable version of the newsgroup much more simply that way, I
> > would have thought.
> > 
> > ChrisA
> 
> I don't know what a netnews protocol is. I use Google Groups to look at usenet.

As a lot of us do.
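
For anyone following along, the "remove authors with mostly caps" idea from the quoted comment could be sketched roughly as below. This is only a guess at the heuristic, not Wanderer's actual code (the quoted excerpt shows just the constants). The names `is_mostly_caps` and `filter_posts`, the 0.8 threshold, and the `(author, title)` tuple layout are all assumptions made for illustration.

```python
def is_mostly_caps(author, threshold=0.8):
    """Return True when at least `threshold` of the letters are uppercase.

    Hypothetical helper: the threshold value is an assumption.
    """
    letters = [ch for ch in author if ch.isalpha()]
    if not letters:
        # Names with no letters at all are not treated as all-caps.
        return False
    upper = sum(1 for ch in letters if ch.isupper())
    return upper / len(letters) >= threshold


def filter_posts(posts):
    """Keep only the (author, title) pairs whose author is not mostly caps."""
    return [(author, title) for author, title in posts
            if not is_mostly_caps(author)]


if __name__ == "__main__":
    sample = [
        ("Chris Angelico", "Re: FYI: Removing posts with All Cap Authors"),
        ("SPAM BOT", "BUY NOW"),
    ]
    for author, title in filter_posts(sample):
        print(author, "-", title)
```

The same check would work whether the post list comes from a scraped page or from a news server, since it only looks at the author string.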

