[Tutor] Using Python to access .txt files stored behind a firewall as .exe files

Mats Wichmann mats at wichmann.us
Mon May 1 19:14:51 EDT 2017


On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> On 01/05/17 18:20, Ian Monat wrote:
>> ...  I've written a script using the requests module but I
>> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
>> required.
> 
> I'm not sure what you are looking for. Scrapy, BS etc will
> help you read the HTML but not to fetch the file. Also do
> you want to process the file (extract the text) in Python
> too, or is it enough to just fetch the file?
> 
> If the problem is with reading the HTML then you need to
> give us more detail about the problem areas and HTML
> format.
> 
> If the problem is fetching the file, it sounds like you
> have already done that and it should be a case of fine
> tuning/tidying up the code you've written.
> 
> What kind of help exactly are you asking for?
> 

This is a completely non-Python, non-Tutor response to part of this:

The self-extracting archive. Convenience, at a price: running
executables of unverified reliability is just a terrible idea.

I know you said your disty won't change their website, but you should
tell them they should: a tremendous number of organizations have
policies that simply don't allow pulling down and running an exe file from
a website. Even if that's not currently the case for you, you could say
that you're not allowed, and get someone in your management chain to
promise to support that if there's a question - should not be hard. It
may be wired into the distributor's content delivery system, but that's
a stupid choice on their part.

"Then you have to run the .exe which produces a zipped file"

Don't do this ("run"), unless there's a way you trust to be able to
verify the security of what is offered. Just about any payload could be
buried in the exe, especially if someone broke into the distributor's site.

Possibly slightly pythonic:

if it is really just a wrapper for a zipfile (i.e. the aforementioned
self-extracting archive), you should be able to open it in 7zip or
similar, and extract the zipfile, without ever "running" it.  And if
that is the case, you should be able to script extracting the zipfile
from the .exe, and then extracting the text file from the zipfile, using
Python (or other scripting languages: that's not particularly
Python-specific).
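
A rough sketch of that last idea, under the assumption that the .exe
really is a zip-based self-extracting archive (I haven't seen the actual
file, so this may not hold). Python's zipfile module finds the archive
directory at the end of the file, so it can usually read such a file
even though an executable stub sits in front of the zip data. The
function name extract_text_files is made up for illustration. Nothing in
the .exe gets executed here; it is only read as data.

```python
import io
import zipfile


def extract_text_files(exe_path):
    """Return {member_name: bytes} for every .txt file found in exe_path.

    Assumes exe_path is a zip-based self-extracting archive that holds
    either the text files directly or a nested .zip containing them.
    The file is only read, never run.
    """
    if not zipfile.is_zipfile(exe_path):
        raise ValueError(f"{exe_path} does not look like a zip-based archive")

    found = {}
    with zipfile.ZipFile(exe_path) as outer:
        for name in outer.namelist():
            lower = name.lower()
            if lower.endswith(".txt"):
                # Text file stored directly in the self-extracting archive.
                found[name] = outer.read(name)
            elif lower.endswith(".zip"):
                # Nested zipfile: open it from memory instead of writing
                # it to disk first.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    for inner_name in inner.namelist():
                        if inner_name.lower().endswith(".txt"):
                            found[inner_name] = inner.read(inner_name)
    return found
```

If is_zipfile() says no, the .exe is some other kind of installer and
this approach won't work; 7zip may still be able to open it by hand.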

