[Tutor] Using Python to access .txt files stored behind a firewall as .exe files

Steven D'Aprano steve at pearwood.info
Tue May 2 12:44:12 EDT 2017


On Mon, May 01, 2017 at 10:20:42AM -0700, Ian Monat wrote:
[...]
> Then you have to run the .exe which produces a zipped file, and inside the
> zipped file, is the .txt, which is what I really want. There's no way the
> distributor will change anything about how they store files on their
> website for me.  I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.
> 
> What would you do?

Find another distributor.

(It's this sort of business-to-business incompetence that makes me laugh 
when people say that private industry is always more efficient than the 
alternatives. Did I say laugh? I meant cry.)

Seriously, can't you tell them that your anti-virus blocks the .exe 
files, and if they want you to use their system, they'll have to provide 
text files as text files?

Or tell them that you're using Apple Macs and the .exe files don't run 
under Mac.

I guess it depends on whether you need them more than they need you.

In any case, this isn't a problem that can be solved by a web scraper. 
The distributor's website provides .exe files. There's nothing you can 
do about that except complain or leave. The website gives you a .exe 
file, so that's what you receive.

However, once you have the .exe file in your possession, you *may* be 
able to hack open the file and extract the .zip file without running it. 
That will require detailed knowledge of how the .exe file does its job, 
but it is conceivable that it will work. A good low-level hacker could 
probably determine whether the zip file is embedded in the .exe or if it 
is generated on the fly. That's beyond my skills though.
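
One cheap test is worth trying before any real reverse engineering. If 
the .exe happens to be an ordinary self-extracting zip (an assumption; 
plenty of installers use other formats or generate their output at run 
time), Python's zipfile module can usually see the archive directly, 
because it locates the zip directory from the end of the file and copes 
with extra data in front of it. A rough sketch, with function names 
made up for illustration:

```python
import zipfile


def has_embedded_zip(exe_path):
    """Return True if the file appears to contain a readable zip archive.

    zipfile.is_zipfile() looks for the zip end-of-central-directory
    record near the end of the file, so an executable stub in front of
    the archive does not prevent detection. A False result does not
    prove the zip is generated on the fly; the payload may simply be in
    some other format.
    """
    return zipfile.is_zipfile(exe_path)


def list_embedded_names(exe_path):
    """Return the member names of the zip embedded in the file."""
    with zipfile.ZipFile(exe_path) as zf:
        return zf.namelist()
```

If has_embedded_zip() returns False, nothing has been lost: you are 
simply back to the low-level investigation described above. Note that 
neither function runs the .exe; they only read its bytes.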

If it is generated on the fly, you're screwed. You have no choice but to 
run the .exe; until you do, the zip doesn't even exist. But if it is 
embedded, it can be extracted, and once the zip file is extracted, 
Python can easily unzip it.
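
For the embedded case, here is a minimal sketch of the unzipping step. 
It assumes the downloaded .exe is a plain self-extracting zip that the 
zipfile module can open as-is, without the file ever being executed; if 
the archive is wrapped in some other format, zipfile.ZipFile will raise 
BadZipFile instead. The function name and the "only .txt members" 
filter are my own choices for illustration:

```python
import zipfile
from pathlib import Path


def extract_text_files(exe_path, dest_dir):
    """Extract every .txt member of the zip embedded in exe_path.

    Returns a list of Path objects for the extracted files.
    Raises zipfile.BadZipFile if no readable zip archive is found.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(exe_path) as zf:
        for info in zf.infolist():
            # Skip directories and anything that is not a text file.
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # extract() returns the path of the file it wrote.
            extracted.append(Path(zf.extract(info, path=dest)))
    return extracted
```

Usage would be something like extract_text_files("download.exe", 
"unpacked"), where both names are placeholders for whatever you 
actually saved from the distributor's site.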



-- 
Steve

