[Tutor] Using Python to access .txt files stored behind a firewall as .exe files
Steven D'Aprano
steve at pearwood.info
Tue May 2 12:44:12 EDT 2017
On Mon, May 01, 2017 at 10:20:42AM -0700, Ian Monat wrote:
[...]
> Then you have to run the .exe which produces a zipped file, and inside the
> zipped file, is the .txt, which is what I really want. There's no way the
> distributor will change anything about how they store files on their
> website for me. I've written a script using the requests module but I
> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> required.
>
> What would you do?
Find another distributor.
(It's this sort of business-to-business incompetence that makes me laugh
when people say that private industry is always more efficient than the
alternatives. Did I say laugh? I meant cry.)
Seriously, can't you tell them that your anti-virus blocks the .exe
files, and if they want you to use their system, they'll have to provide
text files as text files?
Or tell them that you're using Apple Macs and the .exe files don't run
under Mac.
I guess it depends on whether you need them more than they need you.
In any case, this isn't a problem that can be solved by a web scraper.
The distributor's website provides .exe files. There's nothing you can
do about that except complain or leave. The website gives you a .exe
file, so that's what you receive.
However, once you have the .exe file in your possession, you *may* be
able to hack open the file and extract the .zip file without running it.
That will require detailed knowledge of how the .exe file does its job,
but it is conceivable that it will work. A good low-level hacker could
probably determine whether the zip file is embedded in the .exe or if it
is generated on the fly. That's beyond my skills though.
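A rough first check is possible without any reverse engineering: look for
the zip format's marker bytes inside the .exe. This is only a heuristic
sketch. The function name is made up for illustration, and a match does not
prove that a usable archive is embedded, since the bytes could occur by
chance and the payload could be packed in some other way.

```python
# Signatures defined by the zip file format.
ZIP_LOCAL_HEADER = b"PK\x03\x04"       # starts each stored file entry
ZIP_END_OF_DIRECTORY = b"PK\x05\x06"   # starts the end-of-central-directory record


def looks_like_embedded_zip(exe_path):
    """Return True if the file contains both zip signatures (heuristic only)."""
    with open(exe_path, "rb") as f:
        data = f.read()  # assumes the file is small enough to fit in memory
    return ZIP_LOCAL_HEADER in data and ZIP_END_OF_DIRECTORY in data
```

If this returns False, the zip is probably not stored as plain zip data
inside the .exe.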
If it is generated on the fly, you're screwed. You have no choice but to
run the .exe; until you do, the zip doesn't even exist. But if it is
embedded, it can be extracted, and once the zip file is extracted,
Python can easily unzip it.
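For the embedded case, here is a minimal sketch, assuming the .exe is an
ordinary self-extracting archive with the zip data appended after the
executable stub. Python's zipfile module locates the archive from the end
of the file, so it can often open such a file directly. The function name
is mine, purely for illustration, and the utf-8 default is a guess about
the text files. If the distributor's .exe uses some other packaging, this
returns None and you are back to running it.

```python
import zipfile


def read_text_members(exe_path, encoding="utf-8"):
    """Return {member name: text} for each .txt file in the embedded zip.

    Returns None when zipfile cannot find a zip archive in the file.
    """
    if not zipfile.is_zipfile(exe_path):
        return None
    with zipfile.ZipFile(exe_path) as archive:
        return {
            name: archive.read(name).decode(encoding)
            for name in archive.namelist()
            if name.lower().endswith(".txt")
        }
```

Nothing in the .exe is executed here; the file is only read as data.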
--
Steve