[Tutor] Using Python to access .txt files stored behind a firewall as .exe files
ian.monat at gmail.com
Mon May 1 20:12:40 EDT 2017
Thank you for the reply Mats.
I agree that wrapping the files in an .exe is ridiculous. We're talking
about a $15B company doing this, by the way, not a ma-and-pa shop.
If I understand you correctly, you're saying I can:
1) Use Python to download the file from the web (but not with a web
scraper, according to Alan)
2) Simply ignore the .exe wrapper and use, maybe, Windows Task Manager to
unzip the file and place the .txt file in the desired folder
Am I understanding you correctly?
Thank you -Ian
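Step 1 of the list above (fetching the file) only needs to save bytes to disk. Ian's script uses the third-party requests module. The sketch below swaps in the standard library's `urllib.request` so it has no extra dependency. It is a minimal sketch, assuming the file is reachable at a plain URL with no login. The function name `download` and the example URL and paths are hypothetical. Nothing is executed; the .exe is only written to disk.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest_path):
    """Save the resource at `url` to `dest_path` and return the path.

    Hypothetical helper: it only copies bytes and never runs the file.
    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk in chunks instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Example call (placeholder URL): `download("https://example.com/export.exe", "downloads/export.exe")`. If the distributor's site requires a session or cookies, the existing requests-based script is likely the better starting point.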
On Mon, May 1, 2017 at 4:14 PM, Mats Wichmann <mats at wichmann.us> wrote:
> On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> > On 01/05/17 18:20, Ian Monat wrote:
> >> ... I've written a script using the requests module, but I
> >> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> >> required.
> > I'm not sure what you are looking for. Scrapy, BS, etc. will
> > help you read the HTML but not to fetch the file. Also do
> > you want to process the file (extract the text) in Python
> > too, or is it enough to just fetch the file?
> > If the problem is with reading the HTML then you need to
> > give us more detail about the problem areas and HTML
> > format.
> > If the problem is fetching the file, it sounds like you
> > have already done that and it should be a case of fine
> > tuning/tidying up the code you've written.
> > What kind of help exactly are you asking for?
> This is a completely non-Python, non-Tutor response to part of this:
> The self-extracting archive. Convenience, at a price: running
> executables of unverified reliability is just a terrible idea.
> I know you said your distributor won't change their website, but you
> should tell them they should: a tremendous number of organizations have
> policies that simply don't allow downloading and running an .exe file
> from a website. Even if that's not currently the case for you, you could
> say that you're not allowed, and get someone in your management chain to
> promise to back that up if there's a question; that should not be hard. It
> may be wired into the distributor's content delivery system, but that's
> a stupid choice on their part.
> "Then you have to run the .exe, which produces a zipped file"
> Don't do this ("run") unless there's a way you trust to verify the
> security of what is offered. Just about any payload could be buried
> in the exe, especially if someone broke into the distributor's
> Possibly slightly pythonic:
> If it is really just a wrapper for a zipfile (i.e., the aforementioned
> self-extracting archive), you should be able to open it in 7-Zip or a
> similar tool and extract the zipfile without ever "running" it. And if
> that is the case, you should be able to script extracting the zipfile
> from the .exe, and then extracting the text file from the zipfile, using
> Python (or other scripting languages: that's not particularly
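Mats's suggestion (open the .exe as an archive, pull out the zipfile, then pull the text file out of that) can be sketched in Python roughly as follows. This assumes the .exe is a zip-based self-extractor, meaning a small launcher program with an ordinary zip archive appended. Python's `zipfile` can read archives that have extra bytes in front of them, so it can open such a file directly. Other self-extractor formats (for example 7z-based ones) will raise `zipfile.BadZipFile`, and then 7-Zip itself is the fallback Mats mentions. The function name and paths are hypothetical.

```python
import io
import zipfile
from pathlib import Path


def extract_txt_from_sfx(exe_path, dest_dir):
    """Copy .txt files out of a zip-based self-extracting .exe without running it.

    Assumption: the .exe is a launcher stub with a normal zip archive appended.
    zipfile tolerates bytes prepended to an archive, so it opens such a file
    as-is; anything else raises zipfile.BadZipFile.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []

    def save(member_name, data):
        # Keep only the final path component so a crafted member name
        # such as "../../x.txt" cannot write outside dest_dir.
        target = dest / Path(member_name).name
        target.write_bytes(data)
        written.append(target)

    with zipfile.ZipFile(exe_path) as outer:
        for name in outer.namelist():
            lower = name.lower()
            if lower.endswith(".zip"):
                # The payload is itself a zip: open it in memory and take its .txt files.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    for inner_name in inner.namelist():
                        if inner_name.lower().endswith(".txt"):
                            save(inner_name, inner.read(inner_name))
            elif lower.endswith(".txt"):
                # A .txt stored directly in the self-extractor's archive.
                save(name, outer.read(name))
    return written
```

Because the .exe is only read as data, nothing in it is executed. If `zipfile.BadZipFile` is raised, the wrapper is not a plain zip and this approach does not apply as written.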