[Tutor] Using Python to access .txt files stored behind a firewall as .exe files

Mon May 1 20:12:40 EDT 2017

Thank you for the reply Mats.

I agree that wrapping the files in an .exe is ridiculous. We're
talking about a $15B company doing this, by the way, not a mom-and-pop
shop.  Anyway...

If I understand you correctly, you're saying I can:

1) Use Python to download the file from the web (but not by using a
webscraper, according to Alan)
2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to
unzip the file and place the .txt file in the desired folder

Am I understanding you correctly?

Thank you -Ian

On Mon, May 1, 2017 at 4:14 PM, Mats Wichmann <mats at wichmann.us> wrote:

> On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> > On 01/05/17 18:20, Ian Monat wrote:
> >> ...  I've written a script using the requests module but I
> >> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> >> required.
> >
> > I'm not sure what you are looking for. Scrapy, BS etc will
> > help you read the HTML but not to fetch the file. Also do
> > you want to process the file (extract the text) in Python
> > too, or is it enough to just fetch the file?
> >
> > If the problem is with reading the HTML then you need to
> > give us more detail about the problem areas and HTML
> > format.
> >
> > If the problem is fetching the file, it sounds like you
> > have already done that and it should be a case of fine
> > tuning/tidying up the code you've written.
> >
> > What kind of help exactly are you asking for?
> >
>
> This is a completely non-Python, non-Tutor response to part of this:
>
> The self-extracting archive. Convenience, at a price: running
> executables of unverified reliability is just a terrible idea.
>
> I know you said your disty won't change their website, but you should
> tell them they should: a tremendous number of organizations have
> policies that just don't allow pulling down and running an exe file from
> a website. Even if that's not currently the case for you, you could say
> that you're not allowed, and get someone in your management chain to
> promise to support that if there's a question - should not be hard. It
> may be wired into the distributor's content delivery system, but that's
> a stupid choice on their part.
>
> "Then you have to run the .exe which produces a zipped file"
>
> Don't do this ("run"), unless there's a way you trust to be able to
> verify the security of what is offered. Just about any payload could be
> buried in the exe, especially if someone broke in to the distributor's
> site.
>
> Possibly slightly pythonic:
>
> If it is really just a wrapper for a zipfile (i.e. the aforementioned
> self-extracting archive), you should be able to open it in 7zip or
> similar, and extract the zipfile, without ever "running" it.  And if
> that is the case, you should be able to script extracting the zipfile
> from the .exe, and then extracting the text file from the zipfile, using
> Python (or other scripting languages: that's not particularly
> Python-specific).
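
A minimal sketch of the scripted extraction described above, assuming the
.exe really is a zip-based self-extracting archive (Python's zipfile module
can usually read those directly, because it locates the archive from the end
of the file and tolerates the executable stub in front). The function name
extract_txt and the file and folder names in the usage note are hypothetical.
If the distributor's wrapper uses another format (7z, for example), zipfile
will raise BadZipFile and a different tool would be needed. Nothing in the
.exe is executed.

```python
import io
import zipfile
from pathlib import Path


def extract_txt(source, dest_dir):
    """Copy .txt members out of a zip-based archive without running it.

    `source` is a path to the .exe (or .zip), or a binary file-like object.
    Nested .zip members are opened in memory and searched as well.
    Returns the list of file paths written under `dest_dir`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    # Raises zipfile.BadZipFile if `source` is not a zip-based archive.
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            name = info.filename.lower()
            if name.endswith(".txt"):
                # extract() sanitizes member names and returns the path created.
                written.append(archive.extract(info, path=dest))
            elif name.endswith(".zip"):
                # The wrapper may hold a zip that holds the text file.
                inner = io.BytesIO(archive.read(info))
                written.extend(extract_txt(inner, dest))
    return written
```

Usage would look something like extract_txt("download.exe", "extracted"),
where both names are placeholders for the downloaded file and the target
folder.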