Re: [Python-ideas] Remote package/module imports through HTTP/S

On Thu, Aug 24, 2017 at 3:37 AM, John Torakis <john.torakis@gmail.com> wrote:
Right, but that just pushes the problem one level further out: you need to have a 100% dependable certificate chain. And that means absolutely completely trusting all of your root certificates, and it also means either not needing to add any _more_ root certificates, or being able to configure the cert store. As we've seen elsewhere, this is nontrivial.
Glad we agree about that! I have seen people wanting all sorts of things to become core features (usually for the sake of interactive work), and a lot of it is MUCH better handled as a non-core feature. Though a lot of what you're saying here - especially this:
could be equally well handled by pip-installing httpimport itself, and using that to bootstrap your testing procedures. Unless, of course, you're wanting to httpimport httpimport, in which case you're going to run into bootstrapping problems whichever way you do it :) I think we're on the same page here, but it definitely needs some more text in the README to explain this - particularly how this is not a replacement for pip. For example, my first thought on seeing this was "wow, that's going to be abysmally slow unless it has a cache", but the answer to that is: if you need a cache, you probably should be using pip to install things properly. Still -1 on this becoming a stdlib package, as there's nothing I've yet seen that can't be done as a third-party package. But it's less scary than I thought it was :) ChrisA
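Chris's point about needing a dependable certificate chain and a configurable cert store can be illustrated with the standard library's ssl module. This is a minimal sketch, not part of httpimport. `make_tls_context` and `fetch_source` are hypothetical helper names, and the HTTPS-only check is an assumed hardening policy. Passing `cafile` limits trust to the CA certificates in that PEM bundle instead of the platform's default store. Note that this only authenticates the server. It says nothing about whether the fetched code itself is trustworthy.

```python
import ssl
import urllib.request
from typing import Optional


def make_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a certificate-verifying TLS context.

    With cafile=None the platform's default trust store is used; passing a
    path to a PEM bundle makes the context trust only those CA certificates.
    """
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    return ssl.create_default_context(cafile=cafile)


def fetch_source(url: str, cafile: Optional[str] = None) -> str:
    """Hypothetical helper: download module source text over HTTPS only."""
    if not url.startswith("https://"):
        # Assumed policy: plain HTTP gives no server authentication at all.
        raise ValueError("refusing non-HTTPS URL: " + url)
    context = make_tls_context(cafile)
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode("utf-8")
```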

On 23 August 2017 at 18:49, Chris Angelico <rosuav@gmail.com> wrote:
IMO, this would make a great 3rd party package (I note that it's not yet published on PyPI). It's possible that it would end up being extremely popular, and recognised as sufficiently secure - at which point it may be worth considering for core inclusion. But it's also possible that it remains niche, and/or people aren't willing to take the security risks that it implies, in which case it's still useful to those who do like it. One aspect that hasn't been mentioned yet - as a 3rd party module, the user (or the organisation's security team) can control whether or not the ability to import over the web is available by controlling whether the module is allowed to be installed - whereas with a core module, it's there, like it or not, and *all* Python code has to be audited on the assumption that it might be used. I could easily imagine cases where the httpimport module was allowed on development machines and CI servers, but forbidden on production (and pre-production) systems. That option simply isn't available if the feature is in the core. Paul

This isn't ever going to be a standard feature. It's available as a third-party package and that's fine. I'd like to add a historic note -- this was first proposed around 1995 by Michael McLay. (Sorry, I don't have an email sitting around, but I'm sure he brought this up at or around the first Python workshop at NIST in 1995 -- I was his guest at NIST for several months at the time.) -- --Guido van Rossum (python.org/~guido)

On 23/08/2017 21:48, Guido van Rossum wrote:
Dark times... So is it "case closed", or is there any improvement that would make it worth becoming a stdlib module? I mean, times have changed since 1995, and I am not referring to the invention of HTTPS, although that is what makes httpimport just tolerable security-wise. I'm talking about the need to rapidly test public code. I insist that testing code available on Github (or other repos) without the venv/clone/install hassle is a major improvement to my Python workflow, and to that of most security researchers I know. It makes REPL prototyping a million times smoother. We have all written small scripts that auto-load modules from URLs anyway. That's why I thought that this module falls under the second category of section 20.2.1 in https://docs.python.org/devguide/stdlibchanges.html (I did my homework before mailing this list). So, if there is something that would make this module acceptable for the stdlib, please let me know! I'd more than happily rework it to comply with Python stdlib requirements. John Torakis

On 23/08/2017 22:06, Chris Angelico wrote:
Anyway, I will post it to PyPI when I finalize Github support and extend the testing a little. I will then send another mail and re-propose the module when it reaches full maturity.
Thank you all for your time! John Torakis

John Torakis writes:
But, as it seems like it is a very big feature (to me at least),
And "pip install httpimport" seems like it is a very small burden (to me at least). I agree with Paul Moore. Putting this in the stdlib seems both unnecessary, given pip, and an attractive nuisance for naive users. From the point of view of the blue team, checking for mere presence of httpimport in the environment is indicative of danger if it's pip-able, useless if it's in the stdlib. With respect to "it just makes exec(urlopen()) easier", any code must be audited for application of exec() to user input anyway, regardless of whether it fetches stuff off the Internet. Adding httpimport use to the checklist adds a little bit of complexity to *every* security check, and a fair amount of danger in security-oblivious environments such as many university labs, and I would imagine many corporate development groups as well. YMMV, but from the point of view of the larger, security-conscious organization, I would say -1. It's an attractive nuisance unless you're a security person, and then pip is not a big deal. Steve
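For reference, the "exec(urlopen())" pattern Steve mentions (roughly what the small auto-loading scripts described earlier in the thread amount to) looks something like the sketch below. `load_module_from_url` is a hypothetical name, and the `fetch` parameter exists only so the example can be exercised without network access. Nothing in it verifies where the code came from or what it does, which is exactly what an auditor would have to look for.

```python
import sys
import types
import urllib.request
from typing import Callable


def _download(url: str) -> str:
    """Fetch the raw source text at `url` (no integrity checks at all)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_module_from_url(name: str, url: str,
                         fetch: Callable[[str], str] = _download) -> types.ModuleType:
    """Naive remote import: essentially exec(urlopen(url).read()) in a module.

    Hypothetical helper for illustration; the code is executed unverified.
    """
    source = fetch(url)
    module = types.ModuleType(name)
    module.__file__ = url  # record where the code came from
    # Compiling with the URL as filename makes tracebacks point at the source.
    exec(compile(source, url, "exec"), module.__dict__)
    sys.modules[name] = module  # later "import name" statements now find it
    return module
```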

On 24 August 2017 at 05:04, John Torakis <john.torakis@gmail.com> wrote:
Not really, as even aside from the security concerns, there are simply too many ways that it can fail that are outside of our control, but would potentially lead to folks filing bug reports against CPython without realising that the problem actually lies somewhere else (e.g. with their network configuration). For a third party module, that's not a problem:
- folks have to find out httpimport exists
- folks have to decide "I want this"
- folks have to explicitly install & enable it
- folks still get to keep all the very shiny pieces when it breaks unexpectedly, but they also already know where to go for help :)
Being a third party utility means you can also update it on your own timeline, rather than being limited to the standard library's relatively slow update and rollout cycles.
- it actually makes sense to define & maintain the import plugin APIs that make it possible
- there's additional integration testing of those APIs happening beyond our own test suite
Putting away my import system co-maintainer hat and donning my commercial redistributor hat: it already bothers some of our (and our customers') security folks that we ship package installation tools that access unfiltered third party package repositories by default (e.g. pip defaulting to querying PyPI). As a result, I'm pretty sure that even if upstream said "httpimport is in the Python standard library now!", we'd get explicit requests asking us to take it out of our redistributed version and make it at most an optional install (similar to what we do with IDLE and Tcl/Tk support in general). Cheers, Nick.
P.S. As a potentially useful point of reference: "it's hard to debug when it breaks" is the main reason we resisted adding native lazy import support for so long, and that's just a matter of moving import errors away from the import statement and instead raising them as a side effect of an attribute access. It's also why we moved reload() *out* of the builtins in the move to Python 3: while module reloading is a fully supported operation, it also has a lot of subtleties that make it easy to get wrong.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
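Nick's P.S. about lazy imports can be made concrete with `importlib.util.LazyLoader`, which has been in the standard library since Python 3.5. In this sketch, `FailingRemoteLoader` and `lazy_import` are hypothetical names, and the loader merely simulates a remote fetch that fails. The point is that the failure surfaces on first attribute access, not at the line that created the module.

```python
import importlib.abc
import importlib.util
import sys
import types


class FailingRemoteLoader(importlib.abc.Loader):
    """Hypothetical loader that simulates a network failure while loading."""

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        raise ConnectionError("simulated network failure loading " + module.__name__)


def lazy_import(name: str, loader: importlib.abc.Loader) -> types.ModuleType:
    """Create module `name` whose code only runs on first attribute access."""
    lazy_loader = importlib.util.LazyLoader(loader)
    spec = importlib.util.spec_from_loader(name, lazy_loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # LazyLoader.exec_module() does not run the wrapped loader yet; it only
    # arranges for it to run when an attribute is first accessed.
    lazy_loader.exec_module(module)
    return module
```

Here `lazy_import("remote_demo", FailingRemoteLoader())` returns without error. The `ConnectionError` only appears when some later line touches an attribute such as `remote_demo.some_function`, far from where the "import" happened, which is the debugging difficulty Nick describes.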

On 23/08/2017 21:24, Paul Moore wrote:
But it remains optional whether to use it or not! I, for example, find myself REPLing more than scripting. When REPLing for something you plan to implement sometime, somehow, this module is exactly what you need! But when I finally write a script, I won't give up its offline functionality just to use httpimport. That would be suicidal! When I finally come up with something that works, I will land the packages it uses on disk, in a virtual environment. My argument is that this module would add greatly to Python's ad-hoc testing capabilities! I find it elegant for such a feature to be in a language's stdlib. I don't doubt that it can survive as a 3rd party module, though.

On Wed, Aug 23, 2017 at 10:37 AM, John Torakis <john.torakis@gmail.com> wrote:
Github can be trusted 100%, for example.
This isn't even remotely close to true. While I'd agree with the statement that the SSL cert on github is reasonably trustworthy, the *content* on github is NOT trustworthy and that's where the security risk is. I agree that this is a useful feature and there is no way it should be on by default. The right way IMHO to do this is to have a command line option something like this: python --http-import somelib=https://github.com/someuser/somelib which then redefines the import somelib command to import from that source. Along with your scenario, it allows people, for example, to replace a library with a different version without modifying source or installing a different version. That's pretty useful. --- Bruce
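Python has no `--http-import` option; Bruce is describing a hypothetical one. A wrapper script could approximate the idea by parsing `NAME=URL` pairs itself. The sketch below assumes a hypothetical program name (`remote-repl`) and an HTTPS-only policy, and it only builds the name-to-URL mapping that an import hook would then consult.

```python
import argparse
from typing import Dict, List, Optional
from urllib.parse import urlparse


def parse_http_imports(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Parse hypothetical '--http-import NAME=URL' options into a mapping."""
    parser = argparse.ArgumentParser(prog="remote-repl")
    parser.add_argument("--http-import", action="append", default=[],
                        metavar="NAME=URL",
                        help="serve module NAME from URL (may be repeated)")
    args = parser.parse_args(argv)
    mapping: Dict[str, str] = {}
    for spec in args.http_import:
        name, sep, url = spec.partition("=")  # split at the first '=' only
        if not sep or not all(part.isidentifier() for part in name.split(".")):
            parser.error("expected NAME=URL, got %r" % spec)  # exits with status 2
        if urlparse(url).scheme != "https":
            parser.error("only https:// URLs are accepted: %r" % url)
        mapping[name] = url
    return mapping
```

For example, `parse_http_imports(["--http-import", "somelib=https://github.com/someuser/somelib"])` returns `{"somelib": "https://github.com/someuser/somelib"}`. Keeping the mapping on the command line rather than in source code is what puts control in the hands of the person running the code.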

On Thu, Aug 24, 2017 at 4:04 AM, Bruce Leban <bruce@leban.us> wrote:
If you read his README, it's pretty explicit about URLs; the risk is that "https://github.com/someuser/somelib" can be intercepted, not that "someuser" is malicious. If you're worried about the latter, don't use httpimport. ChrisA

On 23/08/2017 21:11, Chris Angelico wrote:
Again, if https://github.com/someuser/somelib can be intercepted, https://pypi.python.org/pypi can too. If HTTPS is intercepted so easily (when not used from browsers) we are f**ed...

On Wed, Aug 23, 2017 at 11:11 AM, Chris Angelico <rosuav@gmail.com> wrote:
I don't see the word "security" or "risk" in the readme. The risk is not just that someuser is malicious, but also that they, their github credentials, or their code have been compromised. If this feature were implemented, I would want it outside the source code (as a command line option), because that puts the control in the hands of the person running the code. That is appropriate for the stated scenarios, and there's no possibility of a hidden live github dependency. --- Bruce

Chris Angelico writes:
If you're worried about the latter, don't use httpimport.
I guarantee you that in my (university) environment, if httpimport is in the stdlib, its use will be rampant (and not just by students, but by security-oblivious faculty). I want to be able to walk up to a student, say "may I?" and type "python -m httpimport" to determine if that particular risky behavior is a worry. Because *I'm* liable for my students' PCs' behavior on the network. Personally speaking, +1 on PyPI, -100 on stdlib. Steve
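Steve's spot check can also be done without running the module at all. A minimal sketch, assuming the check only needs to cover the interpreter it runs in (a virtualenv or a second Python installation would need its own check): `importlib.util.find_spec` locates a top-level module without executing it and returns None when the module is not importable. `installed_risky_modules` is a hypothetical helper name.

```python
import importlib.util
from typing import Iterable, List


def installed_risky_modules(names: Iterable[str]) -> List[str]:
    """Return the given top-level module names that this interpreter can import.

    find_spec() only locates each module; for a top-level name it does not
    execute the module's code, so the check itself is side-effect free.
    """
    return [name for name in names if importlib.util.find_spec(name) is not None]
```

From a shell, `python -c "import importlib.util; print(importlib.util.find_spec('httpimport'))"` prints `None` when httpimport is not installed for that interpreter.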

On Thu, Aug 24, 2017 at 12:13 PM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Agreed, and a VERY good reason for this to be an explicitly-installed package. By its nature, it won't be a dependency of other packages, so keeping it out of the stdlib pretty much guarantees that it'll only be available if it's been called for by name. ChrisA

On 23/08/2017 21:04, Bruce Leban wrote:
Do we trust code on github? Do we trust code on PyPI? This is why I **don't** want it ON by default. You have to explicitly point the Finder/Loader to a repo that you created or you trust. And provide a list of available modules/packages to import from that URL too. If the developer isn't sure about the code she/he is importing then it is her/his fault... Same goes for pip installing though...
That's what I am thinking too! Just provide the module so someone can "python -m" it, or start a REPL in a context where some packages/modules are available from a URL.
--- Bruce
John Torakis
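John's description maps onto the standard `importlib` meta path hooks: explicitly point a Finder/Loader at a repository you trust, together with a list of the modules that may be imported from it. The following is a minimal sketch under stated assumptions, not httpimport's actual implementation. `WhitelistURLFinder` is a hypothetical class, each whitelisted name is assumed to be a single-file module at one URL, and packages are not handled.

```python
import importlib.abc
import importlib.util
import urllib.request
from typing import Callable, Dict


def _https_fetch(url: str) -> str:
    """Download source text; default certificate verification applies."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


class WhitelistURLFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves only explicitly listed modules.

    `modules` maps a top-level module name to the URL of its source file.
    Any name not in the mapping is left to the rest of sys.meta_path.
    """

    def __init__(self, modules: Dict[str, str],
                 fetch: Callable[[str], str] = _https_fetch) -> None:
        self.modules = dict(modules)
        self.fetch = fetch

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.modules:
            return None  # not whitelisted: defer to the next finder
        return importlib.util.spec_from_loader(
            fullname, self, origin=self.modules[fullname])

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        url = self.modules[module.__name__]
        source = self.fetch(url)
        exec(compile(source, url, "exec"), module.__dict__)
```

Typical use would be `sys.meta_path.append(WhitelistURLFinder({"somelib": "https://example.com/somelib.py"}))` (a placeholder URL) followed by `import somelib`. Appending rather than inserting at the front means locally installed modules still take precedence, and any name not on the whitelist falls through to the normal import machinery.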

participants (7)
- Bruce Leban
- Chris Angelico
- Guido van Rossum
- John Torakis
- Nick Coghlan
- Paul Moore
- Stephen J. Turnbull