Environment markers for GPU/CUDA availability
Hi all, trying to pull together a few separate discussions into a single thread here.

The main issue is that PEP 508 currently does not provide environment markers for GPU/CUDA availability, which leads to problems for projects that want to provide distributions for environments with and without GPU support.

As far as I can tell, there have been multiple suggestions to bring this issue to distutils-sig, but no one has actually done it.

Relevant issues:

- (closed) "How should Python packages depending on TensorFlow structure their requirements?" https://github.com/tensorflow/tensorflow/issues/7166
- (closed) "Adding gpu or cuda specification in PEP 508" https://github.com/python/peps/issues/581
- (closed) "More support for conditional installation" https://github.com/pypa/pipenv/issues/1353
- (no response) "Adding gpu or cuda markers in PEP 508" https://github.com/pypa/interoperability-peps/issues/68

There is now a third-party project which attempts to amend this for tensorflow (https://github.com/akatrevorjay/tensorflow-auto-detect), but this approach is somewhat fragile (it depends on version numbers being kept in sync), doesn't directly scale to all similar projects, and would require maintainers of a given project to maintain _three_ separate projects instead of just one.

I'm not intimately familiar with PEP 508, so my questions for this list:

* Is the demand sufficient to justify supporting this use case?
* Is it possible to add support for GPU environment markers?
* If so, what would need to be done?
* If implemented, what should the transition look like for projects like tensorflow?

Thanks!
D.
I’m not knowledgeable about GPUs, but from limited conversations with others, it is important to first decide what exactly the problem area is. Unlike currently available environment markers, there’s currently not a very reliable way to programmatically determine even if there is a GPU, let alone what that GPU can actually do (not every GPU can be used by Tensorflow, for example).

IMO it would likely be a good route to first implement some interface for GPU environment detection in Python. This interface can then be used in projects like tensorflow-auto-detect. Projects like Tensorflow could also directly detect which implementation they should use, much like many projects do platform-specific things by inspecting os.name or sys.platform. Once we’re sure we have all the things needed for detection, markers can be drafted based on the detection interface.

TP
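To make that concrete, here is a minimal sketch of what such a detection interface might look like. Everything in it is illustrative: the function names are invented, it only covers the Linux/NVIDIA case via the CUDA driver library, and real detection would need to handle much more (other platforms, ROCm, driver/toolkit version requirements, and so on):

    import ctypes

    def cuda_device_count():
        """Best-effort count of CUDA-capable devices via the NVIDIA driver API."""
        try:
            # The CUDA driver library is only present when an NVIDIA driver is installed.
            libcuda = ctypes.CDLL("libcuda.so.1")
        except OSError:
            return 0
        if libcuda.cuInit(0) != 0:  # CUDA_SUCCESS == 0
            return 0
        count = ctypes.c_int(0)
        if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
            return 0
        return count.value

    def has_usable_gpu():
        """True if a working driver and at least one CUDA device were found."""
        return cuda_device_count() > 0

A library along these lines could be used by tensorflow-auto-detect today, and any eventual marker definition could be written in terms of the same check.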
What are the conditionals/criteria?

- non-Von Neumann architectures (hardly debuggable)?
- GPUs
- CUDA support
- TPUs
- If the GPU card is detected but the drivers aren't installed, what should it do?

On Friday, August 31, 2018, Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m not knowledgeable about GPUs, but from limited conversations with others, it is important to first decide what exactly the problem area is. Unlike currently available environment markers, there’s currently not a very reliable way to programmatically determine even if there is a GPU, let alone what that GPU can actually do (not every GPU can be used by Tensorflow, for example).
IMO it would likely be a good route to first implement some interface for GPU environment detection in Python. This interface can then be used in projects like tensorflow-auto-detect. Projects like Tensorflow could also directly detect which implementation they should use, much like many projects do platform-specific things by inspecting os.name or sys.platform. Once we’re sure we have all the things needed for detection, markers can be drafted based on the detection interface.
TP
On Sat, 1 Sep 2018 at 11:02, Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m not knowledgeable about GPUs, but from limited conversations with others, it is important to first decide what exactly the problem area is. Unlike currently available environment markers, there’s currently not a very reliable way to programmatically determine even if there is a GPU, let alone what that GPU can actually do (not every GPU can be used by Tensorflow, for example).
As Tzu-Ping notes, using environment markers currently requires that there be a well-defined "Python equivalent" to explain how installers should calculate the install-time value of the environment marker.

However, even regular CPU detection has problems when it comes to environment markers, since platform_machine reports x86_64 on a 64-bit CPU even if the current interpreter is built as a 32-bit binary, and there are other oddities like Linux having two different 32-bit ABIs (there's i686, which is the original 32-bit ABI that predates x86_64, and then there's x32, which is the full x86_64 instruction set but using 32-bit pointers: https://github.com/pypa/pip/issues/4962). (Also see https://github.com/pypa/pipenv/issues/2397 for some additional discussion.)

Given the complexity of the problem, what we may want to do is go with a manylinux-style solution, where even though installers are expected to make a minimal effort to figure out an answer on their own, a particular answer can also be forced by installing a module into the current environment that has a particular attribute set to True or False. (See https://www.python.org/dev/peps/pep-0571/#platform-detection-for-installers for details.)

For example, let's suppose we call the magic module "__installmarkers__":

- dunder-name to indicate that it's a special metadata module, rather than a regular one
- "install markers" rather than "environment markers", since they're not for general purpose information about the environment, they're specifically the markers that relate to declarations of installation dependencies

Given that, the environment marker lookup rules could be amended to say to check for "__installmarkers__.<name>" before checking the regular definition, and otherwise hard-to-define cases like "Is a GPU available?" could be handled by:

- expanding the environment marker syntax to allow for standalone flag attributes (i.e. no comparison operation, just a flag name)
- expanding the environment marker syntax to allow for negation (i.e. a preceding unary "not")
- defining "have_gpu" as a new flag attribute that's assumed to be "False" by default
- by default, conditional dependencies like "cpu-only-version; not have_gpu" will get installed
- setting "__installmarkers__.have_gpu" to True will mean that conditional dependencies like "gpu-optimised-version; have_gpu" will get installed

To help improve forwards compatibility in the future, it may even make sense to say that all installers should treat unknown names in environment markers as "getattr(__installmarkers__, NAME, None)", and all environment markers that they can't parse as False.

Note that I don't think it's possible for folks to get away from the "3 projects" requirement if publishers want their users to be able to selectively *install* the GPU optimised version - when you're keeping everything within one project, then you don't need an environment marker at all, you just decide at import time which version you're actually going to import.

Cheers,
Nick.

P.S. As an alternative to a magic module, the install marker overrides could be placed in pyvenv.cfg. Even if we did that, we'd probably still want the magic module option though, as pyvenv.cfg doesn't exist for user-level and interpreter-installation-level installs.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
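A rough sketch of how an installer could implement that lookup, assuming the hypothetical __installmarkers__ module and the flag-style markers described above (none of this exists today):

    import importlib

    # Flag markers an installer knows about, with their naive defaults.
    DEFAULT_INSTALL_MARKERS = {"have_gpu": False}

    def install_marker(name):
        """Resolve a flag marker, letting __installmarkers__ override the default."""
        try:
            overrides = importlib.import_module("__installmarkers__")
        except ImportError:
            overrides = None
        if overrides is not None:
            value = getattr(overrides, name, None)
            if value is not None:
                return bool(value)
        # Unknown or unset names fall back to the default, i.e. False.
        return DEFAULT_INSTALL_MARKERS.get(name, False)

With that in place, a requirement like "gpu-optimised-version; have_gpu" would only be selected when install_marker("have_gpu") evaluates to True.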
On 2 Sep 2018, at 18:04, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat, 1 Sep 2018 at 11:02, Tzu-ping Chung <uranusjr@gmail.com> wrote:
I’m not knowledgeable about GPUs, but from limited conversations with others, it is important to first decide what exactly the problem area is. Unlike currently available environment markers, there’s currently not a very reliable way to programmatically determine even if there is a GPU, let alone what that GPU can actually do (not every GPU can be used by Tensorflow, for example).
As Tzu-Ping notes, using environment markers currently requires that there be a well-defined "Python equivalent" to explain how installers should calculate the install-time value of the environment marker.
However, even regular CPU detection has problems when it comes to environment markers, since platform_machine reports x86_64 on a 64-bit CPU, even if the current interpreter is built as a 32-bit binary, and there are other oddities like Linux having two different 32-bit ABIs (there's i686, which is the original 32 bit ABI that predates x86_64, and then there's x32, which is the full x86_64 instruction set, but using 32-bit pointers: https://github.com/pypa/pip/issues/4962 ). (Also see https://github.com/pypa/pipenv/issues/2397 for some additional discussion)
This is primarily an indication that there is a missing API: an API that tells what architecture the Python interpreter is built for, rather than the architecture of the CPU. Or maybe not: distutils.util.get_platform() could be taught to do the right thing here, as was already done for macOS in the (ancient) past (although it is probably better to introduce a new API because of backward compatibility concerns). […]
Note that I don't think it's possible for folks to get away from the "3 projects" requirement if publishers want their users to be able to selectively *install* the GPU optimised version - when you're keeping everything within one project, then you don't need an environment marker at all, you just decide at import time which version you're actually going to import.
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past). Ronald
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms. Cheers, Nick.
Would warehouse need to be extended to support additional non-exclusive environment markers? On Monday, September 3, 2018, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
Cheers, Nick.
On 4 Sep 2018, at 01:51, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
Ok. I’m more used to much smaller deployments where I don’t always know up front what the capabilities are of the system that the code will run on. And looking at tensorflow specifically, the difference in size is very significant: the GPU variant is roughly five times as large as the non-GPU variant (about 255 MB vs. 55 MB). That’s a good reason for not wanting to unconditionally ship both variants.

Ronald
On Tue, 4 Sep 2018 at 08:07, Ronald Oussoren via Distutils-SIG <distutils-sig@python.org> wrote:
On 4 Sep 2018, at 01:51, Nick Coghlan <ncoghlan@gmail.com> wrote: On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
Ok. I’m more used to much smaller deployments where I don’t always know up front what the capabilities are of the system that the code will run on.
And looking at tensorflow specifically, the difference in size is very significant: the GPU variant is roughly five times as large as the non-GPU variant (about 255 MB vs. 55 MB). That's a good reason for not wanting to unconditionally ship both variants.
(Excuse messed up quoting - clients seem to use such different conventions for quoting these days, it's hard to manually fix things up sometimes :-()

Without trying to minimise the impact of this issue, how niche is the problem we're discussing here? At some point, we need to be careful not to cram too much into tags - and ultimately tags are the only mechanism pip uses to determine what wheel it's going to install (currently, at least). If we were to switch to a scheme where installers need to check more generalised metadata (which is only available after you've downloaded the wheel and opened it up), then that has a significant cost in terms of bandwidth. We cannot assume that metadata is available without downloading the wheel: PEP 503 allows an index to expose Python-Requires (and could be extended to allow other metadata), but that's optional, and does nothing for a case like pip's `--find-links http://my.server/my/wheel/directory`, which allows a plain directory to be served over HTTP and allows for no metadata other than the filename.

There's very much an 80-20 question here: we need to avoid letting the needs of the 20% of projects with unusual needs complicate usage for the 80%. On the other hand, of course, leaving the specialist cases with no viable solution also isn't reasonable, so even if tags aren't practical here, finding a solution that allows projects to ship specialised binaries some other way would be good.

Just as a completely un-thought-through suggestion, maybe we could have a mechanism where a small "generic" wheel can include pointers to specialised extra code that gets downloaded at install time?

Package X -> x-1.0-cp37_cp37m_win_amd64.whl (includes generic code)
  Metadata - Implementation links:
    If we have a GPU -> <link to an archive of code to be added to the install>
    If we don't have a GPU -> <link to an alternative non-GPU archive>

There are obviously a lot of unanswered questions here, but maybe something like this would be better than forcing everything into the wheel tags?

Paul
On Tue, Sep 4, 2018 at 3:10 AM, Paul Moore <p.f.moore@gmail.com> wrote:
There's very much an 80-20 question here, we need to avoid letting the needs of the 20% of projects with unusual needs, complicate usage for the 80%. On the other hand, of course, leaving the specialist cases with no viable solution also isn't reasonable, so even if tags aren't practical here, finding a solution that allows projects to ship specialised binaries some other way would be good. Just as a completely un-thought through suggestion, maybe we could have a mechanism where a small "generic" wheel can include pointers to specialised extra code that gets downloaded at install time?
Package X -> x-1.0-cp37_cp37m_win_amd64.whl (includes generic code) Metadata - Implementation links: If we have a GPU -> <link to an archive of code to be added to the install> If we don't have a GPU -> <link to an alternative non-GPU archive>
There's obviously a lot of unanswered questions here, but maybe something like this would be better than forcing everything into the wheel tags?
I think you've reinvented Requires-Dist and PEP 508 markers :-). (The ones that look like '; python_version < "3.6"'.) Which IIUC was also Dustin's original suggestion: make it possible to write requirements like

    tensorflow; not has_gpu
    tensorflow-gpu; has_gpu

But... do we actually know enough to define a "has_gpu" marker? It isn't literally "this system has a gpu", right, it's something more like "this system has an NVIDIA-brand GPU of a certain generation or later with their proprietary libraries installed"? Or something like that? There are actually lots of packages on PyPI with foo/foo-gpu pairs, e.g. strawberryfields, paddlepaddle, magenta, cntk, deepspeech, ... Do these -gpu packages all have the same environmental requirements, or is it different from package to package?

It would help if we had folks in the conversation who actually work on these packages :-/. Anyone have contacts on the Tensorflow team? (It'd also be good to talk to them about platform specifiers... the tensorflow "manylinux1" wheels are really ubuntu-only, but they intentionally lie about that b/c there is no ubuntu tag; maybe they're interested in fixing that...?)

Anyway, I don't see how we could add an environment marker without having a precise definition, and one that's useful for multiple packages. Which may or may not be possible here...

One thing that would help would be if tensorflow-gpu could say "Provides-Dist: tensorflow", so that downstream packages can say "Requires-Dist: tensorflow" and pip won't freak out if the user has manually installed tensorflow-gpu instead. E.g. in the proposal at [1], you could have 'tensorflow' as one wheel and 'tensorflow[gpu]' as a second wheel that 'Provides-Dist: tensorflow'. Conflicts-Dist would also be useful, though might require a real resolver first.

Another wacky idea, maybe worth thinking about: should we let packages specify their own auto-detection code that pip should run? E.g. you could have a PEP 508 requirement like "somepkg; extension[otherpackage.key] = ..." and that means "install otherpackage inside the target Python environment, look up otherpackage.key, and use its value to decide whether to install somepkg". Maybe that's too messy to be worth it, but if "gpu detection" isn't a well-defined problem then maybe it's the best approach? Though basically that's what sdists do right now, and IIUC how tensorflow-auto-detect works. Maybe tensorflow-auto-detect should become the standard tensorflow library, with an sdist only, and at install time it could decide whether to pull in 'tensorflow-gpu' or 'tensorflow-nogpu'...

-n

[1] https://mail.python.org/pipermail/distutils-sig/2015-October/027364.html

-- Nathaniel J. Smith -- https://vorpus.org
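For what that last idea looks like in practice (and roughly what the auto-detect selector approach does, as I understand it): an sdist-only selector package can run arbitrary code in its setup.py and pick the real dependency at install time. A sketch, with a made-up project name and a deliberately naive check:

    # setup.py of a hypothetical sdist-only "tensorflow-any" selector package
    import shutil
    from setuptools import setup

    def has_gpu():
        # Naive heuristic: treat the presence of nvidia-smi as "GPU available".
        return shutil.which("nvidia-smi") is not None

    setup(
        name="tensorflow-any",
        version="1.0",
        # The real implementation is chosen while the sdist is being installed.
        install_requires=["tensorflow-gpu" if has_gpu() else "tensorflow"],
    )

The downsides are the ones already raised in this thread: it only works from an sdist, pip may cache the built wheel and reuse a stale detection result, and the selector has to track the real packages' version numbers.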
On Tue, 4 Sep 2018 at 11:28, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 4, 2018 at 3:10 AM, Paul Moore <p.f.moore@gmail.com> wrote:
There's very much an 80-20 question here, we need to avoid letting the needs of the 20% of projects with unusual needs, complicate usage for the 80%. On the other hand, of course, leaving the specialist cases with no viable solution also isn't reasonable, so even if tags aren't practical here, finding a solution that allows projects to ship specialised binaries some other way would be good. Just as a completely un-thought through suggestion, maybe we could have a mechanism where a small "generic" wheel can include pointers to specialised extra code that gets downloaded at install time?
Package X -> x-1.0-cp37_cp37m_win_amd64.whl (includes generic code) Metadata - Implementation links: If we have a GPU -> <link to an archive of code to be added to the install> If we don't have a GPU -> <link to an alternative non-GPU archive>
There's obviously a lot of unanswered questions here, but maybe something like this would be better than forcing everything into the wheel tags?
I think you've reinvented Requires-Dist and PEP 508 markers :-). (The ones that look like '; python_version < "3.6"'.)
Oh, I see. Yes, I have, haven't I? Aren't I clever (but forgetful)? :-)
Which IIUC was also Dustin's original suggestion: make it possible to write requirements like
tensorflow; not has_gpu
tensorflow-gpu; has_gpu
Yes, I'd seen that, but thought it was in terms of using those markers to say "this wheel is only valid on systems with/without a GPU" (which doesn't work, because pip checks that too late). But you're right, using it in requires-dist does the right thing.
But... do we actually know enough to define a "has_gpu" marker? It isn't literally "this system has a gpu", right, it's something more like "this system has an NVIDIA-brand GPU of a certain generation or later with their proprietary libraries installed"? Or something like that? There are actually lots of packages on PyPI with foo/foo-gpu pairs, e.g. strawberryfields, paddlepaddle, magenta, cntk, deepspeech, ... Do these -gpu packages all have the same environmental requirements, or is it different from package to package?
Yep, that's the killer question here. IMO, someone needs to come up with a concrete proposal, along the lines of "here's some Python code that returns a True/False value, and we want to name that value using the marker 'has_gpu' (or whatever)". There are then two debates:

1. Does that Python code return a value that's useful for a sufficiently large consensus of the projects who care about shipping GPU-enabled code?
2. Is has_gpu a sufficiently useful marker to warrant including in the packaging standards?

Question 1 is for the package maintainers to debate, question 2 is for distutils-sig (IMO). If the package maintainers aren't sufficiently motivated to co-operate and come up with a concrete proposal, then there's not much that the non-specialists on distutils-sig can do.
It would help if we had folks in the conversation who actually work on these packages :-/. Anyone have contacts on the Tensorflow team? (It'd also be good to talk to them about platform specifiers... the tensorflow "manylinux1" wheels are really ubuntu-only, but they intentionally lie about that b/c there is no ubuntu tag; maybe they're interested in fixing that...?)
Anyway, I don't see how we could add an environment marker without having a precise definition, and one that's useful for multiple packages. Which may or may not be possible here...
See? I'm reinventing things other people have suggested again ;-)
Another wacky idea, maybe worth thinking about: should we let packages specify their own auto-detection code that pip should run? E.g. you could have a PEP 508 requirement like "somepkg; extension[otherpackage.key] = ..." and that means "install otherpackage inside the target Python environment, look up otherpackage.key, and use its value to decide whether to install somepkg". Maybe that's too messy to be worth it, but if "gpu detection" isn't a well-defined problem then maybe it's the best approach? Though basically that's what sdists do right now, and IIUC how tensorflow-gpu-detect works. Maybe tensorflow-gpu-detect should become the standard tensorflow library, with an sdist only, and at install time it could decide whether to pull in 'tensorflow-gpu' or 'tensorflow-nogpu'...
This feels to me like going back to runtime executable metadata. Maybe it's not, but I'd like to be careful here. Paul
On Tue, 4 Sep 2018 at 20:50, Paul Moore <p.f.moore@gmail.com> wrote:
On Tue, 4 Sep 2018 at 11:28, Nathaniel Smith <njs@pobox.com> wrote:
But... do we actually know enough to define a "has_gpu" marker? It isn't literally "this system has a gpu", right, it's something more like "this system has an NVIDIA-brand GPU of a certain generation or later with their proprietary libraries installed"? Or something like that? There are actually lots of packages on PyPI with foo/foo-gpu pairs, e.g. strawberryfields, paddlepaddle, magenta, cntk, deepspeech, ... Do these -gpu packages all have the same environmental requirements, or is it different from package to package?
Yep, that's the killer question here. IMO, someone needs to come up with a concrete proposal, along the lines of "here's some Python code that returns a True/False value, and we want to name that value using the marker 'has_gpu' (or whatever)". There are then two debates:
1. Does that Python code return a value that's useful for a sufficiently large consensus of the projects who care about shipping GPU-enabled code? 2. Is has_gpu a sufficiently useful marker to warrant including in the packaging standards?
Question 1 is for the package maintainers to debate, question 2 is for distutils-sig (IMO). If the package maintainers aren't sufficiently motivated to co-operate and come up with a concrete proposal, then there's not much that the non-specialists on distutils-sig can do.
As far as markers go, "naive check with an override for when it's inevitably wrong" seems to have worked pretty well in the manylinux case (the code just checks for the available glibc version if there's no "_manylinux" module to give an explicit answer). For the GPU case, the naive check (at least initially) could just be "False", and folks would have to set the explicit marker in order to opt in to the GPU-accelerated variant.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
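To spell out that opt-in flow (hypothetical names throughout, mirroring the _manylinux override described in PEP 571): an operator who knows their machines have suitable GPUs would drop a tiny override module onto sys.path, and everyone else would get the default of False.

    # __installmarkers__.py, placed in site-packages by an operator
    # who wants the GPU-optimised dependencies on this machine.
    have_gpu = True

With the naive default, a dependency list like

    cpu-only-version; not have_gpu
    gpu-optimised-version; have_gpu

would install the CPU build everywhere except in environments where the override module is present.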
On Tue, Sep 4, 2018 at 7:32 PM Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 4, 2018 at 3:10 AM, Paul Moore <p.f.moore@gmail.com> wrote:
There's very much an 80-20 question here, we need to avoid letting the needs of the 20% of projects with unusual needs, complicate usage for the 80%. On the other hand, of course, leaving the specialist cases with no viable solution also isn't reasonable, so even if tags aren't practical here, finding a solution that allows projects to ship specialised binaries some other way would be good. Just as a completely un-thought through suggestion, maybe we could have a mechanism where a small "generic" wheel can include pointers to specialised extra code that gets downloaded at install time?
Package X -> x-1.0-cp37_cp37m_win_amd64.whl (includes generic code) Metadata - Implementation links: If we have a GPU -> <link to an archive of code to be added to the install> If we don't have a GPU -> <link to an alternative non-GPU archive>
There's obviously a lot of unanswered questions here, but maybe something like this would be better than forcing everything into the wheel tags?
I think you've reinvented Requires-Dist and PEP 508 markers :-). (The ones that look like '; python_version < "3.6"'.) Which IIUC was also Dustin's original suggestion: make it possible to write requirements like
tensorflow; not has_gpu
tensorflow-gpu; has_gpu
But... do we actually know enough to define a "has_gpu" marker? It isn't literally "this system has a gpu", right, it's something more like "this system has an NVIDIA-brand GPU of a certain generation or later with their proprietary libraries installed"? Or something like that? There are actually lots of packages on PyPI with foo/foo-gpu pairs, e.g. strawberryfields, paddlepaddle, magenta, cntk, deepspeech, ... Do these -gpu packages all have the same environmental requirements, or is it different from package to package?
It would help if we had folks in the conversation who actually work on these packages :-/. Anyone have contacts on the Tensorflow team? (It'd also be good to talk to them about platform specifiers... the tensorflow "manylinux1" wheels are really ubuntu-only, but they intentionally lie about that b/c there is no ubuntu tag; maybe they're interested in fixing that...?)
Anyway, I don't see how we could add an environment marker without having a precise definition, and one that's useful for multiple packages. Which may or may not be possible here...
One thing that would help would be if tensorflow-gpu could say "Provides-Dist: tensorflow", so that downstream packages can say "Requires-Dist: tensorflow" and pip won't freak out if the user has manually installed tensorflow-gpu instead. E.g. in the proposal at [1], you could have 'tensorflow' as one wheel and 'tensorflow[gpu]' as a second wheel that 'Provides-Dist: tensorflow'. Conflicts-Dist would also be useful, though might require a real resolver first.
I don't know the situation of pip w.r.t. the resolver algorithm, but having a provides-like mechanism would completely solve the issue of tensorflow vs tensorflow-gpu for the use cases I have seen at my current company, a heavy user of tensorflow. It would solve the two main issues: not being able to specify tensorflow as a dependency, and accidental clobbering of tf vs tf-gpu.

At least in the use cases I am familiar with, I don't find the use of tags and GPU detection that useful. It would also be really hard to get right, because as you note, it is not about a GPU being available, but about "an NVIDIA GPU with the proprietary NVIDIA driver and the appropriate CUDA version". I also don't necessarily want to install tensorflow-gpu just because I have a GPU.

David
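As a purely hypothetical sketch of what that would mean in metadata terms (Provides-Dist and Requires-Dist are existing core-metadata fields; whether installers actually honour Provides-Dist is exactly the open question), the GPU build would declare:

    Name: tensorflow-gpu
    Provides-Dist: tensorflow

and downstream projects would keep declaring the dependency they already want:

    Requires-Dist: tensorflow

so that a manually installed tensorflow-gpu would satisfy the requirement instead of being clobbered by the plain tensorflow wheel.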
Another wacky idea, maybe worth thinking about: should we let packages specify their own auto-detection code that pip should run? E.g. you could have a PEP 508 requirement like "somepkg; extension[otherpackage.key] = ..." and that means "install otherpackage inside the target Python environment, look up otherpackage.key, and use its value to decide whether to install somepkg". Maybe that's too messy to be worth it, but if "gpu detection" isn't a well-defined problem then maybe it's the best approach? Though basically that's what sdists do right now, and IIUC how tensorflow-gpu-detect works. Maybe tensorflow-gpu-detect should become the standard tensorflow library, with an sdist only, and at install time it could decide whether to pull in 'tensorflow-gpu' or 'tensorflow-nogpu'...
-n
[1] https://mail.python.org/pipermail/distutils-sig/2015-October/027364.html
-- Nathaniel J. Smith -- https://vorpus.org
On Mon, Sep 3, 2018 at 4:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
None of the links that Dustin gave at the top of the thread are about managed systems though. As far as I can tell, they all come down to one of two issues: given "tensorflow" and "tensorflow-gpu" are both on PyPI, how can (a) users automatically get the appropriate version without having to manually select one, and (b) other packages express a dependency on "tensorflow or tensorflow-gpu"? And maybe (c) how can we stop tensorflow and tensorflow-gpu from accidentally getting installed on top of each other. -n -- Nathaniel J. Smith -- https://vorpus.org
On Tue, 4 Sep 2018 at 20:30, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Sep 3, 2018 at 4:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
None of the links that Dustin gave at the top of the thread are about managed systems though.
When you're only managing a few systems, or only saving a few MB per download, "install both and pick at runtime" is an entirely viable option. However, since tensorflow is the example, neither of those cases is true:

1. It's a Google project, so they have tens of thousands of instances to worry about (as do other cloud providers)
2. The size difference is in the tens or even hundreds of megabytes

I didn't actually realise the GPU tensorflow package was over 200 MB, though - that's large enough to be noticeably slow to extract and install even from a local zipfile, let alone if you're needing to download it first.

Cheers,
Nick.

P.S. If anyone were to be worrying about this problem in the context of PyPI specifically, the most likely candidates would actually be Fastly, and then to a lesser degree, AWS, as they're handling the bulk of any data transfer costs associated with folks downloading packages that are larger than they really need to be :)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 2018-09-05 00:42:04 +1000 (+1000), Nick Coghlan wrote: [...]
I didn't actually realise the GPU tensorflow package was over 200 MB, though - that's large enough to be noticeably slow to extract and install even from a local zipfile, let alone if you're needing to download it first. [...]
Yes. If you haven't tried running a mirror of PyPI lately you're likely not to have noticed, but the various nightly builds for tensorflow seem to be the majority of the data on PyPI now. I'm sure it's a very neat and useful tool, but we basically had to switch from mirroring PyPI in our CI system to using a caching proxy because of this. -- Jeremy Stanley
On Tue, Sep 4, 2018 at 11:33 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
Yes. If you haven't tried running a mirror of PyPI lately you're likely not to have noticed, but the various nightly builds for tensorflow seem to be the majority of the data on PyPI now. I'm sure it's a very neat and useful tool, but we basically had to switch from mirroring PyPI in our CI system to using a caching proxy because of this.
Side note: PyPI now provides a list of the largest packages by total filesize: https://pypi.org/stats/ Depending on what mirror you're using, you may be able to exclude these packages from your mirror if you don't need them, e.g. for bandersnatch: https://github.com/pypa/bandersnatch/blob/master/docs/filtering_configuratio...
What's Fastly's monthly/yearly cost? Thanks Fastly! https://www.fastly.com On Tuesday, September 4, 2018, Dustin Ingram <di@python.org> wrote:
On Tue, Sep 4, 2018 at 11:33 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
Yes. If you haven't tried running a mirror of PyPI lately you're likely not to have noticed, but the various nightly builds for tensorflow seem to be the majority of the data on PyPI now. I'm sure it's a very neat and useful tool, but we basically had to switch from mirroring PyPI in our CI system to using a caching proxy because of this.
Side note: PyPI now provides a list of the largest packages by total filesize: https://pypi.org/stats/
Bandwidth & download counts might be helpful here too.
Depending on what mirror you're using, you may be able to exclude these packages from your mirror if you don't need them, e.g. for bandersnatch: https://github.com/pypa/bandersnatch/blob/master/docs/filtering_configuration.md#blacklist-filtering-settings
A list of these might be helpful for maintenance of mirrors. Is it possible to donate to the PSF specifically for PyPA? "Donation for the Packaging Workgroup" https://psfmember.org/civicrm/contribute/transact?reset=1&id=13 ... $5 minimum.
How much of the distributed data in these packages is redundant between versions? Can those parts be factored out into another dependency? On Tuesday, September 4, 2018, Wes Turner <wes.turner@gmail.com> wrote:
What's Fastly's monthly/yearly cost?
Thanks Fastly!
On Tuesday, September 4, 2018, Dustin Ingram <di@python.org> wrote:
On Tue, Sep 4, 2018 at 11:33 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
Yes. If you haven't tried running a mirror of PyPI lately you're likely not to have noticed, but the various nightly builds for tensorflow seem to be the majority of the data on PyPI now. I'm sure it's a very neat and useful tool, but we basically had to switch from mirroring PyPI in our CI system to using a caching proxy because of this.
Side note: PyPI now provides a list of the largest packages by total filesize: https://pypi.org/stats/
Bandwidth & download counts might be helpful here too.
Depending on what mirror you're using, you may be able to exclude these packages from your mirror if you don't need them, e.g. for bandersnatch: https://github.com/pypa/bandersnatch/blob/master/docs/filtering_configuration.md#blacklist-filtering-settings
A list of these as such might be helpful for maintenance of mirrors.
Is it possible to donate to PSF specifically for PyPA? "Donation for the Packaging Workgroup" https://psfmember.org/civicrm/contribute/transact?reset=1&id=13 ... $5 minimum.
On Sep 4, 2018, at 1:36 PM, Wes Turner <wes.turner@gmail.com> wrote:
What's Fastly's monthly/yearly cost?
The PSF “Bill” for Fastly in August was $132,086.96 before the discount which brought it down to $0. I think that might be missing a thousand or two in extra features that were manually enabled for our account years ago that don’t show up on the bill. That includes all of the PSF’s use of Fastly, but PyPI is like 95%+ of that.
They'd probably just ask for donations to go to PyPA. On Wednesday, September 5, 2018, Donald Stufft <donald@stufft.io> wrote:
On Sep 4, 2018, at 1:36 PM, Wes Turner <wes.turner@gmail.com> wrote:
What's Fastly's monthly/yearly cost?
The PSF “Bill” for Fastly in August was $132,086.96 before the discount which brought it down to $0. I think that might be missing a thousand or two in extra features that were manually enabled for our account years ago that don’t show up on the bill.
That includes all of the PSF’s use of Fastly, but PyPI is like 95%+ of that.
On 2018-09-04 11:40:17 -0500 (-0500), Dustin Ingram wrote:
On Tue, Sep 4, 2018 at 11:33 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
Yes. If you haven't tried running a mirror of PyPI lately you're likely not to have noticed, but the various nightly builds for tensorflow seem to be the majority of the data on PyPI now. I'm sure it's a very neat and useful tool, but we basically had to switch from mirroring PyPI in our CI system to using a caching proxy because of this.
Side note: PyPI now provides a list of the largest packages by total filesize: https://pypi.org/stats/
Depending on what mirror you're using, you may be able to exclude these packages from your mirror if you don't need them, e.g. for bandersnatch: https://github.com/pypa/bandersnatch/blob/master/docs/filtering_configuratio...
We played whack-a-mole blacklisting some of the largest offenders in our bandersnatch config for a while, but really needed to rebuild the mirror from scratch since there's no easy way to go back and delete the now-blacklisted packages from before the blacklist entries were added (and that's a week+ effort to bootstrap a new mirror these days). In the end we just switched to a caching proxy we already had on hand because it got us most of the benefit of mirroring with a tiny fraction of the disk space, given we use fewer than 1000 packaged Python library dependencies across our CI jobs anyway. -- Jeremy Stanley
On Tue, Sep 4, 2018, 07:42 Nick Coghlan <ncoghlan@gmail.com> wrote:
On Tue, 4 Sep 2018 at 20:30, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Sep 3, 2018 at 4:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldoussoren@mac.com> wrote:
What’s the problem with including GPU and non-GPU variants of code in a binary wheel other than the size of the wheel? I tend to prefer binaries that work “everywhere", even if that requires some more work in building binaries (such as including multiple variants of extensions to have optimised code for different CPU variants, such as SSE and non-SSE variants in the past).
As far as I'm aware, binary artifact size *is* the problem. It's just that once you're automatically building and pushing an artifact (or an image containing that artifact) to thousands or tens of thousands of managed systems, the wasted bandwidth from pushing redundant implementations of the same functionality becomes more of a concern than the convenience of being able to use the same artifact across multiple platforms.
None of the links that Dustin gave at the top of the thread are about managed systems though.
When you're only managing a few systems, or only saving a few MB per download, "install both and pick at runtime" is an entirely viable option.
Sure, this is true, and obviously size is a major reason for splitting up these packages, but this doesn't have anything in particular to do with managed systems AFAICT.
However, since tensorflow is the example, neither of those cases is true:
1. It's a Google project, so they have tens of thousands of instances to worry about (as do other cloud providers)
They do have those instances, but they handle them via totally different methods that don't involve PyPI package names or pip's dependency tracking. (Specifically, a giant internal monorepo where they check in every piece of code they use, and then they build everything from source through their internal version of Bazel.) This is about how they, and other projects, are distributed to the general public on PyPI, and how to manage that public, shared dependency graph. -n
participants (10):
- David Cournapeau
- Donald Stufft
- Dustin Ingram
- Jeremy Stanley
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Ronald Oussoren
- Tzu-ping Chung
- Wes Turner