On Tue, 4 Sep 2018 at 11:28, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 4, 2018 at 3:10 AM, Paul Moore <p.f.moore@gmail.com> wrote:
There's very much an 80-20 question here: we need to avoid letting the needs of the 20% of projects with unusual requirements complicate usage for the 80%. On the other hand, of course, leaving the specialist cases with no viable solution also isn't reasonable, so even if tags aren't practical here, finding a solution that allows projects to ship specialised binaries some other way would be good. Just as a completely un-thought-through suggestion, maybe we could have a mechanism where a small "generic" wheel can include pointers to specialised extra code that gets downloaded at install time?
Package X -> x-1.0-cp37-cp37m-win_amd64.whl (includes generic code)
    Metadata - Implementation links:
        If we have a GPU -> <link to an archive of code to be added to the install>
        If we don't have a GPU -> <link to an alternative non-GPU archive>
There's obviously a lot of unanswered questions here, but maybe something like this would be better than forcing everything into the wheel tags?
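To flesh the idea out just slightly, here's a rough sketch of what an installer might do with such metadata. Everything here is made up for illustration - the field names, the detect_gpu() helper and the URLs don't correspond to anything that exists today:

    # Purely hypothetical - no such metadata fields or installer hooks exist.
    # The generic wheel carries two "implementation links"; at install time
    # the installer picks one and unpacks it on top of the generic code.
    import shutil

    implementation_links = {
        "gpu": "https://example.com/x-1.0-gpu-extra.tar.gz",       # made-up URL
        "no-gpu": "https://example.com/x-1.0-nogpu-extra.tar.gz",  # made-up URL
    }

    def detect_gpu():
        # Placeholder for whatever detection logic we could agree on.
        return shutil.which("nvidia-smi") is not None

    def choose_extra_archive():
        return implementation_links["gpu" if detect_gpu() else "no-gpu"]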
I think you've reinvented Requires-Dist and PEP 508 markers :-). (The ones that look like '; python_version < "3.6"'.)
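For anyone who hasn't run into these, the existing markers look roughly like this in a package's metadata (the specific packages here are just illustrative examples):

    Requires-Dist: colorama; sys_platform == "win32"
    Requires-Dist: typing; python_version < "3.5"

so the question below is really whether something like has_gpu could join python_version, sys_platform and friends.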
Oh, I see. Yes, I have, haven't I? Aren't I clever (but forgetful)? :-)
Which IIUC was also Dustin's original suggestion: make it possible to write requirements like
tensorflow; not has_gpu
tensorflow-gpu; has_gpu
Yes, I'd seen that, but thought it was in terms of using those markers to say "this wheel is only valid on systems with/without a GPU" (which doesn't work, because pip checks that too late). But you're right, using it in requires-dist does the right thing.
But... do we actually know enough to define a "has_gpu" marker? It isn't literally "this system has a gpu", right, it's something more like "this system has an NVIDIA-brand GPU of a certain generation or later with their proprietary libraries installed"? Or something like that? There are actually lots of packages on PyPI with foo/foo-gpu pairs, e.g. strawberryfields, paddlepaddle, magenta, cntk, deepspeech, ... Do these -gpu packages all have the same environmental requirements, or is it different from package to package?
Yep, that's the killer question here. IMO, someone needs to come up with a concrete proposal, along the lines of "here's some Python code that returns a True/False value, and we want to name that value using the marker 'has_gpu' (or whatever)". There are then two debates:

1. Does that Python code return a value that's useful for a sufficiently large consensus of the projects who care about shipping GPU-enabled code?
2. Is has_gpu a sufficiently useful marker to warrant including in the packaging standards?

Question 1 is for the package maintainers to debate; question 2 is for distutils-sig (IMO). If the package maintainers aren't sufficiently motivated to co-operate and come up with a concrete proposal, then there's not much that the non-specialists on distutils-sig can do.
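To make question 1 concrete, the sort of thing such a proposal might contain is a small check like the one below. It's only a sketch - whether "the nvidia-smi binary is on PATH" (or any other single test) is actually the right definition is exactly what the maintainers would need to agree on:

    import shutil

    def has_gpu():
        # One possible (and very debatable) definition: an NVIDIA driver is
        # installed, approximated by finding the nvidia-smi executable on PATH.
        # Other projects might instead want to check for CUDA libraries, a
        # minimum compute capability, or a different vendor entirely - which
        # is the whole point of the debate.
        return shutil.which("nvidia-smi") is not None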
It would help if we had folks in the conversation who actually work on these packages :-/. Anyone have contacts on the Tensorflow team? (It'd also be good to talk to them about platform specifiers... the tensorflow "manylinux1" wheels are really ubuntu-only, but they intentionally lie about that b/c there is no ubuntu tag; maybe they're interested in fixing that...?)
Anyway, I don't see how we could add an environment marker without having a precise definition, and one that's useful for multiple packages. Which may or may not be possible here...
See? I'm reinventing things other people have suggested again ;-)
Another wacky idea, maybe worth thinking about: should we let packages specify their own auto-detection code that pip should run? E.g. you could have a PEP 508 requirement like "somepkg; extension[otherpackage.key] = ..." and that means "install otherpackage inside the target Python environment, look up otherpackage.key, and use its value to decide whether to install somepkg". Maybe that's too messy to be worth it, but if "gpu detection" isn't a well-defined problem then maybe it's the best approach? Though basically that's what sdists do right now, and IIUC how tensorflow-gpu-detect works. Maybe tensorflow-gpu-detect should become the standard tensorflow library, with an sdist only, and at install time it could decide whether to pull in 'tensorflow-gpu' or 'tensorflow-nogpu'...
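As a rough illustration of that last idea (not how any existing package actually does it, as far as I know), a detection-only sdist's setup.py could pick its real dependency at install time - the project name and the has_gpu() test are both hypothetical:

    # setup.py for a hypothetical dispatch sdist - a sketch only.
    # Because an sdist runs code on the target machine when it's built/installed,
    # it can inspect that machine and declare the "real" package as a dependency.
    import shutil
    from setuptools import setup

    def has_gpu():
        # Same caveat as before: this is one debatable definition of "has a GPU".
        return shutil.which("nvidia-smi") is not None

    setup(
        name="tensorflow-dispatch",  # hypothetical name
        version="1.0",
        install_requires=["tensorflow-gpu" if has_gpu() else "tensorflow-nogpu"],
    )

(Of course this only works while the project ships as an sdist; once a wheel is built, its metadata is frozen.)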
This feels to me like going back to runtime executable metadata. Maybe it's not, but I'd like to be careful here.

Paul