[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

Donald Stufft donald at stufft.io
Wed Nov 11 08:06:50 EST 2015


On November 11, 2015 at 7:51:05 AM, Steve Dower (steve.dower at python.org) wrote:
> As much as I dislike sniping into threads like this, my gut feeling is strongly pushing  
> towards defining the Python interface in the PEP and keeping command line interfaces  
> as private.
>  
> I don't have any new evidence, but pickle and binary stdio (not to mention TCP/HTTP for  
> doing things remotely) are reliable cross-platform where CLIs are not, so you're going  
> to have a horrible time locking down something that will work across multiple OS/shell  
> combinations. There are also limits to command lines lengths that may be triggered when  
> passing many long paths (if that ends up in there).

The flip side is that we already have a working cross-platform CLI in setup.py. It’s not some new thing; we’ve been handling it for roughly two decades already.

Pickle makes me nervous because it’s trivial for something that shouldn’t to “leak” out of the subprocess into the main process. For example, if we implement isolated builds then we might end up having a build tool like “mycoolbuildthing” installed not into the same location as pip, but added to PYTHONPATH when invoking the build tool. The build tool then returns some internally defined class as part of its interface and pickle dutifully serializes that. Then when we go to deserialize that in the main pip process, it blows up because “mycoolbuildthing” isn’t installed there.
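
To make that failure concrete, here is a minimal sketch that simulates it inside a single process. The module name “mycoolbuildthing” is the made-up one from above; BuildResult, make_backend_module and simulate_round_trip are hypothetical names invented for illustration, and removing the module from sys.modules stands in for the main pip process not having the build tool installed.

```python
import pickle
import sys
import types


def make_backend_module(name):
    """Register a throwaway module that defines a result class (hypothetical)."""
    module = types.ModuleType(name)

    class BuildResult:
        def __init__(self, wheel_path):
            self.wheel_path = wheel_path

    # Make the class look as if it were defined at the top level of `name`,
    # which is how pickle will record it.
    BuildResult.__module__ = name
    BuildResult.__qualname__ = "BuildResult"
    module.BuildResult = BuildResult
    sys.modules[name] = module
    return module


def simulate_round_trip():
    """Pickle in the 'build environment', unpickle in the 'pip environment'."""
    backend = make_backend_module("mycoolbuildthing")
    payload = pickle.dumps(backend.BuildResult("dist/example.whl"))

    # Stand-in for the main process, where the build tool is not importable.
    del sys.modules["mycoolbuildthing"]
    try:
        pickle.loads(payload)
    except ImportError as exc:
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(simulate_round_trip())
```

Serializing succeeds without complaint; the error only shows up on the receiving side, which is the part that worries me.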

I could see an in-language API if Python had a history of typed interfaces, where we could write an interface that said “it is an error for this interface to ever return anything but a True/False” or some other such rule. However, Python doesn’t, and duck typing works against us here: build tool authors will have to be aware of how we’re serializing the results across the IPC boundary without that IPC actually being defined.
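
As a rough sketch of what enforcing such a rule by hand might look like, assuming the results end up as JSON on the wire: the names HookContractError, check_bool_result and encode_result are hypothetical and are not part of pip or of any proposal in this thread.

```python
import json


class HookContractError(TypeError):
    """Raised when a hook returns something the (hypothetical) contract forbids."""


def check_bool_result(hook_name, value):
    """Enforce 'this hook may only ever return True or False'."""
    # Compare the exact type so that 0, 1 and other truthy objects are rejected.
    if type(value) is not bool:
        raise HookContractError(
            "%s must return True or False, got %r" % (hook_name, value)
        )
    return value


def encode_result(hook_name, value):
    """Serialize a hook result, failing on the build-tool side if it cannot cross."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        raise HookContractError(
            "%s returned a value that is not JSON-serializable: %s" % (hook_name, exc)
        ) from None


if __name__ == "__main__":
    print(encode_result("example_hook", {"requires": ["wheel"]}))
```

Nothing in the language makes a build tool author run checks like these, so the rule only holds if whoever owns the IPC layer applies it.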

>  
> Might be nice to have an in-proc option for builders too, so I can handle the IPC in my own  
> way. Maybe that's not useful, but with a Python interface it's trivial to enable.
>  
> Cheers,
> Steve
>  
> Top-posted from my Windows Phone
>  
> -----Original Message-----
> From: "Nathaniel Smith"  
> Sent: 11/11/2015 4:18
> To: "Robert Collins"  
> Cc: "DistUtils mailing list"  
> Subject: Re: [Distutils] command line versus python API for build system abstraction  
> (was Re: build system abstraction PEP)
>  
> In case it's useful to make this discussion more concrete, here's a
> sketch of what the pip code for dealing with a build system defined by
> a Python API might look like:
>  
> https://gist.github.com/njsmith/75818a6debbce9d7ff48
>  
> Obviously there's room to build on this to get much fancier, but
> AFAICT even this minimal version is already enough to correctly handle
> all the important stuff -- schema version checking, error reporting,
> full args/kwargs/return values. (It does assume that we'll only use
> json-serializable data structures for argument and return values, but
> that seems like a good plan anyway. Pickle would probably be a bad
> idea because we're crossing between two different python environments
> that may have different or incompatible packages/classes available.)
>  
> -n
>  
> On Wed, Nov 11, 2015 at 1:04 AM, Nathaniel Smith wrote:
> > On Tue, Nov 10, 2015 at 11:27 PM, Robert Collins
> > wrote:
> >> On 11 November 2015 at 19:49, Nick Coghlan wrote:
> >>> On 11 November 2015 at 16:19, Robert Collins wrote:  
> >> ...
> >>>> pip is going to be invoking a CLI *no matter what*. That's a hard
> >>>> requirement unless Python's very fundamental import behaviour changes.
> >>>> Slapping a Python API on things is lipstick on a pig here IMO: we're
> >>>> going to have to downgrade any richer interface; and by specifying the
> >>>> actual LCD as the interface it is then amenable to direct exploration
> >>>> by users without them having to reverse engineer an undocumented thunk
> >>>> within pip.
> >>>
> >>> I'm not opposed to documenting how pip talks to its worker CLI - I
> >>> just share Nathan's concerns about locking that down in a PEP vs
> >>> keeping *that* CLI within pip's boundary of responsibilities, and
> >>> having a documented Python interface used for invoking build systems.
> >>
> >> I'm also very wary of something that would be an attractive nuisance.
> >> I've seen nothing suggesting that a Python API would be anything but:
> >> - it won't be usable [it requires the glue to set up an isolated
> >> context, which is buried in pip] in the general case
> >
> > This is exactly as true of a command line API -- in the general case
> > it also requires the glue to set up an isolated context. People who go
> > ahead and run 'flit' from their global environment instead of in the
> > isolated build environment will experience exactly the same problems
> > as people who go ahead and import 'flit.build_system_api' in their
> > global environment, so I don't see how one is any more of an
> > attractive nuisance than the other?
> >
> > AFAICT the main difference is that "setting up a specified Python
> > context and then importing something and exploring its API" is
> > literally what I do all day as a Python developer. Either way you have
> > to set stuff up, and then once you do, in the Python API case you get
> > stuff like tab completion, ipython introspection (? and ??), etc. for
> > free.
> >
> >> - no matter what we do, pip can't benefit from it beyond the
> >> subprocess interface pip needs, because pip *cannot* import and use
> >> the build interface
> >
> > Not sure what you mean by "benefit" here. At best this is an argument
> > that the two options have similar capabilities, in which case I would
> > argue that we should choose the one that leads to simpler and thus
> > more probably bug-free specification language.
> >
> > But even this isn't really true -- the difference between them is that
> > either way you have a subprocess API, but with a Python API, the
> > subprocess interface that pip uses has the option of being improved
> > incrementally over time -- including, potentially, to take further
> > advantage of the underlying richness of the Python semantics. Sure,
> > maybe the first release would just take all exceptions and map them
> > into some text printed to stderr and a non-zero return code, and
> > that's all that pip would get. But if someone had an idea for how pip
> > could do better than this by, I dunno, encoding some structured
> > metadata about the particular exception that occurred and passing this
> > back up to pip to do something intelligent with it, they absolutely
> > could write the code and submit a PR to pip, without having to write a
> > new PEP.
> >
> >> tl;dr - I think making the case that the layer we define should be a
> >> Python protocol rather than a subprocess protocol requires some really
> >> strong evidence. We're *not* dealing with the same moving parts that
> >> typical Python stuff requires.
> >
> > I'm very confused and honestly do not understand what you find
> > attractive about the subprocess protocol approach. Even your arguments
> > above aren't really even trying to be arguments that it's good, just
> > arguments that the Python API approach isn't much better. I'm sure
> > there is some reason you like it, and you might even have said it but
> > I missed it because I disagreed or something :-). But literally the
> > only reason I can think of right now for why one would prefer the
> > subprocess approach is that it lets one remove 50 lines of "worker
> > process" code from pip and move them into the individual build
> > backends instead, which I guess is a win if one is focused narrowly on
> > pip itself. But surely there is more I'm missing?
> >
> > (And even this lines-of-code argument is actually pretty dubious --
> > right now your draft PEP is importing-by-reference an entire existing
> > codebase (!) for shell variable expansion in command lines, which is
> > code that simply doesn't need to exist in the Python API approach. I'd
> > be willing to bet that your approach requires more code in pip than
> > mine :-).)
> >
> >>> However, I've now realised that we're not constrained even if we start
> >>> with the CLI interface, as there's still a migration path to a Python
> >>> API based model:
> >>>
> >>> Now: documented CLI for invoking build systems
> >>> Future: documented Python API for invoking build systems, default
> >>> fallback invokes the documented CLI
> >>
> >> Or we just issue an updated bootstrap schema, and there's no fallback
> >> or anything needed.
> >
> > Oh no! But this totally gives up the most brilliant part of your
> > original idea! :-)
> >
> > In my original draft, I had each hook specified separately in the
> > bootstrap file, e.g. (super schematically):
> >
> > build-requirements = flit-build-requirements
> > do-wheel-build = flit-do-wheel-build
> > do-editable-build = flit-do-editable-build
> >
> > and you counterproposed that instead there should just be one line like
> >
> > build-system = flit-build-system
> >
> > and this is exactly right, because it means that if some new
> > capability is added to the spec (e.g. a new hook -- like
> > hypothetically imagine if we ended up deferring the equivalent of
> > egg-info or editable-build-mode to v2), then the new capability just
> > needs to be implemented in pip and in flit, and then all the projects
> > that use flit immediately gain superpowers without anyone having to go
> > around and manually change all the bootstrap files in every project
> > individually.
> >
> > But for this to work it's crucial that the pip<->build-system
> > interface have some sort of versioning or negotiation beyond the
> > bootstrap file's schema version.
> >
> >>> So the CLI documented in the PEP isn't *necessarily* going to be the
> >>> one used by pip to communicate into the build environment - it may be
> >>> invoked locally within the build environment.
> >>
> >> No, it totally will be. Exactly as setup.py is today. That's
> >> deliberate: The *new* thing we're setting out to enable is abstract
> >> build systems, not reengineering pip.
> >>
> >> The future - sure, someone can write a new thing, and the necessary
> >> capability we're building in to allow future changes will allow a new
> >> PEP to slot in easily and take on that [non trivial and substantial
> >> chunk of work]. (For instance, how do you do compiler and build system
> >> specific options when you have a CLI to talk to pip with)?
> >
> > I dunno, that seems pretty easy? My original draft just suggested that
> > the build hook would take a dict of string-valued keys, and then we'd
> > add some options to pip like "--project-build-option foo=bar" that
> > would set entries in that dict, and that's pretty much sufficient to
> > get the job done. To enable backcompat you'd also want to map the old
> > --install-option and --build-option switches to add entries to some
> > well-known keys in that dict. But none of the details here need to be
> > specified, because it's up to individual projects/build-systems to
> > assign meaning to this stuff and individual build-frontends like pip
> > to provide an interface to it -- at the build-frontend/build-backend
> > interface layer we just need some way to pass through the blobs.
> >
> > I admit that this is another case where the Python API approach is
> > making things trivial though ;-). If you want to pass arbitrary
> > user-specified data through a command-line API, while avoiding things
> > like potential namespace collisions between user-defined switches and
> > standard-defined switches, then you have to do much more work than
> > just say "there's another argument that's a dict".
> >
> > -n
> >
> > --
> > Nathaniel J. Smith -- http://vorpus.org
>  
>  
>  
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> Distutils-SIG maillist - Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
>  

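For reference, the arrangement debated in the quoted thread (pip starts a worker process inside the build environment, and only JSON-serializable data crosses the boundary) can be sketched roughly as follows. This is not the code from the gist linked above; the hook name get_build_requirements, the build_options argument, the request/response shape and the "wheel" return value are all assumptions made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads one JSON request from stdin, calls the named
# hook, and writes one JSON response to stdout. A real worker would import
# the hooks from the build backend instead of defining them inline.
WORKER_SOURCE = r'''
import json
import sys


def get_build_requirements(build_options):
    return ["wheel"]


HOOKS = {"get_build_requirements": get_build_requirements}

request = json.load(sys.stdin)
try:
    result = HOOKS[request["hook"]](**request["kwargs"])
    response = {"ok": True, "result": result}
except Exception as exc:
    response = {"ok": False, "error": "%s: %s" % (type(exc).__name__, exc)}
json.dump(response, sys.stdout)
'''


def call_hook(hook, **kwargs):
    """Run one hook in a child interpreter and return its decoded result."""
    completed = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps({"hook": hook, "kwargs": kwargs}),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    response = json.loads(completed.stdout)
    if not response["ok"]:
        # Only a text description of the failure crosses the boundary.
        raise RuntimeError(response["error"])
    return response["result"]


if __name__ == "__main__":
    print(call_hook("get_build_requirements", build_options={}))
```

Because both directions go through json, a backend-defined class can never reach the parent process; the cost is that hook authors have to stick to plain lists, dicts, strings, numbers and booleans.
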
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



