I'd like to start a discussion around practices for vendoring package
dependencies. I'm not sure python-dev is the appropriate venue for this
discussion. If not, please point me to one and I'll gladly take it there.
I'll start with a problem statement.
Not all consumers of Python packages wish to consume Python packages in the
common `pip install <package>` + `import <package>` manner. Some Python
applications may wish to vendor Python package dependencies such that known
compatible versions are always available.
For example, a Python application targeting a general audience may not wish
to expose the existence of Python nor want its users to be concerned about
Python packaging. This is good for the application because it reduces
complexity and the surface area of things that can go wrong.
But at the same time, Python applications need to be aware that the Python
environment may contain more than just the Python standard library and
whatever Python packages are provided by that application. If using the
system Python executable, other system packages may have installed Python
packages in the system site-packages and those packages would be visible to
your application. A user could `pip install` a package and that would be in
the Python environment used by your application. In short, unless your
application distributes its own copy of Python, all bets are off with
regards to what packages are installed. (And even then advanced users could
muck with the bundled Python, but let's ignore that edge case.)
In short, `import X` is often the wild west. For applications that want to
"just work" without requiring end users to manage Python packages, `import
X` is dangerous because `X` could come from anywhere and be anything -
possibly even a separate code base providing the same package name!
Since Python applications may not want to burden users with Python
packaging, they may vendor Python package dependencies such that a known
compatible version is always available. In most cases, a Python application
can insert itself into `sys.path` to ensure its copies of packages are
picked up first. This works a lot of the time. But the strategy can fall apart.
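As a minimal sketch of that strategy (the "packages" directory name here is an assumption, not from any particular application):

```python
import os
import sys

# Prepend the application's private package root so its vendored copies are
# found before anything in the system site-packages. "packages" is a
# hypothetical directory name for this sketch.
app_root = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(app_root, "packages"))
```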
Some Python applications support loading plugins or extensions. When
user-provided code can be executed, that code could have dependencies on
additional Python packages. Or that custom code could perform `sys.path`
modifications to provide its own package dependencies. What this means is
that `import X` from the perspective of the main application becomes
dangerous again. You want to pick up the packages that you provided. But
you just aren't sure that those packages will actually be picked up. And to
complicate matters even more, an extension may wish to use a *different*
version of a package from what you distribute. e.g. they may want to adopt
the latest version that you haven't ported to yet, or they may want to use
an old version because they haven't yet ported away from it. So now you have
the requirement that multiple versions of packages be available. In Python's
shared module namespace, that means having separate package names.
A partial solution to this quagmire is using relative - not absolute -
imports. e.g. say you have a package named "knights." It has a dependency
on a 3rd party package named "shrubbery." Let's assume you distribute your
application with a copy of "shrubbery" which is installed at some packages
root, alongside "knights:"
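A layout along these lines (directory and file names assumed from the example):

```
packages/
    knights/
        __init__.py
        ni.py
    shrubbery/
        __init__.py
```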
If from `knights.ni` you `import shrubbery`, you /could/ get the copy of
"shrubbery" distributed by your application. Or you could pick up some
other random copy that is also installed somewhere in `sys.path`.
Whereas if you vendor "shrubbery" into your package, e.g. as
"knights.vendored.shrubbery", then from `knights.ni` you can write `from
.vendored import shrubbery` and you are *guaranteed* to get your local copy
of the "shrubbery" package.
This reliable behavior is highly desired by Python applications.
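As a runnable sketch of that guarantee (all names are the hypothetical ones from the example), we can build the vendored layout in a temporary directory and watch the relative import ignore a decoy copy that sits *earlier* on sys.path:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()   # application's package root
decoy = tempfile.mkdtemp()  # simulates a stray install elsewhere on sys.path

def write(path, text=""):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

# A decoy copy of "shrubbery" that an absolute import could pick up.
write(os.path.join(decoy, "shrubbery", "__init__.py"), "ORIGIN = 'decoy'\n")

# The application package with its own vendored copy.
write(os.path.join(root, "knights", "__init__.py"))
write(os.path.join(root, "knights", "vendored", "__init__.py"))
write(os.path.join(root, "knights", "vendored", "shrubbery", "__init__.py"),
      "ORIGIN = 'vendored'\n")
write(os.path.join(root, "knights", "ni.py"),
      "from .vendored import shrubbery\n")

sys.path[:0] = [decoy, root]        # the decoy comes first on purpose

import knights.ni
print(knights.ni.shrubbery.ORIGIN)  # -> vendored
```

Even with the decoy earlier on sys.path, the relative import can only resolve against `knights.vendored`, so the local copy always wins.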
But there are problems.
What we've done is effectively rename the "shrubbery" package to
"knights.vendored.shrubbery." If a module inside that package attempts an
`import shrubbery.x`, this could fail because "shrubbery" is no longer the
package name. Or worse, it could pick up a separate copy of "shrubbery"
somewhere else in `sys.path` and you could have a Frankenstein package
pulling its code from multiple installs. So for this to work, all
package-local imports must use relative imports, e.g. `from . import x`.
The takeaway is that packages using relative imports for their own modules
are much more flexible and therefore friendly to downstream consumers that
may wish to vendor them under different names. Packages using relative
imports can be dropped in and used, often without source modifications.
This is a big deal, as downstream consumers don't want to be
modifying/forking packages they don't maintain. Because of the advantages
of relative imports, *I've personally reached the conclusion that
relative imports within packages should be considered a best practice.* I
would encourage the Python community to discuss adopting that practice more
formally (perhaps as a PEP or something).
But package-local relative imports aren't a cure-all. There is a major
problem with nested dependencies. e.g. if "shrubbery" depends on the
"herring" package. There's no reasonable way of telling "shrubbery" that
"herring" is actually provided by "knights.vendored." You might be tempted
to convert non package-local imports to relative. e.g. `from .. import
herring`. But the importer doesn't allow relative imports outside the
current top-level package and this would break classic installs where
"shrubbery" and "herring" are proper top-level packages and not
sub-packages in e.g. a "vendored" sub-package. For cases where this occurs,
the easiest recourse today is to rewrite imported source code to use
relative imports. That's annoying, but it works.
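Concretely, the vendoring-time rewrite looks something like this (hypothetical module names from the example):

```
# somewhere inside knights/vendored/shrubbery/:
import herring            # before: absolute; may find any stray copy, or fail
from .. import herring    # after: always resolves to knights.vendored.herring
```

Note that the rewritten form is only legal because "shrubbery" is now a sub-package; shipped as a top-level package, `from .. import herring` would be rejected by the importer.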
In summary, some Python applications may want to vendor and distribute
Python package dependencies. Reliance on absolute imports is dangerous
because the global Python environment is effectively undefined from the
perspective of the application. The safest thing to do is use relative
imports from within the application. But because many packages don't use
relative imports themselves, vendoring a package can require rewriting
source code so imports are relative. And even if relative imports are used
within that package, relative imports can't be used for other top-level
packages. So source code rewriting is required to handle these. If you
vendor your Python package dependencies, your world often consists of a lot
of pain. It's better to absorb that pain than inflict it on the end-users
of your application (who shouldn't need to care about Python packaging).
But this is a pain that Python application developers must deal with. And I
feel that pain undermines the health of the Python ecosystem because it
makes Python a less attractive platform for standalone applications.
I would very much welcome a discussion and any ideas on improving the
Python package dependency problem for standalone Python applications. I
think encouraging the use of relative imports within packages is a solid
first step. But it obviously isn't a complete solution.
Just a reminder that 3.7.0b3 is almost upon us. Please get your
feature fixes, bug fixes, and documentation updates in before
2018-03-26 ~23:59 Anywhere on Earth (UTC-12:00). That's a little over
3.5 days from now.
IMPORTANT: We are now entering the final phases of 3.7.0. After the
tagging for 3.7.0b3, the intention is that the ABI for 3.7.0 is
frozen. After next week's 3.7.0b3, there will only be two more
opportunities planned for changes prior to 3.7.0 final:
- 2018-04-30 3.7.0 beta 4
- 2018-05-31 3.7.0 release candidate
As I've noted in previous communications, we need to start locking
down 3.7.0 so that our downstream users, that is, third-party package
developers, Python distributors, and end users, can test their code
with confidence that the actual release of 3.7.0 will hold no
unpleasant surprises. So after 3.7.0b3, you should treat the 3.7
branch as if it is already released and in maintenance mode. That
means you should only push the kinds of changes that are appropriate
for a maintenance release: non-ABI-changing bug and feature fixes and
documentation updates. If you find a problem that requires an
ABI-altering or other significant user-facing change (for example,
something likely to introduce an incompatibility with existing users'
code or require rebuilding of user extension modules), please make
sure to set the b.p.o issue to "release blocker" priority and describe
there why you feel the change is necessary. If you are reviewing PRs
for 3.7 (and please do!), be on the lookout for and flag potential
incompatibilities (we've all made them).
Thanks again for all of your hard work towards making 3.7.0 yet
another great release!
nad@python.org
As the BDFL-Delegate, I’m happy to announce PEP 541 has been accepted.
PEP 541 was voted on by the packaging-wg (https://wiki.python.org/psf/
- Donald Stufft
- Dustin Ingram
- Ernest W. Durbin III
- Ewa Jodlowska
- Kenneth Reitz
- Mark Mangoba
- Nathaniel J. Smith
- Nick Coghlan
- Nicole Harris
- Sumana Harihareswara
Thank you to the packaging-wg and to everyone who has contributed to PEP 541.
Mark Mangoba | PSF IT Manager | Python Software Foundation |
mmangoba@python.org | python.org | Infrastructure Staff:
infrastructure-staff@python.org | GPG: 2DE4 D92B 739C 649B EBB8 CCF6 DC05
E024 5F4C A0D1
I searched usages of is_integer() on GitHub and have found that it is
used *only* in silly code like (x/5).is_integer(), (x**0.5).is_integer()
(or even (x**(1/3)).is_integer()) and in loops like:
i = 0
while i < 20:
    i += 0.1
(x/5).is_integer() is an awful way of determining the divisibility by 5.
It returns a wrong result for large integers and for some floats. (x % 5 == 0)
is a clearer and more reliable way (or, PEP 8 compliant, (not x % 5)).
Does anybody know examples of the correct use of float.is_integer() in
real programs? For now it looks just like a bug magnet. I suggest
deprecating it in 3.7 or 3.8 and removing it in 3.9 or 3.10. If you ever
need to test whether a float is an exact integer, you can use (not x % 1.0).
It is even faster than x.is_integer().
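To illustrate the bug magnet (a minimal constructed example, not from any real program):

```python
x = 10**23 + 1                 # clearly not divisible by 5

# float division loses precision long before 1e23, so the float-based
# "divisibility test" silently gives the wrong answer:
print((x / 5).is_integer())    # True  (wrong)
print(x % 5 == 0)              # False (correct)

# And if you really need to test whether a float is an exact integer:
print(not 3.0 % 1.0)           # True
print(not 3.5 % 1.0)           # False
```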
I've made a custom concurrent.futures.Executor mixing the
ProcessPoolExecutor and ThreadPoolExecutor.
I've published it here:
This executor is very similar to a ProcessPoolExecutor, but each process in
the pool has its own ThreadPoolExecutor inside.
The motivation for this executor is to mitigate a problem we have in a
project where we have a very large number of long-running IO-bound tasks
that have to run concurrently. Those long-running tasks have sparse CPU
usage.
To resolve this problem I considered multiple solutions:
1. Use asyncio to run the IO part as tasks and use a ProcessPoolExecutor
to run the CPU-bound operations with "run_in_executor". Unfortunately the
CPU operations depend on a large memory context, and using a
ProcessPoolExecutor this way forces the parent process to pickle all of the
context to send it to the task; because the context is so large, this
operation is itself very CPU demanding. So it doesn't work.
2. Executing the IO/CPU-bound operations in different processes with
multiprocessing.Process. This actually works, but the number of idle
processes in the system is too large, resulting in a bad memory footprint.
3. Executing the IO/CPU-bound operations in threads. This doesn't work
because the sum of all CPU operations saturates the core where the Python
process is running while the other cores are wasted doing nothing.
So I coded the ThreadedProcessPoolExecutor, which helped me keep the number
of processes under control (I just have one process per CPU core) while
allowing very high concurrency (hundreds of threads per process).
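The core idea can be sketched in a few lines (a simplified toy, not the published implementation: it uses a plain task queue instead of the full Executor API, and it assumes a POSIX fork start method):

```python
import multiprocessing as mp
import time
from concurrent.futures import ThreadPoolExecutor

def _run_one(fn, arg, result_q):
    # Executed by a thread inside a worker process.
    result_q.put(fn(arg))

def _worker(task_q, result_q, threads_per_process):
    # Each process owns a private ThreadPoolExecutor, so one process per
    # core can still service many concurrent IO-bound tasks.
    with ThreadPoolExecutor(max_workers=threads_per_process) as pool:
        while True:
            item = task_q.get()
            if item is None:          # shutdown sentinel
                break
            fn, arg = item
            pool.submit(_run_one, fn, arg, result_q)

def _io_task(x):
    time.sleep(0.01)                  # stand-in for a long IO wait
    return x * x

def run_demo(n_procs=2, threads_per_process=4, n_tasks=8):
    ctx = mp.get_context("fork")      # fork keeps this demo import-safe
    task_q, result_q = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=_worker,
                         args=(task_q, result_q, threads_per_process))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for x in range(n_tasks):
        task_q.put((_io_task, x))
    for _ in procs:
        task_q.put(None)              # one sentinel per process
    results = sorted(result_q.get() for _ in range(n_tasks))
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_demo())
```

With this shape, CPU spurts in different processes land on different cores, while the per-process thread pools absorb the long IO waits.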
I have a couple of questions:
The first one is about the license. Given that I copied the majority of the
code from the concurrent.futures library, I understand that I have to
publish the code under the PSF LICENSE. Is this correct?
My second question is about the package namespace. Given that this is a
concurrent.futures.Executor subclass, I understand that the more intuitive
place
to locate it is under concurrent.futures. Is this a suitable use case for
namespace packages? Is this a good idea?
There is the NEXT_BLOCK() macro in compile.c. It creates a new block,
creates an implicit jump from the current block to the new block, and
sets it as the current block.
But why is it used? Everything seems to work if NEXT_BLOCK() is removed. If
there is a need for NEXT_BLOCK() (if it reduces the computational complexity
of compilation or allows some optimizations), it should be documented, and
we should analyze the code and add missing NEXT_BLOCK() calls where they are
needed, and perhaps add new tests. Otherwise it can be removed.