There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable
attribute}.
concreteness | implicit first arg | type | name | comments
{unadorned} | {unadorned} | method | def foo(): | exists now
{unadorned} | {unadorned} | property | @property | exists now
{unadorned} | {unadorned} | non-callable attribute | x = 2 | exists now
{unadorned} | static | method | @staticmethod | exists now
{unadorned} | static | property | @staticproperty | proposing
{unadorned} | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned} | class | method | @classmethod | exists now
{unadorned} | class | property | @classproperty or @classmethod;@property | proposing
{unadorned} | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned} | method | @abc.abstractmethod | exists now
abc.abstract | {unadorned} | property | @abc.abstractproperty | exists now
abc.abstract | {unadorned} | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static | method | @abc.abstractstaticmethod | exists now
abc.abstract | static | property | @abc.abstractstaticproperty | proposing
abc.abstract | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class | method | @abc.abstractclassmethod | exists now
abc.abstract | class | property | @abc.abstractclassproperty | proposing
abc.abstract | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property, only without an implicit first
argument. Allows the property to be accessed directly on the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be accessed directly on the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable attribute that must be
overridden in subclasses.
@abc.abstractstaticproperty - like @abc.abstractproperty, only for
@staticproperty.
@abc.abstractclassproperty - like @abc.abstractproperty, only for
@classproperty.
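
For what it's worth, here is a minimal sketch of how @classproperty could
behave, written as a plain descriptor. The name "classproperty" is my own
illustration (not an existing stdlib decorator), and it only covers the
read-only case:

    class classproperty:
        """Read-only property whose getter receives the class, not an instance."""
        def __init__(self, fget):
            self.fget = fget

        def __get__(self, obj, objtype=None):
            # The class is passed as the implicit first argument, so the
            # property works on the class itself; no throw-away instance needed.
            return self.fget(objtype)

    class Foo:
        @classproperty
        def bar(cls):
            return "bar of " + cls.__name__

    print(Foo.bar)    # -> "bar of Foo", accessed straight off the class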
--rich
At the moment, the array module of the standard library allows to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterable (as now) as the second argument, but if you pass a
single integer value, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
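
As an aside, here is a sketch of two workarounds that already avoid the
chunked extend (assuming Python 3; these are just existing tricks, not the
proposed constructor change):

    import array

    n = 10**7   # illustrative; the real use case is in the billions

    # Repeating a one-element array works for any fill value:
    filled = array.array('l', [0]) * n

    # For a zero fill, building from a zero-filled bytes object also works:
    itemsize = array.array('l').itemsize
    zeros = array.array('l', bytes(n * itemsize))

    assert len(filled) == n and len(zeros) == n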
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
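
For anyone who hasn't been bitten by it, a minimal illustration of the
failure mode and of the constant-folded alternative:

    def foo(x, y):
        return x + y

    foo('a', 'b')      # intended: two arguments
    try:
        foo('a' 'b')   # missing comma: a single argument 'ab'
    except TypeError as e:
        print(e)       # mysterious argument-count error at the call site

    s = 'a' + 'b'      # explicit '+' between literals is folded at compile time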
--
--Guido van Rossum (python.org/~guido)
Hello all,
This is a writeup of a proposal I floated here:
https://mail.python.org/pipermail/python-list/2015-August/694905.html
last Sunday. If the response is positive I wish to write a PEP.
Briefly, users naturally expect that the command:

    python -m module_name ...

used to invoke a module in "main program" mode on the command line, imports the
module as "module_name". It does not; it imports it as "__main__". A subsequent
"import module_name" within the program then makes a new instance of the module,
which causes cognitive dissonance and has the side effect that the program now
has two instances of the module.
What I propose is that the above command line _should_ bind
sys.modules['module_name'] as well as sys.modules['__main__'], as it does currently.
I'm proposing that the python -m option have this effect (python pseudocode):
    % python -m module.name ...

runs:

    # pseudocode, with values hardwired for clarity
    import sys
    M = new_empty_module(name='__main__', qualname='module.name')
    sys.modules['__main__'] = M
    sys.modules['module.name'] = M
    # load the module code from wherever (not necessarily a file - CPython
    # already must do this phase)
    M.execfile('/path/to/module/name.py')
Specifically, this would make the following two changes to current practice:
1) the module is imported _once_, and bound to both its canonical name and
also to __main__.
2) imported modules acquire a new attribute __qualname__ (analogous to the
recent __qualname__ on functions). This is always the canonical name of the
module as resolved by the importer. For most modules __name__ will be the same
as __qualname__, but for the "main" module __name__ will be '__main__'.
This change has the following advantages:
The current standard boilerplate:
    if __name__ == '__main__':
        ... invoke "main program" here ...
continues to work unchanged.
Importantly, if the program then issues "import module_name", it is already
there and the existing instance is found and used.
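
To make the difference concrete, here is a small sketch (the module name
"mymod" is made up) of what happens today versus under the proposal:

    # mymod.py -- run as: python -m mymod
    import sys

    if __name__ == '__main__':
        import mymod                       # today: loads a *second* copy of this file
        same = sys.modules['__main__'] is sys.modules['mymod']
        print(same)                        # today: False (two instances)
                                           # proposed: True (one instance, two names)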
The thread referenced above outlines my most recent encounter with this and the
trouble it caused me. Followup messages include some support for this proposed
change, and some criticism.
The critiquing article included some workarounds for this multiple-module
situation, but they were (1) somewhat dependent on modules coming from a file
pathname and (2) cumbersome, requiring every end user affected by the situation
to adopt these changes. I'd like to avoid that.
Cheers,
Cameron Simpson <cs(a)zip.com.au>
The reasonable man adapts himself to the world; the unreasonable one persists
in trying to adapt the world to himself. Therefore all progress depends
on the unreasonable man. - George Bernard Shaw
tl;dr Let's exploit multiple cores by fixing up subinterpreters,
exposing them in Python, and adding a mechanism to safely share
objects between them.
This proposal is meant to be a shot over the bow, so to speak. I plan
on putting together a more complete PEP some time in the future, with
content that is more refined along with references to the appropriate
online resources.
Feedback appreciated! Offers to help even more so! :)
-eric
--------
Python's multi-core story is murky at best. Not only can we be more
clear on the matter, we can improve Python's support. The result of
any effort must make multi-core (i.e. parallelism) support in Python
obvious, unmistakable, and undeniable (and keep it Pythonic).
Currently we have several concurrency models represented via
threading, multiprocessing, asyncio, concurrent.futures (plus others
in the cheeseshop). However, in CPython the GIL means that we don't
have parallelism, except through multiprocessing which requires
trade-offs. (See Dave Beazley's talk at PyCon US 2015.)
This is a situation I'd like us to solve once and for all for a couple
of reasons. Firstly, it is a technical roadblock for some Python
developers, though I don't see that as a huge factor. Secondly, and more
importantly, it is a turnoff to folks looking into Python and
ultimately a PR issue. The solution boils down to natively supporting
multiple cores in Python code.
This is not a new topic. For a long time many have clamored for death
to the GIL. Several attempts have been made over the years and failed
to do it without sacrificing single-threaded performance.
Furthermore, removing the GIL is perhaps an obvious solution but not
the only one. Others include Trent Nelson's PyParallels, STM, and
other Python implementations.
Proposal
=======
In some personal correspondence, Nick Coghlan summarized my
preferred approach as "the data storage separation of multiprocessing,
with the low message passing overhead of threading".
For Python 3.6:
* expose subinterpreters to Python in a new stdlib module: "subinterpreters"
* add a new SubinterpreterExecutor to concurrent.futures
* add a queue.Queue-like type that will be used to explicitly share
objects between subinterpreters
This is less simple than it might sound, but presents what I consider
the best option for getting a meaningful improvement into Python 3.6.
Also, I'm not convinced that the word "subinterpreter" properly
conveys the intent, for which subinterpreters is only part of the
picture. So I'm open to a better name.
Influences
========
Note that I'm drawing quite a bit of inspiration from elsewhere. The
idea of using subinterpreters to get this (more) efficient isolated
execution is not my own (I heard it from Nick). I have also spent
quite a bit of time and effort researching for this proposal. As part
of that, a number of people have provided invaluable insight and
encouragement as I've prepared, including Guido, Nick, Brett Cannon,
Barry Warsaw, and Larry Hastings.
Additionally, Hoare's "Communicating Sequential Processes" (CSP) has
been a big influence on this proposal. FYI, CSP is also the
inspiration for Go's concurrency model (e.g. goroutines, channels,
select). Dr. Sarah Mount, who has expertise in this area, has been
kind enough to agree to collaborate and even co-author the PEP that I
hope comes out of this proposal.
My interest in this improvement has been building for several years.
Recent events, including this year's language summit, have driven me
to push for something concrete in Python 3.6.
The subinterpreters Module
=====================
The subinterpreters module would look something like this (a la
threading/multiprocessing):
    settrace()
    setprofile()
    stack_size()
    active_count()
    enumerate()
    get_ident()
    current_subinterpreter()

    Subinterpreter(...)
        id
        is_alive()
        running() -> Task or None
        run(...) -> Task  # wrapper around PyRun_*, auto-calls Task.start()
        destroy()

    Task(...)  # analogous to a CSP process
        id
        exception()
        # other stuff?

        # for compatibility with threading.Thread:
        name
        ident
        is_alive()
        start()
        run()
        join()

    Channel(...)  # shared by passing as an arg to the subinterpreter-running func
        # this API is a bit uncooked still...
        pop()
        push()
        poison()  # maybe
        select()  # maybe
Note that Channel objects will necessarily be shared between
subinterpreters (where bound). This sharing will happen when one
or more of the parameters to the function passed to Task() is a
Channel. Thus the channel would be open to the (sub)interpreter
calling Task() (or Subinterpreter.run()) and to the new
subinterpreter. Also, other channels could be fed into such a shared
channel, whereby those channels would then likewise be shared between
the interpreters.
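
To give a feel for it, here is a purely hypothetical usage sketch. Every name
in it (subinterpreters, Subinterpreter, Channel, run) comes from this proposal
rather than anything that exists in CPython today, and I'm assuming run()
takes a callable plus the arguments to pass to it:

    import subinterpreters   # proposed module, does not exist yet

    def worker(ch):
        # Runs inside the new subinterpreter; the channel was passed as an
        # argument, so it is shared with the interpreter that created the Task.
        request = ch.pop()
        ch.push(request * 2)

    sub = subinterpreters.Subinterpreter()
    ch = subinterpreters.Channel()
    task = sub.run(worker, ch)   # returns a Task, auto-started
    ch.push(21)
    task.join()
    sub.destroy()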
I don't know yet if this module should include *all* the essential
pieces to implement a complete CSP library. Given the inspiration
that CSP is providing, it may make sense to support it fully. It
would be interesting then if the implementation here allowed the
(complete?) formalisms provided by CSP (thus, e.g. rigorous proofs of
concurrent system models).
I expect there will also be a _subinterpreters module with low-level
implementation-specific details.
Related Ideas and Details Under Consideration
====================================
Some of these are details that need to be sorted out. Some are
secondary ideas that may be appropriate to address in this proposal or
may need to be tabled. I have some others but these should be
sufficient to demonstrate the range of points to consider.
* further coalesce the (concurrency/parallelism) abstractions between
threading, multiprocessing, asyncio, and this proposal
* only allow one running Task at a time per subinterpreter
* disallow threading within subinterpreters (with legacy support in C)
+ ignore/remove the GIL within subinterpreters (since they would be
single-threaded)
* use the GIL only in the main interpreter and for interaction between
subinterpreters (and a "Local Interpreter Lock" for within a
subinterpreter)
* disallow forking within subinterpreters
* only allow passing plain functions to Task() and
Subinterpreter.run() (exclude closures, other callables)
* object ownership model
+ read-only in all but 1 subinterpreter
+ RW in all subinterpreters
+ only allow 1 subinterpreter to have any refcounts to an object
(except for channels)
* only allow immutable objects to be shared between subinterpreters
* for better immutability, move object ref counts into a separate table
* freeze (new machinery or memcopy or something) objects to make them
(at least temporarily) immutable
* expose a more complete CSP implementation in the stdlib (or make the
subinterpreters module more compliant)
* treat the main interpreter differently than subinterpreters (or
treat it exactly the same)
* add subinterpreter support to asyncio (the interplay between them
could be interesting)
Key Dependencies
================
There are a few related tasks/projects that will likely need to be
resolved before subinterpreters in CPython can be used in the proposed
manner. The proposal could be implemented either way, but it will help
the multi-core effort if these are addressed first.
* fixes to subinterpreter support (there are a couple individuals who
should be able to provide the necessary insight)
* PEP 432 (will simplify several key implementation details)
* improvements to isolation between subinterpreters (file descriptors,
env vars, others)
Beyond those, the scale and technical scope of this project means that
I am unlikely to be able to do all the work myself to land this in
Python 3.6 (though I'd still give it my best shot). That will require
the involvement of various experts. I expect that the project is
divisible into multiple mostly independent pieces, so that will help.
Python Implementations
===================
They can correct me if I'm wrong, but from what I understand both
Jython and IronPython already have subinterpreter support. I'll be
soliciting feedback from the different Python implementors about
subinterpreter support.
C Extension Modules
=================
Subinterpreters already isolate extension modules (and built-in
modules, including sys). PEP 384 provides some help too. However,
global state in C can easily leak data between subinterpreters,
breaking the desired data isolation. This is something that will need
to be addressed as part of the effort.
When defining a place for config files, cache files, and so on, people
usually hack around in an OS-dependent, misinformed, and therefore wrong way.
Thanks to the tempfile API we at least don’t see people hardcoding /tmp/
too much.
There is a beautiful little module that does things right and is easy to
use: appdirs <https://pypi.python.org/pypi/appdirs>
I think this is a *really* good candidate for the stdlib, since this
functionality is useful for everything that needs a cache or config (so not
only GUI and CLI applications, but also scripts that download and cache
stuff from the internet for faster re-running).
We should probably build the API around pathlib, since I find myself not
touching os.path with a barge pole now that pathlib exists.
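
For reference, here is roughly what using appdirs looks like today, with
pathlib layered on top (just a sketch; "myapp" and "myorg" are placeholder
names):

    from pathlib import Path
    import appdirs   # pip install appdirs

    cache_dir = Path(appdirs.user_cache_dir("myapp", "myorg"))
    config_dir = Path(appdirs.user_config_dir("myapp", "myorg"))

    cache_dir.mkdir(parents=True, exist_ok=True)
    settings = config_dir / "settings.ini"   # the platform-correct place for a config file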
I'll write a PEP about this soon :)
Now that f-strings are in the 3.6 branch, I'd like to turn my attention
to binary f-strings (fb'' or bf'').
The idea is that:
>>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
Might be translated as:
>>> (b'datestamp:' +
... bytes(format(datetime.datetime.now(),
... str(b'%Y%m%d', 'ascii')),
... 'ascii') +
... b'\r\n')
Which would result in:
b'datestamp:20150927\r\n'
The only real question is: what encoding to use for the second parameter
to bytes()? Since an object must return unicode from __format__(), I
need to convert that to bytes in order to join everything together. But how?
Here I suggest 'ascii'. Unfortunately, this would give an error if
__format__ returned anything with a char greater than 127. I think we've
learned that an API that only raises an exception with certain specific
inputs is fragile.
Guido has suggested using 'utf-8' as the encoding. That has some appeal,
but if we're designing this for wire protocols, not all protocols will
be using utf-8.
Another idea would be to extend the "conversion char" from just 's',
'r', or 'a', which don't make much sense for bytes, to instead be a
string that specifies the encoding. The default could be ascii, and if
you want to specify something else:
bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n'
That would work for any encoding that doesn't have ':', '{', or '}' in
the encoding name. Which seems like a reasonable restriction.
And I might be over-generalizing here, but you'd presumably want to make
the encoding a non-constant:
bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n'
I think my initial proposal will be to use 'ascii', and not support any
conversion characters at all for fb-strings, not even 's', 'r', and 'a'.
In the future, if we want to support encodings other than 'ascii', we
could then add !conversions mapping to encodings.
My reasoning for using 'ascii' is that 'utf-8' could easily be an error
for non-utf-8 protocols. And by using 'ascii', at least we'd give a
runtime error and not put possibly bogus data into the resulting binary
string. Granted, the tradeoff is that we now have a case where whether
or not the code raises an exception is dependent upon the values being
formatted. If 'ascii' is the default, we could later switch to 'utf-8',
but we couldn't go the other way.
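
A small illustration of that trade-off in plain Python (not the proposed
fb'' syntax):

    import datetime

    stamp = format(datetime.datetime.now(), '%Y%m%d')
    line = b'datestamp:' + stamp.encode('ascii') + b'\r\n'   # all ASCII, fine either way

    name = 'Łukasz'
    try:
        b'user:' + name.encode('ascii') + b'\r\n'
    except UnicodeEncodeError:
        pass   # 'ascii' fails loudly here; 'utf-8' would silently produce bytes
               # that a non-UTF-8 wire protocol might mangle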
The only place this is likely to be a problem is when formatting unicode
string values. No other built-in type is going to have a non-ascii
compatible character in its __format__, unless you do tricky things with
datetime format_specs. Of course user-defined types can return any
unicode chars from __format__.
Once we make a decision, I can apply the same logic to b''.format(), if
that's desirable.
I'm open to suggestions on this.
Thanks for reading.
--
Eric.
Following on to the discussions about changing the default random number
generator, I would like to propose an alternative: adding a secrets
module to the standard library.
Attached is a draft PEP. Feedback is requested.
(I'm going to only be intermittently at the keyboard for the next day or
so, so my responses may be rather slow.)
--
Steve
I've created a prototype for how we could add foreign language names to the
turtle.py module and erase the language barrier for non-English schoolkids.
The Tortuga module has the same functionality as the standard turtle module.
You can test it out by running "pip install tortuga":
https://pypi.python.org/pypi/Tortuga
Since Python 2 doesn't have simpledialog, I used the PyMsgBox pure-python
module for the input boxes. This code is small enough that it could be
added into turtle.py. (It's just used for simple tkinter dialog boxes.)
Check out the diff between Tortuga and turtle.py here:
https://www.diffchecker.com/2xmbrkhk
This file can be easily adapted to support multiple programming languages.
Thoughts? Suggestions?
-Al
TL;DR:
+1 for the idea
-1 on the propagating member-access or index operators
+1 on spelling it "or?"
C# has had null-coalescing since about 2005, and it's one feature I miss in
every other language that I use. I view null/None as a necessary evil, so
getting rid of them as soon as possible is a good thing in my book. Nearly
every bit of Python I've ever written would have benefitted from it, if
just to get rid of the "x if x is not None else []" mess.
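
For a concrete sense of what the basic operator buys, here is a sketch
(spelling it "or?" per Alessio's suggestion; that's hypothetical syntax,
not valid Python today):

    def fetch(items=None, timeout=None):
        # today's idiom -- note that "items or []" is NOT equivalent,
        # since it would also replace a perfectly valid empty list, 0, '', etc.
        items = items if items is not None else []
        timeout = timeout if timeout is not None else 30.0

        # with None-coalescing this might read:
        #     items = items or? []
        #     timeout = timeout or? 30.0
        return items, timeout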
That said, I think the other (propagating) operators are a mistake, and I
think they were a mistake in C# as well. I'm not sure I've ever had a situation
where I wished they existed, in any language. Better to get rid of the
Nones as soon as possible than bring them along. It's worth reading the C#
design team's notes and subsequent discussion on the associativity of "?."
[1] since it goes around and around with no really good answer and no
particularly intuitive behaviour.
Rather than worry about that, I'd prefer to see just the basic
None-coalescing added. I like Alessio's suggestion of "or?" (which seems
like it should be read in a calm but threatening tone, a la Liam Neeson).
It just seems more Pythonic; ?? is fine in C# but seems punctuation-heavy
for Python. It does mean the ?= and ?. and ?[] are probably out, and I'm OK
with that.
- Jeff
[1] https://roslyn.codeplex.com/discussions/543895