At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array generation in such a way that you
could pass an iterable (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend(a[:r])
    return x
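For what it's worth, a zero-filled array can already be preallocated in one step by building it from a zero-filled bytes object; this is a sketch (the helper name is mine), which I would expect to be much faster than chunked extension, though I have not benchmarked it against the function above:

```python
import array

def zero_filled_array(typecode, n):
    # bytes(k) is a zero-filled buffer; the array constructor
    # reinterprets it item-wise according to the typecode.
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(n * itemsize))

a = zero_filled_array("q", 1000)   # 1000 zeroed 8-byte signed ints
```

This only covers the value=0 case, but that is the common one for index structures such as suffix arrays.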
I normally wouldn't bring something like this up here, except I think
that there is possibility of something to be done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.
I've been having an argument with a colleague over the last couple of
days over the proper order of statements when setting up a
try/finally to perform cleanup of some action. On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place. The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.
My colleague's contention is that given
lock = threading.Lock()
this is simply *wrong*:

    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

whereas this is okay:

    with lock:
        do_something()
Ignoring other details of how threading.Lock is actually implemented,
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release() then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
suggest otherwise.
However, there *is* a difference, and has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):
If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called. You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised. Standard stuff.
However, if Lock.__enter__ is a PyCFunction things are quite
different. If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg.  If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised. However if it returns
non-NULL the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode. At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite; the finally block, and hence
Lock.__exit__, is therefore entered.
Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then

    with lock:
        do_something()

always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with

    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()
there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY). So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).
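The window described above can be made visible by disassembling the two forms (opcode names vary across CPython versions, so this is only illustrative of where the gap between the acquire call and the block setup sits):

```python
import dis

def with_form(lock):
    # __enter__ is invoked as part of setting up the with block itself
    with lock:
        pass

def try_form(lock):
    lock.acquire()      # an ordinary call: a signal handled between this
    try:                # call returning and the try block being set up
        pass            # escapes the finally clause entirely
    finally:
        lock.release()

dis.dis(with_form)
dis.dis(try_form)
```

In the try_form disassembly the acquire() call and the block-setup opcode are distinct instructions, which is exactly the window in question.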
At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear). However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to a PyCFunction from within the
ceval loop.
Sorry for the tl;dr; any thoughts?
For technical reasons, many functions of the Python standard libraries
implemented in C have positional-only parameters. Example:
Python 3.7.0a0 (default, Feb 25 2017, 04:30:32)
>>> help(str.replace)
replace(self, old, new, count=-1, /)   # <== notice "/" at the end
>>> "a".replace("x", "y")  # ok
'a'
>>> "a".replace(old="x", new="y")  # ERR!
TypeError: replace() takes at least 2 arguments (0 given)
When converting the methods of the builtin str type to the internal
"Argument Clinic" tool (tool to generate the function signature,
function docstring and the code to parse arguments in C), I asked if
we should add support for keyword arguments in str.replace(). The
answer was quick: no! It's a deliberate design choice.
Quote of Yury Selivanov's message:
I think Guido explicitly stated that he doesn't like the idea to
always allow keyword arguments for all methods. I.e. `str.find('aaa')`
just reads better than `str.find(needle='aaa')`. Essentially, the idea
is that for most of the builtins that accept one or two arguments,
positional-only parameters are better.
I just noticed a module on PyPI to implement this behaviour on Python functions:
My question is: would it make sense to implement this feature in
Python directly? If yes, what should be the syntax? Use "/" marker?
Use the @positional() decorator?
Do you see concrete cases where it's a deliberate choice to deny
passing arguments as keywords?
Don't you like writing int(x="123") instead of int("123")? :-) (I know
that Serhiy Storchaka hates the name of the "x" parameter of the int
constructor.)
By the way, I read that the "/" marker is unknown to almost all Python
developers, and that the [...] syntax should be preferred, but
inspect.signature() doesn't support that syntax. Maybe we should fix
signature() and use the [...] format instead?
Replace "replace(self, old, new, count=-1, /)" with "replace(self,
old, new[, count=-1])" (or maybe even not document the default
value?)
Python 3.5 help (docstring) uses "S.replace(old, new[, count])".
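(For reference, the "/" marker later became real syntax in Python 3.8 via PEP 570, so pure-Python functions can declare positional-only parameters too. A minimal sketch, with a hypothetical wrapper around str.replace:)

```python
def replace(s, old, new, count=-1, /):
    # every parameter before "/" is positional-only (Python 3.8+)
    return s.replace(old, new, count)

replace("aaa", "a", "b")               # fine: positional call
# replace("aaa", old="a", new="b")     # TypeError: positional-only args
```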
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C, but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
there in combination with the preprocessor.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
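The pitfall and the explicit alternative in a nutshell (the tuple is hypothetical, just to show how a missing comma silently changes the element count):

```python
args = ('a' 'b',)      # implicit concatenation: ONE element, 'ab' --
                       # almost certainly a typo for ('a', 'b')
explicit = 'a' + 'b'   # explicit concatenation; CPython folds this
                       # constant expression at compile time
assert args == ('ab',)
assert explicit == 'ab'
```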
--Guido van Rossum (python.org/~guido)
The `assert' statement was created the same as in previous languages like
C/C++: a check to perform only in debug mode, when you can't yet trust your
code to manage and pass around internal data correctly. Examples are
array bounds and object state integrity constraints.
Unlike C, Python performs the aforementioned checks all the time, i.e. it's
effectively always in "debug mode". Furthermore, validation checks are
an extremely common use case.
I have been using assert in my in-house code for input validation for a
long time; the only thing preventing me from doing the same in public
code is the fact that it only raises AssertionErrors, while library code
is expected to raise TypeError or ValueError on invalid input.
The final straw that prompted this proposal was a Q&A thread
where none of the _seven_ answer authors (besides me) managed to produce a
_single realistic use case_ for `assert' as a debug-only check.
So, I'm hereby proposing to:
* make `assert' remain active in optimized (-O) mode
* extend its syntax so it can raise other types of exceptions
E.g.: "assert condition, type, value" or "assert condition, type,
exception_constructor_argument(s)" to maintain backward compatibility.
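Pending any language change, the proposed behaviour can be approximated today with a small validation helper (the name `check` is mine, purely illustrative); unlike `assert`, it survives -O and raises a configurable exception type:

```python
def check(condition, exc_type=ValueError, *args):
    # validation that is never stripped by -O, unlike assert
    if not condition:
        raise exc_type(*args)

def set_ratio(x):
    check(isinstance(x, (int, float)), TypeError, "ratio must be a number")
    check(0 <= x <= 1, ValueError, "ratio must be in [0, 1]")
    return x
```

The proposal would essentially make `assert condition, type, args` spell the same thing natively.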
I’m trying to get up to speed again w.r.t. PEP 447 (and, to be honest, with CPython’s development workflow, as it’s been too long since I’ve actually done work on the CPython codebase…).
Let me start by recapping what the PEP (<https://www.python.org/dev/peps/pep-0447/>) is about: the problem I’m trying to solve is that super.__getattribute__ basically assumes that looking at the __dict__ of the classes on the MRO is all that’s needed to check whether those classes provide an attribute. The core of both object.__getattribute__ and super.__getattribute__ for finding a descriptor on the class is basically (from the PEP):
def _PyType_Lookup(tp, name):
    mro = tp.mro()
    assert isinstance(mro, tuple)
    for base in mro:
        assert isinstance(base, type)
        try:
            return base.__dict__[name]
        except KeyError:
            pass
    return None
Note that the real implementation is in C, and accesses base.__dict__ directly instead of through Python’s attribute lookup (which would lead to infinite recursion).
This is problematic when the class is dynamically populated, as the descriptor may not yet be present in the class __dict__. This is not a problem for normal attribute lookup, as you can replace the __getattribute__ method of the class to do the additional work (at the cost of having to reimplement all of __getattribute__). This is a problem for super() calls though, as super will unconditionally use the lookup code above.
The PEP proposed to replace “base.__dict__[name]” by “base.__class__.__getdescriptor__(name)” (once again, in C with direct access to the two slots instead of accessing them through normal attribute lookup).
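A pure-Python sketch of the proposed lookup (hypothetical: __getdescriptor__ is not an existing slot, so the PEP's proposed default behaviour is emulated here on a custom metaclass):

```python
class Meta(type):
    def __getdescriptor__(cls, name):
        # default behaviour proposed by the PEP: plain __dict__ lookup,
        # which a dynamically-populated class could override
        try:
            return cls.__dict__[name]
        except KeyError:
            raise AttributeError(name)

def lookup(tp, name):
    # base.__dict__[name] replaced by base.__class__.__getdescriptor__(name)
    for base in tp.mro():
        try:
            return type(base).__getdescriptor__(base, name)
        except AttributeError:
            continue
    return None

class A(metaclass=Meta):
    x = 42
```

A class like PyObjC's could override __getdescriptor__ to populate the descriptor on demand before the lookup fails.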
I have two open questions about this PEP:
1) Last time around Mark Shannon worried that this introduces infinite recursion in the language itself (in my crummy summary; please read this message to get the real concern: <https://mail.python.org/pipermail/python-dev/2015-July/140938.html>). Is this truly a problem? I don’t think there is a problem, but I’m worried that I don’t fully understand Mark’s concerns.
2) PEP 487 introduced __init_subclass__ as a class method to avoid having to write a metaclass for a number of use cases. My PEP currently does require a metaclass, but it might be nicer to switch to a regular class method instead (like __init_subclass__).
My primary use case for this PEP is PyObjC, which populates the class dictionary on demand for performance and correctness reasons, and therefore currently includes a replacement for super. For PyObjC’s implementation, making __getdescriptor__ a classmethod would be a win, as it would avoid creating yet another level of metaclasses in C code (PyObjC already has a Python class and metaclass for every Objective-C class; PEP 447 would require the introduction of a metaclass for those metaclasses).
BTW. Two other open issues w.r.t. this PEP are forward porting the patch (in issue 18181) and performing benchmarks. The patch doesn’t apply cleanly to the current tip of the tree; I’ll try to update it tomorrow, and with some luck will manage to update PyObjC’s implementation as well to validate the design.
On Tue, Nov 28, 2017 at 12:31:06PM -0800, Raymond Hettinger wrote:
> > I also cc python-dev to see if anybody here is strongly in favor or against this inclusion.
> Put me down for a strong -1. The proposal would occasionally save a
> few keystrokes but comes at the expense of giving Python a more Perlish
> look and a more arcane feel.
I think that's an unfair characterisation of the benefits of the PEP.
It's not just "a few keystrokes".
Ironically, the equivalent in Perl is // which Python has used for
truncating division since version 2.2. So if we're in danger of
looking "Perlish", that ship sailed a long time ago.
Perl is hardly the only language with null-coalescing operators -- we
might better describe ?? as being familiar to C#, PHP, Swift and Dart.
That's two mature, well-known languages and two up-and-coming languages.
> timeout ?? local_timeout ?? global_timeout
As opposed to the status quo:
timeout if timeout is not None else (local_timeout if local_timeout is not None else global_timeout)
Or shorter, but even harder to understand:
(global_timeout if local_timeout is None else local_timeout) if timeout is None else timeout
I'd much prefer to teach the version with ?? -- it has a simple
explanation: "the first of the three given values which isn't None". The
?? itself needs to be memorized, but that's no different from any other
operator. The first time I saw ** I was perplexed and couldn't imagine
what it meant.
Here ?? doesn't merely save a few keystrokes, it significantly reduces
the length and complexity of the expression and entirely cuts out the
duplication of names.
If you can teach
timeout or local_timeout or global_timeout
then you ought to be able to teach ??, as it is simpler: it only
compares to None, and avoids needing to explain or justify Python's
truthiness rules.
> 'foo' in (None ?? ['foo', 'bar'])
If you can understand
'foo' in (False or ['foo', 'bar'])
then surely you can understand the version with ??.
> requested_quantity ?? default_quantity * price
(default_quantity if requested_quantity is None else requested_quantity) * price
I'd much prefer to read, write and teach the version with ?? over the
status quo.
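The proposed semantics, minus the operator's short-circuiting, can be sketched as a helper function (a plain function necessarily evaluates all of its arguments, which the real ?? operator would not):

```python
def coalesce(*values):
    # return the first value that is not None, cf. the proposed ??
    for v in values:
        if v is not None:
            return v
    return None

coalesce(None, None, 30)   # -> 30
coalesce(0, 10)            # -> 0: unlike `or`, falsy non-None values win
```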
Currently during assignment, when the target list is a comma-separated list
of targets (*without a "starred" target*), the rule is that the object (rhs)
must be an iterable with the same number of items as there are targets in
the target list. That is, the iterable is consumed without knowing the
number of targets in advance, and if the counts do not match, a ValueError
is raised after the fact. To show this on a simple example:
>>> from itertools import count, islice
>>> it = count()
>>> x, y = it
ValueError: too many values to unpack (expected 2)
Here the count was advanced (three items were consumed: two for the targets
plus one to check for exhaustion) but the assignment did not happen. I
found that in some cases the requirement that the rhs have the same number
of items as there are targets is too restrictive. It is proposed that if
the rhs is a generator or an iterator (more precisely, an object that
yields values on demand), the assignment should be lazy and dependent on
the number of targets. I find this feature very convenient for interactive
use, while it remains readable, expected, and expressed in more compact code.
There are some Pros:
1. No overhead
2. Readable and not so verbose code
3. Optimized case for x,y,*z = iterator
4. Clear way to assign values partially from infinite generators.
And some Cons:
1. A special case of how assignment works
2. As with any implicit behavior, hard-to-find bugs
There are several cases with "undefined" behavior:
1. Because the items are assigned, from left to right to the corresponding
targets, should rhs see side effects during assignment or not?
2. Should this work only for generators or for any iterators?
3. Is it Pythonic to distinguish what is on the rhs during assignment, or
does it contradict duck typing (goose typing)?
In many cases it is possible to do this right now, but in a more verbose way:
>>> x, y = islice(gen(), 2)
But it seems to me that:
>>> x, y = gen()
looks the same, is easy to follow and is not so verbose. Another case,
which is a pitfall, but would become possible:
>>> x, y = iter([0,1,2,3,4]) # Now it is Ok
>>> x, y = [0,1,2,3,4] # raises ValueError
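The islice workaround, spelled out as runnable code:

```python
from itertools import count, islice

it = count()
x, y = islice(it, 2)    # consumes exactly two items from it
assert (x, y) == (0, 1)
assert next(it) == 2    # the underlying iterator remains usable
```

This is exactly the behaviour the proposal would give to plain "x, y = it" when the rhs is an iterator.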
With kind regards, -gdg
In 3.7 I have removed the long-deprecated plistlib.Dict.  It was actually
already deprecated when the plistlib module was added to the regular
stdlib in Python 2.6.
This is a dict subclass which allows accessing items as attributes.
d = plistlib.Dict()
d['a'] = 1
assert d.a == 1
d.b = 2
assert d['b'] == 2
Raymond noticed that this capability seemed nice to have.
What do you think about reviving this type as general purpose type in
collections or types? Perhaps it can be convenient for working with
JSON, plists, configuration files, databases and in other cases that
need a dict with string keys.
If we reintroduce it, there are open questions.
1. The name of the type.
2. The location of the type. collections or types? Or other variants?
3. How it will collaborate with OrderedDict, defaultdict, etc?
4. Should it be a dict subclass, or a mixin, or a proxy? Or should several
variants be added?
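A minimal sketch of such a type (the name AttrDict is mine; questions 3 and 4 above are precisely about whether a plain dict subclass like this is the right shape):

```python
class AttrDict(dict):
    def __getattr__(self, name):
        # only invoked when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

d = AttrDict()
d['a'] = 1
assert d.a == 1
d.b = 2
assert d['b'] == 2
```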
On Wed, Nov 29, 2017 at 5:46 AM, Alon Snir <AlonSnir(a)hotmail.com> wrote:
> I would like to mention that the issue of assignment to a target list is also relevant to the case of elementwise assignment to a mutable sequence (e.g. lists and arrays). Here as well, the rhs of the assignment statement determines the number of elements to be assigned. Consequently, the following example will get into an infinite loop:
> >>> from itertools import count
> >>> A = []
> >>> A[:] = count()
> Writing "A[:2] = count()" will cause the same result.
> Here as well, it is currently safer to use islice if the rhs is a generator or an iterator. e.g.:
> >>> it = count()
> >>> A[:] = islice(it,2)
> In my opinion, it would be better to devise a solution that could be applied in both cases. Maybe a new kind of assignment operator dedicated to this kind of assignment, i.e. elementwise assignment with a restriction on the number of elements to be assigned, based on the length of the lhs object (or the number of targets in the target list).
Hmm. The trouble is that slice assignment doesn't have a fixed number of targets. If you say "x, y = spam", there's a clear indication that 'spam' needs to provide exactly two values; but "A[:] = spam" could have any number of values, and it'll expand or shrink the list accordingly.
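What "expand or shrink" means concretely:

```python
A = [1, 2, 3]
A[:] = [10, 20]       # whole-slice assignment may shrink the list...
assert A == [10, 20]
A[1:1] = [99]         # ...or grow it, so there is no fixed target count
assert A == [10, 99, 20]
```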
Rhodri James wrote:
Flatly, no. It is better not to ask for things you don't want in the first place, in this case the infinite sequence.
Still, don't let me discourage you from working on this. If you can define how such an assignment would work, or even the length of A[:] as an assignment target, I'm not going to dismiss it out of hand.
The idea is to define an alternative assignment rule: assign exactly as many elements as the current length of the lhs object (without expanding or shrinking it). Suppose "?=" is the operator for the alternative assignment rule, A = [None]*2, and "iterator" is any iterator (finite or infinite). In this case, the following code:
>>> A ?= iterator
would behave like this:
>>> A[:] = islice(iterator, 2) # where 2 is the length of A
And as suggested earlier for the case of assignment to a target list, the following code:
>>> x, y ?= iterator
would behave like this:
>>> x, y = islice(iterator, 2) # where 2 is the number of targets
Regarding the length issue:
Is there any difficulty in finding the length of a sliced sequence? After all, range objects support len(). Therefore, the length of A[s] (where s is a slice object) could be evaluated as follows:

>>> len(range(*s.indices(len(A))))
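For instance (slice.indices clamps the slice bounds to the sequence's length, so the target count is known without materializing A[s]):

```python
A = [0, 1, 2, 3, 4]
s = slice(None, None, 2)             # i.e. A[::2]
n = len(range(*s.indices(len(A))))   # number of targets A[s] would have
assert n == len(A[s]) == 3
```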