At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array generation in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
Here is my current workaround (which is slow):
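import array

def filled_array(typecode, n, value=0, bsize=(1<<22)):
    """returns a new array with given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0)
    """
    a = array.array(typecode, [value]*bsize)  # one reusable block of bsize items
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)  # append whole blocks while they still fit
        r -= bsize
    x.extend([value]*r)  # append the remaining r < bsize items
    return x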
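For comparison with the chunked workaround above, here is a minimal sketch of two other ways to get a pre-sized array with the existing module. The helper names are hypothetical. The first relies on sequence repetition, which array objects support. The second relies on the constructor accepting a bytes object as initializer. Neither has been timed on 48 GB arrays, and the second temporarily holds a zero-filled bytes object of the same size.

```python
import array


def repeated_array(typecode, n, value=0):
    # Hypothetical helper: build a one-item array and repeat it n times.
    return array.array(typecode, [value]) * n


def zeroed_array(typecode, n):
    # Hypothetical helper: initialize from n * itemsize zero bytes.
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(n * itemsize))


a = repeated_array("l", 1000, value=7)
b = zeroed_array("l", 1000)
```

Both calls return an array of exactly n items; only the first lets you pick a fill value other than zero.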
I normally wouldn't bring something like this up here, except I think
that there is a possibility of something being done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.
I've been having an argument with a colleague over the last couple
days over the proper order of statements when setting up a
try/finally to perform cleanup of some action. On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place. The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.
My colleague's contention is that given
lock = threading.Lock()
this is simply *wrong*:
whereas this is okay:
Ignoring other details of how threading.Lock is actually implemented,
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release() then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
However, there *is* a difference, and has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):
If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called. You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised. Standard stuff.
However, if Lock.__enter__ is a PyCFunction things are quite
different. If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg. If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised. However if it returns
non-NULL the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode. At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite, so the finally block, and hence
Lock.__exit__, is entered.
Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then
always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with
there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY). So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).
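To make the comparison concrete, here is a sketch of the two forms under discussion, assuming a placeholder do_something() that is not part of the original argument. It also lists the opcodes of each form with dis, so the window between the acquire() call and the start of the protected region can be inspected. The opcode names differ between CPython versions, so the sketch makes no claim about which ones appear.

```python
import dis
import threading

lock = threading.Lock()
calls = []


def do_something():
    # Placeholder body (an assumption): record whether the lock is held.
    calls.append(lock.locked())


def explicit_form():
    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()


def with_form():
    with lock:
        do_something()


explicit_form()
with_form()

# Opcode listings for manual inspection; contents are version dependent.
explicit_ops = [ins.opname for ins in dis.get_instructions(explicit_form)]
with_ops = [ins.opname for ins in dis.get_instructions(with_form)]
```

Without signals involved, both forms hold the lock during do_something() and release it afterwards; the difference described above only shows up when a signal arrives at the wrong moment.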
At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear). However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to PyCFunction from within the
Sorry for the tl;dr; any thoughts?
For technical reasons, many functions of the Python standard libraries
implemented in C have positional-only parameters. Example:
Python 3.7.0a0 (default, Feb 25 2017, 04:30:32)
replace(self, old, new, count=-1, /) # <== notice "/" at the end
>>> "a".replace("x", "y") # ok
>>> "a".replace(old="x", new="y") # ERR!
TypeError: replace() takes at least 2 arguments (0 given)
When converting the methods of the builtin str type to the internal
"Argument Clinic" tool (tool to generate the function signature,
function docstring and the code to parse arguments in C), I asked if
we should add support for keyword arguments in str.replace(). The
answer was quick: no! It's a deliberate design choice.
Quote of Yury Selivanov's message:
I think Guido explicitly stated that he doesn't like the idea to
always allow keyword arguments for all methods. I.e. `str.find('aaa')`
just reads better than `str.find(needle='aaa')`. Essentially, the idea
is that for most of the builtins that accept one or two arguments,
positional-only parameters are better.
I just noticed a module on PyPI to implement this behaviour on Python functions:
My question is: would it make sense to implement this feature in
Python directly? If yes, what should be the syntax? Use "/" marker?
Use the @positional() decorator?
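For readers on newer versions: the "/" marker was later accepted for pure Python functions (PEP 570, Python 3.8 and later). A minimal sketch, using a hypothetical replace_like function that only stands in for the C-implemented str.replace:

```python
def replace_like(text, old, new, count=-1, /):
    # Hypothetical stand-in; every parameter before "/" is positional-only.
    return text.replace(old, new, count)


positional_result = replace_like("a-b", "-", "+")

try:
    replace_like("a-b", old="-", new="+")
except TypeError as exc:
    keyword_error = str(exc)
else:
    keyword_error = None
```

The positional call works, while the keyword call is rejected with a TypeError, which mirrors the str.replace behaviour shown earlier.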
Do you see concrete cases where it's a deliberate choice to deny
passing arguments as keywords?
Don't you like writing int(x="123") instead of int("123")? :-) (I know
that Serhiy Storchaka hates the name of the "x" parameter of the int
By the way, I read that "/" marker is unknown by almost all Python
developers, and [...] syntax should be preferred, but
inspect.signature() doesn't support this syntax. Maybe we should fix
signature() and use [...] format instead?
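As a sketch of what inspect.signature() reports for the "/" form, assuming Python 3.8 or later and the same hypothetical replace_like function (not a real library API):

```python
import inspect


def replace_like(text, old, new, count=-1, /):
    # Hypothetical function with positional-only parameters.
    return text.replace(old, new, count)


sig = inspect.signature(replace_like)
rendered = str(sig)
kinds = {name: param.kind.name for name, param in sig.parameters.items()}
```

Here rendered uses the "/" marker rather than the [...] notation, and every entry in kinds is POSITIONAL_ONLY.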
Replace "replace(self, old, new, count=-1, /)" with "replace(self,
old, new[, count=-1])" (or maybe even not document the default
Python 3.5 help (docstring) uses "S.replace(old, new[, count])".
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
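A small sketch of that failure mode, with a hypothetical two-argument foo. The missing comma silently merges the two literals into one argument, so the error only shows up as an argument count problem:

```python
def foo(first, second):
    # Hypothetical function expecting two separate string arguments.
    return first + "," + second


merged = 'a' 'b'  # adjacent literals are joined into a single string

try:
    foo('a' 'b')  # missing comma: only one argument is passed
except TypeError as exc:
    error = str(exc)
else:
    error = None
```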
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--Guido van Rossum (python.org/~guido)
Hi, do you have an opinion on the following?
Wouldn't it be nice to define classes via a simple constructor function (as
below) instead of a conventional class definition?
self._x = x
z = self._x + y
self = ParentClass()
z = x + y
self.my_method = my_method # that's cumbersome (see comments below)
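A runnable sketch of the two styles being compared, under some assumptions: ParentClass is an empty placeholder class, and the conventional version is renamed MyClassConventional here only so both can live in one file.

```python
class ParentClass:
    # Placeholder parent (an assumption for this sketch).
    pass


class MyClassConventional(ParentClass):
    def __init__(self, x):
        self._x = x

    def my_method(self, y):
        z = self._x + y
        return z


def MyClass(x):
    # Constructor-function style: x lives in the closure, not on the object.
    self = ParentClass()

    def my_method(y):
        z = x + y
        return z

    self.my_method = my_method
    return self


conventional = MyClassConventional(1)
closure_based = MyClass(1)
```

Both objects answer my_method(2) with 3, but type(closure_based) is ParentClass, which is the type-checking issue raised in the list of pros and cons.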
Here are the pros and cons I could come up with for the proposed method:
(+) Simpler and more explicit.
(+) No need to create attributes (like `self._x`) just to pass something
from `__init__` to another method.
(+) Default arguments / annotations for methods could be different for each
class instance. Adaptive defaults wouldn't have to be simulated with a None.
(+) Class/instance level imports would work.
(-/+) Speed: The `def`-based objects take 0.6 μs to create while the
`class`-based objects take only 0.4 μs. For method execution however the
closure takes only 0.15 μs while the proper method takes 0.22 μs (script
(-/+) Checking types: In the proposed example above the returned object
wouldn't know that it has been created by `MyClass`. There are a couple of
solutions to that, though. The easiest to implement would be to change the
first line to `self = subclass(ParentClass())` where the subclass function
looks at the next item in the call stack (i.e. `MyClass`) and makes it the
type of the object. Another solution would be to have a special rule for
functions with capital first letter returning a single object to append
itself to the list of types of the returned object. Alternatively there
could be a special keyword e.g. `classdef` that would be used instead of
`def` if we wouldn't want to rely on the name.
(-) The current syntax for adding a function to an object is cumbersome.
That's what is preventing me from actually using the proposed pattern. But
is this really the only reason for not using it? And if so, wouldn't that
be a good argument for enabling something like below?
*attribute function definitions*:
self = ParentClass()
z = x + y
or alternatively *multiline lambdas*:
self = ParentClass()
self.my_method = (y):
z = x + y
just one note I'd like to dump here.
We usually teach our newbies to catch exceptions as narrowly as
possible, i.e. MyModel.DoesNotExist instead of a plain Exception. This
works out quite well for now but the number of examples continues to grow
where it's not enough.
There are at least three examples I can name off the top of my head:
1) nested StopIteration - PEP 479
2) nested ImportError
3) nested AttributeError
1) is clear. 2) usually can be dealt with by applying the following pattern:
Chris showed how to deal with 3). Catching nested exceptions is often not
what people want.
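To illustrate cases 2) and 3), here is a minimal sketch. The Model class, its typo, and the module name are invented for the example. The first part shows a narrow-looking AttributeError handler (hidden inside getattr with a default) swallowing a nested error from a buggy property. The second part uses the name attribute of ImportError to tell which import actually failed.

```python
class Model:
    @property
    def total(self):
        # Bug: 'itmes' is a typo, so this raises a nested AttributeError.
        return sum(self.itmes)


model = Model()
# The default is returned even though 'total' exists and is merely broken.
masked = getattr(model, "total", "fallback")

expected_module = "module_that_does_not_exist_xyz"
try:
    import module_that_does_not_exist_xyz
except ImportError as exc:
    if exc.name != expected_module:
        raise  # some nested import failed instead; do not hide it
    missing = exc.name
```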
Am I the only one getting the impression that there's a common theme here?
There has been some discussion here and there concerning the differences
between runtime types and static types (mypy etc.). What I write below is
not really an idea or proposal---just a perspective, or a topic that people
may want to discuss, since the discussion on this is currently very fuzzy
and scattered and not really happening either AFAICT (I've probably missed
many discussions, though). Anyway, I thought I'd give it a shot:
Clearly, there needs to be some sort of distinction between runtime
classes/types and static types, because static types can be more precise
than Python's dynamic runtime semantics. For example, Iterable[int] is an
iterable that contains integers. For a static type checker, it is clear
what this means. But at runtime, it may be impossible to figure out whether
an iterable is really of this type without consuming the whole iterable and
checking whether each yielded element is an integer. Even that is not
possible if the iterable is infinite. Even Sequence[int] is problematic,
because checking the types of all elements of the sequence could take a
Since things like isinstance(it, Iterable[int]) cannot guarantee a proper
answer, one easily arrives at the conclusion that static types and runtime
classes are just two separate things and that one cannot require that all
types support something like isinstance at runtime.
On the other hand, there are many runtime things that can or could be done
using (type) annotations, for example:
Multidispatch (example with hypothetical syntax below):
def concatenate(parts: Iterable[str]) -> str:
def concatenate(parts: Iterable[bytes]) -> bytes:
def concatenate(parts: Iterable[Iterable]) -> Iterable:
or runtime type checking:
def load_from_file(filename: Union[os.PathLike, str, bytes]):
with open(filename) as f:
which would automatically give a nice error message if, say, a file object
is given as argument instead of a path to a file.
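A minimal sketch of how such a runtime check could be driven by the annotation, assuming a hypothetical check_union_args decorator (not an existing library function) that only understands plain classes and Unions of plain classes:

```python
import functools
import inspect
import io
import os
import typing
from typing import Union


def check_union_args(func):
    # Hypothetical decorator: isinstance-check annotated arguments at call time.
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if typing.get_origin(hint) is Union:
                allowed = typing.get_args(hint)
            elif isinstance(hint, type):
                allowed = (hint,)
            else:
                continue  # unannotated or fancier hints are skipped
            if not isinstance(value, allowed):
                names = ", ".join(t.__name__ for t in allowed)
                raise TypeError(
                    f"{name} must be one of ({names}), "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_union_args
def load_from_file(filename: Union[os.PathLike, str, bytes]):
    with open(filename) as f:
        return f.read()


try:
    load_from_file(io.StringIO("not a path"))
except TypeError as exc:
    message = str(exc)
```

The sketch deliberately skips anything like Iterable[int], which is exactly the kind of hint that cannot be checked cheaply at runtime.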
However useful (and efficient) these things might be, the runtime type
checks are problematic, as discussed above.
Furthermore, other differences between runtime and static typing may emerge
(or have emerged), which will complicate the matter further. For instance,
the runtime __annotations__ of classes, modules and functions may in some
cases contain something completely different from what a type checker
thinks the type should be.
These and other incompatibilities between runtime and static typing will
create two (or more) different kinds of type-annotated Python:
runtime-oriented Python and Python with static type checking. These may be
incompatible in both directions: a static type checker may complain about
code that is perfectly valid for the runtime folks, and code written for
static type checking may not be able to use new Python techniques that make
use of type hints at runtime. There may not even be a fully functional
subset of the two "languages". Different libraries will adhere to different
standards and will not be compatible with each other. The split will be
much worse and more difficult to understand than Python 2 vs 3, people
around the world will suffer like never before, and programming in Python
will become a very complicated mess.
One way of solving the problem would be that type annotations are only a
static concept, like with stubs or comment-based type annotations. This
would also be nice from a memory and performance perspective, as evaluating
and storing the annotations would not occupy memory (although both issues
and some more might be nicely solved by making the annotations lazily
evaluated). However, leaving out runtime effects of type annotations is not
the approach taken, and runtime introspection of annotations seems to have
some promising applications as well. And for many cases, the traditional
Python class actually acts very nicely as both the runtime and static type.
So if type annotations will be both for runtime and for static checking,
how to make everything work for both static and runtime typing?
Since a writer of a library does not know what the type hints will be used
for by the library users, it is very important that there is only one way
of making type annotations which will work regardless of what the
annotations are used for in the end. This will also make it much easier to
learn Python typing.
Regarding runtime types and isinstance, let's look at the Iterable[int]
example. For this case, there are a few options:
1) Don't implement isinstance
This is problematic for runtime uses of annotations.
2) isinstance([1, '2', 'three'], Iterable[int]) returns True
This is in fact now the case. This is ok for many runtime situations, but
lacks precision compared to the static version. One may want to distinguish
between Iterable[int] and Iterable[str] at runtime (e.g. the multidispatch
3) Check as much as you can at runtime
There could be something like Reiterable, which means the object is not
consumed by iterating over it, so one could actually check if all elements
are instances of int. This would be useful in some situations, but not
available for every object. Furthermore, the check could take an arbitrary
amount of time so it is not really suitable for things like multidispatch
or some matching constructs etc., where the performance overhead of the
type check is really important.
4) Do a deeper check than in (2) but trust the annotations
For example, an instance of a class that has a method like
def __iter__(self) -> Iterator[int]:
could be identified as Iterable[int] at runtime, even if it is not
guaranteed that all elements are really integers.
On the other hand, an object returned by
def get_ints() -> Iterable[int]:
does not know its own annotations, so the check is difficult to do at
runtime. And of course, there may not be annotations available.
5) Something else?
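A rough sketch of options (3) and (4), with hypothetical helper names. It also probes what isinstance does with Iterable[int] on the running interpreter; recent CPython versions (3.7 and later) raise TypeError for subscripted generics, so the behaviour in (2) depends on the Python version.

```python
import collections.abc
import typing
from typing import Iterator


def check_sequence_items(obj, item_type):
    # Option (3), hypothetical helper: only re-iterable sequences are checked.
    # Returns None when the object cannot be checked without consuming it.
    if not isinstance(obj, collections.abc.Sequence):
        return None
    return all(isinstance(item, item_type) for item in obj)


def declared_item_type(obj):
    # Option (4), hypothetical helper: trust the __iter__ return annotation.
    annotations = getattr(type(obj).__iter__, "__annotations__", {})
    hint = annotations.get("return")
    if typing.get_origin(hint) is collections.abc.Iterator:
        return typing.get_args(hint)[0]
    return None


class Countdown:
    def __iter__(self) -> Iterator[int]:
        return iter([3, 2, 1])


try:
    isinstance([1, '2', 'three'], typing.Iterable[int])
    parametrized_check = "allowed"
except TypeError:
    parametrized_check = "rejected"

mixed = check_sequence_items([1, '2', 'three'], int)
unknown = check_sequence_items((n for n in range(3)), int)
declared = declared_item_type(Countdown())
```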
And what about PEP544 (protocols), which is being drafted? The PEP seems to
aim for having type objects that represent duck-typing
protocols/interfaces. Checking whether a protocol is implemented by an
object or type is clearly a useful thing to do at runtime, but it is not
really clear if isinstance would be a guaranteed feature for PEP544
So one question is, is it possible to draw the lines between what works
with isinstance and what doesn't, and between what details are checked by
isinstance and what aren't? -- Or should isinstance be reserved for a more
limited purpose, with another check function added, say `implements(...)`,
which would perhaps guarantee some answer for all combinations of object
I'll stop here---this email is probably already much longer than a single
email should be ;)
+ Koos Zevenhoven + http://twitter.com/k7hoven +
IIRC I'm pretty sure the OP just didn't know about the existence of tuple
unpacking and the ability to use that to return multiple values.
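A short sketch of what that looks like with the function from the original post. Each name on the left receives one element of the returned tuple with its own type intact:

```python
def return_multiplevalues(num1, num2):
    return num1, num2


# Tuple unpacking: no need to index into the returned tuple.
first, second = return_multiplevalues(1, "two")
```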
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone
On Jun 25, 2017 at 6:09 PM, <Mikhail V <mikhailwas(a)gmail.com>> wrote:
joannah nanjekye wrote:
>Today I was writing an example snippet for the book and needed to write a
>function that returns two values something like this:
>def return_multiplevalues(num1, num2):
> return num1, num2
> I noticed that this actually returns a tuple of the values which I did not
>want in the first place.I wanted python to return two values in their own
>types so I can work with them as they are but here I was stuck with working
>around a tuple.
It was quite puzzling at first what the actual idea was, but I can
probably guess why this question came up.
It seems to me (I am just intuitively guessing that) that you were about to
write a procedure which operates on global variables.
If so you should use the keyword "global" for that.
E.g. if you want to work with variables defined in another
part of the code, you can simply do it like this:
x = 0
y = 0
def move():  # hypothetical function name
    global x, y
    x = x + 10
    y = y + 20
This function will change x and y (global variables in this case).
Note that without the line with the "global" statement this will not work.
Another typical usage is initialising variables inside a procedure:
So for some reason it seemed to me that you are trying to do
something like that.
>My proposal is we provide a way of functions returning multiple values.
>This has been implemented in languages like Go and I have found many cases
>where I needed and used such a functionality. I wish for this convenience
>in python so that I don't have to suffer going around a tuple.
So if using globals as in the above examples you certainly don't
have to suffer going around a tuple.
Python-ideas mailing list
Code of Conduct: http://python.org/psf/codeofconduct/
I often use generators, and itertools.chain on them.
What about providing something like the following:
a = (n for n in range(2))
b = (n for n in range(2, 4))
tuple(a + b) # -> 0 1 2 3
This, from the user's point of view, is just how the
__add__ operator works on lists and tuples.
Making generators work the same way could be a great way to avoid calls
to itertools.chain everywhere, and to limit the differences between
generators and other "linear" collections.
I do not know exactly how to implement that (I'm not that good at C, nor
at the CPython source itself), but from reading the sources,
I imagine that I could do something like the list_concat function at
Objects/listobject.c:473, but in the Objects/genobject.c file,
where instead of copying elements I would create and initialize a new
chainobject as described at Modules/itertoolsmodule.c:1792.
(In pure python, the implementation would be something like `def
__add__(self, othr): return itertools.chain(self, othr)`)
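Since built-in generator objects cannot be given an __add__ method from Python code, one way to try out the idea today is a small wrapper. This is only a sketch; the Chainable class is hypothetical and not an existing standard-library type.

```python
import itertools


class Chainable:
    # Hypothetical wrapper that adds "+" to any iterable via itertools.chain.
    def __init__(self, iterable):
        self._iterator = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)

    def __add__(self, other):
        return Chainable(itertools.chain(self, other))


a = Chainable(n for n in range(2))
b = Chainable(n for n in range(2, 4))
result = tuple(a + b)
```

As with itertools.chain itself, the operands are consumed lazily, so a and b are exhausted once result has been built.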