Hi folks,
I normally wouldn't bring something like this up here, except I think
that there is a possibility of something being done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.
I've been having an argument with a colleague over the last couple
days over the proper order of statements when setting up a
try/finally to perform cleanup of some action. On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place. The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.
</prologue>
My colleague's contention is that given
lock = threading.Lock()
this is simply *wrong*:
lock.acquire()
try:
    do_something()
finally:
    lock.release()
whereas this is okay:
with lock:
    do_something()
Ignoring other details of how threading.Lock is actually implemented,
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release() then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
suggest otherwise.
However, there *is* a difference, and it has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):
If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called. You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised. Standard stuff.
However, if Lock.__enter__ is a PyCFunction things are quite
different. If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg. If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised. However if it returns
non-NULL the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode. At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite, so the finally block, and
hence Lock.__exit__, is entered.
Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then
with lock:
    do_something()
always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with
lock.acquire()
try:
    ...
finally:
    lock.release()
there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY). So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
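For what it's worth, the window is visible in the bytecode. Here's a
minimal sketch using the dis module (opcode names are from CPython 3.6
and vary a bit between versions; do_something is just a stand-in):

import dis

def critical(lock):
    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

dis.dis(critical)
# Between the call and the protected region this shows something like:
#   CALL_FUNCTION  0    <- lock.acquire() returns here
#   POP_TOP             <- a SIGINT handled here escapes the try/finally
#   SETUP_FINALLY  ...  <- only from here on is lock.release() guaranteed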
Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).
At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear). However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to PyCFunction from within the
ceval loop.
Sorry for the tl;dr; any thoughts?
Hi,
For technical reasons, many functions of the Python standard libraries
implemented in C have positional-only parameters. Example:
-------
$ ./python
Python 3.7.0a0 (default, Feb 25 2017, 04:30:32)
>>> help(str.replace)
replace(self, old, new, count=-1, /) # <== notice "/" at the end
...
>>> "a".replace("x", "y") # ok
'a'
>>> "a".replace(old="x", new="y") # ERR!
TypeError: replace() takes at least 2 arguments (0 given)
-------
When converting the methods of the builtin str type to the internal
"Argument Clinic" tool (tool to generate the function signature,
function docstring and the code to parse arguments in C), I asked if
we should add support for keyword arguments in str.replace(). The
answer was quick: no! It's a deliberate design choice.
Quote of Yury Selivanov's message:
"""
I think Guido explicitly stated that he doesn't like the idea to
always allow keyword arguments for all methods. I.e. `str.find('aaa')`
just reads better than `str.find(needle='aaa')`. Essentially, the idea
is that for most of the builtins that accept one or two arguments,
positional-only parameters are better.
"""
http://bugs.python.org/issue29286#msg285578
I just noticed a module on PyPI to implement this behaviour on Python functions:
https://pypi.python.org/pypi/positional
My question is: would it make sense to implement this feature in
Python directly? If yes, what should be the syntax? Use "/" marker?
Use the @positional() decorator?
Do you see concrete cases where it's a deliberate choice to deny
passing arguments as keywords?
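To make the decorator option concrete, here's a rough sketch of how a
@positional() decorator could work in pure Python (illustrative only;
the PyPI "positional" module linked above may behave differently):

import functools

def positional(n):
    """Reject keyword use of the first n parameters."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in func.__code__.co_varnames[:n]:
                if name in kwargs:
                    raise TypeError("%s() got positional-only argument "
                                    "passed as keyword: %r"
                                    % (func.__name__, name))
            return func(*args, **kwargs)
        return wrapper
    return decorator

@positional(2)
def find(haystack, needle):
    return haystack.find(needle)

find("aaa", "a")         # ok
find("aaa", needle="a")  # TypeError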
Don't you like writing int(x="123") instead of int("123")? :-) (I know
that Serhiy Storchaka hates the name of the "x" parameter of the int
constructor ;-))
By the way, I read that the "/" marker is unknown to almost all Python
developers, and that the [...] syntax should be preferred, but
inspect.signature() doesn't support this syntax. Maybe we should fix
signature() and use [...] format instead?
Replace "replace(self, old, new, count=-1, /)" with "replace(self,
old, new[, count=-1])" (or maybe even not document the default
value?).
Python 3.5 help (docstring) uses "S.replace(old, new[, count])".
Victor
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
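For example, a minimal reproduction of the mistake:

def foo(a, b):
    return a + b

foo('a', 'b')  # intended: returns 'ab'
foo('a' 'b')   # missing comma: the literals concatenate to 'ab', and this
               # raises TypeError: foo() missing 1 required positional
               # argument: 'b'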
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Hi,
This mail is the consequence of a true story, a story where CPython
got defeated by JavaScript, Java, C# and Go.
One of the teams at the company where I'm working had a kind of
benchmark to compare the different languages on top of their
respective "official" web servers such as Node.js, Aiohttp, Dropwizard
and so on. The test by itself was pretty simple and tried to test the
happy path of the logic, a piece of code that fetches N rules from
another system and then applies them to X whatevers, also fetched from
another system, something like this:
def filter(rule, whatever):
    if rule.x in whatever.x:
        return True

def run():
    rules = get_rules()
    whatevers = get_whatevers()
    cnt = 0
    for rule in rules:
        for whatever in whatevers:
            if filter(rule, whatever):
                cnt = cnt + 1
    return cnt
The performance of Python compared with the other languages was almost
10x slower. It's true that they didn't optimize the code, but they
didn't optimize it for any of the other languages either, so all of
them paid the same cost in terms of iterations.
Once I saw the code I proposed a pair of changes: removing the call to
the filter function by inlining it, and caching the rule's attributes,
something like this:
for rule in rules:
    x = rule.x
    for whatever in whatevers:
        if x in whatever.x:
            cnt += 1
CPython's performance improved 3-4x just from doing these "silly" things.
The case of the rule cache is IMHO very striking: we have plenty of
examples in many repositories where caching non-local variables is a
widely used pattern, so why hasn't a way to do it implicitly and by
default been considered?
The slowness of function calls in CPython is a quite recurrent topic,
and it looks like it's still an unsolved problem.
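To illustrate both effects, here is a rough micro-benchmark sketch (the
names and data are made up; absolute numbers will vary):

import timeit

setup = """
class Rule:
    x = 'needle'
rule = Rule()
data = ['hay needle hay'] * 1000
def check(rule, item):
    return rule.x in item
"""

# function call + attribute lookup on every iteration
print(timeit.timeit("for item in data: check(rule, item)",
                    setup=setup, number=1000))
# inlined test with the attribute cached in a local variable
print(timeit.timeit("x = rule.x\nfor item in data: x in item",
                    setup=setup, number=1000))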
Sure, I'm missing many things and I do not have all of the
information. With this mail I want to gather the information that
might help me understand why CPython is where it is regarding these
two slow patterns.
This could be considered an unimportant thing, but it's more relevant
than one might expect, at least IMHO. If the default code that you can
write in a language is slow, and an alternative exists to make it
faster, the language is doing something wrong.
BTW: PyPy looks like it is immune [1]
[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582
--
--pau
Hi! I joined this list because I'm interested in filling a gap in Python's
standard library, relating to text encodings.
There is an encoding with no name of its own. It's supported by every
current web browser and standardized by WHATWG. It's so prevalent that if
you ask a Web browser to decode "iso-8859-1" or "windows-1252", you will
get this encoding _instead_. It is probably the second or third most common
text encoding in the world. And Python doesn't quite support it.
You can see the character table for this encoding at:
https://encoding.spec.whatwg.org/index-windows-1252.txt
For the sake of discussion, let's call this encoding "web-1252". WHATWG
calls it "windows-1252", but notice that it's subtly different from
Python's "windows-1252" encoding. Python's windows-1252 has bytes that are
undefined:
>>> b'\x90'.decode('windows-1252')
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 0:
character maps to <undefined>
In web-1252, the bytes that are undefined according to windows-1252 map to
the control characters in those positions in iso-8859-1 -- that is, the
Unicode codepoints with the same number as the byte. In web-1252, b'\x90'
would decode as '\u0090'.
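A minimal sketch of the decoding side (a hypothetical helper, not the
codec machinery ftfy actually registers):

def web_1252_decode(data):
    """Decode as windows-1252, falling back to the same-numbered
    control character for the undefined bytes."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode('windows-1252'))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in
            # windows-1252; map them straight to U+0081 etc.
            chars.append(chr(byte))
    return ''.join(chars)

assert web_1252_decode(b'\x90') == '\u0090'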
This may seem like a silly encoding that encourages doing horrible things
with text. That's pretty much the case. But there's a reason every Web
browser implements it:
- It's compatible with windows-1252
- Any sequence of bytes can be round-tripped through it without losing
information
It's not just this one encoding. WHATWG's encoding standard (
https://encoding.spec.whatwg.org/) contains modified versions of
windows-1250 through windows-1258 and windows-874.
Support for these encodings matters to me, in part, because I maintain a
Unicode data-cleaning library, "ftfy". One thing it does is to detect and
undo encoding/decoding errors that cause mojibake, as long as they're
detectable and reversible. Looking at real-world examples of text that has
been damaged by mojibake, it's clear that lots of text is transferred
through what I'm calling the "web-1252" encoding, in a way that's
incompatible with Python's "windows-1252".
In order to be able to work with and fix this kind of text, ftfy registers
new codecs -- and I implemented this even before I knew that they were
standardized in Web browsers. When ftfy is imported, you can decode text as
"sloppy-windows-1252" (the name I chose for this encoding), for example.
ftfy can tell people a sequence of steps that they can use in the future to
fix text that's like the text they provided. Very often, these steps
require the sloppy-windows-1252 or sloppy-windows-1251 encoding, which
means the steps only work with ftfy imported, even for people who are not
using the features of ftfy.
Support for these encodings also seems highly relevant to people who use
Python for web scraping, as it would be desirable to maximize compatibility
with what a Web browser would do.
This really seems like it belongs in the standard library instead of being
an incidental feature of my library. I know that code in the standard
library has "one foot in the grave". I _want_ these legacy encodings to
have one foot in the grave. But some of them are extremely common, and
Python code should be able to deal with them.
Adding these encodings to Python would be straightforward to implement.
Does this require a PEP, a pull request, or further discussion?
The proposed implementation of dataclasses prevents defining fields with
defaults before fields without defaults. This can create limitations on
logical grouping of fields and on inheritance.
Take, for example, the case:
@dataclass
class Foo:
    some_default: dict = field(default_factory=dict)

@dataclass
class Bar(Foo):
    other_field: int
this results in the error:

Traceback (most recent call last):
  File "<ipython-input>", line 6, in <module>
    class Bar(Foo):
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 753, in dataclass
    return wrap(_cls)
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 745, in wrap
    return _process_class(cls, repr, eq, order, hash, init, frozen)
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 677, in _process_class
    else 'self',
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 424, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'other_field' follows default argument
I understand that this is a limitation of positional arguments because the
effective __init__ signature is:
def __init__(self, some_default: dict = <something>, other_field: int):
However, keyword-only arguments allow an entirely reasonable solution to
this problem:
def __init__(self, *, some_default: dict = <something>, other_field: int):
And they have the added benefit of making the fields in the __init__
call entirely explicit.
So, I propose the addition of a keyword_only flag to the @dataclass
decorator that renders the __init__ method using keyword-only arguments:
@dataclass(keyword_only=True)
class Bar(Foo):
    other_field: int
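With such a flag, the Foo/Bar example above would just work, since
field ordering no longer matters for keyword-only parameters
(hypothetical usage, as keyword_only does not exist today):

bar = Bar(other_field=1)  # some_default still comes from its factory
bar = Bar(some_default={'k': 'v'}, other_field=1)
bar = Bar(1)              # raises TypeError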
--George Leslie-Waksman
In South Asia, a different style of digit delimiters for large numbers is
used than in Europe, North America, Australia, etc. With some minor
spelling differences, the term lakh is used for a hundred-thousand, and it
is generally written as '1,00,000'.
In turn, a crore is 100 lakh, and is written as '1,00,00,000'. Extending
this pattern, larger numbers continue to use digits in groups of two
(other than the smallest grouping, which keeps three digits). So, e.g.
1e12 is written as 10,00,00,00,00,000.
It's nice that we now have the optional underscore in numeric literals. So
we could write a number as either `12_34_56_78_00_000` or
`1_234_567_800_000` depending on what region of the world and which
convention was more familiar.
However, in *formatting* those numbers, the format mini-language only
allows the European convention. So e.g.
In [1]: x = 12_34_56_78_00_000
In [2]: "{:,d}".format(x)
Out[2]: '1,234,567,800,000'
In [3]: f"{x:,d}"
Out[3]: '1,234,567,800,000'
In order to get Indian number delimiters, you'd have to write a custom
formatting function, notwithstanding that something like 1.5 billion people
use the three-then-two delimiting convention.
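Such a function is short but fiddly to get right; a sketch:

def indian_group(n):
    """Group digits Indian-style: the last three together, then twos."""
    s = str(abs(n))
    if len(s) > 3:
        head, tail = s[:-3], s[-3:]
        parts = []
        while len(head) > 2:
            parts.insert(0, head[-2:])
            head = head[:-2]
        if head:
            parts.insert(0, head)
        s = ','.join(parts + [tail])
    return '-' + s if n < 0 else s

assert indian_group(12_34_56_78_00_000) == '12,34,56,78,00,000'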
I propose that Python should have an additional grouping option, or some
other way to specify this grouping convention. Oddly, the '_' grouping
symbol is available, even though no one actually uses that grouper outside
of programming languages like Python, e.g.:
In [4]: f"{x:_d}"
Out[4]: '1_234_567_800_000'
I guess this is nice for something like round-tripping numbers used in
code, but it's not a symbol anyone uses "natively" (I understand why comma
or period cannot be used in numeric literals since they mean something else
in Python already).
I'm not sure what symbol or combination I would recommend, but finding
something suitable shouldn't be so hard. Perhaps now that backtick no
longer has any other meaning in Python, it could be used since it looks
similar to a comma. E.g. in Python 3.8 we might have:
>>> f"{x:`d}"
'12,34,56,78,00,000'
(actually, this probably isn't a parser issue even in Python 2, since
it's already inside quotes; but the issue is moot).
Or maybe a two character version like:
>>> f"{x:2,d}"
'12,34,56,78,00,000'
Or:
>>> f"{x:,,d}"
'12,34,56,78,00,000'
Even if `2,` was used, that wouldn't preclude giving an additional length
descriptor after it. Now we can have:
>>> f"{x:,.2f}"
'1,234,567,800,000.00'
Perhaps in the future this would work:
>>> f"{x:2,.2f}"
'12,34,56,78,00,000.00'
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
Hi.
Currently, int(), str.isdigit(), str.isalnum(), etc. accept
non-ASCII strings.
>>> s = "１２３"
>>> s
'１２３'
>>> s.isdigit()
True
>>> print(ascii(s))
'\uff11\uff12\uff13'
>>> int(s)
123
But sometimes, we want to accept only ASCII strings. For example,
ipaddress module uses:
_DECIMAL_DIGITS = frozenset('0123456789')
...
if _DECIMAL_DIGITS.issuperset(str):
ref: https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756...
If str has an isascii() method, it can be simpler:
`if s.isascii() and s.isdigit():`
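For reference, a pure-Python stand-in with the proposed semantics
(sketch only):

def isascii(s):
    return all(ord(ch) < 0x80 for ch in s)

assert isascii('123')
assert not isascii('１２３')  # fullwidth digits: isdigit() is still True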
I want to add it in Python 3.7 if there are no objections.
Regards,
--
INADA Naoki <songofacandy(a)gmail.com>
In my work I make a lot of use of struct pack & unpack, but I sometimes
find it awkward that I either have to supply the whole format string up
front, or add explicit position-tracking mechanisms to my code so as to
use pack_into or unpack_from, whenever the structures I am dealing with
are not simple arrays of a single type. In my particular use case,
de/encoding messages from a remote device that sometimes forwards
collected information to & from the devices that it controls, the
endianness is not always consistent between message elements in a
single message.
It would be very nice if:
a) iter_unpack produced an iterator whose next method optionally took
an additional format-string parameter (with endianness permitted) to
replace the current format string in use, and
b) there was a matching iter_pack function whose next method took
either more data to pack into the buffer (extending it) or a new format
string, passed either as a named parameter or via an exposed set_fmt
method on the packer.
This would cover my specific use case, as well as the ones where a
buffer or stream is constructed of blocks consisting of a byte or word
that specifies the data type(s) to follow (and sometimes a count),
followed by the actual data.
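In the meantime, here is a rough sketch of the kind of stateful helper
I have in mind, built on unpack_from and calcsize (the names are mine,
not a proposed API):

import struct

class StreamUnpacker:
    """Track an offset into a buffer, letting the format string,
    including its endianness prefix, change between reads."""

    def __init__(self, buffer, offset=0):
        self.buffer = buffer
        self.offset = offset

    def unpack(self, fmt):
        values = struct.unpack_from(fmt, self.buffer, self.offset)
        self.offset += struct.calcsize(fmt)
        return values

# e.g. a message with a big-endian header and a little-endian payload:
#     u = StreamUnpacker(data)
#     msg_type, count = u.unpack('>HB')
#     readings = u.unpack('<%df' % count)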
--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect
those of my employer.
Hello,
Some time ago, I set up some logging using stdout in a program with the
`stdout_redirected()` context manager, which had to close and reopen
stdout to work.
Unsurprisingly, the StreamHandler didn't take it well.
So I made a Handler class which is able to reload the stream (AKA get
the new sys.stdout) whenever the old one isn't writable.
But there might be some more legitimate use cases for stubborn
StreamHandlers like that (ones that are not ugly-looking,
hopefully-temporary patches).
The way I see it for now is a StreamHandler subclass which, instead of
having a `stream` argument, would have `getStream`, `reloadStream`
and `location`, and would be used this way:
On initialisation, it would load the stream object, then act as a
regular StreamHandler, but checking that the stream is writable at each
`handler.emit()` call. If it is not, then reload it.
If given, the `getStream` argument (a callable object which returns a
ready-to-use stream) is used to load/reload the underlying stream;
otherwise the stream is fetched from the location described by
`location`, and, if it is still not writable, `reloadStream()` is
called (which should put a usable stream object at `location`) before
trying to fetch it again.
Here is the current implementation I have:

```
from logging import StreamHandler

from .config import _resolve as resolve  # will (uglily) be used later


class ReloadingHandler(StreamHandler):
    """
    A stream handler which reloads the stream object from one place if
    an error occurs.
    """

    def __init__(self, getStream=None, reloadStream=None, location=None):
        """
        Initialize the handler.

        If stream is not specified, sys.stderr is used.
        """
        self.getStream = getStream
        self.stream = None  # to be overwritten later
        if getStream is None:
            if location is None:
                self.location = 'sys.stderr'  # note the lack of 'ext://'
                self.reloadStream = None
            else:
                self.reloadStream = reloadStream
                self.location = location
        stream = self.reload()  # gets the stream
        StreamHandler.__init__(self, stream)

    def reload(self):
        if self.getStream is not None:
            stream = self.getStream()
        else:
            try:
                stream = resolve(self.location)
                exc = None
            except Exception as err:
                exc = err  # is this really needed?
                stream = None  # just retry for now
            if stream is None or not stream.writable():
                if self.reloadStream is None:
                    if exc:
                        raise exc
                    else:
                        raise ValueError("ReloadingHandler couldn't "
                                         "reload a valid stream")
                self.reloadStream()  # should put a usable stream at `location`
                # if it fails this time, do not catch the exception here
                stream = resolve(self.location)
        return stream

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing
        newline. If exception information is present, it is formatted
        using traceback.print_exception and appended to the stream. If
        the stream has an 'encoding' attribute, it is used to determine
        how to do the output to the stream.
        """
        if not self.stream.writable():
            self.stream = self.reload()
        StreamHandler.emit(self, record)
```
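For reference, this is roughly how I use it (a sketch; getStream is
re-called whenever the old stream stops being writable, so the handler
always picks up the current sys.stdout):

import logging
import sys

handler = ReloadingHandler(getStream=lambda: sys.stdout)
logging.getLogger().addHandler(handler)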
What do you think? (about the idea, the implementation, and the way I
wrote this email)