Mailman 3 May 2013 - Python-ideas

@classproperty, @abc.abstractclassproperty, etc.
by K. Richard Pixley 16 Dec '20

There's a whole matrix of these, and I'm wondering why the matrix is currently sparse rather than implementing them all. Or rather, why we can't stack them as:

    class foo(object):
        @classmethod
        @property
        def bar(cls, ...):
            ...

Essentially the permutations are, I think: {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable attribute}.

concreteness  implicit first arg  type                    name                                                 comments
{unadorned}   {unadorned}         method                  def foo():                                           exists now
{unadorned}   {unadorned}         property                @property                                            exists now
{unadorned}   {unadorned}         non-callable attribute  x = 2                                                exists now
{unadorned}   static              method                  @staticmethod                                        exists now
{unadorned}   static              property                @staticproperty                                      proposing
{unadorned}   static              non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
{unadorned}   class               method                  @classmethod                                         exists now
{unadorned}   class               property                @classproperty or @classmethod;@property             proposing
{unadorned}   class               non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
abc.abstract  {unadorned}         method                  @abc.abstractmethod                                  exists now
abc.abstract  {unadorned}         property                @abc.abstractproperty                                exists now
abc.abstract  {unadorned}         non-callable attribute  @abc.abstractattribute or @abc.abstract;@attribute   proposing
abc.abstract  static              method                  @abc.abstractstaticmethod                            exists now
abc.abstract  static              property                @abc.abstractstaticproperty                          proposing
abc.abstract  static              non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
abc.abstract  class               method                  @abc.abstractclassmethod                             exists now
abc.abstract  class               property                @abc.abstractclassproperty                           proposing
abc.abstract  class               non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary

I think the meanings of the new ones are pretty straightforward, but in case they are not...

@staticproperty - like @property, only without an implicit first argument. Allows the property to be accessed directly from the class without requiring a throw-away instance.

@classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be accessed directly from the class without requiring a throw-away instance.

@abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses.

@abc.abstractstaticproperty - like @abc.abstractproperty, only for @staticproperty.

@abc.abstractclassproperty - like @abc.abstractproperty, only for @classproperty.

--rich
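
To make the two proposed read-only decorators concrete, here is a minimal sketch built on the descriptor protocol. The classes `classproperty` and `staticproperty` below are illustrative stand-ins written for this example, not anything that exists in the standard library, and they cover only the getter case (no setter or deleter).

```python
class classproperty:
    """Illustrative only: a read-only property whose getter receives the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Works for both Foo.size and Foo().size.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class staticproperty:
    """Illustrative only: a read-only property with no implicit first argument."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        return self.fget()


class Foo:
    _registry = ["a", "b"]

    @classproperty
    def size(cls):
        return len(cls._registry)

    @staticproperty
    def answer():
        return 42


class Bar(Foo):
    _registry = ["x"]


print(Foo.size, Bar.size, Foo().size, Foo.answer)
```

Because `__get__` is only consulted on attribute lookup, assigning `Foo.size = 3` would simply replace the descriptor; a writable class property would likely need metaclass support.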

Specify number of items to allocate for array.array() constructor
by Sven Rahmann 22 Feb '20

At the moment, the array module of the standard library lets you create arrays of different numeric types and initialize them from an iterable (e.g., another array). What's missing is the possibility to specify the final size of the array (number of items), especially for large arrays.

I'm thinking of suffix arrays (a text indexing data structure) for large texts, e.g. the human genome and its reverse complement (about 6 billion characters from the alphabet ACGT). The suffix array is a long int array of the same size (8 bytes per number, so it occupies about 48 GB of memory).

At the moment I am extending an array in chunks of several million items at a time, which is slow and not elegant. The function below also initializes each item in the array to a given value (0 by default).

Is there a reason why the array.array constructor does not allow simply specifying the number of items that should be allocated? (I do not really care about the contents.) Would this be a worthwhile addition to / modification of the array module?

My suggestion is to modify array generation in such a way that you could pass an iterator (as now) as the second argument, but if you pass a single integer value, it should be treated as the number of items to allocate.

Here is my current workaround (which is slow):

    import array

    def filled_array(typecode, n, value=0, bsize=(1<<22)):
        """returns a new array with given typecode
        (eg, "l" for long int, as in the array module)
        with n entries, initialized to the given value (default 0)
        """
        # one reusable chunk of bsize items
        a = array.array(typecode, [value]*bsize)
        x = array.array(typecode)
        r = n
        # append whole chunks, then the remainder
        while r >= bsize:
            x.extend(a)
            r -= bsize
        x.extend([value]*r)
        return x
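
For comparison, arrays already support repetition with `*`, which builds an array of a given length without going through a Python list of that length. This is only a sketch of an alternative workaround, not the constructor change proposed above; the helper name `allocate_array` is made up for this example.

```python
import array


def allocate_array(typecode, n, value=0):
    """Hypothetical helper: return an array of n items, each equal to value."""
    # A one-item array repeated n times; no n-element list is ever created.
    return array.array(typecode, [value]) * n


suffix_array = allocate_array("l", 1_000_000)
print(len(suffix_array), suffix_array.typecode)
```

Like the workaround above, this still initializes every item, so it does not address the "I do not care about the contents" part of the request.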

Implicit string literal concatenation considered harmful?
by Guido van Rossum 14 Mar '18

I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b').

This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden).

Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.)

Would it be reasonable to start deprecating this and eventually remove it from the language?

--
--Guido van Rossum (python.org/~guido)
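
A small reproduction of the failure mode described above, as a sketch. The function `foo` and the list `colors` are made-up names for illustration; the point is only that adjacent literals merge silently into one string.

```python
def foo(first, second):
    return first + "/" + second


# Missing comma: the two literals merge into the single argument 'ab'.
try:
    foo('a' 'b')
except TypeError as exc:
    error = str(exc)
    print(error)

# The same slip inside a list silently produces a shorter list.
colors = ['red', 'green' 'blue']
print(colors)

# The explicit spelling that would remain available.
explicit = 'a' + 'b'
```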

while conditional in list comprehension ??
by Wolfgang Maier 21 Feb '14

Dear all,

I guess this is so obvious that someone must have suggested it before: in list comprehensions you can currently exclude items based on the if conditional, e.g.:

    [n for n in range(1,1000) if n % 4 == 0]

Why not extend this filtering by allowing a while statement in addition to if, as in:

    [n for n in range(1,1000) while n < 400]

Trivial effect, I agree, in this example since you could achieve the same by using range(1,400), but I hope you get the point. This intuitively understandable extension would provide a big speed-up for sorted lists where processing all the input is unnecessary.

Consider this:

    some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # a sorted list of names
    [n for n in some_names if n.startswith("A")]
    # certainly gives a list of all names starting with A, but ...
    [n for n in some_names while n.startswith("A")]
    # would have saved two comparisons

Best,
Wolfgang
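
For reference, the early-exit behaviour asked for here can already be written with `itertools.takewhile`, which stops consuming the input at the first item that fails the test. The sketch below reuses the names from the example above; it is offered as a comparison point, not as an argument for or against the proposed syntax.

```python
from itertools import takewhile

some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]

# Filters the whole list: six startswith() calls.
filtered = [n for n in some_names if n.startswith("A")]

# Stops at "Bob": four startswith() calls.
prefix = list(takewhile(lambda n: n.startswith("A"), some_names))

print(filtered, prefix)
```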

A nice __repr__ for the ast.* classes?
by Haoyi Li 19 Nov '13

Wouldn't it be nice if this

    >>> import ast
    >>> print repr(ast.parse("(1 + 1)").body[0].value)
    <_ast.BinOp object at 0x0000000001E94B38>

printed something more useful?

    >>> print repr(ast.parse("(1 + 1)").body[0].value)
    BinOp(left=Num(n=1), op=Add(), right=Num(n=1))

I've been doing some work on macropy <https://github.com/lihaoyi/macropy>, which uses the ast.* classes extensively, and it's annoying that we have to resort to dirty tricks like monkey-patching the AST classes (for CPython 2.7) or even monkey-patching __builtin__.repr (to get it working on PyPy) just to get eval(repr(my_ast)) == my_ast to hold true.

And a perfectly good solution already exists in the ast.dump() function, too! (It would also be nice if "==" did a structural comparison on the ast.* classes too, but that's a different issue).

-Haoyi
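
A short sketch of what `ast.dump()` gives today, written for Python 3 rather than the Python 2.7 session above. The exact node names in the output depend on the interpreter version, so the comparison at the end goes through the dumped strings instead of relying on a particular spelling.

```python
import ast

node = ast.parse("(1 + 1)").body[0].value

# A readable, structural description of the node.
dumped = ast.dump(node)
print(dumped)

# Comparing dumps is one way to approximate a structural "==" between trees.
same = ast.dump(ast.parse("(1 + 1)")) == ast.dump(ast.parse("1 + 1"))
different = ast.dump(ast.parse("1 + 1")) == ast.dump(ast.parse("1 + 2"))
print(same, different)
```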

Gzip and zip extra field
by Serhiy Storchaka 17 Nov '13

Gzip files can contain an extra field [1], and some applications use this to extend the gzip format. The current GzipFile implementation ignores this field on input and doesn't allow creating a new file with an extra field.

ZIP file entries can also contain an extra field [2]. Currently it is just saved as bytes in the `extra` attribute of ZipInfo.

I propose to save the extra field for gzip files and provide structured access to subfields.

    f = gzip.GzipFile('somefile.gz', 'rb')
    f.extra_bytes  # A raw extra field as bytes

    # iterating over all subfields
    for xid, data in f.extra_map.items():
        ...

    # get Apollo file type information
    f.extra_map[b'AP']  # (or f.extra_map['AP']?)

    # creating gzip file with extra field
    f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
    f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
    f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})

    # change Apollo file type information
    f.extra_map[b'AP'] = ...

Issue #17681 [3] has preliminary patches. There are some open doubts about the interface. Is it over-engineered?

Currently GzipFile supports seamlessly reading a sequence of separately compressed gzip files. Every such chunk can have its own extra field (this is used in dictzip, for example). It would be desirable to be able to read only until the end of the current chunk in order not to miss an extra field.

[1] http://www.gzip.org/format.txt
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://bugs.python.org/issue17681
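
To illustrate the "structured access to subfields" part: per the gzip format description [1], the extra field is a sequence of subfields, each made of a two-byte ID (SI1, SI2), a two-byte little-endian length, and that many data bytes. The sketch below parses and builds such a byte string on its own; the helper names `parse_extra_subfields` and `build_extra_subfield` are invented for this example and are not part of the gzip module or of the patches on the issue.

```python
import struct


def build_extra_subfield(xid, data):
    """Hypothetical helper: encode one subfield as ID + length + data."""
    return xid + struct.pack("<H", len(data)) + data


def parse_extra_subfields(extra):
    """Hypothetical helper: split raw extra-field bytes into {id: data}."""
    subfields = {}
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        xid = extra[pos:pos + 2]
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        pos += 4
        data = extra[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated subfield data")
        subfields[xid] = data
        pos += length
    return subfields


# b'AP' is the Apollo ID used above; b'zz' is a made-up ID for the demo.
raw = build_extra_subfield(b"AP", b"\x01\x02") + build_extra_subfield(b"zz", b"")
extra_map = parse_extra_subfields(raw)
print(extra_map)
```

A plain dict loses duplicate IDs and, on older Pythons, ordering, which may be one reason the proposal also accepts a list of (id, data) pairs.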

Idea: Compressing the stack on the fly
by Ram Rachum 12 Sep '13

Hi everybody,

Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how programming languages are implemented, so this idea will most likely be either (a) completely impossible or (b) trivial knowledge.

I was thinking about the implementation of the factorial in Python. I was comparing in my mind 2 different solutions: the recursive one, and the one that uses a loop. Here are example implementations for them:

    def factorial_recursive(n):
        if n == 1:
            return 1
        return n * factorial_recursive(n - 1)

    def factorial_loop(n):
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result

I know that the recursive one is problematic, because it's putting a lot of items on the stack. In fact it's using the stack as if it were a loop variable. The stack wasn't meant to be used like that.

Then the question came to me, why? Maybe the stack could be built to handle this kind of (ab)use?

I read about tail-call optimization on Wikipedia. If I understand correctly, the gist of it is that the interpreter tries to recognize, on a frame-by-frame basis, which frames could be completely eliminated, and then it eliminates those. Then I read Guido's blog post explaining why he doesn't want it in Python. In that post he outlined 4 different reasons why TCO shouldn't be implemented in Python.

But then I thought, maybe you could do something smarter than eliminating individual stack frames. Maybe we could create something that is to the current implementation of the stack what `xrange` is to the old-style `range`. A smart object that allows access to any of a long list of items in it, without actually having to store those items. This would solve the first argument that Guido raises in his post, which I found to be the most substantial one.

What I'm saying is: imagine the stack of the interpreter when it runs the factorial example above for n=1000. It has around 1000 items in it and it's just about to explode. But then, if you'd look at the contents of that stack, you'd see it's embarrassingly regular, a compression algorithm's wet dream. It's just the same code location over and over again, with a different value for `n`.

So what I'm suggesting is an algorithm to compress that stack on the fly. An algorithm that would detect regularities in the stack and, instead of saving each individual frame, save just the pattern. Then, there wouldn't be any problem with showing an informative stack trace: despite not storing every individual frame, each individual frame could still be *accessed*, similarly to how `xrange` allows access to each individual member without having to store each of them.

Then, the stack could store a lot more items, and tasks that currently require recursion (like pickling using the standard library) will be able to handle much deeper recursions.

What do you think?

Ram.
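
The `xrange` analogy can be shown with a toy model. The class below is purely hypothetical and has nothing to do with how the interpreter stores frames; it only shows that a run of "frames" differing in a single local can be stored as a pattern while each one stays individually accessible.

```python
from collections.abc import Sequence


class RegularFrames(Sequence):
    """Hypothetical stand-in for a run of frames that differ in one local."""

    def __init__(self, location, var_name, values):
        self.location = location  # same code location for every frame
        self.var_name = var_name  # the one local variable that changes
        self.values = values      # a range: constant storage for any length

    def __len__(self):
        return len(self.values)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return RegularFrames(self.location, self.var_name, self.values[index])
        # Rebuild the requested frame on demand instead of storing it.
        return (self.location, {self.var_name: self.values[index]})


# Models the calls factorial_recursive(1000) down to factorial_recursive(1).
frames = RegularFrames("factorial_recursive", "n", range(1000, 0, -1))
print(len(frames), frames[0], frames[-1])
```

The toy sidesteps the hard parts of the idea: detecting the pattern while the program runs, and the pending `n * ...` multiplication that each real frame still has to perform on return.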

format specifier for "not bytes"
by Daniel Holth 05 Jul '13

While I was implementing JSON-JWS (JSON web signatures), a format which in Python 3 has to go from bytes > unicode > bytes > unicode several times in its construction, I noticed I wrote a lot of bugs:

    "sha256=b'abcdef1234'"

When I meant to say:

    "sha256=abcdef1234"

Everything worked perfectly on Python 3 because the verifying code also generated the sha256=b'abcdef1234' as a comparison. I would have never noticed at all unless I had tried to verify the Python 3 output with Python 2.

I know I'm a bad person for not having unit tests capable enough to catch this bug, a bug I wrote repeatedly in each layer of the bytes > unicode > bytes > unicode dance, and that there is no excuse for being confused at any time about the type of a variable, but I'm not willing to reform.

Instead, I would like a new string formatting operator tentatively called 'notbytes':

    "sha256=%notbytes" % (b'abcdef1234')

It gives the same error as 'sha256='+b'abc1234' would:

    TypeError: Can't convert 'bytes' object to str implicitly
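
The bug and the desired behaviour can be sketched with an ordinary function. `notbytes` here is a hypothetical helper standing in for the proposed format specifier; it is not an existing formatting operator.

```python
def notbytes(value):
    """Hypothetical helper: format like %s, but refuse bytes-like input."""
    if isinstance(value, (bytes, bytearray)):
        raise TypeError(
            "Can't convert %r object to str implicitly" % type(value).__name__
        )
    return str(value)


digest = b'abcdef1234'

# The silent bug: %s falls back to the repr of the bytes object.
buggy = "sha256=%s" % digest
print(buggy)

# With the guard, the mistake is loud...
try:
    "sha256=%s" % notbytes(digest)
except TypeError as exc:
    print(exc)

# ...and decoding explicitly gives the intended string.
fixed = "sha256=%s" % notbytes(digest.decode("ascii"))
print(fixed)
```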

Line continuations with comments
by Ron Adam 10 Jun '13

There were a few people who liked the idea of having comments after a line continuation. For testing some ideas, I was able to make a small patch that removes some of the restrictions on the '\'. It does the following:

* Allows a line to continue on the same line.
* Skips comments before checking for the new line after a backslash.

Here are some examples...

These are technically the same.

    >>> 'aaa' \
    ... 'bbb' \
    ... 'ccc'
    'aaabbbccc'

    >>> 'aaa' \ 'bbb' \ 'ccc'
    'aaabbbccc'

Yes, there isn't much need for this, but I wanted to see if it would work and if the test suite passes. It does. ;-)

You can put a comment after a line continuation.

    >>> 'aaa' \# one
    ... 'bbb' \# two
    ... 'ccc' # three
    'aaabbbccc'

Works with expressions too.

    >>> result = \
    ... + 111 \# A
    ... + 222 \# B
    ... + 333 \# C
    ... + 444 # D
    >>> result
    1110

But if it has a space between the \ and the #, the line is continued on the same line instead of the following line.

    >>> 'aaa' \ #comment
    'aaa'

The reason \# works, but not \ #, is that when the comment comes directly after the backslash, it's removed and leaves a (backslash + new-line) pair. Removing the white space before the new line check caused some errors in the test suite. I haven't figured out why yet. So this doesn't do that for now.

Currently you get this if you try any of these examples.

    >>> 'abc' \#comment
      File "<stdin>", line 1
        'abc' \#comment
                      ^
    SyntaxError: unexpected character after line continuation character

Only one of Python's tests fails, and I don't think it's related.

    test test_urllib2_localnet failed

See the diff below if you want to play with it. It's not big.

Cheers,
Ron

    diff -r 155e6fb309f5 Parser/tokenizer.c
    --- a/Parser/tokenizer.c    Tue May 21 21:02:04 2013 +0200
    +++ b/Parser/tokenizer.c    Tue May 21 22:10:31 2013 -0500
    @@ -1391,18 +1391,31 @@
     
      again:
         tok->start = NULL;
    +
    +    c = tok_nextc(tok);
    +
    +    /* Check if continuing line */
    +    if (tok->cont_line == 1 && c == '\n') {
    +        tok->cont_line = 0;
    +        c = tok_nextc(tok);
    +    }
    +
         /* Skip spaces */
    -    do {
    +    while (c == ' ' || c == '\t' || c == '\014') {
             c = tok_nextc(tok);
    -    } while (c == ' ' || c == '\t' || c == '\014');
    +        tok->cont_line = 0;
    +    }
     
         /* Set start of current token */
         tok->start = tok->cur - 1;
     
         /* Skip comment */
    -    if (c == '#')
    +    if (c == '#') {
             while (c != EOF && c != '\n')
                 c = tok_nextc(tok);
    +        tok_backup(tok, c);
    +        goto again;
    +    }
     
         /* Check for EOF and errors now */
         if (c == EOF) {
    @@ -1641,12 +1654,6 @@
     
         /* Line continuation */
         if (c == '\\') {
    -        c = tok_nextc(tok);
    -        if (c != '\n') {
    -            tok->done = E_LINECONT;
    -            tok->cur = tok->inp;
    -            return ERRORTOKEN;
    -        }
             tok->cont_line = 1;
             goto again; /* Read next line */
         }
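
For comparison with the patched behaviour, and separate from the patch itself, unpatched Python already allows a comment on every continued line when the expression is wrapped in parentheses. The sketch below rewrites the two commented examples from above that way.

```python
# Adjacent string literals inside parentheses, one comment per line.
text = (
    'aaa'  # one
    'bbb'  # two
    'ccc'  # three
)

# The same for an arithmetic expression.
result = (
    + 111  # A
    + 222  # B
    + 333  # C
    + 444  # D
)

print(text, result)
```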

PEP 426, YAML in the stdlib and implementation discovery
by Philipp A. 04 Jun '13

Hi,

reading PEP 426 <http://www.python.org/dev/peps/pep-0426/#switching-to-a-json-compatible-for…>, I made a connection to a (IMHO) longstanding issue: YAML not being in the stdlib.

I’m no big fan of JSON, because it’s so strict and verbose compared with YAML. I just think YAML is more pythonic, and a better choice for any kind of human-written data format.

So I devised 3 ideas:

1. *YAML in the stdlib*

   The stdlib shouldn’t get more C code; that’s what I’ve gathered. So let’s put a pure-python implementation of YAML into the stdlib.

   Let’s also strictly define the API and make it secure-by-naming™. What I mean is: let’s use the safe load function that doesn’t instantiate user-defined classes (in PyYAML called “safe_load”) as the default load function “load”, and call the unsafe one by a longer, explicit name (e.g. “unsafe_load” or “extended_load” or something).

   Let’s base the parser on generators, since generators are cool, easy to debug, and allow us to emit and test the token stream (unlike, e.g., the HTML parser we have).

2. *Implementation discovery*

   People want fast parsing. That’s incompatible with a pure python implementation. So let’s define (or use, if there is one I’m not aware of) a discovery mechanism that allows implementations of certain APIs to register themselves as such.

   Let “import yaml” use this mechanism to import a compatible 3rd party implementation in preference to the stdlib one.

   Let’s define a property of the implementation that tells the user which implementation he’s using, and a way to select a specific implementation (although that’s probably easily done by just not doing “import yaml”, but “import std_yaml” or “import pyyaml2”).

3. Allow YAML to be used besides JSON as metadata like in PEP 426 (so including either pymeta.yaml or pymeta.json makes a valid package).

I don’t propose that we exclusively use YAML, but only because I think that PEP 426 shouldn’t be hindered from being implemented ASAP by waiting for a new std-library to be ready.

What do you think? Is there a reason for not including a YAML lib that I didn’t cover? Is there a reason JSON is used other than YAML not being in the stdlib?
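
The simplest form of the discovery idea in point 2 is "try the preferred implementations in order, fall back to the stdlib one". Here is a rough sketch of that with `importlib`. The function `import_first_available` and the module name `_hypothetical_fast_json` are made up for this example, and `json` stands in for the not-yet-existing stdlib YAML module so that the code can actually run; a real registration mechanism would need more than this.

```python
import importlib


def import_first_available(candidates):
    """Hypothetical helper: return the first importable module from a list."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (list(candidates),))


# Prefer a (made-up) fast third-party module, fall back to the stdlib one.
impl = import_first_available(["_hypothetical_fast_json", "json"])

# The module name tells the user which implementation was picked.
print(impl.__name__)
```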
