There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned'|abc.abstract} x {'normal'|static|class} x {method|property|non-callable attribute}.
concreteness | implicit first arg | type                   | name                                               | comments
-------------+--------------------+------------------------+----------------------------------------------------+------------
{unadorned}  | {unadorned}        | method                 | def foo():                                         | exists now
{unadorned}  | {unadorned}        | property               | @property                                          | exists now
{unadorned}  | {unadorned}        | non-callable attribute | x = 2                                              | exists now
{unadorned}  | static             | method                 | @staticmethod                                      | exists now
{unadorned}  | static             | property               | @staticproperty                                    | proposing
{unadorned}  | static             | non-callable attribute | {degenerate case}                                  | unnecessary
{unadorned}  | class              | method                 | @classmethod                                       | exists now
{unadorned}  | class              | property               | @classproperty or @classmethod;@property           | proposing
{unadorned}  | class              | non-callable attribute | {degenerate case}                                  | unnecessary
abc.abstract | {unadorned}        | method                 | @abc.abstractmethod                                | exists now
abc.abstract | {unadorned}        | property               | @abc.abstractproperty                              | exists now
abc.abstract | {unadorned}        | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static             | method                 | @abc.abstractstaticmethod                          | exists now
abc.abstract | static             | property               | @abc.staticproperty                                | proposing
abc.abstract | static             | non-callable attribute | {degenerate case}                                  | unnecessary
abc.abstract | class              | method                 | @abc.abstractclassmethod                           | exists now
abc.abstract | class              | property               | @abc.abstractclassproperty                         | proposing
abc.abstract | class              | non-callable attribute | {degenerate case}                                  | unnecessary
{degenerate case}: variables don't have arguments.
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property, only without an implicit first
argument. Allows the property to be called directly on the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to
the method is the class. Allows the property to be called directly on
the class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses.
@abc.abstractstaticproperty - like @abc.abstractproperty, only for
@staticproperty.
@abc.abstractclassproperty - like @abc.abstractproperty, only for
@classproperty.
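For illustration, the @classproperty case can already be approximated today with a small descriptor. This is only a sketch (the class and attribute names are invented here), and it is read-only; setter support is where the real design work would be:

```python
class classproperty:
    """Sketch of the proposed @classproperty: a read-only property
    whose getter receives the class rather than an instance."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        return self.fget(cls)


class Foo:
    _name = 'Foo'

    @classproperty
    def name(cls):
        return cls._name


assert Foo.name == 'Foo'      # works directly on the class
assert Foo().name == 'Foo'    # and on instances
```

Note that naively stacking @classmethod and @property, as in the snippet at the top, does not work today because property wraps the classmethod object rather than composing with it.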
--rich
At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could still pass an iterable (as now) as the second argument, but if
you pass a single integer value, it would be treated as the number of
items to allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
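For what it's worth, a much faster workaround already exists: array objects support sequence repetition, which allocates the result in one step. This is just an alternative sketch, not the proposed constructor change:

```python
import array

def filled_array(typecode, n, value=0):
    # Repeating a one-element array allocates all n items at once,
    # avoiding the chunked extend() loop entirely.
    return array.array(typecode, [value]) * n

a = filled_array('l', 1000)
assert len(a) == 1000
assert a[0] == 0 and a[-1] == 0
```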
While I was implementing JSON-JWS (JSON web signatures), a format
which in Python 3 has to go from bytes > unicode > bytes > unicode
several times in its construction, I noticed I wrote a lot of bugs:
"sha256=b'abcdef1234'"
When I meant to say:
"sha256=abcdef1234"
Everything worked perfectly on Python 3 because the verifying code
also generated the sha256=b'abcdef1234' as a comparison. I would have
never noticed at all unless I had tried to verify the Python 3 output
with Python 2.
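The failure mode is easy to reproduce in Python 3: %s silently interpolates the repr of a bytes object, while concatenation fails loudly (the digest value here is made up):

```python
digest = b'abcdef1234'          # bytes, e.g. the output of hexlify()
header = "sha256=%s" % digest   # no error -- the repr is interpolated
assert header == "sha256=b'abcdef1234'"   # not the intended "sha256=abcdef1234"

# By contrast, str + bytes raises immediately:
try:
    "sha256=" + digest
except TypeError:
    pass  # e.g. TypeError: Can't convert 'bytes' object to str implicitly
```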
I know I'm a bad person for not having unit tests capable enough to
catch this bug, a bug I wrote repeatedly in each layer of the bytes >
unicode > bytes > unicode dance, and that there is no excuse for being
confused at any time about the type of a variable, but I'm not willing
to reform.
Instead, I would like a new string formatting operator tentatively
called 'notbytes': "sha256=%notbytes" % (b'abcdef1234'). It would give
the same error as 'sha256=' + b'abc1234' does: TypeError: Can't
convert 'bytes' object to str implicitly
Just an idea of usability fix for Python 3.
I propose a hexdump module (a function, or better, a bytes method) as
a simple, easy and intuitive way of dumping binary data when writing
programs in Python.
hexdump(bytes) - produce a human-readable dump of binary data: a
byte-by-byte representation, separated by spaces, in 16-byte rows
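A minimal sketch of such a function (Python 3; the 16-byte row width and uppercase style follow the example below, but the exact API is open):

```python
def hexdump(data, width=16):
    """Return a human-readable hex dump of a bytes object:
    two uppercase hex digits per byte, space-separated,
    width bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(' '.join('%02X' % byte for byte in chunk))
    return '\n'.join(rows)

assert hexdump(b'\xe6\xb0\x08\x04') == 'E6 B0 08 04'
```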
Rationale:
1. Debug.
Generic binary data can't be output to console. A separate helper
is needed to print, log or store its value in human readable format in
database. This takes time.
2. Usability.
binascii is ugly: the name is no longer intuitive, there are a lot
of functions, and it is not clear how they relate to unicode.
3. Serialization.
It is convenient to have format that can be displayed in a text
editor. Simple tools encourage people to use them.
Practical example:
>>> print(b)
� � � �� �� � �� �� �
� � �
>>> b
'\xe6\xb0\x08\x04\xe7\x9e\x08\x04\xe7\xbc\x08\x04\xe7\xd5\x08\x04\xe7\xe4\x08\x04\xe6\xb0\x08\x04\xe7\xf0\x08\x04\xe7\xff\x08\x04\xe8\x0b\x08\x04\xe8\x1a\x08\x04\xe6\xb0\x08\x04\xe6\xb0\x08\x04'
>>> print(binascii.hexlify(data))
e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804
>>>
>>> data = hexdump(b)
>>> print(data)
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
>>>
>>> # achieving the same output with binascii is overcomplicated
>>> data_lines = [binascii.hexlify(b)[i:min(i+32, len(binascii.hexlify(b)))] for i in xrange(0, len(binascii.hexlify(b)), 32)]
>>> data_lines = [' '.join(l[i:min(i+2, len(l))] for i in xrange(0, len(l), 2)).upper() for l in data_lines]
>>> print('\n'.join(data_lines))
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
On the other hand, getting the rather useless binascii-style output
back from hexdump() output is trivial:
>>> data.replace(' ','').replace('\n','').lower()
'e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804'
More practical, for example, would be adding offsets to the hexdump:
>>> print( ''.join( '%05x: %s\n' % (i*16,l) for i,l in enumerate(hexdump(b).split('\n'))))
Etc.
Conclusion:
By providing better building blocks at the basic level, Python will
become a better tool for more useful tasks.
References:
[1] http://stackoverflow.com/questions/2340319/python-3-1-1-string-to-hex
[2] http://en.wikipedia.org/wiki/Hex_dump
--
anatoly t.
Work priorities don't allow me to spend another day replying in detail
to the various emails on this topic, but I am still keeping up
reading!
I have read Greg's response to my comparison between
Future+yield-based coroutines and his yield-from-based, Future-free
coroutines, and after having written a small prototype, I am now
pretty much convinced that Greg's way is superior. This doesn't mean
you can't use generators or yield-from for other purposes! It's just
that *if* you are writing a coroutine for use with a certain
scheduler, you must use yield and yield-from in accordance with that
scheduler's rules. However, code you call can still use yield and yield-from for
iteration, and you can still use for-loops. In particular, if f is a
coroutine, it can still write "for x in g(): ..." where g is a
generator meant to be an iterator. However if g were instead a
coroutine, f should call it using "yield from g()", and f and g should
agree on the interface of their scheduler.
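The distinction can be illustrated with a tiny sketch (names invented here): plain iteration over a generator still works inside any function, while yield-from delegates to a sub-generator and also carries its return value back to the caller:

```python
def inner():
    # An ordinary generator, usable as a plain iterator.
    yield 'a'
    yield 'b'
    return 'done'            # this value travels back through "yield from"

def outer():
    # "for x in g()": plain iteration, no scheduler involved;
    # the return value of inner() is discarded here.
    for x in inner():
        yield x.upper()
    # "yield from g()": delegation; inner()'s yields pass through,
    # and its return value becomes the value of the expression.
    result = yield from inner()
    yield result

assert list(outer()) == ['A', 'B', 'a', 'b', 'done']
```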
As to other topics, my current feeling is that we should try to
separately develop requirements and prototype implementations of the
I/O loop of the future, and to figure the loosest possible coupling
between that and a coroutine scheduler (or any other type of
scheduler). In particular, I think the I/O loop should not assume the
event handlers are implemented using coroutines -- but if someone
wants to write an awesome coroutine scheduler, they should be able to
delegate all their I/O waiting needs to the I/O loop with very little
trouble.
To me, this means that the I/O loop probably should use "plain"
callback functions (i.e., not Futures, Deferreds or coroutines). We
should also standardize the interface to the I/O loop so that 3rd
parties can plug in their own I/O loop -- I don't see an end to the
debate whether the best C library for event handling is libevent,
libev or libuv.
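As a concrete sketch of what "plain callbacks" could mean (the class and method names here are invented for illustration, not a proposed API), a minimal select-based loop just registers one callback per readable file descriptor:

```python
import select

class CallbackLoop:
    """Minimal sketch of a callback-based I/O loop (hypothetical interface)."""

    def __init__(self):
        self._readers = {}   # fd -> callback

    def add_reader(self, fd, callback):
        self._readers[fd] = callback

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def run_once(self, timeout=1.0):
        """Wait for readiness and invoke callbacks; one loop iteration."""
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        for fd in ready:
            self._readers[fd](fd)

# Usage: data written to one end of a socketpair triggers a callback.
import socket
a, b = socket.socketpair()
loop = CallbackLoop()
received = []
loop.add_reader(b.fileno(), lambda fd: received.append(b.recv(16)))
a.send(b'ping')
loop.run_once()
assert received == [b'ping']
```

A coroutine scheduler (or a Futures/Deferreds layer) could then be built on top of exactly this kind of interface without the loop itself knowing about it.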
While the focus of the I/O loop should be on single-threaded event
handling, some standard interface should exist so that you can run
certain code in a separate thread and wait for its completion -- I've
found this handy when calling socket.getaddrinfo(), which may block.
(Apparently async DNS lookups are really hard -- I read some
complaints about libevent's DNS lookups, and IIUC many Firefox
lockups are due to this.) But there may be other uses for this too.
An issue in the design of the I/O loop is the strain between a
ready-based and completion-based design. The typical Unix design
(whether based on select or any of the poll variants) is usually
ready-based; but on Windows, the only way to get high performance is
to base it on IOCP, which is completion-based (i.e. you start a
specific async operation, like writing N bytes, and the I/O loop tells
you when it is done). I would like people to be able to write fast
event handling programs on Windows too, and ideally the only change
would be the implementation of the I/O loop. But I don't know how
tenable that is given the dramatically different style used by IOCP
and the need to use native Windows API for all async I/O -- it sounds
like we could only do this if the library providing the I/O loop
implementation also wrapped all I/O operations, and that may be a bit
much.
Finally, there should also be some minimal interface so that multiple
I/O loops can interact -- at least in the case where one I/O loop
belongs to a GUI library. It seems this is a solved problem (as well
solved as you can hope for) to Twisted, so we should just adopt their
approach.
--
--Guido van Rossum (python.org/~guido)
I'd like to propose adding the ability for context managers to catch and
handle control passing into and out of them via yield and generator.send()
/ generator.next().
For instance,
class cd(object):
    def __init__(self, path):
        self.inner_path = path
    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)
    def __yield__(self):
        self.inner_path = os.getcwd()
        os.chdir(self.outer_path)
    def __send__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
Here __yield__() would be called when control is yielded through the with
block and __send__() would be called when control is returned via .send()
or .next(). To maintain compatibility, it would not be an error to leave
either __yield__ or __send__ undefined.
The rationale for this is that it's sometimes useful for a context manager
to set global or thread-global state as in the example above, but when the
code is used in a generator, the author of the generator needs to make
assumptions about what the calling code is doing. e.g.
def my_generator(path):
    with cd(path):
        yield do_something()
        do_something_else()
Even if the author of this generator knows what effect do_something() and
do_something_else() have on the current working directory, the author needs
to assume that the caller of the generator isn't touching the working
directory. For instance, if someone were to create two my_generator()
generators with different paths and advance them alternately, the resulting
behaviour could be most unexpected. With the proposed change, the context
manager would be able to handle this so that the author of the generator
doesn't need to make these assumptions.
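The hazard is easy to demonstrate today with a cd() manager like the one above (temporary directories stand in for real paths in this sketch):

```python
import os
import tempfile

start = os.getcwd()

class cd(object):
    def __init__(self, path):
        self.inner_path = path
    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)

def my_generator(path):
    with cd(path):
        yield os.getcwd()   # stand-in for do_something()
        yield os.getcwd()

d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
g1, g2 = my_generator(d1), my_generator(d2)
first = next(g1)    # cwd is now d1
next(g2)            # cwd is now d2 -- g1's context is silently clobbered
second = next(g1)   # g1 resumes, but the cwd it observes is d2, not d1
assert first != second   # the "current" directory changed under g1's feet

g1.close()
g2.close()
os.chdir(start)     # clean up after the deliberately messy example
```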
Naturally, nested with blocks would be handled by calling __yield__ from
innermost to outermost and __send__ from outermost to innermost.
I rather suspect that if this change were included, someone could come up
with a variant of the contextlib.contextmanager decorator to simplify
writing generators for this sort of situation.
Cheers,
J. D. Bartlett
Sometimes, I have the flexibility to reduce the memory used by my
program (e.g., by destroying large cached objects). It would be
great if I could ask the Python interpreter to notify me when memory
is running out, so I can take such actions.
Of course, it's nearly impossible for Python to know in advance if the
OS would run out of memory with the next malloc call. Furthermore,
Python shouldn't guess which memory (physical, virtual, etc.) is
relevant in the particular situation (for instance, in my case, I only
care about physical memory, since swapping to disk makes my
application as good as frozen). So the problem as stated above is
unsolvable.
But let's say I am willing to do some work to estimate the maximum
amount of memory my application can be allowed to use. If I provide
that number to the Python interpreter, it may be possible for it to notify
me when the next memory allocation would exceed this limit by calling
a function I provide it (hopefully passing as arguments the amount of
memory being requested, as well as the amount currently in use). My
callback function could then destroy some objects, and return True to
indicate that some objects were destroyed. At that point, the
interpreter could run its standard garbage collection routines to
release the memory that corresponded to those objects - before
proceeding with whatever it was trying to do originally. (If I
returned False, or if I didn't provide a callback function at all, the
interpreter would simply behave as it does today.) Any memory
allocations that happen while the callback function itself is
executing, would not trigger further calls to it. The whole mechanism
would be disabled for the rest of the session if the memory freed by
the callback function was insufficient to prevent going over the
memory limit.
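Nothing like this hook exists in the interpreter today, but a rough user-level approximation can be polled manually with the stdlib resource module (Unix-only; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS, so the unit factor below is a per-platform assumption):

```python
import resource

def check_memory(limit_bytes, cleanup):
    """Rough user-level sketch: call cleanup() once peak RSS exceeds
    limit_bytes. Returns True if cleanup ran (and claims success).

    The * 1024 assumes Linux, where ru_maxrss is in kilobytes;
    on macOS it is already in bytes.
    """
    used = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024
    if used > limit_bytes:
        return bool(cleanup())
    return False

# Any real process has used more than one byte, so cleanup runs:
assert check_memory(1, lambda: True) is True
# An absurdly high limit is never exceeded:
assert check_memory(1 << 60, lambda: True) is False
```

This only observes peak usage after the fact; the proposal above asks for the allocator itself to invoke the callback before an allocation crosses the limit, which cannot be emulated from pure Python.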
Would this be worth considering for a future language extension? How
hard would it be to implement?
Max
Hi,
I have seen many people new to Python stumbling while using the Python
docs due to the order of the search results.
For example, if somebody new to python searches for `tuple`, the
actual section about `tuple` comes in place 39. What is more confusing
for people starting with the language is that all the C functions come
first. I have seen people clicking in PyTupleObject just to be totally
disoriented.
Maybe `tuple` is a silly example. But if somebody wants to know how
`open` behaves and which arguments it takes, the result comes in
position 16. `property` does not appear in the list at all (though
built-in appears in position 31). This is true for most builtins.
Experienced people will have no trouble navigating these results, but
new users do. It is not terrible, and in the end they get there, but I
think it would be nice to change it to a more (new) user friendly
order.
So my suggestion is to put the builtins first, the rest of the
standard lib later, including HowTos, FAQ, etc., and finally the
C modules. Additionally, a section with a title matching exactly the
search query should come first. (I am not sure if the last suggestion
belongs in python-ideas or in the sphinx mailing list; please advise.)
Thanks,
Hernan
I am thinking about a [python-wart] tag on SO. There is currently no
list of Python warts, and building a better language is impossible
without clear visibility of the warts in current implementations.
Why Roundup doesn't work ATM.
- warts are lost among other "won't fix" and "works for me" issues
- no way to edit description to make it more clear
- no voting/stars to perceive how important an issue is
- no comment/noise filtering
and the most valuable
- there is no query to list warts sorted by popularity to explore other
time-consuming areas of Python you are not aware of, but which can popup
one day
SO at least allows:
+ voting
+ community wiki edits
+ useful comment upvoting
+ sorted lists
+ user editable tags (adding new warts is easy)
This post is a result of facing numerous locals/settrace/exec issues
that are closed on the tracker. I also have my own list of other
issues (logging/subprocess) at the GC project, which I might be unable
to maintain in the future. There is also some undocumented stuff
(subprocess deadlocks) that I'm investigating but don't have time to
write up. So I'd rather move this somewhere where it could be updated.
--
anatoly t.
Hi python-ideas.
I think it would be nice to have a method in 'list' to replace certain
elements by others in-place. Like this:
l = [x, a, y, a]
l.replace(a, b)
assert l == [x, b, y, b]
The alternatives are longer than they should be, imo. For example:
for i, n in enumerate(l):
    if n == a:
        l[i] = b
Or:
l = [b if n==a else n for n in l]
And this is what happens when someone tries to "optimize" this process.
It totally obscures the intention:
try:
    i = 0
    while i < len(l):
        i = l.index(a, i)
        l[i] = b
        i += 1
except ValueError:
    pass
If there is a reason not to add '.replace' as a built-in method, it
could be implemented efficiently in pure Python if Python provided a
version of '.index' that returns the indices of all occurrences of a
given item, not just the first. Like this:
l = [x, a, b, a]
for i in l.indices(a):
l[i] = b
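Such an .indices() helper can be sketched in pure Python today as a free function, building on list.index's start argument (the name is the hypothetical one proposed above):

```python
def indices(lst, value):
    """Yield every index at which value occurs in lst
    (sketch of the proposed list.indices)."""
    i = 0
    while True:
        try:
            i = lst.index(value, i)
        except ValueError:
            return
        yield i
        i += 1

# Usage: replacement via indices, as in the snippet above.
l = ['x', 'a', 'y', 'a']
for i in indices(l, 'a'):
    l[i] = 'b'
assert l == ['x', 'b', 'y', 'b']
```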
So adding .replace and/or .indices… Good idea? Bad idea?