It would be nice if there were an escape hatch, in the form of an environment
variable like PYTHONPREFERREDENCODING (or something like that), for situations
where the value of locale.getpreferredencoding() can't be changed (e.g. Windows
- try changing that to UTF-8
<http://blog.ionelmc.ro/2014/06/19/just-another-day-using-python-3/>).
The idea is that it would override the default encoding used by open() on
platforms or in situations where it's infeasible to manually pass an encoding
to every open() call (e.g. lots of old code) or to change the locale to
something UTF-8-ish (Windows).
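To make the intent concrete, here is a rough user-level shim with roughly the
proposed behaviour (PYTHONPREFERREDENCODING does not exist today, and
monkeypatching builtins.open is only for illustration, not part of the
proposal):

import builtins
import os

_preferred = os.environ.get("PYTHONPREFERREDENCODING")
_real_open = builtins.open

def open(file, mode="r", buffering=-1, encoding=None, *args, **kwargs):
    # Only text-mode calls that didn't specify an encoding are affected.
    if _preferred and encoding is None and "b" not in mode:
        encoding = _preferred
    return _real_open(file, mode, buffering, encoding, *args, **kwargs)

builtins.open = open

Presumably the real thing would hook into the default-encoding selection in
the io machinery (where locale.getpreferredencoding(False) is consulted now)
rather than wrapping open().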
I found an old thread
<http://grokbase.com/t/python/python-dev/116w24gdra/open-set-the-default-enc…>
about this problem, but to my bewilderment no one considered using an
environment variable.
Thanks,
-- Ionel M.
[resending w/o Google Groups
<https://groups.google.com/d/msg/python-ideas/PRLbe6ERtx4/0fXq3lI6TjgJ>]
I'm not sure whether this is beating a dead horse; I could only find vaguely
related discussions on other scoping issues (so please, by all means, point me
to past discussions of what I propose).
The interpreter currently supports setting a custom type for globals() and
overriding __getitem__. The same is not true for __setitem__:
import dis

class Namespace(dict):
    def __getitem__(self, key):
        print("getitem", key)
    def __setitem__(self, key, value):
        print("setitem", key, value)

def fun():
    global x, y
    x      # should call globals.__getitem__
    y = 1  # should call globals.__setitem__

dis.dis(fun)
#   3           0 LOAD_GLOBAL              0 (x)
#               3 POP_TOP
#
#   4           4 LOAD_CONST               1 (1)
#               7 STORE_GLOBAL             1 (y)
#              10 LOAD_CONST               0 (None)
#              13 RETURN_VALUE

exec(fun.__code__, Namespace())
# => getitem x
# no setitem :-(
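For completeness, one can also check where the value actually went (a quick
sketch, assuming the asymmetric behaviour described in this mail): the store
bypasses Namespace.__setitem__ and lands directly in the underlying dict
storage.

ns = Namespace()
exec(fun.__code__, ns)            # prints "getitem x" only
print(dict.__getitem__(ns, "y"))  # 1 -- written via the plain dict path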
I find it odd that reading global variables goes through the usual
magic methods just fine, while writing does not. The behaviour seems to
have been introduced in Python 3.3.x (commit e3ab8aa
<http://hg.python.org/cpython/rev/e3ab8aa0216c>) to support custom
__builtins__. The documentation is fuzzy on this point:
> If only globals is provided, it must be a dictionary, which will be used
> for both the global and the local variables. If globals and locals are
> given, they are used for the global and local variables, respectively. If
> provided, locals can be any mapping object.
People at python-list
<https://groups.google.com/d/msg/comp.lang.python/lqnYwf3-Pjw/EiaBJO5H3T0J>
were at odds over whether this was a bug, unspecified/unsupported behaviour, or a
deliberate design decision. If it is just unsupported, I don't think the
asymmetry makes it any better. If it is deliberate, I don't understand why
dispatching on the dictness of globals (PyDict_CheckExact(f_globals)) is
good enough for LOAD_GLOBAL, but not for STORE_GLOBAL in terms of
performance.
I have a patch (+ tests) to the current default branch straightening out
this asymmetry and will happily open a ticket if you think this is indeed a
bug.
Thanks in advance,
Robert
Hello.
This idea proposes enhancing the xmlrpc library by adding a couple of
introspectable servers and proxies. For instance, here's the output you get
with the current idioms.
>>> proxy = ServerProxy('http://localhost:8000')
>>> dir(proxy)
['_ServerProxy__allow_none', '_ServerProxy__close',
'_ServerProxy__encoding', '_ServerProxy__handler',
'_ServerProxy__host', '_ServerProxy__request',
'_ServerProxy__transport', '_ServerProxy__verbose', '__call__',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__enter__', '__eq__', '__exit__', '__format__', '__ge__',
'__getattr__'
, '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
'__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__']
Nothing useful in dir. The following works only if the server enables
introspection:
>>> proxy.system.listMethods()
['mul', 'pow', 'system.listMethods', 'system.methodHelp',
'system.methodSignature']
Now, let's see what mul does:
>>> proxy.mul
<xmlrpc.client._Method object at 0x02AFB690>
>>> help(proxy.mul)
Help on _Method in module xmlrpc.client object:
class _Method(builtins.object)
| Methods defined here:
|
| __call__(self, *args)
|
| __getattr__(self, name)
|
| __init__(self, send, name)
| # some magic to bind an XML-RPC method to an RPC server.
| # supports "nested" methods (e.g. examples.getStateName)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
Nothing useful for us. Neither methodHelp nor methodSignature is very helpful:
>>> proxy.system.methodHelp('mul')
'multiplication'
>>> proxy.system.methodSignature('mul')
'signatures not supported'
We can find out something about that method by calling it.
>>> proxy.mul(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1091, in __call__
return self.__send(self.__name, args)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1421, in __request
verbose=self.__verbose
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1133, in request
return self.single_request(host, handler, request_body, verbose)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1149, in single_request
return self.parse_response(resp)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1320, in parse_response
return u.close()
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 658, in close
raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault 1: "<class 'TypeError'>:mul() takes 3
positional arguments but 4 were given">
So only after calling a method can one find meaningful information about it.
My idea behaves like this:
>>> from xmlrpc.client import MagicProxy  # not a very good name, but it does some magic behind the scenes
>>> proxy = MagicProxy('http://localhost:8000')
>>> dir(proxy)
['_ServerProxy__allow_none', '_ServerProxy__close',
'_ServerProxy__encoding', '_ServerProxy__handler',
'_ServerProxy__host', '_ServerProxy__request',
'_ServerProxy__transport', '_ServerProxy__verbose', '__call__',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__enter__', '__eq__', '__exit__', '__format__', '__ge__',
'__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_collect_methods', '_original_mul', '_original_pow', 'mul', 'pow']
>>> proxy.mul
<function mul at 0x035AD5D8>
>>> proxy.pow
<function pow at 0x035AD638>
>>> help(proxy.mul)
Help on function mul in module xmlrpc.client:
mul(x:1, y) -> 2
multiplication
>>> help(proxy.pow)
Help on function pow in module xmlrpc.client:
pow(*args, **kwargs)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> proxy.mul(1)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: mul() missing 1 required positional argument: 'y'
>>> proxy.mul(1, 2, 3)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: mul() takes 2 positional arguments but 3 were given
>>> proxy.mul(1, 2)
2
>>> import inspect
>>> inspect.signature(proxy.mul)
<Signature at 0x35d4b98 "(x:1, y) -> 2">
>>>
As we can see, the registered methods can be introspected, and calling
one with the wrong number of arguments will not trigger a request to
the server, but will fail right in the user's code.
One limitation is that this will work only for servers written in Python;
for other servers it will fall back to the current idiom.
Would something like this be useful as an addition to the stdlib's
xmlrpc module?
If someone wants to test it, here's a rough patch against tip:
https://gist.github.com/PCManticore/cf82ab421d4dc5c7f6ff.
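For anyone who wants to reproduce the transcripts above, here is a minimal
server sketch (the mul/pow registrations, the odd annotations and the port are
my assumptions, based only on the output shown):

from xmlrpc.server import SimpleXMLRPCServer

def mul(x: 1, y) -> 2:
    # annotations mirror the "mul(x:1, y) -> 2" help output above
    "multiplication"
    return x * y

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()  # enables system.listMethods etc.
server.register_function(mul)
server.register_function(pow)              # built-in pow, as in the help output
server.serve_forever()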
Thanks!
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in Python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators on bytes objects. Is upstream Python interested in this kind of
behavior by default? At the least, it would make many algorithms much easier
to read and write.
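For comparison, here is what one typically writes today (a sketch, not part of
the proposal):

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # element-wise XOR of two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

def xor_bytes_via_int(a: bytes, b: bytes) -> bytes:
    # same thing via integer conversion, preserving leading zero bytes
    n = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")

Under the proposal this would presumably just be spelled a ^ b.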
Nathaniel
I was wondering what work is being done on Python to make it faster. I
understand that CPython is incrementally improved. I'm not sure, but I
think that PyPy's acceleration works by compiling a restricted set of Python.
And I think I heard something about Guido working on a different model for
accelerating Python. I apologize in advance that I didn't look into these
projects in a lot of detail. My number one dream about computer languages
is to be able to write in a language as easy as Python and have it
run as quickly as if it were written in C++. I do believe that this is possible
(since in theory someone could look at my Python code and port it to C++).
Unfortunately, I don't have time to work on this goal, but I still wanted
to get feedback about some ideas I have about reaching it.
First, I don't think it's important for a "code block" (say, a small
section of code with less coupling to statements outside the block than to
statements within it) to run quickly on its first iteration.
What I'm suggesting instead is that for every iteration of a "code block",
the runtime stochastically decides whether to collect statistics about that
iteration. Those statistics include the time spent running the block, the
time spent performing attribute accesses (including type and method lookups),
and so on. Basically, the runtime is trying to estimate the potential savings
of optimizing this block.
If the block is run many times and the potential savings are large, then,
again stochastically, the block is promoted to a second-level statistics
collection. This level collects statistics about all of the external
couplings of the block, like the types and values of the passed-in and
returned values.
Using the second-level statistics, the runtime can now guess whether the
block should be promoted to a third level whereby any consistencies are
exploited. For example, if the passed-in parameter types and return value
type of the "min" function are (int, int, int) for 40% of the statistics
and (float, float, float) for 59%, and other types for the remaining 1%,
then two precompiled versions of min are generated: one for int and one for
float.
These precompiled code blocks have different costs than regular Python
blocks. They need to pay the following costs:
* a check for the required invariants (parameter types above, but it could
be parameter values, or other invariants)
* they need to install hooks on objects that must remain invariant during
the execution of the block; if the invariants are ever violated during the
execution of the block, then all of the computations done during this
execution of the block must be discarded
* therefore a third cost is the probability of discarding the computation
times the average cost of the wasted computation.
The savings are that the code block
* can be transformed into faster bytecode, including straight assembly
instructions in some sections, since types or values can now be assumed,
* can use data structures that make type or access assumptions (for example
a list that always contains ints can use a flattened representation; a
large set that is repeatedly having membership checked with many negative
results might benefit from an auxiliary bloom filter, etc.)
In summary, the runtime performs stochastic, incremental promotion of code
blocks from first-level statistics collection, to second-level collection, to
multiple precompiled versions. It can also demote a code block. The difference
between the costs of the different levels is estimated statistically.
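To make the first promotion step concrete, here is a toy sketch (all names and
thresholds are mine and purely illustrative; a real implementation would live
inside the runtime rather than in a decorator, and would actually generate
specialized code):

import random
import time
from collections import Counter

SAMPLE_RATE = 0.01   # fraction of calls that pay the profiling cost
MIN_SAMPLES = 1000   # samples to gather before looking at the numbers
DOMINANCE = 0.90     # how lopsided the type mix must be to specialize

def profiled(func):
    type_stats = Counter()
    total_time = 0.0   # crude signal for the potential savings

    def wrapper(*args, **kwargs):
        nonlocal total_time
        if random.random() >= SAMPLE_RATE:
            return func(*args, **kwargs)          # common case: no overhead
        start = time.perf_counter()
        result = func(*args, **kwargs)
        total_time += time.perf_counter() - start
        type_stats[tuple(type(a) for a in args) + (type(result),)] += 1
        samples = sum(type_stats.values())
        if samples >= MIN_SAMPLES:
            sig, count = type_stats.most_common(1)[0]
            if count / samples >= DOMINANCE:
                # Here the runtime would emit a version of `func` specialized
                # for `sig`, guarded by a cheap type check on the arguments,
                # with `func` itself kept as the fallback path.
                wrapper.promotion_candidate = sig
        return result

    return wrapper

Wrapping a function such as min this way and calling it mostly with ints would
eventually record (int, int, int, int) (three arguments plus the return type)
as the dominant signature, mirroring the 40%/59%/1% example above.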
Examples of optimizations that can be commonly accomplished using such a
system are:
* global variables are folded into code as constants. (Even if they change
rarely, you pay the discarding penalty described above plus the
recompilation cost; the benefit of inline use of the constant (and any
constant folding) might outweigh these costs.)
* lookup of member functions, which almost never change
* flattening of homogeneously-typed lists
Best,
Neil
At PyCon earlier this year, Guido (and others) persuaded me that the
integer based indexing and iteration for bytes and bytearray in Python
3 was a genuine design mistake based on the initial Python 3 design
which lacked an immutable bytes type entirely (so producing integers
was originally the only reasonable choice).
The earlier design discussions around PEP 467 (which proposes to clean
up a few other bits and pieces of that original legacy which PEP 3137
left in place) all treated "bytes indexing returns an integer" as an
unchangeable aspect of Python 3, since there wasn't an obvious way to
migrate to instead returning length 1 bytes objects with a reasonable
story to handle the incompatibility for Python 3 users, even if
everyone was in favour of the end result.
A few weeks ago I had an idea for a migration strategy that seemed
feasible, and I now have a very, very preliminary proof of concept up
at https://bitbucket.org/ncoghlan/cpython_sandbox/branch/bytes_migration_exper…
The general principle involved would be to return an integer *subtype*
from indexing and iteration operations on bytes, bytearray and
memoryview objects using the "default" format character. That subtype
would then be detected in various locations and handled the way a
length 1 bytes object would be handled, rather than the way an integer
would be handled. The current proof of concept adds such handling to
ord(), bytes() and bytearray() (with appropriate test cases in
test_bytes) giving the following results:
>>> b'hello'[0]
104
>>> ord(b'hello'[0])
104
>>> bytes(b'hello'[0])
b'h'
>>> bytearray(b'hello'[0])
bytearray(b'h')
(the subtype is currently visible at the Python level as "types._BytesInt")
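For readers who haven't looked at the branch, here is a rough pure-Python
illustration of the intended semantics (the class and helper names are mine;
the real subtype is implemented in C and would be recognised directly by
ord(), bytes() and bytearray()):

class BytesInt(int):
    """Toy stand-in for the proposed int subtype."""
    def as_byte(self):
        # the single byte this integer stands for
        return bytes([self])

item = BytesInt(b'hello'[0])
print(item)            # 104 -- still behaves as an int everywhere
print(item + 1)        # 105
print(item.as_byte())  # b'h' -- the view that bytes()/bytearray() would
                       # produce for the real subtype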
The proof of concept doesn't override any normal integer behaviour,
but a more complete solution would be in a position to emit a warning
when the result of binary indexing is used as an integer (either
always, or controlled by a command line switch, depending on the
performance impact).
With this integer subtype in place for Python 3.5 to provide a
transition period where both existing integer-compatible operations
(like int() and arithmetic operations) and selected bytes-compatible
operations (like ord(), bytes() and bytearray()) are supported, these
operations could then be switched to producing a normal length 1 bytes
object in Python 3.6.
It wouldn't be pretty, and it would be a pain to document, but it
seems feasible. The alternative is for PEP 467 to add a separate bytes
iteration method, which strikes me as further entrenching a design we
aren't currently happy with.
Regards,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
I've seen this proposed before, and I personally would love this, but my
guess is that it breaks too much code for too little gain.
On Wednesday, May 21, 2014 12:33:30 PM UTC-4, Frédéric Legembre wrote:
>
>
>  Now    | Future | Literal form
>  -------+--------+-----------------------------------
>  ()     | ()     | empty tuple   ( 1, 2, 3 )
>  []     | []     | empty list    [ 1, 2, 3 ]
>  set()  | {}     | empty set     { 1, 2, 3 }
>  {}     | {:}    | empty dict    { 1:a, 2:b, 3:c }
>
>
I suggest implementing:
- `itertools.permutations.__getitem__`, for getting a permutation by its
index number, and possibly also slicing, and
- `itertools.permutations.index` for getting the index number of a given
permutation.
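For concreteness, here is a sketch of how `__getitem__` could be computed
without iterating, using the factorial number system (the function name is
mine, not a proposed API):

from math import factorial

def nth_permutation(pool, index, r=None):
    # The tuple that itertools.permutations(pool, r) would yield at
    # position `index` (0-based), computed without iterating.
    pool = list(pool)
    n = len(pool)
    r = n if r is None else r
    total = factorial(n) // factorial(n - r)
    if not 0 <= index < total:
        raise IndexError("permutation index out of range")
    result = []
    for i in range(r):
        total //= n - i
        j, index = divmod(index, total)
        result.append(pool.pop(j))
    return tuple(result)

>>> import itertools
>>> nth_permutation('abcd', 5)
('a', 'd', 'c', 'b')
>>> list(itertools.permutations('abcd'))[5]
('a', 'd', 'c', 'b')

The proposed `.index` would be the same computation run in reverse.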
What do you think?
Thanks,
Ram.
** The problem
A long-standing problem with CPython is that the peephole optimizer
cannot be completely disabled. Normally, peephole optimization is a
good thing, it improves execution speed. But in some situations, like
coverage testing, it's more important to be able to reason about the
code's execution. I propose that we add a way to completely disable the
optimizer.
To demonstrate the problem, here is continue.py:
a = b = c = 0
for n in range(100):
    if n % 2:
        if n % 4:
            a += 1
        continue
    else:
        b += 1
    c += 1
assert a == 50 and b == 50 and c == 50
If you execute "python3.4 -m trace -c -m continue.py", it produces this
continue.cover file:
    1: a = b = c = 0
  101: for n in range(100):
  100:     if n % 2:
   50:         if n % 4:
   50:             a += 1
>>>>>>         continue
           else:
   50:         b += 1
   50:     c += 1
    1: assert a == 50 and b == 50 and c == 50
This indicates that the continue line is not executed. It's true: the
byte code for that statement is not executed, because the peephole
optimizer has removed the jump to the jump. But in reasoning about the
code, the continue statement is clearly part of the semantics of this
program. If you remove the statement, the program will run
differently. If you had to explain this code to a learner, you would of
course describe the continue statement as part of the execution. So the
trace output does not match our (correct) understanding of the program.
The reason we are running trace (or coverage.py) in the first place is
to learn something about our code, but it is misleading us. The peephole
optimizer is interfering with our ability to reason about the code. We
need a way to disable the optimizer so that this won't happen. This
type of control is well-known in C compilers, for the same reasons: when
running code, optimization is good for speed; when reasoning about code,
optimization gets in the way.
More details are in http://bugs.python.org/issue2506, which also
includes previous discussion of the idea.
This has come up on Python-Dev, and Guido seemed supportive:
https://mail.python.org/pipermail/python-dev/2012-December/123099.html .
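One way to see the effect directly (a sketch; run it against the continue.py
above under CPython 3.4) is to disassemble the compiled module and look for
bytecode attributed to the continue line -- after the peephole pass, the
expectation is that no instruction carries that line number, because the
jump-to-jump was retargeted:

import dis

with open("continue.py") as f:
    code = compile(f.read(), "continue.py", "exec")
dis.dis(code)   # inspect the line numbers in the left-hand column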
** Implementation
Although it may seem like a big change to be able to disable the
optimizer, the heart of it is quite simple. In compile.c is the only
call to PyCode_Optimize. That function takes a string of bytecode and
returns another. If we skip that call, the peephole optimizer is disabled.
** User Interface
Unfortunately, the -O command-line switch does not lend itself to a new
value that means, "less optimization than the default." I propose a new
switch -P, to control the peephole optimizer, with a value of -P0
meaning no optimization at all. The PYTHONPEEPHOLE environment variable
would also control the option.
There are about a dozen places internal to CPython where optimization
level is indicated with an integer, for example, in
Py_CompileStringObject. Those uses also don't allow for new values
indicating less optimization than the default: 0 and -1 already have
meanings, unless we want to start using -2 for less than the default.
I'm not sure we need to provide for those values, or if the
PYTHONPEEPHOLE environment variable provides enough control.
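Under the proposal (to be clear, neither the -P switch nor the PYTHONPEEPHOLE
variable exists today), rerunning the trace example with the optimizer
disabled would look something like:

PYTHONPEEPHOLE=0 python3.4 -m trace -c -m continue.py
python3.4 -P0 -m trace -c -m continue.py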
** Ramifications
This switch makes no changes to the semantics of Python programs,
although clearly, if you are tracing a program, the exact sequence of
lines and bytecodes will be different (this is the whole point).
In the ticket, one objection raised is that providing this option will
complicate testing, and that optimization is a difficult enough thing to
get right as it is. I disagree: I think providing this option will help
test the optimizer, because it will give us a way to test that code runs
the same with and without the optimizer. This gives us a tool to use to
demonstrate that the optimizer isn't changing the behavior of programs.