It would be nice if there were an escape hatch, in the form of an environment
variable like PYTHONPREFERREDENCODING (or something like that), for situations
where the value of locale.getpreferredencoding() can't be changed (e.g. Windows
- try changing that to UTF-8
<http://blog.ionelmc.ro/2014/06/19/just-another-day-using-python-3/>).
The idea is that it would override the default encoding used by open() on
platforms or in situations where it's infeasible to manually pass an encoding
to every open() call (e.g. lots of old code) or to change the locale to
something UTF-8-ish (Windows).
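To make the intent concrete, here is a rough user-level shim with roughly the
proposed behaviour (PYTHONPREFERREDENCODING does not exist today, and
monkeypatching builtins.open is only for illustration, not part of the
proposal):

import builtins
import os

_preferred = os.environ.get("PYTHONPREFERREDENCODING")
_real_open = builtins.open

def open(file, mode="r", buffering=-1, encoding=None, *args, **kwargs):
    # Only text-mode calls that didn't specify an encoding are affected.
    if _preferred and encoding is None and "b" not in mode:
        encoding = _preferred
    return _real_open(file, mode, buffering, encoding, *args, **kwargs)

builtins.open = open

Presumably the real thing would hook into the default-encoding selection in
the io machinery (where locale.getpreferredencoding(False) is consulted now)
rather than wrapping open().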
I found an old thread
<http://grokbase.com/t/python/python-dev/116w24gdra/open-set-the-default-enc…>
about this problem, but to my bewilderment no one considered using an
environment variable.
Thanks,
-- Ionel M.
[resending w/o Google Groups
<https://groups.google.com/d/msg/python-ideas/PRLbe6ERtx4/0fXq3lI6TjgJ>]
I'm not sure whether this is beating a dead horse; I could only find vaguely
related discussions on other scoping issues (so please, by all means, point me
to past discussions of what I propose).
The interpreter currently supports setting a custom type for globals() and
overriding __getitem__. The same is not true for __setitem__:
import dis

class Namespace(dict):
    def __getitem__(self, key):
        print("getitem", key)
    def __setitem__(self, key, value):
        print("setitem", key, value)

def fun():
    global x, y
    x      # should call globals.__getitem__
    y = 1  # should call globals.__setitem__

dis.dis(fun)
#   3           0 LOAD_GLOBAL              0 (x)
#               3 POP_TOP
#
#   4           4 LOAD_CONST               1 (1)
#               7 STORE_GLOBAL             1 (y)
#              10 LOAD_CONST               0 (None)
#              13 RETURN_VALUE

exec(fun.__code__, Namespace())
# => getitem x
# no setitem :-(
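For completeness, one can also check where the value actually went (a quick
sketch, assuming the asymmetric behaviour described in this mail): the store
bypasses Namespace.__setitem__ and lands directly in the underlying dict
storage.

ns = Namespace()
exec(fun.__code__, ns)            # prints "getitem x" only
print(dict.__getitem__(ns, "y"))  # 1 -- written via the plain dict path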
I find it odd that reading global variables goes through the usual
magic methods just fine, while writing does not. The behaviour seems to
have been introduced in Python 3.3.x (commit e3ab8aa
<http://hg.python.org/cpython/rev/e3ab8aa0216c>) to support custom
__builtins__. The documentation is fuzzy on this point:
> If only globals is provided, it must be a dictionary, which will be used
> for both the global and the local variables. If globals and locals are
> given, they are used for the global and local variables, respectively. If
> provided, locals can be any mapping object.
People at python-list
<https://groups.google.com/d/msg/comp.lang.python/lqnYwf3-Pjw/EiaBJO5H3T0J>
were at odds over whether this was a bug, unspecified/unsupported behaviour, or a
deliberate design decision. If it is just unsupported, I don't think the
asymmetry makes it any better. If it is deliberate, I don't understand why
dispatching on the dictness of globals (PyDict_CheckExact(f_globals)) is
good enough for LOAD_GLOBAL, but not for STORE_GLOBAL in terms of
performance.
I have a patch (+ tests) to the current default branch straightening out
this asymmetry and will happily open a ticket if you think this is indeed a
bug.
Thanks in advance,
Robert
Hello.
This idea proposes enhancing the xmlrpc library by adding a couple of
introspectable servers and proxies. For instance, here's the output you get
with the current idioms.
>>> proxy = ServerProxy('http://localhost:8000')
>>> dir(proxy)
['_ServerProxy__allow_none', '_ServerProxy__close',
'_ServerProxy__encoding', '_ServerProxy__handler',
'_ServerProxy__host', '_ServerProxy__request',
'_ServerProxy__transport', '_ServerProxy__verbose', '__call__',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__enter__', '__eq__', '__exit__', '__format__', '__ge__',
'__getattr__'
, '__getattribute__', '__gt__', '__hash__', '__init__', '__le__',
'__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__']
Nothing useful in dir. The following works only if the server enables
introspection:
>>> proxy.system.listMethods()
['mul', 'pow', 'system.listMethods', 'system.methodHelp',
'system.methodSignature']
Now, let's see what mul does:
>>> proxy.mul
<xmlrpc.client._Method object at 0x02AFB690>
>>> help(proxy.mul)
Help on _Method in module xmlrpc.client object:
class _Method(builtins.object)
| Methods defined here:
|
| __call__(self, *args)
|
| __getattr__(self, name)
|
| __init__(self, send, name)
| # some magic to bind an XML-RPC method to an RPC server.
| # supports "nested" methods (e.g. examples.getStateName)
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
Nothing useful for us. Neither methodHelp nor methodSignature is very helpful:
>>> proxy.system.methodHelp('mul')
'multiplication'
>>> proxy.system.methodSignature('mul')
'signatures not supported'
We can find out something about that method by calling it.
>>> proxy.mul(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1091, in __call__
return self.__send(self.__name, args)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1421, in __request
verbose=self.__verbose
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1133, in request
return self.single_request(host, handler, request_body, verbose)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1149, in single_request
return self.parse_response(resp)
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 1320, in parse_response
return u.close()
File "D:\Projects\cpython\lib\xmlrpc\client.py", line 658, in close
raise Fault(**self._stack[0])
xmlrpc.client.Fault: <Fault 1: "<class 'TypeError'>:mul() takes 3
positional arguments but 4 were given">
So only after calling a method can one find meaningful information about it.
My idea behaves like this:
>>> from xmlrpc.client import MagicProxy  # not a very good name, but it does some magic behind the scenes
>>> proxy = MagicProxy('http://localhost:8000')
>>> dir(proxy)
['_ServerProxy__allow_none', '_ServerProxy__close',
'_ServerProxy__encoding', '_ServerProxy__handler',
'_ServerProxy__host', '_ServerProxy__request',
'_ServerProxy__transport', '_ServerProxy__verbose', '__call__',
'__class__', '__delattr__', '__dict__', '__dir__', '__doc__',
'__enter__', '__eq__', '__exit__', '__format__', '__ge__',
'__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__',
'_collect_methods', '_original_mul', '_original_pow', 'mul', 'pow']
>>> proxy.mul
<function mul at 0x035AD5D8>
>>> proxy.pow
<function pow at 0x035AD638>
>>> help(proxy.mul)
Help on function mul in module xmlrpc.client:
mul(x:1, y) -> 2
multiplication
>>> help(proxy.pow)
Help on function pow in module xmlrpc.client:
pow(*args, **kwargs)
pow(x, y[, z]) -> number
With two arguments, equivalent to x**y. With three arguments,
equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
>>> proxy.mul(1)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: mul() missing 1 required positional argument: 'y'
>>> proxy.mul(1, 2, 3)
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: mul() takes 2 positional arguments but 3 were given
>>> proxy.mul(1, 2)
2
>>> import inspect
>>> inspect.signature(proxy.mul)
<Signature at 0x35d4b98 "(x:1, y) -> 2">
>>>
As we can see, the registered methods can be introspected, and calling
one with the wrong number of arguments will not trigger a request to
the server, but will fail right in the user's code.
One limitation is that this will work only for servers written in Python;
for other servers it will fall back to the current idiom.
Would something like this be useful as an addition to the stdlib's
xmlrpc module?
If someone wants to test it, here's a rough patch against tip:
https://gist.github.com/PCManticore/cf82ab421d4dc5c7f6ff.
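For anyone who wants to reproduce the transcripts above, here is a minimal
server sketch (the mul/pow registrations, the odd annotations and the port are
my assumptions, based only on the output shown):

from xmlrpc.server import SimpleXMLRPCServer

def mul(x: 1, y) -> 2:
    # annotations mirror the "mul(x:1, y) -> 2" help output above
    "multiplication"
    return x * y

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()  # enables system.listMethods etc.
server.register_function(mul)
server.register_function(pow)              # built-in pow, as in the help output
server.serve_forever()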
Thanks!
I find myself, fairly often, needing to perform bitwise operations
(rshift, lshift, and, or, xor) on arrays of bytes in Python (both bytes
and bytearray). I can't think of any other reasonable use for these
operators on bytes objects. Is upstream Python interested in this kind of
behavior by default? At the least, it would make many algorithms much easier
to read and write.
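For comparison, here is what one typically writes today (a sketch, not part of
the proposal):

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # element-wise XOR of two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

def xor_bytes_via_int(a: bytes, b: bytes) -> bytes:
    # same thing via integer conversion, preserving leading zero bytes
    n = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return n.to_bytes(len(a), "big")

Under the proposal this would presumably just be spelled a ^ b.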
Nathaniel
I was wondering what work is being done on Python to make it faster. I
understand that CPython is incrementally improved. I'm not sure, but I
think that PyPy's acceleration works by compiling a restricted set of Python.
And I think I heard something about Guido working on a different model for
accelerating Python. I apologize in advance that I didn't look into these
projects in a lot of detail. My number one dream about computer languages
is to be able to write in a language as easy as Python and have it
run as quickly as if it were written in C++. I do believe that this is possible
(since in theory someone could look at my Python code and port it to C++).
Unfortunately, I don't have time to work on this goal, but I still wanted
to get feedback about some ideas I have about reaching it.
First, I don't think it's important for a "code block" (say, a small
section of code with less coupling to statements outside the block than to
statements within it) to run quickly on its first iteration.
What I'm suggesting instead is that for every iteration of a "code block",
the runtime stochastically decides whether to collect statistics about that
iteration. Those statistics include the time spent running the block, the
time spent performing attribute accesses (including type and method lookups),
and so on. Basically, the runtime is trying to estimate the potential savings
of optimizing this block.
If the block is run many times and the potential savings are large, then,
again stochastically, the block is promoted to a second-level statistics
collection. This level collects statistics about all of the external
couplings of the block, like the types and values of the passed-in and
returned values.
Using the second-level statistics, the runtime can now guess whether the
block should be promoted to a third level whereby any consistencies are
exploited. For example, if the passed-in parameter types and return value
type of the "min" function are (int, int, int) for 40% of the statistics
and (float, float, float) for 59%, and other types for the remaining 1%,
then two precompiled versions of min are generated: one for int and one for
float.
These precompiled code blocks have different costs than regular Python
blocks. They need to pay the following costs:
* a check for the required invariants (parameter types above, but it could
be parameter values, or other invariants)
* they need to install hooks on objects that must remain invariant during
the execution of the block; if the invariants are ever violated during the
execution of the block, then all of the computations done during this
execution of the block must be discarded
* therefore a third cost is the probability of discarding the computation
times the average cost of the wasted computation.
The savings are that the code block
* can be transformed into faster bytecode, including straight assembly
instructions in some sections, since types or values can now be assumed,
* can use data structures that make type or access assumptions (for example
a list that always contains ints can use a flattened representation; a
large set that is repeatedly having membership checked with many negative
results might benefit from an auxiliary bloom filter, etc.)
In summary, the runtime performs stochastic, incremental promotion of code
blocks from first-level statistics collection, to second-level collection, to
multiple precompiled versions. It can also demote a code block. The difference
between the costs of the different levels is estimated statistically.
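To make the first promotion step concrete, here is a toy sketch (all names and
thresholds are mine and purely illustrative; a real implementation would live
inside the runtime rather than in a decorator, and would actually generate
specialized code):

import random
import time
from collections import Counter

SAMPLE_RATE = 0.01   # fraction of calls that pay the profiling cost
MIN_SAMPLES = 1000   # samples to gather before looking at the numbers
DOMINANCE = 0.90     # how lopsided the type mix must be to specialize

def profiled(func):
    type_stats = Counter()
    total_time = 0.0   # crude signal for the potential savings

    def wrapper(*args, **kwargs):
        nonlocal total_time
        if random.random() >= SAMPLE_RATE:
            return func(*args, **kwargs)          # common case: no overhead
        start = time.perf_counter()
        result = func(*args, **kwargs)
        total_time += time.perf_counter() - start
        type_stats[tuple(type(a) for a in args) + (type(result),)] += 1
        samples = sum(type_stats.values())
        if samples >= MIN_SAMPLES:
            sig, count = type_stats.most_common(1)[0]
            if count / samples >= DOMINANCE:
                # Here the runtime would emit a version of `func` specialized
                # for `sig`, guarded by a cheap type check on the arguments,
                # with `func` itself kept as the fallback path.
                wrapper.promotion_candidate = sig
        return result

    return wrapper

Wrapping a function such as min this way and calling it mostly with ints would
eventually record (int, int, int, int) (three arguments plus the return type)
as the dominant signature, mirroring the 40%/59%/1% example above.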
Examples of optimizations that can be commonly accomplished using such a
system are:
* global variables are folded into code as constants. (Even if they change
rarely, you pay the discarding penalty described above plus the
recompilation cost; the benefit of inline use of the constant (and any
constant folding) might outweigh these costs.)
* lookup of member functions, which almost never change
* flattening of homogeneously-typed lists
Best,
Neil
At PyCon earlier this year, Guido (and others) persuaded me that the
integer based indexing and iteration for bytes and bytearray in Python
3 was a genuine design mistake based on the initial Python 3 design
which lacked an immutable bytes type entirely (so producing integers
was originally the only reasonable choice).
The earlier design discussions around PEP 467 (which proposes to clean
up a few other bits and pieces of that original legacy which PEP 3137
left in place) all treated "bytes indexing returns an integer" as an
unchangeable aspect of Python 3, since there wasn't an obvious way to
migrate to instead returning length 1 bytes objects with a reasonable
story to handle the incompatibility for Python 3 users, even if
everyone was in favour of the end result.
A few weeks ago I had an idea for a migration strategy that seemed
feasible, and I now have a very, very preliminary proof of concept up
at https://bitbucket.org/ncoghlan/cpython_sandbox/branch/bytes_migration_exper…
The general principle involved would be to return an integer *subtype*
from indexing and iteration operations on bytes, bytearray and
memoryview objects using the "default" format character. That subtype
would then be detected in various locations and handled the way a
length 1 bytes object would be handled, rather than the way an integer
would be handled. The current proof of concept adds such handling to
ord(), bytes() and bytearray() (with appropriate test cases in
test_bytes) giving the following results:
>>> b'hello'[0]
104
>>> ord(b'hello'[0])
104
>>> bytes(b'hello'[0])
b'h'
>>> bytearray(b'hello'[0])
bytearray(b'h')
(the subtype is currently visible at the Python level as "types._BytesInt")
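For readers who haven't looked at the branch, here is a rough pure-Python
illustration of the intended semantics (the class and helper names are mine;
the real subtype is implemented in C and would be recognised directly by
ord(), bytes() and bytearray()):

class BytesInt(int):
    """Toy stand-in for the proposed int subtype."""
    def as_byte(self):
        # the single byte this integer stands for
        return bytes([self])

item = BytesInt(b'hello'[0])
print(item)            # 104 -- still behaves as an int everywhere
print(item + 1)        # 105
print(item.as_byte())  # b'h' -- the view that bytes()/bytearray() would
                       # produce for the real subtype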
The proof of concept doesn't override any normal integer behaviour,
but a more complete solution would be in a position to emit a warning
when the result of binary indexing is used as an integer (either
always, or controlled by a command line switch, depending on the
performance impact).
With this integer subtype in place for Python 3.5 to provide a
transition period where both existing integer-compatible operations
(like int() and arithmetic operations) and selected bytes-compatible
operations (like ord(), bytes() and bytearray()) are supported, these
operations could then be switched to producing a normal length 1 bytes
object in Python 3.6.
It wouldn't be pretty, and it would be a pain to document, but it
seems feasible. The alternative is for PEP 467 to add a separate bytes
iteration method, which strikes me as further entrenching a design we
aren't currently happy with.
Regards,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
I've seen this proposed before, and I personally would love this, but my
guess is that it breaks too much code for too little gain.
On Wednesday, May 21, 2014 12:33:30 PM UTC-4, Frédéric Legembre wrote:
>
>
>  Now    | Future | Literal form
>  -------+--------+-----------------------------------
>  ()     | ()     | empty tuple   ( 1, 2, 3 )
>  []     | []     | empty list    [ 1, 2, 3 ]
>  set()  | {}     | empty set     { 1, 2, 3 }
>  {}     | {:}    | empty dict    { 1:a, 2:b, 3:c }
>
>
I suggest implementing:
- `itertools.permutations.__getitem__`, for getting a permutation by its
index number, and possibly also slicing, and
- `itertools.permutations.index` for getting the index number of a given
permutation.
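For concreteness, here is a sketch of how `__getitem__` could be computed
without iterating, using the factorial number system (the function name is
mine, not a proposed API):

from math import factorial

def nth_permutation(pool, index, r=None):
    # The tuple that itertools.permutations(pool, r) would yield at
    # position `index` (0-based), computed without iterating.
    pool = list(pool)
    n = len(pool)
    r = n if r is None else r
    total = factorial(n) // factorial(n - r)
    if not 0 <= index < total:
        raise IndexError("permutation index out of range")
    result = []
    for i in range(r):
        total //= n - i
        j, index = divmod(index, total)
        result.append(pool.pop(j))
    return tuple(result)

>>> import itertools
>>> nth_permutation('abcd', 5)
('a', 'd', 'c', 'b')
>>> list(itertools.permutations('abcd'))[5]
('a', 'd', 'c', 'b')

The proposed `.index` would be the same computation run in reverse.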
What do you think?
Thanks,
Ram.
** The problem
A long-standing problem with CPython is that the peephole optimizer
cannot be completely disabled. Normally, peephole optimization is a
good thing, it improves execution speed. But in some situations, like
coverage testing, it's more important to be able to reason about the
code's execution. I propose that we add a way to completely disable the
optimizer.
To demonstrate the problem, here is continue.py:
a = b = c = 0
for n in range(100):
    if n % 2:
        if n % 4:
            a += 1
        continue
    else:
        b += 1
    c += 1
assert a == 50 and b == 50 and c == 50
If you execute "python3.4 -m trace -c -m continue.py", it produces this
continue.cover file:
    1: a = b = c = 0
  101: for n in range(100):
  100:     if n % 2:
   50:         if n % 4:
   50:             a += 1
>>>>>>         continue
           else:
   50:         b += 1
   50:     c += 1
    1: assert a == 50 and b == 50 and c == 50
This indicates that the continue line is not executed. It's true: the
byte code for that statement is not executed, because the peephole
optimizer has removed the jump to the jump. But in reasoning about the
code, the continue statement is clearly part of the semantics of this
program. If you remove the statement, the program will run
differently. If you had to explain this code to a learner, you would of
course describe the continue statement as part of the execution. So the
trace output does not match our (correct) understanding of the program.
The reason we are running trace (or coverage.py) in the first place is
to learn something about our code, but it is misleading us. The peephole
optimizer is interfering with our ability to reason about the code. We
need a way to disable the optimizer so that this won't happen. This
type of control is well-known in C compilers, for the same reasons: when
running code, optimization is good for speed; when reasoning about code,
optimization gets in the way.
More details are in http://bugs.python.org/issue2506, which also
includes previous discussion of the idea.
This has come up on Python-Dev, and Guido seemed supportive:
https://mail.python.org/pipermail/python-dev/2012-December/123099.html .
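One way to see the effect directly (a sketch; run it against the continue.py
above under CPython 3.4) is to disassemble the compiled module and look for
bytecode attributed to the continue line -- after the peephole pass, the
expectation is that no instruction carries that line number, because the
jump-to-jump was retargeted:

import dis

with open("continue.py") as f:
    code = compile(f.read(), "continue.py", "exec")
dis.dis(code)   # inspect the line numbers in the left-hand column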
** Implementation
Although it may seem like a big change to be able to disable the
optimizer, the heart of it is quite simple. In compile.c is the only
call to PyCode_Optimize. That function takes a string of bytecode and
returns another. If we skip that call, the peephole optimizer is disabled.
** User Interface
Unfortunately, the -O command-line switch does not lend itself to a new
value that means, "less optimization than the default." I propose a new
switch -P, to control the peephole optimizer, with a value of -P0
meaning no optimization at all. The PYTHONPEEPHOLE environment variable
would also control the option.
There are about a dozen places internal to CPython where optimization
level is indicated with an integer, for example, in
Py_CompileStringObject. Those uses also don't allow for new values
indicating less optimization than the default: 0 and -1 already have
meanings, unless we want to start using -2 for less than the default.
I'm not sure we need to provide for those values, or if the
PYTHONPEEPHOLE environment variable provides enough control.
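Under the proposal (to be clear, neither the -P switch nor the PYTHONPEEPHOLE
variable exists today), rerunning the trace example with the optimizer
disabled would look something like:

PYTHONPEEPHOLE=0 python3.4 -m trace -c -m continue.py
python3.4 -P0 -m trace -c -m continue.py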
** Ramifications
This switch makes no changes to the semantics of Python programs,
although clearly, if you are tracing a program, the exact sequence of
lines and bytecodes will be different (this is the whole point).
In the ticket, one objection raised is that providing this option will
complicate testing, and that optimization is a difficult enough thing to
get right as it is. I disagree: I think providing this option will help
test the optimizer, because it will give us a way to test that code runs
the same with and without the optimizer. This gives us a tool to use to
demonstrate that the optimizer isn't changing the behavior of programs.