While I was implementing JSON-JWS (JSON web signatures), a format
which in Python 3 has to go from bytes > unicode > bytes > unicode
several times in its construction, I noticed I wrote a lot of bugs:
"sha256=b'abcdef1234'"
When I meant to say:
"sha256=abcdef1234"
Everything worked perfectly on Python 3 because the verifying code
also generated the sha256=b'abcdef1234' as a comparison. I would have
never noticed at all unless I had tried to verify the Python 3 output
with Python 2.
I know I'm a bad person for not having unit tests capable enough to
catch this bug, a bug I wrote repeatedly in each layer of the bytes >
unicode > bytes > unicode dance, and that there is no excuse for being
confused at any time about the type of a variable, but I'm not willing
to reform.
Instead, I would like a new string formatting operator tentatively
called 'notbytes': "sha256=%notbytes" % (b'abcdef1234'). It gives the
same error as 'sha256='+b'abc1234' would: TypeError: Can't convert
'bytes' object to str implicitly
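A minimal sketch of the behaviour described above, written for Python 3. The helper name format_notbytes is hypothetical and not an existing API; it only emulates the proposed operator by rejecting bytes arguments before handing over to ordinary %-formatting.

```python
def format_notbytes(template, *args):
    """Hypothetical stand-in for the proposed '%notbytes' operator."""
    for arg in args:
        if isinstance(arg, (bytes, bytearray)):
            # Mirror the error raised by 'sha256=' + b'...'.
            raise TypeError("Can't convert %r object to str implicitly"
                            % type(arg).__name__)
    return template % args


digest = b'abcdef1234'

# The silent bug: %s quietly embeds the repr of the bytes object.
buggy = "sha256=%s" % digest

# The intended result needs an explicit decode first.
fixed = format_notbytes("sha256=%s", digest.decode('ascii'))
```

Passing the undecoded digest to format_notbytes raises TypeError instead of producing the b'...' form, which is the failure mode the proposal asks for.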
Just an idea for a usability fix in Python 3:
a hexdump module (a function or bytes method would be better) as a simple,
easy and intuitive way of dumping binary data when writing programs in
Python.
hexdump(bytes) - produce human readable dump of binary data,
byte-by-byte representation, separated by space, 16-byte rows
Rationale:
1. Debug.
Generic binary data can't be output to console. A separate helper
is needed to print, log or store its value in human readable format in
database. This takes time.
2. Usability.
binascii is ugly: name is not intuitive any more, there are a lot
of functions, and it is not clear how it relates to unicode.
3. Serialization.
It is convenient to have a format that can be displayed in a text
editor. Simple tools encourage people to use them.
Practical example:
>>> print(b)
� � � �� �� � �� �� �
� � �
>>> b
'\xe6\xb0\x08\x04\xe7\x9e\x08\x04\xe7\xbc\x08\x04\xe7\xd5\x08\x04\xe7\xe4\x08\x04\xe6\xb0\x08\x04\xe7\xf0\x08\x04\xe7\xff\x08\x04\xe8\x0b\x08\x04\xe8\x1a\x08\x04\xe6\xb0\x08\x04\xe6\xb0\x08\x04'
>>> print(binascii.hexlify(b))
e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804
>>>
>>> data = hexdump(b)
>>> print(data)
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
>>>
>>> # achieving the same output with binascii is overcomplicated
>>> data_lines = [binascii.hexlify(b)[i:min(i+32, len(binascii.hexlify(b)))] for i in xrange(0, len(binascii.hexlify(b)), 32)]
>>> data_lines = [' '.join(l[i:min(i+2, len(l))] for i in xrange(0, len(l), 2)).upper() for l in data_lines]
>>> print('\n'.join(data_lines))
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
On the other hand, getting the rather useless binascii output from
hexdump() is quite trivial:
>>> data.replace(' ','').replace('\n','').lower()
'e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804'
But a more practical example would be adding offsets to the hexdump output:
>>> print( ''.join( '%05x: %s\n' % (i*16,l) for i,l in enumerate(hexdump(b).split('\n'))))
Etc.
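For reference, the hexdump() used in the session above could be sketched roughly as follows in Python 3. This is only an illustration of the described output format (upper-case hex, space-separated, 16 bytes per row); the width parameter is an assumption added here and is not part of the proposal.

```python
def hexdump(data, width=16):
    """Return a space-separated, upper-case hex dump, `width` bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Iterating over bytes yields integers in Python 3.
        rows.append(' '.join('%02X' % byte for byte in chunk))
    return '\n'.join(rows)


sample = bytes.fromhex('e6b00804e79e0804e7bc0804e7d50804e7e40804')
dump = hexdump(sample)
```

Here dump holds two rows: the first with 16 bytes, the second with the remaining 4.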
Conclusion:
By providing better building blocks at a basic level, Python will become
a better tool for more useful tasks.
References:
[1] http://stackoverflow.com/questions/2340319/python-3-1-1-string-to-hex
[2] http://en.wikipedia.org/wiki/Hex_dump
--
anatoly t.
I'd like to propose adding the ability for context managers to catch and
handle control passing into and out of them via yield and generator.send()
/ generator.next().
For instance,
import os

class cd(object):
    def __init__(self, path):
        self.inner_path = path

    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)

    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)

    # Proposed hook: control leaves the with block via yield.
    def __yield__(self):
        self.inner_path = os.getcwd()
        os.chdir(self.outer_path)

    # Proposed hook: control re-enters via .send() or .next().
    def __send__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
Here __yield__() would be called when control is yielded through the with
block and __send__() would be called when control is returned via .send()
or .next(). To maintain compatibility, it would not be an error to leave
either __yield__ or __send__ undefined.
The rationale for this is that it's sometimes useful for a context manager
to set global or thread-global state as in the example above, but when the
code is used in a generator, the author of the generator needs to make
assumptions about what the calling code is doing. e.g.
def my_generator(path):
    with cd(path):
        yield do_something()
        do_something_else()
Even if the author of this generator knows what effect do_something() and
do_something_else() have on the current working directory, the author needs
to assume that the caller of the generator isn't touching the working
directory. For instance, if someone were to create two my_generator()
generators with different paths and advance them alternately, the resulting
behaviour could be most unexpected. With the proposed change, the context
manager would be able to handle this so that the author of the generator
doesn't need to make these assumptions.
Naturally, nested with blocks would be handled by calling __yield__ from
innermost to outermost and __send__ from outermost to innermost.
I rather suspect that if this change were included, someone could come up
with a variant of the contextlib.contextmanager decorator to simplify
writing generators for this sort of situation.
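Since __yield__ and __send__ do not exist today, the intended effect can only be approximated from outside the generator. The following Python 3 sketch does that with a hypothetical wrapper class, cd_generator (not part of any library): it switches into the generator's directory whenever the generator is resumed and switches back whenever it yields or finishes.

```python
import os


class cd_generator(object):
    """Hypothetical wrapper: run `gen` with `path` as cwd only while it executes."""

    def __init__(self, path, gen):
        self.inner_path = path
        self.gen = gen

    def __iter__(self):
        return self

    def __next__(self):
        return self.send(None)

    def send(self, value):
        outer_path = os.getcwd()
        os.chdir(self.inner_path)            # what __send__ would do
        try:
            return self.gen.send(value)
        finally:
            self.inner_path = os.getcwd()    # what __yield__ would do
            os.chdir(outer_path)
```

With this wrapper, two generators created for different paths can be advanced alternately without either one observing the other's working directory, and the caller's directory is restored after every step.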
Cheers,
J. D. Bartlett
I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default local language, instead of
>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
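A rough way to experiment with the idea today, sketched in Python 3 with the standard traceback module. The function name format_exception_translated is hypothetical, and this only swaps the header line; the "File ..." lines and the exception message are left in English, so it is far from the full translation proposed here.

```python
import traceback

ENGLISH_HEADER = "Traceback (most recent call last):"
FRENCH_HEADER = "Suivi d'erreur (appel le plus récent en dernier) :"


def format_exception_translated(exc, header=FRENCH_HEADER):
    """Format `exc` as usual, replacing only the English header line."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return ''.join(line.replace(ENGLISH_HEADER, header) for line in lines)


try:
    1 / 0
except ZeroDivisionError as error:
    text = format_exception_translated(error)
```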
André
Greg Ewing wrote:
> Mark Shannon wrote:
>
>> Why not have proper co-routines, instead of hacked-up generators?
>
> What do you mean by a "proper coroutine"?
>
A parallel, non-concurrent, thread of execution.
It should be able to transfer control from arbitrary places in
execution, not just within generators.
Stackless provides coroutines. Greenlets are also coroutines (I think).
Lua has them, and is implemented in ANSI C, so it can be done portably.
See: http://www.jucs.org/jucs_10_7/coroutines_in_lua/de_moura_a_l.pdf
(One of the examples in the paper uses coroutines to implement
generators, which is obviously not required in Python :) )
Cheers,
Mark.
This one is practical. I am looking at NaCl SDK download page:
https://developers.google.com/native-client/sdk/download
"you need Python installed", "download SDK update utility"
What makes me sad is that the update utility is a Python script in a zip
file - nacl_sdk.zip - which also includes a shell script and a .bat file
for launching this Python script.
You already have Python installed. Why can't you just do this, the same
way on every platform:
mkdir nacl
cd nacl
python -m urllib get http://commondatastorage.googleapis.com/nativeclient-mirror/nacl/nacl_sdk...
python update_sdk.py
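There is no "get" command in urllib today, but what such a command might do can be sketched with existing Python 3 calls. The function name get and its default-filename rule are assumptions made for this illustration.

```python
import posixpath
import shutil
from urllib.parse import urlsplit
from urllib.request import urlopen


def get(url, filename=None):
    """Download `url` into `filename` (default: last path component of the URL)."""
    if filename is None:
        filename = posixpath.basename(urlsplit(url).path)
    if not filename:
        raise ValueError("cannot derive a file name from %r" % url)
    with urlopen(url) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
    return filename
```

A command-line wrapper would then only need to pass sys.argv values to get().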
Here's an updated version of the PEP reflecting my
recent suggestions on how to eliminate 'codef'.
PEP: XXX
Title: Cofunctions
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing(a)canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Feb-2009
Python-Version: 3.x
Post-History:
Abstract
========
A syntax is proposed for defining and calling a special type of generator
called a 'cofunction'. It is designed to provide a streamlined way of
writing generator-based coroutines, and allow the early detection of
certain kinds of error that are easily made when writing such code, which
otherwise tend to cause hard-to-diagnose symptoms.
This proposal builds on the 'yield from' mechanism described in PEP 380,
and describes some of the semantics of cofunctions in terms of it. However,
it would be possible to define and implement cofunctions independently of
PEP 380 if so desired.
Specification
=============
Cofunction definitions
----------------------
A cofunction is a special kind of generator, distinguished by the presence
of the keyword ``cocall`` (defined below) at least once in its body. It may
also contain ``yield`` and/or ``yield from`` expressions, which behave as
they do in other generators.
From the outside, the distinguishing feature of a cofunction is that it cannot
be called the same way as an ordinary function. An exception is raised if an
ordinary call to a cofunction is attempted.
Cocalls
-------
Calls from one cofunction to another are made by marking the call with
a new keyword ``cocall``. The expression
::
    cocall f(*args, **kwds)
is evaluated by first checking whether the object ``f`` implements
a ``__cocall__`` method. If it does, the cocall expression is
equivalent to
::
    yield from f.__cocall__(*args, **kwds)
except that the object returned by __cocall__ is expected to be an
iterator, so the step of calling iter() on it is skipped.
If ``f`` does not have a ``__cocall__`` method, or the ``__cocall__``
method returns ``NotImplemented``, then the cocall expression is
treated as an ordinary call, and the ``__call__`` method of ``f``
is invoked.
Objects which implement __cocall__ are expected to return an object
obeying the iterator protocol. Cofunctions respond to __cocall__ the
same way as ordinary generator functions respond to __call__, i.e. by
returning a generator-iterator.
Certain objects that wrap other callable objects, notably bound methods,
will be given __cocall__ implementations that delegate to the underlying
object.
Grammar
-------
The full syntax of a cocall expression is described by the following
grammar lines:
::
    atom: cocall | <existing alternatives for atom>
    cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
    cotrailer: '[' subscriptlist ']' | '.' NAME
Note that this syntax allows cocalls to methods and elements of sequences
or mappings to be expressed naturally. For example, the following are valid:
::
    y = cocall self.foo(x)
    y = cocall funcdict[key](x)
    y = cocall a.b.c[i].d(x)
Also note that the final calling parentheses are mandatory, so that for example
the following is invalid syntax:
::
    y = cocall f     # INVALID
New builtins, attributes and C API functions
--------------------------------------------
To facilitate interfacing cofunctions with non-coroutine code, there will
be a built-in function ``costart`` whose definition is equivalent to
::
    def costart(obj, *args, **kwds):
        try:
            m = obj.__cocall__
        except AttributeError:
            result = NotImplemented
        else:
            result = m(*args, **kwds)
        if result is NotImplemented:
            raise TypeError("Object does not support cocall")
        return result
There will also be a corresponding C API function
::
    PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
It is left unspecified for now whether a cofunction is a distinct type
of object or, like a generator function, is simply a specially-marked
function instance. If the latter, a read-only boolean attribute
``__iscofunction__`` should be provided to allow testing whether a given
function object is a cofunction.
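The protocol above can be approximated in current Python 3 without new syntax. In the sketch below, costart follows the definition given in this section, while the cofunction decorator class is a hypothetical emulation (the proposal itself needs no decorator), and "yield from costart(f, x)" stands in for "cocall f(x)" when f is a cofunction.

```python
def costart(obj, *args, **kwds):
    # Same logic as the costart() definition given above.
    try:
        m = obj.__cocall__
    except AttributeError:
        result = NotImplemented
    else:
        result = m(*args, **kwds)
    if result is NotImplemented:
        raise TypeError("Object does not support cocall")
    return result


class cofunction(object):
    """Hypothetical decorator emulating a cofunction with a plain generator."""

    def __init__(self, genfunc):
        self._genfunc = genfunc

    def __cocall__(self, *args, **kwds):
        # Respond to __cocall__ by returning a generator-iterator.
        return self._genfunc(*args, **kwds)

    def __call__(self, *args, **kwds):
        # An ordinary call to a cofunction is an error.
        raise TypeError("cofunction must be invoked via cocall or costart()")


@cofunction
def double(x):
    yield "working"
    return 2 * x


@cofunction
def main(x):
    # Stand-in for:  y = cocall double(x)
    y = yield from costart(double, x)
    yield y
```

Non-coroutine code starts the coroutine with costart(main, 3) and then iterates over the result; calling main(3) directly raises TypeError, which is the early error detection the proposal is after.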
Motivation and Rationale
========================
The ``yield from`` syntax is reasonably self-explanatory when used for the
purpose of delegating part of the work of a generator to another function. It
can also be used to good effect in the implementation of generator-based
coroutines, but it reads somewhat awkwardly when used for that purpose, and
tends to obscure the true intent of the code.
Furthermore, using generators as coroutines is somewhat error-prone. If one
forgets to use ``yield from`` when it should have been used, or uses it when it
shouldn't have, the symptoms that result can be extremely obscure and confusing.
Finally, sometimes there is a need for a function to be a coroutine even though
it does not yield anything, and in these cases it is necessary to resort to
kludges such as ``if 0: yield`` to force it to be a generator.
The ``cocall`` construct addresses the first issue by making the syntax directly
reflect the intent, that is, that the function being called forms part of a
coroutine.
The second issue is addressed by making it impossible to mix coroutine and
non-coroutine code in ways that don't make sense. If the rules are violated, an
exception is raised that points out exactly what and where the problem is.
Lastly, the need for dummy yields is eliminated by making it possible for a
cofunction to call both cofunctions and ordinary functions with the same syntax,
so that an ordinary function can be used in place of a cofunction that yields
zero times.
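To make the third issue concrete, this is roughly what the dummy-yield kludge looks like with plain ``yield from`` in Python 3. The function names are invented for the illustration.

```python
def fetch_cached_value():
    # Never actually yields, but must still be a generator so that the
    # caller can use "yield from" on it; hence the dummy yield.
    if 0:
        yield
    return 42


def caller():
    value = yield from fetch_cached_value()
    return value + 1


def run(gen):
    """Drive a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


result = run(caller())
```

Under the proposal, fetch_cached_value could be an ordinary function and the call site would be written with ``cocall`` either way.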
Record of Discussion
====================
An earlier version of this proposal required a special keyword ``codef`` to be
used in place of ``def`` when defining a cofunction, and disallowed calling an
ordinary function using ``cocall``. However, it became evident that these
features were not necessary, and the ``codef`` keyword was dropped in the
interests of minimising the number of new keywords required.
The use of a decorator instead of ``codef`` was also suggested, but the current
proposal makes this unnecessary as well.
It has been questioned whether some combination of decorators and functions
could be used instead of a dedicated ``cocall`` syntax. While this might be
possible, to achieve equivalent error-detecting power it would be necessary
to write cofunction calls as something like
::
    yield from cocall(f)(args)
making them even more verbose and inelegant than an unadorned ``yield from``.
It is also not clear whether it is possible to achieve all of the benefits of
the cocall syntax using this kind of approach.
Prototype Implementation
========================
An implementation of an earlier version of this proposal in the form of patches
to Python 3.1.2 can be found here:
http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/cofunctions...
If this version of the proposal is received favourably, the implementation will
be updated to match.
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
Today pypy and CPython's "setup.py bdist" generate the same filename
but incompatible bdists. This makes it difficult to share both bdists
in the same folder or index. Instead, they should generate different
bdist filenames because one won't work with the other implementation.
This PEP specifies a tagging system that includes enough information
to decide whether a particular bdist is expected to work on a
particular Python.
Also at https://bitbucket.org/dholth/python-peps/raw/98cd36228c2e/pep-CTAG.txt
Thanks for your feedback,
Daniel Holth
Hello,
We have been discussing the value of having namedtuple as the return
type for urlparse.urlparse and urlparse.urlsplit. See that thread
here: http://bugs.python.org/issue15824 . I jumped the gun and
submitted a patch without seeing if anyone else thought different
behavior was desirable. My argument is that it would be a major
usability improvement if the return type supported item assignment.
Currently, something like the following is necessary in order to
parse, make changes, and unparse:
import urlparse
url = list(urlparse.urlparse('http://www.example.com/foo/bar?hehe=haha'))
url[1] = 'python.com'
new_url = urlparse.urlunparse(url)
I think this is really clunky. I don't see any reason why we should be
using a type that doesn't support item assignment and needs to be
cast to another type in order to make changes. I think an
interface like this is more useful:
import urlparse
url = urlparse.urlparse('http://www.example.com/foo/bar?hehe=haha')
url.netloc = 'www.python.com'
urlparse.urlunparse(url)
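For comparison, the closest thing available with the existing immutable result is the namedtuple _replace() method, which returns a modified copy. The sketch below uses the Python 3 module name urllib.parse; in Python 2 the same functions live in urlparse.

```python
from urllib.parse import urlparse, urlunparse

url = urlparse('http://www.example.com/foo/bar?hehe=haha')

# _replace() builds a new result with the given fields changed.
changed = url._replace(netloc='www.python.com')
new_url = urlunparse(changed)
```

This avoids the list round trip, but it still is not the attribute assignment suggested above.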
What do other people think?
--
-Ben Toews
Now that Python 3 is all about iterators (which are a killer feature of
Python for users, according to StackOverflow -
http://stackoverflow.com/questions/tagged/python), wouldn't it be nice to
introduce more first-class functions to work with them? One function,
to be exact: one to split a string into chunks.
itertools.chunks(iterable, size, fill=None)
Which is the 33rd most voted Python question on SO -
http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-ev...
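A possible pure-Python sketch of such a function, for discussion only. itertools.chunks does not exist; the padding rule below (pad the last chunk only when fill is not None) is an assumption, since the signature alone does not say what fill should do.

```python
from itertools import islice


def chunks(iterable, size, fill=None):
    """Yield successive lists of `size` items taken from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        if fill is not None and len(chunk) < size:
            # Assumed semantics: pad the final short chunk with `fill`.
            chunk.extend([fill] * (size - len(chunk)))
        yield chunk


plain = list(chunks('abcdefg', 3))
padded = list(chunks('abcdefg', 3, fill='x'))
```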
P.S. CC'ing to python-dev@ to notify about the thread in python-ideas.