I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default locale, instead of
>>> 1/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
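One very rough way to approximate this today (purely a sketch; the 'python' gettext domain and the whole approach are assumptions, not an existing facility) would be to wrap sys.excepthook and pass the fixed framing strings through gettext:

    import sys, traceback, gettext

    _ = gettext.translation('python', fallback=True).gettext

    def translated_excepthook(exc_type, exc_value, tb):
        sys.stderr.write(_("Traceback (most recent call last):\n"))
        for filename, lineno, name, line in traceback.extract_tb(tb):
            sys.stderr.write(_('  File "%s", line %d, in %s\n')
                             % (filename, lineno, name))
            if line:
                sys.stderr.write("    %s\n" % line)
        # the exception message itself would still need its own translations
        sys.stderr.write("%s: %s\n" % (exc_type.__name__, exc_value))

    sys.excepthook = translated_excepthook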
André
Here's an updated version of the PEP reflecting my
recent suggestions on how to eliminate 'codef'.
PEP: XXX
Title: Cofunctions
Version: $Revision$
Last-Modified: $Date$
Author: Gregory Ewing <greg.ewing(a)canterbury.ac.nz>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Feb-2009
Python-Version: 3.x
Post-History:
Abstract
========
A syntax is proposed for defining and calling a special type of generator
called a 'cofunction'. It is designed to provide a streamlined way of
writing generator-based coroutines, and allow the early detection of
certain kinds of error that are easily made when writing such code, which
otherwise tend to cause hard-to-diagnose symptoms.
This proposal builds on the 'yield from' mechanism described in PEP 380,
and describes some of the semantics of cofunctions in terms of it. However,
it would be possible to define and implement cofunctions independently of
PEP 380 if so desired.
Specification
=============
Cofunction definitions
----------------------
A cofunction is a special kind of generator, distinguished by the presence
of the keyword ``cocall`` (defined below) at least once in its body. It may
also contain ``yield`` and/or ``yield from`` expressions, which behave as
they do in other generators.
From the outside, the distinguishing feature of a cofunction is that it cannot
be called the same way as an ordinary function. An exception is raised if an
ordinary call to a cofunction is attempted.
Cocalls
-------
Calls from one cofunction to another are made by marking the call with
a new keyword ``cocall``. The expression
::
cocall f(*args, **kwds)
is evaluated by first checking whether the object ``f`` implements
a ``__cocall__`` method. If it does, the cocall expression is
equivalent to
::
yield from f.__cocall__(*args, **kwds)
except that the object returned by ``__cocall__`` is expected to be an
iterator, so the step of calling ``iter()`` on it is skipped.
If ``f`` does not have a ``__cocall__`` method, or the ``__cocall__``
method returns ``NotImplemented``, then the cocall expression is
treated as an ordinary call, and the ``__call__`` method of ``f``
is invoked.
Objects which implement ``__cocall__`` are expected to return an object
obeying the iterator protocol. Cofunctions respond to ``__cocall__`` the
same way as ordinary generator functions respond to ``__call__``, i.e. by
returning a generator-iterator.
Certain objects that wrap other callable objects, notably bound methods,
will be given ``__cocall__`` implementations that delegate to the underlying
object.
Grammar
-------
The full syntax of a cocall expression is described by the following
grammar lines:
::
atom: cocall | <existing alternatives for atom>
cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
cotrailer: '[' subscriptlist ']' | '.' NAME
Note that this syntax allows cocalls to methods and elements of sequences
or mappings to be expressed naturally. For example, the following are valid:
::
y = cocall self.foo(x)
y = cocall funcdict[key](x)
y = cocall a.b.c[i].d(x)
Also note that the final calling parentheses are mandatory, so that for example
the following is invalid syntax:
::
y = cocall f # INVALID
New builtins, attributes and C API functions
--------------------------------------------
To facilitate interfacing cofunctions with non-coroutine code, there will
be a built-in function ``costart`` whose definition is equivalent to
::
    def costart(obj, *args, **kwds):
        try:
            m = obj.__cocall__
        except AttributeError:
            result = NotImplemented
        else:
            result = m(*args, **kwds)
        if result is NotImplemented:
            raise TypeError("Object does not support cocall")
        return result
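As an illustration (not part of the specification), the following sketch
shows how non-coroutine code could use ``costart`` to drive a cofunction.
Since the ``cocall`` syntax does not exist yet, the cofunction is emulated
here by a hypothetical ``cofunction`` wrapper class exposing ``__cocall__``;
under this proposal the interpreter would provide that behaviour itself.
::
    class cofunction:
        # Hypothetical wrapper used only for this sketch: it exposes the
        # generator function through __cocall__ rather than ordinary __call__.
        def __init__(self, genfunc):
            self._genfunc = genfunc
        def __cocall__(self, *args, **kwds):
            return self._genfunc(*args, **kwds)

    @cofunction
    def count_up(n):
        # A cofunction may also suspend with a plain yield.
        for i in range(n):
            yield i

    # Non-coroutine code starts the cofunction with costart (as defined
    # above) and drives the resulting iterator in the usual way.
    for value in costart(count_up, 3):
        print(value)        # prints 0, 1, 2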
There will also be a corresponding C API function
::
PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
It is left unspecified for now whether a cofunction is a distinct type
of object or, like a generator function, is simply a specially-marked
function instance. If the latter, a read-only boolean attribute
``__iscofunction__`` should be provided to allow testing whether a given
function object is a cofunction.
Motivation and Rationale
========================
The ``yield from`` syntax is reasonably self-explanatory when used for the
purpose of delegating part of the work of a generator to another function. It
can also be used to good effect in the implementation of generator-based
coroutines, but it reads somewhat awkwardly when used for that purpose, and
tends to obscure the true intent of the code.
Furthermore, using generators as coroutines is somewhat error-prone. If one
forgets to use ``yield from`` when it should have been used, or uses it when it
shouldn't have, the symptoms that result can be extremely obscure and confusing.
Finally, sometimes there is a need for a function to be a coroutine even though
it does not yield anything, and in these cases it is necessary to resort to
kludges such as ``if 0: yield`` to force it to be a generator.
The ``cocall`` construct addresses the first issue by making the syntax directly
reflect the intent, that is, that the function being called forms part of a
coroutine.
The second issue is addressed by making it impossible to mix coroutine and
non-coroutine code in ways that don't make sense. If the rules are violated, an
exception is raised that points out exactly what and where the problem is.
Lastly, the need for dummy yields is eliminated by making it possible for a
cofunction to call both cofunctions and ordinary functions with the same syntax,
so that an ordinary function can be used in place of a cofunction that yields
zero times.
Record of Discussion
====================
An earlier version of this proposal required a special keyword ``codef`` to be
used in place of ``def`` when defining a cofunction, and disallowed calling an
ordinary function using ``cocall``. However, it became evident that these
features were not necessary, and the ``codef`` keyword was dropped in the
interests of minimising the number of new keywords required.
The use of a decorator instead of ``codef`` was also suggested, but the current
proposal makes this unnecessary as well.
It has been questioned whether some combination of decorators and functions
could be used instead of a dedicated ``cocall`` syntax. While this might be
possible, to achieve equivalent error-detecting power it would be necessary
to write cofunction calls as something like
::
yield from cocall(f)(args)
making them even more verbose and inelegant than an unadorned ``yield from``.
It is also not clear whether it is possible to achieve all of the benefits of
the cocall syntax using this kind of approach.
Prototype Implementation
========================
An implementation of an earlier version of this proposal in the form of patches
to Python 3.1.2 can be found here:
http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/cofunctions...
If this version of the proposal is received favourably, the implementation will
be updated to match.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
One of the use cases for named tuples is to have them be automatically created from a SQL query or CSV header. Sometimes (but not often), those can have a huge number of columns. In Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields. In Python 3.x, there is a SyntaxError when there are more than 255 fields.
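A minimal sketch of the difference (as observed on CPython at the time; the exact failure point is an implementation detail):

    from collections import namedtuple

    fields = ['f%d' % i for i in range(300)]
    try:
        Big = namedtuple('Big', fields)   # generated __new__ takes 300 parameters
        row = Big(*range(300))
    except SyntaxError as err:
        print("more than 255 fields:", err)   # raised on Python 3.x, not on 2.x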
The origin of the change was a hack to fit positional argument counts and keyword-only argument counts in a single oparg in the python opcode encoding.
ISTM, this is an implementation specific hack and there is no reason that other implementations would have the same restriction (unless their starting point is Python's bytecode).
The good news is that long argument lists are uncommon. They probably only arise in cases with dynamically created functions and classes. Most people are unaffected.
The bad news is that an implementation detail has become visible and added a language restriction. The 255 limit seems weird to me in a version of Python that has gone to lengths to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users.
Is there any support here for trying to get smarter about the keyword-only argument implementation? The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM. If the new restriction isn't necessary, it would be great to remove it.
Raymond
Hello,
(moved to python-ideas)
On Mon, 27 Sep 2010 17:39:45 -0700
Guido van Rossum <guido(a)python.org> wrote:
> On Mon, Sep 27, 2010 at 3:41 PM, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
> > While trying to solve #3873 (poor performance of pickle on file
> > objects, due to the overhead of calling read() with very small values),
> > it occurred to me that the prefetching facilities offered by
> > BufferedIOBase are not flexible and efficient enough.
>
> I haven't read the whole bug but there seem to be lots of different
> smaller issues there, right?
The bug entry is quite old and at first the slowness had to do with the
pure Python IO layer. Now the remaining performance difference with
Python 2 is entirely caused by the following core issue:
> It seems that one (unfortunate)
> constraint is that reading pickles cannot use buffered I/O (at least
> not on a non-seekable file) because the API has been documented to
> leave the file positioned right after the last byte of the pickled
> data, right?
Right.
> > Indeed, if you use seek() and read(), 1) you limit yourself to seekable
> > files 2) performance can be hampered by very bad seek() performance
> > (this is true on GzipFile).
>
> Ow... I've always assumed that seek() is essentially free, because
> that's how a typical OS kernel implements it. If seek() is bad on
> GzipFile, how hard would it be to fix this?
The worst case is backwards seeks. Forward seeks are implemented as a
simple read(), which makes them O(k) where k is the displacement. For
buffering applications where k is bounded by the buffer size, it is
O(1) (still with, of course, a non-trivial multiplier).
Backwards seeks are implemented as rewinding the whole file (seek(0))
and then reading again up to the requested position, which makes them
O(n) with n the absolute target position. When your requirement is to
rewind by a bounded number of bytes in order to undo some readahead,
this is rather catastrophic.
I don't know how the gzip algorithm works under the hood; my impression
is that optimizing backwards seeks would have us save checkpoints of
the decompressor state and restore them if needed. It doesn't sound like a
trivial improvement, and would involve tradeoffs w.r.t. the
performance of sequential reads.
(I haven't looked at BZ2File, which has a totally different -- and
outdated -- implementation)
That's why I would favour the peek() (or peek()-like, as in the prefetch()
idea) approach anyway. Not only does it work on unseekable files, but
implementing peek() when you have an internal buffer is quite simple
(see GzipFile.peek here: http://bugs.python.org/issue9962).
peek() could also be added to BytesIO even though it claims to
implement RawIOBase rather than BufferedIOBase.
(but of course, when you have a BytesIO, you can simply feed its
getvalue() or getbuffer() directly to pickle.loads)
> How common is the use case where you need to read a gzipped pickle
> *and* you need to leave the unzipped stream positioned exactly at the
> end of the pickle?
I really don't know. But I don't think we can break the API for a
special case without potentially causing nasty surprises for the user.
Also, my intuition is that pickling directly from a stream is partly
meant for cases where you want to access data following the pickle
data in the stream.
> > If instead you use peek() and read(), the situation is better, but you
> > end up doing multiple copies of data; also, you must call read() to
> > advance the file pointer even though you don't care about the results.
>
> Have you measured how bad the situation is if you do implement it this way?
It is actually quite good compared to the status quo (3x to 10x), and as
good as the seek/read solution for regular files (and, of course, much
better for gzipped files once GzipFile.peek is implemented):
http://bugs.python.org/issue3873#msg117483
So, for solving the unpickle performance issue, it is sufficient.
Chances are the bottleneck for further improvements would be in the
unpickling logic itself. It feels a bit clunky, though.
Direct timing shows that peek()+read() has a non-trivial cost compared
to read():
$ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
"while f.read(4096): pass"
1000 loops, best of 3: 277 usec per loop
$ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
"while f.read(4096): f.peek(4096)"
1000 loops, best of 3: 361 usec per loop
(that's on a C extension type where peek() is almost a single call to
PyBytes_FromStringAndSize)
> > So I would propose adding the following method to BufferedIOBase:
> >
> > prefetch(self, buffer, skip, minread)
> >
> > Skip `skip` bytes from the stream. Then, try to read at
> > least `minread` bytes and write them into `buffer`. The file
> > pointer is advanced by at most `skip + minread`, or less if
> > the end of file was reached. The total number of bytes written
> > in `buffer` is returned, which can be more than `minread`
> > if additional bytes could be prefetched (but, of course,
> > cannot be more than `len(buffer)`).
> >
> > Arguments:
> > - `buffer`: a writable buffer (e.g. bytearray)
> > - `skip`: number of bytes to skip (must be >= 0)
> > - `minread`: number of bytes to read (must be >= 0 and <= len(buffer))
>
> I like the idea of an API that combines seek and read into a mutable
> buffer. However the semantics of this call seem really weird: there is
> no direct relationship between where it leaves the stream position and
> how much data it reads into the buffer. Can you explain how exactly
> this will help solve the gzipped pickle performance problem?
The general idea with buffering is that:
- you want to skip the previously prefetched bytes (through peek()
or prefetch()) which have been consumed -> hence the `skip` argument
- you want to consume a known number of bytes from the stream (for
example a 4-byte little-endian integer) -> hence the `minread`
argument
- you would like to prefetch some more bytes if cheaply possible, so as
to avoid calling read() or prefetch() too much; but you don't know
yet if you will consume those bytes, so the file pointer shouldn't be
advanced for them
If you don't prefetch more than the minimum needed amount of bytes, you
don't solve the performance problem at all (unpickling needs many tiny
reads). If you advance the file pointer after the whole prefetched data
(even though it might not be entirely consumed), you need to seek()
back at the end: it doesn't work on unseekable files, and is very slow
on some seekable file types.
So, the proposal is like a combination of forward seek() + read() +
peek() in a single call. With the advantages that:
- it works on non-seekable files (things like SocketIO)
- it allows the caller to operate in its own buffer (this is nice in C)
- it returns the data naturally concatenated, so you don't have to do
it yourself if needed
- it gives more guarantees than peek() as to the min and max number of
bytes returned; peek(), as it is not allowed to advance the file
pointer, can return as little as 1 byte (even if you ask for 4096,
and even if EOF isn't reached)
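To make that concrete, here is a rough pure-Python emulation of prefetch()
on top of an existing BufferedReader, using only read() and peek()
(illustrative only; a native implementation would live inside the buffered
object and avoid the extra copies):

    def prefetch(f, buffer, skip, minread):
        if skip:
            f.read(skip)                 # consume previously prefetched bytes
        data = f.read(minread)           # advance by at most `minread`
        n = len(data)
        buffer[:n] = data
        room = len(buffer) - n
        if room:
            extra = f.peek(room)[:room]  # opportunistic readahead, pointer not moved
            buffer[n:n + len(extra)] = extra
            n += len(extra)
        return n

    # e.g. ask for the 4 bytes we need, prefetching up to len(buf) if cheap
    buf = bytearray(4096)
    with open('Misc/HISTORY', 'rb') as f:
        n = prefetch(f, buf, 0, 4)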
I also find it interesting that implementing a single primitive would be
enough for creating custom buffered types (by deriving other methods
from it), but the aesthetics of this can be controversial.
Regards
Antoine.
On Mon, Sep 27, 2010 at 5:41 PM, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
> While trying to solve #3873 (poor performance of pickle on file
> objects, due to the overhead of calling read() with very small values),
>
After looking over the relevant code, it looks to me like the overhead of
calling the read() method compared to calling fread() in Python 2 is the
overhead of calling PyObject_Call along with the construction of argument
tuples and deconstruction of the return value. I don't think the extra
interface would benefit code written in Python as much. Even if Python
code gets the data into a buffer more easily, it's going to pay those costs
to manipulate the buffered data. It would mostly help modules written in C,
such as pickle, which right now are heavily bottlenecked getting the data
into a buffer.
Comparing the C code for Python 2's cPickle and Python 3's pickle, I see
that Python 2 has paths for unpickling from a FILE *, cStringIO, and
"other". Python effectively only has a code path for "other", so it's not
surprising that it's slower. In the worst case, I am sure that if we
re-added specialized code paths we could make it just as fast as Python
2, although that would make the code messy.
Some ideas:
- Use readinto() instead of read(), to avoid extra allocations/deallocations (a rough sketch follows below)
- But first, fix bufferediobase_readinto() so it doesn't work by calling the
read() method and/or follow up on the TODO in buffered_readinto()
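A rough sketch of the first idea (process() is just a stand-in for whatever consumes the data, e.g. the unpickling loop):

    def consume(f, process, chunk_size=8192):
        buf = bytearray(chunk_size)
        view = memoryview(buf)
        while True:
            n = f.readinto(buf)      # fills buf in place, no per-call allocation
            if not n:
                break
            process(view[:n])        # hand out a zero-copy slice of the buffer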
If you want a new API, I think a new C API for I/O objects with C-friendly
arguments would be better than a new Python-level API.
In a nutshell, if you feel the need to make a buffer around BufferedReader,
then I agree there's a problem, but I don't think helping you make a buffer
around BufferedReader is the right solution. ;-)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>
Hello,
multiline string
While recently studying a game scripting language (*) and designing a toy language of my own, I realised the following two facts, which may be relevant for Python as well:
-1- no need for a separate multiline string notation
A single string format can deal with text including newlines, without any syntactic or parsing (**) issue: a string notation simply ends at the second quote.
I have no idea why Python introduced that distinction (and would like to know); possibly for historic reasons? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this?
-2- trimming of indentation
On my computer, calling the following function:
def write():
    if True:
        print """To be or not to be,
        that is the question."""
results in the following output:
|To be or not to be,
|        that is the question.
This is certainly not the programmer's intent. To get what is expected, one should write instead:
def write():
    if True:
        print """To be or not to be,
that is the question."""
...which distorts the visual presentation of code by breaking correct indentation.
To have multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:
def write():
    if True:
        print "To be or not to be,\n" + \
              "that is the question."
(Actually, the '+' can be omitted here, but this fact is not commonly known.)
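For reference, the standard library's textwrap.dedent() gives a similar effect today, at the cost of an explicit call (a sketch, not something proposed in this message):

    import textwrap

    text = textwrap.dedent("""\
        To be or not to be,
        that is the question.""")
    # text == "To be or not to be,\nthat is the question."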
My project uses a visual structure à la Python (and no curly braces). Indentation is removed by the parser from the significant part of the code, even inside strings (and also comments). This allows the programmer to preserve a clean source outline, while having multiline text written simply as is. In other words, the following routine would work as you would guess (':' is the assignment sign):
write : action
    if true
        terminal.write "To be or not to be,
        that is the question."
I imagine the Python parser replaces indentation by block-delimiting tokens (analogous in role to C braces). My language's parser thus has a preprocessing phase that would transform the above piece of code to:
write : action
{
    if true
    {
        terminal.write "To be or not to be,
        that is the question."
    }
}
The preprocessing routine is actually easier than it would be with Python's rules, since one can trim indents systematically, without any exception for strings (and comments).
Thank you for reading,
Denis
(*) namely WML, scripting language of the game called Wesnoth
(**) This is true for 1-pass parsers (like PEG), as well as for 2-pass ones (with separate lexical phase).
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com
Hi,
It would be really nice if elementary mathematical operations such as
sine/cosine (via __sin__ and __cos__) were available as base parts of
the Python data model [0]. This would make it easier to write new math
classes, and it would eliminate the ugliness of things like self.exp().
This would also eliminate the need for separate math and cmath
libraries since those could be built into the default float and complex
types. Of course if those libs were removed, that would be a potential
backwards compatibility issue.
It would also help new users who just want to do math and don't know
that they need to import separate classes just for elementary math
functionality.
I think full coverage of the elementary function set would be the goal
(i.e. exp, sqrt, ln, trig, and hyperbolic functions). This would not
include special functions since that would be overkill, and they are
already handled well by scipy and numpy.
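To make the idea concrete, here is a purely hypothetical sketch of what such
a protocol could look like (nothing like __sin__ exists in the data model
today):

    import math

    def sin(x):
        try:
            return x.__sin__()       # hypothetical protocol hook
        except AttributeError:
            return math.sin(x)       # fall back for plain floats

    class Dual:
        # toy dual number, e.g. for automatic differentiation
        def __init__(self, value, deriv):
            self.value, self.deriv = value, deriv
        def __sin__(self):
            return Dual(math.sin(self.value), math.cos(self.value) * self.deriv)

    print(sin(0.5))                    # uses math.sin
    print(sin(Dual(0.5, 1.0)).deriv)   # cos(0.5), via the __sin__ hook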
Anyway, just a thought.
Best wishes,
Mike
[0] http://docs.python.org/reference/datamodel.html
Guido van Rossum wrote:
> Maybe the API could be called os.path.unnormpath(), since it is in a
> sense the opposite of normpath() (which removes case) ?
Cute, but not very intuitive. Something like actualpath()
might be better -- although that's somewhat arbitrarily
different from realpath().
--
Greg
Ben Finney wrote:
> Your heuristics seem to assume there will only ever be a maximum of one
> match, which is false. I present the following example:
>
> $ ls foo/
> bAr.dat BaR.dat bar.DAT
There should perhaps be an extra step at the beginning:
0) Test whether the specified path refers to an existing
file. If not, raise an exception.
If that passes, and the file system is case-sensitive, then
there must be a directory entry that is an exact match, so
it will be returned by step 1.
If the file system is case-insensitive, then there can be
at most one entry that matches except for case, and it must
be the one we're looking for, so there is no need for the
extra test in step 2.
So the revised algorithm is:
0) Test whether the specified path refers to an existing
file. If not, raise an exception.
1) Search the directory for an exact match, return it if found.
2) Search for a match ignoring case, and return one if found.
3) Otherwise, raise an exception.
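In code, a rough sketch of those steps for a single path component might
look like this (illustrative only; Unicode case-folding subtleties and
symlinks are glossed over):

    import os

    def actual_name(dirpath, name):
        path = os.path.join(dirpath, name)
        # 0) the path must refer to an existing file
        if not os.path.lexists(path):
            raise OSError("no such file: %r" % path)
        entries = os.listdir(dirpath)
        # 1) exact match
        if name in entries:
            return name
        # 2) match ignoring case
        for entry in entries:
            if entry.lower() == name.lower():
                return entry
        # 3) otherwise (shouldn't happen if step 0 passed)
        raise OSError("no directory entry matching %r" % name)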
There's also some prior art that might be worth looking at:
On Windows, Python checks to see whether the file name of an
imported module has the same case as the name being imported,
which is a similar problem in some ways.
> It seems to me this whole thing should be hashed out on ‘python-ideas’.
Good point -- I've redirected the discussion there.
--
Greg
Hello,
ABC __subclasshook__ implementations will only check that the method
is present in the class. That's the case for example in
collections.Container. It will check that the __contains__ method is
present, but that's it. It won't check that the method has the expected
signature, e.g. __contains__(self, x).
The problem is that the implemented method could have a different list
of arguments and will eventually fail.
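A concrete illustration of the problem:

    from collections import Container   # collections.abc on newer versions

    class Broken:
        def __contains__(self):         # wrong signature: `x` is missing
            return True

    print(issubclass(Broken, Container))   # True -- only presence is checked
    1 in Broken()                          # fails at call time with a TypeError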
Using inspect, we could check in __subclasshook__ that the arguments
defined are the same as the ones defined in the abstract method --
the names and the ordering.
I can even think of a small function in ABC for that:
same_signature(method1, method2) => True or False
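A rough sketch of what such a helper could look like (names are illustrative, this is not an existing ABC API):

    import inspect

    def same_signature(method1, method2):
        spec1 = inspect.getfullargspec(method1)
        spec2 = inspect.getfullargspec(method2)
        return (spec1.args == spec2.args and
                spec1.varargs == spec2.varargs and
                spec1.varkw == spec2.varkw)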
Regards
Tarek
--
Tarek Ziadé | http://ziade.org