There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned'|abc.abstract} x {'normal'|static|class} x {method|property|non-callable attribute}.
concreteness | implicit first arg | type                   | name                                               | comments
-------------+--------------------+------------------------+----------------------------------------------------+------------
{unadorned}  | {unadorned}        | method                 | def foo():                                         | exists now
{unadorned}  | {unadorned}        | property               | @property                                          | exists now
{unadorned}  | {unadorned}        | non-callable attribute | x = 2                                              | exists now
{unadorned}  | static             | method                 | @staticmethod                                      | exists now
{unadorned}  | static             | property               | @staticproperty                                    | proposing
{unadorned}  | static             | non-callable attribute | {degenerate case}                                  | unnecessary
{unadorned}  | class              | method                 | @classmethod                                       | exists now
{unadorned}  | class              | property               | @classproperty or @classmethod;@property           | proposing
{unadorned}  | class              | non-callable attribute | {degenerate case}                                  | unnecessary
abc.abstract | {unadorned}        | method                 | @abc.abstractmethod                                | exists now
abc.abstract | {unadorned}        | property               | @abc.abstractproperty                              | exists now
abc.abstract | {unadorned}        | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static             | method                 | @abc.abstractstaticmethod                          | exists now
abc.abstract | static             | property               | @abc.staticproperty                                | proposing
abc.abstract | static             | non-callable attribute | {degenerate case}                                  | unnecessary
abc.abstract | class              | method                 | @abc.abstractclassmethod                           | exists now
abc.abstract | class              | property               | @abc.abstractclassproperty                         | proposing
abc.abstract | class              | non-callable attribute | {degenerate case}                                  | unnecessary
{degenerate case}: variables don't have arguments.
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property, only without an implicit first
argument. Allows the property to be called directly on the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to
the method is the class. Allows the property to be called directly on
the class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses.
@abc.abstractstaticproperty - like @abc.abstractproperty, only for
@staticproperty.
@abc.abstractclassproperty - like @abc.abstractproperty, only for
@classproperty.
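For illustration, the @classproperty case can already be approximated today with a small descriptor. This is only a sketch (the class and attribute names are invented here), and it is read-only; setter support is where the real design work would be:

```python
class classproperty:
    """Sketch of the proposed @classproperty: a read-only property
    whose getter receives the class rather than an instance."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, cls=None):
        if cls is None:
            cls = type(obj)
        return self.fget(cls)


class Foo:
    _name = 'Foo'

    @classproperty
    def name(cls):
        return cls._name


assert Foo.name == 'Foo'      # works directly on the class
assert Foo().name == 'Foo'    # and on instances
```

Note that naively stacking @classmethod and @property, as in the snippet at the top, does not work today because property wraps the classmethod object rather than composing with it.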
--rich
At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could still pass an iterable (as now) as the second argument, but if
you pass a single integer value, it would be treated as the number of
items to allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
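For what it's worth, a much faster workaround already exists: array objects support sequence repetition, which allocates the result in one step. This is just an alternative sketch, not the proposed constructor change:

```python
import array

def filled_array(typecode, n, value=0):
    # Repeating a one-element array allocates all n items at once,
    # avoiding the chunked extend() loop entirely.
    return array.array(typecode, [value]) * n

a = filled_array('l', 1000)
assert len(a) == 1000
assert a[0] == 0 and a[-1] == 0
```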
While I was implementing JSON-JWS (JSON web signatures), a format
which in Python 3 has to go from bytes > unicode > bytes > unicode
several times in its construction, I noticed I wrote a lot of bugs:
"sha256=b'abcdef1234'"
When I meant to say:
"sha256=abcdef1234"
Everything worked perfectly on Python 3 because the verifying code
also generated the sha256=b'abcdef1234' as a comparison. I would have
never noticed at all unless I had tried to verify the Python 3 output
with Python 2.
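The failure mode is easy to reproduce in Python 3: %s silently interpolates the repr of a bytes object, while concatenation fails loudly (the digest value here is made up):

```python
digest = b'abcdef1234'          # bytes, e.g. the output of hexlify()
header = "sha256=%s" % digest   # no error -- the repr is interpolated
assert header == "sha256=b'abcdef1234'"   # not the intended "sha256=abcdef1234"

# By contrast, str + bytes raises immediately:
try:
    "sha256=" + digest
except TypeError:
    pass  # e.g. TypeError: Can't convert 'bytes' object to str implicitly
```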
I know I'm a bad person for not having unit tests capable enough to
catch this bug, a bug I wrote repeatedly in each layer of the bytes >
unicode > bytes > unicode dance, and that there is no excuse for being
confused at any time about the type of a variable, but I'm not willing
to reform.
Instead, I would like a new string formatting operator tentatively
called 'notbytes': "sha256=%notbytes" % (b'abcdef1234'). It would give
the same error as 'sha256=' + b'abc1234' does: TypeError: Can't
convert 'bytes' object to str implicitly
Just an idea of usability fix for Python 3.
I propose a hexdump module (a function, or better, a bytes method) as
a simple, easy and intuitive way of dumping binary data when writing
programs in Python.
hexdump(bytes) - produce a human-readable dump of binary data: a
byte-by-byte representation, separated by spaces, in 16-byte rows
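A minimal sketch of such a function (Python 3; the 16-byte row width and uppercase style follow the example below, but the exact API is open):

```python
def hexdump(data, width=16):
    """Return a human-readable hex dump of a bytes object:
    two uppercase hex digits per byte, space-separated,
    width bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(' '.join('%02X' % byte for byte in chunk))
    return '\n'.join(rows)

assert hexdump(b'\xe6\xb0\x08\x04') == 'E6 B0 08 04'
```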
Rationale:
1. Debug.
Generic binary data can't be output to console. A separate helper
is needed to print, log or store its value in human readable format in
database. This takes time.
2. Usability.
binascii is ugly: the name is no longer intuitive, there are a lot
of functions, and it is not clear how they relate to unicode.
3. Serialization.
It is convenient to have format that can be displayed in a text
editor. Simple tools encourage people to use them.
Practical example:
>>> print(b)
� � � �� �� � �� �� �
� � �
>>> b
'\xe6\xb0\x08\x04\xe7\x9e\x08\x04\xe7\xbc\x08\x04\xe7\xd5\x08\x04\xe7\xe4\x08\x04\xe6\xb0\x08\x04\xe7\xf0\x08\x04\xe7\xff\x08\x04\xe8\x0b\x08\x04\xe8\x1a\x08\x04\xe6\xb0\x08\x04\xe6\xb0\x08\x04'
>>> print(binascii.hexlify(data))
e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804
>>>
>>> data = hexdump(b)
>>> print(data)
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
>>>
>>> # achieving the same output with binascii is overcomplicated
>>> data_lines = [binascii.hexlify(b)[i:min(i+32, len(binascii.hexlify(b)))] for i in xrange(0, len(binascii.hexlify(b)), 32)]
>>> data_lines = [' '.join(l[i:min(i+2, len(l))] for i in xrange(0, len(l), 2)).upper() for l in data_lines]
>>> print('\n'.join(data_lines))
E6 B0 08 04 E7 9E 08 04 E7 BC 08 04 E7 D5 08 04
E7 E4 08 04 E6 B0 08 04 E7 F0 08 04 E7 FF 08 04
E8 0B 08 04 E8 1A 08 04 E6 B0 08 04 E6 B0 08 04
On the other hand, getting the rather useless binascii-style output
back from hexdump() output is trivial:
>>> data.replace(' ','').replace('\n','').lower()
'e6b00804e79e0804e7bc0804e7d50804e7e40804e6b00804e7f00804e7ff0804e80b0804e81a0804e6b00804e6b00804'
More practical, for example, would be adding offsets to the hexdump:
>>> print( ''.join( '%05x: %s\n' % (i*16,l) for i,l in enumerate(hexdump(b).split('\n'))))
Etc.
Conclusion:
By providing better building blocks at the basic level, Python will
become a better tool for more useful tasks.
References:
[1] http://stackoverflow.com/questions/2340319/python-3-1-1-string-to-hex
[2] http://en.wikipedia.org/wiki/Hex_dump
--
anatoly t.
Work priorities don't allow me to spend another day replying in detail
to the various emails on this topic, but I am still keeping up
reading!
I have read Greg's response to my comparison between
Future+yield-based coroutines and his yield-from-based, Future-free
coroutines, and after having written a small prototype, I am now
pretty much convinced that Greg's way is superior. This doesn't mean
you can't use generators or yield-from for other purposes! It's just
that *if* you are writing a coroutine for use with a certain
scheduler, you must use yield and yield-from in accordance with that
scheduler's rules. However, code you call can still use yield and yield-from for
iteration, and you can still use for-loops. In particular, if f is a
coroutine, it can still write "for x in g(): ..." where g is a
generator meant to be an iterator. However if g were instead a
coroutine, f should call it using "yield from g()", and f and g should
agree on the interface of their scheduler.
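The distinction can be illustrated with a tiny sketch (names invented here): plain iteration over a generator still works inside any function, while yield-from delegates to a sub-generator and also carries its return value back to the caller:

```python
def inner():
    # An ordinary generator, usable as a plain iterator.
    yield 'a'
    yield 'b'
    return 'done'            # this value travels back through "yield from"

def outer():
    # "for x in g()": plain iteration, no scheduler involved;
    # the return value of inner() is discarded here.
    for x in inner():
        yield x.upper()
    # "yield from g()": delegation; inner()'s yields pass through,
    # and its return value becomes the value of the expression.
    result = yield from inner()
    yield result

assert list(outer()) == ['A', 'B', 'a', 'b', 'done']
```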
As to other topics, my current feeling is that we should try to
separately develop requirements and prototype implementations of the
I/O loop of the future, and to figure the loosest possible coupling
between that and a coroutine scheduler (or any other type of
scheduler). In particular, I think the I/O loop should not assume the
event handlers are implemented using coroutines -- but if someone
wants to write an awesome coroutine scheduler, they should be able to
delegate all their I/O waiting needs to the I/O loop with very little
trouble.
To me, this means that the I/O loop probably should use "plain"
callback functions (i.e., not Futures, Deferreds or coroutines). We
should also standardize the interface to the I/O loop so that 3rd
parties can plug in their own I/O loop -- I don't see an end to the
debate whether the best C library for event handling is libevent,
libev or libuv.
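As a concrete sketch of what "plain callbacks" could mean (the class and method names here are invented for illustration, not a proposed API), a minimal select-based loop just registers one callback per readable file descriptor:

```python
import select

class CallbackLoop:
    """Minimal sketch of a callback-based I/O loop (hypothetical interface)."""

    def __init__(self):
        self._readers = {}   # fd -> callback

    def add_reader(self, fd, callback):
        self._readers[fd] = callback

    def remove_reader(self, fd):
        self._readers.pop(fd, None)

    def run_once(self, timeout=1.0):
        """Wait for readiness and invoke callbacks; one loop iteration."""
        ready, _, _ = select.select(list(self._readers), [], [], timeout)
        for fd in ready:
            self._readers[fd](fd)

# Usage: data written to one end of a socketpair triggers a callback.
import socket
a, b = socket.socketpair()
loop = CallbackLoop()
received = []
loop.add_reader(b.fileno(), lambda fd: received.append(b.recv(16)))
a.send(b'ping')
loop.run_once()
assert received == [b'ping']
```

A coroutine scheduler (or a Futures/Deferreds layer) could then be built on top of exactly this kind of interface without the loop itself knowing about it.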
While the focus of the I/O loop should be on single-threaded event
handling, some standard interface should exist so that you can run
certain code in a separate thread and wait for its completion -- I've
found this handy when calling socket.getaddrinfo(), which may block.
(Apparently async DNS lookups are really hard -- I read some
complaints about libevent's DNS lookups, and IIUC many Firefox
lockups are due to this.) But there may be other uses for this too.
An issue in the design of the I/O loop is the strain between a
ready-based and completion-based design. The typical Unix design
(whether based on select or any of the poll variants) is usually
ready-based; but on Windows, the only way to get high performance is
to base it on IOCP, which is completion-based (i.e. you start a
specific async operation, like writing N bytes, and the I/O loop tells
you when it is done). I would like people to be able to write fast
event handling programs on Windows too, and ideally the only change
would be the implementation of the I/O loop. But I don't know how
tenable that is given the dramatically different style used by IOCP
and the need to use native Windows API for all async I/O -- it sounds
like we could only do this if the library providing the I/O loop
implementation also wrapped all I/O operations, and that may be a bit
much.
Finally, there should also be some minimal interface so that multiple
I/O loops can interact -- at least in the case where one I/O loop
belongs to a GUI library. It seems this is a solved problem (as well
solved as you can hope for) to Twisted, so we should just adopt their
approach.
--
--Guido van Rossum (python.org/~guido)
I'd like to propose adding the ability for context managers to catch and
handle control passing into and out of them via yield and generator.send()
/ generator.next().
For instance,
class cd(object):
    def __init__(self, path):
        self.inner_path = path
    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)
    def __yield__(self):
        self.inner_path = os.getcwd()
        os.chdir(self.outer_path)
    def __send__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
Here __yield__() would be called when control is yielded through the with
block and __send__() would be called when control is returned via .send()
or .next(). To maintain compatibility, it would not be an error to leave
either __yield__ or __send__ undefined.
The rationale for this is that it's sometimes useful for a context manager
to set global or thread-global state as in the example above, but when the
code is used in a generator, the author of the generator needs to make
assumptions about what the calling code is doing. e.g.
def my_generator(path):
    with cd(path):
        yield do_something()
        do_something_else()
Even if the author of this generator knows what effect do_something() and
do_something_else() have on the current working directory, the author needs
to assume that the caller of the generator isn't touching the working
directory. For instance, if someone were to create two my_generator()
generators with different paths and advance them alternately, the resulting
behaviour could be most unexpected. With the proposed change, the context
manager would be able to handle this so that the author of the generator
doesn't need to make these assumptions.
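The hazard is easy to demonstrate today with a cd() manager like the one above (temporary directories stand in for real paths in this sketch):

```python
import os
import tempfile

start = os.getcwd()

class cd(object):
    def __init__(self, path):
        self.inner_path = path
    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)

def my_generator(path):
    with cd(path):
        yield os.getcwd()   # stand-in for do_something()
        yield os.getcwd()

d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
g1, g2 = my_generator(d1), my_generator(d2)
first = next(g1)    # cwd is now d1
next(g2)            # cwd is now d2 -- g1's context is silently clobbered
second = next(g1)   # g1 resumes, but the cwd it observes is d2, not d1
assert first != second   # the "current" directory changed under g1's feet

g1.close()
g2.close()
os.chdir(start)     # clean up after the deliberately messy example
```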
Naturally, nested with blocks would be handled by calling __yield__ from
innermost to outermost and __send__ from outermost to innermost.
I rather suspect that if this change were included, someone could come up
with a variant of the contextlib.contextmanager decorator to simplify
writing generators for this sort of situation.
Cheers,
J. D. Bartlett
Sometimes, I have the flexibility to reduce the memory used by my
program (e.g., by destroying large cached objects). It would be
great if I could ask the Python interpreter to notify me when memory
is running out, so I can take such actions.
Of course, it's nearly impossible for Python to know in advance if the
OS would run out of memory with the next malloc call. Furthermore,
Python shouldn't guess which memory (physical, virtual, etc.) is
relevant in the particular situation (for instance, in my case, I only
care about physical memory, since swapping to disk makes my
application as good as frozen). So the problem as stated above is
unsolvable.
But let's say I am willing to do some work to estimate the maximum
amount of memory my application can be allowed to use. If I provide
that number to the Python interpreter, it may be possible for it to notify
me when the next memory allocation would exceed this limit by calling
a function I provide it (hopefully passing as arguments the amount of
memory being requested, as well as the amount currently in use). My
callback function could then destroy some objects, and return True to
indicate that some objects were destroyed. At that point, the
interpreter could run its standard garbage collection routines to
release the memory that corresponded to those objects - before
proceeding with whatever it was trying to do originally. (If I
returned False, or if I didn't provide a callback function at all, the
interpreter would simply behave as it does today.) Any memory
allocations that happen while the callback function itself is
executing, would not trigger further calls to it. The whole mechanism
would be disabled for the rest of the session if the memory freed by
the callback function was insufficient to prevent going over the
memory limit.
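Nothing like this hook exists in the interpreter today, but a rough user-level approximation can be polled manually with the stdlib resource module (Unix-only; note that ru_maxrss is reported in kilobytes on Linux but bytes on macOS, so the unit factor below is a per-platform assumption):

```python
import resource

def check_memory(limit_bytes, cleanup):
    """Rough user-level sketch: call cleanup() once peak RSS exceeds
    limit_bytes. Returns True if cleanup ran (and claims success).

    The * 1024 assumes Linux, where ru_maxrss is in kilobytes;
    on macOS it is already in bytes.
    """
    used = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024
    if used > limit_bytes:
        return bool(cleanup())
    return False

# Any real process has used more than one byte, so cleanup runs:
assert check_memory(1, lambda: True) is True
# An absurdly high limit is never exceeded:
assert check_memory(1 << 60, lambda: True) is False
```

This only observes peak usage after the fact; the proposal above asks for the allocator itself to invoke the callback before an allocation crosses the limit, which cannot be emulated from pure Python.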
Would this be worth considering for a future language extension? How
hard would it be to implement?
Max
Hi,
I have seen many people new to Python stumbling while using the Python
docs due to the order of the search results.
For example, if somebody new to python searches for `tuple`, the
actual section about `tuple` comes in place 39. What is more confusing
for people starting with the language is that all the C functions come
first. I have seen people clicking in PyTupleObject just to be totally
disoriented.
Maybe `tuple` is a silly example. But if somebody wants to know how
`open` behaves and which arguments it takes, the result comes in
position 16. `property` does not appear in the list at all (though
built-in appears in position 31). This is true for most builtins.
Experienced people will have no trouble navigating these results, but
new users do. It is not terrible, and in the end they get there, but I
think it would be nice to change it to a more (new) user friendly
order.
So my suggestion is to put the builtins first, the rest of the
standard lib later, including HowTos, FAQ, etc., and finally the
C modules. Additionally, a section with a title matching exactly the
search query should come first. (I am not sure if the last suggestion
belongs in python-ideas or in the sphinx mailing list; please advise.)
Thanks,
Hernan
I am thinking about a [python-wart] tag on SO. There is currently no
list of Python warts, and building a better language is impossible
without clear visibility of the warts in current implementations.
Why Roundup doesn't work ATM.
- warts are lost among other "won't fix" and "works for me" issues
- no way to edit description to make it more clear
- no voting/stars to perceive how important an issue is
- no comment/noise filtering
and the most valuable
- there is no query to list warts sorted by popularity to explore other
time-consuming areas of Python you are not aware of, but which can popup
one day
SO at least allows:
+ voting
+ community wiki edits
+ useful comment upvoting
+ sorted lists
+ user editable tags (adding new warts is easy)
This post is a result of facing numerous locals/settrace/exec issues
that are closed on the tracker. I also have my own list of other
issues (logging/subprocess) at the GC project, which I might be unable
to maintain in the future. There is also some undocumented stuff
(subprocess deadlocks) that I'm investigating but don't have time to
write up. So I'd rather move this somewhere where it could be updated.
--
anatoly t.
Hi python-ideas.
I think it would be nice to have a method in 'list' to replace certain
elements by others in-place. Like this:
l = [x, a, y, a]
l.replace(a, b)
assert l == [x, b, y, b]
The alternatives are longer than they should be, imo. For example:
for i, n in enumerate(l):
    if n == a:
        l[i] = b
Or:
l = [b if n==a else n for n in l]
And this is what happens when someone tries to "optimize" this process.
It totally obscures the intention:
try:
    i = 0
    while i < len(l):
        i = l.index(a, i)
        l[i] = b
        i += 1
except ValueError:
    pass
If there is a reason not to add '.replace' as a built-in method, it
could be implemented efficiently in pure Python if Python provided a
version of '.index' that returns the indices of all occurrences of a
given item, not just the first. Like this:
l = [x, a, b, a]
for i in l.indices(a):
l[i] = b
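Such an .indices() helper can be sketched in pure Python today as a free function, building on list.index's start argument (the name is the hypothetical one proposed above):

```python
def indices(lst, value):
    """Yield every index at which value occurs in lst
    (sketch of the proposed list.indices)."""
    i = 0
    while True:
        try:
            i = lst.index(value, i)
        except ValueError:
            return
        yield i
        i += 1

# Usage: replacement via indices, as in the snippet above.
l = ['x', 'a', 'y', 'a']
for i in indices(l, 'a'):
    l[i] = 'b'
assert l == ['x', 'b', 'y', 'b']
```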
So adding .replace and/or .indices… Good idea? Bad idea?