There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned'|abc.abstract} {'normal'|static|class} {method|property|non-callable attribute}.
concreteness | implicit first arg | type | name | comments
{unadorned} | {unadorned} | method | def foo(): | exists now
{unadorned} | {unadorned} | property | @property | exists now
{unadorned} | {unadorned} | non-callable attribute | x = 2 | exists now
{unadorned} | static | method | @staticmethod | exists now
{unadorned} | static | property | @staticproperty | proposing
{unadorned} | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned} | class | method | @classmethod | exists now
{unadorned} | class | property | @classproperty or @classmethod;@property | proposing
{unadorned} | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned} | method | @abc.abstractmethod | exists now
abc.abstract | {unadorned} | property | @abc.abstractproperty | exists now
abc.abstract | {unadorned} | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static | method | @abc.abstractstaticmethod | exists now
abc.abstract | static | property | @abc.abstractstaticproperty | proposing
abc.abstract | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class | method | @abc.abstractclassmethod | exists now
abc.abstract | class | property | @abc.abstractclassproperty | proposing
abc.abstract | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
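To make the two proposed decorators concrete, here is a rough sketch of how they could be written as plain descriptors today (illustrative only, not anything in the stdlib; setters and deleters are ignored):

class classproperty:
    """Sketch: like property, but the implicit first argument is the class."""
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, instance, owner):
        return self.fget(owner)

class staticproperty:
    """Sketch: like property, but with no implicit first argument at all."""
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, instance, owner):
        return self.fget()

class Foo:
    @classproperty
    def bar(cls):
        return "computed from " + cls.__name__

    @staticproperty
    def baz():
        return 42

print(Foo.bar)   # "computed from Foo" -- no throw-away instance needed
print(Foo.baz)   # 42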
--rich
At the moment, the array module of the standard library allows to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterable (as now) as the second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
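(For comparison, and purely as a sketch of a workaround rather than a counter-proposal: the array module can already do the filling loop in C, either via sequence repetition or by initializing from a zero-filled bytes object.)

import array

def filled_array_fast(typecode, n, value=0):
    # Sequence repetition allocates and fills the whole array in C.
    return array.array(typecode, [value]) * n

def zero_array(typecode, n):
    # Building from a zero-filled bytes object also avoids a Python-level
    # loop, but the temporary bytes object itself needs n * itemsize bytes.
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(n * itemsize))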
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
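(For the record, CPython's peephole optimizer already folds constant expressions like 'a' + 'b' at compile time, which is easy to check with dis; the exact bytecode shown may vary between versions:)

import dis

def f():
    return 'a' + 'b'   # explicit '+' instead of implicit concatenation

# dis.dis(f) typically shows a single LOAD_CONST 'ab' rather than a runtime
# BINARY_ADD, i.e. the explicit concatenation costs nothing at runtime.
dis.dis(f)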
--
--Guido van Rossum (python.org/~guido)
I think it would be helpful for folks using the asyncio module to be able
to make non-blocking calls to objects in the multiprocessing module more
easily. While some use-cases for using multiprocessing can be replaced with
ProcessPoolExecutor/run_in_executor, there are others that cannot; more
advanced usages of multiprocessing.Pool aren't supported by
ProcessPoolExecutor (initializer/initargs, contexts, etc.), and other
multiprocessing classes like Lock and Queue have blocking methods that
could be made into coroutines.
Consider this (extremely contrived, but use your imagination) example of an
asyncio-friendly Queue:
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def do_proc_work(q, val, val2):
    time.sleep(3)  # Imagine this is some expensive CPU work.
    ok = val + val2
    print("Passing {} to parent".format(ok))
    q.put(ok)  # The Queue can be used with the normal blocking API, too.
    item = q.get()
    print("got {} back from parent".format(item))

def do_some_async_io_task():
    # Imagine there's some kind of asynchronous I/O
    # going on here that utilizes asyncio.
    asyncio.sleep(5)

@asyncio.coroutine
def do_work(q):
    loop.run_in_executor(ProcessPoolExecutor(),
                         do_proc_work, q, 1, 2)
    do_some_async_io_task()
    item = yield from q.coro_get()  # Non-blocking get that won't affect our io_task
    print("Got {} from worker".format(item))
    item = item + 25
    yield from q.coro_put(item)

if __name__ == "__main__":
    q = AsyncProcessQueue()  # This is our new asyncio-friendly version of
                             # multiprocessing.Queue
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work(q))
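(To make the idea a bit more concrete, here is a rough sketch of what AsyncProcessQueue could look like if built on today's primitives; the coro_get/coro_put names come from the example above, while the internals -- a manager queue plus run_in_executor for the blocking calls -- are just one possible implementation, not an existing API.)

import asyncio
import multiprocessing

class AsyncProcessQueue:
    def __init__(self, maxsize=0):
        # A manager queue is picklable, so it can also be handed to workers
        # running in a ProcessPoolExecutor.
        self._queue = multiprocessing.Manager().Queue(maxsize)

    # The normal blocking API stays available for worker processes.
    def put(self, item):
        self._queue.put(item)

    def get(self):
        return self._queue.get()

    @asyncio.coroutine
    def coro_put(self, item):
        # Run the blocking call in the loop's default (thread) executor.
        loop = asyncio.get_event_loop()
        yield from loop.run_in_executor(None, self._queue.put, item)

    @asyncio.coroutine
    def coro_get(self):
        loop = asyncio.get_event_loop()
        return (yield from loop.run_in_executor(None, self._queue.get))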
I have seen some rumblings about a desire to do this kind of integration on
the bug tracker (http://bugs.python.org/issue10037#msg162497 and
http://bugs.python.org/issue9248#msg221963) though that discussion is
specifically tied to merging the enhancements from the Billiard library
into multiprocessing.Pool. Are there still plans to do that? If so, should
asyncio integration with multiprocessing be rolled into those plans, or
does it make sense to pursue it separately?
Even more generally, do people think this kind of integration is a good
idea to begin with? I know using asyncio is primarily about *avoiding* the
headaches of concurrent threads/processes, but there are always going to be
cases where CPU-intensive work is going to be required in a primarily
I/O-bound application. The easier it is for developers to handle those
use-cases, the better, IMO.
Note that the same sort of integration could be done with the threading
module, though I think there's a fairly limited use-case for that; most
times you'd want to use threads over processes, you could probably just use
non-blocking I/O instead.
Thanks,
Dan
Is there a good reason for not implementing the "+" operator for dict.update()?
>>> A = dict(a=1, b=1)
>>> B = dict(a=2, c=2)
>>> B += A
>>> B
dict(a=1, b=1, c=2)
That is
B += A
should be equivalent to
B.update(A)
It would be even better if there was also a regular "addition"
operator that is equivalent to creating a shallow copy and then
calling update():
C = A + B
should equal to
C = dict(A)
C.update(B)
(obviously not the same as C = B + A, but the "+" operator is not
commutative for most operations)
class NewDict(dict):
    def __add__(self, other):
        x = dict(self)
        x.update(other)
        return x

    def __iadd__(self, other):
        self.update(other)
        return self  # __iadd__ needs to return the updated object
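A quick check of the intended semantics with the subclass above (sorted() is only used to get a stable display order):

>>> A = NewDict(a=1, b=1)
>>> B = NewDict(a=2, c=2)
>>> sorted((A + B).items())   # copy of A updated with B: B's values win
[('a', 2), ('b', 1), ('c', 2)]
>>> sorted((B + A).items())   # not commutative: A's values win here
[('a', 1), ('b', 1), ('c', 2)]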
My apologies if this has been posted before but with a quick google
search I could not see it; if it was, could you please point me to the
thread? I assume this must be a design decision that has been made a
long time ago, but it is not obvious to me why.
I would expect that all standard IO in Python goes through sys.stdin,
sys.stdout and sys.stderr or the underlying buffer or raw objects. The only
exception should be error messages before the sys.std* objects are
initialized.
I was surprised that this is actually not the case – reading input in the
interactive loop actually doesn't use sys.stdin (see
http://bugs.python.org/issue17620). However, it does use its encoding, which
doesn't make sense. My knowledge of the actual implementation is rather
poor, but I got the impression that the codepath for getting input from the
user in the interactive loop is complicated. I would think that it consists
just of wrapping an underlying system call (or GNU readline or anything) in
sys.stdin.buffer.raw.readinto or something. With the current implementation,
fixing issues may be complicated – for example, handling SIGINT produced by
Ctrl-C on Windows. There is a closed issue
http://bugs.python.org/issue17619 but also an open issue
http://bugs.python.org/issue18597.
There is also a seven-year-old issue http://bugs.python.org/issue1602
regarding Unicode support on the Windows console. Even if that issue isn't
fixed, anyone could just write their own sys.std* objects and install them in
the running interpreter. This doesn't work now because of the problem
described above.
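(For illustration, a minimal sketch of the kind of replacement I mean -- the class name is made up and the raw read just delegates to the original stdin, but the point is that installing it is easy while the interactive loop still won't use it:)

import io
import sys

class ConsoleRawIO(io.RawIOBase):
    # Hypothetical raw stream; a real one would wrap the underlying system
    # call (or GNU readline, or ReadConsoleW on Windows).
    def readable(self):
        return True
    def readinto(self, b):
        data = sys.__stdin__.buffer.raw.read(len(b))
        n = len(data)
        b[:n] = data
        return n

sys.stdin = io.TextIOWrapper(io.BufferedReader(ConsoleRawIO()),
                             encoding="utf-8", line_buffering=True)

# input() and sys.stdin.readline() now go through the replacement, but the
# interactive loop itself still bypasses it (see issue17620) and only picks
# up its encoding.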
I just wanted to bring up the idea of redesigning the stdio backend, which
would also fix http://bugs.python.org/issue17620 and help with fixing the
others.
Regards, Drekin
Currently, os.path.join joins strings specified in its arguments, with one
string per argument.
On its own, that is not a problem. However, it is inconsistent with
str.join, which accepts a single iterable of strings rather than separate
arguments. This inconsistency can lead to some confusion, since these
operations have similar names and carry out similar tasks but have
fundamentally different syntax.
My suggestion is to allow os.path.join to accept a list of strings in
addition to existing one string per argument. This would allow it to be
used in a manner consistent with str.join, while still allowing existing
code to run as expected.
Currently, when os.path.join is given a single list, it returns that list
unchanged. This is undocumented behavior (I am surprised it is not an
exception). It does mean, however, that this change would break code that
relies on getting the list back when given a list but a joined path when
given multiple strings. This is conceivable, but outside of catching the
sorts of errors this change would prevent, I would be surprised if it is a
common use-case.
In the case where multiple arguments are used and one or more of those
arguments is a list, I think the best solution would be to raise an
exception, since this would avoid corner cases and be less likely to
silently propagate bugs. However, I am not set on that, so if others would
prefer that it join all the strings in all the lists, that would be okay too.
So the syntax would be like this (on POSIX as an example):
>>> os.path.join('test1', 'test2', 'test3') # current syntax
'test1/test2/test3'
>>> os.path.join(['test1', 'test2', 'test3']) # new syntax
'test1/test2/test3'
>>> os.path.join(['test1', 'test2'], 'test3')
Exception
>>> os.path.join(['test1'], 'test2', 'test3')
Exception
>>> os.path.join(['test1', 'test2'], ['test3'])
Exception
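(A sketch of the proposed behavior as a plain wrapper function, just to pin down the semantics -- the name and the choice of TypeError are mine, not part of the proposal:)

import os

def path_join(*parts):
    if len(parts) == 1 and isinstance(parts[0], (list, tuple)):
        # Single list argument: join its elements, mirroring str.join.
        return os.path.join(*parts[0])
    if any(isinstance(p, (list, tuple)) for p in parts):
        # Mixing lists with other arguments is ambiguous, so refuse it.
        raise TypeError("cannot mix lists and strings in os.path.join")
    return os.path.join(*parts)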
Hi,
After a long hiatus I’ve done some updates to PEP 447 which proposes a new metaclass method that’s used in attribute resolution for normal and super instances. There have been two updates, the first one is trivial, the proposed method has a new name (__getdescriptor__). The second change to the PEP is to add a Python pseudo implementation of object.__getattribute__ and super.__getattribute__ to make it easier to reason about the impact of the proposal.
I’d like to move forward with this PEP, either to rejection or (preferably) to acceptance of the feature in some form. That said, I’m not too attached to the exact proposal, it just seems to be the minimal clean change that can be used to implement my use case.
My use case is fairly obscure, but hopefully it is not too obscure :-). The problem I have at the moment is basically that it is not possible to hook into the attribute resolution algorithm used by super.__getattribute__ and this PEP would solve that.
My use case for this PEP is PyObjC, the PEP would make it possible to remove a custom “super” class used in that project. I’ll try to sketch what PyObjC does and why the current super is a problem in the paragraphs below.
PyObjC is a bridge between Python and Objective-C. The bit that’s important for this discussion is that every Objective-C object and class can be proxied into Python code. That’s done completely dynamically: the PyObjC bridge reads information from the Objective-C runtime (using a public API for that) to determine which classes are present there and which methods those classes have.
Accessing the information on methods is done on demand: the bridge only looks for a method when Python code tries to access it. There are two reasons for that. The first one is performance: extracting method information eagerly is too expensive because there are a lot of methods and Python code typically uses only a fraction of them. The second reason is more important: Objective-C classes are almost as dynamic as Python classes, and it is possible to add new methods at runtime, either by loading add-on bundles (“Categories”) or by interacting with the Objective-C runtime. Both are actually used by Apple’s frameworks. There are no hooks that can be used to detect these modifications; the only option I’ve found that can be used to keep the Python representation of a class in sync with the Objective-C representation is to eagerly scan classes every time they might be accessed, for example in the __getattribute__ of the proxies for Objective-C classes and instances.
That’s terribly expensive, and still leaves a race condition when using super: in code like the example below, the superclass might grow a new method between the call to the Python method and the use of the superclass method:
def myMethod(self):
    self.objectiveCMethod()
    super().otherMethod()
Because of this the current PyObjC release doesn’t even try to keep the Python representation in sync, but always lazily looks for methods (with a cache of all found methods to avoid the overhead of looking them up again when they are used multiple times). As that definitely will break the builtin super, PyObjC also includes a custom super implementation that must be used. That works, but can lead to confusing errors when users forget to add “from objc import super” to modules that use super in subclasses of Objective-C classes.
The performance impact on CPython seemed to be minimal according to the testing I performed last year, but I have no idea what the impact would be on other implementations (in particular PyPy’s JIT).
A link to the PEP: http://legacy.python.org/dev/peps/pep-0447/
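(To give a feel for the hook, here is a rough sketch of the kind of metaclass PyObjC could use; this is based on my reading of the draft PEP, so treat the exact signature and the lookup_in_objc_runtime helper as illustrative rather than normative:)

class LazyProxyMeta(type):
    def __getdescriptor__(cls, name):
        # Would be consulted during attribute resolution (including by
        # super()) instead of peeking directly into cls.__dict__.
        try:
            return cls.__dict__[name]
        except KeyError:
            descriptor = lookup_in_objc_runtime(cls, name)  # hypothetical helper
            if descriptor is None:
                raise AttributeError(name)
            setattr(cls, name, descriptor)  # cache for subsequent lookups
            return descriptor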
I’d really appreciate further feedback on this PEP.
Regards,
Ronald
On Jul 23, 2014, at 5:13, Akira Li <4kir4.1i(a)gmail.com> wrote:
> Andrew Barnert <abarnert(a)yahoo.com> writes:
>
>> On Jul 22, 2014, at 9:05, Akira Li <4kir4.1i(a)gmail.com> wrote:
>>
>>> Paul Moore <p.f.moore(a)gmail.com> writes:
>>>
>>>> On 21 July 2014 01:41, Andrew Barnert
>>>> <abarnert(a)yahoo.com.dmarc.invalid> wrote:
>>>>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>>>>> not a good thing to do, apologies); you can find it at
>>>>> http://bugs.python.org/file36008/pep-newline.txt
>>>>
>>>> As a suggestion, how about adding an example of a simple nul-separated
>>>> filename filter - the sort of thing that could go in a find -print0 |
>>>> xxx | xargs -0 pipeline? If I understand it, that's one of the key
>>>> motivating examples for this change, so seeing how it's done would be
>>>> a great help.
>>>>
>>>> Here's the sort of thing I mean, written for newline-separated files:
>>>>
>>>> import sys
>>>>
>>>> def process(filename):
>>>>     """Trivial example"""
>>>>     return filename.lower()
>>>>
>>>> if __name__ == '__main__':
>>>>
>>>>     for filename in sys.stdin:
>>>>         filename = process(filename)
>>>>         print(filename)
>>>>
>>>> This is also an example of why I'm struggling to understand how an
>>>> open() parameter "solves all the cases". There's no explicit open()
>>>> call here, so how do you specify the record separator? Seeing how you
>>>> propose this would work would be really helpful to me.
>>>
>>> `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
>>> can replace `sys.std*` streams without worrying about preserving
>>> `sys.__std*__` streams:
>>>
>>> #!/usr/bin/env python
>>> import io
>>> import re
>>> import sys
>>> from pathlib import Path
>>>
>>> def transform_filename(filename: str) -> str:  # example
>>>     """Normalize whitespace in basename."""
>>>     path = Path(filename)
>>>     new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
>>>     path.replace(new_path)  # rename on disk if necessary
>>>     return str(new_path)
>>>
>>> def SystemTextStream(bytes_stream, **kwargs):
>>>     encoding = sys.getfilesystemencoding()
>>>     return io.TextIOWrapper(bytes_stream,
>>>                             encoding=encoding,
>>>                             errors='surrogateescape' if encoding != 'mbcs' else 'strict',
>>>                             **kwargs)
>>>
>>> nl = '\0' if '-0' in sys.argv else None
>>> sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>> for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>>     print(transform_filename(line.rstrip(nl)), end=nl)
>>
>> Nice, much more complete example than mine. I just tried to handle as
>> many edge cases as the original he asked about, but you handle
>> everything.
>>
>>> io.TextIOWrapper() plays the role of open() in this case. The code
>>> assumes that `newline` parameter accepts '\0'.
>>>
>>> The example function handles Unicode whitespace to demonstrate why
>>> opaque bytes-based cookies can't be used to represent filenames in this
>>> case even on POSIX, though which characters are recognized depends on
>>> sys.getfilesystemencoding().
>>>
>>> Note:
>>>
>>> - `end=nl` is necessary because `print()` prints '\n' by default -- it
>>> does not use `file.newline`
>>
>> Actually, yes it does. Or, rather, print pastes on a '\n', but
>> sys.stdout.write translates any '\n' characters to sys.stdout.writenl
>> (a private variable that's initialized from the newline argument at
>> construction time if it's anything other than None or '').
>
> You are right. I've stopped reading the source for print() function at
> `PyFile_WriteString("\n", file);` line assuming that "\n" is not
> translated if newline="\0". But the current behaviour if "\0" were in
> "the other legal values" category (like "\r") would be to translate "\n"
> [1]:
>
> When writing output to the stream, if newline is None, any '\n'
> characters written are translated to the system default line
> separator, os.linesep. If newline is '' or '\n', no translation takes
> place. If newline is any of the other legal values, any '\n'
> characters written are translated to the given string.
>
> [1] https://docs.python.org/3/library/io.html#io.TextIOWrapper
>
> Example:
>
> $ ./python -c 'import sys, io;
> sys.stdout=io.TextIOWrapper(sys.stdout.detach(), newline="\r\n");
> sys.stdout.write("\n\r\r\n")'| xxd
> 0000000: 0d0a 0d0d 0d0a ......
>
> "\n" is translated to b"\r\n" here and "\r" is left untouched (b"\r").
>
> In order for the newline="\0" case to work, it should behave similarly to
> the newline='' or newline='\n' cases instead, i.e., no translation should
> take place, to avoid corrupting embedded "\n" and "\r" characters.
The draft PEP discusses this. I think it would be more consistent to translate for \0, just like \r and \r\n.
For your script, there is no reason to pass newline=nl to the stdout replacement. The only effect that has on output is \n replacement, which you don't want. And if we removed that effect from the proposal, it would have no effect at all on output, so why pass it?
Do you have a use case where you need to pass a non-standard newline to a text file/stream, but don't want newline replacement? Or is it just a matter of avoiding confusion if people accidentally pass it for stdout when they didn't want it?
> My original code
> works as is in this case i.e., *end=nl is still necessary*.
>> But of course that's the newline argument to sys.stdout, and you only
>> changed sys.stdin, so you do need end=nl anyway. (And you wouldn't
>> want output translation here anyway, because that could also translate
>> '\n' characters in the middle of a line, re-creating the same problem
>> we're trying to avoid...)
>>
>> But it uses sys.stdout.newline, not sys.stdin.newline.
>
> The code affects *both* sys.stdout/sys.stdin. Look [2]:
I didn't notice that you passed it for stdout as well--as I explained above, you don't need it, and shouldn't do it.
As a side note, I think it might have been a better design to have separate arguments for input newline, output newline, and universal newlines mode, instead of cramming them all into one argument; for some simple cases the current design makes things a little less verbose, but it gets in the way for more complex cases, even today with \r or \r\n. However, I don't think that needs to be changed as part of this proposal.
It also might be nice to have a full set of PYTHONIOFOO env variables rather than just PYTHONIOENCODING, but again, I don't think that needs to be part of this proposal. And likewise for Nick Coghlan's rewrap method proposal on TextIOWrapper and maybe BufferedFoo.
>>> sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>> for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>>     print(transform_filename(line.rstrip(nl)), end=nl)
>
> [2] https://mail.python.org/pipermail/python-ideas/2014-July/028372.html
>
>>> - SystemTextStream() handles filenames that are undecodable in the
>>>   current locale, i.e., non-ascii names are allowed even in the C locale (LC_CTYPE=C)
>>> - undecodable filenames are not supported on Windows. It is not clear
>>> how to pass an undecodable filename via a pipe on Windows -- perhaps
>>> `GetShortPathNameW -> fsencode -> pipe` might work in some cases. It
>>> assumes that the short path exists and it is always encodable using
>>> mbcs. If we can control all parts of the pipeline *and* Windows API
>>> uses proper utf-16 (not ucs-2) then utf-8 can be used to pass
>>> filenames via a pipe otherwise ReadConsoleW/WriteConsoleW could be
>>> tried e.g., https://github.com/Drekin/win-unicode-console
>>
>> First, don't both the Win32 APIs and the POSIX-ish layer in msvcrt on
>> top of it guarantee that you can never get such unencodable filenames
>> (sometimes by just pretending the file doesn't exist, but if possible
>> by having the filesystem map it to something valid, unique, and
>> persistent for this session, usually the short name)?
>> Second, trying to solve this implies that you have some other native
>> (as opposed to Cygwin) tool that passes or accepts such filenames over
>> simple pipes (as opposed to PowerShell typed ones). Are there any?
>> What does, say, mingw's find do with invalid filenames if it finds
>> them?
>
> In short: I don't know :)
>
> To be clear, I'm talking about native Windows applications (not
> find/xargs on Cygwin). The goal is to process robustly *arbitrary*
> filenames on Windows via a pipe (SystemTextStream()) or network (bytes
> interface).
Yes, I assumed that, I just wanted to make that clear.
My point is that if there isn't already an ecosystem of tools that do so on Windows, or a recommended answer from Microsoft, we don't need to fit into existing practices here. (Actually, there _is_ a recommended answer from Microsoft, but it's "don't send encoded filenames over a binary stream, send them as an array of UTF-16 strings over PowerShell cmdlet typed pipes"--and, more generally, "don't use any ANSI interfaces except for backward compatibility reasons".)
At any rate, if the filenames-over-pipes encoding problem exists on Windows, and if it's solvable, it's still outside the scope of this proposal, unless you think the documentation needs a completely worked example that shows how to interact with some Windows tool, alongside one for interacting with find -print0 on Unix. (And I don't think it does. If we want a Windows example, resource compiler string input files, which are \0-terminated UTF-16, probably serve better.)
> I know that the (A)nsi API (and therefore the "POSIX-ish layer" that uses
> narrow strings, such as main(), fopen(), fstream) is broken, e.g., Thai
> filenames on a Greek computer [3].
Yes, and broken in a way that people cannot easily work around except by using the UTF-16 interfaces. That's been Microsoft's recommended answer to the problem since NT 3.5, Win 95, and MSVCRT 3: if you want to handle all filenames, use _wmain, _wfopen, etc.--or, better, use CreateFileW instead of fopen. They never really addressed the issue of passing filenames between command-line tools at all, until PowerShell, where you pass them as a list of UTF-16 strings rather than a stream of newline-separated encoded bytes. (As a side note, I have no idea how well Python works for writing PowerShell cmdlets, but I don't think that's relevant to the current proposal.)
> Unicode (W) API should enforce utf-16 in principle
> since Windows 2000 [4]. But I expect ucs-2 shows its ugly head in many
> places due to bad programming practices (based on the common wrong
> assumption that Unicode == UTF-16 == UCS-2) and/or bugs that are not
> fixed due to MS' backwards compatibility policies in the past [5].
Yes, I've run into such bugs in the past. It's even more fun when you're dealing with unterminated strings with separate length interfaces. Fortunately, as far as I know, no such bugs affect reading and writing binary files, pipes, and sockets, so they don't affect us here.