Mailman 3 June 2019 - Python-ideas

Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
by Raymond Hettinger Oct. 23, 2019

> On Friday, April 6, 2018 at 8:14:30 AM UTC-7, Guido van Rossum wrote:
> On Fri, Apr 6, 2018 at 7:47 AM, Peter O'Connor <peter.ed...(a)gmail.com> wrote:
>> So some more humble proposals would be:
>>
>> 1) An initializer to itertools.accumulate
>> functools.reduce already has an initializer, I can't see any controversy to adding an initializer to itertools.accumulate
>
> See if that's accepted in the bug tracker.

It did come up once but was closed for a number of reasons including lack of use cases. However, Peter's signal processing example does sound interesting, so we could re-open the discussion.

For those who want to think through the pluses and minuses, I've put together a Q&A as food for thought (see below). Everybody's design instincts are different -- I'm curious what you all think about the proposal.

Raymond

---------------------------------------------

Q. Can it be done?
A. Yes, it wouldn't be hard.

    import operator

    _sentinel = object()

    def accumulate(iterable, func=operator.add, start=_sentinel):
        it = iter(iterable)
        if start is _sentinel:
            try:
                total = next(it)
            except StopIteration:
                return
        else:
            total = start
        yield total
        for element in it:
            total = func(total, element)
            yield total

Q. Do other languages do it?
A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.

* http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.…
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html
* http://microapl.com/apl/apl_concepts_chapter5.html
    +\ 1 2 3 4 5
    1 3 6 10 15
* https://reference.wolfram.com/language/ref/Accumulate.html
* https://www.haskell.org/hoogle/?hoogle=mapAccumL

Q. How much work for a person to do it currently?
A. Almost zero effort to write a simple helper function:

    myaccum = lambda it, func, start: accumulate(chain([start], it), func)

Q. How common is the need?
A. Rare.

Q. Which would be better, a simple for-loop or a customized itertool?
A. The itertool is shorter but more opaque (especially with respect to the argument order for the function call):

    result = [start]
    for x in iterable:
        y = func(result[-1], x)
        result.append(y)

versus:

    result = list(accumulate(iterable, func, start=start))

Q. How readable is the proposed code?
A. Look at the following code and ask yourself what it does:

    accumulate(range(4, 6), operator.mul, start=6)

Now test your understanding: How many values are emitted? What is the first value emitted? Are the two sixes related? What is this code trying to accomplish?

Q. Are there potential surprises or oddities?
A. Is it readily apparent which of the assertions will succeed?

    a1 = sum(range(10))
    a2 = sum(range(10), 0)
    assert a1 == a2

    a3 = functools.reduce(operator.add, range(10))
    a4 = functools.reduce(operator.add, range(10), 0)
    assert a3 == a4

    a5 = list(accumulate(range(10), operator.add))
    a6 = list(accumulate(range(10), operator.add, start=0))
    assert a5 == a6

Q. What did the Python 3.0 Whatsnew document have to say about reduce()?
A. "Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."

Q. What would this look like in real code?
A. We have almost no real-world examples, but here is one from a StackExchange post:

    def wsieve():       # wheel-sieve, by Will Ness.    ideone.com/mqO25A->0hIE89
        wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
                 4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
        cs = accumulate(cycle(wh11), start=11)
        yield( next( cs))       # cf. ideone.com/WFv4f
        ps = wsieve()           #     codereview.stackexchange.com/q/92365/9064
        p = next(ps)            # 11
        psq = p*p               # 121
        D = dict( zip( accumulate(wh11, start=0), count(0)))     # start from
        sieve = {}
        for c in cs:
            if c in sieve:
                wheel = sieve.pop(c)
                for m in wheel:
                    if not m in sieve:
                        break
                sieve[m] = wheel    # sieve[143] = wheel@187
            elif c < psq:
                yield c
            else:          # (c==psq)
                # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
                x = [p*d for d in wh11]
                i = D[ (p-11) % 210]
                wheel = accumulate(cycle(x[i:] + x[:i]), start=psq)
                p = next(ps) ; psq = p*p
                next(wheel) ; m = next(wheel)
                sieve[m] = wheel
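For readers who want to try the readability and "surprises" questions from the Q&A, here is a minimal sketch built on the chain() helper idea from the post. The name accumulate_from and its keyword-only start argument are hypothetical, chosen only for this illustration; this is not the proposed itertools signature.

```python
import operator
from itertools import accumulate, chain


def accumulate_from(iterable, func=operator.add, *, start):
    """Hypothetical helper: running results of func, seeded with `start`."""
    # Prepending the seed means it is emitted first, so the output has
    # one more item than the input iterable.
    return accumulate(chain([start], iterable), func)


# The readability quiz: seed 6, then multiply by 4 and by 5.
products = list(accumulate_from(range(4, 6), operator.mul, start=6))

# The "surprise": a seeded accumulate is not equal to the unseeded one.
plain = list(accumulate(range(10), operator.add))
seeded = list(accumulate_from(range(10), operator.add, start=0))
```

Under these assumptions the seeded run emits the seed itself plus one value per input element, which is why the third pair of assertions in the Q&A behaves differently from the sum() and reduce() pairs.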

17 49

Why not accept lists or arbitrary iterables in str.endswith?
by Soni L. Oct. 14, 2019

I'm parsing configs for domain filtering rules, and they come as a list. However, str.endswith requires a tuple. So I need to use str.endswith(tuple(list)). I don't know the reasoning for this, but why not just accept a list as well?
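A small sketch of the workaround described above, assuming the filter rules arrive as a list of suffix strings. The helper name ends_with_any and the sample rules are made up for illustration.

```python
def ends_with_any(host, suffixes):
    """Hypothetical helper: accept any iterable of suffix strings."""
    # str.endswith takes a str or a tuple of str, so convert first.
    return host.endswith(tuple(suffixes))


rules = [".example.com", ".example.org"]  # made-up filter rules
matched = ends_with_any("mail.example.org", rules)

# Passing the list directly is rejected by str.endswith.
try:
    "mail.example.org".endswith(rules)
    list_accepted = True
except TypeError:
    list_accepted = False
```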

3 6

support toml for pyproject support
by Jimmy Girardet July 29, 2019

Hi, I don't know if this was already debated, but I don't know how to search the whole archive of the list. For now, adoption of the pyproject.toml file is more difficult because toml is not in the standard library. Each tool which wants to use pyproject.toml has to add a toml lib as a conditional or hard dependency. Since toml is now the standard configuration file format, it's strange that Python does not support it in the stdlib, like it would have been strange to …

13 15

Integer infinity and extended integers
by Neil Girdhar July 2, 2019

Now that Python is beginning to embrace type annotations, is it worth revisiting the idea of having extended integers and an integer infinity? I found myself trying to annotate this line:

    events_to_do: Union[int, float] = math.inf

where I am only including float in the union to accommodate math.inf. I'm interested in exploring this concrete proposal: Add a class to the numeric hierarchy (https://www.python.org/dev/peps/pep-3141/) ExtendedIntegral whereby Real :> ExtendedIntegral :…
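To make the annotation problem concrete, here is a short sketch using only the standard library. It shows why float has to appear in the union today: math.inf is a float, not an int, even though it compares sensibly against arbitrarily large integers.

```python
import math
from typing import Union

# The annotation from the post: float is listed only to admit math.inf.
events_to_do: Union[int, float] = math.inf

inf_is_float = isinstance(math.inf, float)
inf_is_int = isinstance(math.inf, int)

# Comparison with a huge int works even though the int itself is far too
# large to be converted to a float.
beats_big_int = 10**400 < math.inf
```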

2 1

Overloading assignment concrete proposal (Re: Re: Operator as first class citizens -- like in scala -- or yet another new operator?)
by Andrew Barnert July 1, 2019

The thread on operators as first-class citizens keeps getting vague ideas about assignment overloading that wouldn't actually work, or don't even make sense. I think it's worth writing down the simplest design that would actually work, so people can see why it's not a good idea (or explain why they think it would be anyway).

In pseudocode, just as x += y means this:

    xval = globals()['x']
    try:
        result = xval.__iadd__(y)
    except AttributeError:
        result = xval.__add__(y)
    globals()['x'] = result

… x = y would mean this:

    try:
        xval = globals()['x']
        result = xval.__iassign__(y)
    except (LookupError, AttributeError):
        result = y
    globals()['x'] = result

If you don't understand why this would work or why it wouldn't be a great idea (or want to nitpick details), read on; otherwise, you can skip the rest of this message.

---

First, why is there even a problem? Because Python doesn't even have "variables" in the same sense that languages like C++ that allow assignment overloading do.

In C++, a variable is an "lvalue", a location with identity and type, and an object is just a value that lives in a location. So assignment is an operation on variables: x = 2 is the same as XClass::operator=(&x, 2).

In Python, an object is a value that lives wherever it wants, with identity and type, and a variable is just a name that can be bound to a value in a namespace. So assignment is an operation on namespaces, not on variables: x = 2 is the same as dict.__setitem__(globals(), 'x', 2).

The same thing is true for more complicated assignments. For example, a.x = 2 is just an operation on a's namespace instead of the global namespace: type(a).__setattr__(a, 'x', 2). Likewise, a.b['x'] = 2 is type(a.b).__setitem__(a.b, 'x', 2). And so on.

---

But Python allows overloading augmented assignment. How does that work? There's a perfectly normal namespace lookup at the start and namespace store at the end, but in between, the existing value of the target gets to specify the value being assigned.

Immutable types like int don't define __iadd__, and __add__ creates and returns a new object. So, x += y ends up the same as x = x + y.

But mutable types like list define an __iadd__ that mutates self in-place and then returns self, so x gets harmlessly rebound to the same object it was already bound to. So x += y ends up the same as x.extend(y); x = x.

The exact same technique would work for overloading normal assignment. The only difference is that x += y is illegal if x is unbound, while x = y obviously has to be legal (and mean there is no value to intercept the assignment). So, the fallback happens when xval doesn't define __iassign__, but also when x isn't bound at all.

So, for immutable types like int, and almost all mutable types like list (and when x is unbound), x = y does the same thing it always did.

But special types that want to act like transparent mutable handles define an __iassign__ that mutates self in place and returns self, so x gets harmlessly rebound to the same object. So x = y ends up the same as, say, x.set_target(y); x = x.

This all works the same if the variables are local rather than global, or for more complicated targets like attribution or subscription, and even for target lists; the intercept still happens the same way, between the (more complicated) lookup and storage steps.

---

Now, why is this a bad idea?

First, the benefit of __iassign__ is a lot smaller than __iadd__. A sizable fraction of "x += y" statements are for mutable "x" values, but only a rare handful of "x = y" statements would be for special handle "x" values. Even the same cost for a much smaller benefit would be a much harder sell.

But the runtime performance cost difference is huge. If augmented assignment weren't overloadable, it would still have to look up the value, look up and call a special method on it, and store the value. The only cost overloading adds is trying two special methods instead of one, which is tiny. But regular assignment doesn't have to do a value lookup or a special method call at all, only a store; adding those steps would roughly double the cost of every new variable assignment, and even more for every reassignment. And assignments are very common in Python, even within inner loops, so we're talking about a huge slowdown to almost every program out there.

Also, the fact that assignment always means assignment makes Python code easier both for humans to skim, and for automated programs to process. Consider, for example, a static type checker like mypy. Today, x = 2 means that x must now be an int, always. But if x could be a Signal object with an overloaded __iassign__, then x = 2 might mean that x must now be an int, or it might mean that x must now be whatever type(x).__iassign__ returns.

Finally, the complexity of __iassign__ is at least a little higher than __iadd__. Notice that in my pseudocode above, I cheated: obviously the xval = and result = lines are not supposed to recursively call the same pseudocode, but to directly store a value in a new temporary local variable. In the real implementation, there wouldn't even be such a temporary variable (in CPython, the values would just be pushed on the stack), but for documenting the behavior, teaching it to students, etc., that doesn't matter. Being precise here wouldn't be hugely difficult, but it is a little more difficult than with __iadd__, where there's no similar potential confusion even possible.
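The pseudocode for x = y can be emulated today on an explicit namespace dict, which may make the fallback rules easier to see. This is only a sketch: __iassign__ is the hypothetical special method from the message above and is not part of Python, and the assign() function and Handle class are invented for the illustration.

```python
def assign(namespace, name, value):
    """Emulate the proposed `name = value` on an explicit namespace dict."""
    try:
        current = namespace[name]                # lookup (may be unbound)
        result = current.__iassign__(value)      # hypothetical hook
    except (LookupError, AttributeError):
        result = value                           # fallback: plain binding
    namespace[name] = result                     # the store always happens


class Handle:
    """Made-up 'transparent mutable handle' that intercepts assignment."""

    def __init__(self, target):
        self.target = target

    def __iassign__(self, value):
        self.target = value   # mutate in place...
        return self           # ...so the name is rebound to the same object


ns = {}
assign(ns, "x", 1)            # unbound name: LookupError, so bind normally
assign(ns, "x", 2)            # int has no __iassign__: ordinary rebinding

handle = Handle("old")
ns["h"] = handle
assign(ns, "h", "new")        # intercepted: the handle stays bound to "h"
```

As in the pseudocode, the except clause also hides an AttributeError raised inside __iassign__ itself; the sketch keeps that behavior rather than trying to improve on it. It also makes the cost argument visible: every assignment now pays for a lookup and an attribute probe before the store.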
On Wednesday, June 19, 2019, 10:54:04 AM PDT, Andrew Barnert via Python-ideas <python-ideas(a)python.org> wrote:

> On Jun 18, 2019, at 12:43, nate lust <natelust(a)linux.com> wrote:
>
>> I have been following this discussion for a long time, and coincidentally I recently started working on a project that could make use of assignment overloading. (As an aside it is a configuration system for an astronomical data analysis pipeline that makes heavy use of descriptors to work around historical decisions and backward compatibility). Our system makes use of nested chains of objects and descriptors and proxy objects to manage where state is actually stored. The whole system could collapse down nicely if there were assignment overloading. However, this works OK most of the time, but sometimes at the end of the chain things can become quite complicated. I was new to this code base and tasked with making some additions to it, and wished for an assignment operator, but knew the data binding model of python was incompatible from p.
>>
>> This got me thinking. I didn't actually need to overload assignment per se, data binding could stay just how it was, but if there was a magic method that worked similarly to how __get__ works for descriptors but would be called on any variable lookup (if the method was defined) it would allow for something akin to assignment.
>
> What counts as “variable lookup”? In particular:
>
>> For example:
>>
>>     class Foo:
>>         def __init__(self):
>>             self.value = 6
>>             self.myself = weakref.ref(self)
>>
>>         def important_work(self):
>>             print(self.value)
>
> … why doesn’t every one of those “self” lookups call self.__get_self__()? It’s a local variable being looked up by name, just like your “foo” below, and it finds the same value, which has the same __get_self__ method on its type.
>
> The only viable answer seems to be that it does. So, to avoid infinite circularity, your class needs to use the same kind of workaround used for attribute lookup in classes that define __getattribute__ and/or __setattr__:
>
>     def important_work(self):
>         print(object.__get_self__(self).value)
>
>     def __get_self__(self):
>         return object.__get_self__(self).myself
>
> But even that won’t work here, because you still have to look up self to call the superclass method on it. I think it would require some new syntax, or at least something horrible involving locals(), to allow you to write the appropriate methods.
>
>>         def __get_self__(self):
>>             return self.myself
>
> Besides recursively calling itself for that “self” lookup, why doesn’t this also call weakref.ref.__get_self__ for that “myself” lookup? It’s an attribute lookup rather than a local namespace lookup, but surely you need that to work too, or as soon as you store a Foo instance in another object it stops overloading.
>
> For this case there’s at least an obvious answer: because weakref.ref doesn’t override that method, the variable lookup doesn’t get intercepted. But notice that this means every single value access in Python now has to do an extra special-method lookup that almost always does nothing, which is going to be very expensive.
>
>>         def __setattr__(self, name, value):
>>             self.value = value
>
> You can’t write __setattr__ methods this way. That assignment statement just calls self.__setattr__('value', value), which will endlessly recurse. That’s why you need something like the object method call to break the circularity.
>
> Also, this will take over the attribute assignments in your __init__ method. And, because it ignores the name and always sets the value attribute, it means that self.myself = is just going to override value rather than setting myself.
>
> To solve both of these problems, you want a standard __setattr__ body here:
>
>     def __setattr__(self, name, value):
>         object.__setattr__(self, name, value)
>
> But that immediately makes it obvious that your __setattr__ isn’t actually doing anything, and could just be left out entirely.
>
>>     foo = Foo()  # Create an instance
>>     foo          # The interpreter would return foo.myself
>>     foo.value    # The interpreter would return foo.myself.value
>>     foo = 19     # The interpreter would run foo.myself = 6 which would invoke foo.__setattr__('myself', 19)
>
> For this last one, why would it do that? There’s no lookup here at all, only an assignment.
>
> The only way to make this work would be for the interpreter to look up the current value of the target on every assignment before assigning to it, so that lookup could be overloaded. If that were doable, then assignment would already be overloadable, and this whole discussion wouldn’t exist.
>
> But, even if you did add that, __get_self__ is just returning the value self.myself, not some kind of reference to it. How can the interpreter figure out that the weakref.ref value it got came from looking up the name “myself” on the Foo instance? (This is the same reason __getattr__ can’t help you override attribute setting, and a separate method __setattr__ is needed.) To make this work, you’d need a __set_self__ to go along with __get_self__. Otherwise, your changes not only don’t provide a way to do assignment overloading, they’d break assignment overloading if it existed.
>
> Also, all of the extra stuff you’re trying to add on top of assignment overloading can already be done today. You just want a transparent proxy: a class whose instances act like a reference to some other object, and delegate all methods (and maybe attribute lookups and assignments) to it. This is already pretty easy; you can define __getattr__ (and __setattr__) to do it dynamically, or you can do some clever stuff to create static delegating methods (and properties) explicitly at object-creation or class-creation time. Then foo.value returns foo.myself.value, foo.important_work() calls the Foo method but foo.__str__() calls foo.myself.__str__(); you can even make it pass isinstance checks if you want. The only thing it can’t do is overload assignment.
>
> I think the real problem here is that you’re thinking about references to variables rather than values, and overloading operators on variables rather than values, and neither of those makes sense in Python. Looking up, or assigning to, a local variable named “foo” is not an operation on “the foo variable”, because there is no such thing; it’s an operation on the locals namespace.
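The "transparent proxy" mentioned in the quoted message can be sketched with the dynamic __getattr__/__setattr__ approach. The Proxy and Config classes here are invented for illustration; the object.__setattr__ call in __init__ is the same circularity-breaking trick described in the reply.

```python
class Proxy:
    """Minimal sketch of a delegating proxy (illustrative names only)."""

    def __init__(self, target):
        # Bypass our own __setattr__, which would otherwise forward this
        # store to a target that does not exist yet.
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        # Called only when normal lookup fails, so `_target` itself is
        # found in the instance dict without recursion.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        # Forward every attribute store to the wrapped object.
        setattr(self._target, name, value)


class Config:
    def __init__(self):
        self.value = 6


cfg = Config()
foo = Proxy(cfg)

read_through = foo.value      # delegated lookup
foo.value = 19                # delegated store: changes cfg.value

alias = foo
alias = 42                    # plain rebinding of the name; foo is untouched
```

The last two lines show the one thing the proxy cannot intercept: rebinding a name is an operation on the namespace, not on the proxy object.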

14 80

shutil.symlink to allow non-race replacement of existing link targets
by Tom Hale June 30, 2019

As suggested by Toshio Kuratomi at https://bugs.python.org/issue36656, I am raising this here for inclusion in the shutil module.

Mimicking POSIX, os.symlink() will raise FileExistsError if the link name to be created already exists.

A common use case is overwriting an existing file (often a symlink) with a symlink. Naively, one would delete the file named link_name if it exists, then call symlink(). This "solution" is already 3 lines of code, and without exception handling it introduces the race condition of a file named link_name being created between unlink and symlink.

Depending on the functionality required, I suggest:

* os.symlink() - the new link name is expected to NOT exist
* shutil.symlink() - the new symlink replaces an existing file

Handling all possible race conditions (some detailed in issue36656) is non-trivial, however this is the best that I have come up with so far:

==========================================================================
import os, tempfile

def symlink(target, link_name):
    '''Create a symbolic link link_name pointing to target.
    Overwrites link_name if it exists.
    '''

    # os.replace() may fail if files are on different filesystems
    link_dir = os.path.dirname(link_name)

    # Link to a temporary filename that doesn't exist
    while True:
        temp_link_name = tempfile.mktemp(dir=link_dir)
        # os.* functions mimic as closely as possible system functions
        # The POSIX symlink() returns EEXIST if link_name already exists
        # https://pubs.opengroup.org/onlinepubs/9699919799/functions/symlink.html
        try:
            os.symlink(target, temp_link_name)
            break
        except FileExistsError:
            pass

    # Replace link_name with temp_link_name
    try:
        # Pre-empt os.replace on a directory with a nicer message
        if os.path.isdir(link_name):
            raise IsADirectoryError(f"Cannot symlink over existing directory: '{link_name}'")
        os.replace(temp_link_name, link_name)
    except:
        if os.path.islink(temp_link_name):
            os.remove(temp_link_name)
        raise
==========================================================================

The documentation (https://docs.python.org/3/library/shutil.html) I suggest for this is:

shutil.symlink(target, link_name)

    Create a symbolic link named link_name pointing to target, overwriting link_name if it exists. If link_name is a directory, IsADirectoryError is raised.

    To not overwrite link_name, use os.symlink()

==========================================================================

It would be tempting to do:

    while True:
        try:
            os.symlink(target, link_name)
            break
        except FileExistsError:
            os.remove(link_name)

But this has a race condition when replacing a symlink which should *always* exist, eg:

    /lib/critical.so -> /lib/critical.so.1.2

When upgrading by:

    symlink('/lib/critical.so.2.0', '/lib/critical.so')

There is a point in time when /lib/critical.so doesn't exist.

==========================================================================

One issue I see with my suggested code is that the file at temp_link_name could be changed before link_name is replaced with it. This is mitigated by the randomness introduced by mktemp().

While it is far less likely that a file is accessed with a random and unknown name than with an existing known name, I seek input on a solution if this is an unacceptable risk.

Prior art:

* https://bugs.python.org/issue36656 (already mentioned above)
* https://stackoverflow.com/a/55742015/5353461
* https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ln.c

-- 
Tom Hale
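A usage sketch of the create-then-rename approach from the message, run inside a throwaway directory. The function is a condensed restatement under the hypothetical name replace_symlink (the directory check is omitted for brevity), the file names are made up, and it assumes a platform where unprivileged symlink creation works, such as Linux or macOS.

```python
import os
import tempfile


def replace_symlink(target, link_name):
    """Condensed sketch: point link_name at target via a temporary link."""
    link_dir = os.path.dirname(os.path.abspath(link_name))
    while True:
        # mktemp() only proposes an unused name; symlink() is the step
        # that fails if someone else created that name in the meantime.
        temp_link_name = tempfile.mktemp(dir=link_dir)
        try:
            os.symlink(target, temp_link_name)
            break
        except FileExistsError:
            continue
    try:
        # The rename swaps the link in a single step, so link_name never
        # disappears between the old and the new target.
        os.replace(temp_link_name, link_name)
    except BaseException:
        os.remove(temp_link_name)
        raise


with tempfile.TemporaryDirectory() as workdir:
    link = os.path.join(workdir, "critical.so")
    replace_symlink("critical.so.1.2", link)   # first creation
    replace_symlink("critical.so.2.0", link)   # "upgrade" in place
    final_target = os.readlink(link)
    leftovers = sorted(os.listdir(workdir))
```

The temporary link is created in the same directory as link_name because a rename across filesystems would fail, which is the concern noted in the original comments.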

14 49

something like sscanf for Python
by James Lu June 29, 2019

> On Jun 26, 2019, at 7:13 PM, Chris Angelico <rosuav(a)gmail.com> wrote:
>
> The main advantage of sscanf over a regular expression is that it
> performs a single left-to-right pass over the format string and the
> target string simultaneously, with no backtracking. (This is also its
> main DISadvantage compared to a regular expression.) A tiny amount of
> look-ahead in the format string is the sole exception (for instance,
> format string "%s$%d" would collect a …
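For comparison, here is a rough sketch of how an sscanf-like call is often approximated in Python today by translating the format into a regular expression. The scan() function and its tiny two-conversion table are assumptions made for this example; note that it relies on the regex engine, backtracking included, so it does not have the single-pass property described in the quote.

```python
import re

# Hypothetical, deliberately tiny conversion table: regex and cast per code.
_CONVERSIONS = {
    "%d": (r"([-+]?\d+)", int),
    "%s": (r"(\S+?)", str),   # non-greedy so a following literal can match
}


def scan(fmt, text):
    """Return a tuple of converted fields, or None if text does not match."""
    pattern = ""
    casts = []
    pos = 0
    for m in re.finditer(r"%[ds]", fmt):
        pattern += re.escape(fmt[pos:m.start()])   # literal text in between
        regex, cast = _CONVERSIONS[m.group()]
        pattern += regex
        casts.append(cast)
        pos = m.end()
    pattern += re.escape(fmt[pos:])
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    return tuple(cast(group) for cast, group in zip(casts, match.groups()))


fields = scan("%s$%d", "price$42")
no_match = scan("%d-%d", "3-x")
```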

8 10

Canceling thread in python
by Matúš Valo June 27, 2019

Hi All, Currently it is not possible to "kill" a thread which is blocked. The rationale for this is handling situations when a thread is blocked - e.g. when a thread is querying a DB and a lock occurs on the database. In this case, the main thread has no way to stop the blocked thread. Killing a thread is also a popular question - see [1][2]. The pthread library and the Windows API contain mechanisms for forced termination of threads - see [3] and [4]. It is also simple to use them using the ctypes library but …
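For context, this is the cooperative pattern that is available today, sketched with threading.Event. It only works when the worker can poll the flag between small units of work, so it does not cover the case in the message, where the thread is stuck inside a single blocking call. The worker function and its arguments are illustrative.

```python
import threading


def worker(stop, log):
    # Poll the stop flag between small units of work; wait() doubles as a
    # short sleep and returns True as soon as the flag is set.
    while not stop.wait(timeout=0.01):
        pass  # one small unit of real work would go here
    log.append("stopped")


stop = threading.Event()
log = []
thread = threading.Thread(target=worker, args=(stop, log))
thread.start()

stop.set()              # ask the worker to finish
thread.join(timeout=5)  # bounded wait for it to exit
still_running = thread.is_alive()
```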

9 18

Re: A proposal (and implementation) to add assignment and LOAD overloading
by Steven D'Aprano June 27, 2019

On Fri, Jun 28, 2019 at 02:44:28AM +1000, Chris Angelico wrote:

> If it's ALWAYS called, then it's almost useless. The wrapper object
> will vanish the moment you attempt to do anything with it, devolving
> instantly to the result of getself.

I don't understand why it is useless. If the wrapper object is no longer needed, then getself will return the object which is needed, and the wrapper is superfluous and should be garbage collected. If the wrapper object is needed, then …

3 6

Proposal: Using % sign for percentage
by Ronie Martinez June 26, 2019

Good day! As Python focuses on readability, why not use the % sign for actual percentages? For example:

```
rate = 0.1058  # float
rate = 10.58%  # percent, similar to above
```

It does not interfere with the modulo operator, as modulo follows a different format:

```
a = x % y
```

This looks like a small feature but it will surely set Python a level higher in terms of readability. Thanks a lot!
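A quick way to see where the proposal stands against today's grammar is to feed the line to the parser. This sketch also shows a plain-function spelling; percent() is a hypothetical helper, not an existing builtin.

```python
import ast
import math


def percent(value):
    """Hypothetical helper: percent(10.58) is the fraction 10.58 / 100."""
    return value / 100


# Current Python reads a trailing % as an unfinished modulo expression.
try:
    ast.parse("rate = 10.58%")
    literal_accepted = True
except SyntaxError:
    literal_accepted = False

rate = percent(10.58)
same_as_float = math.isclose(rate, 0.1058)
```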

7 6