Good day all,
as a continuation of the thread "OS related file operations (copy, move,
delete, rename...) should be placed into one module"
https://mail.python.org/pipermail/python-ideas/2017-January/044217.html
please consider making pathlib a central file system module by moving
file operations (copy, move, delete, rmtree, etc.) into pathlib.
BR,
George
> On Friday, April 6, 2018 at 8:14:30 AM UTC-7, Guido van Rossum wrote:
> On Fri, Apr 6, 2018 at 7:47 AM, Peter O'Connor <peter.ed...(a)gmail.com> wrote:
>> So some more humble proposals would be:
>>
>> 1) An initializer to itertools.accumulate
>> functools.reduce already has an initializer; I can't see any controversy in adding an initializer to itertools.accumulate
>
> See if that's accepted in the bug tracker.
It did come up once but was closed for a number of reasons, including lack of use cases. However, Peter's signal processing example does sound interesting, so we could re-open the discussion.
For those who want to think through the pluses and minuses, I've put together a Q&A as food for thought (see below). Everybody's design instincts are different -- I'm curious what you all think about the proposal.
Raymond
---------------------------------------------
Q. Can it be done?
A. Yes, it wouldn't be hard.
import operator

_sentinel = object()

def accumulate(iterable, func=operator.add, start=_sentinel):
    it = iter(iterable)
    if start is _sentinel:
        try:
            total = next(it)
        except StopIteration:
            return
    else:
        total = start
    yield total
    for element in it:
        total = func(total, element)
        yield total
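For example, assuming the semantics of the sketch above (the start value, when given, is emitted first):

>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], start=10))
[10, 11, 13, 16]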
Q. Do other languages do it?
A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.
* http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.…
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html
* http://microapl.com/apl/apl_concepts_chapter5.html
+\ 1 2 3 4 5
1 3 6 10 15
* https://reference.wolfram.com/language/ref/Accumulate.html
* https://www.haskell.org/hoogle/?hoogle=mapAccumL
Q. How much work is it for a person to do this currently?
A. Almost zero effort to write a simple helper function:

from itertools import accumulate, chain

myaccum = lambda it, func, start: accumulate(chain([start], it), func)
Q. How common is the need?
A. Rare.
Q. Which would be better, a simple for-loop or a customized itertool?
A. The itertool is shorter but more opaque (especially with respect
to the argument order for the function call):
result = [start]
for x in iterable:
    y = func(result[-1], x)
    result.append(y)
versus:
result = list(accumulate(iterable, func, start=start))
Q. How readable is the proposed code?
A. Look at the following code and ask yourself what it does:
accumulate(range(4, 6), operator.mul, start=6)
Now test your understanding:
How many values are emitted?
What is the first value emitted?
Are the two sixes related?
What is this code trying to accomplish?
Q. Are there potential surprises or oddities?
A. Is it readily apparent which of the following assertions will succeed?
a1 = sum(range(10))
a2 = sum(range(10), 0)
assert a1 == a2

a3 = functools.reduce(operator.add, range(10))
a4 = functools.reduce(operator.add, range(10), 0)
assert a3 == a4

a5 = list(accumulate(range(10), operator.add))
a6 = list(accumulate(range(10), operator.add, start=0))
assert a5 == a6
Q. What did the Python 3.0 Whatsnew document have to say about reduce()?
A. "Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
Q. What would this look like in real code?
A. We have almost no real-world examples, but here is one from a StackExchange post:
# assumes: from itertools import cycle, count, plus the proposed accumulate(..., start=...)
def wsieve():       # wheel-sieve, by Will Ness.  ideone.com/mqO25A->0hIE89
    wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
             4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
    cs = accumulate(cycle(wh11), start=11)
    yield(next(cs))         # cf. ideone.com/WFv4f
    ps = wsieve()           # codereview.stackexchange.com/q/92365/9064
    p = next(ps)            # 11
    psq = p*p               # 121
    D = dict(zip(accumulate(wh11, start=0), count(0)))  # start from
    sieve = {}
    for c in cs:
        if c in sieve:
            wheel = sieve.pop(c)
            for m in wheel:
                if m not in sieve:
                    break
            sieve[m] = wheel        # sieve[143] = wheel@187
        elif c < psq:
            yield c
        else:   # (c==psq)
            # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
            x = [p*d for d in wh11]
            i = D[(p-11) % 210]
            wheel = accumulate(cycle(x[i:] + x[:i]), start=psq)
            p = next(ps); psq = p*p
            next(wheel); m = next(wheel)
            sieve[m] = wheel
I'm not sure if there is any interest from others, but I have frequently come across cases where I would like to compare the items in one list against another, similar to relational algebra. For example: are the file names in A also in B, and if so, return a new list with those items? Long story short, I wrote some functions to do that. They are quite simple and fast (due to timsort in no small part). Even the plain Python code is faster than the built-in set operations (afaik). I created a GitHub repository and put the ones I thought the community might like in there: https://github.com/ponderpanda/listex
An example would be:

a = [1, 2, 3, 4, 5, 6]
b = [1, 3, 7, 4]
list_intersection(a, b, sorta=True, sortb=True)

returns [1, 3, 4].
The complexity ends up being roughly linear in the longer of the two lists. Once they grow beyond a few thousand items, the speed difference compared with set() operations becomes significant. There are some details about naming, unique values (not necessary), and sort order that would probably need to be ironed out if they were to be included with the built-in list type. I'm not qualified to do that work, but I'd be happy to help.
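For illustration, here is a minimal merge-based sketch of the idea (the listex repository's actual implementation may differ; sorta/sortb are assumed to mean "sort this input first", mirroring the example above):

def list_intersection(a, b, sorta=False, sortb=False):
    if sorta:
        a = sorted(a)
    if sortb:
        b = sorted(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):   # single linear pass over both lists
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:                          # common item found
            out.append(a[i])
            i += 1
            j += 1
    return out

print(list_intersection([1, 2, 3, 4, 5, 6], [1, 3, 7, 4], sorta=True, sortb=True))
# [1, 3, 4]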
Best Regards,
Richard Higginbotham
I'm parsing configs for domain filtering rules, and they come as a list.
However, str.endswith requires a tuple. So I need to use
str.endswith(tuple(list)). I don't know the reasoning for this, but why
not just accept a list as well?
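For context, the current workaround looks something like this (the names below are hypothetical):

suffixes = [".example.com", ".example.org"]   # loaded from the config
domains = ["a.example.com", "b.example.net"]

# str.endswith accepts a str or a tuple of str, but not a list,
# so the list has to be converted first:
blocked = [d for d in domains if d.endswith(tuple(suffixes))]
print(blocked)   # ['a.example.com']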
I keep finding myself needing to test for objects that support
subscripting. This is one case where EAFP is *not* actually easier:
try:
    obj[0]
except TypeError:
    subscriptable = False
except (IndexError, KeyError):
    subscriptable = True
else:
    subscriptable = True

if subscriptable:
    ...
But I don't like manually testing for it like this:
if getattr(obj, '__getitem__', None) is not None: ...
because it is wrong. (It's wrong because an object with __getitem__
defined as an instance attribute isn't subscriptable; it has to be defined
on the class, or a superclass.)
But doing it correctly is too painful:
if any(getattr(T, '__getitem__', None) is not None for T in type(obj).mro()): ...
and besides I've probably still got it wrong in some subtle way. What
I'd really like to do is use the collections.abc module to do the check:
if isinstance(obj, collections.abc.Subscriptable): ...
in the same way we can check for Sized, Hashable etc.
Alternatively, if we had a getclassattr that skipped the instance
attributes, I could say:
if getclassattr(obj, '__getitem__', None) is not None: ...
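For what it's worth, here is a minimal sketch of such a getclassattr, assuming it simply walks the MRO and ignores instance attributes (the name and signature are hypothetical):

_missing = object()

def getclassattr(obj, name, default=_missing):
    # Look the attribute up on the class and its superclasses only,
    # deliberately skipping anything set on the instance itself.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    if default is _missing:
        raise AttributeError(name)
    return default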
(1) Am I doing it wrong? Perhaps I've missed some already existing
solution to this.
(2) If not, is there any reason why we shouldn't add Subscriptable to
the collection.abc module? I think I have the implementation:
from abc import ABCMeta, abstractmethod

class Subscriptable(metaclass=ABCMeta):
    __slots__ = ()

    @abstractmethod
    def __getitem__(self, idx):
        return None

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Subscriptable:
            return _check_methods(C, "__getitem__")
        return NotImplemented
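To try this outside the collections.abc module itself, here is a minimal stand-in for the module's private _check_methods helper (same logic as the real one), followed by the checks I'd expect to hold:

def _check_methods(C, *methods):
    # The method must appear somewhere in the class's MRO
    # and not be explicitly set to None.
    mro = C.__mro__
    for method in methods:
        for B in mro:
            if method in B.__dict__:
                if B.__dict__[method] is None:
                    return NotImplemented
                break
        else:
            return NotImplemented
    return True

assert isinstance([], Subscriptable)
assert isinstance({}, Subscriptable)
assert not isinstance(42, Subscriptable)

class Sneaky:
    pass

s = Sneaky()
s.__getitem__ = lambda idx: None          # instance attribute only
assert not isinstance(s, Subscriptable)   # correctly rejected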
Comments, questions, flames?
--
Steven
Among other places, python-ideas was recommended as a place to go.
In the meantime I have been discussing this on pypa/pip (mainly), and also on wheel and packaging. I even submitted PRs, but the PRs are only needed if the tag is inadequate.
So, part of my reason for being here is to figure out whether I can call the current tag algorithm a bug, or whether correcting it is a feature.
I am trying to approach this the Python way rather than be seen as a bull in a china shop.
Thanks for replying!
Sent from my iPhone
It's pretty wasteful to use a dynamic storage dictionary to hold the data
of a "struct-like data container".
Users can currently add `__slots__` manually to a `@dataclass` class,
but that means you can no longer use default values, and the manual typing
gets very tedious.
I compared the RAM usage and benchmarked the popular attrs library vs.
dataclass, and saw the following result: slots win heavily in the memory
usage department, regardless of whether you use dataclass or attrs. And
a dataclass with manually written slots uses 8 bytes less than
attrs-with-slots (a static number; it does not change based on how many
fields the class has). But dataclass loses with its lack of features, lack
of default values if slots are used, and its tedious way of writing slots
manually (see class "D").
Here are the per-instance sizes in bytes for the classes:
```
attrs size 512
attrs-with-slots size 200
dataclass size 512
dataclass-with-slots size 192
```
As for data access benchmarks: the results varied too much between runs to
draw any conclusions, except to say that slots were slightly faster than
dictionary-based storage, and that there's no real difference between the
dataclass and attrs libraries in access speed.
Here is the full benchmark code:
```
import attr
from dataclasses import dataclass
from pympler import asizeof
import time

# every additional field adds 88 bytes
@attr.s
class A:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 40 bytes
@attr.s(slots=True)
class B:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 88 bytes
@dataclass
class C:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

# every additional field adds 40 bytes
@dataclass
class D:
    __slots__ = {"a", "b", "c", "d"}
    a: int
    b: int
    c: int
    d: int

Ainst = A()
Binst = B()
Cinst = C()
Dinst = D(0, 4, 2, 8)
print("attrs size", asizeof.asizeof(Ainst))                  # 512 bytes
print("attrs-with-slots size", asizeof.asizeof(Binst))       # 200 bytes
print("dataclass size", asizeof.asizeof(Cinst))              # 512 bytes
print("dataclass-with-slots size", asizeof.asizeof(Dinst))   # 192 bytes

s = time.perf_counter()
for i in range(250000000):
    x = Ainst.a
elapsed = time.perf_counter() - s
print("elapsed attrs:", elapsed * 1000, "milliseconds")

s = time.perf_counter()
for i in range(250000000):
    x = Binst.a
elapsed = time.perf_counter() - s
print("elapsed attrs-with-slots:", elapsed * 1000, "milliseconds")

s = time.perf_counter()
for i in range(250000000):
    x = Cinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass:", elapsed * 1000, "milliseconds")

s = time.perf_counter()
for i in range(250000000):
    x = Dinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass-with-slots:", elapsed * 1000, "milliseconds")
```
Also note that it IS possible to annotate attrs classes using PEP 526
annotations (i.e. `a: int = 0` instead of `a = attr.ib(type=int, default=0)`),
but then you lose out on a bunch of the extra features that are
specified as named parameters to attr.ib (such as validators, kw_only
parameters, etc.).
Anyway, the gist of everything is: Slots heavily beat dictionaries,
reducing the RAM usage to less than half of the current dataclass
implementation.
My proposal: implement `@dataclass(slots=True)`, which does the same thing
as attrs: replace the class with a modified class that has a `__slots__`
attribute instead of a `__dict__`, while fully supporting default values
in the process.
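As a proof of concept, here is a rough sketch of how a decorator could retrofit slots onto a dataclass while keeping default values. It follows the known rebuild-the-class recipe and only illustrates what `@dataclass(slots=True)` might do internally; `add_slots` is a name invented here:
```
from dataclasses import dataclass, fields

def add_slots(cls):
    # Build a copy of the class dict with __slots__ generated from the
    # dataclass fields. Class-level defaults are removed because they
    # would shadow the slot descriptors; the generated __init__ already
    # carries the default values, so nothing is lost.
    cls_dict = dict(cls.__dict__)
    field_names = tuple(f.name for f in fields(cls))
    cls_dict["__slots__"] = field_names
    for name in field_names:
        cls_dict.pop(name, None)
    cls_dict.pop("__dict__", None)       # drop the old __dict__ descriptor
    cls_dict.pop("__weakref__", None)
    new_cls = type(cls.__name__, cls.__bases__, cls_dict)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls

@add_slots
@dataclass
class E:
    a: int = 0
    b: int = 4

e = E()
assert not hasattr(e, "__dict__")   # instances now use slots
assert (e.a, e.b) == (0, 4)         # defaults still work
```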
Hello,
So when coding, at different stages we need different levels of error handling: for example, at the proof-of-concept stage, initial implementation, alpha, release testing, etc.
With some bits of code we can simulate enabling/disabling error handling.
But it would be very helpful to have such a thing built into the language, with some more pythonic salt 😁.
For example, assume this behavior:
************
SetExceptionLevel(50)
try:
    x = 1 / 0
except(0) Exception as e:
    print(e)
except(40) ZeroDivisionError as e1:
    x = math.inf
except(40) ValueError as e2:
    x = None
except(60) Exception as e3:
    raise e3
****************
That is, one exception can have more than one handler, and each handler has a level that enables it.
With the level set to 50, only the first two handlers run in this example;
if we want the last one to run as well, we would change the first line to "SetExceptionLevel(70)", for example.
The third handler, for ValueError, will never be triggered here.
And if no handler meets the requirements (level and exception type), the error is raised outward.
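For comparison, here is a minimal sketch of how this could be simulated in today's Python. EXCEPTION_LEVEL, handle_leveled, and the handler-table layout are all names invented for this illustration:

import math

EXCEPTION_LEVEL = 50          # plays the role of SetExceptionLevel(50)

def reraise(e):
    raise e

def handle_leveled(exc, handlers, level=EXCEPTION_LEVEL):
    # Run every handler whose level is enabled and whose type matches;
    # re-raise if no enabled handler matched.
    matched = False
    for min_level, exc_type, handler in handlers:
        if level >= min_level and isinstance(exc, exc_type):
            handler(exc)
            matched = True
    if not matched:
        raise exc

results = {}
try:
    x = 1 / 0
except Exception as e:
    handle_leveled(e, [
        (0, Exception, print),                                          # except(0) Exception
        (40, ZeroDivisionError, lambda e: results.update(x=math.inf)),  # except(40) ZeroDivisionError
        (40, ValueError, lambda e: results.update(x=None)),             # except(40) ValueError
        (60, Exception, reraise),                                       # except(60) Exception
    ])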
Hello,
Several libraries have complex objects but no comparison operators for them
other than "is", which only checks whether we are comparing an object with itself.
It would be quite nice to be able to compare any two objects together.
I made this function in Python as a starting point:
https://gist.github.com/SebastienEske/5a9c04e718becd93b7928514e80f0211
I know that it needs some improvement to protect against infinite loops, and
I don't like the hardcoded check for strings; it should probably be a
parameter that lets the function directly compare certain types.
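For reference, here is a minimal sketch of the general shape such a function could take. This is not the gist's code; deep_eq and its cycle guard are invented here purely for illustration:

def deep_eq(a, b, _seen=None):
    # Recursively compare two objects by type, contents, and attributes.
    if _seen is None:
        _seen = set()
    key = (id(a), id(b))
    if key in _seen:               # guard against reference cycles
        return True
    _seen.add(key)
    if type(a) is not type(b):
        return False
    if isinstance(a, (int, float, complex, str, bytes, bool, type(None))):
        return a == b
    if isinstance(a, dict):
        return (a.keys() == b.keys()
                and all(deep_eq(a[k], b[k], _seen) for k in a))
    if isinstance(a, (set, frozenset)):
        return a == b
    if isinstance(a, (list, tuple)):
        return (len(a) == len(b)
                and all(deep_eq(x, y, _seen) for x, y in zip(a, b)))
    # fall back to comparing instance attributes, then plain equality
    da, db = getattr(a, "__dict__", None), getattr(b, "__dict__", None)
    if da is not None and db is not None:
        return deep_eq(da, db, _seen)
    return a == b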
What do you think?
Best regards,
*Sébastien Eskenazi*
<https://www.pixelz.com>
Pixelz
15th Floor, Detech Tower 2 Building, 107 Nguyen Phong Sac,
Hanoi, Vietnam
<https://www.linkedin.com/in/sebastieneskenazi/>
Idea: how about an alternative syntax specifically for argument list
definitions, to be able to write down a dict holding an argument list in
a simpler form, namely with function-like syntax:

mystyle = dict ** (linestyle="dotted", marker="s", color=(0.1, 0.1, 0.0), markersize=4)

as a synonym for the common

mystyle = {"linestyle": "dotted", "marker": "s", "color": (0.1, 0.1, 0.0), "markersize": 4}

(The example is from Matplotlib plot line styles.)
Pros: the same look as function arguments, thus less visual confusion and
less re-typing. With such syntax I could simply copy-paste, say, examples
from library docs into a separate dict "mystyle", ready to insert into
other functions via dict unpacking. If I need to move some argument from
"mystyle" into the function call, no re-typing is needed. It also looks
significantly cleaner, with fewer quotes and fewer colons.
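For comparison, here is a short sketch of what is already possible today with the plain dict() constructor, which accepts keyword arguments directly (the plot() stand-in below is hypothetical):

mystyle = dict(linestyle="dotted", marker="s", color=(0.1, 0.1, 0.0), markersize=4)

def plot(xs, ys, *, linestyle="solid", marker="", color=(0, 0, 0), markersize=1):
    # stand-in for a plotting call such as Matplotlib's plot()
    print(linestyle, marker, color, markersize)

plot([1, 2, 3], [4, 5, 6], **mystyle)   # dict unpacking into the call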
Mikhail