From what little documentation I could find, providing a stride on the
assignment target for a list is supposed to trigger 'advanced slicing',
causing element-wise replacement - and hence requiring that the source
iterable have the appropriate number of elements.
>>> a = [0,1,2,3]
>>> a[::2] = [4,5]
>>> a
[4, 1, 5, 3]
>>> a[::2] = [4,5,6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
This is in contrast to regular slicing (*without* a stride), which allows
replacing a *range* with another sequence of arbitrary length.
>>> a = [0,1,2,3]
>>> a[:3] = [4]
>>> a
[4, 3]
Issue
=====
When, however, a stride of `1` is specified, advanced slicing is not
triggered.
>>> a = [0,1,2,3]
>>> a[:3:1] = [4]
>>> a
[4, 3]
If advanced slicing had been triggered, there should have been a ValueError
instead.
Expected behaviour:
>>> a = [0,1,2,3]
>>> a[:3:1] = [4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 1 to extended slice of size 3
I think that is an inconsistency in the language that should be fixed.
Why do we need this?
====================
One may want this as an extra check as well, ensuring that the list does
not change size. Depending on the implementation, it may come with
performance benefits too.
One could, though, argue that you still get the same result if you do
everything correctly:
>>> a = [0,1,2,3]
>>> a[:3:1] = [4,5,6]
>>> a
[4, 5, 6, 3]
But I disagree: when the source has the wrong length, there should be an error.
*Strides that are not None should always trigger advanced slicing.*
Other Data Types
================
This change should also be applied to bytearray, etc., though see below.
Concerns
========
It may break some code that uses advanced slicing syntax but expects
regular slicing to occur. These cases should be exceptionally rare, and
the error message should be clear enough to allow fixes.
If the implementation relies on `slice.indices(len(seq))[2] == 1` to
decide between advanced and regular slicing, that would require some
refactoring. If it merely checks `slice.step in (1, None)`, then this
could easily be replaced by checking against `None` alone.
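The difference between the two checks can be illustrated with plain `slice` objects (a sketch; the variable names are mine):

```python
# Check 1: normalise via slice.indices() first, then inspect the stride.
# After normalisation, an explicit stride of 1 and no stride at all are
# indistinguishable -- both come back as step == 1.
a = [0, 1, 2, 3]
s_explicit = slice(None, 3, 1)   # a[:3:1]
s_plain = slice(None, 3)         # a[:3]
assert s_explicit.indices(len(a)) == (0, 3, 1)
assert s_plain.indices(len(a)) == (0, 3, 1)

# Check 2: inspect the raw slice object. Here the explicit stride is
# still visible, so `step is None` could select regular slicing and
# anything else advanced slicing, as proposed.
assert s_explicit.step == 1
assert s_plain.step is None
```

So only the second form of check makes the proposed behaviour implementable without renormalising the slice.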
Will there be issues with syntax consistency with other data types, in
particular outside the core library?
- I have always found the dynamic behaviour of lists with respect to
non-advanced slicing somewhat peculiar in the first place, though,
undeniably, it can be immensely useful.
- Most external data types with fixed memory, such as numpy arrays, do not
have this dynamic flexibility, and regular slicing in assignment targets
behaves the same as advanced slicing. The proposed change would increase
consistency with these other data types.
More surprises
==============
>>> import array
>>> a = array.array('i', [1,2,3,4,5])
>>> a[1::2] = a[3:3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 0 to extended slice of size 2
and likewise for lists:
>>> a = [1,2,3,4,5]
>>> a[1::2] = a[3:3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 0 to extended slice of size 2
>>> a = bytearray(b'12345')
>>> a[1::2] = a[3:3]
>>> a
bytearray(b'135')
but numpy
>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> a[1::2] = a[3:3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (0) into shape (2)
and
>>> import numpy as np
>>> a = np.array([1,2,3,4,5])
>>> a[1:2] = a[3:3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (0) into shape (1)
The latter two are as expected; memoryview behaves the same.
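The memoryview behaviour referred to above can be checked directly (a quick sketch; because the underlying buffer cannot be resized, the source must match the slice exactly, with or without a stride):

```python
m = memoryview(bytearray(b'12345'))

# Regular slice with a size-0 source: memoryview refuses, unlike bytearray.
try:
    m[1:2] = b''
except ValueError as exc:
    print('regular slice:', exc)

# Extended slice with a size-0 source: also refused.
try:
    m[1::2] = b''
except ValueError as exc:
    print('extended slice:', exc)
```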
Issue 2
=======
Whereas NumPy is known to behave differently, being a data type with a
fixed memory layout, and is not part of the standard library anyway, I find
the difference in behaviour between lists and bytearrays disconcerting.
This should be resolved into one consistent behaviour.
Proposal 2
==========
Arrays and bytearrays should adopt the same advanced slicing behaviour I
suggest for lists.
Concerns 2
==========
This has the potential for many more side effects in existing code, but as
before, in most cases a clear error message will be triggered.
Summary
=======
I find it unacceptable, as a matter of good language design, that there is
such a wide range of slicing behaviour in assignment targets across the
native (and standard library) data types of seemingly similar kind, and
that users have to figure out for each data type by testing - or at the
very least remember, if documented - how it behaves when sliced in an
assignment target. There should at the very least be consistent behaviour,
ideally one with a clear user interface as suggested for lists.
-Alexander
As far as I can see, a comprehension like
alist = [f(x) for x in range(10)]
is better than a for-loop
alist = []
for x in range(10):
    alist.append(f(x))
because the former shows every element of the list explicitly, so that we don't need to model `append` mentally.
But when it comes to something like
[f(x) + g(f(x)) for x in range(10)]
you find you have to sacrifice some readability if you don't want to evaluate f(x) twice, which might slow down your code.
Someone may argue that one can write
[y + g(y) for y in [f(x) for x in range(10)]]
but it's not as clear as showing what `y` is in a subsequent clause, not to mention that another temporary list gets built in the process.
We can even replace every comprehension with map and filter, but that would face the same problems.
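For completeness, one more workaround that exists today binds the temporary via a one-element inner loop; `f` and `g` here are illustrative stand-ins:

```python
def f(x):
    return x * 2

def g(y):
    return y + 1

# `for y in [f(x)]` binds y to f(x) exactly once per x -- it works, but
# reads as a trick rather than an explicit assignment.
result = [y + g(y) for x in range(10) for y in [f(x)]]
assert result == [f(x) + g(f(x)) for x in range(10)]
```

This avoids both the double evaluation and the intermediate list, at the cost of the very readability the thread is about.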
In a word, what I'm arguing is that we need a way to assign temporary variables in a comprehension.
In my opinion, code like
[y + g(y) for x in range(10) **some syntax for `y=f(x)` here**]
is more natural than any solution we now have.
And that's why I propose the new syntax: it's clear, explicit and readable, and adds nothing beyond the functionality of the present comprehensions, so it's not complicated.
And I hope the discussion could focus more on whether we should allow assigning temporary variables in comprehensions rather than how to solve the specific example I mentioned above.
The numbers module provides very useful ABCs for the 'numeric tower', able to abstract away the differences between Python primitives and, for example, numpy primitives.
I could not find any equivalent for Booleans.
However numpy defines np.bool too, so being able to have an abstract Boolean class for both python bool and numpy bool would be great.
Here is a version that I included in valid8 in the meantime
-----------------------
from abc import ABCMeta, abstractmethod

class Boolean(metaclass=ABCMeta):
    """
    An abstract base class for booleans, similar to what is available in
    numbers; see https://docs.python.org/3.5/library/numbers.html
    """
    __slots__ = ()

    @abstractmethod
    def __bool__(self):
        """Return a builtin bool instance. Called for bool(self)."""

    @abstractmethod
    def __and__(self, other):
        """self & other"""

    @abstractmethod
    def __rand__(self, other):
        """other & self"""

    @abstractmethod
    def __xor__(self, other):
        """self ^ other"""

    @abstractmethod
    def __rxor__(self, other):
        """other ^ self"""

    @abstractmethod
    def __or__(self, other):
        """self | other"""

    @abstractmethod
    def __ror__(self, other):
        """other | self"""

    @abstractmethod
    def __invert__(self):
        """~self"""

# register bool and numpy bool_ as virtual subclasses
# so that issubclass(bool, Boolean) = issubclass(np.bool_, Boolean) = True
Boolean.register(bool)

try:
    import numpy as np
    Boolean.register(np.bool_)
except ImportError:
    # numpy is not installed; silently skip registration
    pass
---------------------------
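To show what the virtual-subclass registration buys you, here is a condensed, self-contained restatement of the pattern above (only `__bool__` is kept, for brevity):

```python
from abc import ABCMeta, abstractmethod

class Boolean(metaclass=ABCMeta):
    __slots__ = ()

    @abstractmethod
    def __bool__(self):
        """Return a builtin bool instance. Called for bool(self)."""

# Virtual registration: bool gains no behaviour, but the isinstance /
# issubclass checks now succeed, exactly as with numbers.Integral etc.
Boolean.register(bool)

assert issubclass(bool, Boolean)
assert isinstance(True, Boolean)
```

Any library accepting "something boolean" could then type-check against `Boolean` rather than hard-coding `bool` or `np.bool_`.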
If that topic was already discussed and settled in the past, please ignore this thread - apologies for not being able to find it.
Best regards
Sylvain
Hi,
This mail is the consequence of a true story, a story where CPython
got defeated by JavaScript, Java, C# and Go.
One of the teams at the company where I'm working had a kind of
benchmark to compare the different languages on top of their
respective "official" web servers such as Node.js, Aiohttp, Dropwizard
and so on. The test by itself was pretty simple and tried to exercise
the happy path of the logic: a piece of code that fetches N rules from
another system and then applies them to X whatevers, also fetched from
another system, something like this:
def filter(rule, whatever):
    if rule.x in whatever.x:
        return True

def count_matches():
    cnt = 0
    rules = get_rules()
    whatevers = get_whatevers()
    for rule in rules:
        for whatever in whatevers:
            if filter(rule, whatever):
                cnt = cnt + 1
    return cnt
The performance of Python compared with the other languages was almost
10x slower. It's true that they didn't optimize the code, but they
didn't for any of the languages either, so all of them paid the same
cost in terms of iterations.
Once I saw the code I proposed a pair of changes: remove the call to
the filter function, making it "inline", and cache the rule's
attribute lookup, something like this:
for rule in rules:
    x = rule.x
    for whatever in whatevers:
        if x in whatever.x:
            cnt += 1
Just doing these "silly" things boosted CPython's performance 3x-4x.
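The attribute-caching effect is easy to reproduce with a microbenchmark (names and sizes here are illustrative, not the original benchmark):

```python
import timeit

class Rule:
    def __init__(self, x):
        self.x = x

rule = Rule({1, 2, 3})
whatevers = list(range(1000))

def uncached():
    cnt = 0
    for w in whatevers:
        if w in rule.x:      # rule.x looked up on every iteration
            cnt += 1
    return cnt

def cached():
    cnt = 0
    x = rule.x               # rule.x looked up exactly once
    for w in whatevers:
        if w in x:
            cnt += 1
    return cnt

assert uncached() == cached() == 3
print('uncached:', timeit.timeit(uncached, number=500))
print('cached:  ', timeit.timeit(cached, number=500))
```

On CPython the cached version is measurably faster because it replaces a `LOAD_GLOBAL` plus `LOAD_ATTR` per iteration with a single local variable load.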
The case of the rule cache is IMHO very striking: we have plenty of
examples in many repositories where caching non-local variables is a
widely used pattern. Why hasn't a way to do it implicitly and by
default been considered?
The case of the slowness of calling functions in CPython comes up quite
recurrently, and it looks like it remains an unsolved problem.
Sure, I'm missing many things and I do not have all of the information.
With this mail I want to gather the information that might help me
understand why we are where we are - in CPython - regarding these two
slow patterns.
This could be considered an unimportant thing, but it's more relevant
than one might expect, at least IMHO. If the default code that you can
write in a language is slow by default, and an alternative exists to
make it faster, the language is doing something wrong.
BTW: PyPy looks like it is immune [1]
[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582
--
--pau
Hello,
Brief problem statement: Let's say I have a custom file type (say,
with extension .foo) and these .foo files are included in a package
(along with other Python modules with standard extensions like .py and
.so), and I want to make these .foo files importable like any other
module.
On its face, importlib.machinery.FileFinder makes this easy. I make a
loader for my custom file type (say, FooSourceLoader), and I can use
the FileFinder.path_hook helper like:
sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
sys.path_importer_cache.clear()
Great--now I can import my .foo modules like any other Python module.
However, any standard Python modules now cannot be imported. The way
PathFinder sys.meta_path hook works, sys.path_hooks entries are
first-come-first-serve, and furthermore FileFinder.path_hook is very
promiscuous--it will take over module loading for *any* directory on
sys.path, regardless what the file extensions are in that directory.
So although this mechanism is provided by the stdlib, it can't really
be used for this purpose without breaking imports of normal modules
(and maybe it's not intended for that purpose, but the documentation
is unclear).
There are a number of different ways one could get around this. One
might be to pass FileFinder.path_hook loader/extension pairs for all
the basic file types known to the Python interpreter. Unfortunately
there's no great way to get that information. *I* know that I want to
support .py, .pyc, .so etc. files, and I know which loaders to use for
them. But that's really information that should belong to the Python
interpreter, and not something that should be reverse-engineered. In
fact, there is such a mapping provided by
importlib.machinery._get_supported_file_loaders(), but this is not a
publicly documented function.
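For reference, here is what that workaround looks like using only public `importlib.machinery` names. It mirrors what `_get_supported_file_loaders()` returns, but could drift from the interpreter's actual list, which is exactly the objection above; `FooSourceLoader` is the hypothetical loader from the problem statement:

```python
import sys
from importlib.machinery import (
    FileFinder, SourceFileLoader, SourcelessFileLoader,
    ExtensionFileLoader, SOURCE_SUFFIXES, BYTECODE_SUFFIXES,
    EXTENSION_SUFFIXES,
)

# Stand-in for the custom loader; subclassing SourceFileLoader keeps the
# sketch minimal.
class FooSourceLoader(SourceFileLoader):
    pass

# Rebuild the standard loader/suffix pairs, append the custom type, and
# install a single FileFinder hook that handles both.
loader_details = [
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
    (FooSourceLoader, ['.foo']),
]
sys.path_hooks.insert(0, FileFinder.path_hook(*loader_details))
sys.path_importer_cache.clear()
```

With this installed, normal imports keep working because the standard suffixes are all still handled by the same hook.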
One could probably think of other workarounds. For example you could
implement a custom sys.meta_path hook. But I think it shouldn't be
necessary to go to higher levels of abstraction in order to do
this--the default sys.path handler should be able to handle this use
case.
In order to support adding support for new file types to
sys.path_hooks, I ended up implementing the following hack:
#############################################################
import os
import sys
from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory. So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook as follows: it should be
    inserted early on sys.path_hooks so that it can supersede anything else.
    Its find_spec method then calls each hook on sys.path_hooks after itself
    and, for each hook that can handle the given sys.path entry, it calls the
    hook to create a finder, and calls that finder's find_spec. So each
    sys.path_hooks entry is tried until a spec is found or all finders are
    exhausted.
    """

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ImportError('only directories are supported', path=path)
        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None
        for hook in sys.path_hooks:
            if hook is self.__class__:
                continue
            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass
            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass
            try:
                self._finder_cache[hook] = finder
            except TypeError:
                # The hook is unhashable for some reason so we don't bother
                # caching it
                pass
            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if spec is not None:
                    return spec
        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            # Failed hooks are cached as None, so guard against them
            if finder is not None:
                finder.invalidate_caches()

    @classmethod
    def install(cls):
        sys.path_hooks.insert(0, cls)
        sys.path_importer_cache.clear()
#############################################################
This works, for example, like:
>>> MetaFileFinder.install()
>>> sys.path_hooks.append(FileFinder.path_hook((SourceFileLoader, ['.foo'])))
And now, .foo modules are importable, without breaking support for the
built-in module types.
This is still overkill though. I feel like there should instead be a
way to, say, extend a sys.path_hooks hook based on FileFinder so as to
be able to support loading other file types, without having to go
above the default sys.meta_path hooks.
A small, but related problem I noticed in the way FileFinder.path_hook
is implemented, is that for almost *every directory* that gets cached
in sys.path_importer_cache, a new FileFinder instance is created with
its own self._loaders attribute, each containing a copy of the same
list of (loader, extensions) tuples. I calculated that on one large
project this alone accounted for nearly 1 MB. Not a big deal in the
grand scheme of things, but still a bit overkill.
ISTM it would kill two birds with one stone if FileFinder were
changed, or there were a subclass thereof, that had a class attribute
containing the standard loader/extension mappings. This in turn could
simply be appended to in order to support new extension types.
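A sketch of what such a subclass might look like (this is a proposal sketch, not an existing API; `ExtensibleFileFinder` and its `register` method are invented names):

```python
import sys
from importlib.machinery import (
    FileFinder, SourceFileLoader, SourcelessFileLoader,
    ExtensionFileLoader, SOURCE_SUFFIXES, BYTECODE_SUFFIXES,
    EXTENSION_SUFFIXES,
)

class ExtensibleFileFinder(FileFinder):
    # Shared class-level loader/suffix pairs: one list for all instances,
    # instead of a fresh copy per cached directory.
    _loader_details = [
        (ExtensionFileLoader, EXTENSION_SUFFIXES),
        (SourceFileLoader, SOURCE_SUFFIXES),
        (SourcelessFileLoader, BYTECODE_SUFFIXES),
    ]

    def __init__(self, path):
        super().__init__(path, *self._loader_details)

    @classmethod
    def register(cls, loader, suffixes):
        """Append a new file type; takes effect for newly created finders."""
        cls._loader_details.append((loader, suffixes))
        sys.path_importer_cache.clear()

# Hypothetical usage:
#   ExtensibleFileFinder.register(FooSourceLoader, ['.foo'])
#   sys.path_hooks.insert(0, ExtensibleFileFinder)
#   sys.path_importer_cache.clear()
```

Since the class itself is callable with a path, it can serve directly as a `sys.path_hooks` entry, and registering a new extension becomes a one-line append.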
Thanks,
E
The recent thread on variable assignment in comprehensions has
prompted me to finally share
https://gist.github.com/ncoghlan/a1b0482fc1ee3c3a11fc7ae64833a315 with
a wider audience (see the comments there for some notes on iterations
I've already been through on the idea).
== The general idea ==
The general idea would be to introduce a *single* statement local
reference using a new keyword with a symbolic prefix: "?it"
* `(?it=expr)` is a new atomic expression for an "it reference
binding" (whitespace would be permitted around "?it" and "=", but PEP
8 would recommend against it in general)
* subsequent subexpressions (in execution order) can reference the
bound subexpression using `?it` (an "it reference")
* `?it` is reset between statements, including before entering the
suite within a compound statement (if you want a persistent binding,
use a named variable)
* for conditional expressions, put the reference binding in the
conditional, as that gets executed first
* to avoid ambiguity, especially in function calls (where it could be
confused with keyword argument syntax), the parentheses around
reference bindings are always required
* unlike regular variables, you can't close over statement local
references (the nested scope will get an UnboundLocalError if you try
it)
The core inspiration here is English pronouns (hence the choice of
keyword): we don't generally define arbitrary terms in the middle of
sentences, but we *do* use pronouns to refer back to concepts
introduced earlier in the sentence. And while it's not an especially
common practice, pronouns are sometimes even used in a sentence
*before* the concept they refer to ;)
If we did pursue this, then PEPs 505, 532, and 535 would all be
withdrawn or rejected (with the direction being to use an it-reference
instead).
== Examples ==
`None`-aware attribute access:
value = ?it.strip()[4:].upper() if (?it=var1) is not None else None
`None`-aware subscript access:
value = ?it[4:].upper() if (?it=var1) is not None else None
`None`-coalescense:
value = ?it if (?it=var1) is not None else ?it if (?it=var2) is
not None else var3
`NaN`-coalescence:
    value = ?it if not math.isnan((?it=var1)) else ?it if not
math.isnan((?it=var2)) else var3
Conditional function call:
value = ?it() if (?it=calculate) is not None else default
Avoiding repeated evaluation of a comprehension filter condition:
filtered_values = [?it for x in keys if (?it=get_value(x)) is not None]
Avoiding repeated evaluation for range and slice bounds:
range((?it=calculate_start()), ?it+10)
data[(?it=calculate_start()):?it+10]
Avoiding repeated evaluation in chained comparisons:
value if (?it=lower_bound()) <= value < ?it+tolerance else 0
Avoiding repeated evaluation in an f-string:
print(f"{?it=get_value()!r} is printed in pure ASCII as {?it!a}
and in Unicode as {?it}")
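For comparison, the closest spellings available in current Python for two of the examples above, using an explicit temporary name in place of ?it (`var1`, `get_value`, and `keys` are illustrative stand-ins, not part of the proposal):

```python
# None-aware attribute access, today: bind the temporary on its own line.
var1 = "  spam1234value  "
_tmp = var1
value = _tmp.strip()[4:].upper() if _tmp is not None else None
assert value == "1234VALUE"

# Comprehension filter without double evaluation, today: bind the
# temporary via a one-element inner loop.
def get_value(x):
    return x if x % 2 else None

keys = range(10)
filtered_values = [v for x in keys for v in [get_value(x)] if v is not None]
assert filtered_values == [1, 3, 5, 7, 9]
```

Both work, but each spreads the binding away from the expression that uses it, which is the gap the ?it proposal targets.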
== Possible future extensions ==
One possible future extension would be to pursue PEP 3150, treating
the nested namespace as an it reference binding, giving:
sorted_data = sorted(data, key=?it.sort_key) given ?it=:
def sort_key(item):
return item.attr1, item.attr2
(A potential bonus of that spelling is that it may be possible to make
"given ?it=:" the syntactic keyword introducing the suite, allowing
"given" itself to continue to be used as a variable name)
Another possible extension would be to combine it references with `as`
clauses on if statements and while loops:
if (?it=pattern.match(data)) is not None as matched:
...
while (?it=pattern.match(data)) is not None as matched:
...
== Why not arbitrary embedded assignments? ==
Primarily because embedded assignments are inherently hard to read,
especially in long expressions. Restricting things to one pronoun, and
then pursuing PEP 3150's given clause in order to expand to multiple
statement local names should help nudge folks towards breaking things
up into multiple statements rather than writing ever more complex
one-liners.
That said, the ?-prefix notation is deliberately designed such that it
*could* be used with arbitrary identifiers rather than being limited
to a single specific keyword, and the explicit lack of closure support
means that there wouldn't be any complex nested scope issues
associated with lambda expressions, generator expressions, or
container comprehensions.
With that approach, "?it" would just be an idiomatic default name like
"self" or "cls" rather than being a true keyword. Given arbitrary
identifier support, some of the earlier examples might instead be
written as:
value = ?f() if (?f=calculate) is not None else default
range((?start=calculate_start()), ?start+10)
value if (?lower=lower_bound()) <= value < ?lower+tolerance else 0
The main practical downside to this approach is that *all* the
semantic weight ends up resting on the symbolic "?" prefix, which
makes it very difficult to look up as a new Python user. With a
keyword embedded in the construct, there's a higher chance that folks
will be able to guess the right term to search for (i.e. "python it
expression" or "python it keyword").
Another downside of this more flexible option is that it likely
*wouldn't* be amenable to the "if expr as name:" syntax extension, as
there wouldn't be a single defined pronoun expression to bind the name
to.
However, the extension to PEP 3150 would allow the statement local
namespace to be given an arbitrary name:
sorted_data = sorted(data, key=?ns.sort_key) given ?ns=:
def sort_key(item):
return item.attr1, item.attr2
Cheers,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
I don't know...to me this looks downright ugly and an awkward special case.
It feels like it combines reading difficulty of inline assignment with the
awkwardness of a magic word and the ugliness of using ?. Basically, every
con of the other proposals combined...
--
Ryan (ライアン)
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
https://refi64.com/
On Feb 15, 2018 at 8:07 PM, <Nick Coghlan <ncoghlan(a)gmail.com>> wrote: