Hello. Currently, during star assignment, a new list is created. What was the idea behind producing a new list instead of returning an iterator? It seems to me that returning an iterator is more suited to the spirit of Python 3. There are three cases:
1. a,b,c,*d = something_iterable
2. *a,b,c,d = something_iterable
3. a,*b,c,d = something_iterable
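For reference, the current semantics always bind the starred name to a new list, whichever position the star is in; a quick check:

```python
a, *b, c = range(5)
assert (a, c) == (0, 4)
assert b == [1, 2, 3]        # the starred name is always a new list
assert isinstance(b, list)
```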
The first one is obvious. For the other two we always need to iterate through the entire iterable to obtain the values for the b,c,d (or c,d) bindings. But this can be done more memory-efficiently than is currently done (maybe I'm wrong). And we can iterate in the space of the last three (or two) variables. Some rough (simplified) Python code:
from itertools import islice, chain
from collections import deque

def good_star_exp(signature, seq):
    if signature.count('*') > 1:
        raise SyntaxError('two starred expressions in assignment')

    vrs = signature.split(',')
    idx_max = len(vrs) - 1
    star_pos, = (i for i, v in enumerate(vrs) if '*' in v)

    # First case
    if star_pos == idx_max:
        head = islice(seq, idx_max)
        tail = islice(seq, idx_max, None)
        return chain(head, (tail,))

    # Second case
    elif star_pos == 0:
        tail = deque(maxlen=idx_max)
        for seq_idx_max, v in enumerate(seq):
            tail.append(v)
        head = islice(seq, 0, seq_idx_max - (idx_max - 1))
        return chain([head], tail)

    # Third case
    else:
        head = islice(seq, star_pos)
        tail = deque(maxlen=(idx_max - star_pos))
        for seq_idx_max, v in enumerate(seq):
            tail.append(v)
        mid = islice(seq, star_pos, seq_idx_max - (idx_max - 2))
        return chain(head, [mid], tail)

ls = range(100000)
a, b, c, d = good_star_exp('a,b,c,*d', ls)
a, b, c, d = good_star_exp('*a,b,c,d', ls)
a, b, c, d = good_star_exp('a,*b,c,d', ls)
Of course this version has drawbacks (the first that come to mind):
1. Will *b see changes if the rhs is some mutable sequence?
2. Will *b be a one-way iterator or something like range?
But still it seems to me that the "iterator way" has more useful applications.
With best regards, -gdg
21.11.17 10:54, Kirill Balunov wrote:
Of course this version has drawbacks (the first that come to mind):
- Will *b see changes if the rhs is some mutable sequence?
- Will *b be a one-way iterator or something like range?
But still it seems to me that the "iterator way" has more useful applications.
Your implementation iterates seq multiple times. But iterable unpacking syntax works with an arbitrary iterable, and iterates it only once.
Changing the result of iterable unpacking will break existing code that depends on the result being a list.
And you already have mentioned a question about mutable sequence.
If these conditions and restrictions suit you, you can use your good_star_exp() in your code or share it with others. But the semantics of iterable unpacking can't be changed.
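To illustrate the one-pass point: unpacking works even with a one-shot iterator precisely because it buffers into a list as it goes:

```python
it = iter([1, 2, 3, 4, 5])
a, *b, c = it           # a single pass over the one-shot iterator
assert (a, b, c) == (1, [2, 3, 4], 5)
assert list(it) == []   # fully consumed; a second pass is impossible
```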
Your implementation iterates seq multiple times. But iterable unpacking syntax works with an arbitrary iterable, and iterates it only once.
Oh sorry, I know that my implementation iterates seq multiple times; I only provided it to show the idea. It can be much optimized at the C level. I just want to understand if it's worth the time and effort.
Changing the result of iterable unpacking will break existing code that depends on the result being a list.
Backward compatibility is an important issue, but at the same time it is the main brake on progress.
And you already have mentioned a question about mutable sequence.
If these conditions and restrictions suit you, you can use your good_star_exp() in your code or share it with others. But the semantics of iterable unpacking can't be changed.
And what do you think about something like this (deferred star evaluation)?:
a, ?*b, c, d = something_iterable
With kind regards, -gdg
21.11.17 11:27, Kirill Balunov wrote:
Your implementation iterates seq multiple times. But iterable unpacking syntax works with an arbitrary iterable, and iterates it only once.
Oh sorry, I know that my implementation iterates seq multiple times; I only provided it to show the idea. It can be much optimized at the C level. I just want to understand if it's worth the time and effort.
You can implement the first case, but for other cases you will need a storage for saving intermediate items. And using a list is a good option.
And you already have mentioned a question about mutable sequence. If these conditions and restrictions suit you, you can use your good_star_exp() in your code or share it with others. But the semantics of iterable unpacking can't be changed.
And what do you think about something like this (deferred star evaluation)?:
a, ?*b, c, d = something_iterable
This will be no different from

a, *b, c, d = something_iterable
b = iter(b)
There is nothing deferred here.
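Spelled out as a runnable check of that equivalence (using a concrete range, since something_iterable is a placeholder):

```python
a, *b, c, d = range(6)
b = iter(b)              # b is now an iterator over the already-built list
assert (a, c, d) == (0, 4, 5)
assert next(b) == 1      # the items were evaluated eagerly; only the
assert list(b) == [2, 3] # *view* of them became an iterator
```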
The only possible benefit can be in the case
a, b, ?*c = something_iterable
But I have doubts that this special case deserves introducing a new syntax.
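For what it's worth, the a, b, ?*c case is already expressible today without new syntax, since nothing after the star needs to be buffered:

```python
it = iter(range(10))
a = next(it)
b = next(it)
c = it                  # the remaining iterator; nothing is buffered
assert (a, b) == (0, 1)
assert next(c) == 2     # items are produced lazily on demand
```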
On Tue, Nov 21, 2017 at 12:27:32PM +0300, Kirill Balunov wrote:
Backward compatibility is an important issue, but at the same time it is the main brake on progress.
"Progress just means bad things happen faster." -- Terry Pratchett, "Witches Abroad"
[...]
And what do you think about something like this (deferred star evaluation)?:
a, ?*b, c, d = something_iterable
A waste of effort?
How do you defer evaluating the second and subsequent items if you evaluate the final two? Given:
import random

def gen():
    yield 999
    for i in range(100):
        yield random.random()
    yield 999
    yield 999
then
a, ?*b, c, d = gen()
has to evaluate all 100 random numbers in order to assign a, c, d all equal to 999. Making b an iterator instead of a list doesn't actually avoid evaluating anything, and it will still require as much storage as a list. The most likely implementation would:
- store the evaluated items in a list;
- assign iter(the list) as b.
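A minimal sketch of that likely implementation, assuming a concrete source (the names it and buf are mine):

```python
# hypothetical expansion of 'a, ?*b, c, d = source':
source = range(10)
it = iter(source)
a = next(it)
buf = list(it)          # everything after 'a' must be evaluated anyway
c, d = buf[-2], buf[-1]
b = iter(buf[:-2])      # an iterator, but backed by the same list storage
assert (a, c, d) == (0, 8, 9)
assert list(b) == [1, 2, 3, 4, 5, 6, 7]
```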
I suppose that there could be some way of delaying the calls to random.random() by returning a thunk, but that is likely to be more expensive in both memory and time than a simple list of floats.
Maybe the first thing I should do is improve my English :) My main point was that in many cases (in my experience) it is a waste of memory to store an entire list for the star variable (*b) instead of some kind of iterator or deferred evaluation.
How do you defer evaluating the second and subsequent items if you evaluate the final two? Given:
import random

def gen():
    yield 999
    for i in range(100):
        yield random.random()
    yield 999
    yield 999
then
a, ?*b, c, d = gen()
If I cannot copy at the Python level, I can 'tee' when 'star_pos' is reached.
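(For the record, tee() keeps the shared items in an internal buffer, so draining one copy to reach the tail buffers everything for the other copy anyway; a quick sketch of the cost:)

```python
from itertools import tee

it = iter(range(1000))
b, rest = tee(it)
# draining 'rest' to find the tail forces tee to buffer every item
# for 'b' internally -- storage equivalent to building a list
last = None
for last in rest:
    pass
assert last == 999
assert next(b) == 0   # b replays from the start, out of tee's buffer
```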
In my usual practice, the main use that I encounter when I see an assignment to a star variable is as storage which is used only if the other vars match some criterion.
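A made-up example of that pattern (the function and the criterion are hypothetical):

```python
def handle(records):
    first, *middle, last = records
    # the 'middle' list is built unconditionally, even though it is
    # only used when the endpoints match the criterion
    if first == last:
        return sum(middle)
    return None

assert handle([7, 1, 2, 3, 7]) == 6
assert handle([7, 1, 2, 3, 8]) is None
```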
With kind regards, -gdg
FWIW, here's something for working with memory-efficient sequences (and generators), which should get more features in the future:
pip install git+https://github.com/k7hoven/views
Some examples of what it does:
py> from views import seq
py> seq[::range(3), None, ::"abc", "Hi!"]
<sequence view 8: [0, 1, 2, None, 'a', 'b', 'c', 'Hi!'] >
py> seq[::range(100)]
<sequence view 100: [0, 1, 2, 3, 4, ..., 96, 97, 98, 99] >

py> from views import seq, gen
py> seq.chain([1, 2, 3], [4, 5, 6])
<sequence view 6: [1, 2, 3, 4, 5, 6] >
py> list(gen.chain([1, 2, 3], [4, 5, 6]))
[1, 2, 3, 4, 5, 6]

py> from views import range
py> range(5)
range(0, ..., 4)
py> range(1, 10, 3)
range(1, ..., 7, step=3)
py> range(1, ..., 5)
range(1, ..., 5)
py> range(1, 3, ..., 10)
range(1, ..., 9, step=2)
Sequences are perhaps more interesting than the generators, which are just there because I don't want to implicitly try to convert generators/iterators into sequences. I do intend to add at least one *explicit* mechanism.
Much of this is thread-safe, but the assumption in general is that one does not modify the original sequences. One problem is that there's no way to efficiently check if the originals have been mutated. Currently it just sometimes checks that the lengths match.
This approach can also be a big performance boost because it avoids copying stuff around in memory etc. However, many possible optimizations have not been implemented yet, so there's overhead that can be significant for small sequences. For instance, itertools could be used to optimize some features.
––Koos
On Tue, Nov 21, 2017 at 2:35 PM, Serhiy Storchaka storchaka@gmail.com wrote:
21.11.17 13:53, Kirill Balunov wrote:
If I cannot copy at the Python level, I can 'tee' when 'star_pos' is reached.
And tee() uses real RAM for saving items.
On 11/21/2017 3:54 AM, Kirill Balunov wrote:
Hello. Currently, during star assignment, a new list is created. What was the idea behind producing a new list instead of returning an iterator? It seems to me that returning an iterator is more suited to the spirit of Python 3. There are three cases:
- a,b,c,*d = something_iterable
- *a,b,c,d = something_iterable
- a,*b,c,d = something_iterable
The first one is obvious.
Right, and easily dealt with in current Python.
d = iter(something_iterable)
a, b, c = islice(d, 3)

(or 3 next(d) calls)
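Filled out into self-contained, runnable form (using a concrete range in place of the placeholder):

```python
from itertools import islice

d = iter(range(100000))
a, b, c = islice(d, 3)     # consume exactly the fixed head
assert (a, b, c) == (0, 1, 2)
assert next(d) == 3        # the rest of d stays lazy and unconsumed
```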
More typical is to pull one item off the iterator with next(), as is optionally done with csv readers.
it = iter(iterable)
header = next(it)  # such as column names
for item in it:
    process(item)
For the other two we always need to iterate through the entire iterable to obtain the values for the b,c,d (or c,d) bindings. But this can be done more memory-efficiently than currently (maybe I'm wrong).
For the general case, there is no choice but to save in a list.