Maybe it's time to add a new module for sequence-specific functions (seqtools?). It should contain at least two classes or factory functions:

1. A view that represents a sliced subsequence: a lazy equivalent of seq[start:end:step]. This feature is implemented in the third-party module dataview [1].

2. A view that represents a linear sequence as a 2D array. Iterating this view emits non-intersecting chunks of the sequence. For example, it can be used to represent a bytes object as a sequence of 1-byte bytes objects (as in 2.x), a generalized alternative to iterbytes() from PEP 467 [2].

Neither itertools nor collections looks like a good place for these features, since they are not concrete classes and work only with sequences, not general iterables or iterators. On the other hand, mappingproxy and ChainMap look close; maybe the new module should be oriented not toward sequences but toward views.

[1] https://pypi.python.org/pypi/dataview
[2] https://www.python.org/dev/peps/pep-0467
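To make the two proposed views concrete, here is a minimal sketch. The names SliceView and ChunkView are hypothetical (they come from neither the proposal nor the dataview package), and the sketch assumes the underlying sequence does not change length after the view is created.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Lazy, read-only equivalent of seq[start:stop:step] (hypothetical name)."""

    def __init__(self, seq, start=None, stop=None, step=None):
        self._seq = seq
        # slice.indices() resolves missing and negative bounds against len(seq);
        # the resulting range then does all of the index arithmetic.
        self._range = range(*slice(start, stop, step).indices(len(seq)))

    @classmethod
    def _from_range(cls, seq, rng):
        # Internal constructor used when the index range is already normalized.
        view = cls.__new__(cls)
        view._seq = seq
        view._range = rng
        return view

    def __len__(self):
        return len(self._range)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a view gives another view over the same sequence.
            return self._from_range(self._seq, self._range[index])
        return self._seq[self._range[index]]


class ChunkView(Sequence):
    """View of a flat sequence as consecutive, non-overlapping chunks (hypothetical name)."""

    def __init__(self, seq, size):
        if size < 1:
            raise ValueError("size must be positive")
        self._seq = seq
        self._size = size

    def __len__(self):
        # Ceiling division: a shorter final chunk still counts as a chunk.
        return -(-len(self._seq) // self._size)

    def __getitem__(self, index):
        # Integer indexes only; range() normalizes negatives and raises IndexError.
        i = range(len(self))[index]
        start = i * self._size
        # Only the requested chunk is materialized, by slicing the original.
        return self._seq[start:start + self._size]


every_third = SliceView(list(range(10)), 2, None, 3)
single_bytes = ChunkView(b"abc", 1)
```

Inheriting from collections.abc.Sequence supplies iteration, containment tests, reversed(), index() and count() on top of __len__ and __getitem__, so list(single_bytes) yields 1-byte bytes objects much as iterating a bytes object did in 2.x.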
I don't want to speak for Serhiy, but it seems like he wants NumPy-like behaviors over generic sequences. I think this idea is appealing. For example, Python list has O(1) append, while the equivalent for np.ndarray would be an O(n) copy to a larger array. Expressing those NumPy affordances generically feels like a good thing.

However, maybe this is something that could live in PyPI first to stabilize APIs.

On Jul 17, 2016 1:08 PM, "Michael Selik" <michael.selik@gmail.com> wrote:
There are a number of generic implementations of these sequence algorithms:

* http://toolz.readthedocs.io/en/latest/api.html#itertoolz
* https://github.com/kachayev/fn.py#itertools-recipes
* http://funcy.readthedocs.io/en/stable/seqs.html
* http://docs.python.org/2/reference/expressions.html#slicings
* https://docs.python.org/2/library/itertools.html#itertools.islice
* https://docs.python.org/3/library/itertools.html#itertools.islice

On Jul 17, 2016 3:23 PM, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
> Maybe it's time to add a new module for sequence-specific functions
> (seqtools?). It should contain at least two classes or factory functions:
>
> 1. A view that represents a sliced subsequence. Lazy equivalent of
> seq[start:end:step]. This feature is implemented in third-party module dataview [1].

islice?

> 2. A view that represents a linear sequence as 2D array. Iterating this
> view emits non-intersecting chunks of the sequence. For example it can be used for representing the bytes object as a sequence of 1-byte bytes objects (as in 2.x), a generalized alternative to iterbytes() from PEP 467 [2].

partition? http://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.partition_all

> Neither itertools nor collections modules look good place for these
> features, since they are not concrete classes and work only with sequences, not general iterables or iterators. On other side, mappingproxy and ChainMap look close, maybe new module should be oriented not on sequences, but on views.
>
> [1] https://pypi.python.org/pypi/dataview
> [2] https://www.python.org/dev/peps/pep-0467
SortedContainers implements exactly that as a method on SortedList:

SortedList.islice(start=None, stop=None, reverse=False)

Returns an iterator that slices self from start to stop index, inclusive and exclusive respectively. When reverse is True, values are yielded from the iterator in reverse order. Both start and stop default to None which is automatically inclusive of the beginning and end.

Return type: iterator

Reference: http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontai...

I chose to limit the stride to 1 or -1 using the keyword parameter "reverse." No complaints, though it differs from traditional slice syntax.

Grant
On 18 July 2016 at 08:21, Wes Turner <wes.turner@gmail.com> wrote:
I think the existence of multiple implementations in the context of larger libraries lends weight to the notion of a "seqtools" standard library module that works with arbitrary sequences, just as itertools works with arbitrary iterables.

I don't think combining these algorithms with the "algorithms that work with arbitrary mappings" classes would make sense - for better or for worse, I think the latter is "collections" now, since that's also where the dict variants live (defaultdict, OrderedDict, Counter).

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Jul 19, 2016 at 7:48 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Additionally, itertools.islice doesn't support negative indexing and it's O(start) to get the first element rather than the O(1) that it could be for sequences.
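A small sketch of both limitations, using only the standard library. The list size and the offsets are arbitrary values chosen for illustration.

```python
from itertools import islice

data = list(range(1_000_000))

# islice treats its input as a plain iterable, so it has to step over the
# first `start` items one at a time before it can yield anything.
first_via_islice = next(islice(data, 999_990, None))

# A sequence can jump straight to the same position through __getitem__.
first_via_index = data[999_990]

# islice rejects negative bounds with a ValueError...
error = None
try:
    islice(data, -10, None)
except ValueError as exc:
    error = exc

# ...while slice.indices() plus range resolves them against len(data)
# and lets us read the tail lazily, without copying it.
tail_indexes = range(*slice(-10, None).indices(len(data)))
lazy_tail = (data[i] for i in tail_indexes)
```

Both lookups return the same element; the difference is only in how much work is done to reach it.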
With that said, I've never really worked with sequences that are big enough for the runtime complexity of normal Python slicing to actually matter. I have a feeling that in the typical case, wrapping more abstraction around slicing and creating lazy "views" wouldn't lead to any practical performance benefits. For the cases where it *does* lead to practical performance benefits, `numpy` starts to look a whole lot more attractive as an option. I wonder how many applications would actually benefit from this but can't/shouldn't switch to `numpy` due to other constraints?
--
Matt Gilson
On Jul 19, 2016 10:51 AM, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
On 18.07.16 01:21, Wes Turner wrote:
> There are a number of generic implementations of these sequence
> algorithms:

So you're looking for something like strided memoryviews for nonsequential access over sequences/iterables which are sequential? Sort of like a bitmask?

https://docs.python.org/3/library/stdtypes.html#memoryview
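For reference, a short sketch of what the built-in memoryview already offers for objects that support the buffer protocol. The sample bytes and the 2 x 4 shape are arbitrary choices for illustration.

```python
data = bytearray(b"abcdefgh")
view = memoryview(data)

# A strided slice of a memoryview is itself a view: no bytes are copied.
every_other = view[::2]
before = every_other.tobytes()

# Writes to the underlying buffer show through the existing view.
data[0] = ord("A")
after = every_other.tobytes()

# cast() can present the same flat buffer as a 2 x 4 grid of unsigned bytes.
grid = view.cast("B", shape=[2, 4])
rows = grid.tolist()

# memoryview needs the buffer protocol, so a plain list is rejected.
try:
    memoryview([1, 2, 3])
    list_supported = True
except TypeError:
    list_supported = False
```

This covers bytes-like data only; lists, tuples and other generic sequences are outside its reach, which is the gap a sequence-oriented view would have to fill.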
participants (8)
- David Mertz
- Grant Jenks
- Matt Gilson
- Michael Selik
- Nick Coghlan
- Serhiy Storchaka
- Sven R. Kunze
- Wes Turner