For basic slices, the normal "slice(start, stop, step)" syntax works well.
But creating more complicated slices that you want to reuse across
multiple multidimensional data structures (NumPy, pandas, xarray, etc.)
quickly becomes verbose.
One idea I had was to allow creating slices by using indexing on the slice
class. So for example:
x = slice[5:1:-1, 10:20:2, 5:]
Would be equivalent to:
x = (slice(5, 1, -1), slice(10, 20, 2), slice(5, None))
Note that this wouldn't be done on a slice instance, it would be done on
the slice class. The basic idea is that it would simply return whatever is
given to __getitem__.
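For what it's worth, NumPy already ships helpers with these semantics
(np.s_ and np.index_exp), and the idea can be sketched today with a plain
helper class; the name "S" below is illustrative only, and
__class_getitem__ needs Python 3.7+:

class S:
    # __class_getitem__ receives exactly what was written between the
    # brackets (a slice, or a tuple of slices) and returns it as-is.
    def __class_getitem__(cls, item):
        return item

x = S[5:1:-1, 10:20:2, 5:]
print(x)  # (slice(5, 1, -1), slice(10, 20, 2), slice(5, None, None))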
Aargh, I hate Google Groups with a vengeance. If people *have* to post
from there, can they please change the reply-to so that replies don't get
messed up? Or is that not possible, and yet another way that GG is just
broken?
Paul
---------- Forwarded message ----------
From: Paul Moore <p.f.moore(a)gmail.com>
Date: 22 July 2018 at 12:05
Subject: Re: [Python-ideas] PEP 505: None-aware operators
To: Grégory Lielens <gregory.lielens(a)gmail.com>
Cc: python-ideas <python-ideas(a)googlegroups.com>
On 22 July 2018 at 11:54, Grégory Lielens <gregory.lielens(a)gmail.com> wrote:
> Except that the third possibility is not possible...if a is None, a[2] will throw an exception...
> For now at least ;-)
Doh. True, I should have said "If a is not None and a[2] is not None, use a[2]".
But my point about unintended behaviour if a[2] is None stands. And the
wider point, that these operators are hard to reason about correctly, is
probably emphasised by my mistake.
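A quick desugaring (my own sketch of PEP 505's approximate semantics)
shows the trap:

a = [0, 1, None]       # a exists, and a[2] is deliberately None
default = 'fallback'

maybe = None if a is None else a[2]          # roughly a?[2]
val = default if maybe is None else maybe    # roughly maybe ?? default

print(val)  # 'fallback', even though a itself was not None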
Paul
This was previously proposed here in 2014
<https://mail.python.org/pipermail/python-ideas/2014-January/025091.html>,
but the discussion fizzled out. To me, str.rreplace() is an obvious and
necessary complement to str.replace(), just as str.rsplit() is a
complement to str.split(). It would bring Python closer to the goal of
the Zen of Python: "There should be one-- and preferably only one
--obvious way to do it." To support its usefulness: this question has
been asked on Stack Overflow a number of times
(<https://stackoverflow.com/q/2556108>,
<https://stackoverflow.com/q/14496006>,
<https://stackoverflow.com/q/9943504>),
with a non-trivial number of votes for the answers (>100 for the top
answer of the first two questions, and >50 for the third; the voters are
not necessarily mutually exclusive across questions, but probably largely
are, and even assuming the worst, >100 is nothing to scoff at). While
anonymous Stack Overflow votes are not necessarily the best arbiter of
what's a good idea, they at least show interest.
My proposed behavior would be as follows (probably not these exact
implementation details; the real implementation would obviously be a
string method rather than a function, with 'self' replacing 's'):
def rreplace(s, old, new, count=-1):
    '''
    Return a copy with all occurrences of substring old replaced by new.

    count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    Substrings are replaced starting at the end of the string and working
    to the front.
    '''
    return new.join(s.rsplit(old, count))
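For example, with the function above (the method form would read
s.rreplace(old, new, count)):

>>> rreplace('one two three two one', 'two', 'TWO', 1)
'one two three TWO one'
>>> 'one two three two one'.replace('two', 'TWO', 1)
'one TWO three two one'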
Addressing some (paraphrased) issues from the previous discussion, in the
form of an FAQ:
> rreplace() looks weird.
This maintains consistency with the existing trend of 'r' prefixes for
'reverse' methods. Introducing a 'reverse_' prefix would be inconsistent
and, worse, would encourage backwards-incompatible changes to existing
methods. I think such a prefix naming change warrants its own separate
discussion.
> There are already ways to do this. Why should we add another string
> method?
My main motivation is having one clear and efficient way to do this. I
explain this in greater detail in my introduction above.
> The default of count=-1 has the same behavior as str.replace(), right?
Actually, it doesn't have the same behavior, as MRAB pointed out in the
previous discussion
<https://mail.python.org/pipermail/python-ideas/2014-January/025102.html>.
The default of -1 also keeps the signature consistent with str.rsplit().
> If we're adding this, shouldn't we also add bytes.rreplace,
> bytearray.rreplace, bytearray.rremove, tuple.rindex, list.rindex, and
> list.rremove?
Honestly, I don't know. I would prefer not to dilute this discussion too
much, or start making a slippery slope argument, but if it's directly
relevant, I think any thoughts are welcome.
> Couldn't we just add a traverse_backwards parameter to str.replace()?
> In fact, why don't we just get rid of str.rfind() and str.rindex()
> entirely and just add new parameters to str.find() and str.index()?
I think Steven D'Aprano explained this well in the previous discussion
here:
<https://mail.python.org/pipermail/python-ideas/2014-January/025132.html>,
and addressed counterarguments here:
<https://mail.python.org/pipermail/python-ideas/2014-January/025135.html>.
Basically, different functions for different purposes make each purpose
clearer (no confusing or complicated parameter names and overloaded
functions), and str.rreplace() and str.replace() will usually be used in
situations where the traversal direction is known at edit time, so there
is no need to make the method determine the direction at runtime.
Thoughts? Support/oppose?
While I am aware of projects like Cython and mypy, it seems to make sense
for CPython to allow optional enforcement of type hints, with related
compiler optimizations. While this would not receive the same level of
performance benefit as using ctypes directly, there do appear to be
various gains available here.
My main concern is how to make enforcement opt-in: for maximum benefit,
it shouldn't prevent you from type-hinting functions that you don't want
treated this way. While I don't have an answer for that yet, I thought
I'd toss the idea out there first. If it needs to be seen in action
before deciding whether it makes sense to add, I can work on a potential
implementation soon, but right now this is just an idea.
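As a very rough sketch of what opt-in enforcement could mean
semantically, here is a runtime check via a decorator. The decorator
name is made up, only plain classes are handled, and a real CPython
version would presumably live in the compiler rather than in a wrapper:

import functools
import inspect
import typing

def enforce_types(func):
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check plain classes; typing generics need more work.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError('%s must be %s, not %s' % (
                    name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)
    return wrapper

@enforce_types
def double(x: int) -> int:
    return x * 2

double(2)    # fine
double('2')  # TypeError: x must be int, not str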
On Sat, Jul 21, 2018 at 11:05:43AM +0100, Daniel Moisset wrote:
[snip interesting and informative discussion, thank you]
> @Steven D'Aprano: you mentioned something about race conditions but I don't
> think this algorithm has any (the article you linked just said that doing
> refcounting in the traditional way and without synchronization in a multi
> core environment has race conditions, which is not what is being discussed
> here). Could you expand on this?
Certainly not. I already said that my view of this was very naive.
Now that I have a better understanding that Jonathan doesn't have a
specific issue to solve, other than "improved performance through better
ref counting", I'll most likely just sit back and lurk.
--
Steve
In the past I have personally viewed Python as difficult to use for
parallel applications, which need to do multiple things simultaneously
for increased performance:
* The old Threads, Locks, & Shared State model is inefficient in Python
due to the GIL, which limits CPU usage to only one thread at a time
(ignoring certain functions implemented in C, such as I/O).
* The Actor model can be used with some effort via the “multiprocessing”
module, but it doesn’t seem that streamlined and forces there to be a
separate OS process per line of execution, which is relatively expensive.
I was thinking it would be nice if there was a better way to implement
the Actor model, with multiple lines of execution in the same process,
yet avoiding contention from the GIL. This implies a separate GIL for
each line of execution (to eliminate contention) and a controlled way to
exchange data between different lines of execution.
So I was thinking of proposing a design for implementing such a system.
Or at least get interested parties thinking about such a system.
With some additional research I notice that [PEP 554] (“Multiple
subinterpreters in the stdlib”) appears to be putting forward a design
similar to the one I described. I notice however it mentions that
subinterpreters currently share the GIL, which would seem to make them
unusable for parallel scenarios due to GIL contention.
I'd like to solicit some feedback on what might be the most efficient
way to make forward progress on efficient parallelization in Python
inside the same OS process. The most promising areas appear to be:
1. Make the current subinterpreter implementation in Python have more
complete isolation, sharing almost no state between subinterpreters. In
particular not sharing the GIL. The "Interpreter Isolation" section of
PEP 554 enumerates areas that are currently shared, some of which
probably shouldn't be.
2. Give up on making things work inside the same OS process and rather
focus on implementing better abstractions on top of the existing
multiprocessing API so that the actor model is easier to program
against. For example, providing some notion of Channels to communicate
between lines of execution, a way to monitor the number of Messages
waiting in each channel for throughput profiling and diagnostics,
Supervision, etc. (a rough sketch follows below). In particular I could
do this by using an existing library like Pykka or Thespian and extending
it where necessary.
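As a rough sketch of option 2 (the Channel class and its method names
are illustrative assumptions, not an existing API), built on nothing but
the existing multiprocessing module:

import multiprocessing as mp

class Channel:
    """One-way message channel between lines of execution."""
    def __init__(self):
        self._queue = mp.Queue()

    def send(self, message):
        self._queue.put(message)

    def receive(self):
        return self._queue.get()

    def pending(self):
        # Approximate number of waiting messages, for throughput
        # diagnostics (note: qsize() is unimplemented on macOS).
        return self._queue.qsize()

def actor(inbox, outbox):
    # An actor owns its state and communicates only through channels.
    for message in iter(inbox.receive, None):  # None is a stop sentinel
        outbox.send(message * 2)

if __name__ == '__main__':
    inbox, outbox = Channel(), Channel()
    worker = mp.Process(target=actor, args=(inbox, outbox))
    worker.start()
    inbox.send(21)
    print(outbox.receive())  # 42
    inbox.send(None)
    worker.join()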
Thoughts?
[PEP 554]: https://www.python.org/dev/peps/pep-0554/
--
David Foster | Seattle, WA, USA
The goal of this idea is to make it easier to find out when someone has
installed packages for the wrong python installation. I'm coming across
quite a few StackOverflow posts and emails where beginners are using pip to
install a package, but then finding they can't import it because they have
multiple python installations and used the wrong pip.
For example, this guy has this problem:
https://stackoverflow.com/questions/37662012/which-pip-is-with-which-python
I'd propose adding a simple line to the output of "pip install" that
changes this:
user@user:~$ pip3 install pyperclip
Collecting pyperclip
Installing collected packages: pyperclip
Successfully installed pyperclip-1.6.2
...to something like this:
user@user:~$ pip3 install pyperclip
Running pip for /usr/bin/python3
Collecting pyperclip
Installing collected packages: pyperclip
Successfully installed pyperclip-1.6.2
This way, when they copy/paste their output to StackOverflow, it'll be
somewhat more obvious to their helper that they used pip for the wrong
python installation.
This info would also be useful in the output of "pip info", but that
would break scripts that read that output.
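(In the meantime, "pip3 --version" already reports which interpreter a
given pip is bound to, with output along these lines, details varying by
system:

user@user:~$ pip3 --version
pip 9.0.1 from /usr/lib/python3/dist-packages/pip (python 3.6)

and running "python3 -m pip install pyperclip" sidesteps the ambiguity
entirely.)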
Any thoughts?
-Al
On Thu, Jul 19, 2018 at 10:07 AM, Stephan Houben <stephanh42(a)gmail.com>
wrote:
> You are aware of numba?
>
> https://numba.pydata.org/
>
> Stephan
>
> On Thu, 19 Jul 2018 at 16:03, Eric Fahlgren <ericfahlgren(a)gmail.com> wrote:
>
>> On Thu, Jul 19, 2018 at 6:52 AM Michael Hall
>> <python-ideas(a)michaelhall.tech> wrote:
>>
>>> While I am aware of projects like Cython and mypy, it seems to make
>>> sense for CPython to allow optional enforcement of type hints, with
>>> related compiler optimizations. While this would not receive the same
>>> level of performance benefit as using ctypes directly, there do appear
>>> to be various gains available here.
>>>
>>
>> Just to make sure I understand: In other words, they would no longer be
>> "hints" but "guarantees". This would allow an optimizer pass much greater
>> latitude in code generation, somehow or other.
>>
>> For purposes of illustration (this is not a proposal, just for
>> clarification):
>>
>> @guaranteed_types
>> def my_sqrt(x:c_double) -> c_double:
>> ...
>>
>> would tell the compiler that it's now possible to replace the general
>> PyObject marshalling of this function with a pure-C one that only accepts
>> doubles and woe be unto those who use it otherwise.
>>
Less so than I probably should have been, given the idea I'm pitching.
I've given it a quick look, and at a glance it already seems capable of
the desired behavior, specifically centered around performance. I do
still think it *may* be beneficial to have this in the CPython reference
implementation, alongside a standard grammar, which could further enable
the various libraries and Python implementations to make use of the
knowledge that a type hint is intended as more than a hint.
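For the record, numba's eager compilation mode is already quite close to
the "more than a hint" idea; a small sketch, assuming numba is installed:

from numba import jit, float64

# Compile eagerly for float64 only; unlike a plain type hint, the
# signature here is a guarantee the compiler specializes against.
@jit(float64(float64), nopython=True)
def my_sqrt(x):
    return x ** 0.5

print(my_sqrt(2.0))  # 1.4142135623730951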
Hi,
I use list and dict comprehensions a lot, and a problem I often have is
doing the equivalent of a group_by operation (to use SQL terminology).
For example, if I have a list of (student, school) tuples and I want the
list of students by school, the only option I'm left with is to write:

from collections import defaultdict

student_by_school = defaultdict(list)
for student, school in student_school_list:
    student_by_school[school].append(student)
What I would expect is a comprehension syntax allowing me to write
something along the lines of:

student_by_school = {group_by(school): student
                     for school, student in student_school_list}

or any other syntax that allows me to regroup items from an iterable.
Small FAQ:
Q: Why include something in comprehensions when you can do it in a small
number of lines?
A: A really appreciable part of list and dict comprehensions is that they
let the developer be explicit about what a given line is meant to do.
If you see a comprehension, you know the developer wanted to produce an
iterable with no side effect other than depleting the iterator (assuming
reasonable code guidelines are respected).
Initializing an object and filling it with a for loop is both longer and
less explicit about what is intended. That pattern should be reserved for
intrinsically complex operations, not for one of the base operations one
might want to perform on lists and dicts.
Q: Why group by in particular?
A: If we take SQL queries (https://en.wikipedia.org/wiki/SQL_syntax#Queries)
as a reasonable picture of how people need to manipulate data on a
day-to-day basis, we can see that dict comprehensions already cover most
of the base operations, the only missing ones being GROUP BY and HAVING.
Q: Why not use it on lists, with a syntax such as the following?
student_by_school = [
    school, student
    for school, student in student_school_list
    group by school
]
A: It would create either a discrepancy with iterators or a perhaps
misleading semantic (that of itertools.groupby, which requires the
iterable to be sorted in order to be useful).
Having the option to do it with a dict removes any ambiguity and should be
enough to cover most "group by" applications.
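To illustrate that pitfall: with today's itertools.groupby, unsorted
input silently yields fragmented groups:

from itertools import groupby

pairs = [('fruit', 'orange'), ('meat', 'eggs'), ('fruit', 'apple')]
print([(key, [v for _, v in grp])
       for key, grp in groupby(pairs, key=lambda p: p[0])])
# [('fruit', ['orange']), ('meat', ['eggs']), ('fruit', ['apple'])]
# 'fruit' appears twice because groupby only merges adjacent keys.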
Examples:
edible_list = [('fruit', 'orange'), ('meat', 'eggs'), ('meat', 'spam'),
               ('fruit', 'apple'), ('vegetable', 'fennel'),
               ('fruit', 'pineapple'), ('fruit', 'pineapple'),
               ('vegetable', 'carrot')]
edible_list_by_food_type = {group_by(food_type): edible
                            for food_type, edible in edible_list}
print(edible_list_by_food_type)
{'fruit': ['orange', 'apple', 'pineapple', 'pineapple'],
 'meat': ['eggs', 'spam'], 'vegetable': ['fennel', 'carrot']}
bank_transactions = [200.0, -357.0, -9.99, -15.6, 4320.0, -1200.0]
split_bank_transactions = {group_by('credit' if amount > 0 else 'debit'): amount
                           for amount in bank_transactions}
print(split_bank_transactions)
{'credit': [200.0, 4320.0], 'debit': [-357.0, -9.99, -15.6, -1200.0]}
--
Nicolas Rolin
In the CPython repository, there is an unparse module in the Tools section.
https://github.com/python/cpython/blob/master/Tools/parser/unparse.py
However, as it is not part of the standard library, it cannot be easily
used; to do so, one needs to make a local copy in a place from where it can
be imported.
This module can be useful for people using the ast module to create and
parse trees, modify them ... and who want to convert the result back into
source code. Since it is obviously maintained to be compatible with the
current Python version, would it be possible to include the unparse module
in the standard library?
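For what it's worth, the intended round-trip looks roughly like this,
assuming unparse.py has been copied somewhere importable (its Unparser
class writes the source for a tree to a file object):

import ast
import io
import unparse  # a local copy of Tools/parser/unparse.py

tree = ast.parse('x = 1 + 2')
tree.body[0].targets[0].id = 'y'  # modify the tree: rename the target

buf = io.StringIO()
unparse.Unparser(tree, file=buf)
print(buf.getvalue())  # y = 1 + 2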
André Roberge