Mailman 3 Shouldn't all in-place operations simply return self? - NumPy-Discussion

Shouldn't all in-place operations simply return self?

eat

Jan. 16, 2013

7:11 p.m.

Hi,

In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative to a trivial two-liner.

I'm now raising the question: what if all in-place operations returned self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me, it would make implementing simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) much more straightforward.

What do you think?

-eat

P.S. FWIW, if this idea really gains momentum, obviously I'm volunteering to create a PR for it.
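To make the proposal concrete, here is a minimal sketch of what "in-place operations return self" would allow. It uses a plain Python list subclass rather than a NumPy array; the class name `ChainableList` is hypothetical and is not part of any library.

```python
class ChainableList(list):
    """Hypothetical list subclass whose in-place methods return self."""

    def sort(self, **kwargs):
        super().sort(**kwargs)
        return self

    def reverse(self):
        super().reverse()
        return self


plain = [3, 1, 2]
result = plain.sort()  # built-in convention: mutate and return None

chained = ChainableList([3, 1, 2])
same = chained.sort().reverse()  # chaining works, but `same` is `chained` itself
```

Note that the chained call still mutates `chained`; the returned object is the same object, not a copy, which is the ambiguity the python-dev message linked above objects to.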

josef.pktd＠gmail.com

January 2013

8:53 p.m.

On Wed, Jan 16, 2013 at 7:11 PM, eat <e.antero.tammi@gmail.com> wrote:

...

Hi,

In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative to a trivial two-liner.

I'm now raising the question: what if all in-place operations returned self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me, it would make implementing simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) much more straightforward.

What do you think?

I'm against it. I think it requires too much thinking by users and developers.

The functions in numpy are conceptually much closer to basic python, not some heavy object oriented framework where we need lots of chaining. (I thought I remembered some discussion and justification for returning self in sqlalchemy for this, but couldn't find it.)

I'm chasing quite a few bugs with inplace operations

>>> a = np.arange(10)
>>> a *= np.pi
>>> a
???

>>> a = np.random.random_integers(0, 5, size=5)
>>> b = a.sort()
>>> b
>>> a
array([0, 1, 2, 5, 5])

>>> b = np.random.shuffle(a)
>>> b
>>> b = np.random.permutation(a)
>>> b
array([0, 5, 5, 2, 1])

How do I remember if shuffle shuffles or permutes? Do we have a list of functions that are inplace?

Josef
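The three pitfalls in the session above can be reproduced with a short script. This is a sketch, not a transcript of the original session: it uses `np.random.default_rng` and `integers` in place of `random_integers` (an editorial substitution), and what happens on `a *= np.pi` depends on the NumPy version, so both outcomes are handled.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# In-place arithmetic keeps the array's dtype. Multiplying an integer array by
# a float either fails the cast or truncates, depending on the NumPy version.
a = np.arange(10)
try:
    a *= np.pi
except TypeError as exc:
    print("in-place multiply refused:", type(exc).__name__)
print(a.dtype)  # still an integer dtype either way

# ndarray.sort() sorts in place and returns None.
a = rng.integers(0, 6, size=5)
b = a.sort()
print(b, a)

# np.random.shuffle() also works in place and returns None, while
# np.random.permutation() leaves its argument alone and returns a copy.
b = np.random.shuffle(a)
c = np.random.permutation(a)
print(b, c)
```

In each case the in-place call hands back `None`, so assigning its result to `b` silently discards nothing useful and keeps nothing useful.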

...

-eat

P.S. FWIW, if this idea really gains momentum obviously I'm volunteering to create a PR of it.

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Nathaniel Smith

1:41 a.m.

On 16 Jan 2013 17:54, <josef.pktd@gmail.com> wrote:

...

>>> a = np.random.random_integers(0, 5, size=5)
>>> b = a.sort()
>>> b
>>> a
array([0, 1, 2, 5, 5])

>>> b = np.random.shuffle(a)
>>> b
>>> b = np.random.permutation(a)
>>> b
array([0, 5, 5, 2, 1])

How do I remember if shuffle shuffles or permutes ?

Do we have a list of functions that are inplace?

I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc.

Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted().

-n
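The naming convention referred to here can be seen with the built-in list type alone. A small illustration (the variable names are only for this example):

```python
words = ["pear", "apple", "fig"]

# Past participle: returns a new object and leaves the input untouched.
ordered = sorted(words)
backwards = list(reversed(words))  # reversed() returns an iterator

# Imperative verb: mutates in place and returns None.
sort_result = words.sort()
reverse_result = words.reverse()
```

After the last two calls `words` has been sorted and then reversed, while `ordered` and `backwards` still hold the values computed from the original order.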

Jim Vickroy

8:54 a.m.

On 1/16/2013 11:41 PM, Nathaniel Smith wrote:

I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc.

Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted().

I like this (tense) suggestion. It seems easy to remember. --jv

Alan G Isaac

9:32 a.m.

Is it really better to have `permute` and `permuted` than to add a keyword? (Note that these are actually still ambiguous, except by convention.)

Btw, two separate issues seem to be running side by side.

i. should in-place operations return their result?
ii. how can we signal that an operation is inplace?

I expect NumPy to do inplace operations when feasible, so maybe they could take an `out` keyword with a None default. Possibly recognize `out=True` as asking for the original array object to be returned (mutated); `out='copy'` as asking for a copy to be created, operated upon, and returned; and `out=a` to ask for array `a` to be used for the output (without changing the original object, and with a return value of None).

Alan Isaac
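One possible reading of this `out` proposal, sketched for a single operation. The function `negate` is hypothetical and exists only to illustrate the four cases; in particular, treating the `None` default as "operate in place and return nothing" is an assumption, since the message does not spell that case out.

```python
import numpy as np


def negate(x, out=None):
    """Hypothetical operation following the proposed `out` semantics."""
    if out is None:
        np.negative(x, out=x)  # assumed default: operate in place, return nothing
        return None
    if out is True:
        np.negative(x, out=x)  # mutate the original and hand it back
        return x
    if isinstance(out, str) and out == "copy":
        return np.negative(x)  # new array; the original is untouched
    np.negative(x, out=out)  # write into the supplied array
    return None


a = np.array([1.0, -2.0])
returned = negate(a, out=True)   # a is mutated and returned
fresh = negate(a, out="copy")    # a is left alone
buf = np.empty_like(a)
nothing = negate(a, out=buf)     # result lands in buf
```

The string check comes before any comparison with an array so that `out == "copy"` is never evaluated elementwise against an ndarray.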

Nathaniel Smith

11:33 a.m.

On Thu, Jan 17, 2013 at 2:32 PM, Alan G Isaac <alan.isaac@gmail.com> wrote:

...

Is it really better to have `permute` and `permuted` than to add a keyword? (Note that these are actually still ambiguous, except by convention.)

The convention in question, though, is that of English grammar. In practice everyone who uses numpy is a more-or-less skilled English speaker in any case, so re-using the conventions is helpful!

"Shake the martini!" <- an imperative command

This is a complete statement all by itself. You can't say "Hand me the shake the martini". In procedural languages like Python, there's a strong distinction between statements (whole lines, a = 1), which only matter because of their side-effects, and expressions (a + b) which have a value and can be embedded into a larger statement or expression ((a + b) + c). "Shake the martini" is clearly a statement, not an expression, and therefore clearly has a side-effect.

"shaken martini" <- a noun phrase

Grammatically, this is like plain "martini", you can use it anywhere you can use a noun. "Hand me the martini", "Hand me the shaken martini". In programming terms, it's an expression, not a statement. And side-effecting expressions are poor style, because when you read procedural code, you know each statement contains at least 1 side-effect, and it's much easier to figure out what's going on if each statement contains *exactly* one side-effect, and it's the top-most operation.

This underlying readability guideline is actually baked much more deeply into Python than the sort/sorted distinction -- this is why in Python, 'a = 1' is *not* an expression, but a statement. C allows you to say things like "b = (a = 1)", but in Python you have to say "a = 1; b = a".
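The statement/expression point can be checked directly by asking Python to compile both spellings. A small sketch using only the standard library (the variable names are illustrative):

```python
import ast

# `a = 1` is a statement, so it cannot be nested inside an expression.
try:
    compile("b = (a = 1)", "<sketch>", "exec")
    nested_assignment_ok = True
except SyntaxError:
    nested_assignment_ok = False

# The two-statement spelling from the message parses fine.
tree = ast.parse("a = 1; b = a")
statement_types = [type(node).__name__ for node in tree.body]
```

Python 3.8 and later also accept `b = (a := 1)`, an explicitly marked assignment expression; the plain `=` form remains a statement.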

...

Btw, two separate issues seem to be running side by side.

i. should in-place operations return their result? ii. how can we signal that an operation is inplace?

I expect NumPy to do inplace operations when feasible, so maybe they could take an `out` keyword with a None default. Possibly recognize `out=True` as asking for the original array object to be returned (mutated); `out='copy'` as asking for a copy to be created, operated upon, and returned; and `out=a` to ask for array `a` to be used for the output (without changing the original object, and with a return value of None).

Good point that numpy also has a nice convention with out= arguments for ufuncs. I guess that convention is, by default return a new array, but also allow one to modify the same (or another!) array in-place, by passing out=. So this would suggest that we'd have

    b = shuffled(a)
    shuffled(a, out=a)
    shuffled(a, out=b)
    shuffle(a) # same as shuffled(a, out=a)

and if people are bothered by having both 'shuffled' and 'shuffle', then we drop 'shuffle'. (And the decision about whether to include the imperative form can be made on a case-by-case basis; having both shuffled and shuffle seems fine to me, but probably there are other cases where this is less clear.)

There is also an argument that if out= is given, then we should always return None, in general. I'm having a lot of trouble thinking of any situation where it would be acceptable style (or even useful) to write something like:

    c = np.add(a, b, out=a) + 1

But, 'out=' is very large and visible (which makes the readability less terrible than it could be). And np.add always returns the out array when working out-of-place (so there's at least a weak countervailing convention). So I feel much more strongly that shuffle() should return None, than I do that np.add(out=...) should return None.

A compromise position would be to make all new functions that take out= return None when out= is given, while leaving existing ufuncs and such as they are for now.

-n
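A rough sketch of how the `shuffled`/`shuffle` pair described above might look. Both functions are hypothetical (NumPy's existing functions are `shuffle` and `permutation`), the `rng` parameter is an addition for this example, and returning `None` when `out=` is given follows the option argued for in the message rather than current ufunc behaviour.

```python
import numpy as np


def shuffled(a, out=None, rng=None):
    """Hypothetical out-of-place shuffle that also accepts ``out=``."""
    rng = np.random.default_rng() if rng is None else rng
    result = rng.permutation(a)  # permutation() returns a shuffled copy
    if out is None:
        return result
    out[...] = result  # safe even when out is a: result is a separate copy
    return None  # the "return None when out= is given" option


def shuffle(a, rng=None):
    """Hypothetical in-place form: same as shuffled(a, out=a)."""
    shuffled(a, out=a, rng=rng)


a = np.arange(10)
b = shuffled(a)        # new array, a untouched
in_place = shuffle(a)  # a is reordered, nothing is returned

# Existing ufunc behaviour mentioned in the message: the out array is handed back.
x = np.ones(3)
y = np.add(x, x, out=x)
```

The last two lines show the "weak countervailing convention": `y` is the very same object as `x`.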

Dag Sverre Seljebotn

1:08 p.m.

On 01/17/2013 05:33 PM, Nathaniel Smith wrote:

...

On Thu, Jan 17, 2013 at 2:32 PM, Alan G Isaac <alan.isaac@gmail.com> wrote:

...
Is it really better to have `permute` and `permuted` than to add a keyword? (Note that these are actually still ambiguous, except by convention.)

The convention in question, though, is that of English grammar. In practice everyone who uses numpy is a more-or-less skilled English speaker in any case, so re-using the conventions is helpful!

"Shake the martini!" <- an imperative command

This is a complete statement all by itself. You can't say "Hand me the shake the martini". In procedural languages like Python, there's a strong distinction between statements (whole lines, a = 1), which only matter because of their side-effects, and expressions (a + b) which have a value and can be embedded into a larger statement or expression ((a + b) + c). "Shake the martini" is clearly a statement, not an expression, and therefore clearly has a side-effect.

"shaken martini" <- a noun phrase

Grammatically, this is like plain "martini", you can use it anywhere you can use a noun. "Hand me the martini", "Hand me the shaken martini". In programming terms, it's an expression, not a statement. And side-effecting expressions are poor style, because when you read procedural code, you know each statement contains at least 1 side-effect, and it's much easier to figure out what's going on if each statement contains *exactly* one side-effect, and it's the top-most operation.

This underlying readability guideline is actually baked much more deeply into Python than the sort/sorted distinction -- this is why in Python, 'a = 1' is *not* an expression, but a statement. C allows you to say things like "b = (a = 1)", but in Python you have to say "a = 1; b = a".

...
Btw, two separate issues seem to be running side by side.

i. should in-place operations return their result? ii. how can we signal that an operation is inplace?

I expect NumPy to do inplace operations when feasible, so maybe they could take an `out` keyword with a None default. Possibly recognize `out=True` as asking for the original array object to be returned (mutated); `out='copy'` as asking for a copy to be created, operated upon, and returned; and `out=a` to ask for array `a` to be used for the output (without changing the original object, and with a return value of None).

Good point that numpy also has a nice convention with out= arguments for ufuncs. I guess that convention is, by default return a new array, but also allow one to modify the same (or another!) array in-place, by passing out=. So this would suggest that we'd have

    b = shuffled(a)
    shuffled(a, out=a)
    shuffled(a, out=b)
    shuffle(a) # same as shuffled(a, out=a)

and if people are bothered by having both 'shuffled' and 'shuffle', then we drop 'shuffle'. (And the decision about whether to include the imperative form can be made on a case-by-case basis; having both shuffled and shuffle seems fine to me, but probably there are other cases where this is less clear.)

In addition to the verb tense, I think it's important that mutators are methods whereas functions do not mutate their arguments:

    lst.sort()
    sorted(lst)

So -1 on shuffle(a) and a.shuffled().

Dag Sverre

...

There is also an argument that if out= is given, then we should always return None, in general. I'm having a lot of trouble thinking of any situation where it would be acceptable style (or even useful) to write something like:

    c = np.add(a, b, out=a) + 1

But, 'out=' is very large and visible (which makes the readability less terrible than it could be). And np.add always returns the out array when working out-of-place (so there's at least a weak countervailing convention). So I feel much more strongly that shuffle() should return None, than I do that np.add(out=...) should return None.

A compromise position would be to make all new functions that take out= return None when out= is given, while leaving existing ufuncs and such as they are for now.

-n

Nathaniel Smith

1:29 p.m.

On Thu, Jan 17, 2013 at 6:08 PM, Dag Sverre Seljebotn < d.s.seljebotn@astro.uio.no> wrote:

...

In addition to the verb tense, I think it's important that mutators are methods whereas functions do not mutate their arguments:

    lst.sort()
    sorted(lst)

Unfortunately this isn't really viable in a language like Python where you can't add methods to a class. (list.sort() versus sorted() has as much or more to do with the fact that sort's implementation only works on lists, while sorted takes an arbitrary iterable.) Even core python provides a function for in-place list randomization, not a method.

Following the proposed rule would just mean that we couldn't provide in-place shuffles at all, which is clearly not going to be acceptable.

-n
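Both standard-library facts cited in this reply are easy to verify: `random.shuffle` is a function that mutates its argument, and `sorted` accepts any iterable while `sort` is a method that only lists have. A short check (variable names are illustrative):

```python
import random

deck = list(range(10))
shuffle_result = random.shuffle(deck)  # a function, yet it mutates its argument

# sorted() accepts any iterable; list.sort() exists only on lists.
from_generator = sorted(x * x for x in (3, -1, 2))
from_set = sorted({"b", "a"})
tuple_has_sort = hasattr((3, 1, 2), "sort")
```

So the "mutators are methods, functions do not mutate" rule already has an exception in core Python.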

Benjamin Root

9:49 a.m.

On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy <jim.vickroy@noaa.gov> wrote:

...

On 1/16/2013 11:41 PM, Nathaniel Smith wrote:

I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc.

Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted().

I like this (tense) suggestion. It seems easy to remember. --jv

And another score for functions as verbs! :-P

Ben Root

josef.pktd＠gmail.com

10:24 a.m.

On Thu, Jan 17, 2013 at 9:49 AM, Benjamin Root <ben.root@ou.edu> wrote:

...

On Thu, Jan 17, 2013 at 8:54 AM, Jim Vickroy <jim.vickroy@noaa.gov> wrote:

...
On 1/16/2013 11:41 PM, Nathaniel Smith wrote:

I rather like the convention used elsewhere in Python of naming in-place operations with present tense imperative verbs, and out-of-place operations with past participles. So you have sort/sorted, reverse/reversed, etc.

Here this would suggest we name these two operations as either shuffle() and shuffled(), or permute() and permuted().

I like this (tense) suggestion. It seems easy to remember. --jv

And another score for functions as verbs!

I don't think the filled we discuss here is an action.

The current ``fill`` is an inplace operation, operating on an existing array. ``filled`` would be the analog that returns a copy.

However ``filled`` here is creating an object.

I still think ``array_filled`` is the most precise

'''Create an array and initialize it with the ``value``, returning the array '''

my 2.5c

Josef
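For reference, the "trivial two-liner" behind this naming discussion can be wrapped as below. The name `array_filled` and its docstring come from the message; the signature is an assumption made for this sketch. Later NumPy releases provide `np.full`, which does the same job.

```python
import numpy as np


def array_filled(shape, value, dtype=None):
    """Create an array and initialize it with ``value``, returning the array."""
    arr = np.empty(shape, dtype=dtype)
    arr.fill(value)  # in place; returns None, hence the separate return
    return arr


grid = array_filled((2, 3), 7.5)
same_via_full = np.full((2, 3), 7.5)
```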

josef.pktd＠gmail.com

10:28 a.m.

On Thu, Jan 17, 2013 at 10:24 AM, <josef.pktd@gmail.com> wrote:

I don't think the filled we discuss here is an action.

The current ``fill`` is an inplace operation, operating on an existing array. ``filled`` would be the analog that returns a copy.

However ``filled`` here is creating an object

I still think ``array_filled`` is the most precise

'''Create an array and initialize it with the ``value``, returning the array '''

my 2.5c

Josef

Sorry, completely out of context. I shouldn't write emails when I'm running in and out of the office.

Josef

Charles R Harris

10:27 a.m.

On Wed, Jan 16, 2013 at 5:11 PM, eat <e.antero.tammi@gmail.com> wrote:

...

Hi,

In a recent thread http://article.gmane.org/gmane.comp.python.numeric.general/52772 it was proposed that .fill(.) should return self as an alternative to a trivial two-liner.

I'm now raising the question: what if all in-place operations returned self? How bad would this be? A 'strong' counter argument may be found at http://mail.python.org/pipermail/python-dev/2003-October/038855.html.

But anyway, at least for me, it would make implementing simple mini DSLs (http://en.wikipedia.org/wiki/Domain-specific_language) much more straightforward.

What do you think?

I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.

Chuck

Thouis (Ray) Jones

5:13 p.m.

On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

...

I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.

Chuck

I'm -1 on breaking with Python convention without very good reasons.

Ray

Ralf Gommers

7:08 p.m.

On Thu, Jan 17, 2013 at 11:13 PM, Thouis (Ray) Jones <thouis@gmail.com> wrote:

...

On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.

Chuck

I'm -1 on breaking with Python convention without very good reasons.

Three times -1: on breaking Python conventions, on changing any existing numpy functions/methods for something like this, and on having similarly named functions like shuffle/shuffled that basically do the same thing.

+1 on using out= more, and on some general guideline on function-naming-grammar.

Ralf

eat

6:35 a.m.

Hi,

On Fri, Jan 18, 2013 at 12:13 AM, Thouis (Ray) Jones <thouis@gmail.com> wrote:

...

On Thu, Jan 17, 2013 at 10:27 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

I've read Guido about why he didn't like inplace operations returning self and found him convincing for a while. And then I listened to other folks express a preference for the freight train style and found them convincing also. I think it comes down to a preference for one style over another and I go back and forth myself. If I had to vote, I'd go for returning self, but I'm not sure it's worth breaking python conventions to do so.

Chuck

I'm -1 on breaking with Python convention without very good reasons.

As an example, I personally find the following behavior highly counterintuitive.

In []: p, P = rand(3, 1), rand(3, 5)

In []: ((p - P) ** 2).sum(0).argsort()
Out[]: array([2, 4, 1, 3, 0])

In []: ((p - P) ** 2).sum(0).sort().diff()
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'diff'

Regards,
-eat
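For comparison, the same computation can be written as a single expression with the function forms `np.sort` and `np.diff`, which return new arrays. This is a sketch of one workaround, not something proposed in the thread; it uses `np.random.default_rng` instead of the bare `rand` from the session.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
p, P = rng.random((3, 1)), rng.random((3, 5))

# Squared distance from p to each column of P.
d2 = ((p - P) ** 2).sum(0)

# ndarray.sort() sorts in place and returns None, so nothing can be chained.
copy_for_sort = d2.copy()
sort_result = copy_for_sort.sort()

# The function forms return new arrays and compose as expressions.
gaps = np.diff(np.sort(d2))
```

Note also that `diff` exists only as the function `np.diff`; ndarray has no `diff` method, so the chained form would still fail even if `sort()` returned the array.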
