From cool-rr at cool-rr.com  Sat Dec  5 12:55:37 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 11:55:37 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
Message-ID: <loom.20091205T125239-371@post.gmane.org>

I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip 
adding any start value if none is specified?

This current behavior is preventing me from using `sum` to add up a bunch of
non-number objects.

Ram.

From python at mrabarnett.plus.com  Sat Dec  5 17:43:15 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 16:43:15 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T125239-371@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>
Message-ID: <4B1A8DA3.40904@mrabarnett.plus.com>

Ram Rachum wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not
> just skip adding any start value if none is specified?
> 
> This current behavior is preventing me from using `sum` to add up a 
> bunch of non-number objects.
> 
Sometimes you might find that the list you're summing is empty. Because
'sum' is most often used with numbers, the default sum of a list is 0.
If you want to sum a list of non-numbers, provide a suitable start
value. For example, to sum a list of lists a suitable start value is []:

>>> sum([[0, 1], [2, 3]], [])
[0, 1, 2, 3]

I agree that it would be nice if the start value could just be omitted,
but then what should 'sum' return if the list is empty?

If sum([1, 2]) returned 3, then I'd want sum([]) to return 0.

If sum([[1], [2]]) returned [1, 2], then I'd want sum([]) to return [].

Unfortunately, I can't have it both ways.
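
A minimal sketch of the behaviour described above, assuming a current CPython interpreter; the variable names are illustrative only:

```python
# With no start value, sum() falls back to the integer 0.
empty_default = sum([])

# An explicit start value fixes the result type, even for empty input.
empty_lists = sum([], [])
flattened = sum([[0, 1], [2, 3]], [])

print(empty_default, empty_lists, flattened)
```

The empty case is the only one where `sum` has nothing to inspect, which is why the start value has to carry the type information.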

From andreengels at gmail.com  Sat Dec  5 17:45:33 2009
From: andreengels at gmail.com (Andre Engels)
Date: Sat, 5 Dec 2009 17:45:33 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T125239-371@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>
Message-ID: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>

On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr at cool-rr.com> wrote:
> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
> adding any start value if none is specified?
>
> This current behavior is preventing me from using `sum` to add up a bunch of non-
> number objects.

In your proposed implementation, sum([]) would be undefined.

-- 
André Engels, andreengels at gmail.com

From cool-rr at cool-rr.com  Sat Dec  5 17:56:09 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 16:56:09 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
References: <loom.20091205T125239-371@post.gmane.org>
	<4B1A8DA3.40904@mrabarnett.plus.com>
Message-ID: <loom.20091205T175122-316@post.gmane.org>

> Sometimes you might find that the list you're summing is empty. Because
> 'sum' is most often used with numbers, the default sum of a list is 0.
> If you want to sum a list of non-numbers, provide a suitable start
> value. For example, to sum a list of lists a suitable start value is []:
> 
> >>> sum([[0, 1], [2, 3]], [])
> [0, 1, 2, 3]
> 
> I agree that it would be nice if the start value could just be omitted,
> but then what should 'sum' return if the list is empty?

I see the problem. I think a good solution would be to tell the user, "If you
want `sum` to be able to handle a possibly empty list, you must supply `start`."
Users that want to add up a (possibly empty) sequence of numbers will have to
specify `start`.

If start is supplied, it will work like it does now. If start isn't supplied, it 
will add up all the elements without adding any `start` to them.

What do you think?
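
The proposal above could be sketched roughly as follows. This is not how the built-in `sum` works; `sum_without_default` and `_MISSING` are hypothetical names, and raising `ValueError` for an empty input is an assumption, since the proposal does not say which exception to use:

```python
_MISSING = object()  # sentinel: distinguishes "no start given" from any real value


def sum_without_default(iterable, start=_MISSING):
    """Hypothetical sum(): adds `start` only when the caller supplies one."""
    it = iter(iterable)
    if start is _MISSING:
        try:
            total = next(it)  # begin from the first element instead of 0
        except StopIteration:
            # Assumption: an empty input with no start is an error.
            raise ValueError("empty iterable and no start value") from None
    else:
        total = start  # same as today's behaviour
    for item in it:
        total = total + item
    return total


print(sum_without_default([[1], [2]]))
print(sum_without_default([1, 2]))
print(sum_without_default([], 0))
```

Under this sketch a list of lists no longer needs `[]` as a start value, at the price of `sum_without_default([])` failing.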

From george.sakkis at gmail.com  Sat Dec  5 18:01:01 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 19:01:01 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
Message-ID: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>

On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:

> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr at cool-rr.com> wrote:
>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>> adding any start value if none is specified?
>>
>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>> number objects.
>
> In your proposed implementation, sum([]) would be undefined.

Which would make it consistent with min/max.

George

From algorias at gmail.com  Sat Dec  5 18:23:19 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 14:23:19 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
Message-ID: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>

2009/12/5 George Sakkis <george.sakkis at gmail.com>:
> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
>
>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr at cool-rr.com> wrote:
>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>> adding any start value if none is specified?
>>>
>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>> number objects.
>>
>> In your proposed implementation, sum([]) would be undefined.
>
> Which would make it consistent with min/max.

And in that case the special string handling could also be dropped?

>>> sum(["a","b"], "start")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    sum(["a","b"], "start")
TypeError: sum() can't sum strings [use ''.join(seq) instead]

This behaviour is quite bothersome. Sum can handle arbitrary objects
in theory (as long as they define the correct special methods, etc.),
but it gratuitously raises an exception on strings. This behaviour is
also inconsistent with the following:

>>> sum(["a","b"])
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    sum(["a","b"])
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Where sum actually tries to add "a" to the default value of 0.
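
The two failure modes above can be reproduced with a small helper. This is a sketch assuming a current CPython; `error_message` is a hypothetical name introduced here, and the exact wording of the messages may differ between versions:

```python
def error_message(*args):
    """Return the TypeError text raised by sum(*args), or None if it succeeds."""
    try:
        sum(*args)
    except TypeError as exc:
        return str(exc)
    return None


# Rejected by the explicit check on a string start value.
print(error_message(["a", "b"], "start"))

# Fails while adding "a" to the default start value 0.
print(error_message(["a", "b"]))

# The alternative that the first error message recommends.
print("".join(["a", "b"]))
```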

From g.brandl at gmx.net  Sat Dec  5 18:33:13 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:33:13 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T175122-316@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>
	<loom.20091205T175122-316@post.gmane.org>
Message-ID: <hfe5hn$217$1@ger.gmane.org>

Ram Rachum schrieb:
>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>> 
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>> 
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
> 
> 
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to 
> specify `start`.
> 
> If start is supplied, it will work like it does now. If start isn't supplied, it 
> will add up all the elements without adding any `start` to them.
> 
> What do you think?

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a).  It also stands to reason that case b)

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From g.brandl at gmx.net  Sat Dec  5 18:35:07 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:35:07 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T175122-316@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>
	<loom.20091205T175122-316@post.gmane.org>
Message-ID: <hfe5l9$2af$1@ger.gmane.org>

Ram Rachum schrieb:
>> Sometimes you might find that the list you're summing is empty. Because
>> 'sum' is most often used with numbers, the default sum of a list is 0.
>> If you want to sum a list of non-numbers, provide a suitable start
>> value. For example, to sum a list of lists a suitable start value is []:
>> 
>> >>> sum([[0, 1], [2, 3]], [])
>> [0, 1, 2, 3]
>> 
>> I agree that it would be nice if the start value could just be omitted,
>> but then what should 'sum' return if the list is empty?
> 
> 
> I see the problem. I think a good solution would be to tell the user, "If you
> want `sum` to be able to handle a non-empty list, you must supply `start`."
> Users that want to add up a (possibly empty) sequence of numbers will have to 
> specify `start`.
> 
> If start is supplied, it will work like it does now. If start isn't supplied, it 
> will add up all the elements without adding any `start` to them.
> 
> What do you think?

(sorry, pressed wrong key)

There is a choice between these two variants:

a) require start for non-numerical sequences
b) require start for possibly empty sequences

I don't have a preference for either, so for compatibility's sake I would
vote to keep the current one, which is a).  It also stands to reason that
buggy usage in case b) is harder to detect, since the common case will
not uncover the bug (the sequence being nonempty), while for case a) it does.

Georg


From g.brandl at gmx.net  Sat Dec  5 18:36:32 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 18:36:32 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID: <hfe5nt$217$2@ger.gmane.org>

Vitor Bosshard schrieb:
> 2009/12/5 George Sakkis <george.sakkis at gmail.com>:
>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
>>
>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr at cool-rr.com> wrote:
>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>> adding any start value if none is specified?
>>>>
>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>> number objects.
>>>
>>> In your proposed implementation, sum([]) would be undefined.
>>
>> Which would make it consistent with min/max.
> 
> 
> And in that case the special string handling could also be dropped?
> 
> >>> sum(["a","b"], "start")
> Traceback (most recent call last):
>   File "<pyshell#0>", line 1, in <module>
>     sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
> 
> 
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings.

This seems to be an instance where the "practicality" Zen rule beats the
"special cases" rule :)

Georg


From algorias at gmail.com  Sat Dec  5 19:04:42 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:04:42 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <hfe5nt$217$2@ger.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
	<hfe5nt$217$2@ger.gmane.org>
Message-ID: <2987c46d0912051004j3d590135j392e770219a0dbbe@mail.gmail.com>

2009/12/5 Georg Brandl <g.brandl at gmx.net>:
> Vitor Bosshard schrieb:
>> 2009/12/5 George Sakkis <george.sakkis at gmail.com>:
>>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
>>>
>>>> On Sat, Dec 5, 2009 at 12:55 PM, Ram Rachum <cool-rr at cool-rr.com> wrote:
>>>>> I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip
>>>>> adding any start value if none is specified?
>>>>>
>>>>> This current behavior is preventing me from using `sum` to add up a bunch of non-
>>>>> number objects.
>>>>
>>>> In your proposed implementation, sum([]) would be undefined.
>>>
>>> Which would make it consistent with min/max.
>>
>>
>> And in that case the special string handling could also be dropped?
>>
>> >>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>   File "<pyshell#0>", line 1, in <module>
>>     sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings.
>
> This seems to be an instance where the "practicality" Zen rule beats the
> "special cases" rule :)
>

It might be more accurate to say "hand-holding" instead of
practicality (and it doesn't even catch all errors it's meant to). I'm
not so sure that's special enough ;-)

Vitor

From stephen at xemacs.org  Sat Dec  5 19:10:51 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 06 Dec 2009 03:10:51 +0900
Subject: [Python-ideas] Why does `sum` use a default for the
	`start`	parameter?
In-Reply-To: <91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
Message-ID: <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>

George Sakkis writes:
 > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:

 > > In your proposed implementation, sum([]) would be undefined.
 > 
 > Which would make it consistent with min/max.

There's no justification for trying to make 'min' and 'sum'
consistent.  The sum of an empty list of numbers is a well-defined
*number*, namely 0, but the max of an empty list of numbers is a
well-defined *non-number*, namely "minus infinity".
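
The contrast drawn here can be checked in the interpreter. This is a sketch only; note that the `default` keyword of `max` was added in Python 3.4, after this thread, and `NEG_INF` is a name made up for the example:

```python
NEG_INF = float("-inf")

# max() has no built-in identity, so an empty input raises ValueError ...
try:
    max([])
except ValueError:
    empty_max_raises = True
else:
    empty_max_raises = False

# ... unless the caller supplies one explicitly (Python 3.4+ only).
empty_max = max([], default=NEG_INF)

# sum() already has a usable identity for numbers.
empty_sum = sum([])

print(empty_max_raises, empty_max, empty_sum)
```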

The real question is "what harm is done by preferring the
(well-defined) sum of an empty list of numbers over the (well-defined)
empty sums of lists and/or strings?"  Then, if there is any harm, "can
the situation be improved by having no useful default for empty lists
of any type?"  Finally, "is it worth breaking existing code to ensure
equal treatment of different types?"

My guess is that the answers are "very little", "hardly at all", and
"emphatically no."<wink>

From cool-rr at cool-rr.com  Sat Dec  5 19:05:59 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 18:05:59 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>
	<loom.20091205T175122-316@post.gmane.org>
	<hfe5l9$2af$1@ger.gmane.org>
Message-ID: <loom.20091205T190251-626@post.gmane.org>

> There is a choice between these two variants:
> 
> a) require start for non-numerical sequences
> b) require start for possibly empty sequences
> 
> I don't have a preference for either, so for compatibility's sake I would
> vote to keep the current one, which is a).  It also stands to reason that
> buggy usage in case b) is harder to detect, since the common case will
> not uncover the bug (the sequence being nonempty), while for case a) it does.

I prefer (b). The problem with requiring `start` for sequences of non-numerical 
objects is that you now have to go out and create a "zero object" of the same
type as your other objects. The object class might not even have a concept of a 
"zero object".

Ram.

From python at mrabarnett.plus.com  Sat Dec  5 19:12:31 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 18:12:31 +0000
Subject: [Python-ideas] Why does `sum` use a default for the
	`start`	parameter?
In-Reply-To: <hfe5l9$2af$1@ger.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org>
	<hfe5l9$2af$1@ger.gmane.org>
Message-ID: <4B1AA28F.7040809@mrabarnett.plus.com>

Georg Brandl wrote:
> Ram Rachum schrieb:
>>> Sometimes you might find that the list you're summing is empty.
>>> Because 'sum' is most often used with numbers, the default sum of
>>> a list is 0. If you want to sum a list of non-numbers, provide a
>>> suitable start value. For example, to sum a list of lists a
>>> suitable start value is []:
>>> 
>>> >>> sum([[0, 1], [2, 3]], [])
>>> [0, 1, 2, 3]
>>> 
>>> I agree that it would be nice if the start value could just be
>>> omitted, but then what should 'sum' return if the list is empty?
>> 
>> I see the problem. I think a good solution would be to tell the
>> user, "If you want `sum` to be able to handle a non-empty list, you
>> must supply `start`." Users that want to add up a (possibly empty)
>> sequence of numbers will have to specify `start`.
>> 
>> If start is supplied, it will work like it does now. If start isn't
>> supplied, it will add up all the elements without adding any
>> `start` to them.
>> 
>> What do you think?
> 
> (sorry, pressed wrong key)
> 
> There is a choice between these two variants:
> 
> a) require start for non-numerical sequences
> b) require start for possibly empty sequences
> 
> I don't have a preference for either, so for compatibility's sake I
> would vote to keep the current one, which is a).  It also stands to
> reason that buggy usage in case b) is harder to detect, since the
> common case will not uncover the bug (the sequence being nonempty),
> while for case a) it does.
> 
True, providing start will ensure that the result is of the correct
class, instead of it sometimes being an int, causing a TypeError later
on.
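
One reading of this point, sketched under the assumption that the data is meant to be a list of lists that happens to be empty; the variable names are illustrative:

```python
rows = []                     # a list of lists that happens to be empty

without_start = sum(rows)     # int 0: sum() cannot know lists were intended
with_start = sum(rows, [])    # []: the start value fixes the result type

try:
    without_start + [4, 5]    # the mismatch only surfaces here, later on
    later_error = None
except TypeError as exc:
    later_error = exc

print(type(without_start).__name__, type(with_start).__name__, later_error)
```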

From python at mrabarnett.plus.com  Sat Dec  5 19:18:08 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 18:18:08 +0000
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T190251-626@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org>	<hfe5l9$2af$1@ger.gmane.org>
	<loom.20091205T190251-626@post.gmane.org>
Message-ID: <4B1AA3E0.7000604@mrabarnett.plus.com>

Ram Rachum wrote:
>> There is a choice between these two variants:
>>
>> a) require start for non-numerical sequences
>> b) require start for possibly empty sequences
>>
>> I don't have a preference for either, so for compatibility's sake I would
>> vote to keep the current one, which is a).  It also stands to reason that
>> buggy usage in case b) is harder to detect, since the common case will
>> not uncover the bug (the sequence being nonempty), while for case a) it does.
> 
> 
> I prefer (b). The problem with requiring `start` for sequences of non-numerical 
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a 
> "zero object".
> 
If the objects can be summed, shouldn't there also be a zero object?
Does anyone have an example when that's not possible?

From george.sakkis at gmail.com  Sat Dec  5 19:23:35 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 20:23:35 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>

On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> George Sakkis writes:
>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
>
>  > > In your proposed implementation, sum([]) would be undefined.
>  >
>  > Which would make it consistent with min/max.
>
> There's no justification for trying to make 'min' and 'sum'
> consistent.  The sum of an empty list of numbers is a well-defined
> *number*, namely 0, but the max of an empty list of numbers is a
> well-defined *non-number*, namely "minus infinity".
>
> The real question is "what harm is done by preferring the
> (well-defined) sum of an empty list of numbers over the (well-defined)
> empty sums of lists and/or strings?"  Then, if there is any harm, "can
> the situation be improved by having no useful default for empty lists
> of any type?"  Finally, "is it worth breaking existing code to ensure
> equal treatment of different types?"
>
> My guess is that the answers are "very little", "hardly at all", and
> "emphatically no."<wink>

Agreed that there is little harm in preferring numbers over other
types when it comes to empty sequences, but the more important
question is "should the start argument be used even if the sequence is
*not* empty?". The OP doesn't think so and I agree.

George

From algorias at gmail.com  Sat Dec  5 19:39:39 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:39:39 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
	<91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID: <2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>

2009/12/5 George Sakkis <george.sakkis at gmail.com>:
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.
>

In that case, "default" would be a more appropriate name than "start".
That change of concept is a potential break in compatibility. How
often is the start argument given as a non-zero value? Not all that
often I suppose, but it's still a valid use-case. Ergo, the start
argument should never be omitted if it was explicitly set.

From janssen at parc.com  Sat Dec  5 19:40:21 2009
From: janssen at parc.com (Bill Janssen)
Date: Sat, 5 Dec 2009 10:40:21 PST
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
	<91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID: <42682.1260038421@parc.com>

George Sakkis <george.sakkis at gmail.com> wrote:

> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> 
> > George Sakkis writes:
> >  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
> >
> >  > > In your proposed implementation, sum([]) would be undefined.
> >  >
> >  > Which would make it consistent with min/max.
> >
> > There's no justification for trying to make 'min' and 'sum'
> > consistent.  The sum of an empty list of numbers is a well-defined
> > *number*, namely 0, but the max of an empty list of numbers is a
> > well-defined *non-number*, namely "minus infinity".
> >
> > The real question is "what harm is done by preferring the
> > (well-defined) sum of an empty list of numbers over the (well-defined)
> > empty sums of lists and/or strings?"  Then, if there is any harm, "can
> > the situation be improved by having no useful default for empty lists
> > of any type?"  Finally, "is it worth breaking existing code to ensure
> > equal treatment of different types?"
> >
> > My guess is that the answers are "very little", "hardly at all", and
> > "emphatically no."<wink>
> 
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Or perhaps, the *default* start value should not be used if it doesn't
match in type the first element of a non-empty sequence.  An explicitly
specified start value should still be used even if the sequence is *not*
empty.

Bill
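
A rough sketch of how this variant might behave, under stated assumptions: `sum_typed_default`, `_MISSING` and `DEFAULT_START` are hypothetical names, and "matches in type" is interpreted here as a plain `isinstance` check against the default's type:

```python
_MISSING = object()   # sentinel meaning "caller gave no start"
DEFAULT_START = 0     # the default that the real sum() uses


def sum_typed_default(iterable, start=_MISSING):
    """Hypothetical sum(): the default start applies only if its type matches."""
    it = iter(iterable)
    if start is not _MISSING:
        total = start                      # an explicit start is always used
    else:
        try:
            first = next(it)
        except StopIteration:
            return DEFAULT_START           # empty input keeps today's result
        if isinstance(first, type(DEFAULT_START)):
            total = DEFAULT_START + first  # types match: default is applied
        else:
            total = first                  # types differ: default is skipped
    for item in it:
        total = total + item
    return total


print(sum_typed_default([1, 2]))
print(sum_typed_default([[1], [2]]))
print(sum_typed_default([[1]], [0]))
print(sum_typed_default([]))
```

Because adding 0 to an integer changes nothing, the type check only matters for deciding when the default is skipped; an empty input still yields 0 regardless of the intended element type.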

From cool-rr at cool-rr.com  Sat Dec  5 19:42:31 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sat, 5 Dec 2009 18:42:31 +0000 (UTC)
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org>	<hfe5l9$2af$1@ger.gmane.org>
	<loom.20091205T190251-626@post.gmane.org>
	<4B1AA3E0.7000604@mrabarnett.plus.com>
Message-ID: <loom.20091205T193504-515@post.gmane.org>

MRAB <python at ...> writes:

> > I prefer (b). The problem with requiring `start` for sequences of
> > non-numerical objects is that you now have to go out and create a "zero
> > object" of the same type as your other objects. The object class might
> > not even have a concept of a "zero object".
> > 
> If the objects can be summed, shouldn't there also be a zero object?
> Does anyone have an example when that's not possible?

You're right MRAB, probably almost every object type that has a concept of 
"addition" will have a concept of a zero element.

BUT, that zero object has to be created by the user of `sum`, and that has two 
problems:

1. The user might not know beforehand which type of object he's adding.
Even within the same type there might be problems. What happens when the user is
using `sum` to add a bunch of vectors, and he doesn't know beforehand what
the dimensions of the vectors are? How will he know if his zero element should
be Vector([0, 0]) or Vector([0, 0, 0])?

2. A smaller problem: The user has to actually create that zero object now, and 
for some objects the definition might be lengthy, adding needless complexity to 
the code.

Also, using the `start` has some overhead, for creating the zero object and 
calling __add__.

Ram.
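
To make the vector case concrete, here is a sketch with a made-up `Vector` class. It is not taken from any real library, and its dimension check is an assumption about how such a type might behave:

```python
class Vector:
    """Hypothetical fixed-dimension vector, for illustration only."""

    def __init__(self, components):
        self.components = list(components)

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        if len(other.components) != len(self.components):
            raise ValueError("dimension mismatch")
        return Vector(a + b for a, b in zip(self.components, other.components))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.components == other.components


vectors = [Vector([1, 2, 3]), Vector([4, 5, 6])]

# The zero element has to match a dimension that is only known from the data.
dimension = len(vectors[0].components)
total = sum(vectors, Vector([0] * dimension))

# A zero element of the wrong dimension is rejected by Vector.__add__.
try:
    sum(vectors, Vector([0, 0]))
    wrong_zero_error = None
except ValueError as exc:
    wrong_zero_error = exc

print(total.components, wrong_zero_error)
```

The caller has to inspect the first element just to build the start value, which is the inconvenience described in point 1.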

From rhamph at gmail.com  Sat Dec  5 19:48:52 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 11:48:52 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
Message-ID: <aac2c7cb0912051048v1b7596ddr801d7ef82b72ad15@mail.gmail.com>

On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias at gmail.com> wrote:
> And in that case the special string handling could also be dropped?
>
> >>> sum(["a","b"], "start")
> Traceback (most recent call last):
>   File "<pyshell#0>", line 1, in <module>
>     sum(["a","b"], "start")
> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
>
> This behaviour is quite bothersome. Sum can handle arbitrary objects
> in theory (as long as they define the correct special methods, etc.),
> but it gratuitously raises an exception on strings. This behaviour is
> also inconsistent with the following:
>
> >>> sum(["a","b"])
> Traceback (most recent call last):
>   File "<pyshell#1>", line 1, in <module>
>     sum(["a","b"])
> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>
>
> Where sum actually tries to add "a" to the default value of 0.

sum is defined by repeatedly adding each number in a sequence.  As
each number is usually constant, and the size of total grows
logarithmically, this is O(n log n) (but due to implementation
coarseness it usually isn't distinguished from O(n)).

Concatenation however grows the total's size very quickly.  You
instead get a performance of O(n**2).  Same result, wrong algorithm.

It would be possible to special case strings, but why?  The programmer
should know what algorithm they're using and what complexity class it
has, so they can pick the right one (''.join(seq) in this case).  IOW,
handling arbitrary objects is an illusion.

For another example of why the programmer needs to understand the
algorithmic complexity of the operations they're using, and that the
language should value performance consistency and not just correct
output, see ABC's usage of rational numbers:
http://python-history.blogspot.com/2009/03/problem-with-integer-division.html

-- 
Adam Olsen, aka Rhamphoryncus
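
As an illustration of picking the algorithm to fit the operation, the sketch below puts the concatenating `sum` next to single-pass alternatives. The sample data is made up, and no timings are claimed here:

```python
from itertools import chain

words = ["a", "b", "c"]
nested = [[0, 1], [2, 3], [4]]

# Strings: join builds the result in a single pass.
joined = "".join(words)

# Lists: sum(nested, []) works, but copies the growing total at every step.
via_sum = sum(nested, [])

# chain.from_iterable visits each element once instead.
via_chain = list(chain.from_iterable(nested))

print(joined, via_sum, via_chain)
```

Both list versions give the same result; they differ only in how much copying happens along the way.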

From algorias at gmail.com  Sat Dec  5 19:55:53 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 15:55:53 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T193504-515@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>
	<4B1A8DA3.40904@mrabarnett.plus.com>
	<loom.20091205T175122-316@post.gmane.org> <hfe5l9$2af$1@ger.gmane.org>
	<loom.20091205T190251-626@post.gmane.org>
	<4B1AA3E0.7000604@mrabarnett.plus.com>
	<loom.20091205T193504-515@post.gmane.org>
Message-ID: <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>

2009/12/5 Ram Rachum <cool-rr at cool-rr.com>:
> MRAB <python at ...> writes:
>
>> > I prefer (b). The problem with requiring `start` for sequences of
>> > non-numerical objects is that you now have to go out and create a "zero
>> > object" of the same type as your other objects. The object class might
>> > not even have a concept of a "zero object".
>> >
>> If the objects can be summed, shouldn't there also be a zero object?
>> Does anyone have an example when that's not possible?
>
> You're right MRAB, probably almost every object type that has a concept of
> "addition" will have a concept of a zero element.
>
> BUT, that zero object has to be created by the user of `sum`, and that has two
> problems:
>
> 1. The user might not know from beforehand which type of object he's adding.
> Even within the same type there might be problems. What happens when the user is
> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
> the dimensions of the vectors are? How will he know if his zero element should
> be Vector([0, 0]) or Vector([0, 0, 0])

Ugly, but works:

itr = iter(sequence)
sum(itr, itr.next())

This is actually a good example in favor of not requiring a start value.
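
In Python 3 the same trick might be spelled as below. This is only a
sketch: `sum_nonempty` is a hypothetical helper name, and raising
ValueError on empty input is an assumption about what a caller would
want.

```python
def sum_nonempty(iterable):
    # Use the first item as the start value, so no "zero object" is needed.
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("sum_nonempty() arg is an empty iterable") from None
    return sum(it, first)

print(sum_nonempty([[0, 1], [2, 3]]))  # [0, 1, 2, 3]
print(sum_nonempty([1.5, 2.5]))        # 4.0
```

Note that sum() refuses a str start value, so this helper still does not
work for a sequence of strings.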

From rhamph at gmail.com  Sat Dec  5 20:03:02 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 12:03:02 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
	<91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
Message-ID: <aac2c7cb0912051103rd0d16b6pb4e88b0e35e76992@mail.gmail.com>

On Sat, Dec 5, 2009 at 11:23, George Sakkis <george.sakkis at gmail.com> wrote:
> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>
>> George Sakkis writes:
>>  > On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels <andreengels at gmail.com> wrote:
>>
>>  > > In your proposed implementation, sum([]) would be undefined.
>>  >
>>  > Which would make it consistent with min/max.
>>
>> There's no justification for trying to make 'min' and 'sum'
>> consistent.  The sum of an empty list of numbers is a well-defined
>> *number*, namely 0, but the max of an empty list of numbers is a
>> well-defined *non-number*, namely "minus infinity".
>>
>> The real question is "what harm is done by preferring the
>> (well-defined) sum of an empty list of numbers over the (well-defined)
>> empty sums of lists and/or strings?"  Then, if there is any harm, "can
>> the situation be improved by having no useful default for empty lists
>> of any type?"  Finally, "is it worth breaking existing code to ensure
>> equal treatment of different types?"
>>
>> My guess is that the answers are "very little", "hardly at all", and
>> "emphatically no."<wink>
>
> Agreed that there is little harm in preferring numbers over other
> types when it comes to empty sequences, but the more important
> question is "should the start argument be used even if the sequence is
> *not* empty?". The OP doesn't think so and I agree.

Only sometimes adding the start value makes it more fragile.  If you
have Foo() objects that aren't compatible with int and you do
sum([Foo(), Foo()]) you get a Foo() back.  If your sequence then
happens to be empty you do sum([]) and get an int back.  The result is
likely to be used in a context that's not compatible with int either.
Better always fail and require an explicit start if you need it.
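
A small sketch of that argument, using the behaviour of today's sum().
`Money` is a hypothetical stand-in for the Foo objects above; it only
supports Money + Money.

```python
class Money:
    # Hypothetical example type: addition is defined only between Money objects.
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents

# With the default start of 0, a non-empty list fails loudly ...
try:
    sum([Money(1), Money(2)])
except TypeError:
    print("TypeError: int + Money is not defined")

# ... while an explicit start gives the same type for empty and
# non-empty input.
assert sum([Money(1), Money(2)], Money(0)) == Money(3)
assert sum([], Money(0)) == Money(0)

# Without a start, the empty case yields an int.
assert sum([]) == 0
```

If the default start were skipped for non-empty input, the first call
would return a Money while sum([]) would still return an int, which is
the fragility described above.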

-- 
Adam Olsen, aka Rhamphoryncus

From george.sakkis at gmail.com  Sat Dec  5 20:07:22 2009
From: george.sakkis at gmail.com (George Sakkis)
Date: Sat, 5 Dec 2009 21:07:22 +0200
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>
	<91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
	<2987c46d0912051039w1028a365j94d6e0e1c7ea8279@mail.gmail.com>
Message-ID: <91ad5bf80912051107v3b345b55nfc6580a7d088431@mail.gmail.com>

On Sat, Dec 5, 2009 at 8:39 PM, Vitor Bosshard <algorias at gmail.com> wrote:
> 2009/12/5 George Sakkis <george.sakkis at gmail.com>:
>>
>> Agreed that there is little harm in preferring numbers over other
>> types when it comes to empty sequences, but the more important
>> question is "should the start argument be used even if the sequence is
>> *not* empty?". The OP doesn't think so and I agree.
>>
>
> In that case, "default" would be a more appropriate name than "start".
> That change of concept is a potential break in compatibility. How
> often is the start argument given as a non-zero value? Not all that
> often I suppose, but it's still a valid use-case. Ergo, the start
> argument should never be omitted if it was explicitly set.

Ok I see the different semantics between 'start' and 'default' and the
use cases for each but at the end of the day there should be a way
(preferably the default) that given a sequence [x1, ..., xN] one can
compute "x1+...+xN" instead of "start+x1+...+xN".

George

From algorias at gmail.com  Sat Dec  5 20:19:06 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Sat, 5 Dec 2009 16:19:06 -0300
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <aac2c7cb0912051048v1b7596ddr801d7ef82b72ad15@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
	<aac2c7cb0912051048v1b7596ddr801d7ef82b72ad15@mail.gmail.com>
Message-ID: <2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>

2009/12/5 Adam Olsen <rhamph at gmail.com>:
> On Sat, Dec 5, 2009 at 10:23, Vitor Bosshard <algorias at gmail.com> wrote:
>> And in that case the special string handling could also be dropped?
>>
>>>>> sum(["a","b"], "start")
>> Traceback (most recent call last):
>>   File "<pyshell#0>", line 1, in <module>
>>     sum(["a","b"], "start")
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>>
>>
>> This behaviour is quite bothersome. Sum can handle arbitrary objects
>> in theory (as long as they define the correct special methods, etc.),
>> but it gratuitously raises an exception on strings. This behaviour is
>> also inconsistent with the following:
>>
>>>>> sum(["a","b"])
>> Traceback (most recent call last):
>>   File "<pyshell#1>", line 1, in <module>
>>     sum(["a","b"])
>> TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>
>>
>> Where sum actually tries to add "a" to the default value of 0.
>
> sum is defined by repeatedly adding each number in a sequence.  As
> each number is usually constant, and the size of total grows
> logarithmically, this is O(n log n) (but due to implementation
> coarseness it usually isn't distinguished from O(n)).
>
> Concatenation however grows the total's size very quickly.  You
> instead get a performance of O(n**2).  Same result, wrong algorithm.
>
> It would be possible to special case strings, but why?  The programmer
> should know what algorithm they're using and what complexity class it
> has, so they can pick the right one (''.join(seq) in this case).  IOW,
> handling arbitrary objects is an illusion.

I think you misunderstood my point. Sorry if I wasn't clear enough in
my original message. I understand the performance characteristics of
repeated concatenation vs str.join. I just wonder why the language
goes out of its way to catch this particular occurrence of bad code,
given there are plenty of ways to misuse sum or any other builtin for
that matter. A newbie is more likely to get n**2 performance by using
a for loop than sum:

final = ""
for s in strings:
    final += s

Should python refuse to compile the above snippet? The answer is an
emphatic "no".

From python at rcn.com  Sat Dec  5 20:31:14 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 11:31:14 -0800
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
References: <loom.20091205T125239-371@post.gmane.org>
Message-ID: <F716115EF3344CB1B70274C2045A14EF@RaymondLaptop1>

[Ram Rachum]
>I noticed that `sum` tries to add zero to your iterable. Why? Why not just skip 
> adding any start value if none is specified?

Once the API has been released, it is difficult to change without breaking code.

> This current behavior is preventing me from using `sum` to add up a bunch of non-
> number objects.

You have plenty of options:
* use sum() as designed and supply your own Zero object as a start (see below)
* use reduce(operator.add, s)
* write a simple for-loop to do summing

It's not like summing is a hard task. There's nothing in your situation that
would warrant changing the behavior of a published API where sum(s)
is defined even when s is of length zero or one.
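
As a rough sketch of the second and third options (the helper names
`add_all` and `add_all_loop` are made up for illustration):

```python
from functools import reduce  # a builtin in Python 2, in functools in Python 3
import operator

def add_all(items):
    # x1 + x2 + ... + xN with no start value.
    # Raises TypeError if items is empty.
    return reduce(operator.add, items)

def add_all_loop(items):
    # The same computation as an explicit loop.
    it = iter(items)
    total = next(it)  # StopIteration here means items was empty
    for item in it:
        total = total + item
    return total

assert add_all(["xyz", "pdq"]) == "xyzpdq"
assert add_all_loop([[1], [2, 3]]) == [1, 2, 3]
```

Neither version adds a start value, so both have to decide for
themselves what an empty input means.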

Raymond

------------------------------------

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz pdq'], Zero())
'xyz pdq'

From python at mrabarnett.plus.com  Sat Dec  5 20:34:44 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 05 Dec 2009 19:34:44 +0000
Subject: [Python-ideas] Why does `sum` use a default for the
	`start`	parameter?
In-Reply-To: <42682.1260038421@parc.com>
References: <loom.20091205T125239-371@post.gmane.org>	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>	<87ein9thp0.fsf@uwakimon.sk.tsukuba.ac.jp>	<91ad5bf80912051023h4b88a114q43f7eae60c78ff0e@mail.gmail.com>
	<42682.1260038421@parc.com>
Message-ID: <4B1AB5D4.1000802@mrabarnett.plus.com>

Bill Janssen wrote:
> George Sakkis <george.sakkis at gmail.com> wrote:
> 
>> On Sat, Dec 5, 2009 at 8:10 PM, Stephen J. Turnbull
>> <stephen at xemacs.org> wrote:
>> 
>>> George Sakkis writes:
>>>> On Sat, Dec 5, 2009 at 6:45 PM, Andre Engels
>>>> <andreengels at gmail.com> wrote:
>>> 
>>>>> In your proposed implementation, sum([]) would be undefined.
>>>> 
>>>> Which would make it consistent with min/max.
>>> 
>>> There's no justification for trying to make 'min' and 'sum' 
>>> consistent.  The sum of an empty list of numbers is a
>>> well-defined *number*, namely 0, but the max of an empty list of
>>> numbers is a well-defined *non-number*, namely "minus infinity".
>>> 
>>> The real question is "what harm is done by preferring the 
>>> (well-defined) sum of an empty list of numbers over the
>>> (well-defined) empty sums of lists and/or strings?"  Then, if
>>> there is any harm, "can the situation be improved by having no
>>> useful default for empty lists of any type?"  Finally, "is it
>>> worth breaking existing code to ensure equal treatment of
>>> different types?"
>>> 
>>> My guess is that the answers are "very little", "hardly at all",
>>> and "emphatically no."<wink>
>> Agreed that there is little harm in preferring numbers over other 
>> types when it comes to empty sequences, but the more important 
>> question is "should the start argument be used even if the sequence
>> is *not* empty?". The OP doesn't think so and I agree.
> 
> Or perhaps, the *default* start value should not be used if it
> doesn't match in type the first element of a non-empty sequence.  An
> explicitly specified start value should still be used even if the
> sequence is *not* empty.
> 
Currently if start is None then the result is None if the sequence is
empty, but raises a TypeError otherwise.

Would it break any existing code if it were this instead:

sum(sequence, start=0)

If start is None then it's omitted from the summation, unless the
sequence is empty, in which case the result is None.
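
A pure-Python emulation of those proposed semantics could look like the
sketch below. `sum_proposed` is a hypothetical name, and this is not how
the real builtin behaves.

```python
def sum_proposed(sequence, start=0):
    # With a real start value, defer to the existing builtin.
    if start is not None:
        return sum(sequence, start)
    # start=None: leave the start value out of the summation entirely.
    it = iter(sequence)
    try:
        total = next(it)
    except StopIteration:
        return None  # empty sequence and no start value
    for item in it:
        total = total + item
    return total

assert sum_proposed([1, 2]) == 3
assert sum_proposed([[1], [2]], None) == [1, 2]
assert sum_proposed([], None) is None
```

Existing calls that pass no start, or a start other than None, take the
first branch and behave exactly as they do today.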

From g.brandl at gmx.net  Sat Dec  5 21:59:36 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 05 Dec 2009 21:59:36 +0100
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org>
	<hfe5l9$2af$1@ger.gmane.org>	<loom.20091205T190251-626@post.gmane.org>	<4B1AA3E0.7000604@mrabarnett.plus.com>	<loom.20091205T193504-515@post.gmane.org>
	<2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
Message-ID: <hfehkm$29l$1@ger.gmane.org>

Vitor Bosshard schrieb:
> 2009/12/5 Ram Rachum <cool-rr at cool-rr.com>:
>> MRAB <python at ...> writes:
>>
>>> > I prefer (b). The problem with requiring `start` for sequences of
>>> > non-numerical objects is that you now have to go out and create a
>>> > "zero object" of the same type as your other objects. The object
>>> > class might not even have a concept of a "zero object".
>>> >
>>> If the objects can be summed, shouldn't there also be a zero object?
>>> Does anyone have an example when that's not possible?
>>
>> You're right MRAB, probably almost every object type that has a concept of
>> "addition" will have a concept of a zero element.
>>
>> BUT, that zero object has to be created by the user of `sum`, and that has two
>> problems:
>>
>> 1. The user might not know from beforehand which type of object he's adding.
>> Even within the same type there might be problems. What happens when the user is
>> using `sum` to add a bunch of vectors, and he doesn't know from beforehand what
>> the dimensions of the vectors are? How will he know if his zero element should
>> be Vector([0, 0]) or Vector([0, 0, 0])
> 
> Ugly, but works:
> 
> itr = iter(sequence)
> sum(itr, itr.next())

Or, for sequences:

sum(islice(seq, 1, None), seq[0])

which clearly communicates the need for a non-empty sequence.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From python at rcn.com  Sat Dec  5 22:13:45 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 13:13:45 -0800
Subject: [Python-ideas] Why does `sum` use a default for the
	`start`parameter?
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org><hfe5l9$2af$1@ger.gmane.org>	<loom.20091205T190251-626@post.gmane.org>	<4B1AA3E0.7000604@mrabarnett.plus.com>	<loom.20091205T193504-515@post.gmane.org><2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>
	<hfehkm$29l$1@ger.gmane.org>
Message-ID: <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>

>>>> > I prefer (b). The problem with requiring `start` for sequences of
>>>> > non-numerical objects is that you now have to go out and create a
>>>> > "zero object" of the same type as your other objects. The object
>>>> > class might not even have a concept of a "zero object".
>>>> >
>>>> If the objects can be summed, shouldn't there also be a zero object?

Use a single universal zero object that works for everything.
Here's an example from my earlier post:

>>> class Zero:
...     'universal zero for addition'
...     def __add__(self, other):
...         return other
...     def __radd__(self, other):
...         return other
...
>>> Zero() + 'xyz'
'xyz'
>>> sum(['xyz', 'pdq'], Zero())
'xyzpdq'

Raymond

From ncoghlan at gmail.com  Sun Dec  6 00:49:53 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 06 Dec 2009 09:49:53 +1000
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <loom.20091205T190251-626@post.gmane.org>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org>	<hfe5l9$2af$1@ger.gmane.org>
	<loom.20091205T190251-626@post.gmane.org>
Message-ID: <4B1AF1A1.7050709@gmail.com>

Ram Rachum wrote:
> I prefer (b). The problem with requiring `start` for sequences of non-numerical 
> objects is that you now have to go out and create a "zero object" of the same
> type as your other objects. The object class might not even have a concept of a 
> "zero object".

class _AdditiveIdentity(object):
  def __add__(self, other):
    return other
  __radd__ = __add__

AdditiveIdentity = _AdditiveIdentity()

total = sum(itr, AdditiveIdentity)
if total is AdditiveIdentity:
  pass  # Iterable was empty
else:
  pass  # we got a real result

(Raymond already posted along these lines, but I wanted to point out
that by making the identity object a singleton you can save the cost of
repeated instantiation and simplify the after-the-fact check for an
empty iterable)

The other philosophical point here is one Guido has expressed several
times in the past: "In general, the type of a return value should not
depend on the *value* of an argument" (although the different numeric
types tend to blur together a bit in this specific context)

With only a default value, sum() could return entirely different types
based on whether or not the sequence was empty.

With a start value, on the other hand, the type returned must at least
be one that is compatible under addition with the start value. You can
subvert that a bit through the use of a universal additive identity, but
it holds short of that.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From tjreedy at udel.edu  Sun Dec  6 01:57:38 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 05 Dec 2009 19:57:38 -0500
Subject: [Python-ideas] Why does `sum` use a default for the
	`start`parameter?
In-Reply-To: <1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org><hfe5l9$2af$1@ger.gmane.org>	<loom.20091205T190251-626@post.gmane.org>	<4B1AA3E0.7000604@mrabarnett.plus.com>	<loom.20091205T193504-515@post.gmane.org><2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>	<hfehkm$29l$1@ger.gmane.org>
	<1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
Message-ID: <hfevi0$64f$1@ger.gmane.org>

Raymond Hettinger wrote:
> 
>>>>> > I prefer (b). The problem with requiring `start` for sequences of
>>>>> > non-numerical objects is that you now have to go out and create a
>>>>> > "zero object" of the same type as your other objects. The object
>>>>> > class might not even have a concept of a "zero object".
>>>>> >
>>>>> If the objects can be summed, shouldn't there also be a zero object?
> 
> 
> Use a single universal zero object that works for everything.
> Here's an example from my earlier post:
> 
>>>> class Zero:
> ...     'universal zero for addition'
> ...     def __add__(self, other):
> ...         return other
> ...     def __radd__(self, other):
> ...         return other
> ...
>>>> Zero() + 'xyz'
> 'xyz'
>>>> sum(['xyz', 'pdq'], Zero())
> 'xyzpdq'

I would not have expected this to work, as it does not match "The 
iterable's items are normally numbers, and are not allowed to be strings."

It appears that it is the start value that may not be a string.
I suggest a doc fix in
http://bugs.python.org/issue7447

FWIW, sum was designed for summing numbers at C speed. I think it 
probably is as good a compromise as we can get. It is easy to program 
any other exact behavior one wants, and summing user objects is going to 
go at Python speed anyway. Certainly, none of the suggested alterations 
strike me as worth breaking code.

Terry Jan Reedy

From python at rcn.com  Sun Dec  6 02:18:38 2009
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 5 Dec 2009 17:18:38 -0800
Subject: [Python-ideas] Why does `sum` use a default for
	the`start`parameter?
References: <loom.20091205T125239-371@post.gmane.org>	<4B1A8DA3.40904@mrabarnett.plus.com>	<loom.20091205T175122-316@post.gmane.org><hfe5l9$2af$1@ger.gmane.org>	<loom.20091205T190251-626@post.gmane.org>	<4B1AA3E0.7000604@mrabarnett.plus.com>	<loom.20091205T193504-515@post.gmane.org><2987c46d0912051055k4e7206cpce591cd21819e117@mail.gmail.com>	<hfehkm$29l$1@ger.gmane.org><1653BCD6766143B8B457E06C6FBB54BD@RaymondLaptop1>
	<hfevi0$64f$1@ger.gmane.org>
Message-ID: <F5750EDB78294F9BA85250CDAF2D3AE5@RaymondLaptop1>

["Terry Reedy"]
> FWIW, sum was designed for summing numbers at C speed. I think it 
> probably is as good a compromise as we can get. It is easy to program 
> any other exact behavior one wants, and summing user objects is going to 
> go at Python speed anyway. Certainly, none of the suggested alterations 
> strike me as worth breaking code.

Wisely spoken.

Raymond

From rhamph at gmail.com  Sun Dec  6 07:29:05 2009
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 5 Dec 2009 23:29:05 -0700
Subject: [Python-ideas] Why does `sum` use a default for the `start`
	parameter?
In-Reply-To: <2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>
References: <loom.20091205T125239-371@post.gmane.org>
	<6faf39c90912050845p66da83dfx4a9b376ef92fdd6e@mail.gmail.com>
	<91ad5bf80912050901i54dbda51y48cb662b2138b16@mail.gmail.com>
	<2987c46d0912050923v730777f5y62b904c61f122c81@mail.gmail.com>
	<aac2c7cb0912051048v1b7596ddr801d7ef82b72ad15@mail.gmail.com>
	<2987c46d0912051119v524a94d7pc5235ef6ab2d58b5@mail.gmail.com>
Message-ID: <aac2c7cb0912052229j51bc35efof2990b74241067fa@mail.gmail.com>

On Sat, Dec 5, 2009 at 12:19, Vitor Bosshard <algorias at gmail.com> wrote:
> I think you misunderstood my point. Sorry if I wasn't clear enough in
> my original message. I understand the performance characteristics of
> repeated concatenation vs str.join. I just wonder why the language
> goes out of its way to catch this particular occurrence of bad code,
> given there are plenty of ways to misuse sum or any other builtin for
> that matter. A newbie is more likely to get n**2 performance by using
> a for loop than sum:
>
> final = ""
> for s in strings:
>     final += s
>
> Should python refuse to compile the above snippet? The answer is an
> emphatic "no".

All the individual operations there are fine.  It's the composition
that's wrong.  Adding a sanity check would require recognizing that
pattern, and changing the semantics of an individual operation based
on what surrounds it.  Not a nice thing to do.

sum() is already a single operation (regardless of how it's
implemented), so it doesn't have that problem.

-- 
Adam Olsen, aka Rhamphoryncus

From facundobatista at gmail.com  Mon Dec  7 11:36:34 2009
From: facundobatista at gmail.com (Facundo Batista)
Date: Mon, 7 Dec 2009 07:36:34 -0300
Subject: [Python-ideas] Heap data type
In-Reply-To: <3CDA63554E1546DEA84A696B56BB4876@RaymondLaptop1>
References: <e04bdf310904180521j76689f6j6cc7d207094b2d33@mail.gmail.com>
	<20090418124357.GA8506@panix.com>
	<3CDA63554E1546DEA84A696B56BB4876@RaymondLaptop1>
Message-ID: <e04bdf310912070236w33ae73dasa76a863198f4f23d@mail.gmail.com>

On Sat, Apr 18, 2009 at 8:40 PM, Raymond Hettinger <python at rcn.com> wrote:

> Facundo, I would like to work with you on this.
> I've been the primary maintainer for heapq for a while
> and had already started working on something like this
> in response to repeated requests to support a key= function
> (like that for sorted/min/max).

After a not very complicated, but different year (I had a kid!), I'm
bringing this thread back to life.

There were different proposals from different people about what to do
after my initial mail; we can separate them into two groups:

- Move the Heap class to the collections module: I'm just changing the
heapq module to have an OO interface, instead of a bunch of functions.
I'm +0 to moving it to "collections", but note that even after the
reordering of the stdlib for Py3, the heapq module remained there.

- Add functionality to the Heap class: I'm +0 to this, but I don't
want to hold up this change for the sake of further functionality... I
propose to have the same functionality, in an OO and less error-prone
way. We can add more functionality afterwards.
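
For readers without the first mail at hand, an object-oriented wrapper
with the same functionality as heapq might look roughly like this. It is
only a sketch under assumptions: the method names `push`, `pop` and
`peek` are hypothetical, and this is not the class from the original
mail.

```python
import heapq

class Heap:
    # Hypothetical OO wrapper around the heapq functions.
    def __init__(self, iterable=()):
        self._items = list(iterable)
        heapq.heapify(self._items)

    def push(self, item):
        heapq.heappush(self._items, item)

    def pop(self):
        # Removes and returns the smallest item; IndexError if empty.
        return heapq.heappop(self._items)

    def peek(self):
        # Returns the smallest item without removing it.
        return self._items[0]

    def __len__(self):
        return len(self._items)

h = Heap([5, 1, 4])
h.push(2)
print([h.pop() for _ in range(len(h))])  # [1, 2, 4, 5]
```

Keeping the list private is what makes this less error prone: nothing
outside the class can break the heap invariant by mutating the list
directly.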

What do you think?

Raymond, let's work together... but I don't know where. The Heap class
is already coded in my first mail; if you want to start from there and
add functionality, I'm +0. If you want me to add tests and push the
inclusion of that class into the module, just tell me. If you have
something else in mind, I'm all ears :)

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From kristjan at ccpgames.com  Tue Dec  8 18:51:46 2009
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Tue, 8 Dec 2009 17:51:46 +0000
Subject: [Python-ideas] disabling .pyc and .pyo files
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>

Hello there.
We have a large project involving multiple perforce branches of hundreds of .py files each.
Although we employ our own import mechanism for the bulk of these files, we do use the regular import mechanism for an essential core of them.

Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.  This can happen for a variety of reasons, but most often it occurs when .py files are being removed, or moved in the hierarchy.  The problem is that the application will happily load and import an orphaned .pyo file, even though the .py file has gone or moved.

I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files.  I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified.   This will ensure that the application will execute only the code represented by the checked-out .py files.  But it occurred to me that this functionality might be of interest to other people than just us.  I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time.
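
Independently of any interpreter patch, stray bytecode files can be
located with a short script. The sketch below assumes the classic layout
in which module.pyc or module.pyo sits next to module.py; the function
name `find_orphaned_bytecode` is made up for illustration.

```python
import os

def find_orphaned_bytecode(root):
    # Return sorted paths of .pyc/.pyo files with no .py file beside them.
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip __pycache__ directories (Python 3.2+), which use a
        # different naming scheme.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        names = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

The returned paths can then be reported or deleted, for example as a
step that runs after each sync of the source tree.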

Do you think that such a command line option would be useful for Python at large?

Cheers,
Kristján

From jnoller at gmail.com  Tue Dec  8 19:58:41 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 8 Dec 2009 13:58:41 -0500
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>

2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>:
> Hello there.
>
> We have a large project involving multiple perforce branches of hundreds of
> .py files each.
>
> Although we employ our own import mechanism for the bulk of these files, we
> do use the regular import mechanism for an essential core of them.
>
>
>
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
> This can happen for a variety of reasons, but most often it occurs when .py
> files are being removed, or moved in the hierarchy.  The problem is that the
> application will happily load and import an orphaned .pyo file, even though
> the .py file has gone or moved.
>
>
>
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files.  I am about to implement that patch for
> our purposes, thus forcing recompilation of the .py files on each run if so
> specified.  This will ensure that the application will execute only the
> code represented by the checked-out .py files.  But it occurred to me that
> this functionality might be of interest to other people than just us.  I can
> imagine, for example, that buildbots running the python regression testsuite
> might be running into problems with stray .pyo files from time to time.
>
>
>
> Do you think that such a command line option would be useful for Python at
> large?
>
>
>
> Cheers,
>
> Kristján

FWIW: I've been bitten by this more than once, especially on Django
projects, mainly during the development cycle.

From toddw at activestate.com  Tue Dec  8 20:07:46 2009
From: toddw at activestate.com (Todd Whiteman)
Date: Tue, 08 Dec 2009 11:07:46 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <4B1EA402.50704@activestate.com>

Kristján Valur Jónsson wrote:
> I looked at the import code and I found that it is trivial to block the 
> reading and writing of .pyo files.  I am about to implement that patch 
> for our purposes, thus forcing recompilation of the .py files on each 
> run if so specified.   This will ensure that the application will 
> execute only the code represented by the checked-out .py files.  But it 
> occurred to me that this functionality might be of interest to other 
> people than just us.  I can imagine, for example, that buildbots running 
> the python regression testsuite might be running into problems with 
> stray .pyo files from time to time.
> 
> Do you think that such a command line option would be useful for Python 
> at large?

Yes, this is already implemented (as of Python 2.6), see -B option:
http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

From guido at python.org  Tue Dec  8 20:10:00 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Dec 2009 11:10:00 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
Message-ID: <ca471dc20912081110s139fe30cpb7a6fd3a6a242410@mail.gmail.com>

Agreed. I wonder if this functionality ought to be opt-in instead of
opt-out? The only use cases I am aware of are software vendors who
don't want to distribute their source (a near-extinct breed for
sure...) or people with absurdly small disks (ditto).

2009/12/8 Jesse Noller <jnoller at gmail.com>:
> 2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>:
>> Hello there.
>>
>> We have a large project involving multiple perforce branches of hundreds of
>> .py files each.
>>
>> Although we employ our own import mechanism for the bulk of these files, we
>> do use the regular import mechanism for an essential core of them.
>>
>>
>>
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy.  The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.
>>
>>
>>
>> I looked at the import code and I found that it is trivial to block the
>> reading and writing of .pyo files.  I am about to implement that patch for
>> our purposes, thus forcing recompilation of the .py files on each run if so
>> specified.  This will ensure that the application will execute only the
>> code represented by the checked-out .py files.  But it occurred to me that
>> this functionality might be of interest to other people than just us.  I can
>> imagine, for example, that buildbots running the python regression testsuite
>> might be running into problems with stray .pyo files from time to time.
>>
>>
>>
>> Do you think that such a command line option would be useful for Python at
>> large?
>>
>>
>>
>> Cheers,
>>
>> Kristján
>
> FWIW: I've been bitten by this more than once, especially on Django
> projects, mainly during the development cycle.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Tue Dec  8 20:11:32 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Dec 2009 11:11:32 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4B1EA402.50704@activestate.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<4B1EA402.50704@activestate.com>
Message-ID: <ca471dc20912081111r1e1f0bbdpddcb31f0dced08de@mail.gmail.com>

-B only blocks *writing* of bytecode. I think the OP wants to block
*reading*, and only in the specific case where there is no
corresponding source code file.
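
For reference, a minimal sketch of what -B controls, assuming Python 2.6
or later: the option corresponds to the sys.dont_write_bytecode attribute
(and the PYTHONDONTWRITEBYTECODE environment variable). The helper name
writes_bytecode is only illustrative.

```python
import sys

# Same effect as starting the interpreter with -B: imports performed after
# this point will not write .pyc/.pyo files. Bytecode files that already
# exist on disk are still read, which is why this does not address
# orphaned bytecode.
sys.dont_write_bytecode = True


def writes_bytecode():
    """Report whether the interpreter is currently allowed to write bytecode."""
    return not sys.dont_write_bytecode
```

Nothing here affects reading, so an orphaned .pyo would still be imported.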

2009/12/8 Todd Whiteman <toddw at activestate.com>:
> Kristján Valur Jónsson wrote:
>>
>> I looked at the import code and I found that it is trivial to block the
>> reading and writing of .pyo files.  I am about to implement that patch for
>> our purposes, thus forcing recompilation of the .py files on each run if so
>> specified.   This will ensure that the application will execute only the
>> code represented by the checked-out .py files.  But it occurred to me that
>> this functionality might be of interest to other people than just us.  I can
>> imagine, for example, that buildbots running the python regression testsuite
>> might be running into problems with stray .pyo files from time to time.
>>
>> Do you think that such a command line option would be useful for Python at
>> large?
>
> Yes, this is already implemented (as of Python 2.6), see -B option:
> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
>

-- 
--Guido van Rossum (python.org/~guido)

From john.arbash.meinel at gmail.com  Tue Dec  8 20:27:21 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Tue, 08 Dec 2009 13:27:21 -0600
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <ca471dc20912081111r1e1f0bbdpddcb31f0dced08de@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<4B1EA402.50704@activestate.com>
	<ca471dc20912081111r1e1f0bbdpddcb31f0dced08de@mail.gmail.com>
Message-ID: <4B1EA899.8060409@gmail.com>

Guido van Rossum wrote:
> -B only blocks *writing* of bytecode. I think the OP wants to block
> *reading*, and only in the specific case where there is no
> corresponding source code file.
> 
> 2009/12/8 Todd Whiteman <toddw at activestate.com>:
>> Kristján Valur Jónsson wrote:
>>> I looked at the import code and I found that it is trivial to block the
>>> reading and writing of .pyo files.  I am about to implement that patch for
>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>> specified.   This will ensure that the application will execute only the
>>> code represented by the checked-out .py files.  But it occurred to me that
>>> this functionality might be of interest to other people than just us.  I can
>>> imagine, for example, that buildbots running the python regression testsuite
>>> might be running into problems with stray .pyo files from time to time.
>>>
>>> Do you think that such a command line option would be useful for Python at
>>> large?
>> Yes, this is already implemented (as of Python 2.6), see -B option:
>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options

This would be quite nice for us. In our case we have been bit several
times during refactoring. You move one file, but your test suite still
passes because .pyc is still around.

I think having it be opt-in would be nice.

I do think that the standard py2exe code generates a library.zip that
only has .pyc or .pyo files (and no .py files). It isn't that we would
care if they were present, but I suppose it makes the final .zip file
smaller and faster to load?

Whatever flag is available, though, I'm sure py2exe could be taught to
pass it.

John
=:->

From python at rcn.com  Tue Dec  8 20:34:25 2009
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 8 Dec 2009 11:34:25 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
	<ca471dc20912081110s139fe30cpb7a6fd3a6a242410@mail.gmail.com>
Message-ID: <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>

>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>> This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy. The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.

I've seen this same problem occur for a number of users.
It is a recurring opportunity to get tripped up.

Raymond

From brett at python.org  Tue Dec  8 20:51:04 2009
From: brett at python.org (Brett Cannon)
Date: Tue, 8 Dec 2009 11:51:04 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com> 
	<ca471dc20912081110s139fe30cpb7a6fd3a6a242410@mail.gmail.com> 
	<90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
Message-ID: <bbaeab100912081151o49c77acdjbeef3164d8ca2fc@mail.gmail.com>

On Tue, Dec 8, 2009 at 11:34, Raymond Hettinger <python at rcn.com> wrote:

>
>  Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>>> This can happen for a variety of reasons, but most often it occurs when
>>> .py
>>> files are being removed, or moved in the hierarchy. The problem is that
>>> the
>>> application will happily load and import an orphaned .pyo file, even
>>> though
>>> the .py file has gone or moved.
>>>
>>
> I've seen this same problem occur for a number of users.
> It is a recurring opportunity to get tripped up.

Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
that don't use Python bytecode (e.g. Jython). They could set this flag to
True by default which allows code to introspect on the VM to see if it is
using bytecode or not. Plus it would let importlib easily skip bytecode
usage on VMs that don't support it instead of trying to come up with some
heuristic to pick up on that fact (I have not figured that one out yet, but
Jython folk were thinking about having marshal.loads() always throw an
exception).

-Brett

From ben+python at benfinney.id.au  Tue Dec  8 22:44:01 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 09 Dec 2009 08:44:01 +1100
Subject: [Python-ideas] Importing orphaned bytecode files (was: disabling
	.pyc and .pyo files)
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <87ljhdf8f2.fsf@benfinney.id.au>

Kristján Valur Jónsson
<kristjan at ccpgames.com> writes:

> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
> files. This can happen for a variety of reasons, but most often it
> occurs when .py files are being removed, or moved in the hierarchy.
> The problem is that the application will happily load and import an
> orphaned .pyo file, even though the .py file has gone or moved.

Yes, I think Python users would benefit from having the above behaviour
be opt-in.

I suggest:

* A new attribute "sys.import_orphaned_bytecode". If set "True", the
  interpreter follows the current behaviour. If "False", any bytecode
  file satisfies an import only if it has a corresponding source file
  (where "corresponding" means "this source file would, if compiled,
  result in a bytecode file replacing this one").

  I suggest this attribute should be implemented as "True" by default
  (to match current behaviour), then switched to "False" by default as
  soon as feasible.

* The "PYTHONIMPORTORPHANEDBYTECODE" environment variable, when set,
  causes the interpreter to set the above option "True".

* The "-b" option to the interpreter command-line sets the above option
  "True".
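
A minimal sketch of the "corresponding source file" test, assuming the
current on-disk layout in which foo.pyc or foo.pyo sits in the same
directory as foo.py. It only lists the orphans that the suggested option
would refuse to import; the function name find_orphaned_bytecode is
hypothetical and is not part of any proposal here.

```python
import os


def find_orphaned_bytecode(root):
    """Return sorted paths of .pyc/.pyo files under root with no sibling .py.

    Assumes bytecode is stored next to its source (foo.py -> foo.pyc).
    """
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            # A bytecode file is an orphan when no source of the same
            # base name exists in the same directory.
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

Running something like this over a checkout after moving or deleting .py
files would show which stray files could still satisfy an import.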

-- 
 \         "I have yet to see any problem, however complicated, which, |
  `\      when you looked at it in the right way, did not become still |
_o__)                                more complicated." -Paul Anderson |
Ben Finney

From collinw at gmail.com  Tue Dec  8 23:20:21 2009
From: collinw at gmail.com (Collin Winter)
Date: Tue, 8 Dec 2009 14:20:21 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <bbaeab100912081151o49c77acdjbeef3164d8ca2fc@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<4222a8490912081058i7ca52869n4b649a926acb06fe@mail.gmail.com>
	<ca471dc20912081110s139fe30cpb7a6fd3a6a242410@mail.gmail.com>
	<90229A93A0D24EC387526F18D5741983@RaymondLaptop1>
	<bbaeab100912081151o49c77acdjbeef3164d8ca2fc@mail.gmail.com>
Message-ID: <43aa6ff70912081420p1cc4d5do6126eb3f50421430@mail.gmail.com>

On Tue, Dec 8, 2009 at 11:51 AM, Brett Cannon <brett at python.org> wrote:
> Another way that a sys.dont_read_bytecode flag would be helpful is for VMs
> that don't use Python bytecode (e.g. Jython). They could set this flag to
> True by default which allows code to introspect on the VM to see if it is
> using bytecode or not. Plus it would let importlib easily skip bytecode
> usage on VMs that don't support it instead of trying to come up with some
> heuristic to pick up on that fact (I have not figured that one out yet, but
> Jython folk were thinking about having marshal.loads() always throw an
> exception).

It would also be useful when benchmarking multiple iterations of the
same VM. I've considered implementing something like this for Unladen
Swallow so that we could more effectively isolate the running binary
from global state (with a sys.dont_read_bytecode command-line flag
doing for bytecode files what -E does for environment variables).

+1 for this in mainline.

Collin Winter

From greg.ewing at canterbury.ac.nz  Tue Dec  8 23:24:15 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 09 Dec 2009 11:24:15 +1300
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4B1EA899.8060409@gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<4B1EA402.50704@activestate.com>
	<ca471dc20912081111r1e1f0bbdpddcb31f0dced08de@mail.gmail.com>
	<4B1EA899.8060409@gmail.com>
Message-ID: <4B1ED20F.8080707@canterbury.ac.nz>

John Arbash Meinel wrote:

> Whatever flag is available, though, I'm sure py2exe could be taught to
> pass it.

I'm a bit worried about the idea of adding a flag that is
required to turn on functionality that was previously
available without any flag. It could make things awkward
for launcher scripts that are agnostic about the exact
version of Python being used.

-- 
Greg

From brett at python.org  Wed Dec  9 00:13:48 2009
From: brett at python.org (Brett Cannon)
Date: Tue, 8 Dec 2009 15:13:48 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>

2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>

>  [SNIP]
>
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files.  I am about to implement that patch for
> our purposes, thus forcing recompilation of the .py files on each run if so
> specified.   This will ensure that the application will execute only the
> code represented by the checked-out .py files.  But it occurred to me that
> this functionality might be of interest to other people than just us.  I can
> imagine, for example, that buildbots running the python regression testsuite
> might be running into problems with stray .pyo files from time to time.
>

Are you suggesting that the flag turn off reading *period*, or only if no
source is available? I think you mean the former while Guido suggested the
latter.

-Brett

From kristjan at ccpgames.com  Wed Dec  9 00:23:20 2009
From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=)
Date: Tue, 8 Dec 2009 23:23:20 +0000
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>

You are right, I was suggesting the former.  From what cursory glance I had at the code it seemed simpler to not look for a .pyo file at all, rather than to add a special rule regarding its relation to a .py file.  That would also help rule out any timestamp problems.  But I'm happy with whatever way we agree on to solve the "orphaned bytecode" problem and glad to see that I'm not the only one experiencing it.

Kristján

From: bcannon at gmail.com [mailto:bcannon at gmail.com] On Behalf Of Brett Cannon
Sent: 8. desember 2009 23:14
To: Kristján Valur Jónsson
Cc: python-ideas at python.org
Subject: Re: [Python-ideas] disabling .pyc and .pyo files

2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>
[SNIP]
I looked at the import code and I found that it is trivial to block the reading and writing of .pyo files.  I am about to implement that patch for our purposes, thus forcing recompilation of the .py files on each run if so specified.   This will ensure that the application will execute only the code represented by the checked-out .py files.  But it occurred to me that this functionality might be of interest to other people than just us.  I can imagine, for example, that buildbots running the python regression testsuite might be running into problems with stray .pyo files from time to time.

Are you suggesting that the flag turn off reading *period*, or only if no source is available? I think you mean the former while Guido suggested the latter.

-Brett

From debatem1 at gmail.com  Wed Dec  9 01:07:35 2009
From: debatem1 at gmail.com (geremy condra)
Date: Tue, 8 Dec 2009 19:07:35 -0500
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <4B1EA899.8060409@gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<4B1EA402.50704@activestate.com>
	<ca471dc20912081111r1e1f0bbdpddcb31f0dced08de@mail.gmail.com>
	<4B1EA899.8060409@gmail.com>
Message-ID: <f3cc57c60912081607j71e14388v77b7c75c76f9d5c2@mail.gmail.com>

On Tue, Dec 8, 2009 at 2:27 PM, John Arbash Meinel
<john.arbash.meinel at gmail.com> wrote:
> Guido van Rossum wrote:
>> -B only blocks *writing* of bytecode. I think the OP wants to block
>> *reading*, and only in the specific case where there is no
>> corresponding source code file.
>>
>> 2009/12/8 Todd Whiteman <toddw at activestate.com>:
>>> Kristján Valur Jónsson wrote:
>>>> I looked at the import code and I found that it is trivial to block the
>>>> reading and writing of .pyo files.  I am about to implement that patch for
>>>> our purposes, thus forcing recompilation of the .py files on each run if so
>>>> specified.   This will ensure that the application will execute only the
>>>> code represented by the checked-out .py files.  But it occurred to me that
>>>> this functionality might be of interest to other people than just us.  I can
>>>> imagine, for example, that buildbots running the python regression testsuite
>>>> might be running into problems with stray .pyo files from time to time.
>>>>
>>>> Do you think that such a command line option would be useful for Python at
>>>> large?
>>> Yes, this is already implemented (as of Python 2.6), see -B option:
>>> http://www.python.org/doc/2.6.4/using/cmdline.html#miscellaneous-options
>
> This would be quite nice for us. In our case we have been bit several
> times during refactoring. You move one file, but your test suite still
> passes because .pyc is still around.

Same experience here.

Geremy Condra

From eric at trueblade.com  Wed Dec  9 01:04:04 2009
From: eric at trueblade.com (Eric Smith)
Date: Tue, 08 Dec 2009 19:04:04 -0500
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <87ljhdf8f2.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
Message-ID: <4B1EE974.2090801@trueblade.com>

Ben Finney wrote:
> Kristján Valur Jónsson
> <kristjan at ccpgames.com> writes:
> 
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
>> files. This can happen for a variety of reasons, but most often it
>> occurs when .py files are being removed, or moved in the hierarchy.
>> The problem is that the application will happily load and import an
>> orphaned .pyo file, even though the .py file has gone or moved.
> 
> Yes, I think Python users would benefit from having the above behaviour
> be opt-in.

Agreed. This has bitten me, too, often through a permissions problem
where another user has created the .pyc file and I can't overwrite it
(this is on Windows).

> I suggest:
> 
> * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>   interpreter follows the current behaviour. If "False", any bytecode
>   file satisfies an import only if it has a corresponding source file
>   (where "corresponding" means "this source file would, if compiled,
>   result in a bytecode file replacing this one").

I agree with this in principle, but I don't see how you're going to 
implement it. In order to actually check this condition, aren't you 
going to have to compile the source code anyway? If so, just skip the 
bytecode file. Although I guess you could store a hash of the source in 
the compiled file, or other similar optimizations.

>   I suggest this attribute should be implemented as "True" by default
>   (to match current behaviour), then switched to "False" by default as
>   soon as feasible.
> 
> * The "PYTHONIMPORTORPHANEDBYTECODE" environment variable, when set,
>   causes the interpreter to set the above option "True".
> 
> * The "-b" option to the interpreter command-line sets the above option
>   "True".

Sounds good to me.

Eric.

From brett at python.org  Wed Dec  9 01:45:42 2009
From: brett at python.org (Brett Cannon)
Date: Tue, 8 Dec 2009 16:45:42 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com> 
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
Message-ID: <bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>

2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>

>  You are right, I was suggesting the former.  From what cursory glance I
> had at the code it seemed simpler to not look for a .pyo file at all, rather
> than to add a special rule regarding its relation to a .py file.  That would
> also help rule out any timestamp problems.  But I'm happy with whatever way
> we agree on to solve the "orphaned bytecode" problem and glad to see that
> I'm not the only one experiencing it.
>
>
I prefer the former as well (don't read any bytecode no matter if source is
available or not); clear and simple semantics that are easy to implement.

>
>
> Kristján
>
>
>
> *From:* bcannon at gmail.com [mailto:bcannon at gmail.com] *On Behalf Of *Brett
> Cannon
> *Sent:* 8. desember 2009 23:14
> *To:* Kristján Valur Jónsson
> *Cc:* python-ideas at python.org
> *Subject:* Re: [Python-ideas] disabling .pyc and .pyo files
>
>
>
>
>
> 2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>
>
> [SNIP]
>
>  I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files.  I am about to implement that patch for
> our purposes, thus forcing recompilation of the .py files on each run if so
> specified.   This will ensure that the application will execute only the
> code represented by the checked-out .py files.  But it occurred to me that
> this functionality might be of interest to other people than just us.  I can
> imagine, for example, that buildbots running the python regression testsuite
> might be running into problems with stray .pyo files from time to time.
>
>
>
> Are you suggesting that the flag turn off reading *period*, or only if no
> source is available? I think you mean the former while Guido suggested the
> latter.
>
>
>
> -Brett
>

From ben+python at benfinney.id.au  Wed Dec  9 03:28:01 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 09 Dec 2009 13:28:01 +1100
Subject: [Python-ideas] Importing orphaned bytecode files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com>
Message-ID: <87r5r4ev9q.fsf@benfinney.id.au>

Eric Smith <eric at trueblade.com> writes:

> Ben Finney wrote:
> > I suggest:
> >
> > * A new attribute "sys.import_orphaned_bytecode". If set "True", the
> >   interpreter follows the current behaviour. If "False", any bytecode
> >   file satisfies an import only if it has a corresponding source file
> >   (where "corresponding" means "this source file would, if compiled,
> >   result in a bytecode file replacing this one").
>
> I agree with this in principle

Thanks.

> but I don't see how you're going to implement it. In order to actually
> check this condition, aren't you going to have to compile the source
> code anyway? If so, just skip the bytecode file. Although I guess you
> could store a hash of the source in the compiled file, or other
> similar optimizations.

You seem to be seeing something I was careful not to write. The check
is:

   this source file would, if compiled, result in a bytecode file
   replacing this one

Nowhere in there is there anything about the resulting bytecode files being
equivalent. I'm limiting the check only to whether the resulting
bytecode file would *replace* the existing bytecode file.

This doesn't require knowing anything at all about the contents of the
current bytecode file; indeed, my intention was to phrase it so that
it's checked before bothering to open the existing bytecode file.

Is there a better term for this? I'm not well-versed enough in the
Python import internals to know.

-- 
 \       "Philosophy is questions that may never be answered. Religion |
  `\              is answers that may never be questioned." -anonymous |
_o__)                                                                  |
Ben Finney

From guido at python.org  Wed Dec  9 04:30:25 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Dec 2009 19:30:25 -0800
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <87r5r4ev9q.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> 
	<87r5r4ev9q.fsf@benfinney.id.au>
Message-ID: <ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com>

On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Eric Smith <eric at trueblade.com> writes:
>
>> Ben Finney wrote:
>> > I suggest:
>> >
>> > * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>> >   interpreter follows the current behaviour. If "False", any bytecode
>> >   file satisfies an import only if it has a corresponding source file
>> >   (where "corresponding" means "this source file would, if compiled,
>> >   result in a bytecode file replacing this one").
>>
>> I agree with this in principle
>
> Thanks.
>
>> but I don't see how you're going to implement it. In order to actually
>> check this condition, aren't you going to have to compile the source
>> code anyway? If so, just skip the bytecode file. Although I guess you
>> could store a hash of the source in the compiled file, or other
>> similar optimizations.
>
> You seem to be seeing something I was careful not to write. The check
> is:
>
>   this source file would, if compiled, result in a bytecode file
>   replacing this one
>
> Nowhere in there is there anything about the resulting bytecode files being
> equivalent. I'm limiting the check only to whether the resulting
> bytecode file would *replace* the existing bytecode file.
>
> This doesn't require knowing anything at all about the contents of the
> current bytecode file; indeed, my intention was to phrase it so that
> it's checked before bothering to open the existing bytecode file.
>
> Is there a better term for this? I'm not well-versed enough in the
> Python import internals to know.

If there was a corresponding source file, it would have been found
first -- and the bytecode file would be used *if* it matches the
source file (by comparing a timestamp in the bytecode file's header to
the actual mtime of the source file).

So I'm not sure what there is to do apart from *not* using "lone"
bytecode files. (The latter was actually added as a feature at some
point so I betcha it's easy to make it conditional on a flag.)
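
A minimal sketch of the timestamp comparison described above, assuming the
bytecode header layout used by CPython 2.x: a 4-byte magic number followed
by the source file's mtime as a 4-byte little-endian integer. Those offsets
are an assumption tied to that layout, and the function names are
illustrative only; this is not the interpreter's actual code.

```python
import struct

LEGACY_HEADER_SIZE = 8  # 4-byte magic + 4-byte source mtime (assumed layout)


def header_mtime(header):
    """Extract the recorded source mtime from a legacy bytecode header."""
    if len(header) < LEGACY_HEADER_SIZE:
        raise ValueError("truncated bytecode header")
    (mtime,) = struct.unpack("<I", header[4:8])
    return mtime


def bytecode_is_fresh(header, expected_magic, source_mtime):
    """True if the magic matches and the recorded mtime equals the source's.

    source_mtime may be a float such as os.stat(path).st_mtime; it is
    truncated to 32 bits to match what the header can store.
    """
    return (header[:4] == expected_magic
            and header_mtime(header) == int(source_mtime) & 0xFFFFFFFF)
```

A "lone" bytecode file never reaches this check, since there is no source
mtime to compare against; it is loaded on the strength of the magic number
alone.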

-- 
--Guido van Rossum (python.org/~guido)

From ben+python at benfinney.id.au  Wed Dec  9 06:38:32 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 09 Dec 2009 16:38:32 +1100
Subject: [Python-ideas] Importing orphaned bytecode files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com>
	<87r5r4ev9q.fsf@benfinney.id.au>
	<ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com>
Message-ID: <87iqcgemg7.fsf@benfinney.id.au>

Guido van Rossum <guido at python.org> writes:

> On Tue, Dec 8, 2009 at 6:28 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> >   this source file would, if compiled, result in a bytecode file
> >   replacing this one
> >
> > Nowhere in there is there anything about the resulting bytecode files
> > being equivalent. I'm limiting the check only to whether the
> > resulting bytecode file would *replace* the existing bytecode file.
> >
> > This doesn't require knowing anything at all about the contents of
> > the current bytecode file; indeed, my intention was to phrase it so
> > that it's checked before bothering to open the existing bytecode
> > file.
> >
> > Is there a better term for this? I'm not well-versed enough in the
> > Python import internals to know.
>
> If there was a corresponding source file, it would have been found
> first -- and the bytecode file would be used *if* it matches the
> source file (by comparing a timestamp in the bytecode file's header to
> the actual mtime of the source file).

Right, that's what I thought. I was only looking for a way to say "only
use a bytecode file if the corresponding source code file exists", and
then trying to define "corresponding source code file".

It appears that all I'm doing is confusing the issue, probably because
my understanding of the terminology is fuzzy. I hope someone else can
word it better, so the question of "which file, exactly, are we saying
must exist?" is well answered.

> So I'm not sure what there is to do apart from *not* using "lone"
> bytecode files. (The latter was actually added as a feature at some
> point so I betcha it's easy to make it conditional on a flag.)

I hope your instinct is right, and I betcha it is too.

-- 
 \        "Intellectual property is to the 21st century what the slave |
  `\                              trade was to the 16th." -David Mertz |
_o__)                                                                  |
Ben Finney

From eric at trueblade.com  Wed Dec  9 07:18:45 2009
From: eric at trueblade.com (Eric Smith)
Date: Wed, 09 Dec 2009 01:18:45 -0500
Subject: [Python-ideas] Importing orphaned bytecode files
Message-ID: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>

Sorry for top posting. My phone makes me!

You're right: I misread. Sorry about that. 
--
Eric.

"Ben Finney" <ben+python at benfinney.id.au> wrote:

>Eric Smith <eric at trueblade.com> writes:
>
>> Ben Finney wrote:
>> > I suggest:
>> >
>> > * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>> >   interpreter follows the current behaviour. If "False", any bytecode
>> >   file satisfies an import only if it has a corresponding source file
>> >   (where "corresponding" means "this source file would, if compiled,
>> >   result in a bytecode file replacing this one").
>>
>> I agree with this in principle
>
>Thanks.
>
>> but I don't see how you're going to implement it. In order to actually
>> check this condition, aren't you going to have to compile the source
>> code anyway? If so, just skip the bytecode file. Although I guess you
>> could store a hash of the source in the compiled file, or other
>> similar optimizations.
>
>You seem to be seeing something I was careful not to write. The check
>is:
>
>   this source file would, if compiled, result in a bytecode file
>   replacing this one
>
>Nowhere in there is there anything about the resulting bytecode files being
>equivalent. I'm limiting the check only to whether the resulting
>bytecode file would *replace* the existing bytecode file.
>
>This doesn't require knowing anything at all about the contents of the
>current bytecode file; indeed, my intention was to phrase it so that
>it's checked before bothering to open the existing bytecode file.
>
>Is there a better term for this? I'm not well-versed enough in the
>Python import internals to know.
>
>-- 
> \       "Philosophy is questions that may never be answered. Religion |
>  `\              is answers that may never be questioned." -anonymous |
>_o__)                                                                  |
>Ben Finney

From ben+python at benfinney.id.au  Wed Dec  9 07:28:19 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 09 Dec 2009 17:28:19 +1100
Subject: [Python-ideas] [OT] Broken email tools (was: Importing orphaned
	bytecode files)
References: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>
Message-ID: <87ein4ek58.fsf@benfinney.id.au>

Eric Smith <eric at trueblade.com> writes:

> Sorry for top posting. My phone makes me!

No, it really doesn't. If you have a broken tool, please don't inflict
its brokenness on others, especially if you *know* it's broken when
you use it.

-- 
 \        "Nothing so needs reforming as other people's habits." -Mark |
  `\                                       Twain, _Pudd'n'head Wilson_ |
_o__)                                                                  |
Ben Finney

From ncoghlan at gmail.com  Wed Dec  9 11:22:35 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 09 Dec 2009 20:22:35 +1000
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <87iqcgemg7.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>	<87ljhdf8f2.fsf@benfinney.id.au>
	<4B1EE974.2090801@trueblade.com>	<87r5r4ev9q.fsf@benfinney.id.au>	<ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com>
	<87iqcgemg7.fsf@benfinney.id.au>
Message-ID: <4B1F7A6B.9060501@gmail.com>

Ben Finney wrote:
> Right, that's what I thought. I was only looking for a way to say "only
> use a bytecode file if the corresponding source code file exists", and
> then trying to define "corresponding source code file".

As Guido said, the check goes the other way: the interpreter looks for
source files first, and if it doesn't find one, only then does it look
for orphaned bytecode files (pyo/pyc).

The check for a corresponding bytecode file after a source file has
actually been found follows a different path through the import code.

Since the two features are somewhat orthogonal, slicing out the check
for orphaned bytecode files while keeping the check for a cached
bytecode file should be fairly straightforward.
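
The lookup order described above can be sketched roughly as follows. This is
only an illustration, assuming the layout where foo.pyc/foo.pyo sit next to
foo.py; `find_module_file` and its `allow_orphaned_bytecode` parameter are
hypothetical names, not part of the real import code.

```python
import os

def find_module_file(directory, name, allow_orphaned_bytecode=True):
    """Return the file that would satisfy ``import name``, or None.

    Hypothetical helper: source is looked for first, and orphaned
    bytecode is considered only when no source file was found.
    """
    source = os.path.join(directory, name + ".py")
    if os.path.isfile(source):
        # Whether a cached .pyc/.pyo is reused for this source file is a
        # separate decision and is not modelled here.
        return source
    if allow_orphaned_bytecode:
        for ext in (".pyc", ".pyo"):
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None
```

Turning off `allow_orphaned_bytecode` only removes the fallback branch, which
is why the two checks can be separated without touching the source path.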

Fair warning to anyone that implements this - expect to be updating
quite a few parts of the test suite. The runpy, command line, import and
zipimport tests would all need to be updated to make sure they were
respecting the flag (and probably the importlib tests as well, at least
in Py3k).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From p.f.moore at gmail.com  Wed Dec  9 13:40:53 2009
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Dec 2009 12:40:53 +0000
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
Message-ID: <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>

2009/12/9 Brett Cannon <brett at python.org>:
> I prefer the former as well (don't read any bytecode no matter if source is
> available or not); clear and simple semantics that are easy to implement.

If that's the rule, what is the point in writing bytecode at all?
It'll never be read...

Paul.

From jnoller at gmail.com  Wed Dec  9 14:04:01 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Wed, 9 Dec 2009 08:04:01 -0500
Subject: [Python-ideas] [OT] Broken email tools (was: Importing orphaned
	bytecode files)
In-Reply-To: <87ein4ek58.fsf@benfinney.id.au>
References: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>
	<87ein4ek58.fsf@benfinney.id.au>
Message-ID: <4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com>

On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Eric Smith <eric at trueblade.com> writes:
>
>> Sorry for top posting. My phone makes me!
>
> No, it really doesn't. If you have a broken tool, please don't inflict
> its brokenness on others, especially if you *know* it's broken when
> you use it.

Top posting isn't that big of an issue. Drop it, please.

From brett at python.org  Wed Dec  9 19:48:30 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 9 Dec 2009 10:48:30 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com> 
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> 
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com> 
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
Message-ID: <bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>

2009/12/9 Paul Moore <p.f.moore at gmail.com>

> 2009/12/9 Brett Cannon <brett at python.org>:
> > I prefer the former as well (don't read any bytecode no matter if source
> is
> > available or not); clear and simple semantics that are easy to implement.
>
> If that's the rule, what is the point in writing bytecode at all?
> It'll never be read...

This entire discussion is in the context of having a flag you need to set to
turn off bytecode usage; the default behavior is not going to change.

-Brett

From guido at python.org  Wed Dec  9 19:52:20 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Dec 2009 10:52:20 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com> 
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> 
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com> 
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> 
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
Message-ID: <ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>

Could it be as simple as this:

-b don't read bytecode (new flag)
-B don't write bytecode (existing flag)

?

On Wed, Dec 9, 2009 at 10:48 AM, Brett Cannon <brett at python.org> wrote:
>
>
> 2009/12/9 Paul Moore <p.f.moore at gmail.com>
>>
>> 2009/12/9 Brett Cannon <brett at python.org>:
>> > I prefer the former as well (don't read any bytecode no matter if source
>> > is
>> > available or not); clear and simple semantics that are easy to
>> > implement.
>>
>> If that's the rule, what is the point in writing bytecode at all?
>> It'll never be read...
>
> This entire discussion is in the context of having a flag you need to set to
> turn off bytecode usage; the default behavior is not going to change.
> -Brett

-- 
--Guido van Rossum (python.org/~guido)

From brett at python.org  Wed Dec  9 19:56:03 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 9 Dec 2009 10:56:03 -0800
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <4B1F7A6B.9060501@gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> 
	<87r5r4ev9q.fsf@benfinney.id.au>
	<ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com> 
	<87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com>
Message-ID: <bbaeab100912091056u19509f9et11a255fa3fd80c83@mail.gmail.com>

On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Ben Finney wrote:
> > Right, that's what I thought. I was only looking for a way to say "only
> > use a bytecode file if the corresponding source code file exists", and
> > then trying to define "corresponding source code file".
>
> As Guido said, the check goes the other way: the interpreter looks for
> source files first, and if it doesn't find one, only then does it look
> for orphaned bytecode files (pyo/pyc).
>
>
Just a data point: I reversed that order in importlib to match mental
semantics.

> The check for a corresponding bytecode file after a source file has
> actually been found follows a different path through the import code.
>
> Since the two features are somewhat orthogonal, slicing out the check
> for orphaned bytecode files while keeping the check for a cached
> bytecode file should be fairly straightforward.
>
> Fair warning to anyone that implements this - expect to be updating
> quite a few parts of the test suite. The runpy, command line, import and
> zipimport tests would all need to be updated to make sure they were
> respecting the flag (and probably the importlib tests as well, at least
> in Py3k).
>

Yep for importlib, but I already protect bytecode-writing tests with a
decorator for sys.dont_write_bytecode, so tests that rely on reading
bytecode could easily be decorated as well.
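
A minimal sketch of what such decorators might look like, built on
unittest.skipIf. The decorator names are made up for illustration and are not
the ones used in the importlib tests; `sys.dont_read_bytecode` is a
hypothetical attribute that does not exist, so it is read with getattr and
treated as False when absent.

```python
import sys
import unittest

def requires_bytecode_writing(test):
    """Skip the test when the interpreter was told not to write bytecode."""
    return unittest.skipIf(sys.dont_write_bytecode,
                           "bytecode writing is disabled")(test)

def requires_bytecode_reading(test):
    """Skip the test when a (hypothetical) don't-read flag is set."""
    # sys.dont_read_bytecode is an assumption for illustration only.
    disabled = getattr(sys, "dont_read_bytecode", False)
    return unittest.skipIf(disabled, "bytecode reading is disabled")(test)
```

Both conditions are evaluated when the decorator is applied, so the flag has
to be in its final state before the test module is imported.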

-Brett

From brett at python.org  Wed Dec  9 19:57:43 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 9 Dec 2009 10:57:43 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com> 
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local> 
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com> 
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com> 
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com> 
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
Message-ID: <bbaeab100912091057h271db938me453ad9d79d3c49a@mail.gmail.com>

On Wed, Dec 9, 2009 at 10:52, Guido van Rossum <guido at python.org> wrote:

> Could it be as simple as this:
>
> -b don't read bytecode (new flag)
> -B don't write bytecode (existing flag)
>

Unfortunately no: -b is "issue warnings about str(bytes_instance),
str(bytearray_instance) and comparing bytes/bytearray with str. (-bb: issue
errors)" under python3.
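
For anyone who wants to see the existing meaning of -b in action, here is a
small sketch that runs a child interpreter with and without the flag. It
assumes a Python 3 interpreter (3.7 or later for `capture_output`) that still
implements -b/-bb as quoted above; `run_child` is just a helper name chosen
for this example.

```python
import subprocess
import sys

def run_child(flags, code):
    """Run `code` in a child interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", code],
        capture_output=True,
        text=True,
    )

snippet = "str(b'abc')"                # str() on a bytes instance
default = run_child([], snippet)       # no flag
warned = run_child(["-b"], snippet)    # -b: warning on stderr
strict = run_child(["-bb"], snippet)   # -bb: the warning becomes an error

for label, result in (("default", default), ("-b", warned), ("-bb", strict)):
    print(label, result.returncode, result.stderr.strip())
```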

-Brett

>
> ?
>
> On Wed, Dec 9, 2009 at 10:48 AM, Brett Cannon <brett at python.org> wrote:
> >
> >
> > 2009/12/9 Paul Moore <p.f.moore at gmail.com>
> >>
> >> 2009/12/9 Brett Cannon <brett at python.org>:
> >> > I prefer the former as well (don't read any bytecode no matter if
> source
> >> > is
> >> > available or not); clear and simple semantics that are easy to
> >> > implement.
> >>
> >> If that's the rule, what is the point in writing bytecode at all?
> >> It'll never be read...
> >
> > This entire discussion is in the context of having a flag you need to set
> to
> > turn off bytecode usage; the default behavior is not going to change.
> > -Brett
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From jared.grubb at gmail.com  Wed Dec  9 20:07:54 2009
From: jared.grubb at gmail.com (Jared Grubb)
Date: Wed, 9 Dec 2009 11:07:54 -0800
Subject: [Python-ideas] Importing orphaned bytecode files (was:
	disabling .pyc and .pyo files)
In-Reply-To: <87ljhdf8f2.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
Message-ID: <BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>

On 8 Dec 2009, at 13:44, Ben Finney wrote:
> 
> * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>  interpreter follows the current behaviour. If "False", any bytecode
>  file satisfies an import only if it has a corresponding source file
>  (where "corresponding" means "this source file would, if compiled,
>  result in a bytecode file replacing this one").

One problem with a sys flag is that it's a global setting. Suppose a package is distributed with only pyc/pyo files, then the top-level __init__.py might flip the switch such that its sub-files can get imported from the pyc/pyo files. But you wouldn't want that flag to persist beyond that.
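
To illustrate the "flip it, then put it back" pattern, here is a rough sketch
of a context manager that changes a sys attribute only for the duration of a
block. `temporary_sys_flag` is a made-up name, and
`sys.import_orphaned_bytecode` is only the attribute proposed in the quote
above: no interpreter consults it, so setting it has no effect on imports.

```python
import contextlib
import sys

@contextlib.contextmanager
def temporary_sys_flag(name, value):
    """Set sys.<name> to value inside the block, then restore the old state."""
    missing = object()
    previous = getattr(sys, name, missing)
    setattr(sys, name, value)
    try:
        yield
    finally:
        if previous is missing:
            # The attribute did not exist before; remove it again.
            delattr(sys, name)
        else:
            setattr(sys, name, previous)

# Hypothetical usage inside a package's __init__.py:
with temporary_sys_flag("import_orphaned_bytecode", True):
    pass  # the package's sub-module imports would go here
```

Even with this pattern the setting is still process-wide while the block runs,
so imports triggered from other threads in that window would see it too.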

Another idea is to use a new file extension, which isn't the best solution, but allows the creator to explicitly set what behavior they intended for their files:
  * if a foo.py file exists, then use the existing foo.pyc/pyo as is done today
  * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but foo.pyc/pyo is never used, unlike today)
(pyxxx is a placeholder for whatever would be a reasonable name)

Jared

From guido at python.org  Wed Dec  9 20:11:58 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Dec 2009 11:11:58 -0800
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <bbaeab100912091056u19509f9et11a255fa3fd80c83@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com> 
	<87r5r4ev9q.fsf@benfinney.id.au>
	<ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com> 
	<87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com> 
	<bbaeab100912091056u19509f9et11a255fa3fd80c83@mail.gmail.com>
Message-ID: <ca471dc20912091111n345ee8f2i509d6477b141d5aa@mail.gmail.com>

On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon <brett at python.org> wrote:
>
>
> On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> Ben Finney wrote:
>> > Right, that's what I thought. I was only looking for a way to say "only
>> > use a bytecode file if the corresponding source code file exists", and
>> > then trying to define "corresponding source code file".
>>
>> As Guido said, the check goes the other way: the interpreter looks for
>> source files first, and if it doesn't find one, only then does it look
>> for orphaned bytecode files (pyo/pyc).
>>
>
> Just a data point: I reversed that order in importlib to match mental
> semantics.

IIRC zipimport also reverses the order.

>> The check for a corresponding bytecode file after a source file has
>> actually been found follows a different path through the import code.
>>
>> Since the two features are somewhat orthogonal, slicing out the check
>> for orphaned bytecode files while keeping the check for a cached
>> bytecode file should be fairly straightforward.
>>
>> Fair warning to anyone that implements this - expect to be updating
>> quite a few parts of the test suite. The runpy, command line, import and
>> zipimport tests would all need to be updated to make sure they were
>> respecting the flag (and probably the importlib tests as well, at least
>> in Py3k).
>
> Yep for importlib, but I already protect bytecode-writing tests with a
> decorator for sys.dont_write_bytecode, so tests that rely on reading
> bytecode could easily be decorated as well.
> -Brett

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Dec  9 20:27:00 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Dec 2009 11:27:00 -0800
Subject: [Python-ideas] Importing orphaned bytecode files (was:
	disabling .pyc and .pyo files)
In-Reply-To: <BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
Message-ID: <ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>

On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared.grubb at gmail.com> wrote:
>
> On 8 Dec 2009, at 13:44, Ben Finney wrote:
>
> * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>  interpreter follows the current behaviour. If "False", any bytecode
>  file satisfies an import only if it has a corresponding source file
>  (where "corresponding" means "this source file would, if compiled,
>  result in a bytecode file replacing this one").
>
> One problem with a sys flag is that it's a global setting. Suppose a package
> is distributed with only pyc/pyo files, then the top-level __init__.py might
> flip the switch such that its sub-files can get imported from the pyc/pyo
> files. But you wouldn't want that flag to persist beyond that.

I'm not sure that there are any use cases that require using
conflicting values of this setting for different packages.

> Another idea is to use a new file extension, which isn't the best solution,
> but allows the creator to explicitly set what behavior they intended for
> their files:
>   * if a foo.py file exists, then use the existing foo.pyc/pyo as is done
> today
>   * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but
> foo.pyc/pyo is never used, unlike today)
> (pyxxx is a placeholder for whatever would be a reasonable name)

It's a much bigger change, but using a different extension would
probably remove the need for a flag. It would also help with some
tools that hide .pyc/.pyo files from view (e.g. the typical
.svnignore).

-- 
--Guido van Rossum (python.org/~guido)

From john.arbash.meinel at gmail.com  Wed Dec  9 20:34:41 2009
From: john.arbash.meinel at gmail.com (John Arbash Meinel)
Date: Wed, 09 Dec 2009 13:34:41 -0600
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
Message-ID: <4B1FFBD1.4070004@gmail.com>

Guido van Rossum wrote:
> On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared.grubb at gmail.com> wrote:
>> On 8 Dec 2009, at 13:44, Ben Finney wrote:
>>
>> * A new attribute "sys.import_orphaned_bytecode". If set "True", the
>>  interpreter follows the current behaviour. If "False", any bytecode
>>  file satisfies an import only if it has a corresponding source file
>>  (where "corresponding" means "this source file would, if compiled,
>>  result in a bytecode file replacing this one").
>>
>> One problem with a sys flag is that it's a global setting. Suppose a package
>> is distributed with only pyc/pyo files, then the top-level __init__.py might
>> flip the switch such that its sub-files can get imported from the pyc/pyo
>> files. But you wouldn't want that flag to persist beyond that.
> 
> I'm not sure that there are any use cases that require using
> conflicting values of this setting for different packages.
> 

Well, consider developing your own codebase, where you would like to
avoid importing stale .pyc files, while depending on a 3rd-party library
that ships only .pyc files.

Now if the flag was somehow "for all modules under this namespace" that
would easily handle it.

Or just living with "if you want to use private 3rd-party libs, then you
don't get this support for your own development".

(I don't currently do this, but it certainly is *a* use case.)

John
=:->

From ben+python at benfinney.id.au  Wed Dec  9 23:18:21 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 10 Dec 2009 09:18:21 +1100
Subject: [Python-ideas] disabling .pyc and .pyo files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
Message-ID: <87skbjdc5u.fsf@benfinney.id.au>

Guido van Rossum <guido at python.org> writes:

> Could it be as simple as this:
>
> -b don't read bytecode (new flag)
> -B don't write bytecode (existing flag)

Almost, but I think many in this discussion are agitating for "don't
read orphaned bytecode" to become the default.

-- 
 \        "Visitors are expected to complain at the office between the |
  `\                    hours of 9 and 11 a.m. daily." --hotel, Athens |
_o__)                                                                  |
Ben Finney

From brett at python.org  Wed Dec  9 23:43:05 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 9 Dec 2009 14:43:05 -0800
Subject: [Python-ideas] Importing orphaned bytecode files (was:
	disabling .pyc and .pyo files)
In-Reply-To: <ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com> 
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
Message-ID: <bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>

On Wed, Dec 9, 2009 at 11:27, Guido van Rossum <guido at python.org> wrote:

> On Wed, Dec 9, 2009 at 11:07 AM, Jared Grubb <jared.grubb at gmail.com>
> wrote:
> >
> > On 8 Dec 2009, at 13:44, Ben Finney wrote:
> >
> > * A new attribute "sys.import_orphaned_bytecode". If set "True", the
> >  interpreter follows the current behaviour. If "False", any bytecode
> >  file satisfies an import only if it has a corresponding source file
> >  (where "corresponding" means "this source file would, if compiled,
> >  result in a bytecode file replacing this one").
> >
> > One problem with a sys flag is that it's a global setting. Suppose a
> package
> > is distributed with only pyc/pyo files, then the top-level __init__.py
> might
> > flip the switch such that its sub-files can get imported from the pyc/pyo
> > files. But you wouldn't want that flag to persist beyond that.
>
> I'm not sure that there are any use cases that require using
> conflicting values of this setting for different packages.
>
>
Same here. This is straying into optimizations for the sake of optimizing.

> > Another idea is to use a new file extension, which isn't the best
> > solution,
> > but allows the creator to explicitly set what behavior they intended for
> > their files:
> >   * if a foo.py file exists, then use the existing foo.pyc/pyo as is done
> > today
> >   * if a foo.py file does not exist, but a foo.pyxxx exists, use it (but
> > foo.pyc/pyo is never used, unlike today)
> > (pyxxx is a placeholder for whatever would be a reasonable name)
>
> It's a much bigger change, but using a different extension would
> probably remove the need for a flag. It would also help with some
> tools that hide .pyc/.pyo files from view (e.g. the typical
> .svnignore).

>From a Python VM perspective, the problem with this is it doesn't help
improve the situation for other VMs that have no concept of bytecode. If we
make pyc/pyo files purely an optimization for CPython (and other VMs that
choose to support the format) and not a recognized executable format on its
own (like it is now) then that would probably help prevent people from
distributing pyc/pyo files only and thus locking out the use of other VMs.

I know some people seem to think pyc/pyo files are a good way to obfuscate
code, but it honestly isn't, IMO. But these people stand the most to lose
from us even considering changing default behavior.

In a perfect world I would make pyc/pyo files completely optional and only
an optimization that could not work w/o the corresponding source. But in a
backwards-compatible, paranoid world I would make it an opt-in flag to
ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter.

-Brett

From ben+python at benfinney.id.au  Wed Dec  9 23:44:29 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 10 Dec 2009 09:44:29 +1100
Subject: [Python-ideas] Importing orphaned bytecode files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<4B1FFBD1.4070004@gmail.com>
Message-ID: <87ocm7daya.fsf@benfinney.id.au>

John Arbash Meinel
<john.arbash.meinel at gmail.com> writes:

> Or just living with "if you want to use private 3rd-party libs, then
> you don't get this support for your own development".

FWIW, that's the option I would advocate. The default is to develop and
distribute with source; choosing to omit source (or choosing to use such
software) is choosing an inferior option for many other reasons as well,
so I don't see it as a use case that needs explicit support.

-- 
 \        "A learning experience is one of those things that say, 'You |
  `\   know that thing you just did? Don't do that.'" --Douglas Adams, |
_o__)                                                       2000-04-05 |
Ben Finney

From ben+python at benfinney.id.au  Thu Dec 10 00:00:04 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 10 Dec 2009 10:00:04 +1100
Subject: [Python-ideas] [OT] Broken email tools
References: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>
	<87ein4ek58.fsf@benfinney.id.au>
	<4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com>
Message-ID: <87k4wvda8b.fsf@benfinney.id.au>

Jesse Noller <jnoller at gmail.com> writes:

> On Wed, Dec 9, 2009 at 1:28 AM, Ben Finney <ben+python at benfinney.id.au> wrote:
> > Eric Smith <eric at trueblade.com> writes:
> >
> >> Sorry for top posting. My phone makes me!
> >
> > No, it really doesn't. If you have a broken tool, please don't
> > inflict its brokenness on others, especially if you *know* it's
> > broken when you use it.
>
> Top posting isn't that big of an issue. Drop it, please.

No bigger than other problems of poor human-to-human communication. I
agree with Eric that it deserves apology, even if you don't think it's a
big deal.

-- 
 \         "In any great organization it is far, far safer to be wrong |
  `\         with the majority than to be right alone." --John Kenneth |
_o__)                                            Galbraith, 1989-07-28 |
Ben Finney

From debatem1 at gmail.com  Thu Dec 10 00:04:08 2009
From: debatem1 at gmail.com (geremy condra)
Date: Wed, 9 Dec 2009 18:04:08 -0500
Subject: [Python-ideas] Importing orphaned bytecode files (was:
	disabling .pyc and .pyo files)
In-Reply-To: <bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
Message-ID: <f3cc57c60912091504r42f6a24xf5ed101e98434c00@mail.gmail.com>

> In a perfect world I would make pyc/pyo files completely optional and only
> an optimization that could not work w/o the corresponding source. But in a
> backwards-compatible, paranoid world I would make it an opt-in flag to
> ignore lone pyc/pyo files. I am +10 on the former and +1 on the latter.
> -Brett

FWIW, I'm in about the same boat here.

As a somewhat tangential question, is anybody aware of any python3
projects for which requiring source would be an issue?

Geremy Condra

From fetchinson at googlemail.com  Thu Dec 10 00:27:09 2009
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Thu, 10 Dec 2009 00:27:09 +0100
Subject: [Python-ideas] [OT] Broken email tools
In-Reply-To: <87k4wvda8b.fsf@benfinney.id.au>
References: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>
	<87ein4ek58.fsf@benfinney.id.au>
	<4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com>
	<87k4wvda8b.fsf@benfinney.id.au>
Message-ID: <fbe2e2100912091527i2ace6be9n3aee086d2244f84d@mail.gmail.com>

>> >> Sorry for top posting. My phone makes me!
>> >
>> > No, it really doesn't. If you have a broken tool, please don't
>> > inflict its brokenness on others, especially if you *know* it's
>> > broken when you use it.
>>
>> Top posting isn't that big of an issue. Drop it, please.
>
> No bigger than other problems of poor human-to-human communication. I
> agree with Eric that it deserves apology, even if you don't think it's a
> big deal.

Did you actually make a survey of c.l.p users to determine what
fraction finds top posting poor human-to-human communication? My guess
is that it is below 31%. Off the top of my head, only one name comes to
mind who thinks top posting is at least sometimes appropriate: GvR.

Note: you are free to install software that will automatically delete
any post that is top posted and, voilà, you will never be bothered
again. Why not do that?

Cheers,
Daniel

-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From solipsis at pitrou.net  Thu Dec 10 04:48:54 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 10 Dec 2009 03:48:54 +0000 (UTC)
Subject: [Python-ideas] disabling .pyc and .pyo files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
Message-ID: <loom.20091210T044538-938@post.gmane.org>

Ben Finney <ben+python at ...> writes:
> 
> Guido van Rossum <guido <at> python.org> writes:
> 
> > Could it be as simple as this:
> >
> > -b don't read bytecode (new flag)
> > -B don't write bytecode (existing flag)
> 
> Almost, but I think many in this discussion are agitating for "don't
> read orphaned bytecode" to become the default.

Either to become the default (which might require updates to things like
py2exe), or to have a dedicated flag.
On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a
use case. If you don't want to read any bytecode, don't produce/install it in
the first place.
Bytecode is useful, it reduces startup times. It's only annoying when the
original .py file has been deleted and the obsolete .pyc/.pyo is dangling on
disk.
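
Spotting the dangling files is straightforward to sketch. The helper below
assumes the layout where foo.pyc/foo.pyo live in the same directory as
foo.py; `find_orphaned_bytecode` is a name invented for this example.

```python
import os

def find_orphaned_bytecode(root):
    """Yield .pyc/.pyo paths under root that have no sibling .py file."""
    for dirpath, dirnames, filenames in os.walk(root):
        present = set(filenames)
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext in (".pyc", ".pyo") and stem + ".py" not in present:
                yield os.path.join(dirpath, filename)
```

It only compares file names; it says nothing about whether a bytecode file
that does have a source file is up to date.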

cheers

Antoine.

From collinw at gmail.com  Thu Dec 10 05:47:05 2009
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Dec 2009 20:47:05 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <loom.20091210T044538-938@post.gmane.org>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
	<loom.20091210T044538-938@post.gmane.org>
Message-ID: <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>

On Wed, Dec 9, 2009 at 7:48 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Ben Finney <ben+python at ...> writes:
>>
>> Guido van Rossum <guido <at> python.org> writes:
>>
>> > Could it be as simple as this:
>> >
>> > -b don't read bytecode (new flag)
>> > -B don't write bytecode (existing flag)
>>
>> Almost, but I think many in this discussion are agitating for "don't
>> read orphaned bytecode" to become the default.
>
> Either to become the default (which might require updates to things like
> py2exe), or to have a dedicated flag.
> On the other hand, a flag not to read bytecode /at all/ doesn't seem to have a
> use case. If you don't want to read any bytecode, don't produce/install it in
> the first place.

I gave such a use-case earlier in this thread:

"""
It would also be useful when benchmarking multiple iterations of the
same VM. I've considered implementing something like this for Unladen
Swallow so that we could more effectively isolate the running binary
from global state (with a sys.dont_read_bytecode command-line flag
doing for bytecode files what -E does for environment variables).
"""

We currently handle this by deleting all .pyc/.pyo files in our
library tree, but that gets more expensive the more third-party
libraries we bring in for testing, and it's not foolproof.
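
A minimal sketch of the cleanup step described above, assuming a plain
directory tree. The helper name `purge_bytecode` is hypothetical and this is
not Unladen Swallow's actual tooling. It also shows why the approach scales
poorly: every file in the tree is visited on every run.

```python
import os

def purge_bytecode(root):
    """Delete every .pyc/.pyo file under root; return how many were removed."""
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".pyc", ".pyo")):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed
```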

Collin Winter

From solipsis at pitrou.net  Thu Dec 10 05:50:40 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 10 Dec 2009 05:50:40 +0100
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912081513m5ca6e697mb0000bdb5f17afc4@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
	<loom.20091210T044538-938@post.gmane.org>
	<43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>
Message-ID: <1260420640.3371.1.camel@localhost>

> I gave such a use-case earlier in this thread:
> 
> """
> It would also be useful when benchmarking multiple iterations of the
> same VM. I've considered implementing something like this for Unladen
> Swallow so that we could more effectively isolate the running binary
> from global state (with a sys.dont_read_bytecode command-line flag
> doing for bytecode files what -E does for environment variables).
> """

I'm not sure I understand the point. Surely importing modules isn't in
the critical path (or even in the measured path) of your benchmark, is
it?

From collinw at gmail.com  Thu Dec 10 06:00:19 2009
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Dec 2009 21:00:19 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <1260420640.3371.1.camel@localhost>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
	<loom.20091210T044538-938@post.gmane.org>
	<43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>
	<1260420640.3371.1.camel@localhost>
Message-ID: <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com>

On Wed, Dec 9, 2009 at 8:50 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> I gave such a use-case earlier in this thread:
>>
>> """
>> It would also be useful when benchmarking multiple iterations of the
>> same VM. I've considered implementing something like this for Unladen
>> Swallow so that we could more effectively isolate the running binary
>> from global state (with a sys.dont_read_bytecode command-line flag
>> doing for bytecode files what -E does for environment variables).
>> """
>
> I'm not sure I understand the point. Surely importing modules isn't in
> the critical path (or even in the measured path) of your benchmark, is
> it?

When changing the bytecode sequence produced by the CPython compiler,
it would be useful to make sure that a module is being compiled from
scratch (and hence using the new version of the compiler) instead of
reusing older bytecode from a .pyc file. You might say that we should
simply increase the magic number with each iteration, but I've never
found that having to change more code boosts my productivity
(especially in cases where changing the magic number is not necessary
for compatibility purposes). I understand this may be a fringe
use-case, but given the number of optimization projects based on
CPython (of which ours is but one), it may still be worth considering.

Collin

From solipsis at pitrou.net  Thu Dec 10 06:04:31 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 10 Dec 2009 06:04:31 +0100
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<930F189C8A437347B80DF2C156F7EC7F0990675418@exchis.ccp.ad.local>
	<bbaeab100912081645j7e030ee8j8cd22fa2e25e8bfb@mail.gmail.com>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
	<loom.20091210T044538-938@post.gmane.org>
	<43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>
	<1260420640.3371.1.camel@localhost>
	<43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com>
Message-ID: <1260421471.3371.2.camel@localhost>

> When changing the bytecode sequence produced by the CPython compiler,
> it would be useful to make sure that a module is being compiled from
> scratch (and hence using the new version of the compiler) instead of
> reusing older bytecode from a .pyc file. You might say that we should
> simply increase the magic number with each iteration,

Or simply "rm -f `find . -name '*.pyc'`" :-)

From collinw at gmail.com  Thu Dec 10 06:07:47 2009
From: collinw at gmail.com (Collin Winter)
Date: Wed, 9 Dec 2009 21:07:47 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <1260421471.3371.2.camel@localhost>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<79990c6b0912090440w71264a1jb23c9446bb3a5fcc@mail.gmail.com>
	<bbaeab100912091048u40501a95m3d55cb04917da641@mail.gmail.com>
	<ca471dc20912091052r31239960he079b4176aa20075@mail.gmail.com>
	<87skbjdc5u.fsf@benfinney.id.au>
	<loom.20091210T044538-938@post.gmane.org>
	<43aa6ff70912092047h49cfd880l6ff357fc9ec4a019@mail.gmail.com>
	<1260420640.3371.1.camel@localhost>
	<43aa6ff70912092100n5ec8138dhf9cc7787e3a45678@mail.gmail.com>
	<1260421471.3371.2.camel@localhost>
Message-ID: <43aa6ff70912092107n62e9c52ew3ffb8c7ea496f695@mail.gmail.com>

On Wed, Dec 9, 2009 at 9:04 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> When changing the bytecode sequence produced by the CPython compiler,
>> it would be useful to make sure that a module is being compiled from
>> scratch (and hence using the new version of the compiler) instead of
>> reusing older bytecode from a .pyc file. You might say that we should
>> simply increase the magic number with each iteration,
>
> Or simply "rm -f `find . -name '*.pyc'`" :-)

As I said, "We currently handle this by deleting all .pyc/.pyo files
in our library tree, but that gets more expensive the more third-party
libraries we bring in for testing, and it's not foolproof."

I tire of quoting myself.

Collin

From ncoghlan at gmail.com  Thu Dec 10 11:38:06 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Dec 2009 20:38:06 +1000
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <ca471dc20912091111n345ee8f2i509d6477b141d5aa@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au> <4B1EE974.2090801@trueblade.com>
	<87r5r4ev9q.fsf@benfinney.id.au>
	<ca471dc20912081930g2053af8apca07706e3f3a1012@mail.gmail.com>
	<87iqcgemg7.fsf@benfinney.id.au> <4B1F7A6B.9060501@gmail.com>
	<bbaeab100912091056u19509f9et11a255fa3fd80c83@mail.gmail.com>
	<ca471dc20912091111n345ee8f2i509d6477b141d5aa@mail.gmail.com>
Message-ID: <4B20CF8E.7060800@gmail.com>

Guido van Rossum wrote:
> On Wed, Dec 9, 2009 at 10:56 AM, Brett Cannon <brett at python.org> wrote:
>>
>> On Wed, Dec 9, 2009 at 02:22, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> Ben Finney wrote:
>>>> Right, that's what I thought. I was only looking for a way to say “only
>>>> use a bytecode file if the corresponding source code file exists”, and
>>>> then trying to define “corresponding source code file”.
>>> As Guido said, the check goes the other way: the interpreter looks for
>>> source files first, and if it doesn't find one, only then does it look
>>> for orphaned bytecode files (pyo/pyc).
>>>
>> Just a data point: I reversed that order in importlib to match mental
>> semantics.
> 
> IIRC zipimport also reverses the order.

Hmm, not as orthogonal as I thought then :P

I guess it is a credit to the PEP 302 API that I've never needed to care
that zipimport might have the check the other way around :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Thu Dec 10 11:43:04 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Dec 2009 20:43:04 +1000
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
Message-ID: <4B20D0B8.9040203@gmail.com>

Brett Cannon wrote:
> I know some people seem to think pyc/pyo files are a good way to
> obfuscate code, but it honestly isn't, IMO. But these people stand the
> most to lose from us even considering changing default behavior.

People that think it is a good obfuscation trick often don't realise
just how powerful Python's introspection features make the disassembly
process. When decompiled software includes the original variable names
it is a lot easier to follow than the cryptic mass of symbols that is
decompiled machine code.
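
A small sketch of the point above, assuming nothing beyond the standard
`marshal` module. The file name `shapes.py` and the function `area` are made
up for illustration. The body of a bytecode file is a marshalled code object,
and the original argument and local variable names survive the round trip.

```python
import marshal

source = (
    "def area(width, height):\n"
    "    result = width * height\n"
    "    return result\n"
)

# Compile the module and serialise it the way a bytecode file body is stored.
module_code = compile(source, "shapes.py", "exec")
payload = marshal.dumps(module_code)

# Load it back without any access to the source text.
restored = marshal.loads(payload)

# The function's code object is one of the module's constants.
func_code = [c for c in restored.co_consts if hasattr(c, "co_varnames")][0]
recovered_names = func_code.co_varnames
```

Here `recovered_names` holds the argument and local names exactly as written
in the source, which is what makes decompiled Python so readable.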

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Thu Dec 10 11:49:28 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Dec 2009 20:49:28 +1000
Subject: [Python-ideas] [OT] Broken email tools
In-Reply-To: <87k4wvda8b.fsf@benfinney.id.au>
References: <bnoj4w8v5x9nxi2u6lj8echg.1260339525165@email.android.com>	<87ein4ek58.fsf@benfinney.id.au>	<4222a8490912090504u6b633a99k650c4f57200ed06a@mail.gmail.com>
	<87k4wvda8b.fsf@benfinney.id.au>
Message-ID: <4B20D238.9050605@gmail.com>

Ben Finney wrote:
> No bigger than other problems of poor human-to-human communication. I
> agree with Eric that it deserves apology, even if you don't think it's a
> big deal.

I'd prefer what Eric did (making a valid post, but apologising for using
a poor tool to do so) over someone feeling they can't participate in the
list discussion just because they don't have a decent email client handy.

Now, if someone was to make a habit of it, then sure, they should be
encouraged to switch to a better client. But the occasional post while
away from your regular computer? Not a problem.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From greg.ewing at canterbury.ac.nz  Fri Dec 11 00:17:12 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 Dec 2009 12:17:12 +1300
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
Message-ID: <4B218178.2060009@canterbury.ac.nz>

Brett Cannon wrote:

> In a perfect world I would make pyc/pyo files completely optional and 
> only an optimization that could not work w/o the corresponding source.

That wouldn't be a perfect world in every universe. For
example, consider an app installed in an embedded device
with limited memory -- the source is never going to be
seen by anyone, and all it would do is waste resources.

-- 
Greg

From tjreedy at udel.edu  Fri Dec 11 00:25:13 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 10 Dec 2009 18:25:13 -0500
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <4B218178.2060009@canterbury.ac.nz>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>	<87ljhdf8f2.fsf@benfinney.id.au>	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
	<4B218178.2060009@canterbury.ac.nz>
Message-ID: <hfs00o$lro$1@ger.gmane.org>

Greg Ewing wrote:
> Brett Cannon wrote:
> 
>> In a perfect world I would make pyc/pyo files completely optional and 
>> only an optimization that could not work w/o the corresponding source.
> 
> That wouldn't be a perfect world in every universe. For
> example, consider an app installed in an embedded device
> with limited memory -- the source is never going to be
> seen by anyone, and all it would do is waste resources.

In a perfect world, memory would not be limited ;-)

But valid point for this world.

From ben+python at benfinney.id.au  Fri Dec 11 02:03:08 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 11 Dec 2009 12:03:08 +1100
Subject: [Python-ideas] Importing orphaned bytecode files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
	<4B218178.2060009@canterbury.ac.nz>
Message-ID: <87bpi6725v.fsf@benfinney.id.au>

Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

> Brett Cannon wrote:
>
> > In a perfect world I would make pyc/pyo files completely optional
> > and only an optimization that could not work w/o the corresponding
> > source.
>
> That wouldn't be a perfect world in every universe. For example,
> consider an app installed in an embedded device with limited memory --
> the source is never going to be seen by anyone, and all it would do is
> waste resources.

If we're positing a perfect world, then all embedded devices would have
the source code available and inspectable by any interested user.

-- 
 \             “We can't depend for the long run on distinguishing one |
  `\         bitstream from another in order to figure out which rules |
_o__)              apply.” --Eben Moglen, _Anarchism Triumphant_, 1999 |
Ben Finney

From jnoller at gmail.com  Fri Dec 11 02:47:33 2009
From: jnoller at gmail.com (Jesse Noller)
Date: Thu, 10 Dec 2009 20:47:33 -0500
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <87bpi6725v.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
	<4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au>
Message-ID: <4222a8490912101747h201305e2s53522fbb2c9f7ee5@mail.gmail.com>

On Thu, Dec 10, 2009 at 8:03 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> writes:
>
>> Brett Cannon wrote:
>>
>> > In a perfect world I would make pyc/pyo files completely optional
>> > and only an optimization that could not work w/o the corresponding
>> > source.
>>
>> That wouldn't be a perfect world in every universe. For example,
>> consider an app installed in an embedded device with limited memory --
>> the source is never going to be seen by anyone, and all it would do is
>> waste resources.
>
> If we're positing a perfect world, then all embedded devices would have
> the source code available and inspectable by any interested user.

Please. Seriously, can we drop this and stop complaining about top
posting? I'm pretty sure "alt.general.python.chat" is someplace else.
No one cares.

From ben+python at benfinney.id.au  Fri Dec 11 06:16:44 2009
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 11 Dec 2009 16:16:44 +1100
Subject: [Python-ideas] Importing orphaned bytecode files
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
	<4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au>
	<4222a8490912101747h201305e2s53522fbb2c9f7ee5@mail.gmail.com>
Message-ID: <87pr6m5bur.fsf@benfinney.id.au>

Jesse Noller <jnoller at gmail.com> writes:

> Please. Seriously, can we drop this and stop complaining about top
> posting? I'm pretty sure "alt.general.python.chat" is someplace else.
> No one cares.

Er, this discussion isn't related to top posting, and it's hardly
off-topic to discuss importing bytecode files here.

-- 
 \     “I have had a perfectly wonderful evening, but this wasn't it.” |
  `\                                                    --Groucho Marx |
_o__)                                                                  |
Ben Finney

From greg.ewing at canterbury.ac.nz  Fri Dec 11 11:58:25 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 Dec 2009 23:58:25 +1300
Subject: [Python-ideas] Importing orphaned bytecode files
In-Reply-To: <87bpi6725v.fsf@benfinney.id.au>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<87ljhdf8f2.fsf@benfinney.id.au>
	<BCC327B7-94C4-4AE6-A892-4EA97581199A@gmail.com>
	<ca471dc20912091127w31b71f8bua67af052c7793bff@mail.gmail.com>
	<bbaeab100912091443k4b2756d8nd7c6f8477fdd0a1f@mail.gmail.com>
	<4B218178.2060009@canterbury.ac.nz> <87bpi6725v.fsf@benfinney.id.au>
Message-ID: <4B2225D1.5000806@canterbury.ac.nz>

Ben Finney wrote:

> If we're positing a perfect world, then all embedded devices would have
> the source code available and inspectable by any interested user.

The source wouldn't have to be on the actual device
to make that possible, though.

-- 
Greg

From brett at python.org  Fri Dec 11 20:43:29 2009
From: brett at python.org (Brett Cannon)
Date: Fri, 11 Dec 2009 11:43:29 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <bbaeab100912111143n4ac5ddd9lbb4ee18e33424a4c@mail.gmail.com>

I don't know about the rest of you, but I think it's PEP time as the
conversation seems to have run its course. Looks like the popular options
are a flag to not read any bytecode or to only read bytecode if the source
is also available. And then whether the default behavior should change or
not.

2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com>

>  Hello there.
>
> We have a large project involving multiple perforce branches of hundreds of
> .py files each.
>
> Although we employ our own import mechanism for the bulk of these files, we
> do use the regular import mechanism for an essential core of them.
>
>
>
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
> This can happen for a variety of reasons, but most often it occurs when .py
> files are being removed, or moved in the hierarchy.  The problem is that the
> application will happily load and import an orphaned .pyo file, even though
> the .py file has gone or moved.
>
>
>
> I looked at the import code and I found that it is trivial to block the
> reading and writing of .pyo files.  I am about to implement that patch for
> our purposes, thus forcing recompilation of the .py files on each run if so
> specified.   This will ensure that the application will execute only the
> code represented by the checked-out .py files.  But it occurred to me that
> this functionality might be of interest to other people than just us.  I can
> imagine, for example, that buildbots running the python regression testsuite
> might be running into problems with stray .pyo files from time to time.
>
>
>
> Do you think that such a command line option would be useful for Python at
> large?
>
>
>
> Cheers,
>
> Kristján
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>

From rrr at ronadam.com  Sat Dec 12 17:59:47 2009
From: rrr at ronadam.com (Ron Adam)
Date: Sat, 12 Dec 2009 10:59:47 -0600
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <bbaeab100912111143n4ac5ddd9lbb4ee18e33424a4c@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
	<bbaeab100912111143n4ac5ddd9lbb4ee18e33424a4c@mail.gmail.com>
Message-ID: <4B23CC03.5000403@ronadam.com>

Brett Cannon wrote:
> I don't know about the rest of you, but I think it's PEP time as the 
> conversation seems to have run its course. Looks like the popular 
> options are a flag to not read any bytecode or to only read bytecode if 
> the source is also available. And then whether the default behavior 
> should change or not.

A few additional thoughts...

Could the existing -B flag be extended to not read bytecode?
It might be considered a bug if bytecode is read when the -B option is used 
to prevent writing of bytecode.  Is there a use case for forcing the use of 
old bytecode? What was the original intent of the -B flag?

Would adding a flag to force the writing of bytecode do what is needed?  It 
would generate a noisy fail if a source file is moved or missing and renew 
old bytecode files.

These two together would give read_none and write_all bytecode modes, with 
the default remaining the write-as-needed mode.

It may be good to have a utility script in the python tools directory to 
find and/or remove orphaned bytecode.  I'm not sure that just deleting all 
.pyc/.pyo files is always a good idea.
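
A rough sketch of what such a utility might look like. It assumes the layout
in use at the time of this thread, where `foo.pyc` or `foo.pyo` sits next to
`foo.py` in the same directory. The name `find_orphaned_bytecode` is made up
and no such script is claimed to exist in the tools directory.

```python
import os

def find_orphaned_bytecode(root):
    """Return sorted paths of .pyc/.pyo files under root with no sibling .py file."""
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            base, ext = os.path.splitext(name)
            # A bytecode file is orphaned when its source is not beside it.
            if ext in (".pyc", ".pyo") and base + ".py" not in names:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

Reporting the paths rather than deleting them leaves the decision to the
caller, which matters for deployments that ship bytecode only.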

A more off the wall random thought ...

It might be nice in the future to have all bytecode in a single directory 
or package combined into a single byte_cache.pyc (or .pyo) file. I think 
writing all and reading no bytecode files makes good sense in this context.

Ron

> 2009/12/8 Kristján Valur Jónsson <kristjan at ccpgames.com 
> <mailto:kristjan at ccpgames.com>>
> 
>     Hello there.
> 
>     We have a large project involving multiple perforce branches of
>     hundreds of .py files each.
> 
>     Although we employ our own import mechanism for the bulk of these
>     files, we do use the regular import mechanism for an essential core
>     of them.
> 
>      
> 
>     Repeatedly we run into trouble because of stray .pyo (and/or .pyc)
>     files.  This can happen for a variety of reasons, but most often it
>     occurs when .py files are being removed, or moved in the hierarchy. 
>     The problem is that the application will happily load and import an
>     orphaned .pyo file, even though the .py file has gone or moved.
> 
>      
> 
>     I looked at the import code and I found that it is trivial to block
>     the reading and writing of .pyo files.  I am about to implement that
>     patch for our purposes, thus forcing recompilation of the .py files
>     on each run if so specified.   This will ensure that the application
>     will execute only the code represented by the checked-out .py
>     files.  But it occurred to me that this functionality might be of
>     interest to other people than just us.  I can imagine, for example,
>     that buildbots running the python regression testsuite might be
>     running into problems with stray .pyo files from time to time.
> 
>      
> 
>     Do you think that such a command line option would be useful for
>     Python at large?
> 
>      
> 
>     Cheers,
> 
>     Kristján

From cool-rr at cool-rr.com  Tue Dec 15 13:36:46 2009
From: cool-rr at cool-rr.com (cool-RR)
Date: Tue, 15 Dec 2009 14:36:46 +0200
Subject: [Python-ideas] Being able to specify "copy mode" to copy.deepcopy
Message-ID: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>

This is about the `copy.deepcopy` function.

With the __deepcopy__ method, user-defined objects can specify how they will
be copied. But it is assumed that you will always want to copy them the same
way. What if sometimes you want to copy them in one way and sometimes in
another?

I am now being held back by this limitation. I will give some background to
what I'm doing:

I'm developing a simulations framework called GarlicSim. You can see a short
video here:
http://garlicsim.org/brief_introduction.html
The program handles world states in simulated worlds. To generate the next
world state in the timeline, the last world state is deepcopied and then
modified.

Now sometimes in simulations there are big, read-only objects that I don't
want to replicate for each world state. For example, a map of the
environment in which the simulation takes place. So I have defined a class
called `Persistent`, for which I have defined a __deepcopy__ that doesn't
actually copy it, but gives a reference to the original object. So now I can
use `Persistent` as a base class for these big objects that I don't want to
replicate.
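
A minimal sketch of the arrangement described above, not GarlicSim's actual
code. `EnvironmentMap` and `WorldState` are hypothetical names chosen for
illustration, and only `Persistent` is named in this message.

```python
import copy

class Persistent(object):
    """Base class for big read-only objects that deepcopy should share."""
    def __deepcopy__(self, memo):
        # Hand back the original instead of building a copy.
        return self

class EnvironmentMap(Persistent):
    def __init__(self, tiles):
        self.tiles = tiles

class WorldState(object):
    def __init__(self, env_map, positions):
        self.env_map = env_map
        self.positions = positions

old_state = WorldState(EnvironmentMap([[0, 1], [2, 3]]), {"ant": (0, 0)})
new_state = copy.deepcopy(old_state)
new_state.positions["ant"] = (0, 1)  # modify the copy only
```

After the copy, `new_state.env_map` is the very same object as
`old_state.env_map`, while `positions` has been duplicated.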

But in some cases I do want to replicate these objects, and I can't!

So I suggest that it will be possible to specify a "mode" for copying. User
defined objects will be able to specify how they will be deepcopied in each
mode.

What do you think?

Ram.

From python at mrabarnett.plus.com  Tue Dec 15 16:29:30 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 15 Dec 2009 15:29:30 +0000
Subject: [Python-ideas] Being able to specify "copy mode" to
	copy.deepcopy
In-Reply-To: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
Message-ID: <4B27AB5A.6010107@mrabarnett.plus.com>

cool-RR wrote:
> This is about the `copy.deepcopy` function.
> 
> With the __deepcopy__ method, user-defined objects can specify how
> they will be copied. But it is assumed that you will always want to
> copy them the same way. What if sometimes you want to copy them in
> one way and sometimes in another?
> 
> I am now being held back by this limitation. I will give some
> background to what I'm doing:
> 
> I'm developing a simulations framework called GarlicSim. You can see
> a short video here: http://garlicsim.org/brief_introduction.html The
> program handles world states in simulated worlds. To generate the 
> next world state in the timeline, the last world state is deepcopied
> and then modified.
> 
> Now sometimes in simulations there are big, read-only objects that I
>  don't want to replicate for each world state. For example, a map of
> the environment in which the simulation takes place. So I have
> defined a class called `Persistent`, for which I have defined a
> __deepcopy__ that doesn't actually copy it, but gives a reference to
> the original object. So now I can use `Persistent` as a base class for
> these big objects that I don't want to replicate.
> 
> But in some cases I do want to replicate these objects, and I can't!
> 
> So I suggest that it will be possible to specify a "mode" for
> copying. User defined objects will be able to specify how they will
> be deepcopied in each mode.
> 
> What do you think?
> 
My own feeling is that this is a misuse of __deepcopy__: if you ask for
a copy (of a mutable object) then you should get a copy (for immutable
objects copying isn't necessary).

From cool-rr at cool-rr.com  Tue Dec 15 16:35:52 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Tue, 15 Dec 2009 15:35:52 +0000 (UTC)
Subject: [Python-ideas]
	=?utf-8?q?Being_able_to_specify_=22copy_mode=22_to?=
	=?utf-8?q?=09copy=2Edeepcopy?=
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
	<4B27AB5A.6010107@mrabarnett.plus.com>
Message-ID: <loom.20091215T163317-417@post.gmane.org>

MRAB <python at ...> writes:
> cool-RR wrote:
> > What do you think?
> > 
> My own feeling is that this is a misuse of __deepcopy__: if you ask for
> a copy (of a mutable object) then you should get a copy (for immutable
> objects copying isn't necessary).

I agree that the Persistent.__deepcopy__ thing does smell like misuse on my 
part. However I'd be happy to hear any alternative suggestion you have on how
to solve the problem I have.

Meanwhile, I thought of a nice backwards-compatible way to implement what I 
suggest, but I want to know whether this idea makes sense at all to the people 
here.

Ram.

From algorias at gmail.com  Tue Dec 15 17:19:43 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Tue, 15 Dec 2009 13:19:43 -0300
Subject: [Python-ideas] Being able to specify "copy mode" to
	copy.deepcopy
In-Reply-To: <loom.20091215T163317-417@post.gmane.org>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
	<4B27AB5A.6010107@mrabarnett.plus.com>
	<loom.20091215T163317-417@post.gmane.org>
Message-ID: <2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>

2009/12/15 Ram Rachum <cool-rr at cool-rr.com>:
> MRAB <python at ...> writes:
>> cool-RR wrote:
>> > What do you think?
>> >
>> My own feeling is that this is a misuse of __deepcopy__: if you ask for
>> a copy (of a mutable object) then you should get a copy (for immutable
>> objects copying isn't necessary).
>
>
> I agree that the Persistent.__deepcopy__ thing does smell like misuse on my
> part. However I'd be happy to hear any alternative suggestion you have on how
> to solve the problem I have.

Deepcopy is a very simple operation conceptually, there's no need to
make it more complicated. How about implementing __deepcopy__ in your
world state objects? Specify attributes that don't need copying. You
can even use the Persistent class to signal that. Something like this
(untested!):

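from copy import deepcopy

def __deepcopy__(self, memo):
  # deepcopy() always passes its memo dict; forward it to nested copies
  new = self.__class__()
  for k, v in self.__dict__.items():
    setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v, memo))
  return new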

Vitor

From jh at improva.dk  Tue Dec 15 17:17:35 2009
From: jh at improva.dk (Jacob Holm)
Date: Tue, 15 Dec 2009 17:17:35 +0100
Subject: [Python-ideas] Being able to specify "copy mode" to
	copy.deepcopy
In-Reply-To: <loom.20091215T163317-417@post.gmane.org>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>	<4B27AB5A.6010107@mrabarnett.plus.com>
	<loom.20091215T163317-417@post.gmane.org>
Message-ID: <4B27B69F.4030506@improva.dk>

Ram Rachum wrote:
> 
> I agree that the Persistent.__deepcopy__ thing does smell like misuse on my 
> part. However I'd be happy to hear any alternative suggestion you have on how
> to solve the problem I have.
> 
> Meanwhile, I thought of a nice backwards-compatible way to implement what I 
> suggest, but I want to know whether this idea makes sense at all to the people 
> here.
> 

It is already quite easy to abuse the "memo" dict argument of 
copy.deepcopy to pass this kind of flag to the __deepcopy__ methods. 
What else do you need?
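
A minimal sketch of that memo-dict approach, assuming a reserved string key.
copy.deepcopy() itself only stores integer id() keys in the memo, so a string
key should not collide with its bookkeeping. The names COPY_MODE_KEY,
"share_persistent" and this Persistent class are hypothetical stand-ins, not
an existing convention:

```python
import copy

COPY_MODE_KEY = "copy_mode"  # hypothetical reserved key, not a stdlib convention


class Persistent(object):
    """Stand-in for an object that is expensive to duplicate."""

    def __init__(self, payload):
        self.payload = payload

    def __deepcopy__(self, memo):
        # The flag travels in the memo dict that deepcopy hands to every
        # __deepcopy__ method during one copy operation.
        if memo.get(COPY_MODE_KEY) == "share_persistent":
            return self
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new
        new.payload = copy.deepcopy(self.payload, memo)
        return new


state = {"big": Persistent([1, 2, 3]), "step": 7}

full_copy = copy.deepcopy(state)
light_copy = copy.deepcopy(state, {COPY_MODE_KEY: "share_persistent"})
```

Because the flag lives in the memo, it also reaches Persistent objects nested
inside third-party containers, as long as those containers forward the memo.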

  - Jacob

From cool-rr at cool-rr.com  Tue Dec 15 17:59:57 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Tue, 15 Dec 2009 16:59:57 +0000 (UTC)
Subject: [Python-ideas] Being able to specify
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
	<4B27AB5A.6010107@mrabarnett.plus.com>
	<loom.20091215T163317-417@post.gmane.org>
	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>
Message-ID: <loom.20091215T175804-541@post.gmane.org>

Vitor Bosshard <algorias at ...> writes:
> Deepcopy is a very simple operation conceptually, there's no need to
> make it more complicated. How about implementing __deepcopy__ in your
> world state objects? Specify attributes that don't need copying. You
> can even use the Persistent class to signal that. Something like this
> (untested!):
> 
> def __deepcopy__(self):
>   new = self.__class__()
>   for k,v in self.__dict__.iteritems():
>     setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v))
>   return new
> 
> Vitor

And what happens when State refers to another object which refers to a
Persistent?

Ram.

From algorias at gmail.com  Tue Dec 15 18:28:12 2009
From: algorias at gmail.com (Vitor Bosshard)
Date: Tue, 15 Dec 2009 14:28:12 -0300
Subject: [Python-ideas] Being able to specify
In-Reply-To: <loom.20091215T175804-541@post.gmane.org>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
	<4B27AB5A.6010107@mrabarnett.plus.com>
	<loom.20091215T163317-417@post.gmane.org>
	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>
	<loom.20091215T175804-541@post.gmane.org>
Message-ID: <2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com>

2009/12/15 Ram Rachum <cool-rr at cool-rr.com>:
> Vitor Bosshard <algorias at ...> writes:
>> Deepcopy is a very simple operation conceptually, there's no need to
>> make it more complicated. How about implementing __deepcopy__ in your
>> world state objects? Specify attributes that don't need copying. You
>> can even use the Persistent class to signal that. Something like this
>> (untested!):
>>
>> def __deepcopy__(self):
>>   new = self.__class__()
>>   for k,v in self.__dict__.iteritems():
>>     setattr(new, k, v if isinstance(v, Persistent) else deepcopy(v))
>>   return new
>>
>> Vitor
>
>
> And what happens when State refers to another object which refers to a
> Persistent?

Then that object would need to implement the same method, perhaps by
inheriting from a common base. The point is that it can be done in a
straightforward manner without needing to change the stdlib.
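
A rough sketch of the common-base idea, assuming every class between State
and the Persistent object can inherit from it. The class names
SharesPersistent, Inner and State are made up for illustration:

```python
import copy


class Persistent(object):
    """Marker base class: instances are shared between copies."""


class SharesPersistent(object):
    """Hypothetical common base: deep-copy every attribute except Persistent ones."""

    def __deepcopy__(self, memo):
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new  # register early so cycles point at the copy
        for name, value in self.__dict__.items():
            if isinstance(value, Persistent):
                setattr(new, name, value)  # share, do not copy
            else:
                setattr(new, name, copy.deepcopy(value, memo))
        return new


class Inner(SharesPersistent):
    def __init__(self, resource):
        self.resource = resource
        self.log = []


class State(SharesPersistent):
    def __init__(self, inner):
        self.inner = inner


resource = Persistent()
state = State(Inner(resource))
clone = copy.deepcopy(state)
```

Note that a Persistent stored inside a plain list or dict attribute would
still be copied here, because the built-in containers do not know about the
marker class.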

Vitor

From cool-rr at cool-rr.com  Tue Dec 15 18:51:32 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Tue, 15 Dec 2009 17:51:32 +0000 (UTC)
Subject: [Python-ideas] Being able to specify
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
	<4B27AB5A.6010107@mrabarnett.plus.com>
	<loom.20091215T163317-417@post.gmane.org>
	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>
	<loom.20091215T175804-541@post.gmane.org>
	<2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com>
Message-ID: <loom.20091215T184514-287@post.gmane.org>

Vitor Bosshard <algorias at ...> writes:
> > And what happens when State refers to another object which refers to a
> > Persistent?
> 
> Then that object would need to implement the same method, perhaps by
> inheriting from a common base.

And what if the object is from a class defined by a third-party module that I 
can't change?

> The point is that it can be done in a
> straightforward manner without needing to change the stdlib.

I guess so, yes. My method would be something like what Jacob said, abusing
the memo dict to pass the copying mode. But I thought perhaps we can set a 
standard way for specifying different copy modes, because otherwise I'll do my 
memo hack and someone else will do his different memo hack and it won't be 
compatible.

I'll detail my hack later today when I'm back home.

Ram.

From tjreedy at udel.edu  Tue Dec 15 21:59:25 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 15 Dec 2009 15:59:25 -0500
Subject: [Python-ideas] Being able to specify
In-Reply-To: <loom.20091215T184514-287@post.gmane.org>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>	<4B27AB5A.6010107@mrabarnett.plus.com>	<loom.20091215T163317-417@post.gmane.org>	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>	<loom.20091215T175804-541@post.gmane.org>	<2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com>
	<loom.20091215T184514-287@post.gmane.org>
Message-ID: <hg8tb9$l70$1@ger.gmane.org>

On 12/15/2009 12:51 PM, Ram Rachum wrote:
> Vitor Bosshard <algorias at ...> writes:
>>> And what happens when State refers to another object which refers to a
>>> Persistent?
>>
>> Then that object would need to implement the same method, perhaps by
>> inheriting from a common base.
>
> And what if the object is from a class defined by a third-party module that I
> can't change?
>
>> The point is that it can be done in a
>> straightforward manner without needing to change the stdlib.
>
> I guess so, yes. My method would be something like what Jacob said, abusing
> the memo dict to pass the copying mode. But I thought perhaps we can set a
> standard way for specifying different copy modes, because otherwise I'll do my
> memo hack and someone else will do his different memo hack and it won't be
> compatible.

Perhaps you can post a recipe at the Python Cookbook. People who care 
about compatibility can follow the same recipe.

From ncoghlan at gmail.com  Tue Dec 15 22:28:25 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Dec 2009 07:28:25 +1000
Subject: [Python-ideas] Being able to specify
In-Reply-To: <hg8tb9$l70$1@ger.gmane.org>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>	<4B27AB5A.6010107@mrabarnett.plus.com>	<loom.20091215T163317-417@post.gmane.org>	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>	<loom.20091215T175804-541@post.gmane.org>	<2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com>	<loom.20091215T184514-287@post.gmane.org>
	<hg8tb9$l70$1@ger.gmane.org>
Message-ID: <4B27FF79.5050902@gmail.com>

Terry Reedy wrote:
>> I guess so, yes. My method would be something like what Jacob said,
>> abusing
>> the memo dict to pass the copying mode. But I thought perhaps we can
>> set a
>> standard way for specifying different copy modes, because otherwise
>> I'll do my
>> memo hack and someone else will do his different memo hack and it
>> won't be
>> compatible.
> 
> Perhaps you can post a recipe at the Python Cookbook. People who care
> about compatibility can follow the same recipe.

Alternatively, this use case strikes me as being rather similar to the
various flatten() recipes out there that accept a list of "atomic" types
to avoid flattening iterable-but-not-really-a-container types such as
strings.
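
For reference, one possible shape of such a flatten() recipe. This is only a
sketch; the parameter name `atomic` and its default are assumptions rather
than any particular published recipe:

```python
def flatten(items, atomic=(str, bytes)):
    """Yield the leaves of a nested iterable, never descending into `atomic` types."""
    for item in items:
        if isinstance(item, atomic):
            yield item
            continue
        try:
            children = iter(item)
        except TypeError:
            # Not iterable at all, so it is a leaf.
            yield item
        else:
            yield from flatten(children, atomic)


nested = [1, [2, "ab", [3, (4, 5)]]]
flat_default = list(flatten(nested))
flat_keep_tuples = list(flatten(nested, atomic=(str, bytes, tuple)))
```

The copy-mode question has the same flavour: the caller names the types that
should be treated as indivisible for this particular traversal.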

The analogy currently breaks due to copy.deepcopy() being set up with
each __deepcopy__ method doing its own recursion rather than
constructing a graph of mutable (to be copied) and immutable members (to
be referenced) down the chain of the object graph.

More flexible (but significantly harder) than adding a copy mode would
be defining a protocol for exposing the object graph in a standardised
fashion.

__iter__ in conjunction with __dict__ would get you a fair way, but
there would be a lot of complications.
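
To make that concrete, here is a naive sketch of walking an object graph
using only __dict__ and iteration. The function names and the `atomic`
default are invented for this example, and it ignores many of the
complications (iterating consumes generators and files, __slots__ are
missed, and so on):

```python
def iter_children(obj, atomic=(str, bytes, int, float, bool, type(None))):
    """Yield the objects directly referenced by obj (naive approximation)."""
    if isinstance(obj, atomic):
        return
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield value
        return
    if hasattr(obj, "__dict__"):
        yield from vars(obj).values()
    try:
        iterator = iter(obj)
    except TypeError:
        return
    yield from iterator


def walk(root):
    """Return every object reachable from root, visiting each one once."""
    seen = {}  # id -> object; holding the object keeps its id from being reused
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen[id(node)] = node
        stack.extend(iter_children(node))
    return list(seen.values())


class Node(object):
    def __init__(self, name):
        self.name = name
        self.links = []


a, b = Node("a"), Node("b")
a.links.append(b)
b.links.append(a)  # a cycle, handled by the `seen` dict
found = walk(a)
```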

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From greg.ewing at canterbury.ac.nz  Tue Dec 15 23:24:06 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 Dec 2009 11:24:06 +1300
Subject: [Python-ideas] Being able to specify "copy mode" to
	copy.deepcopy
In-Reply-To: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>
Message-ID: <4B280C86.2020304@canterbury.ac.nz>

cool-RR wrote:

> With the __deepcopy__ method, user-defined objects can specify how they 
> will be copied. But it is assumed that you will always want to copy them 
> the same way. What if sometimes you want to copy them in one way and 
> sometimes in another?

Then you need to define your own system of copying methods
and implement them appropriately for the classes they apply
to.

The deepcopy mechanism is only designed to cover simple
cases. It isn't, and can't be, all things to all people.

-- 
Greg

From kevin.watters at gmail.com  Tue Dec 15 23:37:09 2009
From: kevin.watters at gmail.com (Kevin Watters)
Date: Tue, 15 Dec 2009 17:37:09 -0500
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local>
Message-ID: <hg931c$eqh$1@ger.gmane.org>

For what it's worth, I've got an entirely different use case than the 
ones I've seen in this thread so far.

I'd like Python to read .pyo files, but not search for .py or .pyc 
files. This is because we ship a py2exe app in its "exploded" form, 
where there is an .exe and a lib/ folder full of .pyos.  Purely as an 
optimization, it'd be nice to not have Python stat for .py and then .pyc 
for every new import.

I remember glancing at Python/import.c and thinking that this could 
easily be accomplished by allowing the user to customize 
_PyImport_StandardFiletab at runtime--in fact there is already a 
PyImport_AppendInittab; it's just not exposed to Python. With a function 
like imp.set_inittab, I could get what I want with something like

     imp.set_inittab(['.pyo', 'rb', imp.PY_COMPILED])

And then of course to read just .py files, you could do

     imp.set_inittab([".py", "U", imp.PY_SOURCE])

- Kevin

Kristján Valur Jónsson wrote:
> Hello there.
> 
> We have a large project involving multiple perforce branches of hundreds 
> of .py files each.
> 
> Although we employ our own import mechanism for the bulk of these files, 
> we do use the regular import mechanism for an essential core of them.
> 
>  
> 
> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) 
> files.  This can happen for a variety of reasons, but most often it 
> occurs when .py files are being removed, or moved in the hierarchy.  The 
> problem is that the application will happily load and import an orphaned 
> .pyo file, even though the .py file has gone or moved.
> 
>  
> 
> I looked at the import code and I found that it is trivial to block the 
> reading and writing of .pyo files.  I am about to implement that patch 
> for our purposes, thus forcing recompilation of the .py files on each 
> run if so specified.   This will ensure that the application will 
> execute only the code represented by the checked-out .py files.  But it 
> occurred to me that this functionality might be of interest to other 
> people than just us.  I can imagine, for example, that buildbots running 
> the python regression testsuite might be running into problems with 
> stray .pyo files from time to time.
> 
>  
> 
> Do you think that such a command line option would be useful for Python 
> at large?
> 
>  
> 
> Cheers,
> 
> Kristján
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From brett at python.org  Wed Dec 16 20:20:42 2009
From: brett at python.org (Brett Cannon)
Date: Wed, 16 Dec 2009 11:20:42 -0800
Subject: [Python-ideas] disabling .pyc and .pyo files
In-Reply-To: <hg931c$eqh$1@ger.gmane.org>
References: <930F189C8A437347B80DF2C156F7EC7F0990675372@exchis.ccp.ad.local> 
	<hg931c$eqh$1@ger.gmane.org>
Message-ID: <bbaeab100912161120w45533d93tad1430d812726e14@mail.gmail.com>

On Tue, Dec 15, 2009 at 14:37, Kevin Watters <kevin.watters at gmail.com> wrote:

> For what it's worth, I've got an entirely different use case than the ones
> I've seen in this thread so far.
>
> I'd like Python to read .pyo files, but not search for .py or .pyc files.
> This is because we ship a py2exe app in its "exploded" form, where there is
> an .exe and a lib/ folder full of .pyos.  Purely as an optimization, it'd be
> nice to not have Python stat for .py and then .pyc for every new import.
>
> I remember glancing at Python/import.c and thinking that this could easily
> be accomplished by allowing the user to customize _PyImport_StandardFiletab
> at runtime--in fact there is already a PyImport_AppendInittab; it's just
> not exposed to Python. With a function like imp.set_inittab, I could get
> what I want with something like
>
>    imp.set_inittab(['.pyo', 'rb', imp.PY_COMPILED])
>
> And then of course to read just .py files, you could do
>
>    imp.set_inittab([".py", "U", imp.PY_SOURCE])
>
>
The problem with this is I could easily see it leading to tons of people
using custom file extensions which seems to just be asking for trouble.
Restricting that ability to only people who recompile the interpreter has
kept that in check.

As for avoiding the extra stat calls, your best bet is to either compile
your own version of CPython or use a custom importer (I will be giving a
talk on that at PyCon).
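
As an illustration of the custom-importer route, here is a sketch written
against the importlib.machinery API of much later Python 3 releases, which
did not exist when this thread was written. It builds a finder that only
looks for bytecode files in one directory, so no source suffixes are
considered there. The helper names and the "greeting" and "plain" modules
are made up for the demo, and current Python uses .pyc only (.pyo files were
dropped in Python 3.5):

```python
import importlib.machinery as machinery
import importlib.util
import os
import py_compile
import tempfile


def bytecode_only_finder(directory):
    """Return a finder for `directory` that only considers bytecode suffixes."""
    details = (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES)
    return machinery.FileFinder(directory, details)


def load_from(finder, name):
    """Load module `name` through `finder` without touching sys.path."""
    spec = finder.find_spec(name)
    if spec is None:
        raise ImportError(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as workdir:
    # Ship only the compiled file, as a frozen application would.
    source = os.path.join(workdir, "greeting.py")
    with open(source, "w") as f:
        f.write("MESSAGE = 'hello'\n")
    py_compile.compile(source, cfile=os.path.join(workdir, "greeting.pyc"))
    os.remove(source)

    # A module that exists only as source is invisible to this finder.
    with open(os.path.join(workdir, "plain.py"), "w") as f:
        f.write("VALUE = 1\n")

    finder = bytecode_only_finder(workdir)
    module = load_from(finder, "greeting")
    plain_spec = finder.find_spec("plain")
```

The finder is used directly here rather than installed in sys.path_hooks,
since installing it globally would also hide source-only modules elsewhere.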

-Brett

> - Kevin
>
> Kristján Valur Jónsson wrote:
>
>> Hello there.
>>
>> We have a large project involving multiple perforce branches of hundreds
>> of .py files each.
>>
>> Although we employ our own import mechanism for the bulk of these files,
>> we do use the regular import mechanism for an essential core of them.
>>
>>
>> Repeatedly we run into trouble because of stray .pyo (and/or .pyc) files.
>>  This can happen for a variety of reasons, but most often it occurs when .py
>> files are being removed, or moved in the hierarchy.  The problem is that the
>> application will happily load and import an orphaned .pyo file, even though
>> the .py file has gone or moved.
>>
>>
>> I looked at the import code and I found that it is trivial to block the
>> reading and writing of .pyo files.  I am about to implement that patch for
>> our purposes, thus forcing recompilation of the .py files on each run if so
>> specified.   This will ensure that the application will execute only the
>> code represented by the checked-out .py files.  But it occurred to me that
>> this functionality might be of interest to other people than just us.  I can
>> imagine, for example, that buildbots running the python regression testsuite
>> might be running into problems with stray .pyo files from time to time.
>>
>>
>> Do you think that such a command line option would be useful for Python at
>> large?
>>
>>
>> Cheers,
>>
>> Kristján

From cool-rr at cool-rr.com  Sun Dec 20 14:50:41 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sun, 20 Dec 2009 13:50:41 +0000 (UTC)
Subject: [Python-ideas] Being able to specify
References: <a422b72d0912150436u22ec47c8y646902e2a4856b71@mail.gmail.com>	<4B27AB5A.6010107@mrabarnett.plus.com>	<loom.20091215T163317-417@post.gmane.org>	<2987c46d0912150819x53ad678fjd89044b83ad9a4c5@mail.gmail.com>	<loom.20091215T175804-541@post.gmane.org>	<2987c46d0912150928q7cfbcc8dhf899ed820f5c1e45@mail.gmail.com>
	<loom.20091215T184514-287@post.gmane.org>
	<hg8tb9$l70$1@ger.gmane.org>
Message-ID: <loom.20091220T144904-118@post.gmane.org>

Terry Reedy <tjreedy at ...> writes:
> > I guess so, yes. My method would be something like what Jacob said, abusing
> > the memo dict to pass the copying mode. But I thought perhaps we can set a
> > standard way for specifying different copy modes, because otherwise I'll do my
> > memo hack and someone else will do his different memo hack and it won't be
> > compatible.
> 
> Perhaps you can post a recipe at the Python Cookbook. People who care 
> about compatibility can follow the same recipe.

(Just a closing comment about this: I tried this and it was really pretty
simple, just making a dict subclass.)

Ram.

From cool-rr at cool-rr.com  Sun Dec 20 14:54:39 2009
From: cool-rr at cool-rr.com (Ram Rachum)
Date: Sun, 20 Dec 2009 13:54:39 +0000 (UTC)
Subject: [Python-ideas] Adding a `object.__deepcopy__`
Message-ID: <loom.20091220T145050-854@post.gmane.org>

Do you think there should be a `__deepcopy__` method for `object`? This will
solve a little problem I have here:

http://stackoverflow.com/questions/1933621/deepcopy-a-simple-python-object

I also think it's a more elegant solution than the way it works now.

Ram.

From qrczak at knm.org.pl  Sun Dec 20 15:04:25 2009
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sun, 20 Dec 2009 15:04:25 +0100
Subject: [Python-ideas] Adding a `object.__deepcopy__`
In-Reply-To: <loom.20091220T145050-854@post.gmane.org>
References: <loom.20091220T145050-854@post.gmane.org>
Message-ID: <3f4107910912200604i251735b2jb0354f674d515020@mail.gmail.com>

2009/12/20 Ram Rachum <cool-rr at cool-rr.com>:

> Do you think there should be a `__deepcopy__` method for `object`?

No. Deep copy is an ill-defined concept. There is no universal notion
of a deep copy; various applications need different amounts of copying.
Also, various applications need different amounts of sharing and cycle
detection while copying. There is no universal solution that fits
them all.

Use a domain-specific function instead.
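
A small illustration of the point about sharing and cycles. The first half
shows the particular choices copy.deepcopy() makes; copy_rows() is a made-up
example of a domain-specific function that makes different ones:

```python
import copy

# deepcopy preserves sharing inside the copied structure...
shared = [1, 2]
outer = [shared, shared]
dup = copy.deepcopy(outer)

# ...and reproduces reference cycles instead of recursing forever.
loop = []
loop.append(loop)
loop_copy = copy.deepcopy(loop)


def copy_rows(table):
    """Domain-specific copy: new outer list and new rows, but cells stay shared."""
    return [list(row) for row in table]


cell = {"value": 1}
table = [[cell], [cell]]
table_copy = copy_rows(table)
```

Neither behaviour is more correct than the other; which one is wanted
depends on what the application treats as part of the object being copied.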

-- 
Marcin Kowalczyk

From tjreedy at udel.edu  Sun Dec 20 21:19:00 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 20 Dec 2009 15:19:00 -0500
Subject: [Python-ideas] Adding a `object.__deepcopy__`
In-Reply-To: <loom.20091220T145050-854@post.gmane.org>
References: <loom.20091220T145050-854@post.gmane.org>
Message-ID: <hgm0ri$fhf$1@ger.gmane.org>

On 12/20/2009 8:54 AM, Ram Rachum wrote:
> Do you think there should be a `__deepcopy__` method for `object`? This will
> solve a little problem I have here:
>
> http://stackoverflow.com/questions/1933621/deepcopy-a-simple-python-object
>
> I also think it's a more elegant solution than the way it works now.

To me, the concept 'deepcopy' only applies to collections, and object 
instances are not collections, hence object.__deepcopy__ would be 
nonsensical.

Terry Jan Reedy

From greg.ewing at canterbury.ac.nz  Mon Dec 21 01:13:35 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 21 Dec 2009 13:13:35 +1300
Subject: [Python-ideas] Alternative weakref implementation
Message-ID: <4B2EBDAF.5030804@canterbury.ac.nz>

I've been thinking about an alternative implementation
of weak references that would remove the limitation of
only certain types being weakly referenceable.

This limitation can be a nuisance sometimes. The
justification for it appears to be that you can't
build a cycle exclusively out of objects that don't
contain any mutable references, so there is no need
to create weakrefs to such objects.

However, avoiding cycles is not the only reason for
using weakrefs. They can also be useful for setting
up callbacks to be triggered when an object is
deallocated.

The restriction would be unnecessary if weakrefs could
be implemented without incurring any overhead in the
object being weakly referenced. This could be done
by keeping a global dictionary, with keys that are
uncounted references to weakly referenced objects, and
values that are lists of weakref objects.

Whenever an object is deallocated, the dict would be
checked to see if there are any weak references to it,
and if so they would be made dead and their callbacks
called, and then the dict entry would be removed.

Can anyone see any serious flaws in this scheme?

-- 
Greg

From python at mrabarnett.plus.com  Mon Dec 21 02:44:13 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 21 Dec 2009 01:44:13 +0000
Subject: [Python-ideas] Alternative weakref implementation
In-Reply-To: <4B2EBDAF.5030804@canterbury.ac.nz>
References: <4B2EBDAF.5030804@canterbury.ac.nz>
Message-ID: <4B2ED2ED.9090405@mrabarnett.plus.com>

Greg Ewing wrote:
> I've been thinking about an alternative implementation of weak
> references that would remove the limitation of only certain types
> being weakly referenceable.
> 
> This limitation can be a nuisance sometimes. The justification for it
> appears to be that you can't build a cycle exclusively out of objects
> that don't contain any mutable references, so there is no need to
> create weakrefs to such objects.
> 
> However, avoiding cycles is not the only reason for using weakrefs.
> They can also be useful for setting up callbacks to be triggered when
> an object is deallocated.
> 
> The restriction would be unnecessary if weakrefs could be implemented
> without incurring any overhead in the object being weakly referenced.
> This could be done by keeping a global dictionary, with keys that are
>  uncounted references to weakly referenced objects, and values that
> are lists of weakref objects.
> 
> Whenever an object is deallocated, the dict would be checked to see
> if there are any weak references to it, and if so they would be made
> dead and their callbacks called, and then the dict entry would be
> removed.
> 
> Can anyone see any serious flaws in this scheme?
> 
What would be the cost of checking for every deallocation? If most
objects are never weakly referenced, then could the check be made
cheaper by setting a flag in an object if a weak reference is ever made
to it?