Negative times behaviour in itertools.repeat for Python maintenance releases (2.7, 3.3 and maybe 3.4)
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
Dear comrades, I would like to bring to your attention my disagreement with Larry Hastings in this ticket: http://bugs.python.org/issue19145 (Inconsistent behaviour in itertools.repeat when using negative times). Let me give you the context:
Right now, the only way you can tell repeat to do endless repetitions is to omit the `times` argument or by setting `times` argument to -1 via keyword. Larry intends to fix this in Python 3.5 by making None value to `times` argument as a representative of unlimited repetitions and negative `times` argument (either via keyword or positional) ALWAYS means 0 repetitions. This will ensure repeat has the appropriate signature. I have no qualms about it. All is well. My disagreement is related to Larry's decision not to fix this bug in Python 2.7, 3.3, and 3.4. Both of us agree that we should not let Python 2.7, 3.3, and 3.4 happily accepts None value because that is more than bug fix. What we don't agree is whether we should make negative `times` argument via keyword behaviour needs to be changed or not. He prefer let it be. I prefer we change the behaviour so that negative `times` argument in Python 2.7, 3.3, and 3.4 ALWAYS means 0 repetitions. My argument is that, on all circumstances, argument sent to function via positional or keyword must mean the same thing. Let's consider this hypothetical code: # 0 means draw, positive int means win, negative int means lose result = goals_result_of_the_match() # For every goal of the winning match, we donate money to charity. # Every donation consists of 10 $. import itertools itertools.repeat(donate_money_to_charity(), result) Later programmer B refactor this code: # 0 means draw, positive int means win, negative int means lose result = goals_result_of_the_match() # For every goal of the winning match, we donate money to charity. # Every donation consists of 10 $ from itertools import repeat repeat(object=donate_money_to_charity(), times=result) They use Python 2.7 (remember Python 2.7 is here to stay for a long long time). And imagine the match is lost 0-1 or 1-2 or 2-3 (so the goal difference is negative one / -1). It means they donate money to charity endlessly!!! They can go bankrupt. So I hope my argument is convincing enough. We need to fix this bug in Python 2.7, 3.3, and 3.4, by making `times` argument sent via positional or keyword in itertools.repeat ALWAYS means the same thing, which is 0 repetitions. If this is not possible, at the very least, we need to warn this behaviour in the doc. Whatever decision that comes up from this discussion, I will make peace with it. Vajrasky
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 Jan 2014 04:44, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
way. Larry and I were discussing this problem on IRC and the problem is that we can't remove this behaviour without providing equivalent functionality under a different spelling. In 3.5, that will be passing None, rather than -1. For those proposing to change the maintenance releases, how should a user relying on this misbehaviour update their code to handle it? There's also the fact that breaking working code in a maintenance release is always dubious, especially when there's no current supported way to get the equivalent behaviour prior to the maintenance release. This is the kind of change that will require a note in the "porting to Python 3.5" section of the What's New, again suggesting strongly that we can't change it in the maintenance releases. Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 8:52 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It looks like there is more than one bug related to passing negative times as a keyword argument:
itertools.repeat() is documented [1] as being equivalent to: def repeat(object, times=None): # repeat(10, 3) --> 10 10 10 if times is None: while True: yield object else: for i in range(times): yield object The behavior of the CPython implementation is clearly wrong. If there are people relying on it - they already have code that would break in other implementations. (I would say not accepting None for times is a bug as well if you read the docs literally.) When implementation behavior differs from documentation it is a bug by definition and a fix should go in bug-fix releases. Reproducing old behavior is fairly trivial using an old_repeat(object, *args, **kwds) wrapper to distinguish between arguments passed positionally and by keyword. However, I seriously doubt that anyone would need that. [1] http://docs.python.org/3/library/itertools.html#itertools.repeat
data:image/s3,"s3://crabby-images/4c94f/4c94fef82b11b5a49dabd4c0228ddf483e1fc69f" alt=""
On 27/01/2014 01:52, Nick Coghlan wrote:
I'm -1 on using None. The code currently rejects anything except an int. The docs don't say anything about using None, except in the "equivalent to" section, which is also the only place where it looks as if times can be a keyword argument. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 01:47 AM, Mark Lawrence wrote:
The docs describe the signature of itertools.repeat twice: the first time as its heading (" itertools.repeat(object[, times])"), the second time as an example implementation asserted to be equivalent to Python's implementation. These two signatures are not identical, but they are compatible. You state that we should pay attention to the first and ignore the second. How did you arrive at that conclusion? Also, you say something strange like "which is also the only place where it looks as if times can be a keyword argument.". I don't see a point over debating whether or not "times" *looks* like it can be a keyword argument. itertools.repeat() has accepted keyword arguments since 2.7. The code currently has semantics that cannot be accurately represented in a Python signature. We could do one of three things: 1) Do nothing, and don't allow inspect.Signature to produce a signature for the function. This is the status quo. 2) Change the semantics of the function in a non-backwards-compatible way so that we can represent its signature accurately with an inspect.Signature object. For example, "change the function so that providing times=-1 as a keyword argument behaves the same as providing times=-1 as a positional-only argument" is such an incompatible change. Another is "change the function to not accept keyword arguments at all". 3) Change the semantics of the function in a backwards-compatible way so that we can represent its supported signature accurately with an inspect.Signature object. Allow continued use of the old semantics for a full deprecation cycle (two major versions) if not longer. For example, "change the times argument to have a default of None, and change the logic so that times=None means it repeats forever" would be such an approach. For 3.3 and 3.4, I suggest that only 1) makes sense. For 3.5 I prefer 3), specifically the "times=None" approach, as that's how the function has been documented as working since the itertools module was first introduce in 2.3. And I see functions as having accurate signatures as a good thing. I'm against 2), as I'm against removing functionality simply for purity's sakes. Removing functionality breaks code. So it's best reserved for critical problems like security issues. I cite the thread we just had in python-dev, subject line "Deprecation policy", as an excellent discussion and summary of this topic. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 27, 2014 at 04:29:04AM -0800, Larry Hastings wrote:
Are you rejecting the idea that the current behaviour is an out and out buggy, and therefore fixing these things can and should occur in a bug-fix release? As far as I can see, the only piece of evidence that the given behaviour isn't a bug is that the signature says "object [, times]" rather than "object, times=None". That's not conclusive: I've often written signatures using [ ] to indicate optional arguments without specifying the default value in the signature. As it stands now, the documentation is internally contradictory. In one part of the documentation, it gives a clear indication that "times is None" should select the repeat forever behaviour. In another part of the documentation, it fails to mention that None is an acceptable value to select the repeat forever behaviour.
I'm confused... you seem to be saying that you are *against* changing the behaviour of repeat so that: repeat(x, -1) and repeat(x, times=-1) behave the same. Is that actually what you mean, or have I misunderstood? Are there any other functions in the standard library where the behaviour differs depending on whether an argument is given positionally or by keyword? -- Steven
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 04:56 AM, Steven D'Aprano wrote (rearranged slightly so I could make my points in order):
I apologize for not making myself clear. But that's part of what I meant, yes: we should preserve the existing behavior of times=-1 when passed in by position or by keyword. However, we should *also* add a deprecation warning when passing times=-1 by keyword, suggesting that they use times=None instead. The idea is that we could eventually remove the PyTuple_Size check and make times=-1 always behave like times=0. In practice it'd be okay with me if we never did, or at least not until Python 4.
While it's a bug, it's a very minor bug. As Python 3.4 release manager, my position is: Python 3.4 is in beta, so let's not change semantics for purity's sakes now. I'm -0.5 on adding times=None right now, and until we do we can't deprecate the old behavior.
Not that I know of. This instance seems to be purely unintentional; see my latest message on the relevant issue, where I went back and figured out why itertools.repeat behaves like this in the first place: http://bugs.python.org/issue19145#msg209440 Cheers, //arry/
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 06:00 PM, Vajrasky Kok wrote:
I should have been even *more* precise! When I said "times=-1" I really meant all negative numbers. (I was trying to abbreviate it as -1, as my text was already too long and unwieldly.) I propose the logic be equivalent to this, handwaving for clarity boilerplate error handling (the real implementation would handle PyArg_ParseParseTupleAndKeywords or PyLong_ToPy_ssize_t failing): PyObject *element, times = Py_None; Py_ssize_t cnt; PyArg_ParseTupleAndKeywords(args, kwargs, "O|O:repeat", kwargs, &element, ×); if times == Py_None cnt = -1 else cnt = PyLong_ToPy_ssize_t(times) if cnt < 0 if "times" was passed by keyword issue DeprecationWarning, "use repeat(o, times=None) to repeat indefinitely" else cnt = 0 (For those of you who aren't familiar with the source: "cnt" is the internal variable used to set the repeat count of the iterator. If "cnt" is < 0, the iterator repeats forever.) If in the future we actually removed the deprecated behavior, the last "if" block would change simply to if cnt < 0 cnt = 0 //arry/
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 9:13 PM, Larry Hastings <larry@hastings.org> wrote:
I bow to your decision, Larry. So I believe the doc fix is required then. I propose these for doc fix: 1. Keeps the status quo ================= the object\nendlessly.' We don't explain the meaning of negative `times`. Well, people shouldn't repeat with negative times because statement such as, "Kids, repeat the push-up negative two times more.", does not make sense. 2. Explains the negative times, ignores the keyword ===================================== the object\nendlessly. Negative times means zero repetitions.' The signature repeat(object [,times]) suggest this function does not accept keyword as some core developers have stated. So if the user uses keyword with this function, well, it's too bad for them. 3. Explains the negative times, warns about keyword ====================================== the object\nendlessly. Negative times means zero repetitions. This function accepts keyword argument but the behaviour is buggy and should be avoided.' 4. Explains everything ================ the object\nendlessly. Negative times means zero repetitions via positional-only arguments. -1 value for times via keyword means endless repetitions and is same as omitting times argument and other negative number for times means endless repetitions as well but with different implementation.' If you are wondering about the last statement:
Which one is better? Once we settle this, I can think about the doc fix for Doc/library/itertools.rst. Vajrasky
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 06:26 PM, Vajrasky Kok wrote:
So I believe the doc fix is required then.
I propose the docstring should describe only supported behavior, and the docs in the manual should mention the unsupported behavior. However, I'm interested in Raymond's take, as he's the original author of itertools.repeat. If I were writing it, it might well come out like this: docstring: repeat(object [,times]) -> iterator Return an iterator which yields the object for the specified number of times. If times is unspecified, yields the object forever. If times is negative, behave as if times is 0. documentation: repeat(object [,times]) -> iterator Return an iterator which yields the object for the specified number of times. If times is unspecified, yields the object forever. If times is negative, behave as if times is 0. Equivalent to: def repeat(object, times=None): # repeat(10, 3) --> 10 10 10 if times is None: while True: yield object else: for i in range(times): yield object A common use for repeat is to supply a stream of constant values to map or zip: >>> >>> list(map(pow, range(10), repeat(2))) [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] .. note: if "times" is specified using a keyword argument, and provided with a negative value, repeat yields the object forever. This is a bug, its use is unsupported, and this behavior may be removed in a future version of Python. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 27, 2014 at 10:06:57PM -0800, Larry Hastings wrote:
If I were writing it, it might well come out like this: [snip example]
+1 on this wording, with one minor caveat:
How about changing "may be removed" to "will be removed", he asks hopefully? :-) -- Steven
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/28/2014 06:18 AM, Ethan Furman wrote:
See the recent discussion "Deprecation policy" right here in python-dev for a cogent discussion on this issue. I agree with Raymond's view, posted on 1/25: * A good use for deprecations is for features that were flat-out misdesigned and prone to error. For those, there is nothing wrong with deprecating them right away. Once deprecated though, there doesn't need to be a rush to actually remove it -- that just makes it harder for people with currently working code to upgrade to newer versions of Python. * When I became a core developer well over a decade ago, I was a little deprecation happy (old stuff must go, keep everything nice and clean, etc). What I learned though is that deprecations are very hard on users and that the purported benefits usually aren't really important. I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this category, and its advice on how to handle it is sound. Cheers, //arry/
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 01/28/2014 06:50 PM, Larry Hastings wrote:
I also agree with this view.
I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this category, and its advice on how to handle it is sound.
I don't agree with this. This is a bug. Somebody going through (for example) a code review and making minor changes so the code is more readable shouldn't have to be afraid that [inserting | removing] the keyword in the function call is going to *drastically* [1] change the behavior. I understand the need for a cycle of deprecation [2], but not fixing it in 3.5 is folly. -- ~Ethan~ [1] or change the behavior *at all*, for that matter [2] speaking of deprecations, are all the 3.1, 3.2, etc., etc., deprecations being added to 2.7?
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/28/2014 09:18 PM, Ethan Furman wrote:
It's a bug. But it's also a longstanding bug, having been a part of Python since 2.7. Python is the language that cares about backwards-compatibility--bugs and all. If your code runs on version X.Y, it should run without modification on version X.(Y+Z) where Z is a positive integer. Therefore it would be inappropriate to remove the "times=-1 when passed by keyword repeats indefinitely" behavior without at /least/ a full deprecation cycle. Personally I'd prefer to leave the behavior in, undocumented and deprecated, until Python 4.0. //arry/
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 1 February 2014 16:04, Ethan Furman <ethan@stoneleaf.us> wrote:
No, we make a judgment call based on the severity and likelihood of encountering the bug, the consequences of encountering it, and the difficulty of working around it after you *do* encounter it. In this case: * Likelihood: low (using keyword arguments with simple APIs like this is not a common idiom, and you have to specifically pass -1 to trigger misbehaviour) * Consequence: application hang. Minimal chance of silent data corruption, high chance of being caught in testing if it's going to be encountered at all. Effectively equivalent impact arises when passing large integer values. * Security impact: negligible. If you're passing untrusted input to the second argument of repeat() at all, you're already exposed, independent of this bug. * Workaround: don't use keyword arguments or else prevalidate the input (the latter can also handle the large integer problem) * Potential backwards compatibility issue?: Yes, as users could be relying on this as a data driven trigger for infinite repetition and the most convenient currently available alternative spelling is rather awkward (involving *args). * Requires a new feature to fix properly?: Yes, as "None" should be added as a replacement, less error prone, data driven trigger for infinite repetition for deprecating or removing the current behaviour. Assessment: * this is a likely rare, high impact, easy to detect, easy to workaround failure where fixing it involves both adding a new feature and potentially breaking currently working code Conclusion: * add the new, supported feature that provides equivalent functionality (accepting None) in 3.5 * deprecate the problematic behaviour in 3.5 and then start treating it as equivalent to the positional argument handling in 3.6+ It's a complex balance between wanting to fix things that are broken in the interpreter and standard library and not breaking currently working end user code. As general rules of thumb: - for maintenance releases and a new feature release approaching release candidate status, err on the side of "preserving backwards compatibility is paramount" and "no new user visible features allowed" - for a new feature release after feature freeze, "no new user visible features allowed" still applies, but there is still scope to make the case for new deprecations and potentially even arguable backwards compatibility breaks that involve adding a note to the porting guide in the "What's New" document - for a new feature release before feature freeze, new features are allowed, as are new deprecations and notes in the porting guide - security issues that are deemed to be the interpreter's responsibility to resolve can trump all of this and lead to the inclusion of new features in maintenance releases. However, even then the new behaviour is likely to be opt-in in the old releases and opt-out in the next feature release (for example, hash randomisation). The "Porting to Python X.Y" guides can be a useful source of examples of previous changes that have been judged as posing too great a risk of breaking end user code to implement in a maintenance release, even when they're to resolve bugs. This is the one for 3.4: http://docs.python.org/dev/whatsnew/3.4.html#porting-to-python-3-4 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 02/01/2014 04:20 AM, Nick Coghlan wrote: [snip very nice summary] So you agree that the bug should be fixed. So do I. My disagreement with Larry is that he would leave the bug in until Py4k. -- ~Ethan~
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sat, Feb 01, 2014 at 10:20:24PM +1000, Nick Coghlan wrote:
Nice summary Nick, however there is one slight error:
It's not just -1, its any negative value: py> from itertools import repeat py> it = repeat('a', -17) # Correct. py> next(it) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration py> it = repeat('a', times=-17) # Bug, repeats forever. py> next(it) 'a' [...]
Seems reasonable to me. Thanks again for the nice summary. -- Steven
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Jan 31, 2014 at 07:23:52PM -0800, Larry Hastings wrote:
That is a reasonable and responsible position to take.
Personally I'd prefer to leave the behavior in, undocumented and deprecated, until Python 4.0.
It's one thing to avoid removing things which do no harm, but this is a bug which does harm. Anyone who refactors repeat(x, count) to repeat(x, times=count) (or vice versa) is in for a nasty surprise if count ever becomes negative. If anyone out there relies on this bug, they can get the same behaviour easily, and end up with better code: # Before. repeat(x, times=-1) # Magic to make x repeat forever. # After. repeat(x) # Non-magic fully documented way of getting x to repeat forever. Given support for times=None at the same time, then it's easy to support the use-case of "sometimes I want an infinite repeat, sometimes I don't, but I won't know until runtime": # Before. values = (100, -1) # Repeat 100 times, or forever. repeat(x, times=random.choice(values)) # After. values = (100, None) # Repeat 100 times, or forever. repeat(x, times=random.choice(values)) I can see that delaying for a depreciation cycle is the responsible thing to do. I don't think delaying fixing this until some hypothetical distant Python 4000 is a good idea. -- Steven
data:image/s3,"s3://crabby-images/4c94f/4c94fef82b11b5a49dabd4c0228ddf483e1fc69f" alt=""
On 27/01/2014 12:56, Steven D'Aprano wrote:
None is not currently an acceptable value, ValueError is raised if you provide anything other than an int in both Python 2.7 and 3.3. That's why I'm against using it to say "run forever" in Python 3.5. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 12:00 PM, Vajrasky Kok <sky.kok@speaklikeaking.com>wrote:
repeat('a', times=-1) repeat('a')
As I think about it, this may be more than a bug but a door for a denial of service attack. Imagine an application where times comes from an untrusted source. Someone relying on documented behavior may decide to sanitize the value by only checking against an upper bound assuming that negative values would just lead to no repetitions. If an attacker could somehow case times to get the value of -1 this may cause an infinite loop, blow up memory etc. If you think this is far-fetched - consider a web app that uses repeat() as a part of logic to pretty-print user input. The times value may come from a calculation of a difference between the screen width and the length of some string - both under user control. So maybe the fix should go into security bugs only branches as well.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 January 2014 13:51, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
If we do go this path, then we should backport the full fix (i.e. accepting None to indicate repeating forever), rather than just a partial fix. That is, I'm OK with either not backporting anything at all, or backporting the full change. The only idea I object to is the one of removing the infinite iteration capability without providing a replacement spelling for it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 12:20 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
+1
A partial backport will do a disservice to both users and maintainers.
In case we are taking "not backporting anything at all" road, what is the best fix for the document? Old the object\nendlessly.' New the object\nendlessly. If times is specified through positional and negative numbers, it always means 0 repetitions. If times is specified through keyword and -1, it means endless repetition. Other negative numbers for times through keyword should be avoided.'
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok <sky.kok@speaklikeaking.com>wrote:
I would say no fix is needed for this doc because the signature suggests (correctly) that passing times by keyword is not supported. The following behavior further supports this interpretation.
The ReST documentation may benefit from an addition of a warning that behavior of repeat() is "undefined" when times is passed by keyword.
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/26/2014 08:40 PM, Alexander Belopolsky wrote:
Where does it do that? And why would the function support keyword arguments, if it was the author's intent to not support them? It's easier to not support them, you call PyArg_ParseTuple. Calling PyArg_ParseTupleAndKeywords is slightly more involved. //arry/
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 01:39 AM, Antoine Pitrou wrote:
That's not my experience. It's very common--in fact I believe more common--for functions that only accept positional arguments to *not* use the square-brackets-for-optional-parameters convention. The square-brackets-for-optional-parameters convention is not legal Python syntax, so I observe that documentation authors avoid it when they can, preferring to express their function's signature in real Python. As an example, consider "heapq.nlargest(n, iterable, key=None)". The implementation uses PyArg_ParseTuple to parse its parameters, and therefore does not accept keyword arguments. But--no square brackets. My experience is that the doc convention of square-brackets-for-optional-parameters is primarily used in two circumstances: one, when doing something really crazy like optional groups, and two, when the default value of one of the function's parameters is inconvenient to specify as a Python value. Of these two the second is far more common. An example of this latter case is zlib.compressobj(). The documentation shows its last parameter as "[, zdict]". However, the implementation parses uses PyArg_ParseTupleAndKeywords(), and therefore accepts keyword arguments. Furthermore, this notation simply cannot be used for functions that have only required parameters. You can't look at the constructor for "memoryview(object)" and determine whether or not it accepts keyword arguments. (It does.) There seems to be no strong correlation between functions that only accept positional-only parameters and functions whose documentation uses square-brackets-for-optional-parameters. Indeed, this is one of the things that can be frustrating about Python, which is why I hope we can make Python 3.5 more predictable in this area. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 26, 2014 at 11:40:33PM -0500, Alexander Belopolsky wrote:
I would say no fix is needed for this doc because the signature suggests (correctly) that passing times by keyword is not supported.
How do you determine that? Passing times by keyword works in Python 3.3: py> from itertools import repeat py> it = repeat("a", times=5) py> list(it) ['a', 'a', 'a', 'a', 'a'] The docstring signature names both parameters: py> print(repeat.__doc__) repeat(object [,times]) -> create an iterator which returns the object for the specified number of times. If not specified, returns the object endlessly. And both names work fine: py> repeat(object=2, times=5) repeat(2, 5) Judging from the docstring and current behaviour, I think we can conclude that: - keyword arguments are fine; - there shouldn't be any difference between providing a value by position or by keyword; - repeat(x) should yield x forever; - repeat(x, 0) should immediately raise StopIteration; - repeat(x, -n) is not specified by the docstring, but the documentation on the website makes it clear that it should be equivalent to repeat(x, 0) no matter the magnitude of -n; - which matches the behaviour of [x]*-n nicely. As far as I'm concerned, this is a clear case of a bug. Providing times=None (by keyword or by position) ought to be equivalent to not providing times at all, and any negative times ought to be equivalent to zero.
I don't think it does. I think it suggests that something is trying to convert an unsigned value into an int, and failing. I note that repeat is happy to work with negatives times one at a time: py> it = repeat('a', times=-4) py> next(it) 'a' -- Steven
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 12:02 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Is repeat('a') (omitting times argument) not a replacement spelling for it? What about this alternative? Makes -1 consistently mean unlimited repetition and other negative numbers consistently mean zero repetitions then document this behaviour. Just throwing suggestion. I am not so keen to it, though.
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
On 1/26/2014 11:02 PM, Nick Coghlan wrote:
I believe that the replacement spelling would be the existing 'spelling' of no spelling at all -- omitting the argument. However, having to (temporarily) write def myfunc(...what, num, ...): ... if times is None: itertools.repeat(what, num) else: itertools.repeat(what) is obnoxious at best. While I am a strong supported of no new features in bug releases, one of my early commits added a new parameter to difflib.SequenceMatcher. This was after pydev discussion that included Tim Peters and which concluded that there was no other sane way to fix the bug. If merely making both ways of passing times have the same effect is rejected, then I vote for the complete fix. Fixing the default for an existing parameter is less of a new feature than a new parameter. -- Terry Jan Reedy
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Mon, 27 Jan 2014 14:02:29 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I would say not backport at all. The security threat is highly theoretical. If someone blindly accepts user values for repeat(), the user value can just as well be a very large positive with similar effects (e.g. 2**31). Regards Antoine.
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 5:38 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I can not comment about whether this is security issue or not. But the effect of large positive number is not similar to the effect of unlimited repetitions.
That is why I prefer we backport the fix (either partial or full). If not, giving a big warning in the documentation should suffice.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 January 2014 22:29, Antoine Pitrou <solipsis@pitrou.net> wrote:
And for anyone interested in why a sufficiently large positive value that won't fit in available RAM fails gracefully with MemoryError:
list() uses __length_hint__() for preallocation, so a sufficiently large length hint means the preallocation attempt fails with MemoryError. As Antoine showed though, you still can't feed it untrusted data, because a large enough value that just fits into RAM can still cause you a lot of grief. Everything points to "times=-1" behaving as it does being a bug, but not a sufficiently critical one to risk breaking working code in a maintenance release. That makes deprecating the current behaviour of "times=-1" and accepting "times=None" in 3.5 the least controversial course of action. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 8:29 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure, just adjust the number to fit the available memory (here, 2**29 does the trick).
I get your point. But strangely enough, I can still recover from list(repeat('a', 2**29)). It only slows down my computer. I can ^Z the application then kill it later. But with list(repeat('a', times=-1)), rebooting the machine is compulsory.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 Jan 2014 04:44, "Serhiy Storchaka" <storchaka@gmail.com> wrote:
way. Larry and I were discussing this problem on IRC and the problem is that we can't remove this behaviour without providing equivalent functionality under a different spelling. In 3.5, that will be passing None, rather than -1. For those proposing to change the maintenance releases, how should a user relying on this misbehaviour update their code to handle it? There's also the fact that breaking working code in a maintenance release is always dubious, especially when there's no current supported way to get the equivalent behaviour prior to the maintenance release. This is the kind of change that will require a note in the "porting to Python 3.5" section of the What's New, again suggesting strongly that we can't change it in the maintenance releases. Cheers, Nick.
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 8:52 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It looks like there is more than one bug related to passing negative times as a keyword argument:
itertools.repeat() is documented [1] as being equivalent to: def repeat(object, times=None): # repeat(10, 3) --> 10 10 10 if times is None: while True: yield object else: for i in range(times): yield object The behavior of the CPython implementation is clearly wrong. If there are people relying on it - they already have code that would break in other implementations. (I would say not accepting None for times is a bug as well if you read the docs literally.) When implementation behavior differs from documentation it is a bug by definition and a fix should go in bug-fix releases. Reproducing old behavior is fairly trivial using an old_repeat(object, *args, **kwds) wrapper to distinguish between arguments passed positionally and by keyword. However, I seriously doubt that anyone would need that. [1] http://docs.python.org/3/library/itertools.html#itertools.repeat
data:image/s3,"s3://crabby-images/4c94f/4c94fef82b11b5a49dabd4c0228ddf483e1fc69f" alt=""
On 27/01/2014 01:52, Nick Coghlan wrote:
I'm -1 on using None. The code currently rejects anything except an int. The docs don't say anything about using None, except in the "equivalent to" section, which is also the only place where it looks as if times can be a keyword argument. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 01:47 AM, Mark Lawrence wrote:
The docs describe the signature of itertools.repeat twice: the first time as its heading (" itertools.repeat(object[, times])"), the second time as an example implementation asserted to be equivalent to Python's implementation. These two signatures are not identical, but they are compatible. You state that we should pay attention to the first and ignore the second. How did you arrive at that conclusion? Also, you say something strange like "which is also the only place where it looks as if times can be a keyword argument.". I don't see a point over debating whether or not "times" *looks* like it can be a keyword argument. itertools.repeat() has accepted keyword arguments since 2.7. The code currently has semantics that cannot be accurately represented in a Python signature. We could do one of three things: 1) Do nothing, and don't allow inspect.Signature to produce a signature for the function. This is the status quo. 2) Change the semantics of the function in a non-backwards-compatible way so that we can represent its signature accurately with an inspect.Signature object. For example, "change the function so that providing times=-1 as a keyword argument behaves the same as providing times=-1 as a positional-only argument" is such an incompatible change. Another is "change the function to not accept keyword arguments at all". 3) Change the semantics of the function in a backwards-compatible way so that we can represent its supported signature accurately with an inspect.Signature object. Allow continued use of the old semantics for a full deprecation cycle (two major versions) if not longer. For example, "change the times argument to have a default of None, and change the logic so that times=None means it repeats forever" would be such an approach. For 3.3 and 3.4, I suggest that only 1) makes sense. For 3.5 I prefer 3), specifically the "times=None" approach, as that's how the function has been documented as working since the itertools module was first introduce in 2.3. And I see functions as having accurate signatures as a good thing. I'm against 2), as I'm against removing functionality simply for purity's sakes. Removing functionality breaks code. So it's best reserved for critical problems like security issues. I cite the thread we just had in python-dev, subject line "Deprecation policy", as an excellent discussion and summary of this topic. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 27, 2014 at 04:29:04AM -0800, Larry Hastings wrote:
Are you rejecting the idea that the current behaviour is an out and out buggy, and therefore fixing these things can and should occur in a bug-fix release? As far as I can see, the only piece of evidence that the given behaviour isn't a bug is that the signature says "object [, times]" rather than "object, times=None". That's not conclusive: I've often written signatures using [ ] to indicate optional arguments without specifying the default value in the signature. As it stands now, the documentation is internally contradictory. In one part of the documentation, it gives a clear indication that "times is None" should select the repeat forever behaviour. In another part of the documentation, it fails to mention that None is an acceptable value to select the repeat forever behaviour.
I'm confused... you seem to be saying that you are *against* changing the behaviour of repeat so that: repeat(x, -1) and repeat(x, times=-1) behave the same. Is that actually what you mean, or have I misunderstood? Are there any other functions in the standard library where the behaviour differs depending on whether an argument is given positionally or by keyword? -- Steven
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 04:56 AM, Steven D'Aprano wrote (rearranged slightly so I could make my points in order):
I apologize for not making myself clear. But that's part of what I meant, yes: we should preserve the existing behavior of times=-1 when passed in by position or by keyword. However, we should *also* add a deprecation warning when passing times=-1 by keyword, suggesting that they use times=None instead. The idea is that we could eventually remove the PyTuple_Size check and make times=-1 always behave like times=0. In practice it'd be okay with me if we never did, or at least not until Python 4.
While it's a bug, it's a very minor bug. As Python 3.4 release manager, my position is: Python 3.4 is in beta, so let's not change semantics for purity's sakes now. I'm -0.5 on adding times=None right now, and until we do we can't deprecate the old behavior.
Not that I know of. This instance seems to be purely unintentional; see my latest message on the relevant issue, where I went back and figured out why itertools.repeat behaves like this in the first place: http://bugs.python.org/issue19145#msg209440 Cheers, //arry/
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 06:00 PM, Vajrasky Kok wrote:
I should have been even *more* precise! When I said "times=-1" I really meant all negative numbers. (I was trying to abbreviate it as -1, as my text was already too long and unwieldly.) I propose the logic be equivalent to this, handwaving for clarity boilerplate error handling (the real implementation would handle PyArg_ParseParseTupleAndKeywords or PyLong_ToPy_ssize_t failing): PyObject *element, times = Py_None; Py_ssize_t cnt; PyArg_ParseTupleAndKeywords(args, kwargs, "O|O:repeat", kwargs, &element, ×); if times == Py_None cnt = -1 else cnt = PyLong_ToPy_ssize_t(times) if cnt < 0 if "times" was passed by keyword issue DeprecationWarning, "use repeat(o, times=None) to repeat indefinitely" else cnt = 0 (For those of you who aren't familiar with the source: "cnt" is the internal variable used to set the repeat count of the iterator. If "cnt" is < 0, the iterator repeats forever.) If in the future we actually removed the deprecated behavior, the last "if" block would change simply to if cnt < 0 cnt = 0 //arry/
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 9:13 PM, Larry Hastings <larry@hastings.org> wrote:
I bow to your decision, Larry. So I believe the doc fix is required then. I propose these for doc fix: 1. Keeps the status quo ================= the object\nendlessly.' We don't explain the meaning of negative `times`. Well, people shouldn't repeat with negative times because statement such as, "Kids, repeat the push-up negative two times more.", does not make sense. 2. Explains the negative times, ignores the keyword ===================================== the object\nendlessly. Negative times means zero repetitions.' The signature repeat(object [,times]) suggest this function does not accept keyword as some core developers have stated. So if the user uses keyword with this function, well, it's too bad for them. 3. Explains the negative times, warns about keyword ====================================== the object\nendlessly. Negative times means zero repetitions. This function accepts keyword argument but the behaviour is buggy and should be avoided.' 4. Explains everything ================ the object\nendlessly. Negative times means zero repetitions via positional-only arguments. -1 value for times via keyword means endless repetitions and is same as omitting times argument and other negative number for times means endless repetitions as well but with different implementation.' If you are wondering about the last statement:
Which one is better? Once we settle this, I can think about the doc fix for Doc/library/itertools.rst. Vajrasky
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 06:26 PM, Vajrasky Kok wrote:
So I believe the doc fix is required then.
I propose the docstring should describe only supported behavior, and the docs in the manual should mention the unsupported behavior. However, I'm interested in Raymond's take, as he's the original author of itertools.repeat. If I were writing it, it might well come out like this: docstring: repeat(object [,times]) -> iterator Return an iterator which yields the object for the specified number of times. If times is unspecified, yields the object forever. If times is negative, behave as if times is 0. documentation: repeat(object [,times]) -> iterator Return an iterator which yields the object for the specified number of times. If times is unspecified, yields the object forever. If times is negative, behave as if times is 0. Equivalent to: def repeat(object, times=None): # repeat(10, 3) --> 10 10 10 if times is None: while True: yield object else: for i in range(times): yield object A common use for repeat is to supply a stream of constant values to map or zip: >>> >>> list(map(pow, range(10), repeat(2))) [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] .. note: if "times" is specified using a keyword argument, and provided with a negative value, repeat yields the object forever. This is a bug, its use is unsupported, and this behavior may be removed in a future version of Python. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 27, 2014 at 10:06:57PM -0800, Larry Hastings wrote:
If I were writing it, it might well come out like this: [snip example]
+1 on this wording, with one minor caveat:
How about changing "may be removed" to "will be removed", he asks hopefully? :-) -- Steven
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/28/2014 06:18 AM, Ethan Furman wrote:
See the recent discussion "Deprecation policy" right here in python-dev for a cogent discussion on this issue. I agree with Raymond's view, posted on 1/25: * A good use for deprecations is for features that were flat-out misdesigned and prone to error. For those, there is nothing wrong with deprecating them right away. Once deprecated though, there doesn't need to be a rush to actually remove it -- that just makes it harder for people with currently working code to upgrade to newer versions of Python. * When I became a core developer well over a decade ago, I was a little deprecation happy (old stuff must go, keep everything nice and clean, etc). What I learned though is that deprecations are very hard on users and that the purported benefits usually aren't really important. I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this category, and its advice on how to handle it is sound. Cheers, //arry/
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 01/28/2014 06:50 PM, Larry Hastings wrote:
I also agree with this view.
I think the "times behaves differently when passed by name versus passed by position" behavior falls exactly into this category, and its advice on how to handle it is sound.
I don't agree with this. This is a bug. Somebody going through (for example) a code review and making minor changes so the code is more readable shouldn't have to be afraid that [inserting | removing] the keyword in the function call is going to *drastically* [1] change the behavior. I understand the need for a cycle of deprecation [2], but not fixing it in 3.5 is folly. -- ~Ethan~ [1] or change the behavior *at all*, for that matter [2] speaking of deprecations, are all the 3.1, 3.2, etc., etc., deprecations being added to 2.7?
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/28/2014 09:18 PM, Ethan Furman wrote:
It's a bug. But it's also a longstanding bug, having been a part of Python since 2.7. Python is the language that cares about backwards-compatibility--bugs and all. If your code runs on version X.Y, it should run without modification on version X.(Y+Z) where Z is a positive integer. Therefore it would be inappropriate to remove the "times=-1 when passed by keyword repeats indefinitely" behavior without at /least/ a full deprecation cycle. Personally I'd prefer to leave the behavior in, undocumented and deprecated, until Python 4.0. //arry/
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 1 February 2014 16:04, Ethan Furman <ethan@stoneleaf.us> wrote:
No, we make a judgment call based on the severity and likelihood of encountering the bug, the consequences of encountering it, and the difficulty of working around it after you *do* encounter it. In this case: * Likelihood: low (using keyword arguments with simple APIs like this is not a common idiom, and you have to specifically pass -1 to trigger misbehaviour) * Consequence: application hang. Minimal chance of silent data corruption, high chance of being caught in testing if it's going to be encountered at all. Effectively equivalent impact arises when passing large integer values. * Security impact: negligible. If you're passing untrusted input to the second argument of repeat() at all, you're already exposed, independent of this bug. * Workaround: don't use keyword arguments or else prevalidate the input (the latter can also handle the large integer problem) * Potential backwards compatibility issue?: Yes, as users could be relying on this as a data driven trigger for infinite repetition and the most convenient currently available alternative spelling is rather awkward (involving *args). * Requires a new feature to fix properly?: Yes, as "None" should be added as a replacement, less error prone, data driven trigger for infinite repetition for deprecating or removing the current behaviour. Assessment: * this is a likely rare, high impact, easy to detect, easy to workaround failure where fixing it involves both adding a new feature and potentially breaking currently working code Conclusion: * add the new, supported feature that provides equivalent functionality (accepting None) in 3.5 * deprecate the problematic behaviour in 3.5 and then start treating it as equivalent to the positional argument handling in 3.6+ It's a complex balance between wanting to fix things that are broken in the interpreter and standard library and not breaking currently working end user code. As general rules of thumb: - for maintenance releases and a new feature release approaching release candidate status, err on the side of "preserving backwards compatibility is paramount" and "no new user visible features allowed" - for a new feature release after feature freeze, "no new user visible features allowed" still applies, but there is still scope to make the case for new deprecations and potentially even arguable backwards compatibility breaks that involve adding a note to the porting guide in the "What's New" document - for a new feature release before feature freeze, new features are allowed, as are new deprecations and notes in the porting guide - security issues that are deemed to be the interpreter's responsibility to resolve can trump all of this and lead to the inclusion of new features in maintenance releases. However, even then the new behaviour is likely to be opt-in in the old releases and opt-out in the next feature release (for example, hash randomisation). The "Porting to Python X.Y" guides can be a useful source of examples of previous changes that have been judged as posing too great a risk of breaking end user code to implement in a maintenance release, even when they're to resolve bugs. This is the one for 3.4: http://docs.python.org/dev/whatsnew/3.4.html#porting-to-python-3-4 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/dd81a/dd81a0b0c00ff19c165000e617f6182a8ea63313" alt=""
On 02/01/2014 04:20 AM, Nick Coghlan wrote: [snip very nice summary] So you agree that the bug should be fixed. So do I. My disagreement with Larry is that he would leave the bug in until Py4k. -- ~Ethan~
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sat, Feb 01, 2014 at 10:20:24PM +1000, Nick Coghlan wrote:
Nice summary Nick, however there is one slight error:
It's not just -1, its any negative value: py> from itertools import repeat py> it = repeat('a', -17) # Correct. py> next(it) Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration py> it = repeat('a', times=-17) # Bug, repeats forever. py> next(it) 'a' [...]
Seems reasonable to me. Thanks again for the nice summary. -- Steven
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Jan 31, 2014 at 07:23:52PM -0800, Larry Hastings wrote:
That is a reasonable and responsible position to take.
Personally I'd prefer to leave the behavior in, undocumented and deprecated, until Python 4.0.
It's one thing to avoid removing things which do no harm, but this is a bug which does harm. Anyone who refactors repeat(x, count) to repeat(x, times=count) (or vice versa) is in for a nasty surprise if count ever becomes negative. If anyone out there relies on this bug, they can get the same behaviour easily, and end up with better code: # Before. repeat(x, times=-1) # Magic to make x repeat forever. # After. repeat(x) # Non-magic fully documented way of getting x to repeat forever. Given support for times=None at the same time, then it's easy to support the use-case of "sometimes I want an infinite repeat, sometimes I don't, but I won't know until runtime": # Before. values = (100, -1) # Repeat 100 times, or forever. repeat(x, times=random.choice(values)) # After. values = (100, None) # Repeat 100 times, or forever. repeat(x, times=random.choice(values)) I can see that delaying for a depreciation cycle is the responsible thing to do. I don't think delaying fixing this until some hypothetical distant Python 4000 is a good idea. -- Steven
data:image/s3,"s3://crabby-images/4c94f/4c94fef82b11b5a49dabd4c0228ddf483e1fc69f" alt=""
On 27/01/2014 12:56, Steven D'Aprano wrote:
None is not currently an acceptable value, ValueError is raised if you provide anything other than an int in both Python 2.7 and 3.3. That's why I'm against using it to say "run forever" in Python 3.5. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 12:00 PM, Vajrasky Kok <sky.kok@speaklikeaking.com>wrote:
repeat('a', times=-1) repeat('a')
As I think about it, this may be more than a bug but a door for a denial of service attack. Imagine an application where times comes from an untrusted source. Someone relying on documented behavior may decide to sanitize the value by only checking against an upper bound assuming that negative values would just lead to no repetitions. If an attacker could somehow case times to get the value of -1 this may cause an infinite loop, blow up memory etc. If you think this is far-fetched - consider a web app that uses repeat() as a part of logic to pretty-print user input. The times value may come from a calculation of a difference between the screen width and the length of some string - both under user control. So maybe the fix should go into security bugs only branches as well.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 January 2014 13:51, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
If we do go this path, then we should backport the full fix (i.e. accepting None to indicate repeating forever), rather than just a partial fix. That is, I'm OK with either not backporting anything at all, or backporting the full change. The only idea I object to is the one of removing the infinite iteration capability without providing a replacement spelling for it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 12:20 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
+1
A partial backport will do a disservice to both users and maintainers.
In case we are taking "not backporting anything at all" road, what is the best fix for the document? Old the object\nendlessly.' New the object\nendlessly. If times is specified through positional and negative numbers, it always means 0 repetitions. If times is specified through keyword and -1, it means endless repetition. Other negative numbers for times through keyword should be avoided.'
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Sun, Jan 26, 2014 at 11:26 PM, Vajrasky Kok <sky.kok@speaklikeaking.com>wrote:
I would say no fix is needed for this doc because the signature suggests (correctly) that passing times by keyword is not supported. The following behavior further supports this interpretation.
The ReST documentation may benefit from an addition of a warning that behavior of repeat() is "undefined" when times is passed by keyword.
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/26/2014 08:40 PM, Alexander Belopolsky wrote:
Where does it do that? And why would the function support keyword arguments, if it was the author's intent to not support them? It's easier to not support them, you call PyArg_ParseTuple. Calling PyArg_ParseTupleAndKeywords is slightly more involved. //arry/
data:image/s3,"s3://crabby-images/3d5e5/3d5e5dcf0a107ab8d3b7c638a8a9a5ea98ecf5f7" alt=""
On 01/27/2014 01:39 AM, Antoine Pitrou wrote:
That's not my experience. It's very common--in fact I believe more common--for functions that only accept positional arguments to *not* use the square-brackets-for-optional-parameters convention. The square-brackets-for-optional-parameters convention is not legal Python syntax, so I observe that documentation authors avoid it when they can, preferring to express their function's signature in real Python. As an example, consider "heapq.nlargest(n, iterable, key=None)". The implementation uses PyArg_ParseTuple to parse its parameters, and therefore does not accept keyword arguments. But--no square brackets. My experience is that the doc convention of square-brackets-for-optional-parameters is primarily used in two circumstances: one, when doing something really crazy like optional groups, and two, when the default value of one of the function's parameters is inconvenient to specify as a Python value. Of these two the second is far more common. An example of this latter case is zlib.compressobj(). The documentation shows its last parameter as "[, zdict]". However, the implementation parses uses PyArg_ParseTupleAndKeywords(), and therefore accepts keyword arguments. Furthermore, this notation simply cannot be used for functions that have only required parameters. You can't look at the constructor for "memoryview(object)" and determine whether or not it accepts keyword arguments. (It does.) There seems to be no strong correlation between functions that only accept positional-only parameters and functions whose documentation uses square-brackets-for-optional-parameters. Indeed, this is one of the things that can be frustrating about Python, which is why I hope we can make Python 3.5 more predictable in this area. //arry/
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 26, 2014 at 11:40:33PM -0500, Alexander Belopolsky wrote:
I would say no fix is needed for this doc because the signature suggests (correctly) that passing times by keyword is not supported.
How do you determine that? Passing times by keyword works in Python 3.3: py> from itertools import repeat py> it = repeat("a", times=5) py> list(it) ['a', 'a', 'a', 'a', 'a'] The docstring signature names both parameters: py> print(repeat.__doc__) repeat(object [,times]) -> create an iterator which returns the object for the specified number of times. If not specified, returns the object endlessly. And both names work fine: py> repeat(object=2, times=5) repeat(2, 5) Judging from the docstring and current behaviour, I think we can conclude that: - keyword arguments are fine; - there shouldn't be any difference between providing a value by position or by keyword; - repeat(x) should yield x forever; - repeat(x, 0) should immediately raise StopIteration; - repeat(x, -n) is not specified by the docstring, but the documentation on the website makes it clear that it should be equivalent to repeat(x, 0) no matter the magnitude of -n; - which matches the behaviour of [x]*-n nicely. As far as I'm concerned, this is a clear case of a bug. Providing times=None (by keyword or by position) ought to be equivalent to not providing times at all, and any negative times ought to be equivalent to zero.
I don't think it does. I think it suggests that something is trying to convert an unsigned value into an int, and failing. I note that repeat is happy to work with negatives times one at a time: py> it = repeat('a', times=-4) py> next(it) 'a' -- Steven
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 12:02 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Is repeat('a') (omitting times argument) not a replacement spelling for it? What about this alternative? Makes -1 consistently mean unlimited repetition and other negative numbers consistently mean zero repetitions then document this behaviour. Just throwing suggestion. I am not so keen to it, though.
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
On 1/26/2014 11:02 PM, Nick Coghlan wrote:
I believe that the replacement spelling would be the existing 'spelling' of no spelling at all -- omitting the argument. However, having to (temporarily) write def myfunc(...what, num, ...): ... if times is None: itertools.repeat(what, num) else: itertools.repeat(what) is obnoxious at best. While I am a strong supported of no new features in bug releases, one of my early commits added a new parameter to difflib.SequenceMatcher. This was after pydev discussion that included Tim Peters and which concluded that there was no other sane way to fix the bug. If merely making both ways of passing times have the same effect is rejected, then I vote for the complete fix. Fixing the default for an existing parameter is less of a new feature than a new parameter. -- Terry Jan Reedy
data:image/s3,"s3://crabby-images/fef1e/fef1ed960ef8d77a98dd6e2c2701c87878206a2e" alt=""
On Mon, 27 Jan 2014 14:02:29 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I would say not backport at all. The security threat is highly theoretical. If someone blindly accepts user values for repeat(), the user value can just as well be a very large positive with similar effects (e.g. 2**31). Regards Antoine.
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 5:38 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I can not comment about whether this is security issue or not. But the effect of large positive number is not similar to the effect of unlimited repetitions.
That is why I prefer we backport the fix (either partial or full). If not, giving a big warning in the documentation should suffice.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 January 2014 22:29, Antoine Pitrou <solipsis@pitrou.net> wrote:
And for anyone interested in why a sufficiently large positive value that won't fit in available RAM fails gracefully with MemoryError:
list() uses __length_hint__() for preallocation, so a sufficiently large length hint means the preallocation attempt fails with MemoryError. As Antoine showed though, you still can't feed it untrusted data, because a large enough value that just fits into RAM can still cause you a lot of grief. Everything points to "times=-1" behaving as it does being a bug, but not a sufficiently critical one to risk breaking working code in a maintenance release. That makes deprecating the current behaviour of "times=-1" and accepting "times=None" in 3.5 the least controversial course of action. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/d9bef/d9bef61e641057fc7dbad0c73ec1ce9c53831467" alt=""
On Mon, Jan 27, 2014 at 8:29 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure, just adjust the number to fit the available memory (here, 2**29 does the trick).
I get your point. But strangely enough, I can still recover from list(repeat('a', 2**29)). It only slows down my computer. I can ^Z the application then kill it later. But with list(repeat('a', times=-1)), rebooting the machine is compulsory.
participants (12)
-
Alexander Belopolsky
-
Antoine Pitrou
-
Armin Rigo
-
Ethan Furman
-
Georg Brandl
-
Larry Hastings
-
Mark Lawrence
-
Nick Coghlan
-
Serhiy Storchaka
-
Steven D'Aprano
-
Terry Reedy
-
Vajrasky Kok