[Python-Dev] Python-Dev Digest, Vol 102, Issue 35

Mon Jan 16 11:23:51 CET 2012

jbk

python-dev-request at python.org wrote:

>Send Python-Dev mailing list submissions to
>	python-dev at python.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>	http://mail.python.org/mailman/listinfo/python-dev
>or, via email, send a message with subject or body 'help' to
>	python-dev-request at python.org
>
>You can reach the person managing the list at
>	python-dev-owner at python.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Python-Dev digest..."
>
>
>Today's Topics:
>
>   1. Re: Status of the fix for the hash collision vulnerability
>      (Gregory P. Smith)
>   2. Re: Status of the fix for the hash collision vulnerability
>      (Barry Warsaw)
>   3. Re: Sphinx version for Python 2.x docs (Éric Araujo)
>   4. Re: Status of the fix for the hash collision vulnerability
>      (martin at v.loewis.de)
>   5. Re: Status of the fix for the hash collision vulnerability
>      (Guido van Rossum)
>   6. Re: [Python-checkins] cpython: add test, which was missing
>      from d64ac9ab4cd0 (Nick Coghlan)
>   7. Re: Status of the fix for the hash collision vulnerability
>      (Terry Reedy)
>   8. Re: Status of the fix for the hash collision vulnerability
>      (Jack Diederich)
>   9. Re: cpython: Implement PEP 380 - 'yield from' (closes #11682)
>      (Nick Coghlan)
>  10. Re: Status of the fix for the hash collision vulnerability
>      (Nick Coghlan)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Fri, 13 Jan 2012 19:06:00 -0800
>From: "Gregory P. Smith" <greg at krypto.org>
>Cc: python-dev at python.org
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID:
>	<CAGE7PNKkHW-_WqiuQC9bhqxnoU77f+eprs_q3nqmycstM3JZag at mail.gmail.com>
>Content-Type: text/plain; charset="iso-8859-1"
>
>On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at krypto.org> wrote:
>
>>
>> On Fri, Jan 13, 2012 at 5:38 PM, Guido van Rossum <guido at python.org> wrote:
>>
>>> On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>>
>>>> On Thu, 12 Jan 2012 18:57:42 -0800
>>>> Guido van Rossum <guido at python.org> wrote:
>>>> > Hm... I started out as a big fan of the randomized hash, but thinking
>>>> > more about it, I actually believe that the chances of some legitimate
>>>> > app having >1000 collisions are way smaller than the chances that
>>>> > somebody's code will break due to the variable hashing.
>>>>
>>>> Breaking due to variable hashing is deterministic: you notice it as
>>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>>> variable hashing). That seems better than unpredictable breaking when
>>>> some legitimate collision chain happens.
>>>
>>>
>>> Fair enough. But I'm now uncomfortable with turning this on for bugfix
>>> releases. I'm fine with making this the default in 3.3, just not in 3.2,
>>> 3.1 or 2.x -- it will break too much code and organizations will have to
>>> roll back the release or do extensive testing before installing a bugfix
>>> release -- exactly what we *don't* want for those.
>>>
>>> FWIW, I don't believe in the SafeDict solution -- you never know which
>>> dicts you have to change.
>>>
>>>
>> Agreed.
>>
>> Of the three options Victor listed only one is good.
>>
>> I don't like *SafeDict*.  *-1*.  It puts the onus on the coder to
>> always get everything right with regards to data that came from outside the
>> process never ending up hashed in a non-safe dict or set *anywhere*.
>>  "Safe" needs to be the default option for all hash tables.
>>
>> I don't like the "*too many hash collisions*" exception. *-1*. It
>> provides non-deterministic application behavior for data driven
>> applications with no way for them to predict when it'll happen or where and
>> prepare for it. It may work in practice for many applications but is simply
>> odd behavior.
>>
>> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily
>> be back ported to any Python version.
>>
>> It is perfectly okay to break existing users who had anything depending on
>> ordering of internal hash tables. Their code was already broken. We *will*
>> provide a flag and/or environment variable that can be set to turn the
>> feature off at their own peril which they can use in their test harnesses
>> that are stupid enough to use doctests with order dependencies.
>>
>
>What an implementation looks like:
>
> http://pastebin.com/9ydETTag
>
>some stuff to be filled in, but this is all that is really required.  add
>logic to allow a particular seed to be specified or forced to 0 from the
>command line or environment.  add the logic to grab random bytes.  add the
>autoconf glue to disable it.  done.
>
>-gps
>
>
>> This approach worked fine for Perl 9 years ago.
>> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371
>>
>> -gps
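
A rough Python sketch of the random-seed idea described above. This is not the code behind the pastebin link, which is not reproduced here. The multiply-and-xor loop, the constant 1000003, the 64-bit mask and the names `pick_seed` and `seeded_hash` are illustrative assumptions; PYTHONHASHSEED is the variable name already used in this thread, with 0 standing for "no randomization".

```python
import os

MASK = (1 << 64) - 1  # assumption: emulate a 64-bit unsigned hash word


def pick_seed(environ=os.environ):
    """Return a per-process seed: forced by the environment, else random."""
    forced = environ.get("PYTHONHASHSEED")
    if forced is not None and forced != "random":
        return int(forced) & MASK  # 0 gives the old, unseeded behaviour
    return int.from_bytes(os.urandom(8), "little")


def seeded_hash(data: bytes, seed: int) -> int:
    """Toy multiply-and-xor hash with the seed mixed into the start value."""
    if not data:
        return 0
    x = (seed ^ (data[0] << 7)) & MASK
    for byte in data:
        x = ((1000003 * x) ^ byte) & MASK
    return x ^ len(data)
```

With a fixed seed the function is fully deterministic, which is what a test harness would rely on; with a random seed the same bytes hash differently from one process to the next.
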
>>
>
>------------------------------
>
>Message: 2
>Date: Sat, 14 Jan 2012 04:19:38 +0100
>From: Barry Warsaw <barry at python.org>
>To: python-dev at python.org
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID: <20120114041938.098fd14b at rivendell>
>Content-Type: text/plain; charset=US-ASCII
>
>On Jan 13, 2012, at 05:38 PM, Guido van Rossum wrote:
>
>>On Fri, Jan 13, 2012 at 5:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>>> Breaking due to variable hashing is deterministic: you notice it as
>>> soon as you upgrade (and then you use PYTHONHASHSEED to disable
>>> variable hashing). That seems better than unpredictable breaking when
>>> some legitimate collision chain happens.
>>
>>
>>Fair enough. But I'm now uncomfortable with turning this on for bugfix
>>releases. I'm fine with making this the default in 3.3, just not in 3.2,
>>3.1 or 2.x -- it will break too much code and organizations will have to
>>roll back the release or do extensive testing before installing a bugfix
>>release -- exactly what we *don't* want for those.
>
>+1
>
>-Barry
>
>
>------------------------------
>
>Message: 3
>Date: Sat, 14 Jan 2012 04:24:52 +0100
>From: Éric Araujo <merwok at netwok.org>
>To: <python-dev at python.org>
>Subject: Re: [Python-Dev] Sphinx version for Python 2.x docs
>Message-ID: <ff8dc5d4bd1c5d3583c3ff9c18e2445e at netwok.org>
>Content-Type: text/plain; charset=UTF-8; format=flowed
>
>Hi Sandro,
>
>Thanks for getting the ball rolling on this.  One style for markup, one
>Sphinx version to code our extensions against and one location for the
>documenting guidelines will make our work a bit easier.
>
>> During the build process, there are some warnings that I can
>> understand:
>I assume you mean "can't", as you later ask how to fix them.  As a
>general rule, they're only warnings, so they don't break the build,
>only some links or stylings, so I think it's okay to ignore them
>*right now*.
>
>> Doc/glossary.rst:520: WARNING: unknown keyword: nonlocal
>That's a mistake I made in cefe4f38fa0e.  This sentence should be
>removed.
>
>> Doc/library/stdtypes.rst:2372: WARNING: more than one target found
>> for cross-reference u'next':
>Need to use :meth:`.next` to let Sphinx find the right target (more
>info on request :)
>
>> Doc/library/sys.rst:651: WARNING: unknown keyword: None
>Should use ``None``.
>
>> Doc/reference/datamodel.rst:1942: WARNING: unknown keyword: not in
>> Doc/reference/expressions.rst:1184: WARNING: unknown keyword: is not
>I don't know if these should work (i.e. create a link to the
>appropriate language reference section) or abuse the markup (there are
>"not" and "in" keywords, but no "not in" keyword; use ``not in``).
>I'd say ignore them.
>
>Cheers
>
>
>------------------------------
>
>Message: 4
>Date: Sat, 14 Jan 2012 04:45:57 +0100
>From: martin at v.loewis.de
>To: python-dev at python.org
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID:
>	<20120114044557.Horde.MZdrbFNNcXdPEPp1QVb0EaA at webmail.df.eu>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed; DelSp=Yes
>
>> What an implementation looks like:
>>
>>  http://pastebin.com/9ydETTag
>>
>> some stuff to be filled in, but this is all that is really required.
>
>I think this statement (and the patch) is wrong. You also need to change
>the byte string hashing, at least for 2.x. This I consider the biggest
>flaw in that approach - other people may have written string-like objects
>which continue to compare equal to a string but now hash different.
>
>Regards,
>Martin
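
To make the concern concrete: a string-like class that compares equal to a plain string and carries its own copy of the hash algorithm, instead of delegating to the built-in one, stops agreeing with the built-in hash as soon as that hash is seeded. The sketch below is a toy model only. `toy_hash`, `MyString` and `hashes_agree` are made-up names, and the loop is merely in the style of a multiply-and-xor string hash.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit unsigned hash word


def toy_hash(text, seed=0):
    """Stand-in for the interpreter's string hash, optionally seeded."""
    x = seed & MASK
    for ch in text:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    return x ^ len(text)


class MyString:
    """Hypothetical third-party string-like object."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return self.text == getattr(other, "text", other)

    def __hash__(self):
        return toy_hash(self.text)  # hard-coded copy of the *unseeded* hash


def hashes_agree(text, interpreter_seed):
    """True if MyString(text) hashes like the (modelled) built-in string."""
    return MyString(text).__hash__() == toy_hash(text, interpreter_seed)
```

In this model `hashes_agree("spam", 0)` holds, while any non-zero seed breaks the "equal objects have equal hashes" rule, so lookups of a `MyString` key in a dict keyed by plain strings would start to miss.
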
>
>
>
>
>------------------------------
>
>Message: 5
>Date: Fri, 13 Jan 2012 20:00:54 -0800
>From: Guido van Rossum <guido at python.org>
>To: "Gregory P. Smith" <greg at krypto.org>
>Cc: Antoine Pitrou <solipsis at pitrou.net>, python-dev at python.org
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID:
>	<CAP7+vJL+Qrz0oiqbLPCg3QxVqZLjbOeMQpeQykiidiGC2uN9FQ at mail.gmail.com>
>Content-Type: text/plain; charset="iso-8859-1"
>
>On Fri, Jan 13, 2012 at 5:58 PM, Gregory P. Smith <greg at krypto.org> wrote:
>
>> It is perfectly okay to break existing users who had anything depending on
>> ordering of internal hash tables. Their code was already broken. We *will*
>> provide a flag and/or environment variable that can be set to turn the
>> feature off at their own peril which they can use in their test harnesses
>> that are stupid enough to use doctests with order dependencies.
>
>
>No, that is not how we usually take compatibility between bugfix releases.
>"Your code is already broken" is not an argument to break forcefully what
>worked (even if by happenstance) before. The difference between CPython and
>Jython (or between different CPython feature releases) also isn't relevant
>-- historically we have often bent over backwards to avoid changing
>behavior that was technically undefined, if we believed it would affect a
>significant fraction of users.
>
>I don't think anyone doubts that this will break lots of code (at least,
>the arguments I've heard have been "their code is broken", not "nobody does
>that").
>
>> This approach worked fine for Perl 9 years ago.
>> https://rt.perl.org/rt3//Public/Bug/Display.html?id=22371
>>
>
>I don't know what the Perl attitude about breaking undefined behavior
>between micro versions was at the time. But ours is pretty clear -- don't
>do it.
>
>-- 
>--Guido van Rossum (python.org/~guido)
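
For concreteness, the kind of code at stake is anything whose output depends on the iteration order of a hash-based container. A minimal sketch with hypothetical function names; the second variant gives the same result whatever the hash values are:

```python
def render_fragile(tags):
    """Depends on set iteration order, which follows the hash values."""
    return ", ".join(set(tags))


def render_stable(tags):
    """Sorts first, so the result does not depend on hash values."""
    return ", ".join(sorted(set(tags)))
```

A doctest that pins the exact output of something like `render_fragile` can pass for years and then fail once string hashes change, even though the function itself was never modified.
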
>
>------------------------------
>
>Message: 6
>Date: Sat, 14 Jan 2012 15:16:32 +1000
>From: Nick Coghlan <ncoghlan at gmail.com>
>To: python-dev at python.org
>Cc: python-checkins at python.org
>Subject: Re: [Python-Dev] [Python-checkins] cpython: add test,	which
>	was missing from d64ac9ab4cd0
>Message-ID:
>	<CADiSq7fcjLgkrjQEqBhb0oNu9eiLnHhovtoZRDzNSTDvjzx3ZQ at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>On Sat, Jan 14, 2012 at 5:39 AM, benjamin.peterson
><python-checkins at python.org> wrote:
>> http://hg.python.org/cpython/rev/be85914b611c
>> changeset:   74363:be85914b611c
>> parent:      74361:609482c6710e
>> user:        Benjamin Peterson <benjamin at python.org>
>> date:        Fri Jan 13 14:39:38 2012 -0500
>> summary:
>>  add test, which was missing from d64ac9ab4cd0
>
>Ah, that's where that came from, thanks.
>
>I still haven't fully trained myself to use hg import instead of
>patch, which would avoid precisely this kind of error :P
>
>Cheers,
>Nick.
>
>-- 
>Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
>
>------------------------------
>
>Message: 7
>Date: Sat, 14 Jan 2012 00:43:04 -0500
>From: Terry Reedy <tjreedy at udel.edu>
>To: python-dev at python.org
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID: <jer4lp$qe4$1 at dough.gmane.org>
>Content-Type: text/plain; charset=UTF-8; format=flowed
>
>On 1/13/2012 8:58 PM, Gregory P. Smith wrote:
>
>> It is perfectly okay to break existing users who had anything depending
>> on ordering of internal hash tables. Their code was already broken.
>
>Given that the doc says "Return the hash value of the object", I do not 
>think we should be so hard-nosed. The above clearly implies that there 
>is such a thing as *the* Python hash value for an object. And indeed, 
>that has been true across many versions. If we had written "Return a 
>hash value for the object, which can vary from run to run", the case 
>would be different.
>
>-- 
>Terry Jan Reedy
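
Whether there is such a thing as *the* hash value of a string can be checked empirically. The sketch below assumes an interpreter that honours the PYTHONHASHSEED environment variable mentioned earlier in this thread; `hash_in_child` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter with this seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)
```

Two child processes started with the same seed report the same value; with randomization enabled and no fixed seed, the value is only stable within a single run.
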
>
>
>
>------------------------------
>
>Message: 8
>Date: Sat, 14 Jan 2012 01:24:54 -0500
>From: Jack Diederich <jackdied at gmail.com>
>To: Guido van Rossum <guido at python.org>
>Cc: Python Dev <Python-Dev at python.org>
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID:
>	<CACLn2+3Z1EW8Rxox7Zif=20P2SDHxYhv+Wo6dhXKKnO09+-uxQ at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>On Thu, Jan 12, 2012 at 9:57 PM, Guido van Rossum <guido at python.org> wrote:
>> Hm... I started out as a big fan of the randomized hash, but thinking more
>> about it, I actually believe that the chances of some legitimate app having
>> >1000 collisions are way smaller than the chances that somebody's code will
>> break due to the variable hashing.
>
>Python's dicts are designed to avoid hash conflicts by resizing and
>keeping the available slots bountiful.  1000 conflicts sounds like a
>number that couldn't be hit accidentally unless you had a single dict
>using a terabyte of RAM (i.e. if Titus Brown doesn't object, we're
>good).   The hashes also look to exploit cache locality but that is
>very unlikely to get one thousand conflicts by chance.  If you get
>that many there is an attack.
>
>> This is depending on how the counting is done (I didn't look at MAL's
>> patch), and assuming that increasing the hash table size will generally
>> reduce collisions if items collide but their hashes are different.
>
>The patch counts conflicts on an individual insert and not lifetime
>conflicts.  Looks sane to me.
>
>> That said, even with collision counting I'd like a way to disable it without
>> changing the code, e.g. a flag or environment variable.
>
>Agreed.  Paranoid people can turn the behavior off and if it ever were
>to become a problem in practice we could point people to a solution.
>
>-Jack
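
A toy illustration of counting collisions per insert rather than over the lifetime of the table. This is not MAL's patch and not how CPython's dict is implemented: the class and exception names are hypothetical, linear probing is used only for brevity, and the limit of 1000 is the figure quoted in this thread.

```python
class TooManyCollisions(Exception):
    pass


class CountingTable:
    """Toy open-addressing hash table with a per-insert collision limit."""

    def __init__(self, size=8, max_collisions=1000):
        self.slots = [None] * size
        self.used = 0
        self.max_collisions = max_collisions  # None disables the check

    def _probe(self, slots, key, value):
        collisions = 0  # reset for every insert, never accumulated
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            collisions += 1
            if self.max_collisions is not None and collisions > self.max_collisions:
                raise TooManyCollisions(key)
            i = (i + 1) % len(slots)  # linear probing, for simplicity
        fresh = slots[i] is None
        slots[i] = (key, value)
        return fresh

    def insert(self, key, value):
        if (self.used + 1) * 3 > len(self.slots) * 2:  # keep load below 2/3
            bigger = [None] * (len(self.slots) * 2)
            for entry in self.slots:
                if entry is not None:
                    self._probe(bigger, *entry)
            self.slots = bigger
        if self._probe(self.slots, key, value):
            self.used += 1
```

Because the table grows before it fills up, ordinary keys stay far below the limit; only keys that share a hash value pile up in one probe chain and trip the exception.
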
>
>
>------------------------------
>
>Message: 9
>Date: Sat, 14 Jan 2012 16:53:39 +1000
>From: Nick Coghlan <ncoghlan at gmail.com>
>To: Georg Brandl <g.brandl at gmx.net>
>Cc: python-dev at python.org
>Subject: Re: [Python-Dev] cpython: Implement PEP 380 - 'yield from'
>	(closes	#11682)
>Message-ID:
>	<CADiSq7dA6P8U3_MiweM9=s-q49+y0KndeQX=ZNGWog-dZ-hzMA at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>On Sat, Jan 14, 2012 at 1:17 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> On 01/13/2012 12:43 PM, nick.coghlan wrote:
>>> diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
>>
>> There should probably be a "versionadded" somewhere on this page.
>
>Good catch, I added versionchanged notes to this page, simple_stmts
>and the StopIteration entry in the library reference.
>
>>>  PEP 3155: Qualified name for classes and functions
>>>  ==================================================
>>
>> This looks like a spurious (and syntax-breaking) change.
>
>Yeah, it was an error I introduced last time I merged from default. Fixed.
>
>>> diff --git a/Grammar/Grammar b/Grammar/Grammar
>>> -argument: test [comp_for] | test '=' test  # Really [keyword '='] test
>>> +argument: (test) [comp_for] | test '=' test  # Really [keyword '='] test
>>
>> This looks like a change without effect?
>
>Fixed.
>
>It was a lingering after-effect of Greg's original patch (which also
>modified the function call syntax to allow "yield from" expressions
>with extra parens). I reverted the change to the function call syntax,
>but forgot to ditch the added parens while doing so.
>
>>> diff --git a/Include/genobject.h b/Include/genobject.h
>>>
>>> -     /* List of weak reference. */
>>> -     PyObject *gi_weakreflist;
>>> +        /* List of weak reference. */
>>> +        PyObject *gi_weakreflist;
>>>  } PyGenObject;
>>
>> While these change tabs into spaces, it should be 4 spaces, not 8.
>
>Fixed.
>
>>> +PyAPI_FUNC(int) PyGen_FetchStopIterationValue(PyObject **);
>>
>> Does this API need to be public? If yes, it needs to be documented.
>
>Hmm, good point - that one needs a bit of thought, so I've put it on
>the tracker: http://bugs.python.org/issue13783
>
>(that issue also covers your comments regarding the docstring for this
>function and whether or not we even need the StopIteration instance
>creation API)
>
>>> -#define CALL_FUNCTION        131     /* #args + (#kwargs<<8) */
>>> -#define MAKE_FUNCTION        132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> -#define BUILD_SLICE  133     /* Number of items */
>>> +#define CALL_FUNCTION   131     /* #args + (#kwargs<<8) */
>>> +#define MAKE_FUNCTION   132     /* #defaults + #kwdefaults<<8 + #annotations<<16 */
>>> +#define BUILD_SLICE     133     /* Number of items */
>>
>> Not sure putting these and all the other cosmetic changes into an already
>> big patch is such a good idea...
>
>I agree, but it's one of the challenges of a long-lived branch like
>the PEP 380 one (I believe some of these cosmetic changes started life
>in Greg's original patch and separating them out would have been quite
>a pain). Anyone that wants to see the gory details of the branch
>history can take a look at my bitbucket repo:
>
>https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/tip/branch%28%22pep380%22%29
>
>>> diff --git a/Objects/abstract.c b/Objects/abstract.c
>>> --- a/Objects/abstract.c
>>> +++ b/Objects/abstract.c
>>> @@ -2267,7 +2267,6 @@
>>>
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>>
>>> @@ -2311,7 +2310,6 @@
>>>
>>>      func = PyObject_GetAttrString(o, name);
>>>      if (func == NULL) {
>>> -        PyErr_SetString(PyExc_AttributeError, name);
>>>          return 0;
>>>      }
>>>      va_start(va, format);
>>
>> These two changes also look suspiciously unrelated?
>
>IIRC, I removed those lines while working on the patch because the
>message they produce (just the attribute name) is worse than the one
>produced by the call to PyObject_GetAttrString (which also includes
>the type of the object being accessed). Leaving the original
>exceptions alone helped me track down some failures I was getting at
>the time.
>
>I've now made the various CallMethod helper APIs in abstract.c (1
>public, 3 private) consistently leave the GetAttr exception alone and
>added an explicit C API note to NEWS.
>
>(Vaguely related tangent: the new code added by the patch probably has
>a few parts that could benefit from the new GetAttrId private API)
>
>>> diff --git a/Objects/genobject.c b/Objects/genobject.c
>>> +        } else {
>>> +            PyObject *e = PyStopIteration_Create(result);
>>> +            if (e != NULL) {
>>> +                PyErr_SetObject(PyExc_StopIteration, e);
>>> +                Py_DECREF(e);
>>> +            }
>>
>> Wouldn't PyErr_SetObject(PyExc_StopIteration, value) suffice here
>> anyway?
>
>I think you're right - so noted in the tracker issue about the C API additions.
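
For readers following along without the C source: the exchange above is about how a generator's return value ends up attached to StopIteration. At the Python level, the behaviour PEP 380 specifies can be seen with a small sketch (requires Python 3.3 or later; the function names are illustrative):

```python
def inner():
    yield 1
    yield 2
    return "done"  # becomes StopIteration.value


def outer():
    result = yield from inner()  # receives inner()'s return value
    yield result


def drain(gen):
    """Run a generator by hand and return (yielded items, return value)."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value
```

`drain(inner())` shows the return value travelling on the exception, while `list(outer())` shows `yield from` unpacking it into an ordinary expression result.
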
>
>Thanks for the thorough review, a fresh set of eyes is very helpful :)
>
>Cheers,
>Nick.
>
>-- 
>Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
>
>------------------------------
>
>Message: 10
>Date: Sat, 14 Jan 2012 17:01:48 +1000
>From: Nick Coghlan <ncoghlan at gmail.com>
>To: Jack Diederich <jackdied at gmail.com>
>Cc: Guido van Rossum <guido at python.org>, Python Dev
>	<Python-Dev at python.org>
>Subject: Re: [Python-Dev] Status of the fix for the hash collision
>	vulnerability
>Message-ID:
>	<CADiSq7cmNjM8mEEhktFjA5Ss+K0Z8u_CF7tmMucn56dWOzVFUQ at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>On Sat, Jan 14, 2012 at 4:24 PM, Jack Diederich <jackdied at gmail.com> wrote:
>>> This is depending on how the counting is done (I didn't look at MAL's
>>> patch), and assuming that increasing the hash table size will generally
>>> reduce collisions if items collide but their hashes are different.
>>
>> The patch counts conflicts on an individual insert and not lifetime
>> conflicts.  Looks sane to me.
>
>Having a hard limit on the worst-case behaviour certainly sounds like
>an attractive prospect. And there's nothing to worry about in terms of
>secrecy or sufficient randomness - by default, attackers cannot
>generate more than 1000 hash collisions in one lookup, period.
>
>>> That said, even with collision counting I'd like a way to disable it without
>>> changing the code, e.g. a flag or environment variable.
>>
>> Agreed.  Paranoid people can turn the behavior off and if it ever were
>> to become a problem in practice we could point people to a solution.
>
>Does MAL's patch allow the limit to be set on a per-dict basis
>(including setting it to None to disable collision limiting
>completely)? If people have data sets that need to tolerate that kind
>of collision level (and haven't already decided to move to a data
>structure other than the builtin dict), then it may make sense to
>allow them to remove the limit when using trusted input.
>
>For maintenance versions though, it would definitely need to be
>possible to switch it off without touching the code.
>
>Cheers,
>Nick.
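
One way "switch it off without touching the code" could look is a limit read from the environment at startup. This is only a sketch: the variable name HYPOTHETICAL_MAX_COLLISIONS and the function `collision_limit` are invented for illustration and do not correspond to any existing or proposed setting; 1000 is the default figure discussed in this thread.

```python
import os

DEFAULT_LIMIT = 1000  # the figure discussed in this thread


def collision_limit(environ=os.environ, name="HYPOTHETICAL_MAX_COLLISIONS"):
    """Return the per-insert collision limit, or None if limiting is off."""
    raw = environ.get(name)
    if raw is None:
        return DEFAULT_LIMIT
    value = int(raw)
    return None if value <= 0 else value
```

Setting the variable to 0 would then disable the check for a whole process, which suits a maintenance release; a per-dict override would still need an API of its own.
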
>
>-- 
>Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
>
>------------------------------
>
>End of Python-Dev Digest, Vol 102, Issue 35
>*******************************************