I would like to solicit this group's thoughts on how to reconcile the Set abstract base class with the API for built-in set objects (see http://bugs.python.org/issue8743 ). I've been thinking about this issue for a good while and the RightThingToDo(tm) isn't clear.
Here's the situation:
Binary operators for the built-in set object restrict their "other" argument to instances of set, frozenset, or one of their subclasses. Otherwise, they return NotImplemented. This design was intentional (it was part of the original pure-Python version, it is unit-tested behavior, and it is a documented restriction). It allows other classes to "see" the NotImplemented and have a chance to take over using __ror__, __rand__, etc. Also, by not accepting any iterable, it prevents little coding atrocities or possible mistakes like "s | 'abc'". This is a break with what is done for lists (Guido has previously lamented that list.__add__ accepting any iterable is one of his "regrets"). This design has been in place for several years and so far everyone has been happy with it (no bug reports, feature requests, or discussions on the newsgroup, etc). If someone needs to process a non-set iterable, the named set methods (like intersection, update, etc) all accept any iterable and provide an immediate, usable alternative.
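To make the restriction concrete, here is a small sketch of the behavior described above (Python 3 syntax):

```python
s = {1, 2, 3}

# The binary operator rejects non-set operands by returning NotImplemented,
# which surfaces as a TypeError when no reflected method takes over:
assert s.__or__('abc') is NotImplemented

# The named methods accept any iterable instead:
assert s.union('abc') == {1, 2, 3, 'a', 'b', 'c'}
```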
In contrast, the Set and MutableSet abstract base classes in Lib/_abcoll.py take a different approach. They specify that something claiming to be set-like will accept any-iterable for a binary operator (IOW, the builtin set object does not comply). The provided mixins (such as __or__, __and__, etc) are implemented that way and it works fine. Also, the Set and MutableSet API do not provide named methods such as update, intersection, difference, etc. They aren't really needed because the operator methods already provide the functionality and because it keeps the Set API to a reasonable minimum.
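For illustration, here is a minimal sketch of the mixin behavior (ListSet is a hypothetical class; the ABC lives in collections/_abcoll in 2.x and collections.abc in later 3.x):

```python
from collections.abc import Set   # collections.Set in the 2.x / 3.1 era

class ListSet(Set):
    """A hypothetical Set ABC implementation backed by a list."""
    def __init__(self, iterable=()):
        self._items = []
        for v in iterable:
            if v not in self._items:
                self._items.append(v)
    def __contains__(self, v):
        return v in self._items
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)

# The inherited __and__ mixin accepts *any* iterable, including a string:
assert sorted(ListSet('abc') & 'bcd') == ['b', 'c']
```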
All of this is well and good, but the two don't interoperate. You can't get an instance of the Set ABC to work with a regular set, nor do regular sets comply with the ABC. These are problems because they defeat some of the design goals for ABCs.
We have a few options:
1. Liberalize setobject.c binary operator methods to accept anything registered to the Set ABC and add a backwards incompatible restriction to the Set ABC binary operator methods to only accept Set ABC instances (they currently accept any iterable).
This approach has a backwards incompatible tightening of the Set ABC, but that will probably affect very few people. It also has the disadvantage of not providing a straightforward way to handle general iterable arguments (either the implementer needs to write named binary methods like update, difference, etc for that purpose or the user will need to cast the iterable to a set before operating on it). The positive side of this option is that it keeps the current advantages of the setobject API and its NotImplemented return value.
1a. Liberalize setobject.c binary operator methods, restrict SetABC methods, and add named methods (like difference, update, etc) that accept any iterable.
2. We could liberalize builtin set objects to accept any iterable as an "other" argument to a binary set operator. This choice is not entirely backwards compatible because it would break code depending on being able to run __ror__, __rand__, etc after a NotImplemented value is returned. That being said, I think it unlikely that such code exists. The real disadvantage is that it replicates the problems with list.__add__, and Guido has said before that he doesn't want to do that again.
I was leaning towards #1 or #1a and the guys on IRC thought #2 would be better. Now I'm not sure and would like additional input so I can get this bug closed for 3.2. Any thoughts on the subject would be appreciated.
Thanks,
Raymond
P.S. I also encountered a small difficulty in implementing #2 that would still need to be resolved if that option is chosen.
http://bugs.python.org/issue9675
Long story short: Python 2.7 backported Capsule support and
(incorrectly, in my opinion) marked CObject as deprecated.
All C modules in the stdlib were updated to Capsule (with a CObject
compatibility layer), except BSDDB, because this change was done late in
the cycle, the proposed patch was buggy (though fixable), and a
pronouncement was made that CObject was not actually deprecated.
But in the Python 2.7 release, CObject is marked as deprecated (argh!),
so when executing Python with -We (mark warnings as errors), bsddb fails.
Since I think that adopting Capsule in BSDDB for 2.7.1 would break the
API compatibility (maybe the CObject proxy would solve this), and since
a previous pronouncement was made about CObject not being deprecated in 2.7.x,
I would like comments.
The full history and links to previous pronouncements are at
http://bugs.python.org/issue9675
My proposal: CObject should not be marked as deprecated in 2.7.1.
Thanks for your time and attention.
- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea(a)jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea@jabber.org _/_/ _/_/ _/_/_/_/_/
. _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
I would like to recommend that the Python core developers start using
a code review tool such as Rietveld or Reviewboard. I don't really
care which tool we use (I'm sure there are plenty of pros and cons to
each) but I do think we should get out of the stone age and start
using a tool for the majority of our code reviews.
While I would personally love to see Rietveld declared the official
core Python code review tool, I realize that since I wrote it as a Google
engineer and it is running on Google infrastructure (App Engine), I
can't be fully objective about the tool choice -- even though it is
open source, has several non-Googler maintainers, and can be run
outside App Engine as well.
But I do think that using a specialized code review tool rather than
unstructured email plus a general-purpose issue tracker can hugely
improve developer performance and also increase community
participation. (A code review tool makes it much more convenient for a
senior reviewer to impart their wisdom to a junior developer without
appearing judgmental or overbearing.)
See also this buzz thread:
http://www.google.com/buzz/115212051037621986145/At6Rj82Kret/When-will-the-…
--
--Guido van Rossum (python.org/~guido)
Hello everyone.
I see several problems with the two hex-conversion function pairs that
Python offers:
1. binascii.hexlify and binascii.unhexlify
2. bytes.fromhex and bytes.hex
Problem #1:
bytes.hex is not implemented, although it was specified in PEP 358.
This means there is no symmetrical function to accompany bytes.fromhex.
Problem #2:
Both pairs perform the same function, although the Zen of Python suggests
that "There should be one-- and preferably only one --obvious way to do it."
I do not understand why PEP 358 specified the bytes function pair although
it mentioned the binascii pair...
Problem #3:
bytes.fromhex may receive spaces in the input string, although
binascii.unhexlify may not.
I see no good reason for these two functions to have different features.
Problem #4:
binascii.unhexlify may receive both input types: strings or bytes, whereas
bytes.fromhex raises an exception when given a bytes parameter.
Again there is no reason for these functions to be different.
Problem #5:
binascii.hexlify returns a bytes type - although ideally, converting to
hex should always return string types and converting from hex should
always return bytes.
IMO there is no meaning of bytes as an output of hexlify, since the
output is a representation of other bytes.
This is also the suggested behavior of bytes.hex in PEP 358.
Problems #4 and #5 call for a decision about the input and output of the
functions being discussed:
Option A : Strict input and output
unhexlify (and bytes.fromhex) may only receive strings and may only
return bytes
hexlify (and bytes.hex) may only receive bytes and may only return
strings
Option B : Robust input and strict output
unhexlify (and bytes.fromhex) may receive bytes or strings and may only
return bytes
hexlify (and bytes.hex) may receive bytes or strings and may only return
strings
Of course we may also consider a third option, which would allow the
return type of all functions to be robust (perhaps specified in a
keyword argument), but as I wrote in the description of problem #5, I
see no sense in that.
Note that PEP 3137 describes: "... the more strict definitions of encoding
and decoding in
Python 3000: encoding always takes a Unicode string and returns a bytes
sequence, and decoding
always takes a bytes sequence and returns a Unicode string." - suggesting
option A.
To repeat problems #4 and #5, the current behavior does not match any
option:
* The return type of binascii.hexlify should be string, and this is not the
current behavior.
As for the input:
* Option A is not the current behavior because binascii.unhexlify may
receive both input types.
* Option B is not the current behavior because bytes.fromhex does not allow
bytes as input.
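The asymmetries in problems #3-#5 are easy to verify directly (Python 3 syntax; at the time of writing, bytes.hex did not exist yet):

```python
import binascii

assert binascii.hexlify(b'\x01\xff') == b'01ff'   # bytes out, not str (problem #5)
assert bytes.fromhex('01 ff') == b'\x01\xff'      # spaces tolerated (problem #3)
try:
    binascii.unhexlify('01 ff')                   # but spaces are rejected here
except binascii.Error:
    pass
else:
    raise AssertionError('expected binascii.Error')
```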
To fix these issues, three changes should be applied:
1. Deprecate bytes.fromhex. This fixes the following problems:
#4 (go with option B and remove the function that does not allow bytes
input)
#2 (the binascii functions will be the only way to "do it")
#1 (bytes.hex should not be implemented)
2. In order to keep the functionality that bytes.fromhex has over unhexlify,
the latter function should be able to handle spaces in its input (fix #3)
3. binascii.hexlify should return string as its return type (fix #5)
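The proposed behavior could be sketched as thin wrappers over binascii (hypothetical helper names, not the actual proposal):

```python
import binascii

def hexlify_str(data):
    """Hypothetical wrapper: hex-encode bytes, returning a str (fix #5)."""
    return binascii.hexlify(data).decode('ascii')

def unhexlify_lenient(s):
    """Hypothetical wrapper: accept str or bytes and skip spaces (fixes #3/#4)."""
    if isinstance(s, bytes):
        s = s.decode('ascii')
    return binascii.unhexlify(s.replace(' ', ''))

assert hexlify_str(b'\x01\xff') == '01ff'
assert unhexlify_lenient('01 ff') == b'\x01\xff'
```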
Amaury just filed issue #10000 yesterday; as counting started
with 1000, we are now into 9000 roundup issues.
I have become quite fond of roundup over the years, and would
like to thank Ka-Ping Yee, Richard Jones, and Erik Forsberg
for getting us here.
There are many contributions to this infrastructure, both
from individuals and software projects, but I'd like to single
out two of them which I appreciate very much:
the folks at Upfront Hosting have helped a lot to keep the system
running, and the PostgreSQL database has really validated its
own claim of being the world's most advanced open source
database.
Kind regards,
Martin
On Wed, Sep 22, 2010 at 10:38 PM, Brett Cannon <brett(a)python.org> wrote:
> the first thing on the agenda is a complete rewrite of the developer
> docs and moving them into the Doc/ directory
I'd like to know why you think moving the developer docs into the
CPython tree makes sense.
My own thought here is that they're not specific to the version of
Python, though some of the documentation deals with the group of
specific branches being maintained. For me, keeping them in a
separate space (like www.python.org/dev/) makes sense.
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
"A storm broke loose in my mind." --Albert Einstein
I'm rather sad to have been sacked, but such is life. I won't be doing
any more work on the bug tracker for obvious reasons, but hope that you
who have managed to keep your voluntary jobs manage to keep Python going.
Kindest regards.
Mark Lawrence.
Hi all --
I looked through the bug tracker, but I didn't see this listed. I
was trying to use the bz2 codec, but it seems like it's not very
useful in the current form (and I'm not sure if it's getting added
back to py3k, so maybe this is a moot point). It looks like the codec
writes every piece of data fed to it as a separate compressed block.
This results in compressed files which are significantly larger than
the uncompressed files, if you're writing a lot of small bursts of
data. It also leads to interesting oddities like this:
import codecs

with codecs.open('text.bz2', 'w', 'bz2') as f:
    for x in xrange(20):
        f.write('This is data %i\n' % x)

with codecs.open('text.bz2', 'r', 'bz2') as f:
    print f.read()
This prints "This is data 0" and exits, because the codec won't read
beyond the first compressed block.
My question is, is this known, intended behavior? Should I open a bug
report? Is it going away in py3k, so there's no real point in fixing
it?
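For what it's worth, going through the bz2 module directly avoids the per-write compressed blocks, since the whole stream becomes a single compressed member (a workaround sketch, shown in Python 3 syntax; the file path is illustrative):

```python
import bz2
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'text.bz2')

# All writes go through one compressor, producing one compressed stream:
with bz2.BZ2File(path, 'w') as f:
    for x in range(20):
        f.write(('This is data %i\n' % x).encode('ascii'))

# Reading back returns every line, not just the first block:
with bz2.BZ2File(path, 'r') as f:
    lines = f.read().decode('ascii').splitlines()

assert lines[0] == 'This is data 0'
assert lines[-1] == 'This is data 19'
```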
-- Chris
I see that Atlassian have just taken over BitBucket, the Mercurial
hosting company. IIRC Atlassian offered to host our issue tracking on
JIRA, but in the end we decided to eat our own dog food and went with
roundup.
I'm wondering if they'd be similarly interested in supporting our Hg
server. Or is self-hosting the only acceptable solution? From recent
mail it looks like we may be up and running on Hg fairly soon.
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
DjangoCon US September 7-9, 2010 http://djangocon.us/
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/