From martin at v.loewis.de Sat Sep 1 00:11:36 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 01 Sep 2007 00:11:36 +0200
Subject: [Python-3000] Release Countdown
In-Reply-To:
References:
<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: <46D89218.5090005@v.loewis.de> > (1) Allow bytes methods to take a literal string (which will > obviously be in the source file's encoding). To rephrase Guido's comment: do you have the slightest idea on how to specify and implement that? Regards, Martin From jimjjewett at gmail.com Sat Sep 1 00:17:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 31 Aug 2007 18:17:57 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: References:
<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: On 8/31/07, Guido van Rossum wrote: > On 8/31/07, Jim Jewett wrote: > > (1) Allow bytes methods to take a literal string (which will > > obviously be in the source file's encoding). > Yuck, yuck about the source file encoding part. Also, there is no way > to tell that a particular argument was passed a literal. There is when compiling to bytecode; it goes in co_consts. > The very > definition of "this was a literal" is iffy -- is x a literal when > passed to f below? > x = "abc" > f(x) No, it isn't. Though I suppose consistency with that sort of use (particularly inside a function, where the compiler *could* know) is the main argument against this. > > (2) There really ought to be an immutable bytes type, and the literal > > (or at least a literal, if capitalization matters) ought to be the > > immutable. > > PLISTHEADER = b"""\ > > > > > PLIST 1.0//EN" "http://www.apple.com/DTDs/ > > PropertyList-1.0.dtd"> > > """ > > If the value of PLISTHEADER does change during the run, it will almost > > certainly be a bug. I could code defensively by only ever passing > > copies, but that seems wasteful, and it could hide other bugs. If > > something does try to modify (not replace, modify) it, then there was > > probably a typo or API misunderstanding; I *want* an exception. > Sounds like you're worrying to much. Do you have any indication that > this is going to be a common problem? > > http://svn.python.org/view/python/branches/py3k/Lib/plat-mac/plistlib.py?rev=57563&r1=57305&r2=57563 Let me reverse the question. In Py2, that variable holds a constant string. What is the value in making that constant mutable? 
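To make the co_consts point and Guido's objection concrete, here is a small sketch, assuming a released CPython 3 rather than the alpha under discussion. The function names are hypothetical. The literal does end up in the caller's co_consts, but the callee receives an ordinary object either way and cannot tell how the caller spelled it.

```python
def callee(arg):
    # The callee only ever sees the object, not how the caller wrote it.
    return arg

def direct():
    return callee("abc")      # literal written at the call site

def via_name():
    x = "abc"                 # literal bound to a name first
    return callee(x)

# In both callers the compiler stores the literal as a code-object constant.
literal_in_direct = "abc" in direct.__code__.co_consts
literal_in_via_name = "abc" in via_name.__code__.co_consts

# The values that reach the callee are indistinguishable.
same_value = direct() == via_name()
print(literal_in_direct, literal_in_via_name, same_value)
```

So "it goes in co_consts" is true of the calling code object, but it is compile-time information about the caller, not something a bytes method could inspect at run time.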
-jJ

From martin at v.loewis.de Sat Sep 1 00:34:10 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 01 Sep 2007 00:34:10 +0200
Subject: [Python-3000] Release Countdown
In-Reply-To:
References:
<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: <46D89762.1000608@v.loewis.de> >> Yuck, yuck about the source file encoding part. Also, there is no way >> to tell that a particular argument was passed a literal. > > There is when compiling to bytecode; it goes in co_consts. > >> The very >> definition of "this was a literal" is iffy -- is x a literal when >> passed to f below? > >> x = "abc" >> f(x) > > No, it isn't. By that definition, bytes never receives a constant. Regards, Martin From jimjjewett at gmail.com Sat Sep 1 01:18:12 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 31 Aug 2007 19:18:12 -0400 Subject: [Python-3000] Release Countdown In-Reply-To: <46D89762.1000608@v.loewis.de> References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> <46D89762.1000608@v.loewis.de> Message-ID: On 8/31/07, "Martin v. L?wis" wrote: > >> Yuck, yuck about the source file encoding part. Also, there is no way > >> to tell that a particular argument was passed a literal. > > There is when compiling to bytecode; it goes in co_consts. > >> The very > >> definition of "this was a literal" is iffy -- is x a literal when > >> passed to f below? > >> x = "abc" > >> f(x) > > No, it isn't. > By that definition, bytes never receives a constant. To go back to the original motivation x.split(":") # a constant, currently fails in Py3K x.split(b":") # mechanical replacement for x.split(":") sep=":" x.split(sep) # annoying but less important failure I would prefer that x.split(":") work. If that happens because bytes.split does the conversion for me (so that x.split(sep) also works), then great. But I realize that would require an assumption about the proper encoding. 
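A minimal sketch of the three cases listed above, assuming a released Python 3 interpreter (not 3.0a1). There, bytes.split rejects a str separator with a TypeError instead of guessing an encoding. The sample data and the choice of ASCII are illustrative assumptions only.

```python
# Sketch only: behaviour of a released Python 3.
x = b"key:value"

# A str separator is rejected instead of being silently encoded.
try:
    x.split(":")
    str_sep_accepted = True
except TypeError:
    str_sep_accepted = False

# The mechanical replacement: use a bytes literal.
parts_literal = x.split(b":")

# A separator held in a variable needs an explicit, caller-chosen encoding.
sep = ":"
parts_encoded = x.split(sep.encode("ascii"))  # "ascii" is an assumption for this example

print(str_sep_accepted, parts_literal, parts_encoded)
```

The explicit encode call is where the "assumption about the proper encoding" has to be stated by the caller.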
If it works because the bytecode compiler changes x.split(":") into the moral equivalent of try: x.split(":") except StrNotBytesError: x.split(b":") that is good enough. And for constants which appear as string literals in the code (token stringliteral), the proper encoding is known. -jJ From lists at cheimes.de Sat Sep 1 01:28:00 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 01 Sep 2007 01:28:00 +0200 Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005 In-Reply-To: <46D886DD.2070601@v.loewis.de> References: <46D886DD.2070601@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Christian Heimes schrieb: >> I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP >> SP2 (German) and I run into multiple problems with 3rd party modules. >> The problem with time on German installations of Windows still exists. > > Not for me - it works fine here. Are you sure your source is up-to-date? My sources were up to date but unfortunately the output wasn't. After I did a cleanup and full recompile the error is gone. Christian From guido at python.org Sat Sep 1 01:32:20 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 16:32:20 -0700 Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005 In-Reply-To: References: <46D886DD.2070601@v.loewis.de> Message-ID: On 8/31/07, Christian Heimes wrote: > Martin v. L?wis wrote: > > Christian Heimes schrieb: > >> I tried to compile Python 3.0 with MS Visual Studio 2005 on Windows XP > >> SP2 (German) and I run into multiple problems with 3rd party modules. > >> The problem with time on German installations of Windows still exists. > > > > Not for me - it works fine here. Are you sure your source is up-to-date? > > My sources were up to date but unfortunately the output wasn't. After I > did a cleanup and full recompile the error is gone. Does this mean that all the problems you reported at the start of this thread are gone? 
(If so, I need to remove the link to this thread from the online release notes. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From half.italian at gmail.com Sat Sep 1 02:14:09 2007 From: half.italian at gmail.com (Sean DiZazzo) Date: Fri, 31 Aug 2007 17:14:09 -0700 Subject: [Python-3000] iterating over a dcitionary Message-ID: <7baa94f60708311714l1423846eq38cd71e586ca87e7@mail.gmail.com> How should we replace in our code: for k,v in dict.iteritems(): with this ?? for k,v in zip(dict, dict.values()): Sorry if this is the wrong forum for questions like this. ~Sean From l.mastrodomenico at gmail.com Sat Sep 1 02:17:54 2007 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Sat, 1 Sep 2007 02:17:54 +0200 Subject: [Python-3000] iterating over a dcitionary In-Reply-To: <7baa94f60708311714l1423846eq38cd71e586ca87e7@mail.gmail.com> References: <7baa94f60708311714l1423846eq38cd71e586ca87e7@mail.gmail.com> Message-ID: 2007/9/1, Sean DiZazzo : > How should we replace in our code: > > for k,v in dict.iteritems(): for k, v in dict.items(): -- Lino Mastrodomenico E-mail: l.mastrodomenico at gmail.com From dalcinl at gmail.com Sat Sep 1 02:32:49 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 31 Aug 2007 21:32:49 -0300 Subject: [Python-3000] bug in py3k buffer object? Message-ID: Dear Travis, in my MPI wrappers, I use MPI_Alloc_mem function to get 'special' MPI memory, and next I return it to Python using return PyBuffer_FromReadWriteMemory(ptr, len); Well, getting back this rw-buffer in python, I tried to do mem = MPI.Alloc_mem(10) mem[:] = str8('\0') * 8 # sort of memzero but then I get this error: Traceback (most recent call last): File "", line 1, in TypeError: buffer is read-only I noticed you use PyBuff_SIMPLE in buffer_ass_item/buffer_ass_subscript... Is this OK? perhaps PyBuf_WRITEABLE is the right flag? No much more time to go deeper. 
--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From lists at cheimes.de Sat Sep 1 02:47:00 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 01 Sep 2007 02:47:00 +0200
Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005
In-Reply-To:
References: <46D886DD.2070601@v.loewis.de>
Message-ID: <46D8B684.1030904@cheimes.de>

Guido van Rossum wrote:
> Does this mean that all the problems you reported at the start of this
> thread are gone? (If so, I need to remove the link to this thread from
> the online release notes. :-)

Just the problem with the time module is gone. The problems with the 3rd
party modules still exist and so does the issue with os.stat on non
English Windows installations. I'm neither a Windows nor a MS VS 2005
expert - I'm mostly using Linux for development - but I could try to
tweak the project file if it is appreciated and wanted. Who is
responsible for PCbuild8?

Christian

From nnorwitz at gmail.com Sat Sep 1 02:58:28 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 31 Aug 2007 17:58:28 -0700
Subject: [Python-3000] Compiling Python 3.0 with MS Visual Studio 2005
In-Reply-To: <46D8B684.1030904@cheimes.de>
References: <46D886DD.2070601@v.loewis.de> <46D8B684.1030904@cheimes.de>
Message-ID:

On 8/31/07, Christian Heimes wrote:
> Guido van Rossum wrote:
> > Does this mean that all the problems you reported at the start of this
> > thread are gone? (If so, I need to remove the link to this thread from
> > the online release notes. :-)
>
> Just the problem with the time module is gone. The problems with the 3rd
> party modules still exist and so does the issue with os.stat on non
> English Windows installations. I'm neither a Windows nor a MS VS 2005
> expert - I'm mostly using Linux for development - but I could try to
> tweak the project file if it is appreciated and wanted. Who is
> responsible for PCbuild8?

If you have to ask who's in control, that means you're it. :-)

There isn't really anyone. Kristján V. Jónsson has worked on it in the
trunk, but he hasn't been maintaining it in 3k IIRC. It would be really
great if you took on the responsibility, made sure things work, and
provided patches when they didn't. Bug reports are of course helpful if
you can't fix the problems.

Cheers,
n

From greg.ewing at canterbury.ac.nz Sat Sep 1 03:24:28 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 01 Sep 2007 13:24:28 +1200
Subject: [Python-3000] Release Countdown
In-Reply-To:
References:
<46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> Message-ID: <46D8BF4C.7050508@canterbury.ac.nz> Jim Jewett wrote: > On 8/31/07, Guido van Rossum wrote: > > > x = "abc" > > f(x) > > I suppose consistency with that sort of use ... is > the main argument against this. I'd be *very* upset if Python started behaving differently depending on whether I wrote a literal directly inside a function call or not. -- Greg From guido at python.org Sat Sep 1 04:18:24 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Aug 2007 19:18:24 -0700 Subject: [Python-3000] Windows registry question from blog Message-ID: Someone added this comment to my blog (http://www.artima.com/forums/flat.jsp?forum=106&thread=213583&start=0#278818): "Only a question please, I have Python 2.5 installed in my windows XP machine and I would like to install Python 3a1. I think I could have troubles at the Windows Registry level. Did anybody tried to do so?" Can someone help this person? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Sat Sep 1 04:44:28 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Fri, 31 Aug 2007 22:44:28 -0400 Subject: [Python-3000] Windows registry question from blog References: Message-ID: "Guido van Rossum" wrote in message news:ca471dc20708311918k642b0d2elf67bd8bdba8830a1 at mail.gmail.com... > Someone added this comment to my blog > (http://www.artima.com/forums/flat.jsp?forum=106&thread=213583&start=0#278818): > > "Only a question please, I have Python 2.5 installed in my windows XP > machine and I would like to install Python 3a1. I think I could have > troubles at the Windows Registry level. Did anybody tried to do so?" > > Can someone help this person? 
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

A quick scan of the registry makes it look to me like the main issue is
that py3k would take over the .py, .pyo, .pyc extensions, which is not a
big deal. Reinstalling 2.5 (in place) would fix that.

The only other potential issue is the uninstall icon for 2.5
disappearing. However, a quick test shows that the design of the
installer prevents this potential issue.

So everything looks fine to me. I have both installed at the moment, and
it looks fine to me.

From nick.bastin at gmail.com Sat Sep 1 06:48:58 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 1 Sep 2007 00:48:58 -0400
Subject: [Python-3000] Windows registry question from blog
In-Reply-To:
References:
Message-ID: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com>

Is there no option in the installer to associate Python with .py, .pyc,
etc.? Obviously then the logical choice would be to unselect that (or
perhaps have it unselected by default for alpha installations).

--
Nick

From martin at v.loewis.de Sat Sep 1 07:31:04 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 01 Sep 2007 07:31:04 +0200
Subject: [Python-3000] Release Countdown
In-Reply-To:
References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> <46D89762.1000608@v.loewis.de>
Message-ID: <46D8F918.6090701@v.loewis.de>

> If it works because the bytecode compiler changes x.split(":") into
> the moral equivalent of
>
>     try:
>         x.split(":")
>     except StrNotBytesError:
>         x.split(b":")
>
> that is good enough.

And how do you propose to implement that?
Regards, Martin From greg.ewing at canterbury.ac.nz Sat Sep 1 03:35:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 01 Sep 2007 13:35:10 +1200 Subject: [Python-3000] Release Countdown In-Reply-To: References: <46D7FE7D.5020909@trueblade.com> <46D803FF.9000909@v.loewis.de> <46D806D8.4070905@trueblade.com> <797C63A8-888F-45CC-A780-CE9AD859BC1B@python.org> <46D89762.1000608@v.loewis.de> Message-ID: <46D8C1CE.1020608@canterbury.ac.nz> Jim Jewett wrote: > I would prefer that x.split(":") work. > > If that happens because bytes.split does the conversion for me (so > that x.split(sep) also works), then great. But I realize that would > require an assumption about the proper encoding. If you're going to do things like that, why stop at the parameters to bytes methods? It's hard to argue that they should be treated specially, rather than allowing strings to be cast to bytes in any context that expects bytes. And then the clear distinction between str and bytes that we're trying to maintain breaks down. You can't have it both ways. The type error you're complaining about is just the sort of error that the str/bytes distinction is meant to *catch*. -- Greg From martin at v.loewis.de Sat Sep 1 08:03:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 01 Sep 2007 08:03:48 +0200 Subject: [Python-3000] Windows registry question from blog In-Reply-To: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com> References: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com> Message-ID: <46D900C4.8050109@v.loewis.de> Nicholas Bastin schrieb: > Is there no option in the installer to associate Python with .py, .pyc, > etc.? There certainly is. > Obviously then the logical choice would be to unselect that (or > perhaps have it unselected by default for alpha installations). 
I'd rather have the user unselect it - people installing multiple Python version are familiar with the phenomenon and might get puzzled if some installation suddenly behaved different. Regards, Martin From michele.simionato at gmail.com Sat Sep 1 10:33:32 2007 From: michele.simionato at gmail.com (Michele Simionato) Date: Sat, 1 Sep 2007 10:33:32 +0200 Subject: [Python-3000] let's get rid of unbound super methods Message-ID: <4edc17eb0709010133n5dd560e1i1956c1b4a395f96f@mail.gmail.com> So Python 3000a1 is out! Kudos to everybody involved! You did an incredible amount of work in a relatively short time! :-) Having said that, let me go to the point. This morning I downloaded the tarball and compiled everything without issues, then I started playing around. One of the first thing I looked at was the new super, since it is a matter that made me scratch my head a lot in the past. Basically I am happy with the implementation, especially about the new magic name __class__ inside the methods which is something I always wanted. So I am not here to ask for new features. I am actually here to ask for less features: specifically, I would like the unbound syntax for super to be removed. I am talking about this: >>> help(super) Help on class super in module __builtin__: class super(object) | super() -> same as super(__class__, ) | super(type) -> unbound super object | super(type, obj) -> bound super object; requires isinstance(obj, type) | super(type, type2) -> bound super object; requires issubclass(type2, type) The single argument syntax 'super(type)' is what I call the unbound syntax. I would like 'super(type)' to be removed from the valid signatures. AFAIK, the only use case for it was the implementation of the autosuper recipe in Guido's new style classes essay. That use case has disappeared nowadays, and I cannot think of other situations where may want to use that feature (you may think differently, if so, please speak). 
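A short sketch of the signatures listed above, assuming a released Python 3 in which the one-argument form still exists. The class and method names are hypothetical. It shows that the unbound object cannot dispatch by itself, and that it acts as a descriptor which can be bound by hand.

```python
class B:
    def meth(self):
        return "B.meth"

class C(B):
    def meth(self):
        return "C.meth"

c = C()

bound = super(C, c)      # two-argument form: bound to an instance
unbound = super(C)       # one-argument form: the "unbound" super object

# The bound form skips C in the MRO and finds B's method.
bound_result = bound.meth()

# Attribute lookup on the unbound form does not search the MRO.
unbound_can_dispatch = hasattr(unbound, "meth")

# The unbound object is a descriptor; binding it by hand gives a usable super.
rebound_result = unbound.__get__(c, C).meth()

print(bound_result, unbound_can_dispatch, rebound_result)
```

The explicit __get__ call is what happens implicitly when an unbound super object is stored as a class attribute and looked up through an instance.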
The other reason why I would like it to be removed (apart from the fact
that it looks unneeded to me) is that it is very difficult to explain to
beginners. For instance in the past I lectured on Python, and in order
to explain why unbound super objects can be useful I gave this example,
which is basically Guido's autosuper recipe implemented by hand:

class B(object):
    def __repr__(self):
        return '' % self.__class__.__name__
    #@classmethod
    def cmeth(self):
        print("B.meth called from %s" % self)

class C(B):
    #@classmethod
    def cmeth(self):
        print("C.meth called from %s" % self)
        self.__super.cmeth()

C._C__super = super(C)

c = C()

c.cmeth()

Here everything works because the unbound super object is a descriptor
and self.__super calls super(C).__get__(self, C) which corresponds
to the bound method super(C, self) which is able to dispatch to .cmeth.
However, if you uncomment the classmethod decorator, self.__super (where
self is now the class C) will just return the unbound super object super(C)
which is unable to dispatch to .cmeth. Now, try to explain that to a beginner!
We can live just as well without unbound super methods, so let's take
the occasion of Python3k to remove this glitch.


Michele Simionato

From guido at python.org Sat Sep 1 17:01:58 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 1 Sep 2007 08:01:58 -0700
Subject: [Python-3000] let's get rid of unbound super methods
In-Reply-To: <4edc17eb0709010133n5dd560e1i1956c1b4a395f96f@mail.gmail.com>
References: <4edc17eb0709010133n5dd560e1i1956c1b4a395f96f@mail.gmail.com>
Message-ID:

Thanks for proposing this -- I've been scratching my head wondering
what the use of unbound super() would be. :-) I'm fine with killing it
-- perhaps someone can do a bit of research to try and find out if
there are any real-life uses (apart from various auto-super clones)?

--Guido

On 9/1/07, Michele Simionato wrote:
> So Python 3000a1 is out! Kudos to everybody involved!
> You did an incredible amount of work in a relatively short time! :-) > > Having said that, let me go to the point. This morning I downloaded > the tarball and compiled everything without issues, then I > started playing around. One of the first thing I looked at was the new > super, since it is a matter that made me scratch my head a lot in the > past. Basically I am happy with the implementation, especially about > the new magic name __class__ inside the methods which is something I > always wanted. So I am not here to ask for new features. I am actually > here to ask for less features: specifically, I would like the unbound > syntax for super to be removed. I am talking about this: > > >>> help(super) > Help on class super in module __builtin__: > > class super(object) > | super() -> same as super(__class__, ) > | super(type) -> unbound super object > | super(type, obj) -> bound super object; requires isinstance(obj, type) > | super(type, type2) -> bound super object; requires issubclass(type2, type) > > > The single argument syntax 'super(type)' is what I call the unbound syntax. > I would like 'super(type)' to be removed from the valid signatures. > AFAIK, the only use case for it was the implementation of the autosuper > recipe in Guido's new style classes essay. That use case has disappeared > nowadays, and I cannot think of other situations where may want to use > that feature (you may think differently, if so, please speak). > The other reason why I would like it to be removed (apart from the fact > that it looks unneeded to me) is that is very difficult to explain to > beginners. 
For instance in the past I lectured on Python, and in order > to explain why unbound super objects can be useful I gave this example, > which is basically Guido's autosuper recipe implemented by hand: > > class B(object): > def __repr__(self): > return '' % self.__class__.__name__ > #@classmethod > def cmeth(self): > print("B.meth called from %s" % self) > > class C(B): > #@classmethod > def cmeth(self): > print("C.meth called from %s" % self) > self.__super.cmeth() > > C._C__super = super(C) > > c = C() > > c.cmeth() > > Here everything works because the unbound super object is a descriptor > and self.__super calls super(C).__get__(self, C) which corresponds > to the bound method super(C, self) which is able to dispatch to .cmeth. > However, if you uncomment the classmethod decorator, self.__super (where > self is now the class C) will just return the unbound super object super(C) > which is unable to dispatch to .cmeth. Now, try to explain that to a beginner! > We can leave just as well without unbound super methods, so let's take > the occasion of Python3k to remove this glitch. > > > Michele Simionato > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nick.bastin at gmail.com Sun Sep 2 08:14:37 2007 From: nick.bastin at gmail.com (Nicholas Bastin) Date: Sun, 2 Sep 2007 02:14:37 -0400 Subject: [Python-3000] Windows registry question from blog In-Reply-To: <46D900C4.8050109@v.loewis.de> References: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com> <46D900C4.8050109@v.loewis.de> Message-ID: <66d0a6e10709012314g4142e74blefd2a4620e11e4@mail.gmail.com> On 9/1/07, "Martin v. 
Löwis" wrote:
>
> > Obviously then the logical choice would be to unselect that (or
> > perhaps have it unselected by default for alpha installations).
>
> I'd rather have the user unselect it - people installing multiple
> Python version are familiar with the phenomenon and might get puzzled
> if some installation suddenly behaved different.
>

If this were an actual certified "release" of Python, I'd agree with that.
However, it's not - it's a specifically-incompatible alpha release, and I
would vote for it being unselected by default. (People familiar with
installing multiple Python versions will not be familiar with anything close
to this level of incompatibility in their .py files).

--
Nick

From brett at python.org Sun Sep 2 09:17:07 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 2 Sep 2007 00:17:07 -0700
Subject: [Python-3000] Ambiguity in PEP 3115 and the args to __prepare__
Message-ID:

PEP 3115 says a metaclass' __prepare__ takes two positional arguments,
name and bases. But the example has it actually accept an arbitrary
number of arguments: name and then everything else is bound to bases.

Which happens to be true? I'm too tired to even fully trust that I am
reading the PEP correctly, so I am not about to try to write an
example to see which is correct and come up with a coherent rewording
if I am right about what is wrong. =)

-Brett

From ggpolo at gmail.com Sun Sep 2 15:42:35 2007
From: ggpolo at gmail.com (Guilherme Polo)
Date: Sun, 2 Sep 2007 10:42:35 -0300
Subject: [Python-3000] Ambiguity in PEP 3115 and the args to __prepare__
In-Reply-To:
References:
Message-ID:

2007/9/2, Brett Cannon :
> PEP 3115 says a metaclass' __prepare__ takes two positional arguments,
> name and bases.
But the example has it actually accept an arbitrary > number of arguments: name and then everything else is bound to bases. > > Which happens to be true? I've played with it a bit and as I see it only takes name and bases. Maybe there is something secret ;p about using *args in __prepare__ that I dont know yet. > I'm too tired to even fully trust that I am > reading the PEP correctly, so I am not about to try to write an > example to see which is correct and come up with a coherent rewording > if I am right about what is wrong. =) > > -Brett > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/ggpolo%40gmail.com > -- -- Guilherme H. Polo Goncalves -- -- Guilherme H. Polo Goncalves From guido at python.org Sun Sep 2 17:07:55 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 2 Sep 2007 08:07:55 -0700 Subject: [Python-3000] Ambiguity in PEP 3115 and the args to __prepare__ In-Reply-To: References: Message-ID: On 9/2/07, Brett Cannon wrote: > PEP 3115 says a metaclass' __prepare__ takes two positional arguments, > name and bases. But the example has it actually accept an arbitrary > number of arguments: name and then everything else is bound to bases. > > Which happens to be true? I'm too tired to even fully trust that I am > reading the PEP correctly, so I am not about to try to write an > example to see which is correct and come up with a coherent rewording > if I am right about what is wrong. =) I think you're misreading what you think is an example. 
I'm assuming you're referring to this code:

    def prepare_class(name, *bases, metaclass=None, **kwargs):
        if metaclass is None:
            metaclass = compute_default_metaclass(bases)
        prepare = getattr(metaclass, '__prepare__', None)
        if prepare is not None:
            return prepare(name, bases, **kwargs)
        else:
            return dict()

This indeed *defines* a function with a *bases argument, but it is not
called __prepare__! It *calls* __prepare__ passing it name and bases,
i.e. the 2nd argument to prepare is a tuple of bases.

The only example defining __prepare__ later in the PEP takes two
positional arguments (name and bases again).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From trentm at gmail.com Sun Sep 2 18:26:07 2007
From: trentm at gmail.com (Trent Mick)
Date: Sun, 2 Sep 2007 09:26:07 -0700
Subject: [Python-3000] Windows registry question from blog
In-Reply-To: <66d0a6e10709012314g4142e74blefd2a4620e11e4@mail.gmail.com>
References: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com> <46D900C4.8050109@v.loewis.de> <66d0a6e10709012314g4142e74blefd2a4620e11e4@mail.gmail.com>
Message-ID: <6db0ea510709020926p6745e419x1b02217016addcad@mail.gmail.com>

> > > Obviously then the logical choice would be to unselect that (or
> > > perhaps have it unselected by default for alpha installations).
> >
> > I'd rather have the user unselect it - people installing multiple
> > Python version are familiar with the phenomenon and might get puzzled
> > if some installation suddenly behaved different.
> >
>
> If this were an actual certified "release" of Python, I'd agree with that.
> However, it's not - it's a specifically-incompatible alpha release, and I
> would vote for it being unselected by default. (People familiar with
> installing multiple Python versions will not be familiar with anything close
> to this level of incompatibility in their .py files).
FWIW, this is what I do for the ActivePython (and Komodo) installers: only do the PATHEXT, PATH and file association changes by default in final releases and require the user to select that for alpha/beta releases. Trent -- Trent Mick trentm at gmail.com From martin at v.loewis.de Sun Sep 2 19:24:25 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 02 Sep 2007 19:24:25 +0200 Subject: [Python-3000] Windows registry question from blog In-Reply-To: <6db0ea510709020926p6745e419x1b02217016addcad@mail.gmail.com> References: <66d0a6e10708312148w677ed8b5g223ebb4288c0c167@mail.gmail.com> <46D900C4.8050109@v.loewis.de> <66d0a6e10709012314g4142e74blefd2a4620e11e4@mail.gmail.com> <6db0ea510709020926p6745e419x1b02217016addcad@mail.gmail.com> Message-ID: <46DAF1C9.6000009@v.loewis.de> > FWIW, this is what I do for the ActivePython (and Komodo) installers: > only do the PATHEXT, PATH and file association changes by default in > final releases and require the user to select that for alpha/beta > releases. That's actually worth something; I'll see whether I can find the time to change this for a2. I'd like to make that computed, so I don't have to change the script for a release. Contributions are welcome. Regards, Martin From brett at python.org Sun Sep 2 19:43:34 2007 From: brett at python.org (Brett Cannon) Date: Sun, 2 Sep 2007 10:43:34 -0700 Subject: [Python-3000] Ambiguity in PEP 3115 and the args to __prepare__ In-Reply-To: References: Message-ID: On 9/2/07, Guido van Rossum wrote: > On 9/2/07, Brett Cannon wrote: > > PEP 3115 says a metaclass' __prepare__ takes two positional arguments, > > name and bases. But the example has it actually accept an arbitrary > > number of arguments: name and then everything else is bound to bases. > > > > Which happens to be true? 
I'm too tired to even fully trust that I am > > reading the PEP correctly, so I am not about to try to write an > > example to see which is correct and come up with a coherent rewording > > if I am right about what is wrong. =) > > I think you're misreading what you think is an example. I'm assuming > you're referring to this code: > > def prepare_class(name, *bases, metaclass=None, **kwargs): > if metaclass is None: > metaclass = compute_default_metaclass(bases) > prepare = getattr(metaclass, '__prepare__', None) > if prepare is not None: > return prepare(name, bases, **kwargs) > else: > return dict() > > This indeed *defines* a function with a *bases argument, but it is not > called __prepare__! It *calls* __prepare__ passing it name and bases, > i.e. the 2nd argument to prepare is a tuple of bases. Ah, OK, that is the issue (that and type.__prepare__ takes any arguments and just always returns a new dictionary). So it was the lack of sleep. =) -Brett From robin at nibor.org Sun Sep 2 23:10:08 2007 From: robin at nibor.org (Robin Stocker) Date: Sun, 02 Sep 2007 23:10:08 +0200 Subject: [Python-3000] Patch for Doc/tutorial In-Reply-To: References: Message-ID: <46DB26B0.2090007@nibor.org> Paul Dubois schrieb: > Attached is a patch for changes to the tutorial. I made it by doing: > > svn diff tutorial > tutorial.diff > > in the Doc directory. I hope this is what is wanted; if not let me know > what to do. > > Unfortunately cygwin will not run Sphinx correctly even using 2.5, much > less 3.0. And running docutils by hand gets a lot of errors because > Sphinx has hidden a lot of the definitions used in the tutorial. So the > bottom line is I have only an imperfect idea if I have screwed up any > formatting. > > I would like to rewrite the classes.rst file in particular, and it is > the one that I did not check to be sure the examples worked, but first I > need to do something about getting me a real Linux so I don't have these > problems. 
So unless someone is hot to trot I'd like to remain 'owner' of > this issue on the spreadsheet. > > Whoever puts in these patches, I would appreciate being notified that it > is done. > > Paul I've had a look at the patch and here's another one against the current py3k, to be applied in the Doc directory. It mostly fixes some code formatting errors, like no space after a comma. Robin Stocker -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: tutorial-formatting-fixes.patch Url: http://mail.python.org/pipermail/python-3000/attachments/20070902/2fcab1da/attachment.txt From g.brandl at gmx.net Mon Sep 3 09:10:48 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 03 Sep 2007 09:10:48 +0200 Subject: [Python-3000] Patch for Doc/tutorial In-Reply-To: <46DB26B0.2090007@nibor.org> References: <46DB26B0.2090007@nibor.org> Message-ID: Robin Stocker schrieb: > Paul Dubois schrieb: >> Attached is a patch for changes to the tutorial. I made it by doing: >> >> svn diff tutorial > tutorial.diff >> >> in the Doc directory. I hope this is what is wanted; if not let me know >> what to do. >> >> Unfortunately cygwin will not run Sphinx correctly even using 2.5, much >> less 3.0. And running docutils by hand gets a lot of errors because >> Sphinx has hidden a lot of the definitions used in the tutorial. So the >> bottom line is I have only an imperfect idea if I have screwed up any >> formatting. >> >> I would like to rewrite the classes.rst file in particular, and it is >> the one that I did not check to be sure the examples worked, but first I >> need to do something about getting me a real Linux so I don't have these >> problems. So unless someone is hot to trot I'd like to remain 'owner' of >> this issue on the spreadsheet. >> >> Whoever puts in these patches, I would appreciate being notified that it >> is done. 
>>
>> Paul
>
> I've had a look at the patch and here's another one against the current
> py3k, to be applied in the Doc directory. It mostly fixes some code
> formatting errors, like no space after a comma.

Thanks very much (again), applied as rev. 57923.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From baranguren at gmail.com Mon Sep 3 16:59:33 2007
From: baranguren at gmail.com (Benjamin Aranguren)
Date: Mon, 3 Sep 2007 07:59:33 -0700
Subject: [Python-3000] backported ABC
In-Reply-To:
References:
Message-ID:

I am having a problem backporting collections.py/_abcoll.py and would
like to get your input.

There's one test in test_collections that fails.

class TestOneTrickPonyABCs(unittest.TestCase):

    def test_Hashable(self):
        # Check some non-hashables
        non_samples = [list(), set(), dict()]
        for x in non_samples:
            self.failIf(isinstance(x, Hashable), repr(x))
            self.failIf(issubclass(type(x), Hashable), repr(type(x)))

The problem is that list, set, and dict all have a __hash__ function, so
isinstance and issubclass return true even though none of list, set, and
dict was registered as a subclass of Hashable.

But calling x.__hash__() on these types results in a TypeError: list
objects are unhashable.

Thanks!

On 8/26/07, Benjamin Aranguren wrote:
> I got it now. both modules need to be backported as well. I'm on it.
>
> On 8/26/07, Benjamin Aranguren wrote:
> > No problem. Created issue 1026 in tracker with a single patch file attached.
> >
> > I'm not aware of what changes need to be done with _abcoll.py and
> > collections.py. If you can point me to the right direction, I would
> > definitely like to work on it.
> >
> > On 8/26/07, Guido van Rossum wrote:
> > > Thanks!
> > > > > > Would it inconvenience you terribly to upload this all to the new > > > tracker (bugs.python.org)? Preferably as a single patch against the > > > svn trunk (to use svn diff, you have to svn add the new files first!) > > > > > > Also, are you planning to work on _abcoll.py and the changes to collections.py? > > > > > > --Guido > > > > > > On 8/26/07, Benjamin Aranguren wrote: > > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > > > > > After making all the changes we ran all the tests to ensure that no > > > > other modules were affected. > > > > > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > > > > > > > On 8/25/07, Guido van Rossum wrote: > > > > > Um, that patch contains only the C code for overloading isinstance() > > > > > and issubclass(). > > > > > > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > > > their respective unit tests? Or what about the unit tests for > > > > > isinstance()/issubclass()? > > > > > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > > > > > -- > > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > From eric+python-dev at trueblade.com Mon Sep 3 17:06:55 2007 From: eric+python-dev at trueblade.com (Eric Smith) Date: Mon, 03 Sep 2007 11:06:55 -0400 Subject: [Python-3000] str.format vs. string.Formatter exceptions Message-ID: <46DC230F.2040409@trueblade.com> Ron Adam points out some differences in which exceptions are thrown by str.format and string.Formatter. 
For example, on a missing positional argument:

>>> "{0}".format()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Not enough positional arguments in format string

>>> Formatter().format("{0}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/shared/src/python/py3k/Lib/string.py", line 201, in format
    return self.vformat(format_string, args, kwargs)
  File "/shared/src/python/py3k/Lib/string.py", line 220, in vformat
    obj, arg_used = self.get_field(field_name, args, kwargs)
  File "/shared/src/python/py3k/Lib/string.py", line 278, in get_field
    obj = self.get_value(first, args, kwargs)
  File "/shared/src/python/py3k/Lib/string.py", line 235, in get_value
    return args[key]
IndexError: tuple index out of range

The PEP says: In general, exceptions generated by the formatter code
itself are of the "ValueError" variety -- there is an error in the
actual "value" of the format string.

I can easily change string.Formatter to make this a ValueError, and I
think that's probably the right thing to do. For example, if the string
comes from a translation module, then there might be an extra parameter
added by mistake, in which case ValueError seems right to me.

But I'd like to hear if anyone else thinks this should be an IndexError,
or maybe they both should be some other exception.

Similarly '"{x}".format()' currently raises ValueError, but
'Formatter().format("{x}")' raises KeyError.

From nick.bastin at gmail.com Mon Sep 3 20:54:45 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Mon, 3 Sep 2007 14:54:45 -0400
Subject: [Python-3000] Performance Notes
Message-ID: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com>

I've been doing some profiling of 3.0 vs. 2.6 release builds on
Windows XP for the purpose of hopefully closing the performance gap.
This data is very preliminary, but I thought I'd throw it out here in
case someone else also wanted to look into this.
Also, possibly useful for comparing against profiling data on other
platforms. The table below just lists functions and speed differentials
in 3.0 vs. 2.6, ordered by the functions in which we spend the most
total time.

NOTE: This data is time sampling, not call graph. Added time could
come from either more calls, or longer calls.

+ 11.5%  PyEval_EvalFrameEx
+ 40.2%  lookdict (replacing lookdict_string)
+312.9%  PyDict_GetItem
- 13.2%  call_function
+ 19.4%  fast_function

Other notes:
 * PyLong_FitsInLong consumes about 2% of total pystone runtime.
 * unicode_compare consumes the exact same time in 3.0 that
   string_richcompare consumed in 2.6. Either these functions share a
   similar CPU profile, or their call counts vary dramatically.

Top 5 functions in Python 2.6:

 * PyEval_EvalFrameEx (48.66%)
 * lookdict_string (5.76%)
 * call_function (4.80%)
 * frame_dealloc (2.80%)
 * fast_function (2.48%)

Top 5 functions in Python 3.0:

 * PyEval_EvalFrameEx (44.37%)
 * lookdict (6.66%)
 * PyDict_GetItem (4.63%)
 * unicode_hash (3.51%)
 * call_function (3.38%)

--
Nick

From thomas at python.org Tue Sep 4 01:33:33 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 4 Sep 2007 01:33:33 +0200
Subject: [Python-3000] Merging between trunk and py3k?
In-Reply-To:
References:
Message-ID: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com>

On 8/31/07, Guido van Rossum wrote:
>
> I haven't heard yet that merging is impossible or useless; there's
> still a lot of similarity between the trunk and the branch.

Merging is sometimes hard, but always fun. Well, challenging. A Chinese
kind of interesting time. It certainly forces the merger to keep up to
date on changes in both branches :-) I'll happily keep on merging until
at least 3.0final is released, quite possibly until 2.x is nailed to its
perch. I wouldn't even mind doing that after the reindent of the py3k C
source; everything would conflict, but 'diff -cbB' solves that nicely.

--
Thomas Wouters

Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070904/603706c9/attachment.htm From guido at python.org Tue Sep 4 04:25:17 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Sep 2007 19:25:17 -0700 Subject: [Python-3000] Merging between trunk and py3k? In-Reply-To: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> References: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> Message-ID: Thanks for volunteering! Let me know when you're short on time and I'll take over (or appoint another volunteer :). --Guido On 9/3/07, Thomas Wouters wrote: > > > On 8/31/07, Guido van Rossum wrote: > > I haven't heard yet that merging is impossible or useless; there's > > still a lot of similarity between the trunk and the branch. > > Merging is sometimes hard, but always fun. Well, challenging. A Chinese kind > of interesting time. It certainly forces the merger to keep up to date on > changes in both branches :-) I'll happily keep on merging until at least > 3.0final is released, quite possibly until 2.x is nailed to its perch. I > wouldn't even mind doing that after the reindent of the py3k C source; > everything would conflict, but 'diff -cbB' solves that nicely. > > -- > Thomas Wouters > > Hi! I'm a .signature virus! copy me into your .signature file to help me > spread! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nick.bastin at gmail.com Tue Sep 4 04:33:37 2007 From: nick.bastin at gmail.com (Nicholas Bastin) Date: Mon, 3 Sep 2007 22:33:37 -0400 Subject: [Python-3000] Merging between trunk and py3k? 
In-Reply-To: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> References: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> Message-ID: <66d0a6e10709031933i3b8c0d88ma11429329a4b311d@mail.gmail.com> On 9/3/07, Thomas Wouters wrote: > > > On 8/31/07, Guido van Rossum wrote: > > I haven't heard yet that merging is impossible or useless; there's > > still a lot of similarity between the trunk and the branch. > > Merging is sometimes hard, but always fun. Well, challenging. A Chinese kind > of interesting time. Merging in SVN is hard and challenging. Merging in a reasonable SCM is not so bad. :-) (Unfortunately, in this context, read "reasonable" as "commercial") -- Nick From rrr at ronadam.com Tue Sep 4 04:38:22 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 03 Sep 2007 21:38:22 -0500 Subject: [Python-3000] str.format vs. string.Formatter exceptions In-Reply-To: <46DC230F.2040409@trueblade.com> References: <46DC230F.2040409@trueblade.com> Message-ID: <46DCC51E.3050809@ronadam.com> Eric Smith wrote: > Ron Adam points out some differences in which exceptions are thrown by > str.format and string.Formatter. 
For example, on a missing positional > argument: > > >>> "{0}".format() > Traceback (most recent call last): > File "", line 1, in > ValueError: Not enough positional arguments in format string > > >>> Formatter().format("{0}") > Traceback (most recent call last): > File "", line 1, in > File "/shared/src/python/py3k/Lib/string.py", line 201, in format > return self.vformat(format_string, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 220, in vformat > obj, arg_used = self.get_field(field_name, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 278, in get_field > obj = self.get_value(first, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 235, in get_value > return args[key] > IndexError: tuple index out of range > > The PEP says: In general, exceptions generated by the formatter code > itself are of the "ValueError" variety -- there is an error in the > actual "value" of the format string. The PEP also says the following in regards to this... +---------------- Implementation note: The implementation of this proposal is not required to enforce the rule about a name being a valid Python identifier. Instead, it will rely on the getattr function of the underlying object to throw an exception if the identifier is not legal. The str.format() function will have a minimalist parser which only attempts to figure out when it is "done" with an identifier (by finding a '.' or a ']', or '}', etc.). +---------------- If these return ValueErrors, as I think it has been suggested in the earlier messages, then this will need to be updated as well. _RON > I can easily change string.Formatter to make this a ValueError, and I > think that's probably the right thing to do. For example, if the string > comes from a translation module, then there might be an extra parameter > added by mistake, in which case ValueError seems right to me. 
> > But I'd like to hear if anyone else thinks this should be an IndexError, > or maybe they both should be some other exception. > > Similarly "{x}".format()' currently raises ValueError, but > 'Formatter().format("{x}")' raises KeyError. > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/rrr%40ronadam.com > > From aahz at pythoncraft.com Tue Sep 4 05:09:24 2007 From: aahz at pythoncraft.com (Aahz) Date: Mon, 3 Sep 2007 20:09:24 -0700 Subject: [Python-3000] Merging between trunk and py3k? In-Reply-To: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> References: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> Message-ID: <20070904030923.GA18848@panix.com> On Tue, Sep 04, 2007, Thomas Wouters wrote: > > Merging is sometimes hard, but always fun. Well, challenging. A > Chinese kind of interesting time. Not so Chinese, actually: http://www.noblenet.org/reference/inter.htm -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Many customs in this life persist because they ease friction and promote productivity as a result of universal agreement, and whether they are precisely the optimal choices is much less important." --Henry Spencer http://www.lysator.liu.se/c/ten-commandments.html From guido at python.org Tue Sep 4 05:09:28 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Sep 2007 20:09:28 -0700 Subject: [Python-3000] str.format vs. string.Formatter exceptions In-Reply-To: <46DC230F.2040409@trueblade.com> References: <46DC230F.2040409@trueblade.com> Message-ID: Since IndexError and KeyError are conceptually like ValueError but in a more narrowly defined context, I think IndexError and KeyError actually make sense here (even though they don't inherit from ValueError). 
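
A minimal sketch of that relationship, assuming a current Python 3
interpreter rather than the 2007 py3k branch quoted in this thread (the
exceptions raised by str.format itself may differ between the two). The
helper exception_name is hypothetical and exists only for illustration:

```python
from string import Formatter

# IndexError and KeyError share the LookupError base class;
# neither of them is a subclass of ValueError.
lookup_is_not_value = (
    issubclass(IndexError, LookupError)
    and issubclass(KeyError, LookupError)
    and not issubclass(IndexError, ValueError)
    and not issubclass(KeyError, ValueError)
)


def exception_name(func):
    """Return the name of the exception type raised by func(), or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


# string.Formatter looks arguments up in the args tuple / kwargs dict,
# so a missing argument surfaces as the container's own lookup error.
missing_positional = exception_name(lambda: Formatter().format("{0}"))
missing_keyword = exception_name(lambda: Formatter().format("{x}"))
```

Code that wants to treat both cases uniformly could catch LookupError,
which covers IndexError and KeyError but not ValueError.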
--Guido On 9/3/07, Eric Smith wrote: > Ron Adam points out some differences in which exceptions are thrown by > str.format and string.Formatter. For example, on a missing positional > argument: > > >>> "{0}".format() > Traceback (most recent call last): > File "", line 1, in > ValueError: Not enough positional arguments in format string > > >>> Formatter().format("{0}") > Traceback (most recent call last): > File "", line 1, in > File "/shared/src/python/py3k/Lib/string.py", line 201, in format > return self.vformat(format_string, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 220, in vformat > obj, arg_used = self.get_field(field_name, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 278, in get_field > obj = self.get_value(first, args, kwargs) > File "/shared/src/python/py3k/Lib/string.py", line 235, in get_value > return args[key] > IndexError: tuple index out of range > > The PEP says: In general, exceptions generated by the formatter code > itself are of the "ValueError" variety -- there is an error in the > actual "value" of the format string. > > I can easily change string.Formatter to make this a ValueError, and I > think that's probably the right thing to do. For example, if the string > comes from a translation module, then there might be an extra parameter > added by mistake, in which case ValueError seems right to me. > > But I'd like to hear if anyone else thinks this should be an IndexError, > or maybe they both should be some other exception. > > Similarly "{x}".format()' currently raises ValueError, but > 'Formatter().format("{x}")' raises KeyError. 
> _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 4 05:16:43 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Sep 2007 20:16:43 -0700 Subject: [Python-3000] Performance Notes In-Reply-To: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com> References: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com> Message-ID: Interesting! Thanks for doing this. We'll need a lot of this over the coming year. I read in this that the increased cost is largely due to using unicode strings for all variable and attribute names. So the next step might be to optimize the snot out of unicode hashing and introduce the unicode equivalent of lookup_string (while retiring the 8-bit version). The unicode type has never received the same amount of love that the 8-bit str type received over the years (and from day zero). BTW this goes to show that int operations are *not* (yet) the biggest bottleneck -- though I'm sure they're bubbling under. PS It would be interesting to collect more "holistic" benchmarks (micro-benchmarks aren't particularly interesting in this stage, as we're trying to improve *overall* performance). --Guido On 9/3/07, Nicholas Bastin wrote: > I've been doing some profiling of 3.0 vs. 2.6 release builds on > Windows XP for the purpose of hopefully closing the performance gap. > This data is very preliminary, but I thought I'd throw it out here in > case someone else also wanted to look into this. Also, possibly > useful for comparing against profiling data on other platforms. The > table below just lists functions and speed differentials in 3.0 vs. > 2.6, ordered by the functions in which we spend the most total time. 
> > NOTE: This data is time sampling, not call graph. Added time could > come from either more calls, or longer calls. > > + 11.5% PyEval_EvalFrameEx > + 40.2% lookdict (replacing lookdict_string) > +312.9% PyDict_GetItem > - 13.2% call_function > + 19.4% fast_function > > Other notes: > * PyLong_FitsInLong consumes about 2% of total pystone runtime. > * unicode_compare consumes the exact same time in 3.0 that > string_richcompare consumed in 2.6. Either these functions share a > similar CPU profile, or their call counts vary dramatically. > > Top 5 functions in Python 2.6: > > * PyEval_EvalFrameEx (48.66%) > * lookdict_string (5.76%) > * call_function (4.80%) > * frame_dealloc (2.80%) > * fast_function (2.48%) > > Top 5 functions in Python 3.0: > > * PyEval_EvalFrameEx (44.37%) > * lookdict (6.66%) > * PyDict_GetItem (4.63%) > * unicode_hash (3.51%) > * call_function (3.38%) > > -- > Nick > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Sep 4 05:30:41 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Sep 2007 20:30:41 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: You're going to have to do some spelunking in the 3.0 source (because I don't have time right now :-), but I think 3.0 has some magic that solves this. I *think* it is done by not inheriting tp_hash unless tp_richcompare is also inherited. The details are probably in typeobject.c. Ask me again tomorrow if you can't figure it out. --Guido On 9/3/07, Benjamin Aranguren wrote: > I am having a problem backporting collections.py/_abcoll.py and would > like to get your input. > > There's one test in test_collections that fails. 
> > class TestOneTrickPonyABCs(unittest.TestCase): > > def test_Hashable(self): > # Check some non-hashables > non_samples = [list(), set(), dict()] > for x in non_samples: > self.failIf(isinstance(x, Hashable), repr(x)) > self.failIf(issubclass(type(x), Hashable), repr(type(x))) > > The problem is list, set, dict all has __hash__ function so isinstance > and issubclass returns true even though none of list, set, and dict > was registered as a subclass of Hashable. > > But, calling x.__hash__() on these types results to a TypeError: list > objects are unhashable. > > Thanks! > > On 8/26/07, Benjamin Aranguren wrote: > > I got it now. both modules need to be backported as well. I'm on it. > > > > On 8/26/07, Benjamin Aranguren wrote: > > > No problem. Created issue 1026 in tracker with a single patch file attached. > > > > > > I'm not aware of what changes need to be done with _abcoll.py and > > > collections.py. If you can point me to the right direction, I would > > > definitely like to work on it. > > > > > > On 8/26/07, Guido van Rossum wrote: > > > > Thanks! > > > > > > > > Would it inconvenience you terribly to upload this all to the new > > > > tracker (bugs.python.org)? Preferably as a single patch against the > > > > svn trunk (to use svn diff, you have to svn add the new files first!) > > > > > > > > Also, are you planning to work on _abcoll.py and the changes to collections.py? > > > > > > > > --Guido > > > > > > > > On 8/26/07, Benjamin Aranguren wrote: > > > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > > > > > > > After making all the changes we ran all the tests to ensure that no > > > > > other modules were affected. > > > > > > > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. > > > > > > > > > > On 8/25/07, Guido van Rossum wrote: > > > > > > Um, that patch contains only the C code for overloading isinstance() > > > > > > and issubclass(). 
> > > > > > > > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > > > > their respective unit tests? Or what about the unit tests for > > > > > > isinstance()/issubclass()? > > > > > > > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > > > > > > > -- > > > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From baranguren at gmail.com Tue Sep 4 05:37:00 2007 From: baranguren at gmail.com (Benjamin Aranguren) Date: Mon, 3 Sep 2007 20:37:00 -0700 Subject: [Python-3000] backported ABC In-Reply-To: References: Message-ID: Thanks! This helps. I was just not sure if I was on the right track or not. I did try disabling &list_nohash in listobject.c I think I have the right idea and just needed some reassurance. I'll give it another try. Thanks again. On 9/3/07, Guido van Rossum wrote: > You're going to have to do some spelunking in the 3.0 source (because > I don't have time right now :-), but I think 3.0 has some magic that > solves this. I *think* it is done by not inheriting tp_hash unless > tp_richcompare is also inherited. The details are probably in > typeobject.c. > > Ask me again tomorrow if you can't figure it out. > > --Guido > > On 9/3/07, Benjamin Aranguren wrote: > > I am having a problem backporting collections.py/_abcoll.py and would > > like to get your input. > > > > There's one test in test_collections that fails. 
> > > > class TestOneTrickPonyABCs(unittest.TestCase): > > > > def test_Hashable(self): > > # Check some non-hashables > > non_samples = [list(), set(), dict()] > > for x in non_samples: > > self.failIf(isinstance(x, Hashable), repr(x)) > > self.failIf(issubclass(type(x), Hashable), repr(type(x))) > > > > The problem is list, set, dict all has __hash__ function so isinstance > > and issubclass returns true even though none of list, set, and dict > > was registered as a subclass of Hashable. > > > > But, calling x.__hash__() on these types results to a TypeError: list > > objects are unhashable. > > > > Thanks! > > > > On 8/26/07, Benjamin Aranguren wrote: > > > I got it now. both modules need to be backported as well. I'm on it. > > > > > > On 8/26/07, Benjamin Aranguren wrote: > > > > No problem. Created issue 1026 in tracker with a single patch file attached. > > > > > > > > I'm not aware of what changes need to be done with _abcoll.py and > > > > collections.py. If you can point me to the right direction, I would > > > > definitely like to work on it. > > > > > > > > On 8/26/07, Guido van Rossum wrote: > > > > > Thanks! > > > > > > > > > > Would it inconvenience you terribly to upload this all to the new > > > > > tracker (bugs.python.org)? Preferably as a single patch against the > > > > > svn trunk (to use svn diff, you have to svn add the new files first!) > > > > > > > > > > Also, are you planning to work on _abcoll.py and the changes to collections.py? > > > > > > > > > > --Guido > > > > > > > > > > On 8/26/07, Benjamin Aranguren wrote: > > > > > > We copied abc.py and test_abc.py from py3k svn and modified to work with 2.6. > > > > > > > > > > > > After making all the changes we ran all the tests to ensure that no > > > > > > other modules were affected. > > > > > > > > > > > > Attached are abc.py, test_abc.py, and their relevant patches from 3.0 to 2.6. 
> > > > > > > > > > > > On 8/25/07, Guido van Rossum wrote: > > > > > > > Um, that patch contains only the C code for overloading isinstance() > > > > > > > and issubclass(). > > > > > > > > > > > > > > Did you do anything about abc.py and _abcoll.py/collections.py and > > > > > > > their respective unit tests? Or what about the unit tests for > > > > > > > isinstance()/issubclass()? > > > > > > > > > > > > > > On 8/25/07, Benjamin Aranguren wrote: > > > > > > > > Worked with Alex Martelli at the Goolge Python Sprint. > > > > > > > > > > > > > > -- > > > > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > > > > > > > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From rhamph at gmail.com Tue Sep 4 06:10:45 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 3 Sep 2007 22:10:45 -0600 Subject: [Python-3000] Performance Notes In-Reply-To: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com> References: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com> Message-ID: On 9/3/07, Nicholas Bastin wrote: > I've been doing some profiling of 3.0 vs. 2.6 release builds on > Windows XP for the purpose of hopefully closing the performance gap. > This data is very preliminary, but I thought I'd throw it out here in > case someone else also wanted to look into this. Also, possibly > useful for comparing against profiling data on other platforms. The > table below just lists functions and speed differentials in 3.0 vs. > 2.6, ordered by the functions in which we spend the most total time. 
> > NOTE: This data is time sampling, not call graph. Added time could > come from either more calls, or longer calls. > > + 11.5% PyEval_EvalFrameEx > + 40.2% lookdict (replacing lookdict_string) > +312.9% PyDict_GetItem > - 13.2% call_function > + 19.4% fast_function lookdict_string appears to still use the old string type, rather than unicode. This prevents it from being used. It's probably not too hard to fix. > Other notes: > * PyLong_FitsInLong consumes about 2% of total pystone runtime. > * unicode_compare consumes the exact same time in 3.0 that > string_richcompare consumed in 2.6. Either these functions share a > similar CPU profile, or their call counts vary dramatically. > > Top 5 functions in Python 2.6: > > * PyEval_EvalFrameEx (48.66%) > * lookdict_string (5.76%) > * call_function (4.80%) > * frame_dealloc (2.80%) > * fast_function (2.48%) > > Top 5 functions in Python 3.0: > > * PyEval_EvalFrameEx (44.37%) > * lookdict (6.66%) > * PyDict_GetItem (4.63%) > * unicode_hash (3.51%) > * call_function (3.38%) -- Adam Olsen, aka Rhamphoryncus From amk at amk.ca Mon Sep 3 18:53:47 2007 From: amk at amk.ca (A.M. Kuchling) Date: Mon, 3 Sep 2007 12:53:47 -0400 Subject: [Python-3000] [mark@qtrac.eu: Poss. clarification for What's New in Python 3] Message-ID: <20070903165347.GA24392@mac.local> Forwarded: a comment on the 3.0 What's New. --amk -------------- next part -------------- An embedded message was scrubbed... From: Mark Summerfield Subject: Poss. clarification for What's New in Python 3 Date: Sat, 1 Sep 2007 08:55:42 +0100 Size: 3400 Url: http://mail.python.org/pipermail/python-3000/attachments/20070903/787db536/attachment.eml From g.brandl at gmx.net Tue Sep 4 08:23:11 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 04 Sep 2007 08:23:11 +0200 Subject: [Python-3000] What about operator.*slice? Message-ID: Are they useful enough to keep? -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. 
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From thomas at python.org Tue Sep 4 10:43:08 2007 From: thomas at python.org (Thomas Wouters) Date: Tue, 4 Sep 2007 10:43:08 +0200 Subject: [Python-3000] Merging between trunk and py3k? In-Reply-To: <66d0a6e10709031933i3b8c0d88ma11429329a4b311d@mail.gmail.com> References: <9e804ac0709031633w705f2c9fkb0cf3ef98a62840c@mail.gmail.com> <66d0a6e10709031933i3b8c0d88ma11429329a4b311d@mail.gmail.com> Message-ID: <9e804ac0709040143q23bcd22by78bf66e4138faa83@mail.gmail.com> On 9/4/07, Nicholas Bastin wrote: > > On 9/3/07, Thomas Wouters wrote: > > > > > > On 8/31/07, Guido van Rossum wrote: > > > I haven't heard yet that merging is impossible or useless; there's > > > still a lot of similarity between the trunk and the branch. > > > > Merging is sometimes hard, but always fun. Well, challenging. A Chinese > kind > > of interesting time. > > Merging in SVN is hard and challenging. Merging in a reasonable SCM > is not so bad. :-) Merging two direct sibling branches with svnmerge is actually quite doable. It's slightly more annoying than it would be in an SCM with proper branch merging, but not significantly so. The merges we're doing would be about as hard and challenging in any other SCM. I know, I actually did them. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070904/5f05bb83/attachment-0001.htm

From noamraph at gmail.com  Tue Sep  4 10:49:53 2007
From: noamraph at gmail.com (Noam Raphael)
Date: Tue, 4 Sep 2007 11:49:53 +0300
Subject: [Python-3000] Default dict iterator should have been iteritems()
Message-ID: 

Hello,

Just a thought that came to me after writing some code that deals quite
a lot with dicts:

The default dict iterator should in principle be iteritems(), and not
iterkeys(). This is probably just theoretical, since it will break a lot
of code and not gain a lot, but it may be remembered when someone
decides to write a new language...

The reasoning is simple: Iteration over an object usually gets all the
data it contains. A dict can be seen as an unordered collection of
tuples (key, value), indexed by key. So, iteration over a dict should
yield those tuples.

For this reason, I think that "for key, value in dict.iteritems()" is
more common than "for key in dict" - When iterating over a dict, you are
usually interested in both the key and the value.

Another point: if the default dict iterator were iteritems(), the dict
copy constructor would not have been a special case - dict(x) always
gets an iterable over tuples and produces a new dict. Currently, if you
want to produce a dict from a UserDict, for example, you must call
dict(userdict.iteritems()).

As I see it, the only reason for the current status is the desire to
make "x in dict" equivalent to "dict.has_key(x)", since has_key is a
common operation and "x in" is shorter. But actually "dict.has_key(x)"
explains exactly what's intended, while "x in dict" isn't really clear
(for newbies, that is): do you ask whether x is in dict.keys(), or in
dict.values(), or in dict.items()?

Of course, if dict's default iterator were iteritems(), "x in dict"
should have meant "x in dict.items()", which is very easy to implement.

What do you think?
Noam

From thomas at python.org  Tue Sep  4 10:56:20 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 4 Sep 2007 10:56:20 +0200
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: 
References: 
Message-ID: <9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>

On 9/4/07, Georg Brandl wrote:
>
> Are they useful enough to keep?

operator.*slice? They're rather convenient when you don't want to bother
with creating a slice object yourself, but I'm not worried either way.

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From g.brandl at gmx.net  Tue Sep  4 11:09:07 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 11:09:07 +0200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
Message-ID: 

Noam Raphael schrieb:

> As I see it, the only reason for the current status is the desire to
> make "x in dict" equivalent to "dict.has_key(x)", since has_key is a
> common operation and "x in" is shorter. But actually "dict.has_key(x)"
> explains exactly what's intended, while "x in dict" isn't really clear
> (for newbies, that is): do you ask whether x is in dict.keys(), or in
> dict.values(), or in dict.items()?

Even if it's true that a loop over items is more common than a loop over
keys, "x in keys" is much more common than "x in items".

In every language there are things that must be learned and remembered.
That dict.__iter__ yields keys is one of them.

(You could present similar arguments that speak in favor of dict.__iter__
yielding values...)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four.
Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From greg.ewing at canterbury.ac.nz  Tue Sep  4 11:30:14 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 04 Sep 2007 21:30:14 +1200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
Message-ID: <46DD25A6.6070504@canterbury.ac.nz>

Noam Raphael wrote:
> The default dict iterator should in principle be iteritems(), and not
> iterkeys().

This was discussed at length back when "in" support was
added to dicts. There were reasons for choosing to do it
the way it's done, and I don't think it's likely to be
changed.

--
Greg

From theller at ctypes.org  Tue Sep  4 11:34:46 2007
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 04 Sep 2007 11:34:46 +0200
Subject: [Python-3000] Confused about getattr() and special methods
Message-ID: 

I was looking into the Lib\test\test_uuid on Windows, which fails with
this traceback:

test test_uuid failed -- Traceback (most recent call last):
  File "C:\buildbot\work\3.0.heller-windows\build\lib\test\test_uuid.py", line 323, in test_ipconfig_getnode
    node = uuid._ipconfig_getnode()
  File "C:\buildbot\work\3.0.heller-windows\build\lib\uuid.py", line 376, in _ipconfig_getnode
    for line in pipe:
TypeError: '_wrap_close' object is not iterable

The test can be fixed with this little patch:

Index: Lib/os.py
===================================================================
--- Lib/os.py	(revision 57827)
+++ Lib/os.py	(working copy)
@@ -664,6 +664,8 @@
         return self._proc.wait() << 8  # Shift left to match old behavior
     def __getattr__(self, name):
         return getattr(self._stream, name)
+    def __iter__(self):
+        return iter(self._stream)
 
 # Supply os.fdopen() (used by subprocess!)
 def fdopen(fd, mode="r", buffering=-1):

However, looking further into this I'm getting confused.

Shouldn't the __getattr__ implementation find the __iter__ method
of the _stream instance variable?

Consider this code:

##__metaclass__ = type

class X:
    def __str__(self):
        return "foo"
    def __len__(self):
        return 42
    def __iter__(self):
        return iter([1, 2, 3])

class proxy:
    def __init__(self):
        self.x = X()
    def __getattr__(self, name):
        return getattr(self.x, name)

p = proxy()

print(len(p))
print(str(p))
print(iter(p))



In Python2.5 and trunk, all the calls len(p), str(p), and iter(p) return the attributes
of the X class instance.  Uncommenting the '__metaclass__ = type' line makes the code fail.

IIUC, in py3k, classic classes do not exist any longer, so the __metaclass__ line
has no effect anyway.  Is this behaviour intended?

Thomas


From thomas at python.org  Tue Sep  4 12:00:16 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 4 Sep 2007 12:00:16 +0200
Subject: [Python-3000] Confused about getattr() and special methods
In-Reply-To: 
References: 
Message-ID: <9e804ac0709040300l1a79d22bv7811b054a5245380@mail.gmail.com>

On 9/4/07, Thomas Heller  wrote:

> Shouldn't the __getattr__ implementation find the __iter__ method
> of the _stream instance variable?


No. For new-style classes, the special methods (that are part of the PyType
C struct) are always looked up on the class, never the instance. The class's
__getattr__ is never called. It's the class that defines behaviour, and
__getattr__ and __getattribute__ just define how to handle *instance*
attribute access.

This change is really the biggest difference between classic classes and
new-style classes, much bigger than the MRO change ;-)
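
A minimal sketch of the rule described above, written for current Python 3, where every class is new-style. The Proxy and IterableProxy names are invented for illustration; IterableProxy follows the same idea as the os.py patch by defining __iter__ on the class itself.

```python
class X:
    def __len__(self):
        return 42

    def __iter__(self):
        return iter([1, 2, 3])


class Proxy:
    """Forwards ordinary attribute access to a wrapped X instance."""

    def __init__(self):
        self.x = X()

    def __getattr__(self, name):
        # Only consulted for explicit attribute access on the instance.
        return getattr(self.x, name)


class IterableProxy(Proxy):
    """Same proxy, but __iter__ is defined on the class."""

    def __iter__(self):
        return iter(self.x)


p = Proxy()

# Explicit attribute access falls through to __getattr__ and works.
explicit = list(p.__iter__())

# The implicit protocol looks for __iter__ on type(p) and ignores __getattr__.
try:
    iter(p)
    implicit_ok = True
except TypeError:
    implicit_ok = False

# Defining the special method on the class restores iteration.
fixed = list(IterableProxy())
```

The same pattern applies to len(), str() and the other protocol functions: a delegating wrapper has to spell out each special method it wants to support.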

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From g.brandl at gmx.net  Tue Sep  4 12:09:12 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 12:09:12 +0200
Subject: [Python-3000] __special__ method lookup [was Re: Confused about
 getattr() and special methods]
In-Reply-To: 
References: 
Message-ID: 

Thomas Heller schrieb:

> IIUC, in py3k, classic classes do not exist any longer, so the __metaclass__ line
> has no effect anyway.  Is this behaviour intended?

It is another incarnation of special methods being looked up on the class,
not the instance. This was always the behavior with new-style classes, see
the thread at

http://mail.python.org/pipermail/python-3000/2007-March/006261.html

for a previous discussion.

I think we should tackle this issue now and make sure the decided resolution
is consistently applied throughout Python.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From ncoghlan at gmail.com  Tue Sep  4 12:27:49 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 04 Sep 2007 20:27:49 +1000
Subject: [Python-3000] __special__ method lookup [was Re: Confused about
 getattr() and special methods]
In-Reply-To: 
References:  
Message-ID: <46DD3325.4060601@gmail.com>

Georg Brandl wrote:
> Thomas Heller schrieb:
> 
>> IIUC, in py3k, classic classes do not exist any longer, so the __metaclass__ line
>> has no effect anyway.  Is this behaviour intended?
> 
> It is another incarnation of special methods being looked up on the class,
> not the instance. This was always the behavior with new-style classes, see
> the thread at
> 
> http://mail.python.org/pipermail/python-3000/2007-March/006261.html
> 
> for a previous discussion.
> 
> I think we should tackle this issue now and make sure the decided resolution
> is consistently applied throughout Python.

This issue came up when implementing PEP 343 as well - because the with 
statement is just syntactic sugar without any dedicated opcodes, 
__enter__/__exit__ are accessed via a conventional attribute lookup 
opcode. So unlike the special methods that use a C-level slot in the 
type object, these two operations *can* be affected by instance 
attributes and __getattr__.

However, Guido did say at the time that he was OK with the effect of 
instance attributes on special method lookups being formally undefined 
and implementation dependent. I wasn't too worried either way - mucking 
with special methods outside the scope of 'provide this on your class to 
support operation X' has long been a pretty dubious exercise.
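
A note for readers trying this on a recent interpreter: as far as I know, later CPython releases (2.7 and 3.1 onwards) changed the with statement to look up __enter__/__exit__ on the type, in line with the slot-based special methods. The sketch below uses a made-up Plain class and reports what the running interpreter does, rather than assuming either behaviour.

```python
class Plain:
    """A class that defines no context manager methods at all."""


obj = Plain()
# Attach the methods to the instance only, never to the class.
obj.__enter__ = lambda: "entered"
obj.__exit__ = lambda *exc_info: False

try:
    with obj:
        outcome = "instance attributes were used"
except (AttributeError, TypeError):
    # The exception type differs between Python versions, so catch both.
    outcome = "looked up on the type"
```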

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From noamraph at gmail.com  Tue Sep  4 13:16:07 2007
From: noamraph at gmail.com (Noam Raphael)
Date: Tue, 4 Sep 2007 14:16:07 +0300
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <46DD25A6.6070504@canterbury.ac.nz>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
Message-ID: 

On 9/4/07, Greg Ewing  wrote:
> Noam Raphael wrote:
> > The default dict iterator should in principle be iteritems(), and not
> > iterkeys().
>
> This was discussed at length back when "in" support was
> added to dicts. There were reasons for choosing to do it
> the way it's done, and I don't think it's likely to be
> changed.
>
Just out of curiosity - do you remember these reasons? I just have
the feeling that back then, iterations were less common, since you
couldn't iterate over dicts without creating new lists, and you didn't
have list comprehensions and generators. You couldn't write an
expression such as
  dict((x, y) for y, x in d)
to quickly get the inverse permutation, so the relative ugliness of
  dict((x, y) for y, x in d.items())
was not considered.

I don't think it's likely to be changed, either.

Noam

From g.brandl at gmx.net  Tue Sep  4 13:24:20 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 13:24:20 +0200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 	<46DD25A6.6070504@canterbury.ac.nz>
	
Message-ID: 

Noam Raphael schrieb:
> On 9/4/07, Greg Ewing  wrote:
>> Noam Raphael wrote:
>> > The default dict iterator should in principle be iteritems(), and not
>> > iterkeys().
>>
>> This was discussed at length back when "in" support was
>> added to dicts. There were reasons for choosing to do it
>> the way it's done, and I don't think it's likely to be
>> changed.
>>
> Just out of curiosity - do you remember these reasons? I just have
> the feeling that back then, iterations were less common, since you
> couldn't iterate over dicts without creating new lists, and you didn't
> have list comprehensions and generators. You couldn't write an
> expression such as
>   dict((x, y) for y, x in d)
> to quickly get the inverse permutation, so the relative ugliness of
>   dict((x, y) for y, x in d.items())
> was not considered.

Well, what about dict((x, d[x]) for x in d) ? Doesn't strike me as ugly...

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From nick.bastin at gmail.com  Tue Sep  4 14:34:43 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 4 Sep 2007 08:34:43 -0400
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
Message-ID: <66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>

On 9/4/07, Georg Brandl  wrote:
> Noam Raphael schrieb:
> > Just out of curiosity - do you remember these reasons? I just have
> > the feeling that back then, iterations were less common, since you
> > couldn't iterate over dicts without creating new lists, and you didn't
> > have list comprehensions and generators. You couldn't write an
> > expression such as
> >   dict((x, y) for y, x in d)
> > to quickly get the inverse permutation, so the relative ugliness of
> >   dict((x, y) for y, x in d.items())
> > was not considered.
>
> Well, what about dict((x, d[x]) for x in d) ? Doesn't strike me as ugly...

It doesn't strike me as ugly, it just strikes me as slow.  In C++, a
std::map<K, V>::iterator will give you a std::pair<const K, V>, and I've
often wanted such a construction in Python.  Right now to get a
similar thing, you pay something like O(n log n) (assuming d[x] is
O(log n)) instead of O(n).  Not to mention that we know that d[x] is
pretty expensive these days on common lookups, since we're not
dropping into the fast lookdict_string anymore.

--
Nick

From guido at python.org  Tue Sep  4 16:23:43 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 07:23:43 -0700
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: <9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
References: 
	<9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
Message-ID: 

Since x[a:b] is not basic syntax (like it once was) but simply the
combination of operator.getitem and slice() I don't see the point of
keeping the operator.*slice functions.
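
To make the equivalence concrete, here is a small sketch in Python 3 showing that the item functions in the operator module already cover slicing when they are given a slice object. The sample list is arbitrary.

```python
import operator

seq = list("abcdef")

# Slicing syntax and getitem-with-a-slice produce the same result.
via_syntax = seq[1:4]
via_getitem = operator.getitem(seq, slice(1, 4))

# The same holds for assignment and deletion.
work = list(seq)
operator.setitem(work, slice(0, 2), ["X"])   # like work[0:2] = ["X"]
operator.delitem(work, slice(-1, None))      # like del work[-1:]
```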

PS. I don't know how useful the operator module really is -- in all
those years it's existed I haven't really used it myself, and I'm
always baffled when I see code using it.

--Guido

On 9/4/07, Thomas Wouters  wrote:
>
>
> On 9/4/07, Georg Brandl  wrote:
> > Are they useful enough to keep?
>
> operator.*slice? They're rather convenient when you don't want to bother
> with creating a slice object yourself, but I'm not worried either way.
>
> --
>  Thomas Wouters 
>
> Hi! I'm a .signature virus! copy me into your .signature file to help me
> spread!
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep  4 16:31:53 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 07:31:53 -0700
Subject: [Python-3000] __special__ method lookup [was Re: Confused about
	getattr() and special methods]
In-Reply-To: <46DD3325.4060601@gmail.com>
References:  
	<46DD3325.4060601@gmail.com>
Message-ID: 

I only care about getting this right when there is a reasonable chance
that a class is being used as an object. For example, at the sprint we
ran into this with the __format__ special method, when someone
discovered that format(object, "") raised a weird error rather than
returning str(object), which was due to the default __format__ method
defined on the object class. It's important that you can format
*anything*, so we fixed this right away.
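
A quick way to check the property described above on a current Python 3: with an empty format spec, format() should agree with str() even when the argument is itself a class. The Anything class is a placeholder invented for this sketch.

```python
class Anything:
    """Placeholder class with no __format__ of its own."""


# A mix of type objects and ordinary instances.
samples = [object, int, Anything, Anything(), None, 3.5]

# For each sample, formatting with an empty spec should match str().
all_match = all(format(sample, "") == str(sample) for sample in samples)
```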

OTOH for the with-statement, the object passed to it is always
specially constructed to work in this context, and passing something
random like a type object just isn't a reasonable use case. As long as
you get *some* kind of error (and you do, usually complaining about
the arg count) I'm okay.

--Guido

On 9/4/07, Nick Coghlan  wrote:
> Georg Brandl wrote:
> > Thomas Heller schrieb:
> >
> >> IIUC, in py3k, classic classes do not exist any longer, so the __metaclass__ line
> >> has no effect anyway.  Is this behaviour intended?
> >
> > It is another incarnation of special methods being looked up on the class,
> > not the instance. This was always the behavior with new-style classes, see
> > the thread at
> >
> > http://mail.python.org/pipermail/python-3000/2007-March/006261.html
> >
> > for a previous discussion.
> >
> > I think we should tackle this issue now and make sure the decided resolution
> > is consistently applied throughout Python.
>
> This issue came up when implementing PEP 343 as well - because the with
> statement is just syntactic sugar without any dedicated opcodes,
> __enter__/__exit__ are accessed via a conventional attribute lookup
> opcode. So unlike the special methods that use a C-level slot in the
> type object, these two operations *can* be affected by instance
> attributes and __getattr__.
>
> However, Guido did say at the time that he was OK with the effect of
> instance attributes on special method lookups being formally undefined
> and implementation dependent. I wasn't too worried either way - mucking
> with special methods outside the scope of 'provide this on your class to
> support operation X' has long been a pretty dubious exercise.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep  4 16:36:09 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 07:36:09 -0700
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
Message-ID: 

On 9/4/07, Noam Raphael  wrote:
> On 9/4/07, Greg Ewing  wrote:
> > Noam Raphael wrote:
> > > The default dict iterator should in principle be iteritems(), and not
> > > iterkeys().
> >
> > This was discussed at length back when "in" support was
> > added to dicts. There were reasons for choosing to do it
> > the way it's done, and I don't think it's likely to be
> > changed.
> >
> Just out of curiosity - do you remember these reasons?

Consistency with "k in d", where you'll agree with me that the only
useful interpretation is checking for a key. It would be annoying if
"for x in obj:" no longer rhymed with "if x in obj:".

> I just have
> the feeling that back then, iterations were less common, since you
> couldn't iterate over dicts without creating new lists, and you didn't
> have list comprehensions and generators. You couldn't write an
> expression such as
>   dict((x, y) for y, x in d)
> to quickly get the inverse permutation, so the relative ugliness of
>   dict((x, y) for y, x in d.items())
> was not considered.
>
> I don't think it's likely to be changed, either.

I think it's even in PEP 3099 as something we *won't* change. I happen
to be rather fond of it myself.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Tue Sep  4 17:01:07 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 04 Sep 2007 17:01:07 +0200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
References: 	<46DD25A6.6070504@canterbury.ac.nz>		
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
Message-ID: <46DD7333.9060006@v.loewis.de>

> (assuming d[x] is  O(log n))

In Python, d[x] is typically considered to be O(1) (unlike in C++,
where it is O(log n)). Of course, with Python using a hashtable,
performance may decrease in the presence of collisions. In the
normal case, dict((x, d[x]) for x in d) will be O(n) in Python.
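
For comparison, a short sketch of the spellings under discussion, using the Python 3 items() name. It only shows that the results agree; it makes no timing claim, and the sample dict is arbitrary.

```python
d = {"a": 1, "b": 2, "c": 3}

# Iterate over keys and look each value up again (one hash lookup per key).
by_lookup = dict((k, d[k]) for k in d)

# Iterate over (key, value) pairs directly.
by_items = dict(d.items())

# The "inverse permutation" from earlier in the thread, via items().
inverse = {v: k for k, v in d.items()}
```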

Regards,
Martin

From ncoghlan at gmail.com  Tue Sep  4 17:09:12 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 05 Sep 2007 01:09:12 +1000
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 	<46DD25A6.6070504@canterbury.ac.nz>	
	
Message-ID: <46DD7518.7070108@gmail.com>

Guido van Rossum wrote:
> On 9/4/07, Noam Raphael  wrote:
>> On 9/4/07, Greg Ewing  wrote:
>>> Noam Raphael wrote:
>>>> The default dict iterator should in principle be iteritems(), and not
>>>> iterkeys().
>>> This was discussed at length back when "in" support was
>>> added to dicts. There were reasons for choosing to do it
>>> the way it's done, and I don't think it's likely to be
>>> changed.
>>>
>> Just out of curiosity - do you remember these reasons?
> 
> Consistency with "k in d", where you'll agree with me that the only
> useful interpretation is checking for a key. It would be annoying if
> "for x in obj:" no longer rhymed with "if x in obj:".

I would certainly be rather annoyed if the following code could blow up 
with an assertion error in the absence of any threading foolishness:

   for k in d:
       assert k in d

Containment and iteration really do need to be kept consistent and 
having the value matter when checking for dictionary containment would 
be outright bizarre. Put the two together and it makes sense for 
dictionary iteration and containment tests to both be based on keys.

Note that the other basic container types in the standard library 
(lists, tuples, sets, strings, xrange) also obey the 
iteration<->containment invariant above.
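
The invariant can be checked mechanically. The sketch below is for Python 3, so range stands in for xrange, and the helper name is invented for illustration.

```python
def iteration_matches_containment(container):
    """Return True if every item produced by iteration is 'in' the container."""
    return all(item in container for item in container)


containers = [[1, 2, 3], (1, 2, 3), {1, 2, 3}, "abc", range(3), {"a": 1, "b": 2}]
results = [iteration_matches_containment(c) for c in containers]

# Containment on a dict tests keys; pairs have to be tested against items().
d = {"a": 1, "b": 2}
pair_in_dict = ("a", 1) in d
pair_in_items = ("a", 1) in d.items()
```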

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From nick.bastin at gmail.com  Tue Sep  4 17:39:00 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 4 Sep 2007 11:39:00 -0400
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <46DD7333.9060006@v.loewis.de>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
	<46DD7333.9060006@v.loewis.de>
Message-ID: <66d0a6e10709040839u465530bcw38ba21b4886bc4a4@mail.gmail.com>

On 9/4/07, "Martin v. Löwis"  wrote:
> > (assuming d[x] is  O(log n))
>
> In Python, d[x] is typically considered to be O(1) (unlike in C++,
> where it is O(log n)). Of course, with Python using a hashtable,
> performance may decrease in the presence of collisions. In the
> normal case, dict((x, d[x]) for x in d) will be O(n) in Python.

Even if we suppose that d[x] is O(1) (and I don't have real data to
say whether most uses of it actually conform to this, besides keyword
argument passing), that still makes:

[(x, d[x]) for x in d]

O(2n), which is O(n), but only pedantically.  In the real world, 2n is
still worse than n (and the hashtable means that it can devolve into
O(n**2) in the worst case).  However, all that said, you'd probably
never write the above line of code, and d.iteritems() will continue to
suffice if there are concerns about 'for (k,v) in d' being materially
different than 'if x in d'.

--
Nick

From g.brandl at gmx.net  Tue Sep  4 17:45:38 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 17:45:38 +0200
Subject: [Python-3000] abc docs
Message-ID: 

I've added a basic skeleton of documentation for the "abc" module, but it
would be nice if somebody proofread it and added more from PEP 3119 if
desired.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From guido at python.org  Tue Sep  4 18:17:36 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 09:17:36 -0700
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <66d0a6e10709040839u465530bcw38ba21b4886bc4a4@mail.gmail.com>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
	<46DD7333.9060006@v.loewis.de>
	<66d0a6e10709040839u465530bcw38ba21b4886bc4a4@mail.gmail.com>
Message-ID: 

On 9/4/07, Nicholas Bastin  wrote:
> On 9/4/07, "Martin v. Löwis"  wrote:
> > > (assuming d[x] is  O(log n))
> >
> > In Python, d[x] is typically considered to be O(1) (unlike in C++,
> > where it is O(log n)). Of course, with Python using a hashtable,
> > performance may decrease in the presence of collisions. In the
> > normal case, dict((x, d[x]) for x in d) will be O(n) in Python.
>
> Even if we suppose that d[x] is O(1) (and I don't have real data to
> say whether most uses of it actually conform to this, besides keyword
> argument passing), that still makes:
>
> [(x, d[x]) for x in d]
>
> O(2n), which is O(n), but only pedantically.  In the real world, 2n is
> still worse than n (and the hashtable means that it can devolve into
> O(n**2) in the worst case).

You shouldn't be using words whose meaning you don't understand.

> However, all that said, you'd probably
> never write the above line of code, and d.iteritems() will continue to
> suffice if there are concerns about 'for (k,v) in d' being materially
> different than 'if x in d'.

Since this is the python-3000 list, d.items() is what you're looking for.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep  4 18:23:47 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 09:23:47 -0700
Subject: [Python-3000] [mark@qtrac.eu: Poss. clarification for What's
	New in Python 3]
In-Reply-To: <20070903165347.GA24392@mac.local>
References: <20070903165347.GA24392@mac.local>
Message-ID: 

Thanks, Mark! Fixed by changing "B\n" into "B". :-)
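
The print() behaviour in the forwarded note can be checked by capturing the output in a string buffer. This is a sketch for Python 3; the captured() helper is made up for the purpose.

```python
import io


def captured(*args, **kwargs):
    """Run print() with the given arguments and return what it wrote."""
    buffer = io.StringIO()
    print(*args, file=buffer, **kwargs)
    return buffer.getvalue()


# Defaults: a space between items and a newline at the end.
default_output = captured("A\n", "B\n")

# sep and end control what is written between items and after the last one.
bare_output = captured("A\n", "B\n", sep="", end="")
```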

On 9/3/07, A.M. Kuchling  wrote:
> Forwarded: a comment on the 3.0 What's New.
>
> --amk
>
>
> ---------- Forwarded message ----------
> From: Mark Summerfield 
> To: comments at amk.ca
> Date: Sat, 1 Sep 2007 08:55:42 +0100
> Subject: Poss. clarification for What's New in Python 3
> Hi,
>
> In the What's New in Python 3 document you say
>
>     For example, in Python 2.x, print "A\n", "B\n" would write "A\nB\n";
>     but in Python 3.0, print("A\n", "B\n") writes "A\n B\n".
>
>
> I would be tempted to change this to:
>
>     For example, in Python 2.x, print "A\n", "B\n" would write "A\nB\n\n";
>     but in Python 3.0, print("A\n", "B\n") writes "A\n B\n\n".
>     Python 3's print() has keyword arguments to control what's
>     output between items and what is output at the end, for example,
>     print("A\n", "B\n", sep="", end="") writes "A\nB\n".
>
> --
> Mark Summerfield, Qtrac Ltd., www.qtrac.eu
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Tue Sep  4 18:52:45 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 4 Sep 2007 12:52:45 -0400
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
	<46DD7333.9060006@v.loewis.de>
	<66d0a6e10709040839u465530bcw38ba21b4886bc4a4@mail.gmail.com>
	
Message-ID: <66d0a6e10709040952p472b1bb4q3dcd46b1ac5127ff@mail.gmail.com>

On 9/4/07, Guido van Rossum  wrote:
> On 9/4/07, Nicholas Bastin  wrote:
> > However, all that said, you'd probably
> > never write the above line of code, and d.iteritems() will continue to
> > suffice if there are concerns about 'for (k,v) in d' being materially
> > different than 'if x in d'.
>
> Since this is the python-3000 list, d.items() is what you're looking for.

My mistake, I had referred back to the 3.0 documentation, which still
claims that iteritems is a method.

--
Nick

From greg at krypto.org  Tue Sep  4 19:11:14 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 4 Sep 2007 11:11:14 -0600
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <46DD7333.9060006@v.loewis.de>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
	<46DD7333.9060006@v.loewis.de>
Message-ID: <52dc1c820709041011r64acd37et88cc664350e95e92@mail.gmail.com>

On 9/4/07, "Martin v. Löwis"  wrote:
>
> > (assuming d[x] is  O(log n))
>
> In Python, d[x] is typically considered to be O(1) (unlike in C++,
> where it is O(log n)). Of course, with Python using a hashtable,
> performance may decrease in the presence of collisions. In the
> normal case, dict((x, d[x]) for x in d) will be O(n) in Python.


And if the speed of d[x] were ever an issue that shows up on python
performance profiles when used in a loop like that it would be pretty easy
to optimize the common case internally by having the key iteration retain an
optional (weak?) reference in the dict object to the most recently looked up
key+value for a short circuit quickly returning its value.  I do not expect
that ever to matter, as code can just loop using the appropriate iterator
instead.

-gps

From g.brandl at gmx.net  Tue Sep  4 19:15:37 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 19:15:37 +0200
Subject: [Python-3000] dict view operations
Message-ID: 

While looking at documenting the dict view changes, I came across an
inconsistency in how the dict views' set-like operations are implemented:
with sets/frozensets, the operator versions only work if the other operand
is a set/frozenset, while the dict view operators allow any iterable.

Do we care?
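
A small sketch of the difference, as it behaves in current Python 3 releases: the set operator rejects a plain list, the keys view accepts it, and the named set method accepts any iterable. The sample data is arbitrary.

```python
d = {"a": 1, "b": 2}

# The keys view accepts any iterable on the other side of the operator.
view_union = d.keys() | ["c"]

# The set operator requires another set or frozenset.
try:
    {"a", "b"} | ["c"]
    set_operator_accepts_list = True
except TypeError:
    set_operator_accepts_list = False

# The named method on set is the one that takes arbitrary iterables.
method_union = {"a", "b"}.union(["c"])
```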

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From skip at pobox.com  Tue Sep  4 19:27:17 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 4 Sep 2007 12:27:17 -0500
Subject: [Python-3000] Should all iter(keys|items|values) be renamed?
Message-ID: <18141.38261.320718.902982@montanaro.dyndns.org>

After Nick's last message I went searching for "iteritems" in the docs.  I
fixed a couple places (not yet checked in), but eventually came across
Mailbox.iteritems.  Looking at the mailbox.py code, sure enough, it still
exists:

    def iteritems(self):
        """Return an iterator over (key, message) tuples."""
        for key in self.keys():
            try:
                value = self[key]
            except KeyError:
                continue
            yield (key, value)

    def items(self):
        """Return a list of (key, message) tuples. Memory intensive."""
        return list(self.iteritems())

Should it be renamed items and the second def'n deleted?  Same for iterkeys,
itervalues where they appear?

Skip

From fdrake at acm.org  Tue Sep  4 19:35:05 2007
From: fdrake at acm.org (Fred Drake)
Date: Tue, 4 Sep 2007 13:35:05 -0400
Subject: [Python-3000] Should all iter(keys|items|values) be renamed?
In-Reply-To: <18141.38261.320718.902982@montanaro.dyndns.org>
References: <18141.38261.320718.902982@montanaro.dyndns.org>
Message-ID: <5D1955CD-4228-43D1-BC4D-C79FAA6832E0@acm.org>

On Sep 4, 2007, at 1:27 PM, skip at pobox.com wrote:
> After Nick's last message I went searching for "iteritems" in the  
> docs.  I
> fixed a couple places (not yet checked in), but eventually came across

Timing is great!  I checked in a bunch of doc changes on this exact  
topic.  Watch for conflicts.  My changes were mostly removals and  
minor updates; I'm sure there's more to be done.


   -Fred

-- 
Fred Drake   




From g.brandl at gmx.net  Tue Sep  4 19:37:26 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 04 Sep 2007 19:37:26 +0200
Subject: [Python-3000] dict view operations
In-Reply-To: 
References: 
Message-ID: 

Georg Brandl schrieb:
> While looking at documenting the dict view changes, I came across an
> inconsistency in how the dict views' set-like operations are implemented:
> with sets/frozensets, the operator versions only work if the other operand
> is a set/frozenset, while the dict view operators allow any iterable.
> 
> Do we care?

Oh, and another thing: the items views can contain unhashable values, so

d.items() & d.items()

will fail for such dictionaries since the operands are converted to sets
before doing the intersection.

I suspect there's nothing that can easily be done about that though...

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From brett at python.org  Tue Sep  4 20:08:53 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 4 Sep 2007 11:08:53 -0700
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: 
References: 
	<9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
	
Message-ID: 

On 9/4/07, Guido van Rossum  wrote:
> Since x[a:b] is not basic syntax (like it once was) but simply the
> combination of operator.getitem and slice() I don't see the point of
> keeping operator.getslice.
>
> PS. I don't know how useful the operator module really is -- in all
> those years it's existed I haven't really used it myself, and I'm
> always baffled when I see code using it.
>

The only great use I have found for it myself is attrgetter and
itemgetter, but those were added by Raymond in 2.5 (I think).
Otherwise I never use it.

-Brett

From guido at python.org  Tue Sep  4 20:17:34 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 11:17:34 -0700
Subject: [Python-3000] Should all iter(keys|items|values) be renamed?
In-Reply-To: <18141.38261.320718.902982@montanaro.dyndns.org>
References: <18141.38261.320718.902982@montanaro.dyndns.org>
Message-ID: 

On 9/4/07, skip at pobox.com  wrote:
> After Nick's last message I went searching for "iteritems" in the docs.  I
> fixed a couple places (not yet checked in), but eventually came across
> Mailbox.iteritems.  Looking at the mailbox.py code, sure enough, it still
> exists:
>
>     def iteritems(self):
>         """Return an iterator over (key, message) tuples."""
>         for key in self.keys():
>             try:
>                 value = self[key]
>             except KeyError:
>                 continue
>             yield (key, value)
>
>     def items(self):
>         """Return a list of (key, message) tuples. Memory intensive."""
>         return list(self.iteritems())
>
> Should it be renamed items and the second def'n deleted?  Same for iterkeys,
> itervalues where they appear?

It is incorrect to replace items() with iteritems() though -- it
should be replaced with a "view" like the one sketched in PEP 3106.

I think this will be a fairly large project; ATM we don't even have a
reusable implementation of dict views (the version in dictobject.c is
explicitly restricted to dict instances). It would be a good idea to
review the conformance of every stdlib API that tries to look like a
mapping, and make them conform to the new mapping ABCs in PEP 3119.
(Ditto for sequences and sets except there are so few of those.)
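
(A rough sketch of what a reusable view could look like, assuming the
ABCs from PEP 3119 as they later shipped in collections.abc. MiniMailbox
is a hypothetical stand-in, not the real mailbox.Mailbox.)

```python
from collections.abc import ItemsView, Mapping


class MiniMailbox(Mapping):
    """Hypothetical stand-in for a mapping-like stdlib class."""

    def __init__(self):
        self._messages = {}

    def add(self, key, message):
        self._messages[key] = message

    def __getitem__(self, key):
        return self._messages[key]

    def __iter__(self):
        return iter(self._messages)

    def __len__(self):
        return len(self._messages)

    # Mapping already supplies keys(), items() and values() as views;
    # items() is spelled out here only to make the point visible.
    def items(self):
        return ItemsView(self)


box = MiniMailbox()
view = box.items()      # created while the mailbox is still empty
box.add(1, "hello")     # the view reflects the later change
```

Because the view only holds a reference to the mapping, it needs nothing
from the class beyond __getitem__, __iter__ and __len__.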

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep  4 20:22:31 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 11:22:31 -0700
Subject: [Python-3000] dict view operations
In-Reply-To: 
References:  
Message-ID: 

On 9/4/07, Georg Brandl  wrote:
> Georg Brandl schrieb:
> > While looking at documenting the dict view changes, I came across an
> > inconsistency in how the dict views' set-like operations are implemented:
> > with sets/frozensets, the operator versions only work if the other operand
> > is a set/frozenset, while the dict view operators allow any iterable.
> >
> > Do we care?

The Set ABCs in PEP 3119 should be followed IMO. But they haven't
received a lot of review so we may have to go back and discuss what
that PEP should say (and perhaps it isn't giving enough detail).
However, I don't see it as a violation if some of the types are more
lenient in what they accept -- they just shouldn't be more
restrictive.

> Oh, and another thing: the items views can contain unhashable values, so
>
> d.items() & d.items()
>
> will fail for such dictionaries since the operands are converted to sets
> before doing the intersection.
>
> I suspect there's nothing that can easily be done about that though...

Indeed, since the result must be a new set (not a view) and the result
cannot be represented as a set either (unless it's empty or happens to
contain no unhashable values, which would be a rare piece of luck).
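
(A small sketch of both cases, assuming a current Python 3 interpreter;
the two dicts are illustrative.)

```python
hashable = {1: "a", 2: "b"}
unhashable = {1: [], 2: {}}

# Every (key, value) pair is hashable, so the result is an ordinary set.
ok = hashable.items() & hashable.items()

# A pair holding a list cannot be placed in a set, so this raises TypeError.
try:
    unhashable.items() & unhashable.items()
    failed = False
except TypeError:
    failed = True
```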

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Tue Sep  4 20:35:12 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 04 Sep 2007 20:35:12 +0200
Subject: [Python-3000] dict view operations
In-Reply-To: 
References:  
Message-ID: <46DDA560.8070301@v.loewis.de>

> Oh, and another thing: the items views can contain unhashable values

That, of course, could be fixed: if the key-value pairs would only
hash by key (ignoring the value), they would remain hashable.

Regards,
Martin

From guido at python.org  Tue Sep  4 20:41:37 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 11:41:37 -0700
Subject: [Python-3000] dict view operations
In-Reply-To: <46DDA560.8070301@v.loewis.de>
References:  
	<46DDA560.8070301@v.loewis.de>
Message-ID: 

On 9/4/07, "Martin v. Löwis"  wrote:
> > Oh, and another thing: the items views can contain unhashable values
>
> That, of course, could be fixed: if the key-value pairs would only
> hash by key (ignoring the value), they would remain hashable.

How would that help? The key/value pairs are ordinary tuples, so you
still wouldn't be able to look them up in another set, nor would you
be able to represent d.items() & d.items() as a regular set or
frozenset instance.

What use case are you thinking of that this would address?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Tue Sep  4 20:44:47 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 4 Sep 2007 14:44:47 -0400
Subject: [Python-3000] dict view operations
In-Reply-To: <46DDA560.8070301@v.loewis.de>
References:  
	<46DDA560.8070301@v.loewis.de>
Message-ID: <66d0a6e10709041144k1402615q17182d820c99cdc9@mail.gmail.com>

On 9/4/07, "Martin v. Löwis"  wrote:
> > Oh, and another thing: the items views can contain unhashable values
>
> That, of course, could be fixed: if the key-value pairs would only
> hash by key (ignoring the value), they would remain hashable.

I understand what you mean, but without changing tuples generically,
how would you implement this?

--
Nick

From barry at python.org  Tue Sep  4 20:51:49 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 4 Sep 2007 14:51:49 -0400
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: 
References: 
	<9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
	
	
Message-ID: <43F0FFDA-242C-4810-A534-164092EBA835@python.org>

On Sep 4, 2007, at 2:08 PM, Brett Cannon wrote:

> On 9/4/07, Guido van Rossum  wrote:
>> Since x[a:b] is not basic syntax (like it once was) but simply the
>> combination of operator.getitem and slice() I don't see the point of
>> keeping operator.getslice.
>>
>> PS. I don't know how useful the operator module really is -- in all
>> those years it's existed I haven't really used it myself, and I'm
>> always baffled when I see code using it.
>>
>
> The only great use I have found for it myself is attrgetter and
> itemgetter, but those were added by Raymond in 2.5 (I think).
> Otherwise I never use it.

Same here, although very occasionally I use one or two others.  I  
still think attrgetter could be made more useful by dereferencing dot- 
paths.
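
(A sketch of what dereferencing a dot-path means. dotted_getter is a
hypothetical helper written for this note; the sample objects are made
up. Later releases of the operator module accept dotted names in
attrgetter directly, which the last line relies on.)

```python
from functools import reduce
from operator import attrgetter
from types import SimpleNamespace


def dotted_getter(path):
    """Hypothetical helper: follow a dot-separated attribute path."""
    names = path.split(".")
    return lambda obj: reduce(getattr, names, obj)


message = SimpleNamespace(sender=SimpleNamespace(name="barry"))

by_helper = dotted_getter("sender.name")(message)
# Assumes a Python version whose attrgetter supports dotted names.
by_operator = attrgetter("sender.name")(message)
```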

-Barry


From hto at arcor.de  Tue Sep  4 18:49:48 2007
From: hto at arcor.de (Thomas Hunger)
Date: Tue, 4 Sep 2007 18:49:48 +0200
Subject: [Python-3000] Performance Notes
In-Reply-To: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com>
References: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com>
Message-ID: <200709041849.48534.hto@arcor.de>

> I've been doing some profiling of 3.0 vs. 2.6 release builds on
> Windows XP for the purpose of hopefully closing the performance
> gap. This data is very preliminary, but I thought I'd throw it out
> here in case someone else also wanted to look into this.  Also,
> possibly useful for comparing against profiling data on other
> platforms.  The table below just lists functions and speed
> differentials in 3.0 vs. 2.6, ordered by the functions in which we
> spend the most total time.

Hello, 

I don't know much about python internals, so the following might be 
bogus:

I replaced unicode_hash and string_hash with the hash function from 
here: http://www.azillionmonkeys.com/qed/hash.html.

Then I ran the following micro-benchmark :

    $ time ./python bench.py

where bench.py is:

    f = dict((line, nr) for nr, line
             in enumerate(open('/usr/share/dict/words',
                               encoding='latin1').readlines()))

Python3k original hash: real    0m2.210s
              new hash: real    0m1.842s

So maybe this is an interesting hash function?
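
(A self-contained variant of the same micro-benchmark, as a sketch only.
It assumes synthetic strings instead of /usr/share/dict/words, and it
builds fresh strings for every run because CPython caches a str's hash
after first use. fresh_lines and time_build are names invented here.)

```python
import time


def fresh_lines(count):
    """Build new str objects so that no hash value is cached yet."""
    return ["word%d\n" % i for i in range(count)]


def time_build(count=50_000):
    """Time only the dict construction, which hashes every key once."""
    lines = fresh_lines(count)
    start = time.perf_counter()
    index = dict((line, nr) for nr, line in enumerate(lines))
    elapsed = time.perf_counter() - start
    return index, elapsed


index, _ = time_build()
# Best of three runs reduces noise from other processes.
best = min(time_build()[1] for _ in range(3))
print("best of 3: %.4fs" % best)
```

Wall-clock numbers from a single `time ./python` run also include
interpreter startup and file I/O, so a comparison of hash functions is
easier to read when only the dict build is timed.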

Tom

From martin at v.loewis.de  Tue Sep  4 20:55:14 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 04 Sep 2007 20:55:14 +0200
Subject: [Python-3000] dict view operations
In-Reply-To: 
References:  	
	<46DDA560.8070301@v.loewis.de>
	
Message-ID: <46DDAA12.5030707@v.loewis.de>

Guido van Rossum schrieb:
> On 9/4/07, "Martin v. Löwis"  wrote:
>>> Oh, and another thing: the items views can contain unhashable values
>> That, of course, could be fixed: if the key-value pairs would only
>> hash by key (ignoring the value), they would remain hashable.
> 
> How would that help? The key/value pairs are ordinary tuples

They would have to stop being that:

class Association(tuple):
  def __hash__(self):
    return hash(self[0])

> What use case are you thinking of that this would address?

It would allow to treat the items view as a proper set (which
it still is).

Regards,
Martin


From guido at python.org  Tue Sep  4 21:14:14 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 12:14:14 -0700
Subject: [Python-3000] dict view operations
In-Reply-To: <46DDAA12.5030707@v.loewis.de>
References:  
	<46DDA560.8070301@v.loewis.de>
	
	<46DDAA12.5030707@v.loewis.de>
Message-ID: 

On 9/4/07, "Martin v. Löwis"  wrote:
> Guido van Rossum schrieb:
> > On 9/4/07, "Martin v. Löwis"  wrote:
> >>> Oh, and another thing: the items views can contain unhashable values
> >> That, of course, could be fixed: if the key-value pairs would only
> >> hash by key (ignoring the value), they would remain hashable.
> >
> > How would that help? The key/value pairs are ordinary tuples
>
> They would have to stop being that:
>
> class Association(tuple):
>   def __hash__(self):
>     return hash(self[0])
>
> > What use case are you thinking of that this would address?
>
> It would allow to treat the items view as a proper set (which
> it still is).

Can you give some examples? I can too easily think of examples that
fail with this approach:

d = {1: 1, 2: 2}
iv = set(d.items())
(1, 1) in iv

The latter expression would be False, (while it currently is True),
since (1,1) has a different hash value than Association((1, 1)).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Tue Sep  4 21:18:59 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 4 Sep 2007 15:18:59 -0400
Subject: [Python-3000] Default dict iterator should have been iteritems()
References: 
Message-ID: 


"Noam Raphael"  wrote in message 
news:b348a0850709040149i6d9d7183ped5d393d492d3824 at mail.gmail.com...
| The reasoning is simple: Iteration over an object usually gets all the
| data it contains. A dict can be seen as an unordered collection of
| tuples (key, value), indexed by key. So, iteration over a dict should
| yield those tuples.

Given that viewpoint, yes.  But a dict can also be seen as a set of objects 
that happen to have a value attached (like a graph with labelled nodes, 
which is still 'made up of' nodes rather than (node,label) pairs).  From 
this viewpoint, yielding the objects is sensible.

By itself, I think the decision was a toss-up.  But consistency with 'in', 
which is not a toss-up, tips the balance.

tjr




From martin at v.loewis.de  Tue Sep  4 21:22:48 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 04 Sep 2007 21:22:48 +0200
Subject: [Python-3000] dict view operations
In-Reply-To: 
References:  	
	<46DDA560.8070301@v.loewis.de>	
		
	<46DDAA12.5030707@v.loewis.de>
	
Message-ID: <46DDB088.7010606@v.loewis.de>

>>> What use case are you thinking of that this would address?
>> It would allow to treat the items view as a proper set (which
>> it still is).
> 
> Can you give some examples?

You mean, actual applications where people would want to perform
set operations on .items()? No - I was just trying to give a
solution to the theoretical problem that Georg brought up.

> I can too easily think of examples that
> fail with this approach:
> 
> d = {1: 1, 2: 2}
> iv = set(d.items())
> (1, 1) in iv
> 
> The latter expression would be False, (while it currently is True),
> since (1,1) has a different hash value than Association((1, 1)).

Right. Since the elements in the view/set would not be plain
two-tuples, this would have to be spelled as

Association((1,1)) in iv

Of course, it violates the principle that things that compare equal
should also hash equal; to restore that principle, one would have
to make associations not compare equal to two-tuples (and then
not make them a subtype anymore, either).
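
(The trade-off discussed in this thread can be sketched as follows,
assuming CPython's set implementation. The Association class is the one
from the earlier message; the sample dict is illustrative and contains
one unhashable value on purpose.)

```python
class Association(tuple):
    """A (key, value) pair that hashes by its key only."""

    def __hash__(self):
        return hash(self[0])


d = {1: 1, 2: []}       # the value for key 2 is unhashable
iv = {Association(item) for item in d.items()}

wrapped_found = Association((1, 1)) in iv
# A plain tuple compares equal to the stored element but hashes
# differently, so the set lookup misses it.
plain_equal = Association((1, 1)) == (1, 1)
plain_found = (1, 1) in iv
```

The set can now hold the pair with the list value, at the cost of two
equal objects no longer hashing equal.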

Regards,
Martin


From eduardo.padoan at gmail.com  Tue Sep  4 21:37:44 2007
From: eduardo.padoan at gmail.com (Eduardo O. Padoan)
Date: Tue, 4 Sep 2007 16:37:44 -0300
Subject: [Python-3000] dict view operations
In-Reply-To: 
References:  
Message-ID: 

On 9/4/07, Georg Brandl  wrote:
> Georg Brandl schrieb:

> Oh, and another thing: the items views can contain unhashable values, so
>
> d.items() & d.items()
>
> will fail for such dictionaries since the operands are converted to sets
> before doing the intersection.
>
> I suspect there's nothing that can easily be done about that though...

Py3k-ish:

>>> d = {2: [], 4: {}}
>>> d.items() & d.items()
...
TypeError: list objects are unhashable

Must behave like Python 2.x-ish:

>>> d = {2: [], 4: {}}
>>> set(d.items()) & set(d.items())
...
TypeError: list objects are unhashable

.. right? If so, IIUC, there is nothing to be done about that...




> Georg

-- 
http://www.advogato.org/person/eopadoan/
Bookmarks: http://del.icio.us/edcrypt

From eric+python-dev at trueblade.com  Tue Sep  4 22:05:53 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 04 Sep 2007 16:05:53 -0400
Subject: [Python-3000] str.format vs. string.Formatter exceptions
In-Reply-To: 
References: <46DC230F.2040409@trueblade.com>
	
Message-ID: <46DDBAA1.7090406@trueblade.com>

Guido van Rossum wrote:
> Since IndexError and KeyError are conceptually like ValueError but in
> a more narrowly defined context, I think IndexError and KeyError
> actually make sense here (even though they don't inherit from
> ValueError).
> 
> --Guido

Okay, I'll change these to IndexError and KeyError.
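
(A quick check of the four spellings, assuming a current Python 3
release where the two implementations agree. exception_name is a helper
invented for this sketch.)

```python
from string import Formatter


def exception_name(call):
    """Return the name of the exception type raised by call(), or None."""
    try:
        call()
    except Exception as exc:
        return type(exc).__name__
    return None


results = {
    "str positional": exception_name(lambda: "{0}".format()),
    "str keyword": exception_name(lambda: "{x}".format()),
    "Formatter positional": exception_name(lambda: Formatter().format("{0}")),
    "Formatter keyword": exception_name(lambda: Formatter().format("{x}")),
}
print(results)
```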

Eric.

> 
> On 9/3/07, Eric Smith  wrote:
>> Ron Adam points out some differences in which exceptions are thrown by
>> str.format and string.Formatter.  For example, on a missing positional
>> argument:
>>
>>  >>> "{0}".format()
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>> ValueError: Not enough positional arguments in format string
>>
>>  >>> Formatter().format("{0}")
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File "/shared/src/python/py3k/Lib/string.py", line 201, in format
>>      return self.vformat(format_string, args, kwargs)
>>    File "/shared/src/python/py3k/Lib/string.py", line 220, in vformat
>>      obj, arg_used = self.get_field(field_name, args, kwargs)
>>    File "/shared/src/python/py3k/Lib/string.py", line 278, in get_field
>>      obj = self.get_value(first, args, kwargs)
>>    File "/shared/src/python/py3k/Lib/string.py", line 235, in get_value
>>      return args[key]
>> IndexError: tuple index out of range
>>
>> The PEP says: In general, exceptions generated by the formatter code
>> itself are of the "ValueError" variety -- there is an error in the
>> actual "value" of the format string.
>>
>> I can easily change string.Formatter to make this a ValueError, and I
>> think that's probably the right thing to do.  For example, if the string
>> comes from a translation module, then there might be an extra parameter
>> added by mistake, in which case ValueError seems right to me.
>>
>> But I'd like to hear if anyone else thinks this should be an IndexError,
>> or maybe they both should be some other exception.
>>
>> Similarly '"{x}".format()' currently raises ValueError, but
>> 'Formatter().format("{x}")' raises KeyError.


From greg.ewing at canterbury.ac.nz  Tue Sep  4 22:44:45 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 05 Sep 2007 08:44:45 +1200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
Message-ID: <46DDC3BD.8090505@canterbury.ac.nz>

Noam Raphael wrote:

> Just out of curiosity - do you remember these reasons?

I don't remember the discussion in detail, but a couple of
reasons that come to mind:

* It would be confusing to have "x in d" and "for x in d"
meaning subtly different things.

* It's more efficient to iterate over just the keys,
because a tuple has to be created for each item when
iterating over (key, value) pairs. It's reasonable
that if you want more done, you should have to write
more to get it.
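
(The first point can be illustrated with a short sketch; the sample dict
is made up.)

```python
d = {"a": 1, "b": 2}

# Membership and iteration both talk about keys.
keys_seen = [x for x in d]
all_members = all(x in d for x in d)

# A (key, value) pair is not a member of the dict itself...
pair_in_dict = ("a", 1) in d
# ...but it is a member of the items view, which is asked for explicitly.
pair_in_items = ("a", 1) in d.items()
```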

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Sep  4 22:52:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 05 Sep 2007 08:52:20 +1200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<66d0a6e10709040534j616eda22va40647ca622ae989@mail.gmail.com>
Message-ID: <46DDC584.6010905@canterbury.ac.nz>

Nicholas Bastin wrote:
> On 9/4/07, Georg Brandl  wrote:
 >
> > Well, what about dict((x, d[x]) for x in d) ? Doesn't strike me as ugly...
> 
> It doesn't strike me as ugly, it just strikes me as slow.

Are people forgetting that in 3.0

   dict(d.items())

will do the same thing very efficiently?

Of course, if you know you have a dict, d.copy() is even
more efficient.
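
(The three spellings side by side, as a sketch with illustrative data.)

```python
d = {"x": [1, 2], "y": [3]}

via_genexp = dict((k, d[k]) for k in d)   # one extra lookup per key
via_items = dict(d.items())               # pairs come straight from the view
via_copy = d.copy()                       # only available on real dicts

# All three are shallow copies: a new dict holding the same value objects.
same_values = via_copy["x"] is d["x"]
```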

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Sep  4 23:01:03 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 05 Sep 2007 09:01:03 +1200
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: 
References: 
	<9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
	
Message-ID: <46DDC78F.2090208@canterbury.ac.nz>

Guido van Rossum wrote:
> PS. I don't know how useful the operator module really is

I think its main use is as a source of functions for passing
to map(). Unless I'm mistaken, that's still going to be faster
than a listcomp when a built-in function is used, isn't it?
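
(A sketch of the usage in question; the lists are illustrative, and the
relative speed of the two forms is left to measurement rather than
claimed here.)

```python
import operator

prices = [3, 5, 7]
quantities = [2, 4, 6]

# operator.mul is implemented in C, so map() calls it without running
# a Python-level function body for each pair.
with_map = list(map(operator.mul, prices, quantities))

# The equivalent list comprehension evaluates p * q in bytecode.
with_listcomp = [p * q for p, q in zip(prices, quantities)]
```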

--
Greg

From facundobatista at gmail.com  Tue Sep  4 23:46:43 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Tue, 4 Sep 2007 18:46:43 -0300
Subject: [Python-3000] What about operator.*slice?
In-Reply-To: <46DDC78F.2090208@canterbury.ac.nz>
References: 
	<9e804ac0709040156x74a36892p1090d0d113f043f9@mail.gmail.com>
	
	<46DDC78F.2090208@canterbury.ac.nz>
Message-ID: 

2007/9/4, Greg Ewing :

> I think its main use is as a source of functions for passing
> to map(). Unless I'm mistaken, that's still going to be faster

Or to sort:

>>> import operator
>>> l = [(1, 3), (2, 2)]
>>> sorted(l, key=operator.itemgetter(1))
[(2, 2), (1, 3)]
>>>

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From lars at ibp.de  Tue Sep  4 23:54:53 2007
From: lars at ibp.de (Lars Immisch)
Date: Tue, 04 Sep 2007 23:54:53 +0200
Subject: [Python-3000] audio device support
Message-ID: <46DDD42D.8090608@ibp.de>

Hi,

I recently worked on Python audio device support for Linux and OS X. Not 
so recently, I wrote a DirectSound module for win32.

Python 2 has support for various audio devices, but they have no common 
interface and some are broken or obsolete. Python 3000 might be a chance 
to improve on this.

The situation seems to be:

Linux:

ossaudiodev is becoming obsolete on Linux (because OSS is being replaced 
by ALSA).

pyalsaaudio, http://sourceforge.net/projects/pyalsaaudio, is broken for 
multithreaded programs: it does not wrap blocking calls with 
Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. A suitable, submitted patch 
has not been included by the maintainer in nearly two years. With this 
or a similar patch, it works fine, however.

Windows:

win32all has DirectSound support, but it's lowlevel and complicated. 
Other audio device wrappers may exist, but I don't know about them.

OS X:

The (undocumented) audiodev implementation does not work for me. There 
is a pyrex implementation for coreaudio support which I haven't tested, 
but I have written coreaudio wrappers in C (to be published).

What I'd like to see:

I like the idea of having audio device support for the major operating 
systems in the standard library.

But I am even more interested in a common interface for simple operations.

IMO, the API should support:

- stereo playback
- stereo recording
- different sampling rates and formats (alaw, mulaw and PCM in signed 
integers in various widths and maybe PCM in floats/doubles).
- device selection
- volume control
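
(One possible shape for such a common interface, purely as a sketch.
Every name below, including AudioDevice, LoopbackDevice and the
parameter names, is hypothetical; no such class exists in the standard
library, and the loopback device only stands in for real hardware.)

```python
from abc import ABC, abstractmethod


class AudioDevice(ABC):
    """Hypothetical common interface for playback and recording."""

    def __init__(self, device="default", channels=2, rate=44100,
                 sample_format="s16"):
        self.device = device                # device selection
        self.channels = channels            # 2 means stereo
        self.rate = rate                    # sampling rate in Hz
        self.sample_format = sample_format  # e.g. "alaw", "mulaw", "s16"
        self.volume = 1.0

    def set_volume(self, level):
        """Clamp the volume to the range 0.0 .. 1.0."""
        self.volume = max(0.0, min(1.0, level))

    @abstractmethod
    def write(self, frames):
        """Play raw frames; return the number of bytes accepted."""

    @abstractmethod
    def read(self, nbytes):
        """Record and return up to nbytes of raw frames."""


class LoopbackDevice(AudioDevice):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._buffer = bytearray()

    def write(self, frames):
        self._buffer += frames
        return len(frames)

    def read(self, nbytes):
        data = bytes(self._buffer[:nbytes])
        del self._buffer[:nbytes]
        return data


dev = LoopbackDevice(rate=8000, sample_format="mulaw")
dev.set_volume(1.5)                         # clamped to 1.0
written = dev.write(b"\x00\x01\x02\x03")
recorded = dev.read(2)
```

Platform back ends (ALSA, coreaudio, DirectSound) would then subclass
the same base and differ only in how write() and read() reach the device.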

Overall, I think the level of abstraction in the OSS or ALSA APIs is
about right; coreaudio on OS X and DirectSound on Windows are overkill
outside of niche applications.

I would volunteer sample implementations for Windows, OS X and Linux (ALSA).

- Lars

From brett at python.org  Wed Sep  5 02:54:44 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 4 Sep 2007 17:54:44 -0700
Subject: [Python-3000] Questions about PEP 3121
Message-ID: 

I am prepping for a presentation on Python 3.0 that I am giving
tonight, and I have some questions about PEP 3121 that its example
raises.

First is whether the name of the function that returns the
module-specific memory is PyModule_GetData() or PyModule_GetState()?
The former is listed by the PEP but the latter is used by the example.

Second is how are the exception and type to be added to the module?
Currently one uses PyModule_AddObject() to insert an object into the
global namespace of a module.  But the example leaves that out and I
wanted to make sure there was not some magical new step left out
(initializing Xxo_Type is also left out, but that does not directly
deal with module initialization).

Lastly, what is tp_reload to be used for?  The PEP doesn't say but the
PyModuleDef lists it.  I assume it is to be called when a module is
reloaded, but it is not specified in the PEP.

-Brett

From janssen at parc.com  Wed Sep  5 05:31:25 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 4 Sep 2007 20:31:25 PDT
Subject: [Python-3000] bytes C API in 2.6 for easy transition to 3.0?
Message-ID: <07Sep4.203132pdt."57996"@synergy1.parc.xerox.com>

According to PEP 358, "bytes" will be in both 2.6 and 3.0.  It would
be nice if the C API for "bytes" existed in the trunk, so that it
could be used for new code that will port more easily to 3.0.

Bill

From guido at python.org  Wed Sep  5 05:41:54 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 4 Sep 2007 20:41:54 -0700
Subject: [Python-3000] bytes C API in 2.6 for easy transition to 3.0?
In-Reply-To: <-6760061404575982124@unknownmsgid>
References: <-6760061404575982124@unknownmsgid>
Message-ID: 

This is the plan. We're just short on cheap labor to implement it. I
wish I could quote an email that you sent long, long ago (in ILU
times) about having set up a drummer in the back of the room to entice
the 50 coding slaves to more productivity. I believe there was a whip
involved too. :-)

On 9/4/07, Bill Janssen  wrote:
> According to PEP 358, "bytes" will be in both 2.6 and 3.0.  It would
> be nice if the C API for "bytes" existed in the trunk, so that it
> could be used for new code that will port more easily to 3.0.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Wed Sep  5 07:53:45 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 4 Sep 2007 23:53:45 -0600
Subject: [Python-3000] bytes C API in 2.6 for easy transition to 3.0?
In-Reply-To: <1636919686236946180@unknownmsgid>
References: <1636919686236946180@unknownmsgid>
Message-ID: <52dc1c820709042253j5c69e4e5l1d3f953526c051a4@mail.gmail.com>

On 9/4/07, Bill Janssen  wrote:
>
> According to PEP 358, "bytes" will be in both 2.6 and 3.0.  It would
> be nice if the C API for "bytes" existed in the trunk, so that it
> could be used for new code that will port more easily to 3.0.
>
> Bill


I assume this includes the new buffer api since we really seem to want C API
users to use that rather than bytes objects directly?

From nick.bastin at gmail.com  Wed Sep  5 09:17:39 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 5 Sep 2007 03:17:39 -0400
Subject: [Python-3000] Solaris support in 3.0?
Message-ID: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>

This is a combination question-and-status-report email.  The question
would be, what does the "somewhat" tag mean on Solaris support in the
release notes for 3.0a1, and does someone have a list of things that
don't work, or does that just mean it hasn't been tested?

I built 3.0a1 on Sparc Solaris (5.8), and except for those things that
didn't build for lack of the required dependencies (_bsddb, _hashlib,
_ssl, _tkinter, gdbm, ossaudiodev, readline, _curses, _curses_panel),
everything claims to have built fine (with gcc 3.4.6).

Unit tests reveal the following failures:

test_cookielib (no _md5)
test_fileio
test_nis
test_pickletools
test_pipes
test_pty
test_str
test_unicode
test_userstring
test_uuid (no _md5)


And the following unexpected (according to it) skips:

test_hashlib (no _md5)
test_hmac (no _md5)
test_urllib2_localnet (no _md5)
test_urllib2net (no _md5)
test_urllib2 (no _md5)
test_tcl (no tcl on my system)
test_sundry (no _md5)
test_ssl (no SSL in my configuration)
test_tarfile (no _md5)
test_unicodedata (no _md5)

If anyone wants more data on any of these particular failures, let me
know, otherwise I'm going to start working through the ones that fail
in 3.0 that don't fail in 2.6.  All of the _md5 failures are because
of the lack of SSL, so I'm not sure that the tests should be 'failing'
in this configuration.

--
Nick

From mark at qtrac.eu  Wed Sep  5 10:43:04 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 5 Sep 2007 09:43:04 +0100
Subject: [Python-3000] abc docs
In-Reply-To: 
References: 
Message-ID: <200709050943.04784.mark@qtrac.eu>

On 2007-09-04, Georg Brandl wrote:
> I've added a basic skeleton of documentation for the "abc" module, but it
> would be nice if somebody proofread it and add more from PEP 3119 if
> desired.

One strange point: the module correctly appears on the
library/python.html page (Python Runtime Services), but does _not_
appear in library/index.html (The Python Standard Library), although all
the other Python Runtime Services modules do. index.rst lists python.rst
and python.rst lists abc.rst.

I've done various changes to the text, with one semantic change (in the
parenthesised phrase about __mro__).

Also, I added a table to collections.rst listing collections.Container,
collections.Hashable, and similar that you might want to check over.

All in revision 57988.


BTW When I tried a variation of one of the ABC examples from the PEP I
got this:

    Python 3.0a1 (py3k, Sep  1 2007, 08:25:11)
    [GCC 4.1.2 20070626 (Red Hat 4.1.2-13)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import abc
    >>> class MyABC(abc.ABCMeta): pass
    ...
    >>> MyABC.register(tuple)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: maximum recursion depth exceeded in __instancecheck__

So then I tried it exactly as written, and it worked fine:

    Python 3.0a1 (py3k, Sep  1 2007, 08:25:11)
    [GCC 4.1.2 20070626 (Red Hat 4.1.2-13)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from abc import ABCMeta
    >>> class MyABC(metaclass=ABCMeta): pass
    ...
    >>> MyABC.register(tuple)
    >>> assert issubclass(tuple, MyABC)
    >>> assert isinstance((), MyABC)

I hope that the first one is a bug rather than intended.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From martin at v.loewis.de  Wed Sep  5 13:00:32 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 05 Sep 2007 13:00:32 +0200
Subject: [Python-3000] Questions about PEP 3121
In-Reply-To: 
References: 
Message-ID: <46DE8C50.6090600@v.loewis.de>

> First is whether the name of the function that returns the
> module-specific memory is PyModule_GetData() or PyModule_GetState()?
> The former is listed by the PEP but the latter is used by the example.

I think I like _GetState more, so I have now adjusted the PEP.

> Second is how are the exception and type to be added to the module?
> Currently one uses PyModule_AddObject() to insert an object into the
> global namespace of a module.  But the example leaves that out and I
> wanted to make sure there was not some magical new step left out
> (initializing Xxo_Type is also left out, but that does not directly
> deal with module initialization).

No, this is just an omission. I'll fix it when I revise the PEP after
the implementation.

> Lastly, what is tp_reload to be used for?  The PEP doesn't say but the
> PyModuleDef lists it.  I assume it is to be called when a module is
> reloaded, but it is not specified in the PEP.

Yes; I'm not certain whether module reloading continues to be supported
in Py3k or not. If not, it should be removed from the PEP, if yes, it
should be specified.

A few other issues that you may want to know:

I found that enhancing PyModule_New cannot really work, as Py_InitModule
does a lot of other things that shouldn't be done in PyModule_New (which
is also used to create Python modules). So I keep calling the function
Py_InitModule.

I also found that passing two constant arguments to the function is
pointless, so I moved the module name into struct PyModuleDef. I also
add PyModuleDef_HEAD, similar to types.

E.g. for array, the current diff looks like this:

+static PyModuleDef array_mod = {
+       PyModuleDef_HEAD,
+       "array",          /* name */
+       module_doc,       /* doc string */
+       a_methods,        /* methods */
+       0,                /* m_size */
+       NULL,             /* m_reload */
+       NULL,             /* m_traverse */
+       NULL,             /* m_clear */
+       NULL,             /* m_free */
+};
+
 PyMODINIT_FUNC
-initarray(void)
+PyInit_array(void)
 {
        PyObject *m;

        if (PyType_Ready(&Arraytype) < 0)
             return;
        Py_Type(&PyArrayIter_Type) = &PyType_Type;
-       m = Py_InitModule3("array", a_methods, module_doc);
+       m = Py_InitModule(&array_mod);
        if (m == NULL)
-               return;
+               return NULL;

         Py_INCREF((PyObject *)&Arraytype);
        PyModule_AddObject(m, "ArrayType", (PyObject *)&Arraytype);
         Py_INCREF((PyObject *)&Arraytype);
        PyModule_AddObject(m, "array", (PyObject *)&Arraytype);
        /* No need to check the error here, the caller will do that */
+       return m;
 }

This doesn't include putting the type into interpreter state, and
I won't be able to fix all cases of global variables (also, some
global variables are out of scope of the PEP, including most types,
so some global variables will remain after I'm done).

Notice that I also kept the convention that the caller will check
for errors, so you can return a module object even though an
exception occurred. Making all these functions exception-safe is
fairly tedious, and I'm not attempting that for the moment.

Regards,
Martin

From martin at v.loewis.de  Wed Sep  5 13:19:12 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 05 Sep 2007 13:19:12 +0200
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
Message-ID: <46DE90B0.4050905@v.loewis.de>

> This is a combination question-and-status-report email.  The question
> would be, what does the "somewhat" tag mean on Solaris support in the
> release notes for 3.0a1, and does someone have a list of things that
> don't work, or does that just mean it hasn't been tested?

Not sure what "somewhat" means, but you can take a look at the build
failures in the Solaris buildbot - this is what is "officially" known
not to work.

As always with Solaris, there are several dimensions to be considered:
- version (2.5,2.6,7,8,9,10,11); not sure what the oldest Solaris
  version is that we still want to support.
- compiler: gcc vs. SunPRO/Forte
- 32 vs. 64 bits
- SPARC vs. x86

(not all combinations exist, but plenty)

> If anyone wants more data on any of these particular failures, let me
> know, otherwise I'm going to start working through the ones that fail
> in 3.0 that don't fail in 2.6.  All of the _md5 failures are because
> of the lack of SSL, so I'm not sure that the tests should be 'failing'
> in this configuration.

I think that's a serious issue to consider. As so much code now depends
on OpenSSL, setup.py should try harder to find it. E.g. on the build
slave, it can be found in /usr/sfw - not sure whether that is normal
on a Solaris 10 installation, and not sure whether there is a
Sun-provided OpenSSL on Solaris 8.
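
A minimal sketch of what "trying harder" could look like, assuming the
search is just a walk over candidate installation prefixes. This is not
setup.py's actual logic: the function name, the candidate list, and the
include/openssl/ssl.h layout are assumptions made for illustration; only
/usr/sfw comes from the message above.

```python
import os
import tempfile

# Hypothetical candidate prefixes; only /usr/sfw is taken from the thread.
CANDIDATE_PREFIXES = ["/usr", "/usr/local", "/usr/sfw"]


def find_openssl_prefix(prefixes=CANDIDATE_PREFIXES):
    """Return the first prefix containing include/openssl/ssl.h, else None."""
    for prefix in prefixes:
        header = os.path.join(prefix, "include", "openssl", "ssl.h")
        if os.path.isfile(header):
            return prefix
    return None


def demo_with_fake_prefix():
    """Build a throwaway prefix with a dummy header and search for it."""
    with tempfile.TemporaryDirectory() as fake_prefix:
        include_dir = os.path.join(fake_prefix, "include", "openssl")
        os.makedirs(include_dir)
        # An empty file is enough; only its existence is checked.
        open(os.path.join(include_dir, "ssl.h"), "w").close()
        found = find_openssl_prefix(["/nonexistent-prefix", fake_prefix])
        return found == fake_prefix


if __name__ == "__main__":
    print(find_openssl_prefix())
```

A real build script would also need to locate the matching libraries and
check the OpenSSL version, which this sketch leaves out.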

Notice that the tests don't 'fail', they are skipped. There are also
failing test cases, something that is more worrisome than a skipped
test case.

Regards,
Martin

From mark at qtrac.eu  Wed Sep  5 13:40:43 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 5 Sep 2007 12:40:43 +0100
Subject: [Python-3000] abc docs
Message-ID: <200709051240.43292.mark@qtrac.eu>

I may not be the first to mistakenly write

    class Foo(ABCMeta):

when I meant to write

    class Foo(metaclass=ABCMeta):

but I'm sure I won't be the last.

Sorry for the mistake...

Maybe attempting to register an ABCMeta subclass might lead to a more
informative warning though?
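
The difference between the two spellings can be sketched as follows. The
class names are only illustrative, and the behaviour shown is that of
current Python 3 releases rather than 3.0a1.

```python
from abc import ABCMeta


class MyABC(metaclass=ABCMeta):
    """An abstract base class: ABCMeta is used as its metaclass."""


class MyMeta(ABCMeta):
    """A new metaclass: it inherits from ABCMeta instead of using it."""


# register() is looked up on the metaclass and bound to MyABC here.
MyABC.register(tuple)

print(type(MyABC) is ABCMeta)    # MyABC is an instance of ABCMeta
print(issubclass(tuple, MyABC))  # tuple is now a virtual subclass
print(isinstance((), MyABC))
print(issubclass(MyMeta, type))  # MyMeta is itself a metaclass
print(type(MyMeta) is type)      # and it is not an instance of ABCMeta
```

With the inheritance spelling, register is reached as a plain function
on MyMeta, so a call such as MyMeta.register(tuple) passes tuple in the
position where the ABC itself is expected. That is presumably why the
first form misbehaves.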

----------  Forwarded Message  ----------

Subject: Re: [Python-3000] abc docs
Date: 2007-09-05
From: Mark Summerfield 
To: python-3000 at python.org

[snip]

BTW When I tried a variation of one of the ABC examples from the PEP I
got this:

    Python 3.0a1 (py3k, Sep  1 2007, 08:25:11)
    [GCC 4.1.2 20070626 (Red Hat 4.1.2-13)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import abc
    >>> class MyABC(abc.ABCMeta): pass
    ...
    >>> MyABC.register(tuple)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: maximum recursion depth exceeded in __instancecheck__

[snip]

I hope that the first one is a bug rather than intended.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


-------------------------------------------------------

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu




From guido at python.org  Wed Sep  5 17:02:14 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 08:02:14 -0700
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <46DE90B0.4050905@v.loewis.de>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
Message-ID: 

On 9/5/07, "Martin v. Löwis" wrote:
> > This is a combination question-and-status-report email.  The question
> > would be, what does the "somewhat" tag mean on Solaris support in the
> > release notes for 3.0a1, and does someone have a list of things that
> > don't work, or does that just mean it hasn't been tested?
>
> Not sure what "somewhat" means, but you can take a look at the build
> failures in the Solaris buildbot - this is what is "officially" known
> not to work.

The "somewhat" was my word -- I meant that when I last looked at the
Solaris buildbot, I saw a few failures; and also that I don't have
access to Sun hardware. And also what Martin says below. I'd be happy
though to replace "somewhat" with specific indications of h/w and s/w
versions if you are willing to commit to supporting these throughout
the 3.0 life cycle.

> As always with Solaris, there are several dimensions to be considered:
> - version (2.5,2.6,7,8,9,10,11); not sure what the oldest Solaris
>   version is that we still want to support.
> - compiler: gcc vs. SunPRO/Forte
> - 32 vs. 64 bits
> - SPARC vs. x86
>
> (not all combinations exist, but plenty)
>
> > If anyone wants more data on any of these particular failures, let me
> > know, otherwise I'm going to start working through the ones that fail
> > in 3.0 that don't fail in 2.6.  All of the _md5 failures are because
> > of the lack of SSL, so I'm not sure that the tests should be 'failing'
> > in this configuration.
>
> I think that's a serious issue to consider. As so much code now depends
> on OpenSSL, setup.py should try harder to find it. E.g. on the build
> slave, it can be found in /usr/sfw - not sure whether that is normal
> on a Solaris 10 installation, and not sure whether there is a
> Sun-provided OpenSSL on Solaris 8.
>
> Notice that the tests don't 'fail', they are skipped. There are also
> failing test cases, something that is more worrisome than a skipped
> test case.

Yes, this is a serious issue -- we are totally dependent on openssl
for computing MD5 checksums. Several modules use MD5 checksums
casually, and it's not good that these fail when openssl isn't
available (or if it's too old, like what happened on an ancient Red
Hat 7.3 system I have at home). I'm tempted to put the old
RSA-copyrighted md5.c back in as a fallback, even though its license
is unpopular. Or perhaps we could make a copy of a small fraction of
openssl and use that? I think MD5 is the only one that's popular
enough to warrant this treatment; I think SHA1 is a distant second.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Sep  5 17:04:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 08:04:13 -0700
Subject: [Python-3000] Questions about PEP 3121
In-Reply-To: <46DE8C50.6090600@v.loewis.de>
References: 
	<46DE8C50.6090600@v.loewis.de>
Message-ID: 

On 9/5/07, "Martin v. Löwis" wrote:
> Yes; I'm not certain whether module reloading continues to be supported
> in Py3k or not. If not, it should be removed from the PEP, if yes, it
> should be specified.

I'm already missing the reload() builtin, so I think it should be kept
around in some form. I expect some form of reload functionality will
remain available, perhaps somewhere in the imp module.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Sep  5 17:08:02 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 08:08:02 -0700
Subject: [Python-3000] bytes C API in 2.6 for easy transition to 3.0?
In-Reply-To: <52dc1c820709042253j5c69e4e5l1d3f953526c051a4@mail.gmail.com>
References: <1636919686236946180@unknownmsgid>
	<52dc1c820709042253j5c69e4e5l1d3f953526c051a4@mail.gmail.com>
Message-ID: 

On 9/4/07, Gregory P. Smith  wrote:
>
> On 9/4/07, Bill Janssen  wrote:
> > According to PEP 358, "bytes" will be in both 2.6 and 3.0.  It would
> > be nice if the C API for "bytes" existed in the trunk, so that it
> > could be used for new code that will port more easily to 3.0 .
>
> I assume this includes the new buffer api since we really seem to want C API
> users to use that rather than bytes objects directly?

Well, in a pinch the old buffer API would work (the 3.0 bytes object
used that until recently :-) but Travis told me he is planning to
backport PEP 3118 to 2.6, so eventually that will happen, yes.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Wed Sep  5 17:36:38 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 5 Sep 2007 09:36:38 -0600
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: 
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
Message-ID: <52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>

> Yes, this is a serious issue -- we are totally dependent on openssl
> for computing MD5 checksums. Several modules use MD5 checksums
> casually, and it's not good that these fail when openssl isn't
> available (or if it's too old, like what happened on an ancient Red
> Hat 7.3 system I have at home). I'm tempted to put the old
> RSA-copyrighted md5.c back in as a fallback, even though its license
> is unpopular. Or perhaps we could make a copy of a small fraction of
> openssl and use that? I think MD5 is the only one that's popular
> enough to warrant this treatment; I think SHA1 is a distant second.


Every OS I use has openssl installed, so I figured someone else had made the
same decision and removed the non-openssl variants.  Are there really
non-Linux/BSD/OS X installations out there where anyone intends to build and
install Python that do -not- have openssl installed somewhere?  That'd be
sad, but in that case we shouldn't abandon them.  Modifying setup.py to find
it installed in a different place should be easy if that's all it takes.

Rather than resurrecting the old RSA-copyright md5.c, I can easily make new
ones out of the libtomcrypt md5 and sha1 sources the same way I created the
non-openssl sha256 and sha512 modules.

We should not limit ourselves to only md5 if we do that; let's guarantee that
md5, sha1 - sha512 are available on all future Python installs; it's not
difficult.  I'll do the work if we need it.
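
A sketch of how such a guarantee can be checked from Python. It assumes
a release that provides hashlib.algorithms_guaranteed (added in Python
3.2, so not available in the 3.0 alphas discussed here); the helper name
missing_guaranteed is made up for illustration.

```python
import hashlib

# The algorithms named in the proposal above, plus the sha224/sha384 variants.
WANTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def missing_guaranteed(wanted=WANTED):
    """Return the wanted algorithms that this Python does not guarantee."""
    return sorted(set(wanted) - set(hashlib.algorithms_guaranteed))


def hexdigest(name, data):
    """Hash data with the named algorithm, whichever backend provides it."""
    return hashlib.new(name, data).hexdigest()


if __name__ == "__main__":
    print(missing_guaranteed())
    print(hexdigest("md5", b"abc"))
```

Whether a given digest is computed by OpenSSL or by a bundled fallback
is not visible at this level, which is the point of the proposal.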

-gps

From nick.bastin at gmail.com  Wed Sep  5 17:51:30 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 5 Sep 2007 11:51:30 -0400
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <46DE90B0.4050905@v.loewis.de>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
Message-ID: <66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>

On 9/5/07, "Martin v. Löwis" wrote:
> I think that's a serious issue to consider. As so much code now depends
> on OpenSSL, setup.py should try harder to find it. E.g. on the build
> slave, it can be found in /usr/sfw - not sure whether that is normal
> on a Solaris 10 installation, and not sure whether there is a
> Sun-provided OpenSSL on Solaris 8.

There is not.  I can put OpenSSL in my environment, but I do not
usually build with it as I can't build with it on many non-US
installations.  If we really just need OpenSSL for hashing most of the
time, we should probably try to implement that somewhere else.  The
2.5 "What's new" documentation said that hashlib used OpenSSL when
available, but it appears to be requiring OpenSSL?

> Notice that the tests don't 'fail', they are skipped. There are also
> failing test cases, something that is more worrisome than a skipped
> test case.

The tests that I marked as "fail" in my email are marked as "fail" by
the unittest framework.  It is 'wrong' in some of these cases, because
it should have skipped the tests, but it didn't.  I also think that
unittest shouldn't think that SSL-related skips are unexpected if I
don't have SSL, but that's a bone to pick for another day.

--
Nick

From martin at v.loewis.de  Wed Sep  5 17:54:39 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 05 Sep 2007 17:54:39 +0200
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>	
	
	<52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
Message-ID: <46DED13F.6080705@v.loewis.de>

> Every OS I use has openssl installed so i figured someone else had made
> the same decision and removed the non-openssl variants.  Are there
> really non-linux/bsd/osx installations out there where anyone intends to
> build and install python that do -not- have openssl installed
> somewhere? 

Most certainly. Commercial Unix vendors have been very hesitant to
include open source software in any form, as they are worried about
having to maintain it without having control over it.
Sun started recently, but I'm not sure whether you could get a
Sun-packaged OpenSSL with Solaris 8 (say). I would expect it's worse for
AIX and HP-UX, although IBM's recent open-source strategy may have made
life easier for AIX users.

> We should not limit ourselves to only md5 if we do that, lets guarantee
> that md5, sha1 - sha512 are available on all future python installs; its
> not difficult.  I'll do the work if we need it.

Ok - start with the buildbots. It's easy to see whether it works; if it
doesn't, you can probably get accounts on the machines to see whether
OpenSSL is included, or some guideline from people familiar with
the systems.

Regards,
Martin

From martin at v.loewis.de  Wed Sep  5 18:09:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 05 Sep 2007 18:09:36 +0200
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
Message-ID: <46DED4C0.20406@v.loewis.de>

> There is not.  I can put OpenSSL in my environment

What do you mean by "I can put"? Do you compile it yourself? Why not use
the Sun-provided one?

> The
> 2.5 "What's new" documentation said that hashlib used OpenSSL when
> available, but it appears to be requiring OpenSSL?

That's for 2.5. In 3.0 (currently), hashlib requires OpenSSL.

> The tests that I marked as "fail" in my email are marked as "fail" by
> the unittest framework.

Ah, ok.

Regards,
Martin

From guido at python.org  Wed Sep  5 18:21:36 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 09:21:36 -0700
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
	<52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
Message-ID: 

On 9/5/07, Gregory P. Smith  wrote:
[Guido]
> > Yes, this is a serious issue -- we are totally dependent on openssl
> > for computing MD5 checksums. Several modules use MD5 checksums
> > casually, and it's not good that these fail when openssl isn't
> > available (or if it's too old, like what happened on an ancient Red
> > Hat 7.3 system I have at home). I'm tempted to put the old
> > RSA-copyrighted md5.c back in as a fallback, even though its license
> > is unpopular. Or perhaps we could make a copy of a small fraction of
> > openssl and use that? I think MD5 is the only one that's popular
> > enough to warrant this treatment; I think SHA1 is a distant second.
>
> Every OS I use has openssl installed so i figured someone else had made the
> same decision and removed the non-openssl variants.  Are there really
> non-linux/bsd/osx installations out there where anyone intends to build and
> install python that do -not- have openssl installed somewhere?  That'd be
> sad but in that case we shouldn't abandon them.  Modifying setup.py to find
> it installed in a different place should be easy if thats all it takes.
>
> Rather than resurrecting the old RSA-copyright md5.c I can easily make new
> ones out of the libtomcrypt md5 and sha1 sources the same way i created the
> non-openssl sha256 and sha512 modules.
>
> We should not limit ourselves to only md5 if we do that, lets guarantee that
> md5, sha1 - sha512 are available on all future python installs; its not
> difficult.  I'll do the work if we need it.

I'd appreciate that -- openssl is a fickle dependency.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From theller at ctypes.org  Wed Sep  5 18:51:58 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 05 Sep 2007 18:51:58 +0200
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
Message-ID: 

While we're at solaris, I would appreciate if some solaris expert(s)
could take a look at http://bugs.python.org/issue1777530 

Thanks,
Thomas


From nick.bastin at gmail.com  Wed Sep  5 19:54:57 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 5 Sep 2007 13:54:57 -0400
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: 
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
Message-ID: <66d0a6e10709051054v974178djb5dd589befaa384@mail.gmail.com>

On 9/5/07, Guido van Rossum  wrote:
> On 9/5/07, "Martin v. Löwis" wrote:
> > > This is a combination question-and-status-report email.  The question
> > > would be, what does the "somewhat" tag mean on Solaris support in the
> > > release notes for 3.0a1, and does someone have a list of things that
> > > don't work, or does that just mean it hasn't been tested?
> >
> > Not sure what "somewhat" means, but you can take a look at the build
> > failures in the Solaris buildbot - this is what is "officially" known
> > not to work.
>
> The "somewhat" was my word -- I meant that when I last looked at the
> Solaris buildbot, I saw a few failures; and also that I don't have
> access to Sun hardware. And also what Martin says below. I'd be happy
> though to replace "somewhat" with specific indications of h/w and s/w
> versions if you are willing to commit to supporting these throughout
> the 3.0 life cycle.

I have access to Solaris 8 and 9 on Sparc, and Solaris 10 on x86.  My
Solaris 10 x86 installation is currently in a VM, and it's unpleasant
to work with (performance is terrible for some reason), but I can at
least make a passing attempt to build and run unit tests in that
environment.  I have to have Python on Sparc for my application, so
I'm going to continue to work on Python 3.0 on Solaris 8/9 for Sparc
throughout the entire cycle to make sure that we have a usable product
there.

> > As always with Solaris, there are several dimensions to be considered:
> > - version (2.5,2.6,7,8,9,10,11); not sure what the oldest Solaris
> >   version is that we still want to support.
> > - compiler: gcc vs. SunPRO/Forte
> > - 32 vs. 64 bits
> > - SPARC vs. x86

I will at least build and test the following configurations.  I will
also attempt to fix any platform specific bugs, but I suspect the
Unicode failures are going to create some interesting discussions
around here.  :-)

Solaris 8, 32-bit, 64-bit, Sparc, gcc and SunPro 11
Solaris 9, 32-bit, 64-bit, Sparc, gcc and SunPro 11

I will try to get to:

Solaris 10, 32-bit, x86, gcc

Because there's no reason not to since I have an x86 machine and VMWare.  :-)

> > > If anyone wants more data on any of these particular failures, let me
> > > know, otherwise I'm going to start working through the ones that fail
> > > in 3.0 that don't fail in 2.6.  All of the _md5 failures are because
> > > of the lack of SSL, so I'm not sure that the tests should be 'failing'
> > > in this configuration.
> >
> > I think that's a serious issue to consider. As so much code now depends
> > on OpenSSL, setup.py should try harder to find it. E.g. on the build
> > slave, it can be found in /usr/sfw - not sure whether that is normal
> > on a Solaris 10 installation, and not sure whether there is a
> > Sun-provided OpenSSL on Solaris 8.
> >
> > Notice that the tests don't 'fail', they are skipped. There are also
> > failing test cases, something that is more worrisome than a skipped
> > test case.
>
> Yes, this is a serious issue -- we are totally dependent on openssl
> for computing MD5 checksums. Several modules use MD5 checksums
> casually, and it's not good that these fail when openssl isn't
> available (or if it's too old, like what happened on an ancient Red
> Hat 7.3 system I have at home). I'm tempted to put the old
> RSA-copyrighted md5.c back in as a fallback, even though its license
> is unpopular. Or perhaps we could make a copy of a small fraction of
> openssl and use that? I think MD5 is the only one that's popular
> enough to warrant this treatment; I think SHA1 is a distant second.

MD5 is defined in RFC 1321, so there's no reason to have to use any
particular code with a bad license. There are plenty of LGPL MD5
implementations out there (although you could probably argue that if
their authors ever looked at RFC 1321, which they almost certainly did,
then they've been tainted by the RSA code).

Also, the NIST SHA-1/256/384/512 code is freely available, so there's
no reason to rely on OpenSSL for it either (although it looks like the
PKI reference implementation links that I can find are dead, so we
might have to hunt a little bit).

In either case, we could probably copy the relevant pieces out of OpenSSL.

--
Nick

From greg at krypto.org  Wed Sep  5 21:12:37 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 5 Sep 2007 13:12:37 -0600
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <66d0a6e10709051054v974178djb5dd589befaa384@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
	<66d0a6e10709051054v974178djb5dd589befaa384@mail.gmail.com>
Message-ID: <52dc1c820709051212r4a22a917k47dd6e69c15b591a@mail.gmail.com>

>
> Also, the NIST SHA-1/256/384/512 code is freely available, there's
> also no reason to rely on OpenSSL for it (although it looks like the
> PKI reference implementation links that I can find are dead, so we
> might have to hunt a little bit).
>
> In either case, we could probably copy the relevant pieces out of OpenSSL.


No.  OpenSSL hashlib support was added for a good reason.  Its
implementations are *much* faster, as it includes platform-optimized versions
of all hash algorithms that are continually being updated, tweaked, and
tuned.  OpenSSL itself also doesn't lend itself to cut and paste very well.
libtomcrypt is the ideal completely unencumbered basic C implementation of
all hash and crypto algorithms and is easy to cut from. We already use it
for sha256/512 when needed; I'll do it for the non-openssl md5 and sha1
modules in the next week or so.

Someone could also implement all these hash algorithms in Python.  Bad
idea.  Not what Python is good at. :)

-gps

From brett at python.org  Wed Sep  5 21:49:03 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 5 Sep 2007 12:49:03 -0700
Subject: [Python-3000] Questions about PEP 3121
In-Reply-To: 
References: 
	<46DE8C50.6090600@v.loewis.de>
	
Message-ID: 

On 9/5/07, Guido van Rossum  wrote:
> On 9/5/07, "Martin v. Löwis" wrote:
> > Yes; I'm not certain whether module reloading continues to be supported
> > in Py3k or not. If not, it should be removed from the PEP, if yes, it
> > should be specified.
>
> I'm already missing the reload() builtin, so I think it should be kept
> around in some form. I expect some form of reload functionality will
> remain available, perhaps somewhere in the imp module.

+1 on having imp.reload().
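
For reference, a sketch of what reloading looks like in later releases,
where the function is available as importlib.reload (Python 3.4 and
later). The module name reload_demo_mod and the helper demo_reload are
invented for this example; nothing here is specified by the PEP.

```python
import importlib
import os
import sys
import tempfile


def demo_reload():
    """Import a throwaway module, rewrite its source, and reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reload_demo_mod.py")
        with open(path, "w") as f:
            f.write("VALUE = 1\n")
        sys.path.insert(0, tmp)
        old_flag = sys.dont_write_bytecode
        # Skip .pyc files so the reload always recompiles the source.
        sys.dont_write_bytecode = True
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("reload_demo_mod")
            before = mod.VALUE
            with open(path, "w") as f:
                f.write("VALUE = 22\n")
            reloaded = importlib.reload(mod)
            # reload() re-executes the source in the same module object.
            return before, mod.VALUE, reloaded is mod
        finally:
            sys.dont_write_bytecode = old_flag
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)


if __name__ == "__main__":
    print(demo_reload())
```

For an extension module, the open question in this thread is what the
C-level hook should do at the point where the pure-Python case simply
re-executes the source.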

-Brett

From oliphant.travis at ieee.org  Wed Sep  5 22:45:55 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 05 Sep 2007 15:45:55 -0500
Subject: [Python-3000] bug in py3k buffer object?
In-Reply-To: 
References: 
Message-ID: 

Lisandro Dalcin wrote:
> Dear Travis, in my MPI wrappers, I use MPI_Alloc_mem function to get
> 'special' MPI memory, and next I return it to Python using
> 
> return PyBuffer_FromReadWriteMemory(ptr, len);
> 
> Well, getting back this rw-buffer in python, I tried to do
> 
> mem = MPI.Alloc_mem(10)
> mem[:] = str8('\0') * 8 # sort of memzero
> 
> but then I get this error:
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: buffer is read-only
> 
> 
> I noticed you use PyBUF_SIMPLE in
> buffer_ass_item/buffer_ass_subscript... Is this OK? Perhaps
> PyBUF_WRITEABLE is the right flag? Not much more time to go deeper.
> 

Yes, I see the problem.  The problem is with get_buf not setting
view->readonly when the buffer object has a NULL base (i.e. its own
memory).
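
The effect of the read-only flag can be illustrated from Python with
memoryview, which exposes it as the readonly attribute. This is a sketch
against current Python 3, not the 2007 py3k buffer object, and the helper
name try_zero_fill is made up for the example.

```python
def try_zero_fill(buf):
    """Try to overwrite every byte of buf with zeros; report success."""
    view = memoryview(buf)
    try:
        # Slice assignment needs a writable exporter of the same length.
        view[:] = bytes(len(view))
    except TypeError:
        # Raised when the underlying buffer is marked read-only.
        return False
    return True


writable = bytearray(b"\xff" * 8)  # exports a writable buffer
frozen = bytes(b"\xff" * 8)        # exports a read-only buffer

if __name__ == "__main__":
    print(memoryview(writable).readonly, try_zero_fill(writable))
    print(memoryview(frozen).readonly, try_zero_fill(frozen))
```

A buffer whose read-only flag is set incorrectly would fail the same
slice assignment even though the memory behind it is writable, which
matches the TypeError reported above.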

I'll fix this and check it in as soon as I get on a machine with 
check-in possibilities.

-Travis


From nick.bastin at gmail.com  Wed Sep  5 22:44:22 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 5 Sep 2007 16:44:22 -0400
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <52dc1c820709051212r4a22a917k47dd6e69c15b591a@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
	<66d0a6e10709051054v974178djb5dd589befaa384@mail.gmail.com>
	<52dc1c820709051212r4a22a917k47dd6e69c15b591a@mail.gmail.com>
Message-ID: <66d0a6e10709051344u261bc56bsd1e1369a0cadd5c0@mail.gmail.com>

On 9/5/07, Gregory P. Smith  wrote:
> No.  OpenSSL hashlib support was added for a good reason.  Its
> implementations are *much* faster as it includes platform optimized versions
> of all hash algorithms that are continually being updated tweaked and tuned.
>  OpenSSL itself also doesn't lend itself to cut and paste very well.
> libtomcrypt is the ideal completely unencumbered basic C implementation of
> all hash and crypto algorithms and is easy to cut from. We already use it
> for sha256/512 when needed, i'll do it for the non-openssl md5 and sha1
> modules in the next week or so.

I don't care where you get them from.. :-)  I would pull them from
NIST myself for the SHA code, and just take the md5 code from the RFC
(because I would argue that anyone who has implemented their own md5
algorithm is tainted by the RFC code anyhow), and play by the
copyright notice.

My interest would be in just maintaining the capability, and if you
want it optimized, there's no reason for us to maintain that ourselves
outside of the OpenSSL code base.

--
Nick

From brett at python.org  Thu Sep  6 01:38:06 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 5 Sep 2007 16:38:06 -0700
Subject: [Python-3000] Google spreadsheet to collaborate on backporting Py3K
	stuff to 2.6
Message-ID: 

Neal, Anthony, Thomas W., and I have a spreadsheet that was started to
keep track of what needs to be done in 2.6
for Py3K transitioning:
http://spreadsheets.google.com/pub?key=pCKY4oaXnT81FrGo3ShGHGg .  I am
opening the spreadsheet up to everyone so that others can help
maintain it.

There is a sheet in the Python 3000 Tasks spreadsheet that should be
merged into this spreadsheet and then deleted.  If anyone wants to
help with that it would be great (once something has been moved from
"Python 3000 Tasks" to "Python 2 -> 3 transition" just delete it from
"Python 3000 Tasks").

Because Neal created this spreadsheet he is the only one who can open
editing to everyone.  If you would like to have edit abilities to the
spreadsheet just reply to this email saying you want an invite and I
will add you manually (and if you want a different address added just
say so).

-Brett

From guido at python.org  Thu Sep  6 07:06:06 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 22:06:06 -0700
Subject: [Python-3000] test__locale failing on Red Hat 7.3 system for et_EE
	locale
Message-ID: 

test__locale (that's two underscores, testing _locale.c) fails on my
Red Hat 7.3 box. Further investigation shows that it's because the
et_EE locale (Estonia(n)) defines the thousands separator as '\xa0'
(no-break space U+00A0). Both localeconv() and nl_langinfo() use
PyUnicode_FromString() which assumes UTF-8, and hence the decoding
fails.

On my OSX box, the thousands separator in the et_EE locale is a
regular space. On a Red Hat 9 box I have access to at work it is
'\xa0' as well (tested with Python2.4; I assume Python 3.0 would fail
there too). On my Ubuntu box that locale is unsupported.

I can "fix" it on that particular box by using latin-1 instead, but
that sounds wrong. There's an XXX comment in the code for
nl_langinfo() about possibly converting to wcs (wide character set?).

Any ideas? Removing et_EE from the list of interesting locales in
test__locale.py seems lame.

I did a quick web search and the first few hits are all about an
exchange whereby someone from Estonia asked Red Hat to change the
locale to use 8859-15 and the Red Hat guy point blank refused, saying
it was the Estonians' own fault for having submitted incorrect locale
info a few years before. (But in 8859-15 \xa0 is the same no-break
space character as it is in Latin-1, so this may all be irrelevant.)
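
The failure described above can be reproduced in isolation. This is a
minimal sketch, assuming only that the locale hands back the single byte
0xA0 (the variable names are illustrative, not taken from _locale.c):
decoding it as UTF-8 raises, while Latin-1 and ISO 8859-15 both map it to
U+00A0.

```python
# The thousands separator as a raw byte, as reported for et_EE above.
raw = b"\xa0"

# 0xA0 is a UTF-8 continuation byte, so it cannot start a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Both single-byte encodings map 0xA0 to NO-BREAK SPACE (U+00A0).
latin1 = raw.decode("latin-1")
latin9 = raw.decode("iso8859-15")

print(utf8_ok, repr(latin1), repr(latin9))
```

This only shows why a hard-coded UTF-8 assumption breaks; it does not say
which encoding the C code ought to use for a given locale.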

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From noamraph at gmail.com  Thu Sep  6 09:15:31 2007
From: noamraph at gmail.com (Noam Raphael)
Date: Thu, 6 Sep 2007 10:15:31 +0300
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: <46DD7518.7070108@gmail.com>
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<46DD7518.7070108@gmail.com>
Message-ID: 

(Sorry, it turns out that I posted this reply only to Nick and not to
the list, so I post it again.)

On 9/4/07, Nick Coghlan  wrote:
> Containment and iteration really do need to be kept consistent and
> having the value matter when checking for dictionary containment would
> be outright bizarre. Put the two together and it makes sense for
> dictionary iteration and containment tests to both be based on keys.
>
I absolutely agree that containment and iteration should be kept consistent.

I suggest (again, ignoring backwards compatibility completely), that
"in" would behave according to the iteration, that is, check if the
tuple (key, value) is in dict.items(). If you prefer code:

class DreamDict(dict):
   def __iter__(self):
       return self.iteritems()
   def __contains__(self, (key, value)):
       try:
           myvalue = self[key]
       except KeyError:
           return False
       return value == myvalue

Indeed, the suggested "in" operator is not very useful, so you'll
usually use has_key. But I actually think that "d.has_key(k)" is
clearer than "k in d" - There's no "syntactic" reason why "k in d"
should mean "k in d.keys()" and not "k in d.values()".*
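
The class above relies on tuple parameters and iteritems(), neither of
which exists in Python 3. A hedged Python 3 rendering of the same idea,
assuming the item passed to "in" is always a (key, value) pair, might
look like this:

```python
class DreamDict(dict):
    """Dict whose iteration and containment are both based on items."""

    def __iter__(self):
        # Iterate over (key, value) pairs instead of keys.
        return iter(self.items())

    def __contains__(self, item):
        # Assumption: item is a 2-tuple; anything else raises here.
        key, value = item
        try:
            myvalue = self[key]
        except KeyError:
            return False
        return value == myvalue


d = DreamDict(a=1)
print(("a", 1) in d, ("a", 2) in d, ("b", 1) in d, list(d))
```

Note that a bare key such as "a" in d no longer works with this class,
which is exactly the trade-off being debated in this thread.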

Noam

From krstic at solarsail.hcs.harvard.edu  Thu Sep  6 09:49:44 2007
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Thu, 6 Sep 2007 03:49:44 -0400
Subject: [Python-3000] 3.0 crypto (was: Re: Solaris support in 3.0?)
In-Reply-To: <46DED4C0.20406@v.loewis.de>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
Message-ID: <5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>

On Sep 5, 2007, at 12:09 PM, Martin v. Löwis wrote:
> That's for 2.5. In 3.0 (currently), hashlib requires OpenSSL.

On the wider subject of crypto in Python, is there someone who  
actively takes care of this area and who could clarify any legal/ 
export restrictions on what gets included with the source distribution?

There's good-quality, suitably licensed crypto code out there  
implementing most of the major ciphers, hashes, and asymmetric  
cryptosystems. I'd love it if we included a real set of crypto  
batteries with 3.0 that didn't depend on outside libraries, and  
provided more than just a hash or two. Doing the work isn't a  
problem. Is legalese?

--
Ivan Krstić  | http://radian.org

From martin at v.loewis.de  Thu Sep  6 10:09:26 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 06 Sep 2007 10:09:26 +0200
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
Message-ID: <46DFB5B6.1020807@v.loewis.de>

> On the wider subject of crypto in Python, is there someone who actively
> takes care of this area and who could clarify any legal/export
> restrictions on what gets included with the source distribution?

The PSF does (more specifically, the PSF board, and even more
specifically, Tim Peters). We have registered Python with the U.S. BXA
(or whatever the name of this agency is), allowing export of Python
from the U.S. to all countries (with a few exceptions, I believe).

This is, of course, fairly immaterial, as both the Python source
code and the Python releases are located on a server in the Netherlands,
so downloading it from www.python.org is not an export from the U.S.

There are more issues, of course: some countries restrict the use
of cryptography. France is given as an example: you need to register
your cryptography keys with the government (SCSSI) before you can
use confidentiality-oriented algorithms, IIUC.

> There's good-quality, suitably licensed crypto code out there
> implementing most of the major ciphers, hashes, and asymmetric
> cryptosystems. I'd love it if we included a real set of crypto batteries
> with 3.0 that didn't depend on outside libraries, and provided more than
> just a hash or two. Doing the work isn't a problem. Is legalese?

Why do you say that doing the work is not a problem? I see it as
a major problem.

In addition, other people also see other problems, like size of the
distribution, fear of cryptography in general, and so on.

Regards,
Martin

From p.f.moore at gmail.com  Thu Sep  6 10:29:22 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 6 Sep 2007 09:29:22 +0100
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
	<52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
Message-ID: <79990c6b0709060129s458f6ce4t71e128a4a4f6e2dd@mail.gmail.com>

On 05/09/07, Gregory P. Smith  wrote:
> Rather than resurrecting the old RSA-copyright md5.c I can easily make new
> ones out of the libtomcrypt md5 and sha1 sources the same way i created the
> non-openssl sha256 and sha512 modules.

Which reminds me - when I build Python 3 (on an Ubuntu box) with
openssl installed, I get a message about _sha256 and _sha512 not being
built. Presumably this is intentional? (It looks a bit odd, and I
spent a while trying to work out what dependencies I needed before
realising it was probably OK).
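
Whichever backend ends up providing the hashes, calling code looks the
same. A small sketch, assuming only the documented hashlib constructors
(the input bytes are made up for illustration):

```python
import hashlib

data = b"python-3000"  # arbitrary sample input

# The named constructor and the generic constructor should agree,
# regardless of which implementation provides sha256.
direct = hashlib.sha256(data).hexdigest()
generic = hashlib.new("sha256", data).hexdigest()

print(direct == generic, len(direct))
```

If this runs, sha256 is available from some implementation, which is a
quick way to check whether a "not built" message matters in practice.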

Paul.

From krstic at solarsail.hcs.harvard.edu  Thu Sep  6 12:03:45 2007
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Thu, 6 Sep 2007 06:03:45 -0400
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <46DFB5B6.1020807@v.loewis.de>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
Message-ID: <308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>

On Sep 6, 2007, at 4:09 AM, Martin v. Löwis wrote:
> There are more issues, of course: some countries restrict the use
> of cryptography. France is given as an example: you need to register
> your cryptography keys with the government (SCSSI) before you can
> use confidentiality-oriented algorithms, IIUC.

This gets at what most interests me -- namely, whether there's a  
strong legal barrier to including more crypto with Python than just  
the hashes we have at the moment. It sounds like the answer is 'yes',  
but what are the details?

> Why do you say that doing the work is not a problem? I see it as
> a major problem.

I'm willing to either do the work myself, or have someone else from  
the secops team at OLPC do it.

> In addition, other people also see other problems, like size of the
> distribution, fear of cryptography in general, and so on.

The distribution size issue can be mitigated by a reasonable choice  
of supported primitives. I don't think we need to ship the crypto  
kitchen sink with Python; we can disqualify known-broken algorithms  
that many libraries still ship, etc.

--
Ivan Krstić  | http://radian.org

From thomas at python.org  Thu Sep  6 12:13:33 2007
From: thomas at python.org (Thomas Wouters)
Date: Thu, 6 Sep 2007 12:13:33 +0200
Subject: [Python-3000] Default dict iterator should have been iteritems()
In-Reply-To: 
References: 
	<46DD25A6.6070504@canterbury.ac.nz>
	
	
	<46DD7518.7070108@gmail.com>
	
Message-ID: <9e804ac0709060313x6b142672xa84f56cd54a3c5a2@mail.gmail.com>

On 9/6/07, Noam Raphael  wrote:
>
> (Sorry, it turns out that I posted this reply only to Nick and not to
> the list, so I post it again.)
>
> On 9/4/07, Nick Coghlan  wrote:
> > Containment and iteration really do need to be kept consistent and
> > having the value matter when checking for dictionary containment would
> > be outright bizarre. Put the two together and it makes sense for
> > dictionary iteration and containment tests to both be based on keys.
> >
> I absolutely agree that containment and iteration should be kept
> consistent.
>
> I suggest (again, ignoring backwards compatibility completely), that
> "in" would behave according to the iteration, that is, check if the
> tuple (key, value) is in dict.items(). If you prefer code:
>
> class DreamDict(dict):
>    def __iter__(self):
>        return self.iteritems()
>    def __contains__(self, (key, value)):
>        try:
>            myvalue = self[key]
>        except KeyError:
>            return False
>        return value == myvalue
>
> Indeed, the suggested "in" operator is not very useful, so you'll
> usually use has_key. But I actually think that "d.has_key(k)" is
> clearer than "k in d" - There's no "syntactic" reason why "k in d"
> should mean "k in d.keys()" and not "k in d.values()".*


None of what you're saying is new. It's all been said back when iteration
and containment testing were added to the dict type. The choice was
explicitly made for the useful containment test, and the conforming
iteration behaviour. The iteration is not actually less useful, it's just
different. The net result of 'more useful + just as useful' is 'more
useful'. I don't believe the actual experience in the three major releases
since it was added has convinced anyone that it's a bad idea (in fact, I
had slight misgivings back then, but none whatsoever now). The mapping
types simply don't act as containers of (key, value) pairs.

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From martin at v.loewis.de  Thu Sep  6 12:18:54 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 06 Sep 2007 12:18:54 +0200
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>	
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
	<308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
Message-ID: <46DFD40E.8010705@v.loewis.de>

> This gets at what most interests me -- namely, whether there's a strong
> legal barrier to including more crypto with Python than just the hashes
> we have at the moment. It sounds like the answer is 'yes', but what are
> the details?

The export permission allows for exporting "mass-market" software;
anything you can come up with likely qualifies. We need to report
precisely what is included (i.e. what files contain the crypto code).
So with any release that adds new crypto features, a new report to BXA
would formally be necessary.

>> Why do you say that doing the work is not a problem? I see it as
>> a major problem.
> 
> I'm willing to either do the work myself, or have someone else from the
> secops team at OLPC do it.

It's not something that a single person can do well. You will also need
to design APIs, and that traditionally involves the community. If you
create something ad-hoc, I would request that this first gets
field-proven for a few years before being included in the standard
distribution. Then, it would face competition from existing
solutions of that kind.

> The distribution size issue can be mitigated by a reasonable choice of
> supported primitives. I don't think we need to ship the crypto kitchen
> sink with Python; we can disqualify known-broken algorithms that many
> libraries still ship, etc.

Sounds like a PEP topic.

Regards,
Martin

From ndbecker2 at gmail.com  Thu Sep  6 14:33:09 2007
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 06 Sep 2007 08:33:09 -0400
Subject: [Python-3000] pep-0362?
Message-ID: 

http://www.python.org/dev/peps/pep-0362/

This would be helpful for boost::python.  Any thoughts on approving this for
python-3k?


From fdrake at acm.org  Thu Sep  6 14:50:34 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 6 Sep 2007 08:50:34 -0400
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
Message-ID: <3D0ACD2D-EDBC-40DC-93EB-A3B264567B85@acm.org>

On Sep 6, 2007, at 8:33 AM, Neal Becker wrote:
> http://www.python.org/dev/peps/pep-0362/
>
> This would be helpful for boost::python.  Any thoughts on approving  
> this for
> python-3k?

The var_args and var_kw_args definitions are a little weird.  Why use  
the empty string instead of None when they aren't used in the signature?

Also, the post-history is blank; perhaps this still needs to be  
presented to the community for review and discussion?  Or perhaps the  
field in the PEP needs to be filled in.


   -Fred

-- 
Fred Drake   




From skip at pobox.com  Thu Sep  6 15:46:44 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 6 Sep 2007 08:46:44 -0500
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
Message-ID: <18144.1220.785244.174063@montanaro.dyndns.org>


    Neal> This would be helpful for boost::python.  Any thoughts on
    Neal> approving this for python-3k?

I haven't read it, but it seems very similar to the new annotations
capability in py3k (pep 3107).  Will that not suffice?

Skip


From skip at pobox.com  Thu Sep  6 15:48:01 2007
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 6 Sep 2007 08:48:01 -0500
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
Message-ID: <18144.1297.46506.699543@montanaro.dyndns.org>


    > I haven't read it, but it seems very similar to the new annotations
    > capability in py3k (pep 3107).  Will that not suffice?

Which I notice has a "Requires: 362" field.  Perhaps you're good to go. ;-)

Skip



From guido at python.org  Thu Sep  6 16:54:04 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 6 Sep 2007 07:54:04 -0700
Subject: [Python-3000] 3.0 crypto (was: Re: Solaris support in 3.0?)
In-Reply-To: <5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
Message-ID: 

[Adding Greg P Smith who owns the hashes, and Bill Janssen who has
recently taken over our SSL support.]

Traditionally this is something for which the core developers haven't
had an inclination, so it's been left to 3rd party packages. The
position of the US government on crypto export hasn't helped - at some
point we felt the need to even ask for permission to include code in
the source code that would link to 3rd party crypto libraries, even if
we weren't distributing those libraries (e.g. openssl). I think this
has calmed down some but I don't know if the requirement to register
anything to do with crypto is completely gone; the PSF generally
doesn't want to bother with such red tape.

I'm not sure what you meant by "doing the work isn't a problem". Are
you volunteering? I think we need someone who understands the red tape
situation most of all. Hopefully I'm worried for nothing.

--Guido

On 9/6/07, Ivan Krstić  wrote:
> On Sep 5, 2007, at 12:09 PM, Martin v. Löwis wrote:
> > That's for 2.5. In 3.0 (currently), hashlib requires OpenSSL.
>
> On the wider subject of crypto in Python, is there someone who
> actively takes care of this area and who could clarify any legal/
> export restrictions on what gets included with the source distribution?
>
> There's good-quality, suitably licensed crypto code out there
> implementing most of the major ciphers, hashes, and asymmetric
> cryptosystems. I'd love it if we included a real set of crypto
> batteries with 3.0 that didn't depend on outside libraries, and
> provided more than just a hash or two. Doing the work isn't a
> problem. Is legalese?
>
> --
> Ivan Krstić  | http://radian.org


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Thu Sep  6 16:54:58 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 6 Sep 2007 15:54:58 +0100
Subject: [Python-3000] pep-0362?
In-Reply-To: <18144.1297.46506.699543@montanaro.dyndns.org>
References: 
	<18144.1297.46506.699543@montanaro.dyndns.org>
Message-ID: <79990c6b0709060754m3405cc23o77d014d2c59908ae@mail.gmail.com>

On 06/09/07, skip at pobox.com  wrote:
>
>    > I haven't read it, but it seems very similar to the new annotations
>    > capability in py3k (pep 3107).  Will that not suffice?
>
> Which I notice has a "Requires: 362" field.  Perhaps you're good to go. ;-)

Apparently not (yet, at least).

>\Apps\Python30\python.exe
Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> def f(): pass
...
>>> f.__signature__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'function' object has no attribute '__signature__'
>>> signature(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'signature' is not defined

Paul.

From brett at python.org  Thu Sep  6 19:41:07 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 6 Sep 2007 10:41:07 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: <3D0ACD2D-EDBC-40DC-93EB-A3B264567B85@acm.org>
References: 
	<3D0ACD2D-EDBC-40DC-93EB-A3B264567B85@acm.org>
Message-ID: 

On 9/6/07, Fred Drake  wrote:
> On Sep 6, 2007, at 8:33 AM, Neal Becker wrote:
> > http://www.python.org/dev/peps/pep-0362/
> >
> > This would be helpful for boost::python.  Any thoughts on approving
> > this for
> > python-3k?
>
> The var_args and var_kw_args definitions are a little weird.  Why use
> the empty string instead of None when they aren't used in the signature?
>

I think because when it was designed there were discussions going on
about not having different behavior based on types or something.

> Also, the post-history is blank; perhaps this still needs to be
> presented to the community for review and discussion?  Or perhaps the
> field in the PEP needs to be filled in.

The open issues were brought up on python-dev but they were never resolved.

-Brett

From brett at python.org  Thu Sep  6 19:43:04 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 6 Sep 2007 10:43:04 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: <18144.1297.46506.699543@montanaro.dyndns.org>
References: 
	<18144.1297.46506.699543@montanaro.dyndns.org>
Message-ID: 

On 9/6/07, skip at pobox.com  wrote:
>
>     > I haven't read it, but it seems very similar to the new annotations
>     > capability in py3k (pep 3107).  Will that not suffice?
>
> Which I notice has a "Requires: 362" field.  Perhaps you're good to go. ;-)

I think that is there because an original version of PEP 3107 put all
of the annotation information into the Signature object and not
directly on to the function.

-Brett

From brett at python.org  Thu Sep  6 19:44:10 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 6 Sep 2007 10:44:10 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: <18144.1220.785244.174063@montanaro.dyndns.org>
References: 
	<18144.1220.785244.174063@montanaro.dyndns.org>
Message-ID: 

On 9/6/07, skip at pobox.com  wrote:
>
>     Neal> This would be helpful for boost::python.  Any thoughts on
>     Neal> approving this for python-3k?
>
> I haven't read it, but it seems very similar to the new annotations
> capability in py3k (pep 3107).  Will that not suffice?

There are different ideas here.  Signature objects are meant to
collect all of the various pieces of information about parameters into
a single place for easier introspection.  Annotations are just a part
of what is exposed for introspection.
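
The distinction can be illustrated with the form PEP 362 eventually took
in released Python (inspect.signature, available from Python 3.3). This
is a sketch using that later API, not the draft attributes discussed in
this thread; the function f is made up for the example:

```python
import inspect


def f(a: int, *args, b: str = "x", **kwargs) -> bool:
    return True


sig = inspect.signature(f)

# Annotations alone: a plain dict stored on the function.
print(f.__annotations__)

# Signature: names, kinds, defaults and annotations in one object.
for name, param in sig.parameters.items():
    print(name, param.kind, param.default, param.annotation)
```

The annotations dict says nothing about parameter order, kinds or
defaults; the Signature object carries all of those alongside the
annotations.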

-Brett

From qrczak at knm.org.pl  Thu Sep  6 20:58:41 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Thu, 06 Sep 2007 20:58:41 +0200
Subject: [Python-3000] Default dict iterator should have been	iteritems()
In-Reply-To: 
References: 
	
Message-ID: <1189105122.15072.29.camel@qrnik>

On 04-09-2007, Tue at 11:09 +0200, Georg Brandl wrote:

> Even if it's true that a loop over items is more common than a loop over keys,
> "x in keys" is much more common than "x in items".

In my language iterating over dict yields (key,value) pairs, but the
equivalent of "x in dict" checks whether a key is present.

My Kogut<->Python binding is smart enough to convert these conventions
(which needed some work anyway because tuples could not be converted
implicitly between the languages). An ugly part of the conversion was
distinguishing between Python dictionaries, sequences and sets by the
presence of some methods. For the curious, bits of the binding are at
http://kokogut.cvs.sourceforge.net/kokogut/kokogut/lib/Python/Foreign/Python/Collection.ko?view=markup
http://kokogut.cvs.sourceforge.net/kokogut/kokogut/lib/Python/Foreign/Python/KogutObject.ko?view=markup

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From jjb5 at cornell.edu  Thu Sep  6 21:40:57 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Thu, 06 Sep 2007 15:40:57 -0400
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
Message-ID: <46E057C9.6000206@cornell.edu>

> http://www.python.org/dev/peps/pep-0362/
> 
> This would be helpful for boost::python.

Speaking of helpful...

     class X:
         def f(self): pass

     class Y(X): pass

...I would like a mechanism to indicate that Y.f is inherited, and I was 
hoping that perhaps that information could be found in its signature.  I 
see that it's not, would it be another PEP to add it?  (It was a bit of 
an eye opener when I first found out that Y.f.im_class wasn't X.)


Joel

From brett at python.org  Thu Sep  6 22:43:02 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 6 Sep 2007 13:43:02 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: <46E057C9.6000206@cornell.edu>
References:  <46E057C9.6000206@cornell.edu>
Message-ID: 

On 9/6/07, Joel Bender  wrote:
> > http://www.python.org/dev/peps/pep-0362/
> >
> > This would be helpful for boost::python.
>
> Speaking of helpful...
>
>      class X:
>          def f(self): pass
>
>      class Y(X): pass
>
> ...I would like a mechanism to indicate that Y.f is inherited, and I was
> hoping that perhaps that information could be found in its signature.  I
> see that it's not, would it be another PEP to add it?  (It was a bit of
> an eye opener when I first found out that Y.f.im_class wasn't X.)

Something like this could go into the 'inspect' module (didn't even
worry about __slots__)::

  def find_def(meth):
    for cls in meth.im_class.mro():
        if meth.im_func.__name__ in cls.__dict__:
            return cls
    else:
        return None

For such a simple addition to inspect you just need a patch that has a
good implementation, thorough unit tests, and a core developer who
thinks it is worthwhile enough to add the functionality.
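
Since im_class and im_func are Python 2 attributes, a Python 3 variant
has to take the class and the attribute name instead of a method object.
The following is a hypothetical sketch along the same lines (the
two-argument signature is an assumption, not an existing inspect
function):

```python
def find_def(cls, name):
    """Return the first class in cls's MRO that defines name itself."""
    for klass in cls.__mro__:
        # vars(klass) only holds attributes defined directly on klass.
        if name in vars(klass):
            return klass
    return None


class X:
    def f(self):
        pass


class Y(X):
    pass


print(find_def(Y, "f"), find_def(Y, "missing"))
```

Like the original, this ignores __slots__ and anything supplied
dynamically through __getattr__.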

-Brett

From collinw at gmail.com  Thu Sep  6 22:49:58 2007
From: collinw at gmail.com (Collin Winter)
Date: Thu, 6 Sep 2007 13:49:58 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
	<18144.1297.46506.699543@montanaro.dyndns.org>
	
Message-ID: <43aa6ff70709061349l4d5cb9e4ge2311efe3267e700@mail.gmail.com>

On 9/6/07, Brett Cannon  wrote:
> On 9/6/07, skip at pobox.com  wrote:
> >
> >     > I haven't read it, but it seems very similar to the new annotations
> >     > capability in py3k (pep 3107).  Will that not suffice?
> >
> > Which I notice has a "Requires: 362" field.  Perhaps you're good to go. ;-)
>
> I think that is there because an original version of PEP 3107 put all
> of the annotation information into the Signature object and not
> directly on to the function.

Correct. I'll remove the references to 362 from PEP 3107.

Collin Winter

From guido at python.org  Thu Sep  6 23:10:44 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 6 Sep 2007 14:10:44 -0700
Subject: [Python-3000] [Python-Dev] Google spreadsheet to collaborate on
	backporting Py3K stuff to 2.6
In-Reply-To: 
References: 
Message-ID: 

I've transferred everything from my spreadsheet to Neal's.

On 9/5/07, Brett Cannon  wrote:
> Neal, Anthony, Thomas W., and I have a spreadsheet that was started to
> keep track of what needs to be done in 2.6 for Py3K transitioning:
> http://spreadsheets.google.com/pub?key=pCKY4oaXnT81FrGo3ShGHGg .  I am
> opening the spreadsheet up to everyone so that others can help
> maintain it.
>
> There is a sheet in the Python 3000 Tasks spreadsheet that should be
> merged into this spreadsheet and then deleted.  If anyone wants to
> help with that it would be great (once something has been moved from
> "Python 3000 Tasks" to "Python 2 -> 3 transition" just delete it from
> "Python 3000 Tasks").
>
> Because Neal created this spreadsheet he is the only one who can open
> editing to everyone.  If you would like to have edit abilities to the
> spreadsheet just reply to this email saying you want an invite and I
> will add you manually (and if you want a different address added just
> say so).
>
> -Brett


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Thu Sep  6 23:49:42 2007
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 6 Sep 2007 14:49:42 -0700
Subject: [Python-3000] pep-0362?
In-Reply-To: 
References: 
Message-ID: <20070906214942.GB439@panix.com>

On Thu, Sep 06, 2007, Neal Becker wrote:
>
> http://www.python.org/dev/peps/pep-0362/
> 
> This would be helpful for boost::python.  Any thoughts on approving this for
> python-3k?

What would be helpful IMO is using a Subject: line that doesn't require
using a browser to find out what the thread is about.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
http://www.lysator.liu.se/c/ten-commandments.html

From nick.bastin at gmail.com  Fri Sep  7 18:29:44 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Fri, 7 Sep 2007 12:29:44 -0400
Subject: [Python-3000] Performance Notes
In-Reply-To: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com>
References: <66d0a6e10709031154x6ea3d235ya894014ecdf546a2@mail.gmail.com>
Message-ID: <66d0a6e10709070929p6897f69cq940655b2fd46ac0b@mail.gmail.com>

On 9/3/07, Nicholas Bastin  wrote:
> NOTE:  This data is time sampling, not call graph.  Added time could
> come from either more calls, or longer calls.
>
> +312.9% PyDict_GetItem

I've finally managed to get call graph data and it's fairly
interesting for this call.  I'll try to find some way to post all of the
data at some point, but I thought some initial data might be useful.

Calls to PyDict_GetItem in 2.6 (pystone.py 10000):

160839 - instance_getattr2
30325 - class_lookup
5545 - PyString_InternInPlace
4808 - update_one_slot
2290 - PyObject_GenericGetAttr
...
Total: 208697

3.0 (pystone.py 10000):

575093 - PyEval_EvalFrameEx
416600 - PyObject_GenericGetAttr
321447 - PyObject_GenericSetAttr
25394 - update_one_slot
10142 - lookup_maybe
8925 - PyUnicode_InternInPlace
...
Total: 1368114

Almost all (522631) of the extra calls in PyEval_EvalFrameEx are
because in 2.6 we use the unrolled code in LOAD_GLOBAL, and in 3.0,
LOAD_GLOBAL always falls through to PyDict_GetItem.
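
To put the totals above in proportion, here is a quick back-of-the-envelope
calculation using only the numbers quoted in this message (the variable
names are illustrative):

```python
# Call counts quoted above (pystone.py 10000).
total_26 = 208697
total_30 = 1368114
# Extra PyEval_EvalFrameEx calls attributed to LOAD_GLOBAL.
extra_load_global = 522631

ratio = total_30 / total_26
increase = total_30 - total_26
share = extra_load_global / increase

print(f"3.0 makes {ratio:.2f}x as many PyDict_GetItem calls")
print(f"LOAD_GLOBAL accounts for {share:.0%} of the increase")
```

So the LOAD_GLOBAL fall-through explains a bit under half of the extra
calls; the rest would have to come from the other callers listed.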

I haven't investigated GenericGet/SetAttr yet.

--
Nick

From g.brandl at gmx.net  Fri Sep  7 19:24:10 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 07 Sep 2007 19:24:10 +0200
Subject: [Python-3000] clean out the future?
Message-ID: 

Should the __future__ be cleaned out for 3k, or should all future imports
continue to work and do nothing?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From fdrake at acm.org  Fri Sep  7 19:29:54 2007
From: fdrake at acm.org (Fred Drake)
Date: Fri, 7 Sep 2007 13:29:54 -0400
Subject: [Python-3000] clean out the future?
In-Reply-To: 
References: 
Message-ID: 

On Sep 7, 2007, at 1:24 PM, Georg Brandl wrote:
> Should the __future__ be cleaned out for 3k, or should all future  
> imports
> continue to work and do nothing?

They should continue to work.

One advantage of keeping the existing feature table in the __future__  
module is that it makes it easier to avoid re-using a feature name; I
think there's merit in that.


   -Fred

-- 
Fred Drake   




From nick.bastin at gmail.com  Fri Sep  7 20:30:33 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Fri, 7 Sep 2007 14:30:33 -0400
Subject: [Python-3000] Where is PyUnicodeObject->hash supposed to be set?
Message-ID: <66d0a6e10709071130x600c3383if3718f6ec41395d5@mail.gmail.com>

Before I do a bunch of searching around in the source, perhaps someone
just knows the answer to this question.

A quick trip through the debugger indicates that the reason
PyDict_GetItem is being called 5 million times more often in
PyEval_EvalFrameEx in 3.0 (in pystone 100000) is because while
PyString_CheckExact was swapped out for PyUnicode_CheckExact in
LOAD_GLOBAL, ((PyUnicodeObject*)w)->hash always evaluates to -1, which
punts us down to the non-inline code.  Presumably
((PyStringObject*)w)->ob_shash was already set at this point, which is
why it worked in 2.6 and previous.

Before I spend a lot of time trying to track down where this is
supposed to be getting set (or needs to be set), does anyone
know where this is supposed to happen?

--
Nick

From guido at python.org  Fri Sep  7 20:46:28 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 7 Sep 2007 11:46:28 -0700
Subject: [Python-3000] Where is PyUnicodeObject->hash supposed to be set?
In-Reply-To: <66d0a6e10709071130x600c3383if3718f6ec41395d5@mail.gmail.com>
References: <66d0a6e10709071130x600c3383if3718f6ec41395d5@mail.gmail.com>
Message-ID: 

It should be set in unicode_hash(). If you compare the trunk version
of that function with the py3k branch version, you see that it's been
refactored, and in the refactoring, setting ->hash was omitted. It
should be trivial to put it back.
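
The caching pattern at issue can be sketched in Python. This is only an illustration of the idea, not the C code in unicodeobject.c; the class CachedHashText and its computations counter are hypothetical, and -1 is used as the "not yet computed" marker as in the C struct field.

```python
class CachedHashText:
    """Toy stand-in for a string object that caches its hash (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self._hash = -1          # -1 means "not computed yet"
        self.computations = 0    # counts how often the slow path runs

    def __hash__(self):
        if self._hash != -1:
            return self._hash    # fast path: reuse the cached value
        self.computations += 1
        h = hash(self.value)
        if h == -1:              # keep -1 reserved for "not computed"
            h = -2
        self._hash = h           # the store that went missing in the refactoring
        return h


name = CachedHashText("LOAD_GLOBAL")
hash(name)
hash(name)
print(name.computations)  # the slow path ran only once
```

If the store into the cache field is dropped, every lookup sees -1 and falls back to the slow path, which matches the symptom Nick describes.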

On 9/7/07, Nicholas Bastin  wrote:
> Before I do a bunch of searching around in the source, perhaps someone
> just knows the answer to this question.
>
> A quick trip through the debugger indicates that the reason
> PyDict_GetItem is being called 5 million times more often in
> PyEval_EvalFrameEx in 3.0 (in pystone 100000) is because while
> PyString_CheckExact was swapped out for PyUnicode_CheckExact in
> LOAD_GLOBAL, ((PyUnicodeObject*)w)->hash always evaluates to -1, which
> punts us down to the non-inline code.  Presumably
> ((PyStringObject*)w)->ob_shash was already set at this point, which is
> why it worked in 2.6 and previous.
>
> Before I spend a lot of time trying to track down where this is
> supposed to be getting set (or, needs to be being set), does anyone
> know where this is supposed to happen?
>
> --
> Nick
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Fri Sep  7 20:48:18 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 7 Sep 2007 11:48:18 -0700
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
	<308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
Message-ID: <52dc1c820709071148l2c3061f9l14c929657ef7e397@mail.gmail.com>

On 9/6/07, Ivan Krstić  wrote:
>
> On Sep 6, 2007, at 4:09 AM, Martin v. Löwis wrote:
> > There are more issues, of course: some countries restrict the use
> > of cryptography. France is given as an example: you need to register
> > your cryptography keys with the government (SCSSI) before you can
> > use confidentiality-oriented algorithms, IIUC.
>
> This gets at what most interests me -- namely, whether there's a
> strong legal barrier to including more crypto with Python than just
> the hashes we have at the moment. It sounds like the answer is 'yes',
> but what are the details?


fwiw hashes are not cryptography.

> The distribution size issue can be mitigated by a reasonable choice
> of supported primitives. I don't think we need to ship the crypto
> kitchen sink with Python; we can disqualify known-broken algorithms
> that many libraries still ship, etc.


I see nothing wrong with leaving pycrypto as an add-on library as most
things don't need it.  http://www.amk.ca/python/code/crypto.

The pycrypto API is very nice.  But if we were to consider it for the
standard library I'd prefer that it just link against OpenSSL rather than use
its own C implementations, and simply leave platforms without SSL without any
crypto.

Besides, chances are that most programmers seeing a crypto library will
misuse it and gain a false sense of security about what they've done. ;)

From greg at krypto.org  Fri Sep  7 22:45:58 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 7 Sep 2007 13:45:58 -0700
Subject: [Python-3000] Performance Notes - new hash algorithm
Message-ID: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>

On 9/4/07, Thomas Hunger  wrote:
>
>
> Hello,
>
> I don't know much about python internals, so the following might be
> bogus:
>
> I replaced unicode_hash and string_hash with the hash function from
> here: http://www.azillionmonkeys.com/qed/hash.html.
>
> Then I ran the following micro-benchmark :
>
>     $ time ./python bench.py
>
> where bench.py is:
>
>     f = dict((line, nr) for nr, line
>              in enumerate(open('/usr/share/dict/words',
>                                encoding='latin1').readlines()))
>
> Python3k original hash: real    0m2.210s
>               new hash: real    0m1.842s
>
> So maybe this is an interesting hash function?
>
> Tom


Sounds like a great idea to me.  Can you submit it as a patch?

We should run some more realistic perf tests and profiles but I imagine the
impact will only be good.
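
One way to get steadier numbers than a single run under time is a small timeit harness. This is a sketch under assumptions: build_index and time_build are names invented here, and the synthetic word list stands in for /usr/share/dict/words so the example runs anywhere.

```python
import timeit


def build_index(lines):
    """Same shape as the quoted benchmark: map each line to its line number."""
    return dict((line, nr) for nr, line in enumerate(lines))


def time_build(lines, repeat=5):
    """Return the best wall-clock time of `repeat` single runs, in seconds."""
    return min(timeit.repeat(lambda: build_index(lines), number=1, repeat=repeat))


# Synthetic, closely related keys; a real run would use a word list instead.
words = ["word%06d\n" % i for i in range(100000)]
print("best of 5: %.4fs" % time_build(words))
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the difference being measured is well under a second.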

-gps

From guido at python.org  Fri Sep  7 22:53:45 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 7 Sep 2007 13:53:45 -0700
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
Message-ID: 

I'd like Tim Peters's input on this before we change it. I seem to
recall that there's an aspect of non-randomness to the existing hash
function that's important when you hash many closely related strings,
e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
so long that I may misremember this, and perhaps it was related to the
dictionary implementation.

In any case we need to see the code as a patch, of course.

On 9/7/07, Gregory P. Smith  wrote:
> On 9/4/07, Thomas Hunger  wrote:
> >
> > Hello,
> >
> > I don't know much about python internals, so the following might be
> > bogus:
> >
> > I replaced unicode_hash and string_hash with the hash function from
> > here: http://www.azillionmonkeys.com/qed/hash.html.
> >
> > Then I ran the following micro-benchmark :
> >
> >     $ time ./python bench.py
> >
> > where bench.py is:
> >
> >     f = dict((line, nr) for nr, line
> >              in enumerate(open('/usr/share/dict/words',
> >                                encoding='latin1').readlines()))
> >
> > Python3k original hash: real    0m2.210s
> >               new hash: real    0m1.842s
> >
> > So maybe this is an interesting hash function?
> >
> > Tom
>
> Sounds like a great idea to me.  Can you submit it as a patch?
>
> We should run some more realistic perf tests and profiles but I imagine the
> impact will only be good.
>
> -gps
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Fri Sep  7 23:13:31 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Fri, 7 Sep 2007 17:13:31 -0400
Subject: [Python-3000] Where is PyUnicodeObject->hash supposed to be set?
In-Reply-To: 
References: <66d0a6e10709071130x600c3383if3718f6ec41395d5@mail.gmail.com>
	
Message-ID: <66d0a6e10709071413q2d7532edh6d94f43a0e790b81@mail.gmail.com>

On 9/7/07, Guido van Rossum  wrote:
> It should be set in unicode_hash(). If you compare the trunk version
> of that function with the py3k branch version, you see that it's been
> refactored, and in the refactoring, setting ->hash was omitted. It
> should be trivial to put it back.

Putting it back nets an average 1.8% performance gain for pystone, but
probably there were other cases that were extremely bad given this
behaviour.  We're still left with another 5 million 'extra' calls to
PyDict_GetItem in 3.0 over 2.6 in a 100000 cycle pystone run, so I'll
look around into those, but I suspect none of them will generate any
larger performance gain.

Someone with more experience than I in the 3.0 development cycle will
be able to determine what macro-level optimizations / refactoring make
sense, and what design decisions we're just going to have to pay for.
At the moment (and probably for the foreseeable moments), I'm focusing
on small improvements across the codebase.

--
Nick

From guido at python.org  Fri Sep  7 23:20:30 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 7 Sep 2007 14:20:30 -0700
Subject: [Python-3000] Where is PyUnicodeObject->hash supposed to be set?
In-Reply-To: <66d0a6e10709071413q2d7532edh6d94f43a0e790b81@mail.gmail.com>
References: <66d0a6e10709071130x600c3383if3718f6ec41395d5@mail.gmail.com>
	
	<66d0a6e10709071413q2d7532edh6d94f43a0e790b81@mail.gmail.com>
Message-ID: 

Can you post the full call graph after this fix (thanks Neil S!)
somewhere, or attach it to an email here?

--Guido

On 9/7/07, Nicholas Bastin  wrote:
> On 9/7/07, Guido van Rossum  wrote:
> > It should be set in unicode_hash(). If you compare the trunk version
> > of that function with the py3k branch version, you see that it's been
> > refactored, and in the refactoring, setting ->hash was omitted. It
> > should be trivial to put it back.
>
> Putting it back nets an average 1.8% performance gain for pystone, but
> probably there were other cases that were extremely bad given this
> behaviour.  We're still left with another 5 million 'extra' calls to
> PyDict_GetItem in 3.0 over 2.6 in a 100000 cycle pystone run, so I'll
> look around into those, but I suspect none of them will generate any
> larger performance gain.
>
> Someone with more experience than I in the 3.0 development cycle will
> be able to determine what macro-level optimizations / refactoring make
> sense, and what design decisions we're just going to have to pay for.
> At the moment (and probably for the foreseeable moments), I'm focusing
> on small improvements across the codebase.
>
> --
> Nick


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From qrczak at knm.org.pl  Sat Sep  8 13:59:01 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 08 Sep 2007 13:59:01 +0200
Subject: [Python-3000] Proposed new language for newline parameter
	to	TextIOBase
In-Reply-To: 
References: 
Message-ID: <1189252741.25695.1.camel@qrnik>

On Tue, 14-08-2007 at 21:56 -0700, Guido van Rossum wrote:

> (2) newline='': input with untranslated universal newlines mode; lines
> may end in \r, \n, or \r\n, and these are returned untranslated.
> 
> (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> with the given character(s), and these are translated to \n.

What is the difference between '' and '\n'?

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From guido at python.org  Sat Sep  8 16:27:17 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 8 Sep 2007 07:27:17 -0700
Subject: [Python-3000] Proposed new language for newline parameter to
	TextIOBase
In-Reply-To: <1189252741.25695.1.camel@qrnik>
References: 
	<1189252741.25695.1.camel@qrnik>
Message-ID: 

On 9/8/07, Marcin 'Qrczak' Kowalczyk  wrote:
> On Tue, 14-08-2007 at 21:56 -0700, Guido van Rossum wrote:
>
> > (2) newline='': input with untranslated universal newlines mode; lines
> > may end in \r, \n, or \r\n, and these are returned untranslated.
> >
> > (3) newline='\r', newline='\n', newline='\r\n': input lines must end
> > with the given character(s), and these are translated to \n.
>
> What is the difference between '' and '\n'?

None on output.

On input, "\n" disables universal newline mode altogether ("\r"
doesn't end a line), while "" enables universal newlines for
determining the line ending, but disables the *translation* part,
meaning you will get lines ending in "\r", "\r\n", or "\n" depending
on what's in the input. The default UN mode with translation is easier
for most apps (since it guarantees that lines end in \n like most apps
expect), but the UN mode without translation is handy if you want to
copy the file faithfully (apart from specific edits).
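
The three input behaviours can be compared side by side. This sketch assumes the io module as it was eventually released (TextIOWrapper with a newline argument), which postdates the proposal under discussion; the helper read_lines is a name invented for the example.

```python
import io


def read_lines(data, newline):
    """Decode `data` and split it into lines using the given newline mode."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="latin-1",
                              newline=newline)
    return stream.readlines()


sample = b"one\rtwo\r\nthree\n"

print(read_lines(sample, None))  # universal newlines, every ending becomes "\n"
print(read_lines(sample, ""))    # universal newlines, endings returned as found
print(read_lines(sample, "\n"))  # only "\n" ends a line; "\r" is ordinary data
```

With newline="" the original endings survive, so writing the lines back out reproduces the file byte for byte apart from any deliberate edits.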

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From qrczak at knm.org.pl  Sat Sep  8 18:45:20 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 08 Sep 2007 18:45:20 +0200
Subject: [Python-3000] python3.0-config uses python2 syntax
Message-ID: <1189269920.25695.3.camel@qrnik>

and fails on print.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From g.brandl at gmx.net  Sat Sep  8 18:52:54 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 08 Sep 2007 18:52:54 +0200
Subject: [Python-3000] python3.0-config uses python2 syntax
In-Reply-To: <1189269920.25695.3.camel@qrnik>
References: <1189269920.25695.3.camel@qrnik>
Message-ID: 

Marcin 'Qrczak' Kowalczyk wrote:
> and fails on print.

Already fixed. :)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From qrczak at knm.org.pl  Sat Sep  8 19:00:39 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 08 Sep 2007 19:00:39 +0200
Subject: [Python-3000] C API for ints and strings
Message-ID: <1189270839.25695.18.camel@qrnik>

I see that PyInt_* functions are aliases for PyLong_*. Which ones
should I use for the long term? There are no PyInt equivalents of
PyLong_FromLongLong nor PyLong_AsLongLong.

Should I continue to use PyUnicode_* functions for the new str?

What is the status of the str8 type? Is it kept temporarily until the
modules are updated to Python3 str, or is it an official immutable bytes
type? Its repr uses s'...' syntax which is not supported by the parser.

Why is _PyLong_FitsInLong private? In order to convert a Python3 int to
another numeric representation, I would like to check if it fits in a C
long, and convert via a string only if it does not. Should I use
PyLong_AsLong + PyErr_Occurred?

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From guido at python.org  Sat Sep  8 19:12:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 8 Sep 2007 10:12:00 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <1189270839.25695.18.camel@qrnik>
References: <1189270839.25695.18.camel@qrnik>
Message-ID: 

On 9/8/07, Marcin 'Qrczak' Kowalczyk  wrote:
> I see that PyInt_* functions are aliases for PyLong_*. Which ones
> should I use for the long term? There are no PyInt equivalents of
> PyLong_FromLongLong nor PyLong_AsLongLong.

Use PyLong for now. Eventually we may rename them all; then we'll
provide a renaming tool or macros.

> Should I continue to use PyUnicode_* functions for the new str?

Correct. Again, eventually we may rename.

> What is the status of the str8 type? Is it kept temporarily until the
> modules are updated to Python3 str, or is it an official immutable bytes
> type? Its repr uses s'...' syntax which is not supported by the parser.

The problem with its repr() is a hint. ;-) It is a temporary hack
until we don't need it any more. During and after the last sprint,
Neal Norwitz did a lot of work towards getting rid of it, but more
needs to be done. Help is welcome!

> Why is _PyLong_FitsInLong private?

I don't know; perhaps because it doesn't always give the best answer.

> In order to convert a Python3 int to
> another numeric representation, I would like to check if it fits in a C
> long, and convert via a string only if it does not. Should I use
> PyLong_AsLong + PyErr_Occurred?

I think either is fine. _PyLong_FitsInLong() will only get better over time. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sat Sep  8 19:27:11 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 08 Sep 2007 19:27:11 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	
Message-ID: <46E2DB6F.2080608@v.loewis.de>

>> Why is _PyLong_FitsInLong private?
> 
> I don't know; perhaps because it doesn't always give the best answer.

Its sole purpose is to support PyInt_CheckExact. There is some code
that relies on it being safe to do PyInt_AsLong after PyInt_CheckExact
succeeds. When I defined PyInt_CheckExact as PyLong_CheckExact,
such code would break. Adding this "conservative" estimate allowed
that code to work when the macro was true. As this occurs in some
time-critical places, I did not want to waste time with computing a
correct result.

Regards,
Martin

From guido at python.org  Sat Sep  8 19:29:09 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 8 Sep 2007 10:29:09 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E2DB6F.2080608@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<46E2DB6F.2080608@v.loewis.de>
Message-ID: 

Hm, then perhaps rangeobject.c shouldn't use it?

On 9/8/07, "Martin v. Löwis"  wrote:
> >> Why is _PyLong_FitsInLong private?
> >
> > I don't know; perhaps because it doesn't always give the best answer.
>
> Its sole purpose is to support PyInt_CheckExact. There is some code
> that relies on it being safe to do PyInt_AsLong after PyInt_CheckExact
> succeeds. When I defined PyInt_CheckExact as PyLong_CheckExact,
> such code would break. Adding this "conservative" estimate allowed
> that code to work when the macro was true. As this occurs in some
> time-critical places, I did not want to waste time with computing a
> correct result.
>
> Regards,
> Martin
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sat Sep  8 19:38:48 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 08 Sep 2007 19:38:48 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>	
		
	<46E2DB6F.2080608@v.loewis.de>
	
Message-ID: <46E2DE28.1040704@v.loewis.de>

> Hm, then perhaps rangeobject.c shouldn't use it?

That use is correct also; the int_range_iter is also an
optimization. It does not matter that the result is not correct;
if one bound is >2**30, it will create a longrangeiter, even though
an int one would still be sufficient.
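
The trade-off between a cheap conservative check and an exact one can be sketched in Python. Both function names are invented here, and the 30-bit cutoff is only an assumption taken from the 2**30 figure above, not a statement about how _PyLong_FitsInLong is coded.

```python
import ctypes

# Exact bounds of a C long on the platform running this sketch.
LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)
LONG_MIN = -(1 << (LONG_BITS - 1))
LONG_MAX = (1 << (LONG_BITS - 1)) - 1


def fits_exactly(n):
    """Precise answer: does n fit in a C long?"""
    return LONG_MIN <= n <= LONG_MAX


def fits_conservatively(n):
    """Cheap answer that may say 'no' too often but never says 'yes' wrongly.

    The 30-bit cutoff is a hypothetical stand-in for the real estimate.
    """
    return -(1 << 30) < n < (1 << 30)


for value in (0, 12345, 2**30, 2**31 - 1, 2**70):
    print(value, fits_conservatively(value), fits_exactly(value))
```

A false "no" only costs speed, because the caller falls back to the general path; a false "yes" would be a correctness bug, which is why the estimate errs in one direction only.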

Regards,
Martin

From nick.bastin at gmail.com  Sat Sep  8 19:41:10 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 8 Sep 2007 13:41:10 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	
Message-ID: <66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>

On 9/8/07, Guido van Rossum  wrote:
> On 9/8/07, Marcin 'Qrczak' Kowalczyk  wrote:
> > I see that PyInt_* functions are aliases for PyLong_*. Which ones
> > should I use for the long term? There are no PyInt equivalents of
> > PyLong_FromLongLong nor PyLong_AsLongLong.
>
> Use PyLong for now. Eventually we may rename them all; then we'll
> provide a renaming tool or macros.
>
> > Why is _PyLong_FitsInLong private?
>
> I don't know; perhaps because it doesn't always give the best answer.
>
> > In order to convert a Python3 int to
> > another numeric representation, I would like to check if it fits in a C
> > long, and convert via a string only if it does not. Should I use
> > PyLong_AsLong + PyErr_Occurred?
>
> I think either is fine. _PyLong_FitsInLong() will only get better over time. :-)

Speaking of PyLong, and its minor awkwardness to work with in C (you
either have to convert to another multiple-precision type through a
string, or use Python's arithmetic operators directly), was there any
thought given to using something like GMP's mpz_t as the backing data
type?

--
Nick

From martin at v.loewis.de  Sat Sep  8 19:44:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 08 Sep 2007 19:44:37 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
Message-ID: <46E2DF85.4090005@v.loewis.de>

> Speaking of PyLong, and its minor awkwardness to work with in C (you
> either have to convert to another multiple-precision type through a
> string, or use Python's arithmetic operators directly), was there any
> thought given to using something like GMP's mpz_t as the backing data
> type?

I never did that.

Regards,
Martin

From janssen at parc.com  Sat Sep  8 21:39:25 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 8 Sep 2007 12:39:25 PDT
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <46DFD40E.8010705@v.loewis.de> 
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
	<308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
	<46DFD40E.8010705@v.loewis.de>
Message-ID: <07Sep8.123933pdt."57996"@synergy1.parc.xerox.com>

> >> Why do you say that doing the work is not a problem? I see it as
> >> a major problem.
> > 
> > I'm willing to either do the work myself, or have someone else from the
> > secops team at OLPC do it.
> 
> It's not something that a single person can well do. You will also need
> to design APIs, and that traditionally involves the community. If you
> create something ad-hoc, I would request that this first gets
> field-proven for a few years before being included in the standard
> distribution. Then, it would face competition to existing such
> solutions.

We're already linking against the OpenSSL EVP libraries for hashlib
(and against the OpenSSL SSL libraries for the SSL support).  It
wouldn't be hard to expose the EVP functions a bit more, essentially
as hash functions that return long (and reversible) hashes:

   encryptor = opensslevp.encryptor("AES-256-CBC", ...maybe some options...)
   encryptor.update(...some plaintext...)
   ...
   ciphertext = encryptor.digest()
   ...
   decryptor = opensslevp.decryptor("AES-256-CBC", ...maybe some options...)
   decryptor.update(ciphertext)
   plaintext = decryptor.digest()

Take a look at the docs for EVP_EncryptInit_ex.

The crypto would stay in the OpenSSL library; this would just be more
hashing on top of it.
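
For comparison, this is the existing hashlib pattern that the sketch above mirrors. Only hashlib calls that exist today are used; the opensslevp module in the sketch is Bill's proposal and is not shown here, and the helper digest_in_chunks is a name made up for the example.

```python
import hashlib


def digest_in_chunks(chunks, algorithm="sha256"):
    """Feed data incrementally with update(), then read the result once."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


# Feeding the data in pieces gives the same digest as hashing it in one go.
print(digest_in_chunks([b"some ", b"plaintext"]))
print(hashlib.sha256(b"some plaintext").hexdigest())
```

The proposed encryptor and decryptor objects would follow the same create, update, finish life cycle, with the difference that the output can be reversed.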

I'd sure like to have this so I could write a Python decryptor for my
PalmOS password keeper (a program called Strip) which I could run on
my iPhone.  (The iPhone Python has SSL support.)

Bill

From nick.bastin at gmail.com  Sat Sep  8 22:47:56 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 8 Sep 2007 16:47:56 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E2DF85.4090005@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
Message-ID: <66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>

On 9/8/07, "Martin v. Löwis"  wrote:
> > Speaking of PyLong, and its minor awkwardness to work with in C (you
> > either have to convert to another multiple-precision type through a
> > string, or use Python's arithmetic operators directly), was there any
> > thought given to using something like GMP's mpz_t as the backing data
> > type?
>
> I never did that.

Would anyone be opposed to rehosting PyLong on top of GMP?  I'm not
necessarily volunteering to do the work (yet, anyhow), but just trying
to get a read on the feelings of the community.  PyLong has
historically been a bit of a pain to deal with if you embedded or
extended python, or otherwise had to deal with it at the C API level.
With the distinction between int and long being removed at the user
level, it will become more un-pythonic to refuse to accept long
integers in some extensions.

Additionally, something like GMP would likely provide improved
performance, and would be a piece of code, perhaps out of the core
domain knowledge of the core python developers, that we would not have
to maintain.  On the other hand, GMP would become a required library,
not one simply built against if you had it (provided that the issues
with the pervasiveness of the use of OpenSSL are resolved, no external
library is currently required for 'normal' operation of the
interpreter).  Would we want to maintain parallel implementations?
Does this provide a barrier to entry to some platform ports? (I think
not, since it doesn't change the definition of the language, but it's
worth asking).

--
Nick

From martin at v.loewis.de  Sun Sep  9 00:18:10 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 09 Sep 2007 00:18:10 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
Message-ID: <46E31FA2.4060701@v.loewis.de>

> Would anyone be opposed to rehosting PyLong on top of GMP?

I would be opposed. It's LGPL'ed, so you would have to ship GMP sources
with any Python binary that you distribute.

Regards,
Martin


From greg at krypto.org  Sun Sep  9 01:15:58 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 8 Sep 2007 16:15:58 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>
	
Message-ID: <52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>

A new version is attached; cleaned up and simplified based on your original
comments.

On 8/29/07, Guido van Rossum  wrote:
>
> That's a huge patch to land so close before a release. I'm not sure I
> like the immutability API -- it won't be useful unless we add a hash
> method, and then we have all sorts of difficulties again -- the
> distinction between a hashable and an unhashable object should be made
> by type, not by value (tuples containing unhashable values
> notwithstanding).


ok i've removed the immutable support in the most recent patch.  i still
think it -might- be useful but isn't required and you're right that it could
open a can of worms if people think it should also mean hashable.  immutable
bytes may be best implemented as a subclass if it's ever wanted.
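
Both points, hashability decided by type and a buffer being pinned while it is exported, can be observed in later CPython releases. This sketch describes that later behaviour, not the patch attached here; the helper names are invented, and bytearray stands in for the mutable bytes type discussed in this thread.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def resize_blocked_while_exported(buf):
    """Return True if resizing `buf` fails while a memoryview holds it."""
    view = memoryview(buf)       # exports the underlying buffer
    try:
        buf.extend(b"!")         # would need to reallocate the storage
    except BufferError:
        return True
    finally:
        view.release()           # drop the export so buf is usable again
    return False


print(is_hashable(b"abc"))               # immutable type
print(is_hashable(bytearray(b"abc")))    # mutable type
print(resize_blocked_while_exported(bytearray(b"abc")))
```

Keeping the hashable and unhashable cases in separate types means a caller never has to inspect a value to know whether it may be used as a dict key.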

> I don't understand the comment about using PyBUF_WRITABLE in
> _getbuffer() -- this is only used for data we're *reading* and I don't
> think the GIL is even released while we're reading such things.


that appears to be correct.  the comment was wrong.  fixed.

-gps

> If you think it's important to get this in the 3.0a1 release, we
> should pair-program on it ASAP, preferably tomorrow morning.
> Otherwise, let's do a review next week.
>
> --Guido
>
> On 8/29/07, Gregory P. Smith  wrote:
> > Attached is what I've come up with so far.  Only a single field is
> > added to the PyBytesObject struct.  This adds support to the bytes
> > object for PyBUF_LOCKDATA buffer API operation.  bytes objects can be
> > marked temporarily read-only for use while the buffer api has handed
> > them off to something which may run without the GIL (think IO).  Any
> > attempt to modify them during that time will raise an exception as I
> > believe Martin suggested earlier.
> >
> > As an added bonus because it's been discussed here, support for setting
> > a bytes object immutable has been added since it's pretty trivial once
> > the read only export support was in place.  That's not required but was
> > trivial to include.
> >
> > I'd appreciate any feedback.
> >
> > My TODO list for this patch:
> >
> >  0. Get feedback and make adjustments as necessary.
> >
> >  1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal
> >     uses of the _getbuffer() function.  bytesobject.c contains both
> >     readonly and read-write uses of the buffers, i'll add a boolean
> >     parameter for that.
> >
> >  2. More testing: a few tests in the test suite fail after this but the
> >     number was low and I haven't had time to look at why or what the
> >     failures were.
> >
> >  3. Exporting methods suggested in the TODO at the top of the file.
> >
> >  4. Unit tests for all of the functionality this adds.
> >
> > NOTE: after these changes I had to make clean and rm -rf build before
> > things would not segfault on import.  I suspect some things (modules?)
> > were not properly recompiled after the bytesobject.h struct change
> > otherwise.
> >
> > -gps
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bytes-lockdata-gps02.patch.txt
Url: http://mail.python.org/pipermail/python-3000/attachments/20070908/e3621c4a/attachment-0001.txt 

From nick.bastin at gmail.com  Sun Sep  9 01:23:13 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 8 Sep 2007 19:23:13 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E31FA2.4060701@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
Message-ID: <66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>

On 9/8/07, "Martin v. Löwis"  wrote:
> > Would anyone be opposed to rehosting PyLong on top of GMP?
>
> I would be opposed. It's LGPL'ed, so you would have to ship GMP sources
> with any Python binary that you distribute.

The LGPL has no requirement that you convey source for unmodified
libraries.  Linkage does not imply modification.

--
Nick

From tim.peters at gmail.com  Sun Sep  9 03:48:56 2007
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 8 Sep 2007 21:48:56 -0400
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: 
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
	
Message-ID: <1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>

[Guido]
> I'd like Tim Peters's input on this before we change it. I seem to
> recall that there's an aspect of non-randomness to the existing hash
> function that's important when you hash many closely related strings,
> e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
> so long that I may misremember this, and perhaps it was related to the
> dictionary implementation.

Not "important" so much as "possibly helpful" ;-)  This is explained
in comments in dictobject.c.  As it notes there, hashing the strings
"namea", "nameb", "namec", and "named" currently produces (on a
sizeof(long) == 4 box):

-1658398457
-1658398460
-1658398459
-1658398462

That the hash codes are very close but not identical is "a feature",
since the dict implementation only looks at the last k bits (for
various more-or-less small values of k):  this gives "better than
random" dict collision behavior for input strings very close together.

The proposed hash produces instead:

 1892683363
 -970432008
   51735791
 1567337715

Obviously much closer to "random" behavior, but that's not necessarily
a good thing for dicts.
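
The slot arithmetic behind the two lists above can be checked directly.
This is only a minimal sketch: it assumes an 8-slot table (k = 3) and
ignores the probing that the real dict implementation does after a
collision. The helper name initial_slots is made up for illustration.

```python
# Hash codes quoted above for "namea", "nameb", "namec", "named".
CURRENT_HASHES = [-1658398457, -1658398460, -1658398459, -1658398462]
PROPOSED_HASHES = [1892683363, -970432008, 51735791, 1567337715]


def initial_slots(hashes, k=3):
    """Return the first slot each hash would land in for a 2**k-slot table."""
    mask = (1 << k) - 1          # keep only the last k bits
    return [h & mask for h in hashes]


print(initial_slots(CURRENT_HASHES))   # [7, 4, 5, 2]
print(initial_slots(PROPOSED_HASHES))  # [3, 0, 7, 3]
```

With these particular numbers the current hash puts the four keys in four
different slots, while the proposed hash sends "namea" and "named" to the
same slot.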

FYI, wrt

    http://www.azillionmonkeys.com/qed/hash.html

Python's current string hash is very similar to (but developed
independently of) the FNV hash.
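
For readers without the C source at hand, here is a rough Python
transcription of the string hash under discussion. It assumes a 32-bit C
long and byte-string input; the name py2_string_hash and the bits
parameter are illustrative only, and the C function string_hash() in the
CPython 2.x sources remains the authoritative version.

```python
def py2_string_hash(data: bytes, bits: int = 32) -> int:
    """Sketch of the CPython 2.x string hash for a `bits`-wide C long."""
    if not data:
        return 0
    mask = (1 << bits) - 1
    x = (data[0] << 7) & mask
    for byte in data:
        # Multiply-then-xor, one byte at a time (the FNV-like step).
        x = ((1000003 * x) & mask) ^ byte
    x ^= len(data)
    # Reinterpret the unsigned value as a signed C long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as the error return value, so it is remapped.
    return -2 if x == -1 else x


for key in (b"namea", b"nameb", b"namec", b"named"):
    print(key, py2_string_hash(key))
```

Because the last byte is xor-ed in after the final multiply, keys that
differ only in their last character get hash codes that differ only in
the low bits, which is the closeness described above.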

Things to look out for in the proposed hash:

- There's no explanation of where all the magic shift
  constants and shift patterns come from.

- It relies on potentially unaligned access to read 16-bit chunks
  at a time.  This means #ifdef cruft to "turn that off" on platforms
  that don't support unaligned access, and means timing will vary
  on platforms that do (depending on whether input strings do or
  do not /happen/ to be 2-byte aligned).

- It only delivers a 32-bit hash.  But at least before Py3K, Python's
  hash codes are the native C "long" (32 or 64 bits on all current
  boxes).  The current hash code couldn't care less what sizeof(long)
  is.  It's not clear how to modify the proposed hash to deliver
  64-bit hash codes, in large part because of the first point above.

- It needs another conditional "at the bottom" to avoid returning
  a hash code of -1.  That will affect timing too.

>>> Python3k original hash: real    0m2.210s
>>>               new hash: real    0m1.842s

That's actually a surprisingly small difference, given the much larger
timing differences displayed on:

    http://www.azillionmonkeys.com/qed/hash.html

compared to the FNV hash.  OTOH, the figures there only looked at
256-byte strings, which is much larger (IMO) "than average" for
strings.

Better tests would time building and accessing string-keyed dicts with
reasonable and unreasonable ;-) keys.
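
One possible shape for such a test, as a sketch only: the key patterns,
sizes and helper names below are assumptions chosen for illustration, not
an agreed benchmark. It would have to be run under each interpreter build
being compared.

```python
import random
import string
import timeit


def sequential_keys(n):
    """'Reasonable' keys: closely related strings such as key000001."""
    return ["key%06d" % i for i in range(n)]


def random_keys(n, length=16, seed=0):
    """'Unreasonable' keys: unrelated random strings (seeded for repeatability)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]


def time_dict(keys, repeat=3):
    """Best-of-`repeat` time to build a dict from keys and look each one up."""
    def build_and_lookup():
        d = dict.fromkeys(keys)
        for k in keys:
            d[k]
    return min(timeit.repeat(build_and_lookup, number=1, repeat=repeat))


for label, keys in (("sequential", sequential_keys(10000)),
                    ("random", random_keys(10000))):
    print("%-10s %.6f s" % (label, time_dict(keys)))
```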

From larry at hastings.org  Sun Sep  9 04:24:47 2007
From: larry at hastings.org (Larry Hastings)
Date: Sat, 08 Sep 2007 19:24:47 -0700
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>	
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
Message-ID: <46E3596F.3090606@hastings.org>


If the Python community is just noticing the Hsieh hash, that implies 
that the Bob Jenkins hashes are probably unknown as well.  Behold:
    http://burtleburtle.net/bob/hash/doobs.html
To save you a little head-scratching, the functions you want to play 
with are hashlittle()/hashlittle2() in "lookup3.c":
    http://burtleburtle.net/bob/c/lookup3.c
hashlittle() returns a 32-bit hash; hashlittle2() returns two 32-bit 
hashes on the same input (in effect a 64-bit hash).  The "little" 
implies that the function is better on little-endian machines.  (There 
is a hashbig(); no hashbig2(), it is left as an exercise for the reader.)

In our testing (at Facebook, for memcached) hashlittle2 was faster than 
the Hsieh hash; that was done a year ago (and before I joined) so I 
don't have numbers for you.

One goal of Jenkins's hashes is uniform distribution, so these functions 
presumably lack the serendipitous "similar inputs hash to similar 
values" behavior of Python's current hash function.  But why is that a 
feature?  (Not that I doubt Tim Peters!)

Oh, and, all the Jenkins code is public domain. 

Cheers,


/larry/

From guido at python.org  Sun Sep  9 07:19:54 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 8 Sep 2007 22:19:54 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
Message-ID: 

On 9/8/07, Nicholas Bastin  wrote:
> On 9/8/07, "Martin v. Löwis"  wrote:
> > > Would anyone be opposed to rehosting PyLong on top of GMP?
> >
> > I would be opposed. It's LGPL'ed, so you would have to ship GMP sources
> > with any Python binary that you distribute.
>
> The LGPL has no requirement that you convey source for unmodified
> libraries.  Linkage does not imply modification.

Nevertheless I think it would be a bad idea to make it the default
long implementation. There are bound to be *some* licensing issues
with the LGPL (even if it's just more FUD we'd have to fight) and it'd
be one more dependency. I believe there are already Python bindings
for GMP somewhere, so it's not like there is no way to use it if you
absolutely have to.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sun Sep  9 10:39:10 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 09 Sep 2007 10:39:10 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
Message-ID: <46E3B12E.1000703@v.loewis.de>

> The LGPL has no requirement that you convey source for unmodified
> libraries.  Linkage does not imply modification.

Why do you say that? LGPL 2.1, section 6a) (talking about
"work that uses the Library"):

a) Accompany the work with the complete corresponding machine-readable
source code for the Library including whatever changes were used in the
work (which must be distributed under Sections 1 and 2 above); and, if
the work is an executable linked with the Library, with the complete
machine-readable "work that uses the Library", as object code and/or
source code, so that the user can modify the Library and then relink to
produce a modified executable containing the modified Library. (It is
understood that the user who changes the contents of definitions files
in the Library will not necessarily be able to recompile the application
to use the modified definitions.)

So you must "accompany the work with complete source code for the Library".

Regards,
Martin

From nick.bastin at gmail.com  Sun Sep  9 11:06:37 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 05:06:37 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E3B12E.1000703@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
Message-ID: <66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>

On 9/9/07, "Martin v. Löwis"  wrote:
> > The LGPL has no requirement that you convey source for unmodified
> > libraries.  Linkage does not imply modification.
>
> Why do you say that? LGPL 2.1, section 6a) (talking about
> "work that uses the Library"):
>
> a) Accompany the work with the complete corresponding machine-readable
> source code for the Library including whatever changes were used in the
> work (which must be distributed under Sections 1 and 2 above); and, if
> the work is an executable linked with the Library, with the complete
> machine-readable "work that uses the Library", as object code and/or
> source code, so that the user can modify the Library and then relink to
> produce a modified executable containing the modified Library. (It is
> understood that the user who changes the contents of definitions files
> in the Library will not necessarily be able to recompile the application
> to use the modified definitions.)
>
> So you must "accompany the work with complete source code for the Library".

You're being awfully selective in your reading.  Section 6a is
immediately preceded by a statement which says:

"Also, you must do one of these things:"

6a is but one of 5 choices.

Those choices are:

"b) Use a suitable shared library mechanism for linking with the Library."
"c) Accompany the work with a written offer, valid for at least three
years, to give the same user the materials specified in Subsection 6a,
above, for a charge no more than the cost of performing this
distribution."
"d) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place."
"e) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy."

Pick any one of those options you like that doesn't involve shipping
source code.  Using standard shared libraries is a "suitable shared
library mechanism".

Also, the LGPLv3 in section 4d.1 specifies the same  "Use a suitable
shared library mechanism for linking with the Library."  This is more
relevant, since GMP is licensed under v3 and not v2.1.

--
Nick

From thomas at python.org  Sun Sep  9 11:13:00 2007
From: thomas at python.org (Thomas Wouters)
Date: Sun, 9 Sep 2007 11:13:00 +0200
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <46E3596F.3090606@hastings.org>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
	
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
	<46E3596F.3090606@hastings.org>
Message-ID: <9e804ac0709090213q4c8f7431oa93037efb36e009e@mail.gmail.com>

On 9/9/07, Larry Hastings  wrote:

> One goal of Jenkins's hashes is uniform distribution, so these functions
> presumably lack the serendipitous "similar inputs hash to similar values"
> behavior of Python's current hash function.  But why is that a feature?
> (Not that I doubt Tim Peters!)
>

Because (relatively) small dicts with (broadly speaking) similar keys are
quite common in Python. Module and class and instance __dict__s, for
instance ;) As Tim mentioned, the dict implementation only looks at part of
the actual hash value (depending on the size of the dict) and having hash
values close but not the same greatly decreases the chance of collisions in
(relatively) small dicts. It's less of a problem for massive dicts with
(almost) completely arbitrary keys, but it doesn't exactly hurt there,
either.

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From martin at v.loewis.de  Sun Sep  9 11:24:55 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 09 Sep 2007 11:24:55 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>	<46E31FA2.4060701@v.loewis.de>	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
Message-ID: <46E3BBE7.4020800@v.loewis.de>

> You're being awfully selective in your reading.

On purpose. All alternatives can be ruled out quickly as unfeasible,
or equivalent to "distribute the source code".

> 6a is but one of 5 choices.

So which of these would you recommend?

> "b) Use a suitable shared library mechanism for linking with the Library."

This is shortened. The full text reads

b) Use a suitable shared library mechanism for linking with the Library.
A suitable mechanism is one that (1) uses at run time a copy of the
library already present on the user's computer system, rather than
copying library functions into the executable, and (2) will operate
properly with a modified version of the library, if the user installs
one, as long as the modified version is interface-compatible with the
version that the work was made with.

So this is only an option if "a copy of the library [is] already
present on the user's computer system". This may work for Linux,
but not for Windows, or Solaris (not sure about OSX).

> "c) Accompany the work with a written offer, valid for at least three
> years, to give the same user the materials specified in Subsection 6a,
> above, for a charge no more than the cost of performing this
> distribution."

I find that equally unacceptable for Python. People distributing Python
should not be required to include written offers.

> "d) If distribution of the work is made by offering access to copy
> from a designated place, offer equivalent access to copy the above
> specified materials from the same place."

This is the same as "distribute the source code".

> "e) Verify that the user has already received a copy of these
> materials or that you have already sent this user a copy."

This may work for a limited number of copies, where you know
all recipients personally, but won't work for Python.

> Also, the LGPLv3 in section 4d.1 specifies the same  "Use a suitable
> shared library mechanism for linking with the Library."  This is more
> relevant, since GMP is licensed under v3 and not v2.1.

And it has the same restriction: the shared library must already
be present on the user's computer system. So again, this won't work
for the Windows binaries that we distribute. We (python.org) could
place the source code of GMP along with the MSI binary, but then
people redistributing the MSI binary would break the LGPL,
unless they also distribute the GMP sources.

Regards,
Martin

From larry at hastings.org  Sun Sep  9 14:04:44 2007
From: larry at hastings.org (Larry Hastings)
Date: Sun, 09 Sep 2007 05:04:44 -0700
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <9e804ac0709090213q4c8f7431oa93037efb36e009e@mail.gmail.com>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>	
		
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>	
	<46E3596F.3090606@hastings.org>
	<9e804ac0709090213q4c8f7431oa93037efb36e009e@mail.gmail.com>
Message-ID: <46E3E15C.8040801@hastings.org>

Thomas Wouters wrote:
> Because (relatively) small dicts with (broadly speaking) similar keys 
> are quite common in Python. Module and class and instance __dict__s, 
> for instance ;) As Tim mentioned, the dict implementation only looks 
> at part of the actual hash value (depending on the size of the dict) 
> and having hash values close but not the same greatly decreases the 
> chance of collisions in (relatively) small dicts.
I see--it's avoiding the Birthday Paradox.  Collisions are actually more 
likely if the numbers are totally random than if the numbers are, 
because of a feeble hash algorithm, relatively consecutive.  ;)
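
To put rough numbers on that, here is a small sketch comparing the two
cases. It assumes an idealised table where only the initial slot matters
(no probing), and the function names are invented for this example.

```python
def p_collision_random(n_keys, n_slots):
    """Probability that n_keys uniformly random hashes share at least one slot."""
    if n_keys > n_slots:
        return 1.0
    p_all_distinct = 1.0
    for i in range(n_keys):
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


def collisions_consecutive(n_keys, n_slots, start=0):
    """Number of keys that land on an already used slot when hashes are consecutive."""
    used = {(start + i) % n_slots for i in range(n_keys)}
    return n_keys - len(used)


print(p_collision_random(5, 8))      # 0.794921875
print(collisions_consecutive(5, 8))  # 0
```

So five keys with fully random hashes collide somewhere in an 8-slot
table about 79% of the time, while five consecutive hash values never do.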

Got it, thanks,


/larry/

From qrczak at knm.org.pl  Sun Sep  9 15:12:23 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sun, 09 Sep 2007 15:12:23 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <1189270839.25695.18.camel@qrnik>
References: <1189270839.25695.18.camel@qrnik>
Message-ID: <1189343544.4344.9.camel@qrnik>

Since PyString_Format is deprecated, is there a better way to convert a
Python3 int which doesn't fit in a C long to a hex representation in a
C string, than PyUnicode_Format and iterating over characters, casting
them from Unicode to bytes?

I actually need to convert it to mpz_t, which is best done via text
in a C string in a base which is a power of 2. Since PyUnicode_Format
for Python3 int creates a byte string first, it's quite silly to let
a byte string be converted to a Unicode string and then back.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From martin at v.loewis.de  Sun Sep  9 15:24:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 09 Sep 2007 15:24:37 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <1189343544.4344.9.camel@qrnik>
References: <1189270839.25695.18.camel@qrnik> <1189343544.4344.9.camel@qrnik>
Message-ID: <46E3F415.9060707@v.loewis.de>

> I actually need to convert it to mpz_t, which is best done via text
> in a C string in a base which is a power of 2. Since PyUnicode_Format
> for Python3 int creates a byte string first, it's quite silly to let
> a byte string be converted to a Unicode string and then back.

You could use _PyLong_AsByteArray.
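
As a Python-level illustration of the same idea (not the C API itself),
the sketch below turns an int into a sign plus a little-endian magnitude
byte string and back again. That is the kind of raw buffer GMP's
mpz_import() accepts, which avoids the detour through hex text. The
helper names are hypothetical, and int.to_bytes is used here only as a
stand-in for what the C function would hand you.

```python
def int_to_magnitude_bytes(n):
    """Split an int into (sign, little-endian magnitude bytes)."""
    sign = -1 if n < 0 else 1
    magnitude = abs(n)
    # At least one byte, so that zero still yields a non-empty buffer.
    length = max(1, (magnitude.bit_length() + 7) // 8)
    return sign, magnitude.to_bytes(length, "little")


def magnitude_bytes_to_int(sign, data):
    """Inverse of int_to_magnitude_bytes."""
    return sign * int.from_bytes(data, "little")


big = -(2 ** 100 + 12345)
sign, raw = int_to_magnitude_bytes(big)
print(sign, len(raw), magnitude_bytes_to_int(sign, raw) == big)
```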

Regards,
Martin

From ncoghlan at gmail.com  Sun Sep  9 16:10:19 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 10 Sep 2007 00:10:19 +1000
Subject: [Python-3000] clean out the future?
In-Reply-To: 
References: 
	
Message-ID: <46E3FECB.2080404@gmail.com>

Fred Drake wrote:
> On Sep 7, 2007, at 1:24 PM, Georg Brandl wrote:
>> Should the __future__ be cleaned out for 3k, or should all future  
>> imports
>> continue to work and do nothing?
> 
> They should continue to work.
> 
> One advantage of keeping the existing feature table in the __future__  
> module is that it makes it easier to avoid re-using a feature name; I  
> think there's merit in that.

While I don't object to that (I agree keeping the history in the 
__future__ module is a good thing), 2to3 should probably strip them 
anyway, since they're now redundant.
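
For reference, the feature table being discussed can be inspected from
Python itself. The following is a small sketch using the __future__
module's documented attributes; the function name future_feature_table is
made up for this example, and the exact list of features depends on the
interpreter version.

```python
import __future__


def future_feature_table():
    """Map each feature name to its (optional, mandatory) release tuples."""
    table = {}
    for name in __future__.all_feature_names:
        feature = getattr(__future__, name)
        table[name] = (feature.getOptionalRelease(),
                       feature.getMandatoryRelease())
    return table


for name, (optional, mandatory) in future_feature_table().items():
    # The mandatory release may be None for a feature that was never made default.
    print(name, optional[:2], mandatory[:2] if mandatory else None)
```

A feature whose mandatory release is at or below the running version is
one whose import no longer changes anything, which is the redundancy a
tool like 2to3 could strip.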

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From qrczak at knm.org.pl  Sun Sep  9 16:23:18 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sun, 09 Sep 2007 16:23:18 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E3F415.9060707@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	<1189343544.4344.9.camel@qrnik>  <46E3F415.9060707@v.loewis.de>
Message-ID: <1189347799.4344.12.camel@qrnik>

On 09-09-2007, Sun, at 15:24 +0200, "Martin v. Löwis" wrote:

> You could use _PyLong_AsByteArray.

I'm scared by the underscore.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From martin at v.loewis.de  Sun Sep  9 16:31:08 2007
From: martin at v.loewis.de (=?ISO-8859-2?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 09 Sep 2007 16:31:08 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <1189347799.4344.12.camel@qrnik>
References: <1189270839.25695.18.camel@qrnik>	<1189343544.4344.9.camel@qrnik>
	<46E3F415.9060707@v.loewis.de> <1189347799.4344.12.camel@qrnik>
Message-ID: <46E403AC.3050508@v.loewis.de>

>> You could use _PyLong_AsByteArray.
> 
> I'm scared by the underscore.

If that helps, feel free to submit a patch to remove the underscore,
and document the function properly.

Regards,
Martin


From fdrake at acm.org  Sun Sep  9 17:47:56 2007
From: fdrake at acm.org (Fred Drake)
Date: Sun, 9 Sep 2007 11:47:56 -0400
Subject: [Python-3000] clean out the future?
In-Reply-To: <46E3FECB.2080404@gmail.com>
References: 
	
	<46E3FECB.2080404@gmail.com>
Message-ID: 

On Sep 9, 2007, at 10:10 AM, Nick Coghlan wrote:
> While I don't object to that (I agree keeping the history in the  
> __future__ module is a good thing), 2to3 should probably strip them  
> anyway, since they're now redundant.

That would be good.  From a compatibility perspective, they should  
work, but they should be removed from source code (I've never *liked*  
the __future__ imports, though I understand their value).


   -Fred

-- 
Fred Drake   




From nick.bastin at gmail.com  Sun Sep  9 19:41:53 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 13:41:53 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E3BBE7.4020800@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de>
Message-ID: <66d0a6e10709091041u5fa1d7c2xfd16b45a91dab0d0@mail.gmail.com>

On 9/9/07, "Martin v. Löwis"  wrote:
> > "d) If distribution of the work is made by offering access to copy
> > from a designated place, offer equivalent access to copy the above
> > specified materials from the same place."
>
> This is the same as "distribute the source code".

Well, it's the same as "offer for distribution".  There's no
requirement that the user actually ever download it, only that you
offer it for download.  Certainly there's no requirement that you put
the source in the installer package (the GPL FAQ covers this question
- "Our requirements for redistributors are intended to make sure the
users can get the source code, not to force users to download the
source code even if they don't want it.")

Also, if python.org agreed to continually make the GMP library source
available, that would solve the problem for other binary distributors.
 From the GPL FAQ: "the GPL says you must offer access to copy the
source code "from the same place"; that is, next to the binaries.
However, if you make arrangements with another site to keep the
necessary source code available, and put a link or cross-reference to
the source code next to the binaries, we think that qualifies as "from
the same place"."

--
Nick

From greg at krypto.org  Sun Sep  9 21:02:15 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 9 Sep 2007 12:02:15 -0700
Subject: [Python-3000] Solaris support in 3.0?
In-Reply-To: <52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	
	<52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
Message-ID: <52dc1c820709091202p7fcb037j850e1750fdc736e3@mail.gmail.com>

> Rather than resurrecting the old RSA-copyright md5.c I can easily make new
> ones out of the libtomcrypt md5 and sha1 sources the same way I created the
> non-openssl sha256 and sha512 modules.
>
> We should not limit ourselves to only md5 if we do that; let's guarantee
> that md5, sha1 - sha512 are available on all future python installs; it's not
> difficult.  I'll do the work if we need it.
>
> -gps
>

Done.  Waiting on buildbots to confirm it fixes tru64 and solaris.

From greg at krypto.org  Sun Sep  9 21:09:23 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 9 Sep 2007 12:09:23 -0700
Subject: [Python-3000] audio device support
In-Reply-To: <46DDD42D.8090608@ibp.de>
References: <46DDD42D.8090608@ibp.de>
Message-ID: <52dc1c820709091209v2f04a406q4f5cf4c8d5d38968@mail.gmail.com>

> What I'd like to see:
>
> I like the idea of having audio device support for the major operating
> systems in the standard library.
>
> But I am even more interested in a common interface for simple operations.
>
> IMO, the API should support:
>
> - stereo playback
> - stereo recording
> - different sampling rates and formats (alaw, mulaw and PCM in signed
> integers in various widths and maybe PCM in floats/doubles).
> - device selection
> - volume control
>
> Overall, I think the level of abstraction in the OSS or ALSA APIs is
> about right, coreaudio on OS X and DirectSound on Windows are overkill
> outside of niche applications.
>
> I would volunteer sample implementations for Windows, OS X and Linux
> (ALSA).
>
> - Lars


That sounds like a nice, simple, basic interface.  I suggest writing it up and
submitting it as a patch, or even making it a standalone module with its own
distutils setup.py.  It sounds like a good idea regardless of whether it's
accepted into the standard library.  (Clearly what we have now for Python
audio is a mess :)
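
To make the wish list above a little more concrete, here is one way such
a common interface could be sketched. Everything in it is hypothetical:
the class names, the set of formats and the in-memory LoopbackDevice are
inventions for illustration, not an existing module or the API Lars has
in mind.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum


class SampleFormat(Enum):
    ALAW = "alaw"
    MULAW = "mulaw"
    PCM_S16 = "pcm_s16"
    PCM_S32 = "pcm_s32"
    PCM_FLOAT32 = "pcm_float32"


BYTES_PER_SAMPLE = {
    SampleFormat.ALAW: 1, SampleFormat.MULAW: 1, SampleFormat.PCM_S16: 2,
    SampleFormat.PCM_S32: 4, SampleFormat.PCM_FLOAT32: 4,
}


@dataclass(frozen=True)
class StreamConfig:
    rate: int = 44100                       # samples per second
    channels: int = 2                       # stereo by default
    fmt: SampleFormat = SampleFormat.PCM_S16

    @property
    def frame_size(self):
        """Bytes per frame (one sample for every channel)."""
        return self.channels * BYTES_PER_SAMPLE[self.fmt]


class AudioDevice(ABC):
    """Hypothetical common interface; real backends would wrap ALSA etc."""

    def __init__(self, name):
        self.name = name
        self.volume = 1.0

    def set_volume(self, level):
        if not 0.0 <= level <= 1.0:
            raise ValueError("volume must be between 0.0 and 1.0")
        self.volume = level

    @abstractmethod
    def play(self, frames, config):
        """Queue raw frames for playback."""

    @abstractmethod
    def record(self, n_frames, config):
        """Return up to n_frames of raw recorded frames."""


class LoopbackDevice(AudioDevice):
    """In-memory stand-in: whatever is played can be recorded back."""

    def __init__(self, name="loopback"):
        super().__init__(name)
        self._buffer = bytearray()

    def play(self, frames, config):
        if len(frames) % config.frame_size:
            raise ValueError("data is not a whole number of frames")
        self._buffer += frames

    def record(self, n_frames, config):
        n_bytes = n_frames * config.frame_size
        data = bytes(self._buffer[:n_bytes])
        del self._buffer[:n_bytes]
        return data
```

Device selection would then amount to choosing which AudioDevice subclass
instance to open; how devices are enumerated is left out of this sketch.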

-gps

From lars at ibp.de  Sun Sep  9 21:39:34 2007
From: lars at ibp.de (Lars Immisch)
Date: Sun, 09 Sep 2007 21:39:34 +0200
Subject: [Python-3000] audio device support
In-Reply-To: <52dc1c820709091209v2f04a406q4f5cf4c8d5d38968@mail.gmail.com>
References: <46DDD42D.8090608@ibp.de>
	<52dc1c820709091209v2f04a406q4f5cf4c8d5d38968@mail.gmail.com>
Message-ID: <46E44BF6.5090501@ibp.de>


> That sounds like a nice, simple, basic interface.  I suggest writing it up 
> and submitting it as a patch, or even making it a standalone module with 
> its own distutils setup.py.  It sounds like a good idea regardless of 
> whether it's accepted into the standard library.  (Clearly what we have now 
> for Python audio is a mess :)

Terry Reedy suggested looking into pygame; I like its explicit channel 
abstraction.

A standalone module is probably the best start.

I'll look into it.

- Lars

From jimjjewett at gmail.com  Sun Sep  9 23:25:56 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 9 Sep 2007 17:25:56 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
Message-ID: 

On 9/8/07, Nicholas Bastin  wrote:
> On 9/8/07, "Martin v. Löwis"  wrote:
> > > Speaking of PyLong, and its minor awkwardness to work with in C (you
> > > either have to convert to another multiple-precision type through a
> > > string, or use Python's arithmetic operators directly), was there any
> > > thought given to using something like GMP's mpz_t as the backing data
> > > type?

> Would anyone be opposed to rehosting PyLong on top of GMP?

(1)  If there are concerns about the RSA attribution license, I would
expect much greater concerns about LGPL.

(2)  License aside, does it really solve the problem you had about
needing to convert or use Python's arithmetic operations?   At first
glance, it looks like you would still have the same problem, except
that you would need to use the GMP functions instead of the python
functions.

(3)  Is it stable enough?  I know it has been developed since 1991,
but they seem to focus on high performance for truly huge numbers.  I
suspect the vast majority of python programs would perform fine if
they were limited to C ints, and so the extra costs may not be worth
it.

According to http://gmplib.org/
"""
IMPORTANT INFORMATION FOR ALL GMP USERS:

GMP is very often miscompiled! We are seeing ever increasing problems with
miscompilations of the GMP code. It has now come to the point where a
compiler should be assumed to miscompile GMP.
"""

Later details of issues with the current release include:

Garbage from some ternary ops (a=c+b*a) with the C++ wrappers (would
that apply to C++ python extensions?)

crash bugs

It doesn't work on the Intel Macintoshes, and the workarounds are so
ugly that they won't be applied to the trunk.

-jJ

From nick.bastin at gmail.com  Mon Sep 10 00:14:45 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 18:14:45 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	
Message-ID: <66d0a6e10709091514k15d81759h488c5b29ccd63bc7@mail.gmail.com>

On 9/9/07, Jim Jewett  wrote:
> On 9/8/07, Nicholas Bastin  wrote:
> > On 9/8/07, "Martin v. Löwis"  wrote:
> > > > Speaking of PyLong, and its minor awkwardness to work with in C (you
> > > > either have to convert to another multiple-precision type through a
> > > > string, or use Python's arithmetic operators directly), was there any
> > > > thought given to using something like GMP's mpz_t as the backing data
> > > > type?
>
> > Would anyone be opposed to rehosting PyLong on top of GMP?
>
> (1)  If there are concerns about the RSA attribution license, I would
> expect much greater concerns about LGPL.

Maybe, but I'd rather have a technical discussion than a licensing
discussion.  If GMP doesn't stand up for technical reasons, then the
licensing discussion was a waste of time without resolving whether it
would be a good technical decision or not.

> (2)  License aside, does it really solve the problem you had about
> needing to convert or use Python's arithmetic operations?   At first
> glance, it looks like you would still have the same problem, except
> that you would need to use the GMP functions instead of the python
> functions.

Yes, but the GMP function set is much richer than the Python one, and
more efficient.  GMP is in fact the thing I most convert PyLong to
(via a string, which is, as you might imagine, not that efficient).
Obviously if we're going to support numbers larger than the host
language (in this case, C) natively supports, there's going to be some
other API involved.

> (3)  Is it stable enough?  I know it has been developed since 1991,
> but they seem to focus on high performance for truly huge numbers.  I
> suspect the vast majority of python programs would perform fine if
> they were limited to C ints, and so the extra costs may not be worth
> it.

In a little test, integer math (not-long-requiring) in 3.0 is 2.3x
slower than the same integer math in 2.6.  Here is my test code:

inttest.py:
def int_test(rounds):
  index = 0
  while index < rounds:
    foo = 0
    while foo < 10000000:
      foo += 1
      # ... (the line above is repeated 100 times in total)

    index += 1

3.0:  python Lib\timeit.py "import inttest; inttest.int_test (5)"
10 loops, best of 3: 6.01 sec per loop

2.6:  python Lib\timeit.py "import inttest; inttest.int_test (5)"
10 loops, best of 3: 2.64 sec per loop
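
For anyone who wants to rerun this without pasting the increment a
hundred times, the unrolled function can be generated. This is a sketch
of an equivalent benchmark, not the file used for the numbers above: the
builder name build_int_test and its parameters are assumptions, and a
return value has been added so the result can be checked.

```python
import timeit


def build_int_test(repeats=100, limit=10_000_000):
    """Generate int_test() with the `foo += 1` line unrolled `repeats` times."""
    unrolled = "      foo += 1\n" * repeats
    source = (
        "def int_test(rounds):\n"
        "  index = 0\n"
        "  foo = 0\n"
        "  while index < rounds:\n"
        "    foo = 0\n"
        "    while foo < %d:\n" % limit
        + unrolled +
        "    index += 1\n"
        "  return foo\n"
    )
    namespace = {}
    exec(source, namespace)     # compile the generated function
    return namespace["int_test"]


# A much smaller limit than in the message above, so the demo finishes quickly.
int_test = build_int_test(repeats=100, limit=100_000)
best = min(timeit.repeat(lambda: int_test(1), number=1, repeat=3))
print("best of 3: %.6f sec per call" % best)
```

Running the same script under both interpreters gives a like-for-like
comparison; the absolute numbers will of course differ from those quoted.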

I welcome other benchmarks if people think there's something
fundamentally wrong with my test.

> It doesn't work on the Intel Macintoshes, and the workarounds are so
> ugly that they won't be applied to the trunk.

This is clearly a deal killer, thanks for pointing that out.

I would however continue to ask the general question - do we really
want to maintain our own arbitrary precision math library (which we
now use exclusively)?  Who is committing to optimizing the performance
of PyLong?

--
Nick

From greg.ewing at canterbury.ac.nz  Mon Sep 10 01:01:01 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 10 Sep 2007 11:01:01 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E3B12E.1000703@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
Message-ID: <46E47B2D.6020608@canterbury.ac.nz>

Martin v. Löwis wrote:
> a) Accompany the work with the complete corresponding machine-readable
> source code for the Library

But if it's like the regular GPL, you can just tell people
where to get the source -- you don't have to physically
provide it yourself.

--
Greg

From nick.bastin at gmail.com  Mon Sep 10 01:38:28 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 19:38:28 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E47B2D.6020608@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de> <46E47B2D.6020608@canterbury.ac.nz>
Message-ID: <66d0a6e10709091638q762f010bu7605f1793236177a@mail.gmail.com>

On 9/9/07, Greg Ewing  wrote:
> Martin v. Löwis wrote:
> > a) Accompany the work with the complete corresponding machine-readable
> > source code for the Library
>
> But if it's like the regular GPL, you can just tell people
> where to get the source -- you don't have to physically
> provide it yourself.

You technically have to have a written agreement with the people who
provide the source that they will continue to do so.  This is why I
suggested that python.org could just host the source and provide that
agreement to other distributors of Python.  We could ask the GMP folks
for those assurances as well, but that point appears moot as there are
technical issues with using the library (which is what I was really
trying to get at in the first place).

I still think we should investigate other arbitrary precision math
libraries, or have someone commit to meeting certain performance goals
for PyLong.

--
Nick

From greg.ewing at canterbury.ac.nz  Mon Sep 10 01:46:57 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 10 Sep 2007 11:46:57 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	
Message-ID: <46E485F1.6030503@canterbury.ac.nz>

Jim Jewett wrote:
> It has now come to the point where a
> compiler should be assumed to miscompile GMP.
> ...
> It doesn't work on the Intel Macintoshes, and the workarounds are so
> ugly that they won't be applied to the trunk.

Sounds like it's been optimised for speed over portability
in a really extreme way. I wouldn't go anywhere near code
like that.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Sep 10 02:13:04 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 10 Sep 2007 12:13:04 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E3BBE7.4020800@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de>
Message-ID: <46E48C10.7010705@canterbury.ac.nz>

Martin v. Löwis wrote:
> b) Use a suitable shared library mechanism for linking with the Library.
> A suitable mechanism is one that (1) uses at run time a copy of the
> library already present on the user's computer system, rather than
> copying library functions into the executable, and (2) will operate
> properly with a modified version of the library, if the user installs
> one, as long as the modified version is interface-compatible with the
> version that the work was made with.
> 
> So this is only an option if "a copy of the library [is] already
> present on the user's computer system". This may work for Linux,
> but not for Windows, or Solaris (not sure about OSX).

I think it's just trying to say dynamic rather than static
linking, not that the library has to be a pre-existing
one. The important thing is that the library can be
updated just by replacing a file, without having to
re-link the executable.

So Windows DLLs qualify, as far as I can see.

--
Greg

From jimjjewett at gmail.com  Mon Sep 10 02:58:34 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 9 Sep 2007 20:58:34 -0400
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
	
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
Message-ID: 

On 9/8/07, Tim Peters  wrote:

> in comments in dictobject.c.  As it notes there, hashing the strings
> "namea", "nameb", "namec", and "named" currently produces (on a
> sizeof(long) == 4 box):

> -1658398457
> -1658398460
> -1658398459
> -1658398462

> That the hash codes are very close but not identical is "a feature",
> since the dict implementation only looks at the last k bits (for
> various more-or-less small values of k):  this gives "better than
> random" dict collision behavior for input strings very close together.

> The proposed hash produces instead:

>  1892683363
>  -970432008
>    51735791
>  1567337715
>
> Obviously much closer to "random" behavior, but that's not necessarily
> a good thing for dicts.

To spell this out a bit more:

For cryptography, you want a "random" hash function.  For hash tables,
you just want one that spreads out your actual input.  For strings,
this tends to mean short strings that look like possible variable
names.  Because they often *are* variable names, they are sometimes
sequential, like var_a, var_b, var_c.

In the current CPython implementation, dicts start as a size-8
smalldict, and most dicts never grow beyond that.  So the effective
hash is really (hash%8).

When adding four entries to an 8-slot table, a truly random hash would
have at least one collision 1 - (7/8)(6/8)(5/8), or about 59%, of the
time (the sum 0/8 + 1/8 + 2/8 + 3/8 = 3/4 is only an upper bound).  As
expected, the proposed hash does have a collision for those four values
(the first and fourth).

The current hash function does not collide for strings that change
only one character to the "next" in ASCIIbetical order until the 9th
string -- at which time you need to resize anyhow.
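
The slot arithmetic can be checked directly.  The sketch below takes the
eight hash codes quoted from Tim's message and looks only at the first
probe (the low three bits) in an 8-slot table; it ignores CPython's later
probing steps.  It also computes the exact chance of at least one
collision among four uniformly random slots, for comparison with the
0/8 + 1/8 + 2/8 + 3/8 sum above.  The function names are made up for this
example.

```python
from fractions import Fraction

current = [-1658398457, -1658398460, -1658398459, -1658398462]
proposed = [1892683363, -970432008, 51735791, 1567337715]


def first_slots(hashes, size=8):
    # The first probe uses only the low bits of the hash code.
    return [h & (size - 1) for h in hashes]


def p_collision(keys, size=8):
    # Chance that `keys` uniformly random slots are not all distinct.
    p_distinct = Fraction(1)
    for i in range(keys):
        p_distinct *= Fraction(size - i, size)
    return 1 - p_distinct


print(first_slots(current))    # [7, 4, 5, 2] -- all distinct
print(first_slots(proposed))   # [3, 0, 7, 3] -- first and fourth collide
print(float(p_collision(4)))   # 0.58984375
```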

For larger tables, having them close still doesn't cause a problem,
and may even be useful if you do decide to sort the keys.  (CPython
lists use a "timsort" that takes advantage of partially sorted input,
so if the iterator gets them close to sorted initially, that can
help.)

-jJ

From jimjjewett at gmail.com  Mon Sep 10 03:27:36 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 9 Sep 2007 21:27:36 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E48C10.7010705@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
Message-ID: 

On 9/9/07, Greg Ewing  wrote:

> I think it's just trying to say dynamic rather than static ...
> library can be updated just by replacing a file, ...

> So Windows DLLs qualify, as far as I can see.

How many external library calls would need to be resolved at runtime
for the following code?

    for x in range(N):

    x = 0
    while x < N:   # Would this comparison be external?
        x +=1        # And this incf?

If python handled small ints itself, and only farmed out the "large"
ones, I think the situation would be worse than today, as extensions
would still need to support two forms of integer, but they wouldn't
even know which was going to be used for a given numeric value.
(Unless GMP were modified to return the python version for small
ones... in which case we have a fork.)  And since we would still have
the object headers of python, I suspect it still wouldn't be as simple
as just using GMP routines.

-jJ

From tim.peters at gmail.com  Mon Sep 10 03:32:16 2007
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 9 Sep 2007 21:32:16 -0400
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: <46E3E15C.8040801@hastings.org>
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
	
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
	<46E3596F.3090606@hastings.org>
	<9e804ac0709090213q4c8f7431oa93037efb36e009e@mail.gmail.com>
	<46E3E15C.8040801@hastings.org>
Message-ID: <1f7befae0709091832m3ff970a7v864757a0c138071f@mail.gmail.com>

[Larry Hastings]
> I see--it's avoiding the Birthday Paradox.

It /tends/ to, yes.  This wasn't a design goal of the string hash,
it's just a property observed after it was adopted, and appreciated
much later ;-)

It's much clearer for Python's small-int hash, where hash(i) == i for
i != -1.  That is, nearly all "small enough" integers are their own
"hash codes".  That guarantees no collisions whatsoever in a dict
keyed by a contiguous range of small integers (excluding -1), no
matter how large the range.
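
A quick way to see this from the interpreter, as a sketch that assumes
CPython (other implementations may hash ints differently).  The table
size of 1024 is picked only for illustration, and only the first probe
slot is considered.

```python
# Small ints are their own hash codes in CPython, except -1, which the
# C level reserves as an error return value.
small = range(1000)
self_hashing = all(hash(i) == i for i in small)
minus_one = hash(-1)

# First-probe slots for a contiguous key range in a 1024-slot table:
# every key lands in its own slot, so there are no collisions.
slots = [hash(i) & 1023 for i in small]
collision_free = len(set(slots)) == len(slots)

print(self_hashing, minus_one, collision_free)
```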

Read the comments in dictobject.c for more on this.  The
predictability of such hash schemes has both good & bad implications
for dict performance, and Python's dict conflict-resolution strategies
are fancier than most to mitigate the possible bad implications.

> Collisions are actually more likely if the numbers are totally random than
> if the numbers are, because of a feeble hash algorithm, relatively
> consecutive.  ;)

Right.  In a "good" (cryptographically speaking) hash function, a
1-bit change in the input "should" change about half the output bits
(in the hash code), making collisions much more likely when the keys
differ little in the low bits.

The important point is that the cost of building and accessing
string-keyed dicts is more important (in Python) than the cost of just
hashing strings, and collision resolution is a real expense.

From nick.bastin at gmail.com  Mon Sep 10 04:41:07 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 22:41:07 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik> <46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
Message-ID: <66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>

On 9/9/07, Jim Jewett  wrote:
> On 9/9/07, Greg Ewing  wrote:
>
> > I think it's just trying to say dynamic rather than static ...
> > library can be updated just by replacing a file, ...
>
> > So Windows DLLs qualify, as far as I can see.
>
> How many external library calls would need to be resolved at runtime
> for the following code?
>
>     for x in range(N):
>
>     x = 0
>     while x < N:   # Would this comparison be external?
>         x +=1        # And this incf?
>
> If python handled small ints itself, and only farmed out the "large"
> ones, I think the situation would be worse than today, as extensions
> would still need to support two forms of integer, but they wouldn't
> even know which was going to be used for a given numeric value.

For the current implementation in 3.0, for C API extension writers,
this is practically already the case.  The same type is used
everywhere, but you have to test if it is out of range for C types,
and then extract it as a string to put in some other long integer
type, or work with it using the Python C API exclusively.

I'm not suggesting that Python handle small ints itself  and then farm
out large integer computations, I'm suggesting that since we've
already coalesced small ints into 'large' ones, we might want to
review the performance implications of that decision, and possibly
consider that other people have already solved this problem.  Clearly
GMP appears to fail on a technical level, but there might be other
options worth investigating.

--
Nick

From guido at python.org  Mon Sep 10 05:38:26 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 9 Sep 2007 20:38:26 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
Message-ID: 

On 9/9/07, Nicholas Bastin  wrote:
> I'm not suggesting that Python handle small ints itself  and then farm
> out large integer computations, I'm suggesting that since we've
> already coalesced small ints into 'large' ones, we might want to
> review the performance implications of that decision, and possibly
> consider that other people have already solved this problem.  Clearly
> GMP appears to fail on a technical level, but there might be other
> options worth investigating.

The performance problems that are affecting us most are for
small-value ints. The old PyInt type has many custom optimizations to
help. I think we could do worse than re-introducing some of the same
tricks, retargeted to PyLong (which never got much attention for
small-value performance).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Mon Sep 10 05:53:53 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sun, 9 Sep 2007 23:53:53 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik> <46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
Message-ID: <66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>

On 9/9/07, Guido van Rossum  wrote:
> On 9/9/07, Nicholas Bastin  wrote:
> > I'm not suggesting that Python handle small ints itself  and then farm
> > out large integer computations, I'm suggesting that since we've
> > already coalesced small ints into 'large' ones, we might want to
> > review the performance implications of that decision, and possibly
> > consider that other people have already solved this problem.  Clearly
> > GMP appears to fail on a technical level, but there might be other
> > options worth investigating.
>
> The performance problems that are affecting us most are for
> small-value ints. The old PyInt type has many custom optimizations to
> help. I think we could do worse than re-introducing some of the same
> tricks, retargeted to PyLong (which never got much attention for
> small-value performance).

I did redo my benchmark using 200 as the increment number instead of
1, to duck any impact from the interning of small value ints in 2.6,
and it made no discernible difference in the results.

--
Nick

From martin at v.loewis.de  Mon Sep 10 07:13:23 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 10 Sep 2007 07:13:23 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E48C10.7010705@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>	<46E31FA2.4060701@v.loewis.de>	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	<46E3B12E.1000703@v.loewis.de>	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	<46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>
Message-ID: <46E4D273.9080300@v.loewis.de>

> I think it's just trying to say dynamic rather than static
> linking, not that the library has to be a pre-existing
> one. The important thing is that the library can be
> updated just by replacing a file, without having to
> re-link the executable.
> 
> So Windows DLLs qualify, as far as I can see.

No no no no no. As with the GPL, the important point is
that the user of the library has ready access to the source
code. Every binary of the library must be accompanied by
the source code, where "accompanied" means either "included
in the installation media", "downloadable from the same
source", or "promised in writing".

The first right of the user is to get the source code
easily, without having to beg for it. Only then is it also
the user's right to modify it, and use the modified version
in the application.

So normally, the application's task would be to provide
source code. However, if the application links with a
shared library already on the system, it is the system
vendor's task to provide source code - which is the
common case on Linux. So in that case, the application
vendor can be cleared of having to provide source code.

Therefore, Windows DLLs would only qualify if Microsoft
would provide them, as then Microsoft would also have
to provide the source code.

Regards,
Martin

From qrczak at knm.org.pl  Mon Sep 10 14:04:08 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Mon, 10 Sep 2007 14:04:08 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
Message-ID: <1189425848.7656.19.camel@qrnik>

On 09-09-2007 (Sun) at 21:27 -0400, Jim Jewett wrote:

> If python handled small ints itself, and only farmed out the "large"
> ones,

If GMP is used, it's definitely worth having a non-GMP representation
for small integers, because GMP itself does not do it. A GMP integer
is represented by a pointer to digits, the allocated size, and the used
size multiplied by the sign; no special cases here.
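
To make that layout concrete, here is a rough Python model of it.  This
is only a sketch of the three fields described above, not GMP's actual
code; the class name, the helper functions and the 32-bit limb width are
all assumptions made for illustration.

```python
from dataclasses import dataclass

LIMB_BITS = 32  # assumed for illustration; the real width depends on the platform


@dataclass
class SketchInt:
    limbs: list  # magnitude digits, least significant first
    alloc: int   # number of limbs allocated
    size: int    # number of limbs used, negated for negative values


def from_int(n):
    mag, limbs = abs(n), []
    while mag:
        limbs.append(mag & ((1 << LIMB_BITS) - 1))
        mag >>= LIMB_BITS
    used = len(limbs)
    return SketchInt(limbs, used, -used if n < 0 else used)


def to_int(x):
    used = abs(x.size)
    mag = sum(d << (LIMB_BITS * i) for i, d in enumerate(x.limbs[:used]))
    return -mag if x.size < 0 else mag


print(from_int(-5))
print(to_int(from_int(-(2 ** 70))))
```

Note that even a value like -5 goes through the same digit array as a
70-bit number, which is the "no special cases" point.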

(The fact that GMP does not do it is good for people who want to make
a super-compact representation themselves. GMP optimization for the same
case would be wasted. It requires some work for implementing overflow
detection, but it yields a very good final result.)

The major technical problem with GMP is that an out of memory condition
during computation is a fatal error; GMP does not provide a way to
recover from it.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From eric+python-dev at trueblade.com  Mon Sep 10 16:51:21 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 10 Sep 2007 10:51:21 -0400
Subject: [Python-3000] __format__ and datetime
Message-ID: <46E559E9.4090907@trueblade.com>

I have a patch to add __format__ to datetime.time, .date, and .datetime. 
  For non-empty format_spec's, I just pass on to .strftime.  For empty 
format_spec's, it returns str(self).

I think this is the only reasonable interpretation of format_spec's for 
datetime.  Does anyone think otherwise?

Eric.

From martin at v.loewis.de  Mon Sep 10 16:56:05 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 10 Sep 2007 16:56:05 +0200
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E559E9.4090907@trueblade.com>
References: <46E559E9.4090907@trueblade.com>
Message-ID: <46E55B05.3090701@v.loewis.de>

> I have a patch to add __format__ to datetime.time, .date, and .datetime. 
>   For non-empty format_spec's, I just pass on to .strftime.  For empty 
> format_spec's, it returns str(self).
> 
> I think this is the only reasonable interpretation of format_spec's for 
> datetime.  Does anyone think otherwise?

Can you please show an example of what it would look like?

Regards,
Martin

From eric+python-dev at trueblade.com  Mon Sep 10 17:16:36 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 10 Sep 2007 11:16:36 -0400
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E55B05.3090701@v.loewis.de>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
Message-ID: <46E55FD4.9000807@trueblade.com>

Martin v. Löwis wrote:
>> I have a patch to add __format__ to datetime.time, .date, and .datetime. 
>>   For non-empty format_spec's, I just pass on to .strftime.  For empty 
>> format_spec's, it returns str(self).
>>
>> I think this is the only reasonable interpretation of format_spec's for 
>> datetime.  Does anyone think otherwise?
> 
> Can you please show an example of what it would look like?

 >>> import datetime
 >>> format(datetime.datetime.now(), 'date: %Y-%m-%d time:%H:%M:%s')
'date: 2007-09-10 time:11:15:1189437339'
 >>> format(datetime.datetime.now(), '')
'2007-09-10T11:15:51.329639'

From p.f.moore at gmail.com  Mon Sep 10 17:29:56 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 10 Sep 2007 16:29:56 +0100
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E55FD4.9000807@trueblade.com>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
Message-ID: <79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>

On 10/09/2007, Eric Smith  wrote:
> Martin v. Löwis wrote:
> >> I have a patch to add __format__ to datetime.time, .date, and .datetime.
> >>   For non-empty format_spec's, I just pass on to .strftime.  For empty
> >> format_spec's, it returns str(self).
> >>
> >> I think this is the only reasonable interpretation of format_spec's for
> >> datetime.  Does anyone think otherwise?
> >
> > Can you please show an example of what it would look like?
>
>  >>> import datetime
>  >>> format(datetime.datetime.now(), 'date: %Y-%m-%d time:%H:%M:%s')
> 'date: 2007-09-10 time:11:15:1189437339'
>  >>> format(datetime.datetime.now(), '')
> '2007-09-10T11:15:51.329639'

I'd like to see the default format specified (somewhere). I note that
the default format for datetime values seems to differ for me (on
3.0a1 on Windows)

Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> str(datetime.datetime.now())
'2007-09-10 16:26:25.218000'

(Note lack of 'T'). I'm not sure I like 6 decimal places of seconds to
be the default format, either, but consistency (with str()) and
accuracy (however extreme) may be more important here...

The date and time defaults (which appear to be %Y-%m-%d and %H:%M:%S)
seem perfectly acceptable, on the other hand.

Paul.

From eric+python-dev at trueblade.com  Mon Sep 10 17:31:23 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 10 Sep 2007 11:31:23 -0400
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E55FD4.9000807@trueblade.com>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
Message-ID: <46E5634B.4050405@trueblade.com>

Eric Smith wrote:
> Martin v. Löwis wrote:
>>> I have a patch to add __format__ to datetime.time, .date, and .datetime. 
>>>   For non-empty format_spec's, I just pass on to .strftime.  For empty 
>>> format_spec's, it returns str(self).
>>>
>>> I think this is the only reasonable interpretation of format_spec's for 
>>> datetime.  Does anyone think otherwise?
>> Can you please show an example of what it would look like?
> 
>  >>> import datetime
>  >>> format(datetime.datetime.now(), 'date: %Y-%m-%d time:%H:%M:%s')
> 'date: 2007-09-10 time:11:15:1189437339'
>  >>> format(datetime.datetime.now(), '')
> '2007-09-10T11:15:51.329639'

Oops, that should have been '%S':
 >>> format(datetime.datetime.now(), 'date: %Y-%m-%d time:%H:%M:%S')
'date: 2007-09-10 time:11:28:12'

I'm not sure what strftime does with '%s', I don't see it documented.

 >>> datetime.datetime.now().strftime('%s')
'1189438155'
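
The proposed behaviour can be reproduced with a fixed timestamp instead
of now(), so the output is repeatable.  This is a sketch that assumes
__format__ works as described in this thread (non-empty specs go to
strftime, an empty spec gives str()); the variable names are arbitrary.

```python
import datetime

stamp = datetime.datetime(2007, 9, 10, 11, 28, 12)

# Non-empty format_spec: handed on to strftime.
with_spec = format(stamp, 'date: %Y-%m-%d time:%H:%M:%S')

# Empty format_spec: the same result as str(stamp).
no_spec = format(stamp, '')

print(with_spec)   # date: 2007-09-10 time:11:28:12
print(no_spec)     # 2007-09-10 11:28:12
```

The same two rules apply to datetime.date and datetime.time objects.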



From thomas at python.org  Mon Sep 10 17:33:57 2007
From: thomas at python.org (Thomas Wouters)
Date: Mon, 10 Sep 2007 17:33:57 +0200
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <20070909235556.04BA71E400F@bag.python.org>
References: <20070909235556.04BA71E400F@bag.python.org>
Message-ID: <9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>

On 9/10/07, gregory.p.smith  wrote:
>
> Author: gregory.p.smith
> Date: Mon Sep 10 01:55:55 2007
> New Revision: 58068
>
> Modified:
>    python/branches/py3k/Doc/library/exceptions.rst
>    python/branches/py3k/Doc/library/socket.rst
>    python/branches/py3k/Doc/whatsnew/2.6.rst
>    python/branches/py3k/Lib/test/test_urllib2net.py
>    python/branches/py3k/Lib/urllib2.py
>    python/branches/py3k/Modules/socketmodule.c
> Log:
> merge this from trunk:


Please do these merges with svnmerge. Otherwise, the bookkeeping of what was
merged or not gets all messed up, and the next person to use svnmerge will
be in a world of hurt. (I know, I've been there.)

py3k% svnmerge merge -r58067
[ resolve conflicts, configure, make, make test ]
py3k% svn commit -F svnmerge-commit-message.txt

svnmerge should come with svn, nowadays, or you can download it separately
(as svnmerge.py, probably; it's just a Python script.)

Alternatively, if you know what you're doing, you can edit the
svnmerge-integrated property on the branch directly -- but don't mess it up
:)

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From janssen at parc.com  Mon Sep 10 18:11:04 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 10 Sep 2007 09:11:04 PDT
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com> 
References: <20070909235556.04BA71E400F@bag.python.org>
	<9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
Message-ID: <07Sep10.091110pdt."57996"@synergy1.parc.xerox.com>

> svnmerge should come with svn, nowadays, or you can download it separately
> (as svnmerge.py, probably; it's just a Python script.)

It comes with version 3 of svn.

Or http://svn.collab.net/repos/svn/trunk/contrib/client-side/svnmerge/svnmerge.py.

Bill

From janssen at parc.com  Mon Sep 10 18:30:52 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 10 Sep 2007 09:30:52 PDT
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <07Sep10.091110pdt."57996"@synergy1.parc.xerox.com> 
References: <20070909235556.04BA71E400F@bag.python.org>
	<9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
	<07Sep10.091110pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <07Sep10.093055pdt."57996"@synergy1.parc.xerox.com>

> It comes with version 3 of svn.

Sorry, that should be 1.3.  But I see I've got version 1.4.4 installed,
and no svnmerge.  Of course, this is Apple's XCode version of svn.

Bill

From guido at python.org  Mon Sep 10 18:38:40 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Sep 2007 09:38:40 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
Message-ID: 

On 9/9/07, Nicholas Bastin  wrote:
> On 9/9/07, Guido van Rossum  wrote:
> > On 9/9/07, Nicholas Bastin  wrote:
> > > I'm not suggesting that Python handle small ints itself  and then farm
> > > out large integer computations, I'm suggesting that since we've
> > > already coalesced small ints into 'large' ones, we might want to
> > > review the performance implications of that decision, and possibly
> > > consider that other people have already solved this problem.  Clearly
> > > GMP appears to fail on a technical level, but there might be other
> > > options worth investigating.
> >
> > The performance problems that are affecting us most are for
> > small-value ints. The old PyInt type has many custom optimizations to
> > help. I think we could do worse than re-introducing some of the same
> > tricks, retargeted to PyLong (which never got much attention for
> > small-value performance).
>
> I did redo my benchmark using 200 as the increment number instead of
> 1, to duck any impact from the interning of small value ints in 2.6,
> and it made no discernible difference in the results.

I'm sorry, I've lost context. I'm not at all clear at this point what
benchmark you might have run.

Note that when I said "small values" I meant (in part) anything that
fits in a Python long -- while there's a special cache in 2.x for ints
< 100, there's also a special allocator that outperforms the obmalloc
allocator.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Mon Sep 10 18:41:30 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 10 Sep 2007 09:41:30 -0700
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
References: <20070909235556.04BA71E400F@bag.python.org>
	<9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
Message-ID: <52dc1c820709100941m66d2a5b2v156ac9d0a471a87b@mail.gmail.com>

On 9/10/07, Thomas Wouters  wrote:
>
>
> On 9/10/07, gregory.p.smith  wrote:
> >
> > Author: gregory.p.smith
> > Date: Mon Sep 10 01:55:55 2007
> > New Revision: 58068
> >
> > Modified:
> >    python/branches/py3k/Doc/library/exceptions.rst
> >    python/branches/py3k/Doc/library/socket.rst
> >    python/branches/py3k/Doc/whatsnew/2.6.rst
> >    python/branches/py3k/Lib/test/test_urllib2net.py
> >    python/branches/py3k/Lib/urllib2.py
> >    python/branches/py3k/Modules/socketmodule.c
> > Log:
> > merge this from trunk:
>
>
> Please do these merges with svnmerge. Otherwise, the bookkeeping of what
> was merged or not gets all messed up, and the next person to use svnmerge
> will be in a world of hurt. (I know, I've been there.)
>
> py3k% svnmerge merge -r58067
> [ resolve conflicts, configure, make, make test ]
> py3k% svn commit -F svnmerge-commit-message.txt
>
> svnmerge should come with svn, nowadays, or you can download it separately
> (as svnmerge.py, probably; it's just a Python script.)
>
> Alternatively, if you know what you're doing, you can edit the
> svnmerge-integrated property on the branch directly -- but don't mess it up
> :)


Sorry about that & thanks for the pointers, I'll use svnmerge (instead of
"svn merge" or "svn diff | patch" which I had been using) in the future.

-gps

From guido at python.org  Mon Sep 10 18:42:05 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Sep 2007 09:42:05 -0700
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E559E9.4090907@trueblade.com>
References: <46E559E9.4090907@trueblade.com>
Message-ID: 

On 9/10/07, Eric Smith  wrote:
> I have a patch to add __format__ to datetime.time, .date, and .datetime.
>   For non-empty format_spec's, I just pass on to .strftime.  For empty
> format_spec's, it returns str(self).
>
> I think this is the only reasonable interpretation of format_spec's for
> datetime.  Does anyone think otherwise?

+1

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Mon Sep 10 18:55:54 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 10 Sep 2007 17:55:54 +0100
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
References: <20070909235556.04BA71E400F@bag.python.org>
	<9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
Message-ID: <79990c6b0709100955i2cbca7dblbd6fd4ed32781ab2@mail.gmail.com>

On 10/09/2007, Thomas Wouters  wrote:
> svnmerge should come with svn, nowadays, or you can download it separately
> (as svnmerge.py, probably; it's just a Python script.)

It's not part of the Win32 binary distribution for Subversion - but I
found it at http://www.orcaware.com/svn/wiki/Svnmerge.py

It doesn't seem to need the Subversion Python libraries. OTOH, I
haven't tested it on Windows (but there seems to be Windows code in
there, so I'm guessing it's meant to work :-))

Paul

From mike.klaas at gmail.com  Mon Sep 10 18:58:49 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Mon, 10 Sep 2007 09:58:49 -0700
Subject: [Python-3000] [Python-3000-checkins] r58068 - in
	python/branches/py3k: Doc/library/exceptions.rst
	Doc/library/socket.rst Doc/whatsnew/2.6.rst
	Lib/test/test_urllib2net.py Lib/urllib2.py Modules/socketmodule.c
In-Reply-To: <9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
References: <20070909235556.04BA71E400F@bag.python.org>
	<9e804ac0709100833t10461267l346a4ebfeabcaedf@mail.gmail.com>
Message-ID: <1DF55068-6E1E-45E7-8CC6-4C10EF097A62@gmail.com>


On 10-Sep-07, at 8:33 AM, Thomas Wouters wrote:

> Alternatively, if you know what you're doing, you can edit the  
> svnmerge-integrated property on the branch directly -- but don't  
> mess it up :)
>

svnmerge also has a handy -M flag that marks a (set of) revisions as  
merged, but doesn't actually do any merging.

-Mike


From nick.bastin at gmail.com  Mon Sep 10 19:58:47 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Mon, 10 Sep 2007 13:58:47 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
Message-ID: <66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>

On 9/10/07, Guido van Rossum  wrote:
> On 9/9/07, Nicholas Bastin  wrote:
> > On 9/9/07, Guido van Rossum  wrote:
> > > On 9/9/07, Nicholas Bastin  wrote:
> > > > I'm not suggesting that Python handle small ints itself  and then farm
> > > > out large integer computations, I'm suggesting that since we've
> > > > already coalesced small ints into 'large' ones, we might want to
> > > > review the performance implications of that decision, and possibly
> > > > consider that other people have already solved this problem.  Clearly
> > > > GMP appears to fail on a technical level, but there might be other
> > > > options worth investigating.
> > >
> > > The performance problems that are affecting us most are for
> > > small-value ints. The old PyInt type has many custom optimizations to
> > > help. I think we could do worse than re-introducing some of the same
> > > tricks, retargeted to PyLong (which never got much attention for
> > > small-value performance).
> >
> > I did redo my benchmark using 200 as the increment number instead of
> > 1, to duck any impact from the interning of small value ints in 2.6,
> > and it made no discernible difference in the results.
>
> I'm sorry, I've lost context. I'm not at all clear at this point what
> benchmark you might have run.

I posted a tiny snippet of code earlier in the thread that was a
sort of silly benchmark of integer math operations.

> Note that when I said "small values" I meant (in part) anything that
> fits in a Python long -- while there's a special cache in 2.x for ints
> < 100, there's also a special allocator that outperforms the obmalloc
> allocator.

Yeah, my point was mostly an aside to anyone that might have
questioned my earlier results of a 2.3x slowdown on integer-sized
values because I used 1.  A quick switch to 200 netted the exact same
results, and a more extensive refactoring to get the same number of
operations on a random set of larger numbers netted the same result as
well.

--
Nick

From guido at python.org  Mon Sep 10 20:16:43 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Sep 2007 11:16:43 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
Message-ID: 

On 9/10/07, Nicholas Bastin  wrote:
> > > I did redo my benchmark using 200 as the increment number instead of
> > > 1, to duck any impact from the interning of small value ints in 2.6,
> > > and it made no discernible difference in the results.
> >
> > I'm sorry, I've lost context. I'm not at all clear at this point what
> > benchmark you might have run.
>
> I posted a tiny snippet of code earlier in the thread that was a
> sort of silly benchmark of integer math operations.

Can you report the exact code after all the changes you made, *and*
the results that you are now comparing?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Mon Sep 10 21:24:26 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Mon, 10 Sep 2007 15:24:26 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik> <46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
Message-ID: <66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>

On 9/10/07, Guido van Rossum  wrote:
> On 9/10/07, Nicholas Bastin  wrote:
> > > > I did redo my benchmark using 200 as the increment number instead of
> > > > 1, to duck any impact from the interning of small value ints in 2.6,
> > > > and it made no discernible difference in the results.
> > >
> > > I'm sorry, I've lost context. I'm not at all clear at this point what
> > > benchmark you might have run.
> >
> > I posted a tiny snippet of code earlier in the thread that was a
> > sort of silly benchmark of integer math operations.
>
> Can you report the exact code after all the changes you made, *and*
> the results that you are now comparing?

Simple example code:

inttest.py:
def int_test2(rounds):
  index = 0
  while index < rounds:
    foo = 0
    while foo < 200000000:
      foo += 200
      # ... above line repeated 99 more times
    index += 1

python timeit.py "import inttest; inttest.int_test2(5)"

3.0: 10 loops, best of 3: 6.76 sec per loop
2.6: 10 loops, best of 3: 2.61 sec per loop

The case of foo += 200 actually performs worse in 3.0 than foo += 1,
although 2.6 is consistent using either value.

This is on Windows XP Pro, Pentium D 3.00 GHz (dual core).  Python was
invoked with REALTIME process priority with thread affinity set to 1.
Without thread affinity, 3.0 averaged 7.15 seconds per loop and 2.6
averaged 2.64 seconds per loop.

--
Nick
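
For anyone who wants to rerun something like the benchmark above, here is
a minimal sketch. It is not the original inttest.py: the loop is not
unrolled 100 times, the limit is much smaller so it finishes quickly, and
the names int_test, limit, step and total_adds are made up for this
example. The extra counter also adds overhead of its own, so treat the
timings as relative only.

```python
import timeit


def int_test(rounds, limit=200_000, step=200):
    """Add `step` to a counter until it reaches `limit`, `rounds` times over.

    Returns the number of additions performed so the work can be checked.
    """
    total_adds = 0
    for _ in range(rounds):
        foo = 0
        while foo < limit:
            foo += step          # the integer operation being measured
            total_adds += 1      # bookkeeping only; not in the original test
    return total_adds


if __name__ == "__main__":
    # Time ten calls, in the spirit of "timeit.py ... int_test2(5)".
    seconds = timeit.timeit(lambda: int_test(5), number=10)
    print(f"10 runs of int_test(5): {seconds:.3f} s")
```

Running the same file under two interpreter builds and comparing the
printed times gives a rough small-int comparison of the kind discussed
here.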

From greg.ewing at canterbury.ac.nz  Tue Sep 11 02:07:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 11 Sep 2007 12:07:39 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E4D273.9080300@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de>
Message-ID: <46E5DC4B.6030304@canterbury.ac.nz>

Martin v. Löwis wrote:

> The first right of the user is to get the source code
> easily, without having to beg for it. Only then it is also
> the user's right to modify it, and use the modified version
> in the application.

Where does begging come into it? As long as the user
is provided with information which allows them to
easily obtain the source, there shouldn't be a
problem.

What does "from the same source" mean, anyway? On
the same hard disk? On a disk connected to the same
computer? On a server in the same room? Same building?
Owned by the same person/company?

If there's a link on the same web page that works
when the user clicks on it, I don't think they're
even going to notice the difference.

--
Greg


From larry at hastings.org  Tue Sep 11 02:17:22 2007
From: larry at hastings.org (Larry Hastings)
Date: Mon, 10 Sep 2007 17:17:22 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E5DC4B.6030304@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>	<46E31FA2.4060701@v.loewis.de>	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	<46E3B12E.1000703@v.loewis.de>	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	<46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>	<46E4D273.9080300@v.loewis.de>
	<46E5DC4B.6030304@canterbury.ac.nz>
Message-ID: <46E5DE92.8070808@hastings.org>

Greg Ewing wrote:
> If there's a link on the same web page that works
> when the user clicks on it, I don't think they're
> even going to notice the difference.

They'll notice the difference when they want to redistribute Python, 
when they note the new licensing-based restrictions ("GMP must be in a 
user-replaceable shared library", "you must distribute the source to 
your GMP build").

I am opposed to using LGPL- or GPL-licensed code in Python.


/larry/

From greg.ewing at canterbury.ac.nz  Tue Sep 11 02:25:53 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 11 Sep 2007 12:25:53 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <1189425848.7656.19.camel@qrnik>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	
	<1189425848.7656.19.camel@qrnik>
Message-ID: <46E5E091.5020405@canterbury.ac.nz>

Marcin 'Qrczak' Kowalczyk wrote:
> The major technical problem with GMP is that an out of memory condition
> during computation is a fatal error, GMP does not provide a way to
> recover from it.

If using GMP itself is not feasible, then perhaps
some algorithms could be extracted from it in
areas where it does better than Python?

--
Greg

From nick.bastin at gmail.com  Tue Sep 11 02:48:22 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Mon, 10 Sep 2007 20:48:22 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E5DC4B.6030304@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
Message-ID: <66d0a6e10709101748n2f4edf9di4dd073c5e7e7bd2f@mail.gmail.com>

On 9/10/07, Greg Ewing  wrote:
> Martin v. Löwis wrote:
>
> > The first right of the user is to get the source code
> > easily, without having to beg for it. Only then it is also
> > the user's right to modify it, and use the modified version
> > in the application.
>
> Where does begging come into it? As long as the user
> is provided with information which allows them to
> easily obtain the source, there shouldn't be a
> problem.

The FSF has clarified that this is all that it means.  Technically you
should have an agreement with whoever is providing the source that
they will continue to do so, but it is probably sufficient to take
that burden upon yourself if and only if they stop doing so.

--
Nick

From nick.bastin at gmail.com  Tue Sep 11 03:02:31 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Mon, 10 Sep 2007 21:02:31 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E5DE92.8070808@hastings.org>
References: <1189270839.25695.18.camel@qrnik> <46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org>
Message-ID: <66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>

On 9/10/07, Larry Hastings  wrote:
>
> Greg Ewing wrote:
> > If there's a link on the same web page that works
> > when the user clicks on it, I don't think they're
> > even going to notice the difference.
>
> They'll notice the difference when they want to redistribute Python, when
> they note the new licensing-based restrictions ("GMP must be in a
> user-replaceable shared library", "you must distribute the source to your
> GMP build").

If python.org agreed to host the GMP source, that would suffice for
all people distributing python binaries (they could then just refer to
the GMP source download as a link).  The FSF explicitly states that
this kind of agreement satisfies that requirement of the license.

As for the user-replaceable shared library part, that's up for
considerable debate.  It's unlikely that static linkage legally
creates a derivative work (that would be pretty unreasonable in
computer science terms), but it's never been tested in court, so
static linking would probably be out for distributors without a legal
department.

--
Nick

From eric+python-dev at trueblade.com  Tue Sep 11 03:30:27 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Mon, 10 Sep 2007 21:30:27 -0400
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>	
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
Message-ID: <46E5EFB3.7050809@trueblade.com>

Paul Moore wrote:
> I'd like to see the default format specified (somewhere). I note that
> the default format for datetime values seems to differ for me (on
> 3.0a1 on Windows)
> 
> Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27) [MSC v.1310 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import datetime
> >>> str(datetime.datetime.now())
> '2007-09-10 16:26:25.218000'
> 
> (Note lack of 'T'). I'm not sure I like 6 decimal places of seconds to
> be the default format, either, but consistency (with str()) and
> accuracy (however extreme) may be more important here...

This is my error.  I caught it while adding tests, and I'll fix it 
before I check anything in.  format(datetime.datetime.now(), '') will 
not have a 'T' in it, just as str(datetime.datetime.now()) doesn't.
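
To make the rule concrete, here is a small sketch of the behaviour
described in this thread: a non-empty format_spec goes to strftime, an
empty one gives str(self). The Stamp subclass is hypothetical and exists
only to spell the rule out in Python; it is not the patch itself.

```python
import datetime


class Stamp(datetime.datetime):
    """Hypothetical subclass that writes out the proposed __format__ rule."""

    def __format__(self, format_spec):
        if format_spec:
            # Non-empty spec: hand it to strftime unchanged.
            return self.strftime(format_spec)
        # Empty spec: same text as str(), so no 'T' separator.
        return str(self)


stamp = Stamp(2007, 9, 10, 16, 26, 25, 218000)
print(format(stamp, ""))
print(format(stamp, "%Y-%m-%d"))
```

The first call should print the same text as str(stamp), the second the
same text as stamp.strftime("%Y-%m-%d").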


From skip at pobox.com  Tue Sep 11 05:11:03 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 10 Sep 2007 22:11:03 -0500
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
Message-ID: <18150.1863.436464.41503@montanaro.dyndns.org>


    Paul> The date and time defaults (which appear to be %Y-%m-%d and
    Paul> %H:%M:%S) seem perfectly acceptable, on the other hand.

I would like to see an analog to %S which preserves fractions of a second as
the default formatting for time and datetime objects does:

    >>> print(now)
    2007-09-10 22:07:53.654774
    >>> print(now.strftime("%H:%M:%S"))
    22:07:53
    >>> print(now.time())
    22:07:53.654774

Skip
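
As a point of comparison, later Python releases accept a %f directive
(microseconds, zero-padded to six digits) in datetime's strftime. The
sketch below assumes such an interpreter and uses a fixed timestamp taken
from the example above rather than the current time.

```python
import datetime

# Fixed value so the output is reproducible.
now = datetime.datetime(2007, 9, 10, 22, 7, 53, 654774)

# %S alone drops the fraction; adding .%f keeps the microseconds.
with_fraction = now.strftime("%H:%M:%S.%f")
print(with_fraction)
print(now.time())
```

One difference remains: when the microsecond field is zero, str() omits
the fraction entirely, while %f still prints 000000.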

From guido at python.org  Tue Sep 11 05:24:42 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Sep 2007 20:24:42 -0700
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <18150.1863.436464.41503@montanaro.dyndns.org>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
Message-ID: 

Right. It's odd that there's nothing explicit that exactly produces
the default. (Though floats have this issue too -- I wish it could be
fixed there too.)

On 9/10/07, skip at pobox.com  wrote:
>
>     Paul> The date and time defaults (which appear to be %Y-%m-%d and
>     Paul> %H:%M:%S) seem perfectly acceptable, on the other hand.
>
> I would like to see an analog to %S which preserves fractions of a second as
> the default formatting for time and datetime objects does:
>
>     >>> print(now)
>     2007-09-10 22:07:53.654774
>     >>> print(now.strftime("%H:%M:%S"))
>     22:07:53
>     >>> print(now.time())
>     22:07:53.654774
>
> Skip
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 11 05:58:17 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Sep 2007 20:58:17 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
Message-ID: 

I'd like to see Travis's response to this. It's setting a precedent
regarding locking objects in read-only mode; I haven't found other
examples of objects using LOCKDATA (the only mentions of it seem to be
rejecting it :). I keep getting confused by the two separate lock
counts (and I think in this version the comment is inconsistent with
the code). So I'm hoping Travis has a particular way in mind of
handling LOCKDATA that can be used as a template.

Travis?

--Guido

On 9/8/07, Gregory P. Smith  wrote:
> A new version is attached; cleaned up and simplified based on your original
> comments.
>
> On 8/29/07, Guido van Rossum < guido at python.org> wrote:
> > That's a huge patch to land so close before a release. I'm not sure I
> > like the immutability API -- it won't be useful unless we add a hash
> > method, and then we have all sorts of difficulties again -- the
> > distinction between a hashable and an unhashable object should be made
> > by type, not by value (tuples containing unhashable values
> > notwithstanding).
>
> OK, I've removed the immutable support in the most recent patch.  I still
> think it -might- be useful but isn't required and you're right that it could
> open a can of worms if people think it should also mean hashable.  Immutable
> bytes may be best implemented as a subclass if it's ever wanted.
>
> > I don't understand the comment about using PyBUF_WRITABLE in
> > _getbuffer() -- this is only used for data we're *reading* and I don't
> > think the GIL is even released while we're reading such things.
>
> that appears to be correct.  the comment was wrong.  fixed.
>
> -gps
>
>
> > If you think it's important to get this in the 3.0a1 release, we
> > should pair-program on it ASAP, preferably tomorrow morning.
> > Otherwise, let's do a review next week.
> >
> > --Guido
> >
> > On 8/29/07, Gregory P. Smith < greg at krypto.org> wrote:
> > > Attached is what I've come up with so far.  Only a single field is
> > > added to the PyBytesObject struct.  This adds support to the bytes
> > > object for PyBUF_LOCKDATA buffer API operation.  bytes objects can be
> > > marked temporarily read-only for use while the buffer api has handed
> > > them off to something which may run without the GIL (think IO).  Any
> > > attempt to modify them during that time will raise an exception as I
> > > believe Martin suggested earlier.
> > >
> > > As an added bonus because it's been discussed here, support for setting
> > > a bytes object immutable has been added since it's pretty trivial once
> > > the read only export support was in place.  That's not required but was
> > > trivial to include.
> > >
> > > I'd appreciate any feedback.
> > >
> > > My TODO list for this patch:
> > >
> > >  0. Get feedback and make adjustments as necessary.
> > >
> > >  1. Deciding between PyBUF_SIMPLE and PyBUF_WRITEABLE for the internal
> > >     uses of the _getbuffer() function.  bytesobject.c contains both
> > >     readonly and read-write uses of the buffers; I'll add a boolean
> > >     parameter for that.
> > >
> > >  2. More testing: a few tests in the test suite fail after this but the
> > >     number was low and I haven't had time to look at why or what the
> > >     failures were.
> > >
> > >  3. Exporting methods suggested in the TODO at the top of the file.
> > >
> > >  4. Unit tests for all of the functionality this adds.
> > >
> > > NOTE: after these changes I had to make clean and rm -rf build before
> > > things would not segfault on import.  I suspect some things (modules?)
> > > were not properly recompiled after the bytesobject.h struct change
> > > otherwise.
> > >
> > > -gps
> > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Tue Sep 11 01:03:13 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 10 Sep 2007 19:03:13 -0400
Subject: [Python-3000] C API for ints and strings
References: <1189270839.25695.18.camel@qrnik>
	<46E3B12E.1000703@v.loewis.de><66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com><46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz><66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com><66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
Message-ID: 


"Nicholas Bastin"  wrote in message 
news:66d0a6e10709101058n22b04bfakf67a15aea8e739f4 at mail.gmail.com...
| Yeah, my point was mostly an aside to anyone that might have
| questioned my earlier results of a 2.3x slowdown on integer-sized
| values because I used 1.  A quick switch to 200 netted the exact same
| results,

Currently, 200 is a small, cached int just as 1 is ([-10,256] or so is 
range).

| and a more extensive refactoring to get the same number of
| operations on a random set of larger numbers netted the same result as
| well

better test

tjr
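
A quick way to see whether a given value falls inside the small-int cache
is to build the same value twice at run time and compare identities. This
is a sketch that relies on a CPython implementation detail (the cache and
its exact range are not language guarantees), and shares_identity is a
name invented for this example.

```python
def shares_identity(value):
    """Return True if two ints parsed separately from text are one object."""
    text = str(value)
    # int(text) is evaluated twice at run time, so the compiler cannot
    # fold both operands into a single constant.
    return int(text) is int(text)


for candidate in (1, 200):
    print(candidate, shares_identity(candidate))
```

If both 1 and 200 report True, then switching the benchmark increment
from 1 to 200 does not leave the cached range, which is the point being
made above.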




From nick.bastin at gmail.com  Tue Sep 11 06:50:57 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 11 Sep 2007 00:50:57 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik> <46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
Message-ID: <66d0a6e10709102150k217adedblfc7cc7b57309f5a7@mail.gmail.com>

On 9/10/07, Terry Reedy  wrote:
>
> "Nicholas Bastin"  wrote in message
> news:66d0a6e10709101058n22b04bfakf67a15aea8e739f4 at mail.gmail.com...
>
> | Yeah, my point was mostly an aside to anyone that might have
> | questioned my earlier results of a 2.3x slowdown on integer-sized
> | values because I used 1.  A quick switch to 200 netted the exact same
> | results,
>
> Currently, 200 is a small, cached int just as 1 is ([-10,256] or so is
> range).

Interesting, I didn't look at the code (obviously), but my
understanding was that it was only positive integers below 100.

--
Nick

From oliphant at enthought.com  Tue Sep 11 07:10:48 2007
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Tue, 11 Sep 2007 00:10:48 -0500
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>	
		
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
Message-ID: <46E62358.3020404@enthought.com>

Guido van Rossum wrote:
> I'd like to see Travis's response to this. It's setting a precedent
> regarding locking objects in read-only mode; I haven't found other
> examples of objects using LOCKDATA (the only mentions of it seem to be
> rejecting it :). I keep getting confused by the two separate lock
> counts (and I think in this version the comment is inconsistent with
> the code). So I'm hoping Travis has a particular way in mind of
> handling LOCKDATA that can be used as a template.
>
> Travis?
>   

The use case I had in mind comes about quite often in NumPy when you 
want to modify the data-area of an object which may have a 
non-contiguous chunk of memory, but the algorithm being used expects 
contiguous data.  Imagine, for example, that the exporting object is an 
image whose rows are stored in different segments.  

The consumer of the buffer interface, however, may be an extension 
module that does fast image-processing operations and requires 
contiguous data.  Because it wants to write the results back into the 
memory area when it is done with the algorithm (which may be thread-safe 
and may release the GIL), it requests the object to lock its data to 
read-only so that other consumers do not try to get writeable buffers 
while it is processing.

When the algorithm is done, it alone can write to the memory area and 
then when it releases the buffer, the original object will restore 
itself to being writeable.  Of course, the exporting object must support 
this kind of operation and not all objects will.  I expect the NumPy 
array object and the PIL to support it for example, and other 
media-centric objects.  

It would probably be useful if the bytes object supported it because 
then other objects could use it as the memory area.    To do it 
correctly, the object exporting the interface must only allow locking if 
no other writeable interfaces have been exported (which it must keep 
track of) and then on release must check to see if the buffer that is 
being released is the one that locked its data.
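
The bookkeeping in the previous paragraph can be modelled in a few lines
of Python. This is only a toy sketch: LockableBuffer and its methods are
hypothetical names, it is not the C buffer API, and it ignores threading.
It shows the two rules only: refuse a lock while writable exports exist,
and unlock only when the locker releases.

```python
class LockableBuffer:
    """Toy exporter; hypothetical names, not the real C buffer API."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.writable_exports = 0   # writable buffers currently handed out
        self.lock_owner = None      # consumer holding the read-only lock

    def get_writable(self, consumer):
        if self.lock_owner is not None:
            raise BufferError("data is locked read-only")
        self.writable_exports += 1
        return self.data

    def release_writable(self, consumer):
        self.writable_exports -= 1

    def lock(self, consumer):
        # Locking is allowed only when nobody else can write.
        if self.writable_exports or self.lock_owner is not None:
            raise BufferError("cannot lock: writers or another lock exist")
        self.lock_owner = consumer
        return self.data

    def release_lock(self, consumer):
        # Only the release by the locking consumer restores writability.
        if self.lock_owner is consumer:
            self.lock_owner = None


buf = LockableBuffer(4)
worker = object()
view = buf.lock(worker)
try:
    buf.get_writable(object())      # refused while the lock is held
except BufferError as exc:
    print("refused:", exc)
view[0] = 255                       # the lock holder writes its result
buf.release_lock(worker)
buf.get_writable(object())[1] = 1   # writable again after release
```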

For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a 
slightly different implementation of the concept.   When this flag is 
set during conversion to an array, then if a copy must be made to 
satisfy the requirements, the original array is set as read-only and 
this special flag is set on the array.  When the copy is deleted, its 
memory is automatically copied (and possibly cast, etc.) back into the 
original array.  It is a nice abstraction of the concept of an output 
data area that was borrowed from Numarray and allows many things to be 
implemented very quickly in NumPy.

One of the main things people use the NumPy C-API for is to get a 
contiguous chunk of memory from an array in order to do processing in 
another language (such as C or Fortran).   It is nice to be able to 
specify that the result gets placed back into another chunk of memory 
(which may or may not be contiguous) in a unified fashion.   NumPy 
handles all the copying for you.  

My thinking was that many people will want to be able to get contiguous 
chunks of memory, do processing, and then copy the result back into a 
segment of memory from a buffer-exporting object which is passed into 
the routine as an output object.
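
In pure Python terms the pattern looks roughly like the sketch below. It
is an illustration with a bytearray and a strided slice standing in for a
non-contiguous buffer; process_in_place and its parameters are invented
for this example and are not part of NumPy or the buffer API.

```python
def process_in_place(output, start, stop, step, func):
    """Copy output[start:stop:step] out, process it, and copy it back."""
    region = slice(start, stop, step)
    contiguous = bytes(output[region])   # contiguous working copy
    result = bytes(func(contiguous))     # stands in for a C/Fortran routine
    if len(result) != len(contiguous):
        raise ValueError("result must have the same length as the input")
    output[region] = result              # write back into the strided area


pixels = bytearray(range(8))
# Invert every second byte; the bytes in between are left untouched.
process_in_place(pixels, 0, 8, 2, lambda chunk: bytes(255 - b for b in chunk))
print(list(pixels))
```

The caller only names the output object and the region; the copy out and
the copy back happen in one place, which is the convenience being
described.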

I'm not sure if my explanations are helpful.  Please let me know if I 
can explain further. 

-Travis



From martin at v.loewis.de  Tue Sep 11 07:22:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 11 Sep 2007 07:22:37 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E5DC4B.6030304@canterbury.ac.nz>
References: <1189270839.25695.18.camel@qrnik>		<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>	<46E2DF85.4090005@v.loewis.de>	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>	<46E31FA2.4060701@v.loewis.de>	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	<46E3B12E.1000703@v.loewis.de>	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	<46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>	<46E4D273.9080300@v.loewis.de>
	<46E5DC4B.6030304@canterbury.ac.nz>
Message-ID: <46E6261D.9010704@v.loewis.de>

>> The first right of the user is to get the source code
>> easily, without having to beg for it. Only then it is also
>> the user's right to modify it, and use the modified version
>> in the application.
> 
> Where does begging come into it? As long as the user
> is provided with information which allows them to
> easily obtain the source, there shouldn't be a
> problem.

No. If the user got the software on a CD-ROM, he should
not be required to use an internet connection to get the
source.

> What does "from the same source" mean, anyway? On
> the same hard disk? On a disk connected to the same
> computer? On a server in the same room? Same building?
> Owned by the same person/company?

Depends on how he gets the software. If the software
was received by download, getting the source by download
is fine. If the software was in a box he got by mail,
the source should be in the same box (or a written
offer to get the source in a box).

> If there's a link on the same web page that works
> when the user clicks on it, I don't think they're
> even going to notice the difference.

Certainly not. The "problem" is with copies you don't
receive through download. E.g. if Python comes
preinstalled in some device, that device should be
accompanied directly with the source "on a medium
customarily used for software interchange" (i.e.
you should not just print out the source code in
the handbook).

Regards,
Martin



From martin at v.loewis.de  Tue Sep 11 07:26:57 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 11 Sep 2007 07:26:57 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<46E31FA2.4060701@v.loewis.de>	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	<46E3B12E.1000703@v.loewis.de>	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	<46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>	<46E4D273.9080300@v.loewis.de>
	<46E5DC4B.6030304@canterbury.ac.nz>	<46E5DE92.8070808@hastings.org>
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
Message-ID: <46E62721.4020009@v.loewis.de>

> If python.org agreed to host the GMP source, that would suffice for
> all people distributing python binaries (they could then just refer to
> the GMP source download as a link).

It would not if they don't distribute the binary through download.
If they put it on some media, or preinstalled on a computer (which
happens a lot), offering the source for download through the internet
is not good enough. Option 6d) only applies if the binaries are
distributed "by offering access to copy from a designated place".

> The FSF explicitly states that
> this kind of agreement satisfies that requirement of the license.

Where do they do that?

> As for the user-replaceable shared library part, that's up for
> considerable debate.  It's unlikely that static linkage legally
> creates a derivative work (that would be pretty unreasonable in
> computer science terms), but it's never been tested in court, so
> static linking would probably be out for distributors without a legal
> department.

Perhaps. However, even if you link dynamically, you would *still*
have to provide source code along with the binary.

Regards,
Martin

From martin at v.loewis.de  Tue Sep 11 07:32:14 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 11 Sep 2007 07:32:14 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709102150k217adedblfc7cc7b57309f5a7@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<46E3BBE7.4020800@v.loewis.de>	<46E48C10.7010705@canterbury.ac.nz>		<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>		<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>		<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>	
	<66d0a6e10709102150k217adedblfc7cc7b57309f5a7@mail.gmail.com>
Message-ID: <46E6285E.7060901@v.loewis.de>

> Interesting, I didn't look at the code (obviously), but my
> understanding was that it was only positive integers below 100.

See NSMALLPOSINTS and NSMALLNEGINTS. It's 257 positive ints since
r42552, contributed through bugs.python.org/1436243.
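
A quick way to observe the cache from Python, as a sketch only. It relies
on a CPython implementation detail (other implementations may behave
differently), and the helper name shares_object is made up for this
example:

```python
def shares_object(n):
    """Return True if two separately built ints equal to n are one object."""
    first = int(str(n))   # built at run time, so no constant folding
    second = int(str(n))
    return first is second


# In CPython the preallocated range currently tops out at 256.
cached_256 = shares_object(256)
cached_257 = shares_object(257)
```

Identity of ints is not something portable code should depend on; this is
only a way to see the effect of NSMALLPOSINTS.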

Regards,
Martin

From larry at hastings.org  Tue Sep 11 08:09:29 2007
From: larry at hastings.org (Larry Hastings)
Date: Mon, 10 Sep 2007 23:09:29 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik> <46E31FA2.4060701@v.loewis.de>	
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>	
	<46E3B12E.1000703@v.loewis.de>	
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>	
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>	
	<46E5DE92.8070808@hastings.org>
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
Message-ID: <46E63119.2070502@hastings.org>

Nicholas Bastin wrote:
> As for the user-replaceable shared library part, that's up for
> considerable debate.  It's unlikely that static linkage legally
> creates a derivative work (that would be pretty unreasonable in
> computer science terms), but it's never been tested in court, so
> static linking would probably be out for distributors without a legal
> department.

I guess anything is debatable, but the LGPL explicitly defines programs 
statically-linked with LGPL code as being "derivative works":

    5. A program that contains no derivative of any portion of the
    Library, but is designed to work with the Library by being compiled
    or linked with it, is called a "work that uses the Library". Such a
    work, in isolation, is not a derivative work of the Library, and
    therefore falls outside the scope of this License.

    However, linking a "work that uses the Library" with the Library
    creates an executable that is a derivative of the Library (because
    it contains portions of the Library), rather than a "work that uses
    the library". The executable is therefore covered by this License.
    Section 6 states terms for distribution of such executables.

I feel it's intellectually dishonest to ignore the LGPL's restrictions 
on the basis that its definitions haven't been tested in court.  You 
seem to suggest that, were Python to incorporate LGPL code, 
organizations which redistribute a statically-linked Python should 
ignore the LGPL-induced restrictions--is that really what you mean?

I for one am relatively happy with the existing Python license.  I would 
be quite irritated if Python were to incur more restrictive licenses, 
whether or not they had been tested in court.


/larry/


From martin at v.loewis.de  Tue Sep 11 09:21:22 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 11 Sep 2007 09:21:22 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<46E3BBE7.4020800@v.loewis.de>	<46E48C10.7010705@canterbury.ac.nz>		<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>		<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>		<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
Message-ID: <46E641F2.4020701@v.loewis.de>

> 3.0: 10 loops, best of 3: 6.76 sec per loop
> 2.6: 10 loops, best of 3: 2.61 sec per loop

I can't quite reproduce these results. On a 3.2GHz Pentium 4,
running Linux 2.6.21, gcc 4.1.3, I get

3.0: 10 loops, best of 3: 728 msec per loop
2.6: 10 loops, best of 3: 558 msec per loop

So it's only 30% slower, not 260%.

What puzzles me more is that on comparable machines, it
runs 5 to 10 times as fast on Linux as it does on Windows.
Have you turned off optimization by any chance in the
compiler (what compiler did you use, anyway)?

Regards,
Martin

From krstic at solarsail.hcs.harvard.edu  Tue Sep 11 09:21:20 2007
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Tue, 11 Sep 2007 03:21:20 -0400
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <52dc1c820709071148l2c3061f9l14c929657ef7e397@mail.gmail.com>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
	<308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
	<52dc1c820709071148l2c3061f9l14c929657ef7e397@mail.gmail.com>
Message-ID: <1B544854-053A-45C9-869B-92F48D54CA45@solarsail.hcs.harvard.edu>

On Sep 7, 2007, at 2:48 PM, Gregory P. Smith wrote:
> fwiw hashes are not cryptography.

I assume you mean legally? I was referring to the fact that we're  
specifically discussing cryptographic hashes.

> I see nothing wrong with leaving pycrypto as an add-on library as  
> most things don't need it.  http://www.amk.ca/python/code/crypto.

Last I heard, AMK was no longer maintaining pycrypto, and a number of  
people have found weird issues with it and were generally uncertain  
of the correctness of the implemented crypto.

> The pycrypto API is very nice.  But if we were to consider it  
> for the standard library I'd prefer it just link against OpenSSL  
> rather than use its own C implementations and just leave platforms  
> without ssl without any crypto.

That's one option, although there seems to be some FUD surrounding  
OpenSSL licensing and its interactions with the GPL:

     

It's also a standalone library, and it strikes me as much nicer to  
just have Python provide the crypto functionality out of the box. So,  
if we built an API atop the (public domain) LibTomCrypt code that  
mimicked that of pycrypto, would anyone object to getting that kind  
of thing into the Python source distribution?

> Besides the chances are that most programmers seeing a crypto  
> library will misuse it and gain a false sense of security on what  
> they've done. ;)

Consenting adults, etc.

--
Ivan Krstić  | http://radian.org

From krstic at solarsail.hcs.harvard.edu  Tue Sep 11 09:29:26 2007
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Tue, 11 Sep 2007 03:29:26 -0400
Subject: [Python-3000] 3.0 crypto (was: Re: Solaris support in 3.0?)
In-Reply-To: 
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	
Message-ID: <6EA91F68-7625-47FA-90BC-2F0E1455F1B9@solarsail.hcs.harvard.edu>

On Sep 6, 2007, at 10:54 AM, Guido van Rossum wrote:
> I'm not sure what you meant with "doing the work isn't a problem". Are
> you volunteering? I think we need someone who understands the red tape
> situation most of all. Hopefully I'm worried for nothing.

I'm trying to feel out whether there's strong opposition to shipping  
a good set of built-in crypto operations with Python, and in a way  
that doesn't depend on external libraries.

There are three reasons for opposition that I could imagine:

- legal, in that there's uncertainty about what we can or can't ship.  
I can very likely get the appropriate assistance here to clarify the  
situation.

- technical, in that no one has been willing to do the work of  
providing such a set of crypto ops, and/or of writing a PEP for them.

- philosophical, in that folks think crypto shouldn't come bundled  
with the language.

I'm volunteering to tackle the first two, assuming those are the  
actual problems. Are they?

--
Ivan Krstić  | http://radian.org

From nick.bastin at gmail.com  Tue Sep 11 10:38:21 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 11 Sep 2007 04:38:21 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E62721.4020009@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org>
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
	<46E62721.4020009@v.loewis.de>
Message-ID: <66d0a6e10709110138w3fcb5f7bl87168db2328695d1@mail.gmail.com>

On 9/11/07, "Martin v. Löwis" wrote:
> > If python.org agreed to host the GMP source, that would suffice for
> > all people distributing python binaries (they could then just refer to
> > the GMP source download as a link).
>
> It would not if they don't distribute the binary through download.
> If they put it on some media, or preinstalled on a computer (which
> happens a lot), offering the source for download through the internet
> is not good enough. Option 6d) only applies if the binaries are
> distributed "by offering access to copy from a designated place".

This is a good point.

> > The FSF explicitly states that
> > this kind of agreement satisfies that requirement of the license.
>
> Where do they do that?

In the GPL FAQ ().  Specifically:

Can I put the binaries on my Internet server and put the source on a
different Internet site?
    The GPL says you must offer access to copy the source code "from
the same place"; that is, next to the binaries. However, if you make
arrangements with another site to keep the necessary source code
available, and put a link or cross-reference to the source code next
to the binaries, we think that qualifies as "from the same place".

> > As for the user-replaceable shared library part, that's up for
> > considerable debate.  It's unlikely that static linkage legally
> > creates a derivative work (that would be pretty unreasonable in
> > computer science terms), but it's never been tested in court, so
> > static linking would probably be out for distributors without a legal
> > department.
>
> Perhaps. However, even if you link dynamically, you would *still*
> have to provide source code along with the binary.

No one is disputing that, just saying that the terms could be made
less onerous for subsequent distributors of python by securing a
written guarantee from python.org that python.org would continue to
distribute the source code on the internet.

Of course, as several people have now pointed out, non-internet
distribution would still have to ship the source code on their own,
since the FAQ also prefers that source distribution be done by the
same method as binary distribution.  However, that being said, I don't
see it as particularly onerous to add a small source distribution to a
CD, since there's only a marginal increase in effective cost.

All of this being said, GMP has been shot down for plenty of good
technical reasons, which is really the question that was asked in the
first place.  This legal discussion is bordering on the sublime at
this point, given that no one is actually suggesting that we bind
Python to any LGPL software (nor, by the way, was that actually ever
suggested - the question was asked of what the community thought of a
particular piece of software, and an idea in general, and instead of
answering that question, most decided to explain what they thought of
a particular license, ignoring the technical questions entirely).

--
Nick

From nick.bastin at gmail.com  Tue Sep 11 10:59:32 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 11 Sep 2007 04:59:32 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E63119.2070502@hastings.org>
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org>
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
	<46E63119.2070502@hastings.org>
Message-ID: <66d0a6e10709110159w1861c488j15375a543a3502b4@mail.gmail.com>

On 9/11/07, Larry Hastings  wrote:
>  I guess anything is debatable, but the LGPL explicitly defines programs
> statically-linked with LGPL code as being "derivative works":

Where exactly does it do that?  The GPL does that, but not the LGPL.
In fact, the LGPL does not define nor reference "derivative works" in
any way.

Earlier revisions of the LGPL were potentially somewhat more
restrictive, and certainly harder to parse, but the current version is
reasonably clear on this topic.

> 5. A program that contains no derivative of any portion of the Library, but
> is designed to work with the Library by being compiled or linked with it, is
> called a "work that uses the Library". Such a work, in isolation, is not a
> derivative work of the Library, and therefore falls outside the scope of
> this License.

What version of the LGPL did you find this clause in?  Section 5 of
the current license says the following:

5. Combined Libraries.

You may place library facilities that are a work based on the Library
side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:

    * a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
    * b) Give prominent notice with the combined library that part of
it is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.

> I feel it's intellectually dishonest to ignore the LGPL's restrictions
> on the basis that its definitions haven't been tested in court.  You
> seem to suggest that, were Python to incorporate LGPL code,
> organizations which redistribute a statically-linked Python should
> ignore the LGPL-induced restrictions--is that really what you mean?

No, that's why I said that statically linking was out for
distributions without their own legal department.  That was supposed
to be read as, "we don't supply legal advice, they have to make their
own decisions".  If they want to interpret it to mean that static
linkage is fine, then that's their own decision.  In my experience,
lawyers don't view those kinds of decisions as "intellectually
dishonest", but rather as "up for interpretation".  I'll leave it as
an exercise for the reader to determine what they think of that
particular philosophy.

--
Nick

From nick.bastin at gmail.com  Tue Sep 11 11:20:45 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 11 Sep 2007 05:20:45 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E641F2.4020701@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
	<46E641F2.4020701@v.loewis.de>
Message-ID: <66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>

On 9/11/07, "Martin v. Löwis" wrote:
> > 3.0: 10 loops, best of 3: 6.76 sec per loop
> > 2.6: 10 loops, best of 3: 2.61 sec per loop
>
> I can't quite reproduce these results. On a 3.2GHz Pentium 4,
> running Linux 2.6.21, gcc 4.1.3, I get
>
> 3.0: 10 loops, best of 3: 728 msec per loop
> 2.6: 10 loops, best of 3: 558 msec per loop
>
> So it's only 30% slower, not 260%.

It's certainly possible that other architecture/os/compiler
combinations will generate different results, although I was able to
produce similar scaling results on my Core Duo in my MacBook Pro under
MacOS X 10.4.10 using gcc 4.0.1 (Apple build 5247).

> What puzzles me more is that on comparable machines, it
> runs 5 to 10 times as fast on Linux as it does on Windows.

The machines actually aren't that comparable.  The differences between
the P4 and PD are vast. Depending on which P4 revision you have (and 3
GHz was available in more than one flavor - Northwood, Prescott, P4HT,
Prescott 2M and Cedar Mill), your FSB is possibly up to 50% faster
than mine, and you may have 2MB of L2 cache.  Almost all available
3GHz P4s had hyperthreading, and while I don't believe that would have
any effect in this case, I don't know (I don't believe HT ever
performed any "magic" on non-threaded code).

> Have you turned off optimization by any chance in the
> compiler (what compiler did you use, anyway)?

VC.NET 2005 Pro.  I did not optimize beyond what is in the Python
vcproj, but I ran both in release build configurations, which I
presume have some optimizations enabled, anyhow (It appears to set
/O2, but no more).

--
Nick

From martin at v.loewis.de  Tue Sep 11 13:03:18 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 11 Sep 2007 13:03:18 +0200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709110138w3fcb5f7bl87168db2328695d1@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>	
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>	
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>	
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>	
	<46E5DE92.8070808@hastings.org>	
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>	
	<46E62721.4020009@v.loewis.de>
	<66d0a6e10709110138w3fcb5f7bl87168db2328695d1@mail.gmail.com>
Message-ID: <46E675F6.8090604@v.loewis.de>

> In the GPL FAQ ().  Specifically:
> 
> Can I put the binaries on my Internet server and put the source on a
> different Internet site?

Ok. As you say, this applies to downloading only.

> Of course, as several people have now pointed out, non-internet
> distribution would still have to ship the source code on their own,
> since the FAQ also prefers that source distribution be done by the
> same method as binary distribution.

I'm glad we now agree that you have to ship GMP sources with any
Python binary that you distribute.

> However, that being said, I don't
> see it as particularly onerous to add a small source distribution to a
> CD, since there's only a marginal increase in effective cost.

So the issue now is only whether that's acceptable. I think it is not;
CPython should not rely on LGPL'ed code.

> All of this being said, GMP has been shot down for plenty of good
> technical reasons, which is really the question that was asked in the
> first place. 

Hmm. You asked "Would anyone be opposed to rehosting PyLong on top of
GMP?", which is a different question than the one you just said you
asked. If you had agreed on the facts from the beginning, this
entire discussion would not have taken place.

> This legal discussion is bordering on the sublime at
> this point, given that no one is actually suggesting that we bind
> Python to any LGPL software (nor, by the way, was that actually ever
> suggested - the question was asked of what the community thought of a
> particular piece of software

No, that was not the question, either. You asked "Would anyone be
opposed to rehosting PyLong on top of GMP?", not "what do you think
about GMP?". "rehosting PyLong on top of GMP" literally requires
binding Python to GMP.

> and an idea in general, and instead of
> answering that question, most decided to explain what they thought of
> a particular license, ignoring the technical questions entirely).

I personally never said what I think of the LGPL. I was merely trying
to explain what it actually says. FWIW, I quite like both the GPL, and
the LGPL, and applaud the motivations behind it. That's why I prefer
to follow it faithfully, and in its spirit, rather than trying to
weasel-word out of it.

Regards,
Martin

From p.f.moore at gmail.com  Tue Sep 11 14:21:20 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 11 Sep 2007 13:21:20 +0100
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709091941h749630fag9e3739fd24ab31fd@mail.gmail.com>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
	<46E641F2.4020701@v.loewis.de>
	<66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>
Message-ID: <79990c6b0709110521p10722897s6e4d03e5a558b457@mail.gmail.com>

On 11/09/2007, Nicholas Bastin  wrote:
> On 9/11/07, "Martin v. Löwis" wrote:
> > > 3.0: 10 loops, best of 3: 6.76 sec per loop
> > > 2.6: 10 loops, best of 3: 2.61 sec per loop
> >
> > I can't quite reproduce these results. On a 3.2GHz Pentium 4,
> > running Linux 2.6.21, gcc 4.1.3, I get
> >
> > 3.0: 10 loops, best of 3: 728 msec per loop
> > 2.6: 10 loops, best of 3: 558 msec per loop
> >
> > So it's only 30% slower, not 260%.

FWIW, I get

>python -m timeit "import inttest; inttest.int_test2(5)"
10 loops, best of 3: 367 msec per loop

>\Apps\Python30\python -m timeit "import inttest; inttest.int_test2(5)"
10 loops, best of 3: 810 msec per loop

That's on Windows XP, distributed binaries of Python 2.5 and 3.0a1.
Processor speed:           1.7 GHz
Processor type:            Intel(R) Pentium(R) M processor

That's 120% slower (but against very different versions).
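
For comparing the three sets of numbers on the same footing, here is the
arithmetic as a small sketch. The function name percent_slower and the
dictionary labels are just for illustration; the timings are the ones
quoted in this thread:

```python
def percent_slower(slow, fast):
    """How much longer `slow` takes than `fast`, as a percentage."""
    return (slow / fast - 1.0) * 100.0


# (3.0 timing, 2.x timing) as posted in this thread
timings = {
    "Nicholas, Windows (sec)": (6.76, 2.61),
    "Martin, Linux (msec)": (728.0, 558.0),
    "Paul, Windows XP (msec)": (810.0, 367.0),
}

slowdowns = {
    label: percent_slower(new, old) for label, (new, old) in timings.items()
}

for label, value in slowdowns.items():
    print(f"{label}: about {value:.0f}% slower")
```

Measured this way the first pair comes out at roughly 159% slower (that is,
about 2.6 times as long), the second at about 30%, and the third at about
121%.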

I guess this proves nothing much, apart from the fact that the test is
wildly variable and as such probably not very valid :-)

Paul.

From eric+python-dev at trueblade.com  Tue Sep 11 14:47:12 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 11 Sep 2007 08:47:12 -0400
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E559E9.4090907@trueblade.com>
References: <46E559E9.4090907@trueblade.com>
Message-ID: <46E68E50.8050101@trueblade.com>

Eric Smith wrote:
> I have a patch to add __format__ to datetime.time, .date, and .datetime. 
>   For non-empty format_spec's, I just pass on to .strftime.  For empty 
> format_spec's, it returns str(self).

What's the best way to call str(self)?

I'm currently doing:
     if (PyUnicode_GetSize(format) == 0)
        return PyObject_CallMethod((PyObject *)self, "__str__", NULL);

Although this works, calling self.__str__ doesn't seem like the right 
thing to do.

Thanks.

From ncoghlan at gmail.com  Tue Sep 11 15:35:33 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 11 Sep 2007 23:35:33 +1000
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E68E50.8050101@trueblade.com>
References: <46E559E9.4090907@trueblade.com> <46E68E50.8050101@trueblade.com>
Message-ID: <46E699A5.20307@gmail.com>

Eric Smith wrote:
> Eric Smith wrote:
>> I have a patch to add __format__ to datetime.time, .date, and .datetime. 
>>   For non-empty format_spec's, I just pass on to .strftime.  For empty 
>> format_spec's, it returns str(self).
> 
> What's the best way to call str(self)?
> 
> I'm currently doing:
>      if (PyUnicode_GetSize(format) == 0)
>         return PyObject_CallMethod((PyObject *)self, "__str__", NULL);
> 
> Although this works, calling self.__str__ doesn't seem like the right 
> thing to do.

PyObject_Str is the C API equivalent of str, but I believe 
PyObject_Unicode is currently the right call for Py3k [1].
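
In Python terms, the behaviour Eric describes would look roughly like the
sketch below. The function name format_like_patch is made up here, and this
is only a Python-level illustration of the described logic, not the C
patch itself:

```python
import datetime


def format_like_patch(value, format_spec):
    """Empty spec falls back to str(); anything else goes to strftime."""
    if not format_spec:
        # str(value) is the Python-level spelling of the C-level call,
        # as opposed to looking up and calling value.__str__ by name.
        return str(value)
    return value.strftime(format_spec)


day = datetime.date(2007, 9, 11)
plain = format_like_patch(day, "")
custom = format_like_patch(day, "%Y/%m/%d")
```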

Cheers,
Nick.

[1] http://docs.python.org/api/object.html

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Sep 11 15:59:10 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 11 Sep 2007 23:59:10 +1000
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E675F6.8090604@v.loewis.de>
References: <1189270839.25695.18.camel@qrnik>
	<46E3B12E.1000703@v.loewis.de>		<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>		<46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz>		<46E4D273.9080300@v.loewis.de>
	<46E5DC4B.6030304@canterbury.ac.nz>		<46E5DE92.8070808@hastings.org>		<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>		<46E62721.4020009@v.loewis.de>	<66d0a6e10709110138w3fcb5f7bl87168db2328695d1@mail.gmail.com>
	<46E675F6.8090604@v.loewis.de>
Message-ID: <46E69F2E.9080509@gmail.com>

Martin v. Löwis wrote:
> I personally never said what I think of the LGPL. I was merely trying
> to explain what it actually says. FWIW, I quite like both the GPL, and
> the LGPL, and applaud the motivations behind it. That's why I prefer
> to follow it faithfully, and in its spirit, rather than trying to
> weasel-word out of it.

I have to agree with what Martin has said here - the PSF license used 
for the CPython interpreter is designed to give a lot of flexibility to 
embedders and developers using the engine. Preserving the freedom of 
end-users to access the interpreter source code isn't one of the aims of 
the license, so redistributors are free to use whatever license they 
like, and are also free to distribute the software purely in binary form.

The LGPL and GPL have different aims from the PSF license, with a much 
greater focus on preserving freedom for the end-user, so code under 
those licenses doesn't fit in with the licensing model for the base 
CPython distribution. Even though it would be possible for the PSF to do 
what was necessary to make the inclusion of LGPL code legal, the effect 
on the overall licensing model would be a major inconvenience for 
downstream embedders and developers. So rather than trying to skirt the 
letter of the licenses, it makes sense to just obey the spirit and 
accept that this may sometimes prevent us from using code that might 
otherwise be helpful.

In at least one case where this mattered in the past (locale independent 
atoi/atof, if I recall correctly), the author of the relevant code was 
actually kind enough to grant the PSF direct permission to use the code 
under a Python contributor agreement.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From mark at qtrac.eu  Tue Sep 11 16:06:32 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Tue, 11 Sep 2007 15:06:32 +0100
Subject: [Python-3000] ordered dict for p3k collections?
Message-ID: <200709111506.32823.mark@qtrac.eu>

Hi,

Is there any chance that an ordered dict will be added to Python 3's
library?

I personally find such data structures v. useful in C++. I know that in
Python the sort function is v. fast, but often I prefer never to sort
but simply to use an ordered data structure in the first place.
(I'm aware that for ordered lists I can use the bisect module, but I
want an ordered key-value data structure.)

I think other people must find such things useful. There are three
implementations on the Python Cookbook site, and one on PyPI, all in
pure Python (plus I have my own implementation, also pure Python).

I would suppose that it would be better if it was implemented in C---for
example, my own pure Python ordered dict takes about eight times as long
to load in 18,000 items compared with loading the same into a dict.
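
To make the request concrete, here is a minimal pure-Python sketch of the
kind of structure meant, assuming "ordered" means kept sorted by key (as
with a C++ std::map) rather than insertion-ordered. The class name
SortedDict is hypothetical and this is not any of the Cookbook or PyPI
implementations:

```python
import bisect


class SortedDict:
    """Key-value mapping whose keys are always iterated in sorted order."""

    def __init__(self):
        self._keys = []     # sorted list of keys, maintained with bisect
        self._values = {}   # ordinary dict for the actual lookups

    def __setitem__(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # O(n) insert, no re-sort needed
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]  # raises KeyError first if key is missing
        del self._keys[bisect.bisect_left(self._keys, key)]

    def __len__(self):
        return len(self._keys)

    def __iter__(self):
        return iter(self._keys)

    def items(self):
        return [(key, self._values[key]) for key in self._keys]


ages = SortedDict()
for name, age in [("mark", 40), ("eric", 35), ("skip", 50)]:
    ages[name] = age
del ages["mark"]
```

Each insertion costs a list shift, which is one plausible reason a
pure-Python version loads noticeably slower than a plain dict.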

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From eric+python-dev at trueblade.com  Tue Sep 11 16:21:10 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Tue, 11 Sep 2007 10:21:10 -0400
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E699A5.20307@gmail.com>
References: <46E559E9.4090907@trueblade.com> <46E68E50.8050101@trueblade.com>
	<46E699A5.20307@gmail.com>
Message-ID: <46E6A456.1020200@trueblade.com>

Nick Coghlan wrote:
> Eric Smith wrote:
>> Eric Smith wrote:
>>> I have a patch to add __format__ to datetime.time, .date, and 
>>> .datetime.   For non-empty format_spec's, I just pass on to 
>>> .strftime.  For empty format_spec's, it returns str(self).
>>
>> What's the best way to call str(self)?
>>
>> I'm currently doing:
>>      if (PyUnicode_GetSize(format) == 0)
>>         return PyObject_CallMethod((PyObject *)self, "__str__", NULL);
>>
>> Although this works, calling self.__str__ doesn't seem like the right 
>> thing to do.
> 
> PyObject_Str is the C API equivalent of str, but I believe 
> PyObject_Unicode is currently the right call for Py3k [1].

Of course!  Thanks for the help, I was trying to over-complicate it.
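
For readers following along at the Python level, the dispatch described
in the quoted patch summary can be sketched roughly as below. The
subclass name FormattableDate is hypothetical and exists only to show
the rule; the actual patch is C code inside the datetime module.

```python
import datetime


class FormattableDate(datetime.date):
    """Hypothetical subclass illustrating the __format__ dispatch."""

    def __format__(self, format_spec):
        # Empty format_spec: fall back to str(self).
        if not format_spec:
            return str(self)
        # Non-empty format_spec: hand it to strftime unchanged.
        return self.strftime(format_spec)


day = FormattableDate(2007, 9, 11)
print(format(day, ""))
print(format(day, "%Y/%m/%d"))
```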

Eric.



From skip at pobox.com  Tue Sep 11 16:33:03 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 11 Sep 2007 09:33:03 -0500
Subject: [Python-3000] __format__ and datetime
In-Reply-To: 
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
Message-ID: <18150.42783.278892.121765@montanaro.dyndns.org>


    Skip> I would like to see an analog to %S which preserves fractions of a
    Skip> second as the default formatting for time and datetime objects
    Skip> does:

    Skip> >>> print(now)
    Skip> 2007-09-10 22:07:53.654774

    Guido> Right. It's odd that there's nothing explicit that exactly
    Guido> produces the default. (Though floats have this issue too -- I
    Guido> wish it could be fixed there too.)

Looking at the libref doc for time.strftime and the strftime(3) man pages on
Solaris 10, Mac OS X and CentOS 4, I see that %f is unused ("f" is mnemonic
for "fractions" of a second).  Maybe after a little more investigation and
not endless amounts of discussion this could be added to Python as the way
to represent the fractions of seconds as an int representing microseconds.
For example, the above example could be specified by

    %Y-%m-%d %H:%M:%S.%f

Thinking about future advances in timekeeping, is microseconds too short?
Maybe "%N" for "nanoseconds"?
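
Until something like that exists, the default output can be reproduced
by hand. The sketch below assumes nothing beyond strftime and the
microsecond attribute; the helper name format_with_fraction is made up
for this example.

```python
import datetime


def format_with_fraction(moment):
    """Format a datetime with a six-digit fractional second appended."""
    whole = moment.strftime("%Y-%m-%d %H:%M:%S")
    # microsecond is an int in range(1000000); pad it to six digits.
    return "%s.%06d" % (whole, moment.microsecond)


now = datetime.datetime(2007, 9, 10, 22, 7, 53, 654774)
print(format_with_fraction(now))
```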

Skip

From qrczak at knm.org.pl  Tue Sep 11 17:38:58 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 11 Sep 2007 17:38:58 +0200
Subject: [Python-3000] help(pickle) fails: unorderable types: type() < type()
Message-ID: <1189525138.14065.5.camel@qrnik>

Python 3.0a1 (py3k, Sep  8 2007, 15:57:56) 
[GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> help(pickle)
Traceback (most recent call last):
[...]
  File "/usr/local/lib/python3.0/pydoc.py", line 954, in repr1
    return getattr(self, methodname)(x, level)
  File "/usr/local/lib/python3.0/repr.py", line 78, in repr_dict
    for key in islice(sorted(x), self.maxdict):
TypeError: unorderable types: type() < type()

BTW, is cPickle officially gone and should pickle be used instead?
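
The failure appears to come from sorting dictionary keys that are type
objects, which have no ordering in 3.0. A small sketch of the problem
and one possible workaround follows; the helper sorted_keys_safely is
hypothetical and is not necessarily how repr.py should be fixed.

```python
def sorted_keys_safely(mapping):
    """Sort keys normally, falling back to their repr if unorderable."""
    try:
        return sorted(mapping)
    except TypeError:
        # Keys such as type objects cannot be compared with "<",
        # so order them by their textual representation instead.
        return sorted(mapping, key=repr)


by_type = {str: "text", int: "number"}
print(sorted_keys_safely(by_type))
```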

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From janssen at parc.com  Tue Sep 11 18:15:24 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 11 Sep 2007 09:15:24 PDT
Subject: [Python-3000] 3.0 crypto (was: Re: Solaris support in 3.0?)
In-Reply-To: <6EA91F68-7625-47FA-90BC-2F0E1455F1B9@solarsail.hcs.harvard.edu> 
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	
	<6EA91F68-7625-47FA-90BC-2F0E1455F1B9@solarsail.hcs.harvard.edu>
Message-ID: <07Sep11.091532pdt."57996"@synergy1.parc.xerox.com>

> I'm trying to feel out whether there's strong opposition to shipping
> a good set of built-in crypto operations with Python, and in a way
> that doesn't depend on external libraries.

Could you say a bit more about what these "built-in crypto operations"
would be?  What's the scope of your ambition here?

Bill

From jimjjewett at gmail.com  Tue Sep 11 18:56:06 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 11 Sep 2007 12:56:06 -0400
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E62358.3020404@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
Message-ID: 

On 9/11/07, Travis E. Oliphant  wrote:
> Guido van Rossum wrote:
> > ... I'm hoping Travis has a particular way in mind of
> > handling LOCKDATA that can be used as a template.

> The use case I had in mind comes about quite often in NumPy when you
> want to modify the data-area of an object which may have a
> non-contiguous chunk of memory, but the algorithm being used expects
> contiguous data.  Imagine, for example, that the exporting object is an
> image whose rows are stored in different segments.

> The consumer of the buffer interface, however, may be an extension
> module that does fast image-processing operations and requires
> contiguous data.  Because it wants to write the results back in to the
> memory area when it is done with the algorithm (which may be thread-safe
> and may release the GIL), it requests the object to lock its data to
> read-only so that other consumers do not try to get writeable buffers
> while it is processing.

Does it do its processing in the original buffer, causing it to be
temporarily invalid? If so, no one else should even be reading it.

Or does it just replace the original buffer with the new results once
it is finished?  If so, then why does it need the lock the whole time?
 Is someone getting known stale data (when you could tell them to
wait) always OK, but overwriting someone else's change never is?

-jJ

From nick.bastin at gmail.com  Tue Sep 11 19:03:00 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Tue, 11 Sep 2007 13:03:00 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <79990c6b0709110521p10722897s6e4d03e5a558b457@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
	<46E641F2.4020701@v.loewis.de>
	<66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>
	<79990c6b0709110521p10722897s6e4d03e5a558b457@mail.gmail.com>
Message-ID: <66d0a6e10709111003y4bc1e5acpfe7ce26841718a37@mail.gmail.com>

On 9/11/07, Paul Moore  wrote:
> On 11/09/2007, Nicholas Bastin  wrote:
> > On 9/11/07, "Martin v. Löwis"  wrote:
> > > > 3.0: 10 loops, best of 3: 6.76 sec per loop
> > > > 2.6: 10 loops, best of 3: 2.61 sec per loop
> > >
> > > I can't quite reproduce these results. On a 3.2GHz Pentium 4,
> > > running Linux 2.6.21, gcc 4.1.3, I get
> > >
> > > 3.0: 10 loops, best of 3: 728 msec per loop
> > > 2.6: 10 loops, best of 3: 558 msec per loop
> > >
> > > So it's only 30% slower, not 260%.
>
> FWIW, I get
>
> >python -m timeit "import inttest; inttest.int_test2(5)"
> 10 loops, best of 3: 367 msec per loop
>
> >\Apps\Python30\python -m timeit "import inttest; inttest.int_test2(5)"
> 10 loops, best of 3: 810 msec per loop
>
> That's on Windows XP, distributed binaries of Python 2.5 and 3.0a1.
> Processor speed:           1.7 GHz
> Processor type:            Intel(R) Pentium(R) M processor
>
> That's 120% slower (but against very different versions).
>
> I guess this proves nothing much, apart from the fact that the test is
> wildly variable and as such probably not very valid :-)

The Pentium M and Pentium D are much more alike, architecturally, than
either is like the Pentium 4, although the per-clock performance of the
Pentium M is much better than that of either the 4 or the D (although
not *that* good compared to a D, I didn't think).  In a test like this
where the loop is reasonably tight (even given the trek through the
python interpreter), processor architecture and differing compiler
optimizations will likely have a pretty significant effect on the
overall performance.  Without looking into it at a much lower level,
it's hard to tell, but the difference between a 1MB and 2MB L2 cache
might make all the difference in 3.0 performance.
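
For comparing runs like these, the "percent slower" figures are simply
the ratio of the two timeit results minus one. A tiny sketch, with the
helper name percent_slower chosen only for this example:

```python
def percent_slower(slow, fast):
    """Return how much slower `slow` is than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


# Timings quoted above, in milliseconds per loop.
print(percent_slower(728, 558))   # Linux, Pentium 4: 3.0 vs 2.6
print(percent_slower(810, 367))   # Windows XP, Pentium M: 3.0a1 vs 2.5
```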

--
Nick

From guido at python.org  Tue Sep 11 19:27:27 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 10:27:27 -0700
Subject: [Python-3000] Which joker tried to remove me from the py3k list?
Message-ID: 

---------- Forwarded message ----------
From: python-3000-confirm+a02c328561e5ecf4a0373b3c0001cd33ec59ea4f at python.org

Date: Sep 11, 2007 9:58 AM
Subject: Your confirmation is required to leave the Python-3000 mailing list
To: guido at python.org


Mailing list removal confirmation notice for mailing list Python-3000

We have received a request for the removal of your email address,
"guido at python.org" from the python-3000 at python.org mailing list.  To
confirm that you want to be removed from this mailing list, simply
reply to this message, keeping the Subject: header intact.  Or visit
this web page:

    http://mail.python.org/mailman/confirm/python-3000/a02c328561e5ecf4a0373b3c0001cd33ec59ea4f


Or include the following line -- and only the following line -- in a
message to python-3000-request at python.org:

    confirm a02c328561e5ecf4a0373b3c0001cd33ec59ea4f

Note that simply sending a `reply' to this message should work from
most mail readers, since that usually leaves the Subject: line in the
right form (additional "Re:" text in the Subject: is okay).

If you do not wish to be removed from this list, please simply
disregard this message.  If you think you are being maliciously
removed from the list, or have any other questions, send them to
python-3000-owner at python.org.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 11 19:46:12 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 10:46:12 -0700
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <18150.42783.278892.121765@montanaro.dyndns.org>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
Message-ID: 

On 9/11/07, skip at pobox.com  wrote:
>
>     Skip> I would like to see an analog to %S which preserves fractions of a
>     Skip> second as the default formatting for time and datetime objects
>     Skip> does:
>
>     Skip> >>> print(now)
>     Skip> 2007-09-10 22:07:53.654774
>
>     Guido> Right. It's odd that there's nothing explicit that exactly
>     Guido> produces the default. (Though floats have this issue too -- I
>     Guido> wish it could be fixed there too.)
>
> Looking at the libref doc for time.strftime and the strftime(3) man pages on
> Solaris 10, Mac OS X and CentOS 4, I see that %f is unused ("f" is mnemonic
> for "fractions" of a second).  Maybe after a little more investigation and
> not endless amounts of discussion this could be added to Python as the way
> to represent the fractions of seconds as an int representing microseconds.
> For example, the above example could be specified by
>
>     %Y-%m-%d %H:%M:%S.%f
>
> Thinking about future advances in timekeeping, is microseconds too short?
> Maybe "%N" for "nanoseconds"?

No, the datetime module is explicitly defined to use microseconds. I
don't expect there to be a practical use for nanoseconds (even
microseconds are doubtful, but useful since one might want unique
timestamps for more than 1000 events per second).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 11 19:52:19 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 10:52:19 -0700
Subject: [Python-3000] help(pickle) fails: unorderable types: type() <
	type()
In-Reply-To: <1189525138.14065.5.camel@qrnik>
References: <1189525138.14065.5.camel@qrnik>
Message-ID: 

On 9/11/07, Marcin 'Qrczak' Kowalczyk  wrote:
> Python 3.0a1 (py3k, Sep  8 2007, 15:57:56)
> [GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pickle
> >>> help(pickle)
> Traceback (most recent call last):
> [...]
>   File "/usr/local/lib/python3.0/pydoc.py", line 954, in repr1
>     return getattr(self, methodname)(x, level)
>   File "/usr/local/lib/python3.0/repr.py", line 78, in repr_dict
>     for key in islice(sorted(x), self.maxdict):
> TypeError: unorderable types: type() < type()

Mind reporting this on bugs.python.org?

> BTW, is cPickle officially gone and should pickle be used instead?

Yes. There will be a transparent accelerator written in C, but the
public API will be called "pickle".

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 11 20:00:17 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 11:00:17 -0700
Subject: [Python-3000] 3.0 crypto (was: Re: Solaris support in 3.0?)
In-Reply-To: <6EA91F68-7625-47FA-90BC-2F0E1455F1B9@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	
	<6EA91F68-7625-47FA-90BC-2F0E1455F1B9@solarsail.hcs.harvard.edu>
Message-ID: 

On 9/11/07, Ivan Krstić  wrote:
> On Sep 6, 2007, at 10:54 AM, Guido van Rossum wrote:
> > I'm not sure what you meant with "doing the work isn't a problem". Are
> > you volunteering? I think we need someone who understands the red tape
> > situation most of all. Hopefully I'm worried for nothing.
>
> I'm trying to feel out whether there's strong opposition to shipping
> a good set of built-in crypto operations with Python, and in a way
> that doesn't depend on external libraries.
>
> There are three reasons for opposition that I could imagine:
>
> - legal, in that there's uncertainty about what we can or can't ship.
> I can very likely get the appropriate assistance here to clarify the
> situation.

I think you will have to start here.

> - technical, in that no one has been willing to do the work of
> providing such a set of crypto ops, and/or of writing a PEP for them.

Well, most people in need of crypto with Python can find what they
want as 3rd party code (whether using openssl or not). That these
haven't been integrated with Python is often more a matter of
different project management styles than a philosophical disagreement.
E.g. code that gets significant updates twice a year isn't ready for
inclusion into Python, which only releases new features every 18-24
months.

> - philosophical, in that folks think crypto shouldn't come bundled
> with the language.

I don't think so, though the release managers might disagree. The PR
disaster if a bug in the crypto code were to require shipment of
updates could be significant.

> I'm volunteering to tackle the first two, assuming those are the
> actual problems. Are they?

Why write something new instead of integrating existing code?

What's wrong with openssl?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 11 21:02:41 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 12:02:41 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E62358.3020404@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
Message-ID: 

On 9/10/07, Travis E. Oliphant  wrote:
> Guido van Rossum wrote:
> > I'd like to see Travis's response to this. It's setting a precedent
> > regarding locking objects in read-only mode; I haven't found other
> > examples of objects using LOCKDATA (the only mentions of it seem to be
> > rejecting it :). I keep getting confused by the two separate lock
> > counts (and I think in this version the comment is inconsistent with
> > the code). So I'm hoping Travis has a particular way in mind of
> > handling LOCKDATA that can be used as a template.
> >
> > Travis?
>
> The use case I had in mind comes about quite often in NumPy when you
> want to modify the data-area of an object which may have a
> non-contiguous chunk of memory, but the algorithm being used expects
> contiguous data.  Imagine, for example, that the exporting object is an
> image whose rows are stored in different segments.
>
> The consumer of the buffer interface, however, may be an extension
> module that does fast image-processing operations and requires
> contiguous data.  Because it wants to write the results back in to the
> memory area when it is done with the algorithm (which may be thread-safe
> and may release the GIL), it requests the object to lock its data to
> read-only so that other consumers do not try to get writeable buffers
> while it is processing.
>
> When the algorithm is done, it alone can write to the memory area and
> then when it releases the buffer, the original object will restore
> itself to being writeable.  Of course, the exporting object must support
> this kind of operation and not all objects will.  I expect the NumPy
> array object and the PIL to support it for example, and other
> media-centric objects.

Hm, so this is completely different from what I thought. It seems you
are describing the following:

1. acquire the buffer with LOCKDATA
2. copy the data out of the buffer into a scratch area
3. work on the scratch area
4. copy the data from the scratch area back into the buffer
5. release the buffer

I would call this an exclusive write lock, which is quite different
from the read lock interpretation implemented by Greg in his patch.
Could you add some language to PEP 3118 to clarify this usage? Or is
it already there? I admit to not having read it in full...
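
The five steps can be sketched at the Python level roughly as follows.
This is only an illustration of the exclusive-write-lock reading: the
Exporter class, its locked flag, and the scratch_copy helper are all
hypothetical stand-ins, not the buffer API from PEP 3118.

```python
from contextlib import contextmanager


class Exporter:
    """Hypothetical buffer owner; `locked` stands in for LOCKDATA."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.locked = False


@contextmanager
def scratch_copy(exporter):
    if exporter.locked:
        raise BufferError("data is already locked")
    exporter.locked = True                   # 1. acquire with the lock
    try:
        scratch = bytearray(exporter.data)   # 2. copy out to scratch
        yield scratch                        # 3. caller works on scratch
        exporter.data[:] = scratch           # 4. copy the result back
    finally:
        exporter.locked = False              # 5. release the buffer


image = Exporter(b"abc")
with scratch_copy(image) as scratch:
    scratch.reverse()
print(image.data)
```

If the caller's work raises an exception, step 4 is skipped and only the
release happens, so the original data is left untouched.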

> It would probably be useful if the bytes object supported it because
> then other objects could use it as the memory area.    To do it
> correctly, the object exporting the interface must only allow locking if
> no other writeable interfaces have been exported (which it must keep
> track of) and then on release must check to see if the buffer that is
> being released is the one that locked its data.

Right. So it seems you would need a counter of outstanding
non-data-locked buffer requests and a single bit indicating whether
there's a data-locked request. (Rather than two counters like Greg's
patch currently uses.)

The hacker in me is already exploring the possibility of making the
count negative if there's a data-locked request; it sounds like the
valid transitions are:

0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
0 -> -1 (LOCKDATA get)
-1 -> 0 (LOCKDATA release)

Have I got that right? I think that you should only be able to request
LOCKDATA if there are no other readers *or* writers, but that SIMPLE
and WRITABLE clients should be able to coexist (any mess that creates
would be the requester's own fault). Any nonzero value here would
indicate that the buffer can't be moved.
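
A rough sketch of that single-counter scheme, written in Python for
brevity. The class ExportCount and its method names are invented for
this illustration; a real implementation would live in the C getbuffer
and releasebuffer slots.

```python
class ExportCount:
    """Hypothetical export counter following the transitions above."""

    def __init__(self):
        # 0: no exports; >0: SIMPLE/WRITABLE exports; -1: LOCKDATA export
        self.count = 0

    def get_shared(self):
        if self.count < 0:
            raise BufferError("data is locked")
        self.count += 1

    def release_shared(self):
        if self.count <= 0:
            raise BufferError("no shared export to release")
        self.count -= 1

    def get_lockdata(self):
        # Only allowed when there are no other readers or writers.
        if self.count != 0:
            raise BufferError("other exports are outstanding")
        self.count = -1

    def release_lockdata(self):
        if self.count != -1:
            raise BufferError("no LOCKDATA export to release")
        self.count = 0

    def can_move(self):
        # Any nonzero value means the buffer must stay where it is.
        return self.count == 0
```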

I note that the use case in the bsddb wrapper extension is a bit
different -- Greg suspects that BerkeleyDB won't like the data
changing while it is using it (e.g. it might violate its own invariant
if the key changes between the time its hash is computed and the time
it is written to disk). To ensure this, currently LOCKDATA is the only
option; but a classic read lock would allow multiple concurrent
readers (which is how Greg's patch to bytesobject.c interprets
LOCKDATA).

I think this needs to be clarified. Perhaps we need to separate more
clearly the type of access (read or write) and the amount of locking
desired (can others read? can others write?).

(BTW The current implementation in bytesobject.c allows changing the
size as long as it fits within the allocated size; I think this is
probably too lenient, and begging for latent bugs.)

(Spelling alert: 'writeable' is apparently not an English word. I hope
it's not too late to rename the flag to PyBUF_WRITABLE. I've opened
http://bugs.python.org/issue1150 to track this.)

> For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
> slightly different implementation of the concept.   When this flag is
> set during conversion to an array, then if a copy must be made to
> satisfy the requirements, the original array is set as read-only and
> this special flag is set on the array.  When the copy is deleted, its
> memory is automatically copied (and possibly casted, etc.) back into the
> original array.  It is a nice abstraction of the concept of an output
> data area that was borrowed from Numarray and allows many things to be
> implemented very quickly in NumPy.

So in terms of locks, this effectively sets read *and* write locks on
the original object (since whatever you might read out of it may be
invalidated when the modified copy is written back). But how to
enforce that at the Python level? If we had something like this for
the bytes object, any *use* of the bytes object from Python (e.g.
iterating over it or indexing or slicing it) should be prohibited. Is
this reasonable?

> One of the main things people use the NumPy C-API for is to get a
> contiguous chunk of memory from an array in order to do processing in
> another language (such as C or Fortran).   It is nice to be able to
> specify that the result gets placed back into another chunk of memory
> (which may or may not be contiguous) in a unified fashion.   NumPy
> handles all the copying for you.
>
> My thinking was that many people will want to be able to get contiguous
> chunks of memory, do processing, and then copy the result back into a
> segment of memory from a buffer-exporting object which is passed into
> the routine as an output object.

This is probably common for numpy; for the bytes object, I expect that
it's all much simpler, since it's just a contiguous 1D array of
bytes...

> I'm not sure if my explanations are helpful.  Please let me know if I
> can explain further.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant at enthought.com  Tue Sep 11 21:49:11 2007
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Tue, 11 Sep 2007 14:49:11 -0500
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>	
		
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>	
		
	<46E62358.3020404@enthought.com>
	
Message-ID: <46E6F137.2020001@enthought.com>

Guido van Rossum wrote:
> On 9/10/07, Travis E. Oliphant  wrote:
>   
>>
>
> Hm, so this is completely different from what I thought. It seems you
> are describing the following:
>
> 1. acquire the buffer with LOCKDATA
> 2. copy the data out of the buffer into a scratch area
> 3. work on the scratch area
> 4. copy the data from the scratch area back into the buffer
> 5. release the buffer
>
> I would call this an exclusive write lock, which is quite different
> from the read lock interpretation implemented by Greg in his patch.
> Could you add some language to PEP 3118 to clarify this usage? Or is
> it already there? I admit to not having read it in full...
>   
Yes, you have nailed the usage I was thinking of.  I admit that there 
are other usage variants that I am not thinking of.   These should be 
vetted. 
>> It would probably be useful if the bytes object supported it because
>> then other objects could use it as the memory area.    To do it
>> correctly, the object exporting the interface must only allow locking if
>> no other writeable interfaces have been exported (which it must keep
>> track of) and then on release must check to see if the buffer that is
>> being released is the one that locked its data.
>>     
>
> Right. So it seems you would need a counter of outstanding
> non-data-locked buffer requests and a single bit indicating whether
> there's a data-locked request. (Rather than two counters like Greg's
> patch currently uses.)
>
> The hacker in me is already exploring the possibility of making the
> count negative if there's a data-locked request; it sounds like the
> valid transitions are:
>
> 0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
> ... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
> 0 -> -1 (LOCKDATA get)
> -1 -> 0 (LOCKDATA release)
>
> Have I got that right? I think that you should only be able to request
> LOCKDATA if there are no other readers *or* writers, but that SIMPLE
> and WRITABLE clients should be able to coexist (any mess that creates
> would be the requester's own fault). Any nonzero value here would
> indicate that the buffer can't be moved.
>   
Your understanding looks fine to me.  A comment I got at SciPy gave me 
the feeling that this has the look of an infrastructure that is 
necessary for shared-memory and thread-safe memory management.  But, I 
do not admit to having thought through all of those issues.  However, I 
would welcome any suggestions for improvement that would allow the 
buffer interface to be used to manage memory in thread-safe ways.
> I note that the use case in the bsddb wrapper extension is a bit
> different -- Greg suspects that BerkeleyDB won't like the data
> changing while it is using it (e.g. it might violate its own invariant
> if the key changes between the time its hash is computed and the time
> it is written to disk). To ensure this, currently LOCKDATA is the only
> option; but a classic read lock would allow multiple concurrent
> readers (which is how Greg's patch to bytesobject.c interprets
> LOCKDATA).
>   
I'm not sure I understand the difference between a classic read lock and 
the exclusive write lock concept.   Does the classic read lock just 
prevent writing to the memory area?  In my mind that is a read-only 
memory buffer and the buffer interface would complain if a writeable 
buffer was requested.

> I think this needs to be clarified. Perhaps we need to separate more
> clearly the type of access (read or write) and the amount of locking
> desired (can others read? can others write?).
>   
Yes, I think the clarification is useful.  
> (BTW The current implementation in bytesobject.c allows changing the
> size as long as it fits within the allocated size; I think this is
> probably too lenient, and begging for latent bugs.)
>
> (Spelling alert: 'writeable' is apparently not an English word. I hope
> it's not too late to rename the flag to PyBUF_WRITABLE. I've opened
> http://bugs.python.org/issue1150 to track this.)
>
>   
Actually, writeable is an accepted variant of 'writable' (but it doesn't 
show up in many spell-check dictionaries).  No, it is not too late to 
change it.  Or just define WRITEABLE as WRITABLE.   NumPy uses 
"WRITEABLE" simply because I like that spelling better. 
>> For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
>> slightly different implementation of the concept.   When this flag is
>> set during conversion to an array, then if a copy must be made to
>> satisfy the requirements, the original array is set as read-only and
>> this special flag is set on the array.  When the copy is deleted, its
>> memory is automatically copied (and possibly casted, etc.) back into the
>> original array.  It is a nice abstraction of the concept of an output
>> data area that was borrowed from Numarray and allows many things to be
>> implemented very quickly in NumPy.
>>     
>
> So in terms of locks, this effectively sets read *and* write locks on
> the original object (since whatever you might read out of it may be
> invalidated when the modified copy is written back). 
Sort of, the object is set as read-only before the UPDATEIFCOPY version 
is made.  Another python thread could technically read the data (but the 
flag would be set on it so that the user could know that another memory 
area was shadowing this one).  Usually these kinds of objects only show 
up as output arguments to functions and the programmer is left 
responsible for not relying on data that may be changing. 

Perhaps more fine-grained locks are needed. 

>
> This is probably common for numpy; for the bytes object, I expect that
> it's all much simpler, since it's just a contiguous 1D array of
> bytes...
>   
Yes, indeed it is much simpler....


I'm anxious for feedback and help with the locking mechanism, because I 
do not have all use cases in mind.  I have never thought about a lock 
that prevents reading.  In my mind, this would be handled by the object 
itself.  It could refuse buffer requests if its data had been locked, or 
it could not. 

On the other hand, there could be two concepts of locking that a 
consumer could request from an object

1) Lock so that no other reads or writes are possible until the lock is 
released.
2) Lock so that only reads are possible. 

I had only thought of #2 for the current buffer interface.

-Travis


From oliphant at enthought.com  Tue Sep 11 21:53:56 2007
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Tue, 11 Sep 2007 14:53:56 -0500
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>	
		
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>	
		
	<46E62358.3020404@enthought.com>
	
Message-ID: <46E6F254.9020501@enthought.com>

Jim Jewett wrote:
> On 9/11/07, Travis E. Oliphant  wrote:
>   
>> Guido van Rossum wrote:
>>     
>>> ... I'm hoping Travis has a particular way in mind of
>>> handling LOCKDATA that can be used as a template.
>>>       
>
>   
> Does it do its processing in the original buffer, causing it to be
> temporarily invalid? If so, no one else should even be reading it.
>   

No, the processing is done in a scratch area.  But whether anyone else 
should be reading it in the meantime is up to the object.  
If I've exported my memory as writeable and then somebody else wants to 
get access to the same memory, then it's up to the object to decide 
whether or not that will be allowed.

It is useful to at least allow other objects to get the pointer to the 
memory (perhaps they are just monitoring what is there or are just a 
pipeline or a view of the data). 

> Or does it just replace the original buffer with the new results once
> it is finished?  If so, then why does it need the lock the whole time?
>  Is someone getting known stale data (when you could tell them to
> wait) always OK, but overwriting someone else's change never is?
>   
There is no mechanism to "tell anybody" that the data is stale.  Only 
readable copies are allowed until the "shadow" object is done and 
copies its results back into the original data.   Perhaps a mechanism to 
signal that the data is stale (i.e. has been locked) would be a useful 
addition.

-Travis


From amauryfa at gmail.com  Tue Sep 11 21:57:33 2007
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 11 Sep 2007 21:57:33 +0200
Subject: [Python-3000] Which joker tried to remove me from the py3k list?
In-Reply-To: 
References: 
Message-ID: 

Hello,

Guido van Rossum wrote:
> ---------- Forwarded message ----------
> From: python-3000-confirm+a02c328561e5ecf4a0373b3c0001cd33ec59ea4f at python.org
> 
> Date: Sep 11, 2007 9:58 AM
> Subject: Your confirmation is required to leave the Python-3000 mailing list
> To: guido at python.org
>
>
> Mailing list removal confirmation notice for mailing list Python-3000
...
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/amauryfa%40gmail.com
>

Mailman adds these links at the bottom of every message, after the
signature. Depending on your mail client, they may be part of the
reply (Thunderbird does remove the signature and everything that
follows. Gmail seems to quote the entire message). See above, *my*
unsubscribe link after *your* signature.

It is even archived:
http://mail.python.org/pipermail/python-3000/2007-September/010383.html

Of course, this means that someone followed the link, then clicked the
"unsubscribe" button. A robot?

-- 
Amaury Forgeot d'Arc

From greg at krypto.org  Tue Sep 11 23:10:58 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 11 Sep 2007 14:10:58 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
Message-ID: <52dc1c820709111410tb37393fh3daae25eec5e6301@mail.gmail.com>

On 9/11/07, Guido van Rossum  wrote:
>
> On 9/10/07, Travis E. Oliphant  wrote:
> > Guido van Rossum wrote:
> > > I'd like to see Travis's response to this. It's setting a precedent
> > > regarding locking objects in read-only mode; I haven't found other
> > > examples of objects using LOCKDATA (the only mentions of it seem to be
> > > rejecting it :). I keep getting confused by the two separate lock
> > > counts (and I think in this version the comment is inconsistent with
> > > the code). So I'm hoping Travis has a particular way in mind of
> > > handling LOCKDATA that can be used as a template.
> > >
> > > Travis?
> >
> > The use case I had in mind comes about quite often in NumPy when you
> > want to modify the data-area of an object which may have a
> > non-contiguous chunk of memory, but the algorithm being used expects
> > contiguous data.  Imagine, for example, that the exporting object is an
> > image whose rows are stored in different segments.
> >
> > The consumer of the buffer interface, however, may be an extension
> > module that does fast image-processing operations and requires
> > contiguous data.  Because it wants to write the results back in to the
> > memory area when it is done with the algorithm (which may be thread-safe
> > and may release the GIL), it requests the object to lock its data to
> > read-only so that other consumers do not try to get writeable buffers
> > while it is processing.
> >
> > When the algorithm is done, it alone can write to the memory area and
> > then when it releases the buffer, the original object will restore
> > itself to being writeable.  Of course, the exporting object must support
> > this kind of operation and not all objects will.  I expect the NumPy
> > array object and the PIL to support it for example, and other
> > media-centric objects.
>
> Hm, so this is completely different from what I thought. It seems you
> are describing the following:
>
> 1. acquire the buffer with LOCK_DATA
> 2. copy the data out of the buffer into a scratch area
> 3. work on the scratch area
> 4. copy the data from the scratch area back into the buffer
> 5. release the buffer
>
> I would call this an exclusive write lock, which is quite different
> from the read lock interpretation implemented by Greg in his patch.
> Could you add some language to PEP 3118 to clarify this usage? Or is
> it already there? I admit to not having read it in full...


Yes, that is different from what I was using it for, based on what the PEP
3118 description said.  Perhaps the existing description in PEP 3118 should
be renamed from LOCKDATA to READONLY?

> > It would probably be useful if the bytes object supported it because
> > then other objects could use it as the memory area.    To do it
> > correctly, the object exporting the interface must only allow locking if
> > no other writeable interfaces have been exported (which it must keep
> > track of) and then on release must check to see if the buffer that is
> > being released is the one that locked its data.
>
> Right. So it seems you would need a counter of outstanding
> non-data-locked buffer requests and a single bit indicating whether
> there's a data-locked request. (Rather than two counters like Greg's
> patch currently uses.)
>
> The hacker in me is already exploring the possibility of making the
> count negative if there's a data-locked request; it sounds like the
> valid transitions are:
>
> 0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
> ... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
> 0 -> -1 (LOCKDATA get)
> -1 -> 0 (LOCKDATA release)
>
> Have I got that right? I think that you should only be able to request
> LOCKDATA if there are no other readers *or* writers, but that SIMPLE
> and WRITABLE clients should be able to coexist (any mess that creates
> would be the requester's own fault). Any nonzero value here would
> indicate that the buffer can't be moved.
>
> I note that the use case in the bsddb wrapper extension is a bit
> different -- Greg suspects that BerkeleyDB won't like the data
> changing while it is using it (e.g. it might violate its own invariant
> if the key changes between the time its hash is computed and the time
> it is written to disk). To ensure this, currently LOCKDATA is the only
> option; but a classic read lock would allow multiple concurrent
> readers (which is how Greg's patch to bytesobject.c interprets
> LOCKDATA).
>
> I think this needs to be clarified. Perhaps we need to separate more
> clearly the type of access (read or write) and the amount of locking
> desired (can others read? can others write?).


bsddb is not alone here but was just the code I was working on that made me
think it necessary.  I am hoping that -all- file/socket/whatever output
operations using the buffer API will get properly read-locked views of the
buffer so that they can release the GIL and not have the data change out
from underneath them by other threads.  (this avoids hard to debug issues
which python has so far been pretty good at avoiding)

> (BTW The current implementation in bytesobject.c allows changing the
> size as long as it fits within the allocated size; I think this is
> probably too lenient, and begging for latent bugs.)
>
> (Spelling alert: 'writeable' is apparently not an English word. I hope
> it's not too late to rename the flag to PyBUF_WRITABLE. I've opened
> http://bugs.python.org/issue1150 to track this.)


eek, yes please, let's spell correctly. :)

> > For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a
> > slightly different implementation of the concept.   When this flag is
> > set during conversion to an array, then if a copy must be made to
> > satisfy the requirements, the original array is set as read-only and
> > this special flag is set on the array.  When the copy is deleted, its
> > memory is automatically copied (and possibly casted, etc.) back into the
> > original array.  It is a nice abstraction of the concept of an output
> > data area that was borrowed from Numarray and allows many things to be
> > implemented very quickly in NumPy.
>
> So in terms of locks, this effectively sets read *and* write locks on
> the original object (since whatever you might read out of it may be
> invalidated when the modified copy is written back). But how to
> enforce that at the Python level? If we had something like this for
> the bytes object, any *use* of the bytes object from Python (e.g.
> iterating over it or indexing or slicing it) should be prohibited. Is
> this reasonable?
>
> > One of the main things people use the NumPy C-API for is to get a
> > contiguous chunk of memory from an array in order to do processing in
> > another language (such as C or Fortran).   It is nice to be able to
> > specify that the result gets placed back into another chunk of memory
> > (which may or may not be contiguous) in a unified fashion.   NumPy
> > handles all the copying for you.
> >
> > My thinking was that many people will want to be able to get contiguous
> > chunks of memory, do processing, and then copy the result back into a
> > segment of memory from a buffer-exporting object which is passed into
> > the routine as an output object.
>
> This is probably common for numpy; for the bytes object, I expect that
> it's all much simpler, since it's just a contiguous 1D array of
> bytes...


fwiw, in the bsddb and hashlib code I raise an error if the buffer returned
is not a 1D array.
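
A check of that kind can be sketched at the Python level with `memoryview`, whose `ndim` attribute reports the number of dimensions of the exported buffer. The helper name `require_flat_bytes` is hypothetical, and the actual bsddb and hashlib checks are done in C, so this is only an approximation of the idea.

```python
def require_flat_bytes(obj):
    """Return a memoryview of obj, rejecting anything that is not 1D."""
    view = memoryview(obj)
    if view.ndim != 1:
        raise TypeError(
            "expected a 1D buffer, got %d dimensions" % view.ndim)
    return view


# A bytes object exports a plain 1D buffer, so it is accepted.
flat = require_flat_bytes(b"key")

# A 2x3 view over the same kind of data is rejected.
grid = memoryview(b"abcdef").cast("B", shape=[2, 3])
try:
    require_flat_bytes(grid)
    grid_accepted = True
except TypeError:
    grid_accepted = False
```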

> > I'm not sure if my explanations are helpful.  Please let me know if I
> > can explain further.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From greg at krypto.org  Tue Sep 11 23:38:07 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 11 Sep 2007 14:38:07 -0700
Subject: [Python-3000] 3.0 crypto
In-Reply-To: <1B544854-053A-45C9-869B-92F48D54CA45@solarsail.hcs.harvard.edu>
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com>
	<46DE90B0.4050905@v.loewis.de>
	<66d0a6e10709050851g21bf8b5ct7486f41122487656@mail.gmail.com>
	<46DED4C0.20406@v.loewis.de>
	<5CAF4C40-5087-4BA8-B971-C3DA2A0DE679@solarsail.hcs.harvard.edu>
	<46DFB5B6.1020807@v.loewis.de>
	<308CC895-A9EB-48F8-A7B7-80DC90A8D55A@solarsail.hcs.harvard.edu>
	<52dc1c820709071148l2c3061f9l14c929657ef7e397@mail.gmail.com>
	<1B544854-053A-45C9-869B-92F48D54CA45@solarsail.hcs.harvard.edu>
Message-ID: <52dc1c820709111438n22c45fc0ncf76212324669e4a@mail.gmail.com>

> Last I heard, AMK was no longer maintaining pycrypto, and a number of
> people have found weird issues with it and were generally uncertain
> of the correctness of the implemented crypto.
>
> > The pycrypto API is very nice.  But if we were to consider it
> > for the standard library I'd prefer it just link against OpenSSL
> > rather than use its own C implementations and just leave platforms
> > without ssl without any crypto.
>
> That's one option, although there seems to be some FUD surrounding
> OpenSSL licensing and its interactions with the GPL:
>
>      
>
> It's also a standalone library, and it strikes me as much nicer to
> just have Python provide the crypto functionality out of the box. So,
> if we built an API atop the (public domain) LibTomCrypt code that
> mimicked that of pycrypto, would anyone object to getting that kind
> of thing into the Python source distribution?


I'm +1 for that.  LibTomCrypt is a great place to start.

-gps

From guido at python.org  Tue Sep 11 23:49:17 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 14:49:17 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E6F137.2020001@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
Message-ID: 

On 9/11/07, Travis E. Oliphant  wrote:
> I'm not sure I understand the difference between a classic read lock and
> the exclusive write lock concept.   Does the classic read-lock just
> prevent writing to the memory area?  In my mind that is a read-only
> memory buffer and the buffer interface would complain if a writeable
> buffer was requested.

There are different notions of reading and writing.  Sometimes an
object is naturally read-only (e.g. a PyString). In that case
requesting SIMPLE access should pass but requesting WRITABLE or
LOCKDATA access should fail. (I think the other flags are orthogonal
to these, right?). Any number of concurrent SIMPLE accesses can
coexist since the clients promise they will only read.

OTOH suppose we have an object that is naturally writable (e.g. a
PyBytes). I understood that in this case any number of SIMPLE or
WRITABLE requests would be allowed to be outstanding simultaneously,
and any of these would simply prevent the buffer from moving (fixing
the object's size). But this doesn't sound like it is how you meant it
-- you seem to say that once any SIMPLE (readonly) requests are
outstanding, WRITABLE requests should fail. And I suppose that only
one WRITABLE request ought to be allowed at a time. But then I don't
know what the difference between WRITABLE and LOCKDATA would be.
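
For comparison, the "outstanding requests only pin the buffer" reading is roughly what later CPython releases do for `bytearray`: while a `memoryview` export exists, resizing raises `BufferError`, but same-size writes are still allowed. The snippet below is a small demonstration of that later behaviour, not of the patch under discussion.

```python
buf = bytearray(b"abc")
view = memoryview(buf)          # one outstanding export

try:
    buf.extend(b"d")            # would resize, and so possibly move, the data
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

buf[0] = ord("X")               # a same-size write is still permitted
seen_through_view = bytes(view) # the view observes the change

view.release()                  # drop the export
buf.extend(b"d")                # resizing works again
```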

I guess I would be inclined to propose separate flags for indicating
the operation that the caller will attempt (read or write) and the
level of locking (lock the buffer's address or also prevent anyone
else from writing). Then a "classic read lock" would request read
access while locking out writers (bsddb would use this); a "classic
write lock" would request write access while locking out writers (your
scratch area example would use this); others who don't really care if
the data changes underneath them as long as it doesn't move (e.g.
traditional I/O) could request read access without locking. I'm not
sure if there's a use case to be made for write access without
locking, but I wouldn't rule it out -- possibly when two threads share
a memory area they might have their own protocol for locking it and
might just both want to be able to write to (parts of) it.

What do you think? Another way to look at this would be to consider
these 4 cases:

basic read access (I can read, others can read or write)
locked read access (I can read, others can only read)
basic write access (I can read and write, others can read or write)
exclusive write access (I can read and write, no others can read or write)

Except that accessing the object from Python (e.g. iteration or
indexing) never gets locked out. (Or perhaps it should be? That can
also be done.)

Also, it remains to be seen whether basic read access should be
granted when someone has exclusive write access (see below).
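
The four cases can be written down as a small compatibility table. This is a sketch under assumptions: the names `Access`, `tolerates`, and `compatible` are hypothetical, and it takes the strict variant in which basic reads are locked out by exclusive write access, a point the text leaves open.

```python
from enum import Enum


class Access(Enum):
    # Each value is (I write, others may read, others may write).
    BASIC_READ = (False, True, True)
    LOCKED_READ = (False, True, False)
    BASIC_WRITE = (True, True, True)
    EXCLUSIVE_WRITE = (True, False, False)

    @property
    def writes(self):
        return self.value[0]

    @property
    def others_may_read(self):
        return self.value[1]

    @property
    def others_may_write(self):
        return self.value[2]


def tolerates(holder, other):
    """True if `holder` permits what `other` intends to do."""
    # Every mode reads, so a holder that excludes readers excludes everyone.
    if not holder.others_may_read:
        return False
    if other.writes and not holder.others_may_write:
        return False
    return True


def compatible(held, requested):
    """True if `requested` may be granted while `held` is outstanding."""
    return tolerates(held, requested) and tolerates(requested, held)
```

For example, `compatible(Access.LOCKED_READ, Access.LOCKED_READ)` is true (multiple concurrent readers), while `compatible(Access.LOCKED_READ, Access.BASIC_WRITE)` is false.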

> Actually, writeable is an accepted variant of 'writable' (but it doesn't
> show up in many spell-check dictionaries).  No, it is not too late to
> change it.  Or just define WRITEABLE as WRITABLE.   NumPy uses
> "WRITEABLE" simply because I like that spelling better.

Google found 1.4M occurrences of writeable vs. 3.9M occurrences of
writable. I guess you represent a strong minority. :-) I'd still like
to see it changed. We can leave WRITEABLE as an alias for WRITABLE for
those who are used to seeing it that way in NumPy.

> I'm anxious for feedback and help with the locking mechanism, because I
> do not have all use cases in mind.  I have never thought about a lock
> that prevents reading.  In my mind, this would be handled by the object
> itself.  It could refuse buffer requests if its data had been locked or
> it could not.

Well, the scratch area scenario you describe makes it iffy to read
anything out of the original object since you wouldn't know whether
you were reading before, during or after the write back from the
scratch area to the object's buffer. The question is, do we really
care. If we adopted my 4 access modes above, we could say that basic
read access will still be granted when someone has exclusive write
access if we don't care, OR we could say that basic reads are locked
out by exclusive write access. (And then there's the separate issue of
whether python-level access counts as basic read access or doesn't
count at all -- though the more I think about it, I think it should be
treated the same as basic read access.)

> On the other hand, there could be two concepts of locking that a
> consumer could request from an object
>
> 1) Lock so that no other reads or writes are possible until the lock is
> released.
> 2) Lock so that only reads are possible.
>
> I had only thought of #2 for the current buffer interface.

#1 maps to locked read OR exclusive write access in the strict variant.
#2 maps to locked read in my scheme.

(Gotta go -- ttyl.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Sep 12 00:38:59 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 10:38:59 +1200
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <46E69F2E.9080509@gmail.com>
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org>
	<66d0a6e10709101802t3a8f2475gcdeb180ceaaf3855@mail.gmail.com>
	<46E62721.4020009@v.loewis.de>
	<66d0a6e10709110138w3fcb5f7bl87168db2328695d1@mail.gmail.com>
	<46E675F6.8090604@v.loewis.de> <46E69F2E.9080509@gmail.com>
Message-ID: <46E71903.8060903@canterbury.ac.nz>

Nick Coghlan wrote:
> The LGPL and GPL have different aims from the PSF license, with a much 
> greater focus on preserving freedom for the end-user,

Seems to me they go somewhat beyond "preserving freedoms"
and into other areas. It's one thing to *allow* people to
use the source if they can get it; it's another thing to
try to force people who have no interest in the source
themselves to act as agents for hosting and distributing
it.

Still, it appears that this is what the LGPL requires, so
I agree that it's not appropriate for Python.

--
[L]GPL - just say no.
Greg

From greg.ewing at canterbury.ac.nz  Wed Sep 12 00:47:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 10:47:22 +1200
Subject: [Python-3000] __format__ and datetime
In-Reply-To: 
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
Message-ID: <46E71AFA.9020903@canterbury.ac.nz>

Guido van Rossum wrote:
> I don't expect there to be a practical use for nanoseconds (even
> microseconds are doubtful, but useful since one might want unique
> timestamps for more than 1000 events per second).

But... what if you want unique timestamps for more
than 1000000 events per second? :-)

--
Greg



From greg.ewing at canterbury.ac.nz  Wed Sep 12 00:52:32 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 10:52:32 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
Message-ID: <46E71C30.50409@canterbury.ac.nz>

Guido van Rossum wrote:

> 0 -> 1 -> 2 -> ... (SIMPLE or WRITABLE get)
> ... -> 2 -> 1 -> ... (SIMPLE or WRITABLE release)
> 0 -> -1 (LOCKDATA get)
> -1 -> 0 (LOCKDATA release)

And if this is the correct interpretation, the requests
should be called something like READ_LOCK and WRITE_LOCK
to make this clear.
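
The quoted transitions amount to a single signed counter. A minimal sketch, assuming a hypothetical `ExportCount` helper (the real bookkeeping would be in C inside the exporting object):

```python
class ExportCount:
    """Signed export counter: >0 shared exports, -1 data-locked, 0 idle."""

    def __init__(self):
        self.count = 0

    def get(self, lockdata=False):
        if lockdata:
            # 0 -> -1: only allowed when nothing else is outstanding.
            if self.count != 0:
                raise BufferError("other exports are outstanding")
            self.count = -1
        else:
            # 0 -> 1 -> 2 -> ...: refused while the data is locked.
            if self.count < 0:
                raise BufferError("data is locked")
            self.count += 1

    def release(self, lockdata=False):
        if lockdata:
            # -1 -> 0
            if self.count != -1:
                raise BufferError("no data lock to release")
            self.count = 0
        else:
            # ... -> 2 -> 1 -> 0
            if self.count <= 0:
                raise BufferError("no shared export to release")
            self.count -= 1

    @property
    def pinned(self):
        # Any nonzero value means the buffer must not be moved.
        return self.count != 0
```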

--
Greg


From guido at python.org  Wed Sep 12 00:58:11 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 11 Sep 2007 15:58:11 -0700
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E71AFA.9020903@canterbury.ac.nz>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
	<46E71AFA.9020903@canterbury.ac.nz>
Message-ID: 

On 9/11/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
> > I don't expect there to be a practical use for nanoseconds (even
> > microseconds are doubtful, but useful since one might want unique
> > timestamps for more than 1000 events per second).
>
> But... what if you want unique timestamps for more
> than 1000000 events per second? :-)

Then you can't use the datetime module, or you'll have to petition for
an extension to it.
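
The limit can be read directly off the module: `datetime.resolution` is the smallest difference between two distinct `datetime` values. A short illustration (the variable names are arbitrary):

```python
from datetime import datetime, timedelta

# Smallest step between two distinct datetime values.
finest_step = datetime.resolution

# How many distinct timestamps fit into one second at that resolution.
distinct_per_second = timedelta(seconds=1) // finest_step

# Two events one step apart are distinguishable; anything finer is not.
first = datetime(2007, 9, 11, 15, 58, 11)
second = first + finest_step
```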

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Sep 12 01:12:14 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 11:12:14 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: <46E6F137.2020001@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
Message-ID: <46E720CE.7030602@canterbury.ac.nz>

Travis E. Oliphant wrote:
> I'm not sure I understand the difference between a classic read lock and 
> the exclusive write lock concept.

A read lock means that others can obtain read locks,
and nobody can obtain a write lock.

A write lock means that nobody else can obtain a
lock of any kind.

I think strictly the 'e' should only be inserted if the
preceding letter is one whose sound changes depending
on whether it's followed by an 'e', such as 'c' or 'g'.
"Writeable" does seem to be commonly used, though.

In any case, it would be good to adopt a convention for
these kinds of word used in source, to minimise confusion.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Sep 12 01:15:47 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 11:15:47 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: <46E6F254.9020501@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F254.9020501@enthought.com>
Message-ID: <46E721A3.4060406@canterbury.ac.nz>

Jim Jewett wrote:
> 
> why does it need the lock the whole time?
> Is someone getting known stale data (when you could tell them to
> wait) always OK, but overwriting someone else's change never is?

In a threaded environment, it shouldn't really be
a problem as long as the view of the data is consistent.
It's no different from what would have happened if the
reading thread had got there just a moment sooner,
before the writer got hold of it.

If that's a problem, there should have been some
higher-level synchronisation going on before getting
to that point.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Sep 12 01:17:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 11:17:20 +1200
Subject: [Python-3000] Which joker tried to remove me from the py3k list?
In-Reply-To: 
References: 
	
Message-ID: <46E72200.7070408@canterbury.ac.nz>

Amaury Forgeot d'Arc wrote:
> Of course, this means that someone followed the link, then clicked the
> "unsubscribe" button. A robot?

It could just be someone trying to unsubscribe themselves,
but hitting the wrong link and not noticing.

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Sep 12 01:56:08 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 12 Sep 2007 11:56:08 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
Message-ID: <46E72B18.9060908@canterbury.ac.nz>

Guido van Rossum wrote:
> Any number of concurrent SIMPLE accesses can
> coexist since the clients promise they will only read.

As a general principle, using a word like SIMPLE in an
API is a really bad idea imo, as it's far too vague.
I'm finding it impossible to evaluate the truthfulness
of statements like the above in this discussion, because
of that.

> basic read access (I can read, others can read or write)
> locked read access (I can read, others can only read)
> basic write access (I can read and write, others can read or write)
> exclusive write access (I can read and write, no others can read or write)

Should that last one perhaps be "I can read and write,
others can only read"?

Another thread wanting to read but get a stable view
of the data will be using "I can read, others can only read",
which will fail because the first one is writing. If the
reading thread doesn't care about stability, the writing
one shouldn't have to know.

Then we have two orthogonal things: READ vs WRITE, and
SHARED vs EXCLUSIVE (where 'exclusive' means that others
are excluded from writing).

> Except that accessing the object from Python (e.g. iteration or
> indexing) never gets locked out.

With the scheme I just proposed, the iterator could use
a non-exclusive mode if it wanted, which would give this
effect.

--
Greg


From greg at krypto.org  Wed Sep 12 07:55:09 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 11 Sep 2007 22:55:09 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709111003y4bc1e5acpfe7ce26841718a37@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709092053r50cc23fcsb74cea71c9541797@mail.gmail.com>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
	<46E641F2.4020701@v.loewis.de>
	<66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>
	<79990c6b0709110521p10722897s6e4d03e5a558b457@mail.gmail.com>
	<66d0a6e10709111003y4bc1e5acpfe7ce26841718a37@mail.gmail.com>
Message-ID: <52dc1c820709112255j1709da88x7886faa431f2ed70@mail.gmail.com>

> The Pentium M and Pentium D are much more alike, architecturally, than
> either is to the Pentium 4,


[cpu rant]
Off topic: not true.  The Pentium D is the final design based on the Pentium 4
NetBurst architecture.  It is not at all close to the Pentium M.  The M is
much more a derivative of the Pentium Pro, II, III, and III-M before it, just
as Core and, more distantly, Core 2 are follow-ons to the M.  Yes, the D (50xx)
and Woodcrest Core 2s (51xx) shared the same socket and front side bus, but
internally they are unrelated.
[/cpu rant]

Regardless, comparing between different CPUs doesn't matter; only the
difference between runs on the same CPU does.

For instance, on a 1.4GHz Efficeon:

python2.5:
10 loops, best of 3: 932 msec per loop
python 3.0a1 svn trunk:
10 loops, best of 3: 1.54 sec per loop

(both compiled with gcc 4.1.2 -O3)

which falls right smack in the middle of the measurements others were
reporting in this thread. ;)

> Without looking into it at a much lower level,
> it's hard to tell, but the difference between a 1MB and 2MB L2 cache
> might make all the difference in 3.0 performance.


doubtful, python's ceval core and the data representing the code being
executed are both tiny.

-gps

From nick.bastin at gmail.com  Wed Sep 12 08:54:57 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 12 Sep 2007 02:54:57 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <52dc1c820709112255j1709da88x7886faa431f2ed70@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709101058n22b04bfakf67a15aea8e739f4@mail.gmail.com>
	
	<66d0a6e10709101224j4cbe900dsb8aa52bd7259e66a@mail.gmail.com>
	<46E641F2.4020701@v.loewis.de>
	<66d0a6e10709110220i2f415fcan9047e4cb40676488@mail.gmail.com>
	<79990c6b0709110521p10722897s6e4d03e5a558b457@mail.gmail.com>
	<66d0a6e10709111003y4bc1e5acpfe7ce26841718a37@mail.gmail.com>
	<52dc1c820709112255j1709da88x7886faa431f2ed70@mail.gmail.com>
Message-ID: <66d0a6e10709112354m76c3bedn28f9713038f137a5@mail.gmail.com>

On 9/12/07, Gregory P. Smith  wrote:
> [cpu rant]
> Off topic: not true.  The Pentium D is the final design based on the Pentium 4
> NetBurst architecture.  It is not at all close to the Pentium M.  The M is
> much more a derivative of the Pentium Pro, II, III, and III-M before it, just
> as Core and, more distantly, Core 2 are follow-ons to the M.  Yes, the D (50xx)
> and Woodcrest Core 2s (51xx) shared the same socket and front side bus, but
> internally they are unrelated.
> [/cpu rant]

Yeah, my mistake, I misread intel's NetBurst page.  I should have
stuck with Wikipedia (who knew).

> Regardless comparing between different cpus doesn't matter, only the
> difference between runs on the same cpu.

I agree.

> for instance on a 1.4Ghz efficeon:
>
> python2.5:
> 10 loops, best of 3: 932 msec per loop
> python 3.0a1 svn trunk:
> 10 loops, best of 3: 1.54 sec per loop
>
> (both compiled with gcc 4.1.2 -O3)
>
> which falls right smack in the middle of the measurements others were
> reporting in this thread. ;)

I should look at a comparison of 2.5 and 2.6 at some point, for better
reference.

> > Without looking into it at a much lower level,
> > it's hard to tell, but the difference between a 1MB and 2MB L2 cache
> > might make all the difference in 3.0 performance.
>
> doubtful, python's ceval core and the data representing the code being
> executed are both tiny.

Makes me miss the G4/5 version of Shark on MacOS X, which would show
you the pipelining in the processor and cache utilization, so you
could actually see what was going on - the x86 Shark doesn't seem to
have this capability.  It's suitably interesting on my windows xp
machine to turn processor affinity off and see the performance go to
hell in a handbasket.  (Why, oh why, does windows insist on moving
processes between CPUs all the time?)

--
Nick

From greg at krypto.org  Wed Sep 12 09:44:56 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 12 Sep 2007 00:44:56 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E72B18.9060908@canterbury.ac.nz>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	<46E72B18.9060908@canterbury.ac.nz>
Message-ID: <52dc1c820709120044h722605cekc86ea668a6a1b4bd@mail.gmail.com>

On 9/11/07, Greg Ewing  wrote:
>
> Guido van Rossum wrote:
> > Any number of concurrent SIMPLE accesses can
> > coexist since the clients promise they will only read.
>
> As a general principle, using a word like SIMPLE in an
> API is a really bad idea imo, as it's far too vague.
> I'm finding it impossible to evaluate the truthfulness
> of statements like the above in this discussion, because
> of that.


+1 on that. SIMPLE is a bad name.  Based on the pep3118 description, how
about calling it 1D_CONTIGUOUS or just RAW or FLAT?

I also like your suggestion of renaming PyBUF api flags to READ_LOCK and
WRITE_LOCK as those are well defined concepts in the classic multiple
readers or one writer synchronization sense.  What I implemented in my bytes
patch should really be called PyBUF_READ_LOCK and what Travis describes as
LOCKDATA in this email thread should become WRITE_LOCK.

> > basic read access (I can read, others can read or write)
> > locked read access (I can read, others can only read)
> > basic write access (I can read and write, others can read or write)
> > exclusive write access (I can read and write, no others can read or
> > write)
>
> Should that last one perhaps be "I can read and write,
> others can only read"?
>
> Another thread wanting to read but get a stable view
> of the data will be using "I can read, others can only read",
> which will fail because the first one is writing. If the
> reading thread doesn't care about stability, the writing
> one shouldn't have to know.
>
> Then we have two orthogonal things: READ vs WRITE, and
> SHARED vs EXCLUSIVE (where 'exclusive' means that others
> are excluded from writing).


When I read the plain term EXCLUSIVE I read that to mean nobody else can
read -or- write, i.e. not shared in any sense.  Let's extend these base
concepts to SHARED_READ, SHARED_WRITE, EXCLUSIVE_READ, EXCLUSIVE_WRITE and
use them to define the others:

EXCLUSIVE_WRITE - no others can write to the buffer while this view is open
(this does *not* imply that the requester wants to actually write; that's
what the WRIT(E)ABLE flag is for).
EXCLUSIVE_READ - no others can read this buffer while this view is open
(this is only useful in conjunction with EXCLUSIVE_WRITE, to make a
WRITE_LOCK).
SHARED_READ - anyone can read this buffer
SHARED_WRITE - anyone can write this buffer

SIMPLE/FLAT/RAW = SHARED_WRITE | SHARED_READ
READ_LOCK = EXCLUSIVE_WRITE | SHARED_READ
WRITE_LOCK = EXCLUSIVE_WRITE | EXCLUSIVE_READ

Just | any of the above with WRIT(E)ABLE if you intend to actually write to
the buffer.
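
As a rough illustration of how these combinations compose, here is a
minimal Python sketch. The flag names are the ones proposed above; the
class name BufFlag, the helper wants_to_write and the bit values are made
up for this example and are not part of PEP 3118 or of any patch.

```python
from enum import IntFlag


class BufFlag(IntFlag):
    """Hypothetical bit assignments for the flags proposed in this message."""
    SHARED_READ = 0x01
    SHARED_WRITE = 0x02
    EXCLUSIVE_READ = 0x04
    EXCLUSIVE_WRITE = 0x08
    WRITABLE = 0x10


# The three composite modes, written exactly as in the definitions above.
FLAT = BufFlag.SHARED_WRITE | BufFlag.SHARED_READ
READ_LOCK = BufFlag.EXCLUSIVE_WRITE | BufFlag.SHARED_READ
WRITE_LOCK = BufFlag.EXCLUSIVE_WRITE | BufFlag.EXCLUSIVE_READ


def wants_to_write(flags):
    """True if the requester declared that it intends to modify the buffer."""
    return bool(flags & BufFlag.WRITABLE)


# A requester that locks out other writers and also intends to write itself.
request = READ_LOCK | BufFlag.WRITABLE
```

Keeping WRITABLE as its own bit means "who else may touch the buffer" and
"what I intend to do with it" stay independent, which is the point of the
split described above.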

-gps

From skip at pobox.com  Wed Sep 12 15:58:05 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 12 Sep 2007 08:58:05 -0500
Subject: [Python-3000] __format__ and datetime
In-Reply-To: 
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
Message-ID: <18151.61549.956117.769166@montanaro.dyndns.org>


    Guido> No, the datetime module is explicitly defined to use
    Guido> microseconds. I don't expect there to be a practical use for
    Guido> nanoseconds (even microseconds are doubtful, but useful since one
    Guido> might want unique timestamps for more than 1000 events per
    Guido> second).

I was just thinking about the folks at places like FermiLab and CERN. ;-)

So, is "%f" okay to co-opt?  Is there some sort of future-proofing we can do
so that if the libc folks decide later to use "%f" for something we're not
(mildly) hosed?  Maybe "%."?  It appears that all strftime codes are one or
two letters.

Skip


From nas at arctrix.com  Wed Sep 12 19:43:07 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 12 Sep 2007 17:43:07 +0000 (UTC)
Subject: [Python-3000] C API for ints and strings
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org>
Message-ID: 

Larry Hastings  wrote:
> I am opposed to using LGPL- or GPL-licensed code in Python.

Me too.  Also, I don't see the point.  Python's current long integer
performance is good enough for the large majority of Python users.
For the few specialized users, an extension module should serve.
Maybe I missed something but I thought the real concern was the
performance of the PyLong type when representing relatively short
integers.  Is GMP a solution to that?

  Neil


From barry at python.org  Wed Sep 12 20:06:11 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 12 Sep 2007 14:06:11 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	
	<66d0a6e10709081041v4ea37ce8od75d8a688b52faae@mail.gmail.com>
	<46E2DF85.4090005@v.loewis.de>
	<66d0a6e10709081347k6873d581w869b9b483126a929@mail.gmail.com>
	<46E31FA2.4060701@v.loewis.de>
	<66d0a6e10709081623w59440ac2pf8dca78ae05dfd52@mail.gmail.com>
	<46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org> 
Message-ID: <889D3A2E-3FE6-49C0-89E5-3EB6B885950D@python.org>

On Sep 12, 2007, at 1:43 PM, Neil Schemenauer wrote:

> Larry Hastings  wrote:
>> I am opposed to using LGPL- or GPL-licensed code in Python.
>
> Me too.  Also, I don't see the point.  Python's current long integer
> performance is good enough for the large majority of Python users.
> For the few specialized users, an extension module should serve.
> Maybe I missed something but I thought the real concern was the
> performance of the PyLong type when representing relatively short
> integers.  Is GMP a solution to that?

Back in the days of a previous employment, we used some homegrown  
extensions to give us GMP support in our embedded app.  In a fit of  
rewrite-mania, we ditched it all and stuck with Python's own long  
integer support.  Made our lives easier and we didn't feel we lost  
anything in terms of accuracy or functionality.  We gained in  
performance but I can't attribute that solely to Python's  
implementation, since we also ditched a level of abstraction in the  
process.  In any event, I'd agree that Python's current support is  
probably good enough for most people.

-1 on GMP in the core.

-Barry

From guido at python.org  Wed Sep 12 20:42:33 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 12 Sep 2007 11:42:33 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <889D3A2E-3FE6-49C0-89E5-3EB6B885950D@python.org>
References: <1189270839.25695.18.camel@qrnik> <46E3B12E.1000703@v.loewis.de>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org> 
	<889D3A2E-3FE6-49C0-89E5-3EB6B885950D@python.org>
Message-ID: 

Can I just shortcut this discussion saying that we will *not* switch
to use GMP? It's just not going to happen. Period. End of discussion.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Sep 12 20:45:19 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 12 Sep 2007 11:45:19 -0700
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <18151.61549.956117.769166@montanaro.dyndns.org>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
	<18151.61549.956117.769166@montanaro.dyndns.org>
Message-ID: 

On 9/12/07, skip at pobox.com  wrote:
> So, is "%f" okay to co-opt?  Is there some sort of future-proofing we can do
> so that if the libc folks decide later to use "%f" for something we're not
> (mildly) hosed?  Maybe "%."?  It appears that all strftime codes are one or
> two letters.

Which ones are two letters?

Given how long strftime has been around I think %f is fine. We may
even influence the future of the C library. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rowen at cesmail.net  Wed Sep 12 20:53:25 2007
From: rowen at cesmail.net (Russell E. Owen)
Date: Wed, 12 Sep 2007 11:53:25 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
Message-ID: 

"Guido van Rossum" wrote:

> I guess I would be inclined to propose separate flags for indicating
> the operation that the caller will attempt (read or write) and the
> level of locking (lock the buffer's address or also prevent anyone
> else from writing). Then a "classic read lock" would request read
> access while locking out writers (bsddb would use this); a "classic
> write lock" would request write access while locking out writers (your
> scratch area example would use this); others who don't really care if
> the data changes underneath them as long as it doesn't move (e.g.
> traditional I/O) could request read access without locking. I'm not
> sure if there's a use case to be made for write access without
> locking, but I wouldn't rule it out -- possibly when two threads share
> a memory area they might have their own protocol for locking it and
> might just both want to be able to write to (parts of) it.
> 
> What do you think? Another way to look at this would be to consider
> these 4 cases:
> 
> basic read access (I can read, others can read or write)
> locked read access (I can read, others can only read)
> basic write access (I can read and write, others can read or write)
> exclusive write access (I can read and write, no others can read or write)

Sounds much like the modes offered by an old operating system that had a 
very nice lock manager. The modes:
- concurrent read (others can read or write)
- protected read (others can read but not write)
- concurrent write (others can read or concurrent write)
- protected write (others can concurrent read)
- exclusive (no other locks allowed)
(as well as null to release the resource)
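
The mode list above can be read as a compatibility table. The sketch below
is only my transcription of the one-line descriptions in this message, not
something taken from a lock manager's documentation; the short mode names
and the function can_coexist are invented for the example.

```python
# For each held mode, the set of modes that *other* holders may have at the
# same time, transcribed from the descriptions in the list above.
COMPATIBLE = {
    "CR": {"CR", "PR", "CW", "PW"},  # concurrent read: others can read or write
    "PR": {"CR", "PR"},              # protected read: others can read, not write
    "CW": {"CR", "CW"},              # concurrent write: others CR or CW
    "PW": {"CR"},                    # protected write: others can concurrent read
    "EX": set(),                     # exclusive: no other locks allowed
}


def can_coexist(held, requested):
    """Return True if `requested` may be granted while `held` is outstanding."""
    return requested in COMPATIBLE[held]


# Two protected readers are fine; a protected reader and any writer are not.
print(can_coexist("PR", "PR"))  # True
print(can_coexist("PR", "CW"))  # False
```

Read this way the table is symmetric: whenever mode A tolerates mode B, B
also tolerates A.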

Some of these modes were intended for resources that are locked at 
multiple levels (which I don't think applies to array buffers). For 
example one might get a concurrent lock for a group of resources, then a 
protected lock for one resource. But as you say, there are some 
situations where concurrent write might be useful.

-- Russell


From jjb5 at cornell.edu  Wed Sep 12 23:10:11 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Wed, 12 Sep 2007 17:10:11 -0400
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>		<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>		<46E62358.3020404@enthought.com>		<46E6F137.2020001@enthought.com>	
	
Message-ID: <46E855B3.7040908@cornell.edu>

> Sounds much like the modes offered by an old operating system that had a 
> very nice lock manager.

Aw, VMS isn't THAT old, is it?  :-)

I have a wrapper around threading.Lock and threading.RLock that I've 
been using that does deadlock detection and have wished for these lock 
modes many times.  I would hesitate to create a PEP to support this 
until I actually knew I could pull it off.

I would be happy to share the code that I have with anybody that might 
find it useful, and welcome some help in implementing these modes.

There's one other very useful feature, and that was a callback from the 
lock manager when a lock that you held was blocking a request from some 
other process.


Joel

From nick.bastin at gmail.com  Wed Sep 12 23:15:54 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 12 Sep 2007 17:15:54 -0400
Subject: [Python-3000] C API for ints and strings
In-Reply-To: 
References: <1189270839.25695.18.camel@qrnik>
	<66d0a6e10709090206n27b8cbe3y5f6d13085aa74036@mail.gmail.com>
	<46E3BBE7.4020800@v.loewis.de> <46E48C10.7010705@canterbury.ac.nz>
	<46E4D273.9080300@v.loewis.de> <46E5DC4B.6030304@canterbury.ac.nz>
	<46E5DE92.8070808@hastings.org> 
	<889D3A2E-3FE6-49C0-89E5-3EB6B885950D@python.org>
	
Message-ID: <66d0a6e10709121415s49db5a03g90d902dd3a613abf@mail.gmail.com>

On 9/12/07, Guido van Rossum  wrote:
> Can I just shortcut this discussion saying that we will *not* switch
> to use GMP? It's just not going to happen. Period. End of discussion.

I figured that was assumed once it was pointed out that it didn't work
on Intel Macs...  I'm pretty sure that's a platform we'd prefer to
continue to support.

--
Nick

From guido at python.org  Wed Sep 12 23:18:37 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 12 Sep 2007 14:18:37 -0700
Subject: [Python-3000] C API for ints and strings
In-Reply-To: <66d0a6e10709121415s49db5a03g90d902dd3a613abf@mail.gmail.com>
References: <1189270839.25695.18.camel@qrnik> <46E3BBE7.4020800@v.loewis.de>
	<46E48C10.7010705@canterbury.ac.nz> <46E4D273.9080300@v.loewis.de>
	<46E5DC4B.6030304@canterbury.ac.nz> <46E5DE92.8070808@hastings.org>
	
	<889D3A2E-3FE6-49C0-89E5-3EB6B885950D@python.org>
	
	<66d0a6e10709121415s49db5a03g90d902dd3a613abf@mail.gmail.com>
Message-ID: 

On 9/12/07, Nicholas Bastin  wrote:
> On 9/12/07, Guido van Rossum  wrote:
> > Can I just shortcut this discussion saying that we will *not* switch
> > to use GMP? It's just not going to happen. Period. End of discussion.
>
> I figured that was assumed once it was pointed out that it didn't work
> on Intel Macs...  I'm pretty sure that's a platform we'd prefer to
> continue to support.

Then why are people (not you) still arguing about this?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Sep 12 23:19:31 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 12 Sep 2007 14:19:31 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E855B3.7040908@cornell.edu>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	
	<46E855B3.7040908@cornell.edu>
Message-ID: 

That's a different topic altogether. We're talking here about locking
modes for the buffer API (PEP 3118). This does not involve actual
locks -- the operations just fail if the requested lock cannot be
obtained.
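
To illustrate "fail rather than block" in a concrete case: later CPython
releases behave this way for bytearray, where a resize raises BufferError
while a memoryview is exported. The sketch below relies only on that
behaviour. It is not the locking scheme under discussion in this thread,
just the simplest "lock the buffer's address" case.

```python
data = bytearray(b"abc")
view = memoryview(data)      # exporting a view pins the buffer in place

try:
    data.extend(b"def")      # a resize could move the memory under the view
    resized_while_exported = True
except BufferError:
    # The request is refused immediately; nothing waits for the view.
    resized_while_exported = False

view[0] = ord("X")           # in-place writes through the view still work
view.release()               # give up the export ...
data.extend(b"def")          # ... and now the resize is allowed
```

No thread ever sleeps here: the conflicting operation is rejected on the
spot, and the caller decides whether to retry after the view is released.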

On 9/12/07, Joel Bender  wrote:
> > Sounds much like the modes offered by an old operating system that had a
> > very nice lock manager.
>
> Aw, VMS isn't THAT old, is it?  :-)
>
> I have a wrapper around threading.Lock and threading.RLock that I've
> been using that does deadlock detection and have wished for these lock
> modes many times.  I would hesitate to create a PEP to support this
> until I actually knew I could pull it off.
>
> I would be happy to share the code that I have with anybody that might
> find it useful, and welcome some help in implementing these modes.
>
> There's one other very useful feature, and that was a callback from the
> lock manager when a lock that you held was blocking a request from some
> other process.
>
>
> Joel


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nicko at nicko.org  Thu Sep 13 01:33:07 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Thu, 13 Sep 2007 00:33:07 +0100
Subject: [Python-3000] Performance Notes - new hash algorithm
In-Reply-To: 
References: <52dc1c820709071345m4f4fbe52i41921be5fcb116df@mail.gmail.com>
	
	<1f7befae0709081848m477422bdm11355e58920bf6c6@mail.gmail.com>
	
Message-ID: 

On 10 Sep 2007, at 01:58, Jim Jewett wrote:
> To spell this out a bit more:
> ...
> When adding four entries to an 8-slot table, a truly random hash would
> have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4  of the
> time.  As expected, the proposed hash does have a collision for those
> four values (the first and fourth).

While your overall analysis is both informative and helpful, the
pedant in me feels obliged to point out the flaw in your math.  The
probability of at least one collision is 1 minus the probability of
no collision, which is in turn 8/8 * 7/8 * 6/8 * 5/8, so the correct
figure is actually that you collide about 59% of the time, not 75%.

(If your math were correct then 5 items would collide 125% of the  
time, which is clearly wrong! :-)
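
The two calculations can be checked side by side with exact fractions. This
is only a sketch of the arithmetic above; the function names p_collision
and naive_sum are made up here, and a "truly random hash" is taken to mean
each entry lands in a uniformly random slot.

```python
from fractions import Fraction


def p_collision(items, slots):
    """Probability of at least one collision: 1 - P(no collision)."""
    p_none = Fraction(1)
    for i in range(items):
        # The (i+1)-th entry must avoid the i slots already occupied.
        p_none *= Fraction(slots - i, slots)
    return 1 - p_none


def naive_sum(items, slots):
    """The flawed estimate: add up each entry's individual collision chance."""
    return sum(Fraction(i, slots) for i in range(items))


print(p_collision(4, 8), float(p_collision(4, 8)))  # 151/256 0.58984375
print(naive_sum(4, 8))                              # 3/4
print(naive_sum(5, 8))                              # 5/4
```

The sum overshoots because it counts tables with several collisions more
than once, which is why it passes 1 at five entries while the product form
stays a probability.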

	Cheers,
		Nicko




From unknown_kev_cat at hotmail.com  Thu Sep 13 02:12:06 2007
From: unknown_kev_cat at hotmail.com (Joe Smith)
Date: Wed, 12 Sep 2007 20:12:06 -0400
Subject: [Python-3000] Solaris support in 3.0?
References: <66d0a6e10709050017s7b354bd7tf418a0c168e181c9@mail.gmail.com><46DE90B0.4050905@v.loewis.de><52dc1c820709050836pba30e32me219a4c03627f223@mail.gmail.com>
	<79990c6b0709060129s458f6ce4t71e128a4a4f6e2dd@mail.gmail.com>
Message-ID: 


"Paul Moore"  wrote in message 
news:79990c6b0709060129s458f6ce4t71e128a4a4f6e2dd at mail.gmail.com...
> On 05/09/07, Gregory P. Smith  wrote:
>> Rather than resurrecting the old RSA-copyright md5.c I can easily make new
>> ones out of the libtomcrypt md5 and sha1 sources the same way I created the
>> non-openssl sha256 and sha512 modules.
>
> Which reminds me - when I build Python 3 (on an Ubuntu box) with
> openssl installed, I get a message about _sha256 and _sha512 not being
> built. Presumably this is intentional? (It looks a bit odd, and I
> spent a while trying to work out what dependencies I needed before
> realising it was probably OK).

Yep, perfectly normal. That message just says that the code that ships with
Python is not being built, because OpenSSL's implementations of those
functions are being used instead. (At least that is my understanding.)
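
Either way, calling code should not need to care which implementation was
built. A small sketch, assuming the standard hashlib front end (the input
b"abc" is just a well-known test vector):

```python
import hashlib

# hashlib picks whichever SHA-256 implementation the build provides.
digest = hashlib.sha256(b"abc").hexdigest()
print(digest)
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```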



From skip at pobox.com  Thu Sep 13 02:56:57 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 12 Sep 2007 19:56:57 -0500
Subject: [Python-3000] __format__ and datetime
In-Reply-To: 
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
	<18151.61549.956117.769166@montanaro.dyndns.org>
	
Message-ID: <18152.35545.33753.630023@montanaro.dyndns.org>


    Guido> Which ones are two letters?

All the locale-specific stuff on Solaris 10.  I guess technically the first
letter of the pair is a modifier of the actual code, which comes next.  From
the man page:

  Modified Conversion Specifications
     Some conversion specifications can be modified by the E and O
     modifiers to indicate that an alternate format or specification
     should be used rather than the one normally used by the unmodified
     conversion specification. If the alternate format or specification
     does not exist in the current locale, the behavior will be as if
     the unmodified specification were used.

     %Ec      Locale's alternate appropriate date and time
              representation.

     %EC      Name of the base year (period) in the locale's
              alternate representation.

     %Eg      Offset from %EC of the week-based year in the
              locale's alternative representation.

     %EG      Full alternative representation of the week-based
              year.

     %Ex      Locale's alternate date representation.

     %EX      Locale's alternate time representation.

     %Ey      Offset from %EC (year only) in the locale's
              alternate representation.

     %EY      Full alternate year representation.

     %Od      Day of the month using the locale's alternate
              numeric symbols.

     %Oe      Same as %Od.

     %Og      Week-based year (offset from %C) in the locale's
              alternate representation and using the locale's
              alternate numeric symbols.

     %OH      Hour (24-hour clock) using the locale's alternate
              numeric symbols.

     %OI      Hour (12-hour clock) using the locale's alternate
              numeric symbols.

     %Om      Month using the locale's alternate numeric symbols.

     %OM      Minutes using the locale's alternate numeric
              symbols.

     %OS      Seconds using the locale's alternate numeric
              symbols.

     %Ou      Weekday as a number in the locale's alternate
              numeric symbols.

     %OU      Week number of the year (Sunday as the first day of
              the week) using the locale's alternate numeric
              symbols.

     %Ow      Number of the weekday (Sunday=0) using the
              locale's alternate numeric symbols.

     %OW      Week number of the year (Monday as the first day of
              the week) using the locale's alternate numeric
              symbols.

     %Oy      Year (offset from %C) in the locale's alternate
              representation and using the locale's alternate
              numeric symbols.

Skip

From skip.montanaro at gmail.com  Thu Sep 13 04:29:36 2007
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Wed, 12 Sep 2007 21:29:36 -0500
Subject: [Python-3000] __format__ and datetime
In-Reply-To: 
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
	<18151.61549.956117.769166@montanaro.dyndns.org>
	
Message-ID: <60bb7ceb0709121929x53c82180xac8a350fb1d2a422@mail.gmail.com>

> Given how long strftime has been around I think %f is fine. We may
> even influence the future of the C library. :-)

Patch for datetime (py3k only at this point, no tests either) here:

    http://bugs.python.org/issue1158
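
For illustration, this is how "%f" behaves in later Python releases, where
it is a zero-padded six-digit microsecond field. Whether the patch above
matches this exactly is an assumption; the timestamp is arbitrary.

```python
from datetime import datetime

stamp = datetime(2007, 9, 12, 8, 58, 5, 1234)   # 1234 microseconds

text = stamp.strftime("%H:%M:%S.%f")
print(text)                                     # 08:58:05.001234

# strptime accepts the same code, so the value round-trips.
parsed = datetime.strptime(text, "%H:%M:%S.%f")
print(parsed.microsecond)                       # 1234
```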

Skip

From qrczak at knm.org.pl  Thu Sep 13 18:22:12 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Thu, 13 Sep 2007 18:22:12 +0200
Subject: [Python-3000] Unicode and OS strings
Message-ID: <1189700532.22693.40.camel@qrnik>

What should happen when a command line argument or an environment
variable is not decodable using the system encoding (on Unix where
from the OS point of view it is an array of bytes)?

This is an unfortunate side effect of switching to Unicode. It's
unfortunate because often the data is only passed back to another
function, and thus lack of round trip is a pure loss caused by
choosing a Unicode string as the representation of such data.
I opt for Unicode strings nevertheless; Python took the right step.

I once checked what other languages with Unicode strings do, and the
results were not enlightening: inconsistency, weird errors, damaged or
truncated data.

Python 3.0a1 mostly fails with weird errors, and fails a bit too early:

[qrczak ~]$ echo $LANG
pl_PL.UTF-8

[qrczak ~]$ python3.0 - $(printf '\x80')           
Python 3.0a1 (py3k, Sep  8 2007, 15:57:56) 
[GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Fatal Python error: no mem for sys.argv
zsh: abort      python3.0 - $(printf '\x80')

[qrczak ~]$ FOO=$(printf '\x80') python3.0
Python 3.0a1 (py3k, Sep  8 2007, 15:57:56) 
[GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
object  : UnicodeDecodeError('utf8', b'\x80', 0, 1, 'unexpected code byte')
type    : UnicodeDecodeError
refcount: 4
address : 0xb7a5142c
lost sys.stderr
>>>

[qrczak ~]$ mkdir $(printf '\x80')

[qrczak ~]$ cd $(printf '\x80')

[qrczak ~/\M-^@]$ python3.0
Python 3.0a1 (py3k, Sep  8 2007, 15:57:56) 
[GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
object  : UnicodeDecodeError('utf8', b'/home/users/qrczak/\x80', 19, 20, 'unexpected code byte')
type    : UnicodeDecodeError
refcount: 4
address : 0xb7a1242c
lost sys.stderr
>>>

os.listdir returns undecodable filenames as str8.

I don't know what it should do. Choices:

1. Fail in a controlled way (without losing sys.stderr), and no earlier
   than necessary, i.e. fail when the given string is requested, not
   when a module is imported.

1a. Guarantee that choosing a different encoding and retrying works,
    for a rare case when the programmer wishes to handle such strings by
    explicitly trying latin1.

2. Return undecodable information as bytes, and accept bytes when it is
   passed back to similar functions in the other direction.

3. Have an option to use a modified UTF-8 in these places, where
   undecodable bytes are e.g. escaped as U+0000 U+00xx.

I will not advocate any choice other than 1, but perhaps someone has
another idea.
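
A sketch of what the escaping in choice 3 could look like, assuming an
error handler like "surrogateescape", which later Python releases provide.
It maps each undecodable byte to a lone surrogate (U+DC80 to U+DCFF) rather
than to the U+0000 U+00xx pair suggested above, but the round trip is the
same idea. The path is the one from the session shown earlier.

```python
raw = b"/home/users/qrczak/\x80"          # not valid UTF-8

# Decoding never fails: the bad byte becomes the lone surrogate U+DC80.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name[-1]))                    # '\udc80'

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

# A strict encode still refuses, so escaped data cannot leak out unnoticed.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```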

My language Kogut uses 1a (even for things like sys.argv which look like
variables), experimentally with 3 as an option to be requested either by
choosing such encoding by the program or with an environment variable.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From guido at python.org  Thu Sep 13 18:48:47 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 13 Sep 2007 09:48:47 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189700532.22693.40.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
Message-ID: 

Yes, I have noticed this too. Environment variables, command line
arguments, locale properties, TZ names, and so on, are often given as
8-bit strings in who knows what encoding. I'm not sure what the
solution is, but we need one. I'm guessing one thing we need to do is
research how various systems decide what encoding to use. Even on OSX,
I managed to create an environment variable containing non-ASCII
non-UTF-8 bytes.

I believe Tcl/Tk used to have some kind of heuristic where they would
try UTF-8 first and if that failed used Latin-1 for the bytes that
aren't valid UTF-8, but I'm not at all sure that that's the right
solution in places where Latin-1 is not spoken.

--Guido

On 9/13/07, Marcin 'Qrczak' Kowalczyk  wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?
>
> This is an unfortunate side effect of switching to Unicode. It's
> unfortunate because often the data is only passed back to another
> function, and thus lack of round trip is a pure loss caused by
> choosing a Unicode string as the representation of such data.
> I opt for Unicode strings nevertheless; Python took the right step.
>
> I once checked what other languages with Unicode strings do, and the
> results were not enlightening: inconsistency, weird errors, damaged or
> truncated data.
>
> Python 3.0a1 mostly fails with weird errors, and fails a bit too early:
>
> [qrczak ~]$ echo $LANG
> pl_PL.UTF-8
>
> [qrczak ~]$ python3.0 - $(printf '\x80')
> Python 3.0a1 (py3k, Sep  8 2007, 15:57:56)
> [GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Fatal Python error: no mem for sys.argv
> zsh: abort      python3.0 - $(printf '\x80')
>
> [qrczak ~]$ FOO=$(printf '\x80') python3.0
> Python 3.0a1 (py3k, Sep  8 2007, 15:57:56)
> [GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> object  : UnicodeDecodeError('utf8', b'\x80', 0, 1, 'unexpected code byte')
> type    : UnicodeDecodeError
> refcount: 4
> address : 0xb7a5142c
> lost sys.stderr
> >>>
>
> [qrczak ~]$ mkdir $(printf '\x80')
>
> [qrczak ~]$ cd $(printf '\x80')
>
> [qrczak ~/\M-^@]$ python3.0
> Python 3.0a1 (py3k, Sep  8 2007, 15:57:56)
> [GCC 4.2.1 20070719 (release) (PLD-Linux)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> object  : UnicodeDecodeError('utf8', b'/home/users/qrczak/\x80', 19, 20, 'unexpected code byte')
> type    : UnicodeDecodeError
> refcount: 4
> address : 0xb7a1242c
> lost sys.stderr
> >>>
>
> os.listdir returns undecodable filenames as str8.
>
> I don't know what it should do. Choices:
>
> 1. Fail in a controlled way (without losing sys.stderr), and no earlier
>    than necessary, i.e. fail when the given string is requested, not
>    when a module is imported.
>
> 1a. Guarantee that choosing a different encoding and retrying works,
>     for a rare case when the programmer wishes to handle such strings by
>     explicitly trying latin1.
>
> 2. Return undecodable information as bytes, and accept bytes when it is
>    passed back to similar functions in the other direction.
>
> 3. Have an option to use a modified UTF-8 in these places, where
>    undecodable bytes are e.g. escaped as U+0000 U+00xx.
>
> I will not advocate any choice other than 1, but perhaps someone has
> another idea.
>
> My language Kogut uses 1a (even for things like sys.argv which look like
> variables), experimentally with 3 as an option to be requested either by
> choosing such encoding by the program or with an environment variable.
>
> --
>    __("<         Marcin Kowalczyk
>    \__/       qrczak at knm.org.pl
>     ^^     http://qrnik.knm.org.pl/~qrczak/
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Thu Sep 13 19:08:40 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 13 Sep 2007 19:08:40 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	
Message-ID: <46E96E98.9080406@v.loewis.de>

> Yes, I have noticed this too. Environment variables, command line
> arguments, locale properties, TZ names, and so on, are often given as
> 8-bit strings in who knows what encoding. I'm not sure what the
> solution is, but we need one.

One "universal" solution is to use Unicode private-use-area
characters. We could come up with some error handler which replaces
undecodable characters with a PUA character; on encoding, the
same error handler encodes the PUA characters again as bytes.
We would need a block of 256 PUA characters for that.

Of course, if the input data already contains PUA characters,
there would be an ambiguity. We can rule this out for most codecs,
as they don't support PUA characters. The major exception would
be UTF-8, for which we would need to create a UTF-8-noPUA codec,
which would then be used at all system interfaces that should use
UTF-8 but might use arbitrary bytes.
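
A minimal sketch of such an error handler, written against the
codecs.register_error hook. The handler name "pua-escape" and the block
start U+F700 are arbitrary choices made for this example; nothing here is
an agreed design.

```python
import codecs

PUA_BASE = 0xF700   # hypothetical: 256 code points inside the BMP private use area


def pua_escape(exc):
    """Map undecodable bytes to PUA characters, and back again on encoding."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        code = ord(exc.object[exc.start])
        if PUA_BASE <= code < PUA_BASE + 256:
            # Turn one escaped character back into its original byte.
            return bytes([code - PUA_BASE]), exc.start + 1
    raise exc


codecs.register_error("pua-escape", pua_escape)

# With a codec that cannot represent PUA characters, the round trip works.
text = b"abc\x80".decode("ascii", "pua-escape")
back = text.encode("ascii", "pua-escape")

# With plain UTF-8 the PUA character is encodable, so the handler is never
# called on the way out and the original byte is lost.
raw = b"caf\xe9"
text_utf8 = raw.decode("utf-8", "pua-escape")
back_utf8 = text_utf8.encode("utf-8", "pua-escape")
```

The second half shows the ambiguity described above: a stock UTF-8 codec
happily encodes the PUA character itself, so the round trip needs a codec
that rejects the reserved block.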

We would make a list of all interfaces that use the PUA error
handler: file names, environment variables, command line
arguments.

> I'm guessing one thing we need to do is
> research how various systems decide what encoding to use. Even on OSX,
> I managed to create an environment variable containing non-ASCII
> non-UTF-8 bytes.

Unix-ish systems just don't decide. They pass that on to the
application. On display, they display things like question marks. At
API level, it's just null-terminated char*.

> I believe Tcl/Tk used to have some kind of heuristic where they would
> try UTF-8 first and if that failed used Latin-1 for the bytes that
> aren't valid UTF-8, but I'm not at all sure that that's the right
> solution in places where Latin-1 is not spoken.

Indeed not - here lies mojibake.

Regards,
Martin

From stephen at xemacs.org  Thu Sep 13 20:43:59 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 14 Sep 2007 03:43:59 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46E96E98.9080406@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
Message-ID: <87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > One "universal" solution is to use Unicode private-use-area
 > characters. 

+1

 > Of course, if the input data already contains PUA characters,
 > there would be an ambiguity.

That may be true in the implementation, but it shouldn't.  What should
happen internally is that all undecodable characters (which PUA
characters are by definition for standard codecs) are mapped to unused
codepoints in the PUA, chosen by Python.

This map would be required to maintain some house-keeping information
about where the character came from (specifically the original
coded character set so that round-tripping would succeed).
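
The bookkeeping could look roughly like the following sketch. The class
name, the method names, and the charset labels are all hypothetical;
the only fixed fact used is that the BMP private-use area spans
U+E000..U+F8FF.

```python
class PuaAllocator:
    """Hand out PUA code points on demand and remember their origin."""

    def __init__(self, first=0xE000, last=0xF8FF):
        self._next = first
        self._last = last
        self._to_pua = {}    # (charset, code) -> PUA character
        self._from_pua = {}  # PUA character -> (charset, code)

    def assign(self, charset, code):
        """Return the PUA character standing in for (charset, code)."""
        key = (charset, code)
        if key not in self._to_pua:
            if self._next > self._last:
                raise OverflowError("private-use block exhausted")
            ch = chr(self._next)
            self._next += 1
            self._to_pua[key] = ch
            self._from_pua[ch] = key
        return self._to_pua[key]

    def origin(self, ch):
        """Recover the original (charset, code) so encoding can round-trip."""
        return self._from_pua[ch]


alloc = PuaAllocator()
c = alloc.assign("vendor-jis", 0x2D21)  # hypothetical charset label
assert alloc.origin(c) == ("vendor-jis", 0x2D21)
```

The table lives only in the running process, so the assignment is
meaningful only for as long as that process keeps it.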

One possible error-recovery strategy for broken encodings (as opposed
to coding which is correct in format but contains a code point not in
the table) would be to have a "pure code unit" block in the PUA.

Note that since we're talking about code units throughout (there's no
guarantee that the encoding in question is octet-oriented, although
that's almost always the case in practice), 256 code points may not be
enough.

 > We would make a list of all interfaces that use the PUA error
 > handler: file names, environment variables, command line
 > arguments.

In general, I don't consider this an error.  It's reasonable to use
exception handling internally to the codec -- such broken texts are
rare except in interactive applications where the speed isn't an issue
-- but for some applications it would be useful to accept entire
broken strings and pass them to Python with the broken parts marked
(ie, by being assigned to the "code unit" block of the PUA) and the
rest decoded.

Here's an example that comes up in Emacs (specifically AUCTeX).  TeX
error messages are octet-oriented and regularly slice multibyte
encodings in the middle of characters or escape sequences.  It turns
out the basic codec algorithms often DTRT by (accidentally)
resynchronizing on ASCII, and sometimes can even resynch on a
multibyte character.  So the display of the "broken" text is often
useful.  However, for reasons I'm not familiar with the AUCTeX
developers have asked that the strings be invertible (ie, back to the
octets that TeX spit out).  This scheme would allow that.




From martin at v.loewis.de  Thu Sep 13 21:18:05 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 13 Sep 2007 21:18:05 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>		<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46E98CED.1010008@v.loewis.de>

>  > We would make a list of all interfaces that use the PUA error
>  > handler: file names, environment variables, command line
>  > arguments.
> 
> In general, I don't consider this an error.

I don't, either. However, given the current codec design, this is
the least intrusive way to enhance "all" codecs with the feature
of mapping unsupported code points to PUA characters. Otherwise,
we would have to duplicate all codecs.

Regards,
Martin

From qrczak at knm.org.pl  Thu Sep 13 21:26:15 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Thu, 13 Sep 2007 21:26:15 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46E96E98.9080406@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
Message-ID: <1189711575.22693.86.camel@qrnik>

On 13-09-2007 (Thu) at 19:08 +0200, "Martin v. Löwis" wrote:

> Of course, if the input data already contains PUA characters,
> there would be an ambiguity. We can rule this out for most codecs,
> as they don't support PUA characters. The major exception would
> be UTF-8,

Most codecs other than UTF-8 don't have this problem.

Unicode people are generally allergic to any non-standard variants of
Unicode specifications, and feel that this is a heresy. I experimentally
and optionally use U+0000 escaping, but I'm not convinced that anything
like this is a good idea, and it should probably not be enabled by
default.

Mono uses U+0000 escaping too; I'm not sure if all the details agree.
This escaping scheme has an advantage that it's compatible with real
UTF-8 for strings which contain no \x00 = U+0000. Most of applicable
contexts do guarantee to not contain NUL, so the interpretation of valid
data in both directions is unchanged. My encoder even rejects U+0000
prefixes for bytes which would form valid UTF-8 sequences, so you can't
have two Unicode strings which encode to the same byte string. The
side effect is that not all U+0000 occurrences can be encoded, but the
contexts we are talking about don't allow U+0000 anyway.
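
A rough Python sketch of this kind of U+0000 escaping, under stated
assumptions: the function names and the error-handler name are made up
for illustration, an undecodable byte 0xNN becomes U+0000 U+00NN, and a
literal NUL byte becomes U+0000 U+0000. It does not implement the
stricter rule of rejecting escapes for bytes that would form valid
UTF-8, so it is not a faithful copy of any existing encoder.

```python
import codecs


def _nul_escape(exc):
    """Decode-side handler: escape each bad byte as U+0000 followed by U+00NN."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00" + chr(b) for b in bad), exc.end


codecs.register_error("nul-escape-sketch", _nul_escape)


def decode_nul_escaped(raw):
    # A NUL byte is never part of a multibyte UTF-8 sequence, so splitting
    # on it is safe; each literal NUL is re-inserted as U+0000 U+0000.
    parts = raw.split(b"\x00")
    return "\x00\x00".join(p.decode("utf-8", "nul-escape-sketch") for p in parts)


def encode_nul_escaped(text):
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\x00":
            if i + 1 >= len(text) or ord(text[i + 1]) > 0xFF:
                raise ValueError("dangling U+0000 escape")
            out.append(ord(text[i + 1]))  # the escaped raw byte
            i += 2
        else:
            j = text.find("\x00", i)
            if j == -1:
                j = len(text)
            out += text[i:j].encode("utf-8")
            i = j
    return bytes(out)


raw = b"ok\xff\x00\xe2\x82\xac"
assert encode_nul_escaped(decode_nul_escaped(raw)) == raw
```

Strings without NUL decode exactly as they would with plain UTF-8,
which is the compatibility property claimed above.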

> > I'm guessing one thing we need to do is
> > research how various systems decide what encoding to use.

This is the easy part; modern Unices have nl_langinfo(CODESET).
The hard part is deciding what to do when decoding fails.
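
For reference, a small sketch of that query from Python. The helper
name is hypothetical; locale.nl_langinfo and locale.CODESET exist only
on platforms that provide nl_langinfo, so the fallback branch is an
assumption about what a portable caller would want.

```python
import locale


def locale_codeset():
    """Return the codeset name of the current locale (sketch)."""
    try:
        return locale.nl_langinfo(locale.CODESET)
    except AttributeError:
        # Platforms without nl_langinfo(), e.g. Windows.
        return locale.getpreferredencoding(False)


print(locale_codeset())
```

The result reflects the C-level locale currently in effect, so it
depends on whether the program has called locale.setlocale().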

[I will be absent between Friday and Monday.]

Here is what other environments do. This was over 2 years ago, so
something might have changed. In particular, Mono now uses some U+0000
escaping, which I need to investigate again. I checked both directions,
i.e. also what they do with unencodable filenames given by the program.
Everything is on Linux. Some behaviors are obviously awful.


Java (Sun)
----------

Filenames are assumed to be in the locale encoding.

a) Interpreting. Bytes which cannot be converted are replaced by U+FFFD.

b) Creating. Characters which cannot be converted are replaced by "?".

Command line arguments and standard I/O are treated in the same way.
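
The same lossy behavior can be reproduced with Python's built-in
"replace" error handler, shown here only as an analogy to make the
data loss concrete; it says nothing about how the JVM implements it.

```python
def interpret(raw):
    """Undecodable bytes become U+FFFD, as in case a)."""
    return raw.decode("utf-8", errors="replace")


def create(name):
    """Unencodable characters become '?', as in case b)."""
    return name.encode("ascii", errors="replace")


raw = b"caf\xe9"            # Latin-1 bytes, not valid UTF-8
text = interpret(raw)       # 'caf\ufffd'
assert text.encode("utf-8") != raw   # the original byte is gone
assert create("caf\u00e9") == b"caf?"
```

Neither direction can be inverted, so a file listed this way cannot
necessarily be opened under the name that was shown.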


Java (GNU)
----------

Filenames are assumed to be in Java-modified UTF-8.

a) Interpreting. If a filename cannot be converted, a directory listing
   contains a null instead of a string object.

b) Creating. All Java characters are representable in Java-modified
   UTF-8. Obviously not all potential filenames can be represented.

Command line arguments are interpreted according to the locale.
Bytes which cannot be converted are silently skipped.

Standard I/O works in ISO-8859-1 by default. Obviously all input is
accepted. On output characters above U+00FF are replaced by "?".


C# (mono)
---------

Filenames use the list of encodings from the MONO_EXTERNAL_ENCODINGS
environment variable, with UTF-8 implicitly added at the end. These
encodings are tried in order.
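
The ordered fallback can be sketched in Python as follows. The function
name is hypothetical, and the sketch only mirrors the description given
here (try each listed encoding, then UTF-8); it is not taken from the
Mono sources.

```python
def decode_with_fallbacks(raw, encodings):
    """Try each encoding in order, with UTF-8 implicitly added at the end."""
    for enc in list(encodings) + ["utf-8"]:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return None  # nothing matched; the caller must skip or fail


name = b"\xb1\xe6\xea"
assert decode_with_fallbacks(name, ["ascii"]) is None
assert decode_with_fallbacks(name, ["iso-8859-2"]) == "\u0105\u0107\u0119"
```

Note that a single-byte encoding placed early in the list accepts
almost any input, so later entries are then rarely reached.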

a) Interpreting. If a filename cannot be converted, it is skipped in
   a directory listing.

   The documentation says that if a filename, a command line argument
   etc. looks like valid UTF-8, it is treated as such first, and
   MONO_EXTERNAL_ENCODINGS is consulted only in remaining cases.
   The reality seems to not match this (mono-1.0.5).

b) Creating. If UTF-8 is used, U+0000 throws an exception
   (System.ArgumentException: Path contains invalid chars), paired
   surrogates are treated correctly, and an isolated surrogate causes
   an internal error:
** ERROR **: file strenc.c: line 161 (mono_unicode_to_external):
assertion failed: (utf8!=NULL)
aborting...

Command line arguments are treated in the same way, except that if an
argument cannot be converted, the program dies at start:
[Invalid UTF-8]
Cannot determine the text encoding for argument 1 (xxx\xb1\xe6\xea).
Please add the correct encoding to MONO_EXTERNAL_ENCODINGS and try
again.

Console.WriteLine emits UTF-8. Paired surrogates are treated
correctly, unpaired surrogates are converted to pseudo-UTF-8.

Console.ReadLine interprets text as UTF-8. Bytes which cannot be
converted are silently skipped.


Perl
----

Depending on the convention used by a particular function and on
imported packages, a Perl string is treated either as Perl-modified
Unicode (with character values up to 32 bits or 64 bits depending on
the architecture) or as an unspecified locale encoding. It has two
internal representations: ISO-8859-1 and Perl-modified UTF-8 (with
an extended range).

If every Perl string is assumed to be a Unicode string, then filenames
are effectively ISO-8859-1.

a) Interpreting. Characters up to U+00FF are used.

b) Creating. If the filename has no characters above 0xFF, it is
   converted to ISO-8859-1. Otherwise it is converted to Perl-modified
   UTF-8 (all characters, not just those above 0xFF).

Command line arguments and standard I/O are treated in the same way,
i.e. ISO-8859-1 on input and a mixture of ISO-8859-1 and UTF-8 on
output, depending on the contents.
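
The content-dependent output rule can be imitated in a few lines of
Python. This is only an analogy with a hypothetical function name; real
UTF-8 stands in for Perl-modified UTF-8, which has an extended range.

```python
def perl_like_encode(name):
    """ISO-8859-1 if every character fits in one byte, otherwise UTF-8."""
    try:
        return name.encode("latin-1")
    except UnicodeEncodeError:
        return name.encode("utf-8")


assert perl_like_encode("caf\u00e9") == b"caf\xe9"
assert perl_like_encode("caf\u00e9\u20ac") == b"caf\xc3\xa9\xe2\x82\xac"
```

Adding one character above U+00FF changes the bytes produced for every
other non-ASCII character in the name, which is what makes the output a
mixture of two encodings.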

This behavior is modifiable by importing various packages and using
interpreter invocation flags. When Perl is told that command line
arguments are UTF-8, the behavior for strings which cannot be
converted is inconsistent: sometimes it's treated as ISO-8859-1,
sometimes an error is signalled.


Haskell
-------

Haskell nominally uses Unicode. There is no conversion framework
standardized or implemented yet, though. Implementations which support
more than 256 characters currently assume ISO-8859-1 for filenames,
command line arguments and all I/O, taking the lowest 8 bits of a
character code on output.


Common Lisp: CLISP
------------------

The Common Lisp standard doesn't say anything about string encoding.
In CLISP strings are UTF-32 (internally optimized as UCS-2 and
ISO-8859-1 when possible). Any character code up to U+10FFFF is
allowed, including isolated surrogates.

Filenames are assumed to be in the locale encoding.

a) Interpreting. If a byte cannot be converted, a condition is signaled.

b) Creating. If a character cannot be converted, a condition is
   signaled.


Kogut (my language)
-------------------

Strings are UTF-32 (internally optimized as ISO-8859-1 when possible).
Any character code up to U+10FFFF is allowed, including isolated
surrogates.

Filenames are assumed to be in the locale encoding; the encoding can be
overridden by a Kogut-specific environment variable. A program can
itself set the encoding to something else, perhaps locally during
execution of some code. It can use a conversion which puts U+FFFD / "?"
instead of throwing an exception on error, or which does something else.

a) Interpreting. If a byte cannot be converted, an exception is thrown.

b) Creating. If a character cannot be converted or if a name contains
   U+0000, an exception is thrown.

Command line arguments and standard I/O are treated in the same way.

There is an additional encoding which is a modified UTF-8 and can be
explicitly used instead of true UTF-8: any byte string can be decoded,
where normally undecodable bytes and \0 are escaped as U+0000 U+00xx.


GNOME
-----

GNOME uses UTF-8 internally, or sometimes byte strings in other
encodings. I guess filenames are passed as byte strings. AFAIK
sometimes filenames are expressed as URLs, even internally when it's
invisible to the user, and then various unsafe bytes are escaped as
two hex digits preceded by the percent sign. From the programmer's
point of view the original byte strings are generally used. Filename
encoding matters for the display though, so here I describe the user's
point of view.

If the environment variable G_FILENAME_ENCODING is present, it
specifies the encoding of filenames, unless it is @locale which means
the encoding of the locale. If it's not present but G_BROKEN_FILENAMES
is present, filenames are assumed to be in the locale encoding.
If neither variable is present, filenames are assumed to be in UTF-8.
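
The precedence of these variables can be written down as a small Python
function. The function name and its parameters are hypothetical, and
the sketch follows only the three rules stated in this paragraph.

```python
def gnome_filename_encoding(environ, locale_encoding):
    """Pick the filename encoding from a mapping of environment variables."""
    value = environ.get("G_FILENAME_ENCODING")
    if value is not None:
        # "@locale" means: use the encoding of the locale.
        return locale_encoding if value == "@locale" else value
    if "G_BROKEN_FILENAMES" in environ:
        return locale_encoding
    return "UTF-8"


assert gnome_filename_encoding({}, "ISO-8859-2") == "UTF-8"
assert gnome_filename_encoding({"G_BROKEN_FILENAMES": "1"}, "ISO-8859-2") == "ISO-8859-2"
```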

a) Interpreting. If a filename cannot be converted from the selected
   encoding, all non-ASCII bytes are shown as octal numbers preceded
   by the backslash, as hex numbers preceded by the percent sign, or
   as question marks, depending on the situation (I can observe all
   three cases in gedit). What is physically stored is the byte string
   and the file is opened successfully.

b) Creating. If a character cannot be represented, the application
   refuses to save the file until a good filename is entered.


Mozilla
-------

I don't know how it handles filenames internally. From the user's
point of view it matters how it presents a local directory listing.

Filenames are assumed to be in the locale encoding.

If a filename cannot be converted, it's skipped. If it can be
converted but contains characters like 0x80-0x9F in ISO-8859-2,
they are displayed as question marks and the file is inaccessible.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From oliphant at enthought.com  Thu Sep 13 21:27:33 2007
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Thu, 13 Sep 2007 14:27:33 -0500
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: 
References: <20070829234728.GV24059@electricrain.com>	
		
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>	
		
	<46E62358.3020404@enthought.com>	
		
	<46E6F137.2020001@enthought.com>
	
Message-ID: <46E98F25.5010404@enthought.com>

Guido van Rossum wrote:
> On 9/11/07, Travis E. Oliphant  wrote:
>   
>> I'm not sure I understand the difference between a classic read lock and
>> the exclusive write lock concept.   Does the classic read-lock just
>> prevent writing to the memory area?  In my mind that is a read-only
>> memory buffer and the buffer interface would complain if a writeable
>> buffer was requested.
>>     
>
> There are different notions of reading and writing.  Sometimes an
> object is naturally read-only (e.g. a PyString). In that case
> requesting SIMPLE access should pass but requesting WRITABLE or
> LOCKDATA access should fail. (I think the other flags are orthogonal
> to these, right?). Any number of concurrent SIMPLE accesses can
> coexist since the clients promise they will only read.
>   
Yes, the other flags are orthogonal to this concept.
> OTOH suppose we have an object that is naturally writable (e.g. a
> PyBytes). I understood that in this case any number of SIMPLE or
> WRITABLE requests would be allowed to be outstanding simultaneously,
> and any of these would simply prevent the buffer from moving (fixing
> the object's size). But this doesn't sound like it is how you meant it
> -- you seem to say that once any SIMPLE (readonly) requests are
> outstanding, WRITABLE requests should fail. 
Wait a minute.  I want to clarify that normally any number of SIMPLE or 
WRITEABLE requests would be possible for an object that is naturally 
writeable.   That is my thinking. 

The purpose of LOCKDATA is to allow an object to request that the object 
not be writeable in the future while it holds a view to the object.   I 
did not think that this would be the normal behavior, but exceptional.

What seems to be needed is yet another flag that allows a buffer 
requester to insist that the object not allow any buffer accesses read 
or write until its view is done.   So, you would have something like

LOCK_FOR_WRITE
LOCK_FOR_READ

I would want to encourage people not to use the LOCK_FOR_READ unless 
there is an important benefit or need to use it.   On the other hand, 
the argument about dma mechanisms (like moving memory to a video card 
for processing) needing to make the buffer unavailable temporarily 
sounds like a reasonable one to me.  I can already see applications for it.
> And I suppose that only
> one WRITABLE request ought to be allowed at a time. But then I don't
> know what the difference between WRITABLE and LOCKDATA would be.
>
>
>   
I hope I've clarified the difference between these in my mind.
> Then a "classic read lock" would request read
> access while locking out writers (bsddb would use this);
I did not separate this case in my mind, as I presumed that if something 
wanted to prevent other writers it would itself want to write.  I can 
see what is wanted here now.
>  a "classic
> write lock" would request write access while locking out writers (your
> scratch area example would use this); others who don't really care if
> the data changes underneath them as long as it doesn't move (e.g.
> traditional I/O) could request read access without locking. I'm not
> sure if there's a use case to be made for write access without
> locking, but I wouldn't rule it out -- possibly when two threads share
> a memory area they might have their own protocol for locking it and
> might just both want to be able to write to (parts of) it.
>   
Yes, I would not rule out write-access without locking either.  NumPy 
actually uses that all the time internally where two or more objects 
share the same data and can both write to it (although the community 
warns people about doing this without knowing what you are doing).
> What do you think? Another way to look at this would be to consider
> these 4 cases:
>   
I think I was leaving out the cases

1) requesting a read access with future write locking ('classic read lock')
2) requesting a read or write access with future read locking.

Let me see how my thinking maps to your list below which at first glance 
looks pretty good.
> basic read access (I can read, others can read or write)
> locked read access (I can read, others can only read)
> basic write access (I can read and write, others can read or write)
> exclusive write access (I can read and write, no others can read or write)
>
>   
I guess my original LOCK_DATA concept (I can read and write, others can 
only read) is not even in this list as you discuss below.   I'm actually 
wondering if another function should be added to handle the concept of 
locking.  I can imagine that it will want to grow more fine-grained 
locking possibilities.

> Except that accessing the object from Python (e.g. iteration or
> indexing) never gets locked out. (Or perhaps it should be? That can
> also be done.)
>   
I think if it doesn't go through the buffer interface it is up to the
object to decide (i.e. what does the object do with itself when buffers
are exported --- that will depend on the object).   All it must do is
support the buffer interface in the correct way (i.e. not move the
memory that exported buffers are relying on, and correctly support the
access modes that it purports to export).
>> Actually, writeable is an accepted variant of 'writable' (but it doesn't
>> show up in many spell-check dictionaries).  No, it is not too late to
>> change it.  Or just define WRITEABLE as WRITABLE.   NumPy uses
>> "WRITEABLE" simply because I like that spelling better.
>>     
>
> Google found 1.4M occurrences of writeable vs. 3.9M occurrences of
> writable. I guess you represent a strong minority. :-) I'd still like
> to see it changed. We can leave WRITEABLE as an alias for WRITABLE for
> those who are used to seeing it that way in NumPy.
>   
I'm fine with that. 
>
> Well, the scratch area scenario you describe makes it iffy to read
> anything out of the original object since you wouldn't know whether
> you were reading before, during or after the write back from the
> scratch area to the object's buffer. The question is, do we really
> care. If we adopted my 4 access modes above, we could say that basic
> read access will still be granted when someone has exclusive write
> access if we don't care, OR we could say that basic reads are locked
> out by exclusive write access. (And then there's the separate issue of
> whether python-level access counts as basic read access or doesn't
> count at all -- though the more I think about it, I think it should be
> treated the same as basic read access.)
>
>   
>> On the other hand, there could be two concepts of locking that a
>> consumer could request from an object
>>
>> 1) Lock so that no other reads or writes are possible until the lock is
>> released.
>> 2) Lock so that only reads are possible.
>>
>> I had only thought of #2 for the current buffer interface.
>>     
>
> #1 maps to locked read OR exclusive write access in the strict variant.
> #2 maps to locked read in my scheme.
>
>   
Let me think about adding a function for read-write locking that is 
separate from getting a view (which implements memory-location 
locking).  I appreciate the discussion as it is helping me clarify my 
thinking.

-Travis



From stephen at xemacs.org  Thu Sep 13 23:12:04 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 14 Sep 2007 06:12:04 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189711575.22693.86.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
Message-ID: <18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>

"Marcin 'Qrczak' Kowalczyk"  writes:

 >> Of course, if the input data already contains PUA characters,
 >> there would be an ambiguity. We can rule this out for most codecs,
 >> as they don't support PUA characters. The major exception would
 >> be UTF-8,

 > Most codecs other than UTF-8 don't have this problem.

All Japanese codecs do.  Corporate variants of JIS remain alive and
well.  They're not limited to Microsoft and Apple: IBM, Fujitsu/Sun,
Hitachi, and NEC software also allows entry of characters not in the
JIS sets.

 > Unicode people are generally allergic to any non-standard variants of
 > Unicode specifications, and feel that this is a heresy. I experimentally
 > and optionally use U+0000 escaping, but I'm not convinced that anything
 > like this is a good idea, and it should probably not be enabled by
 > default.

-1

Heresy, no.  That doesn't make it anything like a good idea.  There
are plenty of character sets, even those that are ISO 2022 compatible,
with undefined code points.  Such code points regularly do appear in
text content where the coded character set is either incorrectly
specified or ambiguous.  This means that a way of handling such points
is very useful, and as long as there's enough PUA space, the approach
I suggested can handle all of these various issues.  Any application
where there won't be enough PUA space is very special, either
demanding more than 2 planes worth of private space (planes 15 and
16), or demanding very high efficiency (needs to fit in the BMP
private space).  The approach I suggest has the advantage that
applications with a small PUA usage (IIRC more than 4000 PUA code
points are available in the BMP) will have string length == character
count.

 > the contexts we are talking about don't allow U+0000 anyway.

zsh at least allows you to type ^V^SPC to enter an ASCII NUL character
on the command line, and to assign a string containing NULs to an
environment variable.


From qrczak at knm.org.pl  Fri Sep 14 00:31:36 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 14 Sep 2007 00:31:36 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1189722696.30037.14.camel@qrnik>

On 14-09-2007 (Fri) at 06:12 +0900, Stephen J. Turnbull wrote:

> This means that a way of handling such points
> is very useful, and as long as there's enough PUA space, the approach
> I suggested can handle all of these various issues.

PUA already has a representation in UTF-8, so this is more incompatible
with UTF-8 than needed, and hijacks characters which might be used (for
example I'm using some PUA ranges for encoding my script, they are being
transported between processes, and I would be upset if some language had
mangled them to something else).

While U+0000 is also representable in UTF-8, it cannot occur in
filenames, program arguments, environment variables etc., and thus
in many contexts it was free. It's not free mostly in file contents,
including stdin/stdout/stderr. Of course my escaping scheme can
preserve \0 too, by escaping it to U+0000 U+0000, but here it's
incompatible with the real UTF-8.

> zsh at least allows you to type ^V^SPC to enter an ASCII NUL character
> on the command line, and to assign a string containing NULs to an
> environment variable.

They may work for its internal commands and process-internal variables.
But there can't be NULs in arguments of program invocation, or in
environment variables which survive execve, because the Unix APIs and
data structures - not just C functions - use NULs to delimit these
strings.
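
Current Python 3 shows the same restriction from the caller's side.
In this sketch the helper name and the variable name SKETCH_NUL_TEST
are hypothetical; the behavior relied on is that assigning a value
with an embedded NUL to os.environ raises ValueError.

```python
import os


def env_accepts(value):
    """Return True if the value can be stored in the process environment."""
    name = "SKETCH_NUL_TEST"  # hypothetical variable name
    try:
        os.environ[name] = value
    except ValueError:
        return False  # embedded NUL: not representable as a C string
    del os.environ[name]
    return True


assert env_accepts("plain text")
assert not env_accepts("before\0after")
```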

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From greg.ewing at canterbury.ac.nz  Fri Sep 14 01:26:28 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Sep 2007 11:26:28 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: <52dc1c820709120044h722605cekc86ea668a6a1b4bd@mail.gmail.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	<46E72B18.9060908@canterbury.ac.nz>
	<52dc1c820709120044h722605cekc86ea668a6a1b4bd@mail.gmail.com>
Message-ID: <46E9C724.9080808@canterbury.ac.nz>

Gregory P. Smith wrote:
> When I read the plain term EXCLUSIVE I read that to mean nobody else can 
> read -or- write, ie: not shared in any sense.

You're right, it's not the best term.

> Let's extend these base 
> concepts to SHARED_READ, SHARED_WRITE, EXCLUSIVE_READ, EXCLUSIVE_WRITE

EXCLUDE_WRITE might be better, since EXCLUSIVE_WRITE seems
to imply that one is writing oneself as well.

> EXCLUSIVE_READ - no others can read this buffer while this view is 
> open.

This is the one that I don't think is necessary. I don't
see a need to ever prevent others from *reading* if they
really want to and are prepared to deal with the
consequences. Most of the time the other party will be using
READ_LOCK which includes EXCLUDE_WRITE, so it will fail
if you're already holding a write lock.

So we just have

READ
WRITE
READ_LOCK = READ | EXCLUDE_WRITE
WRITE_LOCK = WRITE | EXCLUDE_WRITE
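
As a sketch of how these four modes interact, here they are as Python
flag values with a compatibility check. The class name, the numeric
values, and the can_grant helper are assumptions for illustration; the
real interface under discussion is a set of C-level PyBUF_* flags whose
final form is not settled in this thread.

```python
from enum import IntFlag


class Access(IntFlag):
    READ = 1
    WRITE = 2
    EXCLUDE_WRITE = 4
    READ_LOCK = READ | EXCLUDE_WRITE
    WRITE_LOCK = WRITE | EXCLUDE_WRITE


def can_grant(request, outstanding):
    """Can `request` be granted while the `outstanding` views are held?"""
    for held in outstanding:
        # A holder that excludes writers blocks any new writer.
        if held & Access.EXCLUDE_WRITE and request & Access.WRITE:
            return False
        # A request that excludes writers fails if someone is writing.
        if request & Access.EXCLUDE_WRITE and held & Access.WRITE:
            return False
    return True


held = [Access.WRITE_LOCK]
assert can_grant(Access.READ, held)           # plain reads always succeed
assert not can_grant(Access.READ_LOCK, held)  # fails against a write lock
```

Because there is no flag that excludes readers, a plain READ request is
never refused in this model.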

--
Greg

From greg.ewing at canterbury.ac.nz  Fri Sep 14 02:02:12 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Sep 2007 12:02:12 +1200
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <18151.61549.956117.769166@montanaro.dyndns.org>
References: <46E559E9.4090907@trueblade.com> <46E55B05.3090701@v.loewis.de>
	<46E55FD4.9000807@trueblade.com>
	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>
	<18150.1863.436464.41503@montanaro.dyndns.org>
	
	<18150.42783.278892.121765@montanaro.dyndns.org>
	
	<18151.61549.956117.769166@montanaro.dyndns.org>
Message-ID: <46E9CF84.7060308@canterbury.ac.nz>

skip at pobox.com wrote:
> I was just thinking about the folks at places like FermiLab and CERN. ;-)

Those guys probably need picoseconds...

--
Greg

From foom at fuhm.net  Fri Sep 14 05:41:12 2007
From: foom at fuhm.net (James Y Knight)
Date: Thu, 13 Sep 2007 23:41:12 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189700532.22693.40.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
Message-ID: <28CDCC5D-E62C-4C9F-86FE-2DC31C6834B0@fuhm.net>

On Sep 13, 2007, at 12:22 PM, Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?

Here's a suggestion I made on the SBCL dev list a while back, in  
response to the same issues. I am responding to myself here, where my  
first suggestion was to keep all the environmental gunk in
byte-arrays rather than strings. That is still a very nice and simple
possibility.

My second inclination was to use a variant of utf8 which can handle  
all bytestrings, instead of utf8 itself: utf-8b. This obviously works  
best when the system encoding is actually utf8.

> On Aug 2, 2007, at 4:55 PM, James Y Knight wrote:
>
>> Yeah -- it's pretty clear the environment isn't _actually_ in the
>> default encoding. It's just binary junk which often but not always
>> contains some text encoded in some arbitrary superset of ASCII. Just
>> like command line arguments (and filenames on linux).
>>
>> The hard part is that users expect command line arguments, filenames,
>> and environment values to be strings (because they normally do
>> contain text-like things), when strictly they cannot be because there
>> is no reliable encoding.
>>
>
> A good alternative to this is for SBCL to use the UTF8b encoding to  
> decode unix environment gunk (filenames, env vars, command line  
> args) which are *probably* in utf8, but might not be. utf8b has the  
> nice property that any arbitrary bytestring can be decoded into  
> unicode, and then round-tripped back to the same bytes. Valid utf8  
> sequences turn into the same unicode characters as with the utf8  
> codec. Invalid utf8 sequences turn into invalid surrogate pair  
> sequences in the unicode string.
>
> Thus, SBCL can return strings, and never throw an error. If you  
> actually wanted the random binary, you can losslessly convert the  
> unicode string back to binary. Win win.
>
> Some references:
> Original mail:
> http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html
>
> Blog entry:
> http://bsittler.livejournal.com/10381.html
>
> Python implementation: http://hyperreal.org/~est/libutf8b/
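
The round-trip property can be demonstrated with the "surrogateescape"
error handler that later Python 3 releases provide (PEP 383), which
follows the same idea: each undecodable byte 0xNN becomes the lone
surrogate U+DCNN. This is not the libutf8b implementation linked above,
and the two helper names are hypothetical.

```python
def decode_os_bytes(raw):
    """Decode as UTF-8; invalid bytes become lone surrogates U+DC80..U+DCFF."""
    return raw.decode("utf-8", "surrogateescape")


def encode_os_text(text):
    """Inverse of decode_os_bytes: lone surrogates turn back into raw bytes."""
    return text.encode("utf-8", "surrogateescape")


assert decode_os_bytes(b"caf\xc3\xa9") == "caf\u00e9"  # valid UTF-8 unchanged
assert decode_os_bytes(b"caf\xe9") == "caf\udce9"      # stray Latin-1 byte
junk = bytes(range(256))
assert encode_os_text(decode_os_bytes(junk)) == junk
```

The resulting strings are not valid Unicode text, so they will fail a
strict UTF-8 encode; that is the price of never raising on input.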

James



From greg.ewing at canterbury.ac.nz  Fri Sep 14 06:00:56 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Sep 2007 16:00:56 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46EA0778.3000502@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> What should
> happen internally is that all undecodable characters (which PUA
> characters are by definition for standard codecs) are mapped to unused
> codepoints in the PUA, chosen by Python.

You mean chosen dynamically? What happens if these PUA
characters get encoded some other way, written out, and
read back into another session? The information mapping
them back to their original meanings would no longer
be correct.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Sep 14 06:28:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Sep 2007 16:28:39 +1200
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
 immutable support
In-Reply-To: <46E98F25.5010404@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	<46E98F25.5010404@enthought.com>
Message-ID: <46EA0DF7.2090706@canterbury.ac.nz>

Travis E. Oliphant wrote:
> I would want to encourage people not to use the LOCK_FOR_READ unless 
> there is an important benefit or need to use it.

If you mean that LOCK_FOR_READ would unilaterally deny
anyone else read access, my proposal avoids this by not
having such a mode at all. So you can always get read
access if you really want it.

But I expect that most of the time you'll at least want
to make sure nobody is writing while you're trying to
read. In my terminology you spell that READ | EXCLUDE_WRITE.
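
A minimal Python sketch of how the READ | EXCLUDE_WRITE spelling might combine. The BufferAccess class and the conflicts() helper are hypothetical names invented for this illustration; they are not part of the buffer API proposal or of CPython.

```python
from enum import IntFlag


class BufferAccess(IntFlag):
    """Hypothetical access flags; names follow the terminology above."""
    READ = 1
    WRITE = 2
    EXCLUDE_WRITE = 4


def conflicts(held: BufferAccess, wanted: BufferAccess) -> bool:
    """Return True if `wanted` cannot be granted while `held` is outstanding."""
    # A writer is refused while someone excludes writes ...
    if wanted & BufferAccess.WRITE and held & BufferAccess.EXCLUDE_WRITE:
        return True
    # ... and a request to exclude writes is refused while a writer is active.
    if wanted & BufferAccess.EXCLUDE_WRITE and held & BufferAccess.WRITE:
        return True
    # Plain READ is never refused: there is no mode that excludes readers.
    return False


reader = BufferAccess.READ | BufferAccess.EXCLUDE_WRITE
```

Two such readers can coexist, while a WRITE request is refused for as long as either of them holds the buffer.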

> Let me think about adding a function for read-write locking that is 
> separate from getting a view (which implements memory-location 
> locking).

I'm not sure it needs to be a separate function, just
a clearly separated set of options in the flags.

Remember that clients are only supposed to be holding
a buffer for as short a time as possible. It's most
likely that the same read/write locking options are
going to apply for the whole duration of a buffer
operation, I think.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From stephen at xemacs.org  Fri Sep 14 06:52:45 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 14 Sep 2007 13:52:45 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA0778.3000502@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
Message-ID: <87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Stephen J. Turnbull wrote:

 > > What should happen internally is that all undecodable characters
 > > (which PUA characters are by definition for standard codecs) are
 > > mapped to unused codepoints in the PUA, chosen by Python.
 > 
 > You mean chosen dynamically?

Yes.

 > What happens if these PUA characters get encoded some other way,

You can't win that, because Unicode is the only encoding that attempts
to guarantee even the possibility of round-tripping.  The only thing
you can win is if it's the *same* character set (which might be used
by multiple encodings), and then we record the character set and the
code point.  That's the best we can do in theory.

The main problem with this scheme that I know of is that if you have a
Python string that contains such a code point, you'll need to somehow
include the information about the original encoding when pickling and
the like.
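
To make the dynamic scheme concrete, here is a rough sketch of a per-process table that hands out private-use code points and remembers where each one came from. PuaRegistry and its methods are hypothetical names for illustration only, and the Big5 and KOI8-R byte values are arbitrary sample data.

```python
class PuaRegistry:
    """Hypothetical per-process table of dynamically assigned PUA code points."""

    PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area

    def __init__(self):
        self._next = self.PUA_START
        self._forward = {}  # (charset, raw bytes) -> PUA character
        self._reverse = {}  # PUA character -> (charset, raw bytes)

    def assign(self, charset, raw):
        """Return the PUA character standing in for `raw` in `charset`."""
        key = (charset, bytes(raw))
        if key not in self._forward:
            if self._next > self.PUA_END:
                raise OverflowError("private use area exhausted")
            ch = chr(self._next)
            self._next += 1
            self._forward[key] = ch
            self._reverse[ch] = key
        return self._forward[key]

    def origin(self, ch):
        """Return (charset, raw bytes) for an assigned character, else None."""
        return self._reverse.get(ch)


# Two sessions, modelled here as two independent registries.
session_1 = PuaRegistry()
session_2 = PuaRegistry()
a = session_1.assign("big5", b"\xf9\xd6")
b = session_2.assign("koi8-r", b"\x80")
```

Both sessions hand out U+E000 first, but for different original characters. That is the cross-session problem raised earlier in the thread: the string alone is not enough, the table has to travel with it.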

From greg.ewing at canterbury.ac.nz  Fri Sep 14 07:08:04 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 14 Sep 2007 17:08:04 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46EA1734.6020103@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> You can't win that, because Unicode is the only encoding that attempts
> to guarantee even the possibility of round-tripping.

Rubbish -- I can do print [ord(c) for c in my_unicode_string]
and get perfect round-trippability if I want.
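
A Python 3 spelling of that one-liner, as a small sketch. The helper names and the sample string are made up for the example.

```python
def to_code_points(text):
    """List the integer code point of every character in `text`."""
    return [ord(ch) for ch in text]


def from_code_points(codes):
    """Rebuild the string from a list of integer code points."""
    return "".join(chr(n) for n in codes)


# Sample with a Latin-1 letter, a PUA character and a non-BMP character.
sample = "na\u00efve \ue650 \U00010400"
codes = to_code_points(sample)
restored = from_code_points(codes)
```

The list of integers is an ad hoc encoding: it round-trips, but only for a reader who knows the integers are Unicode code points.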

You can ask people to use pre-existing officially-sanctioned
encodings for their unicode data, but you can't force them to.

> The main problem with this scheme that I know of is that if you have a
> Python string that contains such a code point, you'll need to somehow
> include the information about the original encoding when pickling and
> the like.

That's exactly the sort of thing I'm talking about. It
would be surprising if pickling worked reliably for all
strings *except* ones that happened to come in as a
command line argument.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From stephen at xemacs.org  Fri Sep 14 08:02:56 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 14 Sep 2007 15:02:56 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189722696.30037.14.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
Message-ID: <18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>

"Marcin 'Qrczak' Kowalczyk"  writes:

 >> This means that a way of handling such points is very useful, and
 >> as long as there's enough PUA space, the approach I suggested can
 >> handle all of these various issues.

 > PUA already has a representation in UTF-8, so this is more incompatible
 > with UTF-8 than needed,

Hm?  It's not incompatible at all, and we're not interested in a
representation in UTF-8, but rather in UTF-16 (ie, the Python internal
encoding).  And it *is* needed, because these characters by assumption
are not present in Unicode at all.  (More precisely, they may be
present, but the tables we happen to have don't have mappings for
them.)

 > and hijacks characters

No, it doesn't.  As I responded to Greg Ewing, there is an issue about
things like pickling which use Python internal representations, but
not for anything which normally communicates with Python through
codecs.

 > which might be used (for example I'm using some PUA ranges for
 > encoding my script, they are being transported between processes,
 > and I would be upset if some language had mangled them to something
 > else).

Your escaping proposal *guarantees* mangling because it turns
characters into tuples of code units; it does not preserve character
set information.  It only works for you because you only have one
private script you care about, so you know what those code units mean.

If we don't have character set information, then of course that's the
best you can do, and my proposal will do something equivalent.  But if
we *do* have character set information, then my proposal is far more
powerful.  It allows us to process PUA characters as characters (ie,
put them in strings, slice and dice, merge and meld) with some hope of
recovering the character's semantics after many transformations of the
containing string.

In any case, it would not be hard to create an API allowing a Python
program to "reserve" a block in a PUA.  You still have the issue of
collision among multiple applications wanting the same block, of
course.  You may be able to guarantee that will never happen in your
application, but there are examples of OSes that assigned characters
in the PUA (Mac OS and Microsoft Windows both did so at one time or
another, although they may not be doing it currently, I haven't
checked).

 > While U+0000 is also representable in UTF-8, it cannot occur in
 > filenames, program arguments, environment variables etc., in many
 > contexts it was free.

In your experience, and mine, but is it in POSIX?  If not, I'd rather
not add the restriction, no matter how harmless it seems in practice.
(Of course practicality beats purity, but your proposal has many other
defects, too.)

I'm also very bothered by the fact that the interpretation of U+0000
differs in different contexts in your proposal.  As I'm sure you know,
the semantics of mixing codecs with different semantics (specifically,
the treatment of particular code units) is very hairy.  Once you get a
string into Python, you normally no longer know where it came from,
but now whether something came from the program argument or
environment or from a stdio stream changes the semantics of U+0000.
For me personally, that's a very good reason to object to your
proposal.

 > Of course my escaping scheme can preserve \0 too, by escaping it to
 > U+0000 U+0000, but here it's incompatible with the real UTF-8.

No.  It's *never* compatible with UTF-8 because it assigns a different
meaning to U+0000 from ASCII NUL.

Your scheme also suffers from the practical problem that strings
containing escapes are no longer arrays of characters.  One effect of
my scheme is to extend the "string is array" model to any application
that doesn't need to treat more non-BMP characters than there is space
available in the PUA.  Once implemented, it could easily be adapted to
handle characters in Planes 1-16, thus avoiding any use of surrogates
in the vast majority of cases.
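
For readers unfamiliar with the surrogate issue, a short sketch of how UTF-16 represents BMP and non-BMP characters. The helper name utf16_units is invented for the example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


bmp_units = utf16_units("\ue650")          # a BMP private-use character
astral_units = utf16_units("\U00010400")   # a Plane 1 character
```

The BMP character occupies one code unit, while the Plane 1 character needs a surrogate pair (0xD801, 0xDC00). In a UTF-16 representation that breaks the one-element-per-character array model.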


From qrczak at knm.org.pl  Fri Sep 14 09:49:33 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 14 Sep 2007 09:49:33 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1189756174.32337.30.camel@qrnik>

On 14-09-2007 (Fri) at 15:02 +0900, Stephen J. Turnbull wrote:

>  > PUA already has a representation in UTF-8, so this is more incompatible
>  > with UTF-8 than needed,
> 
> Hm?  It's not incompatible at all, and we're not interested in a
> representation in UTF-8, but rather in UTF-16

PUA is representable in both. When the command line contains a UTF-8
encoding of U+E650 (a PUA character), the script had better receive
a UTF-16 or UTF-32 encoding of U+E650 in the appropriate place,
otherwise we are corrupting user data.
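
A quick check of the claim, using only the standard codecs and the unicodedata module. The variable names are just for this example.

```python
import unicodedata

ch = "\ue650"                      # the PUA character from the text
utf8 = ch.encode("utf-8")          # three bytes: EE 99 90
utf16 = ch.encode("utf-16-be")     # one code unit: E6 50
utf32 = ch.encode("utf-32-be")     # 00 00 E6 50

# General category "Co" means "Other, private use".
category = unicodedata.category(ch)
```

All three encodings decode back to the same code point, so the standard codecs treat the PUA like any other assigned range.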

> (ie, the Python internal encoding).

(Python also uses UTF-32 alternatively to UTF-16.)

> And it *is* needed, because these characters by assumption
> are not present in Unicode at all.  (More precisely, they may be
> present, but the tables we happen to have don't have mappings for
> them.)

They are present! For UTF-8, UTF-16 and UTF-32 PUA is not special in
any way. It's just a block of characters which will never be officially
assigned by the Unicode Consortium, so they can be used privately among
parties who agree about their meaning.

> Your escaping proposal *guarantees* mangling because it turns
> characters into tuples of code units; it does not preserve character
> set information.

Huh? What do you mean by preserving character set information?

It preserves the byte string contents, which is all that is needed.
It has the same result as UTF-8 for all valid UTF-8 sequences not
containing NUL.

>  > While U+0000 is also representable in UTF-8, it cannot occur in
>  > filenames, program arguments, environment variables etc., in many
>  > contexts it was free.
> 
> In your experience, and mine, but is it in POSIX?

Yes. Both as specified and in reality (e.g. POSIX offers the second
parameter of main() of type char ** as the only way to receive command
line arguments, and they are NUL-terminated).
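
The same restriction is visible from Python today: a name containing NUL is rejected before it ever reaches the OS. This is a small sketch; the function name reaches_os and the file names are made up, and the exact exception type has varied between Python versions, so both are caught.

```python
def reaches_os(name):
    """Return True if `name` is handed to the OS, False if Python rejects it."""
    try:
        open(name).close()
    except (ValueError, TypeError):
        # Embedded NUL: cannot be expressed as a NUL-terminated C string.
        return False
    except OSError:
        # The OS saw the name and complained (e.g. file not found).
        return True
    return True


plain = reaches_os("no-such-file-for-this-sketch.txt")
with_nul = reaches_os("bad\0name.txt")
```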

> I'm also very bothered by the fact that the interpretation of U+0000
> differs in different contexts in your proposal.

Well, in any scheme which attempts to modify UTF-8 so that it accepts
arbitrary byte strings, *something* must be interpreted differently
than in real UTF-8.

> Once you get a
> string into Python, you normally no longer know where it came from,
> but now whether something came from the program argument or
> environment or from a stdio stream changes the semantics of U+0000.
> For me personally, that's a very good reason to object to your
> proposal.

This can be said about any modification of UTF-8.

Of course you can use such encoding on a standard stream too. In this
case only U+0000 cannot be used normally, and the resulting stream will
contain whatever bytes were present in filenames and other strings being
output to it.

>  > Of course my escaping scheme can preserve \0 too, by escaping it to
>  > U+0000 U+0000, but here it's incompatible with the real UTF-8.
> 
> No.  It's *never* compatible with UTF-8 because it assigns a different
> meaning to U+0000 from ASCII NUL.

It is compatible with UTF-8 except for U+0000, and a true U+0000 cannot
occur anyway in these contexts, so this incompatibility is mostly
harmless.

> Your scheme also suffers from the practical problem that strings
> containing escapes are no longer arrays of characters.

They are no less arrays of characters than strings containing combining
marks.
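
To illustrate the combining-mark point with the standard unicodedata module (variable names are just for the example):

```python
import unicodedata

decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

# Slicing by index can separate the base letter from its mark.
first_element = decomposed[:1]
```

The decomposed form has two elements for what a user sees as one character, so indexing a string already does not always yield whole user-perceived characters.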

[And now I'm gone for 4 days.]

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From qrczak at knm.org.pl  Fri Sep 14 10:20:47 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Fri, 14 Sep 2007 10:20:47 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <28CDCC5D-E62C-4C9F-86FE-2DC31C6834B0@fuhm.net>
References: <1189700532.22693.40.camel@qrnik>
	<28CDCC5D-E62C-4C9F-86FE-2DC31C6834B0@fuhm.net>
Message-ID: <1189758047.544.1.camel@qrnik>

On 13-09-2007 (Thu) at 23:41 -0400, James Y Knight wrote:

> Here's a suggestion I made on the SBCL dev list a while back, in  
> response to the same issues.

After a second thought, this (escaping undecodable UTF-8 bytes by
unpaired low surrogates) might be a good idea.

(I don't remember why I once rejected this.)
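
For reference, Python later gained an error handler along these lines: "surrogateescape" (PEP 383, Python 3.1 and later) maps each undecodable byte 0x80..0xFF to a lone low surrogate U+DC80..U+DCFF. The sketch below uses that existing handler; it is not necessarily identical in every detail to the SBCL suggestion, and the sample bytes are made up.

```python
# Latin-1 bytes for "café.txt": 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# Decoding keeps the bad byte as the lone low surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
```

Valid UTF-8 input is unaffected, since the handler is only consulted for bytes the codec cannot decode.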

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From stephen at xemacs.org  Fri Sep 14 10:56:24 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 14 Sep 2007 17:56:24 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA1734.6020103@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
Message-ID: <87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Stephen J. Turnbull wrote:
 > > You can't win that, because Unicode is the only encoding that attempts
 > > to guarantee even the possibility of round-tripping.
 > 
 > Rubbish -- I can do print [ord(c) for c in my_unicode_string]
 > and get perfect round-trippability if I want.

Speaking of rubbish.  You chose the context of round-tripping *across
encodings*, not me.  Please stick with your context.

 > You can ask people to use pre-existing officially-sanctioned
 > encodings for their unicode data, but you can't force them to.

A wide variety of encodings, some standard and some not, and not
necessarily with a known injection into Unicode, is precisely what I'm
trying to deal with.  None of the other proposals, except maybe
Martin's, do.  James Knight's proposal as it stands assumes UTF-8
Unicode, while Marcin Kowalczyk's just punts to treating everything
unknown as a sequence of code units AFAICS.

 > > The main problem with this scheme that I know of is that if you have a
 > > Python string that contains such a code point, you'll need to somehow
 > > include the information about the original encoding when pickling and
 > > the like.

I was merely admitting that getting it to work *efficiently* and
*backward-compatibly* for pickling will be tricky.  But it's trivial
to get it to work *reliably*.
 
 > That's exactly the sort of thing I'm talking about. It
 > would be surprising if pickling worked reliably for all
 > strings *except* ones that happened to come in as a
 > command line argument.

Um, no, it's not what you're talking about.  Pickling is not currently
reliable for strings that come in as command line arguments because
Python is not reliable.  That's precisely what we're trying to fix.
None of the proposals make things worse, since they only apply in
cases where the codec would throw an exception or incorrectly decode
the argument anyway.

Yes, you could improve reliability in this sense by storing those
strings as bytes, rather than trying to make better encoding guesses
and storing "debugging info" about undecodable input.  But surely
using bytes objects is a non-starter; users are going to expect that
command-line arguments are strings, not bytes, and ASCII-only users
will raise hell if you ask them to explicitly invoke codecs to
translate command-line arguments to strings so that they can be used.

From hagenf at CoLi.Uni-SB.DE  Fri Sep 14 11:15:00 2007
From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=)
Date: Fri, 14 Sep 2007 11:15:00 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189700532.22693.40.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
Message-ID: <46EA5114.9060200@coli.uni-saarland.de>

Is it too unreasonable to keep the byte strings we get from the OS as 
byte strings in Python (since we're not sure about their encoding) and 
offer functions for getting strings?

sys.argv could be of type bytes and sys.arguments (or whatever) could be 
a function taking an encoding parameter (which defaults to UTF-8) and 
returning strings.
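
A rough sketch of what such a function could look like. The name arguments() and its signature come from the suggestion above and are hypothetical; no such function exists in the sys module. To keep the sketch self-contained it takes the raw byte strings as a parameter instead of reading them from the interpreter.

```python
def arguments(raw_argv, encoding="utf-8"):
    """Hypothetical helper: decode raw command-line bytes with `encoding`.

    `raw_argv` is a list of bytes objects as received from the OS.
    Raises UnicodeDecodeError if an argument is not valid in `encoding`.
    """
    return [arg.decode(encoding) for arg in raw_argv]


# Sample data: the second argument is Latin-1, not UTF-8.
raw = [b"prog", b"caf\xe9"]
as_latin1 = arguments(raw, encoding="latin-1")
```

With the default UTF-8 the same input raises UnicodeDecodeError, which is the usability cost discussed in the replies. In current Python on Unix, the documented way to recover the raw bytes is [os.fsencode(arg) for arg in sys.argv].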

Of course that's backwards incompatible and I'm not sure if it's too 
late for something like this now.

- Hagen

From ncoghlan at gmail.com  Fri Sep 14 12:07:21 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 14 Sep 2007 20:07:21 +1000
Subject: [Python-3000] __format__ and datetime
In-Reply-To: <46E9CF84.7060308@canterbury.ac.nz>
References: <46E559E9.4090907@trueblade.com>
	<46E55B05.3090701@v.loewis.de>	<46E55FD4.9000807@trueblade.com>	<79990c6b0709100829t6aa18653i5f67b7848c778587@mail.gmail.com>	<18150.1863.436464.41503@montanaro.dyndns.org>		<18150.42783.278892.121765@montanaro.dyndns.org>		<18151.61549.956117.769166@montanaro.dyndns.org>
	<46E9CF84.7060308@canterbury.ac.nz>
Message-ID: <46EA5D59.6050103@gmail.com>

Greg Ewing wrote:
> skip at pobox.com wrote:
>> I was just thinking about the folks at places like FermiLab and CERN. ;-)
> 
> Those guys probably need picoseconds...

With the suggested %f format character and the mention of Fermilab and 
CERN, I started thinking about femtoseconds :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From barry at python.org  Fri Sep 14 13:30:10 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 14 Sep 2007 07:30:10 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA1734.6020103@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
Message-ID: <0618908E-E4A5-4062-BC92-1A0B83C69E7B@python.org>

On Sep 14, 2007, at 1:08 AM, Greg Ewing wrote:

> Stephen J. Turnbull wrote:
>> You can't win that, because Unicode is the only encoding that
>> attempts to guarantee even the possibility of round-tripping.
>
> Rubbish -- I can do print [ord(c) for c in my_unicode_string]
> and get perfect round-trippability if I want.

I think my_unicode_string.encode('raw-unicode-escape') is equivalent.

-Barry
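
A short sketch of the raw-unicode-escape suggestion above, with one caveat worth knowing. The helper name roundtrip and the sample strings are made up for the example.

```python
def roundtrip(text):
    """Encode with raw-unicode-escape and decode the result again."""
    return text.encode("raw-unicode-escape").decode("raw-unicode-escape")


# Latin-1 and PUA characters survive the trip.
ordinary = "caf\u00e9 \ue650"
ordinary_back = roundtrip(ordinary)

# Six literal characters: backslash, u, 0, 0, 4, 1.
# The codec does not escape backslashes, so this text is read back
# as an escape sequence instead of as itself.
tricky = "\\u0041"
tricky_back = roundtrip(tricky)
```

So the codec is close to the list-of-ordinals approach, but it is not a strict equivalent for text that already contains a backslash followed by "u".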

From barry at python.org  Fri Sep 14 13:34:36 2007
From: barry at python.org (Barry Warsaw)
Date: Fri, 14 Sep 2007 07:34:36 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA5114.9060200@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
Message-ID: <200CD272-6015-4FE7-A004-5939E59316BE@python.org>

On Sep 14, 2007, at 5:15 AM, Hagen Fürstenau wrote:

> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?
>
> sys.argv could be of type bytes and sys.arguments (or whatever)
> could be a function taking an encoding parameter (which defaults to
> UTF-8) and returning strings.
>
> Of course that's backwards incompatible and I'm not sure if it's too
> late for something like this now.

It might be reasonable and even necessary, but I suspect usability
will suffer.

-Barry

From martin at v.loewis.de  Fri Sep 14 14:09:58 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 14 Sep 2007 14:09:58 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA5114.9060200@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
Message-ID: <46EA7A16.5010902@v.loewis.de>

> Is it too unreasonable to keep the byte strings we get from the OS as 
> byte strings in Python (since we're not sure about their encoding) and 
> offer functions for getting strings?

I think people will complain if command line arguments aren't strings,
and they will complain even more so if file names are not strings.

> Of course that's backwards incompatible and I'm not sure if it's too 
> late for something like this now.

That is not a concern. However, it is fundamentally the wrong thing to
do. Most people rightfully view command line arguments and file names
as strings, as they use the keyboard to enter them, and the computer
uses letters from a font to display them. They are not bytes
conceptually - they are strings in a potentially unknown encoding.

Regards,
Martin

From hagenf at CoLi.Uni-SB.DE  Fri Sep 14 14:20:19 2007
From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=)
Date: Fri, 14 Sep 2007 14:20:19 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA7A16.5010902@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>
	<46EA7A16.5010902@v.loewis.de>
Message-ID: <46EA7C83.6040507@coli.uni-saarland.de>

> That is not a concern. However, it is fundamentally the wrong thing to
> do. Most people rightfully view command line arguments and file names
> as strings, as they use the keyboard to enter them, and the computer
> uses letters from a font to display them. They are not bytes
> conceptually - they are strings in a potentially unknown encoding.

Are you sure that "strings in an unknown encoding" are conceptually 
strings and not rather bytes?

And what if we skillfully conserve unknown bytes in a private use or 
surrogate area and the application author actually knows the encoding 
and wants correctly decoded strings?

- Hagen


-- 
http://www.coli.uni-saarland.de/~hagenf/
PGP fingerprint: C8EF 458E 5531 14AA 42BC AA1C 36AE D91D BA94 7D32

From martin at v.loewis.de  Fri Sep 14 14:32:59 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 14 Sep 2007 14:32:59 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA7C83.6040507@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>
	<46EA7A16.5010902@v.loewis.de>
	<46EA7C83.6040507@coli.uni-saarland.de>
Message-ID: <46EA7F7B.2060609@v.loewis.de>

> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?

For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.

> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?

They can easily roundtrip that then to the encoding that it should have:

good_string = sys.argv[bad_string_index].\
   encode(sys.argv_encoding, "pua-replace").decode(real_encoding)

However, we are talking about borderline cases here - in most cases,
Python will just do the right thing. Special cases aren't special enough
to break the rules.
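
Note that sys.argv_encoding and the "pua-replace" error handler in the snippet above are proposals from this thread, not existing names. As a stand-in, the same round trip can be sketched with the "surrogateescape" handler that Python 3.1 and later actually provide. The function name redecode and the sample bytes are made up.

```python
def redecode(text, assumed_encoding, real_encoding):
    """Undo a wrong decoding guess and decode with the right encoding.

    `text` must have been decoded from bytes with `assumed_encoding`
    and errors="surrogateescape", so the original bytes are recoverable.
    """
    raw = text.encode(assumed_encoding, errors="surrogateescape")
    return raw.decode(real_encoding)


# An argument that was really Latin-1 but got decoded as UTF-8.
bad_string = b"caf\xe9".decode("utf-8", errors="surrogateescape")
good_string = redecode(bad_string, "utf-8", "latin-1")
```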

Regards,
Martin

From hagenf at CoLi.Uni-SB.DE  Fri Sep 14 14:46:34 2007
From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=)
Date: Fri, 14 Sep 2007 14:46:34 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA7F7B.2060609@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>
	<46EA7A16.5010902@v.loewis.de>
	<46EA7C83.6040507@coli.uni-saarland.de>
	<46EA7F7B.2060609@v.loewis.de>
Message-ID: <46EA82AA.3070200@coli.uni-saarland.de>

> They can easily roundtrip that then to the encoding that it should have:
> 
> good_string = sys.argv[bad_string_index].\
>    encode(sys.argv_encoding, "pua-replace").decode(real_encoding)

To me this doesn't look easier than sys.arguments() in the standard case 
and sys.arguments(encoding="whatever") if you know the special encoding.

Just my two cents...

- Hagen

From jimjjewett at gmail.com  Fri Sep 14 15:39:31 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 14 Sep 2007 09:39:31 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA5114.9060200@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
Message-ID: 

On 9/14/07, Hagen Fürstenau wrote:
> Is it too unreasonable to keep the byte strings we get from the OS as
> byte strings in Python (since we're not sure about their encoding) and
> offer functions for getting strings?

> sys.argv could be of type bytes and sys.arguments (or whatever) could be
> a function taking an encoding parameter (which defaults to UTF-8) and
> returning strings.

> Of course that's backwards incompatible and I'm not sure if it's too
> late for something like this now.

For that reason alone, it makes sense to do it the other way.
sys.argv is the text string, and sys.arguments is a bytes object which
can be decoded if you know the encoding.  sys.argv ==
sys.arguments(best_guess)

-jJ

From nicko at nicko.org  Fri Sep 14 19:11:08 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Fri, 14 Sep 2007 18:11:08 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709111506.32823.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
Message-ID: 

On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> Is there any chance that an ordered dict will be added to Python 3's
> library?

It would make sense, since one of the primary justifications for the  
new metaclass system (PEP 3115) is to allow the metaclass to provide  
order-preserving dictionaries to record the order in which members  
are defined.
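
A minimal sketch of the PEP 3115 use case, using the __prepare__ hook from that PEP and collections.OrderedDict as it later appeared in the standard library. The class names OrderedMeta and Record and the _member_order attribute are invented for the example.

```python
from collections import OrderedDict


class OrderedMeta(type):
    """Metaclass that records the order in which class members are defined."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class body is executed with this mapping as its namespace.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep user-defined names only; skip __module__, __qualname__, etc.
        cls._member_order = [k for k in namespace if not k.startswith("__")]
        return cls


class Record(metaclass=OrderedMeta):
    zeta = 1
    alpha = 2

    def method(self):
        pass
```

Record._member_order lists the names in definition order, which is what an insertion-ordered mapping gives and a key-sorted mapping would not.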

> I think other people must find such things useful. There are three
> implementations on the Python Cookbook site, and one on PyPI, all in
> pure Python (plus I have my own implementation, also pure Python).

Is there much commonality between the interfaces for these?  I'm sure  
there are various different opinions as to the exact nature of the  
API, particularly around any facilities for re-ordering, slicing etc.

	Cheers,
		Nicko


From mark at qtrac.eu  Fri Sep 14 19:36:11 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Fri, 14 Sep 2007 18:36:11 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	
Message-ID: <200709141836.11481.mark@qtrac.eu>

On 2007-09-14, Nicko van Someren wrote:
> On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > Is there any chance that an ordered dict will be added to Python 3's
> > library?
>
> It would make sense, since one of the primary justifications for the
> new metaclass system (PEP 3115) is to allow the metaclass to provide
> order-preserving dictionaries to record the order in which members
> are defined.
>
> > I think other people must find such things useful. There are three
> > implementations on the Python Cookbook site, and one on PyPI, all in
> > pure Python (plus I have my own implementation, also pure Python).
>
> Is there much commonality between the interfaces for these?  I'm sure
> there are various different opinions as to the exact nature of the
> API, particularly around any facilities for re-ordering, slicing etc.
> 	Cheers,
> 		Nicko

After posting I realised that actually this isn't P3K-specific. I'd hope
to see the collections module extended with more data structures in
general.

I put a similar post on the main python list but with no consensus so
far...

I put forward an API which is the same as dict (but any list or iterator
returned "just happens" to work in key order) plus a few extra methods
to exploit the ordering. I don't know how to refer to a usenet thread
but this should get there:
http://groups.google.co.uk/group/comp.lang.python/browse_frm/thread/b16c34f8dd09a8a0/62a9cd8f8b73cdac#62a9cd8f8b73cdac

I also put an example implementation on PyPI since a respondent advised
that I do that:
http://pypi.python.org/pypi?:action=display&name=ordereddict&version=1.0.0

I certainly hope that Python will have one or more ordered data
structures in the collections module since I think they are often
useful. I don't expect mine to be used, I am just trying to get the
_idea_ accepted that an ordered data structure is useful and worth
putting in the standard library. I hope for example, that an AVL tree
and/or a B*tree and/or a skiplist will be implemented.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From rhamph at gmail.com  Fri Sep 14 19:50:34 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 14 Sep 2007 11:50:34 -0600
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709141836.11481.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	
	<200709141836.11481.mark@qtrac.eu>
Message-ID: 

On 9/14/07, Mark Summerfield  wrote:
> On 2007-09-14, Nicko van Someren wrote:
> > On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > > Is there any chance that an ordered dict will be added to Python 3's
> > > library?
> >
> > It would make sense, since one of the primary justifications for the
> > new metaclass system (PEP 3115) is to allow the metaclass to provide
> > order-preserving dictionaries to record the order in which members
> > are defined.
> >
> > > I think other people must find such things useful. There are three
> > > implementations on the Python Cookbook site, and one on PyPI, all in
> > > pure Python (plus I have my own implementation, also pure Python).
> >
> > Is there much commonality between the interfaces for these?  I'm sure
> > there are various different opinions as to the exact nature of the
> > API, particularly around any facilities for re-ordering, slicing etc.
> >       Cheers,
> >               Nicko
>
> After posting I realised that actually this isn't P3K-specific. I'd hope
> to see the collections module extended with more data structures in
> general.
>
> I put a similar post on the main python list but with no consensus so
> far...
>
> I put forward an API which is the same as dict (but any list or iterator
> returned "just happens" to work in key order) plus a few extra methods
> to exploit the ordering. I don't know how to refer to a usenet thread
> but this should get there:

That's a sorted dict.  PEP 3115 wants an insertion-ordered dict.
You're not the first to confuse them. ;)
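
The difference in a few lines, using collections.OrderedDict (which did eventually land in the standard library) for the insertion-ordered case and plain sorted() for the key-sorted view. The sample keys are made up.

```python
from collections import OrderedDict

d = OrderedDict()
d["pear"] = 1
d["apple"] = 2
d["fig"] = 3

insertion_order = list(d)   # order in which the keys were added
key_order = sorted(d)       # what a sorted dict would iterate in
```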

-- 
Adam Olsen, aka Rhamphoryncus

From mark at qtrac.eu  Fri Sep 14 21:52:23 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Fri, 14 Sep 2007 20:52:23 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709141836.11481.mark@qtrac.eu>
	
Message-ID: <200709142052.23583.mark@qtrac.eu>

On 2007-09-14, Adam Olsen wrote:
> On 9/14/07, Mark Summerfield  wrote:
> > On 2007-09-14, Nicko van Someren wrote:
> > > On 11 Sep 2007, at 15:06, Mark Summerfield wrote:
> > > > Is there any chance that an ordered dict will be added to Python 3's
> > > > library?
> > >
> > > It would make sense, since one of the primary justifications for the
> > > new metaclass system (PEP 3115) is to allow the metaclass to provide
> > > order-preserving dictionaries to record the order in which members
> > > are defined.
> > >
> > > > I think other people must find such things useful. There are three
> > > > implementations on the Python Cookbook site, and one on PyPI, all in
> > > > pure Python (plus I have my own implementation, also pure Python).
> > >
> > > Is there much commonality between the interfaces for these?  I'm sure
> > > there are various different opinions as to the exact nature of the
> > > API, particularly around any facilities for re-ordering, slicing etc.
> > >       Cheers,
> > >               Nicko
> >
> > After posting I realised that actually this isn't P3K-specific. I'd hope
> > to see the collections module extended with more data structures in
> > general.
> >
> > I put a similar post on the main python list but with no consensus so
> > far...
> >
> > I put forward an API which is the same as dict (but any list or iterator
> > returned "just happens" to work in key order) plus a few extra methods
> > to exploit the ordering. I don't know how to refer to a usenet thread
> > but this should get there:
>
> That's a sorted dict.  PEP 3115 wants an insertion-ordered dict.
> You're not the first to confuse them. ;)

Hmmm, I'd not come across that terminology distinction before.
I guess I'll have to rename mine then.

BTW, in my previous post I said "I hope for example, that an AVL tree and/or a
B*tree and/or a skiplist will be implemented." Actually, I don't care
what data structures are used, I just think that Python lacks ordered
data structures, specifically: sorteddict and sortedset. (Personally
I've never needed an insertion-ordered dict.)


-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From larry at hastings.org  Fri Sep 14 22:34:01 2007
From: larry at hastings.org (Larry Hastings)
Date: Fri, 14 Sep 2007 13:34:01 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709142052.23583.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>	<200709141836.11481.mark@qtrac.eu>	
	<200709142052.23583.mark@qtrac.eu>
Message-ID: <46EAF039.7020208@hastings.org>

Mark Summerfield wrote:
> (Personally I've never needed an insertion-ordered dict.)

Then you've never programmed in PHP I take it.  PHP's one-size-fits-all 
data structure is an insertion-ordered dict; PHP programmers use it 
everywhere a Python programmer might use a dict /or/ a list.  I've had 
one or two people tell me having both behaviors in one object is "really 
useful every-so-often", though they didn't go into any more detail.  
Can't really see the advantage, myself.


/larry/

From martin at v.loewis.de  Fri Sep 14 23:01:37 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 14 Sep 2007 23:01:37 +0200
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709142052.23583.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>	<200709141836.11481.mark@qtrac.eu>	
	<200709142052.23583.mark@qtrac.eu>
Message-ID: <46EAF6B1.8000705@v.loewis.de>

>> That's a sorted dict.  PEP 3115 wants an insertion-ordered dict.
>> You're not the first to confuse them. ;)
> 
> Hmmm, I'd not come across that terminology distinction before.
> I guess I'll have to rename mine then.

I think "insertion-ordered" is over-specification, just to make
the distinction clear. Most of the time, people mean "ordered
dictionary" to say "keys are in a fixed order" - typically insertion
order. When they want to express that the keys ought to be
sorted, they call it "sorted dictionary".
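
The two meanings can be shown side by side in a small sketch. It assumes a Python version in which the built-in `dict` preserves insertion order (guaranteed from 3.7 on); the sample data is invented.

```python
entries = [("pear", 1), ("apple", 2), ("fig", 3)]

# "Ordered" in the insertion sense: keys come back as they went in.
ordered = dict(entries)
by_insertion = list(ordered)

# "Sorted": keys come back according to their comparison order.
by_key = sorted(ordered)
```

`by_insertion` starts with `"pear"`, while `by_key` starts with `"apple"`; the same data, two different orderings.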

Regards,
Martin


From greg.ewing at canterbury.ac.nz  Sat Sep 15 00:40:00 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 15 Sep 2007 10:40:00 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46EB0DC0.3050906@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> You chose the context of round-tripping *across
> encodings*, not me.  Please stick with your context.

Maybe we have different ideas of what the problem is.
I thought the problem is to take arbitrary byte sequences
coming in as command-line args and represent them as
unicode strings in such a way that they can be losslessly
converted back into the same byte strings.

I was just pointing out that if you do this in a way
that involves some sort of dynamically generated mapping,
then it won't work if the round trip spans more than
one Python session -- and that there are any number of
ways that the data could get from one session to
another, many of them not involving anything that one
would recognise as a unicode encoding in the conventional
sense.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Sep 15 00:44:18 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 15 Sep 2007 10:44:18 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA5114.9060200@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
Message-ID: <46EB0EC2.4030208@canterbury.ac.nz>

Hagen Fürstenau wrote:
> sys.argv could be of type bytes and sys.arguments (or whatever) could be 
> a function taking an encoding parameter (which defaults to UTF-8) and 
> returning strings.
> 
> Of course that's backwards incompatible and I'm not sure if it's too 
> late for something like this now.

It would be pretty disruptive to ask everyone to change
their habit of thinking of sys.argv as a list of strings.

I would suggest doing it the other way around -- have
sys.argv be an object that automatically converts to
unicode on access, and something else, such as
sys.argbytes, for getting the raw bytes if that fails.
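
A rough sketch of what such an object might look like follows. `LazyArgv` and its `argbytes` attribute are hypothetical names, not an existing API, and only integer indexing is handled to keep the example short.

```python
class LazyArgv:
    """Hypothetical list-like view that decodes raw argument bytes on access."""

    def __init__(self, argbytes, encoding="utf-8"):
        self.argbytes = list(argbytes)  # the untouched raw bytes
        self.encoding = encoding

    def __len__(self):
        return len(self.argbytes)

    def __getitem__(self, index):
        # Decoding happens here, so an undecodable argument only raises
        # when the program actually looks at it.
        return self.argbytes[index].decode(self.encoding)


argv = LazyArgv([b"prog", b"caf\xc3\xa9", b"\xff\xfe"])
```

With this, `argv[1]` yields a decoded string, `argv[2]` raises `UnicodeDecodeError` at the point of use, and `argv.argbytes[2]` still gives the caller the raw bytes to fall back on.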

--
Greg

From guido at python.org  Sat Sep 15 01:22:25 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 14 Sep 2007 16:22:25 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB0EC2.4030208@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
Message-ID: 

On 9/14/07, Greg Ewing  wrote:
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.

Indeed.

> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode on access, and something else, such as
> sys.argbytes, for getting the raw bytes if that fails.

Great idea, but sys.argv doesn't need to be magic for this approach to work.

Of course os.environ would have to be treated similarly.

And things like the strings returned by the _locale module (I found at
least one test failing on Red Hat platforms because the thousands
separator is set to \xa0 in the Estonian locale).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Sat Sep 15 01:21:26 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 15 Sep 2007 11:21:26 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	
Message-ID: <46EB1776.6030006@canterbury.ac.nz>

Guido van Rossum wrote:
> Great idea, but sys.argv doesn't need to be magic for this approach to work.

Are you sure? I thought part of the problem was that
if an argv entry couldn't be decoded, you got an error
too soon to do anything about it. Making sys.argv lazy
would avoid that.

--
Greg

From guido at python.org  Sat Sep 15 02:07:52 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 14 Sep 2007 17:07:52 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB1776.6030006@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	
	<46EB1776.6030006@canterbury.ac.nz>
Message-ID: 

On 9/14/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
> > Great idea, but sys.argv doesn't need to be magic for this approach to work.
>
> Are you sure? I thought part of the problem was that
> if an argv entry couldn't be decoded, you got an error
> too soon to do anything about it. Making sys.argv lazy
> would avoid that.

I see. But you could also insert '?'s into the argv string.
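
The replacement approach can be tried with the standard `replace` error handler, as in the sketch below. Note that on decoding Python substitutes U+FFFD rather than a literal '?', and that the result is lossy; the file name is invented.

```python
raw = b"report-\xff.txt"  # 0xFF can never appear in valid UTF-8

# Undecodable bytes become U+FFFD instead of raising an exception.
lossy = raw.decode("utf-8", errors="replace")

# Encoding again does not give the original bytes back.
round_tripped = lossy.encode("utf-8")
```

The program never sees an exception, but `round_tripped` differs from `raw`, so a file named by the original bytes could no longer be opened from the decoded string.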

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Sat Sep 15 02:13:31 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 15 Sep 2007 09:13:31 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189756174.32337.30.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
Message-ID: <18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>

"Marcin 'Qrczak' Kowalczyk"  writes:

 >> And it *is* needed, because these characters by assumption
 >> are not present in Unicode at all.  (More precisely, they may be
 >> present, but the tables we happen to have don't have mappings for
 >> them.)

 > They are present! For UTF-8, UTF-16 and UTF-32 PUA is not special in
 > any way.

The characters I am referring to are the unstandardized so-called
"corporate characters" that are very common in Japanese text.  My
solution handles your problem, slightly less efficiently than yours
does, perhaps, but in a Unicode-conforming way.  Yours doesn't help
with mine at all.

 > It preserves the byte string contents, which is all that is needed.

That is not true in any environment where the encoding is not known
with certainty.

 > It has the same result as UTF-8 for all valid UTF-8 sequences not
 > containing NUL.

Sorry, I'm talking about real Japanese and other situations where
there is no corresponding Unicode character point, and a solution
which not only handles that but also handles corrupt UTF-8.  Valid
UTF-8 is not a problem, it's the solution.  But a robust language
should handle text that is not valid UTF-8 in a way that allows the
programmer or user to implement error correction at a finer-grained
level than dumping core.

 >> I'm also very bothered by the fact that the interpretation of U+0000
 >> differs in different contexts in your proposal.

 > Well, in any scheme which attempts to modify UTF-8 by accepting
 > arbitrary byte strings, *something* must be interpreted
 > differently than in real UTF-8.

Wrong.  In my scheme everything ends up in the PUA, on which real
UTF-8 imposes no interpretation by definition.

I haven't gone back to check yet, but it's possible that a "real UTF-8
conforming process" is required to stop processing and issue an error
or something like that in the cases we're trying to handle.  But your
extension and James Knight's extension both fall afoul of any such
provision, too.

 >> Once you get a string into Python, you normally no longer know
 >> where it came from, but now whether something came from the
 >> program argument or environment or from a stdio stream changes the
 >> semantics of U+0000.  For me personally, that's a very good reason
 >> to object to your proposal.

 > This can be said about any modification of UTF-8.

It's not true of James Knight's proposal, because the same
modification can be used for both program arguments and file streams.

And my proposal doesn't modify UTF-8 at all; it takes advantage of the
farsighted wisdom of the designers of Unicode and puts all the
non-standard "characters", including broken encoding, in the PUA.
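
The exact scheme is not spelled out in this message, so the following is only a simplified illustration of the general idea of parking undecodable bytes in the Private Use Area. The handler name `pua-escape`, the choice of U+E000 as base, and both helper functions are assumptions made for this sketch. It also assumes the input contains no genuine characters in U+E000..U+E0FF, which would otherwise be confused with escaped bytes.

```python
import codecs

PUA_BASE = 0xE000  # first code point of the BMP Private Use Area


def _bytes_to_pua(exc):
    """Decode error handler: map each bad byte to PUA_BASE + byte value."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua-escape", _bytes_to_pua)


def decode_lossless(raw, encoding="utf-8"):
    return raw.decode(encoding, errors="pua-escape")


def encode_lossless(text, encoding="utf-8"):
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp < PUA_BASE + 256:
            out.append(cp - PUA_BASE)   # restore the original raw byte
        else:
            out += ch.encode(encoding)  # ordinary character
    return bytes(out)


raw = b"abc\xff\xfedef"
text = decode_lossless(raw)
```

Every element of `text` is an ordinary code point, so slicing and concatenation behave as usual, and `encode_lossless(text)` returns the original bytes.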

 > Of course you can use such encoding on a standard stream too. In
 > this case only U+0000 cannot be used normally, and the resulting
 > stream will contain whatever bytes were present in filenames and
 > other strings being output to it.

A programmer can use it, but his users will curse his name every time
a binary stream gets corrupted because they forgot that little detail.

 >>  > Of course my escaping scheme can preserve \0 too, by escaping it to
 >>  > U+0000 U+0000, but here it's incompatible with the real UTF-8.

 >> No.  It's *never* compatible with UTF-8 because it assigns a different
 >> meaning to U+0000 from ASCII NUL.

 > It is compatible with UTF-8 except for U+0000, and a true U+0000 cannot
 > occur anyway in these contexts, so this incompatibility is mostly
 > harmless.

Forcing users to use codecs of subtly different semantics simply because
they're getting I/O from different sources is a substantial harm.

 >> Your scheme also suffers from the practical problem that strings
 >> containing escapes are no longer arrays of characters.

 > They are no less arrays of characters than strings containing combining
 > marks.

Those marks are characters in their own right.  Your escapes are not,
nor are surrogates.

It's true that users will be surprised by the count of characters in
many cases with unnormalized Unicode, but these can be reduced to a
very few by normalizing to NFC.
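
A short sketch of that effect, using the standard `unicodedata` module; the sample strings are chosen for illustration.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

# Some sequences have no precomposed form, so NFC leaves them as they are.
no_precomposed = unicodedata.normalize("NFC", "q\u0301")
```

`decomposed` has length 2 while `composed` has length 1; `no_precomposed` keeps length 2, which is one of the few remaining surprises.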


From stephen at xemacs.org  Sat Sep 15 02:25:02 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 15 Sep 2007 09:25:02 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA7C83.6040507@coli.uni-saarland.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EA7A16.5010902@v.loewis.de>
	<46EA7C83.6040507@coli.uni-saarland.de>
Message-ID: <87y7f83hcx.fsf@uwakimon.sk.tsukuba.ac.jp>

Hagen Fürstenau writes:

 > And what if we skillfully conserve unknown bytes in a private use or 
 > surrogate area and the application author actually knows the encoding 
 > and wants correctly decoded strings?

This is what my proposal would do, but my proposal would return
a string, not bytes.


From stephen at xemacs.org  Sat Sep 15 05:44:05 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 15 Sep 2007 12:44:05 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB0DC0.3050906@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
Message-ID: <87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Stephen J. Turnbull wrote:
 > > You chose the context of round-tripping *across
 > > encodings*, not me.  Please stick with your context.
 > 
 > Maybe we have different ideas of what the problem is.  I thought
 > the problem is to take arbitrary byte sequences coming in as
 > command-line args and represent them as unicode strings in such a
 > way that they can be losslessly converted back into the same byte
 > strings.

That's a straw man if taken literally.  Just use the ISO-8859-1 codec,
and you're done.
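
The literal round trip is easy to check, since ISO-8859-1 maps each of the 256 byte values to the code point with the same number. The sketch below also shows why this alone is not a useful answer: the decoded text carries no meaning for data that was really in another encoding.

```python
every_byte = bytes(range(256))

# Each byte maps to exactly one code point, so nothing can fail.
as_text = every_byte.decode("iso-8859-1")
restored = as_text.encode("iso-8859-1")

# The UTF-8 encoding of HIRAGANA LETTER A, misread as ISO-8859-1.
mojibake = "\u3042".encode("utf-8").decode("iso-8859-1")
```

`restored` equals `every_byte`, but `mojibake` is three unrelated characters instead of one Japanese letter.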

If you add the condition that the encoding is known with certainty and
the source string is well-formed for that encoding, then you need to
decode to meaningful Unicode.  For that problem, James Knight's
solution is good if it makes sense to assume that the sequence of
bytes is encoded in UTF-8 Unicode.  However, I don't think that is a
reasonable assumption for a language that is heavily used in Europe
and Japan, and for processing email.  These are contexts where UTF-8
is making steady progress, but legacy encodings are still quite
important.

However, the general problem is to decode a sequence of bytes into a
Unicode string and be able to recover the original sequence if you
decide you got it wrong, even after you've sliced and concatenated the
string with other strings, with no guarantee that all the source
encodings were the same.

 > I was just pointing out that if you do this in a way that involves
 > some sort of dynamically generated mapping, then it won't work if
 > the round trip spans more than one Python session -- and that there
 > are any number of ways that the data could get from one session to
 > another, many of them not involving anything that one would
 > recognise as a unicode encoding in the conventional sense.

But it also won't work if you just pass around strings that are
invertible to byte sequences, *because recipients don't know which
byte sequence to invert them to*.  Is that cruft corrupt EUC-JP or
corrupt Shift JIS or corrupt UTF-8?  Or maybe simply a valid character
which is even a Unicode character, but not in the table for the source
encoding (this happens in Japanese all the time)?  You're likely to
make different guesses about what was intended by a specific sequence
of byte cruft for different original encodings.

What I'm suggesting is to provide a way for processes to record and
communicate that information without needing to provide a "source
encoding" slot for strings, and which is able to handle strings
containing unrecognized (including corrupt) characters from multiple
source encodings.  True, it will be up to the applications to
communicate that information, but it is, anyway.

Furthermore, the same algorithms can be used to "fold" any text that
contains only BMP characters plus no more than 6400 distinct non-BMP
characters into the BMP, which would be a nice feature for people
wanting to avoid the UTF-16 surrogates for some reason.
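
The figure 6400 is the size of the BMP Private Use Area, U+E000 through U+F8FF. A minimal sketch of such a fold is given below; `fold_to_bmp` and `unfold` are hypothetical helpers, the table is built dynamically per call, and the input is assumed to contain no Private Use characters of its own.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF
PUA_SIZE = PUA_END - PUA_START + 1  # 6400 code points


def fold_to_bmp(text):
    """Replace each distinct non-BMP character with a PUA stand-in."""
    table = {}
    out = []
    for ch in text:
        if ord(ch) > 0xFFFF:
            if ch not in table:
                if len(table) >= PUA_SIZE:
                    raise ValueError("more than 6400 distinct non-BMP characters")
                table[ch] = chr(PUA_START + len(table))
            out.append(table[ch])
        else:
            out.append(ch)
    return "".join(out), table


def unfold(text, table):
    """Invert fold_to_bmp, given the table it produced."""
    reverse = {stand_in: ch for ch, stand_in in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


original = "a\U0001F600b\U0001F600"
folded, table = fold_to_bmp(original)
```

Because the table is generated on the fly, it has to travel with the folded text for `unfold` to work in another process, which is the cross-session concern raised earlier in the thread.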

As Martin points out, it may not be possible to implement this without
changing the codecs one by one (I have some hope that it can
nevertheless be done, but haven't looked at the codec framework
closely yet).  I think it would be unfortunate if we're going to try
to solve a small subset of these problems (as James and Marcin are
doing) to overlook the possibility of a good solution to a whole bunch
of related problems.

From mark at qtrac.eu  Sat Sep 15 07:04:18 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Sat, 15 Sep 2007 06:04:18 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <46EAF6B1.8000705@v.loewis.de>
References: <200709111506.32823.mark@qtrac.eu>
	<200709142052.23583.mark@qtrac.eu> <46EAF6B1.8000705@v.loewis.de>
Message-ID: <200709150604.18638.mark@qtrac.eu>

On 2007-09-14, Martin v. Löwis wrote:
> >> That's a sorted dict.  PEP 3115 wants an insertion-ordered dict.
> >> You're not the first to confuse them. ;)
> >
> > Hmmm, I'd not come across that terminology distinction before.
> > I guess I'll have to rename mine then.
>
> I think "insertion-ordered" is over-specification, just to make
> the distinction clear. Most of the time, people mean "ordered
> dictionary" to say "keys are in a fixed order" - typically insertion
> order. When they want to express that the keys ought to be
> sorted, they call it "sorted dictionary".

I got my terminology from C++ which has

    C++ map           => Python sorteddict (missing!)
    C++ unordered_map => Python dict
    C++ set           => Python sortedset (missing!)
    C++ unordered_set => Python set

I've now renamed mine to sorteddict:
http://pypi.python.org/pypi/sorteddict
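
To make the C++ `map` analogy concrete, here is a minimal sketch of a mapping that iterates in key order. It is not the implementation from the PyPI package above; the class name `SortedDict` and its small method set are assumptions for illustration, built on the standard `bisect` module.

```python
import bisect


class SortedDict:
    """Hypothetical minimal mapping whose iteration follows key order."""

    def __init__(self, items=()):
        self._keys = []   # kept sorted at all times
        self._data = {}   # ordinary hash table for lookups
        for key, value in items:
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self._data:
            # Raises TypeError if key cannot be compared with existing keys.
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]
        del self._keys[bisect.bisect_left(self._keys, key)]

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._data)


d = SortedDict([("pear", 1), ("apple", 2), ("fig", 3)])
d["banana"] = 4
del d["fig"]
```

Insertion into the key list is O(n) here; a tree or skiplist, as mentioned earlier in the thread, would trade this simplicity for O(log n) updates.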

As for Adam's comment about the dict API, I find it okay, and I think
some people would prefer a close match. Unfortunately, I don't think
there will ever be consensus (so that's why there's a BDFL), but
whatever their APIs, I hope that Python gets a sorteddict and a
sortedset. But how does this happen? I've discussed it on
comp.lang.python (having used ordereddict in the subject line to create
unintentional confusion), but at some point a PEP has to be created. I'm
happy to do that (at least for a sorteddict), but if someone who has
done PEPs before did so, I'd be just as happy---I'll see what the
feedback is (if any) when I get online again next week.

PS And no, I've never programmed in PHP and never fancied doing so:-)

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From martin at v.loewis.de  Sat Sep 15 07:33:21 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 15 Sep 2007 07:33:21 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>		<46E96E98.9080406@v.loewis.de>	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA0778.3000502@canterbury.ac.nz>	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA1734.6020103@canterbury.ac.nz>	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46EB6EA1.5020104@v.loewis.de>

> What I'm suggesting is to provide a way for processes to record and
> communicate that information without needing to provide a "source
> encoding" slot for strings, and which is able to handle strings
> containing unrecognized (including corrupt) characters from multiple
> source encodings.

Can you please (re-)state how that way would precisely work? I could
not find that in the archives.

Regards,
Martin

From arvind1.singh at gmail.com  Sat Sep 15 15:45:21 2007
From: arvind1.singh at gmail.com (Arvind Singh)
Date: Sat, 15 Sep 2007 19:15:21 +0530
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709150604.18638.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709142052.23583.mark@qtrac.eu> <46EAF6B1.8000705@v.loewis.de>
	<200709150604.18638.mark@qtrac.eu>
Message-ID: 

>
> I hope that Python gets a sorteddict and a
> sortedset.


It doesn't make sense for Python to have sorteddict or sortedset. You see,
dict can have  keys which cannot be ordered (keys can be heterogeneous, in
which case Py3K may raise TypeError; ordering doesn't make sense for the
objects used as keys) and same goes for set elements.

Sorting makes sense only as a run-time operation, in which case, the
programmer should be prepared to handle appropriate exceptions.

Btw, would you like a dict or set for which you have to handle exceptions at
every insertion?
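
The behaviour being described can be reproduced in a few lines. This is a sketch with invented data: a hash-based dict accepts mixed key types, and the failure only appears when an ordering is requested.

```python
mixed = {1: "int key", "one": "str key"}  # fine for a hash table

try:
    sorted(mixed)  # needs 1 < "one" or "one" < 1
    outcome = "sorted"
except TypeError:
    outcome = "TypeError"

uniform = sorted({"b": 2, "a": 1})  # homogeneous keys sort without trouble
```

In Python 3 `outcome` ends up as `"TypeError"`, while the uniformly typed keys sort normally.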

-- 
Regards,
Arvind

From hfuerstenau at gmx.net  Sat Sep 15 12:44:09 2007
From: hfuerstenau at gmx.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=)
Date: Sat, 15 Sep 2007 12:44:09 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB0EC2.4030208@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
Message-ID: <46EBB779.6090605@gmx.net>

>> sys.argv could be of type bytes and sys.arguments (or whatever) could be 
>> a function taking an encoding parameter (which defaults to UTF-8) and 
>> returning strings.
>>
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.

The idea behind this was that it would preserve the non-decoding 
behaviour of the present sys.argv and put the new behaviour into a new 
function.

Also "argv" sounds more low-level than something like "arguments". But 
of course, "argbytes" sounds even more low-level. :-)

- Hagen


From nevillegrech at gmail.com  Sat Sep 15 18:36:51 2007
From: nevillegrech at gmail.com (Neville Grech Neville Grech)
Date: Sat, 15 Sep 2007 18:36:51 +0200
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709142052.23583.mark@qtrac.eu> <46EAF6B1.8000705@v.loewis.de>
	<200709150604.18638.mark@qtrac.eu>
	
Message-ID: 

From a Python user's point of view, a sorted dict/set/list was sometimes a
requirement for me: basically, a dictionary that had a B-tree implementation
instead of a hash table. Also, having an explicit type error would then be a
clear indication that you have something wrong in your implementation (and
therefore a useful indication).
Other languages have separate collection frameworks; C#, for example, has
PowerCollections. Having these collections as part of the standard library
is another issue though.

On 9/15/07, Arvind Singh  wrote:
> > I hope that Python gets a sorteddict and a
> > sortedset.
>
>
> It doesn't make sense for Python to have sorteddict or sortedset. You see,
> dict can have  keys which cannot be ordered (keys can be heterogeneous, in
> which case Py3K may raise TypeError; ordering doesn't make sense for the
> objects used as keys) and same goes for set elements.
>
> Sorting makes sense only as a run-time operation, in which case, the
> programmer should be prepared to handle appropriate exceptions.
>
> Btw, would you like a dict or set for which you have to handle exceptions
> at every insertion?
>
> --
> Regards,
> Arvind


-- 
Regards,
Neville Grech

From g.brandl at gmx.net  Sat Sep 15 18:48:18 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 15 Sep 2007 18:48:18 +0200
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>	<200709142052.23583.mark@qtrac.eu>
	<46EAF6B1.8000705@v.loewis.de>	<200709150604.18638.mark@qtrac.eu>
	
Message-ID: 

Arvind Singh schrieb:
> > I hope that Python gets a sorteddict and a
> > sortedset.
> 
> 
> It doesn't make sense for Python to have sorteddict or sortedset. You
> see, dict can have  keys which cannot be ordered (keys can be
> heterogeneous, in which case Py3K may raise TypeError; ordering doesn't
> make sense for the objects used as keys) and same goes for set elements.
> 
> Sorting makes sense only as a run-time operation, in which case, the
> programmer should be prepared to handle appropriate exceptions.
> 
> Btw, would you like a dict or set for which you have to handle
> exceptions at every insertion?

In the cases where you have to do that, you shouldn't be using a sorted
dict.

But why not better look at those other 95% of cases where the values are
uniformly typed and perfectly sortable?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From greg at krypto.org  Sat Sep 15 19:56:46 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 15 Sep 2007 10:56:46 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E9C724.9080808@canterbury.ac.nz>
References: <20070829234728.GV24059@electricrain.com>
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	<46E72B18.9060908@canterbury.ac.nz>
	<52dc1c820709120044h722605cekc86ea668a6a1b4bd@mail.gmail.com>
	<46E9C724.9080808@canterbury.ac.nz>
Message-ID: <52dc1c820709151056t1b14bca2ld723524e542aa914@mail.gmail.com>

On 9/13/07, Greg Ewing  wrote:
> Gregory P. Smith wrote:
> > When I read the plain term EXCLUSIVE I read that to mean nobody else can
> > read -or- write, ie: not shared in any sense.
>
> You're right, it's not the best term.
>
> > Let's extend these base
> > concepts to SHARED_READ, SHARED_WRITE, EXCLUSIVE_READ, EXCLUSIVE_WRITE
>
> EXCLUDE_WRITE might be better, since EXCLUSIVE_WRITE seems
> to imply that one is writing oneself as well.
>
> > EXCLUSIVE_READ - no others can read this buffer while this view is
> > open.
>
> This is the one that I don't think is necessary. I don't
> see a need to ever prevent others from *reading* if they
> really want to and are prepared to deal with the
> consequences. Most of the time the other party will be using
> READ_LOCK which includes EXCLUDE_WRITE, so it will fail
> if you're already holding a write lock.
>
> So we just have
>
> READ
> WRITE
> READ_LOCK = READ | EXCLUDE_WRITE
> WRITE_LOCK = WRITE | EXCLUDE_WRITE

I like your terminology.  Also, agreed that EXCLUDE_READ is not likely
to be necessary; I listed it for completeness' sake.
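
As a way of checking that the four names combine sensibly, here is a small Python model of the flags. The numeric values and the `can_grant` helper are assumptions made for this sketch; the real flags would live in the C buffer interface and may look quite different.

```python
READ = 1
WRITE = 2
EXCLUDE_WRITE = 4
READ_LOCK = READ | EXCLUDE_WRITE
WRITE_LOCK = WRITE | EXCLUDE_WRITE


def can_grant(request, held):
    """Return True if `request` is compatible with every view in `held`."""
    for other in held:
        # A request that excludes writers fails if someone is writing.
        if request & EXCLUDE_WRITE and other & WRITE:
            return False
        # A request to write fails if someone has excluded writers.
        if request & WRITE and other & EXCLUDE_WRITE:
            return False
    return True
```

Under this model a plain `READ` is always granted, `READ_LOCK` fails while a `WRITE_LOCK` is held, and several `READ_LOCK` holders can coexist, which matches the description quoted above.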

From greg at krypto.org  Sat Sep 15 20:11:33 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 15 Sep 2007 11:11:33 -0700
Subject: [Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and
	immutable support
In-Reply-To: <46E98F25.5010404@enthought.com>
References: <20070829234728.GV24059@electricrain.com>
	
	<52dc1c820709081615m783ea9fctc562d113252fb7b1@mail.gmail.com>
	
	<46E62358.3020404@enthought.com>
	
	<46E6F137.2020001@enthought.com>
	
	<46E98F25.5010404@enthought.com>
Message-ID: <52dc1c820709151111n111856f2k5d1fa80af7d3460b@mail.gmail.com>

On 9/13/07, Travis E. Oliphant  wrote:
> I think if it doesn't go through the buffer interface it is up to the
> object to decide (i.e. what does the object do with itself when buffers
> are exported --- that will depend on the object).   All it must do is
> support the buffer interface in the correct way (i.e. not move the
> memory buffers are relying on and support the access modes correctly
> that it purports to export).

Correct.  This is what I have done in my Bytes object patch to support
READ | EXCLUDE_WRITE (speaking in Greg Ewing's terms which I think we
should adopt).

> Let me think about adding a function for read-write locking that is
> separate from getting a view (which implements memory-location
> locking).  I appreciate the discussion as it is helping me clarify my
> thinking.
>
> -Travis

I'm interested to see what you come up with but... As it is, I agree
with Greg Ewing that a separate function is not necessary and that
just a set of flags on the existing buffer interface is all that's
needed.  Otherwise code would need to make multiple calls (get, lock,
unlock, release) and deal with errors when both do not succeed, which
is complicated and error-prone to do in C when a single call could
encapsulate it.

-gps

From greg at krypto.org  Sat Sep 15 22:36:49 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 15 Sep 2007 13:36:49 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB0EC2.4030208@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
Message-ID: <52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>

On 9/14/07, Greg Ewing  wrote:
> Hagen Fürstenau wrote:
> > sys.argv could be of type bytes and sys.arguments (or whatever) could be
> > a function taking an encoding parameter (which defaults to UTF-8) and
> > returning strings.
> >
> > Of course that's backwards incompatible and I'm not sure if it's too
> > late for something like this now.
>
> It would be pretty disruptive to ask everyone to change
> their habit of thinking of sys.argv as a list of strings.

Would it?  We're already asking them to convert between bytes and
unicode strings anywhere else I/O is done.  I see the command line and
environment as merely more forms of input.  The only way to parse them
into data structures automatically is to keep them as bytes.  They are
C concepts and can't imply an encoding.  As it is, it's entirely
possible to have -multiple- encodings on a command line at once as
well as in environment variables.  They're all context sensitive.
This isn't going to change.

> I would suggest doing it the other way around -- have
> sys.argv be an object that automatically converts to
> unicode on access, and something else, such as
> sys.argbytes, for getting the raw bytes if that fails.

I'd leave sys.argv bytes and make sys.args/arguments/argstrs be some
best effort parsing.  argv is the C/C++ name for bytes, let's not
confuse people.  Similarly for the environment: the os.environ dict
should have bytes objects as keys and values (or perhaps a bytes object
subclass that refuses null bytes).  The os.getenv and os.putenv
functions should take care of any best effort decoding/encoding and
have an optional encoding= parameter to specify it explicitly.
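
A minimal sketch of the best-effort decoding idea, assuming a helper
called decode_arguments and a raw_argv list of bytes (both names are
hypothetical, made up for this illustration, not an existing or
proposed API):

```python
def decode_arguments(raw_argv, encoding="utf-8"):
    """Best-effort decoding of raw command-line bytes (hypothetical helper)."""
    decoded = []
    for raw in raw_argv:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the undecodable argument as bytes so the caller can
            # retry with an encoding it knows from context.
            decoded.append(raw)
    return decoded


# Stand-in for what the OS might hand over: UTF-8 and Latin-1 mixed.
raw_argv = [b"prog", b"caf\xc3\xa9", b"caf\xe9"]
args = decode_arguments(raw_argv)
```

The last argument stays as bytes here, which is the point: only the
caller knows from context that it should be decoded as Latin-1.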

-gps

From p.f.moore at gmail.com  Sat Sep 15 23:29:35 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 15 Sep 2007 22:29:35 +0100
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
Message-ID: <79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>

On 15/09/2007, Gregory P. Smith  wrote:
> similarly for the environment.  os.environ dict
> should be bytes object keys and values

You can't have bytes as keys - the type isn't hashable...

Paul

From greg at krypto.org  Sun Sep 16 00:22:54 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 15 Sep 2007 15:22:54 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
Message-ID: <52dc1c820709151522h5e2a4336qcefe55f820042d36@mail.gmail.com>

On 9/15/07, Paul Moore  wrote:
> On 15/09/2007, Gregory P. Smith  wrote:
> > similarly for the environment.  os.environ dict
> > should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...

ugh, yeah.  as much as i hate to suggest it given my preference for
keeping any encoding out of automatic environment and argument
parsing, just make os.environ keys be latin-1 encoding or make them a
hashable subclass of bytes (yuck or yuck).  someone on windows should
check to see if it allows evil such as utf16 environment variable
names first (i'd hope not, that'd break traditional C/C++ code).
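
The latin-1 option works mechanically because that encoding maps each
of the 256 byte values to the code point with the same number, so any
byte string survives a round trip.  A small sketch (the variable names
are only illustrative):

```python
raw_name = bytes(range(1, 256))       # every byte value except NUL
key = raw_name.decode("latin-1")      # a hashable str, usable as a dict key
environ = {key: "value"}
restored = key.encode("latin-1")      # the original bytes come back unchanged
```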

From aahz at pythoncraft.com  Sun Sep 16 02:40:05 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 15 Sep 2007 17:40:05 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EA7F7B.2060609@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EA7A16.5010902@v.loewis.de>
	<46EA7C83.6040507@coli.uni-saarland.de>
	<46EA7F7B.2060609@v.loewis.de>
Message-ID: <20070916004005.GA12697@panix.com>

On Fri, Sep 14, 2007, "Martin v. Löwis" wrote:
>Hagen:
>> 
>> And what if we skillfully conserve unknown bytes in a private use or
>> surrogate area and the application author actually knows the encoding
>> and wants correctly decoded strings?
> 
> They can easily roundtrip that then to the encoding that it should have:
> 
> good_string = sys.argv[bad_string_index].\
>    encode(sys.argv_encoding, "pua-replace").decode(real_encoding)

That doesn't count as "easily" in my book.  What about a sys._argv_orig
containing bytes objects?
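
For comparison, the same round trip can be sketched with the
"surrogateescape" error handler that later Python 3 releases provide
(PEP 383).  It stands in here for the hypothetical "pua-replace"
handler quoted above, and the encodings are assumptions chosen for the
example:

```python
raw = b"caf\xe9"    # really Latin-1, but about to be decoded as UTF-8

# The undecodable byte 0xE9 is smuggled through as the lone surrogate U+DCE9.
bad_string = raw.decode("utf-8", "surrogateescape")

# Undo the wrong decoding, then decode with the encoding it should have had.
good_string = bad_string.encode("utf-8", "surrogateescape").decode("latin-1")
```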
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

From aahz at pythoncraft.com  Sun Sep 16 02:44:52 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 15 Sep 2007 17:44:52 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
Message-ID: <20070916004452.GB12697@panix.com>

On Sat, Sep 15, 2007, Paul Moore wrote:
> On 15/09/2007, Gregory P. Smith  wrote:
>>
>> similarly for the environment.  os.environ dict
>> should be bytes object keys and values
> 
> You can't have bytes as keys - the type isn't hashable...

That's why people keep arguing for an immutable bytes type.  I keep
seeing long discussions that end up with a tortured mechanism for making
the keys unicode.  Why don't we just bite the bullet and make things
easier and have the immutable bytes type?
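
To illustrate the difference, here is a sketch using the names from the
Python 3 that eventually shipped, where the mutable type under
discussion became bytearray and bytes became immutable:

```python
env = {}

# An immutable byte string is hashable, so it works as a dict key.
env[b"PATH"] = b"/usr/bin"

# A mutable byte buffer is not hashable and is rejected as a key.
try:
    env[bytearray(b"HOME")] = b"/root"
except TypeError as exc:
    error = str(exc)
```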
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

From greg.ewing at canterbury.ac.nz  Sun Sep 16 03:38:17 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 16 Sep 2007 13:38:17 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EBB779.6090605@gmx.net>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
Message-ID: <46EC8909.4050300@canterbury.ac.nz>

 > Also "argv" sounds more low-level than something like "arguments".

While we're on the subject of argv, I've been wondering
whether py3k might want to revisit the idea of having
argv[0] be the program name. In my experience, one almost
*never* wants to treat argv[0] the same way as the rest of
the arguments.

Putting the program name into argv[0] is a neat
trick in C that's relatively harmless, because it's
just as easy to start iterating from 1 as from 0, but
in Python it makes all argument-processing code
more complicated than necessary.

It also provides a nasty trap for the unwary, as I
discovered one day when I wrote a program for deleting
files that deleted itself the first time I ran it. :-)

Changing the existing behaviour of argv would probably
be too disruptive, so how about relegating argv to a
low-level detail and providing something else for
everyday use that omits argv[0]?

sys.arguments would sound quite nice for that.

--
Greg

From nick.bastin at gmail.com  Sun Sep 16 03:53:48 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 15 Sep 2007 21:53:48 -0400
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709142052.23583.mark@qtrac.eu> <46EAF6B1.8000705@v.loewis.de>
	<200709150604.18638.mark@qtrac.eu>
	
Message-ID: <66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>

On 9/15/07, Arvind Singh  wrote:
>
> > I hope that Python gets a sorteddict and a
> > sortedset.
>
> It doesn't make sense for Python to have sorteddict or sortedset. You see,
> dict can have  keys which cannot be ordered (keys can be heterogeneous, in
> which case Py3K may raise TypeError; ordering doesn't make sense for the
> objects used as keys) and same goes for set elements.

How do you get from "some keys can't be ordered" to "it doesn't make
sense for Python to have sorteddict or sortedset"?  If you want to use
keys that can't be ordered, then feel free to continue to use dict.
For situations in which ordering is important, the language should
support that.  When did this become an all or nothing proposition?
There's plenty of space for both dict and sorteddict.

> Btw, would you like a dict or set for which you have to handle exceptions at
> every insertion?

Yes, if that's what the situation calls for.

--
Nick

From nick.bastin at gmail.com  Sun Sep 16 04:00:25 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Sat, 15 Sep 2007 22:00:25 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
Message-ID: <66d0a6e10709151900l3d89b71u2e8b9bcb4b62b9f4@mail.gmail.com>

On 9/15/07, Paul Moore  wrote:
> On 15/09/2007, Gregory P. Smith  wrote:
> > similarly for the environment.  os.environ dict
> > should be bytes object keys and values
>
> You can't have bytes as keys - the type isn't hashable...

Then let's stop beating around the bush and implement an immutable
bytes type.  Why put ourselves through contortions trying to jam a
square peg into a round hole and not just decide to make a round peg?

--
Nick

From guido at python.org  Sun Sep 16 04:01:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 15 Sep 2007 19:01:00 -0700
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EC8909.4050300@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>
Message-ID: 

This sounds awfully close to bikeshedding. Change too many details
like this and you cause death by a 1000 pinpricks for existing apps.
sys.argv[0] *does* get used (though arguably rarely in the same way as
sys.argv[1:]).

--Guido

On 9/15/07, Greg Ewing  wrote:
>  > Also "argv" sounds more low-level than something like "arguments".
>
> While we're on the subject of argv, I've been wondering
> whether py3k might want to revisit the idea of having
> argv[0] be the program name. In my experience, one almost
> *never* wants to treat argv[0] the same way as the rest of
> the arguments.
>
> Putting the program name into argv[0] is a neat
> trick in C that's relatively harmless, because it's
> just as easy to start iterating from 1 as from 0, but
> in Python it makes all argument-processing code
> more complicated than necessary.
>
> It also provides a nasty trap for the unwary, as I
> discovered one day when I wrote a program for deleting
> files that deleted itself the first time I ran it. :-)
>
> Changing the existing behaviour of argv would probably
> be too disruptive, so how about relegating argv to a
> low-level detail and providing something else for
> everyday use that omits argv[0]?
>
> sys.arguments would sound quite nice for that.
>
> --
> Greg


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Sun Sep 16 04:28:53 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 15 Sep 2007 19:28:53 PDT
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <20070916004452.GB12697@panix.com> 
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
	<20070916004452.GB12697@panix.com>
Message-ID: <07Sep15.192901pdt."57996"@synergy1.parc.xerox.com>

> > You can't have bytes as keys - the type isn't hashable...
> 
> That's why people keep arguing for an immutable bytes type.  I keep
> seeing long discussions that end up with a tortured mechanism for making
> the keys unicode.  Why don't we just bite the bullet and make things
> easier and have the immutable bytes type?

It's a pretty horrible hole (IMO) that a sequence of bytes isn't
hashable.  If we need the immutable bytes type, or some attribute on
the regular bytes type akin to the C "const", let's add it now before
the insanity gets out of control.

Bill

From fdrake at acm.org  Sun Sep 16 04:46:19 2007
From: fdrake at acm.org (Fred Drake)
Date: Sat, 15 Sep 2007 22:46:19 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <66d0a6e10709151900l3d89b71u2e8b9bcb4b62b9f4@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
	<66d0a6e10709151900l3d89b71u2e8b9bcb4b62b9f4@mail.gmail.com>
Message-ID: 

On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> Then let's stop beating around the bush and implement an immutable
> bytes type.  Why put ourselves through contortions trying to jam a
> square peg into a round hole and not just decide to make a round peg?

+42 !!!!


   -Fred

-- 
Fred Drake   




From stephen at xemacs.org  Sun Sep 16 09:13:29 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 16 Sep 2007 16:13:29 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EB6EA1.5020104@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
Message-ID: <87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > > What I'm suggesting is to provide a way for processes to record and
 > > communicate that information without needing to provide a "source
 > > encoding" slot for strings, and which is able to handle strings
 > > containing unrecognized (including corrupt) characters from multiple
 > > source encodings.
 > 
 > Can you please (re-)state how that way would precisely work? I could
 > not find that in the archives.

The basic idea is to allocate code points in private space as-needed.

All points in private space would be initially "owned" by the Python
process.

When a codec encounters something it can't handle, whether it's a
valid character in a legacy encoding, a private use character in a
UTF, or an invalid sequence of code units, it throws an exception
specifying the character or code unit and the current coded character
set, and the handler either finds that tuple in the table, or assigns
a private use character and enters it in the table with key being the
charset-codepoint tuple, and the inverse assignment in an inverse
mapping table.

It may be that no charset can be assigned to the codepoint, in which
case None would be assigned as the charset, and instead of mapping
characters, the invalid codepoints would be individually mapped.

On output, if the codec can output in the recorded character set, it
does so, otherwise it throws an unencodable character exception.
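
The allocation step described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions:
PrivateUseTable and the handler name "pua-allocate" are made up for the
example, only the None-charset case (raw undecodable bytes) is handled,
and the table draws from the BMP private use area U+E000..U+F8FF.

```python
import codecs


class PrivateUseTable:
    """Hypothetical table handing out private use code points on demand."""

    FIRST = 0xE000
    LAST = 0xF8FF

    def __init__(self):
        self._forward = {}      # (charset, code) -> private use character
        self._inverse = {}      # private use character -> (charset, code)
        self._next = self.FIRST

    def allocate(self, charset, code):
        """Return the character for (charset, code), assigning one if new."""
        key = (charset, code)
        char = self._forward.get(key)
        if char is None:
            if self._next > self.LAST:
                raise RuntimeError("private use area exhausted")
            char = chr(self._next)
            self._next += 1
            self._forward[key] = char
            self._inverse[char] = key
        return char

    def origin(self, char):
        """Inverse mapping: which (charset, code) does this character stand for?"""
        return self._inverse[char]


table = PrivateUseTable()


def pua_allocate(exc):
    """Decoding error handler: map each bad byte via the None charset."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(table.allocate(None, byte) for byte in bad), exc.end


codecs.register_error("pua-allocate", pua_allocate)

text = b"caf\xff".decode("utf-8", "pua-allocate")
```

The sketch leaves out the output side and the charset-aware case, both
of which would need cooperation from the codecs themselves.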

This definitely requires that the Unicode codecs be modified to do the
right thing if they encounter private use characters in the input
stream or output stream.

Other codecs don't need to be modified, although ISO 2022-based codecs
(at least) would benefit from it.  Some codecs (like ISO-8859 codecs)
will have implicit charsets (ASCII code points can't be errors for
them, so only GR matters), and can use codec-specific handlers that
know what the implicit charset is.  (AIUI this would require that the
handler-specifying protocol be changed from an enumeration of the
available handlers to the ability to actually specify one.)  The rest
can use the None charset, so that code units will be preserved.

Applications which wish to pass strings across process boundaries will
have to pass the table too.  If they don't, then in general they can't
use this family of exception handlers.

To handle cases like Marcin's private encoding, and in general to
allow efficient IPC for processes that know they're going to get certain
private use characters in I/O, there should be an API to preallocate
specific code points.  (Theoretically, dynamically allocated private
code points could be reallocated, but that would require translating
all existing strings, and I can't believe that would ever be worth
it.)

What happens if a string "escapes" without the table?

1.  The application uses the preallocation API.  Then the characters
it understands are handled normally, and dynamically allocated private
use characters are errors, anyway.  I don't see how this makes things
worse.

2.  The application doesn't use the preallocation API, but does know
about some private use characters.  Then it will get confused by the
dynamic allocation, as Greg and Marcin point out, and users should be
advised not to use the new handler.

3.  The application doesn't know about any private use characters.
Then dynamically allocated characters are exceptions anyway.  I don't
see how this makes things worse.

Advantages:

1.  Almost all the "interesting" information about the original
encoded source is preserved, including under string operations like
slicing and concatenation with strings from other sources.  (I can
quantify "almost all" more precisely if necessary.)

2.  100% Unicode conformance in the sense that if the internal
representation escapes, it's valid Unicode.

3.  Efficient internal representation in the sense that applications
need not worry about invalid Unicode when doing string operations.

4.  In 16-bit environments, up to 6400 non-BMP characters can be
mapped into the BMP private use area using the same algorithm,
achieving a "string is character array" representation at the expense
of slight overhead in I/O and one extra table reference in each
character property lookup.  As Marcin points out, given that not all
composable characters have one-character NFC representations, we can't
guarantee that the user's notion of string length will equal the
number of characters in the string, but in practice I think that will
almost invariably work out.  And if we're doing normalization, the
codec overhead becomes less important.

Disadvantages:

1.  Unicode codecs will need to be modified, since they need to throw
exceptions on private use characters.

2.  Other codecs will need to be modified to take advantage of this
handler, since AFAIK currently none of the available handlers can use
charset information, so I can't imagine the codecs provide it.

3.  More overhead in exception-handling than James Knight's or Marcin
Kowalczyk's proposals.

4.  Applications that know about some private use characters will need
to be modified to preallocate those characters before they can take
advantage of this handler.

In general, I don't think that the overhead should be weighted very
heavily against this proposal.  Exception handlers impose a fair
amount of overhead anyway, AIUI.  Furthermore, any application that
cares enough to keep track of the original code points will IMO be
hungry for any additional information that can help in exception
handling.  This proposal provides as much as you can get, short of
buffering all the input.

HTH,


From martin at v.loewis.de  Sun Sep 16 09:25:26 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 16 Sep 2007 09:25:26 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>		<46E96E98.9080406@v.loewis.de>	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA0778.3000502@canterbury.ac.nz>	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA1734.6020103@canterbury.ac.nz>	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EB0DC0.3050906@canterbury.ac.nz>	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46ECDA66.3040702@v.loewis.de>

> The basic idea is to allocate code points in private space as-needed.

Ok, thanks. Would you be interested in implementing that scheme?

Regards,
Martin

From p.f.moore at gmail.com  Sun Sep 16 13:42:35 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 16 Sep 2007 12:42:35 +0100
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
	<66d0a6e10709151900l3d89b71u2e8b9bcb4b62b9f4@mail.gmail.com>
	
Message-ID: <79990c6b0709160442g44100aean3da2890085eb643@mail.gmail.com>

On 16/09/2007, Fred Drake  wrote:
> On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> > Then let's stop beating around the bush and implement an immutable
> > bytes type.  Why put ourselves through contortions trying to jam a
> > square peg into a round hole and not just decide to make a round peg?
>
> +42 !!!!

I knew this would come up again when I made that comment - I
deliberately didn't express an opinion then, as I didn't want to
obscure the point. But I'll come off the fence and admit that I'm also
in favour of an immutable bytes type (and for bytes literals to be of
that type).

Paul.

From larry at hastings.org  Sun Sep 16 14:01:52 2007
From: larry at hastings.org (Larry Hastings)
Date: Sun, 16 Sep 2007 05:01:52 -0700
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net>	<46EC8909.4050300@canterbury.ac.nz>
	
Message-ID: <46ED1B30.9030708@hastings.org>


Guido van Rossum wrote:
> On 9/15/07, Greg Ewing  wrote:
>   
>> Changing the existing behaviour of argv would probably
>> be too disruptive, so how about relegating argv to a
>> low-level detail and providing something else for
>> everyday use that omits argv[0]?
>>
>> sys.arguments would sound quite nice for that.
> This sounds awfully close to bikeshedding. Change too many details
> like this and you cause death by a 1000 pinpricks for existing apps.
> sys.argv[0] *does* get used (though arguably rarely in the same way as
> sys.argv[1:]).

+0.5 on Greg Ewing's proposal.

argv[0] has little in common with argv[1:]; why should the user have to 
differentiate them?  I see this as yet one more messy detail of the OS 
that Python could hide for me.  Looking at it with fresh eyes, I think
    for a in sys.arguments:
is a lot prettier than
    for a in sys.argv[1:]:
After all: what's that 1: doing there?  Why the magic number?  Why does 
argv have the script name in [0], anyway?  None of my other 
functions/members are forced to take themselves as their first argument.

Taking it to its logical conclusion, I further propose:
    sys.raw_argv -- the original bytes as they came in from the OS
    sys.argv -- raw_argv converted into (unicode) strings, not expected
        to be used by users
    sys.arguments -- sys.argv[1:]
    sys.script_path -- sys.argv[0]
    sys.split_argv -- callable that takes an argv-style array (strings,
        not bytes) and assigns it into argv, arguments, and script_path,
        slicing as appropriate
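
A rough sketch of the slicing part, as a plain function rather than a
real sys attribute.  The function below is hypothetical and returns the
pieces instead of assigning them into sys, so it can be tried without
touching interpreter state:

```python
def split_argv(argv):
    """Split an argv-style list of strings into (script_path, arguments)."""
    if not argv:
        # Nothing to split; there is no script name at all.
        return None, []
    return argv[0], list(argv[1:])


script_path, arguments = split_argv(["delete.py", "a.txt", "b.txt"])

# Everyday code would then loop over the arguments only.
names = [a for a in arguments]
```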

Yes, the format of argv has thirty years of history; yes I don't really 
expect this discussion to get anywhere.  But I hate having arbitrary 
idioms in Python, and I wanted to cast my vote into the swirling void 
before this idea totally died.

If nothing else, at least we could fix the proviso for argv[0]: "(it is 
operating system dependent whether this is a full pathname or not)."  
How about we always ensure it is an absolute path?


My "there's only one way to do it" reflex is fighting it out with my 
"beautiful is better than ugly" reflex,


/larry/

From sasch.pe at gmx.de  Sun Sep 16 15:34:24 2007
From: sasch.pe at gmx.de (Sascha Peilicke)
Date: Sun, 16 Sep 2007 15:34:24 +0200
Subject: [Python-3000] Stackless anyone ?
Message-ID: <1189949664.5502.3.camel@schlepp>

Hello,

is or has there been any discussion about stackless and py3k?

regards,
Sascha Peilicke
-- 
http://saschashideout.de

From arvind1.singh at gmail.com  Sun Sep 16 16:45:40 2007
From: arvind1.singh at gmail.com (Arvind Singh)
Date: Sun, 16 Sep 2007 20:15:40 +0530
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>
References: <200709111506.32823.mark@qtrac.eu>
	<200709142052.23583.mark@qtrac.eu> <46EAF6B1.8000705@v.loewis.de>
	<200709150604.18638.mark@qtrac.eu>
	
	<66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>
Message-ID: 

> How do you get from "some keys can't be ordered" to "it doesn't make
> sense for Python to have sorteddict or sortedset"?  If you want to use
> keys that can't be ordered, then feel free to continue to use dict.
> For situations in which ordering is important, the language should
> support that.  When did this become an all or nothing proposition?
> There's plenty of space for both dict and sorteddict.


Sorry for premature conclusions. All I wanted to do was remind everyone of
the potential problems with any "generic" implementation.

And I did say, when ordering is important, we are left with two choices:
1) Sort explicitly (whenever required) and be prepared to handle exceptions
raised during sort operation.
2) Have an implicitly "sorted" implementation and handle exceptions at every
insertion.

I, personally, tend to prefer the former solution. The latter case is useful
when we have large objects and do a large number of insertions, in which case
per-insertion exception handling would be inefficient. The former case, in
turn, can be slightly confusing and a bit hard to debug.
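
A small sketch of where the exception shows up in each case, using a
plain dict and the bisect module as stand-ins (no actual sorteddict is
assumed here):

```python
import bisect

mixed = {1: "a", "b": 2}

# Choice 1: sort when needed; incomparable keys fail at sort time.
try:
    sorted(mixed)
    sort_failed = False
except TypeError:
    sort_failed = True

# Choice 2: keep keys sorted on insertion; the failure moves to insert time.
keys = [1, 5, 9]
bisect.insort(keys, 7)
try:
    bisect.insort(keys, "b")
    insert_failed = False
except TypeError:
    insert_failed = True
```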

-- 
Regards,
Arvind

From thomas at python.org  Sun Sep 16 17:24:33 2007
From: thomas at python.org (Thomas Wouters)
Date: Sun, 16 Sep 2007 17:24:33 +0200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EC8909.4050300@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>
Message-ID: <9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>

On 9/16/07, Greg Ewing  wrote:
>
> > Also "argv" sounds more low-level than something like "arguments".
>
> While we're on the subject of argv, I've been wondering
> whether py3k might want to revisit the idea of having
> argv[0] be the program name. In my experience, one almost
> *never* wants to treat argv[0] the same way as the rest of
> the arguments.


-1. If you want to put more meaning in the argv list, use an option parser.
The _actual_ meaning of each element depends entirely on the program that's
started. For Python-the-language, there isn't any difference between them.
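
As a rough illustration of the option-parser route, here is a sketch
using the standard library's optparse module.  The program name, the
option, and the argument list are invented for the example.  When no
list is passed, parse_args() defaults to sys.argv[1:], so argv[0] never
reaches the parser.

```python
from optparse import OptionParser

parser = OptionParser(prog="example")
parser.add_option("-v", "--verbose", action="store_true", default=False,
                  help="print more output")

# An explicit list stands in for sys.argv[1:] so the sketch is repeatable.
options, positional = parser.parse_args(["-v", "a.txt", "b.txt"])
```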

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From mathieu.fenniak at gmail.com  Sun Sep 16 17:46:18 2007
From: mathieu.fenniak at gmail.com (Mathieu Fenniak)
Date: Sun, 16 Sep 2007 09:46:18 -0600
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
Message-ID: 

Hi everyone,

I'd like to be able to derive from the bytes type, but this currently  
isn't possible due to it missing the Py_TPFLAGS_BASETYPE.  A comment  
next to the flags indicates that this class is "sealed / final".  I  
tried to search this list for some information on this, but I  
couldn't find any relevant posts.  Why is this type "sealed"?

I've experimented by adding the basetype flag to the type (with a  
recent svn checkout).  Python's test suite continues to run without  
any errors after this change.  My own project's test suite works  
flawlessly with a bytes derived type, as well.  I expected to  
encounter some error or difficulty that would explain why this type  
wasn't usable as a base type, but it seems to work great.

Mathieu

From guido at python.org  Sun Sep 16 19:02:09 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 16 Sep 2007 10:02:09 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
Message-ID: 

It is possible to compromise the integrity of a built-in type by
subclassing it if the type wasn't carefully written to expect
subclassing. The bytes type currently wasn't written to be careful
about this. Why can't you use containment instead of subclassing?
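
A minimal sketch of what containment could look like, using a made-up
wrapper class.  The null-byte check is only an example of behaviour one
might otherwise want to add in a subclass:

```python
class NulFreeBytes:
    """Hypothetical wrapper that holds a bytes object instead of subclassing."""

    def __init__(self, data):
        data = bytes(data)
        if b"\x00" in data:
            raise ValueError("embedded null byte")
        self._data = data

    def __bytes__(self):
        return self._data

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        if isinstance(other, NulFreeBytes):
            return self._data == other._data
        return NotImplemented

    def __hash__(self):
        # Safe because the wrapped value is never mutated after __init__.
        return hash(self._data)


name = NulFreeBytes(b"PATH")
```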

--Guido

On 9/16/07, Mathieu Fenniak  wrote:
> Hi everyone,
>
> I'd like to be able to derive from the bytes type, but this currently
> isn't possible due to it missing the Py_TPFLAGS_BASETYPE.  A comment
> next to the flags indicates that this class is "sealed / final".  I
> tried to search this list for some information on this, but I
> couldn't find any relevant posts.  Why is this type "sealed"?
>
> I've experimented by adding the basetype flag to the type (with a
> recent svn checkout).  Python's test suite continues to run without
> any errors after this change.  My own project's test suite works
> flawlessly with a bytes derived type, as well.  I expected to
> encounter some error or difficulty that would explain why this type
> wasn't usable as a base type, but it seems to work great.
>
> Mathieu


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From noamraph at gmail.com  Sun Sep 16 19:06:29 2007
From: noamraph at gmail.com (Noam Raphael)
Date: Sun, 16 Sep 2007 20:06:29 +0300
Subject: [Python-3000] The order of list comprehensions and generator
	expressions
Message-ID: 

Hello,

I had a thought about syntax I want to share with you.

Say you want to get a list of all the phone numbers of your friends.
You'll write something like this:
telephones = [friend.telephone for friend in friends]

Now suppose that, unfortunately, you have many friends, and they are
grouped by city. Now, you'll probably write:
telephones = [friend.telephone for friend in city.friends for city in cities]

and you'll (hopefully) get an exception, and change your line to:
telephones = [friend.telephone for city in cities for friend in city.friends]

and say, "Ah, I should've remembered this from the last time it
happened to me", and forget it until the next time it happens to you.

The reason is that the code:
for city in cities:
    for friend in city.friends:
        yield friend.telephone

makes sense if you read it from the first line to the last line, and
makes sense if you read it from the last line to the first line, but
doesn't make a lot of sense if you start from the last line and then
jump to the first line and read it from there. In other words, you can
go from the general to the specific, and you can go from the specific
to the general, but jumping from the most specific to the most general
and back again up to the second-most specific is strange.

All this is to say that I think that the "for" parts in list
comprehensions and generator expressions should, in a perfect world,
be evaluated in the other way round.

The question remains, what should be done with the "if" parts. A
possible solution is this: only one "if" part will be allowed after
each "for" part (you don't need more than that, since you can always
use the "and" operator). So, if I want to limit the list, my line will
look like this:

telephones = [
    friend.telephone
    for friend in city.friends if friend.is_really_good
    for city in cities if city.is_close_to_me
    ]

What do you think?
Noam

(P.S. Please don't be annoyed at me. The answer "this will break too
much code and isn't worth it" is, of course, very sensible. I just
thought that such thoughts can be posted to this list without causing
too much harm.)
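
A minimal sketch of the ordering described above, assuming made-up Friend and City records (the names and numbers are illustrative, not from the thread). It shows that the current syntax lists the "for" clauses in the same order as the nested loops, and that the reversed order fails with a NameError when no "city" name is otherwise in scope:

```python
from collections import namedtuple

# Hypothetical records, for illustration only.
Friend = namedtuple("Friend", "name telephone")
City = namedtuple("City", "name friends")

cities = [
    City("Haifa", [Friend("Dana", "04-111"), Friend("Eli", "04-222")]),
    City("Utrecht", [Friend("Guus", "030-333")]),
]

# Current syntax: outermost loop first, exactly as in the nested form.
telephones = [friend.telephone for city in cities for friend in city.friends]


def telephones_gen(cities):
    for city in cities:
        for friend in city.friends:
            yield friend.telephone


# The reversed order evaluates city.friends before any city is bound.
reversed_error = None
try:
    [friend.telephone for friend in city.friends for city in cities]
except NameError as exc:
    reversed_error = exc

print(telephones)
print("reversed order raised:", type(reversed_error).__name__)
```

If a variable called "city" happens to exist in the enclosing scope, the reversed form runs without an exception and silently produces the wrong list, which is why the message above says "hopefully".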

From steven.bethard at gmail.com  Sun Sep 16 19:16:56 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sun, 16 Sep 2007 11:16:56 -0600
Subject: [Python-3000] The order of list comprehensions and generator
	expressions
In-Reply-To: 
References: 
Message-ID: 

On 9/16/07, Noam Raphael  wrote:
> All this is to say that I think that the "for" parts in list
> comprehensions and generator expressions should, in a perfect world,
> be evaluated in the other way round.

This proposal is not really appropriate for the python-3000 list --
it's too late for any more core language changes in Python 3000.  If
the idea belongs anywhere, it belongs on the python-ideas list.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Sun Sep 16 20:42:42 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 16 Sep 2007 11:42:42 -0700
Subject: [Python-3000] The order of list comprehensions and generator
	expressions
In-Reply-To: 
References: 
Message-ID: 

I think it's not so obvious that reversing the order is any better
when you throw in some if clauses:

[friend for city in cities if city.name != "Amsterdam" for friend in
city.friends if friend.name != "Guido"]

vs.

[friend for friend in city.friends if friend.name != "Guido" for city
in cities if city.name != "Amsterdam"]

--Guido



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From noamraph at gmail.com  Sun Sep 16 21:01:29 2007
From: noamraph at gmail.com (Noam Raphael)
Date: Sun, 16 Sep 2007 21:01:29 +0200
Subject: [Python-3000] The order of list comprehensions and generator
	expressions
In-Reply-To: 
References: 
	
Message-ID: 

On 9/16/07, Guido van Rossum  wrote:
> I think it's not so obvious that reversing the order is any better
> when you throw in some if clauses:
>
> [friend for city in cities if city.name != "Amsterdam" for friend in
> city.friends if friend.name != "Guido"]
>
> vs.
>
> [friend for friend in city.friends if friend.name != "Guido" for city
> in cities if city.name != "Amsterdam"]
>
> --Guido
>

I think that it's still better, at least if you add some newlines:

[friend
(Ok, we are talking about a list of friends. Where do these
friends come from?)
for friend in city.friends if friend.name != "Guido"
(Ah, they are all the friends in a city who aren't called Guido. What
about the city?)
for city in cities if city.name != "Amsterdam"]
(Ah, the city is every city which isn't Amsterdam.)

Versus:

[friend
(Ok, we are talking about a list of friends. Where do these
friends come from?)
for city in cities if city.name != "Amsterdam"
(What do cities which aren't Amsterdam have to do with my friend?)
for friend in city.friends if friend.name != "Guido"]
(Ah, we're talking about all the friends in those cities who aren't
called Guido. Let's have a look at the first line to remember what we
do with them... ah, yes, we just return them...)

Noam
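
For reference, a small sketch of how the existing clause order with "if" parts maps onto nested statements. The City and Friend records are hypothetical stand-ins for the objects in the examples above:

```python
from collections import namedtuple

# Hypothetical records, for illustration only.
Friend = namedtuple("Friend", "name")
City = namedtuple("City", "name friends")

cities = [
    City("Amsterdam", [Friend("Anna")]),
    City("Haifa", [Friend("Guido"), Friend("Noam")]),
]

# Existing syntax: each clause nests inside the one to its left.
selected = [friend
            for city in cities if city.name != "Amsterdam"
            for friend in city.friends if friend.name != "Guido"]

# The same thing written as statements, read top to bottom.
expected = []
for city in cities:
    if city.name != "Amsterdam":
        for friend in city.friends:
            if friend.name != "Guido":
                expected.append(friend)

print([friend.name for friend in selected])
```

Each "if" filters the loop variable of the "for" immediately before it, so the city filter is applied before that city's friends are iterated at all.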

From tjreedy at udel.edu  Sun Sep 16 21:02:56 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 16 Sep 2007 15:02:56 -0400
Subject: [Python-3000] Stackless anyone ?
References: <1189949664.5502.3.camel@schlepp>
Message-ID: 


"Sascha Peilicke"  wrote in message 
news:1189949664.5502.3.camel at schlepp...
| is or has there been any discussion about stackless and py3k?

No.  C. Tismer has focused his current efforts on PyPy. 




From greg at krypto.org  Sun Sep 16 21:14:03 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 16 Sep 2007 12:14:03 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709160442g44100aean3da2890085eb643@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
	<66d0a6e10709151900l3d89b71u2e8b9bcb4b62b9f4@mail.gmail.com>
	
	<79990c6b0709160442g44100aean3da2890085eb643@mail.gmail.com>
Message-ID: <52dc1c820709161214o23160d2au6c80ae61f3e11bcd@mail.gmail.com>

On 9/16/07, Paul Moore  wrote:
> On 16/09/2007, Fred Drake  wrote:
> > On Sep 15, 2007, at 10:00 PM, Nicholas Bastin wrote:
> > > Then let's stop beating around the bush and implement an immutable
> > > bytes type.  Why put ourselves through contortions trying to jam a
> > > square peg into a round hole and not just decide to make a round peg?
> >
> > +42 !!!!
>
> I knew this would come up again when I made that comment - I
> deliberately didn't express an opinion then, as I didn't want to
> obscure the point. But I'll come off the fence and admit that I'm also
> in favour of an immutable bytes type (and for bytes literals to be of
> that type).
>
> Paul.

FYI - my first patch in the bytes object support for PyBUF_LOCKDATA
thread added support for immutable bytes.  I didn't add a hash method
yet but that should be trivial.

Should the hash method raise an exception when set_immutable has not
been called yet or should it call set_immutable?  I'm in favor of the
exception.  Side effects are bad.

-gps

From mathieu.fenniak at gmail.com  Sun Sep 16 22:19:57 2007
From: mathieu.fenniak at gmail.com (Mathieu Fenniak)
Date: Sun, 16 Sep 2007 14:19:57 -0600
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
	
	<443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	
Message-ID: <8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>

On 16-Sep-07, at 12:38 PM, Guido van Rossum wrote:
> I'm not doubting that *your* subclass works well enough. The problem
> is that it must be robust in the light of *any* subclass, no matter how
> crazy.

I understand that, but I'm not sure what kind of problems can be  
created by crazy subclasses.  But my imagination of "crazy subclass"  
is pretty limited.

> I'd have to understand more about your app to see whether subclassing
> truly makes sense.

I didn't want to flood the discussion with too many pointless
details, so here's the minimum that I think is relevant.  The
project is pyPdf, a library for reading and writing PDF files.  I've  
been working on making the library support unicode text strings  
within PDF documents.

In a PDF file, a "string" can either be a text string, or a byte  
string.  A string is a text string if it starts with a UTF-16BE BOM  
marker, or if it can be decoded using an encoding called  
PDFDocEncoding (which is specified by the PDF reference, similar to  
Latin-1 but different just to make life difficult).  pyPdf needs to  
be capable of reading and writing these string objects.  Whether a  
string is a byte or a text string, writing out the raw bytes is the  
same process after the text has been encoded.  This lends itself to a  
common StringObject base class:

class StringObject(PdfObject):
     # contains common behavior for both types of strings, such as
     # the ability to serialize out a byte array, encrypt/decrypt
     # strings for "secure" PDF files
     # also contains reading code that attempts to autodetect whether
     # the string is a byte or text string

class ByteStringObject(bytes, StringObject):
     # adds the byte array storage, and passes self back to
     # StringObject for serialization output

class TextStringObject(str, StringObject):
     # overrides the default output serialization to encode the
     # unicode string to match PDF's requirements,
     # passes the resulting byte array up for serialization.

(complete source code, if you're interested:
http://hg.pybrary.net/pyPdf-py3/file/fe0dc2014a1b/pyPdf/generic.py)

Deriving from the bytes type provides storage, and also direct & easy  
access to the byte array content.  I think in this case using bytes  
as a base type makes sense, at least as much as using str as a base  
type.  pyPdf derives from list and dict for different PDF object  
types in a similar manner as well.

Mathieu Fenniak

From greg.ewing at canterbury.ac.nz  Mon Sep 17 01:01:33 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 11:01:33 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
Message-ID: <46EDB5CD.6020205@canterbury.ac.nz>

Gregory P. Smith wrote:
> argv is the C/C++ name for bytes, let's not
> confuse people.

C has never made a clear distinction between characters
and bytes, using the type 'char' for both. It got away
with it for the same reason that Python did until
unicode came along. I'm pretty sure most people using
argv in C thought of it as holding characters. Certainly
I always did.

As far as I know, most other places in Python are going
to deal with the changes by keeping the existing text
APIs as returning text, e.g. open() gives you a text
mode I/O object by default with an assumed encoding,
and to get bytes you need to do something explicit
(e.g. opening the file in binary mode).

I don't see why argv should be different.

--
Greg
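
A short sketch of the open() behaviour referred to above: text by default, bytes only on explicit request. The temporary file and the sample string are assumptions made for the demonstration:

```python
import os
import tempfile

sample = "na\u00efve caf\u00e9"  # non-ASCII, so text and bytes clearly differ
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Default mode is text: read() returns str, decoded with the given encoding.
with open(path, encoding="utf-8") as f:
    text = f.read()

# Binary mode has to be requested explicitly: read() returns bytes.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, type(raw).__name__)
print(len(text), len(raw))
```

The argument in the message is that argv could follow the same pattern: text by default, with the raw bytes available through some explicit, separate route.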

From greg.ewing at canterbury.ac.nz  Mon Sep 17 01:03:54 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 11:03:54 +1200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
Message-ID: <46EDB65A.9040402@canterbury.ac.nz>

Paul Moore wrote:
> On 15/09/2007, Gregory P. Smith  wrote:
> 
>>similarly for the environment.  os.environ dict
>>should be bytes object keys and values
> 
> You can't have bytes as keys - the type isn't hashable...

Has there been any consensus reached yet on whether
there will be a frozenbytes type? I can see the
non-hashability of bytes leading to lots of
annoyances like this.

--
Greg
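
To make the hashability annoyance concrete, here is a sketch that assumes the Python 3 that was eventually released (immutable bytes, mutable bytearray) rather than the py3k branch under discussion in this thread:

```python
# Assumption: released Python 3, where bytes is immutable and hashable
# and bytearray is the mutable, unhashable type.
env = {b"HOME": b"/home/user"}  # an immutable bytes object works as a key

unhashable_error = None
try:
    {bytearray(b"HOME"): b"/home/user"}  # a mutable buffer does not
except TypeError as exc:
    unhashable_error = exc

print(env[b"HOME"])
print("mutable key rejected:", unhashable_error is not None)
```

A mutable type cannot safely provide a hash, because changing the contents after insertion would leave the key in the wrong hash bucket.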

From greg.ewing at canterbury.ac.nz  Mon Sep 17 01:32:05 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 11:32:05 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net> <46EC8909.4050300@canterbury.ac.nz>
	
Message-ID: <46EDBCF5.6090209@canterbury.ac.nz>

Guido van Rossum wrote:
> This sounds awfully close to bikeshedding.

I don't agree with that assessment. This is something
I've had in mind for quite a while. Python optimises
this for the *least* frequent use case, which is just
plain silly, as far as I can see. The only reason for
it is because that's the way C does it. That might be
called a foolish consistency.

> Change too many details
> like this and you cause death by a 1000 pinpricks for existing apps.

That's why I'm not suggesting that argv itself be
changed, but that something new be added for the
more frequent use cases.

--
Greg

From guido at python.org  Mon Sep 17 03:56:09 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 16 Sep 2007 18:56:09 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: <8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
References: 
	
	<443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
Message-ID: 

On 9/16/07, Mathieu Fenniak  wrote:
> On 16-Sep-07, at 12:38 PM, Guido van Rossum wrote:
> > I'm not doubting that *your* subclass works well enough. The problem
> > is that it must be robust in the light of *any* subclass, no matter how
> > crazy.
>
> I understand that, but I'm not sure what kind of problems can be
> created by crazy subclasses.  But my imagination of "crazy subclass"
> is pretty limited.
>
> > I'd have to understand more about your app to see whether subclassing
> > truly makes sense.
>
> I didn't want to flood too many pointless details into the
> discussion, so here's the minimum that I think is relevant.  The
> project is pyPdf, a library for reading and writing PDF files.  I've
> been working on making the library support unicode text strings
> within PDF documents.
>
> In a PDF file, a "string" can either be a text string, or a byte
> string.  A string is a text string if it starts with a UTF-16BE BOM
> marker, or if it can be decoded using an encoding called
> PDFDocEncoding (which is specified by the PDF reference, similar to
> Latin-1 but different just to make life difficult).  pyPdf needs to
> be capable of reading and writing these string objects.  Whether a
> string is a byte or a text string, writing out the raw bytes is the
> same process after the text has been encoded.  This lends itself to a
> common StringObject base class:
>
> class StringObject(PdfObject):
>      # contains common behavior for both types of strings, such as
>      # the ability to serialize out a byte array, encrypt/decrypt
>      # strings for "secure" PDF files
>      # also contains reading code that attempts to autodetect whether
>      # the string is a byte or text string
>
> class ByteStringObject(bytes, StringObject):
>      # adds the byte array storage, and passes self back to
>      # StringObject for serialization output
>
> class TextStringObject(str, StringObject):
>      # overrides the default output serialization to encode the
>      # unicode string to match PDF's requirements,
>      # passes the resulting byte array up for serialization.
>
> (complete source code, if you're interested:
> http://hg.pybrary.net/pyPdf-py3/file/fe0dc2014a1b/pyPdf/generic.py)
>
> Deriving from the bytes type provides storage, and also direct & easy
> access to the byte array content.  I think in this case using bytes
> as a base type makes sense, at least as much as using str as a base
> type.  pyPdf derives from list and dict for different PDF object
> types in a similar manner as well.

So suppose my answer was "no, bytes won't be subclassable". How much
would you really lose by having to wrap a separate object around a
bytes object, rather than being able to subclass? How much extra code
do you think you would have to write?

Another way to look at it-- how much of the bytes type's API do your
objects really have to support?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mathieu.fenniak at gmail.com  Mon Sep 17 05:19:34 2007
From: mathieu.fenniak at gmail.com (Mathieu Fenniak)
Date: Sun, 16 Sep 2007 21:19:34 -0600
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
	
	<443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	
Message-ID: 

On 16-Sep-07, at 7:56 PM, Guido van Rossum wrote:
> So suppose my answer was "no, bytes won't be subclassable". How much
> would you really lose by having to wrap a separate object around a
> bytes object, rather than being able to subclass? How much extra code
> do you think you would have to write?
>
> Another way to look at it-- how much of the bytes type's API do your
> objects really have to support?

Most often, I'd be concatenating and comparing with other bytes  
objects, iterating through the byte array, and passing the byte array  
into methods like stream.write.  Iterating and comparing could be  
dealt with by some code in the containing class; for other needs, I  
would sprinkle ".data" property accesses throughout the code to  
access the bytes instance.

I'm not too concerned about the programming I'd have to do, even  
though the end result wouldn't really be what I'd like to have.  It's  
not the end of the world, it's just not ideal.

I do think that subclassing bytes would probably be a request a few  
people would have, especially when porting Python 2 code that  
subclasses str.  It seems especially unusual that bytes can't be  
subclassed, when builtin types like str, list, dict, and set can be.

Mathieu
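
A rough sketch of the wrapper alternative described above, assuming a hypothetical ByteStringObject that holds a bytes value in a ".data" attribute instead of deriving from bytes. This is not pyPdf's actual code:

```python
import io


class ByteStringObject:
    """Hypothetical wrapper: contains a bytes value rather than being one."""

    def __init__(self, data):
        self.data = bytes(data)

    def __eq__(self, other):
        if isinstance(other, ByteStringObject):
            return self.data == other.data
        if isinstance(other, (bytes, bytearray)):
            return self.data == other
        return NotImplemented

    def __hash__(self):
        return hash(self.data)

    def __iter__(self):
        return iter(self.data)

    def __add__(self, other):
        other_data = other.data if isinstance(other, ByteStringObject) else other
        return ByteStringObject(self.data + other_data)


obj = ByteStringObject(b"hel") + b"lo"

# Anything that wants real bytes needs the explicit .data access.
stream = io.BytesIO()
stream.write(obj.data)
print(stream.getvalue())
```

Comparison, iteration and concatenation each need a forwarding method, and every call that expects real bytes needs ".data", which is the extra code being weighed against subclassing.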

From greg.ewing at canterbury.ac.nz  Mon Sep 17 05:58:18 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 15:58:18 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46ED1B30.9030708@hastings.org>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net> <46EC8909.4050300@canterbury.ac.nz>
	
	<46ED1B30.9030708@hastings.org>
Message-ID: <46EDFB5A.2010808@canterbury.ac.nz>

Larry Hastings wrote:
> If nothing else, at least we could fix the proviso for argv[0]: "(it is 
> operating system dependent whether this is a full pathname or not)."  

It's actually worse than that -- you're entirely at the
mercy of whatever made the exec() call as to whether it's
a meaningful path at all.

Most programs are courteous enough to make sure it's at
least a relative path to the executable being run, but
you can't rely on that.

I'm not sure munging argv[0] to an absolute path is the
right thing to do, if it's to be regarded as a low-level
thing. A program wanting low-level access to argv might
want to know exactly what was passed to exec() for some
reason.

A separate sys.absolute_path_to_executable() or
something might be better.
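
As a sketch only: a separate helper along those lines could leave sys.argv untouched and derive an absolute path on request. The name script_path is hypothetical, and the result is only as meaningful as whatever ended up in sys.argv[0]:

```python
import os
import sys


def script_path():
    """Best-effort absolute path derived from sys.argv[0] (hypothetical helper).

    Returns None when argv is empty or argv[0] is an empty string.
    """
    if not sys.argv or not sys.argv[0]:
        return None
    # abspath() resolves against the current working directory, so this
    # should be called before the program changes directory.
    return os.path.abspath(sys.argv[0])


print("raw argv[0]:", repr(sys.argv[0]) if sys.argv else None)
print("absolute:   ", script_path())
```

Keeping the raw value available alongside the derived one serves both the low-level and the convenient use case.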

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Mon Sep 17 06:04:11 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 16:04:11 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net> <46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
Message-ID: <46EDFCBB.8010306@canterbury.ac.nz>

Thomas Wouters wrote:
> If you want to put more meaning in the argv list, use an option 
> parser.

I want to put *less* meaning in it, not more. :-)
And using an argument parser is often overkill for
simple programs.

> The _actual_ meaning of each element depends entirely on the 
> program that's started. For Python-the-language, there isn't any 
> difference between them.

So in your Python programs, you're quite happy
to write

   for arg in sys.argv:
     process(arg)

and not care about what this does with argv[0]?

I hardly see how one can claim that there's
"no difference" between argv[0] and the rest
for practical purposes.
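
The usual workaround looks something like the sketch below, where process() is a placeholder for whatever the program does with each argument:

```python
import sys


def process(arg):
    # Placeholder for real per-argument work.
    return arg.upper()


def main(argv):
    # argv[0] names the program; only the remainder are real arguments.
    program, args = argv[0], argv[1:]
    return program, [process(arg) for arg in args]


if __name__ == "__main__":
    program, results = main(sys.argv)
    print(program, results)
```

Nearly every script ends up writing the argv[1:] slice, which is the "more frequent use case" argument in a nutshell.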

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Mon Sep 17 06:08:31 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 17 Sep 2007 16:08:31 +1200
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
	
Message-ID: <46EDFDBF.7010307@canterbury.ac.nz>

Guido van Rossum wrote:
> It is possible to compromise the integrity of a built-in type by
> subclassing it if the type wasn't carefully written to expect
> subclassing.

Disallowing subclassing in Python may make sense, but
it seems unreasonable not to allow subclassing by
consenting C code that is careful not to compromise
any integrity.

Maybe there should be two flags for this instead of
just one?
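
For readers unfamiliar with the flag: the effect of a missing Py_TPFLAGS_BASETYPE is visible from Python as a TypeError at class-creation time. The sketch below uses bool, which is not an acceptable base type in CPython, and assumes a released Python 3 in which bytes does accept subclasses:

```python
# bool lacks the base-type flag, so the class statement itself fails.
bool_error = None
try:
    class MyBool(bool):
        pass
except TypeError as exc:
    bool_error = exc


# Assumption: released Python 3, where bytes can be subclassed.
class MyBytes(bytes):
    pass


print("bool subclass rejected:", bool_error is not None)
print(MyBytes(b"abc") == b"abc")
```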

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From stephen at xemacs.org  Mon Sep 17 07:55:54 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 17 Sep 2007 14:55:54 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46ECDA66.3040702@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46ECDA66.3040702@v.loewis.de>
Message-ID: <878x75ltsl.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > > The basic idea is to allocate code points in private space as-needed.
 > 
 > Ok, thanks. Would you be interested in implementing that scheme?

Yes.  I'm recovering from moving from Japan to California and will be
busy until the beginning of October; I'll get started on it then.  For
this kind of thing, what is the deadline for submission of a patch?
Before the alpha, early beta?


From guido at python.org  Mon Sep 17 16:55:43 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 07:55:43 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: <46EDFDBF.7010307@canterbury.ac.nz>
References: 
	
	<46EDFDBF.7010307@canterbury.ac.nz>
Message-ID: 

On 9/16/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
> > It is possible to compromise the integrity of a built-in type by
> > subclassing it if the type wasn't carefully written to expect
> > subclassing.
>
> Disallowing subclassing in Python may make sense, but
> it seems unreasonable not to allow subclassing by
> consenting C code that is careful not to compromise
> any integrity.
>
> Maybe there should be two flags for this instead of
> just one?

AFAIK there's nothing stopping you from subclassing in C. I thought we
were talking about Python though.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Sep 17 17:00:20 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 08:00:20 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
	
	<443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	
	
Message-ID: 

On 9/16/07, Mathieu Fenniak  wrote:
> On 16-Sep-07, at 7:56 PM, Guido van Rossum wrote:
> > So suppose my answer was "no, bytes won't be subclassable". How much
> > would you really lose by having to wrap a separate object around a
> > bytes object, rather than being able to subclass? How much extra code
> > do you think you would have to write?
> >
> > Another way to look at it-- how much of the bytes type's API do your
> > objects really have to support?
>
> Most often, I'd be concatenating and comparing with other bytes
> objects, iterating through the byte array, and passing the byte array
> into methods like stream.write.  Iterating and comparing could be
> dealt with by some code in the containing class; for other needs, I
> would sprinkle ".data" property accesses throughout the code to
> access the bytes instance.

OK, so it sounds like at least you are treating it as a bytes array.

> I'm not too concerned about the programming I'd have to do, even
> though the end result wouldn't really be what I'd like to have.  It's
> not the end of the world, it's just not ideal.
>
> I do think that subclassing bytes would probably be a request a few
> people would have, especially when porting Python 2 code that
> subclasses str.

Well, due to bytes' mutability, they'll be in for a ton of surprises.

>  It seems especially unusual that bytes can't be
> subclassed, when builtin types like str, list, dict, and set can be.

Maybe I should apologize for pushing back so hard, but in my
experience most people who subclass a built-in type do it because they
can, not because they should -- the lamented "path" module being a
prime example in my view.

I'm still not convinced of the usefulness in your case -- what would
you lose if you just passed a bytes instance around instead of an
instance of the subclass you'd like to have?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rrr at ronadam.com  Mon Sep 17 17:51:18 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 17 Sep 2007 10:51:18 -0500
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EDFCBB.8010306@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>	<46EB0EC2.4030208@canterbury.ac.nz>	<46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz>
Message-ID: <46EEA276.4040901@ronadam.com>



Greg Ewing wrote:
> Thomas Wouters wrote:
>> If you want to put more meaning in the argv list, use an option 
>> parser.
> 
> I want to put *less* meaning in it, not more. :-)
> And using an argument parser is often overkill for
> simple programs.

Would it be possible to split out the (pre) parsing from optparse so that 
instead of returning a list, it returns a dictionary of attributes and values?

This would only contain what was given in the command line as a first 
"lighter weight" step to parsing the command line.

    opts = opt_parser(argv)
    command_name = opts['argv0']   # better name for argv0?

Or...

    opts = opt_parser(argv)
    if "-h" in opts or "--h" in opts:
       print("Help on {argv0}: ...".format(**opts))


If the dictionary was pre defined with defaults it might be more like..

     opts = {'-h':False, '--h':False}
     opts.update(opt_parser(argv))

     if opts['-h'] or opts['--h']:
        print("Help on {argv0}: ...".format(**opts))


This avoids the loop for the simplest cases.

A second dispatcher/validator object could then use this as input.


Regards,
   Ron
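
One possible shape for such a pre-parsing step, as a sketch under stated assumptions: opt_parser here is a hypothetical free function, not part of optparse, and the "argv0" and "args" keys are invented for the example.

```python
def opt_parser(argv):
    """Hypothetical pre-parser: record what was given, validate nothing."""
    opts = {"argv0": argv[0], "args": []}
    for token in argv[1:]:
        if token.startswith("-") and token != "-":
            # "--name=value" stores the value; a bare flag stores True.
            name, sep, value = token.partition("=")
            opts[name] = value if sep else True
        else:
            opts["args"].append(token)
    return opts


opts = opt_parser(["prog", "-h", "--level=3", "input.txt"])

if "-h" in opts or "--h" in opts:
    print("Help on {argv0}: ...".format(**opts))
```

A second dispatcher or validator stage could then take this dictionary as input and reject unknown options.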



>> The _actual_ meaning of each element depends entirely on the 
>> program that's started. For Python-the-language, there isn't any 
>> difference between them.
> 
> So in your Python programs, you're quite happy
> to write
> 
>    for arg in sys.argv:
>      process(arg)
> 
> and not care about what this does with argv[0]?
> 
> I hardly see how one can claim that there's
> "no difference" between argv[0] and the rest
> for practical purposes.
> 

From mathieu.fenniak at gmail.com  Mon Sep 17 18:44:18 2007
From: mathieu.fenniak at gmail.com (Mathieu Fenniak)
Date: Mon, 17 Sep 2007 10:44:18 -0600
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: 
	
	<443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	
	
	
Message-ID: <96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>

On 17-Sep-07, at 9:00 AM, Guido van Rossum wrote:
> Maybe I should apologize for pushing back so hard, but in my
> experience most people who subclass a built-in type do it because they
> can, not because they should -- the lamented "path" module being a
> prime example in my view.
>
> I'm still not convinced of the usefulness in your case -- what would
> you lose if you just passed a bytes instance around instead of an
> instance of the subclass you'd like to have?

The builtin type subclasses in pyPdf (including the would-be bytes  
subclass) add additional methods that every pdf object is expected to  
support.  All the PDF object types have two additional methods  
(writeToStream and getObject) that have varying behavior for each  
class: (relatively inconsequential PDF information follows)

The "writeToStream" method serializes the object -- a byte string
would write out <68656c6c6f>, a text string (hello), and so on for
other more complex types (dictionaries, labels, arrays, PDF data  
streams).  The type is also responsible for encrypting itself when  
applicable.

PDF files also have an ability to reference objects elsewhere in the  
file.  For example, the length of a content stream can be a simple  
"500 bytes", or it can be "read this length at offset X in the  
file".  Since almost any object can be an indirect object reference,  
the library objects support a "getObject" method that returns self --  
excluding PDF "indirect object reference" objects, which read an  
object from a table in a PDF file.
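
As a minimal sketch of the interface described above (not pyPdf's actual code): the class names PdfByteString and PdfIndirectReference are hypothetical, bytes.hex() assumes a current Python 3, and the whole thing assumes bytes can be subclassed, which is what this thread is asking for.

```python
import io


class PdfByteString(bytes):
    """Hypothetical bytes subclass carrying the two extra PDF methods."""

    def writeToStream(self, stream):
        # A PDF byte string is written as hex digits inside <...>.
        stream.write(b"<" + self.hex().encode("ascii") + b">")

    def getObject(self):
        # Direct objects simply return themselves.
        return self


class PdfIndirectReference:
    """Hypothetical reference that looks its target up in an object table."""

    def __init__(self, table, key):
        self.table = table
        self.key = key

    def getObject(self):
        # Indirect references resolve to the object stored elsewhere.
        return self.table[self.key]


out = io.BytesIO()
hello = PdfByteString(b"hello")
hello.writeToStream(out)
serialized = out.getvalue()
resolved = PdfIndirectReference({7: hello}, 7).getObject()
```

Callers can then invoke getObject() on anything they are handed without first checking whether it is a reference.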

If you decide that bytes should be subclassable, I've included with  
this e-mail a patch that adds the basetype bit, adds some unit tests  
for bytes subclasses, and includes __dict__ in the bytes_reduce  
method (for pickling subclass instances).  I was going to upload this  
to the SF patch manager, but it appears to be open only to project  
members.

Mathieu Fenniak

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bytes-subclass-patch.diff
Type: application/octet-stream
Size: 4933 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070917/317ebd40/attachment.obj 

From tjreedy at udel.edu  Mon Sep 17 19:27:42 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 17 Sep 2007 13:27:42 -0400
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
References: <443E4C09-853C-474F-9150-A0EFA5418154@gmail.com><8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
Message-ID: 


"Mathieu Fenniak"  wrote in message 
news:96487E1C-4BA3-4FEC-9080-2C09AD330197 at gmail.com...
| method (for pickling subclass instances).  I was going to upload this
| to the SF patch manager, but it appears to be closed to permit only
| project members access.

Because SF is only an archive now.
We are now using bugs.python.org.
Your SF account may have carried over.  If not, sign up is easy.




From guido at python.org  Mon Sep 17 19:53:58 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 10:53:58 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: <96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
References: <443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
Message-ID: 

On 9/17/07, Mathieu Fenniak  wrote:
> On 17-Sep-07, at 9:00 AM, Guido van Rossum wrote:
> > Maybe I should apologize for pushing back so hard, but in my
> > experience most people who subclass a built-in type do it because they
> > can, not because they should -- the lamented "path" module being a
> > prime example in my view.
> >
> > I'm still not convinced of the usefulness in your case -- what would
> > you lose if you just passed a bytes instance around instead of an
> > instance of the subclass you'd like to have?
>
> The builtin type subclasses in pyPdf (including the would-be bytes
> subclass) add additional methods that every pdf object is expected to
> support.  All the PDF object types have two additional methods
> (writeToStream and getObject) that have varying behavior for each
> class: (relatively inconsequential PDF information follows)
>
> "writeToStream" method that serializes the object -- a byte string
> would write out <68656c6c6f>, a text string (hello), and so on for
> other more complex types (dictionaries, labels, arrays, PDF data
> streams).  The type is also responsible for encrypting itself when
> applicable.

This sounds like a perfect application for generic functions instead
of subclassing.
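
A rough sketch of what the generic-function approach could look like, assuming functools.singledispatch from a current Python 3 standard library (it did not exist when this was written). The function name write_to_stream and the simplified text-string handling are assumptions made for illustration.

```python
import io
from functools import singledispatch


@singledispatch
def write_to_stream(obj, stream):
    # Fallback for types with no registered serializer.
    raise TypeError("no PDF serializer for " + type(obj).__name__)


@write_to_stream.register(bytes)
def _write_bytes(obj, stream):
    # Byte strings are written as hex digits inside <...>.
    stream.write(b"<" + obj.hex().encode("ascii") + b">")


@write_to_stream.register(str)
def _write_text(obj, stream):
    # Simplification: no escaping of parentheses or backslashes.
    stream.write(b"(" + obj.encode("latin-1") + b")")


out = io.BytesIO()
write_to_stream(b"hello", out)
write_to_stream("hello", out)
result = out.getvalue()
```

Here plain bytes and str instances are passed around, and the per-type behavior lives in the registered functions rather than in subclasses.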

> PDF files also have an ability to reference objects elsewhere in the
> file.  For example, the length of a content stream can be a simple
> "500 bytes", or it can be "read this length at offset X in the
> file".  Since almost any object can be an indirect object reference,
> the library objects support a "getObject" method that returns self --
> excluding PDF "indirect object reference" objects, which read an
> object from a table in a PDF file.

Similar.

> If you decide that bytes should be subclassable, I've included with
> this e-mail a patch that adds the basetype bit, adds some unit tests
> for bytes subclasses, and includes __dict__ in the bytes_reduce
> method (for pickling subclass instances).  I was going to upload this
> to the SF patch manager, but it appears to be closed to permit only
> project members access.
>
> Mathieu Fenniak
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mathieu.fenniak at gmail.com  Mon Sep 17 20:33:18 2007
From: mathieu.fenniak at gmail.com (Mathieu Fenniak)
Date: Mon, 17 Sep 2007 12:33:18 -0600
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: <443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
Message-ID: 

On 17-Sep-07, at 11:53 AM, Guido van Rossum wrote:
>> "writeToStream" method that serializes the object -- a byte string
>> would write out <68656c6c6f>, a text string (hello), and so on for
>> other more complex types (dictionaries, labels, arrays, PDF data
>> streams).  The type is also responsible for encrypting itself when
>> applicable.
>
> This sounds like a perfect application for generic functions instead
> of subclassing.

Sure, there are other options for writing and organizing this code.   
But, this is a valid application of subclassing the bytes type, and  
it is the method I would prefer to be able to implement.

Mathieu

From guido at python.org  Mon Sep 17 20:44:33 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 11:44:33 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: <443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
Message-ID: 

I understand. But bytes are still in flux (see the repeated requests
for immutable bytes) and I don't want to commit to anything just yet.

On 9/17/07, Mathieu Fenniak  wrote:
> On 17-Sep-07, at 11:53 AM, Guido van Rossum wrote:
> >> "writeToStream" method that serializes the object -- a byte string
> >> would write out <68656c6c6f>, a text string (hello), and so on for
> >> other more complex types (dictionaries, labels, arrays, PDF data
> >> streams).  The type is also responsible for encrypting itself when
> >> applicable.
> >
> > This sounds like a perfect application for generic functions instead
> > of subclassing.
>
> Sure, there are other options for writing and organizing this code.
> But, this is a valid application of subclassing the bytes type, and
> it is the method I would prefer to be able to implement.
>
> Mathieu
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From qrczak at knm.org.pl  Mon Sep 17 21:12:00 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Mon, 17 Sep 2007 21:12:00 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1190056321.14217.21.camel@qrnik>

On Sat, 15-09-2007 at 09:13 +0900, Stephen J. Turnbull wrote:

>  > Well, for any scheme which attempts to modify UTF-8 by accepting
>  > arbitrary byte strings is used, *something* must be interpreted
>  > differently than in real UTF-8.
> 
> Wrong.  In my scheme everything ends up in the PUA, on which real
> UTF-8 imposes no interpretation by definition.

This is wrong: UTF-8 is specified for PUA. PUA is not special from the
point of view of UTF-8. UTF-8 is defined for all Unicode scalar values,
i.e. all code points in the ranges U+0000..U+D7FF and U+E000..U+10FFFF,
i.e. all code points excluding surrogates. This includes PUA.
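
To make the ranges concrete, here is a small sketch using the UTF-8 codec of a current Python 3. The helper name is_scalar_value is made up for this example.

```python
def is_scalar_value(code_point):
    """True for U+0000..U+D7FF and U+E000..U+10FFFF."""
    return 0 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


pua_char = "\ue650"                    # a private use character
pua_bytes = pua_char.encode("utf-8")   # encodes like any other scalar value

try:
    "\ud800".encode("utf-8")           # a lone surrogate is not a scalar value
    surrogate_encodes = True
except UnicodeEncodeError:
    surrogate_encodes = False
```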

> I haven't gone back to check yet, but it's possible that a "real UTF-8
> conforming process" is required to stop processing and issue an error
> or something like that in the cases we're trying to handle.

"C10. When a process interprets a code unit sequence which purports to
be in a Unicode character encoding form, it shall treat ill-formed code
unit sequences as an error condition and shall not interpret such
sequences as characters."
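
As an illustration of the error condition C10 describes, a minimal sketch with the strict UTF-8 decoder of a current Python 3 (an assumption; the py3k behavior under discussion here was not yet settled):

```python
ill_formed = b"abc\xff"    # 0xFF never appears in well-formed UTF-8

try:
    ill_formed.decode("utf-8")    # strict error handling is the default
    rejected = False
except UnicodeDecodeError as exc:
    rejected = True
    bad_offset = exc.start        # index of the offending byte
```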

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From martin at v.loewis.de  Mon Sep 17 23:45:50 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 17 Sep 2007 23:45:50 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <878x75ltsl.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>		<46E96E98.9080406@v.loewis.de>	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA0778.3000502@canterbury.ac.nz>	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EA1734.6020103@canterbury.ac.nz>	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EB0DC0.3050906@canterbury.ac.nz>	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>	<46EB6EA1.5020104@v.loewis.de>	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>	<46ECDA66.3040702@v.loewis.de>
	<878x75ltsl.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46EEF58E.5050809@v.loewis.de>

> Yes.  I'm recovering from moving from Japan to California, and will be
> busy until the beginning of October, I'll get started on it then.  For
> this kind of thing, what is the deadline for submission of a patch?
> Before the alpha, early beta?

Either would work fine, unless somebody else does it first.

Regards,
Martin

From qrczak at knm.org.pl  Tue Sep 18 01:06:54 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 18 Sep 2007 01:06:54 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1190070414.20673.12.camel@qrnik>

On Sun, 16-09-2007 at 16:13 +0900, Stephen J. Turnbull wrote:

> When a codec encounters something it can't handle, whether it's a
> valid character in a legacy encoding, a private use character in a
> UTF, or an invalid sequence of code units, it throws an exception
> specifying the character or code unit and the current coded character
> set,

Does this mean that this:
$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
would no longer print e650 in a UTF-8 locale, assuming a shell which
understands the escape sequence in printf, and the script would have
to make special arrangements to make the character available? U+E650
is a private use character.

If so, I'm violently against this.

> This definitely requires that the Unicode codecs be modified to do the
> right thing if they encounter private use characters in the input
> stream or output stream.

The right thing is to encode or decode private use characters according
to the regular codec rules, as the implementations of these codecs in
all other languages do.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From mike.klaas at gmail.com  Tue Sep 18 01:41:57 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Mon, 17 Sep 2007 16:41:57 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46EDB65A.9040402@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<52dc1c820709151336y36f753ffw6359da92ff8dd2e@mail.gmail.com>
	<79990c6b0709151429q52f06744i86f4b9020d6ce639@mail.gmail.com>
	<46EDB65A.9040402@canterbury.ac.nz>
Message-ID: <8861D3FA-ABD1-4C6B-BA5D-A897D3B36194@gmail.com>

On 16-Sep-07, at 4:03 PM, Greg Ewing wrote:

> Paul Moore wrote:
>> On 15/09/2007, Gregory P. Smith  wrote:
>>
>>> similarly for the environment.  os.environ dict
>>> should be bytes object keys and values
>>
>> You can't have bytes as keys - the type isn't hashable...
>
> Has there been any consensus reached yet on whether
> there will be a frozenbytes type? I can see the
> non-hashability of bytes leading to lots of
> annoyances like this.

Might it make things clearer to use something other than the
X/frozenX nomenclature?

bytes -> b'HELO' -> immutable octet list

bytebuf -> mutable octet buffer (current bytes() objects)

buf = bytebuf()
buf.append(read(1024))
print buf

bytebuf(b'HELO')
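
For comparison, a sketch of the same immutable/mutable split using the names found in current Python 3, where the mutable type is called bytearray rather than the bytebuf proposed here. The dictionary is only there to show hashability.

```python
frozen = b"HELO"        # immutable octet sequence
buf = bytearray()       # mutable octet buffer
buf.extend(b"HELO")     # extend, not append: append takes a single int

environ_like = {frozen: b"value"}    # immutable bytes can be dict keys

try:
    {buf: b"value"}                  # the mutable buffer is not hashable
    buffer_hashable = True
except TypeError:
    buffer_hashable = False
```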

-Mike

From greg.ewing at canterbury.ac.nz  Tue Sep 18 02:07:51 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 18 Sep 2007 12:07:51 +1200
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: <46EDFDBF.7010307@canterbury.ac.nz>
Message-ID: <46EF16D7.2010402@canterbury.ac.nz>

Guido van Rossum wrote:
> AFAIK there's nothing stopping you from subclassing in C.

That may be true -- I may have just incorrectly assumed
that the flag would prevent subclassing from working
properly in C as well.

> I thought we were talking about Python though.

That may be true as well. I think I got mixed up with
a discussion about adding an immutable bytes type as
a builtin.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Sep 18 02:13:14 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 18 Sep 2007 12:13:14 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EEA276.4040901@ronadam.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net> <46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz> <46EEA276.4040901@ronadam.com>
Message-ID: <46EF181A.5040907@canterbury.ac.nz>

Ron Adam wrote:
> Would it be possible to split out the (pre) parsing from optparse so 
> that instead of returning a list

Whatever is done, anything putting itself forward as a light
duty argument parser has to have a *very* simple API. Neither
of the current ones fits my brain, and I have to go looking
up the docs every time I want to use them.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Sep 18 02:21:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 18 Sep 2007 12:21:38 +1200
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: 
References: <443E4C09-853C-474F-9150-A0EFA5418154@gmail.com>
	<8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
Message-ID: <46EF1A12.4050608@canterbury.ac.nz>

Guido van Rossum wrote:
> I understand. But bytes are still in flux (see the repeated requests
> for immutable bytes)

Moreover, my feeling is that immutable bytes should be
the *default*, and if you want mutable bytes you
should have to ask for it.

This would make bytes more symmetrical with strings,
where immutability is the default, and if you want
mutability you use an array.array('c') or whatever
the equivalent will be in py3k.

It would also help to settle the question of
whether b"xyz" should be mutable -- clearly not,
for symmetry with strings.

--
Greg

From guido at python.org  Tue Sep 18 02:45:23 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 17:45:23 -0700
Subject: [Python-3000] bytes & Py_TPFLAGS_BASETYPE
In-Reply-To: <46EF1A12.4050608@canterbury.ac.nz>
References: <8D27D90B-EF8C-42FB-B104-FB98EC65DC1E@gmail.com>
	<96487E1C-4BA3-4FEC-9080-2C09AD330197@gmail.com>
	<46EF1A12.4050608@canterbury.ac.nz>
Message-ID: 

On 9/17/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
> > I understand. But bytes are still in flux (see the repeated requests
> > for immutable bytes)
>
> Moreover, my feeling is that immutable byte should be
> the *default*, and if you want mutable bytes you
> should have to ask for it.
>
> This would make bytes more symmetrical with strings,
> where immutability is the default, and if you want
> mutability you use an array.array('c') or whatever
> the equivalent will be in py3k.
>
> It would also help to settle the question of
> whether b"xyz" should be mutable -- clearly not,
> for symmetry with strings.

I'm considering the following option -- it would help if someone
explored creating a patch to implement this, just to see the minimum
amount of code that would need to change compared to 3.0a1: bytes are
always immutable, and for the few places where a mutable bytes buffer
would be handy, we use the array module. Then it would also make sense
to make b[0] return a bytes array of length 1 instead of a small int
-- bytes would be more similar to str in 2.x, albeit completely
incompatible with str in terms of mixed operations.
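
A minimal sketch of using the array module as the mutable byte buffer, written against the array API of a current Python 3 (frombytes and tobytes are assumptions relative to 3.0a1):

```python
from array import array

buf = array("B", b"abc")    # "B" = unsigned byte
buf[0] = ord("x")           # in-place modification
buf.frombytes(b"de")        # append more raw bytes
snapshot = buf.tobytes()    # immutable bytes copy of the buffer
```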

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 18 04:18:01 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 17 Sep 2007 19:18:01 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
Message-ID: 

This may have passed in a thread where no-one was listening, so I'm
repeating it here.

I'm considering the following option: bytes would always be immutable,
and for the few places (mostly in io.py) where a mutable bytes buffer
would be handy, we use the array module. Then it would also make sense
to make b[0] return a bytes array of length 1 instead of a small int
-- bytes would be more similar to str in 2.x, albeit completely
incompatible with str in terms of mixed operations.
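
For reference, a small sketch of the two indexing behaviors being weighed here, as they look in a current Python 3. There, plain indexing returns a small int and a one-element slice returns the length-1 bytes object described above. This shows released behavior, not the option under consideration.

```python
data = b"xyz"

first_as_int = data[0]        # the integer value of "x"
first_as_bytes = data[0:1]    # a bytes object of length 1

try:
    data[0] = 0               # item assignment is rejected on immutable bytes
    mutable = True
except TypeError:
    mutable = False
```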

It would help if someone explored creating a patch to implement this,
just to see the minimum amount of code that would need to change
compared to 3.0a1. (The challenge includes making all the tests pass
again.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org  Tue Sep 18 04:32:47 2007
From: talin at acm.org (Talin)
Date: Mon, 17 Sep 2007 19:32:47 -0700
Subject: [Python-3000] Stackless anyone ?
In-Reply-To: 
References: <1189949664.5502.3.camel@schlepp> 
Message-ID: <46EF38CF.4020801@acm.org>

Terry Reedy wrote:
> "Sascha Peilicke"  wrote in message 
> news:1189949664.5502.3.camel at schlepp...
> | is or has there been any discussion about stackless and py3k?
> 
> No.  C. Tismer has focused his current efforts on PyPy. 

That seems like the right strategy to me. Rather than focusing on a 
specific implementation, it seems better to me to work on an abstract 
representation of the Python language which can be "rendered" into 
various implementations.

I think for those people in that other thread about threads (which I 
won't mention by name for fear of bringing that thread over here), that 
the ultimate solution to Python concurrency won't be via patching 
CPython, but to compile the meta-Python language to a back-end 
representation that is inherently concurrent.

-- Talin

From talin at acm.org  Tue Sep 18 04:42:04 2007
From: talin at acm.org (Talin)
Date: Mon, 17 Sep 2007 19:42:04 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
Message-ID: <46EF3AFC.7000904@acm.org>

Guido van Rossum wrote:
> This may have passed in a thread where no-one was listening, so I'm
> repeating it here.
> 
> I'm considering the following option: bytes would always be immutable,
> and for the few places (mostly in io.py) where a mutable bytes buffer
> would be handy, we use the array module. Then it would also make sense
> to make b[0] return a bytes array of length 1 instead of a small int
> -- bytes would be more similar to str in 2.x, albeit completely
> incompatible with str in terms of mixed operations.
> 
> It would help if someone explored creating a patch to implement this,
> just to see the minimum amount of code that would need to change
> compared to 3.0a1. (The challenge includes making all the tests pass
> again.)

I don't know if I mentioned this before, since (a) I didn't want to be a 
distraction while you were busy trying to make mutable bytes work 
everywhere, and (b) I didn't want to sound completely insane. However - 
here is my vision of how things would look in an ideal world:

Data Type   AbstractSequence  Immutable   Mutable
=========   ================  =========   =======
byte        ByteSequence      bytes       buffer
character   CharSequence      str         strbuf

'buffer' could be an array.array, although if it's used frequently 
enough an optimized special-case 'buffer' class might be better. And it 
can have methods that array doesn't have.
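
A loose approximation of the table above using what a current Python 3 provides. The names ByteSequence, CharSequence and strbuf do not exist there, so this sketch substitutes collections.abc.Sequence and MutableSequence for the abstract column and bytearray for 'buffer'. The standard library has no mutable counterpart to str.

```python
from collections.abc import MutableSequence, Sequence

# Immutable column: read-only sequences.
bytes_is_sequence = isinstance(b"", Sequence)
bytes_is_mutable = isinstance(b"", MutableSequence)
str_is_sequence = isinstance("", Sequence)
str_is_mutable = isinstance("", MutableSequence)

# Mutable column: bytearray plays the role of 'buffer'.
bytearray_is_mutable = isinstance(bytearray(), MutableSequence)
```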

-- Talin

From rrr at ronadam.com  Tue Sep 18 05:28:28 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 17 Sep 2007 22:28:28 -0500
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EF181A.5040907@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>	<46EA5114.9060200@coli.uni-saarland.de>	<46EB0EC2.4030208@canterbury.ac.nz>	<46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>	<46EDFCBB.8010306@canterbury.ac.nz>
	<46EEA276.4040901@ronadam.com> <46EF181A.5040907@canterbury.ac.nz>
Message-ID: <46EF45DC.50501@ronadam.com>



Greg Ewing wrote:
> Ron Adam wrote:
>> Would it be possible to split out the (pre) parsing from optparse so 
>> that instead of returning a list
> 
> Whatever is done, anything putting itself forward as a light
> duty argument parser has to have a *very* simple API. Neither
> of the current ones fits my brain, and I have to go looking
> up the docs every time I want to use them.

I agree.  I like reusing dictionaries and lists when possible over special 
types because I don't have to look up how to use them.

Ron

From stephen at xemacs.org  Tue Sep 18 06:08:29 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 18 Sep 2007 13:08:29 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1190056321.14217.21.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
Message-ID: <18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>

>>>>> "Marcin 'Qrczak' Kowalczyk"  writes:
 >>  > Well, for any scheme which attempts to modify UTF-8 by accepting
 >>  > arbitrary byte strings is used, *something* must be interpreted
 >>  > differently than in real UTF-8.

 >> Wrong.  In my scheme everything ends up in the PUA, on which real
 >> UTF-8 imposes no interpretation by definition.

 > This is wrong: UTF-8 is specified for PUA. PUA is no special from the
 > point of view of UTF-8.

It is from the point of view of the Unicode standard, specifically v5.
Please see section 16.5, especially about the "corporate use subarea".
(No, I hadn't considered this stuff yet in my proposal, but it's not
hard to accommodate.)

 > UTF-8 is defined for all Unicode scalar values,

Sure, and what I propose is entirely compatible with the specification
of UTF-8 as a UTF, unlike what you propose.  Until you understand why
that's true, we're at an impasse.

 >> I haven't gone back to check yet, but it's possible that a "real UTF-8
 >> conforming process" is required to stop processing and issue an error
 >> or something like that in the cases we're trying to handle.

 > "C10. When a process interprets a code unit sequence which purports to
 > be in a Unicode character encoding form, it shall treat ill-formed code
 > unit sequences as an error condition and shall not interpret such
 > sequences as characters."

Yeah, that's the one.

While I'm uncomfortable advocating the position that my proposal is
entirely compatible with C10, it is true that it treats ill-formed
sequences as an error, and it is arguable that "mapping code units to
characters in private space" is not the same as "interpreting them as
characters".  For obvious reasons I'm uncomfortable with that, but I
actually don't consider this non-conformance a huge loss in the
context of this thread since both your proposal and James Knight's do
equally non-conformant things.


From stephen at xemacs.org  Tue Sep 18 06:56:37 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 18 Sep 2007 13:56:37 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1190070414.20673.12.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	<46E96E98.9080406@v.loewis.de>
	<87veaejths.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA0778.3000502@canterbury.ac.nz>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
Message-ID: <18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>

>>>>> "Marcin 'Qrczak' Kowalczyk"  writes:

 >> When a codec encounters something it can't handle, whether it's a
 >> valid character in a legacy encoding, a private use character in a
 >> UTF, or an invalid sequence of code units, it throws an exception
 >> specifying the character or code unit and the current coded character
 >> set,

 > Does this mean that this:
 > $ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650")
 > would no longer print e650 in a UTF-8 locale

What do you mean "no longer"?  Look:

chibi:MacPorts steve$ export LC_ALL=en_US.UTF-8
chibi:MacPorts steve$ python -c 'import sys; print("%s" % sys.argv[1])' $(printf "\ue650") 
\ue650
chibi:MacPorts steve$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650") 
Traceback (most recent call last):
  File "", line 1, in ?
TypeError: ord() expected a character, but string of length 6 found
chibi:MacPorts steve$ 

Note that some people are currently arguing that sys.argv should be an
array of bytes objects, and Guido has not yet said "no".  In that
case, all of the current proposals should have exactly this result.

My position is that if you do something that depends on the internal
representation of implementation-dependent objects, you deserve
whatever results you get.


From steven.bethard at gmail.com  Tue Sep 18 08:40:47 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 18 Sep 2007 00:40:47 -0600
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EEA276.4040901@ronadam.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz> <46EEA276.4040901@ronadam.com>
Message-ID: 

On 9/17/07, Ron Adam  wrote:
> Greg Ewing wrote:
> > Thomas Wouters wrote:
> >> If you want to put more meaning in the argv list, use an option
> >> parser.
> >
> > I want to put *less* meaning in it, not more. :-)
> > And using an argument parser is often overkill for
> > simple programs.
>
> Would it be possible to split out the (pre) parsing from optparse so that
> instead of returning a list, it returns a dictionary of attributes and values?
>
> This would only contain what was given in the command line as a first
> "lighter weight" step to parsing the command line.
>
>     opts = opt_parser(argv)
>     command_name = opts['argv0']   # better name for argv0?

You might look at argparse_ which allows you to treat positional
arguments just like optional ones.  So you'd write::

    parser = argparse.ArgumentParser()
    parser.add_argument('command') # positional argument
    parser.add_argument('--option') # optional argument
    args = parser.parse_args()
    ... args.command ...
    ... args.option ...

If you're really insistent on a dict interface instead of an attribute
interface, the object returned by parse_args() is just a simple
namespace, so vars(args) will give you a dict.
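
A runnable version of the above, as a sketch: it assumes the argparse module as shipped in the current standard library, and the program name and argument values are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")    # "demo" is a made-up name
parser.add_argument("command")                   # positional argument
parser.add_argument("--option")                  # optional argument

# Pass an explicit list instead of reading sys.argv, so the run is repeatable.
args = parser.parse_args(["build", "--option", "fast"])
as_dict = vars(args)                             # dict view of the namespace
```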

.. _argparse: http://argparse.python-hosting.com/

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From qrczak at knm.org.pl  Tue Sep 18 11:12:19 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 18 Sep 2007 11:12:19 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1190106739.23701.17.camel@qrnik>

On Tue, 18-09-2007 at 13:08 +0900, Stephen J. Turnbull wrote:

>  > This is wrong: UTF-8 is specified for PUA. PUA is no special from the
>  > point of view of UTF-8.
> 
> It is from the point of view of the Unicode standard, specifically v5.
> Please see section 16.5, especially about the "corporate use subarea".

It is not. 16.5 doesn't say anything about UTF-8, and UTF-8 is already
specified for PUA.

>  > UTF-8 is defined for all Unicode scalar values,
> 
> Sure, and what I propose is entirely compatible with the specification
> of UTF-8 as a UTF,

It is not. In UTF-8 '\ue650' is b'\xEE\x99\x90', in your proposal it
might be encoded as a single byte.

>  > "C10. When a process interprets a code unit sequence which purports to
>  > be in a Unicode character encoding form, it shall treat ill-formed code
>  > unit sequences as an error condition and shall not interpret such
>  > sequences as characters."
> 
> Yeah, that's the one.
> 
> While I'm uncomfortable advocating the position that my proposal is
> entirely compatible with C10,

It is not. Elements of PUA are characters.

> it is arguable that "mapping code units to
> characters in private space" is not the same as "interpreting them as
> characters".

It's not the same, but interpreting as characters in PUA is obviously
interpreting as characters.

> chibi:MacPorts steve$ python -c 'import sys; print("%x" % ord(sys.argv[1]))' $(printf "\ue650") 
> Traceback (most recent call last):
>   File "", line 1, in ?
> TypeError: ord() expected a character, but string of length 6 found

I meant Python3 where sys.argv is a list of Unicode strings. It should
work out of the box.

Why length 6? "\ue650" encoded in UTF-8 has length 3.
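
One possible explanation, offered as an assumption rather than something confirmed in this thread: if the shell's printf did not expand the \u escape, Python received the six literal characters instead of the encoded character. The lengths involved can be checked in a current Python 3:

```python
unexpanded = r"\ue650"    # backslash, u, e, 6, 5, 0 passed through literally
expanded = "\ue650"       # the single character U+E650

lengths = (
    len(unexpanded),                  # characters in the literal text
    len(expanded),                    # characters after expansion
    len(expanded.encode("utf-8")),    # bytes in the UTF-8 encoding
)
```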

For an old discussion about using PUA to represent bytes undecodable
as UTF-8, see http://www.mail-archive.com/unicode@unicode.org/ and
subthreads with "roundtripping" in the subject.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From guido at python.org  Tue Sep 18 17:11:41 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 08:11:41 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On 9/17/07, Stephen J. Turnbull  wrote:
> Note that some people are currently arguing that sys.argv should be an
> array of bytes objects, and Guido has not yet said "no".

Then let me say "no" now. I'd be happy to support a lower-level API
for getting at the actual bytes in the C-level argv and env (even
taking into account modifications to these made by C code out of our
control; and in Windows we should provide access to the command line
text as well). But argv and environ should be strings. If they contain
non-ASCII bytes I am currently in favor of doing a best-effort
decoding using the default locale encoding, replacing errors with '?'
rather than throwing an exception.

Others have already explained why (they are typically text entered by a user).
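
A minimal sketch of such a best-effort decode, assuming a custom error handler registered under the hypothetical name "qmark". The built-in "replace" handler substitutes U+FFFD when decoding, so a literal '?' needs a handler of its own. The helper name decode_arg and the UTF-8 default are illustrative only.

```python
import codecs

def question_mark(exc):
    # Replace each undecodable byte with '?' and resume after it.
    if isinstance(exc, UnicodeDecodeError):
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc

# "qmark" is a made-up handler name for this sketch.
codecs.register_error("qmark", question_mark)

def decode_arg(raw, encoding="utf-8"):
    """Decode one raw argv/environ entry without ever raising."""
    return raw.decode(encoding, errors="qmark")

print(decode_arg(b"caf\xe9.txt"))   # caf?.txt
```

The decode is lossy: the original byte cannot be recovered from the '?', so a name decoded this way may no longer open the file it came from.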

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 18 18:50:08 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 09:50:08 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
Message-ID: 

No takers? What about those repeated +42 voters? Does anyone want
immutable bytes enough to do a teensy bit of work?

--Guido

On 9/17/07, Guido van Rossum  wrote:
> This may have passed in a thread where no-one was listening, so I'm
> repeating it here.
>
> I'm considering the following option: bytes would always be immutable,
> and for the few places (mostly in io.py) where a mutable bytes buffer
> would be handy, we use the array module. Then it would also make sense
> to make b[0] return a bytes array of length 1 instead of a small int
> -- bytes would be more similar to str in 2.x, albeit completely
> incompatible with str in terms of mixed operations.
>
> It would help if someone explored creating a patch to implement this,
> just to see the minimum amount of code that would need to change
> compared to 3.0a1. (The challenge includes making all the tests pass
> again.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jyasskin at gmail.com  Tue Sep 18 19:19:51 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Tue, 18 Sep 2007 10:19:51 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
Message-ID: <5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>

I'll take it. I assume it's just a matter of removing the mutating
methods and making the tests pass? I saw but didn't read a couple
threads about the buffer API... how much has to change there?

On 9/18/07, Guido van Rossum  wrote:
> No takers? What about those repeated +42 voters? Does anyone want
> immutable bytes enough to do a teensy bit of work?
>
> --Guido
>
> On 9/17/07, Guido van Rossum  wrote:
> > This may have passed in a thread where no-one was listening, so I'm
> > repeating it here.
> >
> > I'm considering the following option: bytes would always be immutable,
> > and for the few places (mostly in io.py) where a mutable bytes buffer
> > would be handy, we use the array module. Then it would also make sense
> > to make b[0] return a bytes array of length 1 instead of a small int
> > -- bytes would be more similar to str in 2.x, albeit completely
> > incompatible with str in terms of mixed operations.
> >
> > It would help if someone explored creating a patch to implement this,
> > just to see the minimum amount of code that would need to change
> > compared to 3.0a1. (The challenge includes making all the tests pass
> > again.)
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jyasskin%40gmail.com
>


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins

From fdrake at acm.org  Tue Sep 18 19:28:26 2007
From: fdrake at acm.org (Fred Drake)
Date: Tue, 18 Sep 2007 13:28:26 -0400
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
Message-ID: 

On Sep 18, 2007, at 12:50 PM, Guido van Rossum wrote:
> No takers? What about those repeated +42 voters? Does anyone want
> immutable bytes enough to do a teensy bit of work?

Dang, Guido!  I don't eat, sleep, or breathe any more; how quick do
you expect me to jump?

I'll take a look at it as soon as I can, but won't object if someone  
beats me to it.


   -Fred

-- 
Fred Drake   




From guido at python.org  Tue Sep 18 19:30:35 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 10:30:35 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
Message-ID: 

On 9/18/07, Jeffrey Yasskin  wrote:
> I'll take it. I assume it's just a matter of removing the mutating
> methods and making the tests pass?

And adding __hash__. And (but this could be a separate, later change)
switching indexing to return 1-char bytes arrays instead of small ints.
And similar changes to the constructor.

Of course, the devil is in the "making the tests pass".

> I saw but didn't read a couple
> threads about the buffer API... how much has to change there?

The bytes buffer API should refuse requests for writable buffers.
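
For comparison, a short sketch of how these points look in released Python 3, where bytes ended up immutable and hashable, the mutable counterpart is bytearray, and indexing still returns a small int. This shows the eventual behaviour, not the patch being discussed here.

```python
b = bytes([72, 105])          # b'Hi'

# No mutating methods: item assignment is rejected.
try:
    b[0] = 104
    mutated = True
except TypeError:
    mutated = False

# __hash__ exists, so bytes can be a dict key.
table = {b: "greeting"}

# The buffer interface only hands out read-only views.
readonly = memoryview(b).readonly

# The mutable buffer type is a separate class.
buf = bytearray(b)
buf[0] = 104                  # buf is now bytearray(b'hi')

# Indexing gives an int; slicing gives a length-1 bytes object.
first_item = b[0]
first_slice = b[0:1]
```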

Since you're so close, please do interrupt me over IM to review
incomplete work or ideas!

--Guido

> On 9/18/07, Guido van Rossum  wrote:
> > No takers? What about those repeated +42 voters? Does anyone want
> > immutable bytes enough to do a teensy bit of work?
> >
> > --Guido
> >
> > On 9/17/07, Guido van Rossum  wrote:
> > > This may have passed in a thread where no-one was listening, so I'm
> > > repeating it here.
> > >
> > > I'm considering the following option: bytes would always be immutable,
> > > and for the few places (mostly in io.py) where a mutable bytes buffer
> > > would be handy, we use the array module. Then it would also make sense
> > > to make b[0] return a bytes array of length 1 instead of a small int
> > > -- bytes would be more similar to str in 2.x, albeit completely
> > > incompatible with str in terms of mixed operations.
> > >
> > > It would help if someone explored creating a patch to implement this,
> > > just to see the minimum amount of code that would need to change
> > > compared to 3.0a1. (The challenge includes making all the tests pass
> > > again.)
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Tue Sep 18 22:36:41 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 19 Sep 2007 05:36:41 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1190106739.23701.17.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
	
	<46E96E98.9080406@v.loewis.de> <1189711575.22693.86.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
Message-ID: <18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>

>>>>> "Marcin 'Qrczak' Kowalczyk"  writes:

 >>  > This is wrong: UTF-8 is specified for PUA. PUA is not special from the
 >>  > point of view of UTF-8.
 >
 >> It is from the point of view of the Unicode standard, specifically v5.
 >> Please see section 16.5, especially about the "corporate use subarea".
 >
 > It is not. 16.5 doesn't say anything about UTF-8, and UTF-8 is already
 > specified for PUA.

There's no UTF-8 in Python's internal string encoding.  What are you
talking about?

 >> Sure, and what I propose is entirely compatible with the specification
 >> of UTF-8 as a UTF,
 >
 > It is not. In UTF-8 '\ue650' is b'\xEE\x99\x90', in your proposal it
 > might be encoded as a single byte.

Of course not; the point of the proposal is to ensure that all text
can be round-tripped through Python's internal representation.
Anything that comes in as a character through a codec using my
exception handler will be the same character when output with that
handler.  Again, what are you talking about?

 >> While I'm uncomfortable advocating the position that my proposal is
 >> entirely compatible with C10,
 >
 > It is not. Elements of PUA are characters.

Yes.  Where did I say anything else?

 > It's not the same, but interpreting as characters in PUA is obviously
 > interpreting as characters.

No.  Internally mapping to characters in PUA is mapping.  Unicode does
not try to restrict internal processing, only behavior at process
boundaries.  Interpretation as characters happens only on output.

I do not yet know how to prevent that (or even if I can, it may be
practically impossible because of important cases where the internal
representation is exchanged between processes).  If it can't be
prevented while maintaining efficiency, that is a major flaw (but not
necessarily fatal, since I'm proposing an exception handler, not a
required feature of Unicode codecs).
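
A minimal sketch of the kind of handler under discussion, assuming a hypothetical private-use block starting at U+F700 (the thread does not fix a range). The names pua_escape, decode_pua and encode_pua are illustrative only. It shows both the round trip for broken input and the collision that arises when the input already contains one of the reserved code points.

```python
import codecs

PUA_BASE = 0xF700  # hypothetical base; byte 0xNN maps to U+F7NN

def pua_escape(exc):
    # Map each undecodable byte to a private-use code point.
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return ("".join(chr(PUA_BASE + b) for b in bad), exc.end)
    raise exc

codecs.register_error("pua-escape", pua_escape)

def decode_pua(raw):
    return raw.decode("utf-8", errors="pua-escape")

def encode_pua(text):
    # Inverse mapping: reserved code points become raw bytes again.
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)

broken = b"caf\xe9.txt"                   # not valid UTF-8
roundtrip = encode_pua(decode_pua(broken))

genuine = "\uf7e9".encode("utf-8")        # valid UTF-8 for U+F7E9
collided = encode_pua(decode_pua(genuine))
```

Here roundtrip equals the broken input, but collided is the single byte 0xE9 rather than the three bytes that came in, which is the ambiguity at issue.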

 > I meant Python3 where sys.argv is a list of Unicode strings. It should
 > work out of the box.

I really don't think so.  Exposing internal representations as you are
doing here is your problem; it is not something that Python should
attempt to guarantee will work.

More troublesome from your point of view, Guido has stated that the
internal representation used by Python strings is a sequence of
Unicode code units, not characters.  I don't think that's reached the
status of "pronouncement" yet, but you will probably need a PEP to get
the guarantees you want.

 > Why length 6? "\ue650" encoded in UTF-8 has length 3.

MS UTF-8, I suppose.  You see, you simply cannot depend on any
particular Python string being translated to a particular Unicode
representation unless you choose the codec explicitly.  Since you have
to specify that codec to be reliable anyway, I don't see much loss
here except to lazy programmers willing to live dangerously.  But
that's not true of anybody in this thread!  The whole point is to
preserve even broken input for later forensic analysis.

 > For an old discussion about using PUA to represent bytes undecodable
 > as UTF-8, see http://www.mail-archive.com/unicode@unicode.org/ and
 > subthreads with "roundtripping" in the subject.

Which (after a half hour of looking) are mostly irrelevant, because
Mr. Kristan's proposal (I assume that's what you're talking about) as
far as I can see involved standardizing such representations within
Unicode.  We're not talking about that here; we're talking about
representations internal to Python, for the convenience of Python
users.


From rrr at ronadam.com  Tue Sep 18 22:52:50 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 18 Sep 2007 15:52:50 -0500
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>	
	<46EA5114.9060200@coli.uni-saarland.de>	
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>	
	<46EC8909.4050300@canterbury.ac.nz>	
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>	
	<46EDFCBB.8010306@canterbury.ac.nz> <46EEA276.4040901@ronadam.com>
	
Message-ID: <46F03AA2.4090000@ronadam.com>



Steven Bethard wrote:
> On 9/17/07, Ron Adam  wrote:
>> Greg Ewing wrote:
>>> Thomas Wouters wrote:
>>>> If you want to put more meaning in the argv list, use an option
>>>> parser.
>>> I want to put *less* meaning in it, not more. :-)
>>> And using an argument parser is often overkill for
>>> simple programs.
>> Would it be possible to split out the (pre) parsing from optparse so that
>> instead of returning a list, it returns a dictionary of attributes and values?
>>
>> This would only contain what was given in the command line as a first
>> "lighter weight" step to parsing the command line.
>>
>>     opts = opt_parser(argv)
>>     command_name = opts['argv0']   # better name for argv0?
> 
> You might look at argparse_ which allows you to treat positional
> arguments just like optional ones.  So you'd write::
> 
>     parser = argparse.ArgumentParser()
>     parser.add_argument('command') # positional argument
>     parser.add_argument('--option') # optional argument
>     args = parser.parse_args()
>     ... args.command ...
>     ... args.option ...
> 
> If you're really insistent on a dict interface instead of an attribute
> interface, the object returned by parse_args() is just a simple
> namespace, so vars(args) will give you a dict.
> 
> .. _argparse: http://argparse.python-hosting.com/
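
A runnable sketch of the vars(args) point, using the argparse module as it later shipped in the standard library. The argument values passed to parse_args() are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("command")     # positional argument
parser.add_argument("--option")    # optional argument

# An explicit list stands in for sys.argv[1:] here.
args = parser.parse_args(["build", "--option", "fast"])

# The namespace is a plain object; vars() exposes it as a dict.
opts = vars(args)
print(opts)   # {'command': 'build', 'option': 'fast'}
```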

I think a dict interface or even a (list, dict) interface is better in this 
case.  It makes it much easier to use these in already existing functions 
and other objects.

Once an object's data is stored in named attributes, it becomes a more
specialized data structure and requires more specialized functions and 
objects to make use of it.  In the above case the attribute names are not 
even consistent because they depend on the .add_argument() calls.  I think 
this makes it harder to write reusable code.


If the parser returned a list and dictionary pair, it might make it easy to 
use the (*args, **kwds) form to pass these values directly to functions or 
other objects.

That also gives an easy and lightweight way to validate command line
arguments in the simplest cases without a lot of work.  Just let the 
function receiving them validate its arguments at call time.
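
A minimal sketch of that idea, assuming a hypothetical pre-parser named split_argv that only understands bare words, --flag and --key=value. Neither split_argv nor copy_file exists in optparse or argparse.

```python
def split_argv(argv):
    """Split argv into (positional list, option dict); argv[0] is skipped."""
    args, kwds = [], {}
    for item in argv[1:]:
        if item.startswith("--"):
            key, sep, value = item[2:].partition("=")
            # --flag becomes True, --key=value keeps the string value.
            kwds[key.replace("-", "_")] = value if sep else True
        else:
            args.append(item)
    return args, kwds

def copy_file(src, dst, force=False):
    # Stand-in for real work; the signature does the validation.
    return (src, dst, force)

args, kwds = split_argv(["prog", "a.txt", "b.txt", "--force"])
result = copy_file(*args, **kwds)

# An unknown option surfaces as a TypeError at call time.
try:
    copy_file(*args, **{"bogus": True})
    rejected = False
except TypeError:
    rejected = True
```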

Regards,
   Ron



From jimjjewett at gmail.com  Tue Sep 18 23:19:46 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 18 Sep 2007 17:19:46 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On 9/18/07, Stephen J. Turnbull  wrote:

> There's no UTF-8 in Python's internal string encoding.  What are you
> talking about?

(At least as of a few days ago)

In Python 3 there is; strings are unicode.  A PyUnicodeObject object
has two encodings that you can grab from a pointer (which means they
have to be there; you don't have time to generate them like you would
with a function pointer).

One of these (str) is the "internal encoding" which is chosen at
compile time, and the other (defenc) is now hard-coded to UTF-8.

Hashing is also based on the UTF-8 bytestring.

-jJ

From guido at python.org  Tue Sep 18 23:26:09 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 14:26:09 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik> <1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
	
Message-ID: 

On 9/18/07, Jim Jewett  wrote:
> On 9/18/07, Stephen J. Turnbull  wrote:
>
> > There's no UTF-8 in Python's internal string encoding.  What are you
> > talking about?
>
> (At least as of a few days ago)
>
> In Python 3 there is; strings are unicode.  A PyUnicodeObject object
> has two encodings that you can grab from a pointer (which means they
> have to be there; you don't have time to generate them like you would
> with a function pointer).

Incorrect. The pointer can be NULL. The API for getting the UTF-8
encoding is a function (moreover a function whose name starts with
_Py).

> One of these (str) is the "internal encoding" which is chosen at
> compile time, and the other (defenc) is now hard-coded to UTF-8.
>
> Hashing is also based on the UTF-8 bytestring.

Not any more as of a few hours ago; the hashing based on UTF-8 was
excessively expensive, and I rewrote it to directly use the code
units(?) (or whatever they are called -- the Py_UNICODE values). For
strings not using code units(?) >= 2**16 this will give the same value
on all platforms; if there are code units(?) >= 2**16 results vary
since these will be represented as surrogates on 2-byte systems but
not on 4-byte systems.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Sep 18 23:24:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 19 Sep 2007 09:24:24 +1200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <46EF3AFC.7000904@acm.org>
References: 
	<46EF3AFC.7000904@acm.org>
Message-ID: <46F04208.8070704@canterbury.ac.nz>

Talin wrote:
> Data Type   AbstractSequence  Immutable   Mutable
> =========   ================  =========   =======
> byte        ByteSequence      bytes       buffer
> character   CharSequence      str         strbuf
> 
> 'buffer' could be an array.array, although if it's used frequently 
> enough an optimized special-case 'buffer' class might be better.

I'd prefer to keep the term 'buffer' for an object that
exposes the buffer interface of another object.

I suggest calling it a 'bytearray' if you want a specialised
type for it.

--
Greg

From guido at python.org  Tue Sep 18 23:29:41 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 14:29:41 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
References: <1189700532.22693.40.camel@qrnik>
	<18153.42916.640227.483752@uwakimon.sk.tsukuba.ac.jp>
	<1189722696.30037.14.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On 9/18/07, Stephen J. Turnbull  wrote:
> Guido has stated that the
> internal representation used by Python strings is a sequence of
> Unicode code units, not characters.  I don't think that's reached the
> status of "pronouncement" yet, but you will probably need a PEP to get
> the guarantees you want.

I think of this as cast in stone; we can't reasonably guarantee more
if we want to be compatible with the UTF-16 (*) Unicode
representations used on Windows and in Java. How much more
pronouncement do you want?

(*) I'm not at all sure that it's called that -- you guys keep asking
trick questions based on terminology that's only clear to people who
have read the Unicode standard several times forwards and backwards. I
mean the representation that uses 16-bit values, where characters >=
2**16 are represented as two 16-bit "surrogate" values. (I hope I at
least have the 'surrogate' thing right this time.)
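
The surrogate arithmetic described in the footnote can be sketched as follows. The helper name to_surrogates is made up for illustration; the result is checked against Python's own UTF-16 encoder.

```python
import struct

def to_surrogates(cp):
    """Split a code point >= 2**16 into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point fits in one 16-bit unit or is out of range")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

pair = to_surrogates(0x1F600)

# Cross-check: two little-endian 16-bit units from the built-in codec.
units = struct.unpack("<2H", chr(0x1F600).encode("utf-16-le"))
```

A representation built from 16-bit units stores this character as two values, while a 32-bit representation stores one, so anything computed over the raw units can differ between the two.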

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Sep 18 23:25:55 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 19 Sep 2007 09:25:55 +1200
Subject: [Python-3000] Stackless anyone ?
In-Reply-To: <46EF38CF.4020801@acm.org>
References: <1189949664.5502.3.camel@schlepp> 
	<46EF38CF.4020801@acm.org>
Message-ID: <46F04263.9000609@canterbury.ac.nz>

Talin wrote:
> the ultimate solution to Python concurrency won't be via patching 
> CPython, but to compile the meta-Python language to a back-end 
> representation that is inherently  concurrent.

You can't get something for nothing, though -- that
"inherently concurrent" back-end representation will
have to deal with all the same issues one way or
another.

--
Greg

From jimjjewett at gmail.com  Wed Sep 19 00:23:18 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 18 Sep 2007 18:23:18 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<18154.9232.740864.946506@uwakimon.sk.tsukuba.ac.jp>
	<1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
	
	
Message-ID: 

On 9/18/07, Guido van Rossum  wrote:
> On 9/18/07, Jim Jewett  wrote:
> > On 9/18/07, Stephen J. Turnbull  wrote:

> > > There's no UTF-8 in Python's internal string encoding.

> > (At least as of a few days ago)

> > In Python 3 there is; strings are unicode.  A PyUnicodeObject object
> > has two encodings that you can grab from a pointer (which means
> > they have to be there; you don't have time to generate them like
> > you would with a function pointer).

> Incorrect. The pointer can be NULL.

I had missed that comment, but I do see it now; thank you.

> The API for getting the UTF-8 encoding is a function

Thank you.  But given that defenc is now always UTF-8, won't exposing
it in the public typedef then just be an attractive nuisance?

> (moreover a function whose name starts with _Py).

That I still don't see.

http://svn.python.org/view/python/branches/py3k/Include/unicodeobject.h?rev=57656&view=markup

PyAPI_FUNC(PyObject*) PyUnicode_AsUTF8String(
    PyObject *unicode	 	/* Unicode object */
    );

PyAPI_FUNC(PyObject*) PyUnicode_EncodeUTF8(
    const Py_UNICODE *data, 	/* Unicode char buffer */
    Py_ssize_t length,	 	/* number of Py_UNICODE chars to encode */
    const char *errors		/* error handling */
    );


Later, the same file shows me:

/* --- Unicode Type ------------------------------------------------------- */

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;		/* Length of raw Unicode data in buffer */
    Py_UNICODE *str;		/* Raw Unicode buffer */
    long hash;			/* Hash value; -1 if not set */
    int state;			/* != 0 if interned. In this case the two
    				 * references from the dictionary to this object
    				 * are *not* counted in ob_refcnt. */
    PyObject *defenc;		/* (Default) Encoded version as Python
				   string, or NULL; this is used for
				   implementing the buffer protocol */
} PyUnicodeObject;


I would be happier with:

typedef struct {
    PyObject_VAR_HEAD		/* Length in code points, not chars */
} PyUnicodeObject;

And, in unicodeobject.c (*not* in a public header)

typedef struct {
    PyUnicodeObject ob_unicodehead;
    Py_UNICODE *str;		/* Raw Unicode buffer */
    long hash;			/* Hash value; -1 if not set */
    int state;			/* != 0 if interned. In this case the two
    				 * references from the dictionary to this object
    				 * are *not* counted in ob_refcnt. */
    PyObject *defenc;		/* (Default) Encoded version as Python
				   string, or NULL; this is used for
				   implementing the buffer protocol */
} _PyDefaultUnicodeObject;

As this would allow 3rd parties to create implementations specialized
for (and saving space on) smaller alphabets, without breaking C
extensions that stick to the public header files.  (Moving hash or
even state to the public header might be OK too, but they seemed to
get ignored for subclasses anyhow.)

-jJ

From pje at telecommunity.com  Wed Sep 19 00:21:05 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 18 Sep 2007 18:21:05 -0400
Subject: [Python-3000] Stackless anyone ?
In-Reply-To: <46F04263.9000609@canterbury.ac.nz>
References: <1189949664.5502.3.camel@schlepp> 
	<46EF38CF.4020801@acm.org> <46F04263.9000609@canterbury.ac.nz>
Message-ID: <20070918222729.9E38F3A40AC@sparrow.telecommunity.com>

At 09:25 AM 9/19/2007 +1200, Greg Ewing wrote:
>Talin wrote:
> > the ultimate solution to Python concurrency won't be via patching
> > CPython, but to compile the meta-Python language to a back-end
> > representation that is inherently  concurrent.
>
>You can't get something for nothing, though -- that
>"inherently concurrent" back-end representation will
>have to deal with all the same issues one way or
>another.

Right, but since you can write PyPy "C" extensions in RPython, the 
part you actually get for "free" is that PyPy extensions don't need 
to be written so as to take concurrency into account.  Those bits can 
be delegated to the "object space", in PyPy terms.



From guido at python.org  Wed Sep 19 00:29:24 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 15:29:24 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik> <1189756174.32337.30.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
	
	
	
Message-ID: 

On 9/18/07, Jim Jewett  wrote:
> On 9/18/07, Guido van Rossum  wrote:
> > On 9/18/07, Jim Jewett  wrote:
> > > On 9/18/07, Stephen J. Turnbull  wrote:
>
> > > > There's no UTF-8 in Python's internal string encoding.
>
> > > (At least as of a few days ago)
>
> > > In Python 3 there is; strings are unicode.  A PyUnicodeObject object
> > > has two encodings that you can grab from a pointer (which means
> > > they have to be there; you don't have time to generate them like
> > > you would with a function pointer).
>
> > Incorrect. The pointer can be NULL.
>
> I had missed that comment, but I do see it now; thank you.
>
> > The API for getting the UTF-8 encoding is a function
>
> Thank you.  But given that defenc is now always UTF-8, won't exposing
> it in the public typedef then just be an attractive nuisance?

*ALL* fields of the struct def are strictly internal.

> > (moreover a function whose name starts with _Py).
>
> That I still don't see.

I am talking about _PyUnicode_AsDefaultEncoding(). (Which you
shouldn't be calling. :-)

> http://svn.python.org/view/python/branches/py3k/Include/unicodeobject.h?rev=57656&view=markup
>
> PyAPI_FUNC(PyObject*) PyUnicode_AsUTF8String(
>     PyObject *unicode           /* Unicode object */
>     );
>
> PyAPI_FUNC(PyObject*) PyUnicode_EncodeUTF8(
>     const Py_UNICODE *data,     /* Unicode char buffer */
>     Py_ssize_t length,          /* number of Py_UNICODE chars to encode */
>     const char *errors          /* error handling */
>     );
>
>
> Later, the same file shows me:
>
> /* --- Unicode Type ------------------------------------------------------- */
>
> typedef struct {
>     PyObject_HEAD
>     Py_ssize_t length;          /* Length of raw Unicode data in buffer */
>     Py_UNICODE *str;            /* Raw Unicode buffer */
>     long hash;                  /* Hash value; -1 if not set */
>     int state;                  /* != 0 if interned. In this case the two
>                                  * references from the dictionary to this object
>                                  * are *not* counted in ob_refcnt. */
>     PyObject *defenc;           /* (Default) Encoded version as Python
>                                    string, or NULL; this is used for
>                                    implementing the buffer protocol */
> } PyUnicodeObject;
>
>
> I would be happier with:
>
> typedef struct {
>     PyObject_VAR_HEAD           /* Length in code points, not chars */
> } PyUnicodeObject;
>
> And, in unicodeobject.c (*not* in a public header)
>
> typedef struct {
>     PyUnicodeObject ob_unicodehead;
>     Py_UNICODE *str;            /* Raw Unicode buffer */
>     long hash;                  /* Hash value; -1 if not set */
>     int state;                  /* != 0 if interned. In this case the two
>                                  * references from the dictionary to this object
>                                  * are *not* counted in ob_refcnt. */
>     PyObject *defenc;           /* (Default) Encoded version as Python
>                                    string, or NULL; this is used for
>                                    implementing the buffer protocol */
> } _PyDefaultUnicodeObject;
>
> As this would allow 3rd parties to create implementations specialized
> for (and saving space on) smaller alphabets, without breaking C
> extensions that stick to the public header files.  (Moving hash or
> even state to the public header might be OK too, but they seemed to
> get ignored for subclasses anyhow.)

That is not a supported use case.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From foom at fuhm.net  Wed Sep 19 00:52:18 2007
From: foom at fuhm.net (James Y Knight)
Date: Tue, 18 Sep 2007 18:52:18 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
Message-ID: <32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>


On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:
> If they contain
> non-ASCII bytes I am currently in favor of doing a best-effort
> decoding using the default locale encoding, replacing errors with '?'
> rather than throwing an exception.

One of the more common things to do with command line arguments is  
open them. So, it'd really be nice if:

python -c 'import sys; open(sys.argv[1])' [some filename]

would always work, regardless of the current system encoding and what  
characters make up the filename.  Note that filenames are essentially  
random binary gunk in most Unix systems; the encoding is unspecified,  
and there can in fact be multiple encodings, even for different  
directories making up a single file's path.

I'd like to propose that python simply assume the external world is  
likely to be UTF-8, and always decode command-line arguments (and  
environment vars), and encode for filesystem operations using the  
roundtrip-able UTF-8b. Even if the system says its encoding is  
iso-2022 or some other abomination. This has upsides (simple, doesn't  
trample on PUA codepoints, only needs one new codec, never throws  
exception in the above example, and really is correct much of the  
time), and downsides (if the system locale is iso-2022, and all the
filenames you're dealing with really are also properly encoded in
iso-2022, it might be nice if they decoded into the sensible unicode
string, instead of a nonsensical (but still round-trippable) one).

I think the advantages outweigh the disadvantages, but in the world I
live in, using anything other than UTF8 or ASCII is grounds for entry  
into an insane asylum. ;)

James



From guido at python.org  Wed Sep 19 01:00:09 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 18 Sep 2007 16:00:09 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
References: <1189700532.22693.40.camel@qrnik>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
Message-ID: 

On 9/18/07, James Y Knight  wrote:
>
> On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:
> > If they contain
> > non-ASCII bytes I am currently in favor of doing a best-effort
> > decoding using the default locale encoding, replacing errors with '?'
> > rather than throwing exception.
>
> One of the more common things to do with command line arguments is
> open them. So, it'd really be nice if:
>
> python -c 'import sys; open(sys.argv[1])' [some filename]

I'd like this too, but it isn't easy.

> would always work, regardless of the current system encoding and what
> characters make up the filename.  Note that filenames are essentially
> random binary gunk in most Unix systems; the encoding is unspecified,
> and there can in fact be multiple encodings, even for different
> directories making up a single file's path.
>
> I'd like to propose that python simply assume the external world is
> likely to be UTF-8, and always decode command-line arguments (and
> environment vars), and encode for filesystem operations using the
> roundtrip-able UTF-8b. Even if the system says its encoding is
> iso-2022 or some other abomination. This has upsides (simple, doesn't
> trample on PUA codepoints, only needs one new codec, never throws
> exception in the above example, and really is correct much of the
> time), and downsides (if the system locale is iso-2022, and all the
> filenames you're dealing with really are also properly encoded in
> iso-2022, it might be nice if they decoded into the sensible unicode
> string, instead of a non-sensical (but still round-trippable) one).
>
> I think the advantages outweigh the disadvantages, but the world I
> live in, using anything other than UTF8 or ASCII is grounds for entry
> into an insane asylum. ;)

You seem to be contradicting yourself. The world *isn't* using
UTF-8(b) predominantly yet, so assuming UTF-8(b) everywhere will break
your first requirement.

Two encodings are more likely (though not guaranteed) to produce
success: the locale encoding or the filesystem encoding. I'm thinking
that the locale encoding is probably the one to use for argv and
environ, since at least the user can change it in order to make things
work.
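
A minimal sketch of the best-effort decoding described above, assuming
the raw argument bytes are available to the caller. The helper name
decode_arg is hypothetical. Note that the codec's "replace" handler
inserts U+FFFD when decoding, so the sketch maps it to '?' explicitly.

```python
import locale


def decode_arg(raw: bytes, encoding: str = "") -> str:
    """Best-effort decode of one argv/environ value.

    Uses the locale's preferred encoding unless one is given, and
    replaces undecodable bytes with '?' instead of raising.
    """
    enc = encoding or locale.getpreferredencoding(False)
    # "replace" yields U+FFFD for each bad byte; turn that into '?'.
    return raw.decode(enc, "replace").replace("\ufffd", "?")


lossy = decode_arg(b"abc\xff", "ascii")
clean = decode_arg(b"abc", "ascii")
```

This never raises for undecodable input, but the result is lossy: the
original bytes cannot be recovered from it, so it may no longer name
the file the user meant.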

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Wed Sep 19 03:40:40 2007
From: thomas at python.org (Thomas Wouters)
Date: Tue, 18 Sep 2007 18:40:40 -0700
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46EDFCBB.8010306@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz>
Message-ID: <9e804ac0709181840gb33da5co5db610aa2cd6bbeb@mail.gmail.com>

On 9/16/07, Greg Ewing  wrote:
>
> Thomas Wouters wrote:
> > If you want to put more meaning in the argv list, use an option
> > parser.
>
> I want to put *less* meaning in it, not more. :-)


Then why are you discriminating against argv[0]? It's just another member of
the argv list the OS gives us.

> And using an argument parser is often overkill for
> simple programs.


So is trying to "fix" this non-issue.

> > The _actual_ meaning of each element depends entirely on the
> > program that's started. For Python-the-language, there isn't any
> > difference between them.
>
> So in your Python programs, you're quite happy
> to write
>
>    for arg in sys.argv:
>      process(arg)
>
> and not care about what this does with argv[0]?


No. I'm quite happy to realize the argv list is what the shell executed. I'm
also quite happy to use a proper option parser even for my simple programs.
It adds useful defaults even if I didn't think I'd ever use them.
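
A small sketch of that approach, assuming the later argparse module
(added in Python 2.7 and 3.2, so not available when this was written;
optparse was the contemporary equivalent). The function name parse and
the "paths" argument are made up for illustration.

```python
import argparse


def parse(argv):
    """Treat argv[0] as the program name and parse the rest."""
    parser = argparse.ArgumentParser(prog=argv[0])
    parser.add_argument("paths", nargs="*", help="positional arguments")
    return parser.parse_args(argv[1:])


ns = parse(["mv", "foo", "bar"])
```

The whole list is still what the user typed; the parser merely assigns
a role to each position.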

> I hardly see how one can claim that there's
> "no difference" between argv[0] and the rest
> for practical purposes.


The only meaning is by accident of position. For most programs, the very
same thing goes for the rest of the arguments: 'mv foo bar' assigns a
different meaning to 'foo' than it does to 'bar'. Notice how sys.argv
matches what the user typed, including sys.argv[0].

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From stephen at xemacs.org  Wed Sep 19 07:00:51 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 19 Sep 2007 14:00:51 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
References: <1189700532.22693.40.camel@qrnik>
	<87wsut7srm.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EA1734.6020103@canterbury.ac.nz>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
Message-ID: <873axbi70c.fsf@uwakimon.sk.tsukuba.ac.jp>

James Y Knight writes:

 > iso-2022 or some other abomination. This has upsides (simple, doesn't  
 > trample on PUA codepoints, only needs one new codec, never throws  
 > exception in the above example, and really is correct much of the  
 > time), and downsides (if the system locale is iso-2022, and all the  
 > filenames you're dealing with really are also properly encoded in  
 > iso-2022, it might be nice if they decoded into the sensible unicode  
 > string, instead of a non-sensical (but still round-trippable) one).

ISO 2022, like Unicode, is an extensible standard.  Corporate
character sets in Asia extend, but are not easy to distinguish from
each other though they often conflict.  They're not proper in the
sense that they abuse the registered final bytes of the national
standards they're based on, but it's also not reasonable for those of
us who live there to ignore them.

 > I think the advantages outweigh the disadvantages, but the world I  
 > live in, using anything other than UTF8 or ASCII is grounds for entry  
 > into an insane asylum. ;)

You're very fortunate.  In the world I live in, Shift JIS, which isn't
even ISO 2022 compatible, is mandated by a power higher even than the
Borg of Redmond: the telephone company.


From victor.stinner at haypocalc.com  Wed Sep 19 12:12:10 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 19 Sep 2007 12:12:10 +0200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
Message-ID: <200709191212.10502.victor.stinner@haypocalc.com>

Hi,

On Tuesday 18 September 2007 04:18:01 Guido van Rossum wrote:
> I'm considering the following option: bytes would always be immutable,
> (...) make b[0] return a bytes array of length 1 instead of a small int

Great idea! That will help migration from Python 2.x to Python 3.0. Choosing
between byte strings and character strings is already difficult. Also having
to choose between a mutable type (the current bytes type) and an immutable
one (the current str type) makes it harder still.

The change in indexing behaviour (Python 2.x => 3) is also surprising for
Python programmers.

>>> 'xyz'[0]
'x'
>>> b"xyz"[0]
120

This result is not symmetric. I would prefer what Guido proposes:

>>> 'xyz'[0]
'x'
>>> b"xyz"[0]
b'x'

And so be able to write such tests:

>>> b"xyz"[:2] == b'xy'
True
>>> b"xyz"[0:1] == b'x'
True
>>> b"xyz"[0] == b'x'
True
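
For comparison, a short sketch of how indexing and slicing behave in
Python 3 as it was eventually released. This is an observation about
later releases, not something decided in this thread: indexing yields
an int, while a one-element slice yields a bytes object.

```python
data = b"xyz"

first_item = data[0]       # indexing gives an int
first_slice = data[0:1]    # slicing gives bytes of length 1
prefix = data[:2]

# The slice comparisons from above hold; the index comparison does not.
slice_matches = first_slice == b"x"
index_matches = first_item == b"x"
```

Code that needs a length-1 bytes object therefore has to slice rather
than index.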

Victor Stinner
http://hachoir.org/

From victor.stinner at haypocalc.com  Wed Sep 19 12:40:33 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 19 Sep 2007 12:40:33 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <1189700532.22693.40.camel@qrnik>
References: <1189700532.22693.40.camel@qrnik>
Message-ID: <200709191240.33698.victor.stinner@haypocalc.com>

Hi,

On Thursday 13 September 2007 18:22:12 Marcin 'Qrczak' Kowalczyk wrote:
> What should happen when a command line argument or an environment
> variable is not decodable using the system encoding (on Unix where
> from the OS point of view it is an array of bytes)?

On Linux, filenames are *byte* strings and not *character* strings. I always
have this problem with Python 2.x. I converted filenames (argv[x]) to Unicode
to be able to format error messages in full Unicode... but it's not possible.
Linux allows invalid UTF-8 filenames even on a full UTF-8 installation
(Ubuntu), see Marcin's examples.

So I propose to keep sys.argv as an array of byte strings. If you try to
create Unicode strings, you will be unable to write a program that converts a
filesystem with "broken" filenames (see the convmv program for example) or
opens a file with a "broken" filename (broken: invalid byte sequence for the
UTF/JIS/Big5/... charset).

---

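For Python 2.x, my solution is to keep byte strings for I/O and use Unicode
strings for error messages. Here is a function to convert any byte string
(filename string) to Unicode:

from hachoir_core.i18n import getTerminalCharset
from hachoir_core.tools import makePrintable

def unicodeFilename(filename, charset=None):
    # Python 2: decode a byte-string filename for display only.
    if not charset:
        charset = getTerminalCharset()
    try:
        return unicode(filename, charset)
    except UnicodeDecodeError:
        # Undecodable bytes are shown as escapes instead of raising.
        return makePrintable(filename, charset, to_unicode=True)

makePrintable() replaces invalid byte sequences with escape strings, for
example: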

>>> from hachoir_core.tools import makePrintable
>>> makePrintable("a\x80", "utf8", to_unicode=True)
u'a\\x80'
>>> print makePrintable("a\x80", "utf8", to_unicode=True)
a\x80
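
A rough Python 3 counterpart, as a sketch only: it assumes Python 3.5
or later, where the "backslashreplace" error handler also works for
decoding. The name printable_filename is hypothetical and this is not
hachoir's implementation.

```python
def printable_filename(raw: bytes, charset: str = "utf-8") -> str:
    """Return a displayable form of a byte-string filename.

    Bytes that cannot be decoded are shown as \\xNN escapes, so the
    result is for messages only, not for reopening the file.
    """
    return raw.decode(charset, "backslashreplace")


shown = printable_filename(b"a\x80")
plain = printable_filename(b"report.txt")
```

As in the Python 2 version, the original bytes should be kept for I/O
and the decoded string used only for display.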

Source code of function makePrintable:
http://hachoir.org/browser/trunk/hachoir-core/hachoir_core/tools.py#L225

Source code of function getTerminalCharset():
http://hachoir.org/browser/trunk/hachoir-core/hachoir_core/i18n.py#L23

Victor Stinner
http://hachoir.org/

From stephen at xemacs.org  Wed Sep 19 18:12:51 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 20 Sep 2007 01:12:51 +0900
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <200709191240.33698.victor.stinner@haypocalc.com>
References: <1189700532.22693.40.camel@qrnik>
	<200709191240.33698.victor.stinner@haypocalc.com>
Message-ID: <878x72y6po.fsf@uwakimon.sk.tsukuba.ac.jp>

Victor Stinner writes:

 > On Thursday 13 September 2007 18:22:12 Marcin 'Qrczak' Kowalczyk wrote:
 > > What should happen when a command line argument or an environment
 > > variable is not decodable using the system encoding (on Unix where
 > > from the OS point of view it is an array of bytes)?
 > 
 > On Linux, filenames are *byte* string and not *character* string. I always 
 > have this problem with Python 2.x. I converted filename (argv[x]) to Unicode 
 > to be able to format error messages in full unicode... but it's not possible.
 > Linux allows invalid utf8 filename even on full utf8 installation (ubuntu), 
 > see Marcin's examples.

This should be solved by providing library facilities to handle these
conditions.  Users and programmers may "know" that file names are
actually raw bytes obeying a set of restrictions unique to file names,
but they expect to be able to *use* them as characters, and 99.44% of
the time that just works.[1]  Even for the Japanese, who have
over 1500 years' experience in creating unusable writing systems.

 > So I propose to keep sys.argv as byte string array. If you try to create 
 > unicode strings, you will be unable to write a program to convert filesystem 
 > with "broken" filenames (see convmv program for example) or open file with 
 > broken "filename" (broken: invalid byte sequence for UTF/JIS/Big5/... 
 > charset).

This is simply not true.  Any of the proposals (Martin's, Marcin's,
James's, mine) will make this *possible*.  It's just less convenient
for the programmer who wishes to deal with such situations.  This
inconvenience is IMO more than balanced by the convenience for the
programmer who lives his life in ASCII or whose users just don't do
stuff like that, or who's writing a one-off script and doesn't care.

N.B.  You don't need to go farther than your favorite rootkit to find
broken filenames such as "^J" (linefeed).  This doesn't cause problems
specific to Unicode, of course, but it does demonstrate that a
library designed to help with weird file names has broader
applicability than just translation to Unicode strings.



Footnotes: 
[1]  "99.44%" is an expression of "very pure" derived from an
advertising campaign for soap.  Here it's an exaggeration, I guess,
but nobody knows how much.


From lists at cheimes.de  Wed Sep 19 18:42:27 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 19 Sep 2007 18:42:27 +0200
Subject: [Python-3000] New io system and binary data
Message-ID: 

Today I stumbled over another problem that is related to the unicode and
OS string topic. The new io system - or, to be more precise, the
implicit conversion of input and output data to UTF-8 - makes it
impossible to pipe binary data through Python 3.0.

For example, a user wants to write a filter for binary data like images
in Python. With Python 2.5 the input and output data isn't implicitly
converted:

# stdredirect.py
# simple stupid example
import sys
sys.stdout.write(sys.stdin.read())

$ chmod 755 stdredirect.py
$ cat ./Mac/Demo/html.icons/python.gif | python2.5 stdredirect.py >out.gif
$ diff ./Mac/Demo/html.icons/python.gif out.gif

But Python 3.0 is using TextIOWrapper for stdin, stdout and stderr:

$ cat ./Mac/Demo/html.icons/python.gif | ./python stdredirect.py >out.gif
Traceback (most recent call last):
  File "./stdredirect.py", line 4, in <module>
    sys.stdout.write(sys.stdin.read())
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1225, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13:
invalid data

An easy workaround for the problem is:

sys.stdout = sys.stdout.buffer
sys.stdin = sys.stdin.buffer

I recommend that the problem and fix get documented. Maybe stdin,
stdout and stderr should get a method that disables the implicit
conversion, like setMode("b") / setMode("t").

Christian


From guido at python.org  Wed Sep 19 19:19:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 10:19:13 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
Message-ID: 

Changing the mode between text and binary is not feasible (since it
would have to change the class). But it is perfectly acceptable to use
sys.std{in,out}.buffer if you need to write a binary transparent
filter. Of course you'll be dealing with bytes at that point so the
usual cautions apply. I wouldn't do the assignments you propose
though, since that might surprise other code which expects text files.
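
A minimal sketch of such a binary-transparent filter, leaving
sys.stdin and sys.stdout themselves untouched. The helper copy_binary
and the chunk size are illustrative choices; in-memory streams stand in
for the real pipes so the example is self-contained.

```python
import io
import sys


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst in chunks and return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In a real filter one would call:
#     copy_binary(sys.stdin.buffer, sys.stdout.buffer)
# Here BytesIO objects stand in for the pipes.
gif_like = b"GIF89a\x00\xff\xfe\x80"
out = io.BytesIO()
copied = copy_binary(io.BytesIO(gif_like), out)
```

Because only the .buffer attributes are used, other code in the same
program can keep treating sys.stdin and sys.stdout as text files.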

--Guido


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Wed Sep 19 19:56:38 2007
From: janssen at parc.com (Bill Janssen)
Date: Wed, 19 Sep 2007 10:56:38 PDT
Subject: [Python-3000] New io system and binary data
In-Reply-To:  
References: 
	
Message-ID: <07Sep19.105646pdt."57996"@synergy1.parc.xerox.com>

GvR wrote:
> I wouldn't do the assignments you propose
> though, since that might surprise other code which expects text files.

But presumably that code wouldn't be used in that same program.

This really isn't a UTF-8 problem.  It is the problem with file opens
defaulting to "text" mode instead of "binary" mode rearing its ugly
head again.

Bill


From brett at python.org  Wed Sep 19 20:12:16 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 19 Sep 2007 11:12:16 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
Message-ID: 

On 9/17/07, Guido van Rossum  wrote:
> This may have passed in a thread where no-one was listening, so I'm
> repeating it here.
>
> I'm considering the following option: bytes would always be immutable,
> and for the few places (mostly in io.py) where a mutable bytes buffer
> would be handy, we use the array module. Then it would also make sense
> to make b[0] return a bytes array of length 1 instead of a small int
> -- bytes would be more similar to str in 2.x, albeit completely
> incompatible with str in terms of mixed operations.
>

How far do you want to push the similarity?  For instance, would ord()
start working on length 1 byte arrays or would int() be the only way
to get the integer out of the byte?

-Brett

From guido at python.org  Wed Sep 19 20:24:47 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 11:24:47 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
Message-ID: 

I think ord() would be fine.
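
For reference, a sketch of how this looks in Python 3 as eventually
released (an observation about later releases, not a decision recorded
in this thread): ord() accepts a bytes object of length 1, and indexing
or int.from_bytes() give the same value.

```python
one = b"x"

code_via_ord = ord(one)                    # ord() accepts length-1 bytes
code_via_index = one[0]                    # indexing already yields an int
code_via_int = int.from_bytes(one, "big")  # explicit byte-to-int conversion
```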

On 9/19/07, Brett Cannon  wrote:
> On 9/17/07, Guido van Rossum  wrote:
> > This may have passed in a thread where no-one was listening, so I'm
> > repeating it here.
> >
> > I'm considering the following option: bytes would always be immutable,
> > and for the few places (mostly in io.py) where a mutable bytes buffer
> > would be handy, we use the array module. Then it would also make sense
> > to make b[0] return a bytes array of length 1 instead of a small int
> > -- bytes would be more similar to str in 2.x, albeit completely
> > incompatible with str in terms of mixed operations.
> >
>
> How far do you want to push the similarity?  For instance, would ord()
> start working on length 1 byte arrays or would int() be the only way
> to get the integer out of the byte?
>
> -Brett
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Sep 19 20:26:03 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 11:26:03 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <-7804278669952876495@unknownmsgid>
References: 
	
	<-7804278669952876495@unknownmsgid>
Message-ID: 

On 9/19/07, Bill Janssen  wrote:
> This really isn't a UTF-8 problem.  It is the problem with file opens
> defaulting to "text" mode instead of "binary" mode rearing its ugly
> head again.

You can repeat that until you're blue in the face but it's not going
to change. Way more programs (especially simple ones) deal with text
than with binary data.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Wed Sep 19 21:34:39 2007
From: janssen at parc.com (Bill Janssen)
Date: Wed, 19 Sep 2007 12:34:39 PDT
Subject: [Python-3000] New io system and binary data
In-Reply-To:  
References: 
	
	<-7804278669952876495@unknownmsgid>
	
Message-ID: <07Sep19.123446pdt."57996"@synergy1.parc.xerox.com>

> You can repeat that until you're blue in the face but it's not going
> to change.

That happens to me a lot :-).

> Way more programs (especially simple ones) deal with text
> than with binary data.

I'd love to see stats on that, Guido.  I'm sure it's true in your
immediate vicinity, given what you work on, but I don't believe it's
true in general.  And even for "text" files, it raises several
questions about how the text is expressed in a file, which is always a
binary artifact, because files store bytes, not "text".

I'll shut up now...

Bill

From jason.orendorff at gmail.com  Wed Sep 19 21:58:49 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Wed, 19 Sep 2007 15:58:49 -0400
Subject: [Python-3000] New io system and binary data
In-Reply-To: <2296611449423656759@unknownmsgid>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
Message-ID: 

On 9/19/07, Bill Janssen  wrote:
> > Way more programs (especially simple ones) deal with text
> > than with binary data.
>
> I'd love to see stats on that, Guido.  I'm sure it's true in your
> immediate vicinity, given what you work on, but I don't believe it's
> true in general.

Given the context (stdin/stdout/stderr), I'd love to know what you're
thinking of here.  I can't name a program offhand that wants to
operate on binary data via a pipeline.  There are a few that *can*,
like gzip, but my impression is that even those aren't often used that
way anymore.

-j

From skip at pobox.com  Wed Sep 19 21:59:54 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 19 Sep 2007 14:59:54 -0500
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
Message-ID: <18161.32698.291402.642086@montanaro.dyndns.org>


    Guido> You can repeat that until you're blue in the face but it's not
    Guido> going to change. Way more programs (especially simple ones) deal
    Guido> with text than with binary data.

For us Unix-heads the notion that a file is anything other than a stream of
bytes is rather foreign.  I understand that to a large degree if you made
the world right for us the tail would be wagging the dog.

Skip


From skip at pobox.com  Wed Sep 19 22:06:03 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 19 Sep 2007 15:06:03 -0500
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
Message-ID: <18161.33067.153099.859501@montanaro.dyndns.org>


    Jason> Given the context (stdin/stdout/stderr), I'd love to know what
    Jason> you're thinking of here.  I can't name a program offhand that
    Jason> wants to operate on binary data via a pipeline.

You've obviously never used the netpbm (nee pbmplus, nee pbm) tools.  I
still use this pipeline to capture a window fairly frequently:

    xwd | xwdtopnm | pnmtopng > window.png

I believe ImageMagick also operates by means of filter programs transforming
binary data.

    Jason> There are a few that *can*, like gzip, but my impression is that
    Jason> even those aren't often used that way anymore.

Only because the true believers have been overrun by the unwashed masses who
need to use a GUI as a crutch.  I use g(un)?zip and b(un)?zip2 as filters
all the time.  It's that elegant Unix model of computing I grew up with.
Lots of small tools do one thing well instead of a massively bloated tool
that has a swiss army knife drawer full of options.

Skip

From fdrake at acm.org  Wed Sep 19 22:46:45 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 19 Sep 2007 16:46:45 -0400
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
Message-ID: <5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>

On Sep 19, 2007, at 3:58 PM, Jason Orendorff wrote:
> Given the context (stdin/stdout/stderr), I'd love to know what you're
> thinking of here.  I can't name a program offhand that wants to
> operate on binary data via a pipeline.  There are a few that *can*,
> like gzip, but my impression is that even those aren't often used that
> way anymore.

Huh.  I use pipelines constructed in the shell for binary data  
regularly; I don't see any reason not to do that.  I'd certainly  
rather see the stdio streams be available as binary data, possibly  
with convenient text-centric wrappers also available.  But I'd be  
fine with constructing those myself.


   -Fred

-- 
Fred Drake   




From guido at python.org  Wed Sep 19 23:00:39 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 14:00:39 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
	<5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>
Message-ID: 

On 9/19/07, Fred Drake  wrote:
> On Sep 19, 2007, at 3:58 PM, Jason Orendorff wrote:
> > Given the context (stdin/stdout/stderr), I'd love to know what you're
> > thinking of here.  I can't name a program offhand that wants to
> > operate on binary data via a pipeline.  There are a few that *can*,
> > like gzip, but my impression is that even those aren't often used that
> > way anymore.
>
> Huh.  I use pipelines constructed in the shell for binary data
> regularly; I don't see any reason not to do that.  I'd certainly
> rather see the stdio streams be available as binary data, possibly
> with convenient text-centric wrappers also available.  But I'd be
> fine with constructing those myself.

I agree that binary pipelines are useful and should be possible. I
just don't think this should be the default behavior for stdin/stdout.

Since the binary stream underlying stdin is readily available as
sys.stdin.buffer (and ditto for stdout and even stderr) I don't think
any action needs to be taken. Note that the instance variable doesn't
start with an underscore. It's part of the public API for text files.
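
Going the other way, a text view can be layered over any binary
stream. A minimal sketch, assuming io.TextIOWrapper as found in
released Python 3; the helper text_view is hypothetical, and a BytesIO
object stands in for a real binary stream.

```python
import io


def text_view(binary_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream in a text layer with an explicit encoding."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


raw = io.BytesIO("na\u00efve\n".encode("latin-1"))
reader = text_view(raw, encoding="latin-1")
line = reader.readline()

# The underlying binary stream stays reachable through .buffer.
same_stream = reader.buffer is raw
```

This lets a program pick its own encoding for a stream instead of
relying on the default chosen for sys.stdin and sys.stdout.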

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Wed Sep 19 23:08:16 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 19 Sep 2007 14:08:16 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <18161.32698.291402.642086@montanaro.dyndns.org>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
Message-ID: 

On 9/19/07, skip at pobox.com  wrote:
>
>     Guido> You can repeat that until you're blue in the face but it's not
>     Guido> going to change. Way more programs (especially simple ones) deal
>     Guido> with text than with binary data.
>
> For us Unix-heads the notion that a file is anything other than a stream of
> bytes is rather foreign.  I understand that to a large degree if you made
> the world right for us the tail would be wagging the dog.

I think the key thing here is that Guido said "especially simple ones"
and the examples people are talking about are not overly simple (e.g.,
gzip, ImageMagick, etc.).  That would suggest that if you want to read
raw bytes from stdin or write them to stdout, you probably know what
you are doing, and thus accessing a 'buffer' attribute is probably not
difficult for you.  =)

-Brett

From fdrake at acm.org  Wed Sep 19 23:42:14 2007
From: fdrake at acm.org (Fred Drake)
Date: Wed, 19 Sep 2007 17:42:14 -0400
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
	<5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>
	
Message-ID: 

On Sep 19, 2007, at 5:00 PM, Guido van Rossum wrote:
> Since the binary stream underlying stdin is readily available as
> sys.stdin.buffer (and ditto for stdout and even stderr) I don't think
> any action needs to be taken. Note that the instance variable doesn't
> start with an underscore. It's part of the public API for text files.

Amazingly, that's good enough for me.  ;-)


   -Fred

-- 
Fred Drake   




From skip at pobox.com  Thu Sep 20 00:19:29 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 19 Sep 2007 17:19:29 -0500
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
	<5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>
	
Message-ID: <18161.41073.724795.594482@montanaro.dyndns.org>


    Guido> I agree that binary pipelines are useful and should be
    Guido> possible. I just don't think this should be the default behavior
    Guido> for stdin/stdout.

Binary has (like it or not) been the default behavior on all previous
Pythons running on Unix systems where text and binary were never different
(a view of the computing world which VMS ruined with its morass of file
types and which Windows NT lapped up like antifreeze).  The only time I ever
open a file with the "b" attribute is when I expect that code to run on
Windows (thankfully a rare occurrence for me).  Python 3 will obviously be
changing behavior in this regard for some of us, though as I indicated
before, satisfying those of us who hold this perspective (apparently just
Bill, Fred and me at this point) would probably be counter to the needs of
the Python community as a whole, fully 90% of whom think a pipe is made out
of PVC, not copper and can't understand what I'm telling them to type unless
I say

    grep -i python VERTICAL BAR lpr

instead of

    grep -i python PIPE lpr

Unix folks of course know to type '|' when you say PIPE and if you say
VERTICAL BAR they type 'vertical bar'.  But enough wistful reminiscing.  I
will shut up after one more parting shot:

Dennis-Ritchie-had-it-right-ly yr's,

Skip

From weilawei at gmail.com  Thu Sep 20 02:34:23 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Wed, 19 Sep 2007 20:34:23 -0400
Subject: [Python-3000] Implementing Abstract Interface for Numbers
Message-ID: 

This is the documentation for PyNumberMethods right now.

PyNumberMethods *tp_as_number;
XXX


I've managed to wrap GNU MP floats and add rich comparisons, but there's a
sore lack of documentation on how to implement the Number interface. Given a
bit of pointers on where to look, an alpha version of this extension will be
available tomorrow, most likely.

Thanks for the help.

Rob

From tjreedy at udel.edu  Thu Sep 20 03:24:04 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 19 Sep 2007 21:24:04 -0400
Subject: [Python-3000] New io system and binary data
References: 
	
Message-ID: 


"Guido van Rossum"  wrote in message 
news:ca471dc20709191019k2f5e16e5j75767b25ddf90e30 at mail.gmail.com...
| Changing the mode between text and binary is not feasible (since it
| would have to change the class). But it is perfectly acceptable to use
| sys.std{in,out}.buffer if you need to write a binary transparent
| filter.

In PEP 3116, the Buffered I/O section has
Additionally, the abstract base class provides one member variable:
    .raw
        A reference to the underlying RawIOBase object.

The Text I/O section does *not* have, but I presume should, similar lines 
about member variable .buffer.

Perhaps a note could be added that stdin/out will be Text I/O and that the 
bytes buffer is easily unwrapped via .buffer (and even via .raw).

While I sympathize with the initial surprise, I am willing to type .buffer 
should I need to.  The real problem is that 2to3.py cannot do so 
automatically (and be always, and probably not even usually, correct).
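
As a rough illustration of the binary transparent filter mentioned above
(a sketch only: the helper name copy_binary and the chunk size are made up
for this example, and in-memory streams stand in for the real standard
streams):

```python
import io


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst unchanged; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# A text stream wraps a byte stream, which is reachable through .buffer.
byte_stream = io.BytesIO(b"line1\r\nline2\x00\xff")
text_stream = io.TextIOWrapper(byte_stream, encoding="ascii")
assert text_stream.buffer is byte_stream

out = io.BytesIO()
copied = copy_binary(text_stream.buffer, out)
```

In an actual filter the call would presumably be
copy_binary(sys.stdin.buffer, sys.stdout.buffer).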

tjr




From greg.ewing at canterbury.ac.nz  Thu Sep 20 03:40:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 20 Sep 2007 13:40:06 +1200
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <9e804ac0709181840gb33da5co5db610aa2cd6bbeb@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz>
	<46EBB779.6090605@gmx.net> <46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz>
	<9e804ac0709181840gb33da5co5db610aa2cd6bbeb@mail.gmail.com>
Message-ID: <46F1CF76.20004@canterbury.ac.nz>

Thomas Wouters wrote:
> The only meaning is by accident of position. For most programs, the very 
> same thing goes for the rest of the arguments: 'mv foo bar' assigns a 
> different meaning to 'foo' than it does to 'bar'. Notice how sys.argv 
> matches what the user typed, including sys.argv[0].

But most users don't think of the 'mv' in 'mv foo bar'
as being an argument in any normal sense of the word.
It's the thing the arguments are passed *to*, not an
argument itself.

Also, most programs aren't interested in argv[0] at
all, and those that are treat it in a very different
way from the rest of argv.

I still think that argv[0] is in the "too clever by
half" category. It has a kind of theoretical elegance
from a certain point of view, but no practical
benefit that I can see.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From guido at python.org  Thu Sep 20 04:11:51 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 19:11:51 -0700
Subject: [Python-3000] Move argv[0]? (Re: Unicode and OS strings)
In-Reply-To: <46F1CF76.20004@canterbury.ac.nz>
References: <1189700532.22693.40.camel@qrnik>
	<46EA5114.9060200@coli.uni-saarland.de>
	<46EB0EC2.4030208@canterbury.ac.nz> <46EBB779.6090605@gmx.net>
	<46EC8909.4050300@canterbury.ac.nz>
	<9e804ac0709160824m3634437dseb2f0183580a7674@mail.gmail.com>
	<46EDFCBB.8010306@canterbury.ac.nz>
	<9e804ac0709181840gb33da5co5db610aa2cd6bbeb@mail.gmail.com>
	<46F1CF76.20004@canterbury.ac.nz>
Message-ID: 

On 9/19/07, Greg Ewing  wrote:
> I still think that argv[0] is in the "too clever by
> half" category. It has a kind of theoretical elegance
> from a certain point of view, but no practical
> benefit that I can see.

And I still think you're wasting your time on trivia.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep 20 04:13:15 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 19:13:15 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
	
	
Message-ID: 

Yeah, the PEP is pretty out of date (perhaps only surpassed by PEP
3135, super()). It's on my list to update it. This should definitely
be added.

On 9/19/07, Terry Reedy  wrote:
>
> "Guido van Rossum"  wrote in message
> news:ca471dc20709191019k2f5e16e5j75767b25ddf90e30 at mail.gmail.com...
> | Changing the mode between text and binary is not feasible (since it
> | would have to change the class). But it is perfectly acceptable to use
> | sys.std{in,out}.buffer if you need to write a binary transparent
> | filter.
>
> In PEP 3116, the Buffered I/O section has
> Additionally, the abstract base class provides one member variable:
>     .raw
>         A reference to the underlying RawIOBase object.
>
> The Text I/O section does *not* have, but I presume should, similar lines
> about member variable .buffer.
>
> Perhaps a note could be added that stdin/out will be Text I/O and that the
> bytes buffer is easily unwrapped via .buffer (and even via .raw).
>
> While I sympathize with the initial surprise, I am willing to type .buffer
> should I need to.  The real problem is that 2to3.py cannot do so
> automatically (and be always, and probably not even usually, correct).
>
> tjr
>
>
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu Sep 20 05:19:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 20 Sep 2007 15:19:20 +1200
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 
Message-ID: <46F1E6B8.9080609@canterbury.ac.nz>

Christian Heimes wrote:
> With Python 2.5 the input and output data isn't implicitly
> converted

Are you sure that's always true? What about systems
where newlines aren't \n?

> I recommend that the problem and fix gets documented. Maybe stdin,
> stdout and stderr should get a method that disables the implicit
> conversion like setMode("b") / setMode("t").

Or maybe another set of objects called stdbin, stdbout, stdberr.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Sep 20 05:38:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 20 Sep 2007 15:38:06 +1200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <18161.41073.724795.594482@montanaro.dyndns.org>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<2296611449423656759@unknownmsgid>
	
	<5ACD931A-2E92-4018-ABB2-22C626589961@acm.org>
	
	<18161.41073.724795.594482@montanaro.dyndns.org>
Message-ID: <46F1EB1E.1080402@canterbury.ac.nz>

skip at pobox.com wrote:
> Binary has (like it or not) been the default behavior on all previous
> Pythons running on Unix systems where text and binary were never different

Um, no, *text* has always been the default on all systems.
It's just that on systems where text and binary are the
same, you don't notice the difference. This has led some
Unix programmers into bad habits.

> The only time I ever
> open a file with the "b" attribute is when I expect that code to run on
> Windows

A more defensive approach is to always open with "b" when
you're dealing with binary data, then it will work even if
someone does happen to run it on Windows.

Programs following this philosophy won't have any problems
with Py3k (at least not from that source).
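
A small sketch of that habit, assuming a current Python 3 (the helper name
write_then_read_raw is invented here, and the newline= argument is used only
to imitate Windows-style text translation on any platform):

```python
import os
import tempfile


def write_then_read_raw(mode, data, **open_kwargs):
    """Write data with the given mode, then return the file's raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode, **open_kwargs) as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


payload = b"\x89PNG\r\n\x1a\n"

# Binary mode: the bytes reach the disk untouched on every platform.
binary_result = write_then_read_raw("wb", payload)

# Text mode with newline="\r\n" mimics Windows-style translation anywhere:
# each "\n" written becomes "\r\n" on disk.
text_result = write_then_read_raw("w", "a\nb\n", newline="\r\n",
                                  encoding="ascii")
```

The point is only that binary data written in text mode can be altered on
the way to disk, while "b" mode never alters it.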

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From weilawei at gmail.com  Thu Sep 20 05:49:29 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Wed, 19 Sep 2007 23:49:29 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
Message-ID: 

Okay, here's the barebones, scraped together version. It's ugly. It's
messy. It might eat your kids. On the other hand, it seems to work.

http://umass.glexia.net/mpf.tar.bz2

It provides a module, mpf, with an MPF type and a bunch of methods.
You can directly set the value of an MPF type by setting the value attribute
to a string containing a number.
You can read the value of an MPF type by getting value. Note that it will
come back as a tuple in the form
(base, sign, whole, decimal)

whole and decimal will be strings. This is to avoid losing precision. I
couldn't think of another way to easily work with values that can be
(theoretically) infinite.

If the number is 0, value will be None. If there isn't a whole part (or a
decimal part), its place in the tuple will be set to None.
If you want to change the default precision (128 bits), use the keyword prec
in the constructor. It takes a Long.

The MPF type implements rich comparisons. The MPF type does not (yet)
implement the Number methods. It will.
The methods provided by mpf are...

(these methods take two MPF values and return an MPF value):

mpf_add
mpf_sub
mpf_div
mpf_mul

(these methods take one MPF value and return... gasp.. one MPF value):

mpf_sqrt
mpf_abs
mpf_neg

(this method has a stub and does not exist yet):

mpf_pow

Sharp Edges:

If you try to set value to something weird, like a dictionary, it will
segfault. PyString_Check wasn't working for me. It's in there, but defined
out.
If you divide by zero, GNU MP will crash it saying "Floating point
exception." Same goes for... well, I forgot already. I've been hacking on
this nearly 24 hours straight and I'm new to both the Python API and GNU MP.

Anything else? Good luck, hope someone likes it. I'll be continuing to work
on it, hopefully eliminate some redundancy in the code, clean it up, flesh
out the floating point support. After that, it's on to integers.

Rob

From jyasskin at gmail.com  Thu Sep 20 08:07:19 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 19 Sep 2007 23:07:19 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
Message-ID: <5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>

I've attached a very preliminary patch for this. It makes bytes
immutable but doesn't do either of the other suggested changes. It's
enough to make the tests run, but doesn't do anything to make them
pass. The test results so far are:

270 tests OK.
28 tests failed:
    test_asynchat test_asyncore test_audioop test_base64 test_binascii
    test_bytes test_codecs test_ftplib test_httplib test_io
    test_logging test_mailbox test_marshal test_mhlib test_mmap
    test_old_mailbox test_poplib test_smtplib test_socket test_string
    test_tarfile test_telnetlib test_unicode test_univnewlines
    test_urllib2_localnet test_uuid test_xmlrpc test_zipimport
24 tests skipped:
    test_bsddb3 test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_curses test_gdbm
    test_largefile test_locale test_normalization test_ossaudiodev
    test_pep277 test_socket_ssl test_socketserver test_ssl
    test_startfile test_timeout test_urllib2net test_urllibnet
    test_winreg test_winsound test_xmlrpc_net test_zipfile64
1 skip unexpected on darwin:
    test_ssl
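
For reference, a short sketch of the properties under discussion in this
thread (immutability, hashing, indexing) as they behave in released
Python 3. This shows today's behaviour, not the preliminary patch: the
final design kept integer indexing and added a separate mutable bytearray
type.

```python
data = b"abc"

# bytes is immutable: item assignment raises TypeError.
try:
    data[0] = 0x41
    mutated = True
except TypeError:
    mutated = False

# Immutable bytes are hashable, so they can be dict keys or set members.
table = {data: "value"}

# Indexing returns a small int; slicing returns a length-1 bytes object.
first_item = data[0]
first_slice = data[0:1]

# bytearray is the mutable counterpart.
buf = bytearray(data)
buf[0] = 0x41
```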

On 9/18/07, Guido van Rossum  wrote:
> On 9/18/07, Jeffrey Yasskin  wrote:
> > I'll take it. I assume it's just a matter of removing the mutating
> > methods and making the tests pass?
>
> And adding __hash__. And (but this could be a separate, later change)
> switch indexing to return 1-char bytes arrays instead of small ints.
> And similar changes to the constructor.
>
> Of course, the devil is in the "making the tests pass".
>
> > I saw but didn't read a couple
> > threads about the buffer API... how much has to change there?
>
> The bytes buffer API should refuse requests for writable buffers.
>
> Since you're so close, please do interrupt me over IM to review
> incomplete work or ideas!
>
> --Guido
>
> > On 9/18/07, Guido van Rossum  wrote:
> > > No takers? What about those repeated +42 voters? Does anyone want
> > > immutable bytes enough to do a teensy bit of work?
> > >
> > > --Guido
> > >
> > > On 9/17/07, Guido van Rossum  wrote:
> > > > This may have passed in a thread where no-one was listening, so I'm
> > > > repeating it here.
> > > >
> > > > I'm considering the following option: bytes would always be immutable,
> > > > and for the few places (mostly in io.py) where a mutable bytes buffer
> > > > would be handy, we use the array module. Then it would also make sense
> > > > to make b[0] return a bytes array of length 1 instead of a small int
> > > > -- bytes would be more similar to str in 2.x, albeit completely
> > > > incompatible with str in terms of mixed operations.
> > > >
> > > > It would help if someone explored creating a patch to implement this,
> > > > just to see the minimum amount of code that would need to change
> > > > compared to 3.0a1. (The challenge includes making all the tests pass
> > > > again.)
> > > >
> > > > --
> > > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > > >
> > >
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
> >
> > --
> > Namasté,
> > Jeffrey Yasskin
> > http://jeffrey.yasskin.info/
> >
> > "Religion is an improper response to the Divine." -- "Skinny Legs and
> > All", by Tom Robbins
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins
-------------- next part --------------
A non-text attachment was scrubbed...
Name: preliminary_immutable_bytes.patch
Type: application/octet-stream
Size: 22657 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070919/c7b91d7e/attachment-0001.obj 

From lists at cheimes.de  Thu Sep 20 12:12:48 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 20 Sep 2007 12:12:48 +0200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F1E6B8.9080609@canterbury.ac.nz>
References:  <46F1E6B8.9080609@canterbury.ac.nz>
Message-ID: 

Greg Ewing wrote:
> Christian Heimes wrote:
>> With Python 2.5 the input and output data isn't implicitly
>> converted
> 
> Are you sure that's always true? What about systems
> where newlines aren't \n?

Windows is a strange beast. As far as I can remember the OS converts the
incoming and outgoing standard streams to Unix line endings \n. A true
binary standard stream on Windows needs some effort - unfortunately. :(

>> I recommend that the problem and fix gets documented. Maybe stdin,
>> stdout and stderr should get a method that disables the implicit
>> conversion like setMode("b") / setMode("t").
> 
> Or maybe another set of objects called stdbin, stdbout, stdberr.

I have given some thoughts to it while I was writing the initial mail. I
had the names stdinb, stdoutb and stderrb in mind but your names are
better. The problem with the binary stream lies in the fine detail. We
can't simply assign sys.stdout.buffer to sys.stdbout. I - as a Python
user - would expect that stdbout will always use the same backend as stdout:

Python sets
>>> sys.stdbout = sys.stdout.buffer

Now the user assigns a new file to stdout
>>> sys.stdout = file("myoutput", "w")

and blindly expects that
>>> sys.stdbout.write("data\ndata\n")
does the right thing.

A proxy like the following (untested) class might do the trick.

import sys
class StdBinaryFacade:
    def __init__(self, name):
        self._name = name

    def __getattr__(self, key):
        buffer = getattr(sys, self._name).buffer
        return getattr(buffer, key)

    def __repr__(self):
        # Use the class name: instances have no __name__ of their own.
        return "<%s for sys.%s at %i>" % (type(self).__name__, self._name,
                                          id(self))

>>> sys.stdbout = StdBinaryFacade("stdout")
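
A quick way to check that such a proxy follows a reassigned sys.stdout is
sketched below. It is only an illustration: the demo() helper is invented
for this purpose, an in-memory stream replaces a real file, and the facade
is repeated in trimmed form so that the snippet runs on its own.

```python
import io
import sys


class StdBinaryFacade:
    """Forward attribute access to the .buffer of the current sys.<name>."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, key):
        # Looked up on every access, so reassigning sys.stdout is honoured.
        return getattr(getattr(sys, self._name).buffer, key)


def demo():
    """Replace sys.stdout, write through the facade, return the bytes seen."""
    facade = StdBinaryFacade("stdout")
    backing = io.BytesIO()
    saved = sys.stdout
    sys.stdout = io.TextIOWrapper(backing, encoding="ascii")
    try:
        facade.write(b"data\ndata\n")
        return backing.getvalue()
    finally:
        sys.stdout = saved
```

Calling demo() should return the bytes written through the facade, which
shows that the write reached the replacement stream rather than the
original one.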

Christian


From eric+python-dev at trueblade.com  Thu Sep 20 12:58:31 2007
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 20 Sep 2007 06:58:31 -0400
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References:  <46F1E6B8.9080609@canterbury.ac.nz>
	
Message-ID: <46F25257.2030509@trueblade.com>

Christian Heimes wrote:
> Greg Ewing wrote:
>> Christian Heimes wrote:
>>> With Python 2.5 the input and output data isn't implicitly
>>> converted
>> Are you sure that's always true? What about systems
>> where newlines aren't \n?
> 
> Windows is a strange beast. As far as I can remember the OS converts the
> incoming and outgoing standard streams to Unix line endings \n. A true
> binary standard stream on Windows needs some effort - unfortunately. :(

To be precise, it's not the OS that does this, but rather the C runtime.

Eric.


From adam at hupp.org  Thu Sep 20 13:50:08 2007
From: adam at hupp.org (Adam Hupp)
Date: Thu, 20 Sep 2007 07:50:08 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: 
References: 
Message-ID: <766a29bd0709200450p26588f71x8526cd0d0ec64206@mail.gmail.com>

On 9/19/07, Rob Crowther  wrote:
> If you try to set value to something weird, like a dictionary, it will
> segfault. PyString_Check wasn't working for me. It's in there, but defined
> out.

I think you'll need to use PyUnicode_Check for that.

-- 
Adam Hupp | http://hupp.org/adam/

From adam at hupp.org  Thu Sep 20 15:46:36 2007
From: adam at hupp.org (Adam Hupp)
Date: Thu, 20 Sep 2007 09:46:36 -0400
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
Message-ID: <766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>

On 9/20/07, Jeffrey Yasskin  wrote:
> I've attached a very preliminary patch for this. It makes bytes
> immutable but doesn't do either of the other suggested changes. It's
> enough to make the tests run, but doesn't do anything to make them
> pass. The test results so far are:

I have fixes for the following:

test_asynchat
test_asyncore
test_bytes
test_string
test_base64
test_binascii
test_tarfile

I'll post a patch later today.


-- 
Adam Hupp | http://hupp.org/adam/

From martin at v.loewis.de  Thu Sep 20 15:51:16 2007
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Thu, 20 Sep 2007 15:51:16 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <200709191240.33698.victor.stinner@haypocalc.com>
References: <1189700532.22693.40.camel@qrnik>
	<200709191240.33698.victor.stinner@haypocalc.com>
Message-ID: <20070920155116.wiziy13ig4kksoc8@webmail.df.eu>

> On Linux, filenames are *byte* string and not *character* string.

That's not true, although this is a wide-spread misunderstanding.

The POSIX standard defines that the file names must be a superset
of the portable character set, which includes things such as '/',
which is the path separator.

> I always
> have this problem with Python 2.x. I converted filename (argv[x]) to Unicode
> to be able to format error messages in full unicode... but it's not possible.
> Linux allows invalid utf8 filename even on full utf8 installation (ubuntu),
> see Marcin's examples.

True. However, this does not mean that the file names are byte strings -
they are character strings in an unspecified/undetermined encoding.
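
One way to carry such names through a character-string API without losing
information is sketched here. It relies on the "surrogateescape" error
handler, which was added to Python after this thread (os.fsdecode and
os.fsencode use it on POSIX systems); the file name is made up.

```python
# Bytes that are not valid UTF-8, as a Linux file name may contain.
raw_name = b"report-\xff.txt"

# Strict decoding fails...
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...but the "surrogateescape" error handler maps each undecodable byte to a
# lone surrogate code point, so the original bytes can be recovered exactly.
text_name = raw_name.decode("utf-8", "surrogateescape")
round_trip = text_name.encode("utf-8", "surrogateescape")
```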

Regards,
Martin



From janssen at parc.com  Thu Sep 20 17:57:59 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 20 Sep 2007 08:57:59 PDT
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F1E6B8.9080609@canterbury.ac.nz> 
References:  <46F1E6B8.9080609@canterbury.ac.nz>
Message-ID: <07Sep20.085807pdt."57996"@synergy1.parc.xerox.com>

Greg Ewing writes:
> Christian Heimes writes:
> > I recommend that the problem and fix gets documented. Maybe stdin,
> > stdout and stderr should get a method that disables the implicit
> > conversion like setMode("b") / setMode("t").
> 
> Or maybe another set of objects called stdbin, stdbout, stdberr.

Nice idea, but it would have been a tad more true to the origin of the
names if "stdin", "stderr", and "stdout" were binary (as the re-use of
those fine names automatically implies to anyone who knows what
they're doing), and "textin", "textout", and "texterr" were the bogus
VMS/Windows corrupted versions of the dandy UNIX originals.

Bill

From guido at python.org  Thu Sep 20 19:08:03 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 20 Sep 2007 10:08:03 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <1462544809546634408@unknownmsgid>
References:  <46F1E6B8.9080609@canterbury.ac.nz>
	<1462544809546634408@unknownmsgid>
Message-ID: 

On 9/20/07, Bill Janssen  wrote:
> Greg Ewing writes:
> > Christian Heimes writes:
> > > I recommend that the problem and fix gets documented. Maybe stdin,
> > > stdout and stderr should get a method that disables the implicit
> > > conversion like setMode("b") / setMode("t").
> >
> > Or maybe another set of objects called stdbin, stdbout, stdberr.
>
> Nice idea, but it would have been a tad more true to the origin of the
> names if "stdin", "stderr", and "stdout" were binary (as the re-use of
> those fine names automatically implies to anyone who knows what
> they're doing), and "textin", "textout", and "texterr" were the bogus
> VMS/Windows corrupted versions of the dandy UNIX originals.

Oh for chrissakes. Can we stop the bikeshedding on this topic already?
Several people have already agreed that sys.stdin.buffer is good
enough. Please stop while you're ahead.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Thu Sep 20 20:03:53 2007
From: thomas at python.org (Thomas Wouters)
Date: Thu, 20 Sep 2007 11:03:53 -0700
Subject: [Python-3000] Implementing Abstract Interface for Numbers
In-Reply-To: 
References: 
Message-ID: <9e804ac0709201103s2b0d3392jdf8c19196c33efa9@mail.gmail.com>

On 9/19/07, Rob Crowther  wrote:
>
> This is the documentation for PyNumberMethods right now.
>
> PyNumberMethods *tp_as_number;
> XXX
>
>
> I've managed to wrap GNU MP floats and add rich comparisons, but there's a
> sore lack of documentation on how to implement the Number interface. Given a
> bit of pointers on where to look, an alpha version of this extension will be
> available tomorrow, most likely.


I'm not sure where you saw that 'XXX' -- are you looking at Py3k docs? In
that case, don't bother, the Numbers API has hardly changed, just use the
Python 2.5 docs. Or the Python 2.0 docs, as there's little difference ;)

But it's true there isn't all that much documentation on those parts. The
PyNumberMethods struct is really straightforward, you should be able to
guess what each function is supposed to do just by looking at the function
signature and the name. But when in doubt, the best place to go is the
Python source. Just look at Objects/intobject.c or Objects/longobject.c.
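
At the Python level each PyNumberMethods slot corresponds to a special
method, which may help when guessing what a slot is supposed to do. A
minimal sketch, with decimal.Decimal standing in for a GNU MP float (the
class below is hypothetical and is not the mpf extension's code):

```python
from decimal import Decimal


class MPF:
    """Toy number type; Decimal stands in for a GNU MP float."""

    def __init__(self, value):
        self.value = Decimal(value)

    def _binary(self, other, op):
        # Returning NotImplemented lets Python try the other operand.
        if not isinstance(other, MPF):
            return NotImplemented
        return MPF(op(self.value, other.value))

    def __add__(self, other):       # C slot: nb_add
        return self._binary(other, lambda a, b: a + b)

    def __sub__(self, other):       # nb_subtract
        return self._binary(other, lambda a, b: a - b)

    def __mul__(self, other):       # nb_multiply
        return self._binary(other, lambda a, b: a * b)

    def __truediv__(self, other):   # nb_true_divide
        return self._binary(other, lambda a, b: a / b)

    def __neg__(self):              # nb_negative
        return MPF(-self.value)

    def __abs__(self):              # nb_absolute
        return MPF(abs(self.value))


total = MPF("1.5") + MPF("2.25")
```

In this stand-in, division by zero raises an exception (a subclass of
ZeroDivisionError) instead of crashing the process, which is the behaviour
a C implementation would want to reproduce by checking the divisor first.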

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From jyasskin at gmail.com  Thu Sep 20 21:34:57 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 20 Sep 2007 12:34:57 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
Message-ID: <5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>

On 9/20/07, Adam Hupp  wrote:
> On 9/20/07, Jeffrey Yasskin  wrote:
> > I've attached a very preliminary patch for this. It makes bytes
> > immutable but doesn't do either of the other suggested changes. It's
> > enough to make the tests run, but doesn't do anything to make them
> > pass. The test results so far are:
>
> I have fixes for the following:
>
> test_asynchat
> test_asyncore
> test_bytes
> test_string
> test_base64
> test_binascii
> test_tarfile
>
> I'll post a patch later today.

Thanks for the help! This brings up a policy question: For patches
like the one I've attached here, do we want to start submitting them
now, or build up a mondo patch to fix them all at once?

-- 
Namasté,
Jeffrey Yasskin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sample_test_changes.patch
Type: application/octet-stream
Size: 2727 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070920/0aae4034/attachment-0001.obj 

From tjreedy at udel.edu  Thu Sep 20 22:09:51 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 20 Sep 2007 16:09:51 -0400
Subject: [Python-3000] Immutable bytes -- looking for volunteer
References: <5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com><5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com><766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
Message-ID: 


"Jeffrey Yasskin"  wrote in message 
news:5d44f72f0709201234vec00c4w13d41bf5c4bea8d7 at mail.gmail.com...
| On 9/20/07, Adam Hupp  wrote:
|| >
| > I have fixes for the following:
...
| > I'll post a patch later today.
|
| Thanks for the help! This brings up a policy question: For patches
| like the one I've attached here, do we want to start submitting them
| now, or build up a mondo patch to fix them all at once?

I think it OK to post patches to the tracker even if one does not intend 
for them to be immediately applied or even expect them to be combined with 
others.  Then they do not get lost (or mangled by the mail system) and are 
available to anyone.




From guido at python.org  Thu Sep 20 22:18:00 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 20 Sep 2007 13:18:00 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
Message-ID: 

On 9/20/07, Jeffrey Yasskin  wrote:
> On 9/20/07, Adam Hupp  wrote:
> > On 9/20/07, Jeffrey Yasskin  wrote:
> > > I've attached a very preliminary patch for this. It makes bytes
> > > immutable but doesn't do either of the other suggested changes. It's
> > > enough to make the tests run, but doesn't do anything to make them
> > > pass. The test results so far are:
> >
> > I have fixes for the following:
> >
> > test_asynchat
> > test_asyncore
> > test_bytes
> > test_string
> > test_base64
> > test_binascii
> > test_tarfile
> >
> > I'll post a patch later today.
>
> Thanks for the help! This brings up a policy question: For patches
> like the one I've attached here, do we want to start submitting them
> now, or build up a mondo patch to fix them all at once?

This is supposed to be an exploration of the possibilities. So either
you create a branch, where you can submit to your heart's content, or
you collect everything in one big jumbo patch (in which case I'd
recommend reserving a tracker item).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org  Fri Sep 21 00:48:14 2007
From: adam at hupp.org (Adam Hupp)
Date: Thu, 20 Sep 2007 18:48:14 -0400
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
Message-ID: <766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>

On 9/20/07, Jeffrey Yasskin  wrote:
>
> Thanks for the help! This brings up a policy question: For patches
> like the one I've attached here, do we want to start submitting them
> now, or build up a mondo patch to fix them all at once?

My changes are here:

http://bugs.python.org/issue1184

With that patch there are only two issues remaining (6 test failures).

-- 
Adam Hupp | http://hupp.org/adam/

From greg.ewing at canterbury.ac.nz  Fri Sep 21 03:00:38 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 21 Sep 2007 13:00:38 +1200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <07Sep20.085807pdt.57996@synergy1.parc.xerox.com>
References:  <46F1E6B8.9080609@canterbury.ac.nz>
	<07Sep20.085807pdt.57996@synergy1.parc.xerox.com>
Message-ID: <46F317B6.6070202@canterbury.ac.nz>

Bill Janssen wrote:

> Nice idea, but it would have been a tad more true to the origin of the
> names if "stdin", "stderr", and "stdout" were binary (as the re-use of
> those fine names automatically implies to anyone who knows what
> they're doing)

No, the names only imply that to Unix users who are ignorant
of the correct way to use the C stdio library portably.
Right from the beginning, binary mode was an option, and
if you didn't ask for it, you got text mode. The same thing
applies to stdin/out/err. Anyone using them to handle binary
data is and was writing non-portable code.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From jimjjewett at gmail.com  Fri Sep 21 16:00:38 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 21 Sep 2007 10:00:38 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
References: <1189700532.22693.40.camel@qrnik>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
Message-ID: 

On 9/18/07, James Y Knight  wrote:
> On Sep 18, 2007, at 11:11 AM, Guido van Rossum wrote:

> One of the more common things to do with command line arguments is
> open them. So, it'd really be nice if:

> python -c 'import sys; open(sys.argv[1])' [some filename]

> would always work, regardless of the current system encoding and what
> characters make up the filename.

(Outside ASCII), if you treat sys.argv as text, that is probably
impossible without filesystem support.  Before python even sees the
data, the terminal itself is allowed to change between canonical
equivalents, which have different binary representations.

It does sound like we need a way to get to the original bytes, similar
to sys.stdin.buffer.  Is it reasonable to expose sys.argv.buffer?
(Since this would be bytes rather than text, I assume this would be a
single array, rather than a list of already separated arguments.)

Similarly, could os.environ have a bytes mirror, where the keys and
values are (immutable) bytes?
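
For comparison, a rough sketch of how a bytes view of both can be obtained
with facilities that later Python 3 releases provide (os.fsencode,
os.environb, os.supports_bytes_environ). None of these existed when this was
written, and the helper names argv_as_bytes and environ_as_bytes are made up
for illustration. Note that the arguments stay a list rather than a single
array.

```python
import os


def argv_as_bytes(argv):
    """Return each argument re-encoded to bytes via os.fsencode()."""
    return [os.fsencode(arg) for arg in argv]


def environ_as_bytes():
    """Return a bytes-keyed copy of the environment."""
    if os.supports_bytes_environ:
        # POSIX: the interpreter keeps a bytes mirror of os.environ.
        return dict(os.environb)
    # Elsewhere (e.g. Windows): encode the text environment explicitly.
    return {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}
```

Typical use would be argv_as_bytes(sys.argv[1:]); whether the result matches
the bytes the OS originally handed over depends on how the interpreter
decoded them in the first place.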

-jJ

From p.f.moore at gmail.com  Fri Sep 21 16:41:03 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 21 Sep 2007 15:41:03 +0100
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
Message-ID: <79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>

On 21/09/2007, Jim Jewett  wrote:
> (Outside ASCII), if you treat sys.argv as text, that is probably
> impossible without filesystem support.  Before python even sees the
> data, the terminal itself is allowed to change between canonical
> equivalents, which have different binary representations.

Please note - this statement is Unix specific. The situation on
Windows is entirely different (the fact that the CRT on Windows
emulates some aspects of the Unix semantics is not relevant here - you
need to understand the underlying OS model).

If you want to redesign things (and I don't, personally, believe that
is a good idea) then make sure you don't base your design solely on
Unix semantics.

Paul.

From jimjjewett at gmail.com  Fri Sep 21 16:45:51 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 21 Sep 2007 10:45:51 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<18155.9131.229187.756043@uwakimon.sk.tsukuba.ac.jp>
	<1190056321.14217.21.camel@qrnik>
	<18159.20285.252979.634446@uwakimon.sk.tsukuba.ac.jp>
	<1190106739.23701.17.camel@qrnik>
	<18160.14041.403941.778059@uwakimon.sk.tsukuba.ac.jp>
	
	
	
	
Message-ID: 

On 9/18/07, Guido van Rossum  wrote:
> On 9/18/07, Jim Jewett  wrote:

> > ... given that defenc is now always UTF-8, won't exposing
> > it in the public typedef then just be an attractive nuisance?

> *ALL* fields of the struct def are strictly internal.

Is that policy documented somewhere?  I didn't get that impression
from the C API, the Extending and Embedding document, or from the
header itself.  In the header, it was above the "public API" line, but
so were things like Py_UNICODE_REPLACEMENT_CHARACTER, and it does
start with Py rather than _Py.  Other declarations, such as
_PyUnicode_AsDefaultEncodedString, were clearly marked as internal in
both comments and name.

> >  [My proposal to remove *str and *defenc from the definition in
> > the public .h file.]

> > As this would allow 3rd parties to create implementations specialized
> > for (and saving space on) smaller alphabets, without breaking C
> > extensions that stick to the public header files.

> That is not a supported use case.

Why not?  If it is just for lack of contributions, I'll shut up until
I find time.  But it sounds (and has sounded in the past) like a
policy decision -- and I want to know the reasoning behind it.

-jJ

From exarkun at divmod.com  Fri Sep 21 16:46:47 2007
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 21 Sep 2007 10:46:47 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
Message-ID: <20070921144647.8162.742888147.divmod.quotient.12306@ohm>

On Fri, 21 Sep 2007 10:00:38 -0400, Jim Jewett  wrote:
> [snip]
>
>It does sound like we need a way to get to the original bytes, similar
>to sys.stdin.buffer.  Is it reasonable to expose sys.argv.buffer?
>(Since this would be bytes rather than text, I assume this would be a
>single array, rather than a list of already separated arguments.)

Without commenting on whether this is a good idea overall or not, it
would be a list of already separated arguments rather than a single
array, because it is given to the C main() function as an array
of char*, not a single char*.

On Windows it's more complicated, but the same argument can probably
be applied (or it should also reflect the underlying system API on
Windows, which means on Windows it will be a single bytes object
instead of a list of them, but only on Windows. This goes beyond
even the 2.x level of low-level detail exposure).

Jean-Paul

From jimjjewett at gmail.com  Fri Sep 21 17:01:24 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 21 Sep 2007 11:01:24 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
Message-ID: 

On 9/21/07, Paul Moore  wrote:
> On 21/09/2007, Jim Jewett  wrote:
> > (Outside ASCII), if you treat sys.argv as text, that is probably
> > impossible without filesystem support.  Before python even sees the
> > data, the terminal itself is allowed to change between canonical
> > equivalents, which have different binary representations.

> Please note - this statement is Unix specific. The situation on
> Windows is entirely different (the fact that the CRT on Windows
> emulates some aspects of the Unix semantics is not relevant here - you
> need to understand the underlying OS model).

No; it is a consequence of Unicode.  The command shell (or other
program launcher) has the same freedom.

If you are using text (as opposed to bytes), then À can be either
U+00C0 or U+0041 U+0300.  If the file system makes a distinction,
then it is using bytes, and any program interacting with it needs* to
use bytes too.

* To be correct; in practice, the problems will occur rarely enough
that most people won't notice.
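
A small sketch of the point, using the standard unicodedata module: the
precomposed and decomposed spellings of "A with grave" are canonically
equivalent text but different code point sequences, and therefore different
bytes in any encoding. The helper name same_text is only illustrative.

```python
import unicodedata

composed = "\u00c0"      # LATIN CAPITAL LETTER A WITH GRAVE
decomposed = "A\u0300"   # LATIN CAPITAL LETTER A + COMBINING GRAVE ACCENT


def same_text(a, b):
    """Compare two strings up to canonical equivalence (via NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings compare unequal with ==, and their UTF-8 encodings differ,
yet same_text(composed, decomposed) is true. A filesystem that compares names
byte for byte would treat them as two different files.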

-jJ

From theller at ctypes.org  Fri Sep 21 17:18:01 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 21 Sep 2007 17:18:01 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <20070921144647.8162.742888147.divmod.quotient.12306@ohm>
References: 
	<20070921144647.8162.742888147.divmod.quotient.12306@ohm>
Message-ID: 

Jean-Paul Calderone wrote:
> On Fri, 21 Sep 2007 10:00:38 -0400, Jim Jewett  wrote:
>> [snip]
>>
>>It does sound like we need a way to get to the original bytes, similar
>>to sys.stdin.buffer.  Is it reasonable to expose sys.argv.buffer?
>>(Since this would be bytes rather than text, I assume this would be a
>>single array, rather than a list of already separated arguments.)
> 
> Without commenting on whether this is a good idea overall or not, it
> would not be a single array, rather than a list of already separated
> arguments, because it is given to the C main() function as an array
> of char*, not a single char*.
> 
> On Windows it's more complicated, but the same argument can probably
> be applied (or it should also reflect the underlying system API on
> Windows, which means on Windows it will be a single bytes object
> instead of a list of them, but only on Windows. This goes beyond
> even the 2.x level of low-level detail exposure).

I *hope* that on Windows, these objects will be unicode not bytes
objects - the wide windows api should be used to get these values.
No conversion needed.

Thomas


From murman at gmail.com  Fri Sep 21 17:22:29 2007
From: murman at gmail.com (Michael Urman)
Date: Fri, 21 Sep 2007 10:22:29 -0500
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
Message-ID: 

On 9/21/07, Jim Jewett  wrote:
> (Outside ASCII), if you treat sys.argv as text, that is probably
> impossible without filesystem support.  Before python even sees the
> data, the terminal itself is allowed to change between canonical
> equivalents, which have different binary representations.
>
> It does sound like we need a way to get to the original bytes, similar
> to sys.stdin.buffer.  Is it reasonable to expose sys.argv.buffer?

If there's not something straightforward to put in the ... below that
would allow simple iteration and processing of all files passed on the
command line, preferably interchangeably on both unix (where filenames
cannot necessarily be converted to Unicode) and Windows NT and up
(where filenames cannot necessarily be represented by bytestrings, and
arguments don't necessarily come in as bytes), then I will be one of
many disappointed people.

>>> arguments = ... # something equivalent to (python 2.x on unix) sys.argv[1:]
>>> for filename in arguments:
...     archive.add(filename) # definitely - akin to open(file)
...     print(filename, file=listing) # maybe - this makes too many assumptions

Obviously simple things like replacing an un(de/en)codable character
with '?' will fail - while they could be partially worked around by
using glob (assuming a one to one replacement, as processed by the
OS), that's just asking for unwitting corner-case behavior when
one file's name nearly matches another's that has a replaced
character.

I don't have a preference between sys.argv[1:] doing this like it
always has on unix, and tends to within a single locale on Windows;
the introduction of a new sys.arguments (either [0:] or [1:]); or even
some simple map(encode_step, sys.argv[1:]). Of course the problem with
the encode_step is unless it is a no-op on Windows, it can break
filenames as badly as decoding them will on unix, unless the common OS
interfaces all reverse the process (in which case doing it manually is
never necessary).
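
One way to keep the "definitely" and "maybe" cases apart, sketched under an
assumption that does not hold for the interpreter discussed here: that
undecodable bytes in an argument were smuggled into the str with the
"surrogateescape" error handler that later Python 3 releases use for
OS-supplied strings. The original string is then kept for open(), and only a
lossy copy is used for printing. The helper name for_display is made up.

```python
def for_display(filename):
    """Return a printable, possibly lossy, form of an OS-supplied name.

    Assumes the name was decoded as UTF-8 with errors="surrogateescape".
    """
    raw = filename.encode("utf-8", "surrogateescape")  # recover original bytes
    return raw.decode("utf-8", "replace")              # U+FFFD for bad bytes
```

In the loop above this would mean archive.add(filename) with the untouched
name, and print(for_display(filename), file=listing) for the listing.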

Michael
-- 
Michael Urman

From p.f.moore at gmail.com  Fri Sep 21 17:59:43 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 21 Sep 2007 16:59:43 +0100
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik> <46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
Message-ID: <79990c6b0709210859h4116a7ecx1d1e024f16698483@mail.gmail.com>

On 21/09/2007, Jim Jewett  wrote:
> If you are using text (as opposed to bytes), then À can be either
> U+00C0 or U+0041 U+0300.  If the file system makes a distinction,
> then it is using bytes, and any program interacting with it needs* to
> use bytes too.

OK. I don't know enough about Unicode (or this low a level of the
Windows API) to be sure. But it's certainly possible that under
Windows, the file system (API) doesn't make a distinction.

> * To be correct; in practice, the problems will occur rarely enough
> that most people won't notice.

Too right. The only explicit case of an issue that I'm aware of is the
one that started the thread, of a Unix system with incompatible
terminal and filesystem encodings (or was it extremely obscure shell
incantations? whatever, it was well beyond my level of Unix
knowledge).

I'd say YAGNI except that someone seems to have demonstrated a genuine
(if rare) need on Unix. I'll stick with YAGNI on Windows, though.
(Where's uncle Tim to point out that Windows is the better platform
when you need him? :-))

Paul.

PS I'm now so far out of my depth on Unicode issues that I'll drop out
of this thread at this point.

From guido at python.org  Fri Sep 21 18:52:52 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 21 Sep 2007 09:52:52 -0700
Subject: [Python-3000] Py3k Trivia :-)
Message-ID: 

Would you believe there's a Curious George episode named "Curious
George Vs The Turbo Python 3000"?

"""
George isn't tall enough to ride the greatest rollercoaster of all
time, The Turbo Python 3000. He uses licorice whips to measure his
height and determines that he is 7-whips tall, one short of the 8-whip
minimum!
"""

http://pbskids.org/curiousgeorge/parentsteachers/program/ep_desc_3.html

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Wed Sep 19 22:28:08 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 19 Sep 2007 22:28:08 +0200
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 	
		
	<-7804278669952876495@unknownmsgid>
	
Message-ID: <46F18658.40805@cheimes.de>

Guido van Rossum wrote:
> You can repeat that until you're blue in the face but it's not going
> to change. Way more programs (especially simple ones) deal with text
> than with binary data.

I have to agree with Guido. The new behavior is much better than the
default in Python 2.x. It seems that I'm the first user with a use case
which requires a binary stdin and stdout.

I can imagine two problems with the new way. The problems should have a
documented answer and best way to deal with them:

 * stdin or stdout are used in binary mode

 * stdin or stdout have to deal with data in a different encoding than UTF-8
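
A minimal sketch of how both cases might be handled, assuming the text
streams expose their underlying binary stream as a .buffer attribute (as
sys.stdin.buffer is described elsewhere on this list). The helper names
copy_binary and reencoded are hypothetical, not part of the io module.

```python
import io
import sys


def copy_binary(src=None, dst=None, chunk_size=65536):
    """Copy raw bytes from src to dst without decoding; return the count."""
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    dst.flush()
    return total


def reencoded(stream, encoding, errors="strict"):
    """Wrap the binary buffer behind a text stream with another encoding."""
    return io.TextIOWrapper(stream.buffer, encoding=encoding, errors=errors)
```

For example, reencoded(sys.stdin, "latin-1") would read the same input bytes
as Latin-1 text; it should be created before anything has been read through
the original text stream, since that stream may already have buffered data.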

Christian

From arvind1.singh at gmail.com  Fri Sep 21 19:31:10 2007
From: arvind1.singh at gmail.com (Arvind Singh)
Date: Fri, 21 Sep 2007 23:01:10 +0530
Subject: [Python-3000] decorators for variable assignments?
Message-ID: 

Hi,

We have function and class decorators. Can we also have decorators for
variable assignments?

For example:
  @validate_proxy
  proxy = "http://user:passwd at host:port/"

be a syntactical sugar for:
  proxy = validate_proxy("http://user:passwd at host:port/")


Python is often used as a configuration language (small utility
scripts, Django, etc.) and it makes more sense to have the validation
of a user supplied configuration value at the time of assignment
rather than leaving the burden of validation on every piece of code
that uses it. Although both approaches can be used for it, a user will
be more hesitant to edit the string in the latter form (the former one
is also more readable IMHO).


Arvind

PS: I hope it's not something too radical to talk so late about.

From guido at python.org  Fri Sep 21 19:40:55 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 21 Sep 2007 10:40:55 -0700
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: 
References: 
Message-ID: 

None of the arguments for function and class decorators apply here.

On 9/21/07, Arvind Singh  wrote:
> Hi,
>
> We have function and class decorators. Can we also have decorators for
> variable assignments?
>
> For example:
>   @validate_proxy
>   proxy = "http://user:passwd at host:port/"
>
> be a syntactical sugar for:
>   proxy = validate_proxy("http://user:passwd at host:port/")
>
>
> Python is often used as a configuration language (small utility
> scripts, Django, etc.) and it makes more sense to have the validation
> of a user supplied configuration value at the time of assignment
> rather than leaving the burden of validation on every piece of code
> that uses it. Although both approaches can be used for it, a user will
> be more hesitant to edit the string in the latter form (the former one
> is also more readable IMHO).
>
>
> Arvind
>
> PS: I hope it's not something too radical to talk so late about.
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Fri Sep 21 23:51:57 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 21 Sep 2007 17:51:57 -0400
Subject: [Python-3000] decorators for variable assignments?
References: 
Message-ID: 

|  @validate_proxy
|  proxy = "http://user:passwd at host:port/"
|
| be a syntactical sugar for:
|  proxy = validate_proxy("http://user:passwd at host:port/")

Sorry, to me, this is syntactical pepper -- or worse ;-)

tjr 




From tjreedy at udel.edu  Sat Sep 22 00:17:13 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 21 Sep 2007 18:17:13 -0400
Subject: [Python-3000] Unicode and OS strings
References: <1189700532.22693.40.camel@qrnik><46EB0DC0.3050906@canterbury.ac.nz><87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp><46EB6EA1.5020104@v.loewis.de><87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp><1190070414.20673.12.camel@qrnik><18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp><32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
Message-ID: 


"Michael Urman"  wrote in message 
news:dcbbbb410709210822p354ef608o6cd01994a67710f1 at mail.gmail.com...

| If there's not something straightforward to put in the ... below that
| would allow simple iteration and processing of all files passed on the
| command line, preferably interchangeably on both unix (where filenames
| cannot necessarily be converted to Unicode) and Windows NT and up
| (where filenames cannot necessarily be represented by bytestrings, and
| arguments don't necessarily come in as bytes), then I will be one of
| many disappointed people.

Perhaps we need one or more library functions (generators) to hide the OS 
differences and corner-case details.





From arvind1.singh at gmail.com  Sat Sep 22 00:23:54 2007
From: arvind1.singh at gmail.com (Arvind Singh)
Date: Sat, 22 Sep 2007 03:53:54 +0530
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: 
References: 
	
Message-ID: 

> |  @validate_proxy
> |  proxy = "http://user:passwd at host:port/"
> |
> | be a syntactical sugar for:
> |  proxy = validate_proxy("http://user:passwd at host:port/")
>
> Sorry, to me, this is syntactical pepper -- or worse ;-)

"Poison" perhaps? Then, maybe we can have Poisonous Python! :-)

-- 
Regards,
Arvind

From nicko at nicko.org  Sat Sep 22 02:18:48 2007
From: nicko at nicko.org (Nicko van Someren)
Date: Sat, 22 Sep 2007 01:18:48 +0100
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: 
References: 
	
Message-ID: <2EE33BBD-C301-4CBF-BCCB-3BBE9204EF56@nicko.org>

On 21 Sep 2007, at 22:51, Terry Reedy wrote:

> |  @validate_proxy
> |  proxy = "http://user:passwd at host:port/"
> |
> | be a syntactical sugar for:
> |  proxy = validate_proxy("http://user:passwd at host:port/")
>
> Sorry, to me, this is syntactical pepper -- or worse ;-)

I'm thinking it tends towards "syntactic hákarl" :-)

	Nicko

[*] http://en.wikipedia.org/wiki/Hakarl


From greg.ewing at canterbury.ac.nz  Sat Sep 22 03:05:53 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 22 Sep 2007 13:05:53 +1200
Subject: [Python-3000] Py3k Trivia :-)
In-Reply-To: 
References: 
Message-ID: <46F46A71.1060409@canterbury.ac.nz>

Guido van Rossum wrote:
> """
> George isn't tall enough to ride the greatest rollercoaster of all
> time, The Turbo Python 3000. He uses licorice whips to measure his
> height and determines that he is 7-whips tall, one short of the 8-whip
> minimum!
> """

Fantastic! I vote that we hereby adopt the licorice whip
as the standard unit for measuring the speed of Python 3.0
implementations, with the speed of 2.6 (whatever it turns
out to be) defined as 7 whips.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Sep 22 03:08:53 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 22 Sep 2007 13:08:53 +1200
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: 
References: 
	
Message-ID: <46F46B25.6030907@canterbury.ac.nz>

On 9/21/07, Arvind Singh  wrote:

>  @validate_proxy
>  proxy = "http://user:passwd at host:port/"
>
> it makes more sense to have the validation
> of a user supplied configuration value at the time of assignment
> rather than leaving the burden of validation on every piece of code
> that uses it.

Why can't you just use a property that does the validation
in its set method?
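
Something along these lines, as a sketch only: the class name Settings and
the particular validity check are invented for illustration and are not
taken from the original proposal.

```python
from urllib.parse import urlsplit


class Settings:
    """Hypothetical configuration holder with a validated attribute."""

    def __init__(self):
        self._proxy = None

    @property
    def proxy(self):
        return self._proxy

    @proxy.setter
    def proxy(self, value):
        # Validate once, at assignment time, instead of at every use.
        parts = urlsplit(value)
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError("not a usable proxy URL: %r" % (value,))
        self._proxy = value
```

With this, settings.proxy = "http://user:passwd@host:8080/" is accepted,
while settings.proxy = "nonsense" raises ValueError at the point of
assignment, and no new syntax is needed.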

--
Greg

From oliphant.travis at ieee.org  Sat Sep 22 03:40:18 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri, 21 Sep 2007 20:40:18 -0500
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
Message-ID: 

Guido van Rossum wrote:
> This may have passed in a thread where no-one was listening, so I'm
> repeating it here.
> 
> I'm considering the following option: bytes would always be immutable,
> and for the few places (mostly in io.py) where a mutable bytes buffer
> would be handy, we use the array module. Then it would also make sense
> to make b[0] return a bytes array of length 1 instead of a small int
> -- bytes would be more similar to str in 2.x, albeit completely
> incompatible with str in terms of mixed operations.

If it is decided to make bytes immutable (which sounds good to me), 
then I want to add my voice to those that clamor for an additional 
mutable object capable of allocating chunks of memory.

This object should have a C-API and have its structure exposed to 
extension module writers (thus array.array does not fit the bill -- but 
might be a prototype if some of it is moved over to the Objects 
directory and given an API).
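
For reference, a short sketch of how this split looks at the Python level in
Python 3 as eventually released, which is not necessarily what was being
proposed at this point in the thread: bytes is immutable, bytearray is the
built-in mutable counterpart, and memoryview gives a window onto its memory.
Indexing a bytes object there yields an int, unlike the option quoted above.

```python
frozen = b"abc"
try:
    frozen[0] = 0x41            # item assignment on bytes is rejected
except TypeError:
    frozen_is_immutable = True
else:
    frozen_is_immutable = False

buf = bytearray(4)              # four zero bytes, mutable in place
buf[0] = 0x41
view = memoryview(buf)
view[1:3] = b"BC"               # writes through to buf without copying
```

After this runs, buf holds b"ABC\x00", while frozen is unchanged.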

-Travis Oliphant




From guido at python.org  Sat Sep 22 03:55:10 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 21 Sep 2007 18:55:10 -0700
Subject: [Python-3000] Py3k Trivia :-)
In-Reply-To: <46F46A71.1060409@canterbury.ac.nz>
References: 
	<46F46A71.1060409@canterbury.ac.nz>
Message-ID: 

On 9/21/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
> > """
> > George isn't tall enough to ride the greatest rollercoaster of all
> > time, The Turbo Python 3000. He uses licorice whips to measure his
> > height and determines that he is 7-whips tall, one short of the 8-whip
> > minimum!
> > """
>
> Fantastic! I vote that we hereby adopt the licorice whip
> as the standard unit for measuring the speed of Python 3.0
> implementations, with the speed of 2.6 (whatever it turns
> out to be) defined as 7 whips.

Ah, but is 6 whips faster or slower than 7 whips?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Sep 22 04:03:44 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 21 Sep 2007 19:03:44 -0700
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: <2EE33BBD-C301-4CBF-BCCB-3BBE9204EF56@nicko.org>
References: 
	
	<2EE33BBD-C301-4CBF-BCCB-3BBE9204EF56@nicko.org>
Message-ID: 

Can we stop this already? The idea is dead. No need to drag it through
the mud around town for an extended period of time.

On 9/21/07, Nicko van Someren  wrote:
> On 21 Sep 2007, at 22:51, Terry Reedy wrote:
>
> > |  @validate_proxy
> > |  proxy = "http://user:passwd at host:port/"
> > |
> > | be a syntactical sugar for:
> > |  proxy = validate_proxy("http://user:passwd at host:port/")
> >
> > Sorry, to me, this is syntactical pepper -- or worse ;-)
>
> I'm thinking it tends towards "syntactic hákarl" :-)
>
>         Nicko
>
> [*] http://en.wikipedia.org/wiki/Hakarl


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Sat Sep 22 07:48:40 2007
From: martin at v.loewis.de (martin at v.loewis.de)
Date: Sat, 22 Sep 2007 07:48:40 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
Message-ID: <20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>

Quoting Jim Jewett:

> On 9/21/07, Paul Moore  wrote:
>> On 21/09/2007, Jim Jewett  wrote:
>> > (Outside ASCII), if you treat sys.argv as text, that is probably
>> > impossible without filesystem support.  Before python even sees the
>> > data, the terminal itself is allowed to change between canonical
>> > equivalents, which have different binary representations.
>
>> Please note - this statement is Unix specific. The situation on
>> Windows is entirely different (the fact that the CRT on Windows
>> emulates some aspects of the Unix semantics is not relevant here - you
>> need to understand the underlying OS model).
>
> No; it is a consequence of Unicode.  The command shell (or other
> program launcher) has the same freedom.

I'm not quite sure what you are talking about here (what "same"
freedom?), but Paul is right: your statement *is* Unix specific,
and the situation *is* different on Windows.

argc/argv does not exist on Windows (that you seem to see it
anyway is an illusion), and if it did exist, it would be characters,
not bytes. "Canonical equivalents" is not a property of bytes,
but of Unicode characters (code points specifically).

Also, I'm not quite sure why you think the file system has
to do anything with sys.argv (unless your understanding of
what a "filesystem" is differs from mine).

Regards,
Martin


From qrczak at knm.org.pl  Sat Sep 22 10:18:34 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Sat, 22 Sep 2007 10:18:34 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>
	<87tzpx7hhj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB0DC0.3050906@canterbury.ac.nz>
	<87ps0kmw3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46EB6EA1.5020104@v.loewis.de>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
Message-ID: <1190449114.30559.11.camel@qrnik>

On Fri, 21 Sep 2007 at 10:00 -0400, Jim Jewett wrote:

> Is it reasonable to expose sys.argv.buffer?
> (Since this would be bytes rather than text, I assume this would be a
> single array, rather than a list of already separated arguments.)

On Unix the arguments are already separated on the OS level. It's the
shell which usually separates them if they were previously written with
spaces between (and understands quotes and other things). The execve()
system call obtains them separated, and the program receives them
separated.

Each Unix argument is a null-terminated array of bytes, i.e. only 0
bytes are disallowed, and the OS does not mangle the contents.

Of course people typically interpret these bytes as characters in a
guessed encoding, and the encoding is always a superset of ASCII.

On Windows the arguments are not separated, the whole command line is a
single string with spaces and possible quotes left for the program to
possibly interpret as separate arguments (unless something has changed
in the last 10 years). I believe it's an array of 16-bit code units,
typically meant to be interpreted as UTF-16, but without checking that
it's a well-formed UTF-16 sequence. I suppose that any 16-bit word
except 0 is allowed, but I'm not sure.
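
To illustrate the Unix side only: the splitting that the shell performs
before execve() can be approximated with the standard shlex module, which
follows POSIX shell quoting rules. This is a sketch; it does not model the
Windows rules, which differ, and the function name is made up.

```python
import shlex


def split_like_a_posix_shell(command_line):
    """Split a command line string using POSIX shell quoting rules."""
    return shlex.split(command_line)
```

For instance, the string 'prog "my file.txt" other' comes out as three
separate arguments, with the quotes consumed and the embedded space kept.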

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From p.f.moore at gmail.com  Sat Sep 22 14:05:30 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 22 Sep 2007 13:05:30 +0100
Subject: [Python-3000] decorators for variable assignments?
In-Reply-To: 
References: 
	
	<2EE33BBD-C301-4CBF-BCCB-3BBE9204EF56@nicko.org>
	
Message-ID: <79990c6b0709220505k1c123289pb1471dc03f584cc8@mail.gmail.com>

On 22/09/2007, Guido van Rossum  wrote:
> Can we stop this already? The idea is dead. No need to drag it through
> the mud around town for an extended period of time.

It's not dead, it's just pining for the fjords.

Sorry, couldn't resist :-)
Paul.

From jimjjewett at gmail.com  Sat Sep 22 21:11:34 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 22 Sep 2007 15:11:34 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
References: <1189700532.22693.40.camel@qrnik>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
Message-ID: 

On 9/22/07, martin at v.loewis.de  wrote:
> Quoting Jim Jewett:
>
> > On 9/21/07, Paul Moore  wrote:
> >> On 21/09/2007, Jim Jewett  wrote:

[The original context, expressed with some detail by Michael Urman in
http://mail.python.org/pipermail/python-3000/2007-September/010621.html
was that it must be possible to treat command line arguments as filenames.]

> >> > (Outside ASCII), if you treat sys.argv as text, that is probably
> >> > impossible without filesystem support.  Before python even sees the
> >> > data, the terminal itself is allowed to change between canonical
> >> > equivalents, which have different binary representations.

> > No; it is a consequence of Unicode.  The command shell (or other
> > program launcher) has the same freedom.

> I'm not quite sure what you are talking about here (what "same"
> freedom?),

The same freedom to represent À as either U+00C0 or U+0041 U+0300.

> argc/argv does not exist on Windows (that you seem to see it
> anyway is an illusion), and if it did exist, it would be characters,
> not bytes. "Canonical equivalents" is not a property of bytes,
> but of Unicode characters (code points specifically).

> Also, I'm not quite sure why you think the file system has
> to do anything with sys.argv (unless your understanding of
> what a "filesystem" is differs from mine).

The filesystem is unrelated to sys.argv, except for the need to pass
filenames through argv.  If the filesystem is using bytes rather than
characters, then sys.argv must offer the same option, or else certain
scripts will (under some rare circumstances) fail.
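
The point about canonical equivalents can be made concrete with a small
sketch. It assumes the standard NFD decomposition of U+00C0 as the second
form (the message above does not name one); the helper name same_text is
invented for illustration.

```python
import unicodedata

# One code point: LATIN CAPITAL LETTER A WITH GRAVE.
composed = "\u00c0"
# Canonically equivalent form: 'A' followed by U+0300 COMBINING GRAVE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)          # False: different code point sequences
print(composed.encode("utf-8"))        # b'\xc3\x80'
print(decomposed.encode("utf-8"))      # b'A\xcc\x80'
print(same_text(composed, decomposed)) # True: equal once normalized
```

A filename that arrives in one form and is stored on disk in the other
will therefore not match byte for byte, which is the failure mode being
discussed.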

-jJ

From martin at v.loewis.de  Sat Sep 22 21:27:58 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 22 Sep 2007 21:27:58 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: 
References: <1189700532.22693.40.camel@qrnik>	
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<1190070414.20673.12.camel@qrnik>	
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>	
		
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>	
		
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>	
		
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
	
Message-ID: <46F56CBE.5010702@v.loewis.de>

> The filesystem is unrelated to sys.argv, except for the need to pass
> filenames through argv.  If the filesystem is using bytes rather than
> characters, then sys.argv must offer the same option, or else certain
> scripts will (under some rare circumstances) fail.

The same holds for file names on Windows - they aren't byte strings,
either.

Regards,
Martin

From charleshixsn at earthlink.net  Sun Sep 23 19:24:24 2007
From: charleshixsn at earthlink.net (Charles D Hixson)
Date: Sun, 23 Sep 2007 10:24:24 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 		<-7804278669952876495@unknownmsgid>
	
Message-ID: <46F6A148.6070107@earthlink.net>

Guido van Rossum wrote:
> On 9/19/07, Bill Janssen  wrote:
>   
>> This really isn't a UTF-8 problem.  It is the problem with file opens
>> defaulting to "text" mode instead of "binary" mode rearing its ugly
>> head again.
>>     
>
> You can repeat that until you're blue in the face but it's not going
> to change. Way more programs (especially simple ones) deal with text
> than with binary data.
>
>   
OTOH, almost all of that text is ASCII.  Even if the system mode is set 
to utf-8, ascii is still ascii.

Still, this won't affect me, much, as I rarely send anything complex via 
pipes.  (I know, I should.  It's more secure.  But the fact is, I 
don't.  I use files.)

But this is the kind of thing that could make dealing with, say, xpm 
files a real hassle.  (Probably won't, as ascii is still ascii, but it 
will introduce corner cases.)  A lot of the time what I'm really dealing 
with is bytes rather than characters.  I think of them as characters, 
and try to choose values that display nicely as characters, because 
that's the way that's been convenient for decades.  But they ARE bytes, 
sometimes signed bytes.  And this is going to mean that there are lots 
of cases where they don't map nicely to something that's trying to 
understand them as unicode.

So there needs to be an easy and obvious way to deal with files whose 
records are arrays of byte valued data...that is commonly manipulated by 
an editor using ascii-8.
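
A minimal sketch of the concern, assuming nothing beyond the builtin
bytes and str types as they ended up in released Python 3: byte values
above 127 do not decode as ASCII, while a one-to-one codec such as
latin-1 round-trips every byte. The variable names are illustrative only.

```python
# Two ASCII bytes ("Hi") followed by two byte values above 127.
raw = bytes([72, 105, 0xE9, 0xFF])

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # ASCII only covers 0..127, so 0xE9 is rejected.
    ascii_ok = False

# latin-1 maps each byte 0..255 to the code point with the same number,
# so decoding and re-encoding gives back the original bytes.
as_latin1 = raw.decode("latin-1")
round_trip = as_latin1.encode("latin-1")

# Reinterpret the same data as signed bytes, as mentioned above.
signed = [b - 256 if b > 127 else b for b in raw]

print(ascii_ok)           # False
print(round_trip == raw)  # True
print(signed)             # [72, 105, -23, -1]
```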


From martin at v.loewis.de  Sun Sep 23 19:36:53 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 23 Sep 2007 19:36:53 +0200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F6A148.6070107@earthlink.net>
References: 		<-7804278669952876495@unknownmsgid>	
	<46F6A148.6070107@earthlink.net>
Message-ID: <46F6A435.10203@v.loewis.de>

> So there needs to be an easy and obvious way to deal with files whose 
> records are arrays of byte valued data...that is commonly manipulated by 
> an editor using ascii-8.

Did you follow the thread at all? There is an easy and obvious way to
deal with such files.

Regards,
Martin

From charleshixsn at earthlink.net  Sun Sep 23 20:09:25 2007
From: charleshixsn at earthlink.net (Charles D Hixson)
Date: Sun, 23 Sep 2007 11:09:25 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: 
References: 		<-7804278669952876495@unknownmsgid>		<18161.32698.291402.642086@montanaro.dyndns.org>
	
Message-ID: <46F6ABD5.7010103@earthlink.net>

Brett Cannon wrote:
> On 9/19/07, skip at pobox.com  wrote:
>   
>>     Guido> You can repeat that until you're blue in the face but it's not
>>     Guido> going to change. Way more programs (especially simple ones) deal
>>     Guido> with text than with binary data.
>>
>> For us Unix-heads the notion that a file is anything other than a stream of
>> bytes is rather foreign.  I understand that to a large degree if you made
>> the world right for us the tail would be wagging the dog.
>>     
>
> I think the key thing here is that Guido said "especially simple ones"
> and the examples people are talking about are not overly simple (e.g.,
> gzip, ImageMagick, etc.).  That would suggest that if you want to read
> raw bytes from stdin or write them to stdout, you probably know what you
> are doing, and thus accessing a 'buffer' attribute is probably not
> difficult for you.  =)
>
> -Brett
>   
The problem here seems to be that this isn't currently well documented.  
I've got no objection to using the buffer attribute...but I've searched 
the documentation and haven't found any references to it that don't 
merely refer back to a PEP.  There's one reference about "the new buffer 
interface", but no further details.  There's a comment in the tutorial 
that says to see the library reference for more information...but there 
doesn't appear to be anything in the library reference to justify that 
comment.  Etc.
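
For readers looking for the missing description, here is a small sketch
of what the 'buffer' attribute refers to, written against the io module
as released in Python 3. It uses an in-memory io.BytesIO instead of the
real standard streams so that it can be run anywhere; sys.stdout.buffer
is used the same way.

```python
import io

# A text layer wrapped around a binary stream, as is done for sys.stdout.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

# Writing str goes through the encoder of the text layer.
text.write("caf\u00e9\n")
text.flush()  # push pending text down to the binary layer first

# The .buffer attribute is the underlying binary stream; it accepts bytes.
text.buffer.write(b"\x00\xff")

data = raw.getvalue()
print(text.buffer is raw)  # True
print(data)                # b'caf\xc3\xa9\n\x00\xff'
```

Flushing the text layer before writing to the binary layer keeps the two
kinds of output in the intended order.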

P.S.:  If opening files on Linux is now to be semantically meaningful, 
then the documentation on that section also needs to change.  Currently 
it appears to mean that it's a meaningless specification that will be 
ignored unless you happen to be using the MSWindows platform.

 

From martin at v.loewis.de  Sun Sep 23 20:48:20 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 23 Sep 2007 20:48:20 +0200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F6ABD5.7010103@earthlink.net>
References: 		<-7804278669952876495@unknownmsgid>		<18161.32698.291402.642086@montanaro.dyndns.org>	
	<46F6ABD5.7010103@earthlink.net>
Message-ID: <46F6B4F4.2060307@v.loewis.de>

> The problem here seems to be that this isn't currently well documented.  
> I've got no objection to using the buffer attribute...

Ok, then it seems you missed the obvious way: Open the file in binary
mode ('rb' or 'wb') if you want to read or write bytes. It has always
been that way in Python; the only change now is that it matters on
systems other than Windows.
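
A short sketch of the difference, assuming the open() semantics of
released Python 3; the file name sample.bin and the payload are made up
for the example.

```python
import os
import tempfile

# Mixed line endings plus two bytes that are not valid UTF-8.
payload = b"line one\r\nline two\n\xff\xfe"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")

    # 'wb' and 'rb': bytes in, bytes out, nothing is translated.
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode decodes to str and applies universal newline translation.
    with open(path, "r", encoding="latin-1") as f:
        as_text = f.read()

print(as_bytes == payload)  # True
print(repr(as_text))        # 'line one\nline two\nÿþ'
```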

Regards,
Martin

From skip.montanaro at gmail.com  Sun Sep 23 23:21:54 2007
From: skip.montanaro at gmail.com (Skip Montanaro)
Date: Sun, 23 Sep 2007 16:21:54 -0500
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F6ABD5.7010103@earthlink.net>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
Message-ID: <60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>

> P.S.:  If opening files on Linux is now to be semantically meaningful,
> then the documentation on that section also needs to change.  Currently
> it appears to mean that it's a meaningless specification that will be
> ignored unless you happen to be using the MSWindows platform.

I just checked in a change to the documentation for the builtin open function.
Please have a look at Doc/library/functions.rst and let me know if you
think more needs to be done.  Also, if there are other places in the
documentation
where it seems to imply that the distinction between text and binary modes is
meaningless on Unix systems, drop me a note and I'll have a look.

Skip

From skip at pobox.com  Sun Sep 23 23:07:02 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 23 Sep 2007 16:07:02 -0500
Subject: [Python-3000] More uniform treatment of files' newlines attribute?
Message-ID: <18166.54646.466616.838995@montanaro.dyndns.org>


While editing the documentation of the builtin open function, I noticed that
the newlines attribute can take on three different value types: None,
strings or tuples of strings.  It seems to me it would be better if it was
always a set containing the newline values seen so far.  There's no testing
necessary if you need to do something with the newlines you've seen, you
just loop over them:

    for nl in f.newlines:
        print("%r" % nl)

With the current mixed types metaphor you have to do something like this:

    if f.newlines is not None:
        if type(f.newlines) is tuple:
            for nl in f.newlines:
                print("%r" % nl)
        else:
            print("%r" % f.newlines)

This, of course, assumes the file has been opened in text mode.  If you have
a binary mode file you also have to call hasattr(f, "newlines").  Presumably
in most cases you'll know the file's mode without needing to check, but
maybe binary files should also have a newlines attribute which is always the
empty set.
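
Until something like that exists, the uniform view can be approximated
with a small wrapper. This is only a sketch: newlines_seen is a
hypothetical helper name, and it relies on the newlines attribute
behaving as described above (None, a string, or a tuple of strings).

```python
import io


def newlines_seen(f):
    """Return the newline styles seen so far as a frozenset (hypothetical helper)."""
    nl = getattr(f, "newlines", None)  # binary files may lack the attribute
    if nl is None:
        return frozenset()
    if isinstance(nl, str):
        return frozenset([nl])
    return frozenset(nl)


# Text stream with universal newlines (the default) and all three styles.
text = io.TextIOWrapper(io.BytesIO(b"a\nb\r\nc\rd"), encoding="ascii")
before = newlines_seen(text)
text.read()
after = newlines_seen(text)

# Binary stream: no newline tracking at all.
binary = io.BytesIO(b"a\nb")

for nl in sorted(after):
    print("%r" % nl)
```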

Skip

From charleshixsn at earthlink.net  Tue Sep 25 04:32:26 2007
From: charleshixsn at earthlink.net (Charles D Hixson)
Date: Mon, 24 Sep 2007 19:32:26 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
References: 	
		
	<-7804278669952876495@unknownmsgid>	
		
	<18161.32698.291402.642086@montanaro.dyndns.org>	
		
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
Message-ID: <46F8733A.2020908@earthlink.net>

Skip Montanaro wrote:
>> P.S.:  If opening files on Linux is now to be semantically meaningful,
>> then the documentation on that section also needs to change.  Currently
>> it appears to mean that it's a meaningless specification that will be
>> ignored unless you happen to be using the MSWindows platform.
>>     
>
> I just checked in a change to the documentation for the builtin open function.
> Please have a look at Doc/library/functions.rst and let me know if you
> think more needs to be done.  Also, if there are other places in the
> documentation
> where it seems to imply that the distinction between text and binary modes is
> meaningless on Unix systems, drop me a note and I'll have a look.
>
> Skip
>
>   
Yes, that says what I feel it should say.
(Well, I looked it up at 
http://docs.python.org/dev/3.0/library/functions.html?highlight=builtin ).
There's another place in the tutorial section
http://docs.python.org/dev/3.0/tutorial/inputoutput.html?highlight=open
and search for "On Windows and the Macintosh, 'b' appended to the mode 
opens the file in binary mode,"


From janssen at parc.com  Tue Sep 25 05:42:16 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 24 Sep 2007 20:42:16 PDT
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F8733A.2020908@earthlink.net> 
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
	<46F8733A.2020908@earthlink.net>
Message-ID: <07Sep24.204222pdt."57996"@synergy1.parc.xerox.com>

> Also, if there are other places in the
> > documentation
> > where it seems to imply that the distinction between text and binary modes is
> > meaningless on Unix systems, drop me a note and I'll have a look.

That's certainly the prescribed behavior for the C stdio streams on
POSIX-compliant systems.  I think a lot of the original design of the
Python I/O system was based on that C stdio system, including names
like stdin, stdout, and stderr.

Now that we've moved away from the C stdio model, and the distinction
between text and binary streams is meaningful even on POSIX systems,
perhaps we should also change those names to reflect that difference
from C.  Given that Py3K is a once-in-a-decade chance to break
backwards compatibility, and all.  Perhaps something like
sys.io.input, sys.io.output, sys.io.err, or something similar.

Bill

From jyasskin at gmail.com  Tue Sep 25 08:09:54 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Mon, 24 Sep 2007 23:09:54 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
Message-ID: <5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>

On 9/20/07, Adam Hupp  wrote:
> On 9/20/07, Jeffrey Yasskin  wrote:
> >
> > Thanks for the help! This brings up a policy question: For patches
> > like the one I've attached here, do we want to start submitting them
> > now, or build up a mondo patch to fix them all at once?
>
> My changes are here:
>
> http://bugs.python.org/issue1184
>
> With that patch there are only two issues remaining (6 test failures).

I've finally gotten around to tracking down the ParseTuple issue,
which turned out to fix all 6 remaining tests, and posted the patch to
the same issue. Thanks for the help! Guido, the patch isn't quite in a
form I'd want to commit, but the tests pass. What do you think?

-- 
Namasté,
Jeffrey Yasskin

From p.f.moore at gmail.com  Tue Sep 25 09:39:24 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 25 Sep 2007 08:39:24 +0100
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
Message-ID: <79990c6b0709250039q3cf5b6a5j3a37797b84fe43d3@mail.gmail.com>

On 22/09/2007, Travis Oliphant  wrote:
> If it is decided to make bytes immutable (which sounds good to me),
> then I want to add my voice to those that clamor for an additional
> mutable object capable of allocating chunks of memory.
>
> This object should have a C-API and have its structure exposed to
> extension module writers (thus array.array does not fit the bill -- but
> might be a prototype if some of it is moved over to the Objects
> directory and given an API).

Can you describe in a little more detail what you mean by "should have
a C-API"? I don't often work at the C level these days, so I may be
missing something obvious. The array module is built in, so it's
written in C - what needs to be exposed to qualify as a "C API"? And
why does the code need to move location to qualify?

(In case it's not clear, I'm thinking of having a look, and seeing if
I can help implement what you are after. No promises, given the amount
of free time I have, but with some hints I'll see how far I can get!)
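
For orientation at the Python level only (this says nothing about the
C API question), here is a sketch of the split as it exists in released
Python 3, where bytes is immutable and bytearray is the mutable
counterpart. Whether that matches what was being requested in this
thread is an assumption.

```python
header = b"PLIST"  # bytes: immutable

try:
    header[0] = 0x70
    mutated = True
except TypeError:
    # Item assignment on bytes is rejected.
    mutated = False

# bytearray: a mutable, preallocated chunk of memory (8 zero bytes here).
buf = bytearray(8)
buf[0:5] = header

# memoryview gives access to the same memory without copying it.
view = memoryview(buf)
view[5] = 0x21  # writes through to buf

# Take an immutable snapshot when the data should no longer change.
frozen = bytes(buf)

print(mutated)  # False
print(frozen)   # b'PLIST!\x00\x00'
```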

Paul.

From mark at qtrac.eu  Tue Sep 25 10:58:03 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Tue, 25 Sep 2007 09:58:03 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>
	
Message-ID: <200709250958.03993.mark@qtrac.eu>

On 2007-09-16, Arvind Singh wrote:
> > How do you get from "some keys can't be ordered" to "it doesn't make
> > sense for Python to have sorteddict or sortedset"?  If you want to use
> > keys that can't be ordered, then feel free to continue to use dict.
> > For situations in which ordering is important, that language should
> > support that.  When did this become an all or nothing proposition?
> > There's plenty of space for both dict and sorteddict.
>
> Sorry for premature conclusions. All I wanted to do was remind the
> potential problems with any "generic" implementation.
>
> And I did say, when ordering is important, we are left with two choices:
> 1) Sort explicitly (whenever required) and be prepared to handle exceptions
> raised during sort operation.
> 2) Have an implicitly "sorted" implementation and handle exceptions at every
> insertion.
>
> I, personally, tend to prefer the former solution. The latter case is useful
> when we have large objects and do a large number of insertions, in which
> case, per-insertion exception handling would be inefficient. The former case,
> in turn, can be slightly confusing and a bit harder to debug.

I can understand your personal preference for dict, although mine is for
sorteddict---but IMO Python should provide both since both are
legitimate in appropriate contexts. To this end I've put a posting on
comp.lang.python with subject:

    sorteddict PEP proposal [started off as orderedict]

If there is a positive response I will submit it to the PEP editors. If
there is not, I will just hope that someone else will pick up the idea,
even if in another form or with a different API, because I'd really like
to see some kind of sorted dictionary in Python's standard library. (I
also think there's a similar case for a sorted set.)
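
The first of the two choices quoted above (sort explicitly and be ready
for an exception) can be sketched with a plain dict. The sample data and
the fallback key function are made up for illustration, and the
behaviour shown assumes Python 3 comparison rules, where unrelated types
do not compare.

```python
scores = {"carol": 3, "alice": 1, "bob": 2}

# Sort explicitly at the point where order matters.
ordered = sorted(scores.items())

# Keys that cannot be compared with each other make the sort fail.
mixed = {"alice": 1, 42: 2}
try:
    sorted(mixed)
    sortable = True
except TypeError:
    sortable = False

# One possible workaround: supply a key function that is always comparable.
by_type_then_text = sorted(mixed, key=lambda k: (type(k).__name__, str(k)))

print(ordered)            # [('alice', 1), ('bob', 2), ('carol', 3)]
print(sortable)           # False
print(by_type_then_text)  # [42, 'alice']
```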

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From weilawei at gmail.com  Tue Sep 25 15:46:01 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Tue, 25 Sep 2007 09:46:01 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
Message-ID: <20070925094601.c151245c.weilawei@gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've uploaded the latest code to http://umass.glexia.net/mpf.tar.bz2

It's been cleaned up, implements a little bit of the abstract number interface, many very repetitive function declarations were turned into macros making it far easier to maintain, and it now has a printable representation like you'd expect from a float. At this point, I'm able to use it as a stripped down drop in replacement for Decimal. It's also much, much faster.

One question I was asked in IRC was if it was possible to change the precision. Currently, that's only implemented during initialization of an instance, by passing the prec keyword. It defaults to 128 bits, what looks to me to be about double the precision of a builtin float. 

Included in this is a copy of my git repository since I don't have it online. I'm going to be away for a while, and someone else may find it useful if they want to hack on it. (I have occasionally been known to screw up =P)

Well, that's all for today's update. Hopefully, more to come soon.

Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG+REZqR5p8HaX4oURAnJMAKCDxjO2YUnNrJVClujA0l8+wKSLkACeOc5F
097rKqJO6DoaLShfpA3oPsU=
=lB+r
-----END PGP SIGNATURE-----
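
As a rough check of the "about double the precision" remark in the
message above, the arithmetic can be sketched as follows. It assumes an
IEEE 754 double for the builtin float and takes the 128-bit default at
face value; decimal_digits is a helper invented for the example.

```python
import math
import sys

float_bits = sys.float_info.mant_dig  # 53 on IEEE 754 platforms
mpf_bits = 128                        # default precision stated above


def decimal_digits(bits):
    """Approximate number of decimal digits carried by a binary mantissa."""
    return bits * math.log10(2)


ratio = mpf_bits / float_bits

print(float_bits)
print(round(decimal_digits(float_bits), 1))
print(round(decimal_digits(mpf_bits), 1))
print(round(ratio, 2))
```

Under these assumptions the default works out to somewhat more than
double the mantissa width of a builtin float.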

From uche at ogbuji.net  Tue Sep 25 15:39:24 2007
From: uche at ogbuji.net (Uche Ogbuji)
Date: Tue, 25 Sep 2007 07:39:24 -0600
Subject: [Python-3000] New io system and binary data
In-Reply-To: <07Sep24.204222pdt.57996@synergy1.parc.xerox.com>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
	<46F8733A.2020908@earthlink.net>
	<07Sep24.204222pdt.57996@synergy1.parc.xerox.com>
Message-ID: <46F90F8C.6000301@ogbuji.net>

Bill Janssen wrote:
> That's certainly the prescribed behavior for the C stdio streams on
> POSIX-compliant systems.  I think a lot of the original design of the
> Python I/O system was based on that C stdio system, including names
> like stdin, stdout, and stderr.
>
> Now that we've moved away from the C stdio model, and the distinction
> between text and binary streams is meaningful even on POSIX systems,
> perhaps we should also change those names to reflect that difference
> from C.  Given that Py3K is a once-in-a-decade chance to break
> backwards compatibility, and all.  Perhaps something like
> sys.io.input, sys.io.output, sys.io.err, or something similar.
>   

+1, except I'd say "sys.io.error" for the latter.

-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/


From facundobatista at gmail.com  Tue Sep 25 18:06:40 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Tue, 25 Sep 2007 13:06:40 -0300
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: <20070925094601.c151245c.weilawei@gmail.com>
References: <20070925094601.c151245c.weilawei@gmail.com>
Message-ID: 

2007/9/25, Rob Crowther :

> a float. At this point, I'm able to use it as a stripped down drop in
> replacement for Decimal. It's also much, much faster.

Didn't understand this phrase. You're able to use it, after stripping
it down, as a replacement of Decimal? Or you're able to use it as a
replacement of a stripped down Decimal?

For the record: I don't have the "not invented here" syndrome. If you
find a replacement for Decimal that is faster than the current one, that's great!

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From guido at python.org  Tue Sep 25 19:10:31 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 10:10:31 -0700
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F90F8C.6000301@ogbuji.net>
References:  <-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
	<46F8733A.2020908@earthlink.net>
	<07Sep24.204222pdt.57996@synergy1.parc.xerox.com>
	<46F90F8C.6000301@ogbuji.net>
Message-ID: 

On 9/25/07, Uche Ogbuji  wrote:
> Bill Janssen wrote:
> > That's certainly the prescribed behavior for the C stdio streams on
> > POSIX-compliant systems.  I think a lot of the original design of the
> > Python I/O system was based on that C stdio system, including names
> > like stdin, stdout, and stderr.
> >
> > Now that we've moved away from the C stdio model, and the distinction
> > between text and binary streams is meaningful even on POSIX systems,
> > perhaps we should also change those names to reflect that difference
> > from C.  Given that Py3K is a once-in-a-decade chance to break
> > backwards compatibility, and all.  Perhaps something like
> > sys.io.input, sys.io.output, sys.io.err, or something similar.
> >
>
> +1, except I'd say "sys.io.error" for the latter.

-1. I could just say "the deadline for PEPs was last April" or "let's
stop bikeshedding", but I'd rather explain why I would have been
against this idea even if it was proposed with a proper PEP before the
deadline. Maybe it helps stem similar proposals.

In general the goal for Python 3000 is to change only things that are
genuine language warts (things that would remain stumbling blocks
forever if not fixed), and to leave everything else alone as much as
possible. I don't think the naming of sys.stdin and friends in Python
has ever confused anybody, regardless of whether they were amongst the
authors of the C standard library, or had never seen a line of C in
their life.

There are literally thousands of names in the standard library that
could be changed to conform to a better naming scheme, to be more
intuitive, to divorce them from an irrelevant legacy, or for whatever
other reason. Doing so would just cause endless annoyance for people
used to Python 2.x, at no real benefit for future users.

Python 3000 is boldly choosing to be backwards compatible, except in
cases where a real benefit can be obtained by being incompatible. This
is not such a case.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 25 19:18:50 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 10:18:50 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709250958.03993.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>
	
	<200709250958.03993.mark@qtrac.eu>
Message-ID: 

On 9/25/07, Mark Summerfield  wrote:
> I can understand your personal preference for dict, although mine is for
> sorteddict---but IMO Python should provide both since both are
> legitimate in appropriate contexts.

Careful what you wish for.

One of Python's strengths is that there is *not* a lot of choice in
data type implementations (unless you go to relatively obscure places
like the collections module or 3rd party extensions). This saves
programmers time because they don't have to decide what data type
implementation to use in cases where it doesn't matter (and that's the
majority of cases).

This is not a rationalization after the fact: it has always been a
specific design goal in Python to minimize the number of decisions
that a programmer must make up front. This goal also minimizes the
danger that the *wrong* decision is made, as the standard data types
are pretty darn good for almost any purpose.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 25 19:20:04 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 10:20:04 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
Message-ID: 

On 9/24/07, Jeffrey Yasskin  wrote:
> On 9/20/07, Adam Hupp  wrote:
> > On 9/20/07, Jeffrey Yasskin  wrote:
> > >
> > > Thanks for the help! This brings up a policy question: For patches
> > > like the one I've attached here, do we want to start submitting them
> > > now, or build up a mondo patch to fix them all at once?
> >
> > My changes are here:
> >
> > http://bugs.python.org/issue1184
> >
> > With that patch there are only two issues remaining (6 test failures).
>
> I've finally gotten around to tracking down the ParseTuple issue,
> which turned out to fix all 6 remaining tests, and posted the patch to
> the same issue. Thanks for the help! Guido, the patch isn't quite in a
> form I'd want to commit, but the tests pass. What do you think?

I'll have a good look at this today. Thanks for your efforts everyone!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From weilawei at gmail.com  Tue Sep 25 19:30:37 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Tue, 25 Sep 2007 13:30:37 -0400
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<66d0a6e10709151853w37b949a8i6b4ed2bcb709c064@mail.gmail.com>
	
	<200709250958.03993.mark@qtrac.eu>
	
Message-ID: <20070925133037.9405a211.weilawei@gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 25 Sep 2007 10:18:50 -0700
"Guido van Rossum"  wrote:

> On 9/25/07, Mark Summerfield  wrote:
> > I can understand your personal preference for dict, although mine is for
> > sorteddict---but IMO Python should provide both since both are
> > legitimate in appropriate contexts.
>
> This is not a rationalization after the fact: it has always been a
> specific design goal in Python to minimize the number of decisions
> that a programmer must make up front. This goal also minimizes the
> danger that the *wrong* decision is made, as the standard data types
> are pretty darn good for almost any purpose.

I ran into the issue of wanting an ordered dict recently. I was rather upset at having to redesign my data structures--at first. After reworking them to fit within the confines of an unordered dict, I realized that it actually worked better. 

This isn't to say there should be no such thing, but it really doesn't need to be a part of the standard library, imo. -1 vote for ordered dicts.

Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG+UW9qR5p8HaX4oURAtYmAKCX4xjNTyC7n2ksV/Jb6+ztrtd43ACglRF2
PGUqWUUviyMoWvg9cAO6otk=
=umXa
-----END PGP SIGNATURE-----

From mark at qtrac.eu  Tue Sep 25 19:43:12 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Tue, 25 Sep 2007 18:43:12 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709250958.03993.mark@qtrac.eu>
	
Message-ID: <200709251843.12325.mark@qtrac.eu>

On 2007-09-25, Guido van Rossum wrote:
> On 9/25/07, Mark Summerfield  wrote:
> > I can understand your personal preference for dict, although mine is for
> > sorteddict---but IMO Python should provide both since both are
> > legitimate in appropriate contexts.
>
> Careful what you wish for.
>
> One of Python's strengths is that there is *not* a lot of choice in
> data type implementations (unless you go to relatively obscure places
> like the collections module or 3rd party extensions). This saves
> programmers time because they don't have to decide what data type
> implementation to use in cases where it doesn't matter (and that's the
> majority of cases).
>
> This is not a rationalization after the fact: it has always been a
> specific design goal in Python to minimize the number of decisions
> that a programmer must make up front. This goal also minimizes the
> danger that the *wrong* decision is made, as the standard data types
> are pretty darn good for almost any purpose.

My proposal was for the sorteddict to be put in the collections module,
not as a builtin. One of the things I particularly like about Python is
that the core language is small.

However, I think that the collections module is rather thin, and as you
say, it is "obscure" so won't get in the way of inexperienced or casual
users if it is beefed up a bit, yet could be really useful to more
demanding users.

On comp.lang.python, a respondent called Paul Hankin suggested a
somewhat different approach to mine: he proposed a sorteddict with the
same API as a dict but with a constructor that is similar to the
sorted() function:

    sorteddict((mapping | sequence | nothing), cmp=None, key=None,
               reverse=None)

He points out that this has a problem with keyword argument
dictionaries, but that one solution is sorteddict(dict(**kwargs), ...).

From comments other people have made on this list and on
comp.lang.python, it may be that Paul Hankin's approach is more popular
and better than the one I proposed---the only downside being that he
didn't give any hints as to an implementation.
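
To give a feel for that constructor, here is a deliberately naive
sketch. It is not Paul Hankin's implementation (none was given); the
class name SortedDict and all details are assumptions. It re-sorts on
every iteration, leaves out cmp (Python 3's sorted() takes only key, and
functools.cmp_to_key can adapt an old-style comparison), and uses False
rather than None as the default for reverse.

```python
class SortedDict(dict):
    """Hypothetical sketch: a dict that iterates in sorted key order."""

    def __init__(self, data=(), key=None, reverse=False):
        # data may be a mapping or a sequence of (key, value) pairs.
        super().__init__(data)
        self._key = key
        self._reverse = reverse

    def __iter__(self):
        # Sort the underlying keys on demand: O(n log n) per iteration.
        return iter(sorted(dict.__iter__(self),
                           key=self._key, reverse=self._reverse))

    def keys(self):
        return list(self)

    def values(self):
        return [self[k] for k in self]

    def items(self):
        return [(k, self[k]) for k in self]


d = SortedDict({"b": 2, "a": 1, "c": 3})
d["aa"] = 0
print(d.keys())   # ['a', 'aa', 'b', 'c']
print(d.items())  # [('a', 1), ('aa', 0), ('b', 2), ('c', 3)]
```

A production-quality version would keep the keys in an ordered structure
instead of sorting on each access.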

I am hoping that Python 2.6 (and 3.0) will have a sorted dictionary of
some kind, and I get the impression that it would be welcomed (in the
standard library).

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From guido at python.org  Tue Sep 25 20:06:04 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 11:06:04 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709251843.12325.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709250958.03993.mark@qtrac.eu>
	
	<200709251843.12325.mark@qtrac.eu>
Message-ID: 

On 9/25/07, Mark Summerfield  wrote:
> My proposal was for the sorteddict to be put in the collections module,
> not as a builtin. One of the things I particularly like about Python is
> that the core language is small.
>
> However, I think that the collections module is rather thin, and as you
> say, it is "obscure" so won't get in the way of inexperienced or casual
> users if it is beefed up a bit, yet could be really useful to more
> demanding users.
>
> On comp.lang.python, a respondent called Paul Hankin suggested a
> somewhat different approach to mine: he proposed a sorteddict with the
> same API as a dict but with a constructor that is similar to the
> sorted() function:
>
>     sorteddict((mapping | sequence | nothing), cmp=None, key=None,
>                reverse=None)
>
> He points out that this has a problem with keyword argument
> dictionaries, but that one solution is sorteddict(dict(**kwargs), ...).

Why would this be a problem? There is no requirement that sorteddict()
support this feature.

> From comments other people have made on this list and on
> comp.lang.python, it may be that Paul Hankin's approach is more popular
> and better than the one I proposed---the only downside being that he
> didn't give any hints as to an implementation.
>
> I am hoping that Python 2.6 (and 3.0) will have a sorted dictionary of
> some kind, and I get the impression that it would be welcomed (in the
> standard library).

For that to happen, someone has to write a production-quality
implementation, release it as a separate 3rd party module for a while,
show that it is sufficiently stable and popular to be incorporated in
the standard library, and commit to maintaining it for a few years at
least. (It doesn't have to be all the same someone.)

Hoping and wishing doesn't cause working code to spring into existence.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mark at qtrac.eu  Tue Sep 25 23:01:43 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Tue, 25 Sep 2007 22:01:43 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709251843.12325.mark@qtrac.eu>
	
Message-ID: <200709252201.43689.mark@qtrac.eu>

On 2007-09-25, you wrote:
> On 9/25/07, Mark Summerfield  wrote:
> > My proposal was for the sorteddict to be put in the collections module,
> > not as a builtin. One of the things I particularly like about Python is
> > that the core language is small.
> >
> > However, I think that the collections module is rather thin, and as you
> > say, it is "obscure" so won't get in the way of inexperienced or casual
> > users if it is beefed up a bit, yet could be really useful to more
> > demanding users.
> >
> > On comp.lang.python, a respondent called Paul Hankin suggested a
> > somewhat different approach to mine: he proposed a sorteddict with the
> > same API as a dict but with a constructor that is similar to the
> > sorted() function:
> >
> >     sorteddict((mapping | sequence | nothing), cmp=None, key=None,
> >                reverse=None)
> >
> > He points out that this has a problem with keyword argument
> > dictionaries, but that one solution is sorteddict(dict(**kwargs), ...).
>
> Why would this be a problem? There is no requirement that sorteddict()
> support this feature.
>
> > From comments other people have made on this list and on
> > comp.lang.python, it may be that Paul Hankin's approach is more popular
> > and better than the one I proposed---the only downside being that he
> > didn't give any hints as to an implementation.
> >
> > I am hoping that Python 2.6 (and 3.0) will have a sorted dictionary of
> > some kind, and I get the impression that it would be welcomed (in the
> > standard library).
>
> For that to happen, someone has to write a production-quality
> implementation, release it as a separate 3rd party module for a while,
> show that it is sufficiently stable and popular to be incorporated in
> the standard library, and commit to maintaining it for a few years at
> least. (It doesn't have to be all the same someone.)

OK, I'm sure I or Paul Hankin or others will put up at least one version
on PyPI and maybe get it in for Python 4:-)

> Hoping and wishing doesn't cause working code to spring into existence.

As a matter of fact it does... by the time I read this Paul Hankin had
written a version based on his idea... and so have I. Neither is likely
to be fast but they both provide the API described above in pure Python.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From guido at python.org  Tue Sep 25 23:26:40 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 14:26:40 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
Message-ID: 

OK, Jeffrey's and Adam's patches were helpful; it looks like the
damage done by making bytes immutable is pretty limited: plenty of
modules are affected, but the changes are straightforward and
localized.

So now I have an idea that goes a little farther. It relates to
Talin's response (second message in this thread if you're using gmail)
and acknowledges that there are some good use cases for mutable bytes
as well (as I've always maintained).

How about we take the existing PyString implementation (Python 2's
str, currently still present as str8 in py3k), remove the locale and
unicode mixing support, and call it bytes. Then the PyBytes type can
be renamed to buffer. It is well-documented that I don't care much
about the existing buffer() builtin; it can be renamed to memview for
all I care (that would be a more descriptive name anyway).

This would provide a much better transitional path for 2.x code
manipulating raw bytes using str instances: just change "..." into
b"..." and str into bytes. (Of course, 2.x code that is confused about
bytes vs. characters will fail hard in 3.0 as soon as a bytes and a
str instance meet -- this is already the case in the current 3.0 code
base and will remain unchanged.)

It would mean more fixes beyond what Jeffrey and Adam did, since
iterating over a bytes instance would return a bytes instance of
length 1 instead of a small int, and the bytes constructor would
change accordingly (no more initializing a bytes object from a list of
ints).
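
For comparison, a short sketch of the behaviour being contrasted here,
as it stands in released Python 3 (which kept the small-int iteration
rather than the length-1 proposal above). The variable names are
illustrative only.

```python
data = b"abc"

# Iterating or indexing yields small ints.
ints = list(data)
first = data[0]

# A length-1 slice yields a length-1 bytes object instead.
first_slice = data[0:1]

# The constructor accepts a sequence of ints.
rebuilt = bytes([97, 98, 99])
```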

The (new) buffer object would also have to change to be more
compatible with the (new) bytes object -- bytes<-->buffer conversions
should be 1-1, and iterating over a buffer instance would also have to
return a length-1 buffer (or bytes???) instance.

Thoughts?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Sep 25 23:32:05 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 14:32:05 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
Message-ID: 

On 9/25/07, Guido van Rossum  wrote:
> OK, Jeffrey's and Adam's patches were helpful; it looks like the
> damage done by making bytes immutable is pretty limited: plenty of
> modules are affected, but the changes are straightforward and
> localized.
>
> So now I have an idea that goes a little farther. It relates to
> Talin's response (second message in this thread if you're using gmail)
> and acknowledges that there are some good use cases for mutable bytes
> as well (as I've always maintained).
>
> How about we take the existing PyString implementation (Python 2's
> str, currently still present as str8 in py3k), remove the locale and
> unicode mixing support, and call it bytes. Then the PyBytes type can
> be renamed to buffer. It is well-documented that I don't care much
> about the existing buffer() builtin; it can be renamed to memview for
> all I care (that would be a more descriptive name anyway).

D'oh. Travis already implemented a memoryview object that has most of
the required properties. So let's use that instead of memview or the
old buffer object.

> This would provide a much better transitional path for 2.x code
> manipulating raw bytes using str instances: just change "..." into
> b"..." and str into bytes. (Of course, 2.x code that is confused about
> bytes vs. characters will fail hard in 3.0 as soon as a bytes and a
> str instance meet -- this is already the case in the current 3.0 code
> base and will remain unchanged.)
>
> It would mean more fixes beyond what Jeffrey and Adam did, since
> iterating over a bytes instance would return a bytes instance of
> length 1 instead of a small int, and the bytes constructor would
> change accordingly (no more initializing a bytes object from a list of
> ints).
>
> The (new) buffer object would also have to change to be more
> compatible with the (new) bytes object -- bytes<-->buffer conversions
> should be 1-1, and iterating over a buffer instance would also have to
> return a length-1 buffer (or bytes???) instance.
>
> Thoughts?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Wed Sep 26 00:14:19 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 25 Sep 2007 18:14:19 -0400
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
Message-ID: 

> How about we take the existing PyString implementation (Python 2's
> str, currently still present as str8 in py3k), remove the locale and
> unicode mixing support, and call it bytes.

Is that just encode/decode?
But isn't this one sensible way to store an encoded str, so that
decode (only) would still make sense?

I would have expected to drop text or character-oriented methods,
because they should really be done on the (decoded) unicode version.
Given bytes use in wire protocols, I could also understand saying that
these methods only work on ASCII, and either raise an exception or
return false for other byte values.

text-or-character-oriented methods:

'capitalize', 'center',  'endswith', 'expandtabs', 'isalnum',
'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
'ljust', 'lower', 'lstrip', 'rjust', 'rstrip',  'splitlines', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill'
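
A small sketch of the "ASCII only" option, using the behaviour of bytes
in released Python 3 as the example; whether the proposal in this thread
would have matched it exactly is an assumption.

```python
ascii_word = b"abc"
latin1_e_acute = b"\xe9"   # e-acute in Latin-1; not an ASCII byte

upper_ascii = ascii_word.upper()        # ASCII letters are converted
upper_other = latin1_e_acute.upper()    # non-ASCII byte is left unchanged
alpha_ascii = ascii_word.isalpha()
alpha_other = latin1_e_acute.isalpha()  # only ASCII letters count
```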

> It would mean more fixes beyond what Jeffrey and Adam did, since
> iterating over a bytes instance would return a bytes instance of
> length 1 instead of a small int,

makes sense

> and the bytes constructor would
> change accordingly (no more initializing a bytes object from a list of
> ints).

Why not?

I expect the literal b"ASCII string" to be the most common
constructor, but I don't see the problem with a sequence of ints (or
hex) as an alternative constructor.

> The (new) buffer object would also have to change to be more
> compatible with the (new) bytes object -- bytes<-->buffer conversions
> should be 1-1, and iterating over a buffer instance would also have to
> return a length-1 buffer (or bytes???) instance.

I would return a bytes instance.  If you return a 1-char buffer, and
someone does modify that, it isn't clear whether the change should be
reflected in the original source buffer.  If someone does want an
in-place filter, they can always use enumerate and slicing.
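
A sketch of such an in-place filter, assuming the mutable type behaves
like the bytearray of released Python 3 (iteration yields ints, slice
assignment writes back). The function name is hypothetical.

```python
def uppercase_ascii_in_place(buf):
    """Hypothetical in-place filter over a mutable byte buffer."""
    for i, byte in enumerate(buf):
        if 0x61 <= byte <= 0x7A:            # ASCII 'a'..'z'
            # Slice assignment modifies the original buffer, not a copy.
            buf[i:i + 1] = bytes([byte - 0x20])


buf = bytearray(b"some data")
uppercase_ascii_in_place(buf)
```

The replacement always has length 1, so the buffer never changes size
while it is being enumerated.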


Can we assume that the two types are unequal, but that you can search
a buffer for a (constant) bytes?

    >>> mybytes = b"some data"
    >>> mybuffer = buffer(mybytes)

    >>> mybuffer == mybytes
    False

    >>> mybuffer.startswith(mybytes)  and \
    ...    mybuffer.endswith(mybytes)  and \
    ...    len(mybuffer) == len(mybytes)
    True
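
For comparison only: in released Python 3 the mutable type is spelled
bytearray, and it compares equal to a bytes object with the same
contents, so the first assumption above did not hold in the end.
Searching for a constant bytes value works as expected. The variable
names below mirror the example above.

```python
mybytes = b"some data"
mybuffer = bytearray(mybytes)

equal = (mybuffer == mybytes)
prefix_ok = mybuffer.startswith(mybytes)
found_at = mybuffer.find(b"data")

# The mutable type is unhashable, unlike bytes.
try:
    hash(mybuffer)
    hashable = True
except TypeError:
    hashable = False
```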

-jJ

From brett at python.org  Wed Sep 26 02:03:29 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 25 Sep 2007 17:03:29 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
Message-ID: 

On 9/25/07, Guido van Rossum  wrote:
> OK, Jeffrey's and Adam's patches were helpful; it looks like the
> damage done by making bytes immutable is pretty limited: plenty of
> modules are affected, but the changes are straightforward and
> localized.
>
> So now I have an idea that goes a little farther. It relates to
> Talin's response (second message in this thread if you're using gmail)
> and acknowledges that there are some good use cases for mutable bytes
> as well (as I've always maintained).
>
> How about we take the existing PyString implementation (Python 2's
> str, currently still present as str8 in py3k), remove the locale and
> unicode mixing support, and call it bytes. Then the PyBytes type can
> be renamed to buffer. It is well-documented that I don't care much
> about the existing buffer() builtin; it can be renamed to memview for
> all I care (that would be a more descriptive name anyway).
>
> This would provide a much better transitional path for 2.x code
> manipulating raw bytes using str instances: just change "..." into
> b"..." and str into bytes. (Of course, 2.x code that is confused about
> bytes vs. characters will fail hard in 3.0 as soon as a bytes and a
> str instance meet -- this is already the case in the current 3.0 code
> base and will remain unchanged.)
>
> It would mean more fixes beyond what Jeffrey and Adam did, since
> iterating over a bytes instance would return a bytes instance of
> length 1 instead of a small int, and the bytes constructor would
> change accordingly (no more initializing a bytes object from a list of
> ints).
>

+0.  While 2to3 would be able to help more, the methods that will be
ripped out will make the ease in transition from this a lot less.
Plus you can have immutable bytes in a way by passing the current
bytes to tuple.

> The (new) buffer object would also have to change to be more
> compatible with the (new) bytes object -- bytes<-->buffer conversions
> should be 1-1, and iterating over a buffer instance would also have to
> return a length-1 buffer (or bytes???) instance.

Return a byte.  If you want a mutable length-1 thing you should have
to do a length 1 slice.  Otherwise it's an index operation and you want
what is stored at the index, which is an immutable byte.

-Brett

From guido at python.org  Wed Sep 26 02:22:39 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 25 Sep 2007 17:22:39 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
Message-ID: 

On 9/25/07, Brett Cannon  wrote:
> On 9/25/07, Guido van Rossum  wrote:
> > OK, Jeffrey's and Adam's patches were helpful; it looks like the
> > damage done by making bytes immutable is pretty limited: plenty of
> > modules are affected, but the changes are straightforward and
> > localized.
> >
> > So now I have an idea that goes a little farther. It relates to
> > Talin's response (second message in this thread if you're using gmail)
> > and acknowledges that there are some good use cases for mutable bytes
> > as well (as I've always maintained).
> >
> > How about we take the existing PyString implementation (Python 2's
> > str, currently still present as str8 in py3k), remove the locale and
> > unicode mixing support, and call it bytes. Then the PyBytes type can
> > be renamed to buffer. It is well-documented that I don't care much
> > about the existing buffer() builtin; it can be renamed to memview for
> > all I care (that would be a more descriptive name anyway).
> >
> > This would provide a much better transitional path for 2.x code
> > manipulating raw bytes using str instances: just change "..." into
> > b"..." and str into bytes. (Of course, 2.x code that is confused about
> > bytes vs. characters will fail hard in 3.0 as soon as a bytes and a
> > str instance meet -- this is already the case in the current 3.0 code
> > base and will remain unchanged.)
> >
> > It would mean more fixes beyond what Jeffrey and Adam did, since
> > iterating over a bytes instance would return a bytes instance of
> > length 1 instead of a small int, and the bytes constructor would
> > change accordingly (no more initializing a bytes object from a list of
> > ints).
> >
>
> +0.  While 2to3 would be able to help more, the methods that will be
> ripped out will make the ease in transition from this a lot less.

Compared to what? The methods to be ripped out are already not
available on bytes objects.

> Plus you can have immutable bytes in a way by passing the current
> bytes to tuple.

At what cost? tuple(b"x"*100) is a tuple of length 100.
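
A sketch of the cost in question, assuming iteration over bytes yields
ints (as it does in the current py3k branch and in released Python 3).
Exact sizes depend on the platform, so only the comparison is shown.

```python
import sys

raw = b"x" * 100
as_tuple = tuple(raw)

length = len(as_tuple)                # one tuple slot per byte
element_type = type(as_tuple[0])      # each slot holds an int object
tuple_bigger = sys.getsizeof(as_tuple) > sys.getsizeof(raw)
```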

> > The (new) buffer object would also have to change to be more
> > compatible with the (new) bytes object -- bytes<-->buffer conversions
> > should be 1-1, and iterating over a buffer instance would also have to
> > return a length-1 buffer (or bytes???) instance.
>
> Return a byte.  If you want a mutable length-1 thing you should have
> to do a length 1 slice.  Otherwise it's an index operation and you want
> what is stored at the index, which is an immutable byte.

OK. Though it's questionable even whether a slice of a mutable bytes
object should return a mutable bytes object (as it is not a shared
view). But as that is what PyBytes currently does it is certainly the
easiest...
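
A sketch of the distinction, using bytearray and memoryview from
released Python 3 as stand-ins for the mutable type and a shared view;
that mapping is an assumption, not something settled in this thread.

```python
buf = bytearray(b"hello")

# A slice of the mutable object is an independent copy.
piece = buf[0:1]
piece[0] = ord("J")
after_copy_write = bytes(buf)

# A slice of a memoryview shares storage with the original.
view = memoryview(buf)[0:1]
view[0] = ord("J")
after_view_write = bytes(buf)
view.release()
```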

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Sep 26 02:43:05 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 26 Sep 2007 12:43:05 +1200
Subject: [Python-3000] New io system and binary data
In-Reply-To: <07Sep24.204222pdt.57996@synergy1.parc.xerox.com>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
	<46F8733A.2020908@earthlink.net>
	<07Sep24.204222pdt.57996@synergy1.parc.xerox.com>
Message-ID: <46F9AB19.7080404@canterbury.ac.nz>

Bill Janssen wrote:
> Now that we've moved away from the C stdio model, and the distinction
> between text and binary streams is meaningful even on POSIX systems,
> perhaps we should also change those names to reflect that difference
> from C.

I don't think anything would be gained by changing these
well-established and widely-understood names just because
of such an obscure and pedantic detail.

--
Greg

From brett at python.org  Wed Sep 26 02:55:47 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 25 Sep 2007 17:55:47 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
Message-ID: 

On 9/25/07, Guido van Rossum  wrote:
> On 9/25/07, Brett Cannon  wrote:
> > On 9/25/07, Guido van Rossum  wrote:
> > > OK, Jeffrey's and Adam's patches were helpful; it looks like the
> > > damage done by making bytes immutable is pretty limited: plenty of
> > > modules are affected, but the changes are straightforward and
> > > localized.
> > >
> > > So now I have an idea that goes a little farther. It relates to
> > > Talin's response (second message in this thread if you're using gmail)
> > > and acknowledges that there are some good use cases for mutable bytes
> > > as well (as I've always maintained).
> > >
> > > How about we take the existing PyString implementation (Python 2's
> > > str, currently still present as str8 in py3k), remove the locale and
> > > unicode mixing support, and call it bytes. Then the PyBytes type can
> > > be renamed to buffer. It is well-documented that I don't care much
> > > about the existing buffer() builtin; it can be renamed to memview for
> > > all I care (that would be a more descriptive name anyway).
> > >
> > > This would provide a much better transitional path for 2.x code
> > > manipulating raw bytes using str instances: just change "..." into
> > > b"..." and str into bytes. (Of course, 2.x code that is confused about
> > > bytes vs. characters will fail hard in 3.0 as soon as a bytes and a
> > > str instance meet -- this is already the case in the current 3.0 code
> > > base and will remain unchanged.)
> > >
> > > It would mean more fixes beyond what Jeffrey and Adam did, since
> > > iterating over a bytes instance would return a bytes instance of
> > > length 1 instead of a small int, and the bytes constructor would
> > > change accordingly (no more initializing a bytes object from a list of
> > > ints).
> > >
> >
> > +0.  While 2to3 would be able to help more, the methods that will be
> > ripped out will make the ease in transition from this a lot less.
>
> Compared to what? The methods to be ripped out are already not
> available on bytes objects.
>

Right, but that doesn't mean we couldn't put others back in or something
to help others with their code transitions.

> > Plus you can have immutable bytes in a way by passing the current
> > bytes to tuple.
>
> At what cost? tuple(b"x"*100) is a tuple of length 100.
>

Right, but the question is how often people will need this.  There is
a reason that mutable bytes were chosen in the first place.

> > > The (new) buffer object would also have to change to be more
> > > compatible with the (new) bytes object -- bytes<-->buffer conversions
> > > should be 1-1, and iterating over a buffer instance would also have to
> > > return a length-1 buffer (or bytes???) instance.
> >
> > Return a byte.  If you want a mutable length-1 thing you should have
> > to do a length 1 slice.  Otherwise it's an index operation and you want
> > what is stored at the index, which is an immutable byte.
>
> OK. Though it's questionable even whether a slice of a mutable bytes
> object should return a mutable bytes object (as it is not a shared
> view). But as that is what PyBytes currently does it is certainly the
> easiest...

-Brett

From greg.ewing at canterbury.ac.nz  Wed Sep 26 02:57:56 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 26 Sep 2007 12:57:56 +1200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <79990c6b0709250039q3cf5b6a5j3a37797b84fe43d3@mail.gmail.com>
References: 
	
	<79990c6b0709250039q3cf5b6a5j3a37797b84fe43d3@mail.gmail.com>
Message-ID: <46F9AE94.7010703@canterbury.ac.nz>

Paul Moore wrote:
> The array module is built in, so it's
> written in C - what needs to be exposed to qualify as a "C API"?

I think he's referring to the fact that there is no
public array.h header file provided that lays out the
C-level details. In fact, last time I looked I don't
think there was any array.h file at all, it was all
inside array.c.

You can fake it by copying the relevant declarations
into your own .h file, but then there's no assurance
that you're not relying on implementation details
that could change. A published interface would be
much more reassuring.

With the new buffer interface, probably just providing
that would be sufficient, together with a C function
for creating an array. The internals could still
remain private if desired.

--
Greg

From skip at pobox.com  Wed Sep 26 03:11:38 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 25 Sep 2007 20:11:38 -0500
Subject: [Python-3000] New io system and binary data
In-Reply-To: <46F8733A.2020908@earthlink.net>
References: 
	
	<-7804278669952876495@unknownmsgid>
	
	<18161.32698.291402.642086@montanaro.dyndns.org>
	
	<46F6ABD5.7010103@earthlink.net>
	<60bb7ceb0709231421v2adaa658m1999604047db527b@mail.gmail.com>
	<46F8733A.2020908@earthlink.net>
Message-ID: <18169.45514.615900.396756@montanaro.dyndns.org>


    Charles> There's another place in the tutorial section
    Charles> http://docs.python.org/dev/3.0/tutorial/inputoutput.html and
    Charles> search for "On Windows and the Macintosh, 'b' appended to the
    Charles> mode opens the file in binary mode,"

I fixed that up as well.  I mentioned the automatic encode/decode for text
files there as well, though I'm not sure it needs to be mentioned in the
tutorial.

Skip


From greg.ewing at canterbury.ac.nz  Wed Sep 26 03:49:03 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 26 Sep 2007 13:49:03 +1200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<5d44f72f0709181019n1eb7dfe4u81e0d7d5e67b2420@mail.gmail.com>
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
Message-ID: <46F9BA8F.80907@canterbury.ac.nz>

Brett Cannon wrote:
> Return a byte.  If you want a mutable length-1 thing you should have
> to do a length 1 slice.  Otherwise it's an index operation and you want
> what is stored at the index, which is an immutable byte.

Why shouldn't this argument apply to immutable bytes objects as
well? Or should it?

--
Greg

From mike.klaas at gmail.com  Wed Sep 26 05:09:06 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Tue, 25 Sep 2007 20:09:06 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709252201.43689.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709251843.12325.mark@qtrac.eu>
	
	<200709252201.43689.mark@qtrac.eu>
Message-ID: 

On 25-Sep-07, at 2:01 PM, Mark Summerfield wrote:

> On 2007-09-25, Guido wrote:
>>
>> For that to happen, someone has to write a production-quality
>> implementation, release it as a separate 3rd party module for a  
>> while,
>> show that it is sufficiently stable and popular to be incorporated in
>> the standard library, and commit to maintaining it for a few years at
>> least. (It doesn't have to be all the same someone.)
>
> OK, I'm sure I or Paul Hankin or others will put up at least one  
> version
> on PyPI and maybe get it in for Python 4:-)

Since this isn't backward-incompatible, it can be added any time:  
2.X, 3.X, etc.

-Mike

From brett at python.org  Wed Sep 26 07:31:34 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 25 Sep 2007 22:31:34 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <46F9BA8F.80907@canterbury.ac.nz>
References: 
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	<46F9BA8F.80907@canterbury.ac.nz>
Message-ID: 

On 9/25/07, Greg Ewing  wrote:
> Brett Cannon wrote:
> > Return a byte.  If you want a mutable length-1 thing you should have
> > to do a length 1 slice.  Otherwise it's an index operation and you want
> > what is stored at the index, which is an immutable byte.
>
> Why shouldn't this argument apply to immutable bytes objects as
> well? Or should it?

Never said it shouldn't.  But I don't view immutable bytes as a
container like mutable bytes.

-Brett

From mark at qtrac.eu  Wed Sep 26 09:02:44 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 26 Sep 2007 08:02:44 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709252201.43689.mark@qtrac.eu>
	
Message-ID: <200709260802.44381.mark@qtrac.eu>

On 2007-09-26, Mike Klaas wrote:
> On 25-Sep-07, at 2:01 PM, Mark Summerfield wrote:
> > On 2007-09-25, Guido wrote:
> >> For that to happen, someone has to write a production-quality
> >> implementation, release it as a separate 3rd party module for a
> >> while,
> >> show that it is sufficiently stable and popular to be incorporated in
> >> the standard library, and commit to maintaining it for a few years at
> >> least. (It doesn't have to be all the same someone.)
> >
> > OK, I'm sure I or Paul Hankin or others will put up at least one
> > version
> > on PyPI and maybe get it in for Python 4:-)
>
> Since this isn't backward-incompatible, it can be added any time:
> 2.X, 3.X, etc.
>
> -Mike

Yes of course, but I think GvR was really saying "no", at least not
until a year or so has passed, and only then if lots of users ask for
it. So I won't be submitting a PEP.

I have put a new version (incorporating another implementation idea from
Paul Hankin) on PyPI:

http://pypi.python.org/pypi/sorteddict

It does not have the all-round (theoretically) good performance of my
original version, but does have a much nicer API than my original idea.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From skip at pobox.com  Wed Sep 26 13:12:27 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 26 Sep 2007 06:12:27 -0500
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709260802.44381.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709252201.43689.mark@qtrac.eu>
	
	<200709260802.44381.mark@qtrac.eu>
Message-ID: <18170.16027.665491.815991@montanaro.dyndns.org>


    Mark> I have put a new version (incorporating another implementation
    Mark> idea from Paul Hankin) on PyPI:

    Mark> http://pypi.python.org/pypi/sorteddict

From that:

    The main benefit of sorteddicts is that you never have to explicitly
    sort.

Surely there must be something more than that.  Wrapping sorted() around a
keys() or values() call is a pretty trivial operation.  I didn't see that
the implementation saved anything.

Skip

From mark at qtrac.eu  Wed Sep 26 13:33:57 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 26 Sep 2007 12:33:57 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <18170.16027.665491.815991@montanaro.dyndns.org>
References: <200709111506.32823.mark@qtrac.eu>
	<200709260802.44381.mark@qtrac.eu>
	<18170.16027.665491.815991@montanaro.dyndns.org>
Message-ID: <200709261233.57636.mark@qtrac.eu>

On 2007-09-26, skip at pobox.com wrote:
>     Mark> I have put a new version (incorporating another implementation
>     Mark> idea from Paul Hankin) on PyPI:
>
>     Mark> http://pypi.python.org/pypi/sorteddict
>
> From that:
>
>     The main benefit of sorteddicts is that you never have to explicitly
>     sort.
>
> Surely there must be something more than that.  Wrapping sorted() around a
> keys() or values() call is a pretty trivial operation.  I didn't see that
> the implementation saved anything.

Assuming you have a good sorteddict implementation (i.e., based on a
balanced tree or a skiplist, not the one I've put up which is just
showing the API) you can gain significant performance benefits.

For example, if you have a large dataset that you need to traverse quite
frequently in sorted order, calling sorted() each time will be expensive
compared to simply traversing an intrinsically sorted data structure.
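
A rough sketch of the idea, assuming a plain dict plus a key list kept
in order with bisect.insort. This is a stand-in for illustration, not a
balanced tree or skiplist, and the class name SortedKeyIndex is
hypothetical. Insertion still costs O(n) because of the list shift, but
repeated traversals need no sorting at all.

```python
import bisect


class SortedKeyIndex:
    """Hypothetical sketch: a dict plus a key list kept sorted on insert."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            # O(log n) search for the slot, O(n) shift to insert.
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def items(self):
        # No sort here: each traversal is O(n), however often it is repeated.
        return [(k, self._data[k]) for k in self._keys]


index = SortedKeyIndex()
for word in ["pear", "apple", "fig", "apple"]:
    index[word] = len(word)
```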

When I program in C++/Qt I use QMap (a sorteddict) very often; the STL
equivalent is called map. Both the Qt and STL libraries have dict
equivalents (QHash and unordered_map), but my impression is that the
sorted data structures are used far more frequently than the unsorted
versions.

If you primarily program in Python, using dict + sorted() is very
natural because they are built into the language. But using a sorted
data structure and never sorting is a very common practice in other
languages.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From guido at python.org  Wed Sep 26 16:25:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 26 Sep 2007 07:25:13 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709261233.57636.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709260802.44381.mark@qtrac.eu>
	<18170.16027.665491.815991@montanaro.dyndns.org>
	<200709261233.57636.mark@qtrac.eu>
Message-ID: 

On 9/26/07, Mark Summerfield  wrote:
> On 2007-09-26, skip at pobox.com wrote:
> >     Mark> I have put a new version (incorporating another implementation
> >     Mark> idea from Paul Hankin) on PyPI:
> >
> >     Mark> http://pypi.python.org/pypi/sorteddict
> >
> > From that:
> >
> >     The main benefit of sorteddicts is that you never have to explicitly
> >     sort.
> >
> > Surely there must be something more than that.  Wrapping sorted() around a
> > keys() or values() call is a pretty trivial operation.  I didn't see that
> > the implementation saved anything.
>
> Assuming you have a good sorteddict implementation (i.e., based on a
> balanced tree or a skiplist, not the one I've put up which is just
> showing the API) you can gain significant performance benefits.

That depends very much on the use case, and in general I strongly
doubt it. I haven't looked this up in Knuth, but I believe that in a
sorted dict implementation, the best performance you can get for
random access and random insertions is O(log N), which is always beaten
by the O(1) of a hash table. This translates into O(N log N) for
inserting N elements into a sorted dict, vs. O(N) for a hash table.
Sorted traversal is O(N) for the sorted dict and O(N log N) for the
hash table. So in order to gain a "significant performance benefit"
you'd have to have one pass of insertions and two traversals with a
small number of insertions or deletions in between (otherwise the
sorted result from the hash table could just be cached).

I don't believe that this pattern is common enough, but I don't know
your application.
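
The caching idea mentioned above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `CachedSortedView` and its methods are hypothetical, and the sketch assumes that mutations are rare compared with sorted traversals.

```python
class CachedSortedView:
    """Hypothetical helper: a plain dict plus a cached sorted key list."""

    def __init__(self):
        self._data = {}
        self._sorted_keys = None  # None means "cache is stale"

    def __setitem__(self, key, value):
        if key not in self._data:
            self._sorted_keys = None  # a new key invalidates the cache
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]  # ordinary hash lookup

    def __delitem__(self, key):
        del self._data[key]
        self._sorted_keys = None  # a removed key invalidates the cache

    def sorted_items(self):
        if self._sorted_keys is None:
            # O(N log N), but only paid after a mutation
            self._sorted_keys = sorted(self._data)
        return [(k, self._data[k]) for k in self._sorted_keys]


view = CachedSortedView()
for word in ["pear", "apple", "fig"]:
    view[word] = len(word)
first = view.sorted_items()
second = view.sorted_items()  # reuses the cached key order
```

With this shape, repeated traversals between mutations cost O(N) each, which is the situation in which a dedicated sorted structure has little left to offer.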

> For example, if you have a large dataset that you need to traverse quite
> frequently in sorted order, calling sorted() each time will be expensive
> compared to simply traversing an intrinsically sorted data structure.
>
> When I program in C++/Qt I use QMap (a sorteddict) very often; the STL
> equivalent is called map. Both the Qt and STL libraries have dict
> equivalents (QHash and unordered_map), but my impression is that the
> sorted data structures are used far more frequently than the unsorted
> versions.

Perhaps out of ignorance? Or perhaps the hash implementations have
suboptimal implementations? Or perhaps because no equivalent to
sorted() exists?

> If you primarily program in Python, using dict + sorted() is very
> natural because they are built into the language. But using a sorted
> data structure and never sorting is a very common practice in other
> languages.

Ah, now the real reason you want this so badly is finally clear:
simply because you're more familiar with it. :-)

Is the number of elements in a typical use case large enough that the
performance difference even matters?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mark at qtrac.eu  Wed Sep 26 17:27:25 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 26 Sep 2007 16:27:25 +0100
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709261233.57636.mark@qtrac.eu>
	
Message-ID: <200709261627.25593.mark@qtrac.eu>

On 2007-09-26, Guido van Rossum wrote:
> On 9/26/07, Mark Summerfield  wrote:
> > On 2007-09-26, skip at pobox.com wrote:
> > >     Mark> I have put a new version (incorporating another implementation
> > >     Mark> idea from Paul Hankin) on PyPI:
> > >
> > >     Mark> http://pypi.python.org/pypi/sorteddict
> > >
> > > From that:
> > >
> > >     The main benefit of sorteddicts is that you never have to explicitly
> > >     sort.
> > >
> > > Surely there must be something more than that.  Wrapping sorted() around a
> > > keys() or values() call is a pretty trivial operation.  I didn't see that
> > > the implementation saved anything.
> >
> > Assuming you have a good sorteddict implementation (i.e., based on a
> > balanced tree or a skiplist, not the one I've put up which is just
> > showing the API) you can gain significant performance benefits.
>
> That depends very much on the use case, and in general I strongly
> doubt it. I haven't looked this up in Knuth, but I believe that in a
> sorted dict implementation, the best performance you can get for
> random access and random insertions is O(log N), which is always beat
> by the O(1) of a hash table. This translates in O(N log N) for
> inserting N elements into a sorted dict, vs. O(N) in a hash table.
> Sorted traversal is O(N) for the sorted dict and O(N log N) for the
> hash table. So in order to gain a "significant performance benefit"
> you'd have to have one pass of insertions and two traversals with a
> small number of insertions or deletions in between (otherwise the
> sorted result from the hash table could just be cached).

I'm sure your numbers are right. It seems to me that the trade-off is
this: with dict + sorted() you pay O(N log N) whenever you need to sort
(okay, Python's sort is optimised for partially ordered data, so in
practice it often does better than that worst-case bound). With
sorteddict you pay O(log N) for accessing, but you pay nothing for sorting.

> I don't believe that this pattern is common enough, but I don't know
> your application.

> > For example, if you have a large dataset that you need to traverse quite
> > frequently in sorted order, calling sorted() each time will be expensive
> > compared to simply traversing an intrinsically sorted data structure.
> >
> > When I program in C++/Qt I use QMap (a sorteddict) very often; the STL
> > equivalent is called map. Both the Qt and STL libraries have dict
> > equivalents (QHash and unordered_map), but my impression is that the
> > sorted data structures are used far more frequently than the unsorted
> > versions.
>
> Perhaps out of ignorance? Or perhaps the hash implementations have
> suboptimal implementations? Or perhaps because no equivalent to
> sorted() exists?

C++ provides sorting algorithms that can be applied to STL containers
(and to Qt containers; Qt also has its own sorting algorithms), so these
do exist.

> > If you primarily program in Python, using dict + sorted() is very
> > natural because they are built into the language. But using a sorted
> > data structure and never sorting is a very common practice in other
> > languages.
>
> Ah, now the real reason you want this so badly is finally clear:
> simply because you're more familiar with it. :-)

That is true!

> Is the number of elements in a typical use case large enough that the
> performance difference even matters?

I don't know. In C++ I use QMap or map so my ordering is free and I
never notice the extra cost of lookup compared with a hash. In Python, I
can only compare theoretically, not empirically because I'd need a
sorteddict that was as well implemented as dict is.

I'll leave sorteddict on PyPI for those sad souls who want it, and I'll
try to think "dict + sorted()" for Python.

Of course this might be academic for Python 3, at least for strings
(unless you implement some kind of string comparison normalisation
method), since two strings that are the same to humans may be different
byte sequences which rather makes "sorting" a moot point.

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From jimjjewett at gmail.com  Wed Sep 26 17:35:15 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 26 Sep 2007 11:35:15 -0400
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709260802.44381.mark@qtrac.eu>
	<18170.16027.665491.815991@montanaro.dyndns.org>
	<200709261233.57636.mark@qtrac.eu>
	
Message-ID: 

On 9/26/07, Guido van Rossum  wrote:
> On 9/26/07, Mark Summerfield  wrote:

> > Assuming you have a good sorteddict implementation ...
> > you can gain significant performance benefits.

> ... sorted dict implementation, the best performance you can get for
> random access and random insertions is O(log N), which is always beat
> by the O(1) of a hash table

It is possible to keep two structures in parallel, so that lookup
(using the hash) is still O(1) and traversal (using the tree) is still
O(N); the penalty is that you pay for both methods when you do a
mutation.  (In big O notation, that doesn't matter, but the overhead
may be important in practice.)
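
A minimal sketch of the "two structures in parallel" idea, under stated assumptions: the class name `HashPlusSortedKeys` is hypothetical, and a sorted Python list maintained with `bisect` stands in for the tree. A list insert shifts elements, so mutation here is O(N) rather than the O(log N) a balanced tree would give; only the lookup and traversal costs match the description above.

```python
import bisect


class HashPlusSortedKeys:
    """Hypothetical mapping: dict for lookup, sorted key list for traversal."""

    def __init__(self):
        self._data = {}   # hash table: O(1) average lookup
        self._keys = []   # sorted keys: O(N) in-order traversal

    def __setitem__(self, key, value):
        if key not in self._data:
            # Both structures are updated on mutation; the list insert
            # is O(N) here, where a tree would be O(log N).
            bisect.insort(self._keys, key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]
        del self._keys[bisect.bisect_left(self._keys, key)]

    def items(self):
        return [(k, self._data[k]) for k in self._keys]


d = HashPlusSortedKeys()
d[3] = "c"
d[1] = "a"
d[2] = "b"
del d[2]
ordered = d.items()
```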

-jJ

From guido at python.org  Wed Sep 26 18:34:16 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 26 Sep 2007 09:34:16 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	<46F9BA8F.80907@canterbury.ac.nz>
	
Message-ID: 

Sounds like we need a PEP to sort out the details. I'll try to come up
with something.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Wed Sep 26 18:43:10 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 26 Sep 2007 11:43:10 -0500
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <200709261627.25593.mark@qtrac.eu>
References: <200709111506.32823.mark@qtrac.eu>
	<200709261233.57636.mark@qtrac.eu>
	
	<200709261627.25593.mark@qtrac.eu>
Message-ID: <18170.35870.184920.53212@montanaro.dyndns.org>


    Mark> With sorteddict you pay O(log N) for accessing, but you pay
    Mark> nothing for sorting.

Pay me now or pay me later, but maintaining a sorted sequence will always
cost something.

Skip

From martin at v.loewis.de  Wed Sep 26 20:06:55 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 26 Sep 2007 20:06:55 +0200
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>	<200709260802.44381.mark@qtrac.eu>	<18170.16027.665491.815991@montanaro.dyndns.org>	<200709261233.57636.mark@qtrac.eu>
	
Message-ID: <46FA9FBF.8060909@v.loewis.de>

>> When I program in C++/Qt I use QMap (a sorteddict) very often; the STL
>> equivalent is called map. Both the Qt and STL libraries have dict
>> equivalents (QHash and unordered_map), but my impression is that the
>> sorted data structures are used far more frequently than the unsorted
>> versions.
> 
> Perhaps out of ignorance? Or perhaps the hash implementations have
> suboptimal implementations? Or perhaps because no equivalent to
> sorted() exists?

I feel (without being able to prove it) that C++ (i.e. the STL) uses a
red-black tree instead of a hash table for two reasons:
1. it is theoretically better. Hash tables have O(n), not O(1),
   as the worst case, whereas balanced trees can guarantee O(log n).
   Hash tables have O(1) in the average case only if the hash
   function is good, plus the costs for computing the hash are
   typically higher than the costs for comparison, unless the hash
   is cached.
2. it is often easier for applications to provide sorting. For
   most things you want to search for, coming up with a total order
   is straight-forward; defining a hash operation might not be that
   easy (of course, for identity lookups, hashing is easier).
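
The worst case in point 1 can be provoked deliberately. The sketch below is an illustration only: `CollidingKey` is a hypothetical key type whose hash is constant, so every entry lands on the same probe chain, and it counts how many equality comparisons a single failed lookup needs.

```python
class CollidingKey:
    """Hypothetical key whose hash is deliberately useless."""

    eq_calls = 0  # class-wide counter of __eq__ invocations

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # every key collides with every other key

    def __eq__(self, other):
        CollidingKey.eq_calls += 1
        return isinstance(other, CollidingKey) and self.value == other.value


n = 200
table = {CollidingKey(i): i for i in range(n)}

CollidingKey.eq_calls = 0
found = CollidingKey(n) in table      # absent key: compared against the stored keys
calls_for_miss = CollidingKey.eq_calls
```

With a well-distributed hash the same lookup would typically need no more than a handful of comparisons, which is the "average case only if the hash function is good" caveat.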

Regards,
Martin

From jason.orendorff at gmail.com  Wed Sep 26 20:07:19 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Wed, 26 Sep 2007 14:07:19 -0400
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709260802.44381.mark@qtrac.eu>
	<18170.16027.665491.815991@montanaro.dyndns.org>
	<200709261233.57636.mark@qtrac.eu>
	
Message-ID: 

One situation where a sorteddict would win is finding upper and lower
bounds.  This especially matters if you want to iterate over a
specific range of keys: "show me all entries between 1 Jan 2007 and 1
Feb 2007" is O(N) in the number of entries in that range, not the
entire data set.
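
As a rough sketch of such a range query without a dedicated sorteddict, a sorted key list plus the standard `bisect` module already gives the bounds in O(log N). The data and the helper name `time_slice` are made up for illustration, and the sketch assumes the key list is rebuilt or maintained separately whenever the dict changes.

```python
import bisect
from datetime import date

# Hypothetical log entries keyed by date.
entries = {
    date(2006, 12, 30): "rotate logs",
    date(2007, 1, 3): "deploy",
    date(2007, 1, 20): "restart",
    date(2007, 2, 1): "audit",
    date(2007, 2, 14): "upgrade",
}
sorted_keys = sorted(entries)


def time_slice(start, stop):
    """Return (key, value) pairs with start <= key <= stop."""
    lo = bisect.bisect_left(sorted_keys, start)    # first key >= start
    hi = bisect.bisect_right(sorted_keys, stop)    # one past last key <= stop
    return [(k, entries[k]) for k in sorted_keys[lo:hi]]


january = time_slice(date(2007, 1, 1), date(2007, 2, 1))
```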

I think people ask for things like this because they have a high-level
need like "read 3 log files, jam all the data into a single data
structure, and extract time slices from that" for which no
particularly obvious combination of lists and dicts seems to jump out
at you.  Then they hit on an idea like sorteddict that looks like it
might get them 60% of the way there and seems like a simple, obvious
building block that belongs in the stdlib.  That's my own experience,
anyway.

Is sorteddict really such a great building block?  I dunno.  It seems
like that might or might not be true.  These situations seem to come
up pretty rarely in Python's problem domain, so it's hard to get a
feel for it.

I do know from recent personal experience that system-level code often
wants custom data structures, and having spent a decade with Python
lists and dictionaries, I'm out of shape.  :)

-j

From nick.bastin at gmail.com  Wed Sep 26 21:00:23 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 26 Sep 2007 15:00:23 -0400
Subject: [Python-3000] Py3k Trivia :-)
In-Reply-To: 
References: 
	<46F46A71.1060409@canterbury.ac.nz>
	
Message-ID: <66d0a6e10709261200x5834898bof1a23f80cc0ddca5@mail.gmail.com>

On 9/21/07, Guido van Rossum  wrote:
> On 9/21/07, Greg Ewing  wrote:
> > Guido van Rossum wrote:
> > > """
> > > George isn't tall enough to ride the greatest rollercoaster of all
> > > time, The Turbo Python 3000. He uses licorice whips to measure his
> > > height and determines that he is 7-whips tall, one short of the 8-whip
> > > minimum!
> > > """
> >
> > Fantastic! I vote that we hereby adopt the licorice whip
> > as the standard unit for measuring the speed of Python 3.0
> > implementations, with the speed of 2.6 (whatever it turns
> > out to be) defined as 7 whips.
>
> Ah, but is 6 whips faster or slower than 7 whips?

Slower.  If we get up to 8 we can go for an exciting ride!  :-)

--
Nick

From charleshixsn at earthlink.net  Wed Sep 26 21:39:49 2007
From: charleshixsn at earthlink.net (Charles D Hixson)
Date: Wed, 26 Sep 2007 12:39:49 -0700
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: <18170.35870.184920.53212@montanaro.dyndns.org>
References: <200709111506.32823.mark@qtrac.eu>	<200709261233.57636.mark@qtrac.eu>		<200709261627.25593.mark@qtrac.eu>
	<18170.35870.184920.53212@montanaro.dyndns.org>
Message-ID: <46FAB585.1000005@earthlink.net>

skip at pobox.com wrote:
>     Mark> With sorteddict you pay O(log N) for accessing, but you pay
>     Mark> nothing for sorting.
>
> Pay me now or pay me later, but maintaining a sorted sequence will always
> cost something.
>
> Skip
>   
Very frequently, however, I want frequent sorted access to a container.  
I.e. I will want something like "what's the next key bigger than nnn"  
(I said nnn because it often isn't a string).  In such cases a B+Tree or 
B*Tree is a much better answer than a hash table.  I'll grant that for 
the most common cases hash tables are superior...but not, by any means, 
for all cases.

There have been cases where I have maintained both a list and a dict for 
the same data.  (Well, the list was an index into the dict, but you get 
the idea.)   The dict was for fast access when I knew the key, and the 
list was for binary search when I knew things *about* the key.
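
A small sketch of that list-plus-dict arrangement, assuming tuple keys and a hypothetical helper name `next_key_after`; the list is the sorted index and the dict holds the values.

```python
import bisect

# Hypothetical records keyed by non-string (tuple) keys.
records = {(2, 5): "b", (1, 9): "a", (4, 0): "d", (3, 3): "c"}
index = sorted(records)   # sorted key list acting as an index into the dict


def next_key_after(probe):
    """Return the smallest key strictly greater than probe, or None."""
    position = bisect.bisect_right(index, probe)
    if position == len(index):
        return None
    return index[position]


successor = next_key_after((2, 7))   # (2, 7) itself need not be a key
```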

An important note here is that the key to the dict/list was generally
NOT a string in these situations.  If strings suffice, then I've
generally found a hash table to work well enough, and frequently to be
superior.

OTOH, if you don't need continual access while you are building the 
list, then I agree with you.  The problem is that each time you sort a 
hash table you must pay for an entire sort, while adding a key or two to 
a B+Tree is relatively cheap.

From qrczak at knm.org.pl  Wed Sep 26 22:00:56 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Wed, 26 Sep 2007 22:00:56 +0200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	
	<5d44f72f0709192307r2d0cec8am5a83b3c32812cd9b@mail.gmail.com>
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
Message-ID: <1190836856.16322.55.camel@qrnik>

On 25-09-2007 (Tue) at 17:22 -0700, Guido van Rossum wrote:

> OK. Though it's questionable even whether a slice of a mutable bytes
> object should return a mutable bytes object (as it is not a shared
> view). But as that is what PyBytes currently do it is certainly the
> easiest...

A slice of a list is a list, as it always has been, so letting slicing
return the same type as the whole sequence is at least consistent and
easy to explain. It is hard to say, though, what the typical use cases are.

OTOH I believe individual elements of mutable or immutable bytes should
be ints. Here is why I think that the analogy between characters and
bytes is not strong enough to let elements of bytes be bytes of length 1
just because strings do the same.

Bytes are often computed, while characters are often only copied
from place to place. Arithmetic is defined on ints, but not on bytes
sequences of length 1. This means that computing a bytes sequence from
scratch requires explicit conversions between a byte represented by an
int and a byte represented by bytes of length 1.
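
To make the point about computed bytes concrete, here is a short sketch written against the behaviour of released Python 3, where indexing or iterating a bytes object does yield ints. The function names `xor_checksum` and `rot_bytes` are hypothetical.

```python
def xor_checksum(payload):
    """XOR all byte values together; works because elements are ints."""
    checksum = 0
    for byte in payload:          # iterating bytes yields ints in range(256)
        checksum ^= byte
    return checksum


def rot_bytes(payload, shift):
    """Add shift to every byte modulo 256 and build a new bytes object."""
    return bytes((byte + shift) % 256 for byte in payload)


data = b"\x01\x02\x04"
checksum = xor_checksum(data)
rotated = rot_bytes(b"\xff\x00", 1)
```

Had elements been length-1 bytes objects, each step would need an explicit conversion to an int and back.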

There is also a philosophical reason. The division of a string into
characters is quite arbitrary: consider UTF-16/UTF-32, combining
characters, the encoding of Hangul, orthographic peculiarities,
proportional fonts, ligatures, variant selectors etc., all of which
obscure the concept of a character and of string length, and
consider that a sequence of characters might have been decoded from
or will be encoded into a sequence of bytes with a different length.
This means that having atomic string components is more a technical
convenience than a fundamental necessity, that the very concept of a
character in a Unicode world is arbitrary, and the length of a string is
more a technical detail of a representation than an inherent property of
the text being represented. All this means that the concept of a string
is more fundamental than a character.

OTOH a byte count and byte offsets are usually important in protocols
based on bytes (except text files when they encode human text). The
individual bytes are in some sense delimited very sharply from each
other, the amount of information stored in one byte is very well
defined. A single byte is a more important concept in a bytes world
than a character in a text world, it's not merely a sequence with
length 1.

Having characters different from strings would require creation of a new
type, because the existing int type is not very appropriate for single
characters, because many properties differ, e.g. the effect of writing
to a text file. To avoid the burden of creating a new type for a concept
which is rarely useful in isolation, strings of length 1 have been
reused. OTOH the existing int type seems appropriate for elements of
bytes. They can be easily thought of as just integers in the range
0..255, and Python does not use separate integer types for different
potential ranges.

If you really don't like ints there, I would prefer immutable bytes even
as elements of mutable bytes. This is just a value isomorphic to an int,
not an object with its own state. Moreover for atomic objects like
individual bytes mutability is not helpful to obtain performance, which
would be a reason to use a mutable type for non-atomic objects even when
conceptually they are identityless values (mutability often helps in
such case because an object can be constructed piece by piece).

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From guido at python.org  Wed Sep 26 23:58:53 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 26 Sep 2007 14:58:53 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
Message-ID: 

Please comment.

PEP: 3137
Title: Immutable Bytes and Mutable Buffer
Version: $Revision: 58264 $
Last-Modified: $Date: 2007-09-26 14:58:29 -0700 (Wed, 26 Sep 2007) $
Author: Guido van Rossum 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 26-Sep-2007
Python-Version: 3.0
Post-History: 26-Sep-2007

Introduction
============

After releasing Python 3.0a1 with a mutable bytes type, pressure
mounted to add a way to represent immutable bytes.  Gregory P. Smith
proposed a patch that would allow making a bytes object temporarily
immutable by requesting that the data be locked using the new buffer
API from PEP 3118.  This did not seem the right approach to me.

Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
make the bytes type immutable (by crudely removing all mutating APIs)
and fix the fall-out in the test suite.  This showed that there aren't
all that many places that depend on the mutability of bytes, with the
exception of code that builds up a return value from small pieces.

Thinking through the consequences, and noticing that using the array
module as an ersatz mutable bytes type is far from ideal, and
recalling a proposal put forward earlier by Talin, I floated the
suggestion to have both a mutable and an immutable bytes type.  (This
had been brought up before, but until seeing the evidence of Jeffrey's
patch I wasn't open to the suggestion.)

Moreover, a possible implementation strategy became clear: use the old
PyString implementation, stripped down to remove locale support and
implicit conversions to/from Unicode, for the immutable bytes type,
and keep the new PyBytes implementation as the mutable bytes type.

The ensuing discussion made it clear that the idea is welcome but
needs to be specified more precisely.  Hence this PEP.

Advantages
==========

One advantage of having an immutable bytes type is that code objects
can use these.  It also makes it possible to efficiently create hash
tables using bytes for keys; this may be useful when parsing protocols
like HTTP or SMTP which are based on bytes representing text.
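
A small sketch of the hash-table use case, using the types as they exist in released Python 3 (where the mutable type proposed here as ``buffer`` is spelled ``bytearray``). The header lines are made-up sample data.

```python
# Parse byte-oriented header lines into a dict keyed by immutable bytes.
headers = {}
raw_lines = [b"Host: example.org", b"Content-Length: 42"]
for line in raw_lines:
    name, _, value = line.partition(b": ")
    headers[name.lower()] = value

# A mutable byte sequence cannot serve as a key: it is unhashable.
try:
    {bytearray(b"host"): b"x"}
    mutable_key_allowed = True
except TypeError:
    mutable_key_allowed = False
```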

Porting code that manipulates binary data (or encoded text) in Python
2.x will be easier using the new design than using the original 3.0
design with mutable bytes; simply replace ``str`` with ``bytes`` and
change '...' literals into b'...' literals.

Naming
======

I propose the following type names at the Python level:

  - ``bytes`` is an immutable array of bytes (PyString)

  - ``buffer`` is a mutable array of bytes (PyBytes)

  - ``memoryview`` is a bytes view on another object (PyMemory)

The old type named ``buffer`` is so similar to the new type
``memoryview``, introduced by PEP 3118, that it is redundant.  The rest
of this PEP doesn't discuss the functionality of ``memoryview``; it is
just mentioned here to justify getting rid of the old ``buffer`` type
so we can reuse its name for the mutable bytes type.

While eventually it makes sense to change the C API names, this PEP
maintains the old C API names, which should be familiar to all.

Literal Notations
=================

The b'...' notation introduced in Python 3.0a1 returns an immutable
bytes object, whatever variation is used.  To create a mutable bytes
buffer object, use buffer(b'...') or buffer([...]).  The latter may
use a list of integers in range(256).

Functionality
=============

PEP 3118 Buffer API
-------------------

Both bytes and buffer support the PEP 3118 buffer API.  The bytes type
only supports read-only requests; the buffer type allows writable and
data-locked requests as well.  The element data type is always 'B'
(i.e. unsigned byte).

Constructors
------------

There are five forms of constructors; the first four are applicable to
both bytes and buffer:

  - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
    ``buffer(<buffer>)``: simple copying constructors, with the note
    that ``bytes(<bytes>)`` might return its (immutable) argument.

  - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
    <encoding>[, <errors>])``: encode a text string.  Note that the
    ``str.encode()`` method returns an *immutable* bytes object.
    The <encoding> argument is mandatory; <errors> is optional.

  - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
    bytes or buffer object from anything that supports the PEP 3118
    buffer API.

  - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
    construct an immutable bytes or mutable buffer object from a
    stream of integers in range(256).

  - ``buffer(<int>)``: construct a zero-initialized buffer of a given
    length.
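
For orientation, the constructor forms above can be exercised in released Python 3, with the caveat that the mutable type shipped under the name ``bytearray`` rather than ``buffer``. The variable names below are illustrative only.

```python
# Copying constructors
immutable = bytes(b"abc")
mutable = bytearray(b"abc")
copy_back = bytes(mutable)

# Encoding a text string: the encoding argument is given explicitly
encoded = bytes("caf\u00e9", "utf-8")
encoded_buf = bytearray("caf\u00e9", "utf-8")

# From an object supporting the buffer API
from_view = bytes(memoryview(b"xyz"))

# From an iterable of ints in range(256)
from_ints = bytes([72, 105])

# Zero-initialized mutable buffer of a given length
zeros = bytearray(4)
```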

Comparisons
-----------

The bytes and buffer types are comparable with each other and
orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.

Comparing either type to a str object raises an exception.  This
turned out to be necessary to catch common mistakes.
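
A brief check of these rules against released Python 3, where ``buffer`` became ``bytearray``. One difference worth flagging: in released versions, an ``==`` comparison between bytes and str evaluates to False (with an optional warning under the ``-b`` flag) rather than raising, while ordering comparisons do raise TypeError, as shown.

```python
# bytes and the mutable type compare and order against each other.
chained = b"abc" == bytearray(b"abc") < b"abd"

# Ordering a bytes object against a str is an error.
try:
    b"abc" < "abc"
    ordering_with_str = "allowed"
except TypeError:
    ordering_with_str = "TypeError"
```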

Slicing
-------

Slicing a bytes object returns a bytes object.  Slicing a buffer
object returns a buffer object.

Slice assignment to a mutable buffer object accepts anything that
supports the PEP 3118 buffer API, or an iterable of integers in
range(256).
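
The slicing rules can be tried out in released Python 3, again reading ``bytearray`` for the PEP's ``buffer``. This is a sketch with made-up values.

```python
frozen = b"abcdef"
scratch = bytearray(b"abcdef")

frozen_slice = frozen[1:3]     # slicing bytes gives bytes
scratch_slice = scratch[1:3]   # slicing the mutable type gives a mutable copy

scratch[0:2] = b"XY"           # slice assignment from a buffer-API object
scratch[4:6] = [33, 63]        # or from an iterable of ints ('!' and '?')
```

Note that `scratch_slice` is a copy, not a shared view, so the later assignments do not affect it.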

Indexing
--------

**Open Issue:** I'm undecided on whether indexing bytes and buffer
objects should return small ints (like the bytes type in 3.0a1, and
like lists or array.array('B')), or bytes/buffer objects of length 1
(like the str type).  The latter (str-like) approach will ease porting
code from Python 2.x; but it makes it harder to extract values from a
bytes array.

Assignment to an item of a mutable buffer object accepts an int in
range(256); if we choose the str-like approach for indexing above, it
also accepts an object implementing the PEP 3118 buffer API, if it has
length 1.

Str() and Repr()
----------------

The str() and repr() functions return the same thing for these
objects.  The repr() of a bytes object returns a b'...' style literal.
The repr() of a buffer returns a string of the form "buffer(b'...')".
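
In released Python 3 the same convention holds, except that the mutable type is named ``bytearray``, so its repr reads ``bytearray(b'...')``. A quick illustration:

```python
frozen_repr = repr(b"abc")
scratch_repr = repr(bytearray(b"abc"))
frozen_str = str(b"abc")   # same text as repr(); no decoding takes place
```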

Methods
-------

The following methods are supported by bytes as well as buffer, with
similar semantics.  They accept anything that implements the PEP 3118
buffer API for bytes arguments, and return the same type as the object
whose method is called ("self")::

  .capitalize(), .center(), .count(), .decode(), .endswith(),
  .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
  .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
  .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
  .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
  .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
  .translate(), .upper(), .zfill()

This is exactly the set of methods present on the str type in Python
2.x, with the exclusion of .encode().  The signatures and semantics
are the same too.  However, whenever character classes like letter,
whitespace, lower case are used, the ASCII definitions of these
classes are used.  (The Python 2.x str type uses the definitions from
the current locale, settable through the locale module.)  The
.encode() method is left out because of the more strict definitions of
encoding and decoding in Python 3000: encoding always takes a Unicode
string and returns a bytes sequence, and decoding always takes a bytes
sequence and returns a Unicode string.
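
A few of these methods, tried against released Python 3 (with ``bytearray`` standing in for ``buffer``), illustrate the "same type as self" rule and the ASCII-only character classes. The sample values are arbitrary.

```python
upper_frozen = b"hello world".upper()
upper_scratch = bytearray(b"hello world").upper()   # result type follows self

# Arguments may be any buffer-API object, regardless of self's type.
joined = b", ".join([b"a", bytearray(b"b")])

# 0xE9 is e-acute in Latin-1, but only ASCII letters count as alphabetic.
latin1_e_acute = b"\xe9"
is_alpha = latin1_e_acute.isalpha()
unchanged = latin1_e_acute.upper()

has_encode = hasattr(b"", "encode")   # .encode() is not provided on bytes
```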

Bytes and the Str Type
----------------------

Like the bytes type in Python 3.0a1, and unlike the relationship
between str and unicode in Python 2.x, any attempt to mix bytes (or
buffer) objects and str objects without specifying an encoding will
raise a TypeError exception.  This is the case even for simply
comparing a bytes or buffer object to a str object (even violating the
general rule that comparing objects of different types for equality
should just return False).

Conversions between bytes or buffer objects and str objects must
always be explicit, using an encoding.  There are two equivalent APIs:
``str(b, <encoding>[, <errors>])`` is equivalent to
``b.decode(<encoding>[, <errors>])``, and
``bytes(s, <encoding>[, <errors>])`` is equivalent to
``s.encode(<encoding>[, <errors>])``.
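
The two pairs of conversion spellings can be compared directly in released Python 3: decoding goes from bytes to str, encoding from str to bytes. The sample text is arbitrary.

```python
text = "na\u00efve"

raw = text.encode("utf-8")          # str -> bytes
same_raw = bytes(text, "utf-8")     # equivalent constructor spelling

decoded = raw.decode("utf-8")       # bytes -> str
same_decoded = str(raw, "utf-8")    # equivalent constructor spelling

# Mixing the two types without an encoding is rejected.
try:
    raw + text
    mixing = "allowed"
except TypeError:
    mixing = "TypeError"
```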

There is one exception: we can convert from bytes (or buffer) to str
without specifying an encoding by writing ``str(b)``.  This produces
the same result as ``repr(b)``.  This exception is necessary because
of the general promise that *any* object can be printed, and printing
is just a special case of conversion to str.  There is however no
promise that printing a bytes object interprets the individual bytes
as characters (unlike in Python 2.x).

The str type currently supports the PEP 3118 buffer API.  While this is
perhaps occasionally convenient, it is also potentially confusing,
because the bytes accessed via the buffer API represent a
platform-dependent encoding: depending on the platform byte order and
a compile-time configuration option, the encoding could be UTF-16-BE,
UTF-16-LE, UTF-32-BE, or UTF-32-LE.  Worse, a different implementation
of the str type might completely change the bytes representation,
e.g. to UTF-8, or even make it impossible to access the data as a
contiguous array of bytes at all.  Therefore, support for the PEP 3118
buffer API will be removed from the str type.
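
Released Python 3 behaves this way: a str cannot be handed to a buffer consumer directly, and the caller has to pick an encoding first. A short sketch, using UTF-16-LE purely as an example encoding:

```python
# A str does not export the buffer API.
try:
    memoryview("abc")
    str_exports_buffer = True
except TypeError:
    str_exports_buffer = False

# Choosing the encoding explicitly makes the byte layout unambiguous.
view = memoryview("abc".encode("utf-16-le"))
layout = view.tobytes()
```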

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Thu Sep 27 00:57:47 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 26 Sep 2007 15:57:47 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: 

On 9/26/07, Guido van Rossum  wrote:
> Please comment.
>
> PEP: 3137
> Title: Immutable Bytes and Mutable Buffer
> Version: $Revision: 58264 $
> Last-Modified: $Date: 2007-09-26 14:58:29 -0700 (Wed, 26 Sep 2007) $
> Author: Guido van Rossum 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 26-Sep-2007
> Python-Version: 3.0
> Post-History: 26-Sep-2007
>
> Introduction
> ============
>
> After releasing Python 3.0a1 with a mutable bytes type, pressure
> mounted to add a way to represent immutable bytes.  Gregory P. Smith
> proposed a patch that would allow making a bytes object temporarily
> immutable by requesting that the data be locked using the new buffer
> API from PEP 3118.  This did not seem the right approach to me.
>
> Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
> make the bytes type immutable (by crudely removing all mutating APIs)
> and fix the fall-out in the test suite.  This showed that there aren't
> all that many places that depend on the mutability of bytes, with the
> exception of code that builds up a return value from small pieces.
>
> Thinking through the consequences, and noticing that using the array
> module as an ersatz mutable bytes type is far from ideal, and
> recalling a proposal put forward earlier by Talin, I floated the
> suggestion to have both a mutable and an immutable bytes type.  (This
> had been brought up before, but until seeing the evidence of Jeffrey's
> patch I wasn't open to the suggestion.)
>
> Moreover, a possible implementation strategy became clear: use the old
> PyString implementation, stripped down to remove locale support and
> implicit conversions to/from Unicode, for the immutable bytes type,
> and keep the new PyBytes implementation as the mutable bytes type.
>
> The ensuing discussion made it clear that the idea is welcome but
> needs to be specified more precisely.  Hence this PEP.
>
> Advantages
> ==========
>
> One advantage of having an immutable bytes type is that code objects
> can use these.

Woohoo (from a security perspective)!

>  It also makes it possible to efficiently create hash
> tables using bytes for keys; this may be useful when parsing protocols
> like HTTP or SMTP which are based on bytes representing text.
>
> Porting code that manipulates binary data (or encoded text) in Python
> 2.x will be easier using the new design than using the original 3.0
> design with mutable bytes; simply replace ``str`` with ``bytes`` and
> change '...' literals into b'...' literals.
>
> Naming
> ======
>
> I propose the following type names at the Python level:
>
>   - ``bytes`` is an immutable array of bytes (PyString)
>
>   - ``buffer`` is a mutable array of bytes (PyBytes)
>
>   - ``memoryview`` is a bytes view on another object (PyMemory)
>
> The old type named ``buffer`` is so similar to the new type
> ``memoryview``, introduced by PEP 3118, that it is redundant.  The rest
> of this PEP doesn't discuss the functionality of ``memoryview``; it is
> just mentioned here to justify getting rid of the old ``buffer`` type
> so we can reuse its name for the mutable bytes type.
>
> While eventually it makes sense to change the C API names, this PEP
> maintains the old C API names, which should be familiar to all.
>
> Literal Notations
> =================
>
> The b'...' notation introduced in Python 3.0a1 returns an immutable
> bytes object, whatever variation is used.  To create a mutable bytes
> buffer object, use buffer(b'...') or buffer([...]).  The latter may
> use a list of integers in range(256).
>
> Functionality
> =============
>
> PEP 3118 Buffer API
> -------------------
>
> Both bytes and buffer support the PEP 3118 buffer API.  The bytes type
> only supports read-only requests; the buffer type allows writable and
> data-locked requests as well.  The element data type is always 'B'
> (i.e. unsigned byte).
>
> Constructors
> ------------
>
> There are four forms of constructors, applicable to both bytes and
> buffer:
>
>   - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
>     ``buffer(<buffer>)``: simple copying constructors, with the note
>     that ``bytes(<bytes>)`` might return its (immutable) argument.
>
>   - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
>     <encoding>[, <errors>])``: encode a text string.  Note that the
>     ``str.encode()`` method returns an *immutable* bytes object.
>     The <encoding> argument is mandatory; <errors> is optional.
>
>   - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
>     bytes or buffer object from anything that supports the PEP 3118
>     buffer API.
>
>   - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
>     construct an immutable bytes or mutable buffer object from a
>     stream of integers in range(256).
>
>   - ``buffer(<int>)``: construct a zero-initialized buffer of a given
>     lenth.

Typo; went ahead and fixed it in svn.

>
> Comparisons
> -----------
>
> The bytes and buffer types are comparable with each other and
> orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.
>
> Comparing either type to a str object raises an exception.  This
> turned out to be necessary to catch common mistakes.
>
> Slicing
> -------
>
> Slicing a bytes object returns a bytes object.  Slicing a buffer
> object returns a buffer object.
>
> Slice assignment to a mutable buffer object accepts anything that
> supports the PEP 3118 buffer API, or an iterable of integers in
> range(256).
>
> Indexing
> --------
>
> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type).  The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.
>

How much do you care about making the 2 -> 3 transition easy?  If you
don't go the str way then comparisons like ``bytes_[0] == b"A"`` won't
work unless you allow comparisons between ints and length 1
bytes/buffers.  Extracting a single item is not horrendous if you pass
it to int().

Personally I say go with the list-like semantics.  Having the
following code return false seems odd (but not ridiculous) to me::

  stuff = bytes([0, 1])
  stuff[1] = 42
  stuff[1] == 42

So unless int comparisons are allowed I am -0 on the str-like semantics.
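
A minimal sketch of the two readings, runnable on a current Python 3
interpreter. It assumes ``bytearray`` as a stand-in for the mutable type
(the snippet above needs a mutable object for the item assignment), and it
approximates the str-like result with a length-1 slice:

```python
# bytearray stands in for the mutable type (an assumption of this sketch).
stuff = bytearray([0, 1])
stuff[1] = 42                # item assignment takes an int in range(256)

item = stuff[1]              # list-like indexing gives back an int
same = (item == 42)          # True under list-like semantics

# Under str-like semantics the element would be a length-1 object instead;
# a slice approximates that here, and it does not compare equal to the int.
one_byte = stuff[1:2]        # bytearray(b'*')
differs = (one_byte == 42)   # False
```

With list-like semantics the value stored is the value read back; with a
length-1 object the comparison against 42 only works if int comparisons
are added.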

> Assignment to an item of a mutable buffer object accepts an int in
> range(256); if we choose the str-like approach for indexing above, it
> also accepts an object implementing the PEP 3118 buffer API, if it has
> length 1.
>
> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects.  The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".
>
> Methods
> -------
>
> The following methods are supported by bytes as well as buffer, with
> similar semantics.  They accept anything that implements the PEP 3118
> buffer API for bytes arguments, and return the same type as the object
> whose method is called ("self")::
>
>   .capitalize(), .center(), .count(), .decode(), .endswith(),
>   .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
>   .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
>   .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
>   .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
>   .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
>   .translate(), .upper(), .zfill()
>
> This is exactly the set of methods present on the str type in Python
> 2.x, with the exclusion of .encode().  The signatures and semantics
> are the same too.  However, whenever character classes like letter,
> whitespace, lower case are used, the ASCII definitions of these
> classes are used.  (The Python 2.x str type uses the definitions from
> the current locale, settable through the locale module.)  The
> .encode() method is left out because of the more strict definitions of
> encoding and decoding in Python 3000: encoding always takes a Unicode
> string and returns a bytes sequence, and decoding always takes a bytes
> sequence and returns a Unicode string.
>
> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding.  There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.encode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.decode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``.  This produces
> the same result as ``repr(b)``.  This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str.  There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).
>
> The str type current supports the PEP 3118 buffer API.  While this is

Fixed to "currently" in svn.

> perhaps occasionally convenient, it is also potentially confusing,
> because the bytes accessed via the buffer API represent a
> platform-dependent encoding: depending on the platform byte order and
> a compile-time configuration option, the encoding could be UTF-16-BE,
> UTF-16-LE, UTF-32-BE, or UTF-32-LE.  Worse, a different implementation
> of the str type might completely change the bytes representation,
> e.g. to UTF-8, or even make it impossible to access the data as a
> contiguous array of bytes at all.  Therefore, support for the PEP 3118
> buffer API will be removed from the str type.
>

+1 from me regardless of how the length 1 discussion turns out as this
will help with Py3K transitioning.

-Brett

From guido at python.org  Thu Sep 27 00:58:00 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 26 Sep 2007 15:58:00 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <1190836856.16322.55.camel@qrnik>
References: 
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
Message-ID: 

I find this semi-convincing. It would be very convincing in a
greenfield situation I think.

However there's quite a bit of Python 2.x code around that manipulates
*bytes* in the guise of 8-bit strings, and it uses tests like "if s[0]
== 'x': ..." frequently. This can of course be rewritten using a
slice, but not so easily when you're looping over bytes:

  for b in bb:
    if b == b'x': ...

This becomes the relatively ugly (because it uses a 1-char *string*):

  for b in bb:
    if b == ord('x'): ...

So I've left this as an open issue in PEP 3137.

--Guido

On 9/26/07, Marcin 'Qrczak' Kowalczyk  wrote:
> On Tue, 25-09-2007 at 17:22 -0700, Guido van Rossum wrote:
>
> > OK. Though it's questionable even whether a slice of a mutable bytes
> > object should return a mutable bytes object (as it is not a shared
> > view). But as that is what PyBytes currently do it is certainly the
> > easiest...
>
> A slice of a list is a list, as it always has been, so letting slicing
> return the same type as the whole sequence is at least consistent and
> easy to explain. Hard to say though what are typical use cases.
>
> OTOH I believe individual elements of mutable or immutable bytes should
> be ints. Here is why I think that the analogy between characters and
> bytes is not strong enough to let elements of bytes be bytes of length 1
> just because strings do the same.
>
> Bytes are often computed, while characters are often only copied
> from place to place. Arithmetic is defined on ints, but not on bytes
> sequences of length 1. This means that computing a bytes sequence from
> scratch requires explicit conversions between a byte represented by an
> int and a byte represented by bytes of length 1.
>
> There is also a philosophical reason. The division of a string into
> characters is quite arbitrary: considering UTF-16/UTF-32, combining
> characters, the encoding of Hangul, orthography peculiarities,
> proportional fonts, ligatures, variant selectors etc. -- all of these
> obscuring the concept of a character and of string length, and
> considering that a sequence of characters might have been decoded from
> or will be encoded into a sequence of bytes with a different length.
> This means that having atomic string components is more a technical
> convenience than a fundamental necessity, that the very concept of a
> character in a Unicode world is arbitrary, and the length of a string is
> more a technical detail of a representation than an inherent property of
> the text being represented. All this means that the concept of a string
> is more fundamental than a character.
>
> OTOH a byte count and byte offsets are usually important in protocols
> based on bytes (except text files when they encode human text). The
> individual bytes are in some sense delimited very sharply from each
> other, the amount of information stored in one byte is very well
> defined. A single byte is a more important concept in a bytes world
> than a character in a text world, it's not merely a sequence with
> length 1.
>
> Having characters different from strings would require creation of a new
> type, because the existing int type is not very appropriate for single
> characters, because many properties differ, e.g. the effect of writing
> to a text file. To avoid the burden of creating a new type for a concept
> which is rarely useful in isolation, strings of length 1 have been
> reused. OTOH the existing int type seems appropriate for elements of
> bytes. They can be easily thought of as just integers in the range
> 0..255, and Python does not use separate integer types for different
> potential ranges.
>
> If you really don't like ints there, I would prefer immutable bytes even
> as elements of mutable bytes. This is just a value isomorphic to an int,
> not an object with its own state. Moreover for atomic objects like
> individual bytes mutability is not helpful to obtain performance, which
> would be a reason to use a mutable type for non-atomic objects even when
> conceptually they are identityless values (mutability often helps in
> such case because an object can be constructed piece by piece).
>
> --
>    __("<         Marcin Kowalczyk
>    \__/       qrczak at knm.org.pl
>     ^^     http://qrnik.knm.org.pl/~qrczak/
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep 27 01:03:12 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 26 Sep 2007 16:03:12 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
Message-ID: 

[PEP 3137]
> > **Open Issue:** I'm undecided on whether indexing bytes and buffer
> > objects should return small ints (like the bytes type in 3.0a1, and
> > like lists or array.array('B')), or bytes/buffer objects of length 1
> > (like the str type).  The latter (str-like) approach will ease porting
> > code from Python 2.x; but it makes it harder to extract values from a
> > bytes array.

On 9/26/07, Brett Cannon  wrote:
> How much do you care about making the 2 -> 3 transition easy?  If you
> don't go the str way then comparisons like ``bytes_[0] == b"A"`` won't
> work unless you allow comparisons between ints and length 1
> bytes/buffers.  Extracting a single item is not horrendous if you pass
> it to int().
>
> Personally I say go with the list-like semantics.  Having the
> following code return false seems odd (but not ridiculous) to me::
>
>   stuff = bytes([0, 1])
>   stuff[1] = 42
>   stuff[1] == 42
>
> So unless int comparisons are allowed I am -0 on the str-like semantics.

int comparisons would stick out like a sore thumb, especially since
they can only be reasonably made to work on 1-byte strings.

I'm still undecided (despite Marcin's eloquent argument for ints as
bytes) but I'm open for votes for this case.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Thu Sep 27 04:15:58 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Wed, 26 Sep 2007 22:15:58 -0400
Subject: [Python-3000] ordered dict for p3k collections?
In-Reply-To: 
References: <200709111506.32823.mark@qtrac.eu>
	<200709260802.44381.mark@qtrac.eu>
	<18170.16027.665491.815991@montanaro.dyndns.org>
	<200709261233.57636.mark@qtrac.eu>
	
	
Message-ID: <66d0a6e10709261915j244b00d9s7f7369acb78e272a@mail.gmail.com>

On 9/26/07, Jason Orendorff  wrote:
> One situation where a sorteddict would win is finding upper and lower
> bounds.  This especially matters if you want to iterate over a
> specific range of keys: "show me all entries between 1 Jan 2007 and 1
> Feb 2007" is O(N) in the number of entries in that range, not the
> entire data set.

Yeah, we do this a lot.  We frequently end up with dictionaries with
hundreds of thousands of entries and a simple wrapper on std::map
gives us about 120x the performance of python dict in our use case,
almost entirely due to the fact that we search a LOT more than we
insert.

--
Nick

From alexandre at peadrop.com  Thu Sep 27 04:36:08 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Wed, 26 Sep 2007 22:36:08 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: 

On 9/26/07, Guido van Rossum  wrote:
>
> Constructors
> ------------
>
> There are four forms of constructors, applicable to both bytes and
> buffer:
>
>   - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
>     ``buffer(<buffer>)``: simple copying constructors, with the note
>     that ``bytes(<bytes>)`` might return its (immutable) argument.
>
>   - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
>     <encoding>[, <errors>])``: encode a text string.  Note that the
>     ``str.encode()`` method returns an *immutable* bytes object.
>     The <encoding> argument is mandatory; <errors> is optional.
>
>   - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
>     bytes or buffer object from anything that supports the PEP 3118
>     buffer API.
>
>   - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
>     construct an immutable bytes or mutable buffer object from a
>     stream of integers in range(256).
>
>   - ``buffer(<int>)``: construct a zero-initialized buffer of a given
>     lenth.
>

I think this section could be better organized. I had to read it a few
times to fully understand it. Maybe a table would better emphasize the
differences between the two constructors.

> Indexing
> --------
>
> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type).  The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.

I think indexing a bytes/buffer object should return an int. I find
this behavior more natural than using an ord()-like function to
extract values. In fact, I noticed that the use of ord() is a good
indicator that bytes should be used instead of str (see for yourself:
grep -R --include='*.py' 'ord(' python25/Lib).

> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects.  The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".

Does that mean calling str() on a bytes/buffer object -- e.g., str(b"abc")
-- wouldn't decode the content of the object (like array objects)?


> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding.  There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.encode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.decode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``.  This produces
> the same result as ``repr(b)``.  This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str.  There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).

Ah! That answers my last question. :)
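
A short sketch of that distinction as it behaves on a current Python 3
interpreter (which is an assumption relative to the draft under
discussion): without an encoding, ``str()`` gives the repr text; with an
encoding, it decodes.

```python
b = b"abc"

text = str(b)                # no encoding given: same text as repr(b)
decoded = str(b, "ascii")    # explicit encoding: decodes the bytes

# The two results differ: one is the literal's spelling, one is its content.
lengths = (len(text), len(decoded))
```

Running the interpreter with the ``-b`` option makes the encoding-less
form emit a BytesWarning, which helps find accidental uses.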

-- Alexandre

From greg.ewing at canterbury.ac.nz  Thu Sep 27 04:38:13 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 27 Sep 2007 14:38:13 +1200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
	
Message-ID: <46FB1795.5030404@canterbury.ac.nz>

Guido van Rossum wrote:
> I'm still undecided (despite Marcin's eloquent argument for ints as
> bytes) but I'm open for votes for this case.

Whatever is done, please don't do it *only* to make
conversion from 2.x easy. There should be good
independent reasons for whatever is chosen.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Thu Sep 27 04:35:10 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 27 Sep 2007 14:35:10 +1200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
	
Message-ID: <46FB16DE.7010109@canterbury.ac.nz>

Guido van Rossum wrote:

> However there's quite a bit of Python 2.x code around that manipulates
> *bytes* in the guise of 8-bit strings, and it uses tests like "if s[0]
> == 'x': ..." frequently. This can of course be rewritten using a
> slice, but not so easily when you're looping over bytes:
> 
>   for b in bb:
>     if b == b'x': ...

Would it make anything easier if there were a character
literal?

   for b in bb:
     if b == c'x': ...

where c'x' is another way of writing ord(b'x').

An advantage of this is that it would make Py3k compatible
with Pyrex, which already has c'x' literals. :-)

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From jyasskin at gmail.com  Thu Sep 27 05:44:16 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 26 Sep 2007 20:44:16 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: <5d44f72f0709262044p6ca07f05o662a89ef4a262775@mail.gmail.com>

On 9/26/07, Guido van Rossum  wrote:
> ...
> Indexing
> --------
>
> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type).  The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.

Marcin was far more eloquent than I could hope to be, but I too prefer
indexing bytes to return a small int. My reasoning is a little more
academic: All iterable types except for str get simpler when you
iterate over them, so eventually you come to a type that isn't
iterable. It would be a shame to extend this misbehavior to bytes if
we have a chance to remove it.

For example, the recursive flatten() function gets more complicated
for each type that does this:

>>> list(flatten.flatten([1, [2, [3, [4, 5]]]]))
[1, 2, 3, 4, 5]
>>> list(flatten.flatten([1, [2, [3, ["str", 5]]]]))
[1, 2, 3, 's', 't', 'r', 5]

If all iterables iterated over a simpler type, we could use:

def flatten(iterable):
    try:
        for elem in iterable:
            for elem in flatten(elem):
                yield elem
    except TypeError:
        # Not iterable
        yield iterable

but with strings, you need

def flatten(iterable):
    try:
        for elem in iterable:
            if isinstance(elem, str) and len(elem) == 1:
                yield elem
            else:
                for elem in flatten(elem):
                    yield elem
    except TypeError:
        # Not iterable
        yield iterable

and another special case for each similar type.
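
For comparison, a sketch of the simple form applied to byte sequences
whose elements are ints. It assumes a current Python 3 interpreter, where
``bytes`` and ``bytearray`` (used here in place of the proposed ``buffer``
name) iterate over ints, and it uses ``yield from`` for brevity:

```python
def flatten(iterable):
    """Recursively flatten nested iterables; non-iterables are yielded as-is."""
    try:
        for elem in iterable:
            yield from flatten(elem)
    except TypeError:
        # Not iterable
        yield iterable


# Byte sequences bottom out at ints, so no special case is needed for them.
flat = list(flatten([1, [b"ab", [3, bytearray(b"c")]]]))
```

A str element would still recurse without bound in this version, which is
exactly the special case described above.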


Comparisons with literal bytes could be done with:
  for b in bb:
    if b == b'x'[0]: ...
or perhaps
    if b == int(b'x'): ...
but you're right that's not ideal.
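
A sketch of these spellings, assuming int-returning indexing and
iteration as in 3.0a1 (and as a current Python 3 interpreter behaves); the
names ``X``, ``hits`` and ``slice_hits`` are only illustrative:

```python
bb = b"axbxc"

# Indexing a one-byte literal and calling ord() on it give the same int.
X = b"x"[0]
assert X == ord(b"x") == ord("x") == 120

# Count matching bytes while iterating; each b is an int here.
hits = sum(1 for b in bb if b == X)

# The str-like comparison survives by slicing instead of indexing.
slice_hits = sum(1 for i in range(len(bb)) if bb[i:i + 1] == b"x")
```

Note that on a current interpreter ``int(b'x')`` parses the bytes as a
decimal literal and raises ValueError, so it is not a substitute for
``ord()``.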

-- 
Namasté,
Jeffrey Yasskin

From greg at krypto.org  Thu Sep 27 07:06:35 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 26 Sep 2007 22:06:35 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <46FB16DE.7010109@canterbury.ac.nz>
References: 
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
	
	<46FB16DE.7010109@canterbury.ac.nz>
Message-ID: <52dc1c820709262206o33c0b792ib94156556d0b5bc5@mail.gmail.com>

On 9/26/07, Greg Ewing  wrote:
> Guido van Rossum wrote:
>
> > However there's quite a bit of Python 2.x code around that manipulates
> > *bytes* in the guise of 8-bit strings, and it uses tests like "if s[0]
> > == 'x': ..." frequently. This can of course be rewritten using a
> > slice, but not so easily when you're looping over bytes:
> >
> >   for b in bb:
> >     if b == b'x': ...
>
> Would it make anything easier if there were a character
> literal?
>
>    for b in bb:
>      if b == c'x': ...
>
> where c'x' is another way of writing ord(b'x').
>
> An advantage of this is that it would make Py3k compatible
> with Pyrex, which already has c'x' literals. :-)

My gut feeling on this is first "neat" but then "eew."  There should
not be multiple ways to write something so simple, and the letter''
syntax we already use for b'' s'' u'' r'' and such already annoys me
as ugly.  However, that syntax is already established, so maybe it's
okay.

Should it be i'x' instead of c'x' since the result is an int?  i'x'
might look odd in some fonts?

Writing ord(b'x') is ugly.

Would a special case in the b'x' comparison tests that knows how to
compare a len==1 bytes (mutable or not) object to an integer be
reasonable or just alternately confusing?

 b'x' == ord(b'x')
 b'x' > 65

Could that lead to people wanting to treat len==1 bytes objects like
tiny ints and use them in math (do *not* allow that)?

And if we did that what would a bytes len!=1 comparison to an integer
do?  return False as it currently does i'd hope.
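
For reference, a sketch of how these comparisons behave without any
special case, checked against a current Python 3 interpreter (an
assumption; the draft under discussion leaves this open):

```python
# Equality between bytes and int is simply False, whatever the length.
eq = (b"x" == ord(b"x"))

# Ordering bytes against an int is rejected outright.
try:
    b"x" > 65
    ordered = True
except TypeError:
    ordered = False

# Extracting the int first makes the comparison well defined.
explicit = b"x"[0] > 65
```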

-gps

From greg at krypto.org  Thu Sep 27 07:16:16 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 26 Sep 2007 22:16:16 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
	
Message-ID: <52dc1c820709262216n223b37fak835523027c8877eb@mail.gmail.com>

On 9/26/07, Guido van Rossum  wrote:
> [PEP 3137]
> > > **Open Issue:** I'm undecided on whether indexing bytes and buffer
> > > objects should return small ints (like the bytes type in 3.0a1, and
> > > like lists or array.array('B')), or bytes/buffer objects of length 1
> > > (like the str type).  The latter (str-like) approach will ease porting
> > > code from Python 2.x; but it makes it harder to extract values from a
> > > bytes array.
>
> On 9/26/07, Brett Cannon  wrote:
> > How much do you care about making the 2 -> 3 transition easy?  If you
> > don't go the str way then comparisons like ``bytes_[0] == b"A"`` won't
> > work unless you allow comparisons between ints and length 1
> > bytes/buffers.  Extracting a single item is not horrendous if you pass
> > it to int().
> >
> > Personally I say go with the list-like semantics.  Having the
> > following code return false seems odd (but not ridiculous) to me::
> >
> >   stuff = bytes([0, 1])
> >   stuff[1] = 42
> >   stuff[1] == 42
> >
> > So unless int comparisons are allowed I am -0 on the str-like semantics.
>
> int comparisons would stick out like a sore thumb, especially since
> they can only be reasonably made to work on 1-byte strings.
>
> I'm still undecided (despite Marcin's eloquent argument for ints as
> bytes) but I'm open for votes for this case.
>

looks like my response in the other thread suggesting allowing
comparisons of len==1 to ints was already mentioned before me.  yay.
I'm +0.5 on the idea of allowing the len==1 to int comparison and
returning ints for the bytes/buffer indices and iteration.

glad to see this as a PEP, it feels more real. :)

-gps

From g.brandl at gmx.net  Thu Sep 27 08:15:00 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 27 Sep 2007 08:15:00 +0200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
Message-ID: 

Alexandre Vassalotti schrieb:

>> Indexing
>> --------
>>
>> **Open Issue:** I'm undecided on whether indexing bytes and buffer
>> objects should return small ints (like the bytes type in 3.0a1, and
>> like lists or array.array('B')), or bytes/buffer objects of length 1
>> (like the str type).  The latter (str-like) approach will ease porting
>> code from Python 2.x; but it makes it harder to extract values from a
>> bytes array.
> 
> I think indexing a bytes/buffer object should return an int. I find
> this behavior more natural than using an ord()-like function to
> extract values. In fact, I noticed that the use of ord() is a good
> indicator that bytes should be used instead of str (see for yourself:
> grep -R --include='*.py' 'ord(' python25/Lib).

If b[0] returns an int, you will have to use ord() to compare it to b"a".
If it returns b"a", you won't.

If you want to compare a byte by ordinal, you can still use b"\xAB", without
a function call...

Therefore I vote for returning not an int, but I wouldn't object to bytes of
length 1 being comparable to ints.

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From walter at livinglogic.de  Thu Sep 27 09:34:48 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Thu, 27 Sep 2007 09:34:48 +0200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: <46FB5D18.1000601@livinglogic.de>

Guido van Rossum wrote:
> Please comment.

> [...] 
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding.  There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.encode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.decode(<encoding>[, <errors>])``.

This looks backwards to me. IMHO it should be:

``str(b, <encoding>[, <errors>])`` is equivalent to
``b.decode(<encoding>[, <errors>])``, and
``bytes(s, <encoding>[, <errors>])`` is equivalent to
``s.encode(<encoding>[, <errors>])``.
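
A quick check of the corrected pairing, as a sketch against a current
Python 3 interpreter (the sample string and the name ``lossy`` are only
illustrative):

```python
s = "L\u00f6wis"
b = s.encode("utf-8")

# str(b, encoding) decodes; bytes(s, encoding) encodes.
assert str(b, "utf-8") == b.decode("utf-8") == s
assert bytes(s, "utf-8") == s.encode("utf-8") == b

# The optional errors argument is accepted by both spellings.
lossy = bytes(s, "ascii", "replace")
```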

Servus,
    Walter

From talin at acm.org  Thu Sep 27 10:20:14 2007
From: talin at acm.org (Talin)
Date: Thu, 27 Sep 2007 01:20:14 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: <46FB67BE.4080502@acm.org>

Guido van Rossum wrote:
> Thinking through the consequences, and noticing that using the array
> module as an ersatz mutable bytes type is far from ideal, and
> recalling a proposal put forward earlier by Talin, I floated the
> suggestion to have both a mutable and an immutable bytes type.  (This
> had been brought up before, but until seeing the evidence of Jeffrey's
> patch I wasn't open to the suggestion.)

One thing that you may have missed from my proposal is that both 'bytes' 
and 'buffer' inherit from a common ABC. This ABC defines all of the 
operations which 'bytes' and 'buffer' have in common. My name for this 
ABC was 'ByteSequence', but I have no particular attachment to that name.
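A rough sketch of what such a common ABC could look like. Only the name ByteSequence comes from the proposal; using register() rather than real inheritance, using bytearray as a stand-in for the mutable 'buffer' type, and the checksum() helper are all assumptions made for illustration.

```python
from abc import ABC


class ByteSequence(ABC):
    """Hypothetical common base for immutable and mutable byte sequences."""


# Virtual subclassing: an assumption, the proposal speaks of inheritance.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)


def checksum(data):
    """Accept either flavour of byte sequence, reject everything else."""
    if not isinstance(data, ByteSequence):
        raise TypeError("expected a byte sequence")
    return sum(data) % 256
```

Code written against the ABC then works unchanged for both types, while a str argument is rejected up front.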

-- Talin


From jjb5 at cornell.edu  Thu Sep 27 15:56:59 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Thu, 27 Sep 2007 09:56:59 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: <46FBB6AB.4030509@cornell.edu>

> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type).  The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.

The protocol encoding and decoding world calls these "octet strings" and 
it makes encoding and decoding discussions a lot easier.  ASN.1 calls 
them that and it's a good thing.

In that frame of mind, the first element is an octet, and while Python 
would not add a new datatype, just like it doesn't have one for 
character, it would be an unsigned integer in range(256).

> Methods
> -------
> 
> The following methods are supported by bytes as well as buffer, with
> similar semantics.  They accept anything that implements the PEP 3118
> buffer API for bytes arguments, and return the same type as the object
> whose method is called ("self"):

First, please enforce that where these functions take a "string" 
parameter that they require an octet or octet string (I couldn't find 
what kinds of arguments these functions require in PEP 3118):

     >>> x = b'123*45'
     >>> x.find("*")
     TypeError: expected an octet string or int

     >>> x.find(b'*')
     3
     >>> x.find(42)
     3
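For comparison, this is how the requested behaviour looks in the Python 3 releases that eventually shipped (integer arguments to find() arrived in 3.3). A small sketch; the variable names are illustrative.

```python
x = b'123*45'

# A str argument is rejected rather than silently encoded.
try:
    x.find("*")
except TypeError as exc:
    str_error = exc
else:
    str_error = None

by_bytes = x.find(b'*')   # search for a length-1 byte string
by_int = x.find(42)       # search for the integer value of '*'
```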

Second, Please add slice operations and .append() to mutable octet strings:

     >>> x[:0] = b'>'              # start of message
     >>> x.append(sum(x) % 256)    # simple checksum


Joel

From alexandre at peadrop.com  Thu Sep 27 17:13:38 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 27 Sep 2007 11:13:38 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
Message-ID: 

On 9/26/07, Alexandre Vassalotti  wrote:
> I think indexing a bytes/buffer object should return an int.
> I find this behavior more natural, to me, than using an
> ord()-like function to extract values.

I didn't know about the length-1 comparison issue when I wrote this.
Personally, I wouldn't mind writing either this:

   for b in bytes:
     if b == b'a'[0]:
       pass

or this:

   for b in bytes:
     if b == b'a':
       pass
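With the int-returning semantics that shipped in Python 3, only the first of these two loops ever matches. A minimal sketch with made-up sample data:

```python
data = b"banana"

# Iteration yields ints, so comparing against b'a'[0] (the int 97) works.
int_matches = sum(1 for b in data if b == b'a'[0])

# Comparing an int against a length-1 bytes object is always False.
bytes_matches = sum(1 for b in data if b == b'a')
```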

> In fact, I remarked that the use of ord() is good indicator
> that bytes should be used instead of str (look by yourself:
> grep -R --include='*.py' 'ord(' python25/Lib).

I don't think my argument is still valid. Compare the use of ord() in
Python 2.x vs. Python 3.x with:

  % egrep -R --include='*.py' '\
References: 	
	
Message-ID: <46FBCE5A.6050503@gmail.com>

Guido van Rossum wrote:
> [PEP 3137]
>>> **Open Issue:** I'm undecided on whether indexing bytes and buffer
>>> objects should return small ints (like the bytes type in 3.0a1, and
>>> like lists or array.array('B')), or bytes/buffer objects of length 1
>>> (like the str type).  The latter (str-like) approach will ease porting
>>> code from Python 2.x; but it makes it harder to extract values from a
>>> bytes array.
> 
> On 9/26/07, Brett Cannon  wrote:
>> How much do you care about making the 2 -> 3 transition easy?  If you
>> don't go the str way then comparisons like ``bytes_[0] == b"A"`` won't
>> work unless you allow comparisons between ints and length 1
>> bytes/buffers.  Extracting a single item is not horrendous if you pass
>> it to int().
>>
>> Personally I say go with the list-like semantics.  Having the
>> following code return false seems odd (but not ridiculous) to me::
>>
>>   stuff = bytes([0, 1])
>>   stuff[1] = 42
>>   stuff[1] == 42
>>
>> So unless int comparisons are allowed I am -0 on the str-like semantics.
> 
> int comparisons would stick out like a sore thumb, especially since
> they can only be reasonably made to work on 1-byte strings.
> 
> I'm still undecided (despite Marcin's eloquent argument for ints as
> bytes) but I'm open for votes for this case.

Making an iterator over an integer sequence acceptable in the 
constructor strongly suggests that a byte sequence contains integers 
between 0 and 255 inclusive, not length 1 byte sequences.

And I think that's the cleanest conceptual model for them as well. A 
byte sequence doesn't contain length 1 byte sequences, it contains bytes 
(i.e. numbers between 0 and 255 inclusive).

For direct comparison, a slice works fine:

   if data[0:1] == b'x':
     print("Starts with x!")

The only problematic case is cases such as iterating over a byte 
sequence where we may have an integer and want to compare it to a length 
1 byte string. With just the simple conceptual model, we would have to 
write one of:

   if val == b'x'[0]:
   if bytes([val]) == b'x':
   if val == ord(b'x'):

I don't think it's worth breaking the conceptual model of the data type 
just to reduce the simplest spelling of that comparison by 3 characters.
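The spellings above can be tried side by side. This sketch assumes the int-returning indexing that Python 3 eventually adopted; the sample data is illustrative.

```python
data = b"xyz"
val = data[0]             # an int in range(256)

spellings = [
    val == b'x'[0],       # index the literal to get an int
    bytes([val]) == b'x', # wrap the int back into a bytes object
    val == ord(b'x'),     # ord() accepts a length-1 bytes object
]

# Slicing keeps the bytes type, so it compares directly to a literal.
slice_match = data[0:1] == b'x'
```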

However, I do think it may be worth having an additional iterator on 
bytes and buffer objects:

   def fragments(self, size=1): # Could do with a better name
     for i in range(len(self)):
       yield self[i:i+size]

Then the problematic example could be written:

   for val in data.fragments():
       if val == b'x':
           print("Found an x!")
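A standalone sketch of the fragments idea, written as a free function because bytes has no such method. Note one deliberate difference: the method as quoted advances one byte at a time, so fragments would overlap for size > 1; this version assumes non-overlapping chunks were intended and strides by size.

```python
def fragments(data, size=1):
    """Yield successive non-overlapping slices of `data`, each `size` long.

    The final slice may be shorter if len(data) is not a multiple of size.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(data), size):
        yield data[i:i + size]


found = [val for val in fragments(b"axbx") if val == b'x']
pairs = list(fragments(b"abcde", 2))
```

Because slicing preserves the type, the same function works for both immutable and mutable byte sequences.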

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From nas at arctrix.com  Thu Sep 27 17:55:30 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 27 Sep 2007 15:55:30 +0000 (UTC)
Subject: [Python-3000] Immutable bytes -- looking for volunteer
References: 
	<766a29bd0709200646h1591715fib3344ba561d595cc@mail.gmail.com>
	<5d44f72f0709201234vec00c4w13d41bf5c4bea8d7@mail.gmail.com>
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
	
Message-ID: 

Guido van Rossum  wrote:
> However there's quite a bit of Python 2.x code around that manipulates
> *bytes* in the guise of 8-bit strings, and it uses tests like "if s[0]
>== 'x': ..." frequently.

I think it would be useful to do a survey and see how much code
would be affected and the effect on readability.

  Neil


From weilawei at gmail.com  Thu Sep 27 19:02:05 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Thu, 27 Sep 2007 13:02:05 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: 
References: 
Message-ID: 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've uploaded the latest code to http://umass.glexia.net/mpf.tar.bz2

Here's a quick rundown of supported functions and operations.

The MPF() constructor accepts a string and an optional keyword
argument, prec, specifying precision (as a Long).

Supported module functions:

mpf_add 	Add two MPF objects
mpf_sub 	Subtract two MPF objects
mpf_div 	Divide two MPF objects
mpf_mul 	Multiply two MPF objects
mpf_sqrt 	Take the square root of an MPF object
mpf_neg 	Get the negative of an MPF object
mpf_abs 	Get the absolute value of an MPF object
mpf_pow 	Raise an MPF object to a power
mpf_ceil 	Round an MPF object to the next highest integer
mpf_floor 	Round an MPF object to the next lower integer
mpf_trunc 	Truncate the decimal portion of an MPF object

Operations supported:
(note that only MPF objects are supported atm)

+ - * / ** abs() and - (negative)

Attributes:

value		A tuple of the form (base, sign, whole, decimal)

Also, it supports a print() representation. No more finagling with
value if you don't want to.

Things to come:

floor divide, support for other python numbers in the number interface

Comments:

This wasn't a case of NIH syndrome. I wrote this extension because
Decimal simply was not fast enough and the builtin floats didn't
provide enough precision for a project. The pre-existing modules were
terrible, didn't compile, etc. Necessity, not NIH syndrome.

Questions:

What features would you like to see?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG++HqqR5p8HaX4oURAsROAKCfNMxxoa+i0lFWJZPDWH8/lguT5ACfSl7d
eYrrkokoCIjuFmnxTW6f4y4=
=cZ8M
-----END PGP SIGNATURE-----

From jjb5 at cornell.edu  Thu Sep 27 19:14:53 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Thu, 27 Sep 2007 13:14:53 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FBCE5A.6050503@gmail.com>
References: 		
	<46FBCE5A.6050503@gmail.com>
Message-ID: <46FBE50D.303@cornell.edu>

> Making an iterator over an integer sequence acceptable in the 
> constructor strongly suggests that a byte sequence contains integers 
> between 0 and 255 inclusive, not length 1 byte sequences.
> 
> And I think that's the cleanest conceptual model for them as well. A 
> byte sequence doesn't contain length 1 byte sequences, it contains bytes 
> (i.e. numbers between 0 and 255 inclusive).

Using standards language, an octet string contains octets.  Since Python 
blurs the distinction between characters and strings of length 1, 
shouldn't it also blur the distinction between octets and octet 
strings of length 1?

> The only problematic case is cases such as iterating over a byte 
> sequence where we may have an integer and want to compare it to a length 
> 1 byte string.

Why is it problematic?  Why does a programmer have to jump through hoops 
to compare the two?

      >>> x, y = "abc", "a"
      >>> x[0] == y
      True

And the same should be true for octet strings:

      >>> x, y = b"abc", b"a"
      >>> x[0] == y
      True

> With just the simple conceptual model...

Python doesn't have a simple conceptual model, there is no distinction 
between strings of length 1 and characters.  This makes it pretty clear 
that octet strings contain octets:

     >>> list(b"1234")
     [49, 50, 51, 52]

And you should be able to check for an octet in an octet string:

     >>> 51 in b"1234"
     True

And if I want to specify the same octet in ASCII, I do this:

     >>> b'3' in b"1234"
     True

> I don't think it's worth breaking the conceptual model of the data type 
> just to reduce the simplest spelling of that comparison by 3 characters.

The programmer shouldn't have to go through any one of those gyrations, 
the only reason why saying chr(51) == '3' is necessary is because 
characters and integers are different types.  But octets and "integers 
in the range(256)" are exactly the same thing.

     >>> b'3' == 51
     True

The fact that octets can be written as an octet string of length 1 is 
just a happy coincidence of Python, just like characters.

>    for val in data.fragments():
>        if val == b'x':
>            print("Found an x!")

That's a hideous amount of work to just say:

     if b'x' in data:
         print("Found an x!")
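For reference, the Python 3 releases that eventually shipped accept both membership spellings but do not make a length-1 bytes object equal to an int. A small sketch with illustrative names:

```python
data = b"1234"

int_in = 51 in data          # membership test by integer value
bytes_in = b'3' in data      # membership test by length-1 byte string
subseq_in = b'23' in data    # longer subsequences work as well

# Equality across the two types was not adopted.
equal = (b'3' == 51)
```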


Joel


From guido at python.org  Thu Sep 27 19:29:24 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 27 Sep 2007 10:29:24 -0700
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: 
References: 
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
	
	
Message-ID: 

On 9/27/07, Neil Schemenauer  wrote:
> Guido van Rossum  wrote:
> > However there's quite a bit of Python 2.x code around that manipulates
> > *bytes* in the guise of 8-bit strings, and it uses tests like "if s[0]
> >== 'x': ..." frequently.
>
> I think it would be useful to do a survey and see how much code
> would be affected and the effect on readability.

Agreed. Anyone interested in researching this? (Though at this point
I'm pretty much ready to resolve the issue by choosing ints.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep 27 19:39:32 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 27 Sep 2007 10:39:32 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FBCE5A.6050503@gmail.com>
References: 
	
	
	<46FBCE5A.6050503@gmail.com>
Message-ID: 

I think I've been convinced that b[0] should return an int in range(256).

To Joel Bender: octet is not, and never will be a technical term for
Python. It is a silly standards body compromise.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep 27 19:41:08 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 27 Sep 2007 10:41:08 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FB67BE.4080502@acm.org>
References: 
	<46FB67BE.4080502@acm.org>
Message-ID: 

I didn't miss it, and I don't disagree, I just don't think it has much
bearing on the discussion (which is whether to go with this proposal
at all).

On 9/27/07, Talin  wrote:
> Guido van Rossum wrote:
> > Thinking through the consequences, and noticing that using the array
> > module as an ersatz mutable bytes type is far from ideal, and
> > recalling a proposal put forward earlier by Talin, I floated the
> > suggestion to have both a mutable and an immutable bytes type.  (This
> > had been brought up before, but until seeing the evidence of Jeffrey's
> > patch I wasn't open to the suggestion.)
>
> One thing that you may have missed from my proposal is that both 'bytes'
> and 'buffer' inherit from a common ABC. This ABC defines all of the
> operations which 'bytes' and 'buffer' have in common. My name for this
> ABC was 'ByteSequence', but I have no particular attachment to that name.
>
> -- Talin
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Sep 27 19:44:51 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 27 Sep 2007 10:44:51 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FBB6AB.4030509@cornell.edu>
References: 
	<46FBB6AB.4030509@cornell.edu>
Message-ID: 

On 9/27/07, Joel Bender  wrote:
> First, please enforce that where these functions take a "string"
> parameter that they require an octet or octet string (I couldn't find
> what kinds of arguments these functions require in PEP 3118):
>
>      >>> x = b'123*45'
>      >>> x.find("*")
>      TypeError: expected an octet string or int
>
>      >>> x.find(b'*')
>      3
>      >>> x.find(42)
>      3

PEP 3118 has nothing to do with this, but one of the last paragraphs
of PEP 3137 spells it out:

"""
The str type currently implements the PEP 3118 buffer API.  While this
is perhaps occasionally convenient, it is also potentially confusing,
because the bytes accessed via the buffer API represent a
platform-dependent encoding: depending on the platform byte order and
a compile-time configuration option, the encoding could be UTF-16-BE,
UTF-16-LE, UTF-32-BE, or UTF-32-LE.  Worse, a different implementation
of the str type might completely change the bytes representation,
e.g. to UTF-8, or even make it impossible to access the data as a
contiguous array of bytes at all.  Therefore, the PEP 3118 buffer API
will be removed from the str type.
"""

> Second, Please add slice operations and .append() to mutable octet strings:
>
>      >>> x[:0] = b'>'              # start of message
>      >>> x.append(sum(x) % 256)    # simple checksum

Slice operations are already in the PEP, under "Slicing":

"""
Slice assignment to a mutable buffer object accepts anything that
implements the PEP 3118 buffer API, or an iterable of integers in
range(256).
"""

I agree that append() and a few other list methods (insert(),
extend()) should be added to the buffer type. The PyBytes
implementation already has these so it's just a matter of updating the
PEP.
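A sketch of these list-style operations, using bytearray, the name under which the mutable 'buffer' type eventually shipped. The message framing and checksum follow the example quoted above; the second sequence is made up for illustration.

```python
x = bytearray(b'123*45')
x[:0] = b'>'                  # slice assignment: prepend a start marker
x.append(sum(x) % 256)        # append a simple one-byte checksum

y = bytearray(b'ad')
y.insert(1, ord('b'))         # insert a single int
y.extend(b'ef')               # extend from another bytes-like object
y[1:2] = [66, 67]             # slice assignment from an iterable of ints
```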

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Fri Sep 28 00:52:05 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 27 Sep 2007 18:52:05 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: 

On 9/26/07, Guido van Rossum  wrote:

> Comparisons
> -----------

> The bytes and buffer types are comparable with each other and
> orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.

I think bytes (regardless of length) should compare to integers, so that:

    b"" < -sys.maxint < 97 == b'a' < b'aa' < 98

(zero-length buffer < any integer; otherwise compare the number to the
first byte, and in case of ties, a BytesSequence of length 2 or more
is greater)

I'm not as sure about comparing to floats.

Should they be incomparable to integer sequences?

    (97, 98) != b'ab'
    not (97, 98) < b'ab'
    not (97, 98) > b'ab'


> Bytes and the Str Type
> ----------------------

> ... any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  This is the case even for simply
> comparing a bytes or buffer object to a str object ...

Should a TypeError be raised as soon as you try to put a bytes and a
string in the same dict, even if they don't happen to hash equal?

(I assume that  buffer(b'abc') in {} will raise a TypeError, just as
list("abc") in {} would.)

> Therefore, support for the PEP 3118
> buffer API will be removed from the str type.

Good; this may be the single biggest aid for separating characters from
a particular (bytes) representation.

-jJ

From guido at python.org  Fri Sep 28 01:03:03 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 27 Sep 2007 16:03:03 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
Message-ID: 

On 9/27/07, Jim Jewett  wrote:
> On 9/26/07, Guido van Rossum  wrote:
>
> > Comparisons
> > -----------
>
> > The bytes and buffer types are comparable with each other and
> > orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.
>
> I think bytes (regardless of length) should compare to integers, so that:
>
>     b"" < -sys.maxint < 97 == b'a' < b'aa' < 98

Argh. Yuck. I'm not even asking for a use case. No.

(Note, I've already decided that b[0] should produce an int, not a
1-size bytes object.)

> (zero-length buffer < any integer; otherwise compare the number to the
> first byte, and in case of ties, a BytesSequence of length 2 or more
> is greater)
>
> I'm not as sure about comparing to floats.
>
> Should they be incomparable to integer sequences?
>
>     (97, 98) != b'ab'
>     not (97, 98) < b'ab'
>     not (97, 98) > b'ab'

No. There are no precedents for supporting sequence comparisons across
type boundaries.
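These rulings match what shipped: the two byte types order against each other, while ordering against ints or tuples raises TypeError. A small sketch; orderable() is a hypothetical helper and bytearray stands in for 'buffer'.

```python
def orderable(a, b):
    """Return True if `a < b` is a supported comparison."""
    try:
        a < b
    except TypeError:
        return False
    return True


same_family = b'abc' == bytearray(b'abc') < b'abd'
bytes_vs_int = orderable(b'a', 97)
tuple_vs_bytes = orderable((97, 98), b'ab')
tuple_equals_bytes = (97, 98) == b'ab'
```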

> > Bytes and the Str Type
> > ----------------------
>
> > ... any attempt to mix bytes (or
> > buffer) objects and str objects without specifying an encoding will
> > raise a TypeError exception.  This is the case even for simply
> > comparing a bytes or buffer object to a str object ...
>
> Should a TypeError be raised as soon as you try to put a bytes and a
> string in the same dict, even if they don't happen to hash equal?

Good idea, if you can figure out a way to implement this efficiently.

> (I assume that  buffer(b'abc') in {} will raise a TypeError, just as
> list("abc") in {} would.)

Indeed. It will fail to hash.
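The hashing behaviour can be checked directly, again with bytearray standing in for 'buffer'. Note that shipped Python 3 does not, by default, raise when bytes and str keys share a dict; the sketch below assumes the interpreter is run without the -b/-bb warning flags.

```python
# The mutable type is unhashable, so it cannot be looked up in a dict.
try:
    bytearray(b'abc') in {}
except TypeError:
    buffer_hashable = False
else:
    buffer_hashable = True

# The immutable type hashes fine, and the two keys stay distinct.
mixed = {b'abc': 'bytes key', 'abc': 'str key'}
```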

> > Therefore, support for the PEP 3118
> > buffer API will be removed from the str type.
>
> Good; this may be the single biggest aid for separating characters from
> a particular (bytes) representation.

Right. Much better than the 3.0a1 approach of explicitly excluding
PyUnicode/str where a sequence of bytes is accepted.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nick.bastin at gmail.com  Fri Sep 28 02:28:47 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Thu, 27 Sep 2007 20:28:47 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
References: <1189700532.22693.40.camel@qrnik>
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
Message-ID: <66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>

On 9/22/07, martin at v.loewis.de  wrote:
> argc/argv does not exist on Windows (that you seem to see it
> anyway is an illusion), and if it did exist, it would be characters,
> not bytes.

Of course it exists on Windows.  argc/argv are defined by the C
standard, and say what you will about Windows, but it has a conforming
implementation.  argv exists on Windows exactly the way the C standard
requires it - as an array of null terminated "strings".  It's left as
an exercise to people with more time than I to argue about the
definition of the term 'string' in the C standard (since the standard
itself is silent on the issue).

For what it's worth, the *Python* documentation does NOT guarantee
that the items in sys.argv will be strings.

--
Nick

From greg.ewing at canterbury.ac.nz  Fri Sep 28 03:14:35 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 28 Sep 2007 13:14:35 +1200
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <52dc1c820709262206o33c0b792ib94156556d0b5bc5@mail.gmail.com>
References: 
	<766a29bd0709201548v77c4bfa5xdae9182c2f3083c3@mail.gmail.com>
	<5d44f72f0709242309m492cc238k1b81d860c11345ab@mail.gmail.com>
	
	
	
	
	<1190836856.16322.55.camel@qrnik>
	
	<46FB16DE.7010109@canterbury.ac.nz>
	<52dc1c820709262206o33c0b792ib94156556d0b5bc5@mail.gmail.com>
Message-ID: <46FC557B.3020306@canterbury.ac.nz>

Gregory P. Smith wrote:
> Would a special case in the b'x' comparison tests that knows how to
> compare a len==1 bytes (mutable or not) object to an integer be
> reasonable or just alternately confusing?

Comparison isn't the only thing you might want to do
with bytes. Doing this just for comparison would be
rather arbitrary.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From larry at hastings.org  Fri Sep 28 03:32:29 2007
From: larry at hastings.org (Larry Hastings)
Date: Thu, 27 Sep 2007 18:32:29 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 			<46FBCE5A.6050503@gmail.com>
	
Message-ID: <46FC59AD.1000403@hastings.org>

Guido van Rossum wrote:
> I think I've been convinced that b[0] should return an int in range(256).

This made me feel funny.  I stared at this for a while:
    b'a' != b'abcde'[0]  ?!?
    b'a'[0] != b'a' ?!?

Then I realized that making b[0] return an int simply makes bytes 
objects behave less like strings, and more like tuples of integers:
    ( 97, ) != ( 97, 98, 99, 100, 101 )
    ( 97, )[0] != ( 97, )
Strings have always been the odd man out; no other sequence type has 
this individual-elements-are-implicitly-sequences-too behavior.

So now bytes are straddling the difference between strings and the other 
mapping types:

tuple:
    to construct one with multiple elements: ( 97, 98, 99, 100, 101 )
    elements aren't implicitly sequences: ( 97, ) != ( 97, 98, 99 )[0]
list:
    to construct one with multiple elements: [ 97, 98, 99, 100, 101 ]
    elements aren't implicitly sequences: [ 97, ] != [ 97, 98, 99 ][0]
bytes:
    to construct one with multiple elements: b"abcde"
    elements aren't implicitly sequences: b"a" != b"abcde"[0]
str:
    to construct one with multiple elements: "abcde"
    elements are implicitly sequences: "a" == "abcde"[0]
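The table can be reproduced mechanically. A sketch assuming the int-returning indexing that shipped in Python 3; the dictionary layout is just for illustration.

```python
t = (97, 98, 99)
l = [97, 98, 99]
b = b"abc"
s = "abc"

# Is a length-1 container equal to its own first element?
element_is_sequence = {
    "tuple": (97,) == t[0],
    "list": [97] == l[0],
    "bytes": b"a" == b[0],
    "str": "a" == s[0],
}
```

Only str answers True, which is the odd-man-out behaviour described above.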

So what should the bytes constructor take?  We all already know it 
should *not* take a string.  (You must explicitly encode a string to get 
a bytes object.)  Clearly it should take an int in the proper range:
    bytes(97) == b'a'
and a bytes object:
    bytes(b'a') == b'a'
    bytes(b'abcde') == b'abcde'
Like the tuple and list constructors, I think it should also attempt to 
cast iterables into its type.  So if you pass in an iterable, and the 
iterable contains nothing but ints in the proper range, it should 
produce a bytes object:
    bytes( [ 97, 98, 99, 100, 101] ) == b'abcde'

Sorry if this is obvious to everybody; thinking through it helped me, at 
least.


/larry/

From greg.ewing at canterbury.ac.nz  Fri Sep 28 03:37:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 28 Sep 2007 13:37:50 +1200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
	
	
Message-ID: <46FC5AEE.1060101@canterbury.ac.nz>

Alexandre Vassalotti wrote:
> Personally, I wouldn't mind writing either this:
> 
>    for b in bytes:
>      if b == b'a'[0]:
>        pass

Well, I would mind, because it's needlessly verbose
and inefficient.

I still think that c'x' is the least bad solution. As long
as we're wanting to write arrays of integers by means of
their corresponding ASCII characters, it makes sense to
be able to do that for a single integer as well.

So my current vote is:

   a) Indexing bytes or buffer gives an integer

   b) Have a c'x' notation for expressing a single
      integer
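No c'x' literal exists in Python 3 as shipped; the closest existing spelling is ord() on a length-1 bytes literal, which can be hoisted out of the loop to avoid the repeated indexing objected to above. A minimal sketch with illustrative names:

```python
# Compute the integer once, outside the loop.
X = ord(b'x')

count = sum(1 for val in b"xoxo" if val == X)
```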

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From greg.ewing at canterbury.ac.nz  Fri Sep 28 03:39:37 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 28 Sep 2007 13:39:37 +1200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FBCE5A.6050503@gmail.com>
References: 
	
	
	<46FBCE5A.6050503@gmail.com>
Message-ID: <46FC5B59.9050807@canterbury.ac.nz>

Nick Coghlan wrote:
> However, I do think it may be worth having an additional iterator on 
> bytes and buffer objects:
> 
>    def fragments(self, size=1): # Could do with a better name

I suggest dice(). :-)

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From victor.stinner at haypocalc.com  Fri Sep 28 04:29:39 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 28 Sep 2007 04:29:39 +0200
Subject: [Python-3000] Python, int/long and GMP
Message-ID: <200709280429.39396.victor.stinner@haypocalc.com>

Hi,

I read some days ago a discussion about GMP (license). I wanted to know if GMP 
is really better than current Python int/long implementation. So I wrote a 
patch for python 3000 subversion (rev. 58277).

I changed long type structure with:

struct _longobject {
	PyObject_HEAD
        mpz_t number;
};

False is the number 0 and True is 1. marshal module is broken, my patch just 
makes gcc happy.

The most important point is the pystone results:
  original python: 32573.3 pystones/second
  python with GMP: 26666.7 pystones/second

So I can now say that GMP is much slower for Python pystone usage of integers. 
I use 32-bit CPU (Celeron M 420 at 1600 MHz on Ubuntu), so most integers are 
just one CPU word (and not a GMP complex structure).
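To put the two figures in proportion, the relative slowdown can be worked out from the pystone numbers quoted above. A small arithmetic sketch; the variable names are illustrative.

```python
original = 32573.3   # pystones/second, unpatched py3k
with_gmp = 26666.7   # pystones/second, GMP-backed long

# Fraction of throughput lost by switching to GMP.
slowdown = (original - with_gmp) / original
percent = round(slowdown * 100, 1)
```

That works out to a drop of roughly 18 percent on this benchmark.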

Victor Stinner
http://hachoir.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: working-gmp.patch
Type: text/x-diff
Size: 103033 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070928/18a22c4e/attachment-0001.patch 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: longobject.c
Type: text/x-csrc
Size: 28488 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070928/18a22c4e/attachment-0001.c 

From greg.ewing at canterbury.ac.nz  Fri Sep 28 04:56:12 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 28 Sep 2007 14:56:12 +1200
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FC59AD.1000403@hastings.org>
References: 
	
	
	<46FBCE5A.6050503@gmail.com>
	
	<46FC59AD.1000403@hastings.org>
Message-ID: <46FC6D4C.9080100@canterbury.ac.nz>

Larry Hastings wrote:
> So now bytes are straddling the difference between strings and the other 
> mapping types:

I think the main reason it seems that way is that we're
using a string-like notation for a bytes literal. With
b[i] returning an int, it really behaves just like any
other sequence.

> So what should the bytes constructor take?  ...  Clearly it should 
 > take an int in the proper range:
 >
>     bytes(97) == b'a'

That should be

   bytes([97])

if it's to be consistent with other sequence constructors:

 >>> list(97)
Traceback (most recent call last):
   File "", line 1, in ?
TypeError: iteration over non-sequence
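In the Python 3 releases that shipped, the constructor does accept a bare int, but with a different meaning: it gives that many zero bytes. The iterable form is the one that builds a byte from its value. A minimal sketch:

```python
from_iterable = bytes([97])                    # one byte with value 97
from_int = bytes(3)                            # three zero bytes, not b'\x03'
from_many = bytes([97, 98, 99, 100, 101])
copy = bytes(b'abcde')

# list() still refuses a bare int, as in the traceback above.
try:
    list(97)
except TypeError:
    list_accepts_int = False
else:
    list_accepts_int = True
```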

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From martin at v.loewis.de  Fri Sep 28 06:40:44 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 28 Sep 2007 06:40:44 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>	
	<87y7f7ozfq.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<1190070414.20673.12.camel@qrnik>	
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>	
		
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>	
		
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>	
		
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
	<66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>
Message-ID: <46FC85CC.4030806@v.loewis.de>

Nicholas Bastin schrieb:
> On 9/22/07, martin at v.loewis.de  wrote:
>> argc/argv does not exist on Windows (that you seem to see it
>> anyway is an illusion), and if it did exist, it would be characters,
>> not bytes.
> 
> Of course it exists on Windows.  argc/argv are defined by the C
> standard, and say what you will about Windows, but it has a conforming
> implementation.  

It doesn't. Microsoft has a conforming implementation of C for Windows
(Visual C), but Windows does not.

Regards,
Martin

From apt.shansen at gmail.com  Fri Sep 28 07:00:57 2007
From: apt.shansen at gmail.com (Stephen Hansen)
Date: Thu, 27 Sep 2007 22:00:57 -0700
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik> <1190070414.20673.12.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
	<66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>
Message-ID: <7a9c25c20709272200i5856753ey8fb00c7a2d834057@mail.gmail.com>

On 9/27/07, Nicholas Bastin  wrote:
>
> On 9/22/07, martin at v.loewis.de  wrote:
> > argc/argv does not exist on Windows (that you seem to see it
> > anyway is an illusion), and if it did exist, it would be characters,
> > not bytes.
>
> Of course it exists on Windows.  argc/argv are defined by the C
> standard, and say what you will about Windows, but it has a conforming
> implementation.  argv exists on Windows exactly the way the C standard
> requires it - as an array of null terminated "strings".  It's left as
> an exercise to people with more time than I to argue about the
> definition of the term 'string' in the C standard (since the standard
> itself is silent on the issue).


The entry point of a Windows application is WinMain, not main; you can
create a console-only standard C application if you'd like, but it's not a
Windows program. Python apps are Windows programs even if they have a
console attached. And the WinMain function passes the entire command line as
a single char* with no breaking or parsing of any kind.

--S

From nick.bastin at gmail.com  Fri Sep 28 08:21:18 2007
From: nick.bastin at gmail.com (Nicholas Bastin)
Date: Fri, 28 Sep 2007 02:21:18 -0400
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <46FC85CC.4030806@v.loewis.de>
References: <1189700532.22693.40.camel@qrnik>
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>
	
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>
	
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>
	
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>
	<66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>
	<46FC85CC.4030806@v.loewis.de>
Message-ID: <66d0a6e10709272321v52063cdcldeaac952c4ef4f28@mail.gmail.com>

On 9/28/07, "Martin v. Löwis"  wrote:
> Nicholas Bastin schrieb:
> > On 9/22/07, martin at v.loewis.de  wrote:
> >> argc/argv does not exist on Windows (that you seem to see it
> >> anyway is an illusion), and if it did exist, it would be characters,
> >> not bytes.
> >
> > Of course it exists on Windows.  argc/argv are defined by the C
> > standard, and say what you will about Windows, but it has a conforming
> > implementation.
>
> It doesn't. Microsoft has a conforming implementation of C for Windows
> (Visual C), but Windows does not.

msvcrt ships with the operating system - I'd call that a conforming
implementation.  Programs running in the standard C runtime are just
as much applications as programs using the Win32 API in advapi32.  But
we have drifted far from the topic at hand, since this is obviously a
misunderstanding on whether Windows was used to refer to the OS or the
API.

I still regard handling argv as anything other than the raw bytes that come
from the host as bad.  argv *means* something - regardless of whether
WinMain provides it or not.  If we're going to call something
sys.argv, then presumably that was done because there was a
conventionally accepted meaning to it, and I would argue that meaning
comes from standard C.  If it were called sys.lpCmdLine, then I'd say
you have a point, but it isn't, and to the degree that it isn't, I
believe that we should emulate the standard argv behaviour (especially
since lpCmdLine doesn't include the program name).

Of course, on Win32 this entire issue is moot, given the availability
of CommandLineToArgvW(), which would allow you to provide a nice
convenient unicode argv.  However, since not all supported platforms
provide us this functionality, I would suggest we store the result of
any effort to transform argv into unicode into some other well named
member of sys (or make it a function call so it can be computed on
demand if you don't want it in the first place).  Changing the current
meaning of argv will break applications which already handle this
problem, and while I realize that that's not a showstopper for Python
3k, I don't see any particular benefit to introducing this
inconsistency, rather than adding something more defined, like
sys.arguments.
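
A rough sketch of the "compute it on demand" idea above: keep the raw byte arguments untouched and decode them only when text is wanted. The helper names (decoded_arguments, raw_arguments), their parameters, and the use of the surrogateescape error handler are assumptions made for illustration; nothing in this thread specifies them, and sys.arguments is only a suggested name.

```python
import locale


def decoded_arguments(raw_argv, encoding=None):
    """Decode a list of byte-string arguments into text on demand.

    Hypothetical helper: bytes that are invalid in `encoding` are kept as
    lone surrogates (surrogateescape), so the original bytes stay recoverable.
    """
    if encoding is None:
        # Assumption: fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return [arg.decode(encoding, errors="surrogateescape") for arg in raw_argv]


def raw_arguments(text_argv, encoding):
    """Inverse of decoded_arguments: recover the original bytes."""
    return [arg.encode(encoding, errors="surrogateescape") for arg in text_argv]
```

With this shape, code that already handles raw bytes keeps working, and code that wants text opts in explicitly.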

--
Nick

From foom at fuhm.net  Fri Sep 28 09:53:28 2007
From: foom at fuhm.net (James Y Knight)
Date: Fri, 28 Sep 2007 03:53:28 -0400
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <200709280429.39396.victor.stinner@haypocalc.com>
References: <200709280429.39396.victor.stinner@haypocalc.com>
Message-ID: <400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net>


On Sep 27, 2007, at 10:29 PM, Victor Stinner wrote:

> Hi,
>
> I read some days ago a discussion about GMP (license). I wanted to  
> know if GMP
> is really better than current Python int/long implementation. So I  
> wrote a
> patch for python 3000 subversion (rev. 58277).
>
> I changed long type structure with:
>
> struct _longobject {
> 	PyObject_HEAD
>         mpz_t number;
> };

> So I can now say that GMP is much slower for Python pystone usage  
> of integers.
> I use 32-bit CPU (Celeron M 420 at 1600 MHz on Ubuntu), so most  
> integers are
> just one CPU word (and not a GMP complex structure).

GMP doesn't have a concept of a non-complex structure. It always  
allocates memory. If you want to have a single CPU word integer, you  
have to provide that outside of GMP. GMP's API is really designed for  
allocating an integer object and reusing it for a number of  
operations. You can generally get away with not doing that without  
destroying performance, but certainly not on small integers.

Here's the init function, just for illustration:
mpz_init (mpz_ptr x)
{
   x->_mp_alloc = 1;
   x->_mp_d = (mp_ptr) (*__gmp_allocate_func) (BYTES_PER_MP_LIMB);
   x->_mp_size = 0;
}

So replacing py3's integers with gmp as you did is not really fair.  
If you're going to use GMP in an immutable integer scenario, you  
really need to have a machine-word-int implementation as well.

So, if you want to actually give GMP a fair trial, I'd suggest trying  
to integrate it with python 2.X, replacing longobject, leaving  
intobject as is.

Also, removing python's caching of integers < 100 as you did in this  
patch is surely a *huge* killer of performance.
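
To illustrate why the small-integer cache matters, here is a minimal sketch in Python of the general technique: preallocate objects for a small range of values and hand those out instead of allocating. The class BoxedInt, the function make_int, and the exact cached range are assumptions for illustration, not CPython's actual implementation.

```python
class BoxedInt:
    """Stand-in for a heap-allocated integer object."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


# Assumed cache range, for illustration only.
_CACHE_MIN, _CACHE_MAX = -5, 100
_SMALL_INTS = [BoxedInt(v) for v in range(_CACHE_MIN, _CACHE_MAX)]


def make_int(value):
    """Return a shared object for small values, a fresh one otherwise."""
    if _CACHE_MIN <= value < _CACHE_MAX:
        return _SMALL_INTS[value - _CACHE_MIN]  # no allocation on this path
    return BoxedInt(value)  # allocation only for uncommon, larger values
```

Without the cache, every small result (loop counters, indices) pays for an allocation, which is the cost being discussed here.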

James

From jjb5 at cornell.edu  Fri Sep 28 15:58:38 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Fri, 28 Sep 2007 09:58:38 -0400
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: 
References: 
Message-ID: <46FD088E.6050404@cornell.edu>

Should this PEP include changes to the struct module, or should it be a 
separate PEP?

I would like struct.pack() to return bytes and struct.unpack() to accept 
bytes or buffers but not strings.  The 's' and 'p' format specifier 
should refer to bytes and not strings.

In protocol encoding and decoding, "unpack and strip off the front" and 
"pack and append" are very common operations.  I would also like to have 
  buffer.unpack(fmt) be the former and buffer.pack(fmt, v1, v2, ...) be 
the latter.


Joel

From guido at python.org  Fri Sep 28 16:47:45 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 28 Sep 2007 07:47:45 -0700
Subject: [Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To: <46FD088E.6050404@cornell.edu>
References: 
	<46FD088E.6050404@cornell.edu>
Message-ID: 

On 9/28/07, Joel Bender  wrote:
> Should this PEP include changes to the struct module, or should it be a
> separate PEP?

Neither.

> I would like struct.pack() to return bytes and struct.unpack() to accept
> bytes or buffers but not strings.

This is already the case in 3.0a1. (Don't people try stuff out before posting?)

> The 's' and 'p' format specifier should refer to bytes and not strings.

They currently allow both, which I think is fine.

> In protocol encoding and decoding, "unpack and strip off the front" and
> "pack and append" are very common operations.  I would also like to have
>   buffer.unpack(fmt) be the former and buffer.pack(fmt, v1, v2, ...) be
> the latter.

IMO that would tie the buffer type too close to the struct module. You
could easily write a wrapper that does this though.
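
One possible shape for such a wrapper, as a minimal sketch: it keeps a bytearray and uses the struct module for "pack and append" and "unpack and strip off the front". The class name PacketBuffer and its method names are assumptions for illustration, not an API proposed in this thread.

```python
import struct


class PacketBuffer:
    """Hypothetical wrapper pairing a mutable byte buffer with struct."""

    def __init__(self, data=b""):
        self._data = bytearray(data)

    def pack(self, fmt, *values):
        """Pack the values and append the result to the buffer."""
        self._data += struct.pack(fmt, *values)

    def unpack(self, fmt):
        """Unpack from the front of the buffer and strip what was consumed."""
        size = struct.calcsize(fmt)
        if len(self._data) < size:
            raise ValueError("buffer holds fewer bytes than the format needs")
        values = struct.unpack_from(fmt, self._data)
        del self._data[:size]
        return values

    def __bytes__(self):
        return bytes(self._data)
```

For example, after buf.pack("!HB", 513, 7) the buffer holds three bytes, and buf.unpack("!H") returns (513,) and leaves the final byte in place.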

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From weilawei at gmail.com  Fri Sep 28 18:32:52 2007
From: weilawei at gmail.com (Rob Crowther)
Date: Fri, 28 Sep 2007 12:32:52 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: <20070928122915.798d00e1.weilawei@gmail.com>
References: <20070925094601.c151245c.weilawei@gmail.com>
	<20070927125557.a5895341.weilawei@gmail.com>
	<20070928122915.798d00e1.weilawei@gmail.com>
Message-ID: <20070928123252.5a0692b0.weilawei@gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Another day, another update. Latest code: http://umass.glexia.net/mpf.tar.bz2

There have been a couple of minor changes externally:

a) MPF() now takes a float or integer argument because mpf_set_str is just wacky and I haven't gotten it working properly yet. This does somewhat limit the values you can pass to it, but strings will be added back later on. At that point, you'll have a choice of initializing it with a tuple (base, sign, whole, decimal), a string, a float, or an integer.

b) As a side effect of this, roundtripping doesn't work. Not that it ever worked. But it's a bit further away right now.

Internally, the MPF_get function was rewritten from scratch (for the fourth time). MPF_init was changed to use mpf_set_d instead of mpf_set_str because... well, mpf_set_str is too wacky and unpredictable at the moment. I'm sorting that out as we speak.

If you really want to see lots of internal information, use the build_debug.sh script instead of setup.py. (Note that the directories already need to be in place to compile this way.)

There's a test program which you can compile with the command:

	gcc -o test test.c -lgmp

It's my scratchpad for working out new ideas before integrating them into the extension. Currently, it contains a barebones version of MPF_get and a slew of test cases, soon to be ported to Python. YES, there WILL be a test suite.

Question -- Does anyone know of a decent place to host this project? I'm really lazy about updating project sites, so I'd like something simple offering storage space and a bug tracker. I don't need SVN. I use git on my development box, so that would be a bonus if someone knew of free project hosting with git.

Rob
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFG/Sy0qR5p8HaX4oURAhznAJ9a8N6mgCHXcGph09KhjXu/kYPnFgCeOKLH
ngznr86SynMbF0wQep3GDB0=
=6Pun
-----END PGP SIGNATURE-----

From rhamph at gmail.com  Fri Sep 28 18:44:43 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 28 Sep 2007 10:44:43 -0600
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net>
References: <200709280429.39396.victor.stinner@haypocalc.com>
	<400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net>
Message-ID: 

On 9/28/07, James Y Knight  wrote:
>
> On Sep 27, 2007, at 10:29 PM, Victor Stinner wrote:
>
> > Hi,
> >
> > I read some days ago a discussion about GMP (license). I wanted to
> > know if GMP
> > is really better than current Python int/long implementation. So I
> > wrote a
> > patch for python 3000 subversion (rev. 58277).
> >
> > I changed long type structure with:
> >
> > struct _longobject {
> >       PyObject_HEAD
> >         mpz_t number;
> > };
>
> > So I can now say that GMP is much slower for Python pystone usage
> > of integers.
> > I use 32-bit CPU (Celeron M 420 at 1600 MHz on Ubuntu), so most
> > integers are
> > just one CPU word (and not a GMP complex structure).
>
> GMP doesn't have a concept of a non-complex structure. It always
> allocates memory. If you want to have a single CPU word integer, you
> have to provide that outside of GMP. GMP's API is really designed for
> allocating an integer object and reusing it for a number of
> operations. You can generally get away with not doing that without
> destroying performance, but certainly not on small integers.
>
> Here's the init function, just for illustration:
> mpz_init (mpz_ptr x)
> {
>    x->_mp_alloc = 1;
>    x->_mp_d = (mp_ptr) (*__gmp_allocate_func) (BYTES_PER_MP_LIMB);
>    x->_mp_size = 0;
> }
>
> So replacing py3's integers with gmp as you did is not really fair.
> If you're going to use GMP in an immutable integer scenario, you
> really need to have a machine-word-int implementation as well.
>
> So, if you want to actually give GMP a fair trial, I'd suggest trying
> to integrate it with python 2.X, replacing longobject, leaving
> intobject as is.
>
> Also, removing python's caching of integers < 100 as you did in this
> patch is surely a *huge* killer of performance.

I can vouch for that.  Allocation can easily dominate performance.  It
invalidates the rest of the benchmark.

-- 
Adam Olsen, aka Rhamphoryncus

From victor.stinner at haypocalc.com  Fri Sep 28 18:58:29 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 28 Sep 2007 18:58:29 +0200
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: 
References: <200709280429.39396.victor.stinner@haypocalc.com>
	<400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net>
	
Message-ID: <200709281858.29705.victor.stinner@haypocalc.com>

On Friday 28 September 2007 18:44:43 you wrote:
> > GMP doesn't have a concept of a non-complex structure. It always
> > allocates memory. (...)

I don't know GMP internals. I thought that GMP used a hack for small 
integers.

> > Also, removing python's caching of integers < 100 as you did in this
> > patch is surely a *huge* killer of performance.

Oh yes, I removed the cache because I wanted to quickly get a working 
Python version. It took me two weeks to write the patch. It's not easy to get 
into the CPython source code! And integers are one of the most important types!

> I can vouch for that.  Allocation can easily dominate performance.  It
> invalidates the rest of the benchmark.

I may also use Python's garbage collector for GMP memory allocations, since GMP 
lets me supply my own memory allocation functions.

GMP also has its own reference counter mechanism :-/

Victor
-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From jimjjewett at gmail.com  Fri Sep 28 19:23:40 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 28 Sep 2007 13:23:40 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and
	Mutable Buffer)
Message-ID: 

On 9/27/07, Guido van Rossum  wrote:
> On 9/27/07, Jim Jewett  wrote:

> > Should a TypeError be raised as soon as you try to put a bytes and a
> > string in the same dict, even if they don't happen to hash equal?

> Good idea, if you can figure out a way to implement this efficiently.

In news that may surprise no one, there were corner cases...

(1)  Does it have to raise the TypeError eagerly in all cases, or is
it OK to do so only when it's easy?

For example, would it be OK to stop verifying once some keys have been deleted?

(2)  Is the restriction "sticky" for a dict, or based on current contents?

Current contents makes sense, but ...

If code clears an existing dict rather than creating a new one, then
that specific dict is probably a communication channel, and the API
should specify whether it takes bytes or characters.

-jJ

From guido at python.org  Fri Sep 28 19:36:57 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 28 Sep 2007 10:36:57 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
	and Mutable Buffer)
In-Reply-To: 
References: 
Message-ID: 

Well, maybe this is a good enough argument to give up. If the best we
can say is that having a bytes and a str as keys *may* cause a
TypeError on lookups, I'm not sure it is worth it to try to raise the
probability that it'll actually be raised...

--Guido

On 9/28/07, Jim Jewett  wrote:
> On 9/27/07, Guido van Rossum  wrote:
> > On 9/27/07, Jim Jewett  wrote:
>
> > > Should a TypeError be raised as soon as you try to put a bytes and a
> > > string in the same dict, even if they don't happen to hash equal?
>
> > Good idea, if you can figure out a way to implement this efficiently.
>
> In news that may surprise no one, there were corner cases...
>
> (1)  Does it have to raise the TypeError eagerly in all cases, or is
> it OK to do so only when it's easy?
>
> For example, would it be OK to stop verifying once some keys have been deleted?
>
> (2)  Is the restriction "sticky" for a dict, or based on current contents?
>
> Current contents makes sense, but ...
>
> If code clears an existing dict rather than creating a new one, then
> that specific dict is probably a communication channel, and the API
> should specify whether it takes bytes or characters.
>
> -jJ
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Fri Sep 28 20:33:04 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 28 Sep 2007 14:33:04 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
	and Mutable Buffer)
In-Reply-To: 
References: 
	
Message-ID: 

On 9/28/07, Guido van Rossum  wrote:
> Well, maybe this is a good enough argument to give up.

Not quite yet... I still see two potential solutions, depending on
whether or not the exclusion is sticky.  Details below.

=========

If the exclusion is sticky, then add (implicit) flags saying "seen a
string" and "seen a byte".   Similar logic is already there, in that
"seen a non-string" replaces the lookdict function.

The most common case (exact unicode in an exact unicode-only dict)
would stay the same as today, but the other cases would have some
extra type-checking.
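
A minimal sketch of the sticky variant, written as a dict subclass in Python rather than inside the C dict implementation. The class name StickyKeyDict and its attributes are assumptions for illustration; only __setitem__ is guarded here, so update() and setdefault() would bypass the check.

```python
class StickyKeyDict(dict):
    """Hypothetical dict that remembers ever having seen str or bytes keys."""

    def __init__(self):
        super().__init__()
        self._seen_str = False
        self._seen_bytes = False

    def __setitem__(self, key, value):
        if isinstance(key, str):
            if self._seen_bytes:
                raise TypeError("this dict has already held bytes keys")
            self._seen_str = True
        elif isinstance(key, bytes):
            if self._seen_str:
                raise TypeError("this dict has already held str keys")
            self._seen_bytes = True
        super().__setitem__(key, value)
```

Because the flags are never reset, deleting or clearing keys does not lift the restriction, which is what "sticky" means here.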

=========

If the exclusion is based on current contents, then we can add a
count; my concern is that keeping this efficient may be too hacky.

It looks like there is room for exactly one more pointer (-sized count
variable) before small dicts bleed to a third cacheline.  Because of
this guard, bytes and strings can never appear in the same dict, so at
least one count is zero.  Because dict entries are 3 pointers long,
there can never be more than (Py_ssize_t / 2) entries, so the sign bit
can be repurposed to indicate whether the count refers to strings or
bytes.  (count==0 means no bytes or strings;  count==5 means 5 string
keys;  count==-32 means 32 bytes keys.)
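
The signed-count bookkeeping can be sketched as follows, again as a Python dict subclass rather than the C-level change described above. The class name CountingKeyDict and the attribute _kind_count are assumptions for illustration; a positive count means that many str keys, a negative count that many bytes keys.

```python
class CountingKeyDict(dict):
    """Hypothetical dict tracking str/bytes keys with one signed counter."""

    def __init__(self):
        super().__init__()
        self._kind_count = 0  # >0: str keys present, <0: bytes keys present

    def __setitem__(self, key, value):
        if isinstance(key, str):
            if self._kind_count < 0:
                raise TypeError("dict currently holds bytes keys")
            delta = 1
        elif isinstance(key, bytes):
            if self._kind_count > 0:
                raise TypeError("dict currently holds str keys")
            delta = -1
        else:
            delta = 0
        is_new = key not in self  # overwriting must not change the count
        super().__setitem__(key, value)
        if is_new:
            self._kind_count += delta

    def __delitem__(self, key):
        super().__delitem__(key)  # raises KeyError before the count changes
        if isinstance(key, str):
            self._kind_count -= 1
        elif isinstance(key, bytes):
            self._kind_count += 1
```

Once the last str key is deleted the count returns to zero and bytes keys are accepted again, which is the "based on current contents" behaviour.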

-jJ

From adam at hupp.org  Fri Sep 28 20:34:36 2007
From: adam at hupp.org (Adam Hupp)
Date: Fri, 28 Sep 2007 14:34:36 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
	and Mutable Buffer)
In-Reply-To: 
References: 
	
Message-ID: <766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>

On 9/28/07, Guido van Rossum  wrote:
> Well, maybe this is a good enough argument to give up. If the best we
> can say is that having a bytes and a str as keys *may* cause a
> TypeError on lookups, I'm not sure it is worth it to try to raise the
> probability that it'll actually be raised...

Would it make sense to have dict ignore TypeError on lookups?
Alternatively, the byte/str comparison could throw a specific subclass
of TypeError that dict ignored e.g. IncompatibleComparisonError.

-- 
Adam Hupp | http://hupp.org/adam/

From guido at python.org  Fri Sep 28 20:40:40 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 28 Sep 2007 11:40:40 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
	and Mutable Buffer)
In-Reply-To: <766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
Message-ID: 

On 9/28/07, Adam Hupp  wrote:
> On 9/28/07, Guido van Rossum  wrote:
> > Well, maybe this is a good enough argument to give up. If the best we
> > can say is that having a bytes and a str as keys *may* cause a
> > TypeError on lookups, I'm not sure it is worth it to try to raise the
> > probability that it'll actually be raised...
>
> Would it make sense to have dict ignore TypeError on lookups?

Certainly not.

> Alternatively, the byte/str comparison could throw a specific subclass
> of TypeError that dict ignored e.g. IncompatibleComparisonError.

Well, if we wanted "x" and b"x" to compare unequal instead of raising
an exception, we could just define it that way (it was that way until
just before 3.0a1). But we're explicitly defining it to raise a
TypeError so as to catch buggy code. I think trying to fix dict lookup
so that it, and only it, treats this as unequal, would be adding too
many quirks.

We could choose to kill the TypeError altogether. If we keep it, we
should consistently let it raise TypeError everywhere.

The question is whether it's worth the effort to raise TypeError when
the *potential* exists that a certain hash sequence *could* raise this
TypeError. I'm less and less convinced -- after all, we're making the
exception only for bytes/str, not for other types that might raise
TypeError upon comparison.

So, I think that after all this was a bad idea. Sorry.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Fri Sep 28 20:59:56 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 28 Sep 2007 19:59:56 +0100
Subject: [Python-3000] Immutable bytes -- looking for volunteer
In-Reply-To: <46F9AE94.7010703@canterbury.ac.nz>
References: 
	
	<79990c6b0709250039q3cf5b6a5j3a37797b84fe43d3@mail.gmail.com>
	<46F9AE94.7010703@canterbury.ac.nz>
Message-ID: <79990c6b0709281159u79a4aae1u844549d33358ac01@mail.gmail.com>

On 26/09/2007, Greg Ewing  wrote:
> Paul Moore wrote:
> > The array module is built in, so it's
> > written in C - what needs to be exposed to qualify as a "C API"?
>
> I think he's referring to the fact that there is no
> public array.h header file provided that lays out the
> C-level details. In fact, last time I looked I don't
> think there was any array.h file at all, it was all
> inside array.c.

Thanks. I see what you mean. Given the way the discussion is currently
going, I think I'll hold off doing anything just yet, but I'll keep it
in mind.

Paul

From jimjjewett at gmail.com  Fri Sep 28 21:02:59 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 28 Sep 2007 15:02:59 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
	and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
Message-ID: 

On 9/28/07, Guido van Rossum  wrote:

> The question is whether it's worth the effort to raise TypeError when
> the *potential* exists that a certain hash sequence *could* raise this
> TypeError.

Bugs depending on the hash sequence are exactly the sort of thing that
doesn't get found by tests, and can't be easily reproduced.

> I'm less and less convinced -- after all, we're making the
> exception only for bytes/str, not for other types that might raise
> TypeError upon comparison.

What would those other types be?

As you point out in the "Bytes and the Str Type" section, this
exception violates the "general rule that comparing objects of
different types for equality
should just return False".

In Py3, there are plenty of types that aren't orderable, but I still
can't think of any[*] others that raise an exception when tested just
for equality.

[*]  It is of course possible to write a malicious class, and it is
possible to write a buggy class.  Even then, most buggy classes fail
when compared to anything from any other class, rather than just for
specific banned comparisons.
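
As a small illustration of the distinction drawn above, using built-in types as they behave in released Python 3 (int and str are my choice of example, not one from the thread): equality across unrelated types returns False, while ordering raises TypeError.

```python
# Equality between unrelated built-in types follows the general rule.
mixed_equal = (1 == "1")

# Ordering between the same two types is refused outright.
try:
    _ = 1 < "1"
    ordering_raised = False
except TypeError:
    ordering_raised = True
```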

-jJ

From ntoronto at cs.byu.edu  Fri Sep 28 21:46:30 2007
From: ntoronto at cs.byu.edu (Neil Toronto)
Date: Fri, 28 Sep 2007 13:46:30 -0600
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
 and Mutable Buffer)
In-Reply-To: 
References: 		<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>	
	
Message-ID: <46FD5A16.7030004@cs.byu.edu>

Jim Jewett wrote:
> On 9/28/07, Guido van Rossum  wrote:
>
>   
>> The question is whether it's worth the effort to raise TypeError when
>> the *potential* exists that a certain hash sequence *could* raise this
>> TypeError.
>>     
>
> Bugs depending on the hash sequence are exactly the sort of thing that
> doesn't get found by tests, and can't be easily reproduced.
>   

Not that my opinion counts for much because I mostly just lurk, but I 
have to agree. A one-in-a-million Heisenbug (Mandelbug?) is exactly the 
sort of thing that breaks production systems but nobody can figure out 
how to fix, and causes management to lose faith in a language or in 
their developers.

>> I'm less and less convinced -- after all, we're making the
>> exception only for bytes/str, not for other types that might raise
>> TypeError upon comparison.
>>     
>
> What would those other types be?
>
> As you point out in the "Bytes and the Str Type" section, this
> exception violates the "general rule that comparing objects of
> different types for equality
> should just return False".
>   

So there's a special case comparison that's intended to protect users 
from themselves - to keep them from comparing bytes and strings without 
specifying an encoding. Then there has to be another potentially 
performance-munching special case to save them from an essentially 
random exception that could occur because of this extra protection - and 
this special-casing can only be guaranteed for built-in types, not 
custom ones. It's too easy to forget to consider it.

Is the only case they need to be saved from the 'if <str> == <bytes>' 
case? Shouldn't it be perfectly fine for a dict to hold a str and a 
bytes? If I recall correctly, the decision to raise a TypeError on 
str/bytes comparison was made before bytes became immutable and could be 
put into dicts.

Maybe the *extra protection* isn't worth the effort. How about a warning 
instead of a TypeError? Can the bytecode interpreter do something for 
simple '==' cases? Are there other alternatives?

Neil


From martin at v.loewis.de  Fri Sep 28 23:00:29 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 28 Sep 2007 23:00:29 +0200
Subject: [Python-3000] Unicode and OS strings
In-Reply-To: <66d0a6e10709272321v52063cdcldeaac952c4ef4f28@mail.gmail.com>
References: <1189700532.22693.40.camel@qrnik>	
	<18159.23173.178488.190409@uwakimon.sk.tsukuba.ac.jp>	
		
	<32C3C54C-18CC-4171-8A59-06170B5CFCD6@fuhm.net>	
		
	<79990c6b0709210741y465c016pbaefb04c2c2f3eee@mail.gmail.com>	
		
	<20070922074840.pwm2kfr2dc4gcgwg@webmail.df.eu>	
	<66d0a6e10709271728i15b31a82s51541816d5c6a66f@mail.gmail.com>	
	<46FC85CC.4030806@v.loewis.de>
	<66d0a6e10709272321v52063cdcldeaac952c4ef4f28@mail.gmail.com>
Message-ID: <46FD6B6D.6080905@v.loewis.de>

> msvcrt ships with the operating system - I'd call that a conforming
> implementation.

Yes, but it's not part of the operating system interface; Microsoft
documents it as "for future use only by system-level components".

> I still regard handling argv as anything other than the raw bytes that come
> from the host as bad.

The point is that you cannot use "raw bytes" in Win32, not without
potential loss of data. If you pass arbitrary bytes to os.spawn*,
they get converted to Unicode, and the resulting Unicode command
line gets passed to the child process. So the *native* API is
Unicode, not arbitrary bytes - there is also _wmain supported by
the C library, if you want broken down command line arguments, but
without character set conversions.

> If we're going to call something
> sys.argv, then presumably that was done because there was a
> conventionally accepted meaning to it, and I would argue that meaning
> comes from standard C.

Yes, but also in C, the meaning is "characters", not "bytes". ISO
C 99 5.1.2.2.1p2 specifies they are *strings* passed by the host
environment, and elaborates that if the host environment
is not capable of supplying mixed-case strings, it should convert
them all into lower case. So the intention clearly is that argv[]
is text, not bytes.

Regards,
Martin


From greg.ewing at canterbury.ac.nz  Sat Sep 29 01:48:33 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 29 Sep 2007 11:48:33 +1200
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
 and Mutable Buffer)
In-Reply-To: 
References: 
Message-ID: <46FD92D1.6020706@canterbury.ac.nz>

Jim Jewett wrote:
> If code clears an existing dict rather than creating a new one, then
> that specific dict is probably a communication channel, and the API
> should specify whether it takes bytes or characters.

This suggests it might be simpler to have normal dicts
refuse to accept bytes at all, and have another type
bytedict for that purpose.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Sep 29 01:57:10 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 29 Sep 2007 11:57:10 +1200
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
 and Mutable Buffer)
In-Reply-To: <766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
Message-ID: <46FD94D6.6040103@canterbury.ac.nz>

Adam Hupp wrote:
> Would it make sense to have dict ignore TypeError on lookups?
> Alternatively, the byte/str comparison could throw a specific subclass
> of TypeError that dict ignored e.g. IncompatibleComparisonError.

Presumably the reason for making strings and bytes uncomparable
in the first place is to catch errors due to unwittingly mixing
strings and bytes. Having dicts ignore the exception would
partly defeat that.

I'm not all that comfortable with the idea of having things
that can't even be compared for equality. Is this meant to be
a permanent feature of the language, or just something to help
people get over the transition? Could it be dropped once
everyone has got over the shock of having strings and bytes
being different things?

--
Greg

From tjreedy at udel.edu  Sat Sep 29 04:27:29 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 28 Sep 2007 22:27:29 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
References: <766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
Message-ID: 


"Guido van Rossum"  wrote in message 
news:ca471dc20709281140q2ef95c2ap8bbc7b7d3d46ebc0 at mail.gmail.com...
|
| Well, if we wanted "x" and b"x" to compare unequal instead of raising
| an exception, we could just define it that way (it was that way until
| just before 3.0a1). But we're explicitly defining it to raise a
| TypeError so as to catch buggy code. I think trying to fix dict lookup
| so that it, and only it, treats this as unequal, would be adding too
| many quirks.
|
| We could choose to kill the TypeError altogether. If we keep it, we
| should consistently let it raise TypeError everywhere.
|
| The question is whether it's worth the effort to raise TypeError when
| the *potential* exists that a certain hash sequence *could* raise this
| TypeError. I'm less and less convinced -- after all, we're making the
| exception only for bytes/str, not for other types that might raise
| TypeError upon comparison.
|
| So, I think that after all this was a bad idea. Sorry.

If you mean making a special case exception for string/bytes equality test, 
I agree.  Would a restricted key dict (say, rdict, in collections) solve 
the problem you are aiming at?

import collections
adict = rdict(str)
bdict = rdict(bytes)

Now any buggy insertions get caught.
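
A minimal sketch of what such an rdict might look like, assuming it is a dict subclass that checks key types on insertion. Only the name rdict and the constructor call come from the message above; the attribute name key_type and the decision to guard only __setitem__ are assumptions for illustration.

```python
class rdict(dict):
    """Hypothetical restricted-key dict: keys must be instances of key_type."""

    def __init__(self, key_type):
        super().__init__()
        self.key_type = key_type

    def __setitem__(self, key, value):
        if not isinstance(key, self.key_type):
            raise TypeError(
                "key must be %s, not %s"
                % (self.key_type.__name__, type(key).__name__)
            )
        super().__setitem__(key, value)
```

With adict = rdict(str), the assignment adict[b"x"] = 1 raises TypeError immediately, independent of hash values.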

Terry J. Reedy




From guido at python.org  Sat Sep 29 05:08:06 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 28 Sep 2007 20:08:06 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
Message-ID: 

On 9/28/07, Terry Reedy  wrote:
> "Guido van Rossum"  wrote in message
> news:ca471dc20709281140q2ef95c2ap8bbc7b7d3d46ebc0 at mail.gmail.com...
> |
> | Well, if we wanted "x" and b"x" to compare unequal instead of raising
> | an exception, we could just define it that way (it was that way until
> | just before 3.0a1). But we're explicitly defining it to raise a
> | TypeError so as to catch buggy code. I think trying to fix dict lookup
> | so that it, and only it, treats this as unequal, would be adding too
> | many quirks.
> |
> | We could choose to kill the TypeError altogether. If we keep it, we
> | should consistently let it raise TypeError everywhere.
> |
> | The question is whether it's worth the effort to raise TypeError when
> | the *potential* exists that a certain hash sequence *could* raise this
> | TypeError. I'm less and less convinced -- after all, we're making the
> | exception only for bytes/str, not for other types that might raise
> | TypeError upon comparison.
> |
> | So, I think that after all this was a bad idea. Sorry.
>
> If you mean making a special case exception for string/bytes equality test,
> I agree.  Would a restricted key dict (say, rdict, in collections) solve
> the problem you are aiming at?
>
> import collections
> adict = rdict(str)
> bdict = rdict(bytes)
>
> Now any buggy insertions get caught.

That sounds like a completely different use case -- a typechecking dict.

The use case we started with is to catch programmers who accidentally
mix str and bytes as dict keys -- those programmers aren't likely to
have thought much about their key type, so they're not likely to go
out of their way to use the rdict you propose above.

But here's a clever trick that might just do the job, without any
extra effort: make it so that the hash() of a bytes string containing
only ASCII bytes is the same as that of a text string containing only
ASCII characters. Likely, programmers will attempt to look up keys
that they know are in the dict -- and if they use the wrong type,
because of the identical hash values, they will get the TypeError as
soon as they compare it to the first object at the hashed location.

Even better, in the proposal we'll be reusing the old PyString type
for the new immutable bytes type, and its hash *already* is equal to
that of a PyUnicode object if they both contain the same ASCII bytes
only. (This used to be by design in 2.x, and I maintained this
property when I made PyUnicode's hash a lot faster.)
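
The effect of the identical-hash trick can be sketched with two stand-in classes. Everything here is an assumption for illustration: Text and Blob are hypothetical substitutes for the text and bytes types under the proposed rule that cross-type comparison raises TypeError, and they are not how str and bytes behave in any released Python.

```python
class Text:
    """Stand-in for a text string under the proposed strict-comparison rule."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        if isinstance(other, Blob):
            raise TypeError("can't compare Text with Blob")
        return isinstance(other, Text) and self.value == other.value


class Blob:
    """Stand-in for immutable bytes; ASCII content hashes like the equivalent Text."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Same hash as Text for ASCII-only content, mirroring the trick above.
        return hash(self.value.decode("ascii"))

    def __eq__(self, other):
        if isinstance(other, Text):
            raise TypeError("can't compare Blob with Text")
        return isinstance(other, Blob) and self.value == other.value


table = {Text("key"): 1}

try:
    table[Blob(b"key")]        # wrong key type, but same hash as the stored key
    outcome = "no error"
except TypeError:
    outcome = "TypeError"      # the equality test inside the lookup raised
except KeyError:
    outcome = "KeyError"
```

Because the hashes match, the lookup reaches the stored key and has to compare it with the probe, and that comparison is what surfaces the mistake.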

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Sat Sep 29 16:24:02 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 29 Sep 2007 10:24:02 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
 Bytes and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
Message-ID: <20070929142126.D61D23A4045@sparrow.telecommunity.com>

At 08:08 PM 9/28/2007 -0700, Guido van Rossum wrote:
>Likely, programmers will attempt to look up keys
>that they know are in the dict -- and if they use the wrong type,
>because of the identical hash values, they will get the TypeError as
>soon as they compare it to the first object at the hashed location.

I'm coming into this thread a little bit late, but if we don't want 
strings and bytes to be comparable, shouldn't we just make them 
*unequal*?  I mean, under normal circumstances, == and != are 
available on all objects without causing errors, and the same 
TypeError would occur for things like list.remove().

This seems a lot like Oleg's question on Python-Dev the other day, 
about raising a TypeError from __nonzero__: i.e., changing a 
significant expectation about all "normal" objects.

While it's true that it would be good to know when you've 
unintentionally mixed bytes and strings, surely there could be less 
fatal ways to find this, like perhaps a command-line option that 
causes byte/string comparisons to output a warning?


From guido at python.org  Sat Sep 29 16:33:01 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 29 Sep 2007 07:33:01 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: <20070929142126.D61D23A4045@sparrow.telecommunity.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
Message-ID: 

On 9/29/07, Phillip J. Eby  wrote:
> At 08:08 PM 9/28/2007 -0700, Guido van Rossum wrote:
> >Likely, programmers will attempt to look up keys
> >that they know are in the dict -- and if they use the wrong type,
> >because of the identical hash values, they will get the TypeError as
> >soon as they compare it to the first object at the hashed location.
>
> I'm coming into this thread a little bit late, but if we don't want
> strings and bytes to be comparable, shouldn't we just make them
> *unequal*?  I mean, under normal circumstances, == and != are
> available on all objects without causing errors, and the same
> TypeError would occur for things like list.remove().

Until just before 3.0a1, they were unequal. We decided to raise
TypeError because we noticed many bugs in code that was doing things
like

  data = f.read(4096)
  if data == "": break

where data was bytes and thus the break was never taken. The same went for
checks for certain magic strings (so it wasn't just empty strings).

It is also in line with the policy to refuse things like
b"abc".replace("a", "A") or "abc".replace(b"b", b"B").

> This seems a lot like Oleg's question on Python-Dev the other day,
> about raising a TypeError from __nonzero__: i.e., changing a
> significant expectation about all "normal" objects.
>
> While it's true that it would be good to know when you've
> unintentionally mixed bytes and strings, surely there could be less
> fatal ways to find this, like perhaps a command-line option that
> causes byte/string comparisons to output a warning?

I thought about using a warning too, but since nobody wants warnings,
that would be pretty much the same as raising TypeError except for the
most dedicated individuals (and if I were really dedicated I'd just
write my own eq() function anyway). And the warning would do nothing
about the issue brought up by Jim Jewett, the unpredictable behavior
of a dict with both bytes and strings as keys.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Sat Sep 29 17:14:04 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 29 Sep 2007 11:14:04 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
 Bytes and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
Message-ID: <20070929151127.AE5203A4045@sparrow.telecommunity.com>

At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
>Until just before 3.0a1, they were unequal. We decided to raise
>TypeError because we noticed many bugs in code that was doing things
>like
>
>   data = f.read(4096)
>   if data == "": break

Thought experiment: what if read() always returned strings, and to 
read bytes, you had to use something like 'f.readinto(ob, 4096)', 
where 'ob' is a mutable bytes instance or memory view?

In Python 2.x, there's only one read() method because (prior to 
unicode), there was only one type of reading to do.

But as the above example makes clear, in 3.x you simply *can't* write 
code that works correctly with an arbitrary file that might be binary 
or text, at least not without typechecking the return value from 
read().  (In which case, you might as well inspect the file 
object.)  So, the above problem could be fixed by having .read() 
raise an error (or simply not exist) on a binary file object.

In this way, the problem is fixed at the point where it really 
occurs: i.e., at the point of not having decided whether the stream 
is bytes or text.

This also seems to fit better (IMO) with the best practice of 
enforcing str/unicode/encoding distinctions at the point where data 
enters the program, rather than delaying the error to later.


>I thought about using a warning too, but since nobody wants warnings,
>that would be pretty much the same as raising TypeError except for the
>most dedicated individuals (and if I were really dedicated I'd just
>write my own eq() function anyway).

The use case I'm concerned about is code that's not type-specific 
getting a TypeError by comparing arbitrary objects.  For example, if 
you write Python code to create a Python code object (e.g. the 
compiler package or my own BytecodeAssembler), you need to create a 
list of constants as you generate the code, and you need to be able 
to search the list for an equal constant.  Since strings and bytes 
can both be constants, a simple list.index() test could now raise a 
TypeError, as could "item in list".

So raising an error to make bad code fail sooner, will also take down 
unsuspecting code that isn't really broken, and *force* the writing 
of special comparison code -- which won't be usable with things like 
list.remove and the "in" operator.
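
The kind of special comparison code meant here might look like the following sketch. The helper name find_constant is hypothetical; it compares types before values, so == is only ever applied to operands of the same type and cannot hit a cross-type TypeError.

```python
def find_constant(consts, value):
    """Return the index of value in consts, matching on exact type and equality.

    Returns -1 if no constant of the same type compares equal.
    """
    for index, const in enumerate(consts):
        # Checking the type first keeps == away from mixed-type operands.
        if type(const) is type(value) and const == value:
            return index
    return -1


consts = ["abc", b"abc", 1, 1.0]

bytes_slot = find_constant(consts, b"abc")   # index of the bytes constant
float_slot = find_constant(consts, 1.0)      # index of the float, not the int
missing = find_constant(consts, "zzz")       # no such constant
```

As the message points out, a helper like this cannot be plugged into list.index(), list.remove() or the "in" operator, which always use plain ==.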

In comparison, forcing code to be bytes vs. text aware at the point 
of I/O directs attention to the place where you can best decide what 
to do about it.  (After all, the comparison that raises the TypeError 
might occur deep in a library that's expecting to work with text.)


>And the warning would do nothing
>about the issue brought up by Jim Jewett, the unpredictable behavior
>of a dict with both bytes and strings as keys.

I've looked at all of Jim's messages for September, but I don't see 
this.  I do see where raising TypeError for comparisons causes a 
problem with dictionaries, but I don't see how an unequal comparison 
creates "unpredictable" behavior (as opposed to predictable failure to match).


From murman at gmail.com  Sat Sep 29 17:12:06 2007
From: murman at gmail.com (Michael Urman)
Date: Sat, 29 Sep 2007 10:12:06 -0500
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
Message-ID: 

On 9/29/07, Guido van Rossum  wrote:
> On 9/29/07, Phillip J. Eby  wrote:
> > I'm coming into this thread a little bit late, but if we don't want
> > strings and bytes to be comparable, shouldn't we just make them
> > *unequal*?  I mean, under normal circumstances, == and != are
> > available on all objects without causing errors, and the same
> > TypeError would occur for things like list.remove().
>
> Until just before 3.0a1, they were unequal. We decided to raise
> TypeError because we noticed many bugs in code that was doing things
> like
>
>   data = f.read(4096)
>   if data == "": break

I agree that it's nice to catch this sort of error early, but I'm
wondering how to reconcile this decision with the discussion we had a
year ago when dicts stopped suppressing comparison exceptions.
http://mail.python.org/pipermail/python-dev/2006-August/068090.html is
the beginning of the thread, and
http://mail.python.org/pipermail/python-dev/2006-August/068112.html is
a clear description of an __eq__ raising an exception as being buggy.

If we're going to take a PBP approach to letting bytes() == str()
raise an exception, is there a PBP factor to having dictionaries cover
for this exception? The only unpredictable thing I see is if you're
willy-nilly mixing bytes and strs and expecting to be able to lookup
one with the other. If you're instead trying to store both, much like
you can store strs and tuples, this shouldn't cause a problem. Even if
doing so is weird.

The idea of  if "" in somedict: pass  raising a TypeError depending on
the values in somedict is not pleasant. Just to throw another idea out
there, would a variant of dict that suppresses these comparison
exceptions, say collections.loosedict, sidestep the issue?
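
One way such a variant could be approximated in pure Python is sketched below. This is only an illustration under stated assumptions: collections.loosedict does not exist, and the names LooseDict, _LooseKey and Picky are hypothetical. Keys are wrapped so that a TypeError raised during comparison is treated as "not equal".

```python
class _LooseKey:
    """Wrapper whose equality test turns a TypeError into 'not equal'."""

    __slots__ = ("key",)

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if not isinstance(other, _LooseKey):
            return NotImplemented
        try:
            return bool(self.key == other.key)
        except TypeError:
            return False       # a failed comparison counts as a mismatch


class LooseDict:
    """Hypothetical mapping that suppresses TypeError from key comparisons."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[_LooseKey(key)] = value

    def __getitem__(self, key):
        return self._data[_LooseKey(key)]

    def __contains__(self, key):
        return _LooseKey(key) in self._data

    def __len__(self):
        return len(self._data)


class Picky:
    """Demo key: refuses comparison with other types, and collides with 'same'."""

    def __init__(self, tag):
        self.tag = tag

    def __hash__(self):
        return hash("same")

    def __eq__(self, other):
        if not isinstance(other, Picky):
            raise TypeError("Picky only compares with Picky")
        return self.tag == other.tag


loose = LooseDict()
loose[Picky("a")] = "picky value"
loose["same"] = "str value"    # same hash as the Picky key, different type

plain = {Picky("a"): "picky value"}
try:
    "same" in plain            # an ordinary dict lets the TypeError escape
    plain_outcome = "no error"
except TypeError:
    plain_outcome = "TypeError"
```

The cost of this design is the one raised earlier in the thread: a genuinely buggy mixed-type lookup silently reports "not found" instead of failing loudly.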

-- 
Michael Urman

From lists at cheimes.de  Sat Sep 29 17:28:16 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 29 Sep 2007 17:28:16 +0200
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes
 and Mutable Buffer)
In-Reply-To: 
References: 
Message-ID: 

Jim Jewett wrote:
> On 9/27/07, Guido van Rossum  wrote:
>> On 9/27/07, Jim Jewett  wrote:
> 
>>> Should a TypeError be raised as soon as you try to put a bytes and a
>>> string in the same dict, even if they don't happen to hash equal?
> 
>> Good idea, if you can figure out a way to implement this efficiently.

What do you think about using the class hierarchy for the job? Instead
of raising a plain TypeError, a comparison between a str and a bytes raises a
StringBytesError that subclasses TypeError. The dict methods like
lookdict() then reraise the StringBytesError explicitly.
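
The class-hierarchy idea can be sketched in a few lines. StringBytesError is the name used in the message above; strict_eq and classify are hypothetical helpers, and how lookdict() would actually use the subclass is not specified in the thread, so this only shows that calling code could tell the two cases apart.

```python
class StringBytesError(TypeError):
    """Hypothetical error raised when a str is compared with a bytes."""


def strict_eq(a, b):
    """Compare a and b, refusing to mix str with bytes."""
    if {type(a), type(b)} == {str, bytes}:
        raise StringBytesError(
            f"cannot compare {type(a).__name__} with {type(b).__name__}"
        )
    return a == b


def classify(a, b):
    """Show that callers can single out the str/bytes case from other TypeErrors."""
    try:
        return "equal" if strict_eq(a, b) else "unequal"
    except StringBytesError:
        return "str/bytes mix"      # must be listed before the general TypeError
    except TypeError:
        return "other TypeError"


mixed = classify("x", b"x")
same = classify("x", "x")
different = classify(b"x", b"y")
```

Since the new exception is still a TypeError, existing handlers that catch TypeError would keep working unchanged.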

I know very little about the dict implementation and my idea could be
totally wrong ... The idea just came to me and perhaps it helps to find
the solution.

Christian


From pje at telecommunity.com  Sat Sep 29 18:01:00 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 29 Sep 2007 12:01:00 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
 Bytes and Mutable Buffer)
In-Reply-To: 
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
	
Message-ID: <20070929155823.C552B3A4045@sparrow.telecommunity.com>

At 10:26 AM 9/29/2007 -0500, Michael Urman wrote:
>[Sending direct because this is just a thanks and some idea fodder,
>but feel free to return this to the list]
>
>On 9/29/07, Phillip J. Eby  wrote:
> > can both be constants, a simple list.index() test could now raise a
> > TypeError, as could "item in list".
>
>Good point - I keep missing the forest for the trees. This isn't just
>a matter of dicts; any collection type can be susceptible. Thanks for
>this reminder.
>
>I'm torn on your idea of making a read vs readinto separation of
>files. If this works by, e.g., raising IOError on attempt to use the
>wrong one, the use case you proposed will be filtering out a ton of
>expected exceptions, but it's easy to understand the behavior.
>
>If it works by removing the wrong method from the object, then we've
>got two different file-like object types returned from the same
>function based on the value of an argument (but a better LBYL check
>available). Of course since we currently have two different types
>returned from a method based on a value passed to its constructor,
>this may be no worse.
>
>I'm not sure which way makes it easier to add new file-like-objects,
>either; they'll have the same problems.

They'll have the same problems *anyway*.  In fact, having different 
methods will simply force people creating such objects to decide what 
they're really trying to do.


From jyasskin at gmail.com  Sat Sep 29 20:10:07 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sat, 29 Sep 2007 11:10:07 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: <20070929151127.AE5203A4045@sparrow.telecommunity.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
Message-ID: <5d44f72f0709291110g7e66f00icead0bd060f5ebf9@mail.gmail.com>

On 9/29/07, Phillip J. Eby  wrote:
> At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
> >Until just before 3.0a1, they were unequal. We decided to raise
> >TypeError because we noticed many bugs in code that was doing things
> >like
> >
> >   data = f.read(4096)
> >   if data == "": break
>
> Thought experiment: what if read() always returned strings, and to
> read bytes, you had to use something like 'f.readinto(ob, 4096)',
> where 'ob' is a mutable bytes instance or memory view?
>
> In Python 2.x, there's only one read() method because (prior to
> unicode), there was only one type of reading to do.
>
> But as the above example makes clear, in 3.x you simply *can't* write
> code that works correctly with an arbitrary file that might be binary
> or text, at least not without typechecking the return value from
> read().  (In which case, you might as well inspect the file
> object.)  So, the above problem could be fixed by having .read()
> raise an error (or simply not exist) on a binary file object.

Perhaps write
  if len(data) == 0: break
since that's what you really mean.

Any other code that compares the result of read() to either a bytes or
a str really is taking a text or binary file object specifically and
not working on an arbitrary file.

> In this way, the problem is fixed at the point where it really
> occurs: i.e., at the point of not having decided whether the stream
> is bytes or text.
>
> This also seems to fit better (IMO) with the best practice of
> enforcing str/unicode/encoding distinctions at the point where data
> enters the program, rather than delaying the error to later.
>
>
> >I thought about using a warning too, but since nobody wants warnings,
> >that would be pretty much the same as raising TypeError except for the
> >most dedicated individuals (and if I were really dedicated I'd just
> >write my own eq() function anyway).
>
> The use case I'm concerned about is code that's not type-specific
> getting a TypeError by comparing arbitrary objects.  For example, if
> you write Python code to create a Python code object (e.g. the
> compiler package or my own BytecodeAssembler), you need to create a
> list of constants as you generate the code, and you need to be able
> to search the list for an equal constant.  Since strings and bytes
> can both be constants, a simple list.index() test could now raise a
> TypeError, as could "item in list".
>
> So raising an error to make bad code fail sooner, will also take down
> unsuspecting code that isn't really broken, and *force* the writing
> of special comparison code -- which won't be usable with things like
> list.remove and the "in" operator.
>
> In comparison, forcing code to be bytes vs. text aware at the point
> of I/O directs attention to the place where you can best decide what
> to do about it.  (After all, the comparison that raises the TypeError
> might occur deep in a library that's expecting to work with text.)
>
>
> >And the warning would do nothing
> >about the issue brought up by Jim Jewett, the unpredictable behavior
> >of a dict with both bytes and strings as keys.
>
> I've looked at all of Jim's messages for September, but I don't see
> this.  I do see where raising TypeError for comparisons causes a
> problem with dictionaries, but I don't see how an unequal comparison
> creates "unpredictable" behavior (as opposed to predictable failure to match).
>


-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." -- "Skinny Legs and
All", by Tom Robbins

From greg at krypto.org  Sat Sep 29 21:04:42 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sat, 29 Sep 2007 12:04:42 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: <5d44f72f0709291110g7e66f00icead0bd060f5ebf9@mail.gmail.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
	<5d44f72f0709291110g7e66f00icead0bd060f5ebf9@mail.gmail.com>
Message-ID: <52dc1c820709291204r214e3037w78aba5495894da7b@mail.gmail.com>

On 9/29/07, Jeffrey Yasskin  wrote:
>
> On 9/29/07, Phillip J. Eby  wrote:
> > At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
> > >Until just before 3.0a1, they were unequal. We decided to raise
> > >TypeError because we noticed many bugs in code that was doing things
> > >like
> > >
> > >   data = f.read(4096)
> > >   if data == "": break
> >
> > Thought experiment: what if read() always returned strings, and to
> > read bytes, you had to use something like 'f.readinto(ob, 4096)',
> > where 'ob' is a mutable bytes instance or memory view?
>

Using what encoding?  read() should raise an exception on a file opened as
binary in that case.  And instead of readinto(), how about a readbytes() that
just returns bytes and raises an exception on non-binary mode files?
(readinto for buffers is a good idea and I think we should have it, but that
idea could be taken further to allow for even more scattered IO into a
mutable buffer; that's another discussion and should be a PEP of its own)

> > But as the above example makes clear, in 3.x you simply *can't* write
> > code that works correctly with an arbitrary file that might be binary
> > or text, at least not without typechecking the return value from
> > read().  (In which case, you might as well inspect the file
> > object.)  So, the above problem could be fixed by having .read()
> > raise an error (or simply not exist) on a binary file object.
>
> Perhaps write
>   if len(data) == 0: break
> since that's what you really mean.


data = f.read()
if not data: break

is the preferred way to write that.  Regardless, I agree.  read() returning
a different type based on the file open mode is going to cause problems.  I
do -NOT- like the idea of bytes vs string comparison raising an exception.
read() and readbytes() methods that raise exceptions when used on the wrong
mode of file would "solve" the problem in a more obvious way.
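
As a small illustration of the truthiness test, here is a read loop that behaves the same on binary and text streams. The function name count_chunks and the chunk size are made up for the example; io.BytesIO and io.StringIO stand in for files opened in binary and text mode.

```python
import io


def count_chunks(stream, size=4):
    """Read stream in pieces of up to size units and count the non-empty reads."""
    chunks = 0
    while True:
        data = stream.read(size)
        if not data:           # true for both b"" and "", so no type is assumed
            break
        chunks += 1
    return chunks


binary_chunks = count_chunks(io.BytesIO(b"abcdefgh"))    # 8 bytes, 4 per read
text_chunks = count_chunks(io.StringIO("abcdefghi"))     # 9 characters, 4 per read
```

The loop never compares data against a literal, so it does not matter whether an empty read comes back as bytes or as text.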

> Any other code that compares the result of read() to either a bytes or
> a str really is taking a text or binary file object specifically and
> not working on an arbitrary file.
>
> > In this way, the problem is fixed at the point where it really
> > occurs: i.e., at the point of not having decided whether the stream
> > is bytes or text.
> >
> > This also seems to fit better (IMO) with the best practice of
> > enforcing str/unicode/encoding distinctions at the point where data
> > enters the program, rather than delaying the error to later.
> >
> >
> > >I thought about  using warning too, but since nobody wants warnings,
> > >that would be pretty much the same as raising TypeError except for the
> > >most dedicated individuals (and if I were really dedicated I'd just
> > >write my own eq() function anyway).
> >
> > The use case I'm concerned about is code that's not type-specific
> > getting a TypeError by comparing arbitrary objects.  For example, if
> > you write Python code to create a Python code object (e.g. the
> > compiler package or my own BytecodeAssembler), you need to create a
> > list of constants as you generate the code, and you need to be able
> > to search the list for an equal constant.  Since strings and bytes
> > can both be constants, a simple list.index() test could now raise a
> > TypeError, as could "item in list".
> >
> > So raising an error to make bad code fail sooner, will also take down
> > unsuspecting code that isn't really broken, and *force* the writing
> > of special comparison code -- which won't be usable with things like
> > list.remove and the "in" operator.
> >
> > In comparison, forcing code to be bytes vs. text aware at the point
> > of I/O directs attention to the place where you can best decide what
> > to do about it.  (After all, the comparison that raises the TypeError
> > might occur deep in a library that's expecting to work with text.)
> >
> >
> > >And the warning would do nothing
> > >about the issue brought up by Jim Jewett, the unpredictable behavior
> > >of a dict with both bytes and strings as keys.
> >
> > I've looked at all of Jim's messages for September, but I don't see
> > this.  I do see where raising TypeError for comparisons causes a
> > problem with dictionaries, but I don't see how an unequal comparison
> > creates "unpredictable" behavior (as opposed to predictable failure to
> > match).
> >

From tjreedy at udel.edu  Sat Sep 29 23:28:40 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 29 Sep 2007 17:28:40 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and
	Mutable Buffer)
References: <766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com><20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
Message-ID: 


"Guido van Rossum"  wrote in message 
news:ca471dc20709290733i54f63ac3pb4501b94530db820 at mail.gmail.com...
| Until just before 3.0a1, they were unequal.

I think it valuable that in the language as delivered, 'o==p' (as well as
'bool(o)') always returns True or False.  Both make reasoning about code
easier since one does not have to learn and carry around in the back of
one's mind niggling exceptions.  I am -1 on the last-minute change, for
much the same reasons I am against building into the language
Windows-specific suppression of \r output (see pydev post).

|  We decided to raise
| TypeError because we noticed many bugs in code that was doing things
| like
|
|  data = f.read(4096)
|  if data == "": break
|
| where data was bytes and thus the break never taken.

As G. Smith said, if a generic comparison is meant, then that should be

if not data: break

In any case, this seems like an old-code translation problem rather than a
new-code writing problem.  We already know that each existing str literal 
may have to be humanly checked to determine whether a 'b' should be 
prepended, as would appear to be the case above.

| Similar with checks for certain magic strings (so it wasn't just empty
| strings).

If a generic comparison is wanted, then "if data in ('abc', b'abc')".

If a specific comparison is wanted, then raising an exception complicates 
what should be simple.  Consider

def g(stuff):
  if stuff == 'abc': special_text()
  elif stuff == b'abc': special_bytes()
  else: general_stuff(stuff)

Breaking equality is not free.

| It is also in line with the policy to refuse things like
| b"abc".replace("a", "A") or "abc".replace(b"b", b"B").

I do not see the connection.  I would expect either to raise TypeError,
just as
  '123'.replace(1,4)
does today, even though
  '1' == 1
is False, rather than exception raising.
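
The distinction drawn here, that a method may reject an argument type even though equality across those types simply answers False, can be checked with a short script. The helper raises_type_error is hypothetical; the calls themselves are ordinary str and bytes methods as they behave in Python 3.

```python
def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError, else False."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


results = {
    "bytes.replace(str, str)": raises_type_error(b"abc".replace, "a", "A"),
    "str.replace(bytes, bytes)": raises_type_error("abc".replace, b"b", b"B"),
    "str.replace(int, int)": raises_type_error("123".replace, 1, 4),
}

# Equality between a str and an int is simply False; no exception is raised.
str_equals_int = ("1" == 1)
```

All three replace() calls are refused, while the str/int comparison quietly evaluates to False.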

Terry Jan Reedy




From pje at telecommunity.com  Sat Sep 29 23:47:39 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 29 Sep 2007 17:47:39 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
 Bytes and Mutable Buffer)
In-Reply-To: <52dc1c820709291204r214e3037w78aba5495894da7b@mail.gmail.com>
References: 
	
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
	<5d44f72f0709291110g7e66f00icead0bd060f5ebf9@mail.gmail.com>
	<52dc1c820709291204r214e3037w78aba5495894da7b@mail.gmail.com>
Message-ID: <20070929214503.E5B133A4045@sparrow.telecommunity.com>

At 12:04 PM 9/29/2007 -0700, Gregory P. Smith wrote:


>On 9/29/07, Jeffrey Yasskin 
><jyasskin at gmail.com> wrote:
>On 9/29/07, Phillip J. Eby 
><pje at telecommunity.com> wrote:
> > At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
> > >Until just before 3.0a1, they were unequal. We decided to raise
> > >TypeError because we noticed many bugs in code that was doing things
> > >like
> > >
> > >   data = f.read(4096)
> > >   if data == "": break
> >
> > Thought experiment: what if read() always returned strings, and to
> > read bytes, you had to use something like 'f.readinto(ob, 4096)',
> > where 'ob' is a mutable bytes instance or memory view?
>
>
>Using what encoding?  read() should raise an exception on a file 
>opened as binary in that case.

Yes, that's what I meant -- the availability of read() and readinto() 
would be mutually exclusive.


>   And instead of readinto() how about readbytes() that just returns 
> bytes and raises an exception on non-binary mode files.

Sure.


>   (readinto for buffers is a good idea and I think we should have
> it, but that idea could be taken further to allow for even more
> scattered IO into a mutable buffer; that's another discussion and
> should be a PEP of its own)

Fair enough, although readbytes() can be implemented in terms of 
readinto(), while the reverse isn't the case.


From facundobatista at gmail.com  Sun Sep 30 16:32:38 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Sun, 30 Sep 2007 11:32:38 -0300
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: <20070928123252.5a0692b0.weilawei@gmail.com>
References: <20070925094601.c151245c.weilawei@gmail.com>
	<20070927125557.a5895341.weilawei@gmail.com>
	<20070928122915.798d00e1.weilawei@gmail.com>
	<20070928123252.5a0692b0.weilawei@gmail.com>
Message-ID: 

2007/9/28, Rob Crowther :

> a) MPF() now takes a float or integer argument because mpf_set_str is just

Rob, there has been a *lot* of discussion about this for Decimal (see
the PEP and discussions in python-dev and python-list around the PEP
date).

The main issue here is what the user means when calling MPF(2.3):

a) MPF("2.3")

b) MPF("2.2999999999999998")

The difficulty of the choice is that a) is maybe what she expects, while b)
is the real value (so why not think she expects the real value?)

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From dickinsm at gmail.com  Sun Sep 30 17:12:00 2007
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 30 Sep 2007 11:12:00 -0400
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: 
References: <20070925094601.c151245c.weilawei@gmail.com>
	<20070927125557.a5895341.weilawei@gmail.com>
	<20070928122915.798d00e1.weilawei@gmail.com>
	<20070928123252.5a0692b0.weilawei@gmail.com>
	
Message-ID: <5c6f2a5d0709300812w56b024b2l2765cc35a07353f8@mail.gmail.com>

On 9/30/07, Facundo Batista  wrote:
>
> 2007/9/28, Rob Crowther :
>
> > a) MPF() now takes a float or integer argument because mpf_set_str is just
>
> Rob, there has been a *lot* of discussion about this for Decimal (see
> the PEP and discussions in python-dev and python-list around the PEP
> date).



But there's a major difference here: Decimal is *decimal* floating point,
MPF and Python floats are *binary* floating point.


So in the case of Decimal, conversion from a decimal string is a
straightforward operation, while conversion from binary involves making
choices about how to round, how many decimal digits to use, etc.


But for MPF it's the other way around:  conversion from a float is immediate
(the GMP precision is always at least 53 bits, so any IEEE double can be
represented as an MPF with no loss of information), while conversion from a
string involves hard work and decisions about how to round (and GMP's
approach to rounding seems pretty haphazard here...).


So since there's really no ambiguity about what MPF(float) should be, and
since it's a computationally trivial operation to initialize an MPF from a
float, you certainly want to allow MPF's to be initialized from floats.
Admittedly, for initialization from a float *literal* there are still going
to be some surprises for the unwary:  with MPF precision set to 128 bits,
MPF(1.1) is going to give a binary number that's an accurate representation
of the decimal 1.1 to only 53 bits, not 128 bits.
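
The point about lossless conversion can be illustrated without GMP. In this sketch fractions.Fraction is only a stand-in for an exact binary representation, not the MPF type under discussion: it converts a float exactly, and the result is accurate to the decimal 1.1 only as far as the 53-bit double was.

```python
from fractions import Fraction

x = 1.1
exact = Fraction(x)                     # exact rational value of the IEEE double

# Converting back gives the very same float: nothing was lost on the way in.
round_trip = float(exact) == x

# A finite binary float always has a power-of-two denominator.
power_of_two = exact.denominator & (exact.denominator - 1) == 0

# The gap to the decimal 1.1 is the original rounding error of the double.
error = abs(exact - Fraction("1.1"))
```

Extra precision in the target type cannot shrink that error, because the information was already gone when the literal 1.1 became a double.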



> The main issue here is what the user means when he calls MPF(2.3):
>
> a) MPF("2.3")
>
> b) MPF("2.2999999999999998")



All 3 of MPF(2.3), MPF("2.3") and MPF("2.29...998") should be different
values.  MPF(2.3) is the closest 53-bit binary floating point number to the
decimal 2.3, padded out with zero bits to whatever the current MPF precision
is.  MPF("2.3") should ideally be the closest p-bit binary floating point
number to the decimal 2.3, where p is the current precision.  But in fact,
with the way that GMP works it seems that all that can be said is that
MPF("2.3") is a (p+some_extra_bits) binary floating point number that's close
(but not necessarily closest) to the decimal 2.3.  Similarly for
MPF("2.29...998").
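
To illustrate the *ideal* behaviour only, here is a rough sketch in pure
Python.  It assumes correct round-to-nearest at p bits, which is exactly
what GMP may not give you; round_to_bits is a hypothetical helper, not a
GMP or Python API, and the long string is the 17-digit form from the
quoted example.

```python
from fractions import Fraction


def round_to_bits(x, p):
    """Return the p-bit binary floating point value nearest to x (x > 0)."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e such that 2**(p-1) <= x / 2**e < 2**p, starting from an estimate.
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    while x / Fraction(2) ** e >= 2 ** p:
        e += 1
    while x / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    # round() on a Fraction rounds to the nearest integer, ties to even.
    m = round(x / Fraction(2) ** e)
    return m * Fraction(2) ** e


P = 128  # assumed current precision, as in the 128-bit example

# MPF(2.3): the double nearest 2.3, which needs no further rounding.
from_float = Fraction(2.3)
# MPF("2.3"), ideally: the 128-bit value nearest the decimal 2.3.
from_short = round_to_bits(Fraction("2.3"), P)
# MPF("2.2999999999999998"), ideally: the 128-bit value nearest that decimal.
from_long = round_to_bits(Fraction("2.2999999999999998"), P)

print(len({from_float, from_short, from_long}))
```

At 53 bits the same helper reproduces the ordinary double, so the three
values only separate once the precision exceeds that of a float.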


By the way, I'm wondering whether this discussion really belongs on
comp.lang.python instead...


Mark

From jimjjewett at gmail.com  Sun Sep 30 18:31:23 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 30 Sep 2007 12:31:23 -0400
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytes and Mutable Buffer)
In-Reply-To: <20070929155823.C552B3A4045@sparrow.telecommunity.com>
References: 
	<766a29bd0709281134m48c930b6ye5d03ed08b27f4d3@mail.gmail.com>
	
	
	
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
	
	<20070929155823.C552B3A4045@sparrow.telecommunity.com>
Message-ID: 

At 10:26 AM 9/29/2007 -0500, Michael Urman wrote:
> This isn't just a matter of dicts; any collection type can be susceptible.

The reason that dicts (and sets) are even worse is that the comparison
could be delayed.  If

    b"bytes" in [...]

raises an exception, it happens while b"bytes" is still in the
traceback context.  With a dictionary, the problem comparison could be
delayed until the next resize.  Even if the TypeError did tell you
which dict and (pair of pre-existing) keys were a problem, you still
wouldn't know how those keys got there.

Example data flow:

    insert string1 with hash X
    insert string2 with hash X -- collision, so it moves to the next slot
    del string1
    insert bytes with hash X -- replaces the dummy entry, so nothing raised yet
    ...
    insert something utterly unrelated, such as an integer.  This
    causes a resize, so that now string2 and bytes do collide and raise a
    TypeError complaining about strings and bytes -- even though the key
    you added is neither.
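
A toy sketch of that data flow.  Everything here is hypothetical: Key and
ToyTable are made-up classes written to follow the steps as described
(dummy slots are reused without probing further, and a resize re-inserts
with comparisons).  It is not CPython's dict implementation, which may
probe and compare differently.

```python
class Key:
    """Hypothetical key: comparing a str kind with a bytes kind raises."""

    def __init__(self, kind, value, h):
        self.kind, self.value, self.h = kind, value, h

    def __hash__(self):
        return self.h  # hash chosen by hand to force collisions

    def __eq__(self, other):
        if {self.kind, other.kind} == {"str", "bytes"}:
            raise TypeError("cannot compare str and bytes")
        return (self.kind, self.value) == (other.kind, other.value)


EMPTY, DUMMY = object(), object()


class ToyTable:
    """Open-addressing table; an assumption-laden model, not a real dict."""

    def __init__(self, size=4):
        self.slots = [EMPTY] * size

    def _live(self):
        return [s for s in self.slots if s is not EMPTY and s is not DUMMY]

    def _place(self, key):
        i = hash(key) % len(self.slots)
        while True:
            slot = self.slots[i]
            if slot is EMPTY or slot is DUMMY:
                self.slots[i] = key  # assumption: first free slot is reused
                return
            if hash(slot) == hash(key) and slot == key:  # may raise
                return
            i = (i + 1) % len(self.slots)

    def insert(self, key):
        if len(self._live()) + 1 > len(self.slots) // 2:
            old = self._live()
            self.slots = [EMPTY] * (2 * len(self.slots))
            for k in old:  # resize: every surviving key is re-inserted
                self._place(k)
        self._place(key)

    def delete(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not EMPTY:
            slot = self.slots[i]
            if slot is not DUMMY and hash(slot) == hash(key) and slot == key:
                self.slots[i] = DUMMY
                return
            i = (i + 1) % len(self.slots)


table = ToyTable()
string1, string2 = Key("str", "a", 0), Key("str", "b", 0)
table.insert(string1)
table.insert(string2)              # collision, moves to the next slot
table.delete(string1)              # leaves a dummy entry behind
table.insert(Key("bytes", b"a", 0))  # replaces the dummy, nothing raised

error = None
try:
    table.insert(Key("int", 7, 5))  # unrelated key triggers the resize
except TypeError as exc:
    error = exc
print(repr(error))
```

In this model the TypeError surfaces while inserting the integer key, so
the traceback points at a call that involves neither offending key.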

-jJ