From techtonik at gmail.com Wed Jan 1 12:58:35 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 1 Jan 2014 14:58:35 +0300 Subject: [Python-ideas] os.architecture In-Reply-To: References: <8761q7kwt6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Dec 30, 2013 at 3:13 PM, Andrew Barnert wrote: > On Dec 30, 2013, at 0:56, anatoly techtonik wrote: > >> Ok. Architecture is a fail in terminology. The word "OS architecture" >> can mean many things, and it will be the same design flaw as os.name. >> >> How about os.bitness instead? > > You missed the part where you were told that os is for OS services, not platform (including hardware, interpreter, and OS) information. I've heard your opinion. Now why do you think os is for OS services? Docs say os is about OS interfaces, to which bitness or architecture is interface information. > Anyway, "bitness" by itself doesn't tell you whether it will return 32 or 64 when running a 32-bit Python on 64-bit Windows That's why it is "os.bitness", not "interpreter.bitness" or "cpu.bitness". > It's just as potentially ambiguous as the functions that already exist Do you still think so after my example above? From thomasgrzybowski at gmail.com Wed Jan 1 21:12:58 2014 From: thomasgrzybowski at gmail.com (tg) Date: Wed, 01 Jan 2014 15:12:58 -0500 Subject: [Python-ideas] Reporting tools for python Message-ID: <52C476CA.6000501@gmail.com> With the more general use of python for access to database information, numpy and scipy analysis, and web posting, it seems that there should be more and better means of reporting from python. Some of the existing tools are too low-level for general use (such as Reportlab). As far as I can tell, there are no tools that approach the high-level functionality of Proc Report, as used in SAS. Pagination with headers and footers, and column-spanning headers are some specific tool limitations. 
I believe that there would be even more usage of python in science and industry if there were better tools for reporting. ~Thomas From phd at phdru.name Wed Jan 1 21:28:53 2014 From: phd at phdru.name (Oleg Broytman) Date: Wed, 1 Jan 2014 21:28:53 +0100 Subject: [Python-ideas] Reporting tools for python In-Reply-To: <52C476CA.6000501@gmail.com> References: <52C476CA.6000501@gmail.com> Message-ID: <20140101202853.GA32646@phdru.name> Hi! On Wed, Jan 01, 2014 at 03:12:58PM -0500, tg wrote: > With the more general use of python for access to database > information, numpy and scipy analysis, and web posting, it seems that > there should be more and better means of reporting from python. Well, python is a programming language. It doesn't need any builtin reporting. Even the standard library doesn't. (python-ideas is about ideas for python and stdlib, not third-party libraries or applications.) > As far as I can tell, there are no tools that approach the > high-level functionality > of Proc Report, as used in SAS. Pagination with headers and footers, > and column-spanning headers are some specific tool limitations. Like http://pythonreports.sourceforge.net/ ? It was written in our company (not by me) and was in use for some time. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From stephen at xemacs.org Thu Jan 2 06:24:00 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 02 Jan 2014 14:24:00 +0900 Subject: [Python-ideas] os.architecture In-Reply-To: References: <8761q7kwt6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ppobj6rz.fsf@uwakimon.sk.tsukuba.ac.jp> anatoly techtonik writes: > I've heard your opinion. Now why do you think os is for OS > services? Because everything in there is a Python wrapper for an OS service, and because platform covers your use case. That may not be obvious to you.
But AFAICT (once explained) it works for most Pythonistas and is a consistent point of view. Your suggestion is nowhere near TOOWTDI, so it's not going to happen. From techtonik at gmail.com Wed Jan 1 20:01:13 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 1 Jan 2014 22:01:13 +0300 Subject: [Python-ideas] Fixing __file__ to be absolute Message-ID: Fixing this thing will make me happy (or very sad if you'd like this). Problem is described here: http://stackoverflow.com/a/6416333/239247 Summary: 1. chdir() 2. dirname(__file__) 3. FAIL Proposal: from __future__ import abs__file__ ? -- anatoly t. From taleinat at gmail.com Thu Jan 2 13:37:56 2014 From: taleinat at gmail.com (Tal Einat) Date: Thu, 2 Jan 2014 14:37:56 +0200 Subject: [Python-ideas] Fixing __file__ to be absolute In-Reply-To: References: Message-ID: On Wed, Jan 1, 2014 at 9:01 PM, anatoly techtonik wrote: > Fixing this thing will make my happy (or very sad if you'd like this). > > Problem is described here: > http://stackoverflow.com/a/6416333/239247 > Summary: > 1. chdir() > 2. dirname(__file__) > 3. FAIL > > Proposal: > from __future__ import abs__file__ Anatoly, this subject was already discussed on this list, just three months ago, in a thread you started! [1]_ To quote one of Nick Coghlan's replies [2]_: > Note that any remaining occurrences of non-absolute values in __file__ are > generally considered bugs in the import system. However, we tend not to fix > them in maintenance releases, since converting relative paths to absolute > paths runs a risk of breaking user code. > We're definitely *not* going to further pollute the module namespace with > values that can be trivially and reliably derived from existing values. - Tal .. [1]: https://mail.python.org/pipermail/python-ideas/2013-September/023469.html ..
[2]: https://mail.python.org/pipermail/python-ideas/2013-September/023486.html From liam.marsh.home at gmail.com Thu Jan 2 12:57:49 2014 From: liam.marsh.home at gmail.com (Liam Marsh) Date: Thu, 2 Jan 2014 12:57:49 +0100 Subject: [Python-ideas] *var()* In-Reply-To: References: Message-ID: hello, here is my idea:
var():
input var name (str),
outputs var value
example:
>>>count1=1.34
>>>var('count',1)
1.34
thank you and have a nice day! From jeanpierreda at gmail.com Thu Jan 2 14:27:42 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 2 Jan 2014 05:27:42 -0800 Subject: [Python-ideas] *var()* In-Reply-To: References: Message-ID: On Thu, Jan 2, 2014 at 3:57 AM, Liam Marsh wrote: > hello,here is my idea: > var(): > input var name (str), > outputs var value > example: > >>>>count1=1.34 >>>>var('count',1) > 1.34thank you and have a nice day! This is underspecified. What should it do for this code?

count = 3
def foo():
    print var('count', 1)
foo()

If the output is "1", then you're in luck and can already use

vars().get('count', 1)

Otherwise, I don't know a trivial one-liner to do it. Either way I'd be -1 on its inclusion in Python, it encourages a bad idiom. -- Devin From brett at python.org Thu Jan 2 14:28:48 2014 From: brett at python.org (Brett Cannon) Date: Thu, 2 Jan 2014 08:28:48 -0500 Subject: [Python-ideas] Fixing __file__ to be absolute In-Reply-To: References: Message-ID: On Thu, Jan 2, 2014 at 7:37 AM, Tal Einat wrote: > On Wed, Jan 1, 2014 at 9:01 PM, anatoly techtonik > wrote: > > Fixing this thing will make my happy (or very sad if you'd like this). > > > > Problem is described here: > > http://stackoverflow.com/a/6416333/239247 > > Summary: > > 1. chdir() > > 2. dirname(__file__) > > 3. FAIL > > > > Proposal: > > from __future__ import abs__file__ > > Anatoly, this subject was already discussed on this list, just three > months ago, in a thread you started!
[1]_ > > To quote one of Nick Coglahan's replies [2]_: > > > Note that any remaining occurrences of non-absolute values in __file__ > are > > generally considered bugs in the import system. However, we tend not to > fix > > them in maintenance releases, since converting relative paths to absolute > > paths runs a risk of breaking user code. > > > We're definitely *not* going to further pollute the module namespace with > > values that can be trivially and reliably derived from existing values. > This was also changed in Python 3.4 back in October: http://hg.python.org/cpython/rev/76184b5339f2 From steve at pearwood.info Thu Jan 2 14:35:00 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 3 Jan 2014 00:35:00 +1100 Subject: [Python-ideas] *var()* In-Reply-To: References: Message-ID: <20140102133500.GM29356@ando> On Thu, Jan 02, 2014 at 12:57:49PM +0100, Liam Marsh wrote: > hello,here is my idea: > var(): > input var name (str), > outputs var value > example: > > >>>count1=1.34 > >>>var('count',1) > 1.34thank you and have a nice day! Hello Liam, and welcome! Is this your first post here? I don't recall seeing your name before. I'm afraid I don't quite understand your example above. The "thank you and have a nice day" confuses me, I don't understand where it comes from. Also, I'm not sure why you define a variable count1 = 1.34, and then pass "count", 1 as two separate arguments to the function. So I'm going to try to guess what your idea actually is, or at least what I think is reasonable, if I get it wrong please feel free to correct me. You want a function, var(), which takes a single argument, the name of a variable, and then returns the value of that variable. E.g. given a variable "count1" set to the value 1.34, the function call: var("count1") will return 1.34. Is this what you mean? If so, firstly, the name "var" is too close to the existing function "vars". This would cause confusion.
This would cause confusion. Secondly, you can already do this, or at least *almost* this, using the locals() and globals() functions. Both will return a dict containing the local and global variables, so you can look up the variable name easily using locals() and standard dictionary methods: py> count1 = 1.34 py> locals()['count1'] 1.34 py> locals().get('count2', 'default') 'default' The only thing which is missing is that there's no way to look up a variable name if you don't know which scope it is in. Normally name resolution goes: locals nonlocals globals builtins You can easily look up a local name, or a global name, using the locals() and globals() function. With just a tiny bit more effort, you can also look in the builtins. But there's no way that I know of to look up a nonlocal name, or a name in an unspecified scope. Consequently, this *almost* works: def lookup(name): import builtins for namespace in (locals(), globals(), vars(builtins)): try: return namespace[name] except KeyError: pass raise NameError("name '%s' not found" % name) except for the nonlocal scope. I would have guessed that you could get this working with eval, but if there is such a way, I can't work it out. I think this would make a nice addition to the inspect module. I wouldn't want to see it as a builtin function, since it would encourage a style of programming which I think is poor, but for those occasional uses where you want to look up a variable from an unknown scope, I think this would be handy. -- Steven From techtonik at gmail.com Thu Jan 2 14:46:53 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 2 Jan 2014 16:46:53 +0300 Subject: [Python-ideas] Fixing __file__ to be absolute In-Reply-To: References: Message-ID: On Thu, Jan 2, 2014 at 4:28 PM, Brett Cannon wrote: > On Thu, Jan 2, 2014 at 7:37 AM, Tal Einat wrote: >> On Wed, Jan 1, 2014 at 9:01 PM, anatoly techtonik >> wrote: >> > Fixing this thing will make my happy (or very sad if you'd like this). 
>> > >> > Problem is described here: >> > http://stackoverflow.com/a/6416333/239247 >> > Summary: >> > 1. chdir() >> > 2. dirname(__file__) >> > 3. FAIL >> > >> > Proposal: >> > from __future__ import abs__file__ >> >> Anatoly, this subject was already discussed on this list, just three >> months ago, in a thread you started! [1]_ >> >> To quote one of Nick Coglahan's replies [2]_: >> >> > Note that any remaining occurrences of non-absolute values in __file__ >> > are >> > generally considered bugs in the import system. However, we tend not to >> > fix >> > them in maintenance releases, since converting relative paths to >> > absolute >> > paths runs a risk of breaking user code. >> >> > We're definitely *not* going to further pollute the module namespace >> > with >> > values that can be trivially and reliably derived from existing values. > > > This was also changed in Python 3.4 back in October: > http://hg.python.org/cpython/rev/76184b5339f2 Thanks. That's just what I was looking for - a status update. Links in emails are not telling anything about progress being made, roadmap, problems and versions of Python. Seem like tracker is a poor tool to track this stuff too. Now in spite of recent Python 3 status update, the question is how possible to make this feature more visible and implemented in previous version as from __future__ import abs__file__? I'd like to ask for two perspectives: 1. technical feasibility 2. political obstacles (backward compatibility policy / process obstacles), even if they are obvious Also, what is the process of nominating this features to selection in Python 2.8 (or whatever comes out of this incremental development idea)? So, three questions with ideas in total. -- anatoly t. 
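On Python versions where a module's `__file__` can still be relative, the usual defensive idiom is to resolve it with `os.path.abspath()` once, at import time, before anything calls `os.chdir()`. The minimal sketch below reproduces the three-step failure from the summary using a stand-in relative path, so it runs without depending on how the module was imported. The `pkg/mod.py` path and the `module_dir` helper are hypothetical.

```python
import os
import tempfile


def module_dir(file_path):
    """Return the absolute directory containing file_path.

    os.path.abspath() resolves a relative path against the *current*
    working directory, so the result is only right if this runs before
    any os.chdir().
    """
    return os.path.dirname(os.path.abspath(file_path))


# Hypothetical stand-in for a relative __file__ value.
relative_file = os.path.join("pkg", "mod.py")

# Resolved early (as a module would do at import time): correct.
resolved_early = module_dir(relative_file)

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                                  # 1. chdir()
    try:
        resolved_late = module_dir(relative_file)  # 2. dirname(...)
    finally:
        os.chdir(original_cwd)  # restore cwd so tmp can be removed

# 3. FAIL: the late result points inside the temporary directory.
print(resolved_early)
print(resolved_late)
```

In a real module the early call would be `HERE = module_dir(__file__)` at top level, and later code would use `HERE` instead of re-deriving the directory from `__file__` after the working directory may have changed.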
From brett at python.org Thu Jan 2 15:52:12 2014 From: brett at python.org (Brett Cannon) Date: Thu, 2 Jan 2014 09:52:12 -0500 Subject: [Python-ideas] Fixing __file__ to be absolute In-Reply-To: References: Message-ID: On Thu, Jan 2, 2014 at 8:46 AM, anatoly techtonik wrote: > On Thu, Jan 2, 2014 at 4:28 PM, Brett Cannon wrote: > > On Thu, Jan 2, 2014 at 7:37 AM, Tal Einat wrote: > >> On Wed, Jan 1, 2014 at 9:01 PM, anatoly techtonik > >> wrote: > >> > Fixing this thing will make my happy (or very sad if you'd like this). > >> > > >> > Problem is described here: > >> > http://stackoverflow.com/a/6416333/239247 > >> > Summary: > >> > 1. chdir() > >> > 2. dirname(__file__) > >> > 3. FAIL > >> > > >> > Proposal: > >> > from __future__ import abs__file__ > >> > >> Anatoly, this subject was already discussed on this list, just three > >> months ago, in a thread you started! [1]_ > >> > >> To quote one of Nick Coglahan's replies [2]_: > >> > >> > Note that any remaining occurrences of non-absolute values in __file__ > >> > are > >> > generally considered bugs in the import system. However, we tend not > to > >> > fix > >> > them in maintenance releases, since converting relative paths to > >> > absolute > >> > paths runs a risk of breaking user code. > >> > >> > We're definitely *not* going to further pollute the module namespace > >> > with > >> > values that can be trivially and reliably derived from existing > values. > > > > > > This was also changed in Python 3.4 back in October: > > http://hg.python.org/cpython/rev/76184b5339f2 > > Thanks. That's just what I was looking for - a status update. > Links in emails are not telling anything about progress being > made, roadmap, problems and versions of Python. Seem like > tracker is a poor tool to track this stuff too. > It's not in released code yet so there is no way to really promote this in a way that is guaranteed not to change. 
It will be in the What's New doc for Python 3.4, though, when the final version is released: http://docs.python.org/3.4/whatsnew/3.4.html#other-language-changes > > Now in spite of recent Python 3 status update, the question is how > possible to make this feature more visible and implemented in > previous version as from __future__ import abs__file__? > There is no chance that will ever happen. > > I'd like to ask for two perspectives: > 1. technical feasibility > I don't see why it wouldn't be technically possible since I made it work in Python 3.4. > 2. political obstacles (backward compatibility policy / process obstacles), > even if they are obvious > It would be a total break in backwards-compatibility by adding a new feature in a bugfix release and that's never acceptable (and that rule has been in effect since Python 2.2.1). > > Also, what is the process of nominating this features to selection in > Python 2.8 (or whatever comes out of this incremental development idea)? > There is no future Python 2.8 release so there is no process to nominate something; PEP 404 is very clear on this: http://python.org/dev/peps/pep-0404/ . And there is no "incremental development idea" or something that's going to change the current development process of Python so that part of the questions doesn't make sense to me. From liam.marsh.home at gmail.com Thu Jan 2 17:22:21 2014 From: liam.marsh.home at gmail.com (Liam Marsh) Date: Thu, 2 Jan 2014 17:22:21 +0100 Subject: [Python-ideas] *var()* In-Reply-To: References: Message-ID: dear Jeanpierre, sorry, no. for >>>count1=3, var('count1') or var(str('count',1)) will output 3. In fact, it is even better to use libraries, and it was stupid to send the first email before trying another way. sorry.
2014/1/2 Devin Jeanpierre > On Thu, Jan 2, 2014 at 3:57 AM, Liam Marsh > wrote: > > hello,here is my idea: > > var(): > > input var name (str), > > outputs var value > > example: > > > >>>>count1=1.34 > >>>>var('count',1) > > 1.34 > >thank you and have a nice day! > > This is underspecified. What should it do for this code? > > count = 3 > def foo(): > print var('count', 1) > foo() > > If the output is "1", then you're in luck and can already use > vars().get('count', 1) > > Otherwise, I don't know a trivial one-liner to do it. Either way I'd > be -1 on its inclusion in Python, it encourages a bad idiom. > > -- Devin > From denis.spir at gmail.com Thu Jan 2 17:39:24 2014 From: denis.spir at gmail.com (spir) Date: Thu, 02 Jan 2014 17:39:24 +0100 Subject: [Python-ideas] *var()* In-Reply-To: <20140102133500.GM29356@ando> References: <20140102133500.GM29356@ando> Message-ID: <52C5963C.1040909@gmail.com> On 01/02/2014 02:35 PM, Steven D'Aprano wrote: > On Thu, Jan 02, 2014 at 12:57:49PM +0100, Liam Marsh wrote: > >> hello,here is my idea: >> var(): >> input var name (str), >> outputs var value >> example: >> >>>>> count1=1.34 >>>>> var('count',1) >> 1.34thank you and have a nice day! > > > Hello Liam, and welcome! Is this your first post here? I don't recall > seeing your name before. > > I'm afraid I don't quite understand your example above. The "thank you > and have a nice day" confuses me, I don't understand where it comes > from. Also, I'm not sure why you define a variable count1 = 1.34, and > then pass "count", 1 as two separate arguments to the function. So I'm > going to try to guess what your idea actually is, or at least what I > think is reasonable, if I get it wrong please feel free to correct me. > > You want a function, var(), which takes a single argument, the name of a > variable, and then returns the value of that variable. E.g.
given a > variable "count1" set to the value 1.34, the function call: > > var("count1") > > will return 1.34. > > Is this what you mean? > > If so, firstly, the name "var" is too close to the existing function > "vars". This would cause confusion. > > Secondly, you can already do this, or at least *almost* this, using > the locals() and globals() functions. Both will return a dict containing > the local and global variables, so you can look up the variable name > easily using locals() and standard dictionary methods: > > > py> count1 = 1.34 > py> locals()['count1'] > 1.34 > py> locals().get('count2', 'default') > 'default' > > > The only thing which is missing is that there's no way to look up a > variable name if you don't know which scope it is in. Normally name > resolution goes: > > locals > nonlocals > globals > builtins > > You can easily look up a local name, or a global name, using the > locals() and globals() function. With just a tiny bit more effort, you > can also look in the builtins. But there's no way that I know of to look > up a nonlocal name, or a name in an unspecified scope. Consequently, > this *almost* works: > > def lookup(name): > import builtins > for namespace in (locals(), globals(), vars(builtins)): > try: > return namespace[name] > except KeyError: > pass > raise NameError("name '%s' not found" % name) > > > except for the nonlocal scope. > > I would have guessed that you could get this working with eval, but if > there is such a way, I can't work it out. > > I think this would make a nice addition to the inspect module. I > wouldn't want to see it as a builtin function, since it would encourage > a style of programming which I think is poor, but for those occasional > uses where you want to look up a variable from an unknown scope, I think > this would be handy. I once used a direct try ... 
except NameError, which automagically looks up in the whole scope cascade:

i = 1
try:
    x = i
except NameError:
    x = None
# no "lookup-able" symbol 'j'
try:
    y = j
except NameError:
    y = None
print (x,y)     # ==> 1 None

Pretty practical. [Actually, I've never had any need for this in real python code, it was to simulate variable strings (implanted as eg "Hello, {username}!"), which requires variable lookup by name, itself variable. But python already has the final feature (even twice, with % or format).] Denis From james at dontusethiscode.com Thu Jan 2 21:29:21 2014 From: james at dontusethiscode.com (James Powell) Date: Thu, 02 Jan 2014 15:29:21 -0500 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple Message-ID: <52C5CC21.5030002@dontusethiscode.com> Some functions and methods allow the provision of a tuple of arguments which will be looped over internally, e.g.:

'spam'.startswith(('s', 'z'))  # 'spam' starts with 's' or with 'z'
isinstance(42, (float, int))

In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to perform this internal iteration. As a result, the following are considered invalid:

'spam'.startswith(['s', 'z'])
'spam'.startswith({'s', 'z'})
'spam'.startswith(x for x in 'sz')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str, unicode, or tuple

There are two common workarounds:

'spam'.startswith(tuple({'s', 'z'}))
any('spam'.startswith(c) for c in {'s', 'z'})

Of course, the following construction already has a clear, separate meaning:

'spam'.startswith('sz')  # 'spam' starts with 'sz'

In these cases, could we supplant the PyTuple_Check with one that would allow any iterator? Alternatively, could this be added as an additional branch?
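The proposed semantics can be approximated in pure Python today. The minimal sketch below uses a helper name, `startswith_any`, that is hypothetical and not an existing API. It special-cases `str` so that a plain string keeps meaning a single prefix, and otherwise does the same `tuple()` conversion as the first workaround.

```python
def startswith_any(s, prefixes):
    """Hypothetical helper: str.startswith() that accepts any iterable of
    prefixes, not just a tuple.

    A bare str is deliberately treated as ONE prefix, preserving the
    existing meaning of 'spam'.startswith('sz').
    """
    if isinstance(prefixes, str):
        return s.startswith(prefixes)
    # str.startswith() only takes a str or a tuple, so materialise the
    # iterable (list, set, generator, ...) into a tuple once.
    return s.startswith(tuple(prefixes))


print(startswith_any('spam', {'s', 'z'}))         # True
print(startswith_any('spam', ['x', 'y']))         # False
print(startswith_any('spam', (c for c in 'sz')))  # True
print(startswith_any('spam', 'sz'))               # False: 'sz' is one prefix
```

The `isinstance(prefixes, str)` branch is the crux: because a string is itself an iterable of one-character prefixes, any "accept any iterable" rule has to carve strings out explicitly.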
The code would look something like:

it = PyObject_GetIter(subobj);
if (it == NULL)
    return NULL;

/* PyIter_Next() returns NULL both when the iterator is exhausted
   and when it raises, so the error is checked after the loop. */
while ((substring = PyIter_Next(it)) != NULL) {
    result = tailmatch(self, substring, start, end, -1);
    Py_DECREF(substring);
    if (result == -1) {          /* error inside tailmatch() */
        Py_DECREF(it);
        return NULL;
    }
    if (result) {
        Py_DECREF(it);           /* don't leak the iterator */
        Py_RETURN_TRUE;
    }
}
Py_DECREF(it);
if (PyErr_Occurred())            /* the iterator itself raised */
    return NULL;
Py_RETURN_FALSE;

Of course, in the case of methods like .startswith, this would need to ensure the following behaviour remains unchanged. The following should always check whether 'spam' starts with 'sz', not whether it starts with 's' or with 'z':

'spam'.startswith('sz')

I searched bugs.python.org and python-ideas for any previous discussion of this topic. If this seems reasonable, I can submit an enhancement to bugs.python.org with a patch for unicodeobject.c:unicode_startswith Cheers, James Powell follow: @dontusethiscode + @nycpython attend: nycpython.org + flask-nyc.org read: seriously.dontusethiscode.com From guido at python.org Fri Jan 3 00:24:00 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Jan 2014 13:24:00 -1000 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: <52C5CC21.5030002@dontusethiscode.com> References: <52C5CC21.5030002@dontusethiscode.com> Message-ID: The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?) On Thu, Jan 2, 2014 at 10:29 AM, James Powell wrote: > Some functions and methods allow the provision of a tuple of arguments > which will be looped over internally.
e.g., > > 'spam'.startswith(('s', 'z')) # 'spam' starts with 's' or with 'z' > isinstance(42, (float, int)) > > In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to > perform this internal iteration. > > As a result, the following are considered invalid: > > 'spam'.startswith(['s', 'z']) > 'spam'.startswith({'s', 'z'}) > 'spam'.startswith(x for x in 'sz') > > Traceback (most recent call last): > File "", line 1, in > TypeError: startswith first arg must be str, unicode, or tuple > > There are two common workarounds: > > 'spam'.startswith(tuple({'s', 'z'})) > any('spam'.startwith(c) for c in {'s', 'z'}) > > Of course, the following construction already has a clear, separate meaning: > > 'spam'.startswith('sz') # 'spam' starts with 'sz' > > In these cases, could we supplant the PyTuple_Check with one that would > allow any iterator? Alternatively, could add this as an additional branch? > > The code would look something like: > > it = PyObject_GetIter(subobj); > if (it == NULL) > return NULL; > > iternext = *Py_TYPE(it)->tp_iternext; > for(;;) { > substring = iternext(it); > if (substring == NULL) > Py_RETURN_FALSE; > result = tailmatch(self, substring, start, end, -1); > Py_DECREF(substring); > if (result) > Py_RETURN_TRUE; > } > > Of course, in the case of methods like .startswith, this would need to > ensure the following behaviour remains unchanged. The following should > always check if 'spam' starts with 'sz' not starts with 's' or with 'z': > > 'spam'.startswith('sz') > > I searched bugs.python.org and python-ideas for any previous discussion > of this topic. 
If this seems reasonable, I can submit an enhancement to > bugs.python.org with a patch for unicodeobject.c:unicode_startswith > > Cheers, > James Powell > > follow: @dontusethiscode + @nycpython > attend: nycpython.org + flask-nyc.org > read: seriously.dontusethiscode.com > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) From amber.yust at gmail.com Fri Jan 3 00:33:59 2014 From: amber.yust at gmail.com (Amber Yust) Date: Thu, 02 Jan 2014 23:33:59 +0000 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple References: <52C5CC21.5030002@dontusethiscode.com> Message-ID: <-7933402584649597485@gmail297201516> I could see expanding to allow lists/sets as well as tuples being useful, e.g. for using dynamically generated prefix lists without creating additional tuple objects, but I don't see arbitrary iteration being necessary. On Thu Jan 02 2014 at 3:25:20 PM, Guido van Rossum wrote: > The current behavior is intentional, and the ambiguity of strings > themselves being iterables is the main reason. Since startswith() is > almost always called with a literal or tuple of literals anyway, I see > little need to extend the semantics. (I notice that you don't actually > give any examples where the iterator would be useful -- have you > encountered any, or are you just arguing for consistency's sake?) > > On Thu, Jan 2, 2014 at 10:29 AM, James Powell > wrote: > > Some functions and methods allow the provision of a tuple of arguments > > which will be looped over internally. e.g., > > > > 'spam'.startswith(('s', 'z')) # 'spam' starts with 's' or with 'z' > > isinstance(42, (float, int)) > > > > In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to > > perform this internal iteration. 
> > > > As a result, the following are considered invalid: > > > > 'spam'.startswith(['s', 'z']) > > 'spam'.startswith({'s', 'z'}) > > 'spam'.startswith(x for x in 'sz') > > > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: startswith first arg must be str, unicode, or tuple > > > > There are two common workarounds: > > > > 'spam'.startswith(tuple({'s', 'z'})) > > any('spam'.startwith(c) for c in {'s', 'z'}) > > > > Of course, the following construction already has a clear, separate > meaning: > > > > 'spam'.startswith('sz') # 'spam' starts with 'sz' > > > > In these cases, could we supplant the PyTuple_Check with one that would > > allow any iterator? Alternatively, could add this as an additional > branch? > > > > The code would look something like: > > > > it = PyObject_GetIter(subobj); > > if (it == NULL) > > return NULL; > > > > iternext = *Py_TYPE(it)->tp_iternext; > > for(;;) { > > substring = iternext(it); > > if (substring == NULL) > > Py_RETURN_FALSE; > > result = tailmatch(self, substring, start, end, -1); > > Py_DECREF(substring); > > if (result) > > Py_RETURN_TRUE; > > } > > > > Of course, in the case of methods like .startswith, this would need to > > ensure the following behaviour remains unchanged. The following should > > always check if 'spam' starts with 'sz' not starts with 's' or with 'z': > > > > 'spam'.startswith('sz') > > > > I searched bugs.python.org and python-ideas for any previous discussion > > of this topic. 
If this seems reasonable, I can submit an enhancement to > > bugs.python.org with a patch for unicodeobject.c:unicode_startswith > > > > Cheers, > > James Powell > > > > follow: @dontusethiscode + @nycpython > > attend: nycpython.org + flask-nyc.org > > read: seriously.dontusethiscode.com > > > > -- > --Guido van Rossum (python.org/~guido) From james at dontusethiscode.com Fri Jan 3 00:37:56 2014 From: james at dontusethiscode.com (James Powell) Date: Thu, 02 Jan 2014 18:37:56 -0500 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> Message-ID: <52C5F854.90306@dontusethiscode.com> On 01/02/2014 06:24 PM, Guido van Rossum wrote: > The current behavior is intentional, and the ambiguity of strings > themselves being iterables is the main reason. Since startswith() is > almost always called with a literal or tuple of literals anyway, I see > little need to extend the semantics. (I notice that you don't actually > give any examples where the iterator would be useful -- have you > encountered any, or are you just arguing for consistency's sake?)
This is driven by a real-world example wherein a large number of prefixes stored in a set, necessitating: any('spam'.startswith(c) for c in prefixes) # or 'spam'.startswith(tuple(prefixes)) However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity: isinstance(x, {int, float}) I do agree that it's definitely important to retain the behaviour of: 'spam'.startswith('sz') At same time, I think the non-string iterable problem is already fairly well-known and not a source of great confusion. How often has one typed: isinstance(x, Iterable) and not isinstance(x, str) Cheers, James Powell follow: @dontusethiscode + @nycpython attend: nycpython.org + flask-nyc.org read: seriously.dontusethiscode.com From guido at python.org Fri Jan 3 00:59:04 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Jan 2014 13:59:04 -1000 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: <52C5F854.90306@dontusethiscode.com> References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: On Thu, Jan 2, 2014 at 1:37 PM, James Powell wrote: > On 01/02/2014 06:24 PM, Guido van Rossum wrote: >> The current behavior is intentional, and the ambiguity of strings >> themselves being iterables is the main reason. Since startswith() is >> almost always called with a literal or tuple of literals anyway, I see >> little need to extend the semantics. (I notice that you don't actually >> give any examples where the iterator would be useful -- have you >> encountered any, or are you just arguing for consistency's sake?) > > This is driven by a real-world example wherein a large number of > prefixes stored in a set, necessitating: > > any('spam'.startswith(c) for c in prefixes) > # or > 'spam'.startswith(tuple(prefixes)) Neither of these strikes me as bad. 
Also, depending on whether the set of prefixes itself changes dynamically, it may be best to lift the tuple() call out of the startswith() call. Note that for performance, I suspect that the any() version will be slower if you can avoid calling tuple() every time -- I recall once finding that x.startswith('ab') benchmarked slower than x[:2] == 'ab' because the name lookup for 'startswith' dominated the overall time. > However, .startswith doesn't seem to be the only example of this, and > the other examples are free of the string/iterable ambiguity: > > isinstance(x, {int, float}) But this is even less likely to have a dynamically generated argument. And there could still be another ambiguity here: a metaclass could conceivably make its instances (i.e. classes) iterable. > I do agree that it's definitely important to retain the behaviour of: > > 'spam'.startswith('sz') Duh. :-) > At same time, I think the non-string iterable problem is already fairly > well-known and not a source of great confusion. How often has one typed: > > isinstance(x, Iterable) and not isinstance(x, str) If you find yourself typing that a lot I think you have a bigger problem though. All in all I hope you will give up your push for this feature. It just doesn't seem all that important, and you really just move the inconsistency to a different place (special-casing strings instead of tuples). -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 3 01:16:39 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Jan 2014 14:16:39 -1000 Subject: [Python-ideas] *var()* In-Reply-To: <20140102133500.GM29356@ando> References: <20140102133500.GM29356@ando> Message-ID: On Thu, Jan 2, 2014 at 3:35 AM, Steven D'Aprano wrote: > I would have guessed that you could get this working with eval, but if > there is such a way, I can't work it out. 
It's trivial if you directly invoke eval():

    x = 42

    def example():
        print 'first:', eval('x')
        y = 'hello world'
        print 'second:', eval('y')

    example()

will print

    first: 42
    second: hello world

Writing Liam's var() as a regular function would require using sys._getframe() and won't access intermediate scopes; something like this would at least find locals and globals:

    import sys

    def var(*args):
        name = ''.join(map(str, args))  # So var('count', 1) is the same as var('count1')
        frame = sys._getframe(1)  # Caller's frame
        return eval(name, frame.f_globals, frame.f_locals)

Now this works as desired:

    x = 42

    def example():
        print 'first:', var('x')
        y = 'hello world'
        print 'second:', var('y')

    example()

All in all, agreed this doesn't need to be added to the language, given that it's easy enough to invoke eval() directly. (And advanced programmers tend to use all kinds of other tricks to avoid the need.)

Two more things, especially for Liam:

(1) There was nothing stupid about your post -- welcome to the Python community!

(2) eval() is much more powerful than just variable lookup; if you write a program that asks its user for a variable name and then passes that to eval(), a clever user could trick your program into running code you might not like to run, by typing an expression with a side-effect as the "variable name". But if you're just beginning it's probably best not to worry too much about such possibilities -- most likely you yourself are the only user of your programs!
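For Python 3, a minimal sketch of the same helper, under stated assumptions: it keeps the search order described above (the caller's locals, then its globals, plus builtins) and the reliance on CPython's sys._getframe(), but swaps eval() for plain dictionary lookups, so a "name" such as "__import__('os')" is simply not found rather than executed. The name var comes from the thread; this body is illustrative, not a library API.

```python
import sys


def var(*args):
    """Look up a variable whose name is built from *args (illustrative sketch).

    var('count', 1) looks up the name 'count1'. Only the caller's locals,
    its globals and the builtins are searched; enclosing-function variables
    are visible only if the caller itself refers to them (a closure).
    """
    name = ''.join(map(str, args))
    frame = sys._getframe(1)  # the caller's frame; CPython-specific
    try:
        for namespace in (frame.f_locals, frame.f_globals, frame.f_builtins):
            if name in namespace:
                return namespace[name]
    finally:
        del frame  # drop the frame reference promptly to avoid reference cycles
    raise NameError(f"name {name!r} is not defined")
```

Because no code is compiled, var("__import__('os')") raises NameError instead of importing anything, which addresses the risk described in point (2).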
-- --Guido van Rossum (python.org/~guido) From james at dontusethiscode.com Fri Jan 3 01:39:07 2014 From: james at dontusethiscode.com (James Powell) Date: Thu, 02 Jan 2014 19:39:07 -0500 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: <52C606AB.9090704@dontusethiscode.com> On 01/02/2014 06:59 PM, Guido van Rossum wrote: >> This is driven by a real-world example wherein a large number of >> prefixes stored in a set, necessitating: >> any('spam'.startswith(c) for c in prefixes) >> # or >> 'spam'.startswith(tuple(prefixes)) > Neither of these strikes me as bad. Also, depending on whether the set > of prefixes itself changes dynamically, it may be best to lift the > tuple() call out of the startswith() call. I agree. The any() formulation proves good enough in practice. Creating a tuple can be a bit tricky, since the list of prefixes could be large and could change. >> However, .startswith doesn't seem to be the only example of this, and >> the other examples are free of the string/iterable ambiguity: >> isinstance(x, {int, float}) > And there could still be another ambiguity here: a metaclass could > conceivably make its instances (i.e. classes) iterable. It's an interesting point that there's a fundamental ambiguity between providing an iterable of arguments and providing a single argument that is itself an iterable (e.g., in the case of a type that is itself iterable, like Enum). In fact, I've actually warmed up to the any() formulation, because it makes explicit which behaviour you want. >> I do agree that it's definitely important to retain the behaviour of: >> 'spam'.startswith('sz') > Duh. :-) You never know... > All in all I hope you will give up your push for this feature.
It just > doesn't seem all that important, and you really just move the > inconsistency to a different place (special-casing strings instead of > tuples). For these functions and methods, being able to provide a tuple of arguments instead of a single argument seems mostly a convenience. It allows the most common case of wanting to internalise the iteration with a minimum of ambiguity. The any() or tuple() formulations are available where needed. In the end, I'm happy to drop the push for this feature. (In general, I agree that there isn't a need to stamp out all inconsistencies or to belabour the use of abstract types.) Cheers, James Powell follow: @dontusethiscode + @nycpython attend: nycpython.org + flask-nyc.org read: seriously.dontusethiscode.com From python at 2sn.net Fri Jan 3 01:19:51 2014 From: python at 2sn.net (Alexander Heger) Date: Fri, 3 Jan 2014 11:19:51 +1100 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: >> isinstance(x, Iterable) and not isinstance(x, str) > > If you find yourself typing that a lot I think you have a bigger problem though. How do you replace this? From guido at python.org Fri Jan 3 01:49:14 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Jan 2014 14:49:14 -1000 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: By designing an API that doesn't require such overloading. On Thursday, January 2, 2014, Alexander Heger wrote: > >> isinstance(x, Iterable) and not isinstance(x, str) > > > > If you find yourself typing that a lot I think you have a bigger problem > though. > > How do you replace this?
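One possible reading of "an API that doesn't require such overloading", as a sketch (starts_with_any is a hypothetical helper, not an existing function): take the prefixes as separate positional arguments, so a single string argument is always exactly one prefix and no isinstance() check is needed. For a dynamically maintained set, str.startswith() itself accepts only a str or a tuple of str (a set raises TypeError), so the sketch converts once and reuses the tuple.

```python
def starts_with_any(text, *prefixes):
    """Hypothetical helper: True if text starts with any of the given prefixes.

    Because prefixes arrive via *args, starts_with_any(s, "sz") means the single
    prefix "sz", never the characters "s" and "z"; no isinstance() check needed.
    """
    return text.startswith(prefixes)  # *args already packs them into a tuple


# A prefix collection maintained elsewhere as a set.
prefix_set = {"sp", "eg", "ha"}

# str.startswith() accepts a str or a tuple of str, but not a set, so convert
# once and reuse the tuple rather than calling tuple() on every lookup.
prefix_tuple = tuple(prefix_set)

words = ["spam", "eggs", "ham", "toast"]
matches = [w for w in words if w.startswith(prefix_tuple)]
```

A caller holding a set can still write starts_with_any("spam", *prefix_set), which leaves the string-versus-collection decision with the caller rather than the function.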
-- --Guido van Rossum (on iPad)

From steve at pearwood.info Fri Jan 3 02:18:34 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 3 Jan 2014 12:18:34 +1100 Subject: [Python-ideas] *var()* In-Reply-To: References: <20140102133500.GM29356@ando> Message-ID: <20140103011833.GP29356@ando> On Thu, Jan 02, 2014 at 02:16:39PM -1000, Guido van Rossum wrote: > On Thu, Jan 2, 2014 at 3:35 AM, Steven D'Aprano wrote: > > I would have guessed that you could get this working with eval, but if > > there is such a way, I can't work it out. > > It's trivial if you directly invoke eval():

That's what I thought too, but I get surprising results with nonlocals.

    a = b = "global"

    def test1():
        b = c = "nonlocal"
        def inner():
            d = "local"
            return (a, b, c, d)
        return inner()

    def test2():
        b = c = "nonlocal"
        def inner():
            d = "local"
            c  # Need this or the function fails with NameError.
            return (eval('a'), eval('b'), eval('c'), eval('d'))
        return inner()

    assert test1() == test2()  # Fails.

test1() returns ('global', 'nonlocal', 'nonlocal', 'local'), which is what I expect. But test2() returns ('global', 'global', 'nonlocal', 'local'), which surprises me.

If I understand what is going on in test2's inner function, eval('b') doesn't see the nonlocal b so it picks up the global b. (If there is no global b, you get NameError.) But eval('c') sees the nonlocal c because we have a closure, due to the reference to c in the previous line.

If there's a way to get eval('b') to return "nonlocal" without having a closure, I don't know it. This suggests to me that you can't reliably look up a nonlocal from an inner function using eval.
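The behaviour described above can be pinned down as a few assertions. This is a sketch assuming CPython's usual scoping rules, with illustrative function names: which enclosing variables an inner function captures is decided when it is compiled, so a string handed to eval() later sees only that function's own locals (including free variables it already captures) plus globals. One possible workaround, in with_snapshot(), takes a locals() snapshot in the enclosing function and passes it to eval() explicitly.

```python
# eval() inside a nested function sees that function's locals (including free
# variables it already captures) and the module globals, nothing more.
b = "global"


def closes_over_c():
    b = c = "nonlocal"  # inner() never mentions b, so b is not captured

    def inner():
        c  # mentioning c makes it a free variable of inner, visible to eval()
        return eval("b"), eval("c")

    return inner()


def no_closure():
    c2 = "nonlocal"

    def inner():
        return eval("c2")  # c2 was never captured and there is no global c2

    return inner()


def with_snapshot():
    b = c = "nonlocal"
    scope = locals()  # dict snapshot of this function's variables at this point

    def inner():
        d = "local"
        # Pass the enclosing snapshot merged with inner's own locals explicitly.
        return eval("(b, c, d)", globals(), {**scope, **locals()})

    return inner()
```

The snapshot only reflects values at the moment locals() was called, so later rebinding in the enclosing function would not be seen; it trades liveness for an explicit namespace.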
-- Steven From python at 2sn.net Fri Jan 3 04:54:57 2014 From: python at 2sn.net (Alexander Heger) Date: Fri, 3 Jan 2014 14:54:57 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple Message-ID: > By designing an API that doesn't require such overloading. > > On Thursday, January 2, 2014, Alexander Heger wrote: >> >> >> isinstance(x, Iterable) and not isinstance(x, str) >> > >> > If you find yourself typing that a lot I think you have a bigger problem >> > though. >> >> How do you replace this? For my applications this seemed the most natural way - have the method deal with what it is fed, which could be strings or any kind of collections or iterables of strings. But never would I want to disassemble strings into characters. From the previous message I gather that I am not the only one with this application case. Generally, I find strings being iterables of characters as useful as if integers were iterables of bits. They should just be units. They already start out being immutable. I think it would be a positive design change for Python 4 to make them units instead of iterables. At least for me, there are far fewer applications where the latter is useful than where it requires extra code. Overall, the fact that a string is an iterable makes the language less clean; it is a special case we always have to code around. I know it will break a lot of existing code, but so did the string change from py2 to 3. (It would break very little of my code, though.) -Alexander From rosuav at gmail.com Fri Jan 3 04:59:51 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 3 Jan 2014 14:59:51 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: Message-ID: On Fri, Jan 3, 2014 at 2:54 PM, Alexander Heger wrote: > Generally, I find strings being iterables of characters as useful as > if integers were iterables of bits.
They should just be units. What this would mean is that any time you want to iterate over the characters, you'd have to iterate over string.split('') instead. So the question is, is that common enough to be a problem? The other point that comes to mind is that iteration and indexing are closely related. I think most people would agree that "abcde"[1] should be 'b' (granted, there's room for debate as to whether that should be a one-character string or an integer with the Unicode codepoint, but either way); it's possible to iterate over anything by indexing it with 0, then 1, then 2, etc, until it raises IndexError. For a string to not be iterable, that identity would have to be broken. ChrisA From breamoreboy at yahoo.co.uk Fri Jan 3 05:27:15 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 03 Jan 2014 04:27:15 +0000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: Message-ID: On 03/01/2014 03:54, Alexander Heger wrote: > > Generally, I find strings being iterables of characters as useful as > if integers were iterables of bits. They should just be units. They > already start out being not mutable. I think it would be a positive > design change for Python 4 to make them units instead of being > iterables. At least for me, there is much fewer applications where > the latter is useful than where it requires extra code. Overall, it > makes the language less clean that a string is an iterable; a special > case we always have to code around. > I find your terminology misleading. A string is a sequence in the same way that list, tuple, range, bytes, bytearray and memoryview are. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From guido at python.org Fri Jan 3 05:58:06 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 2 Jan 2014 18:58:06 -1000 Subject: [Python-ideas] *var()* In-Reply-To: <20140103011833.GP29356@ando> References: <20140102133500.GM29356@ando> <20140103011833.GP29356@ando> Message-ID: Right, that's why I said "won't access intermediate scopes"... On Thursday, January 2, 2014, Steven D'Aprano wrote: > On Thu, Jan 02, 2014 at 02:16:39PM -1000, Guido van Rossum wrote: > > On Thu, Jan 2, 2014 at 3:35 AM, Steven D'Aprano > > wrote: > > > I would have guessed that you could get this working with eval, but if > > > there is such a way, I can't work it out. > > > > It's trivial if you directly invoke eval(): > > That's what I thought too, but I get surprising results with > nonlocals. > > > a = b = "global" > > def test1(): > b = c = "nonlocal" > def inner(): > d = "local" > return (a, b, c, d) > return inner() > > def test2(): > b = c = "nonlocal" > def inner(): > d = "local" > c # Need this or the function fails with NameError. > return (eval('a'), eval('b'), eval('c'), eval('d')) > return inner() > > assert test1() == test2() # Fails. > > > test1() returns ('global', 'nonlocal', 'nonlocal', 'local'), which is > what I expect. But test2() returns ('global', 'global', 'nonlocal', > 'local'), > which surprises me. > > If I understand what is going on in test2's inner function, eval('b') > doesn't see the nonlocal b so it picks up the global b. (If there is no > global b, you get NameError.) But eval('c') sees the nonlocal c because > we have a closure, due to the reference to c in the previous line. > > If there's a way to get eval('b') to return "nonlocal" without having a > closure, I don't know it. This suggests to me that you can't reliably > look-up a nonlocal from an inner function using eval. 
> > > -- > Steven

-- --Guido van Rossum (on iPad)

From tjreedy at udel.edu Fri Jan 3 10:23:14 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 03 Jan 2014 04:23:14 -0500 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: Message-ID: On 1/2/2014 10:59 PM, Chris Angelico wrote: > On Fri, Jan 3, 2014 at 2:54 PM, Alexander Heger wrote: >> Generally, I find strings being iterables of characters as useful as >> if integers were iterables of bits. They should just be units. > > What this would mean is that any time you want to iterate over the > characters, you'd have to iterate over string.split('') instead. So > the question is, is that common enough to be a problem? > > The other point that comes to mind is that iteration and indexing are > closely related.

    def iter(ob):  # is something like (ignoring the two-argument form)
        if hasattr(ob, '__iter__'):
            return ob.__iter__()
        elif hasattr(ob, '__getitem__'):
            return iterator(ob)  # the generic index-based sequence iterator

In 2.x, str does *not* have .__iter__, so the second branch is taken.

    >>> iter('ab')

In 3.x, str *does* have .__iter__.

    >>> iter('ab')

If .__iter__ were removed, strings would revert to using the generic iterator and would *still* be iterable.

> I think most people would agree that "abcde"[1] > should be 'b' (granted, there's room for debate as to whether that > should be a one-character string or an integer with the Unicode > codepoint, but either way); it's possible to iterate over anything by > indexing it with 0, then 1, then 2, etc, until it raises IndexError. > For a string to not be iterable, that identity would have to be > broken.
Which, to me, would be really ugly ;-). -- Terry Jan Reedy From denis.spir at gmail.com Fri Jan 3 11:19:35 2014 From: denis.spir at gmail.com (spir) Date: Fri, 03 Jan 2014 11:19:35 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: Message-ID: <52C68EB7.6090404@gmail.com> On 01/03/2014 04:54 AM, Alexander Heger wrote: >> By designing an API that doesn't require such overloading. >> >> On Thursday, January 2, 2014, Alexander Heger wrote: >>> >>>>> isinstance(x, Iterable) and not isinstance(x, str) >>>> >>>> If you find yourself typing that a lot I think you have a bigger problem >>>> though. >>> >>> How do you replace this? > > for my applications this seemed the most natural way - have the method > deal with what it is fed, which could be strings or any kind of > collections or iterables of strings. But never would I want to > disassemble strings into characters. From the previous message I > gather that I am not the only one with this application case. > > Generally, I find strings being iterables of characters as useful as > if integers were iterables of bits. They should just be units. They > already start out being not mutable. I think it would be a positive > design change for Python 4 to make them units instead of being > iterables. At least for me, there is much fewer applications where > the latter is useful than where it requires extra code. Overall, it > makes the language less clean that a string is an iterable; a special > case we always have to code around. > > I know it will break a lot of existing code, but so did the string > change from py2 to 3. (It would break very few of my codes, though.) I agree there is an occasional need, which I have also met in real code: it was parse result data, which can be a string (terminal patterns, which really "eat" part of the source) or a list (or some other "true" iterable collection, for composite or repetitive patterns).
But the case is rare because it requires a coincidence of conditions:
* both strings and collections may come as input
* both are valid, from the app's logic's point of view
* one wants to iterate over collections, but not over strings

On the other hand, I find that you much too quickly dismiss the real and very common need to iterate over strings (at the level of their smallest units, code points), apparently on the sole basis that in your own programming practice you don't need or want it.

We should not make iterating over strings a special case (e.g. by requiring an explicit call to an iterator, like for ucode in s.ucodes()), because the case is so common. Instead we may consider finding a way to exclude strings in some collection traversal idiom (for which I have no good proposal: the obvious one would be .items(), but that is already used with a different meaning), which would for instance raise an exception on strings because they don't match the idiom ("str object has no 'items' attribute").

Denis

From ncoghlan at gmail.com Fri Jan 3 12:41:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 3 Jan 2014 21:41:09 +1000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <52C68EB7.6090404@gmail.com> References: <52C68EB7.6090404@gmail.com> Message-ID: On 3 January 2014 20:19, spir wrote: > On 01/03/2014 04:54 AM, Alexander Heger wrote: >>> >>> By designing an API that doesn't require such overloading. >>> >>> On Thursday, January 2, 2014, Alexander Heger wrote: >>>> >>>> >>>>>> isinstance(x, Iterable) and not isinstance(x, str) >>>>> >>>>> >>>>> If you find yourself typing that a lot I think you have a bigger >>>>> problem >>>>> though. >>>> >>>> >>>> How do you replace this? >> >> >> for my applications this seemed the most natural way - have the method >> deal with what it is fed, which could be strings or any kind of >> collections or iterables of strings. But never would I want to >> disassemble strings into characters.
From the previous message I >> gather that I am not the only one with this application case. >> >> Generally, I find strings being iterables of characters as useful as >> if integers were iterables of bits. They should just be units. They >> already start out being not mutable. I think it would be a positive >> design change for Python 4 to make them units instead of being >> iterables. At least for me, there is much fewer applications where >> the latter is useful than where it requires extra code. Overall, it >> makes the language less clean that a string is an iterable; a special >> case we always have to code around. >> >> I know it will break a lot of existing code, but so did the string >> change from py2 to 3. (It would break very few of my codes, though.) > > > I agree there is an occasionnal need which I also met in real code: it was > parse result data, which can be a string (terminal patterns, that really > "eat" part of the source) or list (or otherwise "tre" iterable collection, > for composite or repetitive patterns). But the case is rare because it > requires coincidence of conditions: > * both string and collections may come as input > * both are valid, from the app's logics' point of view > * one want to iterate collections, but not strings > > On the other hand, I find you much too quickly dismiss real and very common > need to iterate strings (on the lowest units of code points), apparently on > the only base that in your own programming practice you don't need/want it. > > We should not make iterating strings a special case (eg by requiring > explicit call to an iterator like for ucode in s.ucodes() because the case > is so common. 
Instead we may consider finding a way to exclude strings in > some collection traversal idiom (for which I have good proposal: the obvious > one would .items(), but it's used for a different meaning), which would for > instance yield an exception on strings because they don't match the idiom > ("str object has no 'items' attribute").

The underlying problem is that strings have a dual nature: you can view them as either a sequence of code points (which is how Python models them), or else you can view them as an opaque chunk of text (which is often how you want to treat them in code that accepts either containers or atomic values and treats them differently).

This has some interesting implications for API design.

"def f(*args)" handles the constraint fairly well, as f("astring") is treated as a single value and f(*"string") is an unlikely mistake for anyone to make.

"def f(iterable)" has problems in many cases, since f("string") is treated as an iterable of code points, even if you'd prefer an immediate error.

"def f(iterable_or_atomic)" also has problems, since strings will use the "iterable" path, even if the atomic handling would be more appropriate.

Algorithms that recursively descend into containers also need to deal with the fact that doing so with strings causes an infinite loop (since iterating over a string produces length 1 strings).

This is a genuine problem, which is why the question of how to cleanly deal with these situations keeps coming up every couple of years, and the current state of the art answer is "grit your teeth and use isinstance(obj, str)" (or a configurable alternative).

However, I'm wondering if it might be reasonable to add a new entry in collections.abc for 3.5:

    >>> from abc import ABC
    >>> from collections.abc import Iterable
    >>> class Atomic(ABC):
    ...     @classmethod
    ...     def __subclasshook__(cls, subclass):
    ...         if not issubclass(subclass, Iterable):
    ...             return True
    ...         return NotImplemented
    ...
    >>> Atomic.register(str)
    <class 'str'>
    >>> Atomic.register(bytes)
    <class 'bytes'>
    >>> Atomic.register(bytearray)
    <class 'bytearray'>
    >>> isinstance(1, Atomic)
    True
    >>> isinstance(1.0, Atomic)
    True
    >>> isinstance(1j, Atomic)
    True
    >>> isinstance("Hello", Atomic)
    True
    >>> isinstance(b"Hello", Atomic)
    True
    >>> isinstance((), Atomic)
    False
    >>> isinstance([], Atomic)
    False
    >>> isinstance({}, Atomic)
    False

Any type which wasn't iterable would automatically be considered atomic, while some types which *are* iterable could *also* be registered as atomic (with str, bytes and bytearray being the obvious candidates, as shown above).

Armed with such an ABC, you could then write an "iter_non_atomic" helper function as:

    def iter_non_atomic(iterable):
        if isinstance(iterable, Atomic):
            raise TypeError("{!r} is considered atomic".format(iterable.__class__.__name__))
        return iter(iterable)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From masklinn at masklinn.net Fri Jan 3 13:12:41 2014 From: masklinn at masklinn.net (Masklinn) Date: Fri, 3 Jan 2014 13:12:41 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> Message-ID: <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> On 2014-01-03, at 12:41 , Nick Coghlan wrote: > "def f(iterable_or_atomic)" also has problems, since strings will use > the "iterable" path, even if the atomic handling would be more > appropriate. > > Algorithms that recursively descend into containers also need to deal > with the fact that doing so with strings causes an infinite loop > (since iterating over a string produces length 1 strings). > > This is a genuine problem, which is why the question of how to cleanly > deal with these situations keeps coming up every couple of years, and > the current state of the art answer is "grit your teeth and use > isinstance(obj, str)" (or a configurable alternative).
> > However, I'm wondering if it might be reasonable to add a new entry in > collections.abc for 3.5: > >>>> from abc import ABC >>>> from collections.abc import Iterable >>>> class Atomic(ABC): > ... @classmethod > ... def __subclasshook__(cls, subclass): > ... if not issubclass(subclass, Iterable): > ... return True > ... return NotImplemented > ... I've used some sort of ad-hoc version of it enough that I think it's a good idea, although I'd suggest "scalar": "atomic" also exists (with very different semantics) in concurrency contexts, whereas I believe scalar always means single-value (non-compound) data type. >>>> Atomic.register(str) > >>>> Atomic.register(bytes) > >>>> Atomic.register(bytearray) > >>>> isinstance(1, Atomic) > True >>>> isinstance(1.0, Atomic) > True >>>> isinstance(1j, Atomic) > True >>>> isinstance("Hello", Atomic) > True >>>> isinstance(b"Hello", Atomic) > True >>>> isinstance((), Atomic) > False >>>> isinstance([], Atomic) > False >>>> isinstance({}, Atomic) > False From ncoghlan at gmail.com Fri Jan 3 13:30:31 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 3 Jan 2014 22:30:31 +1000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: On 3 January 2014 22:12, Masklinn wrote: > On 2014-01-03, at 12:41 , Nick Coghlan wrote: >> "def f(iterable_or_atomic)" also has problems, since strings will use >> the "iterable" path, even if the atomic handling would be more >> appropriate. >> >> Algorithms that recursively descend into containers also need to deal >> with the fact that doing so with strings causes an infinite loop >> (since iterating over a string produces length 1 strings).
>> >> This is a genuine problem, which is why the question of how to cleanly >> deal with these situations keeps coming up every couple of years, and >> the current state of the art answer is "grit your teeth and use >> isinstance(obj, str)" (or a configurable alternative). >> >> However, I'm wondering if it might be reasonable to add a new entry in >> collections.abc for 3.5: >> >>>>> from abc import ABC >>>>> from collections.abc import Iterable >>>>> class Atomic(ABC): >> ... @classmethod >> ... def __subclasshook__(cls, subclass): >> ... if not issubclass(subclass, Iterable): >> ... return True >> ... return NotImplemented >> ... > > I've used some sort of ad-hoc version of it enough that I think it's > a good idea, although I'd suggest "scalar": "atomic" also > exists (with very different semantics) in concurrency contexts, whereas > I believe scalar always means single-value (non-compound) data type. Yeah, that makes sense. I believe the NumPy folks run into a somewhat similar issue with the subtle distinction between treating scalars as scalars and treating them as zero-dimensional arrays. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From denis.spir at gmail.com Fri Jan 3 15:17:44 2014 From: denis.spir at gmail.com (spir) Date: Fri, 03 Jan 2014 15:17:44 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: <52C6C688.6030104@gmail.com> On 01/03/2014 01:12 PM, Masklinn wrote: > I've used some sort of ad-hoc version of it enough that I think it's > a good idea, although I'd suggest "scalar": "atomic" also > exists (with very different semantics) in concurrency contexts, whereas > I believe scalar always means single-value (non-compound) data type.
I used to use, for non highly educated folks, "element" or "elementary" (considering "scalar" too rare a term, and "atomic" potentially misleading). Denis From joshua at landau.ws Fri Jan 3 15:17:19 2014 From: joshua at landau.ws (Joshua Landau) Date: Fri, 3 Jan 2014 14:17:19 +0000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: On 3 January 2014 12:12, Masklinn wrote: > On 2014-01-03, at 12:41 , Nick Coghlan wrote: > I've used some sort of ad-hoc version of it enough that I think it's > a good idea, although I'd suggest "scalar": "atomic" also > exists (with very different semantics) in concurrency contexts, whereas > I believe scalar always means single-value (non-compound) data type. OTOH, to many non-mathematical people I hardly expect "is this scalar" to feel nearly as meaningful a question as "is this atomic". To bike-shed, how about "unitary". Nevertheless, I like the idea and the problem is a real one. From denis.spir at gmail.com Fri Jan 3 15:21:31 2014 From: denis.spir at gmail.com (spir) Date: Fri, 03 Jan 2014 15:21:31 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> Message-ID: <52C6C76B.8050709@gmail.com> On 01/03/2014 12:41 PM, Nick Coghlan wrote: > The underlying problem is that strings have a dual nature: you can > view them as either a sequence of code points (which is how Python > models them), or else you can view them as an opaque chunk of text > (which is often how you want to treat them in code that accepts either > containers or atomic values and treats them differently). > > This has some interesting implications for API design.
> > "def f(*args)" handles the constraint fairly well, as f("astring") is > treated as a single value and f(*"string") is an unlikely mistake for > anyone to make. > > "def f(iterable)" has problems in many cases, since f("string") is > treated as an iterable of code points, even if you'd prefer an > immediate error. > > "def f(iterable_or_atomic)" also has problems, since strings will use > the "iterable" path, even if the atomic handling would be more > appropriate. > > Algorithms that recursively descend into containers also need to deal > with the fact that doing so with strings causes an infinite loop > (since iterating over a string produces length 1 strings). > > This is a genuine problem, which is why the question of how to cleanly > deal with these situations keeps coming up every couple of years, and > the current state of the art answer is "grit your teeth and use > isinstance(obj, str)" (or a configurable alternative). > > However, I'm wondering if it might be reasonable to add a new entry in > collections.abc for 3.5: > > >>> from abc import ABC > >>> from collections.abc import Iterable > >>> class Atomic(ABC): > ... @classmethod > ... def __subclasshook__(cls, subclass): > ... if not issubclass(subclass, Iterable): > ... return True > ... return NotImplemented > ... > >>> Atomic.register(str) > > >>> Atomic.register(bytes) > > >>> Atomic.register(bytearray) > > >>> isinstance(1, Atomic) > True > >>> isinstance(1.0, Atomic) > True > >>> isinstance(1j, Atomic) > True > >>> isinstance("Hello", Atomic) > True > >>> isinstance(b"Hello", Atomic) > True > >>> isinstance((), Atomic) > False > >>> isinstance([], Atomic) > False > >>> isinstance({}, Atomic) > False > > Any type which wasn't iterable would automatically be considered > atomic, while some types which *are* iterable could *also* be > registered as atomic (with str, bytes and bytearray being the obvious > candidates, as shown above).
> > Armed with such an ABC, you could then write an "iter_non_atomic" > helper function as: > > def iter_non_atomic(iterable): > if isinstance(iterable, Atomic): > raise TypeError("{!r} is considered > atomic".format(iterable.__class__.__name__)) > return iter(iterable) I like this solution. But I would live with checking for type (usually str). The point is that, while not that uncommon, when the issue arises one has to deal with it at one or at most a few places in code (typically at the start of one or a few methods of a given type). It is not as if we had to carry an unneeded overload about everywhere. Denis From ncoghlan at gmail.com Fri Jan 3 15:39:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 4 Jan 2014 00:39:15 +1000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <52C6C76B.8050709@gmail.com> References: <52C68EB7.6090404@gmail.com> <52C6C76B.8050709@gmail.com> Message-ID: On 4 January 2014 00:21, spir wrote: > On 01/03/2014 12:41 PM, Nick Coghlan wrote: >> Armed with such an ABC, you could then write an "iter_non_atomic" >> helper function as: >> >> def iter_non_atomic(iterable): >> if isinstance(iterable, Atomic): >> raise TypeError("{!r} is considered >> atomic".format(iterable.__class__.__name__)) >> return iter(iterable) > > > I like this solution. But I would live with checking for type (usually str). The ducktyping variant I've also used on occasion is "hasattr(obj, 'encode')" rather than an instance check against a concrete type (it also has the benefit of picking up both str and unicode in Python 2 when writing 2/3 compatible code that can't rely on basestring, as well as UserString instances). > The point is that, while not that uncommon, when the issue arises one has to > deal with it at one or at most a few places in code (typically at the start of > one or a few methods of a given type). It is not as if we had to carry an > unneeded overload about everywhere.
Right, I see it as very similar to the "is that a sequence or a mapping?" question that was one of the key motivations for adding the ABC machinery in the first place. For that case, people historically used a check like "hasattr(obj, 'keys')" (and I think we still do that in a couple of places). Here, the distinction is between true container types like sets, dicts and lists, and more structured iterables like strings, where the whole is substantially more than the sum of its parts. Actually, that would be another way of carving out the distinction - rather than trying to cover *all* Atomic types, just have an AtomicIterable ABC that indicated any structure where applying operations like "flatten" doesn't make sense. In addition to str, bytes and bytearray, memoryview and namedtuple instances would also be appropriate candidates. The Iterable suffix would indicate directly that this wasn't related to concurrency. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Jan 3 16:54:17 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 04 Jan 2014 00:54:17 +0900 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: <87mwjdjc2e.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > I've used some sort of ad-hoc version of it enough that I think it's > a good idea, although I'd suggest 'scalar': 'atomic' also > exists (with very different semantics) in concurrency contexts, whereas > I believe scalar always means single-value (non-compound) data type. Sure, but if you're a Unicode geek "scalar" essentially means "character", so a string ain't that! Seriously, all the good words have been taken two or three times already in some other field.
Pick one and don't worry about the overloading -- learning to spell English is *much* harder. From denis.spir at gmail.com Fri Jan 3 17:31:22 2014 From: denis.spir at gmail.com (spir) Date: Fri, 03 Jan 2014 17:31:22 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <87mwjdjc2e.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <87mwjdjc2e.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52C6E5DA.5070609@gmail.com> On 01/03/2014 04:54 PM, Stephen J. Turnbull wrote: > Masklinn writes: > > > I've used some sort of ad-hoc version of it enough that I think it's > > a good idea, although I'd suggest 'scalar': 'atomic' also > > exists (with very different semantics) in concurrency contexts, whereas > > I believe scalar always means single-value (non-compound) data type. > > Sure, but if you're a Unicode geek "scalar" essentially means > "character", so a string ain't that! Unfortunately in unicode slang "character" does not mean character ;-) (but, say, whatever a code point happens to represent) > Seriously, all the good words have been taken two or three times > already in some other field. Pick one and don't worry about the > overloading -- learning to spell English is *much* harder. Thankfully no one needs spelling english corectly to program --except for keywords...
Denis From denis.spir at gmail.com Fri Jan 3 17:39:15 2014 From: denis.spir at gmail.com (spir) Date: Fri, 03 Jan 2014 17:39:15 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> <52C6C76B.8050709@gmail.com> Message-ID: <52C6E7B3.2060003@gmail.com> On 01/03/2014 03:39 PM, Nick Coghlan wrote: > Here, the distinction is between true container types like sets, > dicts and lists, and more structured iterables like strings, where the > whole is substantially more than the sum of its parts. That's it: the unique property of strings is that composing & combining are the same operation, while for true containers they are distinct: when combining sets (union), one gets a set at the same complexity level, whatever the items are, while when composing sets one gets a set of sets. > Actually, that would be another way of carving out the distinction - > rather than trying to cover *all* Atomic types, just have an > AtomicIterable ABC that indicated any structure where applying > operations like "flatten" doesn't make sense. In addition to str, > bytes and bytearray, memoryview and namedtuple instances would also be > appropriate candidates. Yes, maybe it's more practical; but an ABC type common to strings (and the like) and atomic types also makes sense. Denis PS: I had another common use case at times, with trees whose leaves may be strings, or not (esp for their str and repr methods).
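What follows is a self-contained, runnable version of the Atomic ABC and iter_non_atomic() helper from Nick's proposal in this thread. It is a sketch, not a stdlib API. It assumes Python 3.4+ (for abc.ABC) and closes the parenthesis missing from the TypeError call in the quoted version. It also restricts the subclass hook to Atomic itself, a common ABC idiom, and adds Nick's duck-typed check under the hypothetical name looks_like_text().

```python
from abc import ABC
from collections.abc import Iterable


class Atomic(ABC):
    """Proposed ABC: anything non-iterable is atomic automatically;
    selected iterable types can be registered as atomic as well."""

    @classmethod
    def __subclasshook__(cls, subclass):
        # Apply the hook to Atomic itself only, not to its subclasses.
        if cls is Atomic and not issubclass(subclass, Iterable):
            return True
        return NotImplemented


# Iterable types that are usually meant as single values.
for _type in (str, bytes, bytearray):
    Atomic.register(_type)


def iter_non_atomic(obj):
    """Like iter(), but raise TypeError for atomic values."""
    if isinstance(obj, Atomic):
        raise TypeError(
            "{!r} is considered atomic".format(type(obj).__name__))
    return iter(obj)


def looks_like_text(obj):
    """Hypothetical name for the duck-typed check: str and
    collections.UserString have .encode(), bytes does not."""
    return hasattr(obj, "encode")
```

With these definitions, isinstance(1, Atomic) and isinstance("Hello", Atomic) are True and isinstance([], Atomic) is False, as in the session Nick posted. iter_non_atomic("abc") raises TypeError instead of yielding characters.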
From abarnert at yahoo.com Fri Jan 3 18:27:21 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 3 Jan 2014 09:27:21 -0800 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> <52C6C76B.8050709@gmail.com> Message-ID: <7CE76D67-E43B-40A7-9A60-FFC517A34353@yahoo.com> On Jan 3, 2014, at 6:39, Nick Coghlan wrote: > The Iterable suffix would indicate directly that this wasn't related > to concurrency. I don't know; something whose iter was guaranteed to return an iterator that I could next without synchronizing could be pretty handy. ;) More seriously, I think a strength of your original version was having a single abstract type for both non-iterables and things that are iterable but you sometimes don't want to treat that way. A flatten function that uses "not isinstance(x, Iterable) or isinstance(x, AtomicIterable)" is less obvious than one that just uses "isinstance(x, Atomic)", and will be a source of 10x as many stupid "oops I used and instead of or" type bugs. If there really is no acceptable name for the easier concept, the tradeoff could be worth it anyway, but I think it's worth trying harder for one. One last question to bring up: Is there a reasonable/common use case where you do want to flatten multi-char strings to single-char strings, but then want to treat single-char strings as atoms? I can certainly imagine toy cases like that, but it could easily be so rarely useful that it's ok to leave that clumsy to write.
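The sketch below makes the trade-off Andrew describes concrete. It is a recursive flatten() built on Nick's narrower AtomicIterable variant, which is a hypothetical ABC and not part of collections.abc. Note the two-part leaf test. Without the AtomicIterable check, flatten("ab") would never bottom out (in practice it hits RecursionError), because each one-character string iterates to itself.

```python
from abc import ABC
from collections.abc import Iterable


class AtomicIterable(ABC):
    """Hypothetical ABC: iterables that flatten() should treat as leaves."""


# Candidates named in the thread (namedtuple is left out: there is no
# single namedtuple type to register).
for _type in (str, bytes, bytearray, memoryview):
    AtomicIterable.register(_type)


def flatten(obj):
    """Recursively yield the leaves of arbitrarily nested iterables."""
    # A leaf is anything not iterable at all, OR an AtomicIterable.
    if not isinstance(obj, Iterable) or isinstance(obj, AtomicIterable):
        yield obj
        return
    for item in obj:
        yield from flatten(item)
```

For example, list(flatten([1, [2, "ab", [b"cd", (3,)]]])) keeps "ab" and b"cd" whole while unpacking the lists and the tuple.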
From bruce at leapyear.org Fri Jan 3 19:11:59 2014 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 3 Jan 2014 10:11:59 -0800 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: On Fri, Jan 3, 2014 at 6:17 AM, Joshua Landau wrote: > OTOH, to many non-mathematical people I hardly expect "is this scalar" > to feel nearly as meaningful a question as "is this atomic". > > To bike-shed, how about "unitary". > "atomic" has the wrong meaning since it says it doesn't have any component parts. Scalar has the right meaning. As to the idea of making strings not iterable, that would break my code. I write a lot of code to manipulate words (to create puzzles) and iterating over strings is fundamental. In fact, I'd like to have strings as results of iteration operations on strings: >>> sorted('string') 'ginrst' >>> list(itertools.permutations('bar')) ['bar', 'bra', 'abr', 'arb', 'rba', 'rab'] instead I have to write >>> ''.join(sorted('string')) >>> [''.join(s) for s in itertools.permutations('bar')] This would probably break less code than making strings non-iterable, but realize that there's approximately 0% chance this would ever change and there's no easy way to cover every iteration operation. And it would confuse people if sometimes: (x.upper() for x in s) returned an iterator and sometimes it returned a string. --- Bruce My guest puzzle for Puzzles Live: http://www.puzzazz.com/puzzles-live/10 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at 2sn.net Sat Jan 4 05:08:19 2014 From: python at 2sn.net (Alexander Heger) Date: Sat, 04 Jan 2014 15:08:19 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> Message-ID: <52C78933.3000208@2sn.net> Dear Nick, yes, defining an ABC for this case would be an excellent solution. Thanks. -Alexander > However, I'm wondering if it might be reasonable to add a new entry in > collections.abc for 3.5: > >>>> from abc import ABC >>>> from collections.abc import Iterable >>>> class Atomic(ABC): > ... @classmethod > ... def __subclasshook__(cls, subclass): > ... if not issubclass(subclass, Iterable): > ... return True > ... return NotImplemented > ... >>>> Atomic.register(str) > >>>> Atomic.register(bytes) > >>>> Atomic.register(bytearray) > >>>> isinstance(1, Atomic) > True >>>> isinstance(1.0, Atomic) > True >>>> isinstance(1j, Atomic) > True >>>> isinstance("Hello", Atomic) > True >>>> isinstance(b"Hello", Atomic) > True >>>> isinstance((), Atomic) > False >>>> isinstance([], Atomic) > False >>>> isinstance({}, Atomic) > False > > Any type which wasn't iterable would automatically be considered > atomic, while some types which *are* iterable could *also* be > registered as atomic (with str, bytes and bytearray being the obvious > candidates, as shown above). > > Armed with such an ABC, you could then write an "iter_non_atomic" > helper function as: > > def iter_non_atomic(iterable): > if isinstance(iterable, Atomic): > raise TypeError("{!r} is considered > atomic".format(iterable.__class__.__name__)) > return iter(iterable) > > Cheers, > Nick.
> From python at 2sn.net Sat Jan 4 05:23:59 2014 From: python at 2sn.net (Alexander Heger) Date: Sat, 04 Jan 2014 15:23:59 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: Message-ID: <52C78CDF.2030505@2sn.net> > On Fri, Jan 3, 2014 at 2:54 PM, Alexander Heger wrote: >> Generally, I find strings being iterables of characters as useful as >> if integers were iterables of bits. They should just be units. > > What this would mean is that any time you want to iterate over the > characters, you'd have to iterate over string.split('') instead. So > the question is, is that common enough to be a problem? you could still have had str.iter() > The other point that comes to mind is that iteration and indexing are > closely related. I think most people would agree that "abcde"[1] > should be 'b' (granted, there's room for debate as to whether that > should be a one-character string or an integer with the Unicode > codepoint, but either way); it's possible to iterate over anything by > indexing it with 0, then 1, then 2, etc, until it raises IndexError. > For a string to not be iterable, that identity would have to be > broken. OK, I admit that not being able to iterate over something that can be indexed may be confusing. Though indexing of strings is somewhat special in many languages. -Alexander From rosuav at gmail.com Sat Jan 4 06:32:04 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 4 Jan 2014 16:32:04 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <52C78CDF.2030505@2sn.net> References: <52C78CDF.2030505@2sn.net> Message-ID: On Sat, Jan 4, 2014 at 3:23 PM, Alexander Heger wrote: >> The other point that comes to mind is that iteration and indexing are >> closely related. 
I think most people would agree that "abcde"[1] >> should be 'b' (granted, there's room for debate as to whether that >> should be a one-character string or an integer with the Unicode >> codepoint, but either way); it's possible to iterate over anything by >> indexing it with 0, then 1, then 2, etc, until it raises IndexError. >> For a string to not be iterable, that identity would have to be >> broken. > > OK, I admit that not being able to iterate over something that can be > indexed may be confusing. Though indexing of strings is somewhat special in > many languages. I don't know that it's particularly special. In some languages, a string is simply an array of small integers (maybe bytes, maybe Unicode codepoints), so when you index into one, you get the integers. Python deems that the elements of a string are themselves strings, which is somewhat special I suppose, but only because the representation of a character is a short string. And of course, there are languages that treat strings as simple atomic scalars, no subscripting allowed at all - I don't think that's an advantage over either of the above. :) When you index a string, you get a character. Whatever the language uses to represent a character, that's what you get. I don't think this is particularly esoteric, but maybe that's just me. ChrisA From denis.spir at gmail.com Sat Jan 4 11:22:16 2014 From: denis.spir at gmail.com (spir) Date: Sat, 04 Jan 2014 11:22:16 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> Message-ID: <52C7E0D8.6000007@gmail.com> On 01/03/2014 07:11 PM, Bruce Leban wrote: > As to the idea of making strings not iterable, that would break my code. I > write a lot of code to manipulate words (to create puzzles) and iterating > over strings is fundamental. 
In fact, I'd like to have strings as results > of iteration operations on strings: > > >>> sorted('string') > 'ginrst' > >>> list(itertools.permutations('bar')) > ['bar', 'bra', 'abr', 'arb', 'rba', 'rab'] > > > instead I have to write > > >>> ''.join(sorted('string')) > >>> [''.join(s) for s in itertools.permutations('bar')] Maybe we just need a 'cat' or 'concat' [1] method for lists: sorted('string').cat() (s for s in itertools.permutations('bar')).cat() (Then, a hard choice: should cat crash when items are not strings, or automagically stringify its operands? I wish join would do the latter.) Denis [1] I have not understood yet why "concatenation", instead of just "catenation". Literally means chaining (things) together; but I'm still trying to figure out how one can chain things apart ;-) As if strings were called "withstrings" or "stringtogethers", more or less. Enlightening welcome. (Same for "concatenative languages"... of which one is called "cat"!) From ram.rachum at gmail.com Sat Jan 4 23:41:01 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Sat, 4 Jan 2014 14:41:01 -0800 (PST) Subject: [Python-ideas] `pathlib.Path.write` and `pathlib.Path.read` Message-ID: <904adce4-6534-47b4-bedf-112624e7331f@googlegroups.com> Hi, I'd really like to have methods `pathlib.Path.write` and `pathlib.Path.read`. Untested implementation: def read(self, binary=False): with self.open('br' if binary else 'r') as file: return file.read() def write(self, data, binary=False): with self.open('bw' if binary else 'w') as file: file.write(data) This will be super useful to me. Many file actions are one-liners like that, and avoiding putting the `with` clause in user code would be wonderful. What do you think? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ram.rachum at gmail.com Sat Jan 4 23:05:27 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Sat, 4 Jan 2014 14:05:27 -0800 (PST) Subject: [Python-ideas] Introduce constant: `pathlib.null_path` Message-ID: What do you think about introducing this constant in the `pathlib` module: null_path = pathlib.Path('\\Device\\Null') if os.name == 'nt' else pathlib.Path('/dev/null') Thanks, Ram. From steve at pearwood.info Sat Jan 4 23:59:04 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 5 Jan 2014 09:59:04 +1100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <52C7E0D8.6000007@gmail.com> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> Message-ID: <20140104225857.GZ29356@ando> On Sat, Jan 04, 2014 at 11:22:16AM +0100, spir wrote: > On 01/03/2014 07:11 PM, Bruce Leban wrote: > >As to the idea of making strings not iterable, that would break my code. I > >write a lot of code to manipulate words (to create puzzles) and iterating > >over strings is fundamental. In fact, I'd like to have strings as results > >of iteration operations on strings: > > > >>>> sorted('string') > >'ginrst' > >>>> list(itertools.permutations('bar')) > >['bar', 'bra', 'abr', 'arb', 'rba', 'rab'] That would be nice to have. > >instead I have to write > > > >>>> ''.join(sorted('string')) > >>>> [''.join(s) for s in itertools.permutations('bar')] Which is a slight inconvenience, but not a great one. You can always save three characters by creating a helper function: join = ''.join > Maybe we just need a 'cat' or 'concat' [1] method for lists: > sorted('string').cat() > (s for s in itertools.permutations('bar')).cat() -1 Lists are general collections, giving them a method that depends on a specific kind of item is ugly.
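The workarounds Bruce and Steven describe can be collected into one runnable sketch. It shows the join = ''.join helper and explicit str() conversion in place of having join() stringify its operands. The variable names sorted_letters, anagrams and joined are only illustrative.

```python
import itertools

join = ''.join  # Steven's helper: one short name for the bound method

sorted_letters = join(sorted('string'))
anagrams = [join(p) for p in itertools.permutations('bar')]

mixed = [1, 'a', 2.5]
try:
    ''.join(mixed)  # non-strings are an error, not silently converted
except TypeError:
    pass
joined = ''.join(str(x) for x in mixed)  # convert explicitly instead

print(sorted_letters)  # ginrst
print(anagrams)        # ['bar', 'bra', 'abr', 'arb', 'rba', 'rab']
print(joined)          # 1a2.5
```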
Adding that same method to generator expressions is even worse. We don't have list.sum() for adding lists of numbers, we have a sum() function that takes a list. > (Then, a hard choice: should cat crash when items are not strings, or > automagically stringify its operands? I wish join would do the latter.) -1 Joining what you think is a list of strings but actually isn't is an error. The right thing to do in the face of an error is to raise an exception, not to silently hide the error. If you want to automatically convert arbitrary items into strings, it is better to explicitly do so: ''.join(str(x) for x in items) than to have it magically, and incorrectly, happen implicitly. > [1] I have not understood yet why "concatenation", instead of just > "catenation". Literally means chaining (things) together; but I'm still > trying to figure out how one can chain things apart ;-) Chain your left arm to the wall on your left, and your right arm to the wall on your right. Your arms are now chained apart. http://www.vlvstamps.com/man-chained-to-wall.html (Safe for work.) -- Steven From benjamin at python.org Sun Jan 5 00:25:40 2014 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 4 Jan 2014 23:25:40 +0000 (UTC) Subject: [Python-ideas] Introduce constant: `pathlib.null_path` References: Message-ID: Ram Rachum writes: > > What do you think about introducing this constant in the `pathlib` module: > null_path = pathlib.Path('\\Device\\Null') if os.name == 'nt' else pathlib.Path('/dev/null') What's wrong with pathlib.Path(os.devnull)? From victor.stinner at gmail.com Sun Jan 5 00:27:25 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 5 Jan 2014 00:27:25 +0100 Subject: [Python-ideas] Introduce constant: `pathlib.null_path` In-Reply-To: References: Message-ID: There is already os.path.devnull. Victor On 4 Jan 2014 23:48, "Ram Rachum" wrote: > What do you think about introducing this constant in the `pathlib` module: > > null_path = pathlib.Path('\\Device\\Null') if os.name == 'nt' else > pathlib.Path('/dev/null') > > > Thanks, > Ram. From ram at rachum.com Sun Jan 5 00:27:54 2014 From: ram at rachum.com (Ram Rachum) Date: Sun, 5 Jan 2014 01:27:54 +0200 Subject: [Python-ideas] Introduce constant: `pathlib.null_path` In-Reply-To: References: Message-ID: Cool, I didn't know about that. Thanks! On Sun, Jan 5, 2014 at 1:25 AM, Benjamin Peterson wrote: > Ram Rachum writes: > > > > > What do you think about introducing this constant in the `pathlib` > module: > > null_path = pathlib.Path('\\Device\\Null') if os.name == 'nt' else > pathlib.Path('/dev/null') > What's wrong with pathlib.Path(os.devnull)? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From masklinn at masklinn.net Sun Jan 5 00:30:07 2014 From: masklinn at masklinn.net (Masklinn) Date: Sun, 5 Jan 2014 00:30:07 +0100 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <20140104225857.GZ29356@ando> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> Message-ID: <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> On 2014-01-04, at 23:59 , Steven D'Aprano wrote: > On Sat, Jan 04, 2014 at 11:22:16AM +0100, spir wrote: >> On 01/03/2014 07:11 PM, Bruce Leban wrote: >>> As to the idea of making strings not iterable, that would break my code. I >>> write a lot of code to manipulate words (to create puzzles) and iterating >>> over strings is fundamental. In fact, I'd like to have strings as results >>> of iteration operations on strings: >>> >>>>>>>>> sorted('string') >>> 'ginrst' >>>>>>>>> list(itertools.permutations('bar')) >>> ['bar', 'bra', 'abr', 'arb', 'rba', 'rab'] > > That would be nice to have. More generally, it would be nice if a sequence type could specify how to derive a new instance of itself (from an iterable for instance). Constructors don't necessarily work (e.g. str's constructor). 
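One way to prototype the "derive a new instance from an iterable" idea without any language change is a plain helper function. The sketch below is hypothetical throughout. fromiter() and the __fromiter__ hook are names suggested later in this thread by Amber and Joshua, not existing Python APIs, and like() and Word are illustrative only. The only type it special-cases is str, whose constructor would return the repr of an iterable rather than join its items.

```python
def fromiter(cls, iterable):
    """Build an instance of ``cls`` from the items of ``iterable``.

    Hypothetical helper: prefers a ``__fromiter__`` classmethod if the
    class defines one, special-cases str, and otherwise calls the
    constructor (which already works for list, tuple, set, bytes, ...).
    """
    hook = getattr(cls, "__fromiter__", None)
    if hook is not None:
        return hook(iterable)
    if issubclass(cls, str):
        # str(iterable) would give a repr, not the joined characters.
        return cls("".join(iterable))
    return cls(iterable)


def like(template, iterable):
    """Rebuild ``iterable`` as the same type as ``template``."""
    return fromiter(type(template), iterable)


class Word(str):
    """Example class opting in through the hypothetical hook."""

    @classmethod
    def __fromiter__(cls, chars):
        return cls("".join(chars).lower())
```

With this, fromiter(str, sorted("string")) gives "ginrst" and like("bar", reversed("bar")) gives "rab", without the caller spelling out ''.join.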
Clojure has such a concept through the IPersistentCollection protocol: empty(coll) creates a new (empty) instance of coll (clojure's collections being immutable, it makes sense to create an empty collection then add stuff into it via into() or conj()) From amber.yust at gmail.com Sun Jan 5 01:08:13 2014 From: amber.yust at gmail.com (Amber Yust) Date: Sun, 05 Jan 2014 00:08:13 +0000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> Message-ID: <-9056898550324328423@gmail297201516> __fromiter__, anyone? On Sat Jan 04 2014 at 3:31:59 PM, Masklinn wrote: > On 2014-01-04, at 23:59 , Steven D'Aprano wrote: > > On Sat, Jan 04, 2014 at 11:22:16AM +0100, spir wrote: > >> On 01/03/2014 07:11 PM, Bruce Leban wrote: > >>> As to the idea of making strings not iterable, that would break my > code. I > >>> write a lot of code to manipulate words (to create puzzles) and > iterating > >>> over strings is fundamental. In fact, I'd like to have strings as > results > >>> of iteration operations on strings: > >>> > >>>>>>>>> sorted('string') > >>> 'ginrst' > >>>>>>>>> list(itertools.permutations('bar')) > >>> ['bar', 'bra', 'abr', 'arb', 'rba', 'rab'] > > > > That would be nice to have. > > More generally, it would be nice if a sequence type could specify how to > derive a new instance of itself (from an iterable for instance). > Constructors don't necessarily work (e.g. str's constructor). 
Clojure > has such a concept through the IPersistentCollection protocol: > empty(coll) creates a new (empty) instance of coll (clojure's > collections being immutable, it makes sense to create an empty > collection then add stuff into it via into() or conj()) From joshua.landau.ws at gmail.com Sun Jan 5 01:50:11 2014 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 5 Jan 2014 00:50:11 +0000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <-9056898550324328423@gmail297201516> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> <-9056898550324328423@gmail297201516> Message-ID: On Jan 5, 2014 12:08 AM, "Amber Yust" wrote: > > __fromiter__, anyone? I'm unconvinced that it should be a dunder method. Do you expect it to be used like fromiter(str, characters) ? However, +1 on the name, +0 on the idea. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From amber.yust at gmail.com Sun Jan 5 03:10:52 2014 From: amber.yust at gmail.com (Amber Yust) Date: Sun, 05 Jan 2014 02:10:52 +0000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> <-9056898550324328423@gmail297201516> Message-ID: <-1263298433535047096@gmail297201516> I'm thinking of it being analogous to the __getstate__ and __setstate__ dunders used by Pickle to allow customization of object creation. On Sat Jan 04 2014 at 4:50:11 PM, Joshua Landau wrote: > On Jan 5, 2014 12:08 AM, "Amber Yust" wrote: > > > > __fromiter__, anyone? > > I'm unconvinced that it should be a dunder method. Do you expect it to be > used like > > fromiter(str, characters) > > ? > > However, +1 on the name, +0 on the idea. From guido at python.org Sun Jan 5 08:24:03 2014 From: guido at python.org (Guido van Rossum) Date: Sat, 4 Jan 2014 21:24:03 -1000 Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple In-Reply-To: <-1263298433535047096@gmail297201516> References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> <-9056898550324328423@gmail297201516> <-1263298433535047096@gmail297201516> Message-ID: Is this thread still about strings vs. other iterables? First of all, the motivation for making strings iterable is that they are indexable and sliceable, which means they act like sequences. Historically, indexing and slicing predated the concept of iterators in Python.
Many other languages (starting with Pascal and C) also treat strings as arrays; while many of those have a separate character type, a few languages follow Python's example (or the other way around, I don't feel like tracking the influences exactly, or even finding examples -- I do know they exist). There are also languages where strings are *not* considered arrays (I think this is the case in Ruby and Perl). In such languages string manipulation is typically done using regular expressions or similar APIs, although there are usually also non-array APIs to get characters or substrings using indexes, but those APIs may not be O(1), e.g. for reasons having to do with decoding UTF-8 on the fly. All in all I am happy with Python's string-as-array semantics and I don't want to change this. While I would like to encourage API designs that don't require distinguishing between strings and other iterables (just like I prefer APIs that don't require distinguishing between sequences and mappings, or between callables and "plain values"), I realize that pragmatically people are going to want to write such code, and an ABC seems a good choice. However, if "Atomic" is still under consideration, I would strongly argue against that particular term. Given that a string is an array of characters, calling it an "atom" (== indivisible) seems particularly out of order. (And yes, I know that the use of the term in physics is also a misnomer -- let's not repeat that mistake. :-) Alas, I don't have a better name, but I'm sure the thesauriers will find something. We have until Python 3.5 is released to agree on a name.
:-) -- --Guido van Rossum (python.org/~guido) From aquavitae69 at gmail.com Sun Jan 5 12:09:38 2014 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 5 Jan 2014 13:09:38 +0200 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones. Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex. Also, the only times I can recall using a string as a sequence is in doctests (because it reads better than a list of characters) or in the interpreter when I'm trying something out. I'm not suggesting changing it - there's too much history for that, but I am interested to know if there is some fundamental reason that strings are sequences. If a new string object was being implemented now, would it be a sequence? On 3 Jan 2014 02:49, "Guido van Rossum" wrote: > By designing an API that doesn't require such overloading. > > On Thursday, January 2, 2014, Alexander Heger wrote: > >> >> isinstance(x, Iterable) and not isinstance(x, str) >> > >> > If you find yourself typing that a lot I think you have a bigger >> problem though. >> >> How do you replace this? 
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> --
> --Guido van Rossum (on iPad)

From ericsnowcurrently at gmail.com Sun Jan 5 18:49:26 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sun, 5 Jan 2014 10:49:26 -0700
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To:
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com>
Message-ID:

On Jan 5, 2014 4:10 AM, "David Townshend" wrote:
>
> Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

Sometimes I think it would be more clear if strings weren't sequences but had various attributes that exposed sequence "views", e.g. codepoints, etc. Making strings non-sequences isn't realistic at this point, but adding the sequence view attributes may still be nice. That said, at present it's not something I personally have any use case for.

There was an article floating around the web recently in which the deficiencies of unicode implementations were discussed, and I recall something there or in related discussions about use cases for having different views into a string. Wow, that was vague. :) The different views into unicode strings certainly come up from time to time on our lists.

-eric
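A minimal sketch, assuming Python 3, of the check quoted earlier in this thread, `isinstance(x, Iterable) and not isinstance(x, str)`. The check is needed because `str` itself registers as a `Sequence`, and therefore as an `Iterable`. The helper name `startswith_any` is hypothetical; since the built-in `str.startswith` accepts only a str or a tuple of str, any other iterable is converted to a tuple first.

```python
from collections.abc import Iterable, Sequence


def startswith_any(text, prefixes):
    """Hypothetical helper: test one prefix or any iterable of prefixes.

    A bare str is treated as a single prefix, not as an iterable of
    one-character prefixes; this is the
    isinstance(x, Iterable) and not isinstance(x, str) pattern.
    """
    if isinstance(prefixes, str):
        return text.startswith(prefixes)
    if isinstance(prefixes, Iterable):
        # The built-in only accepts a str or a tuple of str.
        return text.startswith(tuple(prefixes))
    raise TypeError("prefixes must be a str or an iterable of str")


print(isinstance("pyx", Sequence))                     # True
print(startswith_any("python-ideas", ["java", "py"]))  # True
print(startswith_any("python-ideas", "pyx"))           # False
print(startswith_any("python-ideas", list("pyx")))     # True ('p' matches)
```

The last two calls show the ambiguity under debate. Iterating the bare string "pyx" would turn it into the one-character prefixes 'p', 'y' and 'x', so the explicit `str` check decides which meaning wins.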

From solipsis at pitrou.net Sun Jan 5 18:53:16 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Jan 2014 18:53:16 +0100
Subject: [Python-ideas] `pathlib.Path.write` and `pathlib.Path.read`
References: <904adce4-6534-47b4-bedf-112624e7331f@googlegroups.com>
Message-ID: <20140105185316.7ac5084f@fsol>

On Sat, 4 Jan 2014 14:41:01 -0800 (PST)
Ram Rachum wrote:
>
> This will be super useful to me. Many files actions are one liners like
> that, and avoiding putting the `with` clause in user code would be
> wonderful.
>
> What do you think?

I agree something like that would be useful, I'm just not sure what the ideal API would be. For starters I think "binary" shouldn't be an argument: there should be separate methods for reading/writing text and binary contents. Also, you need to be able to pass encoding and other parameters for text files.

Regards

Antoine.

From abarnert at yahoo.com Sun Jan 5 18:48:31 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 5 Jan 2014 09:48:31 -0800
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To:
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com>
Message-ID: <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>

On Jan 5, 2014, at 3:09, David Townshend wrote:

> Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.

> Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex

People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters.
Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.

From amber.yust at gmail.com Sun Jan 5 19:17:13 2014
From: amber.yust at gmail.com (Amber Yust)
Date: Sun, 05 Jan 2014 18:17:13 +0000
Subject: [Python-ideas] strings as iterables - from str.startswith taking any iterator instead of just tuple
References: <52C68EB7.6090404@gmail.com> <73AA00E0-B717-4288-9572-60040206F06F@masklinn.net> <52C7E0D8.6000007@gmail.com> <20140104225857.GZ29356@ando> <14507B3E-0A95-41D8-93DC-FFFAE5EDFDAE@masklinn.net> <-9056898550324328423@gmail297201516> <-1263298433535047096@gmail297201516>
Message-ID: <818997001994832807@gmail297201516>

For ABC names, perhaps "IndependentSequence" or "UnaffiliatedSequence"?

On Sat Jan 04 2014 at 11:25:23 PM, Guido van Rossum wrote:

> Is this thread still about strings vs. other iterables?
>
> First of all, the motivation for making strings iterable is that they
> are indexable and sliceable, which means they act like sequences.
>
> Historically, indexing and slicing predated the concept of iterators
> in Python.
>
> Many other languages (starting with Pascal and C) also
> treat strings as arrays; while many of those have a separate character
> type, a few languages follow Python's example (or the other way
> around, I don't feel like tracking the influences exactly, or even
> finding examples -- I do know they exist). There are also languages
> where strings are *not* considered arrays (I think this is the case in
> Ruby and Perl). In such languages string manipulation is typically
> done using regular expressions or similar APIs, although there are usually
> also non-array APIs to get characters or substrings using indexes, but
> those APIs may not be O(1), e.g. for reasons having to do with
> decoding UTF-8 on the fly.
>
> All in all I am happy with Python's string-as-array semantics and I
> don't want to change this.
>
> While I would like to encourage API designs that don't require
> distinguishing between strings and other iterables (just like I prefer
> APIs that don't require distinguishing between sequences and mappings,
> or between callables and "plain values"), I realize that pragmatically
> people are going to want to write such code, and an ABC seems a good
> choice.
>
> However, if "Atomic" is still under consideration, I would strongly
> argue against that particular term. Given that a string is an array of
> characters, calling it an "atom" (== indivisible) seems particularly
> out of order. (And yes, I know that the use of the term in physics is
> also a misnomer -- let's not repeat that mistake. :-)
>
> Alas, I don't have a better name, but I'm sure the thesauriers will
> find something. We have until Python 3.5 is released to agree on a
> name.
> :-)
>
> --
> --Guido van Rossum (python.org/~guido)

From python at 2sn.net Sun Jan 5 20:02:29 2014
From: python at 2sn.net (Alexander Heger)
Date: Mon, 6 Jan 2014 06:02:29 +1100
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To: <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
Message-ID:

> People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.

Although the issue now seems settled, you could use explicit functions like str.iter(), str.codepoints(), str.substr(), ...

From ethan at stoneleaf.us Sun Jan 5 20:33:51 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 05 Jan 2014 11:33:51 -0800
Subject: [Python-ideas] a new bytestring type?
Message-ID: <52C9B39F.6060205@stoneleaf.us>

As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.

How many would be interested in having a 'bytestring'?

What do you see as the distinguishing characteristics?

--
~Ethan~

From amber.yust at gmail.com Sun Jan 5 20:58:04 2014
From: amber.yust at gmail.com (Amber Yust)
Date: Sun, 05 Jan 2014 19:58:04 +0000
Subject: [Python-ideas] a new bytestring type?
References: <52C9B39F.6060205@stoneleaf.us>
Message-ID: <-3280845380621406811@gmail297201516>

How would you see this bytestring type as differentiating itself from bytes? What use cases do you envision?

On Sun Jan 05 2014 at 11:56:46 AM, Ethan Furman wrote:

> As anyone who has worked with Python 3 and low-level protocols knows,
> Python 3 has no 'bytestring' type. It has
> immutable and mutable versions of arrays of integers, otherwise known as
> 'bytes' and 'bytearray'.
>
> How many would be interested in having a 'bytestring'?
>
> What do you see as the distinguishing characteristics?
>
> --
> ~Ethan~

From ethan at stoneleaf.us Sun Jan 5 21:04:20 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 05 Jan 2014 12:04:20 -0800
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <52C9B39F.6060205@stoneleaf.us>
References: <52C9B39F.6060205@stoneleaf.us>
Message-ID: <52C9BAC4.9070106@stoneleaf.us>

On 01/05/2014 11:33 AM, Ethan Furman wrote:
> As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type.
> It has immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
>
> How many would be interested in having a 'bytestring'?

+1

> What do you see as the distinguishing characteristics?

Indexing returns a bytestring of length 1, not an integer

`bytestring(7)` either fails, or returns 'bytestring('\x07')' not 'bytestring(0, 0, 0, 0, 0, 0, 0)'

--
~Ethan~

From lukasz at langa.pl Sun Jan 5 21:30:12 2014
From: lukasz at langa.pl (Łukasz Langa)
Date: Sun, 5 Jan 2014 12:30:12 -0800
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <52C9BAC4.9070106@stoneleaf.us>
References: <52C9B39F.6060205@stoneleaf.us> <52C9BAC4.9070106@stoneleaf.us>
Message-ID: <6043A12F-B1B6-45E3-A6AA-35D766955190@langa.pl>

On Jan 5, 2014, at 12:04 PM, Ethan Furman wrote:

> On 01/05/2014 11:33 AM, Ethan Furman wrote:
>> As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has
>> immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
>>
>> How many would be interested in having a 'bytestring'?
>
> +1

"I don't always +1 on python-ideas, but when I do, I do it on my own posts." ;)

--
Best regards,
Łukasz Langa

WWW: http://lukasz.langa.pl/
Twitter: @llanga
IRC: ambv on #python-dev

From ethan at stoneleaf.us Sun Jan 5 21:08:05 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 05 Jan 2014 12:08:05 -0800
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <-3280845380621406811@gmail297201516>
References: <52C9B39F.6060205@stoneleaf.us> <-3280845380621406811@gmail297201516>
Message-ID: <52C9BBA5.8080104@stoneleaf.us>

On 01/05/2014 11:58 AM, Amber Yust wrote:
>
> How would you see this bytestring type as differentiating itself from bytes? What use cases do you envision?

I put the questions there so others could fill in the blanks for themselves.

I have responded to the original question with two of the differentiating features (the two that bug me most, of course ;).

--
~Ethan~

From solipsis at pitrou.net Sun Jan 5 22:01:44 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 5 Jan 2014 22:01:44 +0100
Subject: [Python-ideas] a new bytestring type?
References: <52C9B39F.6060205@stoneleaf.us> <52C9BAC4.9070106@stoneleaf.us>
Message-ID: <20140105220144.67c0a613@fsol>

On Sun, 05 Jan 2014 12:04:20 -0800
Ethan Furman wrote:
> On 01/05/2014 11:33 AM, Ethan Furman wrote:
> > As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has
> > immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
> >
> > How many would be interested in having a 'bytestring'?
>
> +1
>
>
> > What do you see as the distinguishing characteristics?
>
> Indexing returns a bytestring of length 1, not an integer
>
> `bytestring(7)` either fails, or returns 'bytestring('\x07')' not 'bytestring(0, 0, 0, 0, 0, 0, 0)'

I agree with that, but it's much too late, and I'm -10 on adding another, similar but different, bytestring type.

Regards

Antoine.

From ethan at stoneleaf.us Sun Jan 5 21:51:33 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 05 Jan 2014 12:51:33 -0800
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <6043A12F-B1B6-45E3-A6AA-35D766955190@langa.pl>
References: <52C9B39F.6060205@stoneleaf.us> <52C9BAC4.9070106@stoneleaf.us> <6043A12F-B1B6-45E3-A6AA-35D766955190@langa.pl>
Message-ID: <52C9C5D5.9080508@stoneleaf.us>

On 01/05/2014 12:30 PM, Łukasz Langa wrote:
> On Jan 5, 2014, at 12:04 PM, Ethan Furman wrote:
>
>> On 01/05/2014 11:33 AM, Ethan Furman wrote:
>>> As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has
>>> immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
>>>
>>> How many would be interested in having a 'bytestring'?
>>
>> +1
>
> "I don't always +1 on python-ideas, but when I do, I do it on my own posts."

+1 QOTW !

From ncoghlan at gmail.com Sun Jan 5 23:57:24 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 6 Jan 2014 08:57:24 +1000
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <52C9B39F.6060205@stoneleaf.us>
References: <52C9B39F.6060205@stoneleaf.us>
Message-ID:

On 6 Jan 2014 03:56, "Ethan Furman" wrote:
>
> As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
>
> How many would be interested in having a 'bytestring'?
>
> What do you see as the distinguishing characteristics?

I actually expected someone to have experimented with an "encodedstr" type by now. This would be a type that behaved like the Python 2 str type, but had an encoding attribute. On encountering Unicode text strings, it would encode them appropriately.

However, people have generally instead followed the model of decoding to text and operating in that domain, since it avoids a lot of subtle issues (like accidentally embedding byte order marks when concatenating strings).

This is likely encouraged by the fact that str, bytes and bytearray don't currently implement type coercion correctly (which in turn is due to a long-standing bug in the way the abstract C API handles sequence types defined in C rather than Python), so an encodedstr type would need to inherit from str or bytes to get interoperability, and then wouldn't interoperate with the other one.

Cheers,
Nick.
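Ethan's two characteristics can be prototyped without a new builtin. The following is a minimal sketch, assuming a subclass of `bytes` is acceptable. That fits Nick's remark above that such a type would need to inherit from str or bytes to interoperate. The class name `ByteString` is hypothetical, and the sketch is an illustration, not a proposed implementation.

```python
class ByteString(bytes):
    """Hypothetical 'bytestring': indexing yields length-1 ByteStrings.

    Sketch only: it subclasses the builtin bytes, so instances still work
    anywhere bytes are accepted.
    """

    def __new__(cls, source=b"", *args):
        # Reject the bytes(n) form, which builds n zero bytes.
        if isinstance(source, int):
            raise TypeError("ByteString() does not accept an integer size")
        return super().__new__(cls, source, *args)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(result, int):
            # Plain bytes returns an int for a single index.
            return ByteString(bytes([result]))
        return ByteString(result)  # slices come back as bytes

    def __iter__(self):
        # bytes.__iter__ yields ints, so iterate via __getitem__ instead.
        for i in range(len(self)):
            yield self[i]


print(b"abc"[0], bytes(3))  # 97 b'\x00\x00\x00'
bs = ByteString(b"abc")
print(bs[0], bs[1:])        # b'a' b'bc'
```

The first print line shows plain `bytes` behaviour for contrast. `__iter__` is overridden as well, because `bytes` iteration does not go through `__getitem__` and would otherwise still yield integers.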

From rosuav at gmail.com Mon Jan 6 01:27:08 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 6 Jan 2014 11:27:08 +1100
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To: <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
Message-ID:

On Mon, Jan 6, 2014 at 4:48 AM, Andrew Barnert wrote:
> And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.)

You could simply "for ch in s.split('')". A number of languages define that to mean fracturing a string into one-character strings. Python currently raises ValueError, so it won't break existing code.

But yes, it's easier to be able to iterate over a string.

ChrisA

From rosuav at gmail.com Mon Jan 6 01:38:17 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 6 Jan 2014 11:38:17 +1100
Subject: [Python-ideas] `pathlib.Path.write` and `pathlib.Path.read`
In-Reply-To: <20140105185316.7ac5084f@fsol>
References: <904adce4-6534-47b4-bedf-112624e7331f@googlegroups.com> <20140105185316.7ac5084f@fsol>
Message-ID:

On Mon, Jan 6, 2014 at 4:53 AM, Antoine Pitrou wrote:
> For starters I think "binary" shouldn't be an
> argument: there should be separate methods for reading/writing text and
> binary contents.

For reading, yes. For writing, the type of the 'data' argument should say whether it's binary or text, without having to be told.

Not sure it belongs in pathlib, though.
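The API shape Antoine and Chris describe can be sketched in a few functions: separate readers for text and binary content, an explicit encoding for text, and a writer that picks text or binary mode from the type of its data argument. The names `read_text`, `read_bytes` and `write_data` are hypothetical module-level helpers. For reference, pathlib itself later gained `Path.read_text()`, `Path.write_text()`, `Path.read_bytes()` and `Path.write_bytes()` in Python 3.5.

```python
import os
import tempfile


def read_text(path, encoding="utf-8", errors="strict"):
    """Hypothetical helper: read a whole text file, closing it promptly."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


def read_bytes(path):
    """Hypothetical helper: read a whole file as bytes."""
    with open(path, "rb") as f:
        return f.read()


def write_data(path, data, encoding="utf-8"):
    """Hypothetical helper: the type of data selects text or binary mode."""
    if isinstance(data, str):
        with open(path, "w", encoding=encoding) as f:
            f.write(data)
    elif isinstance(data, (bytes, bytearray)):
        with open(path, "wb") as f:
            f.write(data)
    else:
        raise TypeError("data must be str or bytes-like, not "
                        + type(data).__name__)


# The Q -> QU translation: the file is fully read and closed before it
# is reopened for writing, whatever the interpreter's GC strategy.
with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "example.txt")
    write_data(fn, "Quick quiz")
    write_data(fn, read_text(fn).replace("Q", "QU"))
    print(read_text(fn))  # QUuick quiz
```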

Here's the naive code to do a simple translation on a file:

    data = open(fn).read()
    open(fn,"w").write(data.replace('Q','QU'))

Doesn't use with, will probably work on CPython but risks trampling on itself in other interpreters. Needs a solution.

Will someone who's told "hey, there's a potential problem in that code" go looking in pathlib? I'm not sure about that. I'd be thinking about files and strings, but not about paths. It'd be great as a built-in:

    write_file(fn,read_file(fn).replace('Q','QU'))

or in some namespace that screams "Hey look, file I/O", but I can't imagine looking for it in pathlib. Now that 'file' isn't a builtin, would it be worth having a file module that has this sort of thing? Or would that cause too much confusion?

ChrisA

From dreamingforward at gmail.com Mon Jan 6 01:39:53 2014
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 5 Jan 2014 18:39:53 -0600
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <52C9B39F.6060205@stoneleaf.us>
References: <52C9B39F.6060205@stoneleaf.us>
Message-ID:

> As anyone who has worked with Python 3 and low-level protocols knows, Python
> 3 has no 'bytestring' type. It has immutable and mutable versions of arrays
> of integers, otherwise known as 'bytes' and 'bytearray'.

"arrays of integers"? You mean, unsigned short ints? There's an important difference. One references an abstraction, and one references a concrete machine type.

The other consideration is knowing what you mean by "string", if you mean something to be interpreted textually, then the convention is to use unsigned chars to document your intentions, which "technically" is the same (as far as memory layout is concerned). (I say "technically" because there is some space reserved for endian-ness which can change the bit ordering.)

> How many would be interested in having a 'bytestring'?
>
> What do you see as the distinguishing characteristics?

What it *should* have is a bytes-type, which is a raw, 8-bit type which may or may not be printable on the screen with quotation marks. Different subtypes, >>>class Text(bytes) can interpret those bytes as they want (as a text string for example, with or without formatting awareness for control codes). Otherwise File(bytes) can interpret those bytes as binary data, so as to write to the file system without any transformation of the codes (i.e. raw).

I'm afraid this reply may not be up to the standards of the list, but hopefully has some useful data that has gone without good understanding.

MarkJ

From abarnert at yahoo.com Mon Jan 6 01:38:03 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 5 Jan 2014 16:38:03 -0800
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To:
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
Message-ID: <948D47B0-407B-48F9-9806-84CEAF22F2A3@yahoo.com>

On Jan 5, 2014, at 11:02, Alexander Heger wrote:

>> People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
>
> Although the issue now seems settled, you could use explicit functions
> like str.iter(), str.codepoints(), str.substr(), ...

Sure, and we could add list.iter(), list.slice(), etc. and get rid of iterables, indexing and slicing, entirely. If we add separate map and similar methods to every iterable type, we can even get rid of iterators. If it's good enough for ObjC, why should Python try to be more readable or concise?

From dreamingforward at gmail.com Mon Jan 6 01:45:28 2014
From: dreamingforward at gmail.com (Mark Janssen)
Date: Sun, 5 Jan 2014 18:45:28 -0600
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <52C9C5D5.9080508@stoneleaf.us> References: <52C9C5D5.9080508@stoneleaf.us> Message-ID: <20140106012912.GA72493@cskk.homeip.net> On 05Jan2014 12:51, Ethan Furman wrote: > On 01/05/2014 12:30 PM, ?ukasz Langa wrote: > >On Jan 5, 2014, at 12:04 PM, Ethan Furman wrote: > >>On 01/05/2014 11:33 AM, Ethan Furman wrote: > >>>As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has > >>>immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'. > >>> > >>>How many would be interested in having a 'bytestring'? > >> > >>+1 > > > >"I don't always +1 on python-ideas, but when I do, I do it on my own posts." > > +1 QOTW ! +1 QOTW ... but doesn't your +1 falsify the quote you're +1ing? -- Cameron Simpson This person is currently undergoing electric shock therapy at Agnews Developmental Center in San Jose, California. All his opinions are static, please ignore him. Thank you, Nurse Ratched - the sig quote of Bob "Another beer, please" Christ From dreamingforward at gmail.com Mon Jan 6 03:00:27 2014 From: dreamingforward at gmail.com (Mark Janssen) Date: Sun, 5 Jan 2014 20:00:27 -0600 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: >> "arrays of integers"? You mean, unsigned short ints? There's an >> important difference. One references an abstraction, and one >> references a concrete machine type. >> >> The other consideration is knowing what you mean by "string", if you >> mean something to be interpreted textually, then the convention is to >> use unsigned chars to document your intentions, which "technically" is >> the same (as far as memory layout is concerned). (I say "technically" >> because there is some space reserved for endian-ness which can change >> the bit ordering.) > > One mistake I already wish to correct ... > Trying to be complete... 
Come to think of it, this issue (the relationship between bytes, text, and char/ints) may be the entire reason Python3 "uptake" hasn't happened. It gets back to the same old argument I've been trying to make about "models of computation". Python3 apparently did not respect the machine and went the way of the "dark side", hence scientific computing hasn't been as quick to convert to Python 3.

Specifically, the final issue with regard to bytes (and its consequent model of computation) is thus: 1) how they maintain representation on the file system (the "disk") vs. 2) how they are represented and managed in memory. This is the primary articulation point regarding how the *abstraction of computing* relates to its *implementation*. This also relates to the Turing Machine and its articulation with the underlying VonNeumann architecture (implementation).

Ned, I hope you're finally understanding this.

MarkJ

From ethan at stoneleaf.us Mon Jan 6 02:37:43 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 05 Jan 2014 17:37:43 -0800
Subject: [Python-ideas] a new bytestring type?
In-Reply-To: <20140106012912.GA72493@cskk.homeip.net>
References: <52C9C5D5.9080508@stoneleaf.us> <20140106012912.GA72493@cskk.homeip.net>
Message-ID: <52CA08E7.4010603@stoneleaf.us>

On 01/05/2014 05:29 PM, Cameron Simpson wrote:
> On 05Jan2014 12:51, Ethan Furman wrote:
>> On 01/05/2014 12:30 PM, Łukasz Langa wrote:
>>> On Jan 5, 2014, at 12:04 PM, Ethan Furman wrote:
>>>> On 01/05/2014 11:33 AM, Ethan Furman wrote:
>>>>> As anyone who has worked with Python 3 and low-level protocols knows, Python 3 has no 'bytestring' type. It has
>>>>> immutable and mutable versions of arrays of integers, otherwise known as 'bytes' and 'bytearray'.
>>>>>
>>>>> How many would be interested in having a 'bytestring'?
>>>>
>>>> +1
>>>
>>> "I don't always +1 on python-ideas, but when I do, I do it on my own posts."
>>
>> +1 QOTW !
>
> +1 QOTW
>
> ... but doesn't your +1 falsify the quote you're +1ing?

Hrmmm....
well, just in case: +2!

From ned at nedbatchelder.com Mon Jan 6 04:39:51 2014
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Sun, 05 Jan 2014 22:39:51 -0500
Subject: [Python-ideas] a new bytestring type?
In-Reply-To:
References: <52C9B39F.6060205@stoneleaf.us>
Message-ID: <52CA2587.8000200@nedbatchelder.com>

On 1/5/14 9:00 PM, Mark Janssen wrote:
>>> "arrays of integers"? You mean, unsigned short ints? There's an
>>> important difference. One references an abstraction, and one
>>> references a concrete machine type.
>>>
>>> The other consideration is knowing what you mean by "string", if you
>>> mean something to be interpreted textually, then the convention is to
>>> use unsigned chars to document your intentions, which "technically" is
>>> the same (as far as memory layout is concerned). (I say "technically"
>>> because there is some space reserved for endian-ness which can change
>>> the bit ordering.)
>> One mistake I already wish to correct ...
>> Trying to be complete...
> Come to think of it, this issue (the relationship between bytes, text,
> and char/ints) may be the entire reason Python3 "uptake" hasn't
> happened. It gets back to the same old argument I've been trying to
> make about "models of computation". Python3 apparently did not
> respect the machine and went the way of the "dark side", hence
> scientific computing hasn't been as quick to convert to Python 3.
>
> Specifically, the final issue with regard to bytes (and its
> consequent model of computation) is thus: 1) how they maintain
> representation on the file system (the "disk") vs. 2) how they are
> represented and managed in memory. This is the primary articulation
> point regarding how the *abstraction of computing* relates to its
> *implementation*. This also relates to the Turing Machine and its
> articulation with the underlying VonNeumann architecture
> (implementation).
>
> Ned, I hope you're finally understanding this.

Mark, I think you are confusing my posts in Python-List with this thread. I would rather you didn't address me: my interactions with you in the past have been unpleasant, especially where we've tried to get to the bottom of one of your typically obscure references to the theory of computation. You've mocked and ignored me when I've tried to treat your ideas with respect, so I'm not going to make that mistake again.

--Ned.

> MarkJ

From tjreedy at udel.edu Mon Jan 6 06:08:15 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 06 Jan 2014 00:08:15 -0500
Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple
In-Reply-To: <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com>
Message-ID:

On 1/5/2014 12:48 PM, Andrew Barnert wrote:
> On Jan 5, 2014, at 3:09, David Townshend
> wrote:
>
>> Reading this thread made me start to think about why a string is a
>> sequence,

Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

    class Symbol:
        def __init__(self, name):
            self._name = name  # optionally check that name is string
        def __eq__(self, other):
            if not isinstance(other, Symbol):
                return NotImplemented  # let Python try the other operand
            return self._name == other._name
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return 'Symbol({!r})'.format(self._name)
        __str__ = __repr__  # or define to taste

Now Symbols are hashable, equality-comparable, but not iterable.
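Terry's Symbol avoids iteration by not defining `__getitem__` at all. The converse is easy to check. The minimal sketch below uses a hypothetical class name, `Indexable`, and shows that defining only `__getitem__` makes an object iterable through Python's legacy sequence protocol. The `Iterable` ABC, which checks for `__iter__`, still does not recognize such an object. This is one reason an indexable string type cannot easily avoid being iterable.

```python
from collections.abc import Iterable


class Indexable:
    """Defines __getitem__ only: no __iter__ and no __len__."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # str indexing raises IndexError past the end, which is what
        # terminates the legacy iteration protocol.
        return self._text[index]


# iter() falls back to __getitem__(0), __getitem__(1), ... until IndexError.
print(list(Indexable("abc")))                  # ['a', 'b', 'c']
# The Iterable ABC only looks for __iter__, so it does not notice.
print(isinstance(Indexable("abc"), Iterable))  # False
```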

In other words, I believe the desire for a non-iterable 'string' is a desire for something that is not really a string, but is perhaps being represented as a string merely for convenience. Using duples as linked-list nodes (which I have done) because one does not bother to define a node class is similar. Tuple iteration is as meaningless in that context as string iteration is in a symbol context.

> You've seriously never indexed or sliced a string? Those are the two
> core operations in sequences, and they're obviously useful on
> strings.

And as already explained, indexable means iterable.

>> Every use case I can think of for iterating over a string either
>> involves first splitting the string, or would be better done with a
>> regex

Splitting involves forward iteration. Regex matching adds backtracking on top of forward iteration. Please tell me a *string* algorithm that does *not* involve character iteration somewhere.

> People have mentioned use cases for iterating strings in this thread.
> And it's easy to think of more. There are all kinds of algorithms
> that treat strings as sequences of characters. Sure, many of these
> functions are already methods on str or otherwise built into the
> stdlib, but that just means they're implemented by iterating the
> string storage in C with a loop around "*++s".

I was going to make the same point. Strings have the following methods: 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'.

Written in Python (as in classes and PyPy!), nearly all start with 'for c in s:' (or 'in reversed(s)').
The ones that do not, generally use len(s). Len(s) is calculated in str.__new__ with an internal iteration: 'for char added to string, increment len counter'. Comparing strings also involves iteration, and hence so does sorting lists of strings by comparison. > And if you want to > extend that set of builtins with similar functions, how else would > you do it but with a "for ch in s" loop? (Well, you could "for ch in > list(s)", but that's still treating strings as iterables.) For > example, many people are asked to write a rot13 function in one of > their first classes. How would you write that if strings weren't > iterables? There's no way a regex is going to help you here, unless > you wanted to do something like using re.sub('.') as a convoluted > and slow way of writing map. AFAIK, all the codecs iterate character by character. -- Terry Jan Reedy From stephen at xemacs.org Mon Jan 6 08:35:28 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 06 Jan 2014 16:35:28 +0900 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <52C9B39F.6060205@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> Message-ID: <87iotximv3.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > How many would be interested in having a 'bytestring'? -1. It's an attractive nuisance. > What do you see as the distinguishing characteristics? Its main attraction is that it allows people who in practice only ever deal with one non-Unicode encoding to ignore the fact that their data is in fact encoded, and that their applications are very likely not robust to data encoded differently. While I sympathize with their problem to some extent (especially people who are writing low-level web services), I don't think you'd ever again be able to trust a 3rd-party module in a web context without doing a thorough audit to ensure that all uses of 'bytestrings' are appropriate in themselves and appropriately guarded against leaking garbage into other contexts. Thus, "attractive nuisance."
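Andrew's rot13 challenge, quoted in Terry's reply above, is a good example of an algorithm that is most naturally written as a loop over characters. A minimal sketch, assuming only ASCII letters are rotated and everything else passes through unchanged (the function name `rot13` is just illustrative):

```python
import string

_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase


def rot13(s: str) -> str:
    """Rotate ASCII letters by 13 places; leave all other characters alone."""
    out = []
    for ch in s:  # relies on str being iterable, one character at a time
        if ch in _LOWER:
            out.append(_LOWER[(_LOWER.index(ch) + 13) % 26])
        elif ch in _UPPER:
            out.append(_UPPER[(_UPPER.index(ch) + 13) % 26])
        else:
            out.append(ch)
    return ''.join(out)
```

Python 3 also ships a str-to-str `rot_13` codec, so `codecs.encode(s, 'rot_13')` should give the same result; the loop version simply makes the per-character iteration explicit.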
From bruce at leapyear.org Mon Jan 6 08:06:10 2014 From: bruce at leapyear.org (Bruce Leban) Date: Sun, 5 Jan 2014 23:06:10 -0800 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: On Sun, Jan 5, 2014 at 9:48 AM, Andrew Barnert wrote: > > Reading this thread made me start to think about why a string is a > sequence, and I can't actually see any obvious reason, other than > historical ones. > > You've seriously never indexed or sliced a string? Those are the two core > operations in sequences, and they're obviously useful on strings. > I am doing most coding in two languages right now: Python and Javascript. I have never wished that Python had string.charAt(i) but I have often wished that Javascript had string[i]. When I've iterated over the characters in a string in Javascript, it has never occurred to me to write it using str.split(''). By irrelevant analogy, I have never used complex numbers in Python or Javascript and I can't see any obvious reason to support them. It just confuses people who inadvertently write cmath.sqrt instead of math.sqrt. For the few people that use complex numbers, they would be better served by a tuple of real and imaginary parts. As someone who doesn't use them, my opinion is clearly more important that that of those that use them. --- Bruce Learn how hackers think: http://j.mp/gruyere-security (Not serious about removing complex numbers from Python. If you didn't see the sarcasm, sorry.)
From jeanpierreda at gmail.com Mon Jan 6 09:09:41 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 6 Jan 2014 00:09:41 -0800 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy wrote: > On 1/5/2014 12:48 PM, Andrew Barnert wrote: >> >> On Jan 5, 2014, at 3:09, David Townshend >> wrote: >> >>> Reading this thread made me start to think about why a string is a >>> sequence, > > > Because a string is defined in math/language theory as a sequence of symbols > from an alphabet. If you want to invent or define something else, such as an > atomic symbol type, please use a different term. For example: And sequences in math / CS are functions from the natural numbers to elements of the sequence. Since isinstance(str, types.FunctionType) isn't True, it must mean that Python strings aren't strings. But seriously, Python functions aren't functions, the set of Python complex numbers is not the set of complex numbers, Python types aren't types, and Python addition is not addition; mathematical terminology in programming is evocative and not actually literally true. Arguments based on trying to literally copy math to the letter are flawed, probably irretrievably so. The important feature of strings in math is not that they are literally a sequence of characters, but that they correspond to a sequence of characters isomorphically. You can represent them any way you like, as long as you maintain that isomorphism, and the operations with the right names do the right thing, etc. As evidence, observe that not every programming language has its string type obey the equivalent of Python's sequence interface or math's notion of "sequence" per se (mapping naturals to elements).
For example, Haskell strings are linked lists; Rust strings are arrays behind the scenes but don't expose it within the str type; etc. It's not just strings, either. There are a multitude of ways of defining the natural numbers -- maybe a natural number is a set of a given structure (and which structure?), maybe it is a pair of integers where the second integer is 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a bitstring of arbitrary finite length. The usual construction in math is the first, but Python uses the last one. To say Python doesn't actually have natural numbers but does have strings is absurd, but it is what your logic points towards. If two things are equivalent, everything said about one can be said about the other, and math is about saying things about stuff, not about precise definitions of structure -- those are chosen for convenience. -- Devin From jeanpierreda at gmail.com Mon Jan 6 09:19:44 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 6 Jan 2014 00:19:44 -0800 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: On Mon, Jan 6, 2014 at 12:09 AM, Devin Jeanpierre wrote: > On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy wrote: >> On 1/5/2014 12:48 PM, Andrew Barnert wrote: >>> >>> On Jan 5, 2014, at 3:09, David Townshend >>> wrote: >>> >>>> Reading this thread made me start to think about why a string is a >>>> sequence, >> >> >> Because a string is defined in math/language theory as a sequence of symbols >> from an alphabet. If you want to invent or define something else, such as an >> atomic symbol type, please use a different term. For example: > > And sequences in math / CS are functions from the natural > numbers to elements of the sequence.
Since isinstance(str, > types.FunctionType) isn't True, it must mean that Python strings > aren't strings. > > But seriously, Python functions aren't functions, the set of Python > complex numbers is not the set of complex numbers, Python types aren't > types, and Python addition is not addition; mathematical terminology > in programming is evocative and > not actually literally true. Arguments based on trying to literally > copy math to the letter are flawed, probably irretrievably so. > > The important feature of strings in math is not that they are > literally a sequence of characters, but that they correspond to a > sequence of characters isomorphically. You can represent them any way > you like, as long as you maintain that isomorphism, and the operations > with the right names do the right thing, etc. As evidence, observe > that not every programming language has its string type obey the > equivalent of Python's sequence interface or math's notion of > "sequence" per se (mapping naturals to elements). For example, Haskell > strings are linked lists; Rust strings are arrays behind the scenes > but don't expose it within the str type; etc. > > It's not just strings, either, There are a multitude of ways of > defining the natural numbers -- maybe a natural number is a set of a > given structure (and which structure?), maybe it is a pair of integers > where the second integer is 1, maybe it is an infinite sequence of > rationals whose limit is a rational with denominator 1, maybe it is a > bitstring of arbitrary finite length. The usual construction in math > is the first, but Python uses the last one. To say Python doesn't > actually have natural numbers but does have strings, is absurd, but it > is what your logic points towards. If two things are equivalent, > everything said about one can be said about the other, and math is > about saying things about stuff, not about precise definitions of > structure -- those are chosen for convenience. 
Apologies, I wasn't thinking much and bungled that last argument (should've talked about integers instead of naturals; and even did, for half of it...). Fixed: [...] There are a multitude of ways of defining the integers -- maybe an integer is an equivalence class over the pairs of naturals, maybe it is a rational number with denominator 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a two's complement bitstring of arbitrary length. The usual construction in math is the first (or the second to last), but Python uses the last one. To say that Python doesn't actually have integers, but does have strings, is absurd, but [...] -- Devin From geertj at gmail.com Mon Jan 6 09:28:18 2014 From: geertj at gmail.com (Geert Jansen) Date: Mon, 6 Jan 2014 09:28:18 +0100 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <52C9B39F.6060205@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> Message-ID: On Sun, Jan 5, 2014 at 8:33 PM, Ethan Furman wrote: > As anyone who has worked with Python 3 and low-level protocols knows, Python > 3 has no 'bytestring' type. It has immutable and mutable versions of arrays > of integers, otherwise known as 'bytes' and 'bytearray'. > > How many would be interested in having a 'bytestring'? I'm not missing a new type, but I am missing the format method on the binary types. Regards, Geert From stephen at xemacs.org Mon Jan 6 11:57:18 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 06 Jan 2014 19:57:18 +0900 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> Geert Jansen writes: > I'm not missing a new type, but I am missing the format method on the > binary types. I'm curious about precisely what your use cases are, and just what formatting they need.
The problem that Python 2 code has over and over imposed on me is that the temptation to avoid the overhead of conversion to and then from unicode when processing text by just using str results in the equivalent of

    bs1 = returns_a_bytestring_encoded_in_utf8()
    bs2 = returns_a_bytestring_encoded_in_koi8()

    bs3 = b'{0} {1}'.format(bs1, bs2)
    # and lose big when something expects valid UTF-8 in bs3

In low-level code, the assignments to bs1, bs2, and bs3 are likely to be in three separate contexts, even three separate modules. I understand about consenting adults, but it's just too hard to enforce good practice here if you make it easy to pass around and operate on encoded bytestrings. I don't see how you avoid this pitfall, except by making it easier to pass around Unicode than encoded strings. And given that encoding and decoding are unavoidable, that means making use of bytestrings with text semantics painful. So to answer my question from my own point of view, for example, I would have no problem at all with

    b'{0:c}'.format(27) == b'\x1b'           # insert an ASCII ESC character

I would be leery of

    b'{0:s}'.format(b'\x1b[M') == b'\x1b[M'  # insert an ANSI control sequence

for the reason given above (for this use case, I would prefer

    blue_code = ord('M')  # Or b'M', doesn't matter!
    b'\x1b[{0:c}'.format(blue_code) == b'\x1b[M'

-- and forgive me for not looking up my ANSI color sequences, it's only luck if that's close) and I would consider

    b'{0:d}'.format(27) == b'27'             # insert the ASCII representation

to be an abomination since there's no reason to suppose that any given bytestring is encoded in an ASCII-compatible way, or bigendian for that matter. Ditto everything else that involves representing a number as a string of numeric characters.
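Stephen's bs1/bs2/bs3 pitfall can already be reproduced with plain `bytes` concatenation, which Python 3 allows without complaint. A small sketch, assuming one source produces UTF-8 and the other KOI8-R (the sample words and both helper names are made up for illustration):

```python
def build_mixed() -> bytes:
    bs1 = 'naïve'.encode('utf-8')    # b'na\xc3\xafve'
    bs2 = 'привет'.encode('koi8_r')  # Cyrillic, one byte per letter
    # Nothing stops this: both operands are just bytes, encodings are invisible.
    return bs1 + b' ' + bs2


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True
```

`is_valid_utf8(build_mixed())` is False: the KOI8-R half is not valid UTF-8, and the error only surfaces at whatever later point decodes the combined value, possibly far from where the two encodings were mixed.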
From steve at pearwood.info Mon Jan 6 11:57:33 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 6 Jan 2014 21:57:33 +1100 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: <20140106105732.GI29356@ando> On Sun, Jan 05, 2014 at 11:06:10PM -0800, Bruce Leban wrote: > As someone who doesn't use them [complex numbers], my > opinion is clearly more important that that of those that use them. :-) +1 QOTW -- Steven From abarnert at yahoo.com Mon Jan 6 12:16:05 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 6 Jan 2014 03:16:05 -0800 (PST) Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: <1389006965.30768.YahooMailNeo@web181006.mail.ne1.yahoo.com> From: Nick Coghlan Sent: Sunday, January 5, 2014 2:57 PM >I actually expected someone to have experimented with an "encodedstr" type by now. This would be a type that behaved like the Python 2 str type, but had an encoding attribute. On encountering Unicode text strings, it would encode then appropriately. I did something like this when I was first playing with 3.0, and I managed to find it. I tried two different implementations: a bytes subclass that fakes being a str as well as possible by decoding on the fly (or, in some cases, by encoding its arguments on the fly), and a str that fakes being a bytes as well as possible by doing the opposite. >However, people have generally instead followed the model of decoding to text and operating in that domain, since it avoids a lot of subtle issues (like accidentally embedding byte order marks when concatenating strings). It's also conceptually cleaner to work with text as text instead of as bytes that you can sort of use as text.
Also, one major reason people resist working with text (or upgrading to 3.x) is the perceived performance cost of dealing with Unicode. But if you want to do any kind of string processing on your text beyond searching for ASCII header names and the like, you pretty much have to do it as Unicode or it's wrong. So, you'd need something that allows you to do those ASCII header searches in 8-bit-land, but either doesn't allow full string processing, or automatically decodes and re-encodes on the fly (which obviously isn't going to be faster). >This is likely encouraged by the fact that str, bytes and bytearray don't currently implement type coercion correctly (which in turn is due to a long standing bug in the way the abstract C API handles sequence types defined in C rather than Python), so an encodedstr type would need to inherit from str or bytes to get interoperability, and then wouldn't interoperate with the other one. What's the bug? Anyway, I started off with the idea of inheriting from str or bytes in the first place because it seemed more natural than delegating, so I guess I didn't run into it. In general, it seems like you can interoperate just fine; an ebytes or estr (the names of my two classes) can, e.g., find, format, join, radd, whatever a bytes, str, ebytes, or estr without a problem, returning the appropriate types. The problem is interacting with functions that explicitly want the other type. This includes C functions that, e.g., take a "U" parameter, like TextIOWrapper.write, but it's just as much of a problem with Python functions that check isinstance(str) (either to reject bytes, or to switch and do different things on bytes and str). So, you have to write things like "f.write(str(s))" instead of "f.write(s)" all over the place. There's also a problem with functions that will take a str and do something useful, or take a bytes and do something stupid, like assume it must be in the appropriate encoding for the filesystem.
An ebytes just looks like a bytes to such functions, and therefore does the wrong thing. Again, you have to do things like "open(str(s))", and if you don't, instead of an error you get silent mojibake. (Which I guess is a good simulation of the Python 2 str type after all?) I couldn't find a way around the problem for ebytes. For estr, I fought for a while to make it support the buffer protocol (I wrote a Cython wrapper to let me delegate to another buffer from Python so I wouldn't have to write the whole thing in C), which fixes the problems with most C API functions, but doesn't help at all for Python functions. Meanwhile, there are some design issues that aren't entirely clear. The most obvious one is the performance issue I raised above. Should we cache the Unicode? Maybe even pre-compute it? I went with no caching just because it was the simplest implementation. Exactly which methods should act on bytes and which on characters? My initial cut was that searching-related methods like startswith, index, split, or replace should be bytes, while things like casefold and zfill Unicode. The division isn't entirely clear, but it's something to start with. (I also considered switching on the types of the other arguments: e.g., replace would be byte-based when given a bytes or an ebytes of the same encoding, but Unicode-based when given a str or an ebytes of a different encoding. But that seemed overly complicated.) Should indexing and iteration return numbers, as with bytes? It's obvious what encode should do (transcode to an ebytes in a different encoding), but what about decode? (I left bytes.decode alone, but I think that was a bad choice; that makes it an inverse to a change_encoding function that reinterprets the bytes as a different encoding, rather than an inverse to encode.) All that being said, just being able to use format or % with a mix of str and known-encoding-bytes is pretty handy.
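For readers who want to experiment with the bytes-subclass approach Andrew describes, here is a much smaller sketch. It is not his ebytes code; the class name `EncodedBytes` and its `transcode` method are hypothetical, and only one character-level method (`upper`) is overridden to show the pattern:

```python
class EncodedBytes(bytes):
    """Hypothetical bytes subclass that remembers the encoding of its text."""

    def __new__(cls, data, encoding):
        obj = super().__new__(cls, data)
        obj.encoding = encoding  # bytes subclasses get a __dict__, so this works
        return obj

    def __str__(self):
        # Decode on the fly every time (no caching, like Andrew's first cut).
        return self.decode(self.encoding)

    def _rewrap(self, text):
        return EncodedBytes(text.encode(self.encoding), self.encoding)

    def upper(self):
        # Character-level operation: round-trip through str, because
        # bytes.upper() only changes ASCII letters.
        return self._rewrap(str(self).upper())

    def transcode(self, encoding):
        # The "encode" Andrew mentions: same text, different encoding.
        return EncodedBytes(str(self).encode(encoding), encoding)
```

Inherited methods such as `find` and `startswith` stay byte-based, which matches Andrew's initial split between searching methods and character-level ones; every other str-like method would need a similar override.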
Anyway, in case anyone wants to take a look at it, I can't find the Cython wrapper, so I dropped estr, but cleaned up ebytes, made sure it works with 3.3 and 3.4, and uploaded it to https://github.com/abarnert/ebytes. Please forgive the clunky way I wrote all the forwarding methods. From geertj at gmail.com Mon Jan 6 12:19:08 2014 From: geertj at gmail.com (Geert Jansen) Date: Mon, 6 Jan 2014 12:19:08 +0100 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jan 6, 2014 at 11:57 AM, Stephen J. Turnbull wrote: > > I'm not missing a new type, but I am missing the format method on the > > binary types. > > I'm curious about precisely what your use cases are, and just what > formatting they need. One use case I came across was when creating chunks for the HTTP chunked encoding. Chunks contain an ASCII header, a raw/encoded chunk body, and an ASCII trailer. Using a bytes.format method, it would look like this:

    chunk = b'{0:X}\r\n{1}\r\n'.format(len(buf), buf)

This is what I am using now:

    chunk = bytearray()
    chunk.extend('{0:X}\r\n'.format(len(buf)).encode('ascii'))
    chunk.extend(buf)
    chunk.extend('\r\n'.encode('ascii'))

Regards, Geert > > The problem that Python 2 code has over and over imposed on me is that > the temptation to avoid the overhead of conversion to and then from > unicode when processing text by just using str results in the > equivalent of > > bs1 = returns_a_bytestring_encoded_in_utf8() > bs2 = returns_a_bytestring_encoded_in_koi8() > > bs3 = b'{0} {1}'.format(bs1, bs2) > # and lose big when something expects valid UTF-8 in bs3 > > In low-level code, the assignments to bs1, bs2, and bs3 are likely to > be in three separate contexts, even three separate modules.
I > understand about consenting adults, but it's just too hard to enforce > good practice here if you make it easy to pass around and operate on > encoded bytestrings. I don't see how you avoid this pitfall, except > by making it easier to pass around Unicode than encoded strings. And > given that encoding and decoding are unavoidable, that means making > use of bytestrings with text semantics painful. > > So to answer my question from my own point of view, for example, I > would have no problem at all with > > b'{0:c}'.format(27) == b'\x1b' # insert an ASCII ESC character > > I would be leery of > > b'{0:s}'.format(b'\x1b[M') == b'\x1b[M' # insert a ANSI control sequence > > for the reason given above (for this use case, I would prefer > > blue_code = ord('M') # Or b'M', doesn't matter! > b'\x1b[{0:c}'.format(blue_code) == b'\x1b[M' > > -- and forgive me for not looking up my ANSI color sequences, it's > only luck if that's close) and I would consider > > b'{0:d}'.format(27) == b'27' # insert the ASCII representation > > to be an abomination since there's no reason to suppose that any given > bytestring is encoded in an ASCII-compatible way, or bigendian for > that matter. Ditto everything else that involves representing a > number as a string of numeric characters. > From abarnert at yahoo.com Mon Jan 6 12:34:31 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 6 Jan 2014 03:34:31 -0800 (PST) Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: <1389008071.36369.YahooMailNeo@web181003.mail.ne1.yahoo.com> From: Geert Jansen Sent: Monday, January 6, 2014 12:28 AM > I'm not missing a new type, but I am missing the format method on the > binary types. I miss that too, but it's a bit tricky. '{}'.format(x) calls str(x). b'{}'.format(x) can't call bytes(x). At least not unless you want b'#{}'.format(6) to give you b'#\0\0\0\0\0\0'. 
Besides, most types don't provide a __bytes__, so even if it weren't for this problem, it wouldn't really be useful for anything except inserting bytes into other bytes. So, what _should_ it call? You could add encoding and errors keyword parameters (defaulting to 'ascii' and 'strict'), so b'{}'.format(x, encoding='utf-8') calls str(x).encode('utf-8'), which solves all of those problems, except that now it means you can't stick bytes objects into bytes formats, which is even worse. You could solve that by making objects that support the buffer protocol (like bytes) copy as-is instead of going through str and encode. That would mean you can't use bytes with a placeholder with any format flags, but maybe that's a good thing anyway (e.g., do you really want b'{:3}'.format(b'\xc3\xa9') to only pad to 2 characters instead of 3 because it's a 2-byte character?). That would be enough to let you cram pre-encoded/formatted bytes, and things like numbers, into bytes formats made up of ASCII headers, which I think is 90% of what people want here. Does that seem worth pursuing? From abarnert at yahoo.com Mon Jan 6 12:52:33 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 6 Jan 2014 03:52:33 -0800 (PST) Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> I didn't receive Stephen's email, so forgive me for replying through a reply. From: Geert Jansen Sent: Monday, January 6, 2014 3:19 AM > On Mon, Jan 6, 2014 at 11:57 AM, Stephen J. Turnbull > wrote: > >> > I'm not missing a new type, but I am missing the format method on > the >> > binary types. >> >> I'm curious about precisely what your use cases are, and just what >> formatting they need.
Besides Geert's chunked HTTP example, there are tons of internet protocols and file formats (including Python source code!) that have ASCII headers (that in some way define an encoding for the actual payload). So things like b'Content-Length: {}'.format(len(payload)) or even b'Content-Type: text/html; charset={}'.format(encoding) are useful. >> I would consider >> >> b'{0:d}'.format(27) == b'27' # insert the ASCII representation >> >> to be an abomination since there's no reason to suppose that any given >> bytestring is encoded in an ASCII-compatible way, or bigendian for >> that matter. Ditto everything else that involves representing a >> number as a string of numeric characters. Endianness isn't relevant here; b'{}'.format(32768) is b'32768', not b'\x80\x00' or b'\x00\x80'. That's what the d format means. As for assuming that it's ASCII-compatible, again, there are all kinds of protocols that work with any ASCII-compatible charset but don't work otherwise. Yeah, this can be a problem if you want to create an HTTP page or a Python source file in EBCDIC or UTF-16-LE, but even then, if the headers are interpreted as pure ASCII and then the payload is extracted and decoded separately, it still works. In fact, it works better than if people try to construct everything as text and then encode, giving you illegal/unreadable EBCDIC headers, and this is a common incorrect workaround that Python 2-familiar people do when forced to deal with Python 3. Obviously you could solve most of the same problems by formatting the headers as text, encoding them to ASCII, then concatenating the payload. And I'm not really worried about performance issues with that. But I am worried about convenience and readability: compare the desired and actual versions of Geert's code. As I said in my other email, I might be happy assuming ASCII-strict for everything that isn't a buffer, and copying bytes as-is for everything that is.
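Andrew's rule (format non-buffer values as text and encode them strictly as ASCII, copy buffer-protocol objects verbatim, reject format flags on bytes) can be prototyped on top of `string.Formatter`. This is a hypothetical sketch, not a proposed API: `bformat` and `BytesFormatter` are made-up names, and the latin-1 round trip is just a trick to carry raw bytes through the str-based formatting machinery:

```python
import string


class BytesFormatter(string.Formatter):
    """Hypothetical formatter: text fields are encoded, buffers are copied."""

    def __init__(self, encoding='ascii', errors='strict'):
        super().__init__()
        self.encoding = encoding
        self.errors = errors

    def format_field(self, value, format_spec):
        try:
            # Anything supporting the buffer protocol (bytes, bytearray,
            # memoryview, ...) is inserted verbatim.
            raw = memoryview(value).tobytes()
        except TypeError:
            # Everything else is formatted as text, then encoded.
            text = format(value, format_spec)
            raw = text.encode(self.encoding, self.errors)
        else:
            if format_spec:
                raise ValueError('format spec not supported for bytes-like values')
        # latin-1 maps bytes 0-255 one-to-one onto U+0000-U+00FF, so the raw
        # bytes survive the trip through the str-based formatter unchanged.
        return raw.decode('latin-1')


def bformat(template, *args, encoding='ascii', errors='strict', **kwargs):
    formatter = BytesFormatter(encoding, errors)
    text = formatter.vformat(template.decode('latin-1'), args, kwargs)
    return text.encode('latin-1')
```

With it, Geert's chunk becomes `bformat(b'{:X}\r\n{}\r\n', len(buf), buf)`, and passing a non-ASCII str without `encoding='utf-8'` fails loudly with `UnicodeEncodeError` instead of producing mojibake.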
That _might_ be more of an attractive nuisance than a useful feature, but it definitely is attractive, and I'm not sure it's a nuisance. From geertj at gmail.com Mon Jan 6 12:57:41 2014 From: geertj at gmail.com (Geert Jansen) Date: Mon, 6 Jan 2014 12:57:41 +0100 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <1389008071.36369.YahooMailNeo@web181003.mail.ne1.yahoo.com> References: <52C9B39F.6060205@stoneleaf.us> <1389008071.36369.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: On Mon, Jan 6, 2014 at 12:34 PM, Andrew Barnert wrote: > b'{}'.format(x) can't call bytes(x). At least not unless you want b'#{}'.format(6) to give you b'#\0\0\0\0\0\0'. Besides, most types don't provide a __bytes__, so even if it weren't for this problem, it wouldn't really be useful for anything except inserting bytes into other bytes. So, what _should_ it call? > > You could add encoding and errors keyword parameters (defaulting to 'ascii' and 'strict'), so b'{}'.format(x, encoding='utf-8') calls str(x).encode('utf-8'), which solves all of those problems, except that now it means you can't stick bytes objects into bytes formats, which is even worse. > > You could solve that by making objects that support the buffer protocol (like bytes) copy as-is instead of going through str and encode. That would mean you can't use bytes with a placeholder with any format flags, but maybe that's a good thing anyway (e.g., do you really want b'{:3}'.format(b'\xc3\xa9') to only pad to 2 characters instead of 3 because it's a 2-byte character?). > > That would be enough to let you cram pre-encoded/formatted bytes, and things like numbers, into bytes formats made up of ASCII headers, which I think is 90% of what people want here. Does that seem worth pursuing? Agreed that probably the main case is inserting bytes objects verbatim in a message with a small ASCII header and possibly a trailer. Format flags are useful, e.g. with chunked HTTP encoding you need to insert the length in hex.
But if those are only available for non-bytes objects that'd probably be fine. I'm not too familiar with the implementation of format() so I can't say much about it. Regards, Geert From masklinn at masklinn.net Mon Jan 6 12:59:13 2014 From: masklinn at masklinn.net (Masklinn) Date: Mon, 6 Jan 2014 12:59:13 +0100 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8F2BE13C-0AB4-4405-87E4-C45BCC6B224F@masklinn.net> On 2014-01-06, at 11:57 , Stephen J. Turnbull wrote: > Geert Jansen writes: > >> I'm not missing a new type, but I am missing the format method on the >> binary types. > > I'm curious about precisely what your use cases are, and just what > formatting they need. Building up protocol output, especially (but not solely) ascii-based ones, from existing or computed parts. Basically the same reasons behind Erlang's bit syntax (on the building side thereof): http://www.erlang.org/doc/programming_examples/bit_syntax.html Essentially a partial and more readable (especially more readable) version of what `struct` provides, and one in which the "pattern" can contain literal constant content. `struct` is nice, but it doesn't scale very well to big binary creation, and it's fairly horrible when part of the output is constant as constant parts *still* have to be patterned and injected as parameters. Also, no support for keyword arguments. 
From denis.spir at gmail.com Mon Jan 6 13:34:01 2014 From: denis.spir at gmail.com (spir) Date: Mon, 06 Jan 2014 13:34:01 +0100 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> Message-ID: <52CAA2B9.7090607@gmail.com> On 01/05/2014 06:49 PM, Eric Snow wrote: > On Jan 5, 2014 4:10 AM, "David Townshend" wrote: >> >> Reading this thread made me start to think about why a string is a > sequence, and I can't actually see any obvious reason, other than > historical ones. > > Sometimes I think it would be more clear if strings weren't sequences but > had various attributes that exposed sequence "views", e.g. codepoints, > etc. Making strings non-sequences isn't realistic at this point, but > adding the sequence view attributes may still be nice. > > That said, at present it's not something I personally have any use case > for. There was an article floating around the web recently where the > deficiencies of unicode implementations was discussed and I recall > something there or in related discussions about use cases for having > different views into a string. Wow that was vague. :) The different views > into unicode strings certainly comes up from time to time on our lists. This does not fit the picture as long as strings are indexable and sliceable, in my opinion. But most importantly, from the user practice & experience perspective, and while from a theoretical one it may be debatable, I consider it a great feature of python that everyday "mundane" string processing can be done using simple and easy Python string routines (I include here indexing & slicing). Alternatives would be regexes (read: Perl) and/or matching/parsing/searching libs (eg pyparsing) everywhere in python code; both are difficult, error-prone, hard to debug.
The former are plain esoteric (but terribly practical ;-), and I'm happy to rarely have to decipher *others'* regexes when reading python code (my own are far easier, indeed ;-). Denis From ram.rachum at gmail.com Mon Jan 6 14:28:48 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Mon, 6 Jan 2014 05:28:48 -0800 (PST) Subject: [Python-ideas] Getting file name of Path without suffix Message-ID: <90104c20-7543-4b76-9108-b69b1ee2928f@googlegroups.com> Hi guys, What do you think about introducing this Path property: @property def suffixless_name(self): return self.name[:-len(self.suffix)] if self.suffix else self.name It's simple but I'd really hate to have this conditional slicing in user code. Thanks, Ram. From solipsis at pitrou.net Mon Jan 6 14:46:05 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 6 Jan 2014 14:46:05 +0100 Subject: [Python-ideas] Getting file name of Path without suffix References: <90104c20-7543-4b76-9108-b69b1ee2928f@googlegroups.com> Message-ID: <20140106144605.15a3b3f5@fsol> On Mon, 6 Jan 2014 05:28:48 -0800 (PST) Ram Rachum wrote: > Hi guys, > > What do you think about introducing this Path property: > > @property > def suffixless_name(self): > return self.name[:-len(self.suffix)] if self.suffix else self.name > > It's simple but I'd really hate to have this conditional slicing in user > code. Have you tried .stem? http://docs.python.org/dev/library/pathlib.html#pathlib.PurePath.stem Regards Antoine. From breamoreboy at yahoo.co.uk Mon Jan 6 14:57:55 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 06 Jan 2014 13:57:55 +0000 Subject: [Python-ideas] a new bytestring type?
In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: On 06/01/2014 08:28, Geert Jansen wrote: > On Sun, Jan 5, 2014 at 8:33 PM, Ethan Furman wrote: >> As anyone who has worked with Python 3 and low-level protocols knows, Python >> 3 has no 'bytestring' type. It has immutable and mutable versions of arrays >> of integers, otherwise known as 'bytes' and 'bytearray'. >> >> How many would be interested in having a 'bytestring'? > > I'm not missing a new type, but I am missing the format method on the > binary types. > > Regards, > Geert Is this what the new PEP 460 is aimed at or am I again barking in the wrong forest? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ncoghlan at gmail.com Mon Jan 6 15:50:40 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 00:50:40 +1000 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: On 6 Jan 2014 21:58, "Mark Lawrence" wrote: > > On 06/01/2014 08:28, Geert Jansen wrote: >> >> On Sun, Jan 5, 2014 at 8:33 PM, Ethan Furman wrote: >>> >>> As anyone who has worked with Python 3 and low-level protocols knows, Python >>> 3 has no 'bytestring' type. It has immutable and mutable versions of arrays >>> of integers, otherwise known as 'bytes' and 'bytearray'. >>> >>> How many would be interested in having a 'bytestring'? >> >> >> I'm not missing a new type, but I am missing the format method on the >> binary types. >> >> Regards, >> Geert > > > Is this what the new PEP 460 is aimed at or am I again barking in the wrong forest? Yep, parallel discussions. Cheers, Nick. > > -- > My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
> > Mark Lawrence From ncoghlan at gmail.com Mon Jan 6 15:58:30 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 7 Jan 2014 00:58:30 +1000 Subject: [Python-ideas] a new bytestring type? In-Reply-To: <1389006965.30768.YahooMailNeo@web181006.mail.ne1.yahoo.com> References: <52C9B39F.6060205@stoneleaf.us> <1389006965.30768.YahooMailNeo@web181006.mail.ne1.yahoo.com> Message-ID: On 6 Jan 2014 19:16, "Andrew Barnert" wrote: > > From: Nick Coghlan > Sent: Sunday, January 5, 2014 2:57 PM > > > >I actually expected someone to have experimented with an "encodedstr" type by now. This would be a type that behaved like the Python 2 str type, but had an encoding attribute. On encountering Unicode text strings, it would encode them appropriately. > > I did something like this when I was first playing with 3.0, and I managed to find it. > > I tried two different implementations, a bytes subclass that fakes being a str as well as possible by decoding on the fly (or, in some cases, by encoding its arguments on the fly), and a str that fakes being a bytes as well as possible by doing the opposite. > > >However, people have generally instead followed the model of decoding to text and operating in that domain, since it avoids a lot of subtle issues (like accidentally embedding byte order marks when concatenating strings). > > > It's also conceptually cleaner to work with text as text instead of as bytes that you can sort of use as text. > > Also, one major reason people resist working with text (or upgrading to 3.x) is the perceived performance costs of dealing with Unicode.
But if you want to do any kind of string processing on your text beyond searching for ASCII header names and the like, you pretty much have to do it as Unicode or it's wrong. So, you'd need something that allows you to do those ASCII header searches in 8-bit-land, but either doesn't allow full string processing, or automatically decodes and re-encodes on the fly (which obviously isn't going to be faster). > > >This is likely encouraged by the fact that str, bytes and bytearray don't currently implement type coercion correctly (which in turn is due to a long standing bug in the way the abstract C API handles sequence types defined in C rather than Python), so an encodedstr type would need to inherit from str or bytes to get interoperability, and then wouldn't interoperate with the other one. > > > What's the bug? http://bugs.python.org/issue11477 CPython doesn't check for NotImplemented results from sq_concat or sq_repeat, so the sequence implementations raise TypeError directly and the RHS doesn't get consulted to see if it can handle the operation. Subclassing works anyway because subclasses are always checked first even when they're the RHS. Thanks for the info on your experiences with attempting to implement an encodedstr type. I still feel there is potential merit to the concept, but it's certainly going to take some thought. Cheers, Nick. From stephen at xemacs.org Mon Jan 6 18:14:07 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 02:14:07 +0900 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87eh4lhw2o.fsf@uwakimon.sk.tsukuba.ac.jp> Geert Jansen writes: > One use case I came across was when creating chunks for the HTTP > chunked encoding. Chunks contain an ascii header, a raw/encoded chunk > body, and an ascii trailer. Using a bytes.format, it would look like
Using a bytes.format, it would look like > this: > > chunk = '{0:X}\r\n{1}\r\n'.format(len(buf), buf) You forgot the b prefix. > This is what I am using now: > > chunk = bytearray() > chunk.extend('{0:X}\r\n'.format(len(buf)).encode('ascii')) > chunk.extend(buf) > chunk.extend('\r\n'.encode('ascii')) Either of those is a big win compared to this? # OK, we'd want efficient definition of a bunch of these, # which is a cost. def itox (n): return '{0:X}'.format(n).encode('ascii') chunk = b'\r\n'.join([itox(len(buf)), buf, b'']) But see my response to Andrew, also. From stephen at xemacs.org Mon Jan 6 18:16:23 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 02:16:23 +0900 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: <87d2k5hvyw.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Lawrence writes: > Is this what the new PEP 460 is aimed at or am I again barking in the > wrong forest? Sure, but that's only hours old. And I think there's a better way. From stephen at xemacs.org Mon Jan 6 19:37:36 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 03:37:36 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> Aside: I just read Victor's PEP 460, and apparently a lot of the assumptions I'm making are true! Andrew Barnert writes: > From: Geert Jansen > > On Mon, Jan 6, 2014 at 11:57 AM, Stephen J. Turnbull > > wrote: > > > >> ? > I'm not missing a new type, but I am missing the format method on > >> > the binary types. > >> > >> I'm curious about precisely what your use cases are, and just what > >> formatting they need. 
> > Besides Geert's chunked HTTP example, there are tons of internet > protocols and file formats (including Python source code!), Python source code must use an ASCII-compatible encoding to use PEP 263. No widechars, no EBCDIC. But yes, I know about ASCII header formats -- I'm a Mailman developer. > that have ASCII headers (that in some way define an encoding for > the actual payload). So things like > b'Content-Length: {}'.format(len(payload)) > or even > b'Content-Type: text/html; charset={}'.format(encoding) > are useful. Useful, sure. But is it that much more useful than the alternative? What's wrong with def itob(n): # besides efficiency :-) return "{0:d}".format(n).encode('ascii') b'Content-Length: ' + itob(len(payload)) b'Content-Type: text/html; charset=' + encoding for such cases? Not to forget that for cases with multiple parts to combine, bytes.join() is way fast -- which matters to most people who want these operations. So I just don't see a real need for generic formatting operations here. (regex is another matter, but that's already implemented.) > As for assuming that it's ASCII-compatible, again, there are all > kinds of protocols that work with any ASCII-compatible charset but > don't work otherwise. If you *can* assume it's ASCII-compatible bytes, what's wrong with str in Python 3? The basic idea is to use inbytes.decode('ascii', errors='surrogateescape') which will DTRT if you try to encode it without the surrogateescape handler: it raises an exception unless the bytes is pure ASCII. It's memory-efficient for pure ASCII, and has all the string facilities we love. But of course it would be too painful for sending JPEGs by chunked HTTP a la Geert. So ... now that we have the flexible string representation (PEP 393), let's add a 7-bit representation! (Don't take that too seriously, there are interesting more general variants I'm not going to talk about tonight.) The 7-bit representation satisfies the following requirements: 1.
It is only produced on input by a new 'ascii-compatible' codec, which sets the "7-bit representation" flag in the str object on input if it encounters any non-ASCII bytes (if pure ASCII, it produces an 8-bit str object). This will be slower than just reading in the bytes in many cases, but I hope not unacceptably so. 2. When sliced, the result needs to be checked for non-ASCII bytes. If none, the result is promoted to 8-bit. 3. When combined with a str in 8-bit representation: a. If the 8-bit str contains any Latin-1 or C1 characters, both strs are promoted to 16-bit, and non-ASCII characters in the 7-bit string are converted by the surrogateescape handler. b. Otherwise they're combined into a 7-bit str. 4. When combined with a str in 16-bit or 32-bit representation, the 7-bit string is "decoded" to the same representation, as if using the 'ascii' codec with the 'surrogateescape' handler. 5. String methods that would raise or produce undefined results if used on str containing surrogate-encoded bytes need to be taught to do the same on non-ASCII bytes in 7-bit str objects. 6. On output the 'ascii-compatible' codec simply memcpy's 7-bit str and pure ASCII 8-bit str, and raises on anything else. (Sorry, no, ISO 8859-1 does *not* get passed through without exception.) 7. On output other codecs raise on a 7-bit str, unless the surrogateescape handler is in use. IOW, it's almost as fast as bytes if you restrict yourself to ASCII- compatible behavior, and you pay the price if you try to mix it with "real" Unicode str objects. Otherwise you can do anything with it you could do with a str. I don't think this actually has serious efficiency implications for Unicode handling, since the relevant compatibility tests need to be done anyway when combining strs. All the expensive operations occur when mixing 7-bit str and "real" non-ASCII Unicode, but we really don't want to do that if we can avoid it, any more than we want to use surrogate encoding if we can avoid it. 
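The 'ascii-compatible' codec and the 7-bit flag are only proposed here; neither exists in CPython. As a rough point of comparison (essentially the question Nick raises later in the thread), the observable effects of rules 3a, 4, 6 and 7 can be approximated with today's str by decoding with `'ascii'` and `errors='surrogateescape'`. The sample bytes and the `encodes` helper are made up for illustration. The sketch says nothing about the proposed internal representation or its performance.

```python
# Made-up protocol bytes: ASCII text plus two non-ASCII bytes of unknown encoding.
raw = b"charset=\xe9t\xe9"

# Stand-in for the proposed 'ascii-compatible' decode: each non-ASCII byte
# becomes a lone surrogate in U+DC80..U+DCFF (here U+DCE9).
s = raw.decode("ascii", errors="surrogateescape")

# Rules 3a/4: combining with genuine non-ASCII text yields a str that holds
# both escaped bytes and real characters.
mixed = s + " caf\u00e9"

def encodes(text, encoding, errors="strict"):
    """Return the encoded bytes, or None if encoding raises."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None

# Rule 7: an ordinary codec refuses the smuggled bytes by default...
strict_utf8 = encodes(mixed, "utf-8")
# ...but with surrogateescape the raw bytes reappear beside UTF-8 text.
lenient_utf8 = encodes(mixed, "utf-8", "surrogateescape")
# Rule 6 analogue: ASCII output round-trips the original bytes exactly,
roundtrip = encodes(s, "ascii", "surrogateescape")
# but still rejects a real non-ASCII character such as U+00E9.
ascii_mixed = encodes(mixed, "ascii", "surrogateescape")
```

Here `lenient_utf8` is `b"charset=\xe9t\xe9 caf\xc3\xa9"`, which mixes two encodings in one buffer. That is the kind of mojibake the proposal, like surrogateescape, only permits when explicitly requested.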
Efficiency for low-level protocols could be improved by having the 'ascii-compatible' codec always produce 7-bit. I haven't thought carefully about this yet. For the same reasons, there should be few surprises where people inadvertently mix 7-bit str with "real" Unicode, since creating 7-bit is only done by the 'ascii-compatible' codec. People who are doing that will be using ASCII-compatible protocols and should be used to being careful with non-ASCII bytes. Finally, none of the natural idioms require a b prefix on their literals. :-) N.B. Much of the above assumes that working with Unicode in 8-bit representation is basically as efficient as working with bytes. That is an assumption on my part; I hope it's verified. Comments? From dreamingforward at gmail.com Mon Jan 6 19:53:29 2014 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 6 Jan 2014 12:53:29 -0600 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: >> How many would be interested in having a 'bytestring'? > > I'm not missing a new type, but I am missing the format method on the > binary types. Wouldn't a type "cast" like TextFile(bytestring) be sufficient? markj From tjreedy at udel.edu Tue Jan 7 00:39:10 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 06 Jan 2014 18:39:10 -0500 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: On 1/6/2014 3:09 AM, Devin Jeanpierre wrote: > On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy wrote: >>> On Jan 5, 2014, at 3:09, David Townshend >>> wrote: >>> >>>> Reading this thread made me start to think about why a string is a >>>> sequence, >> >> >> Because a string is defined in math/language theory as a sequence of symbols >> from an alphabet.
If you want to invent or define something else, such as an >> atomic symbol type, please use a different term. For example: > > And sequences in math / CS are functions from the natural > numbers to elements of the sequence. And functions (mappings) in math are defined either by a rule for calculating the output from the input or by a table (set of pairs) giving the output for each input. If the input domain is the finite sequence of counts from 0 to k, the table can be condensed to a sequence of k+1 output values. > Since isinstance(str, types.FunctionType) isn't True, Python has multiple builtin callable types, and users can define more, so you need to expand that test. Anyway, since a string is not a function defined by rule, it must be a function defined by a table. Since the input domain is a finite sequence of counts, we can and do condense the table to a sequence of output values. Which is an expansion of what I said. > [snip] -- Terry Jan Reedy From ethan at stoneleaf.us Mon Jan 6 23:59:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 06 Jan 2014 14:59:11 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52CB353F.7000805@stoneleaf.us> On 01/06/2014 10:37 AM, Stephen J. Turnbull wrote: > > Comments? Having a 7-bit str variant is definitely an interesting idea, but it wouldn't help me and is probably insufficient for network protocols as well. The binary data I deal with occupies the full 0-255 range, some of which is actually encoded text (and I decode it before passing it back to the user), some of which is simple binary data, and some of which is simple ASCII (metadata about fields and whatnot). 
-- ~Ethan~ From jeanpierreda at gmail.com Tue Jan 7 01:20:18 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 6 Jan 2014 16:20:18 -0800 Subject: [Python-ideas] str.startswith taking any iterator instead of just tuple In-Reply-To: References: <52C5CC21.5030002@dontusethiscode.com> <52C5F854.90306@dontusethiscode.com> <570040DC-C0F5-4E46-8D8A-1F0AE144D0B9@yahoo.com> Message-ID: On Mon, Jan 6, 2014 at 3:39 PM, Terry Reedy wrote: >> Since isinstance(str, types.FunctionType) isn't True, > > Python has multiple builtin callable types, and users can define more, so > you need to expand that test. Anyway, since a string is not a function > defined by rule, it must be a function defined by a table. Since the input > domain is a finite sequence of counts, we can and do condense the table to a > sequence of output values. Which is an expansion of what I said. No, I don't need to expand the test -- the limitation of the test was the entire point. I was making fun of your argument that because the mathematical terms are the same, therefore they must be the same in Python. "strings are sequences in math, therefore they are in python" is a superficial and fundamentally wrong argument. Here's another argument of that form: "the nth element of a string is not a string in math, therefore the nth element of a string is not a string in Python". That's a lie, of course. There are too many ways that type of argument falls flat. -- Devin From stephen at xemacs.org Tue Jan 7 06:05:47 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 14:05:47 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <52CB353F.7000805@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> Message-ID: <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Having a 7-bit str variant is definitely an interesting idea, but > it wouldn't help me and is probably insufficient for network > protocols as well. I'd like evidence for that latter. > The binary data I deal with occupies the full 0-255 range, My proposal deals with such data. It simply prevents the program from interpreting the 128-255 range as Unicode characters. You can still use regexps etc on the full range 0-255. > some of which is actually encoded text (and I decode it before > passing it back to the user), some of which is simple binary data, > and some of which is simple ASCII (metadata about fields and > whatnot). You're wrong, it would help you. Encoded text must be decoded, and in that case it doesn't help you. Unless you can treat it as a single ASCII-compatible encoding (eg, this works for ISO-8859 or KOI8), when the proposal wins for you. Binary data and pure ASCII, the proposal wins for you, unless you're worried about spurious recognition of the binary data as ASCII metadata. In that last case, again, nothing is going to help you as it's a domain problem. My proposal is undefeated in your use case. From geertj at gmail.com Tue Jan 7 06:32:11 2014 From: geertj at gmail.com (Geert Jansen) Date: Tue, 7 Jan 2014 06:32:11 +0100 Subject: [Python-ideas] a new bytestring type? In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> Message-ID: On Mon, Jan 6, 2014 at 7:53 PM, Mark Janssen wrote: >>> How many would be interested in having a 'bytestring'? >> >> I'm not missing a new type, but I am missing the format method on the >> binary types. > > Wouldn't a type "cast" like TextFile(bytestring) be sufficient? 
Unless I'm missing something, no. For the use case described the result needs to be a bytes object. Regards, Geert From ethan at stoneleaf.us Tue Jan 7 06:51:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 06 Jan 2014 21:51:11 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52CB95CF.3080801@stoneleaf.us> On 01/06/2014 09:05 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: >> >> The binary data I deal with occupies the full 0-255 range, > > My proposal deals with such data. It simply prevents the program from > interpreting the 128-255 range as Unicode characters. You can still > use regexps etc on the full range 0-255. > >> some of which is actually encoded text (and I decode it before >> passing it back to the user), some of which is simple binary data, >> and some of which is simple ASCII (metadata about fields and >> whatnot). > > You're wrong, it would help you. Encoded text must be decoded, and in > that case it doesn't help you. Unless you can treat it as a single > ASCII-compatible encoding (eg, this works for ISO-8859 or KOI8), when > the proposal wins for you. Binary data and pure ASCII, the proposal > wins for you, unless you're worried about spurious recognition of the > binary data as ASCII metadata. In that last case, again, nothing is > going to help you as it's a domain problem. My proposal is undefeated > in your use case. I just read your proposal again, and must admit I don't understand how it would help me, but I look forward to testing an implementation! 
One wrinkle, though -- the data is binary, and if read would have to be read using the latin1 codec... although, I suppose I could open it, read the first 32 bytes, close it, figure out the encoding, reopen with the encoding.... hmmmm -- yup, still not sure how it would all work, but looking forward to testing it. -- ~Ethan~ From stephen at xemacs.org Tue Jan 7 14:00:17 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 07 Jan 2014 22:00:17 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CB95CF.3080801@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> Message-ID: <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > I just read your proposal again, and must admit I don't understand > how it would help me, but I look forward to testing an > implementation! > > One wrinkle, though -- the data is binary, and if read would have > to be read using the latin1 codec... That depends on what you mean by "binary". If the binary payload is just a blob that gets passed on (eg, as in an HTTP client receiving and storing a JPEG file), you read the stream as 'ascii-compatible', parse the headers using regexps or whatever, print any relevant parsed data to logs using 'ascii-compatible', slice off the blob, and write the blob to disk as 'ascii-compatible'. This has the advantage over latin1 that the bytes are marked as "uninterpreted text". It doesn't mean you can't create mojibake; you still can. But Python will complain if you try to output it as text in an encoding (unless you use the 'surrogateescape' handler, in which case you're explicitly accepting responsibility for any mess you create). 
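The blob-passing workflow described above can be sketched with today's tools. Because the 'ascii-compatible' codec is only proposed, this sketch assumes the closest existing substitute, `'ascii'` with `errors='surrogateescape'`. The message bytes, the header regex and `split_message` are all hypothetical.

```python
import re

# Hypothetical HTTP-like message: ASCII headers, blank line, binary blob.
RAW = b"Content-Type: image/jpeg\r\nContent-Length: 4\r\n\r\n\xff\xd8\xff\xe0"

HEADER_LINE = re.compile(r"^([\w-]+):[ \t]*(.*?)\s*$", re.MULTILINE)

def split_message(raw):
    """Parse the ASCII headers as text; return (headers, body_bytes)."""
    # Non-ASCII bytes become lone surrogates U+DC80..U+DCFF rather than
    # being silently reinterpreted as Latin-1 characters.
    text = raw.decode("ascii", errors="surrogateescape")
    head, _, body_text = text.partition("\r\n\r\n")
    headers = dict(HEADER_LINE.findall(head))
    # "Slice off the blob": the same error handler restores the exact bytes.
    body = body_text.encode("ascii", errors="surrogateescape")
    return headers, body

headers, body = split_message(RAW)
```

Decoding with latin-1 instead would turn `b"\xff"` into an ordinary `'ÿ'` that re-encodes to UTF-8 without complaint, which is the silent-mojibake case that marking the bytes as uninterpreted is meant to catch.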
If you mean to process the binary, it would depend on what you want to do whether it would help or not. struct- and ctypes-style processing, no, it won't help because you need to convert to bytes to use those. (It might make sense to read the headers into a buffer this way, parse them as ASCII-compatible text, and then read the rest as bytes.) Pure byte code, doesn't help, although it probably doesn't hurt. From steve at pearwood.info Tue Jan 7 16:44:03 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Jan 2014 02:44:03 +1100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140107154401.GK29356@ando> On Tue, Jan 07, 2014 at 03:37:36AM +0900, Stephen J. Turnbull wrote: > So ... now that we have the flexible string representation (PEP 393), > let's add a 7-bit representation! (Don't take that too seriously, > there are interesting more general variants I'm not going to talk > about tonight.) > > The 7-bit representation satisfies the following requirements: > > 1. It is only produced on input by a new 'ascii-compatible' codec, > which sets the "7-bit representation" flag in the str object on > input if it encounters any non-ASCII bytes (if pure ASCII, it > produces an 8-bit str object). This will be slower than just > reading in the bytes in many cases, but I hope not unacceptably so. I'm confused by your suggestion here. It seems to me that you've got the conditions backwards. (Or I don't understand them.) Perhaps a couple of examples will make it clear. 
Suppose we take a pure-ASCII byte-string and decode it: b'abcd'.decode('ascii-compatible') According to the above, this will produce a regular str object, 'abcd', using the regular 8-bit internal representation, and the "7-bit repr" flag cleared. Correct? (So the flag is *cleared* when all the chars in the string are 7-bit, and *set* when at least one is not. Yes?) Suppose we take a byte-string with a non-ASCII byte: b'abc\xFF'.decode('ascii-compatible') This will return... what? I think it returns a so-called 7-bit representation, but I'm not sure what it is a representation of. I presume the internals will actually contain the four bytes 61 62 63 FF and the "7-bit repr" flag will be set. Is that flag the only difference between these two strings? b'abc\xFF'.decode('ascii-compatible') 'abc\xFF' Presumably they will compare equal, yes? > 2. When sliced, the result needs to be checked for non-ASCII bytes. > If none, the result is promoted to 8-bit. > > 3. When combined with a str in 8-bit representation: > > a. If the 8-bit str contains any Latin-1 or C1 characters, both > strs are promoted to 16-bit, and non-ASCII characters in the > 7-bit string are converted by the surrogateescape handler. > > b. Otherwise they're combined into a 7-bit str. A concrete example: s = b'abcd'.decode('ascii-compatible') t = 'x' # ASCII-compatible s + t => returns 'abcdx', with the "7-bit repr" flag cleared. s = b'abcd'.decode('ascii-compatible') t = 'ÿ' # U+00FF, non-ASCII. s + t => returns 'abcd\uDCFF', with the "7-bit repr" flag set The \uDCFF at the end is the ÿ encoded with the surrogateescape error handler. There's a problem with this: two strings, visually indistinguishable, but differing only in the internal representation, give completely different results: b'abcd'.decode('ascii') + 'ÿ' => 'abcd\u00FF' b'abcd'.decode('ascii-compatible') + 'ÿ' => 'abcd\uDCFF' > 4.
When combined with a str in 16-bit or 32-bit representation, the > 7-bit string is "decoded" to the same representation, as if using > the 'ascii' codec with the 'surrogateescape' handler. Another example: s = b'abcd'.decode('ascii-compatible') assert s == 'abcd' s + '?' => returns what? Your description confuses me. The "7-bit string" is already text, how do you decode it to the 16-bit internal representation? > 5. String methods that would raise or produce undefined results if > used on str containing surrogate-encoded bytes need to be taught > to do the same on non-ASCII bytes in 7-bit str objects. Do you have an example of such string methods? > 6. On output the 'ascii-compatible' codec simply memcpy's 7-bit str > and pure ASCII 8-bit str, and raises on anything else. (Sorry, > no, ISO 8859-1 does *not* get passed through without exception.) > > 7. On output other codecs raise on a 7-bit str, unless the > surrogateescape handler is in use. What do you mean by "on output"? Do you mean when encoding? This concerns me: b'abcd'.decode('ascii').encode('latin-1') => returns b'abcd' b'abcd'.decode('ascii-compatible').encode('latin-1') => raises And yet, the two 'abcd' strings you get are visually indistinguishable, and only differ by a hidden, internal flag. I've probably misunderstood something about your proposal, so please explain where I've gone wrong. Please give examples! -- Steven From ncoghlan at gmail.com Tue Jan 7 17:19:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 8 Jan 2014 02:19:09 +1000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <20140107154401.GK29356@ando> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> Message-ID: On 7 Jan 2014 23:45, "Steven D'Aprano" wrote: > > On Tue, Jan 07, 2014 at 03:37:36AM +0900, Stephen J. Turnbull wrote: > > > So ... now that we have the flexible string representation (PEP 393), > > let's add a 7-bit representation! (Don't take that too seriously, > > there are interesting more general variants I'm not going to talk > > about tonight.) > > > > The 7-bit representation satisfies the following requirements: > > > > 1. It is only produced on input by a new 'ascii-compatible' codec, > > which sets the "7-bit representation" flag in the str object on > > input if it encounters any non-ASCII bytes (if pure ASCII, it > > produces an 8-bit str object). This will be slower than just > > reading in the bytes in many cases, but I hope not unacceptably so. > > I'm confused by your suggestion here. It seems to me that you've got the > conditions backwards. (Or I don't understand them.) Perhaps a couple of > examples will make it clear. > > Suppose we take a pure-ASCII byte-string and decode it: > > b'abcd'.decode('ascii-compatible') > > According to the above, this will produce a regular str object, 'abcd', > using the regular 8-bit internal representation, and the "7-bit repr" > flag cleared. Correct? (So the flag is *cleared* when all the chars in > the string are 7-bit, and *set* when at least one is not. Yes?) > > Suppose we take a byte-string with a non-ASCII byte: > > b'abc\xFF'.decode('ascii-compatible') > > This will return... what? I think it returns a so-called 7-bit > representation, but I'm not sure what it is a representation of. I > presume the internals will actually contain the four bytes > > 61 62 63 FF > > and the "7-bit repr" flag will be set. 
Is that flag the only difference > between these two strings? > > b'abc\xFF'.decode('ascii-compatible') > 'abc\xFF' > > Presumably they will compare equal, yes? > > > > 2. When sliced, the result needs to be checked for non-ASCII bytes. > > If none, the result is promoted to 8-bit. > > > > 3. When combined with a str in 8-bit representation: > > > > a. If the 8-bit str contains any Latin-1 or C1 characters, both > > strs are promoted to 16-bit, and non-ASCII characters in the > > 7-bit string are converted by the surrogateescape handler. > > > > b. Otherwise they're combined into a 7-bit str. > > > A concrete example: > > s = b'abcd'.decode('ascii-compatible') > t = 'x' # ASCII-compatible > s + t > => returns 'abcdx', with the "7-bit repr" flag cleared. > > > s = b'abcd'.decode('ascii-compatible') > t = 'ÿ' # U+00FF, non-ASCII. > > s + t > => returns 'abcd\uDCFF', with the "7-bit repr" flag set > > The \uDCFF at the end is the ÿ encoded with the surrogateescape error > handler. > > There's a problem with this: two strings, visually indistinguishable, > but differing only in the internal representation, give completely > different results: > > b'abcd'.decode('ascii') + 'ÿ' > => 'abcd\u00FF' > > b'abcd'.decode('ascii-compatible') + 'ÿ' > => 'abcd\uDCFF' > > > > 4. When combined with a str in 16-bit or 32-bit representation, the > > 7-bit string is "decoded" to the same representation, as if using > > the 'ascii' codec with the 'surrogateescape' handler. > > Another example: > > s = b'abcd'.decode('ascii-compatible') > assert s == 'abcd' > s + 'π' > => returns what? > > Your description confuses me. The "7-bit string" is already text, how do > you decode it to the 16-bit internal representation? > > > > 5. String methods that would raise or produce undefined results if > > used on str containing surrogate-encoded bytes need to be taught > > to do the same on non-ASCII bytes in 7-bit str objects. > > Do you have an example of such string methods? > > > > 6.
On output the 'ascii-compatible' codec simply memcpy's 7-bit str > > and pure ASCII 8-bit str, and raises on anything else. (Sorry, > > no, ISO 8859-1 does *not* get passed through without exception.) > > > > 7. On output other codecs raise on a 7-bit str, unless the > > surrogateescape handler is in use. > > What do you mean by "on output"? Do you mean when encoding? > > This concerns me: > > b'abcd'.decode('ascii').encode('latin-1') > => returns b'abcd' > > b'abcd'.decode('ascii-compatible').encode('latin-1') > => raises > > And yet, the two 'abcd' strings you get are visually indistinguishable, > and only differ by a hidden, internal flag. > > I've probably misunderstood something about your proposal, so please > explain where I've gone wrong. Please give examples! I haven't been following the discussion in detail (linux.conf.au and the Py3 discussions have most of my attention this week), but I'm definitely not clear on how this 7-bit proposal differs meaningfully from just using ascii with the surrogateescape error handler. Cheers, Nick. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Tue Jan 7 18:46:15 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 7 Jan 2014 09:46:15 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <20140107154401.GK29356@ando> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> Message-ID: I think Stephen's name "7-bit" is confusing people.
If you try to interpret the name sensibly, you get Steven's broken interpretation. But if you read it as a nonsense word and work through the logic, it all makes sense. On Jan 7, 2014, at 7:44, Steven D'Aprano wrote: > On Tue, Jan 07, 2014 at 03:37:36AM +0900, Stephen J. Turnbull wrote: > >> So ... now that we have the flexible string representation (PEP 393), >> let's add a 7-bit representation! (Don't take that too seriously, >> there are interesting more general variants I'm not going to talk >> about tonight.) >> >> The 7-bit representation satisfies the following requirements: >> >> 1. It is only produced on input by a new 'ascii-compatible' codec, >> which sets the "7-bit representation" flag in the str object on >> input if it encounters any non-ASCII bytes (if pure ASCII, it >> produces an 8-bit str object). This will be slower than just >> reading in the bytes in many cases, but I hope not unacceptably so. > > I'm confused by your suggestion here. It seems to me that you've got the > conditions backwards. (Or I don't understand them.) Perhaps a couple of > examples will make it clear. > > Suppose we take a pure-ASCII byte-string and decode it: > > b'abcd'.decode('ascii-compatible') > > According to the above, this will produce a regular str object, 'abcd', > using the regular 8-bit internal representation, and the "7-bit repr" > flag cleared. Correct? (So the flag is *cleared* when all the chars in > the string are 7-bit, and *set* when at least one is not. Yes?) Correct. The floobl representation is not used because there are no non-ASCII bytes. > Suppose we take a byte-string with a non-ASCII byte: > > b'abc\xFF'.decode('ascii-compatible') > > This will return... what? I think it returns a so-called 7-bit > representation, but I'm not sure what it is a representation of. The representation is the bytes 61 62 63 FF with the floobl flag set. It's a representation of an 'a' char, a 'b' char, a 'c' char, and a smuggled FF byte--identical to 'abc\uDCFF'. 
(This last bit is the part I'm a bit wary of, as it promotes surrogate-escape to being an inherent part of the meaning of Unicode strings in Python. But maybe Stephen has an answer for that. And anyway, it's a much smaller problem than the one you think is there.) > I > presume the internals will actually contain the four bytes > > 61 62 63 FF > > and the "7-bit repr" flag will be set. Is that flag the only difference > between these two strings? > > b'abc\xFF'.decode('ascii-compatible') > 'abc\xFF' The floobl flag is the only difference between the two internal representations, but there's a big difference in the meaning. > Presumably they will compare equal, yes? I would hope not. One of them has the Unicode character U+FF, the other has smuggled byte 0xFF, so they'd better not compare equal. However, the one with the smuggled byte should compare equal to 'abc\uDCFF'. That's the entire key here: the new representation is nothing but a more compact way to represent strings that contain nothing but ASCII and surrogate escapes. >> 2. When sliced, the result needs to be checked for non-ASCII bytes. >> If none, the result is promoted to 8-bit. >> >> 3. When combined with a str in 8-bit representation: >> >> a. If the 8-bit str contains any Latin-1 or C1 characters, both >> strs are promoted to 16-bit, and non-ASCII characters in the >> 7-bit string are converted by the surrogateescape handler. >> >> b. Otherwise they're combined into a 7-bit str. > > > A concrete example: > > s = b'abcd'.decode('ascii-compatible') > t = 'x' # ASCII-compatible > s + t > => returns 'abcdx', with the "7-bit repr" flag cleared. Right. Here both s and t are normal 8-bit-repr strings in the first place, so the new logic doesn't even get invoked. So yes, that's what it returns. > s = b'abcd'.decode('ascii-compatible') > t = 'ÿ' # U+00FF, non-ASCII. > > s + t > => returns 'abcd\uDCFF', with the "7-bit repr" flag set No, you've missed two key bits here.
First, you're again adding two regular 8-bit-repr strings, not a non-ASCII-smuggling string plus an 8-bit one, so the new logic doesn't get invoked at all. Plus, even if s were a 7-bit-flagged string like b'ab\xfe'.decode('ascii-compatible'), that wouldn't turn t into \uDCFF. Only bytes in the floobl-flagged string are surrogate-escaped; characters in the normal string are handled normally. So you'd have 'ab\uDCFE\xFF'. Also, both strings are promoted to 16-bit, and the floobl flag is never set with 16-bit or 32-bit representations. > The \uDCFF at the end is the ÿ encoded with the surrogateescape error > handler. > > There's a problem with this: two strings, visually indistinguishable, > but differing only in the internal representation, give completely > different results: > > b'abcd'.decode('ascii') + 'ÿ' > => 'abcd\u00FF' > > b'abcd'.decode('ascii-compatible') + 'ÿ' > => 'abcd\uDCFF' Nope, again, these both give the first result. >> 4. When combined with a str in 16-bit or 32-bit representation, the >> 7-bit string is "decoded" to the same representation, as if using >> the 'ascii' codec with the 'surrogateescape' handler. > > Another example: > > s = b'abcd'.decode('ascii-compatible') > assert s == 'abcd' > s + 'π' > => returns what? 'abcdπ'. Since the first one is a plain 8-bit string, and the second a plain 16-bit string, the new logic never even gets involved. And again, if you change this so s is b'abc\xFE'.decode('ascii-compatible'), then you're adding a floobl string and a 16-bit string, so the FE byte gets encoded as DCFE, while the pi character is left unchanged, so you get 'abc\uDCFEπ'. > Your description confuses me. The "7-bit string" is already text, how do > you decode it to the 16-bit internal representation? By decoding its representation as if it were bytes, using surrogate-escape. >> 5.
String methods that would raise or produce undefined results if >> used on str containing surrogate-encoded bytes need to be taught >> to do the same on non-ASCII bytes in 7-bit str objects. > > Do you have an example of such string methods? > > >> 6. On output the 'ascii-compatible' codec simply memcpy's 7-bit str >> and pure ASCII 8-bit str, and raises on anything else. (Sorry, >> no, ISO 8859-1 does *not* get passed through without exception.) >> >> 7. On output other codecs raise on a 7-bit str, unless the >> surrogateescape handler is in use. > > What do you mean by "on output"? Do you mean when encoding? Presumably "output" means something like writing to a TextIOWrapper whose codec is ascii-compatible. In which case you're right, it would be clearer to just say "when encoding". However, I think there's a mistake in the design of 6 here. Surely encoding 'abc\uDCFF' should give you the bytes 61 62 63 FF, not an exception, right? (Unless the idea is that such a string is guaranteed to have a floobl-flagged 8-bit representation, not a 16-bit one, no matter how you try to create it in Python or in C, and I don't think the other rules make that guarantee.) > > This concerns me: > > b'abcd'.decode('ascii').encode('latin-1') > => returns b'abcd' > > b'abcd'.decode('ascii-compatible').encode('latin-1') > => raises Nope. The decoding returns the string 'abcd', in normal 8-bit representation, in both cases. There are no non-ASCII bytes, so the floobl flag isn't set. So you get the same result either way. > And yet, the two 'abcd' strings you get are visually indistinguishable, > and only differ by a hidden, internal flag. > > I've probably misunderstood something about your proposal, so please > explain where I've gone wrong. Please give examples!
> > > -- > Steven From ethan at stoneleaf.us Tue Jan 7 17:48:05 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 08:48:05 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52CC2FC5.9080906@stoneleaf.us> On 01/07/2014 05:00 AM, Stephen J. Turnbull wrote: > Ethan Furman writes: >> >> I just read your proposal again, and must admit I don't understand >> how it would help me, but I look forward to testing an >> implementation! >> >> One wrinkle, though -- the data is binary, and if read would have >> to be read using the latin1 codec... > > If you mean to process the binary, it would depend on what you want to > do whether it would help or not. struct- and ctypes-style processing, > no, it won't help because you need to convert to bytes to use those. > (It might make sense to read the headers into a buffer this way, parse > them as ASCII-compatible text, and then read the rest as bytes.) Pure > byte code, doesn't help, although it probably doesn't hurt. Sounds like it doesn't help me then. My binary stream is mixed: - binary that has to be converted (4-byte ints, for example) - ascii that has to be converted (ints stored as ascii text) - encoded text (character and memo fields) and the precise location of each varies from file to file.
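For comparison, each of the three kinds of field in such a mixed stream can already be converted from a plain bytes object in Python 3 using only the standard library. This is a minimal sketch. The fixed record layout, the `parse_record` helper and the cp1251 codepage for the text field are all hypothetical: they are illustrative assumptions and do not come from the thread or from any real file format.

```python
import struct


def parse_record(raw, text_encoding="cp1251"):
    """Split one record into binary, ASCII-digit and encoded-text fields.

    Hypothetical fixed layout, for illustration only:
      bytes 0-3  little-endian unsigned 32-bit integer (binary)
      bytes 4-9  integer stored as space-padded ASCII digits
      bytes 10-  character field in a file-specific encoding
    """
    # Binary field: struct converts directly from the bytes object.
    (binary_int,) = struct.unpack_from("<I", raw, 0)
    # ASCII-digit field: decode strictly as ASCII, then parse the digits.
    ascii_int = int(raw[4:10].decode("ascii"))
    # Encoded text field: decode with the codec the file declares,
    # then drop trailing pad spaces.
    text = raw[10:].decode(text_encoding).rstrip()
    return binary_int, ascii_int, text


record = struct.pack("<I", 7) + b"   123" + "Привет  ".encode("cp1251")
print(parse_record(record))  # (7, 123, 'Привет')
```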
-- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 18:57:33 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 18:57:33 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> Message-ID: <20140107185733.7ad1a3be@fsol> On Tue, 07 Jan 2014 08:48:05 -0800 Ethan Furman wrote: > - ascii that has to be converted (ints stored as ascii text) > - encoded text (character and memo fields) What is the difference supposed to be between those two? Regards Antoine. From abarnert at yahoo.com Tue Jan 7 19:11:07 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 7 Jan 2014 10:11:07 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I think there are three problems with your proposal--all of which I mentioned in the long reply to Steven, but I suspect many people tl;dr'd over that, and I like your proposal enough that I want to make sure either I'm wrong, or you fix them. So: On Jan 6, 2014, at 10:37, "Stephen J. Turnbull" wrote: > So ... now that we have the flexible string representation (PEP 393), > let's add a 7-bit representation! 
The name has confused both Steven and Nick into misinterpreting the idea, and it confused me until I read over the details twice and it finally clicked, and it still doesn't make sense after I understand what you mean. This is an 8-bit representation where non-ASCII bytes are used to smuggle non-ASCII bytes. Just like the existing 16-bit representation where surrogate escapes are used to smuggle non-ASCII bytes. It's not a 7-bit representation unless there's nothing but ASCII in it--and it's never used in the case where there's nothing but ASCII. I'm not sure what the right word is, but this isn't it. > 1. It is only produced on input by a new 'ascii-compatible' codec, This name might also be confusing people. > > 3. When combined with a str in 8-bit representation: > > a. If the 8-bit str contains any Latin-1 or C1 characters, both > strs are promoted to 16-bit, and non-ASCII characters in the > 7-bit string are converted by the surrogateescape handler. This part worries me a bit. The bytes 61 62 63 FF in this new representation actually _mean_ 'abc' followed by a smuggled FF byte. But the words 0061 0062 0063 DCFF in a 16-bit representation just mean 'abc\uDCFF', which _can be interpreted_, via the surrogate-escape mechanism, as 'abc' and a smuggled byte, but don't actually _mean_ that. It seems like your proposal only works if we change it so that they really _do_ mean that. > 6. On output the 'ascii-compatible' codec simply memcpy's 7-bit str > and pure ASCII 8-bit str, and raises on anything else. So if a 7-bit string gets converted to a surrogate-escaped 16-bit string, it can never be written out again? For a contrived example: (b'abc\xff'.decode('ascii-compatible') + '\u1234')[:4].encode('ascii-compatible') I'd expect to get back my b'abc\xff'. But your rules give me an exception.
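To compare with what is possible today, this contrived example can be approximated by using the existing 'ascii' codec with the surrogateescape error handler in place of the proposed 'ascii-compatible' codec, which does not exist. This is a sketch under that substitution only. With it, the round trip does give back the original bytes. It also shows that a genuine U+00FF character is a different string from a smuggled 0xFF byte.

```python
raw = b'abc\xff'

# Stand-in for the proposed codec: the existing surrogateescape
# handler decodes the undecodable byte 0xFF to the lone surrogate U+DCFF.
s = raw.decode('ascii', 'surrogateescape')
print(repr(s))                  # 'abc\udcff'

# Concatenate U+1234, slice it off again, then encode back to bytes.
back = (s + '\u1234')[:4].encode('ascii', 'surrogateescape')
print(back)                     # b'abc\xff'

# A real U+00FF character is not a smuggled byte: it compares unequal,
# and the same handler refuses to turn it back into byte 0xFF.
print(s == 'abc\xff')           # False
try:
    'abc\xff'.encode('ascii', 'surrogateescape')
except UnicodeEncodeError:
    print('U+00FF cannot be encoded back to a byte here')
```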
Maybe you were expecting this to be taken care of in the slicing, but rule 1 makes that impossible; you can never get a 7-bit string by doing anything but decoding ascii-compatible (or combining two 7-bit strings). I think ascii-compatible has to accept non-8-bit-repr strings (encoding ASCII as ASCII and surrogate escapes as bytes, and raising an exception on everything else). This is necessary because 61 62 63 FF (7-bit) and 0061 0062 0063 DCFF (16-bit) are the same string anyway. But it's especially necessary because the former can be silently converted into the latter (and there's no way to even test whether that's happened). Of course that means biting the bullet and saying that \uDCFF in Python really means a smuggled FF byte, rather than just being a way to smuggle an FF byte through Unicode if you want to do so explicitly. But as I said above, I think you've already bitten that bullet. From stephen at xemacs.org Tue Jan 7 19:33:19 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 03:33:19 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> Message-ID: <87ppo3hcb4.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > I haven't been following the discussion in detail (linux.conf.au and > the Py3 discussions have most of my attention this week), but I'm > definitely not clear on how this 7-bit proposal differs meaningfully > from just using ascii with the surrogateescape error handler. > Cheers, Nick. It doesn't differ meaningfully to me. I doubt I'll be writing any programs in the near future that aren't just as well and efficiently done by decoding as ascii with surrogateescape.
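One rough, CPython-specific way to observe the representation cost of the ascii + surrogateescape approach is shown here. Under PEP 393, a single surrogate escape widens the whole string to 2 bytes per character, while decoding the same bytes as Latin-1 keeps 1 byte per character. This is only an illustration: the exact numbers reported by `sys.getsizeof` vary between CPython versions, so only the relative sizes are meaningful.

```python
import sys

payload = b'x' * 1000

ascii_text = payload.decode('ascii')
# One non-ASCII byte, smuggled in as the lone surrogate U+DCFF.
escaped = (payload + b'\xff').decode('ascii', 'surrogateescape')
# The same bytes decoded as Latin-1 end in U+00FF instead.
latin1 = (payload + b'\xff').decode('latin-1')

for label, text in [('ascii', ascii_text), ('surrogateescape', escaped), ('latin-1', latin1)]:
    # In CPython the surrogate forces 2 bytes per character;
    # the Latin-1 result still fits in 1 byte per character.
    print(f'{label:16} chars={len(text):5} bytes={sys.getsizeof(text)}')
```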
It does give you an 8-bit representation, with the benefits that gives you (very fast encode and fast decode), whereas the ascii + surrogateescape approach gives you a 16-bit representation sometimes. Some people seem to care about that, eg, it seems to fit the chunked HTTP use-case perfectly. It gives you an 8-bit almost-bytes type without the b prefix on literals. I don't know if that would actually be useful to anybody. Finally (and again, I haven't thought this through) you have a halfway house that can in principle be mixed more or less freely with either bytes (and bytearray and memoryview) or Unicode, but not with both. (There is intentionally no way to get back to "ascii-compatible" representation from one of the other str representations, and in the same way combining with one of the bytes types would give a bytes type.) I realize this probably doesn't work without modification because as designed it *is* str and the type system wouldn't be able to distinguish between the ascii-compatible representation and a str in another representation. So maybe this would bring us back to the idea of a new bytestring type. I'll get back to Steven's post later, but it and others seem to be stuck in the greylist. (Hate spam, hate spam, hate what spam does to us....) From ethan at stoneleaf.us Tue Jan 7 19:10:19 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 10:10:19 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <20140107185733.7ad1a3be@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> Message-ID: <52CC430B.6040908@stoneleaf.us> On 01/07/2014 09:57 AM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 08:48:05 -0800 > Ethan Furman wrote: >> - ascii that has to be converted (ints stored as ascii text) >> - encoded text (character and memo fields) > > What is the difference supposed to be between those two? The method used for conversion and the return type: - ascii-encoded text: b'123' --> int(123) - encoded text (ascii or russian or asian or ...): b'abc' --> u'abc' and for completeness: - binary integer: b'\x00\x01' --> int(1) -- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 19:47:52 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 19:47:52 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> Message-ID: <20140107194752.304604a1@fsol> On Tue, 07 Jan 2014 10:10:19 -0800 Ethan Furman wrote: > On 01/07/2014 09:57 AM, Antoine Pitrou wrote: > > On Tue, 07 Jan 2014 08:48:05 -0800 > > Ethan Furman wrote: > >> - ascii that has to be converted (ints stored as ascii text) > >> - encoded text (character and memo fields) > > > > What is the difference supposed to be between those two? > > The method used for conversion and the return type: > > - ascii-encoded text: b'123' --> int(123) > - encoded text (ascii or russian or asian or ...): b'abc' --> u'abc' I'm sorry, I still don't parse this. What is it in Python 3.3 that prevents you from doing this? Regards Antoine. From ethan at stoneleaf.us Tue Jan 7 19:38:40 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 10:38:40 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CC45E4.7010400@mrabarnett.plus.com> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> Message-ID: <52CC49B0.2090406@stoneleaf.us> On 01/07/2014 10:22 AM, MRAB wrote: > On 2014-01-07 17:46, Andrew Barnert wrote: >> On Jan 7, 2014, at 7:44, Steven D'Aprano wrote: >> > I was thinking about Ethan's suggestion of introducing a new bytestring > class and a lot of these suggestions are what I thought the bytestring > class could do. 
>>> >>> Suppose we take a pure-ASCII byte-string and decode it: >>> >>> b'abcd'.decode('ascii-compatible') >>> > That would be: > > bytestring(b'abcd') > > or even: > > bytestring('abcd') > > [snip] >> >>> Suppose we take a byte-string with a non-ASCII byte: >>> >>> b'abc\xFF'.decode('ascii-compatible') >>> > That would be: > > bytestring(b'abc\xFF') > > Bytes outside the ASCII range would be mapped to Unicode low > surrogates: > > bytestring(b'abc\xFF') == bytestring('abc\uDCFF') Not sure what you mean here. The resulting bytes should be 'abc\xFF' and of length 4. -- ~Ethan~ From python at mrabarnett.plus.com Tue Jan 7 20:32:26 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 07 Jan 2014 19:32:26 +0000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CC49B0.2090406@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> Message-ID: <52CC564A.7040602@mrabarnett.plus.com> On 2014-01-07 18:38, Ethan Furman wrote: > On 01/07/2014 10:22 AM, MRAB wrote: >> On 2014-01-07 17:46, Andrew Barnert wrote: >>> On Jan 7, 2014, at 7:44, Steven D'Aprano wrote: >>> >> I was thinking about Ethan's suggestion of introducing a new bytestring >> class and a lot of these suggestions are what I thought the bytestring >> class could do. 
> >>>> >>>> Suppose we take a pure-ASCII byte-string and decode it: >>>> >>>> b'abcd'.decode('ascii-compatible') >>>> >> That would be: >> >> bytestring(b'abcd') >> >> or even: >> >> bytestring('abcd') >> >> [snip] >>> >>>> Suppose we take a byte-string with a non-ASCII byte: >>>> >>>> b'abc\xFF'.decode('ascii-compatible') >>>> >> That would be: >> >> bytestring(b'abc\xFF') >> >> Bytes outside the ASCII range would be mapped to Unicode low >> surrogates: >> >> bytestring(b'abc\xFF') == bytestring('abc\uDCFF') > > Not sure what you mean here. The resulting bytes should be 'abc\xFF' and of length 4. > 'abc\xFF' is a Unicode string, but you wouldn't be able to convert it to a bytestring because '\xFF' is a codepoint outside the ASCII range and not a low surrogate. From ethan at stoneleaf.us Tue Jan 7 19:57:04 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 10:57:04 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <20140107194752.304604a1@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> Message-ID: <52CC4E00.7000903@stoneleaf.us> On 01/07/2014 10:47 AM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 10:10:19 -0800 > Ethan Furman wrote: >> On 01/07/2014 09:57 AM, Antoine Pitrou wrote: >>> On Tue, 07 Jan 2014 08:48:05 -0800 >>> Ethan Furman wrote: >>>> - ascii that has to be converted (ints stored as ascii text) >>>> - encoded text (character and memo fields) >>> >>> What is the difference supposed to be between those two? 
>> >> The method used for conversion and the return type: >> >> - ascii-encoded text: b'123' --> int(123) >> - encoded text (ascii or russian or asian or ...): b'abc' --> u'abc' > > I'm sorry, I still don't parse this. What is it in Python 3.3 that > prevents you from doing this? Nothing at all, and that part works fine. The trouble (for me) comes in when I try to use single bytes, either when creating or extracting. The above examples were to show that Stephen J Turnbull's idea wouldn't work for me. -- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 20:59:36 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 20:59:36 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> Message-ID: <20140107205936.7706c393@fsol> On Tue, 07 Jan 2014 10:57:04 -0800 Ethan Furman wrote: > > Nothing at all, and that part works fine. > > The trouble (for me) comes in when I try to use single bytes, > either when creating or extracting. Hmm... aren't you exaggerating the trouble? It's not very difficult to work with single bytes in Python 3... Regards Antoine. From ethan at stoneleaf.us Tue Jan 7 21:07:15 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:07:15 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <20140107205936.7706c393@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> Message-ID: <52CC5E73.1060700@stoneleaf.us> On 01/07/2014 11:59 AM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 10:57:04 -0800 > Ethan Furman wrote: >> >> Nothing at all, and that part works fine. >> >> The trouble (for me) comes in when I try to use single bytes, >> either when creating or extracting. > > Hmm... aren't you exaggerating the trouble? It's not very difficult to > work with single bytes in Python 3... No, I'm not. I don't think of b'C' as the integer 67 any more than I think of the number 256 as the bytes b'\x01\x00'. I don't think of a series of bytes as a container any more than I think of a series of characters as a container. -- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 21:08:24 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 21:08:24 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> Message-ID: <20140107210824.1a60792d@fsol> On Tue, 07 Jan 2014 12:07:15 -0800 Ethan Furman wrote: > On 01/07/2014 11:59 AM, Antoine Pitrou wrote: > > On Tue, 07 Jan 2014 10:57:04 -0800 > > Ethan Furman wrote: > >> > >> Nothing at all, and that part works fine. > >> > >> The trouble (for me) comes in when I try to use single bytes, > >> either when creating or extracting. > > > > Hmm... aren't you exaggerating the trouble? It's not very difficult to > > work with single bytes in Python 3... > > No, I'm not. I don't think of b'C' as the integer 67 any more than I > think of the number 256 as the bytes b'\x01\x00'. Ethan, can you please show a practical issue you're having? From ethan at stoneleaf.us Tue Jan 7 20:43:49 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 11:43:49 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <52CC564A.7040602@mrabarnett.plus.com> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> Message-ID: <52CC58F5.2050603@stoneleaf.us> On 01/07/2014 11:32 AM, MRAB wrote: > On 2014-01-07 18:38, Ethan Furman wrote: >> On 01/07/2014 10:22 AM, MRAB wrote: >>>> On Jan 7, 2014, at 7:44, Steven D'Aprano wrote: >>>> >>>>> Suppose we take a byte-string with a non-ASCII byte: >>>>> >>>>> b'abc\xFF'.decode('ascii-compatible') >>>>> >>> That would be: >>> >>> bytestring(b'abc\xFF') >>> >>> Bytes outside the ASCII range would be mapped to Unicode low >>> surrogates: >>> >>> bytestring(b'abc\xFF') == bytestring('abc\uDCFF') >> >> Not sure what you mean here. The resulting bytes should be 'abc\xFF' and of length 4. >> > 'abc\xFF' is a Unicode string, but you wouldn't be able to convert it > to a bytestring because '\xFF' is a codepoint outside the ASCII range > and not a low surrogate. I can see terminology is going to be a pain in this thread. ;) My vision for a bytestring type (more refined): - made up of single bytes in the range 0 - 255 (no unicode anywhere) - indexing returns a bytestring of length 1, not an integer (as bytes does) - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not 'bytestring(0, 0, 0, 0, 0, 0, 0)' So my statement above of 'abc\xFF' should not be interpreted as a unicode string... I guess I'll use 'y' as an abbreviation for now: y'abc\xFF'. -- ~Ethan~ From guido at python.org Tue Jan 7 21:49:47 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jan 2014 10:49:47 -1000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <52CC58F5.2050603@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> Message-ID: On Tue, Jan 7, 2014 at 9:43 AM, Ethan Furman wrote: > My vision for a bytestring type (more refined): > > - made up of single bytes in the range 0 - 255 (no unicode anywhere) > > - indexing returns a bytestring of length 1, not an integer (as bytes > does) > > - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not > 'bytestring(0, 0, 0, 0, 0, 0, 0)' It sounds like you are just unhappy with some of the behavior of the bytes object. I agree that these two behaviors are suboptimal, but it is just too late to change them, and it's not enough to add a new type -- not by a long shot. The constructor behavior can be changed using a custom factory function. The indexing behavior, unfortunately, needs to be dealt with by changing b[i] into b[i:i+1] everywhere. -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Tue Jan 7 21:58:12 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 07 Jan 2014 20:58:12 +0000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <52CC58F5.2050603@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> Message-ID: <52CC6A64.8030000@mrabarnett.plus.com> On 2014-01-07 19:43, Ethan Furman wrote: > On 01/07/2014 11:32 AM, MRAB wrote: >> On 2014-01-07 18:38, Ethan Furman wrote: >>> On 01/07/2014 10:22 AM, MRAB wrote: >>>>> On Jan 7, 2014, at 7:44, Steven D'Aprano wrote: >>>>> >>>>>> Suppose we take a byte-string with a non-ASCII byte: >>>>>> >>>>>> b'abc\xFF'.decode('ascii-compatible') >>>>>> >>>> That would be: >>>> >>>> bytestring(b'abc\xFF') >>>> >>>> Bytes outside the ASCII range would be mapped to Unicode low >>>> surrogates: >>>> >>>> bytestring(b'abc\xFF') == bytestring('abc\uDCFF') >>> >>> Not sure what you mean here. The resulting bytes should be 'abc\xFF' and of length 4. >>> >> 'abc\xFF' is a Unicode string, but you wouldn't be able to convert it >> to a bytestring because '\xFF' is a codepoint outside the ASCII range >> and not a low surrogate. > > I can see terminology is going to be a pain in this thread. ;) > > My vision for a bytestring type (more refined): > > - made up of single bytes in the range 0 - 255 (no unicode anywhere) > > - indexing returns a bytestring of length 1, not an integer (as bytes does) > > - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not 'bytestring(0, 0, 0, 0, 0, 0, 0)' > > So my statement above of 'abc\xFF' should not be interpreted as a unicode string... I guess I'll use 'y' as an > abbreviation for now: y'abc\xFF'. > No disagreement there. The point about Unicode is about how it could behave if mixed with Unicode strings. 
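MRAB's mapping above, where each byte outside the ASCII range becomes a lone low surrogate (so b'\xFF' corresponds to '\uDCFF'), is what Python 3's built-in 'surrogateescape' error handler (PEP 383) already does. The sketch below checks that mapping. It also illustrates Guido's two suggested workarounds from earlier in the thread: slicing with b[i:i+1] to get a length-1 bytes object instead of an int, and a custom factory function so that an int argument means a byte value rather than a length. The helper names byte_at and make_bytes are hypothetical and exist only for illustration:

```python
# A sketch for Python 3; byte_at and make_bytes are hypothetical helper
# names used only for illustration.

raw = b'abc\xFF'

# 'surrogateescape' (PEP 383) maps each non-ASCII byte 0xNN to the lone
# low surrogate U+DCNN, which is the mapping MRAB describes.
text = raw.decode('ascii', 'surrogateescape')
assert text == 'abc\uDCFF'
assert len(text) == 4

# The mapping is lossless: encoding with the same handler restores the bytes.
assert text.encode('ascii', 'surrogateescape') == raw

# Indexing bytes gives an int; slicing gives a length-1 bytes object.
assert raw[3] == 255
assert raw[3:4] == b'\xff'


def byte_at(data: bytes, i: int) -> bytes:
    """Return data[i] as a length-1 bytes object instead of an int."""
    n = len(data)
    if not -n <= i < n:
        # Slicing never raises, so restore the IndexError that data[i] gives.
        raise IndexError('byte index out of range')
    if i < 0:
        i += n  # data[-1:0] would be empty, so normalise negative indices
    return data[i:i + 1]


def make_bytes(value) -> bytes:
    """Treat an int as a single byte value rather than a length."""
    if isinstance(value, int):
        return bytes([value])  # b'\x07' for 7, not seven zero bytes
    return bytes(value)
```

For example, byte_at(b'abcdef', 3) returns b'd' where b'abcdef'[3] returns 100, and make_bytes(7) returns b'\x07' where bytes(7) returns seven zero bytes. Neither helper changes the bytes type itself, which is the point Guido makes: the workarounds live in user code.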
From ethan at stoneleaf.us Tue Jan 7 21:49:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:49:11 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <20140107210824.1a60792d@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140107210824.1a60792d@fsol> Message-ID: <52CC6847.6040608@stoneleaf.us> On 01/07/2014 12:08 PM, Antoine Pitrou wrote: > On Tue, 07 Jan 2014 12:07:15 -0800 > Ethan Furman wrote: >> On 01/07/2014 11:59 AM, Antoine Pitrou wrote: >>> On Tue, 07 Jan 2014 10:57:04 -0800 >>> Ethan Furman wrote: >>>> >>>> Nothing at all, and that part works fine. >>>> >>>> The trouble (for me) comes in when I try to use single bytes, >>>> either when creating or extracting. >>> >>> Hmm... aren't you exagerating the trouble? It's not very difficult to >>> work with single bytes in Python 3... >> >> No, I'm not. I don't think of b'C' as the integer 67 any more than I >> think of the number 256 as the bytes b'\x01\xFF'. > > Ethan, can you please show a practical issue you're having? Seriously? You've already agreed with me on my first two points at the beginning of this thread. It's safe to assume I was having practical issues with those points. 
-- ~Ethan~ From ethan at stoneleaf.us Tue Jan 7 21:58:40 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 12:58:40 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> Message-ID: <52CC6A80.60100@stoneleaf.us> On 01/07/2014 12:49 PM, Guido van Rossum wrote: > On Tue, Jan 7, 2014 at 9:43 AM, Ethan Furman wrote: >> My vision for a bytestring type (more refined): >> >> - made up of single bytes in the range 0 - 255 (no unicode anywhere) >> >> - indexing returns a bytestring of length 1, not an integer (as bytes >> does) >> >> - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not >> 'bytestring(0, 0, 0, 0, 0, 0, 0)' > > It sounds like you are just unhappy with some of the behavior of the > bytes object. I agree that these two behaviors are suboptimal, but it > is just too late to change them, and it's not enough to add a new type > -- not by a long shot. The constructor behavior can be changed using a > custom factory function. The indexing behavior, unfortunately, needs > to be dealt with by changing b[i] into b[i:i+1] everywhere. Of course I'm unhappy with it, it doesn't behave the way I think it should, and it's not consistent. The reason I started the thread was to hopefully gather others requirements to have a truly distinct and useful new type. Doesn't seem to have happened, though. :( Is it too late to change the repr for bytes? 
I can't think of anywhere else in the stdlib where what you see is not what you get: --> [0, 1, 2] [0, 1, 2] --> [0, 1, 2][1] 1 --> {'this':'that', 'these':'those'} {'this': 'that', 'these': 'those'} --> {'this':'that', 'these':'those'}['these'] 'those' --> 'abcdef' 'abcdef' --> 'abcdef'[3] 'd' But of course with bytes: --> b'abcdef' b'abcdef' --> b'abcdef'[3] 100 -- ~Ethan~ From solipsis at pitrou.net Tue Jan 7 22:48:12 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 7 Jan 2014 22:48:12 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140107210824.1a60792d@fsol> <52CC6847.6040608@stoneleaf.us> Message-ID: <20140107224812.7cb45316@fsol> On Tue, 07 Jan 2014 12:49:11 -0800 Ethan Furman wrote: > On 01/07/2014 12:08 PM, Antoine Pitrou wrote: > > On Tue, 07 Jan 2014 12:07:15 -0800 > > Ethan Furman wrote: > >> On 01/07/2014 11:59 AM, Antoine Pitrou wrote: > >>> On Tue, 07 Jan 2014 10:57:04 -0800 > >>> Ethan Furman wrote: > >>>> > >>>> Nothing at all, and that part works fine. > >>>> > >>>> The trouble (for me) comes in when I try to use single bytes, > >>>> either when creating or extracting. > >>> > >>> Hmm... aren't you exagerating the trouble? It's not very difficult to > >>> work with single bytes in Python 3... > >> > >> No, I'm not. I don't think of b'C' as the integer 67 any more than I > >> think of the number 256 as the bytes b'\x01\xFF'. 
> > > > Ethan, can you please show a practical issue you're having? > > Seriously? You've already agreed with me on my first two points at the beginning of this thread. It's safe to assume I > was having practical issues with those points. Well, I agree with those points, but I still think they're minor, and not very hard to workaround. Hence my comment about "exagerating the trouble". Regards Antoine. From guido at python.org Tue Jan 7 22:52:41 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jan 2014 11:52:41 -1000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CC6A80.60100@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <52CC6A80.60100@stoneleaf.us> Message-ID: On Tue, Jan 7, 2014 at 10:58 AM, Ethan Furman wrote: > On 01/07/2014 12:49 PM, Guido van Rossum wrote: >> >> On Tue, Jan 7, 2014 at 9:43 AM, Ethan Furman wrote: >>> >>> My vision for a bytestring type (more refined): >>> >>> - made up of single bytes in the range 0 - 255 (no unicode anywhere) >>> >>> - indexing returns a bytestring of length 1, not an integer (as bytes >>> does) >>> >>> - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not >>> 'bytestring(0, 0, 0, 0, 0, 0, 0)' >> >> >> It sounds like you are just unhappy with some of the behavior of the >> bytes object. I agree that these two behaviors are suboptimal, but it >> is just too late to change them, and it's not enough to add a new type >> -- not by a long shot. The constructor behavior can be changed using a >> custom factory function. 
The indexing behavior, unfortunately, needs >> to be dealt with by changing b[i] into b[i:i+1] everywhere. > Of course I'm unhappy with it, it doesn't behave the way I think it should, > and it's not consistent. Consistent with what? (Before you rush in an answer, remember that there are almost always multiple sides to a consistency argument.) > The reason I started the thread was to hopefully gather others requirements > to have a truly distinct and useful new type. Doesn't seem to have > happened, though. :( So now is the time to man up and live with it. It's not going to change. > Is it too late to change the repr for bytes? Yes. > I can't think of anywhere else > in the stdlib where what you see is not what you get: > > --> [0, 1, 2] > [0, 1, 2] > > --> [0, 1, 2][1] > 1 > > --> {'this':'that', 'these':'those'} > {'this': 'that', 'these': 'those'} > > --> {'this':'that', 'these':'those'}['these'] > 'those' > > --> 'abcdef' > 'abcdef' > > --> 'abcdef'[3] > 'd' > > But of course with bytes: > > --> b'abcdef' > b'abcdef' > > --> b'abcdef'[3] > 100 I don't see what's wrong with those. Both produce valid expressions that, when entered, compare equal to the object whose repr() was printed. What more would you *want*? -- --Guido van Rossum (python.org/~guido) From python at 2sn.net Tue Jan 7 23:36:31 2014 From: python at 2sn.net (Alexander Heger) Date: Wed, 8 Jan 2014 09:36:31 +1100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <52CC6A80.60100@stoneleaf.us> Message-ID: >> Of course I'm unhappy with it, it doesn't behave the way I think it should, >> and it's not consistent. > > Consistent with what? (Before you rush in an answer, remember that > there are almost always multiple sides to a consistency argument.) > I don't see what's wrong with those. Both produce valid expressions > that, when entered, compare equal to the object whose repr() was > printed. What more would you *want*? I find that the definition str is inconsistent indeed, because the items in a string are strings again, not characters (or code points). I don't think there is too many other examples in Python where the same is true; indexing a list does not give a list but the item that is at the point. In [4]: type(b'abc') Out[4]: builtins.bytes In [5]: type(b'abc'[1]) Out[5]: builtins.int In [6]: type('abc') Out[6]: builtins.str In [7]: type('abc'[1]) Out[7]: builtins.str there is no byte type in Python, so the closest is int (there is a byte type in numpy); if there was one, indexing a byte array could return that, but I assume the use case would be quite limited. But that there is no "characters" but only strings of length one is a confusing concept. It is as of scalars were the same as arrays of length one. These are different concepts, however. (Though, admittedly, numpy will take arrays of length 1 as scalars at least in some cases as a convenience - though I think it should not as it prevent users from writing consistent code that will be easy to read later. The same is here the case for Python with strings.) 
In [11]: [1,2,3] + [1] Out[11]: [1, 2, 3, 1] In [12]: [1,2,3] + [1][0] TypeError: can only concatenate list (not "int") to list In [13]: 'abc' + 'd' Out[13]: 'abcd' In [14]: 'abc' + 'd'[0] Out[14]: 'abcd' so, yes, the interface to strings and arrays is inconsistent. At least in this aspect. From tjreedy at udel.edu Tue Jan 7 23:38:42 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 07 Jan 2014 17:38:42 -0500 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CC58F5.2050603@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> Message-ID: On 1/7/2014 2:43 PM, Ethan Furman wrote: > My vision for a bytestring type (more refined): > - made up of single bytes in the range 0 - 255 (no unicode anywhere) > > - indexing returns a bytestring of length 1, not an integer (as bytes > does) > > - `bytestring(7)` either fails, or returns 'bytestring('\x07')' not > 'bytestring(0, 0, 0, 0, 0, 0, 0)' To me, a major feature of Python is that it a) has more than one basic structure type (versus just strings or symbolic expressions) but b) is conservative in its multiplicity. It is not minimal, but it is minimalistic. It took over a decade for Guido to agree that Python should have separate built-in bool and set classes instead of just using ints as bools and tuples, lists, and dicts as sets, or using imported classes for either. The above describes a minor variation on bytes and seems to me to be a classic case for subclassing, whether in Python for ease or C for speed, in an imported module. The result could be kept private or made public as you wish. 
Yes, the minor differences would be important to you, the author of the subclass, but that is always the motivation for subclassing. One of the major advances in Python was to make it possible (in 2.2) to subclass the basic builtin structure classes. It seems to me that subclasses that work in multiple versions of Python, such as are already being used, are the appropriate solution to the specialized problems that people have with the Python string builtins. -- Terry Jan Reedy From guido at python.org Wed Jan 8 00:06:39 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Jan 2014 13:06:39 -1000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <52CC6A80.60100@stoneleaf.us> Message-ID: You're off-topic for this sub-thread. Ethan said he wanted to change the repr() of bytes, but didn't specify what change he wanted. The inconsistency in the *interface* is not under discussion any more (I've already said agree it is unfortunate, but not bad enough to warrant a new type or a backward incompatible change). On Tue, Jan 7, 2014 at 12:36 PM, Alexander Heger wrote: >>> Of course I'm unhappy with it, it doesn't behave the way I think it should, >>> and it's not consistent. >> >> Consistent with what? (Before you rush in an answer, remember that >> there are almost always multiple sides to a consistency argument.) > >> I don't see what's wrong with those. Both produce valid expressions >> that, when entered, compare equal to the object whose repr() was >> printed. What more would you *want*? 
> > I find that the definition str is inconsistent indeed, because the > items in a string are strings again, not characters (or code points). > I don't think there is too many other examples in Python where the > same is true; indexing a list does not give a list but the item that > is at the point. > > In [4]: type(b'abc') > Out[4]: builtins.bytes > > In [5]: type(b'abc'[1]) > Out[5]: builtins.int > > In [6]: type('abc') > Out[6]: builtins.str > > In [7]: type('abc'[1]) > Out[7]: builtins.str > > there is no byte type in Python, so the closest is int (there is a > byte type in numpy); if there was one, indexing a byte array could > return that, but I assume the use case would be quite limited. But > that there is no "characters" but only strings of length one is a > confusing concept. It is as of scalars were the same as arrays of > length one. These are different concepts, however. (Though, > admittedly, numpy will take arrays of length 1 as scalars at least in > some cases as a convenience - though I think it should not as it > prevent users from writing consistent code that will be easy to read > later. The same is here the case for Python with strings.) > > In [11]: [1,2,3] + [1] > Out[11]: [1, 2, 3, 1] > > In [12]: [1,2,3] + [1][0] > TypeError: can only concatenate list (not "int") to list > > In [13]: 'abc' + 'd' > Out[13]: 'abcd' > > In [14]: 'abc' + 'd'[0] > Out[14]: 'abcd' > > so, yes, the interface to strings and arrays is inconsistent. At > least in this aspect. -- --Guido van Rossum (python.org/~guido) From dreamingforward at gmail.com Wed Jan 8 00:20:45 2014 From: dreamingforward at gmail.com (Mark Janssen) Date: Tue, 7 Jan 2014 17:20:45 -0600 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <52CC5E73.1060700@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> Message-ID: >>> The trouble (for me) comes in when I try to use single bytes, >>> either when creating or extracting. >> >> Hmm... aren't you exagerating the trouble? It's not very difficult to >> work with single bytes in Python 3... > > No, I'm not. I don't think of b'C' as the integer 67 any more than I think > of the number 256 as the bytes b'\x01\xFF'. There's something fundamentally wrong with these brainfarts coming out on the list. Just how, Ethan, did you think you could represent binary data in a text string, whether preceded by the char 'b' or not? What did you think you would do when you got to character 0, the first (pseudo)-symbol in ASCII? Why don't you jackasses start listening instead of wanking each other with bullshit? markj From dreamingforward at gmail.com Wed Jan 8 00:49:32 2014 From: dreamingforward at gmail.com (Mark Janssen) Date: Tue, 7 Jan 2014 17:49:32 -0600 Subject: [Python-ideas] The fools shall start sucking the cock. Message-ID: Okay, how's everyone doing with their Python 2 vs.3, bytes/unicode vs. shit-extruder expertise? Anyone need some relief, perhaps some guidance? markj *kicks feet up to table* From brett at python.org Wed Jan 8 01:07:19 2014 From: brett at python.org (Brett Cannon) Date: Tue, 7 Jan 2014 19:07:19 -0500 Subject: [Python-ideas] The fools shall start sucking the cock. 
In-Reply-To: References: Message-ID: That language is not called for (what the heck is the subject line even supposed to mean?). While I'm not saying you can use a swear word here or there to punctuate a statement, being this over-the-top is not considerate of others. On Tue, Jan 7, 2014 at 6:49 PM, Mark Janssen wrote: > Okay, how's everyone doing with their Python 2 vs.3, bytes/unicode > vs. shit-extruder expertise? > > Anyone need some relief, perhaps some guidance? > > markj > *kicks feet up to table* From steve at pearwood.info Wed Jan 8 01:39:11 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Jan 2014 11:39:11 +1100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <52CC2FC5.9080906@stoneleaf.us> References: <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> Message-ID: <20140108003910.GL29356@ando> On Tue, Jan 07, 2014 at 08:48:05AM -0800, Ethan Furman wrote: > [...] My binary stream is mixed: > > - binary that has to be converted (4-byte ints, for example) > - ascii that has to be converted (ints stored as ascii text) > - encoded text (character and memo fields) Ethan, you keep referring to ascii text and encoded text as if they are different things. They're not. You have a binary file containing bytes. Some of those bytes represent data of one kind (say, 4-bit ints).
Some of those bytes represent data of a different kind (Latin-1 encoded text representing character and memo fields) and other bytes represent data of a third kind (ASCII encoded text representing ints, but you don't mention what the meaning of those ints is). ASCII or Latin-1, the text is still encoded into bytes, and still needs to be decoded back to text. Since Latin-1 is a superset of ASCII, you could use Latin-1 for them all, and still get the same result. Of course you can't just decode the entire file into Latin-1, since parts of it represent non-text data, but you could decode all the text parts individually using Latin-1 and/or ASCII. (To those reading and wondering how I know the character and memo fields use Latin-1, Ethan has discussed this case on comp.lang.python.) -- Steven From ethan at stoneleaf.us Wed Jan 8 01:56:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 16:56:37 -0800 Subject: [Python-ideas] [OT] banning Mark Janssen Message-ID: <52CCA245.7090005@stoneleaf.us> Moderators, Mark Janssen's posts are becoming extremely abusive, which seems to me to be against the code of conduct. Can we ban him, at least from the mailing lists? -- ~Ethan~ From songofacandy at gmail.com Wed Jan 8 01:50:30 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 8 Jan 2014 09:50:30 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> Message-ID: I'm a developer of `PyMySQL `_ (a pure Python MySQL driver). I'd like to share my experience of suffering from bytes not supporting %-formatting.

`MySQL-python `_ is the most widely used DB-API 2.0 driver for MySQL. Other MySQL drivers, such as PyMySQL and MySQL-connector-python, are designed to be as compatible with it as possible. MySQL-python uses the 'format' paramstyle.

http://www.python.org/dev/peps/pep-0249/#paramstyle
https://github.com/farcepest/MySQLdb1/blob/master/MySQLdb/__init__.py#L27

The MySQL protocol is basically encoded text, but it may contain arbitrary (escaped) binary data. Here is a simplified example that constructs real SQL from an SQL format string and its arguments (works only on Python 2.7):

    def escape_string(s):
        return s.replace("'", "''")

    def convert(x):
        if isinstance(x, unicode):
            x = x.encode('utf-8')  # Real code uses the encoding assigned to the connection.
        if isinstance(x, str):
            x = "'" + escape_string(x) + "'"  # 'quoted and '' escaped string'
        else:
            x = str(x)  # like 42
        return x

    def build_query(query, *args):
        if isinstance(query, unicode):
            query = query.encode('utf-8')
        return query % tuple(map(convert, args))

    textdata = b"hello"
    bindata = b"abc\xff\x00"
    query = "UPDATE table SET textcol=%s, bincol=%s"
    print build_query(query, textdata, bindata)

I can't port this to Python 3. Fortunately, MySQL supports hex strings like x'616263ff00', so I use them, and that is how PyMySQL supports binary data on Python 3.
But a hex string takes twice as much space as normal (escaped) bytes. This is why I don't use hex strings on Python 2. https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/converters.py#L303 https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/converters.py#L71 On Wed, Jan 8, 2014 at 8:20 AM, Mark Janssen wrote: > >>> The trouble (for me) comes in when I try to use single bytes, > >>> either when creating or extracting. > >> > >> Hmm... aren't you exagerating the trouble? It's not very difficult to > >> work with single bytes in Python 3... > > > > No, I'm not. I don't think of b'C' as the integer 67 any more than I > think > > of the number 256 as the bytes b'\x01\xFF'. > > There's something fundamentally wrong with these brainfarts coming out > on the list. Just how, Ethan, did you think you could represent > binary data in a text string, whether preceded by the char 'b' or not? > What did you think you would do when you got to character 0, the > first (pseudo)-symbol in ASCII? > > Why don't you jackasses start listening instead of wanking each other > with bullshit? > > markj -- INADA Naoki From ethan at stoneleaf.us Wed Jan 8 02:19:38 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 07 Jan 2014 17:19:38 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <20140108003910.GL29356@ando> References: <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140108003910.GL29356@ando> Message-ID: <52CCA7AA.8090102@stoneleaf.us> On 01/07/2014 04:39 PM, Steven D'Aprano wrote: > On Tue, Jan 07, 2014 at 08:48:05AM -0800, Ethan Furman wrote: > >> [...] My binary stream is mixed: >> >> - binary that has to be converted (4-byte ints, for example) >> - ascii that has to be converted (ints stored as ascii text) >> - encoded text (character and memo fields) > > Ethan, you keep referring to ascii text and encoded text as if they are > different things. They're not. Would you feel better if I called them ASCII-encoded text, and other-encoded text? And they are different, if for no other reason than they are using different encodings. Further, the ASCII-encoded text can be directly compared with byte sequences because . . . they're bytes! ;) > You have a binary file containing bytes. > Some of those bytes represent data of one kind (say, 4-bit ints). Some > of those bytes represent data of a different kind (Latin-1 encoded text > representing character and memo fields) and other bytes represent data > of a third kind (ASCII encoded text representing ints, but you don't > mention what the meaning of those ints is). ASCII-encoded text representing ints are ints. I don't know what they mean, but presumably they have something to do with whatever the user named the field. For example, I would imagine that b'35' in an AGE field meant 35 years; luckily I only have to give the user back the integer 35, not figure out what it's supposed to mean. > ASCII or Latin-1, the text is still encoded into bytes, and still needs > to be decoded back to text.
No, it doesn't. I don't need to convert b'35' into u'35' to convert to 35. I don't need to convert b'N' to u'N' to know I have a Numeric field, nor b'T' to u'T' to get True. -- ~Ethan~ From steve at pearwood.info Wed Jan 8 03:20:18 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Jan 2014 13:20:18 +1100 Subject: [Python-ideas] [OT] banning Mark Janssen In-Reply-To: <52CCA245.7090005@stoneleaf.us> References: <52CCA245.7090005@stoneleaf.us> Message-ID: <20140108022018.GN29356@ando> On Tue, Jan 07, 2014 at 04:56:37PM -0800, Ethan Furman wrote: > Moderators, > > Mark Janssen's posts are becoming extremely abusive, which seems to me to > be against he code of conduct. > > Can we ban him, at least from the mailing lists? I think he should be given one formal warning, but won't object if the moderators decide to just kick his arse out of here. It isn't as if he contributes anything useful to the discussion. For the record, I have no objection to swearing or profanity (we're all adults here, or at least we're supposed to act like them), but there is a difference between "rude words" and abuse, and Mark crosses the line into abuse. (I would also like to preemptively state that I object in the strongest possible terms to a blanket "no swearing" policy, just in case anyone is thinking of introducing such a thing.) -- Steven From abarnert at yahoo.com Wed Jan 8 04:32:22 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 7 Jan 2014 19:32:22 -0800 Subject: [Python-ideas] The fools shall start sucking the cock. In-Reply-To: References: Message-ID: <8E16FD25-2B34-416A-BB34-0B2943FD3FD4@yahoo.com> On Jan 7, 2014, at 16:07, Brett Cannon wrote: > That language is not called for Personally, I find it useful. 
When I have no idea what a message means, sometimes that means I have to put more effort into it--maybe the author is way above my level of expertise, or maybe he's writing English as a third language--and sometimes it means I can just ignore it--maybe it's contentless, a troll, or the product of insanity. A subject line like this makes it much faster to figure out which case this is. > (what the heck is the subject line even supposed to mean?). While I'm not saying you can use a swear word here or there to punctuate a statement, being this over-the-top is not considerate of others. > > > On Tue, Jan 7, 2014 at 6:49 PM, Mark Janssen wrote: >> Okay, how's everyone doing with their Python 2 vs.3, bytes/unicode >> vs. shit-extruder expertise? >> >> Anyone need some relief, perhaps some guidance? >> >> markj >> *kicks feet up to table* From haoyi.sg at gmail.com Wed Jan 8 04:41:57 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Tue, 7 Jan 2014 19:41:57 -0800 Subject: [Python-ideas] [OT] banning Mark Janssen In-Reply-To: <20140108022018.GN29356@ando> References: <52CCA245.7090005@stoneleaf.us> <20140108022018.GN29356@ando> Message-ID: I'm for banning him. He has contributed in discussion occasionally, but abuse is abuse.
On Tue, Jan 7, 2014 at 6:20 PM, Steven D'Aprano wrote: > On Tue, Jan 07, 2014 at 04:56:37PM -0800, Ethan Furman wrote: > > Moderators, > > > > Mark Janssen's posts are becoming extremely abusive, which seems to me to > > be against he code of conduct. > > > > Can we ban him, at least from the mailing lists? > > I think he should be given one formal warning, but won't object if the > moderators decide to just kick his arse out of here. It isn't as if he > contributes anything useful to the discussion. > > For the record, I have no objection to swearing or profanity (we're all > adults here, or at least we're supposed to act like them), but there is > a difference between "rude words" and abuse, and Mark crosses the line > into abuse. > > (I would also like to preemptively state that I object in the strongest > possible terms to a blanket "no swearing" policy, just in case anyone is > thinking of introducing such a thing.) > > > -- > Steven From tim.peters at gmail.com Wed Jan 8 04:52:01 2014 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 7 Jan 2014 21:52:01 -0600 Subject: [Python-ideas] The fools shall start sucking the cock. In-Reply-To: References: Message-ID: [Brett Cannon] > That language is not called for (what the heck is the subject line even > supposed to mean?). Allow me to clarify: a cock is the male of any species of bird, not just a rooster. I was confused too before I looked that up ;-) > While I'm not saying you can use a swear word here or > there to punctuate a statement, being this over-the-top is > not considerate of others. I blame it on the PSF.
Apparently we haven't been clear enough on what we're looking for when voting on Community Service Awards: http://www.python.org/community/awards/psf-awards/ still-wondering-what-the-wise-shall-start-doing-ly y'rs - tim From stephen at xemacs.org Wed Jan 8 07:04:44 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 15:04:44 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> Message-ID: <87fvozggar.fsf@uwakimon.sk.tsukuba.ac.jp> I'm responding here rather than directly to Steven because Andrew explains it as well as I could. In all cases where I don't comment, Andrew is 100% correct as to my intended semantics. The critical point is just that in cases where "the ASCII characters are themselves" and an 8-bit representation is theoretically possible, an 8-bit representation is used. More precisely, if the identities of 128-255 as characters is not important to the programmer, these bytes are not interpreted as characters, in the same way that surrogate-escaped bytes are uninterpreted in the current representation. Andrew Barnert writes: > I think Stephen's name "7-bit" is confusing people. Indeed, and I apologize for confusing Steven in particular, which is entirely due to that poor choice. > If you try to interpret the name sensibly, you get Steven's broken > interpretation. But if you read it as a nonsense word and work > through the logic, it all makes sense. Maybe "ascii-compatible" is better. It's a union type, including all encodings where octets 0-127 receive the standard mapping to the ASCII characters, but octets 128-255 are ambiguous.
> > Suppose we take a byte-string with a non-ASCII byte: > > > > b'abc\xFF'.decode('ascii-compatible') > > > > This will return... what? I think it returns a so-called 7-bit > > representation, but I'm not sure what it is a representation of. > > The representation is the bytes 61 62 63 FF with the floobl flag > set. It's a representation of an 'a' char, a 'b' char, a 'c' char, > and a smuggled FF byte--identical to 'abc\uDCFF'. Except that it's an 8-bit representation invisible to Python except for maybe the timeit package, yes. > (This last bit is the part I'm a bit wary of, as it promoted > surrogate-escape to being an inherent part of the meaning of > Unicode strings in Python. They're already part of the inherent meaning of Unicode strings. The alternative is to read ASCII-compatible streams as latin1, which *changes their meaning*. > > Your description confuses me. The "7-bit string" is already text, how do > > you decode it to the 16-bit internal representation? > > By decoding its representation as if it were bytes, using surrogate-escape. Strictly speaking, it's not a "decoding", it's a change of internal representation. > >> 5. String methods that would raise or produce undefined results if > >> used on str containing surrogate-encoded bytes need to be taught > >> to do the same on non-ASCII bytes in 7-bit str objects. > > > > Do you have an example of such string methods? No, I don't, but I imagined there might be some. (My original example was case conversion, but that doesn't work because Python doesn't check for whether something is actually a code point that can be a character, even -- it just notices that surrogate-encoded bytes don't have alternative cases in the database and passes them through.) > >> 7. On output other codecs raise on a 7-bit str, unless the > >> surrogateescape handler is in use. > > > > What do you mean by "on output"? Do you mean when encoding? Yes. You (all, but Steven in particular) have my apology for the imprecision. 
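For comparison, the existing Python 3 behaviour that the proposed representation is described as equivalent to can be reproduced with the standard `surrogateescape` error handler (PEP 383). This is a sketch using only stdlib codecs; the `'ascii-compatible'` codec discussed in the thread is hypothetical and is not used here.

```python
raw = b"abc\xff"

# Decoding with surrogateescape smuggles the undecodable byte 0xFF
# through as the lone surrogate U+DCFF instead of raising.
text = raw.decode("ascii", "surrogateescape")
print(repr(text))

# Encoding with the same handler turns U+DCFF back into the raw byte,
# so the round trip is lossless.
print(text.encode("ascii", "surrogateescape"))

# Without the handler, encoding the smuggled byte raises, which is the
# behaviour point 7 describes for codecs not using surrogateescape.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```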
> However, I think there's a mistake in the design of 6 here. Surely > encoding 'abc\uDCFF' should give you the bytes 61 62 63 FF, not an > exception, right? (Unless the idea is that such a string is > guaranteed to have a floobl-flagged 8-bit representation, not a > 16-bit one, no matter how you try to create it in Python or in C, > and I don't think the other rules make that guarantee.) Andrew is correct, that is a mistake in design. I thought an 8-bit representation was guaranteed in that case, with the "floobl" flag set. I think that Andrew's idea is correct, but this miss makes me nervous about the coherence of the concept. From stephen at xemacs.org Wed Jan 8 07:08:17 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 15:08:17 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> Message-ID: <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > The above describes a minor variation on bytes and seems to me to be a > classic case for subclassing, whether in Python for ease or C for speed, > in an imported module. I agree with you, but the discussion on python-dev indicates that the majority of core devs, including Guido IIUC, disagree with us. In fact they want to add many str-like capabilities to bytes (and the related mutable classes bytearray and memoryview). From stephen at xemacs.org Wed Jan 8 07:18:24 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 15:18:24 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <52CC2FC5.9080906@stoneleaf.us> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> Message-ID: <87d2k3gfnz.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Sounds like it doesn't help me then. My binary stream is mixed: > > - binary that has to be converted (4-byte ints, for example) > - ascii that has to be converted (ints stored as ascii text) > - encoded text (character and memo fields) > > and the precise location of each varies from file to file. Yes, I understand all that, but without code examples (or rather precise specification of the semantics you're implementing) I can't discuss whether my 'ascii-compatible' (the Artist Formerly Known as "7-bit representation") would help you write efficient and readable code. Cf. INADA-san's post for what would help me. From breamoreboy at yahoo.co.uk Wed Jan 8 09:02:20 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 08 Jan 2014 08:02:20 +0000 Subject: [Python-ideas] [OT] banning Mark Janssen In-Reply-To: <52CCA245.7090005@stoneleaf.us> References: <52CCA245.7090005@stoneleaf.us> Message-ID: On 08/01/2014 00:56, Ethan Furman wrote: > Moderators, > > Mark Janssen's posts are becoming extremely abusive, which seems to me > to be against he code of conduct. > > Can we ban him, at least from the mailing lists? > > -- > ~Ethan~ He's a complete waste of space, please get rid of him. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From ncoghlan at gmail.com Wed Jan 8 10:59:33 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 8 Jan 2014 19:59:33 +1000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8 Jan 2014 14:08, "Stephen J. Turnbull" wrote: > > Terry Reedy writes: > > > The above describes a minor variation on bytes and seems to me to be a > > classic case for subclassing, whether in Python for ease or C for speed, > > in an imported module. > > I agree with you, but the discussion on python-dev indicates that the > majority of core devs, including Guido IIUC, disagree with us. In > fact they want to add many str-like capabilities to bytes (and the > related mutable classes bytearray and memoryview). That's far from a foregone conclusion. The main problem we've had over the past few years is the inability to get past "just give us back the Python 2 str type" responses from wire protocol developers attempting to migrate that aren't happy with the approach of manipulating data in the text domain and on to actual experiments with a suitable type for wire protocol development that interoperates nicely with the Python 3 text model. Now that your proposal has been better explained, yes, I agree that "asciibytes" and "asciistr" types would be well worth experimenting with. I mention both, since it's far from clear if a str subclass or a bytes subclass (or neither, although that may require bug fixes in CPython) would be more convenient for this use case. 
The key difference between such a type and a str with surrogate escaped elements or a Python 2 bytestring is that it would attempt to implicitly *encode* any Unicode text it encountered as strict ASCII text. This would allow text and binary processing to share code paths, with limited risk of producing mojibake (particularly since this type wouldn't be a builtin). The type would also share the str behaviour of returning a single element subsequence when indexed rather than an integer. Cheers, Nick. From terrycwk1994 at gmail.com Wed Jan 8 11:16:47 2014 From: terrycwk1994 at gmail.com (Terry Chia) Date: Wed, 8 Jan 2014 18:16:47 +0800 Subject: [Python-ideas] Strong password hashing algorithms in the standard library Message-ID: Hi all, I would like to propose that a new library for strong password hashing algorithms[1] be included in the standard library. The proposed library should have implementations of one or more strong password hashes like pbkdf2, bcrypt or scrypt. There already exist third party libraries like passlib[2] that accomplishes the same thing but I feel that inclusion of the algorithms in the standard library would do a lot to help people that are not as security-aware to do the right thing when it comes to password storage. Alternatively, if the idea of adding the algorithms into the standard library does not have much support, I would like to see a warning added to the hashlib[3] documentation discouraging its use for password hashing. Thoughts?
Cheers, Terry [1] http://security.stackexchange.com/q/211/10211 [2] https://code.google.com/p/passlib/ [3] http://docs.python.org/2/library/hashlib.html From breamoreboy at yahoo.co.uk Wed Jan 8 11:18:12 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 08 Jan 2014 10:18:12 +0000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 08/01/2014 09:59, Nick Coghlan wrote: > > Now that your proposal has been better explained, yes, I agree that > "asciibytes" and "asciistr" types would be well worth experimenting > with. I mention both, since it's far from clear if a str subclass or a > bytes subclass (or neither, although that may require bug fixes in > CPython) would be more convenient for this use case. > Could you subclass both to get the best of both worlds? As in class asciixyz(str, bytes): > Cheers, > Nick. > -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From solipsis at pitrou.net Wed Jan 8 11:34:08 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 11:34:08 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> Message-ID: <20140108113408.51509b48@fsol> On Wed, 8 Jan 2014 09:50:30 +0900 INADA Naoki wrote: > > textdata = b"hello" textdata shouldn't be a bytes object! If it's text it's a str. > bindata = b"abc\xff\x00" > query = "UPDATE table SET textcol=%s bincol=%s" > > print build_query(query, textdata, bindata) > > > I can't port this to Python 3. I'm sure you can port it. Just decode your bindata using surrogateescape: bindata = bindata.decode('utf8', 'surrogateescape') and then encode the query at the end: query = query.encode('utf8', 'surrogateescape') It will be a little slower, though. Regards Antoine From solipsis at pitrou.net Wed Jan 8 11:35:31 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 11:35:31 +0100 Subject: [Python-ideas] The fools shall start sucking the cock. References: Message-ID: <20140108113531.4ebe0148@fsol> Not to mention the utter lack of content. Regards Antoine. On Tue, 7 Jan 2014 19:07:19 -0500 Brett Cannon wrote: > That language is not called for (what the heck is the subject line even > supposed to mean?). While I'm not saying you can use a swear word here or > there to punctuate a statement, being this over-the-top is not considerate > of others. > > > On Tue, Jan 7, 2014 at 6:49 PM, Mark Janssen wrote: > > > Okay, how's everyone doing with their Python 2 vs.3, bytes/unicode > > vs. shit-extruder expertise? > > > > Anyone need some relief, perhaps some guidance? 
> > > > markj > > *kicks feet up to table* From enric.tejedor at bsc.es Wed Jan 8 12:21:31 2014 From: enric.tejedor at bsc.es (Enric Tejedor) Date: Wed, 08 Jan 2014 12:21:31 +0100 Subject: [Python-ideas] Decorators on loops Message-ID: <52CD34BB.1050307@bsc.es> Hello, I would like to discuss a new use of python decorators. I apologize if this has already been suggested before. The basic idea would be to support decorators on loops, in addition to functions and classes. Something like this:

    @mydecorator
    for i in range(10):
        # loop body

In mydecorator, I would like to have access to the loop body and the iterable object. In my case, I would use this to parallelize the iterations of the loop. Thank you for your feedback, Enric From songofacandy at gmail.com Wed Jan 8 12:31:10 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 8 Jan 2014 20:31:10 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <20140108113408.51509b48@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140108113408.51509b48@fsol> Message-ID: On Wed, Jan 8, 2014 at 7:34 PM, Antoine Pitrou wrote: > On Wed, 8 Jan 2014 09:50:30 +0900 > INADA Naoki > wrote: > > > > textdata = b"hello" > > textdata shouldn't be a bytes object! If it's text it's a str. > > PyMySQL and MySQL-python supports both of unicode text and encoded text. So bytes may be text in MySQL if it inserted into TEXT or VARCHAR column. > > bindata = b"abc\xff\x00" > > query = "UPDATE table SET textcol=%s bincol=%s" > > > > print build_query(query, textdata, bindata) > > > > > > I can't port this to Python 3. > > I'm sure you can port it. Just decode your bindata using > surrogateescape: > > bindata = bindata.decode('utf8', 'surrogateescape') > > and then encode the query at the end: > > query = query.encode('utf8', 'surrogateescape') > > It will be a little slower, though. > You're right. I've not considered using surrogateescape here. But MySQL connection may be not utf8. It's default latin1 and you can use many encoding. Some encoding doesn't ensure roundtrip. In such encoding, bindata = bindata.decode('sjis', 'surrogateescape') query = query % bindata query.encode('sjis', 'surrogateescape') may break bindata. I may be able to ascii for decoding when mysql uses ascii compatible encoding. 
bindata = bindata.decode('ascii', 'surrogateescape') query = query % bindata query.encode('sjis', 'surrogateescape') But I think decode/encode with surrogateescape is not only slow, but also dangerous when using encoding except ascii or utf8. > > Regards > > Antoine -- INADA Naoki From solipsis at pitrou.net Wed Jan 8 12:38:22 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 12:38:22 +0100 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140108113408.51509b48@fsol> Message-ID: <20140108123822.252fc642@fsol> On Wed, 8 Jan 2014 20:31:10 +0900 INADA Naoki wrote: > > You're right. I've not considered using surrogateescape here. > > But MySQL connection may be not utf8. It's default latin1 and you can use > many encoding. > Some encoding doesn't ensure roundtrip. In such encoding, > [...] > > But I think decode/encode with surrogateescape is not only slow, but also > dangerous when using > encoding except ascii or utf8. You're right. Thanks for exposing your use case, I think it's a good data point for the bytes formatting PEP. Regards Antoine.
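A minimal sketch of the surrogateescape approach discussed above, assuming a UTF-8 connection encoding. `escape_string`, `convert` and `build_query` are simplified, hypothetical stand-ins modelled on INADA's example, not MySQL-python's real functions, and doubling single quotes is not a complete SQL escaping scheme. UTF-8 and ASCII round-trip arbitrary bytes under surrogateescape; as INADA points out in this thread, that guarantee does not hold for every encoding a MySQL connection may use (stateful ones such as ISO-2022-JP can drop bytes).

```python
def escape_string(s: str) -> str:
    # Simplified SQL escaping for illustration: double any single quotes.
    return s.replace("'", "''")

def convert(x) -> str:
    # Binary data is decoded losslessly: bytes that are not valid UTF-8
    # become lone surrogates (U+DC80..U+DCFF) instead of raising.
    if isinstance(x, bytes):
        return "'" + escape_string(x.decode("utf-8", "surrogateescape")) + "'"
    if isinstance(x, str):
        return "'" + escape_string(x) + "'"
    return str(x)  # e.g. integers are inserted as-is

def build_query(query: str, *args) -> bytes:
    # Format in the text domain, then encode once; surrogateescape turns
    # the smuggled surrogates back into the original raw bytes.
    return (query % tuple(map(convert, args))).encode("utf-8", "surrogateescape")

q = build_query("UPDATE t SET textcol=%s bincol=%s", "hello", b"abc\xff\x00")
print(q)  # b"UPDATE t SET textcol='hello' bincol='abc\xff\x00'"
```

The design choice is to keep all formatting in `str` and touch bytes only at the two boundaries, which is the pattern Antoine suggests; the cost is the extra decode/encode pass INADA mentions.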
From terrycwk1994 at gmail.com Wed Jan 8 12:42:23 2014 From: terrycwk1994 at gmail.com (Terry Chia) Date: Wed, 8 Jan 2014 19:42:23 +0800 Subject: [Python-ideas] Strong password hashing algorithms in the standard library In-Reply-To: References: Message-ID: That's great! Are there any plans to also include algorithms like bcrypt and scrypt given that they are stronger than pbkdf2 for GPU/FPGA-using attackers? Also, can the same warning be placed on older documentations like the 2.7 one given the large amount of people still using 2.7? On Wed, Jan 8, 2014 at 7:30 PM, Ronald Oussoren wrote: > > > On Jan 08, 2014, at 11:17 AM, Terry Chia wrote: > > Hi all, > > I would like to propose that a new library for strong password hashing > algorithms[1] > be included in the standard library. The proposed library should have > implementations > of one or more strong password hashes like pbkdf2, bcrypt or scrypt. > > There already exist third party libraries like passlib[2] that > accomplishes the same thing > but I feel that inclusion of the algorithms in the standard library would > do a lot to help > people that are not as security-aware to do the right thing when it comes > to password > storage. > > Alternatively, if the idea of adding the algorithms into the standard > library does not have > much support, I would like to see a warning added to the hashlib[3] > documentation > discouraging its use for password hashing. > > > Python 3.4 will include hashlib.pbkdf2_hmac, see < > http://docs.python.org/3.4/library/hashlib.html#key-derivation-function>. > That documentation also warns about using a plain hash function for > creating password hashes. > > Ronald
From solipsis at pitrou.net Wed Jan 8 12:42:24 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 8 Jan 2014 12:42:24 +0100 Subject: [Python-ideas] Strong password hashing algorithms in the standard library References: Message-ID: <20140108124224.69dcb257@fsol> Hi Terry, On Wed, 8 Jan 2014 18:16:47 +0800 Terry Chia wrote: > > I would like to propose that a new library for strong password hashing > algorithms[1] > be included in the standard library. The proposed library should have > implementations > of one or more strong password hashes like pbkdf2, bcrypt or scrypt. In 3.4, hashlib has gained a pbkdf2 implementation: http://docs.python.org/dev/library/hashlib.html#key-derivation-function I think other similar primitives should be added alongside. It's probably enough to open an issue on http://bugs.python.org. If you want guidance on how to contribute code, please take a look at the developers' guide: http://docs.python.org/devguide/ Best regards Antoine. From songofacandy at gmail.com Wed Jan 8 12:53:26 2014 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 8 Jan 2014 20:53:26 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <20140108123822.252fc642@fsol> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140108113408.51509b48@fsol> <20140108123822.252fc642@fsol> Message-ID: FYI, I can make sample data that is not roundtrip easily with iso2022-jp encoding.
In [5]: b'\x1b$B\x1b(B'.decode('iso2022_jp')
Out[5]: ''

In [6]: b'\x1b$B\x1b(B'.decode('iso2022_jp', 'surrogateescape').encode('iso2022_jp', 'surrogateescape')
Out[6]: b''

On Wed, Jan 8, 2014 at 8:38 PM, Antoine Pitrou wrote: > On Wed, 8 Jan 2014 20:31:10 +0900 > INADA Naoki > wrote: > > > > You're right. I've not considered using surrogateescape here. > > > > But MySQL connection may be not utf8. It's default latin1 and you can use > > many encoding. > > Some encoding doesn't ensure roundtrip. In such encoding, > > > [...] > > > > But I think decode/encode with surrogateescape is not only slow, but also > > dangerous when using > > encoding except ascii or utf8. > > You're right. Thanks for exposing your use case, I think it's a good data > point for the bytes formatting PEP. > > Regards > > Antoine. -- INADA Naoki From rosuav at gmail.com Wed Jan 8 12:59:10 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 8 Jan 2014 22:59:10 +1100 Subject: [Python-ideas] Decorators on loops In-Reply-To: <52CD34BB.1050307@bsc.es> References: <52CD34BB.1050307@bsc.es> Message-ID: On Wed, Jan 8, 2014 at 10:21 PM, Enric Tejedor wrote: > The basic idea would be to support decorators on loops, in addition to > functions and classes. Something like this: > > @mydecorator > for i in range(10): > # loop body > > In mydecorator, I would like to have access to the loop body and the > iterable object. > > In my case, I would use this to parallelize the iterations of the loop. That's a nice theory, but the basic form of the decorator wouldn't work. Here's how decorators work on functions:
Here's how decorators work on functions: @foo def bar(): pass is the same as: def bar(): pass bar = foo(bar) It depends on there being something assigned-to. With loops, that's not the case, so it's not possible to decorate them in the usual sense. Can you turn your loop into a map() call? Something like this: def loop_body(i): # all the code for your loop body list(map(loop_body, range(10))) Once you have it in that form, you can use multiprocessing.Pool() and its map() method, which will parallelize the loop for you (by distributing it over a pool of subprocesses). Would that cover what you need? ChrisA From masklinn at masklinn.net Wed Jan 8 13:08:31 2014 From: masklinn at masklinn.net (Masklinn) Date: Wed, 8 Jan 2014 13:08:31 +0100 Subject: [Python-ideas] Decorators on loops In-Reply-To: References: <52CD34BB.1050307@bsc.es> Message-ID: <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net> On 2014-01-08, at 12:59 , Chris Angelico wrote: > On Wed, Jan 8, 2014 at 10:21 PM, Enric Tejedor wrote: >> The basic idea would be to support decorators on loops, in addition to >> functions and classes. Something like this: >> >> @mydecorator >> for i in range(10): >> # loop body >> >> In mydecorator, I would like to have access to the loop body and the >> iterable object. >> >> In my case, I would use this to parallelize the iterations of the loop. > > That's a nice theory, but the basic form of the decorator wouldn't > work. Here's how decorators work on functions: > > @foo > def bar(): > pass > > is the same as: > > def bar(): > pass > bar = foo(bar) > > It depends on there being something assigned-to. With loops, that's > not the case, so it's not possible to decorate them in the usual > sense. > > Can you turn your loop into a map() call? 
Something like this: > > def loop_body(i): > # all the code for your loop body > list(map(loop_body, range(10))) > > Once you have it in that form, you can use multiprocessing.Pool() and > its map() method, which will parallelize the loop for you (by > distributing it over a pool of subprocesses). Would that cover what > you need? Alternatively, wrap the loop in a function and then do AST munging in the decorator. Something similar (in spirit at least) to Numba (http://numba.pydata.org). You could even do something like immediate function invocation in the decorator, and bind the result to the function name, although I'm not sure your coworkers will like you. From stephen at xemacs.org Wed Jan 8 13:11:40 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 08 Jan 2014 21:11:40 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> Message-ID: <87bnzmhdvn.fsf@uwakimon.sk.tsukuba.ac.jp> >>>>> INADA Naoki writes: > I share my experience that I've suffered by bytes doesn't have %-format. > `MySQL-python is a most major DB-API 2.0 driver for MySQL. > MySQL-python uses 'format' paramstyle. > MySQL protocol is basically encoded text, but it may contain arbitrary > (escaped) binary. > Here is simplified example constructing real SQL from SQL format and > arguments. (Works only on Python 2.7) '>' quotes are omitted for clarity and comments deleted. 
def escape_string(s):
    return s.replace("'", "''")

def convert(x):
    if isinstance(x, unicode):
        x = x.encode('utf-8')
    if isinstance(x, str):
        x = "'" + escape_string(x) + "'"
    else:
        x = str(x)
    return x

def build_query(query, *args):
    if isinstance(query, unicode):
        query = query.encode('utf-8')
    return query % tuple(map(convert, args))

textdata = b"hello"
bindata = b"abc\xff\x00"
query = "UPDATE table SET textcol=%s bincol=%s"

print build_query(query, textdata, bindata)

> I can't port this to Python 3.

Why not? The obvious translation is

# This is Python 3!!
def escape_string(s):
    return s.replace("'", "''")

def convert(x):
    if isinstance(x, bytes):
        x = escape_string(x.decode('ascii', errors='surrogateescape'))
        x = "'" + x + "'"
    else:
        x = str(x)
    return x

def build_query(query, *args):
    query = query % tuple(map(convert, args))
    return query.encode('utf-8', errors='surrogateescape')

textdata = "hello"
bindata = b"abc\xff\x00"
query = "UPDATE table SET textcol=%s bincol=%s"

print(build_query(query, textdata, bindata))

The main issue I can think you might have with this is that there will
need to be conversions to and from 16-bit representations, which take
up unnecessary space for bindata, and are relatively slow for bindata.
But it seems to me that these are second-order costs compared to the
other work an adapter needs to do. What am I missing?

With the proposed 'ascii-compatible' representation, if you have to
handle many MB of binary or textdata with non-ASCII characters,

def convert(x):
    if isinstance(x, str):
        x = x.encode('utf-8').decode('ascii-compatible')
    elif isinstance(x, bytes):
        x = escape_string(x.decode('ascii-compatible'))
        x = "'" + x + "'"
    else:
        x = str(x)  # like 42
    return x

def build_query(query, *args):
    query = convert(query) % tuple(map(convert, args))
    return query.encode('utf-8', errors='surrogateescape')

ensures that the '%' format operator is always dealing with 8-bit
representations only.
There might be a conversion from 16-bit to 8-bit for str, but there
will be no conversions from 8-bit to 16-bit representations. I don't
know if that makes '%' itself faster, but it might.

From ned at nedbatchelder.com Wed Jan 8 13:17:36 2014
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Wed, 08 Jan 2014 07:17:36 -0500
Subject: [Python-ideas] Decorators on loops
In-Reply-To: <52CD34BB.1050307@bsc.es>
References: <52CD34BB.1050307@bsc.es>
Message-ID: <52CD41E0.7040306@nedbatchelder.com>

On 1/8/14 6:21 AM, Enric Tejedor wrote:
> Hello,
>
> I would like to discuss a new use of python decorators. I apologize if
> this has already been suggested before.
>
> The basic idea would be to support decorators on loops, in addition to
> functions and classes. Something like this:
>
> @mydecorator
> for i in range(10):
>     # loop body
>
> In mydecorator, I would like to have access to the loop body and the
> iterable object.

In the case of function and class decorators, Python has an object that
can be passed to the decorator: the function or the class. For a loop
decorator, how would you "have access to the loop body"? It sounds like
it would have to be compiled differently, into a separate code object?

--Ned.

>
> In my case, I would use this to parallelize the iterations of the loop.
>
>
> Thank you for your feedback,
>
>
> Enric
>
> WARNING / LEGAL TEXT: This message is intended only for the use of the
> individual or entity to which it is addressed and may contain
> information which is privileged, confidential, proprietary, or exempt
> from disclosure under applicable law. If you are not the intended
> recipient or the person responsible for delivering the message to the
> intended recipient, you are strictly prohibited from disclosing,
> distributing, copying, or in any way using this message. If you have
> received this communication in error, please notify the sender and
> destroy and delete any copies you may have received.
>
> http://www.bsc.es/disclaimer

From ronaldoussoren at mac.com Wed Jan 8 12:30:12 2014
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 08 Jan 2014 11:30:12 +0000 (GMT)
Subject: [Python-ideas] Strong password hashing algorithms in the standard library
In-Reply-To:
Message-ID:

On Jan 08, 2014, at 11:17 AM, Terry Chia wrote:

Hi all,

I would like to propose that a new library for strong password hashing
algorithms[1] be included in the standard library. The proposed library
should have implementations of one or more strong password hashes like
pbkdf2, bcrypt or scrypt.

There already exist third party libraries like passlib[2] that
accomplishes the same thing but I feel that inclusion of the algorithms
in the standard library would do a lot to help people that are not as
security-aware to do the right thing when it comes to password storage.

Alternatively, if the idea of adding the algorithms into the standard
library does not have much support, I would like to see a warning added
to the hashlib[3] documentation discouraging its use for password
hashing.

Python 3.4 will include hashlib.pbkdf2_hmac, see . That documentation
also warns about using a plain hash function for creating password
hashes.

Ronald

From joshua at landau.ws Wed Jan 8 13:30:45 2014
From: joshua at landau.ws (Joshua Landau)
Date: Wed, 8 Jan 2014 12:30:45 +0000
Subject: [Python-ideas] The fools shall start sucking the cock.
In-Reply-To:
References:
Message-ID:

On 8 January 2014 00:07, Brett Cannon wrote:
> On Tue, Jan 7, 2014 at 6:49 PM, Mark Janssen wrote:
>> [insults]
>
> That language is not called for (what the heck is the subject line even
> supposed to mean?).
> While I'm not saying you can use a swear word here or
> there to punctuate a statement, being this over-the-top is not considerate
> of others.

Agreed.

Mark Janssen,

You have been disregarding acceptable public etiquette for a while in
your posts, both on python-ideas and python-list. This is a request for
you to stop.

From songofacandy at gmail.com Wed Jan 8 14:10:42 2014
From: songofacandy at gmail.com (INADA Naoki)
Date: Wed, 8 Jan 2014 22:10:42 +0900
Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <87bnzmhdvn.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <87bnzmhdvn.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

You're right. As I said in my previous mail, I had not considered using
surrogateescape.

But surrogateescape is not a silver bullet. Decoding with ascii and
encoding with the target encoding is not valid for encodings that are
not ASCII-compatible.

In [29]: bindata = b'abc'

In [30]: bindata = bindata.decode('ascii', 'surrogateescape')

In [31]: text = 'abc'

In [32]: query = 'SET textcolumn=%s bincolumn=%s' % ("'" + text + "'", "'" + bindata + "'")

In [33]: query.encode('utf16', 'surrogateescape')
Out[33]: b"\xff\xfeS\x00E\x00T\x00 \x00t\x00e\x00x\x00t\x00c\x00o\x00l\x00u\x00m\x00n\x00=\x00'\x00a\x00b\x00c\x00'\x00 \x00b\x00i\x00n\x00c\x00o\x00l\x00u\x00m\x00n\x00=\x00'\x00a\x00b\x00c\x00'\x00"

Fortunately, I can't use utf16 as client encoding with MySQL.
mysql> SET NAMES utf16;
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'

On Wed, Jan 8, 2014 at 9:11 PM, Stephen J. Turnbull wrote:

> >>>>> INADA Naoki writes:
>
> > I share my experience that I've suffered by bytes doesn't have %-format.
> > MySQL-python is a most major DB-API 2.0 driver for MySQL.
> > MySQL-python uses 'format' paramstyle.
>
> > MySQL protocol is basically encoded text, but it may contain arbitrary
> > (escaped) binary.
> > Here is simplified example constructing real SQL from SQL format and
> > arguments. (Works only on Python 2.7)
>
> '>' quotes are omitted for clarity and comments deleted.
>
> def escape_string(s):
>     return s.replace("'", "''")
>
> def convert(x):
>     if isinstance(x, unicode):
>         x = x.encode('utf-8')
>     if isinstance(x, str):
>         x = "'" + escape_string(x) + "'"
>     else:
>         x = str(x)
>     return x
>
> def build_query(query, *args):
>     if isinstance(query, unicode):
>         query = query.encode('utf-8')
>     return query % tuple(map(convert, args))
>
> textdata = b"hello"
> bindata = b"abc\xff\x00"
> query = "UPDATE table SET textcol=%s bincol=%s"
>
> print build_query(query, textdata, bindata)
>
> > I can't port this to Python 3.
>
> Why not? The obvious translation is
>
> # This is Python 3!!
> def escape_string(s):
>     return s.replace("'", "''")
>
> def convert(x):
>     if isinstance(x, bytes):
>         x = escape_string(x.decode('ascii', errors='surrogateescape'))
>         x = "'" + x + "'"
>     else:
>         x = str(x)
>     return x
>
> def build_query(query, *args):
>     query = query % tuple(map(convert, args))
>     return query.encode('utf-8', errors='surrogateescape')
>
> textdata = "hello"
> bindata = b"abc\xff\x00"
> query = "UPDATE table SET textcol=%s bincol=%s"
>
> print(build_query(query, textdata, bindata))
>
> The main issue I can think you might have with this is that there will
> need to be conversions to and from 16-bit representations, which take
> up unnecessary space for bindata, and are relatively slow for bindata.
> But it seems to me that these are second-order costs compared to the
> other work an adapter needs to do. What am I missing?
>
> With the proposed 'ascii-compatible' representation, if you have to
> handle many MB of binary or textdata with non-ASCII characters,
>
> def convert(x):
>     if isinstance(x, str):
>         x = x.encode('utf-8').decode('ascii-compatible')
>     elif isinstance(x, bytes):
>         x = escape_string(x.decode('ascii-compatible'))
>         x = "'" + x + "'"
>     else:
>         x = str(x)  # like 42
>     return x
>
> def build_query(query, *args):
>     query = convert(query) % tuple(map(convert, args))
>     return query.encode('utf-8', errors='surrogateescape')
>
> ensures that the '%' format operator is always dealing with 8-bit
> representations only. There might be a conversion from 16-bit to
> 8-bit for str, but there will be no conversions from 8-bit to 16-bit
> representations. I don't know if that makes '%' itself faster, but
> it might.

--
INADA Naoki

From dholth at gmail.com Wed Jan 8 14:52:34 2014
From: dholth at gmail.com (Daniel Holth)
Date: Wed, 8 Jan 2014 08:52:34 -0500
Subject: [Python-ideas] Strong password hashing algorithms in the standard library
In-Reply-To:
References:
Message-ID:

On Wed, Jan 8, 2014 at 6:42 AM, Terry Chia wrote:
> That's great!
>
> Are there any plans to also include algorithms like bcrypt and scrypt given
> that they are stronger than pbkdf2 for GPU/FPGA-using attackers?
>
> Also, can the same warning be placed on older documentations like the 2.7
> one given the large amount of people still using 2.7?

On some platforms os.crypt() can do bcrypt or an iterative SHA crypt
used in Red Hat etc.
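For reference, the salted, iterated hashing that Ronald's pointer to hashlib.pbkdf2_hmac implies can be sketched as below. This is an illustrative sketch, not a vetted security recipe: the 16-byte salt, the 100,000-iteration count and the helper names hash_password/verify_password are assumptions chosen for the example, and hashlib.pbkdf2_hmac itself needs Python 3.4 or later.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative parameters (assumptions for this sketch, not recommendations).
SALT_BYTES = 16
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a key with PBKDF2-HMAC-SHA256 and return (salt, key)."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt for each new password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
print(verify_password("wrong guess", salt, key))
```

The salt (and the iteration count) would be stored next to the derived key; unlike a single plain hashlib digest, the iteration count makes every guess deliberately expensive.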
From enric.tejedor at bsc.es Wed Jan 8 15:23:04 2014
From: enric.tejedor at bsc.es (Enric Tejedor)
Date: Wed, 08 Jan 2014 15:23:04 +0100
Subject: [Python-ideas] Decorators on loops
In-Reply-To: <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net>
References: <52CD34BB.1050307@bsc.es> <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net>
Message-ID: <52CD5F48.9060604@bsc.es>

Thank you for your replies,

On 08/01/14 13:08, Masklinn wrote:
>> That's a nice theory, but the basic form of the decorator wouldn't
>> work. Here's how decorators work on functions:
>>
>> @foo
>> def bar():
>>     pass
>>
>> is the same as:
>>
>> def bar():
>>     pass
>> bar = foo(bar)
>>
>> It depends on there being something assigned-to. With loops, that's
>> not the case, so it's not possible to decorate them in the usual
>> sense.
>>
>> Can you turn your loop into a map() call? Something like this:
>>
>> def loop_body(i):
>>     # all the code for your loop body
>> list(map(loop_body, range(10)))
>>
>> Once you have it in that form, you can use multiprocessing.Pool() and
>> its map() method, which will parallelize the loop for you (by
>> distributing it over a pool of subprocesses). Would that cover what
>> you need?
>>
> Alternatively, wrap the loop in a function and then do AST munging in
> the decorator. Something similar (in spirit at least) to Numba
> (http://numba.pydata.org).
>
> You could even do something like immediate function invocation in the
> decorator, and bind the result to the function name, although I'm not
> sure your coworkers will like you.
>

I would use this feature as a part of a parallel programming model for
Python apps. Ideally, the programmer would place a decorator before
their loops in order to parallelize them, similarly to OpenMP and its
pragmas.
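The map()-based route Chris suggested earlier in the thread can be made concrete with the standard multiprocessing module. A minimal sketch, assuming the loop body can be written as a module-level function of the loop index that returns its result; loop_body and its squaring body are hypothetical placeholders, not code from the thread.

```python
from multiprocessing import Pool


def loop_body(i):
    # Hypothetical stand-in for the real loop body. It must be a module-level
    # function so that multiprocessing can pickle it for the worker processes.
    return i * i


if __name__ == "__main__":
    # Sequential form: the same work as "for i in range(10): loop_body(i)".
    sequential = list(map(loop_body, range(10)))

    # Parallel form: Pool.map splits the iterations across worker processes
    # and returns the results in the original order.
    with Pool() as pool:
        parallel = pool.map(loop_body, range(10))

    assert parallel == sequential
    print(parallel)
```

Because worker processes do not share the parent's variables, results have to be returned from the body and collected by Pool.map rather than accumulated in an outer variable. The `if __name__ == "__main__"` guard matters on platforms that start workers with the spawn method.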
Yes, I could make the programmer wrap the body of their loops in
functions and then decorate those functions:

# decorator of my PM library
def parallel(iterable):
    def call(func):
        # parallelize the iterations here, maybe with multiprocessing and
        # map for local execution, or another strategy for remote execution
    return call

# user's code
@parallel(range(count))
def loop(i):
    # loop body

But this solution requires programmers to modify the loops they want to
parallelize, and not simply place a decorator before them, like this:

@parallel
for i in range(count):
    # loop body

> In the case of function and class decorators, Python has an object that
> can be passed to the decorator: the function or the class. For a loop
> decorator, how would you "have access to the loop body"? It sounds like
> it would have to be compiled differently, into a separate code object?
>
> --Ned.

Yes, perhaps when a loop had a decorator, the loop body could be
encapsulated and compiled as a function (similar to the "loop" function
I wrote), and that function object would be received by the decorator,
along with an iterable object that represents the iteration space. All
this would be hidden from the programmer, who would only decorate a
regular loop.

Thanks!

Enric

WARNING / LEGAL TEXT: This message is intended only for the use of the
individual or entity to which it is addressed and may contain
information which is privileged, confidential, proprietary, or exempt
from disclosure under applicable law. If you are not the intended
recipient or the person responsible for delivering the message to the
intended recipient, you are strictly prohibited from disclosing,
distributing, copying, or in any way using this message. If you have
received this communication in error, please notify the sender and
destroy and delete any copies you may have received.
http://www.bsc.es/disclaimer

From rosuav at gmail.com Wed Jan 8 15:39:16 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 9 Jan 2014 01:39:16 +1100
Subject: [Python-ideas] Decorators on loops
In-Reply-To: <52CD5F48.9060604@bsc.es>
References: <52CD34BB.1050307@bsc.es> <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net> <52CD5F48.9060604@bsc.es>
Message-ID:

On Thu, Jan 9, 2014 at 1:23 AM, Enric Tejedor wrote:
> Yes, perhaps when a loop had a decorator, the loop body could be
> encapsulated and compiled as a function (similar to the "loop" function
> I wrote), and that function object would be received by the decorator,
> along with an iterable object that represents the iteration space. All
> this would be hidden from the programmer, who would only decorate a
> regular loop.
>

The biggest problem with that kind of magic is scoping. Look at this:

def func():
    best = 0
    for x in range(10):
        val = long_computation(x)
        if val > best:
            best = val
    return best

(Granted, this can be done with builtins, but let's keep the example
simple.) If the body of the loop becomes a new function, there needs to
be a nonlocal directive to make sure 'best' references the outer one:

def func():
    best = 0
    @parallelize(range(10))
    def body(x):
        nonlocal best
        val = long_computation(x)
        if val > best:
            best = val
    return best

This syntax would work, but it'll raise UnboundLocalError without the
nonlocal declaration. Since Python tags non-local variables (as opposed
to C-like languages, which tag local variables), there's no easy way to
just add another scope and have it function invisibly. Any bit of magic
that creates a local scope is going to cause problems in any but the
simplest cases. Far better to force people to be explicit about it, and
then the rules are clearer.

Note that the parallelize decorator I use here would be a little
unusual, in that it has to actually call the function (and in fact call
it multiple times), and its return value is ignored.
This would work, but it might confuse people, so you'd want to name it
something that explains what's happening. It wouldn't be hard to write,
in this form, though - it'd basically just pass the iterable to
multiprocessing.Pool().map(). However, the example I give here wouldn't
work (at least, I don't think it would) with multiprocessing, because
external variable scopes would be duplicated, not shared, between
processes. So once again, you'd have to write your code with
parallelization in mind, rather than simply stick a decorator on a loop
and have it fork out across processes.

ChrisA

From alc at spika.net Wed Jan 8 15:57:07 2014
From: alc at spika.net (Alejandro López Correa)
Date: Wed, 8 Jan 2014 15:57:07 +0100
Subject: [Python-ideas] from __past__ import division, str, etc
Message-ID:

Hi,

I'm new here. I am sorry if this idea has already been discussed, but I
have not found a way to search this list (I am not used to mailing lists
at all).

I've seen recently some discussion in reddit about python 2 vs python 3,
and the slow adoption of the latter. I am proposing here pragmatic way to
speed up the process of porting old code and thus solving the split in
the community, that I believe it is a serious threat. It is not clean,
not at all, but it might work: just give python 2 whiners what they [we]
want, and do it using "from __past__ import", in a similar way "from
__future__ import" is used.

The advantage of this method is that porting old code would be trivial,
and each module could be rewritten at its own pace (for example, when a
new feature is required). The 2to3.py tool could be updated to perform
as many safe changes as it could (safe in the sense of 100% certainty of
not breaking anything), and import old features as needed.

I am thinking both in language syntax like division behaviour, unicode,
str, etc, and major library changes.
Past features may be added per request, and the 2to3 tool should allow
users to force the use of any of them, just in case. The whole process
should be almost automatic: the user might just run the tool at the root
folder of the code base, with any required command-line arguments to
force some features, and the tool would generate working python3 code.

There might be some issues regarding the interaction of new python3 code
with code that uses old features, maintaining a more complex code base,
and there might be other issues I am missing (like fundamental changes
in python 3 internal architecture that can't accommodate some older
features), but it might work and I think it could be useful to discuss
this idea.

A potential non technical problem involves users abusing this mechanism
to write new code with old features, but I believe it is a minor risk if
this means the whole python community finally moves to python 3.

Hope this is useful.

Alejandro

From ncoghlan at gmail.com Wed Jan 8 16:05:38 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 9 Jan 2014 01:05:38 +1000
Subject: [Python-ideas] from __past__ import division, str, etc
In-Reply-To:
References:
Message-ID:

On 9 January 2014 00:57, Alejandro López Correa wrote:
> Hi,
>
> I'm new here. I am sorry if this idea has already been discussed, but I have
> not found a way to search this list (I am not used to mailing lists at all).
>
> I've seen recently some discussion in reddit about python 2 vs python 3, and
> the slow adoption of the latter. I am proposing here pragmatic way to speed
> up the process of porting old code and thus solving the split in the
> community, that I believe it is a serious threat. It is not clean, not at
> all, but it might work: just give python 2 whiners what they [we] want, and
> do it using "from __past__ import", in a similar way "from __future__
> import" is used.
Hi,

You may want to read through
http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html
before lending too much weight to ill-informed Reddit commentary.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From rosuav at gmail.com Wed Jan 8 16:17:44 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 9 Jan 2014 02:17:44 +1100
Subject: [Python-ideas] from __past__ import division, str, etc
In-Reply-To:
References:
Message-ID:

On Thu, Jan 9, 2014 at 1:57 AM, Alejandro López Correa wrote:
> I am thinking both in language syntax like division behaviour, unicode, str,
> etc, and major library changes.

The point of the __future__ directive is to enable per-module changes,
which are applied at compile-time. The __future__ features spanning the
2.x / 3.x gap are:

division (changes the meaning of an operator)
absolute_import (changes the way modules are searched for)
print_function (ditches some language magic in favour of a function)
unicode_literals (changes the meaning of unadorned quoted strings)

In theory, division and unicode_literals could probably be the targets
of a from __past__ directive, but there's little point. Change the code
now, use the directive, and then when you move to 3.x, the directive
does nothing. (The other two would be more of a problem - I doubt the
code to make print a statement exists in Py3, and the complete rewrite
of the import machinery would make old-style importing dubious. In any
case, you probably don't want old-style importing.)

The unicode and str (or str and bytes) types have been the subject of
some other discussions here on python-ideas, so I recommend reading up
on those threads; I won't try to reopen the discussion here. There've
been quite a few suggestions made, several of which could be quite
viable without even requiring interpreter changes.

Library changes are definitely not something you'd want a "from
__past__ import" statement for.
That would be exceedingly messy. However, there are a number of wrapper
modules that let you bury the 2-vs-3 differences; instead of importing
module X_name_1 or module X_name_2, you simply import X from wrapper,
and it'll automatically give you the one you need. That at least covers
the cases where the APIs are the same and it's just the names that
differ. When anything more than that has changed, it wouldn't be
possible to use a per-module flag (as "from __future__ import" is) to
change that anyway.

Once you feel the push to change interpreters and execute the code under
Python 3, it's best to make your code run properly under Py3, rather
than try to hold onto the past. Straddle the gap by continuing to run a
Py2 interpreter and progressively changing your code to use __future__
print_function and division, and to get the text/bytes distinction
clear, and then the jump to Py3 will be way easier.

ChrisA

From ncoghlan at gmail.com Wed Jan 8 16:38:39 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 9 Jan 2014 01:38:39 +1000
Subject: [Python-ideas] from __past__ import division, str, etc
In-Reply-To:
References:
Message-ID:

On 9 January 2014 01:05, Nick Coghlan wrote:
> On 9 January 2014 00:57, Alejandro López Correa wrote:
>> Hi,
>>
>> I'm new here. I am sorry if this idea has already been discussed, but I have
>> not found a way to search this list (I am not used to mailing lists at all).
>>
>> I've seen recently some discussion in reddit about python 2 vs python 3, and
>> the slow adoption of the latter. I am proposing here pragmatic way to speed
>> up the process of porting old code and thus solving the split in the
>> community, that I believe it is a serious threat. It is not clean, not at
>> all, but it might work: just give python 2 whiners what they [we] want, and
>> do it using "from __past__ import", in a similar way "from __future__
>> import" is used.
>
> Hi,
>
> You may want to read through
> http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html
> before lending too much weight to ill-informed Reddit commentary.

My apologies, that was rather rude of me when you're offering to help
(I'm irritable at the moment since I've deemed it necessary to spend a
bunch of time over the past week updating my Python 3 Q & A rather than
enjoying my Christmas holidays, working on Python 3.4 or, this week,
enjoying linux.conf.au 2014).

Anyway, the problems impacting wire protocol developers are known, but
it's been damnably difficult to get anything other than "I like Python
2 better" out of them when it comes to discussing possible *solutions*
(although even the descriptions of the problems have been useful in
guiding some changes over the course of the 3.x series).

The primary pain point for developers of binary protocol manipulation
code is that the Python 2 text model was *right* for boundary code that
converts binary data to text or structured data. However, it's wrong
for basically everything else, which is why we changed it for Python 3.

The main challenge is thus getting people to stop asking the question
"How do we bring back the Python 2 text model" (which is never going to
happen - we changed the model for a reason), and instead ask "What
changes can be made to Python 3, such as introducing additional purpose
specific types, to make it a better language for wire protocol
development?". There's nothing actually *saying* "thou shalt only use
builtin types for manipulation of wire protocol data", but that's the
way all porting efforts have been carried out to date.

As part of addressing that, it's likely that certain kinds of Python 2
code will become easier to port to Python 3, but the bigger issue is to
actually try to improve wire protocol development in Python 3 rather
than getting stuck on just recreating the deeply flawed Python 2 model.
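For readers on Python 3.5 or later: one concrete change of the kind discussed here did land later, when PEP 461 added %-interpolation to bytes and bytearray. A minimal sketch of INADA's MySQL-style query builder written directly against bytes, assuming Python 3.5+ and UTF-8 for text arguments. It reuses the naive quote doubling from the thread, which is illustrative only and not a safe SQL escaping routine; real drivers should rely on parameter binding.

```python
def escape_bytes(s: bytes) -> bytes:
    # Naive quote doubling, mirroring escape_string() from the thread.
    # Illustrative only: not a complete SQL escaping routine.
    return s.replace(b"'", b"''")


def convert(x) -> bytes:
    if isinstance(x, str):
        x = x.encode("utf-8")  # assumed connection encoding for text arguments
    if isinstance(x, bytes):
        return b"'" + escape_bytes(x) + b"'"
    return str(x).encode("ascii")  # numbers such as 42


def build_query(query: bytes, *args) -> bytes:
    # %s on a bytes template accepts bytes operands (Python 3.5+, PEP 461),
    # so binary arguments are spliced in without any decode/encode round trip.
    return query % tuple(convert(a) for a in args)


print(build_query(b"UPDATE table SET textcol=%s bincol=%s", "hello", b"abc\xff\x00"))
```

Here the binary argument never passes through str, so nothing depends on surrogateescape; only the text arguments and the template itself need to be in the connection's encoding.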
I put some possible ideas for improvements at
http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#is-python-3-more-convenient-than-python-2-in-every-respect,
but what we really need at this point is some *experimentation* with
possible approaches (especially new types like asciiview and
asciibytes).

Regards,
Nick.

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org Wed Jan 8 16:41:58 2014
From: brett at python.org (Brett Cannon)
Date: Wed, 8 Jan 2014 10:41:58 -0500
Subject: [Python-ideas] The fools shall start sucking the cock.
In-Reply-To:
References:
Message-ID:

After others coming forward about previous behavior, this email is
serving as an official warning: one more infraction of the CoC and you
will be banned from this mailing list. Please try to take this seriously
and be respectful of others on this mailing list.

On Tue, Jan 7, 2014 at 6:49 PM, Mark Janssen wrote:

> Okay, how's everyone doing with their Python 2 vs.3, bytes/unicode
> vs. shit-extruder expertise?
>
> Anyone need some relief, perhaps some guidance?
>
> markj
> *kicks feet up to table*

From stephen at xemacs.org Wed Jan 8 16:46:14 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 09 Jan 2014 00:46:14 +0900
Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To:
References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB353F.7000805@stoneleaf.us> <87a9f8idp0.fsf@uwakimon.sk.tsukuba.ac.jp> <52CB95CF.3080801@stoneleaf.us> <8738l0hrq6.fsf@uwakimon.sk.tsukuba.ac.jp> <52CC2FC5.9080906@stoneleaf.us> <20140107185733.7ad1a3be@fsol> <52CC430B.6040908@stoneleaf.us> <20140107194752.304604a1@fsol> <52CC4E00.7000903@stoneleaf.us> <20140107205936.7706c393@fsol> <52CC5E73.1060700@stoneleaf.us> <20140108113408.51509b48@fsol>
Message-ID: <87a9f6h3y1.fsf@uwakimon.sk.tsukuba.ac.jp>

>>>>> INADA Naoki writes:
> On Wed, Jan 8, 2014 at 7:34 PM, Antoine Pitrou wrote:
>> INADA Naoki wrote:

> Some encoding doesn't ensure roundtrip.

In that case, in Python 2 you're depending on all "text" to be encoded
in the same encoding. And even so you may be in trouble:

def convert(x):
    if isinstance(x, unicode):
        x = x.encode(round_trip_not_guaranteed)

could cause your query to fail when it should succeed. 'x' is
user-supplied data, so you have no control over that.

> I may be able to ascii for decoding when mysql uses ascii compatible
> encoding.

You can *always* use 'ascii', 'latin1', or 'utf-8' with
'surrogateescape' for decoding, and roundtrip is guaranteed.

> But I think decode/encode with surrogateescape is not only slow,

Evidence? Especially as compared with the connection overhead of the
DBMS?

> but also dangerous when using encoding except ascii or utf8.

Or latin1. But here's your code as translated to Python 3.3, assuming a
connection encoding of Shift JIS:

# unchanged source, but this is Python 3 str == Unicode
def escape_string(s):
    return s.replace("'", "''")

def convert(x):
    if isinstance(x, str):                  # Correct type unicode->str
        x = "'" + escape_string(x) + "'"
    elif isinstance(x, bytes):              # Correct type str->bytes
        # SAFE: ASCII is a Unicode subset, RT guaranteed.
        x = x.decode('ascii', errors='surrogateescape')
        x = "'" + escape_string(x) + "'"
    else:
        x = str(x)
    return x

def build_query(query, *args):
    if isinstance(query, bytes):            # want str for the format operator
        query = query.decode('sjis')
    query = query % tuple(map(convert, args))
    # CORRECT: for ASCII-compatible encodings, including Shift
    # JIS and Big 5, since the binary blob doesn't contain any
    # non-ASCII characters and the non-character bytes 128-255
    # will be restored properly by the error handler.
    return query.encode('sjis', errors='surrogateescape')

textdata = b"hello"  # or "hello"
bindata = b"abc\xff\x00"
query = "UPDATE table SET textcol=%s bincol=%s"

print(build_query(query, textdata, bindata))

The only problem with correctness will occur if the MySQL connection
uses a non-ASCII-compatible encoding (UTF-16, fixed-width EUC) in the
query string, because the ASCII bytes in the blob will be "widened" by
"encode".

Widechar encodings could actually be handled with a "binary" codec that
recognizes *no* characters and always surrogate-encodes every byte. But
that's pretty obviously going to be unacceptable.

I guess bytes.format() is pretty well unstoppable at this point.

From brett at python.org Wed Jan 8 16:49:18 2014
From: brett at python.org (Brett Cannon)
Date: Wed, 8 Jan 2014 10:49:18 -0500
Subject: [Python-ideas] [OT] banning Mark Janssen
In-Reply-To: <52CCA245.7090005@stoneleaf.us>
References: <52CCA245.7090005@stoneleaf.us>
Message-ID:

On Tue, Jan 7, 2014 at 7:56 PM, Ethan Furman wrote:

> Moderators,
>
> Mark Janssen's posts are becoming extremely abusive, which seems to me to
> be against the code of conduct.
>
> Can we ban him, at least from the mailing lists?
>

I actually issued a warning last night but since I accidentally sent it
from my personal address it got bounced; just sent it from the proper
address. I have publicly stated people get one warning before getting
banned so I don't want to circumvent that practice.
If the CoC is broken again, feel free to point out where it happened and the appropriate action will be taken. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alc at spika.net Wed Jan 8 17:22:02 2014 From: alc at spika.net (Alejandro López Correa) Date: Wed, 8 Jan 2014 17:22:02 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: Answering both Chris Angelico and you, I think I have not made my point clear. I am not really complaining about python 3. My opinion about the changes (at least those of which I am aware) is that they are sane. Unfortunately, many people are reluctant to change, and from what I've read that is actually a problem (not that I have actual data). I think large python 2 code bases won't change unless the benefits are larger than the costs. In the costs we have to count the available developer time, for example, and in many cases that means [a lot of] money. The idea is to offer a solution to those programmers so they can trivially port their code base, write new code in python 3 and rewrite old python 2 code as soon as possible. I am not suggesting offering back the whole python 2.7 by any means. Many changes can be safely performed by the 2to3 tool, probably. My suggestion is to offer a convenient way to bring everybody into python 3. > My apologies, that was rather rude of me when you're offering to help No worries.
There's nothing actually *saying* "thou shalt only use > builtin types for manipulation of wire protocol data", but that's the > way all porting efforts have been carried out to date. Enabling old functionality when required is not the same as bringing back python 2, since python 3 is there by default and python 2 code won't work by default. It means just providing a good way to make old code work. The key part is the translation from 2 to 3. This does not mean that the code has to run unchanged but that the translation may be performed automatically, at least in 99.9% of cases. In practice this could involve a mixture of changes to python 3 itself to support the 2to3 tool, and improvements to the tool. With a 2to3 tool that covers 99.99% of the cases, we could even have .py2 modules that would be translated transparently to .py when first used, in the same way compilation works, raising an exception in case something goes wrong. Anyway, I understand it is not a clean way to proceed, but something along these lines might be the only way to speed up the adoption of python 3, and minimise the risk of defection to other languages. Cheers, Alejandro -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Jan 8 17:46:22 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 02:46:22 +1000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: On 9 January 2014 02:22, Alejandro López Correa wrote: > Answering both Chris Angelico and you, I think I have not made my point > clear. I am not really complaining about python 3. My opinion about the > changes (at least those of which I am aware) is that they are sane. > Unfortunately, many people are reluctant to change, and from what I've read > that is actually a problem (not that I have actual data). I think large > python 2 code bases won't change unless the benefits are larger than the > costs.
This is mostly a communications problem on our part. I certainly thought "5 years for Python 3 to be the default choice for new projects" was fairly straightforward to interpret, but some overly optimistic folks with less experience of corporate adoption rates managed to misinterpret that as something more like "5 years until more people are writing Python 3 code than Python 2 code". If the latter was the goal, then we'd have a crisis, but it was never the goal - Python 2 has a massive installed base, and it's going to take a long time for new Python 3 projects and Python 2 to Python 3 migrations to overtake that. > In the costs we have to count the available developer time, for > example, and in many cases that means [a lot of] money. The idea is to offer > a solution to those programmers so they can trivially port their code base, > write new code in python 3 and rewrite old python 2 code as soon as > possible. > > I am not suggesting offering back the whole python 2.7 by any means. Many > changes can be safely performed by the 2to3 tool, probably. My suggestion is > to offer a convenient way to bring everybody into python 3. This is also largely an education problem. A couple of projects have legitimate gripes about binary protocol handling in Python 3, and since they have no pressing interest in migrating (and thus little motivation to build the missing pieces of infrastructure themselves), their response has been to tell the core team "we're not migrating until *you* provide a suitable replacement for this particular Python 2 feature". It's a reasonable request, but hasn't been at the top of the core team's priority list up to this point (that's now likely to change for Python 3.5). >> My apologies, that was rather rude of me when you're offering to help > No worries.
> > >> The main challenge is thus getting people to stop asking the question >> "How do we bring back the Python 2 text model" (which is never going >> to happen - we changed the model for a reason), and instead ask "What >> changes can be made to Python 3, such as introducing additional >> purpose specific types, to make it a better language for wire protocol >> development?". There's nothing actually *saying* "thou shalt only use >> builtin types for manipulation of wire protocol data", but that's the >> way all porting efforts have been carried out to date. > Enabling old functionality when required is not the same as bringing back > python 2, since python 3 is there by default and python 2 code won't work by > default. It means just providing a good way to make old code work. The key > part is the translation from 2 to 3. This does not mean that the code has to > run unchanged but that the translation may be performed automatically, at > least in 99.9% of cases. In practice this could involve a mixture of changes > to python 3 itself to support the 2to3 tool, and improvements to the tool. There's very little actually *missing* from Python 3 now, though, and it's far from clear that the key remaining missing piece (a type for manipulating ASCII compatible binary protocol data) can't be provided as a library on PyPI. > With a 2to3 tool that covers 99.99% of the cases, we could even have .py2 > modules that would be translated transparently to .py when first used, in > the same way compilation works, raising an exception in case something goes > wrong. 
I'm pretty sure someone already wrote one of those - they're a problem, because they mean the tracebacks for runtime exceptions don't match the source code (that's one of the major reasons single-source approaches came to dominate as the preferred migration mechanism for libraries and frameworks, leaving 2to3 as an option mainly considered by applications that can abandon Python 2 support when migrating to Python 3). > Anyway, I understand it is not a clean way to proceed, but something along > these lines might be the only way to speed up the adoption of python 3, and > minimise the risk of defection to other languages. We're largely happy with the rate of adoption though - there were just some folks that didn't grasp the kinds of time scales we're talking about for a migration of this magnitude. See this for more details: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#but-uptake-is-so-slow-doesn-t-this-mean-python-3-is-failing-as-a-platform Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From barry at python.org Wed Jan 8 18:17:06 2014 From: barry at python.org (Barry Warsaw) Date: Wed, 8 Jan 2014 12:17:06 -0500 Subject: [Python-ideas] [OT] banning Mark Janssen References: <52CCA245.7090005@stoneleaf.us> <20140108022018.GN29356@ando> Message-ID: <20140108121706.37cbc7b3@anarchist.wooz.org> On Jan 08, 2014, at 01:20 PM, Steven D'Aprano wrote: >(I would also like to preemptively state that I object in the strongest >possible terms to a blanket "no swearing" policy, just in case anyone is >thinking of introducing such a thing.) "Swear" words in and of themselves don't violate the CoC. It's how they're used that matters. (i.e. context is everything) -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From stephen at xemacs.org Wed Jan 8 18:21:12 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 02:21:12 +0900 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: <877gaagzjr.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Anyway, the problems impacting wire protocol developers are known, > but it's been damnably difficult to get anything other than "I like > Python 2 better" out of them when it comes to discussing possible > *solutions* Good to know you feel that way too, I thought I just missed a lot of important discussions. :-( > The main challenge is thus getting people to stop asking the question > "How do we bring back the Python 2 text model" (which is never going > to happen - we changed the model for a reason), and instead ask "What > changes can be made to Python 3, such as introducing additional > purpose specific types, to make it a better language for wire protocol > development?". After spending enough time on Inada-san's use-case to find a real problem with treating wire protocols as text, I've come to the conclusion that those really are the same question, though. Add even a little bit of binary handling to a database connection, and even though almost everything is actually just ASCII, the few things that aren't blow everything up and you want everything to be bytes. At that point you end up really wanting bytes to have pretty much everything str does except maybe unidata lookups! 
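For reference, the text-side workaround discussed with Inada-san (decode the binary blob with 'surrogateescape', build the query as str, encode it back) can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions, not a real DB-API driver: `to_text`, `to_wire` and `build_query` are hypothetical names, the query template is assumed to be pure ASCII, and the quote-doubling escape is the toy escape from the thread, not real SQL injection protection.

```python
def to_text(blob: bytes) -> str:
    # ASCII bytes decode normally; each byte 0x80-0xFF becomes a lone
    # surrogate U+DC80-U+DCFF instead of raising UnicodeDecodeError.
    return blob.decode("ascii", errors="surrogateescape")


def to_wire(text: str) -> bytes:
    # The same handler maps those surrogates back to the original bytes.
    return text.encode("ascii", errors="surrogateescape")


def build_query(template: str, *args: bytes) -> bytes:
    # Hypothetical helper; toy quoting only: double single quotes, then wrap.
    quoted = ["'" + to_text(arg).replace("'", "''") + "'" for arg in args]
    return to_wire(template % tuple(quoted))


query = build_query("UPDATE t SET textcol=%s, bincol=%s", b"hello", b"abc\xff\x00")
print(query)  # b"UPDATE t SET textcol='hello', bincol='abc\xff\x00'"

# The round trip holds for every byte value with ascii, latin-1 and utf-8.
every_byte = bytes(range(256))
for codec in ("ascii", "latin-1", "utf-8"):
    text = every_byte.decode(codec, errors="surrogateescape")
    assert text.encode(codec, errors="surrogateescape") == every_byte
```

It works, but every blob pays for a decode and an encode, and a non-ASCII character in the template would make `to_wire()` raise `UnicodeEncodeError`, which is the kind of friction behind the wish for bytes to grow more str-like methods.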
From enric.tejedor at bsc.es Wed Jan 8 18:47:45 2014 From: enric.tejedor at bsc.es (Enric Tejedor) Date: Wed, 08 Jan 2014 18:47:45 +0100 Subject: [Python-ideas] Decorators on loops In-Reply-To: References: <52CD34BB.1050307@bsc.es> <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net> <52CD5F48.9060604@bsc.es> Message-ID: <52CD8F41.6000203@bsc.es> Hi, > The biggest problem with that kind of magic is scoping. Look at this: > > def func(): > best = 0 > for x in range(10): > val = long_computation(x) > if val > best: best = val > return best > > (Granted, this can be done with builtins, but let's keep the example simple.) > > If the body of the loop becomes a new function, there needs to be a > nonlocal directive to make sure 'best' references the outer one: > > def func(): > best = 0 > @parallelize(range(10)) > def body(x): > nonlocal best > val = long_computation(x) > if val > best: best = val > return best > > This syntax would work, but it'll raise UnboundLocalError without the > nonlocal declaration. Since Python tags non-local variables (as > opposed to C-like languages, which tag local variables), there's no > easy way to just add another scope and have it function invisibly. Any > bit of magic that creates a local scope is going to cause problems in > any but the simplest cases. Far better to force people to be explicit > about it, and then the rules are clearer. > > Note that the parallelize decorator I use here would be a little > unusual, in that it has to actually call the function (and in fact > call it multiple times), and its return value is ignored. This would > work, but it might confuse people, so you'd want to name it something > that explains what's happening. It wouldn't be hard to write, in this > form, though - it'd basically just pass the iterable to > multiprocessing.Pool().map(). 
> > However, the example I give here wouldn't work (at least, I don't > think it would) with multiprocessing, because external variable scopes > would be duplicated, not shared, between processes. So once again, > you'd have to write your code with parallelization in mind, rather > than simply stick a decorator on a loop and have it fork out across > processes. > Correct, this is indeed a problem. It would be tricky to make this work in the general case. In a simpler scenario, we could assume that iterations won't update the same data. On the other hand, to prevent the UnboundLocalError, the variables needed inside the loop could be passed to the decorator and appear in the loop function's signature.

    results = [0] * 10

    @parallel(range(10), results)
    def loop(i, results):
        results[i] = some_computation(i)

Then the decorator would be:

    def parallel(*args):
        iterable = args[0]
        params = args[1:]

        def call(func):
            # create parallel invocations of func with iterable and params
            ...

        return call

I think this solution would work if you wanted to do things like performing independent updates on a list. Anyway, now I see more clearly the implications of such a construct for loops. Thanks again for your feedback, Enric > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Wed Jan 8 18:47:41 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 02:47:41 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8761pugybm.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > > a. If the 8-bit str contains any Latin-1 or C1 characters, both > > strs are promoted to 16-bit, and non-ASCII characters in the > > 7-bit string are converted by the surrogateescape handler. > > This part worries me a bit. The bytes 61 62 63 FF in this new > representation actually _mean_ 'abc' followed by a smuggled FF > byte. No, it doesn't. It means 'abc' followed by something that cannot be encoded by any codec without the surrogateescape handler. 'ascii-compatible' merely defaults to that handler. I wouldn't actually be too upset if I were told, no, you have to specify explicitly. > > 6. On output the 'ascii-compatible' codec simply memcpy's 7-bit str > > and pure ASCII 8-bit str, and raises on anything else. > > So if a 7-bit string gets converted to a surrogate-escaped 16-bit > string, it can never be written out again? Of course it can. Use .encode('ascii', errors='surrogateescape') > (b'abc\xff'.decode('ascii-compatible') + '\u1234')[:4].encode('ascii-compatible') > > I'd expect to get back my b'abc\xff'. But your rules give me an > exception. Yes. This whole proposal was aimed at wire protocols. It's very bad if something intended to be ready to be squirted into the wire needs (expensive) encoding.
> I think ascii-compatible has to accept non-8-bit-repr strings (by > encoding ASCII as ASCII and surrogate escapes as bytes and > everything else is an exception). This is necessary because 60 61 > 62 FF (7-bit) and 0061 0062 0063 DCFF (16-bit) are the same string > anyway. But it's especially necessary because the former can be > silently converted into the latter (and there's no way to even test > whether that's happened). Well, one way around that would be to require that the latter not exist (convert it to "7-bit" during construction). But I've come to the conclusion that this is all too irregular and confusing. I'm pretty sure that I can come up with a set of rules that are not inherently self-contradictory, but I'm also pretty sure that the resulting type will behave unintuitively for almost everybody. Also, despite my original thought, it's really hard to see how unnecessary encode/decode cycles can be eliminated. So I think I need to go back to the drawing board. So I hope I haven't wasted too much of your time; it's been very educational for me. From abarnert at yahoo.com Wed Jan 8 18:57:10 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 8 Jan 2014 09:57:10 -0800 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1B2E849F-C349-470C-830B-13299AB74DBF@yahoo.com> On Jan 8, 2014, at 2:18, Mark Lawrence wrote: > On 08/01/2014 09:59, Nick Coghlan wrote: >> >> Now that your proposal has been better explained, yes, I agree that >> "asciibytes" and "asciistr" types would be well worth experimenting >> with. I mention both, since it's far from clear if a str subclass or a >> bytes subclass (or neither, although that may require bug fixes in >> CPython) would be more convenient for this use case. > > Could you subclass both to get the best of both worlds? As in > > class asciixyz(str, bytes): You can't. (Try it.) More importantly, how would that work? You'd have the implementation of str (effectively a tagged union of char8/char16/char32 arrays) plus the separate implementation of bytes (effectively a char8 array). Do you leave the first one empty? And then avoid super() and instead explicitly delegate only to the bytes base? That could work (at the relatively minimal cost of an extra empty '' worth of storage) as long as you don't run into any code that tries to use the internal details of the str. But unfortunately, most builtins and extension module functions _do_ try to use the internal details of the str. In CPython, for example, a function that takes a string usually does so by parsing the argument as, say, a u#, which gives you the character array from a str directly. Even functions that take str objects will usually at some point call string-protocol functions to get at their array.
The simple way around this is to make all such functions effectively call __str__ on any object that isn't a real str. But that would make almost _everything_ usable as a string--f.write(2) would now work. So you'd really need to create a new dunder method (and C API slot) __asstr__ that's only implemented by objects that really want to act like a str, not just have a str representation. Also, I'm not sure all such functions have a reasonable way to refcount the resulting str object properly. The alternative would be to expose the entire string protocol into Python--including, most importantly, the methods to get at the array directly. I'm not sure how you'd even design the API for those methods in Python. We don't even expose the buffer protocol to Python today. I didn't go into all this detail to try to prove that the idea is impossible, but rather in hopes that someone would have an answer that makes everything work. Making string-protocol strings more "pluggable" might have other benefits besides the "encodedstr" type. Imagine being able to build an explicitly UTF-16 type to make it faster and easier to deal with Win32 or Java or other such things. (Or could you just use encodedstr('utf-16-le') for that?) Or expose a "rope"-like type for large mutable strings. Or experiment with alternatives to the 3.3-style internal storage, like Stephen's ASCII-compatible byte-smuggling flag, by faking them in Python instead of building them in C. (That would probably be sufficient to find any holes in the specification, even if it wouldn't be very helpful for perf testing.) From breamoreboy at yahoo.co.uk Wed Jan 8 19:11:10 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 08 Jan 2014 18:11:10 +0000 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] 
In-Reply-To: <1B2E849F-C349-470C-830B-13299AB74DBF@yahoo.com> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <87eh4jgg4u.fsf@uwakimon.sk.tsukuba.ac.jp> <1B2E849F-C349-470C-830B-13299AB74DBF@yahoo.com> Message-ID: On 08/01/2014 17:57, Andrew Barnert wrote: > On Jan 8, 2014, at 2:18, Mark Lawrence wrote: > >> On 08/01/2014 09:59, Nick Coghlan wrote: >>> >>> Now that your proposal has been better explained, yes, I agree that >>> "asciibytes" and "asciistr" types would be well worth experimenting >>> with. I mention both, since it's far from clear if a str subclass or a >>> bytes subclass (or neither, although that may require bug fixes in >>> CPython) would be more convenient for this use case. >> >> Could you subclass both to get the best of both worlds? As in >> >> class asciixyz(str, bytes): > > You can't. (Try it,) More importantly, how would that work? I haven't the faintest idea :) > > but rather in hopes that someone would have an answer that makes everything work. The reason I threw this in in the first place. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From flying-sheep at web.de Wed Jan 8 19:35:34 2014 From: flying-sheep at web.de (Philipp A.) Date: Wed, 8 Jan 2014 19:35:34 +0100 Subject: [Python-ideas] [OT] banning Mark Janssen In-Reply-To: <20140108121706.37cbc7b3@anarchist.wooz.org> References: <52CCA245.7090005@stoneleaf.us> <20140108022018.GN29356@ando> <20140108121706.37cbc7b3@anarchist.wooz.org> Message-ID: 2014/1/8 Barry Warsaw > "Swear" words in and of themselves don't violate the CoC. It's how they're > used that matters. (i.e. 
context is everything) > > -Barry > i’d say the amount and kind of all words is irrelevant as long as meaning is conveyed. spam, personal insults, etc. are not ok, no matter what words they are composed of. -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Wed Jan 8 19:41:04 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 08 Jan 2014 19:41:04 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: On 08.01.2014 16:38, Nick Coghlan wrote: > On 9 January 2014 01:05, Nick Coghlan wrote: >> On 9 January 2014 00:57, Alejandro López Correa wrote: >>> Hi, >>> >>> I'm new here. I am sorry if this idea has already been discussed, but I have >>> not found a way to search this list (I am not used to mailing lists at all). >>> >>> I've seen recently some discussion in reddit about python 2 vs python 3, and >>> the slow adoption of the latter. I am proposing here pragmatic way to speed >>> up the process of porting old code and thus solving the split in the >>> community, that I believe it is a serious threat. It is not clean, not at >>> all, but it might work: just give python 2 whiners what they [we] want, and >>> do it using "from __past__ import", in a similar way "from __future__ >>> import" is used. >> >> Hi, >> >> You may want to read through >> http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html >> before lending too much weight to ill-informed Reddit commentary. > > My apologies, that was rather rude of me when you're offering to help > (I'm irritable at the moment since I've deemed it necessary to spend a > bunch of time over the past week updating my Python 3 Q & A rather > than enjoying my Christmas holidays, working on Python 3.4 or, this > week, enjoying linux.conf.au 2014).
Please know that we all love you a bit more for that :) Georg From masklinn at masklinn.net Wed Jan 8 19:53:11 2014 From: masklinn at masklinn.net (Masklinn) Date: Wed, 8 Jan 2014 19:53:11 +0100 Subject: [Python-ideas] Decorators on loops In-Reply-To: <52CD8F41.6000203@bsc.es> References: <52CD34BB.1050307@bsc.es> <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net> <52CD5F48.9060604@bsc.es> <52CD8F41.6000203@bsc.es> Message-ID: <222EC7A8-F7DF-4234-85C7-275183469CAC@masklinn.net> On 2014-01-08, at 18:47 , Enric Tejedor wrote: > Correct, this is indeed a problem. It would be tricky to make this work > in the general case. > > In a simpler scenario, we could assume that iterations won't update the > same data. > On the other hand, to prevent the UnboundLocalError, the variables > needed inside the loop could be passed to the decorator and appear in > the loop function's signature. > > results = [0] * 10 > > @parallel(range(10), results) > def loop(i, results): > results[i] = some_computation(i) At this point you don't really need a decorator anymore, this is an odd-ish way to write `results = map(some_computation, range(10))`, and as others have noted the standard library already has a parallelized version thereof: http://docs.python.org/2/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map > Then the decorator would be: > > def parallel(*args): > iterable = args[0] > params = args[1:] > > def call(func): > # create parallel invocations of func with iterable and params > > return call > > I think this solution would work if you wanted to do things like > performing independent updates on a list. The problem of a loop being that its semantics are too generic to make anything even remotely close to such an assumption. 
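Masklinn's point can be made concrete with the standard library call he links to. The sketch below assumes `some_computation` is a pure function of its argument (here a stand-in that just squares it); each worker returns a value instead of writing into a shared list, so neither `nonlocal` nor state shared between processes is needed.

```python
from multiprocessing import Pool


def some_computation(i: int) -> int:
    # Stand-in for the expensive, independent per-iteration work.
    return i * i


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing
    # the main module (the spawn and forkserver start methods).
    with Pool() as pool:
        # Pool.map splits the iterable across worker processes and returns
        # the results in input order.
        results = pool.map(some_computation, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

This has the same shape as `results = list(map(some_computation, range(10)))`, which is why a loop decorator adds little once the iterations are known to be independent.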
From haoyi.sg at gmail.com Wed Jan 8 20:05:39 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 8 Jan 2014 11:05:39 -0800 Subject: [Python-ideas] Decorators on loops In-Reply-To: <222EC7A8-F7DF-4234-85C7-275183469CAC@masklinn.net> References: <52CD34BB.1050307@bsc.es> <25C893E2-74AF-40D1-BB92-2B89F3B0257C@masklinn.net> <52CD5F48.9060604@bsc.es> <52CD8F41.6000203@bsc.es> <222EC7A8-F7DF-4234-85C7-275183469CAC@masklinn.net> Message-ID: > The problem of a loop being that its semantics are too generic to make anything even remotely close to such an assumption. I think it's the opposite problem, really: the semantics (repeatedly calling .next() on iter(...)) is too specific, and is incompatible with what you want. What you want is a generic map() function which encodes the semantics (independent updates) that you want, and map() is trivially parallelizable. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Jan 8 23:01:05 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 9 Jan 2014 09:01:05 +1100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: <20140108220104.GS29356@ando> On Wed, Jan 08, 2014 at 05:22:02PM +0100, Alejandro López Correa wrote: [...] > Anyway, I understand it is not a clean way to proceed, but something along > these lines might be the only way to speed up the adoption of python 3 One assumption in this discussion, and the various related discussions on Reddit and other places, is that adoption of Python 3 is too slow and needs to be sped up. I don't believe this is true. I believe adoption is just right and exactly what should be expected.
Alex Gaynor wrote a blog post a week or so ago claiming that, five years since Python 3 was first released, everyone should have migrated by now and that since only "five percent" (a figure which I believe he pulled out of thin air) have migrated, Python 3 has been a failure. I challenge that belief. I've been hanging around here and on the Python-Dev list for a long time, and while I can't find any official pronouncement, the sense has always been that Python 3 adoption will take ten years, not five. (That's my recollection -- if any of the core developers wish to correct me, please do.) Rates of adoption are much, much higher than gossip on the Internet suggests. About 70% of the top 200 projects on PyPI support Python 3, and downloads of Python 3 are very healthy, possibly even higher than downloads of Python 2. On the tutor list, I see a significant number of beginners using Python 3. It seems to me that given the circumstances, Python 3 adoption is right where we should expect it to be halfway through a decade-long process. There will be a period at the start when hardly anyone will migrate, then a period of accelerating migration, which will accelerate further when the mainstream Linux distros start shipping Python 3 as their system Python (ArchLinux is not mainstream, but Fedora is planning the change), followed by a sudden rush in another four or five years when people realise that Python 2.7 becoming unmaintained is no longer a distant prospect but is about to happen. For many people, waiting until the last minute is the most sensible thing that they can do. This gives time for the early adopters to discover and iron out all the wrinkles and difficulties. Rather than approaching this as "Python 3 has been a failure, what can we do to save it?" we should be approaching this as "Python 3 has been a success, what lessons can we take from the early adopters to make it even easier for the next wave of adopters?"
"from __past__ import spam" does not make it easier to adopt. It just makes it easier to *put off adopting*. > and minimise the risk of defection to other languages. People threaten that, but it is an irrational threat. (Mind you, people do silly, irrational things every day.) If you think its hard to migrate from Python 2 to 3, when you get to keep 90% of your code base and most of the backward-incompatible changes are a few libraries that have been renamed and a handful of syntax changes, how hard will it be to throw away 100% of your code and start again with a completely different language? -- Steven From alc at spika.net Thu Jan 9 00:14:21 2014 From: alc at spika.net (=?UTF-8?Q?Alejandro_L=C3=B3pez_Correa?=) Date: Thu, 9 Jan 2014 00:14:21 +0100 Subject: [Python-ideas] from __past__ import division, str, etc Message-ID: I am posting again this. I am new to mailing lists and I've realised I've sent it only to Nick Coghlan four hours ago. ---- 2014/1/8 Nick Coghlan > This is mostly a communications problem on our part. I certainly > thought "5 years for Python 3 to be the default choice for new > projects" was fairly straightforward to interpret > [...] > We're largely happy with the rate of adoption though - there were just > some folks that didn't grasp the kinds of time scales we're talking > about for a migration of this magnitude. > > See this for more details: > http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#but-uptake-is-so-slow-doesn-t-this-mean-python-3-is-failing-as-a-platform Ok, thanks. I see this has been a recurring topic and a lot of care has been given. I am looking at a different issue, though. I am thinking in existing projects, python 2 code in use, not even third party libraries but end products. 
I fear that if existing projects remain python 2 for too long because the return is not worth the resources needed for the migration, when the time finally comes for the upgrade another language might be chosen. At that point the divergence between that ancient py2 code and the latest py3 version would probably be greater than now, and other languages may offer features like GIL-less multithreading. If this happens on a large scale the whole python community may shrink and lose momentum. I do not know whether this is a real risk or not, and it is really up to you to assess it. I think a convenient way to run old python 2 modules along with new python 3 ones may be a good idea despite the cost. Even embedding a complete python 2.7 interpreter that executes .py2 modules and somehow shares the state with the main python 3 environment. I don't know whether that "monstrosity" is feasible without changing python 3 core too much, but it should help many people and nullify the risk of losing them. A large community is desirable in this context. > > With a 2to3 tool that covers 99.99% of the cases, we could even have .py2 > > modules that would be translated transparently to .py when first used > > I'm pretty sure someone already wrote one of those - they're a > problem, because they mean the tracebacks for runtime exceptions don't > match the source code The idea is that exceptions that end up showing tracebacks should be, uhmm, exceptional (the tool should work 99.9% of the time and we are talking about working py2 code). When something happens, the problem of a different source in the traceback could be handled by the translation tool by adding annotations (even comments).
Cheers, Alejandro From rosuav at gmail.com Thu Jan 9 00:38:15 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 9 Jan 2014 10:38:15 +1100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: On Thu, Jan 9, 2014 at 10:14 AM, Alejandro López Correa wrote: >> I'm pretty sure someone already wrote one of those - they're a >> problem, because they mean the tracebacks for runtime exceptions don't >> match the source code > > The idea is that exceptions that end up showing tracebacks should be, > uhmm, exceptional (the tool should work 99.9% of the time and we are > talking about working py2 code). When something happens, the problem > of a different source in the traceback could be handled by the > translation tool by adding annotations (even comments). That's a sort-of-viable option (C preprocessors have used #line directives for decades), but not really ideal. For it to work with current Python, it would have to actually _be_ comments, so every line would have to have something appended: # "file.py" 213 How would that behave on arbitrary code? What if there's backslash continuation? Will people know to go looking elsewhere? Exceptions DO happen. And when they do, the language should try to make it easy to figure out what's going on. I'm not sure how well that would be served by this, especially given that it's not supposed to be a normal workflow. If you build a new language that uses Python as its back-end, then manipulating the source code WOULD be the normal workflow, and in that case I'd wholeheartedly support editing the recorded line numbers (I think you can do that with AST manipulation??) so tracebacks show the original file and line. But this shouldn't be that normal.
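Chris's aside about editing the recorded line numbers can be illustrated with the standard library's `ast` module. This is a minimal sketch under stated assumptions, not anyone's actual proposal: `translated_source` stands in for the output of a hypothetical py2-to-py3 translation step, and the file name `file.py` and the offset 211 are made up to match the `# "file.py" 213` example in the message. `ast.increment_lineno` applies one uniform shift, so a real translator that adds or removes lines would need a per-node mapping instead.

```python
import ast
import traceback

# Hypothetical output of a py2 -> py3 translation step. Assume its first
# line came from line 212 of the original "file.py".
translated_source = "x = 1\nraise ValueError('boom')\n"

tree = ast.parse(translated_source, filename="file.py")
# Shift every node's line numbers by a constant offset (1 -> 212, 2 -> 213).
ast.increment_lineno(tree, 211)
code = compile(tree, "file.py", "exec")

try:
    exec(code, {})
except ValueError as exc:
    # The innermost traceback entry belongs to the compiled module and
    # reports the original file name and the shifted line number.
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    print(frame.filename, frame.lineno)  # file.py 213
```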
ChrisA From alc at spika.net Thu Jan 9 00:34:22 2014 From: alc at spika.net (Alejandro López Correa) Date: Thu, 9 Jan 2014 00:34:22 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: <20140108220104.GS29356@ando> References: <20140108220104.GS29356@ando> Message-ID: 2014/1/8 Steven D'Aprano : > About 70% of the top 200 projects on PyPI support Python 3, and > downloads of Python 3 are very healthy, possibly even higher than > downloads of Python 2. I do not think that one is a particularly good metric. For each project hosted on PyPI, how many are not there? People have personal projects, companies have internal software, and there are products that contain at least some python and are targeted at final customers, like games or Maya. Not everything is open source, but even if it is proprietary software it is good to have it since that way more jobs are offered and more people can earn money with this language, and that is a guarantee for its long-term success. >> and minimise the risk of defection to other languages. > > People threaten that, but it is an irrational threat. (Mind you, people > do silly, irrational things every day.) If you think it's hard to migrate > from Python 2 to 3, when you get to keep 90% of your code base and most > of the backward-incompatible changes are a few libraries that have been > renamed and a handful of syntax changes, how hard will it be to throw > away 100% of your code and start again with a completely different > language? I think human psychology works like that. Many people may delay the acquisition of a new car, but once they are committed to buying a new one they want the best they can afford (within their budget). Some languages may gain momentum and gain the "cool" vibe. We saw the rise of Ruby a while ago, and maybe a language that handles multiple cores well could be a strong temptation in the future.
If people keep investing in python, small bits at a time, keeping their codebase always up to date, it is more difficult, IMHO, to commit to a full rewrite. Alejandro From breamoreboy at yahoo.co.uk Thu Jan 9 00:43:14 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 08 Jan 2014 23:43:14 +0000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: On 08/01/2014 23:14, Alejandro López Correa wrote: > > I think a convenient way to run old python 2 modules along with new > python 3 ones may be a good idea despite the cost. One of the major costs, quoting Winston Churchill, will be blood, toil, tears and sweat. How much of this are you personally intending to put into this effort, or are you happy to try and force core developers into a situation that many of them don't want to be in? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rosuav at gmail.com Thu Jan 9 00:54:23 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 9 Jan 2014 10:54:23 +1100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On Thu, Jan 9, 2014 at 10:34 AM, Alejandro López Correa wrote: > 2014/1/8 Steven D'Aprano : >> About 70% of the top 200 projects on PyPI support Python 3, and >> downloads of Python 3 are very healthy, possibly even higher than >> downloads of Python 2. > > I do not think that one is a particularly good metric. For each > project hosted at PyPI how many are not there? People have personal > projects, companies have internal software, and there are products > that contain at least some python and are targeted at final customers, > like games or Maya. But what IS a good metric? How are you going to measure any of that? It's better to at least use PyPI stats than to pull numbers out of a hat. >>> and minimise the risk of defection to other languages.
>> >> People threaten that, but it is an irrational threat. (Mind you, people >> do silly, irrational things every day.) If you think it's hard to migrate >> from Python 2 to 3, when you get to keep 90% of your code base and most >> of the backward-incompatible changes are a few libraries that have been >> renamed and a handful of syntax changes, how hard will it be to throw >> away 100% of your code and start again with a completely different >> language? > > I think human psychology works like that. Many people may delay the > acquisition of a new car, but once they are committed to buy a new one > they want the best they can afford (within their budget). Some > languages may gain momentum and gain the "cool" vibe. We saw the rise > of Ruby a while ago, and maybe a language that handles well multiple > cores could be a strong temptation in the future. If people keep > investing in python, small bits at a time, keeping their codebase > always up to date, it is more difficult, IMHO, to commit to a full > rewrite. Maybe. But how much temptation would it need to be to induce a complete rewrite? (Mind you, it's not always a *complete* rewrite. I've been "porting" code from Win32 C++ to GTK Pike, and in the process usually shortened it by 50% or better, but mostly what I'm doing is reading the old code, taking maybe a few bits of it that are so simple they'd be the same in nearly any language, and reimplementing the original logic.) The expanded gap between Python 2.7 and Python 3.7 is mainly going to be features of 3.7 that you could choose to use now that you've ported, rather than mandatory changes. Python doesn't arbitrarily drop features or break stuff in minor releases. That means the gap between 2.7 and 3.7 will still be far FAR narrower than the gap between Python and Ruby - so, correspondingly, the temptation to switch to Ruby would have to be really strong.
In the porting case I mentioned a moment ago, there really was a very strong temptation (using Win32 APIs meant I was bound to Windows (though Wine is a wonderful thing), and the C++ code was going through stupid levels of overhead to manage memory and such), so it was worth switching. I was NOT able to convince my boss to switch our web site from PHP into Python, because he just couldn't see enough benefit from changing language - but moving to a new PHP was a much lower hump to get over. (Only a few things needed changing.) ChrisA From alc at spika.net Thu Jan 9 01:08:23 2014 From: alc at spika.net (Alejandro López Correa) Date: Thu, 9 Jan 2014 01:08:23 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: Message-ID: 2014/1/9 Mark Lawrence : > On 08/01/2014 23:14, Alejandro López Correa wrote: >> >> I think a convenient way to run old python 2 modules along with new >> python 3 ones may be a good idea despite the cost. > > One of the major costs, quoting Winston Churchill, will be blood, toil, > tears and sweat. How much of this are you personally intending to put into > this effort, or are you happy to try and force core developers into a > situation that many of them don't want to be in? I am not trying to force anything. I am offering my views and some ideas. Personally, I am not experiencing any problem with python 3 other than some missing third party libraries. I am sorry if my posts seem rude (I do not know whether that is the case). When writing in English, many times it is more like adjusting what I want to say to what I know how to say. That "despite the cost" was a poor choice of words. I agree that it is not reasonable to expect a change that means "blood, toil, tears and sweat" for the core developers when they are against it. However, there might be easy solutions, not pretty but pragmatic, that could be implemented without polluting the main code base a lot.
But again, I am not trying to force anything, just to convince people. I am starting to realise this debate has been going on for too long and it seems to have left some scars. I am sorry, it is brand new to me. Alejandro From alc at spika.net Thu Jan 9 01:18:55 2014 From: alc at spika.net (Alejandro López Correa) Date: Thu, 9 Jan 2014 01:18:55 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: 2014/1/9 Chris Angelico : > On Thu, Jan 9, 2014 at 10:34 AM, Alejandro López Correa wrote: >> 2014/1/8 Steven D'Aprano : > > But what IS a good metric? How are you going to measure any of that? > It's better to at least use PyPI stats than to pull numbers out of a > hat. > The problem I see is that metric might be equal to or worse than just guessing because it is clearly biased: it focuses on open source projects hosted on PyPI. It is easy to measure it, but maybe it is not good to do so if that measure is used to make important decisions. In my [very limited] experience, the number of open source projects pales in comparison to that of projects kept "in the shadows". > Maybe. But how much temptation would it need to be to induce a > complete rewrite? (Mind you, it's not always a *complete* rewrite. > I've been "porting" code from Win32 C++ to GTK Pike, and in the > process usually shortened it by 50% or better, but mostly what I'm > doing is reading the old code, taking maybe a few bits of it that are > so simple they'd be the same in nearly any language, and > reimplementing the original logic.) The expanded gap between Python > 2.7 and Python 3.7 is mainly going to be features of 3.7 that you > could choose to use now that you've ported, rather than mandatory > changes. Python doesn't arbitrarily drop features or break stuff in > minor releases.
That means the gap between 2.7 and 3.7 will still be > far FAR narrower than the gap between Python and Ruby - so, > correspondingly, the temptation to switch to Ruby would have to be > really strong. In the porting case I mentioned a moment ago, there > really was a very strong temptation (using Win32 APIs meant I was > bound to Windows (though Wine is a wonderful thing), and the C++ code > was going through stupid levels of overhead to manage memory and > such), so it was worth switching. I was NOT able to convince my boss > to switch our web site from PHP into Python, because he just couldn't > see enough benefit from changing language - but moving to a new PHP > was a much lower hump to get over. (Only a few things needed > changing.) Fair enough. I think it is a good argument. Alejandro From amber.yust at gmail.com Thu Jan 9 02:19:30 2014 From: amber.yust at gmail.com (Amber Yust) Date: Wed, 8 Jan 2014 17:19:30 -0800 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: Also note that even if publicly visible projects are outnumbered by private projects, the public projects tend to have a much larger impact on the overall ecosystem, because they're used by many entities (whereas private projects are typically only used by a single entity given their nature). On Jan 8, 2014 5:13 PM, "Alejandro López Correa" wrote: > 2014/1/9 Chris Angelico : > > On Thu, Jan 9, 2014 at 10:34 AM, Alejandro López Correa > wrote: > >> 2014/1/8 Steven D'Aprano : > > > > But what IS a good metric? How are you going to measure any of that? > > It's better to at least use PyPI stats than to pull numbers out of a > > hat. > > > > The problem I see is that metric might be equal or worse than just > guessing because it is clearly biased: it focuses on open source > projects hosted on PyPI. It is easy to measure it, but maybe it is not > good to do so if that measure is used to make important decisions.
In > my [very limited] experience, the number of open source projects pales > in comparison to that of projects kept "in the shadows". > > > Maybe. But how much temptation would it need to be to induce a > > complete rewrite? (Mind you, it's not always a *complete* rewrite. > > I've been "porting" code from Win32 C++ to GTK Pike, and in the > > process usually shortened it by 50% or better, but mostly what I'm > > doing is reading the old code, taking maybe a few bits of it that are > > so simple they'd be the same in nearly any language, and > > reimplementing the original logic.) The expanded gap between Python > > 2.7 and Python 3.7 is mainly going to be features of 3.7 that you > > could choose to use now that you've ported, rather than mandatory > > changes. Python doesn't arbitrarily drop features or break stuff in > > minor releases. That means the gap between 2.7 and 3.7 will still be > > far FAR narrower than the gap between Python and Ruby - so, > > correspondingly, the temptation to switch to Ruby would have to be > > really strong. In the porting case I mentioned a moment ago, there > > really was a very strong temptation (using Win32 APIs meant I was > > bound to Windows (though Wine is a wonderful thing), and the C++ code > > was going through stupid levels of overhead to manage memory and > > such), so it was worth switching. I was NOT able to convince my boss > > to switch our web site from PHP into Python, because he just couldn't > > see enough benefit from changing language - but moving to a new PHP > > was a much lower hump to get over. (Only a few things needed > > changing.) > > Fair enough. I think it is a good argument. > > Alejandro
From greg.ewing at canterbury.ac.nz Thu Jan 9 05:44:44 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Jan 2014 17:44:44 +1300 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: <8761pugybm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <8761pugybm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52CE293C.5020503@canterbury.ac.nz> Stephen J. Turnbull wrote: > No, it doesn't. It means 'abc' followed by something that cannot be > encoded by any codec without the surrogateescape handler. > 'ascii-compatible' merely defaults to that handler. I wouldn't > actually be too upset if I were told, no, you have to specify > explicitly. If I understand correctly, your intention is that 61 62 63 FF in this representation would simply be a more compact version of 0061 0062 0063 DCFF, with exactly the same semantics. If that's right, then maybe something like "compressed surrogateescape" or "8-bit surrogateescape" would be a better name for it? Also, it could be produced automatically where possible by any decoding operation that specified surrogateescape -- there wouldn't have to be a dedicated encoding name for it (although there could be for convenience). It could also potentially be produced by any slicing or other string operations that resulted in characters within the appropriate ranges, just like any of the other internal representations. -- Greg From stephen at xemacs.org Thu Jan 9 09:40:04 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Jan 2014 17:40:04 +0900 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
In-Reply-To: <52CE293C.5020503@canterbury.ac.nz> References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <8761pugybm.fsf@uwakimon.sk.tsukuba.ac.jp> <52CE293C.5020503@canterbury.ac.nz> Message-ID: <87ppo1ft0b.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > If I understand correctly, your intention is that > 61 62 63 FF in this representation would simply be > a more compact version of 0061 0062 0063 DCFF, > with exactly the same semantics. Pretty much so. There remain some ambiguities and questions about efficient implementability in my mind. > If that's right, then maybe something like "compressed > surrogateescape" or "8-bit surrogateescape" would be > a better name for it? Maybe. Thanks for the suggestion! However, as I mentioned already I'm going to back off on this for a while, because in the process of analyzing Inada-san's use case I realized that by itself it doesn't save much besides space, and isn't pretty to boot. From victor.stinner at gmail.com Thu Jan 9 09:55:29 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 9 Jan 2014 09:55:29 +0100 Subject: [Python-ideas] The fools shall start sucking the cock. In-Reply-To: References: Message-ID: 2014/1/8 Brett Cannon : > After others coming forward about previous behavior, this email is serving > as an official warning: one more infraction of the CoC and you will be > banned from this mailing list.
(CoC stands for Code of Conduct, text available at http://python.org/psf/codeofconduct/) Victor From ncoghlan at gmail.com Thu Jan 9 12:12:42 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 21:12:42 +1000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: <20140108220104.GS29356@ando> References: <20140108220104.GS29356@ando> Message-ID: On 9 Jan 2014 06:02, "Steven D'Aprano" wrote: > > On Wed, Jan 08, 2014 at 05:22:02PM +0100, Alejandro López Correa wrote: > [...] > > Anyway, I understand it is not a clean way to proceed, but something along > > these lines might be the only way to speed up the adoption of python 3 > > One assumption in this discussion, and the various related discussions > on Reddit and other places, is that adoption of Python 3 is too slow and > needs to be sped up. I don't believe this is true. I believe adoption > is just right and exactly what should be expected. > > Alex Gaynor wrote a blog post a week or so ago claiming that, five years > since Python 3 was first released, everyone should have migrated by now > and that since only "five percent" (a figure which I believe he pulled > out of thin air) have migrated, Python 3 has been a failure. Alex's numbers were real - they're based on user agent header analysis for PyPI downloads. However, the other readily available metric is python.org installer downloads, and those favour Python 3 (and that's even before we publish 3.4, which has nice additions like pip, statistics and asyncio). Alex is a smart guy, but I don't know how he managed to get "After 5 years, Python 3 should be more widely used than Python 2" (clearly unrealistic) from "After 5 years, Python 3 should be mature enough to be the default choice for new projects". The latter is what we actually said, and, allowing for the 6 month delay to replace 3.0's unusably slow IO stack in 3.1, still looks plausible given the updates coming in Python 3.4.
That article was actually the one that made me realise my Q&A needed a few more questions and answers :) > I challenge that belief. I've been hanging around here and on the > Python-Dev list for a long time, and while I can't find any official > pronouncement, the sense has always been that Python 3 adoption will > take ten years, not five. (That's my recollection -- if any of the core > developers wish to correct me, please do.) 5 years to be the default for new projects, we never set a goal for overtaking Python 2 overall. This Q and the one after it are most directly relevant: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#when-can-we-expect-python-3-to-be-the-obvious-choice-for-new-projects >Rates of adoption are much, > much higher than gossip on the Internet suggests. About 70% of the top > 200 projects on PyPI support Python 3, and downloads of Python 3 are > very healthy, possibly even higher than downloads of Python 2. >On the > tutor list, I see a significant number of beginners using Python 3. All our discussions with distros and redistributors are also about *how* to manage the transition, rather than *if*. Red Hat providing commercial support for Python 3.3 as of last September is a *big* deal, particularly given this week's announcement about CentOS being adopted as an officially Red Hat sponsored project and the popularity of CentOS as a platform in the scientific community. > > It seems to me that given the circumstances, Python 3 adoption is right > where we should expect it to be half-way through a decade-long process. 
> There will be a period at the start when hardly anyone will migrate, > then a period of accelerating migration, which will accelerate further > when the mainstream Linux distros start shipping Python 3 as their > system Python (ArchLinux is not mainstream, but Fedora is planning the > change), followed by a sudden rush in another four or five years when > people realise that Python 2.7 becoming unmaintained is no longer a > distant prospect but is about to happen. Yup. I actually started adding a timeline to my Q&A: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-are-or-were-some-of-the-key-dates-in-the-python-3-transition > For many people, waiting until the last minute is the most sensible > thing that they can do. This gives time for the early adopters to > discover and iron out all the wrinkles and difficulties. Rather than > approaching this as "Python 3 has been a failure, what can we do to save > it?" we should be approaching this as "Python 3 has been a success, what > lessons can we take from the early adopters to make it even easier for > the next wave of adopters?" We've definitely made some mistakes in the area of communications, though. In particular, we probably should have had something like my Q&A available as an authoritative information source from the beginning, instead of only creating it around the release of Python 3.3. > > "from __past__ import spam" does not make it easier to adopt. It just > makes it easier to *put off adopting*. > > > > and minimise the risk of defection to other languages. > > People threaten that, but it is an irrational threat. (Mind you, people > do silly, irrational things every day.)
If you think it's hard to migrate > from Python 2 to 3, when you get to keep 90% of your code base and most > of the backward-incompatible changes are a few libraries that have been > renamed and a handful of syntax changes, how hard will it be to throw > away 100% of your code and start again with a completely different > language? Exactly. By the time 2.7 goes into security fix only mode, we will have been maintaining Python 2 and Python 3 in parallel for more than *8 years*. This is a deliberate choice on our part to allow plenty of time for users to decide to migrate on their own, rather than attempting to force them to migrate with the stick of a lack of support. Even after that, commercial Python 2.x support will be available until at least 2023, and likely longer. Cheers, Nick. > > > > -- > Steven From ncoghlan at gmail.com Thu Jan 9 12:16:51 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 9 Jan 2014 21:16:51 +1000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On 9 Jan 2014 09:49, "Amber Yust" wrote: > > Also note that even if publicly visible projects are outnumbered by private projects, the public projects tend to have a much larger impact on the overall ecosystem, because they're used by many entities (whereas private projects are typically only used by a single entity given their nature). It also mistakenly assumes our goal is to get existing *applications* to migrate.
It really isn't - we're obviously delighted if app developers choose to switch (as it indicates we have created a compelling platform), but we *needed* key library and framework developers to add Python 3 support in order to bootstrap the Python 3 development ecosystem. Cheers, Nick. > > On Jan 8, 2014 5:13 PM, "Alejandro López Correa" wrote: >> >> 2014/1/9 Chris Angelico : >> > On Thu, Jan 9, 2014 at 10:34 AM, Alejandro López Correa wrote: >> >> 2014/1/8 Steven D'Aprano : >> > >> > But what IS a good metric? How are you going to measure any of that? >> > It's better to at least use PyPI stats than to pull numbers out of a >> > hat. >> > >> >> The problem I see is that metric might be equal or worse than just >> guessing because it is clearly biased: it focuses on open source >> projects hosted on PyPI. It is easy to measure it, but maybe it is not >> good to do so if that measure is used to make important decisions. In >> my [very limited] experience, the number of open source projects pales >> in comparison to that of projects kept "in the shadows". >> >> > Maybe. But how much temptation would it need to be to induce a >> > complete rewrite? (Mind you, it's not always a *complete* rewrite. >> > I've been "porting" code from Win32 C++ to GTK Pike, and in the >> > process usually shortened it by 50% or better, but mostly what I'm >> > doing is reading the old code, taking maybe a few bits of it that are >> > so simple they'd be the same in nearly any language, and >> > reimplementing the original logic.) The expanded gap between Python >> > 2.7 and Python 3.7 is mainly going to be features of 3.7 that you >> > could choose to use now that you've ported, rather than mandatory >> > changes. Python doesn't arbitrarily drop features or break stuff in >> > minor releases.
That means the gap between 2.7 and 3.7 will still be >> > far FAR narrower than the gap between Python and Ruby - so, >> > correspondingly, the temptation to switch to Ruby would have to be >> > really strong. In the porting case I mentioned a moment ago, there >> > really was a very strong temptation (using Win32 APIs meant I was >> > bound to Windows (though Wine is a wonderful thing), and the C++ code >> > was going through stupid levels of overhead to manage memory and >> > such), so it was worth switching. I was NOT able to convince my boss >> > to switch our web site from PHP into Python, because he just couldn't >> > see enough benefit from changing language - but moving to a new PHP >> > was a much lower hump to get over. (Only a few things needed >> > changing.) >> >> Fair enough. I think it is a good argument. >> >> Alejandro >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.rodola at gmail.com Thu Jan 9 15:03:27 2014 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 9 Jan 2014 15:03:27 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On Thu, Jan 9, 2014 at 12:16 PM, Nick Coghlan wrote: > > On 9 Jan 2014 09:49, "Amber Yust" wrote: > > > > Also note that even if publicly visible projects are outnumbered by > private projects, the public projects tend to have a much larger impact on > the overall ecosystem, because they're used by many entities (whereas > private projects are typically only used by a single entity given their > nature). > > It also mistakenly assumes our goal is to get existing *applications* to > migrate. It really isn't - we're obviously delighted if app developers > choose to switch (as it indicates we have created a compelling platform), > but we *needed* key library and framework developers to add Python 3 > support in order to bootstrap the Python 3 development ecosystem. > True. I think one of the key points here is that different important libs haven't been ported yet: https://python3wos.appspot.com/ Too many of them are still marked red and IMO that is the main reason why a lot of people are being so hesitant, not unicode. "boto" alone counts as hundreds of thousands potential users which simply cannot migrate. Django made the transition only a couple of months ago, which basically means it's still in a beta state, and AFAIK fundamental projects such as Twisted don't even have an ETA. Considering 5 years have passed since Python 3.0 first made its appearance I consider this a *serious* delay. From a user standpoint this sort of appears as a signal which translates into "if neither big project X has migrated after 5 years why should I?".
That's likely to apply even if project X is not within the list of your dependencies, because you may not depend from X now but maybe you will in the future, either because you need X or because Y requires X in order to work. It is *crucial* for people maintaining those libraries to put Python 3 porting on top of their TODO list at the cost of not working on new features. --- Giampaolo https://code.google.com/p/psutil/ https://code.google.com/p/pyftpdlib/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 9 17:50:55 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 02:50:55 +1000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On 9 Jan 2014 22:03, "Giampaolo Rodola'" wrote: > > > > > On Thu, Jan 9, 2014 at 12:16 PM, Nick Coghlan wrote: >> >> >> On 9 Jan 2014 09:49, "Amber Yust" wrote: >> > >> > Also note that even if publicly visible projects are outnumbered by private projects, the public projects tend to have a much larger impact on the overall ecosystem, because they're used by many entities (whereas private projects are typically only used by a single entity given their nature). >> >> It also mistakenly assumes our goal is to get existing *applications* to migrate. It really isn't - we're obviously delighted if app developers choose to switch (as it indicates we have created a compelling platform), but we *needed* key library and framework developers to add Python 3 support in order to bootstrap the Python 3 development ecosystem. > > > True. > I think one of the key points here is that different important libs haven't been ported yet: > https://python3wos.appspot.com/ > Too many of them are still marked red and IMO that is the main reason why a lot of people are being so hesitant, not unicode. 
> "boto" alone counts as hundreds of thousands potential users which simply cannot migrate. > Django made the transition only a couple of months ago, which basically means it's still in a beta state, and AFAIK fundamental projects such as Twisted don't even have an ETA. > Considering 5 years have passed since Python 3.0 first made it's appearance I consider this a *serious* delay. > From a user standpoint this sort of appears as a signal which translates into "if neither big project X has migrated after 5 years why should I?". > That's likely to apply even if project X is not within the list of your dependencies, because you may not depend from X now but maybe you will in the future, either because you need X or because Y requires X in order to work. It is *crucial* for people maintaining those libraries to put Python 3 porting on top of their TODO list at the cost of not working on new features. This is still focusing on migrating *existing* applications. We're not especially worried if existing applications keep using 2.7 - it's a good language that is almost certain to be commercially supported for at least another decade, even though upstream support will switch to security fix only mode in 2015. If it ain't broke (and for existing applications, 2.7 generally ain't broke), don't fix it. But if a project has persistent problems with application developers persistently introducing bugs by using 8-bit strings where they should be using Unicode, or otherwise running into the assorted bug magnets we removed in Python 3, the migration may be worth considering. A user that starts with Python 3 simply wouldn't consider a dependency like boto as an option, and would reach for asyncio rather than Twisted for their explicit asynchronous programming needs. Cheers, Nick. 
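Nick's "8-bit strings where they should be using Unicode" bug class is easy to see in Python 3, which refuses to combine `bytes` and `str` implicitly. A minimal sketch, assuming UTF-8 as the encoding at the boundary; `join_label` is a hypothetical helper, not something from the thread:

```python
def join_label(prefix, payload):
    """Hypothetical helper: join text with a payload that may be bytes.

    The payload is decoded explicitly (UTF-8 is an assumption here)
    instead of relying on any implicit coercion.
    """
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    return prefix + payload

# Implicitly mixing text and bytes raises TypeError in Python 3.
try:
    "name: " + b"caf\xc3\xa9"
except TypeError as exc:
    mixing_error = exc
else:
    mixing_error = None

label = join_label("name: ", b"caf\xc3\xa9")
assert label == "name: caf\u00e9"
```

In Python 2, mixing `unicode` and `str` triggers an implicit ASCII decode, so the equivalent code usually works on ASCII test data and fails only once non-ASCII bytes show up.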
> > --- Giampaolo > https://code.google.com/p/psutil/ > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jan 9 19:00:47 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 9 Jan 2014 10:00:47 -0800 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: <49E3D2C0-85CB-49B2-B358-00A5F3A2938C@yahoo.com> On Jan 9, 2014, at 8:50, Nick Coghlan wrote: > But if a project has persistent problems with application developers persistently introducing bugs by using 8-bit strings where they should be using Unicode, or otherwise running into the assorted bug magnets we removed in Python 3, the migration may be worth considering. One thing to note: For many applications, it's not that hard to migrate to the six-able subset of 2.7/3.3. This allows 2.x-centric contributors (including those who want to be able to just use the python that Apple or Ubuntu pre-installed on their dev box), allows you to continue using py2exe for your Windows binaries, and gives you an out if you run into the "but what if we later need some library that hasn't been ported yet" problem (which I think is drastically overblown, but it's certainly a common enough fear). And meanwhile, from my experience, it's at least as hard to introduce subtle Unicode bugs in a dual-version code base as a 3.x-only code base, and just as easy to debug them, so you get at least one advantage over 2.x-only. And being able to migrate gradually instead of having a flag-day release is always nice. In my day job, I work on a project that's written in multiple languages, and the python parts are all 2.7+/3.3+.
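The "six-able subset" Andrew mentions is code that runs unchanged on 2.7 and 3.3+ by routing version differences through a small compatibility layer such as the third-party `six` library. The sketch below hand-rolls that pattern with only the standard library, so it doesn't assume `six` is installed; `PY2`, `text_type` and `binary_type` mirror names that `six` provides, while `to_text` is a hypothetical helper:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:  # only taken on Python 2.7
    text_type = unicode  # noqa: F821 (this builtin exists only on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text on either major version."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)
```

Keeping every bytes/text conversion behind one helper like this is what makes a dual-version code base about as easy to debug for Unicode problems as a 3.x-only one.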
While I miss being able to use some 3.3 features, and it's annoying to deal with problems like 2.7 using too much memory when processing giant XML files or the old version of sqlite in 2.7.2 panicking over a simple union and ignoring an index, it's still far better than having to debug mojibake in 2.7, or writing in node.js or ObjC. From g.rodola at gmail.com Thu Jan 9 19:05:22 2014 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 9 Jan 2014 19:05:22 +0100 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On Thu, Jan 9, 2014 at 5:50 PM, Nick Coghlan wrote: > > On 9 Jan 2014 22:03, "Giampaolo Rodola'" wrote: > > > > > > > > > > On Thu, Jan 9, 2014 at 12:16 PM, Nick Coghlan > wrote: > >> > >> > >> On 9 Jan 2014 09:49, "Amber Yust" wrote: > >> > > >> > Also note that even if publicly visible projects are outnumbered by > private projects, the public projects tend to have a much larger impact on > the overall ecosystem, because they're used by many entities (whereas > private projects are typically only used by a single entity given their > nature). > >> > >> It also mistakenly assumes our goal is to get existing *applications* > to migrate. It really isn't - we're obviously delighted if app developers > choose to switch (as it indicates we have created a compelling platform), > but we *needed* key library and framework developers to add Python 3 > support in order to bootstrap the Python 3 development ecosystem. > > > > > > True. > > I think one of the key points here is that different important libs > haven't been ported yet: > > https://python3wos.appspot.com/ > > Too many of them are still marked red and IMO that is the main reason > why a lot of people are being so hesitant, not unicode. > > "boto" alone counts as hundreds of thousands potential users which > simply cannot migrate. 
> > Django made the transition only a couple of months ago, which basically > means it's still in a beta state, and AFAIK fundamental projects such as > Twisted don't even have an ETA. > > Considering 5 years have passed since Python 3.0 first made it's > appearance I consider this a *serious* delay. > > From a user standpoint this sort of appears as a signal which translates > into "if neither big project X has migrated after 5 years why should I?". > > That's likely to apply even if project X is not within the list of your > dependencies, because you may not depend from X now but maybe you will in > the future, either because you need X or because Y requires X in order to > work. It is *crucial* for people maintaining those libraries to put Python > 3 porting on top of their TODO list at the cost of not working on new > features. > > This is still focusing on migrating *existing* applications. > I was talking about existing third party libraries (Twisted, gevent, lxml etc), not user applications. In order to port user applications you need those libraries to be ported first, and it is crucial that at least the most used ones are ported. > A user that starts with Python 3 simply wouldn't consider a dependency > like boto as an option > Why not? Note that I picked "boto" just because it's the first in that list. > and would reach for asyncio rather than Twisted for their explicit > asynchronous programming needs. > I wouldn't be so sure about that. We're talking about two very mature and established projects, with tons of third-party components (see https://github.com/facebook/tornado/wiki/Links), each solving a common set of problems in their own way, which will likely continue to be used independently from asyncio (which is still in a beta state) for quite a while. --- Giampaolo https://code.google.com/p/psutil/ https://code.google.com/p/pyftpdlib/ https://code.google.com/p/pysendfile/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Thu Jan 9 19:29:20 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Jan 2014 04:29:20 +1000 Subject: [Python-ideas] from __past__ import division, str, etc In-Reply-To: References: <20140108220104.GS29356@ando> Message-ID: On 10 Jan 2014 02:05, "Giampaolo Rodola'" wrote: > > On Thu, Jan 9, 2014 at 5:50 PM, Nick Coghlan wrote: >> >> >> On 9 Jan 2014 22:03, "Giampaolo Rodola'" wrote: >> > >> > >> > >> > >> > On Thu, Jan 9, 2014 at 12:16 PM, Nick Coghlan wrote: >> >> >> >> >> >> On 9 Jan 2014 09:49, "Amber Yust" wrote: >> >> > >> >> > Also note that even if publicly visible projects are outnumbered by private projects, the public projects tend to have a much larger impact on the overall ecosystem, because they're used by many entities (whereas private projects are typically only used by a single entity given their nature). >> >> >> >> It also mistakenly assumes our goal is to get existing *applications* to migrate. It really isn't - we're obviously delighted if app developers choose to switch (as it indicates we have created a compelling platform), but we *needed* key library and framework developers to add Python 3 support in order to bootstrap the Python 3 development ecosystem. >> > >> > >> > True. >> > I think one of the key points here is that different important libs haven't been ported yet: >> > https://python3wos.appspot.com/ >> > Too many of them are still marked red and IMO that is the main reason why a lot of people are being so hesitant, not unicode. >> > "boto" alone counts as hundreds of thousands potential users which simply cannot migrate. >> > Django made the transition only a couple of months ago, which basically means it's still in a beta state, and AFAIK fundamental projects such as Twisted don't even have an ETA. >> > Considering 5 years have passed since Python 3.0 first made it's appearance I consider this a *serious* delay. 
>> > From a user standpoint this sort of appears as a signal which translates into "if neither big project X has migrated after 5 years why should I?". >> > That's likely to apply even if project X is not within the list of your dependencies, because you may not depend from X now but maybe you will in the future, either because you need X or because Y requires X in order to work. It is *crucial* for people maintaining those libraries to put Python 3 porting on top of their TODO list at the cost of not working on new features. >> >> This is still focusing on migrating *existing* applications. > > I was talking about existing third party libraries (Twisted, gevent, lxml etc), not user applications. > In order to port user applications you need those libraries to be ported first, and it is crucial that at least the most used ones are ported. >> >> A user that starts with Python 3 simply wouldn't consider a dependency like boto as an option > > Why not? Note that I picked "boto" just because it's the first in that list. >> >> and would reach for asyncio rather than Twisted for their explicit asynchronous programming needs. > > I would't be so sure about that. We're talking about two very mature and established projects, with tons of third-party components (see https://github.com/facebook/tornado/wiki/Links), each solving a common set of problems in their own way, which will likely continue to be used independently from asyncio (which is still in a beta state) for quite a while. Yes, Python 3 will be an even *better* ecosystem as more of the Python 2 ecosystem becomes available. That is not in dispute. The point is that *most new software* should be able to find appropriate packages in Python 3 at this point in time, and also has access to modules like "python-future", which make it relatively straightforward to downgrade to Python 2.7 if you start in Python 3 and then find a Python 2 only library that you absolutely positively have to depend on. 
This means that those unported libraries aren't a reason to *start* a greenfield project in Python 2. "Python 3 by default" *also* doesn't mean "never any reason to start a new Python 2 project instead". Cheers, Nick. > > > --- Giampaolo > https://code.google.com/p/psutil/ > https://code.google.com/p/pyftpdlib/ > https://code.google.com/p/pysendfile/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Thu Jan 9 21:01:17 2014 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 09 Jan 2014 14:01:17 -0600 Subject: [Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?] In-Reply-To: References: <52C9B39F.6060205@stoneleaf.us> <87fvp1idip.fsf@uwakimon.sk.tsukuba.ac.jp> <1389009153.31778.YahooMailNeo@web181001.mail.ne1.yahoo.com> <87bnzphs7j.fsf@uwakimon.sk.tsukuba.ac.jp> <20140107154401.GK29356@ando> <52CC45E4.7010400@mrabarnett.plus.com> <52CC49B0.2090406@stoneleaf.us> <52CC564A.7040602@mrabarnett.plus.com> <52CC58F5.2050603@stoneleaf.us> <52CC6A80.60100@stoneleaf.us> Message-ID: On 01/07/2014 04:36 PM, Alexander Heger wrote: >>> >>Of course I'm unhappy with it, it doesn't behave the way I think it should, >>> >>and it's not consistent. >> >Consistent with what? (Before you rush in an answer, remember that >> >there are almost always multiple sides to a consistency argument.) >> >I don't see what's wrong with those. Both produce valid expressions >> >that, when entered, compare equal to the object whose repr() was >> >printed. What more would you*want*? > I find that the definition str is inconsistent indeed, because the > items in a string are strings again, not characters (or code points). > I don't think there is too many other examples in Python where the > same is true; indexing a list does not give a list but the item that > is at the point. If you use slices, then it's more consistent with strings. A slice of a list gives you a list, a slice of a string gives you a string. 
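Both halves of this consistency argument can be checked directly in Python 3. A short sketch contrasting indexing and slicing on `list`, `str`, and (since the thread is about bytestrings) `bytes`; the variable names are arbitrary:

```python
s = "spam"
items = ["s", "p", "a", "m"]

# Indexing: a list yields its element; a str yields a str of length 1,
# because Python has no separate character type.
assert type(items[0]) is str and items[0] == "s"
assert type(s[0]) is str and len(s[0]) == 1

# Slicing: both return an object of the same type as the container.
assert type(items[1:3]) is list
assert type(s[1:3]) is str

# Because an item of a str is itself a str, indexing can be repeated.
assert s[0][0][0] == "s"

# bytes differs: indexing gives an int, slicing gives bytes.
b = b"spam"
assert b[0] == 115 and type(b[1:3]) is bytes
```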
The idea of sub-components always breaks down at some level. Then it shifts to equivalent translations, rather than smaller units. Like converting strings to bytes, and back again. They aren't sub components of each other. Where you draw the lines is dependent on how close you look. (Python, bytecode, C code, assembly, bytes, bits, voltages, ...) We can stay at the python level if we choose the viewpoint that an object is the Python code that creates that object. We have to allow for the execution of that code in our understanding of it. Cheers, Ron From ram.rachum at gmail.com Sat Jan 11 15:18:32 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Sat, 11 Jan 2014 06:18:32 -0800 (PST) Subject: [Python-ideas] `OrderedDict.items().__getitem__` Message-ID: I think that `OrderedDict.items().__getitem__` should be implemented, to solve this ugliness: http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 What do you think? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jan 11 15:36:11 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 01:36:11 +1100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: On Sun, Jan 12, 2014 at 1:18 AM, Ram Rachum wrote: > I think that `OrderedDict.items().__getitem__` should be implemented, to > solve this ugliness: > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 > > What do you think? Well, the first problem with that is that __getitem__ already exists, and it's dict-style :) So you can't fetch out an item by its position that way. But suppose you create a method that returns the Nth element. The implementation in CPython 3.4 is a linked list, so getting an arbitrary element by index would be quite inefficient.
Getting specifically the first can be done either with what you see in that link (it could be made a tiny bit shorter, but not much), but anything else would effectively entail iterating over the whole thing until you get to that position, so you may as well do that explicitly. Alternatively, if you're okay with it being a destructive operation, you can use popitem() to snag the first (or last, if you wish) key/value pair. ChrisA From __peter__ at web.de Sat Jan 11 16:36:49 2014 From: __peter__ at web.de (Peter Otten) Date: Sat, 11 Jan 2014 16:36:49 +0100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` References: Message-ID: Ram Rachum wrote: > I think that `OrderedDict.items().__getitem__` should be implemented, to > solve this ugliness: > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 > > What do you think? I think an O(N) __getitem__() is even uglier. Also, you should have really compelling reasons for allowing the interfaces of dict.items() and OrderedDict.items() to diverge. Personally, I'd use a helper function def first(items): for item in items: return item raise ValueError("No first item in an empty sequence") and I don't understand why user thefourtheye is downvoted. Hiding a non-obvious if small piece of code behind a self-explaining name seems like good programming practice. From breamoreboy at yahoo.co.uk Sat Jan 11 16:55:19 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 11 Jan 2014 15:55:19 +0000 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: On 11/01/2014 14:18, Ram Rachum wrote: > I think that `OrderedDict.items().__getitem__` should be implemented, to > solve this ugliness: > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 > > What do you think? > > Thanks, > Ram. > Use the more_itertools first function.
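A sketch of the positional-access options discussed so far, using only `collections.OrderedDict` and `itertools.islice`. `first` is Peter's helper laid out as runnable code; `nth_item` is a hypothetical helper showing the O(n) walk Chris describes, which is roughly what an `items().__getitem__` would have to do given the linked-list implementation:

```python
from collections import OrderedDict
from itertools import islice

def first(items):
    """Peter Otten's helper: return the first item of any iterable."""
    for item in items:
        return item
    raise ValueError("No first item in an empty sequence")

def nth_item(od, n):
    """Hypothetical helper: the (key, value) pair at position n, in O(n).

    OrderedDict has no positional index, so this walks the items view
    from the front until it reaches position n (n must be >= 0).
    """
    try:
        return next(islice(od.items(), n, None))
    except StopIteration:
        raise IndexError("OrderedDict index out of range") from None

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
print(first(od.items()))   # ('a', 1)
print(nth_item(od, 2))     # ('c', 3)
```

`first(od.items())` behaves like Mathias's `next(iter(d.items()))` later in the thread, except that it raises `ValueError` rather than `StopIteration` on an empty mapping.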
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rymg19 at gmail.com Sat Jan 11 21:51:28 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 11 Jan 2014 14:51:28 -0600 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: Based on your popitem idea: get_first = lambda d: d.copy().popitem() get_last = lambda d: d.copy().popitem(last=True) On Sat, Jan 11, 2014 at 8:36 AM, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 1:18 AM, Ram Rachum wrote: > > I think that `OrderedDict.items().__getitem__` should be implemented, to > > solve this ugliness: > > > > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 > > > > What do you think? > > Well, the first problem with that is that __getitem__ already exists, > and it's dict-style :) So you can't fetch out an item by its position > that way. But suppose you create a method that returns the Nth > element. > > The implementation in CPython 3.4 is a linked list, so getting an > arbitrary element by index would be quite inefficient. Getting > specifically the first can be done either with what you see in that > link (it could be made a tiny bit shorter, but not much), but anything > else would effectively entail iterating over the whole thing until you > get to that position, so you may as well do that explicitly. > Alternatively, if you're okay with it being a destructive operation, > you can use popitem() to snag the first (or last, if you wish) > key/value pair. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Sat Jan 11 22:41:03 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 08:41:03 +1100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: On Sun, Jan 12, 2014 at 7:51 AM, Ryan Gonzalez wrote: > Based on your popitem idea: > > get_first = lambda d: d.copy().popitem() > get_last = lambda d: d.copy().popitem(last=True) That's a destructive operation, though. Great if you want it, terrible if you don't. ChrisA From grosser.meister.morti at gmx.net Sat Jan 11 22:47:26 2014 From: grosser.meister.morti at gmx.net (Mathias Panzenböck) Date: Sat, 11 Jan 2014 22:47:26 +0100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: <52D1BBEE.4060407@gmx.net> Why not: get_first = lambda d: next(iter(d.items())) No need for a full copy of the dict. On 01/11/2014 09:51 PM, Ryan Gonzalez wrote: > Based on your popitem idea: > > get_first = lambda d: d.copy().popitem() > get_last = lambda d: d.copy().popitem(last=True) > > > > On Sat, Jan 11, 2014 at 8:36 AM, Chris Angelico > wrote: > > On Sun, Jan 12, 2014 at 1:18 AM, Ram Rachum > wrote: > > I think that `OrderedDict.items().__getitem__` should be implemented, to > > solve this ugliness: > > > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 > > > > What do you think? > > Well, the first problem with that is that __getitem__ already exists, > and it's dict-style :) So you can't fetch out an item by its position > that way. But suppose you create a method that returns the Nth > element. > > The implementation in CPython 3.4 is a linked list, so getting an > arbitrary element by index would be quite inefficient.
Getting > specifically the first can be done either with what you see in that > link (it could be made a tiny bit shorter, but not much), but anything > else would effectively entail iterating over the whole thing until you > get to that position, so you may as well do that explicitly. > Alternatively, if you're okay with it being a destructive operation, > you can use popitem() to snag the first (or last, if you wish) > key/value pair. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > Ryan > When your hammer is C++, everything begins to look like a thumb. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From rosuav at gmail.com Sat Jan 11 23:03:39 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 09:03:39 +1100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: <52D1BBEE.4060407@gmx.net> References: <52D1BBEE.4060407@gmx.net> Message-ID: On Sun, Jan 12, 2014 at 8:47 AM, Mathias Panzenböck wrote: > Why not: > > get_first = lambda d: next(iter(d.items())) > > No need for a full copy of the dict. > > > On 01/11/2014 09:51 PM, Ryan Gonzalez wrote: >> >> Based on your popitem idea: >> >> get_first = lambda d: d.copy().popitem() >> get_last = lambda d: d.copy().popitem(last=True) Oh right. Yeah, copy(). So this isn't destructive, but as Mathias says, it's probably inefficient. (I say "probably" because it's theoretically possible to optimize the copy operation - but I don't see anything like that in the source code.)
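A sketch comparing the copy-based and copy-free variants, using only the documented `OrderedDict` API. One detail worth checking: `OrderedDict.popitem()` defaults to `last=True`, so the copy-based `get_first` quoted above returns the last pair, not the first; passing `last=False` gives FIFO order. The function names are illustrative:

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# popitem() defaults to last=True (LIFO), so the copy-based get_first
# from the thread actually returns the *last* pair.
get_first_copy = lambda d: d.copy().popitem()
assert get_first_copy(od) == ("c", 3)

# last=False pops from the front. Copying keeps the original intact,
# but each call costs time and memory proportional to len(d).
get_first_copy_fixed = lambda d: d.copy().popitem(last=False)
assert get_first_copy_fixed(od) == ("a", 1)
assert list(od) == ["a", "b", "c"]

# Copy-free alternatives for either end of the mapping.
def get_first(d):
    return next(iter(d.items()))

def get_last(d):
    key = next(reversed(d))  # OrderedDict supports reversed()
    return key, d[key]
```

`get_last` relies on `OrderedDict.__reversed__`; plain `dict` objects only became reversible in Python 3.8.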
ChrisA From breamoreboy at yahoo.co.uk Sat Jan 11 23:32:02 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 11 Jan 2014 22:32:02 +0000 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: <52D1BBEE.4060407@gmx.net> Message-ID: On 11/01/2014 22:03, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 8:47 AM, Mathias Panzenböck > wrote: >> Why not: >> >> get_first = lambda d: next(iter(d.items())) >> >> No need for a full copy of the dict. >> >> >> On 01/11/2014 09:51 PM, Ryan Gonzalez wrote: >>> >>> Based on your popitem idea: >>> >>> get_first = lambda d: d.copy().popitem() >>> get_last = lambda d: d.copy().popitem(last=True) > > Oh right. Yeah, copy(). So this isn't destructive, but as Mathias > says, it's probably inefficient. (I say "probably" because it's > theoretically possible to optimize the copy operation - but I don't > see anything like that in the source code.) > > ChrisA > Surely a shallow copy isn't guaranteed to work properly in all cases anyway? copy(...) D.copy() -> a shallow copy of D -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rosuav at gmail.com Sat Jan 11 23:53:36 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 09:53:36 +1100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: <52D1BBEE.4060407@gmx.net> Message-ID: On Sun, Jan 12, 2014 at 9:32 AM, Mark Lawrence wrote: > Surely a shallow copy isn't guaranteed to work properly in all cases anyway? > > copy(...) > D.copy() -> a shallow copy of D A shallow copy is sufficient if it's about to mutate the dictionary itself (popitem). It's the right semantics...
just the wrong complexity, as it's expensive on large dictionaries :) ChrisA From rymg19 at gmail.com Sun Jan 12 00:01:57 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 11 Jan 2014 17:01:57 -0600 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: <52D1BBEE.4060407@gmx.net> Message-ID: It is pretty inefficient. As for getting the last item, however, I think something like that might end up the best. And, you've gotta admit, it isn't bad for a 30-second solution with no real planning whatsoever. On Sat, Jan 11, 2014 at 4:03 PM, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 8:47 AM, Mathias Panzenböck > wrote: > > Why not: > > > > get_first = lambda d: next(iter(d.items())) > > > > No need for a full copy of the dict. > > > > > > On 01/11/2014 09:51 PM, Ryan Gonzalez wrote: > >> > >> Based on your popitem idea: > >> > >> get_first = lambda d: d.copy().popitem() > >> get_last = lambda d: d.copy().popitem(last=True) > > Oh right. Yeah, copy(). So this isn't destructive, but as Mathias > says, it's probably inefficient. (I say "probably" because it's > theoretically possible to optimize the copy operation - but I don't > see anything like that in the source code.) > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Sun Jan 12 00:04:06 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 11 Jan 2014 18:04:06 -0500 Subject: [Python-ideas] namedtuple baseclass Message-ID: Hello all, I propose to add a baseclass for all namedtuples. Right now 'namedtuple' function dynamically creates a class derived from 'tuple', which complicates things like dynamic dispatch.
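To make the dispatch problem concrete: with `functools.singledispatch` (new in Python 3.4), a handler registered for `tuple` also receives every namedtuple instance, since there is no shared namedtuple class to register instead. A minimal sketch; `Pair` and `describe` are illustrative names, and the `_fields` test is the `hasattr` workaround Yury describes:

```python
from collections import namedtuple
from functools import singledispatch

Pair = namedtuple("Pair", "a b")

@singledispatch
def describe(obj):
    return "object"

@describe.register(tuple)
def _(obj):
    # Every namedtuple class subclasses tuple directly, so this handler
    # also receives namedtuple instances and has to tell them apart.
    if hasattr(obj, "_fields"):
        return "namedtuple with fields %s" % (obj._fields,)
    return "plain tuple"

print(describe((23, 42)))      # plain tuple
print(describe(Pair(23, 42)))  # namedtuple with fields ('a', 'b')
```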
Basically, the only way of checking if an object is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')". One possible approach would be to: 1. Rename 'namedtuple' function to '_namedtuple' 2. Add a class 'namedtuple(tuple)', with its '__new__' method proxying '_namedtuple' function 3. Modify the class template to derive namedtuples from the 'namedtuple' class, instead of 'tuple' This way, it's possible to simply write 'isinstance(o, namedtuple)'. I have a working patch that implements the above logic (all python unittests pass), so if you find this useful I can start an issue on bugs.python.org. Thank you, Yury -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Jan 12 00:24:47 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 10:24:47 +1100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: <52D1BBEE.4060407@gmx.net> Message-ID: On Sun, Jan 12, 2014 at 10:01 AM, Ryan Gonzalez wrote: > It is pretty inefficient. As for getting the last item, however, I think > something like that might end up the best. > For getting the last item, reversed() should be as fast as iter() is for getting the first - at least in CPython 3.4, which is what I was looking at. > And, you've gotta admit, it isn't bad for a 30-second solution with no real > planning whatsoever. There is that :) ChrisA From jbvsmo at gmail.com Sun Jan 12 01:14:29 2014 From: jbvsmo at gmail.com (João Bernardo) Date: Sat, 11 Jan 2014 22:14:29 -0200 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: I never liked this implementation of namedtuple with "exec". I remember some proposals (and even a working implementation) of namedtuple done with metaclasses. I don't remember why they were rejected. I think at least having a base class other than tuple is something useful.
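A rough user-level approximation of the proposal, assuming one wraps the existing `collections.namedtuple` factory instead of changing its class template as Yury's patch does. `NamedTupleBase` and `based_namedtuple` are hypothetical names, not standard-library API:

```python
from collections import namedtuple as _namedtuple

class NamedTupleBase(tuple):
    """Hypothetical common base class for generated named tuples."""
    __slots__ = ()

def based_namedtuple(typename, field_names):
    """Hypothetical factory: like collections.namedtuple, but the result
    also inherits from NamedTupleBase, so one isinstance() check works."""
    generated = _namedtuple(typename, field_names)
    return type(typename, (generated, NamedTupleBase), {"__slots__": ()})

Pair = based_namedtuple("Pair", "a b")
p = Pair(23, 42)

assert isinstance(p, NamedTupleBase)
assert not isinstance((23, 42), NamedTupleBase)
assert p == (23, 42)  # still compares equal to a plain tuple
print(p)              # Pair(a=23, b=42)
```

The cost of this wrapper approach is an extra class in each MRO; the proposed patch would instead make the generated class derive from the common base directly.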
+1 João Bernardo From yselivanov.ml at gmail.com Sun Jan 12 01:40:44 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 11 Jan 2014 19:40:44 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: Yeah, while I was working on the patch, I thought about rewriting it all without the use of "exec". But that would be too much of a change 10 days before RC1. Therefore, the proposed change is minimal, aimed to only slightly improve the current design. Yury On Sat, Jan 11, 2014 at 7:14 PM, João Bernardo wrote: > I never liked this implementation of namedtuple with "exec". I remember > some proposals > (and even a working implementation) of namedtuple done with metaclasses. I > don't remember > why they were rejected. > > I think at least having a base class other than tuple is something useful. > > +1 > > > João Bernardo > > From steve at pearwood.info Sun Jan 12 02:05:46 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 12:05:46 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: <20140112010542.GT3869@ando> On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote: > Hello all, > > I propose to add a baseclass for all namedtuples. Right now 'namedtuple' > function dynamically creates a class derived from 'tuple', which complicates > things like dynamic dispatch. Basically, the only way of checking if an > object > is an instance of 'namedtuple' is to do "isinstance(o, tuple) and > hasattr(o, '_fields')". Let me see if I understand your use-case. You want to dynamically dispatch on various objects. Given two objects: p1 = (23, 42) p2 = namedtuple("pair", "a b")(23, 42) assert p1 == p2 you want to dispatch p1 and p2 differently. Is that correct?
Then, given a third object: class Person(namedtuple("Person", "name sex age occupation id")): def say_hello(self): print("Hello %s" % self.name) p3 = Person("Fred Smith", "M", 35, "nurse", 927056) you want to dispatch p2 and p3 the same. Is that correct? If I am correct, I wonder what sort of code you are writing that wants to treat p1 and p2 differently, and p2 and p3 the same. To me, this seems ill-advised. Apart from tuple (and object), p2 and p3 should not share a common base class, because they have nothing in common. [...] > This way, it's possible to simple write 'isinstance(o, namedtuple)'. I am having difficulty thinking of circumstances where I would want to do that. -1 on the idea. -- Steven From yselivanov.ml at gmail.com Sun Jan 12 02:26:59 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 11 Jan 2014 20:26:59 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112010542.GT3869@ando> References: <20140112010542.GT3869@ando> Message-ID: Hi Steven, On Sat, Jan 11, 2014 at 8:05 PM, Steven D'Aprano wrote: > On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote: >> Hello all, >> >> I propose to add a baseclass for all namedtuples. Right now 'namedtuple' >> function dynamically creates a class derived from 'tuple', which complicates >> things like dynamic dispatch. Basically, the only way of checking if an >> object >> is an instance of 'namedtuple' is to do "isinstance(o, tuple) and >> hasattr(o, '_fields')". > > Let me see if I understand your use-case. You want to dynamically > dispatch on various objects. Given two objects: > > p1 = (23, 42) > p2 = namedtuple("pair", "a b")(23, 42) > assert p1 == p2 > > > you want to dispatch p1 and p2 differently. Is that correct? 
> > > Then, given a third object: > > class Person(namedtuple("Person", "name sex age occupation id")): > def say_hello(self): > print("Hello %s" % self.name) > > p3 = Person("Fred Smith", "M", 35, "nurse", 927056) > > > you want to dispatch p2 and p3 the same. Is that correct? Well, it all depends on a use case ;) In my concrete use case - yes, more to that below. > If I am correct, I wonder what sort of code you are writing that wants > to treat p1 and p2 differently, and p2 and p3 the same. To me, this > seems ill-advised. Apart from tuple (and object), p2 and p3 should not > share a common base class, because they have nothing in common. Well, everything in python is a subclass/instance of object, so what? Yes, I think that different namedtuples should be instances of some remote common parent, derived from tuple, because they are different, they *are* namedtuples after all. They have field names for the data stored in them, and that is what distinguishes them from plain tuples. > [...] >> This way, it's possible to simple write 'isinstance(o, namedtuple)'. > > I am having difficulty thinking of circumstances where I would want to > do that. My use case: I have a system that dumps python objects to some intermediate format, which is later converted to html, or dumped in a terminal (for debug, reporting, and other purposes). And I want to dump namedtuples with their field names/values (not as simple tuples). I'm sure there are many more use cases than my current itch. Python has the richest and most beautiful OO facilities, we have lots of ABCs and an elegant exception tree, everything is well structured. To me, it's logical that one of the most commonly used classes should have a proper base class. - Yury From eric at trueblade.com Sun Jan 12 02:27:39 2014 From: eric at trueblade.com (Eric V.
Smith) Date: Sat, 11 Jan 2014 20:27:39 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112010542.GT3869@ando> References: <20140112010542.GT3869@ando> Message-ID: <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> See also http://bugs.python.org/issue7796 for a discussion of this issue. -- Eric. > On Jan 11, 2014, at 8:05 PM, Steven D'Aprano wrote: > >> On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote: >> Hello all, >> >> I propose to add a baseclass for all namedtuples. Right now 'namedtuple' >> function dynamically creates a class derived from 'tuple', which complicates >> things like dynamic dispatch. Basically, the only way of checking if an >> object >> is an instance of 'namedtuple' is to do "isinstance(o, tuple) and >> hasattr(o, '_fields')". > > Let me see if I understand your use-case. You want to dynamically > dispatch on various objects. Given two objects: > > p1 = (23, 42) > p2 = namedtuple("pair", "a b")(23, 42) > assert p1 == p2 > > > you want to dispatch p1 and p2 differently. Is that correct? > > > Then, given a third object: > > class Person(namedtuple("Person", "name sex age occupation id")): > def say_hello(self): > print("Hello %s" % self.name) > > p3 = Person("Fred Smith", "M", 35, "nurse", 927056) > > > you want to dispatch p2 and p3 the same. Is that correct? > > If I am correct, I wonder what sort of code you are writing that wants > to treat p1 and p2 differently, and p2 and p3 the same. To me, this > seems ill-advised. Apart from tuple (and object), p2 and p3 should not > share a common base class, because they have nothing in common. > > > [...] >> This way, it's possible to simple write 'isinstance(o, namedtuple)'. > > I am having difficulty thinking of circumstances where I would want to > do that. > > -1 on the idea. 
> > > -- > Steven From yselivanov.ml at gmail.com Sun Jan 12 02:44:06 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 11 Jan 2014 20:44:06 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> Message-ID: Hi Eric, Thank you very much for bringing this up. I couldn't find that issue (perhaps, because I was looking for an open ticket). From the discussion there, it seems that Raymond and Guido agreed to have a common base class for namedtuple for py3.3; however, that was in 2010/2011. Perhaps, any doubts that existed at that time are not the case now? Thanks, Yury On Sat, Jan 11, 2014 at 8:27 PM, Eric V. Smith wrote: > See also http://bugs.python.org/issue7796 for a discussion of this issue. > > -- > Eric. > >> On Jan 11, 2014, at 8:05 PM, Steven D'Aprano wrote: >> >>> On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote: >>> Hello all, >>> >>> I propose to add a baseclass for all namedtuples. Right now 'namedtuple' >>> function dynamically creates a class derived from 'tuple', which complicates >>> things like dynamic dispatch. Basically, the only way of checking if an >>> object >>> is an instance of 'namedtuple' is to do "isinstance(o, tuple) and >>> hasattr(o, '_fields')". >> >> Let me see if I understand your use-case. You want to dynamically >> dispatch on various objects. Given two objects: >> >> p1 = (23, 42) >> p2 = namedtuple("pair", "a b")(23, 42) >> assert p1 == p2 >> >> >> you want to dispatch p1 and p2 differently. Is that correct?
>> >> >> Then, given a third object: >> >> class Person(namedtuple("Person", "name sex age occupation id")): >> def say_hello(self): >> print("Hello %s" % self.name) >> >> p3 = Person("Fred Smith", "M", 35, "nurse", 927056) >> >> >> you want to dispatch p2 and p3 the same. Is that correct? >> >> If I am correct, I wonder what sort of code you are writing that wants >> to treat p1 and p2 differently, and p2 and p3 the same. To me, this >> seems ill-advised. Apart from tuple (and object), p2 and p3 should not >> share a common base class, because they have nothing in common. >> >> >> [...] >>> This way, it's possible to simple write 'isinstance(o, namedtuple)'. >> >> I am having difficulty thinking of circumstances where I would want to >> do that. >> >> -1 on the idea. >> >> >> -- >> Steven From abarnert at yahoo.com Sun Jan 12 08:53:54 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 11 Jan 2014 23:53:54 -0800 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> Message-ID: It sounds like the consensus there wasn't to have a base class for namedtuple, but instead to have an abc that all namedtuples, and C namedtuple-like types, would be registered with, and that would have no API beyond that of Sequence. If I understand the original request in this thread, I'm not sure this would satisfy the use case. He's looking to detect namedtuples so he can extract their names along with their values.
Which is a perfectly reasonable thing to do for the kind of reflective code he wants to write. It would presumably use code like this:

    if isinstance(x, NamedTuple):
        d = OrderedDict(zip(x._fields, x))
        do_stuff(d)

But that won't work with any abstract NamedTuple, only one that has a _fields member that lists the field names. So you'd need to write this:

    if isinstance(x, NamedTuple):
        try:
            d = OrderedDict(zip(x._fields, x))
        except AttributeError:
            pass  # whoops, it's an os.stat_result or something else
        else:
            do_stuff(d)

And at that point, the isinstance check isn't helping anything over the duck typing on _fields, which you can already do today. So to satisfy this use case, you'd either need an actual namedtuple base class instead of an abc, or an abc that adds some API for getting the field names (or name-value pairs). Either of which seems reasonable--except for the odd quirk of having a public API in a class that's prefixed with an underscore. (If it's not prefixed with an underscore, it can conflict with a field name, which defeats the whole purpose of namedtuple.) Sent from a random iPhone On Jan 11, 2014, at 17:44, Yury Selivanov wrote: > Hi Eric, > > Thank you very much for bringing this up. I couldn't find that issue (perhaps, > because I was looking for an open ticket). > > From the discussion there, it seems that Raymond and Guido agreed to > have a common base class for namedtuple for py3.3; however, that was in > 2010/2011. > > Perhaps, any doubts that existed at that time are not the case now? > > Thanks, > Yury > > > > On Sat, Jan 11, 2014 at 8:27 PM, Eric V. Smith wrote: >> See also http://bugs.python.org/issue7796 for a discussion of this issue. >> >> -- >> Eric. >> >>> On Jan 11, 2014, at 8:05 PM, Steven D'Aprano wrote: >>> >>>> On Sat, Jan 11, 2014 at 06:04:06PM -0500, Yury Selivanov wrote: >>>> Hello all, >>>> >>>> I propose to add a baseclass for all namedtuples.
Right now 'namedtuple' >>>> function dynamically creates a class derived from 'tuple', which complicates >>>> things like dynamic dispatch. Basically, the only way of checking if an >>>> object >>>> is an instance of 'namedtuple' is to do "isinstance(o, tuple) and >>>> hasattr(o, '_fields')". >>> >>> Let me see if I understand your use-case. You want to dynamically >>> dispatch on various objects. Given two objects: >>> >>> p1 = (23, 42) >>> p2 = namedtuple("pair", "a b")(23, 42) >>> assert p1 == p2 >>> >>> >>> you want to dispatch p1 and p2 differently. Is that correct? >>> >>> >>> Then, given a third object: >>> >>> class Person(namedtuple("Person", "name sex age occupation id")): >>> def say_hello(self): >>> print("Hello %s" % self.name) >>> >>> p3 = Person("Fred Smith", "M", 35, "nurse", 927056) >>> >>> >>> you want to dispatch p2 and p3 the same. Is that correct? >>> >>> If I am correct, I wonder what sort of code you are writing that wants >>> to treat p1 and p2 differently, and p2 and p3 the same. To me, this >>> seems ill-advised. Apart from tuple (and object), p2 and p3 should not >>> share a common base class, because they have nothing in common. >>> >>> >>> [...] >>>> This way, it's possible to simple write 'isinstance(o, namedtuple)'. >>> >>> I am having difficulty thinking of circumstances where I would want to >>> do that. >>> >>> -1 on the idea. 
>>> >>> >>> -- >>> Steven From rosuav at gmail.com Sun Jan 12 09:17:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 19:17:56 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> Message-ID: On Sun, Jan 12, 2014 at 6:53 PM, Andrew Barnert wrote: > So to satisfy this use case, you'd either need an actual namedtuple base class instead of an abc, or an abc that adds some API for getting the field names (or name-value pairs). Either of which seems reasonable--except for the odd quirk of having a public API in a class that's prefixed with an underscore. (If it's not prefixed with an underscore, it can conflict with a field name, which defeats the whole purpose of namedtuple.) > Is compatibility with the current namedtuple important, or can this be done another way? For instance, the fields could be retrieved with __getitem__ instead:

    from collections import namedtuple

    # Hacking it in with a subclass. Gives no benefit
    # but is a proof of concept.
    class Point(namedtuple('Point', ['x', 'y'])):
        def __getitem__(self, which):
            if which=="fields": return self._fields
            return super().__getitem__(which)

    >>> a=Point(1,2)
    >>> a.x
    1
    >>> a.y
    2
    >>> a.fields
    Traceback (most recent call last):
      File "", line 1, in
        a.fields
    AttributeError: 'Point' object has no attribute 'fields'
    >>> a["fields"]
    ('x', 'y')
    >>> a[0]
    1
    >>> a[1]
    2

Normally, __getitem__ will be used with integers (since this is basically a sequence, not a mapping). Would it break things to use a string in this way? It's guaranteed not to collide with either form of access (as a tuple, or as fields). ChrisA From techtonik at gmail.com Sun Jan 12 08:26:07 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 12 Jan 2014 10:26:07 +0300 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> Message-ID: On Sun, Jan 12, 2014 at 4:44 AM, Yury Selivanov wrote: > > Perhaps, any doubts that existed at that time are not the case now? Sometimes I feel that various questions about namedtuple class, record and similar proposals need a separate FAQ, but like everybody else I am too lazy to create one, so it never happens. From steve at pearwood.info Sun Jan 12 12:43:01 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 22:43:01 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> Message-ID: <20140112114301.GZ3869@ando> On Sun, Jan 12, 2014 at 07:17:56PM +1100, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 6:53 PM, Andrew Barnert wrote: > > So to satisfy this use case, you'd either need an actual namedtuple > > base class instead of an abc, or an abc that adds some API for > > getting the field names (or name-value pairs).
Either of which seems > > reasonable--except for the odd quirk of having a public API in a > > class that's prefixed with an underscore. (If it's not prefixed with > > an underscore, it can conflict with a field name, which defeats the > > whole purpose of namedtuple.) > > > > Is compatibility with the current namedtuple important, or can this be > done another way? For instance, the fields could be retrieved with > __getitem__ instead: It's a tuple. It already uses __getitem__ to return items indexed by position. Adding magic so that obj["fields"] is an alias for obj._fields is, well, horrible. > # Hacking it in with a subclass. Gives no benefit > # but is a proof of concept. > class Point(namedtuple('Point', ['x', 'y'])): > def __getitem__(self, which): > if which=="fields": return self._fields > return super().__getitem__(which) I think you missed that namedtuple like objects written in C don't have a _fields attribute, e.g. os.stat_result. If you're going to insist that they add special handling in __getitem__, wouldn't it just be cleaner and simpler to get them to add a _fields attribute? So... * An ABC for namedtuple as agreed by Raymond and Guido wouldn't include any extra functionality beyond Sequence, so it doesn't guarantee the existence of _fields; that doesn't satisfy the use-case. * An actual namedtuple superclass only works for the namedtuple factory function, not for C namedtuple-like types. Both could be fixed -- Python could define a namedtuple superclass, and all relevant C types like os.stat_result could be changed to inherit from them. (But what of those which don't?) Or the ABC could be extended to include a promise of _fields, but that would exclude C types. Either way, in order to satisfy this use-case, there would be a whole lot of changes needed. Or, you can duck-type: if isinstance(o, tuple): try: fields = o._fields except AttributeError: fields = ... # fall back Have I missed something? 
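Yury's use case (dumping namedtuples together with their field names) can be served today by the duck-typed check Steven describes, with no new base class. A minimal sketch follows; `describe` is a hypothetical helper name, and tuple-like C types that lack `_fields` (such as `os.stat_result`) simply come back as `None`, exactly like plain tuples.

```python
from collections import namedtuple

def describe(obj):
    """Return (field, value) pairs for a namedtuple-like object, else None.

    Hypothetical helper for a dumper; not part of any library.
    """
    # The duck-typed test from this thread: a tuple that also exposes _fields.
    if isinstance(obj, tuple):
        fields = getattr(obj, "_fields", None)
        if fields is not None:
            return list(zip(fields, obj))
    # Plain tuples, and tuple-likes without _fields, fall through here.
    return None

Pair = namedtuple("Pair", "a b")

class Person(namedtuple("Person", "name age")):
    def say_hello(self):
        return "Hello %s" % self.name

print(describe(Pair(23, 42)))              # [('a', 23), ('b', 42)]
print(describe(Person("Fred Smith", 35)))  # [('name', 'Fred Smith'), ('age', 35)]
print(describe((23, 42)))                  # None
```

Subclasses such as `Person` inherit `_fields`, so they are handled the same way as classes returned directly by the factory.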
-- Steven From rosuav at gmail.com Sun Jan 12 12:46:51 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 22:46:51 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112114301.GZ3869@ando> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> Message-ID: On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano wrote: > It's a tuple. It already uses __getitem__ to return items indexed by > position. Adding magic so that obj["fields"] is an alias for > obj._fields is, well, horrible. It's only an alias in the simple version that I did there. If it were to be used as a means of avoiding the _fields reserved name, it wouldn't be an alias. But yes, it is somewhat magical. I was hunting for an out-of-band way to get that sort of information. ChrisA From steve at pearwood.info Sun Jan 12 12:55:16 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Jan 2014 22:55:16 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> Message-ID: <20140112115516.GA3869@ando> On Sun, Jan 12, 2014 at 10:46:51PM +1100, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano wrote: > > It's a tuple. It already uses __getitem__ to return items indexed by > > position. Adding magic so that obj["fields"] is an alias for > > obj._fields is, well, horrible. > > It's only an alias in the simple version that I did there. If it were > to be used as a means of avoiding the _fields reserved name, it > wouldn't be an alias. But yes, it is somewhat magical. I was hunting > for an out-of-band way to get that sort of information. I still don't get how you think this solves the problem that the OP's use-case is to use isinstance() to identify namedtuples, then read _fields. 
But with the (proposed, not implemented) namedtuple ABC, isinstance(o, NamedTuple) could be true and o._fields fail. Breaking backwards compatibility to write that as o["fields"] instead won't help, because it will still fail: py> t = os.stat_result([1]*10) py> t["fields"] Traceback (most recent call last): File "", line 1, in TypeError: tuple indices must be integers, not str Changing namedtuple is not enough. Oh, and this is a backwards-compatibility breaking change, because _fields is part of the *public* API for namedtuple, despite the leading underscore. So I fail to see how anything short of a massive re-engineering of not just namedtuple but also any C namedtuple-like types will satisfy the OP's use-case. Have I missed something? -- Steven From rosuav at gmail.com Sun Jan 12 13:07:46 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 12 Jan 2014 23:07:46 +1100 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112115516.GA3869@ando> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> Message-ID: On Sun, Jan 12, 2014 at 10:55 PM, Steven D'Aprano wrote: > On Sun, Jan 12, 2014 at 10:46:51PM +1100, Chris Angelico wrote: >> On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano wrote: >> > It's a tuple. It already uses __getitem__ to return items indexed by >> > position. Adding magic so that obj["fields"] is an alias for >> > obj._fields is, well, horrible. >> >> It's only an alias in the simple version that I did there. If it were >> to be used as a means of avoiding the _fields reserved name, it >> wouldn't be an alias. But yes, it is somewhat magical. I was hunting >> for an out-of-band way to get that sort of information. > > I still don't get how you think this solves the problem that the OP's > use-case is to use isinstance() to identify namedtuples, then read > _fields. 
That was a slightly tangential comment stemming from Andrew Barnert's remark that using _fields for a public API is quirky. (Which is why I quoted him in my post.) This would no longer use an underscore name for something public. That's all. ChrisA From yselivanov.ml at gmail.com Sun Jan 12 13:51:56 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 12 Jan 2014 07:51:56 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112115516.GA3869@ando> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> Message-ID: Steven, On Sun, Jan 12, 2014 at 6:55 AM, Steven D'Aprano wrote: > On Sun, Jan 12, 2014 at 10:46:51PM +1100, Chris Angelico wrote: >> On Sun, Jan 12, 2014 at 10:43 PM, Steven D'Aprano wrote: >> > It's a tuple. It already uses __getitem__ to return items indexed by >> > position. Adding magic so that obj["fields"] is an alias for >> > obj._fields is, well, horrible. >> >> It's only an alias in the simple version that I did there. If it were >> to be used as a means of avoiding the _fields reserved name, it >> wouldn't be an alias. But yes, it is somewhat magical. I was hunting >> for an out-of-band way to get that sort of information. > > I still don't get how you think this solves the problem that the OP's > use-case is to use isinstance() to identify namedtuples, then read > _fields. But with the (proposed, not implemented) namedtuple ABC, > isinstance(o, NamedTuple) could be true and o._fields fail. 
If we decide to implement an ABC, then any class that satisfies it should implement '_fields' (and _make, and other namedtuple public methods) properly (this can be enforced in the ABC's '__subclasshook__'). > Breaking > backwards compatibility to write that as o["fields"] instead won't help, > because it will still fail: > > py> t = os.stat_result([1]*10) > py> t["fields"] > Traceback (most recent call last): > File "", line 1, in > TypeError: tuple indices must be integers, not str > > > Changing namedtuple is not enough. > > Oh, and this is a backwards-compatibility breaking change, because > _fields is part of the *public* API for namedtuple, despite the leading > underscore. > > So I fail to see how anything short of a massive re-engineering of not > just namedtuple but also any C namedtuple-like types will satisfy the > OP's use-case. Have I missed something? If we go with the ABC route, then we can simply implement '_fields' and other namedtuple methods for the low-level C structure os.stat_result is using later. But for now, stat_result is not a namedtuple (it lacks all of the namedtuple API). So I'm not sure that C namedtuple-like types should hold us back on this proposal. BTW, ABC proposal aside: the current namedtuple implementation creates the class from a template with "exec" call. For every namedtuple, its entire set of methods is created over and over again. Even for the memory efficiency sake, having a base class with *some* of the common methods (which are currently in the template) is better.
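Yury's suggestion of a shared base class, together with his later remark that a metaclass or a plain `type` call could build the classes, can be sketched roughly as follows. `RecordBase` and `make_record` are hypothetical names invented for illustration; this is not how `collections.namedtuple` is implemented. The common methods live once on the base, while each field is a `property` on the generated class itself, so field access is found at the first step of the MRO and only the shared methods pay the extra lookup step Eric Snow's reply is concerned with.

```python
from operator import itemgetter

class RecordBase(tuple):
    """Hypothetical shared base class: common methods are defined once here."""
    __slots__ = ()
    _fields = ()

    def __new__(cls, *values):
        if len(values) != len(cls._fields):
            raise TypeError(
                f"{cls.__name__} expects {len(cls._fields)} values, got {len(values)}")
        return super().__new__(cls, values)

    @classmethod
    def _make(cls, iterable):
        return cls(*iterable)

    def _asdict(self):
        return dict(zip(self._fields, self))

    def __repr__(self):
        pairs = ", ".join(f"{name}={value!r}" for name, value in zip(self._fields, self))
        return f"{type(self).__name__}({pairs})"

    def __getnewargs__(self):
        # Lets copy/pickle rebuild instances through __new__(cls, *values).
        return tuple(self)

def make_record(typename, field_names):
    """Build a RecordBase subclass with a plain type() call instead of exec."""
    fields = tuple(field_names.split())
    namespace = {"__slots__": (), "_fields": fields}
    for index, name in enumerate(fields):
        # Each field is a read-only property defined on the generated class itself.
        namespace[name] = property(itemgetter(index))
    return type(typename, (RecordBase,), namespace)

Point = make_record("Point", "x y")
p = Point(1, 2)
print(p, p.x, p._asdict())        # Point(x=1, y=2) 1 {'x': 1, 'y': 2}
print(isinstance(p, RecordBase))  # True
```

With a design like this, `isinstance(o, RecordBase)` would be the single check Yury asks for, though it would still not cover C types such as `os.stat_result`.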
- Yury From ericsnowcurrently at gmail.com Sun Jan 12 17:33:57 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sun, 12 Jan 2014 09:33:57 -0700 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> Message-ID: On Jan 12, 2014 5:52 AM, "Yury Selivanov" wrote: > BTW, ABC proposal aside: the current namedtuple implementation > creates the class from a template with "exec" call. For every namedtuple, > its entire set of methods is created over and over again. Even for the > memory efficiency sake, having a base class with *some* of the common > methods (which are currently in the template) is better. It's a trade-off. We increase the definition-time cost by using exec, but minimize the cost of traversing the attribute lookup chain when using instances. The purely ABC approach in the referenced issue preserves this instance-favoring-optimization design. -eric From yselivanov.ml at gmail.com Sun Jan 12 17:45:44 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 12 Jan 2014 11:45:44 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> Message-ID: Eric, On Sun, Jan 12, 2014 at 11:33 AM, Eric Snow wrote: > > On Jan 12, 2014 5:52 AM, "Yury Selivanov" wrote: >> BTW, ABC proposal aside: the current namedtuple implementation >> creates the class from a template with "exec" call. For every namedtuple, >> its entire set of methods is created over and over again. Even for the >> memory efficiency sake, having a base class with *some* of the common >> methods (which are currently in the template) is better. > > It's a trade-off.
We increase the definition-time cost by using exec, but > minimize the cost of traversing the attribute lookup chain when using > instances. The purely ABC approach in the referenced issue preserves this > instance-favoring-optimization design. > > -eric Correct me if I'm wrong, but what's the point of speeding up (2%?) attribute lookup on "_make", "__repr__", and other namedtuple methods? What matters is the performance of "__getitem__" and field property access, but that would be the same if a metaclass (or simple "type" call) is used to construct namedtuples. Anyways, I'm not proposing to touch the main bulk of the current implementation (and perhaps there are other reasons why it is as it is). The only thing I think would be nice to have (for now) is a base class for namedtuples other than tuple. Thank you, Yury From raymond.hettinger at gmail.com Sun Jan 12 21:01:35 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sun, 12 Jan 2014 20:01:35 +0000 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: On Jan 11, 2014, at 11:04 PM, Yury Selivanov wrote: > I propose to add a baseclass for all namedtuples. Right now 'namedtuple' > function dynamically creates a class derived from 'tuple', which complicates > things like dynamic dispatch. A named tuple is a protocol, not a class. Here's the glossary entry: ''' named tuple Any tuple-like class whose indexable elements are also accessible using named attributes (for example, time.localtime() returns a tuple-like object where the year is accessible either with an index such as t[0] or with a named attribute like t.tm_year). A named tuple can be a built-in type such as time.struct_time, or it can be created with a regular class definition. A full featured named tuple can also be created with the factory function collections.namedtuple().
The latter approach automatically provides extra features such as a self-documenting representation like Employee(name='jones', title='programmer'). ''' > Basically, the only way of checking if an object > is an instance of 'namedtuple' is to do "isinstance(o, tuple) and hasattr(o, '_fields')". Yes, that is the correct way of doing it. ABCs weren't meant to replace all instances of duck typing. Raymond P.S. Here's a link to previous discussion on the subject: http://bugs.python.org/issue7796 From tjreedy at udel.edu Sun Jan 12 22:17:20 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 12 Jan 2014 16:17:20 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: On 1/12/2014 3:01 PM, Raymond Hettinger wrote: > > On Jan 11, 2014, at 11:04 PM, Yury Selivanov > > wrote: > >> I propose to add a baseclass for all namedtuples. Right now 'namedtuple' >> function dynamically creates a class derived from 'tuple', which >> complicates >> things like dynamic dispatch. > > A named tuple is a protocol, not a class. > Here's the glossary entry: > ''' > named tuple > > Any tuple-like class whose indexable elements are also accessible > using named attributes (for example, time.localtime() returns a tuple-like object where > the /year/ is accessible either with an index such as t[0] or with a > named attribute like t.tm_year). > > A named tuple can be a built-in type such as time.struct_time, or it can be created with a > regular class definition. A full featured named tuple can also be > created with the factory function collections.namedtuple(). The latter > approach automatically provides extra features such as a > self-documenting representation like Employee(name='jones', > title='programmer'). > ''' That is a really nice glossary entry. I had not seen it before.
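The glossary definition quoted above can be checked directly: a built-in type and a factory-made class both satisfy the protocol of index access plus named attributes. A small illustration using only standard-library calls (the `Employee` fields are taken from the glossary's own example):

```python
import time
from collections import namedtuple

# time.struct_time is a built-in named tuple: index 0 and tm_year agree.
t = time.localtime()
print(t[0] == t.tm_year)     # True
print(isinstance(t, tuple))  # True

# A named tuple built with the collections.namedtuple() factory.
Employee = namedtuple("Employee", "name title")
e = Employee(name="jones", title="programmer")
print(e)                     # Employee(name='jones', title='programmer')
print(e[0] == e.name)        # True
```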
> >> Basically, the only way of checking if an object >> is an instance of 'namedtuple' is to do "isinstance(o, tuple) and >> hasattr(o, '_fields')". > > Yes, that is the correct way of doing it. That looks fine to me also, so I agree that nothing new is needed. -- Terry Jan Reedy From abarnert at yahoo.com Sun Jan 12 22:35:34 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 12 Jan 2014 13:35:34 -0800 (PST) Subject: [Python-ideas] namedtuple baseclass In-Reply-To: <20140112115516.GA3869@ando> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> Message-ID: <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> From: Steven D'Aprano Sent: Sunday, January 12, 2014 3:55 AM > Changing namedtuple is not enough. In fact, it's almost completely orthogonal to adding a NamedTuple ABC. Changing namedtuple shouldn't be necessary, and definitely won't be sufficient. > So I fail to see how anything short of a massive re-engineering of not > just namedtuple but also any C namedtuple-like types will satisfy the > OP's use-case. Have I missed something? I said pretty much the same thing yesterday, but on further reflection, I think it's a lot simpler than it looks. First, let's write collections.abc.NamedTuple:

    from collections.abc import Sequence

    class NamedTuple(Sequence):
        @classmethod
        def __subclasshook__(cls, sub):
            if not issubclass(sub, Sequence):
                return False
            try:
                sub._fields
                return True
            except AttributeError:
                return NotImplemented

That's easy, and it works with namedtuple types with no change, and it should work with any Python wrapper type that's designed to emulate namedtuple without using it (e.g., if someone decides to write a custom implementation with a shared base class, so he can make all of his types share implementations for _make and friends, as has been suggested on this thread).

So, what about C types? Obviously they don't generally supply _fields, or anything else useful. But most (all?) of the namedtuple-like types in builtins/stdlib are built with PyStructSequence, and adding _fields to them requires just a few lines at the end of PyStructSequence_InitType2:

    PyObject *_fields = PyTuple_New(desc->n_in_sequence);
    if (!_fields)
        return -1;
    for (i = 0; i != desc->n_in_sequence; ++i) {
        PyObject *field = PyUnicode_FromString(desc->fields[i].name);
        PyTuple_SET_ITEM(_fields, i, field);
    }
    PyDict_SetItemString(dict, "_fields", _fields);
    Py_DECREF(_fields);

In fact, that might be worth doing even without the NamedTuple ABC proposal.

But StructSequence has only been an exposed, documented protocol since 3.3, so surely there are extension modules out there that do their namedtuple-like types manually. (In a quick look around, I couldn't find any examples, although I did find a couple with Python wrappers that create a namedtuple around the result returned by a C implementation function; but I'm sure they exist.) Obviously you need to be able to get the field names from somewhere (whether that's an attribute or method on the type, copy-pasting from documentation or source, or even parsing the repr of an instance or something), but then you can just generate a wrapper from the type and its field names. And we could just leave it at that: "Sorry, those aren't NamedTuple classes, but you can always implement a wrapper in Python yourself." Or we could add a wrapper-generator to the collections module. Something like this:

    def namedtuplize(cls, fields):
        if isinstance(fields, str):
            fields = fields.split()
        class Sub:
            _fields = fields
            def __init__(self, *args, **kwargs):
                self.values = cls(*args, **kwargs)
            def __repr__(self):
                return repr(self.values)
            # a handful of other special methods that can't be getattrified
            def __getattr__(self, attr):
                return getattr(self.values, attr)
        return Sub

    import os
    statfields = 'st_mode st_ino st_dev st_nlink st_uid st_gid st_size st_atime st_mtime st_ctime'
    Stat = namedtuplize(os.stat_result, statfields)
    stats = (Stat(os.stat(f)) for f in os.listdir('.'))

(I'm using os.stat_result as an example, even though it's already a PyStructSequence so you wouldn't need it here, only for lack of a real-life example.) And then you can write a wrapper around os.stat that returns a Stat instead of an os.stat_result. Or, going the other way, in a quick-and-dirty script that just wraps a handful of these, you can even just wrap each object:

    def namedtuplify(obj, fields):
        return namedtuplize(type(obj), fields)(obj)

While the namedtuplize function could be useful in the stdlib, the namedtuplify function is less useful, and there are many cases where it's a bad idea, and it's trivial to write yourself if you need it, so I wouldn't add that to collections, except maybe as a recipe in the docs.

One last thing: Either the ABC or the wrapper could also add _as_odict and the other methods that can be easily derived from _fields, because they're useful, and I frequently see people doing _as_odict by calling getattr(self, field) on each field.
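Here is a runnable sketch of the ABC idea, assuming Python 3 and `collections.abc`. It differs from the snippet above in one respect, labeled here as an addition: the `cls is not NamedTuple` guard is a common `__subclasshook__` idiom that keeps concrete subclasses of the ABC from inheriting the duck-typed check. It also tests the last suggestion, adding derived methods to the ABC. Only real subclasses inherit mixin methods. Classes recognised through `__subclasshook__` (such as `collections.namedtuple` types) do not, so a free function is the more general option. `_as_odict`, `as_odict`, and `Pair` are illustrative names.

```python
from collections import OrderedDict, namedtuple
from collections.abc import Sequence


class NamedTuple(Sequence):
    """Sketch of the proposed ABC: a Sequence whose class defines _fields."""

    @classmethod
    def __subclasshook__(cls, sub):
        if cls is not NamedTuple:
            return NotImplemented  # subclasses use normal inheritance rules
        if not issubclass(sub, Sequence):
            return False
        if hasattr(sub, '_fields'):
            return True
        return NotImplemented

    # Mixin derived from _fields; only *real* subclasses inherit it.
    def _as_odict(self):
        return OrderedDict(zip(self._fields, self))


def as_odict(obj):
    """Free-function version: works for real and virtual subclasses alike."""
    return OrderedDict(zip(obj._fields, obj))


class Pair(NamedTuple):
    """A hand-written class that inherits from the ABC explicitly."""
    _fields = ('left', 'right')

    def __init__(self, left, right):
        self._data = (left, right)

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)


Point = namedtuple('Point', 'x y')
p = Point(1, 2)
print(isinstance(p, NamedTuple))            # True  (virtual subclass)
print(isinstance((1, 2), NamedTuple))       # False (no _fields)
print(hasattr(p, '_as_odict'))              # False (mixin not inherited)
print(list(as_odict(p).items()))            # [('x', 1), ('y', 2)]
print(Pair('a', 'b')._as_odict()['right'])  # b
```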
From yselivanov.ml at gmail.com Sun Jan 12 22:58:19 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 12 Jan 2014 16:58:19 -0500 Subject: [Python-ideas] namedtuple baseclass In-Reply-To: References: Message-ID: Raymond, On January 12, 2014 at 3:01:42 PM, Raymond Hettinger (raymond.hettinger at gmail.com) wrote: > > > On Jan 11, 2014, at 11:04 PM, Yury Selivanov > wrote: > > > I propose to add a baseclass for all namedtuples. Right now 'namedtuple' > > function dynamically creates a class derived from 'tuple', > which complicates > > things like dynamic dispatch. > > A named tuple is a protocol, not a class. This line actually makes a lot of sense, thank you for the explanation. Since it's a protocol, and a widely used one, how about reopening the discussion (started in #7796) on adding an ABC "collections.abc.NamedTuple"? I understand the issue with structseq, but we can have the ABC now for regular named tuples. If/once the named tuple API is implemented for structseqs, they will automatically conform to the proposed ABC. Thank you, Yury From abarnert at yahoo.com Mon Jan 13 01:17:20 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 12 Jan 2014 16:17:20 -0800 (PST) Subject: [Python-ideas] Making PyStructSequence expose _fields (was Re: namedtuple base class) In-Reply-To: <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> Message-ID: <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> I don't think the proposed NamedTuple ABC adds anything on top of duck typing on _fields (or on whichever other method you need, and possibly checking for Sequence). As Raymond Hettinger summarized it nicely, namedtuple is a protocol, not a type.
But I think one of the ideas that came out of that discussion is worth pursuing on its own: giving a _fields member to every structseq type. Most of the namedtuple-like classes in the builtins/stdlib, like os.stat_result, are implemented with PyStructSequence. Since 3.3, that's been a public, documented protocol. A structseq type is already a tuple. And it stores all the information needed to expose the fields to Python; it just doesn't expose them in any way. And making it do so is easy. (Either add it to the type __dict__ at type creation, or add a getter that generates it on the fly from tp_members.) Of course a structseq can do more than a namedtuple. In particular, using a structseq via its _fields would mean that you miss its "non-sequence" fields, like st_mtime_ns. But then that's already true for using a structseq as a sequence, or just looking at its repr, so I don't think that's a problem. (The "visible fields" are visible for a reason.) And this still wouldn't mean that _fields is part of the "named tuple protocol" described in the glossary, just that it's part of structseq types as well as collections.namedtuple types. And this wouldn't give structseq an on-demand __dict__ so you can just call vars(s) instead of OrderedDict(zip(s._fields, s)). Still, it seems like a clear win. A small patch, a bit of extra storage on each structseq type object (not on the instances), and now you can reflect on the most common kind of C named tuple types the same way you do on the most common kind of Python named tuple types.
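Until structseq types carry a `_fields` of their own, the wrapper approach described earlier in the thread can fill the gap on the Python side. The sketch below is a hypothetical variant of that `namedtuplize` helper (`Wrapper` and `TimeTuple` are illustrative names). It subclasses `collections.abc.Sequence` so that indexing, `len()`, and iteration work, and it derives an `_asdict()` from `_fields`. The field names for `time.struct_time` are supplied by hand, which is exactly the duplication a C-level `_fields` would remove.

```python
import time
from collections import OrderedDict
from collections.abc import Sequence


def namedtuplize(cls, fields):
    """Wrap a tuple-like type `cls` so its instances expose `_fields`.

    `fields` is either a sequence of names or a space-separated string.
    """
    if isinstance(fields, str):
        fields = fields.split()
    fields = tuple(fields)

    class Wrapper(Sequence):
        _fields = fields

        def __init__(self, *args, **kwargs):
            # Build the underlying object with cls's own constructor.
            self._values = cls(*args, **kwargs)

        # Sequence needs only these two; it derives __iter__,
        # __contains__, __reversed__, index() and count() from them.
        def __getitem__(self, index):
            return self._values[index]

        def __len__(self):
            return len(self._values)

        def __repr__(self):
            return repr(self._values)

        def __getattr__(self, attr):
            # Called only when normal lookup fails: delegate tm_year etc.
            if attr == '_values':  # avoid recursion before __init__ runs
                raise AttributeError(attr)
            return getattr(self._values, attr)

        def _asdict(self):
            return OrderedDict(zip(self._fields, self))

    return Wrapper


TimeTuple = namedtuplize(time.struct_time,
                         'tm_year tm_mon tm_mday tm_hour tm_min '
                         'tm_sec tm_wday tm_yday tm_isdst')
t = TimeTuple(time.gmtime(0))
print(t._fields[:3])          # ('tm_year', 'tm_mon', 'tm_mday')
print(t[0], t.tm_year)        # 1970 1970
print(t._asdict()['tm_mon'])  # 1
```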
From abarnert at yahoo.com Mon Jan 13 01:32:21 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 12 Jan 2014 16:32:21 -0800 (PST) Subject: [Python-ideas] Making PyStructSequence expose _fields (was Re: namedtuple base class) In-Reply-To: <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> Message-ID: <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> Here's a quick patch:

diff -r bc5f257f5cc1 Lib/test/test_structseq.py
--- a/Lib/test/test_structseq.py	Sun Jan 12 14:12:59 2014 -0800
+++ b/Lib/test/test_structseq.py	Sun Jan 12 16:31:15 2014 -0800
@@ -28,6 +28,16 @@
         for i in range(-len(t), len(t)-1):
             self.assertEqual(t[i], astuple[i])
 
+    def test_fields(self):
+        t = time.gmtime()
+        self.assertEqual(t._fields,
+                         ('tm_year', 'tm_mon', 'tm_mday', 'tm_hour', 'tm_min',
+                          'tm_sec', 'tm_wday', 'tm_yday', 'tm_isdst'))
+        st = os.stat(__file__)
+        self.assertIn("st_mode", st._fields)
+        self.assertIn("st_ino", st._fields)
+        self.assertIn("st_dev", st._fields)
+
     def test_repr(self):
         t = time.gmtime()
         self.assertTrue(repr(t))
diff -r bc5f257f5cc1 Objects/structseq.c
--- a/Objects/structseq.c	Sun Jan 12 14:12:59 2014 -0800
+++ b/Objects/structseq.c	Sun Jan 12 16:31:15 2014 -0800
@@ -7,6 +7,7 @@
 static char visible_length_key[] = "n_sequence_fields";
 static char real_length_key[] = "n_fields";
 static char unnamed_fields_key[] = "n_unnamed_fields";
+static char _fields_key[] = "_fields";
 
 /* Fields with this name have only a field index, not a field name.
    They are only allowed for indices < n_visible_fields. */
@@ -14,6 +15,7 @@
 _Py_IDENTIFIER(n_sequence_fields);
 _Py_IDENTIFIER(n_fields);
 _Py_IDENTIFIER(n_unnamed_fields);
+_Py_IDENTIFIER(_fields);
 
 #define VISIBLE_SIZE(op) Py_SIZE(op)
 #define VISIBLE_SIZE_TP(tp) PyLong_AsLong( \
@@ -327,6 +329,7 @@
     PyMemberDef* members;
     int n_members, n_unnamed_members, i, k;
     PyObject *v;
+    PyObject *_fields;
 
 #ifdef Py_TRACE_REFS
     /* if the type object was chained, unchain it first
@@ -389,6 +392,19 @@
     SET_DICT_FROM_INT(real_length_key, n_members);
     SET_DICT_FROM_INT(unnamed_fields_key, n_unnamed_members);
 
+    _fields = PyTuple_New(desc->n_in_sequence);
+    if (!_fields)
+        return -1;
+    for (i = 0; i != desc->n_in_sequence; ++i) {
+        PyObject *field = PyUnicode_FromString(members[i].name);
+        PyTuple_SET_ITEM(_fields, i, field);
+    }
+    if (PyDict_SetItemString(dict, _fields_key, _fields) < 0) {
+        Py_DECREF(_fields);
+        return -1;
+    }
+    Py_DECREF(_fields);
+
     return 0;
 }
 
@@ -417,7 +433,8 @@
 {
     if (_PyUnicode_FromId(&PyId_n_sequence_fields) == NULL
         || _PyUnicode_FromId(&PyId_n_fields) == NULL
-        || _PyUnicode_FromId(&PyId_n_unnamed_fields) == NULL)
+        || _PyUnicode_FromId(&PyId_n_unnamed_fields) == NULL
+        || _PyUnicode_FromId(&PyId__fields) == NULL)
         return -1;
 
     return 0;

From ethan at stoneleaf.us Mon Jan 13 01:44:03 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 12 Jan 2014 16:44:03 -0800 Subject: [Python-ideas] Making PyStructSequence expose _fields (was Re: namedtuple base class) In-Reply-To: <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> Message-ID: <52D336D3.3020908@stoneleaf.us> On 01/12/2014 04:32 PM, Andrew Barnert wrote: > > Here's a quick patch: Please put the patch on the issue tracker[1]. Create a new issue if an appropriate one does not already exist. Thanks. -- ~Ethan~ [1] http://bugs.python.org From abarnert at yahoo.com Mon Jan 13 02:16:20 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 12 Jan 2014 17:16:20 -0800 (PST) Subject: [Python-ideas] Making PyStructSequence expose _fields (was Re: namedtuple base class) In-Reply-To: <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> Message-ID: <1389575780.47710.YahooMailNeo@web181005.mail.ne1.yahoo.com> See http://bugs.python.org/issue20230 for the issue and patch. Thanks to Ethan Furman for telling me to post it there instead of here.
From ncoghlan at gmail.com Mon Jan 13 03:41:21 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Jan 2014 12:41:21 +1000 Subject: [Python-ideas] Making PyStructSequence expose _fields (was Re: namedtuple base class) In-Reply-To: <1389575780.47710.YahooMailNeo@web181005.mail.ne1.yahoo.com> References: <20140112010542.GT3869@ando> <1F7D8E53-6D36-400C-8B57-144507130F85@trueblade.com> <20140112114301.GZ3869@ando> <20140112115516.GA3869@ando> <1389562534.3364.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1389572240.39368.YahooMailNeo@web181004.mail.ne1.yahoo.com> <1389573141.54253.YahooMailNeo@web181002.mail.ne1.yahoo.com> <1389575780.47710.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: On 13 Jan 2014 11:19, "Andrew Barnert" wrote: > > See http://bugs.python.org/issue20230 for the issue and patch. Thanks to Ethan Furman for telling me to post it there instead of here. This approach sounds good to me for 3.5. The ABC recipe might make a good addition to the ActiveState cookbook. Cheers, Nick.
From musicdenotation at gmail.com Mon Jan 13 07:48:29 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Mon, 13 Jan 2014 13:48:29 +0700 Subject: [Python-ideas] Multi-statement anonymous functions Message-ID: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> Proposed syntaxes: > let function(*args,**kwargs): > ...body... > function2(...args...): > ...body... > in: > [statements] > do: > [statements] > where [function declarations in the same form as above] Inspired by Haskell and Julia. This has the advantage that declared functions aren't binded to names outside their context. From amber.yust at gmail.com Mon Jan 13 08:11:49 2014 From: amber.yust at gmail.com (Amber Yust) Date: Mon, 13 Jan 2014 07:11:49 +0000 Subject: [Python-ideas] Multi-statement anonymous functions References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> Message-ID: <3296035059350497264@gmail297201516> Can't you already essentially accomplish the same thing by simply nesting function definitions within another function?
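Amber's suggestion can be made concrete. In this minimal sketch (plain Python 3, nothing beyond the language itself), the proposal's `let ... in:` example becomes a function. Its nested `def`s play the role of the local helpers, its body is the `in:` part, and calling it once runs the block. `_let_block`, `function`, and `function2` are illustrative names, and the helper bodies are made up so that the example runs.

```python
def _let_block():
    # "let function(...): ... function2(...): ..." becomes ordinary nested defs;
    # these names exist only inside this function's scope.
    def function(*args, **kwargs):
        return sum(args) + sum(kwargs.values())

    def function2(x):
        return function(x, x, extra=1)

    # The "in:" part: statements that use the local helpers.
    return function2(10)


result = _let_block()
del _let_block  # optional: don't leave even the wrapper's name behind

print(result)                                             # 21
print('function' in globals(), 'function2' in globals())  # False False
```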
From abarnert at yahoo.com Mon Jan 13 09:21:07 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 13 Jan 2014 00:21:07 -0800 (PST) Subject: [Python-ideas] Multi-statement anonymous functions In-Reply-To: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> Message-ID: <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com> From: "musicdenotation at gmail.com" Sent: Sunday, January 12, 2014 10:48 PM

> Subject: [Python-ideas] Multi-statement anonymous functions
>
> Proposed syntaxes:
>> let function(*args,**kwargs):
>>     ...body...
>> function2(...args...):
>>     ...body...
>> in:
>>     [statements]
>
>> do:
>>     [statements]
>> where [function declarations in the same form as above]
>
> Inspired by Haskell and Julia.
>
> This has the advantage that declared functions aren't binded to names
> outside their context.

I think there's something interesting here, but I'm not seeing it. What's the actual use case for this?

If you haven't read PEP 403 and PEP 3150, you should; they both offer similar (but not identical) features, in a way that seems more readable (both more compact, and "fronting" the most important part of the construct):

    @in statement that uses function1
    def function1(*args, **kwargs):
        body

    statement that uses function1 and var1 given:
        def function1(*args, **kwargs):
            body
        var1 = value

Meanwhile, my first question for your syntax is: Why limit it to function definitions? It's worth noting that a Haskell let statement creates local bindings for any values you want; it's not restricted to functions. And that restriction is the only thing that forces the awkward block structure (which would need to be parsed differently than existing Python structures, both by the compiler and by human readers). Why not just a let statement that lets you execute _any_ statements in a local scope, then use that scope:

    let:
        def function1(*args, **kwargs):
            body
        var = value
        any other statement you want
    in:
        statements

... or, for that matter, just a local-scope statement:

    local:
        def function1(*args, **kwargs):
            body
        var = value
        any other statement you want
        statements that use those definitions

This has an advantage over Nick Coghlan's two proposals in that you get to run a full suite with the local scope, instead of just a single statement. (His fronting of the statement makes that restriction necessary; yours doesn't.)

But I'm wondering why you need a local scope.

The let statement is necessary in Haskell because namespaces, like everything else, are immutable, and there are no real assignments; if you want to bind another variable, you have to create a new scope with that binding on top of the existing one. In Python, if you want to bind another variable, you just use an assignment/def/class/etc. And if you're worried about the name being accessible from outside of the namespace (e.g., if someone does a "from foo import *" on you), there are already idiomatic ways to deal with that: prefix the name with _, or give the module an __all__. Or, again: Python namespaces are mutable, so you can just del a binding after you're done with it if you really need to.
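The three existing idioms listed above (a leading underscore, `__all__`, and `del` after use) can be seen together in a throwaway module. This is a sketch with stated assumptions: `demo_mod` and `SOURCE` are hypothetical, and the module is built with `types.ModuleType` and registered in `sys.modules` only so that `from demo_mod import *` can be exercised within a single file.

```python
import sys
import types

SOURCE = '''
__all__ = ['public_api']   # star-import exports only these names

def _helper(x):            # leading underscore: private by convention
    return x * 2

def public_api(x):
    return _helper(x) + 1

scratch = [public_api(i) for i in range(3)]
RESULTS = tuple(scratch)
del scratch                # namespaces are mutable: drop the temporary
'''

# Build a throwaway module so "from demo_mod import *" has something to import.
mod = types.ModuleType('demo_mod')
exec(SOURCE, mod.__dict__)
sys.modules['demo_mod'] = mod

ns = {}
exec('from demo_mod import *', ns)
print(sorted(k for k in ns if k != '__builtins__'))  # ['public_api']
print(mod.RESULTS)                                    # (1, 3, 5)
print(hasattr(mod, 'scratch'))                        # False
```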
Coming at it from a different angle, JavaScript, which has mutable namespaces very much like Python, needs local scopes pretty frequently. But that's only because it has no modules, so everything is in one giant global namespace, which makes it hard to avoid conflicts, figure out where things are defined, etc. So that doesn't seem to apply to Python either.

Also, in most cases where you _do_ need a local scope, just defining and calling a function works just fine. That's what people do in Python when they need a local binding for micro-optimization purposes. And the same idiom is used all over the place in JavaScript (which, again, needs local scopes much more often than Python). Is there a use case where that isn't appropriate?

From musicdenotation at gmail.com Mon Jan 13 11:23:01 2014
From: musicdenotation at gmail.com (musicdenotation at gmail.com)
Date: Mon, 13 Jan 2014 17:23:01 +0700
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com>
References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com>
Message-ID: <54DB1723-7628-4E00-A7D7-2EA0511F15E0@gmail.com>

> On Jan 13, 2014, at 15:21, Andrew Barnert wrote:
>
> [...]

I'm changing my proposal:
> let:
>     [all new variables created here are local to the let...in... scope, can use global and nonlocal]
> in:
>     [all new variables created here belong to the surrounding scope, but variables introduced in the let statement will be usable and reassignable]
or:
> do:
>     [same semantics as in above]
> where:
>     [same semantics as let above]
Or my original proposal but with variable assignment allowed.

Actually, my original proposal was because I didn't want to mess with globals() and locals().
And the where statement is to allow function definitions after their usage.
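For comparison, here is a minimal sketch of the existing idiom Andrew Barnert mentions earlier in the thread (define a function, then call it), applied to the revised `let:`/`in:` semantics. It only approximates the proposal and is not the proposed syntax. The names `_let_block`, `scale`, and `factor` are hypothetical.

```python
def _let_block():
    # Roughly the proposed "let:" part: everything bound here
    # (helper functions, scratch values) stays local to this call.
    def scale(x):
        return x * factor  # closure; factor is bound before scale() runs

    factor = 3
    # Roughly the "in:" part: only what is returned escapes.
    return [scale(n) for n in range(4)]


result = _let_block()  # the only new name left in the surrounding scope...
del _let_block         # ...once the throwaway function itself is deleted

print(result)  # [0, 3, 6, 9]
```

One difference from the proposal: code in an `in:` block could rebind the `let:` names directly. Here, anything that has to reach the outer scope must be returned, or assigned through `global`/`nonlocal`.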
From musicdenotation at gmail.com Mon Jan 13 12:06:00 2014
From: musicdenotation at gmail.com (musicdenotation at gmail.com)
Date: Mon, 13 Jan 2014 18:06:00 +0700
Subject: [Python-ideas] Multi-statement anonymous functions
Message-ID: <52d3c8ad.a7dd420a.650e.410c@mx.google.com>

Mutable namespaces and modules are just workarounds and cannot be substituted for local namespaces.

---Original message---
From: Andrew Barnert
Sent: Mon, 13 Jan 2014 00:21:07 -0800
To: <|musicdenotation at gmail.com|><|python-ideas at python.org|>
Subject: Re: [Python-ideas] Multi-statement anonymous functions

[...]

From abarnert at yahoo.com Mon Jan 13 12:13:15 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 13 Jan 2014 03:13:15 -0800
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <54DB1723-7628-4E00-A7D7-2EA0511F15E0@gmail.com>
References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com> <54DB1723-7628-4E00-A7D7-2EA0511F15E0@gmail.com>
Message-ID: <8F7C780E-D653-4CBD-8690-7D7B97CC18D0@yahoo.com>

So I assume you haven't read PEP 403 and 3150, and don't intend to, even though they directly relate to your idea?

Sent from a random iPhone

On Jan 13, 2014, at 2:23, musicdenotation at gmail.com wrote:

> [...]

From abarnert at yahoo.com Mon Jan 13 12:16:43 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 13 Jan 2014 03:16:43 -0800
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <52d3c8ad.a7dd420a.650e.410c@mx.google.com>
References: <52d3c8ad.a7dd420a.650e.410c@mx.google.com>
Message-ID: <14C845E9-108D-427E-A06D-BBE188F96763@yahoo.com>

On Jan 13, 2014, at 3:06, musicdenotation at gmail.com wrote:

> Mutable namespaces and modules are just workarounds and cannot be substituted for local namespaces.

Sure, in the exact same way that mutable file objects are just workarounds and cannot be substituted for an I/O monad.

If you don't think being able to write "a=3" and modify the current (module/class/local) scope is helpful, I think you may be using the wrong language.

> ---Original message---
> From: Andrew Barnert
> Sent: Mon, 13 Jan 2014 00:21:07 -0800
> To: <|musicdenotation at gmail.com|><|python-ideas at python.org|>
> Subject: Re: [Python-ideas] Multi-statement anonymous functions
>
> [...]
From ncoghlan at gmail.com Mon Jan 13 14:53:08 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 13 Jan 2014 23:53:08 +1000
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <8F7C780E-D653-4CBD-8690-7D7B97CC18D0@yahoo.com>
References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com> <54DB1723-7628-4E00-A7D7-2EA0511F15E0@gmail.com> <8F7C780E-D653-4CBD-8690-7D7B97CC18D0@yahoo.com>
Message-ID:

On 13 January 2014 21:13, Andrew Barnert wrote:
> So I assume you haven't read PEP 403 and 3150, and don't intend to, even
> though they directly relate to your idea?

In particular: http://www.python.org/dev/peps/pep-3150/#rejected-alternatives :)

It isn't in PEP 3150 itself any more (since it was no longer relevant after the PEP switched to explicit forward references), but any multiple-name-binding based proposal needs to account for the torture test I created back when PEP 3150 allowed implicit access to the statement local scope: http://hg.python.org/peps/file/fc2aa3ef6d34/pep-3150.txt#l300

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From musicdenotation at gmail.com Mon Jan 13 15:06:03 2014
From: musicdenotation at gmail.com (musicdenotation at gmail.com)
Date: Mon, 13 Jan 2014 21:06:03 +0700
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <14C845E9-108D-427E-A06D-BBE188F96763@yahoo.com>
References: <52d3c8ad.a7dd420a.650e.410c@mx.google.com> <14C845E9-108D-427E-A06D-BBE188F96763@yahoo.com>
Message-ID: <2B246411-911A-41D2-A8EE-21336ED00789@gmail.com>

> On Jan 13, 2014, at 18:16, Andrew Barnert wrote:
>
>> On Jan 13, 2014, at 3:06, musicdenotation at gmail.com wrote:
>>
>> Mutable namespaces and modules are just workarounds and cannot be substituted for local namespaces.
>
> Sure, in the exact same way that mutable file objects are just workarounds and cannot be substituted for an I/O monad.
>
> [...]

No, an I/O monad is a workaround for free side effects. What I want is a canonical, obvious, natural solution to a problem, not a workaround.
From masklinn at masklinn.net Mon Jan 13 16:11:19 2014
From: masklinn at masklinn.net (Masklinn)
Date: Mon, 13 Jan 2014 16:11:19 +0100
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <2B246411-911A-41D2-A8EE-21336ED00789@gmail.com>
References: <52d3c8ad.a7dd420a.650e.410c@mx.google.com> <14C845E9-108D-427E-A06D-BBE188F96763@yahoo.com> <2B246411-911A-41D2-A8EE-21336ED00789@gmail.com>
Message-ID: <51758FFE-1475-480B-91D9-027CD939EB46@masklinn.net>

On 2014-01-13, at 15:06 , musicdenotation at gmail.com wrote:

> No, an I/O monad is a workaround for free side effects. What I want is a canonical, obvious, natural solution to a problem, not a workaround.

Monads are not workarounds for anything (any more than option types are a workaround for a lack of null); they're a type-safe encoding of a sequential computation, the IO monad being the application of the concept to the IO subset of side-effecting computations. Monads are not restricted to side-effecting computations (let alone IO ones): in Haskell, option types and lists are also monadic types.

From musicdenotation at gmail.com Mon Jan 13 16:50:10 2014
From: musicdenotation at gmail.com (musicdenotation at gmail.com)
Date: Mon, 13 Jan 2014 22:50:10 +0700
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <8F7C780E-D653-4CBD-8690-7D7B97CC18D0@yahoo.com>
References: <9804F4ED-4A18-4DBF-A91D-8839A890AC16@gmail.com> <1389601267.25183.YahooMailNeo@web181005.mail.ne1.yahoo.com> <54DB1723-7628-4E00-A7D7-2EA0511F15E0@gmail.com> <8F7C780E-D653-4CBD-8690-7D7B97CC18D0@yahoo.com>
Message-ID: <9039CDB2-9674-421B-B5B1-8E7790639F38@gmail.com>

>> On Jan 13, 2014, at 18:13, Andrew Barnert wrote:
>
> So I assume you haven't read PEP 403 and 3150, and don't intend to, even though they directly relate to your idea?
>
> Sent from a random iPhone
>
> [...]

I have read them. I intentionally introduced two ways to do it because the let variation allows you to avoid polluting the namespace while still programming the "traditional" but more obvious Python way.

Trivia: The following lines of code contain a cultural reference. Do you know what it is? It is very, very recent.

print(not_a_foot, file=be_seen)
del it

From steve at pearwood.info Tue Jan 14 03:33:50 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 14 Jan 2014 13:33:50 +1100
Subject: [Python-ideas] Multi-statement anonymous functions
In-Reply-To: <2B246411-911A-41D2-A8EE-21336ED00789@gmail.com>
References: <52d3c8ad.a7dd420a.650e.410c@mx.google.com> <14C845E9-108D-427E-A06D-BBE188F96763@yahoo.com> <2B246411-911A-41D2-A8EE-21336ED00789@gmail.com>
Message-ID: <20140114023350.GD3403@ando>

On Mon, Jan 13, 2014 at 09:06:03PM +0700, musicdenotation at gmail.com wrote:

> What I want is a canonical, obvious, natural solution to a problem,
> not a workaround.

Please explain what the problem is, what you consider "canonical", "obvious", and "natural", and how we should distinguish a "solution" from a "workaround". Dropping arbitrary syntax into our laps with no explanation of what it means and what is metasyntax does not help.
For example, you proposed: let function(*args,**kwargs): ...body... function2(...args...): ...body... in: [statements] do: [statements] where [function declarations in the same form as above] I have no idea what that is supposed to mean. E.g. is "function" a keyword (part of the syntax) or the name of something (a new function perhaps?)? Are the dots and [] syntax or metasyntax? Don't assume we are familiar with Haskell and Julia, or that we can tell which bits are metasyntax and which are intended as new syntax. A good way to proceed is to give examples of the syntax, show the expected output, and preferably include a plain English description of what it does that is new or different from existing syntax. Thank you. -- Steven From musicdenotation at gmail.com Thu Jan 16 05:55:00 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Thu, 16 Jan 2014 11:55:00 +0700 Subject: [Python-ideas] J Message-ID: <52d76656.488b440a.5eaf.ffffd736@mx.google.com> From denis.spir at gmail.com Thu Jan 16 11:20:08 2014 From: denis.spir at gmail.com (spir) Date: Thu, 16 Jan 2014 11:20:08 +0100 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: <52D7B258.6010908@gmail.com> On 01/11/2014 03:36 PM, Chris Angelico wrote: > On Sun, Jan 12, 2014 at 1:18 AM, Ram Rachum wrote: >> I think that `OrderedDict.items().__getitem__` should be implemented, to >> solve this ugliness: >> >> http://stackoverflow.com/questions/21062781/shortest-way-to-get-first-item-of-ordereddict-in-python-3 >> >> What do you think? > > Well, the first problem with that is that __getitem__ already exists, > and it's dict-style :) So you can't fetch out an item by its position > that way. But suppose you create a method that returns the Nth > element.
> > The implementation in CPython 3.4 is a linked list, so getting an > arbitrary element by index would be quite inefficient. [...] I have occasionally implemented ordered sets or associative arrays in a really simple, stupid manner [1]: just store the items or entries (pairs) in an array (instead of, say, anywhere in memory at the allocator's convenience), in addition to the usual array of "buckets". About efficiency, there is a kind of balance of benefits & costs: On average, common operations should in principle be faster due to memory compactness and the consequent better cache usage. No cost in memory size (entries must be somewhere anyway). There is a small cost when the whole data structure grows, since now 2 arrays have to be resized up; to mitigate this, I provide a 'predim' (predimension) method that avoids most growing operations. The point is that now entries form an array that keeps insertion order, can be traversed and even indexed. An issue, however, as with any flexible-size array, is item deletion. I don't delete at once, which would require compacting the array every time an entry is removed; instead I just mark entries as deleted. Whenever a big proportion of items are removed [2], there is automatic compaction. But with such a trick, indexing is then invalid (and prevented); to regain the indexing feature, if items have been deleted, clients must first run a 'compact' method, which actually removes all deleted items at once (and, if enough were removed to make it worthwhile, resizes down). For traversal, however, there is no issue: the implementation just needs to skip items marked as deleted. I have no idea whether such a stupid way to make ordered sets/dicts is compatible with the present requirements or implementation of Python's OrderedDict (but suspect it is not). And I guess the constraint on indexing does not really fit the Python way, in that an implementation constraint leaks into the client interface.
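A minimal Python sketch of the array-of-entries scheme Denis describes, under stated assumptions. A plain Python `dict` stands in for the bucket array, mapping each key to its slot in the entry array. Deleted slots hold a tombstone, and automatic compaction uses the 3/8 ratio from his footnote [2], applied here to the entry array's length rather than to a separately allocated capacity. `ArrayOrderedDict`, `compact`, and `item_at` are illustrative names, not an existing API.

```python
class ArrayOrderedDict:
    """Entries kept in insertion order in one list, plus a key -> slot index."""

    _DELETED = object()  # tombstone marking a removed entry

    def __init__(self):
        self._entries = []  # (key, value) pairs or tombstones, in insertion order
        self._index = {}    # key -> position in self._entries (stand-in for buckets)
        self._live = 0      # number of non-deleted entries

    def __setitem__(self, key, value):
        pos = self._index.get(key)
        if pos is None:
            self._index[key] = len(self._entries)
            self._entries.append((key, value))
            self._live += 1
        else:
            self._entries[pos] = (key, value)  # overwriting keeps the original order

    def __getitem__(self, key):
        return self._entries[self._index[key]][1]

    def __contains__(self, key):
        return key in self._index

    def __delitem__(self, key):
        pos = self._index.pop(key)
        self._entries[pos] = self._DELETED  # mark instead of shifting the array
        self._live -= 1
        if self._live < len(self._entries) * 3 / 8:
            self.compact()  # automatic compaction once most slots are dead

    def __len__(self):
        return self._live

    def __iter__(self):
        # Traversal simply skips tombstones.
        for entry in self._entries:
            if entry is not self._DELETED:
                yield entry[0]

    def compact(self):
        """Drop all tombstones at once and rebuild the key -> slot index."""
        self._entries = [e for e in self._entries if e is not self._DELETED]
        self._index = {k: i for i, (k, _) in enumerate(self._entries)}

    def item_at(self, n):
        """Positional access, refused while tombstones would skew positions."""
        if len(self._entries) != self._live:
            raise RuntimeError("call compact() before indexing")
        return self._entries[n]
```

As in the description, deleting an entry leaves iteration valid but blocks positional access until `compact()` runs, which is exactly where the implementation constraint leaks into the interface.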
I just wanted, however, to say a few words about that scheme due to its simplicity and practicality. Comments welcome. Denis [1] The common need is usually for what I call "mod tables", used as symbol tables: a mod table is like a hash table, but with keys being unsigned ints, so there is no hash, just a plain modulo. This makes a sort of sparse array, but ordered. Numeric keys actually represent interned strings which themselves are keys in symbol tables (scopes, namespaces...). [2] To avoid a kind of threshold effect, compaction happens when the count of items is less than 3/8 of capacity, not half of it. There must be a hysteresis (a difference between the thresholds to resize up versus down) to avoid instability in the hypothetical case where the count of items is close to a resizing capacity and items are constantly being put & removed. From oscar.j.benjamin at gmail.com Thu Jan 16 11:53:42 2014 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 16 Jan 2014 10:53:42 +0000 Subject: [Python-ideas] `OrderedDict.items().__getitem__` In-Reply-To: References: Message-ID: <20140116105341.GB11119@gmail.com> On Sat, Jan 11, 2014 at 04:36:49PM +0100, Peter Otten wrote: > Ram Rachum wrote: > > > I think that `OrderedDict.items().__getitem__` should be implemented, to > > solve this ugliness: > > > > http://stackoverflow.com/questions/21062781/shortest-way-to-get-first- > item-of-ordereddict-in-python-3 > > > > What do you think? > > I think an O(N) __getitem__() is even uglier. Also, you should have really > compelling reasons for allowing the interfaces of dict.items() and > OrderedDict.items() to diverge. Agreed, but I do think that OrderedDict could be more helpful here. I haven't wanted to get the first item before but I have wanted to get the last without popping it off.
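Reading either end of an `OrderedDict` without popping is already possible through its public API: `iter()` starts at the oldest key and `reversed()` at the newest, so a single step of either is cheap. A minimal sketch; `first_item` and `last_item` are hypothetical helper names, not `OrderedDict` methods.

```python
from collections import OrderedDict

def first_item(od):
    """Return the first (key, value) pair of an OrderedDict without removing it."""
    if not od:
        raise KeyError("dictionary is empty")
    key = next(iter(od))       # iteration starts at the oldest key
    return key, od[key]

def last_item(od):
    """Return the last (key, value) pair of an OrderedDict without removing it."""
    if not od:
        raise KeyError("dictionary is empty")
    key = next(reversed(od))   # reversed() starts at the newest key
    return key, od[key]

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
```

Neither helper touches the private `__root`/`__map` attributes, so the sketch also works with the C implementation of `OrderedDict` used by later CPython versions.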
Since this can be provided in O(1) I think it would make a reasonable addition as a property of OrderedDict:

    @property
    def last(self):
        if not self:
            raise KeyError('dictionary is empty')
        # the root link's .prev is the newest entry; .key is its key
        return self.__root.prev.key

Just returning the key is sufficient, but in my own use cases I would have wanted the whole item, which you could easily do:

    @property
    def lastitem(self):
        if not self:
            raise KeyError('dictionary is empty')
        key = self.__root.prev.key
        # self.__map holds the internal links, not the values
        return key, self[key]

Oscar From ram.rachum at gmail.com Fri Jan 17 14:00:50 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Fri, 17 Jan 2014 05:00:50 -0800 (PST) Subject: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` Message-ID: <9aca6c85-f924-4adf-b205-a2acbf006bb1@googlegroups.com> Hi, I'd like to use `concurrent.futures.ProcessPoolExecutor` but have each process contain multiple worker threads. We could have an `n_threads` argument to the constructor, defaulting to 1 to maintain backward compatibility, and setting a value higher than 1 would cause multiple threads to be spawned in each process. What do you think? Ram. From elazarg at gmail.com Fri Jan 17 14:45:14 2014 From: elazarg at gmail.com (אלעזר) Date: Fri, 17 Jan 2014 15:45:14 +0200 Subject: [Python-ideas] Make max() stable Message-ID: Hi all, Given several objects with the same key, max() returns the first one: >>> key = lambda x: 0 >>> max(1, 2, key=key) 1 This means it is not stable, at least according to the definition in "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): "Informally, an algorithm is stable if it respects the original order of equivalent objects.
So if we think of minimum and maximum as selecting, respectively, the smallest and the second smallest from a list of two arguments, stability requires that when called with equivalent elements, minimum should return the first and maximum should return the second." A page later, in a side note, the authors mention that "STL incorrectly requires that `max(a, b)` return `a` when `a` and `b` are equivalent." (As a reminder, Stepanov is the chief designer of the STL.) So, I know this is not a big issue, to say the least, but is there any reason *not* to return the last argument? Are we trying to be compatible with STL somehow? I admit I don't know of any real use case where this really matters, but I can point out that >>> ab = (1, 2) >>> (min(ab, key=key), max(ab, key=key)) (1, 1) is visibly less sensible than (1, 2). Hope I'm not being a crank here... Elazar From steve at pearwood.info Fri Jan 17 17:06:04 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 18 Jan 2014 03:06:04 +1100 Subject: [Python-ideas] Make max() stable In-Reply-To: References: Message-ID: <20140117160604.GJ3915@ando> On Fri, Jan 17, 2014 at 03:45:14PM +0200, אלעזר wrote: > Hi all, > > Given several objects with the same key, max() returns the first one: > > >>> key = lambda x: 0 > >>> max(1, 2, key=key) > 1 A more natural example showing that both min and max return the *first* of equal elements is: py> values = (1, 2, 1.0, 2.0) py> min(values) 1 py> max(values) 2 > This means it is not stable, at least according to the definition in > "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): > > "Informally, an algorithm is stable if it respects the original order of > equivalent objects. Stability is normally only of concern with sorting algorithms. I'm not sure whether Stepanov and McJones' definition is widely accepted, or important.
There are clear reasons for desiring sorting to be stable. Are there any clear reasons to desire max to be stable in this sense? (There is another, unrelated, meaning of stability with regard to numeric algorithms, but it has nothing to do with the order of objects.) Note that stability in this sense only is meaningful when your values are objects that you care about their identity as well as value. > So if we think of minimum and maximum as selecting, > respectively, the smallest and the second smallest from a list of two > arguments, stability requires that when called with equivalent elements, > minimum should return the first and maximum should return the second." > > A page later, In a side note, the authors mention that "STL incorrectly > requires that `max(a, b)` return `a` when `a` and `b` are equivalent." (As > a reminder, Stepanov is the chief designer of the STL). > > So, I know this is not a big issue, to say the least, but is there any > reason *not* to return the last argument? Are we trying to be compatible > with STL somehow? No, I expect that the result is an accident of implementation. Specifically, the min and max algorithms probably look something like this: min = first item for each item: if item < min: min = item max = first item for each item: if item > max: max = item > I admit I don't know of any real use case where this really matters, but I > can point out that > > >>> ab = (1, 2) > >>> (min(ab, key=key), max(ab, key=key)) > (1, 1) > > is visibly less sensible than (1, 2). Using your weird key function here is going to give bizarre results. Consider: ab = (2, 1) min(ab, key=key), max(ab, key=key) Current behaviour is to return (2, 2). I don't think that returning 2 for the minimum and 1 for the maximum is more sensible, but that's because the key function is not sensible, not because of any objection to making max stable in this sense. 
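Steven's pseudocode, written out as runnable Python. With a strict `>` an equal later item never replaces the current best, so the first of several equal maxima wins, matching the documented behaviour of the built-in `max()`. Changing the comparison to `>=` gives the "last equal item wins" behaviour that Stepanov and McJones call stable. `first_max` and `last_max` are illustrative names; unlike the built-in, they raise `StopIteration` rather than `ValueError` on empty input.

```python
def first_max(items, key=lambda x: x):
    """Strict '>': the earliest of several equal maxima is kept."""
    it = iter(items)
    best = next(it)              # raises StopIteration if items is empty
    for item in it:
        if key(item) > key(best):
            best = item
    return best

def last_max(items, key=lambda x: x):
    """'>=': each later equal maximum replaces the current one."""
    it = iter(items)
    best = next(it)
    for item in it:
        if key(item) >= key(best):
            best = item
    return best

values = (1, 2, 1.0, 2.0)
```

With Steven's `values`, `first_max` returns the int `2` (as `max()` does), while `last_max` returns the float `2.0`. With Elazar's constant key, the two functions return `1` and `2` respectively.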
-- Steven From rosuav at gmail.com Fri Jan 17 17:15:50 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 18 Jan 2014 03:15:50 +1100 Subject: [Python-ideas] Make max() stable In-Reply-To: <20140117160604.GJ3915@ando> References: <20140117160604.GJ3915@ando> Message-ID: On Sat, Jan 18, 2014 at 3:06 AM, Steven D'Aprano wrote: > Using your weird key function here is going to give bizarre results. > Consider: > > ab = (2, 1) > min(ab, key=key), max(ab, key=key) > > Current behaviour is to return (2, 2). I don't think that returning 2 > for the minimum and 1 for the maximum is more sensible, but that's > because the key function is not sensible, not because of any objection > to making max stable in this sense. Imagine implementing min and max this way (ignoring key= and the possibility of a single iterable arg): def min(*args): return sorted(args)[0] def max(*args): return sorted(args)[-1] By that definition, a stable sort means that: lst = sorted((x,y)) assert lst == [min(lst), max(lst)] will pass for any x and y. That said, I don't see any particular use cases for this identity. Maybe the OP can enlighten? ChrisA From tjreedy at udel.edu Sat Jan 18 00:34:24 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Jan 2014 18:34:24 -0500 Subject: [Python-ideas] Make max() stable In-Reply-To: References: Message-ID: On 1/17/2014 8:45 AM, אלעזר wrote: > Hi all, > > Given several objects with the same key, max() returns the first one: > > >>> key = lambda x: 0 > >>> max(1, 2, key=key) > 1 As documented: "If multiple items are maximal, the function returns the first one encountered." > This means it is not stable, at least according to the definition in > "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): > > "Informally, an algorithm is stable if it respects the original order of > equivalent objects. Min and max are inherently functions of multisets, with order irrelevant but duplicate values allowed.
So I do not think 'stability' applies, even if a multiset is presented in some arbitrary order. > So if we think of minimum and maximum as selecting, > respectively, the smallest and the second smallest from a list of two > arguments, Why not say largest and second largest, but why either rather than just smallest and largest? > stability requires that when called with equivalent elements, > minimum should return the first and maximum should return the second." The Python dev who wrote the doc disagrees: "This is consistent with other sort-stability preserving tools such as sorted(iterable, key=keyfunc, reverse=True)[0] and heapq.nlargest(1, iterable, key=keyfunc)." A simpler reason for the current behavior is that it is more efficient to not rebind the internal variable when an equal object is encountered. In any case, changing the definition and implementation of max will break any code that depends on returning first versus last. We have to have a good reason to do so. If there is no such code (other than the test code that checks the 'first encountered' behavior), then there is no need to change. -- Terry Jan Reedy From abarnert at yahoo.com Sat Jan 18 01:09:55 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 17 Jan 2014 16:09:55 -0800 (PST) Subject: [Python-ideas] Make max() stable In-Reply-To: References: Message-ID: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> On Jan 17, 2014, at 15:34, Terry Reedy wrote: > On 1/17/2014 8:45 AM, אלעזר wrote: >> Hi all, >> >> Given several objects with the same key, max() returns the first one: >> >> >>> key = lambda x: 0 >> >>> max(1, 2, key=key) >> 1 > > As documented: "If multiple items are maximal, the function returns the first one encountered." > >> This means it is not stable, at least according to the definition in >> "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): >> >> "Informally, an algorithm is stable if it respects the original order of >> equivalent objects.
> > Min and max are inherently functions of multisets, with order irrelevant but duplicate values allowed. I'm not sure that's necessarily true. The maximal value of a sequence makes every bit as much sense as the maximal value of a set or multiset, and I think it comes up quite often. For example, if I have a series of experiments at different times, the (time, value) pairs have an obvious meaningful order, and asking for max(experiments, key=itemgetter(1)) is a meaningful thing to do. But often, even for a sequence, you don't care which max you get. And often, when you do care, you explicitly want the first. The high score on a video game belongs to the first person who reached that score, not to someone who later tied him. Sure, _sometimes_ you want the last rather than the first. I've actually written variants of max, nlargest, groupby, etc. that track the last value instead of the first (e.g., to make groupby treat adjacent runs as a group) multiple times. But I wouldn't expect that to be the default behavior of any of those functions. So, I think the existing design of all these functions is less surprising than the alternative, and adequately documented, and it's easy enough to write the alternative when you need it. > In any case, changing the definition and implementation of max will break any code that depends on returning first versus last. This is about as perfect a reason as possible. If it ever matters, we can't change it; if it never matters, we have no reason to change it... 
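A small sketch of the time-ordered experiments example, using hypothetical (time, value) records. The built-in keeps the earliest record that reaches the top value. Two easy ways to get the latest one instead are scanning the sequence in reverse, or decorating each value with its index so that ties are broken by position.

```python
from operator import itemgetter

# Hypothetical (time, value) measurements, already in time order.
experiments = [(1, 3.5), (2, 7.0), (3, 7.0), (4, 1.2)]

# The built-in keeps the first maximal item: the earliest time that hit 7.0.
earliest = max(experiments, key=itemgetter(1))          # (2, 7.0)

# Reversing the sequence turns "first encountered" into "last in time".
latest = max(reversed(experiments), key=itemgetter(1))  # (3, 7.0)

# Decorating with the index: equal values fall through to comparing the index,
# so the highest index (the latest record) wins.
_, idx = max((value, i) for i, (_time, value) in enumerate(experiments))
latest_by_index = experiments[idx]                      # (3, 7.0)
```

`reversed()` needs a sequence; for a one-shot iterator, the index-decorating form (or a loop that compares with `>=`) still works.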
From tjreedy at udel.edu Sat Jan 18 01:48:55 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 17 Jan 2014 19:48:55 -0500 Subject: [Python-ideas] Make max() stable In-Reply-To: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> References: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> Message-ID: On 1/17/2014 7:09 PM, Andrew Barnert wrote: > On Jan 17, 2014, at 15:34, Terry Reedy wrote: >> Min and max are inherently functions of multisets, with order >> irrelevant but duplicate values allowed. I should have said a multiset of comparable objects. > I'm not sure that's necessarily true. The maximal value of a sequence > makes every bit as much sense as the maximal value of a set or > multiset A list of comparable objects *is* a multiset of comparable objects. So is any iterable of comparable objects. Which is why 'iterable of comparable objects' is the proper domain for max. Similar comments apply to any commutative associative operator. > For example, if I have > a series of experiments at different times, the (time, value) pairs > have an obvious meaningful order, and asking for max(experiments, > key=itemgetter(1)) is a meaningful thing to do. max((value,time) for time,value in experiments) gives the latest high value. In general max((val,i) for i,val in enumerate(iterable)) does the same. If max gave the last maximum, it would be trickier to get the first maximum, just as it is now to get the last minimum. -- Terry Jan Reedy From ethan at stoneleaf.us Sat Jan 18 01:26:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 16:26:48 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> References: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> Message-ID: <52D9CA48.8000501@stoneleaf.us> On 01/17/2014 04:09 PM, Andrew Barnert wrote: > > This is about as perfect a reason as possible. If it ever matters, we can't change it; if it never matters, we have no reason to change it...
+1 QOTW -- ~Ethan~ From ned at nedbatchelder.com Sat Jan 18 02:35:57 2014 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 17 Jan 2014 20:35:57 -0500 Subject: [Python-ideas] Make max() stable In-Reply-To: References: Message-ID: <52D9DA7D.5090307@nedbatchelder.com> On 1/17/14 8:45 AM, אלעזר wrote: > Hi all, > > Given several objects with the same key, max() returns the first one: > > >>> key = lambda x: 0 > >>> max(1, 2, key=key) > 1 > > This means it is not stable, at least according to the definition in > "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): > > "Informally, an algorithm is stable if it respects the original order > of equivalent objects. So if we think of minimum and maximum as > selecting, respectively, the smallest and the second smallest from a > list of two arguments, stability requires that when called with > equivalent elements, minimum should return the first and maximum > should return the second." I don't understand this logic at all. Stability matters in sorting because sort() takes a sequence and returns a sequence, and for various reasons you might need to sort a list twice, with different criteria. Stability guarantees that the second sort won't discard the work of the first sort. Is there an example of an actual problem that stability of min and max would make easier to solve? --Ned. From nas-python at arctrix.com Sat Jan 18 04:22:19 2014 From: nas-python at arctrix.com (Neil Schemenauer) Date: Fri, 17 Jan 2014 21:22:19 -0600 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x Message-ID: <20140118032219.GA11381@python.ca> The transition to Python 3 is happening but there is still a massive amount of code that needs to be ported. One of the most disruptive changes in Python 3 is the strict separation of bytes from unicode strings. Most of the other incompatible changes can be handled by 2to3. Here is a far out idea to make transition smoother.
Release version 2.8 of Python with nearly all Python 3.x incompatible changes except for the bytes/unicode changes. This could include:
- print as function
- default string literal as unicode
- return view objects for dict.keys(), etc.
- rename modules in standard library
- rename long to int
- rename .next() to __next__()
- accept only new 'raise' syntax
- remove backticks for repr
- rename unicode to str
- removal of 'apply', 'buffer', 'callable', 'execfile'
- exec as function
- rename os.getcwdu() to os.getcwd()
- remove dict.has_key
- move intern to sys.intern()
- rename xrange to range
- remove xreadlines
New features of Python 3.x could be backported if easy since they could be useful to entice developers to move from 2.7 to 2.8. Problems with this idea:
- it would be a huge amount of work. There are thousands of commits to Python 3.x since it was branched. Most of them are not related to the above features but backporting them would still be a huge effort. I tried backporting 'print' as a function just to get an idea of the work.
- if people install this new version of Python as the default, old scripts and programs will break. I believe this breakage was the motivation for making Python 3 an all-at-once jump. I'm not sure how to handle this; maybe this version could be used only by developers during their Python 3 porting efforts. Alternatively, only install it as 'python2.8', never 'python' or 'python2'.
An alternative approach to producing Python 2.8 would be to start with the Python 3.x latest branch. Modify bytesobject and unicodeobject to have as close to Python 2 behavior as practical.
A-journey-of-a-thousand-miles-begins-ly y'rs Neil From ethan at stoneleaf.us Sat Jan 18 04:59:23 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 17 Jan 2014 19:59:23 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: <52D9FC1B.5060802@stoneleaf.us> This thread just happened not three weeks ago. Python 2.8 ain't gonna happen. -- ~Ethan~ From rosuav at gmail.com Sat Jan 18 05:49:39 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 18 Jan 2014 15:49:39 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: On Sat, Jan 18, 2014 at 2:22 PM, Neil Schemenauer wrote: > - it would be a huge amount of work. There are thousands of > commits to Python 3.x since it was branched. Most of them are not > related to the above features but back porting them would still be > a huge effort. I tried backport 'print' as a function just to get > an idea of the work. > Guido's time machine strikes again. Put this at the top of your script and run it under 2.7 or 2.6: from __future__ import print_function ChrisA From tjreedy at udel.edu Sat Jan 18 06:24:33 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 18 Jan 2014 00:24:33 -0500 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: On 1/17/2014 10:22 PM, Neil Schemenauer wrote: > The transition to Python 3 is happening but there is still a massive > amount of code that needs to be ported. For application code, why does it need to be ported. > One of the most disruptive > changes in Python 3 is the strict separation of bytes from unicode > strings. Most of the other incompatible changes can be handled by > 2to3. 
For many application areas, the text problem seems to have been somewhat solved, to the point where people are writing 2&3 code successfully. > Here is a far out idea to make transition smoother. Release version > 2.8 of Python with nearly all Python 3.x incompatible changes except > for the bytes/unicode changes. Various people have suggested versions of this idea. At one time, I could imagine it, even after PEP404. But a 2.8 project should have started soon after 2.7 was released with 2.8 released soon after 3.3 or certainly now with 3.4. I think it too late now. > This could include: I believe you left out the int division change. > Problems with this idea: People who cannot move to 3.x because of libraries could not move to 2.8 for the same reason. Over half of the most commonly downloaded libraries already have 3.x versions. Major linux distributions are already in the process of switching to 3.x as default Python. > - it would be a huge amount of work. Yes, and the current volunteer pydev group will not do it. So this is literally the wrong forum. Martijn Faassen posted the following on python-list on the 6th. ''' I've started an informal channel "#python2.8" on freenode. It's to discuss the potential for a Python 2.8 version -- to see whether there is interest in it, what it could contain, how it could facilitate porting to Python 3, who would work on it, etc. If you are interested in constructive discussion about a Python 2.8, please join. I realize that if there is actual code created, and if it's not under the umbrella of the PSF, it couldn't be called "Python 2.8" due to trademark reasons. But that's premature - let's have some discussions first to see whether anything can happen. ''' > There are thousands of > commits to Python 3.x since it was branched. Most of them are not > related to the above features but back porting them would still be > a huge effort. I tried backport 'print' as a function just to get > an idea of the work. You are unusual. 
Many 2.8 advocates want it handed to them for free. -- Terry Jan Reedy From stephen at xemacs.org Sat Jan 18 06:26:19 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 18 Jan 2014 14:26:19 +0900 Subject: [Python-ideas] Make max() stable In-Reply-To: <52D9DA7D.5090307@nedbatchelder.com> References: <52D9DA7D.5090307@nedbatchelder.com> Message-ID: <87mwitc138.fsf@uwakimon.sk.tsukuba.ac.jp> Ned Batchelder writes: > On 1/17/14 8:45 AM, ????? wrote: > > "Informally, an algorithm is stable if it respects the original order > > of equivalent objects. So if we think of minimum and maximum as > > selecting, respectively, the smallest and the second smallest from a > > list of two arguments, stability requires that when called with > > equivalent elements, minimum should return the first and maximum > > should return the second." > I don't understand this logic at all. Stability matters in sorting > because sort() takes a sequence and returns a sequence, and for various > reasons you might need to sort a list twice, with different criteria. > Stability guarantees that the second sort won't discard the work of the > first sort. Two comments. First, I don't understand at all why earlier members of a sequence may be presumed to be smaller. It could easily go the other way around. Second, since these operations are *selections* from a collection (which might impose order or not, which might impose uniqueness or not), it's the same problem that Steven d'Aprano faced in defining mode for the statistics PEP: Do you admit failure (here, noncomparability of some of the maximal items), so that a value that is none of the items must be returned? In the case of multiple equivalent values, do you return a representative or the collection? From tim.peters at gmail.com Sat Jan 18 06:37:28 2014 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 17 Jan 2014 23:37:28 -0600 Subject: [Python-ideas] Make max() stable In-Reply-To: References: Message-ID: [????? 
] > Given several objects with the same key, max() returns the first one: > > >>> key = lambda x: 0 > >>> max(1, 2, key=key) > 1 > > This means it is not stable, at least according to the definition in > "Elements of Programming" by Alexander Stepanov and Paul McJones (pg. 52): > > "Informally, an algorithm is stable if it respects the original order of > equivalent objects. So if we think of minimum and maximum as selecting, > respectively, the smallest and the second smallest from a list of two > arguments, stability requires that when called with equivalent elements, > minimum should return the first and maximum should return the second." A sound argument, provided one accepts the "if". But nobody in the known history of the world *does* think of min and max that way outside this silly quote ;-) > A page later, In a side note, the authors mention that "STL incorrectly > requires that `max(a, b)` return `a` when `a` and `b` are equivalent." (As a > reminder, Stepanov is the chief designer of the STL). > > So, I know this is not a big issue, to say the least, but is there any > reason *not* to return the last argument? Are we trying to be compatible > with STL somehow? No. We're just doing what everyone *really* expects min and max to do - including whoever implemented the STL's max(). > ... > Hope I'm not being a crank here... 
One removed from being a crank is not itself being a crank :-) From abarnert at yahoo.com Sat Jan 18 06:42:13 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 17 Jan 2014 21:42:13 -0800 (PST) Subject: [Python-ideas] Make max() stable In-Reply-To: References: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> Message-ID: <1390023733.40011.YahooMailNeo@web181004.mail.ne1.yahoo.com> From: Terry Reedy Sent: Friday, January 17, 2014 4:48 PM > On 1/17/2014 7:09 PM, Andrew Barnert wrote: >> On Jan 17, 2014, at 15:34, Terry Reedy wrote: > >>> Min and max are inherently functions of multisets, with order >>> irrelevant but duplicate values allowed. > > I should have said a multiset of comparable objects. > >> I'm not sure that's necessarily true. The maximal value of a > sequence >> makes every bit as much sense as the maximal value of a set or >> multiset > > A list of comparable objects *is* a multiset of comparable objects. No, a list is a multiset _with order_. Which is the whole point. You claimed that because it's a multiset, the order doesn't matter. But because the domain of max is a sequence (or, better, as you correctly point out, an iterable), not a multiset, the order does matter. Otherwise this entire question wouldn't arise in the first place. >> asking for max(experiments, key=itemgetter(1)) is a meaningful thing to do. > > max((value,time) for time,value in experiments) > > gives the latest high value. In general > > max((val,i) for i,val in enumerate(iterable)) > > does the same. Sure, given my list of experiments in time order, these give the same result as my expression (except with the members of the tuple reversed, which we can ignore). And? No matter how you write this, you're not just picking the highest value, you're picking the highest value _with the earliest time_. Which is a meaningful thing to do. > If max gave the last maximum, it would be trickier to get the first maximum, > just as it is now to get the last minimum. Of course.
If you read my whole message, that was exactly my point: both are useful. Well, that, and the fact that the current behavior is (a) useful more often than the opposite, and (b) compatible with reams of existing code. And therefore, it would be a bad idea to gratuitously change max to return the last instead of the first. Which I think you agree with completely, so I'm not sure why you're trying to disprove it. From stephen at xemacs.org Sat Jan 18 08:24:43 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 18 Jan 2014 16:24:43 +0900 Subject: [Python-ideas] Make max() stable In-Reply-To: <1390023733.40011.YahooMailNeo@web181004.mail.ne1.yahoo.com> References: <4CE1110A-2C3A-4955-8A59-787FC3D8F14E@yahoo.com> <1390023733.40011.YahooMailNeo@web181004.mail.ne1.yahoo.com> Message-ID: <87iothbvlw.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > No, a list is a multiset _with order_. Which is the whole > point. You claimed that because it's a multiset, the order doesn't > matter. But because the domain of max is a sequence (or, better, as > you correctly point out, an iterable), not a multiset, the order > does matter. Otherwise this entire question wouldn't arise in the > first place. Iterables need not have order in a sense that allows definition of "stability". That's why things like OrderedDict are necessary. From jeanpierreda at gmail.com Sat Jan 18 08:40:49 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 17 Jan 2014 23:40:49 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <52D9DA7D.5090307@nedbatchelder.com> Message-ID: On Fri, Jan 17, 2014 at 5:35 PM, Ned Batchelder wrote: > Is there an example of an actual problem that stability of min and max would > make easier to solve?
In a language like C++, if min and max had the property specified by the OP, you might do: x = min(a, b); y = max(a, b); And then x is the smallest, and y is the other one, and it's simple and easy and less code than an if statement. I suspect this is where the desire comes from. In Python, of course, you do x, y = sorted([a, b]) -- Devin From abarnert at yahoo.com Sat Jan 18 08:56:21 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 17 Jan 2014 23:56:21 -0800 (PST) Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> From: Neil Schemenauer Sent: Friday, January 17, 2014 7:22 PM > Here is a far out idea to make transition smoother. Release version > 2.8 of Python with nearly all Python 3.x incompatible changes except > for the bytes/unicode changes. What exactly do you mean by "the bytes/unicode changes"? There's a wide range of differences between 2.7 and 3.4 that could fall into this category. At least two of them you've specifically included in your proposed 2.8, including one of the three huge ones. Here are the ones I can think of off the top of my head, in rough order of most to least code-breaking: * No automatic conversions from bytes to unicode. * No automatic conversions from unicode to bytes. * Rename unicode to str (included in your suggestion). * File objects can be either unicode-based (text) or bytes-based (binary), defaulting to unicode. * The stdin/out/err files, StringIO, and various other common file objects are text. * __str__ (and __repr__) must return unicode, not bytes, and it's what print, "%s", default "{}", etc. call. * __bytes__ (the 3.x equivalent of 2.x's __str__) exists, but is not called by anything but bytes(), and is not supplied by most builtin/stdlib types (which is why, e.g., bytes(2) returns b'\0\0', not b'2').
* Dozens of builtins and stdlib functions that used to work on bytes (or, in some cases, on either bytes or unicode) now work on unicode (e.g., csv.reader, json.loads). * Default string literal as unicode (included in your suggestion; already available with a future statement). * No bytes.encode or unicode.decode. (In 2.x, when used with codecs like 'ascii' or 'utf-8' these were almost always errors, but errors that a lot of badly-written code relies on to "work", as long as you never give it a non-ASCII character.) * No bytes.__mod__ or bytes.format (at least in 3.4; this may change later). * Bytes is an iterable of small ints rather than of single-char bytes. * File objects are the wrappers from the io module, not thin wrappers around C stdio. * All text files have universal newlines enabled, unless otherwise specified by the (not in 2.x) newline param. * Functions like chr and ord are based on Unicode code points, not bytes. (There is no bytes equivalent because there's no need if bytes is an iterable of ints.) * Different internal representation for unicode objects. * Different C API for unicode objects. * No basestring. So, which of these do you want, and which do you not? I suspect that, whatever your exact answers, it would be a lot easier to fork 3.4 and port the 2.7 behavior you want than to fork 2.7 and backport almost all of 3.4. And if you do it that way, you could even adapt the idea someone proposed a few weeks ago (not popular on this list, but maybe popular with your target audience) of turning each change on and off with a "from __past__ import misfeature" statement, so people could pick and choose the ones they need, and gradually remove past statements as they port from your forked 2.8 to real 3.4.
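Several of the differences in the list above are easy to observe in a Python 3 interpreter. This is a minimal sketch assuming any CPython 3.x. It illustrates only a handful of the bullets and says nothing about how a hypothetical 2.8 would behave.

```python
# Iterating over bytes yields small ints, not 1-byte strings.
items = list(b"AB")
print(items)                            # [65, 66]

# bytes(n) builds n zero bytes; it does not format n as text.
print(bytes(2))                         # b'\x00\x00'

# Mixing bytes and str is an error instead of an implicit ASCII conversion.
try:
    b"Header: " + "value"
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                         # False

# bytes has no encode() and str has no decode(); conversions name a codec.
print(hasattr(b"", "encode"), hasattr("", "decode"))  # False False
encoded = "caf\u00e9".encode("utf-8")
print(encoded)                          # b'caf\xc3\xa9'
print(encoded.decode("utf-8"))          # café

# ord() and chr() work on Unicode code points.
print(ord("\u00e9"), chr(233))          # 233 é
```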
However, I also suspect that, whatever your exact answers, it won't be that useful. Look at people's reasons for not moving to 3.x: * If your app already works in 2.7, and has no need for any new 3.x-only packages, it makes perfect sense to stay with 2.7. Which means there's no reason to move to 2.8. * If your app works in 2.7, but you're worried that it will eventually become hard to find supported 2.7 installations to run on, would you really expect finding 2.8 installations to be easier? * If you're staying with 2.7 because your OS, hosting company, dev team, school, whatever provides it, there's no reason to go to 2.8. * If you depend on a package that hasn't been ported to 3.x... well, that's four separate issues. * If you depend on an in-house/small-market package that hasn't been ported, it's really the same case as "I have an app that works just fine in 2.7." * If you depend on a package that hasn't been ported because it's effectively moribund, it's not going to be ported to 2.8. * If you depend on a package that actually has been ported to 3.x, but you're too stupid to find information anywhere but blog posts or StackOverflow questions dated 2009 (which is depressingly common...), those posts are not going to tell you about 2.8. * If you depend on a package that's legitimately hard to port to 3.x, it obviously won't be ported to 2.8 yet either, and since it'll probably be a lower priority for the developers, even if 2.8 is an easier port than 3.4 there's no guarantee it'll come sooner. (Also, consider that typically, people depend on 6 packages that have been ported and 1 that hasn't; if they switch to 2.8, that'll be 7 packages they need to wait on rather than 1.) * If you have code that sort of works in 2.7 (if you're careful to feed it only ASCII), just renaming str will almost certainly break your code. If you fix it, it will be as easy to port to 3.4 as to 2.8. If you don't fix it...
well, at best this is the same as the first case; if not, it's the same as the next one. * If you have code that's legitimately difficult to port to 3.x because, e.g., it relies on parsing and creating network messages or file formats that mix ASCII text and binary or encoded-text payloads, just renaming str will break your code. And it may be non-trivial to fix. I'm having a hard time imagining code that would be easy to port to 2.8, but not to 3.x. For example:

    payload =
    sock.sendall('Header: {}\r\nAnother: {}\r\n\r\n{}'.format(
        headers['header'], headers['another'], payload))

Even with just the two changes you already suggested: First, you have to change the literal to a bytes literal. More seriously, you have to rename that payload type's __str__ method to __bytes__. And if it does any string stuff internally, like encoding JSON, that has to change. Meanwhile, your logging code probably relies on the same __str__ method actually returning a str, so you have to add one of those. Assuming headers is a dict of strs, you either need to go back up the chain (or into the API that provides it) and change that so it's been a dict of bytes all along, or you need to explicitly encode the headers here. That doesn't sound too hard overall, but that gives you working Python 3.5 code (assuming PEP 460 goes through). And there doesn't seem to be any shortcut that would give you working 2.8 code without also working in 3.5. Also, one quick comment: > - removal of 'apply', 'buffer', 'callable', 'callable' exists in Python 3.2+. Not a big deal, unless this implies that you're basing everything on the state of the ecosystem back in Python 3.1. I don't think that it does, but just in case: Three years ago, people didn't have much experience with porting yet (e.g., writing 2.x code and running it through 2to3 at install time was considered the best way to port things gradually...) and most of PyPI didn't exist for 3.x yet.
Back then, this suggestion would have been a lot more compelling than it is today, because all anyone could say was, "Wait and see, we're hoping it'll be better" instead of "Look and see, it already is better." From steve at pearwood.info Sat Jan 18 09:12:48 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 18 Jan 2014 19:12:48 +1100 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <52D9DA7D.5090307@nedbatchelder.com> Message-ID: <20140118081248.GN3915@ando> On Fri, Jan 17, 2014 at 11:40:49PM -0800, Devin Jeanpierre wrote: > On Fri, Jan 17, 2014 at 5:35 PM, Ned Batchelder wrote: > > Is there an example of an actual problem that stability of min and max would > > make easier to solve? > > In a language like C++, if min and max had the property specified > by the OP, you might do: > > x = min(a, b); > y = max(a, b); > > And then x is the smallest, and y is the other one, and it's simple and > easy and less code than an if statement. But that's how max and min work right now, modulo that object identity is not important. If you care about object identity, you're probably doing something underhanded *wink* Given the case that a and b are *equal* (as measured by the key function, if given) then it shouldn't matter whether you get smallest = a biggest = b or smallest = b biggest = a or smallest = biggest = a or smallest = biggest = b These variations only are meaningful if a and b are different types with the same value, or the same type but different identities. Even if these variations are important, I don't think there is any inherent benefit to one over the other. Personally, I'd either keep the current behaviour, or purely for the symmetry, pick the so-called "stable" behaviour. But I don't see any rational reason for preferring one over the other. Now that Python 3 documents the specific behaviour, it's not worth changing. > I suspect this is where the desire comes from.
> > In Python, of course, you do x, y = sorted([a, b]) Now the interesting thing here is that sorted *is* stable, so if a and b are equal, sorted([a, b]) is guaranteed to return [a, b]. Which gives the behaviour requested. -- Steven From greg.ewing at canterbury.ac.nz Sat Jan 18 09:25:23 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 18 Jan 2014 21:25:23 +1300 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <52D9DA7D.5090307@nedbatchelder.com> Message-ID: <52DA3A73.9060703@canterbury.ac.nz> Devin Jeanpierre wrote: > In a language like C++, if min and max had the property specified > by the OP, you might do: > > x = min(a, b); > y = max(a, b); > > And then x is the smallest, and y is the other one With Python's current definition of max(), you can get that effect using x, y = min(a, b), max(b, a) So max() *does* respect the order of its operands; it's just that the order it respects may not be obvious unless you're Dutch. -- Greg From abarnert at yahoo.com Sat Jan 18 09:26:25 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 18 Jan 2014 00:26:25 -0800 (PST) Subject: [Python-ideas] Make max() stable In-Reply-To: References: <52D9DA7D.5090307@nedbatchelder.com> Message-ID: <1390033585.24715.YahooMailNeo@web181004.mail.ne1.yahoo.com> From: Devin Jeanpierre Sent: Friday, January 17, 2014 11:40 PM > In a language like C++, if min and max had the property specified > by the OP, you might do: > > x = min(a, b); > y = max(a, b); > > And then x is the smallest, and y is the other one, and it's simple and > easy and less code than an if statement. I suspect this is where the > desire comes from. > > In Python, of course, you do x, y = sorted([a, b]) You can write the exact same thing in C++:

    template whatever(T a, T b) {
        T x, y;
        tie(x, y) = tuplize(sorted(make_vector(a, b)));

And then x is the smallest, and y is the other one.
Plus, all that static type safety makes it even better than the Python version, right? And extending all those helpers to take a variable number of arguments is a simple matter of template metaprogramming (either with a bit of preprocessor help via boost, or with template parameter packs). Of course you need to write those three helper function templates. Making them work for two parameters is trivial; making them work for an arbitrary number of parameters is a simple matter of template metaprogramming, which any sixth-year C++ student can write in a matter of weeks; it's just partially specializing on parameter packs, which can be done with simple compile-time recursion. And the advantage is that all that static typing makes sure that you get errors at compile time instead of run time if you make any mistakes, or often even if you don't. From greg.ewing at canterbury.ac.nz Sat Jan 18 09:37:52 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 18 Jan 2014 21:37:52 +1300 Subject: [Python-ideas] Make max() stable In-Reply-To: <1390033585.24715.YahooMailNeo@web181004.mail.ne1.yahoo.com> References: <52D9DA7D.5090307@nedbatchelder.com> <1390033585.24715.YahooMailNeo@web181004.mail.ne1.yahoo.com> Message-ID: <52DA3D60.6010807@canterbury.ac.nz> Andrew Barnert wrote: > And the advantage is that all that static typing makes sure that you get errors at compile time instead of run time if you make any mistakes, or often > even if you don't. Also, if you do it right, the whole computation is performed at compile time, so you don't even have to run the program. This makes it very easy to deploy in a cross-platform manner on minimally-specced hardware.
-- Greg From jeanpierreda at gmail.com Sat Jan 18 11:40:50 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 18 Jan 2014 02:40:50 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: <20140118081248.GN3915@ando> References: <52D9DA7D.5090307@nedbatchelder.com> <20140118081248.GN3915@ando> Message-ID: On Sat, Jan 18, 2014 at 12:12 AM, Steven D'Aprano wrote: > These variations only are meaningful if a and b are different types > with the same value, or the same type but different identities. Even if > these variations are important, I don't think there is any inherent > benefit to one over the other. These variations are also important if a and b are just plain different values, same type or no. This can happen if max/min are passed a key function -- equality of a sort key doesn't mean the values are interchangeable for all purposes. If x and y are strings with the same length, min(x, y, key=len) + max(x, y, key=len) is something different in each of those helpfully enumerated cases, and that's with a well-behaved type and a superficially OK-looking expression. That said, considering (as Greg points out) that all four variations can be transformed into one another by rearranging the arguments to min and max, I think it's pretty clear that there's nothing strongly favoring any one of them, so on that (and the rest) I agree. -- Devin From musicdenotation at gmail.com Sat Jan 18 15:52:17 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Sat, 18 Jan 2014 21:52:17 +0700 Subject: [Python-ideas] Tail recursion elimination Message-ID: On April 22, 2009, Guido van Rossum wrote: > First, as one commenter remarked, TRE is incompatible with nice stack traces: when a tail recursion is eliminated, there's no stack frame left to use to print a traceback when something goes wrong later.
This will confuse users who inadvertently wrote something recursive (the recursion isn't obvious in the stack trace printed), and makes debugging hard. Providing an option to disable TRE seems wrong to me: Python's default is and should always be to be maximally helpful for debugging. What are "nice" stack traces? If you mean stack traces that record every function call then it is not nice and helpful at all given their length. Do loops have nice stack traces as you mean it? No. When something goes wrong in a loop, you don't get to see every iteration in the stack trace. You debug loops by examining the values of variables in the iteration it goes wrong. Likewise you debug a tail recursive function by examining the arguments that went into the function call that blows up. And if you don't want to turn on TRE by default, you can turn it off and offer an option to enable. > Second, the idea that TRE is merely an optimization, which each Python implementation can choose to implement or not, is wrong. Once tail recursion elimination exists, developers will start writing code that depends on it, and their code won't run on implementations that don't provide it: a typical Python implementation allows 1000 recursions, which is plenty for non-recursively written code and for code that recurses to traverse, for example, a typical parse tree, but not enough for a recursively written loop over a large list. Yes, it is more of a language feature than an implementation feature. But once CPython implements it, I think other implementations will follow suit or Python developers will not write code that uses TRE just like JavaScript developers don't use Mozilla-specific extensions like let keyword. > Third, I don't believe in recursion as the basis of all programming. This is a fundamental belief of certain computer scientists, especially those who love Scheme and like to teach programming by starting with a "cons" cell and recursion. 
But to me, seeing recursion as the basis of everything else is just a nice theoretical approach to fundamental mathematics (turtles all the way down), not a day-to-day tool. This isn't a valid argument. That something isn't fundamental is almost never an argument to leave it out. (Except for Scheme.) > Of course, none of this does anything to address my first three arguments. Is it really such a big deal to rewrite your function to use a loop? (After all TRE only addresses recursion that can easily be replaced by a loop. :-) It isn't a big deal, yes. But why restrict programmers? Python is a multi-paradigm language anyway: you can write imperative or functional code with(out) object orientation. "There should be one -- and only one -- obvious way to do it"? You are misinterpreting it. It is about having only one obvious way to do things, but you have removed another obvious way to do certain things because certain problems are better expressed using recursion rather than looping (traversing a tree, for example). I think the most obvious way to do something is how it is defined. If there was really one way to do anything, there should be no for-loops, list comprehensions, or even object orientation at all in Python. Remember that recursion is either something lower- or higher-level than looping. And if you don't like abstractions over primitive concepts, you should be coding in machine code right now. From jsbueno at python.org.br Sat Jan 18 16:08:38 2014 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sat, 18 Jan 2014 13:08:38 -0200 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: You can use tail recursion elimination in Python as it is today.
So, if you are needing that, just package this reference implementation in a module:- http://metapython.blogspot.com.br/2010/11/tail-recursion-elimination-in-python.html js -><- On 18 January 2014 12:52, wrote: > On April 22, 2009, Guido van Rossum wrote: > > First, as one commenter remarked, TRE is incompatible with nice stack > traces: when a tail recursion is eliminated, there's no stack frame left to > use to print a traceback when something goes wrong later. This will confuse > users who inadvertently wrote something recursive (the recursion isn't > obvious in the stack trace printed), and makes debugging hard. Providing an > option to disable TRE seems wrong to me: Python's default is and should > always be to be maximally helpful for debugging. > > What are "nice" stack traces? If you mean stack traces that record every > function call then it is not nice and helpful at all given their length. Do > loops have nice stack traces as you mean it? No. When something goes wrong > in a loop, you don't get to see every iteration in the stack trace. You > debug loops by examining the values of variables in the iteration it goes > wrong. Likewise you debug a tail recursive function by examining the > arguments that went into the function call that blows up. And if you don't > want to turn on TRE by default, you can turn it off and offer an option to > enable. > > Second, the idea that TRE is merely an optimization, which each Python > implementation can choose to implement or not, is wrong. Once tail recursion > elimination exists, developers will start writing code that depends on it, > and their code won't run on implementations that don't provide it: a typical > Python implementation allows 1000 recursions, which is plenty for > non-recursively written code and for code that recurses to traverse, for > example, a typical parse tree, but not enough for a recursively written loop > over a large list. 
> > Yes, it is more of a language feature than an implementation feature. But > once CPython implements it, I think other implementations will follow suit > or Python developers will not write code that uses TRE just like JavaScript > developers don't use Mozilla-specific extensions like let keyword. > > Third, I don't believe in recursion as the basis of all programming. This is > a fundamental belief of certain computer scientists, especially those who > love Scheme and like to teach programming by starting with a "cons" cell and > recursion. But to me, seeing recursion as the basis of everything else is > just a nice theoretical approach to fundamental mathematics (turtles all the > way down), not a day-to-day tool. > > This isn't a valid argument. That something isn't fundamental is almost > never an argument to leave it out. (Except for Scheme.) > > Of course, none of this does anything to address my first three arguments. > Is it really such a big deal to rewrite your function to use a loop? (After > all TRE only addresses recursion that can easily be replaced by a loop. :-) > > It isn't a big deal, yes. But why restrict programmers? Python is a > multi-paradigm language anyway: you can write imperative or functional code > with(out) object orientation. "There should be one -- and only one -- obvious > way to do it"? You are misinterpreting it. It is about having only one > obvious way to do things, but you have removed another obvious way to do > certain things because certain problems are better expressed using recursion > rather than looping (traversing a tree, for example). I think the most > obvious way to do something is how it is defined. If there was really one > way to do anything, there should be no for-loops, list comprehensions, or > even object orientation at all in Python. Remember that recursion is either > something lower- or higher-level than looping.
And if you don't like > abstractions over primitive concepts, you should be coding in machine code > right now. > > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From musicdenotation at gmail.com Sat Jan 18 16:58:23 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Sat, 18 Jan 2014 22:58:23 +0700 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: > On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" wrote: > > You can use tail recursion elimination in Python as it is today. I have seen many "implementations" of tail-call optimization, and their common problem is that they all require special syntax to work. I need a solution that is directly usable with Python's ordinary return statement. From breamoreboy at yahoo.co.uk Sat Jan 18 17:28:20 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 18 Jan 2014 16:28:20 +0000 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: On 18/01/2014 15:58, musicdenotation at gmail.com wrote: >> On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" >> > > wrote: > >> You can use tail recursion elimination in Python as it is today. > I have seen many "implementations" of tail-call optimization, and their > common problem is that they all require special syntax to work. I need a > solution that is directly usable with Python's ordinary /return/ > statement. > Then implement one and publish it so everybody else can use it. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence From stefan_ml at behnel.de Sat Jan 18 17:40:37 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 18 Jan 2014 17:40:37 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: musicdenotation at gmail.com, 18.01.2014 16:58: > I have seen many "implementations" of tail-call optimization, and their > common problem is that they all require special syntax to work. I need a > solution that is directly usable with Python's ordinary return > statement. What do you need it for? (and note that your answer might be more suited for python-list than python-ideas, in which case you may reply over there) Stefan From jsbueno at python.org.br Sat Jan 18 18:12:29 2014 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sat, 18 Jan 2014 15:12:29 -0200 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: This one is. What are you talking about? On 18 January 2014 13:58, wrote: > On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" wrote: > > > You can use tail recursion elimination in Python as it is today. > > I have seen many "implementations" of tail-call optimization, and their > common problem is that they all require special syntax to work. I need a > solution that is directly usable with Python's ordinary return statement. What are you talking about? This one is usable with ordinary returns. It just requires a decorator. From ned at nedbatchelder.com Sat Jan 18 19:02:28 2014 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sat, 18 Jan 2014 13:02:28 -0500 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: <52DAC1B4.8080607@nedbatchelder.com> On 1/18/14 10:58 AM, musicdenotation at gmail.com wrote: >> On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" > > wrote: > >> You can use tail recursion elimination in Python as it is today.
> I have seen many "implementations" of tail-call optimization, and > their common problem is that they all require special syntax to work. > I need a solution that is directly usable with Python's ordinary > /return/ statement. You haven't explained why you need it. --Ned. From abarnert at yahoo.com Sat Jan 18 19:29:46 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 18 Jan 2014 10:29:46 -0800 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: Whether or not you really need it, adding it to Python is a fun challenge that's worth trying. The first problem is that CPython makes a C function call for every Python function call, and C doesn't eliminate tail calls; the only way to do it manually is with longjmp. So, you probably want to add it to either Stackless or PyPy instead. Second, detecting tail recursion to eliminate it is pretty hard, and you have to do a runtime check to detect that the function being called is already on the stack, which potentially slows down every function call. Fortunately, eliminating _all_ tail calls instead of just recursive ones is a lot easier in Python, and it's better in just about every way. Third, eliminating tail calls means they aren't on the stack at runtime, which means there's no obvious way to display useful tracebacks. I don't think too many Python users would accept the tradeoff of giving up good tracebacks to enable certain kinds of non-pythonic code, but even if you don't solve this, you can always maintain a fork the same way that Stackless has been doing. Meanwhile, it will be a lot easier to do this in steps: First add a tail statement that's like return except that its expression must be a Call; instead of compiling to a return bytecode it replaces the call_function* with a tail_function*, which you can initially fake by doing a call and return.
Next, write a real implementation of tail_function* that jumps instead of calling and returning. Next, write a simple keyhole optimizer that converts any call_function* followed immediately by return into a tail_function*, which makes your custom syntax unnecessary, so you can revert it. Finally, solve the traceback problem in some way. (Maybe you could do something tricky here: Split the stack frame object into the bit needed for tracebacks and the bit needed for actual calling; tail call elimination takes care of the second one, and a different optimization to detect and run-compress loops takes care of the first one. Making that fast enough so that it doesn't slow down every call unacceptably probably means keeping around a dict mapping some kind of "position" record to frames.) Sent from a random iPhone On Jan 18, 2014, at 7:58, musicdenotation at gmail.com wrote: >> On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" wrote: > > >> You can use tail recursion elimination in Python as it is today. > I have seen many "implementations" of tail-call optimization, and their common problem is that they all require special syntax to work. I need a solution that is directly usable with Python's ordinary return statement. From tjreedy at udel.edu Sun Jan 19 00:09:39 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 18 Jan 2014 18:09:39 -0500 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: On 1/18/2014 9:52 AM, musicdenotation at gmail.com wrote: A few notes first: 1) A tail call is a 'top level' call in a return statement.
    return f(*args, **kwds)

A directly recursive call, where f refers to the function with the return statement, is a special case. 2) Whether a tail call is directly recursive cannot be determined, in Python, at compile time merely by looking at the function definition. If 'f' is the definition name of the function, an assumption can be made, but doing so is a semantic change in the language. 3) 'Tail recursion elimination' has two quite different meanings. 3a. Compile the (in Python, assumed to be) directly recursive function as if it had been written with a while loop. This is typically done in languages that do not expose while loops. This eliminates the function call overhead as well as stack growth. This does nothing for indirect recursion through intermediate function calls. 3b. At runtime, make the call but reuse the current stack frame. This is easily done (at least in CPython) for all tail calls. But doing so for all tail calls will make stack traces pretty useless, as tail calls are rather common. Determining whether the call is recursive, to limit stack reuse, takes extra work. Either choice only eliminates stack growth. Comments on some of your responses to a *subset* of TRE issues. > On April 22, 2009, Guido van Rossum wrote: >> First, ... > Do [while/for] loops have nice stack traces as you mean it? No, if you want a complete trace, you must add a print statement inside the loop. Looping with tail recursion gives you the complete trace for free. >> Second, ... What is fundamental, besides alternation, is the ability to express an unbounded amount of repetition, with variation, in a small finite amount of code. I call this computational induction, in analogy to mathematical induction. The alternative implementations include recursion and explicit while/for loops. There are also the combinator and fixed point approaches. >> > I think the most obvious way to do something is how it is /defined/.
Most recursive definitions are naturally written with body recursion rather than tail recursion. A simple example (without input checking) is the factorial function.

    def fact(n):
        if n:
            return fact(n-1) * n
        else:
            return 1

To get TRE, one must re-write in tail-recursive form. (Python default arguments actually make this easier than in most languages.)

    def fact(N, n=1, result=1):
        if n

References: Message-ID: <20140119004515.GP3915@ando> On Sat, Jan 18, 2014 at 10:29:46AM -0800, Andrew Barnert wrote: > Whether or not you really need it, adding it to Python is a fun > challenge that's worth trying. "Need" is a funny thing. There's nothing you can do with a for-loop that you can't do with a while-loop, but that doesn't mean we don't "need" for-loops. Certain algorithms and idioms are simply better written in terms of for-loops, and certain algorithms are simply better written in terms of recursion than looping. You can go a long way without recursion, or only shallow recursion. In 15+ years of writing Python code, I've never been in a position where I couldn't solve a problem because of the lack of tail recursion. But every time I manually convert a recursive algorithm to an iterative one, I feel that I'm doing make-work, manually doing something which the compiler is much better at than I am, and the result is often less natural, or even awkward. (Trampolines? Ewww.) > Third, eliminating tail calls means they aren't on the stack at > runtime, which means there's no obvious way to display useful > tracebacks. I don't think too many Python users would accept the > tradeoff of giving up good tracebacks to enable certain kinds of > non-pythonic code, What makes you say that it is "non-pythonic"? You seem to be assuming that *by definition* anything written recursively is non-pythonic. I do not subscribe to that view. In fact, in some cases, I *would* willingly give up *non-useful* tracebacks for the ability to write more idiomatic code.
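The manual recursion-to-iteration rewrite Steven calls make-work (and which Terry's option 3a would have the compiler do) is mechanical when the call is a direct tail call: rebind the parameters and jump back to the top. A minimal sketch, using a hypothetical gcd example that is not taken from the thread:

```python
def gcd_recursive(a, b):
    # The recursive call is the entire return value: a direct tail call.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)


def gcd_loop(a, b):
    # The same function after "tail recursion elimination": instead of
    # calling again, rebind the parameters and loop back to the test.
    while b != 0:
        a, b = b, a % b
    return a


print(gcd_recursive(1071, 462), gcd_loop(1071, 462))  # 21 21
```

The loop version uses constant stack space however many steps it takes, which is exactly what the recursive version cannot promise in CPython today.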
Have you seen the typical recursive traceback? They're a great argument for "less is more". What normally happens is that you get a RuntimeError and the traceback blows away your xterm's buffer with hundreds of identical or near-identical lines. But even in the case where you didn't hit the recursion limit, the traceback is pretty much a picture of redundancy and noise:

    py> a(7)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 2, in a
        return b(n-1)
      File "./rectest.py", line 5, in b
        return c(n-1) + a(n)
      File "./rectest.py", line 9, in c
        return 1/n
    ZeroDivisionError: division by zero

The only thing that I care about is the very last line, that function c tries to divide by zero. The rest of the traceback is just noise; I don't even look at it. Now, it's okay if you disagree, or if you can see something useful in the traceback other than the last entry. I'm not suggesting that TCE should be compulsory. I would be happy with a command-line switch to turn it on, or better still, a decorator to apply it to certain functions and not others. I expect that I'd have TCE turned off for debugging. But perhaps not -- it's not like Haskell and Scheme programmers are unable to debug their recursive code. The point is that tracebacks are not sacrosanct, and, yes, I would like the choice between writing idiomatic recursive code and more extensive tracebacks.
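The per-function, opt-in behaviour Steven asks for can be approximated today with a trampoline decorator (the approach he dismisses as "Ewww" above). The sketch below is a minimal version of our own; `TailCall` and `trampoline` are hypothetical names, not MacroPy's or Ryan's API. A decorated function returns a `TailCall` object instead of making the call, and a driver loop performs the calls one after another, so the stack depth stays constant.

```python
from functools import wraps


class TailCall:
    """A deferred call returned from a trampolined function (hypothetical helper)."""

    __slots__ = ("func", "args", "kwargs")

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs


def trampoline(func):
    """Run func, then keep running any TailCall it returns, in a loop."""

    @wraps(func)
    def driver(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, TailCall):
            # Call the undecorated function, so each step runs inside this
            # loop instead of nesting a fresh driver (and a fresh frame).
            target = getattr(result.func, "_undecorated", result.func)
            result = target(*result.args, **result.kwargs)
        return result

    driver._undecorated = func
    return driver


@trampoline
def is_even(n):
    return True if n == 0 else TailCall(is_odd, n - 1)


@trampoline
def is_odd(n):
    return False if n == 0 else TailCall(is_even, n - 1)


print(is_even(100001))  # False, without hitting the recursion limit
```

Mutual recursion works because each function hands back a `TailCall` naming the other. The costs match the thread's objections: the function must explicitly return `TailCall(...)` rather than use an ordinary call (the "special syntax" complaint), and a traceback raised inside the loop shows only the driver and the function currently running.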
Trading off speed for convenience is perfectly Pythonic -- that's why we have the ability to write C extensions, is it not? > but even if you don't solve this, you can always > maintain a fork the same way that Stackless has been doing. Having to fork the entire compiler just to write a few functions in their most idiomatic, natural (recursive) form seems a bit extreme, wouldn't you say? -- Steven From rymg19 at gmail.com Sun Jan 19 01:56:38 2014 From: rymg19 at gmail.com (Ryan) Date: Sat, 18 Jan 2014 18:56:38 -0600 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: I wrote one that uses decorators. How is that special syntax? musicdenotation at gmail.com wrote: >> On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" >wrote: > >> >> You can use tail recursion elimination in Python as it is today. >I have seen many "implementations" of tail-call optimization, and their >common problem is that they all require special syntax to work. I need >a solution that is directly usable with Python's ordinary return >statement. > -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From nas-python at arctrix.com Sun Jan 19 02:13:32 2014 From: nas-python at arctrix.com (Neil Schemenauer) Date: Sat, 18 Jan 2014 19:13:32 -0600 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> Message-ID: <20140119011332.GA5735@python.ca> On 2014-01-18, Terry Reedy wrote: > On 1/17/2014 10:22 PM, Neil Schemenauer wrote: > >The transition to Python 3 is happening but there is still a massive > >amount of code that needs to be ported. > > For application code, why does it need to be ported. Unless Python 2.x is going to be maintained in perpetuity then code will have to be ported. This point seems obvious to me. > For many application areas, the text problem seems to have been > somewhat solved, to the point where people are writing 2&3 code > successfully. Sure, you can write code that's compatible with 2&3, but that's not the code I'm talking about. I'm talking about the millions (maybe billions) of lines of existing Python code. > I think it too late now. I disagree. The amount of Python 2 code that exists exceeds the amount of Python 3 by orders of magnitude. That existing codebase either stops evolving and stays Python 2 forever or we do all that's practical to help people move it to a current version of Python. > I believe you left out the int division change. That should be on the list. > People who cannot move to 3.x because of libraries could not move to > 2.8 for the same reason. Over half of the most commonly downloaded > libraries already have 3.x versions. That's a necessary condition but the vast amount of existing Python 2 code has not been ported. Lots of it would be private libraries or applications. You only have to look at the download stats for the Python interpreter to confirm this. > I realize that if there is actual code created, and if it's not > under the umbrella of the PSF, it couldn't be called "Python 2.8" > due to trademark reasons.
I don't give a shit what it's called. A Python 2 fork is going to happen whether the PSF blesses it or not; I can't believe that's even a point of discussion. People are still maintaining Cobol compilers. Neil From nas-python at arctrix.com Sun Jan 19 02:14:13 2014 From: nas-python at arctrix.com (Neil Schemenauer) Date: Sat, 18 Jan 2014 19:14:13 -0600 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> References: <20140118032219.GA11381@python.ca> <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> Message-ID: <20140119011413.GB5735@python.ca> On 2014-01-17, Andrew Barnert wrote: > What exactly do you mean by "the bytes/unicode changes"? I mean all of those things that you listed. bytes = the Python 2.7 str object, str object is Python 2.7 unicode object. > I suspect that, whatever your exact answers, it would be a lot > easier to fork 3.4 and port the 2.7 behavior you want than to fork > 2.7 and backport almost all of 3.4. It's a lot of work no matter which way you do it. That's one of the biggest problems with this idea. > And if you do it that way, you could even adapt the idea someone > proposed a few weeks ago--not popular on this list, but maybe > popular with your target audience--of turning each change on and > off with a "from __past__ import misfeature" statement, so people > could pick and choose the ones they need, and gradually remove > past statements as they port from your forked 2.8 to real 3.4. You can't make those changes with __future__/__past__ imports. They affect the whole runtime, not a single module. > However, I also suspect that, whatever your exact answers, it > won't be that useful. Look at people's reasons for not moving to > 3.x: Imagine I'm a developer with the Python 2.x codebase.
I'm either lazy or I'm too busy with other company projects that I can't put the effort into porting my 2.x code to 3.x, even if all the 3rd party libraries have been ported. How can we make it easier for them to move their code towards Python 3.x rather than keeping it as 2.x? A maintained interpreter to run Python 2.x code is going to continue to exist. Some python-dev people seem to suggest that the end of maintenance of Python 2.7 is going to force people to port their code. That's ridiculous. I want to make it more attractive for these developers to move towards Python 3 rather than stalling out on Python 2.7 forever. How best to do that is still to be determined. I think my 2.8 idea might be better than the status quo but it's just a crazy idea. > I'm having a hard time imagining code that would be easy to port to 2.8, but not to 3.x. For example:
>
>     payload =
>     sock.sendall('Header: {}\r\nAnother: {}\r\n\r\n{}'.format(
>         headers['header'], headers['another'], payload))
>
> Even with just the two changes you already suggested: First, you > have to change the literal to a bytes literal. That part is easy, could even be done with an automated tool (change u' to ' and ' to b'). > More seriously, you have to rename that payload type's __str__ > method to __bytes__. Nope, no __bytes__ in my proposed 2.8. > And if it does any string stuff internally, like encoding JSON, > that has to change. Meanwhile, your logging code probably relies > on the same __str__ method actually returning a str, so you have to > add one of those. Assuming headers is a dict of strs, you either > need to go back up the chain (or into the API that provides it) > and change that so it's been a dict of bytes all along, or you > need to explicitly encode the headers here. That doesn't sound too > hard overall--but that gives you working Python 3.5 code (assuming > PEP 460 goes through). And there doesn't seem to be any shortcut > that would give you working 2.8 code without also working in 3.5. I think you are misunderstanding my proposal, no problems like the ones you suggest, bytes() would be the Python 2.7 str class. All the internal bytes/unicode internals would be like 2.7. That's basically the whole idea of this proposal, the bytes/str change in 3.x is the really disruptive one, separate it into separate interpreter versions to make porting easier. Neil From rosuav at gmail.com Sun Jan 19 02:31:20 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 19 Jan 2014 12:31:20 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011413.GB5735@python.ca> References: <20140118032219.GA11381@python.ca> <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> <20140119011413.GB5735@python.ca> Message-ID: On Sun, Jan 19, 2014 at 12:14 PM, Neil Schemenauer wrote: > That's > basically the whole idea of this proposal, the bytes/str change in > 3.x is the really disruptive one, separate it into separate > interpreter versions to make porting easier. It may be disruptive to a whole lot of code that's been happily oblivious to the whole issue, but it's also central to more and more of Py3's library. It's going to become increasingly difficult to backport stuff from Py3 to a system that doesn't have the same back-end string handling. If you're prepared to make a whole bunch of incompatible changes to move to this hypothetical 2.8, why not make all the changes at once? Unless 2.x will be maintained forever (with a 2.9, a 2.10, and so on), the changes will have to be made. If it's so costly to make a full pass over your code to port it to 3.3/3.4/3.5, surely it would be twice as costly to make that exact same full pass to port it to 2.8, and then another just the same to port 2.8 to 3.6?
I still maintain that the biggest complaints about the jump from 2.x to 3.x are largely dealt with by 2.7 and 3.3/3.4. Yes, it's hard to jump from 2.5 to 3.1, but you don't have to. Just stick with 2.x until your users are all on 2.7, then strip out all the code that's supporting pre-2.7 versions. Once you have that, you can start in with some __future__ directives (division, print_function, unicode_literals), and start sorting out the bytes/unicode distinction *at your leisure*. (In some cases, that "sorting out" is simply a matter of naming. I have some code that reads from a socket, and it's divided into three parts: first pass works with "data" and handles TELNET codes, second pass works with "text" and handles ANSI codes, third pass works with "text" and handles newlines. It's obvious from the parameter names that the conversion from bytes to Unicode has to happen between the first and second passes.) Then, when you finally come to port it to 3.x (which mightn't be for another however-many years, when 2.7's python.org support finally ends, or it might be even later, when RHEL stops shipping patches, or it might not even be then - code doesn't stop working just because it's not supported), you make the jump to whichever version is most convenient. Currently, I'm seeing 3.3 as easier to jump to than 3.2 (eg the redundant compatibility notation u"str" is available), and 3.4 is getting some more on that front; maybe some features won't make it into 3.4 so they'll be in 3.5. And maybe it'll be 3.7 that you jump to. That's not a problem. Whatever version you port it to, you make *one* assault on your code, and there you are, taking advantage of exactly as much of 3.x as you feel like, and it's all working. 
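Chris's naming convention, where `data` is still bytes and `text` is already decoded, can be made concrete as a three-pass pipeline. This is a hypothetical sketch, not his code: the TELNET pass only drops three-byte IAC commands (it ignores subnegotiation and escaped 0xFF bytes), UTF-8 is an assumed encoding, and the ANSI pattern only covers simple CSI sequences. Iterating over a `bytearray` yields integers on both Python 2.7 and 3.x, and `unicode_literals` makes the plain string literals text on both, so the module is intended to run unchanged on either.

```python
from __future__ import unicode_literals

import re

IAC = 0xFF  # TELNET "Interpret As Command" byte
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_telnet(data):
    """Pass 1 works on bytes: drop 3-byte IAC commands (simplified)."""
    buf = bytearray(data)
    out = bytearray()
    i = 0
    while i < len(buf):
        if buf[i] == IAC:
            i += 3  # assume IAC + command + option
        else:
            out.append(buf[i])
            i += 1
    return bytes(out)


def strip_ansi(text):
    """Pass 2 works on text: remove ANSI CSI escape sequences."""
    return ANSI_CSI.sub("", text)


def split_lines(text):
    """Pass 3 works on text: split into lines."""
    return text.splitlines()


def process(data, encoding="utf-8"):
    # The single bytes -> text boundary sits between passes 1 and 2.
    text = strip_telnet(data).decode(encoding)
    return split_lines(strip_ansi(text))


print(process(b"\xff\xfb\x01hello \x1b[1mworld\x1b[0m\r\nbye"))
```

Keeping the decode in exactly one place is what makes the later move to Python 3 a matter of checking that one boundary rather than auditing every string operation.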
ChrisA From steve at pearwood.info Sun Jan 19 03:04:04 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jan 2014 13:04:04 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011332.GA5735@python.ca> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: <20140119020404.GQ3915@ando> On Sat, Jan 18, 2014 at 07:13:32PM -0600, Neil Schemenauer wrote: > On 2014-01-18, Terry Reedy wrote: > > On 1/17/2014 10:22 PM, Neil Schemenauer wrote: > > >The transition to Python 3 is happening but there is still a massive > > >amount of code that needs to be ported. > > > > For application code, why does it need to be ported. > > Unless Python 2.x is going to be maintained in perpetuity then code > will have to be ported. This point seems obvious to me. Why? If it isn't broken, don't break it. At last year's US PyCon, there was at least one person still using Python 1.5 in production. Doing so means that he gets no bug fixes or security updates for 1.5, but if he doesn't need them, that is no loss. Python 2.7 will almost certainly still be supported by (for example) Red Hat until 2023, which is probably longer than most applications will be still in use. > > For many application areas, the text problem seems to have been > > somewhat solved, to the point where people are writing 2&3 code > > successfully. > > Sure you can write code that's compatible with 2&3, that's not the > code I'm talking about. I'm talking about the millions (maybe > billions) of lines of existing Python code. > > > I think it too late now. > > I disagree. The amount of Python 2 code that exists exceeds the > amount of Python 3 by orders of magnitude. That existing codebase > either stops evolving and stays Python 2 forever Why is that a problem? Some people will never migrate away from Python 2.7/2.5/2.4/1.5. That's okay. A few months ago I ported an application from 2.3 to 2.6. 
It's not well recognised that Python 3 is not the first time Python broke backwards compatibility: string exceptions (raise "This is an error") became a warning in 2.5 (I think) and a TypeError in 2.6. This application made extensive use of string exceptions. My customer was happy with 2.3 code for years, until they upgraded their server to a version of CentOS with 2.6, and that was the only reason they upgraded. I expect they will stick with 2.6 until such time as they upgrade the server again in another decade or so, and that's fine. They may never upgrade, and that's fine too. > or we do all that's > practical to help people move it to a current version of Python. Define "all that's practical". How much hand-holding do they need? On the Python-Dev list, there are *hundreds* of emails about this issue, which is distracting the core devs from making Python 3 even more awesome. [...] > I don't give a shit what it's called. A Python 2 fork is going to > happen whether the PSF blesses it or not I doubt that. Stackless may try to call itself Python 2.8, but it won't be porting Python 3 features: https://mail.python.org/pipermail/python-dev/2013-November/130421.html Stackless wanted to release a 2.8, but it wouldn't contain any additional Py3 features; it would be a version bump to support a newer Microsoft compiler. There are plenty of people who *say* they want a Python 2.8 with half the Python 3 features, but nobody as far as I can see is actually willing to do the work. If they were, why haven't they started? They don't need permission.
-- Steven From steve at pearwood.info Sun Jan 19 03:28:11 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jan 2014 13:28:11 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: <20140119022811.GR3915@ando> On Fri, Jan 17, 2014 at 09:22:19PM -0600, Neil Schemenauer wrote: [...] > Here is a far out idea to make transition smoother. Release version > 2.8 of Python with nearly all Python 3.x incompatible changes except > for the bytes/unicode changes. This could include: It's hardly a "far out" idea. You're not the first to suggest this. There are many people asking for -- demanding, almost -- a Python 2.8 that provides only the subset of Python3 that they are interested in and gives them an excuse to avoid migrating for another three or five or ten years. Because really that's what 2.8 is all about -- providing people an excuse to put off migrating for a bit longer. But the thing is, they've still got a good three or more years before 2.7 goes into "security patches only" mode, and likely years more before it becomes unmaintained. And then there's third-party support from companies like RedHat. They will continue supporting Python 2 until end of 2023: https://access.redhat.com/site/support/policy/updates/errata/#Life_Cycle_Dates I wonder whether the 2 to 3 transition might not have been handled better with a long-drawn out process of slowly adding backwards-incompatible changes a few at a time? This is like the old chestnut about whether it is better to ease yourself into a really cold swimming pool a little at a time, or get it over with in one go by diving in. In both cases, you have pain, is it better to have a lot of pain that only lasts a short while, or a little bit of pain that goes on and on and on and on...? 
I think that had Python decided to add backwards-incompatible changes a few at a time, people now would be complaining about that and demanding that there be a once-off cutover version. > - removal of 'apply', 'buffer', 'callable', 'execfile' callable is back in Python 3.2. > Problems with this idea: > > - it would be a huge amount of work. [...] > > - if people install this new version of Python as the default, old > scripts and programs will break. [...] - It gives people an excuse to avoid migrating, and as sure as the sun rises in the east, will lead to people calling for Python 2.9 a few years from now. > An alternative approach to producing Python 2.8 would be to start > with the Python 3.x latest branch. Modify bytesobject and > unicodeobject to have as close to Python 2 behavior as practical. > > A-journey-of-a-thousand-miles-begins-ly y'rs The journey *already began* back in Python 2.6. Python 2.6 is the start of the journey: between them, 2.6 and 2.7 provide dict views, the next() builtin, the absolute_import, print_function and unicode_literals __future__ imports, and probably more that I have forgotten. So really, people have had 2.6 and 2.7 to ease the transition from 2.5 to 3.x. If they haven't taken advantage of that, what makes you think that 2.8 and 2.9 will convince them to migrate? But you don't have to believe me. Python is open source. Feel free to fork it and backport whatever features you like, and see how much interest you get from the wider community. Just don't call it "2.8"; that sends the wrong message and is a pretty rude thing to do given that the core developers have said that they will not make a 2.8: http://www.python.org/dev/peps/pep-0404/ Just because there will not be a CPython 2.8 doesn't mean you can't go ahead with your plan to backport 3 features to a 2 base. Just call it something else.
-- Steven From steve at pearwood.info Sun Jan 19 03:34:19 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jan 2014 13:34:19 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011413.GB5735@python.ca> References: <20140118032219.GA11381@python.ca> <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> <20140119011413.GB5735@python.ca> Message-ID: <20140119023419.GS3915@ando> On Sat, Jan 18, 2014 at 07:14:13PM -0600, Neil Schemenauer wrote: > On 2014-01-17, Andrew Barnert wrote: > > And if you do it that way, you could even adapt the idea someone > > proposed a few weeks ago--not popular on this list, but maybe > > popular with your target audience--of turning each change on and > > off with a "from __past__ import misfeature" statement, so people > > could pick and choose the ones they need, and gradually remove > > past statements as they port from your forked 2.8 to real 3.4. > > You can't make those changes with __future__/__past__ imports. They > affect the whole runtime, not a single module. I believe you are wrong. from __future__ imports are designed to affect the single module they are in. I see no reason why from __past__ can't work the same way.

    [steve at ando ~]$ cat a.py
    def func():
        return "Hello World"
    [steve at ando ~]$ cat b.py
    from __future__ import unicode_literals
    def func():
        return "Hello World"
    [steve at ando ~]$ python2.7 -c "import b, a; print repr(b.func()), repr(a.func())"
    u'Hello World' 'Hello World'

-- Steven From haoyi.sg at gmail.com Sun Jan 19 04:42:00 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Sat, 18 Jan 2014 19:42:00 -0800 Subject: [Python-ideas] Tail recursion elimination Message-ID: <3426697229381222197@unknownmsgid> MacroPy also has an implementation of TCO implemented using trampolining. It trades stack introspection for load-time-analysis, which could be a win or a loss depending on how you view things.
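To give a rough idea of what a load-time analysis can see, the sketch below uses only the standard `ast` module to find `return <call>` statements, which matches Terry's definition of a tail call as a top-level call in a return statement. It is not MacroPy's implementation, which the thread does not show. The second half illustrates Terry's point 2: whether such a call is recursive depends on what the name refers to at run time, so a name-based guess made at load time can be wrong after rebinding. `find_tail_calls`, `SAMPLE` and `countdown` are hypothetical names; `ast.unparse` needs Python 3.9 or later.

```python
import ast
import textwrap


def find_tail_calls(source):
    """Return sorted (lineno, callee) pairs for every `return <call>`.

    Calls hidden inside conditional expressions or `and`/`or` are missed,
    and a `return f()` inside try/finally is reported even though it is
    not really in tail position.
    """
    tree = ast.parse(textwrap.dedent(source))
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Call):
            hits.append((node.lineno, ast.unparse(node.value.func)))
    return sorted(hits)


SAMPLE = """
def walk(n):
    if n == 0:
        return "done"
    return walk(n - 1)

def fact(n):
    if n == 0:
        return 1
    return n * fact(n - 1)
"""

print(find_tail_calls(SAMPLE))  # [(5, 'walk')]: fact's call is not in tail position


def countdown(n):
    if n == 0:
        return "done"
    return countdown(n - 1)  # looks self-recursive...


original = countdown


def countdown(n):  # ...until the global name is rebound
    return "rebound"


# The call inside `original` looks up the global name at run time,
# so it now reaches the new function instead of recursing.
print(original(3))  # rebound
```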
------------------------------ From: Ryan Sent: 1/18/2014 4:57 PM To: musicdenotation at gmail.com; Joao S. O. Bueno; python-ideas at python.org Subject: Re: [Python-ideas] Tail recursion elimination I wrote one that uses decorators. How is that special syntax? musicdenotation at gmail.com wrote: > > On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" > wrote: > > > You can use tail recursion elimination in Python as it is today. > > I have seen many "implementations" of tail-call optimization, and their > common problem is that they all require special syntax to work. I need a > solution that is directly usable with Python's ordinary *return* statement. > > -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From ben+python at benfinney.id.au Sun Jan 19 04:58:59 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 19 Jan 2014 14:58:59 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: <7wd2joaagr.fsf@benfinney.id.au> Neil Schemenauer writes: > On 2014-01-18, Terry Reedy wrote: > > For application code, why does it need to be ported [to Python 3]. > > Unless Python 2.x is going to be maintained in perpetuity then code > will have to be ported. This point seems obvious to me. Maintained by whom? The PSF will stop maintaining Python 2, yes. But that doesn't stop other parties -- Red Hat, ActiveState, etc. -- doing so for whatever customers are still interested in compensating them for their work.
So long as the cost of getting the Python interpreter maintained by *someone* is lower than the perceived cost of porting to Python 3, the code doesn't need to be ported. This is a great and salient benefit of Python itself being free software: Unlike a non-free software platform, no recipient of a free-software Python is beholden to the vendor for ongoing maintenance. That point seems obvious to me. -- \ "It is the fundamental duty of the citizen to resist and to | `\ restrain the violence of the state." --Noam Chomsky, 1971 | _o__) | Ben Finney From rymg19 at gmail.com Sun Jan 19 05:05:49 2014 From: rymg19 at gmail.com (Ryan) Date: Sat, 18 Jan 2014 22:05:49 -0600 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <3426697229381222197@unknownmsgid> References: <3426697229381222197@unknownmsgid> Message-ID: <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> Now there's a new library I need to try! Haoyi Li wrote: >MacroPy also has an implementation of TCO implemented using >trampolining. >It trades stack introspection for load-time-analysis, which could be a >win >or a loss depending on how you view things. >------------------------------ >From: Ryan >Sent: 1/18/2014 4:57 PM >To: musicdenotation at gmail.com; Joao S. O. Bueno; >python-ideas at python.org >Subject: Re: [Python-ideas] Tail recursion elimination > >I wrote one that uses decorators. How is that special syntax? > >musicdenotation at gmail.com wrote: >> >> On Jan 18, 2014, at 22:08, "Joao S. O. Bueno" >> wrote: >> >> >> You can use tail recursion elimination in Python as it is today. >> >> I have seen many "implementations" of tail-call optimization, and >their >> common problem is that they all require special syntax to work. I >need a >> solution that is directly usable with Python's ordinary >*return* statement.
>-- >Sent from my Android phone with K-9 Mail. Please excuse my brevity. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From musicdenotation at gmail.com Sun Jan 19 08:39:15 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Sun, 19 Jan 2014 14:39:15 +0700 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> Message-ID: <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> I propose tail-call optimization to be added into CPython. From abarnert at yahoo.com Sun Jan 19 08:50:39 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 18 Jan 2014 23:50:39 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011239.GB5365@python.ca> References: <20140118032219.GA11381@python.ca> <1390031781.77896.YahooMailNeo@web181004.mail.ne1.yahoo.com> <20140119011239.GB5365@python.ca> Message-ID: <72FCAB60-FC07-48BA-91FD-AB5ADB4B969B@yahoo.com> On Jan 18, 2014, at 17:12, Neil Schemenauer wrote: > On 2014-01-17, Andrew Barnert wrote: >> What exactly do you mean by "the bytes/unicode changes"? > > I mean all of those things that you listed. bytes = the Python 2.7 > str object, str object is Python 2.7 unicode object. If you really mean all the things I listed, we already have that. It's called Python 3.4. If you want to fork it and rename it 2.8, I can't imagine who that would help.
A smaller list of changes _might_ mean a useful intermediate target, but if you're not even willing to think through the issues and discuss them, you're not going to come up with such a list. >> I suspect that, whatever your exact answers, it would be a lot >> easier to fork 3.4 and port the 2.7 behavior you want than to fork >> 2.7 and backport almost all of 3.4. > > It's a lot of work no matter which way you do it. That's one of the > biggest problems with this idea. > >> And if you do it that way, you could even adapt the idea someone >> proposed a few weeks ago--not popular on this list, but maybe >> popular with your target audience--of turning each change on and >> off with a "from __past__ import misfeature" statement, so people >> could pick and choose the ones they need, and gradually remove >> past statements as they port from your forked 2.8 to real 3.4. > > You can't make those changes with __future__/__past__ imports. They > affect the whole runtime, not a single module. Sure you can. It already works for __future__ Unicode literals in 2.7. Most of the other changes would work just as well. A few might not--but again, you have to go through them one by one and decide. >> However, I also suspect that, whatever your exact answers, it >> won't be that useful. Look at people's reasons for not moving to >> 3.x: > > Imagine I'm a developer with the Python 2.x codebase. I'm either > lazy or I'm too busy with other company projects that I can't put > the effort into porting my 2.x code to 3.x, even if all the 3rd > party libraries have been ported. > > How can we make it easier for them to move their code towards Python > 3.x rather than keeping it as 2.x? Not by publishing something that requires the exact same code changes as 3.4 and calling it 2.8. That might trick a handful of devs, and help a handful of others trick their managers, but that's not much benefit. > A maintained interpreter to run > Python 2.x code is going to continue to exist.
Some python-dev > people seem to suggest we can suggest that end of maintenance of > Python 2.7 is going to force people to port their code. That's > ridiculous. I've never heard anyone suggest this. The people who are most gung ho about 3.x are the ones who keep pointing out that many apps never need to port and that people like RedHat are likely to continue supporting 2.7 long after the PSF stops doing so. > I want to make it more attractive for these developers to move > towards Python 3 rather than stalling out on Python 2.7 forever. > How best to do that is still to be determined. I think my 2.8 idea > might be better than the status quo but it's just a crazy idea. A crazy idea is one thing; a misinformed idea is another. > >> I'm having a hard time imagining code that would be easy to port to 2.8, but not to 3.x. For example: >> >> payload = >> sock.sendall('Header: {}\r\nAnother: {}\r\n\r\n{}'.format( >> headers['header'], headers['another'], payload)) >> >> Even with just the two changes you already suggested: First, you >> have to change the literal to a bytes literal. > > That part is easy, could even be done with an automated tool > (change u' to ' and ' to b'). > >> More seriously, you have to rename that payload type's __str__ >> method to __bytes__. > > Nope, no __bytes__ in my proposed 2.8. Then the code just doesn't work. The payload types existing __str__ returns a bytes object, which raises a TypeError. >> And if it does any string stuff internally, like encoding JSON, >> that has to change. Meanwhile, your logging code probably relies >> on the same _str__ method actually returning a str, so you have to >> add one of those. Assuming headers is a dict of strs, you either >> need to go back up the chain (or into the API that provides it) >> and change that so it's been a dict of bytes all along, or you >> need to explicitly encode the headers here. That doesn't sound too >> hard overall? 
but that gives you working Python 3.5 code (assuming >> PEP 460 goes through). And there doesn't seem to be any shortcut >> that would give you working 2.8 code without also working in 3.5. > > I think you are misunderstanding my proposal, no problems like the > ones you suggest, bytes() would be the Python 2.7 str class. All > the internal bytes/unicode internals would be like 2.7. You're contradicting yourself. You explicitly said that your proposal includes all of the changes I suggested. That includes, right near the very top, things like no automatic str/bytes conversions. But you seem to be assuming they would still exist even though you decided to remove them. > That's > basically the whole idea of this proposal, the bytes/str change in > 3.x is the really disruptive one, separate it into separate > interpreter versions But you've proposed that all of the elements of the str/bytes change should be part of 2.8, which means it will be just as disruptive as 3.4. From ncoghlan at gmail.com Sun Jan 19 09:04:46 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jan 2014 18:04:46 +1000 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011332.GA5735@python.ca> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: On 19 January 2014 11:13, Neil Schemenauer wrote: > On 2014-01-18, Terry Reedy wrote: >> On 1/17/2014 10:22 PM, Neil Schemenauer wrote: >> >The transition to Python 3 is happening but there is still a massive >> >amount of code that needs to be ported. >> >> For application code, why does it need to be ported? > > Unless Python 2.x is going to be maintained in perpetuity then code > will have to be ported. This point seems obvious to me.
Red Hat will offer commercial Python 2 support until at least 2023 (since the RHEL7 beta was just released with Python 2.7 as the system Python and the current lifecycle for RHEL releases is 10 years), and I expect other commercial redistributors will similarly extend the lifetime of Python 2 well beyond 2015 when the level of support we provide for free reverts to security fix only mode. With CentOS and other downstream community rebuilds of RHEL available, this even includes the availability of *free* prebuilt versions. So Python 2 application developers don't have anything to worry about *if they're happy with Python 2.7 as it stands*, especially after accounting for the Python 3 standard library modules that are also available on PyPI for Python 2 (or are relatively easy to fork and port back to Python 2, or just copy and paste the relevant code into a private utility module). However, now that we're approaching the release of Python 3.4 (the second Python 3 release without a corresponding Python 2 release), some Python 2 developers are finally beginning to realise how much they had come to rely on the relatively steady cadence of new features and functionality previously delivered in an easily consumable bundle by the CPython core development team. 
So, those developers are now faced with a few different options: - invest in migrating to Python 3 themselves (the cost of which will vary from being similar to any major Python version update, with most of the cost being in compatibility testing, to substantially more expensive, depending on the exact nature of the application, its dependencies and the quality of their respective test suites) - try to guilt the existing core developers into creating Python 2.8 for them for free (not going to happen, read PEP 404) - try to hire enough of the core developers to convince Guido to approve a Python 2.8 release from python.org (not impossible, but likely prohibitively expensive, since most, perhaps all, of the core development team are already gainfully employed elsewhere) - fork CPython to create their own Python 2.8 (also cost prohibitive from a time and materials perspective, unless you already have the infrastructure and community in place to maintain a CPython fork) That last point is relevant to the discussions around Stackless 2.8: CCP and the rest of the Stackless community have been maintaining a CPython fork for so long that the idea of porting some of the backwards compatible Python 3.3 and 3.4 changes over to a Stackless 2.8 release is a relatively straightforward one for them and something they're seriously considering. However, significant compatibility testing costs would still be incurred in a switch from CPython 2.7 to Stackless 2.8, so conservative developers are still likely to stick with the devil they know (most of the really interesting changes in Python 3 are the backwards incompatible ones, so they won't be backported, even in Stackless 2.8). There's lots and lots of info about the state of the Python 3 transition here: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html I'd call reading that Q&A the starting point for any discussion of creating a Python 2.8 release, but it really isn't. 
The starting point is a deep understanding of the business drivers of open source based commercial operations and how they deal with cases where they depend on things that upstream has decided not to support any more. Sometimes they invest in the infrastructure needed to create their own fork (since their motivations no longer align with the existing development team's motivations), sometimes they pay commercial redistributors to continue supporting the older version (an approach I appreciate, since it represents one of the things that ultimately gets me paid), sometimes they approach the existing development team (or a related foundation) about directly funding continued development of the version being discontinued, and sometimes they decide to invest in updating to the newer platform themselves. This dynamic isn't unique to open source though, as it impacts even large proprietary platform vendors like Microsoft - Windows XP almost certainly remained supported for so long because a whole lot of paying users weren't happy with the state of Windows Vista and offered Microsoft enough money to ensure they could keep using Windows XP until Windows 7 was available. The only difference there is that in the proprietary case, the *only* option users have is to beg the vendor to continue maintenance - the options of forking or paying someone else to take up maintenance aren't available due to the licensing restrictions on the proprietary platform. Returning to the Python 3 case, as things currently stand, the combination of Armin Ronacher's python-modernize with Benjamin Peterson's six module is one approach to smooth migration, as is Ed Schofield's python-future module and its futurize tool. For application porting (which may be able to just drop Python 2 support rather than needing to maintain Python 2 and Python 3 support in parallel), Guido's original 2to3 conversion tool may suffice.
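To make the single-source approach concrete, here is a minimal, stdlib-only sketch applied to Andrew's sendall example earlier in the thread. It does not use six itself: the names `PY2`, `text_type` and `binary_type` only mirror attributes that six provides, and `build_message` is a hypothetical helper. The idea is that header text is encoded explicitly instead of relying on Python 2's implicit str/unicode coercion, so the same module is intended to behave the same way on 2.7 and 3.x.

```python
import sys

# Hand-rolled compatibility shim in the spirit of six.PY2 / six.text_type /
# six.binary_type (an assumption for illustration, not six's actual source).
PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (name only exists, and is only evaluated, on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def build_message(headers, payload):
    """Hypothetical helper: return the wire bytes for a header block
    followed by an already-encoded bytes payload."""
    if not isinstance(payload, binary_type):
        raise TypeError("payload must be bytes, got %r" % type(payload).__name__)
    head = 'Header: {}\r\nAnother: {}\r\n\r\n'.format(
        headers['header'], headers['another'])
    # Explicit encoding: no implicit str/bytes conversion on either version.
    return head.encode('ascii') + payload
```

In a real module you would typically also put `from __future__ import unicode_literals` at the very top, so that literals are text on both versions; it is omitted here only because a `__future__` import must be the first statement in the file.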
PEP 461 will likely add a binary interpolation feature *back* to Python 3.5, removing an additional blocker to forward migrations for current Python 2 users (just as PEP 414 did by restoring Unicode literal support). The Stackless community are looking at creating a Stackless 2.8 release, and some Python 2 users may decide it is worth migrating to the Stackless fork to gain access to any Python 3 features they decide to backport, rather than migrating to Python 3 itself. This is all perfectly fine - it's the open source model working *exactly as it is supposed to*, by giving people the option to take steps that meet *their* needs, rather than being completely subject to the desires of the core development team. The only thing people *don't* get to do is make suggestions about what *should* happen without also explaining: * what problem the suggestion is designed to solve, and how it actually helps to solve it * how the proposal is going to be resourced, especially when it is something the existing development team have disclaimed any interest in doing for free Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 19 09:13:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jan 2014 18:13:06 +1000 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119020404.GQ3915@ando> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <20140119020404.GQ3915@ando> Message-ID: On 19 January 2014 12:04, Steven D'Aprano wrote: > On Sat, Jan 18, 2014 at 07:13:32PM -0600, Neil Schemenauer wrote: >> I disagree. The amount of Python 2 code that exists exceeds the >> amount of Python 3 by orders of magnitude. That existing codebase >> either stops evolving and stays Python 2 forever > > Why is that a problem? Some people will never migrate away from Python > 2.7/2.5/2.4/1.5. That's okay.
A few months ago I ported an application > from 2.3 to 2.6. It's not well recognised that Python 3 is not the first > time Python broke backwards compatibility: string exceptions > > raise "This is an error" > > became a warning in 2.5 (I think) and a TypeError in 2.6. This > application made extensive use of string exceptions. My customer was > happy with 2.3 code for years, until they upgraded their server to a > version of CentOS with 2.6, and that was the only reason they upgraded. > I expect they will stick with 2.6 until such time as they upgrade the > server again in another decade or so, and that's fine. They may never > upgrade, and that's fine too. For anyone that ever travels by plane, it can be worth watching aircraft entertainment systems go through their boot cycle to see what they're running on (the difficulty of getting software, even entertainment software, approved to run on aircraft can make for very long lead times). The last one I checked was based on Red Hat 7.1, released in 2001 and unsupported for a very long time, but still entirely serviceable for that particular use case. Old doesn't always mean broken; sometimes it just annoys your developers to be asked to use such old and blunt tools when newer and sharper ones are available :) Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From musicdenotation at gmail.com Sun Jan 19 09:47:00 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Sun, 19 Jan 2014 15:47:00 +0700 Subject: [Python-ideas] Make print() not append line break by default Message-ID: <52db9111.88a3420a.6785.6324@mx.google.com> And add println() From stefan_ml at behnel.de Sun Jan 19 10:10:13 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 19 Jan 2014 10:10:13 +0100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119022811.GR3915@ando> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> Message-ID: Steven D'Aprano, 19.01.2014 03:28: > - It gives people an excuse to avoid migrating, and as sure as the sun > rises in the east, will lead to people calling for Python 2.9 a few > years from now. Thank you, Steven, for taking the time to write this post. Stefan From bruce at leapyear.org Sun Jan 19 11:00:36 2014 From: bruce at leapyear.org (Bruce Leban) Date: Sun, 19 Jan 2014 02:00:36 -0800 Subject: [Python-ideas] Make print() not append line break by default In-Reply-To: <52db9111.88a3420a.6785.6324@mx.google.com> References: <52db9111.88a3420a.6785.6324@mx.google.com> Message-ID: I think this is a great suggestion if the goal is to break lots of programs for no good reason. Can we rename 'dict' to 'map' while we're at it? The best suggestions are motivated by an actual problem or use case. There's no problem here. Just use: print(..., end='') The bar for breaking changes is very high. This is -100 on a scale of 0 to 10. 
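The behaviour Bruce and Steven describe in the print() sub-thread can be checked directly. The sketch below captures output in an `io.StringIO` buffer, chosen only so the result can be inspected, and uses `functools.partial` to build `write_no_newline`, a hypothetical no-newline variant. It shows that the requested behaviour is already available without changing print()'s default.

```python
import functools
import io

buf = io.StringIO()

print("a", "b", file=buf)     # default end='\n': writes "a b" plus a newline
print("c", end="", file=buf)  # end='' suppresses the trailing newline
print("d", file=buf)          # so "d" continues on the same line as "c"

# Hypothetical helper: a print variant that never appends a newline.
write_no_newline = functools.partial(print, end="")
write_no_newline("e", file=buf)

output = buf.getvalue()       # "a b\ncd\ne"
```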
--- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security On Sun, Jan 19, 2014 at 12:47 AM, wrote: > And add println() > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From breamoreboy at yahoo.co.uk Sun Jan 19 11:52:02 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 19 Jan 2014 10:52:02 +0000 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: On 19/01/2014 07:39, musicdenotation at gmail.com wrote: > I propose tail-call optimization to be added into CPython. Then implement it so everybody else can use it. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From breamoreboy at yahoo.co.uk Sun Jan 19 12:01:38 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 19 Jan 2014 11:01:38 +0000 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011332.GA5735@python.ca> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: On 19/01/2014 01:13, Neil Schemenauer wrote: > > I don't give a shit what it's called. A Python 2 fork is going to > happen whether the PSF blesses it or not, I can't believe that's > even a point of discussion. People are still maintaining Cobol > compilers. I don't care what it's called either. And I'll believe the fork when I see it.
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From steve at pearwood.info Sun Jan 19 12:08:03 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Jan 2014 22:08:03 +1100 Subject: [Python-ideas] Make print() not append line break by default In-Reply-To: <52db9111.88a3420a.6785.6324@mx.google.com> References: <52db9111.88a3420a.6785.6324@mx.google.com> Message-ID: <20140119110803.GT3915@ando> On Sun, Jan 19, 2014 at 03:47:00PM +0700, musicdenotation at gmail.com wrote: > And add println() Print natural log? For the major use cases print is designed for, you want it to print a newline at the end. For those rare times you don't, print(..., end='') is simple enough. Besides, print has inserted a newline at the end since at least Python 1.5. There is a lot of code relying on that behaviour. Even if we wanted to change it, backwards-compatibility considerations would prevent it. -- Steven From tjreedy at udel.edu Sun Jan 19 12:09:56 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 06:09:56 -0500 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011332.GA5735@python.ca> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: On 1/18/2014 8:13 PM, Neil Schemenauer wrote: > On 2014-01-18, Terry Reedy wrote: >> I realize that if there is actual code created, and if it's not >> under the umbrella of the PSF, it couldn't be called "Python 2.8" >> due to trademark reasons. Except I did not. This is part of a quote from Martijn Faassen. You should have left the attribution and quote marks in. > I don't give a shit what it's called. A Python 2 fork is going to > happen whether the PSF blesses it or not, The core developers said years ago that if *other* people want to make a post-2.7 Python, just not called 'Python 2.8' (because we do care), they are free to.
We *expect* that there will be commercial support (Red Hat, for instance) at least for keeping 2.7 updated to work on new platforms, perhaps with a few other patches. If you are correct about the tremendous demand for a 'something 2.8', then some group should be able to make money creating and selling it. However, as far as I know, no person and no corporation has yet offered money to PSF or individual core developers to develop a possibly PSF-blessed Python 2.8. > I can't believe that's even a point of discussion. You are the one who brought it up on *this* list, where it is mostly off-topic, because *this* list is about future Python 3 versions. That was the point of my directing you to Faassen's 'something 2.8' discussion. -- Terry Jan Reedy From stefan_ml at behnel.de Sun Jan 19 12:24:51 2014 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 19 Jan 2014 12:24:51 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: Andrew Barnert, 18.01.2014 19:29: > The first problem is that CPython makes a C function call for every > Python function call, and C doesn't eliminate tail calls; the only way > to do it manually is with longjmp Many C compilers actually fold tail recursion into loops. However, that has nothing to do with an *interpreter* that happens to be written in C not eliminating tail recursion. There is no technical reason you couldn't do TRE in CPython at the *interpreter* level. And this has nothing to do with longjmp. Stefan From jsbueno at python.org.br Sun Jan 19 12:54:42 2014 From: jsbueno at python.org.br (Joao S. O.
Bueno) Date: Sun, 19 Jan 2014 09:54:42 -0200 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: On 19 January 2014 08:52, Mark Lawrence wrote: > On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >> >> I propose tail-call optimization to be added into CPython. > > > Then implement it so everybody else can use it. On second thought, it actually could be done, at the VM level. I am not a proponent, but after second thought I have gone from "-1" to "+0". I believe that any time one has the sequence:

    20 CALL_FUNCTION 1
    23 RETURN_VALUE

in byte code, the current stack frame could be discarded prior to making the function call. Looking from 10000 meters, it feels like it would not impact any other aspect of the language other than automatically enabling tail recursion calls. js -><- > > > -- > My fellow Pythonistas, ask not what our language can do for you, ask what > you can do for our language. > > Mark Lawrence From jsbueno at python.org.br Sun Jan 19 12:57:34 2014 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sun, 19 Jan 2014 09:57:34 -0200 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: OTOH, since we are at it, we'd better check the BDFL's 2009 opinion on the subject: http://neopythonic.blogspot.com.br/2009/04/tail-recursion-elimination.html On 19 January 2014 09:54, Joao S. O.
Bueno wrote: > On 19 January 2014 08:52, Mark Lawrence wrote: >> On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >>> >>> I propose tail-call optimization to be added into CPython. >> >> >> Then implement it so everybody else can use it. > > On a second though, it actually could be done, at the VM level. > I am not a proponent, but after my second though I am from "-1" to "+0". > > I believe that anytime one have the sequence: > > 20 CALL_FUNCTION 1 > 23 RETURN_VALUE > > in byte code, the current stack frame could be discarded prior > to making the function call. Looking from 10000 meters, it feels > like it would not impact any other aspect of the language but for > enabling automatically tail recursion calls. > > js > -><- > >> >> >> -- >> My fellow Pythonistas, ask not what our language can do for you, ask what >> you can do for our language. >> >> Mark Lawrence From tjreedy at udel.edu Sun Jan 19 13:12:16 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 07:12:16 -0500 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: <20140119004515.GP3915@ando> References: <20140119004515.GP3915@ando> Message-ID: TCO (Tail Call Optimization) means that when TCO is in effect and a tail call "return f()" is executed, the current execution context (stack frame) is used for the call instead of allocating a new one. What is 'optimized' is space usage. The effect on time is not clear. On 1/18/2014 7:45 PM, Steven D'Aprano wrote: > What makes you say that it is "non-pythonic"? You seem to be assuming > that *by definition* anything written recursively is non-pythonic. I do > not subscribe to that view. Neither do I.
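Terry's definition says that what TCO optimizes is space: a tail call reuses the current frame. A minimal sketch, assuming stock CPython (which does no tail-call elimination), shows the cost in practice. A tail-recursive countdown still allocates one frame per call and fails once it passes the recursion limit, while the equivalent loop does not. The function names are illustrative only.

```python
import sys


def count_down_recursive(n):
    # Tail-recursive: the recursive call is the last thing the function does,
    # yet CPython still allocates a new frame for every call.
    if n == 0:
        return "done"
    return count_down_recursive(n - 1)


def count_down_iterative(n):
    # The same computation written as a loop: constant stack usage.
    while n != 0:
        n -= 1
    return "done"


def hits_recursion_limit(func, n):
    """Return True if func(n) fails because the frame stack grew too deep."""
    try:
        func(n)
    except RuntimeError:  # RecursionError (3.5+) is a subclass of RuntimeError
        return True
    return False


DEEP = sys.getrecursionlimit() + 100
```

With TCO, `count_down_recursive(DEEP)` would run in constant space, like the loop; without it, only the iterative version survives that depth.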
> In fact, in some cases, I *would* willingly give up *non-useful* > tracebacks for the ability to write more idiomatic code. > The point is that tracebacks are not sacrosanct, and, yes, I would like > the choice between writing idiomatic recursive code and more extensive > tracebacks. Trading off speed for convenience is perfectly Pythonic -- > that's why we have the ability to write C extensions, is it not? Are you willing to do any of the work needed to make the option available, starting with a specification? If so, I have some ideas. > Having to fork the entire compiler just to write a few functions in > their most idiomatic, natural (recursive) form seems a bit extreme, > wouldn't you say? A 'fork' could consist of a relatively small patch that could be uploaded to, for instance, PyPI. I would not be surprised if 100-200 lines were enough. -- Terry Jan Reedy From ncoghlan at gmail.com Sun Jan 19 13:31:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Jan 2014 22:31:06 +1000 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: References: <20140119004515.GP3915@ando> Message-ID: On 19 January 2014 22:12, Terry Reedy wrote: > TCO (Tail Call Optimization) means that when TCO is in effect and a tail > call "return f()" is executed, the current execution context (stack > frame) is used for the call instead of allocating a new one. What is > 'optimized' is space usage. The effect on time is not clear. > > On 1/18/2014 7:45 PM, Steven D'Aprano wrote: > >> What makes you say that it is "non-pythonic"? You seem to be assuming >> that *by definition* anything written recursively is non-pythonic. I do >> not subscribe to that view. > > > Neither do I. Guido is on record as preferring iterative algorithms as more comprehensible for more people, and explicitly opposed to adding tail call optimisation.
I tend to agree with him - functional programming works OK in the small (and pure functions are a fine tool for managing complexity), but to scale up in a way that fits people's brains, you need to start writing code that looks more like a cookbook. If you want inspiration on how to design a language for typical human thought patterns, look to cookbooks, training guides and operator manuals, not mathematics. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ned at nedbatchelder.com Sun Jan 19 13:56:21 2014 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 19 Jan 2014 07:56:21 -0500 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: <52DBCB75.7070209@nedbatchelder.com> On 1/19/14 6:54 AM, Joao S. O. Bueno wrote: > On 19 January 2014 08:52, Mark Lawrence wrote: >> On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >>> I propose tail-call optimization to be added into CPython. >> >> Then implement it so everybody else can use it. > On a second though, it actually could be done, at the VM level. > I am not a proponent, but after my second though I am from "-1" to "+0". > > I believe that anytime one have the sequence: > > 20 CALL_FUNCTION 1 > 23 RETURN_VALUE > > in byte code, the current stack frame could be discarded prior > to making the function call. Looking from 10000 meters, it feels > like it would not impact any other aspect of the language but for > enabling automatically tail recursion calls. A big confusion here is between "tail recursion calls" and "tail calls". This change would eliminate all tail calls, so that if f() ended by calling g(), then g would reuse the stack frame of f. If g raises an exception, the stack trace would have no evidence of f in it. This is what people mean about unusable stack traces. 
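Ned's point about tracebacks can be observed in current CPython. In this sketch, `f`, `g` and `frames_in_traceback` are hypothetical illustration functions. `f` does nothing except the tail call `return g()`, yet it still appears in the traceback. A runtime that reused f's frame for g would drop that entry, and likewise the corresponding entry seen through `inspect.stack()`.

```python
import traceback


def g():
    raise ValueError("boom")


def f():
    return g()  # a tail call: under TCO, g would reuse f's frame


def frames_in_traceback(func):
    """Call func and return the function names recorded in the traceback."""
    try:
        func()
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []
```

Today `frames_in_traceback(f)` returns `['frames_in_traceback', 'f', 'g']`; with general tail-call elimination, the `'f'` entry would be gone.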
And don't forget that the stack is inspectable at runtime, so we aren't only talking about the visible stack trace produced on error, but also the result of inspect.stack(), etc. Sure, if you eliminate only *recursive* tail calls, then the resulting stack traces aren't so bad, because you can do bookkeeping so that the 1000 recursive calls to the same function are represented by one frame with an annotation of 1000 on it somewhere. But how do you make your above code work only for recursive calls? And what about mutually recursive calls? Aren't those important too? So we have two choices: the relatively easy job of eliminating all tail calls, which will throw away information we value, or the unsolved problem of how to eliminate recursive tail calls. --Ned. > > js > -><- > >> >> -- >> My fellow Pythonistas, ask not what our language can do for you, ask what >> you can do for our language. >> >> Mark Lawrence From musicdenotation at gmail.com Sun Jan 19 15:00:00 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Sun, 19 Jan 2014 21:00:00 +0700 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: <76821962-4422-4840-BF06-E3C81F762290@gmail.com> > On Jan 19, 2014, at 18:57, "Joao S. O.
Bueno" wrote: > > OTOH, since we are at it, we'd better check > 2009 BDLF's opinion on the subject: > > http://neopythonic.blogspot.com.br/2009/04/tail-recursion-elimination.html > > >> On 19 January 2014 09:54, Joao S. O. Bueno wrote: >>> On 19 January 2014 08:52, Mark Lawrence wrote: >>>> On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >>>> >>>> I propose tail-call optimization to be added into CPython. >>> >>> >>> Then implement it so everybody else can use it. >> >> On a second though, it actually could be done, at the VM level. >> I am not a proponent, but after my second though I am from "-1" to "+0". >> >> I believe that anytime one have the sequence: >> >> 20 CALL_FUNCTION 1 >> 23 RETURN_VALUE >> >> in byte code, the current stack frame could be discarded prior >> to making the function call. Looking from 10000 meters, it feels >> like it would not impact any other aspect of the language but for >> enabling automatically tail recursion calls. >> >> js >> -><- >> >>> >>> >>> -- >>> My fellow Pythonistas, ask not what our language can do for you, ask what >>> you can do for our language. >>> >>> Mark Lawrence Actually, my original post is a response to his arguments.
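Joao's observation, quoted above, is that a tail call shows up in bytecode as a call opcode immediately followed by RETURN_VALUE. A rough heuristic sketch of detecting that pattern with the standard `dis` module follows; `ends_in_tail_call` and the sample functions are hypothetical. Opcode names differ between CPython releases (CALL_FUNCTION and relatives before 3.11, CALL from 3.11 on), so any opcode whose name starts with "CALL" is accepted. As Ned points out, this pattern matches every tail call, not only recursive ones, and it ignores calls followed by cleanup code such as `try`/`finally` blocks.

```python
import dis


def ends_in_tail_call(func):
    """Heuristic: True if func's bytecode contains a call opcode that is
    immediately followed by RETURN_VALUE (a `return f(...)` tail call)."""
    ops = [ins.opname for ins in dis.get_instructions(func)]
    return any(a.startswith("CALL") and b == "RETURN_VALUE"
               for a, b in zip(ops, ops[1:]))


def helper(x):
    return x


def tail(x):
    return helper(x)        # call, then return its result directly


def not_tail(x):
    return helper(x) + 1    # the addition happens after the call returns
```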
From denis.spir at gmail.com Sun Jan 19 15:03:05 2014 From: denis.spir at gmail.com (spir) Date: Sun, 19 Jan 2014 15:03:05 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: Message-ID: <52DBDB19.8080900@gmail.com> On 01/19/2014 12:09 AM, Terry Reedy wrote: I share all your views except for the following, which in my view is incomplete: > 1) A tail call is a 'top level' call in a return statement. > return f(*args, **kwds) > A directly recursive call, where f refers to the function with the return > statement, is a special case. Tail calls also occur in "actions" (as opposed to proper functions, whose purpose is to compute a new piece of data), where the final call is not in a return statement at all. It's actually rather common (also in C ;-). There is no data product. (see PS) Denis PS: An example may be an "action" writing out list items, which calls, or rather delegates to, another action that writes the remaining items, each preceded by a separator:

    def write_items(stream, l):
        n = len(l)
        if n == 0:
            stream.write('\n')
            return
        stream.write(str(l[0]))
        if n == 1:
            stream.write('\n')    # terminate the line for single-item lists too
            return
        write_other_items(stream, l, n)    # tail call (not in a return statement)

    def write_other_items(stream, l, n):
        for i in range(1, n):
            stream.write(" ")
            stream.write(str(l[i]))
        stream.write('\n')

    from sys import stdout
    l = []
    write_items(stdout, l)
    l = [1]
    write_items(stdout, l)
    l = [1, 2, 3]
    write_items(stdout, l)

From nas-python at arctrix.com Sun Jan 19 15:18:19 2014 From: nas-python at arctrix.com (Neil Schemenauer) Date: Sun, 19 Jan 2014 08:18:19 -0600 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119022811.GR3915@ando> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> Message-ID: <20140119141819.GA8137@python.ca> On 2014-01-19, Steven D'Aprano wrote: > [Neil] > > - if people install this new version of Python as the default, old > > scripts and programs will break. [...]
> > - It gives people an excuse to avoid migrating, and as sure as the sun > rises in the east, will lead to people calling for Python 2.9 a few > years from now. That would be progress though. My proposed 2.8 would have most of the incompatible changes from 3.x so if people port it they will be much closer to 3.x. Neil From rosuav at gmail.com Sun Jan 19 15:22:38 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jan 2014 01:22:38 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119141819.GA8137@python.ca> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> Message-ID: On Mon, Jan 20, 2014 at 1:18 AM, Neil Schemenauer wrote: > On 2014-01-19, Steven D'Aprano wrote: >> [Neil] >> > - if people install this new version of Python as the default, old >> > scripts and programs will break. [...] >> >> - It gives people an excuse to avoid migrating, and as sure as the sun >> rises in the east, will lead to people calling for Python 2.9 a few >> years from now. > > That would be progress though. My proposed 2.8 would have most of > the incompatible changes from 3.x so if people port it they will be > much closer to 3.x. I still haven't seen any evidence that porting half-way and then half-way again later is going to be less work than just biting the bullet and porting to 3.x (either sooner or later, whichever is more convenient). 
ChrisA From solipsis at pitrou.net Sun Jan 19 15:23:08 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 19 Jan 2014 15:23:08 +0100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> Message-ID: <20140119152308.6da15740@fsol> On Sun, 19 Jan 2014 08:18:19 -0600 Neil Schemenauer wrote: > On 2014-01-19, Steven D'Aprano wrote: > > [Neil] > > > - if people install this new version of Python as the default, old > > > scripts and programs will break. [...] > > > > - It gives people an excuse to avoid migrating, and as sure as the sun > > rises in the east, will lead to people calling for Python 2.9 a few > > years from now. > > That would be progress though. My proposed 2.8 would have most of > the incompatible changes from 3.x so if people port it they will be > much closer to 3.x. Not sure how code that would be incompatible with both 2.7 and 3.x should be considered progress. Regards Antoine. From denis.spir at gmail.com Sun Jan 19 15:27:13 2014 From: denis.spir at gmail.com (spir) Date: Sun, 19 Jan 2014 15:27:13 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: <52DBE0C1.9060705@gmail.com> On 01/19/2014 12:54 PM, Joao S. O. Bueno wrote: > On 19 January 2014 08:52, Mark Lawrence wrote: >> On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >>> >>> I propose tail-call optimization to be added into CPython. >> >> >> Then implement it so everybody else can use it. > > On a second though, it actually could be done, at the VM level. > I am not a proponent, but after my second though I am from "-1" to "+0". 
> > I believe that anytime one have the sequence: > > 20 CALL_FUNCTION 1 > 23 RETURN_VALUE > > in byte code, the current stack frame could be discarded prior > to making the function call. Looking from 10000 meters, it feels > like it would not impact any other aspect of the language but for > enabling automatically tail recursion calls. You also need to adjust the frame size, possibly even its structure (dunno, it depends on implementation details of python's "calling convention", so to say), to get the right space (and layout) for the callee's input variables. Denis From denis.spir at gmail.com Sun Jan 19 16:07:16 2014 From: denis.spir at gmail.com (spir) Date: Sun, 19 Jan 2014 16:07:16 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <20140119004515.GP3915@ando> References: <20140119004515.GP3915@ando> Message-ID: <52DBEA24.2020903@gmail.com> On 01/19/2014 01:45 AM, Steven D'Aprano wrote: > What makes you say that it is "non-pythonic"? You seem to be assuming > that *by definition* anything written recursively is non-pythonic. I do > not subscribe to that view. It is certainly hard to judge the size of the field of "naturally" recursive algorithms. First, it depends on applications or application domains, on thinking habits, on the kinds of data structures... one is accustomed to. Second, there is much vagueness and ambiguity around the very term recursion in programming. Recursion and recurrence literally just mean re-running, that is, a cyclic, repetitive, looping, (re)iterative process. The typical example is factorial: fact(0) = 1; fact(n) = n * fact(n-1). This is plain semantics.
To get an *algorithm* to *compute* fact(n), one can interpret these semantics in 2 ways:
* forward iteration: start with the base case (0), then, as long as we haven't reached n, compute the next factorial
* backward iteration: start with n, then, as long as we haven't reached the base case, compute the previous factorial
Both are recursive, but in programming we call the second case a recursion, while the former is at times called corecursion (see wikipedia); corecursion is just equivalent to plain loops, since it moves forward, and is just as easy to understand. [Which is strange, to say the least; I'd happily swap the terms.] [And the actual computation in backward iteration does proceed forward, as the calls return, but this does not appear in the code.] Funnily enough (since sum is rarely used as an example), the trivial case of sum leads to similar reasoning. For quite a while I played with mostly functional languages, and as a consequence implemented many routines as "backward recursions" (ending with the base case), even after going back to procedural programming, especially when dealing with tree-like or other linked data. I remember a case of a radix trie (see wikipedia) which I had a hard time getting right, actually a hard time *thinking* about. Then by pure chance I stumbled upon an implementation of tries in python using "forward recursion", which was trivial to understand. I translated the logic to my C trie, very happily. This radically changed my view of "naturally" recursive algorithms. The fact that tries (and other linked data, when implemented the functional way) are *self-similar* (see wikipedia) data structures, and that the corresponding algorithms are thus "naturally" self-similar too, does not imply that *backward* recursion is the natural / simple / easy way. (Now, I do agree that

    def fact(n):
        return 1 if n < 2 else n * fact(n-1)

is fine: simple and easy enough... once one gets the very principle of backward recursion, meaning thinking of recurrence inside out, so to speak.)
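A small sketch of spir's two readings of fact(n), with illustrative function names not taken from the thread: `fact_backward` is the backward recursion, `fact_forward_rec` carries the partial product forward in an accumulator (so its recursive call is a tail call), and `fact_forward_loop` is the same forward process written as a plain loop. CPython still allocates one frame per call for both recursive versions.

```python
def fact_backward(n):
    """Backward recursion: start at n and recurse down to the base case."""
    return 1 if n < 2 else n * fact_backward(n - 1)


def fact_forward_rec(n, k=1, acc=1):
    """Forward recursion: start at the base case and carry the partial
    product upward; the recursive call is the last thing the function does."""
    if k > n:
        return acc
    return fact_forward_rec(n, k + 1, acc * k)


def fact_forward_loop(n):
    """The same forward process written as a plain loop."""
    acc = 1
    for k in range(2, n + 1):
        acc *= k
    return acc


print([fact_backward(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```

The accumulator version makes the "forward" direction explicit in the code, which is exactly the part the backward version hides in the pending multiplications.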
Denis From jsbueno at python.org.br Sun Jan 19 16:12:21 2014 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sun, 19 Jan 2014 13:12:21 -0200 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <52DBE0C1.9060705@gmail.com> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DBE0C1.9060705@gmail.com> Message-ID: On 19 January 2014 12:27, spir wrote: > On 01/19/2014 12:54 PM, Joao S. O. Bueno wrote: >> >> On 19 January 2014 08:52, Mark Lawrence wrote: >>> >>> On 19/01/2014 07:39, musicdenotation at gmail.com wrote: >>>> >>>> >>>> I propose tail-call optimization to be added into CPython. >>> >>> >>> >>> Then implement it so everybody else can use it. >> >> >> On a second though, it actually could be done, at the VM level. >> I am not a proponent, but after my second though I am from "-1" to "+0". >> >> I believe that anytime one have the sequence: >> >> 20 CALL_FUNCTION 1 >> 23 RETURN_VALUE >> >> in byte code, the current stack frame could be discarded prior >> to making the function call. Looking from 10000 meters, it feels >> like it would not impact any other aspect of the language but for >> enabling automatically tail recursion calls. > > > You also need to adjust frame size, possibly even its structure (dunno, > depends on implementation details of python's "calling convention" so to > say), to get a right space (and disposition) for the callee's input > variables. Not in this suggestion - I did not propose reusing the frame (as seems to be the case around the calls), precisely because of that: frames in Python seem to be tied to the code object within them. My suggestion is simply to discard the current frame before building the frame for the call.
(Maybe adding some logging information to this next frame so that the stack trace could be complete.) js -><- > > Denis From nicholas.cole at gmail.com Sun Jan 19 16:22:38 2014 From: nicholas.cole at gmail.com (Nicholas Cole) Date: Sun, 19 Jan 2014 15:22:38 +0000 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> Message-ID: On Sun, Jan 19, 2014 at 2:22 PM, Chris Angelico wrote: > On Mon, Jan 20, 2014 at 1:18 AM, Neil Schemenauer > wrote: >> On 2014-01-19, Steven D'Aprano wrote: >>> [Neil] >>> > - if people install this new version of Python as the default, old >>> > scripts and programs will break. [...] >>> >>> - It gives people an excuse to avoid migrating, and as sure as the sun >>> rises in the east, will lead to people calling for Python 2.9 a few >>> years from now. >> >> That would be progress though. My proposed 2.8 would have most of >> the incompatible changes from 3.x so if people port it they will be >> much closer to 3.x. > > I still haven't seen any evidence that porting half-way and then > half-way again later is going to be less work than just biting the > bullet and porting to 3.x (either sooner or later, whichever is more > convenient). All of these threads do remind me of the Achilles Paradox.
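Returning to Joao's proposal above: the pattern he describes, a call instruction immediately followed by RETURN_VALUE, can at least be spotted with the standard `dis` module. This is only a detection sketch; actually discarding the frame would have to happen inside the interpreter's eval loop. The set of call opcode names is an assumption, since those names changed between CPython versions (the `CALL_FUNCTION` in his example is the pre-3.11 spelling), and `helper`, `tail` and `not_tail` are hypothetical examples.

```python
import dis

# Call opcode names differ across CPython versions (CALL_FUNCTION/CALL_METHOD
# up to 3.10, CALL from 3.11, CALL_KW from 3.13). This set is an assumption
# meant to cover the common cases.
CALL_OPS = {"CALL", "CALL_KW", "CALL_FUNCTION", "CALL_FUNCTION_KW",
            "CALL_FUNCTION_EX", "CALL_METHOD"}


def tail_call_offsets(func):
    """Return bytecode offsets of calls whose result is returned immediately,
    i.e. the CALL ... RETURN_VALUE pattern from the proposal."""
    instrs = list(dis.get_instructions(func))
    return [prev.offset
            for prev, cur in zip(instrs, instrs[1:])
            if prev.opname in CALL_OPS and cur.opname == "RETURN_VALUE"]


def helper(n):
    return n


def tail(n):
    return helper(n - 1)        # call result returned directly: a tail call


def not_tail(n):
    return 1 + helper(n - 1)    # an addition still runs after the call returns


print(tail_call_offsets(tail))      # one offset; its value varies by version
print(tail_call_offsets(not_tail))  # []
```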
From denis.spir at gmail.com Sun Jan 19 16:31:53 2014 From: denis.spir at gmail.com (spir) Date: Sun, 19 Jan 2014 16:31:53 +0100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DBE0C1.9060705@gmail.com> Message-ID: <52DBEFE9.8060000@gmail.com> On 01/19/2014 04:12 PM, Joao S. O. Bueno wrote: >> >You also need to adjust frame size, possibly even its structure (dunno, >> >depends on implementation details of python's "calling convention" so to >> >say), to get a right space (and disposition) for the callee's input >> >variables. > Not in this suggestion - I did not propose re-using the frame, > as seens to be the case around the calls, just because of that: > these frames in Python seen to be tied to the code object > within it. My suggestion is simply to discard the current frame > before building the frame for the call. (Maybe adding some logging > information on this next frame so that the stack trace could be complete) All right, I did not rightly get your proposal. Denis From techtonik at gmail.com Sun Jan 19 10:08:27 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 19 Jan 2014 12:08:27 +0300 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140118032219.GA11381@python.ca> References: <20140118032219.GA11381@python.ca> Message-ID: On Sat, Jan 18, 2014 at 6:22 AM, Neil Schemenauer wrote: > The transition to Python 3 is happening but there is still a massive > amount of code that needs to be ported. That's a common illusion. Python 2 is a good binary language, Python 3 is a good text language. Leaving things as-is saves lifetime and energy. There is a conflicting constraint that you can't get all three: 1. readable language 2. work with strings as abstract unicode datapoints 3. 
work with strings as binary data Python 2 was more explicit about unicode data (and this was tiresome for text lovers), and Python 3 is explicit about binary (which makes life harder for those who work with binary data). > One of the most disruptive > changes in Python 3 is the strict separation of bytes from unicode > strings. Most of the other incompatible changes can be handled by > 2to3. 2to3 is far from being a perfect tool, and certainly not a user-level one, but I don't maintain a list of all the things that cause trouble. Probably the major one is that there are no docs on how to write your own fixers (and you need those for third-party projects). What I disagree with is that incompatible changes can be handled by 2to3. There are many internal things that make Python 3 awesome, but they were not ported to Python 2, because people wanted "the next better thing" and thought of Python 2 as a dead end. Some of us still think this way, but I hope that recent threads have made them more flexible. Many internal features would be good to backport into the Python 2 series, and these are invisible at the 2to3 level. > Here is a far out idea to make transition smoother. Release version > 2.8 of Python with nearly all Python 3.x incompatible changes except > for the bytes/unicode changes. This could include: > > - print as function > > - default string literal as unicode And this will literally be the end of Python 2.8, in the same way as Python 3. Just attach the list of consequences here. Good exercise for story-writing: "And now all strings are unicorne".
> - return view objects for dict.keys(), etc > > - rename modules in standard library > > - rename long to int > > - rename .next() to __next__() > > - accept only new 'raise' syntax > > - remove backticks for repr > > - rename unicode to str > > - removal of 'apply', 'buffer', 'callable', 'execfile' > > - exec as function > > - rename os.getcwdu() to os.getcwd() > > - remove dict.has_key > > - move intern to sys.intern() > > - rename xrange to range > > - remove xreadlines > > New features of Python 3.x could be backported if easy since they > could be useful to entice developers to move from 2.7 to 2.8. What if people don't need a bloated Python with all these features? What if Python 4 should move not only the stdlib but also features into modules? I look at this list as an RPG called "Personal Python". You generate your character by selecting the traits you like. Some of them conflict, like "default binary vs unicode". Once your character is ready, you may start to play with it. This is probably something I'd expect from the PyPy project, but it requires more engineering and experimentation time than is possible in open source projects. Here is the idea, without an implementation, of how to pack those features: http://techtonik.rainforce.org/2013/04/program-config-as-dna-strand.html > An alternative approach to producing Python 2.8 would be to start > with the Python 3.x latest branch. Modify bytesobject and > unicodeobject to have as close to Python 2 behavior as practical. I'd start with PyPy. They need more help with the Python 3 transition.
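For comparison, several items in Neil's list already have spellings that run unchanged on Python 2.6/2.7 and 3.x. A minimal sketch with a hypothetical `describe` helper; it only touches a handful of the listed items (has_key, next(), dict views, backticks, raise syntax) and says nothing about the bytes/unicode split, which is the hard part of the debate.

```python
# On Python 2.6/2.7, put `from __future__ import print_function` at the top
# of the module so that print is a function there too.


def describe(d, items):
    """Exercise a few items from the list using spellings valid on 2.7 and 3.x."""
    has_a = "a" in d                   # instead of d.has_key("a")
    first = next(iter(items))          # next() builtin instead of it.next()
    keys = sorted(d)                   # same result whether keys() is a list or a view
    text = repr(first)                 # instead of backticks: `first`
    try:
        raise ValueError("bad value")  # new-style raise syntax
    except ValueError as exc:          # 'as' form accepted since 2.6
        err = str(exc)
    return has_a, first, keys, text, err


print(describe({"b": 2, "a": 1}, [10, 20]))
# (True, 10, ['a', 'b'], '10', 'bad value')
```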
> A-journey-of-a-thousand-miles-begins-ly y'rs From techtonik at gmail.com Sun Jan 19 10:16:59 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 19 Jan 2014 12:16:59 +0300 Subject: [Python-ideas] Make print() not append line break by default In-Reply-To: <52db9111.88a3420a.6785.6324@mx.google.com> References: <52db9111.88a3420a.6785.6324@mx.google.com> Message-ID: On Sun, Jan 19, 2014 at 11:47 AM, wrote: > And add println() Python 2:

    import sys

    def echo(msg, lineend=''):
        sys.stdout.write(msg + lineend)

It is better than having a dozen print functions in the documentation, which would make it unreadable. -- anatoly t. From stephen at xemacs.org Sun Jan 19 18:19:51 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jan 2014 02:19:51 +0900 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119011332.GA5735@python.ca> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> Message-ID: <87ppnnanyg.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Schemenauer writes: > On 2014-01-18, Terry Reedy wrote: > > On 1/17/2014 10:22 PM, Neil Schemenauer wrote: > > >The transition to Python 3 is happening but there is still a massive > > >amount of code that needs to be ported. > > > > For application code, why does it need to be ported. > > Unless Python 2.x is going to be maintained in perpetuity then code > will have to be ported. This point seems obvious to me. But it's not even true. Python 2.7 is a Turing-complete language, it can do anything that any language can do as an abstract computation, and 2.7.6 has extremely few bugs and sufficient bindings to OS facilities to do almost anything in practice as well. It's a pretty darn good language. Most Python 2 programs will probably be abandoned before Python 2.7.6 will need additional maintenance beyond what is already provided by various OS distros. > I disagree.
The amount of Python 2 code that exists exceeds the > amount of Python 3 by orders of magnitude. That existing codebase > either stops evolving and stays Python 2 forever But "stays Python 2 forever" != "stops evolving". There is absolutely nothing to stop a Python 2 program from evolving dramatically over the indefinite future, any more than sticking to C89 stops a lot of C programs from evolving. I don't see any real reason to suppose that most applications will find a true need to evolve in directions that Python 2 doesn't support for quite a while. > A Python 2 fork is going to happen whether the PSF blesses it or > not, I can't believe that's even a point of discussion. It's not a point of discussion. In the same sense that COBOL compilers continue to be maintained today, Python 2 was forked long ago. Not only are there non-CPython implementations of the language, every distro (commercial or not) has their own patches (perhaps a null set for Python 2.7.6). That's not going to stop, and as Nick points out Stackless is even likely to add some Python 3 features to their implementation of 2.x. But that's a specialty interest, and not even all Stackless users will necessarily use those features. I doubt many commercial packagers of CPython will have customers interested in them -- the Stackless guys want the Python 3 features for internal use as much as for their clients IIRC. But a fork of the kind you propose isn't going to happen. Definitely not under the auspices of the PSF, that's been settled with PEP 404. Nor with volunteer labor -- there aren't any volunteers for that. If there were, they would have started long ago. And I don't see a story for a commercial fork, either. The problem is that there's no "halfway point" here. Porting a program from Python 2 to Python 3 either does not require a fundamental rethink of its internal text processing, or it does. 
In the former case, 2to3 does a pretty good job, and what's left is a SMOP, mostly to fit appropriate decoding/encoding onto I/O. In the latter case, you've got big problems -- a complete redesign and an audit of all code for conformance to the new design. This is the watershed; there's no way to create a language intermediate between Python 2 and Python 3 so that porting Python 2 to Python-sqrt(6) is half the work, and porting Python-sqrt(6) to Python 3 is half the work. From stephen at xemacs.org Sun Jan 19 18:26:30 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jan 2014 02:26:30 +0900 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DBE0C1.9060705@gmail.com> Message-ID: <87ob37annd.fsf@uwakimon.sk.tsukuba.ac.jp> Joao S. O. Bueno writes: > My suggestion is simply to discard the current frame before > building the frame for the call. (Maybe adding some logging > information on this next frame so that the stack trace could be > complete) That way lies madness. The logging information needs to be stored somewhere. If it's to be "complete", it may as well be in ... wait for it ... a stack frame. From abarnert at yahoo.com Sun Jan 19 21:01:00 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 19 Jan 2014 12:01:00 -0800 (PST) Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <20140119004515.GP3915@ando> References: <20140119004515.GP3915@ando> Message-ID: <1390161660.41249.YahooMailNeo@web181002.mail.ne1.yahoo.com> From: Steven D'Aprano Sent: Saturday, January 18, 2014 4:45 PM > On Sat, Jan 18, 2014 at 10:29:46AM -0800, Andrew Barnert wrote: > >> Whether or not you really need it, adding it to Python is a fun >> challenge that's worth trying. > > "Need" is a funny thing. Which is why I made that point.
It's not a completely objective question, and it may be hard for the OP (or you, or anyone else) to convince anyone that he "needs" it even though he does (or, more importantly, convince people that _they_ need it). If so, he doesn't have to let that stop him from writing and sharing an implementation. It may turn out that, once people have a chance to play with it, that will convince everyone better than any abstract argument he could make. If not, at least he's had fun, learned about CPython internals, and, most importantly, produced a fork that he can maintain as long as he thinks he needs it. Depending on your time and resources, that may not be worth doing, but that's the same decision as any other development project; there's nothing actually stopping anyone from doing it if it's worth their while, so anyone who wants this should consider whether it's worth their while to do it. > You can go a long way without recursion, or only shallow recursion. In > 15 years + of writing Python code, I've never been in a position that I > couldn't solve a problem because of the lack of tail recursion. But > every time I manually convert a recursive algorithm to an iterative one, > I feel that I'm doing make-work, manually doing something which the > compiler is much better at than I am, and the result is often less > natural, or even awkward. (Trampolines? Ewww.) But the same is true for converting a naive recursive algorithm to tail-recursive. It's unpleasant make-work, just like converting it to iteration. In a language like Common Lisp, it's about the same amount of work, but the tail-recursive version often ends up looking more natural. In a language like Python, where we typically deal in iterables rather than recursive data structures, I believe it would often be _more_ work rather than the same amount, and end up looking a lot less natural rather than more. I'm sure there would be exceptions, but I suspect they would be rare. 
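Steven's "Trampolines? Ewww." and Andrew's point about make-work are easier to weigh side by side. A minimal sketch with illustrative names (none of this comes from the thread): a naive recursive sum, its tail-recursive rewrite, a trampolined version that returns a thunk instead of making the tail call, and the plain loop most Python code would use. The trampoline assumes the final result is never itself callable.

```python
def total_naive(xs):
    """Naive recursion: the addition happens after the recursive call returns."""
    if not xs:
        return 0
    return xs[0] + total_naive(xs[1:])


def total_tail(xs, acc=0):
    """Tail-recursive rewrite: the running total travels in an accumulator."""
    if not xs:
        return acc
    return total_tail(xs[1:], acc + xs[0])


def trampoline(fn, *args):
    """Call fn, then keep calling any zero-argument thunk it hands back.
    Assumes the final answer itself is never callable."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def total_bounce(xs, i=0, acc=0):
    """Trampolined version: return a thunk instead of making the tail call."""
    if i == len(xs):
        return acc
    return lambda: total_bounce(xs, i + 1, acc + xs[i])


def total_loop(iterable):
    """What Python code usually does instead: iterate."""
    acc = 0
    for x in iterable:
        acc += x
    return acc


data = list(range(10_000))
print(total_loop(data))                # 49995000
print(trampoline(total_bounce, data))  # 49995000, with constant stack depth
try:
    total_tail(data)                   # CPython still pushes one frame per call
except RecursionError:
    print("RecursionError")
```

The tail-recursive version fails at the default recursion limit exactly like the naive one would, because CPython does not eliminate the tail call; the trampoline avoids that at the cost of the indirection Steven objects to.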
>> Third, eliminating tail calls means they aren't on the stack at >> runtime, which means there's no obvious way to display useful >> tracebacks. I don't think too many Python users would accept the >> tradeoff of giving up good tracebacks to enable certain kinds of >> non-pythonic code, > > What makes you say that it is "non-pythonic"? You seem to be assuming > that *by definition* anything written recursively is non-pythonic. Not at all. There's plenty of code that's naturally recursive even in Python, and much of that code is written recursively today. For a good example, see os.walk. However, the main driver for TCE is the ability to write looping constructs recursively, which is not possible without it (unless the thing you're looping over is guaranteed not to be too big). Look at any tutorial on tail recursion; it's always recursing over a cons list or something similar. And looping that way in Python will almost always be non-pythonic, because you will have to drive the iterable manually. Again, there are surely exceptions, but I doubt they'd be very common. > In fact, in some cases, I *would* willingly give up *non-useful* > tracebacks for the ability to write more idiomatic code. Have you seen > the typical recursive traceback? But if you eliminate tail calls, you're not just eliminating recursive tracebacks; you're eliminating every stack frame that ends in a tail call, which includes a huge number of useful frames. If you restrict it to _only_ eliminating recursive tail calls, then it goes from something that can be done at compile time (as I showed in my previous email) to something that has to be done at runtime, making every function call slower. And it doesn't work with mutual or indirect recursion (unless you want to walk the whole stack to see if the function being called exists higher up, which makes it even slower, and also gets us back to eliminating useful tracebacks).
> py> a(7)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 2, in a
>     return b(n-1)
>   File "./rectest.py", line 5, in b
>     return c(n-1) + a(n)
>   File "./rectest.py", line 9, in c
>     return 1/n
> ZeroDivisionError: division by zero
>
> The only thing that I care about is the very last line, that function c > tries to divide by zero. The rest of the traceback is just noise, I > don't even look at it. Your example is not actually tail-recursive. I'm guessing you know this, and decided that having something that blows up fast just to have an example of a recursive traceback was more important than having an example that also fits into the rest of the discussion, which is perfectly reasonable. But it's still worth calling that out, because at least half the blog posts out there that say "Python sucks because it doesn't have TCE" prove Python's suckiness by showing a non-tail-recursive algorithm that would blow up exactly the same way in Scheme as in Python. > Now, it's okay if you disagree, or if you can see something useful in > the traceback other than the last entry. Sure. Unless that line in b is the only place in your code that ever calls c, I think it would be useful to know how we got to c and why n is 0. If that isn't useful, then _no_ tracebacks are ever useful, not just recursive ones. > I'm not suggesting that TCE > should be compulsory. I would be happy with a commandline switch to > turn it on, or better still, a decorator to apply it to certain > functions and not others. I expect that I'd have TCE turned off for > debugging. But the primary reason people want TCE is to be able to write functions that otherwise wouldn't run. Nobody asks for TCE because they're concerned about 2KB wasted on stack traces in their shallow algorithm; they ask for it because their deep algorithm fails with a recursion error. So, turning it off to debug it means turning off the ability to reproduce the error you're trying to debug. >> but even if you don't solve this, you can always >> maintain a fork the same way that Stackless has been doing. > > Having to fork the entire compiler just to write a few functions in > their most idiomatic, natural (recursive) form seems a bit extreme, > wouldn't you say? Not necessarily. The whole reason Stackless exists is to be able to write some algorithms in a natural way that wasn't possible with mainline CPython. At least early on, it looked at least plausible that Stackless would eventually be merged into the main core, although that turned out not to happen. There are some core language changes that were inspired by Stackless. Someone (Ralf Schmidt, I think?) was able to extract some of Stackless's functionality into a module that works with CPython, which is very cool. But even without any of that, people were able to use Stackless when they wanted to write code that required its features. That's surely better than not being able to write it, period. And a TCE fork could go the same way. It might get merged into the core one day, or it might inspire some changes in the core, or it might turn out to be possible to extract the key functionality into a module for CPython, but even if none of that happens, you, and others, can still use your fork when you want to.
If you prefer to call it a patch or a branch or something else instead of a fork, that's fine, but it's basically the same amount of work either way, and there's nothing stopping anyone who wants it from doing it. From haoyi.sg at gmail.com Sun Jan 19 22:33:28 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Sun, 19 Jan 2014 13:33:28 -0800 Subject: [Python-ideas] Tail recursion elimination Message-ID: <7822052862240502399@unknownmsgid> > Having to fork the entire compiler just to write a few functions in > their most idiomatic, natural (recursive) form seems a bit extreme, > wouldn't you say? You don't need to. MacroPy's @tco decorator is about as easy as you could ask for. 'pip install macropy', 'from macropy.experimental.tco import macros, tco' is about as easy as you could ask for. Works for arbitrary tail-calls too, not just tail recursion. If you haven't tried it out, complaining about the difficulty of implementing tail-call-optimization yourself seems silly. From: Andrew Barnert Sent: 1/19/2014 12:04 PM To: Steven D'Aprano; python-ideas at python.org Subject: Re: [Python-ideas] Tail recursion elimination From: Steven D'Aprano Sent: Saturday, January 18, 2014 4:45 PM > On Sat, Jan 18, 2014 at 10:29:46AM -0800, Andrew Barnert wrote: > >> Whether or not you really need it, adding it to Python is a fun >> challenge that's worth trying. > > "Need" is a funny thing. Which I why I made that point. It's not a completely objective question, and it may be hard for the OP (or you, or anyone else) to convince anyone that he "needs" it even though he does (or, more importantly, convince people that _they_ need it). If so, he doesn't have to let that stop him from writing and sharing an implementation. It may turn out that, once people have a chance to play with it, that will convince everyone better than any abstract argument he could make. 
If not, at least he's had fun, learned about CPython internals, and, most importantly, produced a fork that he can maintain as long as he thinks he needs it. Depending on your time and resources, that may not be worth doing, but that's the same decision as any other development project; there's nothing actually stopping anyone from doing it if it's worth their while, so anyone who wants this should consider whether it's worth their while to do it. > You can go a long way without recursion, or only shallow recursion. In > 15 years + of writing Python code, I've never been in a position that I > couldn't solve a problem because of the lack of tail recursion. But > every time I manually convert a recursive algorithm to an iterative one, > I feel that I'm doing make-work, manually doing something which the > compiler is much better at than I am, and the result is often less > natural, or even awkward. (Trampolines? Ewww.) But the same is true for converting a naive recursive algorithm to tail-recursive. It's unpleasant make-work, just like converting it to iteration. In a language like Common Lisp, it's about the same amount of work, but the tail-recursive version often ends up looking more natural. In a language like Python, where we typically deal in iterables rather than recursive data structures, I believe it would often be _more_ work rather than the same amount, and end up looking a lot less natural rather than more. I'm sure there would be exceptions, but I suspect they would be rare. >> Third, eliminating tail calls means the aren't on the stack at >> runtime, which means there's no obvious way to display useful >> tracebacks. I don't think too many Python users would accept the >> tradeoff of giving up good tracebacks to enable certain kinds of >> non-pythonic code, > > What makes you say that it is "non-pythonic"? You seem to be assuming > that *by definition* anything written recursively is non-pythonic. Not at all. 
There's plenty of code that's naturally recursive even in Python?and much of that code is written recursively today. For a good example, see os.walk. However, the main driver for TCE is the ability to write looping constructs recursively, which is not possible without it (unless the thing you're looping over is guaranteed not to be too big). Look at any tutorial on tail recursion; it's always recursing over a cons list or something similar. And looping that way in Python will almost always be non-pythonic, because you will have to drive the iterable manually. Again, there are surely exceptions, but I doubt they'd be very common. > In fact, in some cases, I *would* willingly give up *non-useful* > tracebacks for the ability to write more idiomatic code. Have you seen > the typical recursive traceback? But if you eliminate tail calls, you're not just eliminating recursive tracebacks; you're eliminating every stack frame that ends in a tail call. Which includes a huge number of useful frames. If you restrict it to _only_ eliminating recursive tail calls, then it goes from something that can be done at compile time (as I showed in my previous email) to something that has to be done at runtime, making every function call slower. And it doesn't work with mutual or indirect recursion (unless you want to walk the whole stack to see if the function being called exists higher up?which makes it even slower, and also gets us back to eliminating useful tracebacks). > py> a(7) > Traceback (most recent call last): > ? File "", line 1, in > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? 
File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 2, in a > ? ? return b(n-1) > ? File "./rectest.py", line 5, in b > ? ? return c(n-1) + a(n) > ? File "./rectest.py", line 9, in c > ? ? return 1/n > ZeroDivisionError: division by zero > > The only thing that I care about is the very last line, that function c > tries to divide by zero. The rest of the traceback is just noise, I > don't even look at it. Your example is not actually tail-recursive. I'm guessing you know this, and decided that having something that blows up fast just to have an example of a recursive traceback was more important than having an example that also fits into the rest of the discussion?which is perfectly reasonable. But it's still worth calling that out, because at least half the blog posts out there that say "Python sucks because it doesn't have TCE" prove Python's suckiness by showing a non-tail-recursive algorithm that would blow up exactly the same way in Scheme as in Python. > Now, it's okay if you disagree, or if you can see something useful in > the traceback other than the last entry. Sure. Unless that line in b is the only place in your code that ever calls c, I think it would be useful to know how we got to c and why n is 0. If that isn't useful, than _no_ tracebacks are ever useful, not just recursive ones. > I'm not suggesting that TCE > should be compulsary. I would be happy with a commandline switch to > turn it on, or better still, a decorator to apply it to certain > functions and not others. I expect that I'd have TCE turned off for > debugging. But the primary reason people want TCE is to be able to write functions that otherwise wouldn't run. 
Nobody asks for TCE because they're concerned about 2KB wasted on stack traces in their shallow algorithm; they ask for it because their deep algorithm fails with a recursion error. So, turning it off to debug it means turning off the ability to reproduce the error you're trying to debug. >> but even if you don't solve this, you can always >> maintain a fork the same way that Stackless has been doing. > > Having to fork the entire compiler just to write a few functions in > their most idiomatic, natural (recursive) form seems a bit extreme, > wouldn't you say? Not necessarily. The whole reason Stackless exists is to be able to write some algorithms in a natural way that wasn't possible with mainline CPython. At least early on, it looked at least plausible that Stackless would eventually be merged into the main core, although that turned out not to happen. There are some core language changes that were inspired by Stackless. Someone (Ralf Schmidt, I think?) was able to extract some of Stackless's functionality into a module that works with CPython, which is very cool. But even without any of that, people were able to use Stackless when they wanted to write code that required its features. That's surely better than not being able to write it, period. And a TCE fork could go the same way. It might get merged into the core one day, or it might inspire some changes in the core, or it might turn out to be possible to extract the key functionality into a module for CPython, but even if none of that happens, you, and others, can still use your fork when you want to. If you prefer to call it a patch or a branch or something else instead of a fork, that's fine, but it's basically the same amount of work either way, and there's nothing stopping anyone who wants it from doing it.
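A small sketch may make the distinction in this exchange concrete, assuming Python 3.5+ (where RecursionError exists); the function names are made up for illustration. `count_down_recursive` is genuinely tail-recursive, `count_down_loop` is roughly what eliminating that tail call would turn it into (rebind the arguments and jump back instead of pushing a new frame), and `not_tail_recursive` is like `b` in the traceback above: the addition runs after the recursive call returns, so its frame is still needed and TCE could not remove it.

```python
import sys

def count_down_recursive(n, acc=0):
    """Tail-recursive: the recursive call is the very last thing evaluated."""
    if n == 0:
        return acc
    return count_down_recursive(n - 1, acc + n)

def count_down_loop(n, acc=0):
    """Roughly what tail call elimination does: rebind the parameters and
    jump back to the top instead of pushing a new stack frame."""
    while n != 0:
        n, acc = n - 1, acc + n
    return acc

def not_tail_recursive(n):
    """Not a tail call: the addition happens after the recursive call
    returns, so this frame must be kept and TCE would not help."""
    if n == 0:
        return 0
    return n + not_tail_recursive(n - 1)

if __name__ == "__main__":
    deep = sys.getrecursionlimit() * 2
    print(count_down_loop(deep))  # the loop works at any depth
    try:
        count_down_recursive(deep)
    except RecursionError:
        # CPython does not eliminate tail calls, so this still overflows.
        print("tail-recursive version hit the recursion limit")
```

Because CPython keeps every frame, the tail-recursive version fails at roughly the recursion limit, which is the "deep algorithm fails with a recursion error" case described above; the loop does not.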
_______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ From musicdenotation at gmail.com Sun Jan 19 23:32:30 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Mon, 20 Jan 2014 05:32:30 +0700 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: References: <20140119004515.GP3915@ando> Message-ID: <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> >> On Jan 19, 2014, at 19:31, Nick Coghlan wrote: >> >> On 19 January 2014 22:12, Terry Reedy wrote: >> TCO (Tail Call Optimization) means that when TCO is in effect and a tail >> call "return f()" is executed, the current execution context (stack >> frame) is used for the call instead of allocating a new one. What is >> 'optimized' is space usage. The effect on time is not clear. >> >>> On 1/18/2014 7:45 PM, Steven D'Aprano wrote: >>> >>> What makes you say that it is "non-pythonic"? You seem to be assuming >>> that *by definition* anything written recursively is non-pythonic. I do >>> not subscribe to that view. >> >> >> Neither do I. > > Guido is on record as preferring iterative algorithms as more > comprehensible for more people, and explicitly opposed to adding tail > call optimisation. I tend to agree with him - functional programming > works OK in the small (and pure functions are a fine tool for managing > complexity), but to scale up in a way that fits people's brains, you > need to start writing code that looks more like a cookbook. > > If you want inspiration on how to design a language for typical human > thought patterns, look to cookbooks, training guides and operator > manuals, not mathematics. 
> > Nick > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia See this: http://www.stanford.edu/class/cs242/readings/backus.pdf It fits people's brains more because of familiarity, not "nature". While procedures in a guide (cookbook, user manual,...) are better written imperatively because of the way things are done (so are user interfaces), the behind-the-scenes algorithms have no single "intuitive" way to write that applies for all cases. They are written imperatively because of performance (and later, familiarity). Poor support for functional programming + Global Interpreter Lock = Outdated language. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mcepl at redhat.com Sun Jan 19 23:33:56 2014 From: mcepl at redhat.com (Matěj Cepl) Date: Sun, 19 Jan 2014 23:33:56 +0100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <7wd2joaagr.fsf@benfinney.id.au> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> Message-ID: <20140119223357.178194112C@wycliff.ceplovi.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2014-01-19, 03:58 GMT, you wrote: > But that doesn't stop other parties -- Red Hat, ActiveState, > etc. -- doing so for whatever customers are still interested in > compensating them for their work. a) necessary disclaimer: I AM not speaking for my employer, just words out of my ass. b) The point which is overlooked here, that people promoting python 2.8 are not speaking for STABILITY in the sense RHEL is stable. They want further DEVELOPMENT and CHANGES of Python to improve and react to the changed circumstances. That is not, as far as I understand it, the business Red Hat is in.
Our customers ask us to support Python 2.7.* (or 2.6.* for RHEL-6, and 2.4.* for RHEL-5) with API UNCHANGED as it is now so that their applications developed now for RHEL 7 (or RHEL 6, 5, etc.) are running UNCHANGED. They are usually NOT interested in further development and changing Python API. So, I don't see us as rooting for the further development of Python 2.* API. Best, Matěj -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iD8DBQFS3FLU4J/vJdlkhKwRAnrDAJ45gSeWpGolBz/REHg04JE1yoPSnACcD1cj Q6EMTVNt1iPe2/USm2vPxEk= =Pufw -----END PGP SIGNATURE----- From tjreedy at udel.edu Mon Jan 20 00:13:56 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 18:13:56 -0500 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: Proposal (mostly not mine): add 'return from f(args)', in analogy with 'yield from iterator', to return a value to the caller from an execution frame running f(args) (and either reuse or delete the frame that ran 'return from'). The function name 'f' would not have to match the name of the function being compiled; this would actually be TCO, even if it were nearly always used for recursive tail calls. That does mean that it would work for mutually tail recursive functions. On 1/19/2014 6:57 AM, Joao S. O. Bueno wrote: > OTOH, since we are at it, we'd better check > 2009 BDLF's opinion on the subject: > > http://neopythonic.blogspot.com.br/2009/04/tail-recursion-elimination.html I read through the comments and near the very end, in July 2013, Dan LaMotte said... ''' Definitely seems to be complicated/impossible to determine a function is tail recursion 'compliant' statically in python, however, what if it were an 'opt in' feature that uses a different 'return' keyword?
def f(n): if n > 0: tailcall f(n - 1) return 0 ''' In additional paragraphs, he noted, among other things, that this makes the feature 'opt-in' on a function by function basis. Guido replied "Dan: your proposal has the redeeming quality of clearly being a language feature rather than a possible optimization. I don't really expect there to be enough demand to actually add this to the language though. Maybe you can use macropy to play around with the idea though?" ???? then suggested 'return from'. My only contribution is to point out the analogy with the new, and initially strange, 'yield from'. Guido seems to have said that if a) someone tries out the idea with macropy, and b) someone demonstrates enough demand, he might consider adding such a feature. So this seems to me the best option to pursue to get something into CPython. I also think it is the best proposal so far. As for a), I have not looked at macropy, but: On 1/19/2014 4:33 PM, Haoyi Li wrote: > MacroPy's @tco decorator is about as easy as you could ask for. 'pip > install macropy', 'from macropy.experimental.tco import macros, tco' > is about as easy as you could ask for. Works for arbitrary tail-calls > too, not just tail recursion. That leaves b) for those of you who want the feature. Any PEP should admit that the feature might be abused. Someone might write return from len(composite) Unless return from refuses to delete the frame making a call to a C function, the effect would be to save a trivial O(1) space at the cost of deleting the most important line of a stack trace should len() raise. But I think this falls under the 'consenting adults' principle. A proposed doc should make it clear that the intended use is to make deeply recursive or mutually recursive functions run and not to replace all tail calls.
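For readers who want to try the "opt-in, explicit tail call" idea without macros or a patched interpreter, a trampoline gives the same observable effect. The sketch below assumes Python 3. `TailCall` and `trampoline` are hypothetical names, and this is not how MacroPy's `@tco` is implemented. The function returns a marker describing the call it wants made (standing in for `return from`), and a driver loop makes that call, so the stack stays flat even for mutually recursive functions. It also shows the cost Terry describes: the intermediate calls never appear in a traceback.

```python
import functools

class TailCall:
    """Marker meaning 'call func(*args, **kwargs) in my place'.
    It stands in for the proposed 'return from' syntax."""
    __slots__ = ("func", "args", "kwargs")

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

def trampoline(func):
    """Call func; while the result is a TailCall, perform that call from
    this loop instead of nesting a new Python frame for each step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, TailCall):
            # functools.wraps sets __wrapped__; unwrap decorated targets so we
            # call the raw function rather than starting a nested trampoline.
            target = getattr(result.func, "__wrapped__", result.func)
            result = target(*result.args, **result.kwargs)
        return result
    return wrapper

@trampoline
def is_even(n):
    if n == 0:
        return True
    return TailCall(is_odd, n - 1)   # roughly: return from is_odd(n - 1)

@trampoline
def is_odd(n):
    if n == 0:
        return False
    return TailCall(is_even, n - 1)  # roughly: return from is_even(n - 1)

print(is_even(100000))  # True, with no RecursionError
```

Making the tail call explicit at each return site keeps the feature opt-in per call, which is the property Guido singled out in Dan LaMotte's suggestion; ordinary calls keep their frames and tracebacks.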
-- Terry Jan Reedy From ben+python at benfinney.id.au Mon Jan 20 00:35:41 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 20 Jan 2014 10:35:41 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> Message-ID: <85sisjwnn6.fsf@benfinney.id.au> Matěj Cepl writes: > On 2014-01-19, 03:58 GMT, you wrote: > > But that doesn't stop other parties -- Red Hat, ActiveState, > > etc. -- doing so for whatever customers are still interested in > > compensating them for their work. > > a) necessary disclaimer: I AM not speaking for my employer, just > words out of my ass. > b) The point which is overlooked here, that people promoting > python 2.8 are not speaking for STABILITY in the sense RHEL is > stable. They want further DEVELOPMENT and CHANGES of Python to > improve and react to the changed circumstances. I'm not overlooking that, I'm pointing out that Python is free software, so *the option is there*, for those who want Python 2 maintained indefinitely, to motivate and compensate some party to do it. Python 2 is free software, so any capable party can fulfil the developer and maintainer role without any further permission required. The PSF has made it clear they will not be that party past a certain point; but Python 2 is licensed freely from the PSF to all recipients, so the PSF's decision not to maintain Python 2 in no way prevents anyone else doing so. So, what "people promoting the continuance of Python 2" are asking for is entirely within their power to have, if they want it enough. Will they do it? That's up to them; no-one is stopping them. > That is not, as far as I understand it, the business Red Hat is in.[...] > So, I don't see us as rooting for the further development of Python > 2.* API. And that's an entirely reasonable decision for Red Hat to make.
My point is that *nothing the PSF is doing prevents* such a party from choosing to do so. In other words, those who want Python 2 to continue need to either bite the bullet and move their migration to Python 3 forward, or get themselves organised and come up with an entity which will maintain Python 2 for as long as they want it maintained. It's no-one else's responsibility, and no-one else is stopping them. Put up or shut up, folks! -- \ "Software patents provide one more means of controlling access | `\ to information. They are the tool of choice for the internet | _o__) highwayman." --Anthony Taylor | Ben Finney From timothy.c.delaney at gmail.com Mon Jan 20 00:39:52 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 20 Jan 2014 10:39:52 +1100 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: On 20 January 2014 10:13, Terry Reedy wrote: > Proposal (mostly not mine): add 'return from f(args)', in analogy with > 'yield from iterator', to return a value to the caller from an execution > frame running f(args) (and either reuse or delete the frame that ran > 'return from'). The function name 'f' would not have to match the name of > the function being compiled, this would actually be TCO, even if it were > nearly always used for recursive tail calls. That does mean that is would > work for mutually tail recursive functions. > As someone who is happy with the status quo, "return from" seems to me to be the only sensible way to incorporate it into the language. Direct analogy with yield from, clear semantics ... I like it. > Any PEP should admit that the feature might be abused.
Someone might write > return from len(composite) > Unless return from refuses to delete the frame making a call to a C > function, the effect would be to save a trivial O(1) space as the cost of > deleting the most important line of a stack trace should len() raise. But I > think this falls under the 'consenting adults' principle. A proposed doc > should make it clear that the intended use is to make deeply recursive or > mutually recursive functions run and not to replace all tail calls. Consenting adults does make things nice and simple. I'm not proposing the following semantics, but I can think of an alternative that might be useful, but likely difficult (and costly) to implement, and difficult to explain. When code goes through a "return from", that frame is retained, but when a new frame for the same code object is created in the call stack, you *then* delete the calling frame. Hmm - actually, you could keep a structure (e.g. a dict) on the side mapping code objects to the most recent frame for that code object - that would make it reasonably cheap to do. Wouldn't get particularly large either since you'd only be recording frames that continued through a "return from". Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From var.mail.daniel at gmail.com Mon Jan 20 00:41:18 2014 From: var.mail.daniel at gmail.com (Daniel da Silva) Date: Sun, 19 Jan 2014 18:41:18 -0500 Subject: [Python-ideas] Predicate Sets Message-ID: Below is a description of a very simple but immensely useful class called a "predicate set". In combination with the set and list comprehensions they would allow another natural layer of reasoning with mathematical set logic in Python. In my opinion, a concept like this would be best located in the functools module. *Overview:* Sets in mathematics can be defined by a list of elements without repetitions, and alternatively by a predicate (function) that determines inclusion. 
A predicate set would be a set-like class that is instantiated with a predicate function that is called to determine ``a in the_predicate_set''. >> myset = predicateset(lambda s: s.startswith('a')) >> 'xyz' in myset False >> 'abc' in myset True >> len(myset) Traceback (most recent call last): [...] TypeError *Example Uses:* # Dynamic excludes in searching foo_files = search_files('foo', exclude=set(['a.out', 'Makefile'])) bar_files = search_files('bar', exclude=predicateset(lambda fname: fname.endswith('~'))) # exclude *~ # Use in place of a set with an ORM validusernames = predicateset(lambda s: re.match('[a-zA-Z0-9]+$', s)) class Users(db.Model): username = db.StringProperty(choices=validusernames) password = db.StringProperty() -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Jan 20 00:44:32 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 19 Jan 2014 15:44:32 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <85sisjwnn6.fsf@benfinney.id.au> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> Message-ID: <52DC6360.10709@stoneleaf.us> On 01/19/2014 03:35 PM, Ben Finney wrote: > > In other words, those who want Python 2 to continue need to either bite > the bullet and move their migration to Python 3 forward Um, if they want Python 2 to continue, why would they migrate to Python 3?
-- ~Ethan~ From rosuav at gmail.com Mon Jan 20 00:49:59 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jan 2014 10:49:59 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119223357.178194112C@wycliff.ceplovi.cz> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> Message-ID: On Mon, Jan 20, 2014 at 9:33 AM, Matěj Cepl wrote: > On 2014-01-19, 03:58 GMT, you wrote: >> But that doesn't stop other parties -- Red Hat, ActiveState, >> etc. -- doing so for whatever customers are still interested in >> compensating them for their work. Please, this is a list with lots of recipients. Don't say "you" wrote here - use a name :) Thanks! ChrisA From ian at feete.org Mon Jan 20 01:04:59 2014 From: ian at feete.org (Ian Foote) Date: Mon, 20 Jan 2014 00:04:59 +0000 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: Message-ID: <52DC682B.1000203@feete.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 19/01/14 23:41, Daniel da Silva wrote: > Below is a description of a very simple but immensely useful class > called a "predicate set". In combination with the set and list > comprehensions they would allow another natural layer of reasoning > with mathematical set logic in Python. > > In my opinion, a concept like this would be best located in the > functools module. > > > *Overview:* Sets in mathematics can be defined by a list of > elements without repetitions, and alternatively by a predicate > (function) that determines inclusion. A predicate set would be a > set-like class that is instantiated with a predicate function that > is called to determine ``a in the_predicate_set''. > >>> myset = predicateset(lambda s: s.startswith('a')) 'xyz' in >>> myset > False >>> 'abc' in myset > True >>> len(myself) > Traceback (most recent call last): [...]
TypeError * * *Example > Uses:* # Dynamic excludes in searching foo_files = > search_files('foo', exclude=set(['a.out', 'Makefile'])) bar_files = > search_files('bar', exclude=predicateset(lambda fname: not > fname.endswith('~'))) # exclude *~ > > # Use in place of a set with an ORM validusernames = > predicateset(lambda s: re.match(s, '[a-zA-Z0-9]+')) > > class Users(db.Model): username = > db.StringProperty(choices=validusernames) password = > db.StringProperty() > > Hi Daniel, That's an interesting idea. I'm not sure it would be used enough to include in the standard library though. Have you considered releasing an implementation on PyPI? That has the advantage that people can start using it earlier than would be possible if it was added to the standard library. Regards, Ian -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJS3GgrAAoJEODsV4MF7PWzI1EH/0FKiJYKZgRd6iW04Ic9NPXw QL+EKQU0UdRjCvP9IWrBSdGYnmB06YHdwyeLPpk0+amGSzXpsMGNRHtAXhxjba00 1Q9UKHnVcIj3kgjfYg+LKezMVJHQF4vE+umrbMQFeWBt7FEKfqseCbyDRIZAm9I8 G/dOzP3dxC4lktlCtLv6sfVD8D648A9wMNX5879SoUKjX+Qs0ySZ9CVxhBbyFVgP kXLG1/9NlmkyJmWsL6hHwWYI9WwnJ433Ts74bqmwOaTDlGdmmZNHfQT5kIHzRK8V g8XXZWxct8EVvTjyL+//n+DuSsFEDxhXTX0gGXMs0xDXunbDBHWNggs9G2B+GI0= =WsXD -----END PGP SIGNATURE----- From steve at pearwood.info Mon Jan 20 01:06:45 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Jan 2014 11:06:45 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <52DC6360.10709@stoneleaf.us> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> Message-ID: <20140120000645.GV3915@ando> On Sun, Jan 19, 2014 at 03:44:32PM -0800, Ethan Furman wrote: > On 01/19/2014 03:35 PM, Ben Finney wrote: > > > >In other words, 
those who want Python 2 to continue need to either bite > >the bullet and move their migration to Python 3 forward > > Um, if they want Python 2 to continue, why would they migrate to Python 3? Because you can't always get what you want. I want a pony, but since I can't afford one or have any place to keep it, I've made do without. -- Steven From ben+python at benfinney.id.au Mon Jan 20 01:07:17 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 20 Jan 2014 11:07:17 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> Message-ID: <85ob37wm6i.fsf@benfinney.id.au> Ethan Furman writes: > On 01/19/2014 03:35 PM, Ben Finney wrote: > > > > In other words, those who want Python 2 to continue need to either > > bite the bullet and move their migration to Python 3 forward > > Um, if they want Python 2 to continue, why would they migrate to > Python 3? One of the often-stated justifications for wanting Python 2 to continue is that the party wants to migrate their code base to Python 3, but "eventually". With that clause, I'm pointing out that "we can't find anyone to continue maintaining Python 2 the way we want for the price we want to pay for the length of time we want to keep using Python 2" still leaves the plaintiff with the option to hurry up and migrate to Python 3. -- \ "Airports are ugly. Some are very ugly. Some attain a degree of | `\ ugliness that can only be the result of a special effort."
| _o__) --Douglas Adams, _The Long Dark Tea-Time of the Soul_, 1988 | Ben Finney From steve at pearwood.info Mon Jan 20 01:16:40 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Jan 2014 11:16:40 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140119141819.GA8137@python.ca> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> Message-ID: <20140120001640.GW3915@ando> On Sun, Jan 19, 2014 at 08:18:19AM -0600, Neil Schemenauer wrote: > On 2014-01-19, Steven D'Aprano wrote: > > [Neil] > > > - if people install this new version of Python as the default, old > > > scripts and programs will break. [...] > > > > - It gives people an excuse to avoid migrating, and as sure as the sun > > rises in the east, will lead to people calling for Python 2.9 a few > > years from now. > > That would be progress though. My proposed 2.8 would have most of > the incompatible changes from 3.x so if people port it they will be > much closer to 3.x. Progress towards what, though? You say that they will be "closer" to migrating, but another way to look at it is that they will be *further away* from migrating: - the only work they have to do is the easy parts, like adapting from zip returning a list to zip returning an iterator, in other words the part of the migration which can be handled by a simple-minded mechanical script like 2to3; - in return they get access to many of the desirable new features of Python 3; - which reduces their incentive to tackle the big, difficult, structural changes needed for Python 3 (e.g. handling text as Unicode properly). To me, that's a step backwards. One aim here is for the core developers to have one code base to maintain, not two. My grateful thanks to them for taking on all this extra work, and it has been a lot of work, to make it easier for users to migrate, but enough is enough.
Adding 2.8 will extend that burden on the core developers by at least three years (18 months of active development plus 18 months of security features); adding 2.9 by the same again. It is entirely appropriate for the core devs to draw a line and say *this is when we stop supporting Python 2*, and that line has been drawn a long time ago at 2.7. If people don't migrate after a decade, they won't migrate after 16 years, especially if they get "all the good bits" apart from the Unicode text model (which many English speakers don't care about), so what you're actually suggesting is that the core devs agree to an extra 3-5 years of maintaining the 2.x series for the sake of people who will very likely never migrate to 3.x. -- Steven From steve at pearwood.info Mon Jan 20 01:23:22 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Jan 2014 11:23:22 +1100 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: References: <20140119004515.GP3915@ando> Message-ID: <20140120002322.GX3915@ando> On Sun, Jan 19, 2014 at 07:12:16AM -0500, Terry Reedy wrote: > Are you willing to do any of the work needed to make the option > available, starting with a specification? If so, I have some ideas. Given the amount of controversy over this, it would probably need a PEP. I might be able to start with a pre-PEP, time permitting, and see how that goes. (If those interminable bytes/unicode/2.8 threads on the Python-Dev list would start to die off, I might have more time to treat this seriously.) > >Having to fork the entire compiler just to write a few functions in > >their most idiomatic, natural (recursive) form seems a bit extreme, > >wouldn't you say? > > A 'fork' could consist of a relatively small patch that could be > uploaded to, for instance, PyPI. I would not be surprised if 100-200 > lines might be enough. Lines of *C* though, right? 
Which means for anyone to use it, they would have to be willing to build Python from source, applying your patch, or the maintainer would have to volunteer to provide pre-built binaries. Neither of which is exactly a recipe for broad take-up. -- Steven From tjreedy at udel.edu Mon Jan 20 01:34:22 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 19:34:22 -0500 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: Message-ID: On 1/19/2014 6:41 PM, Daniel da Silva wrote: > Below is a description of a very simple but immensely useful class > called a "predicate set". In combination with the set and list > comprehensions they would allow another natural layer of reasoning with > mathematical set logic in Python. Sets defined by predicates are usually infinite and mathematical set logic works fine with such. > *Overview:* > Sets in mathematics can be defined by a list of elements without > repetitions, and alternatively by a predicate (function) that determines > inclusion. A predicate set would be a set-like class that is > instantiated with a predicate function that is called to determine ``a > in the_predicate_set''. > > >> myset = predicateset(lambda s: s.startswith('a')) > >> 'xyz' in myset > False > >> 'abc' in myset > True > >> len(myself) > Traceback (most recent call last): > [...] > TypeError This illustrates the problem with the idea. Only containment is really straightforward. (I am aware that some operations could be implemented by defining new predicates. To combines sets with predicatesets, the sets would have to be represented by predicates, as done below.) 
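The observation that some operations could be implemented by defining new predicates can be sketched as follows, assuming Python 3. `PredicateSet` is a hypothetical name, not an existing stdlib class. Membership is the only primitive; union, intersection, difference and complement just build new predicates, and ordinary collections are adapted through their `__contains__` method, as in Terry's example. The class deliberately defines neither `__len__` nor `__iter__`, so `len()` and iteration raise TypeError, matching Daniel's example.

```python
class PredicateSet:
    """A set-like object defined only by a membership test."""

    def __init__(self, predicate):
        self._predicate = predicate

    @staticmethod
    def _as_predicate(other):
        # Plain sets, frozensets, lists, etc. become predicates via __contains__.
        if isinstance(other, PredicateSet):
            return other._predicate
        return other.__contains__

    def __contains__(self, item):
        return bool(self._predicate(item))

    def __or__(self, other):   # union
        p, q = self._predicate, self._as_predicate(other)
        return PredicateSet(lambda x: p(x) or q(x))

    def __and__(self, other):  # intersection
        p, q = self._predicate, self._as_predicate(other)
        return PredicateSet(lambda x: p(x) and q(x))

    def __sub__(self, other):  # difference
        p, q = self._predicate, self._as_predicate(other)
        return PredicateSet(lambda x: p(x) and not q(x))

    def __invert__(self):      # complement
        p = self._predicate
        return PredicateSet(lambda x: not p(x))


starts_with_a = PredicateSet(lambda s: s.startswith("a"))
excluded = starts_with_a | {"Makefile"}
print("abc" in starts_with_a, "xyz" in starts_with_a)  # True False
print("Makefile" in excluded, "b.txt" in excluded)     # True False
try:
    len(starts_with_a)
except TypeError as exc:
    print("len() is not available:", exc)
```

Nothing here is efficient or simplifying: each combination nests another function call, and questions such as "is this set empty?" or "are these two sets equal?" cannot be answered. That limitation is exactly why only containment is straightforward.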
> *Example Uses:* > # Dynamic excludes in searching > foo_files = search_files('foo', exclude=set(['a.out', 'Makefile'])) > bar_files = search_files('bar', exclude=predicateset(lambda fname: not > fname.endswith('~'))) # exclude *~ > > # Use in place of a set with an ORM > validusernames = predicateset(lambda s: re.match(s, '[a-zA-Z0-9]+')) I think these examples are backwards. The APIs should accept functions either in addition to or instead of collections. It is trivial to turn a collection into a predicate: >>> p = {'a', 'b', 'c'}.__contains__ >>> p('a') True >>> p('d') False You need realistic examples that use other operations (but not len ;-). -- Terry Jan Reedy From steve at pearwood.info Mon Jan 20 01:53:37 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Jan 2014 11:53:37 +1100 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: References: <20140119004515.GP3915@ando> Message-ID: <20140120005335.GY3915@ando> On Sun, Jan 19, 2014 at 10:31:06PM +1000, Nick Coghlan wrote: > Guido is on record as preferring iterative algorithms as more > comprehensible for more people, and explicitly opposed to adding tail > call optimisation. Many people struggle with recursion. Many people struggle with coroutines, and asynchronous programming, and Unicode. Some people never quite get the hang of object oriented programming. That doesn't imply that Python should only offer features which nobody struggles with. It would be a pretty bare language if that were the case :-) > I tend to agree with him - functional programming > works OK in the small (and pure functions are a fine tool for managing > complexity), but to scale up in a way that fits people's brains, you > need to start writing code that looks more like a cookbook. Python is not a pure functional language. Adding TCE won't make it one.
If somebody wants to write their app in a pure functional manner, they're either not going to use Python at all, or they'll do it regardless of the lack of TCE and just grumble that Python is only suitable for "toy" applications. But as a *component* of a larger "cookbook" style application, pure functions are great. And some functions are more naturally written in recursive style rather than iterative. I have no interest in writing my entire app as a pure-functional app (if I wanted to do that, I'd use Haskell). But I do have great interest in being able to write functions in the most natural way possible, and that sometimes means recursively, without having to compromise for performance. > If you want inspiration on how to design a language for typical human > thought patterns, look to cookbooks, training guides and operator > manuals, not mathematics. And Python is a great example of that, but it's not really relevant to the idea of adding TCE. Or at least, its no more relevant than are people's grumbles that adding such things as closures and coroutines makes Python more complex and too advanced for "ordinary programmers". Adding TCE need not affect Python as a language. People who like iteration will still write iterative functions. People who think like Java programmers will still write Java in Python, people who think like bash scriptors will still write bash in Python. The only addition is that people who think like Scheme programmers will have one less thing to complain about Python *wink* Most programmers write for themselves, or for a small group. Arguing that Sue (who can think recursively) ought to write her code using an iterative algorithm because Tom and Jerry won't otherwise understand it is not a terribly strong argument when Tom and Jerry aren't in Sue's target audience. 
-- Steven From mertz at gnosis.cx Mon Jan 20 02:15:44 2014 From: mertz at gnosis.cx (David Mertz) Date: Sun, 19 Jan 2014 17:15:44 -0800 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: I was mostly disliking the idea of TCO during this discussion. However, the idiom of 'return from' seems sufficiently elegant and explicit--and has exactly the semantics you'd expect from 'yield from'--that I am actually +1 on that idea. Being an explicit construct, it definitely becomes a case of "consenting adults" not of implicit magic. I.e. you are declaring right in the code that you don't expect to see a frame in a stack trace, which is fair enough. I mean, if you *really* wanted to you could muck around with 'sys._getframe(N).f_whatever' already which would give inaccurate tracebacks too. Probably there would be a way to remove frames from the stack even, using some such trick in current python. On Sun, Jan 19, 2014 at 3:13 PM, Terry Reedy wrote: > Proposal (mostly not mine): add 'return from f(args)', in analogy with > 'yield from iterator', to return a value to the caller from an execution > frame running f(args) (and either reuse or delete the frame that ran > 'return from'). The function name 'f' would not have to match the name of > the function being compiled, this would actually be TCO, even if it were > nearly always used for recursive tail calls. That does mean that is would > work for mutually tail recursive functions. > > On 1/19/2014 6:57 AM, Joao S. O. Bueno wrote: > >> OTOH, since we are at it, we'd better check >> 2009 BDLF's opinion on the subject: >> >> http://neopythonic.blogspot.com.br/2009/04/tail-recursion- >> elimination.html >> > > I read throught the comments and near the very end, in July 2013, Dan > LaMotte said...
''' > Definitely seems to be complicated/impossible to determine a function is > tail recursion 'compliant' statically in python, however, what if it were > an 'opt in' feature that uses a different 'return' keyword? > > def f(n): > if n > 0: > tailcall f(n - 1) > return 0 > ''' > In additional paragraphs, he noted, among other things, that this makes > the feature 'opt-in' on a function by function basis. > > Guido replied "Dan: your proposal has the redeeming quality of clearly > being a language feature rather than a possible optimization. I don't > really expect there to be enough demand to actually add this to the > language though. Maybe you can use macropy to play around with the idea > though?" > > ???? then suggested 'return from'. My only contribution is to point out > the analogy with the new, and initially strange, 'yield from'. > > Guido seems to have said that if a) someone tries out the idea with > macropy, and b) someone demonstrates enough demand, he might consider > adding such a feature. So this seems to me the best option to pursue to get > something into CPython. I also think it is the best proposal so far. > > As for a), I have not looked at macropy, but: > On 1/19/2014 4:33 PM, Haoyi Li wrote:> MacroPy's @tco decorator is about > as easy as you could ask for. 'pip > > install macropy', 'from macropy.experimental.tco import macros, tco' > > is about as easy as you could ask for. Works for arbitrary tail-calls > > too, not just tail recursion. > > That leaves b) for those of you who want the feature. > > Any PEP should admit that the feature might be abused. Someone might write > return from len(composite) > Unless return from refuses to delete the frame making a call to a C > function, the effect would be to save a trivial O(1) space at the cost of > deleting the most important line of a stack trace should len() raise. But I > think this falls under the 'consenting adults' principle.
A proposed doc > should make it clear that the intended use is to make deeply recursive or > mutually recursive functions run and not to replace all tail calls. > > -- > Terry Jan Reedy -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From jeanpierreda at gmail.com Mon Jan 20 02:17:04 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 19 Jan 2014 17:17:04 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: Message-ID: On Sun, Jan 19, 2014 at 3:41 PM, Daniel da Silva wrote: > Below is a description of a very simple but immensely useful class called a > "predicate set". In combination with the set and list comprehensions they > would allow another natural layer of reasoning with mathematical set logic > in Python. Efficiently implementing the set operators (intersection, union, etc.) requires using ROBDDs (reduced ordered binary decision diagrams), which are complex enough to deserve their _own_ library. It's not a simple task, and shouldn't be written from scratch. That said, if you implemented it, and did it efficiently, I'd find it hugely helpful.
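Devin's point is that, once predicates are combined with and/or/not, set questions about predicate sets turn into boolean algebra over the underlying ("atomic") predicates. ROBDDs answer those questions efficiently; a brute-force alternative is to enumerate the truth table. The sketch below takes that brute-force route under stated assumptions: expressions are encoded as nested tuples (a hypothetical representation chosen for illustration), the atoms are treated as independent, and the cost is exponential in the number of atoms.

```python
from itertools import product

# Hypothetical encoding: an expression is either an atom name (str) or a
# tuple ("not", e), ("and", e1, e2) or ("or", e1, e2).


def atoms(expr):
    """Collect the atomic predicate names used in an expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(atoms(sub) for sub in expr[1:]))


def evaluate(expr, env):
    """Evaluate an expression given a truth value for every atom."""
    if isinstance(expr, str):
        return env[expr]
    op = expr[0]
    if op == "not":
        return not evaluate(expr[1], env)
    if op == "and":
        return evaluate(expr[1], env) and evaluate(expr[2], env)
    if op == "or":
        return evaluate(expr[1], env) or evaluate(expr[2], env)
    raise ValueError(f"unknown operator {op!r}")


def implies(a, b):
    """True if every truth assignment satisfying a also satisfies b, i.e.
    the predicate set for a is a subset of the one for b (atoms assumed
    independent). Checks all 2**k rows of the truth table."""
    names = sorted(atoms(a) | atoms(b))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(a, env) and not evaluate(b, env):
            return False
    return True


if __name__ == "__main__":
    p_and_q = ("and", "p", "q")
    print(implies(p_and_q, "p"))  # True: {x | p(x) and q(x)} lies within {x | p(x)}
    print(implies("p", p_and_q))  # False
```

Because the atoms are assumed independent, a `True` answer is reliable, but a `False` answer may be wrong when the real predicates are related (for example `s.startswith('ab')` and `s.startswith('a')`), which is one reason a general `issubset` on predicate sets is hard.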
I ended up implementing it on my own in a bit of a brute force fashion once (I used truth tables instead of BDDs): https://bitbucket.org/devin.jeanpierre/replay/src/4ca3e412e511a9af87c335303c9ab40848be99c0/replay/sets.py?at=default (I make no claims to this being good or correct code) -- Devin From tjreedy at udel.edu Mon Jan 20 02:28:46 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 20:28:46 -0500 Subject: [Python-ideas] Tail Call Optimization (was Re: Tail recursion elimination) In-Reply-To: <20140120002322.GX3915@ando> References: <20140119004515.GP3915@ando> <20140120002322.GX3915@ando> Message-ID: On 1/19/2014 7:23 PM, Steven D'Aprano wrote: > On Sun, Jan 19, 2014 at 07:12:16AM -0500, Terry Reedy wrote: > >> Are you willing to do any of the work needed to make the option >> available, starting with a specification? If so, I have some ideas. Since writing the above, I came across the 'return from' idea, which I think is the best so far, and better than any of the 'ideas' I was thinking of. See my 'return from' post. > Given the amount of controversy over this, it would probably need a PEP. > I might be able to start with a pre-PEP, time permitting, and see how > that goes. (If those interminable bytes/unicode/2.8 threads on the > Python-Dev list would start to die off, I might have more time to treat > this seriously.) >> A 'fork' could consist of a relatively small patch that could be >> uploaded to, for instance, PyPI. I would not be surprised if 100-200 >> lines might be enough. > > Lines of *C* though, right? Yes. > Which means for anyone to use it, they would > have to be willing to build Python from source, applying your patch, or > the maintainer would have to volunteer to provide pre-built binaries. A typical combination is source for *nix and a Windows installer. > Neither of which is exactly a recipe for broad take-up. Use of macropy.experimental.tco would give some indication of the popularity of the idea. 
Without using it, I do not know how close it is. A 'return from' patch could start by copying the code that recognizes 'yield from' and compiles it to a YIELD_FROM bytecode. (Or by looking at the part of the yield from patch that added the code.) Writing code to implement a RETURN_FROM bytecode, by modifying the RETURN_VALUE function, would be a separate step. -- Terry Jan Reedy From steve at pearwood.info Mon Jan 20 02:49:19 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 20 Jan 2014 12:49:19 +1100 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <1390161660.41249.YahooMailNeo@web181002.mail.ne1.yahoo.com> References: <20140119004515.GP3915@ando> <1390161660.41249.YahooMailNeo@web181002.mail.ne1.yahoo.com> Message-ID: <20140120014919.GZ3915@ando> On Sun, Jan 19, 2014 at 12:01:00PM -0800, Andrew Barnert wrote: > From: Steven D'Aprano [...] > > In fact, in some cases, I *would* willingly give up *non-useful* > > tracebacks for the ability to write more idiomatic code. Have you seen > > the typical recursive traceback? > > But if you eliminate tail calls, you're not just eliminating recursive > tracebacks; you're eliminating every stack frame that ends in a tail > call. Which includes a huge number of useful frames. > > If you restrict it to _only_ eliminating recursive tail calls, then it > goes from something that can be done at compile time (as I showed in > my previous email) to something that has to be done at runtime, making > every function call slower. And it doesn't work with mutual or > indirect recursion (unless you want to walk the whole stack to see if > the function being called exists higher up, which makes it even slower, > and also gets us back to eliminating useful tracebacks). But if TCE becomes opt-in, say by the proposed "return from" syntax, then you can keep your cake and eat it too. I can decide at *edit* time, "this function should have TCE enabled", and leave the rest of my code to have the "normal" behaviour.
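Something close to this opt-in behaviour can be emulated in today's Python with a trampoline. The sketch below is an illustration under stated assumptions, not the proposed `return from` syntax and not MacroPy's `@tco`; the names `TailCall` and `trampoline` are hypothetical. A decorated function opts in by returning a `TailCall` marker instead of making the call itself, and a driver loop performs the call, so the Python stack does not grow, even for mutual recursion.

```python
import functools


class TailCall:
    """Marker a function returns instead of making a tail call itself.
    Stands in for a hypothetical `return from func(*args)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs


def trampoline(func):
    """Opt-in decorator: run returned TailCall markers in a loop so that
    chains of tail calls do not grow the Python stack."""

    @functools.wraps(func)  # also sets driver.__wrapped__ = func
    def driver(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, TailCall):
            # Call the undecorated function so we stay inside this loop
            # instead of starting a nested driver (which would add frames).
            target = getattr(result.func, "__wrapped__", result.func)
            result = target(*result.args, **result.kwargs)
        return result

    return driver


@trampoline
def is_even(n):
    if n == 0:
        return True
    return TailCall(is_odd, n - 1)  # roughly: return from is_odd(n - 1)


@trampoline
def is_odd(n):
    if n == 0:
        return False
    return TailCall(is_even, n - 1)


if __name__ == "__main__":
    print(is_even(100_000))  # True, far deeper than the default recursion limit
```

The cost is the one debated in this thread: frames replaced by the loop no longer appear in tracebacks, while undecorated functions keep the normal behaviour.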
If the choice was "TCE everywhere" versus "TCE nowhere", I would choose nowhere too. But it need not be that choice. > > py> a(7) > > Traceback (most recent call last): > > File "<stdin>", line 1, in <module> > > File "./rectest.py", line 2, in a > > return b(n-1) > > File "./rectest.py", line 5, in b > > return c(n-1) + a(n) [...] > > File "./rectest.py", line 9, in c > > return 1/n > > ZeroDivisionError: division by zero > > > > The only thing that I care about is the very last line, that function c > > tries to divide by zero. The rest of the traceback is just noise, I > > don't even look at it. > > Your example is not actually tail-recursive. > > I'm guessing you know this, and decided that having something that > blows up fast just to have an example of a recursive traceback was > more important than having an example that also fits into the rest of > the discussion, which is perfectly reasonable. Yes, you got me. It was throwaway code, which I've since thrown away, but if I recall correctly one of the three functions was tail-recursive. I was more concerned with making the rhetorical point that sometimes the only part of the traceback you care about is the bit that actually fails, at which point the rest of the traceback is noise and you might choose to prefer performance over a more detailed traceback. > But it's still worth calling that out, because at least half the blog > posts out there that say "Python sucks because it doesn't have TCE" > prove Python's suckiness by showing a non-tail-recursive algorithm > that would blow up exactly the same way in Scheme as in Python. I work with one of those guys :-( > > I'm not suggesting that TCE should be compulsory. I would be happy > > with a command-line switch to turn it on, or better still, a > > decorator to apply it to certain functions and not others. I expect > > that I'd have TCE turned off for debugging.
> > But the primary reason people want TCE is to be able to write > functions that otherwise wouldn't run. Nobody asks for TCE because > they're concerned about 2KB wasted on stack traces in their shallow > algorithm; they ask for it because their deep algorithm fails with a > recursion error. So, turning it off to debug it means turning off the > ability to reproduce the error you're trying to debug. You seem to be assuming that bugs in deep algorithms only manifest themselves in sufficiently deep data sets that turning TCE off will cause a recursion error before the true bug manifests, thus masking the bug you care about by mere lack of resources. I don't believe that is the case for all bugs, or even a majority. If it is true for some bugs -- of course it will be -- then a solution is to add enough temporary debugging code (e.g. logging, or even just good ol' print) to see enough of what is going on that you can identify the bug, stacktrace or no stacktrace. Chances are you would have to write some temporary debugging code regardless of whether the algorithm was iterative or recursive, TCE or no TCE. -- Steven From tjreedy at udel.edu Mon Jan 20 02:58:17 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 19 Jan 2014 20:58:17 -0500 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: On 1/19/2014 8:15 PM, David Mertz wrote: > On Sun, Jan 19, 2014 at 3:13 PM, Terry Reedy > Proposal (mostly not mine): add 'return from f(args)', in analogy > with 'yield from iterator', to return a value to the caller from an > execution frame running f(args) (and either reuse or delete the > frame that ran 'return from'). 
The function name 'f' would not have > to match the name of the function being compiled; this would > actually be TCO, even if it were nearly always used for recursive > tail calls. That does mean that it would work for mutually tail > recursive functions. > I was mostly disliking the idea of TCO during this discussion. However, > the idiom of 'return from' seems sufficiently elegant and explicit--and > has exactly the semantics you'd expect from 'yield from'--that I am > actually +1 on that idea. > > Being an explicit construct, it definitely becomes a case of "consenting > adults" not of implicit magic. I.e. you are declaring right in the code > that you don't expect to see a frame in a stack trace, which is fair > enough. I mean, if you *really* wanted to you could muck around with > 'sys._getframe(N).f_whatever' already which would give inaccurate > tracebacks too. Probably there would be a way to remove frames from > the stack even, using some such trick in current Python. Automatically eliminating the call whenever a call-return bytecode pair is encountered has the following problems. 1. It is CPython-specific and probably not portable to all implementations. Guido has cited this as a major blocker. 2. It must be optional, but how? 2A. A command line option is too broad. For some inputs, functions would return or crash depending on the option. Not good. Also, command line options do not work well when starting Python with icons. 2B. A future import would have a narrower scope but still might be too broad. It would also be an abuse because the 'future' would be a fake future that is partly now and partly never. 2C. A sys flag has the non-icon problems of a command line option. An explicit indicator in the function avoids most of these problems. The only one I am not sure about is other implementations, but with explicit, system-independent syntax, there is at least a chance.
A developer can temporarily switch back to return (with small enough input) to get a full stack trace for exactly one function, just as one can temporarily add 'print' to get a 'loop trace' for exactly one loop. -- Terry Jan Reedy From stephen at xemacs.org Mon Jan 20 04:06:14 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 20 Jan 2014 12:06:14 +0900 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140120001640.GW3915@ando> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> <20140120001640.GW3915@ando> Message-ID: <87fvoj9wt5.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > To me, that's a step backwards. I agree, but this kind of "step backwards" is a "consenting adults" issue. So let's avoid such pejorative terminology, and stick to the line that a lot of resources would be required to create such a Python 2.8, and there's little benefit to be had. From rosuav at gmail.com Mon Jan 20 04:30:35 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jan 2014 14:30:35 +1100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <87fvoj9wt5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140118032219.GA11381@python.ca> <20140119022811.GR3915@ando> <20140119141819.GA8137@python.ca> <20140120001640.GW3915@ando> <87fvoj9wt5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jan 20, 2014 at 2:06 PM, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > To me, that's a step backwards. > > I agree, but this kind of "step backwards" is a "consenting adults" > issue. So let's avoid such pejorative terminology, and stick to the > line that a lot of resources would be required to create such a Python > 2.8, and there's little benefit to be had. No, I'm with Steven on this. (Steven with a v, as opposed to Stephen with a ph. It's like talking to the detectives in Tintin.) 
Even if it cost no resources at all - if Python 2.8 already existed, exactly as described - it would be a third Python to aim for (as well as 2.7 and 3.x). It's already hard enough to span lots of Python versions; adding another that's deliberately and consciously incompatible with both the primary branches would be a major problem. It may be that code that runs on 2.7 and 3.4 will also automatically run on 2.8 (which seems possible, but far from certain), but if not, 2.8 would cause problems for everyone who tries to write code for every supported version. For anything other than in-house scripts where one person/team controls both the script and the interpreter it runs on, compatibility with multiple versions will be critical; and adding something incompatible with both current versions is an XKCD 927 situation [1]. No matter how cheap or expensive it is to do, that's a problem *in itself*, so the proposal has to justify itself enough to overcome that. ChrisA [1] http://xkcd.com/927/ From bruce at leapyear.org Mon Jan 20 04:40:42 2014 From: bruce at leapyear.org (Bruce Leban) Date: Sun, 19 Jan 2014 19:40:42 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <20140120000645.GV3915@ando> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> Message-ID: On Sun, Jan 19, 2014 at 4:06 PM, Steven D'Aprano wrote: > On Sun, Jan 19, 2014 at 03:44:32PM -0800, Ethan Furman wrote: > > On 01/19/2014 03:35 PM, Ben Finney wrote: > > > > > >In other words, those who want Python 2 to continue need to either bite > > >the bullet and move their migration to Python 3 forward > > > > Um, if they want Python 2 to continue, why would they migrate to Python > 3? > > Because you can't always get what you want. 
I want a pony, but since I > can't afford one or have any place to keep it, I've made do without. I think the odds of Python getting from __future__ import pony are slightly higher than there being a Python 2.8. I assume by "pony" you really mean what I'd like to have: from __future__ import everything since my goal is to write Python 3 compatible code even though I'm temporarily stuck with Python 2 due to stack issues. The __future__ imports makes it easier to write forward compatible code. As it is, I have to list the individual imports in every file and I also add: range = xrange --- Bruce From python at mrabarnett.plus.com Mon Jan 20 05:19:44 2014 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 20 Jan 2014 04:19:44 +0000 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> Message-ID: <52DCA3E0.1030608@mrabarnett.plus.com> On 2014-01-20 03:40, Bruce Leban wrote: > > On Sun, Jan 19, 2014 at 4:06 PM, Steven D'Aprano > wrote: > > On Sun, Jan 19, 2014 at 03:44:32PM -0800, Ethan Furman wrote: > > On 01/19/2014 03:35 PM, Ben Finney wrote: > > > > > >In other words, those who want Python 2 to continue need to > either bite > > >the bullet and move their migration to Python 3 forward > > > > Um, if they want Python 2 to continue, why would they migrate to > Python 3? > > Because you can't always get what you want. I want a pony, but since I > can't afford one or have any place to keep it, I've made do without. > > > I think the odds of Python getting > > from __future__ import pony > > are slightly higher than there being a Python 2.8.
I assume by "pony" > you really mean what I'd like to have: > > from __future__ import everything > That should be: from __future__ import * although it would still be discouraged because you might find that you're no longer able to get at some of the stuff you have already. :-) > since my goal is to write Python 3 compatible code even though I'm > temporarily stuck with Python 2 due to stack issues. The __future__ > imports makes it easier to write forward compatible code. As it is, I > have to list the individual imports in every file and I also add: > > range = xrange > From abarnert at yahoo.com Mon Jan 20 07:15:20 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 19 Jan 2014 22:15:20 -0800 Subject: [Python-ideas] Tail recursion elimination In-Reply-To: <20140120014919.GZ3915@ando> References: <20140119004515.GP3915@ando> <1390161660.41249.YahooMailNeo@web181002.mail.ne1.yahoo.com> <20140120014919.GZ3915@ando> Message-ID: On Jan 19, 2014, at 17:49, Steven D'Aprano wrote: > But if TCE becomes opt-in, say by the proposed "return from" syntax, > then you can keep your cake and eat it too. I can decide at *edit* time, > "this function should have TCE enabled", and leave the rest of my code > to have the "normal" behaviour. My first post on the subject suggested adding a new keyword (I think I used "tailcall", borrowed from Guido's post) to do explicit tail calls, and only building TCE as an automatic optimization on top of it (which I'm pretty sure could be done with a trivial peephole optimizer rule) if you still need it after that. So obviously, I agree with this. And yes, "return from" is definitely better than "tailcall"--readable and understandable, no new keyword, etc. And I still think this would be a fun project even though I don't think I would ever use it. 
I tried effectively this same design against Stackless 2.6 a few years ago, but it sometimes leaked, and would crash whenever a C function called a Python function that tail called, and I ran out of free time to debug any further. The point is, this isn't a massive impossible project; many of the people insisting they want it are probably capable of writing it, even if they've never tried hacking on the interpreter. (The grammar is a huge pain the first time, however...) From abarnert at yahoo.com Mon Jan 20 07:36:25 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 19 Jan 2014 22:36:25 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> Message-ID: <6A2C5FFF-D3F3-480D-8309-BA7C3DAC80D4@yahoo.com> On Jan 19, 2014, at 19:40, Bruce Leban wrote: > > On Sun, Jan 19, 2014 at 4:06 PM, Steven D'Aprano wrote: >> On Sun, Jan 19, 2014 at 03:44:32PM -0800, Ethan Furman wrote: >> > On 01/19/2014 03:35 PM, Ben Finney wrote: >> > > >> > >In other words, those who want Python 2 to continue need to either bite >> > >the bullet and move their migration to Python 3 forward >> > >> > Um, if they want Python 2 to continue, why would they migrate to Python 3? >> >> Because you can't always get what you want. I want a pony, but since I >> can't afford one or have any place to keep it, I've made do without. > > I think the odds of Python getting > > from __future__ import pony > > are slightly higher than there being a Python 2.8. I assume by "pony" you really mean what I'd like to have: > > from __future__ import everything If that existed, I wouldn't use it. Without it, I know my 2.6+/3.3+ code will work until 3.7. 
With it, if 3.5 added a new future feature, my code may only work until 3.4. That's not worth it for the convenience of saving a few characters. > since my goal is to write Python 3 compatible code even though I'm temporarily stuck with Python 2 due to stack issues. The __future__ imports makes it easier to write forward compatible code. As it is, I have to list the individual imports in every file and I also add: > > range = xrange There are only four live future features in 2.6 and 2.7, and you can fit them all into one statement that fits in 80 columns. You can put that into your project template, and then you're done with it. And then I usually have one more line, "from sixify import *", where sixify is a project-specific collection of imports from six. (And then the challenge is fighting to stop people from putting non-six-related things into sixify and turning it into one of those "stdafx.h" messes that every Windows C++ app has.) From abarnert at yahoo.com Mon Jan 20 08:56:49 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 19 Jan 2014 23:56:49 -0800 (PST) Subject: [Python-ideas] Predicate Sets In-Reply-To: References: Message-ID: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> From: Daniel da Silva Sent: Sunday, January 19, 2014 3:41 PM > Overview: > Sets in mathematics can be defined by a list of elements without repetitions, and alternatively by a predicate (function) that determines inclusion. The whole point of modern set theory is that sets cannot be defined by a predicate alone, only by a predicate _and a set to apply it over_, which we already have in set comprehensions.
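For concreteness, here is a minimal sketch of the kind of container being discussed, assuming it only needs membership testing plus the boolean set operators. The class (named `predicateset` to match the examples in this thread) is hypothetical and is not Daniel's implementation. Unlike a set comprehension such as `{x for x in domain if pred(x)}`, it needs no domain, which is exactly why it cannot support iteration or `len()`.

```python
class predicateset:
    """A Container, not a Set: membership is decided by calling a predicate.
    There is deliberately no __iter__ or __len__, since neither can be
    computed from a predicate alone."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __contains__(self, item):
        return bool(self.predicate(item))

    # Combining with another predicate set (or any container) combines the tests.
    def __and__(self, other):  # intersection: in both
        return predicateset(lambda x: x in self and x in other)

    def __or__(self, other):  # union: in either
        return predicateset(lambda x: x in self or x in other)

    def __sub__(self, other):  # difference: in self but not in other
        return predicateset(lambda x: x in self and x not in other)

    def __xor__(self, other):  # symmetric difference: in exactly one
        return predicateset(lambda x: (x in self) != (x in other))

    def __rand__(self, other):
        # `some_set & ps` ends up here because set.__and__ returns
        # NotImplemented for non-sets; filter the finite side instead.
        return {x for x in other if x in self}


if __name__ == "__main__":
    starts_a = predicateset(lambda s: s.startswith("a"))
    ends_z = predicateset(lambda s: s.endswith("z"))
    print("abz" in (starts_a & ends_z))    # True
    print({"apple", "banana"} & starts_a)  # {'apple'}
```

With this sketch, the paradoxical `russellset in russellset` example does not hang: each membership test calls the predicate, which tests membership again, until CPython raises RecursionError.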
And your suggestion has the exact same problem that naive set theory had: >>> myset = predicateset(lambda s: s.startswith('a')) >>> 'xyz' in myset False >>> russellset = predicateset(lambda s: s not in s) >>> russellset in russellset Presumably this should cause the computer to scream "DOES NOT COMPUTE!" and blow up, which I think would be hard to implement in CPython. Still, this could be useful despite not being mathematically consistent. Python functions don't have to be mathematical functions, and you could easily just declare that using a predicateset that turns out to be a proper class is undefined behavior, so it's perfectly acceptable if an implementation wants to hang forever or fail with a recursion error or whatever. Anyway, the way you've designed this, as far as I can tell, there's nothing stopping it from being a module on PyPI that you can come back and propose for inclusion in the stdlib if a lot of people start using it. So I'd say go for it. (And you can even propose syntax, a comprehension with no for clause: {x if expression(x)}, if it becomes popular enough that this seems warranted.) Also, this isn't a Set in Python terms, or an Iterable or a Sized; it's just a Container. Which is perfectly reasonable, and means len(s) and iter(s) failing is exactly what you should expect. But the name could lead people to expect it to be a Set. Then again, "predicatecontainer" sounds horrible, so maybe the small potential for confusion is fine. You still need to work out the details. Most of them seem easy, but there are some interesting questions. * It's presumably immutable, and therefore Hashable. (Hashing can fail if its predicate isn't hashable, which most callables are, though that's not guaranteed; but I believe that's fine for Hashables.) * Is the predicate callable accessible through a public name, or do you have to access it through __contains__?
* Presumably intersection, union, difference, and symmetric_difference with another predicateset do the obvious thing (and/or/and-not/xor the predicates). Or is there something more efficient you could do? There are some modules on PyPI that deal with boolean combinations of predicates; maybe just borrow the design or even import the implementation from one of them? * intersection with a set or other Iterable can return a set, equivalent to {x for x in s if x in ps}. And __rand__ lets it work in the other direction (s & ps) when using the operator. But set.intersection(predicateset) will raise a TypeError, and there's not much you can do about that. (And the same goes for the other methods.) * union, difference, and symmetric difference with an Iterable presumably turn the other argument into a predicateset(lambda x: x in s) and then operate on that? Or is there a better way to do it? * isdisjoint with a set or other iterable is easy, but what about with another predicateset? An error? * issubset and issuperset don't seem implementable, except in the special case that one predicate is made by intersection or union from the other; do they just not exist? * Do you want other operations from naive set theory that don't make sense for Python sets, like the unconstrained complement? They could all be implemented with the existing operations and a set of all things (e.g., self.complement() is just predicateset(lambda x: True).difference(self)), so maybe not. But they might be convenient. (Again, tying in with the boolean-predicates libraries, most of them have a "not" type operation.) The big problem is coming up with a compelling use case. This one doesn't sell me: bar_files = search_files('bar', exclude=predicateset(lambda fname: not fname.endswith('~'))) It seems like it makes more sense to have exclude take a function, so you could just write:
bar_files = search_files('bar', exclude=lambda fname: not fname.endswith('~')) In general, calling a function is just as easy, natural, and readable as testing membership; calling filter or using a comprehension would generally be simpler than creating a predicateset just to use intersection; etc. And in cases where sometimes a container is useful, but sometimes a function is better, well, look at re.sub or BeautifulSoup.find. I've seen people who didn't know that you could pass a function to re.sub, but nobody who, on seeing it, had any trouble understanding what it did. Maybe there's a use for "legacy" APIs that were designed around containers and would be hard to change. For example, many file-picker dialogs let you specify the acceptable extensions, but not a filter function. But in most cases, that's because they're ultimately calling some underlying C/ObjC/.NET/whatever function that needs an array, and a predicateset won't help there anyway. (Or, put another way, they're not designed around containers, they're designed around iterables.) From rosuav at gmail.com Mon Jan 20 09:09:53 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 20 Jan 2014 19:09:53 +1100 Subject: [Python-ideas] Predicate Sets In-Reply-To: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: On Mon, Jan 20, 2014 at 6:56 PM, Andrew Barnert wrote: > Also, this isn't a Set in Python terms, or an Iterable or a Sized; it's just a Container. Which is perfectly reasonable, and means len(s) and iter(s) failing is exactly what you should expect. But the name could lead people to expect it to be a Set. Then again, "predicatecontainer" sounds horrible, so maybe the small potential for confusion is fine. > If I might be permitted to bikeshed the name a little: My first thought (from the subject line) was that this was a set *of* predicates, not a set *defined by a* predicate.
But a frozenset isn't a set of frozens either, so this might be less confusing than I thought. ChrisA From ncoghlan at gmail.com Mon Jan 20 09:55:57 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Jan 2014 18:55:57 +1000 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: On 20 Jan 2014 11:16, "David Mertz" wrote: > > I was mostly disliking the idea of TCO during this discussion. However, the idiom of 'return from' seems sufficiently elegant and explicit--and has exactly the semantics you'd expect from 'yield from'--that I am actually +1 on that idea. I agree that a PEP for "return from" would be interesting. It also gives debuggers something to latch on to in order to handle the new scenario (just as they needed some adjustment to handle "yield from"). "return from" could also be explicitly disallowed in try blocks and with statements (since those inherently conflict with the idea of reusing the current frame for a different call). By keeping a list of references to the elided calls (perhaps using counts for more efficient handling of recursive calls), you could even partially reconstruct the missing parts of the traceback. > Being an explicit construct, it definitely becomes a case of "consenting adults" not of implicit magic. I.e. you are declaring right in the code that you don't expect to see a frame in a stack trace, which is fair enough. I mean, if you *really* wanted to you could muck around with 'sys._getframe(N).f_whatever' already which would give inaccurate tracebacks too. Probably there would be a way to remove frames from the stack even, using some such trick in current Python. Yep, we do that (from C) in importlib to try to reduce the infrastructure noise in the tracebacks shown to users. Cheers, Nick.
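Nick's idea of keeping a record of the elided calls, with counts so that deep recursion stays cheap, can be prototyped on top of a trampoline. In this sketch (all names are hypothetical, and it says nothing about how CPython would implement `return from`), the driver counts the tail calls made to each function, which is how many frames a real TCE would have discarded, and attaches that summary to any exception that escapes, so a handler can at least report which frames are missing from the traceback.

```python
import functools
from collections import Counter


class TailCall:
    """Marker returned in place of performing a tail call directly."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline_with_record(func):
    """Run tail calls in a loop, counting how many tail calls were made to
    each function, i.e. how many frames a real TCE would have discarded."""

    @functools.wraps(func)
    def driver(*args):
        elided = Counter()
        try:
            result = func(*args)
            while isinstance(result, TailCall):
                target = getattr(result.func, "__wrapped__", result.func)
                elided[target.__name__] += 1
                result = target(*result.args)
            return result
        except Exception as exc:
            # Attach the summary so a handler or debugger can report
            # which calls are missing from the traceback, then re-raise.
            exc.elided_calls = dict(elided)
            raise

    return driver


@trampoline_with_record
def countdown(n):
    if n == 0:
        return 1 / n  # deliberately raises ZeroDivisionError at the bottom
    return TailCall(countdown, n - 1)


if __name__ == "__main__":
    try:
        countdown(5)
    except ZeroDivisionError as exc:
        print(exc.elided_calls)  # {'countdown': 5}
```

Storing counts per function name rather than one entry per call keeps the record small for deep self-recursion, at the price of losing the order of the elided calls.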
> > > On Sun, Jan 19, 2014 at 3:13 PM, Terry Reedy wrote: >> >> Proposal (mostly not mine): add 'return from f(args)', in analogy with 'yield from iterator', to return a value to the caller from an execution frame running f(args) (and either reuse or delete the frame that ran 'return from'). The function name 'f' would not have to match the name of the function being compiled; this would actually be TCO, even if it were nearly always used for recursive tail calls. That does mean that it would work for mutually tail recursive functions. >> >> On 1/19/2014 6:57 AM, Joao S. O. Bueno wrote: >>> >>> OTOH, since we are at it, we'd better check >>> 2009 BDLF's opinion on the subject: >>> >>> http://neopythonic.blogspot.com.br/2009/04/tail-recursion-elimination.html >> >> >> I read through the comments and near the very end, in July 2013, Dan LaMotte said... ''' >> Definitely seems to be complicated/impossible to determine a function is tail recursion 'compliant' statically in Python; however, what if it were an 'opt in' feature that uses a different 'return' keyword? >>
>> def f(n):
>>     if n > 0:
>>         tailcall f(n - 1)
>>     return 0
>> ''' >> In additional paragraphs, he noted, among other things, that this makes the feature 'opt-in' on a function by function basis. >> >> Guido replied "Dan: your proposal has the redeeming quality of clearly being a language feature rather than a possible optimization. I don't really expect there to be enough demand to actually add this to the language though. Maybe you can use macropy to play around with the idea though?" >> >> ???? then suggested 'return from'. My only contribution is to point out the analogy with the new, and initially strange, 'yield from'. >> >> Guido seems to have said that if a) someone tries out the idea with macropy, and b) someone demonstrates enough demand, he might consider adding such a feature. So this seems to me the best option to pursue to get something into CPython.
I also think it is the best proposal so far. >> >> As for a), I have not looked at macropy, but: >> On 1/19/2014 4:33 PM, Haoyi Li wrote: >> > MacroPy's @tco decorator is about as easy as you could ask for. 'pip install macropy', 'from macropy.experimental.tco import macros, tco' is about as easy as you could ask for. Works for arbitrary tail-calls too, not just tail recursion. >> >> That leaves b) for those of you who want the feature. >> >> Any PEP should admit that the feature might be abused. Someone might write >> return from len(composite) >> Unless return from refuses to delete the frame making a call to a C function, the effect would be to save a trivial O(1) space at the cost of deleting the most important line of a stack trace should len() raise. But I think this falls under the 'consenting adults' principle. A proposed doc should make it clear that the intended use is to make deeply recursive or mutually recursive functions run and not to replace all tail calls. >> >> -- >> Terry Jan Reedy >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jeanpierreda at gmail.com Mon Jan 20 11:26:27 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 20 Jan 2014 02:26:27 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: On Sun, Jan 19, 2014 at 11:56 PM, Andrew Barnert wrote: > From: Daniel da Silva >>Overview: >> Sets in mathematics can be defined by a list of elements without repetitions, and alternatively by a predicate (function) that determines inclusion. > > The whole point of modern set theory is that sets cannot be defined by a predicate alone; only by a predicate _and a set to apply it over_. Which we already have in set comprehensions. > > And your suggestion has the exact same problem that naive set theory had: > >>>> myset = predicateset(lambda s: s.startswith('a')) >>>> 'xyz' in myset > False > >>>> russellset = predicateset(lambda s: s not in s) >>>> russellset in russellset > > Presumably this should cause the computer to scream "DOES NOT COMPUTE!" and blow up, which I think would be hard to implement in CPython. > > Still, this could be useful despite not being mathematically consistent. No; what you have shown is that a predicateset can't both accept the function you specified, and also have its containment method always return a value (as opposed to raising an exception or not halting). You have not shown that the idea of a predicateset is inherently contradictory, unless that idea includes both of those facts -- and that would indeed be silly, since, as you've shown, that is an idea with self-contradicting requirements. In contrast, naive set theory thought all of those things: a set can be defined in that way, and a set either contains something or not, but not neither and not both. And Russell proved that this is impossible.
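To make this exchange concrete, here is a minimal sketch of such an object. PredicateSet is a hypothetical class written only for illustration (the thread's predicateset is a proposal, not an existing type). It implements only __contains__, so it is a Container but not a Set, Iterable or Sized, and it combines predicates for &, | and ~. The Russell query illustrates Devin's point: the object itself is usable, and only the self-referential membership test fails to produce a value, which CPython reports by raising RecursionError.

```python
class PredicateSet:
    """A container whose membership is decided by a predicate (hypothetical sketch)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __contains__(self, item):
        return bool(self.predicate(item))

    # Combining two predicate sets just builds a new predicate.
    def __and__(self, other):
        return PredicateSet(lambda x: x in self and x in other)

    def __or__(self, other):
        return PredicateSet(lambda x: x in self or x in other)

    def __invert__(self):
        return PredicateSet(lambda x: x not in self)


starts_with_a = PredicateSet(lambda s: s.startswith('a'))
ends_with_z = PredicateSet(lambda s: s.endswith('z'))

print('xyz' in starts_with_a)                  # False
print('abz' in (starts_with_a & ends_with_z))  # True

# Russell's set: asking whether it contains itself never settles,
# so CPython gives up with RecursionError instead of answering.
russellset = PredicateSet(lambda s: s not in s)
try:
    russellset in russellset
except RecursionError:
    print('no answer: RecursionError')
```

Here & and | merely wrap lambdas, which is the "thin wrapper" case Devin considers pointless; combining predicates efficiently would need a real representation of them.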
There is not any kind of fundamental problem with the idea of a Python set-like object defined by Python predicates. Python sets aren't mathematical sets, and Python predicates aren't mathematical predicates. Things can be different from how they are described in mathematics, without being internally inconsistent, and without being useless. [...] > The big problem is coming up with a compelling use case. [...] > In general, calling a function is just as easy, natural, and readable as testing membership; calling filter or using a comprehension would generally be simpler than creating a predicateset just to use intersection; etc. Yes. If a predicate set is just a thin wrapper around predicates, it is pointless. IMO the only utility of specially wrapping predicates is allowing them to be combined efficiently, but the bulk of the work there is just in manipulating sets of bitvectors (best done with ROBDDs as far as I know). Arguably the work after that is trivial. -- Devin From denis.spir at gmail.com Mon Jan 20 13:12:31 2014 From: denis.spir at gmail.com (spir) Date: Mon, 20 Jan 2014 13:12:31 +0100 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive? In-Reply-To: <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> Message-ID: <52DD12AF.2090309@gmail.com> On 01/19/2014 11:32 PM, musicdenotation at gmail.com wrote: > It fits people's brains more because of familiarity, not "nature". That people often use "intuitive" or "natural" instead of "familiar" or "usual" does not mean, logically, that no choice is genuinely more intuitive or natural. That people misuse a term does not imply it has no proper meaning. For instance, closed intervals are more intuitive or natural, obviously (but for some reason I don't know). If you ask someone to count from 1 to 9, you will probably be surprised to hear him/her start from 2 or stop after 8.
If you are asked to choose a letter between c and g, you will probably be surprised to hear that 'c' or 'g' is no good choice. [This does not mean that closed intervals are the right choice in programming; I'm just discussing the notions of intuitive or natural, which are related to the way we spontaneously think or understand. Programming may require unintuitive or unnatural design choices, for some other, independent reasons; dunno. For that matter, I think the right choice may be neither [i,j] closed nor [i,k[ half-closed intervals, but (i,n) ranges, where n is the number of items.] About the case of recursion, whether it may be intuitive or natural, I think (see some previous post) that it is very, very hard to judge. It is so abstract, and obviously difficult to grasp. It requires understanding recurrence (remember how hard that was for most people at school?) and then turning it inside out *in mind* like a sock ;-), to produce an algo running backwards, and still understanding that it will do the right thing (because in fact it computes forwards behind the scenes, which is totally implicit, and again hard to get). About optimisation of tail calls, I share Guido's "pronouncement". Mainly because these optimisable (backward) recursive algos are the ones one can easily express as a forward algo (using loops and/or corecursion), if I understand the issue well (which I'm not 100% sure of, but I don't know any counter-example). The issue of stack traces and programmer feedback is, for me, just another reason (not decisive, because such algos often require inserting debug prints anyway, to understand what actually happens and/or diagnose a bug). Denis From rosuav at gmail.com Mon Jan 20 14:09:35 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 21 Jan 2014 00:09:35 +1100 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive?
In-Reply-To: <52DD12AF.2090309@gmail.com> References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> Message-ID: On Mon, Jan 20, 2014 at 11:12 PM, spir wrote: > For instance, closed intervals are more intuitive or natural, obviously (but > for some reason I don't know). If you ask someone to count from 1 to 9, you > will probably be surprised to hear him/her start from 2 or stop after 8. If > you are asked to choose a letter between c and g, you will probably be > surprised to hear that 'c' or 'g' is no good choice. I'm not so sure about that. The half-open interval makes as much sense as the fully closed - all you have to do is interpret the indices as being *between* elements. Take, for example, Scripture verses. (Quotes taken from THE HOLY BIBLE, NEW INTERNATIONAL VERSION®, NIV® Copyright © 1973, 1978, 1984, 2011 by Biblica, Inc.® Used by permission. All rights reserved worldwide. Copyright notice included for license compliance. Note that I'm using bracketed numbers to indicate the beginnings of verses - in a printed Bible, these would normally be in superscript.) John 14: [31] To the Jews who had believed him, Jesus said, “If you hold to my teaching, you are really my disciples. [32] Then you will know the truth, and the truth will set you free.” [33] This passage is normally referred to as "John 14:31-32", but as you see, the verse marker [32] is in the middle of the quote. Using a half-open interval, this would start at "John 14:31" and end at "John 14:33". Half-open means: "Begin at the beginning, go on till you come to the end, then stop", as the King of Hearts instructed the White Rabbit. It's easy to indicate the beginning of a chapter: your start reference is verse 1. Here's the beginning of the account of the creation of the world: [1] In the beginning God created the heavens and the earth.
[2] Now the earth was formless and empty, darkness was over the surface of the deep, and the Spirit of God was hovering over the waters. [3] And God said, “Let there be light,” and there was light. [4] God saw that the light was good, and he separated the light from the darkness. [5] God called the light “day,” and the darkness he called “night.” And there was evening, and there was morning--the first day. [6] Common parlance: Genesis 1:1-5. Half-open: Genesis 1:1-6. Conclusion: Tie. No argument to be made for either side. But what if you're looking at the *end* of a chapter? Here are a few verses from later on in Genesis 1: [29] Then God said, “I give you every seed-bearing plant on the face of the whole earth and every tree that has fruit with seed in it. They will be yours for food. [30] And to all the beasts of the earth and all the birds of the air and all the creatures that move on the ground--everything that has the breath of life in it--I give every green plant for food.” And it was so. [31] God saw all that he had made, and it was very good. And there was evening, and there was morning--the sixth day. Common parlance: Genesis 1:29-31. Half-open: Genesis 1:29-2:1. It's much more obvious by the latter that this passage extends exactly to the end of the chapter. Obviously it's way WAY too late to change the way Bible references are written, any more than Melway could renumber their maps all of a sudden. Massive case of lock-in and backward-incompatibility with existing code. But I put it to you that the half-open would make at least as much sense as the closed, in any situation where there are boundaries with contents between them. Note, by the way, that I'm not looking at anything involving backward scanning or wider strides, both of which Python's slice notation supports. Neither of those is inherently real-world intuitive, so the exact semantics can be defined as whatever makes sense in code.
(And there was some discussion a little while ago about exactly that.) I'm just looking at the very simple and common case of referencing a subset of consecutive elements from a much larger whole. The closed interval makes more sense when the indices somehow *are* the values being retrieved. When you count from 1 to 9, you expect nine numbers: 1, 2, ..., 8, 9. When you list odd numbers from 1 to 9, you expect 1, 3, 5, 7, 9. But what if you're looking at a container train and numbering the twenty-foot-equivalent-units (TEU) that it has? A 40-foot container requires 2 TEU, a 60-foot container requires 3 TEU. A "reefer" (refrigerated container) might require an extra slot, or at least it might be a 56-footer and consume 3 TEU. One wagon might, if you're lucky, carry 5 TEU; numbering them 1 through 5 would be obvious, but numbering the boundaries between them as 0 through 5 is better at handling the multiple TEU containers. (Even more so when you look at double-stacked containers. An over-height 40-foot container could consume 2 TEU horizontally and 2 TEU vertically, and be put in slots (0,0)-(2,2). This is, in fact, exactly how a GTK2 Table layout works.) Both types of intervals have their places. ChrisA From denis.spir at gmail.com Mon Jan 20 15:29:47 2014 From: denis.spir at gmail.com (spir) Date: Mon, 20 Jan 2014 15:29:47 +0100 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive? In-Reply-To: References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> Message-ID: <52DD32DB.3040403@gmail.com> On 01/20/2014 02:09 PM, Chris Angelico wrote: > On Mon, Jan 20, 2014 at 11:12 PM, spir wrote: >> For instance, closed intervals are more intuitive or natural, obviously (but >> for some reason I don't know). If you ask someone to count from 1 to 9, you >> will probably be surprised to hear him/her start from 2 or stop after 8.
If >> you are asked to choose a letter between c and g, you will probably be >> surprised to hear that 'c' or 'g' is no good choice. > > I'm not so sure about that. The half-open interval makes as much sense > as the fully closed - all you have to do is interpret the indices as > being *between* elements. Take, for example, Scripture verses. (Quotes > taken from THE HOLY BIBLE, NEW INTERNATIONAL VERSION®, NIV® Copyright > © 1973, 1978, 1984, 2011 by Biblica, Inc.® Used by permission. All > rights reserved worldwide. Copyright notice included for license > compliance. Note that I'm using bracketed numbers to indicate the > beginnings of verses - in a printed Bible, these would normally be in > superscript.) > > John 14: > [31] To the Jews who had believed him, Jesus said, “If you hold to my > teaching, you are really my disciples. [32] Then you will know the > truth, and the truth will set you free.” [33] > > This passage is normally referred to as "John 14:31-32", but as you > see, the verse marker [32] is in the middle of the quote. Using a > half-open interval, this would start at "John 14:31" and end at "John > 14:33". Half-open means: "Begin at the beginning, go on till you come > to the end, then stop", as the King of Hearts instructed the White > Rabbit. > > It's easy to indicate the beginning of a chapter: your start reference > is verse 1. Here's the beginning of the account of the creation of the > world: > > [1] In the beginning God created the heavens and the earth.
Conclusion: > Tie. No argument to be made for either side. But what if you're > looking at the *end* of a chapter? Here are a few verses from later on > in Genesis 1: > > [29] Then God said, ?I give you every seed-bearing plant on the face > of the whole earth and every tree that has fruit with seed in it. They > will be yours for food. [30] And to all the beasts of the earth and > all the birds of the air and all the creatures that move on the > ground?everything that has the breath of life in it?I give every green > plant for food.? And it was so. [31] God saw all that he had made, and > it was very good. And there was evening, and there was morning?the > sixth day. > > Common parlance: Genesis 1:29-31. Half-open: Genesis 1:29-2:1. It's > much more obvious by the latter that this passage extends exactly to > the end of the chapter. I do agree with your reasoning, it is indeed totally logical. However, it is not at all intuitive or natural (maybe tis is why Bible refs do not work your way ;-) dunno). This is probably related to the issue of prog indices interpreted as ordinals [*] or offsets. Aparently, obviously in fact, people intuitively or naturally interpret them as ordinals; which breaks your logic or conflicts with it. Whether it's "much more obvious" (quoting you in the last parag above) is a also question of how you interpret indices: if they're ordinals for you, then Genesis 1:29-31 is perfectly clear on where the ref'ed passage stops. [*] "ordinal" in the mathematical or linguistic sense, meaning a natural number holding the rank of an item in a sequence (not python's ord()) > Obviously it's way WAY too late to change the way Bible references are > written, any more than Melway could renumber their maps all of a > sudden. Massive case of lock-in and backward-incompatibility with > existing code. 
But I put it to you that the half-open would make at > least as much sense as the closed, in any situation where there are > boundaries with contents between them. > > Note, by the way, that I'm not looking at anything involving backward > scanning or wider strides, both of which Python's slice notation > supports. Neither of those is inherently real-world intuitive, so the > exact semantics can be defined as whatever makes sense in code. (And > there was some discussion a little while ago about exactly that.) I'm > just looking at the very simple and common case of referencing a > subset of consecutive elements from a much larger whole. > > The closed interval makes more sense when the indices somehow *are* > the values being retrieved. You are right; see also the note below on the case where [i,k[ is actually advantageous by itself. > When you count from 1 to 9, you expect > nine numbers: 1, 2, ..., 8, 9. When you list odd numbers from 1 to 9, > you expect 1, 3, 5, 7, 9. But what if you're looking at a container > train and numbering the twenty-foot-equivalent-units (TEU) that it > has? A 40-foot container requires 2 TEU, a 60-foot container requires > 3 TEU. A "reefer" (refrigerated container) might require an extra > slot, or at least it might be a 56-footer and consume 3 TEU. One wagon > might, if you're lucky, carry 5 TEU; numbering them 1 through 5 would > be obvious, but numbering the boundaries between them as 0 through 5 > is better at handling the multiple TEU containers. (Even more so when > you look at double-stacked containers. An over-height 40-foot > container could consume 2 TEU horizontally and 2 TEU vertically, and > be put in slots (0,0)-(2,2). This is, in fact, exactly how a GTK2 > Table layout works.) Both types of intervals have their places.
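In Python terms, the "boundaries between elements" reading is exactly how slice indices behave. The sketch below checks the properties that make the half-open form convenient, and writes the closed and counted notations from this thread as slices for comparison; closed() and counted() are hypothetical helper names, not standard library functions.

```python
# Boundaries are numbered 0..len(seq); seq[a:b] takes the items between
# boundary a and boundary b (the half-open interval [a, b[).
letters = "abcdefgh"

a, b, c = 2, 5, 8
assert letters[a:b] + letters[b:c] == letters[a:c]  # adjacent slices join with no gap or overlap
assert len(letters[a:b]) == b - a                   # length is simply stop - start
assert letters[a:a] == ""                           # an empty range needs no special case


def closed(seq, i, j):
    """Items i..j inclusive: the closed interval [i, j]."""
    return seq[i:j + 1]


def counted(seq, i, n):
    """n items starting at index i: the (i, n) notation."""
    return seq[i:i + n]


print(closed(letters, 2, 4))   # cde
print(counted(letters, 2, 3))  # cde
print(letters[2:5])            # cde
```

Only the half-open form needs no adjustment: the closed form hides a + 1 in its stop index, and the counted form computes its stop index as i + n.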
I also think there may be 2 kinds of notations for slices and such, one being [i,j] and the other maybe (i,n) where n is the number of items, rather than [i,k[ where k is the "post-last" or "past-the-end" index. Reasons to think along that path:

* since n is not an index, it avoids all the thinking trouble and misinterpretations of k as opposed to j; in particular, it avoids the "intuitive conflict" evoked above
* n makes sense and is useful by itself (eg think of typical arrays {p,n} or slices/views {i,n}, or of algos for copy, compare, traversal, concat, map...)
* when [i,k[ works better than [i,j], most often it's because we have n (k=i+n) or need n (n=k-i), thus we avoid +1 or -1; this, rather than any worth of k by itself
* other cases where [i,k[ seems to work nicely are "self-feeding" in fact: we have & need [i,k[ just because the lang uses that, but the same would be true whatever the interval notation (eg the lang returns i,k from a builtin func searching for something in a seq, and we then use it to get a subseq)
* the only advantage of k by itself, logically, I think, is when scanning a non-terminated token (eg a number): we must pass the last item (digit) to know the token is finished, and thus end up holding i & k, not i & j; however, if we use (i,n) notation, it's easy enough to write (i,k-i), so no big deal, just as in the opposite case; and this situation is obviously, I guess, a small minority of uses of intervals [1]

Anyway, I think only practice of the alternatives and talk among non-ideologically-blinded programmers can tell us what is worthwhile or not.

Denis

[1] However, from a semantic point of view, [i,k[ is problematic even in this very case where it seems nicer at first sight, because we don't need to type -1. Say we're scanning for a number and there is "1234567" in source:

    <-----> n
    1234567
    i     jk

We get i & k. If the lang uses [i,k[ intervals, then we just write it that way to get the right substring, and are pleased not to have to type -1.
However, the "semantic truth" (if I may say so) is that we stopped scanning *after* the last digit, and need to slice up to the *previous* character. This is not written in s[i,k]; where is the idea "up to the previous position" expressed in this notation? "Previous" translates to -1 in arithmetic or programming. For the notation to be semantically correct, it should say "-1" somewhere. And this is why closed intervals s[i,k-1] are superior, from the semantic perspective, even in this case, the very case where half-open intervals superficially look nicer. Half-open intervals do not say what they mean, so to speak; they cheat ;-) (booh!) A related point (semantics, thinking) is that, as far as I know, many programmers in langs using [i,k[ just do *not* think about it. They just know from experience that it works in most cases (reasons listed above) but do not think, for instance in this case, along the lines of: "all right, we stop after the last digit, thus need to slice up to the previous position, thus a half-open interval is right here". No, they seem to do it blindly like an automaton. I asked other programmers about that when I noticed it was true of me (I use it blindly, and don't know in a given case why/how it works unless I stop and *start* to think). I just prefer being a programmer who thinks rather than a coding machine, but it's just me. From rosuav at gmail.com Mon Jan 20 15:50:23 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 21 Jan 2014 01:50:23 +1100 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive?
In-Reply-To: <52DD32DB.3040403@gmail.com> References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> <52DD32DB.3040403@gmail.com> Message-ID: On Tue, Jan 21, 2014 at 1:29 AM, spir wrote: > I also think there may be 2 kinds of notations for slices and such, one > being [i,j] and the other maybe (i,n) where n is the number of items, > rather than [i,k[ where k is the "post-last" or "past-the-end" index. This is why REXX has the "DO... FOR" loop syntax. You can code a loop thus:

    do i=1 to 5          /* 1, 2, 3, 4, 5 */
    do i=1 to 5 by 2     /* 1, 3, 5 */
    do i=1 by 2 for 6    /* 1, 3, 5, 7, 9, 11 */

The 'for N' criterion specifies the number of iterations to do, regardless of the stop position. (REXX doesn't have slice notation, so loops are the nearest equivalent.) It would be quite reasonable to create a slice-like object in Python, but I'm not sure how to put all of this functionality into syntax that's tight enough to be useful - nobody wants to write foo[slice(1,None,2,count=5)]! ChrisA From denis.spir at gmail.com Mon Jan 20 16:29:39 2014 From: denis.spir at gmail.com (spir) Date: Mon, 20 Jan 2014 16:29:39 +0100 Subject: [Python-ideas] return from -- breadth of usage In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: <52DD40E3.6040908@gmail.com> I think tail calls are very common.
Consider the following examples:

    def perform (input):          # an "action"
        data = prepare(input)
        process(data)             # tail call

    def result (input):           # a "function" properly speaking
        data = prepare(input)
        return process(data)      # tail call

    def case1 (input):
        if cond(input):
            return deal_with_special_case()   # tail call

    def case2 (input):
        if cond(input):
            return deal_with_common_case()    # tail call

    def perform_cases (input):
        if cond1(input):
            case1(input)          # tail call
        elif cond2(input):
            case2(input)          # tail call
        elif cond3(input):
            case3(input)          # tail call

    def result_cases (input):
        if cond1(input):
            return case1(input)   # tail call
        elif cond2(input):
            return case2(input)   # tail call
        elif cond3(input):
            return case3(input)   # tail call

There are probably many more typical *schemas* of common tail call use cases. In any case, tail calls are very frequent, varied in usage, and not specific to functional or functional-like programming. Instead, we all use tail calls constantly, without even thinking about it, just as we constantly speak prose ;-). [1]

My point of view is not that a tail call is a special (maybe very rare) kind of call, but that there are 2 kinds of calls, maybe of equal importance:

* delegation: another proc is passed the responsibility of performing a task, or of achieving the rest of it (tail call)
* assistance: another proc is used to assist in a main task, still controlled and assumed by the main proc (sub call)

I guess there are 2 main situations of delegation (tail calls):

* the main proc sorts out cases and delegates in some or all cases
* the main proc prepares the task and a delegate achieves it

which may be mixed. (I may be missing some, for sure.) "return from" may well do the job, but entertains, imo, wrong views about tail calls. Maybe "pass" would do the job better. When a delegate f performs an action (action, examples 'perform' & 'perform_cases' & 'case*' above), it can be interpreted as "pass the responsibility of the task to f", or just "pass by f".
When a delegate f computes a result (function, examples 'result' & 'result_cases' above), it can be interpreted as "pass f's result back to the caller". (There is a similar ambiguity with "return", which actually matches a real semantic ambiguity.) denis [1] Allusion to https://en.wikipedia.org/wiki/Le_Bourgeois_gentilhomme From rosuav at gmail.com Mon Jan 20 16:39:33 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 21 Jan 2014 02:39:33 +1100 Subject: [Python-ideas] return from -- breadth of usage In-Reply-To: <52DD40E3.6040908@gmail.com> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DD40E3.6040908@gmail.com> Message-ID: On Tue, Jan 21, 2014 at 2:29 AM, spir wrote:

> def perform (input):          # an "action"
>     data = prepare(input)
>     process(data)             # tail call
>
> def result (input):           # a "function" properly speaking
>     data = prepare(input)
>     return process(data)      # tail call

To Python, the second one could be a tail call, but the first one isn't. It's really:

    def perform (input):          # an "action"
        data = prepare(input)
        process(data)
        return None

If process() happens to return None, then it becomes a tail call, but since Python has no way of knowing if this will be the case, it can't optimize anything away. (Conversely, if the interpreter knew that perform()'s return value was going to be ignored, the same optimization could be made, but it can't assume that either.) But if 'return from' syntax is added, I don't think it'll be much of an issue to put explicit return statements in functions where you know it'll always be None.
    def perform (input):          # an "action"
        data = prepare(input)
        return from process(data) # now a tail call

ChrisA From jonathan at slenders.be Mon Jan 20 16:57:48 2014 From: jonathan at slenders.be (Jonathan Slenders) Date: Mon, 20 Jan 2014 16:57:48 +0100 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: Interesting. I very much like the "return from" syntax. It's explicit and consistent enough with "yield from". When using coroutines, it currently also happens that at some points you have a choice to drop certain frames from the stack. Take for instance the following:

    @coroutine
    def a():
        result = yield from b()  # 'b' is another coroutine
        return result

or often written as:

    @coroutine
    def a():
        return (yield from b())

You could write it as:

    def a():
        return b()

In the last example, you delegate to another coroutine, removing 'a' from the stack. (See this discussion: https://groups.google.com/forum/#!topic/python-tulip/5xW44wh5Krs) 2014/1/20 Nick Coghlan > > On 20 Jan 2014 11:16, "David Mertz" wrote: > > > > I was mostly disliking the idea of TCO during this discussion. However, > the idiom of 'return from' seems sufficiently elegant and explicit--and has > exactly the semantics you'd expect from 'yield from'--that I am actually +1 > on that idea. > > I agree that a PEP for "return from" would be interesting. It also gives > debuggers something to latch on to in order to handle the new scenario > (just as they needed some adjustment to handle "yield from"). > > "return from" could also be explicitly disallowed in try blocks and with > statements (since those inherently conflict with the idea of reusing the > current frame for a different call).
> > By keeping a list of references to the elided calls (perhaps using counts > for more efficient handling of recursive calls), you could even partially > reconstruct the missing parts of the traceback. > > > Being an explicit construct, it definitely becomes a case of "consenting > adults" not of implicit magic. I.e. you are declaring right in the code > that you don't expect to see a frame in a stack trace, which is fair > enough. I mean, if you *really* wanted to you could muck around with > 'sys._getframe(N).f_whatever' already which would give inaccurate > tracebacks too. Probably there would be a way to remove frames from the > stack even, using some such trick in current Python. > > Yep, we do that (from C) in importlib to try to reduce the infrastructure > noise in the tracebacks shown to users. > > Cheers, > Nick. > > > > > > > On Sun, Jan 19, 2014 at 3:13 PM, Terry Reedy wrote: > >> > >> Proposal (mostly not mine): add 'return from f(args)', in analogy with > 'yield from iterator', to return a value to the caller from an execution > frame running f(args) (and either reuse or delete the frame that ran > 'return from'). The function name 'f' would not have to match the name of > the function being compiled; this would actually be TCO, even if it were > nearly always used for recursive tail calls. That does mean that it would > work for mutually tail recursive functions. > >> > >> On 1/19/2014 6:57 AM, Joao S. O. Bueno wrote: > >>> > >>> OTOH, since we are at it, we'd better check > >>> 2009 BDLF's opinion on the subject: > >>> > >>> > http://neopythonic.blogspot.com.br/2009/04/tail-recursion-elimination.html > >> > >> > >> I read through the comments and near the very end, in July 2013, Dan > LaMotte said... ''' > >> Definitely seems to be complicated/impossible to determine a function > is tail recursion 'compliant' statically in Python; however, what if it > were an 'opt in' feature that uses a different 'return' keyword?
> >> > >> def f(n): > >> if n > 0: > >> tailcall f(n - 1) > >> return 0 > >> ''' > >> In additional paragraphs, he noted, among other things, that this makes > the feature 'opt-in' on a function by function basis. > >> > >> Guido replied "Dan: your proposal has the redeeming quality of clearly > being a language feature rather than a possible optimization. I don't > really expect there to be enough demand to actually add this to the > language though. Maybe you can use macropy to play around with the idea > though?" > >> > >> ???? then suggested 'return from'. My only contribution is to point out > the analogy with the new, and initially strange, 'yield from'. > >> > >> Guido seems to have said that if a) someone tries out the idea with > macropy, and b) someone demonstrates enough demand, he might consider > adding such a feature. So this seems to me the best option to pursue to get > something into CPython. I also think it is the best proposal so far. > >> > >> As for a), I have not looked as macropy, but: > >> On 1/19/2014 4:33 PM, Haoyi Li wrote:> MacroPy's @tco decorator is > about as easy as you could ask for. 'pip > >> > install macropy', 'from macropy.experimental.tco import macros, tco' > > is about as easy as you could ask for. Works for arbitrary tail-calls > > too, not just tail recursion. > >> > >> That leaves b) for those of you who want the feature. > >> > >> Any PEP should admit that the feature might be abused. Someone might > write > >> return from len(composite) > >> Unless return from refuses to delete the frame making a call to a C > function, the effect would be to save a trivial O(1) space as the cost of > deleting the most important line of a stack trace should len() raise. But I > think this falls under the 'consenting adults' principle. A proposed doc > should make it clear that the intended use is to make deeply recursive or > mutually recursive functions run and not to replace all tail calls. 
> >> > >> -- > >> Terry Jan Reedy > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > > > > > -- > > Keeping medicines from the bloodstreams of the sick; food > > from the bellies of the hungry; books from the hands of the > > uneducated; technology from the underdeveloped; and putting > > advocates of freedom in prisons. Intellectual property is > > to the 21st century what the slave trade was to the 16th. > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Mon Jan 20 19:36:12 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 20 Jan 2014 12:36:12 -0600 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive? In-Reply-To: References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> Message-ID: +1 for the example you used. On Mon, Jan 20, 2014 at 7:09 AM, Chris Angelico wrote: > On Mon, Jan 20, 2014 at 11:12 PM, spir wrote: > > For instance, closed intervals are more intuitive or natural, obviously > (but > > for some reason I don't know). If you ask someone to count from 1 to 9, > you > > will probably be surprised to hear him/her start from 2 or stop after 8. 
> If > > you are asked to choose a letter between c and g, you will probably be > > surprised to hear that 'c' or 'g' is no good choice. > > I'm not so sure about that. The half-open interval makes as much sense > as the fully closed - all you have to do is interpret the indices as > being *between* elements. Take, for example, Scripture verses. (Quotes > taken from THE HOLY BIBLE, NEW INTERNATIONAL VERSION®, NIV® Copyright > © 1973, 1978, 1984, 2011 by Biblica, Inc.™ Used by permission. All > rights reserved worldwide. Copyright notice included for license > compliance. Note that I'm using bracketed numbers to indicate the > beginnings of verses - in a printed Bible, these would normally be in > superscript.) > > John 8: > [31] To the Jews who had believed him, Jesus said, "If you hold to my > teaching, you are really my disciples. [32] Then you will know the > truth, and the truth will set you free." [33] > > This passage is normally referred to as "John 8:31-32", but as you > see, the verse marker [32] is in the middle of the quote. Using a > half-open interval, this would start at "John 8:31" and end at "John > 8:33". Half-open means: "Begin at the beginning, go on till you come > to the end, then stop", as the King of Hearts instructed the White > Rabbit. > > It's easy to indicate the beginning of a chapter: your start reference > is verse 1. Here's the beginning of the account of the creation of the > world: > > [1] In the beginning God created the heavens and the earth. [2] Now > the earth was formless and empty, darkness was over the surface of the > deep, and the Spirit of God was hovering over the waters. [3] And God > said, "Let there be light," and there was light. [4] God saw that the > light was good, and he separated the light from the darkness. [5] God > called the light "day," and the darkness he called "night." And there > was evening, and there was morning - the first day. [6] > > Common parlance: Genesis 1:1-5. Half-open: Genesis 1:1-6.
Conclusion: > Tie. No argument to be made for either side. But what if you're > looking at the *end* of a chapter? Here are a few verses from later on > in Genesis 1: > > [29] Then God said, "I give you every seed-bearing plant on the face > of the whole earth and every tree that has fruit with seed in it. They > will be yours for food. [30] And to all the beasts of the earth and > all the birds of the air and all the creatures that move on the > ground - everything that has the breath of life in it - I give every green > plant for food." And it was so. [31] God saw all that he had made, and > it was very good. And there was evening, and there was morning - the > sixth day. > > Common parlance: Genesis 1:29-31. Half-open: Genesis 1:29-2:1. It's > much more obvious by the latter that this passage extends exactly to > the end of the chapter. > > Obviously it's way WAY too late to change the way Bible references are > written, any more than Melway could renumber their maps all of a > sudden. Massive case of lock-in and backward-incompatibility with > existing code. But I put it to you that the half-open would make at > least as much sense as the closed, in any situation where there are > boundaries with contents between them. > > Note, by the way, that I'm not looking at anything involving backward > scanning or wider strides, both of which Python's slice notation > supports. Neither of those is inherently real-world intuitive, so the > exact semantics can be defined as whatever makes sense in code. (And > there was some discussion a little while ago about exactly that.) I'm > just looking at the very simple and common case of referencing a > subset of consecutive elements from a much larger whole. > > The closed interval makes more sense when the indices somehow *are* > the values being retrieved. When you count from 1 to 9, you expect > nine numbers: 1, 2, ..., 8, 9. When you list odd numbers from 1 to 9, > you expect 1, 3, 5, 7, 9.
But what if you're looking at a container > train and numbering the twenty-foot-equivalent-units (TEU) that it > has? A 40-foot container requires 2 TEU, a 60-foot container requires > 3 TEU. A "reefer" (refridgerated container) might require an extra > slot, or at least it might be a 56-footer and consume 3 TEU. One wagon > might, if you're lucky, carry 5 TEU; numbering them 1 through 5 would > be obvious, but numbering the boundaries between them as 0 through 5 > is better at handling the multiple TEU containers. (Even more so when > you look at double-stacked containers. An over-height 40-foot > container could consume 2 TEU horizontally and 2 TEU vertically, and > be put in slots (0,0)-(2,2). This is, in fact, exactly how a GTK2 > Table layout works.) Both types of intervals have their places. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Mon Jan 20 20:05:03 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 20 Jan 2014 20:05:03 +0100 Subject: [Python-ideas] Predicate Sets In-Reply-To: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: Am 20.01.2014 08:56, schrieb Andrew Barnert: > From: Daniel da Silva Sent: Sunday, January 19, > 2014 3:41 PM > > >> Overview: Sets in mathematics can be defined by a list of elements without >> repetitions, and alternatively by a predicate (function) that determines >> inclusion. > > The whole point of modern set theory is that sets cannot be defined by a > predicate alone; only by a predicate _and a set to apply it over_. Which we > already have in set comprehensions. 
> > And your suggestion has the exact same problem that naive set theory had: > > >>> myset = predicateset(lambda s: s.startswith('a')) > >>> 'xyz' in myset > False > > >>> russellset = predicateset(lambda s: s not in s) > >>> russellset in russelset > > Presumably this should cause the computer to scream "DOES NOT COMPUTE!" and > blow up... I think it will just raise a NameError... SCNR, Georg From mertz at gnosis.cx Mon Jan 20 21:10:14 2014 From: mertz at gnosis.cx (David Mertz) Date: Mon, 20 Jan 2014 12:10:14 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: Although a cute point, I'm not too concerned about the Russell's Paradox issue. The obvious implementation will get a "RuntimeError: maximum recursion depth exceeded" in that case. But then, no predicate is guaranteed to halt, so that's not really special to the russellset. On the other hand, even though I think the idea of a 'predicateset' is cute mathematically, I'm not really sure what it actually gets you, even in readability. I am perfectly happy spelling this: mypset = predicateset(somefunc) if x in mypset: ... As: if somefunc(x): ... Even for the set operators, set comprehensions seem pretty much equally elegant: such_that = {1, 2, 3} & mypset # Looks nice, I agree But then, this looks pretty nice also: such_that = {x for x in {1, 2, 3} if somefunc(x)} OK, sure the predicateset version might save a few characters, but not all that many. If you want to combine predicate sets that's really just like combining predicates. It *does* sort of remind me that I'd like some standard HOFs as builtins or in the standard library (probably in functools). 
But still, where you might write: in_both_sets = mypset & mypset2 It's not bad to write a small support module: # combinators.py def allP(*fns): return lambda x: all(f(x) for f in fns) def anyP(*fns): return lambda x: any(f(x) for f in fns) Then express the intersection as: in_both_pred = allP(somefunc, somefunc2) >From there, you can just use the predicate 'in_both_pred' as above. Similarly for union, define: in_either_pred = anyP(somefunc, somefunc2) On Mon, Jan 20, 2014 at 11:05 AM, Georg Brandl wrote: > Am 20.01.2014 08:56, schrieb Andrew Barnert: > > From: Daniel da Silva Sent: Sunday, January > 19, > > 2014 3:41 PM > > > > > >> Overview: Sets in mathematics can be defined by a list of elements > without > >> repetitions, and alternatively by a predicate (function) that determines > >> inclusion. > > > > The whole point of modern set theory is that sets cannot be defined by a > > predicate alone; only by a predicate _and a set to apply it over_. Which > we > > already have in set comprehensions. > > > > And your suggestion has the exact same problem that naive set theory had: > > > > >>> myset = predicateset(lambda s: s.startswith('a')) > > >>> 'xyz' in myset > > False > > > > >>> russellset = predicateset(lambda s: s not in s) > > >>> russellset in russelset > > > > Presumably this should cause the computer to scream "DOES NOT COMPUTE!" > and > > blow up... > > I think it will just raise a NameError... > > SCNR, > Georg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. 
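To make the comparison in this thread concrete, here is a minimal sketch of the predicate-set idea placed next to David's combinators. PredicateSet is a hypothetical name (nothing like it exists in the stdlib), and allP/anyP mirror the small support module above. Membership simply calls the predicate, & and | combine predicates, and intersecting with an ordinary finite set falls back to the set comprehension David mentions.

```python
# Minimal predicate-set sketch. PredicateSet is a hypothetical class,
# not a stdlib API; allP/anyP mirror the combinators shown above.

def allP(*fns):
    """Predicate that holds only if every predicate in fns holds."""
    return lambda x: all(f(x) for f in fns)


def anyP(*fns):
    """Predicate that holds if at least one predicate in fns holds."""
    return lambda x: any(f(x) for f in fns)


class PredicateSet:
    """Set-like object whose membership test is a predicate call."""

    def __init__(self, pred):
        self.pred = pred

    def __contains__(self, x):
        return bool(self.pred(x))

    def __and__(self, other):
        if isinstance(other, PredicateSet):
            # Intersection of two predicate sets: both predicates must hold.
            return PredicateSet(allP(self.pred, other.pred))
        # Intersection with a finite iterable: just a set comprehension.
        return {x for x in other if self.pred(x)}

    # set.__and__ returns NotImplemented for non-set operands, so
    # "{...} & predicate_set" ends up here with the operands swapped.
    __rand__ = __and__

    def __or__(self, other):
        if isinstance(other, PredicateSet):
            # Union of two predicate sets: either predicate may hold.
            return PredicateSet(anyP(self.pred, other.pred))
        return NotImplemented  # union with a finite set not supported here


starts_a = PredicateSet(lambda s: s.startswith('a'))
ends_z = PredicateSet(lambda s: s.endswith('z'))

print('abc' in starts_a, 'xyz' in starts_a)             # True False
print('abz' in (starts_a & ends_z))                     # True
print('xyz' in (starts_a | ends_z))                     # True
print(sorted({'apple', 'kiwi', 'avocado'} & starts_a))  # ['apple', 'avocado']
```

As David notes, `'abc' in starts_a` buys little over calling the predicate directly; the main gain is that `&` and `|` read like set algebra. Union with an ordinary set is deliberately left unsupported in this sketch.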
-------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Jan 20 21:16:06 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 20 Jan 2014 15:16:06 -0500 Subject: [Python-ideas] return from -- breadth of usage In-Reply-To: <52DD40E3.6040908@gmail.com> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DD40E3.6040908@gmail.com> Message-ID: On 1/20/2014 10:29 AM, spir wrote: > I think tail call is very common Yes, they are. That is why space-optimizing all tail calls, and destroying proper tracebacks for all tail calls, is gross over-kill. Saving space is only needed when recursion would make the stack space used grow without any particular bound. (Note that this is only an issue for practical implementations, not pure mathematics.) The point of the 'tail call' proposal is to have the programmer explicitly say when space conservation is needed, instead of asking the interpreter to magically make that determination. -- Terry Jan Reedy From denis.spir at gmail.com Mon Jan 20 22:53:09 2014 From: denis.spir at gmail.com (spir) Date: Mon, 20 Jan 2014 22:53:09 +0100 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: Message-ID: <52DD9AC5.8000602@gmail.com> On 01/20/2014 12:41 AM, Daniel da Silva wrote: > Below is a description of a very simple but immensely useful class called a > "predicate set". In combination with the set and list comprehensions they > would allow another natural layer of reasoning with mathematical set logic > in Python. > > In my opinion, a concept like this would be best located in the functools > module. > > > *Overview:* > Sets in mathematics can be defined by a list of elements without > repetitions, and alternatively by a predicate (function) that determines > inclusion. 
A predicate set would be a set-like class that is instantiated > with a predicate function that is called to determine ``a in > the_predicate_set''. > >>> myset = predicateset(lambda s: s.startswith('a')) >>> 'xyz' in myset > False >>> 'abc' in myset > True >>> len(myself) > Traceback (most recent call last): > [...] > TypeError > > *Example Uses:* > # Dynamic excludes in searching > foo_files = search_files('foo', exclude=set(['a.out', 'Makefile'])) > bar_files = search_files('bar', exclude=predicateset(lambda fname: not > fname.endswith('~'))) # exclude *~ > > # Use in place of a set with an ORM > validusernames = predicateset(lambda s: re.match(s, '[a-zA-Z0-9]+')) > > class Users(db.Model): > username = db.StringProperty(choices=validusernames) > password = db.StringProperty() While the theoretical interest is clear, I don't see the actual point. A predicate set without any actual set (in the ordinary prog sense) is just a criterion function (the predicate) returning a logical true/false, right? (Note: any logical func, any logical expression on a variable, does define a predicate set, doesn't it?) So, we already have this builtin ;-). >>> crit = lambda s: s.startswith('a') >>> crit("xyz") False >>> crit("abc") True One could make a trivial class to build such constructs as objects and implement the 'in' operator for them. class PredSet: def __init__ (self, crit): self.crit = crit def __contains__ (self, x): return self.crit(x) crit = lambda s: s.startswith('a') s = PredSet(crit) print("xyz" in s, "abc" in s) But I don't see any advantage in terms of clarity: crit(x) is as clear, isn't it. One also could add an actual set to such objects, which would automagically put items inside, eg whenever they are checked via the criterion func. (Somewhat like string pools.) 
class PredSet: def __init__ (self, crit): self.crit = crit self.items = set() def __contains__ (self, x): if self.crit(x): self.items.add(x) return True return False s = PredSet(crit) print("xyz" in s, "abc" in s, "ablah" in s) print(s.items) Would certainly be nice, but I cannot see any usage. All in all, I guess I'm missing the actual point. Denis From abarnert at yahoo.com Mon Jan 20 23:41:22 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 20 Jan 2014 14:41:22 -0800 Subject: [Python-ideas] return from -- breadth of usage In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DD40E3.6040908@gmail.com> Message-ID: On Jan 20, 2014, at 7:39, Chris Angelico wrote: > If process() happens to return None, then it becomes a tail call, but > since Python has no way of knowing if this will be the case, it can't > optimize anything away. (Conversely, if the interpreter knew that > perform()'s return value was going to be ignored, the same > optimization could be made, but it can't assume that either.) > > But if 'return from' syntax is added, I don't think it'll be much of > an issue to put explicit return statements in functions where you know > it'll always be None. This is a great argument for not just the idea of the explicit syntax, but also the "return from" name. I hadn't thought about the fact that (non-functional-style) code often ignores a None return value and then returns None, which automatic TCO can't handle, but explicit can. And in that case, "return from" expresses exactly the right thing, just as it does in the recursive case. 
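Since "return from" is only a proposal, one way to experiment today with the explicit, opt-in semantics described in this thread is a trampoline. This is a minimal sketch, not the proposal itself: TailCall and trampoline are hypothetical helper names, and each `return TailCall(f, ...)` stands in for what `return from f(...)` would spell. The caller's frame has already returned before the next call starts, so stack depth stays constant. As discussed earlier in the thread, this also means those frames are missing from any traceback.

```python
# Trampoline sketch emulating an explicit "return from" in current Python.
# TailCall and trampoline are hypothetical helpers, not stdlib APIs.

class TailCall:
    """Marker meaning: 'my result is whatever fn(*args, **kwargs) returns'."""
    __slots__ = ("fn", "args", "kwargs")

    def __init__(self, fn, *args, **kwargs):
        self.fn = fn
        self.args = args
        self.kwargs = kwargs


def trampoline(fn, *args, **kwargs):
    """Call fn, then keep following TailCall markers in a loop.

    Each tail call runs after the previous frame has returned, so the
    stack does not grow, but those frames vanish from tracebacks."""
    result = fn(*args, **kwargs)
    while isinstance(result, TailCall):
        result = result.fn(*result.args, **result.kwargs)
    return result


# Mutually recursive functions: "return TailCall(g, ...)" plays the
# role of "return from g(...)".
def is_even(n):
    if n == 0:
        return True
    return TailCall(is_odd, n - 1)


def is_odd(n):
    if n == 0:
        return False
    return TailCall(is_even, n - 1)


print(trampoline(is_even, 100000))  # True, far past the default recursion limit
```

As with the proposed syntax, nothing happens unless a function opts in by returning a TailCall. One drawback of this emulation is that a function cannot return a TailCall object as an ordinary value when it is run under this driver.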
From abarnert at yahoo.com Mon Jan 20 23:52:37 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 20 Jan 2014 14:52:37 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: <5038B3ED-D060-4A26-8830-6FE66545AAD1@yahoo.com> On Jan 20, 2014, at 2:26, Devin Jeanpierre wrote: > There is not any kind of fundamental problem with the idea of a Python > set-like object defined by Python predicates. Python sets aren't > mathematical sets, and Python predicates aren't mathematical > predicates. Things can be different from how they are described in > mathematics, without being internally inconsistent, and without being > useless. I made the exact same point in the rest of the paragraph that you cut off, except I said that python functions aren't mathematical functions instead of saying predicates. The original post was suggesting that Python should have predicateset because that's how mathematicians define sets. That is wrong--and, more importantly, irrelevant. Whether a predicateset class is useful or not has to do with its usefulness in writing and reading Python programs, and nothing else. Maybe I should have made the point about it being irrelevant first, and just mentioned the fact that it's wrong as a parenthetical comment. But I'm just too fond of the idea of being able to write a program that Captain Kirk or Zoe Heriot can use to blow up the computer after it takes over the world, which sadly Python does not yet have. (If I remember right, the computer Zoe did it to was programmed in Algol.) From eric at trueblade.com Tue Jan 21 01:59:37 2014 From: eric at trueblade.com (Eric V. 
Smith) Date: Mon, 20 Jan 2014 19:59:37 -0500 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> Message-ID: <52DDC679.1040406@trueblade.com> On 1/19/2014 10:40 PM, Bruce Leban wrote: > I think the odds of Python getting > > from __future__ import pony > > are slightly higher than there being a Python 2.8. I assume by "pony" > you really mean what I'd like to have: > > from __future__ import everything > > since my goal is to write Python 3 compatible code even though I'm > temporarily stuck with Python 2 due to stack issues. The __future__ > imports makes it easier to write forward compatible code. As it is, I > have to list the individual imports in every file and I also add: > > range = xrange It's unfortunate we didn't add this (and all other changed builtins) to future_builtins in 2.7. Eric. From steve at pearwood.info Tue Jan 21 02:07:26 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 21 Jan 2014 12:07:26 +1100 Subject: [Python-ideas] Predicate Sets In-Reply-To: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> Message-ID: <20140121010724.GA3915@ando> On Sun, Jan 19, 2014 at 11:56:49PM -0800, Andrew Barnert wrote: > And your suggestion has the exact same problem that naive set theory had: > >>> russellset = predicateset(lambda s: s not in s) > >>> russellset in russelset > > Presumably this should cause the computer to scream "DOES NOT > COMPUTE!" and blow up, which I think would be hard to implement in > CPython. It should just raise an exception. 
I leave implementation as an exercise for the reader :-) This sort of thing is a staple of bad old science fiction, where the Hero would save the world by getting the super-intelligent Artificial Intelligence Doomsday Computer to calculate some variation of the above. But of course, a *truely* intelligent computer would merely say "I see what you did there. Good try, feeble meatbag, but not good enough" and launch the missiles. > The big problem is coming up with a compelling use case. This one doesn't sell me: > > bar_files = search_files('bar', exclude=predicateset(lambda fname: not fname.endswith('~'))) If it's a project on PyPI, the only use-case necessary is the author thinks it's cool. > It seems like it make more sense to have exclude take a function, so you could just write: > > bar_files = search_files('bar', exclude=lambda fname: not fname.endswith('~')) What if you want to filter according to multiple conditions? A tuple of functions makes sense. Add a helper function that tests against those multiple functions, and you're halfway to this PredicateSet. Adding set-like methods seems like overkill. -- Steven From haoyi.sg at gmail.com Tue Jan 21 02:51:36 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Mon, 20 Jan 2014 17:51:36 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: <20140121010724.GA3915@ando> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> Message-ID: > What if you want to filter according to multiple conditions? What's wrong with lambda fname: func1(fname) and func2(fname) and func3(fname) ? On Mon, Jan 20, 2014 at 5:07 PM, Steven D'Aprano wrote: > On Sun, Jan 19, 2014 at 11:56:49PM -0800, Andrew Barnert wrote: > > > And your suggestion has the exact same problem that naive set theory had: > > > >>> russellset = predicateset(lambda s: s not in s) > > >>> russellset in russelset > > > > Presumably this should cause the computer to scream "DOES NOT > > COMPUTE!"
and blow up, which I think would be hard to implement in > > CPython. > > It should just raise an exception. I leave implementation as an exercise > for the reader :-) > > This sort of thing is a staple of bad old science fiction, where the > Hero would save the world by getting the super-intelligent Artificial > Intelligence Doomsday Computer to calculate some variation of the above. > But of course, a *truely* intelligent computer would merely say "I see > what you did there. Good try, feeble meatbag, but not good enough" and > launch the missiles. > > > > The big problem is coming up with a compelling use case. This one > doesn't sell me: > > > > bar_files = search_files('bar', exclude=predicateset(lambda fname: > not fname.endswith('~'))) > > If it's a project on PyPI, the only use-case necessary is the author > thinks it's cool. > > > > It seems like it make more sense to have exclude take a function, so you > could just write: > > > > bar_files = search_files('bar', exclude=lambda fname: not > fname.endswith('~')) > > What if you want to filter according to multiple conditions? A tuple of > functions makes sense. Add a helper function that tests against those > multiple functions, and you're halfway to this PredicateSet. Adding > set-like methods seems like overkill. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Tue Jan 21 03:30:29 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 21 Jan 2014 13:30:29 +1100 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> Message-ID: <20140121023029.GB3915@ando> On Mon, Jan 20, 2014 at 05:51:36PM -0800, Haoyi Li wrote: > > What if you want to filter according to multiple conditions? > > What's wrong with > > lambda fname: func1(fname) and func2(fname) and func3(fname) That is a single compound condition, not multiple conditions. Think about a GUI application with a file selection dialog box, or a search utility. You might offer a rich set of filters, all optional, all selectable by the user at runtime: [x] Hidden dot files .foo [ ] Backup files foo~ [x] File extensions: [ ] Images [x] Text files [ ] Java code [x] Custom: [ zip,tar,foo,bar,baz ] [x] File owner: [ steve ] [ ] Group: [ ] [ ] Modified date between: [ ] and [ ] etc. It's not practical to create one single giant filter function that looks like this: def filter(name): head, ext = os.path.splitext(name) return ( (show_hidden_dot_files and name.startswith('.')) and (show_backup_tilde_files and name.endswith('~')) and (show_images and ext in list_of_image_extensions) and ... ) It would be a pain to maintain and extend, and testing would be horrible. 
Better to have each setting provide a single filter function, then combine the active filters into a list: def filter(name, list_of_filters): for f in list_of_filters: if not f(name): return False return True One might even use a class to represent the list of filters, and give it "all" and "any" methods, and allow multiple lists to combine so you can say things like: "show the file if *all* of these conditions are true, or if *any* of these different conditions are true, but not if *any* of these conditions are true" which of course is terribly overkill for a simple file selection dialog box, but might be useful for a more complex search engine. None of this should be read as supporting the original request to add PredicateSet into the standard library. But I encourage the OP to write his own library and put it on PyPI. -- Steven From greg.ewing at canterbury.ac.nz Tue Jan 21 05:56:11 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 21 Jan 2014 17:56:11 +1300 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive? In-Reply-To: References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> Message-ID: <52DDFDEB.5090901@canterbury.ac.nz> > On Mon, Jan 20, 2014 at 7:09 AM, Chris Angelico > wrote: > > Note, by the way, that I'm not looking at anything involving backward > scanning That would be for when you were reading your Bible text backwards, looking for hidden Satanic references. -- Greg From ericsnowcurrently at gmail.com Tue Jan 21 07:26:22 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 20 Jan 2014 23:26:22 -0700 Subject: [Python-ideas] Add an attribute spec descriptor. Message-ID: Here's something I've thought about off and on for a while. Occasionally it would be useful to me to have a class attribute I can use to represent an attribute that will exist on *instances* of the class. 
Properties provide that to an extent, but they are data descriptors which means they will not defer to like-named instance attributes. However, a similar non-data descriptor would fit the bill. For the sake of clarity, here is a simple implementation that demonstrates what I mean. I know it's asking a lot , but try to focus on the idea rather than the code. I've posted a more complete (and feature-rich) implementation online [1]. class Attr: """A non-data descriptor specifying an instance attribute.""" def __init__(self, name, doc=None): self.__name__ = name self.__doc__ = doc def __get__(self, obj, cls): if obj is None: return self else: # The attribute wasn't found on the instance. raise AttributeError(self.__name__) def attribute(f=None): """A decorator that converts a function into an attribute spec.""" return Attr(f.__name__, f.__doc__) def attrs(names): """A class decorator that adds the requested attribute specs.""" def decorator(cls): for name in names: attr = Attr(name) setattr(cls, name, attr) return cls return decorator Other features not shown here (see [1]): * an optional "default" Attr value * an optional "type" Attr (derived from f.__annotations__['return']) * __qualname__ * auto-setting self.__name__ during the first Attr.__get__() call * a nice repr * Attr.from_func() * proper ABC handling in attrs() (not an obvious implementation) * optionally inheriting docstrings Such a descriptor is particularly useful for at least 2 things: 1. indicating that an abstractproperty is "implemented" on *instances* of a class 2. introspecting (on the class) all the attributes of instances of a class Alternatives: * "just use a property". As already noted, a property would work, but is somewhat cumbersome in the case of writable attributes. A non-data descriptor is a more natural fit in that case. * for #1, "just use a normal class attribute". This would mostly work. However, doing so effectively sets a default value, which you may not want. 
Furthermore, it may not be clear to readers of the code (or of help()) what the point of the class attr is. Thoughts? -eric [1] https://bitbucket.org/ericsnowcurrently/odds_and_ends/src/default/attribute.py [2] Where would Attr/attribute/attrs live in the stdlib? inspect? types? From greg.ewing at canterbury.ac.nz Tue Jan 21 07:46:12 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 21 Jan 2014 19:46:12 +1300 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> Message-ID: <52DE17B4.2090308@canterbury.ac.nz> Jonathan Slenders wrote: > @coroutine > def a(): > return (yield from b()) > > You could write it as: > > def a(): > return b() I'm guessing you mean def a(): return from b() but that wouldn't be a coroutine, because it doesn't contain a 'yield' anywhere. -- Greg From jonathan at slenders.be Tue Jan 21 08:27:52 2014 From: jonathan at slenders.be (Jonathan Slenders) Date: Tue, 21 Jan 2014 08:27:52 +0100 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: <52DE17B4.2090308@canterbury.ac.nz> References: <3426697229381222197@unknownmsgid> <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DE17B4.2090308@canterbury.ac.nz> Message-ID: No I didn't. Those examples that I wrote are equivalent, except that the second will miss a frame on the stack. 2014/1/21 Greg Ewing > Jonathan Slenders wrote: > > @coroutine >> def a(): >> return (yield from b()) >> >> You could write it as: >> >> def a(): >> return b() >> > > I'm guessing you mean > > def a(): > return from b() > > but that wouldn't be a coroutine, because it doesn't > contain a 'yield' anywhere. 
> > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Tue Jan 21 09:20:38 2014 From: flying-sheep at web.de (Philipp A.) Date: Tue, 21 Jan 2014 09:20:38 +0100 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: <52DDC679.1040406@trueblade.com> References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> <52DDC679.1040406@trueblade.com> Message-ID: you'll have to do quite a bit:

    # -*- coding: utf-8 -*-
    from __future__ import print_function, division, unicode_literals, absolute_import

    from io import open

    range = xrange
    str = unicode
    basestring = (str, bytes)  # for isinstance()

2014/1/21 Eric V. Smith > On 1/19/2014 10:40 PM, Bruce Leban wrote: > > I think the odds of Python getting > > > > from __future__ import pony > > > > are slightly higher than there being a Python 2.8. I assume by "pony" > > you really mean what I'd like to have: > > > > from __future__ import everything > > > > since my goal is to write Python 3 compatible code even though I'm > > temporarily stuck with Python 2 due to stack issues. The __future__ > > imports makes it easier to write forward compatible code. As it is, I > > have to list the individual imports in every file and I also add: > > > > range = xrange > > It's unfortunate we didn't add this (and all other changed builtins) to > future_builtins in 2.7. > > Eric.
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jan 21 09:27:45 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 21 Jan 2014 00:27:45 -0800 Subject: [Python-ideas] Create Python 2.8 as a transition step to Python 3.x In-Reply-To: References: <20140118032219.GA11381@python.ca> <20140119011332.GA5735@python.ca> <7wd2joaagr.fsf@benfinney.id.au> <20140119223357.178194112C@wycliff.ceplovi.cz> <85sisjwnn6.fsf@benfinney.id.au> <52DC6360.10709@stoneleaf.us> <20140120000645.GV3915@ando> <52DDC679.1040406@trueblade.com> Message-ID: <3C99F946-AF8E-4642-90ED-3625D42CA230@yahoo.com> On Jan 21, 2014, at 0:20, "Philipp A." wrote: > you'll have to do quite a bit: > > # -*- coding: utf-8 -*- > from __future__ import print_function, division, unicode_literals, absolute_import > > from io import open > > range = xrange > str = unicode > basestring = (str, bytes) #for isinstance() Plus importing imap and ifilter as map and filter, and renaming modules in some way rather than just builtins, and of course you have to wrap half of that in a try and/or if sys.version_info check, or it won't run in 3.x, which defeats the purpose... Which is why I create a project-specific module so I can just "from sixify import *" (along with the future statement, of course) at the top of every module, and it's all taken care of in two lines. > > 2014/1/21 Eric V. Smith >> On 1/19/2014 10:40 PM, Bruce Leban wrote: >> > I think the odds of Python getting >> > >> > from __future__ import pony >> > >> > are slightly higher than there being a Python 2.8.
I assume by "pony" >> > you really mean what I'd like to have: >> > >> > from __future__ import everything >> > >> > since my goal is to write Python 3 compatible code even though I'm >> > temporarily stuck with Python 2 due to stack issues. The __future__ >> > imports makes it easier to write forward compatible code. As it is, I >> > have to list the individual imports in every file and I also add: >> > >> > range = xrange >> >> It's unfortunate we didn't add this (and all other changed builtins) to >> future_builtins in 2.7. >> >> Eric. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Tue Jan 21 09:50:18 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Tue, 21 Jan 2014 00:50:18 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: <20140121023029.GB3915@ando> References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> <20140121023029.GB3915@ando> Message-ID: Doesn't *all(map(list_of_filters, value))* do what we want here, then? Maybe *imap* if you want the early bailout behavior. I'm all for having *all*, *map*, *any*, etc. be methods rather than top-level-functions (yay, less namespace pollution!), but if we're talking about a list of functions, it seems we can do exactly what we want very concisely using normal list- and function- operations. 
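The idiom this sub-thread converges on (apply every predicate in a list to one value and combine the results) can be sketched as follows. This is an illustrative sketch, not code from any participant: `is_hidden`, `is_backup` and `has_ext` are hypothetical stand-ins for the file-dialog filters Steven describes in the quoted message, and `all_of`, `any_of` and `passes_all` are made-up helper names (`all_of` is the same idea as David Mertz's `allP` combinator). Because `all()` and `any()` receive generators here, they stop calling predicates as soon as the answer is known.

```python
import os

def is_hidden(name):
    # Hypothetical filter: Unix-style dot files.
    return os.path.basename(name).startswith(".")

def is_backup(name):
    # Hypothetical filter: editor backup files such as "foo~".
    return name.endswith("~")

def has_ext(*exts):
    # Hypothetical filter factory: match any of the given extensions.
    return lambda name: os.path.splitext(name)[1].lstrip(".") in exts

def all_of(*preds):
    # True only if every predicate accepts the value; all() stops early.
    return lambda value: all(p(value) for p in preds)

def any_of(*preds):
    # True if at least one predicate accepts the value.
    return lambda value: any(p(value) for p in preds)

def passes_all(value, predicates):
    # The map() form: the function comes first, then the iterable.
    return all(map(lambda p: p(value), predicates))

visible_text = all_of(has_ext("txt", "zip"), lambda n: not is_hidden(n))

names = [".bashrc", "notes.txt", "notes.txt~", "archive.zip", ".secret.txt"]
print([n for n in names if visible_text(n)])   # ['notes.txt', 'archive.zip']
print(any_of(is_hidden, is_backup)("draft~"))  # True
```

Returning a named combined predicate, rather than writing the generator expression inline each time, makes it easy to reuse the same filter in a comprehension or an `if` test.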
On Mon, Jan 20, 2014 at 6:30 PM, Steven D'Aprano wrote: > On Mon, Jan 20, 2014 at 05:51:36PM -0800, Haoyi Li wrote: > > > What if you want to filter according to multiple conditions? > > > > What's wrong with > > > > lambda fname: func1(fname) and func2(fname) and func3(fname) > > That is a single compound condition, not multiple conditions. > > Think about a GUI application with a file selection dialog box, or a > search utility. You might offer a rich set of filters, all optional, all > selectable by the user at runtime: > > > [x] Hidden dot files .foo > [ ] Backup files foo~ > [x] File extensions: > [ ] Images > [x] Text files > [ ] Java code > [x] Custom: [ zip,tar,foo,bar,baz ] > [x] File owner: [ steve ] > [ ] Group: [ ] > [ ] Modified date between: [ ] and [ ] > > > etc. It's not practical to create one single giant filter function that > looks like this: > > def filter(name): > head, ext = os.path.splitext(name) > return ( > (show_hidden_dot_files and name.startswith('.')) > and (show_backup_tilde_files and name.endswith('~')) > and (show_images and ext in list_of_image_extensions) > and ... > ) > > > It would be a pain to maintain and extend, and testing would be > horrible. Better to have each setting provide a single filter function, > then combine the active filters into a list: > > def filter(name, list_of_filters): > for f in list_of_filters: > if not f(name): > return False > return True > > One might even use a class to represent the list of filters, and give it > "all" and "any" methods, and allow multiple lists to combine so you can > say things like: > > "show the file if *all* of these conditions are true, or if *any* of > these different conditions are true, but not if *any* of these > conditions are true" > > which of course is terribly overkill for a simple file selection dialog > box, but might be useful for a more complex search engine. 
> > None of this should be read as supporting the original request to add > PredicateSet into the standard library. But I encourage the OP to write > his own library and put it on PyPI. > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haoyi.sg at gmail.com Tue Jan 21 09:58:01 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Tue, 21 Jan 2014 00:58:01 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> <20140121023029.GB3915@ando> Message-ID: > *all(map(list_of_filters, value))* Scratch that, what I actually want is *all(map(lambda f: f(value), list_of_filters))* I always mix up the order of things going into *map* =( On Tue, Jan 21, 2014 at 12:50 AM, Haoyi Li wrote: > Doesn't > > *all(map(list_of_filters, value))* > > do what we want here, then? Maybe *imap* if you want the early bailout > behavior. > > I'm all for having *all*, *map*, *any*, etc. be methods rather than > top-level-functions (yay, less namespace pollution!), but if we're talking > about a list of functions, it seems we can do exactly what we want very > concisely using normal list- and function- operations. > > > On Mon, Jan 20, 2014 at 6:30 PM, Steven D'Aprano wrote: > >> On Mon, Jan 20, 2014 at 05:51:36PM -0800, Haoyi Li wrote: >> > > What if you want to filter according to multiple conditions? >> > >> > What's wrong with >> > >> > lambda fname: func1(fname) and func2(fname) and func3(fname) >> >> That is a single compound condition, not multiple conditions. >> >> Think about a GUI application with a file selection dialog box, or a >> search utility. 
You might offer a rich set of filters, all optional, all >> selectable by the user at runtime: >> >> >> [x] Hidden dot files .foo >> [ ] Backup files foo~ >> [x] File extensions: >> [ ] Images >> [x] Text files >> [ ] Java code >> [x] Custom: [ zip,tar,foo,bar,baz ] >> [x] File owner: [ steve ] >> [ ] Group: [ ] >> [ ] Modified date between: [ ] and [ ] >> >> >> etc. It's not practical to create one single giant filter function that >> looks like this: >> >> def filter(name): >> head, ext = os.path.splitext(name) >> return ( >> (show_hidden_dot_files and name.startswith('.')) >> and (show_backup_tilde_files and name.endswith('~')) >> and (show_images and ext in list_of_image_extensions) >> and ... >> ) >> >> >> It would be a pain to maintain and extend, and testing would be >> horrible. Better to have each setting provide a single filter function, >> then combine the active filters into a list: >> >> def filter(name, list_of_filters): >> for f in list_of_filters: >> if not f(name): >> return False >> return True >> >> One might even use a class to represent the list of filters, and give it >> "all" and "any" methods, and allow multiple lists to combine so you can >> say things like: >> >> "show the file if *all* of these conditions are true, or if *any* of >> these different conditions are true, but not if *any* of these >> conditions are true" >> >> which of course is terribly overkill for a simple file selection dialog >> box, but might be useful for a more complex search engine. >> >> None of this should be read as supporting the original request to add >> PredicateSet into the standard library. But I encourage the OP to write >> his own library and put it on PyPI. 
>> >> >> -- >> Steven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Jan 21 10:09:32 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 21 Jan 2014 11:09:32 +0200 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> <20140121023029.GB3915@ando> Message-ID: 21.01.14 10:58, Haoyi Li wrote: > > *all(map(list_of_filters, value))* > > Scratch that, what I actually want is > > *all(map(lambda f: f(value), list_of_filters))* > > I always mix up the order of things going into *map* =(

    all(f(value) for f in list_of_filters)

looks cleaner to me. Perhaps slightly more efficient (but much less readable) form:

    all(map(operator.methodcaller('__call__', value), list_of_filters))

From oscar.j.benjamin at gmail.com Tue Jan 21 11:36:19 2014 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 21 Jan 2014 10:36:19 +0000 Subject: [Python-ideas] return from (was Re: Tail recursion elimination) In-Reply-To: <52DE17B4.2090308@canterbury.ac.nz> References: <40960db6-c5fb-469c-91b4-74b7da3ccdda@email.android.com> <44823229-C2CA-4A96-8718-5B750994ED16@gmail.com> <52DE17B4.2090308@canterbury.ac.nz> Message-ID: <20140121103617.GB2632@gmail.com> On Tue, Jan 21, 2014 at 07:46:12PM +1300, Greg Ewing wrote: > Jonathan Slenders wrote: > > > @coroutine > > def a(): > > return (yield from b()) > > > >You could write it as: > > > > def a(): > > return b() > > I'm guessing you mean > > def a(): > return from b() > > but that wouldn't be a coroutine, because it doesn't > contain a 'yield' anywhere.
If b() is a generator/iterator then the second example removes the frame associated with a() from the stack while you iterate:

    for x in a():
        ...  # one less frame on the stack at this point

Oscar From ncoghlan at gmail.com Tue Jan 21 12:59:34 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Jan 2014 21:59:34 +1000 Subject: [Python-ideas] Add an attribute spec descriptor. In-Reply-To: References: Message-ID: In selling this idea, I would focus on the immediate impact it could have on "help(cls)", as well as the automated testing possibilities (checking all attributes are set on an instance). There's also the class-only descriptor behaviour we added for enums to consider, where retrieval via an instance throws AttributeError. Essentially - interesting idea, but one you can experiment with outside the stdlib :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jan 21 13:27:36 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 21 Jan 2014 04:27:36 -0800 Subject: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` In-Reply-To: <8AA6E510-1B8D-4E8B-9A6E-5CC0D1935E53@yahoo.com> References: <9aca6c85-f924-4adf-b205-a2acbf006bb1@googlegroups.com> <013F5951-85AC-4854-9915-D50E4A5319AF@yahoo.com> <8AA6E510-1B8D-4E8B-9A6E-5CC0D1935E53@yahoo.com> Message-ID: <2ACE6D36-2E08-4652-984E-39CB13AFBB21@yahoo.com> On Jan 21, 2014, at 4:20, Andrew Barnert wrote: > And this is very easy to solve: run the downloads on a thread pool, and as each one finishes, kick its post processing off to a process pool. Wait, that's stupid. Even simpler: just use a flat process pool of 2N for everything (or whatever multiplier is appropriate for your load--although often a downloader doesn't want to do more than about 4-12 simultaneous downloads, which is already below 2N on most modern computers...).
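Andrew's first suggestion (run the downloads on a thread pool and, as each one finishes, hand its post-processing to a process pool) can be sketched with the standard `concurrent.futures` API. This is an illustrative sketch, not the code he posted: `fake_download` and `post_process` are hypothetical stand-ins for real network I/O and CPU-bound work. The executors are passed in as parameters, so the same function can also be exercised with two thread pools.

```python
import time
from concurrent.futures import (ProcessPoolExecutor, ThreadPoolExecutor,
                                as_completed)

def fake_download(n):
    # Hypothetical stand-in for network I/O: sleep, then return a "payload".
    time.sleep(0.05)
    return n

def post_process(n):
    # Hypothetical stand-in for CPU-bound work: sum of squares below n.
    return sum(i * i for i in range(n))

def download_then_process(items, io_pool, cpu_pool):
    """Download items on io_pool; as each finishes, process it on cpu_pool."""
    downloads = {io_pool.submit(fake_download, item): item for item in items}
    processing = {}
    for fut in as_completed(downloads):
        # Start post-processing as soon as this particular download is done.
        processing[cpu_pool.submit(post_process, fut.result())] = downloads[fut]
    return {processing[fut]: fut.result() for fut in as_completed(processing)}

if __name__ == "__main__":
    # A few I/O threads plus one worker process per core (the default).
    with ThreadPoolExecutor(max_workers=8) as io_pool, \
         ProcessPoolExecutor() as cpu_pool:
        print(download_then_process([10, 1000, 100000], io_pool, cpu_pool))
```

Keeping the two pools separate lets each be sized for its own kind of load. Andrew's simpler alternative, a single flat `ProcessPoolExecutor` with roughly 2N workers that runs `fake_download` and `post_process` in one task, avoids the hand-off entirely.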
From abarnert at yahoo.com Tue Jan 21 13:29:42 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 21 Jan 2014 04:29:42 -0800 Subject: [Python-ideas] Fwd: Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` References: <8AA6E510-1B8D-4E8B-9A6E-5CC0D1935E53@yahoo.com> Message-ID: Sent from a random iPhone Begin forwarded message: > From: Andrew Barnert > Date: January 21, 2014, 4:20:19 PST > To: Ram Rachum > Cc: "python-ideas at googlegroups.com" > Subject: Re: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` > > On Jan 21, 2014, at 2:17, Ram Rachum wrote: > >> If you're writing code that needs to use both a lot of IO and a lot of CPU. For example, you're downloading many items from the internet and then doing post-processing on them. > > Yes, but in that case, how could a single executor with n processes and m threads help at all? You can only have one thread per process doing CPU work; they're still going to end up blocking each other. > > And this is very easy to solve: run the downloads on a thread pool, and as each one finishes, kick its post processing off to a process pool. > > But you should be able to build the two-tier pool in under half an hour, and then you can test to find applications where it really does or doesn't help. > >> On Tue, Jan 21, 2014 at 10:42 AM, Andrew Barnert wrote: >>> On Jan 17, 2014, at 5:00, Ram Rachum wrote: >>> >>> > Hi, >>> > >>> > I'd like to use `concurrent.futures.ProcessPoolExecutor` but have each process contain multiple worker threads. We could have an `n_threads` argument to the constructor, defaulting to 1 to maintain backward compatibility, and setting a value higher than 1 would cause multiple threads to be spawned in each process. >>> >>> What for? >>> >>> Generally you use processes because you can't use threads. 
Whether this is because you're running CPU-bound code that needs to get around the GIL, because you want complete isolation between tasks, because your platform doesn't support threads, or any other reason I can think of, you wouldn't want threads per process either. >>> >>> There are use cases for multiple processes of multiple threads, like running four independent IOCP-based servers (let them all try to use all your cores and let the kernel load balance among them), or isolated tasks with sharing-based subtasks... But those kinds of uses don't make sense in a single executor. >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jan 21 13:29:24 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 21 Jan 2014 04:29:24 -0800 Subject: [Python-ideas] Fwd: Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` References: <013F5951-85AC-4854-9915-D50E4A5319AF@yahoo.com> Message-ID: Google apparently ate this message, and the next one, so... Forwarding them. Apologies for the mess. Apparently you can't just reply to messages that arrive on the list via Google Groups? Sent from a random iPhone Begin forwarded message: > From: Andrew Barnert > Date: January 21, 2014, 0:42:11 PST > To: Ram Rachum > Cc: "python-ideas at googlegroups.com" > Subject: Re: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` > > On Jan 17, 2014, at 5:00, Ram Rachum wrote: > >> Hi, >> >> I'd like to use `concurrent.futures.ProcessPoolExecutor` but have each process contain multiple worker threads. We could have an `n_threads` argument to the constructor, defaulting to 1 to maintain backward compatibility, and setting a value higher than 1 would cause multiple threads to be spawned in each process. > > What for? > > Generally you use processes because you can't use threads. 
Whether this is because you're running CPU-bound code that needs to get around the GIL, because you want complete isolation between tasks, because your platform doesn't support threads, or any other reason I can think of, you wouldn't want threads per process either. > > There are use cases for multiple processes of multiple threads, like running four independent IOCP-based servers (let them all try to use all your cores and let the kernel load balance among them), or isolated tasks with sharing-based subtasks... But those kinds of uses don't make sense in a single executor. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Jan 21 15:37:29 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 21 Jan 2014 06:37:29 -0800 Subject: [Python-ideas] Add an attribute spec descriptor. In-Reply-To: References: Message-ID: <52DE8629.6060300@stoneleaf.us> On 01/20/2014 10:26 PM, Eric Snow wrote: > > Occasionally it would be useful to me to have a class attribute I can > use to represent an attribute that will exist on *instances* of the > class. Properties provide that to an extent, but they are data > descriptors which means they will not defer to like-named instance > attributes. However, a similar non-data descriptor would fit the > bill. Have you checked out Lib/types.py/DynamicClassAttribute ? It may be worth building on that. -- ~Ethan~ From rymg19 at gmail.com Tue Jan 21 18:45:42 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 21 Jan 2014 11:45:42 -0600 Subject: [Python-ideas] Tail Call Optimization -- natural? intuitive? In-Reply-To: <52DDFDEB.5090901@canterbury.ac.nz> References: <20140119004515.GP3915@ando> <2BBCA225-5EE0-440F-8771-6E422F43C2B0@gmail.com> <52DD12AF.2090309@gmail.com> <52DDFDEB.5090901@canterbury.ac.nz> Message-ID: If someone does that, they have more problems than one. 
On Mon, Jan 20, 2014 at 10:56 PM, Greg Ewing wrote: > On Mon, Jan 20, 2014 at 7:09 AM, Chris Angelico <rosuav at gmail.com> wrote: >> >> Note, by the way, that I'm not looking at anything involving backward >> scanning >> > > That would be for when you were reading your Bible > text backwards, looking for hidden Satanic references. > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan When your hammer is C++, everything begins to look like a thumb. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Tue Jan 21 19:39:20 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 21 Jan 2014 10:39:20 -0800 Subject: [Python-ideas] Predicate Sets In-Reply-To: References: <1390204609.18428.YahooMailNeo@web181005.mail.ne1.yahoo.com> <20140121010724.GA3915@ando> <20140121023029.GB3915@ando> Message-ID: Isn't that exactly what I suggested up-thread with my suggested small library of combinators? E.g.:

    def allP(*fns):
        return lambda x: all(f(x) for f in fns)

I like encapsulating it better since it encourages naming such combined functions, e.g.:

    this_and_that = allP(this, that)

I feel like that encourages reuse and readability when one later wants to write:

    set_with_predicate = {x for x in baseset if this_and_that(x)}

Or:

    if this_and_that(x): ...

On Tue, Jan 21, 2014 at 1:09 AM, Serhiy Storchaka wrote: > 21.01.14 10:58, Haoyi Li wrote: > >> > *all(map(list_of_filters, value))* >> >> >> Scratch that, what I actually want is >> >> *all(map(lambda f: f(value), list_of_filters))* >> >> I always mix up the order of things going into *map* =( >> > > all(f(value) for f in list_of_filters) > > looks cleaner to me.
> > Perhaps slightly more efficient (but much less readable) form: > > all(map(operator.methodcaller('__call__', value), list_of_filters)) > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jan 21 20:30:26 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 21 Jan 2014 11:30:26 -0800 (PST) Subject: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` In-Reply-To: References: <9aca6c85-f924-4adf-b205-a2acbf006bb1@googlegroups.com> <013F5951-85AC-4854-9915-D50E4A5319AF@yahoo.com> Message-ID: <1390332626.72340.YahooMailNeo@web181006.mail.ne1.yahoo.com> I slapped together a fork of concurrent/futures/process.py. It's named "procthreadex.py", and it just uses a ThreadPoolExecutor in the _process_worker function. You can get it at http://pastebin.com/Ba2KPYy3, and a test program skeleton at http://pastebin.com/ifwX6NaB. Maybe you can find a use case where ProcessThreadPoolExecutor(4, 4) outperforms ProcessPoolExecutor(16). (I haven't been able to.) >________________________________ > From: Ram Rachum >To: Andrew Barnert >Cc: "python-ideas at googlegroups.com" >Sent: Tuesday, January 21, 2014 2:17 AM >Subject: Re: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` > > > >If you're writing code that needs to use both a lot of IO and a lot of CPU.
For example, you're downloading many items from the internet and then doing post-processing on them. > > > >On Tue, Jan 21, 2014 at 10:42 AM, Andrew Barnert wrote: > >On Jan 17, 2014, at 5:00, Ram Rachum wrote: >> >>> Hi, >>> >>> I'd like to use `concurrent.futures.ProcessPoolExecutor` but have each process contain multiple worker threads. We could have an `n_threads` argument to the constructor, defaulting to 1 to maintain backward compatibility, and setting a value higher than 1 would cause multiple threads to be spawned in each process. >> >>What for? >> >>Generally you use processes because you can't use threads. Whether this is because you're running CPU-bound code that needs to get around the GIL, because you want complete isolation between tasks, because your platform doesn't support threads, or any other reason I can think of, you wouldn't want threads per process either. >> >>There are use cases for multiple processes of multiple threads, like running four independent IOCP-based servers (let them all try to use all your cores and let the kernel load balance among them), or isolated tasks with sharing-based subtasks... But those kinds of uses don't make sense in a single executor. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Tue Jan 21 20:34:17 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 21 Jan 2014 21:34:17 +0200 Subject: [Python-ideas] Add `n_threads` argument to `concurrent.futures.ProcessPoolExecutor` In-Reply-To: <1390332626.72340.YahooMailNeo@web181006.mail.ne1.yahoo.com> References: <9aca6c85-f924-4adf-b205-a2acbf006bb1@googlegroups.com> <013F5951-85AC-4854-9915-D50E4A5319AF@yahoo.com> <1390332626.72340.YahooMailNeo@web181006.mail.ne1.yahoo.com> Message-ID: Thanks for writing this Andrew! I think you're right, it doesn't really offer a performance advantage over using multiple processes, so I guess I should stick to ProcessPoolExecutor. 
Thanks for taking the time to write this! Ram. On Tue, Jan 21, 2014 at 9:30 PM, Andrew Barnert wrote: > I slapped together a fork of concurrent/futures/process.py. It's named > "procthreadex.py", and it just uses a ThreadPoolExecutor in the > _process_worker function. You can get it at http://pastebin.com/Ba2KPYy3, > and a test program skeleton at http://pastebin.com/ifwX6NaB. > > Maybe you can find a use case where ProcessThreadPoolExecutor(4, 4) > outperforms ProcessPoolExecutor(16). (I haven't been able to.) > > ------------------------------ > *From:* Ram Rachum > *To:* Andrew Barnert > *Cc:* "python-ideas at googlegroups.com" > *Sent:* Tuesday, January 21, 2014 2:17 AM > > *Subject:* Re: [Python-ideas] Add `n_threads` argument to > `concurrent.futures.ProcessPoolExecutor` > > If you're writing code that needs to use both a lot of IO and a lot of > CPU. For example, you're downloading many items from the internet and then > doing post-processing on them. > > > On Tue, Jan 21, 2014 at 10:42 AM, Andrew Barnert wrote: > > On Jan 17, 2014, at 5:00, Ram Rachum wrote: > > > Hi, > > > > I'd like to use `concurrent.futures.ProcessPoolExecutor` but have each > process contain multiple worker threads. We could have an `n_threads` > argument to the constructor, defaulting to 1 to maintain backward > compatibility, and setting a value higher than 1 would cause multiple > threads to be spawned in each process. > > What for? > > Generally you use processes because you can't use threads. Whether this is > because you're running CPU-bound code that needs to get around the GIL, > because you want complete isolation between tasks, because your platform > doesn't support threads, or any other reason I can think of, you wouldn't > want threads per process either. 
> > There are use cases for multiple processes of multiple threads, like > running four independent IOCP-based servers (let them all try to use all > your cores and let the kernel load balance among them), or isolated tasks > with sharing-based subtasks... But those kinds of uses don't make sense in > a single executor. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Tue Jan 21 21:17:30 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 21 Jan 2014 15:17:30 -0500 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: <1390335450.26805.73623317.1CE57BAE@webmail.messagingengine.com> On Fri, Jan 17, 2014, at 11:15, Chris Angelico wrote: > By that definition, a stable sort means that: > > lst = sorted((x,y)) > assert lst == [min(lst), max(lst)] > > will pass for any x and y. What definition of stable is this? Why not assert lst == [min(lst), max(lst[::-1])]? From rosuav at gmail.com Tue Jan 21 21:24:33 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 22 Jan 2014 07:24:33 +1100 Subject: [Python-ideas] Make max() stable In-Reply-To: <1390335450.26805.73623317.1CE57BAE@webmail.messagingengine.com> References: <20140117160604.GJ3915@ando> <1390335450.26805.73623317.1CE57BAE@webmail.messagingengine.com> Message-ID: On Wed, Jan 22, 2014 at 7:17 AM, wrote: > On Fri, Jan 17, 2014, at 11:15, Chris Angelico wrote: >> By that definition, a stable sort means that: >> >> lst = sorted((x,y)) >> assert lst == [min(lst), max(lst)] >> >> will pass for any x and y. > > What definition of stable is this? > Why not assert lst == [min(lst), max(lst[::-1])]? The OP's definition. 
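The two readings of "stable" being debated here can be checked directly. The current Python documentation for `min()` and `max()` says that, when several items tie, the first one encountered is returned. A stable `sorted()`, by contrast, keeps tied items in input order, so `sorted(...)[-1]` is the *last* tied item. The sketch below uses hypothetical data and a `key` function so that tied items remain distinguishable; with plain equal values such as `1` and `1.0`, the list comparison in the proposed invariant would still succeed because `1 == 1.0`.

```python
# Hypothetical data: pairs that tie on the sort key but are distinguishable.
items = [(1, "a"), (2, "b"), (1, "c"), (2, "d")]
key = lambda pair: pair[0]

ordered = sorted(items, key=key)   # stable: ties keep their input order
print(ordered)                     # [(1, 'a'), (1, 'c'), (2, 'b'), (2, 'd')]

print(min(items, key=key))         # (1, 'a'): first minimal item
print(max(items, key=key))         # (2, 'b'): first maximal item
print(ordered[-1])                 # (2, 'd'): last maximal item

# So [min(lst), max(lst)] need not match the ends of sorted(lst).
# The form that does agree with today's max() is a stable reverse sort:
print(sorted(items, key=key, reverse=True)[0])  # (2, 'b')
```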
ChrisA From random832 at fastmail.us Tue Jan 21 21:31:02 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 21 Jan 2014 15:31:02 -0500 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <52D9DA7D.5090307@nedbatchelder.com> <20140118081248.GN3915@ando> Message-ID: <1390336262.31187.73625561.7CDD574C@webmail.messagingengine.com> On Sat, Jan 18, 2014, at 5:40, Devin Jeanpierre wrote: > On Sat, Jan 18, 2014 at 12:12 AM, Steven D'Aprano > wrote: > > These variations only are meaningful if a and b are different types > > with the same value, or the same type but different identities. Even if > > these variations are important, I don't think there is any inherent > > benefit to one over the other. > > These variations are also important if a and b are just plain > different values, same type or no. This can happen if max/min are > passed a key function -- equality of a sort key doesn't mean the > values are interchangeable for all purposes I suspect you're getting hung up on two definitions of "value" - or maybe two definitions of "identity". Apropos of nothing, both functions will return NaN if it is the first element of the list, but not if it is in any other position. Of course, the behavior of sorting is also unreliable when faced with lists containing NaN. From mertz at gnosis.cx Tue Jan 21 22:02:05 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 21 Jan 2014 13:02:05 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: > > Imagine implementing min and max this way (ignoring key= and the > possibility of a single iterable arg): > > lst = sorted((x,y)) > assert lst == [min(lst), max(lst)] > > will pass for any x and y. > Well, that's not possible, of course, if one is willing to be slightly perverse: >>> @total_ordering ... class SomewhatOrdered(object): ... def __init__(self, val): ... self.val = val ... def __eq__(self, other): ... 
return self.val == other.val ... def __lt__(self, other): ... return (self.val, random()) < (other.val, random()) ... def __repr__(self): ... return repr(self.val) ... >>> x, y, z = map(SomewhatOrdered, (1, 1.0, 2)) But even if you were slightly less perverse than this, *sets* (and set-like collections) return elements in indeterminate order which the language does not guarantee. In particular, I do not think we are promised this holds: assert tuple(a)==tuple(b) if a==b else False I can certainly construct a class where that won't hold (i.e. a set-like class that iterates in a non-deterministic order; this need not even be perverse, e.g. if it is 'AsyncResultsSet' that gets its data from I/O source or parallel computations). I have a feeling I could find plain old Python sets that would fail that, but I'm not sure about it. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Tue Jan 21 22:11:58 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 21 Jan 2014 13:11:58 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: Slightly related, here's an invariant that I've wished would hold for a decade, but isn't likely to, even in Python 4: assert all(not x>> a = {1, 1+0j, 2} >>> b = {1+0j, 1, 2} >>> a {(1+0j), 2} >>> b {1, 2} >>> a == b True On Tue, Jan 21, 2014 at 1:02 PM, David Mertz wrote: > Imagine implementing min and max this way (ignoring key= and the >> possibility of a single iterable arg): >> >> lst = sorted((x,y)) >> assert lst == [min(lst), max(lst)] >> >> will pass for any x and y. 
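The behaviour argued about in this thread can be checked directly. The sketch below is a minimal illustration, not code from any poster, and it assumes CPython 3. It shows four things: max() and min() return the first of several equal candidates; a key function can make equal-keyed but different values visible; a NaN only "wins" when it is examined first; and a set keeps whichever of two equal, equal-hashing elements was stored first.

```python
import math

# max()/min() replace their current best only on a strict comparison,
# so among equal candidates the first one seen is returned.
items = [1, 1.0, True]            # all compare equal to each other
print(type(max(items)).__name__, type(min(items)).__name__)  # int int

# With a key function, equal keys do not mean interchangeable values.
words = ["pear", "plum", "kiwi"]  # every word has length 4
print(max(words, key=len), min(words, key=len))  # pear pear

# NaN compares false with everything, so it is only returned when it
# happens to be the first element examined.
nan = float("nan")
print(math.isnan(max([nan, 1.0, 2.0])))  # True
print(max([1.0, nan, 2.0]))              # 2.0
print(math.isnan(min([nan, 1.0, 2.0])))  # True
print(min([1.0, nan, 2.0]))              # 1.0

# 1, 1.0 and 1+0j are equal and hash alike, so a set treats them as one
# element; adding an equal element keeps the one already stored.
print(1 == 1.0 == 1 + 0j, hash(1) == hash(1.0) == hash(1 + 0j))  # True True
s = set()
s.add(1)
s.add(1.0)
s.add(1 + 0j)
print(s, type(next(iter(s))).__name__)  # {1} int
```

Set displays such as {1, 1+0j, 2} have not evaluated their elements in the same order in every CPython release, which is why the sketch uses explicit set.add() calls.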
URL: From mertz at gnosis.cx Tue Jan 21 22:15:28 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 21 Jan 2014 13:15:28 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: On Tue, Jan 21, 2014 at 1:11 PM, David Mertz wrote: > Slightly related, here's an invariant that I've wished would hold for a > decade, but isn't likely to, even in Python 4: > > assert all(not x Ooops, I meant: assert all(not x dictionaries are, IMO, too sloppy about that. That is, they behave exactly > as documented and as the BDFL has decreed, but I still feel uneasy about: > > >>> a = {1, 1+0j, 2} > >>> b = {1+0j, 1, 2} > >>> a > {(1+0j), 2} > >>> b > {1, 2} > >>> a == b > True > > > On Tue, Jan 21, 2014 at 1:02 PM, David Mertz wrote: > >> Imagine implementing min and max this way (ignoring key= and the >>> possibility of a single iterable arg): >>> >>> lst = sorted((x,y)) >>> assert lst == [min(lst), max(lst)] >>> >>> will pass for any x and y. >>> >> >> Well, that's not possible, of course, if one is willing to be slightly >> perverse: >> >> >>> @total_ordering >> ... class SomewhatOrdered(object): >> ... def __init__(self, val): >> ... self.val = val >> ... def __eq__(self, other): >> ... return self.val == other.val >> ... def __lt__(self, other): >> ... return (self.val, random()) < (other.val, random()) >> ... def __repr__(self): >> ... return repr(self.val) >> ... >> >>> x, y, z = map(SomewhatOrdered, (1, 1.0, 2)) >> >> But even if you were slightly less perverse than this, *sets* (and >> set-like collections) return elements in indeterminate order which the >> language does not guarantee. In particular, I do not think we are promised >> this holds: >> >> assert tuple(a)==tuple(b) if a==b else False >> >> I can certainly construct a class where that won't hold (i.e. a set-like >> class that iterates in a non-deterministic order; this need not even be >> perverse, e.g. 
if it is 'AsyncResultsSet' that gets its data from I/O >> source or parallel computations). >> >> I have a feeling I could find plain old Python sets that would fail that, >> but I'm not sure about it. >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Jan 21 22:21:50 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 22 Jan 2014 08:21:50 +1100 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: On Wed, Jan 22, 2014 at 8:11 AM, David Mertz wrote: > But this is just a question of inequality versus identity and that sets and > dictionaries are, IMO, too sloppy about that. That is, they behave exactly > as documented and as the BDFL has decreed, but I still feel uneasy about: > > >>> a = {1, 1+0j, 2} > >>> b = {1+0j, 1, 2} > >>> a > {(1+0j), 2} > >>> b > {1, 2} > >>> a == b > True This is because Python's made the decision that an int, a float, and a complex, representing the same number, should compare equal. 
I personally think they shouldn't (partly because it implies that they're all in some sort of tower, where the higher types can represent the lower types perfectly, and can perfectly represent that there's no further information - true of (float, complex) but not of (int, float), and it leads to problems with large integers), but it's a decision that's been made, and sets/dicts have to follow that. With small numbers, it just means that there's an identity-vs-value distinction (1 == 1.0 == 1+0j, but they're not is-identical), and sets have always had and will always have that issue. ChrisA From mertz at gnosis.cx Tue Jan 21 22:36:35 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 21 Jan 2014 13:36:35 -0800 Subject: [Python-ideas] Make max() stable In-Reply-To: References: <20140117160604.GJ3915@ando> Message-ID: Oh yeah, this has been my bête noire for a long time. I think I first mentioned this in 2003 at: https://mail.python.org/pipermail/python-list/2003-March/205446.html Then later in an IBM developerWorks article in 2005: http://gnosis.cx/publish/programming/charming_python_b25.html (the URL for the IBM version seems to have gone 404). I do know why things are as they are and how to work with them... but hey, at least it let me coin the phrase "Incomparable abominations" which I am still rather proud of. On Tue, Jan 21, 2014 at 1:21 PM, Chris Angelico wrote: > On Wed, Jan 22, 2014 at 8:11 AM, David Mertz wrote: > > But this is just a question of inequality versus identity and that sets > and > > dictionaries are, IMO, too sloppy about that. That is, they behave > exactly > > as documented and as the BDFL has decreed, but I still feel uneasy about: > > > > >>> a = {1, 1+0j, 2} > > >>> b = {1+0j, 1, 2} > > >>> a > > {(1+0j), 2} > > >>> b > > {1, 2} > > >>> a == b > > True > > This is because Python's made the decision that an int, a float, and a > complex, representing the same number, should compare equal.
I > personally think they shouldn't (partly because it implies that > they're all in some sort of tower, where the higher types can > represent the lower types perfectly, and can perfectly represent that > there's no further information - true of (float, complex) but not of > (int, float), and it leads to problems with large integers), but it's > a decision that's been made, and sets/dicts have to follow that. With > small numbers, it just means that there's an identity-vs-value > distinction (1 == 1.0 == 1+0j, but they're not is-identical), and sets > have always had and will always have that issue. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From musicdenotation at gmail.com Wed Jan 22 13:43:26 2014 From: musicdenotation at gmail.com (musicdenotation at gmail.com) Date: Wed, 22 Jan 2014 19:43:26 +0700 Subject: [Python-ideas] Multi-statement anonymous functions Message-ID: <2DF0D992-874B-4FFB-8F6D-9D6D3E6B7D42@gmail.com> 1. Mutable namespaces and variables are for computation processes like while or for loops. They are not for temporary variables (that is why classes and functions have their own scopes). 2. I want not to worry about name clashes. -------------- next part -------------- An HTML attachment was scrubbed... 
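The scoping points made above are that functions already get their own scope and that temporaries should not clash with outer names. In Python these are usually handled with a named local function rather than a multi-statement lambda. The sketch below is a minimal illustration with hypothetical names (normalise, scale). The local scale() and its temporary span exist only inside normalise(), so they leave the module-level name of the same spelling alone.

```python
def normalise(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)

    # A multi-statement helper that would not fit in a lambda. Its name
    # and its temporaries (span) live only in normalise()'s local scope.
    def scale(v):
        span = hi - lo
        if span == 0:
            return 0.0
        return (v - lo) / span

    return list(map(scale, values))   # passed where a lambda would go


scale = "module-level name, untouched"
print(normalise([2, 4, 6]))   # [0.0, 0.5, 1.0]
print(normalise([3, 3]))      # [0.0, 0.0]
print(scale)                  # module-level name, untouched
```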
From steve at pearwood.info Thu Jan 23 00:38:56 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 23 Jan 2014 10:38:56 +1100 Subject: [Python-ideas] Multi-statement anonymous functions In-Reply-To: <2DF0D992-874B-4FFB-8F6D-9D6D3E6B7D42@gmail.com> References: <2DF0D992-874B-4FFB-8F6D-9D6D3E6B7D42@gmail.com> Message-ID: <20140122233856.GG3915@ando> On Wed, Jan 22, 2014 at 07:43:26PM +0700, musicdenotation at gmail.com wrote: > 1. Mutable namespaces and variables are for computation processes like > while or for loops. They are not for temporary variables (that is why > classes and functions have their own scopes). > 2. I want not to worry about name clashes. You haven't quoted any context for these two points, so I don't really know how to interpret them. As far as point 1 goes, yes, I cautiously agree, but I don't understand your point, what you think that fact implies, or what relevance it has to the question of multi-statement lambda. As for point 2, I think everybody agrees that having to worry about name clashes is a bad thing. That's why modern programming languages like Python have multiple mechanisms for avoiding name clashes, e.g. functions and modules. To say nothing of the good ol' fashioned technique of using naming conventions to avoid name clashes in the same scope. If somebody *routinely* and *frequently* finds themselves having to worry about clashes, they are probably doing something wrong. But I really don't understand your point. If you think this is relevant to the proposal, you should explain the connection, not just drop cryptic observations on the list. -- Steven From suresh_vv at yahoo.com Thu Jan 23 08:20:04 2014 From: suresh_vv at yahoo.com (Suresh V.) Date: Thu, 23 Jan 2014 12:50:04 +0530 Subject: [Python-ideas] __before__ and __after__ attributes for functions Message-ID: Can we add these two attributes for every function/method where each is a list of callables with the same arguments as the function/method itself?
Pardon me if this has been discussed before. Pointers to past discussions (if any) appreciated. Suresh From rosuav at gmail.com Thu Jan 23 08:52:31 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 23 Jan 2014 18:52:31 +1100 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. wrote: > Can we add these two attributes for every function/method where each is a > list of callables with the same arguments as the function/method itself? > > Pardon me if this has been discussed before. Pointers to past discussions > (if any) appreciated. I'm not exactly sure what you're looking for here. What causes a callable to be added to a function's __before__ list, and/or what will be done with it? If you mean that they'll be called before and after the function itself, that can be more cleanly done with a decorator. ChrisA From suresh_vv at yahoo.com Thu Jan 23 09:11:07 2014 From: suresh_vv at yahoo.com (Suresh V.) Date: Thu, 23 Jan 2014 13:41:07 +0530 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: > On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. wrote: >> Can we add these two attributes for every function/method where each is a >> list of callables with the same arguments as the function/method itself? >> >> Pardon me if this has been discussed before. Pointers to past discussions >> (if any) appreciated. > > I'm not exactly sure what you're looking for here. What causes a > callable to be added to a function's __before__ list, and/or what will > be done with it? These are modifiable attributes, so something can be added/deleted from the __before__ or __after__ lists. > > If you mean that they'll be called before and after the function > itself, that can be more cleanly done with a decorator. Yes. 
Each item in the list will be called in order immediately before/after each invocation of the function. This is kinda like decorators, but more flexible and simpler. Scope for abuse may be higher too :-) Suresh From rosuav at gmail.com Thu Jan 23 09:20:44 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 23 Jan 2014 19:20:44 +1100 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. wrote: > On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: >> >> On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. wrote: >>> >>> Can we add these two attributes for every function/method where each is a >>> list of callables with the same arguments as the function/method itself? >>> >>> Pardon me if this has been discussed before. Pointers to past discussions >>> (if any) appreciated. >> >> >> I'm not exactly sure what you're looking for here. What causes a >> callable to be added to a function's __before__ list, and/or what will >> be done with it? > > > These are modifiable attributes, so something can be added/deleted from the > __before__ or __after__ lists. > > >> >> If you mean that they'll be called before and after the function >> itself, that can be more cleanly done with a decorator. > > > Yes. Each item in the list will be called in order immediately before/after > each invocation of the function. This is kinda like decorators, but more > flexible and simpler. 
Scope for abuse may be higher too :-) def prepostcall(func): def wrapper(*args,**kwargs): for f in wrapper.before: f(*args,**kwargs) ret = func(*args,**kwargs) for f in wrapper.after: f(*args,**kwargs) return ret wrapper.before = [] wrapper.after = [] return wrapper @prepostcall def foo(x,y,z): return x*y+z foo.before.append(lambda x,y,z: print("Pre-call")) foo.after.append(lambda x,y,z: print("Post-call")) Now just deal with the question of whether the after functions should be called if the wrapped function throws :) ChrisA From suresh_vv at yahoo.com Thu Jan 23 09:31:50 2014 From: suresh_vv at yahoo.com (Suresh V.) Date: Thu, 23 Jan 2014 14:01:50 +0530 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: Nicely done :-) "foo" may come from a library or something, so rather than a decorator we may have to monkey patch it. Unless there is a nicer solution. Will functools be a good place for something like this? On Thursday 23 January 2014 01:50 PM, Chris Angelico wrote: > On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. wrote: >> On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: >>> >>> On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. wrote: >>>> >>>> Can we add these two attributes for every function/method where each is a >>>> list of callables with the same arguments as the function/method itself? >>>> >>>> Pardon me if this has been discussed before. Pointers to past discussions >>>> (if any) appreciated. >>> >>> >>> I'm not exactly sure what you're looking for here. What causes a >>> callable to be added to a function's __before__ list, and/or what will >>> be done with it? >> >> >> These are modifiable attributes, so something can be added/deleted from the >> __before__ or __after__ lists. >> >> >>> >>> If you mean that they'll be called before and after the function >>> itself, that can be more cleanly done with a decorator. >> >> >> Yes. 
Each item in the list will be called in order immediately before/after >> each invocation of the function. This is kinda like decorators, but more >> flexible and simpler. Scope for abuse may be higher too :-) > > def prepostcall(func): > def wrapper(*args,**kwargs): > for f in wrapper.before: f(*args,**kwargs) > ret = func(*args,**kwargs) > for f in wrapper.after: f(*args,**kwargs) > return ret > wrapper.before = [] > wrapper.after = [] > return wrapper > > @prepostcall > def foo(x,y,z): > return x*y+z > > foo.before.append(lambda x,y,z: print("Pre-call")) > foo.after.append(lambda x,y,z: print("Post-call")) > > Now just deal with the question of whether the after functions should > be called if the wrapped function throws :) > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From aquavitae69 at gmail.com Thu Jan 23 09:52:07 2014 From: aquavitae69 at gmail.com (David Townshend) Date: Thu, 23 Jan 2014 10:52:07 +0200 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: Maybe I'm missing something, but what's the use case, and why aren't plain old decorators suitable? On Thu, Jan 23, 2014 at 10:31 AM, Suresh V. wrote: > Nicely done :-) > > "foo" may come from a library or something, so rather than a decorator we > may have to monkey patch it. Unless there is a nicer solution. > > Will functools be a good place for something like this? > > > On Thursday 23 January 2014 01:50 PM, Chris Angelico wrote: > >> On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. wrote: >> >>> On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: >>> >>>> >>>> On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. 
wrote: >>>> >>>>> >>>>> Can we add these two attributes for every function/method where each >>>>> is a >>>>> list of callables with the same arguments as the function/method >>>>> itself? >>>>> >>>>> Pardon me if this has been discussed before. Pointers to past >>>>> discussions >>>>> (if any) appreciated. >>>>> >>>> >>>> >>>> I'm not exactly sure what you're looking for here. What causes a >>>> callable to be added to a function's __before__ list, and/or what will >>>> be done with it? >>>> >>> >>> >>> These are modifiable attributes, so something can be added/deleted from >>> the >>> __before__ or __after__ lists. >>> >>> >>> >>>> If you mean that they'll be called before and after the function >>>> itself, that can be more cleanly done with a decorator. >>>> >>> >>> >>> Yes. Each item in the list will be called in order immediately >>> before/after >>> each invocation of the function. This is kinda like decorators, but more >>> flexible and simpler. Scope for abuse may be higher too :-) >>> >> >> def prepostcall(func): >> def wrapper(*args,**kwargs): >> for f in wrapper.before: f(*args,**kwargs) >> ret = func(*args,**kwargs) >> for f in wrapper.after: f(*args,**kwargs) >> return ret >> wrapper.before = [] >> wrapper.after = [] >> return wrapper >> >> @prepostcall >> def foo(x,y,z): >> return x*y+z >> >> foo.before.append(lambda x,y,z: print("Pre-call")) >> foo.after.append(lambda x,y,z: print("Post-call")) >> >> Now just deal with the question of whether the after functions should >> be called if the wrapped function throws :) >> >> > > > > ChrisA >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: 
http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 23 09:57:27 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Jan 2014 18:57:27 +1000 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On 23 Jan 2014 18:32, "Suresh V." wrote: > > Nicely done :-) > > "foo" may come from a library or something, so rather than a decorator we may have to monkey patch it. Unless there is a nicer solution. > > Will functools be a good place for something like this? Another idea along similar lines is the object model in Elk: http://frasertweedale.github.io/elk/ (that's a before/after/around subclass method model, designed specifically as an alternative to using super() to call up to the parent implementation). The main problem with the idea of doing this as a more general feature for arbitrary callables is that it has most of the same downsides as monkey-patching while being strictly less powerful and even more confusing (since it would be difficult to model clearly in tracebacks). Cheers, Nick. > > > On Thursday 23 January 2014 01:50 PM, Chris Angelico wrote: >> >> On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. wrote: >>> >>> On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: >>>> >>>> >>>> On Thu, Jan 23, 2014 at 6:20 PM, Suresh V. wrote: >>>>> >>>>> >>>>> Can we add these two attributes for every function/method where each is a >>>>> list of callables with the same arguments as the function/method itself? >>>>> >>>>> Pardon me if this has been discussed before. Pointers to past discussions >>>>> (if any) appreciated. >>>> >>>> >>>> >>>> I'm not exactly sure what you're looking for here. What causes a >>>> callable to be added to a function's __before__ list, and/or what will >>>> be done with it? 
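Chris left open whether the after functions should run when the wrapped function raises. One way to settle it is a variant of his prepostcall decorator that runs the after hooks in a try/finally block. The sketch below is a minimal illustration and not a proposal from the thread. It also adds functools.wraps so the wrapper keeps the wrapped function's name and docstring. The design choice is an assumption: after hooks always run and the original exception is re-raised. A hook that itself raises will replace that exception.

```python
import functools

def prepostcall(func):
    """Wrap func so that callables in wrapper.before / wrapper.after run
    around every call, each receiving the same arguments as func."""
    @functools.wraps(func)  # keep func's __name__, __doc__, etc.
    def wrapper(*args, **kwargs):
        for hook in wrapper.before:
            hook(*args, **kwargs)
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on normal return *and* when func raises; the original
            # exception then propagates (unless a hook itself raises).
            for hook in wrapper.after:
                hook(*args, **kwargs)
    wrapper.before = []
    wrapper.after = []
    return wrapper

@prepostcall
def foo(x, y, z):
    return x * y + z

calls = []
foo.before.append(lambda x, y, z: calls.append(("before", x, y, z)))
foo.after.append(lambda x, y, z: calls.append(("after", x, y, z)))
print(foo(2, 3, 4))   # 10
print(calls)          # [('before', 2, 3, 4), ('after', 2, 3, 4)]

@prepostcall
def fails():
    raise ValueError("boom")

fails.after.append(lambda: calls.append(("after-failure",)))
try:
    fails()
except ValueError:
    pass
print(calls[-1])      # ('after-failure',)
```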
From p.f.moore at gmail.com Thu Jan 23 10:08:32 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 23 Jan 2014 09:08:32 +0000 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On 23 January 2014 08:57, Nick Coghlan wrote: > The main problem with the idea of doing this as a more general feature for > arbitrary callables is that it has most of the same downsides as > monkey-patching while being strictly less powerful and even more confusing > (since it would be difficult to model clearly in tracebacks). Also, this would add overhead to all function calls (even if no before/after functions exist, checking the lists has a small cost) and function call overhead is already higher than many people would like. Paul From suresh_vv at yahoo.com Thu Jan 23 10:17:50 2014 From: suresh_vv at yahoo.com (Suresh V.) Date: Thu, 23 Jan 2014 14:47:50 +0530 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On Thursday 23 January 2014 02:22 PM, David Townshend wrote: > Maybe I'm missing something, but what's the use case, and why aren't > plain old decorators suitable? Maybe they are. Let us say I want to alter the way the smtplib.SMTP.sendmail method works. I would like it to call a function that I define. I can then add this function to the __before__ attribute of this library function. Can this be done with decorators? > > > On Thu, Jan 23, 2014 at 10:31 AM, Suresh V. > > wrote: > > Nicely done :-) > > "foo" may come from a library or something, so rather than a > decorator we may have to monkey patch it. Unless there is a nicer > solution. > > Will functools be a good place for something like this? > > > On Thursday 23 January 2014 01:50 PM, Chris Angelico wrote: > > On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. > > wrote: > > On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: > > > On Thu, Jan 23, 2014 at 6:20 PM, Suresh V.
> > wrote: > > > Can we add these two attributes for every > function/method where each is a > list of callables with the same arguments as the > function/method itself? > > Pardon me if this has been discussed before. > Pointers to past discussions > (if any) appreciated. > > > > I'm not exactly sure what you're looking for here. What > causes a > callable to be added to a function's __before__ list, > and/or what will > be done with it? > > > > These are modifiable attributes, so something can be > added/deleted from the > __before__ or __after__ lists. > > > > If you mean that they'll be called before and after the > function > itself, that can be more cleanly done with a decorator. > > > > Yes. Each item in the list will be called in order > immediately before/after > each invocation of the function. This is kinda like > decorators, but more > flexible and simpler. Scope for abuse may be higher too :-) > > > def prepostcall(func): > def wrapper(*args,**kwargs): > for f in wrapper.before: f(*args,**kwargs) > ret = func(*args,**kwargs) > for f in wrapper.after: f(*args,**kwargs) > return ret > wrapper.before = [] > wrapper.after = [] > return wrapper > > @prepostcall > def foo(x,y,z): > return x*y+z > > foo.before.append(lambda x,y,z: print("Pre-call")) > foo.after.append(lambda x,y,z: print("Post-call")) > > Now just deal with the question of whether the after functions > should > be called if the wrapped function throws :) > > > > > > ChrisA > _________________________________________________ > Python-ideas mailing list > Python-ideas at python.org > > https://mail.python.org/__mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/__codeofconduct/ > > > > > _________________________________________________ > Python-ideas mailing list > Python-ideas at python.org > > https://mail.python.org/__mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/__codeofconduct/ > > > > > > _______________________________________________ > 
Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From aquavitae69 at gmail.com Thu Jan 23 10:27:55 2014 From: aquavitae69 at gmail.com (David Townshend) Date: Thu, 23 Jan 2014 11:27:55 +0200 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: Message-ID: On Thu, Jan 23, 2014 at 11:17 AM, Suresh V. wrote: > On Thursday 23 January 2014 02:22 PM, David Townshend wrote: > >> Maybe I'm missing something, but what's the use case, and why aren't >> plain old decorators suitable? >> > > Maybe they are. > > Let us say I want to alter the way the smtplib.SMTP.sendmail method works. > I would like it to call a function that I define. I can then add this > function to the __before__ attribute of this library function. > > Can this be done with decorators? > Not a decorator, but you can monkey patch it:

    import smtplib
    from functools import wraps

    # Keep a reference to the unpatched method: calling
    # smtplib.SMTP.sendmail inside the wrapper would find the
    # patched version and recurse forever.
    _original_sendmail = smtplib.SMTP.sendmail

    @wraps(_original_sendmail)
    def sendmail(*args, **kwargs):
        other_function()
        return _original_sendmail(*args, **kwargs)

    smtplib.SMTP.sendmail = sendmail

But I still don't see a good reason for using __before__ rather than the above, other than slightly less typing. In a specific project there might be a lot of this going on and brevity would be justifiable, but in that case writing your own decorator is easy enough. > >> >> On Thu, Jan 23, 2014 at 10:31 AM, Suresh V. >> > > wrote: >> >> Nicely done :-) >> >> "foo" may come from a library or something, so rather than a >> decorator we may have to monkey patch it. Unless there is a nicer >> solution. >> >> Will functools be a good place for something like this? >> >> >> On Thursday 23 January 2014 01:50 PM, Chris Angelico wrote: >> >> On Thu, Jan 23, 2014 at 7:11 PM, Suresh V. >> > > wrote: >> >> On Thursday 23 January 2014 01:22 PM, Chris Angelico wrote: >> >> >> On Thu, Jan 23, 2014 at 6:20 PM, Suresh V.
>> > > wrote: >> >> >> Can we add these two attributes for every >> function/method where each is a >> list of callables with the same arguments as the >> function/method itself? >> >> Pardon me if this has been discussed before. >> Pointers to past discussions >> (if any) appreciated. >> >> >> >> I'm not exactly sure what you're looking for here. What >> causes a >> callable to be added to a function's __before__ list, >> and/or what will >> be done with it? >> >> >> >> These are modifiable attributes, so something can be >> added/deleted from the >> __before__ or __after__ lists. >> >> >> >> If you mean that they'll be called before and after the >> function >> itself, that can be more cleanly done with a decorator. >> >> >> >> Yes. Each item in the list will be called in order >> immediately before/after >> each invocation of the function. This is kinda like >> decorators, but more >> flexible and simpler. Scope for abuse may be higher too :-) >> >> >> def prepostcall(func): >> def wrapper(*args,**kwargs): >> for f in wrapper.before: f(*args,**kwargs) >> ret = func(*args,**kwargs) >> for f in wrapper.after: f(*args,**kwargs) >> return ret >> wrapper.before = [] >> wrapper.after = [] >> return wrapper >> >> @prepostcall >> def foo(x,y,z): >> return x*y+z >> >> foo.before.append(lambda x,y,z: print("Pre-call")) >> foo.after.append(lambda x,y,z: print("Post-call")) >> >> Now just deal with the question of whether the after functions >> should >> be called if the wrapped function throws :) >> >> >> >> >> >> ChrisA >> _________________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> >> https://mail.python.org/__mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/__codeofconduct/ >> >> >> >> >> _________________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> >> https://mail.python.org/__mailman/listinfo/python-ideas >> >> Code of Conduct: 
http://python.org/psf/__codeofconduct/ >> >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Thu Jan 23 10:32:19 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 23 Jan 2014 18:32:19 +0900 Subject: [Python-ideas] Multi-statement anonymous functions In-Reply-To: <2DF0D992-874B-4FFB-8F6D-9D6D3E6B7D42@gmail.com> References: <2DF0D992-874B-4FFB-8F6D-9D6D3E6B7D42@gmail.com> Message-ID: <878uu782n0.fsf@uwakimon.sk.tsukuba.ac.jp> musicdenotation at gmail.com writes: > 1. Mutable namespaces and variables are for computation processes > like while or for loops. They are not for temporary variables (that > is why classes and functions have their own scopes).2. I want not > to worry about name clashes. Most of the things you have proposed in recent weeks have long since been shot down to my knowledge, and I wouldn't be surprised to find that the rest are dead on arrival, too. And we have already heard all the standard arguments *for*, and the people who make the decisions weren't impressed then -- they had sufficient arguments *against*. They're pretty consistent about not having their minds changed by neutrino strikes, too, so, no chance of random reversal. That doesn't mean these issues *can't* be re-raised. It does mean people are going to lose patience with you if you don't bring answers for at least some of the issues that got the ideas shot down in the past with you. Generic arguments in favor don't cut it for rejected ideas. 
And if you don't know what those issues are, strictly speaking, asking is off-topic here (belongs on python-list).

I think the most successful radical in recent months has been Haoyi. Grep the archives for his posts (including a proposal for multistatement lambdas, IIRC, and another for "macros"). They are exemplary as to the style you should bring to re-raising a defeated proposal. (Nor do you have to beat Haoyi's standard. Just look at them, they are, as I say, "exemplary". Note that AFAIK he hasn't actually *won* one yet, but he's certainly got the Powers-That-Be thinking seriously about his proposals.)

From suresh_vv at yahoo.com Thu Jan 23 10:35:26 2014
From: suresh_vv at yahoo.com (Suresh V.)
Date: Thu, 23 Jan 2014 15:05:26 +0530
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Thursday 23 January 2014 02:27 PM, Nick Coghlan wrote:
>
> On 23 Jan 2014 18:32, "Suresh V." wrote:
> >
> > Nicely done :-)
> >
> > "foo" may come from a library or something, so rather than a
> > decorator we may have to monkey patch it. Unless there is a nicer solution.
> >
> > Will functools be a good place for something like this?
>
> Another idea along similar lines is the object model in Elk:
> http://frasertweedale.github.io/elk/ (that's a before/after/around
> subclass method model, designed specifically as an alternative to using
> super() to call up to the parent implementation).

Thanks for the link. Has some interesting ideas.

>
> The main problem with the idea of doing this as a more general feature
> for arbitrary callables is that it has most of the same downsides as
> monkey-patching while being strictly less powerful and even more
> confusing (since it would be difficult to model clearly in tracebacks).
>

While being less powerful than monkey patching, it offers a more disciplined way by just adding before/after functionality.
I don't see the problem with tracebacks; they just list the before/after function, which is like any other function.

From suresh_vv at yahoo.com Thu Jan 23 10:52:56 2014
From: suresh_vv at yahoo.com (Suresh V.)
Date: Thu, 23 Jan 2014 15:22:56 +0530
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Thursday 23 January 2014 02:57 PM, David Townshend wrote:
>
> Not a decorator, but you can monkey patch it:
>
> _original_sendmail = smtplib.SMTP.sendmail
>
> @wraps(_original_sendmail)
> def sendmail(*args, **kwargs):
>     other_function()
>     return _original_sendmail(*args, **kwargs)
>
> smtplib.SMTP.sendmail = sendmail
>

Correct. I want to say something like:

    from functools import prepostcall
    smtplib.SMTP.sendmail = prepostcall(smtplib.SMTP.sendmail)
    smtplib.SMTP.sendmail.before.append(other_function)

This seems less error-prone. And more conducive to multiple patching.

From rosuav at gmail.com Thu Jan 23 10:58:05 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 23 Jan 2014 20:58:05 +1100
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Thu, Jan 23, 2014 at 8:52 PM, Suresh V. wrote:
> Correct. I want to say something like:
>
> from functools import prepostcall
> smtplib.SMTP.sendmail = prepostcall(smtplib.SMTP.sendmail)
> smtplib.SMTP.sendmail.before.append(other_function)
>
> This seems less error-prone. And more conducive to multiple patching.

Easy. Just replace the import statement with the def that I gave above, and then it works. Or make your own module of "handy stuff" and use that. Not everything has to be in the stdlib :)

ChrisA

From tjreedy at udel.edu Thu Jan 23 11:56:16 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 23 Jan 2014 05:56:16 -0500
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On 1/23/2014 3:31 AM, Suresh V.
wrote:

Top-posting makes posts/threads somewhat harder for readers to follow.

A decorator is simply a function, named before a function def, that is called on the resulting function after the function is defined. In other words, it is purely syntactic sugar, and

    @prepostcall
    def foo(x,y,z):
        return x*y+z

is equivalent to

    def foo(...
    foo = prepostcall(foo)

For builtins, call the decorator function directly on the builtin. In other words, use the last line of the equivalent:

    int = prepostcall(int)

or use another name if you do not want to mask int.

> Nicely done :-)
>
> "foo" may come from a library or something, so rather than a decorator
> we may have to monkey patch it. Unless there is a nicer solution.
>
> Will functools be a good place for something like this?
--
Terry Jan Reedy

From solipsis at pitrou.net Thu Jan 23 16:08:36 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Jan 2014 16:08:36 +0100
Subject: [Python-ideas] __before__ and __after__ attributes for functions
Message-ID: <20140123160836.3452d8ad@fsol>

On Thu, 23 Jan 2014 14:01:50 +0530
"Suresh V." wrote:
> Nicely done :-)
>
> "foo" may come from a library or something, so rather than a decorator
> we may have to monkey patch it. Unless there is a nicer solution.
>
> Will functools be a good place for something like this?

If you think this is interesting (for contract-based programming perhaps?), I suggest it should first go into a library uploaded to PyPI, so people can play with it and you can refine the API.
Note that you could tweak Chris' implementation to be able to write instead:

    @prepostcall
    def foo(x,y,z):
        return x*y+z

    @foo.before
    def foo_precond(x, y, z):
        print("Pre-call")

    @foo.after
    def foo_postcond(x, y, z):
        # XXX should the "after" function also receive the return value?
        print("Post-call")

Regards

Antoine.

From rosuav at gmail.com Thu Jan 23 16:12:41 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 24 Jan 2014 02:12:41 +1100
Subject: [Python-ideas] __before__ and __after__ attributes for functions
In-Reply-To: <20140123160836.3452d8ad@fsol>
References: <20140123160836.3452d8ad@fsol>

On Fri, Jan 24, 2014 at 2:08 AM, Antoine Pitrou wrote:
> # XXX should the "after" function also receive the return value?

That's a possible consideration, but it messes up the "has the same arguments" bit. Plus, what happens to the after function(s) if the main function throws an error? (And what happens to the main if a before function bombs?) Very hard to solve in the general case, which is a good reason for this NOT to go into the stdlib, but just to be implemented whenever it's wanted.

ChrisA

From abarnert at yahoo.com Thu Jan 23 19:10:33 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 23 Jan 2014 10:10:33 -0800
Subject: [Python-ideas] __before__ and __after__ attributes for functions
References: <20140123160836.3452d8ad@fsol>

On Jan 23, 2014, at 7:12, Chris Angelico wrote:

> On Fri, Jan 24, 2014 at 2:08 AM, Antoine Pitrou wrote:
>> # XXX should the "after" function also receive the return value?
>
> That's a possible consideration, but it messes up the "has the same
> arguments" bit. Plus, what happens to the after function(s) if the
> main function throws an error? (And what happens to the main if a
> before function bombs?) Very hard to solve in the general case, which
> is a good reason for this NOT to go into the stdlib, but just to be
> implemented whenever it's wanted.
There _might_ be good, usually-right answers to these questions. But the only way we're likely to find them is if someone puts it up on PyPI and people start using it, not by guessing a priori. Which is another good reason not to go straight for the stdlib.

And a PyPI module can go crazy with options: have after functions that do or don't get the result based on an arg to the decorator, and that do or don't replace the result, and before functions that can return replacement args, and after_except functions that run on exception, get the exception, and can raise or return (think of deferred chaining options), or whatever else you can think of.

From mertz at gnosis.cx Thu Jan 23 19:14:39 2014
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 23 Jan 2014 10:14:39 -0800
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Thu, Jan 23, 2014 at 12:31 AM, Suresh V. wrote:
> Nicely done :-)
> "foo" may come from a library or something, so rather than a decorator we
> may have to monkey patch it. Unless there is a nicer solution.
> Will functools be a good place for something like this?

Not really monkey patching. Just:

    from library import foo
    @prepostcall
    def foo(*args, **kws):
        return foo(*args, **kws)

It's just rebinding the name 'foo' with the decorator.
--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
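The questions left open in this thread (whether after-hooks should run when the wrapped function raises, and how to rebind a name without the wrapper calling itself) can be made concrete with a small sketch. It is a hypothetical variant of the prepostcall decorator discussed here, not a stdlib or PyPI API. Assumptions: before- and after-hooks receive the same arguments as the wrapped call, after-hooks run in a `finally` block so they fire even when the call raises, and `Mailer` is an invented stand-in for `smtplib.SMTP` so the example runs without a mail server.

```python
import functools


def prepostcall(func):
    """Run the callables in wrapper.before / wrapper.after around each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for hook in wrapper.before:
            hook(*args, **kwargs)
        try:
            # `func` is the object captured when the decorator ran, so
            # rebinding the public name later cannot make this recurse.
            return func(*args, **kwargs)
        finally:
            # Assumption: after-hooks run even if func raised.
            for hook in wrapper.after:
                hook(*args, **kwargs)

    wrapper.before = []
    wrapper.after = []
    return wrapper


class Mailer:
    """Hypothetical stand-in for smtplib.SMTP."""

    def send(self, to, body):
        return f"sent to {to}"


log = []

# Monkey patch the class attribute: existing callers of Mailer().send
# pick up the hooks without changing their own code.
Mailer.send = prepostcall(Mailer.send)
Mailer.send.before.append(lambda self, to, body: log.append(("before", to)))
Mailer.send.after.append(lambda self, to, body: log.append(("after", to)))

result = Mailer().send("a@example.com", "hello")


@prepostcall
def divide(a, b):
    return a / b


divide.after.append(lambda a, b: log.append(("after-divide", a, b)))
try:
    divide(1, 0)
except ZeroDivisionError:
    pass  # the after-hook still ran, because it sits in a finally block
```

Capturing `func` in the closure is what avoids the recursion in the `def foo(...): return foo(...)` pattern. Whether hooks should also see the return value or the exception, as Antoine and Andrew ask, is deliberately left out; passing either would change the hook signature.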
From rosuav at gmail.com Thu Jan 23 19:17:28 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 24 Jan 2014 05:17:28 +1100
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Fri, Jan 24, 2014 at 5:14 AM, David Mertz wrote:
> from library import foo
> @prepostcall
> def foo(*args, **kws):
>     return foo(*args, **kws)

That's going to infinite-loop, so you'd need to do an 'as' import:

    from library import foo as foo_original
    @prepostcall
    def foo(*args, **kws):
        return foo_original(*args, **kws)

Of course, this assumes you want to do a 'from' import in the first place, rather than the more common approach of referencing 'library.foo()' - if the latter, then it is monkeypatching you need.

ChrisA

From mertz at gnosis.cx Thu Jan 23 19:31:45 2014
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 23 Jan 2014 10:31:45 -0800
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Thu, Jan 23, 2014 at 10:17 AM, Chris Angelico wrote:
> That's going to infinite-loop, so you'd need to do an 'as' import:
>
>     from library import foo as foo_original
>     @prepostcall
>     def foo(*args, **kws):
>         return foo_original(*args, **kws)
>
> Of course, this assumes you want to do a 'from' import in the first
> place, rather than the more common approach of referencing
> 'library.foo()' - if the latter, then it is monkeypatching you need.

All true. For some reason I was thinking of the timing of the binding wrongly re. the infinite-loop. But yes, obviously using a different name in an 'as' import solves that.
From ethan at stoneleaf.us Fri Jan 24 00:06:08 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 23 Jan 2014 15:06:08 -0800
Subject: [Python-ideas] __before__ and __after__ attributes for functions
Message-ID: <52E1A060.6030409@stoneleaf.us>

On 01/23/2014 12:31 AM, Suresh V. wrote:
>
> Will functools be a good place for something like this?

PyPI is a good place for this. If it does well there, and stabilizes, /maybe/ it will get into the stdlib.

--
~Ethan~

From suresh_vv at yahoo.com Fri Jan 24 05:09:36 2014
From: suresh_vv at yahoo.com (Suresh V.)
Date: Fri, 24 Jan 2014 09:39:36 +0530
Subject: [Python-ideas] __before__ and __after__ attributes for functions

On Friday 24 January 2014 12:01 AM, David Mertz wrote:
> On Thu, Jan 23, 2014 at 10:17 AM, Chris Angelico wrote:
> > from library import foo as foo_original
> > @prepostcall
> > def foo(*args, **kws):
> >     return foo_original(*args, **kws)
>
> All true. For some reason I was thinking of the timing of the binding
> wrongly re. the infinite-loop. But yes, obviously using a different name
> in an 'as' import solves that.

Also it would mean that the client code imports from this package. I would like client code to remain exactly as it is (continue to import from its original package) but the behavior is enhanced once this package is imported on startup.

From ethan at stoneleaf.us Fri Jan 24 06:09:02 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 23 Jan 2014 21:09:02 -0800
Subject: [Python-ideas] __before__ and __after__ attributes for functions
Message-ID: <52E1F56E.2030805@stoneleaf.us>

On 01/23/2014 08:09 PM, Suresh V. wrote:
>
> Also it would mean that the client code imports from this package.
> I would like client code to remain exactly as it is (continue to
> import from its original package) but the behavior is enhanced
> once this package is imported on startup.

/Something/ has to adjust the pre and post conditions -- if not the client code, then what?
--
~Ethan~

From rosuav at gmail.com Fri Jan 24 08:10:06 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 24 Jan 2014 18:10:06 +1100
Subject: [Python-ideas] __before__ and __after__ attributes for functions
In-Reply-To: <52E1F56E.2030805@stoneleaf.us>
References: <52E1F56E.2030805@stoneleaf.us>

On Fri, Jan 24, 2014 at 4:09 PM, Ethan Furman wrote:
> On 01/23/2014 08:09 PM, Suresh V. wrote:
>>
>> Also it would mean that the client code imports from this package.
>> I would like client code to remain exactly as it is (continue to
>> import from its original package) but the behavior is enhanced
>> once this package is imported on startup.
>
> /Something/ has to adjust the pre and post conditions -- if not the client
> code, then what?

    # foo.py:
    import blah
    blah.quux()

    # bar.py:
    import blah
    blah.quux.__before__.append(......)
    import foo

With code like that, modifying/rebinding the 'quux' inside bar.py won't affect what happens when foo is imported, ergo monkeypatching the blah module is key.

ChrisA

From suresh_vv at yahoo.com Fri Jan 24 08:54:07 2014
From: suresh_vv at yahoo.com (Suresh V.)
Date: Fri, 24 Jan 2014 13:24:07 +0530
Subject: [Python-ideas] __before__ and __after__ attributes for functions
In-Reply-To: <52E1F56E.2030805@stoneleaf.us>
References: <52E1F56E.2030805@stoneleaf.us>

On Friday 24 January 2014 10:39 AM, Ethan Furman wrote:
> /Something/ has to adjust the pre and post conditions -- if not the
> client code, then what?

Pre and post conditions are just one possible use of this.

Going back to my smtplib.SMTP.sendmail example: no changes in the bulk of the client code, and a single patch module imported in main.
client.py (no changes):

    from smtplib import SMTP
    def send_email():
        SMTP.sendmail(...)

patch.py (new module):

    from smtplib import SMTP
    from prepost import prepostcall
    SMTP.sendmail = prepostcall(SMTP.sendmail)
    def my_other_func(*args, **kwargs):  # before-hooks receive sendmail's arguments
        pass
    SMTP.sendmail.before.insert(0, my_other_func)

main.py (single line modification):

    import patch  # new code
    import client
    client.send_email()

From rosuav at gmail.com Fri Jan 24 12:32:20 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 24 Jan 2014 22:32:20 +1100
Subject: [Python-ideas] __before__ and __after__ attributes for functions
References: <52E1F56E.2030805@stoneleaf.us>

On Fri, Jan 24, 2014 at 6:54 PM, Suresh V. wrote:
> patch.py (new module)
>
> from smtplib import SMTP
> from prepost import prepostcall
> SMTP.sendmail = prepostcall(SMTP.sendmail)
> def my_other_func(*args, **kwargs):
>     pass
> SMTP.sendmail.before.insert(0, my_other_func)
>
> main.py (single line modification)
>
> import patch # new code
> import client
> client.send_email()

This will work, as long as you do this before any code gets loaded that does "from smtplib.SMTP import sendmail". (The style you use here would work fine, though.)

But remember the old adage: With great power comes great responsibility. [1] If the mere importing of another module causes a drastic change in something in the standard library, you risk confusing all sorts of debugging efforts. Stick to really REALLY simple functions, be absolutely sure they're not going to change anything, and for the love of sanity, do NOT mutate any of the arguments. Don't do this:

    def my_other_func(from_addr, to_addrs, *otherargs):
        to_addrs.append("secret_bcc at some.domain.com")

unless you have a strong desire to be brutally murdered by someone who's just spent three hours trying to find out why his mail is going crazy.

ChrisA

[1] Or was it something about current?
http://xkcd.com/643/

From ram.rachum at gmail.com Fri Jan 24 17:47:14 2014
From: ram.rachum at gmail.com (Ram Rachum)
Date: Fri, 24 Jan 2014 08:47:14 -0800 (PST)
Subject: [Python-ideas] str.rreplace
Message-ID: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com>

I propose implementing str.rreplace. (It'll be to str.replace what str.rsplit is to str.split.)

What do you think?

From solipsis at pitrou.net Fri Jan 24 17:56:45 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 24 Jan 2014 17:56:45 +0100
Subject: [Python-ideas] str.rreplace
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com>
Message-ID: <20140124175645.66bb8daf@fsol>

On Fri, 24 Jan 2014 08:47:14 -0800 (PST)
Ram Rachum wrote:
> I propose implementing str.rreplace. (It'll be to str.replace what
> str.rsplit is to str.split.)

I suppose it only differs when the count parameter is supplied?

I don't think it can hurt, except for the funny looks of its name. In any case, if str.rreplace is added then so should bytes.rreplace and bytearray.rreplace.

Regards

Antoine.

From ram at rachum.com Fri Jan 24 18:00:05 2014
From: ram at rachum.com (Ram Rachum)
Date: Fri, 24 Jan 2014 19:00:05 +0200
Subject: [Python-ideas] str.rreplace
In-Reply-To: <20140124175645.66bb8daf@fsol>
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol>

Yep, it differs only when count is supplied.

Yep, bytes.rreplace and bytearray.rreplace are par for the course :)

And yes, the name is annoying, but what can you do? Plus now that I think about it the first two letters happen to be my initials, so I suggest I should be happy :)

On Fri, Jan 24, 2014 at 6:56 PM, Antoine Pitrou wrote:

> On Fri, 24 Jan 2014 08:47:14 -0800 (PST)
> Ram Rachum wrote:
> > I propose implementing str.rreplace. (It'll be to str.replace what
> > str.rsplit is to str.split.)
>
> I suppose it only differs when the count parameter is supplied?
>
> I don't think it can hurt, except for the funny looks of its name.
> In any case, if str.rreplace is added then so should bytes.rreplace and
> bytearray.rreplace.
>
> Regards
>
> Antoine.

From storchaka at gmail.com Fri Jan 24 18:30:00 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 24 Jan 2014 19:30:00 +0200
Subject: [Python-ideas] str.rreplace
In-Reply-To: <20140124175645.66bb8daf@fsol>
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol>

24.01.14 18:56, Antoine Pitrou wrote:
> On Fri, 24 Jan 2014 08:47:14 -0800 (PST)
> Ram Rachum wrote:
>> I propose implementing str.rreplace. (It'll be to str.replace what
>> str.rsplit is to str.split.)
>
> I suppose it only differs when the count parameter is supplied?
>
> I don't think it can hurt, except for the funny looks of its name.
> In any case, if str.rreplace is added then so should bytes.rreplace and
> bytearray.rreplace.

bytearray.rremove, tuple.rindex, list.rindex, list.rremove.
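Antoine's observation that a right-to-left replace only differs from str.replace when a count is given suggests a pure-Python sketch built on str.rsplit and join. `rreplace` below is a hypothetical helper, not an existing str method. Unlike str.replace, it raises ValueError for an empty `old`, because rsplit rejects an empty separator.

```python
def rreplace(s, old, new, count=-1):
    """Replace the last `count` occurrences of `old` in `s` with `new`.

    The default count of -1 replaces every occurrence, as with str.replace.
    Works for str and bytes, since both provide rsplit() and join().
    """
    # rsplit(old, count) splits at most `count` times from the right
    # (-1 means no limit); joining the pieces with `new` puts `new`
    # exactly where those splits were made.
    return new.join(s.rsplit(old, count))


print(rreplace("a.b.c.d", ".", "-", 1))   # a.b.c-d
print("a.b.c.d".replace(".", "-", 1))     # a-b.c.d
```

Without a count (or with count=0) the two functions agree, which matches Ram's reply that the difference only shows up when count is supplied.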
From solipsis at pitrou.net Fri Jan 24 18:36:33 2014
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 24 Jan 2014 18:36:33 +0100
Subject: [Python-ideas] str.rreplace
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol>
Message-ID: <20140124183633.60f215f6@fsol>

On Fri, 24 Jan 2014 19:30:00 +0200
Serhiy Storchaka wrote:
> 24.01.14 18:56, Antoine Pitrou wrote:
> > In any case, if str.rreplace is added then so should bytes.rreplace and
> > bytearray.rreplace.
>
> bytearray.rremove, tuple.rindex, list.rindex, list.rremove.

Not sure what those have to do with rreplace(). Overgeneralization doesn't help.

Regards

Antoine.

From kn0m0n3 at gmail.com Fri Jan 24 18:55:48 2014
From: kn0m0n3 at gmail.com (Jason Bursey)
Date: Fri, 24 Jan 2014 11:55:48 -0600
Subject: [Python-ideas] data banks access using python with a Samsung Galaxy GNU.org FSF.org

For beginners; she knows saber from AMR
From ethan at stoneleaf.us Fri Jan 24 18:43:45 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 24 Jan 2014 09:43:45 -0800
Subject: [Python-ideas] str.rreplace
In-Reply-To: <20140124183633.60f215f6@fsol>
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol>
Message-ID: <52E2A651.1000309@stoneleaf.us>

On 01/24/2014 09:36 AM, Antoine Pitrou wrote:
> On Fri, 24 Jan 2014 19:30:00 +0200
> Serhiy Storchaka wrote:
>> 24.01.14 18:56, Antoine Pitrou wrote:
>>> I don't think it can hurt, except for the funny looks of its name.
>>
>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove.
>
> Not sure what those have to do with rreplace().

The funny look of the name, I think. ;)

--
~Ethan~

From storchaka at gmail.com Fri Jan 24 19:13:26 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 24 Jan 2014 20:13:26 +0200
Subject: [Python-ideas] str.rreplace
In-Reply-To: <20140124183633.60f215f6@fsol>
References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol>

24.01.14 19:36, Antoine Pitrou wrote:
> On Fri, 24 Jan 2014 19:30:00 +0200
> Serhiy Storchaka wrote:
>>> In any case, if str.rreplace is added then so should bytes.rreplace and >>> bytearray.rreplace. >> >> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. > > Not sure what those have to do with rreplace(). Overgeneralization > doesn't help. If open a door for rreplace, it would be not easy to close it for rindex and rremove. From solipsis at pitrou.net Fri Jan 24 19:20:21 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 24 Jan 2014 19:20:21 +0100 Subject: [Python-ideas] str.rreplace References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> Message-ID: <20140124192021.7dcc1c77@fsol> On Fri, 24 Jan 2014 20:13:26 +0200 Serhiy Storchaka wrote: > 24.01.14 19:36, Antoine Pitrou ???????(??): > > On Fri, 24 Jan 2014 19:30:00 +0200 > > Serhiy Storchaka > > wrote: > >> 24.01.14 18:56, Antoine Pitrou ???????(??): > >>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) > >>> Ram Rachum wrote: > >>>> I propose implementing str.rreplace. (It'll be to str.replace what > >>>> str.rsplit is to str.split.) > >>> > >>> I suppose it only differs when the count parameter is supplied? > >>> > >>> I don't think it can hurt, except for the funny looks of its name. > >>> In any case, if str.rreplace is added then so should bytes.rreplace and > >>> bytearray.rreplace. > >> > >> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. > > > > Not sure what those have to do with rreplace(). Overgeneralization > > doesn't help. > > If open a door for rreplace, it would be not easy to close it for rindex > and rremove. Perhaps you underestimate our collective door closing skills ;) Regards Antoine. 
From abarnert at yahoo.com Fri Jan 24 19:20:59 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 24 Jan 2014 10:20:59 -0800 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E2A651.1000309@stoneleaf.us> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> Message-ID: <715A00D2-A12B-4D21-A17F-88338F396C3C@yahoo.com> On Jan 24, 2014, at 9:43, Ethan Furman wrote: > On 01/24/2014 09:36 AM, Antoine Pitrou wrote: >> On Fri, 24 Jan 2014 19:30:00 +0200 >> Serhiy Storchaka >> wrote: >>> 24.01.14 18:56, Antoine Pitrou wrote: >>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) >>>> Ram Rachum wrote: >>>>> I propose implementing str.rreplace. (It'll be to str.replace what >>>>> str.rsplit is to str.split.) >>>> >>>> I suppose it only differs when the count parameter is supplied? >>>> >>>> I don't think it can hurt, except for the funny looks of its name. >>>> In any case, if str.rreplace is added then so should bytes.rreplace and >>>> bytearray.rreplace. >>> >>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >> >> Not sure what those have to do with rreplace(). > > The funny look of the name, I think. ;) And the pronunciation. Hard to say it without sounding like a pirate. Although I guess you could interpret the rr as a rolled r: strrrrings have rrrrreplace thanks to rrrrachum. But the inclusion of rindex makes me think this was a serious suggestion to add r versions of all methods that involve searching. Which probably isn't worth the effort to do, but there's nothing really wrong with the idea.
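Of the methods Serhiy lists, str, bytes and bytearray already have rindex, but list and tuple do not. A minimal sketch of a last-occurrence lookup for any sequence, written as a hypothetical free function rather than a method:

```python
from collections.abc import Sequence
from typing import Any


def rindex(seq: Sequence, value: Any) -> int:
    """Hypothetical helper: position of the LAST item equal to `value`.

    Raises ValueError when `value` is absent, like list.index does.
    """
    for i in range(len(seq) - 1, -1, -1):  # walk from the end toward 0
        if seq[i] == value:
            return i
    raise ValueError(f"{value!r} is not in sequence")
```

For example, rindex([1, 2, 1, 3], 1) returns 2, just as 'a-b-c'.rindex('-') returns 3 for the last hyphen of a string.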
From abarnert at yahoo.com Fri Jan 24 19:25:21 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 24 Jan 2014 10:25:21 -0800 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140124192021.7dcc1c77@fsol> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> Message-ID: <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> On Jan 24, 2014, at 10:20, Antoine Pitrou wrote: > On Fri, 24 Jan 2014 20:13:26 +0200 > Serhiy Storchaka > wrote: >> 24.01.14 19:36, Antoine Pitrou wrote: >>> On Fri, 24 Jan 2014 19:30:00 +0200 >>> Serhiy Storchaka >>> wrote: >>>> 24.01.14 18:56, Antoine Pitrou wrote: >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) >>>>> Ram Rachum wrote: >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what >>>>>> str.rsplit is to str.split.) >>>>> >>>>> I suppose it only differs when the count parameter is supplied? >>>>> >>>>> I don't think it can hurt, except for the funny looks of its name. >>>>> In any case, if str.rreplace is added then so should bytes.rreplace and >>>>> bytearray.rreplace. >>>> >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >>> >>> Not sure what those have to do with rreplace(). Overgeneralization >>> doesn't help. >> >> If open a door for rreplace, it would be not easy to close it for rindex >> and rremove. > > Perhaps you underestimate our collective door closing skills ;) While we're speculatively overgeneralizing, couldn't all of the index/find/remove/replace/etc. methods take a negative n to count from the end, making r variants unnecessary?
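Andrew's negative-count idea runs into the fact that str.replace already treats a negative count as "replace everything": 'a-b-c'.replace('-', '+', -1) returns 'a+b+c' today. A sketch of the proposed semantics therefore needs a different marker for "all". Here that marker is None, in a hypothetical function (replace_signed is not an existing API) that uses rsplit/join for the negative case.

```python
from typing import Optional


def replace_signed(s: str, old: str, new: str, n: Optional[int] = None) -> str:
    """Hypothetical signed-count replace.

    n is None -> replace every occurrence
    n >= 0    -> replace the first n occurrences (plain str.replace)
    n < 0     -> replace the last -n occurrences, counting from the end
    """
    if n is None:
        return s.replace(old, new)
    if n >= 0:
        return s.replace(old, new, n)
    # rsplit splits from the right at most -n times; joining with `new`
    # puts the replacement back only at those split points.
    # Note: rsplit raises ValueError for an empty `old`.
    return new.join(s.rsplit(old, -n))
```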
From python at mrabarnett.plus.com Fri Jan 24 20:17:04 2014 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 24 Jan 2014 19:17:04 +0000 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140124175645.66bb8daf@fsol> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> Message-ID: <52E2BC30.4080207@mrabarnett.plus.com> On 2014-01-24 16:56, Antoine Pitrou wrote: > On Fri, 24 Jan 2014 08:47:14 -0800 (PST) > Ram Rachum wrote: >> I propose implementing str.rreplace. (It'll be to str.replace what >> str.rsplit is to str.split.) > > I suppose it only differs when the count parameter is supplied? > Not necessarily: >>> 'aaa'.replace('aa', 'x') 'xa' >>> 'aaa'.rreplace('aa', 'x') 'ax' > I don't think it can hurt, except for the funny looks of its name. > In any case, if str.rreplace is added then so should bytes.rreplace and > bytearray.rreplace. > From random832 at fastmail.us Fri Jan 24 21:33:48 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 24 Jan 2014 15:33:48 -0500 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E2BC30.4080207@mrabarnett.plus.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> Message-ID: <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> On Fri, Jan 24, 2014, at 14:17, MRAB wrote: > On 2014-01-24 16:56, Antoine Pitrou wrote: > > On Fri, 24 Jan 2014 08:47:14 -0800 (PST) > > Ram Rachum wrote: > >> I propose implementing str.rreplace. (It'll be to str.replace what > >> str.rsplit is to str.split.) > > > > I suppose it only differs when the count parameter is supplied? 
> > > Not necessarily: > > >>> 'aaa'.replace('aa', 'x') > 'xa' > >>> 'aaa'.rreplace('aa', 'x') > 'ax' >>>'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] 'ax' From rosuav at gmail.com Fri Jan 24 21:48:36 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 25 Jan 2014 07:48:36 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> Message-ID: On Sat, Jan 25, 2014 at 7:33 AM, wrote: >>>>'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] > 'ax' It makes me happy when the [::-1] smiley gets used that many times to solve a problem. Very happy. Happy that it isn't in _my_ code, to be precise... ChrisA From breamoreboy at yahoo.co.uk Fri Jan 24 22:01:12 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 24 Jan 2014 21:01:12 +0000 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> Message-ID: On 24/01/2014 20:48, Chris Angelico wrote: > On Sat, Jan 25, 2014 at 7:33 AM, wrote: >>>>> 'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] >> 'ax' > > It makes me happy when the [::-1] smiley gets used that many times to > solve a problem. Very happy. > > Happy that it isn't in _my_ code, to be precise... > > ChrisA +1 -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From python at mrabarnett.plus.com Fri Jan 24 22:04:22 2014 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 24 Jan 2014 21:04:22 +0000 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> Message-ID: <52E2D556.2080206@mrabarnett.plus.com> On 2014-01-24 20:48, Chris Angelico wrote: > On Sat, Jan 25, 2014 at 7:33 AM, wrote: >>>>>'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] >> 'ax' > > It makes me happy when the [::-1] smiley gets used that many times to > solve a problem. Very happy. > > Happy that it isn't in _my_ code, to be precise... > It's probably not as efficient, either! And if we're going to do it that way, do we really need .rindex and .rfind? Or .rstrip (we could use .lstrip)? From ncoghlan at gmail.com Sat Jan 25 01:05:31 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Jan 2014 10:05:31 +1000 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> Message-ID: On 25 Jan 2014 04:29, "Andrew Barnert" wrote: > > On Jan 24, 2014, at 10:20, Antoine Pitrou wrote: > > > On Fri, 24 Jan 2014 20:13:26 +0200 > > Serhiy Storchaka > > wrote: > >> 24.01.14 19:36, Antoine Pitrou wrote: > >>> On Fri, 24 Jan 2014 19:30:00 +0200 > >>> Serhiy Storchaka > >>> wrote: > >>>> 24.01.14 18:56, Antoine Pitrou wrote: > >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) > >>>>> Ram Rachum wrote: > >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what > >>>>>> str.rsplit is to str.split.) > >>>>> > >>>>> I suppose it only differs when the count parameter is supplied?
> >>>>> > >>>>> I don't think it can hurt, except for the funny looks of its name. > >>>>> In any case, if str.rreplace is added then so should bytes.rreplace and > >>>>> bytearray.rreplace. > >>>> > >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. > >>> > >>> Not sure what those have to do with rreplace(). Overgeneralization > >>> doesn't help. > >> > >> If open a door for rreplace, it would be not easy to close it for rindex > >> and rremove. > > > > Perhaps you underestimate our collective door closing skills ;) > > While we're speculatively overgeneralizing, couldn't all of the index/find/remove/replace/etc. methods take a negative n to count from the end, making r variants unnecessary? Strings already provide rfind and rindex (they're just not part of the general sequence API). Since strings are immutable, there's also no call for an "rremove". rreplace (pronounced as "ar-replace", like "ar-split" et al) is more obvious than a negative count, and seems like an almost exact parallel to rsplit. On the other hand, I don't recall ever lamenting its absence. Call me +0 on the idea. Cheers, Nick. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/
From steve at pearwood.info Sat Jan 25 02:17:25 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 25 Jan 2014 12:17:25 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> Message-ID: <20140125011725.GT3915@ando> On Fri, Jan 24, 2014 at 03:33:48PM -0500, random832 at fastmail.us wrote: > > On Fri, Jan 24, 2014, at 14:17, MRAB wrote: > > On 2014-01-24 16:56, Antoine Pitrou wrote: > > > On Fri, 24 Jan 2014 08:47:14 -0800 (PST) > > > Ram Rachum wrote: > > >> I propose implementing str.rreplace. (It'll be to str.replace what > > >> str.rsplit is to str.split.) > > > > > > I suppose it only differs when the count parameter is supplied? > > > > > Not necessarily: > > > > >>> 'aaa'.replace('aa', 'x') > > 'xa' > > >>> 'aaa'.rreplace('aa', 'x') > > 'ax' Good catch!
> >>>'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] > 'ax' That is very possibly the ugliest Python code I have ever seen :-) -- Steven From abarnert at yahoo.com Sat Jan 25 02:36:08 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 24 Jan 2014 17:36:08 -0800 (PST) Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> Message-ID: <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> From: Nick Coghlan Sent: Friday, January 24, 2014 4:05 PM >On 25 Jan 2014 04:29, "Andrew Barnert" wrote: >> >> On Jan 24, 2014, at 10:20, Antoine Pitrou wrote: >> >> > On Fri, 24 Jan 2014 20:13:26 +0200 >> > Serhiy Storchaka >> > wrote: >> >> 24.01.14 19:36, Antoine Pitrou wrote: >> >>> On Fri, 24 Jan 2014 19:30:00 +0200 >> >>> Serhiy Storchaka >> >>> wrote: >> >>>> 24.01.14 18:56, Antoine Pitrou wrote: >> >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) >> >>>>> Ram Rachum wrote: >> >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what >> >>>>>> str.rsplit is to str.split.) >> >>>>> >> >>>>> I suppose it only differs when the count parameter is supplied? >> >>>>> >> >>>>> I don't think it can hurt, except for the funny looks of its name. >> >>>>> In any case, if str.rreplace is added then so should bytes.rreplace and >> >>>>> bytearray.rreplace. >> >>>> >> >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >> >>> >> >>> Not sure what those have to do with rreplace(). Overgeneralization >> >>> doesn't help. >> >> >> >> If open a door for rreplace, it would be not easy to close it for rindex >> >> and rremove. >> > >> > Perhaps you underestimate our collective door closing skills ;) >> >> While we're speculatively overgeneralizing, couldn't all of the index/find/remove/replace/etc.
methods take a negative n to count from the end, making r variants unnecessary? >Strings already provide rfind and rindex (they're just not part of the general sequence API). >Since strings are immutable, there's also no call for an "rremove". I was responding to Serhiy's (probably facetious or devil's advocate) suggestion that we should regularize the API: add rfind and rindex to tuple (and presumably Sequence), and those plus rremove to list (and presumably MutableSequence), and so on. My point was that if we're going to be that radical, we might as well consider removing methods instead of adding them. Some of the find-like methods already take negative indices; expanding that to all of the index-based methods, and doing the equivalent to the count-based ones, and adding a count or index to those that have neither, would mean all of the "r" variants could go away. I think it's pretty obvious that both this suggestion and Serhiy's are not worth doing for Python: the language has had pretty much the same set of find-style methods for decades, most of them are used frequently, and people rarely go looking for any of the "missing" ones, so why change it? (And I think that was Serhiy's point as well, but I don't want to speak for him.) If people _do_ find themselves missing one particular variant, just adding that one more variant is a lot more conservative than changing everything; if not, there's no reason to add anything at all.
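The negative indices Andrew mentions are the optional start/end arguments of the find-like methods, which are read like slice bounds. They select positions, not a number of occurrences counted from the end, as this short demonstration with built-in methods shows:

```python
s = "a-b-c"

# start=-2 means "search s[-2:]", i.e. from index 3 onward.
print(s.find("-", -2))            # 3

# end=-2 limits rfind to s[:-2], which is "a-b".
print(s.rfind("-", 0, -2))        # 1

# list.index takes the same slice-style start argument.
print([1, 2, 1, 3].index(1, -3))  # 2
```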
From greg.ewing at canterbury.ac.nz Sat Jan 25 06:57:21 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Jan 2014 18:57:21 +1300 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E2A651.1000309@stoneleaf.us> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> Message-ID: <52E35241.1030201@canterbury.ac.nz> Ethan Furman wrote: > On 01/24/2014 09:36 AM, Antoine Pitrou wrote: > >> On Fri, 24 Jan 2014 19:30:00 +0200 >> Serhiy Storchaka >> wrote: >> >>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >> >> Not sure what those have to do with rreplace(). > > The funny look of the name, I think. ;) Yes, obviously the properly serious names for them would be bytearray.evomer, tuple.xedni and list.evomer. No confusing double Rs to trip you up then. -- Greg From python at 2sn.net Sat Jan 25 07:45:05 2014 From: python at 2sn.net (Alexander Heger) Date: Sat, 25 Jan 2014 17:45:05 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140124175645.66bb8daf@fsol> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> Message-ID: >> I propose implementing str.rreplace. (It'll be to str.replace what >> str.rsplit is to str.split.) Instead of str.rreplace you could just add a parameter 'reverse=False|True' and add the same thing wherever needed, including making rfind superfluous. 
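Alexander's reverse=False|True suggestion can be prototyped without touching str. The following sketch assumes a hypothetical free function find() whose keyword-only reverse flag simply dispatches to the existing str.find / str.rfind:

```python
from typing import Optional


def find(s: str, sub: str, start: int = 0, end: Optional[int] = None,
         *, reverse: bool = False) -> int:
    """Hypothetical find() with a `reverse` flag instead of a separate rfind.

    reverse=False behaves like str.find, reverse=True like str.rfind;
    both return -1 when `sub` is not found in s[start:end].
    """
    if end is None:
        end = len(s)
    method = s.rfind if reverse else s.find
    return method(sub, start, end)
```

Making the flag keyword-only keeps it from being confused with the positional start/end arguments.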
From storchaka at gmail.com Sat Jan 25 08:01:00 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 09:01:00 +0200 Subject: [Python-ideas] str.rreplace In-Reply-To: <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> Message-ID: 24.01.14 20:25, Andrew Barnert wrote: > While we're speculatively overgeneralizing, couldn't all of the index/find/remove/replace/etc. methods take a negative n to count from the end, making r variants unnecessary? This is a backward-incompatible change. From storchaka at gmail.com Sat Jan 25 08:11:23 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 09:11:23 +0200 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> Message-ID: 25.01.14 02:05, Nick Coghlan wrote: > Strings already provide rfind and rindex (they're just not part of the > general sequence API). > > Since strings are immutable, there's also no call for an "rremove". > > rreplace (pronounced as "ar-replace", like "ar-split" et al) is more > obvious than a negative count, and seems like an almost exact parallel > to rsplit. > > On the other hand, I don't recall ever lamenting its absence. Call me +0 > on the idea. I'm between -0 and +0. On one hand, there are precedents, the meaning of these methods looks clear and consistent with the others, and the cost of adding these methods is pretty low. On the other hand, the cost is larger than zero, and these methods are needed very rarely (and there are other ways to do it). In case of doubt, I think the status quo wins.
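One of the "other ways" Serhiy alludes to is random832's slice-reversal one-liner. Wrapped in a small function (the name rreplace_via_reversal is hypothetical), it also accepts a count. Reversing the haystack, needle and replacement turns str.replace's left-to-right scan into a right-to-left one, at the cost of several temporary reversed copies.

```python
def rreplace_via_reversal(s: str, old: str, new: str, count: int = -1) -> str:
    """Right-to-left replace built from str.replace on reversed strings.

    A match of `old` in `s` is exactly a match of old[::-1] in s[::-1],
    so replacing left to right in the reversed text and reversing back
    handles the rightmost matches first.
    """
    return s[::-1].replace(old[::-1], new[::-1], count)[::-1]
```

The replacement is reversed too, so that the final reversal restores its original spelling: rreplace_via_reversal('xyz-xyz', 'xyz', 'ab', 1) gives 'xyz-ab'.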
From storchaka at gmail.com Sat Jan 25 08:16:17 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 09:16:17 +0200 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E2D556.2080206@mrabarnett.plus.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> Message-ID: 24.01.14 23:04, MRAB wrote: > On 2014-01-24 20:48, Chris Angelico wrote: >> On Sat, Jan 25, 2014 at 7:33 AM, >> wrote: >>>>>> 'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] >>> 'ax' >> >> It makes me happy when the [::-1] smiley gets used that many times to >> solve a problem. Very happy. >> >> Happy that it isn't in _my_ code, to be precise... >> > It's probably not as efficient, either! Of course it is less efficient than hypothetical rreplace, but I suppose it is most efficient way in current Python. From rosuav at gmail.com Sat Jan 25 08:37:58 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 25 Jan 2014 18:37:58 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> Message-ID: On Sat, Jan 25, 2014 at 6:16 PM, Serhiy Storchaka wrote: > 24.01.14 23:04, MRAB wrote: > >> On 2014-01-24 20:48, Chris Angelico wrote: >>> >>> On Sat, Jan 25, 2014 at 7:33 AM, >>> wrote: >>>>>>> >>>>>>> 'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] >>>> >>>> 'ax' >>> >>> >>> It makes me happy when the [::-1] smiley gets used that many times to >>> solve a problem. Very happy. >>> >>> Happy that it isn't in _my_ code, to be precise... >>> >> It's probably not as efficient, either!
> > > Of course it is less efficient than hypothetical rreplace, but I suppose it > is most efficient way in current Python. Is it possible to use a reversed iterator, filter it through something that does the replacement, and then do some sort of reversed ''.join() at the end? It'd still be ugly though. ChrisA From g.brandl at gmx.net Sat Jan 25 08:55:36 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 25 Jan 2014 08:55:36 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> Message-ID: On 25.01.2014 08:16, Serhiy Storchaka wrote: > 24.01.14 23:04, MRAB wrote: >> On 2014-01-24 20:48, Chris Angelico wrote: >>> On Sat, Jan 25, 2014 at 7:33 AM, >>> wrote: >>>>>>> 'aaa'[::-1].replace('aa'[::-1],'x'[::-1])[::-1] >>>> 'ax' >>> >>> It makes me happy when the [::-1] smiley gets used that many times to >>> solve a problem. Very happy. >>> >>> Happy that it isn't in _my_ code, to be precise... >>> >> It's probably not as efficient, either! > > Of course it is less efficient than hypothetical rreplace, but I suppose > it is most efficient way in current Python.
There was also the suggestion on stackoverflow of 'x'.join('aaa'.rsplit('aa', 1)) which might be faster and less colon-y, but is very good at covering up the real purpose of the code :) Georg From amber.yust at gmail.com Sat Jan 25 09:01:28 2014 From: amber.yust at gmail.com (Amber Yust) Date: Sat, 25 Jan 2014 08:01:28 +0000 Subject: [Python-ideas] str.rreplace References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> Message-ID: <-5165424205136425370@gmail297201516> On Fri Jan 24 2014 at 11:55:57 PM, Georg Brandl wrote: > There was also the suggestion on stackoverflow of > > 'x'.join('aaa'.rsplit('aa', 1)) > > which might be faster and less colon-y, but is very good at covering up the > real purpose of the code :) > Which is why you throw it in a clearly named function. def rreplace(haystack, needle, replacement, count): """Replace the N rightmost occurrences of one string with another.""" replacement.join(haystack.rsplit(needle, count)) From amber.yust at gmail.com Sat Jan 25 09:01:50 2014 From: amber.yust at gmail.com (Amber Yust) Date: Sat, 25 Jan 2014 08:01:50 +0000 Subject: [Python-ideas] str.rreplace References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> <-5165424205136425370@gmail297201516> Message-ID: <-6404288867984790623@gmail297201516> (Er, modulo the missing return keyword.)
On Sat Jan 25 2014 at 12:01:28 AM, Amber Yust wrote: > On Fri Jan 24 2014 at 11:55:57 PM, Georg Brandl wrote: > > There was also the suggestion on stackoverflow of > > 'x'.join('aaa'.rsplit('aa', 1)) > > which might be faster and less colon-y, but is very good at covering up the > real purpose of the code :) > > > Which is why you throw it in a clearly named function. > > def rreplace(haystack, needle, replacement, count): > """Replace the N rightmost occurrences of one string with another.""" > replacement.join(haystack.rsplit(needle, count)) > From denis.spir at gmail.com Sat Jan 25 09:22:43 2014 From: denis.spir at gmail.com (spir) Date: Sat, 25 Jan 2014 09:22:43 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: <715A00D2-A12B-4D21-A17F-88338F396C3C@yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> <715A00D2-A12B-4D21-A17F-88338F396C3C@yahoo.com> Message-ID: <52E37453.8020507@gmail.com> On 01/24/2014 07:20 PM, Andrew Barnert wrote: > On Jan 24, 2014, at 9:43, Ethan Furman wrote: > >> On 01/24/2014 09:36 AM, Antoine Pitrou wrote: >>> On Fri, 24 Jan 2014 19:30:00 +0200 >>> Serhiy Storchaka >>> wrote: >>>> 24.01.14 18:56, Antoine Pitrou wrote: >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST) >>>>> Ram Rachum wrote: >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what >>>>>> str.rsplit is to str.split.) >>>>> >>>>> I suppose it only differs when the count parameter is supplied? >>>>> >>>>> I don't think it can hurt, except for the funny looks of its name. >>>>> In any case, if str.rreplace is added then so should bytes.rreplace and >>>>> bytearray.rreplace. >>>> >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >>> >>> Not sure what those have to do with rreplace(). >> >> The funny look of the name, I think.
;) > > And the pronunciation. Hard to say it without sounding like a pirate. Although I guess you could interpret the rr as a rolled r: strrrrings have rrrrreplace thanks to rrrrachum. > > But the inclusion of rindex makes me think this was a serious suggestion to add r versions of all methods that involve searching. Which probably isn't worth the effort to do, but there's nothing really wrong with the idea. Those methods would be better off with a logical param meaning "traverse backwards", imo. D From denis.spir at gmail.com Sat Jan 25 09:24:15 2014 From: denis.spir at gmail.com (spir) Date: Sat, 25 Jan 2014 09:24:15 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: <715A00D2-A12B-4D21-A17F-88338F396C3C@yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> <715A00D2-A12B-4D21-A17F-88338F396C3C@yahoo.com> Message-ID: <52E374AF.1090905@gmail.com> On 01/24/2014 07:20 PM, Andrew Barnert wrote: > And the pronunciation. Hard to say it without sounding like a pirate. Although I guess you could interpret the rr as a rolled r: strrrrings have rrrrreplace thanks to rrrrachum. it's castinglish d From storchaka at gmail.com Sat Jan 25 09:25:53 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 25 Jan 2014 10:25:53 +0200 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E2BC30.4080207@mrabarnett.plus.com> <1390595628.8907.74989621.4E13FE1D@webmail.messagingengine.com> <52E2D556.2080206@mrabarnett.plus.com> Message-ID: 25.01.14 09:55, Georg Brandl wrote: > There was also the suggestion on stackoverflow of > > 'x'.join('aaa'.rsplit('aa', 1)) > > which might be faster and less colon-y, but is very good at covering up the > real purpose of the code :) Indeed, it is faster if a smaller part of the string is replaced.
But the [::-1] variant looks more funny. From denis.spir at gmail.com Sat Jan 25 09:32:05 2014 From: denis.spir at gmail.com (spir) Date: Sat, 25 Jan 2014 09:32:05 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> Message-ID: <52E37685.4060109@gmail.com> On 01/25/2014 07:45 AM, Alexander Heger wrote: >>> I propose implementing str.rreplace. (It'll be to str.replace what >>> str.rsplit is to str.split.) > > Instead of str.rreplace you could just add a parameter > 'reverse=False|True' and add the same thing wherever needed, including > making rfind superfluous. This is a right way, imo, except that there is no string (/sequence) reversal here, but instead backward traversal. d From phd at phdru.name Sat Jan 25 12:15:13 2014 From: phd at phdru.name (Oleg Broytman) Date: Sat, 25 Jan 2014 12:15:13 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E35241.1030201@canterbury.ac.nz> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> <52E35241.1030201@canterbury.ac.nz> Message-ID: <20140125111513.GA21875@phdru.name> On Sat, Jan 25, 2014 at 06:57:21PM +1300, Greg Ewing wrote: > Ethan Furman wrote: > >On 01/24/2014 09:36 AM, Antoine Pitrou wrote: > > > >>On Fri, 24 Jan 2014 19:30:00 +0200 > >>Serhiy Storchaka > >>wrote: > >> > >>>bytearray.rremove, tuple.rindex, list.rindex, list.rremove. > >> > >>Not sure what those have to do with rreplace(). > > > >The funny look of the name, I think. ;) > > Yes, obviously the properly serious names for > them would be bytearray.evomer, tuple.xedni and > list.evomer. No confusing double Rs to trip > you up then. While we are at it, can we also change the language a bit and add closing lines for compound operators? I suggest pairs like if/fi, for/rof and while/done. I'm still thinking about try/except/finally. 
That minor addition also would help to create multiline anonymous functions -- just put the body inside def/fed. (Big ugly evil grin.) Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From python at 2sn.net Sat Jan 25 13:21:42 2014 From: python at 2sn.net (Alexander Heger) Date: Sat, 25 Jan 2014 23:21:42 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: <52E37685.4060109@gmail.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E37685.4060109@gmail.com> Message-ID: >> Instead of str.rreplace you could just add a parameter >> 'reverse=False|True' and add the same thing wherever needed, including >> making rfind superfluous. > > This is a right way, imo, except that there is no string (/sequence) > reversal here, but instead backward traversal. I suppose a better name could be found. 'traverse_backward=True|False(default)' For some of the reverse methods problems may occur if they operate on an iterator rather than an actual list, tuple, or similar. From denis.spir at gmail.com Sat Jan 25 14:18:05 2014 From: denis.spir at gmail.com (spir) Date: Sat, 25 Jan 2014 14:18:05 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <52E37685.4060109@gmail.com> Message-ID: <52E3B98D.5000300@gmail.com> On 01/25/2014 01:21 PM, Alexander Heger wrote: > For some of the reverse methods problems may occur if they operate on > an iterator rather than an actual list, tuple, or similar. Sure. Thus maybe the right way is to abandon this altogether and require the user to use a reverse() generator (or should I say iterator here?) instead?
(this time, really reverse ;-) d From breamoreboy at yahoo.co.uk Sat Jan 25 14:36:45 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 25 Jan 2014 13:36:45 +0000 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140125111513.GA21875@phdru.name> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <52E2A651.1000309@stoneleaf.us> <52E35241.1030201@canterbury.ac.nz> <20140125111513.GA21875@phdru.name> Message-ID: On 25/01/2014 11:15, Oleg Broytman wrote: > On Sat, Jan 25, 2014 at 06:57:21PM +1300, Greg Ewing wrote: >> Ethan Furman wrote: >>> On 01/24/2014 09:36 AM, Antoine Pitrou wrote: >>> >>>> On Fri, 24 Jan 2014 19:30:00 +0200 >>>> Serhiy Storchaka >>>> wrote: >>>> >>>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove. >>>> >>>> Not sure what those have to do with rreplace(). >>> >>> The funny look of the name, I think. ;) >> >> Yes, obviously the properly serious names for >> them would be bytearray.evomer, tuple.xedni and >> list.evomer. No confusing double Rs to trip >> you up then. > > While we are at it, can we also change the language a bit and add > closing lines for compound operators? I suggest pairs like if/fi, > for/rof and while/done. I'm still thinking about try/except/finally. > That minor addition also would help to create multiline anonymous > functions -- just put the body inside def/fed. > (Big ugly evil grin.) > > Oleg. > Big +1 from me. Do we toss a coin to see who gets to write the PEP? Or is it decided by the winner of yet another reenactment of the Battle of Pearl Harbour? :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From ron3200 at gmail.com Tue Jan 28 04:18:05 2014 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 27 Jan 2014 21:18:05 -0600 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> Message-ID: On 01/24/2014 07:36 PM, Andrew Barnert wrote: >>>>> While we're speculatively overgeneralizing, couldn't all of the >>>>> index/find/remove/replace/etc. methods take a negative n to >>>>> count from the end, making r variants unnecessary? >>> Strings already provide rfind and rindex (they're just not part of >>> the general sequence API). Since strings are immutable, there's also >>> no call for an "remove". > I was responding to Serhiy's (probably facetious or devil's advocate) > suggestion that we should regularize the API: add rfind and rindex to > tuple (and presumably Sequence), and those plus rremove to list (and > presumably MutableSequence), and so on. > > My point was that if we're going to be that radical, we might as well > consider removing methods instead of adding them. Some of the find-like > methods already take negative indices; expanding that to all of the > index-based methods, and doing the equivalent to the count-based ones, > and adding a count or index to those that have neither, would mean all > of the "r" variants could go away. How about a keyword to specify which end to index from? When used, it would disable negative indexing as well. When not used the current behaviour with negative indexing would be the default. direction=0 # The default with the current (or not specified) # negative indexing allowed. direction=1 # From first. Negative indexing disallowed. direction=-1 # From last. Negative indexing disallowed. 
(A shorter key word would be nice, but I can't think of any that is as clear.) The reason for turning off the negative indexing is it would also offer a way to avoid some indexing bugs as well. (Using negative indexing with a reversed index is just asking for trouble I think.) While the spelling isn't as short and concise as I would like, I could always wrap them in short helper functions if I wanted... ffind, rfind, findex, rindex.. etc. But those wouldn't need to be added to python. Cheers, Ron > I think it's pretty obvious that both this suggestion and Serhiy's are > not worth doing for Python -- the language has had pretty much the same set > of find-style methods for decades, most of them are used frequently, and > people rarely go looking for any of the "missing" ones, so why change > it? (And I think that was Serhiy's point as well, but I don't want to > speak for him.) If people _do_ find themselves missing one particular > variant, just adding that one more variant is a lot more conservative > than changing everything; if not, there's no reason to add anything at > all. From abarnert at yahoo.com Tue Jan 28 05:03:54 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 27 Jan 2014 20:03:54 -0800 (PST) Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> Message-ID: <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> From: Ron Adam Sent: Monday, January 27, 2014 7:18 PM > On 01/24/2014 07:36 PM, Andrew Barnert wrote: >> I was responding to Serhiy's (probably facetious or devil's advocate) >> suggestion that we should regularize the API: add rfind and rindex to >> tuple (and presumably Sequence), and those plus rremove to list (and >> presumably MutableSequence), and so on.
>> >> My point was that if we're going to be that radical, we might as well >> consider removing methods instead of adding them. Some of the find-like >> methods already take negative indices; expanding that to all of the >> index-based methods, and doing the equivalent to the count-based ones, >> and adding a count or index to those that have neither, would mean all >> of the "r" variants could go away. > > How about a keyword to specify which end to index from? When used, it would > disable negative indexing as well. When not used the current behaviour with > negative indexing would be the default. > > direction=0 # The default with the current > (or not specified) # negative indexing allowed. > > direction=1 # From first. Negative indexing disallowed. > direction=-1 # From last. Negative indexing disallowed. > > (A shorter key word would be nice, but I can't think of any that is as > clear.) Why does it have to be -1/0/1 instead of just True/False? In which case we could use "reverse", the same name that's already used for similar things in other methods like list.sort (and that's implied in the current names "rfind", etc.). > The reason for turning off the negative indexing is it would also offer a way to > avoid some indexing bugs as well. (Using negative indexing with a reversed > index is just asking for trouble I think.) But str.rfind takes negative indices today: >>> 'abccba'.rfind('b', -5, -3) 1 Why take away functionality that already works? And of course str.find takes negative indices and that's actually used in some quick&dirty scripts: >>> has_ext = path.find('.', -4) Of course you could make an argument that any such scripts deserve to be broken?
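As a point of reference for the rreplace discussion: a right-to-left replacement can already be approximated with str.rsplit() and str.join(). The sketch below assumes a hypothetical helper named rreplace (str has no such method). Because rsplit() splits at most `count` times starting from the right, joining the pieces with the replacement string replaces only the rightmost occurrences. The last line checks the rfind() call quoted above, which already accepts negative start/end indices.

```python
def rreplace(s: str, old: str, new: str, count: int = -1) -> str:
    """Replace up to `count` occurrences of `old` in `s`, working from the right.

    Hypothetical helper, not a str method. str.rsplit splits at most
    `count` times starting from the right end (-1 means no limit), so
    joining the pieces with `new` replaces exactly those occurrences.
    Note: rsplit raises ValueError for an empty `old`, unlike str.replace.
    """
    return new.join(s.rsplit(old, count))


# Only the last "o" is replaced.
print(rreplace("Hello world", "o", "u", 1))   # Hello wurld
# Negative start/end already work with str.rfind: it searches s[1:3] == "bc".
print("abccba".rfind("b", -5, -3))            # 1
```

The result matches str.replace() except when `count` limits the replacements (the rightmost occurrences are chosen) or when occurrences overlap: rreplace("aaa", "aa", "b", 1) gives "ab", whereas "aaa".replace("aa", "b", 1) gives "ba".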
From ron3200 at gmail.com Tue Jan 28 07:27:31 2014 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 28 Jan 2014 00:27:31 -0600 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: On 01/27/2014 10:03 PM, Andrew Barnert wrote: > From: Ron Adam > > Sent: Monday, January 27, 2014 7:18 PM > > >> On 01/24/2014 07:36 PM, Andrew Barnert wrote: >>> I was responding to Serhiy's (probably facetious or devil's advocate) >>> suggestion that we should regularize the API: add rfind and rindex to >>> tuple (and presumably Sequence), and those plus rremove to list (and >>> presumably MutableSequence), and so on. >>> >>> My point was that if we're going to be that radical, we might as well >>> consider removing methods instead of adding them. Some of the find-like >>> methods already take negative indices; expanding that to all of the >>> index-based methods, and doing the equivalent to the count-based ones, >>> and adding a count or index to those that have neither, would mean all >>> of the "r" variants could go away. >> >> How about a keyword to specify which end to index from? When used, it would >> disable negative indexing as well. When not used the current behaviour with >> negative indexing would be the default. >> > >> direction=0 # The default with the current >> (or not specified) # negative indexing allowed. >> >> direction=1 # From first. Negative indexing disallowed. >> direction=-1 # From last. Negative indexing disallowed. >> > >> (A shorter key word would be nice, but I can't think of any that is as >> clear.) > > Why does it have to be -1/0/1 instead of just True/False? 
Well, then it would need to be.. True/False/None The reason it needs three modes is to save the current behaviour and not break anything. Actually I'm about even on whether I like the keyword option or separate functions. Also there's the case of taking a slice from the middle with a positive starting index and a negative ending index. And with the exception of examples, nearly all string slicing uses a right and left value to get characters in the forward order even if they are indexed from the right. So that gives four modes... left middle right default With the default being what we have now. I wonder if maybe it would be better to do these things with the string format method? That is a higher level interface more suitable for adding options to. > In which case we could use "reverse", the same name that's already used for similar things in other methods like list.sort (and that's implied in the current names "rfind", etc.). > >> The reason for turning off the negative indexing is it would also offer a way to > >> avoid some indexing bugs as well. (Using negative indexing with a reversed >> index is just asking for trouble I think.) > > But str.rfind takes negative indices today: > > >>> 'abccba'.rfind('b', -5, -3) > 1 > > Why take away functionality that already works? It could still work that way.. just don't specify a direction. :-) > And of course str.find takes negative indices and that's actually used in some quick&dirty scripts: > > >>> has_ext = path.find('.', -4) > > Of course you could make an argument that any such scripts deserve to be broken? I'd say they are already broken in that particular case.
;-) -Ron From denis.spir at gmail.com Tue Jan 28 09:40:54 2014 From: denis.spir at gmail.com (spir) Date: Tue, 28 Jan 2014 09:40:54 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: <52E76D16.4090403@gmail.com> On 01/28/2014 05:03 AM, Andrew Barnert wrote: >> >(A shorter key word would be nice, but I can't think of any that is as >> >clear.) > Why does it have to be -1/0/1 instead of just True/False? > > In which case we could use "reverse", the same name that's already used for similar things in other methods like list.sort (and that's implied in the current names "rfind", etc.). (Again, here there is no reversal, but backwards iteration; in list.sort, there is reversal. I'd vote for making all such methods use a logical param, if it did not break code [because eg rfind is used], on the line: l.find(it, backwards=False) or a shorter param name. 
) d From denis.spir at gmail.com Tue Jan 28 09:40:49 2014 From: denis.spir at gmail.com (spir) Date: Tue, 28 Jan 2014 09:40:49 +0100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <73e21a44-d667-4430-b06e-06dde692a3df@googlegroups.com> <20140124175645.66bb8daf@fsol> <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: <52E76D11.4000708@gmail.com> On 01/28/2014 07:27 AM, Ron Adam wrote: >> And of course str.find takes negative indices and that's actually used in some >> quick&dirty scripts: >> >> >>> has_ext = path.find('.', -4) >> >> Of course you could make an argument that any such scripts deserve to be broken? > > I'd say they are already broken in that particular case. ;-) Not if the file(name)s are ones you create & control yourself. (Well, I don't mean I would program that way, except for a throwaway script. ;-) d From steve at pearwood.info Tue Jan 28 13:33:50 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jan 2014 23:33:50 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> References: <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: <20140128123349.GH3915@ando> On Mon, Jan 27, 2014 at 08:03:54PM -0800, Andrew Barnert wrote: > From: Ron Adam > > How about a keyword to specify which end to index from? -1 As a general rule, when you have a function that takes a parameter which selects between two different sets of behaviour, and you normally specify that parameter as a literal or constant known at edit time, then the function should be split into two. 
E.g.: # Good API string.upper(), string.lower() # Bad API string.convert_case(to_upper=True|False) sorted() and list.sort() (for example) are a counter-example. Sometimes you know which direction you want at edit-time, but there are many use-cases for leaving the decision to run-time. Nearly every application that sorts data lets the user decide which direction to sort. In the case of replace/rreplace, it is more like the upper vs. lower situation than the sorted situation. For almost any reasonable use-case, you will know at edit-time whether you want to go from the left or from the right, so you'll specify the "direction" parameter as an edit-time literal or constant. The same applies to find/rfind. > > When used, it would > > disable negative indexing as well. -1 Negative indexing is a standard Python feature. There is nothing wrong with negative indexing, no more than there is something wrong with zero-based positive indexing. It's also irrelevant to the replace/rreplace example, since replace doesn't take start/end indexes, and presumably rreplace wouldn't either. > > When not used the current behaviour with > > negative indexing would be the default. > > > > direction=0 # The default with the current > > (or not specified) # negative indexing allowed. > > > > direction=1 # From first. Negative indexing disallowed. > > direction=-1 # From last. Negative indexing disallowed. And if you want to operate from the right, with negative indexing allowed? But really, having a flag to decide whether to allow negative indexing is silly. If you don't want negative indexes, just don't use them. > > (A shorter key word would be nice, but I can't think of any that is as > > clear.) > > Why does it have to be -1/0/1 instead of just True/False? > > In which case we could use "reverse", the same name that's already > used for similar things in other methods like list.sort (and that's > implied in the current names "rfind", etc.).
sorted(alist, reverse=True) gives the same result as sorted(alist, reverse=False) only reversed. That is not the case here: "Hello world".replace("o", "u", 1, reverse=True) # rreplace ought to return "Hello wurld", not "dlrow ulleH". > > The reason for turning off the negative indexing is it would also offer a way to > > > avoid some indexing bugs as well. (Using negative indexing with a reversed > > index is just asking for trouble I think.) > > But str.rfind takes negative indices today: > > >>> 'abccba'.rfind('b', -5, -3) > 1 > > Why take away functionality that already works? Exactly. Here, I agree strongly with Andrew. Negative indexing works perfectly well with find/rfind. Slices with negative strides are weird, but negative indexes are well-defined and easy to understand. > And of course str.find takes negative indices and that's actually used > in some quick&dirty scripts: > > >>> has_ext = path.find('.', -4) > > Of course you could make an argument that any such scripts deserve to > be broken? It would be an awfully bogus argument. Negative indexes are a well-defined part of Python indexing semantics. One might as well argue that any scripts that rely on list slicing making a copy "deserve to be broken". -- Steven From steve at pearwood.info Tue Jan 28 13:46:24 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Jan 2014 23:46:24 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> Message-ID: <20140128124624.GJ3915@ando> On Tue, Jan 28, 2014 at 12:27:31AM -0600, Ron Adam wrote: > >> direction=0 # The default with the current > >> (or not specified) # negative indexing allowed. > >> > >> direction=1 # From first. Negative indexing disallowed. > >> direction=-1 # From last. Negative indexing disallowed.
> >> > > > >>(A shorter key word would be nice, but I can't think of any that is as > >>clear.) > > > >Why does it have to be -1/0/1 instead of just True/False? > > Well, then it would need to be.. True/False/None > > The reason it needs three modes is to save the current behaviour and not > break anything. What's "it", and how is this relevant to adding a version of replace that operates from the right? > Actually I'm about even on weather I like the keyword > option or separate functions. > > Also there's the case of taking a slice from the middle with a positive > starting index and a negative ending index. Now we're talking about slices? Providing a positive and negative index to a slice is a well-defined and well-understood operation. "I want everything except the first and last item" => [1:-1]. > And with the exception of > examples, nearly all string slicing, use a right and left value to get > characters in the forward order even if they are indexed from the right. With the exception of what examples? The rest of your sentence confuses me. Are you talking about extended slicing with a negative stride given? Please don't over-generalise this issue. It's a simple request to add a version of replace that operates from the right, just like rfind operates from the right. > So that gives four modes... left middle right default > With the default being what we have now. What? > I wonder if maybe it would be better to do these things with the string > format method? That is a higher level interface more suitable for adding > options to. You're talking about using a mini-language to control the direction of a replacement operation. That's not just an over-generalisation, it's a hyper-generalisation. > >And of course str.find takes negative indices and that's actually used in > >some quick&dirty scripts: > > > > >>> has_ext = path.find('.', -4) > > > >Of course you could make an argument that any such scripts deserve to be > >broken?
> > I'd say they are already broken in that particular case. ;-) It's broken, but not because of the negative index. -- Steven From storchaka at gmail.com Tue Jan 28 14:07:15 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 28 Jan 2014 15:07:15 +0200 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140128123349.GH3915@ando> References: <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> <20140128123349.GH3915@ando> Message-ID: 28.01.14 14:33, Steven D'Aprano wrote: > As a general rule, when you have a function that takes a parameter which > selects between two different sets of behaviour, and you normally > specify that parameter as a literal or constant known at edit time, then > the function should be split into two. > > E.g.: > > # Good API > string.upper(), string.lower() > > # Bad API > string.convert_case(to_upper=True|False) # Good API binascii.hexlify(data), zlib.compress(data) # Bad API codecs.encode(data, encoding='hex_codec'|'zlib_codec') From steve at pearwood.info Tue Jan 28 17:02:52 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 29 Jan 2014 03:02:52 +1100 Subject: [Python-ideas] str.rreplace In-Reply-To: References: <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> <20140128123349.GH3915@ando> Message-ID: <20140128160248.GK3915@ando> On Tue, Jan 28, 2014 at 03:07:15PM +0200, Serhiy Storchaka wrote: > 28.01.14 14:33, Steven D'Aprano wrote: > >As a general rule, when you have a function that takes a parameter which > >selects between two different sets of behaviour, and you normally > >specify that parameter as a literal or constant known at edit time, then > >the function should
be split into two. > > > >E.g.: > > > ># Good API > >string.upper(), string.lower() > > > ># Bad API > >string.convert_case(to_upper=True|False) > > # Good API > binascii.hexlify(data), zlib.compress(data) Sure. Nothing wrong with them. > # Bad API > codecs.encode(data, encoding='hex_codec'|'zlib_codec') But that's not how the codecs.encode function is usually used. Like my earlier example of sorted(), sometimes you know in advance what encoding you want to use: codecs.encode(text, encoding="utf-8") but for many applications, the encoding parameter is not known until runtime: DEFAULT_ENCODING = "utf-8" encoding = get_encoding() or DEFAULT_ENCODING codecs.encode(text, encoding=encoding) I can't think of an application where I would want to choose between hex_codec and zlib_codec at runtime, but that's because they are codecs with completely different purposes. A better example might be an application where I choose between compression methods at runtime: def get_compression(): # returns the name of a compression codec # e.g. zlib_codec, bz2_codec, xz_codec, lzma_codec # some of these may not be in the std lib at this time ... codecs.encode(data, encoding=get_compression()) So the codecs.encode function does not fail my test of "parameter is nearly always known at edit-time", and it is not a bad API.
-- Steven From ron3200 at gmail.com Tue Jan 28 18:43:21 2014 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 28 Jan 2014 11:43:21 -0600 Subject: [Python-ideas] str.rreplace In-Reply-To: <20140128123349.GH3915@ando> References: <20140124183633.60f215f6@fsol> <20140124192021.7dcc1c77@fsol> <1AF1E6EA-17FF-4FA1-8582-9365B22E4714@yahoo.com> <1390613768.85265.YahooMailNeo@web181006.mail.ne1.yahoo.com> <1390881834.74969.YahooMailNeo@web181003.mail.ne1.yahoo.com> <20140128123349.GH3915@ando> Message-ID: On 01/28/2014 06:33 AM, Steven D'Aprano wrote: > On Mon, Jan 27, 2014 at 08:03:54PM -0800, Andrew Barnert wrote: >> From: Ron Adam > >>> How about a keyword to specify which end to index from? > > -1 > > As a general rule, when you have a function that takes a parameter which > selects between two different sets of behaviour, and you normally > specify that parameter as a literal or constant known at edit time, then > the function should be split into two. > > E.g.: > > # Good API > string.upper(), string.lower() > > # Bad API > string.convert_case(to_upper=True|False) You are correct, and I got my methods mixed up this morning ... I was thinking of __getitem__ instead of index. And related methods. The issues I was referring to are not directly related as you pointed out. In most cases I do think having separate functions or methods is better. And in this case it's no different than having partition and rpartition. I think the argument against rreplace and the strangeness of its name is too late. There are already a fair number of "r" methods. Cheers, Ron From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 27 18:41:02 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang) Date: Mon, 27 Jan 2014 09:41:02 -0800 (PST) Subject: [Python-ideas] statistics module in Python3.4 Message-ID: Dear all, I am still testing the new statistics module and I found two cases where the behavior of the module seems suboptimal to me.
My most important concern is the module's internal _sum function and its implications, the other one about passing Counter objects to module functions. As for the first subject: Specifically, I am not happy with the way the function handles different types. Currently _coerce_types gets called for every element in the function's input sequence and type conversion follows quite complicated rules, and - what is worse - makes the outcome of _sum() and thereby mean() dependent on the order of items in the input sequence, e.g.:

>>> mean((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))
1.9944444444444445
>>> mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
Traceback (most recent call last):
  File "", line 1, in
    mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
  File "C:\Python33\statistics.py", line 369, in mean
    return _sum(data)/n
  File "C:\Python33\statistics.py", line 157, in _sum
    T = _coerce_types(T, type(x))
  File "C:\Python33\statistics.py", line 327, in _coerce_types
    raise TypeError('cannot coerce types %r and %r' % (T1, T2))
TypeError: cannot coerce types and

(this is because when _sum iterates over the input type Fraction wins over int, then float wins over Fraction and over everything else that follows in the first example, but in the second case Fraction wins over int, but then Fraction vs Decimal is undefined and throws an error).

Confusing, isn't it? So here's the code of the _sum function:

def _sum(data, start=0):
    """_sum(data [, start]) -> value

    Return a high-precision sum of the given numeric data. If optional
    argument ``start`` is given, it is added to the total. If ``data`` is
    empty, ``start`` (defaulting to 0) is returned.

    Examples
    --------

    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    11.0

    Some sources of round-off error will be avoided:

    >>> _sum([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
    1000.0

    Fractions and Decimals are also supported:

    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    Fraction(63, 20)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    Decimal('0.6963')

    """
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)

Internally, the function uses exact ratios for its calculations (which I think is very nice) and only goes through all the pain of coercing types to return T(total.numerator)/total.denominator where T is the final type resulting from the chain of conversions. I think a much cleaner (and probably faster) implementation would be to gather first all the types in the input sequence, then decide what to return in an input order independent way. My tentative implementation:

def _sum2(data, start=None):
    if start is not None:
        t = set((type(start),))
        n, d = _exact_ratio(start)
    else:
        t = set()
        n = 0
        d = 1
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and build up a set of all types.
    for x in data:
        t.add(type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    T = _coerce_types(t)  # decide which type to use based on set of all types
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)

This leaves the re-implementation of _coerce_types. Personally, I'd prefer something as simple as possible, maybe even:

def _coerce_types(types):
    if len(types) == 1:
        return next(iter(types))
    return float

but that's just a suggestion. In this case then:

>>> _sum2((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))/6
1.9944444444444445
>>> _sum2((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))/6
1.9944444444444445

Let's check the examples from the _sum docstring just to be sure:

>>> _sum2([3, 2.25, 4.5, -0.5, 1.0], 0.75)
11.0
>>> _sum2([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
1000.0
>>> from fractions import Fraction as F
>>> _sum2([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
Fraction(63, 20)
>>> from decimal import Decimal as D
>>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
>>> _sum2(data)
Decimal('0.6963')

Now the second issue: It is maybe more a matter of taste and concerns the effects of passing a Counter() object to various functions in the module.
I know this is undocumented and it's probably the user's fault if he tries that, but still:

>>> from collections import Counter
>>> c=Counter((1,1,1,1,2,2,2,2,2,3,3,3,3))
>>> c
Counter({1: 4, 2: 5, 3: 4})
>>> mode(c)
2

Cool, mode knows how to work with Counters (interpreting them as frequency tables)

>>> median(c)
2

Looks good

>>> mean(c)
2.0

Very well

But the truth is that only mode really works as you may think and we were just lucky with the other two:

>>> c=Counter((1,1,2))
>>> mean(c)
1.5

oops

>>> median(c)
1.5

hmm

From a quick look at the code you can see that mode actually converts your input to a Counter behind the scenes anyway, so it has no problem. mean and median, on the other hand, are simply iterating over their input, so if that input happens to be a mapping, they'll use just the keys. I think there are two simple ways to avoid this pitfall:

1) add an explicit warning to the docs explaining this behavior or
2) make mean and median do the same magic with Counters as mode does, i.e. make them check for Counter as the input type and deal with it as if it were a frequency table.

I'd favor this behavior because it looks like little extra code, but may be very useful in many situations. I'm not quite sure whether maybe even all mappings should be treated that way? Ok, that's it for now I guess. Opinions anyone? Best, Wolfgang -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 29 04:06:40 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Jan 2014 19:06:40 -0800 Subject: [Python-ideas] Need help designing subprocess API for Tulip Message-ID: If you're interested, please see us on the python-tulip mailing list at Google Groups. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From techtonik at gmail.com Wed Jan 29 08:44:30 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 29 Jan 2014 10:44:30 +0300 Subject: [Python-ideas] Iterative development Message-ID: Yet another idea that some of you will find strange. It is a parallel Python development process. It doesn't affect or replace current practice, so nobody gets hurt. It is also about open process, where openness means transparency (eliminate hidden communication), inclusiveness (eliminate exclusive rights and privileges) and accessibility (eliminate awkward practices and poor user experience). The idea is to split development of Python into two weeks cycle. Every two weeks is "iteration". Iteration consists of phases: 1. Planning (one, two days) 2. Execution 3. Testing 4. Demo 5. Retrospective Some of you, who are familiar with the concept of "sprint" and know something about "agile" buzzwords, will find this idea familiar. In fact, this is borrowed from some of the best practices of working with remote teams who use this methodology. (Planning) So, during the first, planning phase, people who'd like to participate choose what should be implemented in this iteration. For that there should be a list of things to be done. This list is called "backlog". People collaboratively estimate complexity and sort the things by priority. (Execution) You take a thing from backlog, mark that you're working on it, so that other people who are also interested can find you. If you need help, you split the thing into subtasks and make these tasks open for people to find and jump in. (Testing) This is a phase when work done is compared with actual thing description. Sometimes this leads to new insights, new ideas, new bugs and more work to be done in subsequent iteration. Sometimes it appears that during execution the thing completely diverged from what was originally planned. (Demo) Demonstration of the things done.
Record progress, give credits and close mark things in backlog as done. Demo is made for broader community that just for a list of participants. (Retrospective) This is an important phase that is dedicated to gathering and processing feedback to improve the iteration loop. Every person reports what he/she liked and disliked, what was the % of overall fun. Then some things and ideas are being born from the feedback - what can be improved - being it tools, interaction with people or some other things that get in the way. -- anatoly t. From ethan at stoneleaf.us Wed Jan 29 09:29:10 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 29 Jan 2014 00:29:10 -0800 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: <52E8BBD6.2010604@stoneleaf.us> On 01/28/2014 11:44 PM, anatoly techtonik wrote: > > It is a parallel Python development process. It doesn't affect or > replace current practice, so nobody gets hurt. So you're saying that we would have the current model, plus this agile model? > It is also about open > process, where openness means transparency (eliminate hidden > communication), What "hidden" communication? Talking in person or on IRC? Instead of ... where? > inclusiveness (eliminate exclusive rights and privileges) Exclusive rights? You mean let any piece of code get committed? > and accessibility (eliminate awkward practices and poor > user experience). It is not possible to please everyone; it is also not possible to ensure a "good user experience" for everyone. > The idea is to split development of Python into two weeks cycle. 80 hours? Do you have any idea how long it takes some of us to put in 80 hours of Python development time? 
-- ~Ethan~ From abarnert at yahoo.com Wed Jan 29 09:57:37 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jan 2014 00:57:37 -0800 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> On Jan 28, 2014, at 23:44, anatoly techtonik wrote: > Yet another idea that some of you will find strange. You do realize that Python is an open source project? And that the only people who work on it full time are the ones being paid by some organization that generally has its own priorities? > It is a parallel Python development process. It doesn't affect or > replace current practice, so nobody gets hurt. [...] > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From breamoreboy at yahoo.co.uk Wed Jan 29 10:29:21 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 29 Jan 2014 09:29:21 +0000 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On 29/01/2014 07:44, anatoly techtonik wrote: > Yet another idea that some of you will find strange. > Instead of coming up with ideas, why not sign the contributors' agreement and come up with code that people can actually use? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence From ncoghlan at gmail.com Wed Jan 29 10:31:44 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 29 Jan 2014 19:31:44 +1000 Subject: [Python-ideas] Iterative development In-Reply-To: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 29 Jan 2014 19:00, "Andrew Barnert" wrote: > > On Jan 28, 2014, at 23:44, anatoly techtonik wrote: > > > Yet another idea that some of you will find strange. > > You do realize that Python is an open source project? > > And that the only people who work on it full time are the ones being paid by some organization that generally has its own priorities? Currently a group containing zero people, FWIW (even Guido only spends part of his time on upstream work). Cheers, Nick. From tjreedy at udel.edu Wed Jan 29 10:47:26 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 29 Jan 2014 04:47:26 -0500 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On 1/29/2014 2:44 AM, anatoly techtonik wrote: > The idea is to split development of Python into two weeks cycle. Every > two weeks is "iteration". Iteration consists of phases: > > 1. Planning (one, two days) > 2. Execution > 3. Testing > 4. Demo > 5. Retrospective This is more or less what we do now on an issue by issue basis. At a higher level, releases for the 'next' version already come out at 2 or 3 week intervals from a0 to final. At a higher level, we already have plans for 3.5 that we will start on as soon as 3.4.0 is out or after PyCon.
-- Terry Jan Reedy

From techtonik at gmail.com Wed Jan 29 10:11:44 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 29 Jan 2014 12:11:44 +0300 Subject: [Python-ideas] Normalized Python Message-ID: Python is a cross-platform language, but I often find myself writing sections specific to Windows and to Linux, and sometimes even code specific to OS settings. In these moments I feel that Python is no more cross-platform than C, for example. What could be done?

Normalized Python: a set of default, standard behaviors that back up common user expectations about cross-platform and system-independent behavior, regardless of backward compatibility and code compatibility concerns.

This is needed, for example, to collect these two features:

1. open files in binary mode by default
   why? because "text file" is a human abstraction; for the operating system it is just another format of binary data, so the default operation is to read this data without any preprocessing
2. open text files in utf-8 encoding
   why? because users cannot know the encoding of the operating system and their programs cannot choose the right encoding, therefore the best guess is to expect the most widely used standard
3. treat stdout/stdin streams as binary
   why? because you don't want your data to be corrupted when you pass it in and out of Python via standard streams

Having a separate "Normalized Python" concept is needed to set the context for developing and engineering ideas, instead of concentrating on the sad reality of the backward compatibility curse.

-- anatoly t.

From techtonik at gmail.com Wed Jan 29 13:29:21 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 29 Jan 2014 15:29:21 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: <52E8BBD6.2010604@stoneleaf.us> References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Wed, Jan 29, 2014 at 11:29 AM, Ethan Furman wrote: > On 01/28/2014 11:44 PM, anatoly techtonik wrote: >> >> It is a parallel Python development process. It doesn't affect or
It doesn't affect or >> replace current practice, so nobody gets hurt. > > So you're saying that we would have the current model, plus this agile > model? I am saying that you're not forced to follow agile model if you don't like it. You can do what you do as you did before. >> It is also about open >> process, where openness means transparency (eliminate hidden >> communication), > > What "hidden" communication? Talking in person or on IRC? Instead of ... > where? If information doesn't reach the recipient who want to read it, it is "hidden". Even if you talk in public channel on IRC, the information is hidden from me if I was not connected and channel doesn't have public logs. >> inclusiveness (eliminate exclusive rights and privileges) > Exclusive rights? You mean let any piece of code get committed? There are many exclusive rights that keep people off from contributing. I don't want to touch them here, because it will move the thread into different area. To make it more specific "inclusiveness" on the process is the process too. You start with people who have full exclusive rights and contributing then compare them to people who are willing to help, but don't do this. Then you remove the obstacles to include these people. >> and accessibility (eliminate awkward practices and poor >> user experience). > > It is not possible to please everyone; it is also not possible to ensure a > "good user experience" for everyone. That's a general claim. I am sure that it is possible to reach the point where everyone agree that their experience is "good enough user experience". And there is a dedicated time in the process (retrospective) to work on just on that. >> The idea is to split development of Python into two weeks cycle. > > > 80 hours? Do you have any idea how long it takes some of us to put in 80 > hours of Python development time? It is not development time. 
This two-week cycle is just ordinary time, which may include 15 minutes of development time, a week, or nothing. It is up to you how much you are willing to spend. From rosuav at gmail.com Wed Jan 29 14:57:46 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 00:57:46 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik wrote: > If information doesn't reach the recipient who want to read it, it is "hidden". > Even if you talk in public channel on IRC, the information is hidden from me > if I was not connected and channel doesn't have public logs. Then if you care, connect. It's not hidden if you have the power to access it. Here's a suggestion: Fork Python (that's legal, that's what open source means) and start development using the model you advocate. If it's massively better than what's happening, (a) developers will flock to your model, and (b) the project could be completely handed over to you, as happened with GCC. Or alternatively, explain to us here what the real advantages are of your new model. So far, what I've seen is "hey, here's an idea", and not "here's what this idea will do to benefit Python"; and the idea itself looks more suited to a big business than to open source. Maybe someone who's actually used Agile will know what's so wonderful about it, but unless every core dev *has*, a bit of explanation will help. ChrisA From breamoreboy at yahoo.co.uk Wed Jan 29 15:08:39 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 29 Jan 2014 14:08:39 +0000 Subject: [Python-ideas] Normalized Python In-Reply-To: References: Message-ID: On 29/01/2014 09:11, anatoly techtonik wrote: > Python is a cross-platform language, but I often find myself writing > sections specific for Windows and for Linux and sometimes even OS > setting specific code. > [...] I support what Chris Angelico has said on another thread: fork Python, and if it's good enough everybody will flock to it. This also avoids the problem with the CLA. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ncoghlan at gmail.com Wed Jan 29 15:18:19 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 30 Jan 2014 00:18:19 +1000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On 29 January 2014 23:57, Chris Angelico wrote: > On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik wrote: >> If information doesn't reach the recipient who want to read it, it is "hidden".
>> Even if you talk in public channel on IRC, the information is hidden from me >> if I was not connected and channel doesn't have public logs. > > Then if you care, connect. It's not hidden if you have the power to access it. > > Here's a suggestion: Fork Python (that's legal, that's what open > source means) and start development using the model you advocate. If > it's massively better than what's happening, (a) developers will flock > to your model, and (b) the project could be completely handed over to > you, as happened with GCC. > > Or alternatively, explain to us here what the real advantages are of > your new model. So far, what I've seen is "hey, here's an idea", and > not "here's what this idea will do to benefit Python"; and the idea > itself looks more suited to a big business than to open source. Maybe > someone who's actually used Agile will know what's so wonderful about > it, but unless every core dev *has*, a bit of explanation will help. Plenty of us have used it, and we know it's an entirely inappropriate model for open source development projects with broad asynchronous participation, as the time commitment needed to make the short cycle work is antithetical to loose collaboration. It works well for a focused team supporting a single application to meet the specific needs of a single business, though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Wed Jan 29 15:33:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 01:33:56 +1100 Subject: [Python-ideas] Normalized Python In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 8:11 PM, anatoly techtonik wrote: > Normalized Python - a set of default, standard behaviors that backup > common user expectations about cross-platform and system-independent > behavior regardless of backward compatibility and code compatibility > concerns. 
> > Having a separate "Normalized Python" concept is needed to set > the context for developing and engineering ideas, instead of > concentrating on the sad reality of backward compatibility curse. You can achieve the first two simply by opening files with parameters. There is NOTHING Windows-specific or Linux-specific in that. As of Python 3, opening in text mode is the default... but you can override that so easily. Why change the default (which breaks back compat) when you can just change your code? And I believe you can reopen stdin/stdout as binary, if you really want to, but that is a little harder. It's still not going to have any platform-specific code in it. (As I've never written a filter for binary files in Python, I've never had the need to read/write standard streams in binary. But I've no doubt that someone who has can show you how easy it is - I'd guess it's less than five lines of code, knowing Python.) > This is needed, for example, to collect these two features: (Among our features are such diverse elements as... oh, wrong Pythons.) > 1. open files in binary mode by default > why? > because "text file" is a human abstraction, for operating > system it is just another format of binary data, so default > operation is to read this data without any preprocessing A reasonably plausible argument. C++ follows that sort of model (you shouldn't pay for anything you're not using). SQL mostly follows that model (it generally takes more keywords to get the database to do more work - compare "SELECT x FROM y" and "SELECT x FROM y ORDER BY z", where the latter adds a sort phase; there are exceptions to this, like UNION ALL vs UNION, but they're notable _because_ they're exceptions). But it's nothing like a strong enough argument for changing. Creating two subtly different languages is a major problem, especially when the exact same syntax means different things. 
Imagine if I create a fork of Python that's absolutely identical except that you create a set with [1,2,3] and a list with {1,2,3}. All your code will be syntactically correct, but suddenly it does something quite different. That is a BAD idea. It would have to be *immensely* better to justify the breakage; and this is only "arguably better". (The most obvious contrary argument is that the default should do the thing most people want most often, which is working with text files. This same argument justifies the use of arbitrary-precision integers by default, instead of requiring an explicit "long" type; I'm sure you'll agree that the Py3 unification of these types was an advantage.) > 2. open text files in utf-8 encoding > why? > because users can not know the encoding of operating > system, their programs can not choose right encoding, > therefore a best guess is to expect the most widely used > standard Yes, this one is an issue. Python lets the OS recommend a default encoding, on the expectation that a Python script should fit into its host platform, rather than that all platforms should conform to what Python wants. A judgment call, and I'm sure there can be endless debates about what Python should do, but since it can be overridden with a single parameter on the open call, not a big deal IMO. > 3. threat stdout/stdin streams as binary > why? > because you don't want you data to be corrupt when > you pass it in and out of Python via standard streams Most definitely NOT. The standard streams should, by default, be text streams, and should have their encodings set according to what the other side wants. If there's a way for the OS and Python to communicate an encoding, that's absolutely perfect. Yes, there'll be a few edge cases involving redirection, but that's pretty much unsolvable anyway. The normal usage of Python MUST include Unicode; and that means the most obvious way to produce output (the print function) needs to write Unicode. 
So if stdout is a binary stream, what's print going to do with a str? Encode it? If so, you just move the issue - and print can send to multiple streams, so it'd need to know which are text and which are binary, etc., etc. Or should it throw an error, and force the programmer to do stuff like this:

    CONSOLE_ENCODING = "utf-8"  # add some logic for guessing this
    s = "Hello, world!"
    print(s.encode(CONSOLE_ENCODING))

just to ensure that every programmer has to battle with the encodings manually, in lots of places, instead of configuring it once (or, more likely, having the default be right) and then having clean code everywhere? The only way that opening stdin/out as binary will prevent the corruption of your data is if your data is fundamentally bytes. Most programs, in any language, work with data that's fundamentally text; granted, a lot of languages don't distinguish, but if you look at what the programmer's doing, it's still text. Anything that prints "Hello, world!" is printing text, not bytes, and if the console's encoding is UTF-16, that should emit 26 bytes (plus any newline that's appropriate). Forcing the programmer to think about this is completely unnecessary. How many times do you actually come across these issues in porting? How much effort would you really save if these measures were implemented? If it's that important to you, fork CPython and create this "Normalized Python" that does everything you want (and then, linking this with the other thread, continue development of Normalized Python according to an Agile model and see if people join you rather than CPython). Good luck.
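To make the "just pass parameters" point concrete, here is a minimal sketch, assuming Python 3 (the helper names `roundtrip_utf8` and `write_bytes_to_stdout` are hypothetical). It shows the explicit, per-call choices that cover anatoly's three items without changing any defaults: `encoding="utf-8"` for text files, mode `"rb"` for raw bytes, and the documented `sys.stdout.buffer` attribute for writing bytes to standard output (`sys.stdin.buffer` is the reading counterpart). It also checks the 26-byte figure above, which holds for UTF-16 without a byte-order mark.

```python
import os
import sys
import tempfile


def roundtrip_utf8(text):
    """Write text as UTF-8 to a temporary file, then read it back twice:
    decoded in text mode and undecoded in binary mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Explicit encoding: the same bytes on every platform, whatever the locale default is.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            decoded = f.read()
        # Binary mode: no decoding and no newline translation.
        with open(path, "rb") as f:
            raw = f.read()
        return decoded, raw
    finally:
        os.remove(path)


def write_bytes_to_stdout(data):
    """Write bytes through the binary buffer underneath the text-mode sys.stdout."""
    # Flush pending text first so text and bytes come out in order.
    sys.stdout.flush()
    # Assumption: sys.stdout normally has a .buffer; a replacement such as
    # io.StringIO does not, so fall back to the interpreter's original stdout.
    stream = getattr(sys.stdout, "buffer", None)
    if stream is None:
        stream = sys.__stdout__.buffer
    stream.write(data)
    stream.flush()


if __name__ == "__main__":
    decoded, raw = roundtrip_utf8("h\u00e9llo")
    print(decoded == "h\u00e9llo")                   # True
    print(raw)                                       # b'h\xc3\xa9llo'
    print(len("Hello, world!".encode("utf-16-le")))  # 26 (no byte-order mark)
    print(len("Hello, world!".encode("utf-16")))     # 28 (2-byte BOM prepended)
    write_bytes_to_stdout(b"raw bytes, no encoding step\n")
```

Each choice is made at the call site that needs it, so existing code that relies on the text-mode defaults keeps working.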
ChrisA From rosuav at gmail.com Wed Jan 29 16:29:58 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 02:29:58 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik wrote: > You start with people who have full exclusive rights and contributing then > compare them to people who are willing to help, but don't do this. Then you > remove the obstacles to include these people. There's a fundamental misunderstanding behind this, I think. Contributions are valued, yes, but the purpose of an open source project does not begin and end at "encouraging contributions from every person on the planet". The goal of Python is to be a useful and usable programming language, and if that's best served by a single person doing all the coding, then that's how the project should be run. (I'm preeeeeetty confident that's not the case, though.) There's a general feeling around the world that dictatorships are bad, democracy is good, and the more people you have involved in something, the better. While this is not entirely false, it's not entirely true either. In the Bible, in the book of Proverbs, God tells us several times that multiple people's advice is of value. [1] [2] [3] But that's advice, not decision making. When it comes down to a final decision, it's almost always best to have a single person decide. A business has a CEO, an orchestra has a conductor, there's only one steering wheel in a car. And ultimately, trying to make every single thought behind every single decision public is counter-productive too. Ever tried to answer a child's "Why? Why? Why?" machine-gun? Yeah. On another project, I've contributed a large number of patches. Some fix bugs, some add features, some just fix little typos in documentation. All of them were simply submitted to the core team, reviewed, and ultimately applied, rejected, or modified. I'm not a core dev. I can't push to the git repository. But if I were to be given that power, it would be for reasons of convenience (if the core devs decide that all my patches are getting applied anyway, and it's easier for them to let me push my own), not transparency. You want to know what's going on? Get involved. Then you'll know. The people who care about the project will find a way to contribute. That's a fundamental of the open source model. You don't like the agreement that has to be signed before your patches will be accepted? Then contribute by reviewing other people's patches, or verifying bug reports, or whatever. The onus is not on the python.org legal team to make everything work for you; it's their job to make everything work for the PSF. I haven't looked into the specifics of the agreement in detail, but I'm confident that the PSF would not demand something just for the sake of bureaucracy, so I'd trust that there's good reason for all of it. (And hey, if you don't want to sign that, you can just declare that your contributions are public domain, IIRC.) I'm sure it's very American to demand that the people in power tell you what they're doing. (Or insert any other country name there, though I think the USA is at the forefront of this.) Trouble is, open source projects simply aren't built that way. ChrisA [1] Prov 11:14 http://www.biblegateway.com/passage/?search=proverbs%2011:14 [2] Prov 15:22 http://www.biblegateway.com/passage/?search=proverbs%2015:22 [3] Prov 24:6 http://www.biblegateway.com/passage/?search=proverbs%2024:6 From amber.yust at gmail.com Wed Jan 29 16:36:58 2014 From: amber.yust at gmail.com (Amber Yust) Date: Wed, 29 Jan 2014 15:36:58 +0000 Subject: [Python-ideas] Iterative development References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: <5676137352349587376@gmail297201516> I agree with you Chris, but can we keep religion out of this?
On Wed Jan 29 2014 at 7:30:32 AM, Chris Angelico wrote: > On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik > wrote: > > You start with people who have full exclusive rights and contributing > then > > compare them to people who are willing to help, but don't do this. Then > you > > remove the obstacles to include these people. > > There's a fundamental misunderstanding behind this, I think. > > Contributions are valued, yes, but the purpose of an open source > project does not begin and end at "encouraging contributions from > every person on the planet". The goal of Python is to be a useful and > usable programming language, and if that's best served by a single > person doing all the coding, then that's how the project should be > run. (I'm preeeeeetty confident that's not the case, though.) > > There's a general feeling around the world that dictatorships are bad, > democracy is good, and the more people you have involved in something, > the better. While this is not entirely false, it's not entirely true > either. In the Bible, in the book of Proverbs, God tells us several > times that multiple people's advice is of value. [1] [2] [3] But > that's advice, not decision making. When it comes down to a final > decision, it's almost always best to have a single person decide. A > business has a CEO, an orchestra have a conductor, there's only one > steering wheel in a car. And ultimately, trying to make every single > thought behind every single decision public is counter-productive too. > Ever tried to answer a child's "Why? Why? Why?" machine-gun? Yeah. > > On another project, I've contributed a large number of patches. Some > fix bugs, some add features, some just fix little typos in > documentation. All of them were simply submitted to the core team, > reviewed, and ultimately applied, rejected, or modified. I'm not a > core dev. I can't push to the git repository. 
But if I were to be > given that power, it would be for reasons of convenience (if the core > devs decide that all my patches are getting applied anyway, and it's > easier for them to let me push my own), not transparency. You want to > know what's going on? Get involved. Then you'll know. > > The people who care about the project will find a way to contribute. > That's a fundamental of the open source model. You don't like the > agreement that has to be signed before your patches will be accepted? > Then contribute by reviewing other people's patches, or verifying bug > reports, or whatever. Onus is not on the python.org legal team to make > everything work for you; it's their job to make everything work for > the PSF. I haven't looked into the specifics of the agreement in > detail, but I'm confident that the PSF would not demand something just > for the sake of bureaucracy, so I'd trust that there's good reason for > all of it. (And hey. if you don't want to sign that, you can just > declare that your contributions are public domain, IIRC.) > > I'm sure it's very American to demand that the people in power tell > you what they're doing. (Or insert any other country name there, > though I think the USA is at the forefront of this.) Trouble is, open > source projects simply aren't built that way. > > ChrisA > > [1] Prov 11:14 http://www.biblegateway.com/passage/?search=proverbs%2011: > 14 > [2] Prov 15:22 http://www.biblegateway.com/passage/?search=proverbs%2015: > 22 > [3] Prov 24:6 http://www.biblegateway.com/passage/?search=proverbs%2024:6 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From haoyi.sg at gmail.com Wed Jan 29 16:52:31 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 29 Jan 2014 23:52:31 +0800 Subject: [Python-ideas] Iterative development In-Reply-To: <5676137352349587376@gmail297201516> References: <52E8BBD6.2010604@stoneleaf.us> <5676137352349587376@gmail297201516> Message-ID: > You want to know what's going on? Get involved. Then you'll know +1. It's odd to complain about the project's organization and processes when you haven't actually had any real experience with either. Getting involved in some project run by other people isn't easy, but it's not really that hard either in the world of open source. On Wed, Jan 29, 2014 at 11:36 PM, Amber Yust wrote: > I agree with you Chris, but can we keep religion out of this? > > > On Wed Jan 29 2014 at 7:30:32 AM, Chris Angelico wrote: > >> On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik >> wrote: >> > You start with people who have full exclusive rights and contributing >> then >> > compare them to people who are willing to help, but don't do this. Then >> you >> > remove the obstacles to include these people. >> >> There's a fundamental misunderstanding behind this, I think. >> >> Contributions are valued, yes, but the purpose of an open source >> project does not begin and end at "encouraging contributions from >> every person on the planet". The goal of Python is to be a useful and >> usable programming language, and if that's best served by a single >> person doing all the coding, then that's how the project should be >> run. (I'm preeeeeetty confident that's not the case, though.) >> >> There's a general feeling around the world that dictatorships are bad, >> democracy is good, and the more people you have involved in something, >> the better. While this is not entirely false, it's not entirely true >> either. In the Bible, in the book of Proverbs, God tells us several >> times that multiple people's advice is of value. 
[1] [2] [3] But >> that's advice, not decision making. When it comes down to a final >> decision, it's almost always best to have a single person decide. A >> business has a CEO, an orchestra have a conductor, there's only one >> steering wheel in a car. And ultimately, trying to make every single >> thought behind every single decision public is counter-productive too. >> Ever tried to answer a child's "Why? Why? Why?" machine-gun? Yeah. >> >> On another project, I've contributed a large number of patches. Some >> fix bugs, some add features, some just fix little typos in >> documentation. All of them were simply submitted to the core team, >> reviewed, and ultimately applied, rejected, or modified. I'm not a >> core dev. I can't push to the git repository. But if I were to be >> given that power, it would be for reasons of convenience (if the core >> devs decide that all my patches are getting applied anyway, and it's >> easier for them to let me push my own), not transparency. You want to >> know what's going on? Get involved. Then you'll know. >> >> The people who care about the project will find a way to contribute. >> That's a fundamental of the open source model. You don't like the >> agreement that has to be signed before your patches will be accepted? >> Then contribute by reviewing other people's patches, or verifying bug >> reports, or whatever. Onus is not on the python.org legal team to make >> everything work for you; it's their job to make everything work for >> the PSF. I haven't looked into the specifics of the agreement in >> detail, but I'm confident that the PSF would not demand something just >> for the sake of bureaucracy, so I'd trust that there's good reason for >> all of it. (And hey. if you don't want to sign that, you can just >> declare that your contributions are public domain, IIRC.) >> >> I'm sure it's very American to demand that the people in power tell >> you what they're doing. 
(Or insert any other country name there, >> though I think the USA is at the forefront of this.) Trouble is, open >> source projects simply aren't built that way. >> >> ChrisA >> >> [1] Prov 11:14 http://www.biblegateway.com/passage/?search=proverbs%2011:14 >> [2] Prov 15:22 http://www.biblegateway.com/passage/?search=proverbs%2015:22 >> [3] Prov 24:6 http://www.biblegateway.com/passage/?search=proverbs%2024:6 >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> From rosuav at gmail.com Wed Jan 29 17:18:12 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 03:18:12 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> <5676137352349587376@gmail297201516> Message-ID: On Thu, Jan 30, 2014 at 2:52 AM, Haoyi Li wrote: >> You want to know what's going on? Get involved. Then you'll know > > +1. It's odd to complain about the project's organization and processes when > you haven't actually had any real experience with either. Getting involved > in some project run by other people isn't easy, but it's not really that > hard either in the world of open source. I first met that concept with community groups, rather than open source projects, but the result is similar. There were people who desperately wanted to be in that "inner circle" of people who knew, a year in advance, which Gilbert & Sullivan operas were going to be performed, and who'd be directing them, and who would be playing which roles, and so on.
It's all announced sooner or later, but for some people, they'd really rather it be "sooner" than "later". Well, that's easily solved. Serve on the society's committee - then you know what's happening, because you're helping to make it happen. And if you're happy with a lesser advantage from lesser work, just swing by and help us with our mail-out. You get to read the info we're sending before we send it out... because you're helping us to send it out. In one stroke, you call the bluff of anyone who just wanted handouts of information, satisfy the desires of those who really care, and maybe even get some extra help running the (all-volunteer) organization. I call that a win! :) ChrisA From techtonik at gmail.com Wed Jan 29 13:48:26 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 29 Jan 2014 15:48:26 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Wed, Jan 29, 2014 at 11:57 AM, Andrew Barnert wrote: > On Jan 28, 2014, at 23:44, anatoly techtonik wrote: > >> Yet another idea that some of you will find strange. > > You do realize that Python is an open source project? Yes, captain. However, I fail to see why you ask the question. If you're saying that open source projects can't have any kind of methodology to save time and coordinate efforts more efficiently, then I have to disagree with you. Example from the good old times of 2011: http://scons.org/wiki/BugParty I am certain there are other open source projects with similar processes. > And that the only people who work on it full time are the ones being paid by some organization that generally has its own priorities? You don't need to work full time to participate in a two-week cycle. As I answered Ethan, it is not development cycle time. It is just an ordinary two weeks. You choose what you can do in these two weeks and do it.
You may find that you have more time than you've planned during this time, so you can see who is working on what and help them (if possible). From techtonik at gmail.com Wed Jan 29 14:08:21 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 29 Jan 2014 16:08:21 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 12:29 PM, Mark Lawrence wrote: > On 29/01/2014 07:44, anatoly techtonik wrote: >> >> Yet another idea that some of you will find strange. >> > > Instead of coming up with ideas, why not sign the contributors' agreement > and come up with code that people can actually use? replied to python-legal-sig https://mail.python.org/pipermail/python-legal-sig/2014-January/000070.html From rosuav at gmail.com Wed Jan 29 17:27:38 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 03:27:38 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Wed, Jan 29, 2014 at 11:48 PM, anatoly techtonik wrote: > You don't need to work full time to participate in two week cycle. > As I answered to Ethan, it is not development cycle time. It is just > ordinary two weeks time. You choose what you can do in these two > week and do this. You may find that you have more time than you've > planned during this time, so you can see who is working on what > and help them (if possible). What does the two-week cycle achieve that current processes with the bug tracker can't? Please explain to us the benefits of the Agile model, as they apply to a loose collaboration. 
ChrisA From taleinat at gmail.com Wed Jan 29 17:56:20 2014 From: taleinat at gmail.com (Tal Einat) Date: Wed, 29 Jan 2014 18:56:20 +0200 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 3:08 PM, anatoly techtonik wrote: > On Wed, Jan 29, 2014 at 12:29 PM, Mark Lawrence wrote: >> Instead of coming up with ideas, why not sign the contributors' agreement >> and come up with code that people can actually use? > > replied to python-legal-sig > https://mail.python.org/pipermail/python-legal-sig/2014-January/000070.html Basically, you refuse to sign a contributor agreement, but insist on blaming the PSF for that. Your position is simply unreasonable: You demand that the PSF should either stop demanding a contributor agreement, accept your own personal version of it, or spend a lot of time and energy attempting to explain it to you. You accuse the PSF of being a needlessly bureaucratic political body which is giving you a hard time just because it can or because it doesn't care; whatever you may think, that is the opposite of the truth. Furthermore, since you continue to choose to phrase your arguments aggressively and offensively, how can you expect anyone to consider your proposals?? Regarding the contributor agreement, please spend your time and energy understanding it, instead of arguing about it here and blaming other people. Otherwise, stop pestering people about it. What you demand regarding the contributor agreement is not going to happen, period. If you actually care about Python, find a way to contribute helpfully! Even if you believe that we are the ones being stubborn and unhelpful, it is up to you to find a way to work with us productively. For example, I am sure you have noticed that few of your ideas posted here have been helpful in any way, if any at all. I do believe that you think these are good ideas, but surely you must see that nothing good results from your posting them to this list.
As it is, you have been harming the development of Python considerably for many months by pestering people on various mailing lists. If you want to help, you must change your behavior! - Tal From abarnert at yahoo.com Wed Jan 29 18:24:01 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 29 Jan 2014 09:24:01 -0800 Subject: [Python-ideas] Normalized Python In-Reply-To: References: Message-ID: <97298FAB-ED1F-42B8-B69B-A189F41C03D7@yahoo.com> Chris, I pretty much agree with you, but there are two major additional points you didn't mention. On Jan 29, 2014, at 6:33, Chris Angelico wrote: > On Wed, Jan 29, 2014 at 8:11 PM, anatoly techtonik wrote: > >> 3. treat stdout/stdin streams as binary >> why? >> because you don't want your data to be corrupt when >> you pass it in and out of Python via standard streams > > Most definitely NOT. The standard streams should, by default, be text > streams, and should have their encodings set according to what the > other side wants. Note that when the other side is a Windows console, what it _really_ wants is for you not to use stdio, but to instead use the separate UTF-16-specific console APIs. Fitting this into Python 3's cross-platform io model is a bit challenging, and not yet done, but certainly doable. (It's been discussed multiple times, both on this list and elsewhere.) Fitting this into a Python 2-style io model as Anatoly suggests is completely impossible. Instead, every single program would have to either check that stdout.isatty and platform is Windows and explicitly use something other than stdout, or figure out the console encoding (which is hard to do from inside Python if you take away the stdout.encoding that Python provides for the text stdout today) and explicitly encode every string to be printed. There's also the fact that the print function implicitly converts everything to a str for you, which wouldn't do any good if stdout were a binary file.
Unlike Python 2, Python 3 has no way to convert arbitrary objects to byte strings, which means you would need a mandatory encoding keyword arg on every call to print that took any args that weren't bytes-compatible. Between these two issues, the proposal would effectively give Python 3 all of the stdio/print problems that Python 2 had, and more, without any of Python 2's partial solutions to those problems. From rosuav at gmail.com Wed Jan 29 18:36:33 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 04:36:33 +1100 Subject: [Python-ideas] Normalized Python In-Reply-To: <97298FAB-ED1F-42B8-B69B-A189F41C03D7@yahoo.com> References: <97298FAB-ED1F-42B8-B69B-A189F41C03D7@yahoo.com> Message-ID: On Thu, Jan 30, 2014 at 4:24 AM, Andrew Barnert wrote: > Note that when the other side is a Windows console, what it _really_ wants is for you not to use stdio, but to instead use the separate UTF-16-specific console APIs. > > Fitting this into Python 3's cross-platform io model is a bit challenging, and not yet done, but certainly doable. (It's been discussed multiple times, both on this list and elsewhere.) > In the theoretical ideal, all that should be buried within the definition of the print function (or what it calls on). I should be able to write a program that says: print("Copyright © 2014 My Name") even if my name includes non-ASCII, even non-BMP, characters; and that program should produce that output in whatever way is appropriate to the platform. (If it's running on a printer, that should produce a hard copy.) Now, maybe that ideal can't be attained, due to some platforms' limitations or stupidity, and clean code is of value too, but certainly the notion of "write a Unicode string to the most obvious place of output" is one that ought *conceptually* to be supported equally on all platforms, without my having to figure out one from another. Obviously if your terminal expects one encoding but announces another, there's going to be a mess.
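A minimal sketch, assuming CPython 3's standard io stack, of the text/bytes split discussed above: `print()` converts its arguments with `str()` and writes text, the text layer encodes that text with the stream's `encoding`, and raw bytes have to bypass it through the underlying binary `buffer`. The helper `write_text_and_bytes` is a hypothetical name used only for illustration, and the encoding reported for the real `sys.stdout` depends on the platform, the locale, and whether stdout is a console or a pipe.

```python
import io
import sys


def write_text_and_bytes(stream: io.TextIOWrapper) -> None:
    """Hypothetical helper: write one str line, then one bytes line."""
    # print() calls str() on its arguments and writes text; the text layer
    # encodes that text with stream.encoding before it reaches the bytes.
    print("Copyright \u00a9 2014 My Name", file=stream)
    # Push pending text down to the binary layer first, so it lands
    # before the raw bytes written next.
    stream.flush()
    # Bytes cannot go through the text layer; they go to .buffer instead.
    stream.buffer.write(b"raw bytes\n")
    stream.buffer.flush()


def _demo() -> None:
    # The encoding Python chose for this process's stdout (platform-dependent).
    print("stdout encoding:", getattr(sys.stdout, "encoding", None))
    # Repeat the writes on an in-memory stream with a fixed encoding.
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
    write_text_and_bytes(text)
    print(raw.getvalue())


if __name__ == "__main__":
    _demo()
```

The explicit `flush()` before touching `.buffer` matters: the text wrapper keeps its own pending data, so mixing text and byte writes without it could reorder the output.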
The theoretical ideal works only when negotiations are done properly. But again, that's outside of Python; and if the next version of SomeWeirdOS introduces a new means of announcing its console encoding, it should simply be a matter of coding that into Python, *not* into every single script. ChrisA From ronaldoussoren at mac.com Thu Jan 30 09:44:36 2014 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 30 Jan 2014 09:44:36 +0100 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: References: <52E1F56E.2030805@stoneleaf.us> Message-ID: <49AF79E2-129C-4548-A017-9B2B18BD62E8@mac.com> On 24 Jan 2014, at 08:54, Suresh V. wrote: > On Friday 24 January 2014 10:39 AM, Ethan Furman wrote: >> On 01/23/2014 08:09 PM, Suresh V. wrote: >>> >>> Also it would mean that the client code imports from this package. >>> I would like client code to remain exactly as it is (continue to >>> import from its original package) but the behavior is enhanced >>> once this package is imported on startup. >> >> /Something/ has to adjust the pre and post conditions -- if not the >> client code, then what? > > pre and post conditions are just one possible use of this. > > Going back to my smtplib.SMTP.sendmail example. > No changes in bulk of client code. > Single patch module imported in main. Why is this a good thing? You seem to propose adding a mechanism that makes it easily possible to modify the behaviour of existing functions, which makes it harder to reason about code. 
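For comparison with the proposed `__before__`/`__after__` attributes, a rough sketch of how such hooks can already be attached with today's monkey patching: wrap the original function with `functools.wraps` and rebind the attribute. `add_hooks` and `Mailer` are hypothetical names for illustration; `Mailer` merely stands in for a class such as `smtplib.SMTP`, whose real `sendmail` is not called here because that would need a mail server.

```python
import functools
from typing import Any, Callable, Optional


def add_hooks(owner: Any, name: str,
              before: Optional[Callable[..., None]] = None,
              after: Optional[Callable[[Any], None]] = None) -> None:
    """Hypothetical helper: rebind owner.<name> to a wrapper that runs hooks."""
    original = getattr(owner, name)

    @functools.wraps(original)  # keeps __name__/__doc__ and sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if before is not None:
            before(*args, **kwargs)  # "before" hook sees the call arguments
        result = original(*args, **kwargs)
        if after is not None:
            after(result)  # "after" hook sees the return value
        return result

    # This rebinding is the monkey patch: every later lookup of
    # owner.<name> now gets the hooked behaviour.
    setattr(owner, name, wrapper)


class Mailer:
    """Hypothetical stand-in for a class such as smtplib.SMTP."""

    def sendmail(self, from_addr: str, to_addrs: list, msg: str) -> dict:
        return {}  # pretend every recipient was accepted


def _demo() -> None:
    events = []
    add_hooks(Mailer, "sendmail",
              before=lambda self, *args: events.append(("before", args)),
              after=lambda result: events.append(("after", result)))
    Mailer().sendmail("me@example.com", ["you@example.com"], "hello")
    print(events)


if __name__ == "__main__":
    _demo()
```

Nothing at the call site reveals that `sendmail` now runs extra code, which is the reasoning cost the reply below points to.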
While this is also possible without language changes, with the current monkey patching mechanisms it's at least clear that you're doing something naughty when writing the patching code :-) Ronald From rosuav at gmail.com Thu Jan 30 13:49:44 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Jan 2014 23:49:44 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Thu, Jan 30, 2014 at 10:56 PM, anatoly techtonik wrote: > You say - "short cycles" are bad. In agile I'd say - let's try and see why. > Maybe it's cycles what are bad, maybe it's people who can not sync > this often, maybe there is a technical problem with communication that > can be resolved by using right tools. The very concept of a cycle suggests a system that's more suited to a business environment than general open source development. Forcing people to pick up and set down work might be useful in the very short period just before a version release (I've been seeing some stuff about Argument Clinic - btw, kudos to the tireless people doing that, it's a huge job - and how some of the work will be deferred to 3.5), but most of the time, it's completely unnecessary. In big business, you might have a couple dozen programmers working on some particular job; in that two week cycle, each one could potentially put in quite a few hours. I heard a figure of 80 hours quoted, but I'm dubious about how many actual dev hours a salaried programmer would get done, in between meetings and whatnot. Still, could easily be upwards of 50 hours. Forcing everyone to stop and re-check things every fifty dev hours doesn't sound too bad. Now look at volunteers. Two weeks might be anywhere from zero hours up to... well, the upper end doesn't matter. But it could easily be just a single dev hour in that time. Are you then going to force this person to set aside what he's partially done, because of some arbitrary break point?
Now, what happens if you take Agile and eliminate the two-week period? It begins to look very much like a pool of issues on a bug tracker. You have a pile of stuff to do, someone picks up something he feels like doing, posts a result back. Hmm, I wonder if that might be what's already happening... Do you see now why I was, without any experience of Agile, already dubious about its merits? And that even before Nick stated from experience that it's not going to help. Ideas are all very well, but they're useless without some form of test-bed. The only perfect way to find out if an idea works or not is to try it, and the onus is on the inventor to risk something for his idea. Put the theory to work on some project. Once you can point to some clear advantages *in practice*, you'll be able to recommend this to other people. So... fork CPython, tell us all how wonderful your version is going to be, and then show us how, in two weeks, or four weeks, or six weeks, you can do amazing stuff with a motley crew of programmers. Then we'll all take notice. ChrisA From rosuav at gmail.com Thu Jan 30 14:40:26 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 31 Jan 2014 00:40:26 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Fri, Jan 31, 2014 at 12:25 AM, anatoly techtonik wrote: > Single dev hour is ok if you reached your goal. That's the point. > You set the goals - you reach them. If you didn't reach them - you > analyze and see what could be done better. It is all in relaxing and > free manner, unlike the bloody corporation culture. You may invite > other people to join the fun. People can find what are you working > on and propose help. > > This is the process. Let's say you pick up something that's going to turn out to take you three dev hours. You then put one hour of work into it, and the two-week cut-off rolls around. What do you do? 
In the current model, there is no cut-off, so you just keep your work where it is until you find the time to finish it. Then you format it as a patch, put it on the tracker issue, and move on. (Or, if you're a core dev, I suppose you push it, see if the buildbots start looking red and angry, and then move on. Either way.) It doesn't matter if that took you one day, two weeks, or three months. What you're suggesting is that people should conform to an arbitrary number-of-days cutoff. That means that if the cut-off is getting close, there's a *dis*incentive to pick up any job, because you won't be able to finish it. Imagine if, when writing up a post for the mailing list, you had to finish each sentence inside one minute as per the clock. If it's currently showing hh:mm:49, you'd do better to not start a sentence, because you probably can't finish it in eleven seconds. Is that an advantage over "just write what you like, when you like"? ChrisA From techtonik at gmail.com Thu Jan 30 12:24:44 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 14:24:44 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On Wed, Jan 29, 2014 at 12:47 PM, Terry Reedy wrote: > On 1/29/2014 2:44 AM, anatoly techtonik wrote: > >> The idea is to split development of Python into two weeks cycle. Every >> two weeks is "iteration". Iteration consists of phases: >> >> 1. Planning (one, two days) >> 2. Execution >> 3. Testing >> 4. Demo >> 5. Retrospective > > > This is more or less what we do now on an issue by issue basis. At a higher > level, releases for the 'next' version already come out at 2 or 3 week > intervals from a0 to final. At a higher level, we already have plans for 3.5 > that we will start on as soon as 3.4.0 is out or after PyCon. 
It is quite obvious from outside that Python has some kind of process, but it is quite hard for people from outside to sync with it, because it is not open - it is not completely clear how the planning is done, which tasks are available for the current sprint, what you can help with and how to track progress. From techtonik at gmail.com Thu Jan 30 12:45:19 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 14:45:19 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Wed, Jan 29, 2014 at 4:57 PM, Chris Angelico wrote: > On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik wrote: > > Here's a suggestion: Fork Python (that's legal, that's what open > source means) and start development using the model you advocate. If > it's massively better than what's happening, (a) developers will flock > to your model, and (b) the project could be completely handed over to > you, as happened with GCC. There is a big difference between people who invent things and people who do things. I am a lazy bastard who cannot do anything and keep a job, because I am constantly inventing new stuff that no one is able to implement. Over the years I realized that the only good I can do for humanity is to develop a sustainable model. So far it hasn't happened, because it turned out that people only work on their own ideas. I don't own my ideas - they are free for everyone to explore and discuss. So if there is anything valuable - take it. I don't need power over the project or money or anything in between. The next day there will be another idea and another discussion. It is nice to see communities that can develop ideas, that can realize that people are different, and that make full use of what people are capable of. It is also nice to see people evolve to act in new roles that are uncommon for them.
You won't like it, but it is also nice to see how people become worse, because they are human, and to realize that everyone is imperfect. What is not so nice is to see good things fail because people cannot use technology to help them deal with the human factor. > Or alternatively, explain to us here what the real advantages are of > your new model. So far, what I've seen is "hey, here's an idea", and > not "here's what this idea will do to benefit Python"; and the idea > itself looks more suited to a big business than to open source. Maybe > someone who's actually used Agile will know what's so wonderful about > it, but unless every core dev *has*, a bit of explanation will help. Ok. In short. There is only one advantage: - increased visibility, which in turn results in - increased interest, which in turn results in - increased participation. What problem does agile solve? There is one big problem: "increased participation" is actually a negative factor for existing contributors, because it takes more of their time. Where does this "more time" come from? In the current model: - increased participation == increased communication If you constantly communicate, you don't have time for development (probably the thing that you like the most). How does agile help with that? "agile" means just that - "flexible". If you see the problem, you don't say "we are all developers, nobody is interested in communications". No, instead you say -- ok, we have a communication problem, what can we try? In the current model, you cannot try anything, because you cannot set goals. A goal is something that is at least: - Measurable - Time-bound There are no time bounds and there is no measurement. These are not part of the process, so you don't even have any means to solve the communication and time deficiency problem. If we have a two-week cycle, we can at least set goals.
From techtonik at gmail.com Thu Jan 30 12:56:35 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 14:56:35 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Wed, Jan 29, 2014 at 5:18 PM, Nick Coghlan wrote: > On 29 January 2014 23:57, Chris Angelico wrote: >> On Wed, Jan 29, 2014 at 11:29 PM, anatoly techtonik wrote: >>> If information doesn't reach the recipient who want to read it, it is "hidden". >>> Even if you talk in public channel on IRC, the information is hidden from me >>> if I was not connected and channel doesn't have public logs. >> >> Then if you care, connect. It's not hidden if you have the power to access it. >> >> Here's a suggestion: Fork Python (that's legal, that's what open >> source means) and start development using the model you advocate. If >> it's massively better than what's happening, (a) developers will flock >> to your model, and (b) the project could be completely handed over to >> you, as happened with GCC. >> >> Or alternatively, explain to us here what the real advantages are of >> your new model. So far, what I've seen is "hey, here's an idea", and >> not "here's what this idea will do to benefit Python"; and the idea >> itself looks more suited to a big business than to open source. Maybe >> someone who's actually used Agile will know what's so wonderful about >> it, but unless every core dev *has*, a bit of explanation will help. > > Plenty of us have used it, and we know it's an entirely inappropriate > model for open source development projects with broad asynchronous > participation, as the time commitment needed to make the short cycle > work is antithetical to loose collaboration. It works well for a > focused team supporting a single application to meet the specific > needs of a single business, though. About *Agile*: I tried to avoid the word Agile, but since you say that you've used it, let us agree on terminology.
In my world *agile* means *flexible*, which means *able to change*. It doesn't mean *scrum* or *two-week sprints* or any of the hardcoded values that you put behind the phrase "entirely inappropriate model for open source". Now that that's clear, let's move on. "Asynchronous participation" is called "distributed development", and it is widely used by both open source projects and commercial companies. If Yahoo terminated this practice and Google didn't even try - that's a problem with the management of those companies. It doesn't mean it doesn't work for professional teams or people *interested* in interacting this way. Agile helps to analyze and improve distributed development processes the same way it does for rigid corporate practices. You say - "short cycles" are bad. In agile I'd say - let's try and see why. Maybe it's the cycles that are bad, maybe it's people who cannot sync this often, maybe there is a technical problem with communication that can be resolved by using the right tools. From techtonik at gmail.com Thu Jan 30 13:44:08 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 15:44:08 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> <5676137352349587376@gmail297201516> Message-ID: On Wed, Jan 29, 2014 at 6:52 PM, Haoyi Li wrote: >> You want to know what's going on? Get involved. Then you'll know > > +1. It's odd to complain about the project's organization and processes when > you haven't actually had any real experience with either. Getting involved > in some project run by other people isn't easy, but it's not really that > hard either in the world of open source. I know that my experience is nothing compared to other people's, and therefore I am even more interested in getting feedback from people deeply involved in the organization and processes about the good and bad things in the original idea of "Iterative Development" presented in the first message of this thread.
From techtonik at gmail.com Thu Jan 30 13:52:43 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 15:52:43 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Wed, Jan 29, 2014 at 7:27 PM, Chris Angelico wrote: > On Wed, Jan 29, 2014 at 11:48 PM, anatoly techtonik wrote: >> You don't need to work full time to participate in two week cycle. >> As I answered to Ethan, it is not development cycle time. It is just >> ordinary two weeks time. You choose what you can do in these two >> week and do this. You may find that you have more time than you've >> planned during this time, so you can see who is working on what >> and help them (if possible). > > What does the two-week cycle achieve that current processes with the > bug tracker can't? More fun with collaboration. For some people it is not fun to grok bugs they don't personally need solved, sometimes because of the complexity of the problem, but helping someone else may be fun. The current bug tracker doesn't show: 1. what is important for people who think like you 2. what the current development focus is So you cannot plan how to spend your time more effectively and how to help with development. > Please explain to us the benefits of the Agile model, as they apply to > a loose collaboration. As I said, there is no single Agile model. A model can be agile (adapting, willing to change, flexible), natural or rigid. In a rigid model you don't have a choice. Take the bug, commit, release. In a natural model you may have additional and optional steps. In an agile model, you have a feedback loop that lets you estimate how good the model actually is and experiment with it to see if it can be better.
From techtonik at gmail.com Thu Jan 30 14:25:36 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 30 Jan 2014 16:25:36 +0300 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: On Thu, Jan 30, 2014 at 3:49 PM, Chris Angelico wrote: > On Thu, Jan 30, 2014 at 10:56 PM, anatoly techtonik wrote: >> You say - "short cycles" are bad. In agile I'd say - let's try and see why. >> Maybe it's cycles what are bad, maybe it's people who can not sync >> this often, maybe there is a technical problem with communication that >> can be resolved by using right tools. > > The very concept of a cycle suggests a system that's more suited to a > business environment than general open source development. A cycle is needed when you need some kind of visibility into the process. In a business environment it is critical, because the business has to control its spending. In an open source environment it is critical for people, because they need to plan their time. > Forcing > people to pick up and set down work might be useful in the very short > period just before a version release (I've been seeing some stuff > about Argument Clinic - btw, kudos to the tireless people doing that, > it's a huge job - and how some of the work will be deferred to 3.5), Again, it is not forcing anyone. It is just a process. You are free to miss your development goal. It is not a business - there is nobody to fire you or say that you're underperforming. There is no heroism either. If you do not know your development pace, you can try to measure it; if you do know it, you just realistically state what you are working on and whether you need help with that. > but most of the time, it's completely unnecessary. In big business, > you might have a couple dozen programmers working on some particular > job; in that two week cycle, each one could potentially put in quite a > few hours.
I heard a figure of 80 hours quoted, but I'm dubious about > how many actual dev hours a salaried programmer would get done, in > between meetings and whatnot. Still, could easily be upwards of 50 > hours. Forcing everyone to stop and re-check things every fifty dev > hours doesn't sound too bad. Now look at volunteers. Two weeks might > be anywhere from zero hours up to... well, the upper end doesn't > matter. But it could easily be just a single dev hour in that time. > Are you then going to force this person to set aside what he's > partially done, because of some arbitrary break point? A single dev hour is OK if you reached your goal. That's the point. You set the goals - you reach them. If you didn't reach them - you analyze and see what could be done better. It is all done in a relaxed and free manner, unlike the bloody corporate culture. You may invite other people to join the fun. People can find what you are working on and offer help. This is the process. > Now, what happens if you take Agile and eliminate the two-week period? > It begins to look very much like a pool of issues on a bug tracker. > You have a pile of stuff to do, someone picks up something he feels > like doing, posts a result back. Hmm, I wonder if that might be what's > already happening... Do you see now why I was, without any experience > of Agile, already dubious about its merits? And that even before Nick > stated from experience that it's not going to help. That's crowdsourced development, not teamwork in a distributed environment. And there is no room for a team to appear if everybody looks at a big pile of garbage and chooses the shiny metal plate that is precious only to him. The environment you've described does not encourage the birth of teams or collaboration in any way. More than that - it looks like people would even object if commercial development teams proposed their work. That already happened in the past with the "unladen swallow" project.
The current development process couldn't digest the results of this work, and people didn't even try to adjust the process to make future efforts possible. > Ideas are all very well, but they're useless without some form of > test-bed. The only perfect way to find out if an idea works or not is > to try it, and the onus is on the inventor to risk something for his > idea. Put the theory to work on some project. Once you can point to > some clear advantages *in practice*, you'll be able to recommend this > to other people. So... fork CPython, tell us all how wonderful your > version is going to be, and then show us how, in two weeks, or four > weeks, or six weeks, you can do amazing stuff with a motley crew of > programmers. Then we'll all take notice. I don't want core devs to accept this process at all. I don't want to sell it to them and I don't want them to follow it. =) It is completely optional, and I just don't want them to put more obstacles in front of new people who would like to try it. It may happen that resistance to change for open source projects may be bigger than in organizations. I just want to make sure that people aware that applying agile methodology to open source development is possible and I am inclined that it brings more positive improvements for the Python itself than de-facto development processes. From phd at phdru.name Thu Jan 30 16:35:27 2014 From: phd at phdru.name (Oleg Broytman) Date: Thu, 30 Jan 2014 16:35:27 +0100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: <20140130153527.GA17563@phdru.name> anatoly, if you are trying to change the development process, you're making two big mistakes: 1. You are late by twentysomething years. 2. You are trying to change the development process from the outside. That never works. In the world of free software changes can only be made from the inside.
First become a good citizen, a valuable contributor, then propose changes to the process. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From zachary.ware+pyideas at gmail.com Thu Jan 30 17:17:52 2014 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Thu, 30 Jan 2014 10:17:52 -0600 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: I haven't been following this thread very closely, but I have to disagree with you here, Anatoly. On Thu, Jan 30, 2014 at 5:24 AM, anatoly techtonik wrote: > It is quite obvious from outside that Python has some kind of process, Which is well documented in several places. It can be tricky to always find all of those places, but anyone who is interested can ask, and will be quickly shown where to look. > but it is quite hard to sync to it for people from outside, I'm not sure what you mean here. Every contributor starts from "outside" of Python. I found no difficulty in getting started when I did, and I've seen several people start contributing successfully since then. It would be very hard to go from nothing to suddenly contributing huge patches to the innermost details of Python at a rapid pace, but that's not really what people (especially people new to open source development, like I was) should be doing anyway. Start slow and small, build from there, and it's an easy and painless process. > because it is not open Here I must disagree emphatically. My entire Python experience shows me that everything about Python is as open as possible. If you want to know something, look for it. If you can't find it, ask for it. If you can't be shown where it is, somebody (even yourself) will write it down somewhere so the next person looking can find it. > - is not completely clear how the planning is made, I'm not sure what you mean here, what planning? 
Anything that could be construed as "planning" is done via the PEP process, which is well documented in PEP 1. > which tasks are available for current sprint, what you can help with and how to track > the progress. This is the very definition of a bug tracker, and Python's is quite good for all of this. There could stand to be some upkeep done on some of the older issues: it would be good for an impartial person to pick through and see whether an issue is still a problem, update any patches to apply to current branches, manage the 'easy' tag, add the proper people to the nosy list, etc. This kind of thing would be a great place for someone to contribute. Honestly, just bringing all tracker issues up to date would be a worthwhile sprint task in my opinion. -- Zach From random832 at fastmail.us Thu Jan 30 17:38:07 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 30 Jan 2014 11:38:07 -0500 Subject: [Python-ideas] Normalized Python In-Reply-To: <97298FAB-ED1F-42B8-B69B-A189F41C03D7@yahoo.com> References: <97298FAB-ED1F-42B8-B69B-A189F41C03D7@yahoo.com> Message-ID: <1391099887.12127.77296821.5C13DD4B@webmail.messagingengine.com> On Wed, Jan 29, 2014, at 12:24, Andrew Barnert wrote: > Fitting this into a Python 2-style io model as Anatoly suggests is > completely impossible. Instead, every single program would have to either > check that stdout.isatty As a sidenote, isatty is broken on windows: it considers NUL to be a tty. This is because it wraps a C function which in MSVC has the same flaw. From yselivanov.ml at gmail.com Thu Jan 30 18:44:50 2014 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 30 Jan 2014 12:44:50 -0500 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: Message-ID: <52EA8F92.5030708@gmail.com> On 1/27/2014, 12:41 PM, Wolfgang wrote: [snip] > As for the first subject: > Specifically, I am not happy with the way the function handles different > types. 
Currently _coerce_types gets called for every element in the > function's input sequence and type conversion follows quite complicated > rules, and - what is worst - make the outcome of _sum() and thereby mean() > dependent on the order of items in the input sequence, e.g.: > >>>> mean((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5))) > 1.9944444444444445 > >>>> mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5))) > Traceback (most recent call last): > File "", line 1, in > mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5))) > File "C:\Python33\statistics.py", line 369, in mean > return _sum(data)/n > File "C:\Python33\statistics.py", line 157, in _sum > T = _coerce_types(T, type(x)) > File "C:\Python33\statistics.py", line 327, in _coerce_types > raise TypeError('cannot coerce types %r and %r' % (T1, T2)) > TypeError: cannot coerce types <class 'fractions.Fraction'> and <class 'decimal.Decimal'> FWIW, I find some of the concerns Wolfgang raised quite valid. Steven, what do you think? Yury From greg at krypto.org Thu Jan 30 19:15:04 2014 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 30 Jan 2014 10:15:04 -0800 Subject: [Python-ideas] statistics module in Python3.4 Message-ID: (resending; the original had the wrong list address in the cc for some reason) ---------- Forwarded message ---------- From: Gregory P. Smith Date: Thu, Jan 30, 2014 at 9:59 AM Subject: Re: [Python-ideas] statistics module in Python3.4 To: Wolfgang Cc: python-ideas at googlegroups.com, Steven D'Aprano , Larry Hastings +cc Steve, the PEP 450 author On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang < wolfgang.maier at biologie.uni-freiburg.de> wrote: > Dear all, > I am still testing the new statistics module and I found two cases where > the behavior of the module seems suboptimal to me. > My most important concern is the module's internal _sum function and its > implications, the other one about passing Counter objects to module > functions.
> > As for the first subject: > Specifically, I am not happy with the way the function handles different > types. Currently _coerce_types gets called for every element in the > function's input sequence and type conversion follows quite complicated > rules, and - what is worst - make the outcome of _sum() and thereby mean() > dependent on the order of items in the input sequence, e.g.: > > >>> mean((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5))) > 1.9944444444444445 > > >>> mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5))) > Traceback (most recent call last): > File "", line 1, in > mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5))) > File "C:\Python33\statistics.py", line 369, in mean > return _sum(data)/n > File "C:\Python33\statistics.py", line 157, in _sum > T = _coerce_types(T, type(x)) > File "C:\Python33\statistics.py", line 327, in _coerce_types > raise TypeError('cannot coerce types %r and %r' % (T1, T2)) > TypeError: cannot coerce types and 'decimal.Decimal'> > > (this is because when _sum iterates over the input type Fraction wins over > int, then float wins over Fraction and over everything else that follows in > the first example, but in the second case Fraction wins over int, but then > Fraction vs Decimal is undefined and throws an error). > > Confusing, isn't it? So here's the code of the _sum function: > > def _sum(data, start=0): > """_sum(data [, start]) -> value > > Return a high-precision sum of the given numeric data. If optional > argument ``start`` is given, it is added to the total. If ``data`` is > empty, ``start`` (defaulting to 0) is returned. > > > Examples > -------- > > >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75) > 11.0 > > Some sources of round-off error will be avoided: > > >>> _sum([1e50, 1, -1e50] * 1000) # Built-in sum returns zero. 
> 1000.0 > > Fractions and Decimals are also supported: > > >>> from fractions import Fraction as F > >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)]) > Fraction(63, 20) > > >>> from decimal import Decimal as D > >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")] > >>> _sum(data) > Decimal('0.6963') > > """ > > n, d = _exact_ratio(start) > T = type(start) > partials = {d: n} # map {denominator: sum of numerators} > # Micro-optimizations. > coerce_types = _coerce_types > exact_ratio = _exact_ratio > partials_get = partials.get > # Add numerators for each denominator, and track the "current" type. > for x in data: > T = _coerce_types(T, type(x)) > n, d = exact_ratio(x) > partials[d] = partials_get(d, 0) + n > if None in partials: > assert issubclass(T, (float, Decimal)) > assert not math.isfinite(partials[None]) > return T(partials[None]) > total = Fraction() > for d, n in sorted(partials.items()): > total += Fraction(n, d) > if issubclass(T, int): > assert total.denominator == 1 > return T(total.numerator) > if issubclass(T, Decimal): > return T(total.numerator)/total.denominator > return T(total) > > Internally, the function uses exact ratios for its calculations (which I > think is very nice) and only goes through all the pain of coercing types to > return > T(total.numerator)/total.denominator > where T is the final type resulting from the chain of conversions. > > I think a much cleaner (and probably faster) implementation would be to > gather first all the types in the input sequence, then decide what to > return in an input order independent way. > +1 Agreed that this would be cleaner given your example above. > My tentative implementation: > > def _sum2(data, start=None): > if start is not None: > t = set((type(start),)) > n, d = _exact_ratio(start) > else: > t = set() > n = 0 > d = 1 > partials = {d: n} # map {denominator: sum of numerators} > > # Micro-optimizations. 
> exact_ratio = _exact_ratio > partials_get = partials.get > > # Add numerators for each denominator, and build up a set of all types. > for x in data: > t.add(type(x)) > n, d = exact_ratio(x) > partials[d] = partials_get(d, 0) + n > T = _coerce_types(t) # decide which type to use based on set of all > types > if None in partials: > assert issubclass(T, (float, Decimal)) > assert not math.isfinite(partials[None]) > return T(partials[None]) > total = Fraction() > for d, n in sorted(partials.items()): > total += Fraction(n, d) > if issubclass(T, int): > assert total.denominator == 1 > return T(total.numerator) > if issubclass(T, Decimal): > return T(total.numerator)/total.denominator > return T(total) > > this leaves the re-implementation of _coerce_types. Personally, I'd prefer > something as simple as possible, maybe even: > > def _coerce_types (types): > if len(types) == 1: > return next(iter(types)) > return float > > , but that's just a suggestion. > > In this case then: > > >>> _sum2((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))/6 > 1.9944444444444445 > > >>> _sum2((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))/6 > 1.9944444444444445 > > lets check the examples from the _sum docstring just to be sure: > > >>> _sum2([3, 2.25, 4.5, -0.5, 1.0], 0.75) > 11.0 > > >>> _sum2([1e50, 1, -1e50] * 1000) # Built-in sum returns zero. > 1000.0 > > >>> from fractions import Fraction as F > >>> _sum2([F(2, 3), F(7, 5), F(1, 4), F(5, 6)]) > Fraction(63, 20) > > >>> from decimal import Decimal as D > >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")] > >>> _sum2(data) > Decimal('0.6963') > > > Now the second issue: > It is maybe more a matter of taste and concerns the effects of passing a > Counter() object to various functions in the module. 
> I know this is undocumented and it's probably the user's fault if he tries > that, but still: > > >>> from collections import Counter > >>> c=Counter((1,1,1,1,2,2,2,2,2,3,3,3,3)) > >>> c > Counter({1: 4, 2: 5, 3: 4}) > >>> mode(c) > 2 > Cool, mode knows how to work with Counters (interpreting them as frequency > tables) > > >>> median(c) > 2 > Looks good > > >>> mean(c) > 2.0 > Very well > > But the truth is that only mode really works as you may think and we were > just lucky with the other two: > >>> c=Counter((1,1,2)) > >>> mean(c) > 1.5 > oops > > >>> median(c) > 1.5 > hmm > > From a quick look at the code you can see that mode actually converts your > input to a Counter behind the scenes anyway, so it has no problem. > mean and median, on the other hand, are simply iterating over their input, > so if that input happens to be a mapping, they'll use just the keys. > > I think there are two simple ways to avoid this pitfall: > 1) add an explicit warning to the docs explaining this behavior or > 2) make mean and median do the same magic with Counters as mode does, i.e. > make them check for Counter as the input type and deal with it as if it > were a frequency table. I'd favor this behavior because it looks like > little extra code, but may be very useful in many situations. I'm not quite > sure whether maybe even all mappings should be treated that way? > I think this definitely needs documenting. Even if a behavior isn't settled on in time for 3.4 would it make sense to add some asserts to prevent passing a Counter to mean and median for the time being so that this could be improved in a later bugfix rather than becoming an odd behavior we need to maintain compatibility with in the future? It's very late in the release cycle so the best option for these kinds of changes may be to just document them as known issues and behaviors that we will or may fix in future releases. I think Steve and Larry should make the call on that. 
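A minimal sketch of the Counter pitfall described above, assuming Python 3.4 or later with the standard statistics module. Iterating over a Counter yields only its distinct keys, so mean() and median() see [1, 2] rather than [1, 1, 2]. Counter.elements() expands the frequency table back into the raw observations. The counter_mean() helper is hypothetical: it only illustrates a weighted-mean alternative and is not part of the statistics module.

```python
from collections import Counter
from statistics import mean, median

# Frequency table: the value 1 was observed twice, the value 2 once.
c = Counter((1, 1, 2))

# Iterating over a Counter yields its distinct keys only, so both calls
# effectively operate on [1, 2].
naive_mean = mean(c)      # 1.5
naive_median = median(c)  # 1.5

# Counter.elements() repeats each key by its count: [1, 1, 2].
expanded = list(c.elements())
true_mean = mean(expanded)      # 4/3, about 1.3333
true_median = median(expanded)  # 1


def counter_mean(counter):
    """Hypothetical helper: weighted mean of a frequency table, without expanding it."""
    total = sum(value * count for value, count in counter.items())
    n = sum(counter.values())
    return total / n


print(naive_mean, naive_median)
print(true_mean, true_median, counter_mean(c))
```

Expanding with elements() is the simplest workaround, but it materialises every observation. A helper along the lines of counter_mean() avoids that for large counts. That trade-off is roughly what explicit Counter support in mean and median would have to settle.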
thanks for putting the new module through its paces! -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Thu Jan 30 19:16:23 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 30 Jan 2014 18:16:23 +0000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 30/01/2014 12:52, anatoly techtonik wrote: > > So you can not plan how to spend your time more effectively and how to > help with development. > Core dev time could be used more effectively if they weren't sidetracked by non-issues e.g. blithering idiots who keep reopening issues on the bug tracker. I won't mention any names. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From abarnert at yahoo.com Thu Jan 30 19:59:46 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 30 Jan 2014 10:59:46 -0800 Subject: [Python-ideas] Iterative development In-Reply-To: References: <52E8BBD6.2010604@stoneleaf.us> Message-ID: <7C031855-8F04-442D-9843-63799B75E673@yahoo.com> On Jan 30, 2014, at 5:25, anatoly techtonik wrote: > It may happen that resistance to change for open source projects may > be bigger than in organizations. I just want to make sure that people > aware that applying agile methodology to open source development is > possible and I am inclined that it brings more positive improvements for > the Python itself than de-facto development processes. Do you have any examples of an open source project (and not a company-driven one) that applied agile methodology and gained any benefits? Showing something concrete like that would make a far better argument than just rambling about what might be possible. 
From g.brandl at gmx.net Thu Jan 30 21:15:36 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 30 Jan 2014 21:15:36 +0100 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: On 30.01.2014 17:17, Zachary Ware wrote: > I haven't been following this thread very closely, but I have to > disagree with you here, Anatoly. > > On Thu, Jan 30, 2014 at 5:24 AM, anatoly techtonik wrote: >> It is quite obvious from outside that Python has some kind of process, > > Which is well documented in several places. It can be tricky to > always find all of those places, but anyone who is interested can ask, > and will be quickly shown where to look. Nowadays the development process is really well documented in the devguide. If anything is still not in there, that should be fixed. >> but it is quite hard to sync to it for people from outside, > > I'm not sure what you mean here. Every contributor starts from > "outside" of Python. I found no difficulty in getting started when I > did, and I've seen several people start contributing successfully > since then. It would be very hard to go from nothing to suddenly > contributing huge patches to the innermost details of Python at a > rapid pace, but that's not really what people (especially people new > to open source development, like I was) should be doing anyway. Start > slow and small, build from there, and it's an easy and painless > process. > >> because it is not open > > Here I must disagree emphatically. My entire Python experience shows > me that everything about Python is as open as possible. If you want > to know something, look for it. If you can't find it, ask for it. That's the key: *ask* for it. Do not rant that you didn't find something, complain that it wasn't in some random place you expected it, and then not accept help and hints from people who weren't put off from replying to you in the first place.
>> - is not completely clear how the planning is made, > > I'm not sure what you mean here, what planning? Anything that could > be construed as "planning" is done via the PEP process, which is well > documented in PEP 1. We have tried quite a few times to make it clear to Anatoly that there is no "planning" made apart from what you can read about in PEPs and mailing lists. Apparently he thinks there's a secret agenda, when in reality there often is no (shared) agenda at all -- that's in the nature of an open source project. Of course individual developers may have private agendas. >> which tasks are available for current sprint, what you can help with and how to track >> the progress. > > This is the very definition of a bug tracker, and Python's is quite > good for all of this. There could stand to be some upkeep done on > some of the older issues: it would be good for an impartial person to > pick through and see whether an issue is still a problem, update any > patches to apply to current branches, manage the 'easy' tag, add the > proper people to the nosy list, etc. This kind of thing would be a > great place for someone to contribute. Honestly, just bringing all > tracker issues up to date would be a worthwhile sprint task in my > opinion. Few people have tried that because it's such a thankless task, but there was definitely progress. cheers, Georg From wolfgang.maier at biologie.uni-freiburg.de Thu Jan 30 16:28:59 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang) Date: Thu, 30 Jan 2014 07:28:59 -0800 (PST) Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: Message-ID: <7eb401cb-b789-4e5d-8a75-c7205cfb293b@googlegroups.com> > Opinions anyone? Nobody ? Cheers, Wolfgang -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Thu Jan 30 23:27:10 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 30 Jan 2014 14:27:10 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <7eb401cb-b789-4e5d-8a75-c7205cfb293b@googlegroups.com> References: <7eb401cb-b789-4e5d-8a75-c7205cfb293b@googlegroups.com> Message-ID: <52EAD1BE.4090009@stoneleaf.us> On 01/30/2014 07:28 AM, Wolfgang wrote: >> Opinions anyone? > > Nobody ? As a layman your concerns make sense to me. :) -- ~Ethan~ From breamoreboy at yahoo.co.uk Fri Jan 31 00:27:52 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 30 Jan 2014 23:27:52 +0000 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: Message-ID: On 27/01/2014 17:41, Wolfgang wrote: > Ok, that's it for now I guess. Opinions anyone? > Best, > Wolfgang > So this doesn't get lost I'd be inclined to raise two issues on the bug tracker. It's also much easier for people to follow the issues there and better still, see what the actual outcome is. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ethan at stoneleaf.us Fri Jan 31 00:31:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 30 Jan 2014 15:31:37 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: Message-ID: <52EAE0D9.1020609@stoneleaf.us> On 01/30/2014 03:27 PM, Mark Lawrence wrote: > On 27/01/2014 17:41, Wolfgang wrote: > >> Ok, that's it for now I guess. Opinions anyone? >> Best, >> Wolfgang >> > > So this doesn't get lost I'd be inclined to raise two issues on the bug tracker. It's also much easier for people to > follow the issues there and better still, see what the actual outcome is. Checking first is usually good policy, but now that you've had positive feed-back some issues on the bug tracker [1] is definitely a good idea. 
-- ~Ethan~ [1] http://bugs.python.org/issue?@template=item From steve at pearwood.info Fri Jan 31 02:07:25 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 12:07:25 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: Message-ID: <20140131010724.GE3799@ando> On Mon, Jan 27, 2014 at 09:41:02AM -0800, Wolfgang wrote: > Dear all, > I am still testing the new statistics module and I found two cases were the > behavior of the module seems suboptimal to me. > My most important concern is the module's internal _sum function and its > implications, the other one about passing Counter objects to module > functions. As the author of the module, I'm also concerned with the internal _sum function. That's why it's now a private function -- I originally intended for it to be a public function (see PEP 450). > As for the first subject: > Specifically, I am not happy with the way the function handles different > types. Currently _coerce_types gets called for every element in the > function's input sequence and type conversion follows quite complicated > rules, and - what is worst - make the outcome of _sum() and thereby mean() > dependent on the order of items in the input sequence, e.g.: [...] > (this is because when _sum iterates over the input type Fraction wins over > int, then float wins over Fraction and over everything else that follows in > the first example, but in the second case Fraction wins over int, but then > Fraction vs Decimal is undefined and throws an error). > > Confusing, isn't it? I don't think so. The idea is that _sum() ought to reflect the standard, dare I say intuitive, behaviour of repeated application of the __add__ and __radd__ methods, as used by the plus operator. For example, int + coerces to the other numeric type. What else would you expect? In mathematics the number 0.4 is the same whether you write it as 0.4, 2/5, 0.4+0j, [0; 2, 2] or any other notation you care to invent. 
(That last one is a continued fraction.) In Python, the number 0.4 is represented by a value and a type, and managing the coercion rules for the different types can be fiddly and annoying. But they shouldn't be *confusing* -- we have a numeric tower, and if I've written the code correctly, the coercion rules ought to follow the tower as closely as possible. > So here's the code of the _sum function: [...] You should expect that to change, if for no other reason than performance. At the moment, _sum is about two orders of magnitude slower than the built-in sum. I think I can get it to about one order of magnitude slower. > I think a much cleaner (and probably faster) implementation would be to > gather first all the types in the input sequence, then decide what to > return in an input order independent way. My tentative implementation: [...] Thanks for this. I will add that to my collection of alternate versions of _sum. > this leaves the re-implementation of _coerce_types. Personally, I'd prefer > something as simple as possible, maybe even: > > def _coerce_types (types): > if len(types) == 1: > return next(iter(types)) > return float I don't want to coerce everything to float unnecessarily. Floats are, in some ways, the worst choice for numeric values, at least from the perspective of accuracy and correctness. Floats violate several of the fundamental rules of mathematics, e.g. addition is not associative: py> 1e19 + (-1e19 + 0.1) == (1e19 + -1e19) + 0.1 False One of my aims is to avoid raising TypeError unnecessarily. The statistics module is aimed at casual users who may not understand, or care about, the subtleties of numeric coercions, they just want to take the average of two values regardless of what sort of number they are. But having said that, I realise that mixed-type arithmetic is difficult, and I've avoided documenting the fact that the module will work on mixed types. [...]
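The order dependence Wolfgang reported can be reproduced without the statistics module. Here is a sketch under stated assumptions: coerce_pair() is a hypothetical pairwise rule modelled on the behaviour described in this thread (int yields to anything, a subclass wins over its base, float wins over other non-int types, anything else raises TypeError). It is not claimed to be the module's actual _coerce_types. fold_types() applies the rule one element at a time, as _sum does, and set_types() follows Wolfgang's proposed set-based rule.

```python
from decimal import Decimal
from fractions import Fraction
from functools import reduce


def coerce_pair(T1, T2):
    """Hypothetical pairwise coercion rule (modelled on this thread, not the stdlib)."""
    if T1 is T2:
        return T1
    if T1 is int:              # int yields to any other numeric type
        return T2
    if T2 is int:
        return T1
    if issubclass(T2, T1):     # the more specific subclass wins
        return T2
    if issubclass(T1, T2):
        return T1
    if issubclass(T2, float):  # float wins over other non-int types
        return T2
    if issubclass(T1, float):
        return T1
    raise TypeError('cannot coerce types %r and %r' % (T1, T2))


def fold_types(data, start=0):
    """Order-dependent: fold the pairwise rule over the input, as _sum does."""
    return reduce(coerce_pair, (type(x) for x in data), type(start))


def set_types(data):
    """Order-independent: look at the set of all input types at once."""
    types = {type(x) for x in data}
    if len(types) == 1:
        return types.pop()
    return float


a = (1, Fraction(2, 3), 1.0, Decimal(2.3), 2.0, Decimal(5))
b = (1, Fraction(2, 3), Decimal(2.3), 1.0, 2.0, Decimal(5))  # same values, reordered

print(fold_types(a))  # <class 'float'>
try:
    fold_types(b)
    fold_b_error = None
except TypeError as exc:  # Fraction meets Decimal before any float is seen
    fold_b_error = exc
print(fold_b_error)
print(set_types(a), set_types(b))  # float in both cases
```

With the fold, the same six values either produce float or fail, depending only on their order. The set-based rule always answers float here. The cost is that it falls back to float whenever the types are mixed, which is exactly the objection raised above.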
> Now the second issue: > It is maybe more a matter of taste and concerns the effects of passing a > Counter() object to various functions in the module. Interesting. If you think there is a use-case for passing Counters to the statistics functions (weighted data?) then perhaps they can be explicitly supported in 3.5. It's way too late for 3.4 to introduce new functionality. [...] > From a quick look at the code you can see that mode actually converts your > input to a Counter behind the scenes anyway, so it has no problem. > mean and median, on the other hand, are simply iterating over their input, > so if that input happens to be a mapping, they'll use just the keys. Well yes :-) I'm open to the suggestion that Counters should be treated specially. Would you be so kind as to raise an issue in the bug tracker? Thanks for the feedback, -- Steven From steve at pearwood.info Fri Jan 31 02:27:05 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 12:27:05 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <52EAA20A.7090704@hastings.org> References: <52EAA20A.7090704@hastings.org> Message-ID: <20140131012705.GF3799@ando> On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote: > On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang > > wrote: > >I think a much cleaner (and probably faster) implementation would be > >to gather first all the types in the input sequence, then decide what > >to return in an input order independent way. > > I'm willing to consider this a "bug fix". And since it's a new function > in 3.4, we don't have an installed base. So I'm willing to consider > fixing this for 3.4. I'm hesitant to require two passes over the data in _sum. Some higher-order statistics like variance are currently implemented using two passes, but ultimately I'd like to support single-pass algorithms that can operate on large but finite iterators. But I will consider it as an option.
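Single-pass variance is a well-known problem. One standard technique is Welford's online algorithm. The sketch below is offered only as an illustration of that technique, not as what the statistics module does or plans to do. It uses plain float arithmetic, so it gives up the exact-fraction accuracy the module aims for in exchange for consuming the data exactly once. online_variance() is a hypothetical name.

```python
import math
from statistics import variance


def online_variance(iterable):
    """Sample variance in one pass over the data (Welford's algorithm).

    Only three running values are kept, so a one-shot iterator of any
    finite length can be consumed. Arithmetic is plain float, unlike the
    exact-fraction approach taken by the statistics module.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in iterable:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the freshly updated mean
    if n < 2:
        raise ValueError('variance requires at least two data points')
    return m2 / (n - 1)


data = [2.5, 3.25, 5.5, 11.25, 11.75]
one_pass = online_variance(iter(data))  # consumes a one-shot iterator
two_pass = variance(data)               # statistics module, for comparison
print(one_pass, two_pass, math.isclose(one_pass, two_pass))
```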
I'm also hesitant to make the promise that _sum will be order-independent. Addition in Python isn't: py> class A(int): ... def __add__(self, other): ... return type(self)(super().__add__(other)) ... def __repr__(self): ... return "%s(%d)" % (type(self).__name__, self) ... py> class B(A): ... pass ... py> A(1) + B(1) A(2) py> B(1) + A(1) B(2) [...] > Yes, exactly. If the support for Counter is half-baked, let's prevent > it from being used now. I strongly disagree with this. Counters are currently treated the same as any other iterable, and built-in sum and math.fsum don't treat them specially: py> from collections import Counter py> c = Counter([1, 1, 1, 1, 1, 2]) py> c Counter({1: 5, 2: 1}) py> sum(c) 3 py> from math import fsum py> fsum(c) 3.0 If you're worried about people coming to rely on this, and thus running into trouble in the future if Counters get treated specially for (say) weighted data, then I'd accept a warning in the docs, or even a runtime warning. But not an exception. -- Steven From rosuav at gmail.com Fri Jan 31 02:32:04 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 31 Jan 2014 12:32:04 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <20140131010724.GE3799@ando> References: <20140131010724.GE3799@ando> Message-ID: On Fri, Jan 31, 2014 at 12:07 PM, Steven D'Aprano wrote: > One of my aims is to avoid raising TypeError unnecessarily. The > statistics module is aimed at casual users who may not understand, or > care about, the subtleties of numeric coercions, they just want to take > the average of two values regardless of what sort of number they are. > But having said that, I realise that mixed-type arithmetic is difficult, > and I've avoided documenting the fact that the module will work on mixed > types. 
Based on the current docs and common sense, I would expect that Fraction and Decimal should normally be there exclusively, and that the only type coercions would be int->float->complex (because it makes natural sense to write a list of "floats" as [1.4, 2, 3.7], but it doesn't make sense to write a list of Fractions as [Fraction(1,2), 7.8, Fraction(12,35)]). Any mishandling of Fraction or Decimal with the other three types can be answered with "Well, you should be using the same type everywhere". (Though it might be useful to allow int->anything coercion, since that one's easy and safe.) ChrisA From abarnert at yahoo.com Fri Jan 31 04:47:54 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 30 Jan 2014 19:47:54 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <20140131010724.GE3799@ando> Message-ID: On Jan 30, 2014, at 17:32, Chris Angelico wrote: > On Fri, Jan 31, 2014 at 12:07 PM, Steven D'Aprano wrote: >> One of my aims is to avoid raising TypeError unnecessarily. The >> statistics module is aimed at casual users who may not understand, or >> care about, the subtleties of numeric coercions, they just want to take >> the average of two values regardless of what sort of number they are. >> But having said that, I realise that mixed-type arithmetic is difficult, >> and I've avoided documenting the fact that the module will work on mixed >> types. > > Based on the current docs and common sense, I would expect that > Fraction and Decimal should normally be there exclusively, and that > the only type coercions would be int->float->complex (because it makes > natural sense to write a list of "floats" as [1.4, 2, 3.7], but it > doesn't make sense to write a list of Fractions as [Fraction(1,2), > 7.8, Fraction(12,35)]). Any mishandling of Fraction or Decimal with > the other three types can be answered with "Well, you should be using > the same type everywhere". 
(Though it might be useful to allow > int->anything coercion, since that one's easy and safe.) Except that large enough int values lose information, and even larger ones raise an exception: >>> float(pow(3, 50)) == pow(3, 50) False >>> float(1<<2000) OverflowError: int too large to convert to float And that first one is the reason why statistics needs a custom sum in the first place. When there are only 2 types involved in the sequence, you get the answer you wanted. The only problem raised by the examples in this thread is that with 3 or more types that aren't all mutually coercible but do have a path through them, you can sometimes get imprecise answers and other times get exceptions, and you might come to rely on one or the other. So, rather than throwing out Stephen's carefully crafted and clearly worded rules and trying to come up with new ones, why not (for 3.4) just say that the order of coercions given values of 3 or more types is not documented and subject to change in the future (maybe even giving the examples from the initial email)? From abarnert at yahoo.com Fri Jan 31 04:49:14 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 30 Jan 2014 19:49:14 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <20140131010724.GE3799@ando> Message-ID: <97E9C909-C132-4B58-AC94-042AE0D92CD0@yahoo.com> On Jan 30, 2014, at 19:47, Andrew Barnert wrote: > So, rather than throwing out Stephen's carefully crafted and clearly worded rules Sorry, I meant Steven there. (At least I hope I did, otherwise this will be doubly embarrassing...) 
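Andrew's two examples can be reproduced as a standalone script. This is a sketch, not part of the statistics module. It shows two things about int-to-float conversion: it rounds once an int needs more than 53 significant bits, and it fails entirely past the float range. Fraction, by contrast, holds any int exactly.

```python
from fractions import Fraction

n = 3 ** 50                       # 717897987691852588770249, an 80-bit int
print(n.bit_length())             # 80
print(float(n) == n)              # False: the conversion rounds to the nearest double
print(int(float(n)) == n)         # False: converting back does not undo the rounding

try:
    float(1 << 2000)              # far larger than the biggest double (about 1.8e308)
except OverflowError as exc:
    print("OverflowError:", exc)

print(Fraction(n) == n)           # True: Fraction stores the int exactly
```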
From steve at pearwood.info Fri Jan 31 05:09:38 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 15:09:38 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <20140131010724.GE3799@ando> Message-ID: <20140131040938.GG3799@ando> On Thu, Jan 30, 2014 at 07:47:54PM -0800, Andrew Barnert wrote: > So, rather than throwing out Stephen's carefully crafted and clearly > worded rules and trying to come up with new ones, why not (for 3.4) > just say that the order of coercions given values of 3 or more types > is not documented and subject to change in the future (maybe even > giving the examples from the initial email)? I am happy to have an explicit disclaimer in the docs saying the result of calculations on mixed types are not guaranteed and may be subject to change. Then for 3.5 we can consider this more carefully. -- Steven From rosuav at gmail.com Fri Jan 31 05:36:16 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 31 Jan 2014 15:36:16 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <20140131010724.GE3799@ando> Message-ID: On Fri, Jan 31, 2014 at 2:47 PM, Andrew Barnert wrote: >> Based on the current docs and common sense, I would expect that >> Fraction and Decimal should normally be there exclusively, and that >> the only type coercions would be int->float->complex (because it makes >> natural sense to write a list of "floats" as [1.4, 2, 3.7], but it >> doesn't make sense to write a list of Fractions as [Fraction(1,2), >> 7.8, Fraction(12,35)]). Any mishandling of Fraction or Decimal with >> the other three types can be answered with "Well, you should be using >> the same type everywhere". (Though it might be useful to allow >> int->anything coercion, since that one's easy and safe.) 
> > Except that large enough int values lose information, and even larger ones raise an exception: > > >>> float(pow(3, 50)) == pow(3, 50) > False > >>> float(1<<2000) > OverflowError: int too large to convert to float > > And that first one is the reason why statistics needs a custom sum in the first place. I don't think it'd be possible to forbid int -> float coercion - the Python community (and Steven himself) would raise an outcry. But int->float is at least as safe as it's fundamentally possible to be. Adding ".0" to the end of a literal (thus making it a float literal) is, AFAIK, absolutely identical to wrapping it in "float(" and ")". That's NOT true of float -> Fraction or float -> Decimal - going via float will cost precision, but going via int ought to be safe. >>> float(pow(3,50)) == pow(3.0,50) True The difference between int and any other type is going to be pretty much the same whether you convert first or convert last. The only distinction that I can think of is floating-point rounding errors, which are already dealt with: >>> statistics._sum([pow(2.0,53),1.0,1.0,1.0]) 9007199254740996.0 >>> sum([pow(2.0,53),1.0,1.0,1.0]) 9007199254740992.0 Since it handles this correctly with all floats, it'll handle it just fine with some ints and some floats: >>> sum([pow(2,53),1,1,1.0]) 9007199254740996.0 >>> statistics._sum([pow(2,53),1,1,1.0]) 9007199254740996.0 In this case, the builtin sum() happens to be correct, because it adds the first ones as ints, and then converts to float at the end. Of course, "correct" isn't quite correct - the true value based on real number arithmetic is ...95, as can be seen in Python if they're all ints. But I'm defining "correct" as "the same result that would be obtained by calculating in real numbers and then converting to the data type of the end result". And by that definition, builtin sum() is correct as long as the float is right at the end, and statistics._sum() is correct regardless of the order. 
>>> statistics._sum([1.0,pow(2,53),1,1]) 9007199254740996.0 >>> sum([1.0,pow(2,53),1,1]) 9007199254740992.0 So in that sense, it's "safe" to cast all int to float if the result is going to be float, unless an individual value is itself too big to convert, but the final result (thanks to negative values) would have been: I'm not sure how it's currently handled, but this particular case is working: >>> statistics._sum([1.0,1<<2000,0-(1<<2000)]) 1.0 The biggest problem, then, is cross-casting between float, Fraction, and Decimal. And anyone who's mixing those is asking for trouble already. ChrisA From rosuav at gmail.com Fri Jan 31 05:37:35 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 31 Jan 2014 15:37:35 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <20140131040938.GG3799@ando> References: <20140131010724.GE3799@ando> <20140131040938.GG3799@ando> Message-ID: On Fri, Jan 31, 2014 at 3:09 PM, Steven D'Aprano wrote: > On Thu, Jan 30, 2014 at 07:47:54PM -0800, Andrew Barnert wrote: > >> So, rather than throwing out Stephen's carefully crafted and clearly >> worded rules and trying to come up with new ones, why not (for 3.4) >> just say that the order of coercions given values of 3 or more types >> is not documented and subject to change in the future (maybe even >> giving the examples from the initial email)? > > I am happy to have an explicit disclaimer in the docs saying the result > of calculations on mixed types are not guaranteed and may be subject to > change. Then for 3.5 we can consider this more carefully. +1. 
ChrisA From larry at hastings.org Fri Jan 31 05:58:20 2014 From: larry at hastings.org (Larry Hastings) Date: Thu, 30 Jan 2014 20:58:20 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <20140131012705.GF3799@ando> References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> Message-ID: <52EB2D6C.9040803@hastings.org> On 01/30/2014 05:27 PM, Steven D'Aprano wrote: > I'm hesitant to require two passes over the data in _sum. Some > higher-order statistics like variance are currently implemented using > two passes, but ultimately I'd like to support single-pass algorithms > that can operate on large but finite iterators. > > But I will consider it as an option. > > I'm also hesitant to make the promise that _sum will be > order-independent. Addition in Python isn't: [...] I concede that this is mostly outside my expertise, and the statistics module and the PEP were your doing. So you're the expert here and I will defer to you. But. My dim understanding of the *whole point* of the new statistics module was that it valued correctness over raw performance. I assumed sorting values from small to large** before summing was *exactly* the sort of thing it was written to do. If all we wanted were Python's existing semantics, why bother writing statistics._sum() in the first place? Just use sum(). On the other hand, I had missed the fact that this was an internal-only method. If changing statistics._sum so it reordered the iterable to preserve correctness wouldn't change the behavior of any supported external APIs, then obviously there's no need, and I'd prefer to leave it alone for 3.4. If you decided to change it for 3.5 and people were relying on its old behavior, that would be on them. (Though a comment saying "I might change this later" would be welcome... if true.)
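The script below makes the ordering question concrete, reusing the 2**53 example from Chris's message earlier in the thread. Adding the large value first loses the small ones. Sorting from small to large happens to recover them here. math.fsum and exact Fraction arithmetic give the correctly rounded total regardless of order. The sort is only an illustration and does not guarantee a correctly rounded result in general.

```python
import math
from fractions import Fraction

big = 2.0 ** 53              # 9007199254740992.0; above this, doubles are spaced 2 apart
data = [big, 1.0, 1.0, 1.0]

print(sum(data))             # 9007199254740992.0: each + 1.0 is rounded away
print(sum(sorted(data)))     # 9007199254740996.0: the small values accumulate first
print(math.fsum(data))       # 9007199254740996.0: correctly rounded exact sum

# Exact rational arithmetic gives the true total, 2**53 + 3, which no double can hold.
exact = sum(Fraction(x) for x in data)
print(exact)                 # 9007199254740995
print(float(exact))          # 9007199254740996.0
```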
> If you're worried about people coming to rely on this, and thus running > into trouble in the future if Counters get treated specially for (say) > weighted data, then I'd accept a warning in the docs, or even a runtime > warning. But not an exception. The statistics module isn't marked as provisional. So the semantics that ship with 3.4 are going to be set in stone. Changing them later simply won't be an option--that will break code. If you want to treat Counter objects differently in the future than you do now, then I agree with Wolfgang: the best course of action would be to add an exception now. But again I'll defer to your judgment about what's best for your module. //arry/ ** Or high-precision to low-precision. You know what I mean, the classic "if you add large numbers first you throw away precision and can wind up with a different result" thing. From suresh_vv at yahoo.com Fri Jan 31 06:36:46 2014 From: suresh_vv at yahoo.com (Suresh V.) Date: Fri, 31 Jan 2014 11:06:46 +0530 Subject: [Python-ideas] __before__ and __after__ attributes for functions In-Reply-To: <49AF79E2-129C-4548-A017-9B2B18BD62E8@mac.com> References: <52E1F56E.2030805@stoneleaf.us> <49AF79E2-129C-4548-A017-9B2B18BD62E8@mac.com> Message-ID: On Thursday 30 January 2014 02:14 PM, Ronald Oussoren wrote: > > On 24 Jan 2014, at 08:54, Suresh V. wrote: > >> On Friday 24 January 2014 10:39 AM, Ethan Furman wrote: >>> On 01/23/2014 08:09 PM, Suresh V. wrote: >>>> >>>> Also it would mean that the client code imports from this package. >>>> I would like client code to remain exactly as it is (continue to >>>> import from its original package) but the behavior is enhanced >>>> once this package is imported on startup. >>> >>> /Something/ has to adjust the pre and post conditions -- if not the >>> client code, then what? >> >> pre and post conditions are just one possible use of this.
>> >> Going back to my smtplib.SMTP.sendmail example. >> No changes in bulk of client code. >> Single patch module imported in main. > > Why is this a good thing? You seem to propose adding a mechanism that makes it easily possible to modify the behaviour of existing functions, which makes it harder to reason about code. It is a "good thing" because it adheres to the "Open/Closed principle" better than monkey patching does. Meaning open to extension and closed to modification. > > While this is also possible without language changes with the current monkey patching mechanisms it's at least clear that you're doing something naughty when writing the patching code :-) This is for those non-naughty times :-) From tjreedy at udel.edu Fri Jan 31 07:01:33 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 31 Jan 2014 01:01:33 -0500 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <52EB2D6C.9040803@hastings.org> References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> <52EB2D6C.9040803@hastings.org> Message-ID: On 1/30/2014 11:58 PM, Larry Hastings wrote: > The statistics module isn't marked as provisional. Perhaps it should be, at least with respect to sums of mixed types and use of Counters. > So the semantics that ship with 3.4 are going to be set in stone. Given the discussion here and previously, that seems premature. -- Terry Jan Reedy From stephen at xemacs.org Fri Jan 31 06:56:39 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 31 Jan 2014 14:56:39 +0900 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <20140131010724.GE3799@ando> References: <20140131010724.GE3799@ando> Message-ID: <87bnys7kyw.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Floats violate several of the fundamental rules of mathematics, > e.g. addition is not commutative: AFAIK it is. > py> 1e19 + (-1e19 + 0.1) == (1e19 + -1e19) + 0.1 > False This is a failure of associativity, not commutativity.
Associativity is in many ways a more fundamental property. From steve at pearwood.info Fri Jan 31 09:18:20 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 19:18:20 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <87bnys7kyw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140131010724.GE3799@ando> <87bnys7kyw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140131081820.GH3799@ando> On Fri, Jan 31, 2014 at 02:56:39PM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > Floats violate several of the fundamental rules of mathematics, > > e.g. addition is not commutative: > > AFAIK it is. > > > py> 1e19 + (-1e19 + 0.1) == (1e19 + -1e19) + 0.1 > > False > > This is a failure of associativity, not commutativity. Oops, you are correct. I got them mixed up. http://en.wikipedia.org/wiki/Associativity However, commutativity of addition can be violated by Python numeric types, although not floats alone. E.g. the example I gave earlier of two int subclasses. -- Steven From abarnert at yahoo.com Fri Jan 31 09:32:10 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 31 Jan 2014 00:32:10 -0800 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <87bnys7kyw.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140131010724.GE3799@ando> <87bnys7kyw.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5C48EA5B-0483-490B-9BEA-7AB89B4D5B7C@yahoo.com> On Jan 30, 2014, at 21:56, "Stephen J. Turnbull" wrote: > Steven D'Aprano writes: > >> Floats violate several of the fundamental rules of mathematics, >> e.g. addition is not commutative: > > AFAIK it is. > >> py> 1e19 + (-1e19 + 0.1) == (1e19 + -1e19) + 0.1 >> False > > This is a failure of associativity, not commutativity. Associativity > is in many ways a more fundamental property. Yeah, the only way commutativity can fail with IEEE floats is if you treat nan as a number and have at least two nans, at least one of them quiet.
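Both points in this exchange can be checked directly. The float example Steven quoted is a failure of associativity, because the grouping changes the value. Operand order, by contrast, changes only the result type, using the two int subclasses from Steven's earlier message, reproduced here as a script.

```python
class A(int):
    def __add__(self, other):
        # Keep the left operand's type in the result.
        return type(self)(super().__add__(other))

    def __repr__(self):
        return "%s(%d)" % (type(self).__name__, self)


class B(A):
    pass


# Floats: the grouping decides whether 0.1 survives.
x, y, z = 1e19, -1e19, 0.1
print(x + (y + z))          # 0.0, because -1e19 + 0.1 rounds back to -1e19
print((x + y) + z)          # 0.1

# Int subclasses: equal values, but the result type follows the left operand.
print(repr(A(1) + B(1)))    # A(2)
print(repr(B(1) + A(1)))    # B(2)
```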
But associativity failing isn't really fundamental. This example fails as a consequence of the axiom of (additive) identity not holding. (There is a unique "zero", but it's not true that, for all y, x+y=y implies x is that zero.) The overflow example fails because of closure not holding (unless you count inf and nan as numbers, in which case it again fails because zero fails even more badly). If you just meant that you lose commutativity before associativity in compositions over fields, then yeah, I guess in that sense associativity is more fundamental. From steve at pearwood.info Fri Jan 31 09:56:13 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 19:56:13 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: <52EB2D6C.9040803@hastings.org> References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> <52EB2D6C.9040803@hastings.org> Message-ID: <20140131085610.GI3799@ando> On Thu, Jan 30, 2014 at 08:58:20PM -0800, Larry Hastings wrote: > On 01/30/2014 05:27 PM, Steven D'Aprano wrote: > >I'm hesitant to require two passes over the data in _sum. Some > >higher-order statistics like variance are currently implemented using > >two passes, but ultimately I've like to support single-pass algorithms > >that can operate on large but finite iterators. > > > >But I will consider it as an option. > > > >I'm also hesitant to make the promise that _sum will be > >order-independent. Addition in Python isn't: [...] > > I concede that this is mostly outside my expertise, and the statistics > module and the PEP were your doing. So you're the expert here and I > will defer to you. > > But. My dim understanding of the *whole point* of the new statistics > module was that it valued correctness over raw performance. I assumed > sorting values from small to large** before summing was *exactly* the > sort of thing it was written to do. 
If all we wanted were Python's > existing semantics, why bother writing statistics._sum() in the first > place? Just use sum(). _sum doesn't duplicate the semantics of built-in sum(). It is sort of a hybrid of sum and math.fsum: like sum, it tries to conserve types, and give a sensible result when there are mixed types. Like fsum, it tries to be higher precision. > On the other hand, I had missed the fact that this was an internal-only > method. If changing _statistics._sum so it reordered the iterable to > preserve correctness wouldn't change the behavior of any supported > external APIs, then obviously there's no need, and I'd prefer to leave > it alone for 3.4. Changes to _sum may be visible, because the external APIs such as mean and variance rely on it. For example, an extreme case: if I removed _sum and replaced it with math.fsum, then all of the external APIs will suddenly start outputting floats and nothing but floats. (I'm not intending to do that.) I think that it is asking too much to promise that no statistics function will ever change it's numeric result. I don't intend for them to become *less* accurate, but they might become *more* accurate. For example, currently the unit tests for variance pass with an acceptable tolerance of 1e-12 (relative error). Perhaps this needs to be documented? The random module does something similar: http://docs.python.org/3/library/random.html#notes-on-reproducibility > If you decided to change it for 3.5 and people were > relying on its old behavior, that would be on them. (Though a comment > saying "I might change this later" would be welcome... if true.) > > > >If you're worried about people coming to rely on this, and thus running > >into trouble in the future if Counters get treated specially for (say) > >weighted data, then I'd accept a warning in the docs, or even a runtime > >warning. But not an exception. > > The statistics module isn't marked as provisional. 
So the semantics > that ship with 3.4 are going to be set in stone. Changing them later > simply won't be an option--that will break code. If you want to treat > Counter objects differently in the future than you do now, then I agree > with Wolfgang: the best course of action would be to add an exception > now. But again I'll defer to your judgment about what's best for your > module. Hmmm. Well, that's a much stronger promise of backward compatibility than I would have expected. The fact that (say) variance works with a dict is a pure accident of implementation, not advertised or promised in any way. But I'll accept your ruling. I want to reserve the right to add special handling of mappings in the future. In order of preference (highest to least) I'd like to: 1) Put a note in the documentation that handling of mappings is subject to change; 2) As above, plus raise warning.warn(); or 3) Raise an exception (this one only if you insist). -- Steven From wolfgang.maier at biologie.uni-freiburg.de Fri Jan 31 09:57:26 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 31 Jan 2014 08:57:26 +0000 (UTC) Subject: [Python-ideas] statistics module in Python3.4 References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> Message-ID: Steven D'Aprano writes: > > On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote: > > On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang > > > > wrote: > > >I think a much cleaner (and probably faster) implementation would be > > >to gather first all the types in the input sequence, then decide what > > >to return in an input order independent way. > > > > I'm willing to consider this a "bug fix". And since it's a new function > > in 3.4, we don't have an installed base. So I'm willing to consider > > fixing this for 3.4. > > I'm hesitant to require two passes over the data in _sum. 
Some > higher-order statistics like variance are currently implemented using > two passes, but ultimately I'd like to support single-pass algorithms > that can operate on large but finite iterators. > > But I will consider it as an option. > > I'm also hesitant to make the promise that _sum will be > order-independent. Addition in Python isn't: > > py> class A(int): > ... def __add__(self, other): > ... return type(self)(super().__add__(other)) > ... def __repr__(self): > ... return "%s(%d)" % (type(self).__name__, self) > ... > py> class B(A): > ... pass > ... > py> A(1) + B(1) > A(2) > py> B(1) + A(1) > B(2) Hi Steven, first of all let me say that I am quite amazed by the extent of the discussion that is going on now. All I really meant is that there are two special cases (mixed types in _sum and Counters as input to some functions) that I find worth reconsidering in an otherwise really useful module. Regarding your comments above and in other posts: I never proposed two passes over the data. My implementation (below again because many people seem to have missed it in my first rather long post) gathers the input types in a set **while** calculating the sum in a single for loop. It then calls _coerce_types passing this set only once: def _sum2(data, start=None): if start is not None: t = set((type(start),)) n, d = _exact_ratio(start) else: t = set() n = 0 d = 1 partials = {d: n} # map {denominator: sum of numerators} # Micro-optimizations. exact_ratio = _exact_ratio partials_get = partials.get # Add numerators for each denominator, and build up a set of all types.
for x in data: t.add(type(x)) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n T = _coerce_types(t) # decide which type to use based on set of all types if None in partials: assert issubclass(T, (float, Decimal)) assert not math.isfinite(partials[None]) return T(partials[None]) total = Fraction() for d, n in sorted(partials.items()): total += Fraction(n, d) if issubclass(T, int): assert total.denominator == 1 return T(total.numerator) if issubclass(T, Decimal): return T(total.numerator)/total.denominator return T(total) As for my tentative implementation of _coerce_types, it was really meant as an example. Specifically, I said: > Personally, I'd prefer something as simple as possible, maybe even: > > def _coerce_types (types): > if len(types) == 1: > return next(iter(types)) > return float > > , but that's just a suggestion. It is totally up to you to come up with something more along the lines of your original, but I still think that making the behavior order-independent comes at no performance-cost (if not a gain) and will make _sum's return type more predictable. When I said the current behavior was confusing, I didn't mean "not logical" or anything. The current rules are in fact very precisely worked out, I just think they are too complicated to think them through every time. You are right of course with your remark that addition in Python is also order-dependent regarding the returned type, but in my opinion this is not the point here. You are emphasizing that _sum is a private function of the module, but mean is a public one and the behavior of mean is dictated by that of _sum. Now when I call the mean function, then, of course, I know that this will very most likely be implemented as adding all values then dividing by their number, but in terms of encapsulation principles I shouldn't be forced to think about this to understand the return value of the function. 
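Here is a self-contained sketch of the single-pass idea Wolfgang describes. The loop records each input type in a set while it accumulates exact numerators per denominator. The result type is chosen once at the end, using the simple rule he suggested: a single type is kept, and any mixture becomes float. The names sum_single_pass and coerce_types are hypothetical. Fraction(x) stands in for the private statistics._exact_ratio helper. Unlike Wolfgang's version, this sketch does not handle infinities or NaN, since Fraction raises for them.

```python
from decimal import Decimal
from fractions import Fraction


def coerce_types(types):
    # Wolfgang's simplest suggested rule: one input type is kept,
    # any mixture of types falls back to float.
    if len(types) == 1:
        return next(iter(types))
    return float


def sum_single_pass(data):
    """Exact sum of finite numbers; the result type ignores input order."""
    types = set()
    partials = {}                        # denominator -> sum of numerators
    for x in data:
        types.add(type(x))               # gather types in the same pass
        q = Fraction(x)                  # exact for int, float, Fraction, Decimal
        partials[q.denominator] = partials.get(q.denominator, 0) + q.numerator
    T = coerce_types(types) if types else int
    total = sum((Fraction(n, d) for d, n in sorted(partials.items())), Fraction(0))
    if issubclass(T, int):
        return T(total.numerator)        # only ints were summed, so total is integral
    if issubclass(T, Decimal):
        return T(total.numerator) / total.denominator
    return T(total)
```

Because the types are gathered in a set, [Fraction(1, 2), 1.0] and [1.0, Fraction(1, 2)] both give the float 1.5. Complex inputs are rejected with TypeError, because Fraction cannot represent them.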
In other words, it doesn't help here that _sum reflects the behavior of __add__, all you should care about is that the behavior of mean() is simple to explain and understand. Again, this is just an opinion of somebody interested in having this particular module well-designed from the beginning before things are set in stone. Best wishes, Wolfgang From steve at pearwood.info Fri Jan 31 10:04:41 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Jan 2014 20:04:41 +1100 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> Message-ID: <20140131090441.GJ3799@ando> On Fri, Jan 31, 2014 at 08:57:26AM +0000, Wolfgang Maier wrote: > Steven D'Aprano writes: > > I'm hesitant to require two passes over the data in _sum. [...] > Regarding your comments above and in other posts: > I never proposed two passes over the data. My implementation (below again > because many people seem to have missed it in my first rather long post) > gathers the input types in a set **while** calculating the sum in a single > for loop. It then calls _coerce_types passing this set only once: Ah! I did miss it -- I just skimmed your implementation, and completely failed to realise the point you were making. Thank you for the correction. -- Steven From wolfgang.maier at biologie.uni-freiburg.de Fri Jan 31 11:12:15 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 31 Jan 2014 10:12:15 +0000 (UTC) Subject: [Python-ideas] statistics module in Python3.4 References: <20140131010724.GE3799@ando> Message-ID: Chris Angelico writes: > > On Fri, Jan 31, 2014 at 12:07 PM, Steven D'Aprano wrote: > > One of my aims is to avoid raising TypeError unnecessarily. The > > statistics module is aimed at casual users who may not understand, or > > care about, the subtleties of numeric coercions, they just want to take > > the average of two values regardless of what sort of number they are. 
> > But having said that, I realise that mixed-type arithmetic is difficult, > > and I've avoided documenting the fact that the module will work on mixed > > types. > > Based on the current docs and common sense, I would expect that > Fraction and Decimal should normally be there exclusively, and that > the only type coercions would be int->float->complex (because it makes > natural sense to write a list of "floats" as [1.4, 2, 3.7], but it > doesn't make sense to write a list of Fractions as [Fraction(1,2), > 7.8, Fraction(12,35)]). Any mishandling of Fraction or Decimal with > the other three types can be answered with "Well, you should be using > the same type everywhere". Well, that's simple to stick to as long as you are dealing with explicitly typed input data sets, but what about things like: a = transform_a_series_of_data_somehow(data) b = transform_this_series_differently(data) statistics.mean(a+b) # assuming a and b are lists of transformed values potentially different types are far more difficult to spot here and the fact that the result of the above might not be the same as, e.g.,: statistics.mean(b+a) is not making things easier to debug. >(Though it might be useful to allow > int->anything coercion, since that one's easy and safe.) > It should be mentioned here that complex numbers are not currently dealt with by statistics._sum . >>> statistics._sum((complex(1),)) Traceback (most recent call last): File "", line 1, in s._sum((complex(1),)) File ".\statistics.py", line 158, in _sum n, d = exact_ratio(x) File ".\statistics.py", line 257, in _exact_ratio raise TypeError(msg.format(type(x).__name__)) from None TypeError: can't convert type 'complex' to numerator/denominator Best, Wolfgang From haoyi.sg at gmail.com Fri Jan 31 12:35:53 2014 From: haoyi.sg at gmail.com (Haoyi Li) Date: Fri, 31 Jan 2014 03:35:53 -0800 Subject: [Python-ideas] Could the ast module's ASTs preserve source_length in addition to lineno and col_offset? 
In-Reply-To: <4f664144-f981-4c87-96d3-04481c122d3b@googlegroups.com> References: <51A69390.8070905@pearwood.info> <51A698A9.3020900@pearwood.info> <4f664144-f981-4c87-96d3-04481c122d3b@googlegroups.com> Message-ID: Nothing happened, I suppose. People in general thought it was a good idea but after looking at the python source code, I chickened out of implementing it in favor of a dumb parse-it-till-it-works technique which sufficed for my purposes. On Fri, Jan 31, 2014 at 2:55 AM, Alexander Ivanov wrote: > What happened :? (I am also interested in getting source_length/col_last > kind of info. Is there an alternative Python ast wrapper/library which > provides it?) > > On Friday, May 31, 2013 4:47:15 PM UTC+3, Nick Coghlan wrote: >> >> >> On 31 May 2013 20:00, "Haoyi Li" wrote: >> > >> > Ok, I'll give it a shot; I'm not familiar with the python codebase or >> build process, but i'll puzzle it out. Where's the place to go for help >> related to this sort of thing? python-dev? >> >> Check the developer guide at docs.python.org/devguide, and if you have >> any follow-up questions, sign up to the core-me... at python.org list. >> >> Cheers, >> Nick. >> >> > >> > >> > On Fri, May 31, 2013 at 1:04 AM, Nick Coghlan >> wrote: >> >> >> >> >> >> >> On 31 May 2013 14:28, "Haoyi Li" wrote: >> >> > >> >> > Anyone else have any thoughts about this? This seems like it would >> be a pretty straightforward thing to do, and I would be happy to go through >> the code and submit a patch. The only question is whether we want to do it >> in the first place; are there any reasons it can't/shouldn't be done that >> I'm not aware of? >> >> >> >> Seems reasonable to me, but would need to see a patch to give a >> definite yes or no. >> >> >> >> Cheers, >> >> Nick. >> >> >> >> > >> >> > >> >> > On Wed, May 29, 2013 at 8:09 PM, Steven D'Aprano < >> st...
at pearwood.info> wrote: >> >> >> >> >> >> On 30/05/13 10:04, Haoyi Li wrote: >> >> >>> >> >> >>> I don't need to keep the source code, I just need a single integer >> for each >> >> >>> node. I would then be able to reconstruct the source snippet. >> >> >> >> >> >> >> >> >> And so you did say. Sorry for the noise. >> >> >> >> >> >> >> >> >> -- >> >> >> Steven >> >> >> _______________________________________________ >> >> >> Python-ideas mailing list >> >> >> Python... at python.org >> >> >> >> http://mail.python.org/mailman/listinfo/python-ideas From ncoghlan at gmail.com Fri Jan 31 12:41:12 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 31 Jan 2014 21:41:12 +1000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 31 Jan 2014 04:17, "Mark Lawrence" wrote: > > On 30/01/2014 12:52, anatoly techtonik wrote: >> >> >> So you can not plan how to spend your time more effectively and how to >> help with development. >> > > Core dev time could be used more effectively if they weren't sidetracked by non-issues e.g. blithering idiots who keep reopening issues on the bug tracker. I won't mention any names. Mark, as annoying as Anatoly is, this is still a violation of the list code of conduct. If you find him too irritating to allow you to maintain civility on the core lists when dealing with him, set your mail client to ignore his messages (that's what most of the core devs have been doing for quite some time). Anatoly already got himself banned from bugs.python.org with his antics, and his moderation flag is set on all the core mailing lists.
At the rate he is going, he is not encouraging anyone to reconsider either decision, and it's still possible for that moderation flag to be upgraded to an outright ban from the mailing lists if the mods decide it would be appropriate. Regards, Nick. > > > -- > My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. > > Mark Lawrence From wolfgang.maier at biologie.uni-freiburg.de Fri Jan 31 13:43:55 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 31 Jan 2014 12:43:55 +0000 (UTC) Subject: [Python-ideas] statistics module in Python3.4 References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> <52EB2D6C.9040803@hastings.org> <20140131085610.GI3799@ando> Message-ID: Steven D'Aprano writes: > > On Thu, Jan 30, 2014 at 08:58:20PM -0800, Larry Hastings wrote: > > > > The statistics module isn't marked as provisional. So the semantics > > that ship with 3.4 are going to be set in stone. Changing them later > > simply won't be an option--that will break code. If you want to treat > > Counter objects differently in the future than you do now, then I agree > > with Wolfgang: the best course of action would be to add an exception > > now. But again I'll defer to your judgment about what's best for your > > module. > > > Hmm. Well, that's a much stronger promise of backward compatibility > than I would have expected. The fact that (say) variance works with a > dict is a pure accident of implementation, not advertised or promised in > any way. But I'll accept your ruling. I want to reserve the right to > add special handling of mappings in the future.
In order of preference > (highest to least) I'd like to: > > 1) Put a note in the documentation that handling of mappings is subject > to change; > > 2) As above, plus raise warnings.warn(); or > > 3) Raise an exception (this one only if you insist). > I thought about this further and, yes, I guess at least point 1) is essential and even if that means marking the module as provisional it is a bit sad, but worth it. Mappings may be an excellent way of specifying frequencies and weights in an elegant way. You could use them to calculate weighted means and variances, and even to specify variable interval widths for median_grouped to calculate a weighted median as defined here: http://en.wikipedia.org/wiki/Weighted_median Most of this is quite easy to code I guess and it would be a pity to deprive yourself of this possibility because people start passing mappings now and start relying on iteration happening over keys only. I agree with Larry that once this happens, it will be hard to change the behavior even in 3.5. Best, Wolfgang From breamoreboy at yahoo.co.uk Fri Jan 31 16:13:39 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 31 Jan 2014 15:13:39 +0000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 31/01/2014 11:41, Nick Coghlan wrote: > > On 31 Jan 2014 04:17, "Mark Lawrence" > > wrote: > > > > On 30/01/2014 12:52, anatoly techtonik wrote: > >> > >> > >> So you can not plan how to spend your time more effectively and how to > >> help with development. > >> > > > > Core dev time could be used more effectively if they weren't > sidetracked by non-issues e.g. blithering idiots who keep reopening > issues on the bug tracker. I won't mention any names. > > Mark, as annoying as Anatoly is, this is still a violation of the list > code of conduct.
If you find him too irritating to allow you to maintain > civility on the core lists when dealing with him, set your mail client > to ignore his messages (that's what most of the core devs have been > doing for quite some time). > > Anatoly already got himself banned from bugs.python.org > with his antics, and his moderation flag is set > on all the core mailing lists. At the rate he is going, he is not > encouraging anyone to reconsider either decision, and it's still > possible for that moderation flag to be upgraded to an outright ban from > the mailing lists if the mods decide it would be appropriate. > > Regards, > Nick. > Who says I was getting at Anotoly? Unless the English language has changed without my knowledge, you'll find that "idiots" and "names" are plural and not singular. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rosuav at gmail.com Fri Jan 31 16:26:52 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 1 Feb 2014 02:26:52 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Sat, Feb 1, 2014 at 2:13 AM, Mark Lawrence wrote: > Who says I was getting at Anotoly? Unless the English language has changed > without my knowledge, you'll find that "idiots" and "names" are plural and > not singular. That's a perfectly valid argument, in the same way that this is perfectly valid code: # utils.py import math math.pi = 3.159 SECONDS_PER_MINUTE = 60 def minsec(sec): global SECONDS_PER_MINUTE SECONDS_PER_MINUTE+=2 return sec//SECONDS_PER_MINUTE, sec%SECONDS_PER_MINUTE def format_time(sec): min,sec = minsec(sec) return "%02d:%02d"%(sec,min) It's all perfectly legal Python, but it breaks all sorts of conventions, and you know it. In the context of this thread, it was obvious to everyone what you were saying, and hiding behind the technicality of plurality doesn't help you. 
Do please be honest with yourself and us. ChrisA From breamoreboy at yahoo.co.uk Fri Jan 31 16:35:04 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 31 Jan 2014 15:35:04 +0000 Subject: [Python-ideas] statistics module in Python3.4 In-Reply-To: References: <52EAA20A.7090704@hastings.org> <20140131012705.GF3799@ando> Message-ID: On 31/01/2014 08:57, Wolfgang Maier wrote: > > Hi Steven, > first of all let me say that I am quite amazed by the extent of the > discussion that is going on now. All I really meant is that there are two > special cases (mixed types in _sum and Counters as input to some functions) > that I find worth reconsidering in an otherwise really useful module. > Thanks for starting what I see as a very healthy debate that in the longer term is highly likely to make for a better statistics module. What more could a user ask for? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From breamoreboy at yahoo.co.uk Fri Jan 31 16:45:16 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 31 Jan 2014 15:45:16 +0000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 31/01/2014 15:26, Chris Angelico wrote: > On Sat, Feb 1, 2014 at 2:13 AM, Mark Lawrence wrote: >> Who says I was getting at Anotoly? Unless the English language has changed >> without my knowledge, you'll find that "idiots" and "names" are plural and >> not singular. 
> > That's a perfectly valid argument, in the same way that this is > perfectly valid code: > > # utils.py > import math > math.pi = 3.159 > SECONDS_PER_MINUTE = 60 > > def minsec(sec): > global SECONDS_PER_MINUTE > SECONDS_PER_MINUTE+=2 > return sec//SECONDS_PER_MINUTE, sec%SECONDS_PER_MINUTE > > def format_time(sec): > min,sec = minsec(sec) > return "%02d:%02d"%(sec,min) > > > It's all perfectly legal Python, but it breaks all sorts of > conventions, and you know it. In the context of this thread, it was > obvious to everyone what you were saying, and hiding behind the > technicality of plurality doesn't help you. Do please be honest with > yourself and us. > > ChrisA Asperger Syndrome sufferers are always honest. Sadly I find it a major weakness that I have to live with. We also take things literally and write things literally. So your "obvious to everyone what you were saying" to me is clearly incorrect. Please withdraw the comment. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rosuav at gmail.com Fri Jan 31 17:15:07 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 1 Feb 2014 03:15:07 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Sat, Feb 1, 2014 at 2:45 AM, Mark Lawrence wrote: > Asperger Syndrome sufferers are always honest. Sadly I find it a major > weakness that I have to live with. We also take things literally and write > things literally. So your "obvious to everyone what you were saying" to me > is clearly incorrect. Please withdraw the comment. I know what it's like to live with Aspergers, I have it myself (at least, not formally diagnosed but it seems pretty likely). And I do put my foot in my mouth pretty often. But that doesn't mean that I can hide behind it as a shield when it's this obvious. 
You knew full well what you were saying when you said you wouldn't mention any names. Now, I do enjoy a good upper-class British insult-fest. That's a major part of what makes quite a few British comedies work - the utterly courteous, yet bitingly cutting, wit, barb, and counter-barb. The tenor explains to the soprano that he was just disguised as a member of the band, that he's really a much more important person, and she says that she knew he was in disguise the minute she heard him play. But when that wit is wielded inappropriately, the proper response is a graceful apology or retraction... or, if circumstances demand, a barbed retraction ("I implied in the House last week that the Hon Member had the intelligence of a stuffed egg. This was inappropriate, and I formally apologize for and retract this analogy. My breakfast egg today deserved no less."), but you have to be VERY sure of your ground before you take that option - claiming Aspergers is not sufficient. This'll probably sidetrack everyone terribly (for which I'm not sure if I apologize; it might be a good thing for some people to get stuck in TVTropes for a while) but this write-up about Asperger Syndrome gives an excellent comment: http://tvtropes.org/pmwiki/pmwiki.php/UsefulNotes/AspergerSyndrome """ Most genuine Aspies don't see Aspergers as a 'Get Out Of Jerk Ass Free' card, just an explanation. If somebody offends you, then tells you they have Asperger Syndrome and that's why they offended you, you can generally tell if this is true by a simple observation - If the admittance is followed (or preceded) by a genuine apology, it may be true. If it's followed by the expectation that you should now apologise to them for being offended, they're probably just jerks. """ I'll let that speak for itself. 
ChrisA From ethan at stoneleaf.us Fri Jan 31 17:28:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 31 Jan 2014 08:28:57 -0800 Subject: [Python-ideas] [off-topic] Insults, English, and Aspergers In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: <52EBCF49.6070302@stoneleaf.us> On 01/31/2014 07:45 AM, Mark Lawrence wrote: >> On Sat, Feb 1, 2014 at 2:13 AM, Mark Lawrence wrote: >>>>On 01/30/2014 10:16 AM, Mark Lawrence wrote: >>>>> >>>>> Core dev time could be used more effectively if they weren't sidetracked >>>>> by non-issues e.g. blithering idiots who keep reopening issues on the bug >>>>> tracker. I won't mention any names. >>> >>> Who says I was getting at Anotoly? Unless the English language has changed >>> without my knowledge, you'll find that "idiots" and "names" are plural and >>> not singular. > > Asperger Syndrome sufferers are always honest. [...] We also take things > literally and write things literally. So your "obvious to everyone what > you were saying" to me is clearly incorrect. Please withdraw the comment. What a load of crap. If you care to discuss this further, mail me off-list and stop wasting developer time. -- ~Ethan~ From breamoreboy at yahoo.co.uk Fri Jan 31 18:28:34 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 31 Jan 2014 17:28:34 +0000 Subject: [Python-ideas] [off-topic] Insults, English, and Aspergers In-Reply-To: <52EBCF49.6070302@stoneleaf.us> References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> <52EBCF49.6070302@stoneleaf.us> Message-ID: On 31/01/2014 16:28, Ethan Furman wrote: > On 01/31/2014 07:45 AM, Mark Lawrence wrote: >>> On Sat, Feb 1, 2014 at 2:13 AM, Mark Lawrence wrote: >>>>> On 01/30/2014 10:16 AM, Mark Lawrence wrote: >>>>>> >>>>>> Core dev time could be used more effectively if they weren't >>>>>> sidetracked >>>>>> by non-issues e.g. blithering idiots who keep reopening issues on >>>>>> the bug >>>>>> tracker. I won't mention any names. 
>>>> >>>> Who says I was getting at Anotoly? Unless the English language has >>>> changed >>>> without my knowledge, you'll find that "idiots" and "names" are >>>> plural and >>>> not singular. >> >> Asperger Syndrome sufferers are always honest. [...] We also take things >> literally and write things literally. So your "obvious to everyone what >> you were saying" to me is clearly incorrect. Please withdraw the >> comment. > > What a load of crap. > > If you care to discuss this further, mail me off-list and stop wasting > developer time. > Your opinion, clearly not mine. Further I don't discuss anything that starts on a Python mailing list with people offline as I don't believe in discussing things behind other people's backs. As for wasting developer time how much has been wasted by various people who've routinely insulted Python and by continuance its developers, and yet they're still allowed to spew their nonsense and get away with it? Yet again we're into the dual standards that annoy me so much, yet for speaking my mind I'm the one in the wrong. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From breamoreboy at yahoo.co.uk Fri Jan 31 18:32:11 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 31 Jan 2014 17:32:11 +0000 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On 31/01/2014 16:15, Chris Angelico wrote: > On Sat, Feb 1, 2014 at 2:45 AM, Mark Lawrence wrote: >> Asperger Syndrome sufferers are always honest. Sadly I find it a major >> weakness that I have to live with. We also take things literally and write >> things literally. So your "obvious to everyone what you were saying" to me >> is clearly incorrect. Please withdraw the comment. > > I know what it's like to live with Aspergers, I have it myself (at > least, not formally diagnosed but it seems pretty likely). 
And I do > put my foot in my mouth pretty often. But that doesn't mean that I can > hide behind it as a shield when it's this obvious. You knew full well > what you were saying when you said you wouldn't mention any names. > Once again I most certainly *DID NOT*. I knew full well what I was writing. I quite deliberately used plurals for that very purpose. Please in future stick to your bible bashing as you clearly know far more about that than you know about Asperger, with myself having a formal diagnosis. And please don't bother to withdraw your comment now or apologise, I wouldn't accept either as being in any way, shape or form genuine. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rosuav at gmail.com Fri Jan 31 18:42:04 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 1 Feb 2014 04:42:04 +1100 Subject: [Python-ideas] Iterative development In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> Message-ID: On Sat, Feb 1, 2014 at 4:32 AM, Mark Lawrence wrote: > On 31/01/2014 16:15, Chris Angelico wrote: >> >> On Sat, Feb 1, 2014 at 2:45 AM, Mark Lawrence >> wrote: >>> >>> Asperger Syndrome sufferers are always honest. Sadly I find it a major >>> weakness that I have to live with. We also take things literally and >>> write >>> things literally. So your "obvious to everyone what you were saying" to >>> me >>> is clearly incorrect. Please withdraw the comment. >> >> >> I know what it's like to live with Aspergers, I have it myself (at >> least, not formally diagnosed but it seems pretty likely). And I do >> put my foot in my mouth pretty often. But that doesn't mean that I can >> hide behind it as a shield when it's this obvious. You knew full well >> what you were saying when you said you wouldn't mention any names. >> > > Once again I most certainly *DID NOT*. I knew full well what I was writing. > I quite deliberately used plurals for that very purpose. 
Please in future > stick to your bible bashing as you clearly know far more about that than you > know about Asperger, with myself having a formal diagnosis. > > And please don't bother to withdraw your comment now or apologise, I > wouldn't accept either as being in any way, shape or form genuine. I wouldn't withdraw my comment, because I still stand by it. If you genuinely meant no specifics, then when someone pointed out how they interpreted your statement, you would have apologized and made a correction: "I didn't mean anyone in particular, I meant the way there've been 50 issues reopened unnecessarily by 30 different people lately", or something. But that wouldn't be true, would it? You really did mean Anatoly, and that's why you said what you did. Believe you me, I know more than you think I do. Think of Emma from "Once Upon A Time" if you like - a strong ability to detect lying, based on a metric ton of experience with it. ChrisA From rosuav at gmail.com Fri Jan 31 18:42:50 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 1 Feb 2014 04:42:50 +1100 Subject: [Python-ideas] [off-topic] Insults, English, and Aspergers In-Reply-To: References: <23EC770C-0A37-4370-AD15-537069CC6C77@yahoo.com> <52EBCF49.6070302@stoneleaf.us> Message-ID: On Sat, Feb 1, 2014 at 4:28 AM, Mark Lawrence wrote: > As for wasting developer time how much has been wasted by various people > who've routinely insulted Python and by continuance its developers, and yet > they're still allowed to spew their nonsense and get away with it? Yet again > we're into the dual standards that annoy me so much, yet for speaking my > mind I'm the one in the wrong. No, you're not in the wrong for speaking your mind. You're declared to be (or treated as) in the wrong based on an objective analysis of the content and style of your posts. Anatoly is at fault too, but your issues are almost completely tangential to his. They just happen to be in the same thread. Now please, stop talking. 
Trust me, you're only digging yourself further into a hole. This discussion is way way off topic, and I'm becoming painfully aware that I've said way too much already myself. ChrisA From ctb at msu.edu Fri Jan 31 18:44:31 2014 From: ctb at msu.edu (C. Titus Brown) Date: Fri, 31 Jan 2014 09:44:31 -0800 Subject: [Python-ideas] Iterative development In-Reply-To: References: Message-ID: <20140131174431.GA30515@idyll.org> On Sat, Feb 01, 2014 at 04:42:04AM +1100, Chris Angelico wrote: > On Sat, Feb 1, 2014 at 4:32 AM, Mark Lawrence wrote: > > On 31/01/2014 16:15, Chris Angelico wrote: > >> > >> On Sat, Feb 1, 2014 at 2:45 AM, Mark Lawrence > >> wrote: > >>> > >>> Asperger Syndrome sufferers are always honest. Sadly I find it a major > >>> weakness that I have to live with. We also take things literally and > >>> write > >>> things literally. So your "obvious to everyone what you were saying" to > >>> me > >>> is clearly incorrect. Please withdraw the comment. > >> > >> > >> I know what it's like to live with Aspergers, I have it myself (at > >> least, not formally diagnosed but it seems pretty likely). And I do > >> put my foot in my mouth pretty often. But that doesn't mean that I can > >> hide behind it as a shield when it's this obvious. You knew full well > >> what you were saying when you said you wouldn't mention any names. > >> > > > > Once again I most certainly *DID NOT*. I knew full well what I was writing. > > I quite deliberately used plurals for that very purpose. Please in future > > stick to your bible bashing as you clearly know far more about that than you > > know about Asperger, with myself having a formal diagnosis. > > > > And please don't bother to withdraw your comment now or apologise, I > > wouldn't accept either as being in any way, shape or form genuine. > > I wouldn't withdraw my comment, because I still stand by it. 
If you > genuinely meant no specifics, then when someone pointed out how they > interpreted your statement, you would have apologized and made a > correction: "I didn't mean anyone in particular, I meant the way > there've been 50 issues reopened unnecessarily by 30 different people > lately", or something. But that wouldn't be true, would it? You really > did mean Anatoly, and that's why you said what you did. Believe you > me, I know more than you think I do. Think of Emma from "Once Upon A > Time" if you like - a strong ability to detect lying, based on a > metric ton of experience with it. Hi all, this is getting rather meta, in a profoundly unproductive way. Can we stick to software development, Python, and non-personal-or-plural insults, please? thanks, --titus [ <-- wearing moderator hat ] -- C. Titus Brown, ctb at msu.edu