I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, with French as my default locale, instead of

>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

I might get something like

>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
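
Roughly, this could be prototyped today with gettext and a custom
excepthook. A minimal sketch, assuming a hypothetical
"python-traceback" message catalogue (no such catalogue exists yet):

import gettext
import sys
import traceback

_ = gettext.translation('python-traceback', fallback=True).gettext

def translated_excepthook(exc_type, exc_value, tb):
    # Re-implement the default output, routing the fixed English
    # templates through gettext so a catalogue can translate them.
    sys.stderr.write(_('Traceback (most recent call last):') + '\n')
    for filename, lineno, name, line in traceback.extract_tb(tb):
        sys.stderr.write(_('  File "{0}", line {1}, in {2}').format(
            filename, lineno, name) + '\n')
        if line:
            sys.stderr.write('    ' + line + '\n')
    sys.stderr.write('{0}: {1}\n'.format(exc_type.__name__, exc_value))

sys.excepthook = translated_excepthook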
André
If we could explicitate an overly complex expression on an indented
follow-up line, I would use this feature very often:

htmltable = ''.join('<tr>{}</tr>'.format(htmlline) for line in table) :  # main line
    htmlline : ''.join('<td>{}</td>'.format(cell) for cell in line)      # explicitation line(s)
(Sorry if this has already been discussed earlier on this list; I have
not read all the archives.)
*******
In detail:
A list comprehension "<expression> for x in mylist" often greatly
improves the readability of Python programs, when <expression> is not
too complex. When <expression> is too complex (e.g. nested lists), it
becomes unreadable, so we have to find another solution:
a) defining a function expression(x), or an iterator function, which
will be used only once in the code
b) or dropping this beautiful syntax and replacing it with the very
basic list construction:

newlist = []
for x in myiterable:
    newlist.append(<expression>)
I often choose b), but I dislike both solutions:
- in solution a), the function definition can be far from the list
comprehension; in effect the instructions to build the new list are
split across two different places in the code.
- solution b) seems a bit better to me, but the fact that we build a
new list from myiterable is not visible at a glance, unlike with list
comprehensions.
Giving up list comprehensions happens rather often when I write Python
code.
I think we could greatly improve readability if we could keep the list
comprehension in all cases, and, when necessary, explicitate an overly
complex expression on an indented line:

htmltable = ''.join('<tr>{}</tr>'.format(htmlline) for line in table) :  # main line
    htmlline : ''.join('<td>{}</td>'.format(cell) for cell in line)      # explicitation line(s)
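
For comparison, the closest equivalent in today's Python is to bind
the inner expression to a name on a preceding line:

htmllines = (''.join('<td>{}</td>'.format(cell) for cell in line)
             for line in table)
htmltable = ''.join('<tr>{}</tr>'.format(htmlline)
                    for htmlline in htmllines)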
In the case where the main line is the header of a "classical" indented
block (starting with "for", "if", "with"...), this indented block would
simply follow the explicitation line(s).
The explicitation lines can be reliably identified as the lines that
begin with "identifier :" (when we are not inside an unclosed dict):
with open('data.txt') as f :
    if line in enumerate(mylist) :           # main line
        mylist : f.read().strip().lower()    # explicitation line(s)
        print line                           # "classical" indented block
Another possible use of explicitation lines is a coding style which
would start with "the whole picture" first, completing it with details
afterwards, which is the usual way we mentally solve problems.
Let's take an example: we want to write a function which returns a
multiplication table in a simple html document.
When I solve this problem, I think a bit like this:
- I need to return an html page. For that I need a "header" and a
"body". My body will contain an "htmltable", which will be built from a
"table" of numbers, etc.
My code could look like this:

def tag(content, *tags):  # little convenience function
    retval = content
    for t in tags:
        retval = '<{0}>{1}</{0}>'.format(t, retval)
    return retval

def xhtml_mult_table(a, b):
    return tag(header + body, 'html') :
        header : tag('multiplication table', 'title')
        body : tag(htmltable, 'tbody', 'table', 'body') :
            htmltable : ''.join(tag(xhtmlline, 'tr') for line in table) :
                table : headerline + otherlines :
                    headerline : [[''] + range(a)]
                    otherlines : [[y] + [x*y for x in range(a)] for y in range(b)]
                xhtmlline : ''.join(tag(str(cell), 'td') for cell in line)
This example is a "heavy use" of the "explicitation line" feature, to
illustrate how it could work.
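
Note that the tag() helper above is already valid Python; only
xhtml_mult_table uses the proposed syntax. For instance:

>>> tag('3x4', 'td', 'tr')
'<tr><td>3x4</td></tr>'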
I don't mean this should replace the "classical" syntax everywhere
possible, but for me it would be a nice way to explicitate complex
expressions from time to time, plus the ability to use list
comprehensions everywhere I want.
Daniel
Hi,
Python 3 has two string prefixes r"" for raw strings and b"" for bytes.
So if you want to create a regex based on bytes, as far as I can tell,
you have to do something like this:
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
# or
FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
I think it would be much nicer if one could write:
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
# or
FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")
I _slightly_ prefer rb"" to br"" but either would be great:-)
Why would you want a bytes regex?
In my case I am reading PostScript files and PostScript .pfa font files
so that I can embed the latter into the former. But I don't know what
encoding these files use beyond the fact that it is ASCII or some ASCII
superset like Latin1. So in true Python style I don't assume: instead I
read the files as bytes and do all my processing using bytes, at no
point decoding since I only ever insert ASCII characters. I don't think
this is a rare example: with Python 3's clean separation between strings
& bytes (a major advance IMO), I think there will often be cases where
all the processing is done using bytes.
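
For instance, a minimal sketch of that style of processing (the
filename is just a placeholder):

import re

FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))

with open("font.pfa", "rb") as f:   # read as bytes, never decode
    data = f.read()
match = FONTNAME_RE.search(data)
if match:
    name = match.group(1)           # bytes, e.g. b'Helvetica'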
--
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
Hi,
I'm interested in a feature which allows users to discard the locals
and globals references from frames held by a traceback object.
Currently, traceback objects are used when capturing and re-raising
exceptions. However, they hold a reference to all frames, and each
frame holds a reference to its locals and globals. These are not
needed by the default traceback output, and can cause serious memory
bloat if a reference to a traceback object is kept for any significant
length of time; there are even big red warnings in the Python docs
about keeping a reference to the traceback in the handling frame
(http://docs.python.org/release/3.1/library/sys.html#sys.exc_info).
Example usage would be something like:
import sys

try:
    1/0
except:
    t, v, tb = sys.exc_info()
    tb.clean()

# ... much later ...
raise t, v, tb
Which would be basically a function to do this:
import sys

try:
    1/0
except:
    t, v, tb = sys.exc_info()
    c = tb
    while c:
        # Illustrative only: frame attributes are read-only today,
        # so this loop would need interpreter support to work.
        c.tb_frame.f_locals = None
        c.tb_frame.f_globals = None
        c = c.tb_next

# ... much later ...
raise t, v, tb
Twisted has done a very similar thing with their
twisted.python.failure.Failure object, which stringifies the traceback
data and discards the reference to the Python traceback entirely (
http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.0.0/twisted/…
) - they also replicate a lot of traceback printing functions to make
use of this stringified data.
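
For reference, a minimal sketch of what is achievable in pure Python
today, at the cost of losing the live traceback for re-raising:

import sys
import traceback

try:
    1/0
except ZeroDivisionError:
    t, v, tb = sys.exc_info()
    # Keep only the formatted text; dropping tb releases the frames,
    # and with them their locals and globals.
    tb_text = ''.join(traceback.format_exception(t, v, tb))
    del tb

# ... much later ...
sys.stderr.write(tb_text)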
It's worth noting that cgitb and other applications make use of locals
and globals in their traceback output. However, I believe the vast
majority of traceback usage does not make use of these references, and
a significant penalty is paid as a result.
Is there any interest in such a feature?
-Greg
I'm moving this thread to python-ideas, where it belongs.
I've looked at the implementation code (even stepped through it with
pdb!), read the sample/test code, and read the two papers on
animats.com fairly closely (they have a lot of overlap, and the memory
model described below seems copied verbatim from
http://www.animats.com/papers/languages/pythonconcurrency.html version
0.8).
Some reactions (trying to hide my responses to the details of the code):
- First of all, I'm very happy to see radical ideas proposed, even if
they are at present unrealistic. We need a big brainstorm to come up
with ideas from which an eventual solution to the multicore problem
might be chosen. (Jesse Noller's multiprocessing is another; Adam
Olsen's work yet another, at a different end of the spectrum.)
- The proposed new semantics (frozen objects, memory model,
auto-freezing of globals, enforcement of naming conventions) are
radically different from Python's current semantics. They will break
every 3rd party library in many more ways than Python 3. This is not
surprising given the goals of the proposal (and its roots in Adam
Olsen's work) but places a huge roadblock for acceptance. I see no
choice but to keep trying to come up with a compromise that is more
palatable and compatible without throwing away all the advantages. As
it now stands, the proposal might as well be a new and different
language.
- SynchronizedObject looks like a mixture of a Java synchronized class
(a non-standard concept in Java but easily understood as a class all
of whose public methods are synchronized) and a condition variable (which
has the same semantics of releasing the lock while waiting but without
crawling the stack for other locks to release). It looks like the
examples showing off SynchronizedObject could be implemented just as
elegantly using a condition variable (and voluntary abstention from
using shared mutable objects).
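
For concreteness, here is roughly what such a condition-variable
version could look like, using only the stdlib; the class and its
names are illustrative, not taken from the prototype:

import threading

class BoundedQueue:
    # All access funnels through one condition variable; waiting
    # releases the lock, as with SynchronizedObject, but explicitly.
    def __init__(self, limit):
        self._items = []
        self._limit = limit
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            while len(self._items) >= self._limit:
                self._cond.wait()     # releases the lock while waiting
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.pop(0)
            self._cond.notify_all()
            return item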
- If the goal is to experiment with new control structures, I
recommend decoupling them from the memory model and frozen objects,
instead relying (as is traditional in Python) on programmer caution to
avoid races. This would make it much easier to see how programmers
respond to the new control structures.
- You could add the freeze() function for voluntary use, and you could
even add automatic wrapping of arguments and return values for certain
classes using a class decorator or a metaclass, but the performance
overhead makes this unlikely to win over many converts. I don't see
much use for the "whole program freezing" done by the current
prototype -- there are way too many backdoors in Python for the
prototype approach to be anywhere near foolproof, and if we want a
non-foolproof approach, voluntary constraint (and, in some cases,
voluntary, i.e. explicit, wrapping of modules or classes) would work
just as well.
- For a larger-scale experiment with the new memory model and semantic
restrictions (or would it be better to call them syntactic
restrictions? -- after all they are about statically detectable
properties like naming conventions) I recommend looking at PyPy, which
has as one of its explicitly stated project goals easy experimentation
with different object models.
- I'm sure I've forgotten something, but I wanted to keep my impressions fresh.
- Again, John, thanks for taking the time to come up with an
implementation of your idea!
--Guido
On Sat, Jun 26, 2010 at 9:39 AM, John Nagle <nagle(a)animats.com> wrote:
> On 6/26/2010 7:44 AM, Jesse Noller wrote:
>>
>> On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
>> <fuzzyman(a)voidspace.org.uk> wrote:
>>>
>>> On 26/06/2010 07:11, John Nagle wrote:
>>>>
>>>> We have just released a proof-of-concept implementation of a new
>>>> approach to thread management - "newthreading".
>
> ....
>
>>> The import * form is considered bad practise in *general* and
>>> should not be recommended unless there is a good reason.
>
> I agree. I just did that to make the examples cleaner.
>
>>> however the introduction of free-threading in Python has not been
>>> hampered by lack of synchronization primitives but by the
>>> difficulty of changing the interpreter without unduly impacting
>>> single threaded code.
>
> That's what I'm trying to address here.
>
>>> Providing an alternative garbage collection mechanism other than
>>> reference counting would be a more interesting first-step as far as
>>> I can see, as that removes the locking required around every access
>>> to an object (which currently touches the reference count).
>>> Introducing free-threading by *changing* the threading semantics
>>> (so you can't share non-frozen objects between threads) would not
>>> be acceptable. That comment is likely to be based on a
>>> misunderstanding of your future intentions though. :-)
>
> This work comes out of a discussion a few of us had at a restaurant
> in Palo Alto after a Stanford talk by the group at Facebook which
> is building a JIT compiler for PHP. We were discussing how to
> make threading both safe for the average programmer and efficient.
> Javascript and PHP don't have threads at all; Python has safe
> threading, but it's slow. C/C++/Java all have race condition
> problems, of course. The Facebook guy pointed out that you
> can't redefine a function dynamically in PHP, and they get
> a performance win in their JIT by exploiting this.
>
> I haven't gone into the memory model in enough detail in the
> technical paper. The memory model I envision for this has three
> memory zones:
>
> 1. Shared fully-immutable objects: primarily strings, numbers,
> and tuples, all of whose elements are fully immutable. These can
> be shared without locking, and reclaimed by a concurrent garbage
> collector like Boehm's. They have no destructors, so finalization
> is not an issue.
>
> 2. Local objects. These are managed as at present, and
> require no locking. These can either be thread-local, or local
> to a synchronized object. There are no links between local
> objects under different "ownership". Whether each thread and
> object has its own private heap, or whether there's a common heap with
> locks at the allocator is an implementation decision.
>
> 3. Shared mutable objects: mostly synchronized objects, but
> also immutable objects like tuples which contain references
> to objects that aren't fully immutable. These are the high-overhead
> objects, and require locking during reference count updates, or
> atomic reference count operations if supported by the hardware.
> The general idea is to minimize the number of objects in this
> zone.
>
> The zone of an object is determined when the object is created,
> and never changes. This is relatively simple to implement.
> Tuples (and frozensets, frozendicts, etc.) are normally zone 2
> objects. Only "freeze" creates collections in zones 1 and 3.
> Synchronized objects are always created in zone 3.
> There are no difficult handoffs, where an object that was previously
> thread-local now has to be shared and has to acquire locks during
> the transition.
>
> Existing interlinked data structures, like parse trees and GUIs,
> are by default zone 2 objects, with the same semantics as at
> present. They can be placed inside a SynchronizedObject if
> desired, which makes them usable from multiple threads.
> That's optional; they're thread-local otherwise.
>
> The rationale behind "freezing" some of the language semantics
> when the program goes multi-thread comes from two sources -
> Adam Olsen's Safethread work, and the acceptance of the
> multiprocessing module. Olsen tried to retain all the dynamism of
> the language in a multithreaded environment, but locking all the
> underlying dictionaries was a boat-anchor on the whole system,
> and slowed things down so much that he abandoned the project.
> The Unladen Swallow documentation indicates that early thinking
> on the project was that Olsen's approach would allow getting
> rid of the GIL, but later notes indicate that no path to a
> GIL-free JIT system is currently in development.
>
> The multiprocessing module provides semantics similar to
> threading with "freezing". Data passed between processes is "frozen"
> by pickling. Processes can't modify each other's code. Restrictive
> though the multiprocessing module is, it appears to be useful.
> It is sometimes recommended as the Pythonic approach to multi-core CPUs.
> This is an indication that "freezing" is not unacceptable to the
> user community.
>
> Most of the real-world use cases for extreme dynamism
> involve events that happen during startup. Configuration files are
> read, modules are selectively included, functions are overridden, tables
> of references to functions are set up, regular expressions are compiled,
> and the code is brought into the appropriately configured state. Then
> the worker threads are started and the real work starts. The
> "newthreading" approach allows all that.
>
> After two decades of failed attempts to remove the Global
> Interpreter Lock without making performance worse, it is perhaps
> time to take a harder look at scalable threading semantics.
>
> John Nagle
> Animats
--
--Guido van Rossum (python.org/~guido)
On Sat, Jun 26, 2010 at 10:39, John Nagle <nagle(a)animats.com> wrote:
> The rationale behind "freezing" some of the language semantics
> when the program goes multi-thread comes from two sources -
> Adam Olsen's Safethread work, and the acceptance of the
> multiprocessing module. Olsen tried to retain all the dynamism of
> the language in a multithreaded environment, but locking all the
> underlying dictionaries was a boat-anchor on the whole system,
> and slowed things down so much that he abandoned the project.
> The Unladen Swallow documentation indicates that early thinking
> on the project was that Olsen's approach would allow getting
> rid of the GIL, but later notes indicate that no path to a
> GIL-free JIT system is currently in development.
That's not true. Refcounting was the boat-anchor, not dicts. I was
unable to come up with a relatively simple replacement that scaled
fully.
The dicts shared as module globals and class dicts were a design
issue, but more of an ideological one: concurrency mentality says you
should only share immutable objects. Python prefers ad-hoc design,
where you can do what you want so long as it's not particularly nasty.
I was unable to find a way to have both, so I declared the Python
mentality the winner.
The shareddict I came up with uses a read/write lock, so that it's
safe when you do mutate and doesn't bottleneck when you don't.
The only fancy thing was my method of checkpointing when doing a
readlock->writelock transition, but there are a hundred other ways to
accomplish that.
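
Roughly, the idea is a lock like this (a minimal sketch built on a
Condition, not my actual implementation; note this naive version can
starve writers):

import threading

class RWLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if not self._readers:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # Wait until no writer holds the lock and all readers left.
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()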
One of the common complaints about working with time values in Python
is that some functionality is available in the time module, some in
the datetime module, and some in both.
I propose a series of steps towards improving this situation.
1. Create posixtime.py initially containing just "from time import *"
2. Add python implementation of time.* functions to posixtime.py.
3. Rename the time module to _posixtime and add a time.py containing a
deprecation warning and "from _posixtime import *" (see the sketch
below).
Note that #2 may require moving some code from timemodule.c to
datetimemodule.c, but at the binary level the code compiled from these
files is already linked together in datetimemodule. Moving the
necessary code to datetimemodule.c will help to eliminate the current
circular dependency between time and datetime.
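
A sketch of the step 3 shim, using the module names proposed above:

# time.py, forwarding to the renamed module
import warnings
warnings.warn("the time module has been renamed to posixtime",
              DeprecationWarning, stacklevel=2)
from _posixtime import *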
On Thu, Jun 17, 2010 at 1:01 AM, Bruce Leban <bruce(a)leapyear.org> wrote:
..
> When you say "And where in the docs would you explain the following: :-)"
> that sounds like you're saying "this is too confusing we shouldn't document
> it." To which I can only say :-(
I presented what I consider to be a bug. I opened issue 9004, [1]
"datetime.utctimetuple() should not set tm_isdst flag to 0", for that.
There is no point in documenting the following as expected behavior:
>>> time.strftime('%c %z %Z', datetime.utcnow().utctimetuple())
'Wed Jun 16 03:26:26 2010 -0500 EST'
I believe it is better to fix it so that it produces
>>> time.strftime('%c %z %Z', datetime.utcnow().utctimetuple())
'Wed Jun 16 03:26:26 2010 '
instead.
This, however, shows a limitation of the datetime-to-timetuple
conversion: there is currently no mechanism to store daylight saving
time info in a datetime object. See issue 9013. [2] Rather than
fixing that, it would be much better to eliminate the need for the
datetime-to-timetuple conversion in the first place.
[1] http://bugs.python.org/issue9004
[2] http://bugs.python.org/issue9013
-1 to moving anything
The situation is confusing and moving things will add to that confusion for
a significant length of time.
What I would instead suggest is improving the docs. If I could look in one
place to find any time function it would mitigate the fact that they're
implemented in multiple places.
--- Bruce
(via android)
On Jun 16, 2010 12:56 AM, "M.-A. Lemburg" <mal(a)egenix.com> wrote:
Brett Cannon wrote:
> On Tue, Jun 15, 2010 at 16:01, Cameron Simpson <cs(a)zip.com.au> wrote:
>> On 15...
-1.
Please note that the time module provides access to low-level OS
provided services which the datetime module does not expose.
You cannot seriously expect an application which happily uses
the time module (only) for its limited date/time functionality
to be rewritten just to stay compatible with Python.
Note that not all applications are interested in sub-second
accuracy, and a computer without properly configured NTP and a good
internal clock doesn't even provide this accuracy to begin with
(even if it happily pretends to by exposing sub-second floats).
You might want to do that for Python 4 and then add all those
time module functions using struct_time to the datetime
module (returning datetime instances), but for Python3, we've
had the stdlib reorg already.
Renaming time -> posixtime falls into the same category.
The only improvement I could see, would be to move
calendar.timegm() to the time module, since that's where
it belongs (keeping an alias in the calendar module, of
course).
--
Marc-Andre Lemburg
eGenix.com
On Wed, Jun 16, 2010 at 1:37 PM, Brett Cannon <brett(a)python.org> wrote:
..
>> The only improvement I could see, would be to move
>> calendar.timegm() to the time module, since that's where
>> it belongs (keeping an alias in the calendar module, of
>> course).
>
> That should definitely happen at some point.
>
This is discussed in Issue 6280 <http://bugs.python.org/issue6280>.
There are several issues with this proposal:
1. According to help(time),
"""
The Epoch is system-defined; on Unix, it is generally January 1st, 1970.
The actual value can be retrieved by calling gmtime(0).
"""
The current calendar.timegm implementation ignores this. The solution
may be to change help(time), though.
2. The current calendar.timegm supports float values for hours,
minutes and seconds in the time tuple. This is probably an unintended
implementation artifact, but it is relied upon even in the stdlib. See
http://bugs.python.org/issue6280#msg107808 .
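For example, with the current pure-Python implementation:
>>> import calendar
>>> calendar.timegm((1970, 1, 1, 0, 0, 30.5, 0, 0, 0))
30.5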