At 08:57 AM 10/20/2005 -0700, Guido van Rossum wrote:
>Whoa, folks! Can I ask the gentlemen to curb their enthusiasm?
>PEP 343 is still (back) on the drawing table, PEP 342 has barely been
>implemented (did it survive the AST-branch merge?), and already you
>are talking about adding more stuff. Please put on the brakes!
Sorry. I thought that 343 was just getting a minor tune-up. In the months
since the discussion and approval (and implementation; Michael Hudson
actually had a PEP 343 patch out there), I've been doing a lot of thinking
about how they will be used in applications, and thought that it would be a
good idea to promote people using task-specific variables in place of
globals or thread-locals.
The conventional wisdom is that global variables are bad, but the truth is
that they're very attractive because they allow you to have one less thing
to pass around and think about in every line of code. Without globals, you
would sooner or later end up with every function taking twenty arguments to
pass through states down to other code, or else trying to cram all this
data into some kind of "context" object, which then won't work with code
that doesn't know about *your* definition of what a context is.
Globals are thus extremely attractive for practical software
development. If they weren't so useful, it wouldn't be necessary to warn
people not to use them, after all. :)
The problem with globals, however, is that sometimes they need to be
changed in a particular context. PEP 343 makes it safer to use globals
because you can always offer a context manager that changes them
temporarily, without having to hand-write a try-finally block. This will
make it even *more* attractive to use globals, which is not a problem as
long as the code has no multitasking of any sort.
Of course, the multithreading scenario is usually fixed by using
thread-locals. All I'm proposing is that we replace thread locals with
task locals, and promote the use of task-local variables for managed
contexts (such as the decimal context) *that would otherwise be a global or
a thread-local variable*. This doesn't seem to me like a very big deal;
just an encouragement for people to make their stuff easy to use with PEP
342 and 343.
By the way, I don't know if you do much with Java these days, but a big
part of the whole J2EE fiasco and the rise of the so-called "lightweight
containers" in Java has all been about how to manage implicit context so
that you don't get stuck with either the inflexibility of globals or the
deadweight of passing tons of parameters around. One of the big selling
points of AspectJ is that it lets you implicitly funnel parameters from
point A to point B without having to modify all the call signatures in
between. In other words, its use is promoted for precisely the sort of
thing that 'with' plus a task variable would be ideal for. As far as I can
tell, 'with' plus a task variable is *much* easier to explain, use, and
understand than an aspect-oriented programming tool is! (Especially from
the "if the implementation is easy to explain, it may be a good idea"
perspective.)
>I know that somewhere in the proto-PEP Phillip argues that the context
>API needs to be made a part of the standard library so that his
>trampoline can efficiently swap implicit contexts required by
>arbitrary standard and third-party library code. My response to that
>is that library code (whether standard or third-party) should not
>depend on implicit context unless it can assume complete
>control over the application.
I think maybe there's some confusion here, at least on my part. :) I see
two ways to read your statement, one of which seems to be saying that we
should get rid of the decimal context (because it doesn't have complete
control over the application), and the other way of reading it doesn't seem
connected to what I proposed.
Anything that's a global variable is an "implicit context". Because of
that, I spent considerable time and effort in PEAK trying to utterly stamp
out global variables. *Everything* in PEAK has an explicit context. But
that then becomes more of a pain to *use*, because you are now stuck with
managing it, even if you cram it into a Zope-style acquisition tree so
there's only one "context" to deal with. Plus, it assumes that everything
the developer wants to do can be supplied by *one* framework, be it PEAK,
Zope, or whatever, which is rarely the case but still forces framework
developers to duplicate everybody else's stuff.
In other words, I've come to realize that the path taken by the major Python
application frameworks is not really Pythonic. A Pythonic framework
shouldn't load you down with new management burdens and keep you from using
other frameworks. It should make life easier, and make your code *more*
interoperable, not less. Indeed, I've pretty much come to agree with
the part of the Python developer community that says Frameworks Are
Evil. A primary source of this evil in the big three frameworks (PEAK,
Twisted, and Zope) stems from their various approaches to dealing with this
issue of context, which lack the simplicity of global (or task-local)
variables.
So, the lesson I've taken from my attempt to make everything explicit is
that what developers *really* want is to have global variables, just
without the downsides of uncontrolled modifications, and inter-thread or
inter-task pollution. Explicit isn't always better than implicit, because
oftentimes the practicality of having implicit things is much more
important than the purity of making them all explicit. Simple is better
than complex, and task-local variables are *much* simpler than trying to
make everything explicit.
>Also, Nick wants the name 'context' for PEP-343 style context
>managers. I think it's overloading too much to use the same word for
>per-thread or per-coroutine context.
Actually, I was the one who originally proposed the term "context manager",
and it doesn't seem like a conflict to me. Indeed, I suggested in the
pre-PEP that "@context.manager" might be where we could put the
decorator. The overload was intentional, to suggest that when creating a
new context manager, it's worth considering whether the state should be
kept in a context variable, rather than a global variable. The naming
choice was for propaganda purposes, in other words. :)
Anyway, I'll withdraw the proposal for now. We can always leave it out of
2.5, I can release an independent implementation, and then submit it for
consideration again in the 2.6 timeframe. I just thought it would be a
no-brainer to use task locals where thread locals are currently being used,
and that's really all I was proposing we do as far as stdlib changes
anyway. I was also hoping to get good input from Python-dev regarding some
of the open issues, to try and build a consensus on them from the beginning.
> Speeding up list append calls
> A `comp.lang.python message from Tim Peters`_ prompted Neal Norwitz
> to investigate how the code that Tim posted could be sped up. He
> hacked the code to replace var.append() with the LIST_APPEND opcode,
Someone want a finite project that would _really_ help their Uncle
Timmy in his slow-motion crusade to get Python on the list of "solved
it!" languages for each problem on that magnificent site?
It turns out that many of the problems there have input encoded as
vast quantities of integers (stdin is a mass of decimal integers on
one or more lines). Most infamous for Python is this tutorial (you
don't get points for solving it) problem, which is _trying_ to test
whether your language of choice can read from stdin "fast enough":
The input begins with two positive integers n k (n, k<=10**7). The
next n lines of input contain one positive integer t_i, not greater
than 10**9, each.
Write a single integer to output, denoting how many integers t_i are
divisible by k.
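For reference, the problem itself is trivial in Python; the hard part is doing it fast enough on ~8MB of input. A sketch (hypothetical function name, modern print syntax):

```python
def count_divisible(data):
    """Count how many of the n integers t_i are divisible by k.

    `data` is the whole of stdin as one string: the first two tokens
    are n and k, and the next n tokens are the t_i values.
    """
    tokens = data.split()
    n, k = int(tokens[0]), int(tokens[1])
    return sum(1 for t in tokens[2:2 + n] if int(t) % k == 0)
```

At top level this would be `print(count_divisible(sys.stdin.read()))`; slurping stdin in one read, rather than line by line, is itself one of the necessary tricks.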
There's an 8-second time limit, and I believe stdin is about 8MB
(you're never allowed to see the actual input they use). They have a
slower machine than you use ;-), so it's harder than it sounds. To
date, 975 people have submitted a program that passed, but only a few
managed to do it using Python. I did, and it required every trick in
the book, including using psyco.
Turns out it's _not_ input speed that's the problem here, and not even
mainly the speed of integer mod: the bulk of the time is spent in
int(string) (and, yes, that's also far more important to the problem
Neal was looking at than list.append time). If you can even track all
the levels of C function calls that ends up invoking <wink>, you find
yourself in PyOS_strtoul(), which is a nifty all-purpose routine that
accepts inputs in bases 2 thru 36, can auto-detect base, and does
platform-independent overflow checking at the cost of a division per
digit. All those are features, but it makes for sloooow conversion.
I assume it's the overflow-checking that's the major time sink, and
it's not correct anyway: it does the check slightly differently for
base 10 than for any other base, explained only in the checkin comment
for rev 2.13, 8 years ago:
For base 10, cast unsigned long to long before testing overflow.
This prevents 4294967296 from being an acceptable way to spell zero!
So what are the odds that base 10 was the _only_ base that had a "bad
input" case for the overflow-check method used? If you thought
"slim", you were right ;-) Here are other bad cases, under all Python
versions to date (on a 32-bit box; if sizeof(long) == 8, there are
different bad cases):
int('102002022201221111211', 3) = 0
int('32244002423141', 5) = 0
int('1550104015504', 6) = 0
int('211301422354', 7) = 0
int('12068657454', 9) = 0
int('1904440554', 11) = 0
int('9ba461594', 12) = 0
int('535a79889', 13) = 0
int('2ca5b7464', 14) = 0
int('1a20dcd81', 15) = 0
int('a7ffda91', 17) = 0
int('704he7g4', 18) = 0
int('4f5aff66', 19) = 0
int('3723ai4g', 20) = 0
int('281d55i4', 21) = 0
int('1fj8b184', 22) = 0
int('1606k7ic', 23) = 0
int('mb994ag', 24) = 0
int('hek2mgl', 25) = 0
int('dnchbnm', 26) = 0
int('b28jpdm', 27) = 0
int('8pfgih4', 28) = 0
int('76beigg', 29) = 0
int('5qmcpqg', 30) = 0
int('4q0jto4', 31) = 0
int('3aokq94', 33) = 0
int('2qhxjli', 34) = 0
int('2br45qb', 35) = 0
int('1z141z4', 36) = 0
IOW, the only bases that _aren't_ "bad" are powers of 2, and 10
because it's special-cased (BTW, I'm not sure that base 10 doesn't
have a different bad case now, but don't care enough to prove it one
way or the other).
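To see why these wrap to zero: each bad case is exactly 2**32 written out in its base, so a 32-bit accumulator that only tests overflow after the combined multiply-and-add can land back on 0 without the check firing. A small simulation (assuming a 32-bit unsigned long; not the actual C code):

```python
MASK = 2**32 - 1  # simulate a 32-bit unsigned long

def wrapping_parse(s, base):
    """Accumulate digits with silent 32-bit wraparound, like the
    unchecked core of the conversion loop."""
    result = 0
    for ch in s:
        result = (result * base + int(ch, 36)) & MASK
    return result
```

The base-3 bad case from the list really is 2**32, so `wrapping_parse('102002022201221111211', 3)` comes out as exactly 0.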
Now fixing that is easy: the problem comes from being too clever,
doing both a multiply and an addition before checking for overflow.
Check each operation on its own and it would be bulletproof, without
special-casing. But that might be even slower (it would remove the
branch special-casing 10, but add a cheap integer addition overflow
check with its own branch).
The challenge (should you decide to accept it <wink>) is to replace
the overflow-checking with something both correct _and_ much faster
than doing n integer divisions for an n-character input. For example,
36**6 < 2**32-1, so whenever the input has no more than 6 digits
overflow is impossible regardless of base and regardless of platform.
That's simple and exploitable. For extra credit, make int(string) go
faster than preparing your taxes ;-)
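The length shortcut above is easy to verify; a sketch (hypothetical names) computing, per base, how many digits can never overflow a 32-bit unsigned long:

```python
ULONG_MAX = 2**32 - 1  # assuming a 32-bit unsigned long

def max_safe_digits(base, limit=ULONG_MAX):
    """Largest n such that any n-digit string in `base` fits in limit,
    i.e. base**n - 1 <= limit; at or below this length, per-digit
    overflow checks can be skipped entirely."""
    n = 0
    while base ** (n + 1) - 1 <= limit:
        n += 1
    return n
```

Base 36 is the worst case at 6 digits, exactly as the text says; for base 10 the first 9 digits are always safe, so most real inputs would never hit the slow path at all.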
BTW, Python as-is can be used to solve many (I'd bet most) of these
problems in the time limit imposed, although it may take some effort,
and it may not be possible without using psyco. A Python triumph I'm
particularly fond of:
The legend at the bottom:
Warning: large Input/Output data, be careful with certain languages
seems to be a euphemism for "don't even think about using Python" <0.9 wink>.
But there's a big difference in this one: it's a _hard_ problem,
requiring graph analysis, delicate computation, greater than
double-precision precision (in the end), and can hugely benefit from
preprocessing a batch of queries to plan out and minimize the number
of operations needed. Five people have solved it to date (click on
"Best Solutions"), and you'll see that my Python entry is the
second-fastest so far, beating 3 C++ entries by 3 excellent C++
programmers. I don't know what they did, but I suspect I was far more
willing to code up an effective but tedious "plan out and minimize"
phase _because_ I was using Python. I sure didn't beat them on
reading the mass quantities of integers from stdin <wink>.
I don't follow why the PEP deprecates catching a category of exceptions
in a different release than it deprecates raising them. Why would a
release allow catching something that cannot be raised? I must be
missing something here.
I find myself occasionally doing this:
... = dirname(dirname(dirname(p)))
I'm always--literally every time-- looking for a more functional form,
something that would be like this:
# apply dirname() 3 times on its results, initializing with p
... = repapply(dirname, 3, p)
There is a way to hack something like that with reduce, but it's not
pretty--it involves creating a temporary list and a lambda function:
... = reduce(lambda x, y: dirname(x), [p] + [None] * 3)
Just wondering, does anybody know how to do this nicely? Is there an
easy form that allows me to do this?
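One straightforward alternative (a sketch; `repapply` is the poster's hypothetical name) is a plain loop, which avoids both the temporary list and the lambda:

```python
import os.path

def repapply(f, n, x):
    """Apply f to its own result n times: repapply(f, 3, p) == f(f(f(p)))."""
    for _ in range(n):
        x = f(x)
    return x

# e.g. three levels up from a path:
three_up = repapply(os.path.dirname, 3, '/a/b/c/d/e.txt')
```

It isn't a one-liner expression, but it names the intent and has no hidden allocation.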
I've had this PEP laying around for quite a few months. It was inspired
by some code we'd written which wanted to be able to get immutable
versions of arbitrary objects. I've finally finished the PEP, uploaded
a sample patch (albeit a bit incomplete), and I'm posting it here to see
if there is any interest.
I tried "svn up" to bring my sandbox up-to-date and got this output:
% svn up
svn: Checksum mismatch for 'Objects/.svn/text-base/unicodeobject.c.svn-base'; expected: '8611dc5f592e7cbc6070524a1437db9b', actual: '2d28838f2fec366fc58386728a48568e'
What's that telling me?
Please bear with me for a few paragraphs ;-)
One aspect of str-type strings is the efficiency afforded when all the encoding really
is ascii. If the internal encoding were e.g. fixed utf-16le for strings, maybe with today's
computers it would still be efficient enough for most actual string purposes (excluding
the current use of str-strings as byte sequences).
I.e., you'd still have to identify what was "strings" (of characters) and what was really
byte sequences with no implied or explicit encoding or character semantics.
Ok, let's make that distinction explicit: Call one kind of string a byte sequence and the
other a character sequence (representation being a separate issue).
A unicode object is of course the prime _general_ representation of a character sequence
in Python, but all the names in python source code (that become NAME tokens) are UIAM
also character sequences, and representable by a byte sequence interpreted according to
the source encoding.
For the sake of discussion, suppose we had another _character_ sequence type that was
the moral equivalent of unicode except for internal representation, namely a str
subclass with an encoding attribute specifying the encoding that you _could_ use
to decode the str bytes part to get unicode (which you wouldn't do except when necessary).
We could call it class charstr(str): ... and have charstr().bytes be the str part and
charstr().encoding specify the encoding part.
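A rough sketch of the idea (hypothetical class; written here in modern Python 3 terms with an explicit bytes attribute rather than actual str subclassing, for clarity):

```python
class charstr:
    """A character sequence stored as raw bytes plus the encoding that
    decodes them to unicode -- decoded only when actually needed."""
    def __init__(self, data, encoding):
        self.bytes = data          # the byte-sequence part
        self.encoding = encoding   # how to decode it

    def decode(self):
        return self.bytes.decode(self.encoding)
```

Two charstrs from differently-encoded source files would have different `.bytes` but compare equal once promoted, which is the whole point of carrying the encoding along.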
In all the contexts where we have obvious encoding information, we can then generate
a charstr instead of a str. E.g., if the source of module_a has
# -*- coding: latin1 -*-
cs = 'über-cool'
type(cs) # => <type 'charstr'>
cs.bytes # => '\xfcber-cool'
cs.encoding # => 'latin-1'
and print cs would act like print cs.bytes.decode(cs.encoding) -- or I guess with
whatever further encoding the output stream requires, plus the newline of the print.
Now if module_b has
# -*- coding: utf8 -*-
cs = 'über-cool'
and we interactively
import module_a, module_b
print module_a.cs + ' =?= ' + module_b.cs
what could happen ideally vs. what we have currently?
UIAM, currently we would just get the concatenation of
the three str byte sequences concatenated to make
'\xfcber-cool =?= \xc3\xbcber-cool'
and that would be printed as whatever that comes out as
without conversion when seen by the output, according to the console's encoding.
But if those cs instances had been charstr instances, the coding cookie
encoding information would have been preserved, and the interactive print could
have evaluated the string expression -- given cs.decode() as sugar for
cs.bytes.decode(cs.encoding or globals().get('__encoding__') or
sys.getdefaultencoding()) -- as
module_a.cs.decode() + ' =?= '.decode() + module_b.cs.decode()
if pairwise terms differ in encoding, as they all might here. If the interactive
session source were e.g. latin-1, like module_a, then
module_a.cs + ' =?= '
would not require an encoding change, because the ' =?= ' would be a charstr instance
with encoding == 'latin-1', and so the result would still be latin-1 that far.
But with module_b.cs being utf8, the next addition would cause the .decode() promotions
to unicode. In a console window, the ' =?= '.encoding might be 'cp437' or such, and
the first addition would then cause promotion (since module_a.cs.encoding != 'cp437').
I have sneaked in run-time access to individual modules' encodings by assuming that
the encoding cookie could be compiled in as an explicit global __encoding__ variable
for any given module (what to have as __encoding__ for built-in modules could vary,
and is left open here).
ISTM this could have use in situations where an encoding assumption is necessary and
currently 'ascii' is not as good a guess as one could make, though I suspect if string
literals became charstr strings instead of str strings, many if not most of those situations
would disappear (I'm saying this because ATM I can't think of an 'ascii'-guess situation that
wouldn't go away ;-)). If there were a charchr() version of chr() that would result in
a charstr instead of a str, IWT one would want an easy-sugar default encoding assumption,
probably based on the same as one would assume for '%c' % num in a given module source
-- which presumably would be '%c'.encoding, where '%c' assumes the encoding of the module
source, normally recorded in __encoding__. So charchr(n) would act like
chr(n).decode().encode(''.encoding) -- or more reasonably charstr(chr(n)), which would be
charstr(chr(n), globals().get('__encoding__') or __import__('sys').getdefaultencoding())
Or some efficient equivalent ;-)
Using strings in dicts requires hashing to find key comparison candidates and comparison to
check for key equivalence. This would seem to point to some kind of normalized hashing, but
not necessarily normalized key representation. Some is apparently happening, since
>>> hash('a') == hash(unicode('a'))
True
I don't know what would be worth the trouble to optimize string key usage where strings are
really all of one encoding vs totally general use vs a heavily biased mix. Or even if it could
be done without unreasonable complexity. Maybe a dict could be given an option to hash all
its keys as unicode vs whatever it does now. But having a charstr subtype of str would improve
the "implicit" conversions to unicode IMO.
Anyway, I wanted to throw in my .02USD re the implicit conversions, taking the view that
much of the implicitness could be based on reliable inferences from source encodings of
string literals or from their effects as format strings.
[not a normal subscriber to python-dev, so I'll have to google for any responses]
I'd like to start the subversion switchover this coming Wednesday,
with a total commit freeze at 16:00 GMT. If you have larger changes
to commit that you would like to commit before the switchover, but
after that date, please let me know.
At that point, I will set the repository to read-only (through a
commitinfo hook), and request that SF rolls a tarfile. I will then
notify you when the Subversion repository is online.
If you have sandboxes with modifications, it might be good to
cvs diff -u them now. I plan to keep the CVS up for a short while
after the switchover (about a month); after that point, you will
need to get the CVS tarball and retarget your sandbox to perform
any further work.
I'm not aware of a procedure to convert a CVS sandbox into an SVN
one, so you will have to recheckout all your sandboxes after the
switchover.
The StreamHandler available under the logging package is currently
catching all exceptions under the emit() method call. In the
Handler.handleError() documentation it's mentioned that it's
implemented like that because users do not care about errors
in the logging system.
I'd like to apply the following patch:
--- Lib/logging/__init__.py (revision 41357)
+++ Lib/logging/__init__.py (working copy)
@@ -738,6 +738,8 @@
self.stream.write(fs % msg.encode("UTF-8"))
+ except KeyboardInterrupt:
+     raise
Anyone against the change?
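The pattern in question, sketched standalone (hypothetical names, not the actual logging code): re-raise KeyboardInterrupt first, and only swallow everything else:

```python
def emit_safely(write, record):
    """Call write(record); let KeyboardInterrupt propagate, but swallow
    any other exception, since logging errors shouldn't crash the app."""
    try:
        write(record)
    except KeyboardInterrupt:
        raise
    except Exception:
        pass  # real code would call handleError(record) here
```

Without the first clause, a bare `except:` in emit() eats Ctrl-C too, which is exactly the behavior the patch fixes.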
After a few hours of tedious and frustrating hacking I've managed to
separate the Python abstract syntax tree parser from the rest of Python
itself. This could be useful for people who may wish to build Python
tools without Python, or tools in C/C++.
In the process of doing this, I came across a comment mentioning that
it would be desirable to separate the parser. Is there any interest in
doing this? I now have a vague idea about how to do this. Of course,
there is no point in making changes like this unless there is some interest.
I will make my ugly hack available once I have polished it a little bit
more. It involved hacking header files to provide a "re-implementation"
of the pieces of Python that the parser needs (PyObject, PyString, and
PyInt). It likely is a bit buggy, and it doesn't support all the types
(notably, it is missing support for Unicode, Longs, Floats, and
Complex), but it works well enough to get the AST from a few simple
strings, which is what I wanted.