Mailman 3 December 2000 - Python-Dev

XML runtime errors?
by Fredrik Lundh 06 Mar '01

stoopid question: why the heck is xmllib using
"RuntimeError" to flag XML syntax errors?

    raise RuntimeError, 'Syntax error at line %d: %s' % (self.lineno, message)

what's wrong with "SyntaxError"?

</F>

[PEP 223] Change the Meaning of \x Escapes
by Tim Peters 16 Feb '01

An HTML version of the attached can be viewed at

    http://python.sourceforge.net/peps/pep-0223.html

This will be adopted for 2.0 unless there's an uproar. Note that it
*does* have potential for breaking existing code -- although no
real-life instance of incompatibility has yet been reported. This is
explained in detail in the PEP; check your code now.

although-if-i-were-you-i-wouldn't-bother<0.5-wink>-ly y'rs - tim


PEP: 223
Title: Change the Meaning of \x Escapes
Version: $Revision: 1.4 $
Author: tpeters(a)beopen.com (Tim Peters)
Status: Active
Type: Standards Track
Python-Version: 2.0
Created: 20-Aug-2000
Post-History: 23-Aug-2000


Abstract

    Change \x escapes, in both 8-bit and Unicode strings, to consume
    exactly the two hex digits following. The proposal views this as
    correcting an original design flaw, leading to clearer expression
    in all flavors of string, a cleaner Unicode story, better
    compatibility with Perl regular expressions, and with minimal risk
    to existing code.


Syntax

    The syntax of \x escapes, in all flavors of non-raw strings, becomes

        \xhh

    where h is a hex digit (0-9, a-f, A-F). The exact syntax in 1.5.2
    is not clearly specified in the Reference Manual; it says

        \xhh...

    implying "two or more" hex digits, but one-digit forms are also
    accepted by the 1.5.2 compiler, and a plain \x is "expanded" to
    itself (i.e., a backslash followed by the letter x). It's unclear
    whether the Reference Manual intended either of the 1-digit or
    0-digit behaviors.


Semantics

    In an 8-bit non-raw string,

        \xij

    expands to the character

        chr(int(ij, 16))

    Note that this is the same as in 1.6 and before.

    In a Unicode string,

        \xij

    acts the same as

        \u00ij

    i.e. it expands to the obvious Latin-1 character from the initial
    segment of the Unicode space.

    An \x not followed by at least two hex digits is a compile-time
    error, specifically ValueError in 8-bit strings, and UnicodeError
    (a subclass of ValueError) in Unicode strings.
    Note that if an \x is followed by more than two hex digits, only
    the first two are "consumed". In 1.6 and before all but the *last*
    two were silently ignored.


Example

    In 1.5.2:

        >>> "\x123465"  # same as "\x65"
        'e'
        >>> "\x65"
        'e'
        >>> "\x1"
        '\001'
        >>> "\x\x"
        '\\x\\x'
        >>>

    In 2.0:

        >>> "\x123465"  # \x12 -> \022, "3465" left alone
        '\0223465'
        >>> "\x65"
        'e'
        >>> "\x1"
        [ValueError is raised]
        >>> "\x\x"
        [ValueError is raised]
        >>>


History and Rationale

    \x escapes were introduced in C as a way to specify variable-width
    character encodings. Exactly which encodings those were, and how
    many hex digits they required, was left up to each implementation.
    The language simply stated that \x "consumed" *all* hex digits
    following, and left the meaning up to each implementation. So, in
    effect, \x in C is a standard hook to supply platform-defined
    behavior.

    Because Python explicitly aims at platform independence, the \x
    escape in Python (up to and including 1.6) has been treated the
    same way across all platforms: all *except* the last two hex digits
    were silently ignored. So the only actual use for \x escapes in
    Python was to specify a single byte using hex notation.

    Larry Wall appears to have realized that this was the only real use
    for \x escapes in a platform-independent language, as the proposed
    rule for Python 2.0 is in fact what Perl has done from the start
    (although you need to run in Perl -w mode to get warned about \x
    escapes with fewer than 2 hex digits following -- it's clearly more
    Pythonic to insist on 2 all the time).

    When Unicode strings were introduced to Python, \x was generalized
    so as to ignore all but the last *four* hex digits in Unicode
    strings.
    This caused a technical difficulty for the new regular expression
    engine: SRE tries very hard to allow mixing 8-bit and Unicode
    patterns and strings in intuitive ways, and it no longer had any
    way to guess what, for example, r"\x123456" should mean as a
    pattern: is it asking to match the 8-bit character \x56 or the
    Unicode character \u3456?

    There are hacky ways to guess, but it doesn't end there. The ISO
    C99 standard also introduces 8-digit \U12345678 escapes to cover
    the entire ISO 10646 character space, and it's also desired that
    Python 2 support that from the start. But then what are \x escapes
    supposed to mean? Do they ignore all but the last *eight* hex
    digits then? And if less than 8 following in a Unicode string, all
    but the last 4? And if less than 4, all but the last 2?

    This was getting messier by the minute, and the proposal cuts the
    Gordian knot by making \x simpler instead of more complicated.

    Note that the 4-digit generalization to \xijkl in Unicode strings
    was also redundant, because it meant exactly the same thing as
    \uijkl in Unicode strings. It's more Pythonic to have just one
    obvious way to specify a Unicode character via hex notation.


Development and Discussion

    The proposal was worked out among Guido van Rossum, Fredrik Lundh
    and Tim Peters in email. It was subsequently explained and
    discussed on Python-Dev under subject "Go \x yourself", starting
    2000-08-03. Response was overwhelmingly positive; no objections
    were raised.


Backward Compatibility

    Changing the meaning of \x escapes does carry risk of breaking
    existing code, although no instances of incompatibility have yet
    been discovered. The risk is believed to be minimal.

    Tim Peters verified that, except for pieces of the standard test
    suite deliberately provoking end cases, there are no instances of
    \xabcdef... with fewer or more than 2 hex digits following, in
    either the Python CVS development tree, or in assorted Python
    packages sitting on his machine.
    It's unlikely there are any with fewer than 2, because the
    Reference Manual implied they weren't legal (although this is
    debatable!). If there are any with more than 2, Guido is ready to
    argue they were buggy anyway <0.9 wink>.

    Guido reported that the O'Reilly Python books *already* document
    that Python works the proposed way, likely due to their Perl
    editing heritage (as above, Perl worked (very close to) the
    proposed way from its start).

    Finn Bock reported that what JPython does with \x escapes is
    unpredictable today. This proposal gives a clear meaning that can
    be consistently and easily implemented across all Python
    implementations.


Effects on Other Tools

    Believed to be none. The candidates for breakage would mostly be
    parsing tools, but the author knows of none that worry about the
    internal structure of Python strings beyond the approximation "when
    there's a backslash, swallow the next character". Tim Peters
    checked python-mode.el, the std tokenize.py and pyclbr.py, and the
    IDLE syntax coloring subsystem, and believes there's no need to
    change any of them. Tools like tabnanny.py and checkappend.py
    inherit their immunity from tokenize.py.


Reference Implementation

    The code changes are so simple that a separate patch will not be
    produced. Fredrik Lundh is writing the code, is an expert in the
    area, and will simply check the changes in before 2.0b1 is
    released.


BDFL Pronouncements

    Yes, ValueError, not SyntaxError. "Problems with literal
    interpretations traditionally raise 'runtime' exceptions rather
    than syntax errors."


Copyright

    This document has been placed in the public domain.
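
The 2.0 rule from the Semantics section (exactly two hex digits, otherwise ValueError) can be sketched in a few lines. This is only an illustration written in present-day Python 3, not the code that was checked in for 2.0: the helper name expand_x_escapes is hypothetical, it handles \x escapes only, and it copies every other character (including other backslash escapes) through untouched.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def expand_x_escapes(raw):
    """Expand \\xhh escapes in raw text using the PEP 223 rule.

    Hypothetical helper for illustration only: it looks at \\x escapes
    and nothing else, so it is not a full string-literal parser.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw.startswith("\\x", i):
            # Exactly two hex digits must follow; anything else is an error.
            digits = raw[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("invalid \\x escape at position %d" % i)
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


# Only "12" is consumed; the trailing "3465" is ordinary text.
expanded = expand_x_escapes(r"\x123465")
assert expanded == chr(0x12) + "3465"
```

Passing r"\x1" or r"\x\x" to this sketch raises ValueError, matching the two error cases in the 2.0 example above.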

Re: [Python-checkins] CVS: python/dist/src/Include pyport.h,2.20,2.21
by Gregor Hoffleit 29 Jan '01

FYI: This misdefinition with LONG_BIT was due to a bug in glibc's
limits.h. It has been fixed in glibc 2.96.

    Gregor

On Wed, Oct 04, 2000 at 06:42:32PM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Include
> In directory slayer.i.sourceforge.net:/tmp/cvs-serv5758/python/dist/src/Include
>
> Modified Files:
>     pyport.h
> Log Message:
> Move LONG_BIT from intobject.c to pyport.h. #error if it's already been
> #define'd to an unreasonable value (several recent gcc systems have
> misdefined it, causing bogus overflows in integer multiplication). Nuke
> CHAR_BIT entirely.

chomp()?
by Guido van Rossum 16 Jan '01

Someone just posted a patch to implement s.chomp() as a string method:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103029&group_id=5470

Pseudo code (for those not aware of the Perl function by that name):

    def chomp(s):
        if s[-2:] == '\r\n':
            return s[:-2]
        if s[-1:] == '\r' or s[-1:] == '\n':
            return s[:-1]
        return s

I.e. it removes a trailing \r\n, \r, or \n.

Any comments? Is this needed given that we have s.rstrip() already?

--Guido van Rossum (home page: http://www.python.org/~guido/)
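
To make the rstrip() question concrete, here is a small comparison in present-day Python 3. It reuses the pseudo code above as a plain function (the patch proposed a string method) and the sample string is made up for illustration. The point it shows: chomp() drops at most one line ending, while rstrip() with no argument drops all trailing whitespace.

```python
def chomp(s):
    # Remove at most one trailing line ending: CR LF, CR, or LF.
    if s[-2:] == '\r\n':
        return s[:-2]
    if s[-1:] == '\r' or s[-1:] == '\n':
        return s[:-1]
    return s


line = "value \t\n\n"     # hypothetical input with trailing whitespace

chomped = chomp(line)     # only the final "\n" is removed
stripped = line.rstrip()  # spaces, tabs and both newlines are removed

assert chomped == "value \t\n"
assert stripped == "value"
```

So the two only agree when the line ending is the sole trailing whitespace; code that needs to preserve trailing blanks or tabs cannot use a bare rstrip() as a stand-in.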

RE: [Python-Dev] Fwd: try...else
by Tim Peters 03 Jan '01

[Robin Becker]
> The 2.0 docs clearly state 'The optional else clause is executed when no
> exception occurs in the try clause.' This makes it sound as though it
> gets executed on the 'way out'.

Of course. That's not what the docs meant, though, and Guido is not
going to change the implementation now because that would break code
that relies on how Python has always *worked* in these cases. The way
Python works is also the way Guido intended it to work (I'm allowed to
channel him when he's on vacation <0.9 wink>).

Indeed, that's why I suggested a specific doc change. If your friend
would also be confused by that, then we still have a problem; else we
don't.
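
The quoted message doesn't show the code that caused the confusion, so the following is a guess at the kind of case involved: a try clause that is left via return. In Python the else clause runs only when control falls off the end of the try clause, which this sketch (present-day Python 3, with a made-up function name) records step by step.

```python
def run(action):
    """Record which clauses execute for a given way of leaving the try."""
    events = []
    try:
        events.append("try")
        if action == "return":
            return events          # leaves the try clause early
        if action == "raise":
            raise ValueError("boom")
    except ValueError:
        events.append("except")
    else:
        # Reached only when the try clause runs to its end.
        events.append("else")
    return events


assert run("fall through") == ["try", "else"]
assert run("return") == ["try"]
assert run("raise") == ["try", "except"]
```

No exception occurs in the "return" case, yet the else clause is skipped, which is why wording based purely on "no exception occurs" can mislead.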

curses in the core?
by Martin v. Loewis 03 Jan '01

> If curses is a core facility now, the default build should treat it
> as one.
...
> IMO ssl isn't an issue because it's not documented as being in the
> standard module set.
...
> 3. Documented as being in the core but not built in by default.
> My more general claim is that the existence of class 3 is a problem

In the case of curses, I believe there is a documentation error in the
2.0 documentation. The curses package is listed under "Generic
Operating System Services". I believe this is wrong; it should be
listed as "Unix Specific Services".

Unless I'm mistaken, the curses module is not available on the Mac and
on Windows. With that change, the curses module would then fall into
Eric's category 2 (Not documented as being in the core and not built
in by default).

That documentation change should be carried out even if curses is
autoconfigured; autoconf, too, is used only on Unix.

Regards,
Martin

P.S. The "Python Library Reference" content page does not mention the
word "core" at all, except as part of asyncore...

FAQ Horribly Out Of Date
by Moshe Zadka 02 Jan '01

Hi!

The current FAQ is horribly out of date. I think the FAQ-Wizard method
has proven itself not very efficient (for example, apparently no one
noticed until now that it's not working <0.2 wink>). Is there any hope
of putting the FAQ in Misc/, having a script which scp's it to the SF
page, and making that the official FAQ?

On a related note, what is the current status of the PSA? Is it
officially dead?

--
Moshe Zadka <sig(a)zadka.site.co.il>
This is a signature anti-virus.
Please stop the spread of signature viruses!

Re: [Patch #103002] Fix for #116285: Properly raise UnicodeErrors
by Martin von Loewis 01 Jan '01

[resent since python.org ran out of disk space]

> My only problem with it is your copyright notice. AFAIK, patches to
> the Python core cannot contain copyright notices without proper
> license information. OTOH, I don't think that these minor changes
> really warrant adding a complete license paragraph.

I'd like to get an "official" clarification on this question. Is it
the case that patches containing copyright notices are only accepted
if they are accompanied by license information?

I agree that the changes are minor. I also believe that I hold the
copyright to the changes whether I attach a notice or not (at least
according to our local copyright law).

What concerns me is that without such a notice, gencodec.py looks as
if CNRI holds the copyright to it. I'm not willing to assign the
copyright of my changes to CNRI, and I'd like to avoid the impression
of doing so.

What is even more concerning is that CNRI also holds the copyright to
the generated files, even though they are derived from information
made available by the Unicode consortium!

Regards,
Martin

Most everything is busted
by Tim Peters 01 Jan '01

Add this error to the pot:

"""
http://www.python.org/cgi-bin/moinmoin

Proxy Error

The proxy server received an invalid response from an upstream server.

The proxy server could not handle the request GET /cgi-bin/moinmoin.

Reason: Document contains no data

-------------------------------------------------------------------

Apache/1.3.9 Server at www.python.org Port 80
"""

Also, as far as I can tell:

+ news->mail for c.l.py hasn't delivered anything for well over 24
  hours.

+ No mail to Python-Dev has showed up in the archives (let alone been
  delivered) since Fri, 29 Dec 2000 16:42:44 +0200 (IST).

+ The other Python mailing lists appear equally dead.

time-for-a-new-year!-ly y'rs - tim

plz test bsddb using shared linkage
by Skip Montanaro 01 Jan '01

A bug was filed on SF contending that the default linkage for bsddb
should be shared instead of static because some Linux systems ship
multiple versions of libdb.

Would those of you who can and do build bsddb (probably only unixoids
of some variety) please give this simple test a try? Uncomment the
*shared* line in Modules/Setup.config.in, re-run configure, build
Python and then try:

    import bsddb
    db = bsddb.btopen("/tmp/dbtest.db", "c")
    db["1"] = "1"
    print db["1"]
    db.close()
    del db

If this doesn't fail for anyone I'll check the change in and close the
bug report, otherwise I'll add a(nother) comment to the bug report
that *shared* breaks bsddb for others and close the bug report.

Thx,

Skip
