stoopid question: why the heck is xmllib using
"RuntimeError" to flag XML syntax errors?
raise RuntimeError, 'Syntax error at line %d: %s' % (self.lineno, message)
what's wrong with "SyntaxError"?
An HTML version of the attached can be viewed at
This will be adopted for 2.0 unless there's an uproar. Note that it *does*
have potential for breaking existing code -- although no real-life instance
of incompatibility has yet been reported. This is explained in detail in
the PEP; check your code now.
although-if-i-were-you-i-wouldn't-bother<0.5-wink>-ly y'rs - tim
Title: Change the Meaning of \x Escapes
Version: $Revision: 1.4 $
Author: tpeters(a)beopen.com (Tim Peters)
Type: Standards Track
Change \x escapes, in both 8-bit and Unicode strings, to consume
exactly the two hex digits following. The proposal views this as
correcting an original design flaw, leading to clearer expression
in all flavors of string, a cleaner Unicode story, better
compatibility with Perl regular expressions, and with minimal risk
to existing code.
The syntax of \x escapes, in all flavors of non-raw strings, becomes

    \xhh

where h is a hex digit (0-9, a-f, A-F). The exact syntax in 1.5.2 is
not clearly specified in the Reference Manual; it says

    \xhh...

implying "two or more" hex digits, but one-digit forms are also
accepted by the 1.5.2 compiler, and a plain \x is "expanded" to
itself (i.e., a backslash followed by the letter x). It's unclear
whether the Reference Manual intended either of the 1-digit or
0-digit behaviors.
In an 8-bit non-raw string,

    \xij

expands to the character

    chr(int(ij, 16))

Note that this is the same as in 1.6 and before.

In a Unicode string,

    \xij

acts the same as

    \u00ij

i.e. it expands to the obvious Latin-1 character from the initial
segment of the Unicode space.
An \x not followed by at least two hex digits is a compile-time error,
specifically ValueError in 8-bit strings, and UnicodeError (a subclass
of ValueError) in Unicode strings. Note that if an \x is followed by
more than two hex digits, only the first two are "consumed". In 1.6
and before all but the *last* two were silently ignored.
In 1.5.2:

    >>> "\x123465"  # same as "\x65"
    'e'

In 2.0:

    >>> "\x123465"  # \x12 -> \022, "3465" left alone
    '\x123465'
    >>> "\x1"
    [ValueError is raised]
    >>> "\x"
    [ValueError is raised]
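The two-digit rule is easy to verify interactively. A small sketch (note that in current CPython the compile-time complaint surfaces as a SyntaxError, while 2.0 raised ValueError, so both are caught below):

```python
# Exactly two hex digits are consumed; anything after them is literal text.
assert "\x4142" == "A42"
assert len("\x4142") == 3

# An \x followed by fewer than two hex digits is rejected at compile time.
try:
    eval(r'"\x4"')
except (SyntaxError, ValueError):
    pass
else:
    raise AssertionError("truncated \\x escape was accepted")
```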
History and Rationale
\x escapes were introduced in C as a way to specify variable-width
character encodings. Exactly which encodings those were, and how many
hex digits they required, was left up to each implementation. The
language simply stated that \x "consumed" *all* hex digits following,
and left the meaning up to each implementation. So, in effect, \x in C
is a standard hook to supply platform-defined behavior.
Because Python explicitly aims at platform independence, the \x escape
in Python (up to and including 1.6) has been treated the same way
across all platforms: all *except* the last two hex digits were
silently ignored. So the only actual use for \x escapes in Python was
to specify a single byte using hex notation.
Larry Wall appears to have realized that this was the only real use for
\x escapes in a platform-independent language, as the proposed rule for
Python 2.0 is in fact what Perl has done from the start (although you
need to run in Perl -w mode to get warned about \x escapes with fewer
than 2 hex digits following -- it's clearly more Pythonic to insist on
2 all the time).
When Unicode strings were introduced to Python, \x was generalized so
as to ignore all but the last *four* hex digits in Unicode strings.
This caused a technical difficulty for the new regular expression
engine: SRE tries very hard to allow mixing 8-bit and Unicode patterns
and strings in intuitive ways, and it no longer had any way to guess
what, for example, r"\x123456" should mean as a pattern: is it asking
to match the 8-bit character \x56 or the Unicode character \u3456?
There are hacky ways to guess, but it doesn't end there. The ISO C99
standard also introduces 8-digit \U12345678 escapes to cover the entire
ISO 10646 character space, and it's also desired that Python 2 support
that from the start. But then what are \x escapes supposed to mean?
Do they ignore all but the last *eight* hex digits then? And if fewer
than 8 follow in a Unicode string, all but the last 4? And if fewer
than 4, all but the last 2?
This was getting messier by the minute, and the proposal cuts the
Gordian knot by making \x simpler instead of more complicated. Note
that the 4-digit generalization to \xijkl in Unicode strings was also
redundant, because it meant exactly the same thing as \uijkl in Unicode
strings. It's more Pythonic to have just one obvious way to specify a
Unicode character via hex notation.
Development and Discussion
The proposal was worked out among Guido van Rossum, Fredrik Lundh and
Tim Peters in email. It was subsequently explained and discussed on
Python-Dev under subject "Go \x yourself", starting 2000-08-03.
Response was overwhelmingly positive; no objections were raised.
Changing the meaning of \x escapes does carry risk of breaking existing
code, although no instances of incompatibility have yet been discovered.
The risk is believed to be minimal.
Tim Peters verified that, except for pieces of the standard test suite
deliberately provoking end cases, there are no instances of \xabcdef...
with fewer or more than 2 hex digits following, in either the Python
CVS development tree, or in assorted Python packages sitting on his machine.
It's unlikely there are any with fewer than 2, because the Reference
Manual implied they weren't legal (although this is debatable!). If
there are any with more than 2, Guido is ready to argue they were buggy
anyway <0.9 wink>.
Guido reported that the O'Reilly Python books *already* document that
Python works the proposed way, likely due to their Perl editing
heritage (as above, Perl worked (very close to) the proposed way from the start).
Finn Bock reported that what JPython does with \x escapes is
unpredictable today. This proposal gives a clear meaning that can be
consistently and easily implemented across all Python implementations.
Effects on Other Tools
Believed to be none. The candidates for breakage would mostly be
parsing tools, but the author knows of none that worry about the
internal structure of Python strings beyond the approximation "when
there's a backslash, swallow the next character". Tim Peters checked
python-mode.el, the std tokenize.py and pyclbr.py, and the IDLE syntax
coloring subsystem, and believes there's no need to change any of
them. Tools like tabnanny.py and checkappend.py inherit their immunity
from tokenize.py.
The code changes are so simple that a separate patch will not be produced.
Fredrik Lundh is writing the code, is an expert in the area, and will
simply check the changes in before 2.0b1 is released.
Yes, ValueError, not SyntaxError. "Problems with literal interpretations
traditionally raise 'runtime' exceptions rather than syntax errors."
This document has been placed in the public domain.
Since most Python users on Windows don't have any use for them, I trimmed
the Python 2.0b2 installer by leaving out the debug-build .lib, .pyd, .exe
and .dll files. If you want them, they're available in a separate zip
archive; read the Windows Users notes at
for info and a download link. If you don't already know *why* you might
want them, trust me: you don't want them <wink>.
they-don't-even-make-good-paperweights-ly y'rs - tim
May I have developer status on the SourceForge CVS, please? I maintain
two standard-library modules (shlex and netrc) and have been involved
with the development of several others (including Cmd, smtp, httplib, and
My only immediate plan for what to do with developer access is to add
the browser-launch capability previously discussed on this list. My
general interest is in improving the standard class library,
especially in the areas of Internet-protocol support (urllib, ftp,
telnet, pop, imap, smtp, nntplib, etc.) and mini-language toolkits and
frameworks (shlex, netrc, Cmd, ConfigParser).
If the Internet-protocol support in the library were broken out as a
development category, I would be willing to fill the patch-handler
slot for it.
<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>
See, when the GOVERNMENT spends money, it creates jobs; whereas when the money
is left in the hands of TAXPAYERS, God only knows what they do with it. Bake
it into pies, probably. Anything to avoid creating jobs.
-- Dave Barry
>I would be happy to! Although I am happy to report that I believe it
>safe - I have been very careful of this from the time I wrote it.
>What is the process? How formal should it be?
Not sure how formal it should be, but I would recommend you review uses of
strcpy and convince yourself that the source string is never longer than the
target buffer. I am not convinced. For example, in calculate_path(), char
*pythonhome is initialized from an environment variable and thus has unknown
length. Later it is used in a strcpy(prefix, pythonhome), where prefix has a
fixed length. This looks like a vulnerability that could be closed by using
strncpy(prefix, pythonhome, MAXPATHLEN).
The Unix version of this code had three or four vulnerabilities of this
sort. So I imagine the Windows version has those too. I was imagining that
the registry offered a whole new opportunity to provide unexpectedly long
strings that could overflow buffers.
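A minimal sketch of the kind of fix being suggested, assuming a MAXPATHLEN of 256 for illustration (the real value is platform-defined). The subtlety worth a comment is that strncpy alone does not NUL-terminate on truncation:

```c
#include <string.h>

#define MAXPATHLEN 256  /* illustrative; the real value is platform-defined */

/* Copy an untrusted, possibly overlong string (e.g. the result of
 * getenv("PYTHONHOME")) into a fixed-size buffer without overflowing it.
 * strncpy() pads but does not NUL-terminate when the source is too long,
 * so the terminator is written explicitly. */
static void safe_path_copy(char prefix[MAXPATHLEN], const char *pythonhome)
{
    strncpy(prefix, pythonhome, MAXPATHLEN - 1);
    prefix[MAXPATHLEN - 1] = '\0';
}
```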
I'd like some feedback on a patch assigned to me. It is designed to
prevent Python extensions built for an earlier version of Python from
crashing the new version.
I haven't actually tested the patch, but I am sure it works as advertised
(who is db31 anyway?).
My question relates more to the "style" - the patch locates the new .pyd's
address in memory, and parses through the MS PE/COFF format, locating the
import table. It then scans the import table looking for Pythonxx.dll, and
compares any found entries with the current version.
Quite clever - a definite plus is that it should work for all old and
future versions (of Python - dunno about Windows ;-) - but do we want this
sort of code in Python? Is this sort of hack, however clever, going to
come back and bite us?
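For what it's worth, the first step of such a check (locating the PE signature that precedes the import table) is simple enough to sketch in Python; the offsets come from the published PE/COFF layout, and the actual import-table walk is omitted:

```python
import struct

def pe_signature_offset(data: bytes) -> int:
    """Offset of the 'PE\\0\\0' signature in a PE/COFF image, or -1."""
    if data[:2] != b"MZ":                               # IMAGE_DOS_HEADER.e_magic
        return -1
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # IMAGE_DOS_HEADER.e_lfanew
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return -1
    return e_lfanew
```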
Second related question: if people like it, is this feature something we
can squeeze in for 2.0?
If there are no objections to any of this, I am happy to test it and check
it in - but am not confident of doing so without some feedback.
> Unfortunately, I can't see what "encoding" I should use if I want
> to read & write Unicode string objects to it. ;( (Marc-Andre,
> please tell me I've missed something!)
It depends on the output you want to have. One option would be
Then, s.write(u'\251') prints a string in Python quoting notation.
Plain print, however, won't work, since print *first* tries to convert the argument to a
string, and then prints the string onto the stream.
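A sketch of the sort of wrapper being discussed, in modern spelling, using codecs.getwriter; the BytesIO here is just a stand-in for a byte-oriented sys.stdout:

```python
import codecs
import io

raw = io.BytesIO()  # stand-in for a byte-oriented stdout
s = codecs.getwriter("unicode_escape")(raw)

s.write("\xa9")  # u'\251', the copyright sign
# The writer encodes on the way through, emitting Python quoting notation.
assert raw.getvalue() == b"\\xa9"
```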
> On the other hand, it's annoying that I can't create a file-object
> that takes Unicode strings from "print", and doesn't seem intuitive.
Since you are asking for a hack :-) How about having an additional
letter of 'u' in the "mode" attribute of a file object?
Then, print would do something like:

    if type(string) == UnicodeType:
        if 'u' in stream.mode:
            stream.write(string)
        else:
            stream.write(string.encode())

The stream readers and writers would then need to have a mode of 'ru'
or 'wu', respectively.
Any other protocol to signal unicode-awareness in a stream might do as well.
P.S. Is there some function to retrieve the UCN names from ucnhash.c?
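On the P.S.: in later releases the standard unicodedata module exposes exactly this lookup, in both directions:

```python
import unicodedata

# Character to name, and name back to character.
assert unicodedata.name("\xa9") == "COPYRIGHT SIGN"
assert unicodedata.lookup("COPYRIGHT SIGN") == "\xa9"
```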
Jeremy was just playing with the xml.sax package, and decided to
print the string returned from parsing "©" (the copyright
symbol). Sure enough, he got a traceback:
>>> print u'\251'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
and asked me about it. I was a little surprised myself. First, that
anyone would use "print" in a SAX handler to start with, and second,
that it was so painful.
Now, I can chalk this up to not using a reasonable stdout that
understands that Unicode needs to be translated to Latin-1 given my
font selection. So I looked at the codecs module to provide a usable
output stream. The EncodedFile class provides a nice wrapper around
another file object, and supports encoding both ways.
Unfortunately, I can't see what "encoding" I should use if I want to
read & write Unicode string objects to it. ;( (Marc-Andre, please
tell me I've missed something!) I also don't think I
can use it with "print", extended or otherwise.
The PRINT_ITEM opcode calls PyFile_WriteObject() with whatever it
gets, so that's fine. Then it converts the object using
PyObject_Str() or PyObject_Repr(). For Unicode objects, the tp_str
handler attempts conversion to the default encoding ("ascii" in this
case), and raises the traceback we see above.
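That failing conversion can be reproduced directly by asking the ascii codec to encode a non-ASCII character (modern Python raises UnicodeEncodeError, a UnicodeError subclass):

```python
try:
    "\xa9".encode("ascii")  # the same conversion the tp_str handler attempts
except UnicodeError as exc:
    # The familiar complaint from the traceback above.
    assert "ordinal not in range(128)" in str(exc)
else:
    raise AssertionError("ascii codec unexpectedly accepted \\xa9")
```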
Perhaps a little extra work is needed in PyFile_WriteObject() to
allow Unicode objects to pass through if the file is merely file-like,
and let the next layer handle the conversion? This would probably
break code, and therefore not be acceptable.
On the other hand, it's annoying that I can't create a file-object
that takes Unicode strings from "print", and doesn't seem intuitive.
Fred L. Drake, Jr. <fdrake at beopen.com>
BeOpen PythonLabs Team Member
I was playing with a different SourceForge project and I screwed up my
CVSROOT (used Python's instead). Sorry, sorry!
How do I undo this cleanly? I could 'cvs remove' the README.txt file but that
would still leave the top-level 'black/' turd right? Do the SourceForge admin
guys have to manually kill the 'black' directory in the repository?
On Wed, Sep 27, 2000 at 12:06:06AM -0700, Trent Mick wrote:
> Update of /cvsroot/python/black
> In directory slayer.i.sourceforge.net:/tmp/cvs-serv20977
> Log Message:
> first import into CVS
> Vendor Tag: vendor
> Release Tags: start
> N black/README.txt
> No conflicts created by this import
> ***** Bogus filespec: -
> ***** Bogus filespec: Imported
> ***** Bogus filespec: sources