For MarkH, Guido and the Windows experienced:
I was reading Jeffrey Richter's "Advanced Windows" last night, trying to
better understand why PyObject_NEW is implemented differently for Windows.
Again, I feel uncomfortable with this, especially now that I'm dealing with
the memory aspects of Python's object constructors/destructors.
Some time ago, Guido elaborated on why PyObject_NEW uses malloc() on the
user's side, before calling _PyObject_New (on Windows, cf. objimpl.h):
[Guido]
> I can explain the MS_COREDLL business:
>
> This is defined on Windows because the core is in a DLL. Since the
> caller may be in another DLL, and each DLL (potentially) has a
> different default allocator, and (in pre-Vladimir times) the
> type-specific deallocator typically calls free(), we (Mark & I)
> decided that the allocation should be done in the type-specific
> allocator. We changed the PyObject_NEW() macro to call malloc() and
> pass that into _PyObject_New() as a second argument.
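(For reference, the split looks roughly like this -- a from-memory sketch
of the idea, not the literal objimpl.h text:)

    #ifdef MS_COREDLL
    /* the malloc() is expanded in the caller's DLL; the core
       (_PyObject_New) only initializes the passed-in memory */
    #define PyObject_NEW(type, typeobj) \
        ((type *) _PyObject_New((typeobj), \
                    (PyObject *) malloc(_PyObject_SIZE(typeobj))))
    #else
    /* the core DLL both allocates and initializes */
    #define PyObject_NEW(type, typeobj) ((type *) _PyObject_New(typeobj))
    #endif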
While I agree with this, from reading chapters 5-9 of (a French copy of)
the book (chapter titles translated back to English here):
5. Win32 Memory Architecture
6. Exploring Virtual Memory
7. Using Virtual Memory in Your Applications
8. Memory Mapped Files
9. Heaps
I can't find anything radically Windows-specific about memory management.
On Windows, as on the other OSes, the (virtual & physical) memory
allocated for a process is common and seems to be accessible from all
DLLs involved in an executable.
Things like page sharing, copy-on-write, private process mem, etc. are
conceptually all the same on Windows and Unix.
Now, the backwards binary compatibility argument aside (assuming that
extensions get recompiled when a new Python version comes out),
my concern is that with the introduction of PyObject_NEW *and* PyObject_DEL,
there's no point in having separate implementations for Windows and Unix
any more (or I'm really missing something and I fail to see what it is).
User objects would be allocated *and* freed by the core DLL (at least
the object headers). Even if several DLLs use different allocators, this
shouldn't be a problem if what's obtained via PyObject_NEW is freed via
PyObject_DEL. This Python memory would be allocated from the Python core
DLL's regions/pages/heaps. And I believe that the memory allocated by the
core DLL is accessible from the other DLLs of the process.
(I haven't seen evidence to the contrary, but tell me if this is not true.)
I thought that maybe Windows malloc() uses different heaps for the different
DLLs, but that's fine too, as long as the _NEW/_DEL symmetry is respected
and all heaps are accessible from all DLLs (which seems to be the case...),
but:
In the beginning of Chapter 9, Heaps, I read the following:
"""
...About Win32 heaps (compared to Win16 heaps)...
* There is only one kind of heap (it doesn't have any particular name,
like "local" or "global" on Win16, because it's unique)
* Heaps are always local to a process. The contents of a process heap are
not accessible from the threads of another process. A large number of
Win16 applications use the global heap as a way of sharing data between
processes; this change in the Win32 heaps is often a source of problems
for porting Win16 applications to Win32.
* One process can create several heaps in its addressing space and can
manipulate them all.
* A DLL does not have its own heap. It uses the heaps as part of the
addressing space of the process. However, a DLL can create a heap in
the addressing space of a process and reserve it for its own use.
Since several 16-bit DLLs share data between processes by using the
local heap of a DLL, this change is a source of problems when porting
Win16 apps to Win32...
"""
This last paragraph confuses me. On one hand, it's stated that all heaps
can be manipulated by the process, and OTOH, a DLL can reserve a heap for
its own use within that process (implying the heap is r/w protected from
the other DLLs?!). The rest of this chapter does not explain how this
"private reservation" is or can be done, so some of you would probably
want to chime in and explain this to me.
Going back to PyObject_NEW, if it turns out that all heaps are accessible
from all DLLs involved in the process, I would probably lobby for unifying
the implementation of _PyObject_NEW/_New and _PyObject_DEL/_Del for Windows
and Unix.
Actually, on Windows, object allocation does not go through a central
Python core memory allocator. Therefore, with the patches I'm working on,
changing the core allocator would take real effect only on platforms
other than Windows.
Next, if it's possible to unify the implementation, it would also be
possible to expose and make official in the C API a new function set:
PyObject_New() and PyObject_Del() (without leading underscores).
For now, due to the implementation difference on Windows, we're forced to
use the macro versions PyObject_NEW/DEL.
So, please tell me what would go wrong on Windows if a) & b) & c)
(a sketch follows the list):
a) we have PyObject_New(), PyObject_Del()
b) their implementation is platform independent (no MS_COREDLL diffs,
we retain the non-Windows variant)
c) they're both used systematically for all object types
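To make this concrete, here's a sketch of what the unified, core-exported
pair could look like (illustrative bodies only; error checking and debug
hooks elided):

    PyObject *
    PyObject_New(PyTypeObject *tp)
    {
        /* malloc() executes inside the core DLL, so the free() in
           PyObject_Del() is guaranteed to use the same allocator,
           whichever DLL the caller lives in */
        PyObject *op = (PyObject *) malloc(_PyObject_SIZE(tp));
        if (op == NULL)
            return PyErr_NoMemory();
        op->ob_type = tp;
        _Py_NewReference(op);
        return op;
    }

    void
    PyObject_Del(PyObject *op)
    {
        free(op);
    }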
--
Vladimir MARANGOZOV | Vladimir.Marangozov(a)inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Recently, Moshe Zadka <moshez(a)math.huji.ac.il> said:
> Here's a reason: there shouldn't be changes we'll retract later -- we
> need to come up with the (more or less) right hierarchy the first time,
> or we'll do a lot of work for nothing.
I think I disagree here (hmm, it's probably better to say that I
agree, but I agree on a tangent:-). I think we can be 100% sure that
we're wrong the first time around, and we should plan for that.
One of the reasons why we're wrong is that the world is moving
on. A module that at this point in time resides at some level in
the hierarchy may in a few years (or sooner) be one of a large family
and be better off elsewhere in the hierarchy. It would be silly if it
had to stay where it was because of backward compatibility.
If we plan for being wrong we can make the mistakes less painful. I
think that a simple scheme where a module can say "I'm expecting the
Python 1.6 namespace layout" would make transition to a completely
different Python 1.7 namespace layout a lot less painful, because some
agent could do the mapping. This can either happen at runtime (through
a namespace, or through an import hook, or probably through other
tricks as well) or optionally by a script that would do the
translations.
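As a quick sketch of the runtime variant (the map entries are made up,
of course):

    import __builtin__

    _real_import = __builtin__.__import__
    _16_TO_17_MAP = {'htmllib': 'web.htmllib'}   # imaginary entries

    def _mapping_import(name, globals=None, locals=None, fromlist=None):
        # A real agent would also handle the package-vs-module return
        # value for dotted targets; this only shows the idea.
        return _real_import(_16_TO_17_MAP.get(name, name),
                            globals, locals, fromlist)

    __builtin__.__import__ = _mapping_import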
Of course this doesn't mean we should go off and hack in a couple of
namespaces (hence my "agreeing on a tangent"), but it does mean that I
think Greg's idea of not wanting to change everything at once has
merit.
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen(a)oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
String objects have grown methods since 1.5.2. So it makes sense to
provide a class 'UserString' similar to 'UserList' and 'UserDict', so
that there is a standard base class to inherit from, if someone has the
desire to extend the string methods. What do you think?
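A minimal sketch of what I have in mind, following the UserList/UserDict
pattern (the method list is obviously incomplete):

    class UserString:
        def __init__(self, string=""):
            self.data = str(string)
        def __str__(self):
            return self.data
        def __repr__(self):
            return repr(self.data)
        def __len__(self):
            return len(self.data)
        def __cmp__(self, other):
            if isinstance(other, UserString):
                return cmp(self.data, other.data)
            return cmp(self.data, str(other))
        def __getitem__(self, index):
            return self.__class__(self.data[index])
        def __getslice__(self, i, j):
            return self.__class__(self.data[max(0, i):max(0, j)])
        def __add__(self, other):
            if isinstance(other, UserString):
                return self.__class__(self.data + other.data)
            return self.__class__(self.data + str(other))
        # each string method simply wraps the one on self.data, e.g.:
        def upper(self):
            return self.__class__(self.data.upper())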
Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)
I've written up a list of things that need to get done before 1.6 is
finished. This is my vision of what needs to be done, and doesn't
have an official stamp of approval from GvR or anyone else. So it's
very probably wrong.
http://starship.python.net/crew/amk/python/1.6-jobs.html
Here's the list formatted as text. The major outstanding things at
the moment seem to be sre and Distutils; once they go in, you could
probably release an alpha, because the other items are relatively
minor.
Still to do
* XXX Revamped import hooks (or is this a post-1.6 thing?)
* Update the documentation to match 1.6 changes.
* Document more undocumented modules
* Unicode: Add Unicode support for open() on Windows
* Unicode: Compress the size of unicodedatabase
* Unicode: Write \N{SMILEY} codec for Unicode
* Unicode: the various XXX items in Misc/unicode.txt
* Add module: Distutils
* Add module: Jim Ahlstrom's zipfile.py
* Add module: PyExpat interface
* Add module: mmapfile
* Add module: sre
* Drop cursesmodule and package it separately. (Any other obsolete
modules that should go?)
* Delete obsolete subdirectories in Demo/ directory
* Refurbish Demo subdirectories to be properly documented, match
modern coding style, etc.
* Support Unicode strings in PyExpat interface
* Fix ./ld_so_aix installation problem on AIX
* Make test.regrtest.py more usable outside of the Python test suite
* Conservative garbage collection of cycles (maybe?)
* Write friendly "What's New in 1.6" document/article
Done
Nothing at the moment.
After 1.7
* Rich comparisons
* Revised coercions
* Parallel for loop (for i in L; j in M: ...)
* Extended slicing for all sequences.
* GvR: "I've also been thinking about making classes be types (not
as huge a change as you think, if you don't allow subclassing
built-in types), and adding a built-in array type suitable for use
by NumPy."
--amk
Are there any objections to including
try:
    from cPickle import *
except ImportError:
    pass
in pickle and
try:
    from cStringIO import *
except ImportError:
    pass
in StringIO?
-- ?!ng
"I'm not trying not to answer the question; i'm just not answering it."
-- Lenore Snell
Is there any reason to keep two separate modules with simple formatting
functions? I think pprint is somewhat more sophisticated, but in the
worst case we can just dump them both in the same file (the only catch
being that pprint would then export "repr", in addition to "saferepr",
among others).
(Just bumped into this in my reorg suggestion)
--
Moshe Zadka <mzadka(a)geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com
Hey... just thought I'd drop off a description of the "formal" mechanism
that the ASF uses for voting since it has been seen here and there on this
group :-)
+1 "I'm all for it. Do it!"
+0 "Seems cool and acceptable, but I can also live without it"
-0 "Not sure this is the best thing to do, but I'm not against it."
-1 "Veto. And <HERE> is my reasoning."
Strictly speaking, there is no vetoing here, other than by Guido. For
changes to Apache (as opposed to bug fixes), it depends on where the
development is. Early stages, it is reasonably open and people work
straight against CVS (except for really big design changes). Late stage,
it requires three +1 votes during discussion of a patch before it goes in.
Here on python-dev, it would seem that the votes are a good way to quickly
let Guido know people's feelings about topic X or Y.
On the patches mailing list, the voting could actually be quite a useful
measure for the people with CVS commit access. If a patch gets -1, then
its commit should wait until reason X has been resolved. Note that it can
be resolved in two ways: the person lifts their veto (after some amount of
persuasion or explanation), or the patch is updated to address the
concerns (well, unless the veto is against the concept of the patch
entirely :-). If a patch gets a few +1 votes, then it can probably go
straight in. Note that the Apache guys sometimes say things like "+1 on
concept" meaning they like the idea, but haven't reviewed the code.
Do we formalize on using these? Not really suggesting that. But if I
(and others) drop these things into mail notes, then we may as well have a
description of just what the heck is going on :-)
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
MAL wrote:
> "Andrew M. Kuchling" wrote:
>>
>> Paul Prescod writes:
>>>The new \N escape interpolates named characters within strings. For
>>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
>>>unicode smiley face at the end.
>>
>> Cute idea, and it certainly means you can avoid looking up Unicode
>> numbers. (You can look up names instead. :) ) Note that this means the
>> Unicode database is no longer optional if this is done; it has to be
>> around at code-parsing time. Python could import it automatically, as
>> exceptions.py is imported. Christian's work on compressing
>> unicodedatabase.c is therefore really important. (Is Perl5.6 actually
>> dragging around the Unicode database in the binary, or is it read out
>> of some external file or data structure?)
>
> Sorry to disappoint you guys, but the Unicode name and comments
> are *not* included in the unicodedatabase.c file Christian
> is currently working on. The reason is simple: it would add
> huge amounts of string data to the file. So this is a no-no
> for the core distribution...
>
OK, now you're just being silly. It's possible to put the character names
in a separate structure so that they don't automatically get paged in with
the normal Unicode character property data. If you never use it, it won't
get paged in; it's that simple...
Looking up the Unicode code value from the Unicode character name smells
like a good time to use gperf to generate a perfect hash function for the
character names. Esp. for the Unicode 3.0 character namespace. Then you can
just store the hashkey -> Unicode character mapping, and hardly ever need to
page in the actual full character name string itself.
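The gperf input could look something like this (a sketch with made-up
entries; the real file would be generated from the Unicode database and
run through "gperf -t"):

    struct ucname { const char *name; unsigned long code; };
    %%
    LATIN SMALL LETTER A, 0x0061
    GREEK SMALL LETTER ALPHA, 0x03B1
    WHITE SMILING FACE, 0x263A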
I haven't looked at what the comment field contains, so I have no idea how
useful that info is.
*waits while gperf crunches through the ~10,550 Unicode characters where
this would be useful*
Bill
Greetings!
We're working on integrating our own memory manager into our project
and the current challenge is figuring out how to make it play nice
with Python (and SWIG). The approach we're currently taking is to
patch 1.5.2 and augment the PyMem* macros to call external memory
allocation functions that we provide. The idea is to easily allow
the addition of third party memory management facilities to Python.
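Roughly like this (the Ext_* names are ours, just to show the shape of the
patch against Include/mymalloc.h):

    /* hooks supplied by the embedding application */
    extern void *Ext_Malloc(size_t n);
    extern void *Ext_Realloc(void *p, size_t n);
    extern void Ext_Free(void *p);

    /* the PyMem* macros are redirected to the hooks, keeping the
       existing _PyMem_EXTRA padding */
    #define PyMem_NEW(type, n) \
        ( (type *) Ext_Malloc(_PyMem_EXTRA + (n) * sizeof(type)) )
    #define PyMem_RESIZE(p, type, n) \
        ( (p) = (type *) Ext_Realloc((p), _PyMem_EXTRA + (n) * sizeof(type)) )
    #define PyMem_DEL(p) Ext_Free((void *)(p))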
Assuming 1) we get it working :-), and 2) we sync to the latest Python
CVS and patch that, would this be a useful patch to give back to the
community? Has anyone run up against this before?
Thanks,
Jason Asbahr
Origin Systems, Inc.
jasbahr(a)origin.ea.com
Attached you find the latest update of the Unicode implementation.
The patch is against the current CVS version.
It includes the fix I posted yesterday for the core dump problem
in codecs.c (was introduced by my previous patch set -- sorry),
adds more tests for the codecs and two new parser markers
"es" and "es#".
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Only in CVS-Python/Doc/tools: anno-api.py
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/codecs.py Python+Unicode/Lib/codecs.py
--- CVS-Python/Lib/codecs.py Thu Mar 23 23:58:41 2000
+++ Python+Unicode/Lib/codecs.py Fri Mar 17 23:51:01 2000
@@ -46,7 +46,7 @@
handling schemes by providing the errors argument. These
string values are defined:
- 'strict' - raise an error (or a subclass)
+ 'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace' - replace with a suitable replacement character;
Python will use the official U+FFFD REPLACEMENT
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/output/test_unicode Python+Unicode/Lib/test/output/test_unicode
--- CVS-Python/Lib/test/output/test_unicode Fri Mar 24 22:21:26 2000
+++ Python+Unicode/Lib/test/output/test_unicode Sat Mar 11 00:23:21 2000
@@ -1,5 +1,4 @@
test_unicode
Testing Unicode comparisons... done.
-Testing Unicode contains method... done.
Testing Unicode formatting strings... done.
Testing unicodedata module... done.
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Lib/test/test_unicode.py Python+Unicode/Lib/test/test_unicode.py
--- CVS-Python/Lib/test/test_unicode.py Thu Mar 23 23:58:47 2000
+++ Python+Unicode/Lib/test/test_unicode.py Fri Mar 24 00:29:43 2000
@@ -293,3 +293,33 @@
assert unicodedata.combining(u'\u20e1') == 230
print 'done.'
+
+# Test builtin codecs
+print 'Testing builtin codecs...',
+
+assert unicode('hello','ascii') == u'hello'
+assert unicode('hello','utf-8') == u'hello'
+assert unicode('hello','utf8') == u'hello'
+assert unicode('hello','latin-1') == u'hello'
+
+assert u'hello'.encode('ascii') == 'hello'
+assert u'hello'.encode('utf-8') == 'hello'
+assert u'hello'.encode('utf8') == 'hello'
+assert u'hello'.encode('utf-16-le') == 'h\000e\000l\000l\000o\000'
+assert u'hello'.encode('utf-16-be') == '\000h\000e\000l\000l\000o'
+assert u'hello'.encode('latin-1') == 'hello'
+
+u = u''.join(map(unichr, range(1024)))
+for encoding in ('utf-8', 'utf-16', 'utf-16-le', 'utf-16-be',
+ 'raw_unicode_escape', 'unicode_escape', 'unicode_internal'):
+ assert unicode(u.encode(encoding),encoding) == u
+
+u = u''.join(map(unichr, range(256)))
+for encoding in ('latin-1',):
+ assert unicode(u.encode(encoding),encoding) == u
+
+u = u''.join(map(unichr, range(128)))
+for encoding in ('ascii',):
+ assert unicode(u.encode(encoding),encoding) == u
+
+print 'done.'
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Misc/unicode.txt Python+Unicode/Misc/unicode.txt
--- CVS-Python/Misc/unicode.txt Thu Mar 23 23:58:48 2000
+++ Python+Unicode/Misc/unicode.txt Fri Mar 24 22:29:35 2000
@@ -715,21 +715,126 @@
These markers are used by the PyArg_ParseTuple() APIs:
- 'U': Check for Unicode object and return a pointer to it
+ "U": Check for Unicode object and return a pointer to it
- 's': For Unicode objects: auto convert them to the <default encoding>
+ "s": For Unicode objects: auto convert them to the <default encoding>
and return a pointer to the object's <defencstr> buffer.
- 's#': Access to the Unicode object via the bf_getreadbuf buffer interface
+ "s#": Access to the Unicode object via the bf_getreadbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not the Unicode string length (this may be different
depending on the Internal Format).
- 't#': Access to the Unicode object via the bf_getcharbuf buffer interface
+ "t#": Access to the Unicode object via the bf_getcharbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not necessarily to the Unicode string length (this may
be different depending on the <default encoding>).
+ "es":
+ Takes two parameters: encoding (const char *) and
+ buffer (char **).
+
+ The input object is first coerced to Unicode in the usual way
+ and then encoded into a string using the given encoding.
+
+ On output, a buffer of the needed size is allocated and
+ returned through *buffer as NULL-terminated string.
+ The encoded string may not contain embedded NULL characters.
+ The caller is responsible for free()ing the allocated *buffer
+ after usage.
+
+ "es#":
+ Takes three parameters: encoding (const char *),
+ buffer (char **) and buffer_len (int *).
+
+ The input object is first coerced to Unicode in the usual way
+ and then encoded into a string using the given encoding.
+
+ If *buffer is non-NULL, *buffer_len must be set to the size of
+ the buffer on input. Output is then copied to *buffer.
+
+ If *buffer is NULL, a buffer of the needed size is
+ allocated and output copied into it. *buffer is then
+ updated to point to the allocated memory area. The caller
+ is responsible for free()ing *buffer after usage.
+
+ In both cases *buffer_len is updated to the number of
+ characters written (excluding the trailing NULL-byte).
+ The output buffer is assured to be NULL-terminated.
+
+Examples:
+
+Using "es#" with auto-allocation:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char *buffer = NULL;
+ int buffer_len = 0;
+
+ if (!PyArg_ParseTuple(args, "es#:test_parser",
+ encoding, &buffer, &buffer_len))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromStringAndSize(buffer, buffer_len);
+ free(buffer);
+ return str;
+ }
+
+Using "es" with auto-allocation returning a NULL-terminated string:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char *buffer = NULL;
+
+ if (!PyArg_ParseTuple(args, "es:test_parser",
+ encoding, &buffer))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromString(buffer);
+ free(buffer);
+ return str;
+ }
+
+Using "es#" with a pre-allocated buffer:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char _buffer[10];
+ char *buffer = _buffer;
+ int buffer_len = sizeof(_buffer);
+
+ if (!PyArg_ParseTuple(args, "es#:test_parser",
+ encoding, &buffer, &buffer_len))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromStringAndSize(buffer, buffer_len);
+ return str;
+ }
+
File/Stream Output:
-------------------
@@ -837,6 +942,7 @@
History of this Proposal:
-------------------------
+1.3: Added new "es" and "es#" parser markers
1.2: Removed POD about codecs.open()
1.1: Added note about comparisons and hash values. Added note about
case mapping algorithms. Changed stream codecs .read() and
Only in CVS-Python/Objects: .#stringobject.c.2.59
Only in CVS-Python/Objects: stringobject.c.orig
diff -u -rP -x *.o -x *.pyc -x Makefile -x *~ -x *.so -x add2lib -x pgen -x buildno -x config.* -x libpython* -x python -x Setup -x Setup.local -x Setup.thread -x hassignal -x Makefile.pre -x *.bak -x *.s -x DEADJOE -x Demo -x CVS CVS-Python/Python/getargs.c Python+Unicode/Python/getargs.c
--- CVS-Python/Python/getargs.c Sat Mar 11 10:55:21 2000
+++ Python+Unicode/Python/getargs.c Fri Mar 24 20:22:26 2000
@@ -178,6 +178,8 @@
}
else if (level != 0)
; /* Pass */
+ else if (c == 'e')
+ ; /* Pass */
else if (isalpha(c))
max++;
else if (c == '|')
@@ -654,6 +656,122 @@
break;
}
+ case 'e': /* encoded string */
+ {
+ char **buffer;
+ const char *encoding;
+ PyObject *u, *s;
+ int size;
+
+ /* Get 'e' parameter: the encoding name */
+ encoding = (const char *)va_arg(*p_va, const char *);
+ if (encoding == NULL)
+ return "(encoding is NULL)";
+
+ /* Get 's' parameter: the output buffer to use */
+ if (*format != 's')
+ return "(unkown parser marker combination)";
+ buffer = (char **)va_arg(*p_va, char **);
+ format++;
+ if (buffer == NULL)
+ return "(buffer is NULL)";
+
+ /* Convert object to Unicode */
+ u = PyUnicode_FromObject(arg);
+ if (u == NULL)
+ return "string, unicode or text buffer";
+
+ /* Encode object; use default error handling */
+ s = PyUnicode_AsEncodedString(u,
+ encoding,
+ NULL);
+ Py_DECREF(u);
+ if (s == NULL)
+ return "(encoding failed)";
+ if (!PyString_Check(s)) {
+ Py_DECREF(s);
+ return "(encoder failed to return a string)";
+ }
+ size = PyString_GET_SIZE(s);
+
+ /* Write output; output is guaranteed to be
+ 0-terminated */
+ if (*format == '#') {
+ /* Using buffer length parameter '#':
+
+ - if *buffer is NULL, a new buffer
+ of the needed size is allocated and
+ the data copied into it; *buffer is
+ updated to point to the new buffer;
+ the caller is responsible for
+ free()ing it after usage
+
+ - if *buffer is not NULL, the data
+ is copied to *buffer; *buffer_len
+ has to be set to the size of the
+ buffer on input; buffer overflow is
+ signalled with an error; buffer has
+ to provide enough room for the
+ encoded string plus the trailing
+ 0-byte
+
+ - in both cases, *buffer_len is
+ updated to the size of the buffer
+ /excluding/ the trailing 0-byte
+
+ */
+ int *buffer_len = va_arg(*p_va, int *);
+
+ format++;
+ if (buffer_len == NULL)
+ return "(buffer_len is NULL)";
+ if (*buffer == NULL) {
+ *buffer = PyMem_NEW(char, size + 1);
+ if (*buffer == NULL) {
+ Py_DECREF(s);
+ return "(memory error)";
+ }
+ } else {
+ if (size + 1 > *buffer_len) {
+ Py_DECREF(s);
+ return "(buffer overflow)";
+ }
+ }
+ memcpy(*buffer,
+ PyString_AS_STRING(s),
+ size + 1);
+ *buffer_len = size;
+ } else {
+ /* Using a 0-terminated buffer:
+
+ - the encoded string has to be
+ 0-terminated for this variant to
+ work; if it is not, an error is raised
+
+ - a new buffer of the needed size
+ is allocated and the data copied
+ into it; *buffer is updated to
+ point to the new buffer; the caller
+ is responsible for free()ing it
+ after usage
+
+ */
+ if (strlen(PyString_AS_STRING(s)) != size)
+ return "(encoded string without "\
+ "NULL bytes)";
+ *buffer = PyMem_NEW(char, size + 1);
+ if (*buffer == NULL) {
+ Py_DECREF(s);
+ return "(memory error)";
+ }
+ memcpy(*buffer,
+ PyString_AS_STRING(s),
+ size + 1);
+ }
+ Py_DECREF(s);
+ break;
+ }
+
case 'S': /* string object */
{
PyObject **p = va_arg(*p_va, PyObject **);