From l.mastrodomenico at gmail.com Sun Jul 1 00:18:53 2007 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Sun, 1 Jul 2007 00:18:53 +0200 Subject: [Python-3000] PEP 368: Standard image protocol and class Message-ID: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> Hi everyone, I have submitted a new PEP: http://www.python.org/dev/peps/pep-0368/ It starts from a Pete Shinners' suggestion and from the consideration that there are a lot of Python libraries that use image objects, but almost all of them have implemented their own image classes, incompatible with everyone else's (and often not very pythonic). The PEP tries to improve the situation by defining a standard image protocol: in practice this is a definition of how a minimal "image-like" object should look and act in Python. Its details are carefully chosen to allow existing image classes in Tkinter, PIL, wxPython and pygame to implement it without breaking backward compatibility with their existing user bases. It also proposes the inclusion in the standard library of a fast and efficient default implementation of the new protocol. The PEP is long and detailed, but it's not in any way meant to be a take-it-or-leave-it deal: I'm open to any change, even radical, to improve it. It isn't py3k-specific (and it has a low number), but I posted here anyway because IMHO the main question is if and how to include this in Python 3.0; then, if the PEP is accepted, I'll backport the new classes to Python 2.6. Any suggestion or criticism is welcome; I'll also solicit feedback from external libraries developers that might be interested in implementing the new protocol. 
Regards -- Lino Mastrodomenico E-mail: l.mastrodomenico at gmail.com From robert.kern at gmail.com Sun Jul 1 00:33:07 2007 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 30 Jun 2007 17:33:07 -0500 Subject: [Python-3000] PEP 368: Standard image protocol and class In-Reply-To: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> Message-ID: <f66lnb$h1q$1@sea.gmane.org> Lino Mastrodomenico wrote: > Hi everyone, > > I have submitted a new PEP: > > http://www.python.org/dev/peps/pep-0368/ > > It starts from a Pete Shinners' suggestion and from the consideration > that there are a lot of Python libraries that use image objects, but > almost all of them have implemented their own image classes, > incompatible with everyone else's (and often not very pythonic). Could you build this on top of the new buffer protocol that we're working on? http://www.python.org/dev/peps/pep-3118/ Enabling this kind of data sharing is precisely what the new buffer type is intended for. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From robert.kern at gmail.com Sun Jul 1 00:36:08 2007 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 30 Jun 2007 17:36:08 -0500 Subject: [Python-3000] PEP 368: Standard image protocol and class In-Reply-To: <f66lnb$h1q$1@sea.gmane.org> References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> <f66lnb$h1q$1@sea.gmane.org> Message-ID: <f66lt0$h1q$2@sea.gmane.org> Robert Kern wrote: > Lino Mastrodomenico wrote: >> Hi everyone, >> >> I have submitted a new PEP: >> >> http://www.python.org/dev/peps/pep-0368/ >> >> It starts from a Pete Shinners' suggestion and from the consideration >> that there are a lot of Python libraries that use image objects, but >> almost all of them have implemented their own image classes, >> incompatible with everyone else's (and often not very pythonic). > > Could you build this on top of the new buffer protocol that we're working on? > > http://www.python.org/dev/peps/pep-3118/ Never mind. I found the reference in your PEP. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco

From l.mastrodomenico at gmail.com Sun Jul 1 03:00:29 2007
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Sun, 1 Jul 2007 03:00:29 +0200
Subject: [Python-3000] PEP 368: Standard image protocol and class
In-Reply-To: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
Message-ID: <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>

Here's the full text of the PEP's current draft, so you can comment
directly on it (thanks to Collin Winter for the suggestion):

PEP: 368
Title: Standard image protocol and class
Version: $Revision: 56133 $
Last-Modified: $Date: 2007-06-30 21:07:03 +0200 (sab, 30 giu 2007) $
Author: Lino Mastrodomenico <l.mastrodomenico at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Jun-2007
Python-Version: 2.6, 3.0
Post-History:

Abstract
========

The current situation of image storage and manipulation in the Python
world is extremely fragmented: almost every library that uses image
objects has implemented its own image class, incompatible with everyone
else's and often not very pythonic. A basic RGB image class exists in
the standard library (``Tkinter.PhotoImage``), but is pretty much
unusable, and unused, for anything except Tkinter programming.

This fragmentation not only takes up valuable space in the developers'
minds, but also makes the exchange of images between different
libraries (needed in relatively common use cases) slower and more
complex than it needs to be.

This PEP proposes to improve the situation by defining a simple and
pythonic image protocol/interface that can hopefully be accepted and
implemented by existing image classes inside and outside the standard
library *without breaking backward compatibility* with their existing
user bases.
In practice this is a definition of how a minimal *image-like* object
should look and act (in a similar way to the ``read()`` and ``write()``
methods in *file-like* objects).

The inclusion in the standard library of a class that provides basic
image manipulation functionality and implements the new protocol is
also proposed, together with a mixin class that helps add support for
the protocol to existing image classes.

Rationale
=========

A good way to have high quality modules ready for inclusion in the
Python standard library is to simply wait for natural selection among
competing external libraries to provide a clear winner with useful
functionality and a big user base. Then the de-facto standard can be
officially sanctioned by including it in the standard library.

Unfortunately this approach hasn't worked well for the creation of a
dominant image class in the Python world: almost every third-party
library that requires an image object creates its own class
incompatible with the ones from other libraries. This is a real
problem because it's entirely reasonable for a program to create and
manipulate an image using, e.g., PIL (the Python Imaging Library) and
then display it using wxPython or pygame. But these libraries have
different and incompatible image classes, and the usual solution is to
manually "export" an image from the source to a (width, height,
bytes_string) tuple and "import" it creating a new instance in the
target format. This approach *works*, but is both uglier and slower
than it needs to be.

Another "solution" that has been sometimes used is the creation of
specific adapters and/or converters from a class to another (e.g. PIL
offers the ``ImageTk`` module for converting PIL images to a class
compatible with the Tkinter one).
But this approach doesn't scale well with the number of libraries
involved and it's still annoying for the user: if I have a perfectly
good image object why should I convert before passing it to the next
method, why can't it simply accept my image as-is?

The problem isn't by any stretch limited to the three mentioned
libraries and probably has multiple causes, including two that IMO are
very important to understand before solving it:

* in today's computing world an image is a basic type not strictly
  tied to a specific domain. This is why there will never be a clear
  winner between the image classes from the three libraries mentioned
  above (PIL, wxPython and pygame): they cover different domains and
  don't really compete with each other;

* the Python standard library has never provided a good image class
  that can be adopted or imitated by third-party modules.
  ``Tkinter.PhotoImage`` provides basic RGB functionality, but it's by
  far the slowest and ugliest of the bunch and it can be instantiated
  only after the Tkinter root window has been created.

This PEP tries to improve this situation in four ways:

1. It defines a simple and pythonic image protocol/interface (both on
   the Python and the C side) that can hopefully be accepted and
   implemented by existing image classes inside and outside the
   standard library *without breaking backward compatibility* with
   their existing user bases.

2. It proposes the inclusion in the standard library of three new
   classes:

   * ``ImageMixin`` provides almost everything necessary to implement
     the new protocol; its main purpose is to make it as simple as
     possible for existing libraries to support this interface, in
     some cases as simple as adding it to the list of base classes and
     doing minor additions to the constructor.

   * ``Image`` is a subclass of ``ImageMixin`` and will add a
     constructor that can resize and/or convert an image between
     different pixel formats.
     This is intended to provide a fast and efficient default
     implementation of the new protocol.

   * ``ImageSize`` is a minor helper class. See below for details.

3. ``Tkinter.PhotoImage`` will implement the new protocol (mostly
   through the ``ImageMixin`` class) and all the Tkinter methods that
   can receive an image will be modified to accept any object that
   implements the interface. As an aside the author of this PEP will
   collaborate with the developers of the most common external
   libraries to achieve the same goal (supporting the protocol in
   their classes and accepting any class that implements it).

4. New ``PyImage_*`` functions will be added to the CPython C API:
   they implement the C side of the protocol and accept as first
   parameter **any** object that supports it, even if it isn't an
   instance of the ``Image``/``ImageMixin`` classes.

The main effects for the end user will be a simplification of the
interchange of images between different libraries (if everything goes
well, any Python library will accept images from any other library) and
the out-of-the-box availability of the new ``Image`` class.

The new class is intended to cover simple but common use cases like
cropping and/or resizing a photograph to the desired size and passing
it to an appropriate widget for displaying it on a window, or darkening
a texture and passing it to a 3D library.

The ``Image`` class is not intended to replace or compete with PIL,
PythonMagick or NumPy, even if it provides a (very small) subset of the
functionality of these three libraries. In particular PIL offers very
rich image manipulation features with *dozens* of classes, filters,
transformations and file formats. The inclusion of PIL (or something
similar) in the standard library may, or may not, be a worthy goal but
it's completely outside the scope of this PEP.

Specification
=============

The ``imageop`` module is used as the *default* location for the new
classes and objects because it has for a long time hosted functions
that provided a somewhat similar functionality, but a new module may be
created if preferred (e.g. a new "``image``" or "``media``" module; the
latter may eventually include other multimedia classes).

``MODES`` is a new module level constant: it is a set of the pixel
formats supported by the ``Image`` class. Any image object that
implements the new protocol is guaranteed to be formatted in one of
these modes, but libraries that accept images are allowed to support
only a subset of them.

These modes are in turn also available as module level constants (e.g.
``imageop.RGB``).

The following table is a summary of the modes currently supported and
their properties:

========= =============== ========= =========== ======================
Name      Component       Bits per  Subsampling Valid
          names           component             intervals
========= =============== ========= =========== ======================
L         l (lowercase L) 8         no          full range
L16       l               16        no          full range
L32       l               32        no          full range
LA        l, a            8         no          full range
LA32      l, a            16        no          full range
RGB       r, g, b         8         no          full range
RGB48     r, g, b         16        no          full range
RGBA      r, g, b, a      8         no          full range
RGBA64    r, g, b, a      16        no          full range
YV12      y, cr, cb       8         1, 2, 2     16-235, 16-240, 16-240
JPEG_YV12 y, cr, cb       8         1, 2, 2     full range
CMYK      c, m, y, k      8         no          full range
CMYK64    c, m, y, k      16        no          full range
========= =============== ========= =========== ======================

When the name of a mode ends with a number, it represents the average
number of bits per pixel. All the other modes simply use a byte per
component per pixel.

No palette modes or modes with less than 8 bits per component are
supported. Welcome to the 21st century.

Here's a quick description of the modes and the rationale for their
inclusion; there are four groups of modes:

1.
   **grayscale** (``L*`` modes): they are heavily used in scientific
   computing (those people may also need a very high dynamic range and
   precision, hence ``L32``, the only mode with 32 bits per component)
   and sometimes it can be useful to consider a single component of a
   color image as a grayscale image (this is used by the individual
   planes of the planar images, see ``YV12`` below); the name of the
   component (``'l'``, lowercase letter L) stands for luminance, the
   second optional component (``'a'``) is the alpha value and
   represents the opacity of the pixels: alpha = 0 means full
   transparency, alpha = 255/65535 represents a fully opaque pixel;

2. **RGB\* modes**: the garden variety color images. The optional
   alpha component has the same meaning as in grayscale modes;

3. **YCbCr**, a.k.a. YUV (``*YV12`` modes). These modes are planar
   (i.e. the values of all the pixels for each component are stored in
   a consecutive memory area, instead of the usual arrangement where
   all the components of a pixel reside in consecutive bytes) and use
   a 1, 2, 2 (a.k.a. 4:2:0) subsampling (i.e. each pixel has its own Y
   value, but the Cb and Cr components are shared between groups of
   2x2 adjacent pixels) because this is the format that's by far the
   most common for YCbCr images. Please note that the V (Cr) plane is
   stored before the U (Cb) plane.

   ``YV12`` is commonly used for MPEG2 (including DVDs), MPEG4 (both
   ASP/DivX and AVC/H.264) and Theora video frames. Valid values for
   Y are in range(16, 236) (excluding 236), and valid values for Cb
   and Cr are in range(16, 241). ``JPEG_YV12`` is similar to
   ``YV12``, but the three components can have the full range of 256
   values. It's the native format used by almost all JPEG/JFIF files
   and by MJPEG video frames.
   The "strangeness" of these two wrt all the other supported modes
   derives from the fact that they are widely used that way by a lot
   of existing libraries and applications; this is also the reason
   why they are included (and the fact that they can't be losslessly
   converted to RGB because YCbCr is a bigger color space); the funny
   4:2:0 planar arrangement of the pixel values is relatively easy to
   support because in most cases the three planes can be considered
   three separate grayscale images;

4. **CMYK\* modes** (cyan, magenta, yellow and black) are subtractive
   color modes, used for printing color images on dead trees.
   Professional designers love to pretend that they can't live without
   them, so here they are.

Python API
----------

See the examples_ below.

In Python 2.x, all the new classes defined here are new-style classes.

Mode Objects
''''''''''''

The mode objects offer a number of attributes and methods that can be
used for implementing generic algorithms that work on different types
of images:

``components``
    The number of components per pixel (e.g. 4 for an RGBA image).

``component_names``
    A tuple of strings; see the column "Component names" in the above
    table.

``bits_per_component``
    8, 16 or 32; see "Bits per component" in the above table.

``bytes_per_pixel``
    ``components * bits_per_component // 8``, only available for non
    planar modes (see below).

``planar``
    Boolean; ``True`` if the image components reside each in a
    separate plane. Currently this happens if and only if the mode
    uses subsampling.

``subsampling``
    A tuple that for each component in the mode contains a tuple of
    two integers that represent the amount of downsampling in the
    horizontal and vertical direction, respectively. In practice it's
    ``((1, 1), (2, 2), (2, 2))`` for ``YV12`` and ``JPEG_YV12`` and
    ``((1, 1),) * components`` for everything else.

``x_divisor``
    ``max(x for x, y in subsampling)``; the width of an image that
    uses this mode must be divisible by this value.

``y_divisor``
    ``max(y for x, y in subsampling)``; the height of an image that
    uses this mode must be divisible by this value.

``intervals``
    A tuple that for each component in the mode contains a tuple of
    two integers: the minimum and maximum valid value for the
    component. Its value is ``((16, 235), (16, 240), (16, 240))`` for
    ``YV12`` and ``((0, 2 ** bits_per_component - 1),) * components``
    for everything else.

``get_length(iterable[integer]) -> int``
    The parameter must be an iterable that contains two integers: the
    width and height of an image; it returns the number of bytes
    needed to store an image of these dimensions with this mode.

Implementation detail: the modes are instances of a subclass of ``str``
and have a value equal to their name (e.g. ``imageop.RGB == 'RGB'``)
except for ``L32`` that has value ``'I'``. This is only intended for
backward compatibility with existing PIL users; new code that uses the
image protocol proposed here should not rely on this detail.

Image Protocol
''''''''''''''

Any object that supports the image protocol must provide the following
methods and attributes:

``mode``
    The format and the arrangement of the pixels in this image; it's
    one of the constants in the ``MODES`` set.

``size``
    An instance of the `ImageSize class`_; it's a named tuple of two
    integers: the width and the height of the image in pixels; both of
    them must be >= 1 and can also be accessed as the ``width`` and
    ``height`` attributes of ``size``.

``buffer``
    A sequence of integers between 0 and 255; they are the actual
    bytes used for storing the image data (i.e. modifying their values
    affects the image pixels and vice versa); the data has a
    row-major/C-contiguous order without padding and without any
    special memory alignment, even when there are more than 8 bits per
    component. The only supported methods are ``__len__``,
    ``__getitem__``/``__setitem__`` (with both integers and slice
    indexes) and ``__iter__``; on the C side it implements the buffer
    protocol.

    This is a pretty low level interface to the image and the user is
    responsible for using the correct (native) byte order for modes
    with more than 8 bits per component and the correct value ranges
    for ``YV12`` images. A buffer may or may not keep a reference to
    its image, but it's still safe (if useless) to use the buffer even
    after the corresponding image has been destroyed by the garbage
    collector (this will require changes to the image class of
    wxPython and possibly other libraries). Implementation detail:
    this can be an ``array('B')``, a ``bytes()`` object or a
    specialized fixed-length type.

``info``
    A ``dict`` object that can contain arbitrary metadata associated
    with the image (e.g. DPI, gamma, ICC profile, exposure time...);
    the interpretation of this data is beyond the scope of this PEP
    and probably depends on the library used to create and/or to save
    the image; if a method of the image returns a new image, it can
    copy or adapt metadata from its own ``info`` attribute (the
    ``ImageMixin`` implementation always creates a new image with an
    empty ``info`` dictionary).

| ``bits_per_component``
| ``bytes_per_pixel``
| ``component_names``
| ``components``
| ``intervals``
| ``planar``
| ``subsampling``

    Shortcuts for the corresponding ``mode.*`` attributes.

``map(function[, function...]) -> None``
    For every pixel in the image, maps each component through the
    corresponding function. If only one function is passed, it is
    used repeatedly for each component.
This method modifies the image **in place** and is usually very fast (most of the time the functions are called only a small number of times, possibly only once for simple functions without branches), but it imposes a number of restrictions on the function(s) passed: * it must accept a single integer argument and return a number (``map`` will round the result to the nearest integer and clip it to ``range(0, 2 ** bits_per_component)``, if necessary); * it must *not* try to intercept any ``BaseException``, ``Exception`` or any unknown subclass of ``Exception`` raised by any operation on the argument (implementations may try to optimize the speed by passing funny objects, so even a simple ``"if n == 10:"`` may raise an exception: simply ignore it, ``map`` will take care of it); catching any other exception is fine; * it should be side-effect free and its result should not depend on values (other than the argument) that may change during a single invocation of ``map``. | ``rotate90() -> image`` | ``rotate180() -> image`` | ``rotate270() -> image`` Return a copy of the image rotated 90, 180 or 270 degrees counterclockwise around its center. ``clip() -> None`` Saturates invalid component values in ``YV12`` images to the minimum or the maximum allowed (see ``mode.intervals``), for other image modes this method does nothing, very fast; libraries that save/export ``YV12`` images are encouraged to always call this method, since intermediate operations (e.g. the ``map`` method) may assign to pixels values outside the valid intervals. ``split() -> tuple[image]`` Returns a tuple of ``L``, ``L16`` or ``L32`` images corresponding to the individual components in the image. 
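
The rounding and clipping rule of ``map`` and the saturation done by ``clip()`` can be sketched in pure Python. This is only an illustration under stated assumptions, not the PEP's implementation: ``map_components`` and ``saturate_plane`` are hypothetical helper names, the buffer is assumed to be an interleaved 8-bit ``bytearray``, and the function is simply called once per byte (the PEP expects real implementations to call it far fewer times).

```python
# Hypothetical helpers; names and signatures are assumptions, not part of the PEP.

# Valid intervals for the y, cr and cb components of YV12 (from the mode table).
YV12_INTERVALS = ((16, 235), (16, 240), (16, 240))


def map_components(buffer, components, functions):
    """Naive in-place per-component map over an interleaved 8-bit buffer."""
    if len(functions) == 1:
        # A single function is reused for every component.
        functions = tuple(functions) * components
    if len(functions) != components:
        raise ValueError("expected 1 or %d functions" % components)
    for i, value in enumerate(buffer):
        # Round to the nearest integer, then clip to range(0, 256).
        result = int(round(functions[i % components](value)))
        buffer[i] = max(0, min(result, 255))


def saturate_plane(plane, interval):
    """Clamp every byte of one plane to its (minimum, maximum) interval."""
    low, high = interval
    for i, value in enumerate(plane):
        plane[i] = max(low, min(value, high))


# Brighten two RGB pixels by 50%; values above 255 are clipped.
rgb = bytearray([10, 200, 30, 250, 0, 128])
map_components(rgb, 3, (lambda n: n * 1.5,))

# Saturate a Y plane to the YV12 luma interval.
y_plane = bytearray([0, 16, 128, 255])
saturate_plane(y_plane, YV12_INTERVALS[0])
```

Keeping the clipping in the caller rather than in the mapped function is what lets the function stay a plain arithmetic expression, as the restrictions listed above require.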

Planar images also support attributes with the same names defined in
``component_names``: they contain grayscale (mode ``L``) images that
offer a view on the pixel values for the corresponding component; any
change to the subimages is immediately reflected on the parent image
and vice versa (their buffers refer to the same memory location).

Non-planar images offer the following additional methods:

``pixels() -> iterator[pixel]``
    Returns an iterator that iterates over all the pixels in the
    image, starting from the top line and scanning each line from left
    to right. See below for a description of the `pixel objects`_.

``__iter__() -> iterator[line]``
    Returns an iterator that iterates over all the lines in the image,
    from top to bottom. See below for a description of the `line
    objects`_.

``__len__() -> int``
    Returns the number of lines in the image (``size.height``).

``__getitem__(integer) -> line``
    Returns the line at the specified (y) position.

``__getitem__(tuple[integer]) -> pixel``
    The parameter must be a tuple of two integers; they are
    interpreted respectively as x and y coordinates in the image (0, 0
    is the top left corner) and a pixel object is returned.

``__getitem__(slice | tuple[integer | slice]) -> image``
    The parameter must be a slice or a tuple that contains two slices
    or an integer and a slice; the selected area of the image is
    copied and a new image is returned; ``image[x:y:z]`` is equivalent
    to ``image[:, x:y:z]``.

``__setitem__(tuple[integer], integer | iterable[integer]) -> None``
    Modifies the pixel at specified position; ``image[x, y] =
    integer`` is a shortcut for ``image[x, y] = (integer,)`` for
    images with a single component.
``__setitem__(slice | tuple[integer | slice], image) -> None`` Selects an area in the same way as the corresponding form of the ``__getitem__`` method and assigns to it a copy of the pixels from the image in the second argument, that must have exactly the same mode as this image and the same size as the specified area; the alpha component, if present, is simply copied and doesn't affect the other components of the image (i.e. no alpha compositing is performed). The ``mode``, ``size`` and ``buffer`` (including the address in memory of the ``buffer``) never change after an image is created. It is expected that, if PEP 3118 is accepted, all the image objects will support the new buffer protocol, however this is beyond the scope of this PEP. ``Image`` and ``ImageMixin`` Classes '''''''''''''''''''''''''''''''''''' The ``ImageMixin`` class implements all the methods and attributes described above except ``mode``, ``size``, ``buffer`` and ``info``. ``Image`` is a subclass of ``ImageMixin`` that adds support for these four attributes and offers the following constructor (please note that the constructor is not part of the image protocol): ``__init__(mode, size, color, source)`` ``mode`` must be one of the constants in the ``MODES`` set, ``size`` is a sequence of two integers (width and height of the new image); ``color`` is a sequence of integers, one for each component of the image, used to initialize all the pixels to the same value; ``source`` can be a sequence of integers of the appropriate size and format that is copied as-is in the buffer of the new image or an existing image; in Python 2.x ``source`` can also be an instance of ``str`` and is interpreted as a sequence of bytes. ``color`` and ``source`` are mutually exclusive and if they are both omitted the image is initialized to transparent black (all the bytes in the buffer have value 16 in the ``YV12`` mode, 255 in the ``CMYK*`` modes and 0 for everything else). 

    If ``source`` is present and is an image, ``mode`` and/or ``size``
    can be omitted; if they are specified and are different from the
    source mode and/or size, the source image is converted.

    The exact algorithms used for resizing and doing color space
    conversions may differ between Python versions and
    implementations, but they always give high quality results (e.g.:
    a cubic spline interpolation can be used for upsampling and an
    antialias filter can be used for downsampling images); any
    combination of mode conversion is supported, but the algorithm
    used for conversions to and from the ``CMYK*`` modes is pretty
    naïve: if you have the exact color profiles of your devices you
    may want to use a good color management tool such as LittleCMS.
    The new image has an empty ``info`` ``dict``.

Line Objects
''''''''''''

The line objects (returned, e.g., when iterating over an image) support
the following attributes and methods:

``mode``
    The mode of the image from where this line comes.

``__iter__() -> iterator[pixel]``
    Returns an iterator that iterates over all the pixels in the line,
    from left to right. See below for a description of the `pixel
    objects`_.

``__len__() -> int``
    Returns the number of pixels in the line (the image width).

``__getitem__(integer) -> pixel``
    Returns the pixel at the specified (x) position.

``__getitem__(slice) -> image``
    The selected part of the line is copied and a new image is
    returned; the new image will always have height 1.

``__setitem__(integer, integer | iterable[integer]) -> None``
    Modifies the pixel at the specified position; ``line[x] =
    integer`` is a shortcut for ``line[x] = (integer,)`` for images
    with a single component.
``__setitem__(slice, image) -> None`` Selects a part of the line and assigns to it a copy of the pixels from the image in the second argument, that must have height 1, a width equal to the specified slice and the same mode as this line; the alpha component, if present, is simply copied and doesn't affect the other components of the image (i.e. no alpha compositing is performed). Pixel Objects ''''''''''''' The pixel objects (returned, e.g., when iterating over a line) support the following attributes and methods: ``mode`` The mode of the image from where this pixel comes. ``value`` A tuple of integers, one for each component. Any iterable of the correct length can be assigned to ``value`` (it will be automagically converted to a tuple), but you can't assign to it an integer, even if the mode has only a single component: use, e.g., ``pixel.l = 123`` instead. ``r, g, b, a, l, c, m, y, k`` The integer values of each component; only those applicable for the current mode (in ``mode.component_names``) will be available. | ``__iter__() -> iterator[int]`` | ``__len__() -> int`` | ``__getitem__(integer | slice) -> int | tuple[int]`` | ``__setitem__(integer | slice, integer | iterable[integer]) -> None`` These four methods emulate a fixed length list of integers, one for each pixel component. ``ImageSize`` Class ''''''''''''''''''' ``ImageSize`` is a named tuple, a class identical to ``tuple`` except that: * its constructor only accepts two integers, width and height; they are converted in the constructor using their ``__index__()`` methods, so all the ``ImageSize`` objects are guaranteed to contain only ``int`` (or possibly ``long``, in Python 2.x) instances; * it has a ``width`` and a ``height`` property that are equivalent to the first and the second number in the tuple, respectively; * the string returned by its ``__repr__`` method is ``'imageop.ImageSize(width=%d, height=%d)' % (width, height)``. 
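
The three properties listed above can be illustrated with a short pure-Python sketch. It assumes Python 3 and builds on ``collections.namedtuple``; the PEP does not say how ``ImageSize`` would actually be implemented, so treat this as one possible reading rather than the proposed code.

```python
import operator
from collections import namedtuple


class ImageSize(namedtuple("ImageSize", ["width", "height"])):
    """Sketch of the ImageSize named tuple described above (an assumption)."""

    __slots__ = ()

    def __new__(cls, width, height):
        # operator.index() calls __index__(), so floats and strings are
        # rejected while any integer-like object is converted to int.
        return super().__new__(cls, operator.index(width), operator.index(height))

    def __repr__(self):
        return "imageop.ImageSize(width=%d, height=%d)" % (self.width, self.height)


size = ImageSize(6, 9)
width, height = size  # unpacks like a plain tuple
```

Because the class is still a ``tuple``, code that expects a plain ``(width, height)`` pair keeps working, which is what allows existing libraries to adopt it without breaking their users.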
``ImageSize`` is not usually instantiated by end-users, but can be used when creating a new class that implements the image protocol, since the ``size`` attribute must be an ``ImageSize`` instance. C API ----- The available image modes are visible at the C level as ``PyImage_*`` constants of type ``PyObject *`` (e.g.: ``PyImage_RGB`` is ``imageop.RGB``). The following functions offer a C-friendly interface to mode and image objects (all the functions return ``NULL`` or -1 on failure): ``int PyImageMode_Check(PyObject *obj)`` Returns true if the object ``obj`` is a valid image mode. | ``int PyImageMode_GetComponents(PyObject *mode)`` | ``PyObject* PyImageMode_GetComponentNames(PyObject *mode)`` | ``int PyImageMode_GetBitsPerComponent(PyObject *mode)`` | ``int PyImageMode_GetBytesPerPixel(PyObject *mode)`` | ``int PyImageMode_GetPlanar(PyObject *mode)`` | ``PyObject* PyImageMode_GetSubsampling(PyObject *mode)`` | ``int PyImageMode_GetXDivisor(PyObject *mode)`` | ``int PyImageMode_GetYDivisor(PyObject *mode)`` | ``Py_ssize_t PyImageMode_GetLength(PyObject *mode, Py_ssize_t width, Py_ssize_t height)`` These functions are equivalent to their corresponding Python attributes or methods. ``int PyImage_Check(PyObject *obj)`` Returns true if the object ``obj`` is an ``Image`` object or an instance of a subtype of the ``Image`` type; see also ``PyObject_CheckImage`` below. ``int PyImage_CheckExact(PyObject *obj)`` Returns true if the object ``obj`` is an ``Image`` object, but not an instance of a subtype of the ``Image`` type. | ``PyObject* PyImage_New(PyObject *mode, Py_ssize_t width, Py_ssize_t height)`` Returns a new ``Image`` instance, initialized to transparent black (see ``Image.__init__`` above for the details). | ``PyObject* PyImage_FromImage(PyObject *image, PyObject *mode, Py_ssize_t width, Py_ssize_t height)`` Returns a new ``Image`` instance, initialized with the contents of the ``image`` object rescaled and converted to the specified ``mode``, if necessary. 

| ``PyObject* PyImage_FromBuffer(PyObject *buffer, PyObject *mode,
  Py_ssize_t width, Py_ssize_t height)``

    Returns a new ``Image`` instance, initialized with the contents of
    the ``buffer`` object.

``int PyObject_CheckImage(PyObject *obj)``
    Returns true if the object ``obj`` implements a sufficient subset
    of the image protocol to be accepted by the functions defined
    below, even if its class is not a subclass of ``ImageMixin``
    and/or ``Image``. Currently it simply checks for the existence
    and correctness of the attributes ``mode``, ``size`` and
    ``buffer``.

| ``PyObject* PyImage_GetMode(PyObject *image)``
| ``Py_ssize_t PyImage_GetWidth(PyObject *image)``
| ``Py_ssize_t PyImage_GetHeight(PyObject *image)``
| ``int PyImage_Clip(PyObject *image)``
| ``PyObject* PyImage_Split(PyObject *image)``
| ``PyObject* PyImage_GetBuffer(PyObject *image)``
| ``int PyImage_AsBuffer(PyObject *image, const void **buffer,
  Py_ssize_t *buffer_len)``

    These functions are equivalent to their corresponding Python
    attributes or methods; the image memory can be accessed only with
    the GIL and a reference to the image or its buffer held, and extra
    care should be taken for modes with more than 8 bits per
    component: the data is stored in native byte order and it may
    **not** be aligned on 2 or 4 byte boundaries.
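
The note about native byte order and missing alignment applies equally to code that reads ``buffer`` from Python. The sketch below is an illustration only: it assumes an ``RGB48`` image stored in a plain ``bytearray``, and the two helper functions are hypothetical names that are not part of the PEP. It relies on the ``"="`` prefix of the standard ``struct`` module, which selects native byte order with standard sizes and no alignment padding.

```python
import struct

BYTES_PER_PIXEL = 3 * 16 // 8  # components * bits_per_component // 8 for RGB48


def write_pixel_rgb48(buffer, width, x, y, rgb):
    """Store three 16-bit components at pixel (x, y) of a row-major buffer."""
    offset = (y * width + x) * BYTES_PER_PIXEL
    struct.pack_into("=3H", buffer, offset, *rgb)


def read_pixel_rgb48(buffer, width, x, y):
    """Return the (r, g, b) tuple stored at pixel (x, y)."""
    offset = (y * width + x) * BYTES_PER_PIXEL
    return struct.unpack_from("=3H", buffer, offset)


width, height = 2, 2
buf = bytearray(width * height * BYTES_PER_PIXEL)  # all zeros
write_pixel_rgb48(buf, width, 1, 1, (1000, 2000, 65535))
pixel = read_pixel_rgb48(buf, width, 1, 1)
```

Going through ``struct`` avoids any assumption about where the buffer starts in memory, at the cost of one call per pixel.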
Examples
========

A few examples of common operations with the new ``Image`` class and
protocol::

    # create a new black RGB image of 6x9 pixels
    rgb_image = imageop.Image(imageop.RGB, (6, 9))

    # same as above, but initialize the image to bright red
    rgb_image = imageop.Image(imageop.RGB, (6, 9), color=(255, 0, 0))

    # convert the image to YCbCr
    yuv_image = imageop.Image(imageop.JPEG_YV12, source=rgb_image)

    # read the value of a pixel and split it into three ints
    r, g, b = rgb_image[x, y]

    # modify the magenta component of a pixel in a CMYK image
    cmyk_image[x, y].m = 13

    # modify the Y (luma) component of a pixel in a *YV12 image and
    # its corresponding subsampled Cr (red chroma)
    yuv_image.y[x, y] = 42
    yuv_image.cr[x // 2, y // 2] = 54

    # iterate over an image
    for line in rgb_image:
        for pixel in line:
            # swap red and blue, and set green to 0
            pixel.value = pixel.b, 0, pixel.r

    # find the maximum value of the red component in the image
    max_red = max(pixel.r for pixel in rgb_image.pixels())

    # count the number of colors in the image
    num_of_colors = len(set(tuple(pixel) for pixel in image.pixels()))

    # copy a block of 4x2 pixels near the upper right corner of an
    # image and paste it into the lower left corner of the same image
    image[:4, -2:] = image[-6:-2, 1:3]

    # create a copy of the image, except that the new image can have a
    # different (usually empty) info dict
    new_image = image[:]

    # create a mirrored copy of the image, with the left and right
    # sides flipped
    flipped_image = image[::-1, :]

    # downsample an image to half its original size using a fast, low
    # quality operation and a slower, high quality one:
    low_quality_image = image[::2, ::2]

    new_size = image.size.width // 2, image.size.height // 2
    high_quality_image = imageop.Image(size=new_size, source=image)

    # direct buffer access
    rgb_image[0, 0] = r, g, b
    assert tuple(rgb_image.buffer[:3]) == (r, g, b)

Backwards Compatibility
=======================

There are three areas touched by this PEP where backwards compatibility should
be considered:

* **Python 2.6**: new classes and objects are added to the ``imageop`` module without touching the existing module contents; new methods and attributes will be added to ``Tkinter.PhotoImage`` and its ``__getitem__`` and ``__setitem__`` methods will be modified to accept integers, tuples and slices (currently they only accept strings). All the changes provide a superset of the existing functionality, so no major compatibility issues are expected.

* **Python 3.0**: the legacy contents of the ``imageop`` module will be deleted, according to PEP 3108; everything defined in this proposal will work like in Python 2.x with the exception of the usual 2.x/3.0 differences (e.g. support for ``long`` integers and for interpreting ``str`` instances as sequences of bytes will be dropped).

* **external libraries**: the names and the semantics of the standard image methods and attributes are carefully chosen to allow some external libraries that manipulate images (including at least PIL, wxPython and pygame) to implement the new protocol in their image classes without breaking compatibility with existing code. The only blatant conflicts between the image protocol and NumPy arrays are the value of the ``size`` attribute and the coordinates order in the ``image[x, y]`` expression.

Reference Implementation
========================

If this PEP is accepted, the author will provide a reference implementation of the new classes in pure Python (that can run in CPython, PyPy, Jython and IronPython) and a second one optimized for speed in Python and C, suitable for inclusion in the CPython standard library. The author will also submit the required Tkinter patches.

All the code will be available in a version for Python 2.x and a version for Python 3.0 (the two versions are expected to be very similar, and the Python 3.0 one will probably be generated almost completely automatically).
Acknowledgments
===============

The implementation of this PEP, if accepted, is sponsored by Google through the Google Summer of Code program.

Copyright
=========

This document has been placed in the public domain.

--
Lino Mastrodomenico
E-mail: l.mastrodomenico at gmail.com

From bjourne at gmail.com Sun Jul 1 14:34:03 2007
From: bjourne at gmail.com (BJörn Lindqvist)
Date: Sun, 1 Jul 2007 14:34:03 +0200
Subject: [Python-3000] PEP 368: Standard image protocol and class
In-Reply-To: <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
	<cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
Message-ID: <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>

Cool PEP! I really love the API for the Image class. A standard Image class would be a useful addition to the standard library.

But I cannot see how it would solve the problem with too many image classes. The reason why PIL, PyGame and wxPython have different image classes is because each of them uses different C functions for manipulating said image classes. These differences bubble up through the bindings and result in PIL exposing an Image, PyGame a Surface and wxPython a wxImage. The result is that if you want to use a PIL Image in, say, PyGame, you still need to convert it. If PIL stores RGB images with 32 bpp and PyGame uses 24, then you'll have to convert it to get it into the proper format.

The only way to get compatibility between the libraries is to create an image library in C _and_ get those libraries to start using it.
--
Regards, Björn

From stargaming at gmail.com Sun Jul 1 18:01:12 2007
From: stargaming at gmail.com (Stargaming)
Date: Sun, 01 Jul 2007 18:01:12 +0200
Subject: [Python-3000] PEP 368: Standard image protocol and class
In-Reply-To: <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
	<cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
	<740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>
Message-ID: <f68j4d$o30$1@sea.gmane.org>

BJörn Lindqvist wrote:
> Cool PEP! I really love the API for the Image class. A standard Image
> class would be a useful addition to the standard library.
>
> But I cannot see how it would solve the problem with too many image
> classes. The reason why PIL, PyGame and wxPython have different image
> classes is because each of them uses different C functions for
> manipulating said image classes. These differences bubble up through
> the bindings and result in PIL exposing an Image, PyGame a Surface
> and wxPython a wxImage. The result is that if you want to use a PIL
> Image in say PyGame, you still need to convert it. If PIL stores RGB
> images with 32 bpp and PyGame uses 24, then you'll have to convert it
> to get it into the proper format.
>
> The only way to get compatibility between the libraries is to create
> an image library in C _and_ get those libraries to start using it.
>

They'll all quack the same way. (This is paraphrased in the PEP's abstract, as far as I read it.)
From martin at v.loewis.de Sun Jul 1 18:55:42 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 01 Jul 2007 18:55:42 +0200 Subject: [Python-3000] PEP 368: Standard image protocol and class In-Reply-To: <f68j4d$o30$1@sea.gmane.org> References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com> <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com> <f68j4d$o30$1@sea.gmane.org> Message-ID: <4687DC8E.6010109@v.loewis.de> >> The only way to get compatibility between the libraries is to create >> an image library in C _and_ get those libraries to start using it. >> > > They'll all quack the same way. (This is paraphrased in the PEP's > abstract, as far as I read it.) To the Python side, yes. But to the underlying C library, some quack, some bark. How would you pass a Tkinter.PhotoImage to wxPython if both supported the PEP? wxPython would likely be able to produce objects that provide the Image interface, but I can't see how wxPython could consume such a thing - the underlying C libraries surely expect something completely different. The only way I can see this work is if each library imports Image objects by copying them, pixel for pixel, through this interface. 
Regards,
Martin

From l.mastrodomenico at gmail.com Sun Jul 1 18:59:09 2007
From: l.mastrodomenico at gmail.com (Lino Mastrodomenico)
Date: Sun, 1 Jul 2007 18:59:09 +0200
Subject: [Python-3000] PEP 368: Standard image protocol and class
In-Reply-To: <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
	<cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
	<740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>
Message-ID: <cc93256f0707010959o44c77912sb989c68cf890b846@mail.gmail.com>

2007/7/1, BJörn Lindqvist <bjourne at gmail.com>:
> But I cannot see how it would solve the problem with too many image
> classes. The reason why PIL, PyGame and wxPython have different image
> classes is because each of them uses different C functions for
> manipulating said image classes. These differences bubble up through
> the bindings and result in PIL exposing an Image, PyGame a Surface
> and wxPython a wxImage. The result is that if you want to use a PIL
> Image in say PyGame, you still need to convert it.

Actually, this is not always true. :-)

For example it's entirely possible to have the *same* Python RGBA image considered as a SDL_Surface by SDL (the underlying library used by pygame), as an ImagingMemoryInstance by the PIL C library and have its buffer directly accepted by the OpenGL function glTexImage2D (with a bit of care in the order of the corners passed to glTexCoord2f), regardless of who created the image in the first place.

This works because most C/C++ libraries give the possibility of creating a native image struct/class using an existing memory buffer (without copying it) and they support at least a subset of the modes currently defined, with the exact byte order, padding, etc, specified in the PEP (usually L and at least one of RGB or RGBA).
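The idea of wrapping one existing memory buffer without copying it can be sketched in pure Python. This is only an illustration under stated assumptions: the two ``memoryview`` objects stand in for the native image structs of two different libraries, and the layout (RGB, 3 bytes per pixel, rows stored top to bottom with no padding) is assumed for the example rather than quoted from the PEP.

```python
# One block of pixel memory, owned by whoever created the image.
width, height = 2, 2
bytes_per_pixel = 3                       # assumed packed RGB layout
pixels = bytearray(width * height * bytes_per_pixel)

# Two independent "native image" wrappers around the same memory.
view_a = memoryview(pixels)               # stands in for library A
view_b = memoryview(pixels)               # stands in for library B

# Library A writes a bright red pixel at x=1, y=0 ...
x, y = 1, 0
offset = (y * width + x) * bytes_per_pixel
view_a[offset:offset + bytes_per_pixel] = bytes((255, 0, 0))

# ... and library B sees it immediately, because nothing was copied.
seen_by_b = bytes(view_b[offset:offset + bytes_per_pixel])
```

When both wrappers agree on the layout, the exchange costs nothing; when they disagree, a conversion step is unavoidable.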
But you are right, the particular format specified in the PEP is not always supported by the existing libraries, even when they support that particular mode. Sometimes this can be fixed (e.g. PIL currently uses by default 4 bytes per pixel for RGB images and has only experimental support for 3 bytes per pixel, but its C library is written by the same people that maintain the Python bindings, so they can change it if they want) and sometimes it cannot be easily fixed (e.g. a wxImage class will happily accept an RGB buffer as defined by the PEP, but it has a funny memory arrangement for RGBA images that is completely incompatible).

So I expect that each Python library that jumps on the PEP bandwagon will have three levels of support for the modes listed:

1) no support at all (e.g. most 3D libraries will probably never accept CMYK images as textures); the user can explicitly convert the image using "new_image = Image(new_mode, source=old_image)";

2) limited support: they support a particular mode, but cannot directly use the standard memory arrangement, so when they receive an alien image object they convert it on the fly to their preferred byte order and they do the reverse operation when a foreign library tries to access the buffer property of their images (they may offer a read-only buffer); this is not ideal, but it's better than the current situation because it's transparent to the user and it requires only a single memory copy/conversion instead of the two usually performed by the current tostring/fromstring dance;

3) full support: no conversion or memory copy ever necessary for the exchange of images between two libraries if they both have full support for a particular mode.

Of course the Image class that I'm writing and that I hope will be included in the stdlib, will have full support for all the modes.

Please note that the conversions in "2)" above can be avoided in some (most?)
cases if PEP 3118 is accepted, because it will become possible to expose and discover the "native" memory arrangement of an image without accessing its buffer property (that, in my vision, will always offer the "standard" arrangement defined in the PEP, to simplify things for libraries that prefer a simpler interface, even if it may be slightly less efficient in some, hopefully rare, cases).

--
Lino Mastrodomenico
E-mail: l.mastrodomenico at gmail.com

From alexandre at peadrop.com Mon Jul 2 19:46:28 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 2 Jul 2007 13:46:28 -0400
Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly
In-Reply-To: <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com>
References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com>
	<ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com>
	<acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com>
	<ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com>
	<acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com>
	<acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com>
Message-ID: <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com>

If StringIO is not allowed to over-seek, what should happen to the current file position when it is truncated?

>>> s = StringIO("Hello world!")
>>> s.seek(0, 2)
>>> s.truncate(2)
>>> s.tell()
???

Truncating can either set the position to the new string size, or it leaves it alone.
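The two options named in the question can be compared side by side. The sketch below makes two assumptions: it uses the ``io.StringIO`` shipped with present-day Python 3 (not the 2007 ``io.py`` under discussion), whose documented ``truncate()`` leaves the stream position unchanged, and it introduces ``ClampingStringIO``, a hypothetical subclass written only for this illustration, which moves the position back to the new size when the position would otherwise lie past the end.

```python
import io


class ClampingStringIO(io.StringIO):
    """Hypothetical variant: truncate() pulls the position back to the new end."""

    def truncate(self, size=None):
        new_size = super().truncate(size)
        # Only move the position if it now lies beyond the end of the data.
        if self.tell() > new_size:
            self.seek(new_size)
        return new_size


def position_after_truncate(stream_class):
    """Replay the session from the message and report (tell(), getvalue())."""
    s = stream_class("Hello world!")
    s.seek(0, 2)       # go to the end of the 12-character string
    s.truncate(2)      # keep only "He"
    return s.tell(), s.getvalue()


left_alone = position_after_truncate(io.StringIO)
clamped = position_after_truncate(ClampingStringIO)
print(left_alone)
print(clamped)
```

With the position left alone, a subsequent write starts past the end of the data, which is exactly the over-seek case the thread is about.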
-- Alexandre From guido at python.org Mon Jul 2 20:38:54 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jul 2007 11:38:54 -0700 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com> <acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> Message-ID: <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> Honestly, I think truncate() should always set the current position to the new size, even though that's not what it currently does. Or at least it should set it to the new size if that's less than the current position. What's the rationale (apart from "Unix defined it so") why it currently leaves the position unchanged? At least I think it's fine if StringIO does it this way. I think TextIOWrapper should also do it this way, as it has the same issue (writing null bytes is not defined for encoded files). --Guido On 7/2/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > If StringIO is not allowed to over-seek, what should happen to the > current file position when it is truncated? > > >>> s = StringIO("Hello world!") > >>> s.seek(0, 2) > >>> s.truncate(2) > >>> s.tell() > ??? > > Truncating can either set the position to the new string size, or it > leaves it alone. 
>
> -- Alexandre
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com Mon Jul 2 20:59:28 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 2 Jul 2007 14:59:28 -0400
Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly
In-Reply-To: <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com>
References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com>
	<ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com>
	<acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com>
	<ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com>
	<acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com>
	<acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com>
	<acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com>
	<ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com>
Message-ID: <acd65fa20707021159p4291451ycebaf3c6ac51a438@mail.gmail.com>

On 7/2/07, Guido van Rossum <guido at python.org> wrote:
> Honestly, I think truncate() should always set the current position to
> the new size, even though that's not what it currently does. Or at
> least it should set it to the new size if that's less than the current
> position. What's the rationale (apart from "Unix defined it so") why
> it currently leaves the position unchanged?

No idea. I just know that truncate in the old StringIO module does set the position to the new size if the new size is less than the current position. And that is how I implemented it in _bytes_io and _string_io.
From rasky at develer.com Tue Jul 3 00:51:41 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 03 Jul 2007 00:51:41 +0200
Subject: [Python-3000] Announcing PEP 3136
In-Reply-To: <20070630205444.GD22221@theory.org>
References: <20070630205444.GD22221@theory.org>
Message-ID: <f6bvhu$9l3$1@sea.gmane.org>

On 30/06/2007 22.54, Matt Chisholm wrote:

> I've created and submitted a new PEP proposing support for labels in
> Python's break and continue statements. Georg Brandl has graciously
> added it to the PEP list as PEP 3136:
>
> http://www.python.org/dev/peps/pep-3136/
>
> I understand that the deadline for submitting features for Python 3.0
> has passed, so this PEP targets Python 3.1. I also expect that people
> might not want to take time off from the Python 3.0 effort to discuss
> features that are even further off in the future.
>
> Thanks for your time, and thanks for letting me contribute an idea to
> Python.

I didn't see one simple alternative listed: move everything within a function:

def func():
    for a in a_list:
        for b in b_list:
            if condition1(a, b):
                return
            [...]
            if condition2(a, b):
                break

func()

--
Giovanni Bajo

From greg.ewing at canterbury.ac.nz Tue Jul 3 01:35:25 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 03 Jul 2007 11:35:25 +1200
Subject: [Python-3000] Announcing PEP 3136
In-Reply-To: <f6bvhu$9l3$1@sea.gmane.org>
References: <20070630205444.GD22221@theory.org> <f6bvhu$9l3$1@sea.gmane.org>
Message-ID: <46898BBD.1040901@canterbury.ac.nz>

On 30/06/2007 22.54, Matt Chisholm wrote:

> I've created and submitted a new PEP proposing support for labels in
> Python's break and continue statements.
>
> http://www.python.org/dev/peps/pep-3136/

-1. Confusing nested loops are best broken out into separate functions rather than patching over the problem with features like this.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!
| Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From ntoronto at cs.byu.edu Tue Jul 3 09:17:05 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 03 Jul 2007 01:17:05 -0600 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <46898BBD.1040901@canterbury.ac.nz> References: <20070630205444.GD22221@theory.org> <f6bvhu$9l3$1@sea.gmane.org> <46898BBD.1040901@canterbury.ac.nz> Message-ID: <4689F7F1.1070503@cs.byu.edu> Greg Ewing wrote: > On 30/06/2007 22.54, Matt Chisholm wrote: > > >> I've created and submitted a new PEP proposing support for labels in >> Python's break and continue statements. >> >> http://www.python.org/dev/peps/pep-3136/ >> > > -1. Confusing nested loops are best broken out into > separate functions rather than patching over the > problem with features like this. > +1 (not that my vote really counts for much). Breaking logic out into separate functions can obscure the meaning of an algorithm that is most naturally implemented with nested loops. Neil From edin.salkovic at gmail.com Tue Jul 3 10:11:59 2007 From: edin.salkovic at gmail.com (Edin Salkovic) Date: Tue, 3 Jul 2007 10:11:59 +0200 Subject: [Python-3000] PEP 368: Standard image protocol and class In-Reply-To: <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com> References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com> Message-ID: <63eb7fa90707030111r2ab33606xb2f76269e9e80b1f@mail.gmail.com> Hi Lino, On 7/1/07, Lino Mastrodomenico <l.mastrodomenico at gmail.com> wrote: > ``__getitem__(integer) -> line`` > > Returns the line at the specified (y) position. Just some ideas to think about. 1) Have you considered adding a separate lines property to the Image protocol? 2) Does one, by default, want to iterate over lines or over pixels of an image? 
Even your example iterates over pixels:

# iterate over an image
for line in rgb_image:
    for pixel in line:
        # swap red and blue, and set green to 0
        pixel.value = pixel.b, 0, pixel.r

why not just:

# iterate over an image
for pixel in rgb_image:
    pixel.value = pixel.b, 0, pixel.r

3) The pixels method (same for the possible lines property that I mentioned above) should probably be a property, i.e.:

pixels -> iterator[pixel],

not:

pixels() -> iterator[pixel]

P.S.: You might also inform the SciPy/NumPy lists about the PEP.

Keep up the good work!,
Edin

From guido at python.org Tue Jul 3 10:14:17 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 3 Jul 2007 10:14:17 +0200
Subject: [Python-3000] Announcing PEP 3136
In-Reply-To: <20070630205444.GD22221@theory.org>
References: <20070630205444.GD22221@theory.org>
Message-ID: <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com>

On 6/30/07, Matt Chisholm <matt-python at theory.org> wrote:
> I've created and submitted a new PEP proposing support for labels in
> Python's break and continue statements. Georg Brandl has graciously
> added it to the PEP list as PEP 3136:
>
> http://www.python.org/dev/peps/pep-3136/

I think this is a good summary of various proposals that have been floated in the past, plus some new ones. As a PEP, it falls short because it doesn't pick a solution but merely offers a large menu of possible options. Also, there is nothing about implementation yet.

However, I'm rejecting it on the basis that code so complicated as to require this feature is very rare. In most cases there are existing work-arounds that produce clean code, for example using 'return'. While I'm sure there are some (rare) real cases where clarity of the code would suffer from a refactoring that makes it possible to use return, this is offset by two issues:

1. The complexity added to the language, permanently.
This affects not only all Python implementations, but also every source analysis tool, plus of course all documentation for the language. 2. My expectation that the feature will be abused more than it will be used right, leading to a net decrease in code clarity (measured across all Python code written henceforth). Lazy programmers are everywhere, and before you know it you have an incredible mess on your hands of unintelligible code. I realize this is a heavy bar to pass, and somewhat subjective. That's okay. There is real value in having a small language. Also, as I said, while there are no past PEPs to document it, this has been brought up and rejected many times before. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Tue Jul 3 10:27:03 2007 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 03 Jul 2007 10:27:03 +0200 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <4689F7F1.1070503@cs.byu.edu> References: <20070630205444.GD22221@theory.org> <f6bvhu$9l3$1@sea.gmane.org> <46898BBD.1040901@canterbury.ac.nz> <4689F7F1.1070503@cs.byu.edu> Message-ID: <f6d18n$ag$1@sea.gmane.org> On 03/07/2007 9.17, Neil Toronto wrote: > Greg Ewing wrote: >> On 30/06/2007 22.54, Matt Chisholm wrote: >> >> >>> I've created and submitted a new PEP proposing support for labels in >>> Python's break and continue statements. >>> >>> http://www.python.org/dev/peps/pep-3136/ >>> >> -1. Confusing nested loops are best broken out into >> separate functions rather than patching over the >> problem with features like this. >> > > +1 (not that my vote really counts for much). Breaking logic out into > separate functions can obscure the meaning of an algorithm that is most > naturally implemented with nested loops. Do you have a concrete, real-world example? 
-- Giovanni Bajo From ntoronto at cs.byu.edu Tue Jul 3 11:42:09 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 03 Jul 2007 03:42:09 -0600 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <f6d18n$ag$1@sea.gmane.org> References: <20070630205444.GD22221@theory.org> <f6bvhu$9l3$1@sea.gmane.org> <46898BBD.1040901@canterbury.ac.nz> <4689F7F1.1070503@cs.byu.edu> <f6d18n$ag$1@sea.gmane.org> Message-ID: <468A19F1.7070302@cs.byu.edu> Giovanni Bajo wrote: > On 03/07/2007 9.17, Neil Toronto wrote: > >> Greg Ewing wrote: >> >>> On 30/06/2007 22.54, Matt Chisholm wrote: >>> >>> >>> >>>> I've created and submitted a new PEP proposing support for labels in >>>> Python's break and continue statements. >>>> >>>> http://www.python.org/dev/peps/pep-3136/ >>>> >>>> >>> -1. Confusing nested loops are best broken out into >>> separate functions rather than patching over the >>> problem with features like this. >>> >>> >> +1 (not that my vote really counts for much). Breaking logic out into >> separate functions can obscure the meaning of an algorithm that is most >> naturally implemented with nested loops. >> > > Do you have a concrete, real-world example? > You pragmatists and your concrete, real-world examples. :p Anyway, sure: image processing -> binary morphological operators -> erode. It's a four-deep nested loop. You pass a binary bitmask (kernel) over a binary image, centering it on each pixel. If one bit in the image is off that's on in the kernel, you turn off the center pixel in the destination. This is the obvious break - it only takes one, so it's senseless to keep going in the inner two loops. Moving the innermost two loops into a new function makes the flow of the algorithm less linear and therefore less clear. (Also, the function would never be called from anywhere else. How about an inner function? That's worse for understandability, IMNSHO.) 
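The erode operator described above can be sketched in plain Python to show where the inner break falls. This is an illustrative sketch, not the Java code mentioned in the message: ``erode`` and its list-of-lists image representation are hypothetical, and pixels outside the image are assumed to count as "off". Without a labeled break, leaving both kernel loops takes a flag and a second break.

```python
def erode(image, kernel):
    """Binary erosion of a 0/1 image (list of rows) by a 0/1 kernel."""
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    center_y, center_x = k_height // 2, k_width // 2
    result = [[1] * width for _ in range(height)]

    for y in range(height):                      # loop 1: image rows
        for x in range(width):                   # loop 2: image columns
            miss = False
            for ky in range(k_height):           # loop 3: kernel rows
                for kx in range(k_width):        # loop 4: kernel columns
                    if not kernel[ky][kx]:
                        continue
                    iy = y + ky - center_y
                    ix = x + kx - center_x
                    inside = 0 <= iy < height and 0 <= ix < width
                    if not inside or not image[iy][ix]:
                        miss = True
                        break                    # leaves loop 4 only
                if miss:
                    break                        # needed to leave loop 3 too
            if miss:
                result[y][x] = 0
    return result


# A 3x3 block of "on" pixels shrinks to its single center pixel.
source = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
eroded = erode(source, square)
```

The ``miss`` flag plus the second ``break`` is the boilerplate that a labeled break, or a helper function returning early, would remove.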
Other ways of avoiding the inner break, such as counting hits, or overwriting the center pixel repeatedly, obscure the meaning of the morphological operator. Granted, Python doesn't usually get used for low-level stuff like this, and I'd probably use Numpy array operations in the place of the inner two loops, which would be less efficient, but faster. But you were asking whether algorithms that are naturally expressed as nested loops with breaks exist, and this just happened to be on my hard drive, written in Java. FWIW, I've read Guido's recent rejection of this PEP, but I wanted to take up the challenge of showing that these (admittedly rare) use cases do exist. A lot of them come from 2D analogues of algorithms that call for a break from an inner loop. Neil From tomerfiliba at gmail.com Tue Jul 3 12:59:51 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 3 Jul 2007 12:59:51 +0200 Subject: [Python-3000] the do-while pep Message-ID: <1d85506f0707030359w45bd864cn2007e67459df18cc@mail.gmail.com> i haven't seen this issue discussed at all, so i thought i'd bring it up -- what's the status of the pep 315 (do-while syntax)? is it getting into py3k? -tomer From guido at python.org Tue Jul 3 13:42:14 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 3 Jul 2007 13:42:14 +0200 Subject: [Python-3000] the do-while pep In-Reply-To: <1d85506f0707030359w45bd864cn2007e67459df18cc@mail.gmail.com> References: <1d85506f0707030359w45bd864cn2007e67459df18cc@mail.gmail.com> Message-ID: <ca471dc20707030442l30621106v8aaef7281b57b5b1@mail.gmail.com> On 7/3/07, tomer filiba <tomerfiliba at gmail.com> wrote: > i haven't seen this issue discussed at all, so i thought i'd bring it up -- > what's the status of the pep 315 (do-while syntax)? is it getting into py3k? No, it wasn't even considered.
It was in the deferred list and nobody suggested we look at it for Py3k. From the message quoted in the deferral note it doesn't look like it's an easy sell. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From john at yates-sheets.org Tue Jul 3 14:32:19 2007 From: john at yates-sheets.org (John S. Yates, Jr.) Date: Tue, 03 Jul 2007 08:32:19 -0400 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> References: <20070630205444.GD22221@theory.org> <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> Message-ID: <j1gk83hfeltf7uqi0u4b5tpgbg80qg3cp0@4ax.com> On Tue, 3 Jul 2007, "Guido van Rossum" <guido at python.org> wrote: >However, I'm rejecting it on the basis that code so complicated to >require this feature is very rare. I assume that you are familiar with Donald E. Knuth's classic paper: "Structured Programming with go to Statements" http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf /john From alexandre at peadrop.com Tue Jul 3 17:06:20 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 3 Jul 2007 11:06:20 -0400 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com> <acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> Message-ID: <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> On 7/2/07, Guido van Rossum <guido at python.org> wrote: > Honestly, I think truncate() should always set the 
current position to
> the new size, even though that's not what it currently does.

Thought about that and I think that would be the best thing to do. That would avoid making StringIO unnecessarily different from BytesIO. And IMHO, it is less prone to bugs. If someone wants to truncate while keeping the current position, then he will have to state his intention explicitly by saving the value of tell() and calling seek() after truncating.

I also find that the semantics make more sense. For example:

>>> s = StringIO("Good bye, world")
>>> s.truncate(10)
>>> s.write("cruel world")
>>> s.getvalue()
???

I think that should return "Good bye, cruel world", not "cruel world".

So, does anyone else agree with this small semantic change of truncate()?

-- Alexandre

From p.f.moore at gmail.com Tue Jul 3 17:13:51 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 3 Jul 2007 16:13:51 +0100
Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly
In-Reply-To: <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com>
References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com>
	<ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com>
	<acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com>
	<ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com>
	<acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com>
	<acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com>
	<acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com>
	<ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com>
	<acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com>
Message-ID: <79990c6b0707030813o38b36960m7a6469722fd05444@mail.gmail.com>

On 03/07/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> I also find the semantic make more sense too. For example:
>
> >>> s = StringIO("Good bye, world")
> >>> s.truncate(10)
> >>> s.write("cruel world")
> >>> s.getvalue()
> ???
>
> I think that should return "Good bye, cruel world", not "cruel world".
>
> So, does anyone else agree with this small semantic change of truncate()?

Looks reasonable to me - without checking documentation, your proposal
is what I'd expect the example to do.

Paul.

From tjreedy at udel.edu Wed Jul 4 01:40:14 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 3 Jul 2007 19:40:14 -0400
Subject: [Python-3000] Announcing PEP 3136
References: <20070630205444.GD22221@theory.org><ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com>
  <j1gk83hfeltf7uqi0u4b5tpgbg80qg3cp0@4ax.com>
Message-ID: <f6emou$a8n$1@sea.gmane.org>

"John S. Yates, Jr." <john at yates-sheets.org> wrote in message
news:j1gk83hfeltf7uqi0u4b5tpgbg80qg3cp0 at 4ax.com...
| On Tue, 3 Jul 2007, "Guido van Rossum" <guido at python.org> wrote:
|
| >However, I'm rejecting it on the basis that code so complicated to
| >require this feature is very rare.
|
| I assume that you are familiar with Donald E. Knuth's classic paper:
| "Structured Programming with go to Statements"
| http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

Do you consider this to be for or against the PEP?
Rereading it....

At least half of Knuth's goto examples are covered
by Python's single level restricted gotos:

Example 1 (switched to 0-based arrays, not tested):

for i in range(m):
    if A[i] == x: break
else:
    A[m] = x
    B[m] = 0
    m += 1
B[i] += 1

Example 5 (ditto):

i = 0 #? initial value not given
while True:
    if A[i] < x:
        if L[i] != 0: i = L[i]; continue
        else: L[i] = j; break
    else: # > x
        if R[i] != 0: i = R[i]; continue
        else: R[i] = j; break
# dup code could be factored with LR = L or R as A[i] < or > x
A[j] = x
L[j] = R[j] = 0
j += 1

The rest are general gotos, including jumps into the middle of loops.
None are multilevel continues or breaks.

tjr

From john at yates-sheets.org Wed Jul 4 15:41:48 2007
From: john at yates-sheets.org (John S. Yates, Jr.)
Date: Wed, 04 Jul 2007 09:41:48 -0400
Subject: [Python-3000] Announcing PEP 3136
In-Reply-To: <f6emou$a8n$1@sea.gmane.org>
References: <20070630205444.GD22221@theory.org><ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com>
  <j1gk83hfeltf7uqi0u4b5tpgbg80qg3cp0@4ax.com>
  <f6emou$a8n$1@sea.gmane.org>
Message-ID: <f97n835arna3kmn156k6avn3i63q4g611h@4ax.com>

On Tue, 3 Jul 2007, "Terry Reedy" <tjreedy at udel.edu> wrote:

>Do you consider this to be for or against the PEP?
>Rereading it....
>
>At least half of Knuth's goto examples are covered
>by Python's single level restricted gotos:

In all honesty I did not reread the paper. I posted based on the
recollection that it was the basis for my feeling no compunction
about using reviled gotos in my C / C++ code to effect multi-level
exits and continuations. When called to task by my peers I invoke
Knuth's name.

Let's chalk it up to the fallibility of memory over a span of more
than 30 years. Thanks for keeping me honest.

I am printing out the paper. Rereading it should help me recall
the state of programming when I was first starting out.

/john

From turnbull at sk.tsukuba.ac.jp Wed Jul 4 18:46:08 2007
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Thu, 05 Jul 2007 01:46:08 +0900
Subject: [Python-3000] Announcing PEP 3136
In-Reply-To: <f97n835arna3kmn156k6avn3i63q4g611h@4ax.com>
References: <20070630205444.GD22221@theory.org>
  <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com>
  <j1gk83hfeltf7uqi0u4b5tpgbg80qg3cp0@4ax.com>
  <f6emou$a8n$1@sea.gmane.org>
  <f97n835arna3kmn156k6avn3i63q4g611h@4ax.com>
Message-ID: <876450xgkv.fsf@uwakimon.sk.tsukuba.ac.jp>

John S. Yates, Jr. writes:

 > In all honesty I did not reread the paper.

Sir, you have my thanks for this small misstep, without which you
would have undoubtedly abstained from posting that URL, and I, in
turn, would have missed a chance to read that wonderful paper.
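
[Editorial sketch, not part of the original thread.] Terry Reedy's
Example 1 earlier in this thread was posted untested. A minimal
runnable version, assuming the intent is Knuth's search-and-insert
loop, might look like the following. The function name `tally` and the
explicit `i = m` in the `else` branch are assumptions added here:
without that assignment, `i` would be left at `m - 1` (or unbound when
`m == 0`) after an unsuccessful search.

```python
def tally(A, B, m, x):
    """Count one occurrence of x in the table A[0:m] with counts B[0:m].

    Hypothetical helper: adds x with a zero count if it is absent,
    then increments its count. Returns the updated table size m.
    """
    for i in range(m):
        if A[i] == x:
            break          # found: the single-level "goto found"
    else:
        # The loop finished without a break, so x is not in the table.
        i = m              # assumption: point i at the new slot
        A[m] = x
        B[m] = 0
        m += 1
    B[i] += 1
    return m


# Preallocated tables, as in Knuth's array-based formulation.
A = [None] * 4
B = [0] * 4
m = 0
for item in ["a", "b", "a"]:
    m = tally(A, B, m, item)
```

The `for ... else` form covers the "found / not found" exit without a
flag variable, which is the single-level case the thread says Python
already handles.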
From collinw at gmail.com Fri Jul 6 16:03:56 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 6 Jul 2007 16:03:56 +0200
Subject: [Python-3000] Change to class construction?
Message-ID: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>

While experimenting with porting setuptools to py3k (as of r56155), I
ran into this situation:

class C:
    a = (4, 5)
    b = [c for c in range(2) if a]

results in a "NameError: global name 'a' is not defined" error, while

class C:
    a = (4, 5)
    b = [c for c in a]

works fine. This gives the same error as above:

class C:
    a = (4, 5)
    b = [a for c in range(2)]

Both now-erroneous snippets work in 2.5.1. Was this change intentional?

Collin Winter

From g.brandl at gmx.net Fri Jul 6 17:00:08 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 06 Jul 2007 17:00:08 +0200
Subject: [Python-3000] Change to class construction?
In-Reply-To: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
Message-ID: <f6lld0$qv3$1@sea.gmane.org>

Collin Winter schrieb:
> While experimenting with porting setuptools to py3k (as of r56155), I
> ran into this situation:
>
> class C:
>     a = (4, 5)
>     b = [c for c in range(2) if a]
>
> results in a "NameError: global name 'a' is not defined" error, while
>
> class C:
>     a = (4, 5)
>     b = [c for c in a]
>
> works fine. This gives the same error as above:
>
> class C:
>     a = (4, 5)
>     b = [a for c in range(2)]
>
> Both now-erroneous snippets work in 2.5.1. Was this change intentional?

It is at least intentional in the sense that in 3k it works the same as with
genexps, which give the same errors in 2.5.

What's different is that all code inside a genexp except the first iterator
(which is why the second example works) is contained in its own function
namespace. So, an equivalent problem is:

class C:
    foo = 1
    def bar():
        print(foo)
    bar()

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From pje at telecommunity.com Fri Jul 6 19:25:10 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 06 Jul 2007 13:25:10 -0400 Subject: [Python-3000] Change to class construction? In-Reply-To: <f6lld0$qv3$1@sea.gmane.org> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> Message-ID: <20070706172258.1E65B3A4046@sparrow.telecommunity.com> At 05:00 PM 7/6/2007 +0200, Georg Brandl wrote: >Collin Winter schrieb: > > While experimenting with porting setuptools to py3k (as of r56155), I > > ran into this situation: > > > > class C: > > a = (4, 5) > > b = [c for c in range(2) if a] > > > > results in a "NameError: global name 'a' is not defined" error, while > > > > class C: > > a = (4, 5) > > b = [c for c in a] > > > > works fine. This gives the same error as above: > > > > class C: > > a = (4, 5) > > b = [a for c in range(2)] > > > > Both now-erroneous snippets work in 2.5.1. Was this change intentional? > >It is at least intentional in the sense that in 3k it works the same as with >genexps, which give the same errors in 2.5. This looks like a bug to me. A list comprehension's local scope should be the locals of the enclosing code, even if its loop indexes aren't exposed to that scope. From guido at python.org Sat Jul 7 00:32:15 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 7 Jul 2007 00:32:15 +0200 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070706172258.1E65B3A4046@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> Message-ID: <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com> On 7/6/07, Phillip J. 
Eby <pje at telecommunity.com> wrote: > At 05:00 PM 7/6/2007 +0200, Georg Brandl wrote: > >Collin Winter schrieb: > > > While experimenting with porting setuptools to py3k (as of r56155), I > > > ran into this situation: > > > > > > class C: > > > a = (4, 5) > > > b = [c for c in range(2) if a] > > > > > > results in a "NameError: global name 'a' is not defined" error, while > > > > > > class C: > > > a = (4, 5) > > > b = [c for c in a] > > > > > > works fine. This gives the same error as above: > > > > > > class C: > > > a = (4, 5) > > > b = [a for c in range(2)] > > > > > > Both now-erroneous snippets work in 2.5.1. Was this change intentional? > > > >It is at least intentional in the sense that in 3k it works the same as with > >genexps, which give the same errors in 2.5. > > This looks like a bug to me. A list comprehension's local scope > should be the locals of the enclosing code, even if its loop indexes > aren't exposed to that scope. It's because the class scope is not made available to the methods. That is intentional. Georg's later example is relevant: class C: a = 1 def f(self): print(a) # <-- raises NameError for 'a' This is in turn intentional so that too-clever kids don't develop a habit of referencing class variables without prefixing them with self or C. The OP's use case is rare enough that I don't think we should do anything about it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sat Jul 7 01:36:07 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 06 Jul 2007 19:36:07 -0400 Subject: [Python-3000] Change to class construction? 
In-Reply-To: <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com>
Message-ID: <20070706233354.DA07A3A4046@sparrow.telecommunity.com>

At 12:32 AM 7/7/2007 +0200, Guido van Rossum wrote:
>On 7/6/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> > At 05:00 PM 7/6/2007 +0200, Georg Brandl wrote:
> > >Collin Winter schrieb:
> > > > While experimenting with porting setuptools to py3k (as of r56155), I
> > > > ran into this situation:
> > > >
> > > > class C:
> > > >     a = (4, 5)
> > > >     b = [c for c in range(2) if a]
> > > >
> > > > results in a "NameError: global name 'a' is not defined" error, while
> > > >
> > > > class C:
> > > >     a = (4, 5)
> > > >     b = [c for c in a]
> > > >
> > > > works fine.
> >
> > This looks like a bug to me. A list comprehension's local scope
> > should be the locals of the enclosing code, even if its loop indexes
> > aren't exposed to that scope.
>
>It's because the class scope is not made available to the methods.

The examples are in the class body, not in methods. The code is
statically initializing the class contents, so using C.a isn't possible.

I suppose it can be worked around by moving the static initialization
code outside the class body; it's just not obvious why it happens.

Collin, where did you find this code in setuptools, btw? I've been
looking around at other packages of mine where static class
initialization uses data structures like this, and I haven't found
any place where anything but the "in" clause of a comprehension
depends on class-scope variables. So, if setuptools is the only one
of my libraries that does this, I'd have to agree with Guido that it
is indeed quite rare.
:) If I had to hazard a guess, I'd guess that it's in one of the setuptools command classes that subclasses a distutils command, and proceeds to muck around with the original options in some fashion. I just don't want to check all of them if you know which one it is. :) From guido at python.org Sat Jul 7 01:41:16 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 7 Jul 2007 01:41:16 +0200 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070706233354.DA07A3A4046@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com> <20070706233354.DA07A3A4046@sparrow.telecommunity.com> Message-ID: <ca471dc20707061641l4dd04f07x203f225be2c32cb3@mail.gmail.com> On 7/7/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 12:32 AM 7/7/2007 +0200, Guido van Rossum wrote: > >On 7/6/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > > At 05:00 PM 7/6/2007 +0200, Georg Brandl wrote: > > > >Collin Winter schrieb: > > > > > While experimenting with porting setuptools to py3k (as of r56155), I > > > > > ran into this situation: > > > > > > > > > > class C: > > > > > a = (4, 5) > > > > > b = [c for c in range(2) if a] > > > > > > > > > > results in a "NameError: global name 'a' is not defined" error, while > > > > > > > > > > class C: > > > > > a = (4, 5) > > > > > b = [c for c in a] > > > > > > > > > > works fine. > > > > > > This looks like a bug to me. A list comprehension's local scope > > > should be the locals of the enclosing code, even if its loop indexes > > > aren't exposed to that scope. > > > >It's because the class scope is not made available to the methods. > > The examples are in the class body, not in methods. The code is > statically initializing the class contents, so using C.a isn't possible. 
Understood, but a generator expression (and hence in 3.0 also a list comprehension) is treated the same as a method body. > I suppose it can be worked around by moving the static initialization > code outside the class body; it's just not obvious why it happens. > > Collin, where did you find this code in setuptools, btw? I've been > looking around at other packages of mine where static class > initialization uses data structures like this, and I haven't found > any place where anything but the "in" clause of a comprehension > depends on class-scope variables. So, if setuptools is the only one > of my libraries that does this, I'd have to agree with Guido that it > is indeed quite rare. :) > > If I had to hazard a guess, I'd guess that it's in one of the > setuptools command classes that subclasses a distutils command, and > proceeds to muck around with the original options in some fashion. I > just don't want to check all of them if you know which one it is. :) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sat Jul 7 03:17:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 07 Jul 2007 13:17:39 +1200 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070706172258.1E65B3A4046@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> Message-ID: <468EE9B3.40902@canterbury.ac.nz> Phillip J. Eby wrote: > This looks like a bug to me. A list comprehension's local scope > should be the locals of the enclosing code, even if its loop indexes > aren't exposed to that scope. It sounds like list comprehensions are being implemented using genexps behind the scenes now. Is this wise? In a recent thread, I suggested that one of the reasons for keeping the LC syntax was that it could be faster than list(genexp). 
Has anyone investigated whether any speed is being lost by making them equivalent? -- Greg From g.brandl at gmx.net Sat Jul 7 08:55:12 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 07 Jul 2007 08:55:12 +0200 Subject: [Python-3000] Change to class construction? In-Reply-To: <468EE9B3.40902@canterbury.ac.nz> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468EE9B3.40902@canterbury.ac.nz> Message-ID: <f6ndbn$i0v$1@sea.gmane.org> Greg Ewing schrieb: > Phillip J. Eby wrote: >> This looks like a bug to me. A list comprehension's local scope >> should be the locals of the enclosing code, even if its loop indexes >> aren't exposed to that scope. > > It sounds like list comprehensions are being implemented > using genexps behind the scenes now. That's not true, but the implementation is somewhat similar in that the code is executed in its own function context. > Is this wise? In a recent thread, I suggested that one > of the reasons for keeping the LC syntax was that it > could be faster than list(genexp). Has anyone investigated > whether any speed is being lost by making them equivalent? I don't remember the details, but IIRC the new LC implementation was not slower than the 2.x one. Nick should know more about that. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From ncoghlan at gmail.com Sat Jul 7 16:15:54 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 08 Jul 2007 00:15:54 +1000 Subject: [Python-3000] Change to class construction? 
In-Reply-To: <f6ndbn$i0v$1@sea.gmane.org>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <468EE9B3.40902@canterbury.ac.nz>
  <f6ndbn$i0v$1@sea.gmane.org>
Message-ID: <468FA01A.6040707@gmail.com>

Georg Brandl wrote:
> Greg Ewing schrieb:
>> Phillip J. Eby wrote:
>>> This looks like a bug to me. A list comprehension's local scope
>>> should be the locals of the enclosing code, even if its loop indexes
>>> aren't exposed to that scope.
>> It sounds like list comprehensions are being implemented
>> using genexps behind the scenes now.
>
> That's not true, but the implementation is somewhat similar in that
> the code is executed in its own function context.

Georg is correct. A list comprehension like:

[(x * y) for x in seq1 for y in seq2]

expands to the following in 2.x (% prefixes the compiler's hidden
variables):

%n = []
for x in seq1:
    for y in seq2:
        %n.append(x*y) # Special opcode, not a normal call

In py3k it expands to:

def <anon>(outermost):
    %0 = []
    for x in outermost:
        for y in seq2:
            %0.append(x*y) # Special opcode, not a normal call
    return %0
%n = <anon>(seq1)

Python's scoping rules are somewhat tricky - doing it this way means we
know they are being applied the same way in list and set comprehensions
as they are applied in generator expressions, even if it isn't quite as
fast as the 2.x approach to comprehensions.

Another significant benefit from a maintainability point of view is that
the 3 kinds of comprehension (list, set, genexp) now follow the same
code path through the compiler, with only minor variations in the
setup/cleanup code and the statement inside the innermost loop.

>> Is this wise? In a recent thread, I suggested that one
>> of the reasons for keeping the LC syntax was that it
>> could be faster than list(genexp). Has anyone investigated
>> whether any speed is being lost by making them equivalent?
> > I don't remember the details, but IIRC the new LC implementation > was not slower than the 2.x one. Nick should know more about that. Inside a function, Py3k is slower by a constant amount relative to 2.x (the cost of creating and calling a function object) regardless of the length of the resulting list/set. At module level, Py3k will typically be faster, as the fixed cost from the anonymous function object will be overtaken by the speedup from the iteration variables becoming function locals instead of module globals. The Py3k comprehensions are still significantly faster than the equivalent generator expressions, as they still avoid suspending and resuming a generator for each value in the resulting sequence. The bit that makes all of this tricky isn't really hiding the iteration variables from the containing scope - it's making sure that the body of the comprehension can still see them after you have done so (particularly challenging if the comprehension itself contains a lambda expression, or another comprehension/genexp). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From tjreedy at udel.edu Sat Jul 7 19:08:15 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 7 Jul 2007 13:08:15 -0400 Subject: [Python-3000] Change to class construction? References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468EE9B3.40902@canterbury.ac.nz><f6ndbn$i0v$1@sea.gmane.org> <468FA01A.6040707@gmail.com> Message-ID: <f6oh9v$f40$1@sea.gmane.org> "Nick Coghlan" <ncoghlan at gmail.com> wrote in message news:468FA01A.6040707 at gmail.com... | Georg is correct. 
A list comprehension like:
|
| [(x * y) for x in seq1 for y in seq2]
|
| expands to the following in 2.x (% prefixes the compiler's hidden
| variables):
|
| %n = []
| for x in seq1:
|     for y in seq2:
|         %n.append(x*y) # Special opcode, not a normal call
|
| In py3k it expands to:
|
| def <anon>(outermost):
|     %0 = []
|     for x in outermost:
|         for y in seq2:
|             %0.append(x*y) # Special opcode, not a normal call
|     return %0
| %n = <anon>(seq1)

Why not pass both seq1 *and* seq2 to the function so both become locals?
The difference of treatment is quite surprising.

From tjreedy at udel.edu Sat Jul 7 19:15:55 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 7 Jul 2007 13:15:55 -0400
Subject: [Python-3000] PEP 368: Standard image protocol and class
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
  <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
Message-ID: <f6ohob$gc1$1@sea.gmane.org>

Reference Implementation
========================

If this PEP is accepted, the author will provide a reference
implementation of the new classes in pure Python (that can run in
CPython, PyPy, Jython and IronPython) and a second one optimized for
speed in Python and C, suitable for inclusion in the CPython standard
library. The author will also submit the required Tkinter patches.
All the code will be available in a version for Python 2.x and a
version for Python 3.0 (it is expected that the two versions will be
very similar and the Python 3.0 one will probably be generated almost
completely automatically).

Acknowledgments
===============

The implementation of this PEP, if accepted, is sponsored by Google
through the Google Summer of Code program.

****************************************************
*****************************************************

1. I think this *should* conform to the new buffer protocol. Assume that it
will be in 3.0.

2.
I don't see how the work you promised to do for your stipend can be
contingent on acceptance into the standard lib. In any case, this should
be released at the end of the summer as patches and a 3rd-party module on
PyPI so it can be tested in practice and then proposed for the library.
Very few new library modules get accepted before they are written ;-).

From ncoghlan at gmail.com Sun Jul 8 07:10:16 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 08 Jul 2007 15:10:16 +1000
Subject: [Python-3000] Change to class construction?
In-Reply-To: <f6oh9v$f40$1@sea.gmane.org>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <468EE9B3.40902@canterbury.ac.nz><f6ndbn$i0v$1@sea.gmane.org>
  <468FA01A.6040707@gmail.com>
  <f6oh9v$f40$1@sea.gmane.org>
Message-ID: <469071B8.8030604@gmail.com>

Terry Reedy wrote:
> "Nick Coghlan" <ncoghlan at gmail.com> wrote in message
> news:468FA01A.6040707 at gmail.com...
> | In py3k it expands to:
> |
> | def <anon>(outermost):
> |     %0 = []
> |     for x in outermost:
> |         for y in seq2:
> |             %0.append(x*y) # Special opcode, not a normal call
> |     return %0
> | %n = <anon>(seq1)
>
> Why not pass both seq1 *and* seq2 to the function so both become locals?
> The difference of treatment is quite surprising.

The inner iterable expressions can't be evaluated early, as they need to
be re-evaluated for each pass around the outer loop (or loops).

An example where the iterable expression for the inner loop refers to
the iteration variable of the outer loop should make that clear:

>>> [y for x in range(4) for y in range(x)]
[0, 0, 1, 0, 1, 2]

The advantage of the Py3k approach is that it eliminates the current
semantic differences between a list comprehension and list() with a
generator expression argument, while keeping most of the performance
benefits of the special syntax.

Cheers,
Nick.
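
[Editorial sketch, not part of the original message.] The expansion
described above can be imitated in plain Python to see which names a
comprehension can reach. The helper names `_anon` and
`build_failing_class` are hypothetical, and the sketch assumes a
Python 3 interpreter; it is meant only to illustrate that the
outermost iterable is evaluated in the enclosing scope while
everything else runs in the hidden function scope.

```python
def _anon(outermost):
    # Hand-written stand-in for the compiler's hidden function: only the
    # outermost iterable arrives as an argument, and range(x) is
    # re-evaluated on every pass of the outer loop.
    result = []
    for x in outermost:
        for y in range(x):
            result.append(y)
    return result


expanded = _anon(iter(range(4)))
direct = [y for x in range(4) for y in range(x)]


class Works:
    a = (4, 5)
    b = [c for c in a]  # 'a' is the outermost iterable, evaluated in the class body


def build_failing_class():
    class Fails:
        a = (4, 5)
        b = [a for c in range(2)]  # 'a' is looked up from inside the hidden scope
    return Fails


try:
    build_failing_class()
except NameError as exc:
    failure = exc       # class-level 'a' is not visible to the comprehension body
else:
    failure = None
```

Moving the comprehension out of the class body, or using an explicit
for loop inside it, avoids the lookup problem.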
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From collinw at gmail.com Sun Jul 8 10:55:45 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 8 Jul 2007 11:55:45 +0300 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070706233354.DA07A3A4046@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com> <20070706233354.DA07A3A4046@sparrow.telecommunity.com> Message-ID: <43aa6ff70707080155r820ec31t595442753817f0ae@mail.gmail.com> On 7/7/07, Phillip J. Eby <pje at telecommunity.com> wrote: > Collin, where did you find this code in setuptools, btw? I've been > looking around at other packages of mine where static class > initialization uses data structures like this, and I haven't found > any place where anything but the "in" clause of a comprehension > depends on class-scope variables. So, if setuptools is the only one > of my libraries that does this, I'd have to agree with Guido that it > is indeed quite rare. :) > > If I had to hazard a guess, I'd guess that it's in one of the > setuptools command classes that subclasses a distutils command, and > proceeds to muck around with the original options in some fashion. I > just don't want to check all of them if you know which one it is. :) Yep, it's in setuptools.command.install, lines 20-23 (setuptools v0.6c6). Collin Winter From ferringb at gmail.com Sun Jul 8 16:07:42 2007 From: ferringb at gmail.com (Brian Harring) Date: Sun, 8 Jul 2007 07:07:42 -0700 Subject: [Python-3000] Change to class construction? 
In-Reply-To: <f6oh9v$f40$1@sea.gmane.org>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <468FA01A.6040707@gmail.com>
  <f6oh9v$f40$1@sea.gmane.org>
Message-ID: <20070708140741.GD23765@seldon>

On Sat, Jul 07, 2007 at 01:08:15PM -0400, Terry Reedy wrote:
>
> "Nick Coghlan" <ncoghlan at gmail.com> wrote in message
> news:468FA01A.6040707 at gmail.com...
> | Georg is correct. A list comprehension like:
> |
> | [(x * y) for x in seq1 for y in seq2]
> |
> | expands to the following in 2.x (% prefixes the compiler's hidden
> | variables):
> |
> | %n = []
> | for x in seq1:
> |     for y in seq2:
> |         %n.append(x*y) # Special opcode, not a normal call
> |
> | In py3k it expands to:
> |
> | def <anon>(outermost):
> |     %0 = []
> |     for x in outermost:
> |         for y in seq2:
> |             %0.append(x*y) # Special opcode, not a normal call
> |     return %0
> | %n = <anon>(seq1)
>
> Why not pass both seq1 *and* seq2 to the function so both become locals?
> The difference of treatment is quite surprising.

I'd be curious if there is any way to preserve the existing behaviour;

class foo:
    some_list = ('blacklist1', 'blacklist2')
    known_bad = some_list + ('blah',)
    locals().update([(attr, some_callable) for attr in some_list])

is slightly contrived, but I use similar code quite often for method
generation- both for tests, and standard enough objects. Realize I
could do the same via metaclasses, but it's an extra step and not
nearly as easy/friendly imo.

So... any way to preserve that trick under py3k?

~harring
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070708/565e6b76/attachment.pgp

From pje at telecommunity.com Sun Jul 8 19:50:30 2007
From: pje at telecommunity.com (Phillip J.
Eby)
Date: Sun, 08 Jul 2007 13:50:30 -0400
Subject: [Python-3000] Change to class construction?
In-Reply-To: <43aa6ff70707080155r820ec31t595442753817f0ae@mail.gmail.com>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <ca471dc20707061532k5a636a49r57ab383dad81fae3@mail.gmail.com>
  <20070706233354.DA07A3A4046@sparrow.telecommunity.com>
  <43aa6ff70707080155r820ec31t595442753817f0ae@mail.gmail.com>
Message-ID: <20070708174816.E26A33A404D@sparrow.telecommunity.com>

At 11:55 AM 7/8/2007 +0300, Collin Winter wrote:
>On 7/7/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>>Collin, where did you find this code in setuptools, btw? I've been
>>looking around at other packages of mine where static class
>>initialization uses data structures like this, and I haven't found
>>any place where anything but the "in" clause of a comprehension
>>depends on class-scope variables. So, if setuptools is the only one
>>of my libraries that does this, I'd have to agree with Guido that it
>>is indeed quite rare. :)
>>
>>If I had to hazard a guess, I'd guess that it's in one of the
>>setuptools command classes that subclasses a distutils command, and
>>proceeds to muck around with the original options in some fashion. I
>>just don't want to check all of them if you know which one it is. :)
>
>Yep, it's in setuptools.command.install, lines 20-23 (setuptools v0.6c6).

Ah. Yeah, no big deal to change it; 'new_commands' and '_nc' don't
need to be attributes of the class, and so could just be done before
the 'class:' statement. I don't know why I even bothered with the
_nc thing there, either.

From ncoghlan at gmail.com Mon Jul 9 13:03:15 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 09 Jul 2007 21:03:15 +1000
Subject: [Python-3000] Change to class construction?
In-Reply-To: <20070708140741.GD23765@seldon>
References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com>
  <f6lld0$qv3$1@sea.gmane.org>
  <20070706172258.1E65B3A4046@sparrow.telecommunity.com>
  <468FA01A.6040707@gmail.com>
  <f6oh9v$f40$1@sea.gmane.org>
  <20070708140741.GD23765@seldon>
Message-ID: <469215F3.90807@gmail.com>

Brian Harring wrote:
>
> I'd be curious if there is any way to preserve the existing behaviour;
>
> class foo:
>     some_list = ('blacklist1', 'blacklist2')
>     known_bad = some_list + ('blah',)
>     locals().update([(attr, some_callable) for attr in some_list])
>
> is slightly contrived, but I use similar code quite often for method
> generation- both for tests, and standard enough objects. Realize I
> could do the same via metaclasses, but it's an extra step and not
> nearly as easy/friendly imo.
>
> So... any way to preserve that trick under py3k?

As you've written it, that trick isn't affected by the semantic change
at all (as the expression inside the list comprehension doesn't try to
refer to a class variable).

If 'some_callable' was actually a method of the class, then you'd need
to use an actual for loop instead of the list comprehension:

class foo(object):
    some_list = ('blacklist1', 'blacklist2')
    def some_method(self):
        # whatever
        pass
    for attr in some_list:
        locals()[attr] = some_method

However, I will point out that setting class attributes via locals() is
formally undefined (it happens to work in current versions of CPython,
but there's no guarantee that will always be the case).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From pje at telecommunity.com Mon Jul 9 17:07:06 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 09 Jul 2007 11:07:06 -0400
Subject: [Python-3000] Change to class construction?
In-Reply-To: <469215F3.90807@gmail.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> Message-ID: <20070709150454.641193A404D@sparrow.telecommunity.com> At 09:03 PM 7/9/2007 +1000, Nick Coghlan wrote: >However, I will point out that setting class attributes via locals() is >formally undefined (it happens to work in current versions of CPython, >but there's no guarantee that will always be the case). As of PEP 3115, it's no longer undefined for class statements. Of course, if it were truly undefined to begin with, we wouldn't be so worried about how to implement the potential optimizations that the undefinedness theoretically implies. :) (i.e. optimized globals/locals) From guido at python.org Mon Jul 9 17:13:46 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 9 Jul 2007 18:13:46 +0300 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070709150454.641193A404D@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> Message-ID: <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> On 7/9/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 09:03 PM 7/9/2007 +1000, Nick Coghlan wrote: > >However, I will point out that setting class attributes via locals() is > >formally undefined (it happens to work in current versions of CPython, > >but there's no guarantee that will always be the case). > > As of PEP 3115, it's no longer undefined for class statements. Where does it say so? 
To be honest, I don't know where to find Nick's claim in the reference manual. But I'm surprised that you read anything about locals() into that PEP, as it doesn't mention that function at all. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Mon Jul 9 18:03:28 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 09 Jul 2007 12:03:28 -0400 Subject: [Python-3000] Change to class construction? In-Reply-To: <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> Message-ID: <20070709160115.16C323A404D@sparrow.telecommunity.com> At 06:13 PM 7/9/2007 +0300, Guido van Rossum wrote: >On 7/9/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > At 09:03 PM 7/9/2007 +1000, Nick Coghlan wrote: > > >However, I will point out that setting class attributes via locals() is > > >formally undefined (it happens to work in current versions of CPython, > > >but there's no guarantee that will always be the case). > > > > As of PEP 3115, it's no longer undefined for class statements. > >Where does it say so? To be honest, I don't know where to find Nick's >claim in the reference manual. I assume Nick is referring to: http://www.python.org/doc/2.2/ref/execframes.html which says it's undefined.
I can't seem to find where this section went to in 2.3 and beyond, or anything that says what happens with non-dictionary objects, except: http://docs.python.org/ref/exec.html which makes a much stronger claim: "The built-in functions globals() and locals() return the current global and local dictionary, respectively" and also states that as of 2.4, exec allows the use of any mapping object as the locals. There isn't any mention of the fact that locals() may not be writable, which should probably be considered an error. >But I'm surprised that you read >anything about locals() into that PEP, as it doesn't mention that >function at all. Correct -- which means that either the PEP is in error, or the semantics of locals() must be that the actual namespace in use is returned. My reasoning: since PEP 3115 allows an arbitrary mapping object to be used, there is no way that such an object can be converted to a read-only dictionary, and the current definition (as I understand it) is that locals() returns you either the actual local namespace object, or a "dictionary representing the ... namespace" (per the reference manual). Since PEP 3115 does not require that there be any way of converting the arbitrary mapping object into a dictionary (or even that there be any pre-defined way of *reading* its contents!) there is no way that locals() can fulfill its existing contract *except* by returning that object. QED. Well, that's the spelled-out reasoning for my intuition, anyway. :) That doesn't mean the PEP or the specification of locals() can't change, but it seems to me that if one or the other doesn't, then modifying class-suite locals() to create class members implicitly becomes official, since the failure for it to do so would become a bug in locals(). (Since it will no longer be returning a "dictionary representing the namespace" if it doesn't return that mapping object, and can't possibly return anything else that "represents" the namespace in any meaningful way.) 
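The argument above can be made concrete with a small sketch. It is written for a present-day CPython 3 interpreter, not the 2007 py3k branch, and it only shows what that implementation happens to do; it does not settle the specification question debated in this thread. The names RecordingNamespace, Meta, Foo and some_callable are hypothetical and chosen for illustration only.

```python
class RecordingNamespace(dict):
    """Hypothetical class-body namespace that records every name bound in it."""

    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        self.order.append(key)
        super().__setitem__(key, value)


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # PEP 3115 hook: the object returned here is used as the
        # namespace in which the class body is executed.
        return RecordingNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls._definition_order = list(namespace.order)
        return cls


def some_callable(self):
    return "generated"


class Foo(metaclass=Meta):
    some_list = ('blacklist1', 'blacklist2')
    # Assumption being illustrated: in CPython 3, locals() inside the class
    # body is the very object returned by __prepare__.
    sees_prepared_namespace = isinstance(locals(), RecordingNamespace)
    for attr in some_list:
        locals()[attr] = some_callable
    del attr
```

If the assumption holds, Foo ends up with methods named blacklist1 and blacklist2, and the writes made through locals() show up in the recorded order just like ordinary assignments in the class body.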
From pje at telecommunity.com Mon Jul 9 20:44:09 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 09 Jul 2007 14:44:09 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 Message-ID: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> PEP 3100 suggests dict.setdefault() may be removed in Python 3, since it is in principle no longer necessary (due to the new defaultdict type). However, there is another class of use cases which use setdefault for its limited atomic properties - the initialization of non-mutated data structures that are shared among threads. (And defaultdict cannot achieve the same thing.) I currently have three places where I use this, off the top of my head: 1. a "synchronized" decorator that initializes an object's __lock__ attribute (if not found) using ob.__dict__.setdefault('__lock__', allocate_lock()) 2. an Aspect implementation that does almost exactly the same thing, so that if multiple threads ask for an Aspect that doesn't exist for a given object, they will not end up using different instances. 3. a configuration library that supports "write many, read once" configurations shared across threads. A key may have its value written to any number of times, so long as it has never been read. As soon as the value has been read by any thread, it becomes fixed and it cannot be set to any other value. (Setting it to the same value has no effect.) This is essentially a simple way of having a provably race-condition-free data structure -- if you have a race condition, you will get an error. As a bonus, it is completely non-blocking and single threaded code does not pay any overhead for the use of the data structures. Of course, to take advantage of setdefault's atomic properties, one must be using CPython, and all the dictionary keys must have __hash__ and __eq__ methods implemented entirely in C (recursively to their contents, if tuples are involved). 
However, for all three of the above applications this latter condition is actually quite trivial to ensure. I realize, however, that this is an "impure" usage, in that other Python implementations usually do not have any atomicity guarantees, period. But it would save me having to write a setdefault function in C when porting any of the above code to 3.0. ;-) From tav at espians.com Mon Jul 9 20:59:11 2007 From: tav at espians.com (tav) Date: Mon, 9 Jul 2007 19:59:11 +0100 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> Message-ID: <95d8c0810707091159k657f0fe1k5aa4eb9a5a4c96c4@mail.gmail.com> > PEP 3100 suggests dict.setdefault() may be removed in Python 3, since > it is in principle no longer necessary (due to the new defaultdict type). > > However, there is another class of use cases which use setdefault for > its limited atomic properties - the initialization of non-mutated > data structures that are shared among threads. (And defaultdict > cannot achieve the same thing.) +1 setdefault's ability to return current value is also a very useful functionality and has saved writing: if key not in dict: value = <compute-value> dict[key] = value with the simpler: value = dict.setdefault(key, <compute-value>) Is there a better way to do the above without .setdefault? -- love, tav founder and ceo, esp metanational llp plex:espians/tav | tav at espians.com | +44 (0) 7809 569 369 From pje at telecommunity.com Mon Jul 9 21:17:12 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 09 Jul 2007 15:17:12 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <95d8c0810707091159k657f0fe1k5aa4eb9a5a4c96c4@mail.gmail.co m> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <95d8c0810707091159k657f0fe1k5aa4eb9a5a4c96c4@mail.gmail.com> Message-ID: <20070709191500.4C4033A404D@sparrow.telecommunity.com> At 07:59 PM 7/9/2007 +0100, tav wrote: >>PEP 3100 suggests dict.setdefault() may be removed in Python 3, since >>it is in principle no longer necessary (due to the new defaultdict type). >> >>However, there is another class of use cases which use setdefault for >>its limited atomic properties - the initialization of non-mutated >>data structures that are shared among threads. (And defaultdict >>cannot achieve the same thing.) > >+1 > >setdefault's ability to return current value is also a very useful >functionality and has saved writing: > > if key not in dict: > value = <compute-value> > dict[key] = value > >with the simpler: > > value = dict.setdefault(key, <compute-value>) > >Is there a better way to do the above without .setdefault? Yes, in 2.5 there's collections.defaultdict. Of course, that only works if there is a fixed mapping from keys to initial computed values for the entire dictionary for all time. Oh, and if your code gets to create the dictionary. :) From barry at python.org Mon Jul 9 21:35:50 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 9 Jul 2007 15:35:50 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> Message-ID: <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 9, 2007, at 2:44 PM, Phillip J. Eby wrote: > PEP 3100 suggests dict.setdefault() may be removed in Python 3, since > it is in principle no longer necessary (due to the new defaultdict > type). 
> > However, there is another class of use cases which use setdefault for > its limited atomic properties - the initialization of non-mutated > data structures that are shared among threads. (And defaultdict > cannot achieve the same thing.) Phillip, I support any initiative to keep .setdefault() or similar functionality. When this thread came up before, I wasn't against defaultdict, I just didn't think it covered enough of the use cases of .setdefault() to warrant its removal. You describe some additional use cases. However, .setdefault() is a horrible name because it's not clear from the name that a 'get' operation also happens. It occurs to me that I haven't reached my stupid idea quota for the day, so here goes. What if we ditched .setdefault() as a name and gave .get() an optional argument to also set the key's value when it's missing. class dict2(dict): """ >>> d = dict2() >>> d.setdefault('foo', []).append(7) >>> sorted(d.items()) [('foo', [7])] >>> d.setdefault('foo', []).append(8) >>> sorted(d.items()) [('foo', [7, 8])] >>> d.get('bar', [], set_missing=True).append(9) >>> sorted(d.items()) [('bar', [9]), ('foo', [7, 8])] >>> d.get('bar', [], True).append(10) >>> sorted(d.items()) [('bar', [9, 10]), ('foo', [7, 8])] """ def get(self, key, default=None, set_missing=False): missing = object() value = super(dict2, self).get(key, missing) if value is not missing: return value if set_missing: self[key] = default return default This more or less conveys that both a get and a set operation is happening. It also doesn't violate the rule against letting an argument change the return type of a function. Maybe it will make this useful functionality more palatable. 
Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRpKOGHEjvBPtnXfVAQJIxwP9Ev7aASfVOw3q1aiCZ3Pr4VsQwzmeb0SR 4xJR9VvAZVcsjL4wAaleU55vFir9fBnFkvEnMMRFOBJ49NtS6EuLt+yGkt22gadg TSlfNK0t4oVeFT4MJ6AebaHwBL8PvILAbV5eJ6x3H0hH383rdcdtrRyFzvhKnBRy tPqtjIZlU6Q= =WxDp -----END PGP SIGNATURE----- From guido at python.org Mon Jul 9 22:56:08 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 9 Jul 2007 23:56:08 +0300 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070709160115.16C323A404D@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> Message-ID: <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> On 7/9/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 06:13 PM 7/9/2007 +0300, Guido van Rossum wrote: > >On 7/9/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > > At 09:03 PM 7/9/2007 +1000, Nick Coghlan wrote: > > > >However, I will point out that setting class attributes via locals() is > > > >formally undefined (it happens to work in current versions of CPython, > > > >but there's no guarantee that will always be the case). > > > > > > As of PEP 3115, it's no longer undefined for class statements. > > > >Where does it say so? To be honest, I don't know where ti find Nick's > >claim in the reference manual. > > I assume Nick is referring to: > > http://www.python.org/doc/2.2/ref/execframes.html > > which says it's undefined. 
I can't seem to find where this section > went to in 2.3 and beyond, or anything that says what happens with > non-dictionary objects, except: > > http://docs.python.org/ref/exec.html > > which makes a much stronger claim: > > "The built-in functions globals() and locals() return the current > global and local dictionary, respectively" > > and also states that as of 2.4, exec allows the use of any mapping > object as the locals. There isn't any mention of the fact that > locals() may not be writable, which should probably be considered an error. > > > >But I'm surprised that you read > >anything about locals() into that PEP, as it doesn't mention that > >function at all. > > Correct -- which means that either the PEP is in error, or the > semantics of locals() must be that the actual namespace in use is returned. > > My reasoning: since PEP 3115 allows an arbitrary mapping object to be > used, there is no way that such an object can be converted to a > read-only dictionary, and the current definition (as I understand it) > is that locals() returns you either the actual local namespace > object, or a "dictionary representing the ... namespace" (per the > reference manual). > > Since PEP 3115 does not require that there be any way of converting > the arbitrary mapping object into a dictionary (or even that there be > any pre-defined way of *reading* its contents!) there is no way that > locals() can fulfill its existing contract *except* by returning that object. > > QED. Well, that's the spelled-out reasoning for my intuition, > anyway. :) That doesn't mean the PEP or the specification of > locals() can't change, but it seems to me that if one or the other > doesn't, then modifying class-suite locals() to create class members > implicitly becomes official, since the failure for it to do so would > become a bug in locals(). 
(Since it will no longer be returning a > "dictionary representing the namespace" if it doesn't return that > mapping object, and can't possibly return anything else that > "represents" the namespace in any meaningful way.) Python's specification isn't as rigid as it should be, and such a "proof" isn't worth much, especially as the reference manual hasn't always been updated as things changed. The use of the word "mapping" might easily be construed as implementing abc.Mapping, and then iteration and reading the contents would be well-defined. The weasel-words about "a dictionary representing the namespace" are meant to cover the situation for a function's local scope, which isn't stored in a mapping-like object at all until you use exec() or locals(), or a few others. We could easily change this to return a writable mapping that's not a dict at all but a "view" on the locals just as dict.keys() returns a view on a dict. I don't see why locals() couldn't return the object used to represent the namespace, but I don't see that it couldn't be some view on that object either, depending on the details of the implementation. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 9 23:01:15 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 10 Jul 2007 00:01:15 +0300 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <95d8c0810707091159k657f0fe1k5aa4eb9a5a4c96c4@mail.gmail.com> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <95d8c0810707091159k657f0fe1k5aa4eb9a5a4c96c4@mail.gmail.com> Message-ID: <ca471dc20707091401w43103abfi67d222388aee81c6@mail.gmail.com> On 7/9/07, tav <tav at espians.com> wrote: > setdefault's ability to return current value is also a very useful > functionality and has saved writing: > > if key not in dict: > value = <compute-value> > dict[key] = value > > with the simpler: > > value = dict.setdefault(key, <compute-value>) > > Is there a better way to do the above without .setdefault? Those are not equivalent, as the form using setdefault() *always* evaluates <compute-value> while the other form only evaluates it when needed. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brandon at rhodesmill.org Mon Jul 9 23:01:38 2007 From: brandon at rhodesmill.org (Brandon Craig Rhodes) Date: Mon, 09 Jul 2007 17:01:38 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> (Barry Warsaw's message of "Mon, 9 Jul 2007 15:35:50 -0400") References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> Message-ID: <87odilnvf1.fsf@ten22.rhodesmill.org> Barry Warsaw <barry at python.org> writes: > However, .setdefault() is a horrible name because it's not clear > from the name that a 'get' operation also happens. Agreed! From the name, a clever but naive user would assume that "setdefault" sets what value the dictionary returns when a key does not exist. 
On first encountering the name, one imagines: >>> d = {} >>> d[1] KeyError: 1 >>> d.setdefault('missing') >>> d[1] 'missing' -- Brandon Craig Rhodes brandon at rhodesmill.org http://rhodesmill.org/brandon From guido at python.org Mon Jul 9 23:04:56 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 10 Jul 2007 00:04:56 +0300 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> Message-ID: <ca471dc20707091404u40c1a8eek5cb122a74edbf1b@mail.gmail.com> On 7/9/07, Barry Warsaw <barry at python.org> wrote: > Phillip, I support any initiative to keep .setdefault() or similar > functionality. When this thread came up before, I wasn't against > defaultdict, I just didn't think it covered enough of the use cases > of .setdefault() to warrant its removal. You describe some > additional use cases. > > However, .setdefault() is a horrible name because it's not clear from > the name that a 'get' operation also happens. We had a long name discussion when it was introduced. Perhaps we can go back to the list suggested then and see if a better alternative was overlooked? > It occurs to me that I haven't reached my stupid idea quota for the > day, so here goes. What if we ditched .setdefault() as a name and > gave .get() an optional argument to also set the key's value when > it's missing. > [...] > def get(self, key, default=None, set_missing=False): > missing = object() > value = super(dict2, self).get(key, missing) > if value is not missing: > return value > if set_missing: > self[key] = default > return default > > This more or less conveys that both a get and a set operation is > happening. It also doesn't violate the rule against letting an > argument change the return type of a function. Maybe it will make > this useful functionality more palatable. 
But it does violate the rule that if you have a boolean flag to indicate a "variant" of an API and in practice you'll always be passing a constant for that flag, you're better off defining two methods with different names. Although if the return type isn't different, the semantics are certainly *very* different here. So I'm strongly against this. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Mon Jul 9 23:29:04 2007 From: python at rcn.com (Raymond Hettinger) Date: Mon, 9 Jul 2007 14:29:04 -0700 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> Message-ID: <002a01c7c270$42de5060$a389763f@RaymondLaptop1> > PEP 3100 suggests dict.setdefault() may be removed in Python 3, since > it is in principle no longer necessary (due to the new defaultdict type). I've forgotten. What was the whole point of Python 3.0? Is it to make the language fat with lots of ways to do everything? Guys, this is your ONE chance to slim down the language and pare away anything that is unnecessary or arcane. The setdefault() method has too many defects to keep around. Why would you want a method that instantiates the default on every call even if not needed. Let this one die. The dict API already heavily loaded. Thinning it a bit would be a nice improvement. Raymond From fumanchu at amor.org Mon Jul 9 23:55:41 2007 From: fumanchu at amor.org (Robert Brewer) Date: Mon, 9 Jul 2007 14:55:41 -0700 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <002a01c7c270$42de5060$a389763f@RaymondLaptop1> Message-ID: <435DF58A933BA74397B42CDEB8145A860DBCAFEA@ex9.hostedexchange.local> Raymond Hettinger wrote: > > PEP 3100 suggests dict.setdefault() may be removed in > > Python 3, since it is in principle no longer necessary > > (due to the new defaultdict type). > > I've forgotten. What was the whole point of Python 3.0? 
> Is it to make the language fat with lots of ways to do everything? > Guys, this is your ONE chance to slim down the language and > pare away anything that is unnecessary or arcane. > > The setdefault() method has too many defects to keep around. > Why would you want a method that instantiates the default on > every call even if not needed. > > Let this one die. The dict API already heavily loaded. Thinning > it a bit would be a nice improvement. I have to agree, even though it means more work for me (due to my own heavy use of setdefault for its atomicity). Perhaps a better resolution for these use cases would be a stdlib module which would provide fast, thread-safe collections. This would standardize, across implementations, some of the CPython behaviors we've come to rely on. It would also make it clear that the given type is being used specifically for its thread-safety. Robert Brewer System Architect Amor Ministries fumanchu at amor.org From barry at python.org Tue Jul 10 00:14:33 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 9 Jul 2007 18:14:33 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <ca471dc20707091404u40c1a8eek5cb122a74edbf1b@mail.gmail.com> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> <ca471dc20707091404u40c1a8eek5cb122a74edbf1b@mail.gmail.com> Message-ID: <9D661F09-FBD2-4C5D-9F90-2DDC476767D7@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 9, 2007, at 5:04 PM, Guido van Rossum wrote: > On 7/9/07, Barry Warsaw <barry at python.org> wrote: >> Phillip, I support any initiative to keep .setdefault() or similar >> functionality. When this thread came up before, I wasn't against >> defaultdict, I just didn't think it covered enough of the use cases >> of .setdefault() to warrant its removal. You describe some >> additional use cases.
>> >> However, .setdefault() is a horrible name because it's not clear from >> the name that a 'get' operation also happens. > > We had a long name discussion when it was introduced. Perhaps we can > go back to the list suggested then and see if a better alternative was > overlooked? Don't look here because some big dummy contradicts himself seven years later: http://mail.python.org/pipermail/python-dev/2000-August/007819.html hmm-put()-ly y'rs, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRpKzSXEjvBPtnXfVAQKRmQP8DZDYKFOhOjYvtf+OkmmgAnwWaOI5tpPv kHHxtMGPdgEM3cXAdT0U5m04W1IUmMKBItV/JE4qGO4OdD0eFIUPaZBufVUIIg3b 230qJnamVWrzZ/uRUhgDK363Kt2NstrxKce+kX37FPy2qHUSu3RMiBpzx9NJBW8I P3rjaqYZycg= =cU+w -----END PGP SIGNATURE----- From barry at python.org Tue Jul 10 00:17:08 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 9 Jul 2007 18:17:08 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <002a01c7c270$42de5060$a389763f@RaymondLaptop1> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <002a01c7c270$42de5060$a389763f@RaymondLaptop1> Message-ID: <E792E098-68C1-43B6-A1C0-F9D37A2C853F@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Jul 9, 2007, at 5:29 PM, Raymond Hettinger wrote: >> PEP 3100 suggests dict.setdefault() may be removed in Python 3, since >> it is in principle no longer necessary (due to the new defaultdict >> type). > > I've forgotten. What was the whole point of Python 3.0? > Is it to make the language fat with lots of ways to do everything? > Guys, this is your ONE chance to slim down the language and > pare away anything that is unnecessary or arcane. > > The setdefault() method has too many defects to keep around. > Why would you want a method that instantiates the default on > every call even if not needed. Um, like .get()? > Let this one die. The dict API already heavily loaded. Thinning > it a bit would be a nice improvement. 
Unless you remove something useful. The problem with setdefault() isn't what it does, it's the name. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRpKz5HEjvBPtnXfVAQKV4gP+Ntpkcmo9Yx0d0CvPuGen1E78RLGVquhm wtaGY2OHsQk8Fq+5DSLdTLQcqba5Ru8kToxcFG+FbKuul7xvN+yFJ4yfFzBKvp6z CLwE+GkP6v/zC/W1hJ0zkd/0zWE4tPp5Egmug5BhZ6n2ZkwX2ExCfq2jMXf/xmsV cmu7z3TWQXI= =BzxB -----END PGP SIGNATURE----- From pje at telecommunity.com Tue Jul 10 02:13:56 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 09 Jul 2007 20:13:56 -0400 Subject: [Python-3000] Change to class construction? In-Reply-To: <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com > References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> Message-ID: <20070710001144.3B26A3A404D@sparrow.telecommunity.com> At 11:56 PM 7/9/2007 +0300, Guido van Rossum wrote: > The use of the word "mapping" >might easily be construed as implementing abc.Mapping, and then >iteration and reading the contents would be well-defined. I'm not sure which use of the word "mapping" you're talking about. PEP 3115 is explicit that there is no specific requirements for the __prepare__()'d namespace; it just mentions some things that might be useful to have in such an object. So, in order to replace it with a view or something, we'd want to change the PEP to explicitly document what is required. Personally, I'd just as soon make it explicitly official that locals() in a class suite gives you the __prepare__()'d object, whatever it is. 
If a given Python implementation can support PEP 3115 in the first place, then it clearly knows what object to return. ;-) From pje at telecommunity.com Tue Jul 10 02:21:44 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 09 Jul 2007 20:21:44 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <ca471dc20707091404u40c1a8eek5cb122a74edbf1b@mail.gmail.com > References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> <ca471dc20707091404u40c1a8eek5cb122a74edbf1b@mail.gmail.com> Message-ID: <20070710001930.07D653A404D@sparrow.telecommunity.com> At 12:04 AM 7/10/2007 +0300, Guido van Rossum wrote: >On 7/9/07, Barry Warsaw <barry at python.org> wrote: >>Phillip, I support any initiative to keep .setdefault() or similar >>functionality. When this thread came up before, I wasn't against >>defaultdict, I just didn't think it covered enough of the use cases >>of .setdefault() to warrant its removal. You describe some >>additional use cases. >> >>However, .setdefault() is a horrible name because it's not clear from >>the name that a 'get' operation also happens. > >We had a long name discussion when it was introduced. Perhaps we can >go back to the list suggested then and see if a better alternative was >overlooked? Personally, for my use cases it wouldn't matter if it didn't return a value, because I'm not using it to shorten the code. So if you took away the return value and left the name (or changed it to something clearer), that'd be okay by me. The alternative, of course, is as Robert suggested, to just write some library code to deal with this and similar issues. If I have to import setdefault from somewhere to use it (ala the heapq.* functions), that's fine by me too, as long as it's still able to be atomic. That approach might also address Raymond's desire to narrow the dictionary object API. 
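Two points from this thread can be seen side by side in a short sketch: the lock-initialization idiom Phillip describes, and Guido's observation that setdefault() always evaluates its default argument. This is a minimal illustration for Python 3, not the code from any of the libraries mentioned; Resource, make_lock and get_lock are hypothetical names, and threading.Lock() stands in for the allocate_lock() call in the original message. Whether the setdefault() call is atomic is a CPython implementation detail under the conditions Phillip lists, and the sketch does not test that.

```python
import threading

factory_calls = []


def make_lock():
    # Record each call so the eager evaluation of the default is visible.
    factory_calls.append(1)
    return threading.Lock()


class Resource:
    """Hypothetical object that should own exactly one lock."""


def get_lock(ob):
    # The idiom from the thread: store a lock if none exists yet and
    # return whichever lock ended up in the instance dictionary.
    return ob.__dict__.setdefault('__lock__', make_lock())


resource = Resource()
first = get_lock(resource)
second = get_lock(resource)
```

Both calls return the same lock object, yet make_lock() runs on each call; the second lock is created and immediately discarded, which is the cost Raymond and Guido point to.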
From rrr at ronadam.com Tue Jul 10 02:21:31 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 09 Jul 2007 19:21:31 -0500 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <DE56AE10-C634-4185-B89D-C513FCE1F877@python.org> Message-ID: <4692D10B.5040902@ronadam.com>

Barry Warsaw wrote:
> However, .setdefault() is a horrible name because it's not clear from
> the name that a 'get' operation also happens.

The return value of .setdefault() could be changed to None, then the name would be correct. And then a helper function could fill the current use case of returning the added object at the same time.

    >>> d = {}
    >>> def setget(setter, getter, vars):
    ...     setter(*vars)
    ...     return getter(*vars)
    ...
    >>> setget(d.setdefault, d.get, ('foo', [])).append(7)
    >>> d
    {'foo': [7]}
    >>> setget(d.setdefault, d.get, ('foo', [])).append(8)
    >>> d
    {'foo': [7, 8]}

Now if this could be made to be more general so it worked with other objects it might really be useful. ;-)

Cheers, Ron

From rrr at ronadam.com Tue Jul 10 02:40:18 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 09 Jul 2007 19:40:18 -0500 Subject: [Python-3000] Change to class construction?
In-Reply-To: <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> Message-ID: <4692D572.4080401@ronadam.com> Guido van Rossum wrote: > We could easily change this to return a > writable mapping that's not a dict at all but a "view" on the locals > just as dict.keys() returns a view on a dict. I don't see why locals() > couldn't return the object used to represent the namespace, but I > don't see that it couldn't be some view on that object either, > depending on the details of the implementation. This sounds great! I just recently wanted to pass a namespace to exec, but it refuses to accept anything but a dictionary for a local name space. What I really want to do is pass an object as the local namespace and have exec() use it with its properties intact. Passing obj.__dict__ doesn't work in this case. Cheers, Ron From pje at telecommunity.com Tue Jul 10 02:48:54 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 09 Jul 2007 20:48:54 -0400 Subject: [Python-3000] Change to class construction? 
In-Reply-To: <4692D572.4080401@ronadam.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> <4692D572.4080401@ronadam.com> Message-ID: <20070710004640.93BD83A40A4@sparrow.telecommunity.com> At 07:40 PM 7/9/2007 -0500, Ron Adam wrote: >Guido van Rossum wrote: > >>We could easily change this to return a >>writable mapping that's not a dict at all but a "view" on the locals >>just as dict.keys() returns a view on a dict. I don't see why locals() >>couldn't return the object used to represent the namespace, but I >>don't see that it couldn't be some view on that object either, >>depending on the details of the implementation. > >This sounds great! I just recently wanted to pass a namespace to >exec, but it refuses to accept anything but a dictionary for a local >name space. You can already do that in Python 2.4. >What I really want to do is pass an object as the local >namespace. And have the exec() use it complete with its properties >intact. Passing obj.__dict__ doesn't work in this case. You need a wrapper, e.g.:

    class AttrMap(object):
        def __init__(self, ob):
            self.ob = ob
        def __getitem__(self, key):
            try: return getattr(self.ob, key)
            except AttributeError: raise KeyError, key
        # setitem, delitem, etc...
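A fuller sketch of the wrapper idea above, written in Python 3 syntax (the `raise KeyError, key` form in the message is Python 2 only). The `__setitem__` and `__delitem__` bodies and the `Point` class are illustrative assumptions; the original message leaves those methods unspecified.

```python
class AttrMap:
    """Expose an object's attributes (including properties) as a mapping."""

    def __init__(self, ob):
        self.ob = ob

    def __getitem__(self, key):
        try:
            return getattr(self.ob, key)
        except AttributeError:
            # exec() expects KeyError so it can fall back to globals/builtins.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        setattr(self.ob, key, value)

    def __delitem__(self, key):
        try:
            delattr(self.ob, key)
        except AttributeError:
            raise KeyError(key) from None


class Point:
    """Hypothetical object with a property, used only for the demo."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def norm2(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
# Names read inside the code resolve through the wrapper; assignments
# become attributes on the wrapped object.
exec("total = norm2 + 1\nx = 10", {}, AttrMap(p))
print(p.total, p.x)
```

Because lookups go through `getattr`, the property `norm2` is evaluated on access, which is what passing `obj.__dict__` cannot provide.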
From greg.ewing at canterbury.ac.nz Tue Jul 10 03:16:20 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 10 Jul 2007 13:16:20 +1200 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> Message-ID: <4692DDE4.9090707@canterbury.ac.nz> Phillip J. Eby wrote: > However, there is another class of use cases which use setdefault for > its limited atomic properties - the initialization of non-mutated > data structures that are shared among threads. Isn't it rather dangerous to rely on any built-in Python operations to be atomic? They might happen to be, but I don't think there's any guarantee they will stay that way. -- Greg From matt-python at theory.org Tue Jul 10 03:33:07 2007 From: matt-python at theory.org (Matt Chisholm) Date: Mon, 9 Jul 2007 18:33:07 -0700 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> References: <20070630205444.GD22221@theory.org> <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> Message-ID: <20070710013307.GA5495@theory.org> On Jul 3 2007, 10:14, Guido van Rossum wrote: >On 6/30/07, Matt Chisholm <matt-python at theory.org> wrote: >>I've created and submitted a new PEP proposing support for labels in >>Python's break and continue statements. Georg Brandl has graciously >>added it to the PEP list as PEP 3136: >> >>http://www.python.org/dev/peps/pep-3136/ > >I think this is a good summary of various proposals that have been >floated in the past, plus some new ones. As a PEP, it falls short >because it doesn't pick a solution but merely offers a large menu of >possible options. Also, there is nothing about implementation yet. I was hoping the community would pick their favorite option. And I planned to address implementation if the PEP was well received. 
>However, I'm rejecting it on the basis that code so complicated to >require this feature is very rare. In most cases there are existing >work-arounds that produce clean code, for example using 'return'. >While I'm sure there are some (rare) real cases where clarity of the >code would suffer from a refactoring that makes it possible to use >return, this is offset by two issues: > >1. The complexity added to the language, permanently. This affects not >only all Python implementations, but also every source analysis tool, >plus of course all documentation for the language. Not knowing anything about the implementation details, I can't argue with that. >2. My expectation that the feature will be abused more than it will be >used right, leading to a net decrease in code clarity (measured across >all Python code written henceforth). Lazy programmers are everywhere, >and before you know it you have an incredible mess on your hands of >unintelligible code. Are break / continue currently abused more than they are used right, or used to make code difficult to understand? I am trying to come up with an example of mis-use of labeled break or continue that is mitigated by the absence of labels, and I can't quite think of one. Maybe I'm being unimaginative. :) >I realize this is a heavy bar to pass, and somewhat subjective. That's >okay. There is real value in having a small language. Also, as I said, >while there are no past PEPs to document it, this has been brought up >and rejected many times before. So, I don't quite agree, but you're the boss. If this has been rejected before, I don't want to waste everybody's time discussing it again. Should I add your justification to the PEP and change its status? -matt P.S. Thanks to everybody who read the PEP and commented. 
:) From greg.ewing at canterbury.ac.nz Tue Jul 10 03:40:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 10 Jul 2007 13:40:51 +1200 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <20070710013307.GA5495@theory.org> References: <20070630205444.GD22221@theory.org> <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> <20070710013307.GA5495@theory.org> Message-ID: <4692E3A3.9010600@canterbury.ac.nz> Matt Chisholm wrote: > Are break / continue currently abused more than they are used right, > or used to make code difficult to understand? In my experience, using break and continue for anything other than a standard loop-and-a-half makes code hard to follow, even when there is only one loop. Labels would not mitigate that. -- Greg From fdrake at acm.org Tue Jul 10 05:20:01 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 9 Jul 2007 23:20:01 -0400 Subject: [Python-3000] A request to keep dict.setdefault() in 3.0 In-Reply-To: <4692DDE4.9090707@canterbury.ac.nz> References: <20070709184156.E2EFC3A404D@sparrow.telecommunity.com> <4692DDE4.9090707@canterbury.ac.nz> Message-ID: <200707092320.01898.fdrake@acm.org> On Monday 09 July 2007, Greg Ewing wrote: > Isn't it rather dangerous to rely on any built-in > Python operations to be atomic? They might happen > to be, but I don't think there's any guarantee > they will stay that way. My limited recollection is that setdefault() was all about it being atomic; otherwise there's no benefit to building it in C. The documentation sadly omits mentioning this very important property of setdefault(), however. If the atomicity isn't promised, then there's no benefit, and writing a helper in Python would be fine. However, as we've seen in this discussion, that's critical to many users of the method. Without it, most users would have to add a C (or whatever) function that did the same task and made the atomicity promise. 
IMHO, it's better to have a single shared implementation with this promise; that makes it easier to recognize when reading unfamiliar code. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> From rrr at ronadam.com Tue Jul 10 06:03:04 2007 From: rrr at ronadam.com (Ron Adam) Date: Mon, 09 Jul 2007 23:03:04 -0500 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070710004640.93BD83A40A4@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> <ca471dc20707091356t2caa808u353705ca714d96e3@mail.gmail.com> <4692D572.4080401@ronadam.com> <20070710004640.93BD83A40A4@sparrow.telecommunity.com> Message-ID: <469304F8.2060303@ronadam.com> Phillip J. Eby wrote: > At 07:40 PM 7/9/2007 -0500, Ron Adam wrote: > >> Guido van Rossum wrote: >> >>> We could easily change this to return a >>> writable mapping that's not a dict at all but a "view" on the locals >>> just as dict.keys() returns a view on a dict. I don't see why locals() >>> couldn't return the object used to represent the namespace, but I >>> don't see that it couldn't be some view on that object either, >>> depending on the details of the implementation. >> >> This sounds great! I just recently wanted to pass a namespace to exec, >> but it refuses to accept anything but a dictionary for a local name >> space. > > You can already do that in Python 2.4. > > >> What I really want to do is pass an object as the local namespace. >> And have the exec() use it complete with it's properties intact. >> Passing obj.__dict__ doesn't work in this case. 
>
> You need a wrapper, e.g.:
>
>     class AttrMap(object):
>         def __init__(self, ob):
>             self.ob = ob
>         def __getitem__(self, key):
>             try: return getattr(self.ob, key)
>             except AttributeError: raise KeyError, key
>         # setitem, delitem, etc...

Thanks, that should solve (I hope) the particular case I have. Although it would have been nicer if it was in the library someplace. Of course everyone says that about nearly everything. It might be nice if locals() could receive an argument so it can be used with classes, possibly returning a wrapped class view such as the example you gave. Regards, Ron From ncoghlan at gmail.com Tue Jul 10 11:33:04 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Jul 2007 19:33:04 +1000 Subject: [Python-3000] Change to class construction? In-Reply-To: <20070709160115.16C323A404D@sparrow.telecommunity.com> References: <43aa6ff70707060703w6d4f2edbm82a0a6da4a6bbd90@mail.gmail.com> <f6lld0$qv3$1@sea.gmane.org> <20070706172258.1E65B3A4046@sparrow.telecommunity.com> <468FA01A.6040707@gmail.com> <f6oh9v$f40$1@sea.gmane.org> <20070708140741.GD23765@seldon> <469215F3.90807@gmail.com> <20070709150454.641193A404D@sparrow.telecommunity.com> <ca471dc20707090813k2271460dw9955c7aec79b2e4d@mail.gmail.com> <20070709160115.16C323A404D@sparrow.telecommunity.com> Message-ID: <46935250.8060903@gmail.com> Phillip J. Eby wrote: > At 06:13 PM 7/9/2007 +0300, Guido van Rossum wrote: >> On 7/9/07, Phillip J. Eby <pje at telecommunity.com> wrote: >> > At 09:03 PM 7/9/2007 +1000, Nick Coghlan wrote: >> > >However, I will point out that setting class attributes via >> locals() is >> > >formally undefined (it happens to work in current versions of CPython, >> > >but there's no guarantee that will always be the case). >> > >> > As of PEP 3115, it's no longer undefined for class statements. >> >> Where does it say so? To be honest, I don't know where to find Nick's >> claim in the reference manual. 
> > I assume Nick is referring to: > > http://www.python.org/doc/2.2/ref/execframes.html > > which says it's undefined. I was actually referring to this warning in the library reference docs for the locals() function: """Warning: The contents of this dictionary should not be modified; changes may not affect the values of local variables used by the interpreter.""" Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue Jul 10 23:14:27 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 00:14:27 +0300 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni Message-ID: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> One of the most daunting tasks remaining for Python 3.0a1 (to be released by the end of August) is fixing the remaining failing unit tests in the py3k-struni branch (http://svn.python.org/view/python/branches/py3k-struni/). This is the branch where I have started the work on the string/unification branch. I want to promote this branch to become the "main" Py3k branch ASAP (by renaming it to py3k), but I don't want to do that until all unit tests pass. I've been working diligently on this task, and I've got it down to about 50 tests that are failing on at least one of OSX and Ubuntu (the platforms to which I have easy access). Now I need help. To facilitate distributing the task of getting the remaining tests to pass, I've created a wiki page: http://wiki.python.org/moin/Py3kStrUniTests . Please help! It's easy to help: (1) check out the py3k-struni branch; (2) build it; (3) pick a test and figure out why it's failing; (4) produce a fix; (5) submit the fix to SF (or check it in, if you have submit privileges and are confident enough). 
In order to avoid duplicate work, I've come up with a simple protocol: you mark a test in the wiki as "MINE" (with your name) when you start looking at it. You mark it as "FIXED [IN SF]" once you fix it, adding the patch# if the fix is in SF. If you give up, remove your lock, adding instead a note with what you've found (even just the names of the failing subtests are helpful). Please help! There are other tasks, see PEP 3100. Mail me if you're interested in anything specifically. (Please don't ask me "do you think I could do this" -- you know better than I whether you're capable of coding at a specific level. If you don't understand the task, you're probably not qualified.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Wed Jul 11 00:30:13 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 11 Jul 2007 00:30:13 +0200 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> Message-ID: <46940875.2000606@cheimes.de> Guido van Rossum wrote: > Please help! I've made a meta patch that makes debugging the bugs a lot easier. It replaces assert_(foo == bar) and failUnless(foo == bar) with failUnlessEqual(foo, bar). failUnlessEqual shows the value of foo and bar when they are not equal. http://www.python.org/sf/1751515

sed -r "s/self\.assert_\((.*)\ ==/self.failUnlessEqual\(\1,/" -i *.py
sed -r "s/self\.failUnless\((.*)\ ==/self.failUnlessEqual\(\1,/" -i *.py

By the way the ctypes unit tests are causing a segfault on my machine:

test_ctypes
Warning: could not import ctypes.test.test_numbers: unpack requires a string argument of length 1
Segmentation fault

Ubuntu 7.04 on i386 machine with an Intel P3.
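For illustration, the sed rewrite above can be sketched in Python with a narrower pattern than `(.*)`. The pattern, the `rewrite` helper, and the sample lines are assumptions made for this sketch, not the patch itself. It only handles a single `a == b` comparison whose operands contain no whitespace, and it can still misfire on nested parentheses, so any output needs a manual review.

```python
import re

# Match self.assert_(a == b) or self.failUnless(a == b) where both operands
# are free of whitespace. Compound expressions such as "a == b or a == c"
# do not match and are left alone.
PATTERN = re.compile(r"self\.(?:assert_|failUnless)\((\S+) == (\S+)\)")


def rewrite(line):
    """Rewrite a simple equality assertion to failUnlessEqual(a, b)."""
    return PATTERN.sub(r"self.failUnlessEqual(\1, \2)", line)


simple = rewrite("self.assert_(foo == bar)")
compound = rewrite("self.assert_(d == self.spamle or d == self.spambe)")
nested = rewrite("self.assert_((a == 42) is False)")
print(simple)
print(compound)
print(nested)
```

The first line is rewritten and the compound one is left untouched. The nested case is still mangled, because `\S+` happily matches the inner `(a` and `42`, which is the kind of false positive that has to be removed by hand.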
Christian From steven.bethard at gmail.com Wed Jul 11 00:38:53 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 10 Jul 2007 16:38:53 -0600 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <46940875.2000606@cheimes.de> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> Message-ID: <d11dcfba0707101538l1d3c3244i788032054da92717@mail.gmail.com> On 7/10/07, Christian Heimes <lists at cheimes.de> wrote: > Guido van Rossum wrote: > > Please help! > > I've made a meta patch that makes debugging the bugs a lot easier. It > replaces assert_(foo == bar) and failUnless(foo == bar) with > failUnlessEqual(foo, bar). failUnlessEqual shows the value of foo and > bar when they are not equal. > > http://www.python.org/sf/1751515 > > sed -r "s/self\.assert_\((.*)\ ==/self.failUnlessEqual\(\1,/" -i *.py > sed -r "s/self\.failUnless\((.*)\ ==/self.failUnlessEqual\(\1,/" -i *.py Some of these look questionable, e.g.: - self.assert_(d == self.spamle or d == self.spambe) + self.failUnlessEqual(d == self.spamle or d, self.spambe) ... - self.assert_((a == 42) is False) + self.failUnlessEqual((a, 42) is False) I'd probably go with something a little more restrictive, maybe: r'self.assert_\(\S+ == \S+\)' Something like that ought to have fewer false positives. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From lists at cheimes.de Wed Jul 11 01:17:26 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 11 Jul 2007 01:17:26 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <d11dcfba0707101538l1d3c3244i788032054da92717@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <d11dcfba0707101538l1d3c3244i788032054da92717@mail.gmail.com> Message-ID: <46941386.9080301@cheimes.de> Steven Bethard wrote: > I'd probably go with something a little more restrictive, maybe: > > r'self.assert_\(\S+ == \S+\)' > > Something like that ought to have fewer false positives. Woops! You are right. Even your pattern has caused some false positives but I've reread the patch and removed the offending lines. I'm going to upload another patch as soon as I have verified mine again. Christian From lists at cheimes.de Wed Jul 11 03:54:49 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 11 Jul 2007 03:54:49 +0200 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> Message-ID: <f71d9d$thn$1@sea.gmane.org> I found a bug in the str type that may affect a lot of tests. In the py3k-struni branch the str() constructor doesn't use __str__ when the argument is an instance of a subclass of str. A user defined string can't change __str__(). The __repr__ method isn't affected. It works in Python 2.5 and in the p3yk branch. Python 3.0x (py3k-struni:56245, Jul 10 2007, 23:34:56) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> class Mystr(str): ... def __str__(self): return 'v' ... 
>>> s = Mystr('x') >>> s 'x' >>> str(s) 'x' # <- SHOULD RETURN 'v' Christian From guido at python.org Wed Jul 11 08:48:49 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 09:48:49 +0300 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <46941386.9080301@cheimes.de> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <d11dcfba0707101538l1d3c3244i788032054da92717@mail.gmail.com> <46941386.9080301@cheimes.de> Message-ID: <ca471dc20707102348p78041da9hbd4f08b84d4abab1@mail.gmail.com> Please use self.assertEqual() instead of self.failUnlessEqual() -- the assertEqual() form is much more common. Otherwise, good idea! On 7/11/07, Christian Heimes <lists at cheimes.de> wrote: > Steven Bethard wrote: > > I'd probably go with something a little more restrictive, maybe: > > > > r'self.assert_\(\S+ == \S+\)' > > > > Something like that ought to have fewer false positives. > > Woops! You are right. Even your pattern has caused some false positives > but I've reread the patch and removed the offending lines. I'm going to > upload another patch as soon as I have verified mine again. 
> > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From walter at livinglogic.de Wed Jul 11 09:51:58 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Wed, 11 Jul 2007 09:51:58 +0200 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <f71d9d$thn$1@sea.gmane.org> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> Message-ID: <46948C1E.5050800@livinglogic.de> Christian Heimes wrote: > I found a bug in the str type that may affect a lot of tests. > > In the py3k-struni branch the str() constructor doesn't use __str__ when > the argument is an instance of a subclass of str. A user defined string > can't change __str__(). The __repr__ method isn't affected. This hasn't been rewired yet. Behind the covers str still behaves like unicode, i.e. it uses __unicode__ for conversion. Servus, Walter From guido at python.org Wed Jul 11 10:01:05 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 11:01:05 +0300 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <46948C1E.5050800@livinglogic.de> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de> Message-ID: <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> Yeah, I'm looking in to this right now. What a mess! But I'm close to a fix. There's more that causes test_descr to fail however. Bleh, what a terrible unit test -- it doesn't use the unittest module, and a single failure aborts the rest of the test. 
--Guido On 7/11/07, Walter Dörwald <walter at livinglogic.de> wrote: > Christian Heimes wrote: > > > I found a bug in the str type that may affect a lot of tests. > > > > In the py3k-struni branch the str() constructor doesn't use __str__ when > > the argument is an instance of a subclass of str. A user defined string > > can't change __str__(). The __repr__ method isn't affected. > > This hasn't been rewired yet. Behind the covers str still behaves like > unicode, i.e. it uses __unicode__ for conversion. > > Servus, > Walter > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 11 11:30:58 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 12:30:58 +0300 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de> <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> Message-ID: <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com> Fixed in subversion. Please do review r56252 to see that I did the right thing. On 7/11/07, Guido van Rossum <guido at python.org> wrote: > Yeah, I'm looking in to this right now. What a mess! But I'm close to a fix. > > There's more that causes test_descr to fail however. Bleh, what a > terrible unit test -- it doesn't use the unittest module, and a single > failure aborts the rest of the test. > > --Guido > > On 7/11/07, Walter Dörwald <walter at livinglogic.de> wrote: > > Christian Heimes wrote: > > > > > I found a bug in the str type that may affect a lot of tests. 
> > > > > > In the py3k-struni branch the str() constructor doesn't use __str__ when > > > the argument is an instance of a subclass of str. A user defined string > > > can't change __str__(). The __repr__ method isn't affected. > > > > This hasn't been rewired yet. Behind the covers str still behaves like > > unicode, i.e. it uses __unicode__ for conversion. > > > > Servus, > > Walter > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 11 13:45:04 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 14:45:04 +0300 Subject: [Python-3000] Fwd: Your confirmation is required to leave the Python-3000 mailing list In-Reply-To: <mailman.0.1184147038.21671.python-3000@python.org> References: <mailman.0.1184147038.21671.python-3000@python.org> Message-ID: <ca471dc20707110445n28be3d1g8d46daa701eaeab2@mail.gmail.com> Which joker tried to unsub me? ---------- Forwarded message ---------- From: python-3000-confirm+e08ed5828...1ff418f380758543 at python.org <python-3000-confirm+e08ed58281...51ff418f380758543 at python.org> Date: Jul 11, 2007 12:43 PM Subject: Your confirmation is required to leave the Python-3000 mailing list To: guido at python.org Mailing list removal confirmation notice for mailing list Python-3000 We have received a request for the removal of your email address, "guido at python.org" from the python-3000 at python.org mailing list. To confirm that you want to be removed from this mailing list, simply reply to this message, keeping the Subject: header intact. 
Or visit this web page: http://mail.python.org/mailman/confirm/python-3000/e08ed5828...8543 Or include the following line -- and only the following line -- in a message to python-3000-request at python.org: confirm e08e...0758543 Note that simply sending a `reply' to this message should work from most mail readers, since that usually leaves the Subject: line in the right form (additional "Re:" text in the Subject: is okay). If you do not wish to be removed from this list, please simply disregard this message. If you think you are being maliciously removed from the list, or have any other questions, send them to python-3000-owner at python.org. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 11 13:46:12 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 14:46:12 +0300 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <f728g6$6gt$1@sea.gmane.org> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> Message-ID: <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> On 7/11/07, Thomas Heller <theller at ctypes.org> wrote: > Christian Heimes schrieb: > > > > By the way the ctypes unit tests are causing a segfault on my machine: > > test_ctypes > > Warning: could not import ctypes.test.test_numbers: unpack requires a > > string argument of length 1 > > Segmentation fault > > > > Ubunutu 7.04 on i386 machine with an Intel P3. > > I can reproduce this. ctypes.test.test_numbers is easy to fix, but there > are other severe problems with ctypes. > > I would love to look into these, but I prefer debugging on Windows. > However, the windows build does not work because the _fileio builtin > module is missing from config.c. Again, this is not so easy to fix, > because the ftruncate function does not exist on Windows. 
I don't have a Windows box; contributions to fix this situation are welcome. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From amauryfa at gmail.com Wed Jul 11 14:27:35 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 11 Jul 2007 14:27:35 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> Message-ID: <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> Thomas Heller wrote: > I would love to look into these, but I prefer debugging on Windows. > However, the windows build does not work because the _fileio builtin > module is missing from config.c. Again, this is not so easy to fix, > because the ftruncate function does not exist on Windows. In fileobject.c, there is a replacement for ftruncate. See the code around the call to SetEndOfFile(). I'll try to provide a patch later today. -- Amaury Forgeot d'Arc From guido at python.org Wed Jul 11 14:41:21 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 15:41:21 +0300 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> Message-ID: <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> That would be great! Assign it to theller who can test it much better than I can. 
On 7/11/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote: > Thomas Heller wrote: > > I would love to look into these, but I prefer debugging on Windows. > > However, the windows build does not work because the _fileio builtin > > module is missing from config.c. Again, this is not so easy to fix, > > because the ftruncate function does not exist on Windows. > > In fileobject.c, there is a replacement for ftruncate. See the code > around the call to SetEndOfFile(). > > I'll try to provide a patch later today. > > -- > Amaury Forgeot d'Arc > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Jul 11 14:50:44 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 11 Jul 2007 14:50:44 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> Message-ID: <f72jn4$cma$1@sea.gmane.org> Guido van Rossum schrieb: > That would be great! Assign it to theller who can test it much better > than I can. > > On 7/11/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote: >> Thomas Heller wrote: >> > I would love to look into these, but I prefer debugging on Windows. >> > However, the windows build does not work because the _fileio builtin >> > module is missing from config.c. 
Again, this is not so easy to fix, >> > because the ftruncate function does not exist on Windows. >> >> In fileobject.c, there is a replacement for ftruncate. See the code >> around the call to SetEndOfFile(). >> >> I'll try to provide a patch later today. Awaiting your patch ;-). The most important problem, IMO, is now that wide filenames on Windows are not implemented, see the code starting at line 148 in _fileio.c. This prevents most unittests to run because test_support cannot be imported: C:\svn\py3k-struni\PCbuild>python -E -tt ../lib/test/regrtest.py Traceback (most recent call last): File "../lib/test/regrtest.py", line 165, in <module> from test import test_support File "C:\svn\py3k-struni\lib\test\test_support.py", line 182, in <module> fp = open(TESTFN, 'w+') File "C:\svn\py3k-struni\lib\site.py", line 412, in __new__ return io.open(*args, **kwds) File "C:\svn\py3k-struni\lib\io.py", line 122, in open (updating and "+" or "")) NotImplementedError: Windows wide filenames are not yet supported Thomas From theller at ctypes.org Wed Jul 11 16:08:47 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 11 Jul 2007 16:08:47 +0200 Subject: [Python-3000] Heaptypes Message-ID: <f72o9f$v6i$1@sea.gmane.org> ctypes creates heaptypes with this call, in _ctypes.c, line 3986 (slightly simplified): result = PyObject_CallFunction((PyObject *)&ArrayType_Type, "s(O){s:n,s:O}", name, &Array_Type, "_length_", length, "_type_", itemtype ); The call succeeds. Printing the type fails with an assertion: theller at tubu:~/devel/py3k-struni$ ./python Python 3.0x (py3k-struni:56268M, Jul 11 2007, 15:56:43) [GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from ctypes import c_int [54751 refs] >>> atype = c_int * 3 [54762 refs] >>> atype.__name__ s'c_int_Array_3' [55278 refs] >>> repr(atype) python: Objects/unicodeobject.c:630: PyUnicodeUCS2_FromFormatV: Assertion `obj && ((((obj)->ob_type)->tp_flags & ((1L<<28))) != 0)' failed. Abgebrochen theller at tubu:~/devel/py3k-struni$ As one can see, the __name__ is a byte string (or what is this called now?). The fix is probably to use a 'U' format character in the PyObject_CallFunction format string, but I assume the call should have failed in the first place? And what about the dictionary that is constructed for the call '{s:n,s:O}', should it use 'U' format chars also? Thomas From guido at python.org Wed Jul 11 16:15:41 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 17:15:41 +0300 Subject: [Python-3000] Heaptypes In-Reply-To: <f72o9f$v6i$1@sea.gmane.org> References: <f72o9f$v6i$1@sea.gmane.org> Message-ID: <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> There are currently three "string" types, here shown with their repr styles: - str = 'same as unicode in 2.x' - bytes = b'new, mutable list of small ints' - str8 = s'same as str in 2.x' The s'...' notation means it's an 8-bit string (not a bytes array). This is not supported in the syntax; it's just used on output. (Use str8(b'...') to create one of these.) I'm still hoping to remove this type before the release, but it appears to be still necessary so far. I don't know enough about ...CallFunction to help you with the rest. --Guido On 7/11/07, Thomas Heller <theller at ctypes.org> wrote: > ctypes creates heaptypes with this call, in _ctypes.c, line 3986 (slightly simplified): > > result = PyObject_CallFunction((PyObject *)&ArrayType_Type, > "s(O){s:n,s:O}", > name, > &Array_Type, > "_length_", > length, > "_type_", > itemtype > ); > > The call succeeds.
Printing the type fails with an assertion: > > theller at tubu:~/devel/py3k-struni$ ./python > Python 3.0x (py3k-struni:56268M, Jul 11 2007, 15:56:43) > [GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from ctypes import c_int > [54751 refs] > >>> atype = c_int * 3 > [54762 refs] > >>> atype.__name__ > s'c_int_Array_3' > [55278 refs] > >>> repr(atype) > python: Objects/unicodeobject.c:630: PyUnicodeUCS2_FromFormatV: Assertion `obj && ((((obj)->ob_type)->tp_flags & ((1L<<28))) != 0)' failed. > Abgebrochen > theller at tubu:~/devel/py3k-struni$ > > As one can see, the __name__ is a byte string (or what is this called now?). > The fix is probably to use a 'U' format character in the PyObject_CallFunction format string, > but I assume the call should have failed in the first place? And what about the dictionary that > is constructed for the call '{s:n,s:O}', should it use 'U' format chars also? > > Thomas > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Wed Jul 11 16:39:00 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 11 Jul 2007 16:39:00 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> Message-ID: <f72q24$61n$1@sea.gmane.org> Guido van Rossum wrote: > There are currently three "string" types, here shown with their repr styles: > > - str = 'same as unicode in 2.x' > - bytes = b'new, mutable list of small ints' > - str8 = s'same as str in 2.x' > > The s'...'
notation means it's an 8-bit string (not a bytes array). > This is not supported in the syntax; it's just used on output. (Use > str8(b'...') to create one of these.) I'm still hoping to remove this > type before the release, but it appears to be still necessary so far. > > I don't know enough about ...CallFunction to help you with the rest. Let me explain it in other words. This code creates a new type: >>> ht = type("name", (object,), {}) [47054 refs] >>> ht <class '__main__.name'> [47093 refs] The '__name__' attribute is a (unicode) string: >>> ht.__name__ 'name' [47121 refs] >>> But I can also create a type in this way: >>> ht = type(str8(b"name"), (object,), {}) [47208 refs] The __name__ attribute is a str8 instance: >>> ht.__name__ s'name' [47236 refs] Printing the type triggers an assertion: >>> ht Assertion failed: obj && PyUnicode_Check(obj), file \svn\py3k-struni\Objects\unicodeobject.c, line 630 C:\svn\py3k-struni\PCbuild> because parts of the code assume that the '__name__' is a (unicode) string. Thomas From guido at python.org Wed Jul 11 16:47:47 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 11 Jul 2007 17:47:47 +0300 Subject: [Python-3000] Heaptypes In-Reply-To: <f72q24$61n$1@sea.gmane.org> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <f72q24$61n$1@sea.gmane.org> Message-ID: <ca471dc20707110747o16064f8an8889467d81bb434c@mail.gmail.com> On 7/11/07, Thomas Heller <theller at ctypes.org> wrote: > Let me explain it in other words.
This code creates a new type: > > >>> ht = type("name", (object,), {}) > [47054 refs] > >>> ht > <class '__main__.name'> > [47093 refs] > > The '__name__' attribute is a (unicode) string: > > >>> ht.__name__ > 'name' > [47121 refs] > >>> > > But I can also create a type in this way: > > >>> ht = type(str8(b"name"), (object,), {}) > [47208 refs] > > The __name__ attribute is a str8 instance: > > >>> ht.__name__ > s'name' > [47236 refs] > > Printing the type triggers an assertion: > > >>> ht > Assertion failed: obj && PyUnicode_Check(obj), file \svn\py3k-struni\Objects\unicodeobject.c, line 630 > C:\svn\py3k-struni\PCbuild> > > because parts of the code assume that the '__name__' is a (unicode) string. Hm. I guess the creation must insist that __name__ is a unicode. Can you fix this yourself? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at python.org Wed Jul 11 17:07:48 2007 From: thomas at python.org (Thomas Wouters) Date: Wed, 11 Jul 2007 08:07:48 -0700 Subject: [Python-3000] Fwd: Your confirmation is required to leave the Python-3000 mailing list In-Reply-To: <ca471dc20707110445n28be3d1g8d46daa701eaeab2@mail.gmail.com> References: <mailman.0.1184147038.21671.python-3000@python.org> <ca471dc20707110445n28be3d1g8d46daa701eaeab2@mail.gmail.com> Message-ID: <9e804ac0707110807i1c2e515dy5538aa487b711649@mail.gmail.com> I can't find the message you forwarded in mail.python.org's logs (although I may be looking wrong; its hard to do such a search without the full headers of the original) -- but it looks to me like it was a hoax, and not an actual unsubscription request from mail.python.org. On 7/11/07, Guido van Rossum <guido at python.org> wrote: > > Which joker tried to unsub me? 
> > ---------- Forwarded message ---------- > From: python-3000-confirm+e08ed5828...1ff418f380758543 at python.org > <python-3000-confirm+e08ed58281...51ff418f380758543 at python.org> > Date: Jul 11, 2007 12:43 PM > Subject: Your confirmation is required to leave the Python-3000 mailing > list > To: guido at python.org > > > Mailing list removal confirmation notice for mailing list Python-3000 > > We have received a request for the removal of your email address, > "guido at python.org" from the python-3000 at python.org mailing list. To > confirm that you want to be removed from this mailing list, simply > reply to this message, keeping the Subject: header intact. Or visit > this web page: > > http://mail.python.org/mailman/confirm/python-3000/e08ed5828...8543 > > > Or include the following line -- and only the following line -- in a > message to python-3000-request at python.org: > > confirm e08e...0758543 > > Note that simply sending a `reply' to this message should work from > most mail readers, since that usually leaves the Subject: line in the > right form (additional "Re:" text in the Subject: is okay). > > If you do not wish to be removed from this list, please simply > disregard this message. If you think you are being maliciously > removed from the list, or have any other questions, send them to > python-3000-owner at python.org. > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/thomas%40python.org > -- Thomas Wouters <thomas at python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070711/b89d3fc1/attachment.html From walter at livinglogic.de Wed Jul 11 17:28:09 2007 From: walter at livinglogic.de (Walter Dörwald) Date: Wed, 11 Jul 2007 17:28:09 +0200 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de> <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com> Message-ID: <4694F709.2040304@livinglogic.de> Guido van Rossum wrote: > Fixed in subversion. Please do review r56252 to see that I did the right > thing. I haven't looked at test_descr.py but the rest looks good to me. I guess for the final version of Py3000 type_set_name() in typeobject.c will not downgrade unicode strings to str8, but instead upgrade str8 objects to unicode. Also now that PyObject_Unicode() tries __unicode__ first and then tp_str should we rename all __unicode__ methods to __str__, or will __unicode__ stay? Servus, Walter > On 7/11/07, Guido van Rossum <guido at python.org> wrote: >> Yeah, I'm looking in to this right now. What a mess! But I'm close to >> a fix. >> >> There's more that causes test_descr to fail however. Bleh, what a >> terrible unit test -- it doesn't use the unittest module, and a single >> failure aborts the rest of the test. >> >> --Guido >> >> On 7/11/07, Walter Dörwald <walter at livinglogic.de> wrote: >> > Christian Heimes wrote: >> > >> > > I found a bug in the str type that may affect a lot of tests. >> > > >> > > In the py3k-struni branch the str() constructor doesn't use >> __str__ when >> > > the argument is an instance of a subclass of str. A user defined >> string >> > > can't change __str__(). The __repr__ method isn't affected.
>> > >> > This hasn't been rewired yet. Behind the covers str still behaves like >> > unicode, i.e. it uses __unicode__ for conversion. >> > >> > Servus, >> > Walter >> > _______________________________________________ >> > Python-3000 mailing list >> > Python-3000 at python.org >> > http://mail.python.org/mailman/listinfo/python-3000 >> > Unsubscribe: >> http://mail.python.org/mailman/options/python-3000/guido%40python.org >> > >> >> >> -- >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> > > From theller at ctypes.org Wed Jul 11 17:52:49 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 11 Jul 2007 17:52:49 +0200 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <4694F709.2040304@livinglogic.de> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de> <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com> <4694F709.2040304@livinglogic.de> Message-ID: <f72uch$mki$1@sea.gmane.org> Walter Dörwald wrote: > > I guess for the final version of Py3000 type_set_name() in typeobject.c > will not downgrade unicode strings to str8, but instead upgrade str8 > objects to unicode. I'm currently working on type_set_name, see the other message with subject 'Heaptypes'.
Thomas From amauryfa at gmail.com Wed Jul 11 17:53:14 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 11 Jul 2007 17:53:14 +0200 Subject: [Python-3000] Fwd: Your confirmation is required to leave the Python-3000 mailing list In-Reply-To: <9e804ac0707110807i1c2e515dy5538aa487b711649@mail.gmail.com> References: <mailman.0.1184147038.21671.python-3000@python.org> <ca471dc20707110445n28be3d1g8d46daa701eaeab2@mail.gmail.com> <9e804ac0707110807i1c2e515dy5538aa487b711649@mail.gmail.com> Message-ID: <e27efe130707110853x6eb32d6raa38c679b3bac2d4@mail.gmail.com> Hello, Thomas Wouters wrote: > > I can't find the message you forwarded in mail.python.org's logs (although I > may be looking wrong; its hard to do such a search without the full headers > of the original) -- but it looks to me like it was a hoax, and not an actual > unsubscription request from mail.python.org. ... > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/amauryfa%40gmail.com > Every mail sent by mailman seems to contain a self-unsubscribe link, like the one just above. On reply, the link (with *my* address) is part of the quoted text. Did someone click on such a link, and used the web interface? -- Amaury Forgeot d'Arc From chrism at plope.com Wed Jul 11 19:16:01 2007 From: chrism at plope.com (Chris McDonough) Date: Wed, 11 Jul 2007 13:16:01 -0400 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> Message-ID: <FB45562E-60A9-4369-837B-ECD846981ED1@plope.com> I have a very remedial question about how to fix test failures due to the side effects of string-unicode integration. 
The xmlrpc library uses explicit encoding to encode XML tag payloads to (almost always) utf8. Tag literals are not encoded. What would be the best way to mimic this behavior under the new regime? Just use unicode everywhere and encode the entire XML body to utf-8 at the end? Or deal explicitly in bytes everywhere? Or..? Remedially, - C On Jul 10, 2007, at 5:14 PM, Guido van Rossum wrote: > One of the most daunting tasks remaining for Python 3.0a1 (to be > released by the end of August) is fixing the remaining failing unit > tests in the py3k-struni branch > (http://svn.python.org/view/python/branches/py3k-struni/). > > This is the branch where I have started the work on the > string/unification branch. I want to promote this branch to become the > "main" Py3k branch ASAP (by renaming it to py3k), but I don't want to > do that until all unit tests pass. I've been working diligently on > this task, and I've got it down to about 50 tests that are failing on > at least one of OSX and Ubuntu (the platforms to which I have easy > access). Now I need help. > > To facilitate distributing the task of getting the remaining tests to > pass, I've created a wiki page: > http://wiki.python.org/moin/Py3kStrUniTests . Please help! It's easy > to help: (1) check out the py3k-struni branch; (2) build it; (3) pick > a test and figure out why it's failing; (4) produce a fix; (5) submit > the fix to SF (or check it in, if you have submit privileges and are > confident enough). > > In order to avoid duplicate work, I've come up with a simple protocol: > you mark a test in the wiki as "MINE" (with your name) when you start > looking at it. You mark it as "FIXED [IN SF]" once you fix it, adding > the patch# if the fix is in SF. If you give up, remove your lock, > adding instead a note with what you've found (even just the names of > the failing subtests is helpful). > > Please help! > > There are other tasks, see PEP 3100. Mail me if you're interested in > anything specifically. 
(Please don't ask me "do you think I could do > this" -- you know better than I whether you're capable of coding at a > specific level. If you don't understand the task, you're probably not > qualified.) > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/lists > %40plope.com > From amauryfa at gmail.com Wed Jul 11 20:33:46 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 11 Jul 2007 20:33:46 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <f72jn4$cma$1@sea.gmane.org> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> <f72jn4$cma$1@sea.gmane.org> Message-ID: <e27efe130707111133h329080fcocf1a3f1c5e954824@mail.gmail.com> Hi, Thomas Heller wrote: > The most important problem, IMO, is now that wide filenames on Windows are not > implemented, see the code starting at line 148 in _fileio.c. 
This prevents > most unittests to run because test_support cannot be imported: > > C:\svn\py3k-struni\PCbuild>python -E -tt ../lib/test/regrtest.py > Traceback (most recent call last): > File "../lib/test/regrtest.py", line 165, in <module> > from test import test_support > File "C:\svn\py3k-struni\lib\test\test_support.py", line 182, in <module> > fp = open(TESTFN, 'w+') > File "C:\svn\py3k-struni\lib\site.py", line 412, in __new__ > return io.open(*args, **kwds) > File "C:\svn\py3k-struni\lib\io.py", line 122, in open > (updating and "+" or "")) > NotImplementedError: Windows wide filenames are not yet supported The attached patch corrects this. Now open() accept both unicode strings and bytes objects. -- Amaury Forgeot d'Arc -------------- next part -------------- A non-text attachment was scrubbed... Name: fileio-1.diff Type: application/octet-stream Size: 1473 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070711/5bb54244/attachment.obj From amauryfa at gmail.com Wed Jul 11 21:13:31 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 11 Jul 2007 21:13:31 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <f72jn4$cma$1@sea.gmane.org> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> <f72jn4$cma$1@sea.gmane.org> Message-ID: <e27efe130707111213h7830a78fg584177883240217e@mail.gmail.com> Re-hello, Thomas Heller wrote: > > On 7/11/07, Amaury Forgeot d'Arc wrote: > >> Thomas Heller wrote: > >> > I would love to look into these, but I prefer debugging on Windows. > >> > However, the windows build does not work because the _fileio builtin > >> > module is missing from config.c. 
Again, this is not so easy to fix, > >> > because the ftruncate function does not exist on Windows. > >> > >> In fileobject.c, there is a replacement for ftruncate. See the code > >> around the call to SetEndOfFile(). > >> > >> I'll try to provide a patch later today. > > Awaiting your patch ;-). Ok, here it is; shamelessly copied from fileobject.c. BTW, what is the status of this fileobject? open() doesn't seem to use it anymore. Will file() be removed at some point? Now test_fileio passes on Windows, with the exception of testAbles(): since c:\dev is an existing directory on my machine, /dev/tty is a regular file and is seekable... Maybe skip this test on win32? I have a couple of other corrections, found by randomly playing with the tests functions... shall I post the corrections here as well? -- Amaury Forgeot d'Arc -------------- next part -------------- A non-text attachment was scrubbed... Name: fileio-2.diff Type: application/octet-stream Size: 1681 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070711/d18cdd5d/attachment-0001.obj From theller at ctypes.org Wed Jul 11 22:07:11 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 11 Jul 2007 22:07:11 +0200 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <e27efe130707111213h7830a78fg584177883240217e@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> <f72jn4$cma$1@sea.gmane.org> <e27efe130707111213h7830a78fg584177883240217e@mail.gmail.com> Message-ID: <f73d9i$emp$1@sea.gmane.org> Amaury Forgeot d'Arc schrieb: > Re-hello, > > Thomas Heller wrote: >> > On 7/11/07, Amaury Forgeot d'Arc wrote: >> >> Thomas Heller wrote: >> >> > I 
would love to look into these, but I prefer debugging on Windows. >> >> > However, the windows build does not work because the _fileio builtin >> >> > module is missing from config.c. Again, this is not so easy to fix, >> >> > because the ftruncate function does not exist on Windows. >> >> >> >> In fileobject.c, there is a replacement for ftruncate. See the code >> >> around the call to SetEndOfFile(). >> >> >> >> I'll try to provide a patch later today. >> >> Awaiting your patch ;-). > > Ok, here it is; shamelessly copied from fileobject.c. Amaury, please upload your patches to the SF bug tracker, and assign them to me. I will (hopefully) look into them tomorrow. > BTW, what is the status of this fileobject? open() doesn't seem to use > it anymore. Will file() be removed at some point? > > Now test_fileio passes on Windows, > with the exception of testAbles(): since c:\dev is an existing > directory on my machine, /dev/tty is a regular file and is seekable... > Maybe skip this test on win32? > > I have a couple of other corrections, found by randomly playing with > the tests functions... shall I post the corrections here as well? See above: posting them to the tracker makes sure they don't get lost. 
Thanks, Thomas From guido at python.org Wed Jul 11 23:03:03 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jul 2007 00:03:03 +0300 Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <4694F709.2040304@livinglogic.de> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de> <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com> <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com> <4694F709.2040304@livinglogic.de> Message-ID: <ca471dc20707111403u45c29d39hf12f2fcaa3b55696@mail.gmail.com> On 7/11/07, Walter Dörwald <walter at livinglogic.de> wrote: > I guess for the final version of Py3000 type_set_name() in typeobject.c > will not downgrade unicode strings to str8, but instead upgrade str8 > objects to unicode. Right, Thomas is working on this (but I have some feedback on his fix). > Also now that PyObject_Unicode() tries __unicode__ first and then tp_str > should we rename all __unicode__ methods to __str__, or will __unicode__ > stay? __unicode__ should be renamed to __str__, or removed (depending on whether the __str__ method already does the right thing). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 11 23:05:39 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jul 2007 00:05:39 +0300 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <FB45562E-60A9-4369-837B-ECD846981ED1@plope.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <FB45562E-60A9-4369-837B-ECD846981ED1@plope.com> Message-ID: <ca471dc20707111405q44922c5chaba514e734f86c9@mail.gmail.com> On 7/11/07, Chris McDonough <chrism at plope.com> wrote: > I have a very remedial question about how to fix test failures due to > the side effects of string-unicode integration.
> > The xmlrpc library uses explicit encoding to encode XML tag payloads > to (almost always) utf8. Tag literals are not encoded. > > What would be the best way to mimic this behavior under the new > regime? Just use unicode everywhere and encode the entire XML body > to utf-8 at the end? Or deal explicitly in bytes everywhere? Or..? The correct approach would be to use Unicode (i.e., str) everywhere and encode to UTF-8 at the end. If that's too hard something's wrong with the philosophy of using Unicode everywhere... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 11 23:08:35 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jul 2007 00:08:35 +0300 Subject: [Python-3000] [Python-Dev] Need help fixing failing Py3k Unittests in py3k-struni In-Reply-To: <e27efe130707111213h7830a78fg584177883240217e@mail.gmail.com> References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com> <46940875.2000606@cheimes.de> <f728g6$6gt$1@sea.gmane.org> <ca471dc20707110446je177145x8d0b0cf2ca2f8487@mail.gmail.com> <e27efe130707110527r58a3d1a0g7ab78d9e1707d51c@mail.gmail.com> <ca471dc20707110541i4a3199c1w4ea14a1c486bdd01@mail.gmail.com> <f72jn4$cma$1@sea.gmane.org> <e27efe130707111213h7830a78fg584177883240217e@mail.gmail.com> Message-ID: <ca471dc20707111408x138db80en9080346bcea0e3ba@mail.gmail.com> On 7/11/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote: > BTW, what is the status of this fileobject? open() doesn't seem to use > it anymore. Will file() be removed at some point? The 'file' builtin is already gone. (You did use the py3k-struni branch, didn't you?) Some parts of the fileobject.c file will remain, but the only APIs that remain in there are generic I/O APIs that work with file-like objects (in particular io.IOBase). 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at arctrix.com Thu Jul 12 01:12:45 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Wed, 11 Jul 2007 23:12:45 +0000 (UTC) Subject: [Python-3000] Change _Py prefix for 3k? Message-ID: <f73o5d$hsf$1@sea.gmane.org> It's a small detail but I wonder if it's time to stop using a leading underscore for internal APIs. I'm not sure what would be a good replacement, perhaps a trailing underscore. In case people don't remember, the _Py prefix could, theoretically, be invalid C on some platforms. Regards, Neil From tjreedy at udel.edu Thu Jul 12 04:01:01 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 11 Jul 2007 22:01:01 -0400 Subject: [Python-3000] PEP 3099 += no bool change? Message-ID: <f7420u$cg1$1@sea.gmane.org> Someone asked if Py3 would get a "real" or "pure" bool type (one not subclassing int). [The usual complaints and rehash about current bool ensued.] I believe (and said so) that this is a settled question. If so, please add a line under Standard types * bool will continue to subclass int. tjr From joe at bitworking.org Thu Jul 12 07:02:51 2007 From: joe at bitworking.org (Joe Gregorio) Date: Thu, 12 Jul 2007 01:02:51 -0400 Subject: [Python-3000] test_mmap.py and OSError Message-ID: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> I decided to try to tackle the unit tests failing on the py3k-struni branch for mmap. It now passes all the unit tests but one, and the problem is that I don't know what should be 'fixed'. The code in the unit test is: finally: try: f.close() except OSError: pass The problem is that the file is already closed and in Lib/io.py, close() calls flush() and flush() raises ValueError() if the file is already closed, but the unit test is looking for OSError. Should io.py raise OSError instead of ValueError? Or should test_mmap.py be expecting ValueError? Or is there something else that I'm completely missing?
[ The wisdom of choosing mmap as my first fiddling with Python internals can be debated later :) ] Thanks, -joe -- Joe Gregorio http://bitworking.org From greg.ewing at canterbury.ac.nz Thu Jul 12 07:26:54 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 12 Jul 2007 17:26:54 +1200 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> Message-ID: <4695BB9E.2030202@canterbury.ac.nz> Joe Gregorio wrote: > flush() raises > ValueError() if the file is already closed, > > Should io.py raise OSError instead of ValueError? Is it really necessary to raise anything at all? An already-closed file is as flushed as it can get, so why not just let it be a no-op? -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From guido at python.org Thu Jul 12 09:02:29 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jul 2007 10:02:29 +0300 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <4695BB9E.2030202@canterbury.ac.nz> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> <4695BB9E.2030202@canterbury.ac.nz> Message-ID: <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> On 7/12/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > Joe Gregorio wrote: > > flush() raises > > ValueError() if the file is already closed, > > > > Should io.py raise OSError instead of ValueError? > > Is it really necessary to raise anything at all? > An already-closed file is as flushed as it can > get, so why not just let it be a no-op? I like that much better. So close() shouldn't try to flush() if it's already closed. This means fixing io.py. 
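A minimal sketch of the behaviour agreed on here, assuming a hypothetical SketchFile class rather than the real io.py code: close() flushes only the first time it is called and is a no-op afterwards, while flush() on an already-closed file still raises ValueError. The class name and the flush_calls counter are illustrative assumptions, not names from io.py.

```python
class SketchFile:
    """Hypothetical stand-in for a buffered file object; not the io.py code."""

    def __init__(self):
        self.closed = False
        self.flush_calls = 0  # illustrative counter, only here for inspection

    def flush(self):
        # Flushing a closed file is still treated as an error in this sketch.
        if self.closed:
            raise ValueError("I/O operation on closed file")
        self.flush_calls += 1

    def close(self):
        # Flush only on the first close; later calls do nothing.
        if not self.closed:
            try:
                self.flush()
            finally:
                self.closed = True


f = SketchFile()
f.close()
f.close()  # second call neither flushes nor raises
```

With close() written this way, a cleanup clause such as the one in test_mmap.py would not need to catch any exception for the already-closed case.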
(Unfortunately it's a bit of a mess, a bit of refactoring would do it good.) BTW whenever changing io.py, always run both test_io.py and test_file.py, as they test slightly different sets of behavior. (Though occasionally these tests must be adjusted too.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jul 12 09:04:44 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Jul 2007 10:04:44 +0300 Subject: [Python-3000] Change _Py prefix for 3k? In-Reply-To: <f73o5d$hsf$1@sea.gmane.org> References: <f73o5d$hsf$1@sea.gmane.org> Message-ID: <ca471dc20707120004r4ff7c3ceg475c55a0b751ecff@mail.gmail.com> On 7/12/07, Neil Schemenauer <nas at arctrix.com> wrote: > It's a small detail but I wonder if it's time to stop using a > leading underscore for internal APIs. I'm not sure what would be a > good replacement, perhaps a trailing underscore. In case people > don't remember, the _Py prefix could, theoretically, be invalid C on > some platforms. There are lots of things we do that could theoretically be bad C. I doubt that this particular one will ever bite us. Are there any other reasons for such a change? 
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de Thu Jul 12 14:16:33 2007
From: walter at livinglogic.de (Walter Dörwald)
Date: Thu, 12 Jul 2007 14:16:33 +0200
Subject: [Python-3000] Need help fixing failing Py3k Unittests in py3k-struni
In-Reply-To: <ca471dc20707111403u45c29d39hf12f2fcaa3b55696@mail.gmail.com>
References: <ca471dc20707101414q77168e32v12b157c2ab756394@mail.gmail.com>
 <f71d9d$thn$1@sea.gmane.org> <46948C1E.5050800@livinglogic.de>
 <ca471dc20707110101m5f65249aj4cf1122be20c5856@mail.gmail.com>
 <ca471dc20707110230r21b2938bxa6d5bffe8fd968aa@mail.gmail.com>
 <4694F709.2040304@livinglogic.de>
 <ca471dc20707111403u45c29d39hf12f2fcaa3b55696@mail.gmail.com>
Message-ID: <46961BA1.9080206@livinglogic.de>

Guido van Rossum wrote:
> On 7/11/07, Walter Dörwald <walter at livinglogic.de> wrote:
>> I guess for the final version of Py3000 type_set_name() in typeobject.c
>> will not downgrade unicode strings to str8, but instead upgrade str8
>> objects to unicode.
>
> Right, Thomas is working on this (but I have some feedback on his fix).
>
>> Also now that PyObject_Unicode() tries __unicode__ first and then tp_str
>> should we rename all __unicode__ methods to __str__, or will __unicode__
>> stay?
>
> __unicode__ should be renamed to __str__, or removed (depending on
> whether the __str__ method already does the right thing).

I've dropped __unicode__ from tkinter. The only remaining __unicode__
use is in the email package (besides the tests, where IMHO __unicode__
should stay as long as it's handled by PyObject_Unicode()).
email.Header.Header defines a __unicode__ which is different from the
__str__ method. I guess Barry will know how to fix this.
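
As a small illustration of the rename discussed above, here is a sketch that assumes the behavior of released Python 3 rather than the py3k-struni branch of the time: str() consults only __str__, and a leftover __unicode__ method is never called implicitly. The `Header` class below is hypothetical and is not email.Header.Header:

```python
class Header:
    """Hypothetical class for illustration; not the real email.Header.Header."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # In Python 3, str(), print() and format() use this method for text.
        return self.text

    def __unicode__(self):
        # 2.x-style leftover: released Python 3 never calls this implicitly.
        return "from __unicode__"


h = Header("Subject line")
rendered = str(h)             # goes through __str__
formatted = "{}".format(h)    # also goes through __str__
```

A class whose __unicode__ and __str__ differ, as described for the email package, therefore has to decide which of the two results becomes the __str__ result.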
Servus, Walter From joe at bitworking.org Thu Jul 12 15:54:23 2007 From: joe at bitworking.org (Joe Gregorio) Date: Thu, 12 Jul 2007 09:54:23 -0400 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> <4695BB9E.2030202@canterbury.ac.nz> <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> Message-ID: <3f1451f50707120654s13e81551x25df9a1dadccafb0@mail.gmail.com> On 7/12/07, Guido van Rossum <guido at python.org> wrote: > On 7/12/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > > Joe Gregorio wrote: > > > flush() raises > > > ValueError() if the file is already closed, > > > > > > Should io.py raise OSError instead of ValueError? > > > > Is it really necessary to raise anything at all? > > An already-closed file is as flushed as it can > > get, so why not just let it be a no-op? > > I like that much better. So close() shouldn't try to flush() if it's > already closed. This means fixing io.py. (Unfortunately it's a bit of > a mess, a bit of refactoring would do it good.) Thanks for the guidance. This patch fixes mmap and also changes io.py so that close() doesn't flush if it's already closed. I did run both test_io.py and test_file.py when checking the changes to io.py. http://www.python.org/sf/1752647 Thanks, -joe -- Joe Gregorio http://bitworking.org From nas at arctrix.com Thu Jul 12 17:53:48 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 12 Jul 2007 09:53:48 -0600 Subject: [Python-3000] Change _Py prefix for 3k? In-Reply-To: <ca471dc20707120004r4ff7c3ceg475c55a0b751ecff@mail.gmail.com> References: <f73o5d$hsf$1@sea.gmane.org> <ca471dc20707120004r4ff7c3ceg475c55a0b751ecff@mail.gmail.com> Message-ID: <20070712155348.GA29907@arctrix.com> On Thu, Jul 12, 2007 at 10:04:44AM +0300, Guido van Rossum wrote: > There are lots of things we do that could theoretically be bad C. 
I
> doubt that this particular one will ever bite us. Are there any other
> reasons for such a change?

I think Python is one of the only open source projects to use a
_[A-Z] prefix on non-local symbols. That seems more dangerous than
other non-standard stuff. Also, it could be hard to work around if
someone runs into trouble.

My gut feeling is that it's not worth the effort to change but I
wanted it to be considered for 3k.

Neil

From martin at v.loewis.de Fri Jul 13 17:19:45 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 13 Jul 2007 17:19:45 +0200
Subject: [Python-3000] Heaptypes
In-Reply-To: <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com>
References: <f72o9f$v6i$1@sea.gmane.org>
 <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com>
Message-ID: <46979811.2050405@v.loewis.de>

> I don't know enough about ...CallFunction to help you with the rest.

I wonder whether the "s" specifier in CallFunction, BuildValue etc
should create Unicode objects, rather than str8 objects.

Regards,
Martin

From pje at telecommunity.com Fri Jul 13 19:41:47 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 13 Jul 2007 13:41:47 -0400
Subject: [Python-3000] pep 3124 plans
In-Reply-To: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
Message-ID: <20070713173936.53C213A404D@sparrow.telecommunity.com>

At 07:39 AM 7/13/2007 +0200, Michele Simionato wrote:
>But I want to ask your opinion first, in order to understand if you
>are willing to scale down your proposal or not. At EuroPython Guido
>said that in private mail you made some strong argument explaining
>why the PEP could not be simplified, but he did not say more than that

It's not an argument that the PEP can't be simplified; only that a
simpler PEP won't accomplish my original goal for the PEP (of having a
generic API for generic functions) vs.
simply having a generic function implementation in the stdlib. The first goal requires the second, but the second doesn't need the first, and as far as I'm aware, I'm the only person who really wants the first. A simpler PEP could exist to implement the second goal only, implementing dynamic overloading in Python 3.0 with all of the non-controversial features of 3124, and using Guido's preferred API. The holdup is that I don't have time to work on the *implementation* of both my version *and* this simplified version; there is little overlap between the two because mine is highly self-referential/self-bootstrapping, absolutely dependent on being able to modify functions in-place (a feature Guido seems near -1 on), and virtually impossible to scale down. So, it is much lower on my priorities at the moment to implement the simplified version, because I will neither gain code reuse *nor* the API standardization I'd hoped for. At the moment, my plan is to finish implementing a PEP 3124-like, fully extensible implementation for Python 2.x (see PEAK-Rules), then look at splitting 3124 into a simplified version and a separate extension API PEP aimed at Python 3.1 or later. At that point, I will know for sure what extension API features are necessary to implement the more advanced features I want in PEAK-Rules. I expect to be able to start work on this (i.e., revisiting the proposal) in about a month. With luck, I will be able to carve out enough time to create the simpler implementation and update the PEP in a reasonable amount of time. However, there is nothing stopping anyone else who wishes it from either making the simpler implementation or drafting the scaled-down PEP. The simpler version Guido wants isn't really that different from his existing generic function prototype, especially if you drop all forms of method combination (including :next_method). It will also need positional dispatching, but that's another feature that could perhaps wait for 3.1 as well. 
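
For readers unfamiliar with the term, here is a minimal sketch of a generic function of the "scaled-down" kind described above: dispatch on the type of the first argument only, with no method combination. The names `SimpleGeneric`, `when_type` and `describe` are hypothetical; this is neither the PEP 3124 API, nor Guido's prototype, nor simplegeneric itself:

```python
class SimpleGeneric:
    """Hypothetical single-dispatch generic function (illustration only)."""

    def __init__(self, default):
        self._default = default   # fallback used when no type matches
        self._registry = {}       # maps a class to its implementation

    def when_type(self, cls):
        """Decorator registering an implementation for instances of cls."""
        def decorator(func):
            self._registry[cls] = func
            return func
        return decorator

    def __call__(self, arg, *args, **kwargs):
        # Walk the MRO so that subclasses reuse their base class's entry.
        for cls in type(arg).__mro__:
            impl = self._registry.get(cls)
            if impl is not None:
                return impl(arg, *args, **kwargs)
        return self._default(arg, *args, **kwargs)


@SimpleGeneric
def describe(obj):
    return "object"


@describe.when_type(int)
def describe_int(obj):
    return "int"


@describe.when_type(list)
def describe_list(obj):
    return "list of %d items" % len(obj)
```

Calling `describe(3)` or `describe(True)` selects the int implementation (bool subclasses int), `describe([1, 2])` selects the list one, and anything else falls back to the default. Dispatching on several positional arguments, which the message mentions as a further requirement, is deliberately left out of this sketch.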
In short, if you want a PEP 3124 implementation started on sooner than
about a month from now, you need to find a volunteer or do it yourself.

>The point is that for 95% of my use cases, simplegeneric would be
>enough, and it is already available *now*. So, if Guido was willing
>to accept something like simplegeneric for Python 3.0, I would not
>mind waiting for multiple dispatch in 3.1.

You'll have to ask him about that. For what it's worth, the pkgutil
module already contains an even simpler generic function
implementation than simplegeneric, and is already in the stdlib
albeit undocumented.

>The reason why I am not using simplegeneric or RuleDispatch already,
>is that I do not want to commit in production to a technology
>without the official approval of the BDFL, and I prefer to wait now
>than having to change my code later.

I guess this means you never use any packages from the Cheeseshop? :)

From michele.simionato at gmail.com Fri Jul 13 20:37:40 2007
From: michele.simionato at gmail.com (Michele Simionato)
Date: Fri, 13 Jul 2007 18:37:40 +0000 (UTC)
Subject: [Python-3000] pep 3124 plans
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
 <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com>
 <20070713173936.53C213A404D@sparrow.telecommunity.com>
Message-ID: <loom.20070713T201857-3@post.gmane.org>

Phillip J. Eby <pje <at> telecommunity.com> writes:
> For what it's worth, the pkgutil
> module already contains an even simpler generic function
> implementation than simplegeneric, and is already in the stdlib
> albeit undocumented.

Well, that is good to know. Personally I would be content with
something at that level of sophistication (i.e. the absolute
minimum). I think there is not much experience in the community with
generic functions (except for you) and there is no danger in waiting
and in acquiring more experience before including in the standard
library a fully featured package.
After all, RuleDispatch is already out there and there is no reason
for putting everything in the stdlib. For the same reason, I am happy
that Zope interfaces will stay out of the stdlib, and that we will
have the much simpler ABC (of course one could argue that generic
functions are better than ABC and actually I think so, but still ABC
are a simpler entry point for most programmers, more in line with how
Python has worked until now, and they will allow me to throw away a
half-baked interface implementation I am using now, which is always a
good thing ;)

Michele Simionato

From theller at ctypes.org Fri Jul 13 21:13:39 2007
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 13 Jul 2007 21:13:39 +0200
Subject: [Python-3000] pep3115 - metaclasses in python 3000
Message-ID: <f78it5$t35$1@sea.gmane.org>

playing a little with py3k...

pep3115 mentions that "__prepare__ returns a dictionary-like object
which is used to store the class member definitions during evaluation
of the class body."

It does not mention whether this dict-like object is used afterwards
as the class-dictionary of the created class or not (when the __new__
method of the metaclass is called).

The sample-code suggests that it would be used as class dict of the
newly created class (the sample code copies it into a regular dictionary
before it is passed to the type.__new__ call).
However, the actual code in the py3k-struni branch (typeobject.c) copies
the passed in dict again.

In other words, it seems impossible even with pep3115 to use a custom
subclass of dict as a type's __dict__ member, and afaik it is impossible
in Python to replace that afterwards.

Is this analysis correct? Is that the intent of pep3115? Or could
the code be changed so that it is possible to supply a custom type dict
with the metaclass?

Thanks,
Thomas

From pje at telecommunity.com Fri Jul 13 23:51:52 2007
From: pje at telecommunity.com (Phillip J.
Eby) Date: Fri, 13 Jul 2007 17:51:52 -0400 Subject: [Python-3000] pep3115 - metaclasses in python 3000 In-Reply-To: <f78it5$t35$1@sea.gmane.org> References: <f78it5$t35$1@sea.gmane.org> Message-ID: <20070713214940.9A5883A404D@sparrow.telecommunity.com> At 09:13 PM 7/13/2007 +0200, Thomas Heller wrote: >playing a little with py3k... > >pep3115 mentions that "__prepare__ returns a dictionary-like object >which is used to store the class member definitions during evaluation >of the class body." > >It does not mention whether this dict-like object is used afterwards >as the class-dictionary of the created class or not (when the __new__ >method of the metaclass is called). > >The sample-code suggests that it would be used as class dict of the >newly created class (the sample code copies it into a regular dictionary >before it is passed to the type.__new__ call). >However, the actual code in the py3k-struni branch (typeobject.c) copies >the passed in dict again. > >In other words, it seems impossible even with pep3115 to use a custom >subclass of dict as a type's __dict__ member, and afaik it is impossible >in Python to replace that afterwards. > >Is this analysis correct? Is that the intent of pep3115? Or could >the code be changed so that it is possible to supply a custom type dict >with the metaclass? I would suggest that we do not intend that the class __dict__ == the __prepare__ object, even as the default case. Otherwise, we have to find everything that accesses type dictionaries and make sure they can work with other kinds of objects. From talin at acm.org Sat Jul 14 06:56:44 2007 From: talin at acm.org (Talin) Date: Fri, 13 Jul 2007 21:56:44 -0700 Subject: [Python-3000] pep3115 - metaclasses in python 3000 In-Reply-To: <f78it5$t35$1@sea.gmane.org> References: <f78it5$t35$1@sea.gmane.org> Message-ID: <4698578C.3080808@acm.org> Thomas Heller wrote: > playing a little with py3k... 
> > pep3115 mentions that "__prepare__ returns a dictionary-like object > which is used to store the class member definitions during evaluation > of the class body." > > It does not mention whether this dict-like object is used afterwards > as the class-dictionary of the created class or not (when the __new__ > method of the metaclass is called). The intention is that it's up to the metaclass to decide. I suspect that most metaclasses won't want to use the dict-like object as the class dict, for two reasons: 1) The behavior of assigning to the class dict after class creation is likely to be different than the behavior of assignment during class creation. In particular, a typical 'dict-like' object is likely to be slower than a dict (it has more work to do, after all), and you don't want that slowness around once your class is finished initializing. 2) A 'dict-like' object doesn't have to support all of the methods of a real dict, wherease a class dict does. So your dict-like wrapper can be relatively simple. -- Talin From lists at cheimes.de Sat Jul 14 15:36:04 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 14 Jul 2007 15:36:04 +0200 Subject: [Python-3000] TextIOWrapper.write(s:str) and bytes in py3k-struni Message-ID: <f7ajg5$pg5$1@sea.gmane.org> Hello! I'm having some troubles with unit tests in the py3k-struni branch. Some test like test_uu are failing because an io.TextIOWrapper instance's write() method doesn't handle bytes. The method is defined as: def write(self, s: str): if self.closed: raise ValueError("write to closed file") # XXX What if we were just reading? b = s.encode(self._encoding) if isinstance(b, str): b = bytes(b) n = self.buffer.write(b) if "\n" in s: # XXX only if isatty self.flush() self._snapshot = self._decoder = None return len(s) The problematic lines are the lines from s.encode() to b = bytes(b). The behavior is more than questionable. 
A bytes object doesn't have an encode() method and str's encode()
method always returns bytes.

IMO the write() method should be changed to:

    def write(self, s: (str, bytes)):
        if self.closed:
            raise ValueError("write to closed file")
        # XXX What if we were just reading?
        if isinstance(s, basestring):
            b = s.encode(self._encoding)
        elif isinstance(s, bytes):
            b = s
        else:
            b = bytes(b)
        n = self.buffer.write(b)
        if b"\n" in b:
            # XXX only if isatty
            self.flush()
        self._snapshot = self._decoder = None
        return len(s)

Or the write() should explicitly raise a TypeError when it is not allowed
to handle bytes.

Christian

From guido at python.org Sat Jul 14 16:08:31 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 14 Jul 2007 17:08:31 +0300
Subject: [Python-3000] Heaptypes
In-Reply-To: <46979811.2050405@v.loewis.de>
References: <f72o9f$v6i$1@sea.gmane.org>
 <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com>
 <46979811.2050405@v.loewis.de>
Message-ID: <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com>

That sounds like a good idea to try. It may break some more tests but
those are all indications of places that incorrectly still require
str8.

On 7/13/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > I don't know enough about ...CallFunction to help you with the rest.
>
> I wonder whether the "s" specifier in CallFunction, BuildValue etc
> should create Unicode objects, rather than str8 objects.
>
> Regards,
> Martin
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sun Jul 15 16:17:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 15 Jul 2007 07:17:00 -0700
Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error
Message-ID: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com>

When a source file contains a string literal with an out-of-range \U
escape (e.g.
"\U12345678"), instead of a syntax error pointing to the offending literal, I get this, without any indication of the file or line: UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character This is quite hard to track down. (Both the location of the bad literal in the source file, and the origin of the error in the parser. :-) Can someone come up with a fix? I note that raw escapes show a slightly different error. I also note that the same issue exists for u"..." literals in Python 2.5. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Sun Jul 15 23:04:19 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 15 Jul 2007 23:04:19 +0200 Subject: [Python-3000] exclusion feature for 2to3? Message-ID: <f7e24l$92e$1@sea.gmane.org> In order to have a codebase run in 2.x and 3.x, via automated translated by 2to3, there should be some "exclusion feature" for single lines that tells the refactorer not to touch those lines. For example, if you have some object that still has an iteritems() method and keeps it, it'll have to stay the same during translation. Same goes, e.g., for methods named next(), has_key() etc. Most obvious would be a special comment, something like for x in curiousobject.iteritems(): # 2to3:keep foo(x) Does that make sense? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From python3now at gmail.com Mon Jul 16 03:14:00 2007 From: python3now at gmail.com (James Thiele) Date: Sun, 15 Jul 2007 18:14:00 -0700 Subject: [Python-3000] exclusion feature for 2to3? 
In-Reply-To: <f7e24l$92e$1@sea.gmane.org>
References: <f7e24l$92e$1@sea.gmane.org>
Message-ID: <8f01efd00707151814y600e6derb248e3dec921162c@mail.gmail.com>

It makes sense - what would you suggest to specify lines/features to exclude?

On 7/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> In order to have a codebase run in 2.x and 3.x, via automated translation by
> 2to3, there should be some "exclusion feature" for single lines that tells
> the refactorer not to touch those lines.
>
> For example, if you have some object that still has an iteritems() method and
> keeps it, it'll have to stay the same during translation.
> Same goes, e.g., for methods named next(), has_key() etc.
>
> Most obvious would be a special comment, something like
>
>     for x in curiousobject.iteritems():   # 2to3:keep
>         foo(x)
>
> Does that make sense?
>
> Georg
>
> --
> Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
> Four shall be the number of spaces thou shalt indent, and the number of thy
> indenting shall be four. Eight shalt thou not indent, nor either indent thou
> two, excepting that thou then proceed to four. Tabs are right out.
>

From guido at python.org Mon Jul 16 04:22:15 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 15 Jul 2007 19:22:15 -0700
Subject: [Python-3000] exclusion feature for 2to3?
In-Reply-To: <f7e24l$92e$1@sea.gmane.org>
References: <f7e24l$92e$1@sea.gmane.org>
Message-ID: <ca471dc20707151922i1894355fh5118d07aa68abb65@mail.gmail.com>

On 7/15/07, Georg Brandl <g.brandl at gmx.net> wrote:
> In order to have a codebase run in 2.x and 3.x, via automated translation by
> 2to3, there should be some "exclusion feature" for single lines that tells
> the refactorer not to touch those lines.
> > For example, if you have some object that still has an iteritems() method and > keeps it, it'll have to stay the same during translation. > Same goes, e.g., for methods named next(), has_key() etc. > > Most obvious would be a special comment, something like > > for x in curiousobject.iteritems(): # 2to3:keep > foo(x) > > Does that make sense? Absolutely. (Were you in the audience of my keynote at EuroPython? I believe I briefly mentioned the need for such a feature there. :-) Can't say I have a good feeling for how to implement it yet, but it should definitely be possible. Precise syntax to be done. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Mon Jul 16 08:12:26 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 15 Jul 2007 23:12:26 -0700 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> Message-ID: <ee2a432c0707152312g48030ado88ffb03e37956cdf@mail.gmail.com> On 7/15/07, Guido van Rossum <guido at python.org> wrote: > When a source file contains a string literal with an out-of-range \U > escape (e.g. "\U12345678"), instead of a syntax error pointing to the > offending literal, I get this, without any indication of the file or > line: > > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in > position 0-9: illegal Unicode character > > This is quite hard to track down. (Both the location of the bad > literal in the source file, and the origin of the error in the parser. > :-) Can someone come up with a fix? Take a look at the patch http://python.org/sf/1031213 That might help. I'm not sure if it's the same problem. I really need to dispose of a bunch of things assigned to me. 
:-( n From g.brandl at gmx.net Mon Jul 16 13:23:29 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 16 Jul 2007 13:23:29 +0200 Subject: [Python-3000] exclusion feature for 2to3? In-Reply-To: <ca471dc20707151922i1894355fh5118d07aa68abb65@mail.gmail.com> References: <f7e24l$92e$1@sea.gmane.org> <ca471dc20707151922i1894355fh5118d07aa68abb65@mail.gmail.com> Message-ID: <f7fkf5$a0n$1@sea.gmane.org> Guido van Rossum schrieb: > On 7/15/07, Georg Brandl <g.brandl at gmx.net> wrote: >> In order to have a codebase run in 2.x and 3.x, via automated translated by >> 2to3, there should be some "exclusion feature" for single lines that tells >> the refactorer not to touch those lines. >> >> For example, if you have some object that still has an iteritems() method and >> keeps it, it'll have to stay the same during translation. >> Same goes, e.g., for methods named next(), has_key() etc. >> >> Most obvious would be a special comment, something like >> >> for x in curiousobject.iteritems(): # 2to3:keep >> foo(x) >> >> Does that make sense? > > Absolutely. (Were you in the audience of my keynote at EuroPython? I > believe I briefly mentioned the need for such a feature there. :-) No, I ran the new documentation toolset through 2to3; and e.g. docutils nodes have a has_key() that does something else than __contains__(). Good to know it's planned! Georg From guido at python.org Mon Jul 16 16:16:10 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 07:16:10 -0700 Subject: [Python-3000] exclusion feature for 2to3? In-Reply-To: <f7fkf5$a0n$1@sea.gmane.org> References: <f7e24l$92e$1@sea.gmane.org> <ca471dc20707151922i1894355fh5118d07aa68abb65@mail.gmail.com> <f7fkf5$a0n$1@sea.gmane.org> Message-ID: <ca471dc20707160716hcf583c4uca822775f19b9987@mail.gmail.com> On 7/16/07, Georg Brandl <g.brandl at gmx.net> wrote: > > Absolutely. (Were you in the audience of my keynote at EuroPython? I > > believe I briefly mentioned the need for such a feature there. 
:-) > > No, I ran the new documentation toolset through 2to3; and e.g. docutils > nodes have a has_key() that does something else than __contains__(). > > Good to know it's planned! Planned is a big word. Someone has to design and implement it. BTW I hope to see more core developers from Europe at EuroPython next year! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 16 20:29:17 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 11:29:17 -0700 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <3f1451f50707120654s13e81551x25df9a1dadccafb0@mail.gmail.com> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> <4695BB9E.2030202@canterbury.ac.nz> <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> <3f1451f50707120654s13e81551x25df9a1dadccafb0@mail.gmail.com> Message-ID: <ca471dc20707161129h63fa1e04g93100ba88eea8fd4@mail.gmail.com> So, after seeing the patch and thinking this over some more, I have changed my mind (again). Attempting to flush a closed file seems to indicate that you're confused about whether a file is closed or not, and that seems indicative of unclear thinking, i.e. it's likely a bug that ought to be caught. I think the original thinking that lead to this being treated as an error in 2.x was correct. I don't see attempts to close an already closed file the same way -- this is a state transition to a final state and it makes total sense that you can reach that state from itself. There are good use cases for allowing this. I don't see the use case for flushing a closed file. --Guido On 7/12/07, Joe Gregorio <joe at bitworking.org> wrote: > On 7/12/07, Guido van Rossum <guido at python.org> wrote: > > On 7/12/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > > > Joe Gregorio wrote: > > > > flush() raises > > > > ValueError() if the file is already closed, > > > > > > > > Should io.py raise OSError instead of ValueError? 
> > > > > > Is it really necessary to raise anything at all? > > > An already-closed file is as flushed as it can > > > get, so why not just let it be a no-op? > > > > I like that much better. So close() shouldn't try to flush() if it's > > already closed. This means fixing io.py. (Unfortunately it's a bit of > > a mess, a bit of refactoring would do it good.) > > Thanks for the guidance. > > This patch fixes mmap and also changes io.py > so that close() doesn't flush if it's already closed. > I did run both test_io.py and test_file.py when checking > the changes to io.py. > > http://www.python.org/sf/1752647 > > Thanks, > -joe > > -- > Joe Gregorio http://bitworking.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From joe at bitworking.org Mon Jul 16 20:45:05 2007 From: joe at bitworking.org (Joe Gregorio) Date: Mon, 16 Jul 2007 14:45:05 -0400 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <ca471dc20707161129h63fa1e04g93100ba88eea8fd4@mail.gmail.com> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> <4695BB9E.2030202@canterbury.ac.nz> <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> <3f1451f50707120654s13e81551x25df9a1dadccafb0@mail.gmail.com> <ca471dc20707161129h63fa1e04g93100ba88eea8fd4@mail.gmail.com> Message-ID: <3f1451f50707161145i3a17541arf8c9c8595d641c39@mail.gmail.com> On 7/16/07, Guido van Rossum <guido at python.org> wrote: > So, after seeing the patch and thinking this over some more, I have > changed my mind (again). Attempting to flush a closed file seems to > indicate that you're confused about whether a file is closed or not, > and that seems indicative of unclear thinking, i.e. it's likely a bug > that ought to be caught. I think the original thinking that lead to > this being treated as an error in 2.x was correct. 
> > I don't see attempts to close an already closed file the same way -- > this is a state transition to a final state and it makes total sense > that you can reach that state from itself. There are good use cases > for allowing this. I don't see the use case for flushing a closed > file. Personally I like that better, it seems more consistent. Should I change the try/except block in the mmap unit test to look for ValueError or should the exception raised in io.py be of type OSError like the 2.5 code expects? test_mmap.py:108 try: f.close() except OSError: pass Thanks, -joe -- Joe Gregorio http://bitworking.org From guido at python.org Mon Jul 16 21:36:59 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 12:36:59 -0700 Subject: [Python-3000] test_mmap.py and OSError In-Reply-To: <3f1451f50707161145i3a17541arf8c9c8595d641c39@mail.gmail.com> References: <3f1451f50707112202n320e4b25hfadb3670129ba33a@mail.gmail.com> <4695BB9E.2030202@canterbury.ac.nz> <ca471dc20707120002u5d49a0c9s17970b705c68a588@mail.gmail.com> <3f1451f50707120654s13e81551x25df9a1dadccafb0@mail.gmail.com> <ca471dc20707161129h63fa1e04g93100ba88eea8fd4@mail.gmail.com> <3f1451f50707161145i3a17541arf8c9c8595d641c39@mail.gmail.com> Message-ID: <ca471dc20707161236r76b28e5fx49da5831937b87ce@mail.gmail.com> On 7/16/07, Joe Gregorio <joe at bitworking.org> wrote: > On 7/16/07, Guido van Rossum <guido at python.org> wrote: > > So, after seeing the patch and thinking this over some more, I have > > changed my mind (again). Attempting to flush a closed file seems to > > indicate that you're confused about whether a file is closed or not, > > and that seems indicative of unclear thinking, i.e. it's likely a bug > > that ought to be caught. I think the original thinking that lead to > > this being treated as an error in 2.x was correct. 
> > > > I don't see attempts to close an already closed file the same way -- > > this is a state transition to a final state and it makes total sense > > that you can reach that state from itself. There are good use cases > > for allowing this. I don't see the use case for flushing a closed > > file. > > Personally I like that better, it seems more consistent. > > Should I change the try/except block in the mmap unit test to look for > ValueError or should the exception raised in io.py be of type OSError like > the 2.5 code expects? > > test_mmap.py:108 > > try: > f.close() > except OSError: > pass > > Thanks, > -joe I just checked in your changes, but looking at the code, I think it's bogus either way: there should be two separate try/finally blocks corresponding to the two 'f = open(...)' calls. I'll fix it that way. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 16 22:35:12 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 13:35:12 -0700 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <ee2a432c0707152312g48030ado88ffb03e37956cdf@mail.gmail.com> References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> <ee2a432c0707152312g48030ado88ffb03e37956cdf@mail.gmail.com> Message-ID: <ca471dc20707161335m1d41d274x3d64d3f9856d5bbc@mail.gmail.com> Doesn't look like it's the same problem. I've assigned that one to Martin who knows that area best of all. On 7/15/07, Neal Norwitz <nnorwitz at gmail.com> wrote: > On 7/15/07, Guido van Rossum <guido at python.org> wrote: > > When a source file contains a string literal with an out-of-range \U > > escape (e.g. 
"\U12345678"), instead of a syntax error pointing to the > > offending literal, I get this, without any indication of the file or > > line: > > > > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in > > position 0-9: illegal Unicode character > > > > This is quite hard to track down. (Both the location of the bad > > literal in the source file, and the origin of the error in the parser. > > :-) Can someone come up with a fix? > > Take a look at the patch http://python.org/sf/1031213 > > That might help. I'm not sure if it's the same problem. > > I really need to dispose of a bunch of things assigned to me. :-( > > n > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 16 23:23:33 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 14:23:33 -0700 Subject: [Python-3000] TextIOWrapper.write(s:str) and bytes in py3k-struni In-Reply-To: <f7ajg5$pg5$1@sea.gmane.org> References: <f7ajg5$pg5$1@sea.gmane.org> Message-ID: <ca471dc20707161423p106bfe15i29eb572dda2b07c2@mail.gmail.com> On 7/14/07, Christian Heimes <lists at cheimes.de> wrote: > I'm having some troubles with unit tests in the py3k-struni branch. Some > test like test_uu are failing because an io.TextIOWrapper instance's > write() method doesn't handle bytes. The method is defined as: > > def write(self, s: str): > if self.closed: > raise ValueError("write to closed file") > # XXX What if we were just reading? > b = s.encode(self._encoding) > if isinstance(b, str): > b = bytes(b) > n = self.buffer.write(b) > if "\n" in s: > # XXX only if isatty > self.flush() > self._snapshot = self._decoder = None > return len(s) > > The problematic lines are the lines from s.encode() to b = bytes(b). The > behavior is more than questionable. A bytes object doesn't have an > encode() method and str's encode method() always returns bytes. 
IMO the > write() method should be changed to: > > def write(self, s: (str, bytes)): > if self.closed: > raise ValueError("write to closed file") > # XXX What if we were just reading? > if isinstance(s, basestring): > b = s.encode(self._encoding) > elif isinstance(s, bytes): > b = s > else: > b = bytes(b) > n = self.buffer.write(b) > if b"\n" in b: > # XXX only if isatty > self.flush() > self._snapshot = self._decoder = None > return len(s) > > Or the write() should explictly raise a TypeError when it is not allowed > to handle bytes. I came across this in your SF patch. I disagree with your desire to let TextIOWrapper.write() handle bytes: it should *only* be passed str objects. The uu test was failing because it was writing bytes to a text stream. Perhaps the error should be better; though I'm not sure I want to add explicit type checks (as it would defeat duck typing). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jul 17 01:58:36 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 16:58:36 -0700 Subject: [Python-3000] pep3115 - metaclasses in python 3000 In-Reply-To: <4698578C.3080808@acm.org> References: <f78it5$t35$1@sea.gmane.org> <4698578C.3080808@acm.org> Message-ID: <ca471dc20707161658l58a1d600j34c7175aac3a4b3c@mail.gmail.com> On 7/13/07, Talin <talin at acm.org> wrote: > Thomas Heller wrote: > > playing a little with py3k... > > > > pep3115 mentions that "__prepare__ returns a dictionary-like object > > which is used to store the class member definitions during evaluation > > of the class body." > > > > It does not mention whether this dict-like object is used afterwards > > as the class-dictionary of the created class or not (when the __new__ > > method of the metaclass is called). > > The intention is that it's up to the metaclass to decide. 
I suspect that > most metaclasses won't want to use the dict-like object as the class > dict, for two reasons: > > 1) The behavior of assigning to the class dict after class creation is > likely to be different than the behavior of assignment during class > creation. In particular, a typical 'dict-like' object is likely to be > slower than a dict (it has more work to do, after all), and you don't > want that slowness around once your class is finished initializing. > > 2) A 'dict-like' object doesn't have to support all of the methods of a > real dict, whereas a class dict does. So your dict-like wrapper can be > relatively simple. The object returned by __prepare__() actually *is* incorporated into the class object, unless the metaclass' __new__() passes something else to type.__new__(). However this isn't obvious when you ask for the class' __dict__ attribute: you always get a dict proxy. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jul 17 02:11:09 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Jul 2007 17:11:09 -0700 Subject: [Python-3000] pep3115 - metaclasses in python 3000 In-Reply-To: <ca471dc20707161658l58a1d600j34c7175aac3a4b3c@mail.gmail.com> References: <f78it5$t35$1@sea.gmane.org> <4698578C.3080808@acm.org> <ca471dc20707161658l58a1d600j34c7175aac3a4b3c@mail.gmail.com> Message-ID: <ca471dc20707161711q49d46202s2ce6f5989dff6aca@mail.gmail.com> On 7/16/07, Guido van Rossum <guido at python.org> wrote: > The object returned by __prepare__() actually *is* incorporated into > the class object, unless the metaclass' __new__() passes something > else to type.__new__(). However this isn't obvious when you ask for > the class' __dict__ attribute: you always get a dict proxy. I take it back. The object is copied, for the reasons Phillip explained.
There is no way around this without writing C code, as the only way to create a type object from Python is to call type.__new__() -- the __new__() method of a subclass of type still must call type's __new__() method to create the actual object. (Embarrassed, since I wrote all the code involved.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de Tue Jul 17 03:22:13 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 17 Jul 2007 03:22:13 +0200
Subject: [Python-3000] TextIOWrapper.write(s:str) and bytes in py3k-struni
In-Reply-To: <ca471dc20707161423p106bfe15i29eb572dda2b07c2@mail.gmail.com>
References: <f7ajg5$pg5$1@sea.gmane.org> <ca471dc20707161423p106bfe15i29eb572dda2b07c2@mail.gmail.com>
Message-ID: <469C19C5.3010006@cheimes.de>

Guido van Rossum wrote:
> I came across this in your SF patch. I disagree with your desire to
> let TextIOWrapper.write() handle bytes: it should *only* be passed str
> objects. The uu test was failing because it was writing bytes to a
> text stream.
>
> Perhaps the error should be better; though I'm not sure I want to add
> explicit type checks (as it would defeat duck typing).

Yes, duck typing is very useful but this duck doesn't quack me why it hurts. ;) It's rather confusing at first. What do you think about

    def write(self, s: str):
        if self.closed:
            raise ValueError("write to closed file")
        try:
            b = s.encode(self._encoding)
        except AttributeError:
            raise TypeError("str expected, got %r" % s)
        ...

    def write(self, s: str):
        if self.closed:
            raise ValueError("write to closed file")
        if not hasattr(s, 'encode'):
            raise TypeError("str expected, got %r" % s)
        ...

? It explains what is going wrong.
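For illustration, a minimal sketch of the second variant in released Python 3 syntax. TinyTextWriter is a hypothetical stand-in, not the real io.TextIOWrapper, and it assumes the underlying buffer is a binary stream such as io.BytesIO. It looks up encode() before calling it, so only a missing method is reported as a TypeError, while any error raised inside encode() itself still propagates unchanged.

```python
import io


class TinyTextWriter:
    """Hypothetical toy text wrapper; NOT the real io.TextIOWrapper."""

    def __init__(self, buffer, encoding="utf-8"):
        self.buffer = buffer          # assumed: a binary stream with write()
        self._encoding = encoding
        self.closed = False

    def write(self, s):
        if self.closed:
            raise ValueError("write to closed file")
        # Fetch the method first: a missing encode() is a type problem,
        # but exceptions raised while encoding are left alone.
        encode = getattr(s, "encode", None)
        if encode is None:
            raise TypeError("str expected, got %r" % (s,))
        self.buffer.write(encode(self._encoding))
        return len(s)


buf = io.BytesIO()
writer = TinyTextWriter(buf)
writer.write("abc\n")      # accepted: str has encode()
try:
    writer.write(b"raw")   # rejected: bytes has no encode() in Python 3
except TypeError as exc:
    message = str(exc)
```

The check is still duck-typed: anything with a suitable encode() method is accepted, which is why the wording of the error message is debatable.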
Christian From martin at v.loewis.de Tue Jul 17 06:52:27 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jul 2007 06:52:27 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> Message-ID: <469C4B0B.50605@v.loewis.de> Guido van Rossum schrieb: > That sounds like a good idea to try. It may break some more tests but > those are all indications of places that incorrectly still require > str8. > >> I wonder whether the "s" specifier in CallFunction, BuildValue etc >> should create Unicode objects, rather than str8 objects. Done. I fixed a number of test cases that broke because of that. In particular, bytes.__reduce__ could not easily return str8 objects as its marshalling state anymore (and shouldn't do so, anyway). So I made bytes a builtin type of pickle, using the S code. As a consequence, a number of other types had to get fixed. So in total, it adds one new failure: something in test_pickle now complains that bytes objects are not hashable. Regards, Martin From p.f.moore at gmail.com Tue Jul 17 13:04:13 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 17 Jul 2007 12:04:13 +0100 Subject: [Python-3000] TextIOWrapper.write(s:str) and bytes in py3k-struni In-Reply-To: <469C19C5.3010006@cheimes.de> References: <f7ajg5$pg5$1@sea.gmane.org> <ca471dc20707161423p106bfe15i29eb572dda2b07c2@mail.gmail.com> <469C19C5.3010006@cheimes.de> Message-ID: <79990c6b0707170404x68b6b99cj6be77e4f8e65c82@mail.gmail.com> On 17/07/07, Christian Heimes <lists at cheimes.de> wrote: > def write(self, s: str): > if self.closed: > raise ValueError("write to closed file") > if not hasattr(s, 'encode') > raise TypeError("str expected, got %r" % s) > ... > > ? It explains what is going wrong. 
Surely the error should say that the object passed needs an encode method, rather than that it should be a str? Paul. From ncoghlan at gmail.com Tue Jul 17 14:15:31 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Jul 2007 22:15:31 +1000 Subject: [Python-3000] TextIOWrapper.write(s:str) and bytes in py3k-struni In-Reply-To: <469C19C5.3010006@cheimes.de> References: <f7ajg5$pg5$1@sea.gmane.org> <ca471dc20707161423p106bfe15i29eb572dda2b07c2@mail.gmail.com> <469C19C5.3010006@cheimes.de> Message-ID: <469CB2E3.5070309@gmail.com> Christian Heimes wrote: > What do you think about > > def write(self, s: str): > if self.closed: > raise ValueError("write to closed file") > try: > b = s.encode(self._encoding) > except AttributeError: > raise TypeError("str expected, got %r" % s) > ... The try/except here is a bit too broad - you only want to trap the attribute error. That said, I'm not sure what error you could raise that would be clearer than complaining that the object passed in doesn't have an encode() method. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue Jul 17 16:25:30 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 17 Jul 2007 07:25:30 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469C4B0B.50605@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> Message-ID: <ca471dc20707170725q91bfba7p6f549a613c0c300e@mail.gmail.com> Thanks! Can you add test_pickle to the wiki page? (http://wiki.python.org/moin/Py3kStrUniTests) On 7/16/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > Guido van Rossum schrieb: > > That sounds like a good idea to try.
It may break some more tests but > > those are all indications of places that incorrectly still require > > str8. > > > >> I wonder whether the "s" specifier in CallFunction, BuildValue etc > >> should create Unicode objects, rather than str8 objects. > > Done. I fixed a number of test cases that broke because of that. > In particular, bytes.__reduce__ could not easily return str8 objects > as its marshalling state anymore (and shouldn't do so, anyway). > So I made bytes a builtin type of pickle, using the S code. > As a consequence, a number of other types had to get fixed. > > So in total, it adds one new failure: something in test_pickle > now complains that bytes objects are not hashable. > > Regards, > Martin > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue Jul 17 22:42:54 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 17 Jul 2007 22:42:54 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707170725q91bfba7p6f549a613c0c300e@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707170725q91bfba7p6f549a613c0c300e@mail.gmail.com> Message-ID: <469D29CE.5050600@v.loewis.de> Guido van Rossum schrieb: > Thanks! Can you add test_pickle to the wiki page? > (http://wiki.python.org/moin/Py3kStrUniTests) Done! 
Martin From guido at python.org Tue Jul 17 23:04:14 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 17 Jul 2007 14:04:14 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469D29CE.5050600@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707170725q91bfba7p6f549a613c0c300e@mail.gmail.com> <469D29CE.5050600@v.loewis.de> Message-ID: <ca471dc20707171404l40c28f9cr733930031123537@mail.gmail.com> On 7/17/07, "Martin v. L?wis" <martin at v.loewis.de> wrote: > Guido van Rossum schrieb: > > Thanks! Can you add test_pickle to the wiki page? > > (http://wiki.python.org/moin/Py3kStrUniTests) > > Done! But now I'm confused. I don't see the failure. Are you sure you checked in what you did? In the py3k-struni branch? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jul 17 23:47:51 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 17 Jul 2007 14:47:51 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <loom.20070713T201857-3@post.gmane.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> Message-ID: <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> On 7/13/07, Michele Simionato <michele.simionato at gmail.com> wrote: > Phillip J. Eby <pje <at> telecommunity.com> writes: > > For what it's worth, the pkgutil > > module already contains an even simpler generic function > > implementation than simplegeneric, and is already in the stdlib > > albeit undocumented. > > Well, that is good to know. Personally I would be content with something > at that level of sophistication (i.e. the absolute minimum). 
I think there > is not much experience in the community with generic functions (except for > you) and there is no danger in waiting and in acquiring more experience > before including in the standard library a fully featured package. After > all, RuleDispatch is already out there and there is no reason for putting > everything in the stdlib. For the same reason, I am happy that Zope interfaces > will stay out of the stdlib, and that we will have the much simpler ABC > (of course one could argue that generic functions are better than ABC and > actually I think so, but still ABC are a simpler entry point for most > programmers, more in line with how Python has worked until now, and they > will allow me to throw away a half-baked interface implementation I am > using now, which is always a good thing ;) Actually, I believe ABCs and GFs work well together, and I believe Phillip has said so too. Regarding the fate of PEP 3124, perhaps the right thing is to reject the PEP, and be content with having GFs as a third party add-on? There seems to be nothing particular about Python 3.0 as the point of introduction of GFs anyway -- they can be introduced just as easily in 3.1 or 4.0 or any time later (or earlier, as Phillip's existing implementation shows). I have one remaining question for Phillip: why is your design "absolutely dependent on being able to modify functions in-place"? That dependency would appear to make it harder to port the design to other Python implementations whose function objects don't behave the same way. I can see it as a philosophically desirable feature; but I don't understand the technical need for it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Wed Jul 18 00:38:06 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 17 Jul 2007 18:38:06 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> Message-ID: <20070717223550.7B1B13A403A@sparrow.telecommunity.com> At 02:47 PM 7/17/2007 -0700, Guido van Rossum wrote: >I have one remaining question for Phillip: why is your design >"absolutely dependent on being able to modify functions in-place"? >That dependency would appear to make it harder to port the design to >other Python implementations whose function objects don't behave the >same way. I can see it as a philosophical desirable feature; but I >don't understand the technical need for it. It allows the framework to bootstrap via successive approximation. Initially, the 'implies()' function is just a plain function, and then it later becomes a generic function. (And of course it gets called in between those two points.) The same happens for 'disjuncts()' and 'overrides()'. Is it potentially possible that there's another way to do it, given enough restrictions on how other code uses the exported API and enough hackery during bootstrapping? Perhaps, but I don't know of such a way. The modification-in-place approach allows me to just write the functions and not care precisely when they become generic. I still have to do a little extra special bootstrapping for implies(), because of its self-referential nature, but everything else I can pretty much blaze right on through with. (By the way, AFAIK IronPython, Jython (2.2), and PyPy all support writable func_code attributes, so it's evidently practical to do so for reasonably dynamic Python implementations.) 
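To make the in-place idea concrete, here is a rough sketch for CPython 3, where func_code is spelled __code__. The names implies, _rules and _generic_implies are illustrative only and are not taken from PEAK-Rules. The sketch assumes both functions live in the same module and have no closure variables, which is what allows one code object to replace the other.

```python
def implies(a, b):
    """Bootstrap version: a plain function with one hard-wired rule."""
    return a == b


# Other code may already hold a reference to the plain function.
early_reference = implies
before = early_reference(bool, int)   # plain version: bool == int

_rules = []   # hypothetical registry of (applies, result) pairs


def _generic_implies(a, b):
    """Dispatching version: consult registered rules, then fall back."""
    for applies, result in _rules:
        if applies(a, b):
            return result
    return a == b


# Swap the implementation in place. The function object keeps its
# identity, so every existing reference sees the new behaviour.
implies.__code__ = _generic_implies.__code__

# Register a rule after the swap: a class implies each of its bases.
_rules.append((
    lambda a, b: isinstance(a, type) and isinstance(b, type)
    and issubclass(a, b),
    True,
))

after = early_reference(bool, int)
```

Nothing that imported implies early needs to be updated, which is the property the bootstrapping relies on.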
>Regarding the fate of PEP 3124, perhaps the right thing is to reject >the PEP, and be content with having GFs as a third party add-on? I've also suggested simply deferring it. I'd still like to see a "blessed" meta-API for generic functions at some point. Also, as I've said, there's nothing stopping anybody from stepping up with a less-ambitious and less-controversial implementation based on your preferred API. I just won't be able to get to it myself for a month or so. (Also, nothing stops such a less-ambitious approach from being later folded into something more like my approach, with full extensibility and all the bells and whistles. In the worst case, one could always make a backward compatibility layer that fakes the more limited API using the more general one, as long as the lesser API is a strict subset of the greater -- and I believe it is.) >There seems to be nothing particular about Python 3.0 as the point of >introduction of GFs anyway -- they can be introduced just as easily in >3.1 or 4.0 or any time later (or earlier, as Phillip's existing >implementation show). Well, the one thing that might still be relevant is the "overloading inside classes" rule. That's the only bit that has any effect on Python 3.0 semantics vis-a-vis metaclasses, class decorators, etc. The way things currently stand for 3.0, I actually *won't* be able to make a GF implementation that handles the "first argument should be of the containing class" rule without users having an explicit metaclass or class decorator that supports it. In 2.x, I take advantage of the ability of code run inside a class suite to change the enclosing class' __metaclass__; in 3.0, you can't do this anymore since the __metaclass__ doesn't come from the class suite, and there isn't a replacement hook. 
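The explicit class-decorator route mentioned above might look roughly like the following. This is a sketch only: Generic, overload and collect_overloads are hypothetical names, not the RuleDispatch or PEAK-Rules API, and dispatch is limited to the type of the first argument.

```python
class Generic:
    """Hypothetical single-dispatch generic function (illustration only)."""

    def __init__(self, default):
        self.default = default
        self.by_type = {}

    def __call__(self, obj, *args):
        # Walk the MRO so subclasses inherit registered overloads.
        for klass in type(obj).__mro__:
            if klass in self.by_type:
                return self.by_type[klass](obj, *args)
        return self.default(obj, *args)

    def overload(self, method):
        # Inside a class body the class does not exist yet, so only
        # mark the method; registration is deferred.
        method._deferred_for = self
        return method


def collect_overloads(cls):
    """Class decorator: finish the registrations deferred above."""
    for attr in list(vars(cls).values()):
        generic = getattr(attr, "_deferred_for", None)
        if generic is not None:
            generic.by_type[cls] = attr
    return cls


describe = Generic(lambda obj: "something")


@collect_overloads
class Something:
    @describe.overload
    def blah(self):
        return "a Something"


class Sub(Something):
    pass
```

If the class decorator is forgotten, the overload is silently never registered and calls fall through to the default.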
From guido at python.org Wed Jul 18 00:53:24 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 17 Jul 2007 15:53:24 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070717223550.7B1B13A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> Message-ID: <ca471dc20707171553x69ebd106n2af86d47e2f6afc3@mail.gmail.com> On 7/17/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 02:47 PM 7/17/2007 -0700, Guido van Rossum wrote: > >I have one remaining question for Phillip: why is your design > >"absolutely dependent on being able to modify functions in-place"? > >That dependency would appear to make it harder to port the design to > >other Python implementations whose function objects don't behave the > >same way. I can see it as a philosophical desirable feature; but I > >don't understand the technical need for it. > > It allows the framework to bootstrap via successive > approximation. Initially, the 'implies()' function is just a plain > function, and then it later becomes a generic function. (And of > course it gets called in between those two points.) The same happens > for 'disjuncts()' and 'overrides()'. Why isn't it possible to mark these functions as explicitly overloadable? I'm not sure I understand what you mean by "bootstrapping". > Is it potentially possible that there's another way to do it, given > enough restrictions on how other code uses the exported API and > enough hackery during bootstrapping? Perhaps, but I don't know of > such a way. The modification-in-place approach allows me to just > write the functions and not care precisely when they become > generic. 
I still have to do a little extra special bootstrapping for > implies(), because of its self-referential nature, but everything > else I can pretty much blaze right on through with. I guess I'll have to reserve judgment until the implementation exists. > (By the way, AFAIK IronPython, Jython (2.2), and PyPy all support > writable func_code attributes, so it's evidently practical to do so > for reasonably dynamic Python implementations.) Fair enough, though I suspect that IronPython might use certain optimizations that depend on func_code not being written. However, I certainly don't know enough about it. Anyone familiar with IronPython on this list care to comment? > >Regarding the fate of PEP 3124, perhaps the right thing is to reject > >the PEP, and be content with having GFs as a third party add-on? > > I've also suggested simply deferring it. I'd still like to see a > "blessed" meta-API for generic functions at some point. I'll defer it. It seems you are the only one who can write such a blessed meta-API, and I'm guessing that's the part of PEP 3124 that was never completed. > Also, as I've said, there's nothing stopping anybody from stepping up > with a less-ambitious and less-controversial implementation based on > your preferred API. I just won't be able to get to it myself for a > month or so. I'm not sure anybody else cares enough to pre-empt you. > (Also, nothing stops such a less-ambitious approach from being later > folded into something more like my approach, with full extensibility > and all the bells and whistles. In the worst case, one could always > make a backward compatibility layer that fakes the more limited API > using the more general one, as long as the lesser API is a strict > subset of the greater -- and I believe it is.) 
> > > >There seems to be nothing particular about Python 3.0 as the point of > >introduction of GFs anyway -- they can be introduced just as easily in > >3.1 or 4.0 or any time later (or earlier, as Phillip's existing > >implementation shows). > > Well, the one thing that might still be relevant is the "overloading > inside classes" rule. That's the only bit that has any effect on > Python 3.0 semantics vis-a-vis metaclasses, class decorators, etc. > > The way things currently stand for 3.0, I actually *won't* be able to > make a GF implementation that handles the "first argument should be > of the containing class" rule without users having an explicit > metaclass or class decorator that supports it. > > In 2.x, I take advantage of the ability of code run inside a class > suite to change the enclosing class' __metaclass__; in 3.0, you can't > do this anymore since the __metaclass__ doesn't come from the class > suite, and there isn't a replacement hook. I don't understand enough of your implementation to understand this requirement. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Wed Jul 18 01:27:54 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 17 Jul 2007 19:27:54 -0400 Subject: [Python-3000] Introspection broken for objects using Py_FindMethod() Message-ID: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com> Hi, Is it intentional that introspection is broken for objects using Py_FindMethod()? For example:

Python 3.0x (cpy_merge:56413:56414M, Jul 17 2007, 13:57:23)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>>> import cPickle
>>> dir(cPickle.Unpickler(file))
[]
>>> dir(cPickle.Pickler(file))
['PicklingError', snip...]
Thanks, -- Alexandre From guido at python.org Wed Jul 18 01:52:16 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 17 Jul 2007 16:52:16 -0700 Subject: [Python-3000] Introspection broken for objects using Py_FindMethod() In-Reply-To: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com> References: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com> Message-ID: <ca471dc20707171652w254d597bl9068abae61b64da4@mail.gmail.com> On 7/17/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > Hi, > > It is intentional that the introspection broken for objects using > Py_FindMethod()? For example: > > Python 3.0x (cpy_merge:56413:56414M, Jul 17 2007, 13:57:23) > [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 > >>> import cPickle > >>> dir(cPickle.Unpickler(file)) > [] > >>> dir(cPickle.Pickler(file)) > ['PicklingError', snip...] Yes, see a thread between me, Georg and Brett around March 7-10: http://mail.python.org/pipermail/python-3000/2007-March/006061.html I think the conclusion was to get rid of Py_FindMethod altogether. The replacement isn't very hard. But it hasn't been done yet. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Wed Jul 18 02:27:02 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 17 Jul 2007 20:27:02 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707171553x69ebd106n2af86d47e2f6afc3@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <ca471dc20707171553x69ebd106n2af86d47e2f6afc3@mail.gmail.com> Message-ID: <20070718002446.4B2763A403A@sparrow.telecommunity.com> At 03:53 PM 7/17/2007 -0700, Guido van Rossum wrote: >On 7/17/07, Phillip J. 
Eby <pje at telecommunity.com> wrote: > > At 02:47 PM 7/17/2007 -0700, Guido van Rossum wrote: > > >I have one remaining question for Phillip: why is your design > > >"absolutely dependent on being able to modify functions in-place"? > > >That dependency would appear to make it harder to port the design to > > >other Python implementations whose function objects don't behave the > > >same way. I can see it as a philosophical desirable feature; but I > > >don't understand the technical need for it. > > > > It allows the framework to bootstrap via successive > > approximation. Initially, the 'implies()' function is just a plain > > function, and then it later becomes a generic function. (And of > > course it gets called in between those two points.) The same happens > > for 'disjuncts()' and 'overrides()'. > >Why isn't it possible to mark these functions as explicitly >overloadable? How would I ever add rules to them, if I need them to already be callable in order to add any rules in the first place? :) (In practice, things are even hairier, because I also sometimes need to call these functions *while they are already being called*, if there's no cache hit!) This is partly a consequence of splitting responsibilities between "rule sets" and "dispatch engines". PEAK-Rules wants to be able to use a simple type-tuple dispatcher (like your prototype), but also upgrade to fancier engines as required for specific functions, without changing the rules already registered for the function. So it treats the set of overloads as a separate object from the engine that actually implements dispatching. That way, you can upgrade the engine, even while keeping the rules. However, to populate a rule set, you need to know the disjuncts() of a rule... so you could never add the default rule to disjuncts() without a default rule already being there. 
None of this is relevant for a design that doesn't care about having more than one supported implementation, though, which is why a reduced-in-scope implementation that's not trying to be a universal API can just ignore all of this. (Heck, disjuncts() wouldn't even be needed in an implementation that wasn't trying to support arbitrary engine extensions, since its purpose is to list the "or"-ed conditions of a rule that can be fulfilled in more than one way.) > > Well, the one thing that might still be relevant is the "overloading > > inside classes" rule. That's the only bit that has any effect on > > Python 3.0 semantics vis-a-vis metaclasses, class decorators, etc. > > > > The way things currently stand for 3.0, I actually *won't* be able to > > make a GF implementation that handles the "first argument should be > > of the containing class" rule without users having an explicit > > metaclass or class decorator that supports it. > > > > In 2.x, I take advantage of the ability of code run inside a class > > suite to change the enclosing class' __metaclass__; in 3.0, you can't > > do this anymore since the __metaclass__ doesn't come from the class > > suite, and there isn't a replacement hook. > >I don't understand enough of your implementation to understand this >requirement. This part would actually be relevant even for a scaled-down non-extensible implementation. The requirement is this: overloads defined in a class need to implicitly treat the first argument of the overloading method as if it were explicitly declared "self: EnclosingClass". In order to do this, the equivalent code in RuleDispatch currently sticks a temporary metaclass into the class locals(), so that it can defer the overload operation until after the class exists. Then it adds in the class to the overload registration. This could be handled by any other sort of mechanism that would allow code in a class body to register a callback to receive the created class. 
A custom metaclass or class decorator would certainly do the trick, but then you have to do something like:

    @class_contains_overloads
    class Something:

        @some_function.overload
        def blah(self, ...):
            yadda()

It'd be nice to be able to skip the redundant class decorator, as it's not adding any useful information for the reader, and forgetting it will produce a bug. So if method decorators were allowed to request class decorators to be added, that would be the simplest way to manage this. However, if this has to wait for 3.1, it's no big deal.

From jimjjewett at gmail.com Wed Jul 18 03:04:01 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 17 Jul 2007 21:04:01 -0400
Subject: [Python-3000] pep 3124 plans
In-Reply-To: <20070718002446.4B2763A403A@sparrow.telecommunity.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <ca471dc20707171553x69ebd106n2af86d47e2f6afc3@mail.gmail.com> <20070718002446.4B2763A403A@sparrow.telecommunity.com>
Message-ID: <fb6fbf560707171804n2d60958dq89a0726a53c16c84@mail.gmail.com>

On 7/17/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 02:47 PM 7/17/2007 -0700, Guido van Rossum wrote:
> >I have one remaining question for Phillip: why is your design
> >"absolutely dependent on being able to modify functions in-place"?

> It allows the framework to bootstrap via successive
> approximation. Initially, the 'implies()' function is just a plain

Would it work to make the original 'implies()' something other than an ordinary function? I realize that you prefer being able to overload anything, but it seems that you *could* mark the ones you'll need to overload as part of bootstrapping.
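A sketch of that alternative, assuming the bootstrap functions are created as instances of a small wrapper class from the start. Switchable and become() are made-up names for illustration and do not come from any of the libraries discussed in this thread.

```python
class Switchable:
    """Callable whose implementation can be replaced after creation."""

    def __init__(self, impl):
        self._impl = impl

    def __call__(self, *args, **kwargs):
        return self._impl(*args, **kwargs)

    def become(self, new_impl):
        # Existing references to this object pick up the new behaviour,
        # without touching any function's code object.
        self._impl = new_impl


def _plain_implies(a, b):
    return a == b


def _richer_implies(a, b):
    if isinstance(a, type) and isinstance(b, type):
        return issubclass(a, b)
    return a == b


implies = Switchable(_plain_implies)
early_reference = implies
before = early_reference(bool, int)

implies.become(_richer_implies)
after = early_reference(bool, int)
```

The trade-off is an extra call layer on every invocation, and the wrapper is not a real function object, so it only helps for functions that were marked this way up front.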
> In 2.x, I take advantage of the ability of code run inside a class > suite to change the enclosing class' __metaclass__; in 3.0, What was missing from the __class__ attribute that you get from the super PEP fail? Was it that you wanted access to the class while defining the class, before the method is ever called? Why can't an ordinary class decorator work? Is it because you want the funky stuff to be conditional? If so, is that really required? Or are you just objecting to the fact that metaclasses like this won't be the default? -jJ From greg.ewing at canterbury.ac.nz Wed Jul 18 03:37:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 18 Jul 2007 13:37:10 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070717223550.7B1B13A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> Message-ID: <469D6EC6.9010005@canterbury.ac.nz> Phillip J. Eby wrote: > It allows the framework to bootstrap via successive > approximation. Initially, the 'implies()' function is just a plain > function, and then it later becomes a generic function. (And of > course it gets called in between those two points.) The same happens > for 'disjuncts()' and 'overrides()'. But you know from the outset that these functions will eventually become generic, so why can't they be defined as some callable object that can have its insides switched, if you're on a Python whose normal function objects don't allow that? -- Greg From pje at telecommunity.com Wed Jul 18 04:03:20 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 17 Jul 2007 22:03:20 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <fb6fbf560707171804n2d60958dq89a0726a53c16c84@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <ca471dc20707171553x69ebd106n2af86d47e2f6afc3@mail.gmail.com> <20070718002446.4B2763A403A@sparrow.telecommunity.com> <fb6fbf560707171804n2d60958dq89a0726a53c16c84@mail.gmail.com> Message-ID: <20070718020107.EA7123A403A@sparrow.telecommunity.com> At 09:04 PM 7/17/2007 -0400, Jim Jewett wrote: >On 7/17/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > At 02:47 PM 7/17/2007 -0700, Guido van Rossum wrote: > > >I have one remaining question for Phillip: why is your design > > >"absolutely dependent on being able to modify functions in-place"? > > > It allows the framework to bootstrap via successive > > approximation. Initially, the 'implies()' function is just a plain > >Would it work to make the original 'implies()' something other than an >ordinary function? I realize that you prefer being able to overload >anything, but it seems that you *could* mark the ones you'll need to >overload as part of bootstrapping. Fair enough. The design is still dependent on modifying "functions" in place, for some value of "function". It just never occurred to me to introduce a *third* type of "function", besides the two already being dealt with (i.e., standard functions and generic functions). Even *thinking* about the idea right now is like fingernails on a chalkboard to me, so I can see why it didn't occur to me. :) > > In 2.x, I take advantage of the ability of code run inside a class > > suite to change the enclosing class' __metaclass__; in 3.0, > >What was missing from the __class__ attribute that you get from the >super PEP fail?
Was it that you wanted access to the class while >defining the class, before the method is ever called? Correct; you need access to it before the method is called, since it's to add an overload to a generic function. >Why can't an ordinary class decorator work? It can; it's just noise. > Is it because you want >the funky stuff to be conditional? If so, is that really required? I don't understand what you mean by "funky stuff" or "conditional", here. >Or are you just objecting to the fact that metaclasses like this won't >be the default? The idea is to make it so that using generic functions doesn't require a bunch of extra bookkeeping, like adding metaclasses or decorators. Metaclasses are particularly problematic in that mixing multiple metaclasses is not an activity for novice wizards. That's why I don't use that approach in today's Python: I can safely wizard around the problem using pseudo-metaclasses, such that the user's metaclasses aren't touched. Post-PEP 3115, however, it won't be an option any more, and you'll at least need a boilerplate decorator for it to work, and it'll silently break without it, giving absolutely no clue as to the problem. From pje at telecommunity.com Wed Jul 18 04:05:25 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 17 Jul 2007 22:05:25 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <469D6EC6.9010005@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> Message-ID: <20070718020310.2168A3A403A@sparrow.telecommunity.com> At 01:37 PM 7/18/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > It allows the framework to bootstrap via successive > > approximation. 
Initially, the 'implies()' function is just a plain > > function, and then it later becomes a generic function. (And of > > course it gets called in between those two points.) The same happens > > for 'disjuncts()' and 'overrides()'. > >But you know from the outset that these functions will >eventually become generic, so why can't they be defined >as some callable object that can have its insides >switched, if you're on a Python whose normal function >objects don't allow that? Well, phrased that way, it sounds like a justification for treating it as a porting strategy for such Pythons. The library could just use a "copy_code(srcfunc, dstfunc)" function that's implemented differently on different Pythons. From martin at v.loewis.de Wed Jul 18 04:29:14 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jul 2007 04:29:14 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707171404l40c28f9cr733930031123537@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707170725q91bfba7p6f549a613c0c300e@mail.gmail.com> <469D29CE.5050600@v.loewis.de> <ca471dc20707171404l40c28f9cr733930031123537@mail.gmail.com> Message-ID: <469D7AFA.5030505@v.loewis.de> > But now I'm confused. I don't see the failure. Are you sure you > checked in what you did? In the py3k-struni branch? Oops, no. The commit was rejected because it was not whitespace-normalized correctly, and I didn't notice. Now I tried again. 
Martin

From unknown_kev_cat at hotmail.com Tue Jul 17 19:16:42 2007
From: unknown_kev_cat at hotmail.com (Joe Smith)
Date: Tue, 17 Jul 2007 13:16:42 -0400
Subject: [Python-3000] Py3k_struni additional test failures under cygwin
Message-ID: <f7ithr$lrr$1@sea.gmane.org>

Building Py3k_struni under Cygwin I've noticed a few more tests failing than the wiki shows. These are using SVN revision 56413.

Some spurious errors seem to occur if Python/ is not renamed temporarily. I have not included those. (This is an oddity of the cygwin '.exe' autohandling combined with case-insensitivity.)

test_coding: Errors. Traceback included at end of message.
"test test_descr failed -- ['foo\u1234bar'] slots not caught"
"test test_largefile failed -- got b'z', but expected 'z'"
test_marshal: Tests that fail are failing with a recursion limit exceeded error.

Tracebacks:

test test_coding failed -- Traceback (most recent call last):
  File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 12, in test_bad_coding2
    self.verify_bad_module(module_name)
  File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 20, in verify_bad_module
    text = fp.read()
  File "/home/Owner/py3k-struni/Lib/io.py", line 1186, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/home/Owner/py3k-struni/Lib/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

Just a heads up.
From martin at v.loewis.de Wed Jul 18 05:36:05 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 18 Jul 2007 05:36:05 +0200 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> Message-ID: <469D8AA5.1080502@v.loewis.de> > When a source file contains a string literal with an out-of-range \U > escape (e.g. "\U12345678"), instead of a syntax error pointing to the > offending literal, I get this, without any indication of the file or > line: > > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in > position 0-9: illegal Unicode character > > This is quite hard to track down. I think the fundamental flaw is that a codec is used to implement the Python syntax (or, rather, lexical rules). Not quite sure what the rationale for this design was; doing it on the lexical level is (was) tricky because \u escapes were allowed only for Unicode literals, and the lexer had no knowledge of the prefix preceding a literal. (In 3k, it's still similar, because \U escapes have no effect in bytes and raw literals). Still, even if it is "only" handled at the parsing level, I don't see why it needs to be a codec. Instead, implementing escapes in the compiler would still allow for proper diagnostics (notice that in the AST the original lexical form of the string literal is gone). > (Both the location of the bad > literal in the source file, and the origin of the error in the parser. > :-) Can someone come up with a fix? The language definition makes it difficult to fix it where I would consider the "proper" place, i.e. in the tokenization: http://docs.python.org/ref/strings.html says that escapeseq is "\" <any ASCII character>. So "\x" is a valid shortstring. Then it becomes fuzzy: It says that any unrecognized escape sequences are left in the string. 
While that appears like a clear specification, it is not implemented (and has not been since Python 2.0). According to the spec, '\U12345678' is well-formed, and denotes the same string as '\\U12345678'.

I now see the following choices:

1. Go back to implementing the spec. Stop complaining about invalid escapes for \x and \U, and just interpret the \ as '\\'. In this case, the current design could be left in place, and the codecs would just stop raising these errors.

2. Change the spec to make it an error if \x is not followed by two hex digits, \u not by four hex digits, \U not by eight, or the value denoted by the \U digits is out of range. In this case, I would propose to move the lexical analysis back into the parser, or just make an internal API that will raise a proper SyntaxError (it will be tricky to compute the column in the original source line, though).

3. Change the spec to constrain escapeseq, giving up the rule that uninterpreted escapes silently become two characters. That's difficult to write down in EBNF, so it should be formulated through constraints in natural language. The lexer would have to keep track of what kind of literal it is processing, and reject invalid escapes directly at the source level.

There are probably other options as well.

Regards,
Martin

From martin at v.loewis.de Wed Jul 18 05:37:56 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 18 Jul 2007 05:37:56 +0200
Subject: [Python-3000] exclusion feature for 2to3?
In-Reply-To: <ca471dc20707160716hcf583c4uca822775f19b9987@mail.gmail.com>
References: <f7e24l$92e$1@sea.gmane.org> <ca471dc20707151922i1894355fh5118d07aa68abb65@mail.gmail.com> <f7fkf5$a0n$1@sea.gmane.org> <ca471dc20707160716hcf583c4uca822775f19b9987@mail.gmail.com>
Message-ID: <469D8B14.4050907@v.loewis.de>

> BTW I hope to see more core developers from Europe at EuroPython next year!
It's always difficult to get there for me, as it takes place during the semester :-( Regards, Martin From kbk at shore.net Wed Jul 18 08:04:13 2007 From: kbk at shore.net (Kurt B.
Kaiser) Date: Wed, 18 Jul 2007 02:04:13 -0400 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> (Guido van Rossum's message of "Sun, 15 Jul 2007 07:17:00 -0700") References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> Message-ID: <87k5sy5j6l.fsf@hydra.bayview.thirdcreek.com> "Guido van Rossum" <guido at python.org> writes: > When a source file contains a string literal with an out-of-range \U > escape (e.g. "\U12345678"), instead of a syntax error pointing to the > offending literal, I get this, without any indication of the file or > line: > > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in > position 0-9: illegal Unicode character > > This is quite hard to track down. (Both the location of the bad > literal in the source file, and the origin of the error in the parser. > :-) Can someone come up with a fix? > > I note that raw escapes show a slightly different error. I also note > that the same issue exists for u"..." literals in Python 2.5. For what it's worth, I posted a patch to ast.c against the 2.6 trunk which massages the unicode exception into a SyntaxError showing the location. That approach lets unicodeobject.c handle the gory details while ast.c handles the SyntaxError generation. It might be a solution until something deeper along the lines of Martin's thoughts is possibly developed. I don't think that any reference adjustments are needed, but someone should check the patch. 
www.python.org/sf/1755885 -- KBK From amauryfa at gmail.com Wed Jul 18 10:20:36 2007 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 18 Jul 2007 10:20:36 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f7ithr$lrr$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> Message-ID: <e27efe130707180120l4bd674bcg8cda3a0c2ec8b5bf@mail.gmail.com> Hello, 2007/7/17, Joe Smith wrote: > Building Py3k_struni under Cygwin I've noticed a few more tests failing than > the wiki shows. > These are using SVN revision 56413. > > Some spurious errors seem to occur if Python/ is not remaned temporally. I > have not included those. (This is an oddity of the cygwin '.exe' > autohandling combined with case-insensitivity) For this, I have added a line to runtests.sh: # Choose the Python binary. case `uname` in Darwin) PYTHON=./python.exe;; CYGWIN*) PYTHON=./python.exe;; *) PYTHON=./python;; esac Hope this helps, -- Amaury Forgeot d'Arc From guido at python.org Wed Jul 18 18:47:13 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 09:47:13 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070718020310.2168A3A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> Message-ID: <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> On 7/17/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 01:37 PM 7/18/2007 +1200, Greg Ewing wrote: > >Phillip J. Eby wrote: > > > It allows the framework to bootstrap via successive > > > approximation. 
Initially, the 'implies()' function is just a plain > > > function, and then it later becomes a generic function. (And of > > > course it gets called in between those two points.) The same happens > > > for 'disjuncts()' and 'overrides()'. > > > >But you know from the outset that these functions will > >eventually become generic, so why can't they be defined > >as some callable object that can have its insides > >switched, if you're on a Python whose normal function > >objects don't allow that? > > Well, phrased that way, it sounds like a justification for treating > it as a porting strategy for such Pythons. The library could just > use a "copy_code(srcfunc, dstfunc)" function that's implemented > differently on different Pythons. Sorry, but I'm still totally uncomfortable with this. While I admit the feature exists, I really, really, really don't want it to be used on a regular basis. As long as Phillip calls a counterproposal "fingernails on a chalkboard", I call this unpythonic. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 18 18:59:46 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 09:59:46 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <e27efe130707180120l4bd674bcg8cda3a0c2ec8b5bf@mail.gmail.com> References: <f7ithr$lrr$1@sea.gmane.org> <e27efe130707180120l4bd674bcg8cda3a0c2ec8b5bf@mail.gmail.com> Message-ID: <ca471dc20707180959n6a8f971dqb92982fe2fdaade5@mail.gmail.com> On 7/18/07, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote: > Hello, > > 2007/7/17, Joe Smith wrote: > > Building Py3k_struni under Cygwin I've noticed a few more tests failing than > > the wiki shows. > > These are using SVN revision 56413. > > > > Some spurious errors seem to occur if Python/ is not remaned temporally. I > > have not included those. 
(This is an oddity of the cygwin '.exe' > > autohandling combined with case-insensitivity) > > For this, I have added a line to runtests.sh: > > # Choose the Python binary. > case `uname` in > Darwin) PYTHON=./python.exe;; > CYGWIN*) PYTHON=./python.exe;; > *) PYTHON=./python;; > esac This is now committed to Subversion: (r56440). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 18 19:02:07 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 10:02:07 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f7ithr$lrr$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> Message-ID: <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> On 7/17/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > Building Py3k_struni under Cygwin I've noticed a few more tests failing than > the wiki shows. > These are using SVN revision 56413. > > Some spurious errors seem to occur if Python/ is not remaned temporally. I > have not included those. (This is an oddity of the cygwin '.exe' > autohandling combined with case-insensitivity) > > > Test_coding: Errors. Traceback included at end of message. > "test test_descr failed -- ['foo\u1234bar'] slots not caught" > "test test_largefile failed -- got b'z', but expected 'z'" > test_marshal: Tests that fail are fasiling with a recursion limit exceeded > error. 
> > > > Tracebacks: > > test test_coding failed -- Traceback (most recent call last): > File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 12, in > test_bad_c > oding2 > self.verify_bad_module(module_name) > File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 20, in > verify_bad > _module > text = fp.read() > File "/home/Owner/py3k-struni/Lib/io.py", line 1186, in read > res += decoder.decode(self.buffer.read(), True) > File "/home/Owner/py3k-struni/Lib/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: > ordinal > not in range(128) The test_descr and test_largefile failures are reproducible on Ubuntu and someone will eventually fix them. I can't reproduce the test_marshal and test_coding failures; please investigate more on CYGWIN. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 18 19:27:01 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 10:27:01 -0700 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <87k5sy5j6l.fsf@hydra.bayview.thirdcreek.com> References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> <87k5sy5j6l.fsf@hydra.bayview.thirdcreek.com> Message-ID: <ca471dc20707181027u182550dyaaf362fc718dd883@mail.gmail.com> On 7/17/07, Kurt B. Kaiser <kbk at shore.net> wrote: > For what it's worth, I posted a patch to ast.c against the 2.6 trunk > which massages the unicode exception into a SyntaxError showing the > location. > > That approach lets unicodeobject.c handle the gory details while ast.c > handles the SyntaxError generation. It might be a solution until > something deeper along the lines of Martin's thoughts is possibly > developed. > > I don't think that any reference adjustments are needed, but someone > should check the patch. > > www.python.org/sf/1755885 Thanks! 
Checked in, and merged into p3yk. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Wed Jul 18 19:27:49 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 18 Jul 2007 13:27:49 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> Message-ID: <20070718172907.861383A40A4@sparrow.telecommunity.com> At 09:47 AM 7/18/2007 -0700, Guido van Rossum wrote: >Sorry, but I'm still totally uncomfortable with this. While I admit >the feature exists, I really, really, really don't want it to be used >on a regular basis. As long as Phillip calls a counterproposal >"fingernails on a chalkboard", I call this unpythonic. I didn't say I wouldn't *do* it, I just explained why I'd have never come up with the idea on my own. I don't have to like something in order to do it, though of course it helps. :) From guido at python.org Wed Jul 18 19:31:53 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 10:31:53 -0700 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <469D8AA5.1080502@v.loewis.de> References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> <469D8AA5.1080502@v.loewis.de> Message-ID: <ca471dc20707181031sa2339a4u4900de65a549c4e2@mail.gmail.com> On 7/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > > When a source file contains a string literal with an out-of-range \U > > escape (e.g.
"\U12345678"), instead of a syntax error pointing to the > > offending literal, I get this, without any indication of the file or > > line: > > > > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in > > position 0-9: illegal Unicode character > > > > This is quite hard to track down. > > I think the fundamental flaw is that a codec is used to implement > the Python syntax (or, rather, lexical rules). > > Not quite sure what the rationale for this design was; doing it on > the lexical level is (was) tricky because \u escapes were allowed > only for Unicode literals, and the lexer had no knowledge of the > prefix preceding a literal. (In 3k, it's still similar, because > \U escapes have no effect in bytes and raw literals). > > Still, even if it is "only" handled at the parsing level, I > don't see why it needs to be a codec. Instead, implementing > escapes in the compiler would still allow for proper diagnostics > (notice that in the AST the original lexical form of the string > literal is gone). I guess because it was deemed useful to have a codec for this purpose too, thereby exposing the algorithm to Python code that needs the same functionality (e.g. the compiler package, RIP). > > (Both the location of the bad > > literal in the source file, and the origin of the error in the parser. > > :-) Can someone come up with a fix? > > The language definition makes it difficult to fix it where I would > consider the "proper" place, i.e. in the tokenization: > > http://docs.python.org/ref/strings.html > > says that escapeseq is "\" <any ASCII character>. So > "\x" is a valid shortstring. > > Then it becomes fuzzy: It says that any unrecognized escape > sequences are left in the string. While that appears like a clear > specification, it is not implemented (and has not since Python > 2.0 anymore). According to the spec, '\U12345678' is well-formed, > and denotes the same string as '\\U12345678'. > > I now see the following choices: > 1. 
Restore implementing the spec again. Stop complaining about > invalid escapes for \x and \U, and just interpret the \ > as '\\'. In this case, the current design could be left in > place, and the codecs would just stop raising these errors. Sounds like a bad idea. I think \xNN (where N is not a hex digit) once behaved this way, and it was changed to explicitly complain instead as a service to users. > 2. Change the spec to make it an error if \x is not followed > by two hex digits, \u not by four hex digits, \U not by > 8, or the value denoted by the \U digits is out of range. > In this case, I would propose to move the lexical analysis > back into the parser, or just make an internal API that > will raise a proper SyntaxError (it will be tricky to > compute the column in the original source line, though). I'm all in favor of this spec change. Eventually we should change the lexer to do this right; for now, Kurt's patch is good enough. > 3. Change the spec to make constrain escapeseq, giving up > the rule that uninterpreted escapes silently become > two characters. That's difficult to write down in EBNF, > so should be formulated through constraints in natural > language. The lexer would have to keep track of what kind > of literal it is processing, and reject invalid escapes > directly on source level. -1 > There are probably other options as well. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Wed Jul 18 19:56:07 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Wed, 18 Jul 2007 13:56:07 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> Message-ID: <f7lk7q$9m6$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707181002w64e076aco9a509ec7e4e15b9a at mail.gmail.com... 
> On 7/17/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
>> Building Py3k_struni under Cygwin I've noticed a few more tests failing
>> than the wiki shows.
>> These are using SVN revision 56413.
>>
>> Some spurious errors seem to occur if Python/ is not renamed temporarily.
>> I have not included those. (This is an oddity of the cygwin '.exe'
>> autohandling combined with case-insensitivity)
>>
>> test_coding: Errors. Traceback included at end of message.
>> "test test_descr failed -- ['foo\u1234bar'] slots not caught"
>> "test test_largefile failed -- got b'z', but expected 'z'"
>> test_marshal: Tests that fail are failing with a recursion limit
>> exceeded error.
>>
>> Tracebacks:
>>
>> test test_coding failed -- Traceback (most recent call last):
>>   File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 12, in test_bad_coding2
>>     self.verify_bad_module(module_name)
>>   File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 20, in verify_bad_module
>>     text = fp.read()
>>   File "/home/Owner/py3k-struni/Lib/io.py", line 1186, in read
>>     res += decoder.decode(self.buffer.read(), True)
>>   File "/home/Owner/py3k-struni/Lib/encodings/ascii.py", line 26, in decode
>>     return codecs.ascii_decode(input, self.errors)[0]
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0:
>> ordinal not in range(128)
>
> The test_descr and test_largefile failures are reproducible on Ubuntu
> and someone will eventually fix them.
>
> I can't reproduce the test_marshal and test_coding failures; please
> investigate more on CYGWIN.

For test_coding, apparently the module's contents are intended to be loaded, after which the test verifies that a syntax error occurs when trying to parse the module. However, on Cygwin I'm consistently getting an error on the line that reads the file, specifically fp.read().

fp.read() appears to be trying to produce a unicode string by interpreting the byte string as ascii.
The byte string is most certainly not valid ascii. So the codec throws an error. I'm guessing for some reason python normally chose a different codec, but on my cygwin compiles it is choosing ascii. I'm not sure why. Nor am I sure how to investigate further.

Here's a fairly useless looking traceback for test_marshal. Many of the tests fail with nearly identical tracebacks:

#======================================================================
#ERROR: test_tuple (test.test_marshal.ContainerTestCase)
#----------------------------------------------------------------------
#Traceback (most recent call last):
#  File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 134, in test_tuple
#    self.helper(tuple(self.d.keys()))
#  File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 21, in helper
#    new = marshal.load(f)
#ValueError: recursion limit exceeded

For what it's worth, here is the full subtest list and status for test_marshal:

#test_bool (test.test_marshal.IntTestCase) ... ERROR
#test_int64 (test.test_marshal.IntTestCase) ... ok
#test_ints (test.test_marshal.IntTestCase) ... ERROR
#test_floats (test.test_marshal.FloatTestCase) ... ERROR
#test_buffer (test.test_marshal.StringTestCase) ... ERROR
#test_string (test.test_marshal.StringTestCase) ... ERROR
#test_unicode (test.test_marshal.StringTestCase) ... ERROR
#test_code (test.test_marshal.CodeTestCase) ... ok
#test_dict (test.test_marshal.ContainerTestCase) ... ERROR
#test_list (test.test_marshal.ContainerTestCase) ... ERROR
#test_sets (test.test_marshal.ContainerTestCase) ... ERROR
#test_tuple (test.test_marshal.ContainerTestCase) ... ERROR
#test_exceptions (test.test_marshal.ExceptionTestCase) ... ok
#test_bug_5888452 (test.test_marshal.BugsTestCase) ... ok
#test_fuzz (test.test_marshal.BugsTestCase) ... ok
#test_loads_recursion (test.test_marshal.BugsTestCase) ... ok
#test_patch_873224 (test.test_marshal.BugsTestCase) ... ok
#test_recursion_limit (test.test_marshal.BugsTestCase) ... ok
#test_version_argument (test.test_marshal.BugsTestCase) ... ok

I'm wondering if the recursion limit on my build is getting set too low somehow.

From guido at python.org Wed Jul 18 20:13:24 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 18 Jul 2007 11:13:24 -0700
Subject: [Python-3000] Py3k_struni additional test failures under cygwin
In-Reply-To: <f7lk7q$9m6$1@sea.gmane.org>
References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org>
Message-ID: <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com>

On 7/18/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote in message
> news:ca471dc20707181002w64e076aco9a509ec7e4e15b9a at mail.gmail.com...
> > On 7/17/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
> >> Building Py3k_struni under Cygwin I've noticed a few more tests failing
> >> than the wiki shows.
> >> These are using SVN revision 56413.
> >>
> >> Some spurious errors seem to occur if Python/ is not renamed temporarily.
> >> I have not included those. (This is an oddity of the cygwin '.exe'
> >> autohandling combined with case-insensitivity)
> >>
> >> test_coding: Errors. Traceback included at end of message.
> >> "test test_descr failed -- ['foo\u1234bar'] slots not caught"
> >> "test test_largefile failed -- got b'z', but expected 'z'"
> >> test_marshal: Tests that fail are failing with a recursion limit
> >> exceeded error.
> >> > >> > >> > >> Tracebacks: > >> > >> test test_coding failed -- Traceback (most recent call last): > >> File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 12, in > >> test_bad_c > >> oding2 > >> self.verify_bad_module(module_name) > >> File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 20, in > >> verify_bad > >> _module > >> text = fp.read() > >> File "/home/Owner/py3k-struni/Lib/io.py", line 1186, in read > >> res += decoder.decode(self.buffer.read(), True) > >> File "/home/Owner/py3k-struni/Lib/encodings/ascii.py", line 26, in > >> decode > >> return codecs.ascii_decode(input, self.errors)[0] > >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: > >> ordinal > >> not in range(128) > > > > The test_descr and test_largefile failures are reproducible on Ubuntu > > and someone will eventually fix them. > > > > I can't reproduce the test_marshal and test_coding failures; please > > investigate more on CYGWIN. > > For the test coding, apprently the module's contents are intended to be > loaded, and then verified that a syntax error occurs when trying to parse > the module. However, on cygwin i'm consistantly getting an error on the line > that reads the file. Specificly fp.read(). > > Fp.read() appears to be trying to export a unicode string by interpreting > the byte string as ascii. The byte string is most certainly not valid ascii. > So the codec throws an error. I'm guessing for some reason python normally > chose a different codec, but on my cygwin compiles it is choosing ascii. I'm > not sure why. Nor am I sure how to inestigate further. The encoding defaults to the filesystem encoding or otherwise Latin-1. There's an XXX comment in io.py, in TextIOWrapper.__init__, admitting this is questionable. I'm guessing CYGWIN has a filesystem encoding equal to ASCII? Is this a good idea? Maybe the default encoding should always be UTF-8 (matching the source code default encoding). 
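
The failure in the traceback can be reproduced in isolation. The following is only a sketch for a current Python 3, not the py3k-struni branch of the time; the sample bytes (a UTF-8 byte order mark followed by a comment) are an assumption chosen because they begin with 0xef, and are not taken from the test file itself.

```python
import os
import tempfile

# Hypothetical sample data: starts with byte 0xef, as in the traceback.
data = b"\xef\xbb\xbf# just a comment\n"

# Decoding as ASCII fails on the very first byte.
ascii_error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the encoding in open() avoids depending on the platform default.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as fp:
        fp.write(data)
    with open(path, encoding="utf-8") as fp:
        text = fp.read()
finally:
    os.remove(path)

print(ascii_error)
print(repr(text))
```

With an explicit encoding the read no longer depends on whatever default the platform reports, which is the point of both options being weighed here.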
I can also fix it by changing test_coding.py to add encoding="utf-8" to the open() call in verify_bad_module(). > Heres a fairly useless loking traceback for test_marshal. Many of the tests > fail with nearly identical tracebacks: > > #====================================================================== > #ERROR: test_tuple (test.test_marshal.ContainerTestCase) > #---------------------------------------------------------------------- > #Traceback (most recent call last): > # File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 134, in > test_tuple > # self.helper(tuple(self.d.keys())) > # File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 21, in > helper > # new = marshal.load(f) > #ValueError: recursion limit exceeded > > For what it's worth here is the fll subtest list and status for > test_marshal: > > #test_bool (test.test_marshal.IntTestCase) ... ERROR > #test_int64 (test.test_marshal.IntTestCase) ... ok > #test_ints (test.test_marshal.IntTestCase) ... ERROR > #test_floats (test.test_marshal.FloatTestCase) ... ERROR > #test_buffer (test.test_marshal.StringTestCase) ... ERROR > #test_string (test.test_marshal.StringTestCase) ... ERROR > #test_unicode (test.test_marshal.StringTestCase) ... ERROR > #test_code (test.test_marshal.CodeTestCase) ... ok > #test_dict (test.test_marshal.ContainerTestCase) ... ERROR > #test_list (test.test_marshal.ContainerTestCase) ... ERROR > #test_sets (test.test_marshal.ContainerTestCase) ... ERROR > #test_tuple (test.test_marshal.ContainerTestCase) ... ERROR > #test_exceptions (test.test_marshal.ExceptionTestCase) ... ok > #test_bug_5888452 (test.test_marshal.BugsTestCase) ... ok > #test_fuzz (test.test_marshal.BugsTestCase) ... ok > #test_loads_recursion (test.test_marshal.BugsTestCase) ... ok > #test_patch_873224 (test.test_marshal.BugsTestCase) ... ok > #test_recursion_limit (test.test_marshal.BugsTestCase) ... ok > #test_version_argument (test.test_marshal.BugsTestCase) ... 
ok > > I'm wondering if the recusion limit on my build is getting set too low > somehow. Can you find out what it is? sys.getrecursionlimit(). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Wed Jul 18 20:27:40 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 18 Jul 2007 14:27:40 -0400 Subject: [Python-3000] Introspection broken for objects using Py_FindMethod() In-Reply-To: <ca471dc20707171652w254d597bl9068abae61b64da4@mail.gmail.com> References: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com> <ca471dc20707171652w254d597bl9068abae61b64da4@mail.gmail.com> Message-ID: <acd65fa20707181127w3507c064rc02e6241c24d86f2@mail.gmail.com> On 7/17/07, Guido van Rossum <guido at python.org> wrote: > Yes, see a thread between me, Georg and Brett around March 7-10: > > http://mail.python.org/pipermail/python-3000/2007-March/006061.html > Thanks for the pointer. > I think the conclusion was to get rid of Py_FindMethod altogether. The > replacement isn't very hard. But it hasn't been done yet. Do you need you some help for that? Perhaps, I could try to write a patch to replace the trivial use cases of Py_FindMethod in the stdlib. Also, I think it would be a good idea to document the change, too. -- Alexandre From unknown_kev_cat at hotmail.com Wed Jul 18 20:50:14 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Wed, 18 Jul 2007 14:50:14 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org><ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com><f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> Message-ID: <f7lnd8$l2s$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707181113m360db736h2fd079f29f71220 at mail.gmail.com... 
> On 7/18/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >> >> "Guido van Rossum" <guido at python.org> wrote in message >> news:ca471dc20707181002w64e076aco9a509ec7e4e15b9a at mail.gmail.com... >> > On 7/17/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >> >> Building Py3k_struni under Cygwin I've noticed a few more tests >> >> failing >> >> than >> >> the wiki shows. >> >> These are using SVN revision 56413. >> >> >> >> Some spurious errors seem to occur if Python/ is not remaned >> >> temporally. >> >> I >> >> have not included those. (This is an oddity of the cygwin '.exe' >> >> autohandling combined with case-insensitivity) >> >> >> >> >> >> Test_coding: Errors. Traceback included at end of message. >> >> "test test_descr failed -- ['foo\u1234bar'] slots not caught" >> >> "test test_largefile failed -- got b'z', but expected 'z'" >> >> test_marshal: Tests that fail are fasiling with a recursion limit >> >> exceeded >> >> error. >> >> >> >> >> >> >> >> Tracebacks: >> >> >> >> test test_coding failed -- Traceback (most recent call last): >> >> File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 12, in >> >> test_bad_c >> >> oding2 >> >> self.verify_bad_module(module_name) >> >> File "/home/Owner/py3k-struni/Lib/test/test_coding.py", line 20, in >> >> verify_bad >> >> _module >> >> text = fp.read() >> >> File "/home/Owner/py3k-struni/Lib/io.py", line 1186, in read >> >> res += decoder.decode(self.buffer.read(), True) >> >> File "/home/Owner/py3k-struni/Lib/encodings/ascii.py", line 26, in >> >> decode >> >> return codecs.ascii_decode(input, self.errors)[0] >> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position >> >> 0: >> >> ordinal >> >> not in range(128) >> > >> > The test_descr and test_largefile failures are reproducible on Ubuntu >> > and someone will eventually fix them. >> > >> > I can't reproduce the test_marshal and test_coding failures; please >> > investigate more on CYGWIN. 
>>
>> For the test coding, apparently the module's contents are intended to be
>> loaded, and then verified that a syntax error occurs when trying to parse
>> the module. However, on cygwin I'm consistently getting an error on the
>> line
>> that reads the file. Specifically fp.read().
>>
>> fp.read() appears to be trying to export a unicode string by interpreting
>> the byte string as ascii. The byte string is most certainly not valid
>> ascii.
>> So the codec throws an error. I'm guessing for some reason python
>> normally
>> chose a different codec, but on my cygwin compiles it is choosing ascii.
>> I'm
>> not sure why. Nor am I sure how to investigate further.
>
> The encoding defaults to the filesystem encoding or otherwise Latin-1.
> There's an XXX comment in io.py, in TextIOWrapper.__init__, admitting
> this is questionable. I'm guessing CYGWIN has a filesystem encoding
> equal to ASCII? Is this a good idea?

Quite possibly. I know they have wanted to move to using the Unicode APIs
to support everything, but that is a pain because of the method that
Windows uses internally to support Unicode.

> Maybe the default encoding should always be UTF-8 (matching the source
> code default encoding).
>
> I can also fix it by changing test_coding.py to add encoding="utf-8"
> to the open() call in verify_bad_module().
>
>> Here's a fairly useless looking traceback for test_marshal.
Many of the >> tests >> fail with nearly identical tracebacks: >> >> #====================================================================== >> #ERROR: test_tuple (test.test_marshal.ContainerTestCase) >> #---------------------------------------------------------------------- >> #Traceback (most recent call last): >> # File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 134, in >> test_tuple >> # self.helper(tuple(self.d.keys())) >> # File "/home/Owner/py3k-struni/Lib/test/test_marshal.py", line 21, in >> helper >> # new = marshal.load(f) >> #ValueError: recursion limit exceeded >> >> For what it's worth here is the fll subtest list and status for >> test_marshal: >> >> #test_bool (test.test_marshal.IntTestCase) ... ERROR >> #test_int64 (test.test_marshal.IntTestCase) ... ok >> #test_ints (test.test_marshal.IntTestCase) ... ERROR >> #test_floats (test.test_marshal.FloatTestCase) ... ERROR >> #test_buffer (test.test_marshal.StringTestCase) ... ERROR >> #test_string (test.test_marshal.StringTestCase) ... ERROR >> #test_unicode (test.test_marshal.StringTestCase) ... ERROR >> #test_code (test.test_marshal.CodeTestCase) ... ok >> #test_dict (test.test_marshal.ContainerTestCase) ... ERROR >> #test_list (test.test_marshal.ContainerTestCase) ... ERROR >> #test_sets (test.test_marshal.ContainerTestCase) ... ERROR >> #test_tuple (test.test_marshal.ContainerTestCase) ... ERROR >> #test_exceptions (test.test_marshal.ExceptionTestCase) ... ok >> #test_bug_5888452 (test.test_marshal.BugsTestCase) ... ok >> #test_fuzz (test.test_marshal.BugsTestCase) ... ok >> #test_loads_recursion (test.test_marshal.BugsTestCase) ... ok >> #test_patch_873224 (test.test_marshal.BugsTestCase) ... ok >> #test_recursion_limit (test.test_marshal.BugsTestCase) ... ok >> #test_version_argument (test.test_marshal.BugsTestCase) ... ok >> >> I'm wondering if the recusion limit on my build is getting set too low >> somehow. > > Can you find out what it is? sys.getrecursionlimit(). Hmm... 
It is a limit of 1000. That is probably large enough, no? Anyway, from some basic testing it looks like marshal is always throwing that error when marshal.load() is called. However, marshal.loads() works fine. Might this be another encoding related error? From guido at python.org Wed Jul 18 20:56:07 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 11:56:07 -0700 Subject: [Python-3000] Introspection broken for objects using Py_FindMethod() In-Reply-To: <acd65fa20707181127w3507c064rc02e6241c24d86f2@mail.gmail.com> References: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com> <ca471dc20707171652w254d597bl9068abae61b64da4@mail.gmail.com> <acd65fa20707181127w3507c064rc02e6241c24d86f2@mail.gmail.com> Message-ID: <ca471dc20707181156h5c8b874coefe02a58307d9d7c@mail.gmail.com> On 7/18/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > On 7/17/07, Guido van Rossum <guido at python.org> wrote: > > Yes, see a thread between me, Georg and Brett around March 7-10: > > > > http://mail.python.org/pipermail/python-3000/2007-March/006061.html > > > > Thanks for the pointer. > > > I think the conclusion was to get rid of Py_FindMethod altogether. The > > replacement isn't very hard. But it hasn't been done yet. > > Do you need you some help for that? Perhaps, I could try to write a > patch to replace the trivial use cases of Py_FindMethod in the stdlib. > Also, I think it would be a good idea to document the change, too. That would be great! The Python 3000 project can use all the help it can get! Please use the py3k-struni branch. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Jul 18 20:58:17 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 11:58:17 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f7lnd8$l2s$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> Message-ID: <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> On 7/18/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > >> I'm wondering if the recusion limit on my build is getting set too low > >> somehow. > > > > Can you find out what it is? sys.getrecursionlimit(). > > Hmm... It is a limit of 1000. > That is probably large enough, no? Yes, that's what it is for me. > Anyway, from some basic testing it looks like marshal is always throwing > that error when marshal.load() is called. > However, marshal.loads() works fine. > > Might this be another encoding related error? Perhaps. Or something else. Do try to investigate. 
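
One way to start narrowing this down is to round-trip the same value through both code paths and compare. This is a minimal sketch for a current Python 3 rather than the py3k-struni branch (an assumption), and the sample value is arbitrary.

```python
import marshal
import tempfile

# Arbitrary sample value, chosen only for illustration.
value = ("a", 1, 2.5, [True, None])

# In-memory round trip: dumps() then loads().
via_loads = marshal.loads(marshal.dumps(value))

# File round trip: dump() then load(). The file must be opened in binary mode.
with tempfile.TemporaryFile() as f:
    marshal.dump(value, f)
    f.seek(0)
    via_load = marshal.load(f)

print(via_loads == value, via_load == value)
```

If only the file path fails, the way the file object hands data to marshal is a more likely suspect than the marshal format itself.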
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Wed Jul 18 22:32:57 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 18 Jul 2007 16:32:57 -0400 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com> <acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> Message-ID: <acd65fa20707181332n480bf6fsa7bff17403770786@mail.gmail.com> So, any decision on the proposed semantic change of truncate? -- Alexandre On 7/3/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > On 7/2/07, Guido van Rossum <guido at python.org> wrote: > > Honestly, I think truncate() should always set the current position to > > the new size, even though that's not what it currently does. > > Thought about that and I think that would be the best thing to do. > That would avoid making StringIO unnecessary different from BytesIO. > And IMHO, it is less prone to bugs. If someone wants to truncate while > keeping the current position, then he will have to state is intention > explicitly by saving the value of tell() and calling seek() after > truncating. > > I also find the semantic make more sense too. For example: > > >>> s = StringIO("Good bye, world") > >>> s.truncate(10) > >>> s.write("cruel world") > >>> s.getvalue() > ??? > > I think that should return "Good bye, cruel world", not "cruel world". 
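
The two outcomes under discussion can be compared side by side. This sketch uses the io.StringIO of a current Python 3 (an assumption; the io.py being discussed may have behaved differently), where truncate() leaves the stream position unchanged.

```python
import io

# Without an explicit seek: the position stays at 0 after truncate(),
# so the write overwrites the text from the start.
s1 = io.StringIO("Good bye, world")
s1.truncate(10)
s1.write("cruel world")
without_seek = s1.getvalue()

# With an explicit seek to the new end, which is what the proposed
# semantics would do implicitly.
s2 = io.StringIO("Good bye, world")
s2.truncate(10)
position_after_truncate = s2.tell()
s2.seek(0, io.SEEK_END)
s2.write("cruel world")
with_seek = s2.getvalue()

print(repr(without_seek))
print(repr(with_seek))
```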
> > So, does anyone else agree with this small semantic change of truncate()? > From guido at python.org Wed Jul 18 22:36:26 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 13:36:26 -0700 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <acd65fa20707181332n480bf6fsa7bff17403770786@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <ca471dc20706231052x561e7acfpf84373ea670c2974@mail.gmail.com> <acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> <acd65fa20707181332n480bf6fsa7bff17403770786@mail.gmail.com> Message-ID: <ca471dc20707181336n294fc353vc4eefc82854a8759@mail.gmail.com> Unless anyone cares, it should imply a seek to the indicated position if an argument was present. On 7/18/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > So, any decision on the proposed semantic change of truncate? > > -- Alexandre > > On 7/3/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > > On 7/2/07, Guido van Rossum <guido at python.org> wrote: > > > Honestly, I think truncate() should always set the current position to > > > the new size, even though that's not what it currently does. > > > > Thought about that and I think that would be the best thing to do. > > That would avoid making StringIO unnecessary different from BytesIO. > > And IMHO, it is less prone to bugs. If someone wants to truncate while > > keeping the current position, then he will have to state is intention > > explicitly by saving the value of tell() and calling seek() after > > truncating. 
> > > > I also find the semantic make more sense too. For example: > > > > >>> s = StringIO("Good bye, world") > > >>> s.truncate(10) > > >>> s.write("cruel world") > > >>> s.getvalue() > > ??? > > > > I think that should return "Good bye, cruel world", not "cruel world". > > > > So, does anyone else agree with this small semantic change of truncate()? > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From kbk at shore.net Wed Jul 18 23:34:05 2007 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed, 18 Jul 2007 17:34:05 -0400 Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error In-Reply-To: <ca471dc20707181027u182550dyaaf362fc718dd883@mail.gmail.com> (Guido van Rossum's message of "Wed, 18 Jul 2007 10:27:01 -0700") References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com> <87k5sy5j6l.fsf@hydra.bayview.thirdcreek.com> <ca471dc20707181027u182550dyaaf362fc718dd883@mail.gmail.com> Message-ID: <87d4yp5rci.fsf@hydra.bayview.thirdcreek.com> "Guido van Rossum" <guido at python.org> writes: >> www.python.org/sf/1755885 > > Thanks! Checked in, and merged into p3yk. Thanks! Unfortunately, I see there's an error from test_unicode.py, which I neglected to re-run. My apologies! I've checked in a fix on the trunk and the buildbots are relatively happy once more, it seems. Should be caught in the next merge. 
--
KBK

From guido at python.org  Wed Jul 18 23:42:37 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 18 Jul 2007 14:42:37 -0700
Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error
In-Reply-To: <87d4yp5rci.fsf@hydra.bayview.thirdcreek.com>
References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com>
 <87k5sy5j6l.fsf@hydra.bayview.thirdcreek.com>
 <ca471dc20707181027u182550dyaaf362fc718dd883@mail.gmail.com>
 <87d4yp5rci.fsf@hydra.bayview.thirdcreek.com>
Message-ID: <ca471dc20707181442w6d82d090rd2b8341f4ee097ee@mail.gmail.com>

On 7/18/07, Kurt B. Kaiser <kbk at shore.net> wrote:
> "Guido van Rossum" <guido at python.org> writes:
>
> >> www.python.org/sf/1755885
> >
> > Thanks! Checked in, and merged into p3yk.
>
> Thanks!
>
> Unfortunately, I see there's an error from test_unicode.py, which I
> neglected to re-run. My apologies!
>
> I've checked in a fix on the trunk and the buildbots are relatively
> happy once more, it seems.
>
> Should be caught in the next merge.

Ah, I see. I fixed it separately in the py3k-struni branch. I'll try
to remember the next time I merge.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Wed Jul 18 23:42:56 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 18 Jul 2007 23:42:56 +0200
Subject: [Python-3000] Invalid \U escape in source code give hard-to-trace error
In-Reply-To: <ca471dc20707181031sa2339a4u4900de65a549c4e2@mail.gmail.com>
References: <ca471dc20707150717m7344c9cfh3237b78e9dcf681f@mail.gmail.com>
 <469D8AA5.1080502@v.loewis.de>
 <ca471dc20707181031sa2339a4u4900de65a549c4e2@mail.gmail.com>
Message-ID: <f7m1gn$odp$1@sea.gmane.org>

Guido van Rossum wrote:
> On 7/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> > When a source file contains a string literal with an out-of-range \U
>> > escape (e.g.
"\U12345678"), instead of a syntax error pointing to the >> > offending literal, I get this, without any indication of the file or >> > line: >> > >> > UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in >> > position 0-9: illegal Unicode character >> > >> > This is quite hard to track down. >> >> I think the fundamental flaw is that a codec is used to implement >> the Python syntax (or, rather, lexical rules). >> >> Not quite sure what the rationale for this design was; doing it on >> the lexical level is (was) tricky because \u escapes were allowed >> only for Unicode literals, and the lexer had no knowledge of the >> prefix preceding a literal. (In 3k, it's still similar, because >> \U escapes have no effect in bytes and raw literals). >> >> Still, even if it is "only" handled at the parsing level, I >> don't see why it needs to be a codec. Instead, implementing >> escapes in the compiler would still allow for proper diagnostics >> (notice that in the AST the original lexical form of the string >> literal is gone). > > I guess because it was deemed useful to have a codec for this purpose > too, thereby exposing the algorithm to Python code that needs the same > functionality (e.g. the compiler package, RIP). And it still is useful. If you want to convert a string into a printable representation, you can use repr(), but for the inverse you need this codec. (or eval()...) Georg From alexandre at peadrop.com Wed Jul 18 23:43:54 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 18 Jul 2007 17:43:54 -0400 Subject: [Python-3000] exclusion feature for 2to3? In-Reply-To: <f7e24l$92e$1@sea.gmane.org> References: <f7e24l$92e$1@sea.gmane.org> Message-ID: <acd65fa20707181443y121705cevb0d36a3f816ef6d@mail.gmail.com> On 7/15/07, Georg Brandl <g.brandl at gmx.net> wrote: > Most obvious would be a special comment, something like > > for x in curiousobject.iteritems(): # 2to3:keep > foo(x) > > Does that make sense? 

It would be a good idea to define a convention for these special
comments. For example, we could define something similar to C's pragma:

    #pragma <feature> <option> ...

or perhaps,

    #: <feature> <option> ...

So, your example would become:

    for x in curiousobject.iteritems():  #pragma 2to3 keep
        foo(x)

I expect other tools, like pdb.py and trace.py, could follow this
convention as well. For example:

    def buggy_func():  #pragma pdb break
        pass

    if debug:  #pragma trace ignore
        pass

The motivation for such a convention is to make it easy for
programmers to identify comments that are in fact control lines.

-- Alexandre

From g.brandl at gmx.net  Wed Jul 18 23:44:11 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 18 Jul 2007 23:44:11 +0200
Subject: [Python-3000] Introspection broken for objects using Py_FindMethod()
In-Reply-To: <acd65fa20707181127w3507c064rc02e6241c24d86f2@mail.gmail.com>
References: <acd65fa20707171627t29f9fc03p37164f3d87d94a25@mail.gmail.com>
 <ca471dc20707171652w254d597bl9068abae61b64da4@mail.gmail.com>
 <acd65fa20707181127w3507c064rc02e6241c24d86f2@mail.gmail.com>
Message-ID: <f7m1j2$odp$2@sea.gmane.org>

Alexandre Vassalotti wrote:
> On 7/17/07, Guido van Rossum <guido at python.org> wrote:
>> Yes, see a thread between me, Georg and Brett around March 7-10:
>>
>> http://mail.python.org/pipermail/python-3000/2007-March/006061.html
>>
>
> Thanks for the pointer.
>
>> I think the conclusion was to get rid of Py_FindMethod altogether. The
>> replacement isn't very hard. But it hasn't been done yet.
>
> Do you need some help for that? Perhaps, I could try to write a
> patch to replace the trivial use cases of Py_FindMethod in the stdlib.
> Also, I think it would be a good idea to document the change, too.

I once started a patch for that, but deferred it IIRC in pyexpat or
elementtree. I'll look if I still have it lying around somewhere.

Georg From benji at benjiyork.com Wed Jul 18 23:59:19 2007 From: benji at benjiyork.com (Benji York) Date: Wed, 18 Jul 2007 17:59:19 -0400 Subject: [Python-3000] exclusion feature for 2to3? In-Reply-To: <acd65fa20707181443y121705cevb0d36a3f816ef6d@mail.gmail.com> References: <f7e24l$92e$1@sea.gmane.org> <acd65fa20707181443y121705cevb0d36a3f816ef6d@mail.gmail.com> Message-ID: <469E8D37.9050006@benjiyork.com> Alexandre Vassalotti wrote: > I expect other tools, like pdb.py and trace.py could follow this > convention as well. For example: I used the time machine to convince the author of trace.py use this convention. He didn't like your spelling, but eventually agreed to #pragma NO COVER. -- Benji York http://benjiyork.com From guido at python.org Thu Jul 19 01:11:56 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 16:11:56 -0700 Subject: [Python-3000] Announcing PEP 3136 In-Reply-To: <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> References: <20070630205444.GD22221@theory.org> <ca471dc20707030114m7fa2d74btb21c8bd1ae8023db@mail.gmail.com> Message-ID: <ca471dc20707181611t1fad2e32qaf15d4407a580915@mail.gmail.com> (FWIW, I've formally rejected the PEP now, referring to this message.) --Guido On 7/3/07, Guido van Rossum <guido at python.org> wrote: > On 6/30/07, Matt Chisholm <matt-python at theory.org> wrote: > > I've created and submitted a new PEP proposing support for labels in > > Python's break and continue statements. Georg Brandl has graciously > > added it to the PEP list as PEP 3136: > > > > http://www.python.org/dev/peps/pep-3136/ > > I think this is a good summary of various proposals that have been > floated in the past, plus some new ones. As a PEP, it falls short > because it doesn't pick a solution but merely offers a large menu of > possible options. Also, there is nothing about implementation yet. > > However, I'm rejecting it on the basis that code so complicated to > require this feature is very rare. 
In most cases there are existing > work-arounds that produce clean code, for example using 'return'. > While I'm sure there are some (rare) real cases where clarity of the > code would suffer from a refactoring that makes it possible to use > return, this is offset by two issues: > > 1. The complexity added to the language, permanently. This affects not > only all Python implementations, but also every source analysis tool, > plus of course all documentation for the language. > > 2. My expectation that the feature will be abused more than it will be > used right, leading to a net decrease in code clarity (measured across > all Python code written henceforth). Lazy programmers are everywhere, > and before you know it you have an incredible mess on your hands of > unintelligible code. > > I realize this is a heavy bar to pass, and somewhat subjective. That's > okay. There is real value in having a small language. Also, as I said, > while there are no past PEPs to document it, this has been brought up > and rejected many times before. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu Jul 19 01:51:09 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 19 Jul 2007 11:51:09 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> Message-ID: <469EA76D.7000204@canterbury.ac.nz> Guido van Rossum wrote: > Sorry, but I'm still totally uncomfortable with this. While I admit > the feature exists, I really, really, really don't want it to be used > on a regular basis. As long as the objects defined by a regular def statement aren't modifiable, it seems like it won't be possible to support retroactive generification of functions that haven't initially been defined as generic somehow. So effectively you're saying that you're against this, or willing to forego it? Not arguing one way or the other, just seeking to clarify your position. -- Greg From guido at python.org Thu Jul 19 01:57:01 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 16:57:01 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469C4B0B.50605@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> Message-ID: <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> On 7/16/07, "Martin v. 
Löwis" <martin at v.loewis.de> wrote:
> Guido van Rossum wrote:
> > That sounds like a good idea to try. It may break some more tests but
> > those are all indications of places that incorrectly still require
> > str8.
> >
> >> I wonder whether the "s" specifier in CallFunction, BuildValue etc
> >> should create Unicode objects, rather than str8 objects.
>
> Done. I fixed a number of test cases that broke because of that.
> In particular, bytes.__reduce__ could not easily return str8 objects
> as its marshalling state anymore (and shouldn't do so, anyway).
> So I made bytes a builtin type of pickle, using the S code.
> As a consequence, a number of other types had to get fixed.
>
> So in total, it adds one new failure: something in test_pickle
> now complains that bytes objects are not hashable.

Now that this is checked in, I understand the problem. You are using
the same opcodes for pickling bytes and str8 -- save_bytes() is a
clone of save_string() (the latter is the callback for str8, not for
str). But you made load_string() always return bytes. The broken tests
fail because they use hardcoded pickles which use the STRING opcode to
save a str8 which is used as a dict key.

You broke backwards compatibility this way; I think that a pickle
produced by Python 2.x should be readable by Python 3.0.

Now, one could argue about whether an 8-bit string pickled in 2.x
should be returned as a Unicode string in 3.0 or as a bytes array.
There is even an argument to be made that it should be a bytes array,
since an 8-bit string in 2.x is just as likely to represent binary
data as text data, and even if it's text, we don't know the encoding.

But I think that there is a counter-argument that's stronger: the dict
{'a': 42} pickled in 2.x must unpickle as a dict with an immutable
object as key.
So we should either unpickle 'a' as a (unicode) str with value 'a', or as (8-bit) str8, as long as the latter type exists (I haven't decided whether to keep str8 or something like it, or whether to try to get rid of it completely). One possibility might be to first try to decode the STRING argument as utf-8, and if that fails to convert it to str8 instead. What do you think? I don't understand all of the changes you made in r56438, perhaps you can save most of them. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jul 19 01:59:52 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 16:59:52 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <469EA76D.7000204@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> <469EA76D.7000204@canterbury.ac.nz> Message-ID: <ca471dc20707181659n740ba5a0va8342f833094a855@mail.gmail.com> On 7/18/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > Guido van Rossum wrote: > > Sorry, but I'm still totally uncomfortable with this. While I admit > > the feature exists, I really, really, really don't want it to be used > > on a regular basis. > > As long as the objects defined by a regular def statement > aren't modifiable, it seems like it won't be possible > to support retroactive generification of functions that > haven't initially been defined as generic somehow. > > So effectively you're saying that you're against this, > or willing to forego it? Not arguing one way or the > other, just seeking to clarify your position. 
The only approach to retroactive generification that I approve of is replacing the entire object with a wrapper of sorts, e.g. foo = generify(foo) or (more likely) import bar bar.foo = generify(bar.foo) I know this has a downside when someone else did "from bar import foo" before the generification was applied; that is a general problem with "from foo import bar" and should be addressed by not using that style in cases where this matters. (It is fine for importing a submodule from a package of course.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Jul 19 02:15:30 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 19 Jul 2007 02:15:30 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> Message-ID: <469EAD22.1040603@v.loewis.de> > You broke backwards compatibility this way; I think that a pickle > produced by Python 2.x should be readable by Python 3.0. It is, is it not? > (I haven't decided whether to keep str8 or something like it, or > whether to try to get rid of it completely). I assumed the latter - and if it indeed goes away, it's certainly a bug to ever return str8 from pickle, right? > One possibility might be to first try to decode the STRING argument as > utf-8, and if that fails to convert it to str8 instead. What do you > think? I don't understand all of the changes you made in r56438, > perhaps you can save most of them. The question really is what bytes should be pickled as; that needs to be decided before fixing the code. Should it be built-in (and if so, using what code)? 
If not, it probably needs to go through __reduce__, and if so, what should __reduce__ return for bytes object? __reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size). Now, s# creates a Unicode object, and the pickling fails to round-trip correctly. If __reduce__ returns a Unicode object, what encoding should be assumed? (which then needs to be symmetric with bytes()) If __reduce__ returns a str8 object, you will have to keep str8 (or else you cannot pickle bytes). Regards, Martin From guido at python.org Thu Jul 19 05:01:18 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 18 Jul 2007 20:01:18 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469EAD22.1040603@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> Message-ID: <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> On 7/18/07, "Martin v. L?wis" <martin at v.loewis.de> wrote: > > You broke backwards compatibility this way; I think that a pickle > > produced by Python 2.x should be readable by Python 3.0. > > It is, is it not? No; {'a': 1} pickled on 2.x results in an error complaining about an unhashable object when the pickle is read in 3.0; this is the error you saw in test_pickle.py. > > (I haven't decided whether to keep str8 or something like it, or > > whether to try to get rid of it completely). > > I assumed the latter - and if it indeed goes away, it's certainly > a bug to ever return str8 from pickle, right? If indeed it goes away, it can't be returned. If it's still around, we can argue about the desirability of returning one. > > One possibility might be to first try to decode the STRING argument as > > utf-8, and if that fails to convert it to str8 instead. What do you > > think? 
I don't understand all of the changes you made in r56438, > > perhaps you can save most of them. > > The question really is what bytes should be pickled as; that needs to > be decided before fixing the code. Should it be built-in (and if so, > using what code)? If not, it probably needs to go through __reduce__, > and if so, what should __reduce__ return for bytes object? Either a new opcode (which would such a pickle fail hard when unpickled with 2.5, but that's probably fine as it would fail anyway), or some variation of what I coded before, using __reduce__. > __reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size). > Now, s# creates a Unicode object, and the pickling fails to round-trip > correctly. I thought that before your patch a bytes object roundtripped correctly with all three protocols. Or maybe it got broken when s# was changed? An additional requirement might be that if bytes are introduced in 2.6, a pickle containing bytes written by 3.0 should be readable by 2.6. Ideally, pickles not containing bytes written in 3.0 should always be readable in 2.6 (assuming the user-defined types it references exist). > If __reduce__ returns a Unicode object, what encoding should be assumed? > (which then needs to be symmetric with bytes()) > > If __reduce__ returns a str8 object, you will have to keep str8 (or > else you cannot pickle bytes). When __reduce__ returns a string at all, that means it's the name of a global. I guess that should be encoded using UTF-8, so that as long as the name is ASCII, 2.x can unpickle it. But I'm not sure if that's what you were asking. Anyway, one reason this is such a mess is clearly that the pickle protocol has no independent spec -- it's grown organically in code. Reverse-engineering the intent of the code is a pain. 
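The UTF-8-first fallback floated earlier in this thread (decode the STRING payload as UTF-8, and fall back to an 8-bit type if that fails) can be sketched roughly as below. This is only an illustration, not the code from r56438: the helper name decode_legacy_string is hypothetical, and an immutable bytes object stands in for str8, which is an assumption made so the sketch runs on a current Python 3.

```python
def decode_legacy_string(payload: bytes):
    """Hypothetical helper: turn a 2.x STRING payload into a hashable value.

    Text that is valid UTF-8 becomes str; anything else is kept as raw
    bytes (standing in for str8 here, which is an assumption).
    """
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        return bytes(payload)


# Either result is immutable, so it can still serve as a dict key,
# which is the requirement raised for {'a': 42} pickled on 2.x.
legacy = {decode_legacy_string(b"a"): 42}
binary = {decode_legacy_string(b"\xff"): 1}
```

The drawback of this design is that the resulting type depends on the data: two keys written by the same 2.x program may come back as different types.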
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Jul 19 09:06:58 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 19 Jul 2007 09:06:58 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> Message-ID: <469F0D92.7080301@v.loewis.de> >> __reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size). >> Now, s# creates a Unicode object, and the pickling fails to round-trip >> correctly. > > I thought that before your patch a bytes object roundtripped correctly > with all three protocols. Or maybe it got broken when s# was changed? It did, and it got. s# used to return a str8, which then was pickled byte-for-byte. When s# started to return Unicode strings, bytes above 128 got widened to Py_UNICODE (which is what currently PyUnicode_FromString does), so b'\xFF' became bytes('\uFFFF'). That got pickled and unpickled; then bytes('\uFFFF') is b'\xef\xbf\xbf' (because it applies the default encoding to the unicode argument), and it failed to roundtrip to b'\xFF'. It's actually not possible to generate b'\xFF' using a unicode string argument, as string the default encoding will never return s'\xFF' (as that's not valid UTF-8). > An additional requirement might be that if bytes are introduced in > 2.6, a pickle containing bytes written by 3.0 should be readable by > 2.6. Sure: whatever we decide now needs to be applied to 2.6 also. >> If __reduce__ returns a Unicode object, what encoding should be assumed? 
>> (which then needs to be symmetric with bytes())
>>
>> If __reduce__ returns a str8 object, you will have to keep str8 (or
>> else you cannot pickle bytes).
>
> When __reduce__ returns a string at all, that means it's the name of a
> global. I guess that should be encoded using UTF-8, so that as long as
> the name is ASCII, 2.x can unpickle it. But I'm not sure if that's
> what you were asking.

No.

py> b'foo'.__reduce__()
(<type 'bytes'>, ('foo',))
py> b'\xff'.__reduce__()
(<type 'bytes'>, ('\uffff',))

It returns one string each time, as the only element of a one-element tuple (that is then passed to the bytes() constructor on unpickling).

> Anyway, one reason this is such a mess is clearly that the pickle
> protocol has no independent spec -- it's grown organically in code.
> Reverse-engineering the intent of the code is a pain.

That's also true, but I don't see it as much of a problem here. If it had a spec, that spec would have said that b'S', b'T' and b'U' have a str payload. That spec would break if str8 goes away, and the spec would be changed to explain how these codes act in 2.x and 3.x. It would not talk at all about the bytes type, or about the fact that its __reduce__ might return different things in 2.x and 3.x (unless bytes gets a primitive code for pickle).
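The round-trip failure described earlier in this message can be checked on a current Python 3 with a few lines. This is a sketch only: the helper roundtrips_via is hypothetical, and the Latin-1 comparison is added purely for illustration, not as something proposed in the thread.

```python
def roundtrips_via(encoding: str, data: bytes) -> bool:
    """Return True if data survives bytes -> str -> bytes unchanged."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeDecodeError:
        return False


# The widened value reported above: '\uFFFF' encodes to three bytes in
# UTF-8, so it cannot come back as the single byte 0xFF.
widened = "\uffff".encode("utf-8")

# A lone 0xFF byte is not valid UTF-8, so no str maps onto it.
utf8_ok = roundtrips_via("utf-8", b"\xff")

# A one-byte-per-code-point mapping such as Latin-1 is symmetric for
# every byte value (illustration only).
latin1_ok = roundtrips_via("latin-1", bytes(range(256)))
```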
Regards, Martin From p.f.moore at gmail.com Thu Jul 19 10:30:35 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 19 Jul 2007 09:30:35 +0100 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707181659n740ba5a0va8342f833094a855@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> <469EA76D.7000204@canterbury.ac.nz> <ca471dc20707181659n740ba5a0va8342f833094a855@mail.gmail.com> Message-ID: <79990c6b0707190130g7a5c7804kcf96b0e9956724c2@mail.gmail.com> On 19/07/07, Guido van Rossum <guido at python.org> wrote: > The only approach to retroactive generification that I approve of is > replacing the entire object with a wrapper of sorts, e.g. > > foo = generify(foo) Which (again, just to clarify) means that you would require that generic functions be introduced by a decorator? @generic def foo(): pass (your explicit equivalent would be for "after the fact" conversion to a generic). Paul From aurelien.campeas at logilab.fr Thu Jul 19 10:42:15 2007 From: aurelien.campeas at logilab.fr (=?iso-8859-1?Q?Aur=E9lien_Camp=E9as?=) Date: Thu, 19 Jul 2007 10:42:15 +0200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070713173936.53C213A404D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> Message-ID: <20070719084215.GA18244@crater.logilab.fr> On Fri, Jul 13, 2007 at 01:41:47PM -0400, Phillip J. 
Eby wrote: > At 07:39 AM 7/13/2007 +0200, Michele Simionato wrote: > >But I want to ask your opinion first, in order to understand if you > >are willing to scale down your proposal or not. At EuroPython Guido > >said that in private mail you made some strong argument explaining > >why the PEP could not be simplified, but he did not say more than that > > It's not an argument that the PEP can't be simplified; only that a > simpler PEP won't accomplish my original goal for the PEP (of having > a generic API for generic functions) vs. simply having a generic > function implementation in the stdlib. The first goal requires the > second, but the second doesn't need the first, and as far as I'm > aware, I'm the only person who really wants the first. At least on this list ? Well, you could add me yo your count ... but who am I ? ;-) > > A simpler PEP could exist to implement the second goal only, > implementing dynamic overloading in Python 3.0 with all of the > non-controversial features of 3124, and using Guido's preferred API. > > The holdup is that I don't have time to work on the *implementation* > of both my version *and* this simplified version; there is little > overlap between the two because mine is highly > self-referential/self-bootstrapping, absolutely dependent on being > able to modify functions in-place (a feature Guido seems near -1 on), > and virtually impossible to scale down. > > So, it is much lower on my priorities at the moment to implement the > simplified version, because I will neither gain code reuse *nor* the > API standardization I'd hoped for. > > At the moment, my plan is to finish implementing a PEP 3124-like, > fully extensible implementation for Python 2.x (see PEAK-Rules), then > look at splitting 3124 into a simplified version and a separate > extension API PEP aimed at Python 3.1 or later. At that point, I > will know for sure what extension API features are necessary to > implement the more advanced features I want in PEAK-Rules. 
>
> I expect to be able to start work on this (i.e., revisiting the
> proposal) in about a month. With luck, I will be able to carve out
> enough time to create the simpler implementation and update the PEP
> in a reasonable amount of time.
>
> However, there is nothing stopping anyone else who wishes it from
> either making the simpler implementation or drafting the scaled-down
> PEP. The simpler version Guido wants isn't really that different
> from his existing generic function prototype, especially if you drop
> all forms of method combination (including :next_method). It will

Maybe it's just a silly data point, but the current Zope/Plone and assorted product codebases are riddled with ad hoc before/after methods and hard-coded super calls ... I don't know what these have become in Zope 3, but at least this shows a need. Having standard ways to specify these methods as GFs would be a boon.

OTOH, having generic functions without the standard method combination looks a bit like a futile exercise; these are especially useful when you build hog frameworks such as Zope and whatever sits and tries to cooperate on top of it.

Maybe thinking about method combination as 'dynamic decoration' (paralleling the 'generic functions'/'dynamically overloadable functions' terminology shift) would be a friendlier way to teach Python folks about the feature? (Since it seems to me that Python wants to absorb foreign-language features under different names.)

I would also have liked to have input on this from other people using RuleDispatch features (doesn't one of the Django/TurboGears projects use them extensively?). Just so the BDFL & lieutenants don't argue too much in the direction of 'the community has no experience with these things'. I think (wishfully?) a sizeable, if not big, part of the Python *user* community is knowledgeable about it. These people do not necessarily express themselves there.

My two cents,
Aurélien.
> also need positional dispatching, but that's another feature that > could perhaps wait for 3.1 as well. > > In short, if you want a PEP 3124 implementation started on sooner > than about a month from now, you need to find a volunteer or do it yourself. > > > >The point is that for 95% of my use cases, simplegeneric would be > >enough, and it is alreay available *now*. So, if Guido was willing > >to accept something like simplegeneric for Python 3.0, I would not > >mind waiting for multiple dispatch in 3.1. > > You'll have to ask him about that. For what it's worth, the pkgutil > module already contains an even simpler generic function > implementation than simplegeneric, and is already in the stdlib > albeit undocumented. > > > >The reason why I am not using simplegeneric or RuleDispatch already, > >is that I do not want to commit in production to a technology > >without the official approval of the BDFL, and I prefer to wait now > >than having to change my code later. > > I guess this means you never use any packages from the Cheeseshop? :) > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/aurelien.campeas%40logilab.fr From p.f.moore at gmail.com Thu Jul 19 12:58:54 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 19 Jul 2007 11:58:54 +0100 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070719084215.GA18244@crater.logilab.fr> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> Message-ID: <79990c6b0707190358q3b64d67ejd342a24ada9ca8fd@mail.gmail.com> On 19/07/07, Aur?lien Camp?as <aurelien.campeas at logilab.fr> wrote: > On Fri, Jul 13, 2007 at 01:41:47PM -0400, Phillip J. 
Eby wrote: > > At 07:39 AM 7/13/2007 +0200, Michele Simionato wrote: > > >But I want to ask your opinion first, in order to understand if you > > >are willing to scale down your proposal or not. At EuroPython Guido > > >said that in private mail you made some strong argument explaining > > >why the PEP could not be simplified, but he did not say more than that > > > > It's not an argument that the PEP can't be simplified; only that a > > simpler PEP won't accomplish my original goal for the PEP (of having > > a generic API for generic functions) vs. simply having a generic > > function implementation in the stdlib. The first goal requires the > > second, but the second doesn't need the first, and as far as I'm > > aware, I'm the only person who really wants the first. > > At least on this list ? > Well, you could add me yo your count ... but who am I ? ;-) I don't think the issue is quite as black and white as Phillip is stating it. I personally have no immediate need for his more advanced API, but I'd support its inclusion if that meant increasing the chance of *any* GF API going into the core. There really ought to be an "Open Issues" section of the PEP, capturing the key areas where we don't have agreement. The lack of such a section is what makes it almost impossible to follow the discussions, insofar as how they make progress towards accepting the PEP. As a contribution to the discussion, may I offer the following as the key items I believe are open: 1. The "Advanced" API - some people (including Guido?) do not see the need for the advanced features of the PEP such as method combinations. On the other hand, no-one has offered to write up of implement a reduced version. 2. Functions being modifiable in-place. Technical issues with the implementation of the advanced API are complex to code without assuming that function objects can be modified (which Guido is unwilling to sanction in the general case). 
Furthermore, the PEP specifically states that @overload modifies existing functions in-place. 3. All functions are generic - The PEP states that the @overload decorator will work on any function, which requires in-place modification. By requiring overloadable functions to be declared somehow (for example, using a decorator) this requirement could possibly be removed. My apologies if I've misrepresented anyone's views. Please correct me if I have! I hope this is of some use. Paul. From guido at python.org Thu Jul 19 16:07:09 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jul 2007 07:07:09 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <79990c6b0707190130g7a5c7804kcf96b0e9956724c2@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <loom.20070713T201857-3@post.gmane.org> <ca471dc20707171447p68c59c44w254ee9890eb44b8f@mail.gmail.com> <20070717223550.7B1B13A403A@sparrow.telecommunity.com> <469D6EC6.9010005@canterbury.ac.nz> <20070718020310.2168A3A403A@sparrow.telecommunity.com> <ca471dc20707180947p41fdcd8k9be97b50658b7385@mail.gmail.com> <469EA76D.7000204@canterbury.ac.nz> <ca471dc20707181659n740ba5a0va8342f833094a855@mail.gmail.com> <79990c6b0707190130g7a5c7804kcf96b0e9956724c2@mail.gmail.com> Message-ID: <ca471dc20707190707o31ada610w2c5a7133233d5406@mail.gmail.com> On 7/19/07, Paul Moore <p.f.moore at gmail.com> wrote: > On 19/07/07, Guido van Rossum <guido at python.org> wrote: > > The only approach to retroactive generification that I approve of is > > replacing the entire object with a wrapper of sorts, e.g. > > > > foo = generify(foo) > > Which (again, just to clarify) means that you would require that > generic functions be introduced by a decorator? > > @generic > def foo(): > pass > > (your explicit equivalent would be for "after the fact" conversion to > a generic). Yes. 
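The wrapper style confirmed here (foo = generify(foo), or a decorator at definition time) can be sketched with functools.singledispatch, a facility that entered the standard library well after this thread and is used here only as a stand-in for the hypothetical generify(). The module object bar and the function names are likewise assumptions for illustration. The sketch also shows the "from bar import foo" caveat raised earlier in the thread.

```python
import types
from functools import singledispatch

# Stand-in for a third-party module named "bar" (hypothetical).
bar = types.ModuleType("bar")


def foo(obj):
    return "default"


bar.foo = foo

# Simulates someone doing "from bar import foo" before generification.
early_ref = bar.foo

# Retroactive generification: replace the whole object with a wrapper.
bar.foo = singledispatch(bar.foo)


@bar.foo.register(int)
def _foo_for_int(obj):
    return "int"
```

Callers that go through bar.foo see the new overload; the earlier direct reference still points at the plain function and does not.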
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Jul 19 16:22:37 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jul 2007 07:22:37 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070719084215.GA18244@crater.logilab.fr> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> Message-ID: <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> On 7/19/07, Aur?lien Camp?as <aurelien.campeas at logilab.fr> wrote: > I would have liked to have input on this from other people using > RuleDispatch features also (doesn't one of Django/Turbogears project > use them extensively ?). Just so the BDFL & lieutenants don't argue > too much in the direction of 'the community has no experience with > these things'. I think (wishfully ?) a sizeable, if not big, part of > the python *user* community is knwoledgeable about it. These people do > not necessarily express themselves there. Thanks for posting. It's been excruciatingly hard to find anyone besides Phillip interested in GFs or able to provide use cases. For me they're mostly still something theoretically interesting from other languages, like continuations. Maybe you can round up some more users? FWIW, I think the Turbogears use you're thinking of is jsonify, a GF for converting arbitrary Python data into JSON (JavaScript Object Notation). But I'm not aware of it using any of the advanced features -- it seems to be using just the basic facility of overloading on a single argument type, which could be done with my own "overloading" example (see the Python subversion sandbox). At least that's what I got from skimming the docs: http://docs.turbogears.org/1.0/JsonifyDecorator . That article claims that TurboGears uses RuleDispatch extensively. I'd love to hear from them about how they use the advanced features. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Thu Jul 19 17:34:38 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Thu, 19 Jul 2007 11:34:38 -0400 Subject: [Python-3000] exclusion feature for 2to3? In-Reply-To: <469E8D37.9050006@benjiyork.com> References: <f7e24l$92e$1@sea.gmane.org> <acd65fa20707181443y121705cevb0d36a3f816ef6d@mail.gmail.com> <469E8D37.9050006@benjiyork.com> Message-ID: <acd65fa20707190834y7f6034e3uba166b7eaa5066ed@mail.gmail.com> On 7/18/07, Benji York <benji at benjiyork.com> wrote: > Alexandre Vassalotti wrote: > > I expect other tools, like pdb.py and trace.py could follow this > > convention as well. For example: > > I used the time machine to convince the author of trace.py use this > convention. Uh? > He didn't like your spelling, but eventually agreed to #pragma NO COVER. Ah! :) Yes, that is where I got the spelling. I don't really like it either, but I haven't found anything better. -- Alexandre From aurelien.campeas at logilab.fr Thu Jul 19 17:41:42 2007 From: aurelien.campeas at logilab.fr (=?iso-8859-1?Q?Aur=E9lien_Camp=E9as?=) Date: Thu, 19 Jul 2007 17:41:42 +0200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> Message-ID: <20070719154141.GD18244@crater.logilab.fr> On Thu, Jul 19, 2007 at 07:22:37AM -0700, Guido van Rossum wrote: > On 7/19/07, Aur?lien Camp?as <aurelien.campeas at logilab.fr> wrote: >> I would have liked to have input on this from other people using >> RuleDispatch features also (doesn't one of Django/Turbogears project >> use them extensively ?). 
Just so the BDFL & lieutenants don't argue
>> too much in the direction of 'the community has no experience with
>> these things'. I think (wishfully ?) a sizeable, if not big, part of
>> the python *user* community is knowledgeable about it. These people do
>> not necessarily express themselves there.
>
> Thanks for posting. It's been excruciatingly hard to find anyone
> besides Phillip interested in GFs or able to provide use cases. For me
> they're mostly still something theoretically interesting from other
> languages, like continuations. Maybe you can round up some more
> users?

I will try. Please note that (imho), unlike Scheme's first-class continuations (which are clearly an über-powerful, hard-to-master meta-programming feature), method combinations are just another tool for day-to-day programming (in languages that already provide them), especially in large systems. One can certainly live without them, just as one can program without the Python 2.5 with statement. I sincerely believe Zope has been crying out for GFs, including standard method combination, since its inception.

Btw, I like to think of 'with' as a (static) decorator for code blocks. Why not see before/after/around methods as a variation on the theme of (dynamic) decoration of existing methods? A terminology change seems important for the Python community, as it (perhaps) helps assimilation of new concepts in the light of ones that are already mastered. Dunno if that makes sense yet.

>
> FWIW, I think the Turbogears use you're thinking of is jsonify, a GF
> for converting arbitrary Python data into JSON (JavaScript Object
> Notation).

Yes, and I well remember Simon Belak's presentation (and enthusiasm) at EP 2006.
At least from http://turbogears.org/ultimate.html one sees that generic functions are somewhat used also in : # choose widgets for data entry (tgfastdata.formmaker) # pick an output method for expose() (turbogears.controllers) # choose an error handler when something goes wrong (turbogears.errorhandling) > But I'm not aware of it using any of the advanced features > -- it seems to be using just the basic facility of overloading on a > single argument type, which could be done with my own "overloading" > example (see the Python subversion sandbox). At least that's what I > got from skimming the docs: > http://docs.turbogears.org/1.0/JsonifyDecorator . That article claims > that TurboGears uses RuleDispatch extensively. I'd love to hear from > them about how they use the advanced features. I might want to take some time next week to have a look at the source. Anyway thanks for leting that door still open, I felt like it was all done. Aur?lien. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From pje at telecommunity.com Thu Jul 19 17:56:17 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 19 Jul 2007 11:56:17 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> Message-ID: <20070719160724.62AE73A403A@sparrow.telecommunity.com> At 07:22 AM 7/19/2007 -0700, Guido van Rossum wrote: >On 7/19/07, Aur?lien Camp?as <aurelien.campeas at logilab.fr> wrote: > > I would have liked to have input on this from other people using > > RuleDispatch features also (doesn't one of Django/Turbogears project > > use them extensively ?). 
Just so the BDFL & lieutenants don't argue > > too much in the direction of 'the community has no experience with > > these things'. I think (wishfully ?) a sizeable, if not big, part of > > the python *user* community is knwoledgeable about it. These people do > > not necessarily express themselves there. > >Thanks for posting. It's been excruciatingly hard to find anyone >besides Phillip interested in GFs or able to provide use cases. For me >they're mostly still something theoretically interesting from other >languages, like continuations. Maybe you can round up some more users? About a month ago, googling PEP 3124 turned up a handful of blog posts in support. I also got a few private emails of support. The blog posts weren't from anybody I know or who are past users of my libraries, AFAICT, and at any rate aren't the same people who emailed. My simplegeneric package has hundreds of downloads logged at the Cheeseshop -- about 1/8th as many as wsgiref, if that gives you any idea of relative popularity. RuleDispatch isn't on the Cheeseshop, so I don't know how many people are using that. But the people that are, are very enthusiastic. During the time period when RuleDispatch wasn't working properly on Python 2.5 yet, I got fairly regular emails asking when it would. :) RuleDispatch uses my DecoratorTools package, whose 1.4 version had over 8000 Cheeseshop downloads (more than double wsgiref), and I believe that those are mostly due to TurboGears' use of RuleDispatch (as well as direct use of DecoratorTools). >FWIW, I think the Turbogears use you're thinking of is jsonify, a GF >for converting arbitrary Python data into JSON (JavaScript Object >Notation). But I'm not aware of it using any of the advanced features >-- it seems to be using just the basic facility of overloading on a >single argument type, which could be done with my own "overloading" >example (see the Python subversion sandbox). 
Actually, for that use case even simplegeneric would suffice, but at the time JSONify was written, it didn't exist yet.

By the way, I recently came across a use case for @around that I hadn't mentioned before. I'm in the process of re-implementing RuleDispatch's expression features in PEAK-Rules, and as I was defining the rules for intersecting logical conditions, it occurred to me that you could define intersection in terms of implication. When intersecting conditions A and B, you can return A if it implies B, or B if it implies A. So I just wrote this (translated here to the PEP 3124 dialect):

    @around(intersect)
    def intersect_if_implies(c1:object, c2:object, nm:next_method):
        if implies(c1, c2):
            return c1
        elif implies(c2, c1):
            return c2
        return nm(c1, c2)

Because this method is @around, it is called before any ordinary methods are called, even if they apply to more specific types than 'object'. This means you only have to define intersection algorithms to handle conditions that don't imply each other. (Assuming of course you've defined implies() relationships.)

When I realized I could do this, I was able to ditch a bunch of duplicated code in the individual intersect() relationships I had, and avoided having to write that code for the rest of the intersect() methods I had left to write.

From pje at telecommunity.com Thu Jul 19 18:16:30 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 19 Jul 2007 12:16:30 -0400
Subject: [Python-3000] pep 3124 plans
In-Reply-To: <79990c6b0707190358q3b64d67ejd342a24ada9ca8fd@mail.gmail.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> <79990c6b0707190358q3b64d67ejd342a24ada9ca8fd@mail.gmail.com>
Message-ID: <20070719161414.8EB723A403A@sparrow.telecommunity.com>

At 11:58 AM 7/19/2007 +0100, Paul Moore wrote:
>1. The "Advanced" API - some people (including Guido?)
do not see the >need for the advanced features of the PEP such as method combinations. >On the other hand, no-one has offered to write up or implement a >reduced version. Actually, two people have, if you count me. The other one hasn't yet done any of the things we discussed that they could do, and it's still on my "to do eventually" list to take care of the rest, including an implementation. >2. Functions being modifiable in-place. Technical issues with the >implementation of the advanced API are complex to code without >assuming that function objects can be modified (which Guido is >unwilling to sanction in the general case). Furthermore, the PEP >specifically states that @overload modifies existing functions >in-place. > >3. All functions are generic - The PEP states that the @overload >decorator will work on any function, which requires in-place >modification. By requiring overloadable functions to be declared >somehow (for example, using a decorator) this requirement could >possibly be removed. I've agreed to Guido's terms for this stuff, more than once, and am fine with having a restricted implementation that does things his way. It just won't help me much with my goals for all this, unless we figure out a way for that to co-exist with what I want to do, and I haven't figured that out yet. In the meantime, I've got other pressing projects for OSAF that are mostly keeping me from doing *anything* related to generic functions, even the stuff I *want* to do. OSAF does use simplegeneric in parts of Chandler, btw, but my current work doesn't relate to those parts. I don't have the cycles at the moment for a PEP rewrite *and* implementing another generic function engine besides the five I've already written (and the sixth one that's in progress now).
The original plan for PEP 3124 was to port peak.rules.core to 3.0 after some feature additions, but the stripped-down design calls for a different implementation -- especially since peak.rules.core modifies functions in place. (A minor irony: one of the reasons I did it that way instead of creating custom objects and then optimizing them with C, was to make it possible for PyPy and Psyco to optimize the code. In other words, it was intended to *enhance* portability to other Python platforms, not inhibit it!) From tjreedy at udel.edu Thu Jul 19 19:15:30 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 19 Jul 2007 13:15:30 -0400 Subject: [Python-3000] pep 3124 plans References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com><20070713173936.53C213A404D@sparrow.telecommunity.com><20070719084215.GA18244@crater.logilab.fr><ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.co m> <20070719160724.62AE73A403A@sparrow.telecommunity.com> Message-ID: <f7o67j$du5$1@sea.gmane.org> "Phillip J. Eby" <pje at telecommunity.com> wrote in message news:20070719160724.62AE73A403A at sparrow.telecommunity.com... By the way, I recently came across a use case for @around that I hadn't mentioned before. I'm in the process of re-implementing RuleDispatch's expression features in PEAK-Rules, and as I was defining the rules for intersecting logical conditions, it occurred to me that you could define intersection in terms of implication. When intersecting conditions A and B, you can return A if it implies B, or B if it implies A. So I just wrote this (translated here to the PEP 3124 dialect): @around(intersect) def intersect_if_implies(c1:object, c2:object, nm:next_method): if implies(c1, c2): return c1 elif implies(c2, c1): return c2 return nm(c1, c2) Because this method is @around, it is called before any ordinary methods are called, even if they apply to more specific types than 'object'. 
This means you only have to define intersection algorithms to handle conditions that don't imply each other. (Assuming of course you've defined implies() relationships.) When I realized I could do this, I was able to ditch a bunch of duplicated code in the individual intersect() relationships I had, and avoided having to write that code for the rest of the intersect() methods I had left to write.
=====================================
As a side note: if you have either a negate() or disjoint(), you can also handle a 3rd of the 4 cases object-generically:

    elif disjoint(c1, c2): return <empty>
    # or
    elif implies(c1, negate(c2)): return <empty>
    # symmetrical with
    elif implies(c2, negate(c1)): return <empty>

and then the intersection algorithms can assume non-disjointness. tjr From guido at python.org Thu Jul 19 20:32:14 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jul 2007 11:32:14 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469F0D92.7080301@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> <469F0D92.7080301@v.loewis.de> Message-ID: <ca471dc20707191132j7837ec90w1971bca72dac282a@mail.gmail.com> On 7/19/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > >> __reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size). > >> Now, s# creates a Unicode object, and the pickling fails to round-trip > >> correctly. > > > > I thought that before your patch a bytes object roundtripped correctly > > with all three protocols. Or maybe it got broken when s# was changed? > > It did, and it got. s# used to return a str8, which then was pickled > byte-for-byte.
When s# started to return Unicode strings, bytes > above 128 got widened to Py_UNICODE (which is what currently > PyUnicode_FromString does), so b'\xFF' became bytes('\uFFFF'). Ouch!!! This turns out to be a bug in PyUnicode_FromStringAndSize() due to signed characters. It can even cause a segfault:

    Python 3.0x (py3k-struni, Jul 18 2007, 11:01:59)
    [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> b"\x80".__reduce__()
    Segmentation fault

Fixed by applying Py_CHARMASK() to all occurrences of *u in that function. Committed revision 56460. > That got pickled and unpickled; then bytes('\uFFFF') is > b'\xef\xbf\xbf' (because it applies the default encoding to > the unicode argument), and it failed to roundtrip to b'\xFF'. > > It's actually not possible to generate b'\xFF' using > a unicode string argument, as string the default encoding will > never return s'\xFF' (as that's not valid UTF-8). But you can do it using bytes('\xff', 'latin-1'). I think that's a reasonable thing for bytes.__reduce__() to return. > > An additional requirement might be that if bytes are introduced in > > 2.6, a pickle containing bytes written by 3.0 should be readable by > > 2.6. > > Sure: whatever we decide now needs to be applied to 2.6 also. Right. > >> If __reduce__ returns a Unicode object, what encoding should be assumed? > >> (which then needs to be symmetric with bytes()) > >> > >> If __reduce__ returns a str8 object, you will have to keep str8 (or > >> else you cannot pickle bytes). > > > > When __reduce__ returns a string at all, that means it's the name of a > > global. I guess that should be encoded using UTF-8, so that as long as > > the name is ASCII, 2.x can unpickle it. But I'm not sure if that's > > what you were asking. > > No.
> py> b'foo'.__reduce__() > (<type 'bytes'>, ('foo',)) > py> b'\xff'.__reduce__() > (<type 'bytes'>, ('\uffff',)) > > It returns one string each time, as the first element of a one-element > tuple (that is then passed to the bytes() constructor on unpickling) I see. It returns a tuple containing a string. I was confused. Sorry. (But the \uffff is due to the bug above.) > > Anyway, one reason this is such a mess is clearly that the pickle > > protocol has no independent spec -- it's grown organically in code. > > Reverse-engineering the intent of the code is a pain. > > That's also true, but I don't see it much as a problem here. If it > had a spec, that spec would have said that b'S', b'T' and b'U' > have a str payload. That spec would break if str8 goes away, and > the spec would be changed to explain how these codes act in 2.x > and 3.x. It would not talk at all about the bytes type, and that > it's __reduce__ might return different things in 2.x and 3.x > (unless bytes gets a primitive code for pickle). How about the following. it's not perfect but it's the best I can think of that doesn't break any pickles. In 3.0, when an S, T or U pickle code is encountered, the returned value is a Unicode string decoded from the bytes using Latin-1. This means that all S, T or U pickle codes returns Unicode objects. In those cases where this was really meant to transfer binary data, the application running under 3.0 can fix this by calling bytes(X, 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call str(Y, 'utf-8') after that. But 3.0 should only *generate* the S, T or U pickle codes for str8 values (as long as that type exists) or for str values containing only 7-bit ASCII bytes; for all else it should use the unicode pickle codes. For bytes, I propose that b"ab\xff".__reduce__() return (bytes, ("ab\xff", "latin-1")). 
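For readers following along, here is a small sketch of why Latin-1 works as the carrier encoding in the proposal above: it maps each byte value 0..255 to the code point with the same number, so arbitrary binary data survives a trip through a Unicode string, and UTF-8 text that was decoded as Latin-1 can still be recovered afterwards. This is written against the bytes/str API of released Python 3, which is an assumption on my part and may not match the py3k-struni branch as it stood during this thread; the helper names are made up for illustration.

```python
# Sketch only: uses the bytes/str API of released Python 3, not the
# in-progress py3k-struni branch discussed in this thread.
# bytes_to_text / text_to_bytes are hypothetical helper names.

def bytes_to_text(data):
    """Map each byte 0..255 to the code point with the same number."""
    return data.decode("latin-1")


def text_to_bytes(text):
    """Inverse of bytes_to_text for strings whose code points are all < 256."""
    return text.encode("latin-1")


# Every possible byte value survives the round trip.
every_byte = bytes(range(256))
round_tripped = text_to_bytes(bytes_to_text(every_byte))

# UTF-8 text that was unpickled as Latin-1 can be recovered in two steps:
# first back to the original bytes, then decode those bytes as UTF-8.
utf8_payload = "\u00f6".encode("utf-8")            # b'\xc3\xb6'
mangled = bytes_to_text(utf8_payload)              # two characters, not one
recovered = text_to_bytes(mangled).decode("utf-8")
```

The same property is what makes a `(bytes, ("ab\xff", "latin-1"))` style reduce value reversible: the string argument carries the byte values unchanged.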
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu Jul 19 22:26:35 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 19 Jul 2007 22:26:35 +0200 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707191132j7837ec90w1971bca72dac282a@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707110715s4cd53401t53e9075bdc2ea1df@mail.gmail.com> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> <469F0D92.7080301@v.loewis.de> <ca471dc20707191132j7837ec90w1971bca72dac282a@mail.gmail.com> Message-ID: <469FC8FB.2050004@v.loewis.de> > But you can do it using bytes('\xff', 'latin-1'). I think that's a > reasonable thing for bytes.__reduce__() to return. That's certainly a choice. Another choice is that bytes defaults to latin-1, rather than the system default encoding. This is roughly equivalent, and gives a slightly more compact pickle result. > How about the following. it's not perfect but it's the best I can > think of that doesn't break any pickles. > > In 3.0, when an S, T or U pickle code is encountered, the returned > value is a Unicode string decoded from the bytes using Latin-1. This > means that all S, T or U pickle codes returns Unicode objects. In > those cases where this was really meant to transfer binary data, the > application running under 3.0 can fix this by calling bytes(X, > 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call > str(Y, 'utf-8') after that. 
It would actually have to be Y.encode('latin-1').decode('utf-8') (assuming Y is what you get from unpickling): py> str('\xc3\xb6', 'utf-8') Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: decoding Unicode is not supported > But 3.0 should only *generate* the S, T or U pickle codes for str8 > values (as long as that type exists) or for str values containing only > 7-bit ASCII bytes; for all else it should use the unicode pickle > codes. Sounds fine to me. > For bytes, I propose that b"ab\xff".__reduce__() return (bytes, > ("ab\xff", "latin-1")). See above. Unless somebody objects, I'd rather make latin-1 the default for bytes when a string is passed (I'm uncertain myself of how much explicit is better than implicit here). I'll look into implementing that strategy. Regards, Martin From jonathan-lists at cleverdevil.org Thu Jul 19 22:00:53 2007 From: jonathan-lists at cleverdevil.org (Jonathan LaCour) Date: Thu, 19 Jul 2007 16:00:53 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> Message-ID: <B46B3E00-F0B0-40BE-A59F-561487652FBE@cleverdevil.org> Guido van Rossum wrote: > FWIW, I think the Turbogears use you're thinking of is jsonify, > a GF for converting arbitrary Python data into JSON (JavaScript > Object Notation). But I'm not aware of it using any of the > advanced features -- it seems to be using just the basic facility > of overloading on a single argument type, which could be done > with my own "overloading" example (see the Python subversion > sandbox). At least that's what I got from skimming the docs: > http://docs.turbogears.org/1.0/JsonifyDecorator . That article claims > that TurboGears uses RuleDispatch extensively. 
I'd love to hear from > them about how they use the advanced features. There are several places in TurboGears that we use generic functions:

TurboJSON
---------
TurboGears controllers work by returning dictionaries, which are then passed to template engines to generate and render responses. TurboJSON is a Buffet-compatible template plugin that jsonifies data that is returned from a TurboGears controller. The jsonify function is a generic function that is used to perform the serialization, and is commonly extended to provide custom JSON serialization in a cross-cutting way:

    # TurboGears Controller
    class PeopleController(controllers.Controller):

        @expose('json')
        def person(self, person_id):
            person = Person.get(person_id)
            return dict(person=person)

    # generic function for JSONifying Person objects
    @jsonify.when('isinstance(obj, Person)')
    def jsonify_person(obj):
        return dict(
            name=obj.name,
            age=obj.age,
            birthdate=obj.birthdate.strftime('%Y-%m-%d')
        )

I use this feature heavily, and find it to be easy to understand once you get used to the concept of generic functions. Of course, we don't restrict @jsonify.when() to isinstance checking. I've seen production code which checks the value of an object before jsonifying it, or which checks an attribute on the object to determine how it should be rendered in the JSON. For example if one of our users has a bunch of different contacts in a contact object, but she wants different JSON for contacts who are also leads, she can use predicate dispatch in the @jsonify.when decorator to do that...

Picking a Template Engine
-------------------------
TurboGears supports a variety of templating engines in a cross-framework way using a standard API called Buffet.
TurboGears controllers can specify different templating engines and different templates for a controller method if they so desire, and we use generic functions to implement this on the backend so that you can register multiple template options for rendering the same controller method.

    class Root(controllers.RootController):

        @expose(template='mako:path.to.mako.template.html')
        def get_mako(self):
            return dict(...)

        @expose("actionflow.templates.tasks")
        @expose("cheetah:actionflow.templates.tasktext", accept_format="text/plain")
        @expose("kid:actionflow.templates.taskfeed", accept_format="rss")
        @expose("json", accept_format="text/javascript", as_format="json")
        def task(self):
            return dict(...)

Rule dispatch gets used to check what format is requested (either in the headers, or explicitly via a tg_format parameter) and calls the correct rendering function in the correct way to turn the dict that's returned into what the client asked for. We're going to be improving this and making it even more powerful in TurboGears 2.0.

Validation and Error Handling
-----------------------------
TurboGears has a built-in framework for validating parameters that are passed in over HTTP. This integrates with an underlying widget system which can be used to generate forms, called ToscaWidgets, that you can use to validate against.
You can find good documentation and examples here: http://docs.turbogears.org/1.0/ErrorHandling Here is an example:

    import turbogears
    from turbogears import controllers, expose, validate, redirect
    from turbogears import exception_handler

    class Root(controllers.RootController):

        def vh(self, tg_exceptions=None):
            return dict(
                handling_value=True,
                exception=str(tg_exceptions)
            )

        def ih(self, tg_exceptions=None):
            return dict(
                handling_index=True,
                exception=str(tg_exceptions)
            )

        @expose()
        @exception_handler(vh, "isinstance(tg_exceptions, ValueError)")
        @exception_handler(ih, "isinstance(tg_exceptions, IndexError)")
        def exceptional(self, number=2):
            number = int(number)
            if number < 42:
                raise IndexError("Number too Low!")
            if number == 42:
                raise IndexError("Wise guy, eh?")
            if number > 100:
                raise Exception("This number is exceptionally high!")
            return dict(result="No errors!")

Lots of users are currently making use of this functionality in TurboGears, and it seems to be fairly well received. And again, you can use predicate dispatch to register different error_handlers for different kinds of errors. I for one, as a committer on TurboGears, would absolutely love to see a good, solid generic function capability integrated into the standard library, and find PEP 3124 to completely cover my needs. There are certainly things in the PEP that I do not have a use for, but nothing in the PEP seems to be much of a stretch to me. Just my 2 cents (or maybe 50 cents...) -- Jonathan LaCour http://cleverdevil.org From pje at telecommunity.com Thu Jul 19 23:08:04 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Thu, 19 Jul 2007 17:08:04 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <B46B3E00-F0B0-40BE-A59F-561487652FBE@cleverdevil.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <20070719084215.GA18244@crater.logilab.fr> <ca471dc20707190722w53fcb1d9te591db51035de487@mail.gmail.com> <B46B3E00-F0B0-40BE-A59F-561487652FBE@cleverdevil.org> Message-ID: <20070719210547.8D5BA3A403A@sparrow.telecommunity.com> At 04:00 PM 7/19/2007 -0400, Jonathan LaCour wrote: >I for one, as a committer on TurboGears, would absolutely love to see >a good, solid generic function capability integrated into the standard >library, and find PEP 3124 to completely cover my needs. There are >certainly things in the PEP that I do not have a use for, but nothing in >the PEP seems to be much of a stretch to me. FYI, Jonathan, the version of PEAK-Rules that's in SVN implements everything that's currently in PEP 3124 except the Interface bits. It does not, however, implement RuleDispatch-style predicate expressions, just argument-isinstance tests. I'd hoped to have predicates done this month, but it's running a couple weeks behind. After it's done, I plan to throw together a RuleDispatch-style API over it, to make porting/testing easier, using something like "from peak.rules import dispatch" to get a module that fakes the RuleDispatch API (e.g. somefunc.when() instead of when(somefunc)). 
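To make the two registration spellings mentioned above concrete, here is a minimal sketch of an argument-isinstance generic function that accepts both `somefunc.when(...)` and `when(somefunc, ...)`. It is purely illustrative: the `Generic` class, the module-level `when` helper and their signatures are assumptions made up for this example, and they are not the PEAK-Rules, RuleDispatch or PEP 3124 implementations (which dispatch on more than the first argument and support predicates and method combination).

```python
# Hypothetical illustration; not the PEAK-Rules or RuleDispatch code.

class Generic:
    """A tiny generic function dispatching on the type of its first argument."""

    def __init__(self, default):
        self.default = default
        self.registry = []  # list of (type, implementation) pairs

    def when(self, cls):
        """RuleDispatch-style spelling: @somefunc.when(SomeType)."""
        def decorator(impl):
            self.registry.append((cls, impl))
            return impl
        return decorator

    def __call__(self, obj, *args, **kwargs):
        # Choose the most specific match: the registered class that appears
        # earliest in the argument's method resolution order.
        mro = type(obj).__mro__
        best = None
        for cls, impl in self.registry:
            if cls in mro and (best is None or mro.index(cls) < mro.index(best[0])):
                best = (cls, impl)
        impl = best[1] if best else self.default
        return impl(obj, *args, **kwargs)


def when(generic, cls):
    """Function-first spelling: @when(somefunc, SomeType)."""
    return generic.when(cls)


@Generic
def describe(obj):
    return "object"


@describe.when(int)
def describe_int(obj):
    return "int"


@when(describe, bool)
def describe_bool(obj):
    return "bool"
```

With these definitions `describe(True)` picks the `bool` implementation even though `int` was registered first, because `bool` sits earlier in the MRO of `True`. A compatibility layer of the kind described above would only need the thin `when` wrapper shape shown here; the dispatch engine itself stays the same.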
From guido at python.org Fri Jul 20 00:25:07 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jul 2007 15:25:07 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <469FC8FB.2050004@v.loewis.de> References: <f72o9f$v6i$1@sea.gmane.org> <46979811.2050405@v.loewis.de> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> <469F0D92.7080301@v.loewis.de> <ca471dc20707191132j7837ec90w1971bca72dac282a@mail.gmail.com> <469FC8FB.2050004@v.loewis.de> Message-ID: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> On 7/19/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > > But you can do it using bytes('\xff', 'latin-1'). I think that's a > > reasonable thing for bytes.__reduce__() to return. > > That's certainly a choice. Another choice is that bytes defaults to > latin-1, rather than the system default encoding. This is roughly > equivalent, and gives a slightly more compact pickle result. I don't like bytes defaulting to anything at all; that they currently do is a transitional issue in the branch. Java used to have a default of Latin-1 for converting bytes <--> string and it was considered a mistake AFAIK. I've implemented the explicit latin-1 version for now; we can change this later. > > How about the following. it's not perfect but it's the best I can > > think of that doesn't break any pickles. > > > > In 3.0, when an S, T or U pickle code is encountered, the returned > > value is a Unicode string decoded from the bytes using Latin-1. This > > means that all S, T or U pickle codes returns Unicode objects. In > > those cases where this was really meant to transfer binary data, the > > application running under 3.0 can fix this by calling bytes(X, > > 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call > > str(Y, 'utf-8') after that.
> > It would actually have to be Y.encode('latin-1').decode('utf-8') > (assuming Y is what you get from unpickling): That's another way of saying it. I meant for Y to be the result of bytes(X, 'latin-1') but that was non-obvious. Anyway I think we're in agreement here. :-) > py> str('\xc3\xb6', 'utf-8') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: decoding Unicode is not supported > > > But 3.0 should only *generate* the S, T or U pickle codes for str8 > > values (as long as that type exists) or for str values containing only > > 7-bit ASCII bytes; for all else it should use the unicode pickle > > codes. > > Sounds fine to me. > > > For bytes, I propose that b"ab\xff".__reduce__() return (bytes, > > ("ab\xff", "latin-1")). > > See above. Unless somebody objects, I'd rather make latin-1 the > default for bytes when a string is passed (I'm uncertain myself > of how much explicit is better than implicit here). See above. > I'll look into implementing that strategy. How about instead you help with fixing pickling of datetime objects? This broke when I fixed test_pickle. Rolling back your changes to datetime pickling didn't seem to help. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jul 20 01:58:30 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 19 Jul 2007 16:58:30 -0700 Subject: [Python-3000] Heaptypes In-Reply-To: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> References: <f72o9f$v6i$1@sea.gmane.org> <ca471dc20707140708n413bfe9fwc6d223f50ff44573@mail.gmail.com> <469C4B0B.50605@v.loewis.de> <ca471dc20707181657o4ccfcc7eu94134972b0b78fb5@mail.gmail.com> <469EAD22.1040603@v.loewis.de> <ca471dc20707182001g241ef15cj5aacea9971e7d2b0@mail.gmail.com> <469F0D92.7080301@v.loewis.de> <ca471dc20707191132j7837ec90w1971bca72dac282a@mail.gmail.com> <469FC8FB.2050004@v.loewis.de> <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> Message-ID: <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> On 7/19/07, Guido van Rossum <guido at python.org> wrote: > How about instead you help with fixing pickling of datetime objects? > This broke when I fixed test_pickle. Rolling back your changes to > datetime pickling didn't seem to help. Never mind; this was shallow -- cPickle doesn't pickle bytes correctly. I've decided to get rid of cPickle -- someone is writing a replacement for the summer of code anyway. The new approach will be that you always write "import pickle" and this transparently attempts to use the C accelerator if it can be imported, like heapq.py and _heapq.c. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Fri Jul 20 04:37:53 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 19 Jul 2007 22:37:53 -0400 Subject: [Python-3000] Fwd: Re: pep 3124 plans Message-ID: <20070720023537.681DA3A403A@sparrow.telecommunity.com> FYI... another TurboGears developer speaks up re: their generic function use. >Date: Thu, 19 Jul 2007 20:17:23 -0400 >From: "Mark Ramm" >To: "Phillip J. 
Eby" >Subject: Re: [Python-3000] pep 3124 plans > >>FYI, Jonathan, the version of PEAK-Rules that's in SVN implements >>everything that's currently in PEP 3124 except the Interface bits. >> >>It does not, however, implement RuleDispatch-style predicate >>expressions, just argument-isinstance tests. I'd hoped to have >>predicates done this month, but it's running a couple weeks >>behind. After it's done, I plan to throw together a >>RuleDispatch-style API over it, to make porting/testing easier, using >>something like "from peak.rules import dispatch" to get a module that >>fakes the RuleDispatch API (e.g. somefunc.when() instead of when(somefunc)). > >This is good news indeed. TurboGears 2 is looking for rule based >dispatch, and I'm very interested in PEAK Rules as an alternative to >RD since you've pretty much deprecated RD. But an RD like interface >on PEAK-Rules will make TG2 more API compatible, and opens up the >possibility of moving over in the tg 1.x line. > >Predicate dispatch isn't really needed for some of the things in >TurboGears, and there are a couple of places where we went overboard >with generic functions everywhere. But, at the same time there are >other places where generic functions and predicate dispatch really >make things a lot easier to understand, and it would hurt quite a bit >to have to to give it up. > >As the maintainer of tg2, my main interest is to have a viable, >reasonably well supported, generic function implementation that we can >use and rely on. > >I don't so much care that it's baked into the core language, or >included in the standard library -- though I think those would be >great things. Generic functions helped me to think about problems in >a new way, and have been a remarkably useful tool to have in my >toolbox. 
> >--Mark Ramm From unknown_kev_cat at hotmail.com Fri Jul 20 07:19:09 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Fri, 20 Jul 2007 01:19:09 -0400 Subject: [Python-3000] pep 3124 plans References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> Message-ID: <f7pgki$6o3$1@sea.gmane.org> So, what is the state of the PEP? From the rest of the posts so far, it sounds like there is no real objection to the basic end user API as described in the PEP, except for the case of retroactive generification, which GvR wants made explicit in the user's code, AIUI. But there are concerns about the implementation. Overriding inside classes would need a new implementation, but at the moment you're not sure how to implement that. Also your current bootstrapping system requires in-place modifying of some functions. You think using a third type of function could perhaps fix that if no cleaner solution appears, correct? Also what has happened with the Interfaces/Adaptation/Aspects part of the document? How does that mesh with the ABCs? After all, adaptable interfaces and ABCs have such similar use cases that users may not be sure which to use. Or has that part been deferred for now, as the GF and method combination part is not dependent on those?
From unknown_kev_cat at hotmail.com Fri Jul 20 08:20:55 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Fri, 20 Jul 2007 02:20:55 -0400 Subject: [Python-3000] PEP 368: Standard image protocol and class References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com> <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com> <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com> <cc93256f0707010959o44c77912sb989c68cf890b846@mail.gmail.com> Message-ID: <f7pk8b$evu$1@sea.gmane.org> "Lino Mastrodomenico" <l.mastrodomenico at gmail.com> wrote in message news:cc93256f0707010959o44c77912sb989c68cf890b846 at mail.gmail.com... >2007/7/1, BJörn Lindqvist <bjourne at gmail.com>: >> But I cannot see how it would solve the problem with too many image >> classes. The reason why PIL, PyGame and wxPython have different image >> classes is because each of them uses different C functions for >> manipulating said image classes. These differences bubble up through >> the bindings and result in PIL exposing an Image, PyGame a Surface >> and wxPython a wxImage. The result is that if you want to use a PIL >> Image in, say, PyGame, you still need to convert it. > >Actually, this is not always true. :-) > >For example it's entirely possible to have the *same* Python RGBA >image considered as an SDL_Surface by SDL (the underlying library used >by pygame), as an ImagingMemoryInstance by the PIL C library and have >its buffer directly accepted by the OpenGL function glTexImage2D (with >a bit of care in the order of the corners passed to glTexCoord2f), >independently of who created the image in the first place. > >This works because most C/C++ libraries give the possibility of >creating a native image struct/class using an existing memory buffer >(without copying it) and they support at least a subset of the modes >currently defined, with the exact byte order, padding, etc, specified >in the PEP (usually L and at least one of RGB or RGBA).
> >But you are right, the particular format specified in the PEP is not >always supported by the existing libraries, even when they support >that particular mode. Sometimes this can be fixed (e.g. PIL currently >uses by default 4 bytes per pixel for RGB images and has only >experimental support for 3 bytes per pixel, but its C library is >written by the same people that maintain the Python bindings, so they >can change it if they want) and sometimes it cannot be easily fixed >(e.g. a wxImage class will happily accept an RGB buffer as defined by >the PEP, but it has a funny memory arrangement for RGBA images that is >completely incompatible). > >So I expect that each Python library that jumps on the PEP bandwagon >will have three levels of support for the modes listed: > > 1) no support at all (e.g. most 3D libraries will probably never >accept CMYK images as textures); the user can explicitly convert the >image using "new_image = Image(new_mode, source=old_image)"; > > 2) limited support: they support a particular mode, but cannot >directly use the standard memory arrangement, so when they receive an >alien image object they convert it on the fly to their preferred byte >order and they do the reverse operation when a foreign library tries >to access the buffer property of their images (they may offer a >read-only buffer); this is not ideal, but it's better than the current >situation because it's transparent to the user and it requires only a >single memory copy/conversion instead of the two usually performed by >the current tostring/fromstring dance; > > 3) full support: no conversion or memory copy ever necessary for the >exchange of images between two libraries if they both have full >support for a particular mode. Of course the Image class that I'm >writing and that I hope will be included in the stdlib, will have full >support for all the modes. > >Please note that the conversions in "2)" above can be avoided in some >(most?)
cases if PEP 3118 is accepted, because it will become possible >to expose and discover the "native" memory arrangement of an image >without accessing its buffer property (that, in my vision, will always >offer the "standard" arrangement defined in the PEP, to simplify >things for libraries that prefer a simpler interface, even if it may >be slightly less efficient in some, hopefully rare, cases). > If the maintainers of most of the large packages that do imaging are willing to support this, and your code is good, I see absolutely no reason why this PEP would not be accepted. It appears you worked hard to make sure that it would be possible for the existing libraries to use the Image protocol without too much work. (Unless they need to use "support level 2" as you described above, for some modes. That would add some extra work). Will you provide an abstract base class for Image Protocol implementations to inherit from? (The ImageMixin could inherit from that class, just not providing implementations of info, buffer, mode, and size. [Hmm. If any of those were functions then that would prevent somebody from directly instantiating ImageMixin, which would be a good thing, as it was really only intended to be used as a base class as far as I can tell.]) Will the simple "Image" class have no extra functionality beyond the protocol's minimum requirements and the stated resizing/mode-changing constructors? If an image-protocol object is passed to the Image-constructor requesting a mode conversion or resizing, but is already in the requested mode/size, what happens? Is the underlying image data duplicated? Or does the new instance basically point to the old data? From jcarlson at uci.edu Fri Jul 20 10:18:01 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 20 Jul 2007 01:18:01 -0700 Subject: [Python-3000] _heapq.c, etc.
(was Re: Heaptypes) In-Reply-To: <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> Message-ID: <20070720010804.85A7.JCARLSON@uci.edu> "Guido van Rossum" <guido at python.org> wrote: > On 7/19/07, Guido van Rossum <guido at python.org> wrote: > > How about instead you help with fixing pickling of datetime objects? > > This broke when I fixed test_pickle. Rolling back your changes to > > datetime pickling didn't seem to help. > > Never mind; this was shallow -- cPickle doesn't pickle bytes > correctly. I've decided to get rid of cPickle -- someone is writing a > replacement for the summer of code anyway. The new approach will be > that you always write "import pickle" and this transparently attempts > to use the C accelerator if it can be imported, like heapq.py and > _heapq.c. On a related note, since I had been supporting only Python 2.3 for quite a while, I didn't notice the fact that Python's _heapq.c (in 2.4 at least, I haven't tested on 2.5) only supported lists as containers, and not a list-like object with all methods that heapq calls (which was an issue for a pure-Python pair heap implementation I posted last December or so). What made it really annoying is that there was no way to tell the heapq module not to load the C version so that I could use a generic container. I ended up just commenting out the C module heapq import and moving on. I don't know if we want to make it possible to disable the loading of certain C modules that *don't* offer all of the same features, or if we want to limit the Python versions to what the C versions support, or even if we want to expand the C versions to handle all cases that the Python versions support. While the pickle/cPickle, StringIO/cStringIO, etc., naming can be a bit annoying, it does give me the choice whether I want it to be fast or flexible. 
- Josiah From guido at python.org Fri Jul 20 16:44:09 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 07:44:09 -0700 Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes) In-Reply-To: <20070720010804.85A7.JCARLSON@uci.edu> References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> Message-ID: <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com> On 7/20/07, Josiah Carlson <jcarlson at uci.edu> wrote: > > "Guido van Rossum" <guido at python.org> wrote: > > On 7/19/07, Guido van Rossum <guido at python.org> wrote: > > > How about instead you help with fixing pickling of datetime objects? > > > This broke when I fixed test_pickle. Rolling back your changes to > > > datetime pickling didn't seem to help. > > > > Never mind; this was shallow -- cPickle doesn't pickle bytes > > correctly. I've decided to get rid of cPickle -- someone is writing a > > replacement for the summer of code anyway. The new approach will be > > that you always write "import pickle" and this transparently attempts > > to use the C accelerator if it can be imported, like heapq.py and > > _heapq.c. > > On a related note, since I had been supporting only Python 2.3 for quite > a while, I didn't notice the fact that Python's _heapq.c (in 2.4 at > least, I haven't tested on 2.5) only supported lists as containers, and > not a list-like object with all methods that heapq calls (which was an > issue for a pure-Python pair heap implementation I posted last December > or so). > > What made it really annoying is that there was no way to tell the heapq > module not to load the C version so that I could use a generic container. > I ended up just commenting out the C module heapq import and moving on. 
> > I don't know if we want to make it possible to disable the loading of > certain C modules that *don't* offer all of the same features, or if we > want to limit the Python versions to what the C versions support, or > even if we want to expand the C versions to handle all cases that the > Python versions support. While the pickle/cPickle, StringIO/cStringIO, > etc., naming can be a bit annoying, it does give me the choice whether I > want it to be fast or flexible. This was an example of a performance improvement that changed the specs of an API in an incompatible way. Breaking your code was an unintended side effect of the speedup. We're going to do a few more of these in Py3k, and this time breaking the specs is the name of the game. I think going forward (post 3.0) we should be more careful to write specs that can easily be optimized without breaking existing usage, or writing speedups that can handle all the argument types that the original code supported. I definitely *don't* want to continue the old habit of having a slow and a fast module with different names; the experience with especially cPickle and cStringIO is that everyone believes their code is performance critical and hence uses the C version if it exists, thereby repeating the same idiom over and over. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jul 20 16:49:12 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 07:49:12 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <f7pgki$6o3$1@sea.gmane.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> Message-ID: <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > So the state of the PEP? 
From the rest of the posts so far, > it sounds like there is no real objection to the basic end user API as > described in the PEP, Actually I want to reserve judgment on that until the PEP is rewritten to explain and document the underlying mechanisms. It is currently impossible (for me, anyway) to understand how the machinery to support the described features could be built. Without that I cannot approve the PEP. Phillip knows this but is too busy to work on it. > except for the case of retroactive generification, which GvR wants made > explict in the user's code, AIUI. > > But there are concerns about the implementation. Overiding inside classes > would need a new implementation, but at the moment your not sure how to > implement that. Also your current bootstrapping system requires in-place > modifing of some functions. You think using a third type of function could > perhaps fix that if no cleaner solution appears, correct? > > Also what has happened with the Interfaces/Adpatation/Aspects part of the > document? How does that mesh with the ABC's? > After all adaptable interfaces and ABCs have such similar use cases users > may not be sure which to use. > Or has that part been defered for now, as the GF and method combination part > is not dependent on those? AFAIK Phillip has declared that his implementation only uses (or could be made to only use) isinstance()/issubclass(), and the overriding of these two used by the ABCs is actually very convenient for the GF PEP. 
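
A rough sketch of why issubclass()-only dispatch combines well with ABCs. This is not PEP 3124's machinery; the `overload` decorator, the `_registry` list and the `Stack` ABC are hypothetical names for this example, and it uses the `abc` module as it exists in current Python 3. Because an ABC can register virtual subclasses, a dispatcher that only asks issubclass() picks them up without any extra support.

```python
from abc import ABC


class Stack(ABC):
    """Hypothetical ABC, used only for this illustration."""


# list becomes a virtual subclass: issubclass(list, Stack) is now True.
Stack.register(list)

_registry = []  # (class, function) pairs; hypothetical, not a real API


def overload(cls):
    """Register the decorated function as the implementation for cls."""
    def decorator(func):
        _registry.append((cls, func))
        return func
    return decorator


def describe(obj):
    """Call the most specific registered implementation for type(obj)."""
    candidates = [(c, f) for c, f in _registry if issubclass(type(obj), c)]
    for cls, func in candidates:
        # Most specific: a subclass of every other applicable class.
        if all(issubclass(cls, other) for other, _ in candidates):
            return func(obj)
    raise TypeError("no unambiguous overload for %s" % type(obj).__name__)


@overload(object)
def _describe_object(obj):
    return "some object"


@overload(Stack)
def _describe_stack(obj):
    return "a stack-like object"


list_result = describe([1, 2, 3])
int_result = describe(42)
```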
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Fri Jul 20 18:21:56 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Fri, 20 Jul 2007 12:21:56 -0400 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <ca471dc20707181336n294fc353vc4eefc82854a8759@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <acd65fa20706231124q4e5d5192kdc5694d52175e660@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> <acd65fa20707181332n480bf6fsa7bff17403770786@mail.gmail.com> <ca471dc20707181336n294fc353vc4eefc82854a8759@mail.gmail.com> Message-ID: <acd65fa20707200921r55a12be8n908de344e4327607@mail.gmail.com> How this different from setting the position to the new size? What should happen when someone call truncate() with an argument greater than the current size? Should it do a seek, or nothing? Thanks, -- Alexandre On 7/18/07, Guido van Rossum <guido at python.org> wrote: > Unless anyone cares, it should imply a seek to the indicated position > if an argument was present. > > On 7/18/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > > So, any decision on the proposed semantic change of truncate? > > > > On 7/3/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > > > On 7/2/07, Guido van Rossum <guido at python.org> wrote: > > > > Honestly, I think truncate() should always set the current position to > > > > the new size, even though that's not what it currently does. > > > > > > Thought about that and I think that would be the best thing to do. 
> > > That would avoid making StringIO unnecessary different from BytesIO. > > > And IMHO, it is less prone to bugs. If someone wants to truncate while > > > keeping the current position, then he will have to state is intention > > > explicitly by saving the value of tell() and calling seek() after > > > truncating. > > > > > > I also find the semantic make more sense too. For example: > > > > > > >>> s = StringIO("Good bye, world") > > > >>> s.truncate(10) > > > >>> s.write("cruel world") > > > >>> s.getvalue() > > > ??? > > > > > > I think that should return "Good bye, cruel world", not "cruel world". > > > > > > So, does anyone else agree with this small semantic change of truncate()? > > > From guido at python.org Fri Jul 20 18:51:08 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 09:51:08 -0700 Subject: [Python-3000] StringIO/BytesIO in io.py doesn't over-seek properly In-Reply-To: <acd65fa20707200921r55a12be8n908de344e4327607@mail.gmail.com> References: <acd65fa20706230853w32f8895g91b7715c456900b7@mail.gmail.com> <ca471dc20706231148p7cbb9953tb31099dfe68c9a32@mail.gmail.com> <acd65fa20706251114u60bae701ve95a84ffee27e0b2@mail.gmail.com> <acd65fa20706280737n54b8dea8l5362b8545c990236@mail.gmail.com> <acd65fa20707021046o4349aafdxd7b895f502edd32@mail.gmail.com> <ca471dc20707021138o3392bc11u9a9be3f1a6f4dda1@mail.gmail.com> <acd65fa20707030806i60b0e77dm71b394f279e2172c@mail.gmail.com> <acd65fa20707181332n480bf6fsa7bff17403770786@mail.gmail.com> <ca471dc20707181336n294fc353vc4eefc82854a8759@mail.gmail.com> <acd65fa20707200921r55a12be8n908de344e4327607@mail.gmail.com> Message-ID: <ca471dc20707200951s9989585pfd1fe19d43f6beec@mail.gmail.com> They shouldn't, really, and I don't care too much about what happens in that case. It may depend on whether the I/O device honors seeks beyond EOF or not. On 7/20/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > How this different from setting the position to the new size? 
What > should happen when someone call truncate() with an argument greater > than the current size? Should it do a seek, or nothing? > > Thanks, > -- Alexandre > > On 7/18/07, Guido van Rossum <guido at python.org> wrote: > > Unless anyone cares, it should imply a seek to the indicated position > > if an argument was present. > > > > On 7/18/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > > > So, any decision on the proposed semantic change of truncate? > > > > > > On 7/3/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote: > > > > On 7/2/07, Guido van Rossum <guido at python.org> wrote: > > > > > Honestly, I think truncate() should always set the current position to > > > > > the new size, even though that's not what it currently does. > > > > > > > > Thought about that and I think that would be the best thing to do. > > > > That would avoid making StringIO unnecessary different from BytesIO. > > > > And IMHO, it is less prone to bugs. If someone wants to truncate while > > > > keeping the current position, then he will have to state is intention > > > > explicitly by saving the value of tell() and calling seek() after > > > > truncating. > > > > > > > > I also find the semantic make more sense too. For example: > > > > > > > > >>> s = StringIO("Good bye, world") > > > > >>> s.truncate(10) > > > > >>> s.write("cruel world") > > > > >>> s.getvalue() > > > > ??? > > > > > > > > I think that should return "Good bye, cruel world", not "cruel world". > > > > > > > > So, does anyone else agree with this small semantic change of truncate()? 
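
The quoted example can be tried with `io.StringIO`. This is a sketch that assumes the behaviour documented for the `io` module in current Python 3 releases, where truncate() leaves the stream position unchanged; the helper name `truncate_then_write` is made up for the example. It contrasts writing right after truncate() with seeking to the new end first.

```python
import io


def truncate_then_write(seek_to_end):
    """Truncate to 10 characters, optionally seek there, then write."""
    stream = io.StringIO("Good bye, world")  # position starts at 0
    stream.truncate(10)                      # contents: "Good bye, "
    if seek_to_end:
        stream.seek(10)                      # move to the new size explicitly
    stream.write("cruel world")
    return stream.getvalue()


without_seek = truncate_then_write(False)  # write starts at position 0
with_seek = truncate_then_write(True)      # write starts at the new end
```

Without the seek, the write overwrites the truncated text from position 0; with it, the text is appended.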
> > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Fri Jul 20 19:15:51 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Fri, 20 Jul 2007 13:15:51 -0400 Subject: [Python-3000] pep 3124 plans References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com><20070713173936.53C213A404D@sparrow.telecommunity.com><f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> Message-ID: <f7qqka$igc$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707200749p4ed42134h453c7535c98cc73d at mail.gmail.com... > On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >> So the state of the PEP? From the rest of the posts so far, >> it sounds like there is no real objection to the basic end user API as >> described in the PEP, > > Actually I want to reserve judgment on that until the PEP is rewritten > to explain and document the underlying mechanisms. It is currently > impossible (for me, anyway) to understand how the machinery to support > the described features could be built. Without that I cannot approve > the PEP. Phillip knows this but is too busy to work on it. > Fair enough. However, You see nothing terribly broken with the end user side of the PEP, assuming the underlining machinery can be built in a reasonable way, correct? >> except for the case of retroactive generification, which GvR wants made >> explict in the user's code, AIUI. >> >> But there are concerns about the implementation. Overiding inside classes >> would need a new implementation, but at the moment your not sure how to >> implement that. Also your current bootstrapping system requires in-place >> modifing of some functions. You think using a third type of function >> could >> perhaps fix that if no cleaner solution appears, correct? >> >> Also what has happened with the Interfaces/Adpatation/Aspects part of the >> document? 
How does that mesh with the ABC's?
>> After all adaptable interfaces and ABCs have such similar use cases users
>> may not be sure which to use.
>> Or has that part been defered for now, as the GF and method combination
>> part
>> is not dependent on those?
>
> AFAIK Phillip has declared that his implementation only uses (or could
> be made to only use) isinstance()/issubclass(), and the overriding of
> these two used by the ABCs is actually very convenient for the GF PEP.
>

Ok, but what about the potential for confusion between @abc.abstractmethod and @overloading.abstract?

They are similar, but the ABC one appears to block instantiation of a class that contains (or whose ancestors contain) an abstractmethod that has not been overridden through inheritance. On the other hand, the interfaces in PEP 3124 work quite differently. Implementations of the abstract functions can be provided by GFs. As such, an interface can be used even if there are no classes implementing it.

Yet despite those differences, the common use cases for interfaces seem pretty much identical to the common use cases of ABCs, which I fear will be a problem, as the end user may not be able to easily decide which to use. (My personal thoughts would be to use ABCs normally, and use the PEP 3124 interfaces only as adapters.)
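
A minimal sketch of the @abc.abstractmethod behaviour described above, written against the `abc` module as it exists in current Python 3. The `Stack`, `HalfStack` and `ListStack` classes are hypothetical. It only shows the ABC side; @overloading.abstract from PEP 3124 is not shown, since it has no standard implementation to test against.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Hypothetical ABC with two abstract methods."""

    @abstractmethod
    def push(self, item):
        ...

    @abstractmethod
    def pop(self):
        ...


class HalfStack(Stack):
    """Overrides only push(), so pop() is still abstract."""

    def push(self, item):
        pass


class ListStack(Stack):
    """Overrides both abstract methods, so it can be instantiated."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True


results = {cls.__name__: can_instantiate(cls)
           for cls in (Stack, HalfStack, ListStack)}
```

Only the class that overrides every abstract method can be instantiated; the check happens at instantiation time, not when the class is defined.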
From guido at python.org Fri Jul 20 19:30:41 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 10:30:41 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <f7qqka$igc$1@sea.gmane.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <f7qqka$igc$1@sea.gmane.org> Message-ID: <ca471dc20707201030s49a02240veab2c125f75ab68d@mail.gmail.com> On 7/20/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > "Guido van Rossum" <guido at python.org> wrote in message > news:ca471dc20707200749p4ed42134h453c7535c98cc73d at mail.gmail.com... > > On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > >> So the state of the PEP? From the rest of the posts so far, > >> it sounds like there is no real objection to the basic end user API as > >> described in the PEP, > > > > Actually I want to reserve judgment on that until the PEP is rewritten > > to explain and document the underlying mechanisms. It is currently > > impossible (for me, anyway) to understand how the machinery to support > > the described features could be built. Without that I cannot approve > > the PEP. Phillip knows this but is too busy to work on it. > > Fair enough. However, You see nothing terribly broken with the end user side > of the PEP, > assuming the underlining machinery can be built in a reasonable way, > correct? Not at all true. How can I be in agreement with an incomplete PEP? I don't want to reject the PEP only because it's incomplete, but a good understanding of the interaction between the simple end-user API and machinery is essential for acceptance. > >> except for the case of retroactive generification, which GvR wants made > >> explict in the user's code, AIUI. > >> > >> But there are concerns about the implementation. 
Overiding inside classes > >> would need a new implementation, but at the moment your not sure how to > >> implement that. Also your current bootstrapping system requires in-place > >> modifing of some functions. You think using a third type of function > >> could > >> perhaps fix that if no cleaner solution appears, correct? > >> > >> Also what has happened with the Interfaces/Adpatation/Aspects part of the > >> document? How does that mesh with the ABC's? > >> After all adaptable interfaces and ABCs have such similar use cases users > >> may not be sure which to use. > >> Or has that part been defered for now, as the GF and method combination > >> part > >> is not dependent on those? > > > > AFAIK Phillip has declared that his implementation only uses (or could > > be made to only use) isinstance()/issubclass(), and the overriding of > > these two used by the ABCs is actually very convenient for the GF PEP. > > > > Ok, but what about the potential for confusion between @abc.abstractmethod > and @overloading.abstract? > They are similar, but the ABC's one appears to block instantiation of a > class that contains (or whoses ancestors contain) an abstractmethod that has > not been overrideen by inheritance. On the other hand the interfaces in PEP > 3124 work quite differently. Implementations of the abstract functions can > be provided by GFs. As such, an interface can be used even if there are no > classes implementing it. You're right, there are conflicting ideas here. A quick read of the "Interfaces and Adaptation" section doesn't make me think that I'd like to use it instead of PEP 3119 though; the mechanism is more powerful (it lets you convert a list to an IStack whose pop method calls the list's append method) but also more verbose (you have to make declarations about each individual method). 
> Yet despite those differences, the common use cases for interfaces seem > pretty much identical to the common use cases of ABCs, which I fear will be > a problem, as the end user may not be able to easily decide which to use. > (My personal thoughts would be to use ABCs normally, and use the PEP 3124 > interfaces only as adapters.) Agreed. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Fri Jul 20 19:45:54 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 20 Jul 2007 13:45:54 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> Message-ID: <20070720174706.AE5773A40A8@sparrow.telecommunity.com> At 07:49 AM 7/20/2007 -0700, Guido van Rossum wrote: >On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > > So the state of the PEP? From the rest of the posts so far, > > it sounds like there is no real objection to the basic end user API as > > described in the PEP, > >Actually I want to reserve judgment on that until the PEP is rewritten >to explain and document the underlying mechanisms. It is currently >impossible (for me, anyway) to understand how the machinery to support >the described features could be built. Without that I cannot approve >the PEP. Phillip knows this but is too busy to work on it. 
Actually, I was under the impression you didn't want the API described in the PEP, and wanted the following changes in addition to dropping method combination, aspects, and interfaces:

* :next_method as a keyword-only argument

* @somegeneric.overload as the standard decorator (w/no @overload or @when)

* advance declaration of a function as overloadable (which is also required by the previous change and by your preference not to modify functions in-place)

Also, I didn't know you wanted an explanation of how the underlying mechanisms work in general. I thought the only piece you were looking for more explanation of was the method combination machinery -- which would be moot if we're scaling back the API as described by the above.

Just to be sure I'm clear as to what you want, is that the only mechanism you're unclear on, or is the whole thing unclear?

The whole thing was inspired by your overloading prototype, I've just made all the concrete bits of it more... "generic".

That is, instead of using issubclass or other explicit relationship tests between overload signatures, I use a generic function implies().

Instead of simply storing a method added as an overload, I use a "combine_actions()" generic function to combine it with any method that's already there (possibly including a method type for "No Method Found").

Instead of simply finding the most-specific matching signature on cache misses, I use combine_actions() to combine *all applicable* actions (i.e., all those that the calling signature implies()).

The combine_actions() function uses another generic function, overrides(), to compare method priorities. overrides() is defined so that Around beats Before beats After beats regular methods beats no method found. The overrides() of two methods of the same type is determined by which signature implies() the other, without also being implied *by* the other.
If there is no overrides() order between two methods, you get an AmbiguousMethod combining the two -- which can be overridden by any method whose signature implies() everything in the AmbiguousMethod. All this is pretty much the same as in your prototype, except that it's done by adding these rules to the generic functions, rather than by hardcoding them. That's why it's bigger than your prototype, but also why it's extensible in terms of adding new method types or ways to specify signatures. I then also added the ability to attach different dispatchers to a function, so that you could replace the simple "tuple of types" matching with more sophisticated engines like RuleDispatch's, while still retaining the ability to use the same method combinations and existing overloads registered for a function. That is, it lets you keep the same API for defining overloads and method combinations as the basic implementation, while allowing the actual overload targets and dispatching mechanisms to vary. That's pretty much it except for Aspects and Interfaces. I've ended up making my Aspect implementation available separately in the ObjectRoles cheeseshop package, renaming them Roles instead of Aspects. (And yes, I will add all the above explanation to the PEP.) >AFAIK Phillip has declared that his implementation only uses (or could >be made to only use) isinstance()/issubclass(), and the overriding of >these two used by the ABCs is actually very convenient for the GF PEP. Yep. The overload of "implies(c1:type, c2:type)" is "issubclass". "isinstance()" isn't used, since that would render your type-tuple caching strategy unusable. From guido at python.org Fri Jul 20 19:52:14 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 10:52:14 -0700 Subject: [Python-3000] uuid creation not thread-safe? Message-ID: <ca471dc20707201052p68883fc5l3efd8ecc5cfd497f@mail.gmail.com> I discovered what appears to be a thread-unsafety in uuid.py. 
This is in the trunk as well as in 3.x; I'm using the trunk here for easy reference. There's some code around line 395:

    import ctypes, ctypes.util
    _buffer = ctypes.create_string_buffer(16)

This creates a *global* buffer which is used as the output parameter to later calls to _uuid_generate_random() and _uuid_generate_time(). For example, around line 481, in uuid1():

    _uuid_generate_time(_buffer)
    return UUID(bytes=_buffer.raw)

Clearly if two threads do this simultaneously they are overwriting _buffer in unpredictable order. There are a few other occurrences of this too.

I find it somewhat disturbing that what seems a fairly innocent function that doesn't *appear* to have global state is nevertheless not thread-safe. Would it be wise to fix this, e.g. by allocating a fresh output buffer inside uuid1() and other callers?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From bwinton at latte.ca Fri Jul 20 20:07:36 2007
From: bwinton at latte.ca (Blake Winton)
Date: Fri, 20 Jul 2007 14:07:36 -0400
Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes)
In-Reply-To: <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com>
References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com>
 <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com>
 <20070720010804.85A7.JCARLSON@uci.edu>
 <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com>
Message-ID: <46A0F9E8.8010404@latte.ca>

Guido van Rossum wrote:
>> While the pickle/cPickle, StringIO/cStringIO, etc., naming can be
>> a bit annoying, it does give me the choice whether I want it to be
>> fast or flexible.
> I definitely *don't* want to continue the old habit of having a slow
> and a fast module with different names; the experience with especially
> cPickle and cStringIO is that everyone believes their code is
> performance critical and hence uses the C version if it exists,
> thereby repeating the same idiom over and over.
Until they need to turn Unicode strings into file-like objects, at which point they go back to StringIO. (Why yes, I was recently bitten by that particular "restriction". :) Later, Blake. From guido at python.org Fri Jul 20 20:25:40 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 20 Jul 2007 11:25:40 -0700 Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes) In-Reply-To: <46A0F9E8.8010404@latte.ca> References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com> <46A0F9E8.8010404@latte.ca> Message-ID: <ca471dc20707201125j3b391fdy67be2a44e0bb4ef1@mail.gmail.com> On 7/20/07, Blake Winton <bwinton at latte.ca> wrote: > Guido van Rossum wrote: > >> While the pickle/cPickle, StringIO/cStringIO, etc., naming can be > >> a bit annoying, it does give me the choice whether I want it to be > >> fast or flexible. > > I definitely *don't* want to continue the old habit of having a slow > > and a fast module with different names; the experience with especially > > cPickle and cStringIO is that everyone believes their code is > > performance critical and hence uses the C version if it exists, > > thereby repeating the same idiom over and over. > > Until they need to turn Unicode strings into file-like objects, at which > point they go back to StringIO. (Why yes, I was recently bitten by that > particular "restriction". :) Py3k will have separate BytesIO and StringIO classes (both in the io module). The accelerations, if any, will be transparent. Subclasses or usage depending on implementation details however are not supported. 
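
A short sketch of the split described above, using the `io` module as it exists in current Python 3; the variable names are chosen for this example. `io.StringIO` holds text and `io.BytesIO` holds bytes, and each rejects the other type.

```python
import io

text = "caf\u00e9"                              # 4 characters
text_stream = io.StringIO(text)                 # in-memory text stream
byte_stream = io.BytesIO(text.encode("utf-8"))  # in-memory binary stream

chars_read = text_stream.read()   # a str
bytes_read = byte_stream.read()   # a bytes object, 5 bytes in UTF-8


def rejects(stream, data):
    """Return True if writing data to stream raises TypeError."""
    try:
        stream.write(data)
    except TypeError:
        return True
    return False


bytes_into_text = rejects(io.StringIO(), b"raw bytes")
text_into_bytes = rejects(io.BytesIO(), "some text")
```

There is no separate "fast" class to choose here; the same two names are used regardless of how they are implemented underneath.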
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Sat Jul 21 02:17:12 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 20 Jul 2007 17:17:12 -0700
Subject: [Python-3000] Need help fixing tests in str/unicode branch
Message-ID: <ca471dc20707201717x457f07d2pd841608db5168c2d@mail.gmail.com>

Thanks to all who helped fixing tests in the str/unicode branch! We're down to about 35 failing tests. I still need help -- especially since we're now getting into territory that I don't know all that well, for example the email package or XML support.

The list of unit tests that need help is still on the wiki:
http://wiki.python.org/moin/Py3kStrUniTests

Instructions on how to help and how to avoid duplicate work are also there. Please help!

Thanks to all those who already fixed one or more tests!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz Sat Jul 21 03:57:23 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 21 Jul 2007 13:57:23 +1200
Subject: [Python-3000] PEP 368: Standard image protocol and class
In-Reply-To: <f7pk8b$evu$1@sea.gmane.org>
References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com>
 <cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com>
 <740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com>
 <cc93256f0707010959o44c77912sb989c68cf890b846@mail.gmail.com>
 <f7pk8b$evu$1@sea.gmane.org>
Message-ID: <46A16803.1020200@canterbury.ac.nz>

Joe Smith wrote:
> If the maintainers of most of the large packages that do imaging are willing
> to support this,
> and your code is good, I see absolutely no reason why this PEP would not be
> accepted.

Something that bothers me about it a little is that the core Python/C API seems like the wrong place to put PyImage_* functions.
-- Greg From greg.ewing at canterbury.ac.nz Sat Jul 21 04:01:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 21 Jul 2007 14:01:42 +1200 Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes) In-Reply-To: <20070720010804.85A7.JCARLSON@uci.edu> References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> Message-ID: <46A16906.7010005@canterbury.ac.nz> Josiah Carlson wrote: > What made it really annoying is that there was no way to tell the heapq > module not to load the C version so that I could use a generic container. I would say that all such dual-implementation modules should make the specific implementations available under different names, using some convention such as _c_heapq/_p_heapq. -- Greg From joe at bitworking.org Sat Jul 21 06:12:51 2007 From: joe at bitworking.org (Joe Gregorio) Date: Sat, 21 Jul 2007 00:12:51 -0400 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) Message-ID: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on both text and binary streams? If it should operate on text streams then an issue arises from "read(n)" meaning different things for text and binary streams. If the stream passed in is "text" then read(n) will read 'n' unicode characters, but pyexpat.c allocates a buffer of 2048 bytes and calls read(2048) which could obviously return more than 2048 bytes. The simplest solution in the case of a text stream is to be safe and convert that into read(2048/4) to accommodate the worst case scenario. Has this come up before and is there a better solution? Thanks, -joe On 7/20/07, Guido van Rossum <guido at python.org> wrote: > Thanks to all who helped fixing tests in the str/unicode branch! We're > down to about 35 failing tests. 
I still need help -- especially since > we're now getting into territory that I don't know all that well, for > example the email package or XML support. > > The list of unit tests that need help is still on the wiki: > http://wiki.python.org/moin/Py3kStrUniTests > > Instructions on how to help and how to avoid duplicate work are also > there. Please help! > > Thanks to all those who already fixed one or more tests! > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/joe%40bitworking.org > -- Joe Gregorio http://bitworking.org From fdrake at acm.org Sat Jul 21 06:25:10 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 21 Jul 2007 00:25:10 -0400 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> Message-ID: <200707210025.11031.fdrake@acm.org> On Saturday 21 July 2007, Joe Gregorio wrote: > Should xml.parsers.expat.XMLParser.ParseFile(file) operate on > both text and binary streams? No. XML is a serialization of a markup language containing Unicode character into an encoded stream. -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> From talin at acm.org Sat Jul 21 07:55:24 2007 From: talin at acm.org (Talin) Date: Fri, 20 Jul 2007 22:55:24 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070720174706.AE5773A40A8@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> Message-ID: <46A19FCC.7070609@acm.org> Phillip J. Eby wrote: > At 07:49 AM 7/20/2007 -0700, Guido van Rossum wrote: >> On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >>> So the state of the PEP? From the rest of the posts so far, >>> it sounds like there is no real objection to the basic end user API as >>> described in the PEP, >> Actually I want to reserve judgment on that until the PEP is rewritten >> to explain and document the underlying mechanisms. It is currently >> impossible (for me, anyway) to understand how the machinery to support >> the described features could be built. Without that I cannot approve >> the PEP. Phillip knows this but is too busy to work on it. > > Actually, I was under the impression you didn't want the API > described in the PEP, and wanted the following changes in addition to > dropping method combination, aspects, and interfaces: I'd like to clarify these requirements a little bit: On the issue of method combination, aspects, and interfaces: Guido has not made a pronouncement on whether these things may or may not be accepted at some time in the future. What he has said is that he doesn't *yet* understand the use case for them, and that these should be separate PEPs so that we can argue their merits independently. What he's strongly against (if my understanding is correct) is a "package deal" where he is forced to accept all of the features, or none. 
I get the sense that the need for some of these advanced features becomes apparent only after having worked with generics for a while. If that's the case, then the best hope for including them in the stdlib is to get an implementation of generics into the hands of lots of Python programmers so that they can become familiar with them. > * :next_method as a keyword-only argument > > * @somegeneric.overload as the standard decorator (w/no @overload or @when) You mentioned earlier that there was a design reason for preferring @overload and @when vs. the earlier RuleDispatch syntax, but the explanation you gave wasn't very clear (to me anyway). (I personally prefer the @somegeneric.overload, but that's purely an aesthetic value judgement - if there's a strong architectural advantage of the other syntax, I'd like to hear it.) > * advance declaration of a function as overloadable (which is also > required by the previous change and by your preference not to modify > functions in-place) Right. There are two reasons that I think that post-hoc overloading runs into problems. The first, as you mentioned, is that it's difficult to implement without some kind of trickery. The second reason - this is my opinion - is that it too much resembles the mythical "comefrom" statement (the opposite of "goto"). The "comefrom" statement is intended to be a joke - the worst possible language feature from the standpoint of being able to manually trace the flow of execution of a program. I do think that there are use cases for being able to 'decorate' (in the broader sense) the execution of a function, in an aspect-like way; But I also think that such power should not be used casually, and places where its used should stick out in a way that makes them visually obvious and searchable. > Also, I didn't know you wanted an explanation of how the underlying > mechanisms work in general. 
I thought the only piece you were > looking for more explanation of was the method combination machinery > -- which would be moot if we're scaling back the API as described by the above. > > Just to be sure I'm clear as to what you want, is that the only > mechanism you're unclear on, or is the whole thing unclear? The > whole thing was inspired by your overloading prototype, I've just > made all the concrete bits of it more... "generic". It seems to me that PEPs should only be required to explain their mechanisms if there's some doubt or controversy about the implementation. It seems to me that this PEP pushes the bounds of what is efficiently doable, so some extra explanation is required. One issue that hasn't been satisfactorily resolved is the handling of the 'self' parameter. At least, let me give my explanation of what I think the issue is and see if we're on the same page: Overloading a class method requires special treatment of the 'self' parameter because there's an implicit constraint on what types of objects can be passed as 'self': for any method defined in any class, the 'self' parameter must be an instance of the class (or a subclass) in which the method is defined. Now, this would be trivial if we required the programmer to explicitly declare the type of 'self', but this violates DRY and has the potential to cause mischief if the programmer forgets to update the method signature when they change the class. In order to avoid this syntactical redundancy, there is a desire to be able to automatically detect the type of the class in which the overload is declared. This is hard to do, because the "overload" machinery is handled by a function decorator, which runs before the class is actually constructed. Various methods for deducing the class have been proposed, but they have all so far been somewhat problematic, especially in light of "new-style" metaclasses. I can think of only two approaches for solving this cleanly. 
The first is that the overload decorator should be given some C-code help. Now, I recognize that part of your goal was to make the initial prototype a "pure Python" implementation in order to make life easier for Jython/IronPython and friends. That is certainly laudable. However, if the C-code help is a relatively small function that can be reimplemented for the other interpreters, then the impact on portability will be small. The other approach is to somehow defer the work until after the class is fully constructed. The question then is when will the work be done - in other words, where should the decorator hook its fixup callback? Even assuming we had some sort of hook that would be triggered when a class has finished construction, then the question is what about non-member generic functions? Since they are not contained in a class body, this hypothetical hook will never be called, and thus the methods won't be "finished". (A way around this would be to say that the only thing that the class-construction hook does is to add the additional type information for 'self', and the method is otherwise finished and ready to go as soon as the decorator is completed.) If it turns out that there's no way to get a callback when the class has finished being built, then we may have to defer finishing the construction until the first time the generic function is called. This wouldn't be too bad, considering that there's a bunch of other stuff that is lazily calculated on first call anyway, from what I understand. > That is, instead of using issubclass or other explicit relationship > tests between overload signatures, I use a generic function > implies(). Instead of simply storing a method added as an overload, > I use a "combine_actions()" generic function to combine it with any > method that's already there (possibly including a method type for "No > Method Found"). 
Instead of simply finding the most-specific matching > signature on cache misses, I use combine_actions() to combine *all > applicable* actions (i.e., all those that the calling signature implies()). > > The combine_actions() function uses another generic function, > overrides(), to compare method priorities. overrides() is defined so > that Around beats Before beats After beats regular methods beats no > method found. The overrides() of two methods of the same type is > determined by which signature implies() the other, without also being > implied *by* the other. > > If there is no overrides() order between two methods, you get an > AmbiguousMethod combining the two -- which can be overridden by any > method whose signature implies() everything in the AmbiguousMethod. > > All this is pretty much the same as in your prototype, except that > it's done by adding these rules to the generic functions, rather than > by hardcoding them. That's why it's bigger than your prototype, but > also why it's extensible in terms of adding new method types or ways > to specify signatures. > > I then also added the ability to attach different dispatchers to a > function, so that you could replace the simple "tuple of types" > matching with more sophisticated engines like RuleDispatch's, while > still retaining the ability to use the same method combinations and > existing overloads registered for a function. > > That is, it lets you keep the same API for defining overloads and > method combinations as the basic implementation, while allowing the > actual overload targets and dispatching mechanisms to vary. > > That's pretty much it except for Aspects and Interfaces. I've ended > up making my Aspect implementation available separately in the > ObjectRoles cheeseshop package, renaming them Roles instead of Aspects. > > (And yes, I will add all the above explanation to the PEP.) 
> > >> AFAIK Phillip has declared that his implementation only uses (or could >> be made to only use) isinstance()/issubclass(), and the overriding of >> these two used by the ABCs is actually very convenient for the GF PEP. > > Yep. The overload of "implies(c1:type, c2:type)" is > "issubclass". "isinstance()" isn't used, since that would render > your type-tuple caching strategy unusable. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org > From unknown_kev_cat at hotmail.com Sat Jul 21 09:20:47 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Sat, 21 Jul 2007 03:20:47 -0400 Subject: [Python-3000] pep 3124 plans References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com><20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> Message-ID: <f7sc4k$bht$1@sea.gmane.org> "Talin" <talin at acm.org> wrote in message news:46A19FCC.7070609 at acm.org... > Phillip J. Eby wrote: >> At 07:49 AM 7/20/2007 -0700, Guido van Rossum wrote: >>> On 7/19/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >>>> So the state of the PEP? From the rest of the posts so far, >>>> it sounds like there is no real objection to the basic end user API as >>>> described in the PEP, >>> Actually I want to reserve judgment on that until the PEP is rewritten >>> to explain and document the underlying mechanisms. It is currently >>> impossible (for me, anyway) to understand how the machinery to support >>> the described features could be built. Without that I cannot approve >>> the PEP. Phillip knows this but is too busy to work on it. 
>>
>> Actually, I was under the impression you didn't want the API
>> described in the PEP, and wanted the following changes in addition to
>> dropping method combination, aspects, and interfaces:
>
> I'd like to clarify these requirements a little bit:
>
> On the issue of method combination, aspects, and interfaces: Guido has
> not made a pronouncement on whether these things may or may not be
> accepted at some time in the future. What he has said is that he doesn't
> *yet* understand the use case for them, and that these should be
> separate PEPs so that we can argue their merits independently. What he's
> strongly against (if my understanding is correct) is a "package deal"
> where he is forced to accept all of the features, or none.
>
> I get the sense that the need for some of these advanced features
> becomes apparent only after having worked with generics for a while. If
> that's the case, then the best hope for including them in the stdlib is
> to get an implementation of generics into the hands of lots of Python
> programmers so that they can become familiar with them.
>

Well, perhaps I can explain a few things.

First of all, it is important to note that generic functions don't do much that cannot already be done, but sometimes using generic functions can make things easier to read and maintain.

For the purposes of talking about this, we will consider a simple function of one argument.

The most basic type of generic function dispatch is one that dispatches based on object type. Now clearly, one could achieve the same basic effect by doing type-checking in the body and putting what would be the contents of the generic function inside the body of an if or switch statement. But let's say there are 15 possible types, each of which needs to be handled differently. In that case, something like generic functions makes the code far more readable.

One of the nice features of Eby's proposal is that more complicated dispatching systems can be added.
Perhaps some application needs a dispatching engine that can dispatch based on the value of an object's member. Perhaps the user wants an overload specifically for any product object whose price property equals 0. With Eby's system, adding a dispatch engine that supports that is not difficult.

But realize that generic functions are a type of method combination. Basically the alternatives are combined together. Sure, they remain separate functions in Python's memory, but to a caller, it looks like a single method. Since some of the support framework will be the same for both, it seems logical to propose a full method combination system at the same time.

What are the use cases for method combination? Well, let's say you are using a third-party library. One of the functions you want to use works OK, except that when it operates on some specific type of object (one of your own design, perhaps) it does not clean up properly for that object, because it was not aware of the specifics of that type of object. Perhaps it leaves a file handle open. One could use an after method to perform the cleanup. Now, one may argue that you could also just replace the function with a wrapper function that calls the original and then does the cleanup. However, what if more than one such wrapper were needed? What if there were many? Then it would be nice to be able to use a mechanism not unlike the generic function system that could keep track of all of them and combine them.

Before methods are useful for things like adding extra bounds checking to an existing function.

For what it's worth, I've worked with a system that had something related to the before and after methods, and found it worked well.

As you can hopefully see by now, the name of the game is to combine code from different places, perhaps written by different people, and present it to the user as one cohesive method. That is what generic functions do. That is what method combination does.
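The before/after idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the API of PEP 3124 or RuleDispatch: the `combinable` decorator and the `process`, `check_argument` and `clean_up` functions are hypothetical names invented here, the wrapped function must be declared combinable in advance, and the hooks run for every call rather than being selected by argument type.

```python
import functools


def combinable(func):
    """Wrap func so that 'before' and 'after' hooks can be attached later.

    Hypothetical helper for illustration; not part of any real library.
    """
    before_hooks = []
    after_hooks = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Run every registered 'before' hook, then the primary function,
        # then every registered 'after' hook, all with the same arguments.
        for hook in before_hooks:
            hook(*args, **kwargs)
        result = func(*args, **kwargs)
        for hook in after_hooks:
            hook(*args, **kwargs)
        return result

    def before(hook):
        before_hooks.append(hook)
        return hook

    def after(hook):
        after_hooks.append(hook)
        return hook

    wrapper.before = before
    wrapper.after = after
    return wrapper


# Stand-in for a function supplied by a third-party library.
@combinable
def process(resource, log):
    log.append("process %s" % resource)
    return resource.upper()


# Extra argument checking added from outside the "library".
@process.before
def check_argument(resource, log):
    if not resource:
        raise ValueError("empty resource name")
    log.append("checked")


# Cleanup added from outside the "library".
@process.after
def clean_up(resource, log):
    log.append("cleaned up %s" % resource)


log = []
result = process("handle", log)
```

Any number of hooks can be registered from different modules, and the caller still sees a single `process` function, which is the property the paragraph above is arguing for.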
It seems to me to be a good idea to implement them together to ensure they work together properly.

The effects of this can be wonderful. A package could convert some of a framework's functions to generics, to allow them to handle the new objects the package provides. It might also need to add some before and after methods to ensure that, to the user of the module, it looks like the framework was designed to support the module in question when, in fact, it was not. The idea is that the package can make the needed changes so that everything just works, all without having to duplicate any code from the framework itself. See the benefits? (The framework mentioned could be a major framework like Zope, just an average package, or even a simple module.)

Now on to the interfaces/adaptation part of the PEP. I would rather see that system primarily used as an adaptation system. It seems very well designed for that purpose. To me, interfaces are a way for a class to tell other code that it has a certain set of properties and methods which act in a specific fashion. While Eby's proposal can do that, ABCs seem like a nicer way to do that in my opinion. However, an adaptor provides a means to use a single interface to interact with objects that provide similar functionality but natively have different interfaces. Eby's described system sounds ideal for that purpose. That said, I think it can be reasonably spun off into a separate PEP. It is very much dependent on an implementation of generic functions, but AFAICT the rest of the PEP does not depend on it.

Please feel free to correct me if I made any mistakes in the above analysis.

From martin at v.loewis.de Sat Jul 21 11:38:07 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 21 Jul 2007 11:38:07 +0200
Subject: [Python-3000] _heapq.c, etc.
(was Re: Heaptypes) In-Reply-To: <46A16906.7010005@canterbury.ac.nz> References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> <46A16906.7010005@canterbury.ac.nz> Message-ID: <46A1D3FF.4020000@v.loewis.de> Greg Ewing schrieb: > Josiah Carlson wrote: >> What made it really annoying is that there was no way to tell the heapq >> module not to load the C version so that I could use a generic container. > > I would say that all such dual-implementation modules should > make the specific implementations available under different > names, using some convention such as _c_heapq/_p_heapq. You mean, like prefixing it with c, e.g. StringIO vs. cStringIO, pickle vs. cPickle? Regards, Martin From greg.ewing at canterbury.ac.nz Sat Jul 21 11:56:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 21 Jul 2007 21:56:10 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A19FCC.7070609@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> Message-ID: <46A1D83A.2080308@canterbury.ac.nz> Talin wrote: > Overloading a class method requires special treatment of the 'self' > parameter because there's an implicit constraint on what types of > objects can be passed as 'self' Hang on a minute. Is it really necessary for the GF machinery to concern itself with this? By the time you get to the (possibly overloaded) method object, dispatching on 'self' has already been done. So the GF machinery can just ignore 'self' and dispatch on the rest of the arguments -- can't it? 
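A minimal sketch of the point being made here, assuming `functools.singledispatchmethod` from later versions of the standard library (Python 3.8 and up) rather than anything defined in PEP 3124: ordinary attribute lookup selects the method for 'self', and the registry then dispatches only on the next argument. The `Renderer` and `LoudRenderer` classes are hypothetical names used for illustration.

```python
from functools import singledispatchmethod


class Renderer:
    @singledispatchmethod
    def render(self, value):
        # Fallback used when no more specific overload is registered.
        return "object:%r" % (value,)

    @render.register(int)
    def _(self, value):
        return "int:%d" % value

    @render.register(str)
    def _(self, value):
        return "str:" + value


class LoudRenderer(Renderer):
    # Normal method overriding handles 'self'; the dispatch registry in
    # the base class never needs to know about this subclass.
    def render(self, value):
        return super().render(value).upper()


plain = Renderer()
loud = LoudRenderer()
```

Here `plain.render(3)` and `plain.render("x")` reach different overloads based only on the type of `value`, while `loud.render("x")` is resolved through the class of 'self' before any generic dispatch happens.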
--
Greg

From greg.ewing at canterbury.ac.nz Sat Jul 21 12:03:56 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 21 Jul 2007 22:03:56 +1200
Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes)
In-Reply-To: <46A1D3FF.4020000@v.loewis.de>
References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> <46A16906.7010005@canterbury.ac.nz> <46A1D3FF.4020000@v.loewis.de>
Message-ID: <46A1DA0C.5010107@canterbury.ac.nz>

Martin v. Löwis wrote:
> You mean, like prefixing it with c, e.g. StringIO vs. cStringIO,
> pickle vs. cPickle?

Yes, but with an official scheme for deriving the names from the main package name, and also an understanding that these are implementation details to be used only when really necessary (hence the leading underscores).

Considering Guido's comment about people gratuitously using the C versions, perhaps only the Python version should be made available as an official alternative. It's unlikely that people will gratuitously choose what they perceive to be a *slower* version of the module. :-)

--
Greg

From dalke at dalkescientific.com Sat Jul 21 16:23:45 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 21 Jul 2007 16:23:45 +0200
Subject: [Python-3000] removing exception .args
Message-ID: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com>

Posting here an expansion of a short discussion I had after Guido's keynote at EuroPython. In this email I propose eliminating the ".args" attribute from the Exception type. It's not useful, and supporting it correctly is complicated enough that it's often not supported correctly.

In Python 2 the base Exception class works like this:

    >>> x = Exception("spam", "was", "here")
    >>> x[0]
    'spam'
    >>> x.args
    ('spam', 'was', 'here')
    >>>

In Py3K the [0] index lookup disappears. This is a good thing. Positional lookup like this is rarely useful.

The .args attribute remains. I don't see the need for it and propose that it be removed in Py3K.

Why? The "args" attribute is not useful. People making non-trivial Exception subclasses often forget to call __init__ on the parent exception, and using attribute lookups is much better than using an index lookup. That's the experience of the stat call.

Having support for a single object (almost always a string) passed into the exception is pragmatically useful, so I think the base exception class should look like

    class Exception(object):
        msg = None
        def __init__(self, msg):
            self.msg = msg
        def __str__(self):
            if self.msg is None:
                return "%s()" % (self.__class__.__name__,)
            else:
                return "%s(%r)" % (self.__class__.__name__, self.msg)

**

The rest of this email is here because I'm detail-oriented and want to present evidence to back up my assertion.

Because there are a number of subclasses which should but don't call the base __init__, generic error reporting software can't use the "args protocol" for anything. Pretty much the only thing a generic error report mechanism (like traceback and logging) can do is call str() on the exception.

Here are some examples to show that some exceptions in the standard library don't do a good job of calling the base class __init__.

(in HTMLParser.py)

    class HTMLParseError(Exception):
        """Exception raised for all parse errors."""

        def __init__(self, msg, position=(None, None)):
            assert msg
            self.msg = msg
            self.lineno = position[0]
            self.offset = position[1]

(in calendar.py)

    # Exceptions raised for bad input
    class IllegalMonthError(ValueError):
        def __init__(self, month):
            self.month = month
        def __str__(self):
            return "bad month number %r; must be 1-12" % self.month

(in doctest.py)

    class DocTestFailure(Exception):
        ...
        def __init__(self, test, example, got):
            self.test = test
            self.example = example
            self.got = got

        def __str__(self):
            return str(self.test)

Eyeballing the numbers, I think about 1/3rd of the standard library Exception subclasses with an __init__ forget to call the base class and forget to set .args and .msg.

For better readability and maintainability, complex exceptions with multiple parameters should make those parameters accessible via attributes, and not expect clients to reach into the args list by position. All three classes I just listed defined a new __init__ so that the parameters were available by name.

Here's an exception which does the right thing under Python2. By that I mean that it fully implements the exception API and it makes the parameters available as named attributes. It also protects against subclasses which forget to call GetoptError.__init__ by defining class attributes.

(from getopt.py)

    class GetoptError(Exception):
        opt = ''
        msg = ''
        def __init__(self, msg, opt=''):
            self.msg = msg
            self.opt = opt
            Exception.__init__(self, msg, opt)

        def __str__(self):
            return self.msg

This is correct, but cumbersome. Why should we encourage all non-trivial subclasses to look like this?

Historically there has been a problem with the existing ".args". The base class implementation of __str__ required that that attribute be present. This changed some time between 2.3 and 2.5. This change invalidated comments like this in httplib.py

    class HTTPException(Exception):
        # Subclasses that define an __init__ must call Exception.__init__
        # or define self.args. Otherwise, str() will fail.
        pass

which later on hacks around not calling __init__ by doing this

    class UnknownProtocol(HTTPException):
        def __init__(self, version):
            self.args = version,
            self.version = version

One last existing example to point out. urllib2.py uses

    class URLError(IOError):
        # URLError is a sub-type of IOError, but it doesn't share any of
        # the implementation. need to override __init__ and __str__.
        # It sets self.args for compatibility with other EnvironmentError
        # subclasses, but args doesn't have the typical format with errno in
        # slot 0 and strerror in slot 1. This may be better than nothing.
        def __init__(self, reason):
            self.args = reason,
            self.reason = reason

        def __str__(self):
            return '<urlopen error %s>' % self.reason

Again, a hack. This time a hack because EnvironmentError wants an errno and an error string.

    >>> EnvironmentError(2,"This is an error message","sp")
    EnvironmentError(2, 'This is an error message')
    >>> err = EnvironmentError(2,"This is an error message","sp")
    >>> err.errno
    2
    >>> err.strerror
    'This is an error message'
    >>> err.filename
    'sp'
    >>>

(Note the small bug; the filename is not shown in str(err).)

In closing, given an arbitrary exception, the only thing you can hope might work is str(exception). There's a decent chance that .args and even .msg aren't present. Generic exception handling code cannot expect those attributes to exist, and handlers for specific types should use named attributes rather than the less readable/less maintainable positional attributes.

Python3K is allowed to be non-backwards compatible. I propose getting rid of this useless feature.

Andrew
dalke at dalkescientific.com

From g.brandl at gmx.net Sat Jul 21 17:08:37 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 21 Jul 2007 17:08:37 +0200
Subject: [Python-3000] removing exception .args
In-Reply-To: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com>
References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com>
Message-ID: <f7t7hk$f3c$1@sea.gmane.org>

Andrew Dalke schrieb: > Posting here a expansion of a short discussion I had > after Guido's keynote at EuroPython. In this email > I propose eliminating the ".args" attribute from the > Exception type.
It's not useful, and supporting it > correctly is complicated enough that it's often not > supported correctly > > > > In Python 2 the base Exception class works like this > > >>> x = Exception("spam", "was", "here") > >>> x[0] > 'spam' > >>> x.args > ('spam', 'was', 'here') > >>> > > In Py3K the [0] index lookup disappears. This is a > good thing. Positional lookup like this is rarely useful. > > The .args attribute remains. I don't see the need for > it and propose that it be removed in Py3K. Hm, I always found it useful to just do class MyCustomError(Exception): pass and give it arbitrary arguments to it without writing __init__ method stuff that I can access from outside. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Sat Jul 21 17:16:12 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 21 Jul 2007 08:16:12 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <f7sc4k$bht$1@sea.gmane.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <f7sc4k$bht$1@sea.gmane.org> Message-ID: <ca471dc20707210816r4d663cdaqcef7e9f28c150a75@mail.gmail.com> On 7/21/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > One of the nice features of Eby's proposal is that more complicated > dispatching systems can be added. Perhaps some application needs a > dispatching engine that can dispatch based on the value of an objects > member. Perhaps the user wants an overload specificly for any product object > whose price property equals 0. 
With Eby's system adding a dispatch engine > that supports that is not difficult. This is true. However it comes at a cost. Whenever I see an API that takes a string which is then parsed by the called function as a Python expression (perhaps constrained to a subset of Python) I cringe, especially if the common use is to pass a literal. There are just so many issues with that... It's not colorized by the editor, it's not syntax-checked by either the editor or the Python parser, it requires one to build yet another parser... This is why I don't like the ...when("isinstance(obj, list)") syntax from (I think) RuleDispatch, and I'm glad it's not in the PEP. I'm unclear however on how you would do this otherwise -- is overloading implies() the best approach? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Jul 21 17:21:54 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 21 Jul 2007 08:21:54 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A19FCC.7070609@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> Message-ID: <ca471dc20707210821s160c88dy36c82e2184348afc@mail.gmail.com> On 7/20/07, Talin <talin at acm.org> wrote: > On the issue of method combination, aspects, and interfaces: Guido has > not made a pronouncement on whether these things may or may not be > accepted at some time in the future. What he has said is that he doesn't > *yet* understand the use case for them, and that these should be > separate PEPs so that we can argue their merits independently. What he's > strongly against (if my understanding is correct) is a "package deal" > where he is forced to accept all of the features, or none. 
I'm mellowing out on this a bit -- I'm no longer requesting a separate PEP with all the advanced features (I understand Phillip's argument that that second PEP will just be an easy rejection target). I do want to understand the motivation and implementation for each of the advanced features, so we can have a reasonable discussion about whether a particular feature is really worth adding or can easily be added later by/for the few users who really need it. > It seems to me that PEPs should only be required to explain their > mechanisms if there's some doubt or controversy about the > implementation. But referring to my sandbox/overloading implementation is *not* acceptable; I want whatever that does (not much) spelled out in the PEP for posterity. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Jul 21 17:31:13 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 21 Jul 2007 08:31:13 -0700 Subject: [Python-3000] removing exception .args In-Reply-To: <f7t7hk$f3c$1@sea.gmane.org> References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> <f7t7hk$f3c$1@sea.gmane.org> Message-ID: <ca471dc20707210831s1e304d30m77fe5412b66edebe@mail.gmail.com> On 7/21/07, Georg Brandl <g.brandl at gmx.net> wrote: > Andrew Dalke schrieb: > > Posting here a expansion of a short discussion I had > > after Guido's keynote at EuroPython. In this email > > I propose eliminating the ".args" attribute from the > > Exception type. It's not useful, and supporting it > > correctly is complicated enough that it's often not > > supported correctly > > > > > > > > In Python 2 the base Exception class works like this > > > > >>> x = Exception("spam", "was", "here") > > >>> x[0] > > 'spam' > > >>> x.args > > ('spam', 'was', 'here') > > >>> > > > > In Py3K the [0] index lookup disappears. This is a > > good thing. Positional lookup like this is rarely useful. > > > > The .args attribute remains. 
I don't see the need for > > it and propose that it be removed in Py3K. > > Hm, I always found it useful to just do > > class MyCustomError(Exception): > pass > > and give it arbitrary arguments to it without writing __init__ > method stuff that I can access from outside. Right. Also, the fact that there is no *guarantee* that e.args contains *all* the arguments passed to the constructor doesn't mean that e.args isn't useful. It's useful for many standard exceptions. I also happen to think that it's well-defined: it is whatever is passed to Exception.__init__(), whether called directly or from an overriding __init__() method. Given the amount of code that currently uses it, I think removing it would also be a major undertaking, as we would have to invent names for everything that's currently accessed via e.args[i]. (I know there's a lot of code that uses it, because converting the stdlib from e[i] to e.args[i] was a major pain.) So -1 on removing e.args. I'd be okay with a recommendation not to rely on it and to define explicitly named attributes for everything one cares for. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Sat Jul 21 19:07:12 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Sat, 21 Jul 2007 13:07:12 -0400 Subject: [Python-3000] pep 3124 plans References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com><20070713173936.53C213A404D@sparrow.telecommunity.com><f7pgki$6o3$1@sea.gmane.org><ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com><20070720174706.AE5773A40A8@sparrow.telecommunity.com><46A19FCC.7070609@acm.org> <f7sc4k$bht$1@sea.gmane.org> <ca471dc20707210816r4d663cdaqcef7e9f28c150a75@mail.gmail.com> Message-ID: <f7teg3$1lh$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707210816r4d663cdaqcef7e9f28c150a75 at mail.gmail.com... 
> On 7/21/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
>> One of the nice features of Eby's proposal is that more complicated
>> dispatching systems can be added. Perhaps some application needs a
>> dispatching engine that can dispatch based on the value of an objects
>> member. Perhaps the user wants an overload specificly for any product
>> object whose price property equals 0. With Eby's system adding a
>> dispatch engine that supports that is not difficult.
>
> This is true. However it comes at a cost. Whenever I see an API that
> takes a string which is then parsed by the called function as a Python
> expression (perhaps constrained to a subset of Python) I cringe,
> especially if the common use is to pass a literal. There are just so
> many issues with that... It's not colorized by the editor, it's not
> syntax-checked by either the editor or the Python parser, it requires
> one to build yet another parser...
>
> This is why I don't like the ...when("isinstance(obj, list)") syntax
> from (I think) RuleDispatch, and I'm glad it's not in the PEP. I'm
> unclear however on how you would do this otherwise -- is overloading
> implies() the best approach?

First of all, if I understand the PEP correctly, that should be:

when(funcname,"isinstance(obj, list)")

where funcname is the name of the function to be overloaded.

Whatever dispatch engine that is, is it not possible to do something
more like

when(funcname,{isinstance,{obj,list}})?

(I used list syntax here as an example only; other syntaxes could work.)
I'm not sure if the 'obj' part is referring to a variable that would be
in scope at the when declaration. That might have to be quoted as a
string.

Regardless though, I'm pretty sure dispatch engines can use things other
than interpreted strings.
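As a rough illustration of that last point (this is neither the PEP 3124
API nor RuleDispatch), here is a minimal sketch of a dispatcher whose
conditions are ordinary callables rather than strings to be parsed. The
names generic, when, Product and describe are all made up for this
example, and the engine makes no attempt to work out which predicate
implies which; the most recently registered matching rule simply wins.

```python
# Hypothetical mini-dispatcher; NOT the PEP 3124 / RuleDispatch API.
_rules = {}  # dispatcher function -> list of (predicate, implementation)


def generic(func):
    """Turn func into the fallback of a predicate-dispatched function."""
    def dispatcher(*args, **kwargs):
        # Later registrations win, so scan the rules newest-first.
        for predicate, impl in reversed(_rules[dispatcher]):
            if predicate(*args, **kwargs):
                return impl(*args, **kwargs)
        return func(*args, **kwargs)
    _rules[dispatcher] = []
    return dispatcher


def when(gf, predicate):
    """Register the decorated function for calls where predicate is true."""
    def decorate(impl):
        _rules[gf].append((predicate, impl))
        return impl
    return decorate


class Product:
    def __init__(self, name, price):
        self.name = name
        self.price = price


@generic
def describe(product):
    return "%s costs %d" % (product.name, product.price)


# The condition is a plain callable, so the editor and the compiler
# both see it as real Python code.
@when(describe, lambda product: product.price == 0)
def describe_free(product):
    return "%s is free" % product.name


paid = describe(Product("pen", 3))
free = describe(Product("sample", 0))
```

The price of this approach is that the engine can only call the
predicate; it cannot inspect it to order rules by specificity, which is
what the implies() question above is about.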
From foom at fuhm.net Sat Jul 21 19:17:55 2007 From: foom at fuhm.net (James Y Knight) Date: Sat, 21 Jul 2007 13:17:55 -0400 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <200707210025.11031.fdrake@acm.org> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> Message-ID: <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> On Jul 21, 2007, at 12:25 AM, Fred L. Drake, Jr. wrote: > On Saturday 21 July 2007, Joe Gregorio wrote: >> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on >> both text and binary streams? > > No. XML is a serialization of a markup language containing Unicode > character > into an encoded stream. Well...there's many reasons why it is useful to be able to parse an already-decoded unicode stream into XML, and to serialize XML into a unicode string. For example, if combining into a larger unicode document, or parsing from a literal string in the source code. Sure, normally XML is serialized to bytes, but it is also serializable to unicode, and that's a useful feature to have (if implementable). James From pje at telecommunity.com Sat Jul 21 19:33:08 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 21 Jul 2007 13:33:08 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707210816r4d663cdaqcef7e9f28c150a75@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <f7sc4k$bht$1@sea.gmane.org> <ca471dc20707210816r4d663cdaqcef7e9f28c150a75@mail.gmail.com> Message-ID: <20070721173204.C1B913A40D7@sparrow.telecommunity.com> At 08:16 AM 7/21/2007 -0700, Guido van Rossum wrote: >On 7/21/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > > One of the nice features of Eby's proposal is that more complicated > > dispatching systems can be added. Perhaps some application needs a > > dispatching engine that can dispatch based on the value of an objects > > member. Perhaps the user wants an overload specificly for any > product object > > whose price property equals 0. With Eby's system adding a dispatch engine > > that supports that is not difficult. > >This is true. However it comes at a cost. Whenever I see an API that >takes a string which is then parsed by the called function as a Python >expression (perhaps constrained to a subset of Python) I cringe, >especially if the common use is to pass a literal. There are just so >many issues with that... It's not colorized by the editor, it's not >syntax-checked by either the editor or the Python parser, Note that it's been previously proposed to add an AST literal syntax for "quoting" code to get around this, but such metasyntactic features were rejected for 3.0. There are other applications for such a syntax besides generic functions: there exist today Python ORMs that translate Python generator expressions to SQL queries. Today, they work by decompiling bytecode, precisely to avoid some of the issues you mention. 
However, an AST literal syntax would actually work better for that, IMO, just as it would for generic functions. > it requires >one to build yet another parser... Well, for Python and subsets thereof, it suffices to use the stdlib for that. My implementations use the tuple-formatted ASTs from the 'parser' module. >This is why I don't like the ...when("isinstance(obj, list)") syntax >from (I think) RuleDispatch, and I'm glad it's not in the PEP. I'm >unclear however on how you would do this otherwise -- is overloading >implies() the best approach? It's one approach. However, the idea of "@when" and other decorators in the PEP taking a second argument is so that you can pass in objects of your own design. These objects can implement disjuncts() to support or-ed conditions, and can request an upgrade to a different dispatching engine. Of course, the ability to pass in such objects means you could pass in something like "Expr('some python expression')"... which was of course one thing I planned to use it for. From fdrake at acm.org Sat Jul 21 19:36:59 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sat, 21 Jul 2007 13:36:59 -0400 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> Message-ID: <200707211336.59820.fdrake@acm.org> On Saturday 21 July 2007, James Y Knight wrote: > Well...there's many reasons why it is useful to be able to parse an > already-decoded unicode stream into XML, and to serialize XML into a > unicode string. For example, if combining into a larger unicode > document, or parsing from a literal string in the source code. Yes, but that doesn't mean it's the XML parser's job to take multiple input types. 
It could easily be supported by creating a wrapper object that converts unicode to bytes objects, so the underlying C parser still gets bytes. Such a wrapper could easily be part of xml.parsers.expat if desired, but I'd like to avoid adding lots of stuff to the pyexpat C code. Avoiding complexifying the C code is a good thing. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> From talin at acm.org Sat Jul 21 19:36:05 2007 From: talin at acm.org (Talin) Date: Sat, 21 Jul 2007 10:36:05 -0700 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> Message-ID: <46A24405.1050102@acm.org> James Y Knight wrote: > On Jul 21, 2007, at 12:25 AM, Fred L. Drake, Jr. wrote: > >> On Saturday 21 July 2007, Joe Gregorio wrote: >>> Should xml.parsers.expat.XMLParser.ParseFile(file) operate on >>> both text and binary streams? >> No. XML is a serialization of a markup language containing Unicode >> character >> into an encoded stream. > > Well...there's many reasons why it is useful to be able to parse an > already-decoded unicode stream into XML, and to serialize XML into a > unicode string. For example, if combining into a larger unicode > document, or parsing from a literal string in the source code. > > Sure, normally XML is serialized to bytes, but it is also > serializable to unicode, and that's a useful feature to have (if > implementable). The general use case for XML is reading or writing a document, where "document" means a bytestream from either a file or a socket. The question is whether it would also be useful to parse Python strings that contain XML markup, or format an XML document into a Python string. Some care needs to be taken here, because XML has its own way of specifying the character encoding. 
For example, suppose I have a python string that contains the characters: '<?xml version="1.0" encoding="utf-8" ?>' Well, the problem with this is that the encoding *isn't* UTF-8. Python 3000 strings are internally encoded as UTF-16 (although generally it tries to hide that fact from you so most of the time you don't have to care.) Suppose then that you write this string out to a file (perhaps after combining it with other strings.) If I happen to write the file as UTF-8, then everything is fine, but if I happen to pick some other encoding that doesn't match the encoding attribute in the prologue then we have the potential for confusion. This matters because there are lots of people who write XML documents with print statements (and many of them forget to handle things like escaping of entities and such.) This also matters because the Python XML parsing libraries are mostly based on expat, which is C code that doesn't have any special knowledge of Python strings - it only works on the encodings that it can detect, or which you tell it to use. So if you wanted to directly parse a Python string as XML, you would probably have to treat it as a byte array and override the encoding detection, telling it explicitly to use UTF-16. 
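One way to sidestep the mismatch described above, sketched with the
xml.parsers.expat module as it exists in current Python: encode the
string yourself and pass the same encoding name to ParserCreate(), which
overrides whatever the XML declaration claims. The helper name
parse_text and the sample document are assumptions made purely for
illustration.

```python
import xml.parsers.expat


def parse_text(text):
    """Parse XML held in a str; return (element names, character data)."""
    names = []
    chunks = []
    # An explicit encoding here overrides the encoding named in the
    # document's own XML declaration.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # Character data may arrive in several pieces, so collect and join.
    parser.CharacterDataHandler = chunks.append
    # We choose the byte encoding ourselves, so it always matches the
    # override above regardless of what the declaration says.
    parser.Parse(text.encode("utf-8"), True)
    return names, "".join(chunks)


# The declaration claims iso-8859-1, but the bytes we hand over are UTF-8.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><a><b>caf\u00e9</b></a>'
names, text = parse_text(doc)
```

The same idea would apply to UTF-16 or any other encoding expat knows
about; the point is only that the caller, not the declaration, decides.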
> James

From ncoghlan at gmail.com Sat Jul 21 20:02:57 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 22 Jul 2007 04:02:57 +1000
Subject: [Python-3000] removing exception .args
In-Reply-To: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com>
References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com>
Message-ID: <46A24A51.1090101@gmail.com>

Andrew Dalke wrote:
> Having support for a single object (almost always a
> string) passed into the exception is pragmatically useful,
> so I think the base exception class should look like
>
> class Exception(object):
>     msg = None
>     def __init__(self, msg):
>         self.msg = msg
>     def __str__(self):
>         if self.msg is None:
>             return "%s()" % (self.__class__.__name__,)
>         else:
>             return "%s(%r)" % (self.__class__.__name__, self.msg)
>
> **

Went there, didn't like it, left again. See PEP 352, especially the
section on the late (unlamented) BaseException.message.

> The rest of this email is because I'm detail oriented
> and present evidence to back up my assertion.
>
> There are a number of subclasses which should but don't
> call the base __init__, generic error reporting software
> can't use the "args protocol" for anything. Pretty much
> the only thing a generic error report mechanism (like
> traceback and logging) can do is call str() on the exception.

As of Python 2.5, you can rely on the attribute being present, as it is
provided automatically by BaseException:

.>>> class MyException(Exception):
...     def __init__(self):
...         pass
...
.>>> MyException().args
()

Of course, as Guido pointed out, args will be empty unless the exception
sets it directly or via BaseException.__init__.

> This is correct, but cumbersome. Why should we
> encourage all non-trivial subclasses to look like this?

If you want to avoid requiring that subclasses call your __init__
method, you can actually do that by putting any essential initialisation
into the __new__ method instead. Then the requirement is merely to call
the parent __new__ method if you override __new__, and you have to do
something like that in order to create the class instance in the first
place.

To rewrite the example from getopt using this technique:

class GetoptError(Exception):
    def __new__(cls, msg, opt=''):
        self = super(GetoptError, cls).__new__(cls, msg, opt)
        self.msg = msg
        self.opt = opt
        return self
    def __str__(self):
        return self.msg

I actually find using __new__ this way to be a useful practice in
general for setting up class invariants in base classes, as it's easy to
forget to call __init__ on the base class, but forgetting to call
__new__ takes some serious effort. Putting the essential parts in
__new__ means never having to include the instruction that "you must
call this class's __init__ method when subclassing and overriding
__init__" into any API documentation I write.

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From talin at acm.org Sat Jul 21 20:04:55 2007
From: talin at acm.org (Talin)
Date: Sat, 21 Jul 2007 11:04:55 -0700
Subject: [Python-3000] pep 3124 plans
In-Reply-To: <f7sc4k$bht$1@sea.gmane.org>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com><20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <f7sc4k$bht$1@sea.gmane.org>
Message-ID: <46A24AC7.3050505@acm.org>

Joe Smith wrote:
> The effects of this can be wonderful.
A package could convert some of a > frameworks functions to generics, to allow them to handle the new objects > the package provides. It might also need to add some before and after > methods to ensure that the user of the module, it looks like the framework > was designed to support the module in question, when in fact, it was not. > The idea being the package can basically make the needed changes so that > everything just works. All without having to duplicate any code from the > framework itself. > See the benefits? (The framework mentioned could be a major framework like > ZOPE, just an average package, or even a simple module.) When considering the decision to include a new feature into the language, one has to consider the costs as well as the benefits. You've made an impassioned argument showing all the wonderful power and expressiveness of these various features. However, power and expressiveness are not the only factors that should be considered. To give an analogy, think back 20-25 years ago, when there was still a vocal contingent of programmers who were in favor of self-modifying assembly code. Expert hackers would show the amazing power of this technique, all of the wonderfully clever tricks that you could accomplish. (I remember this because I was writing games back them, and self-modifying code was the only way you could write 6502 assembly code that was actually efficient. Since the 6502 had no 16-bit index registers, the only way to have efficient arrays larger than 256 bytes was to calculate the address and then modify the 16-bit address field of the subsequent instruction.) At the same time, however, this clever technique came at a cost: Programs that were very difficult to debug or even understand. Many people spoke out against it, and for a time it seemed that the technique was a dying art. Today we have the best of both worlds: We still have self-modifying code, but nowadays we call it JIT: Just-In-Time compilation. 
Instead of a free-for-all where a programmer can modify any arbitrary memory address, instead the power of run-time code generation is safely sandboxed inside of a JIT compiler component that is very competent at hiding the grisly details from the programmer. Now, don't think that I am directly comparing method combination to self-modifying assembly code. I'm not saying that such things are inherently dangerous and should be avoided. Rather, what I am trying to point out is the *thought process* that should be applied to any new feature. Python is a "small" language in the sense that it's easy to hold the entire syntax in your head, and lots of people want to keep it that way. This does not mean that we can't move forward with new features. But it means that each feature needs to be judged and weighed as to how much it affects that "mental smallness" of the language. Generic functions are favored because they have the potential to *shrink* certain kinds of problems. I don't mean in the sense of requiring the programmer to type less keystrokes, but in the sense of shrinking how much brainpower it takes to think about the problem. But even then it took Guido several months (according to a posting he made some time ago) of thinking about generics before he reached his "Aha" moment with regards to completely grokking the concept. This focus on practicality rather than rocket science is exactly why Guido's a good gatekeeper in these matters - if he doesn't understand it why it's important or useful, it probably means that lots of other Python developers won't either. -- Talin From pje at telecommunity.com Sat Jul 21 20:16:57 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 21 Jul 2007 14:16:57 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A19FCC.7070609@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> Message-ID: <20070721181442.48FB03A403A@sparrow.telecommunity.com> At 10:55 PM 7/20/2007 -0700, Talin wrote: >You mentioned earlier that there was a design reason for preferring >@overload and @when vs. the earlier RuleDispatch syntax, but the >explanation you gave wasn't very clear (to me anyway). > >(I personally prefer the @somegeneric.overload, but that's purely an >aesthetic value judgement - if there's a strong architectural advantage >of the other syntax, I'd like to hear it.) You can't add new method combinations that way. If method combining is a function, not a method, then you can add as many new method types as you like. If you have to use @somegeneric.before and @somegeneric.after, you can't decide on your own to add @somegeneric.debug. However, if it's @before(somegeneric...), then you can add @debug and @authorize and @discount and whatever else you need for your application, without needing to monkeypatch them in. To me, TOOOWTDI means that all (or nearly all) the decorators should follow the same pattern. >Right. There are two reasons that I think that post-hoc overloading runs >into problems. The first, as you mentioned, is that it's difficult to >implement without some kind of trickery. Well, that depends on what you define as "trickery", but clearly Guido feels that being able to overload an existing function without having to go through the code of every possible client is indeed "trickery". 
IMO, however, going through the code of the clients is an unreasonable and unscalable task that goes against the whole point of the exercise: to "assert qualified statements over oblivious code" (one common definition of aspect-oriented programming). If I have to go through all the code that might have imported the function and stored it somewhere, that's hardly oblivious. It creates an opportunity for invisible, *import sequence-dependent* bugs, that can be reintroduced any time somebody changes an import statement! So the irony, IMO, of avoiding this "trickery" is that it makes the practice error-prone, thereby providing a self-fulfilling justification for avoiding its use. (Whereas, if the "trickery" were allowed, it would be much safer to actually use it.) All that having been said, I'm still willing to make an implementation that does it Guido's way. I just don't agree that the restriction is justified. But more on that below. >The second reason - this is my opinion - is that it too much resembles >the mythical "comefrom" statement (the opposite of "goto"). The >"comefrom" statement is intended to be a joke - the worst possible >language feature from the standpoint of being able to manually trace the >flow of execution of a program. Well, I've worked with people who dislike OO for exactly the same reason, since they feel they can never know whether a method might have been overridden in a subclass. Seriously! However, for the specific use cases *I* have in mind, you'd be using oblivious extension to implement customer-specific business rules, layered atop a core framework. You don't want to waste time declaring *everything* overloadable, any more than you declare classes to be subclassable! You just need to be able to write the customer's rules in one place. So if you're trying to follow something manually, you're going to look at that customer's business rule modules in order to know about the exceptional control flow. 
I don't think that's really comparable to the joke implementation of "come from". In any system, the more the computer does for you, the harder it will be for you to mentally emulate what the computer's doing, step-by-step. That's simply the nature of the beast. However, in the case of rule-based declarative abstractions, you're getting closer to something that's *easier* for the brain to model. Our brains run by pattern recognition, with more-specific patterns taking precedence, so this is an easier model for your brain to follow than step-by-step computation anyway. Certainly, it's an easier model for your software customers to provide you with in the first place. I.e., customers usually don't give you a step-by-step, "well, first I check if the customer has an outstanding balance before I ship them anything." They say, "Don't ship stuff to people with an outstanding balance." And guess what? Viewed formally, that's a "come from" statement. So the most straightforward expression of typical business rules and requirements, is going to consist of a list of come-froms. So coding them that way actually gets us more verifiable requirements, and a simpler mental model to *produce* the code in the first place. >One issue that hasn't been satisfactorily resolved is the handling of >the 'self' parameter. At least, let me give my explanation of what I >think the issue is and see if we're on the same page: > >Overloading a class method requires special treatment of the 'self' >parameter because there's an implicit constraint on what types of >objects can be passed as 'self': for any method defined in any class, >the 'self' parameter must be an instance of the class (or a subclass) in >which the method is defined. Now, this would be trivial if we required >the programmer to explicitly declare the type of 'self', but this >violates DRY and has the potential to cause mischief if the programmer >forgets to update the method signature when they change the class. 
Well, actually that never occurred to me, because obviously you can't do
that (refer to the class before it's finished being defined). :)

>If it turns out that there's no way to get a callback when the class has
>finished being built, then we may have to defer finishing the
>construction until the first time the generic function is called. This
>wouldn't be too bad, considering that there's a bunch of other stuff
>that is lazily calculated on first call anyway, from what I understand.

Actually, this isn't anywhere near as complicated as all the stuff I
just snipped from the above. :) All that matters is whether the
decorator is invoked in the body of a class. If it is, it needs a
callback to finish the job. If it isn't, it can immediately go ahead
with what it's doing. Note that this was implemented in RuleDispatch
literally years ago; it's only the loss of __metaclass__ that presents a
problem for a Py3K implementation.

From dalke at dalkescientific.com Sat Jul 21 23:16:35 2007
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 21 Jul 2007 23:16:35 +0200
Subject: [Python-3000] removing exception .args
In-Reply-To: <46A24A51.1090101@gmail.com>
References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> <46A24A51.1090101@gmail.com>
Message-ID: <590554CB-D940-4424-8CD5-154F73732DE1@dalkescientific.com>

The main statement I have is, excepting backwards compatibility, nothing
would care if .args was removed in 3.0, and those which currently used
.args were changed to use attributes instead. Please show/advise me
otherwise.

> Andrew Dalke wrote:
>> so I think the base exception class should look like
>> class Exception(object):
>>     msg = None
>>     def __init__(self, msg):
>>         self.msg = msg
>>     def __str__(self):
>>         if self.msg is None:
>>             return "%s()" % (self.__class__.__name__,)
>>         else:
>>             return "%s(%r)" % (self.__class__.__name__, self.msg)

On Jul 21, 2007, at 8:02 PM, Nick Coghlan wrote:
> Went there, didn't like it, left again.
> See PEP 352,

Sure, fine. The "pragmatic" thing I care about is allowing a single
argument to be passed in the base exception class, which in turn is used
in the __str__ / __repr__. If it's called "message" or "msg" or stored
in .args as a single element tuple, I don't care.

For example, this would also be fine to me.

class Exception(object):
    __obj = object()
    def __init__(self, msg):
        self.__obj = msg
    def __repr__(self):
        if self.__obj is Exception.__obj:
            return "%s()" % (self.__class__.__name__,)
        else:
            return "%s(%r)" % (self.__class__.__name__, self.__obj)

> especially the section on the late (unlamented) BaseException.message.

I'm more hoping for this part of the "retracted ideas" section:

    ... and consider a more long-term transition strategy in Python 3.0
    to remove multiple-argument support in BaseException in preference
    of accepting only a single argument.

That section also says that removing 'args' during the transition is
hard. I can believe it. But Python 3 can be non-backwards compatible.

> As of Python 2.5, you can rely on the attribute being present,
> as it is provided automatically by BaseException:

Yes, I know that. Is it useful? Is having an autogenerated, empty .args
useful? Why? What code would break? (excepting backwards compatibility
for code that expects to extract information via position instead of
attribute)

As far as I can tell, it's not useful. And that's why it should be
deleted. If it were useful, then explain why 'filename' isn't in the
args list for IOError, as in

>>> import os
>>> err = IOError(2, os.strerror(2), "/path/to/nowhere")
>>> err.args
(2, 'No such file or directory')
>>> repr(err)
"IOError(2, 'No such file or directory')"
>>> err.errno
2
>>> err.strerror
'No such file or directory'
>>> err.filename
'/path/to/nowhere'
>>>

Answer: it's a bug. But it's a bug that no one really cares about. Its
lack affects no one. And removing 'args' would affect ... no one.
Excepting code which currently expects to get fields [0], [1], ...
when the original exception should have defined attributes instead.

> If you want to avoid requiring that subclasses call your
> __init__ method, you can actually do that by putting any
> essential initialisation into the __new__ method instead.

That wasn't my point. My point is that many non-trivial exception
classes don't currently call the base class __init__ nor set the .args
attribute. That one class I showed was an example of defensive
programming - knowing that there's a decent chance that derived classes
won't call the __init__. There should be no reason to be this defensive.
Most other classes are not.

That getopt example was a second-order effect and should not be a
driving case for any future direction. The real problem isn't that .args
wasn't initialized. The real problem is that .args shouldn't need to
exist.

(In personal email I did a followup on why I think __new__ should not be
used for this case, or for the more generally advocated case of "setting
up class invariants in the base class." I felt that that was a
distracting tangent.)

Andrew
dalke at dalkescientific.com

From greg.ewing at canterbury.ac.nz Sun Jul 22 02:26:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 22 Jul 2007 12:26:24 +1200
Subject: [Python-3000] removing exception .args
In-Reply-To: <f7t7hk$f3c$1@sea.gmane.org>
References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> <f7t7hk$f3c$1@sea.gmane.org>
Message-ID: <46A2A430.1090405@canterbury.ac.nz>

Georg Brandl wrote:
> Hm, I always found it useful to just do
>
> class MyCustomError(Exception):
>     pass
>
> and give it arbitrary arguments to it without writing __init__
> method stuff that I can access from outside.

Maybe

class Exception(object):
    def __init__(self, msg = None, **kwds):
        self.msg = msg
        self.__dict__.update(kwds)

Then you'd have to pass your extra args as keyword args, but you could
still avoid having an __init__ if you wanted.
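A sketch of how that keyword-argument idea could be tried out in
present-day Python, under a few assumptions: the base class is named
KeywordError here (a made-up name, so the builtin Exception is not
shadowed), it derives from the real Exception so that it can be raised,
and it still forwards msg to the base __init__ so that str() and .args
keep working for generic tools.

```python
class KeywordError(Exception):
    """Hypothetical base class: one message plus named attributes."""

    def __init__(self, msg=None, **kwds):
        # Forward only the message, so args == (msg,) and str() is the
        # message itself.
        super().__init__(msg)
        self.msg = msg
        # Every keyword argument becomes an attribute on the instance.
        self.__dict__.update(kwds)


class MyCustomError(KeywordError):
    # No __init__ needed; extra data is passed by keyword at raise time.
    pass


try:
    raise MyCustomError("disk full", path="/tmp/x", needed=4096)
except MyCustomError as e:
    err = e  # keep a reference; 'e' is unbound after the except block

summary = (err.msg, err.path, err.needed)
```

Callers then read err.path and err.needed by name instead of by position
in err.args.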
-- Greg From brett at python.org Sun Jul 22 02:46:21 2007 From: brett at python.org (Brett Cannon) Date: Sat, 21 Jul 2007 17:46:21 -0700 Subject: [Python-3000] removing exception .args In-Reply-To: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> Message-ID: <bbaeab100707211746r5a4cb2a2rafb066bb24260206@mail.gmail.com> On 7/21/07, Andrew Dalke <dalke at dalkescientific.com> wrote: > Posting here an expansion of a short discussion I had > after Guido's keynote at EuroPython. In this email > I propose eliminating the ".args" attribute from the > Exception type. It's not useful, and supporting it > correctly is complicated enough that it's often not > supported correctly > This was originally proposed in PEP 352. This was the reason for the existence of the 'message' attribute as introduced in Python 2.5. At PyCon 2007 I actually removed 'args' (see the p3yk_no_args_on_exc branch in svn: http://svn.python.org/view/python/branches/p3yk_no_args_on_exc/). But after making everyone at PyCon suffer through my swearing and frustration and talking with python-dev (and thus the discussion should be in the python-dev/python-3000 archives), the decision was made to not remove it (which is why 'message' is deprecated in Python 2.6). This was because the removal at the C level is very painful. There are many places within the code where a tuple is passed to various C functions that expect that tuple to be treated as multiple arguments to the exception constructor. But changing the semantics of a C function has already been labeled a no-no. So one would have to remove the C functions that construct exceptions with arguments and use a new one that only expects a single argument so as not to have unexpected semantics. That sucks because those functions are all over.
In the branch I just stuck the tuple into the 'message' attribute, but that caused its own issues as output was now a little funky since everything was considered a tuple, including single arguments. So while I totally understand the desire to ditch 'args' and just have 'message', doing so thoroughly and in any reasonable way that is not painful is not easy thanks to the C API. -Brett From greg.ewing at canterbury.ac.nz Sun Jul 22 02:47:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 22 Jul 2007 12:47:36 +1200 Subject: [Python-3000] removing exception .args In-Reply-To: <46A24A51.1090101@gmail.com> References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> <46A24A51.1090101@gmail.com> Message-ID: <46A2A928.9070705@canterbury.ac.nz> Nick Coghlan wrote: > Putting the essential parts in > __new__ means never having to include the instruction that "you must > call this classes __init__ method when subclassing and overriding > __init__" into any API documentation I write. I always assume that I *do* have to call the base __init__ if I override it, unless something explicitly says that I don't. And I assume other people follow the same rule, so I don't feel obliged to spell it out when I document my own classes. -- Greg From dalke at dalkescientific.com Sun Jul 22 03:11:46 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 22 Jul 2007 03:11:46 +0200 Subject: [Python-3000] removing exception .args Message-ID: <D14DF1E2-0737-4D5B-8133-A08FD2F6BA61@dalkescientific.com> Brett: > This was originally proposed in PEP 352. > So while I totally understand the desire to ditch 'args' and just have > 'message', doing so thoroughly and in any reasonable way that is not > painful is not easy thanks to the C API. *sigh* I read through the back python 3k list postings on this. I see this topic is pending further input. which is why I am asking if people are still supportive of this? 
I can offer nothing there as I don't dwell in the depths of the C API. Does the ".args" needs to be visible to Python code? That would hide the problem, yes? I've been reading the docs, and found the clause related to IOError not having the filename in the args tuple. When an EnvironmentError exception is instantiated with a 3-tuple, the first two items are available as above, while the third item is available on the filename attribute. However, for backwards compatibility, the args attribute contains only a 2-tuple of the first two constructor arguments. At the very least, could this be fixed? Andrew dalke at dalkescientific.com From greg.ewing at canterbury.ac.nz Sun Jul 22 03:09:05 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 22 Jul 2007 13:09:05 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070721181442.48FB03A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> Message-ID: <46A2AE31.2080105@canterbury.ac.nz> Phillip J. Eby wrote: > I.e., customers usually don't give you a step-by-step, "well, first I > check if the customer has an outstanding balance before I ship them > anything." They say, "Don't ship stuff to people with an outstanding balance." In my experience, customers often give you a vague, incomplete and even contradictory set of rules. It takes a lot of careful thought to refine them into something complete and coherent, and it requires considering all the rules together to see how they interact with each other. The GF approach encourages scattering the rules over different parts of the program, and I can't see how that helps with this process. 
-- Greg From dalke at dalkescientific.com Sun Jul 22 03:16:02 2007 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 22 Jul 2007 03:16:02 +0200 Subject: [Python-3000] removing exception .args In-Reply-To: <D14DF1E2-0737-4D5B-8133-A08FD2F6BA61@dalkescientific.com> References: <D14DF1E2-0737-4D5B-8133-A08FD2F6BA61@dalkescientific.com> Message-ID: <261196CB-F8BB-4727-96B5-EDDAEA12E54B@dalkescientific.com> Andrew Dalke: > Does the ".args" needs to be visible to Python code? > That would hide the problem, yes? I see I'm not getting all messages on this thread. Looked at the archive and saw: Guido: > So -1 on removing e.args. I'd be okay with a recommendation not to > rely on it and to define explicitly named attributes for everything > one cares for. Okay. Sounds like the best that can happen. Andrew dalke at dalkescientific.com From greg.ewing at canterbury.ac.nz Sun Jul 22 03:28:27 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 22 Jul 2007 13:28:27 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070721181442.48FB03A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> Message-ID: <46A2B2BB.9070305@canterbury.ac.nz> Phillip J. Eby wrote: > Well, I've worked with people who dislike OO for exactly the same > reason, since they feel they can never know whether a method might > have been overridden in a subclass. I think there's a considerable difference in degree here, though. When you call a method, you know you're delegating responsibility to the object for carrying out that operation. 
And you know you're delegating it to that object and no other, so given the run-time type you can find the code that gets called fairly easily. With GFs that require overloadable functions to be declared as such, you know when you call one that you're delegating to something. But it's a lot less clear what you're delegating to or where. Any or all of the arguments could be determining which piece of code gets called, and the code could be in a much wider variety of places, not necessarily even near any of the classes involved. If any function can be overloaded, then *any* call could potentially be delegating somewhere, increasing the range of possible behaviours even more. -- Greg From greg.ewing at canterbury.ac.nz Sun Jul 22 03:46:24 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 22 Jul 2007 13:46:24 +1200 Subject: [Python-3000] removing exception .args In-Reply-To: <D14DF1E2-0737-4D5B-8133-A08FD2F6BA61@dalkescientific.com> References: <D14DF1E2-0737-4D5B-8133-A08FD2F6BA61@dalkescientific.com> Message-ID: <46A2B6F0.9080903@canterbury.ac.nz> Andrew Dalke wrote: > However, for backwards compatibility, the args attribute > > contains only a 2-tuple of the first two constructor arguments. This is a good reason for having named attributes instead of a tuple -- it's extensible without requiring these sorts of hacks. As for the C function problem -- are these functions instantiating some known exception class? If so, why can't that class be given an __init__ that accepts the appropriate arguments positionally and stores them as attributes (or passes them on as keywords args as per my earlier suggestion)? -- Greg From pje at telecommunity.com Sun Jul 22 03:58:49 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sat, 21 Jul 2007 21:58:49 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A2B2BB.9070305@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> Message-ID: <20070722015630.8F34C3A403A@sparrow.telecommunity.com> At 01:28 PM 7/22/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > Well, I've worked with people who dislike OO for exactly the same > > reason, since they feel they can never know whether a method might > > have been overridden in a subclass. > >I think there's a considerable difference in degree here, >though. When you call a method, you know you're delegating >responsibility to the object for carrying out that operation. >And you know you're delegating it to that object and no >other, so given the run-time type Well, if you're looking at *run-time*, then you can equally well dump out the runtime contents of a generic function, complete with modules, filenames, and line numbers of every method. In the peak.rules.core case, that operation would look something like: from peak.rules.core import rules_for print list(rules_for(somefunc)) Although you'd probably want nicer formatting. But that wouldn't be hard to add. >If any function can be overloaded, then *any* call could >potentially be delegating somewhere, increasing the range >of possible behaviours even more. That's exactly true of today's Python, and always has been. Heck, somebody can change a class' __bases__ at runtime, or change the class of an object on the fly. I don't think that anybody's saying that unrestricted use of dynamism is good, or that it can't be abused. However, the potential for abuse is no different. 
If anything, generic functions allow more *structured* dynamism, because two different modules can safely add methods to a function, instead of being tempted to reimplement and monkeypatch it. From pje at telecommunity.com Sun Jul 22 04:06:40 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 21 Jul 2007 22:06:40 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A2AE31.2080105@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> Message-ID: <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> At 01:09 PM 7/22/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > I.e., customers usually don't give you a step-by-step, "well, first I > > check if the customer has an outstanding balance before I ship them > > anything." They say, "Don't ship stuff to people with an > outstanding balance." > >In my experience, customers often give you a vague, >incomplete and even contradictory set of rules. It >takes a lot of careful thought to refine them into >something complete and coherent, and it requires >considering all the rules together to see how they >interact with each other. Which is why it's good to be able to group those rules *together* -- especially grouping one customer's rules separately from another's. Putting them both into your core code would make the system harder to understand, and harder to distinguish the rules applying to that customer. >The GF approach encourages scattering the rules >over different parts of the program, You seem to be saying that the ability to put things in different places encourages disorganization. 
I claim the contrary: being able to put GF methods in different places means that you are able to put things in a *more* logical organization than is possible with only classes. Yes, it certainly *enables* you to be more disorganized, if that's what you wish. But why would you *do* that? It makes no sense. It seems to me that by that argument, we shouldn't have modules, because people might put a class and its subclass in two different modules. But that's a *feature*, because it lets you organize things according to other dimensions that might be more important to understanding the program, than the inheritance relationship between classes. From ncoghlan at gmail.com Sun Jul 22 05:26:11 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 22 Jul 2007 13:26:11 +1000 Subject: [Python-3000] removing exception .args In-Reply-To: <46A2A928.9070705@canterbury.ac.nz> References: <9A9F27CC-D660-4C09-8D8C-5C4DDD66D2E6@dalkescientific.com> <46A24A51.1090101@gmail.com> <46A2A928.9070705@canterbury.ac.nz> Message-ID: <46A2CE53.9070701@gmail.com> Greg Ewing wrote: > Nick Coghlan wrote: >> Putting the essential parts in >> __new__ means never having to include the instruction that "you must >> call this classes __init__ method when subclassing and overriding >> __init__" into any API documentation I write. > > I always assume that I *do* have to call the base __init__ > if I override it, unless something explicitly says that > I don't. And I assume other people follow the same rule, > so I don't feel obliged to spell it out when I document > my own classes. Andrew actually pointed out a flaw in my suggestion - if the person subclassing wants to change the constructor signature, they end up needing to override both __new__ and __init__, rather than just __init__. So the implementation trick is exposed more than I thought, and the idea is far less useful outside of tightly controlled class hierarchies (which is where I've personally used it).
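The trade-off described above can be seen in a small sketch. All class names here (Base, Careless, Broken, Resigned) are hypothetical and chosen only for illustration; the code uses current Python 3 syntax and is not taken from anyone's actual code in this thread.

```python
class Base:
    """Hypothetical base class that sets its essential state in __new__."""

    def __new__(cls, name):
        self = super().__new__(cls)
        self.name = name  # set even if a subclass never calls Base.__init__
        return self


class Careless(Base):
    def __init__(self, name):
        # Base.__init__ is never called, yet self.name already exists.
        self.greeting = "hello " + self.name


class Broken(Base):
    # Only __init__ changes the signature, so Base.__new__ rejects the call.
    def __init__(self, first, last):
        self.first = first


class Resigned(Base):
    # Changing the constructor signature means overriding __new__ as well.
    def __new__(cls, first, last):
        return super().__new__(cls, first + " " + last)

    def __init__(self, first, last):
        self.first = first


try:
    Broken("Ada", "Lovelace")
    broken_failed = False
except TypeError:
    broken_failed = True
```

Careless works without cooperating with its base class, which is the benefit; Broken fails and Resigned has to override two methods, which is the flaw.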
/end tangent-that-I'd-regret-bringing-up-except-for-the-fact-that-I-learnt-something Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From joe at bitworking.org Sun Jul 22 06:35:14 2007 From: joe at bitworking.org (Joe Gregorio) Date: Sun, 22 Jul 2007 00:35:14 -0400 Subject: [Python-3000] pyexpat: returns_unicode str/unicode branch Message-ID: <3f1451f50707212135l56f90d56p4088957d12ab36cd@mail.gmail.com> On 7/21/07, Fred L. Drake, Jr. <fdrake at acm.org> wrote: > On Saturday 21 July 2007, Joe Gregorio wrote: > > Should xml.parsers.expat.XMLParser.ParseFile(file) operate on > > both text and binary streams? > > No. XML is a serialization of a markup language containing Unicode character > into an encoded stream. Along the same lines, since all strings are now unicode, should "returns_unicode" be dropped from xmlparser objects? That is, the handler functions will always be passed unicode strings and not utf-8 bytes. Thanks, -joe -- Joe Gregorio http://bitworking.org From martin at v.loewis.de Sun Jul 22 09:56:26 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 22 Jul 2007 09:56:26 +0200 Subject: [Python-3000] pyexpat: returns_unicode str/unicode branch In-Reply-To: <3f1451f50707212135l56f90d56p4088957d12ab36cd@mail.gmail.com> References: <3f1451f50707212135l56f90d56p4088957d12ab36cd@mail.gmail.com> Message-ID: <46A30DAA.3040204@v.loewis.de> > Along the same lines, since all strings are now unicode, > should "returns_unicode" be dropped from xmlparser objects? > That is, the handler functions will always be passed unicode > strings and not utf-8 bytes. Sure. 
Martin From martin at v.loewis.de Sun Jul 22 10:00:18 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 22 Jul 2007 10:00:18 +0200 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> Message-ID: <46A30E92.5040400@v.loewis.de> > Sure, normally XML is serialized to bytes, but it is also > serializable to unicode, and that's a useful feature to have (if > implementable). It's not reasonably implementable; users who have use cases will have to encode as UTF-8 first. Regards, Martin From fdrake at acm.org Sun Jul 22 15:50:51 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sun, 22 Jul 2007 09:50:51 -0400 Subject: [Python-3000] pyexpat: returns_unicode str/unicode branch In-Reply-To: <3f1451f50707212135l56f90d56p4088957d12ab36cd@mail.gmail.com> References: <3f1451f50707212135l56f90d56p4088957d12ab36cd@mail.gmail.com> Message-ID: <200707220950.52076.fdrake@acm.org> On Sunday 22 July 2007, Joe Gregorio wrote: > Along the same lines, since all strings are now unicode, > should "returns_unicode" be dropped from xmlparser objects? > That is, the handler functions will always be passed unicode > strings and not utf-8 bytes. Yes. This was always a backward-compatibility point, but it's been a long time since the default was to return UTF-8. -Fred -- Fred L. Drake, Jr. 
<fdrake at acm.org> From guido at python.org Sun Jul 22 17:43:54 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 22 Jul 2007 08:43:54 -0700 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <46A30E92.5040400@v.loewis.de> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> <46A30E92.5040400@v.loewis.de> Message-ID: <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> On 7/22/07, "Martin v. L?wis" <martin at v.loewis.de> wrote: > > Sure, normally XML is serialized to bytes, but it is also > > serializable to unicode, and that's a useful feature to have (if > > implementable). > > It's not reasonably implementable; users who have use cases > will have to encode as UTF-8 first. Now I'm confused. Are we proposing that all our XML APIs read and write encoded bytes, or are we proposing that they read and write Unicode strings, leaving the encoding/decoding to the I/O stream? I thought the latter was preferred but now it looks like you're arguing for the former? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Sun Jul 22 17:56:34 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sun, 22 Jul 2007 11:56:34 -0400 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <46A30E92.5040400@v.loewis.de> <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> Message-ID: <200707221156.34992.fdrake@acm.org> On Sunday 22 July 2007, Guido van Rossum wrote: > Now I'm confused. Are we proposing that all our XML APIs read and > write encoded bytes, or are we proposing that they read and write > Unicode strings, leaving the encoding/decoding to the I/O stream? 
I > thought the latter was preferred but now it looks like you're arguing > for the former? XML should always be read as bytes, and the output of serialization should be bytes (the Py3k "bytes" type, or some immutable flavor of the same). The APIs that present data parsed from XML, and that accept input that should be serialized in XML, should use Unicode strings (the Py3k "str" type). -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> From martin at v.loewis.de Sun Jul 22 18:30:26 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 22 Jul 2007 18:30:26 +0200 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> <46A30E92.5040400@v.loewis.de> <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> Message-ID: <46A38622.1010505@v.loewis.de> Guido van Rossum wrote: > On 7/22/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: >> > Sure, normally XML is serialized to bytes, but it is also >> > serializable to unicode, and that's a useful feature to have (if >> > implementable). >> >> It's not reasonably implementable; users who have use cases >> will have to encode as UTF-8 first. > > Now I'm confused. Are we proposing that all our XML APIs read and > write encoded bytes, or are we proposing that they read and write > Unicode strings, leaving the encoding/decoding to the I/O stream? Unicode strings in both cases. I was not talking about writing at all; pyexpat only does reading (aka parsing). It returns Unicode strings, but processes bytes. > I > thought the latter was preferred but now it looks like you're arguing > for the former? The XML parser input stream should be byte-oriented.
XML has its own notion of input encoding (expressed in the XML declaration, <?xml...); it's the job of the parser to figure it out. Having the user provide a character-oriented stream to the parser is both inconvenient and error-prone: the application would have to figure out the encoding itself first. Regards, Martin From unknown_kev_cat at hotmail.com Sun Jul 22 21:51:51 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Sun, 22 Jul 2007 15:51:51 -0400 Subject: [Python-3000] PEP 368: Standard image protocol and class References: <cc93256f0706301518kd9fe7a7iaf0e9bd8e2e18edd@mail.gmail.com><cc93256f0706301800m20012379n84aff4ff3df88021@mail.gmail.com><740c3aec0707010534j4049efbchb2389bf61413c300@mail.gmail.com><cc93256f0707010959o44c77912sb989c68cf890b846@mail.gmail.com><f7pk8b$evu$1@sea.gmane.org> <46A16803.1020200@canterbury.ac.nz> Message-ID: <f80cgs$4h4$1@sea.gmane.org> "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote in message news:46A16803.1020200 at canterbury.ac.nz... > Joe Smith wrote: >> If the maintainers of most of the large packages that do imaging are >> willing >> to support this, >> and your code is good, I see absolutely no reason why this PEP would not >> be >> accepted. > > Something that bothers me about it a little is that > the core Python/C API seems like the wrong place to put > PyImage_* functions. > The document mentions delivering a version of the code that uses Python and C. That would be an extension module, correct? Couldn't those functions be in the C extension? The docs for 2.5 state that extension modules can provide a C API that other modules can use. (I'm assuming that has not changed.) That should work. After all, any extension that needs those functions will likely be importing the Image module on the Python side anyway, which would require that the C extension for the Image module be loaded. Or am I missing something? If I am not missing anything, this sounds like a minor implementation issue.
From greg.ewing at canterbury.ac.nz Mon Jul 23 01:47:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 23 Jul 2007 11:47:39 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070722015630.8F34C3A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> Message-ID: <46A3EC9B.4020507@canterbury.ac.nz> Phillip J. Eby wrote: > Well, if you're looking at *run-time*, then you can equally well dump > out the runtime contents of a generic function, I'm not talking about doing this *at* run time. I'm talking about reasoning about what the program will do, based on your knowledge of what the run-time type will be. With a normal method call, you can take an assumed run-time type, start at one end and follow things through step by step. That's not so easy with generic functions, for two reasons: (1) all of the arguments can potentially influence where the control flow goes, and (2) the overloading code can be anywhere in the program, not confined to the classes involved. I'm not saying this makes GFs impossible to use, but they do make the programmer's world considerably more complicated. You can't just brush these concerns off as being no worse than what OO already provides. > I don't think that anybody's saying that unrestricted use of dynamism is > good, or that it can't be abused. However, the potential for abuse is > no different. I'm not talking about abuse. I'm only talking about using GFs the way they're meant to be used. There's more to think about in the presence of GFs even without any abuse. 
-- Greg From greg.ewing at canterbury.ac.nz Mon Jul 23 01:48:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 23 Jul 2007 11:48:07 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> Message-ID: <46A3ECB7.9070504@canterbury.ac.nz> Phillip J. Eby wrote: > You seem to be saying that the ability to put things in different places > encourages disorganization. No. What I'm saying is that there are conflicting organisational requirements here. If the things being put in different places were independent and able to be reasoned about in isolation, everything would be fine. But they're not independent, because different overloadings of the same GF can interact, sometimes in subtle ways, and reasoning about their interactions is facilitated by being able to see all the relevant rules together. Even if the rules don't, in fact, interact, it can be hard to convince yourself of this without being sure that you simultaneously know what all the rules are at some point in time. 
-- Greg From greg.ewing at canterbury.ac.nz Mon Jul 23 01:59:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 23 Jul 2007 11:59:35 +1200 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> <46A30E92.5040400@v.loewis.de> <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> Message-ID: <46A3EF67.8020003@canterbury.ac.nz> Guido van Rossum wrote: > Now I'm confused. Are we proposing that all our XML APIs read and > write encoded bytes, or are we proposing that they read and write > Unicode strings, leaving the encoding/decoding to the I/O stream? The design of XML seems a bit braindamaged here, with the encoding specification being *inside* the XML itself, rather than being something specified externally. It's a bit like a self-opening letter that works by having a letter opener sealed inside the envelope. You can open it, but you have to open it first... If this part of the XML spec is to be taken literally, it would seem that we're forced to treat XML as bytes and not text... despite that XML is supposed to be a text format... aaargh!!! It might make sense to have an XML parser that took a unicode string containing the body of an XML message with the encoding line stripped off. 
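The bytes-in, str-out behaviour described earlier in this thread can be sketched with the standard xml.parsers.expat module in current Python 3. The sample document and the variable names are made up for illustration; the point is only that the parser is handed encoded bytes, reads the encoding from the XML declaration itself, and passes str objects to the handlers.

```python
import xml.parsers.expat

# Encoded bytes, as they would arrive from a file or socket.
# "\u00f6" is o-umlaut; in ISO-8859-1 it is the single byte 0xF6.
document = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>L\u00f6wis</name>'
).encode("iso-8859-1")

chunks = []
parser = xml.parsers.expat.ParserCreate()
# Character data may be delivered in several pieces, so collect and join.
parser.CharacterDataHandler = chunks.append
parser.Parse(document, True)  # True marks this as the final chunk of input

text = "".join(chunks)
```

The application never decodes anything itself; the declaration inside the byte stream is enough for the parser to produce the right characters.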
-- Greg From talin at acm.org Mon Jul 23 02:13:47 2007 From: talin at acm.org (Talin) Date: Sun, 22 Jul 2007 17:13:47 -0700 Subject: [Python-3000] str/unicode tests: pyexpat.c and read(n) In-Reply-To: <46A3EF67.8020003@canterbury.ac.nz> References: <3f1451f50707202112ye61385fifb4b2307f7fdf536@mail.gmail.com> <200707210025.11031.fdrake@acm.org> <59C0A7B2-B334-4984-AA8E-CA024B73553B@fuhm.net> <46A30E92.5040400@v.loewis.de> <ca471dc20707220843i345c0fdcld852fb9f26a97b04@mail.gmail.com> <46A3EF67.8020003@canterbury.ac.nz> Message-ID: <46A3F2BB.7060408@acm.org> Greg Ewing wrote: > Guido van Rossum wrote: >> Now I'm confused. Are we proposing that all our XML APIs read and >> write encoded bytes, or are we proposing that they read and write >> Unicode strings, leaving the encoding/decoding to the I/O stream? > > The design of XML seems a bit braindamaged here, with the > encoding specification being *inside* the XML itself, > rather than being something specified externally. It's > a bit like a self-opening letter that works by having > a letter opener sealed inside the envelope. You can > open it, but you have to open it first... All of the popular XML parsers have self-bootstrapping code that handles detection of the encoding, including auto-detection when no encoding is specified. So basically - don't worry about it, it's taken care of. -- Talin From pje at telecommunity.com Mon Jul 23 02:48:54 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Sun, 22 Jul 2007 20:48:54 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A3EC9B.4020507@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> Message-ID: <20070723004703.C3A903A40A9@sparrow.telecommunity.com> At 11:47 AM 7/23/2007 +1200, Greg Ewing wrote: >With a normal method call, you can take an assumed >run-time type, start at one end and follow things >through step by step. That's not so easy with >generic functions, for two reasons: (1) all of the >arguments can potentially influence where the >control flow goes, and (2) the overloading code >can be anywhere in the program, not confined to >the classes involved. In order to follow things through with normal method calls, you have to know where a class is in the program, implying that you either search for it, or have read enough of the program to figure it out. Which of these two things is different with generic functions? (Meanwhile, if you are "starting at one end" and "follow things through step-by-step", then you are going to step right through all the method definitions, regardless of whether they're standard methods or GF methods.) >I'm not saying this makes GFs impossible to use, >but they do make the programmer's world considerably >more complicated. Since they make my world simpler, I'd have to disagree with such a blanket statement. (I imagine the other developers who are using them would similarly disagree.) 
If your argument is that it might make it more difficult for you to know what's going on in a poorly-organized program, or make it easier to write a poorly-organized program, I might agree with you. But I disagree in the general case, because if you're going to be grepping for 'foo', it doesn't matter whether it's a method name or a generic function name -- you're still going to find all the definitions. >You can't just brush these concerns >off as being no worse than what OO already provides. Actually I can, and just did. Grep (or whatever global search tool your editor provides) is your friend. It ain't perfect, but it's just as much required (and equally imperfect) for global analysis of a traditionally-OO program. From pje at telecommunity.com Mon Jul 23 03:10:09 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 22 Jul 2007 21:10:09 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A3ECB7.9070504@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> Message-ID: <20070723010750.E27693A40A9@sparrow.telecommunity.com> At 11:48 AM 7/23/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > You seem to be saying that the ability to put things in different places > > encourages disorganization. > >No. What I'm saying is that there are conflicting organisational >requirements here. > >If the things being put in different places were independent >and able to be reasoned about in isolation, everything would >be fine. And that is in fact the *normal* case, even in GF use. 
You seem to be arguing that possible == probable, when it simply ain't so. > But they're not independent, because different >overloadings of the same GF can interact, sometimes in >subtle ways, and reasoning about their interactions is >facilitated by being able to see all the relevant rules >together. Yeah, and a program *can* be full of monkeypatching and change classes' __bases__ at runtime, but most people don't write their code that way, most of the time. The whole point of GF's is that they make things *simpler*, because you can usually avoid the sort of awkwardness that accompanies trying to do those things *without* GF's. (E.g. adapters, registries, and the like -- which are just as hard to analyze statically.) Consider, too, that merely combining super() with multiple inheritance can produce very surprising results in today's Python. You cannot statically predict what method super() is going to call by looking at the code of the class that calls it. (Because a subclass can effectively insert bases between the class and its explicit bases.) In other words, if you want to know what's going on in a Python program today with regard to today's method combination next_method() feature (which we call super()), you already have to grep for *all* the method definitions. And this little bit of extra complexity doesn't even have a method combination decorator to call out that subtlety to you; you have to look in the method *body*. Even next_method has to at least be listed in the argument list. :) >Even if the rules don't, in fact, interact, it can be hard >to convince yourself of this without being sure that you >simultaneously know what all the rules are at some point >in time. Well, as I said before, you can always run the program and dump out the entire list, complete with filenames and line numbers if you're so inclined. That's certainly what I'd do, were I investigating some code I was unfamiliar with. 
And fancier tools could certainly be created, if they were needed. Python already has each and every one of the things you're complaining about, as binary operators depend on multiple argument values (and you have to know *both* types in order to work out the result), the method being called by super() can't be statically predicted any more than next_method(), can, and you already have to use grep if you're going after global understanding of a large program. If anything, generic functions give you *better* tools to work with, as there is no trivial way to fire up a program and say, "show me all the classes that have a foo() method." (You could probably write something to find them using object.__subclasses__, though, at least for new-style types.) From talin at acm.org Mon Jul 23 09:07:51 2007 From: talin at acm.org (Talin) Date: Mon, 23 Jul 2007 00:07:51 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070723010750.E27693A40A9@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> Message-ID: <46A453C7.9070407@acm.org> Phillip J. Eby wrote: > If anything, generic functions give you *better* tools to work with, > as there is no trivial way to fire up a program and say, "show me all > the classes that have a foo() method." (You could probably write > something to find them using object.__subclasses__, though, at least > for new-style types.) I'm glad we're having this conversation - this is the kind of thing I want to hear more of. 
The intention of my posts is not to argue against GFs, but to challenge the proponents of GFs to explain themselves better. However, GFs are relatively non-controversial compared to method combinations and some of the other "advanced" stuff. Getting some kind of GF support into 3.0 is a near certainty at this point, if I have judged the situation rightly. So you need not waste too much ink defending them. I would focus more on the stuff that's built on top of GFs. -- Talin From ncoghlan at gmail.com Mon Jul 23 15:09:53 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 23 Jul 2007 23:09:53 +1000 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A3EC9B.4020507@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> Message-ID: <46A4A8A1.2020705@gmail.com> Greg Ewing wrote: > Phillip J. Eby wrote: >> I don't think that anybody's saying that unrestricted use of dynamism is >> good, or that it can't be abused. However, the potential for abuse is >> no different. > > I'm not talking about abuse. I'm only talking about using > GFs the way they're meant to be used. There's more to > think about in the presence of GFs even without any > abuse. GF's are already used all the time in Python - they're just called magic methods. 
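To make the "magic methods" point concrete, here is a minimal sketch (not from the thread; the `Metres` class and its values are hypothetical) of how `+` and subscripting already pick their implementation from the types of their operands:

```python
class Metres:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Metres):
            return Metres(self.value + other.value)
        return NotImplemented  # decline, so the right operand may try

    def __radd__(self, other):
        # Called for `other + self` when type(other) declines the operation.
        if isinstance(other, (int, float)):
            return Metres(other + self.value)
        return NotImplemented

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Metres) and self.value == other.value


d = {Metres(5): "five metres"}
a, b = 2, Metres(3)

# int.__add__ declines, so Metres.__radd__ runs; the dict lookup then
# uses Metres.__hash__ and Metres.__eq__ to find the key.
print(d[a + b])
```

Which code runs for `a + b` depends on the types of both operands, and none of it is visible at the call site.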
So I'll assume you're happy with the idea that if you want to analyse the expression: d[a+b] statically in current Python, you need to look for __add__ and __radd__ methods on both 'a' and 'b' (assuming you know their types), and __hash__ and __eq__ methods on whatever type is returned from that operation, and then a __getitem__ method on the type of 'd' (again, assuming you already know it). In all cases, the methods might not actually be on those particular types, but on one of their parent types. And if there are any invocations of super() in any of the method implementations, then you need to take the MRO into account as well. Of course, most of the time you wouldn't bother with that level of analysis unless you had reason to believe something was going wrong with that expression. Otherwise, you would assume that all of the magic methods involved were performing as expected. So what's different if we change that expression to use GF's instead?: get_mapping_item(d, binary_add(a, b)) Well, nothing really, except that instead of looking for the magic methods referred to above, we are instead looking for all overloads of get_mapping_item and binary_add. And the big benefit here is that whatever techniques you come up with for searching for those overloads will work for *any* GF implemented using the same tools, whereas the search for magic methods only works in some cases. For example, what would you need to search for to figure out what code copy.copy, copy.deepcopy, pickle.dumps or pickle.loads invoke for a given type? It's significantly more complicated than just looking for single magic methods. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Mon Jul 23 17:32:50 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 23 Jul 2007 11:32:50 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A453C7.9070407@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> Message-ID: <20070723153031.D00273A403D@sparrow.telecommunity.com> At 12:07 AM 7/23/2007 -0700, Talin wrote: >Phillip J. Eby wrote: > > If anything, generic functions give you *better* tools to work with, > > as there is no trivial way to fire up a program and say, "show me all > > the classes that have a foo() method." (You could probably write > > something to find them using object.__subclasses__, though, at least > > for new-style types.) > >I'm glad we're having this conversation - this is the kind of thing I >want to hear more of. The intention of my posts is not to argue against >GFs, but to challenge the proponents of GFs to explain themselves better. > >However, GFs are relatively non-controversial compared to method >combinations and some of the other "advanced" stuff. Well, as I just pointed out (and Greg has in the past, whether meaning to or not), method combination is pretty much isomorphic to method overriding and calling super()... except that it's easier to say what you really mean, instead of having to work around the fact that there's only one native precedence. For example, one pattern that sometimes comes up in writing methods is that you have a base class that always wants to do something *after* the subclass version of the method is called. 
To implement that without method combination, you have to split the method into two parts, one of which gets called by the other, and then tell everybody writing subclasses to only override the second method. With method combination and a generic function, you simply declare an @after method for the base type, and it'll get called after the normal methods for any subclasses. From pje at telecommunity.com Mon Jul 23 17:34:27 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 23 Jul 2007 11:34:27 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A4A8A1.2020705@gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <46A4A8A1.2020705@gmail.com> Message-ID: <20070723153210.EA5ED3A403D@sparrow.telecommunity.com> At 11:09 PM 7/23/2007 +1000, Nick Coghlan wrote: >And the big benefit here is that whatever techniques you come up with >for searching for those overloads will work for *any* GF implemented >using the same tools, By the way, this is one of the reasons why it would be good to have a relatively uniform API for generic functions in Python. From joe at bitworking.org Mon Jul 23 18:29:41 2007 From: joe at bitworking.org (Joe Gregorio) Date: Mon, 23 Jul 2007 12:29:41 -0400 Subject: [Python-3000] str/uni - test_pyexpat.py Message-ID: <3f1451f50707230929q586015ady464d09be3205c4bb@mail.gmail.com> I've submitted the following patch to fix test_pyexpat.py: http://www.python.org/sf/1759016 Part of the fix was to remove the 'returns_unicode' attribute. 
Should the updates to the documentation be added to this patch or submitted as a separate patch? Thanks, -joe -- Joe Gregorio http://bitworking.org From guido at python.org Mon Jul 23 19:43:45 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 23 Jul 2007 10:43:45 -0700 Subject: [Python-3000] str/uni - test_pyexpat.py In-Reply-To: <3f1451f50707230929q586015ady464d09be3205c4bb@mail.gmail.com> References: <3f1451f50707230929q586015ady464d09be3205c4bb@mail.gmail.com> Message-ID: <ca471dc20707231043k2137d85bp771bf074f943165d@mail.gmail.com> On 7/23/07, Joe Gregorio <joe at bitworking.org> wrote: > I've submitted the following patch to fix test_pyexpat.py: > > http://www.python.org/sf/1759016 > > Part of the fix was to remove the 'returns_unicode' attribute. > Should the updates to the documentation be added to this > patch or submitted as a separate patch? Thanks! I've submitted this as r56512. An all-in-one patch is fine. Since I'm not an expat expert, could someone else check the code? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Jul 24 01:58:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Jul 2007 11:58:10 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070723004703.C3A903A40A9@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> Message-ID: <46A54092.8030606@canterbury.ac.nz> Phillip J. 
Eby wrote: > In order to follow things through with normal method calls, you have to > know where a class is in the program, implying that you either search > for it, or have read enough of the program to figure it out. > > Which of these two things is different with generic functions? A class is defined in just one place, or a limited number of places if it has base classes. It also provides a convenient mental chunk under which to group all the operations that it implements. With GFs, there is no such obvious mental grouping. > if you're going to be > grepping for 'foo', it doesn't matter whether it's a method name or a > generic function name -- you're still going to find all the definitions. No, you're going to find every function whose name is 'foo', whether it's a method of the particular GF you have in mind or not. A considerably smarter tool than grep would be needed. > Since they make my world simpler, Are you talking about code that you've written yourself here, or do you find they make code written by others easier to understand as well? I'd have to disagree with such a blanket statement. > Grep (or whatever global search tool your > editor provides) is your friend. It ain't perfect, but it's just as > much required (and equally imperfect) for global analysis of a > traditionally-OO program. Most of the time I find that I don't need to perform global analysis of a traditionally-OO paradigm. The conceptual encapsulation provided by classes makes that unnecessary. GF breaks that encapsulation, or at least to my mind it seems to, and that makes me uncomfortable. -- Greg From pje at telecommunity.com Tue Jul 24 02:51:09 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 23 Jul 2007 20:51:09 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A54092.8030606@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> Message-ID: <20070724004850.2F2343A403D@sparrow.telecommunity.com> At 11:58 AM 7/24/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > In order to follow things through with normal method calls, you have to > > know where a class is in the program, implying that you either search > > for it, or have read enough of the program to figure it out. > > > > Which of these two things is different with generic functions? > >A class is defined in just one place, or a limited number >of places if it has base classes. ...and may be subclassed in an unlimited number of places. A generic function is defined in just one place, with a limited number of "generic" methods typically adjoining it, and may be extended in an unlimited number of places. Where's the difference? >It also provides a convenient mental chunk under which to >group all the operations that it implements. With GFs, there >is no such obvious mental grouping. The function itself is the grouping, in the same way that Python's operator.* functions are, or its built-in generics like len() and iter(). len() encapsulates the concept of "sequence", just as iter() encapsulates "iterable", and operator.add encapsulates "addition". 
These are conceptual categories that can't be defined by classes, except by conventions like ABCs -- and ISTR the ABCs PEP ran into trouble dealing with n-ary operators where n>1. > > if you're going to be > > grepping for 'foo', it doesn't matter whether it's a method name or a > > generic function name -- you're still going to find all the definitions. > >No, you're going to find every function whose name is 'foo', >whether it's a method of the particular GF you have in mind >or not. And this doesn't apply to normal methods? Come on. This is far *more* likely to be a problem with normal methods than it is with generic functions. For one thing, you can isolate your search to modules that import the function being overridden -- something you can't do with normal methods. > > Since they make my world simpler, > >Are you talking about code that you've written yourself here, >or do you find they make code written by others easier to >understand as well? Yes, I find code written using generics to be generally easier to understand, because it's possible to grasp a generic operator without needing to understand all the classes it can be applied to. For example, the generic function operator.add in Python defines the concept of addition, without me needing to understand all possible types that might be added together. And since all non-trivial Python code already uses generic functions, I find that they do in fact make all Python code simpler to understand. Indeed, they're a significant contributor to Python's ease-of-use. PEP 3124 seeks to expand that ease by allowing people to easily add their own generic functions, without needing to use workarounds like interfaces and adapters. >I'd have to disagree with such a blanket statement. The thing that you seem to keep missing in your analysis is that Python already *has* generic functions in the language specification, and has had them for what, 10, 15 years? 
If any of these problems you're talking about actually existed, I think we'd already know about them. Or are you arguing that functions like len() and iter() make programs harder to understand in all the same ways that you're saying that adding a standard GF library will?

> > Grep (or whatever global search tool your
> > editor provides) is your friend. It ain't perfect, but it's just as
> > much required (and equally imperfect) for global analysis of a
> > traditionally-OO program.
>
>Most of the time I find that I don't need to perform global
>analysis of a traditionally-OO paradigm. The conceptual
>encapsulation provided by classes makes that unnecessary.
>GF breaks that encapsulation, or at least to my mind it
>seems to, and that makes me uncomfortable.

That's because you're ignoring the GFs (and operators implemented as GF's) that you use all day long in even the most trivial of Python programs, let alone ones that use pickle or copy or pprint. Even computing a sum such as 2+2 involves a generic function in Python!

All PEP 3124 proposes to do is have a standard API for programmatically adding methods to generic functions, irrespective of how those functions are internally implemented. Its decorators are to generic functions what 'setattr()' is to objects: i.e., a generic function for manipulating their contents. It doesn't really "add generic functions to Python", because Python already had them.
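The idea of adding methods to an existing generic function can be sketched with `functools.singledispatch`, which appeared in later Python versions and dispatches on the type of the first argument only. It is used here purely as a stand-in: it is not the PEP 3124 API, and `describe` and `Stack` are hypothetical names.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback used when no more specific method is registered."""
    return "object"


@describe.register(int)
def _(obj):
    return "int"


@describe.register(list)
def _(obj):
    return "list of %d items" % len(obj)


class Stack:
    """A user-defined type that plugs into the built-in generic len()."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


# A method for Stack can be added from any module that imports describe(),
# without touching either describe() or the Stack class body.
describe.register(Stack, lambda obj: "stack of %d items" % len(obj))

print(describe(3))
print(describe([1, 2]))
print(describe(Stack("abc")))
print(describe(2.5))
```

The last call falls through to the default method, since nothing is registered for `float`. The `Stack.__len__` method plays the same role for the built-in `len()` that the `register` calls play for `describe()`.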
From greg.ewing at canterbury.ac.nz Tue Jul 24 02:54:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Jul 2007 12:54:38 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070723010750.E27693A40A9@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> Message-ID: <46A54DCE.8050205@canterbury.ac.nz> Phillip J. Eby wrote: > And that is in fact the *normal* case, even in GF use. You seem to be > arguing that possible == probable, when it simply ain't so. No, I'm saying that it's hard to convince myself that I'm not going to fall into one of the possible traps, even if it's an improbable one. When adding an overload to a GF, what methodology can I follow to ensure that my overload doesn't interact in an unfortunate way with another one somewhere else, perhaps one not written by me? If the only answer to that is "grep the entire program for things that might be other overloadings of this GF", that doesn't do much to allay my misgivings. > Yeah, and a program *can* be full of monkeypatching and change classes' > __bases__ at runtime, but most people don't write their code that way, > most of the time. The difference is that we're talking about a system specifically *designed* for carrying out monkeypatching. I don't care what you call it, it still looks like monkeypatching to me. 
The fundamental reason that we think monkeypatching is a bad idea is still there -- something done by one part of the program can affect the behaviour of another part with no obvious connection.

> The whole point of GF's is that they make things
> *simpler*, because you can usually avoid the sort of awkwardness that
> accompanies trying to do those things *without* GF's. (E.g. adapters,
> registries, and the like -- which are just as hard to analyze statically.)

Yes, but as far as I can see, GFs don't make these things much *easier* to analyse statically. Registries are awkward because of that difficulty, not because they're hard to implement.

> Consider, too, that merely combining super() with multiple inheritance
> can produce very surprising results in today's Python.

Yes, which is largely why I've personally never used super(), and regard it as a misfeature. I wouldn't mind if it went away completely.

> Well, as I said before, you can always run the program and dump out the
> entire list, complete with filenames and line numbers if you're so
> inclined.

Even once I've got such a list, I've then got to examine it carefully and try to nut out the implications of all the type relationships, before/after/around/discount/etc method combinations, and whathaveyou.

Yes, I know you already get some of this with multiple inheritance -- which is why I use it very rarely and very carefully. Also the complexities tend to be confined to the class doing the multiple inheriting and only need to be considered by the author of that class, not everyone who uses it.

And what if the program doesn't exist yet, because I'm still thinking about how to write it? Or it exists but isn't yet in a state where it can be run successfully?

> binary operators depend on multiple argument values (and you
> have to know *both* types in order to work out the result)

Yes, that can be a bit more complex, but at least the method that gets called has to belong to one class or the other. 
Also it's easier to follow nowadays with the auto-coercion system being phased out -- the left operand gets first say, and if it doesn't care, the right operand gets its say. -- Greg From pje at telecommunity.com Tue Jul 24 03:39:17 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 23 Jul 2007 21:39:17 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A54DCE.8050205@canterbury.ac.nz> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A54DCE.8050205@canterbury.ac.nz> Message-ID: <20070724013722.B2F5B3A403D@sparrow.telecommunity.com> At 12:54 PM 7/24/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > And that is in fact the *normal* case, even in GF use. You seem to be > > arguing that possible == probable, when it simply ain't so. > >No, I'm saying that it's hard to convince myself that I'm >not going to fall into one of the possible traps, even if >it's an improbable one. > >When adding an overload to a GF, what methodology can I >follow to ensure that my overload doesn't interact in an >unfortunate way with another one somewhere else, perhaps >one not written by me? What methodology can you follow that ensures that same thing when overriding a method in a subclass? > > Yeah, and a program *can* be full of monkeypatching and change classes' > > __bases__ at runtime, but most people don't write their code that way, > > most of the time. > >The difference is that we're talking about a system >specifically *designed* for carrying out monkeypatching. 
>I don't care what you call it, it still looks like
>monkeypatching to me.

You're not looking very hard, then. Is this excerpt from peak.rules.core monkeypatching?

    def implies(s1,s2):
        """Is s2 always true if s1 is true?"""
        return s1==s2

    from types import ClassType

    when(implies, (type, type) )(issubclass)
    when(implies, (ClassType, ClassType))(issubclass)
    when(implies, (type, ClassType))(issubclass)
    when(implies, (bool, bool))(lambda c1, c2: c2 or not c1)
    when(implies, (bool, object))(lambda c1, c2: not c1)
    when(implies, (object, bool))(lambda c1, c2: c2)

To me, this looks like a straightforward explanation of the implication rules between new-style and classic classes and boolean values. In fact, it seems much more straightforward to me, than writing out a big if-then tree whose *intent* I would have to discern from comments or the structure of the tree itself.

And if I had to discern the intent from the structure of the if tree, I would have no way of knowing whether the if's as written were in fact *correct*. I could mistake a bug for the author's intention in that case.

This is just one of the ways in which generic functions can be a superior tool for code understanding -- even in the complete absence of anything that can be described as "monkeypatching".

In truth, every interface or abstract base class is just another way of specifying a generic function. When you say that objects implementing a certain interface or protocol must have a 'foo' method, then any subclass may add a new *actual* implementation of 'foo' -- which is no different from adding a method to a generic function for a new type.

> The fundamental reason that
>we think monkeypatching is a bad idea is still there --
>something done by one part of the program can affect
>the behaviour of another part with no obvious connection.

It's FUD to try to associate monkeypatching with GF's. Generic functions have *none* of the bad effects of monkeypatching. Monkeypatching is bad because:

1. It's hard to see
2. Can't be safely composed (i.e. multiple monkeypatches) without introducing dependency order at best and bugs at worst

GF method additions are highly visible, and are safely composable, since more-specific methods override each other, and only truly independent methods can "float" as to execution order.

> > The whole point of GF's is that they make things
> > *simpler*, because you can usually avoid the sort of awkwardness that
> > accompanies trying to do those things *without* GF's. (E.g. adapters,
> > registries, and the like -- which are just as hard to analyze statically.)
>
>Yes, but as far as I can see, GFs don't make these things
>much *easier* to analyse statically. Registries are awkward
>because of that difficulty, not because they're hard
>to implement.
>...
>Yes, which is largely why I've personally never used super(),
>and regard it as a misfeature. I wouldn't mind if it went
>away completely.
>...
>Even once I've got such a list, I've then got to examine it
>carefully and try to nut out the implications of all the
>type relationships, before/after/around/discount/etc method
>combinations, and whathaveyou.
>...
>Yes, I know you already get some of this with multiple
>inheritance -- which is why I use it very rarely and very
>carefully. Also the complexities tend to be confined to the
>class doing the multiple inheriting and only need to be
>considered by the author of that class, not everyone who
>uses it.

Okay, well I guess the above statements all put you squarely in the "OO is too scary" category, so I'm not sure there's much else I can say that'd be useful. Keep in mind, however, that without a *standard* way of doing GF's, you will have to figure out *each* library or program's ad-hoc workarounds, instead of simply getting to know One Obvious Way of doing it.

>And what if the program doesn't exist yet, because I'm
>still thinking about how to write it? Or it exists but
>isn't yet in a state where it can be run successfully?
I don't understand what you're asking, here. > > binary operators depend on multiple argument values (and you > > have to know *both* types in order to work out the result) > >Yes, that can be a bit more complex, but at least the method >that gets called has to belong to one class or the other. >Also it's easier to follow nowadays with the auto-coercion >system being phased out -- the left operand gets first say, >and if it doesn't care, the right operand gets its say. Oh really? Are you sure about that? I was under the impression that under certain circumstances, if one object is "more specific" than the other (i.e., one is an instance of a subclass of the other's type), then that one gets first say. From guido at python.org Tue Jul 24 04:57:37 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 23 Jul 2007 19:57:37 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070724004850.2F2343A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> <20070724004850.2F2343A403D@sparrow.telecommunity.com> Message-ID: <ca471dc20707231957n2e58258v7b86b904803890dd@mail.gmail.com> On 7/23/07, Phillip J. Eby <pje at telecommunity.com> wrote: > At 11:58 AM 7/24/2007 +1200, Greg Ewing wrote: > >A class is defined in just one place, or a limited number > >of places if it has base classes. > > ...and may be subclassed in an unlimited number of places. > > A generic function is defined in just one place, with a limited > number of "generic" methods typically adjoining it, and may be > extended in an unlimited number of places. > > Where's the difference? 
Phillip, you seem to be dead set on providing a mathematical proof that the two are equivalent. Unfortunately, my gut tells me otherwise, and it doesn't want to listen to mathematical proofs. It's like proofs of God's (non-)existence. They don't work unless you're already in agreement with the outcome.

Fact is, many people, including me, are uncomfortable with the idea that a GF can be overridden *anywhere*. I am not letting that get in the way of acknowledging the value of GFs, but I don't think it's worth trying to take this fear away by attempting to prove that it is irrational. Irrationality, as the name implies, is not susceptible to rational argument.

I could come up with several reasons why it's not the same at all, but I'm not going to bother, because it'll just encourage you to deny it even harder. I think the argument (from both sides) is irrelevant; you're wasting your valuable time and energy that would be much better directed towards updating the PEP and writing an implementation.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com Tue Jul 24 05:42:23 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 23 Jul 2007 23:42:23 -0400
Subject: [Python-3000] pep 3124 plans
In-Reply-To: <ca471dc20707231957n2e58258v7b86b904803890dd@mail.gmail.com>
References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> <20070724004850.2F2343A403D@sparrow.telecommunity.com> <ca471dc20707231957n2e58258v7b86b904803890dd@mail.gmail.com>
Message-ID: <20070724034006.9B23E3A403D@sparrow.telecommunity.com>

At 07:57 PM 7/23/2007 -0700, Guido van Rossum wrote:
>On 7/23/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> > At 11:58 AM 7/24/2007 +1200, Greg Ewing wrote:
> > >A class is defined in just one place, or a limited number
> > >of places if it has base classes.
> >
> > ...and may be subclassed in an unlimited number of places.
> >
> > A generic function is defined in just one place, with a limited
> > number of "generic" methods typically adjoining it, and may be
> > extended in an unlimited number of places.
> >
> > Where's the difference?
>
>Phillip, you seem to be dead set on providing a mathematical proof
>that the two are equivalent.

Actually, I don't consider them equivalent; I consider each to have its own benefits and drawbacks. For example, GF declarations are more verbose than traditional methods, both at definition and call time. I just don't see that the things Greg is describing aren't equally applicable to traditional methods.

>I could come up with several reasons why it's not the same at all,

I'm genuinely curious as to what those are. If you have the chance to send them to me privately, I'll use them only to improve the PEP -- and I won't reply here or privately.
:) From talin at acm.org Tue Jul 24 06:44:00 2007 From: talin at acm.org (Talin) Date: Mon, 23 Jul 2007 21:44:00 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070724034006.9B23E3A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> <20070724004850.2F2343A403D@sparrow.telecommunity.com> <ca471dc20707231957n2e58258v7b86b904803890dd@mail.gmail.com> <20070724034006.9B23E3A403D@sparrow.telecommunity.com> Message-ID: <46A58390.8050408@acm.org> Phillip J. Eby wrote: > I just don't see that the things Greg is describing aren't equally > applicable to traditional methods. I wasn't going to get into this, but - since you asked :) The short form of the argument is that being able to overload any function as a generic function retroactively changes the implicit contract of what that function is. I agree with you that the problem of tracing down all of the places where a GF could dispatch to is analogous to tracing down all the places where a subclass could override a method. I would argue, though, that the "subclass analogy" that you have raised (which is a good one) corresponds most closely to the "explicit overload" GF design. In other words, when I create a base class, I know at the time I am writing it that because it is a class, its methods may be overloaded by someone later; And this knowledge is something that I factor in to the design of the class as I am writing it. (This foreknowledge is even more relevant in languages like C++ and Java where you can explicitly control on a per-method basis whether it is overridable or not. 
Regardless of what you think of these languages, I think we can all agree that programmers depend on the ability of the 'virtual' or 'final' keywords to control what subclass writers are able to do.) So I would say that writing a subclass is exactly like explicitly declaring a generic function: At the time I write the function, I know that people may come along later and overload that function, and I factor that knowledge into the design of the function as I am writing. By extension, I claim that your analogy breaks down when we start talking about adding overloads to a function that was not originally declared as generic. The reason is because in this case, the original author of the function did not expect that someone would be able to come along and overload it later. The ability to overload has always been part of the implicit contract of creating a class. It has never been part of the implicit contract of writing a function or method. So essentially, you are going back to all the functions that have ever been written and changing that implicit contract retroactively. (I'm not claiming that this can never be done, I'm explaining why you are getting this reaction from Greg and Guido.) In the case of __magic__ overloads, they too are explicitly declared: Only in this case, the explicit declaration either in the wrapper function (such as len(x)), or in some cases the 'declaration' is hidden inside the Python interpreter, but everyone knows about it (an example being __init__). More broadly, everyone knows in advance that a method having a name of the __magic__ form is intended to be a specialization of a general pattern. Now, it's not that hard, for a given function, to use grep to trace down the possible GFs that may be overloading that specific function. But that's only if you have foreknowledge of which functions are overloaded and which aren't. 
There are thousands of functions in a typical program (well, more accurately there are thousands of *methods*, and relatively few global functions). Suppose that 5% of them are overloaded, but you have no idea which 5% of them are. Trying to search for each of them to see what overloads there are is an N^2 problem, and very different, I would claim, than the situation with subclassing. (Although admittedly, this problem is really only acute when we talk about non-instance-method functions, since the implicit constraints on the 'self' parameter already limit the search space for possible overloads of instance methods. Although with adaption and bound methods, anything can act like an instance method, so I would guess all bets are off...) Now, it may be interesting to compare the implicit overloading with C++ overloaded methods. C++ also allow any function to be overloaded without explicitly declaring "overloadability", although the overload resolution happens in the compiler rather than in the runtime. But note, however, that this overloading is also carefully hemmed in, because only overloads that are actually in scope at the time of the call will actually take effect. So again, the search space for finding overloads is less than global, and you only need look in header files and scopes that are visible to the calling site, which will typically be a small fraction of the total source code for an application. So I hope that explains why overloading regular functions is perceived by some people to be of a different order than overloading class methods. 
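
To make the explicit-versus-retroactive distinction concrete, here is a minimal sketch. It assumes functools.singledispatch (a standard-library facility added to Python well after this thread, dispatching on the first argument only) as a stand-in for an explicitly declared generic function; it is not the PEP 3124 API, and the names plain_area, area and Square are invented for the illustration.

```python
from functools import singledispatch


def plain_area(shape):
    """An ordinary function: its author never offered an extension point."""
    return shape["width"] * shape["height"]


@singledispatch
def area(shape):
    """Explicitly declared generic: later overloads are part of the contract."""
    raise TypeError("no area() overload for %s" % type(shape).__name__)


class Square:
    """Hypothetical type defined after area() was written."""

    def __init__(self, side):
        self.side = side


@area.register(Square)
def _area_square(shape):
    return shape.side ** 2


# Only the explicitly declared function exposes a way to add overloads ...
can_extend_plain = hasattr(plain_area, "register")
can_extend_generic = hasattr(area, "register")

# ... and it can enumerate the types it has been overloaded for.
overloaded_types = sorted(cls.__name__ for cls in area.registry)
```

With an explicit declaration, the set of overloadable functions is known up front, and each one can list its own overloads, which is the foreknowledge the grep argument above depends on.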
-- Talin From ncoghlan at gmail.com Tue Jul 24 14:25:57 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 24 Jul 2007 22:25:57 +1000 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070724013722.B2F5B3A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A54DCE.8050205@canterbury.ac.nz> <20070724013722.B2F5B3A403D@sparrow.telecommunity.com> Message-ID: <46A5EFD5.80008@gmail.com> Phillip J. Eby wrote: > At 12:54 PM 7/24/2007 +1200, Greg Ewing wrote: >> > binary operators depend on multiple argument values (and you >>> have to know *both* types in order to work out the result) >> Yes, that can be a bit more complex, but at least the method >> that gets called has to belong to one class or the other. >> Also it's easier to follow nowadays with the auto-coercion >> system being phased out -- the left operand gets first say, >> and if it doesn't care, the right operand gets its say. > > Oh really? Are you sure about that? I was under the impression that > under certain circumstances, if one object is "more specific" than > the other (i.e., one is an instance of a subclass of the other's > type), then that one gets first say. Yep, and that feature stays even with __coerce__ going away. Otherwise subclasses would have a hell of a time getting their __r*__ methods to be invoked instead of the base classes. Cheers, Nick. 
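
The rule described above can be checked with a small sketch (the class names are made up for the illustration): when the right operand's class is a subclass of the left operand's class and overrides the reflected method, Python tries the right operand first; otherwise the left operand gets first say.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Derived(Base):
    # Overrides the reflected method, so it takes priority over Base.__add__
    # whenever a Derived instance is the right-hand operand of a Base.
    def __radd__(self, other):
        return "Derived.__radd__"


class Unrelated:
    def __radd__(self, other):
        return "Unrelated.__radd__"


# Unrelated right operand: the left operand gets first say.
left_first = Base() + Unrelated()

# Right operand is an instance of a subclass of the left operand's class:
# its reflected method is tried before Base.__add__.
subclass_first = Base() + Derived()
```
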
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From alexandre at peadrop.com Tue Jul 24 23:03:39 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 24 Jul 2007 17:03:39 -0400
Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes)
In-Reply-To: <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com>
References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> <ca471dc20707200744r4a8efc1an444d7f4f894ff23a@mail.gmail.com>
Message-ID: <acd65fa20707241403k132e0f8el4b76407afe3c8ef1@mail.gmail.com>

On 7/20/07, Guido van Rossum <guido at python.org> wrote:
> I definitely *don't* want to continue the old habit of having a slow
> and a fast module with different names; the experience with especially
> cPickle and cStringIO is that everyone believes their code is
> performance critical and hence uses the C version if it exists,
> thereby repeating the same idiom over and over.

Actually, I have been surprised myself that the C version of StringIO isn't always faster than the Python one. I have a testcase where using StringIO, instead of cStringIO, is ~20% faster.

-- Alexandre

From alexandre at peadrop.com Tue Jul 24 23:11:22 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 24 Jul 2007 17:11:22 -0400
Subject: [Python-3000] _heapq.c, etc. (was Re: Heaptypes)
In-Reply-To: <46A1DA0C.5010107@canterbury.ac.nz>
References: <ca471dc20707191525m5161b04x828e60efd17f6ffb@mail.gmail.com> <ca471dc20707191658s14d86b52x24b3a12524d9a97b@mail.gmail.com> <20070720010804.85A7.JCARLSON@uci.edu> <46A16906.7010005@canterbury.ac.nz> <46A1D3FF.4020000@v.loewis.de> <46A1DA0C.5010107@canterbury.ac.nz>
Message-ID: <acd65fa20707241411p50a68a4ayd803ca63b15f6c84@mail.gmail.com>

I am not sure if an official naming scheme is really necessary. For StringIO and BytesIO, I simply added a leading underscore to the Python implementations and rename them if the C implementations aren't available. So, the Python versions remain available for testing, or if someone needs them.

-- Alexandre

On 7/21/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Martin v. Löwis wrote:
> > You mean, like prefixing it with c, e.g. StringIO vs. cStringIO,
> > pickle vs. cPickle?
>
> Yes, but with an official scheme for deriving the names
> from the main package name, and also an understanding
> that these are implementation details to be used only
> when really necessary (hence the leading underscores).
>
> Considering Guido's comment about people gratuitously
> using the C versions, perhaps only the Python version
> should be made available as an official alternative.
> It's unlikely that people will gratuitously choose what
> they perceive to be a *slower* version of the module. :-)

From pje at telecommunity.com Tue Jul 24 23:56:28 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Jul 2007 17:56:28 -0400
Subject: [Python-3000] New section for PEP 3124
Message-ID: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com>

Taking the recent threads here, and Guido's comments off-list, I've attempted to put together a coherent response as a new section for the PEP, which I've checked in and included a copy of here.
If I have misrepresented anyone's argument, or if you spot something where you have a question or need a clarification, please let me know. Thanks. Overloading Usage Patterns ========================== In discussion on the Python-3000 list, the proposed feature of allowing arbitrary functions to be overloaded has been somewhat controversial, with some people expressing concern that this would make programs more difficult to understand. The general thrust of this argument is that one cannot rely on what a function does, if it can be changed from anywhere in the program at any time. Even though in principle this can already happen through monkeypatching or code substitution, it is considered poor practice to do so. However, providing support for overloading any function (or so the argument goes), is implicitly blessing such changes as being an acceptable practice. This argument appears to make sense in theory, but it is almost entirely mooted in practice for two reasons. First, people are generally not perverse, defining a function to do one thing in one place, and then summarily defining it to do the opposite somewhere else! The principal reasons to extend the behavior of a function that has *not* been specifically made generic are to: * Add special cases not contemplated by the original function's author, such as support for additional types. * Be notified of an action in order to cause some related operation to be performed, either before the original operation is performed, after it, or both. This can include general-purpose operations like adding logging, timing, or tracing, as well as application-specific behavior. None of these reasons for adding overloads imply any change to the intended default or overall behavior of the existing function, however. Just as a base class method may be overridden by a subclass for these same two reasons, so too may a function be overloaded to provide for such enhancements. 
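
As a rough illustration of the two reasons listed above, here is a sketch that again assumes functools.singledispatch as a stand-in (it is a later standard-library facility, dispatches on the first argument only, and is not the overloading API this PEP proposes). The names describe, Money and audit_log are hypothetical. The first overload adds a special case for a new type; the second is notified of calls for one type and then defers to the unchanged default behavior.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Default behavior, written without any particular overloads in mind."""
    return "object: %r" % (obj,)


class Money:
    """Hypothetical type that the author of describe() never anticipated."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency


# Reason 1: add a special case for an additional type.
@describe.register(Money)
def _describe_money(obj):
    return "%.2f %s" % (obj.amount, obj.currency)


# Reason 2: be notified of an action, then let the original operation run.
audit_log = []


@describe.register(int)
def _describe_int_logged(obj):
    audit_log.append(obj)
    return describe.dispatch(object)(obj)  # delegate to the default


plain = describe("spam")
special = describe(Money(3, "EUR"))
logged = describe(42)
```

Neither overload changes what describe() does for the types it already handled; each one only extends or observes it.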
In other words, universal overloading does not equal *arbitrary* overloading, in the sense that we need not expect people to randomly redefine the behavior of existing functions in illogical or unpredictable ways. If they did so, it would be no less of a bad practice than any other way of writing illogical or unpredictable code! However, to distinguish bad practice from good, it is perhaps necessary to clarify further what good practice for defining overloads *is*. And that brings us to the second reason why generic functions do not necessarily make programs harder to understand: overloading patterns in actual programs tend to follow very predictable patterns. (Both in Python and in languages that have no *non*-generic functions.) If a module is defining a new generic operation, it will usually also define any required overloads for existing types in the same place. Likewise, if a module is defining a new type, then it will usually define overloads there for any generic functions that it knows or cares about. As a result, the vast majority of overloads can be found adjacent to either the function being overloaded, or to a newly-defined type for which the overload is adding support. Thus, overloads are highly- discoverable in the common case, as you are either looking at the function or the type, or both. It is only in rather infrequent cases that one will have overloads in a module that contains neither the function nor the type(s) for which the overload is added. This would be the case if, say, a third-party created a bridge of support between one library's types and another library's generic function(s). In such a case, however, best practice suggests prominently advertising this, especially by way of the module name. For example, PyProtocols defines such bridge support for working with Zope interfaces and legacy Twisted interfaces, using modules called ``protocols.twisted_support`` and ``protocols.zope_support``. 
(These bridges are done with interface adapters, rather than generic functions, but the basic principle is the same.)

In short, understanding programs in the presence of universal overloading need not be any more difficult, given that the vast majority of overloads will either be adjacent to a function, or the definition of a type that is passed to that function.

And, in the absence of incompetence or deliberate intention to be obscure, the few overloads that are not adjacent to the relevant type(s) or function(s), will generally not need to be understood or known about outside the scope where those overloads are defined. (Except in the "support modules" case, where best practice suggests naming them accordingly.)

From guido at python.org Wed Jul 25 00:16:46 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 24 Jul 2007 15:16:46 -0700
Subject: [Python-3000] New section for PEP 3124
In-Reply-To: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com>
References: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com>
Message-ID: <ca471dc20707241516m6af5329ax1dada6129718d058@mail.gmail.com>

I'm confused why you spend so much time refuting the argument, given that you've already agreed to implement explicit decoration. Did I misread that? As I tried to indicate with my "gut feelings" argument this is not something that's up to rational argument. Also, the paragraph starting with "As a result, the vast majority of overloads can be found adjacent to..." sounds like it isn't a big loss to require explicit decoration. So I'm sticking with it.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed Jul 25 00:30:38 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 24 Jul 2007 15:30:38 -0700
Subject: [Python-3000] [Python-Dev] Py3k: error during 'make install' in py3k-struni ?
In-Reply-To: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com>
References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com>
Message-ID: <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com>

Yeah, that particular test is not yet working. (Fixes are welcome -- see http://wiki.python.org/moin/Py3kStrUniTests for how to help.) I believe I rigged "make install" to continue after this error -- did the rest of the install complete?
FWIW, a better place to discuss Py3k bleeding edge stuff is python-3000 at python.org. Sign up at the usual place. (I've CC'ed that list now -- please remove python-dev from followups.)

--Guido

On 7/24/07, Lisandro Dalcin <dalcinl at gmail.com> wrote:
> After checking out the py3k-struni branch, 'make install' issued this:
>
> Compiling /usr/local/python/3.0/lib/python3.0/test/test_tarfile.py ...
> *** SyntaxError: ('expected string, bytes found',
> ('/usr/local/python/3.0/lib/python3.0/test/test_tarfile.py', 0, 0,
> None))
>
> If this is expected to fail, please forget this.
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com Wed Jul 25 01:01:06 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 24 Jul 2007 19:01:06 -0400
Subject: [Python-3000] New section for PEP 3124
In-Reply-To: <ca471dc20707241516m6af5329ax1dada6129718d058@mail.gmail.com>
References: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com> <ca471dc20707241516m6af5329ax1dada6129718d058@mail.gmail.com>
Message-ID: <20070724225847.804E13A40A7@sparrow.telecommunity.com>

At 03:16 PM 7/24/2007 -0700, Guido van Rossum wrote:
>I'm confused why you spend so much time refuting the argument,

The purpose was to capture the arguments on both sides for posterity as part of the PEP.

> Also, the
>paragraph starting with "As a result, the vast majority of overloads
>can be found adjacent to..." sounds like it isn't a big loss to
>require explicit decoration.

Perhaps these two bits should have been closer together, then:

>On 7/24/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>>The principal reasons to extend the behavior of a
>>function that has *not* been specifically made generic are to:
>>
>>* Add special cases not contemplated by the original function's author,
>>  such as support for additional types.
>>
>>* Be notified of an action in order to cause some related operation to
>>  be performed, either before the original operation is performed,
>>  after it, or both. This can include general-purpose operations like
>>  adding logging, timing, or tracing, as well as application-specific
>>  behavior.
>>...
>>As a result, the vast majority of overloads can be found adjacent to
>>either the function being overloaded, or to a *newly-defined type for
>>which the overload is adding support*

Emphasis added to the last bit -- you can't add support for a newly-defined type to a previously-existing function that was not declared generic, unless arbitrary overloads are allowed.

For example, epydoc and pydoc contain functions that inspect the type of their arguments in order to decide what to do with them. While it's arguable that in a GF world, the authors *should* have made those functions overloadable, it isn't reasonable to expect everyone to rewrite their code to make everything overloadable, nor to correctly anticipate every function for which extension might be needed.

>As I tried to indicate with my "gut feelings" argument
>this is not something that's up to rational argument.

Of course... but the purpose was to document the experiences upon which *my* gut feelings are based, since that aspect of the PEP was not previously dealt with adequately.

In retrospect, the new section is weak mainly because it's phrased as a defense to a critique, rather than being written as a motivation for the proposed feature. So much for the attempt at a quick fix. :)

From dalcinl at gmail.com Wed Jul 25 01:14:03 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 24 Jul 2007 20:14:03 -0300
Subject: [Python-3000] [Python-Dev] Py3k: error during 'make install' in py3k-struni ?
In-Reply-To: <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com>
References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com>
Message-ID: <e7ba66e40707241614g2a5180b3o871f2bd73d57a695@mail.gmail.com>

On 7/24/07, Guido van Rossum <guido at python.org> wrote:
> I believe I rigged "make install" to continue after this error -- did
> the rest of the install complete?

Yes, it continued fine. BTW, are you interested in me sending the output of the Python test suite? I'm on a Fedora Core 6 box.

I could build my wrappers for MPI without problems (they were working against the p3yk branch, but I was warned that development has moved to py3k-struni).

However, I am having trouble with 'pickle', but perhaps this is only my fault: I just imported pickle instead of cPickle (and all this in a C extension module). I am doing that because cPickle seems not to be available in py3k-struni.

--
Lisandro Dalcín

From lists at cheimes.de Wed Jul 25 01:28:24 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 25 Jul 2007 01:28:24 +0200
Subject: [Python-3000] Py3k: error during 'make install' in py3k-struni ?
In-Reply-To: <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com>
References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com>
Message-ID: <f861uv$55h$1@sea.gmane.org>

Guido van Rossum wrote:
> On 7/24/07, Lisandro Dalcin <dalcinl at gmail.com> wrote:
>> After checking out the py3k-struni branch, 'make install' issued this:
>>
>> Compiling /usr/local/python/3.0/lib/python3.0/test/test_tarfile.py ...
>> *** SyntaxError: ('expected string, bytes found',
>> ('/usr/local/python/3.0/lib/python3.0/test/test_tarfile.py', 0, 0,
>> None))
>>
>> If this is expected to fail, please forget this.

It should not fail, but we know that it is failing. The module isn't easy to fix either. I spent about an hour on tarfile.py without any luck. It's a beast and seems to be rather old style code from the Python 1.x days.

Christian

From lists at cheimes.de Wed Jul 25 01:33:09 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 25 Jul 2007 01:33:09 +0200
Subject: [Python-3000] [Python-Dev] Py3k: error during 'make install' in py3k-struni ?
In-Reply-To: <e7ba66e40707241614g2a5180b3o871f2bd73d57a695@mail.gmail.com>
References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <e7ba66e40707241614g2a5180b3o871f2bd73d57a695@mail.gmail.com>
Message-ID: <f8627t$6a7$1@sea.gmane.org>

Lisandro Dalcin wrote:
> However, I am having trouble with 'pickle', but perhaps this is only
> my fault, I just imported pickle instead of cPickle (and all this in a
> C extension module). I am using that because cPickle seems to be not
> available in the py3k-struni.

The pickle module is broken as well. The cPickle module won't be available in Python 3000. The C optimizations of the cPickle module are going to be integrated into the pickle module during a Google Summer of Code project.
The new pickle code will be subclass-able (cPickle couldn't be subclassed) but will have optimized C code to speed up pickling and unpickling. Christian From greg.ewing at canterbury.ac.nz Wed Jul 25 03:43:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 25 Jul 2007 13:43:04 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070724004850.2F2343A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> <20070724004850.2F2343A403D@sparrow.telecommunity.com> Message-ID: <46A6AAA8.8080607@canterbury.ac.nz> Phillip J. Eby wrote: > ...and may be subclassed in an unlimited number of places. > > A generic function is defined in just one place, with a limited number > of "generic" methods typically adjoining it, and may be extended in an > unlimited number of places. > > Where's the difference? With GFs, even if you assume a particular runtime type, it can *still* be extended in an unlimited number of places. >> It also provides a convenient mental chunk under which to >> group all the operations that it implements. > > The function itself is the grouping, in the same way that Python's > operator.* functions are, or its built-in generics like len() and > iter(). But they're just syntactic sugar for calling methods of the objects involved, so those objects' classes have full control of what happens. If len() were a GF in your sense, the code implementing it for a given type could appear anywhere. 
>> No, you're going to find every function whose name is 'foo', >> whether it's a method of the particular GF you have in mind >> or not. > > And this doesn't apply to normal methods? Yes, it does to some extent, and that can be a nuisance. But in the first instance I'm not going to grep the whole program, just the file where the class is defined. If I don't find it there, I'll move on to the file defining its base class, etc. The first definition I find will be the relevant one. In other words, I have a search *path* through a structure that's reflected in the layout of the source files. GFs destroy that structure. (Multiple inheritance can mess this up a bit, but that just means multiple inheritance has problems, not that GFs are good.) > For one thing, you can isolate your search to modules that > import the function being overridden But I'll still get an unordered set of results that I'll have to sort through to find the most relevant method. > The thing that you seem to keep missing in your analysis is that Python > already *has* generic functions in the language specification, But only in a very restricted way -- so restricted that I've never even thought of them as GFs, but as just another way to write a method call. 
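To make that distinction concrete, here is a small sketch (the Stack class is invented purely for illustration, it is not taken from any library): len() does nothing more than find __len__ on the argument's class and call it, so the one definition inside the class body decides what happens.

```python
class Stack:
    """Toy container, used only to show how len() finds its implementation."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def __len__(self):
        # The one place that decides what len() means for a Stack.
        return len(self._items)


s = Stack()
s.push("a")
s.push("b")

# len(s) is shorthand for looking up __len__ on the class and calling it.
assert len(s) == type(s).__len__(s) == 2

# Special methods are looked up on the type, so an attribute set on the
# instance does not change what len(s) returns.
s.__len__ = lambda: 99
assert len(s) == 2
```

The search path described above therefore starts and ends with the class (and its bases); nothing registered elsewhere takes part in the dispatch.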
-- Greg From greg.ewing at canterbury.ac.nz Wed Jul 25 04:39:06 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 25 Jul 2007 14:39:06 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070724013722.B2F5B3A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A54DCE.8050205@canterbury.ac.nz> <20070724013722.B2F5B3A403D@sparrow.telecommunity.com> Message-ID: <46A6B7CA.3010308@canterbury.ac.nz> Phillip J. Eby wrote: > At 12:54 PM 7/24/2007 +1200, Greg Ewing wrote: >> When adding an overload to a GF, what methodology can I >> follow to ensure that my overload doesn't interact in an >> unfortunate way with another one somewhere else, perhaps >> one not written by me? > > What methodology can you follow that ensures that same thing when > overriding a method in a subclass? It's dead simple -- my method always wins. This is true even in the presence of multiple inheritance. Problems only arise there if I use super() to make an inherited method call (so I don't do that) or if other people multiply inherit from me -- in which case it's their problem, not mine. > Okay, well I guess the above statements all put you squarely in the "OO > is too scary" category, Certainly not -- I don't find OO scary at all. I wouldn't say that I find GFs "scary" either, only that I would use them cautiously and sparingly. I don't agree that there is no difference between the traditional OO model and GFs. 
With GFs there is less static structure that you can rely on. >> And what if the program doesn't exist yet, because I'm >> still thinking about how to write it? Or it exists but >> isn't yet in a state where it can be run successfully? > > I don't understand what you're asking, here. Don't you think it's important to be able to reason about the way a program will behave while you're in the process of designing it? If you haven't written runnable code yet, you can't run it to get a list of method overrides. > I was under the impression that > under certain circumstances, if one object is "more specific" than the > other (i.e., one is an instance of a subclass of the other's type), then > that one gets first say. You may be right. But the fact remains that the method called will be a method of one class or the other -- it can't be some function defined in an arbitrary place. -- Greg From greg.ewing at canterbury.ac.nz Wed Jul 25 06:06:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 25 Jul 2007 16:06:04 +1200 Subject: [Python-3000] New section for PEP 3124 In-Reply-To: <20070724225847.804E13A40A7@sparrow.telecommunity.com> References: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com> <ca471dc20707241516m6af5329ax1dada6129718d058@mail.gmail.com> <20070724225847.804E13A40A7@sparrow.telecommunity.com> Message-ID: <46A6CC2C.9040506@canterbury.ac.nz> Phillip J. Eby wrote: > At 03:16 PM 7/24/2007 -0700, Guido van Rossum wrote: > >>I'm confused why you spend so much time refuting the argument, > > The purpose was to capture the arguments on both sides for posterity > as part of the PEP. I don't think you need to spend so many words on the argument itself -- a one-paragraph summary would be enough. The parts outlining recommended practice for overloading look useful, though. This is the sort of thing I was after with my "What methodology can I follow?" question. But I would phrase it in an "It is recommended that..." 
kind of way rather than making assertions about what "can be found" in code (that doesn't exist yet in Python). > For example, epydoc and pydoc contain functions that inspect the type > of their arguments in order to decide what to with them. While it's > arguable that in a GF world, the authors *should* have made those > functions overloadable, it isn't reasonable to expect everyone to > rewrite their code to make everything overloadable, nor to correctly > anticipate every function for which extension might be needed. However, given the existence of GFs, someone writing something like pydoc, and coming to a point where he is about to write an if-else statement that switches on a type, perhaps ought to at least suspect that it might be a good idea to use a GF instead? -- Greg From guido at python.org Wed Jul 25 06:53:37 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Jul 2007 21:53:37 -0700 Subject: [Python-3000] [Python-Dev] Py3k: error during 'make install' in py3k-struni ? In-Reply-To: <f8627t$6a7$1@sea.gmane.org> References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <e7ba66e40707241614g2a5180b3o871f2bd73d57a695@mail.gmail.com> <f8627t$6a7$1@sea.gmane.org> Message-ID: <ca471dc20707242153p748e17eeh81d346097159b583@mail.gmail.com> What's broken about pickle on the struni branch? It passes all its tests. On 7/24/07, Christian Heimes <lists at cheimes.de> wrote: > Lisandro Dalcin wrote: > > However, I am having trouble with 'pickle', but perhaps this is only > > my fault, i just imported pickle instead of cPickle (and all this in a > > C extension module). I am using that because cPickle seems to be not > > available in the py3k-struni. > > The pickle module is broken as well. The cPickle module won't be > available in Python 3000. The C optimization of the cPickle module are > going to be integrated into the pickle module during a Google Summer of > Code project. 
The new pickle code will be subclass-able (cPickle > couldn't be subclassed) but will have optimized C code to speed up > pickling and unpickling. > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Wed Jul 25 06:54:33 2007 From: talin at acm.org (Talin) Date: Tue, 24 Jul 2007 21:54:33 -0700 Subject: [Python-3000] Latest revision of PEP 3101 Message-ID: <46A6D789.9060502@acm.org> You can find it in the usual place: http://www.python.org/dev/peps/pep-3101/ There are no changes to public APIs, the only changes are to the extension mechanism for custom formatting classes. Also, I've edited a lot of the text in order to improve the clarity of explanations and cut out excess verbiage. Comments are welcome as usual. -- Talin From guido at python.org Wed Jul 25 06:55:10 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Jul 2007 21:55:10 -0700 Subject: [Python-3000] Py3k: error during 'make install' in py3k-struni ? In-Reply-To: <f861uv$55h$1@sea.gmane.org> References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <f861uv$55h$1@sea.gmane.org> Message-ID: <ca471dc20707242155p665b8bd4ie0a065cbe31c09ba@mail.gmail.com> Tarfile is not from the 1.x days. But you're right, it's hairy. It also changes too much (e.g. between 2.4.1 and 2.4.3 a refactoring happened that also caused a new bug. The code has evolved quite a bit since then and is still evolving... 
;-( ) On 7/24/07, Christian Heimes <lists at cheimes.de> wrote: > Guido van Rossum wrote: > > On 7/24/07, Lisandro Dalcin <dalcinl at gmail.com> wrote: > >> Afther checking out the py3k-struni branch, 'make install' issued this: > >> > >> Compiling /usr/local/python/3.0/lib/python3.0/test/test_tarfile.py ... > >> *** SyntaxError: ('expected string, bytes found', > >> ('/usr/local/python/3.0/lib/python3.0/test/test_tarfile.py', 0, 0, > >> None)) > >> > >> If this is expected to fail, please forget this. > > It should not faild but we know that it is failing. The module isn't > easy to fix either. I spent about an hour on tarfile.py without any > luck. It's a beast and seems to be rather old style code from the Python > 1.x days. > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jyasskin at gmail.com Wed Jul 25 07:18:00 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Tue, 24 Jul 2007 22:18:00 -0700 Subject: [Python-3000] struni and the Apple four-character-codes Message-ID: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> I'm looking through a couple of the OS X tests and have run into the question of what to do with four-character codes. (For those of you who are unfamiliar with these, Apple, around the dawn of time, decided that C constants like 'TEXT' (yes, those are single quotes) would compile to the uint32_t 0x54455854 (or maybe the other-endian version of that) so they could use these as cheap-but-readable type identifiers.) In Python 2, these are represented as 'str' instances, which PyMac_GetOSType() in Python/mactoolboxglue.c converts to the native int format. For Python 3, right now they're str8's, but str8 is theoretically supposed to go away. 
Because they're binary constants displayed as ASCII, not unicode text, I initially thought that 'bytes' was the appropriate type. Unfortunately, bytes is mutable, and I think it makes sense to hash these constants (and some code in aepack.py does). So, I'm stuck and wanted to ask the list for input. I see 5 options: 1) Make these str instances so they're immutable and just rely on convention and runtime errors to keep them in ascii. 2) Make them bytes, and cast them to something else when you want to make them keys in a dict. 3) Keep them str8 and give up on getting rid of it. 4) Make bytes immutable, add a 'buffer' type which acts like the current bytes type, and make these codes instances of bytes. [probably impossible this late in the game] 5) Make a new hashable class for these codes which converts them to and from ints and bytes and becomes the general argument type for the apple platform interface. [Cleanest, but lots of work that I'm not volunteering to do] Thoughts? Jeffrey From talin at acm.org Wed Jul 25 07:29:21 2007 From: talin at acm.org (Talin) Date: Tue, 24 Jul 2007 22:29:21 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> Message-ID: <46A6DFB1.8050908@acm.org> Jeffrey Yasskin wrote: > I'm looking through a couple of the OS X tests and have run into the > question of what to do with four-character codes. (For those of you > who are unfamiliar with these, Apple, around the dawn of time, decided > that C constants like 'TEXT' (yes, those are single quotes) would > compile to the uint32_t 0x54455854 (or maybe the other-endian version > of that) so they could use these as cheap-but-readable type > identifiers.) In Python 2, these are represented as 'str' instances, > which PyMac_GetOSType() in Python/mactoolboxglue.c converts to the > native int format. 
For Python 3, right now they're str8's, but str8 is
> theoretically supposed to go away. Because they're binary constants
> displayed as ASCII, not unicode text, I initially thought that 'bytes'
> was the appropriate type. Unfortunately, bytes is mutable, and I think
> it makes sense to hash these constants (and some code in aepack.py
> does).
>
> So, I'm stuck and wanted to ask the list for input. I see 5 options:
> 1) Make these str instances so they're immutable and just rely on
> convention and runtime errors to keep them in ascii.
> 2) Make them bytes, and cast them to something else when you want to
> make them keys in a dict.
> 3) Keep them str8 and give up on getting rid of it.
> 4) Make bytes immutable, add a 'buffer' type which acts like the
> current bytes type, and make these codes instances of bytes. [probably
> impossible this late in the game]
> 5) Make a new hashable class for these codes which converts them to
> and from ints and bytes and becomes the general argument type for the
> apple platform interface. [Cleanest, but lots of work that I'm not
> volunteering to do]
>
> Thoughts?
> Jeffrey

Yeah. I like the idea of converting them to integers, but I don't think you need a special hash table class for that. Instead, create a wrapper class for the four character codes:

    TextId = FourCharId("TEXT")
    i = int(TextId)                  # Integer value
    s = str(TextId)                  # String representation
    some_map[TextId] = "Some Text"   # Can use as dict key

The wrapper class is an immutable class that handles conversion to integer form in the constructor, hashing, and has a __str__ and __repr__ method that produces the original input string. Then you can use that as a key to a regular dict.
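A minimal sketch of what such a wrapper might look like, under stated assumptions: FourCharId is only the illustrative name used above and not an existing class in the Mac modules, and the big-endian packing is assumed here rather than confirmed for every platform API.

```python
import struct


class FourCharId:
    """Immutable, hashable wrapper around a four-character code.

    Hypothetical helper: the name and the big-endian byte order are
    assumptions made for this sketch.
    """

    __slots__ = ("_code", "_value")

    def __init__(self, code):
        raw = code.encode("ascii")  # rejects non-ASCII input
        if len(raw) != 4:
            raise ValueError("a four-character code needs exactly 4 ASCII characters")
        # Bypass our own __setattr__, which blocks later modification.
        object.__setattr__(self, "_code", code)
        object.__setattr__(self, "_value", struct.unpack(">I", raw)[0])

    def __setattr__(self, name, value):
        raise AttributeError("FourCharId instances are immutable")

    def __int__(self):
        return self._value

    def __str__(self):
        return self._code

    def __repr__(self):
        return "FourCharId(%r)" % self._code

    def __eq__(self, other):
        if isinstance(other, FourCharId):
            return self._value == other._value
        return NotImplemented

    def __hash__(self):
        return hash(self._value)


TextId = FourCharId("TEXT")
some_map = {TextId: "Some Text"}
assert int(TextId) == 0x54455854
assert some_map[FourCharId("TEXT")] == "Some Text"
```

Equality and hashing both go through the integer value, so two wrappers built from the same code are interchangeable as dict keys.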
-- Talin From guido at python.org Wed Jul 25 07:44:17 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 24 Jul 2007 22:44:17 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <46A6DFB1.8050908@acm.org> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <46A6DFB1.8050908@acm.org> Message-ID: <ca471dc20707242244x2c423cfdmc22cde053a8563a9@mail.gmail.com> Make them bytes literals (some code already does this), and convert them to integers when they're needed to be used as hash keys. (Does this happen a lot? I haven't seen it yet.) I would endorse an API to create an int from a bytes array (or arbitrary length) and vice versa -- that would be a useful way to marshal long integers, too. There's probably already a C API to do something like that. --Guido On 7/24/07, Talin <talin at acm.org> wrote: > Jeffrey Yasskin wrote: > > I'm looking through a couple of the OS X tests and have run into the > > question of what to do with four-character codes. (For those of you > > who are unfamiliar with these, Apple, around the dawn of time, decided > > that C constants like 'TEXT' (yes, those are single quotes) would > > compile to the uint32_t 0x54455854 (or maybe the other-endian version > > of that) so they could use these as cheap-but-readable type > > identifiers.) In Python 2, these are represented as 'str' instances, > > which PyMac_GetOSType() in Python/mactoolboxglue.c converts to the > > native int format. For Python 3, right now they're str8's, but str8 is > > theoretically supposed to go away. Because they're binary constants > > displayed as ASCII, not unicode text, I initially thought that 'bytes' > > was the appropriate type. Unfortunately, bytes is mutable, and I think > > it makes sense to hash these constants (and some code in aepack.py > > does). > > > > So, I'm stuck and wanted to ask the list for input. 
I see 5 options: > > 1) Make these str instances so they're immutable and just rely on > > convention and runtime errors to keep them in ascii. > > 2) Make them bytes, and cast them to something else when you want to > > make them keys in a dict. > > 3) Keep them str8 and give up on getting rid of it. > > 4) Make bytes immutable, add a 'buffer' type which acts like the > > current bytes type, and make these codes instances of bytes. [probably > > impossible this late in the game] > > 5) Make a new hashable class for these codes which converts them to > > and from ints and bytes and becomes the general argument type for the > > apple platform interface. [Cleanest, but lots of work that I'm not > > volunteering to do] > > > > Thoughts? > > Jeffrey > > Yeah. I like the idea of converting them to integers, but I don't think > you need a special hash table class for that. Instead, create a wrapper > class for the four character codes: > > TextId = FourCharId("TEXT") > i = int(TextId) # Integer value > s = str(TextId) # String representation > some_map[TextId] = "Some Text" # Can use as dict key > > The wrapper class is an immutable class that handles conversion to > integer form in the constructor, hashing, and has a __str__ and __repr__ > method that produces the original input string. Then you can use that as > a key to a regular dict. > > -- Talin > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Wed Jul 25 10:54:38 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 25 Jul 2007 10:54:38 +0200 Subject: [Python-3000] [Python-Dev] Py3k: error during 'make install' in py3k-struni ? 
In-Reply-To: <ca471dc20707242153p748e17eeh81d346097159b583@mail.gmail.com> References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <e7ba66e40707241614g2a5180b3o871f2bd73d57a695@mail.gmail.com> <f8627t$6a7$1@sea.gmane.org> <ca471dc20707242153p748e17eeh81d346097159b583@mail.gmail.com> Message-ID: <46A70FCE.9030508@cheimes.de> Guido van Rossum wrote: > What's broken about pickle on the struni branch? It passes all its tests. > My brain ... :( I had some old code laying around. After svn revert + svn up all pickle tests are passing. *blush* Christian From jan.grant at bristol.ac.uk Wed Jul 25 11:03:08 2007 From: jan.grant at bristol.ac.uk (Jan Grant) Date: Wed, 25 Jul 2007 10:03:08 +0100 (BST) Subject: [Python-3000] pep 3124 plans In-Reply-To: <46A58390.8050408@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2B2BB.9070305@canterbury.ac.nz> <20070722015630.8F34C3A403A@sparrow.telecommunity.com> <46A3EC9B.4020507@canterbury.ac.nz> <20070723004703.C3A903A40A9@sparrow.telecommunity.com> <46A54092.8030606@canterbury.ac.nz> <20070724004850.2F2343A403D@sparrow.telecommunity.com> <ca471dc20707231957n2e58258v7b86b904803890dd@mail.gmail.com> <20070724034006.9B23E3A403D@sparrow.telecommunity.com> <46A58390.8050408@acm.org> Message-ID: <20070725095353.I54289@tribble.ilrt.bris.ac.uk> On Mon, 23 Jul 2007, Talin wrote: > Phillip J. Eby wrote: > > > I just don't see that the things Greg is describing aren't equally > > applicable to traditional methods. > > I wasn't going to get into this, but - since you asked :) > > The short form of the argument is that being able to overload any > function as a generic function retroactively changes the implicit > contract of what that function is. 
I don't think this is really true in programs written with good taste - i.e., it's no more true than in the OO case.

In the OO case, one might consider the class of an object to be closely associated with a contract describing its intended semantics (its type). If a function takes a parameter and is written expecting that it is passed an argument of type B (for some class B), then by subclassing B into a derived class, D, you _ought_ to be able to pass an instance of D to the same function, which should be able to use it regardless. That's what subclassing _means_: if D is a subclass of B, then all instances of D should behave appropriately and according to the intended semantics of B when used as a B.

Of course, it's perfectly possible to abuse subclassing to acquire implementation rather than the type/contract, but well-written* OO programs at least draw a clear distinction between those uses if they do it at all.

So, when you look at an OO program that makes extensive use of subclassing, you typically have a notion of what method calls should do at a broad semantic level because that notion is part of the contract implicit in the type.

Exactly the same is true with GFs. Yes, you can overload "add()" to mean "subtract" or "remove a random file" or "close all database connections" in certain cases. That's painfully flying in the face of the intended semantics of the function you're overloading; so, don't do that.

Cheers,
jan

* Excuse the unavoidably emotive terminology like "well-written". I know there are other views - I'm just arguing this one.

-- 
jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/
Tel +44 (0)117 3317661 http://ioctl.org/jan/
Spreadsheet through network. Oh yeah.
From ronaldoussoren at mac.com Wed Jul 25 12:04:49 2007 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 25 Jul 2007 03:04:49 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> Message-ID: <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> I've CC-ed Jack Jansen as he has maintained the Mac libraries for ages (from way before OS9 was shiny and new). On Wednesday, July 25, 2007, at 07:18AM, "Jeffrey Yasskin" <jyasskin at gmail.com> wrote: >I'm looking through a couple of the OS X tests and have run into the >question of what to do with four-character codes. (For those of you >who are unfamiliar with these, Apple, around the dawn of time, decided >that C constants like 'TEXT' (yes, those are single quotes) would >compile to the uint32_t 0x54455854 (or maybe the other-endian version >of that) so they could use these as cheap-but-readable type AFAIK the are always converted as big-endian values. >identifiers.) In Python 2, these are represented as 'str' instances, >which PyMac_GetOSType() in Python/mactoolboxglue.c converts to the >native int format. For Python 3, right now they're str8's, but str8 is >theoretically supposed to go away. Because they're binary constants >displayed as ASCII, not unicode text, I initially thought that 'bytes' >was the appropriate type. Unfortunately, bytes is mutable, and I think >it makes sense to hash these constants (and some code in aepack.py >does). > >So, I'm stuck and wanted to ask the list for input. I see 5 options: > 1) Make these str instances so they're immutable and just rely on >convention and runtime errors to keep them in ascii. > 2) Make them bytes, and cast them to something else when you want to >make them keys in a dict. > 3) Keep them str8 and give up on getting rid of it. 
> 4) Make bytes immutable, add a 'buffer' type which acts like the >current bytes type, and make these codes instances of bytes. [probably >impossible this late in the game] > 5) Make a new hashable class for these codes which converts them to >and from ints and bytes and becomes the general argument type for the >apple platform interface. [Cleanest, but lots of work that I'm not >volunteering to do] A 6th option is a subclass of int. It's constructor would accept a string containing the 4CC and the repr/str method would return the string representation of the code. IMHO this is the cleanest representation of 4CCs in Python because those codes are basicy a "neat" way to enter integer literals in C. This would also solve a problem that PyObjC users sometimes run into: Several C/Objective-C APIs return a dictionary where one of the values is an integer and where one would commonly use 4CCs to write down literals. This currently causes unexpected failures but would do the right thing with this option. Ronald From benji at benjiyork.com Wed Jul 25 14:45:29 2007 From: benji at benjiyork.com (Benji York) Date: Wed, 25 Jul 2007 08:45:29 -0400 Subject: [Python-3000] New section for PEP 3124 In-Reply-To: <20070724225847.804E13A40A7@sparrow.telecommunity.com> References: <20070724220011.4F63D3A40B2@sparrow.telecommunity.com> <ca471dc20707241516m6af5329ax1dada6129718d058@mail.gmail.com> <20070724225847.804E13A40A7@sparrow.telecommunity.com> Message-ID: <46A745E9.1050803@benjiyork.com> Phillip J. Eby wrote: > For example, epydoc and pydoc contain functions that inspect the type > of their arguments in order to decide what to with them. While it's > arguable that in a GF world, the authors *should* have made those > functions overloadable, it isn't reasonable to expect everyone to > rewrite their code to make everything overloadable, nor to correctly > anticipate every function for which extension might be needed. That makes me wonder. 
Will it be possible to monkeypatch a function to make it overloadable? I don't know if people will really do that, but at least the irony would be worth it. -- Benji York http://benjiyork.com From martin at v.loewis.de Wed Jul 25 19:26:45 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 25 Jul 2007 19:26:45 +0200 Subject: [Python-3000] Py3k: error during 'make install' in py3k-struni ? In-Reply-To: <f861uv$55h$1@sea.gmane.org> References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <f861uv$55h$1@sea.gmane.org> Message-ID: <46A787D5.2030000@v.loewis.de> >>> Afther checking out the py3k-struni branch, 'make install' issued this: >>> >>> Compiling /usr/local/python/3.0/lib/python3.0/test/test_tarfile.py ... >>> *** SyntaxError: ('expected string, bytes found', >>> ('/usr/local/python/3.0/lib/python3.0/test/test_tarfile.py', 0, 0, >>> None)) >>> >>> If this is expected to fail, please forget this. > > It should not faild but we know that it is failing. The module isn't > easy to fix either. I spent about an hour on tarfile.py without any > luck. It's a beast and seems to be rather old style code from the Python > 1.x days. Don't despair, the white knight in shining armor might not be too far away to safe you from the dragon :-) Seriously, Lars Gust?bel (CC'ed) has always been quite helpful in fixing whatever problem arise with the tarfile module. Lars, do you have a chance to look at porting the module to 3k/struni? Regards, Martin From lars at gustaebel.de Thu Jul 26 00:54:41 2007 From: lars at gustaebel.de (Lars =?iso-8859-15?Q?Gust=E4bel?=) Date: Thu, 26 Jul 2007 00:54:41 +0200 Subject: [Python-3000] Py3k: error during 'make install' in py3k-struni ? 
In-Reply-To: <46A787D5.2030000@v.loewis.de>
References: <e7ba66e40707241524v2ad90c51hcb3e63f0a8ea9b08@mail.gmail.com> <ca471dc20707241530r4957c856ved9bfa5a9a023c6@mail.gmail.com> <f861uv$55h$1@sea.gmane.org> <46A787D5.2030000@v.loewis.de>
Message-ID: <20070725225441.GA18002@core.g33x.de>

On Wed, Jul 25, 2007 at 07:26:45PM +0200, "Martin v. Löwis" wrote:
> >>> After checking out the py3k-struni branch, 'make install' issued this:
> >>>
> >>> Compiling /usr/local/python/3.0/lib/python3.0/test/test_tarfile.py ...
> >>> *** SyntaxError: ('expected string, bytes found',
> >>> ('/usr/local/python/3.0/lib/python3.0/test/test_tarfile.py', 0, 0,
> >>> None))
> >>>
> >>> If this is expected to fail, please forget this.
> >
> > It should not fail, but we know that it is failing. The module isn't
> > easy to fix either. I spent about an hour on tarfile.py without any
> > luck. It's a beast and seems to be rather old style code from the Python
> > 1.x days.
>
> Don't despair, the white knight in shining armor might not be too far
> away to save you from the dragon :-)

Yea, I heard that call :-)

> Seriously, Lars Gustäbel (CC'ed) has always been quite helpful in
> fixing whatever problems arise with the tarfile module.
>
> Lars, do you have a chance to look at porting the
> module to 3k/struni?

I just took a quick look at it, but I could not reproduce the above error message. However, it is obvious that tarfile.py is completely unusable in py3k-struni and it is my job to fix it, which seems to me far from trivial at the moment. I have to catch up with py3k development as well, so I am not able to estimate when the job will be done.

-- 
Lars Gustäbel
lars at gustaebel.de

The world is a tragedy to those who feel, but a comedy to those who think.
(Horace Walpole) From greg.ewing at canterbury.ac.nz Thu Jul 26 04:07:55 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 26 Jul 2007 14:07:55 +1200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> Message-ID: <46A801FB.7020300@canterbury.ac.nz> Jeffrey Yasskin wrote: > Apple, around the dawn of time, decided > that C constants like 'TEXT' (yes, those are single quotes) would > compile to the uint32_t 0x54455854 They weren't C constants originally, they were Pascal constants, and it made sense at the time given the way the Pascal compiler they were using handled string literals. They also worked okay as multi-character char literals in the early C compilers used on the Mac. It's unfortunate that gcc gets persnickety about those. > I initially thought that 'bytes' > was the appropriate type. Unfortunately, bytes is mutable, and I think > it makes sense to hash these constants (and some code in aepack.py > does). Is this another indication that we should have an immutable version of the bytes type? -- Greg From ronaldoussoren at mac.com Thu Jul 26 07:52:34 2007 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 26 Jul 2007 07:52:34 +0200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <46A801FB.7020300@canterbury.ac.nz> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <46A801FB.7020300@canterbury.ac.nz> Message-ID: <3B03B169-1321-48BC-A14D-7B32592BE159@mac.com> On 26 Jul, 2007, at 4:07, Greg Ewing wrote: > >> I initially thought that 'bytes' >> was the appropriate type. Unfortunately, bytes is mutable, and I >> think >> it makes sense to hash these constants (and some code in aepack.py >> does). > > Is this another indication that we should have an > immutable version of the bytes type? No. 
Four-character-constants are *not* strings or byte arrays, they are integer literals.

Ronald

From dalcinl at gmail.com Thu Jul 26 19:06:47 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 26 Jul 2007 14:06:47 -0300
Subject: [Python-3000] interaction between locals, builtins and except clause
Message-ID: <e7ba66e40707261006s71826ce8p73194134992697f3@mail.gmail.com>

Porting to Py3K, I modified a function like the following, using a
trick to keep it working in Py2.x.

    def __iter__(self):
        if self == _mpi.INFO_NULL:
            return
        try: range = xrange
        except: pass
        nkeys = _mpi.info_get_nkeys(self)
        for nthkey in range(nkeys):
            yield _mpi.info_get_nthkey(self, nthkey)

However, I got this in my unit tests (running with py3k):

    ERROR: testPyMethods (__main__.TestInfo)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "tests/unittest/test_info.py", line 123, in testPyMethods
        for key in INFO:
      File "/u/dalcinl/lib/python/mpi4py/MPI.py", line 937, in __iter__
        for nthkey in range(nkeys):
    UnboundLocalError: local variable 'range' referenced before assignment

I am not completely sure if this is expected (it is, regarding the
implementation, but perhaps not regarding Python as a language), so
I post this for your consideration.

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From eopadoan at altavix.com Thu Jul 26 19:27:11 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O.
Padoan)
Date: Thu, 26 Jul 2007 14:27:11 -0300
Subject: [Python-3000] interaction between locals, builtins and except clause
In-Reply-To: <e7ba66e40707261006s71826ce8p73194134992697f3@mail.gmail.com>
References: <e7ba66e40707261006s71826ce8p73194134992697f3@mail.gmail.com>
Message-ID: <dea92f560707261027k2462a9beu10b5c4b3398a458d@mail.gmail.com>

On 7/26/07, Lisandro Dalcin <dalcinl at gmail.com> wrote:
> Porting to Py3K, I modified a function like the following, using a
> trick to keep it working in Py2.x.
>
>     def __iter__(self):
>         if self == _mpi.INFO_NULL:
>             return
>         try: range = xrange
>         except: pass
>         nkeys = _mpi.info_get_nkeys(self)
>         for nthkey in range(nkeys):
>             yield _mpi.info_get_nthkey(self, nthkey)
>
> However, I got this in my unit tests (running with py3k):
>
> ERROR: testPyMethods (__main__.TestInfo)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "tests/unittest/test_info.py", line 123, in testPyMethods
>     for key in INFO:
>   File "/u/dalcinl/lib/python/mpi4py/MPI.py", line 937, in __iter__
>     for nthkey in range(nkeys):
> UnboundLocalError: local variable 'range' referenced before assignment
>
>
> I am not completely sure if this is expected (it is, regarding the
> implementation, but perhaps not regarding Python as a language), so
> I post this for your consideration.
>

Python thinks range is local, because you referenced it, even if an
error occurred. Use 'global range' at the top of the file.

But, as I understand, you are trying to target both Python 2.x and 3
with the same code, using tricks like this one. I think that, even if
you succeed, the resulting code will be quite unmaintainable.
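
A minimal sketch of the behaviour under discussion, written for Python 3. The function names `broken` and `with_global` are made up for illustration. Strictly speaking, the compiler treats `range` as local because the function assigns to it somewhere in its body, whether or not that assignment ever runs. A `global` declaration only helps when it sits inside that function, and on Python 2 it would also rebind the module-level name as a side effect.

```python
def broken(n):
    # The assignment below makes ``range`` a local name for the whole
    # function body, hiding the builtin of the same name.
    try:
        range = xrange  # NameError on Python 3, so nothing gets bound
    except NameError:
        pass
    return list(range(n))  # the local ``range`` is still unbound here


def with_global(n):
    # Declaring the name global restores the usual global -> builtin lookup.
    global range
    try:
        range = xrange  # on Python 2 this rebinds the module-level name
    except NameError:
        pass
    return list(range(n))
```

Calling `broken(3)` on Python 3 raises UnboundLocalError, while `with_global(3)` falls through to the builtin and returns a list.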
--
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt

From dalcinl at gmail.com Thu Jul 26 20:34:51 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Thu, 26 Jul 2007 15:34:51 -0300
Subject: [Python-3000] interaction between locals, builtins and except clause
In-Reply-To: <dea92f560707261027k2462a9beu10b5c4b3398a458d@mail.gmail.com>
References: <e7ba66e40707261006s71826ce8p73194134992697f3@mail.gmail.com>
	<dea92f560707261027k2462a9beu10b5c4b3398a458d@mail.gmail.com>
Message-ID: <e7ba66e40707261134x6389bb6bi17d31f0cce323a96@mail.gmail.com>

On 7/26/07, Eduardo EdCrypt O. Padoan <eopadoan at altavix.com> wrote:
> Python thinks range is local, because you referenced it, even if an
> error occurred. Use 'global range' at the top of the file.

Yes, I understand all that. I just wanted to know if the result of
this locals + except + globals interaction was right, even in the case
of errors. Now I know that it is OK. Thanks!

> But, as I understand, you are trying to target both Python 2.x and 3
> with the same code, using tricks like this one. I think that, even if
> you succeed, the resulting code will be quite unmaintainable.

Well, my code is not so complex on the Python side (I'm still
supporting Python 2.3). And I do not want to put things in globals,
just for maintenance reasons. In the end, I've used the following
trick:

    try: _range = xrange
    except: _range = range
    for i in _range(n): pass

I think it should work in any 2.x and 3K. Is this right? Perhaps this
trick could be used for some automated conversion tool targeting
backward compatibility with the 2.x series.

Regards, and thanks again for your clarification.
-- Lisandro Dalc?n --------------- Centro Internacional de M?todos Computacionales en Ingenier?a (CIMEC) Instituto de Desarrollo Tecnol?gico para la Industria Qu?mica (INTEC) Consejo Nacional de Investigaciones Cient?ficas y T?cnicas (CONICET) PTLC - G?emes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From greg.ewing at canterbury.ac.nz Fri Jul 27 02:12:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 27 Jul 2007 12:12:10 +1200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <3B03B169-1321-48BC-A14D-7B32592BE159@mac.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <46A801FB.7020300@canterbury.ac.nz> <3B03B169-1321-48BC-A14D-7B32592BE159@mac.com> Message-ID: <46A9385A.2060508@canterbury.ac.nz> Ronald Oussoren wrote: > No. Four-character-constants are *not* strings or byte arrays, they are > integer literals. Well, in Pascal they were character arrays -- it was only when they switched to C that they became ints. Conceptually they're still the same thing. Python isn't C, and doesn't have to be bound by C's limitations. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From aholkner at cs.rmit.edu.au Fri Jul 27 02:45:35 2007 From: aholkner at cs.rmit.edu.au (Alex Holkner) Date: Fri, 27 Jul 2007 10:45:35 +1000 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <46A9385A.2060508@canterbury.ac.nz> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <46A801FB.7020300@canterbury.ac.nz> <3B03B169-1321-48BC-A14D-7B32592BE159@mac.com> <46A9385A.2060508@canterbury.ac.nz> Message-ID: <46A9402F.1040807@cs.rmit.edu.au> Greg Ewing wrote: > Ronald Oussoren wrote: >> No. 
Four-character-constants are *not* strings or byte arrays, they are >> integer literals. > > Well, in Pascal they were character arrays -- it > was only when they switched to C that they became > ints. Conceptually they're still the same thing. > Python isn't C, and doesn't have to be bound by > C's limitations. Regardless of what the situation was in Pascal's time, they are currently integers. The order of bytes in the array would need to be adjusted depending on the machine endianness to be correct. The C argument passing convention is different for byte arrays than for integers (presumably the most common use of these constants is to use them with Apple libraries). Different constants within the same enumeration are sometimes specified as decimal integers, and sometimes as character constants. For example, the QTNewGWorldFromPtr function uses an enumeration which includes k32BGRAPixelFormat, defined as 'BGRA', and k32ARGBPixelFormat, defined as 0x20. Providing a convenience str() method may be handy, but the internal representation must be integer. Alex. From jyasskin at gmail.com Fri Jul 27 05:38:45 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 26 Jul 2007 20:38:45 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> Message-ID: <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> I've sent the patch as http://python.org/sf/1761465 using Guido's suggestion of using bytes, but I do philosophically prefer Talin's and Ronald's suggestions. On 7/25/07, Ronald Oussoren <ronaldoussoren at mac.com> wrote: > I've CC-ed Jack Jansen as he has maintained the Mac libraries for ages (from way before OS9 was shiny and new). Did you mean to add him to this thread? 
> On Wednesday, July 25, 2007, at 07:18AM, "Jeffrey Yasskin" <jyasskin at gmail.com> wrote: > > 5) Make a new hashable class for these codes which converts them to > >and from ints and bytes and becomes the general argument type for the > >apple platform interface. [Cleanest, but lots of work that I'm not > >volunteering to do] > > A 6th option is a subclass of int. It's constructor would accept a string containing the 4CC and the repr/str method would return the string representation of the code. IMHO this is the cleanest representation of 4CCs in Python because those codes are basicy a "neat" way to enter integer literals in C. Na?ve question: How does that differ from option (5)? Just the isinstance() behavior? I said this would take a lot of work because I think the new type needs to be implemented in C to be returned from PyMac_GetOSType(), and it seemed like a bigger API change than just switching to bytes, but it turns out that switching to bytes isn't particularly trivial either when you have to cast for every use in a dict, so maybe the new type would be easier. > This would also solve a problem that PyObjC users sometimes run into: Several C/Objective-C APIs return a dictionary where one of the values is an integer and where one would commonly use 4CCs to write down literals. This currently causes unexpected failures but would do the right thing with this option. I don't think that option (6) by itself solves with that particular problem. If you call str() on one of those ints, you'd just get a number, which is different from what would happen if you call str() on the 4CC type. It might help though by handling comparisons correctly. On 7/26/07, Alex Holkner <aholkner at cs.rmit.edu.au> wrote: > Providing a convenience str() method may be handy, but the internal > representation must be integer. Where are you getting "must"? In current python, they're 'str' instances, not ints. 
The C interface between python and apple code converts, of course, but python can do whatever makes the most sense to us. -- Namast?, Jeffrey Yasskin From guido at python.org Fri Jul 27 07:07:52 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 26 Jul 2007 22:07:52 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> Message-ID: <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> On 7/26/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > I've sent the patch as http://python.org/sf/1761465 using Guido's > suggestion of using bytes, but I do philosophically prefer Talin's and > Ronald's suggestions. I've checked in what you submitted; at this point I take whatever I can get if it makes unit tests pass. :-) I'm not so sure that the "philosophically optimal" solution is all that practical. After all we could have done that before, but we didn't -- we used strings, because that's the most convenient way to spell them in Python code, and (nearly) all APIs that take or return these are C code which can do whatever it wants. We could use Unicode strings where in the past we used 8-bit strings, but that would be somewhat nasty when there's ever one of these codes that's not pure ASCII -- we'd have to worry about encoding them properly. So I'm happy with byte strings and the occasional helper to convert these to strings or ints when using them as keys. (Personally I'd like to use strings for the keys since {'TEXT': 'stuff'} is a lot clearer than {1413830740: 'stuff'} when encountered in a debugging session.) 
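
The "occasional helper" mentioned above might look roughly like the following sketch. The names `fourcc_to_int` and `fourcc_to_str` are hypothetical and are not taken from the patch; the `mac_roman` default is an assumption about which encoding suits these codes. The integer form packs the four bytes big-endian, which matches the 'TEXT' == 0x54455854 example from the start of the thread.

```python
def fourcc_to_int(code):
    """Return the 32-bit integer that C sees for a 4-byte code such as b'TEXT'."""
    if len(code) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return int.from_bytes(code, "big")


def fourcc_to_str(code, encoding="mac_roman"):
    """Return a text form of the code, convenient as a readable dict key."""
    if len(code) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return bytes(code).decode(encoding)


# Either form is hashable and can key a dict, even if the raw buffer is mutable.
by_name = {fourcc_to_str(bytearray(b"TEXT")): "stuff"}
by_number = {fourcc_to_int(bytearray(b"TEXT")): "stuff"}
```

The string-keyed dict is the one that stays readable in a debugging session; the integer-keyed one matches what the C side actually passes around.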
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jyasskin at gmail.com Fri Jul 27 07:27:42 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 26 Jul 2007 22:27:42 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070723153031.D00273A403D@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> Message-ID: <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> On 7/23/07, Phillip J. Eby <pje at telecommunity.com> wrote: > For example, one pattern that sometimes comes up in writing methods > is that you have a base class that always wants to do something > *after* the subclass version of the method is called. To implement > that without method combination, you have to split the method into > two parts, one of which gets called by the other, and then tell > everybody writing subclasses to only override the second method. > > With method combination and a generic function, you simply declare an > @after method for the base type, and it'll get called after the > normal methods for any subclasses. I've totally wanted to do that, so your email gave me a surge of hope, but I think the generic function approach is actually worse here (unless I'm totally misunderstanding). 
I think this would look like:

    class MyBase:
        @generic
        def mymethod(self):
            default_stuff(self)
        @after(mymethod)
        def later(self):
            more_stuff(self)

    class MyDerived(MyBase):
        mymethod = MyBase.mymethod
        @overload
        def mymethod(self):
            other_stuff(self)

And if MyDerived just overrides mymethod normally, it replaces the
@after part too.

So instead of telling people to override this other method (with the
benefit that immigrants from other languages are already used to this
inconvenience), you have to tell them to stick two extra lines in
front of their overrides. If they forget, the penalty is the same.
What's the benefit from generic functions here?

--
Namasté,
Jeffrey Yasskin

From ronaldoussoren at mac.com Fri Jul 27 08:11:28 2007
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 27 Jul 2007 08:11:28 +0200
Subject: [Python-3000] struni and the Apple four-character-codes
In-Reply-To: <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com>
References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com>
	<4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com>
	<5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com>
Message-ID: <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com>

On 27 Jul, 2007, at 5:38, Jeffrey Yasskin wrote:

> I've sent the patch as http://python.org/sf/1761465 using Guido's
> suggestion of using bytes, but I do philosophically prefer Talin's and
> Ronald's suggestions.
>
> On 7/25/07, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>> I've CC-ed Jack Jansen as he has maintained the Mac libraries for
>> ages (from way before OS9 was shiny and new).
>
> Did you mean to add him to this thread?

Yes, but I obviously failed to actually add this address to the CC
list :-(.
> > >> On Wednesday, July 25, 2007, at 07:18AM, "Jeffrey Yasskin" <jyasskin at gmail.com >> > wrote: >> > 5) Make a new hashable class for these codes which converts them to >> >and from ints and bytes and becomes the general argument type for >> the >> >apple platform interface. [Cleanest, but lots of work that I'm not >> >volunteering to do] >> >> A 6th option is a subclass of int. It's constructor would accept a >> string containing the 4CC and the repr/str method would return the >> string representation of the code. IMHO this is the cleanest >> representation of 4CCs in Python because those codes are basicy a >> "neat" way to enter integer literals in C. > > Na?ve question: How does that differ from option (5)? Just the > isinstance() behavior? That's the only change, but it is an important one. To reiterate: 4- character-codes in C are numeric literals and it would be best if Python reflected that fact to avoid surprises. 4-character-codes are definitely not arrays of bytes. One example of an API that returns a dictionary where some keys refer to values that are commonly encoded using 4-character-codes is - [NSFileManager fileAttributesAtPath:traverseLink]. > > > I said this would take a lot of work because I think the new type > needs to be implemented in C to be returned from PyMac_GetOSType(), > and it seemed like a bigger API change than just switching to bytes, > but it turns out that switching to bytes isn't particularly trivial > either when you have to cast for every use in a dict, so maybe the new > type would be easier. The new type would be easier and the API change isn't too bad. I don't think you'd have to implement this type in C, there just needs to be a hook to tell the C code about this type. > > >> This would also solve a problem that PyObjC users sometimes run >> into: Several C/Objective-C APIs return a dictionary where one of >> the values is an integer and where one would commonly use 4CCs to >> write down literals. 
This currently causes unexpected failures but
>> would do the right thing with this option.
>
> I don't think that option (6) by itself solves that particular
> problem. If you call str() on one of those ints, you'd just get a
> number, which is different from what would happen if you call str() on
> the 4CC type. It might help though by handling comparisons correctly.

That's what I meant by "the right thing": code would just work except
for not printing a nice human-readable value. As you don't have to do
that a lot anyway that's not really a problem.

Ronald

From jyasskin at gmail.com Fri Jul 27 08:21:36 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 26 Jul 2007 23:21:36 -0700
Subject: [Python-3000] struni and the Apple four-character-codes
In-Reply-To: <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com>
References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com>
	<4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com>
	<5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com>
	<ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com>
Message-ID: <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com>

On 7/26/07, Guido van Rossum <guido at python.org> wrote:
> (Personally
> I'd like to use strings for the keys since {'TEXT': 'stuff'} is a lot
> clearer than {1413830740: 'stuff'} when encountered in a debugging
> session.)

Good argument. You now have a patch that uses str() instead of b2i().
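
For comparison, the int-subclass option discussed earlier in this thread could be sketched as below, in present-day Python 3. The class name `FourCC` and its `ENCODING` attribute are hypothetical, and `mac_roman` is an assumed choice of encoding; nothing here is taken from the actual patch. The point of the sketch is that such an object compares and hashes like the plain integer C uses, while still printing as the readable code.

```python
class FourCC(int):
    """Hypothetical int subclass for Apple four-character codes."""

    ENCODING = "mac_roman"  # assumption: text encoding used for the codes

    def __new__(cls, code):
        # Accept either the textual code ('TEXT') or an existing integer.
        if isinstance(code, str):
            raw = code.encode(cls.ENCODING)
            if len(raw) != 4:
                raise ValueError("expected exactly four characters")
            code = int.from_bytes(raw, "big")
        return super().__new__(cls, code)

    def __str__(self):
        # Unpack the 32-bit value back into its four characters.
        return int(self).to_bytes(4, "big").decode(self.ENCODING)

    def __repr__(self):
        return "FourCC(%r)" % str(self)


attributes = {FourCC("TEXT"): "stuff"}
same_entry = attributes[0x54455854]  # a plain int finds the same key
```

Because equality and hashing are inherited from int, a dictionary filled by C code with plain integers can be queried with `FourCC` keys and vice versa; only the printed form differs.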
From ncoghlan at gmail.com Fri Jul 27 12:20:09 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 27 Jul 2007 20:20:09 +1000
Subject: [Python-3000] interaction between locals, builtins and except clause
In-Reply-To: <e7ba66e40707261134x6389bb6bi17d31f0cce323a96@mail.gmail.com>
References: <e7ba66e40707261006s71826ce8p73194134992697f3@mail.gmail.com>
	<dea92f560707261027k2462a9beu10b5c4b3398a458d@mail.gmail.com>
	<e7ba66e40707261134x6389bb6bi17d31f0cce323a96@mail.gmail.com>
Message-ID: <46A9C6D9.3050405@gmail.com>

Lisandro Dalcin wrote:
> I think it should work in any 2x and 3K. Is this right? Perhaps this
> trick could be used for some automated conversion tool targeting
> backward compatibility with 2.x series.

The backwards compatible version looks like this:

    def __iter__(self):
        if self == _mpi.INFO_NULL:
            return
        nkeys = _mpi.info_get_nkeys(self)
        for nthkey in xrange(nkeys):
            yield _mpi.info_get_nthkey(self, nthkey)

The 2to3 converter will automatically convert the xrange() call to a
range() call for the Py3k version.

If you want to persist in trying to get the same code running on both
Py3k and 2.x without using the 2->3 converter, then I suggest
segregating it all into a compatibility module and do:

    from py3k_compat import _range

The try/except code to determine how to set _range would then occur
only once, regardless of the number of places where you used it.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Fri Jul 27 14:55:00 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jul 2007 05:55:00 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com> Message-ID: <ca471dc20707270555y4f271cd0j53b999b7d1f827cf@mail.gmail.com> On 7/26/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > On 7/26/07, Guido van Rossum <guido at python.org> wrote: > > (Personally > > I'd like to use strings for the keys since {'TEXT': 'stuff'} is a lot > > clearer than {1413830740: 'stuff'} when encountered in a debugging > > session.) > > Good argument. You now have a patch that uses str() instead of b2i(). Hmm... That only works as long as the bytes are ASCII. Is that a problem for aepack? Or are all its 4CCs chosen from a well-known set that's all-ASCII? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Jul 27 17:25:18 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jul 2007 08:25:18 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> Message-ID: <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> On 7/26/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > On 7/23/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > For example, one pattern that sometimes comes up in writing methods > > is that you have a base class that always wants to do something > > *after* the subclass version of the method is called. To implement > > that without method combination, you have to split the method into > > two parts, one of which gets called by the other, and then tell > > everybody writing subclasses to only override the second method. > > > > With method combination and a generic function, you simply declare an > > @after method for the base type, and it'll get called after the > > normal methods for any subclasses. > > I've totally wanted to do that, so your email gave me a surge of hope, > but I think the generic function approach is actually worse here > (unless I'm totally misunderstanding). 
I think this would look like:
>
>     class MyBase:
>         @generic
>         def mymethod(self):
>             default_stuff(self)
>         @after(mymethod)
>         def later(self):
>             more_stuff(self)
>
>     class MyDerived(MyBase):
>         mymethod = MyBase.mymethod
>         @overload
>         def mymethod(self):
>             other_stuff(self)
>
> And if MyDerived just overrides mymethod normally, it replaces the
> @after part too.
>
> So instead of telling people to override this other method (with the
> benefit that immigrants from other languages are already used to this
> inconvenience), you have to tell them to stick two extra lines in
> front of their overrides. If they forget, the penalty is the same.
> What's the benefit from generic functions here?

The more I think about this example (and the one in the PEP from which
it's derived), the more I think this part is a frontal collision
between two paradigms, and needs a lot more thought put into it.

The need to say "mymethod = MyBase.mymethod" in the subclass, and the
subtle disasters that happen if this is forgotten, and the rules that
guide what code is called in what order when a subclass method is
called with a type signature for which a better match exists in the
base class, not to mention the combination of super() with
next_method, all make me think that this part of the PEP is not ready
for public consumption just yet.

Basic GFs, great. Before/after/around, good. Other method
combinations, fine. But GFs in classes and subclassing? Not until we
have a much better design.
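
For reference, the conventional workaround that the quoted discussion compares against (splitting the method in two and asking subclasses to override only the second part) can be sketched in plain Python as follows. The underscore-prefixed method names, the `calls` list and the `Careless` class are made up for illustration; the sketch uses no PEP 3124 API.

```python
class MyBase:
    def __init__(self):
        self.calls = []

    def mymethod(self):
        # Public entry point; subclasses are asked NOT to override this.
        self._mymethod_impl()
        self._later()  # the "after" step always runs last

    def _mymethod_impl(self):
        # The part subclasses are expected to override.
        self.calls.append("default_stuff")

    def _later(self):
        self.calls.append("more_stuff")


class MyDerived(MyBase):
    def _mymethod_impl(self):
        self.calls.append("other_stuff")


class Careless(MyBase):
    def mymethod(self):
        # Overrides the wrong method, so the "after" step is silently lost.
        self.calls.append("other_stuff")
```

`MyDerived().mymethod()` runs the override and then the base-class step, whereas `Careless().mymethod()` skips the base-class step entirely, which is the "penalty" both approaches share when the convention is forgotten.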
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From bob at redivi.com Fri Jul 27 17:47:24 2007 From: bob at redivi.com (Bob Ippolito) Date: Fri, 27 Jul 2007 08:47:24 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <ca471dc20707270555y4f271cd0j53b999b7d1f827cf@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com> <ca471dc20707270555y4f271cd0j53b999b7d1f827cf@mail.gmail.com> Message-ID: <6a36e7290707270847k556e6eb5mda4aeb83fa919499@mail.gmail.com> On 7/27/07, Guido van Rossum <guido at python.org> wrote: > On 7/26/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > > On 7/26/07, Guido van Rossum <guido at python.org> wrote: > > > (Personally > > > I'd like to use strings for the keys since {'TEXT': 'stuff'} is a lot > > > clearer than {1413830740: 'stuff'} when encountered in a debugging > > > session.) > > > > Good argument. You now have a patch that uses str() instead of b2i(). > > Hmm... That only works as long as the bytes are ASCII. Is that a > problem for aepack? Or are all its 4CCs chosen from a well-known set > that's all-ASCII? 4CCs are not all ASCII, they're Mac OS Roman. This is why in some of the C header files the constants turned into integers. -bob From dalcinl at gmail.com Fri Jul 27 17:49:14 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 27 Jul 2007 12:49:14 -0300 Subject: [Python-3000] docstring for dict.values Message-ID: <e7ba66e40707270849i5b640a8fxdff08c1f776947cc@mail.gmail.com> Why the docstrings for 'dict.values' says "a set-like object ..." ?? 
>>> list(dict(a=1,b=1,c=1).values()) [1, 1, 1] -- Lisandro Dalc?n From guido at python.org Fri Jul 27 18:19:24 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jul 2007 09:19:24 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <6a36e7290707270847k556e6eb5mda4aeb83fa919499@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com> <ca471dc20707270555y4f271cd0j53b999b7d1f827cf@mail.gmail.com> <6a36e7290707270847k556e6eb5mda4aeb83fa919499@mail.gmail.com> Message-ID: <ca471dc20707270919n30c4788eldf2d444ab15378b9@mail.gmail.com> On 7/27/07, Bob Ippolito <bob at redivi.com> wrote: > 4CCs are not all ASCII, they're Mac OS Roman. This is why in some of > the C header files the constants turned into integers. Good to know! We should use that when converting them to Unicode. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Fri Jul 27 18:20:30 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Fri, 27 Jul 2007 12:20:30 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> Message-ID: <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> At 08:25 AM 7/27/2007 -0700, Guido van Rossum wrote: >Basic GFs, great. Before/after/around, good. Other method >combinations, fine. But GFs in classes and subclassing? Not until we >have a much better design. Sounds reasonable to me. The only time I actually use them in classes myself is to override existing generic functions that live outside the class, like ones from an Interface or a standalone generic. The main reason I included GFs-in-classes examples in the PEP is because of the "dynamic overloading" meme. In C++, Java, etc., you can use overloading in methods, so I wanted to show how you could do that, if you wanted to. I suspect that the simplest way to fix this in Py3K is with an "overloading" metaclass, as it would not even require any decorators. That is, you could provide a custom dictionary that records every definition of a function with the same name. The actual metaclass creation process would check for a method of the same name in a base class, and if it's generic (or the current class added more than one method), put a generic method in. 
With a little bit of work, you could probably determine whether you could get away with dropping the genericness in a subclass; specifically, if all the subclass-defined methods are "more specific" than all base class methods, then there's no need for them to be in the same generic function, unless they make next_method calls. Thus, you'll end up with normal methods except where absolutely necessary. Such a metaclass would make method overloads look pretty much the same as in OO languages with static overloading. The only remaining hole at that point would be reconciling super() and next_method. If you're using this metaclass, super() is only meaningful if you're not in the same generic function as is used in your base, while next_method() is only meaningful if you *are*. I don't know of any quick way to fix that, but I'll give it some thought. From guido at python.org Fri Jul 27 18:33:32 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jul 2007 09:33:32 -0700 Subject: [Python-3000] docstring for dict.values In-Reply-To: <e7ba66e40707270849i5b640a8fxdff08c1f776947cc@mail.gmail.com> References: <e7ba66e40707270849i5b640a8fxdff08c1f776947cc@mail.gmail.com> Message-ID: <ca471dc20707270933x6e156629pce0a88d1f67138b0@mail.gmail.com> On 7/27/07, Lisandro Dalcin <dalcinl at gmail.com> wrote: > Why the docstrings for 'dict.values' says "a set-like object ..." ?? > > >>> list(dict(a=1,b=1,c=1).values()) > [1, 1, 1] Oops, that's a bug! Thanks for reporting. Committed revision 56584. 
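For reference, the distinction behind the docstring fix can be checked directly. The sketch below reflects the behaviour of current Python 3 releases (the 2007 branch may have differed in detail); the helper name `try_plain_set` is made up for the illustration.

```python
from collections.abc import Set

d = dict(a=1, b=1, c=1)

# values() may hold duplicates, so the view is not set-like.
values_are_set_like = isinstance(d.values(), Set)
# keys() and items() are set-like views.
keys_are_set_like = isinstance(d.keys(), Set)
items_are_set_like = isinstance(d.items(), Set)

# Set-like views support set operations directly.
common = d.items() & {("a", 1), ("z", 9)}


def try_plain_set(view):
    """Return set(view), or the TypeError raised for unhashable members."""
    try:
        return set(view)
    except TypeError as exc:
        return exc


unhashable_result = try_plain_set(dict(a=[]).items())
```

The last call shows that an items view can be set-like and still fail to convert to a regular set when a value is unhashable.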
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From dalcinl at gmail.com Fri Jul 27 18:51:46 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 27 Jul 2007 13:51:46 -0300 Subject: [Python-3000] docstring for dict.values In-Reply-To: <ca471dc20707270933x6e156629pce0a88d1f67138b0@mail.gmail.com> References: <e7ba66e40707270849i5b640a8fxdff08c1f776947cc@mail.gmail.com> <ca471dc20707270933x6e156629pce0a88d1f67138b0@mail.gmail.com> Message-ID: <e7ba66e40707270951q12f36de1o85534d7dc77620b1@mail.gmail.com> It seems the same applies to dict.items() ... $ set(dict(a=[]).items()) >>> set(dict(a=[]).items()) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'list' On 7/27/07, Guido van Rossum <guido at python.org> wrote: > On 7/27/07, Lisandro Dalcin <dalcinl at gmail.com> wrote: > > Why the docstrings for 'dict.values' says "a set-like object ..." ?? > > > > >>> list(dict(a=1,b=1,c=1).values()) > > [1, 1, 1] > > Oops, that's a bug! Thanks for reporting. -- Lisandro Dalc?n From guido at python.org Fri Jul 27 18:55:27 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Jul 2007 09:55:27 -0700 Subject: [Python-3000] docstring for dict.values In-Reply-To: <e7ba66e40707270951q12f36de1o85534d7dc77620b1@mail.gmail.com> References: <e7ba66e40707270849i5b640a8fxdff08c1f776947cc@mail.gmail.com> <ca471dc20707270933x6e156629pce0a88d1f67138b0@mail.gmail.com> <e7ba66e40707270951q12f36de1o85534d7dc77620b1@mail.gmail.com> Message-ID: <ca471dc20707270955m5b68b09cie0253fd9af703ff1@mail.gmail.com> That's a totally different issue. The result of .items() is a set. But if it contains an unhashable object you can't convert it to a regular set. --Guido On 7/27/07, Lisandro Dalcin <dalcinl at gmail.com> wrote: > It seems the same applies to dict.items() ... 
> > $ set(dict(a=[]).items()) > >>> set(dict(a=[]).items()) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unhashable type: 'list' > > > On 7/27/07, Guido van Rossum <guido at python.org> wrote: > > On 7/27/07, Lisandro Dalcin <dalcinl at gmail.com> wrote: > > > Why the docstrings for 'dict.values' says "a set-like object ..." ?? > > > > > > >>> list(dict(a=1,b=1,c=1).values()) > > > [1, 1, 1] > > > > Oops, that's a bug! Thanks for reporting. > > -- > Lisandro Dalc?n > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Fri Jul 27 18:56:00 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 28 Jul 2007 02:56:00 +1000 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> Message-ID: <46AA23A0.7090807@gmail.com> Phillip J. Eby wrote: > I don't know of any quick way to fix that, but I'll give it some thought. In the meantime, do we want the standard metaclass to complain when it finds generic functions in class bodies, or to automatically treat them as static methods? Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Sat Jul 28 03:19:44 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 28 Jul 2007 13:19:44 +1200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> Message-ID: <46AA99B0.5070105@canterbury.ac.nz> Guido van Rossum wrote: > We could use Unicode strings where in the past we used 8-bit strings, > but that would be somewhat nasty when there's ever one of these codes > that's not pure ASCII Since this is a Mac-specific thing (and Classic-originated at that), I think you can be pretty sure that any non-ASCII value is to be interpreted according to the Macintosh character set, if it's meant to be a character at all. So I would suggest using the Macintosh encoding when converting these to and from unicode. -- Greg From greg.ewing at canterbury.ac.nz Sat Jul 28 03:41:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 28 Jul 2007 13:41:49 +1200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com> Message-ID: <46AA9EDD.4010405@canterbury.ac.nz> Ronald Oussoren wrote: > To reiterate: 4-character-codes in C are numeric literals I'm still not convinced about that. 
The major use of 4-char codes is in data structures stored on disk. I'd be surprised if they're really stored in the opposite order on little endian architectures, since then you wouldn't be able to use a file system written from a PPC on an Intel or vice versa. It's much more likely that the C macros used to handle 4-char codes change depending on the architecture, so that the order in memory stays the same. So I stand by my opinion that *conceptually* they're still 4-character arrays, and the fact that they're declared as ints in C is just a kludge to work around limitations of C. > One example of an API that returns a dictionary where some keys refer > to values that are commonly encoded using 4-character-codes is - > [NSFileManager fileAttributesAtPath:traverseLink]. Blarg. Well, I think Cocoa is braindamaged in the way it handles this. It should convert them to/from some friendlier type automatically. Note that if you use a specialised type for this in Python, it still won't help with APIs like this that munge them in with other types polymorphically. You'll still have to do an explicit conversion in your Python code. So it doesn't really matter whether the representation in Python is a unicode string, byte string or something special. 
-- Greg From martin at v.loewis.de Sat Jul 28 11:10:34 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 28 Jul 2007 11:10:34 +0200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <46AA99B0.5070105@canterbury.ac.nz> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> <46AA99B0.5070105@canterbury.ac.nz> Message-ID: <46AB080A.7030208@v.loewis.de> > Since this is a Mac-specific thing (and Classic-originated at > that), I think you can be pretty sure that any non-ASCII value > is to be interpreted according to the Macintosh character set, > if it's meant to be a character at all. Please understand that there is no such thing as "the Macintosh character set". Somebody else gave already the correct answer: these codes are commonly interpreted as MacRoman. Regards, Martin From tomerfiliba at gmail.com Sat Jul 28 17:06:50 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 28 Jul 2007 17:06:50 +0200 Subject: [Python-3000] optimizing [x]range Message-ID: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> currently, testing for "x in xrange(y)" is an O(n) operation. since xrange objects (which would become range in py3k) are not real lists, there's no reason that __contains__ be an O(n). it can easily be made into an O(1) operation. 
here's a demo code (it should be trivial to implement this in CPython)

class xxrange(object):
    def __init__(self, *args):
        if len(args) == 1:
            self.start, self.stop, self.step = (0, args[0], 1)
        elif len(args) == 2:
            self.start, self.stop, self.step = (args[0], args[1], 1)
        elif len(args) == 3:
            self.start, self.stop, self.step = args
        else:
            raise TypeError("invalid number of args")

    def __iter__(self):
        i = self.start
        while i < self.stop:
            yield i
            i += self.step

    def __contains__(self, num):
        # stop is exclusive, matching __iter__ (assumes step > 0)
        if num < self.start or num >= self.stop:
            return False
        return (num - self.start) % self.step == 0


print list(xxrange(7))           # [0, 1, 2, 3, 4, 5, 6]
print list(xxrange(0, 7, 2))     # [0, 2, 4, 6]
print list(xxrange(1, 7, 2))     # [1, 3, 5]
print 98 in xxrange(100)         # True
print 98 in xxrange(0, 100, 2)   # True
print 99 in xxrange(0, 100, 2)   # False
print 98 in xxrange(1, 100, 2)   # False
print 99 in xxrange(1, 100, 2)   # True

-tomer

From guido at python.org Sun Jul 29 00:04:02 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 28 Jul 2007 15:04:02 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> Message-ID: <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com>

Do we really need another way to spell a <= x < b? Have you got a real-world use case in mind for the version with step > 1?

I'm at most lukewarm; I'd be willing to look at a patch to the C code in the py3k-struni branch, plus unit tests though.

--Guido

On 7/28/07, tomer filiba <tomerfiliba at gmail.com> wrote:
> currently, testing for "x in xrange(y)" is an O(n) operation.
>
> since xrange objects (which would become range in py3k) are not real lists,
> there's no reason that __contains__ be an O(n).
> it can easily be made into an O(1) operation. here's a demo code (it should
> be trivial to implement this in CPython)
>
> class xxrange(object):
>     def __init__(self, *args):
>         if len(args) == 1:
>             self.start, self.stop, self.step = (0, args[0], 1)
>         elif len(args) == 2:
>             self.start, self.stop, self.step = (args[0], args[1], 1)
>         elif len(args) == 3:
>             self.start, self.stop, self.step = args
>         else:
>             raise TypeError("invalid number of args")
>
>     def __iter__(self):
>         i = self.start
>         while i < self.stop:
>             yield i
>             i += self.step
>
>     def __contains__(self, num):
>         # stop is exclusive, matching __iter__ (assumes step > 0)
>         if num < self.start or num >= self.stop:
>             return False
>         return (num - self.start) % self.step == 0
>
> print list(xxrange(7))           # [0, 1, 2, 3, 4, 5, 6]
> print list(xxrange(0, 7, 2))     # [0, 2, 4, 6]
> print list(xxrange(1, 7, 2))     # [1, 3, 5]
> print 98 in xxrange(100)         # True
> print 98 in xxrange(0, 100, 2)   # True
> print 99 in xxrange(0, 100, 2)   # False
> print 98 in xxrange(1, 100, 2)   # False
> print 99 in xxrange(1, 100, 2)   # True
>
> -tomer

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From jdahlin at async.com.br Sun Jul 29 02:32:37 2007 From: jdahlin at async.com.br (Johan Dahlin) Date: Sat, 28 Jul 2007 21:32:37 -0300 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> Message-ID: <46ABE025.4050204@async.com.br>

Guido van Rossum wrote:
> Do we really need another way to spell a <= x < b?
FWIW, I'd say yes; I sometimes find it a bit difficult to remember how the operator should be placed, there are several possible ways of making a mistake, eg;

  a > x < b
  a < x > b
  a < x < b
  a > x > b

Now, the range syntax seems a bit strange at first, but I find it easier to parse:

  if x in range(a, b)

There's no way to incorrectly parse that, it's immediately known that the programmer tries to see whether x is in a specific range.

It seems to be used quite widely already;

http://google.com/codesearch?hl=en&q=+%5E.*if%5Cs%2B.*%5Cs%2Bin%5Cs%2Brange%5C(.*%24&start=10&sa=N

Johan

From lcaamano at gmail.com Sun Jul 29 03:04:29 2007 From: lcaamano at gmail.com (lcaamano) Date: Sun, 29 Jul 2007 01:04:29 -0000 Subject: [Python-3000] uuid creation not thread-safe? In-Reply-To: <ca471dc20707201052p68883fc5l3efd8ecc5cfd497f@mail.gmail.com> References: <ca471dc20707201052p68883fc5l3efd8ecc5cfd497f@mail.gmail.com> Message-ID: <1185671069.839769.274620@z28g2000prd.googlegroups.com>

On Jul 20, 1:52 pm, "Guido van Rossum" <gu... at python.org> wrote:
> I discovered what appears to be a thread-unsafety in uuid.py. This is
> in the trunk as well as in 3.x; I'm using the trunk here for easy
> reference. There's some code around line 395:
>
>     import ctypes, ctypes.util
>     _buffer = ctypes.create_string_buffer(16)
>
> This creates a *global* buffer which is used as the output parameter
> to later calls to _uuid_generate_random() and _uuid_generate_time().
> For example, around line 481, in uuid1():
>
>     _uuid_generate_time(_buffer)
>     return UUID(bytes=_buffer.raw)
>
> Clearly if two threads do this simultaneously they are overwriting
> _buffer in unpredictable order. There are a few other occurrences of
> this too.
>
> I find it somewhat disturbing that what seems a fairly innocent
> function that doesn't *appear* to have global state is nevertheless
> not thread-safe. Would it be wise to fix this, e.g. by allocating a
> fresh output buffer inside uuid1() and other callers?
>

I didn't find any reply to this, which is odd, so forgive me if it's old news.

I agree with you that it's not thread safe and that a local buffer in the stack should fix it.

Just for reference, the thread-safe uuid extension we've been using since python 2.1, which I don't recall where we borrow it from, uses a local buffer in the stack. It looks like this:

-----begin uuid.c--------------
static char uuid__doc__ [] =
"DCE compatible Universally Unique Identifier module";

#include "Python.h"
#include <uuid/uuid.h>

static char uuidgen__doc__ [] =
"Create a new DCE compatible UUID value";

static PyObject *
uuidgen(void)
{
    uuid_t out;
    char buf[48];

    uuid_generate(out);
    uuid_unparse(out, buf);
    return PyString_FromString(buf);
}

static PyMethodDef uuid_methods[] = {
    {"uuidgen", uuidgen, 0, uuidgen__doc__},
    {NULL, NULL} /* Sentinel */
};

DL_EXPORT(void)
inituuid(void)
{
    Py_InitModule4("uuid", uuid_methods, uuid__doc__,
                   (PyObject *)NULL, PYTHON_API_VERSION);
}
-----end uuid.c--------------

It also seems that using uuid_generate()/uuid_unparse() should be faster than using uuid_generate_random() and then creating a python object to call its __str__ method. If so, it would be nice if the uuid.py module also provided equivalent fast versions that returned strings instead of objects.
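The same "fresh buffer per call" fix can be sketched on the Python side as well. This is a minimal sketch, not the actual uuid.py code: `_fake_uuid_generate` is a hypothetical stand-in for the C-level `_uuid_generate_random()` (it just fills the buffer from `os.urandom`), and `make_uuid` and `generate_in_threads` are names invented for the illustration.

```python
import ctypes
import os
import threading
import uuid


def _fake_uuid_generate(buf):
    """Hypothetical stand-in for the C function: fills a 16-byte buffer."""
    buf.raw = os.urandom(16)


def make_uuid():
    # A fresh output buffer per call, so threads never share it.
    buf = ctypes.create_string_buffer(16)
    _fake_uuid_generate(buf)
    return uuid.UUID(bytes=buf.raw)


def generate_in_threads(n_threads=8, per_thread=100):
    """Create UUIDs from several threads and collect all of them."""
    results = []
    lock = threading.Lock()

    def worker():
        local = [make_uuid() for _ in range(per_thread)]
        with lock:  # protects only the shared result list
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The design choice is the one proposed in the quoted message: the buffer lives in the function call rather than at module level, so there is no shared state to overwrite.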
-- Luis P Caamano Atlanta, GA, USA From jyasskin at gmail.com Sun Jul 29 03:28:08 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sat, 28 Jul 2007 18:28:08 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <6a36e7290707270847k556e6eb5mda4aeb83fa919499@mail.gmail.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <ca471dc20707262207k6c5844e3t1bc051e0f70ee5ca@mail.gmail.com> <5d44f72f0707262321o553347c9j72ac55e195107f9b@mail.gmail.com> <ca471dc20707270555y4f271cd0j53b999b7d1f827cf@mail.gmail.com> <6a36e7290707270847k556e6eb5mda4aeb83fa919499@mail.gmail.com> Message-ID: <5d44f72f0707281828l50394be3o8baf18080426ecc8@mail.gmail.com> On 7/27/07, Bob Ippolito <bob at redivi.com> wrote: > On 7/27/07, Guido van Rossum <guido at python.org> wrote: > > On 7/26/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > > > On 7/26/07, Guido van Rossum <guido at python.org> wrote: > > > > (Personally > > > > I'd like to use strings for the keys since {'TEXT': 'stuff'} is a lot > > > > clearer than {1413830740: 'stuff'} when encountered in a debugging > > > > session.) > > > > > > Good argument. You now have a patch that uses str() instead of b2i(). > > > > Hmm... That only works as long as the bytes are ASCII. Is that a > > problem for aepack? Or are all its 4CCs chosen from a well-known set > > that's all-ASCII? > > 4CCs are not all ASCII, they're Mac OS Roman. This is why in some of > the C header files the constants turned into integers. Good point; my second patch is wrong. I'm satisfied that b2i is correct, even if it's not ideal from either a debugging or a "what are 4CCs really" perspective, so I don't intend to do any more work on it. Would one of the mac enthusiasts like to take over from here? 
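The two views of a four-character code discussed in this thread (a 32-bit integer versus four Mac OS Roman characters) can be converted into each other. The following is a rough sketch in present-day Python 3, assuming big-endian packing; `fourcc_to_str` and `str_to_fourcc` are hypothetical helper names, not the `b2i` function from the patch.

```python
import struct


def fourcc_to_str(code):
    """Render a 32-bit four-character code as text, decoding as Mac OS Roman."""
    return struct.pack(">I", code).decode("mac_roman")


def str_to_fourcc(text):
    """Pack four Mac OS Roman characters into a big-endian 32-bit integer."""
    raw = text.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError("a four-character code needs exactly 4 bytes")
    return struct.unpack(">I", raw)[0]


text_code = fourcc_to_str(1413830740)   # the 'TEXT' example quoted above
round_trip = fourcc_to_str(str_to_fourcc("\u00e4bcd"))  # non-ASCII survives
```

Using the `mac_roman` codec rather than ASCII is what lets non-ASCII codes round-trip, which is the concern raised about the `str()`-based patch.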
From joe at bitworking.org Sun Jul 29 03:47:05 2007 From: joe at bitworking.org (Joe Gregorio) Date: Sat, 28 Jul 2007 18:47:05 -0700 Subject: [Python-3000] base64 - bytes and strings Message-ID: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> I just submitted a patch to fix test_urllib2 and test_cookielib. In the process of fixing them I came across something that looks like an inconsistency in the base64 module. Right now the base64 module uses bytes for everything. That is, a value passed to b64encode() must be bytes, and the base64 encoded response is also in bytes. Shouldn't it operate more like expat, with the stuff to be encoded is bytes and the encoded form is a string? It seems more natural if the encoded value is a string since base64 encoding is a way of encoding data so that it fits in US-ASCII. Thanks, -joe -- Joe Gregorio http://bitworking.org From ncoghlan at gmail.com Sun Jul 29 04:40:18 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Jul 2007 12:40:18 +1000 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> Message-ID: <46ABFE12.5000101@gmail.com> Joe Gregorio wrote: > Shouldn't it operate more like expat, with the stuff to be > encoded is bytes and the encoded form is a string? > It seems more natural if the encoded value is a string since > base64 encoding is a way of encoding data > so that it fits in US-ASCII. Py3k strings are unicode, so returning a string would mean you just have to encode it again using the ascii codec to get the bytes to put on the wire. Since the base64 module already knows that it is producing ASCII, it makes more sense to consider it as a byte->byte encoding. Cheers, Nick. 
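As a small illustration of the bytes-to-bytes view described above, the following sketch shows how the `base64` module behaves in current Python 3 releases; the variable names are only for the example.

```python
import base64

payload = b"hello"

# Encoding takes bytes and returns bytes, ready to go on the wire.
encoded = base64.b64encode(payload)

# Text is one explicit decode away when it is needed (e.g. inside XML).
as_text = encoded.decode("ascii")

# Decoding the base64 form returns the original bytes.
decoded = base64.b64decode(encoded)
```

Callers that want a string pay for one explicit `decode("ascii")`; callers writing to a socket or file use the result as-is.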
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Sun Jul 29 06:58:32 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 29 Jul 2007 06:58:32 +0200 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> Message-ID: <46AC1E78.7050900@v.loewis.de> > It seems more natural if the encoded value is a string since > base64 encoding is a way of encoding data > so that it fits in US-ASCII. There have been long debates about this specific question in the past. The point that proponents of "base64 encoding should yield strings" miss is that US-ASCII is *both* a character set, and an encoding. So if data "is in US-ASCII", it's not all that clear whether the focus is on it being character data, or bytes. base64 is used "on the wire" most of the time (except when it gets embedded into XML); from that point of view, it's more natural that encoding yields bytes. Regards, Martin From g.brandl at gmx.net Sun Jul 29 07:41:58 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 29 Jul 2007 07:41:58 +0200 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46ABE025.4050204@async.com.br> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> Message-ID: <f8h9b2$h4u$1@sea.gmane.org> Johan Dahlin schrieb: > Guido van Rossum wrote: >> Do we really need another way to spell a <= x < b? 
> > FWIW, I'd say yes; I sometimes find it a bit difficult to remember > how the operator should be placed, there are several possible ways > of making a mistake, eg; > > a > x < b > a < x > b > a < x < b > a > x > b > > Now, the range syntax seems a bit strange at first, but I find it easier > to parse: > > if x in range(a, b) > > There's no way to incorrectly parse that, it's immediately known that > the programmer tries to see whether x is in a specific range. What about floats? Currently, "3.5 in range(5)" is False, while "0 <= 3.5 < 5" is True. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
From skip at pobox.com Sun Jul 29 14:18:49 2007 From: skip at pobox.com (skip at pobox.com) Date: Sun, 29 Jul 2007 07:18:49 -0500 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46ABE025.4050204@async.com.br> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> Message-ID: <18092.34217.512107.677855@montanaro.dyndns.org> Johan> FWIW, I'd say yes; I sometimes find it a bit difficult to Johan> remember how the operator should be placed, there are several Johan> possible ways of making a mistake, eg; Johan> a > x < b Johan> a < x > b Johan> a < x < b Johan> a > x > b If the two angles face the same way it's correct. It's hard to see how it could be any other way. Johan> Now, the range syntax seems a bit strange at first, but I find it easier Johan> to parse: Johan> if x in range(a, b) You can't spell a <= x <= b or a < x < b without remembering to add or subtract 1 from the appropriate endpoint if x in range(a, b+1) if x in range(a-1, b) That would seem to me to be more error-prone than confusion about a < x < b Skip From tomerfiliba at gmail.com Sun Jul 29 14:48:21 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 29 Jul 2007 12:48:21 -0000 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> Message-ID: <1185713301.934213.186010@q75g2000hsh.googlegroups.com> i understand there is no much need for using ranges instead of intervals (a < x < b), but: 1) it's already supported. you CAN use x in range(100), so why not optimize it?
there's no justification to keep it an O(N) operation (you're not trying to punish anyone :). it just calls for adding a __contains__ slot to range objects. the cost is very minimal. 2) ranges are more like set-builder notation, i.e. evens = {2*n | n in N} which can be written as evens = range(0, maxint, 2) odds = range(1, maxint, 2) you cannot phrase "x in odds" in "a <= x < b" notation. sure, just use modulu, but then it just gets ugly. if range (== xrange) would be a cheap, O(1) operation, there's not reason to to use it when it suits well. -tomer On Jul 29, 12:04 am, "Guido van Rossum" <gu... at python.org> wrote: > Do we really need another way to spell a <= x < b? Have you got a > real-world use case in mind for the version with step > 1? > > I'm at most lukewarm; I'd be willing to look at a patch to the C code > in the py3k-struni branch, plus unit tests though. > > --Guido > From guido at python.org Sun Jul 29 19:33:34 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 29 Jul 2007 10:33:34 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46ABE025.4050204@async.com.br> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> Message-ID: <ca471dc20707291033v1bec4607pad990db38f82eda8@mail.gmail.com> On 7/28/07, Johan Dahlin <jdahlin at async.com.br> wrote: > Guido van Rossum wrote: > > Do we really need another way to spell a <= x < b? > > FWIW, I'd say yes; I sometimes find it a bit difficult to remember > how the operator should be placed, there are several possible ways > of making a mistake, eg; > > a > x < b > a < x > b > a < x < b > a > x > b Were you drunk at the time? :-) > Now, the range syntax seems a bit strange at first, but I find it easier > to parse: > > if x in range(a, b) > > There's no way to incorrectly parse that, it's immediately known that > the programmer tries to see whether x is in a specific range. 
> > It seems to be used quite widely already; > > http://google.com/codesearch?hl=en&q=+%5E.*if%5Cs%2B.*%5Cs%2Bin%5Cs%2Brange%5C(.*%24&start=10&sa=N Sorry, 50 hits is not "quite widely". Did you find *any* examples using a step > 1? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Jul 29 19:37:53 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 29 Jul 2007 10:37:53 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <1185713301.934213.186010@q75g2000hsh.googlegroups.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <1185713301.934213.186010@q75g2000hsh.googlegroups.com> Message-ID: <ca471dc20707291037w63378621qe5683d9076518083@mail.gmail.com> On 7/29/07, tomer filiba <tomerfiliba at gmail.com> wrote: > i understand there is no much need for using ranges instead of > intervals (a < x < b), but: > > 1) it's already supported. you CAN use x in range(100), so > why not optimize it? there's no justification to keep it an > O(N) operation (you're not trying to punish anyone :). > it just calls for adding a __contains__ slot to range objects. > the cost is very minimal. Don't forget the *cost* in terms of code bloat. Plus, I asked for a patch. Where is it? This is not Santa Claus's email address. You're expected to contribute more than a wish. > 2) ranges are more like set-builder notation, i.e. > evens = {2*n | n in N} > which can be written as > evens = range(0, maxint, 2) > odds = range(1, maxint, 2) > you cannot phrase "x in odds" in "a <= x < b" notation. > sure, just use modulu, but then it just gets ugly. Um, your range "solution" would break for examples like 2**100 in evens (it's hard to think of a more even number than that. :-) Typically one would write a predicate that tested for modulo. > if range (== xrange) would be a cheap, O(1) operation, there's > not reason to to use it when it suits well. 
But I still see no reason to make it O(1). > -tomer > > On Jul 29, 12:04 am, "Guido van Rossum" <gu... at python.org> wrote: > > Do we really need another way to spell a <= x < b? Have you got a > > real-world use case in mind for the version with step > 1? > > > > I'm at most lukewarm; I'd be willing to look at a patch to the C code > > in the py3k-struni branch, plus unit tests though. > > > > --Guido > > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tomerfiliba at gmail.com Sun Jul 29 20:06:47 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 29 Jul 2007 20:06:47 +0200 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707291037w63378621qe5683d9076518083@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <1185713301.934213.186010@q75g2000hsh.googlegroups.com> <ca471dc20707291037w63378621qe5683d9076518083@mail.gmail.com> Message-ID: <1d85506f0707291106w4c48cbb4s5ecac3415722cb2b@mail.gmail.com> On 7/29/07, Guido van Rossum <guido at python.org> wrote: > Don't forget the *cost* in terms of code bloat. Plus, I asked for a > patch. Where is it? This is not Santa Claus's email address. You're > expected to contribute more than a wish. first off all, that's not the politest way to put it, especially since i have submitted some patches before. second, i've already given a 3-line implementation in python. it would only take two minutes to convert it to C, save the unit tests. third, i'm busy over my head studying of my exams. forth, due to lack of public interest, i might as well withdraw this. 
> Um, your range "solution" would break for examples like 2**100 in > evens (it's hard to think of a more even number than that. :-) there's no reason why (x)range shouldn't support longs too. after all, it only tests for modulo internally (*unlike* how it works today, which will never finish). besides, that's not the point. i'm only saying there's no reason that testing for containment in range objects (which are no longer lists), should be O(N), when it can easily be made O(1) in under 10 lines of C code. > But I still see no reason to make it O(1). as you wish. *goes back in time and withdraws the proposal* -tomer From martin at v.loewis.de Sun Jul 29 21:03:06 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 29 Jul 2007 21:03:06 +0200 Subject: [Python-3000] optimizing [x]range In-Reply-To: <1d85506f0707291106w4c48cbb4s5ecac3415722cb2b@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <1185713301.934213.186010@q75g2000hsh.googlegroups.com> <ca471dc20707291037w63378621qe5683d9076518083@mail.gmail.com> <1d85506f0707291106w4c48cbb4s5ecac3415722cb2b@mail.gmail.com> Message-ID: <46ACE46A.5080102@v.loewis.de> >> Don't forget the *cost* in terms of code bloat. Plus, I asked for a >> patch. Where is it? This is not Santa Claus's email address. You're >> expected to contribute more than a wish. > > first off all, that's not the politest way to put it, especially since i have > submitted some patches before. second, i've already given a 3-line > implementation in python. it would only take two minutes to convert > it to C, save the unit tests. third, i'm busy over my head studying of > my exams. forth, due to lack of public interest, i might as well > withdraw this. It's not that *you* were asked to contribute it. Guido just pointed out that, without a patch, it won't get implemented. 
More so if the patch is as trivial as you expect it to be. We *all* are under time pressure - I am busy giving exams, for example. So to your original question "why not optimize it?", there is a very simple answer: there is no ready implementation available. > besides, that's not the point. i'm only saying there's no reason that > testing for containment in range objects (which are no longer lists), > should be O(N), when it can easily be made O(1) in under 10 lines of C > code. Nothing is easy. Neal Norwitz was working on implementing xrange with longs, and it took an entire week. The patch is still sitting on SF somewhere. Regards, Martin From unknown_kev_cat at hotmail.com Sun Jul 29 22:11:12 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Sun, 29 Jul 2007 16:11:12 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org><ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com><f7lk7q$9m6$1@sea.gmane.org><ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com><f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> Message-ID: <f8is94$tgh$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707181158p17417c9cg37c5382d61b53fe5 at mail.gmail.com... > On 7/18/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >> >> I'm wondering if the recusion limit on my build is getting set too low >> >> somehow. >> > >> > Can you find out what it is? sys.getrecursionlimit(). >> >> Hmm... It is a limit of 1000. >> That is probably large enough, no? > > Yes, that's what it is for me. > >> Anyway, from some basic testing it looks like marshal is always throwing >> that error when marshal.load() is called. >> However, marshal.loads() works fine. >> >> Might this be another encoding related error? > > Perhaps. Or something else. Do try to investigate. 
>

What I have found is that (on CYGWIN) all of marshal seems to work fine
except for marshal.load(). marshal.dump()'s output can be read by 2.5's
marshal.load() without problem. 3k's marshal.load() will not load the data
from 3k's marshal.dump() or 2.5's marshal.dump().

It turns out to be a fault due to an uninitialized value on a RFILE.
Specifically, the following patch (part of marshal_load in marshal.c) fixes
things.

-----BEGIN PATCH-----
Index: Python/marshal.c
===================================================================
--- Python/marshal.c (revision 56620)
+++ Python/marshal.c (working copy)
@@ -1181,6 +1181,7 @@
         return NULL;
     }
     rf.strings = PyList_New(0);
+    rf.depth=0;
     result = read_object(&rf);
     Py_DECREF(rf.strings);
     Py_DECREF(data);
-----END PATCH-----

I'll submit the patch to sourceforge if needed, although the fact that all
the other loading methods do set rf.depth=0 (including
PyMarshal_ReadObjectFromFile) indicates to me that this is definitely the
correct patch. Looks like that line was accidentally forgotten.

From unknown_kev_cat at hotmail.com Mon Jul 30 01:34:22 2007
From: unknown_kev_cat at hotmail.com (Joe Smith)
Date: Sun, 29 Jul 2007 19:34:22 -0400
Subject: [Python-3000] Py3k_struni additional test failures under cygwin
References: <f7ithr$lrr$1@sea.gmane.org><ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com><f7lk7q$9m6$1@sea.gmane.org><ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com><f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org>
Message-ID: <f8j862$st0$1@sea.gmane.org>

"Joe Smith" <unknown_kev_cat at hotmail.com> wrote in message
news:f8is94$tgh$1 at sea.gmane.org...
>
> "Guido van Rossum" <guido at python.org> wrote in message
> news:ca471dc20707181158p17417c9cg37c5382d61b53fe5 at mail.gmail.com...
>> On 7/18/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >>> >> I'm wondering if the recusion limit on my build is getting set too >>> >> low >>> >> somehow. >>> > >>> > Can you find out what it is? sys.getrecursionlimit(). >>> >>> Hmm... It is a limit of 1000. >>> That is probably large enough, no? >> >> Yes, that's what it is for me. >> >>> Anyway, from some basic testing it looks like marshal is always throwing >>> that error when marshal.load() is called. >>> However, marshal.loads() works fine. >>> >>> Might this be another encoding related error? >> >> Perhaps. Or something else. Do try to investigate. >> > > What I have found is that (on CYGWIN) all of marshal seems to work fine > except for marshal.load(). > marshal.dump()'s output can be read by 2.5's marshal.load() without > problem. > 3k's marshal.load() will not > load the data from 3k's marshal.dump or 2.5's marshal.dump() > > It turns out to be a fault due to an uninitialized value on a RFILE. > Specifically, the following patch (part of marshal_load in marshal.c fixes > things. > > -----BEGIN PATCH----- > Index: Python/marshal.c > =================================================================== > --- Python/marshal.c (revision 56620) > +++ Python/marshal.c (working copy) > @@ -1181,6 +1181,7 @@ > return NULL; > } > rf.strings = PyList_New(0); > + rf.depth=0; > result = read_object(&rf); > Py_DECREF(rf.strings); > Py_DECREF(data); > -----END PATCH----- > > I'll submit the patch to sourceforge if needed, although the fact that all > the other loading methods > do set rf.depth=0 (including PyMarshal_ReadObjectFromFile) indicates to me > that this is definately the correct patch. > Looks like that line was accidentally forgoten. With that patch, things on CYGWIN are getting close to matching the other platforms. There are still some problems with the 'Python' directory for example. This is because of a change in the internals of Cygwin. 
Cygwin does have "managed mounts" which allow for case sensitivity. Compiling Python inside a managed mount eliminates those issues. So it is not a terribly big deal. If I patch io.py to default to "utf-8" rather than using the filesystem encoding (ascii), that fixes a few more things. (test_coding.py and test_minidom.py) Then there are only 2 test failures remaining that are not listed on the wiki. One of them is a very minor issue in test_platform.py. The other is a more complicated problem with test_mailbox.py First the test_platform problem. sys.executable lacks the ".exe" suffix. In order for libc_ver to work it would need to be passed the exe suffix. The cygwin specific hack in the test_platform.py does not work if when using a managed mount because a managed mount is case sensitive, so isdir(executable) returns false. (Using libc_ver with no arguments also fails for the same basic reason. (although there is no cygwin hack in that case.)) That said, using libc_ver on cygwin would not be meaningful because cygwin uses newlib instead of libc/glibc. The mailbox.py problem seems troubling, I'm getting exceptions of type "IOError: [Errno 13] Permission denied" on "./@test" (aka. test_support.TESTFN) . This is true for all tests after TestMbox's run of test_add(). All of TestMailDir works fine. TestMbox's test_add() works fine, but all the remaining tests that use "./@test" fail. Sounds like something is not getting cleaned up correctly. That said no "@test" file or directory is left behind after the end of the test. (For whats its worth, Cygwin's python 2.5 (as installed on my system) fails 2 of the tests in it's version of test_mailbox.py, both with "IOError: [Errno 13] Permission denied"). 
From guido at python.org Mon Jul 30 02:09:37 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 29 Jul 2007 17:09:37 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f8is94$tgh$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> Message-ID: <ca471dc20707291709y68e8c301qa0845fb9dab1874a@mail.gmail.com> On 7/29/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > What I have found is that (on CYGWIN) all of marshal seems to work fine > except for marshal.load(). > marshal.dump()'s output can be read by 2.5's marshal.load() without problem. > 3k's marshal.load() will not > load the data from 3k's marshal.dump or 2.5's marshal.dump() > > It turns out to be a fault due to an uninitialized value on a RFILE. > Specifically, the following patch (part of marshal_load in marshal.c fixes > things. > > -----BEGIN PATCH----- > Index: Python/marshal.c > =================================================================== > --- Python/marshal.c (revision 56620) > +++ Python/marshal.c (working copy) > @@ -1181,6 +1181,7 @@ > return NULL; > } > rf.strings = PyList_New(0); > + rf.depth=0; > result = read_object(&rf); > Py_DECREF(rf.strings); > Py_DECREF(data); > -----END PATCH----- > > I'll submit the patch to sourceforge if needed, although the fact that all > the other loading methods > do set rf.depth=0 (including PyMarshal_ReadObjectFromFile) indicates to me > that this is definately the correct patch. > Looks like that line was accidentally forgoten. Thanks! Looks like that line was accidentally dropped -- perhaps as a result of a merge. It was in all previous versions. Anyway, I've added it back. Committed revision 56623. 
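A check for the behaviour restored by this fix could look like the sketch below. It assumes present-day Python 3; the sample value and the use of a temporary file are illustrative choices, not taken from the test suite. It round-trips one value through marshal.dump()/marshal.load() on a real file and compares the result with the in-memory marshal.dumps()/marshal.loads() path, which was reported to work all along.

```python
import marshal
import tempfile

# Illustrative sample value (an assumption); marshal handles these basic types.
value = {"name": "spam", "nested": [1, 2.5, ("a", b"b")], "flag": None}

# File-based round trip: the path that failed in the report (marshal.load).
with tempfile.TemporaryFile() as f:
    marshal.dump(value, f)
    f.seek(0)
    from_file = marshal.load(f)

# In-memory round trip: the path that worked (marshal.loads).
from_string = marshal.loads(marshal.dumps(value))

# Both paths should give back the original value.
assert from_file == value
assert from_string == value
```

If the two paths ever disagree again, the file-based reader is the first place to look.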
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
From greg.ewing at canterbury.ac.nz Mon Jul 30 02:08:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 30 Jul 2007 12:08:22 +1200
Subject: [Python-3000] base64 - bytes and strings
In-Reply-To: <46ABFE12.5000101@gmail.com>
References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46ABFE12.5000101@gmail.com>
Message-ID: <46AD2BF6.5080507@canterbury.ac.nz>

Nick Coghlan wrote:
> Py3k strings are unicode, so returning a string would mean you just have
> to encode it again using the ascii codec to get the bytes to put on the
> wire.

I still believe that producing a string is conceptually
the right thing to do. The point of base64 is to encode
binary data as text, not binary data as binary data.

If I ever had a reason to use base64, it would be because
I had a "wire" that would accept text but not binary data,
e.g. a file open in text mode, or some other text that I
wanted to embed it in. Getting bytes in that situation
would force me to make an *extra* conversion.

--
Greg
From greg.ewing at canterbury.ac.nz Mon Jul 30 02:22:56 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 30 Jul 2007 12:22:56 +1200
Subject: [Python-3000] base64 - bytes and strings
In-Reply-To: <46AC1E78.7050900@v.loewis.de>
References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de>
Message-ID: <46AD2F60.9050907@canterbury.ac.nz>

Martin v. Löwis wrote:
> The point that proponents of "base64 encoding should
> yield strings" miss is that US-ASCII is *both* a character set,
> and an encoding.

Last time we discussed this, I went and looked at the
RFC where base64 is defined. According to my reading of
it, nowhere does it say that base64 output must be
encoded as US-ASCII, nor any other particular encoding.

It *does* say that the characters used were chosen because
they are present in a number of different character sets
in use at the time, and explicitly mentions EBCDIC as one
of those character sets.

To me this quite clearly says that base64 is defined at
the level of characters, not encodings.

--
Greg
From guido at python.org Mon Jul 30 02:27:20 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 29 Jul 2007 17:27:20 -0700
Subject: [Python-3000] Py3k_struni additional test failures under cygwin
In-Reply-To: <f8j862$st0$1@sea.gmane.org>
References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org>
Message-ID: <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com>

On 7/29/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
> There are still some problems with the 'Python' directory for example. This
> is because of a change in the internals of Cygwin.
> Cygwin does have "managed mounts" which allow for case sensitivity.
> Compiling Python inside a managed mount eliminates those issues.
> So it is not a terribly big deal.
>
> If I patch io.py to default to "utf-8" rather than using the filesystem
> encoding (ascii), that fixes a few more things. (test_coding.py and
> test_minidom.py)

How come the filesystem decoding is set to ASCII?

> Then there are only 2 test failures remaining that are not listed on the
> wiki. One of them is a very minor issue in test_platform.py.
> The other is a more complicated problem with test_mailbox.py

Please do add these to the wiki, so we won't forget them. If you want
CYGWIN to work, existing CYGWIN users will have to contribute patches.

> First the test_platform problem.
> sys.executable lacks the ".exe" suffix.
In order for libc_ver to work it > would need to be passed the exe suffix. > The cygwin specific hack in the test_platform.py does not work if when using > a managed mount because a managed mount is case sensitive, so > isdir(executable) returns false. > (Using libc_ver with no arguments also fails for the same basic reason. > (although there is no cygwin hack in that case.)) That said, using > libc_ver on cygwin would not be meaningful because cygwin uses newlib > instead of libc/glibc. > > > > > The mailbox.py problem seems troubling, I'm getting exceptions of type > "IOError: [Errno 13] Permission denied" on "./@test" (aka. > test_support.TESTFN) . > > This is true for all tests after TestMbox's run of test_add(). All of > TestMailDir works fine. TestMbox's test_add() works fine, but all the > remaining tests that use "./@test" fail. Sounds like something is not > getting cleaned up correctly. That said no "@test" file or directory is left > behind after the end of the test. Sounds like test_add() changes the perms on the file. > (For whats its worth, Cygwin's python 2.5 (as installed on my system) fails > 2 of the tests in it's version of test_mailbox.py, both with "IOError: > [Errno 13] Permission denied"). OK, so the fix may need to be backported -- or perhaps (if it's easier to find and fix in 2.5) forward ported. 
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
From greg.ewing at canterbury.ac.nz Mon Jul 30 02:29:14 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 30 Jul 2007 12:29:14 +1200
Subject: [Python-3000] optimizing [x]range
In-Reply-To: <18092.34217.512107.677855@montanaro.dyndns.org>
References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org>
Message-ID: <46AD30DA.6050405@canterbury.ac.nz>

skip at pobox.com wrote:
> You can't spell
>
>     a <= x <= b
>
> or
>
>     a < x < b
>
> without remembering to add or subtract 1 from the appropriate endpoint

I think the use cases for this are where you're trying
to express a range-like condition, i.e. 'a <= x < b'.
Then you have to make sure you get the right relations
in the right places, which is the same kind of burden
as remembering to add or subtract 1 in the right places.

--
Greg
From guido at python.org Mon Jul 30 02:43:19 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 29 Jul 2007 17:43:19 -0700
Subject: [Python-3000] base64 - bytes and strings
In-Reply-To: <46AD2F60.9050907@canterbury.ac.nz>
References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz>
Message-ID: <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com>

On 7/29/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Martin v. Löwis wrote:
> > The point that proponents of "base64 encoding should
> > yield strings" miss is that US-ASCII is *both* a character set,
> > and an encoding.
>
> Last time we discussed this, I went and looked at the
> RFC where base64 is defined. According to my reading of
> it, nowhere does it say that base64 output must be
> encoded as US-ASCII, nor any other particular encoding.
>
> It *does* say that the characters used were chosen because
> they are present in a number of different character sets
> in use at the time, and explicitly mentions EBCDIC as one
> of those character sets.
>
> To me this quite clearly says that base64 is defined at
> the level of characters, not encodings.

I think it's all beside the point. We should look at the use cases. I
recall finding out once that a Java base64 implementation was much
slower than Python's -- turns out that the Java version was converting
everything to Strings; then we needed to convert back to bytes in
order to output them. My suspicion is that in the end using bytes is
more efficient *and* more convenient; it might take some looking
through the email package to confirm or refute this. (The email
package hasn't been converted to work in the struni branch; that
should happen first. Whoever does that might well be the one who tells
us how they want their base64 APIs.)

An alternative might be to provide both string- and bytes-based APIs,
although that doesn't help with deciding what the default one (the one
that uses the same names as 2.x) should do.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
From talin at acm.org Mon Jul 30 03:21:13 2007
From: talin at acm.org (Talin)
Date: Sun, 29 Jul 2007 18:21:13 -0700
Subject: [Python-3000] base64 - bytes and strings
In-Reply-To: <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com>
References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com>
Message-ID: <46AD3D09.9060006@acm.org>

Guido van Rossum wrote:
> On 7/29/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> Martin v. Löwis wrote:
>>> The point that proponents of "base64 encoding should
>>> yield strings" miss is that US-ASCII is *both* a character set,
>>> and an encoding.
>> Last time we discussed this, I went and looked at the >> RFC where base64 is defined. According to my reading of >> it, nowhere does it say that base64 output must be >> encoded as US-ASCII, nor any other particular encoding. >> >> It *does* say that the characters used were chosen because >> they are present in a number of different character sets >> in use at the time, and explicity mentions EBCDIC as one >> of those character sets. >> >> To me this quite clearly says that base64 is defined at >> the level of characters, not encodings. > > I think it's all besides the point. We should look at the use cases. I > recall finding out once that a Java base64 implementation was much > slower than Python's -- turns out that the Java version was converting > everything to Strings; then we needed to convert back to bytes in > order to output them. My suspicion is that in the end using bytes is > more efficient *and* more convenient; it might take some looking > through the email package to confirm or refute this. (The email > package hasn't been converted to work in the struni branch; that > should happen first. Whoever does that might well be the one who tells > us how they want their base64 APIs.) > > An alternative might be to provide both string- and bytes-based APIs, > although that doesn't help with deciding what the default one (the one > that uses the same names as 2.x) should do. One has to be careful when comparing performance with Java, because you need to specify whether you are using the "old" API or the "new" one. (It seems that almost everything in Java has an old and new API.) I just recently did some work in Java with base64 encoding, or more specifically, URL-safe encoding. The library I was working with both consumed and produced arrays of bytes. I think that this is the correct way to do it. 
In my specific use case, I was dealing with encrypted bytes, where the encrypter also produced and consumed bytes, so it made sense that the character encoder did the same. But even in the case where no encryption is involved, I think dealing with bytes is right. I believe that converting a Unicode string to a base64 encoded form is necessarily a 2-step process. Step 1 is to convert from unicode characters to bytes, using an appropriate character encoding (UTF-8, UTF-16, and so on), and step 2 is to encode the bytes in base64. The resulting encoded byte array is actually an ASCII-encoded string, although it's more convenient in most cases to represent it as a byte array than as a string object, since it's likely in most cases that you are about to send it over the wire. So in other words, it makes sense to think about the conversion as (string -> bytes -> string), the actual objects being generated are (string -> bytes -> bytes). The fact that 2 steps are needed is evident by the fact that there are actually two encodings involved, and these two encodings are mostly independent. So for example, one could just as easily base64-encode a UTF-16 encoded string as opposed to a UTF-8 encoded string. So the fact that you can vary one encoding without changing the other would seem to argue for the notion that they are distinct and independent. Nor can you collapse to a single encoding step - you can't go directly from an internal unicode string to base64, since a unicode string is an array of code units which range from 1-0xffff, and base64 can't encode a number larger than 255. Now, you *could* do both steps in a single function. However, you still have to choose what the intermediate encoding form is, even if you never actually see it. Usually this will be UTF-8. 
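A minimal sketch of the two steps described above, assuming present-day Python 3, where base64.b64encode() takes bytes and returns bytes (the API being debated in this thread had not been settled at the time). The sample text and variable names are illustrative only. It shows that the character encoding chosen in step 1 can be varied independently of the base64 step.

```python
import base64

text = "caf\u00e9"  # illustrative str containing one non-ASCII character

# Step 1: pick a character encoding and turn the str into bytes.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Step 2: base64-encode the bytes; the result is bytes holding only ASCII.
b64_from_utf8 = base64.b64encode(utf8_bytes)
b64_from_utf16 = base64.b64encode(utf16_bytes)

# The two steps are independent: a different step-1 encoding yields
# different base64 output for the same text.
assert b64_from_utf8 != b64_from_utf16

# Decoding undoes the steps in reverse order, and the receiver has to
# know which character encoding was used in step 1.
assert base64.b64decode(b64_from_utf8).decode("utf-8") == text
assert base64.b64decode(b64_from_utf16).decode("utf-16-le") == text
```

A single convenience function could hide step 1, but it would still have to commit to one intermediate encoding on the caller's behalf.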
-- Talin From skip at pobox.com Mon Jul 30 04:17:24 2007 From: skip at pobox.com (skip at pobox.com) Date: Sun, 29 Jul 2007 21:17:24 -0500 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46AD30DA.6050405@canterbury.ac.nz> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> Message-ID: <18093.18996.944540.279864@montanaro.dyndns.org> Greg> I think the use cases for this are where you're trying to express Greg> a range-like condition, i.e 'a <= x < b'. Then you have to make Greg> sure you get the right relations in the right places, which is the Greg> same kind of burden as remembering to add or subtract 1 in the Greg> right places. I think it's easier to learn that 'a <= x < b' is logically equivalent to 'a <= x and x < b' than inferring that 'x in range(a, b)' means the same thing. In fact, due to shortcut semantics they actually don't mean quite the same thing since b might not get evaluated in the cascading comparison case. Given that I find the cascading comparisons clearer I see no reason to optimize the "in range(...)" case. Skip From rrr at ronadam.com Mon Jul 30 04:54:04 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 29 Jul 2007 21:54:04 -0500 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46AD2BF6.5080507@canterbury.ac.nz> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46ABFE12.5000101@gmail.com> <46AD2BF6.5080507@canterbury.ac.nz> Message-ID: <46AD52CC.4060403@ronadam.com> Greg Ewing wrote: > Nick Coghlan wrote: >> Py3k strings are unicode, so returning a string would mean you just have >> to encode it again using the ascii codec to get the bytes to put on the >> wire. > > I still believe that producing a string is conceptually > the right thing to do. 
The point of base64 is to encode > binary data as text, not binary data as binary data. > > If I ever had a reason to use base64, it would be because > I had a "wire" that would accept text but not binary data, > e.g. a file open in text mode, or some other text that I > wanted to embed it in. Getting bytes in that situation > would force me to make an *extra* conversion. Not extra, you just need to make sure your binary data is in the correct range of values the text device you are sending to can handle. As long as it is, it should just work. That is the primary purpose of the base64 encoding. Keep in mind you are sending byte "characters", not integers. So it would work like the following I think, with the application having responsibility of doing the object to bytes conversion and back, instead of the base64 encoder being limited to only strings. OUTPUT: convert object to bytes -> encode_64 to bytes -> bytes to output INPUT: bytes from input* -> decode_64 to bytes -> convert bytes to object *Reads text "characters" into bytes instance. By refusing to guess what the object is, we also create an opportunity to manipulate the results or source further in a bytes instance without doing multiple (or needless) conversions to and from strings. Cheers, Ron From tjreedy at udel.edu Mon Jul 30 05:02:02 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 29 Jul 2007 23:02:02 -0400 Subject: [Python-3000] base64 - bytes and strings References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com><46ABFE12.5000101@gmail.com> <46AD2BF6.5080507@canterbury.ac.nz> Message-ID: <f8jkb8$mre$1@sea.gmane.org> "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote in message news:46AD2BF6.5080507 at canterbury.ac.nz... | Nick Coghlan wrote: | > Py3k strings are unicode, so returning a string would mean you just have | > to encode it again using the ascii codec to get the bytes to put on the | > wire. 
| | I still believe that producing a string is conceptually | the right thing to do. The point of base64 is to encode | binary data as text, not binary data as binary data. On the contrary, to me, the point of base64 is to encode bytes into a subset of bytes more or less guaranteed to not get mangled during transport. That these safe bytes correspond to ascii chars (which, yes,is why they are safe) does not, to me, make the resulting quasi-random sequence 'text'. tjr From talin at acm.org Mon Jul 30 05:41:08 2007 From: talin at acm.org (Talin) Date: Sun, 29 Jul 2007 20:41:08 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> Message-ID: <46AD5DD4.8000509@acm.org> Phillip J. Eby wrote: > At 08:25 AM 7/27/2007 -0700, Guido van Rossum wrote: >> Basic GFs, great. Before/after/around, good. Other method >> combinations, fine. But GFs in classes and subclassing? Not until we >> have a much better design. > > Sounds reasonable to me. The only time I actually use them in > classes myself is to override existing generic functions that live > outside the class, like ones from an Interface or a standalone generic. I've been thinking about this quite a bit over the last week, and in particular thinking about the kinds of use cases that I would want to use GFs for. 
One idea that I had a while back, but rejected as simply too much of a
kludge, was to say that for GFs that are also methods, we would use
regular Python method dispatching on the first argument, followed by GF
overload dispatching on the subsequent arguments.

The reason that this is a kludge is that now the first argument behaves
differently than the others.

(Pay no attention to the specific syntax here.)

    class A:
        @overload
        def method(self, x:object):
            ...

    class B(A):
        @overload
        def method(self, x:int):
            ...

    b = B()
    b.method("test")  # Method not found

With regular GFs, this example works because there is a method that
satisfies the constraints - the one in A. But since the first argument
dominates all of the decision, by the time we get to B, the overloads in
A are no longer accessible. It's as if each subclass is in its own
little GF world.

However, even though this is clumsy from a theoretical standpoint, from
a practical standpoint it may not be all that bad. Most of the time,
when I want to declare a GF that is also a method, I'm just using the
class as a namespace to hold all this stuff, and I really don't care
much about whether subclasses can extend it or not. I'm not using the
type of 'self' to select different implementations in this case.

And in the case where I really do want to do dynamic dispatch on *all*
of the arguments, including the first one, then it's more likely that I
will declare the GF as a global function, instead of as a method.

An example of this would be an AST walker: I have a class which walks an
AST and does various transformations on it, such as constant folding or
algebraic simplification. Since I have some state that I'd need to carry
around, I'll put that in a class, and then I'll have class methods which
are dynamically dispatched on the *second* argument which is the node type:

    class ASTWalker:
        @overload
        def foldConstants(self, node:InfixOperator):
            ...

        @overload
        def foldConstants(self, node:UnaryOperator):
            ...

        @overload
        def foldConstants(self, node:ConstantInteger):
            ...

        @overload
        def foldConstants(self, node:ConstantString):
            ...

In this case, I have no need to subclass the class, and I'm only doing
dynamic dispatching on the second argument.

So basically what I would propose is that we simply say that we don't
mix normal overloading and multi-method dispatch until PJE comes up with
his better solution.

-- Talin
From martin at v.loewis.de Mon Jul 30 06:27:13 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 30 Jul 2007 06:27:13 +0200
Subject: [Python-3000] base64 - bytes and strings
In-Reply-To: <46AD3D09.9060006@acm.org>
References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com> <46AD3D09.9060006@acm.org>
Message-ID: <46AD68A1.8030403@v.loewis.de>

> I believe that converting a Unicode string to a base64 encoded form is
> necessarily a 2-step process.

I think that part is undebated. What is under debate is whether
base64.encodestring (which accepts bytes) should *produce* (unicode)
strings, which would then have to be encoded as us-ascii. That would
make going from unicode to base64 bytes a three-step process:

tosend = base64.encodestring(data.encode("utf-8")).encode("ascii")

Currently, you can spare the last step if you do want bytes, and
need to specify .decode("ascii") if you want strings.
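The bytes-returning behaviour described above can be sketched in present-day Python 3 as follows. One assumption to note: base64.encodestring, the name used in this thread, no longer exists in current releases, so the sketch uses its successor base64.encodebytes. The sample data is illustrative.

```python
import base64

data = "some text to send"  # illustrative payload

# Bytes in, bytes out: the result can go on the wire with no further step.
tosend = base64.encodebytes(data.encode("utf-8"))

# If a str is wanted instead, one explicit decode is needed.
as_text = tosend.decode("ascii")

# Going back: a str result must be re-encoded before it can be decoded.
assert base64.decodebytes(as_text.encode("ascii")).decode("utf-8") == data
```

Had the encoder returned str, the extra .encode("ascii") would fall on the sender rather than on the occasional caller who wants text.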
Regards, Martin From martin at v.loewis.de Mon Jul 30 06:51:51 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 30 Jul 2007 06:51:51 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> Message-ID: <46AD6E67.50407@v.loewis.de> >> If I patch io.py to default to "utf-8" rather than using the filesystem >> encoding (ascii), that fixes a few more things. (test_coding.py and >> test_minidom.py) > > How come the filesystem decoding is set to ASCII? I guess there are two problems: a) MS_WINDOWS isn't defined, and the relevant code in bltinmodule.c doesn't special-case cygwin, and b) setlocale is defined on Cygwin, but doesn't work. >> (For whats its worth, Cygwin's python 2.5 (as installed on my system) fails >> 2 of the tests in it's version of test_mailbox.py, both with "IOError: >> [Errno 13] Permission denied"). I found that in many cases, this is a virus scanner or the indexing service interfering. They open the file, and then the test suite cannot delete it. 
Regards, Martin From jyasskin at gmail.com Mon Jul 30 07:56:27 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 29 Jul 2007 22:56:27 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46AD5DD4.8000509@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <46AD5DD4.8000509@acm.org> Message-ID: <5d44f72f0707292256t2e4a8d25y976637456618eeaa@mail.gmail.com> On 7/29/07, Talin <talin at acm.org> wrote: > Phillip J. Eby wrote: > > At 08:25 AM 7/27/2007 -0700, Guido van Rossum wrote: > >> Basic GFs, great. Before/after/around, good. Other method > >> combinations, fine. But GFs in classes and subclassing? Not until we > >> have a much better design. > > > > Sounds reasonable to me. The only time I actually use them in > > classes myself is to override existing generic functions that live > > outside the class, like ones from an Interface or a standalone generic. > > I've been thinking about this quite a bit over the last week, and in > particular thinking about the kinds of use cases that I would want to > use GFs for. > > One idea that I had a while back, but rejected as simply too much of a > kludge, was to say that for GFs that are also methods, we would use > regular Python method dispatching on the first argument, followed by GF > overload dispatching on the subsequent arguments. > > The reason that this is a kludge is that now the first argument behaves > differently than the others. > > (Pay no attention to the specific syntax here.) > > class A: > @overload > def method(self, x:object): > ... 
> > class B(A): > @overload > def method(self, x:int): > ... > > b = B() > b.method("test") // Method not found > > With regular GFs, this example works because there is a method that > satisfies the constraints - the one in A. But since the first argument > dominates all of the decision, by the time we get to B, the overloads in > A are no longer accessible. Its as if each subclass is in it's own > little GF world. > > However, even though this is clumsy from a theoretical standpoint, from > a practical standpoint it may not be all that bad. Most of the time, > when I want to declare a GF that is also a method, I'm just using the > class as a namespace to hold all this stuff, and I really don't care > much about whether subclasses can extend it or not. I'm not using the > type of 'self' to select different implementations in this case. FWIW, this dispatching on self before overloading on the rest of the arguments is what C++ does, and I think also what Java does. To get the parent class's methods to participate in overloading, you have to say using the_parent::method; which looks pretty similar to Phillip's method = the_parent.method except that using can appear anywhere within a class, while the method assignment looks like it needs to appear first. Unfortunately, this seems to surprise people, although I don't have any experience about whether an alternative would be better or worse. A lot of times, I write: class Parent { virtual int method(int i, string s) = 0; int method(Bar b) { return method(b.i, b.s); } int method(Quux q, Foo f) { return method(q.i, q.t + f.x); } // Note that the non-virtual methods forward to the virtual one. // Although the visibility would be the same if they were virtual too. }; class Child : public Parent { virtual int method(int i, string s) { return do_something(i, s); } }; and am then surprised that Child c; c.method(Bar(...)); fails to compile. (Because I forgot the using declaration in Child. Again.) 
So the possibility is practically clumsy, but there's a precedent for it. -- Namasté, Jeffrey Yasskin From hasan.diwan at gmail.com Mon Jul 30 08:40:29 2007 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Sun, 29 Jul 2007 23:40:29 -0700 Subject: [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> Message-ID: <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> The issue seems to be in the socket.py close method. It needs to sleep socket.SO_REUSEADDR seconds before returning. Yes, it is a simple fix in python, but the socket code is C. I found some code in socket.py and made the changes. Patch is available at http://sourceforge.net/tracker/index.php?func=detail&aid=1763387&group_id=5470&atid=305470 -- enjoy your week. -- Cheers, Hasan Diwan <hasan.diwan at gmail.com> From jdahlin at async.com.br Sun Jul 29 19:46:55 2007 From: jdahlin at async.com.br (Johan Dahlin) Date: Sun, 29 Jul 2007 14:46:55 -0300 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707291033v1bec4607pad990db38f82eda8@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <ca471dc20707291033v1bec4607pad990db38f82eda8@mail.gmail.com> Message-ID: <46ACD28F.6030808@async.com.br> Guido van Rossum wrote: [..] > > Were you drunk at the time? :-) No, I just remember that I made that mistake several times. [..] > > It seems to be used quite widely already; > > > > http://google.com/codesearch?hl=en&q=+%5E.*if%5Cs%2B.*%5Cs%2Bin%5Cs%2Brange%5C(.*%24&start=10&sa=N > > > > Sorry, 50 hits is not "quite widely". Not everything is known to google's code search. 
I'm just saying that there's code out there that uses this syntax. > > Did you find *any* examples using a step > 1? No, I didn't. I'm not arguing for that use case either, I'm mainly interested in the use case where step == 1. Johan From Jack.Jansen at cwi.nl Sun Jul 29 21:32:42 2007 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Sun, 29 Jul 2007 21:32:42 +0200 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com> Message-ID: <1719694F-6FCA-4A64-8A42-5BC60C3A793C@cwi.nl> One minor point (that may already have been addressed, I've not seen the whole discussion): note that 4CCs not only occur on the Mac but also in various other contexts: AIFF files use 4CCs to define chunk types, MP4 files use them for a gazillion different things (media types, codec types, etc). Actually, codec types are generally defined by their 4CC, and some times these even get to be used as their mainstream name (divx and xvid). It may be worthwhile to add generalized support somewhere to handle converting 4CCs from readable to binary representation. And, of course, the world being as it is some formats (Mac OSTypes, for example, and probably quicktime/mp4 as well, but I'm not sure) represent 4CCs in big-endian order, others (AIFF) in little-endian. 
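A minimal sketch of the "readable to binary" conversion mentioned above. The helper names `fourcc_to_int` and `int_to_fourcc` are hypothetical, and restricting codes to ASCII is a simplifying assumption (real-world codes may use other single-byte encodings):

```python
def fourcc_to_int(code, byteorder="big"):
    """Pack a four-character code into a 32-bit integer.

    Hypothetical helper; `byteorder` is "big" or "little".
    """
    raw = code.encode("ascii")        # assumption: ASCII-only codes
    if len(raw) != 4:
        raise ValueError("a four-character code must be exactly 4 bytes")
    return int.from_bytes(raw, byteorder)


def int_to_fourcc(value, byteorder="big"):
    """Inverse of fourcc_to_int: unpack a 32-bit integer into its code."""
    return value.to_bytes(4, byteorder).decode("ascii")


big = fourcc_to_int("divx")                  # first character in the top byte
little = fourcc_to_int("divx", "little")     # first character in the low byte
assert int_to_fourcc(big) == "divx"
assert int_to_fourcc(little, "little") == "divx"
```

Keeping the byte order as an explicit parameter leaves the choice to the caller, since the thread notes that file formats disagree on it.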
-- Jack Jansen, <Jack.Jansen at cwi.nl>, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From skip at pobox.com Mon Jul 30 15:13:18 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 30 Jul 2007 08:13:18 -0500 Subject: [Python-3000] io library/PEP 3116 bits Message-ID: <18093.58350.892824.688493@montanaro.dyndns.org> I was looking at PEP 3116 to try and figure out what the newline keyword argument was for (it was mentioned in a couple replies to some checkin comments and I see it in io.py). It's not really mentioned in the PEP as far as I could tell other than this: Some new features include universal newlines and character set encoding and decoding. The io.open() docstring has this to say: newline: optional newlines specifier; must be None, '\n' or '\r\n'; specifies the line ending expected on input and written on output. If None, use universal newlines on input and use os.linesep on output. Shouldn't '\r' be provided as an option for Macs? Also, shouldn't the "U" mode flag be discarded (2to3 could maybe do this)? Is this particular bit of backwards compatibility all that necessary? The other thing I wanted to comment on is the default value for n in the various read methods. In some places it's -1 (why not zero? *), but in other places it's None, with presumably the same meaning. Shouldn't this be consistent across all read methods? The couple read methods mentioned in PEP 3116 only mention n=-1 as a default. Skip (*) A few days ago at work I saw someone check in a piece of code with f.read(-1) That looked so strange to me I had to look up its meaning. I don't think I had ever seen someone explicitly call read with a -1 arg. 
S From bwinton at latte.ca Mon Jul 30 15:41:21 2007 From: bwinton at latte.ca (Blake Winton) Date: Mon, 30 Jul 2007 09:41:21 -0400 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46AD68A1.8030403@v.loewis.de> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com> <46AD3D09.9060006@acm.org> <46AD68A1.8030403@v.loewis.de> Message-ID: <46ADEA81.8000609@latte.ca> Martin v. Löwis wrote: > The debate is whether base64.encodestring (which accepts bytes) > should *produce* (unicode) strings, which would then have to be > encoded as us-ascii. That would make a process of going from > unicode to base64 bytes a three-step process: > > tosend = base64.encodestring(data.encode("utf-8")).encode("ascii") > > Currently, you can spare the last step if you do want bytes, > and need to specify .decode("ascii") if you want strings. As a vote for keeping it, does anyone really want to encode the base64-ed data as something other than "ascii"? I mean, does it make any sense to write: tosend = base64.encodestring(data.encode("utf-8")).encode("UTF-16") ? Even if you could, I believe the resulting string would be un-processable by any other base-64 decoding tool. Later, Blake. 
From unknown_kev_cat at hotmail.com Mon Jul 30 18:12:17 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Mon, 30 Jul 2007 12:12:17 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org><f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> Message-ID: <f8l2l4$2d7$1@sea.gmane.org> ""Martin v. Löwis"" <martin at v.loewis.de> wrote in message news:46AD6E67.50407 at v.loewis.de... >>> If I patch io.py to default to "utf-8" rather than using the filesystem >>> encoding (ascii), that fixes a few more things. (test_coding.py and >>> test_minidom.py) >> >> How come the filesystem decoding is set to ASCII? > > I guess there are two problems: a) MS_WINDOWS isn't defined, and the > relevant code in bltinmodule.c doesn't special-case cygwin, and b) > setlocale is defined on Cygwin, but doesn't work. Cygwin's setlocale function only supports the "C" locale. I am a bit surprised that ASCII is returned rather than the system's default encoding. (I believe that should be Latin-1 on my system). >>> (For whats its worth, Cygwin's python 2.5 (as installed on my system) >>> fails >>> 2 of the tests in it's version of test_mailbox.py, both with "IOError: >>> [Errno 13] Permission denied"). > > I found that in many cases, this is a virus scanner or the indexing > service interfering. They open the file, and then the test suite cannot > delete it. Good guesses, but the indexing service is turned off, and I am not running any virus scanning software. The failures for the 2.5 test come from lines that look like: "for line in f:". 
The failures in the 3k tests come from the lines that attempt to open the file. It does seem likely though that something is going wrong with the deletion, effectively delaying it, which is triggering the errors. But i'm not sure what. But actually looking closely I was mistaken. Only one test failed under 2.5. That was TestMH's test_pack. It may be a fluke. Or perhaps the Cygwin Python maintainer understood that failure and decided it was nothing to worry about. GvR's thoughts of changing file permissions do not seem to work, as test_add certainly does not look to be changing file permissions. From guido at python.org Mon Jul 30 19:09:27 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 10:09:27 -0700 Subject: [Python-3000] struni and the Apple four-character-codes In-Reply-To: <1719694F-6FCA-4A64-8A42-5BC60C3A793C@cwi.nl> References: <5d44f72f0707242218i71554da4x1d6924715f016562@mail.gmail.com> <4EFA40DF-0113-1000-ED2D-BFFF4499DECF-Webmail-10021@mac.com> <5d44f72f0707262038j7a2dd0dcued85d9ef6d014236@mail.gmail.com> <B707418E-B504-40D8-9EFF-1B1FB6216EFE@mac.com> <1719694F-6FCA-4A64-8A42-5BC60C3A793C@cwi.nl> Message-ID: <ca471dc20707301009k487f8775iaa5f8561c84bf09d@mail.gmail.com> On 7/29/07, Jack Jansen <Jack.Jansen at cwi.nl> wrote: > One minor point (that may already have been addressed, I've not seen > the whole discussion): note that 4CCs not only occur on the Mac but > also in various other contexts: AIFF files use 4CCs to define chunk > types, MP4 files use them for a gazillion different things (media > types, codec types, etc). Actually, codec types are generally defined > by their 4CC, and some times these even get to be used as their > mainstream name (divx and xvid). > > It may be worthwhile to add generalized support somewhere to handle > converting 4CCs from readable to binary representation. 
And, of > course, the world being as it is some formats (Mac OSTypes, for > example, and probably quicktime/mp4 as well, but I'm not sure) > represent 4CCs in big-endian order, others (AIFF) in little-endian. And some support both, detecting the byte order from the 4CC. (This excludes palindromic 4CCs. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 30 19:20:50 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 10:20:50 -0700 Subject: [Python-3000] io library/PEP 3116 bits In-Reply-To: <18093.58350.892824.688493@montanaro.dyndns.org> References: <18093.58350.892824.688493@montanaro.dyndns.org> Message-ID: <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> On 7/30/07, skip at pobox.com <skip at pobox.com> wrote: > I was looking at PEP 3116 to try and figure out what the newline keyword > argument was for (it was mentioned in a couple replies to some checkin > comments and I see it in io.py). It's not really mentioned in the PEP as > far as I could tell other than this: > > Some new features include universal newlines and character set encoding > and decoding. > > The io.open() docstring has this to say: > > newline: optional newlines specifier; must be None, '\n' or '\r\n'; > specifies the line ending expected on input and written on > output. If None, use universal newlines on input and > use os.linesep on output. > > Shouldn't '\r' be provided as an option for Macs? Also, shouldn't the "U" > mode flag be discarded (2to3 could maybe do this)? Is this particular bit > of backwards compatibility all that necessary? I don't think \r needs to be supported -- OSX uses \n; Python 3.0 isn't going to be ported to MacOS 9. We discussed this before; I promised I'd add \r support if anyone can find a current use case for it. So far none have been reported. Regarding dropping 'U': agreed. But since the fixer hasn't been written yet it hasn't been dropped yet. 
We need help for little niggling details like this! > The other thing I wanted to comment on is the default value for n in the > various read methods. In some places it's -1 (why not zero? *), but in > other places it's None, with presumably the same meaning. Shouldn't this be > consistent across all read methods? The couple read methods mentioned in > PEP 3116 only mention n=-1 as a default. > > Skip > > (*) A few days ago at work I saw someone check in a piece of code with > > f.read(-1) > > That looked so strange to me I had to look up its meaning. I don't think I > had ever seen someone explicitly call read with a -1 arg. read(0) means to read zero bytes. It always returns an empty string (or byte array). There are plenty of end cases where this is useful. read(), read(None) and read(-1) are all synonyms, meaning "read until EOF". The reason there are three spellings is mostly historic; because there are so many different file-like objects and not all of them implemented this consistently. Since the argument is an integer, it's the easiest to use -1 as the default; but since some classes used None as the default instead, some people started *passing* None, and then the need was born to support both. Arguably this was a bad idea, and we should add a new API readall() (one of the implementations already has this, and read(-1) calls it). Then the 2to3 fixer will have to recognize this. I welcome patches! But right now, getting the number of failing unit tests in the py3k-struni branch down to zero is more important. To help, see http://wiki.python.org/moin/Py3kStrUniTests. 
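A minimal sketch of the `read()` spellings described above, using `io.BytesIO` from the current standard library as a stand-in for a file object (an assumption: the behaviour shown is that of today's `io` module, not necessarily the 2007 branch under discussion):

```python
import io

buf = io.BytesIO(b"spam and eggs")

zero = buf.read(0)            # asks for zero bytes: always empty
first = buf.read(4)           # ordinary bounded read
rest_minus_one = buf.read(-1) # -1 means "read until EOF"

buf.seek(0)
all_none = buf.read(None)     # None is accepted as a synonym for -1

buf.seek(0)
all_default = buf.read()      # no argument: same meaning again

assert zero == b""
assert first + rest_minus_one == all_none == all_default
```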
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 30 19:24:35 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 10:24:35 -0700 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46ADEA81.8000609@latte.ca> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com> <46AD3D09.9060006@acm.org> <46AD68A1.8030403@v.loewis.de> <46ADEA81.8000609@latte.ca> Message-ID: <ca471dc20707301024q4b5a82cdjeba1dc7461efc12c@mail.gmail.com> On 7/30/07, Blake Winton <bwinton at latte.ca> wrote: > Martin v. Löwis wrote: > > The debate is whether base64.encodestring (which accepts bytes) > > should *produce* (unicode) strings, which would then have to be > > encoded as us-ascii. That would make a process of going from > > unicode to base64 bytes a three-step process: > > > > tosend = base64.encodestring(data.encode("utf-8")).encode("ascii") > > > > Currently, you can spare the last step if you do want bytes, > > and need to specify .decode("ascii") if you want strings. > > As a vote for keeping it, does anyone really want to encode the > base64-ed data as something other than "ascii"? > > I mean, does it make any sense to write: > tosend = base64.encodestring(data.encode("utf-8")).encode("UTF-16") > ? Even if you could, I believe the resulting string would be > un-processable by any other base-64 decoding tool. I think you're missing the point, the point being that the most common use needs bytes, so returning bytes is the most useful API design. And to answer your rhetorical question: yes, there are other conceivable encodings for base64; in particular EBCDIC. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 30 19:27:35 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 10:27:35 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <46AD6E67.50407@v.loewis.de> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> Message-ID: <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> On 7/29/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > >> (For whats its worth, Cygwin's python 2.5 (as installed on my system) fails > >> 2 of the tests in it's version of test_mailbox.py, both with "IOError: > >> [Errno 13] Permission denied"). > > I found that in many cases, this is a virus scanner or the indexing > service interfering. They open the file, and then the test suite cannot > delete it. Oh darn. I remember running into that in a completely different context. What's the solution? Turn off the virus scanner? Wait until it's done? I guess we could add something to test_support.unlink() that checks for windows or cygwin, and when it gets this error on cleanup, waits half a second and tries again, looping for a few seconds before giving up completely. Currently the unlink() call ignores all errors, so then the subsequent open() call gets the error. Unit tests that still call os.unlink() or os.remove() instead of test_support.unlink() should be updated anyway. 
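A minimal sketch of the retry idea floated above: wait half a second and try again, giving up after a few seconds. The name `unlink_with_retry`, its parameters, and the decision to retry on every platform (rather than only on Windows or Cygwin) are assumptions for illustration; this is not the actual `test_support.unlink()` code:

```python
import os
import time


def unlink_with_retry(path, timeout=3.0, interval=0.5):
    """Hypothetical helper: delete `path`, retrying on permission errors.

    Returns True once the file is gone, False if the deadline passes
    while something else (e.g. a scanner) still holds the file.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            return True               # already gone counts as success
        except PermissionError:
            if time.monotonic() >= deadline:
                return False          # give up after roughly `timeout` seconds
            time.sleep(interval)      # let the other process release the file
```

Returning a flag instead of silently ignoring the failure lets a test report the stuck file at cleanup time rather than at the next `open()`.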
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Jul 30 20:07:50 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 11:07:50 -0700 Subject: [Python-3000] io library/PEP 3116 bits In-Reply-To: <DBADB8D9-9941-43C2-8B79-9440E1B00DAC@PageDNA.com> References: <18093.58350.892824.688493@montanaro.dyndns.org> <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> <DBADB8D9-9941-43C2-8B79-9440E1B00DAC@PageDNA.com> Message-ID: <ca471dc20707301107i60789370lfce119eb8ecb44d7@mail.gmail.com> On 7/30/07, Tony Lownds <tony at pagedna.com> wrote: > On Jul 30, 2007, at 10:20 AM, Guido van Rossum wrote: > > I don't think \r needs to be supported -- OSX uses \n; Python 3.0 > > isn't going to be ported to MacOS 9. We discussed this before; I > > promised I'd add \r support if anyone can find a current use case for > > it. So far none have been reported. > > I routinely work with OS X created files that use \r newlines. The most > common ones are Excel (when exporting to text) and Adobe Illustrator > EPS files. fair enough. We'll have to support \r then. I'll update the PEP; a patch for the code would be most welcome. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Mon Jul 30 20:20:14 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Jul 2007 14:20:14 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070721181442.48FB03A403A@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> Message-ID: <fb6fbf560707301120w2421cbc4s519a59a93163c937@mail.gmail.com> On 7/21/07, Phillip J. Eby <pje at telecommunity.com> wrote: >... 
If you have to use @somegeneric.before and > @somegeneric.after, you can't decide on your own to add > @somegeneric.debug. > However, if it's @before(somegeneric...), then you can add > @debug and @authorize and @discount and whatever else > you need for your > application, without needing to monkeypatch them in. I honestly don't see any difference here. @somegeneric.method implies that somegeneric is an existing object, and even that it already has rules for combining .before and .after; it can just as easily have a rule for combining arbitrary methods. If you're saying that @discount could include its own combination rules, then each method needs to repeat the boilerplate to pick apart the current decision tree. The only compensating "advantage" I see is that the decision tree could be changed arbitrarily from anywhere, even as "good practice." (Since my new @thumpit decorator would take the generic as an argument, you won't see the name of the generic in my file; you might never see it there was iteration involved.) > Our brains run by pattern recognition, with more-specific > patterns taking precedence, so this is an easier model for your > brain to follow than step-by-step computation anyway. Only if you are confident that you have all the patterns enumerated. I realize that subclasses are theoretically just as arbitrary, but they aren't in practice. Base classes are almost always named directly, rather than indirectly through a variable. Subclassing (normally) affects only the first dimension, so you don't have a cartesian product to mentally resolve. You can certainly say now that configuration specialization should be in one place, and that dispatching on parameter patterns like (* # ignored , :int # actual int subclass , :Container # meets the Container ABC , 4<val<17.3 # value-specific rule ) is a bad idea -- but whenever I look at an application from the outside, well-organized configuration data is a rare exception. 
> At 10:55 PM 7/20/2007 -0700, Talin wrote: > >If it turns out that there's no way to get a callback when the > >class has finished being built, Could you clarify why the __class__ attribute being used by super is not sufficient? -jJ From tony at PageDNA.com Mon Jul 30 19:50:16 2007 From: tony at PageDNA.com (Tony Lownds) Date: Mon, 30 Jul 2007 10:50:16 -0700 Subject: [Python-3000] io library/PEP 3116 bits In-Reply-To: <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> References: <18093.58350.892824.688493@montanaro.dyndns.org> <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> Message-ID: <DBADB8D9-9941-43C2-8B79-9440E1B00DAC@PageDNA.com> On Jul 30, 2007, at 10:20 AM, Guido van Rossum wrote: > I don't think \r needs to be supported -- OSX uses \n; Python 3.0 > isn't going to be ported to MacOS 9. We discussed this before; I > promised I'd add \r support if anyone can find a current use case for > it. So far none have been reported. I routinely work with OS X created files that use \r newlines. The most common ones are Excel (when exporting to text) and Adobe Illustrator EPS files. 
-Tony From jimjjewett at gmail.com Mon Jul 30 21:00:46 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Jul 2007 15:00:46 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <5d44f72f0707292256t2e4a8d25y976637456618eeaa@mail.gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <46AD5DD4.8000509@acm.org> <5d44f72f0707292256t2e4a8d25y976637456618eeaa@mail.gmail.com> Message-ID: <fb6fbf560707301200o1065b65p5516689848dd579d@mail.gmail.com> On 7/30/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote: > On 7/29/07, Talin <talin at acm.org> wrote: > > Phillip J. Eby wrote: > > > At 08:25 AM 7/27/2007 -0700, Guido van Rossum wrote: > > >> ... But GFs in classes and subclassing? Not until we > > >> have a much better design. > > > The only time I actually use them in > > > classes myself is to override existing generic functions > > > that live outside the class Why are you overriding, instead of just specializing? Why not define the @overload operator so that it just registers the specialization with the base class? > > class A: > > @overload > > def method1(self, x:object): > > ... Should this register with a "global" generic method, so that method1(first_arg:A, x:object) forwards to A.method1(first_arg, x) > > class B(A): > > @overload > > def method(self, x:int): > > ... and this would register with A.method1 (or the global method1, depending on the previous answer) for the pattern method1(first_arg:B, x:int) > > b = B() > > b.method("test") // Method not found Instead, this would skip back to A.method1(self, "test") -- and I think the @overload decorator is sufficient warning. 
(I do wonder whether that is magical enough to call it @__overload__) -jJ From pje at telecommunity.com Mon Jul 30 21:45:33 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Jul 2007 15:45:33 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <fb6fbf560707301120w2421cbc4s519a59a93163c937@mail.gmail.co m> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <20070713173936.53C213A404D@sparrow.telecommunity.com> <f7pgki$6o3$1@sea.gmane.org> <ca471dc20707200749p4ed42134h453c7535c98cc73d@mail.gmail.com> <20070720174706.AE5773A40A8@sparrow.telecommunity.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <fb6fbf560707301120w2421cbc4s519a59a93163c937@mail.gmail.com> Message-ID: <20070730194510.08C733A40AA@sparrow.telecommunity.com> At 02:20 PM 7/30/2007 -0400, Jim Jewett wrote: >On 7/21/07, Phillip J. Eby <pje at telecommunity.com> wrote: > > >... If you have to use @somegeneric.before and > > @somegeneric.after, you can't decide on your own to add > > @somegeneric.debug. > > > However, if it's @before(somegeneric...), then you can add > > @debug and @authorize and @discount and whatever else > > you need for your > > application, without needing to monkeypatch them in. > >I honestly don't see any difference here. @somegeneric.method implies >that somegeneric is an existing object, and even that it already has >rules for combining .before and .after; it can just as easily have a >rule for combining arbitrary methods. I don't understand what you're saying or how it relates to what I said above. If you define a new kind of method qualifier (e.g. @discount), then all existing generic functions aren't suddenly going to grow a '.discount' attribute. That's what the above discussion is about -- how you *access* qualifier decorators. >If you're saying that @discount could include its own combination >rules, then each method needs to repeat the boilerplate to pick apart >the current decision tree. 
Still don't understand you. Method combination is done with a generic function called "combine_actions" which takes two arbitrary "method" objects and returns a new "method" representing their combination. There is no boilerplate or picking anything apart. > The only compensating "advantage" I see is >that the decision tree could be changed arbitrarily from anywhere, >even as "good practice." (Since my new @thumpit decorator would takes >the generic as an argument, you won't see the name of the generic in >my file; you might never see it there was iteration involved.) Decision trees are generated from a flat collection of rules; they're not directly manipulated. In the default implementation (based on Guido's prototype), the "tree" is just a big dictionary mapping tuples of types to "method" objects created by combining all the methods whose signatures are implied by that tuple of types. It's also sparse, in that it doesn't contain type combinations that haven't been looked up yet. So there isn't really any tree that you could "change" here. There's just a collection of rules, where a rule consists of a predicate, a definition order, a "body" (function), and a method factory. A predicate is a collection of possible signatures (e.g. the sequence of applicable types) -- i.e., an OR of ANDs. To actually build a tree, rules are turned into a set of "cases", where each case consists of one signature from the rule's predicate, plus a method instance created using the signature, body, and definition order. (Not all methods care about definition order, just ones like before/after.) In the default engine (loosely based on Guido's prototype), these cases are merged by using combine_actions() on any cases with the same signature, and stored in a dictionary called the "registry". The registry is built up incrementally as you add methods. When you call the function, a type tuple is built and looked up in the cache. 
If nothing is found in the cache, we loop over the *entire* registry, and build up a derived method, like this (actual code excerpt):

    try:
        f = cache[types]
    except KeyError:
        # guard against re-entrancy looking for the same thing...
        action = cache[types] = self.rules.default_action
        for sig in self.registry:
            if sig==types or implies(types, sig):
                action = combine_actions(action, self.registry[sig])
        f = cache[types] = action
    return f(*args)

The 'self.rules.default_action' is to method objects what zero is to numbers -- the start of the summing. Ordinarily, the default action is a NoMethodFound object -- a perfectly valid "method" implementation whose behavior is to raise an error. All other method types have higher combination precedence than NoMethodFound, so it always sinks to the end of any combination of methods.

The relevant generic functions here are implies(), combine_actions(), and overrides() -- where combine_actions() calls overrides() to find out which action should override the other, and then returns overriding_action.override(overridden_action).

The overrides() relationship of two actions of the same type (e.g. two Around methods), is defined by the implies() relationship of the action signatures. For Before/After methods, the definition order is used to resolve any ambiguity in the implies().

The .override() of a method is usually a new instance of the same method type, but with a "tail" that points to the overridden method, so that next_method will do the right thing.

There are more details than this, of course, but the point is that method combination is 100% orthogonal to the dispatch tree mechanism. You can build any kind of dispatch engine you want, just by using combine_actions to combine the actions. The action types themselves only need to know how to .override() a lower precedence method and .merge() with a same-precedence method.
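To make the lookup described above concrete, here is a minimal sketch under stated assumptions. It is not the PEP 3124 implementation: the names Generic, Method and when() are hypothetical, implies() is reduced to a plain issubclass check, and combine_actions() simply keeps the more specific method instead of building .override()/.merge() chains. Ambiguous signatures are not detected.

```python
# Simplified, hypothetical sketch of the registry/cache lookup.
# Assumptions: "implies" is just issubclass on each argument slot, and
# "combine_actions" keeps the more specific of two methods.

class NoMethodFound:
    """Default action: a valid 'method' whose behaviour is to raise."""
    def __call__(self, *args):
        raise TypeError("no applicable method for %r" % (args,))


class Method:
    def __init__(self, sig, body):
        self.sig = sig      # tuple of types this method applies to
        self.body = body    # the function to run

    def __call__(self, *args):
        return self.body(*args)


def implies(types, sig):
    # A type tuple implies a signature if every type fits its slot.
    return len(types) == len(sig) and all(
        issubclass(t, s) for t, s in zip(types, sig))


def combine_actions(a, b):
    # NoMethodFound has the lowest precedence, so it always loses.
    if isinstance(a, NoMethodFound):
        return b
    if isinstance(b, NoMethodFound):
        return a
    # Keep whichever signature is more specific (no ambiguity check here).
    return a if implies(a.sig, b.sig) else b


class Generic:
    def __init__(self):
        self.registry = {}                  # signature -> Method
        self.cache = {}                     # type tuple -> combined action
        self.default_action = NoMethodFound()

    def when(self, *sig):
        def register(body):
            self.registry[sig] = Method(sig, body)
            self.cache.clear()              # new rules invalidate the cache
            return body
        return register

    def __call__(self, *args):
        types = tuple(type(a) for a in args)
        try:
            f = self.cache[types]
        except KeyError:
            action = self.default_action
            for sig in self.registry:
                if sig == types or implies(types, sig):
                    action = combine_actions(action, self.registry[sig])
            f = self.cache[types] = action
        return f(*args)


describe = Generic()

@describe.when(object)
def _(x):
    return "object"

@describe.when(int)
def _(x):
    return "int"
```

Calling describe(3) walks both registered signatures, combines them, and caches the winner under (int,); a later describe(4) is a single dictionary lookup.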
And there needs to be an overrides() relationship defined between all pairs of method types, but in my current version of the implementation, overrides() is automatically transitive for any type-level relationship. So if you define a type that overrides Around, then it also overrides anything that Around overrides. So, for the most part you just say what types you want to override (and/or be overridden by), and maybe add a rule for how to compare two methods of your type (if the default of comparing by the implies() of signatures isn't sufficient). The way that generic functions make this incredible orthogonality and flexibility possible is itself an argument for generic functions, IMO. Certainly, it's a hell of an argument for implementing generic functions in terms of other generic functions, which is why I did it. It beats the crap out of my previous implementation approaches, which had way too much coupling between method combination and tree-building and rules and cases and whatnot. Separating these ideas into different functional/conceptual domains makes the whole thing easier to understand -- as long as you're not locked into procedural-implementation thinking. If you want to think step-by-step, it's potentially a vast increase in complication. On the other hand, it's like thinking about reference counting while writing Python code. Sure, you need to drop down to that level every now and then, but it's a waste of time to think about it 90% of the time. Being able to have a class of things that you *don't* think about is what makes Python a higher-level language than the C it's implemented with. In the same way, generic functions are a higher-level version of OO -- you get to think in terms of a domain's abstract operations, like implication, overriding, and combination in this example. The domain abstractions are not an "interface", nor are they methods or object types. 
They're more like "concepts", except that the term "concept" has been abused to refer to much lower-level things that can attach to only one object within an operation. The concept of implication is that there are imply-ers and imply-ees -- a role for each argument, each of which is an implicit interface or abstract object type. In traditional OO and even interfaces, there are considerable limits to your ability to specify such partial interfaces and the relationships between them, forcing you to choose arbitrary and implementation-defined organization to put them in. You then have to force-fit objects to have the right methods, because you didn't define an x.is_implied_by(y) relationship, only a x.implies(y) relationship. Thing is, a *relationship* doesn't belong to one side or the other -- it's a *relationship*. A third, independent thing. Like a GF method. In any program, these relationships already exist, and you still have to understand them. They're just forced into whatever pattern the designer chose or had thrust upon them to make them fit the at-best-binary nature of OO methods, instead of called out as explicit relationships, following the form of the problem domain. >I realize that subclasses are theoretically just as arbitrary, but >they aren't in practice. Right -- and neither are generic functions in normal usage. The only reason you think that subclasses aren't arbitrary is because you're used to the ways that things get force-fitted into those relationships. Whereas, with GF's, the program can simply model the application domain relationships, and you're going to know what patterns will follow because they'll reflect the application domain. For example, if you see implies() and combine_actions() and overrides(), are you going to have any problems knowing when you see a type, whether these GF's might have methods for that type? You'll know when to *look* for such a method, because you know what roles the arguments play in each GF. 
If the type might play such a role, then you'll want to know *how* it plays that role in connection with specific collaborators or circumstances -- and you'll know what method implementations to look for. It's ridiculously simple in practice, even though it sounds hard in theory. That's the very problem in fact -- in neither subclassing nor GF's can you solve such problems *in theory*. You can only solve them in *practice*, because it's only in the context of a specific program that you have any domain knowledge to apply -- i.e., knowledge about what general kinds of things the program is supposed to do and what general kinds of things it does them with. If you have that general knowledge, it's just as easy to handle one organization as the other -- but the GF-based version gives you the option of having a module that defines lots of basic "kinds of things it's supposed to do" up front, so that you have an idea of how to understand the "things it does them with" when you encounter them. >You can certainly say now that configuration specialization should be >in one place, and that dispatching on parameter patterns like > >(* # ignored >, :int # actual int subclass >, :Container # meets the Container ABC >, 4<val<17.3 # value-specific rule >) > >is a bad idea But I *don't* say that. What I say is that in practice, there are only a few natural places to *put* such a definition: * near the definition of Container (or int, but that's a builtin in this case) * near the definition of the generic function being overloaded * in a "concern-based" grouping, e.g. an appropriate module that groups together matters for some application-domain concept. (For example, an "ordering_policy" module might contain overrides for a variety of generic functions that relate to inventory, shipping, and billing, within the context of placing orders.) * in an application-designated catchall location Which of these locations is "best" depends on the overall size of the program. 
A one-module program is certainly small enough to not need to pick one. As a system gets bigger, some of the other usage patterns become more applicable. >-- but whenever I look at an application from the >outside, well-organized configuration data is a rare exception. That may be -- but one enormous advantage of generic functions is that you can always relocate your method definitions to a different module or different part of the same module without affecting the meaning of the program, as long as all the destination modules are imported by the time you execute any of the functions. In other words, if a program is messy, you can clean it up -- heck, it's potentially safer to do with an automatic refactoring tool, than other types of refactorings in Python. (e.g., changing the signature of a 'foo()' method is difficult to do safely because you don't necessarily know whether two arbitrary methods *named* 'foo' are semantically the same, whereas generic functions are objects, not names.) From pje at telecommunity.com Mon Jul 30 22:10:08 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Jul 2007 16:10:08 -0400 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> Message-ID: <20070730201511.C14ED3A406B@sparrow.telecommunity.com> At 12:20 PM 7/27/2007 -0400, Phillip J. 
Eby wrote: >At 08:25 AM 7/27/2007 -0700, Guido van Rossum wrote: > >Basic GFs, great. Before/after/around, good. Other method > >combinations, fine. But GFs in classes and subclassing? Not until we > >have a much better design. > >Sounds reasonable to me. The only time I actually use them in >classes myself is to override existing generic functions that live >outside the class, like ones from an Interface or a standalone generic. > >The main reason I included GFs-in-classes examples in the PEP is >because of the "dynamic overloading" meme. In C++, Java, etc., you >can use overloading in methods, so I wanted to show how you could do >that, if you wanted to. > >I suspect that the simplest way to fix this in Py3K is with an >"overloading" metaclass, as it would not even require any >decorators. That is, you could provide a custom dictionary that >records every definition of a function with the same name. The >actual metaclass creation process would check for a method of the >same name in a base class, and if it's generic (or the current class >added more than one method), put a generic method in. > >With a little bit of work, you could probably determine whether you >could get away with dropping the genericness in a subclass; >specifically, if all the subclass-defined methods are "more specific" >than all base class methods, then there's no need for them to be in >the same generic function, unless they make next_method calls. Thus, >you'll end up with normal methods except where absolutely necessary. > >Such a metaclass would make method overloads look pretty much the >same as in OO languages with static overloading. The only remaining >hole at that point would be reconciling super() and next_method. If >you're using this metaclass, super() is only meaningful if you're not >in the same generic function as is used in your base, while >next_method() is only meaningful if you *are*. > >I don't know of any quick way to fix that, but I'll give it some thought. 
I think I see how to resolve next_method() and super() now: if you create a new GF in a subclass, you just define its default_action to be something that calls super(). Then, you just use next_method() instead of super(). Currently the default default_action is a NoMethodFound action, but replacing it for a given GF is a piece of cake.

So, an "overloading" metaclass could be written that would:

1. Use __prepare__ to catch multiple function assignments to the same name, converting them to overloads.

2. Decide whether to combine those overloads with an existing generic in the base classes, or to create a new generic and chain it with a super() default action.

3. Automatically make the class object part of the overload registrations for 'self'.

The principal downside to this approach is that only one metaclass can provide a __prepare__ dictionary, which means it's even more difficult to combine metaclasses than it is in today's Python -- which means I want to give a little more thought to PEP 3115, to see if there is any way to at least emulate the "derived metaclass rule" for __prepare__, that Python currently enforces for the base classes.

In other words, a class' metaclass has to be a derivative of all its bases' metaclasses; ISTM that a __prepare__ namespace needs to be a derivative in some sense of all its bases' __prepare__ results. This probably isn't enforceable, but the pattern should be documented such that e.g. the overloading metaclass' __prepare__ would return a mapping that delegates operations to the mapping returned by its super()'s __prepare__, and the actual class creation would be similarly chained. PEP 3115 probably needs a section to explain these issues and recommend best practices for implementing __prepare__ and class creation on that basis. I'll write something up after I've thought this through some more.
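A rough sketch of the first step only, assuming PEP 3115's __prepare__ hook as specified. The names OverloadNamespace and OverloadingMeta are hypothetical, and this version merely records the competing definitions; it does not build a generic function, chain to a base-class generic, or register the class for 'self'.

```python
# Hypothetical sketch: a __prepare__ namespace that remembers every function
# bound to the same name in a class body instead of silently overwriting it.

class OverloadNamespace(dict):
    def __init__(self):
        super().__init__()
        self.overloads = {}     # name -> list of functions, in definition order

    def __setitem__(self, name, value):
        # Record plain callables; skip dunder names such as __qualname__.
        if callable(value) and not name.startswith("__"):
            self.overloads.setdefault(name, []).append(value)
        super().__setitem__(name, value)


class OverloadingMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class suite executes with this mapping as its namespace.
        return OverloadNamespace()

    def __new__(mcls, name, bases, namespace, **kwds):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwds)
        # A real implementation would combine these into a generic function;
        # this sketch only keeps them for inspection.
        cls._overloads = dict(namespace.overloads)
        return cls


class Shape(metaclass=OverloadingMeta):
    def scale(self, factor: int):
        return "int"

    def scale(self, factor: float):
        return "float"
```

Here Shape._overloads["scale"] holds both definitions with their annotations, while Shape.scale itself is still just the last one defined, which is exactly the silent behaviour an overloading metaclass would have to replace.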
But I think this wraps up the overall question of *how* to integrate methods and GFs in a way that supports a more C++/Java-like overloading style (i.e., no decorators on individual overloads within a class). The main drawback is that it's a silent error if you leave off the metaclass. Another option of course would be to make this part of the default metaclass, but that would bring in the issue of needing a standard API (and default implementation) for GF's. In the meantime, though, it's nice to see a practical application for PEP 3115 -- i.e., implementing transparent Java-style overloading. It's absolutely not possible in 2.x without decorators, both because of the lack of argument annotations and the lack of a __prepare__-controlled class-suite namespace. From martin at v.loewis.de Mon Jul 30 23:39:40 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 30 Jul 2007 23:39:40 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> Message-ID: <46AE5A9C.5000103@v.loewis.de> Guido van Rossum schrieb: >> I found that in many cases, this is a virus scanner or the indexing >> service interfering. They open the file, and then the test suite cannot >> delete it. > > Oh darn. I remember running into that in a completely different > context. What's the solution? Turn off the virus scanner? Wait until > it's done? 
I never found the time to properly research the official solution. Looking at the DeleteFile documentation, the problem is slightly different, still: "The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED." So it is not the DeleteFile that fails, but the subsequent attempt to create a new file in the same place. For the test suite, the solution would be to always use a fresh file name for temporary files. Of course, it is then more important that all files created actually do get removed in the fixture. Regards, Martin From greg.ewing at canterbury.ac.nz Tue Jul 31 03:18:41 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:18:41 +1200 Subject: [Python-3000] optimizing [x]range In-Reply-To: <18093.18996.944540.279864@montanaro.dyndns.org> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> Message-ID: <46AE8DF1.1030409@canterbury.ac.nz> skip at pobox.com wrote: > Given that I find the cascading comparisons clearer I see no reason > to optimize the "in range(...)" case. The sort of thing I have in mind is where I have a sequence that I want to frequently iterate over the indices of, so I do r = xrange(len(myseq)) so I can write for i in r: ... Having done that, if I want to test whether some index j is within the range of indices for this sequence, it seems natural to write if j in r: ... Given the context, I think this is a very Obvious Way To Do It, and it's surprising that it isn't as efficient as it looks like it should be. 
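For illustration, the arithmetic that a constant-time containment test would need is small. The helper below is a hypothetical sketch (in_range is not an existing function), restricted to integer arguments; it avoids iterating over the range at all.

```python
def in_range(j, start, stop, step=1):
    """Constant-time equivalent of `j in range(start, stop, step)` for ints."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0:
        inside = start <= j < stop      # ascending range
    else:
        inside = stop < j <= start      # descending range
    # j must also land exactly on one of the steps counted from start.
    return inside and (j - start) % step == 0
```

For the use case in the message, in_range(j, 0, len(myseq)) reduces to the chained comparison 0 <= j < len(myseq).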
-- Greg From greg.ewing at canterbury.ac.nz Tue Jul 31 03:26:28 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:26:28 +1200 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46AD3D09.9060006@acm.org> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com> <46AD3D09.9060006@acm.org> Message-ID: <46AE8FC4.8080602@canterbury.ac.nz> Talin wrote: > I believe that converting a Unicode string to a base64 encoded form is > necessarily a 2-step process. Well, yes, but only because base64 itself takes arbitrary binary data as input, not Unicode strings. Encoding *anything* other than binary data as base64 is going to require an extra step in that sense. > So the fact > that you can vary one encoding without changing the other would seem to > argue for the notion that they are distinct and independent. I would say that the first encoding is outside the scope of base64 and therefore irrelevant to this discussion. -- Greg From greg.ewing at canterbury.ac.nz Tue Jul 31 03:33:43 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:33:43 +1200 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <f8jkb8$mre$1@sea.gmane.org> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46ABFE12.5000101@gmail.com> <46AD2BF6.5080507@canterbury.ac.nz> <f8jkb8$mre$1@sea.gmane.org> Message-ID: <46AE9177.4010109@canterbury.ac.nz> Terry Reedy wrote: > On the contrary, to me, the point of base64 is to encode bytes into a > subset of bytes more or less guaranteed to not get mangled during > transport. Yes, and the way it goes about it is to map the binary data to a sequence of characters, the reasoning being that most such channels can at least encode those characters somehow, because they're designed for the purpose of sending text. 
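The two-step nature of the process under discussion can be shown with the base64 module as it exists in released Python 3, where b64encode() takes bytes and returns bytes. The sample string and variable names are only illustrative.

```python
import base64

text = "caf\u00e9"                      # a str containing a non-ASCII character

raw = text.encode("utf-8")              # step 1: text -> bytes (caller's choice)
encoded = base64.b64encode(raw)         # step 2: bytes -> base64, still bytes
as_text = encoded.decode("ascii")       # optional: view the result as text

# The first step is independent of base64: another text encoding gives
# different base64 output for the same string.
other = base64.b64encode(text.encode("utf-16-le"))

round_trip = base64.b64decode(as_text).decode("utf-8")
```

Whether the last conversion to str should be the caller's job or the library's is the API question being argued in this thread.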
> That these safe bytes correspond to ascii chars They only correspond to ASCII character *codes* when the channel in question is designed to transmit text encoded in ASCII. If the channel were designed to transmit text encoded in EBCDIC or some other way, then ASCII codes would likely get mangled just as badly as raw binary data. -- Greg From stephen at xemacs.org Tue Jul 31 03:54:23 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 31 Jul 2007 10:54:23 +0900 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f8l2l4$2d7$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <f8l2l4$2d7$1@sea.gmane.org> Message-ID: <87myxde3sg.fsf@uwakimon.sk.tsukuba.ac.jp> Joe Smith writes: > Cygwin's setlocale function only supports the "C" locale. > I am a bit suprised that ASCII is returned rather than the system's default > encoding. If I understand the situation correctly, you shouldn't be. The C locale is defined to use ASCII. 
From greg.ewing at canterbury.ac.nz Tue Jul 31 03:38:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:38:07 +1200 Subject: [Python-3000] pep 3124 plans In-Reply-To: <46AD5DD4.8000509@acm.org> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <46AD5DD4.8000509@acm.org> Message-ID: <46AE927F.8080102@canterbury.ac.nz> Talin wrote: > So basically what I would propose is that we simply say that we don't > mix normal overloading and multi-method dispatch until PJE comes up with > his better solution. Maybe this should be enforced, i.e. only allow global functions and class or static methods to be GFs, not regular methods. -- Greg From greg.ewing at canterbury.ac.nz Tue Jul 31 03:45:33 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:45:33 +1200 Subject: [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> Message-ID: <46AE943D.1040105@canterbury.ac.nz> Hasan Diwan wrote: > The issue seems to be in the socket.py close method. It needs to sleep > socket.SO_REUSEADDR seconds before returning. WHAT??? 
socket.SO_REUSEADDR is a flag that you pass when creating a socket to tell it to re-use an existing address, not something to be used as a timeout value, as far as I know. -- Greg From greg.ewing at canterbury.ac.nz Tue Jul 31 03:58:01 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 13:58:01 +1200 Subject: [Python-3000] io library/PEP 3116 bits In-Reply-To: <18093.58350.892824.688493@montanaro.dyndns.org> References: <18093.58350.892824.688493@montanaro.dyndns.org> Message-ID: <46AE9729.6050507@canterbury.ac.nz> skip at pobox.com wrote: > The other thing I wanted to comment on is the default value for n in the > various read methods. In some places it's -1 (why not zero? *), Maybe because reading 0 bytes already has a well-defined (if not particularly useful) meaning? You probably wouldn't use it explicitly, but it could arise as the result of a calculation, and it would then need to be special-cased if it had a reserved meaning. > (*) A few days ago at work I saw someone check in a piece of code with > > f.read(-1) That does look strange. Maybe the result of someone reading the docs and failing to notice that there was an easier spelling. 
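A small sketch of why 0 is not available as the "read everything" sentinel, using io.BytesIO from the released io module as a stand-in for any stream; the data is made up.

```python
import io

f = io.BytesIO(b"hello world")

empty = f.read(0)    # reading zero bytes is well defined: returns b""
first = f.read(5)    # an ordinary bounded read
rest = f.read(-1)    # negative size means "until EOF", same as f.read()
at_eof = f.read()    # nothing left, so this is b"" as well
```

A computed size that happens to be 0 therefore behaves like any other size, with no special case needed.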
-- Greg From talin at acm.org Tue Jul 31 04:06:48 2007 From: talin at acm.org (Talin) Date: Mon, 30 Jul 2007 19:06:48 -0700 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070730201511.C14ED3A406B@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> Message-ID: <46AE9938.6070802@acm.org> Phillip J. Eby wrote: > The principle downside to this approach is that only one metaclass > can provide a __prepare__ dictionary, which means it's even more > difficult to combine metaclasses than it is in today's Python -- > which means I want to give a little more thought to PEP 3115, to see > if there is any way to at least emulate the "derived metaclass rule" > for __prepare__, that Python currently enforces for the base classes. I would love any improvements to PEP 3115 that you can think of. -- Talin From greg.ewing at canterbury.ac.nz Tue Jul 31 04:19:44 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 14:19:44 +1200 Subject: [Python-3000] io library/PEP 3116 bits In-Reply-To: <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> References: <18093.58350.892824.688493@montanaro.dyndns.org> <ca471dc20707301020h49f89131k944fe32036708628@mail.gmail.com> Message-ID: <46AE9C40.8040003@canterbury.ac.nz> Guido van Rossum wrote: > I don't think \r needs to be supported -- OSX uses \n; Not always. 
It's still possible to come across situations where dealing with \r is necessary, when using Classic applications or OSX ports of them. I think it would be premature to drop support for \r at this stage. -- Greg From guido at python.org Tue Jul 31 05:41:36 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Jul 2007 20:41:36 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <46AE8DF1.1030409@canterbury.ac.nz> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> Message-ID: <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> On 7/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > The sort of thing I have in mind is where I have a sequence > that I want to frequently iterate over the indices of, so > I do > > r = xrange(len(myseq)) > > so I can write > > for i in r: > ... > > Having done that, if I want to test whether some index j > is within the range of indices for this sequence, it > seems natural to write > > if j in r: > ... > > Given the context, I think this is a very Obvious Way To > Do It, and it's surprising that it isn't as efficient as it > looks like it should be. Fair enough. So maybe *you* can contribute a patch? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Tue Jul 31 06:09:47 2007 From: python at rcn.com (Raymond Hettinger) Date: Mon, 30 Jul 2007 21:09:47 -0700 Subject: [Python-3000] optimizing [x]range References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com><ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com><46ABE025.4050204@async.com.br><18092.34217.512107.677855@montanaro.dyndns.org><46AD30DA.6050405@canterbury.ac.nz><18093.18996.944540.279864@montanaro.dyndns.org><46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> Message-ID: <00b701c7d328$a5cd0d60$f101a8c0@RaymondLaptop1> >> Having done that, if I want to test whether some index j >> is within the range of indices for this sequence, it >> seems natural to write >> >> if j in r: >> ... > > Fair enough. So maybe *you* can contribute a patch? And maybe we can do the same for xrange() in Py2.6 Raymond From greg.ewing at canterbury.ac.nz Tue Jul 31 06:29:47 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Jul 2007 16:29:47 +1200 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46AD52CC.4060403@ronadam.com> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46ABFE12.5000101@gmail.com> <46AD2BF6.5080507@canterbury.ac.nz> <46AD52CC.4060403@ronadam.com> Message-ID: <46AEBABB.5030902@canterbury.ac.nz> Ron Adam wrote: > Not extra, you just need to make sure your binary data is in the correct > range of values the text device you are sending to can handle. Does this mean that Py3k text streams will accept byte arrays in their write() methods, and that byte arrays can be concatenated with unicode strings and otherwise used in any context expecting a text string, as long as all their elements are in the ASCII range? If that's true, then some of my objection is mitigated. 
-- Greg From foom at fuhm.net Tue Jul 31 07:37:40 2007 From: foom at fuhm.net (James Y Knight) Date: Tue, 31 Jul 2007 01:37:40 -0400 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <ca471dc20707301024q4b5a82cdjeba1dc7461efc12c@mail.gmail.com> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46AC1E78.7050900@v.loewis.de> <46AD2F60.9050907@canterbury.ac.nz> <ca471dc20707291743l5eefd72cx28ca281c451e15ba@mail.gmail.com> <46AD3D09.9060006@acm.org> <46AD68A1.8030403@v.loewis.de> <46ADEA81.8000609@latte.ca> <ca471dc20707301024q4b5a82cdjeba1dc7461efc12c@mail.gmail.com> Message-ID: <92FDE9DA-D6C1-47CE-807E-ACA0544C7CEE@fuhm.net> On Jul 30, 2007, at 1:24 PM, Guido van Rossum wrote: > I think you're missing the point, the point being that the most common > use needs bytes, so returning bytes is the most useful API design. I'd say that encoding binary data in XML is at least in the running for most common use of base64. And for that use case, you'll need it as a text string, I think? James From martin at v.loewis.de Tue Jul 31 08:07:18 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 31 Jul 2007 08:07:18 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <87myxde3sg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <f8l2l4$2d7$1@sea.gmane.org> <87myxde3sg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <46AED196.4050506@v.loewis.de> > > Cygwin's setlocale function only supports the "C" locale. 
> > I am a bit suprised that ASCII is returned rather than the system's default > > encoding. > > If I understand the situation correctly, you shouldn't be. The C > locale is defined to use ASCII. I think you don't. I'm certain that standard C doesn't define the C locale to be ASCII, and I believe POSIX doesn't, either. What they do define is that the "basic execution character set" must be in it (or some such). However, in absence of better knowledge, assuming ASCII is the best choice that the library can make. Regards, Martin From ncoghlan at gmail.com Tue Jul 31 11:40:12 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Jul 2007 19:40:12 +1000 Subject: [Python-3000] pep 3124 plans In-Reply-To: <20070730201511.C14ED3A406B@sparrow.telecommunity.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> Message-ID: <46AF037C.9050902@gmail.com> Phillip J. Eby wrote: > In other words, a class' metaclass has to be a derivative of all its > bases' metaclasses; ISTM that a __prepare__ namespace needs to be a > derivative in some sense of all its bases' __prepare__ results. This > probably isn't enforceable, but the pattern should be documented such > that e.g. the overloading metaclass' __prepare__ would return a > mapping that delegates operations to the mapping returned by its > super()'s __prepare__, and the actual class creation would be > similarly chained. 
PEP 3115 probably needs a section to explain > these issues and recommend best practices for implementing > __prepare__ and class creation on that basis. I'll write something > up after I've thought this through some more. A variant of the metaclass rule specific to __prepare__ might look something like: A class's metaclass providing the __prepare__ method must be a subclass of all of the class's base classes providing __prepare__ methods. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From skip at pobox.com Tue Jul 31 12:22:56 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 31 Jul 2007 05:22:56 -0500 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> Message-ID: <18095.3456.693480.981533@montanaro.dyndns.org> >> if j in r: >> ... >> >> Given the context, I think this is a very Obvious Way To Do It, and >> it's surprising that it isn't as efficient as it looks like it should >> be. Guido> Fair enough. So maybe *you* can contribute a patch? Given the nature of this discussion and who you're asking to provide a patch, I'd rather see a patch for this: Python 3.0x (py3k-struni:56553M, Jul 26 2007, 13:34:26) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> for 0 <= i < 10 by 3: ... print(i) ... 0 3 6 9 :-) (Yes, I know the language is frozen at this point.) 
Also, bringing it back more on-topic, what should the value of this expression be? 4 in range(0, 10, 3) That is, are we treating range() as a set or an interval? Maybe I missed earlier messages in this thread where this was discussed, but part of the discussion focused on this construct 0 <= 4 < 10 where there was no option to provide a step size. Also, this particular notation screams out interval, not set, to me. Skip From pje at telecommunity.com Tue Jul 31 18:26:35 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 31 Jul 2007 12:26:35 -0400 Subject: [Python-3000] PEP 3115 chaining rules (was Re: pep 3124 plans) In-Reply-To: <46AF037C.9050902@gmail.com> References: <4edc17eb0707122239y1af87a17k99eaa710726050b0@mail.gmail.com> <46A19FCC.7070609@acm.org> <20070721181442.48FB03A403A@sparrow.telecommunity.com> <46A2AE31.2080105@canterbury.ac.nz> <20070722020422.5AAAC3A403A@sparrow.telecommunity.com> <46A3ECB7.9070504@canterbury.ac.nz> <20070723010750.E27693A40A9@sparrow.telecommunity.com> <46A453C7.9070407@acm.org> <20070723153031.D00273A403D@sparrow.telecommunity.com> <5d44f72f0707262227o6fcf8471ja6654910c7ee07e0@mail.gmail.com> <ca471dc20707270825j3e53c11dyb2064468f3665c14@mail.gmail.com> <20070727162212.60E2F3A40E6@sparrow.telecommunity.com> <20070730201511.C14ED3A406B@sparrow.telecommunity.com> <46AF037C.9050902@gmail.com> Message-ID: <20070731162912.3E2C53A40A7@sparrow.telecommunity.com> At 07:40 PM 7/31/2007 +1000, Nick Coghlan wrote: >Phillip J. Eby wrote: >>In other words, a class' metaclass has to be a derivative of all >>its bases' metaclasses; ISTM that a __prepare__ namespace needs to >>be a derivative in some sense of all its bases' __prepare__ >>results. This probably isn't enforceable, but the pattern should >>be documented such that e.g. 
the overloading metaclass' __prepare__ >>would return a mapping that delegates operations to the mapping >>returned by its super()'s __prepare__, and the actual class >>creation would be similarly chained. PEP 3115 probably needs a >>section to explain these issues and recommend best practices for >>implementing __prepare__ and class creation on that basis. I'll >>write something up after I've thought this through some more. > >A variant of the metaclass rule specific to __prepare__ might look >something like: > A class's metaclass providing the __prepare__ method must be a > subclass of all of the class's base classes providing __prepare__ methods. That doesn't really work; among other things, it would require everything to be a dict subclass, since type.__prepare__() will presumably return a dict. Therefore, it really does need to be delegation instead of inheritance, or it becomes very difficult to provide any "interesting" properties. So let's say that your super().__prepare__() is your "delegate". And we recommend that any write operations you receive, you should also invoke on your delegate, and that you delegate any read operations you can't handle (i.e., key not found) to your delegate as well. And of course, this requirement is recursive -- i.e., all metaclasses that define a __prepare__() should follow it, in order to be fully co-operative. Actually, speaking of co-operative metaclasses, I wonder if it's time to finally implement automatic metaclass mixing in 3.x? Python currently requires you to mix base classes' metaclasses, but doesn't provide any assistance in doing so. For 2.x, I wrote a function that can be called to automatically generate a mixed metaclass for a type; perhaps we should include something like it in the stdlib, so if you get a mixed metaclasses error, the error message itself can suggest using 'metaclass=mixed' or whatever we call it. 
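A minimal sketch of the delegation pattern described above, written against present-day Python 3 rather than the 2007 branch. The names DelegatingNamespace and ChainedMeta are hypothetical and appear in no PEP; the sketch only assumes that type.__prepare__() returns a plain dict and that type.__new__() wants a real dict as its namespace argument.

```python
from collections.abc import MutableMapping


class DelegatingNamespace(MutableMapping):
    """Hypothetical class-body namespace that chains to a delegate mapping."""

    def __init__(self, delegate):
        self._own = {}              # names written through this namespace
        self._delegate = delegate   # result of super().__prepare__()

    def __setitem__(self, key, value):
        # Every write is also performed on the delegate.
        self._own[key] = value
        self._delegate[key] = value

    def __delitem__(self, key):
        del self._own[key]
        self._delegate.pop(key, None)

    def __getitem__(self, key):
        # Reads we cannot satisfy ourselves are passed to the delegate.
        try:
            return self._own[key]
        except KeyError:
            return self._delegate[key]

    def __iter__(self):
        yield from self._own
        for key in self._delegate:
            if key not in self._own:
                yield key

    def __len__(self):
        return len(set(self._own) | set(self._delegate))


class ChainedMeta(type):
    """Hypothetical co-operative metaclass using the namespace above."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # Wrap whatever the next metaclass in the MRO prepares.
        return DelegatingNamespace(super().__prepare__(name, bases, **kwds))

    def __new__(mcls, name, bases, namespace, **kwds):
        # type.__new__ expects a dict, so flatten the chained namespace.
        return super().__new__(mcls, name, bases, dict(namespace), **kwds)


class Demo(metaclass=ChainedMeta):
    x = 1
    y = x + 1   # the read of x goes through DelegatingNamespace.__getitem__
```

Because each __prepare__ wraps the result of its super() call, several metaclasses written this way can be stacked without any of them needing to know what the others return.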
From unknown_kev_cat at hotmail.com Tue Jul 31 19:21:41 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Tue, 31 Jul 2007 13:21:41 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org> <ca471dc20707181002w64e076aco9a509ec7e4e15b9a@mail.gmail.com> <f7lk7q$9m6$1@sea.gmane.org> <ca471dc20707181113m360db736h2fd079f29f71220@mail.gmail.com> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> <46AE5A9C.5000103@v.loewis.de> Message-ID: <f8nr3b$s25$1@sea.gmane.org> ""Martin v. Löwis"" <martin at v.loewis.de> wrote in message news:46AE5A9C.5000103 at v.loewis.de... > Guido van Rossum wrote: >>> I found that in many cases, this is a virus scanner or the indexing >>> service interfering. They open the file, and then the test suite cannot >>> delete it. >> >> Oh darn. I remember running into that in a completely different >> context. What's the solution? Turn off the virus scanner? Wait until >> it's done? > > I never found the time to properly research the official solution. > > Looking at the DeleteFile documentation, the problem is slightly > different, still: "The DeleteFile function marks a file for deletion on > close. Therefore, the file deletion does not occur until the last handle > to the file is closed. Subsequent calls to CreateFile to open the file > fail with ERROR_ACCESS_DENIED." > > So it is not the DeleteFile that fails, but the subsequent attempt > to create a new file in the same place. > > For the test suite, the solution would be to always use a fresh file > name for temporary files. Of course, it is then more important that > all files created actually do get removed in the fixture. Hmm...
The documentation for Cygwin's unlink() implies that it should function the same as a POSIX unlink() except perhaps if a non-Cygwin process has an open handle for it without the correct attributes. I see nothing on my system that would have done that. (No indexing service or virus scanner) So that implies that at the time Python is trying to create the file, it still has an open handle for it. Either that, or something besides Python is opening the file without my knowledge. From guido at python.org Tue Jul 31 20:06:58 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Jul 2007 11:06:58 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f8nr3b$s25$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> <46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> Message-ID: <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> On 7/31/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: > Hmm... The documentation for Cygwin's unlink() implies that it should > function the same as a POSIX unlink() except perhaps if a non-Cygwin process > has an open handle for it without the correct attributes. I see nothing on > my system that would have done that. (No indexing service or virus scanner) > So that implies that at the time Python is trying to create the file, it > still has an open handle for it. Either that, or something besides Python is > opening the file without my knowledge. Regular Windows typically won't let you remove a file when you still have it open. Is this also a restriction on CYGWIN? 
I don't know anything about CYGWIN but I could imagine that they allow unlink() to succeed when there's still a file descriptor referencing it, and that they will delete the file when you close it. But if that fd is never closed the file is probably in a weird state. Anyway, before we start speculating more, you probably need to find a source of more CYGWIN expertise elsewhere -- it's rather thin here. Rewriting those tests to use a more random temporary file might also be an option, as long as you make sure to clean up (use try/finally or setUp/tearDown). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jul 31 20:11:35 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Jul 2007 11:11:35 -0700 Subject: [Python-3000] optimizing [x]range In-Reply-To: <18095.3456.693480.981533@montanaro.dyndns.org> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> <18095.3456.693480.981533@montanaro.dyndns.org> Message-ID: <ca471dc20707311111w8bf7b09qed98e72ca3f7707b@mail.gmail.com> On 7/31/07, skip at pobox.com <skip at pobox.com> wrote: > Also, bringing it back more on-topic, what should the value of this > expression be? > > 4 in range(0, 10, 3) > > That is, are we treating range() as a set or an interval? Maybe I missed > earlier messages in this thread where this was discussed, but part of the > discussion focused on this construct > > 0 <= 4 < 10 > > where there was no option to provide a step size. Also, this particular > notation screams out interval, not set, to me. You missed it -- it should definitely be equivalent to 4 in list(range(0, 10, 3)) i.e.
4 in [0, 4, 8] -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jul 31 20:13:38 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Jul 2007 11:13:38 -0700 Subject: [Python-3000] base64 - bytes and strings In-Reply-To: <46AEBABB.5030902@canterbury.ac.nz> References: <3f1451f50707281847q2171f82fu2e48f2297214f591@mail.gmail.com> <46ABFE12.5000101@gmail.com> <46AD2BF6.5080507@canterbury.ac.nz> <46AD52CC.4060403@ronadam.com> <46AEBABB.5030902@canterbury.ac.nz> Message-ID: <ca471dc20707311113g342a30b0h23ee7e2b8f5f630f@mail.gmail.com> On 7/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote: > Does this mean that Py3k text streams will accept byte arrays > in their write() methods, and that byte arrays can be concatenated > with unicode strings and otherwise used in any context expecting > a text string, as long as all their elements are in the ASCII range? No, that is not the intention (even if some of that may accidentally be supported in the current pre-alpha branch). 
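To illustrate the separation being described, here is a small sketch using the base64 module as it behaves in released Python 3 (not necessarily the 2007 pre-alpha branch): b64encode() returns bytes, and those bytes have to be decoded explicitly before they can be joined with text, for example when embedding them in an XML string. The variable names are illustrative only.

```python
import base64

payload = bytes([0, 255, 16, 32])      # arbitrary binary data
encoded = base64.b64encode(payload)    # a bytes object, ASCII-only content

# Explicit decoding is needed to place the result in a text string.
xml_fragment = "<data>" + encoded.decode("ascii") + "</data>"

# Mixing bytes and str directly is rejected, even for ASCII-only bytes.
try:
    "<data>" + encoded
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```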
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From dalcinl at gmail.com Tue Jul 31 20:18:37 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 31 Jul 2007 15:18:37 -0300 Subject: [Python-3000] optimizing [x]range In-Reply-To: <18095.3456.693480.981533@montanaro.dyndns.org> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> <18095.3456.693480.981533@montanaro.dyndns.org> Message-ID: <e7ba66e40707311118l2e5d6b93k82bde7e7a3ea30c2@mail.gmail.com> On 7/31/07, skip at pobox.com <skip at pobox.com> wrote: > Also, bringing it back more on-topic, what should the value of this > expression be? > 4 in range(0, 10, 3) > That is, are we treating range() as a set or an interval? IMHO, 'range' is like a set of integers, not an interval. For me, 'x in range(...)' should return the same as 'x in list(range(...))'.
-- Lisandro Dalcín From skip at pobox.com Tue Jul 31 20:22:04 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 31 Jul 2007 13:22:04 -0500 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707311111w8bf7b09qed98e72ca3f7707b@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> <18095.3456.693480.981533@montanaro.dyndns.org> <ca471dc20707311111w8bf7b09qed98e72ca3f7707b@mail.gmail.com> Message-ID: <18095.32204.309127.375372@montanaro.dyndns.org> Guido> You missed it -- it should definitely be equivalent to Guido> 4 in list(range(0, 10, 3)) Guido> i.e. Guido> 4 in [0, 4, 8] Ummm... you mean 4 in [0, 3, 6, 9] right?
<wink> Skip From dalcinl at gmail.com Tue Jul 31 20:31:12 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 31 Jul 2007 15:31:12 -0300 Subject: [Python-3000] optimizing [x]range In-Reply-To: <ca471dc20707311111w8bf7b09qed98e72ca3f7707b@mail.gmail.com> References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> <ca471dc20707281504g704ff836m8aeaa966559483d3@mail.gmail.com> <46ABE025.4050204@async.com.br> <18092.34217.512107.677855@montanaro.dyndns.org> <46AD30DA.6050405@canterbury.ac.nz> <18093.18996.944540.279864@montanaro.dyndns.org> <46AE8DF1.1030409@canterbury.ac.nz> <ca471dc20707302041j7c834590xf0d315a2f3d3baaa@mail.gmail.com> <18095.3456.693480.981533@montanaro.dyndns.org> <ca471dc20707311111w8bf7b09qed98e72ca3f7707b@mail.gmail.com> Message-ID: <e7ba66e40707311131k17eb946bpdde30b5223916112@mail.gmail.com> On 7/31/07, Guido van Rossum <guido at python.org> wrote: > You missed it -- it should definitely be equivalent to > 4 in list(range(0, 10, 3)) > i.e. > 4 in [0, 4, 8] And then, since list/tuple __contains__ is implemented in terms of rich comparison (with Py_EQ), perhaps a patch is not so easy to implement; at first glance it does not seem to be as trivial as previously suggested in this thread.
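One way to reconcile a constant-time check with the equality-based semantics of list containment is sketched below in pure Python. It is only an illustration, not the patch under discussion: the function name range_contains is hypothetical, and it assumes the start, stop and step attributes that range objects expose in later Python 3 releases. Exact ints take the arithmetic shortcut; anything else falls back to comparing with == against each element, so the answer matches x in list(r).

```python
def range_contains(r, x):
    """Return the same answer as `x in list(r)` for a range object `r`."""
    if type(x) is int:
        # Fast path: bounds check plus a divisibility test on the step.
        if r.step > 0:
            in_bounds = r.start <= x < r.stop
        else:
            in_bounds = r.stop < x <= r.start
        return in_bounds and (x - r.start) % r.step == 0
    # Slow path: preserve rich-comparison semantics for other types.
    return any(x == item for item in r)
```

With this, range_contains(range(0, 10, 3), 4) is False while 3, 6 and 9 are found, and a float such as 3.0 still compares equal to the element 3 through the slow path.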
-- Lisandro Dalcín From unknown_kev_cat at hotmail.com Tue Jul 31 20:34:27 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Tue, 31 Jul 2007 14:34:27 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> Message-ID: <f8nvbm$bcs$1@sea.gmane.org> "Guido van Rossum" <guido at python.org> wrote in message news:ca471dc20707311106v696b3c7an67939bd802b81176 at mail.gmail.com... > On 7/31/07, Joe Smith <unknown_kev_cat at hotmail.com> wrote: >> Hmm... The documentation for Cygwin's unlink() implies that it should >> function the same as a POSIX unlink() except perhaps if a non-Cygwin >> process >> has an open handle for it without the correct attributes. I see nothing >> on >> my system that would have done that. (No indexing service or virus >> scanner) >> So that implies that at the time Python is trying to create the file, it >> still has an open handle for it. Either that, or something besides Python >> is >> opening the file without my knowledge. > > Regular Windows typically won't let you remove a file when you still > have it open. My understanding is that POSIX does not require that ability. > Is this also a restriction on CYGWIN? I don't know > anything about CYGWIN but I could imagine that they allow unlink() to > succeed when there's still a file descriptor referencing it, and that > they will delete the file when you close it. Exactly. That is exactly what they do. The claim was that this meets the POSIX standard. Looking closely, it looks like it does not.
POSIX says: >When the file's link count becomes 0 and no process has the file open, >the space occupied by the file shall be freed and the file shall no longer >be accessible. If one or more processes have the file open when the last >link is removed, the link shall be removed before unlink() returns, but >the removal of the file contents shall be postponed until all references >to the file are closed. > But if that fd is never > closed the file is probably in a weird state. Anyway, before we start > speculating more, you probably need to find a source of more CYGWIN > expertise elsewhere -- it's rather thin here. Exactly the issue. I see the problem here is cygwin's partial POSIX compliance. However, Windows NT had a design goal of allowing a compliant implementation of POSIX to be implemented in a subsystem (along with userspace utilities). So it should be possible to get unlink() to work like a POSIX unlink using raw NT kernel calls. Since Cygwin has dropped support for pre-NT systems, switching to that seems to be the correct thing to do. I'll discuss this with the cygwin team. Regardless, the exact same issue will likely exist on the windows side. It seems likely that a fix for the Windows side may fix the cygwin issue. > > Rewriting those tests to use a more random temporary file might also > be an option, as long as you make sure to clean up (use try/finally or > setUp/tearDown).
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Tue Jul 31 21:21:50 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Tue, 31 Jul 2007 15:21:50 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org><f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org><f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de><f8nr3b$s25$1@sea.gmane.org><ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> <f8nvbm$bcs$1@sea.gmane.org> Message-ID: <f8o24h$ku5$1@sea.gmane.org> "Joe Smith" <unknown_kev_cat at hotmail.com> wrote in message news:f8nvbm$bcs$1 at sea.gmane.org... > > Exactly the issue. > I see the problem here is cygwin's partial POSIX compliance. However, > Windows NT had a design goal of allowing a compliant implementation > of POSIX to be implemented in a subsystem (along with userspace > utilities). > > So it should be possible to get unlink() to work like a POSIX unlink > using raw NT kernel calls. > Since Cygwin has dropped support for pre-NT systems, switching to that > seems > to be the correct thing to do. > > I'll discuss this with the cygwin team. > > Regardless, the exact same issue will likely exist on the windows side. > It seems likely that a fix for the Windows side may fix the cygwin issue. > Looks like the fix needed for cygwin's unlink was checked in two days ago. The problem should automatically disappear in the next cygwin release.
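Until such a release is available, the workaround suggested earlier in the thread (a fresh temporary file name per test, removed again in the fixture) can be sketched roughly as follows. This is an assumption about how a test might be restructured, not code from the actual test suite; the class name FreshTempFileTest and the "py3k_test_" prefix are made up for the example.

```python
import os
import tempfile
import unittest


class FreshTempFileTest(unittest.TestCase):
    """Hypothetical test that never reuses a temporary file name."""

    def setUp(self):
        # mkstemp() creates a new, uniquely named file on every call.
        fd, self.path = tempfile.mkstemp(prefix="py3k_test_")
        os.close(fd)  # close our handle so the test can reopen the file

    def tearDown(self):
        # Always clean up, even if the test body failed.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_roundtrip(self):
        with open(self.path, "w") as f:
            f.write("hello")
        with open(self.path) as f:
            self.assertEqual(f.read(), "hello")
```

Because no name is ever recreated after deletion, a lingering handle held by another process on an old file cannot make a later create fail.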
From martin at v.loewis.de Tue Jul 31 21:33:12 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 31 Jul 2007 21:33:12 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org> <ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com> <f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> <46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> Message-ID: <46AF8E78.6020607@v.loewis.de> > Regular Windows typically won't let you remove a file when you still > have it open. It depends. If FILE_SHARE_DELETE was passed to CreateFile when opening, you may DeleteFile it while it is still open. Otherwise, you get an error from DeleteFile. > Is this also a restriction on CYGWIN? Cygwin is a wrapper around Win32. So it "can't do" anything that Win32 can't do (like deleting a file that is still open). > I don't know > anything about CYGWIN but I could imagine that they allow unlink() to > succeed when there's still a file descriptor referencing it, and that > they will delete the file when you close it. They can't do that, because there is no Win32 mechanism for that. 
Regards, martin From martin at v.loewis.de Tue Jul 31 21:42:54 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 31 Jul 2007 21:42:54 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f8nvbm$bcs$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> <f8nvbm$bcs$1@sea.gmane.org> Message-ID: <46AF90BE.3050803@v.loewis.de> >> Is this also a restriction on CYGWIN? I don't know >> anything about CYGWIN but I could imagine that they allow unlink() to >> succeed when there's still a file descriptor referencing it, and that >> they will delete the file when you close it. > > Exactly. That is exactly what they do. Not exactly; it's not possible with Win32 to do that. What they do instead is 1. try to delete the file. If that fails for sharing violation, try 2. 2. move the file to the recycle bin, and set the "delete" disposition flag on the file, this will cause it to be removed from the recycle bin when the last handle is closed. 
Regards, Martin From guido at python.org Tue Jul 31 21:58:23 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Jul 2007 12:58:23 -0700 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <46AF90BE.3050803@v.loewis.de> References: <f7ithr$lrr$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> <46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> <f8nvbm$bcs$1@sea.gmane.org> <46AF90BE.3050803@v.loewis.de> Message-ID: <ca471dc20707311258l26f0ab6apb157464e3db16496@mail.gmail.com> On 7/31/07, "Martin v. Löwis" <martin at v.loewis.de> wrote: > >> Is this also a restriction on CYGWIN? I don't know > >> anything about CYGWIN but I could imagine that they allow unlink() to > >> succeed when there's still a file descriptor referencing it, and that > >> they will delete the file when you close it. > > > > Exactly. That is exactly what they do. > > Not exactly; it's not possible with Win32 to do that. > > What they do instead is > 1. try to delete the file. If that fails for sharing > violation, try 2. > 2. move the file to the recycle bin, and set the > "delete" disposition flag on the file, this will > cause it to be removed from the recycle bin when > the last handle is closed. I don't understand how that approach would cause the permission error when trying to create the same file later again. Unless (a) I don't understand the phrase "move it to the recycle bin" (is this a rename() call?), or (b) you're describing the new version that was submitted 2 days ago (but not yet released).
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From unknown_kev_cat at hotmail.com Tue Jul 31 22:02:33 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Tue, 31 Jul 2007 16:02:33 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com><f8nvbm$bcs$1@sea.gmane.org> <46AF90BE.3050803@v.loewis.de> Message-ID: <f8o4gs$tbu$1@sea.gmane.org> ""Martin v. Löwis"" <martin at v.loewis.de> wrote in message news:46AF90BE.3050803 at v.loewis.de... >>> Is this also a restriction on CYGWIN? I don't know >>> anything about CYGWIN but I could imagine that they allow unlink() to >>> succeed when there's still a file descriptor referencing it, and that >>> they will delete the file when you close it. >> >> Exactly. That is exactly what they do. > Not exactly; it's not possible with Win32 to do that. Um. It is indeed possible to mark a file for deletion on close. The requirement is that all file handles have been opened with FILE_SHARE_DELETE. This is one of the things Cygwin has tried. It works fine except when a Windows app has opened the file without that flag. To prevent name clashes, movement to the recycle bin is required. > What they do instead is > 1. try to delete the file. If that fails for sharing > violation, try 2. > 2. move the file to the recycle bin, and set the > "delete" disposition flag on the file, this will > cause it to be removed from the recycle bin when > the last handle is closed. That is what they do with the latest patches. It is pretty much equivalent to the POSIX system.
That requires Native NT Calls, and is not part of win32. It is equivalent to marking the file for deletion on close, except the other handles do not need to have FILE_SHARE_DELETE. Moving the file to the recycle bin just gets the file out of the way. But for what it is worth, the next cygwin release will be doing exactly what is described above. So to Python it will look and act *almost* exactly like POSIX. It should fix the problem. GVR: The move to recycle bin is more or less a rename() call, except I believe it has special support for avoiding name conflicts. From unknown_kev_cat at hotmail.com Tue Jul 31 22:06:14 2007 From: unknown_kev_cat at hotmail.com (Joe Smith) Date: Tue, 31 Jul 2007 16:06:14 -0400 Subject: [Python-3000] Py3k_struni additional test failures under cygwin References: <f7ithr$lrr$1@sea.gmane.org><f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org><f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de><f8nr3b$s25$1@sea.gmane.org><ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com><f8nvbm$bcs$1@sea.gmane.org> <f8o24h$ku5$1@sea.gmane.org> Message-ID: <f8o4no$u4b$1@sea.gmane.org> "Joe Smith" <unknown_kev_cat at hotmail.com> wrote in message news:f8o24h$ku5$1 at sea.gmane.org... > > "Joe Smith" <unknown_kev_cat at hotmail.com> wrote in message > news:f8nvbm$bcs$1 at sea.gmane.org... >> >> Exactly the issue. >> I see the problem here is cygwin's partial POSIX compliance. However, >> Windows NT had a design goal of allowing a compliant implementation >> of POSIX to be implemented in a subsystem (along with userspace >> utilities). >> >> So it should be possible to get unlink() to work like a POSIX unlink >> using raw NT kernel calls.
>> Since Cygwin has dropped support for pre-NT systems, switching to that >> seems >> to be the correct thing to do. >> >> I'll discuss this with the cygwin team. >> >> Regardless, the exact same issue will likely exist on the windows side. >> It seems likely that a fix for the Windows side may fix the cygwin issue. >> > > Looks like the fix needed for cygwin's unlink was checked in two days ago. > The problem should automatically disappear in the next cygwin release. Sorry for the misinformation. It looks like the change was made more than 2 days ago; 2 days ago is just the date of the most recent change. Regardless, it looks like the code that does the right thing is not in the latest released DLL. From martin at v.loewis.de Tue Jul 31 22:35:26 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 31 Jul 2007 22:35:26 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <f8o4gs$tbu$1@sea.gmane.org> References: <f7ithr$lrr$1@sea.gmane.org> <f7lnd8$l2s$1@sea.gmane.org><ca471dc20707181158p17417c9cg37c5382d61b53fe5@mail.gmail.com><f8is94$tgh$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org><ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com><46AD6E67.50407@v.loewis.de><ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com><46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com><f8nvbm$bcs$1@sea.gmane.org> <46AF90BE.3050803@v.loewis.de> <f8o4gs$tbu$1@sea.gmane.org> Message-ID: <46AF9D0E.4060700@v.loewis.de> > That is what they do with the latest patches. It is pretty much > equivalent to the POSIX system. That requires Native NT Calls, and is > not part of win32. It is equivalent to marking the file for deletion > on close, except the other handles do not need to have FILE_SHARE_DELETE. > Moving the file to the recycle bin just gets the file out of the > way.
On a true POSIX system, this would not be necessary: you can immediately create a new file in place of the previous one after you're done with unlink, and there is no way to get the file back - but there is on Windows (go to the recycle bin). > But for what it is worth, the next cygwin release will be doing > exactly what is described above. Indeed, I reported what the Cygwin code does (or will do once released). Regards, Martin From martin at v.loewis.de Tue Jul 31 22:37:05 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 31 Jul 2007 22:37:05 +0200 Subject: [Python-3000] Py3k_struni additional test failures under cygwin In-Reply-To: <ca471dc20707311258l26f0ab6apb157464e3db16496@mail.gmail.com> References: <f7ithr$lrr$1@sea.gmane.org> <f8j862$st0$1@sea.gmane.org> <ca471dc20707291727q6438b6eax5eaadbb6c712ef29@mail.gmail.com> <46AD6E67.50407@v.loewis.de> <ca471dc20707301027m71b91ffbnbfb31a9075e66d8b@mail.gmail.com> <46AE5A9C.5000103@v.loewis.de> <f8nr3b$s25$1@sea.gmane.org> <ca471dc20707311106v696b3c7an67939bd802b81176@mail.gmail.com> <f8nvbm$bcs$1@sea.gmane.org> <46AF90BE.3050803@v.loewis.de> <ca471dc20707311258l26f0ab6apb157464e3db16496@mail.gmail.com> Message-ID: <46AF9D71.6090804@v.loewis.de> >> What they do instead is >> 1. try to delete the file. If that fails for sharing >> violation, try 2. >> 2. move the file to the recycle bin, and set the >> "delete" disposition flag on the file, this will >> cause it to be removed from the recycle bin when >> the last handle is closed. > > I don't understand how that approach would cause the permission error > when trying to create the same file later again. Unless (a) I don't > understand the phrase "move it to the recycle bin" (is this a rename() > call?), or (b) you're describing the new version that was submitted 2 > days ago (but not yet released). The latter - I just looked into the CVS tree to find out what they do. Regards, Martin