From victor.stinner at haypocalc.com  Mon Oct  1 00:59:40 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 1 Oct 2007 00:59:40 +0200
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com>
References: <200709280429.39396.victor.stinner@haypocalc.com>
	<aac2c7cb0709280944n15419164w124c7ed5cda85b57@mail.gmail.com>
	<200709281858.29705.victor.stinner@haypocalc.com>
Message-ID: <200710010059.41161.victor.stinner@haypocalc.com>

Hi,

I wrote another patch with two improvements: it uses a small integer cache
and Python's memory allocation functions. The GMP overhead (pystones result)
is now only -2%, not -20% as with the previous patch.

Since the patch is huge, I prefer to leave a copy on my server:
http://www.haypocalc.com/tmp/py3k-long_gmp-v2.patch

Victor
-- 
Victor Stinner
http://hachoir.org/

From guido at python.org  Mon Oct  1 01:14:07 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Sep 2007 16:14:07 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable
	Bytesand Mutable Buffer)
In-Reply-To: <fb6fbf560709300931y142e562p632a79e0d845117a@mail.gmail.com>
References: <fb6fbf560709281023h9a09e5m1081f8a0357e908d@mail.gmail.com>
	<ca471dc20709281140q2ef95c2ap8bbc7b7d3d46ebc0@mail.gmail.com>
	<fdkd6i$fjh$1@sea.gmane.org>
	<ca471dc20709282008v210c1778oc4670a58268ab248@mail.gmail.com>
	<20070929142126.D61D23A4045@sparrow.telecommunity.com>
	<ca471dc20709290733i54f63ac3pb4501b94530db820@mail.gmail.com>
	<20070929151127.AE5203A4045@sparrow.telecommunity.com>
	<dcbbbb410709290826w46e45d14p108972abe8e5bae7@mail.gmail.com>
	<20070929155823.C552B3A4045@sparrow.telecommunity.com>
	<fb6fbf560709300931y142e562p632a79e0d845117a@mail.gmail.com>
Message-ID: <ca471dc20709301614n7a304a1cg8c15f95f05d77dad@mail.gmail.com>

I see no other solution to this thread than to revert the decision
that comparing bytes and str raises TypeError. It may catch a trivial
mistake or two, but the far from trivial, subtle issues it causes for
more sophisticated code just aren't worth it. I'll add this to PEP
3137.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct  1 01:25:20 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Sep 2007 16:25:20 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable
	Buffer
Message-ID: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>

Thanks all for the focused and helpful discussion on this PEP. Here's
a new posting of the full text of the PEP as it now stands. Most of
the changes since the first posting are fleshing out of some details;
the decision to make the individual elements of bytes and buffer be
ints; and the decision to change bytes/str and buffer/str comparisons
again to just return False instead of raising TypeError.

(I'm not favorable towards the proposal of c'x' style literals or
changes to the I/O APIs to use different names for calls involving
bytes instead of text. If you still disagree, please start a new
thread with a new subject line.)

I plan to accept the PEP within a day or two barring major objections,
and expect to start implementing soon after.

--Guido

PEP: 3137
Title: Immutable Bytes and Mutable Buffer
Version: $Revision: 58290 $
Last-Modified: $Date: 2007-09-30 16:19:14 -0700 (Sun, 30 Sep 2007) $
Author: Guido van Rossum <guido at python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 26-Sep-2007
Python-Version: 3.0
Post-History: 26-Sep-2007, 30-Sep-2007

Introduction
============

After releasing Python 3.0a1 with a mutable bytes type, pressure
mounted to add a way to represent immutable bytes.  Gregory P. Smith
proposed a patch that would allow making a bytes object temporarily
immutable by requesting that the data be locked using the new buffer
API from PEP 3118.  This did not seem the right approach to me.

Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
make the bytes type immutable (by crudely removing all mutating APIs)
and fix the fall-out in the test suite.  This showed that there aren't
all that many places that depend on the mutability of bytes, with the
exception of code that builds up a return value from small pieces.

Thinking through the consequences, and noticing that using the array
module as an ersatz mutable bytes type is far from ideal, and
recalling a proposal put forward earlier by Talin, I floated the
suggestion to have both a mutable and an immutable bytes type.  (This
had been brought up before, but until seeing the evidence of Jeffrey's
patch I wasn't open to the suggestion.)

Moreover, a possible implementation strategy became clear: use the old
PyString implementation, stripped down to remove locale support and
implicit conversions to/from Unicode, for the immutable bytes type,
and keep the new PyBytes implementation as the mutable bytes type.

The ensuing discussion made it clear that the idea is welcome but
needs to be specified more precisely.  Hence this PEP.

Advantages
==========

One advantage of having an immutable bytes type is that code objects
can use these.  It also makes it possible to efficiently create hash
tables using bytes for keys; this may be useful when parsing protocols
like HTTP or SMTP which are based on bytes representing text.

Porting code that manipulates binary data (or encoded text) in Python
2.x will be easier using the new design than using the original 3.0
design with mutable bytes; simply replace ``str`` with ``bytes`` and
change '...' literals into b'...' literals.

Naming
======

I propose the following type names at the Python level:

  - ``bytes`` is an immutable array of bytes (PyString)

  - ``buffer`` is a mutable array of bytes (PyBytes)

  - ``memoryview`` is a bytes view on another object (PyMemory)

The old type named ``buffer`` is so similar to the new type
``memoryview``, introduced by PEP 3118, that it is redundant.  The rest
of this PEP doesn't discuss the functionality of ``memoryview``; it is
just mentioned here to justify getting rid of the old ``buffer`` type
so we can reuse its name for the mutable bytes type.

While eventually it makes sense to change the C API names, this PEP
maintains the old C API names, which should be familiar to all.

Literal Notations
=================

The b'...' notation introduced in Python 3.0a1 returns an immutable
bytes object, whatever variation is used.  To create a mutable bytes
buffer object, use buffer(b'...') or buffer([...]).  The latter may
use a list of integers in range(256).

Functionality
=============

PEP 3118 Buffer API
-------------------

Both bytes and buffer implement the PEP 3118 buffer API.  The bytes
type only implements read-only requests; the buffer type allows
writable and data-locked requests as well.  The element data type is
always 'B' (i.e. unsigned byte).
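
As a rough illustration of the read-only versus writable distinction
(not part of the PEP text): the sketch below assumes a released
Python 3 interpreter, where the mutable type ended up being spelled
``bytearray`` rather than ``buffer``, and uses ``memoryview`` as the
consumer of the buffer API.  The variable names are made up for the
example.

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.
immutable = b'abc'
mutable = bytearray(b'abc')

ro_view = memoryview(immutable)   # bytes only grants read-only access
rw_view = memoryview(mutable)     # the mutable type grants writable access

fmt_pair = (ro_view.format, rw_view.format)          # element type of each view
readonly_pair = (ro_view.readonly, rw_view.readonly)

rw_view[0] = ord('x')             # writes through to the underlying object

try:
    ro_view[0] = ord('x')         # rejected: the exporter is immutable
    write_rejected = False
except TypeError:
    write_rejected = True
```

Data-locked requests are not shown, since they have no obvious
equivalent in the interpreter assumed here.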

Constructors
------------

There are four forms of constructors applicable to both bytes and
buffer, plus a fifth that applies to buffer only:

  - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
    ``buffer(<buffer>)``: simple copying constructors, with the note
    that ``bytes(<bytes>)`` might return its (immutable) argument.

  - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
    <encoding>[, <errors>])``: encode a text string.  Note that the
    ``str.encode()`` method returns an *immutable* bytes object.
    The <encoding> argument is mandatory; <errors> is optional.

  - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
    bytes or buffer object from anything implementing the PEP 3118
    buffer API.

  - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
    construct an immutable bytes or mutable buffer object from a
    stream of integers in range(256).

  - ``buffer(<int>)``: construct a zero-initialized buffer of a given
    length.
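
The constructor forms above might look like this in practice.  This is
a sketch, not part of the PEP: it assumes a released Python 3
interpreter in which the mutable type is named ``bytearray`` instead
of ``buffer``, and all variable names are invented for the example.

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.
src = b'abc'

copy_bytes = bytes(src)                   # may hand back the same immutable object
copy_buf = bytearray(src)                 # always a fresh, mutable copy

encoded = bytes('caf\u00e9', 'utf-8')     # the encoding is mandatory for str input
encoded_buf = bytearray('caf\u00e9', 'utf-8', 'strict')   # errors is optional

from_view = bytes(memoryview(copy_buf))   # anything exporting the buffer API
from_ints = bytes([104, 105])             # iterable of ints in range(256)
zeros = bytearray(4)                      # zero-initialized, length 4

try:
    bytes('abc')                          # str without an encoding
    missing_encoding_ok = True
except TypeError:
    missing_encoding_ok = False

try:
    bytes([256])                          # value outside range(256)
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False
```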

Comparisons
-----------

The bytes and buffer types are comparable with each other and
orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.

Comparing either type to a str object for equality returns False
regardless of the contents of either operand.  Ordering comparisons
with str raise TypeError.  This is all conformant to the standard
rules for comparison and ordering between objects of incompatible
types.

(**Note:** in Python 3.0a1, comparing a bytes instance with a str
instance would raise TypeError, on the premise that this would catch
the occasional mistake quicker, especially in code ported from Python
2.x.  However, a long discussion on the python-3000 list pointed out
so many problems with this that it is clearly a bad idea, to be rolled
back in 3.0a2 regardless of the fate of the rest of this PEP.)
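
A short sketch of the comparison rules, assuming a released Python 3
interpreter run with default warning flags and with ``bytearray``
standing in for ``buffer`` (neither assumption is part of the PEP
text):

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.

# The two byte types compare and order against each other.
chained = b'abc' == bytearray(b'abc') < b'abd'

# Equality against str is simply False, whatever the contents.
eq_with_str = (b'abc' == 'abc')

# Ordering against str is an error.
try:
    b'abc' < 'abd'
    ordering_allowed = True
except TypeError:
    ordering_allowed = False
```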

Slicing
-------

Slicing a bytes object returns a bytes object.  Slicing a buffer
object returns a buffer object.

Slice assignment to a mutable buffer object accepts anything that
implements the PEP 3118 buffer API, or an iterable of integers in
range(256).

Indexing
--------

Indexing bytes and buffer returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B')).

Assignment to an item of a mutable buffer object accepts an int in
range(256).  (To assign from a bytes sequence, use a slice
assignment.)
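
The slicing and indexing rules can be tried out roughly as follows.
This sketch assumes a released Python 3 interpreter with ``bytearray``
in the role of ``buffer``; the names are illustrative only.

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.
data = b'hello'
buf = bytearray(data)

# Slicing preserves the type of the object being sliced.
slice_types = (type(data[1:3]), type(buf[1:3]))

# Indexing yields a small int, not a length-1 bytes object.
first = data[0]

buf[0] = ord('J')        # item assignment takes an int in range(256)
buf[1:3] = b'E'          # slice assignment from a buffer-API object (may resize)
buf[-2:] = [76, 79]      # ... or from an iterable of ints

try:
    buf[0] = b'J'        # a bytes object is not accepted for a single item
    bytes_item_ok = True
except TypeError:
    bytes_item_ok = False
```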

Str() and Repr()
----------------

The str() and repr() functions return the same thing for these
objects.  The repr() of a bytes object returns a b'...' style literal.
The repr() of a buffer returns a string of the form "buffer(b'...')".
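
For illustration, assuming a released Python 3 interpreter with
default warning flags, where the mutable type is called ``bytearray``
(so its repr reads ``bytearray(b'...')`` rather than the
``buffer(b'...')`` form specified here):

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.
b = b'abc'

same = (str(b) == repr(b))          # str() falls back to the repr
bytes_repr = repr(b)
mutable_repr = repr(bytearray(b))   # named after the type as eventually released
```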

Operators
---------

The following operators are implemented by the bytes and buffer types,
except where mentioned:

  - ``b1 + b2``: concatenation.  With mixed bytes/buffer operands,
    the return type is that of the first argument (this seems arbitrary
    until you consider how ``+=`` works).

  - ``b1 += b2``: mutates b1 if it is a buffer object.

  - ``b * n``, ``n * b``: repetition; n must be an integer.

  - ``b *= n``: mutates b if it is a buffer object.

  - ``b1 in b2``, ``b1 not in b2``: substring test; b1 can be any
    object implementing the PEP 3118 buffer API.

  - ``i in b``, ``i not in b``: single-byte membership test; i must
    be an integer (if it is a length-1 bytes array, it is considered
    to be a substring test, with the same outcome).

  - ``len(b)``: the number of bytes.

  - ``hash(b)``: the hash value; only implemented by the bytes type.

Note that the % operator is *not* implemented.  It does not appear
worth the complexity.
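
A sketch of how the operators above behave, assuming a released
Python 3 interpreter with ``bytearray`` standing in for ``buffer``.
The ``%`` operator is deliberately left out of the example, and the
names are invented.

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.
imm = b'ab'
mut = bytearray(b'cd')

# Mixed concatenation: the result takes the type of the left operand.
left_bytes = imm + mut
left_buf = mut + imm

# += mutates a mutable object in place ...
alias = mut
mut += b'ef'
in_place = (alias is mut) and (alias == bytearray(b'cdef'))

# ... but merely rebinds the name for an immutable one.
original = imm
imm += b'!'
rebound = (original == b'ab') and (imm == b'ab!')

repeated = (b'ab' * 2 == 2 * b'ab' == b'abab')

substring = b'de' in mut            # substring test
byte_member = ord('d') in mut       # single-byte membership test

try:
    hash(mut)                       # only the immutable type is hashable
    mutable_hashable = True
except TypeError:
    mutable_hashable = False
```

The left-operand rule is what makes ``+=`` consistent: ``b1 += b2``
has to leave ``b1`` with its original type.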

Methods
-------

The following methods are implemented by bytes as well as buffer, with
similar semantics.  They accept anything that implements the PEP 3118
buffer API for bytes arguments, and return the same type as the object
whose method is called ("self")::

  .capitalize(), .center(), .count(), .decode(), .endswith(),
  .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
  .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
  .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
  .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
  .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
  .translate(), .upper(), .zfill()

This is exactly the set of methods present on the str type in Python
2.x, with the exclusion of .encode().  The signatures and semantics
are the same too.  However, whenever character classes like letter,
whitespace, lower case are used, the ASCII definitions of these
classes are used.  (The Python 2.x str type uses the definitions from
the current locale, settable through the locale module.)  The
.encode() method is left out because of the more strict definitions of
encoding and decoding in Python 3000: encoding always takes a Unicode
string and returns a bytes sequence, and decoding always takes a bytes
sequence and returns a Unicode string.

In addition, both types implement the class method ``.fromhex()``,
which constructs an object from a string containing hexadecimal values
(with or without spaces between the bytes).

The buffer type implements these additional methods from the
MutableSequence ABC (see PEP 3119):

  .extend(), .insert(), .append(), .reverse(), .pop(), .remove().
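
A few of these methods in action, as a sketch only.  It assumes a
released Python 3 interpreter in which ``bytearray`` plays the role of
``buffer``; the variable names are made up.

```python
# Sketch only: bytearray stands in for the PEP's proposed ``buffer`` type.

# Methods return the type of the object they are called on.
upper_bytes = b'abc'.upper()
upper_buf = bytearray(b'abc').upper()

# Character classes use ASCII definitions, independent of the locale:
# 0xE9 (e-acute in Latin-1) is neither a letter nor affected by upper().
latin1_byte = b'\xe9'
is_alpha = latin1_byte.isalpha()
unchanged = (latin1_byte.upper() == latin1_byte)

# Bytes arguments may be any object exporting the buffer API.
parts = b'a,b'.split(bytearray(b','))

# fromhex() accepts input with or without spaces between the bytes.
hex_a = bytes.fromhex('deadbeef')
hex_b = bytearray.fromhex('de ad be ef')

# Mutable-sequence methods exist on the mutable type only.
buf = bytearray(b'ab')
buf.append(99)          # b'abc'
buf.extend(b'de')       # b'abcde'
buf.reverse()           # b'edcba'
popped = buf.pop()      # removes and returns the last byte as an int
```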

Bytes and the Str Type
----------------------

Like the bytes type in Python 3.0a1, and unlike the relationship
between str and unicode in Python 2.x, any attempt to mix bytes (or
buffer) objects and str objects without specifying an encoding will
raise a TypeError exception.  This is the case even for simply
comparing a bytes or buffer object to a str object (even violating the
general rule that comparing objects of different types for equality
should just return False).

Conversions between bytes or buffer objects and str objects must
always be explicit, using an encoding.  There are two equivalent APIs:
``str(b, <encoding>[, <errors>])`` is equivalent to
``b.decode(<encoding>[, <errors>])``, and
``bytes(s, <encoding>[, <errors>])`` is equivalent to
``s.encode(<encoding>[, <errors>])``.

There is one exception: we can convert from bytes (or buffer) to str
without specifying an encoding by writing ``str(b)``.  This produces
the same result as ``repr(b)``.  This exception is necessary because
of the general promise that *any* object can be printed, and printing
is just a special case of conversion to str.  There is however no
promise that printing a bytes object interprets the individual bytes
as characters (unlike in Python 2.x).

The str type currently implements the PEP 3118 buffer API.  While this
is perhaps occasionally convenient, it is also potentially confusing,
because the bytes accessed via the buffer API represent a
platform-dependent encoding: depending on the platform byte order and
a compile-time configuration option, the encoding could be UTF-16-BE,
UTF-16-LE, UTF-32-BE, or UTF-32-LE.  Worse, a different implementation
of the str type might completely change the bytes representation,
e.g. to UTF-8, or even make it impossible to access the data as a
contiguous array of bytes at all.  Therefore, the PEP 3118 buffer API
will be removed from the str type.
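
The explicit-conversion rules in this section can be sketched as
follows, assuming a released Python 3 interpreter with default warning
flags.  The names are illustrative and not part of the PEP.

```python
# Sketch only, assuming a released Python 3 interpreter.
text = 'na\u00efve'
raw = text.encode('utf-8')

# The two spellings of each conversion are equivalent.
round_trip = (str(raw, 'utf-8') == raw.decode('utf-8') == text)
ctor_matches_method = (bytes(text, 'utf-8') == raw)

# Mixing the types without an encoding is an error.
try:
    raw + text
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# The one exception: str(b) with no encoding gives the repr, so that
# any object can be printed.
printable = (str(raw) == repr(raw))

# str no longer exports the buffer API.
try:
    memoryview(text)
    str_exports_buffer = True
except TypeError:
    str_exports_buffer = False
```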

Pickling
--------

Left as an exercise for the reader.

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Mon Oct  1 02:11:38 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 30 Sep 2007 17:11:38 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
Message-ID: <bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>

+1 from me.

-Brett


From aahz at pythoncraft.com  Mon Oct  1 04:10:02 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 30 Sep 2007 19:10:02 -0700
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To: <e8452f7c0709271002r5dda1e09w56160fa359ba6f7d@mail.gmail.com>
References: <e8452f7c0709192049o406eddebi5b0481ad442de056@mail.gmail.com>
	<e8452f7c0709271002r5dda1e09w56160fa359ba6f7d@mail.gmail.com>
Message-ID: <20071001021001.GA12746@panix.com>

On Thu, Sep 27, 2007, Rob Crowther wrote:
> 
> I've uploaded the latest code to http://umass.glexia.net/mpf.tar.bz2
> 
> Here's a quick rundown of supported functions and operations.

Could you explain what your goal is here?  MPF isn't currently part of
the standard library, so it probably should exist as a standalone
extension first.  This mailing list is probably not the right place for
discussion, either.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

From carsten at uniqsys.com  Mon Oct  1 04:10:32 2007
From: carsten at uniqsys.com (Carsten Haese)
Date: Sun, 30 Sep 2007 22:10:32 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
	and	Mutable Buffer
In-Reply-To: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
Message-ID: <1191204632.3258.6.camel@localhost.localdomain>

On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote:
> [...]
> (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> instance would raise TypeError, on the premise that this would catch
> the occasional mistake quicker, especially in code ported from Python
> 2.x.  However, a long discussion on the python-3000 list pointed out
> so many problems with this that it is clearly a bad idea, to be rolled
> back in 3.0a2 regardless of the fate of the rest of this PEP.)
> [...]
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).

It appears that you didn't revise the latter paragraph after adding the
former paragraph.

-- 
Carsten Haese
http://informixdb.sourceforge.net



From alexandre at peadrop.com  Mon Oct  1 04:44:31 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 30 Sep 2007 22:44:31 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
Message-ID: <acd65fa20709301944x3dd193d4tcef5ce3f97eebe68@mail.gmail.com>

On 9/30/07, Guido van Rossum <guido at python.org> wrote:
> Pickling
> --------
>
> Left as an exercise for the reader.
>

A simple way to add specific pickling support for bytes/buffer objects
would be to define two new constants:

  BYTES          = b'\x8c'  # push a bytes object
  BUFFER         = b'\x8d'  # push a buffer object

And add the following pickling and unpickling procedures:

  def save_bytes(self, obj, pack=struct.pack):
      # opcode, then a 4-byte little-endian length, then the raw data
      n = len(obj)
      self.write(BYTES + pack("<i", n) + obj)

  def save_buffer(self, obj, pack=struct.pack):
      n = len(obj)
      self.write(BUFFER + pack("<i", n) + obj)

  def load_bytes(self):
      # mloads is marshal.loads, as imported in pickle.py; the result is
      # named n so that the builtin len() is not shadowed
      n = mloads(b'i' + self.read(4))
      self.append(self.read(n))

  def load_buffer(self):
      n = mloads(b'i' + self.read(4))
      self.append(buffer(self.read(n)))

The only problem with this approach is that bytes objects bigger than
4GB cannot be pickled. Currently, this applies to all string-like
objects, so I don't think this restriction will cause any trouble.
Also, it would be a good idea to bump the protocol version to 3 to
ensure that older Python versions don't try to load pickle streams
created with these new constants.
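
To make the framing concrete, here is a standalone sketch of the same
idea outside the pickle module. It is only an illustration under some
assumptions: ``dump_blob`` and ``load_blob`` are made-up helper names,
``bytearray`` stands in for the proposed ``buffer`` type, the length is
decoded with ``struct.unpack`` instead of ``mloads``, and the two
opcode values are just the ones suggested above.

```python
import io
import struct

BYTES = b'\x8c'     # opcode values as suggested above (hypothetical)
BUFFER = b'\x8d'

def dump_blob(out, obj):
    """Write opcode + 4-byte little-endian length + raw data."""
    opcode = BUFFER if isinstance(obj, bytearray) else BYTES
    out.write(opcode + struct.pack('<i', len(obj)) + bytes(obj))

def load_blob(inp):
    """Read one framed object back, restoring its mutability."""
    opcode = inp.read(1)
    (n,) = struct.unpack('<i', inp.read(4))
    data = inp.read(n)
    return bytearray(data) if opcode == BUFFER else data

stream = io.BytesIO()
dump_blob(stream, b'spam')
dump_blob(stream, bytearray(b'eggs'))
stream.seek(0)

first = load_blob(stream)
second = load_blob(stream)

# "<i" is a signed 32-bit field, so this is the largest encodable length.
max_len = 2**31 - 1
```

Since ``"<i"`` is a signed field, the largest length this framing can
carry is ``2**31 - 1`` bytes.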

By the way, would it be a good idea to add specific pickling support
for sets (and frozensets)?

-- Alexandre

From guido at python.org  Mon Oct  1 04:59:16 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Sep 2007 19:59:16 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <1191204632.3258.6.camel@localhost.localdomain>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<1191204632.3258.6.camel@localhost.localdomain>
Message-ID: <ca471dc20709301959t7837d38erde5449fb786383f3@mail.gmail.com>

On 9/30/07, Carsten Haese <carsten at uniqsys.com> wrote:
> On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote:
> > [...]
> > (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> > instance would raise TypeError, on the premise that this would catch
> > the occasional mistake quicker, especially in code ported from Python
> > 2.x.  However, a long discussion on the python-3000 list pointed out
> > so many problems with this that it is clearly a bad idea, to be rolled
> > back in 3.0a2 regardless of the fate of the rest of this PEP.)
> > [...]
> > Like the bytes type in Python 3.0a1, and unlike the relationship
> > between str and unicode in Python 2.x, any attempt to mix bytes (or
> > buffer) objects and str objects without specifying an encoding will
> > raise a TypeError exception.  This is the case even for simply
> > comparing a bytes or buffer object to a str object (even violating the
> > general rule that comparing objects of different types for equality
> > should just return False).
>
> It appears that you didn't revise the latter paragraph after adding the
> former paragraph.

Good catch! Fixed in svn. The latter paragraph now reads:

"""
Like the bytes type in Python 3.0a1, and unlike the relationship
between str and unicode in Python 2.x, attempts to mix bytes (or
buffer) objects and str objects without specifying an encoding will
raise a TypeError exception.  (However, comparing bytes/buffer and str
objects for equality will simply return False; see the section on
Comparisons above.)
"""

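A short sketch of the behaviour the revised paragraph describes, as it
can be checked on a released Python 3 under default interpreter flags.
The variable names are only for illustration.

```python
data = b"abc"
text = "abc"

# Comparing bytes and str for equality does not raise (with default
# interpreter flags); the result is simply False.
equal = (data == text)

# Mixing the two types without an encoding raises TypeError.
try:
    data + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# With an explicit encoding the comparison is unambiguous.
decoded_equal = (data.decode("ascii") == text)
```
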
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant.travis at ieee.org  Mon Oct  1 06:47:34 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun, 30 Sep 2007 23:47:34 -0500
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
Message-ID: <fdpu52$vaa$1@sea.gmane.org>

+1 from me.

I like that the str will not support the buffer API because it gets rid 
of one of the flags in the PEP 3118 API that was only there to support 
the abuse of the buffer API by unicode objects.

- Travis Oliphant


From greg at krypto.org  Mon Oct  1 07:16:29 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 30 Sep 2007 22:16:29 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <fdpu52$vaa$1@sea.gmane.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<fdpu52$vaa$1@sea.gmane.org>
Message-ID: <52dc1c820709302216h34a82c45m2385f8dcf34de800@mail.gmail.com>

+10 from me

From ncoghlan at gmail.com  Mon Oct  1 15:55:12 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 01 Oct 2007 23:55:12 +1000
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
 and	Mutable Buffer
In-Reply-To: <bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
Message-ID: <4700FC40.1060206@gmail.com>

Brett Cannon wrote:
> +1 from me.

Looks good to me too: +1

I wouldn't mind seeing some iteration-in-C bit-bashing operations in 
there eventually, but they aren't needed on the first pass, and even 
being able to do things like the following will be a decent improvement 
over the status quo for low-level bitstream manipulation:

   data = bytes([x & 0x1F for x in orig_data])
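
Wrapped in a small helper, the one-liner above might look like the
sketch below. The name mask_bytes is hypothetical; it relies only on
iteration over bytes yielding ints, as the PEP specifies.

```python
def mask_bytes(data, mask):
    """Return a new bytes object with every byte ANDed with mask."""
    return bytes([x & mask for x in data])


orig_data = b"abc"
data = mask_bytes(orig_data, 0x1F)  # keep only the low five bits
```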

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From dalcinl at gmail.com  Mon Oct  1 17:00:11 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 1 Oct 2007 12:00:11 -0300
Subject: [Python-3000] [Python-Dev] building with -Wwrite-strings
In-Reply-To: <20071001141007.GA20122@code0.codespeak.net>
References: <e7ba66e40709271518v1f10cdd4h8643f74048bf0b82@mail.gmail.com>
	<46FD6DA2.1060107@v.loewis.de>
	<20071001141007.GA20122@code0.codespeak.net>
Message-ID: <e7ba66e40710010800l68d546fbke5bbd8721bab132b@mail.gmail.com>

Yes, you are completely right. I ended up realizing that a change like
this would break almost all third-party extensions.

But... what about doing this for Py3K? Third-party extensions have
to be fixed anyway.


On 10/1/07, Armin Rigo <arigo at tunes.org> wrote:
> Hi Martin,
>
> On Fri, Sep 28, 2007 at 11:09:54PM +0200, "Martin v. Löwis" wrote:
> > What's wrong with
> >
> > static const char *kwlist[] = {"x", "base", 0};
>
> The following goes wrong if we try again to walk this path:
> http://mail.python.org/pipermail/python-dev/2006-February/060689.html
>
>
> Armin
>


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From skip at pobox.com  Mon Oct  1 19:14:40 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 1 Oct 2007 12:14:40 -0500
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <4700FC40.1060206@gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com>
Message-ID: <18177.11008.244338.509409@montanaro.dyndns.org>


    Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
    Nick> in there eventually...

    Nick>    data = bytes([x & 0x1F for x in orig_data])

This begins to make it look like what you want is array.array or numpy.array.
Python's arrays don't support bitwise operations either, but numpy's do.
How much overlap is there between the three types?  Does it make sense to
consider that canonical underlying array type now (or in the near future,
sometime before the release of 3.0 final)?

Skip

From ncoghlan at gmail.com  Mon Oct  1 23:18:19 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 02 Oct 2007 07:18:19 +1000
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <18177.11008.244338.509409@montanaro.dyndns.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com>
	<18177.11008.244338.509409@montanaro.dyndns.org>
Message-ID: <4701641B.4040501@gmail.com>

skip at pobox.com wrote:
>     Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
>     Nick> in there eventually...
> 
>     Nick>    data = bytes([x & 0x1F for x in orig_data])
> 
> This begins to make it look like what you want is array.array or numpy.array.
> Python's arrays don't support bitwise operations either, but numpy's do.
> How much overlap is there between the three types?  Does it make sense to
> consider that canonical underlying array type now (or in the near future,
> sometime before the release of 3.0 final)?

Not hugely urgent for me - it's a direction I'd like to see the data 
type go in (as the less custom code needed on the C/C++ side of the 
fence to do reasonably efficient low level I/O the better as far as I am 
concerned), but work is still on 2.4 (with no compelling motivation to 
upgrade) so I'm personally resigned to the use of assorted ord(), chr() 
and ''.join() calls for the immediate future.

The advantage of having the bit manipulation features in the builtin 
bytes type for this kind of thing over numpy.array is that I expect the 
builtin bytes type to be usable directly with Py3k versions of libraries 
like pyserial, and numpy would be a big dependency to bring in just to 
get more efficient bit-oriented operations on a byte sequence - 
array.array doesn't have them (not to mention the fact that these 
operations would make far less sense for any array containing something 
other than bytes).

However, because the addition of any bit-oriented operations to the 
bytes/buffer types would be a new backwardly-compatible feature, it can 
be proposed whenever is convenient rather than having to be done right now.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Tue Oct  2 03:19:32 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 02 Oct 2007 14:19:32 +1300
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <4701641B.4040501@gmail.com>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com>
	<18177.11008.244338.509409@montanaro.dyndns.org>
	<4701641B.4040501@gmail.com>
Message-ID: <47019CA4.4010403@canterbury.ac.nz>

Nick Coghlan wrote:
> numpy would be a big dependency to bring in just to 
> get more efficient bit-oriented operations on a byte sequence

Random thought - if long integers were to use byte
sequences internally to hold their data, it might
be possible to get this more or less for free in
terms of code size.
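
This is not what the thought above proposes (that concerns the internal
representation of long), but a related trick gives a feel for it: with
int.from_bytes and int.to_bytes, which were added in later Python 3
releases, one big-integer operation can mask a whole byte sequence.
The helper name mask_via_int is hypothetical.

```python
def mask_via_int(data, mask):
    """AND every byte of data with mask using one big-integer operation."""
    n = len(data)
    value = int.from_bytes(data, "big")
    # Repeat the one-byte mask so it lines up with every byte of data.
    mask_value = int.from_bytes(bytes([mask]) * n, "big")
    return (value & mask_value).to_bytes(n, "big")
```

The per-byte work then happens inside the integer implementation
rather than in a Python-level loop.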

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+

From tjreedy at udel.edu  Tue Oct  2 05:59:12 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 1 Oct 2007 23:59:12 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
	and	Mutable Buffer
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com><bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com>
Message-ID: <fdsfmg$7d6$1@sea.gmane.org>


"Nick Coghlan" <ncoghlan at gmail.com> wrote in message 
news:4700FC40.1060206 at gmail.com...
| Brett Cannon wrote:
| > +1 from me.
|
| Looks good to me too: +1
|
| I wouldn't mind seeing some iteration-in-C bit-bashing operations in
| there eventually, but they aren't needed on the first pass, and even
| being able to do things like the following will be a decent improvement
| over the status quo for low-level bitstream manipulation:
|
|   data = bytes([x & 0x1F for x in orig_data])

If orig_data were mutable (the new buffer, as proposed in the PEP), would 
not

for i in range(len(orig_data)):
  orig_data[i] &= 0x1F

do it in place? (I don't have .0a1 to try on the current bytes.)

tjr




From lists at cheimes.de  Tue Oct  2 09:59:26 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 02 Oct 2007 09:59:26 +0200
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <fdsfmg$7d6$1@sea.gmane.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com><bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>	<4700FC40.1060206@gmail.com>
	<fdsfmg$7d6$1@sea.gmane.org>
Message-ID: <fdstov$av5$1@sea.gmane.org>

Terry Reedy wrote:
> If orig_data were mutable (the new buffer, as proposed in the PEP), would 
> not
> 
> for i in range(len(orig_data)):
>   orig_data[i] &= 0x1F
> 
> do it in place? (I don't have .0a1 to try on the current bytes.)

Good catch!

Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>>> orig_data = b"abc"
>>> orig_data
b'abc'
>>> for i in range(len(orig_data)):
...   orig_data[i] &= 0x1F
...
>>> orig_data
b'\x01\x02\x03'
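
The session above relies on the mutable bytes type of 3.0a1. In
released Python 3, where bytes is immutable and the PEP's buffer type
ended up being called bytearray, the same loop can be sketched as:

```python
orig_data = bytearray(b"abc")
for i in range(len(orig_data)):
    orig_data[i] &= 0x1F  # each item is an int, so &= works per element

# The immutable bytes type rejects item assignment.
try:
    frozen = b"abc"
    frozen[0] &= 0x1F
    bytes_mutable = True
except TypeError:
    bytes_mutable = False
```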

It'd be useful and more efficient if the new buffer type supported
the bitwise operations directly:

>>> orig_data &= 0x1F
TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
>>> orig_data &= b"\x1F"
TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Christian


From guido at python.org  Tue Oct  2 16:10:04 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 07:10:04 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and
	Mutable Buffer
In-Reply-To: <fdstov$av5$1@sea.gmane.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com> <fdsfmg$7d6$1@sea.gmane.org>
	<fdstov$av5$1@sea.gmane.org>
Message-ID: <ca471dc20710020710l59b9e409p8a441b927e9647cf@mail.gmail.com>

I am hereby accepting my own PEP 3137. The responses fell into three
categories: enthusiastic +1s, textual corrections, and ideas for
future enhancements. That's about as positive as it gets for any
proposal. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org  Tue Oct  2 16:37:22 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 10:37:22 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
Message-ID: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>

I've submitted patches to emacs for python 3000 support.  It does not
handle any new syntax but the emacs<->python interaction works again.
This applies to the python.el that ships with emacs22, not
python-mode.el.

The changes are available in emacs cvs.  If you don't want to build a
new copy it should be sufficient to pull the files python.el,
emacs.py, emacs2.py and emacs3.py.

-- 
Adam Hupp | http://hupp.org/adam/

From guido at python.org  Tue Oct  2 17:04:34 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 08:04:34 -0700
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
Message-ID: <ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>

On 10/2/07, Adam Hupp <adam at hupp.org> wrote:
> I've submitted patches to emacs for python 3000 support.  It does not
> handle any new syntax but the emacs<->python interaction works again.
> This applies to the python.el that ships with emacs22, not
> python-mode.el.

Just curious -- how do python.el and python-mode.el differ?

> The changes are available in emacs cvs.  If you don't want to build a
> new copy it should be sufficient to pull the files python.el,
> emacs.py, emacs2.py and emacs3.py.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org  Tue Oct  2 17:28:19 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 11:28:19 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
	<ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
Message-ID: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>

On 10/2/07, Guido van Rossum <guido at python.org> wrote:
>
> Just curious -- how do python.el and python-mode.el differ?

Off the top of my head:

 * python-mode.el did not play well with transient-mark-mode
(mark-block didn't work).   transient-mark-mode highlights the marked
region and is required for other functions (e.g. comment-dwim).

 * python-mode.el had problems with syntax highlighting in the
presence of triple quoted strings and in comments.  python.el does
not.

 * python.el is supposed to be more consistent with other major modes.
 e.g. M-; for comment.

 * python.el ships with emacs.  There are claims that python-mode.el
was not as well maintained for FSF emacs as XEmacs.

-- 
Adam Hupp | http://hupp.org/adam/

From barry at python.org  Tue Oct  2 17:33:44 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 2 Oct 2007 11:33:44 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
	<ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
	<766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
Message-ID: <8B5D00B9-F765-43F6-B3DE-AA6BB50CA611@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 2, 2007, at 11:28 AM, Adam Hupp wrote:

> On 10/2/07, Guido van Rossum <guido at python.org> wrote:
>>
>> Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
>  * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work).   transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
>  * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments.  python.el does
> not.
>
>  * python.el is supposed to be more consistent with other major modes.
>  e.g. M-; for comment.
>
>  * python.el ships with emacs.  There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.

It would be nice if there were only one mode that worked with both  
FSF Emacs and XEmacs and merged the best qualities of both modes.  I  
don't have much time to work on that, and I suspect Skip is pretty  
busy too.  Adam, if you're interested, willing, and able to help  
develop such a merge, python-mode at python.org would be the place to do  
so.

I'd certainly be willing to test and I'd try to do a limited amount  
of XEmacs compatibility hacking.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRwJk2XEjvBPtnXfVAQJ9ZgP/bbG+OSHEnWGCBIXibnTzxEUL2ifIO8YU
E/odKLMogXKFc40/weansKpjX9+Mv+/ye7a49HPH+AZ2vxKJsFvZVHill6F3pbh2
bd+94O1AkYIsuJwO7u3Pc3clje85jXDSUtmPRM3yWGweLDNNDaS4kxE02tNqdSTd
rKiHn4gUzYk=
=zMKd
-----END PGP SIGNATURE-----

From tjreedy at udel.edu  Tue Oct  2 17:59:07 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 2 Oct 2007 11:59:07 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
	and Mutable Buffer
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com><bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>	<4700FC40.1060206@gmail.com><fdsfmg$7d6$1@sea.gmane.org>
	<fdstov$av5$1@sea.gmane.org>
Message-ID: <fdtpsa$135$1@sea.gmane.org>


"Christian Heimes" <lists at cheimes.de> wrote in message 
news:fdstov$av5$1 at sea.gmane.org...
| Terry Reedy wrote:
| > If orig_data were mutable (the new buffer, as proposed in the PEP), would
| > not
| >
| > for i in range(len(orig_data)):
| >   orig_data[i] &= 0x1F
| >
| > do it in place? (I don't have .0a1 to try on the current bytes.)
|
| Good catch!
|
| Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
| [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
| >>> orig_data = b"abc"
| >>> orig_data
| b'abc'
| >>> for i in range(len(orig_data)):
| ...   orig_data[i] &= 0x1F
| ...
| >>> orig_data
| b'\x01\x02\x03'

Thanks for testing this!  Glad it worked.  This sort of thing makes having 
bytes/buffer[i] an int a plus.  (Just noticed, PEP accepted.)

| It'd be useful and more efficient if the new buffer type supported
| the bitwise operations directly:
|
| >>> orig_data &= 0x1F
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'

This sort of broadcast behavior seems like numpy territory to me.  Or 
better for a buffer subclass.  Write it first in Python, using loops like 
above (partly for documentation and other implementations), then in C when 
interest and usage warrant.

| >>> orig_data &= b"\x1F"
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Ugh is my response.  Stick with the first ;-).

Terry Jan Reedy




From guido at python.org  Tue Oct  2 18:24:01 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 09:24:01 -0700
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
	<ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
	<766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
Message-ID: <ca471dc20710020924j2a81bb95uc57fec290419eb7e@mail.gmail.com>

So is python.el a descendant of python-mode.el, or an independent development?

On 10/2/07, Adam Hupp <adam at hupp.org> wrote:
> On 10/2/07, Guido van Rossum <guido at python.org> wrote:
> >
> > Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
>  * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work).   transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
>  * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments.  python.el does
> not.
>
>  * python.el is supposed to be more consistent with other major modes.
>  e.g. M-; for comment.
>
>  * python.el ships with emacs.  There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.
>
> --
> Adam Hupp | http://hupp.org/adam/
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org  Tue Oct  2 18:44:54 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 12:44:54 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <ca471dc20710020924j2a81bb95uc57fec290419eb7e@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
	<ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
	<766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
	<ca471dc20710020924j2a81bb95uc57fec290419eb7e@mail.gmail.com>
Message-ID: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>

On 10/2/07, Guido van Rossum <guido at python.org> wrote:
> So is python.el a descendant of python-mode.el, or an independent development?

I've never seen a definitive statement but I believe it was developed
independently.

-- 
Adam Hupp | http://hupp.org/adam/

From skip at pobox.com  Tue Oct  2 19:05:17 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 2 Oct 2007 12:05:17 -0500
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
	<ca471dc20710020804k315b536dqfc3eec392c05a0b6@mail.gmail.com>
	<766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
	<ca471dc20710020924j2a81bb95uc57fec290419eb7e@mail.gmail.com>
	<766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>
Message-ID: <18178.31309.146267.585340@montanaro.dyndns.org>


    Guido> So is python.el a descendant of python-mode.el, or an independent
    Guido> development?

    Adam> I've never seen a definitive statement but I believe it was
    Adam> developed independently.

Correct.

Skip

From qrczak at knm.org.pl  Tue Oct  2 20:49:07 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 02 Oct 2007 20:49:07 +0200
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com>
References: <200709280429.39396.victor.stinner@haypocalc.com>
	<400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net>
	<aac2c7cb0709280944n15419164w124c7ed5cda85b57@mail.gmail.com>
	<200709281858.29705.victor.stinner@haypocalc.com>
Message-ID: <1191350947.8483.6.camel@qrnik>

Dnia 28-09-2007, Pt o godzinie 18:58 +0200, Victor Stinner pisze:

> I don't know GMP internals. I thought that GMP uses a hack for small
> integers.

It does not.

(And I'm glad that it does not, because it allows for super-specialized
representation of small integers where even the space for mpz_t itself
is not allocated. A GMP-internal optimization for the same cases would
be underutilized and thus wasteful.)

> I may also use Python's garbage collector for GMP memory allocations,
> since GMP allows me to use my own memory allocation functions.

This would make linking with another library which uses GMP impossible
(unless the allocator is compatible with malloc, reentrant etc.).
Glasgow Haskell has been unfortunate to go that way.

> GMP also has its own reference counter mechanism :-/

It does not.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From mark at qtrac.eu  Wed Oct  3 04:24:50 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 3 Oct 2007 03:24:50 +0100
Subject: [Python-3000] Are strs sequences of characters or disguised byte
	strings?
Message-ID: <200710030324.50588.mark@qtrac.eu>

In Python 3.0a1, exec() appears to normalize strings, but in other cases
they don't appear to be normalized, and this leads to results that
appear to be counter-intuitive in some cases, at least to me.

    >>> c1 = "\u00C7"
    >>> c2 = "C\u0327"
    >>> c3 = "\u0043\u0327"
    >>> c1, c2, c3
    ('\xc7', 'C\u0327', 'C\u0327')
    >>> print(c1, c2)
    Ç Ç

Clearly c1 and c2 are different at the byte level. But if we use them to
create variables using exec(), Python appears to normalize them:

    >>> dir()
    ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
    >>> exec("C\u0327 = 5")
    >>> dir()
    ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
    >>> Ç
    5
    >>> exec("\u00C7 = -7")
    >>> dir()
    ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
    >>> Ç
    -7

This seems to be the right behaviour to me, since from the point of view
of a programmer, Ç is the name of the variable, no matter what
underlying byte encoding is used to represent the variable's name.

    >>> print(c1, c2)
    Ç Ç
    >>> c1.encode("utf8") == c2.encode("utf8")
    False

This is what I'd expect, since here I'm comparing the actual bytes.

But when I compare them as strings I really expect them to be compared
as sequences of characters (in a human sense), so this:

    >>> c1 == c2
    False

seems counter-intuitive to me. It is easy to fix:

    >>> from unicodedata import normalize
    >>> normalize("NFKD", c1) == normalize("NFKD", c2)
    True

but isn't it asking a lot of Python users to use normalize() whenever
they want to perform such a basic operation as string comparison?
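
A minimal sketch of the comparison being asked for, packaged as a
helper. The name same_text is hypothetical. It defaults to NFC, which
applies only canonical equivalence; the NFKD form used above would also
fold compatibility characters together, which may or may not be wanted.

```python
from unicodedata import normalize


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return normalize(form, a) == normalize(form, b)


c1 = "\u00C7"   # LATIN CAPITAL LETTER C WITH CEDILLA
c2 = "C\u0327"  # LATIN CAPITAL LETTER C + COMBINING CEDILLA
```

Here c1 == c2 is False, while same_text(c1, c2) is True.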

Another issue that arises is that you can end up with duplicate
dictionary keys and set elements. (The duplication is in human terms, in
byte terms the keys/set elements differ of course):

    >>> d = {c1: 1, c2: 2}
    >>> d
    {'C\u0327': 2, '\xc7': 1}
    >>> for k, v in d.items():
    ...     print(k, v)
    ...
    Ç 2
    Ç 1

I think this is surprising.

    >>> s = {c1, c2}
    >>> s
    {'C\u0327', '\xc7'}
    >>> for x in s:
    ...     print(x)
    ...
    Ç
    Ç

And the same result applies to sets of course.
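
One way an application could avoid such duplicates today is to
normalize keys on the way in. The sketch below assumes NFC is the
wanted form and that a later value may overwrite an earlier one; both
are choices made for this example only.

```python
from unicodedata import normalize

c1, c2 = "\u00C7", "C\u0327"

raw = {c1: 1, c2: 2}  # two distinct keys at the code point level

merged = {}
for key, value in raw.items():
    # Both spellings normalize to the same key, so the later value wins.
    merged[normalize("NFC", key)] = value

unique = {normalize("NFC", s) for s in (c1, c2)}
```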

I don't know what the performance costs would be for always normalizing
strings, but it seems to me that if strings are not normalized, then
they are really being treated as byte strings thinly disguised as
strings rather than as true sequences of characters whose byte
representation is a detail that programmers can ignore (unless they
choose to explicitly decode).

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From guido at python.org  Wed Oct  3 05:28:56 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 20:28:56 -0700
Subject: [Python-3000] Are strs sequences of characters or disguised
	byte strings?
In-Reply-To: <200710030324.50588.mark@qtrac.eu>
References: <200710030324.50588.mark@qtrac.eu>
Message-ID: <ca471dc20710022028g5a074d2ev9f62986518bc0b4f@mail.gmail.com>

String objects are arrays of code units. They can represent normalized
and unnormalized Unicode text just as easily, and even invalid data,
like half a surrogate and other illegal code units. It is up to the
application (or perhaps at some point the library) to implement
various checks and normalizations. AFAIK this is the same stance that
Java and C# take -- the String types there don't concern themselves
with the higher levels of Unicode standard compliance. (Though those
languages probably have more library support than Python does --
perhaps someone can contribute something, like wrappers for ICU?)

However, for identifiers occurring in source code, we *do* normalize
before comparing them. PEP 3131 should explain this.
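
A small sketch of that identifier normalization, run against an
explicit namespace dict so the effect is easy to inspect. PEP 3131
specifies NFKC for identifiers. The variable names here are only for
illustration.

```python
namespace = {}
exec("C\u0327 = 5", namespace)   # identifier spelled with a combining cedilla
exec("\u00C7 = -7", namespace)   # precomposed spelling rebinds the same name

# exec() adds __builtins__ to the dict; filter it out.
names = [k for k in namespace if not k.startswith("__")]
```

Only one name ends up bound, stored in its normalized (precomposed)
form, which matches the dir() output in the quoted session below.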

--Guido

On 10/2/07, Mark Summerfield <mark at qtrac.eu> wrote:
> In Python 3.0a1, exec() appears to normalize strings, but in other cases
> they don't appear to be normalized, and this leads to results that
> appear to be counter-intuitive in some cases, at least to me.
>
>     >>> c1 = "\u00C7"
>     >>> c2 = "C\u0327"
>     >>> c3 = "\u0043\u0327"
>     >>> c1, c2, c3
>     ('\xc7', 'C\u0327', 'C\u0327')
>     >>> print(c1, c2)
>     Ç Ç
>
> Clearly c1 and c2 are different at the byte level. But if we use them to
> create variables using exec(), Python appears to normalize them:
>
>     >>> dir()
>     ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
>     >>> exec("C\u0327 = 5")
>     >>> dir()
>     ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>     >>> Ç
>     5
>     >>> exec("\u00C7 = -7")
>     >>> dir()
>     ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>     >>> Ç
>     -7
>
> This seems to be the right behaviour to me, since from the point of view
> of a programmer, Ç is the name of the variable, no matter what
> underlying byte encoding is used to represent the variable's name.
>
>     >>> print(c1, c2)
>     Ç Ç
>     >>> c1.encode("utf8") == c2.encode("utf8")
>     False
>
> This is what I'd expect, since here I'm comparing the actual bytes.
>
> But when I compare them as strings I really expect them to be compared
> as sequences of characters (in a human sense), so this:
>
>     >>> c1 == c2
>     False
>
> seems counter-intuitive to me. It is easy to fix:
>
>     >>> from unicodedata import normalize
>     >>> normalize("NFKD", c1) == normalize("NFKD", c2)
>     True
>
> but isn't it asking a lot of Python users to use normalize() whenever
> they want to perform such a basic operation as string comparison?
>
> Another issue that arises is that you can end up with duplicate
> dictionary keys and set elements. (The duplication is in human terms, in
> byte terms the keys/set elements differ of course):
>
>     >>> d = {c1: 1, c2: 2}
>     >>> d
>     {'C\u0327': 2, '\xc7': 1}
>     >>> for k, v in d.items():
>     ...     print(k, v)
>     ...
>     Ç 2
>     Ç 1
>
> I think this is surprising.
>
>     >>> s = {c1, c2}
>     >>> s
>     {'C\u0327', '\xc7'}
>     >>> for x in s:
>     ...     print(x)
>     ...
>     Ç
>     Ç
>
> And the same result applies to sets of course.
>
> I don't know what the performance costs would be for always normalizing
> strings, but it seems to me that if strings are not normalized, then
> they are really being treated as byte strings thinly disguised as
> strings rather than as true sequences of characters whose byte
> representation is a detail that programmers can ignore (unless they
> choose to explicitly decode).
>
> --
> Mark Summerfield, Qtrac Ltd., www.qtrac.eu
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>
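
A minimal sketch of the normalization workaround discussed in the quoted
message. It assumes c1 is the precomposed character U+00C7 and c2 is "C"
followed by the combining cedilla U+0327, which is what the dictionary
repr in the quote suggests. The helper name norm_key is hypothetical, and
NFC is only one reasonable choice of normalization form:

```python
import unicodedata

c1 = "\u00c7"    # precomposed LATIN CAPITAL LETTER C WITH CEDILLA
c2 = "C\u0327"   # "C" followed by COMBINING CEDILLA


def norm_key(s, form="NFC"):
    """Return s in a canonical form so equivalent spellings compare equal."""
    return unicodedata.normalize(form, s)


# A raw comparison works on code points, so the two spellings differ.
raw_equal = (c1 == c2)

# After normalization both spellings are the same string.
norm_equal = (norm_key(c1) == norm_key(c2))

# Normalizing keys on the way in avoids "duplicate" dictionary entries.
d = {}
for key, value in ((c1, 1), (c2, 2)):
    d[norm_key(key)] = value
```

Normalizing once at the boundary (when text enters the program) keeps
ordinary == and hashing cheap, which may be a reasonable middle ground
between always normalizing and never normalizing.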


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Wed Oct  3 19:30:46 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 03 Oct 2007 19:30:46 +0200
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
	andMutable Buffer
In-Reply-To: <fdtpsa$135$1@sea.gmane.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com><bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>	<4700FC40.1060206@gmail.com><fdsfmg$7d6$1@sea.gmane.org>	<fdstov$av5$1@sea.gmane.org>
	<fdtpsa$135$1@sea.gmane.org>
Message-ID: <fe0jkf$lc3$1@sea.gmane.org>

Terry Reedy wrote:
> | It'd be useful and more efficient if the new buffer type would support
> | the bit wise operations directly:
> |
> | >>> orig_data &= 0x1F
> | TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
> 
> This sort of broadcast behavior seems like numpy territory to me.  Or 
> better for a buffer subclass.  Write it first in Python, using loops like 
> above (partly for documentation and other implementations), then in C when 
> interest and usage warrant.

The C implementation of the bit wise operations for buffer() gains a
large speed improvement over the Python implementation. I'm not sure if
Guido would like it and I don't have a use case yet but it sounds like a
useful addition to the new buffer() type:

buffer &= smallint
buffer |= smallint
buffer ^= smallint
newbuffer = buffer & smallint
newbuffer = buffer | smallint
newbuffer = buffer ^ smallint

I'm willing to give it a try and implement it if people are interested
in it.
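
For comparison, the pure-Python loop version that Terry suggests could
look roughly like this. It is only a sketch: the function names are made
up, and it uses bytearray, the name under which the mutable type
eventually shipped, as a stand-in for the proposed buffer() type:

```python
def iand(buf, mask):
    """In-place equivalent of the proposed ``buf &= mask``."""
    for i, b in enumerate(buf):
        buf[i] = b & mask
    return buf


def xor(buf, mask):
    """Equivalent of the proposed ``newbuf = buf ^ mask``; buf is untouched."""
    return bytearray(b ^ mask for b in buf)
```

A C implementation would do the same work in a single pass without
creating an int object per element, which is where the speed-up would
come from.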

I have a use case for another feature, but that's surely out of
scope for the Python core. For some algorithms, especially cryptographic
ones, I could use a bytes type which contains larger elements than
a char (unsigned int8) and which wraps around on overflow (255 + 1 == 0).

for b in bytes(b"....", wordsize=32, signed=True):
    ...

Again, it's just a pipe dream and I tend to say that it doesn't belong
in the core.
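
Both ideas can be approximated without a new type. The sketch below is
illustrative only: the helper names are invented, masking with 0xFF
emulates the 8-bit wrap-around, and the struct module reads the same
bytes as 32-bit words (little-endian is an arbitrary choice here):

```python
import struct


def add_u8(a, b):
    """Add two values with unsigned 8-bit wrap-around (255 + 1 == 0)."""
    return (a + b) & 0xFF


def iter_words32(data, signed=True):
    """Yield successive 32-bit little-endian words from a bytes object."""
    fmt = "<i" if signed else "<I"
    for (word,) in struct.iter_unpack(fmt, data):
        yield word
```

Note that struct.iter_unpack() requires the input length to be a multiple
of the word size, so padding is the caller's job in this sketch.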

> 
> | >>> orig_data &= b"\x1F"
> | TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'
> 
> Ugh is my response.  Stick with the first ;-).

Ugh, too :)

Christian


From guido at python.org  Wed Oct  3 19:36:32 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 10:36:32 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes
	andMutable Buffer
In-Reply-To: <fe0jkf$lc3$1@sea.gmane.org>
References: <ca471dc20709301625x4bca1a20i60c91053ab01076d@mail.gmail.com>
	<bbaeab100709301711w46672f34w812239c5ef20ef82@mail.gmail.com>
	<4700FC40.1060206@gmail.com> <fdsfmg$7d6$1@sea.gmane.org>
	<fdstov$av5$1@sea.gmane.org> <fdtpsa$135$1@sea.gmane.org>
	<fe0jkf$lc3$1@sea.gmane.org>
Message-ID: <ca471dc20710031036m13846fe6u35a92bfb84084331@mail.gmail.com>

On 10/3/07, Christian Heimes <lists at cheimes.de> wrote:
> I don't have a use case yet but it sounds like a
> useful addition to the new buffer() type:

That's a contradiction. Without a use case it's not useful.

Let's be conservative on these "kitchen sink" ideas. They belong in
python-ideas anyway.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com  Wed Oct  3 20:01:06 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 3 Oct 2007 18:01:06 +0000 (UTC)
Subject: [Python-3000] Simplifying pickle for Py3k
Message-ID: <fe0ld1$shp$1@sea.gmane.org>

I guess the library overhaul hasn't really started yet, but it would
be nice if the pickle module could get some work.  Today I'm trying
to efficiently store a class using pickle and the documentation is
making my head hurt.  I don't think the documentation itself is the
problem, just the fact that the rules are so complicated.

I guess there are several different solutions:

    * Remove backwards compatible stuff from the code and the
      documentation.  The downside is that old pickles could not be
      loaded.  Perhaps that's not a huge issue since the removal of
      old-style classes might already break old pickles.

    * Remove the backwards compatible stuff from the documentation
      only.  This would help people using the language but would
      still be a long term maintenance issue.

    * Leave the old code in but generate warnings when old pickle
      mechanisms are used.  Eventually the old stuff could be
      removed from the code.

    * Provide an "oldpickle" module that supports pre-3k pickles.

I think I like the warnings idea best.

  Neil


From guido at python.org  Wed Oct  3 20:29:18 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 11:29:18 -0700
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0ld1$shp$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
Message-ID: <ca471dc20710031129n5ad93973of62bec56d1b91273@mail.gmail.com>

I think it's essential to be able to *read* pickles generated by older
Python versions. But for writing I'm okay with only writing protocol 2
(which Python 2.x also understands) and only supporting the modern
APIs for customizing pickle writing.

I don't think classic class instances are necessarily unpicklable in
3.0 -- they will just show up as instances of the corresponding
new-style classes.
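
A small sketch of the "write protocol 2 only" idea, using the pickle API
as it exists (dumps() with an explicit protocol argument). The sample
data is arbitrary:

```python
import pickle

data = {"name": "spam", "values": [1, 2, 3]}

# Ask for protocol 2 explicitly, which Python 2.3 and later can also read.
blob = pickle.dumps(data, protocol=2)

# A protocol 2 stream starts with the PROTO opcode (0x80) and the number 2.
uses_protocol_2 = blob[:2] == b"\x80\x02"

# Reading needs no protocol argument; the stream identifies itself.
restored = pickle.loads(blob)
```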

--Guido

On 10/3/07, Neil Schemenauer <nas at arctrix.com> wrote:
> I guess the library overhaul hasn't really started it but it would
> be nice if the pickle module could get some work.  Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt.  I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.
>
> I guess there are several different solutions:
>
>     * Remove backwards compatible stuff from the code and the
>       documentation.  The downside is that old pickles could not be
>       loaded.  Perhaps that's not a huge issue since the removal of
>       old-style classes might already break old pickles.
>
>     * Remove the backwards compatible stuff from the documentation
>       only.  The would help people using the language but would
>       still be a long term maintenance issue.
>
>     * Leave the old code in but generate warnings when old pickle
>       mechanisms are used.  Eventually the old stuff could be
>       removed from the code.
>
>     * Provide an "oldpickle" module the supports pre-3k pickles.
>
> I think I like the warnings idea best.
>
>   Neil


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Wed Oct  3 20:28:48 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Oct 2007 14:28:48 -0400
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0ld1$shp$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
Message-ID: <09EFA1D6-BF99-47A5-8C04-9C481E6DA75D@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 3, 2007, at 2:01 PM, Neil Schemenauer wrote:

> I guess the library overhaul hasn't really started it but it would
> be nice if the pickle module could get some work.  Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt.  I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.

+1.  Try reverse engineering those rules if you really want to have  
some fun. ;)

> I guess there are several different solutions:
>
>     * Remove backwards compatible stuff from the code and the
>       documentation.  The downside is that old pickles could not be
>       loaded.  Perhaps that's not a huge issue since the removal of
>       old-style classes might already break old pickles.
>
>     * Remove the backwards compatible stuff from the documentation
>       only.  The would help people using the language but would
>       still be a long term maintenance issue.
>
>     * Leave the old code in but generate warnings when old pickle
>       mechanisms are used.  Eventually the old stuff could be
>       removed from the code.
>
>     * Provide an "oldpickle" module the supports pre-3k pickles.
>
> I think I like the warnings idea best.

I'm not sure about eventually removing the code, since we may need  
long term support for migration from 2.x pickles to 3.0 pickles.   
OTOH, if 2to3 or Python 2.6+ could include pickle migration code,  
that might be fine.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRwPfYXEjvBPtnXfVAQJfSwQAnoAmgSQy99rJz4C+hks0jvKZz5X3yNOa
qV9pV9942KEVZN5lwXLtzoWAnBr9MpXTjZ9AEmDgJVScSXV4Vk/MegsS/Q8R2diG
88x1vpuXQF333CHgWnGiQYw6lysZfP5rbKEHaOYwQB4mjLTS7VSKuZdVtZvvMGH8
7HDj3GqqC0I=
=1Plz
-----END PGP SIGNATURE-----

From g.brandl at gmx.net  Wed Oct  3 20:36:19 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 03 Oct 2007 20:36:19 +0200
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0ld1$shp$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
Message-ID: <fe0nf0$4n3$1@sea.gmane.org>

Neil Schemenauer schrieb:
> I guess the library overhaul hasn't really started it but it would
> be nice if the pickle module could get some work.  Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt.  I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.
> 
> I guess there are several different solutions:
> 
>     * Remove backwards compatible stuff from the code and the
>       documentation.  The downside is that old pickles could not be
>       loaded.  Perhaps that's not a huge issue since the removal of
>       old-style classes might already break old pickles.
> 
>     * Remove the backwards compatible stuff from the documentation
>       only.  The would help people using the language but would
>       still be a long term maintenance issue.
> 
>     * Leave the old code in but generate warnings when old pickle
>       mechanisms are used.  Eventually the old stuff could be
>       removed from the code.
> 
>     * Provide an "oldpickle" module the supports pre-3k pickles.
> 
> I think I like the warnings idea best.

I'm in favor of #1, perhaps combined with #4. With the fundamental
change in basic types (unicode -> str, str -> bytes) I wouldn't
expect 2.x pickles to be loadable by 3.0 anyway.

Cruft removal from the pickle protocol is really needed; I don't envy
everyone reading the pickle docs trying to understand which method
exactly he has to implement, which is going to be called with what
arguments, etc.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From skip at pobox.com  Wed Oct  3 22:27:34 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 3 Oct 2007 15:27:34 -0500
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0nf0$4n3$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
        <fe0nf0$4n3$1@sea.gmane.org>
Message-ID: <18179.64310.424753.609880@montanaro.dyndns.org>


    Georg> I don't envy everyone reading the pickle docs trying to
    Georg> understand which method exactly he has to implement, which is
    Georg> going to be called with what arguments, etc.

Agreed.  I've been going through that (painful) exercise the past couple of
days as I try and figure out what methods my to-be-pickled objects need to
implement.  __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg,
__safe_for_unpickling__, __getnewargs__.  Your head starts to swim after
awhile.

Skip

From lists at cheimes.de  Wed Oct  3 23:52:10 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 03 Oct 2007 23:52:10 +0200
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0ld1$shp$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
Message-ID: <fe12ub$d6j$1@sea.gmane.org>

Neil Schemenauer wrote:
> I guess there are several different solutions:
> 
>     * Remove backwards compatible stuff from the code and the
>       documentation.  The downside is that old pickles could not be
>       loaded.  Perhaps that's not a huge issue since the removal of
>       old-style classes might already break old pickles.
> 
>     * Remove the backwards compatible stuff from the documentation
>       only.  The would help people using the language but would
>       still be a long term maintenance issue.
> 
>     * Leave the old code in but generate warnings when old pickle
>       mechanisms are used.  Eventually the old stuff could be
>       removed from the code.
> 
>     * Provide an "oldpickle" module the supports pre-3k pickles.
> 
> I think I like the warnings idea best.

Please keep in mind that we want people to move to Python 3.x. Pickles
are very important for a bunch of well known and large Python
applications like Zope2, Zope3, Mailman and probably many more. Zope's
ZODB makes heavy use of pickles. If you remove the support for old style
pickles from Python 2.x you also remove the migration path for a large
user base to Python 3.x.

I'd like to propose option (4b): Provide an oldpickle module which can
load old pickles and migrate an old pickle to a Python 3.x pickle. As
long as Python 3.0 can load and migrate old to new pickles I'm also for
option (1). The pickle module could use some slimming down.
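
A rough sketch of what such a migration step could look like. It assumes
the "encoding" argument that pickle.loads() provides in Python 3 for
reading 8-bit strings from 2.x pickles. The function name migrate_pickle
and the hand-written sample stream (a protocol 2 pickle of the 8-bit
string 'abc') are illustrative only:

```python
import pickle

# PROTO 2, SHORT_BINSTRING of length 3 ("abc"), BINPUT 0, STOP
OLD_PICKLE = b"\x80\x02U\x03abcq\x00."


def migrate_pickle(old_bytes, encoding="latin-1", protocol=2):
    """Load a 2.x pickle and re-dump it with the running interpreter."""
    obj = pickle.loads(old_bytes, encoding=encoding)
    return pickle.dumps(obj, protocol=protocol)
```

Whether 8-bit strings should come back as text or as bytes is an
application decision; passing encoding="bytes" keeps them as bytes.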

Christian


From greg.ewing at canterbury.ac.nz  Thu Oct  4 00:24:14 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 04 Oct 2007 10:24:14 +1200
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <18179.64310.424753.609880@montanaro.dyndns.org>
References: <fe0ld1$shp$1@sea.gmane.org> <fe0nf0$4n3$1@sea.gmane.org>
	<18179.64310.424753.609880@montanaro.dyndns.org>
Message-ID: <4704168E.3090005@canterbury.ac.nz>

skip at pobox.com wrote:
> I've been going through that (painful) exercise the past couple of
> days as I try and figure out what methods my to-be-pickled objects need to
> implement.  __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg,
> __safe_for_unpickling__, __getnewargs__.  Your head starts to swim after
> awhile.

Not all of these are old cruft -- some of them are alternatives
that are useful in one situation or another. Some of them
could no doubt be removed, though.

--
Greg

From alexandre at peadrop.com  Thu Oct  4 08:49:16 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 4 Oct 2007 02:49:16 -0400
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <fe0ld1$shp$1@sea.gmane.org>
References: <fe0ld1$shp$1@sea.gmane.org>
Message-ID: <acd65fa20710032349k2d0bdfc5h72914509024a903b@mail.gmail.com>

On 10/3/07, Neil Schemenauer <nas at arctrix.com> wrote:
> I guess the library overhaul hasn't really started it but it would
> be nice if the pickle module could get some work.  Today I'm trying
> to efficiently store a class using pickle

Could you elaborate on what you are trying to do?

> and the documentation is making my head hurt.  I don't think the
> documentation itself is the problem, just the fact that the rules
> are so complicated.
>
> I guess there are several different solutions:
>
>     * Remove backwards compatible stuff from the code and the
>       documentation.  The downside is that old pickles could not be
>       loaded.  Perhaps that's not a huge issue since the removal of
>       old-style classes might already break old pickles.
>

This would not simplify the pickle module by much. So, I don't think
this would justify breaking backward-compatibility.

As far as I know, the removal of the old-style classes does not break
old pickle streams, since the code of classes is not pickled but
referenced.

>     * Remove the backwards compatible stuff from the documentation
>       only.  The would help people using the language but would
>       still be a long term maintenance issue.

The documentation for the pickle module is completely outdated and
confusing. In fact, some sections are outright wrong about how the
current module works. If I get some free time (which is unlikely,
right now), I will update the documentation.

>     * Leave the old code in but generate warnings when old pickle
>       mechanisms are used.  Eventually the old stuff could be
>       removed from the code.

Could you point out specific examples of the "old code" that you are referring to?

>     * Provide an "oldpickle" module the supports pre-3k pickles.

As I said, old pickle streams should work fine with Py3k. So, adding
yet another pickle module is unnecessary.

-- Alexandre

From nas at arctrix.com  Fri Oct  5 06:59:30 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Thu, 4 Oct 2007 22:59:30 -0600
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <acd65fa20710032349k2d0bdfc5h72914509024a903b@mail.gmail.com>
References: <fe0ld1$shp$1@sea.gmane.org>
	<acd65fa20710032349k2d0bdfc5h72914509024a903b@mail.gmail.com>
Message-ID: <20071005045930.GA20564@arctrix.com>

On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote:
> Could you elaborate on what you are trying to do?

I'm trying to efficiently pickle a 'unicode' subclass.  I'm
disappointed that it's not possible to be as efficient as the
built-in unicode class, even when using an extension code.

> The documentation for the pickle module is completely outdated and
> confusing. In fact, some sections are outright wrong about how the
> current module works. If I get some free time (which is unlikely,
> right now), I will update the documentation.

Yes, I've changed my mind and agree.  PEP 307 provides a lot of
details that library docs do not but it's not written as a reference
doc.  Improved library docs would help a lot.

> >     * Leave the old code in but generate warnings when old pickle
> >       mechanisms are used.  Eventually the old stuff could be
> >       removed from the code.
> 
> Could point out specific examples of the "old code" that you are referring to?

I don't have time right now to point at specific code.  How about
the code that implements all the different versions of __reduce__
and code for __getinitargs__, __getstate__, __setstate__? 

In any case, it looks like there will be volunteers to maintain the
backwards compatibility of the pickle module.  That's great.

  Neil

From mark at qtrac.eu  Fri Oct  5 09:20:39 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Fri, 5 Oct 2007 08:20:39 +0100
Subject: [Python-3000] Small renaming suggestion: re.sub() -> re.replace()
	or re.substitute()
Message-ID: <200710050820.39238.mark@qtrac.eu>

Hi,

It seems to me that one of the few really "bad" method names in the
Python library that I regularly encounter is re.sub().

I don't like the name because:
(1) It is an abbreviation, but not an "obvious" one like max and min
(2) It is an ambiguous name: could be substitute or could be subtract
(3) Elsewhere, where a special method __foo__ implements a named (as
    opposed to symbol-based) function, that function is called foo. For
    example, __cmp__() -> cmp(), __int__() -> int(), __len__() -> len().
    But __add__() -> +, __sub__() -> -.
(4) It is the only function with this name in the library; whereas there
    are several replace methods:
	bytes.replace()
	str.replace()
	datetime.date.replace()
	# and a few others, plus some replace_* functions.

Although re.substitute() would work (and be better than sub), I think
re.replace() is better and more consistent regarding the rest of the
library.

And as for subn, well, replacen or substituten are possible, but why not
have just one method and have an optional keyword argument if a tuple is
wanted?
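
The single-function idea could be sketched as a thin wrapper over the
existing re.subn(). The name replace() and the with_count keyword are
hypothetical and not part of the re module:

```python
import re


def replace(pattern, repl, string, count=0, with_count=False):
    """Like re.sub(), or like re.subn() when with_count is true."""
    new_string, number = re.subn(pattern, repl, string, count=count)
    if with_count:
        return new_string, number
    return new_string
```

A caller who only wants the text ignores the keyword; one who also wants
the number of substitutions passes with_count=True and unpacks a tuple.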

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From facundobatista at gmail.com  Fri Oct  5 12:45:54 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Fri, 5 Oct 2007 07:45:54 -0300
Subject: [Python-3000] Small renaming suggestion: re.sub() ->
	re.replace() or re.substitute()
In-Reply-To: <200710050820.39238.mark@qtrac.eu>
References: <200710050820.39238.mark@qtrac.eu>
Message-ID: <e04bdf310710050345ka63fafax6640eecb88fa3d2f@mail.gmail.com>

2007/10/5, Mark Summerfield <mark at qtrac.eu>:

> Although re.substitute() would work (and be better than sub), I think
> re.replace() is better and more consistent regarding the rest of the
> library.

+1, happened twice to me, different jobs, that a colleague came to me
asking why there was no "replace" in "re".

Yes, sub() is even difficult to find (unless you *read* all the
descriptions of the methods).

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From alexandre at peadrop.com  Sat Oct  6 06:35:39 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sat, 6 Oct 2007 00:35:39 -0400
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <20071005045930.GA20564@arctrix.com>
References: <fe0ld1$shp$1@sea.gmane.org>
	<acd65fa20710032349k2d0bdfc5h72914509024a903b@mail.gmail.com>
	<20071005045930.GA20564@arctrix.com>
Message-ID: <acd65fa20710052135y18d2c6bcm17102c7851808380@mail.gmail.com>

On 10/5/07, Neil Schemenauer <nas at arctrix.com> wrote:
> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote:
> > Could you elaborate on what you are trying to do?
>
> I'm trying to efficiently pickle a 'unicode' subclass.  I'm
> disappointed that it's not possible to be as efficient as the
> built-in unicode class, even when using an extension code.

There are a few things you could do to produce smaller pickle streams.
If you are certain that the objects you will pickle are not
self-referential, then you can set Pickler.fast to True. This will
disable the "memoizer", which adds a 2-byte overhead to each object
pickled (depending on the input, this might or might not shorten the
resulting stream). If this isn't enough, then you could subclass
Pickler and Unpickler and define a custom rule for your unicode
subclass.
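
A minimal sketch of the Pickler.fast suggestion. The helper name
dumps_fast is made up, and the choice of protocol 2 is an assumption:

```python
import io
import pickle


def dumps_fast(obj, protocol=2):
    """Pickle obj with the memo disabled; obj must not reference itself."""
    stream = io.BytesIO()
    pickler = pickle.Pickler(stream, protocol)
    pickler.fast = True   # skip memo bookkeeping for every object
    pickler.dump(obj)
    return stream.getvalue()
```

With the memo disabled, an object that appears several times in the
input is written out several times, so the stream can also grow. Fast
mode is not safe for self-referential data.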

An obvious optimization for pickle, in Py3k, would be to add support for
short unicode strings. Currently, there is a 4-byte overhead per
string. Since Py3k is unicode throughout, this overhead can become
quite large.
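
The overhead is visible in the byte stream. In protocol 2 a unicode
string is written as the BINUNICODE opcode ("X") followed by a 4-byte
little-endian length. The short-string opcode that pickle protocol 4
later added uses a 1-byte length instead. A quick check, assuming a
Python version that supports protocol 4:

```python
import pickle

p2 = pickle.dumps("a", protocol=2)
p4 = pickle.dumps("a", protocol=4)

# "X" + 4-byte length (1, little-endian) + the UTF-8 data
has_4_byte_length = b"X\x01\x00\x00\x00a" in p2

# SHORT_BINUNICODE (0x8c) + 1-byte length + the UTF-8 data
has_1_byte_length = b"\x8c\x01a" in p4
```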

> > Could point out specific examples of the "old code" that you are referring to?
>
> I don't have time right now to point at specific code.  How about
> the code that implements all the different versions of __reduce__
> and code for __getinitargs__, __getstate__, __setstate__?

At first glance, __reduce__ seems to be useful only for instances of
subclasses of built-in types. However, __getnewargs__ could easily
replace it for that. So, removing __reduce__ (and __reduce_ex__) is
probably a good idea.
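
As an illustration of the __getnewargs__ route for a subclass of a
built-in type, here is a sketch with a hypothetical str subclass named
Tagged. It assumes protocol 2, where the arguments returned by
__getnewargs__ are passed to __new__ on unpickling:

```python
import pickle


class Tagged(str):
    """A str subclass that carries an extra 'tag' attribute."""

    def __new__(cls, value, tag=""):
        self = super().__new__(cls, value)
        self.tag = tag
        return self

    def __getnewargs__(self):
        # These become the arguments to Tagged.__new__() when unpickling.
        return (str(self), self.tag)


def roundtrip(obj, protocol=2):
    """Pickle and immediately unpickle obj."""
    return pickle.loads(pickle.dumps(obj, protocol=protocol))
```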

As far as I know, the current pickle module doesn't use
__getinitargs__ (this is one of the things the documentation is
totally wrong about).

As for __getstate__ and __setstate__, I think they are essential.
Without them, you couldn't pickle objects with __slots__ or save the
I/O state of certain objects.
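
A short sketch of a slotted class that supplies its own state methods.
The class Point is hypothetical, and the tuple layout of the state is an
arbitrary choice:

```python
import pickle


class Point:
    __slots__ = ("x", "y")   # no per-instance __dict__ to pickle

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Return whatever is needed to rebuild the instance.
        return (self.x, self.y)

    def __setstate__(self, state):
        # Called on the new, uninitialized instance when unpickling.
        self.x, self.y = state


def roundtrip(obj, protocol=2):
    """Pickle and immediately unpickle obj."""
    return pickle.loads(pickle.dumps(obj, protocol=protocol))
```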

It would certainly be possible to simplify the algorithm used for
pickling class instances a little. In "pseudo-code", it would look
something along these lines:

    from pickle import UnpicklingError

    def save_obj(obj):
        # let obj be the instance of a user-defined class
        cls = obj.__class__
        # arguments to pass to cls.__new__() when the object is rebuilt
        if hasattr(obj, "__getnewargs__"):
            args = obj.__getnewargs__()
        else:
            args = ()
        # the state to restore; defaults to the instance dictionary
        if hasattr(obj, "__getstate__"):
            state = obj.__getstate__()
        else:
            state = obj.__dict__
        return (cls, args, state)

    def load_obj(cls, args, state):
        obj = cls.__new__(cls, *args)
        # a custom __setstate__ takes precedence over updating __dict__
        if hasattr(obj, "__setstate__"):
            try:
                obj.__setstate__(state)
            except AttributeError:
                raise UnpicklingError
        else:
            obj.__dict__.update(state)
        return obj

The main difference between this and the current method used to pickle
instances is the use of __getnewargs__ instead of __reduce__.

-- Alexandre

From guido at python.org  Mon Oct  8 06:32:59 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Oct 2007 21:32:59 -0700
Subject: [Python-3000] PEP 3137 plan of attack
Message-ID: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>

I'd like to make complete implementation of PEP 3137 the goal for the
3.0a2 release. It should be doable to do this release by the end of
October. I don't think anything else *needs* to be done to have a
successful a2 release.

The work for PEP 3137 can be split into a number of relatively
independent steps. In some cases these can even be carried out in
either order. I'd love to see volunteers for each of these steps.

Note: I'll refer to the three string types by their C names, as I plan
to keep those unchanged in 3.0a2. We can rename them later, but
renaming them will make merging from the trunk and converting 3rd
party extensions harder. The C names are PyString (immutable bytes),
PyBytes (mutable bytes), PyUnicode (immutable unicode code units,
either 16 bits or 32 bits).

The tasks I can think of are:

- remove locale support from PyString
- remove compatibility with PyUnicode from PyString
- remove compatibility with PyString from PyUnicode
- add missing methods to PyBytes (for list, see the PEP and compare to
what's already there)
- remove buffer API from PyUnicode
- make == and != between PyBytes and PyUnicode return False instead of
raising TypeError
- make == and != between PyString and PyUnicode return False instead
of converting
- make comparisons between PyString and PyBytes work (these are
properly ordered)
- change lots of places (e.g. encoders) to return PyString instead of PyBytes
- change indexing and iteration over PyString to return ints, not
1-char PyStrings
- change PyString's repr() to return "b'...'"
- change PyBytes's repr() to return "buffer(b'...')"
- change parser so that b"..." returns PyString, not PyBytes
- rename bytes -> buffer, str8 -> bytes
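
For reference, several of the planned behaviours can be written down as
checks. This sketch assumes an interpreter in which the plan has been
carried out, with the mutable type spelled bytearray (the name it
eventually received) rather than buffer, and run without the -b option.
The function name and dictionary keys are made up:

```python
def pep3137_checks():
    """Collect observable results of the behaviours listed in the plan."""
    immutable = b"abc"              # b"..." literal: immutable bytes
    mutable = bytearray(b"abc")     # the mutable counterpart
    return {
        "bytes_vs_str_equal": immutable == "abc",
        "buffer_vs_str_equal": mutable == "abc",
        "bytes_vs_buffer_equal": immutable == mutable,
        "bytes_vs_buffer_ordered": immutable < bytearray(b"abd"),
        "indexing_gives_int": immutable[0],
        "iteration_gives_ints": list(immutable),
        "bytes_repr": repr(immutable),
        "buffer_repr": repr(mutable),
    }
```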

If a task is done independently from the others, it should include
changes to keep the unit tests working.

If you volunteer, please send out an email to this list before you
start doing any work, to avoid duplicate work (unless sending the
email would take more time than it would take to write the code,
compile it, run all unit tests, and upload the patch). I'd appreciate
it if you gave an estimate for when you expect to be done (or give up)
too. For code submissions, please use bugs.python.org and send an
email pointing to the relevant issue to this list.

PS. Is there anyone who understands test_urllib2net and can fix it?
It's been failing for weeks (maybe months) now.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tom at vector-seven.com  Mon Oct  8 07:03:37 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Mon, 08 Oct 2007 15:03:37 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <4709BA29.3060503@vector-seven.com>

Guido van Rossum wrote:
> - make == and != between PyBytes and PyUnicode return False instead of
> raising TypeError
> - make == and != between PyString and Pyunicode return False instead
> of converting
> - make comparisons between PyString and PyBytes work (these are
> properly ordered)
>   
If nobody else is doing this, it sounds like something I -
as a relative newbie - could handle. Possibly the repr() stuff too if
nobody else wants that. Should be able to get a patch up before Friday.

Cheers,
Tom


From lists at cheimes.de  Mon Oct  8 13:17:55 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 08 Oct 2007 13:17:55 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <fed3l3$q7t$1@sea.gmane.org>

Guido van Rossum wrote:
> - change PyString's repr() to return "b'...'"
> - change PyBytes's repr() to return "buffer(b'...')"
> - change parser so that b"..." returns PyString, not PyBytes

I'll take these three steps. They sound like low-hanging fruit even for a
noob like me. I expect to have a working patch in the next couple of days.

Christian


From greg at krypto.org  Mon Oct  8 18:32:31 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 8 Oct 2007 09:32:31 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>

> - add missing methods to PyBytes (for list, see the PEP and compare to
> what's already there)
> - remove buffer API from PyUnicode


I'll take these two with a goal of having them done by the end of the week.

-gps

From janssen at parc.com  Mon Oct  8 19:51:23 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 8 Oct 2007 10:51:23 PDT
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com> 
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <07Oct8.105132pdt."57996"@synergy1.parc.xerox.com>

I think I can spend some time on the 3K SSL support, but I've been
waiting till the "bytes" work settles down.  Sounds like I should
keep waiting a bit more?  Or have the C APIs already settled?

Bill

From guido at python.org  Mon Oct  8 20:42:09 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 11:42:09 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <7125419022533265919@unknownmsgid>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<7125419022533265919@unknownmsgid>
Message-ID: <ca471dc20710081142x674105efpa897df1cc21e37b@mail.gmail.com>

On 10/8/07, Bill Janssen <janssen at parc.com> wrote:
> I think I can spend some time on the 3K SSL support, but I've been
> waiting till the "bytes" work settles down.  Sounds like I should
> keep waiting a bit more?  Or have the C APIs already settled?

The C APIs haven't quite settled down yet, but I'd like to convince
you that you needn't wait. For all bytes input, you should use the
(new) buffer API, i.e. PyObject_GetBuffer() and
PyObject_ReleaseBuffer() (grep for usage examples if they aren't
sufficiently documented in the docs or in PEP 3118). For stuff that
returns bytes, you can either use PyBytes_FromStringAndSize() (which
is the 3.0a1 recommended best practice, returning a mutable bytes
object) or PyString_FromStringAndSize() (which will be the 3.0a2 way
of returning an immutable bytes object). Since they have the same
signature there's very little to worry about having to change this
around later.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Mon Oct  8 21:50:02 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Oct 2007 12:50:02 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>

On 10/7/07, Guido van Rossum <guido at python.org> wrote:
[SNIP]
> PS. Is there anyone who understands test_urllib2net and can fix it?
> It's been failing for weeks (maybe months) now.

I don't understand it but I fixed it in r58378.  =)

When ftplib.FTP was converted over to Py3K it was given a default
encoding of ASCII on all read data, but that doesn't work as the stuff
on the other end could be latin1 (and it was).  So I just changed the
default encoding.
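
A minimal sketch of the failure mode, assuming a made-up server reply
(this is not the actual ftplib change from r58378):

```python
# Hypothetical server greeting containing a Latin-1 encoded e-acute (0xE9).
raw = b'220 Bienvenue sur le serveur de d\xe9mo\r\n'

try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    # Any byte >= 0x80 is outside ASCII, so strict decoding fails.
    ascii_ok = False

# Latin-1 maps every byte value 0-255 to a code point, so it never fails,
# although the text is only right if the sender really used Latin-1.
text = raw.decode('latin-1')
print(ascii_ok, ascii(text.strip()))
```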

-Brett

From guido at python.org  Mon Oct  8 21:51:59 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 12:51:59 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
Message-ID: <ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>

On 10/8/07, Brett Cannon <brett at python.org> wrote:
> On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> [SNIP]
> > PS. Is there anyone who understands test_urllib2net and can fix it?
> > It's been failing for weeks (maybe months) now.
>
> I don't understand it but I fixed it in r58378.  =)
>
> When ftplib.FTP was converted over to Py3K it was given a default
> encoding of ASCII on all read data, but that doesn't work as the stuff
> on the other end could be latin1 (and it was).  So I just changed the
> default encoding.

Cool. Though how do you know it was really latin1? Is there anything
standardized about the encoding used by FTP?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct  8 22:03:35 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 13:03:35 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <ca471dc20710081303i57620cdfk332b90cd24bf320a@mail.gmail.com>

On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> - remove locale support from PyString
> - remove compatibility with PyUnicode from PyString
> - remove compatibility with PyString from PyUnicode

I'll tackle these myself by Friday, unless someone else beats me to it.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Mon Oct  8 22:05:31 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Oct 2007 13:05:31 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
	<ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
Message-ID: <bbaeab100710081305r7bda323bka1a2c5f4575f8ff0@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> On 10/8/07, Brett Cannon <brett at python.org> wrote:
> > On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > [SNIP]
> > > PS. Is there anyone who understands test_urllib2net and can fix it?
> > > It's been failing for weeks (maybe months) now.
> >
> > I don't understand it but I fixed it in r58378.  =)
> >
> > When ftplib.FTP was converted over to Py3K it was given a default
> > encoding of ASCII on all read data, but that doesn't work as the stuff
> > on the other end could be latin1 (and it was).  So I just changed the
> > default encoding.
>
> Cool. Though how do you know it was really latin1? Is there anything
> standardized about the encoding used by FTP?

See, now I had to go and look stuff up.  So much work for a holiday.  =)

According to the spec, data transfers can be anything based on data
transfer format specified.  ASCII is one of them, but so is Local
which can be anything.

Turns out that ftplib.FTP.connect() reads from the socket using
socket.makefile('r', encoding), so it starts off in text mode.  So
that makes restricting the encoding to bytes < 128 a bad thing as not
all possible data transfers would be legal.

Basically it sounds like the ftplib module might need a thorough
rewrite to use bytes/buffers so that the proper decoding happens at
the last second.  But I am not the person to do that rewrite.  =)
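
One possible shape for "decode at the last second", as a sketch only:
the function name read_reply_line and the sample reply are invented,
and io.BytesIO stands in for a socket file opened in binary mode.

```python
import io


def read_reply_line(stream, encoding='latin-1'):
    """Read one line as bytes and decode it only when returning."""
    line = stream.readline()        # still bytes at this point
    if line.endswith(b'\r\n'):      # protocol framing is handled on bytes
        line = line[:-2]
    return line.decode(encoding)    # the only place text is produced


# Stand-in for a binary socket file; the reply content is invented.
fake_socket_file = io.BytesIO(b'200 d\xe9mo\r\n')
print(ascii(read_reply_line(fake_socket_file)))
```

Keeping the encoding as a parameter of the last step means the caller,
who may know what the server speaks, gets to choose it.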

-Brett

From guido at python.org  Mon Oct  8 22:08:01 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 13:08:01 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <bbaeab100710081305r7bda323bka1a2c5f4575f8ff0@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
	<ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
	<bbaeab100710081305r7bda323bka1a2c5f4575f8ff0@mail.gmail.com>
Message-ID: <ca471dc20710081308y50be49bfia2a6ccdf8c2d3bcf@mail.gmail.com>

On 10/8/07, Brett Cannon <brett at python.org> wrote:
> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> > On 10/8/07, Brett Cannon <brett at python.org> wrote:
> > > On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > > [SNIP]
> > > > PS. Is there anyone who understands test_urllib2net and can fix it?
> > > > It's been failing for weeks (maybe months) now.
> > >
> > > I don't understand it but I fixed it in r58378.  =)
> > >
> > > When ftplib.FTP was converted over to Py3K it was given a default
> > > encoding of ASCII on all read data, but that doesn't work as the stuff
> > > on the other end could be latin1 (and it was).  So I just changed the
> > > default encoding.
> >
> > Cool. Though how do you know it was really latin1? Is there anything
> > standardized about the encoding used by FTP?
>
> See, now I had to go and look stuff up.  So much work for a holiday.  =)
>
> According to the spec, data transfers can be anything based on data
> transfer format specified.  ASCII is one of them, but so is Local
> which can be anything.
>
> Turns out that ftplib.FTP.connect() reads from the socket using
> socket.makefile('r', encoding), so it starts off in text mode.  So
> that makes restricting the encoding to bytes < 128 a bad thing as not
> all possible data transfers would be legal.
>
> Basically it sounds like the ftplib module might need a thorough
> rewrite to use bytes/buffers so that the proper decoding happens at
> the last second.  But I am not the person to do that rewrite.  =)

Thanks. Mind filing a bug for someone to find? It sounds like the
rewrite might be easier once we have immutable bytes. (So this
conversation is not entirely off-topic for this thread. ;-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Mon Oct  8 22:12:22 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Oct 2007 13:12:22 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710081308y50be49bfia2a6ccdf8c2d3bcf@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
	<ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
	<bbaeab100710081305r7bda323bka1a2c5f4575f8ff0@mail.gmail.com>
	<ca471dc20710081308y50be49bfia2a6ccdf8c2d3bcf@mail.gmail.com>
Message-ID: <bbaeab100710081312o15408210l4b6d99225c53967f@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> On 10/8/07, Brett Cannon <brett at python.org> wrote:
> > On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> > > On 10/8/07, Brett Cannon <brett at python.org> wrote:
> > > > On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > > > [SNIP]
> > > > > PS. Is there anyone who understands test_urllib2net and can fix it?
> > > > > It's been failing for weeks (maybe months) now.
> > > >
> > > > I don't understand it but I fixed it in r58378.  =)
> > > >
> > > > When ftplib.FTP was converted over to Py3K it was given a default
> > > > encoding of ASCII on all read data, but that doesn't work as the stuff
> > > > on the other end could be latin1 (and it was).  So I just changed the
> > > > default encoding.
> > >
> > > Cool. Though how do you know it was really latin1? Is there anything
> > > standardized about the encoding used by FTP?
> >
> > See, now I had to go and look stuff up.  So much work for a holiday.  =)
> >
> > According to the spec, data transfers can be anything based on data
> > transfer format specified.  ASCII is one of them, but so is Local
> > which can be anything.
> >
> > Turns out that ftplib.FTP.connect() reads from the socket using
> > socket.makefile('r', encoding), so it starts off in text mode.  So
> > that makes restricting the encoding to bytes < 128 a bad thing as not
> > all possible data transfers would be legal.
> >
> > Basically it sounds like the ftplib module might need a thorough
> > rewrite to use bytes/buffers so that the proper decoding happens at
> > the last second.  But I am not the person to do that rewrite.  =)
>
> Thanks. Mind filing a bug for someone to find? It sounds like the
> rewrite might be easier once we have immutable bytes. (So this
> conversation is not entirely off-topic for this thread. ;-)

Created issue1248.

-Brett

From nnorwitz at gmail.com  Mon Oct  8 22:13:29 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Mon, 8 Oct 2007 13:13:29 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710081303i57620cdfk332b90cd24bf320a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710081303i57620cdfk332b90cd24bf320a@mail.gmail.com>
Message-ID: <ee2a432c0710081313i2a1ecef8k5650080c2dc4ba5c@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > - remove locale support from PyString
> > - remove compatibility with PyUnicode from PyString
> > - remove compatibility with PyString from PyUnicode
>
> I'll tackle these myself by Friday, unless someone else beats me to it.

I experimented a bit with removing some of the delegation to PyUnicode
in stringobject.c.  I ran into many problems starting the interpreter
or printing things out (fatal errors or exceptions).  It seems we
still are using str8 in a bunch of places that need to be converted to
Unicode.  I think that will make it easier to rip out the
dependencies.

If I have time, I'll probably focus on converting more uses of
PyString to PyUnicode.  These need to be done anyways and will
probably make other changes easier.

n

From phd at phd.pp.ru  Mon Oct  8 22:00:15 2007
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 9 Oct 2007 00:00:15 +0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<bbaeab100710081250v23598e65i76eb3d64129bc35a@mail.gmail.com>
	<ca471dc20710081251h368cf917mfbb29cd5e214a852@mail.gmail.com>
Message-ID: <20071008200015.GA3316@phd.pp.ru>

On Mon, Oct 08, 2007 at 12:51:59PM -0700, Guido van Rossum wrote:
> Cool. Though how do you know it was really latin1? Is there anything
> standardized about the encoding used by FTP?

   There is none. Russian users, e.g., use all encodings - koi8-r, cp1251,
utf-8; cp1251 is the most popular here, of course.
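
To illustrate, a small sketch with an arbitrary sample word (written
with escapes so the snippet itself stays ASCII):

```python
word = '\u0444\u0430\u0439\u043b'   # Russian for "file"

# The same text becomes three different byte strings.
encoded = {name: word.encode(name) for name in ('koi8-r', 'cp1251', 'utf-8')}
for name, data in encoded.items():
    print(name, data)

# Decoding with the wrong guess does not fail here; it silently yields
# other letters, so no single default can suit every server.
wrong = encoded['cp1251'].decode('koi8-r')
print(wrong == word)
```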

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From alexandre at peadrop.com  Tue Oct  9 00:05:36 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 8 Oct 2007 18:05:36 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> - remove buffer API from PyUnicode
> - change PyString's repr() to return "b'...'"
> - change PyBytes's repr() to return "buffer(b'...')"

I got patches for these. I plan to submit them for review after doing
more testing to make sure they work right.


-- Alexandre

From guido at python.org  Tue Oct  9 00:36:05 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 15:36:05 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
Message-ID: <ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>

Cool. Just notice that you haven't been following protocol --
Christian Heimes volunteered to do these too. :-)

On 10/8/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> > - remove buffer API from PyUnicode
> > - change PyString's repr() to return "b'...'"
> > - change PyBytes's repr() to return "buffer(b'...')"
>
> I got patches for these. I plan to submit them for review after doing
> more testing to make sure they work right.
>
>
> -- Alexandre
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com  Tue Oct  9 00:45:17 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 8 Oct 2007 18:45:17 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
Message-ID: <acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> Cool. Just notice that you haven't been following protocol --
> Christian Heimes volunteered to do these too. :-)

Oops, sorry Christian for taking yours.

-- Alexandre

From brett at python.org  Tue Oct  9 01:19:34 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Oct 2007 16:19:34 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
Message-ID: <bbaeab100710081619l250fb0dwe7e20fe1104a751e@mail.gmail.com>

On 10/8/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> > Cool. Just notice that you haven't been following protocol --
> > Christian Heimes volunteered to do these too. :-)
>
> Oops, sorry Christian for taking yours.

See http://bugs.python.org/issue1247 for Christian's patch.  Maybe you
can do a code review of Christian's work, Alexandre?  And if you want
to be really brave you could maybe even do the commit yourself.  =)

-Brett

From alexandre at peadrop.com  Tue Oct  9 00:56:41 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 8 Oct 2007 18:56:41 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <acd65fa20710081556i514ce973i3b00cb8e561ef08@mail.gmail.com>

On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> - change indexing and iteration over PyString to return ints, not
> 1-char PyStrings

I will try to do this one.

-- Alexandre

From lists at cheimes.de  Tue Oct  9 00:57:49 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 09 Oct 2007 00:57:49 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
Message-ID: <feecle$m0e$1@sea.gmane.org>

Alexandre Vassalotti wrote:
> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
>> Cool. Just notice that you haven't been following protocol --
>> Christian Heimes volunteered to do these too. :-)
> 
> Oops, sorry Christian for taking yours.

I submitted my patch a few hours ago. I wasn't able to test it to its
full extent because the svn server was down and I couldn't get the
latest update.

I noticed that PyBytes doesn't have an iteration view like PyString. Do
we need a view for it?

Christian


From lists at cheimes.de  Tue Oct  9 01:29:31 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 09 Oct 2007 01:29:31 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <bbaeab100710081619l250fb0dwe7e20fe1104a751e@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>	
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>	
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
	<bbaeab100710081619l250fb0dwe7e20fe1104a751e@mail.gmail.com>
Message-ID: <470ABD5B.3060601@cheimes.de>

Brett Cannon wrote:
> See http://bugs.python.org/issue1247 for Christian's patch.  Maybe you
> can do a code review of Christian's work, Alexandre?  And if you want
> to be really brave you could maybe even do the commit yourself.  =)

I'm not happy with:

    static const char *quote_prefix = "buffer(b'";
    p = PyUnicode_AS_UNICODE(v);
    for (i=0; i<strlen(quote_prefix); i++) {
        *p++ = quote_prefix[i];
    }

but I didn't know how to code it more elegantly. It follows the previous
version of the code and it's the fastest way I can think of without
messing around with unicode. strncpy/memcpy doesn't work for obvious
reasons. :/

Christian

From guido at python.org  Tue Oct  9 01:30:27 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2007 16:30:27 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <feecle$m0e$1@sea.gmane.org>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
	<feecle$m0e$1@sea.gmane.org>
Message-ID: <ca471dc20710081630x5bdc0a44vac546fe762b228df@mail.gmail.com>

On 10/8/07, Christian Heimes <lists at cheimes.de> wrote:
> Alexandre Vassalotti wrote:
> > On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> >> Cool. Just notice that you haven't been following protocol --
> >> Christian Heimes volunteered to do these too. :-)
> >
> > Oops, sorry Christian for taking yours.
>
> I've submitted my patch a few hours ago. I wasn't able to test it to
> full extend because the svn server was down and I couldn't get the
> latest update.

Now we'll have competing patches. Can you two please review each
other's so I won't have to review two? Anyway, anonymous svn should be
working again.

> I noticed that PyBytes doesn't have an iteration view like PyString. Do
> we need a view for it?

Yes, that would be a good idea! This currently causes a bit of a
problem for the Sequence ABC.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com  Tue Oct  9 01:55:13 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 8 Oct 2007 19:55:13 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470ABD5B.3060601@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
	<bbaeab100710081619l250fb0dwe7e20fe1104a751e@mail.gmail.com>
	<470ABD5B.3060601@cheimes.de>
Message-ID: <acd65fa20710081655v7efb8c27je6e700a512a535d0@mail.gmail.com>

Ah! In my review, I was going to suggest this:

   while (*quote_prefix)
      *p++ = *quote_prefix++;

-- Alexandre

On 10/8/07, Christian Heimes <lists at cheimes.de> wrote:
> I'm not happy with:
>
>     static const char *quote_prefix = "buffer(b'";
>     p = PyUnicode_AS_UNICODE(v);
>     for (i=0; i<strlen(quote_prefix); i++) {
>         *p++ = quote_prefix[i];
>     }
>
> but I didn't know how to code it more elegant. It follows the previous
> version of the code and it's the fastest way I can think of without

From qrczak at knm.org.pl  Tue Oct  9 02:02:26 2007
From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk)
Date: Tue, 09 Oct 2007 02:02:26 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470ABD5B.3060601@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081505y6ea93efdsb92606cf2d525db7@mail.gmail.com>
	<ca471dc20710081536r51203eeahaf57ceedd1a01b22@mail.gmail.com>
	<acd65fa20710081545i5c74c5e6h2d760d3d37c18b17@mail.gmail.com>
	<bbaeab100710081619l250fb0dwe7e20fe1104a751e@mail.gmail.com>
	<470ABD5B.3060601@cheimes.de>
Message-ID: <1191888146.15402.5.camel@qrnik>

On Tue, 09-10-2007 at 01:29 +0200, Christian Heimes wrote:

> I'm not happy with:
> 
>     static const char *quote_prefix = "buffer(b'";
>     p = PyUnicode_AS_UNICODE(v);
>     for (i=0; i<strlen(quote_prefix); i++) {
>         *p++ = quote_prefix[i];
>     }

strlen in a loop is bad for performance.

I would do:

   static const Py_UNICODE quote_prefix[] = {
      'b', 'u', 'f', 'f', 'e', 'r', '(', 'b', '\''
   };

and memcpy.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From tom at vector-seven.com  Tue Oct  9 15:31:16 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Tue, 09 Oct 2007 23:31:16 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <4709BA29.3060503@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
Message-ID: <470B82A4.8030703@vector-seven.com>

Thomas Lee wrote:
> Guido van Rossum wrote:
>   
>> - make == and != between PyBytes and PyUnicode return False instead of
>> raising TypeError
>>     
A patch for this is ready. I'll submit it to the bug tracker later tonight.
>> - make == and != between PyString and PyUnicode return False instead
>> of converting
>>     
This will be trivial, but I need to ask a stupid question: is this also 
true for PyUnicode_Compare? (i.e. should PyUnicode_Compare(str8(), 
str()) != 0 ?)

And, if so, what should PyUnicode_Compare actually return if one of the 
parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for 
PyUnicode on the right?
>> - make comparisons between PyString and PyBytes work (these are
>> properly ordered)
>>   
>>     
Is it just me, or do string/bytes comparisons already work?

 >>> s = str8('test')
 >>> b = b'test'
 >>> s == b
True
 >>> b == s
True
 >>> s != b
False
 >>> b != s
False

Cheers,
Tom

From tom at vector-seven.com  Tue Oct  9 15:39:59 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Tue, 09 Oct 2007 23:39:59 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470B82A4.8030703@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
Message-ID: <470B84AF.4060704@vector-seven.com>

Thomas Lee wrote:
> Thomas Lee wrote:
>   
>> Guido van Rossum wrote:
>>   
>>     
>>> - make == and != between PyBytes and PyUnicode return False instead of
>>> raising TypeError
>>>     
>>>       
> A patch for this is ready. I'll submit it to the bug tracker later tonight.
>   
This patch is now up:
http://bugs.python.org/issue1249

Cheers,
Tom

From guido at python.org  Tue Oct  9 17:01:02 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Oct 2007 08:01:02 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470B82A4.8030703@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
Message-ID: <ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>

On 10/9/07, Thomas Lee <tom at vector-seven.com> wrote:
> Thomas Lee wrote:
> > Guido van Rossum wrote:
> >
> >> - make == and != between PyBytes and PyUnicode return False instead of
> >> raising TypeError
> >>
> A patch for this is ready. I'll submit it to the bug tracker later tonight.
> >> - make == and != between PyString and PyUnicode return False instead
> >> of converting
> >>
> This will be trivial, but I need to ask a stupid question: is this also
> true for PyUnicode_Compare? (i.e. should PyUnicode_Compare(str8(),
> str()) != 0 ?)
>
> And, if so, what should PyUnicode_Compare actually return if one of the
> parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for
> PyUnicode on the right?

Assuming that PyUnicode_Compare is a three-way comparison (less,
equal, more), it should raise a TypeError when one of the arguments is
a PyString or PyBytes.
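
At the Python level, the same rule shows up in released Python 3 as
follows (a sketch; try_less_than is a made-up helper, and this is not
the PyUnicode_Compare patch itself):

```python
def try_less_than(left, right):
    """Return left < right, or the exception type if ordering is refused."""
    try:
        return left < right
    except TypeError as exc:
        return type(exc)


print(try_less_than('a', 'b'))    # str vs str orders normally
print(try_less_than('a', b'b'))   # str vs bytes has no defined order
```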

> >> - make comparisons between PyString and PyBytes work (these are
> >> properly ordered)
> >>
> >>
> Is it just me, or do string/bytes comparisons already work?
>
>  >>> s = str8('test')
>  >>> b = b'test'
>  >>> s == b
> True
>  >>> b == s
> True
>  >>> s != b
> False
>  >>> b != s
> False

Seems it's already so. Do they order properly too? (< <= > >=)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct  9 17:56:50 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Oct 2007 08:56:50 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470BA418.5060301@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
	<470BA418.5060301@vector-seven.com>
Message-ID: <ca471dc20710090856v128555abl225a40de8f691f13@mail.gmail.com>

On 10/9/07, Thomas Lee <tom at vector-seven.com> wrote:
> Guido van Rossum wrote:
> >>>
> >>>> - make == and != between PyBytes and PyUnicode return False instead of
> >>>> raising TypeError
> >>>>
> >>>>
> Just thinking about it I'm pretty sure my initial patch is wrong -
> forgive my ignorance. To remove the ambiguity, is it fair to state the
> following?
>
> bytes() == str() -> False instead of raising TypeError
> bytes() != str() -> True instead of raising TypeError

Correct.

> I initially read that as "return False whenever any comparison between
> bytes and unicode objects is attempted" ...

The point is that a bytes and a str instance are never considered equal...
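
In released Python 3 that reads as below (a sketch; the warning filter
is only there because "python -b" flags such comparisons):

```python
import warnings

with warnings.catch_warnings():
    # Under "python -b" these comparisons emit BytesWarning; silence it so
    # the sketch behaves the same however the interpreter was started.
    warnings.simplefilter('ignore', BytesWarning)
    same = (b'test' == 'test')
    different = (b'test' != 'test')

print(same, different)   # False True
```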

> > Assuming that PyUnicode_Compare is a three-way comparison (less,
> > equal, more), it should raise a TypeError when one of the arguments is
> > a PyString or PyBytes.
> >
> >
> Cool. Should have that sorted out soon. As above:
>
> str8() == str() -> False
> str8() != str() -> True
>
> Correct?

Well, in this case you actually have to compare the individual bytes.
But yes. ;-)

> >> Is it just me, or do string/bytes comparisons already work?
> >>
> >>  >>> s = str8('test')
> >>  >>> b = b'test'
> >>  >>> s == b
> >> True
> >>  >>> b == s
> >> True
> >>  >>> s != b
> >> False
> >>  >>> b != s
> >> False
> >>
> >
> > Seems it's already so. Do they order properly too? (< <= > >=)
> >
> Looks like it:
>
>  >>> str8('a') > b'b'
> False
>  >>> str8('a') < b'b'
> True
>  >>> str8('a') <= b'b'
> True
>  >>> str8('a') >= b'b'
> False

Well that part was easy then. ;-)
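
For comparison, the same checks in released Python 3, where the
immutable type ended up as bytes and the mutable one as bytearray (so
bytes plays the role of str8 here, and bytearray that of the mutable
b'...' in the session above):

```python
immutable = b'a'             # role of str8('a') in the thread
mutable = bytearray(b'b')    # role of the mutable b'b' in the thread

# Equality works in both directions and compares byte content.
print(immutable == bytearray(b'a'), bytearray(b'a') == immutable)

# Ordering is by byte content too.
print(immutable < mutable, immutable <= mutable)
print(immutable > mutable, immutable >= mutable)
```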


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct  9 19:02:03 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Oct 2007 10:02:03 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470BAA01.9090202@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
	<470BA418.5060301@vector-seven.com>
	<ca471dc20710090856v128555abl225a40de8f691f13@mail.gmail.com>
	<470BAA01.9090202@vector-seven.com>
Message-ID: <ca471dc20710091002x69e31578h6d3775ed6b900491@mail.gmail.com>

On 10/9/07, Thomas Lee <tom at vector-seven.com> wrote:
> Guido van Rossum wrote:
> >
> > The point is that a bytes and a str instance are never considered equal...
> >
> >
> Sorry. I understand now. My brain must have been on a holiday earlier.
> :) Just pushed an updated patch to the bug tracker.
> >> str8() == str() -> False
> >> str8() != str() -> True
> >>
> >> Correct?
> >>
> >
> > Well, in this case you actually have to compare the individual bytes.
> > But yes. ;-)
> >
> I'm confused: if I'm making == and != between PyString return False
> instead of converting, at what point would I need to be comparing bytes?
>
> The fix I have ready for this merely wipes out the conversion from
> PyString to PyUnicode in PyUnicode_Compare and the existing code takes
> care of the rest. Is this all that's required, or have I misinterpreted
> this one too? :)

Sorry, my bad. I misread and thought you were talking about PyString vs. PyBytes.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct  9 19:24:33 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Oct 2007 10:24:33 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470BA418.5060301@vector-seven.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
	<470BA418.5060301@vector-seven.com>
Message-ID: <ca471dc20710091024q77f0ddfcrb71a03a91d3e1e8b@mail.gmail.com>

On 10/9/07, Thomas Lee <tom at vector-seven.com> wrote:
> Looks like it:
>
>  >>> str8('a') > b'b'
> False
>  >>> str8('a') < b'b'
> True
>  >>> str8('a') <= b'b'
> True
>  >>> str8('a') >= b'b'
> False

Which reminds me of a task I forgot to add to the list:

- change the constructor for PyString to match the one for PyBytes.
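
As a sketch of where this ended up rather than of the patch itself: in
released Python 3 the immutable bytes and the mutable bytearray accept
the same argument forms.

```python
# The same four argument forms, tried against both constructors.
forms = [
    (3,),               # a length: that many zero bytes
    ([104, 105],),      # an iterable of ints in range(256)
    ('hi', 'ascii'),    # text plus an explicit encoding
    (b'hi',),           # any object exporting a buffer
]
for args in forms:
    print(bytes(*args), bytearray(*args))
```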

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Oct 10 00:33:20 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Oct 2007 15:33:20 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710091024q77f0ddfcrb71a03a91d3e1e8b@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<4709BA29.3060503@vector-seven.com>
	<470B82A4.8030703@vector-seven.com>
	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
	<470BA418.5060301@vector-seven.com>
	<ca471dc20710091024q77f0ddfcrb71a03a91d3e1e8b@mail.gmail.com>
Message-ID: <ca471dc20710091533j74dfbcctc216416b3fcbb891@mail.gmail.com>

On 10/9/07, Guido van Rossum <guido at python.org> wrote:
> Which reminds me of a task I forgot to add to the list:
>
> - change the constructor for PyString to match the one for PyBytes.

And another pair of forgotten tasks:

- change PyBytes so that its str() is the same as its repr().
- change PyString so that its str() is the same as its repr().

The former seems easy. The latter might cause trouble (though then
again, it may not).
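
For reference, the behaviour as it appears in released Python 3, where
the two types are spelled bytes and bytearray (a sketch; the filter is
there because "python -b" may warn about str() on these types):

```python
import warnings

with warnings.catch_warnings():
    # Keep the output identical even when run with "python -b".
    warnings.simplefilter('ignore', BytesWarning)
    for value in (b'abc', bytearray(b'abc')):
        # str() does not decode; it yields the same text as repr().
        print(str(value), str(value) == repr(value))
```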

I should also note that I already submitted the changes to remove
locale support from PyString, and am working on removing its encode()
method. This is not going so smoothly.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com  Wed Oct 10 05:27:43 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 9 Oct 2007 23:27:43 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710081556i514ce973i3b00cb8e561ef08@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<acd65fa20710081556i514ce973i3b00cb8e561ef08@mail.gmail.com>
Message-ID: <acd65fa20710092027s4e7b099p88afdefbcabd2e02@mail.gmail.com>

On 10/8/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
> > - change indexing and iteration over PyString to return ints, not
> > 1-char PyStrings
>
> I will try do this one.

This took a bit longer than I expected. Changing the PyString iterator
to return ints was easy, but I ran into some issues with the codec
registry.
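
For reference, a small sketch of what the change means at the Python level. It assumes the behavior of released Python 3 (bytes literals), not the state of the attached patch.

```python
data = b"abc"

first = data[0]        # indexing gives an int, not a 1-char string
items = list(data)     # iteration gives ints as well
one_byte = data[0:1]   # slicing still gives a bytes object
```

Code that needs a length-1 bytes object rather than an int has to slice (data[0:1]) instead of index.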

I won't have the time this week to work on my patch any further.
Meanwhile if someone would like to improve it, feel free to do so (the
patch is attached to this email). Otherwise, I will continue to work
on it next weekend.

Cheers,
-- Alexandre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: string_iter_ret_ints.patch
Type: text/x-diff
Size: 4742 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20071009/7053a418/attachment-0001.patch 

From greg at krypto.org  Wed Oct 10 07:49:00 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 9 Oct 2007 22:49:00 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
Message-ID: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>

>
> > - remove buffer API from PyUnicode
>
>
> I'll take these two with a goal of having them done by the end of the
> week.
>
> -gps
>

I should've known not to believe the simple description.  This one is
proving difficult by itself.  If I modify the Unicode object to not support
the buffer API I can't even launch the python interpreter.  Anyone with
more time on their hands want this one?

I'll still deal with adding the missing PyBytes methods.

-g

From jyasskin at gmail.com  Wed Oct 10 08:02:19 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 10 Oct 2007 01:02:19 -0500
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
Message-ID: <5d44f72f0710092302s52be427fp19bfbae07a8d2700@mail.gmail.com>

On 10/10/07, Gregory P. Smith <greg at krypto.org> wrote:
>
>
> >
> >
> > >
> > > - remove buffer API from PyUnicode
> >
> >
> > I'll take these two with a goal of having them done by the end of the
> week.
> >
> > -gps
>
> I should've known not to believe the simple description.  This one is
> proving difficult by itself.  If I modify the Unicode object to not support
> the buffer API I can't even launch the python interpreter.  Any one with
> more time on their hands want this one?
>
> I'll still deal with adding the missing PyBytes methods.

I've got two plane flights coming up, so I can tackle removing the
buffer API from PyUnicode (and perhaps removing the PyBUF_CHARACTER
constant entirely if it's on the way). I hope to be done by Monday,
with a status report of some sort by Friday.

From alexandre at peadrop.com  Wed Oct 10 15:27:09 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Wed, 10 Oct 2007 09:27:09 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
Message-ID: <acd65fa20710100627u69e1086cnbf4ee46a3784e07a@mail.gmail.com>

On 10/10/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > - remove buffer API from PyUnicode
> >
> > I'll take these two with a goal of having them done by the end of the
> week.
> >
>
> I should've known not to believe the simple description.  This one is
> proving difficult by itself.  If I modify the Unicode object to not support
> the buffer API I can't even launch the python interpreter.  Any one with
> more time on their hands want this one?
>

I have a patch for this one. I just haven't tested it thoroughly.
I attached the patch, so feel free to improve it.

-- Alexandre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unicode_rm_buf_api.patch
Type: text/x-diff
Size: 1889 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20071010/e1ffa412/attachment.patch 

From lists at cheimes.de  Wed Oct 10 20:01:21 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 10 Oct 2007 20:01:21 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <470D1371.3020309@cheimes.de>

Guido van Rossum wrote:
> > The tasks I can think of are:
[...]

(Resend, the first mail didn't make it and I forgot a point)

While I was working on a patch for the renaming of bytes and str8 I
found some open issues that need to be discussed and addressed:

- Create an iterator view for PyBytes. The buffer object doesn't have a
view for iteration like bytes have with PyStringIter_Type. Guido said he
wants a view to play nice with the Sequence ABC.

- Should bytes (PyString_Type) subclass from basestring? It doesn't feel
quite right to me. I think we could remove basestring completely if
bytes doesn't subclass from it.

- Do we need a common base type for bytes and buffer like e.g. basebytes?

- The new bytes type (formally known as str8 / PyString_Type) still has
a bunch of methods from its original Python 2.x parent:

['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
'__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count',
'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace',
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
'upper', 'zfill']

Should any of these methods be removed?

- PyString still accepts unicode in a lot of places and some important
parts of Python still require it. The interpreter was f... up as I
removed unicode support from functions like PyString_Size and
PyString_AsString. I'm not sure which function is causing trouble. The
error message was an exception bootstrapping error because
PyImport_ImportModule("__builtin__") failed. Should these methods still
accept unicode and convert it with the default encoding?

Christian


From guido at python.org  Wed Oct 10 20:08:20 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Oct 2007 11:08:20 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470D1371.3020309@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<470D1371.3020309@cheimes.de>
Message-ID: <ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>

On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > > The tasks I can think of are:
> [...]
>
> (Resend, the first mail didn't make it and I forgot a point)
>
> While I was working on a patch for the renaming of bytes and str8 I
> found some open issues that need to be discussed and addressed:
>
> - Create an iterator view for PyBytes. The buffer object doesn't have a
> view for iteration like bytes have with PyStringIter_Type. Guido said he
> wants a view to play nice with the Sequence ABC.

Right. Though it is a minor point and can be done later.

> - Should bytes (PyString_Type) subclass from basestring? It doesn't feel
> quite right to me. I think we could remove basestring completely if
> bytes doesn't subclass from it.

Definitely not. basestring is for text strings. We could even decide
to remove it; we should instead have ABCs for this purpose.

> - Do we need a common base type for bytes and buffer like e.g. basebytes?

We can deal with that in abc.py as well, using virtual inheritance
(the .register() method).
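
A minimal sketch of the virtual-inheritance idea, written against the abc module as it exists in released Python 3. The class name ByteSequence and the helper is_bytes_like are hypothetical, and bytes/bytearray stand in for the immutable and mutable types discussed here.

```python
from abc import ABC


class ByteSequence(ABC):
    """Hypothetical common base for the immutable and mutable byte types."""


# Virtual inheritance: the concrete types are not modified or subclassed,
# they are only registered with the ABC.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)


def is_bytes_like(obj):
    # isinstance() consults the ABC's registry.
    return isinstance(obj, ByteSequence)
```

With this, no C-level base type is needed; text strings are simply not registered.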

> - The new bytes type (formally known as str8 / PyString_Type) still has

You mean 'formerly', not 'formally' :-) I prefer to just call these by
their C names (PyString) to be precise, as the C names aren't changing
(at least not yet ;-).

> a bunch of methods from its original Python 2.x parent:
>
> ['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
> '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__',
> '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
> '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count',
> 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
> 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
> 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace',
> 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
> 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
> 'upper', 'zfill']
>
> Should any of these methods be removed?

No, that's spelled out in the PEP. Those should all stay. (If you see
a method that's not listed in the PEP, ask me about it before deleting
it. :-)

> - PyString still excepts unicode in a lot of places and some important
> parts of Python still require it. The interpreter was f... up as I
> removed unicode support from functions like PyString_Size and
> PyString_AsString. I'm not sure which function is causing trouble. The
> error message was an exception bootstrapping error because
> PyImport_ImportModule("__builtin__") failed. Should these methods still
> accept unicode and convert it with the default encoding?

Several people have noted the same issue. My goal is to remove this
behavior completely. I don't know how much it will take; these
bootstrap issues are always hard to debug and sometimes hard to fix.

I am looking into this a bit right now; I suspect it's got to do with
some types that still return a PyString from their repr(). I noticed
that even removing .encode() from PyString breaks about 5 tests.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Wed Oct 10 20:10:43 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 10 Oct 2007 11:10:43 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710100627u69e1086cnbf4ee46a3784e07a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
	<acd65fa20710100627u69e1086cnbf4ee46a3784e07a@mail.gmail.com>
Message-ID: <bbaeab100710101110v5b44413dva4842f38653a3ca6@mail.gmail.com>

On 10/10/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
> On 10/10/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > > - remove buffer API from PyUnicode
> > >
> > > I'll take these two with a goal of having them done by the end of the
> > week.
> > >
> >
> > I should've known not to believe the simple description.  This one is
> > proving difficult by itself.  If I modify the Unicode object to not support
> > the buffer API I can't even launch the python interpreter.  Any one with
> > more time on their hands want this one?
> >
>
> I have a patch for this one. I just haven't tested it throughly.
> I attached the patch, so free to improve it.

It's best to toss all patches up on the issue tracker as then they
don't get lost amongst the other emails in the mailing list.  Plus it
provides a more centralized history of what happens with the code and
lets anyone searching for work on this exact topic have another place
to find it.

-Brett

From brett at python.org  Wed Oct 10 20:12:50 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 10 Oct 2007 11:12:50 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470D1371.3020309@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<470D1371.3020309@cheimes.de>
Message-ID: <bbaeab100710101112u2e645340u1ea15ac46b436304@mail.gmail.com>

On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > > The tasks I can think of are:
> [...]
>
> (Resend, the first mail didn't make it and I forgot a point)
>
> While I was working on a patch for the renaming of bytes and str8 I
> found some open issues that need to be discussed and addressed:
>
> - Create an iterator view for PyBytes. The buffer object doesn't have a
> view for iteration like bytes have with PyStringIter_Type. Guido said he
> wants a view to play nice with the Sequence ABC.
>
> - Should bytes (PyString_Type) subclass from basestring? It doesn't feel
> quite right to me. I think we could remove basestring completely if
> bytes doesn't subclass from it.
>
> - Do we need a common base type for bytes and buffer like e.g. basebytes?
>
> - The new bytes type (formally known as str8 / PyString_Type) still has
> a bunch of methods from its original Python 2.x parent:
>
> ['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
> '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__',
> '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
> '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count',
> 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
> 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
> 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace',
> 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
> 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
> 'upper', 'zfill']
>
> Should any of these methods be removed?
>

See PEP 3137; http://www.python.org/dev/peps/pep-3137/#methods .

-Brett

From lists at cheimes.de  Wed Oct 10 21:08:27 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 10 Oct 2007 21:08:27 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	
	<470D1371.3020309@cheimes.de>
	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
Message-ID: <470D232B.80607@cheimes.de>

Guido van Rossum wrote:
> Definitely not. basestring is for text strings. We could even decide
> to remove it; we should instead have ABCs for this purpose.

I'm going to provide a patch which rips basestring out, k? Somebody has
to write a fixer for 2to3 which replaces code like isinstance(egg,
basestring) with isinstance(egg, str).

> You mean 'formerly', not 'formally' :-) I prefer to just call these by
> their C names (PyString) to be precise, as the C names aren't changing
> (at least not yet ;-).

Oh, formerly ... right. The current state of the names is very
confusing. It's going to cost me some cups of coffee.

  str - PyUnicode
  bytes - PyString
  buffer - PyBytes

> No, that's spelled out in the PEP. Those should all stay. (If you see
> a method that's not listed in the PEP, ask me about it before deleting
> it. :-)

Doh, I should have read the PEP again before asking the question.

I've a question about one point. The PEP states "They accept anything
that implements the PEP 3118 buffer API for bytes arguments, and return
the same type as the object whose method is called ("self")". Which
types do implement the buffer API? PyString, PyBytes but not PyUnicode?

For now the PyString takes PyUnicode objects as argument and vice versa
but PyBytes doesn't take unicode. Do I understand correctly that
PyString must not accept PyUnicode?

>>> b"abc".count("b")
1
>>> "abc".count(b"b")
1
>>> buffer(b"abc").count("b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: can't use str as char buffer
>>> buffer(b"abc").count(b"b")
1

> Several people have noted the same issue. My goal is to remove this
> behavior completely. I don't know how much it will take; these
> bootstrap issues are always hard to debug and sometimes hard to fix.

I tried to debug and fix it but I gave up after half an hour.

> I am looking into this a bit right now; I suspect it's got to do with
> some types that still return a PyString from their repr(). I noticed
> that even removing .encode() from PyString breaks about 5 tests.

Great!

I've a patch that renames PyString -> bytes and PyBytes -> buffer while
keeping str8 as an alias for bytes until str8 is removed. It's based on
Alexandre's patch which itself is partly based on my patch. It breaks a
hell of a lot but it could give you a head start.

>>> b''
b''
>>> type(b'')
<type 'bytes'>
>>> type(b'') is str8
True
>>> type(b'') is bytes
True
>>> type(buffer(b''))
<type 'buffer'>

I'll keep working on the patch.

Crys



From g.brandl at gmx.net  Wed Oct 10 21:33:24 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 10 Oct 2007 21:33:24 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470D232B.80607@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>		<470D1371.3020309@cheimes.de>	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
	<470D232B.80607@cheimes.de>
Message-ID: <fej9dv$1pd$1@sea.gmane.org>

Christian Heimes schrieb:

>> You mean 'formerly', not 'formally' :-) I prefer to just call these by
>> their C names (PyString) to be precise, as the C names aren't changing
>> (at least not yet ;-).
> 
> Oh, formerly ... right. The current state of the names is very
> confusing. It's going to cost me some cups of coffee.
> 
>   str - PyUnicode
>   bytes - PyString
>   buffer - PyBytes

I agree that this is quite confusing. The PyBytes functions can be changed
without a thought since they aren't 2.x heritage. Since PyBuffer_* is already
taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed
to PyByteString_. PyUnicode might be allowed to stay...

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From lists at cheimes.de  Wed Oct 10 21:58:19 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 10 Oct 2007 21:58:19 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fej9dv$1pd$1@sea.gmane.org>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>		<470D1371.3020309@cheimes.de>	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>	<470D232B.80607@cheimes.de>
	<fej9dv$1pd$1@sea.gmane.org>
Message-ID: <fejat1$7t7$1@sea.gmane.org>

Georg Brandl wrote:
> I agree that this is quite confusing. The PyBytes functions can be changed
> without a thought since they aren't 2.x heritage. Since PyBuffer_* is already
> taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed
> to PyByteString_. PyUnicode might be allowed to stay...

I like your idea!

IMHO PyUnicode_ can stay. It reflects the intention and aim of the type
and it's easy to remember. str() contains unicode data and its C name
is PyUnicode. That works for me. *g*

For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_
for buffer() easier to remember and more consistent.

Christian


From brett at python.org  Wed Oct 10 22:30:36 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 10 Oct 2007 13:30:36 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fejat1$7t7$1@sea.gmane.org>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<470D1371.3020309@cheimes.de>
	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
	<470D232B.80607@cheimes.de> <fej9dv$1pd$1@sea.gmane.org>
	<fejat1$7t7$1@sea.gmane.org>
Message-ID: <bbaeab100710101330n390dbd21p574788681ea38c5@mail.gmail.com>

On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Georg Brandl wrote:
> > I agree that this is quite confusing. The PyBytes functions can be changed
> > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already
> > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed
> > to PyByteString_. PyUnicode might be allowed to stay...
>
> I like your idea!
>
> IMHO PyUnicode_ can stay. It reflects the intention and aim of the type
> and it's easy to remember. str() contains unicode data and it's C name
> is PyUnicode. That works for me. *g*
>
> For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_
> for buffer() easier to remember and more consistent.

+1 from me.  No need to have PyBytes_ be PyBytesString_ as the string
tie-in will become historical.  Plus PyBytes_ is shorter without
losing any detail of what the functions work with.

-Brett

From guido at python.org  Wed Oct 10 23:00:26 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Oct 2007 14:00:26 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <bbaeab100710101330n390dbd21p574788681ea38c5@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<470D1371.3020309@cheimes.de>
	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
	<470D232B.80607@cheimes.de> <fej9dv$1pd$1@sea.gmane.org>
	<fejat1$7t7$1@sea.gmane.org>
	<bbaeab100710101330n390dbd21p574788681ea38c5@mail.gmail.com>
Message-ID: <ca471dc20710101400t609a4499g52d7aab5c663a235@mail.gmail.com>

It's all fine to debate new names, but for 3.0a2, the existing C-level
names will be used. Period. I am not going to review a change that
touches every other line of code to do such a big rename.

FWIW, I think the new names should be different from any existing
names, otherwise merges from the trunk will be too much of a pain (and
ditto for ports of 3rd party code).

--Guido

On 10/10/07, Brett Cannon <brett at python.org> wrote:
> On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> > Georg Brandl wrote:
> > > I agree that this is quite confusing. The PyBytes functions can be changed
> > > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already
> > > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed
> > > to PyByteString_. PyUnicode might be allowed to stay...
> >
> > I like your idea!
> >
> > IMHO PyUnicode_ can stay. It reflects the intention and aim of the type
> > and it's easy to remember. str() contains unicode data and it's C name
> > is PyUnicode. That works for me. *g*
> >
> > For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_
> > for buffer() easier to remember and more consistent.
>
> +1 from me.  No need to have PyBytes_ be PyBytesString_ as the string
> tie-in will become historical.  Plus PyBytes_ is shorter without
> losing any detail of what the functions work with.
>
> -Brett


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Oct 10 23:06:33 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 10 Oct 2007 14:06:33 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470D232B.80607@cheimes.de>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<470D1371.3020309@cheimes.de>
	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>
	<470D232B.80607@cheimes.de>
Message-ID: <ca471dc20710101406q413d46b3ne018d506673db4db@mail.gmail.com>

On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> I've a question about one point. The PEP states "They accept anything
> that implements the PEP 3118 buffer API for bytes arguments, and return
> the same type as the object whose method is called ("self")". Which
> types do implement the buffer API? PyString, PyBytes but not PyUnicode?

Plus some other standard types, like memoryview and array.array.
Plus certain extension types, like numpy arrays.

> For now the PyString takes PyUnicode objects are argument and vice versa
> but PyBytes doesn't take unicode. Do I understand correctly that
> PyString must not accept PyUnicode?

Correct.

> >>> b"abc".count("b")
> 1

This is a bug.

> >>> "abc".count(b"b")
> 1

This too.

> >> buffer(b"abc").count("b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SystemError: can't use str as char buffer

What is buffer? Are you using an old version of the tree (where it was
an object like memoryview) or a patched version where you've already
renamed str8 to buffer?

Anyway, str8().count(str()) should raise TypeError.
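
To make the expected outcome concrete, here is a small sketch. It assumes the behavior of released Python 3, with bytes and bytearray as the Python-level names; the helper count_or_error is hypothetical.

```python
def count_or_error(haystack, needle):
    """Return the count, or the name of the exception that was raised."""
    try:
        return haystack.count(needle)
    except TypeError as exc:
        return type(exc).__name__


bytes_in_bytes = count_or_error(b"abc", b"b")             # same kind: allowed
text_in_bytes = count_or_error(b"abc", "b")               # text needle: rejected
bytes_in_text = count_or_error("abc", b"b")               # bytes needle: rejected
bytes_in_mutable = count_or_error(bytearray(b"abc"), b"b")  # buffer API: allowed
```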

> >>> buffer(b"abc").count(b"b")
> 1

Same question. Once the PEP is completely implemented, this should be correct.

> I've a patch that renames PyString -> bytes and PyByte -> buffer while
> keeping str8 as an alias for bytes until str8 is removed. It's based on
> Alexandres patch which itself is partly based on my patch. It breaks a
> hell of a lot but it could give you a head start.

The rename is trivial. It's fixing all the unit tests that matters.

> >>> b''
> b''
> >>> type(b'')
> <type 'bytes'>
> >>> type(b'') is str8
> True
> >>> type(b'') is bytes
> True
> >>> type(buffer(b''))
> <type 'buffer'>
>
> I'll keep working on the patch.

Cool.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Wed Oct 10 23:31:35 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 10 Oct 2007 23:31:35 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710101406q413d46b3ne018d506673db4db@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	
	<470D1371.3020309@cheimes.de>	
	<ca471dc20710101108n2b6513f2o4f539776e897e1ab@mail.gmail.com>	
	<470D232B.80607@cheimes.de>
	<ca471dc20710101406q413d46b3ne018d506673db4db@mail.gmail.com>
Message-ID: <470D44B7.4060509@cheimes.de>

Guido van Rossum wrote:
>> >>> b"abc".count("b")
>> 1
>
> This is a bug.
>
>> >>> "abc".count(b"b")
>> 1
>
> This too.
>
>> >>> buffer(b"abc").count("b")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> SystemError: can't use str as char buffer
>
> What is buffer? Are you using an old version of the tree (where it was
> an object like memoryview) or a patched version where you've already
> renamed str8 to buffer?

It was a test in my patched version of Python with the new names (str8
-> bytes, bytes -> buffer).

> The rename is trivial. It's fixing all the unit tests that matters.

Yes, I know what you are talking about. *g* The unit tests aren't easy
to fix. It will take some time. Right now even the interpreter isn't
running with the new names.

>> >>> b''
>> b''
>> >>> type(b'')
>> <type 'bytes'>
>> >>> type(b'') is str8
>> True
>> >>> type(b'') is bytes
>> True
>> >>> type(buffer(b''))
>> <type 'buffer'>
>>
>> I'll keep working on the patch.
>
> Cool.

That was another interpreter session with my rename patch. I've another
patch that removes basestring from Python 3.0:
http://bugs.python.org/issue1258

Christian

From greg at krypto.org  Thu Oct 11 07:14:09 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 10 Oct 2007 22:14:09 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710100627u69e1086cnbf4ee46a3784e07a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com>
	<acd65fa20710100627u69e1086cnbf4ee46a3784e07a@mail.gmail.com>
Message-ID: <52dc1c820710102214h63a04ad4ua96a459957fa2071@mail.gmail.com>

haha wow!  your patch was a *lot* less messy than I was expecting things
could get.  most of the test suite still seems to pass for me with this
applied.  if you haven't already please post it on bugs.python.org.

On 10/10/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
>
> On 10/10/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > > - remove buffer API from PyUnicode
> > >
> > > I'll take these two with a goal of having them done by the end of the
> > week.
> > >
> >
> > I should've known not to believe the simple description.  This one is
> > proving difficult by itself.  If I modify the Unicode object to not
> support
> > the buffer API I can't even launch the python interpreter.  Any one with
> > more time on their hands want this one?
> >
>
> I have a patch for this one. I just haven't tested it throughly.
> I attached the patch, so free to improve it.
>
> -- Alexandre
>
>

From greg at krypto.org  Thu Oct 11 09:59:35 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 11 Oct 2007 00:59:35 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
Message-ID: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>

Guido -

One tiny question has come up while working on this one:

Should the PyBytes buffer (mutable bytes) object's .append(val) and
.remove(val) methods accept anything other than an int in the 0..255 range?

I believe the answer to be no based on the previous long thread on this but
these two weren't mentioned at the time so i figure I'll ask.  Should a
pep3118 buffer api supporting object that produces a length 1 buffer also
work for append and remove?  That would allow .append(b'!') or
.remove(b'!').

amusingly right now in 3.0a1 there is a bug where .append('33') will happily
append a b'!' by converting it into an int then into a byte.  regardless of
the answer that misbehavior will be zapped in the patch i'm about to submit.
;)

-gps

On 10/8/07, Gregory P. Smith <greg at krypto.org> wrote:
>
>
> - add missing methods to PyBytes (for list, see the PEP and compare to
> > what's already there)
> >
> >

From greg at krypto.org  Thu Oct 11 10:09:31 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 11 Oct 2007 01:09:31 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
Message-ID: <52dc1c820710110109h6c5061b7t7832962873f706a@mail.gmail.com>

On 10/11/07, Gregory P. Smith <greg at krypto.org> wrote:
>
> Guido -
>
> One tiny question has come up while working on this one:
>
> Should the PyBytes buffer (mutable bytes) object's .append(val) and
> .remove(val) methods accept anything other than an int in the 0..255 range?
>
> I believe the answer to be no based on the previous long thread on this
> but these two weren't mentioned at the time so i figure I'll ask.  Should a
> pep3118 buffer api supporting object that produces a length 1 buffer also
> work for append and remove?  That would allow .append(b'!') or
> .remove(b'!').


I'm doubly assuming 'no' now, as the .insert() method would also need it for
consistency, and it would be just plain gross to allow .insert(5, b'x') to
work but have .insert(5, b'xyz') fail with a ValueError.  Consider the
question unasked unless you want a different answer.

> amusingly right now in 3.0a1 there is a bug where .append('33') will happily
> append a b'!' by converting it into an int then into a byte.  regardless of
> the answer that misbehavior will be zapped in the patch i'm about to submit.
> ;)
>
> -gps
>
> On 10/8/07, Gregory P. Smith <greg at krypto.org> wrote:
> >
> >
> > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > what's already there)
> > >
> > >
>

From tom at vector-seven.com  Tue Oct  9 17:54:00 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Wed, 10 Oct 2007 01:54:00 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<4709BA29.3060503@vector-seven.com>	<470B82A4.8030703@vector-seven.com>
	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>
Message-ID: <470BA418.5060301@vector-seven.com>

Guido van Rossum wrote:
>>>       
>>>> - make == and != between PyBytes and PyUnicode return False instead of
>>>> raising TypeError
>>>>
>>>>         
Just thinking about it I'm pretty sure my initial patch is wrong - 
forgive my ignorance. To remove the ambiguity, is it fair to state the 
following?

bytes() == str() -> False instead of raising TypeError
bytes() != str() -> True instead of raising TypeError

I initially read that as "return False whenever any comparison between 
bytes and unicode objects is attempted" ...
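
For anyone checking the two readings against an interpreter, here is a
small sketch. It assumes a released Python 3 run without the -b option,
where bytes is the immutable type from PEP 3137; the helper name
ordering_raises is made up for this example and is not part of any patch
in this thread.

```python
def ordering_raises(left, right):
    """Return True if evaluating `left < right` raises TypeError."""
    try:
        left < right
    except TypeError:
        return True
    return False


# Equality between bytes and str never raises; the two are simply unequal.
eq_result = (b"test" == "test")
ne_result = (b"test" != "test")

# Ordering comparisons between the two types are rejected instead.
mixed_ordering_fails = ordering_raises(b"a", "b")
same_type_ordering_fails = ordering_raises(b"a", b"b")
```

So only == and != get the "never equal" treatment, while <, <=, > and >=
between the two types remain an error.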

> Assuming that PyUnicode_Compare is a three-way comparison (less,
> equal, more), it should raise a TypeError when one of the arguments is
> a PyString or PyBytes.
>
>   
Cool. Should have that sorted out soon. As above:

str8() == str() -> False
str8() != str() -> True

Correct?
>> Is it just me, or do string/bytes comparisons already work?
>>
>>  >>> s = str8('test')
>>  >>> b = b'test'
>>  >>> s == b
>> True
>>  >>> b == s
>> True
>>  >>> s != b
>> False
>>  >>> b != s
>> False
>>     
>
> Seems it's already so. Do they order properly too? (< <= > >=)
>   
Looks like it:

 >>> str8('a') > b'b'
False
 >>> str8('a') < b'b'
True
 >>> str8('a') <= b'b'
True
 >>> str8('a') >= b'b'
False

Cheers,
Tom

From tom at vector-seven.com  Tue Oct  9 18:19:13 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Wed, 10 Oct 2007 02:19:13 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710090856v128555abl225a40de8f691f13@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<4709BA29.3060503@vector-seven.com>	<470B82A4.8030703@vector-seven.com>	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>	<470BA418.5060301@vector-seven.com>
	<ca471dc20710090856v128555abl225a40de8f691f13@mail.gmail.com>
Message-ID: <470BAA01.9090202@vector-seven.com>

Guido van Rossum wrote:
>
> The point is that a bytes and a str instance are never considered equal...
>
>   
Sorry. I understand now. My brain must have been on a holiday earlier. 
:) Just pushed an updated patch to the bug tracker.
>> str8() == str() -> False
>> str8() != str() -> True
>>
>> Correct?
>>     
>
> Well, in this case you actually have to compare the individual bytes.
> But yes. ;-)
>   
I'm confused: if I'm making == and != between PyString return False 
instead of converting, at what point would I need to be comparing bytes?

The fix I have ready for this merely wipes out the conversion from 
PyString to PyUnicode in PyUnicode_Compare and the existing code takes 
care of the rest. Is this all that's required, or have I misinterpreted 
this one too? :)

Cheers,
Tom

From tom at vector-seven.com  Wed Oct 10 05:59:21 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Wed, 10 Oct 2007 13:59:21 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <acd65fa20710092027s4e7b099p88afdefbcabd2e02@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<acd65fa20710081556i514ce973i3b00cb8e561ef08@mail.gmail.com>
	<acd65fa20710092027s4e7b099p88afdefbcabd2e02@mail.gmail.com>
Message-ID: <470C4E19.6090305@vector-seven.com>

I was having weird problems with the codec registry too - specifically 
the assertion checking unidata_version == "3.2.0" mysteriously failing 
after forcing string/unicode equality checks to return False. I thought 
maybe unidata_version somehow ended up as a str8 or something weird like 
that ... I haven't looked into it at all, though.

I'll be taking another look tomorrow night. I'll try to give your patch 
a test run then and see if I can help at all if somebody else hasn't 
already sorted it out.

Cheers,
Tom

Alexandre Vassalotti wrote:
> On 10/8/07, Alexandre Vassalotti <alexandre at peadrop.com> wrote:
>   
>> On 10/8/07, Guido van Rossum <guido at python.org> wrote:
>>     
>>> - change indexing and iteration over PyString to return ints, not
>>> 1-char PyStrings
>>>       
>> I will try do this one.
>>     
>
> This took a bit longer than I expected. Changing the PyString iterator
> to return ints was easy, but I ran into some issues with the codec
> registry.
>
> I won't have the time this week to work on my patch any further.
> Meanwhile if someone would like to improve it, feel free to do so (the
> patch is attached to this email). Otherwise, I will continue to work
> on it next weekend.
>
> Cheers,
> -- Alexandre
>   
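
The indexing and iteration change quoted above can be illustrated as
follows. This is only a sketch against a released Python 3, assuming its
bytes type behaves the way the task list describes; it says nothing about
the state of the patch itself.

```python
data = b"abc"

first = data[0]        # indexing gives an int, not a length-1 bytes object
as_list = list(data)   # iteration yields ints as well
one_byte = data[0:1]   # slicing still gives a bytes object
```

Code that wants a one-byte object therefore has to slice rather than index.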


From tom at vector-seven.com  Wed Oct 10 06:03:43 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Wed, 10 Oct 2007 14:03:43 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710091533j74dfbcctc216416b3fcbb891@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<4709BA29.3060503@vector-seven.com>	<470B82A4.8030703@vector-seven.com>	<ca471dc20710090801s7f011d48tc886847cdb75e79@mail.gmail.com>	<470BA418.5060301@vector-seven.com>	<ca471dc20710091024q77f0ddfcrb71a03a91d3e1e8b@mail.gmail.com>
	<ca471dc20710091533j74dfbcctc216416b3fcbb891@mail.gmail.com>
Message-ID: <470C4F1F.1080401@vector-seven.com>

Guido van Rossum wrote:
> On 10/9/07, Guido van Rossum <guido at python.org> wrote:
>   
>> Which reminds me of a task I forgot to add to the list:
>>
>> - change the constructor for PyString to match the one for PyBytes.
>>     
>
> And another pair of forgotten tasks:
>
> - change PyBytes so that its str() is the same as its repr().
> - change PyString so that its str() is the same as its repr().
>
> The former seems easy. The latter might cause trouble (though then
> again, it may not).
>
> I should also note that I already submitted the changes to remove
> locale support from PyString, and am working on removing its encode()
> method. This is not going so smoothly.
>
>   
I'll take the constructor once I sort out unicode/string comparison.

If nobody else has taken care of the other two by the weekend, I'll take 
a look at them too.
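
As a rough illustration of the constructor and str()/repr() tasks quoted
above, this is how a released Python 3 behaves when run without the -b
option. It is a sketch of the end state only, assuming the bytes type
there is what these tasks eventually produced; needs_encoding is a
hypothetical helper name.

```python
def needs_encoding(value):
    """Return True if bytes(value) is rejected with TypeError."""
    try:
        bytes(value)
    except TypeError:
        return True
    return False


# str() of a bytes object is the same text as its repr().
text_form = str(b"abc")
repr_form = repr(b"abc")

# A str argument is rejected unless an encoding is given explicitly.
str_rejected = needs_encoding("abc")
encoded = bytes("abc", "utf-8")
```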

Cheers,
Tom

From tom at vector-seven.com  Thu Oct 11 14:41:57 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Thu, 11 Oct 2007 22:41:57 +1000
Subject: [Python-3000] PEP 3137 patch #2 - str8() == str() -> False
Message-ID: <470E1A15.2030709@vector-seven.com>

Okay, here's another patch:

http://bugs.python.org/issue1263

Using unicode-string-eq-false-r3.patch, str8/str comparison will now 
return False instead of attempting to convert. Unfortunately this breaks 
about 30 tests. In attempting to fix test_unicode (the obvious starting 
point for all this), I made changes to Python/structmember.c to use 
PyUnicode instead of PyString - this fixed some of the issues in 
test_unicode, but there would appear to be other, similar problems 
elsewhere.

I'm not going to have the time to get this done by Friday, but I may be 
able to work more on this over the weekend. I'd love some feedback on my 
changes to structmember.c so I know if I'm going about it the right way 
(my knowledge of the PyUnicode API and unicode in general is pretty 
limited). I put the structmember.c patch in a separate file for now - 
unicode-string-eq-false-structmember-c-r1.patch

Until then, if anybody wants to help out with getting those tests 
running that would be great too. Otherwise, I should have made some sort 
of measurable progress by Monday.

Cheers,
Tom

From lists at cheimes.de  Thu Oct 11 17:07:59 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 17:07:59 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
Message-ID: <470E3C4F.5020707@cheimes.de>

Hello Python!

I've written a patch that removes basestring from py3k:
http://bugs.python.org/issue1258 During the testing of the patch I hit a
problem with __file__ and codeobject.co_filename. Both __file__ and
co_filename are byte strings and not unicode which is causing some
trouble. Guido asked me to provide another patch which decodes the
string using the default filesystem encoding.

Most of the patch was straightforward and easy, but I hit one spot
that's causing some trouble. It's a chicken-and-egg issue.
codeobject.co_filename is a PyString instance. I would like to perform

filename = PyString_AsDecodedObject(filename,
Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8",
NULL);

in order to decode the string with either the fs encoding or UTF-8, but
it's not possible. It's way too early in the bootstrapping process of
Python and the codecs aren't registered yet. In fact, large parts of the
codecs package are implemented in Python ...

Ideas?

I could check if Py_FileSystemDefaultEncoding is one of the encodings
whose decoders are built into the interpreter (UTF-8, 16, 32, latin1, mbcs),
but what if the fs default encoding is some obscure encoding?

Christian

From lists at cheimes.de  Thu Oct 11 17:21:44 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 17:21:44 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3C4F.5020707@cheimes.de>
References: <470E3C4F.5020707@cheimes.de>
Message-ID: <470E3F88.7010301@cheimes.de>

PS: The patch for __file__ and co_filename is causing a minor problem
with the hotshot profiler and filenames. I remember a plan to remove
hotshot from Python 3.x. Shall I leave the problem alone?


From lists at cheimes.de  Thu Oct 11 17:32:25 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 17:32:25 +0200
Subject: [Python-3000] Bug with pdb.set_trace() and with block
Message-ID: <felfmb$kl3$1@sea.gmane.org>

I found a pretty annoying bug caused by with blocks. A with block
terminates the debugging session and the program keeps running. It's not
possible to go to the next line with 'n'. 's' steps into the open() call.

# pdbtest.py
import pdb
pdb.set_trace()
print("before with")
with open("/etc/passwd") as fd:
    data = fd.read()
print("after with")
print("end of program")

$ ./python pdbtest.py
> /home/heimes/dev/python/py3k/pdbtest.py(3)<module>()
-> print("before with")
(Pdb) n
before with
> /home/heimes/dev/python/py3k/pdbtest.py(4)<module>()
-> with open("/etc/passwd") as fd:
(Pdb) n
after with
end of program


Christian


From fdrake at acm.org  Thu Oct 11 18:01:03 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 11 Oct 2007 12:01:03 -0400
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3F88.7010301@cheimes.de>
References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de>
Message-ID: <AA3C7EC8-2C85-41E2-AEAF-A896C47E9ED3@acm.org>

On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote:
> PS: The patch for __file__ and co_filename is causing a minor problem
> with the hotshot profiler and filenames. I remember a plan to remove
> hotshot from Python 3.x. Shall I leave the problem alone?

I asked about the removal of hotshot a few weeks ago, and there was  
some uncertainty about whether a decision had been reached.  Reading  
back over the mails, there were no objections.  Python 3.0 seems a  
perfect time to rip it out.  If there are no objections, I'll do that  
this weekend.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From guido at python.org  Thu Oct 11 18:55:34 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 09:55:34 -0700
Subject: [Python-3000] Bug with pdb.set_trace() and with block
In-Reply-To: <felfmb$kl3$1@sea.gmane.org>
References: <felfmb$kl3$1@sea.gmane.org>
Message-ID: <ca471dc20710110955g557ef115w68b0be9547ad354a@mail.gmail.com>

Please file this in the bug tracker.

Thanks for finding this -- I knew there was a problem with the debugger
losing control but I never traced it down to the with statement!

On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
> I found a pretty annoying bug caused by with blocks. A with block
> terminates the debugging session and the program keeps running. It's not
> possible to go to the next line with 'n'. 's' steps into the open() call.
>
> # pdbtest.py
> import pdb
> pdb.set_trace()
> print("before with")
> with open("/etc/passwd") as fd:
>     data = fd.read()
> print("after with")
> print("end of program")
>
> $ ./python pdbtest.py
> > /home/heimes/dev/python/py3k/pdbtest.py(3)<module>()
> -> print("before with")
> (Pdb) n
> before with
> > /home/heimes/dev/python/py3k/pdbtest.py(4)<module>()
> -> with open("/etc/passwd") as fd:
> (Pdb) n
> after with
> end of program
>
>
> Christian


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Oct 11 18:56:11 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 09:56:11 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <AA3C7EC8-2C85-41E2-AEAF-A896C47E9ED3@acm.org>
References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de>
	<AA3C7EC8-2C85-41E2-AEAF-A896C47E9ED3@acm.org>
Message-ID: <ca471dc20710110956l4b25e6ffxcd5928f12fb90af8@mail.gmail.com>

On 10/11/07, Fred Drake <fdrake at acm.org> wrote:
> On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote:
> > PS: The patch for __file__ and co_filename is causing a minor problem
> > with the hotshot profiler and filenames. I remember a plan to remove
> > hotshot from Python 3.x. Shall I leave the problem alone?
>
> I asked about the removal of hotshot a few weeks ago, and there was
> some uncertainty about whether a decision had been reached.  Reading
> back over the mails, there were no objections.  Python 3.0 seems a
> perfect time to rip it out.  If there are no objections, I'll do that
> this weekend.

Go for it!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu Oct 11 18:58:42 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 09:58:42 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3C4F.5020707@cheimes.de>
References: <470E3C4F.5020707@cheimes.de>
Message-ID: <ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>

Hm, can't we make co_filename a PyUnicode instance?

On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
> Hello Python!
>
> I've written a patch that removes basestring from py3k:
> http://bugs.python.org/issue1258 During the testing of the patch I hit a
> problem with __file__ and codeobject.co_filename. Both __file__ and
> co_filename are byte strings and not unicode which is causing some
> trouble. Guido asked me to provide another patch which decodes the
> string using the default filesystem encoding.
>
> Most of the patch was straight forward and easy but I hit one spot
> that's causing some trouble. It's a chicken and egg issue.
> codeobject.co_filename is a PyString instance. I like to perform
>
> filename = PyString_AsDecodedObject(filename,
> Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8",
> NULL);
>
> in order to decode the string with either the fs encoding or UTF-8 but
> it's not possible. It's way too early in the bootstrapping process of
> Python and the codecs aren't registered yet. In fact large parts of the
> codecs package is implemented in Python ...
>
> Ideas?
>
> I could check if Py_FilesystemDefaultEncoding is one of the encodings
> that are implemented in Python (UTF-8, 16, 32, latin1, mbcs) but what if
> the fs default encoding is some obscure encoding?
>
> Christian


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Thu Oct 11 19:26:23 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 19:26:23 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
Message-ID: <470E5CBF.3030103@cheimes.de>

Guido van Rossum wrote:
> Hm, can't we make co_filename a PyUnicode instance?

I already did that in my patch, but doesn't it cause a problem when the
encoding isn't UTF-8? I may be misunderstanding
PyUnicode_FromString(PyString_AS_STRING(filename)). Doesn't it
decode filename from UTF-8?

Christian

From lists at cheimes.de  Thu Oct 11 19:27:04 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 19:27:04 +0200
Subject: [Python-3000] Bug with pdb.set_trace() and with block
In-Reply-To: <ca471dc20710110955g557ef115w68b0be9547ad354a@mail.gmail.com>
References: <felfmb$kl3$1@sea.gmane.org>
	<ca471dc20710110955g557ef115w68b0be9547ad354a@mail.gmail.com>
Message-ID: <470E5CE8.6080706@cheimes.de>

Guido van Rossum wrote:
> Please file this in the bug tracker.
> 
> Thanks for finding this -- I knew there was a problem with the debugger
> losing control but I never traced it down to the with statement!

Already done! http://bugs.python.org/issue1265

Christian

From guido at python.org  Thu Oct 11 19:40:21 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 10:40:21 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E5CBF.3030103@cheimes.de>
References: <470E3C4F.5020707@cheimes.de>
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
	<470E5CBF.3030103@cheimes.de>
Message-ID: <ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>

Um, where does the filename object in that expression come from? It
appears to be a PyString object. Who created it? That code should be
changed to create a PyUnicode instead (using the filesystem encoding).

On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > Hm, can't we make co_filename a PyUnicode instance?
>
> I already did it in my patch but doesn't it cause a problem when the
> encoding isn't UTF-8? I may understand
> PyUnicode_FromString(PyString_AS_STRING(filename)) wrong. Doesn't it
> decode filename from UTF-8?
>
> Christian
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Thu Oct 11 20:01:15 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 20:01:15 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>	
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>	
	<470E5CBF.3030103@cheimes.de>
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>
Message-ID: <470E64EB.2070203@cheimes.de>

Guido van Rossum wrote:
> Um, where does the filename object in that expression come from? It
> appears to be a PyString object. Who created it? That code should be
> changed to create a PyUnicode instead (using the filesystem encoding).

Python/compile.c:makecode()
filename = PyString_FromString(c->c_filename);

Modules/pyexpat.c:getcode()
filename = PyString_FromString(__FILE__);

Objects/codeobject.c:code_new()
PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code"

As I tried to explain earlier that may be a problem. PyUnicode_Decode()
doesn't work so early. The codecs package isn't initialized yet.

Christian

From fdrake at acm.org  Thu Oct 11 20:06:23 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 11 Oct 2007 14:06:23 -0400
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3F88.7010301@cheimes.de>
References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de>
Message-ID: <F5C54D7B-9E20-4B14-9CB1-C44F573FA783@acm.org>

On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote:
> PS: The patch for __file__ and co_filename is causing a minor problem
> with the hotshot profiler and filenames. I remember a plan to remove
> hotshot from Python 3.x. Shall I leave the problem alone?

hotshot should no longer be a problem for this.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From lists at cheimes.de  Thu Oct 11 20:10:32 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 20:10:32 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <F5C54D7B-9E20-4B14-9CB1-C44F573FA783@acm.org>
References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de>
	<F5C54D7B-9E20-4B14-9CB1-C44F573FA783@acm.org>
Message-ID: <470E6718.2080909@cheimes.de>

Fred Drake wrote:
> hotshot should no longer be a problem for this.

Thanks Fred!

Unfortunately the anon svn server is down again. It's the second time
this week. Something must be wrong with the Apache server for
svn.python.org.

Christian

From guido at python.org  Thu Oct 11 20:23:22 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 11:23:22 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E64EB.2070203@cheimes.de>
References: <470E3C4F.5020707@cheimes.de>
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
	<470E5CBF.3030103@cheimes.de>
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>
	<470E64EB.2070203@cheimes.de>
Message-ID: <ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>

On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > Um, where does the filename object in that expression come from? It
> > appears to be a PyString object. Who created it? That code should be
> > changed to create a PyUnicode instead (using the filesystem encoding).
>
> Python/compile.c:makecode()
> filename = PyString_FromString(c->c_filename);
>
> Modules/pyexpat.c:getcode()
> filename = PyString_FromString(__FILE__);
>
> Objects/codeobject.c:code_new()
> PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code"
>
> As I tried to explain earlier that may be a problem. PyUnicode_Decode()
> doesn't work so early. The codecs package isn't initialized yet.

But some codecs are "built-in" and have custom APIs. I wonder if we
could do something that figures out the default fs encoding, and see
if it is one of the supported ones, and then uses that; otherwise
tries UTF-8 with the "replace" error handling option (so it won't fail
if the data is non-UTF-8).
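
To make the idea concrete, here is a rough Python-level sketch of that
lookup-then-fallback logic. The names BUILTIN_ENCODINGS, normalize and
decode_fs_default are invented for illustration, the set of "built-in"
encodings is an assumption taken from the list mentioned earlier in this
thread, and falling back when the chosen decoder fails is an extra
assumption on top of the proposal. The real thing would have to be C code
that runs before the codecs package is importable.

```python
import sys

# Encodings assumed to have decoders compiled into the interpreter core.
# This mirrors the names mentioned in the thread; it is not authoritative.
BUILTIN_ENCODINGS = {"utf8", "utf16", "utf32", "latin1", "iso88591", "ascii"}


def normalize(name):
    """Lower-case an encoding name and drop characters such as '-' or '_'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def decode_fs_default(data, fs_encoding=None):
    """Decode `data` with the fs encoding if that encoding is built in.

    Otherwise, or if that decoding fails, fall back to UTF-8 with the
    'replace' error handler so the call cannot fail on odd bytes.
    """
    encoding = fs_encoding or sys.getfilesystemencoding() or "utf-8"
    if normalize(encoding) in BUILTIN_ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            pass  # fall through to the lossy fallback
    return data.decode("utf-8", "replace")
```

The price of the fallback is that undecodable bytes turn into U+FFFD, so
such a filename can be displayed but no longer round-trips to the original
bytes.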

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Thu Oct 11 22:11:49 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Thu, 11 Oct 2007 13:11:49 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
	<470E5CBF.3030103@cheimes.de>
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>
	<470E64EB.2070203@cheimes.de>
	<ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>
Message-ID: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>

On 10/11/07, Guido van Rossum <guido at python.org> wrote:
>
> On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
> > Guido van Rossum wrote:
> > > Um, where does the filename object in that expression come from? It
> > > appears to be a PyString object. Who created it? That code should be
> > > changed to create a PyUnicode instead (using the filesystem encoding).
> >
> > Python/compile.c:makecode()
> > filename = PyString_FromString(c->c_filename);
> >
> > Modules/pyexpat.c:getcode()
> > filename = PyString_FromString(__FILE__);
> >
> > Objects/codeobject.c:code_new()
> > PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code"
> >
> > As I tried to explain earlier that may be a problem. PyUnicode_Decode()
> > doesn't work so early. The codecs package isn't initialized yet.
>
> But some codecs are "built-in" and have custom APIs. I wonder if we
> could do something that figures out the default fs encoding, and see
> if it is one of the supported ones, and then uses that; otherwise
> tries UTF-8 with the "replace" error handling option (so it won't fail
> if the data is non-UTF-8).
>

That's pretty much what Christian pondered at the start of this thread, but
with a defined "failure" mode.

+1 from me, give it a try and see what 3.0a2 testers say.  Are there OSes
and filesystems out there that'd store filenames in anything other than one
of the popular codecs (UTF-8, 16, 32, latin1, mbcs)?  That seems like a bad
idea to me, but obviously I don't run the world.

-gps

From lists at cheimes.de  Thu Oct 11 23:23:22 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 23:23:22 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>	
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>	
	<470E5CBF.3030103@cheimes.de>	
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>	
	<470E64EB.2070203@cheimes.de>	
	<ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>
	<52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
Message-ID: <470E944A.80201@cheimes.de>

Gregory P. Smith wrote:
> That's pretty much what Christian pondered at the start of this thread but
> with a defined "failure" mode.
> 
> +1 from me, give it a try and see what 3.0a2 testers say.  Are there OSes
> and filesystems out there that'd store in anything other than one of the
> popular codecs (UTF-8, 16, 32, latin1, mbcs)?  That seems like a bad idea to
> me but obviously I don't run the world.

I've implemented the method, but my C is a bit rusty and not very good.
I'm not happy with the code, especially the large if/else block.

#include <ctype.h>
#include <string.h>

PyObject*
PyUnicode_DecodeFSDefault(const char *string, Py_ssize_t length,
                          const char *errors)
{
    PyObject *v = NULL;
    char encoding[32], mangled[32], *encptr, *manptr;
    char tmp;

    /* the caller may not choose an error handler */
    if (errors != NULL)
        Py_FatalError("non-NULL errors in PyUnicode_DecodeFSDefault");
    if ((length == 0) && *string)
        length = (Py_ssize_t)strlen(string);

    strncpy(encoding,
           Py_FileSystemDefaultEncoding ?
           Py_FileSystemDefaultEncoding : "UTF-8",
           31);
    encoding[31] = '\0';

    encptr = encoding;
    manptr = mangled;
    /* lower the string and remove non alpha numeric chars like '-' */
    while(*encptr) {
       tmp = *encptr++;
       /* the ctype functions need an unsigned char value */
       if (!isalnum((unsigned char)tmp))
           continue;
       *manptr++ = (char)tolower((unsigned char)tmp);
    }
    *manptr = '\0';

    /* C strings have to be compared with strcmp(); == compares pointers */
    if (strcmp(mangled, "utf8") == 0)
        v = PyUnicode_DecodeUTF8(string, length, NULL);
    else if (strcmp(mangled, "utf16") == 0)
        v = PyUnicode_DecodeUTF16(string, length, NULL, NULL);
    else if (strcmp(mangled, "utf32") == 0)
        v = PyUnicode_DecodeUTF32(string, length, NULL, NULL);
    /* ISO-8859-15 is not Latin-1, so it is left to the fallback below */
    else if ((strcmp(mangled, "latin1") == 0) ||
             (strcmp(mangled, "iso88591") == 0))
        v = PyUnicode_DecodeLatin1(string, length, NULL);
    else if (strcmp(mangled, "ascii") == 0)
        v = PyUnicode_DecodeASCII(string, length, NULL);
#ifdef MS_WIN32
    else if (strcmp(mangled, "mbcs") == 0)
        v = PyUnicode_DecodeMBCS(string, length, NULL);
#endif

    if (v == NULL) {
        /* unknown encoding or failed decode: drop any pending error
           and decode as UTF-8, replacing undecodable bytes */
        PyErr_Clear();
        v = PyUnicode_DecodeUTF8(string, length, "replace");
    }

    return v;
}

From greg.ewing at canterbury.ac.nz  Fri Oct 12 01:00:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 12 Oct 2007 12:00:22 +1300
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
Message-ID: <470EAB06.2000000@canterbury.ac.nz>

Gregory P. Smith wrote:
> Should a pep3118 buffer api supporting object that produces a length 1 
> buffer also work for append and remove?

My thought is -- only if such an object is also usable in
any *other* context expecting an integer. And I don't think
that would be a good idea at all.

You can always use .extend(b'!') to append a byte that's
already inside another bytes object or other buffer-supporting
object.
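
A quick sketch of the behaviour argued for here, written against a
released Python 3, where the mutable type that PEP 3137 called "buffer"
is spelled bytearray. The rejected() helper is a made-up name for this
example.

```python
def rejected(call, *args):
    """Return the name of the exception `call(*args)` raises, or None."""
    try:
        call(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


buf = bytearray(b"hi")
buf.append(33)       # an int in range(256) is accepted
buf.extend(b"!")     # a bytes-like object goes through extend()

# Anything that is not an int in range(256) is refused by append()/insert().
append_bytes_error = rejected(bytearray().append, b"!")
append_str_error = rejected(bytearray().append, "33")
append_range_error = rejected(bytearray().append, 256)
insert_bytes_error = rejected(bytearray(b"abcdef").insert, 5, b"x")
```

With this split, append() and insert() never have to decide what a
multi-byte argument should mean; that case belongs to extend() or slice
assignment.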

(BTW, I'm worried that we're overloading the term "buffer"
here. Having it refer to both the buffer interface and
also certain types of object that hold data is getting
confusing.)

--
Greg

From greg.ewing at canterbury.ac.nz  Fri Oct 12 01:33:17 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 12 Oct 2007 12:33:17 +1300
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3C4F.5020707@cheimes.de>
References: <470E3C4F.5020707@cheimes.de>
Message-ID: <470EB2BD.30809@canterbury.ac.nz>

Christian Heimes wrote:
> I like to perform
> 
> filename = PyString_AsDecodedObject(filename,
> Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8",
> NULL);
> 
> in order to decode the string with either the fs encoding or UTF-8 but
> it's not possible. It's way too early in the bootstrapping process

How about just using ascii if the codec system isn't fully
operational? It would just mean that files needed during
bootstrapping would need to have pure-ascii filenames,
which doesn't seem like a serious restriction.

--
Greg

From lists at cheimes.de  Fri Oct 12 01:57:06 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 12 Oct 2007 01:57:06 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470EB2BD.30809@canterbury.ac.nz>
References: <470E3C4F.5020707@cheimes.de> <470EB2BD.30809@canterbury.ac.nz>
Message-ID: <470EB852.6040407@cheimes.de>

Greg Ewing wrote:
> How about just using ascii if the codec system isn't fully
> operational? It would just mean that files needed during
> bootstrapping would need to have pure-ascii filenames,
> which doesn't seem like a serious restriction.

The file names aren't the issue but the directory names are. For example
it may screw up a local installation in the user's application data
directory on Windows if the user name contains umlauts. Any kind of
installation in $HOME would cause trouble if $USER isn't plain ASCII.
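
A small sketch of the failure mode, written in present-day Python 3.
The path is hypothetical and UTF-8 is assumed as the file system
encoding; the point is only that an ASCII-only decoder rejects the
directory part of the path.

```python
# Hypothetical install location below a home directory with an umlaut.
home = "/home/j\u00f6rg/lib/python"
raw = home.encode("utf-8")   # assumed file system encoding

# An ASCII-only fallback cannot decode the directory name.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the encoding that produced the bytes round-trips cleanly.
roundtrip = raw.decode("utf-8")
```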

Christian

From qrczak at knm.org.pl  Fri Oct 12 02:34:03 2007
From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk)
Date: Fri, 12 Oct 2007 02:34:03 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>
	<470E5CBF.3030103@cheimes.de>
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>
	<470E64EB.2070203@cheimes.de>
	<ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>
	<52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
Message-ID: <1192149243.5288.13.camel@qrnik>

Dnia 11-10-2007, Cz o godzinie 13:11 -0700, Gregory P. Smith pisze:

> Are there OSes and filesystems out there that'd store in anything
> other than one of the popular codecs (UTF-8, 16, 32, latin1, mbcs)?

I've been using ISO-8859-2 by default on my Linux until February 2007.

Most filenames were not Polish and thus ASCII of course, and Evolution
used UTF-8 for internal filenames of its folders even with the locale
encoding being ISO-8859-2 (which accidentally helped with the migration
to UTF-8).

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From lists at cheimes.de  Fri Oct 12 05:32:44 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 12 Oct 2007 05:32:44 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <5d44f72f0710111941h5442f1f3k2dc2c1d2edc587eb@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de>	
	<ca471dc20710110958s1eeb05f6qbc4e0b1743bc6f23@mail.gmail.com>	
	<470E5CBF.3030103@cheimes.de>	
	<ca471dc20710111040m6ddd797ard9fce847a1ba03d2@mail.gmail.com>	
	<470E64EB.2070203@cheimes.de>	
	<ca471dc20710111123p4bdda2b5ocfe6b20e4c876ccb@mail.gmail.com>	
	<52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>	
	<470E944A.80201@cheimes.de>
	<5d44f72f0710111941h5442f1f3k2dc2c1d2edc587eb@mail.gmail.com>
Message-ID: <470EEADC.1050608@cheimes.de>

Jeffrey Yasskin wrote:
> On 10/11/07, Christian Heimes <lists at cheimes.de> wrote:
>>     if (mangled == "utf8")
> 
> FYI, this is always going to be false. It compares the pointer values,
> rather than the strings.

Doh! I've done too much Python programming in the past. I forgot that
I have to use strcmp(s1, s2) == 0 in order to compare two strings in C.

Thanks pal!

Christian

From lists at cheimes.de  Fri Oct 12 17:49:09 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 12 Oct 2007 17:49:09 +0200
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
Message-ID: <470F9775.4080405@cheimes.de>

Yesterday I found a design problem in the array module. Travis Oliphant
added a new typecode 'w' to the array module. 'w' is a wide unicode type
that is guaranteed to be at least 4 bytes long. The 'u' typecode may be
2 bytes long.

Unfortunately his change removed 'u' as a possible typecode which makes
it unnecessarily hard to write code that works on Windows (UCS2 only) and
Unix (UCS4 for most Linux distributions). I've written a patch that
keeps 'u' in every build and adds 'w' as an alias for 'u' in UCS-4
builds only. It also introduces the new module variable typecodes
which is a unicode string containing all valid typecodes.

http://bugs.python.org/issue1268
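
A sketch of how build-independent code might use such a variable,
assuming it is exposed as array.typecodes as described above. The
helper name pick_unicode_typecode is made up for illustration.

```python
import array

# A string containing every typecode this build of the array module accepts.
supported = array.typecodes

def pick_unicode_typecode():
    """Return 'w' if this build offers it, else 'u', else None.

    Hypothetical helper: it only inspects array.typecodes and never
    constructs an array, so it is safe on any build.
    """
    for code in ("w", "u"):
        if code in supported:
            return code
    return None

unicode_code = pick_unicode_typecode()
```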

Christian

From oliphant at enthought.com  Fri Oct 12 19:52:27 2007
From: oliphant at enthought.com (Travis E. Oliphant)
Date: Fri, 12 Oct 2007 12:52:27 -0500
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
In-Reply-To: <470F9775.4080405@cheimes.de>
References: <470F9775.4080405@cheimes.de>
Message-ID: <470FB45B.6060004@enthought.com>

Christian Heimes wrote:
> Yesterday I found a design problem in the array module. Travis Oliphant
> added a new typecode 'w' to the array module. 'w' is a wide unicode type
> that is guaranteed to be at least 4 bytes long. The 'u' typecode may be
> 2 bytes long.
>
> Unfortunately his change removed 'u' as a possible typecode which makes
> it unnecessarily hard to write code that works on Windows (UCS2 only) and
> Unix (UCS4 for most Linux distributions). I've written a patch that
> keeps 'u' in every build and adds 'w' as an alias for 'u' in UCS-4
> builds only. It also introduces the new module variable typecodes
> which is a unicode string containing all valid typecodes.
>   
The problem is to keep the array typecodes somewhat consistent with the 
typecodes in PEP 3118 which will be in the struct module.  

How about making 'U' be the typecode that translates to 'u' or 'w' 
depending on the platform and supporting both 'u' and 'w' on all 
platforms by appropriate translation of bytes on getting and setting?

-Travis


From lists at cheimes.de  Fri Oct 12 20:14:41 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 12 Oct 2007 20:14:41 +0200
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
In-Reply-To: <470FB45B.6060004@enthought.com>
References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com>
Message-ID: <470FB991.7010403@cheimes.de>

Travis E. Oliphant wrote:
> The problem is to keep the array typecodes somewhat consistent with the
> typecodes in PEP 3118 which will be in the struct module. 
> How about making 'U' be the typecode that translates to 'u' or 'w'
> depending on the platform and supporting both 'u' and 'w' on all
> platforms by appropriate translation of bytes on getting and setting?

Now I see your point. :) Your solution sounds feasible but is it
realizable on all platforms? I once hit a thick wall of bricks during my
work on PythonNET. I tried to make it compatible with Mono and UCS-4
builds of Python but it was really hard because the .NET standards don't
care about anything else than a 16bit wchar_t which doesn't even
translate to UTF-16. I fear that 'w' may hit a similar wall on Windows.

Should PEP 3118 and the array module have a 'U' typecode, too? It may
prove useful for platform and build independent software to have a
typecode that translates to the native unicode type (UCS-2 or UCS-4).

Christian

From python3now at gmail.com  Fri Oct 12 21:37:24 2007
From: python3now at gmail.com (James Thiele)
Date: Fri, 12 Oct 2007 12:37:24 -0700
Subject: [Python-3000] PEP 3105 "Backward Compatibility"
Message-ID: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com>

I was reading PEP 3105 -- Make print a function and in the section
"Backwards Compatibility" found the following statement:
"The changes proposed in this PEP will render most of today's print
statements invalid, only those which incidentally feature parentheses
around all of their arguments will continue to be valid Python syntax
in version 3.0."
--
They may both be valid syntax, but they may not do  the same thing:
$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
>>> print (1,2)
(1, 2)
>>>
$ python3.0
Python 3.0a1 (py3k:57844, Aug 31 2007, 08:01:11)
>>> print (1,2)
1 2
--
It might be useful to note this in the PEP.
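
The same difference can be reproduced inside a single 3.x interpreter.
This is only a sketch: the helper printed() is made up here and
captures what print() writes so the two readings can be compared.

```python
import io

def printed(*args):
    """Return the text print() would write for the given arguments."""
    buf = io.StringIO()
    print(*args, file=buf)
    return buf.getvalue()

# In 3.x, `print (1,2)` is a call with two arguments.
as_two_args = printed(1, 2)

# In 2.x, `print (1,2)` printed a single tuple; in 3.x that needs
# an extra pair of parentheses.
as_one_tuple = printed((1, 2))
```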

James

From guido at python.org  Fri Oct 12 22:25:46 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Oct 2007 13:25:46 -0700
Subject: [Python-3000] PEP 3105 "Backward Compatibility"
In-Reply-To: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com>
References: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com>
Message-ID: <ca471dc20710121325k2083c4f3w6513fc8720fca5b2@mail.gmail.com>

Good point. I added a few examples to the PEP.

--Guido

On 10/12/07, James Thiele <python3now at gmail.com> wrote:
> I was reading PEP 3105 -- Make print a function and in the section
> "Backwards Compatibility" found the following statement:
> "The changes proposed in this PEP will render most of today's print
> statements invalid, only those which incidentally feature parentheses
> around all of their arguments will continue to be valid Python syntax
> in version 3.0."
> --
> They may both be valid syntax, but they may not do  the same thing:
> $ python
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> >>> print (1,2)
> (1, 2)
> >>>
> $ python3.0
> Python 3.0a1 (py3k:57844, Aug 31 2007, 08:01:11)
> >>> print (1,2)
> 1 2
> --
> It might be useful to note this in the PEP.
>
> James


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant.travis at ieee.org  Fri Oct 12 23:37:33 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri, 12 Oct 2007 16:37:33 -0500
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
In-Reply-To: <470FB991.7010403@cheimes.de>
References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com>
	<470FB991.7010403@cheimes.de>
Message-ID: <470FE91D.5010001@ieee.org>

Christian Heimes wrote:
> Travis E. Oliphant wrote:
>> The problem is to keep the array typecodes somewhat consistent with the
>> typecodes in PEP 3118 which will be in the struct module. 
>> How about making 'U' be the typecode that translates to 'u' or 'w'
>> depending on the platform and supporting both 'u' and 'w' on all
>> platforms by appropriate translation of bytes on getting and setting?
> 
> Now I see your point. :) Your solution sounds feasible but is it
> realizable on all platforms? I once hit a thick wall of bricks during my
> work on PythonNET. I tried to make it compatible with Mono and UCS-4
> builds of Python but it was really hard because the .NET standards don't
> care about anything else than a 16bit wchar_t which doesn't even
> translate to UTF-16. I fear that 'w' may hit a similar wall on Windows.
> 

I think it would be feasible, but I'm not sure it is worth it at this 
point.   My suggestion right now (and what I've done) is to back-out the 
'w' typecode for the array module and just leave it as 'u' as before.

I'll check in this change.

-Travis


From greg at krypto.org  Fri Oct 12 23:55:46 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 12 Oct 2007 14:55:46 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
Message-ID: <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>

> - add missing methods to PyBytes (for list, see the PEP and compare to
> > what's already there)
> >
>
As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
the following methods inplace and return an additional reference to self?

.capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
.upper(), .zfill()

Also what about .replace() and .translate()?

If they are not done in place should they return a new buffer (PyBytes_)
object or a bytes (PyString_) object?  [i'd say a buffer (PyBytes_)]

Also if not, should we add additional .ireplace() .ilower() etc.. methods to
the mutable buffer (PyBytes_)?  There are speed advantages to doing many of
those in place rather than a data copy.

-gps

From guido at python.org  Sat Oct 13 03:20:44 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 12 Oct 2007 18:20:44 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
Message-ID: <ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>

On 10/12/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > what's already there)
>
> As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
> the following methods inplace and return an additional reference to self?
>
> .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
> .upper(), .zfill()

No... That would be a huge trap to fall in at all sorts of occasions.

> Also what about .replace() and .translate()?

> If they are not done in place should they return a new buffer (PyBytes_)
> object or a bytes (PyString_) object?  [i'd say a buffer (PyBytes_)]

They should return the same type as 'self'.

> Also if not, should we add additional .ireplace() .ilower() etc.. methods to
> the mutable buffer (PyBytes_)?  There are speed advantages to doing many of
> those in place rather than a data copy.

I'm not sure I see the use case where this matters all that much
though. Let's say not, if only because it's not in the PEP. ;-)
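
A short sketch of the behaviour described here, using the names from
released Python 3, where the mutable type is called bytearray rather
than buffer: the string-like methods leave the receiver untouched and
return a new object of the same type as 'self'.

```python
buf = bytearray(b"http header")

# The methods return new objects; buf itself is not modified.
upper = buf.upper()
replaced = buf.replace(b"http", b"HTTP")

# The immutable type behaves the same way and returns its own type.
frozen_upper = b"http".upper()
```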

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Sat Oct 13 15:38:08 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 13 Oct 2007 15:38:08 +0200
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
In-Reply-To: <470FE91D.5010001@ieee.org>
References: <470F9775.4080405@cheimes.de>
	<470FB45B.6060004@enthought.com>	<470FB991.7010403@cheimes.de>
	<470FE91D.5010001@ieee.org>
Message-ID: <4710CA40.7080705@cheimes.de>

Travis Oliphant wrote:
> I think it would be feasible, but I'm not sure it is worth it at this 
> point.   My suggestion right now (and what I've done) is to back-out the 
> 'w' typecode for the array module and just leave it as 'u' as before.

Thanks! I've seen that you've also checked in my typecodes addition to
arraymodule.c. Do you think it's worth backporting to 2.6?

The table
http://www.python.org/dev/peps/pep-3118/#additions-to-the-struct-string-syntax
isn't exactly clear to me. I *guess* 'u' means UCS-2 on all platforms
and builds of Python - even UCS-4 builds - and 'w' is only available on
wide builds. I suggest that you place emphasis on the size to make the
table unambiguous. I know that I'm nitpicking but documentation should
be crystal clear. ;)

If I'm correct with my assumption about 'u' and 'w' your suggestion of a
native 'U' could come in handy.

Christian


From qrczak at knm.org.pl  Sat Oct 13 16:19:36 2007
From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk)
Date: Sat, 13 Oct 2007 16:19:36 +0200
Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds
In-Reply-To: <4710CA40.7080705@cheimes.de>
References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com>
	<470FB991.7010403@cheimes.de> <470FE91D.5010001@ieee.org>
	<4710CA40.7080705@cheimes.de>
Message-ID: <1192285176.3643.2.camel@qrnik>

Dnia 13-10-2007, So o godzinie 15:38 +0200, Christian Heimes pisze:

> If I'm correct with my assumption about 'u' and 'w' your suggestion of a
> native 'U' could come in handy.

Wouldn't it be nicer if 'u' and 'U' corresponded to \uxxxx and
\Uxxxxxxxx, i.e. UCS-2 and UCS-4, and something else was used for
the native width?
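
To make the two escape forms concrete, here is a sketch in present-day
Python 3. The UTF-16 and UTF-32 codecs stand in for the 2-byte and
4-byte widths; that substitution is an assumption of the example, not
something the typecode table defines.

```python
bmp_char = "\u00e9"          # \uxxxx form: four hex digits
astral_char = "\U0001F600"   # \Uxxxxxxxx form: eight hex digits

# Number of 16-bit units needed for each character.
bmp_units16 = len(bmp_char.encode("utf-16-le")) // 2
astral_units16 = len(astral_char.encode("utf-16-le")) // 2

# Number of 32-bit units needed for the character outside the 16-bit range.
astral_units32 = len(astral_char.encode("utf-32-le")) // 4
```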

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From jimjjewett at gmail.com  Mon Oct 15 15:57:20 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 15 Oct 2007 09:57:20 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
Message-ID: <fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>

On 10/12/07, Guido van Rossum <guido at python.org> wrote:
> On 10/12/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > > what's already there)

> > As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
> > the following methods inplace and return an additional reference to self?

> > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
> > .upper(), .zfill()

> No... That would be a huge trap to fall in at all sorts of occasions.

So would returning a different object.  I expect a mutation operation
on an explicitly mutable object to mutate the object, instead of
creating something new.

If it returns a new one, I can imagine doing something like:

    obj.inqueue=bytesbuffer(100)
    obj.inqueue.lower()   # oh, wait, that didn't really do anything after all...
    if obj.inqueue[:4] == b"http":   # works on my *regular* input...

Maybe the answer is "don't do that", and to only do this sort of
processing before it goes in the buffer or after it comes out, but ...
it still looks like a major gotcha.
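
A sketch of the gotcha and of one way around it, using bytearray, the
name the mutable type has in released Python 3. Slice assignment is
just one possible idiom for getting the in-place effect.

```python
inqueue = bytearray(b"HTTP/1.1 200 OK")
same_object = inqueue

# The result is discarded, so the buffer is left as it was.
inqueue.lower()
after_discard = bytes(inqueue)

# Slice assignment writes the lowered copy back into the same object.
inqueue[:] = inqueue.lower()
starts_with_http = inqueue[:4] == b"http"
```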

-jJ

From guido at python.org  Mon Oct 15 16:49:31 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 07:49:31 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
Message-ID: <ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>

On 10/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 10/12/07, Guido van Rossum <guido at python.org> wrote:
> > On 10/12/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > > > what's already there)
>
> > > As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
> > > the following methods inplace and return an additional reference to self?
>
> > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
> > > .upper(), .zfill()
>
> > No... That would be a huge trap to fall in at all sorts of occasions.
>
> So would returning a different object.  I expect a mutation operation
> on an explicitly mutable object to mutate the object, instead of
> creating something new.
>
> If it returns a new one, I can imagine doing something like:
>
>     obj.inqueue=bytesbuffer(100)
>     obj.inqueue.lower()   # oh, wait, that didn't really do anything after all...
>     if obj.inqueue[:4] == b"http":   # works on my *regular* input...
>
> Maybe the answer is "don't do that", and to only do this sort of
> processing before it goes in the buffer or after it comes out, but ...
> it still looks like a major gotcha.

Since these methods with these very names already exist for strings
and return new values there, I don't see the gotcha unless you never
use strings.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jjb5 at cornell.edu  Mon Oct 15 18:20:24 2007
From: jjb5 at cornell.edu (Joel Bender)
Date: Mon, 15 Oct 2007 12:20:24 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
Message-ID: <47139348.5090006@cornell.edu>

Speaking from the protocol encoding/decoding view, and one where a 
buffer is very similar to a list of small integers...

>> Also what about .replace() and .translate()?
> 
>> If they are not done in place should they return a new buffer (PyBytes_)
>> object or a bytes (PyString_) object?  [i'd say a buffer (PyBytes_)]
> 
> They should return the same type as 'self'.

My preference would be to do the work in place and return None, just 
like sorting a list, reversing a list, appending to a list, etc.

>> Also if not, should we add additional .ireplace() .ilower() etc.. methods to
>> the mutable buffer (PyBytes_)?  There are speed advantages to doing many of
>> those in place rather than a data copy.
> 
> I'm not sure I see the use case where this matters all that much
> though. Let's say not, if only because it's not in the PEP. ;-)

I would appreciate it if these functions were list-like and not 
tuple-like.  In extending buffers to support more structure encoding and 
decoding functions, it would be nice to carry the expectation that these 
extensions mutate the buffer and I can leverage the built-in 
functionality to do that.

I am but a small voice in the chorus.


Joel

From guido at python.org  Mon Oct 15 18:28:12 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 09:28:12 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <47139348.5090006@cornell.edu>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<47139348.5090006@cornell.edu>
Message-ID: <ca471dc20710150928o1bb4853bte3b6d2d1c2ebe2f6@mail.gmail.com>

On 10/15/07, Joel Bender <jjb5 at cornell.edu> wrote:
> Speaking from the protocol encoding/decoding view, and one where a
> buffer is very similar to a list of small integers...
>
> >> Also what about .replace() and .translate()?
> >
> >> If they are not done in place should they return a new buffer (PyBytes_)
> >> object or a bytes (PyString_) object?  [i'd say a buffer (PyBytes_)]
> >
> > They should return the same type as 'self'.
>
> My preference would be to do the work in place and return None, just
> like sorting a list, reversing a list, appending to a list, etc.

Then propose new APIs that don't have the same names as the existing
ones, which are amongst the most well-known APIs in all of Python.

> >> Also if not, should we add additional .ireplace() .ilower() etc.. methods to
> >> the mutable buffer (PyBytes_)?  There are speed advantages to doing many of
> >> those in place rather than a data copy.
> >
> > I'm not sure I see the use case where this matters all that much
> > though. Let's say not, if only because it's not in the PEP. ;-)
>
> I would appreciate it if these functions were list-like and not
> tuple-like.  In extending buffers to support more structure encoding and
> decoding functions, it would be nice to carry the expectation that these
> extensions mutate the buffer and I can leverage the built-in
> functionality to do that.

The existing mutable PyBytes type (which will be known as 'buffer' in
3.0a2 and beyond) *does* have a number of list-like methods:
.append(), .insert(), .extend(). Also += will work in place. And of
course slice assignment works.

For structure encoding/decoding, please have a look at the existing
APIs in the struct module and let us know what's missing.
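
A sketch of both points, assuming released Python 3, where the mutable
type is named bytearray. The packet layout (a 16-bit field followed by
a 32-bit field, big-endian) is invented for the example.

```python
import struct

buf = bytearray(b"ab")
alias = buf              # second reference, to show mutation happens in place

buf.append(0x63)         # b"abc"
buf.insert(0, 0x5F)      # b"_abc"
buf.extend(b"de")        # b"_abcde"
buf += b"!"              # b"_abcde!", still the same object
buf[1:3] = b"XY"         # b"_XYcde!", slice assignment

# struct can write directly into a mutable buffer and read it back.
packet = bytearray(6)
struct.pack_into(">HI", packet, 0, 1, 2)
fields = struct.unpack_from(">HI", packet, 0)
```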

> I am but a small voice in the chorus.

There is no rule that PEPs need to be written by senior developers!
All you need to be able to do in order to *write* a good PEP is to
*listen* well.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Mon Oct 15 18:33:48 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 15 Oct 2007 18:33:48 +0200
Subject: [Python-3000] Should PyString (new bytes type) accept strings with
	encoding?
Message-ID: <4713966C.8080904@cheimes.de>

I'm working on the renaming of str8 -> bytes and bytes -> buffer.
PyBytes (old bytes, new buffer) can take a string together with an
encoding and an optional error argument:


>>> bytes(source="abc", encoding="ascii", errors="replace")
b'abc'
>>> str(b"abc", encoding="ascii")
'abc'

IMO this should work
>>> str8("abc", encoding="ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'encoding' is an invalid keyword argument for this function

And this should break with a type error
>>> str8("abc")
b'abc'


PyString's constructor doesn't take strings (PyUnicode). I'd like to add
the support for strings to it. It makes the API of str, bytes and buffer
consistent and fixes a *lot* of broken code and tests.
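
For reference, a sketch of how this looks in released Python 3, where
the immutable type is bytes and the mutable one is named bytearray
(not buffer): both accept a string only together with an encoding.

```python
# A string plus an encoding is accepted by both byte types.
immutable = bytes("abc", "ascii")
mutable = bytearray("abc", "ascii")

# A string without an encoding is rejected.
try:
    bytes("abc")
    needs_encoding = False
except TypeError:
    needs_encoding = True

# The reverse direction: bytes plus an encoding gives a string.
text = str(b"abc", encoding="ascii")
```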

Are you confused by the name changes? I'm sometimes confused so I made a
table:

 c name   |  old  |   new  |  repr
-------------------------------------------
PyUnicode | str   |   -    | ''
PyString  | str8  | bytes  | b''
PyBytes   | bytes | buffer | buffer(b'')

Christian

From guido at python.org  Mon Oct 15 18:49:17 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 09:49:17 -0700
Subject: [Python-3000] Should PyString (new bytes type) accept strings
	with encoding?
In-Reply-To: <4713966C.8080904@cheimes.de>
References: <4713966C.8080904@cheimes.de>
Message-ID: <ca471dc20710150949p1193171ewd01483297120978d@mail.gmail.com>

On 10/15/07, Christian Heimes <lists at cheimes.de> wrote:
> I'm working on the renaming of str8 -> bytes and bytes -> buffer.
> PyBytes (old bytes, new buffer) can take a string together with an
> encoding and an optional error argument:
>
>
> >>> bytes(source="abc", encoding="ascii", errors="replace")
> b'abc'
> >>> str(b"abc", encoding="ascii")
> 'abc'

Correct.

> IMO this should work
> >>> str8("abc", encoding="ascii")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'encoding' is an invalid keyword argument for this function

Yes, this should work. (I thought it already did but was wrong. ;-)

> And this should break with a type error
> >>> str8("abc")
> b'abc'

Correct.

> PyString's constructor doesn't take strings (PyUnicode). I'd like to add
> the support for strings to it. It makes the API of str, bytes and buffer
> consistent and fixes a *lot* of broken code and tests.

Right.

> Are you confused by the name changes? I'm sometimes confused so I made a
> table:
>
>  c name   |  old  |   new  |  repr
> -------------------------------------------
> PyUnicode | str   |   -    | ''
> PyString  | str8  | bytes  | b''
> PyBytes   | bytes | buffer | buffer(b'')

I'd rewrite this as follows:

C name    | 2.x          | 3.0a1      | 3.0a2               |
----------+--------------+------------+---------------------+
PyUnicode | unicode  u"" | str     "" | str             ""  |
PyString  | str       "" | str8   s"" | bytes           ""  |
PyBytes   | N/A          | bytes  b"" | buffer  buffer(b"") |
----------+--------------+------------+---------------------+

Seems worth adding to the PEP. I'll do that.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Mon Oct 15 18:54:43 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 15 Oct 2007 10:54:43 -0600
Subject: [Python-3000] Should PyString (new bytes type) accept strings
	with encoding?
In-Reply-To: <ca471dc20710150949p1193171ewd01483297120978d@mail.gmail.com>
References: <4713966C.8080904@cheimes.de>
	<ca471dc20710150949p1193171ewd01483297120978d@mail.gmail.com>
Message-ID: <d11dcfba0710150954g2d799094wb86bb123003c36e2@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> C name    | 2.x          | 3.0a1      | 3.0a2               |
> ----------+--------------+------------+---------------------+
> PyUnicode | unicode  u"" | str     "" | str             ""  |
> PyString  | str       "" | str8   s"" | bytes           ""  |
> PyBytes   | N/A          | bytes  b"" | buffer  buffer(b"") |
> ----------+--------------+------------+---------------------+

That "" beside bytes in the 3.0a2 column should be b"" (that is, with
a "b" prefix), right?

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Mon Oct 15 18:59:30 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 09:59:30 -0700
Subject: [Python-3000] Should PyString (new bytes type) accept strings
	with encoding?
In-Reply-To: <d11dcfba0710150954g2d799094wb86bb123003c36e2@mail.gmail.com>
References: <4713966C.8080904@cheimes.de>
	<ca471dc20710150949p1193171ewd01483297120978d@mail.gmail.com>
	<d11dcfba0710150954g2d799094wb86bb123003c36e2@mail.gmail.com>
Message-ID: <ca471dc20710150959k7c886fb7tead47e563382f016@mail.gmail.com>

Correct. Sorry. Here's an improved table that I'm also adding to the PEP:

C name       | 2.x    repr | 3.0a1 repr | 3.0a2         repr
-------------+-------------+------------+-------------------
PyUnicode    | unicode u"" | str     "" | str             ""
PyString     | str      "" | str8   s"" | bytes          b""
PyBytes      | N/A         | bytes  b"" | buffer buffer(b"")
PyBuffer     | buffer  N/A | buffer N/A | N/A
PyMemoryView | N/A         | N/A        | memoryview     N/A
-------------+-------------+------------+-------------------

--Guido
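
The reprs in the last column can be checked on a released Python 3, with
one assumption worth flagging: the mutable type called "buffer" in this
thread is assumed here to be what later shipped as bytearray. This is a
small illustration, not text from the PEP.

```python
# Assumption: the 3.0a2 "buffer" type discussed here is today's bytearray.
text = ""
immutable = b""
mutable = bytearray(b"")
view = memoryview(immutable)

# Collect the repr of an empty instance of each type.
reprs = {
    "str": repr(text),
    "bytes": repr(immutable),
    "bytearray": repr(mutable),
}
view_type = type(view).__name__

for name, shown in reprs.items():
    print(f"{name:10} {shown}")
```

The bytes literal carries the "b" prefix, which is the correction raised
above.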

On 10/15/07, Steven Bethard <steven.bethard at gmail.com> wrote:
>
> On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> > C name    | 2.x          | 3.0a1      | 3.0a2               |
> > ----------+--------------+------------+---------------------+
> > PyUnicode | unicode  u"" | str     "" | str             ""  |
> > PyString  | str       "" | str8   s"" | bytes           ""  |
> > PyBytes   | N/A          | bytes  b"" | buffer  buffer(b"") |
> > ----------+--------------+------------+---------------------+
>
> That "" beside bytes in the 3.0a2 column should be b"" (that is, with
> a "b" prefix), right?
>
> STeVe
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
>         --- Bucky Katt, Get Fuzzy
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From christian at cheimes.de  Mon Oct 15 18:39:23 2007
From: christian at cheimes.de (Christian Heimes)
Date: Mon, 15 Oct 2007 18:39:23 +0200
Subject: [Python-3000] Should PyString (new bytes type) accept strings
	with encoding?
In-Reply-To: <4713966C.8080904@cheimes.de>
References: <4713966C.8080904@cheimes.de>
Message-ID: <471397BB.5070806@cheimes.de>

Doh, the answer is in the PEP. Please ignore the other mail :)

http://www.python.org/dev/peps/pep-3137/#constructors

Christian

From tjreedy at udel.edu  Mon Oct 15 18:41:26 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 15 Oct 2007 12:41:26 -0400
Subject: [Python-3000] PEP 3137 plan of attack
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com><52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com><52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com><ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com><fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
Message-ID: <ff057k$hbi$1@ger.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20710150749y70ba12cfmadf1c59974c61926 at mail.gmail.com...
| > > > As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
| > > > the following methods inplace and return an additional reference to self?
| >
| > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
| > > > .upper(), .zfill()
| >
| > > No... That would be a huge trap to fall in at all sorts of occasions.

At this point, I thought your objection was to returning the buffer instead 
of None, as with list mutations, and for the same reason.  But admittedly, 
some people do not like this feature of lists.

| > So would returning a different object.  I expect a mutation operation
| > on an explicitly mutable object to mutate the object, instead of
| > creating something new.

So was I.

| Since these methods with these very names already exist for strings
| and return new values there, I don't see the gotcha unless you never
| use strings.

The real question is what is more useful?  I would think that being able to 
edit in place would be a reason to use a buffer rather than (immutable) 
bytes.

tjr






From greg at krypto.org  Mon Oct 15 19:55:43 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 15 Oct 2007 10:55:43 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710150928o1bb4853bte3b6d2d1c2ebe2f6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<47139348.5090006@cornell.edu>
	<ca471dc20710150928o1bb4853bte3b6d2d1c2ebe2f6@mail.gmail.com>
Message-ID: <52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com>

> > >> Also what about .replace() and .translate()?
> > >
> > >> If they are not done in place should they return a new buffer
> (PyBytes_)
> > >> object or a bytes (PyString_) object?  [i'd say a buffer (PyBytes_)]
> > >
> > > They should return the same type as 'self'.
> >
> > My preference would be to do the work in place and return None, just
> > like sorting a list, reversing a list, appending to a list, etc.
>
> Then propose new APIs that don't have the same names as the existing
> ones, which are amongst the most well-known APIs in all of Python.


Agreed, that's why I suggest new method names with an 'i' in front for
inplace.  Anyways I'll be done with my patch to add the copying versions of
the methods later today.  Stay tuned.
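
A minimal sketch of what an 'i'-prefixed in-place variant could look like.
The name ilower is hypothetical and is written as a free function on
bytearray (assumed here to be the released name of the mutable "buffer"
type); nothing in this thread or the PEP defines it.

```python
def ilower(buf: bytearray) -> None:
    """Hypothetical in-place counterpart of the copying lower() method."""
    for i, byte in enumerate(buf):
        # Only ASCII 'A'..'Z' are touched; item assignment keeps the length
        # fixed, so mutating while enumerating is safe here.
        if 0x41 <= byte <= 0x5A:
            buf[i] = byte + 0x20
    # Like list.sort(), a mutating operation returns None implicitly.


data = bytearray(b"HTTP/1.1")
alias = data
result = ilower(data)
```

Because the distinct name signals mutation, the well-known lower() can keep
its copying behaviour.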

From greg at krypto.org  Mon Oct 15 19:58:15 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 15 Oct 2007 10:58:15 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ff057k$hbi$1@ger.gmane.org>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
	<ff057k$hbi$1@ger.gmane.org>
Message-ID: <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>

On 10/15/07, Terry Reedy <tjreedy at udel.edu> wrote:
>
>
> "Guido van Rossum" <guido at python.org> wrote in message
> news:ca471dc20710150749y70ba12cfmadf1c59974c61926 at mail.gmail.com...
> | > > > As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
> | > > > the following methods inplace and return an additional reference to self?
> | >
> | > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
> | > > > .upper(), .zfill()
> | >
> | > > No... That would be a huge trap to fall in at all sorts of occasions.
>
> At this point, I thought your objection was to returning the buffer instead
> of None, as with list mutations, and for the same reason.  But admittedly,
> some people do not like this feature of lists.
>
> | > So would returning a different object.  I expect a mutation operation
> | > on an explicitly mutable object to mutate the object, instead of
> | > creating something new.
>
> So was I.
>
> | Since these methods with these very names already exist for strings
> | and return new values there, I don't see the gotcha unless you never
> | use strings.
>
> The real question is what is more useful?  I would think that being able to
> edit in place would be a reason to use a buffer rather than (immutable)
> bytes.
>
> tjr


I agree, that's a benefit of a mutable object.  But I think the point about
not reusing the names with a different behavior is valid, so that some code
can be written to operate on duck-typed objects without having to know
whether they are mutable or not.

-gps

From jimjjewett at gmail.com  Mon Oct 15 20:11:35 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 15 Oct 2007 14:11:35 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
	<ff057k$hbi$1@ger.gmane.org>
	<52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>
Message-ID: <fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>

On 10/15/07, Gregory P. Smith <greg at krypto.org> wrote:
> On 10/15/07, Terry Reedy <tjreedy at udel.edu> wrote:

> > ...I would think that being able to edit in place would be a reason
> > to use a buffer rather than (immutable) bytes.

> I agree, that's a benefit of a mutable object.  But I think the point about
> not reusing the names with a different behavior is valid so that some
> code can be written to operate on objects with duck type without
> having to know if it's mutable or not.

I thought that was the reason to return self instead of None.

If returning the original (but mutated) buffer is a problem, then
there is already a problem, because someone else could already mutate
the original.

(Also note that for duck-typing, it should be OK if the new result
object is always immutable, since you have to handle that case
anyhow.)

-jJ

From luke.stebbing at gmail.com  Mon Oct 15 20:33:46 2007
From: luke.stebbing at gmail.com (Luke Stebbing)
Date: Mon, 15 Oct 2007 11:33:46 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
	<ff057k$hbi$1@ger.gmane.org>
	<52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>
	<fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
Message-ID: <dcb1979a0710151133s3653b533g5c8bdd6a8f7d248d@mail.gmail.com>

On 10/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> If returning the original (but mutated) buffer is a problem, then
> there is already a problem, because someone else could already mutate
> the original.
>
> (Also note that for duck-typing, it should be OK if the new result
> object is always immutable, since you have to handle that case
> anyhow.)

Changing the contract of a function can really mess with duck-typing.
If you write a function that internally creates a lowered copy of a
variable (for comparison, say), suddenly you're unintentionally
lowering your argument in-place. Even returning an immutable result
object is a problem, because your contract changes from "I return a
lowered, rjusted copy of my argument" to "I return a lowered rjusted
copy of my argument that -- oops -- is immutable now if it wasn't
before".
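
A short sketch of the duck-typing concern, assuming the copying behaviour
and using bytearray as the released name of the mutable "buffer" type. The
helper name starts_with_http is made up for illustration.

```python
def starts_with_http(data) -> bool:
    """Case-insensitive prefix check that works for bytes and bytearray."""
    # lower() returns a new object, so the caller's argument is untouched.
    return data[:4].lower() == b"http"


immutable = b"HTTP/1.1 200 OK"
mutable = bytearray(b"HTTP/1.1 200 OK")

results = (starts_with_http(immutable), starts_with_http(mutable))
```

If lower() mutated in place, the second call would silently rewrite the
caller's buffer, which is the contract change described above.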

Luke

From rhamph at gmail.com  Mon Oct 15 20:40:29 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 15 Oct 2007 12:40:29 -0600
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
Message-ID: <aac2c7cb0710151140q5dc0dd92i604baaccdf3c61af@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> On 10/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > On 10/12/07, Guido van Rossum <guido at python.org> wrote:
> > > On 10/12/07, Gregory P. Smith <greg at krypto.org> wrote:
> > > > > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > > > > what's already there)
> >
> > > > As I work on these..  Should the mutable PyBytes_ (buffer) objects implement
> > > > the following methods inplace and return an additional reference to self?
> >
> > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(),
> > > > .upper(), .zfill()
> >
> > > No... That would be a huge trap to fall in at all sorts of occasions.
> >
> > So would returning a different object.  I expect a mutation operation
> > on an explicitly mutable object to mutate the object, instead of
> > creating something new.
> >
> > If it returns a new one, I can imagine doing something like:
> >
> >     obj.inqueue=bytesbuffer(100)
> > >     obj.inqueue.lower()   # oh, wait, that didn't really do anything after all...
> >     if obj.inqueue[:4] == b"http":   # works on my *regular* input...
> >
> > Maybe the answer is "don't do that", and to only do this sort of
> > processing before it goes in the buffer or after it comes out, but ...
> > it still looks like a major gotcha.
>
> Since these methods with these very names already exist for strings
> and return new values there, I don't see the gotcha unless you never
> use strings.

Maybe .lower() should return immutable bytes, rather than mutable
buffer?  For the use cases I can imagine this'd still work correctly,
and fits better with why it makes a copy.  buffer is all about
operating in-place, so any copy immediately doesn't fit the buffer
concept.

    obj.inqueue = bytesbuffer(100)
    # replaces existing buffer contents.  Temp copy need not be mutable
    obj.inqueue[:] = obj.inqueue.lower()
    if obj.inqueue[:4] == b"http":


    obj.inqueue = bytesbuffer(100)
    if obj.inqueue[:4].lower() == b"http":  # compares bytes with bytes
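
The two idioms above can be tried on a released Python 3, assuming that
bytearray plays the role of the mutable buffer (the bytesbuffer name in the
snippets is not a real constructor). Note that bytearray.lower() ended up
returning a bytearray rather than bytes; the slice-assignment idiom works
either way.

```python
inqueue = bytearray(b"HTTP/1.1 200 OK")
same_object = inqueue

# The copying method returns a new object; discarding it changes nothing.
inqueue.lower()
unchanged = bytes(inqueue)

# Slice assignment replaces the contents of the existing buffer in place.
inqueue[:] = inqueue.lower()
prefix_matches = inqueue[:4] == b"http"
```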

-- 
Adam Olsen, aka Rhamphoryncus

From luke.stebbing at gmail.com  Mon Oct 15 20:42:48 2007
From: luke.stebbing at gmail.com (Luke Stebbing)
Date: Mon, 15 Oct 2007 11:42:48 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
Message-ID: <dcb1979a0710151142h2b774a62x16a273cfe49ef77f@mail.gmail.com>

On 10/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> So would returning a different object.  I expect a mutation operation
> on an explicitly mutable object to mutate the object, instead of
> creating something new.
>
> If it returns a new one, I can imagine doing something like:
>
>     obj.inqueue=bytesbuffer(100)
>     obj.inqueue.lower()   # oh, wait, that didn't really do anything after all...
>     if obj.inqueue[:4] == b"http":   # works on my *regular* input...
>
> Maybe the answer is "don't do that", and to only do this sort of
> processing before it goes in the buffer or after it comes out, but ...
> it still looks like a major gotcha.

I expect something spelled "lower" to try and transform an object
in-place, period. Too bad changing it to "lowered" would be such a
royal pain.

Luke

From guido at python.org  Mon Oct 15 21:32:16 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 12:32:16 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
	<ff057k$hbi$1@ger.gmane.org>
	<52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>
	<fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
Message-ID: <ca471dc20710151232v7bfcfdcei5a1cab4553557daa@mail.gmail.com>

I am not going to explain this further if you still don't get it.

These functions should not modify their argument, and return a copy of
the same type as the original.

I'm fine with new APIs that perform similar things in-place.
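
As a quick check of that rule on a released Python 3 (assuming bytearray is
the mutable "buffer" type under discussion), the copying methods leave the
original alone and return the same type as the original:

```python
original_bytes = b"42"
original_array = bytearray(b"42")

# Copying methods: the result has the same type as the receiver.
padded_bytes = original_bytes.zfill(5)
padded_array = original_array.zfill(5)

result_types = (type(padded_bytes), type(padded_array))
```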

--Guido

On 10/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 10/15/07, Gregory P. Smith <greg at krypto.org> wrote:
> > On 10/15/07, Terry Reedy <tjreedy at udel.edu> wrote:
>
> > > ...I would think that being able to edit in place would be a reason
> > > to use a buffer rather than (immutable) bytes.
>
> > I agree, that's a benefit of a mutable object.  But I think the point about
> > not reusing the names with a different behavior is valid so that some
> > code can be written to operate on objects with duck type without
> > having to know if it's mutable or not.
>
> I thought that was the reason to return self instead of None.
>
> If returning the original (but mutated) buffer is a problem, then
> there is already a problem, because someone else could already mutate
> the original.
>
> (Also note that for duck-typing, it should be OK if the new result
> object is always immutable, since you have to handle that case
> anyhow.)
>
> -jJ
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cjw at sympatico.ca  Mon Oct 15 23:16:08 2007
From: cjw at sympatico.ca (Colin J. Williams)
Date: Mon, 15 Oct 2007 17:16:08 -0400
Subject: [Python-3000] bytes vs array.array vs numpy.array
In-Reply-To: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com>
References: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com>
Message-ID: <ff0lat$cjj$1@ger.gmane.org>

skip at pobox.com wrote:
>     Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
>     Nick> in there eventually...
>
>     Nick>    data = bytes([x & 0x1F for x in orig_data])
>
> This begins to make it look like what you want is array.array or numpy.array.
> Python's arrays don't support bitwise operations either, but numpy's do.
> How much overlap is there between the three types?  Does it make sense to
> consider that canonical underlying array type now (or in the near future,
> sometime before the release of 3.0 final)?
>
> Skip
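
For the specific masking example quoted above, one standard-library option
is bytes.translate() with a 256-entry table, which in CPython runs the
per-byte loop in C. This is offered as a sketch of an alternative, not as
something proposed in the thread; the sample data is made up.

```python
orig_data = bytes(range(0x60, 0x70))  # made-up sample input

# The list-comprehension form from the quoted message.
masked_listcomp = bytes([x & 0x1F for x in orig_data])

# Precompute the mask for every possible byte value once...
mask_table = bytes(x & 0x1F for x in range(256))
# ...then let translate() apply it to the whole sequence.
masked_translate = orig_data.translate(mask_table)
```

The table costs 256 bytes and can be reused for any number of inputs.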

I am a lurker here, rather than a contributor, but I hope that this
idea will be explored further.

A good canonical multi-dimensional array is needed.

NumPy provides a class which, in addition to serving various numeric
needs, also provides for a multi-dimensional array where the elements
can be of some class or type.

It would be good if array.Array could create a multidimensional array,
where each element would be an instance of dtype, which could be any
known type or class.

The Array could have a signature something like:
          Array(shape, dtype, initializer)

          where:
            shape        is a tuple, giving the dimensionality
                         (or an integer for a single dimension)
            dtype        is a Python type or class
            initializer  is a Python expression which can be converted
                         into an array of dtype, where dtype is any
                         known type or class.

Thus, Array(5, float, [0, 1, 2, 3, 4]) would have the same effect as
the current array.array('f', [0., 1., 2., 3., 4.])

To allow for the full range of data types provided by array.array, it
would be necessary to define a few additional Python data types.  The
aim here is to use meaningful mnemonics, rather than obscure letter
codes.
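
A rough sketch of the one-dimensional case of this idea on top of the
existing array module. The make_array name and the TYPECODES mapping are
hypothetical, and only a single dimension is handled because array.array
itself is one-dimensional. The mapping uses 'd' (C double) for float, since
that is what a Python float holds; 'f' as in the example above would store
single precision.

```python
from array import array

# Hypothetical mapping from Python types to array.array type codes.
TYPECODES = {float: "d", int: "q"}


def make_array(shape, dtype, initializer):
    """Build a 1-D array.array from a (shape, dtype, initializer) triple."""
    if isinstance(shape, tuple):
        if len(shape) != 1:
            raise NotImplementedError("array.array is one-dimensional")
        (length,) = shape
    else:
        length = shape
    result = array(TYPECODES[dtype], initializer)
    if len(result) != length:
        raise ValueError("initializer does not match shape")
    return result


values = make_array(5, float, [0, 1, 2, 3, 4])
```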

Colin W.



From guido at python.org  Tue Oct 16 00:31:37 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 15:31:37 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
Message-ID: <ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>

There's one thing that I forgot to add to PEP 3137. It's the removal
of the basestring type. I think this is a reasonable thing to do.
Christian Heimes has a patch that does this cleanly. Anyone objecting,
please speak up now!
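
For code being ported, the practical effect on a released Python 3 is that
there is no basestring builtin to test against; text is checked with str.
A small sketch, where the helper name is_text is made up:

```python
import builtins

# basestring is not defined among the Python 3 builtins.
has_basestring = hasattr(builtins, "basestring")


def is_text(value) -> bool:
    """Python 3 stand-in for the 2.x isinstance(value, basestring) test."""
    return isinstance(value, str)


checks = (is_text("abc"), is_text(b"abc"))
```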

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct 16 01:04:54 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 16:04:54 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
Message-ID: <ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> There's one thing that I forgot to add to PEP 3137. It's the removal
> of the basestring type. I think this is a reasonable thing to do.
> Christian Heimes has a patch that does this cleanly. Anyone objecting,
> please speak up now!

And, quite separately, we will need a common base type for bytes and
buffer. I think that should be an ABC in the collections module
though, which simply registers bytes and buffer. Any suggestions for a
name?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Tue Oct 16 02:04:59 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 15 Oct 2007 17:04:59 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
	<ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>
Message-ID: <52dc1c820710151704ra5789b9q7fcb59760bdb7479@mail.gmail.com>

While trying to figure out what to update the common method docstrings to
say, I've come up with terms such as 'byte string' or 'byte buffer', but none
of those are especially appealing to me to turn into an ABC name.  Other
thoughts?

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
>
> On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> > There's one thing that I forgot to add to PEP 3137. It's the removal
> > of the basestring type. I think this is a reasonable thing to do.
> > Christian Heimes has a patch that does this cleanly. Anyone objecting,
> > please speak up now!
>
> And, quite separately, we will need a common base type for bytes and
> buffer. I think that should be an ABC in the collections module
> though, which simply registers bytes and buffer. Any suggestions for a
> name?
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From greg at krypto.org  Tue Oct 16 02:10:24 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 15 Oct 2007 17:10:24 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<47139348.5090006@cornell.edu>
	<ca471dc20710150928o1bb4853bte3b6d2d1c2ebe2f6@mail.gmail.com>
	<52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com>
Message-ID: <52dc1c820710151710g3edf3b3cs61a010c85e3b35b1@mail.gmail.com>

>  Anyways I'll be done with my patch to add the copying versions of the
> methods later today.  Stay tuned.
>

The PyBytes methods from PEP 3137 have been implemented.  Review as desired.

http://bugs.python.org/issue1261

If it's good as is, let me know and I can check that in if you don't want to
yourself.

I believe there are some more opportunities for code sharing between
PyString and PyBytes both in methods already existing in stringobject and
bytesobject and in some of the Objects/stringlib/transmogrify.h code that
this patch adds.  I tried to share as much as possible to avoid both bloat
and most importantly multiple copies of the same algorithms.  That could be
considered additional cleanup or optimization.

-gps

From greg.ewing at canterbury.ac.nz  Tue Oct 16 02:26:27 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 16 Oct 2007 13:26:27 +1300
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
Message-ID: <47140533.9040700@canterbury.ac.nz>

Jim Jewett wrote:
> On 10/12/07, Guido van Rossum <guido at python.org> wrote:
> 
>>On 10/12/07, Gregory P. Smith <greg at krypto.org> wrote:
>>
>>>Should the mutable PyBytes_ (buffer) objects implement
>>>the following methods inplace and return an additional reference to self?

If they're to work in-place, they should return None.

--
Greg

From brett at python.org  Tue Oct 16 03:45:30 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 15 Oct 2007 18:45:30 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
	<ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>
Message-ID: <bbaeab100710151845g115a963bvc86c01f9808c6858@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> > There's one thing that I forgot to add to PEP 3137. It's the removal
> > of the basestring type. I think this is a reasonable thing to do.
> > Christian Heimes has a patch that does this cleanly. Anyone objecting,
> > please speak up now!
>
> And, quite separately, we will need a common base type for bytes and
> buffer. I think that should be an ABC in the collections module
> though, which simply registers bytes and buffer. Any suggestions for a
> name?

BinaryData, RawData.  I use both 'binary' and 'raw' in my variable
names when I have used bytes so that's why those names pop into my
head.

-Brett

From guido at python.org  Tue Oct 16 04:12:53 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2007 19:12:53 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <bbaeab100710151845g115a963bvc86c01f9808c6858@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
	<ca471dc20710151604x277e625ate17c04e899bd8fc6@mail.gmail.com>
	<bbaeab100710151845g115a963bvc86c01f9808c6858@mail.gmail.com>
Message-ID: <ca471dc20710151912h23ad8464gb0e1e95d43eea021@mail.gmail.com>

ByteSequence.
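
A minimal sketch of such an ABC, written with the abc module directly
rather than placed in collections. The class body is an assumption (the
thread only settles on registering the two types), and bytearray stands in
for the mutable "buffer" type.

```python
from abc import ABC


class ByteSequence(ABC):
    """Hypothetical common base for the immutable and mutable byte types."""


# Registration makes isinstance() succeed without real inheritance.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)

checks = [
    isinstance(b"x", ByteSequence),
    isinstance(bytearray(b"x"), ByteSequence),
    isinstance("x", ByteSequence),
]
```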

On 10/15/07, Brett Cannon <brett at python.org> wrote:
> On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> > On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> > > There's one thing that I forgot to add to PEP 3137. It's the removal
> > > of the basestring type. I think this is a reasonable thing to do.
> > > Christian Heimes has a patch that does this cleanly. Anyone objecting,
> > > please speak up now!
> >
> > And, quite separately, we will need a common base type for bytes and
> > buffer. I think that should be an ABC in the collections module
> > though, which simply registers bytes and buffer. Any suggestions for a
> > name?
>
> BinaryData, RawData.  I use both 'binary' and 'raw' in my variable
> names when I have used bytes so that's why those names pop into my
> head.
>
> -Brett
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Oct 16 08:01:43 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 16 Oct 2007 19:01:43 +1300
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<ca471dc20710150749y70ba12cfmadf1c59974c61926@mail.gmail.com>
	<ff057k$hbi$1@ger.gmane.org>
	<52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com>
	<fb6fbf560710151111x14e1a722je4ee38dc71348b96@mail.gmail.com>
Message-ID: <471453C7.1010509@canterbury.ac.nz>

Jim Jewett wrote:
> I thought that was the reason to return self instead of None.

That would be even more misleading, because you would get
no warning that you had called a mutating method when you
thought you were calling a non-mutating one.

This is the reason that all the existing mutating methods
return None instead of self. It's safer that way.
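
The existing convention is easy to see with the builtins (bytearray is used
here as the released name of the mutable byte type):

```python
items = [3, 1, 2]
sort_result = items.sort()        # mutates items, returns None
sorted_copy = sorted([3, 1, 2])   # leaves its argument alone, returns a new list

buf = bytearray(b"abc")
reverse_result = buf.reverse()    # mutates buf, returns None
```

A caller who mistakes sort() for a copying call gets None back and finds
out quickly, which is the safety argument made above.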

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Oct 16 08:04:54 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 16 Oct 2007 19:04:54 +1300
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <dcb1979a0710151142h2b774a62x16a273cfe49ef77f@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com>
	<ca471dc20710121820r606568barf51b63e4c52703b0@mail.gmail.com>
	<fb6fbf560710150657l3f47612fiaf9f8856cc947ba3@mail.gmail.com>
	<dcb1979a0710151142h2b774a62x16a273cfe49ef77f@mail.gmail.com>
Message-ID: <47145486.4050505@canterbury.ac.nz>

Luke Stebbing wrote:
> I expect something spelled "lower" to try and transform an object
> in-place, period. Too bad changing it to "lowered" would be such a
> royal pain.

Yes, those methods should probably have been called
"lowered", "capitalized", etc. from the beginning, but
the time machine would need an upgrade to make that big
a history change. :-(

--
Greg

From greg at krypto.org  Tue Oct 16 08:36:30 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 15 Oct 2007 23:36:30 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
Message-ID: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com>

On 10/8/07, Gregory P. Smith <greg at krypto.org> wrote:
>
>
> - add missing methods to PyBytes (for list, see the PEP and compare to
> > what's already there)
>
>
Committed revision 58493.  (closes issue1261).

FWIW: on py3k head, on the x86 Ubuntu Feisty box I used to do the commit, the
following tests were failing both before and after this
change.

  test_cProfile test_doctest test_email test_profile

I didn't break them. :)

-gps

From brett at python.org  Tue Oct 16 09:16:29 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 16 Oct 2007 00:16:29 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com>
Message-ID: <bbaeab100710160016s5d0277b5wc557acc4d3cf2a3c@mail.gmail.com>

On 10/15/07, Gregory P. Smith <greg at krypto.org> wrote:
>
> On 10/8/07, Gregory P. Smith <greg at krypto.org> wrote:
> >
> >
> >
> > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > what's already there)
> >
>
> Committed revision 58493.  (closes issue1261).
>
> fwiw - On py3k head on the x86 ubuntu feisty box i used to do the commit the
> following tests on the py3k branch were failing both before and after this
> change.
>
>   test_cProfile test_doctest test_email test_profile
>
> I didn't break them. :)

Running test_doctest really quickly (to make sure I didn't break it
=) shows no breakage on a build from r58479.

-Brett

From goodger at python.org  Tue Oct 16 15:30:50 2007
From: goodger at python.org (David Goodger)
Date: Tue, 16 Oct 2007 09:30:50 -0400
Subject: [Python-3000] PyCon 2008: Call for Talk & Tutorial Proposals
In-Reply-To: <47140763.30009@python.org>
References: <47140763.30009@python.org>
Message-ID: <4335d2c40710160630p1f94e67am11c504f15ddeff42@mail.gmail.com>

Proposals for PyCon 2008 talks & tutorials are now being accepted.
The deadline for proposals is November 16.

PyCon 2008 will be held in Chicago, Illinois, USA, from March 13-20.

Please see the full announcement here:
http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html

-- 
David Goodger <http://python.net/~goodger>

From lists at cheimes.de  Tue Oct 16 15:45:56 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 16 Oct 2007 15:45:56 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com>
	<52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com>
Message-ID: <ff2fak$e05$1@ger.gmane.org>

Gregory P. Smith wrote:
> fwiw - On py3k head on the x86 ubuntu feisty box i used to do the commit the
> following tests on the py3k branch were failing both before and after this
> change.
> 
>   test_cProfile test_doctest test_email test_profile
> 
> I didn't break them. :)

They are broken on Ubuntu Linux, i386 and UCS-4 build for me, too. The
failures in doctest, profile and cProfile are caused by additional calls
to utf_8_decode. They were introduced by the patch from me and Alexandre
but we don't know how to fix them.

I've a fix for one of the two failures in test_email in one of my
pending patches.

Christian


From guido at python.org  Tue Oct 16 18:55:51 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Oct 2007 09:55:51 -0700
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
Message-ID: <ca471dc20710160955n715e9dffw22195602b52a5429@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> There's one thing that I forgot to add to PEP 3137. It's the removal
> of the basestring type. I think this is a reasonable thing to do.
> Christian Heimes has a patch that does this cleanly. Anyone objecting,
> please speak up now!

No-one spoke up. I'll check in Christian's patch now, and add this to the PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Tue Oct 16 19:06:59 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 16 Oct 2007 19:06:59 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710160955n715e9dffw22195602b52a5429@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
	<ca471dc20710160955n715e9dffw22195602b52a5429@mail.gmail.com>
Message-ID: <4714EFB3.303@cheimes.de>

Guido van Rossum wrote:
> No-one spoke up. I'll check in Christian's patch now, and add this to the PEP.

Thanks!

The fixer for basestring -> str is available at http://bugs.python.org/file8548

Christian

From dwheeler at dwheeler.com  Tue Oct 16 20:41:14 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Tue, 16 Oct 2007 14:41:14 -0400 (EDT)
Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119
Message-ID: <E1IhrLu-000133-Jn@fenris.runbox.com>

Hi, I'm a Python user who likes much in the upcoming Python 3000.  I wish you well! I have a few comments, though, that I hope are constructive.  Guido asked me to repost them to this mailing list for discussion.  I'll send my different comments as separate messages, so that they can be easily discussed separately.  So...

In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?).

Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean.  (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works).   All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way.
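
A rough sketch of what such an ABC might look like, in Python 3
syntax. The name GeneralizedBool and the hook logic are assumptions
made for illustration; nothing like this is defined in PEP 3119.

```python
from abc import ABCMeta


class GeneralizedBool(metaclass=ABCMeta):
    """Hypothetical ABC: matches classes that define __bool__ or __len__."""

    @classmethod
    def __subclasshook__(cls, C):
        if cls is GeneralizedBool:
            for name in ("__bool__", "__len__"):
                # Look for an explicit definition anywhere in the MRO.
                if any(name in B.__dict__ for B in C.__mro__):
                    return True
        return NotImplemented


class Plain:
    """Defines neither __bool__ nor __len__."""


print(issubclass(int, GeneralizedBool))    # True: int defines __bool__
print(issubclass(list, GeneralizedBool))   # True: list defines __len__
print(issubclass(Plain, GeneralizedBool))  # False
```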

--- David A. Wheeler 

From dwheeler at dwheeler.com  Tue Oct 16 20:43:05 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Tue, 16 Oct 2007 14:43:05 -0400 (EDT)
Subject: [Python-3000] Add python-3000-like print function to python 2.6
Message-ID: <E1IhrNh-0001d4-AD@fenris.runbox.com>

In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics?  Call it printfunc or whatever.

Python 3000 is undergoing much pain so that print can become a function. How about making those benefits available sooner than 3.0, so that we can use them earlier?  Obviously people can create their own such function, but having a STANDARD name for it would mean that 2to3 could easily automate that translation.  Plus, it'd help people get used to the idea of a printing _function_.

--- David A. Wheeler 

From dwheeler at dwheeler.com  Tue Oct 16 20:45:09 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Tue, 16 Oct 2007 14:45:09 -0400 (EDT)
Subject: [Python-3000] Please re-add __cmp__ to python 3000
Message-ID: <E1IhrPh-0007OE-6Y@garm.runbox.com>

I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).

Yes, it's possible to write all the comparison operations, but I think it's _clearer_ to create a single low-level operator that handles ALL the comparison operators.  It also avoids many mistakes; once you get that ONE operator right, ALL comparisons are right.  I think the python 2 way is better: individual operations for the cases where you want to handle each case specially, and a single __cmp__ function that is a simple way to handle comparisons all at once.

--- David A. Wheeler 

From lists at cheimes.de  Tue Oct 16 21:27:21 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 16 Oct 2007 21:27:21 +0200
Subject: [Python-3000] Add python-3000-like print function to python 2.6
In-Reply-To: <E1IhrNh-0001d4-AD@fenris.runbox.com>
References: <E1IhrNh-0001d4-AD@fenris.runbox.com>
Message-ID: <ff33ar$rfr$1@ger.gmane.org>

David A. Wheeler wrote:
> In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics?  Call it printfunc or whatever.

I like xprint(). It follows the example of range/xrange, it's short,
fast to type and easy to remember. Neither google nor find -name \*.py |
xargs grep xprint revealed a method xprint.

Christian


From steven.bethard at gmail.com  Tue Oct 16 21:34:09 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 16 Oct 2007 13:34:09 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IhrPh-0007OE-6Y@garm.runbox.com>
References: <E1IhrPh-0007OE-6Y@garm.runbox.com>
Message-ID: <d11dcfba0710161234x2dd8b777t19e83113a9d3a909@mail.gmail.com>

On 10/16/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).
>
> Yes, it's possible to write all the comparison operations, but I think
> it's _clearer_ to create a single low-level operator that handles ALL
> the comparison operators.  It also avoids many mistakes; once you
> get that ONE operator right, ALL comparisons are right.  I think the
> python 2 way is better: individual operations for the cases where you
> want to handle each case specially, and a single __cmp__ function
> that is a simple way to handle comparisons all at once.

Why can't this just be supplied with a mixin?  Here's a recipe
providing the appropriate mixins if you want to define a __key__
function:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

Presumably, you could do a very similar thing for __cmp__ if you
wanted to use it.
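
A minimal sketch of the key-based idea, written independently of the
recipe above. The mixin name and the Version class are hypothetical,
and __key__ is just a convention here, not a special method Python
knows about.

```python
class KeyComparisonMixin(object):
    """Hypothetical mixin: derive all comparisons from self.__key__()."""

    def __eq__(self, other):
        return self.__key__() == other.__key__()

    def __ne__(self, other):
        return self.__key__() != other.__key__()

    def __lt__(self, other):
        return self.__key__() < other.__key__()

    def __le__(self, other):
        return self.__key__() <= other.__key__()

    def __gt__(self, other):
        return self.__key__() > other.__key__()

    def __ge__(self, other):
        return self.__key__() >= other.__key__()

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one.
        return hash(self.__key__())


class Version(KeyComparisonMixin):
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __key__(self):
        return (self.major, self.minor)


print(Version(1, 2) < Version(1, 10))    # True
print(Version(2, 0) == Version(2, 0))    # True
```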

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Tue Oct 16 22:27:22 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Oct 2007 13:27:22 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IhrPh-0007OE-6Y@garm.runbox.com>
References: <E1IhrPh-0007OE-6Y@garm.runbox.com>
Message-ID: <ca471dc20710161327q5e5d3737p7cfef4f2cf3e4a01@mail.gmail.com>

On 10/16/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).
>
> Yes, it's possible to write all the comparison operations, but I think it's _clearer_ to create a single low-level operator that handles ALL the comparison operators.  It also avoids many mistakes; once you get that ONE operator right, ALL comparisons are right.  I think the python 2 way is better: individual operations for the cases where you want to handle each case specially, and a single __cmp__ function that is a simple way to handle comparisons all at once.

Perhaps, but do note that __cmp__ is *higher* level than __eq__ etc.,
not lower level.

I'd be okay with code that detects the presence of __cmp__ and then
automatically defines __eq__ etc. accordingly. Whether this should be
default behavior or a mixin that you explicitly have to request I'm
not sure. I'd be willing to entertain a PEP that clearly explains the
motivation and puts forward a specific solution.
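
One way the explicit-request variant could look, as a sketch only: a
class decorator that fills in the rich comparisons when it finds
__cmp__. The decorator name and the Money class are hypothetical, and
this is not a proposal for the actual mechanism.

```python
def comparisons_from_cmp(cls):
    """Hypothetical decorator: derive rich comparisons from cls.__cmp__."""
    if not hasattr(cls, "__cmp__"):
        return cls              # nothing to derive from

    def derive(test):
        def method(self, other):
            return test(self.__cmp__(other))
        return method

    cls.__eq__ = derive(lambda c: c == 0)
    cls.__ne__ = derive(lambda c: c != 0)
    cls.__lt__ = derive(lambda c: c < 0)
    cls.__le__ = derive(lambda c: c <= 0)
    cls.__gt__ = derive(lambda c: c > 0)
    cls.__ge__ = derive(lambda c: c >= 0)
    return cls


@comparisons_from_cmp
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __cmp__(self, other):
        # Three-way result: negative, zero, or positive.
        return (self.cents > other.cents) - (self.cents < other.cents)


print(Money(5) < Money(7))     # True
print(Money(7) == Money(7))    # True
```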

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct 16 22:29:07 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Oct 2007 13:29:07 -0700
Subject: [Python-3000] Add python-3000-like print function to python 2.6
In-Reply-To: <E1IhrNh-0001d4-AD@fenris.runbox.com>
References: <E1IhrNh-0001d4-AD@fenris.runbox.com>
Message-ID: <ca471dc20710161329y7874e01bx8ce219af947c0526@mail.gmail.com>

On 10/16/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics?  Call it printfunc or whatever.
>
> Python 3000 is undergoing much pain so that print can become a function. How about making those benefits available sooner than 3.0, so that we can use them earlier?  Obviously people can create their own such function, but having a STANDARD name for it would mean that 2to3 could easily automate that translation.  Plus, it'd help people get used to the idea of a printing _function_.

I expect this will happen. At the very least, you'll be able to just
use 'print' for that function's name if you include

  from __future__ import print_function

at the top of your module. Whether it's worth it to make the same
function available under a different name that doesn't require such an
import I'm not sure.
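
A short sketch of the function semantics in question, runnable as-is
on Python 3. Under the plan above, a 2.x module would presumably need
the __future__ line (shown as a comment) as its first statement; that
spelling is taken from the text above and is not yet a shipped feature.

```python
# from __future__ import print_function   # assumed first line in a 2.x module

import io

buffer = io.StringIO()

# Keyword arguments replace the special syntax of the print statement.
print("a", "b", "c", sep="-", end="!\n", file=buffer)

print(buffer.getvalue(), end="")   # a-b-c!
```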

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at acm.org  Tue Oct 16 22:53:08 2007
From: fdrake at acm.org (Fred Drake)
Date: Tue, 16 Oct 2007 16:53:08 -0400
Subject: [Python-3000] Add python-3000-like print function to python 2.6
In-Reply-To: <ca471dc20710161329y7874e01bx8ce219af947c0526@mail.gmail.com>
References: <E1IhrNh-0001d4-AD@fenris.runbox.com>
	<ca471dc20710161329y7874e01bx8ce219af947c0526@mail.gmail.com>
Message-ID: <EBC6AF71-ED7F-429D-8D9B-D471789EADDB@acm.org>

On Oct 16, 2007, at 4:29 PM, Guido van Rossum wrote:
> I expect this will happen. At the very least, you'll be able to just
> use 'print' for that function's name if you include
>
>   from __future__ import print_function
>
> at the top of your module. Whether it's worth it to make the same
> function available under a different name that doesn't require such an
> import I'm not sure.

This makes sense to me.  Creating a new name for the function doesn't  
add anything, IMO: to use it I need to "dirty" my code wherever I  
print, using the __future__ import only dirties an isolated spot in a  
module that prints.  Much better, and probably useful during the  
transitional period.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From nnorwitz at gmail.com  Tue Oct 16 22:59:15 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Tue, 16 Oct 2007 13:59:15 -0700
Subject: [Python-3000] Add python-3000-like print function to python 2.6
In-Reply-To: <EBC6AF71-ED7F-429D-8D9B-D471789EADDB@acm.org>
References: <E1IhrNh-0001d4-AD@fenris.runbox.com>
	<ca471dc20710161329y7874e01bx8ce219af947c0526@mail.gmail.com>
	<EBC6AF71-ED7F-429D-8D9B-D471789EADDB@acm.org>
Message-ID: <ee2a432c0710161359x38158168pd28001a1a4e61ab2@mail.gmail.com>

On 10/16/07, Fred Drake <fdrake at acm.org> wrote:
> On Oct 16, 2007, at 4:29 PM, Guido van Rossum wrote:
> > I expect this will happen. At the very least, you'll be able to just
> > use 'print' for that function's name if you include
> >
> >   from __future__ import print_function
> >
> > at the top of your module. Whether it's worth it to make the same
> > function available under a different name that doesn't require such an
> > import I'm not sure.
>
> This makes sense to me.  Creating a new name for the function doesn't
> add anything, IMO: to use it I need to "dirty" my code wherever I
> print, using the __future__ import only dirties an isolated spot in a
> module that prints.  Much better, and probably useful during the
> transitional period.

There's a patch for this too.  http://bugs.python.org/issue1633807

n

From brett at python.org  Tue Oct 16 23:03:51 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 16 Oct 2007 14:03:51 -0700
Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119
In-Reply-To: <E1IhrLu-000133-Jn@fenris.runbox.com>
References: <E1IhrLu-000133-Jn@fenris.runbox.com>
Message-ID: <bbaeab100710161403q2feb0dcaj734fd69f17df347f@mail.gmail.com>

On 10/16/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> Hi, I'm a Python user who likes much in the upcoming Python 3000.  I wish you well! I have a few comments, though, that I hope are constructive.  Guido asked me to repost them to this mailing list for discussion.  I'll send my different comments as separate messages, so that they can be easily discussed separately.  So...
>
> In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?).

That just makes me think it is a Google product.  I would say Boolean
is a fine name since the type is named bool, but that might be too
close of a name.

>
> Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean.  (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works).   All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way.

Seems fine by me.

-Brett

From guido at python.org  Tue Oct 16 23:42:12 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 16 Oct 2007 14:42:12 -0700
Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119
In-Reply-To: <E1IhrLu-000133-Jn@fenris.runbox.com>
References: <E1IhrLu-000133-Jn@fenris.runbox.com>
Message-ID: <ca471dc20710161442i605197e6g731577ebf7525148@mail.gmail.com>

On 10/16/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> Hi, I'm a Python user who likes much in the upcoming Python 3000.  I wish you well! I have a few comments, though, that I hope are constructive.  Guido asked me to repost them to this mailing list for discussion.  I'll send my different comments as separate messages, so that they can be easily discussed separately.  So...
>
> In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?).
>
> Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean.  (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works).   All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way.

This sounds misguided to me. While it is true that some types can
never be false, they can still be useful as a truth value: e.g. a
parameter could be either a Widget object (assuming Widgets are never
false) or None. This is used pretty commonly.

So there is absolutely nothing to test for in the type of an object --
*every* object is usable as a "generalized boolean". It therefore
becomes purely a matter of argument annotation, an area which is
explicitly left open for experimentation by PEP 3107.
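
A small sketch of the Widget-or-None pattern; the Widget class and
describe() are hypothetical names used only for illustration.

```python
class Widget:
    """Hypothetical class with no __bool__ or __len__: always true."""


def describe(widget=None):
    # The parameter is either a Widget or None; a truth test tells them apart.
    if widget:
        return "have a widget"
    return "no widget"


print(describe(Widget()))   # have a widget
print(describe())           # no widget
print(bool(object()))       # True
```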

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Oct 17 00:06:01 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 17 Oct 2007 11:06:01 +1300
Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119
In-Reply-To: <E1IhrLu-000133-Jn@fenris.runbox.com>
References: <E1IhrLu-000133-Jn@fenris.runbox.com>
Message-ID: <471535C9.7000500@canterbury.ac.nz>

David A. Wheeler wrote:

> Any class defining __bool__ (formerly __nonzero__), or one implementing
> Sized (which implement __len__), would be a generalized boolean.

Considering that *all* objects have at least an implicit
implementation of __bool__ (that tests against None) I'm
not sure that this would be a meaningful or useful
concept.

What use cases do you have in mind for this?

--
Greg

From jimjjewett at gmail.com  Wed Oct 17 02:40:01 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 16 Oct 2007 20:40:01 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
References: <ca471dc20710072132n3d77e13em28b3a5a2e5f19d9f@mail.gmail.com>
	<ca471dc20710151531he4df23cj1ffcd3bd53c0c7b6@mail.gmail.com>
Message-ID: <fb6fbf560710161740n65acab5bq28fe10b669d1236c@mail.gmail.com>

On 10/15/07, Guido van Rossum <guido at python.org> wrote:
> There's one thing that I forgot to add to PEP 3137. It's the removal
> of the basestring type. I think this is a reasonable thing to do.
> Christian Heimes has a patch that does this cleanly. Anyone objecting,
> please speak up now!

I don't like replacing the abstract basestring with a concrete type in
isinstance checks.  I agree that the right answer is something in ABC,
which may not need to be a builtin.

Does tearing out basestring before adding that "something" (String?)
cause any problems?

-jJ

From dwheeler at dwheeler.com  Wed Oct 17 05:47:41 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Tue, 16 Oct 2007 23:47:41 -0400 (EDT)
Subject: [Python-3000] Add python-3000-like print function to python 2.6
Message-ID: <E1Ihzsj-00089S-Ap@fenris.runbox.com>

Guido van Rossum wrote:
> > I expect this will happen. At the very least, you'll be able to just
> > use 'print' for that function's name if you include
> >   from __future__ import print_function

Neal Norwitz wrote:
> There's a patch for this too.  http://bugs.python.org/issue1633807

Excellent!  I like the "from __future__..." approach better than what I'd originally proposed.  If that is the plan for Python 2.6 (and I hope it is), can I appeal to someone to modify PEP 3105 to specifically _note_ that this is a planned addition for 2.6?  Just a sentence or two would do it, e.g.: "Python 2.6 will include a 'from __future__ import print_function', which enables use of print as a function with these semantics instead of the traditional Python 2 print statement.".  A note in some other materials about Python 2->3 transition would be nice too.

Also... will the 2to3 tool support this?  What I mean is, if 2to3 sees "from __future__ import print_function", will it leave print function calls alone?  If not, could that be changed?

Thanks.

--- David A. Wheeler 

From dwheeler at dwheeler.com  Wed Oct 17 18:40:08 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Wed, 17 Oct 2007 12:40:08 -0400 (EDT)
Subject: [Python-3000] Please re-add __cmp__ to python 3000
Message-ID: <E1IiBwH-0003sz-04@fenris.runbox.com>

I said:
> I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).

Steven Bethard said:
>Why can't this just be supplied with a mixin?  Here's a recipe
>providing the appropriate mixins if you want to define a __key__
>function:
>    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

That _works_ from a functional perspective, and if Python3 fails to include direct support for __cmp__, then I think providing a built-in mixin is necessary.

But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function. Sorting is completely dominated by comparison time, and the mixin is a big lose for performance.  Basically, sorts always involve an inner loop that does comparisons, so any time comparison is slow, you're potentially dooming the whole program to a slow inner loop.  A mixin-calling-cmp model doubles the function call work; it has to find the mixin, call it, which eventually has to find and call the final cmp operator.

I did a test (see below), and the mixin using a simulated cmp took 50% MORE time to sort a list using Python 2.5 (see code below) than when __cmp__ is used directly (as you CAN do in Python 2.5).  A few tests with varying list size lead me to believe that this isn't linear - as the lists get longer the % performance hit increases.  In other words, it's a LOT slower, and as the size increases it gets far worse.   That kind of nasty performance hit will probably lead people to write lots of code that duplicates comparison functionality in each __lt__, __gt__, etc.  When the comparisons THEMSELVES are nontrivial, that will result in lots of duplicate, potentially-buggy code.  All of which is avoidable if __cmp__ is directly supported, as it ALREADY is in Python 1 and 2.

In addition, even IF the performance wasn't a big deal (and I think it is), I believe __cmp__ is the better basic operator in most cases.  As a style issue, I strongly prefer __cmp__ unless I have a specific need for comparisons which are atypical, e.g., where sometimes both __lt__ and __ge__ will return false given the same data (IEEE floats do this if you need exactly-IEEE-specified behavior of NaNs, etc.).  By preferring __cmp__ I eliminate lots of duplicate code, and once it's right, it's always right for ALL comparisons.  Sometimes __lt__ and friends are absolutely needed, e.g., when __lt__(x,y)==__gt__(x,y) for some values of x,y, but most of the time I find that they're an indicator of bad code and that __cmp__ should have been used instead.   Direct support of __cmp__ is a GOOD thing, not a wart or obsolete feature.
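
A quick illustration of the NaN case mentioned above, where both < and >= are false for the same operands. The three_way() helper is a hypothetical stand-in for a cmp-style function, showing why such a function cannot express this.

```python
nan = float("nan")

print(nan < 1.0)     # False
print(nan >= 1.0)    # False


def three_way(a, b):
    """Hypothetical cmp()-style helper: negative, zero, or positive."""
    return (a > b) - (a < b)


# A three-way result has to pick one of "less", "equal", "greater";
# here it reports 0 even though nan == 1.0 is False.
print(three_way(nan, 1.0))   # 0
print(nan == 1.0)            # False
```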

Adding a standard comparison mixin in a library is probably a good idea as well, but restoring __cmp__ is in my mind more important.  I can write my own mixin, but working around a failure to call __cmp__ gives a big performance hit.

--- David A. Wheeler


========================================
Here's my quick test code, in two files cmptest and cmptest2.
The whitespace may be munged by my mailer or the list, sorry if it is.

==== cmptest2 ====

#!/usr/bin/env python2

# cmptest2.py

import timeit


time1 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberWithCmp(x) for x in randomlist]
""")
time2 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberMixinCmp(x) for x in randomlist]
""")

finaltime1 = time1.timeit(3)
finaltime2 = time2.timeit(3)

print finaltime1
print finaltime2



====== cmptest ======

#!/usr/bin/env python2

# cmptest.py (imported as "cmptest" by cmptest2.py)

import random
import timeit

class NumberWithCmp(object):
    "Store 'x' for comparison"
    def __init__(self, data):
        self.x = data
    def __str__(self):
        return str(self.x)
    def __cmp__(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)

# Mixin, similar to http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

class ComparisonMixin(object):
    "Implement <, >, etc. by invoking a 'cmp' function."
    def __lt__(self, other):
        return self.cmp(other) < 0
    def __le__(self, other):
        return self.cmp(other) <= 0
    def __gt__(self, other):
        return self.cmp(other) > 0
    def __ge__(self, other):
        return self.cmp(other) >= 0

class NumberMixinCmp(ComparisonMixin):
    def __init__(self, data):
        self.x = data
    def __str__(self):
        return str(self.x)
    def cmp(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)




From dwheeler at dwheeler.com  Wed Oct 17 18:57:38 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Wed, 17 Oct 2007 12:57:38 -0400 (EDT)
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IiBwH-0003sz-04@fenris.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
Message-ID: <E1IiCDC-000417-TE@fenris.runbox.com>

I said:
> I did a test (see below), and the mixin using a simulated cmp took
> 50% MORE time to sort a list using Python 2.5 (see code below) than
> when __cmp__ is used directly (as you CAN do in Python 2.5).

Oops, I forgot to post the actual numbers. Here they are, on my box (your mileage will CERTAINLY vary):

$ ./cmptest2.py 
7.34321498871
10.9759318829

$ ./cmptest2.py 
7.30745196342
10.9110951424

$ ./cmptest2.py 
7.25755906105
10.9108018875

In each run, the first number is the # of seconds to do the sort, using __cmp__; the second is the number of seconds, using a mixin.  I ran it 3 times, and took the min of each.  Using the min() of each number, we have a mixin performance overhead of  (10.91-7.26)/7.26 = 50.3%
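
The overhead figure can be reproduced from the six timings above with a few lines (the variable names are just for this sketch):

```python
cmp_times = [7.34321498871, 7.30745196342, 7.25755906105]
mixin_times = [10.9759318829, 10.9110951424, 10.9108018875]

best_cmp = min(cmp_times)        # best run using __cmp__
best_mixin = min(mixin_times)    # best run using the mixin

overhead = (best_mixin - best_cmp) / best_cmp
print("%.1f%%" % (overhead * 100))   # 50.3%
```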

--- David A. Wheeler


From adam at hupp.org  Wed Oct 17 19:21:18 2007
From: adam at hupp.org (Adam Hupp)
Date: Wed, 17 Oct 2007 13:21:18 -0400
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IiBwH-0003sz-04@fenris.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
Message-ID: <766a29bd0710171021g7e5d67d0ic6f5ae1d944a1765@mail.gmail.com>

On 10/17/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> class NumberMixinCmp(ComparisonMixin):
...
>     def cmp(self, other):
>         if self.x == other.x: return 0
>         return (-1 if self.x < other.x else 1)

In the common case the == test will be false.  About half of the
tests will be <, and half >.  It's better, then, to do:

       if self.x < other.x:
           return -1
       elif self.x > other.x:
           return 1
       else:
           return 0

This almost halves the time difference, as you would expect.
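
A rough way to see where the saving comes from is to count the
primitive tests each ordering performs on random pairs. This sketch
only counts tests; it does not measure time, and the function names
are made up for illustration.

```python
import random


def cmp_eq_first(a, b, tests):
    """Original ordering: test == first, then <."""
    tests[0] += 1
    if a == b:
        return 0
    tests[0] += 1
    return -1 if a < b else 1


def cmp_lt_first(a, b, tests):
    """Suggested ordering: test < first, then >, else equal."""
    tests[0] += 1
    if a < b:
        return -1
    tests[0] += 1
    if a > b:
        return 1
    return 0


rng = random.Random(0)
pairs = [(rng.randrange(100000), rng.randrange(100000)) for _ in range(10000)]

eq_tests, lt_tests = [0], [0]
for a, b in pairs:
    # Both orderings must agree on the result; only the cost differs.
    assert cmp_eq_first(a, b, eq_tests) == cmp_lt_first(a, b, lt_tests)

# The <-first ordering stops after one test whenever a < b.
print(eq_tests[0], lt_tests[0])
```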

-- 
Adam Hupp | http://hupp.org/adam/

From guido at python.org  Wed Oct 17 19:23:39 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 17 Oct 2007 10:23:39 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IiBwH-0003sz-04@fenris.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
Message-ID: <ca471dc20710171023v24924844g70248982131fa2c3@mail.gmail.com>

On 10/17/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I said:
> > I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).
>
> Steven Bethard said:
> >Why can't this just be supplied with a mixin?  Here's a recipe
> >providing the appropriate mixins if you want to define a __key__
> >function:
> >    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403
>
> That _works_ from a functional perspective, and if Python3 fails to include direct support for __cmp__, then I think providing a built-in mixin is necessary.
>
> But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function.

However, note that Python's sort() and sorted() are guaranteed to only use '<'.
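
A minimal demonstration, using a hypothetical class that defines
nothing but __lt__ and counts how often it is called:

```python
class OnlyLessThan:
    """Hypothetical class that defines __lt__ and no other comparison."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        OnlyLessThan.calls += 1
        return self.value < other.value


items = [OnlyLessThan(v) for v in (3, 1, 2)]
ordered = sorted(items)

print([item.value for item in ordered])   # [1, 2, 3]
print(OnlyLessThan.calls > 0)             # True
```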

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Wed Oct 17 19:27:38 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 17 Oct 2007 11:27:38 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IiBwH-0003sz-04@fenris.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
Message-ID: <d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>

On 10/17/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I said:
> > I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).
>
> Steven Bethard said:
> >Why can't this just be supplied with a mixin?  Here's a recipe
> >providing the appropriate mixins if you want to define a __key__
> >function:
> >    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403
>
> That _works_ from a functional perspective, and if Python3 fails to
> include direct support for __cmp__, then I think providing a built-in
> mixin is necessary.
>
> But mixins for comparison are a BIG LOSER for sort performance
> if your fundamental operator is a cmp-like function.
[snip]
> I did a test (see below), and the mixin using a simulated cmp took
> 50% MORE time to sort a list using Python 2.5

Patient: When I move my arm, it hurts.
Doctor: Well don't move your arm then.

;-)

I'm having troubles coming up with things where the *basic* operator
is really a cmp-like function.  Even in your example, the cmp function
was defined in terms of "less than". If the basic operator is really
"less than", then why define a cmp() function at all? Particularly
since, even in Python 2.5, sorting is faster when you define __lt__
instead of __cmp__::

    class NumberWithLessThan(object):
        def __init__(self, data):
            self.data = data
        def __lt__(self, other):
            return self.data < other.data

    class NumberWithCmp(object):
        def __init__(self, data):
            self.data = data
        def __cmp__(self, other):
            return cmp(self.data, other.data)

    $ python -m timeit -s "import script, random" "data = [script.NumberWithLessThan(i) for i in xrange(1000)]; random.shuffle(data); data.sort()"
    100 loops, best of 3: 7.93 msec per loop

    $ python -m timeit -s "import script, random" "data = [script.NumberWithCmp(i) for i in xrange(1000)]; random.shuffle(data); data.sort()"
    100 loops, best of 3: 10.5 msec per loop

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From aahz at pythoncraft.com  Wed Oct 17 21:25:32 2007
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 17 Oct 2007 12:25:32 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
Message-ID: <20071017192532.GA23548@panix.com>

On Wed, Oct 17, 2007, Steven Bethard wrote:
>
> I'm having troubles coming up with things where the *basic* operator
> is really a cmp-like function.  Even in your example, the cmp function
> was defined in terms of "less than". If the basic operator is really
> "less than", then why define a cmp() function at all? 

From my perspective, the real use case for cmp() is when you want to do
a three-way comparison of a "large" object (for example, a Decimal
instance).  You can store the result of cmp() and then do a separate
three-way branch.
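
One way this pattern might look, as a sketch only: it uses
Decimal.compare() from the standard decimal module as the single
three-way comparison, and the function name classify and its labels
are made up for illustration. NaN operands are not handled.

```python
from decimal import Decimal


def classify(balance, limit):
    """Compare once, store the result, then branch three ways on it."""
    # Decimal.compare() returns Decimal('-1'), Decimal('0') or Decimal('1')
    # for ordinary (non-NaN) operands, so int() yields -1, 0 or 1.
    result = int(balance.compare(limit))
    if result < 0:
        return "under"
    elif result == 0:
        return "at limit"
    else:
        return "over"


labels = [
    classify(Decimal("1.50"), Decimal("2")),
    classify(Decimal("2.0"), Decimal("2")),
    classify(Decimal("2.01"), Decimal("2")),
]
```

The potentially expensive comparison of the two "large" objects runs
once, and the branches only inspect a small integer.
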
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

From guido at python.org  Thu Oct 18 01:00:23 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 17 Oct 2007 16:00:23 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
Message-ID: <ca471dc20710171600y21ecd788y2306e265f66a0e78@mail.gmail.com>

On 10/17/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> I'm having troubles coming up with things where the *basic* operator
> is really a cmp-like function.

Here's one. When implementing the '<' operator on lists or tuples, you
really want to call the 'cmp' operator on the individual items,
because otherwise (if all you have is == and <) the algorithm becomes
something like "compare for equality until you've found the first pair
of items that are unequal; then compare those items again using < to
decide the final outcome". If you don't believe this, try to implement
this operation using only == or < without comparing any two items more
than once.
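
The algorithm described above can be sketched as follows. The names
list_less_than and Counted are hypothetical; Counted only exists to
count how many item comparisons take place.

```python
class Counted:
    """Hypothetical wrapper that counts calls to == and <."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def list_less_than(a, b):
    """Sequence '<' built from nothing but == and < on the items."""
    for x, y in zip(a, b):
        if x == y:        # first comparison of this pair
            continue
        return x < y      # the same pair is compared a second time
    # All shared positions are equal: the shorter sequence sorts first.
    return len(a) < len(b)


left = [Counted(1), Counted(2)]
right = [Counted(1), Counted(3)]
result = list_less_than(left, right)
calls = Counted.calls
```

Here two pairs of items cost three comparisons, because the first
unequal pair is examined once with == and again with <.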

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu Oct 18 01:36:48 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 18 Oct 2007 12:36:48 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1IiBwH-0003sz-04@fenris.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
Message-ID: <47169C90.2070003@canterbury.ac.nz>

David A. Wheeler wrote:
> But mixins for comparison are a BIG LOSER for sort performance

Why not provide a __richcmp__ method that directly connects
with the corresponding type slot? All the comparisons
eventually end up there anyway, so it seems like the
right place to provide a one-stop comparison method
in the 3.0 age.

--
Greg

From greg.ewing at canterbury.ac.nz  Thu Oct 18 01:44:46 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 18 Oct 2007 12:44:46 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
Message-ID: <47169E6E.7000804@canterbury.ac.nz>

Steven Bethard wrote:
> I'm having troubles coming up with things where the *basic* operator
> is really a cmp-like function.

Think of things like comparing a tuple. You need to work your
way along and recursively compare the elements. The decision
about when to stop always involves ==, whatever comparison
you're trying to do. So if e.g. you're doing <, then you have
to test each element first for <, and if that's false, test
it for ==. If the element is itself a tuple, it's doing this
on its elements too, etc., and things get very inefficient.

If you have a single cmp operation that you can apply to the
elements, you only need to do it once for each element and it
gives you all the information you need.
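
For contrast, a rough sketch of the cmp-based approach. The helper
cmp3 is hypothetical, and its last line only stands in for a native
three-way comparison on non-tuple items (it still uses two operators
there, which a real cmp slot would avoid).

```python
def cmp3(a, b):
    """Three-way compare: negative, zero or positive result."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        for x, y in zip(a, b):
            c = cmp3(x, y)    # exactly one call per element pair
            if c != 0:
                return c      # first difference decides the outcome
        # All shared positions are equal: the shorter tuple sorts first.
        return len(a) - len(b)
    # Leaf items: stand-in for a type's own three-way comparison.
    return (a > b) - (a < b)


nested_less = cmp3((1, (2, 3)), (1, (2, 4)))
equal = cmp3((1, 2), (1, 2))
longer = cmp3((1, 2, 0), (1, 2))
```

Each element pair, including nested tuples, is visited once, and the
single result answers <, == and > at the same time.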

--
Greg

From guido at python.org  Fri Oct 19 23:06:02 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 19 Oct 2007 14:06:02 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
Message-ID: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>

On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> I'd like to make complete implementation of PEP 3137 the goal for the
> 3.0a2 release. It should be doable to do this release by the end of
> October. I don't think anything else *needs* to be done to have a
> successful a2 release.

I'm still hopeful, though realistically we may not quite make it.
Here's a status update on the issues I identified in my last message
(plus some identified afterwards):



> - remove locale support from PyString

Done.

> - remove compatibility with PyUnicode from PyString
> - remove compatibility with PyString from PyUnicode

Not done yet.

> - add missing methods to PyBytes (for list, see the PEP and compare to
> what's already there)

Done (Gregory P. Smith)

> - remove buffer API from PyUnicode

Done.

> - make == and != between PyBytes and PyUnicode return False instead of
> raising TypeError

Done.

> - make == and != between PyString and PyUnicode return False instead
> of converting

A patch by Thomas Lee exists: http://bugs.python.org/issue1263
However it breaks some unit tests.

> - make comparisons between PyString and PyBytes work (these are
> properly ordered)

Already works.

> - change lots of places (e.g. encoders) to return PyString instead of PyBytes

Not done.

> - change indexing and iteration over PyString to return ints, not
> 1-char PyStrings

A patch by Alexandre Vassalotti exists but breaks some unit tests:
http://bugs.python.org/issue1280

> - change PyString's repr() to return "b'...'"
> - change PyBytes's repr() to return "buffer(b'...')"
> - change parser so that b"..." returns PyString, not PyBytes
> - rename bytes -> buffer, str8 -> bytes

A patch by Alexandre Vassalotti and Christian Heimes exists for these 4 items:
http://bugs.python.org/issue1247
However it breaks too many tests to be applied right now.

> - change the constructor for PyString to match the one for PyBytes

Not done.

> - change PyBytes so that its str() is the same as its repr().
> - change PyString so that its str() is the same as its repr().

Not done.

> - add an iteration view over PyBytes (optional)

Not yet done (Christian Heimes offered).

> - kill basestring.

Done (Christian Heimes).

> - move initialization of sys.std{in,out,err} into C code and do it earlier.

A patch by Christian Heimes exists: http://bugs.python.org/issue1267
However it still breaks some unit tests...
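
For orientation, here is how several of the items above behave in
Python 3 as eventually released. This is a sketch written after the
fact, not part of the plan itself; note that the mutable type called
"buffer" in this plan shipped under the name bytearray.

```python
# == between bytes and str is simply False, with no TypeError.
mixed_equal = (b"abc" == "abc")

# Indexing and iterating over bytes yields ints, not 1-char strings.
first_item = b"abc"[0]
all_items = list(b"ab")

# repr() and str() of bytes both give the b'...' form.
bytes_repr = repr(b"abc")
bytes_str = str(b"abc")

# The bytes constructor accepts a str plus an encoding argument.
encoded = bytes("abc", "utf-8")

# The mutable counterpart reports its own type name in repr().
mutable_repr = repr(bytearray(b"abc"))
```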



All, please provide updated information if I missed a contribution!
I'm still hoping for more contributions. I will also try to guide the
existing patches into completion and acceptance.

There are also some issues that mainly crop up in non-English locales.
We will try to get to the bottom of those before releasing 3.0a2, but
I need help as I'm myself absolutely unable to work with locales (and
I don't have access to a Windows box).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Fri Oct 19 23:24:17 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 19 Oct 2007 14:24:17 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
Message-ID: <bbaeab100710191424p1713e9bev87661316e69f5874@mail.gmail.com>

On 10/19/07, Guido van Rossum <guido at python.org> wrote:
> On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > I'd like to make complete implementation of PEP 3137 the goal for the
> > 3.0a2 release. It should be doable to do this release by the end of
> > October. I don't think anything else *needs* to be done to have a
> > successful a2 release.
>
> I'm still hopeful, though realistically we may not quite make it.
> Here's a status update on the issues I identified in my last message
> (plus some identified afterwards):
[SNIP]
>
> > - make == and != between PyString and Pyunicode return False instead
> > of converting
>
> A patch by Thomas Lee exists: http://bugs.python.org/issue1263
> However it breaks some unit tests.
>
[SNIP]
> A patch by Alexandre Vassalotti exists but breaks some unit tests:
> http://bugs.python.org/issue1280
>
> > - change PyString's repr() to return "b'...'"
> > - change PyBytes's repr() to return "buffer(b'...')"
> > - change parser so that b"..." returns PyString, not PyBytes
> > - rename bytes -> buffer, str8 -> bytes
>
> A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items:
> http://bugs.python.org/issue1247
> However it breaks too many tests to be applied right now.
[SNIP]
> > - move initialization of sys.std{in,out,err} into C code and do it earlier.
>
> A patch by Christian Heimes exists: http://bugs.python.org/issue1267
> However it still breaks some unit tests...

With so many patches now floating around, I figure getting some help
with patch approval is probably the most useful.  Is there a specific
patch you would like to see get applied above the others?  Or does it
not matter and one should just grab any of them and just try to fix a
test or two when one has the spare time?

-Brett

From guido at python.org  Fri Oct 19 23:28:43 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 19 Oct 2007 14:28:43 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <bbaeab100710191424p1713e9bev87661316e69f5874@mail.gmail.com>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
	<bbaeab100710191424p1713e9bev87661316e69f5874@mail.gmail.com>
Message-ID: <ca471dc20710191428g75a61f76u4f8a3868ab9f535c@mail.gmail.com>

On 10/19/07, Brett Cannon <brett at python.org> wrote:
> On 10/19/07, Guido van Rossum <guido at python.org> wrote:
> > On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > > I'd like to make complete implementation of PEP 3137 the goal for the
> > > 3.0a2 release. It should be doable to do this release by the end of
> > > October. I don't think anything else *needs* to be done to have a
> > > successful a2 release.
> >
> > I'm still hopeful, though realistically we may not quite make it.
> > Here's a status update on the issues I identified in my last message
> > (plus some identified afterwards):
> [SNIP]
> >
> > > - make == and != between PyString and Pyunicode return False instead
> > > of converting
> >
> > A patch by Thomas Lee exists: http://bugs.python.org/issue1263
> > However it breaks some unit tests.
> >
> [SNIP]
> > A patch by Alexandre Vassalotti exists but breaks some unit tests:
> > http://bugs.python.org/issue1280
> >
> > > - change PyString's repr() to return "b'...'"
> > > - change PyBytes's repr() to return "buffer(b'...')"
> > > - change parser so that b"..." returns PyString, not PyBytes
> > > - rename bytes -> buffer, str8 -> bytes
> >
> > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items:
> > http://bugs.python.org/issue1247
> > However it breaks too many tests to be applied right now.
> [SNIP]
> > > - move initialization of sys.std{in,out,err} into C code and do it earlier.
> >
> > A patch by Christian Heimes exists: http://bugs.python.org/issue1267
> > However it still breaks some unit tests...
>
> With so many patches now floating around, I figure getting some help
> with patch approval is probably the most useful.  Is there a specific
> patch you would like to see get applied above the others?  Or does it
> not matter and one should just grab any of them and just try to fix a
> test or two when one has the spare time?

Alas, all of them have problems where they break several unit tests in
a fairly deep way. I've made several aborted attempts already at
assessing how close each one is, but I got distracted each time (this
has been an extra busy week at Google). I'm making a commitment now to
doing nothing but this the rest of this afternoon.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Fri Oct 19 23:50:58 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 19 Oct 2007 23:50:58 +0200
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
Message-ID: <471926C2.3010103@cheimes.de>

Guido van Rossum wrote:
>> - change PyString's repr() to return "b'...'"
>> - change PyBytes's repr() to return "buffer(b'...')"
>> - change parser so that b"..." returns PyString, not PyBytes
>> - rename bytes -> buffer, str8 -> bytes
> 
> A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items:
> http://bugs.python.org/issue1247
> However it breaks too many tests to be applied right now.

Yes, it's breaking horribly. It doesn't make sense to work on the fixes
until "change the constructor for PyString to match the one for PyBytes"
is done. PyString needs to accept an optional encoding and error argument.

>> - add an iteration view over PyBytes (optional)
> 
> Not yet done (Christian Heimes offered).

I only pointed out that it's missing. I didn't say that I would write it
because I don't feel qualified and experienced enough for it.

> A patch by Christian Heimes exists: http://bugs.python.org/issue1267
> However it still breaks some unit tests...

Which unit tests are broken for you? test_cProfile test_doctest
test_email test_profile are broken for me in a vanilla build of py3k. My
patch doesn't break additional tests for me.

By the way I may have figured out how to fix the profile tests.

Christian

From guido at python.org  Fri Oct 19 23:57:06 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 19 Oct 2007 14:57:06 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <471926C2.3010103@cheimes.de>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
	<471926C2.3010103@cheimes.de>
Message-ID: <ca471dc20710191457t5ba50d50w34d6343412dc9dea@mail.gmail.com>

On 10/19/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> >> - change PyString's repr() to return "b'...'"
> >> - change PyBytes's repr() to return "buffer(b'...')"
> >> - change parser so that b"..." returns PyString, not PyBytes
> >> - rename bytes -> buffer, str8 -> bytes
> >
> > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items:
> > http://bugs.python.org/issue1247
> > However it breaks too many tests to be applied right now.
>
> Yes, it's breaking horrible. It doesn't make sense to work on the fixes
> until "change the constructor for PyString to match the one for PyBytes"
> is done. PyString needs to accept an optional encoding and error argument.

Of course. I didn't mean to imply there was a problem with the patch, sorry.

> >> - add an iteration view over PyBytes (optional)
> >
> > Not yet done (Christian Heimes offered).
>
> I only pointed out that it's missing. I didn't say that I would write it
> because I don't feel qualified and experienced enough for it.

Oops, sorry again.

> > A patch by Christian Heimes exists: http://bugs.python.org/issue1267
> > However it still breaks some unit tests...
>
> Which unit tests are broken for you? test_cProfile test_doctest
> test_email test_profile are broken for me in a vanilla build of py3k. My
> patch doesn't break additional tests for me.

I'll look into it. Maybe I misremember.

> By the way I may have figured out how to fix the profile tests.

Cool, submit to the tracker and assign to me any time.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Sat Oct 20 00:26:27 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 20 Oct 2007 00:26:27 +0200
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <ca471dc20710191457t5ba50d50w34d6343412dc9dea@mail.gmail.com>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>	
	<471926C2.3010103@cheimes.de>
	<ca471dc20710191457t5ba50d50w34d6343412dc9dea@mail.gmail.com>
Message-ID: <47192F13.1040305@cheimes.de>

Guido van Rossum wrote:
>> I only pointed out that it's missing. I didn't say that I would write it
>> because I don't feel qualified and experienced enough for it.
> 
> Oops, sorry again.

I may take it as a challenge to write the view but I don't know if I can
handle it. I'm still learning how to program C for Python. It may be a
good opportunity to learn more. If you don't mind that it may take
longer and if somebody could lend me a hand ... :]

>> Which unit tests are broken for you? test_cProfile test_doctest
>> test_email test_profile are broken for me in a vanilla build of py3k. My
>> patch doesn't break additional tests for me.
> 
> I'll look into it. Maybe I misremember.

I don't see additional failing unit tests. My patch had an issue but I
fixed it a couple of days ago. Maybe you don't remember the fix.

>> By the way I may have figured out how to fix the profile tests.
> 
> Cooll submit to the tracker and assign to me any time.

I can't assign bugs with my current user level but I added you to the
nosy list.

http://bugs.python.org/issue1302

test_email fails because the file Lib/email/test/data/msg_15.txt contains
an invalid UTF-8 character in "Da dit postl?sningsprogram". The test
looks like something should fail:

    def test_same_boundary_inner_outer(self):
        unless = self.failUnless
        msg = self._msgobj('msg_15.txt')
        # XXX We can probably eventually do better
        inner = msg.get_payload(0)
        unless(hasattr(inner, 'defects'))
        self.assertEqual(len(inner.defects), 1)
        unless(isinstance(inner.defects[0],
                          errors.StartBoundaryNotFoundDefect))

Christian

From brett at python.org  Sat Oct 20 00:30:09 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 19 Oct 2007 15:30:09 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <47192F13.1040305@cheimes.de>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
	<471926C2.3010103@cheimes.de>
	<ca471dc20710191457t5ba50d50w34d6343412dc9dea@mail.gmail.com>
	<47192F13.1040305@cheimes.de>
Message-ID: <bbaeab100710191530r62dbbb6dt229ee7766370d961@mail.gmail.com>

On 10/19/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> >> I only pointed out that it's missing. I didn't say that I would write it
> >> because I don't feel qualified and experienced enough for it.
> >
> > Oops, sorry again.
>
> I may take it as a challenge to write the view but I don't know if I can
> handle it. I'm still learning how to program C for Python. It may be a
> good opportunity to learn more. If you don't mind that it may take
> longer and if somebody could lend me a hand ... :]
>
> >> Which unit tests are broken for you? test_cProfile test_doctest
> >> test_email test_profile are broken for me in a vanilla build of py3k. My
> >> patch doesn't break additional tests for me.
> >
> > I'll look into it. Maybe I misremember.
>
> I don't see additional failing unit tests. My patch had an issue but I
> fixed it couple of days ago. Maybe you can't remember the fix.
>
> >> By the way I may have figured out how to fix the profile tests.
> >
> > Cooll submit to the tracker and assign to me any time.
>
> I can't assign bugs with my current user level but I added you to the
> nosy list.
>
> http://bugs.python.org/issue1302

I went ahead and did the assignment, but there is no patch.  =)

-Brett

From g.brandl at gmx.net  Sat Oct 20 00:36:06 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 20 Oct 2007 00:36:06 +0200
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <471926C2.3010103@cheimes.de>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
	<471926C2.3010103@cheimes.de>
Message-ID: <ffbbgp$1ol$1@ger.gmane.org>

Christian Heimes schrieb:
> Guido van Rossum wrote:
>>> - change PyString's repr() to return "b'...'"
>>> - change PyBytes's repr() to return "buffer(b'...')"
>>> - change parser so that b"..." returns PyString, not PyBytes
>>> - rename bytes -> buffer, str8 -> bytes
>> 
>> A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items:
>> http://bugs.python.org/issue1247
>> However it breaks too many tests to be applied right now.
> 
> Yes, it's breaking horrible. It doesn't make sense to work on the fixes
> until "change the constructor for PyString to match the one for PyBytes"
> is done. PyString needs to accept an optional encoding and error argument.

I'm doing that currently; a patch should be up in a minute.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From nnorwitz at gmail.com  Sat Oct 20 01:41:06 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Fri, 19 Oct 2007 16:41:06 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <47192F13.1040305@cheimes.de>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
	<471926C2.3010103@cheimes.de>
	<ca471dc20710191457t5ba50d50w34d6343412dc9dea@mail.gmail.com>
	<47192F13.1040305@cheimes.de>
Message-ID: <ee2a432c0710191641t5b0bf66eg27562d68ed395f78@mail.gmail.com>

On 10/19/07, Christian Heimes <lists at cheimes.de> wrote:
>
> I may take it as a challenge to write the view but I don't know if I can
> handle it. I'm still learning how to program C for Python. It may be a
> good opportunity to learn more. If you don't mind that it may take
> longer and if somebody could lend me a hand ... :]

I think questions related to how to make these sorts of changes are
on-topic for this list.  If you get stuck, feel free to ask here.

If you prefer, you (or others) can mail me privately.  I tend to
answer the questions at my night time (US Pacific), so I may not
always be fast with answering.

n

From tom at vector-seven.com  Mon Oct 15 15:10:22 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Mon, 15 Oct 2007 23:10:22 +1000
Subject: [Python-3000] PEP3137: str8() and str() comparison
Message-ID: <471366BE.50303@vector-seven.com>

I just uploaded a patch with all my progress on str8/str comparisons here:

http://bugs.python.org/issue1263

I would really like some help from anybody knowledgeable with the 
following tests:

test_compile
test_str
test_struct
test_sqlite

As discussed in the issue tracker, these are all failing for various 
reasons: in all cases I'm not exactly sure how to progress.

The following are also failing for me, although this would appear to be 
unrelated to my patch:

test_doctest
test_email
test_nis
test_pty

Are these failing for anybody else?

Cheers,
Tom

From lists at cheimes.de  Mon Oct 22 04:13:52 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 22 Oct 2007 04:13:52 +0200
Subject: [Python-3000] Failing unit tests on WIndows
Message-ID: <ffh110$36h$1@ger.gmane.org>

Python 3000 needs some love from Windows developers. The tests were run
on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed
pythoncore project file. My build environment has no devenv.exe so bsddb
is missing.

252 tests OK.
20 tests failed:
    test_csv test_dumbdbm test_file test_fileinput test_gettext
    test_io test_mailbox test_netrc test_pep277 test_shutil
    test_sqlite test_strptime test_subprocess test_tarfile
    test_tempfile test_threaded_import test_threadedtempfile test_time
    test_urllib test_zipfile
48 tests skipped:
    test__locale test_aepack test_applesingle test_bsddb test_bsddb3
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_commands test_crypt
    test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm
    test_grp test_ioctl test_largefile test_macostools test_mhlib
    test_nis test_normalization test_openpty test_ossaudiodev
    test_pipes test_plistlib test_poll test_posix test_pty test_pwd
    test_resource test_scriptpackages test_signal test_socket_ssl
    test_socketserver test_ssl test_syslog test_threadsignals
    test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4
    test_xmlrpc_net test_zipfile64
3 skips unexpected on win32:
    test_ssl test_syslog test_bsddb

Christian


From lists at cheimes.de  Mon Oct 22 04:27:26 2007
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 22 Oct 2007 04:27:26 +0200
Subject: [Python-3000] Failing unit tests on WIndows
In-Reply-To: <ffh110$36h$1@ger.gmane.org>
References: <ffh110$36h$1@ger.gmane.org>
Message-ID: <ffh1qe$4jb$1@ger.gmane.org>

Fix for tempfile bug on Windows:

http://bugs.python.org/issue1310

Fix for project file:

http://bugs.python.org/issue1309

By the way, the build repository contains an old version of OpenSSL 0.9.8a
while OpenSSL 0.9.8g is out. 0.9.8a is more than 2 years old and doesn't
build cleanly with VS 2005. Could somebody please update it to
http://openssl.org/source/openssl-0.9.8g.tar.gz ?

Christian


From guido at python.org  Mon Oct 22 05:33:57 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 21 Oct 2007 20:33:57 -0700
Subject: [Python-3000] Failing unit tests on WIndows
In-Reply-To: <ffh110$36h$1@ger.gmane.org>
References: <ffh110$36h$1@ger.gmane.org>
Message-ID: <ca471dc20710212033s17166d35we980e47f51aca99@mail.gmail.com>

Thanks for taking the time to do this, Chris! I'm sure the fixes you
posted separately will be checked in soon. Hopefully others will jump
in with fixes for more of the issues below.

--Guido

2007/10/21, Christian Heimes <lists at cheimes.de>:
> Python 3000 needs some love from Windows developers. The test were run
> on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed
> pythoncore project file. My build environment has no devenv.exe so bsddb
> is missing.
>
> 252 tests OK.
> 20 tests failed:
>     test_csv test_dumbdbm test_file test_fileinput test_gettext
>     test_io test_mailbox test_netrc test_pep277 test_shutil
>     test_sqlite test_strptime test_subprocess test_tarfile
>     test_tempfile test_threaded_import test_threadedtempfile test_time
>     test_urllib test_zipfile
> 48 tests skipped:
>     test__locale test_aepack test_applesingle test_bsddb test_bsddb3
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_commands test_crypt
>     test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm
>     test_grp test_ioctl test_largefile test_macostools test_mhlib
>     test_nis test_normalization test_openpty test_ossaudiodev
>     test_pipes test_plistlib test_poll test_posix test_pty test_pwd
>     test_resource test_scriptpackages test_signal test_socket_ssl
>     test_socketserver test_ssl test_syslog test_threadsignals
>     test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4
>     test_xmlrpc_net test_zipfile64
> 3 skips unexpected on win32:
>     test_ssl test_syslog test_bsddb
>
> Christian
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Mon Oct 22 22:27:44 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 22 Oct 2007 13:27:44 -0700
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>
Message-ID: <bbaeab100710221327q30d79bddo6f1b09393118e85@mail.gmail.com>

On 10/19/07, Guido van Rossum <guido at python.org> wrote:
[SNIP]
> > - make == and != between PyString and Pyunicode return False instead
> > of converting
>
> A patch by Thomas Lee exists: http://bugs.python.org/issue1263
> However it breaks some unit tests.

This is now done.

-Brett

From lists at cheimes.de  Tue Oct 23 03:15:25 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 23 Oct 2007 03:15:25 +0200
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <ffbbgp$1ol$1@ger.gmane.org>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>	<471926C2.3010103@cheimes.de>
	<ffbbgp$1ol$1@ger.gmane.org>
Message-ID: <471D4B2D.1090905@cheimes.de>

Georg Brandl wrote:
> I do that, currently, patch should be up in a minute.

How is your patch? It's not in the svn repository yet.

Christian

From g.brandl at gmx.net  Tue Oct 23 07:59:51 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 23 Oct 2007 07:59:51 +0200
Subject: [Python-3000] PEP 3137 plan of attack {stage 2]
In-Reply-To: <471D4B2D.1090905@cheimes.de>
References: <ca471dc20710191406v1091d13fp8c7327254e45c016@mail.gmail.com>	<471926C2.3010103@cheimes.de>	<ffbbgp$1ol$1@ger.gmane.org>
	<471D4B2D.1090905@cheimes.de>
Message-ID: <ffk2hg$4t9$1@ger.gmane.org>

Christian Heimes schrieb:
> Georg Brandl wrote:
>> I do that, currently, patch should be up in a minute.
> 
> How is your patch? It's not in the svn repository yet.

It's in issue 1303.

Georg


From gnewsg at gmail.com  Mon Oct 22 15:07:50 2007
From: gnewsg at gmail.com (Giampaolo Rodola')
Date: Mon, 22 Oct 2007 06:07:50 -0700
Subject: [Python-3000] Failing unit tests on WIndows
In-Reply-To: <ca471dc20710212033s17166d35we980e47f51aca99@mail.gmail.com>
References: <ffh110$36h$1@ger.gmane.org>
	<ca471dc20710212033s17166d35we980e47f51aca99@mail.gmail.com>
Message-ID: <1193058470.684200.189820@q3g2000prf.googlegroups.com>

On 22 Oct, 04:13, Christian Heimes <li... at cheimes.de> wrote:
> Python 3000 needs some love from Windows developers. The test were run
> on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed
> pythoncore project file. My build environment has no devenv.exe so bsddb
> is missing.
>
> 252 tests OK.
> 20 tests failed:
>     test_csv test_dumbdbm test_file test_fileinput test_gettext
>     test_io test_mailbox test_netrc test_pep277 test_shutil
>     test_sqlite test_strptime test_subprocess test_tarfile
>     test_tempfile test_threaded_import test_threadedtempfile test_time
>     test_urllib test_zipfile
> 48 tests skipped:
>     test__locale test_aepack test_applesingle test_bsddb test_bsddb3
>     test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
>     test_codecmaps_kr test_codecmaps_tw test_commands test_crypt
>     test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm
>     test_grp test_ioctl test_largefile test_macostools test_mhlib
>     test_nis test_normalization test_openpty test_ossaudiodev
>     test_pipes test_plistlib test_poll test_posix test_pty test_pwd
>     test_resource test_scriptpackages test_signal test_socket_ssl
>     test_socketserver test_ssl test_syslog test_threadsignals
>     test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4
>     test_xmlrpc_net test_zipfile64
> 3 skips unexpected on win32:
>     test_ssl test_syslog test_bsddb
>
> Christian

Most errors seem to be attributable to Unicode-related problems:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position xx-yy:
invalid data

The following tests DO NOT fail on my Windows XP prof sp2 box:
test_sqlite, test_strptime, test_tarfile, test_threaded_import,
test_threadedtempfile, test_time, test_urllib, test_zipfile.
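
As an illustration of how this class of error can arise, here is a minimal
sketch. The assumption (not confirmed by the tracebacks) is that some bytes
were written in a legacy 8-bit encoding such as Latin-1 and are then decoded
as UTF-8; the sample string and variable names are made up for the example.

```python
# Assumption: the data was produced in a legacy 8-bit encoding (Latin-1 here).
data = "Z\u00fcrich".encode("latin-1")   # contains the byte 0xFC

try:
    data.decode("utf-8")                 # 0xFC cannot start a UTF-8 sequence
    decoded_as_utf8 = True
except UnicodeDecodeError as exc:
    decoded_as_utf8 = False
    error_start = exc.start              # index of the first offending byte

# Decoding with the encoding the bytes were actually written in succeeds.
text = data.decode("latin-1")
```

If this is what happens in the failing tests, the fix would be to pass the
right encoding explicitly rather than rely on a default.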


From lists at cheimes.de  Tue Oct 23 10:08:12 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 23 Oct 2007 10:08:12 +0200
Subject: [Python-3000] Failing unit tests on WIndows
In-Reply-To: <1193058470.684200.189820@q3g2000prf.googlegroups.com>
References: <ffh110$36h$1@ger.gmane.org>	<ca471dc20710212033s17166d35we980e47f51aca99@mail.gmail.com>
	<1193058470.684200.189820@q3g2000prf.googlegroups.com>
Message-ID: <ffka5e$p8l$1@ger.gmane.org>

Giampaolo Rodola' wrote:
> Most errors seem to be attributable to Unicode-related problems:
> 
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position xx-yy:
> invalid data
> 
> The following tests DO NOT fail on my Windows XP prof sp2 box:
> test_sqlite, test_strptime, test_tarfile, test_threaded_import,
> test_threadedtempfile, test_time, test_urllib, test_zipfile.

A bunch of tests are already fixed (r58590 and r58593). Some of the
failing tests depend on the locale and time zone. They don't break when
I "set TZ=GMT" on the console before I run the test suite.

257 tests OK.
15 tests failed:
    test_codeccallbacks test_csv test_ctypes test_dumbdbm test_file
    test_fileinput test_gettext test_io test_mailbox test_netrc
    test_pep277 test_strptime test_subprocess test_tempfile test_time
48 tests skipped:
    test__locale test_aepack test_applesingle test_bsddb test_bsddb3
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
    test_codecmaps_kr test_codecmaps_tw test_commands test_crypt
    test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm
    test_grp test_ioctl test_largefile test_macostools test_mhlib
    test_nis test_normalization test_openpty test_ossaudiodev
    test_pipes test_plistlib test_poll test_posix test_pty test_pwd
    test_resource test_scriptpackages test_signal test_socket_ssl
    test_socketserver test_ssl test_syslog test_threadsignals
    test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4
    test_xmlrpc_net test_zipfile64
3 skips unexpected on win32:
    test_ssl test_syslog test_bsddb

With set TZ=GMT test_time and test_strptime pass.

Christian



From g.brandl at gmx.net  Tue Oct 23 20:47:45 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 23 Oct 2007 20:47:45 +0200
Subject: [Python-3000] PyInt_AS_LONG error checking
Message-ID: <fflfh9$2su$1@ger.gmane.org>

PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification.

However, most places that use this macro (and also places that
use PyInt_AsLong) assume it cannot fail which means that an exception
won't be properly propagated in that case.

If I don't overlook something here, all these places have to be fixed...

Georg


From martin at v.loewis.de  Tue Oct 23 21:03:54 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 23 Oct 2007 21:03:54 +0200
Subject: [Python-3000] PyInt_AS_LONG error checking
In-Reply-To: <fflfh9$2su$1@ger.gmane.org>
References: <fflfh9$2su$1@ger.gmane.org>
Message-ID: <471E459A.6060605@v.loewis.de>

> PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification.
> 
> However, most places that use this macro (and also places that
> use PyInt_AsLong) assume it cannot fail which means that an exception
> won't be properly propagated in that case.
> 
> If I don't overlook something here, all these places have to be fixed...

I think you do overlook something. Many of these places do
PyInt_CheckExact before invoking the macro. PyInt_CheckExact includes
_PyLong_FitsInLong, so if that test returns true, then PyInt_AS_LONG
cannot fail.

So the only places that need to be fixed are those where PyInt_AS_LONG
isn't protected by PyInt_CheckExact.
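
The guard-then-convert pattern can be modelled loosely in Python. This is
only a sketch: fits_in_long(), as_long_unchecked() and as_long() are
hypothetical names, the range test here is exact rather than conservative,
and the unchecked helper wraps around where the real C call would return an
error value with an exception pending.

```python
import ctypes

# Range of a C long on this platform.
LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)
LONG_MIN = -(1 << (LONG_BITS - 1))
LONG_MAX = (1 << (LONG_BITS - 1)) - 1


def fits_in_long(value):
    """Hypothetical stand-in for the range test performed by the guard."""
    return isinstance(value, int) and LONG_MIN <= value <= LONG_MAX


def as_long_unchecked(value):
    """Model of a conversion whose caller ignores the failure case."""
    # ctypes does no overflow checking, so an out-of-range value is
    # silently changed instead of being reported.
    return ctypes.c_long(value).value


def as_long(value):
    """Guard first, convert second: the conversion can no longer fail."""
    if not fits_in_long(value):
        raise OverflowError("int too large to convert to C long")
    return as_long_unchecked(value)
```

Only call sites shaped like as_long_unchecked(), with no preceding guard,
correspond to the places that would need fixing.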

HTH,
Martin

From g.brandl at gmx.net  Tue Oct 23 21:19:45 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 23 Oct 2007 21:19:45 +0200
Subject: [Python-3000] PyInt_AS_LONG error checking
In-Reply-To: <471E459A.6060605@v.loewis.de>
References: <fflfh9$2su$1@ger.gmane.org> <471E459A.6060605@v.loewis.de>
Message-ID: <fflhd8$997$1@ger.gmane.org>

Martin v. Löwis wrote:
>> PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification.
>> 
>> However, most places that use this macro (and also places that
>> use PyInt_AsLong) assume it cannot fail which means that an exception
>> won't be properly propagated in that case.
>> 
>> If I don't overlook something here, all these places have to be fixed...
> 
> I think you do overlook something. Many of these places do
> PyInt_CheckExact before invoking the macro. PyInt_CheckExact includes
> _PyLong_FitsInLong, so if that test returns true, then PyInt_AS_LONG
> cannot fail.
> 
> So the only places that need to be fixed are those where PyInt_AS_LONG
> isn't protected by PyInt_CheckExact.

Ok, thanks, that explains it.

Georg

BTW, _PyLong_FitsInLong says "/* conservative estimate */" -- it doesn't
really allow the whole range of C long...





From guido at python.org  Tue Oct 23 21:30:07 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 23 Oct 2007 12:30:07 -0700
Subject: [Python-3000] Three new failing tests?
Message-ID: <ca471dc20710231230q2abaac94y55b68284b42fc8ba@mail.gmail.com>

I've got three tests failing in my py3k branch, on Linux:

$ ./python Lib/test/regrtest.py test_codeccallbacks test_ctypes test_locale
test_codeccallbacks
test test_codeccallbacks failed -- Traceback (most recent call last):
  File "/usr/local/google/home/guido/python/py3kd/Lib/test/test_codeccallbacks.py",
line 795, in test_translatehelper
    self.assertRaises(ValueError, "\xff".translate, D())
AssertionError: ValueError not raised by translate

test_ctypes
test test_ctypes failed -- errors occurred; run in verbose mode for details
test_locale
test test_locale produced unexpected output:
**********************************************************************
*** lines 2-5 of actual output doesn't appear in expected output after line 1:
+ s'\xc3\xac\xc2\xa0\xc2\xbc'.split() == [s'\xc3\xac\xc2\xa0\xc2\xbc']
!= ['\xec\xa0\xbc']
+ s'\xc3\xad\xc2\x95\xc2\xa0'.strip() == s'\xc3\xad\xc2\x95\xc2\xa0'
!= '\xed\x95\xa0'
+ s'\xc3\x8c\xc2\x85'.lower() == s'\xc3\x8c\xc2\x85' != '\xcc\x85'
+ s'\xc3\xad\xc2\x95\xc2\xa0'.upper() == s'\xc3\xad\xc2\x95\xc2\xa0'
!= '\xed\x95\xa0'
**********************************************************************
3 tests failed:
    test_codeccallbacks test_ctypes test_locale
[94873 refs]
$

I don't think Georg's latest checkin (PyInt_Check/PyLong_Check issues)
broke these.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Oct 23 21:36:16 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 23 Oct 2007 12:36:16 -0700
Subject: [Python-3000] Question about email/generator.py
In-Reply-To: <AC3AAF69-E18A-44A8-A860-92DA7D5BB5E5@python.org>
References: <ca471dc20710101538s2cc9fdm8998734b16fb41b8@mail.gmail.com>
	<AC3AAF69-E18A-44A8-A860-92DA7D5BB5E5@python.org>
Message-ID: <ca471dc20710231236o2ec7c1e1h6a3449fe383e7153@mail.gmail.com>

There's an issue in the email package that I can't resolve by myself.
I described it to Barry like this:

> > So in generator.py on line 291, I read:
> >
> >   print(part.get_payload(decode=True), file=self)
> >
> > It turns out that part.get_payload(decode=True) returns a bytes
> > object, and printing a bytes object to a text file is not the right
> > thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2
> > it will probably print the repr() of the bytes object. Right now, it
> > errors out because I'm removing the encode() method on PyString
> > objects, and print() converts PyBytes to PyString; then the
> > TextIOWrapper.write() method tries to encode its argument.
> >
> > If I change this to (decode=False), all tests in the email package
> > pass. But is this the right fix???

I should note that this was checked in by the time Barry replied, even
though it clearly was the wrong thing to do. Barry replied:

> Maybe. ;)  The problem is that this API is either being too smart for
> its own good, or not smart enough.  The intent of decode=True is to
> return the original object encoded in the payload.  So for example,
> if MIMEImage was used to encode some jpeg, then decode=True should
> return that jpeg.
>
> The problem is that what you really want is something that's content-
> type aware, such that if your main type is some non-text type like
> image/* or audio/* or even application/octet-stream, you will almost
> always want a bytes object back.  But text can also be encoded via
> charset and/or transfer-encoding, and (at least in Py2.x), you'd use
> the same method to get the original, unencoded text back.  In that
> case, you definitely want the string, since that's the most natural
> API (i.e. you fed it a string object when you created the MIMEText,
> so you want a string on the way back out).
>
> This is yet another corner case where the old API doesn't really fit
> the new bytes/string model correctly, and of course you can
> (rightly!) argue we were sloppy in Py2.x but were able to (mostly)
> get away with it.
>
> In this /specific/ situation, generator.py:291 can only be called
> when the main type is text, so I think it is clearly expecting a
> string, even though .get_payload() will return a bytes there.
>
> Short of redesigning the API, I can think of two options.  First, we
> can change .get_payload() to specifically return a string when the main
> type is text and decode=True.  This is ugly because the return type
> will depend on the content type of the message.  OTOH, get_payload()
> is already fairly ugly here because its return type differs based on
> its argument, although I'd like to split this into a
> separate .get_decoded_payload() method.
>
> The other option is to let .get_payload() return bytes in all cases,
> but in generator.py:291, explicitly convert it to a string, probably
> using raw-unicode-escape.  Because we know the main type is text
> here, we know that the payload must contain a string.  get_payload()
> will return the bytes of the decoded unicode string, so raw-unicode-
> escape should do the right thing.  That's ugly too for obvious reasons.
>
> The one thing that doesn't seem right is for decode=False to be used
> because should the payload be an encoded string, it won't get
> correctly decoded.  This is part of the DecodedGenerator, which
> honestly is probably not much used outside the test cases.  But the
> intent of that generator is clearly to print the decoded text parts
> with the non-text parts stripped and replaced by a placeholder.  So I
> think it definitely wants decoded text payloads, otherwise there's
> not much point in the class.
>
> I hope that explains the situation.  I'm open to any other idea -- it
> doesn't even have to be better. ;)  I see that you made the
> decode=False change in svn, but that's the one solution that doesn't
> seem right.

At this point I (Guido) am really hoping someone will want to "own"
this issue and redesign the API properly...
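
To make the bytes-versus-text point concrete, here is a minimal sketch run
against the email package as it behaves in a current Python 3 interpreter
(an assumption; the 2007 branch may differ). It decodes with the charset
declared on the part, which is my choice for the example and not the
raw-unicode-escape option mentioned above.

```python
from email.mime.text import MIMEText

original = "caf\u00e9 au lait\n"
part = MIMEText(original, "plain", "utf-8")

# decode=True undoes the Content-Transfer-Encoding and yields bytes.
raw = part.get_payload(decode=True)

# Turning those bytes back into text needs the charset declared on the part.
charset = part.get_content_charset() or "ascii"
text = raw.decode(charset)

# Only now is the value suitable for writing to a text file.
```

The point of the sketch is that the charset-aware step has to happen
somewhere between get_payload() and print().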

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Tue Oct 23 21:50:05 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 23 Oct 2007 15:50:05 -0400
Subject: [Python-3000] Question about email/generator.py
In-Reply-To: <ca471dc20710231236o2ec7c1e1h6a3449fe383e7153@mail.gmail.com>
References: <ca471dc20710101538s2cc9fdm8998734b16fb41b8@mail.gmail.com>
	<AC3AAF69-E18A-44A8-A860-92DA7D5BB5E5@python.org>
	<ca471dc20710231236o2ec7c1e1h6a3449fe383e7153@mail.gmail.com>
Message-ID: <F7A7FEC9-092B-4087-84B9-63B84337BAE7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote:

> There's an issue in the email package that I can't resolve by myself.
> I described it to Barry like this:
>
>>> So in generator.py on line 291, I read:
>>>
>>>   print(part.get_payload(decode=True), file=self)
>>>
>>> It turns out that part.get_payload(decode=True) returns a bytes
>>> object, and printing a bytes object to a text file is not the right
>>> thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2
>>> it will probably print the repr() of the bytes object. Right now, it
>>> errors out because I'm removing the encode() method on PyString
>>> objects, and print() converts PyBytes to PyString; then the
>>> TextIOWrapper.write() method tries to encode its argument.
>>>
>>> If I change this to (decode=False), all tests in the email package
>>> pass. But is this the right fix???
>
> I should note that this was checked in by the time Barry replied, even
> though it clearly was the wrong thing to do. Barry replied:
>
>> Maybe. ;)  The problem is that this API is either being too smart for
>> its own good, or not smart enough.  The intent of decode=True is to
>> return the original object encoded in the payload.  So for example,
>> if MIMEImage was used to encode some jpeg, then decode=True should
>> return that jpeg.
>>
>> The problem is that what you really want is something that's content-
>> type aware, such that if your main type is some non-text type like
>> image/* or audio/* or even application/octet-stream, you will almost
>> always want a bytes object back.  But text can also be encoded via
>> charset and/or transfer-encoding, and (at least in Py2.x), you'd use
>> the same method to get the original, unencoded text back.  In that
>> case, you definitely want the string, since that's the most natural
>> API (i.e. you fed it a string object when you created the MIMEText,
>> so you want a string on the way back out).
>>
>> This is yet another corner case where the old API doesn't really fit
>> the new bytes/string model correctly, and of course you can
>> (rightly!) argue we were sloppy in Py2.x but were able to (mostly)
>> get away with it.
>>
>> In this /specific/ situation, generator.py:291 can only be called
>> when the main type is text, so I think it is clearly expecting a
>> string, even though .get_payload() will return a bytes there.
>>
>> Short of redesigning the API, I can think of two options.  First, we
>> can change .get_payload() to specifically return a string when the main
>> type is text and decode=True.  This is ugly because the return type
>> will depend on the content type of the message.  OTOH, get_payload()
>> is already fairly ugly here because its return type differs based on
>> its argument, although I'd like to split this into a
>> separate .get_decoded_payload() method.
>>
>> The other option is to let .get_payload() return bytes in all cases,
>> but in generator.py:291, explicitly convert it to a string, probably
>> using raw-unicode-escape.  Because we know the main type is text
>> here, we know that the payload must contain a string.  get_payload()
>> will return the bytes of the decoded unicode string, so raw-unicode-
>> escape should do the right thing.  That's ugly too for obvious reasons.
>>
>> The one thing that doesn't seem right is for decode=False to be used
>> because should the payload be an encoded string, it won't get
>> correctly decoded.  This is part of the DecodedGenerator, which
>> honestly is probably not much used outside the test cases.  But the
>> intent of that generator is clearly to print the decoded text parts
>> with the non-text parts stripped and replaced by a placeholder.  So I
>> think it definitely wants decoded text payloads, otherwise there's
>> not much point in the class.
>>
>> I hope that explains the situation.  I'm open to any other idea -- it
>> doesn't even have to be better. ;)  I see that you made the
>> decode=False change in svn, but that's the one solution that doesn't
>> seem right.
>
> At this point I (Guido) am really hoping someone will want to "own"
> this issue and redesign the API properly...

I'm really bummed that I've had no time to work on this.  Life and  
work have imposed.  I'd be willing to chat with someone about what I  
think should happen.  At this point irc or im might be best. :(

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRx5Qb3EjvBPtnXfVAQIcbwP9FPa/IJpIg+D2y/FJJp0LRqXctGhXUssi
aDX8M07pHu9aMPXKvDYZw50NFcyx87mMjWNVf2gX1KjM+U5XUns3WwtU+C60ZBSn
gEUmzAaYJVhDWguRiOpCX/bR1F2U8dudDR0UC8wrV9Mylk/C4b/q7bUdrGeT8riK
+oSTcaKTatY=
=98W1
-----END PGP SIGNATURE-----

From lists at cheimes.de  Tue Oct 23 21:56:57 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 23 Oct 2007 21:56:57 +0200
Subject: [Python-3000] Three new failing tests?
In-Reply-To: <ca471dc20710231230q2abaac94y55b68284b42fc8ba@mail.gmail.com>
References: <ca471dc20710231230q2abaac94y55b68284b42fc8ba@mail.gmail.com>
Message-ID: <ffljm9$gcd$1@ger.gmane.org>

Guido van Rossum wrote:
> I don't think Georg's latest checkin (PyInt_Check/PyLong_Check issues)
> broke these.

You are right. They were already broken at 10am CEST.



From luke.stebbing at gmail.com  Tue Oct 23 23:31:01 2007
From: luke.stebbing at gmail.com (Luke Stebbing)
Date: Tue, 23 Oct 2007 14:31:01 -0700
Subject: [Python-3000] Question about email/generator.py
In-Reply-To: <F7A7FEC9-092B-4087-84B9-63B84337BAE7@python.org>
References: <ca471dc20710101538s2cc9fdm8998734b16fb41b8@mail.gmail.com>
	<AC3AAF69-E18A-44A8-A860-92DA7D5BB5E5@python.org>
	<ca471dc20710231236o2ec7c1e1h6a3449fe383e7153@mail.gmail.com>
	<F7A7FEC9-092B-4087-84B9-63B84337BAE7@python.org>
Message-ID: <dcb1979a0710231431j42949189g2efc60d5a55d74ba@mail.gmail.com>

On 10/23/07, Barry Warsaw <barry at python.org> wrote:
> On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote:
> > At this point I (Guido) am really hoping someone will want to "own"
> > this issue and redesign the API properly...
>
> I'm really bummed that I've had no time to work on this.  Life and
> work have imposed.  I'd be willing to chat with someone about what I
> think should happen.  At this point irc or im might be best. :(

In py2k, you determine whether a payload is 'list of Message' or 'str'
by calling .is_multipart(). Maybe .is_str() and .is_bytes() methods
(or properties) could be added. Alternatively, there could be a
.payload_type property to test against.

Whatever it does, I think it should parallel the polymorphic structure
used by the new I/O [1]. Does that mean Message ABCs? Would that be
overkill?

I've been using the email package pretty heavily this year, and I'd be
up for talking about this on any of the im services or on freenode or
whatever.

Luke

[1] http://www.python.org/dev/peps/pep-3116

From guido at python.org  Fri Oct 26 01:33:13 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Oct 2007 16:33:13 -0700
Subject: [Python-3000] Need help with Windows failures
Message-ID: <ca471dc20710251633g21142a91w694baa3ba3c3073f@mail.gmail.com>

Hi Christian and Amaury (and anyone else with a Windows setup who
would like to help!),

I noticed that both of you are contributing fixes for Windows-specific
issues. Could I get your help with some other Windows issues?

See e.g. the failures on
http://www.python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/182/step-test/0

I'd be happy to give you some pointers on specific failures if you
can't figure out what might cause them. (To find the errors, scroll to
the end and then scroll up; or search for "Re-running failed tests in
verbose mode".) Please CC Neal Norwitz too; he may have some
suggestions as well.

Some random notes:

- It looks like there are some CRLF issues. Quite a few things
complain about mysterious syntax errors; I see some problems where \n
characters seem to appear or disappear.

- Most of the mailbox test failures seem due to a failed cleanup in
the second failing test (note how it prints FAIL and then ERROR --
that suggests the ERROR happened in the tearDown()).

- In general, whenever you see other failures mentioning things like
"The process cannot access the file because it is being used by
another process: '@test'" it's probably a failed test not properly
cleaning up; a lot of tests either don't always close files (could use
try/finally: f.close()) or don't remove them. (The best way to remove
files btw is typically test_support.remove().)
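
A minimal sketch of the cleanup pattern described in the last note, using
only the standard library. The class name and run_demo() are made up for the
example, and os.remove() stands in for the test-support helper.

```python
import io
import os
import tempfile
import unittest


class ScratchFileTest(unittest.TestCase):
    def setUp(self):
        # Use a private scratch file rather than a shared fixed name.
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def tearDown(self):
        # Runs even when the test body fails, so the file never leaks.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_roundtrip(self):
        f = open(self.path, "w")
        try:
            f.write("payload")
        finally:
            f.close()  # on Windows an open handle would block the removal
        f = open(self.path)
        try:
            self.assertEqual(f.read(), "payload")
        finally:
            f.close()


def run_demo():
    """Run the test case quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchFileTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Closing in a finally clause matters more on Windows than on Linux, because
Windows refuses to delete a file that still has an open handle.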

Thanks in advance to anyone who fixes a Windows bug in Py3k!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Fri Oct 26 05:31:28 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 25 Oct 2007 23:31:28 -0400
Subject: [Python-3000] Question about email/generator.py
In-Reply-To: <dcb1979a0710231431j42949189g2efc60d5a55d74ba@mail.gmail.com>
References: <ca471dc20710101538s2cc9fdm8998734b16fb41b8@mail.gmail.com>
	<AC3AAF69-E18A-44A8-A860-92DA7D5BB5E5@python.org>
	<ca471dc20710231236o2ec7c1e1h6a3449fe383e7153@mail.gmail.com>
	<F7A7FEC9-092B-4087-84B9-63B84337BAE7@python.org>
	<dcb1979a0710231431j42949189g2efc60d5a55d74ba@mail.gmail.com>
Message-ID: <A7A5E64F-B7C0-471A-89FF-9191C2600555@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 23, 2007, at 5:31 PM, Luke Stebbing wrote:

> On 10/23/07, Barry Warsaw <barry at python.org> wrote:
>> On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote:
>>> At this point I (Guido) am really hoping someone will want to "own"
>>> this issue and redesign the API properly...
>>
>> I'm really bummed that I've had no time to work on this.  Life and
>> work have imposed.  I'd be willing to chat with someone about what I
>> think should happen.  At this point irc or im might be best. :(
>
> In py2k, you determine whether a payload is 'list of Message' or 'str'
> by calling .is_multipart(). Maybe .is_str() and .is_bytes() methods
> (or properties) could be added. Alternatively, there could be a
> .payload_type property to test against.
>
> Whatever it does, I think it should parallel the polymorphic structure
> used by the new I/O [1]. Does that mean Message ABCs? Would that be
> overkill?
>
> I've been using the email package pretty heavily this year, and I'd be
> up for talking about this on any of the im services or on freenode or
> whatever.

Hi Luke,

I'm actually thinking something along the lines of  
changing .get_payload() to only return the raw payload when the  
content type is scalar.  For non-scalar types (i.e. multiparts),  
you'd get an exception if you tried to use .get_payload().  I'd also  
separate out getting the raw payload and getting the decoded payload  
into separate APIs, either by adding a new .get_decoded_payload() or  
having .get_payload() return a Payload object that knows how to  
decode itself (and return its content type).

Can you talk more about how you think the polymorphism would work?  I  
don't immediately see a parallel, and yeah, I kind of do think that  
message ABCs are overkill (I'd love for whatever we come up with to  
be backward compatible with Python 2.x if at all possible).  The fact  
that .get_decoded_payload() could return bytes or strings is  
bothersome though, so if you have some thoughts about how to do that  
more cleanly, I'm all ears.
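
A rough sketch of the split, written as free functions on top of the
existing Message API as it behaves in a current Python 3 interpreter. Both
function names are hypothetical, and returning str for text/* parts is just
one of the possible answers to the bytes-or-strings question.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def get_raw_payload(msg):
    """Return the still-encoded payload of a scalar (non-multipart) part."""
    if msg.is_multipart():
        raise TypeError("multipart messages have no scalar payload")
    return msg.get_payload()


def get_decoded_payload(msg):
    """Return str for text/* parts and bytes for everything else."""
    if msg.is_multipart():
        raise TypeError("multipart messages have no scalar payload")
    data = msg.get_payload(decode=True)   # transfer encoding undone: bytes
    if msg.get_content_maintype() == "text":
        return data.decode(msg.get_content_charset() or "ascii")
    return data


# Small fixtures for trying the two helpers out.
text_part = MIMEText("hello", "plain", "utf-8")
binary_part = MIMEApplication(b"\x00\x01\x02")
container = MIMEMultipart()
container.attach(text_part)
```

The return type still depends on the content type here, which is exactly
the wart discussed above; a Payload object would move that decision to the
caller.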

Definitely ping me on freenode (nick: barry) any time during working  
hours EST if you want to chat.  I'm almost always on, hanging out in  
#mailman and #launchpad, though that shouldn't matter if you want to  
pvtmsg me.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRyFfkHEjvBPtnXfVAQJA0gP+PsEBOhInn5ReEwGgD9BeDg12VFkVrDET
UTZlbPpBhDISNvByfvHJXSJMnO1XCmUniA4a7sQ0PHEjdEMHSFY6NKT3BtVRg4yh
WoDIEVs8WIOn2k+tHb2E0SDPQTNtnyA2FbG8CGq27wvGxbd3C61ytylgKofP+0A8
oJHW6atRW7g=
=fNGw
-----END PGP SIGNATURE-----

From christian at cheimes.de  Fri Oct 26 01:43:12 2007
From: christian at cheimes.de (Christian Heimes)
Date: Fri, 26 Oct 2007 01:43:12 +0200
Subject: [Python-3000] Need help with Windows failures
In-Reply-To: <ca471dc20710251633g21142a91w694baa3ba3c3073f@mail.gmail.com>
References: <ca471dc20710251633g21142a91w694baa3ba3c3073f@mail.gmail.com>
Message-ID: <47212A10.50506@cheimes.de>

Guido van Rossum wrote:
> Hi Christian and Amaury (and anyone else with a Windows setup who
> would like to help!),
> 
> I noticed that both of you are contributing fixes for Windows-specific
> issues. Could I get your help with some other Windows issues?

Yes, I've set up a VMWare Win XP instance on my Linux box for Python 3.0
and PythonDotNet.

> - Most of the mailbox test failures seem due to a failed cleanup in
> the second failing test (note how it prints FAIL and then ERROR --
> that suggests the ERROR happened in the tearDown()).
> 
> - In general, whenever you see other failures mentioning things like
> "The process cannot access the file because it is being used by
> another process: '@test'" it's probably a failed test not properly
> cleaning up; a lot of tests either don't always close files (could use
> try/finally: f.close()) or don't remove them. (The best way to remove
> files btw is typically test_support.remove().)

I'm going to look into the mailbox and @test problems.

Christian


From shiblon at gmail.com  Fri Oct 26 15:48:28 2007
From: shiblon at gmail.com (Chris Monson)
Date: Fri, 26 Oct 2007 09:48:28 -0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <200710260855.52649.mark@qtrac.eu>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
Message-ID: <da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>

Forwarding to the group for discussion.

On 10/26/07, Mark Summerfield wrote:

There is one thing about this PEP I don't like:

    The available integer presentation types are:

        'd' - Decimal Integer. Outputs the number in base 10.

I think this is confusing (since this will not print a decimal.Decimal
object), and is a throwback to early versions of C. Modern C now has
'i' as an alternative to 'd' and I wish this PEP would use 'i' for
integer rather than the contrived 'd' for "decimal" integer (which
sounds like a contradiction because most people expect decimals to have
fractional parts). I guess if 'd' is too late to change then one
"solution" would be:

        'd' - Denary Integer. Outputs the number in base 10.

because at least that fits with octal and hex.

--
Mark Summerfield, Qtrac Ltd., www.qtrac.eu

From phd at phd.pp.ru  Fri Oct 26 16:20:36 2007
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 26 Oct 2007 18:20:36 +0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
Message-ID: <20071026142036.GB3365@phd.pp.ru>

On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote:
>         'd' - Decimal Integer. Outputs the number in base 10.
[skip]
>         'd' - Denary Integer. Outputs the number in base 10.

   -1. I know what "decimal integers" are, but never heard about "denary"
(my spellchecker complains, too).

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Fri Oct 26 16:24:43 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 07:24:43 -0700
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <20071026142036.GB3365@phd.pp.ru>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
Message-ID: <ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>

2007/10/26, Oleg Broytmann <phd at phd.pp.ru>:
> On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote:
[quoting Mark Summerfield]
> >         'd' - Decimal Integer. Outputs the number in base 10.
> [skip]
> >         'd' - Denary Integer. Outputs the number in base 10.
>
>    -1. I know what "decimal integers" are, but never heard about "denary"
> (my spellchecker complains, too).

-1 indeed. What's wrong with binary, octal, decimal, hexadecimal?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mark at qtrac.eu  Fri Oct 26 16:40:55 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Fri, 26 Oct 2007 15:40:55 +0100
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
Message-ID: <200710261540.55302.mark@qtrac.eu>

On 2007-10-26, Guido van Rossum wrote:
> 2007/10/26, Oleg Broytmann <phd at phd.pp.ru>:
> > On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote:
>
> [quoting Mark Summerfield]
>
> > >         'd' - Decimal Integer. Outputs the number in base 10.
> >
> > [skip]
> >
> > >         'd' - Denary Integer. Outputs the number in base 10.
> >
> >    -1. I know what "decimal integers" are, but never heard about "denary"
> > (my spellchecker complains, too).

http://www.thefreedictionary.com/denary

> -1 indeed. What's wrong with binary, octal, decimal, hexadecimal?

If it was logical it would be 'b', 'o', 'd', 'h', not 'b', 'o', 'd', 'x'.

Why use x rather than h for hexadecimal? Because it is an established
convention. Of course 'd' is an established convention too, but in the
end the C standard adopted 'i' as an alternative because people _expect_
an 'i' to be there and to mean integer. (Surely it is only old C
programmers who learnt C before 'i' was available who use 'd' these days.)

And decimal may lead people new to Python to think decimal.Decimal is
intended, or at least that a decimal number (i.e., one with a fractional
part) is expected.

I think the right solution is to use

    'i' - Integer. Outputs the number in base 10.

because I think people assume base 10 for integers unless told
otherwise, whereas "decimal" is ambiguous: is it a base 10 integer or a
decimal floating-point number?

Both C and C++ accept both 'i' and 'd' (and I think accepting both is
fine although that goes against TOOWTDI), but having to use 'd' somehow
seems like a retrograde step reminding me of when I started programming
in C many years ago---something I thought I'd escaped:-)

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu


From phd at phd.pp.ru  Fri Oct 26 16:53:36 2007
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 26 Oct 2007 18:53:36 +0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <200710261540.55302.mark@qtrac.eu>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu>
Message-ID: <20071026145336.GA5139@phd.pp.ru>

On Fri, Oct 26, 2007 at 03:40:55PM +0100, Mark Summerfield wrote:
> http://www.thefreedictionary.com/denary

   No need to use a word I have to look up in a dictionary when "decimal" is
so widely used.
   The article says "decimal" is a synonym. What is the point of using an
unknown synonym instead of a well-known word?
   Still -1 from me.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From larry at hastings.org  Fri Oct 26 19:11:45 2007
From: larry at hastings.org (Larry Hastings)
Date: Fri, 26 Oct 2007 10:11:45 -0700
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <20071026145336.GA5139@phd.pp.ru>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>	<20071026142036.GB3365@phd.pp.ru>	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>	<200710261540.55302.mark@qtrac.eu>
	<20071026145336.GA5139@phd.pp.ru>
Message-ID: <47221FD1.3080802@hastings.org>

Oleg Broytmann wrote:
> The article says "decimal" is a synonym. What is the point of using an
> unknown synonym instead of a well-known word?

His point is that Python has a fixed-point number type called "Decimal", 
and that this will lead to confusion.  I can see his point, but we all 
know from years of C programming that "%d" takes an int and formats it 
in base 10--there is no confusion about this.  Indeed, I suspect 
describing this as "denary" would lead to far more confusion, and using 
the format character "d" to take a Decimal object instead of an int 
would lead to widespread panic and mayhem.  So -0.5 from me.


/larry/

From jimjjewett at gmail.com  Fri Oct 26 21:14:56 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 26 Oct 2007 15:14:56 -0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <47221FD1.3080802@hastings.org>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru>
	<47221FD1.3080802@hastings.org>
Message-ID: <fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>

On 10/26/07, Larry Hastings <larry at hastings.org> wrote:
>  His point is that Python has a fixed-point number type called "Decimal",
> and that this will lead to confusion.  I can see his point, but we all know
> from years of C programming that "%d" takes an int and formats it in base
> 10--there is no confusion about this.

Sure there is.  C isn't the only language where I've used it, but I
still sometimes have to look up whether 'd' is "decimal" or "double".
I've found bugs in C where someone else just assumed it was "double".
If it weren't for backwards compatibility, 'i' would be a much better
option, and saving 'd' for an actual Decimal (which might have a
decimal point) would be good.

http://docs.python.org/lib/typesseq-strings.html already allows both.
The question is whether repurposing 'd' would break too much.

That said, I think a Decimal that happens to be an integer probably
*should* print differently from an integer, because the precision is
an important part of a Decimal, and won't always fall conveniently at
the decimal point.
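
The distinction drawn above can be sketched with the decimal module and the
format() builtin as they eventually shipped in Python 3. This is only an
illustration; the variable names are made up for the example, and it shows
str() of a Decimal rather than any 'd' format code applied to one.

```python
from decimal import Decimal

as_int = 3
as_decimal = Decimal("3.00")  # integer-valued, but stored with two decimal places

int_text = format(as_int, "d")  # 'd' presentation type: base-10 integer
decimal_text = str(as_decimal)  # str() keeps the stored trailing zeros

print(int_text)               # 3
print(decimal_text)           # 3.00
print(as_decimal == as_int)   # True: equal in value, different when printed
```

The two values compare equal, yet the Decimal prints with its precision
intact, which is the difference the paragraph above argues should stay visible.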

-jJ

From guido at python.org  Fri Oct 26 21:18:13 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 12:18:13 -0700
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
Message-ID: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>

2007/10/19, Guido van Rossum <guido at python.org>:
> On 10/7/07, Guido van Rossum <guido at python.org> wrote:
> > I'd like to make complete implementation of PEP 3137 the goal for the
> > 3.0a2 release. It should be doable to do this release by the end of
> > October. I don't think anything else *needs* to be done to have a
> > successful a2 release.
>
> I'm still hopeful, though realistically we may not quite make it.

Mid November sounds more like it.

Below is a full status update; here's a short list of the
tasks that remain to be done:

- remove compatibility with PyString from PyUnicode
- change lots of places (e.g. encoders) to return PyString instead of PyBytes
- change PyString's repr() to return "b'...'" (1)
- change PyBytes's repr() to return "buffer(b'...')" (1)
- change parser so that b"..." returns PyString, not PyBytes (1)
- rename bytes -> buffer, str8 -> bytes (1)
- change PyBytes so that its str() is the same as its repr().
- change PyString so that its str() is the same as its repr().

(1) see http://bugs.python.org/issue1247

I'll be working on all of these together; they're hard to separate out.
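
For orientation, here is a small sketch of the Python-level behaviour these
tasks aim at, written against Python 3 as it was eventually released. One
assumption to note: in the released language the mutable type that this
list calls "buffer" ended up being named bytearray, so its repr differs from
the "buffer(b'...')" spelling planned here.

```python
data = b"abc"                    # b"..." literal: the immutable bytes type

first = data[0]                  # indexing yields an int, not a 1-char string
data_repr = repr(data)           # "b'abc'"
data_str = str(data)             # same as the repr
same_as_text = (data == "abc")   # False instead of raising TypeError

mutable = bytearray(data)        # the mutable counterpart
mutable.append(0x64)             # in-place modification is allowed
mutable_repr = repr(mutable)     # "bytearray(b'abcd')"

print(first, data_repr, same_as_text, mutable_repr)
```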

Here's the full list:

> > - remove locale support from PyString
>
> Done.
>
> > - remove compatibility with PyUnicode from PyString

Done.

> > - remove compatibility with PyString from PyUnicode
>
> Not done yet.
>
> > - add missing methods to PyBytes (for list, see the PEP and compare to
> > what's already there)
>
> Done (Gregory P. Smith)
>
> > - remove buffer API from PyUnicode
>
> Done.
>
> > - make == and != between PyBytes and PyUnicode return False instead of
> > raising TypeError
>
> Done.
>
> > - make == and != between PyString and PyUnicode return False instead
> > of converting
>
> A patch by Thomas Lee exists: http://bugs.python.org/issue1263
> However it breaks some unit tests.

Done.

> > - make comparisons between PyString and PyBytes work (these are
> > properly ordered)
>
> Already works.
>
> > - change lots of places (e.g. encoders) to return PyString instead of PyBytes
>
> Not done.
>
> > - change indexing and iteration over PyString to return ints, not
> > 1-char PyStrings
>
> A patch by Alexandre Vassalotti exists but breaks some unit tests:
> http://bugs.python.org/issue1280

Done.

> > - change PyString's repr() to return "b'...'"
> > - change PyBytes's repr() to return "buffer(b'...')"
> > - change parser so that b"..." returns PyString, not PyBytes
> > - rename bytes -> buffer, str8 -> bytes
>
> A patch by Alexandre Vassalotti and Christian Heimes exists for these 4 items:
> http://bugs.python.org/issue1247
> However it breaks too many tests to be applied right now.

Still pending.

> > - change the constructor for PyString to match the one for PyBytes
>
> Not done.

Done.

> > - change PyBytes so that its str() is the same as its repr().
> > - change PyString so that its str() is the same as its repr().
>
> Not done.
>
> > - add an iteration view over PyBytes (optional)
>
> Not yet done (Christian Heimes offered).

Done.

> > - kill basestring.
>
> Done (Christian Heimes).
>
> > - move initialization of sys.std{in,out,err} into C code and do it earlier.
>
> A patch by Christian Heimes exists: http://bugs.python.org/issue1267
> However it still breaks some unit tests...

Done.

> There are also some issues that mainly crop up in non-English locales.
> We will try to get to the bottom of those before releasing 3.0a2, but
> I need help as I'm myself absolutely unable to work with locales (and
> I don't have access to a Windows box).

I think Christian and a few others are making progress here.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Fri Oct 26 21:18:19 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 12:18:19 PDT
Subject: [Python-3000] 3K bytes I/O?
Message-ID: <07Oct26.121820pdt."57996"@synergy1.parc.xerox.com>

I'm looking at the Py3K SSL code, and have a question:

What's the upshot of the bytes/string decisions in the C world?  Is
PyString_* now all about immutable bytes, and PyUnicode_* about
strings?  There still seem to be a lot of encode/decode methods in
stringobject.h, operations which I'd expect to be in unicodeobject.h.

Bill


From guido at python.org  Fri Oct 26 21:26:17 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 12:26:17 -0700
Subject: [Python-3000] 3K bytes I/O?
In-Reply-To: <-912240280709553237@unknownmsgid>
References: <-912240280709553237@unknownmsgid>
Message-ID: <ca471dc20710261226r5eea7cdax70aca984e1529beb@mail.gmail.com>

2007/10/26, Bill Janssen <janssen at parc.com>:
> I'm looking at the Py3K SSL code, and have a question:
>
> What's the upshot of the bytes/string decisions in the C world?  Is
> PyString_* now all about immutable bytes, and PyUnicode_* about
> strings?  There still seem to be a lot of encode/decode methods in
> stringobject.h, operations which I'd expect to be in unicodeobject.h.

I think the PyString encode/decode APIs should all go; use the
corresponding PyUnicode ones.

I recommend that you write your code to assume PyBytes for
encoded/binary data, and PyUnicode for text; at some point we'll
substitute PyString for most cases where PyBytes is currently used:
that will happen once PyString is called bytes at the Python level,
and PyBytes will be called buffer. But that's still a while off.
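
At the Python level, the same split (bytes for encoded or binary data, str
for text) looks roughly like this in Python 3 as released. It is a sketch
rather than anything from the SSL code, and the sample string is invented for
the example.

```python
text = "caf\u00e9"                  # text lives in str (Unicode)
encoded = text.encode("utf-8")      # str -> bytes, e.g. before writing to a socket
decoded = encoded.decode("utf-8")   # bytes -> str, e.g. after reading from one

# The conversion methods are one-directional on each type:
bytes_can_encode = hasattr(encoded, "encode")   # False
str_can_decode = hasattr(text, "decode")        # False

print(encoded, decoded == text, bytes_can_encode, str_can_decode)
```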

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de  Fri Oct 26 23:26:19 2007
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 26 Oct 2007 23:26:19 +0200
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
Message-ID: <47225B7B.8020206@cheimes.de>

Guido van Rossum wrote:
> Mid November sounds more like it.
> 
> Below is a full updated status update; here's a short list of the
> tasks that remain to be done:
> 
> - remove compatibility with PyString from PyUnicode
> - change lots of places (e.g. encoders) to return PyString instead of PyBytes
> - change PyString's repr() to return "b'...'" (1)
> - change PyBytes's repr() to return "buffer(b'...')" (1)
> - change parser so that b"..." returns PyString, not PyBytes (1)
> - rename bytes -> buffer, str8 -> bytes (1)
> - change PyBytes so that its str() is the same as its repr().
> - change PyString so that its str() is the same as its repr().
> 
> (1) see http://bugs.python.org/issue1247
> 
> I'll be working on all of these together; they're hard to separate out.

I suggest that you create a branch for the transition period. It will
take at least several days to kick and drag everything in place. We can
work on the transition while the rest can play with a working py3k branch.

>> There are also some issues that mainly crop up in non-English locales.
>> We will try to get to the bottom of those before releasing 3.0a2, but
>> I need help as I'm myself absolutely unable to work with locales (and
>> I don't have access to a Windows box).
> 
> I think Christian and a few others are making progress here.

I think that I have found and fixed the last bit of problematic code in
the time module several days ago. I don't get any locale-related errors
on my German Windows installation anymore. I would like people
with other locales to test py3k on Windows. In particular I'm interested
in tests with more "exotic" locales like non-Latin alphabets (Greek,
Russian), Arabic and Asian locales.

Christian

From guido at python.org  Fri Oct 26 23:52:58 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 14:52:58 -0700
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <47225B7B.8020206@cheimes.de>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
	<47225B7B.8020206@cheimes.de>
Message-ID: <ca471dc20710261452s73e64dfbv8e2daaf999310172@mail.gmail.com>

2007/10/26, Christian Heimes <lists at cheimes.de>:
> I suggest that you create a branch for the transition period. It will
> take at least several days to kick and drag everything in place. We can
> work on the transition while the rest can play with a working py3k branch.

Thanks for the suggestion -- I'm now working in a new branch, py3k-pep3137.

> >> There are also some issues that mainly crop up in non-English locales.
> >> We will try to get to the bottom of those before releasing 3.0a2, but
> >> I need help as I'm myself absolutely unable to work with locales (and
> >> I don't have access to a Windows box).
> >
> > I think Christian and a few others are making progress here.
>
> I think that I have found and fixed the last bit of problematic code in
> the time module several days ago. I don't get any locale-related errors
> on my German Windows installation anymore. I would like people
> with other locales to test py3k on Windows. In particular I'm interested
> in tests with more "exotic" locales like non-Latin alphabets (Greek,
> Russian), Arabic and Asian locales.

Also check the 3.0 buildbots: http://www.python.org/dev/buildbot/3.0/

I still see a lot of failing tests there... E.g.
http://www.python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/191/step-test/0

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Sat Oct 27 00:06:32 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Oct 2007 11:06:32 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
Message-ID: <472264E8.9060205@canterbury.ac.nz>

Chris Monson wrote:
>         'd' - Decimal Integer. Outputs the number in base 10.
> 
> Modern C now has
> 'i' as an alternative to 'd'

Considering that in printf formats the alternatives to
'd' or 'i' are 'x' for hexadecimal and 'o' for octal,
then 'd' for decimal makes a lot more sense to me than
'i', which says nothing about the base in which it
will be displayed.

It makes even more sense in Python, where the format
codes are clearly all about the display format and
nothing to do with the type of data (whereas the
distinction is somewhat blurred in C).
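
A quick illustration of that point, using the format() builtin and
str.format() as they eventually shipped: one and the same int is rendered
four ways, and only the display changes. The names below are chosen just for
this sketch.

```python
value = 255

# Same integer, four presentation types: binary, octal, decimal, hexadecimal.
rendered = {code: format(value, code) for code in "bodx"}
print(rendered)   # {'b': '11111111', 'o': '377', 'd': '255', 'x': 'ff'}

# The same codes inside a PEP 3101 style format string.
line = "{0:d} {0:x} {0:o} {0:b}".format(value)
print(line)       # 255 ff 377 11111111
```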

 > most people expect decimals to have fractional parts).

Then their expectations require adjustment. "Decimal"
means "base 10". On its own it doesn't imply anything
about fractions.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Oct 27 00:42:01 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Oct 2007 11:42:01 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru>
	<47221FD1.3080802@hastings.org>
	<fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>
Message-ID: <47226D39.2020002@canterbury.ac.nz>

Jim Jewett wrote:
> If it weren't for backwards compatibility, 'i' would be a much better
> option,

No, it wouldn't, because 'integer' is a data type, not
a display format. The Python format codes specify display
formats, not data types.

--
Greg

From janssen at parc.com  Sat Oct 27 01:07:26 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 16:07:26 PDT
Subject: [Python-3000] base64.{decode,encode}string
Message-ID: <07Oct26.160735pdt."57996"@synergy1.parc.xerox.com>

I think encodestring() should return a string, not bytes, and
decodestring() should take either a string, or bytes containing an
ASCII-encoded string.  Otherwise, every place they'll ever be
used has to wrap an additional unicode/encode step around their
use.

Bill

From guido at python.org  Sat Oct 27 01:33:02 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 16:33:02 -0700
Subject: [Python-3000] base64.{decode,encode}string
In-Reply-To: <-1648744909719026234@unknownmsgid>
References: <-1648744909719026234@unknownmsgid>
Message-ID: <ca471dc20710261633v723fc05al93f36b5fc0ff3b89@mail.gmail.com>

2007/10/26, Bill Janssen <janssen at parc.com>:
> I think encodestring() should return a string, not bytes, and
> decodestring() should take either a string, or bytes containing an
> ASCII-encoded string.  Otherwise, every place they'll ever be
> used has to wrap an additional unicode/encode step around their
> use.

I'm okay with being flexible on input. I think there ought to be
separate functions returning bytes and str.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Sat Oct 27 02:24:47 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 17:24:47 PDT
Subject: [Python-3000] base64.{decode,encode}string
In-Reply-To: <ca471dc20710261633v723fc05al93f36b5fc0ff3b89@mail.gmail.com> 
References: <-1648744909719026234@unknownmsgid>
	<ca471dc20710261633v723fc05al93f36b5fc0ff3b89@mail.gmail.com>
Message-ID: <07Oct26.172456pdt."57996"@synergy1.parc.xerox.com>

> 2007/10/26, Bill Janssen <janssen at parc.com>:
> > I think encodestring() should return a string, not bytes, and
> > decodestring() should take either a string, or bytes containing an
> > ASCII-encoded string.  Otherwise, every place they'll ever be
> > used has to wrap an additional unicode/encode step around their
> > use.
> 
> I'm okay with being flexible on input. I think there ought to be
> separate functions returning bytes and str.

I'm fine with that, too.  I just think that the purpose of
standard_b64encode() is to take bytes and produce text, and the
purpose of standard_b64decode() is to take text and produce bytes.
But if we want to add encodestring_to_ascii(), to take bytes and
produce ASCII base64-encoded bytes, and decodestring_from_ascii(), to
take an ASCII-encoded string as bytes, and produce bytes, that's OK
with me.  But it seems odd.
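
For comparison, this is how the question plays out with the base64 module in
current Python 3 (a sketch of released behaviour, not of anything decided in
this thread): encoding returns bytes, an explicit ASCII decode gives text, and
decoding accepts either an ASCII str or bytes.

```python
import base64

payload = b"\x00\x01binary\xff"

encoded_bytes = base64.b64encode(payload)      # bytes in, ASCII bytes out
encoded_text = encoded_bytes.decode("ascii")   # the extra step to get a str

# Decoding is flexible on input: an ASCII str and bytes both work.
from_text = base64.b64decode(encoded_text)
from_bytes = base64.b64decode(encoded_bytes)

print(type(encoded_bytes).__name__, type(encoded_text).__name__)
print(from_text == payload, from_bytes == payload)
```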

Bill



From stephen at xemacs.org  Sat Oct 27 02:34:32 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 27 Oct 2007 09:34:32 +0900
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <472264E8.9060205@canterbury.ac.nz>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<472264E8.9060205@canterbury.ac.nz>
Message-ID: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 >  > most people expect decimals to have fractional parts).
 > 
 > Then their expectations require adjustment. "Decimal"
 > means "base 10". On its own it doesn't imply anything
 > about fractions.

"Decimal point" notwithstanding, I guess.

Getting "them" to change their expectations is a losing battle.

From janssen at parc.com  Sat Oct 27 02:37:15 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 17:37:15 PDT
Subject: [Python-3000] passing bytes buffers to C with NUL characters in
	them?
Message-ID: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com>

I'm not sure what to use in PyArg_ParseTuple in 3K.  I'm passing in
bytes which may contain NUL characters.  Using 's#' doesn't really
work, because it erroneously accepts Unicode strings.

Bill

From guido at python.org  Sat Oct 27 02:42:42 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 17:42:42 -0700
Subject: [Python-3000] passing bytes buffers to C with NUL characters in
	them?
In-Reply-To: <-6189415837657270969@unknownmsgid>
References: <-6189415837657270969@unknownmsgid>
Message-ID: <ca471dc20710261742w26a062b7l21946074f02d854f@mail.gmail.com>

2007/10/26, Bill Janssen <janssen at parc.com>:
> I'm not sure what to use in PyArg_ParseTuple in 3K.  I'm passing in
> bytes which may contain NUL characters.  Using 's#' doesn't really
> work, because it erroneously accepts Unicode strings.

Use y# I think.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Sat Oct 27 02:42:29 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 17:42:29 PDT
Subject: [Python-3000] passing bytes buffers to C with NUL characters in
	them?
In-Reply-To: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com> 
References: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <07Oct26.174236pdt."57996"@synergy1.parc.xerox.com>

> I'm not sure what to use in PyArg_ParseTuple in 3K.  I'm passing in
> bytes which may contain NUL characters.  Using 's#' doesn't really
> work, because it erroneously accepts Unicode strings.

Ah, sorry, found it.  "y#".

Bill

From guido at python.org  Sat Oct 27 02:44:49 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 26 Oct 2007 17:44:49 -0700
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<472264E8.9060205@canterbury.ac.nz>
	<87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <ca471dc20710261744r4d298a90neadfac410cadbeec@mail.gmail.com>

2007/10/26, Stephen J. Turnbull <stephen at xemacs.org>:
> Greg Ewing writes:
>
>  >  > most people expect decimals to have fractional parts).
>  >
>  > Then their expectations require adjustment. "Decimal"
>  > means "base 10". On its own it doesn't imply anything
>  > about fractions.
>
> "Decimal point" notwithstanding, I guess.
>
> Getting "them" to change their expectations is a losing battle.

However, none of the participants in this discussion are "most people"
and I can't recall ever hearing about Python and programming newbies
who had trouble with %d. So what "their" expectations are is a matter
of pure speculation.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Sat Oct 27 02:45:02 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 17:45:02 PDT
Subject: [Python-3000] plat-mac seriously broken?
Message-ID: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>

I found that an SSL test was failing on 3K because of the following:

Traceback (most recent call last):
  File "/local/python/3k/src/Lib/test/test_ssl.py", line 818, in testAsyncore
    f = urllib.urlopen(url)
  File "/local/python/3k/src/Lib/urllib.py", line 75, in urlopen
    opener = FancyURLopener()
  File "/local/python/3k/src/Lib/urllib.py", line 553, in __init__
    URLopener.__init__(self, *args, **kwargs)
  File "/local/python/3k/src/Lib/urllib.py", line 124, in __init__
    proxies = getproxies()
  File "/local/python/3k/src/Lib/urllib.py", line 1278, in getproxies
    return getproxies_environment() or getproxies_internetconfig()
  File "/local/python/3k/src/Lib/urllib.py", line 1263, in getproxies_internetconfig
    if 'UseHTTPProxy' in config and config['UseHTTPProxy']:
  File "/local/python/3k/src/Lib/plat-mac/ic.py", line 187, in __getitem__
    return _decode(self.h.data, key)
  File "/local/python/3k/src/Lib/plat-mac/ic.py", line 144, in _decode
    return decoder(data, key)
  File "/local/python/3k/src/Lib/plat-mac/ic.py", line 68, in _decode_boolean
    return ord(data[0])
TypeError: ord() expected string of length 1, but int found

All of the modules in plat-mac are full of this kind of stuff.
Someone needs to run 2to3 over them, I think.  Or maybe ord(int)
should just return the int.
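
A minimal reproduction of the failure and one possible fix, assuming the data
passed in is a bytes object. decode_boolean below is a hypothetical stand-in
written for this example, not the actual code from plat-mac/ic.py.

```python
data = b"\x01\x00"

# Indexing bytes already yields an int in 3K, so the old ord() call fails.
try:
    ord(data[0])
    old_style_failed = False
except TypeError:
    old_style_failed = True


def decode_boolean(data):
    """Hypothetical replacement: return the first byte as an int."""
    return data[0]   # already an int; no ord() needed


print(old_style_failed)       # True
print(decode_boolean(data))   # 1
```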

Bill

From janssen at parc.com  Sat Oct 27 04:33:32 2007
From: janssen at parc.com (Bill Janssen)
Date: Fri, 26 Oct 2007 19:33:32 PDT
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com> 
References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <07Oct26.193341pdt."57996"@synergy1.parc.xerox.com>

> All of the modules in plat-mac are full of this kind of stuff.
> Someone needs to run 2to3 over them, I think.

Actually, after looking at the code a bit more, I think 1to3 would be
more appropriate. :-)

Bill

From stephen at xemacs.org  Sat Oct 27 06:44:02 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 27 Oct 2007 13:44:02 +0900
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <ca471dc20710261744r4d298a90neadfac410cadbeec@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<472264E8.9060205@canterbury.ac.nz>
	<87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20710261744r4d298a90neadfac410cadbeec@mail.gmail.com>
Message-ID: <87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > I can't recall ever hearing about Python and programming newbies
 > who had trouble with %d.

OK.  I think Greg's basic point is correct, I just (over?)reacted to
the suggestion that *if* people do have trouble, telling them to
change expectations will have a useful effect.  Emacs advocates do
that *far* too much, and the only effect it has that I know of is to
increase the ranks of vi users.  (Does perl have %i?<wink>)

From greg.ewing at canterbury.ac.nz  Sat Oct 27 08:33:10 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Oct 2007 19:33:10 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<472264E8.9060205@canterbury.ac.nz>
	<87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4722DBA6.4010005@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> Greg Ewing writes:
> 
>  "Decimal"
>  > means "base 10". On its own it doesn't imply anything
>  > about fractions.
> 
> "Decimal point" notwithstanding, I guess.

That's not "decimal" on its own -- it includes the
word "point", which is what tells you that you're
(potentially) dealing with fractions.

--
Greg

From greg.ewing at canterbury.ac.nz  Sat Oct 27 08:49:01 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Oct 2007 19:49:01 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<200710260855.52649.mark@qtrac.eu>
	<da3f900e0710260648m13e0bj576eea24c1f1ece@mail.gmail.com>
	<472264E8.9060205@canterbury.ac.nz>
	<87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20710261744r4d298a90neadfac410cadbeec@mail.gmail.com>
	<87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4722DF5D.80402@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> I just (over?)reacted to
> the suggestion that *if* people do have trouble, telling them to
> change expectations will have a useful effect.

I wasn't really suggesting that they change their expectations,
only that we shouldn't use such expectations as a basis for
deciding what to do.

--
Greg

From lists at cheimes.de  Sat Oct 27 16:23:08 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 27 Oct 2007 16:23:08 +0200
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
Message-ID: <472349CC.4050103@cheimes.de>

Guido van Rossum wrote:
>> There are also some issues that mainly crop up in non-English locales.
>> We will try to get to the bottom of those before releasing 3.0a2, but
>> I need help as I'm myself absolutely unable to work with locales (and
>> I don't have access to a Windows box).
> 
> I think Christian and a few others are making progress here.

I've hit another wall of bricks on Windows. It's not possible to run
Python from a directory with non-ASCII characters:
http://bugs.python.org/issue1342. I have a patch that reduces the problem
from a segfault to an unrecoverable import error. The remaining problem
seems to lie deep in PC/getpathp.c:Py_GetPath(). It seems that it can't
handle non-ASCII chars correctly.

The second line is a fprintf(stderr, "%s\n", char *path). Do you see the
difference between "test???" and "test???"?

c:\test???\PCBuild8\win32release>python
c:\test???\PCBuild8\win32release\python30.zip;c:\test???\DLLs;c:\
test???\lib;c:\test???\lib\plat-win;c:\test???\lib\lib-tk;c:\test???\PCBuild8\wi
n32release
Fatal Python error: Py_Initialize: can't initialize sys
standard streams
object  : ImportError('No module named encodings.utf_8',)
type    : ImportError
refcount: 4
address : 00A43540
lost sys.stderr

Christian

From skip at pobox.com  Sat Oct 27 20:25:46 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 27 Oct 2007 13:25:46 -0500
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <18211.33450.332197.304601@montanaro.dyndns.org>


    Bill> I found that an SSL test was failing on 3K because of the following:
    ...
    Bill> All of the modules in plat-mac are full of this kind of stuff.

ISTR much of the plat-mac stuff was generated by Tools/bgen.  If so, that
would be the place to fix things.

Skip

From janssen at parc.com  Sun Oct 28 00:03:08 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 15:03:08 PDT
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <18211.33450.332197.304601@montanaro.dyndns.org> 
References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
	<18211.33450.332197.304601@montanaro.dyndns.org>
Message-ID: <07Oct27.150317pdt."57996"@synergy1.parc.xerox.com>

> ISTR much of the plat-mac stuff was generated by Tools/bgen.  If so, that
> would be the place to fix things.

Sure looks like generated code.  Be nice if that generator was run
during the build process, on OS X.  That way you'd be sure to get code
that matches the platform and codebase.

Bill

From janssen at parc.com  Sun Oct 28 00:27:08 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 15:27:08 PDT
Subject: [Python-3000] Odd output from test -- buffering bug?
Message-ID: <07Oct27.152712pdt."57996"@synergy1.parc.xerox.com>

I'm seeing a sort of odd thing going on when running one of my tests.
I'm seeing two lines of output, from two different threads, being
duplicated when I run with "regrtest -u all -v test_ssl".  This is
with the latest 3K sources on PPC OS X 10.4.10.

testSTARTTLS (test.test_ssl.ThreadedTests) ... 
 client:  sending b'msg 1'...
^@client:  sending b'msg 1'...
 server:  new connection from ('127.0.0.1', 52371)
 server:  new connection from ('127.0.0.1', 52371)

This is output to an Emacs shell buffer, so it shows control
characters in the output, and I'm seeing a NUL character being output
there at the beginning of the third line.  Both of the duplicated lines
are being output with code like this:

    if test_support.verbose:
        sys.stdout.write(
            " client:  sending %s...\n" % repr(msg))

This looks like some kind of buffering bug.  Is it in the test
harness, or the standard I/O library?

Bill


From janssen at parc.com  Sun Oct 28 02:11:06 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 18:11:06 PDT
Subject: [Python-3000] bug in i/o module buffering?
Message-ID: <07Oct27.181107pdt."57996"@synergy1.parc.xerox.com>

In the following, 'n' is equal to 0 (read from a non-blocking socket).
Is this a bug in the I/O module buffering?

Bill

Traceback (most recent call last):
  File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request
    self.process_request(request, client_address)
  File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request
    self.finish_request(request, client_address)
  File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__
    self.handle()
  File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle
    self.handle_one_request()
  File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request
    self.raw_requestline = self.rfile.readline()
  File "/local/python/3k/src/Lib/io.py", line 391, in readline
    b = self.read(nreadahead())
  File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead
    readahead = self.peek(1, unsafe=True)
  File "/local/python/3k/src/Lib/io.py", line 778, in peek
    current = self.raw.read(to_read)
  File "/local/python/3k/src/Lib/io.py", line 455, in read
    del b[n:]
TypeError: 'slice' object does not support item deletion
----------------------------------------


From guido at python.org  Sun Oct 28 02:21:21 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Oct 2007 18:21:21 -0700
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <8566285166171308234@unknownmsgid>
References: <8566285166171308234@unknownmsgid>
Message-ID: <ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>

More interesting is, what's b?

2007/10/27, Bill Janssen <janssen at parc.com>:
> In the following, 'n' is equal to 0 (read from a non-blocking socket).
> Is this a bug in the I/O module buffering?
>
> Bill
>
> Traceback (most recent call last):
>   File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request
>     self.process_request(request, client_address)
>   File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request
>     self.finish_request(request, client_address)
>   File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__
>     self.handle()
>   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle
>     self.handle_one_request()
>   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request
>     self.raw_requestline = self.rfile.readline()
>   File "/local/python/3k/src/Lib/io.py", line 391, in readline
>     b = self.read(nreadahead())
>   File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead
>     readahead = self.peek(1, unsafe=True)
>   File "/local/python/3k/src/Lib/io.py", line 778, in peek
>     current = self.raw.read(to_read)
>   File "/local/python/3k/src/Lib/io.py", line 455, in read
>     del b[n:]
> TypeError: 'slice' object does not support item deletion
> ----------------------------------------
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Oct 28 02:22:32 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 27 Oct 2007 18:22:32 -0700
Subject: [Python-3000] Odd output from test -- buffering bug?
In-Reply-To: <7173684078305279630@unknownmsgid>
References: <7173684078305279630@unknownmsgid>
Message-ID: <ca471dc20710271822m290e4366l6359433db277c3f2@mail.gmail.com>

Hard to say. Never seen this before. Are you using fork() *anywhere*
in your tests (not necessarily the affected test)?

2007/10/27, Bill Janssen <janssen at parc.com>:
> I'm seeing a sort of odd thing going on when running one of my tests.
> I'm seeing two lines of output, from two different threads, being
> duplicated when I run with "regrtest -u all -v test_ssl".  This is
> with the latest 3K sources on PPC OS X 10.4.10.
>
> testSTARTTLS (test.test_ssl.ThreadedTests) ...
>  client:  sending b'msg 1'...
> ^@client:  sending b'msg 1'...
>  server:  new connection from ('127.0.0.1', 52371)
>  server:  new connection from ('127.0.0.1', 52371)
>
> This is output to an Emacs shell buffer, so it shows control
> characters in the output, and I'm seeing a NUL character being output
> there at the beginning of the third line.  Both of the duplicated lines
> are being output with code like this:
>
>     if test_support.verbose:
>         sys.stdout.write(
>             " client:  sending %s...\n" % repr(msg))
>
> This looks like some kind of buffering bug.  Is it in the test
> harness, or the standard I/O library?
>
> Bill
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Sun Oct 28 02:24:04 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 18:24:04 PDT
Subject: [Python-3000] bad socket close in httplib.py
Message-ID: <07Oct27.182413pdt."57996"@synergy1.parc.xerox.com>

I think the socket close in HTTPConnection.close() is incorrect, but is being
hidden by the delayed closing implemented in socket.py.  See issue
1348.

Bill

From janssen at parc.com  Sun Oct 28 02:27:29 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 18:27:29 PDT
Subject: [Python-3000] Odd output from test -- buffering bug?
In-Reply-To: <ca471dc20710271822m290e4366l6359433db277c3f2@mail.gmail.com> 
References: <7173684078305279630@unknownmsgid>
	<ca471dc20710271822m290e4366l6359433db277c3f2@mail.gmail.com>
Message-ID: <07Oct27.182738pdt."57996"@synergy1.parc.xerox.com>

No, not unless the test harness uses it.  But there are two threads.

> Hard to say. Never seen this before. Are you using fork() *anywhere*
> in your tests (not necessarily the affected test)?
> 
> 2007/10/27, Bill Janssen <janssen at parc.com>:
> > I'm seeing a sort of odd thing going on when running one of my tests.
> > I'm seeing two lines of output, from two different threads, being
> > duplicated when I run with "regrtest -u all -v test_ssl".  This is
> > with the latest 3K sources on PPC OS X 10.4.10.
> >
> > testSTARTTLS (test.test_ssl.ThreadedTests) ...
> >  client:  sending b'msg 1'...
> > ^@client:  sending b'msg 1'...
> >  server:  new connection from ('127.0.0.1', 52371)
> >  server:  new connection from ('127.0.0.1', 52371)
> >
> > This is output to an Emacs shell buffer, so it shows control
> > characters in the output, and I'm seeing a NUL character being output
> > there at the beginning of the third line.  Both of the duplicated lines
> > are being output with code like this:
> >
> >     if test_support.verbose:
> >         sys.stdout.write(
> >             " client:  sending %s...\n" % repr(msg))
> >
> > This looks like some kind of buffering bug.  Is it in the test
> > harness, or the standard I/O library?
> >
> > Bill
> >
> >
> 
> 
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)


From janssen at parc.com  Sun Oct 28 02:30:30 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 18:30:30 PDT
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com> 
References: <8566285166171308234@unknownmsgid>
	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>
Message-ID: <07Oct27.183033pdt."57996"@synergy1.parc.xerox.com>

From RawIOBase.read().  What's __index__() do?

        b = bytes(n.__index__())

> More interesting is, what's b?
> 
> 2007/10/27, Bill Janssen <janssen at parc.com>:
> > In the following, 'n' is equal to 0 (read from a non-blocking socket).
> > Is this a bug in the I/O module buffering?
> >
> > Bill
> >
> > Traceback (most recent call last):
> >   File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request
> >     self.process_request(request, client_address)
> >   File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request
> >     self.finish_request(request, client_address)
> >   File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request
> >     self.RequestHandlerClass(request, client_address, self)
> >   File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__
> >     self.handle()
> >   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle
> >     self.handle_one_request()
> >   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request
> >     self.raw_requestline = self.rfile.readline()
> >   File "/local/python/3k/src/Lib/io.py", line 391, in readline
> >     b = self.read(nreadahead())
> >   File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead
> >     readahead = self.peek(1, unsafe=True)
> >   File "/local/python/3k/src/Lib/io.py", line 778, in peek
> >     current = self.raw.read(to_read)
> >   File "/local/python/3k/src/Lib/io.py", line 455, in read
> >     del b[n:]
> > TypeError: 'slice' object does not support item deletion
> > ----------------------------------------
> >
> >
> 
> 
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)


From jimjjewett at gmail.com  Sun Oct 28 18:27:57 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 28 Oct 2007 13:27:57 -0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <47226D39.2020002@canterbury.ac.nz>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru>
	<47221FD1.3080802@hastings.org>
	<fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>
	<47226D39.2020002@canterbury.ac.nz>
Message-ID: <fb6fbf560710281027l21c35358w7261e786dc5a9a5e@mail.gmail.com>

On 10/26/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Jim Jewett wrote:
> > If it weren't for backwards compatibility, 'i' would be a much better
> > option,

> No, it wouldn't, because 'integer' is a data type, not
> a display format. The Python format codes specify display
> formats, not data types.

I think that distinction is splitting hairs.

(1)  Even to a programmer, there may not be much difference between

    "%f" prints it as a float
and
    "%f" means to convert it to a float and print that

(If anything, the docs support the second definition.)

(2)  To most people, all numbers are base-10, and using another base
is just a silly affectation, like pig-latin.

Decimal doesn't mean "base 10", it means "has a decimal point", and
contrasts with both fractions and integers.

Programmers have typically been exceptions, but I'm not sure how true
that will remain in the future.  Octal is already a wart that causes
more bugs than it prevents.  Hex is still useful.  In another
half-generation, I'm not so sure.

It is *probably* too early to drop support for %d as "Signed integer
decimal" rather than "Decimal".  But I believe the docs would already
be improved by changing the definition table at
http://docs.python.org/lib/typesseq-strings.html from

    d	Signed integer decimal.	
    i	Signed integer decimal.

to

    d	Signed integer decimal.  Currently an alias for i.
    i	Signed integer decimal.
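
To make the two readings in point (1) concrete, here is a minimal
sketch, assuming a current Python 3 interpreter rather than the 2007
code base.  It suggests that the % operator converts the argument to
fit the format code before displaying it, and that d and i give
identical output.  The variable names are only for illustration.

```python
# Sketch only: printf-style codes as they behave in a current Python 3.
as_float = "%f" % 3        # an int is converted to a float before display
as_int_d = "%d" % 3.7      # a float is truncated to an integer first
as_int_i = "%i" % 3.7      # 'i' behaves exactly like 'd'

print(as_float)            # 3.000000
print(as_int_d, as_int_i)  # 3 3
```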

-jJ

From jimjjewett at gmail.com  Sun Oct 28 18:36:21 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 28 Oct 2007 13:36:21 -0400
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <472349CC.4050103@cheimes.de>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
	<472349CC.4050103@cheimes.de>
Message-ID: <fb6fbf560710281036v7ea69108pb93361666f6d300f@mail.gmail.com>

On 10/27/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:

> The second line is a fprintf(stderr, "%s\n", char *path).
> Do you see the
> difference between "test???" and "test???"?

One likely difference is that test??? should be a legitimate (unicode)
Python name, but test??? probably isn't, because the division sign
isn't alphanumeric.

Also, there is a chance that test??? was already in the appropriate
normalized form, but test??? probably isn't, because of the
superscript.

Whether either of these *should* matter in this case, I couldn't tell
from your post.

-jJ

From jimjjewett at gmail.com  Sun Oct 28 18:45:36 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 28 Oct 2007 13:45:36 -0400
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <8035548431694532893@unknownmsgid>
References: <8566285166171308234@unknownmsgid>
	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>
	<8035548431694532893@unknownmsgid>
Message-ID: <fb6fbf560710281045w749c47afyf2a1ba87a192ef65@mail.gmail.com>

On 10/27/07, Bill Janssen <janssen at parc.com> wrote:

> > >   File "/local/python/3k/src/Lib/io.py", line 455, in read
> > >     del b[n:]
> > > TypeError: 'slice' object does not support item deletion

>         b = bytes(n.__index__())

Isn't bytes the *im*mutable bytestring, so that you would need a
buffer (rather than a bytes) if you plan to clear it out?

-jJ

From lists at cheimes.de  Sun Oct 28 18:54:14 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 28 Oct 2007 18:54:14 +0100
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <fb6fbf560710281036v7ea69108pb93361666f6d300f@mail.gmail.com>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>	
	<472349CC.4050103@cheimes.de>
	<fb6fbf560710281036v7ea69108pb93361666f6d300f@mail.gmail.com>
Message-ID: <4724CCC6.3080705@cheimes.de>

Jim Jewett wrote:
> One likely difference is that test??? should be a legitimate (unicode)
> Python name, but test??? probably isn't, because the division sign
> isn't alphanumeric.
> 
> Also, there is a chance that test??? was already in the appropriate
> normalized form, but test??? probably isn't, because of the
> superscript.
> 
> Whether either of these *should* matter in this case, I couldn't tell
> from your post.

I'm neither a Windows expert nor an experienced Windows developer. I'm
just guessing here. Could it be that Python is using the char* NameA API
methods instead of the wide wchar_t * NameW methods?

Christian

From lists at cheimes.de  Sun Oct 28 19:27:05 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 28 Oct 2007 19:27:05 +0100
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <fb6fbf560710281045w749c47afyf2a1ba87a192ef65@mail.gmail.com>
References: <8566285166171308234@unknownmsgid>	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>	<8035548431694532893@unknownmsgid>
	<fb6fbf560710281045w749c47afyf2a1ba87a192ef65@mail.gmail.com>
Message-ID: <fg2k9q$emv$1@ger.gmane.org>

Jim Jewett wrote:
> Isn't bytes the *im*mutable bytestring, so that you would need a
> buffer (rather than a bytes) if you plan to clear it out?

The types aren't renamed yet. Bytes is still the mutable bytestring and
str8 the immutable.

Christian


From janssen at parc.com  Sun Oct 28 19:28:14 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 28 Oct 2007 11:28:14 PDT
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <fb6fbf560710281045w749c47afyf2a1ba87a192ef65@mail.gmail.com> 
References: <8566285166171308234@unknownmsgid>
	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>
	<8035548431694532893@unknownmsgid>
	<fb6fbf560710281045w749c47afyf2a1ba87a192ef65@mail.gmail.com>
Message-ID: <07Oct28.102824pst."57996"@synergy1.parc.xerox.com>

Jim Jewett wrote:
> On 10/27/07, Bill Janssen <janssen at parc.com> wrote:
> 
> > > >   File "/local/python/3k/src/Lib/io.py", line 455, in read
> > > >     del b[n:]
> > > > TypeError: 'slice' object does not support item deletion
> 
> >         b = bytes(n.__index__())
> 
> Isn't bytes the *im*mutable bytestring, so that you would need a
> buffer (rather than a bytes) if you plan to clear it out?

I think when this code was written, "bytes" was mutable (that's why it
couldn't be a key in a dict).  If I understand the grand plan
correctly, "bytes" will become "buffer" (mutable), and "str8" will
become "bytes" (immutable).
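
A minimal sketch of why the mutable/immutable split matters for code
like "del b[n:]", assuming a current Python 3, where the mutable type
ended up being called bytearray (not "buffer") and bytes is the
immutable one.  read_via_readinto is a hypothetical helper written for
this illustration; it is not the io.py code from the traceback.

```python
import io

def read_via_readinto(raw, n):
    """Read up to n bytes from raw using readinto() and a scratch buffer."""
    scratch = bytearray(n)          # mutable, so it can be trimmed in place
    got = raw.readinto(scratch)
    if got is None:                 # non-blocking stream with no data ready
        return None
    del scratch[got:]               # drop the unused tail
    return bytes(scratch)           # hand back an immutable copy

data = read_via_readinto(io.BytesIO(b"abc"), 10)

try:
    frozen = bytes(4)
    del frozen[2:]                  # immutable type: deletion is rejected
    deletion_allowed = True
except TypeError:
    deletion_allowed = False

print(data, deletion_allowed)
```

The trimming step only works because the scratch buffer is the mutable
type; the same statement on the immutable type raises TypeError.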

Bill

From janssen at parc.com  Sun Oct 28 20:09:56 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 28 Oct 2007 12:09:56 PDT
Subject: [Python-3000] socket GC worries
Message-ID: <07Oct28.111004pst."57996"@synergy1.parc.xerox.com>

I've now got a working SSL patch for Py3K (assuming that the patches
for #1347 and #1349 are committed), but I'm a bit worried about the
lazy GC of sockets.  I find that simply dropping an SSLSocket on the
floor doesn't GC the C structures.  This implies that the instance in
the SSLSocket._sslobj slot never gets decref'ed.  I think it's due to
the fact that "socket.makefile()" creates a circular reference with an
instance of "socket.SocketCloser", which points to the socket, and the
socket has a slot which points to the "_closer".  If "socket.close()"
is never explicitly called, the underlying system socket never gets
closed.  Since sockets are bound to a scarce system resource, this
could be problematic.

I think that the SocketCloser (new in Py3K) was developed to address
another issue, which is that there's a lot of library code which
assumes that the Python socket instance is just window dressing over
an underlying system file descriptor, and isn't important.  In fact,
that whole mess of code is a good argument for *not* exposing the
fileno in Python (perhaps only for special cases, like "select").
Take httplib and urllib, for instance.  HTTPConnection creates a
"file" from the socket, by calling socket.makefile(), then in some
cases *closes* the socket (thereby reasonably rendering the socket
*dead*), *then* returns the "file" to the caller as part of the
response.  urllib then takes the response, pulls the "file" out of it,
and discards the rest, returning the "file" as part of an instance of
addinfourl.  Somewhere along the way some code should call "close()"
on that HTTPConnection socket, but not till the caller is finished
using the bytes of the response (and those bytes are kept queued up in
the real OS socket).  Ideally, GC of the response instance should call
close() on the socket instance, which means that the instance should
be passed along as part of the response, IMO.
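
The close-then-keep-reading pattern described above can be sketched as
follows.  This assumes the behaviour of the socket module in a current
Python 3 (where close() on the socket is deferred while objects from
makefile() are still open), not the 2007 SocketCloser implementation,
and it uses socket.socketpair() so no network is needed.  The variable
names are illustrative only.

```python
import socket

# socketpair() gives two connected sockets without touching the network.
left, right = socket.socketpair()
reader = left.makefile("rb")      # file-like wrapper around `left`

right.sendall(b"hello\n")
left.close()                      # the httplib-style early close

line = reader.readline()          # still works: the real close is deferred
reader.close()                    # last wrapper gone, descriptor released
fd_after = left.fileno()          # -1 once the socket is really closed
right.close()

print(line, fd_after)
```

Note that the descriptor is released here only because reader.close()
is called explicitly; the sketch says nothing about what happens when
both objects are simply dropped.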

Bill

From python3now at gmail.com  Sun Oct 28 21:05:43 2007
From: python3now at gmail.com (James Thiele)
Date: Sun, 28 Oct 2007 13:05:43 -0700
Subject: [Python-3000] __bool__ in 2.6?
Message-ID: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>

PEP 361 lists __bool__ support as being possible for 2.6 backporting.
As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha
uses __bool__. Has a decision been made on whether this will make the
cut for 2.6?
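
For readers unfamiliar with the two hooks, a minimal sketch of a class
that answers truth tests under both names.  Basket is a hypothetical
class made up for this example; the aliasing trick is one common way to
support both spellings and is not something PEP 361 prescribes.

```python
class Basket:
    """Hypothetical container used only to illustrate the two hook names."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):            # consulted by Python 3.x
        return len(self.items) > 0

    __nonzero__ = __bool__         # the same method under the 2.x name

empty_is_true = bool(Basket([]))
full_is_true = bool(Basket(["egg"]))
print(empty_is_true, full_is_true)
```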

In a more general vein, is there a cutoff date for producing a list of
3.0 features which will be backported to 2.6?

Thanks,
James

From qrczak at knm.org.pl  Sun Oct 28 21:19:40 2007
From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk)
Date: Sun, 28 Oct 2007 21:19:40 +0100
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To: <472349CC.4050103@cheimes.de>
References: <ca471dc20710261218l4db3128fr26fa8e1738299da1@mail.gmail.com>
	<472349CC.4050103@cheimes.de>
Message-ID: <1193602780.17694.2.camel@qrnik>

Dnia 27-10-2007, So o godzinie 16:23 +0200, Christian Heimes pisze:

> The second line is a fprintf(stderr, "%s\n", char *path). Do you see the
> difference between "test???" and "test???"?

"???" in CP-1252 has the same bytes as "???" in CP-850, so this is some
confusion between ANSI and OEM codepages.
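
A minimal sketch of that kind of confusion, using Python's cp1252 and
cp850 codecs.  The byte string is a hypothetical example chosen for
illustration; the characters in the original report did not survive in
the archive, so this is not a reconstruction of them.

```python
raw = b"test\xf6"                 # hypothetical file name bytes

as_ansi = raw.decode("cp1252")    # Windows "ANSI" code page
as_oem = raw.decode("cp850")      # DOS "OEM" code page

# Same bytes, different text: U+00F6 (o with diaeresis) under CP-1252,
# U+00F7 (division sign) under CP-850.
print(ascii(as_ansi))
print(ascii(as_oem))
```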

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/


From brett at python.org  Sun Oct 28 22:23:54 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 28 Oct 2007 14:23:54 -0700
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
Message-ID: <bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>

On 10/28/07, James Thiele <python3now at gmail.com> wrote:
> PEP 361 lists __bool__ support as being possible for 2.6 backporting.
> As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha
> uses __bool__. Has a decision been made on whether this will make the
> cut for 2.6?
>
> In a more general vein, is there a cutoff date for producing a list of
> 3.0 features which will be backported to 2.6?

Backporting decisions have not been made as the feature set of 3.0 is
still a moving target.  Once we nail down the features (I am going to
guess not until b1 at the earliest) then backporting will probably
start.

-Brett

From greg.ewing at canterbury.ac.nz  Sun Oct 28 22:49:18 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 29 Oct 2007 10:49:18 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <fb6fbf560710281027l21c35358w7261e786dc5a9a5e@mail.gmail.com>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru>
	<47221FD1.3080802@hastings.org>
	<fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>
	<47226D39.2020002@canterbury.ac.nz>
	<fb6fbf560710281027l21c35358w7261e786dc5a9a5e@mail.gmail.com>
Message-ID: <472503DE.5080608@canterbury.ac.nz>

Jim Jewett wrote:
> Decimal doesn't mean "base 10", it means "has a decimal point"

According to dictionary.com, it means

1.	pertaining to tenths or to the number 10.
2.	proceeding by tens: a decimal system.

--
Greg

From greg.ewing at canterbury.ac.nz  Sun Oct 28 22:56:55 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 29 Oct 2007 10:56:55 +1300
Subject: [Python-3000] socket GC worries
In-Reply-To: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
Message-ID: <472505A7.108@canterbury.ac.nz>

Bill Janssen wrote:
> that whole mess of code is a good argument for *not* exposing the
> fileno in Python

Seems to me that a socket should already *be* a file,
so it shouldn't need a makefile() method and you
shouldn't have to mess around with filenos.

--
Greg

From janssen at parc.com  Mon Oct 29 00:36:42 2007
From: janssen at parc.com (Bill Janssen)
Date: Sun, 28 Oct 2007 16:36:42 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <472505A7.108@canterbury.ac.nz> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz>
Message-ID: <07Oct28.153644pst."57996"@synergy1.parc.xerox.com>

> Bill Janssen wrote:
> > that whole mess of code is a good argument for *not* exposing the
> > fileno in Python
> 
> Seems to me that a socket should already *be* a file,
> so it shouldn't need a makefile() method and you
> shouldn't have to mess around with filenos.

I like that model, too.  I also wish the classes in io.py were sort of
inverted; that is, I'd like to have an IOStream base class with read()
and write() methods (and maybe close()), which things like Socket
could inherit from.  FileIO would inherit from IOStream and from
Seekable, and add a fileno() method and "name" property.  And so
forth.  But apparently that's out; maybe in Python 4000.

Right now the socket is very much like an OS socket, with "send" and
"recv" being the star players, not "read" and "write".  socket.makefile
wraps a buffered file-like interface around it.
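
The inverted hierarchy described above can be sketched roughly like
this.  Everything here is hypothetical: IOStream and Seekable follow
the names used in the message, while MemoryFile and LoopbackSocket are
in-memory stand-ins invented for the example so that it runs without
real files or sockets.  It is not how io.py is organized.

```python
from abc import ABC, abstractmethod

class IOStream(ABC):
    """Base class: anything you can read from and write to."""
    @abstractmethod
    def read(self, n): ...
    @abstractmethod
    def write(self, data): ...
    def close(self):
        pass

class Seekable(ABC):
    """Mix-in for streams that have a position."""
    @abstractmethod
    def seek(self, pos): ...
    @abstractmethod
    def tell(self): ...

class MemoryFile(IOStream, Seekable):
    """Stand-in for FileIO: a stream that is also seekable and has a name."""
    def __init__(self, name, data=b""):
        self.name = name
        self._data = bytearray(data)
        self._pos = 0
    def read(self, n):
        chunk = bytes(self._data[self._pos:self._pos + n])
        self._pos += len(chunk)
        return chunk
    def write(self, data):
        self._data[self._pos:self._pos + len(data)] = data
        self._pos += len(data)
        return len(data)
    def seek(self, pos):
        self._pos = pos
    def tell(self):
        return self._pos

class LoopbackSocket(IOStream):
    """Stand-in for Socket: a plain stream, with no seek() and no fileno()."""
    def __init__(self):
        self._queue = bytearray()
    def read(self, n):
        chunk = bytes(self._queue[:n])
        del self._queue[:n]
        return chunk
    def write(self, data):
        self._queue += data
        return len(data)

f = MemoryFile("demo", b"abcdef")
f.seek(2)
file_chunk = f.read(3)

s = LoopbackSocket()
s.write(b"ping")
sock_chunk = s.read(4)
print(file_chunk, sock_chunk, isinstance(s, Seekable))
```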

Bill

From jimjjewett at gmail.com  Mon Oct 29 00:51:48 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 28 Oct 2007 19:51:48 -0400
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <472503DE.5080608@canterbury.ac.nz>
References: <da3f900e0710251650gc9ec6b8s7a705cc94c6dee28@mail.gmail.com>
	<20071026142036.GB3365@phd.pp.ru>
	<ca471dc20710260724h77cb5213mee1e9d20b0de9a46@mail.gmail.com>
	<200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru>
	<47221FD1.3080802@hastings.org>
	<fb6fbf560710261214k780d7d00q25a604e35e12e09b@mail.gmail.com>
	<47226D39.2020002@canterbury.ac.nz>
	<fb6fbf560710281027l21c35358w7261e786dc5a9a5e@mail.gmail.com>
	<472503DE.5080608@canterbury.ac.nz>
Message-ID: <fb6fbf560710281651m1615d34bqd2f1cc49e5668d48@mail.gmail.com>

On 10/28/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Jim Jewett wrote:
> > Decimal doesn't mean "base 10", it means "has a decimal point"

> According to dictionary.com, it means

I see that I wasn't clear about this still being within the scope of
"To most people ..."

The dictionary gives a correct definition -- but realistically, that
definition is jargon, rather than the way most people I've talked to
actually use it.

When I asked my kids what they were studying in math, the answer was
sometimes "decimals" -- and this was always after plenty of work with
multiple-digit arithmetic, but years before they learned about
alternate bases.

-jJ

From guido at python.org  Mon Oct 29 18:41:31 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 10:41:31 -0700
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <1209807056282906541@unknownmsgid>
References: <18211.33450.332197.304601@montanaro.dyndns.org>
	<1209807056282906541@unknownmsgid>
Message-ID: <ca471dc20710291041s3f66d478se6168310164e6f60@mail.gmail.com>

2007/10/27, Bill Janssen <janssen at parc.com>:
> > ISTR much of the plat-mac stuff was generated by Tools/bgen.  If so, that
> > would be the place to fix things.
>
> Sure looks like generated code.  Be nice if that generator was run
> during the build process, on OS X.  That way you'd be sure to get code
> that matches the platform and codebase.

ISTR that the generator needs a lot of hand-holding. Fixing it would
be A Project.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 18:48:05 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 10:48:05 -0700
Subject: [Python-3000] Odd output from test -- buffering bug?
In-Reply-To: <9003268132624447768@unknownmsgid>
References: <7173684078305279630@unknownmsgid>
	<ca471dc20710271822m290e4366l6359433db277c3f2@mail.gmail.com>
	<9003268132624447768@unknownmsgid>
Message-ID: <ca471dc20710291048k7fcc7a76m9845c8adc311728d@mail.gmail.com>

Thinking about this some more, the io module isn't thread-safe. It
probably should be (the old file objects were more-or-less
thread-safe, although I believe there might've been corner cases if
one thread were to close a file).
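
Until the library itself is made safe, one workaround is for callers to
serialize their own writes.  A minimal sketch, assuming a wrapper
around the shared stream is acceptable; LockedWriter is a hypothetical
name, and this is not a description of how the io module was or will
be fixed.

```python
import io
import threading

class LockedWriter:
    """Hypothetical wrapper that serializes write() calls on one stream."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write(self, text):
        with self._lock:           # one thread at a time touches the buffer
            written = self._stream.write(text)
            self._stream.flush()
            return written

sink = io.StringIO()               # stands in for sys.stdout
out = LockedWriter(sink)

def worker(tag):
    for i in range(100):
        out.write(" %s:  sending msg %d...\n" % (tag, i))

threads = [threading.Thread(target=worker, args=(tag,))
           for tag in ("client", "server")]
for t in threads:
    t.start()
for t in threads:
    t.join()

lines = sink.getvalue().splitlines()
print(len(lines), len(set(lines)))
```

With the lock in place each of the 200 lines appears exactly once,
whichever order the two threads happen to run in.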

--Guido

2007/10/27, Bill Janssen <janssen at parc.com>:
> No, not unless the test harness uses it.  But there are two threads.
>
> > Hard to say. Never seen this before. Are you using fork() *anywhere*
> > in your tests (not necessarily the affected test)?
> >
> > 2007/10/27, Bill Janssen <janssen at parc.com>:
> > > I'm seeing a sort of odd thing going on when running one of my tests.
> > > I'm seeing two lines of output, from two different threads, being
> > > duplicated when I run with "regrtest -u all -v test_ssl".  This is
> > > with the latest 3K sources on PPC OS X 10.4.10.
> > >
> > > testSTARTTLS (test.test_ssl.ThreadedTests) ...
> > >  client:  sending b'msg 1'...
> > > ^@client:  sending b'msg 1'...
> > >  server:  new connection from ('127.0.0.1', 52371)
> > >  server:  new connection from ('127.0.0.1', 52371)
> > >
> > > This is output to an Emacs shell buffer, so it shows control
> > > characters in the output, and I'm seeing a NUL character being output
> > > there at the beginning of the third line.  Both of the duplicated lines
> > > are being output with code like this:
> > >
> > >     if test_support.verbose:
> > >         sys.stdout.write(
> > >             " client:  sending %s...\n" % repr(msg))
> > >
> > > This looks like some kind of buffering bug.  Is it in the test
> > > harness, or the standard I/O library?
> > >
> > > Bill
> > >
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 19:07:21 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 11:07:21 -0700
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <-2917254322633780664@unknownmsgid>
References: <8566285166171308234@unknownmsgid>
	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>
	<-2917254322633780664@unknownmsgid>
Message-ID: <ca471dc20710291107i1a5ecaa0h74e225b9131def47@mail.gmail.com>

__index__() converts an "int-like" object to an int. This is needed to
make sure that e.g. numpy integral scalars can be used for indexing.
For a regular int it doesn't matter, so here it's a red herring.
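
A minimal sketch of what that means in practice, assuming a current
Python 3.  Ordinal is a hypothetical int-like class standing in for
something like a numpy integral scalar.

```python
import operator

class Ordinal:
    """Hypothetical int-like type, standing in for e.g. a numpy scalar."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value

letters = ["a", "b", "c", "d"]
picked = letters[Ordinal(2)]            # indexing calls __index__()
tail = letters[Ordinal(1):]             # so does slicing
as_int = operator.index(Ordinal(3))     # explicit conversion to a real int
same = (7).__index__()                  # for a plain int it changes nothing

print(picked, tail, as_int, same)
```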

I'm asking about b because the error message "TypeError: 'slice'
object does not support item deletion" would suggest that b is a slice
object. I agree that doesn't sound very likely given the code
though... :-( Could you step through this using pdb and investigate
some more? Perhaps there's a refcount error somewhere in the C code?

--Guido

2007/10/27, Bill Janssen <janssen at parc.com>:
> From RawIOBase.read().  What's __index__() do?
>
>         b = bytes(n.__index__())
>
> > More interesting is, what's b?
> >
> > 2007/10/27, Bill Janssen <janssen at parc.com>:
> > > In the following, 'n' is equal to 0 (read from a non-blocking socket).
> > > Is this a bug in the I/O module buffering?
> > >
> > > Bill
> > >
> > > Traceback (most recent call last):
> > >   File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request
> > >     self.process_request(request, client_address)
> > >   File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request
> > >     self.finish_request(request, client_address)
> > >   File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request
> > >     self.RequestHandlerClass(request, client_address, self)
> > >   File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__
> > >     self.handle()
> > >   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle
> > >     self.handle_one_request()
> > >   File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request
> > >     self.raw_requestline = self.rfile.readline()
> > >   File "/local/python/3k/src/Lib/io.py", line 391, in readline
> > >     b = self.read(nreadahead())
> > >   File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead
> > >     readahead = self.peek(1, unsafe=True)
> > >   File "/local/python/3k/src/Lib/io.py", line 778, in peek
> > >     current = self.raw.read(to_read)
> > >   File "/local/python/3k/src/Lib/io.py", line 455, in read
> > >     del b[n:]
> > > TypeError: 'slice' object does not support item deletion
> > > ----------------------------------------
> > >
> > > _______________________________________________
> > > Python-3000 mailing list
> > > Python-3000 at python.org
> > > http://mail.python.org/mailman/listinfo/python-3000
> > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 19:10:31 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 11:10:31 -0700
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
Message-ID: <ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>

2007/10/28, Brett Cannon <brett at python.org>:
> On 10/28/07, James Thiele <python3now at gmail.com> wrote:
> > PEP 361 lists __bool__ support as being possible for 2.6 backporting.
> > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha
> > uses __bool__. Has a decision been made on whether this will make the
> > cut for 2.6?
> >
> > In a more general vein, is there a cutoff date for producing a list of
> > 3.0 features which will be backported to 2.6?
>
> Backporting decisions have not been made as the feature set of 3.0 is
> still a moving target.  Once we nail down the features (I am going to
> guess not until b1 at the earliest) then backporting will probably
> start.

In this case, like many, the backport can't be an exact copy of the
3.0 code: 2.6 *must* support __nonzero__. But it should also support
__bool__ as a fallback. I think it would be great if someone submitted
a patch to implement this (though it isn't necessarily the highest
backporting priority).
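
Until such a patch exists, a class can stay portable by defining both
names itself. This is only a sketch of that idiom; Basket is a
hypothetical class, and nothing here describes how the backport would
be implemented.

```python
class Basket:
    """Hypothetical container whose truth value depends on its contents."""

    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):
        # The 3.0 spelling of the truth-value hook.
        return len(self.items) > 0

    # The 2.x spelling; aliasing keeps a single implementation for both.
    __nonzero__ = __bool__


empty = bool(Basket())
full = bool(Basket(["egg"]))
```

With the alias in place the result does not depend on which of the two
names an interpreter looks up first.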

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From dwheeler at dwheeler.com  Mon Oct 29 19:34:03 2007
From: dwheeler at dwheeler.com (David A. Wheeler)
Date: Mon, 29 Oct 2007 14:34:03 -0400 (EDT)
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <47169E6E.7000804@canterbury.ac.nz>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
Message-ID: <E1ImZR5-000251-4n@garm.runbox.com>

I think several postings have explained why __cmp__ is still very valuable better than I have.  (See below.)

Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here:
  http://www.dwheeler.com/misc/pep-cmp.txt 
Please let me know if it makes sense.  Thanks.

Greg Ewing stated "Why not provide a __richcmp__ method that directly connects
with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age."
It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!).  But maybe I just don't understand the comment, so explanation welcome.



--- David A. Wheeler

========================================

Aahz: 	 
>From my perspective, the real use case for cmp() is when you want to do
>a three-way comparison of a "large" object (for example, a Decimal
>instance). You can store the result of cmp() and then do a separate
>three-way branch.

and in reply to the note "I'm having troubles coming up with things where
the *basic* operator is really a cmp-like function.", there were two replies:


Guido van Rossum:
>Here's one. When implementing the '<' operator on lists or tuples, you
> really want to call the 'cmp' operator on the individual items,
> because otherwise (if all you have is == and <) the algorithm becomes
> something like "compare for equality until you've found the first pair
> of items that are unequal; then compare those items again using < to
> decide the final outcome". If you don't believe this, try to implement
> this operation using only == or < without comparing any two items more
> than once.

and

Greg Ewing:
> Think of things like comparing a tuple. You need to work your
> way along and recursively compare the elements. The decision
> about when to stop always involves ==, whatever comparison
> you're trying to do. So if e.g. you're doing <, then you have
> to test each element first for <, and if that's false, test
> it for ==. If the element is itself a tuple, it's doing this
> on its elements too, etc., and things get very inefficient.
> 
> If you have a single cmp operation that you can apply to the
> elements, you only need to do it once for each element and it
> gives you all the information you need.
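
The argument in the two replies above can be made concrete by counting
comparisons. This is a sketch only: Counted, less_than_rich and
less_than_cmp are hypothetical names, and cmp() here is an ordinary
method, not a restored builtin.

```python
class Counted:
    """Hypothetical wrapper that counts how often its value is compared."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

    def cmp(self, other):
        # One three-way comparison: negative, zero, or positive.
        Counted.calls += 1
        return (self.value > other.value) - (self.value < other.value)


def less_than_rich(a, b):
    """Sequence '<' using only == and < on the items."""
    for x, y in zip(a, b):
        if not x == y:          # first look at the differing pair
            return x < y        # second look at the same pair
    return len(a) < len(b)


def less_than_cmp(a, b):
    """Sequence '<' using a single three-way comparison per item."""
    for x, y in zip(a, b):
        c = x.cmp(y)
        if c != 0:
            return c < 0
    return len(a) < len(b)


left = [Counted(v) for v in (1, 2, 3)]
right = [Counted(v) for v in (1, 2, 4)]

Counted.calls = 0
rich_result = less_than_rich(left, right)
rich_calls = Counted.calls      # equality tests plus one ordering test

Counted.calls = 0
cmp_result = less_than_cmp(left, right)
cmp_calls = Counted.calls       # one three-way comparison per item visited
```

The extra call is a single comparison here, but if the items are
themselves sequences the repeated work happens at every nesting level.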


From python3now at gmail.com  Mon Oct 29 19:40:19 2007
From: python3now at gmail.com (James Thiele)
Date: Mon, 29 Oct 2007 11:40:19 -0700
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
Message-ID: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>

So just to clarify:
2.5 __nonzero__ only
2.6 __nonzero__ first, then __bool__ (if patch submitted)
3.x __bool__ first, then __nonzero__

Is this correct?

On 10/29/07, Guido van Rossum <guido at python.org> wrote:
> 2007/10/28, Brett Cannon <brett at python.org>:
> > On 10/28/07, James Thiele <python3now at gmail.com> wrote:
> > > PEP 361 lists __bool__ support as being possible for 2.6 backporting.
> > > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha
> > > uses __bool__. Has a decision been made on whether this will make the
> > > cut for 2.6?
> > >
> > > In a more general vein, is there a cutoff date for producing a list of
> > > 3.0 features which will be backported to 2.6?
> >
> > Backporting decisions have not been made as the feature set of 3.0 is
> > still a moving target.  Once we nail down the features (I am going to
> > guess not until b1 at the earliest) then backporting will probably
> > start.
>
> In this case, like many, the backport can't be an exact copy of the
> 3.0 code: 2.6 *must* support __nonzero__. But it should also support
> __bool__ as a fallback. I think it would be great if someone submitted
> a patch to implement this (though it isn't necessarily the highest
> backporting priority).
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From fdrake at acm.org  Mon Oct 29 19:44:55 2007
From: fdrake at acm.org (Fred Drake)
Date: Mon, 29 Oct 2007 14:44:55 -0400
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
	<8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
Message-ID: <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org>

On Oct 29, 2007, at 2:40 PM, James Thiele wrote:
> So just to clarify:
> 2.6 __nonzero__ first, then __bool__ (if patch submitted)
> 3.x __bool__ first, then __nonzero__

I'd expect switching the order for this to be a bug magnet.  I'd much  
rather see:

2.5 __nonzero__ only
2.6 __bool__ first, then __nonzero__ (if patch submitted)
3.x __bool__ first, then __nonzero__

The fewer variations there are in the algorithm, the better.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From guido at python.org  Mon Oct 29 19:45:55 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 11:45:55 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <6186646035112263762@unknownmsgid>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
Message-ID: <ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>

2007/10/28, Bill Janssen <janssen at parc.com>:
> > Bill Janssen wrote:
> > > that whole mess of code is a good argument for *not* exposing the
> > > fileno in Python
> >
> > Seems to me that a socket should already *be* a file,
> > so it shouldn't need a makefile() method and you
> > shouldn't have to mess around with filenos.

That model fits TCP/IP streams just fine, but doesn't work so well for
UDP and other odd socket types. The assumption that "s.write(a);
s.write(b) is equivalent to s.write(a+b)", which is fundamental for
any "stream" abstraction, just doesn't work for UDP. Ditto for
reading: AFAIK recv() truncates the rest of a UDP packet.
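
A small sketch of the datagram behaviour described here. To keep it
self-contained it uses a local AF_UNIX datagram socket pair rather than
real UDP; that substitution, and the assumption of a Unix-like platform
where socketpair() supports SOCK_DGRAM, are mine.

```python
import socket

# A connected pair of datagram sockets; AF_UNIX keeps the sketch local
# (assumption: a Unix-like platform where socketpair supports SOCK_DGRAM).
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

sender.send(b"hello")
sender.send(b"world")

first = receiver.recv(1024)    # one whole datagram, message boundary kept
second = receiver.recv(2)      # buffer too small for the second datagram

sender.send(b"!")
third = receiver.recv(1024)    # next datagram; the unread tail is not here

sender.close()
receiver.close()
```

Two sends arrive as two separate messages, and whatever does not fit in
the receive buffer is lost, so neither half of the stream assumption
holds.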

> I like that model, too.  I also wish the classes in io.py were sort of
> inverted; that is, I'd like to have an IOStream base class with read()
> and write() methods (and maybe close()), which things like Socket
> could inherit from.  FileIO would inherit from IOStream and from
> Seekable, and add a fileno() method and "name" property.  And so
> forth.  But apparently that's out; maybe in Python 4000.

Actually, I'm still up for tweaks to the I/O model if it solves a real
problem, as long as most of the high-level APIs stay the same (there
simply is too much code that expects those to behave a certain way).

I don't quite understand what you mean by inverted though.

> Right now the socket is very much like an OS socket; with "send" and
> "recv" being the star players, not "read" and "write".  socket.makefile
> wraps a buffered file-like interface around it.

I was going to say "we can just replace SocketIO with a non-seekable
_fileio.FileIO instance" until I realized that on Windows, socket fds
and filesystem fds live in different spaces and are managed using
different calls. That may also explain why the inversion you're
looking for doesn't quite work (IIUC what you meant).

The real issue seems to be file descriptor GC. Maybe we haven't
written down the rules clearly enough for when the fd is supposed to
be GC'ed, when there are both a socket and a SocketIO (or more)
referencing it; and whether a close() call means something beyond
dropping the last reference to the object. Or maybe we haven't
implemented the rules right? ISTM that the SocketCloser class is
*intended* to solve these issues. Back to your initial mail (which is
more relevant than Greg Ewing's snipe!):

> I think that the SocketCloser (new in Py3K) was developed to address
> another issue, which is that there's a lot of library code which
> assumes that the Python socket instance is just window dressing over
> an underlying system file descriptor, and isn't important.  In fact,
> that whole mess of code is a good argument for *not* exposing the
> fileno in Python (perhaps only for special cases, like "select").
> Take httplib and urllib, for instance.  HTTPConnection creates a
> "file" from the socket, by calling socket.makefile(), then in some
> cases *closes* the socket (thereby reasonably rendering the socket
> *dead*), *then* returns the "file" to the caller as part of the
> response.  urllib then takes the response, pulls the "file" out of it,
> and discards the rest, returning the "file" as part of an instance of
> addinfourl.  Somewhere along the way some code should call "close()"
> on that HTTPConnection socket, but not till the caller is finished
> using the bytes of the response (and those bytes are kept queued up in
> the real OS socket).  Ideally, GC of the response instance should call
> close() on the socket instance, which means that the instance should
> be passed along as part of the response, IMO.

Hm, I think you're right. The SocketCloser class wasn't written with
the SSL use case in mind. :-( I wonder if one key to solving the
problem isn't to make the socket *wrap* a low-level _socket instance
instead of *being* one (i.e. containment instead of subclassing). Then
the SSL code could be passed the low-level _socket instance and the
high(er)-level socket class could wrap either a _socket or an SSL
instance. The SocketCloser would then be responsible for closing
whatever the socket instance wraps, i.e. either the _socket or the SSL
instance. Then we could have any number of SocketIO instances *plus*
at most one socket instance, and the wrapped thing would be closed
when the last of the higher-level things was either GC'ed or
explicitly closed. If you wanted to reuse the _socket after closing
the SSL instance, you'd have to wrap it in a fresh socket instance.

Does that make sense? (Please do note the difference throughout
between _socket and socket, the former being defined in socketmodule.c
and the latter in socket.py.)
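
A rough sketch of the containment idea, under stated assumptions:
RawChannel stands in for either a _socket or an SSL instance, Handle for
the socket or a SocketIO, and Closer for the SocketCloser role. All
three are hypothetical and much simpler than the real classes.

```python
class RawChannel:
    """Hypothetical stand-in for a low-level _socket or an SSL instance."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class Closer:
    """Closes the wrapped object once every holder has let go of it."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.holders = 0

    def acquire(self):
        self.holders += 1

    def release(self):
        self.holders -= 1
        if self.holders == 0:
            self.wrapped.close()


class Handle:
    """One higher-level view (the socket or a SocketIO in the text above)."""

    def __init__(self, closer):
        self.closer = closer
        self.closed = False
        closer.acquire()

    def close(self):
        if not self.closed:          # closing twice must not release twice
            self.closed = True
            self.closer.release()

    def __del__(self):
        # Being garbage collected counts the same as an explicit close().
        self.close()


raw = RawChannel()
closer = Closer(raw)
sock, stream = Handle(closer), Handle(closer)

sock.close()
open_after_first = raw.open      # the stream still needs the channel
stream.close()
open_after_last = raw.open
```

Because the references only point downwards (Handle to Closer to the
wrapped object), there is no cycle and plain refcounting is enough to
trigger __del__.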

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 19:50:50 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 11:50:50 -0700
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
	<8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
Message-ID: <ca471dc20710291150s20cb3615p7a1a6cb55fda37e6@mail.gmail.com>

2007/10/29, James Thiele <python3now at gmail.com>:
> So just to clarify:
> 2.5 __nonzero__ only
> 2.6 __nonzero__ first, then __bool__ (if patch submitted)
> 3.x __bool__ first, then __nonzero__
>
> Is this correct?

No. 3.x tests __bool__ only.
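
A quick way to see the 3.x rule in action (the class names are made up
for the example):

```python
class OnlyNonzero:
    def __nonzero__(self):
        # Never consulted by truth testing in 3.x.
        return False


class OnlyBool:
    def __bool__(self):
        return False


legacy_truth = bool(OnlyNonzero())   # no __bool__, so the default applies
modern_truth = bool(OnlyBool())
```

An instance that defines only __nonzero__ is treated like any other
object without a truth hook, i.e. it is true.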

> On 10/29/07, Guido van Rossum <guido at python.org> wrote:
> > 2007/10/28, Brett Cannon <brett at python.org>:
> > > On 10/28/07, James Thiele <python3now at gmail.com> wrote:
> > > > PEP 361 lists __bool__ support as being possible for 2.6 backporting.
> > > > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha
> > > > uses __bool__. Has a decision been made on whether this will make the
> > > > cut for 2.6?
> > > >
> > > > In a more general vein, is there a cutoff date for producing a list of
> > > > 3.0 features which will be backported to 2.6?
> > >
> > > Backporting decisions have not been made as the feature set of 3.0 is
> > > still a moving target.  Once we nail down the features (I am going to
> > > guess not until b1 at the earliest) then backporting will probably
> > > start.
> >
> > In this case, like many, the backport can't be an exact copy of the
> > 3.0 code: 2.6 *must* support __nonzero__. But it should also support
> > __bool__ as a fallback. I think it would be great if someone submitted
> > a patch to implement this (though it isn't necessarily the highest
> > backporting priority).
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 19:51:23 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 11:51:23 -0700
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
	<8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
	<0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org>
Message-ID: <ca471dc20710291151i6ef7c14te5f50c02cf8b3c63@mail.gmail.com>

2007/10/29, Fred Drake <fdrake at acm.org>:
> On Oct 29, 2007, at 2:40 PM, James Thiele wrote:
> > So just to clarify:
> > 2.6 __nonzero__ first, then __bool__ (if patch submitted)
> > 3.x __bool__ first, then __nonzero__
>
> I'd expect switching the order for this to be a bug magnet.  I'd much
> rather see:
>
> 2.5 __nonzero__ only
> 2.6 __bool__ first, then __nonzero__ (if patch submitted)
> 3.x __bool__ first, then __nonzero__
>
> The fewer variations there are in the algorithm, the better.

Makes sense, if you change the 3.x rule to

3.x __bool__ only.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at acm.org  Mon Oct 29 19:57:53 2007
From: fdrake at acm.org (Fred Drake)
Date: Mon, 29 Oct 2007 14:57:53 -0400
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <ca471dc20710291151i6ef7c14te5f50c02cf8b3c63@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
	<8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
	<0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org>
	<ca471dc20710291151i6ef7c14te5f50c02cf8b3c63@mail.gmail.com>
Message-ID: <6242C601-C31D-4347-975E-DE18AF3D9062@acm.org>

On Oct 29, 2007, at 2:51 PM, Guido van Rossum wrote:
> Makes sense, if you change the 3.x rule to
>
> 3.x __bool__ only.

Even better!  I think I'm going to like 3.0 if I ever get a chance to  
use it.  ;-)


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




From janssen at parc.com  Mon Oct 29 19:59:26 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 29 Oct 2007 11:59:26 PDT
Subject: [Python-3000] bug in i/o module buffering?
In-Reply-To: <ca471dc20710291107i1a5ecaa0h74e225b9131def47@mail.gmail.com> 
References: <8566285166171308234@unknownmsgid>
	<ca471dc20710271821l3fa83724sba0c27a589f3e7c8@mail.gmail.com>
	<-2917254322633780664@unknownmsgid>
	<ca471dc20710291107i1a5ecaa0h74e225b9131def47@mail.gmail.com>
Message-ID: <07Oct29.105930pst."57996"@synergy1.parc.xerox.com>

> I'm asking about b because the error message "TypeError: 'slice'
> object does not support item deletion" would suggest that b is a slice
> object. I agree that doesn't sound very likely given the code
> though... :-( Could you step through this using pdb and investigate
> some more? Perhaps there's a refcount error somewhere in the C code?

I'll see if I can unfix my test code to reproduce the failure :-).

Bill

From janssen at parc.com  Mon Oct 29 20:24:36 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 29 Oct 2007 12:24:36 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
Message-ID: <07Oct29.112445pst."57996"@synergy1.parc.xerox.com>

> The SocketCloser class wasn't written with
> the SSL use case in mind. 

I don't think it's just SSL.  The problem is that it explicitly counts
calls to "close()".  So if you let the GC sweep up after you, that
close() just doesn't get called, the circular refs persist, and the
resource doesn't get collected till the backup GC runs (if it does).
Waiting for that to happen, you might run out of a scarce system
resource (file descriptors).  A nasty timing-dependent bug, there.

Hmmm, does real_close even get called in that case?  In the C module,
perhaps?

> If you wanted to reuse the _socket after closing
> the SSL instance, you'd have to wrap it in a fresh socket instance.
> 
> Does that make sense? (Please do note the difference throughout
> between _socket and socket, the former being defined in socketmodule.c
> and the latter in socket.py.)

That's what I do with SSLSocket, pretty much.  I worry that doing it
with socket.socket might break a lot of non-TCP code, though.  And
perhaps it's overkill.

Why not move the count of how many SocketIO instances are pointing to
it into the socket.socket class again, as it was in 2.x?  I don't
think you're gaining anything with the circular data structure of
SocketCloser.  Add a "_closed" property, and "__del__" method to
socket.socket (which calls "close()").  Remove SocketCloser.  You're
finished, and there's one less class to maintain.
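
Roughly what I have in mind, as a sketch only. Sock and Stream are
hypothetical stand-ins for socket.socket and SocketIO, _io_refs is a
name invented for the count, and fd_open stands in for the real file
descriptor.

```python
class Sock:
    """Hypothetical socket that tracks its own I/O wrappers."""

    def __init__(self):
        self._io_refs = 0        # live streams made from this socket
        self._closed = False     # has close() been requested?
        self.fd_open = True      # stands in for the OS-level descriptor

    def makefile(self):
        self._io_refs += 1
        return Stream(self)

    def _real_close(self):
        self.fd_open = False

    def _drop_io_ref(self):
        self._io_refs -= 1
        if self._closed and self._io_refs == 0:
            self._real_close()

    def close(self):
        self._closed = True
        if self._io_refs == 0:
            self._real_close()

    def __del__(self):
        # Falling out of scope behaves like an explicit close().
        self.close()


class Stream:
    """Hypothetical file-like wrapper that keeps its socket alive."""

    def __init__(self, sock):
        self._sock = sock
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self._sock._drop_io_ref()

    def __del__(self):
        self.close()


# The httplib pattern: make a file, close the socket, hand the file on.
s = Sock()
f = s.makefile()
s.close()
open_while_file_lives = s.fd_open
f.close()
open_at_the_end = s.fd_open
```

The stream holds a reference to the socket but not the other way
around, so there is no cycle for the backup GC to untangle.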

And, ref your other comments, why not call SocketIO "TCPStream"?  It
would make things much clearer.

Also, is it too late to rename socket.socket to "socket.Socket"?
There are only a handful of references to "socket.socket" outside of
the socket and ssl modules.

Bill



From rhamph at gmail.com  Mon Oct 29 20:49:53 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Oct 2007 13:49:53 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1ImZR5-000251-4n@garm.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
Message-ID: <aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>

On 10/29/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I think several postings have explained better than I have on why __cmp__ is still very valuable.  (See below.)
>
> Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here:
>   http://www.dwheeler.com/misc/pep-cmp.txt
> Please let me know if it makes sense.  Thanks.
>
> Greg Ewing stated "Why not provide a __richcmp__ method that directly connects
> with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age."
> It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!).  But maybe I just don't understand the comment, so explanation welcome.

I believe the intent was for __richcmp__ to take an argument
indicating what sort of comparison is to be done (as tp_richcompare
does in C.)  ie, you'd write code like this:

    def __richcmp__(self, other, op):
        if not isinstance(other, MyType):
            return NotImplemented
        return richcmp(self.foo, other.foo, op)

Short-circuiting of equality checks (due to identity or interning)
would work right.  Likewise, there's no odd behaviour with
comparable-but-unorderable types.
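
Since neither __richcmp__ nor a richcmp() builtin exists, here is a
sketch of how the idea could be emulated today. The operation codes and
the richcmp() helper are assumptions of mine, loosely modelled on the
Py_LT, Py_EQ, ... constants used with tp_richcompare in C.

```python
import operator

# Hypothetical operation codes, loosely modelled on C's Py_LT, Py_EQ, ...
LT, LE, EQ, NE, GT, GE = range(6)
_OPS = {LT: operator.lt, LE: operator.le, EQ: operator.eq,
        NE: operator.ne, GT: operator.gt, GE: operator.ge}


def richcmp(a, b, op):
    """Apply the comparison selected by op to a and b."""
    return _OPS[op](a, b)


class MyType:
    def __init__(self, foo):
        self.foo = foo

    def __richcmp__(self, other, op):
        if not isinstance(other, MyType):
            return NotImplemented
        return richcmp(self.foo, other.foo, op)

    # Python has no __richcmp__ hook, so route the real operators to it.
    def __eq__(self, other): return self.__richcmp__(other, EQ)
    def __ne__(self, other): return self.__richcmp__(other, NE)
    def __lt__(self, other): return self.__richcmp__(other, LT)
    def __le__(self, other): return self.__richcmp__(other, LE)
    def __gt__(self, other): return self.__richcmp__(other, GT)
    def __ge__(self, other): return self.__richcmp__(other, GE)


smaller = MyType(1) < MyType(2)
same = MyType(3) == MyType(3)
mixed = MyType(3) == 3           # both sides decline, so not equal
```

Each operator is forwarded exactly once to the wrapped value, so any
short-circuiting the wrapped type does for equality is preserved.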

It's not clear to me how many distinct operations you'd need though,
or how acceptable reflections would be.  Would only two operations,
equality and ordering, be sufficient?  Just what are the non-symmetric
use cases the current design caters to?


>
>
>
> --- David A. Wheeler
>
> ========================================
>
> Aahz:
> >From my perspective, the real use case for cmp() is when you want to do
> >a three-way comparison of a "large" object (for example, a Decimal
> >instance). You can store the result of cmp() and then do a separate
> >three-way branch.
>
> and reply to the note "I'm having troubles coming up with things where
> the *basic* operator is really a cmp-like function.", there were two replies..
>
>
> Guido van Rossum:
> >Here's one. When implementing the '<' operator on lists or tuples, you
> > really want to call the 'cmp' operator on the individual items,
> > because otherwise (if all you have is == and <) the algorithm becomes
> > something like "compare for equality until you've found the first pair
> > of items that are unequal; then compare those items again using < to
> > decide the final outcome". If you don't believe this, try to implement
> > this operation using only == or < without comparing any two items more
> > than once.
>
> and
>
> Greg Ewing:
> > Think of things like comparing a tuple. You need to work your
> > way along and recursively compare the elements. The decision
> > about when to stop always involves ==, whatever comparison
> > you're trying to do. So if e.g. you're doing <, then you have
> > to test each element first for <, and if that's false, test
> > it for ==. If the element is itself a tuple, it's doing this
> > on its elements too, etc., and things get very inefficient.
> >
> > If you have a single cmp operation that you can apply to the
> > elements, you only need to do it once for each element and it
> > gives you all the information you need.


-- 
Adam Olsen, aka Rhamphoryncus

From janssen at parc.com  Mon Oct 29 20:46:14 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 29 Oct 2007 12:46:14 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
Message-ID: <07Oct29.114621pst."57996"@synergy1.parc.xerox.com>

> Actually, I'm still up for tweaks to the I/O model if it solves a real
> problem, as long as most of the high-level APIs stay the same (there
> simply is too much code that expects those to behave a certain way).
>
> I don't quite understand what you mean by inverted though.

I'm actually thinking more in terms of avoiding future problems.  I
thought we'd discussed this a few months ago, but here it is again:

I'd break up the BaseIO class into a small set of base classes, so that
we can be more explicit about what a particular kind of I/O channel is
or is not:

(Please excuse typos, I'm generating this off-the-cuff -- does
@abstract actually exist?)

-------------------------------------------------------------

class IOStream:

   @abstract
   def close(self):

   @property
   def closed(self):

class InputIOStream (IOStream):

   @abstract
   def read(self, buffer=None, nbytes=None):

class OutputIOStream (IOStream):

   @abstract
   def write(self, bytes):

   @abstract
   def flush(self):

class SeekableIOStream (IOStream):

   @abstract
   def tell(self):

   @abstract
   def seek(self):

   @abstract
   def truncate(self):

class SystemIOStream (IOStream):

   @property
   def fileno(self):

   @property
   def isatty (self):

class TextInputStream (InputIOStream):

   @abstract
   def readline(self):

   @abstract
   def readlines(self):

class TextOutputStream (OutputIOStream):

   @abstract
   def writelines(self, lines):

class FileStream (SystemIOStream, SeekableIOStream):

   @property
   name

   @property
   mode

# note that open() would return FileStream mixed with one or both of
# {Text}InputStream and {Text}OutputStream, depending on the "mode".

class StringIO (SeekableIOStream):

# again, mixed with IO modes, depending on "mode".

-------------------------------------------------------------

I think of this as inverted, because it puts primitives like "read"
and "write" at the lowest layers, not above things like "fileno" or
"truncate", which are very specialized and should only apply to a
subset of I/O channels.

I realize that there are some practical problems with this; such as
making _fileio.FileIO inherit from (multiple) Python base classes.

Bill




From steven.bethard at gmail.com  Mon Oct 29 21:04:56 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 29 Oct 2007 14:04:56 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1ImZR5-000251-4n@garm.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
Message-ID: <d11dcfba0710291304qcf5b757m64048d1fc044c42a@mail.gmail.com>

On 10/29/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> I think several postings have explained better than I have on why __cmp__ is still very valuable.  (See below.)
>
> Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here:
>   http://www.dwheeler.com/misc/pep-cmp.txt
> Please let me know if it makes sense.  Thanks.

I think the PEP's a little misleading in that it makes it sound like
defining __lt__, __gt__, etc. is inefficient.  I think you want to be
explicit about where __lt__, __gt__ are efficient, and where __cmp__
is efficient.  For example::

* __lt__ is more efficient for sorting (given the current implementation)
* __cmp__ is more efficient for comparing sequences like tuples, where
you always need to check for equality first, and you don't want to
have to do an == check followed by a < check if you can do them both
at the same time. (This is basically the same argument as for Decimal
-- why do two comparisons when you can do one?)
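
The first bullet is easy to check: sorting works on a type that defines
__lt__ and nothing else. Version is a hypothetical class for the
example.

```python
class Version:
    """Hypothetical type that defines __lt__ and nothing else."""

    lt_calls = 0

    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        Version.lt_calls += 1
        return self.number < other.number


versions = [Version(n) for n in (3, 1, 2)]
ordered = [v.number for v in sorted(versions)]
```

No equality or three-way comparison is needed for the sort, which is
why a cmp-style method buys nothing in that case.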

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Mon Oct 29 21:26:16 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 13:26:16 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1ImZR5-000251-4n@garm.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
Message-ID: <ca471dc20710291326j63660149ia5ef26d4a703d167@mail.gmail.com>

I'm a bit too busy to look into this right now; I hope one or two more
rounds of feedback on the PEP will get it into a state where I can
review it more easily. Having a patch to go with it would be immensely
helpful as well (in fact I'd say that without a patch it's unlikely to
happen).

--Guido

2007/10/29, David A. Wheeler <dwheeler at dwheeler.com>:
> I think several postings have explained better than I have on why __cmp__ is still very valuable.  (See below.)
>
> Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here:
>   http://www.dwheeler.com/misc/pep-cmp.txt
> Please let me know if it makes sense.  Thanks.
>
> Greg Ewing stated "Why not provide a __richcmp__ method that directly connects
> with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age."
> It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!).  But maybe I just don't understand the comment, so explanation welcome.
>
>
>
> --- David A. Wheeler
>
> ========================================
>
> Aahz:
> >From my perspective, the real use case for cmp() is when you want to do
> >a three-way comparison of a "large" object (for example, a Decimal
> >instance). You can store the result of cmp() and then do a separate
> >three-way branch.
>
> and reply to the note "I'm having troubles coming up with things where
> the *basic* operator is really a cmp-like function.", there were two replies..
>
>
> Guido van Rossum:
> >Here's one. When implementing the '<' operator on lists or tuples, you
> > really want to call the 'cmp' operator on the individual items,
> > because otherwise (if all you have is == and <) the algorithm becomes
> > something like "compare for equality until you've found the first pair
> > of items that are unequal; then compare those items again using < to
> > decide the final outcome". If you don't believe this, try to implement
> > this operation using only == or < without comparing any two items more
> > than once.
>
> and
>
> Greg Ewing:
> > Think of things like comparing a tuple. You need to work your
> > way along and recursively compare the elements. The decision
> > about when to stop always involves ==, whatever comparison
> > you're trying to do. So if e.g. you're doing <, then you have
> > to test each element first for <, and if that's false, test
> > it for ==. If the element is itself a tuple, it's doing this
> > on its elements too, etc., and things get very inefficient.
> >
> > If you have a single cmp operation that you can apply to the
> > elements, you only need to do it once for each element and it
> > gives you all the information you need.
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 21:32:14 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 13:32:14 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <-5143302779702104898@unknownmsgid>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<-5143302779702104898@unknownmsgid>
Message-ID: <ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com>

2007/10/29, Bill Janssen <janssen at parc.com>:
> > The SocketCloser class wasn't written with
> > the SSL use case in mind.
>
> I don't think it's just SSL.  The problem is that it explicitly counts
> calls to "close()".  So if you let the GC sweep up after you, that
> close() just doesn't get called, the circular refs persist, and the
> resource doesn't get collected till the backup GC runs (if it does).
> Waiting for that to happen, you might run out of a scarce system
> resource (file descriptors).  A nasty timing-dependent bug, there.

Ouch. Unfortunately adding a __del__() method that calls close()
won't really help, as the cyclic GC refuses to do anything with
objects having a __del__. This needs more thinking than I have time
for right now, but I agree we need to fix it.

> Hmmm, does real_close even get called in that case?  In the C module,
> perhaps?

The C module will certainly close the fd when the object goes away.
The question is, is that soon enough.

> > If you wanted to reuse the _socket after closing
> > the SSL instance, you'd have to wrap it in a fresh socket instance.
> >
> > Does that make sense? (Please do note the difference throughout
> > between _socket and socket, the former being defined in socketmodule.c
> > and the latter in socket.py.)
>
> That's what I do with SSLSocket, pretty much.  I worry that doing it
> with socket.socket might break a lot of non-TCP code, though.  And
> perhaps it's overkill.
>
> Why not move the count of how many SocketIO instances are pointing to
> it into the socket.socket class again, as it was in 2.x?  I don't
> think you're gaining anything with the circular data structure of
> SocketCloser.  Add a "_closed" property, and "__del__" method to
> socket.socket (which calls "close()").  Remove SocketCloser.  You're
> finished, and there's one less class to maintain.

I'll look into this later.

> And, ref your other comments, why not call SocketIO "TCPStream"?  It
> would make things much clearer.

Good idea; SocketIO made more sense when it was part of io.py.

> Also, is it too late to rename socket.socket to "socket.Socket"?
> There are only a handful of references to "socket.socket" outside of
> the socket and ssl modules.

Really? AFAIK everyone who opens a socket calls it.

I'd be okay with calling the class Socket and having a factory
function named socket though.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Oct 29 21:38:19 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 13:38:19 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <3470699275677683430@unknownmsgid>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<3470699275677683430@unknownmsgid>
Message-ID: <ca471dc20710291338q40c186dfkaf75ebc4fe005638@mail.gmail.com>

2007/10/29, Bill Janssen <janssen at parc.com>:
> > Actually, I'm still up for tweaks to the I/O model if it solves a real
> > problem, as long as most of the high-level APIs stay the same (there
> > simply is too much code that expects those to behave a certain way).
> >
> > I don't quite understand what you mean by inverted though.
>
> I'm actually thinking more in terms of avoiding future problems.

Can you remind me of what future problems again?

> I thought we'd discussed this a few months ago, but here it is again:
>
> I'd break up the BaseIO class into a small set of base classes, so that
> we can be more explicit about what a particular kind of I/O channel is
> or is not:

I see, static type checks in favor of dynamic behavior checks -- e.g.
isinstance(s, SeekableIOStream) rather than s.seekable(). If that's
all, I guess I already expressed earlier I don't really like that --
in practice I think the dynamic checks are more flexible, and the
class hierarchy you're proposing is hard to implement in C (where
unfortunately I'm restricted to single inheritance). E.g. depending on
how a program is invoked, sys.stdin may be seekable or it may not be.
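
A minimal sketch of the dynamic check described above, using the
seekable() method as it exists in the io module of current Python 3.
PipeLike and rewind_if_possible are hypothetical names chosen for
illustration; they are not part of any proposal in this thread.

```python
import io


class PipeLike(io.RawIOBase):
    """Hypothetical non-seekable stream, standing in for a pipe or terminal."""

    def readable(self):
        return True

    def readinto(self, buffer):
        return 0  # always at end of file


def rewind_if_possible(stream):
    """Seek back to the start when the stream allows it; report whether it did."""
    if stream.seekable():      # asked of the object at run time, not of its class
        stream.seek(0)
        return True
    return False
```

The same function accepts any stream object, and the answer can differ
between two objects of the same class, which is the case a fixed class
hierarchy has trouble expressing.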

--Guido

> (Please excuse typos, I'm generating this off-the-cuff -- does
> @abstract actually exist?)
>
> -------------------------------------------------------------
>
> class IOStream:
>
>    @abstract
>    def close(self):
>
>    @property
>    def closed(self):
>
> class InputIOStream (IOStream):
>
>    @abstract
>    def read(self, buffer=None, nbytes=None):
>
> class OutputIOStream (IOStream):
>
>    @abstract
>    def write(self, bytes):
>
>    @abstract
>    def flush(self):
>
> class SeekableIOStream (IOStream):
>
>    @abstract
>    def tell(self):
>
>    @abstract
>    def seek(self):
>
>    @abstract
>    def truncate(self):
>
> class SystemIOStream (IOStream):
>
>    @property
>    def fileno(self):
>
>    @property
>    def isatty (self):
>
> class TextInputStream (InputIOStream):
>
>    @abstract
>    def readline(self):
>
>    @abstract
>    def readlines(self):
>
> class TextOutputStream (InputIOStream):
>
>    @abstract
>    def readline(self):
>
>    @abstract
>    def readlines(self):
>
> class FileStream (SystemIOStream, SeekableIOStream):
>
>    @property
>    name
>
>    @property
>    mode
>
> # note that open() would return FileStream mixed with one or both of
> # {Text}InputStream and {Text}OutputStream, depending on the "mode".
>
> class StringIO (SeekableIOStream):
>
> # again, mixed with IO modes, depending on "mode".
>
> -------------------------------------------------------------
>
> I think of this as inverted, because it puts primitives like "read"
> and "write" at the lowest layers, not above things like "fileno" or
> "truncate", which are very specialized and should only apply to a
> subset of I/O channels.
>
> I realize that there are some practical problems with this; such as
> making _fileio.FileIO inherit from (multiple) Python base classes.
>
> Bill
>
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at krypto.org  Mon Oct 29 23:26:45 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 29 Oct 2007 15:26:45 -0700
Subject: [Python-3000] 3K bytes I/O?
In-Reply-To: <ca471dc20710261226r5eea7cdax70aca984e1529beb@mail.gmail.com>
References: <-912240280709553237@unknownmsgid>
	<ca471dc20710261226r5eea7cdax70aca984e1529beb@mail.gmail.com>
Message-ID: <52dc1c820710291526t3b176560o9745dae1b1198dc1@mail.gmail.com>

And for non-unicode inputs the code should use the PEP 3118 buffer API
rather than PyBytes_ or PyString_ or whatnot.

On 10/26/07, Guido van Rossum <guido at python.org> wrote:
>
> 2007/10/26, Bill Janssen <janssen at parc.com>:
> > I'm looking at the Py3K SSL code, and have a question:
> >
> > What's the upshot of the bytes/string decisions in the C world?  Is
> > PyString_* now all about immutable bytes, and PyUnicode_* about
> > strings?  There still seem to be a lot of encode/decode methods in
> > stringobject.h, operations which I'd expect to be in unicodeobject.h.
>
> I think the PyString encode/decode APIs should all go; use the
> corresponding PyUnicode ones.
>
> I recommend that you write your code to assume PyBytes for
> encoded/binary data, and PyUnicode for text; at some point we'll
> substitute PyString for most cases where PyBytes is currently used:
> that will happen once PyString is called bytes at the Python level,
> and PyBytes will be called buffer. But that's still a while off.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Mon Oct 29 23:48:02 2007
From: janssen at parc.com (Bill Janssen)
Date: Mon, 29 Oct 2007 15:48:02 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<-5143302779702104898@unknownmsgid>
	<ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com>
Message-ID: <07Oct29.144811pst."57996"@synergy1.parc.xerox.com>

> Really? AFAIK everyone who opens a socket calls it.

Sorry, I meant only a handful (10?) in the standard library.

> I'd be okay with calling the class Socket and having a factory
> function named socket though.

Ah, good idea.

Bill

From greg.ewing at canterbury.ac.nz  Tue Oct 30 00:30:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 12:30:36 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <E1ImZR5-000251-4n@garm.runbox.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz> <E1ImZR5-000251-4n@garm.runbox.com>
Message-ID: <47266D1C.30302@canterbury.ac.nz>

David A. Wheeler wrote:
> Greg Ewing stated "Why not provide a __richcmp__ method that directly connects
> with the corresponding type slot?

> It _seems_ to me that this is the same as "__cmp__",

No, it's not -- a __richcmp__ method would take an extra
argument specifying which of the six comparison operations
to perform, and return a boolean instead of -1, 0, 1.
Giving it the same name as the old __cmp__ would be
confusing, I think.

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Oct 30 00:32:35 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 12:32:35 +1300
Subject: [Python-3000] __bool__ in 2.6?
In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com>
	<bbaeab100710281423k77a54ca2ie5e059f6d0f71dbc@mail.gmail.com>
	<ca471dc20710291110i30ca8749w50d0e571e65b6d32@mail.gmail.com>
	<8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com>
Message-ID: <47266D93.7050407@canterbury.ac.nz>

James Thiele wrote:
> 3.x __bool__ first, then __nonzero__

Does 3.x need __nonzero__ at all?

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Oct 30 00:58:03 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 12:58:03 +1300
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
Message-ID: <4726738B.2080106@canterbury.ac.nz>

I wrote:

 > Seems to me that a socket should already *be* a file,
 > so it shouldn't need a makefile() method and you
 > shouldn't have to mess around with filenos.

Guido van Rossum wrote:

> That model fits TCP/IP streams just fine, but doesn't work so well for
> UDP and other odd socket types.

No, but I think that a socket should have read() and
write() methods that work if it happens to be a socket
of an appropriate kind. Unix lets you use read and write
as synonyms for send and recv on stream sockets, and
it's surprising that Python doesn't do the same.

At the very least, it should be possible to wrap
any of the higher-level I/O stack objects around a
stream socket directly.

> The real issue seems to be file descriptor GC. Maybe we haven't
> written down the rules clearly enough for when the fd is supposed to
> be GC'ed

I don't see what's so difficult about this. Each file
descriptor should be owned by exactly one object. If
two objects need to share a fd, then you dup() it so
that each one has its own fd. When the object is
close()d or GCed, it closes its fd.

However, I don't see that it should be necessary for
objects to share fds in the first place. Buffering
layers should wrap directly around the object
being buffered, whether a file or socket or something
else. Then whether the socket has a fd or not is
an implementation detail of the socket object, so
there's no problem on Windows.
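
The one-owner-per-descriptor rule described above could be sketched
roughly as follows. This assumes a platform where os.dup() works for the
descriptor in question; FdOwner and share are hypothetical names used
only for illustration.

```python
import os


class FdOwner:
    """Hypothetical wrapper: exactly one object owns each descriptor."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        return self._fd

    def close(self):
        # Closing twice is harmless; the descriptor is released only once.
        if self._fd is not None:
            os.close(self._fd)
            self._fd = None

    def __del__(self):
        self.close()


def share(owner):
    """Give a second object its own descriptor for the same open file."""
    return FdOwner(os.dup(owner.fileno()))
```

Closing the first owner leaves the duplicate usable, because each object
closes only the descriptor it owns.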

Bill Janssen wrote:

> Back to your initial mail (which is
> more relevant than Greg Ewing's snipe!):

What snipe? I'm trying to make a constructive suggestion.

> then in some
> cases *closes* the socket (thereby reasonably rendering the socket
> *dead*), *then* returns the "file" to the caller as part of the
> response.

I don't understand that. What good can returning a *closed* file
object possibly do anyone?

--
Greg

From guido at python.org  Tue Oct 30 01:05:41 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 17:05:41 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <47266D1C.30302@canterbury.ac.nz>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com> <47266D1C.30302@canterbury.ac.nz>
Message-ID: <ca471dc20710291705u4b21ee1obd376debaf77ae11@mail.gmail.com>

2007/10/29, Greg Ewing <greg.ewing at canterbury.ac.nz>:
> David A. Wheeler wrote:
> > Greg Ewing stated "Why not provide a __richcmp__ method that
> > directly connects with the corresponding type slot?
>
> > It _seems_ to me that this is the same as "__cmp__",
>
> No, it's not -- a __richcmp__ method would take an extra
> argument specifying which of the six comparison operations
> to perform, and return a boolean instead of -1, 0, 1.

Eh? Shouldn't it return True, False or NotImplemented if that's the interface?

> Giving it the same name as the old __cmp__ would be
> confusing, I think.

For sure.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Oct 30 01:08:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 13:08:39 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz> <E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
Message-ID: <47267607.2000806@canterbury.ac.nz>

Adam Olsen wrote:
> It's not clear to me how many distinct operations you'd need though,
> or how acceptable reflections would be.

My intention was just to directly expose the tp_richcmp
slot, so there would be six.

To make things easier in the common case, there could
perhaps be a utility function that would take a comparison
operation code and a -1, 0, 1 value and return the
appropriate boolean. Then a __richcmp__ method could be
written very similarly to the way a __cmp__ method is
now. It might even be possible for 2to3 to convert
__cmp__ methods to __richcmp__ methods automatically.
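
A rough sketch of such a utility function, under stated assumptions: the
operation codes are numbered like the C-level Py_LT ... Py_GE constants,
and __richcmp__ is the hypothetical method under discussion, which
Python does not actually call. richcmp_from_cmp and Version are names
invented for this example.

```python
# Operation codes, numbered like the C constants Py_LT, Py_LE, Py_EQ,
# Py_NE, Py_GT, Py_GE.
LT, LE, EQ, NE, GT, GE = range(6)

_TESTS = {
    LT: lambda c: c < 0,
    LE: lambda c: c <= 0,
    EQ: lambda c: c == 0,
    NE: lambda c: c != 0,
    GT: lambda c: c > 0,
    GE: lambda c: c >= 0,
}


def richcmp_from_cmp(op, c):
    """Turn a -1/0/1 three-way result into the boolean for operation `op`."""
    return _TESTS[op](c)


class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _cmp(self, other):
        # Old-style three-way comparison on the fields.
        a, b = (self.major, self.minor), (other.major, other.minor)
        return (a > b) - (a < b)

    def __richcmp__(self, other, op):
        # Hypothetical hook: written almost exactly like a __cmp__ method.
        return richcmp_from_cmp(op, self._cmp(other))
```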

--
Greg

From guido at python.org  Tue Oct 30 01:15:27 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 17:15:27 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <4726738B.2080106@canterbury.ac.nz>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
Message-ID: <ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>

2007/10/29, Greg Ewing <greg.ewing at canterbury.ac.nz>:
> I wrote:
>
>  > Seems to me that a socket should already *be* a file,
>  > so it shouldn't need a makefile() method and you
>  > shouldn't have to mess around with filenos.
>
> Guido van Rossum wrote:
>
> > That model fits TCP/IP streams just fine, but doesn't work so well for
> > UDP and other odd socket types.
>
> No, but I think that a socket should have read() and
> write() methods that work if it happens to be a socket
> of an appropriate kind. Unix lets you use read and write
> as synonyms for send and recv on stream sockets, and
> it's surprising that Python doesn't do the same.

That's because I don't find the synonyms a good idea.

> At the very least, it should be possible to wrap
> any of the higher-level I/O stack objects around a
> stream socket directly.

Why? What problem does this solve?

> > The real issue seems to be file descriptor GC. Maybe we haven't
> > written down the rules clearly enough for when the fd is supposed to
> > be GC'ed
>
> I don't see what's so difficult about this. Each file
> descriptor should be owned by exactly one object. If
> two objects need to share a fd, then you dup() it so
> that each one has its own fd. When the object is
> close()d or GCed, it closes its fd.

On Windows you can't dup() a fd.

> However, I don't see that it should be necessary for
> objects to share fds in the first place. Buffering
> layers should wrap directly around the object
> being buffered, whether a file or socket or something
> else. Then whether the socket has a fd or not is
> an implementation detail of the socket object, so
> there's no problem on Windows.

There's a tension though between using GC and explicit closing. A
fairly nice model would be that the lowest-level object "owns" the fd
and is the one to close it when it is GC'ed. However for various
reasons we don't want to rely on GC to close fds, since that may delay
closing in Jython and when there happens to be an innocent reference
keeping the lowest-level socket object alive (e.g. someone still has
it in their stack frame or traceback). So we end up having to
implement a second reference counting scheme on top of close() calls.
Which is what we did. But now just dropping the last reference to an
object doesn't call close(), so explicit closes suddenly become
mandatory instead of recommended good practice. Adding __del__ as an
alias for close might help, except this makes circular references a
primary sin (since the cycle GC doesn't like calling __del__). I guess
there really is no way around this solution though, and we'll just
have to make extra sure not to create cycles during normal usage
patterns, or use weak references in those cases where we can't avoid
them.

I think this is the way to go, together with changing the Socket class
from subclassing _socket to wrapping one.
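
A minimal sketch of that arrangement, not the actual socket.py code:
RawSocket stands in for the C-level _socket object, and all class and
attribute names below are assumptions made for illustration. The wrapper
counts open stream objects instead of pointing back at them, so there is
no cycle, and __del__ can serve as a fallback for close().

```python
class RawSocket:
    """Stand-in for the C-level _socket object that owns the descriptor."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Socket:
    """Wraps (rather than subclasses) the raw socket and counts its users."""

    def __init__(self, raw):
        self._raw = raw
        self._io_refs = 0      # SocketIO objects that are still open
        self._closed = False   # has close() been called on this wrapper?

    def makefile(self):
        self._io_refs += 1
        return SocketIO(self)

    def close(self):
        self._closed = True
        self._maybe_real_close()

    def _io_closed(self):
        self._io_refs -= 1
        self._maybe_real_close()

    def _maybe_real_close(self):
        # Release the descriptor only when nobody needs it any more.
        if self._closed and self._io_refs == 0:
            self._raw.close()

    def __del__(self):
        self.close()           # dropping the last reference acts like close()


class SocketIO:
    """Stream view of a Socket; it references the Socket, not the reverse."""

    def __init__(self, sock):
        self._sock = sock
        self._closed = False

    def close(self):
        if not self._closed:
            self._closed = True
            self._sock._io_closed()

    def __del__(self):
        self.close()
```

With this shape, closing the socket while a stream is still open defers
the real close until the stream is closed too, and on a refcounting
implementation simply dropping both objects has the same effect.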

--Guido

> Bill Janssen wrote:
>
> > Back to your initial mail (which is
> > more relevant than Greg Ewing's snipe!):
>
> What snipe? I'm trying to make a constructive suggestion.
>
> > then in some
> > cases *closes* the socket (thereby reasonably rendering the socket
> > *dead*), *then* returns the "file" to the caller as part of the
> > response.
>
> I don't understand that. What good can returning a *closed* file
> object possibly do anyone?
>
> --
> Greg


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Oct 30 01:48:53 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 13:48:53 +1300
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
	<ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>
Message-ID: <47267F75.7040404@canterbury.ac.nz>

Guido van Rossum wrote:

> That's because I don't find the synonyms a good idea.

Even if it means that stream sockets then have the
same interface as all other stream-like objects in
the I/O system, so buffering layers can be used on
them, etc.? That seems like a rather good reason to
me.

If you want to be pedantic about not having synonyms,
then fix send() and recv() so that they only work
on *non*-stream sockets, or have different classes
for stream and non-stream sockets.

In other words, to my mind, for stream sockets it's
send and recv that are synonyms for read and write,
not the other way around.

> On Windows you can't dup() a fd.

Oh, blarg. Forget that part, then.

But I still think it shouldn't be necessary to share
fds between different objects in the first place.

This is the problem that would be solved by making
sockets have an interface that is directly usable by
higher layers of the I/O system. There would be no
need to reach down below the socket object and grab
its fd, so the socket would have complete ownership
of it, and it would get closed when the socket
object eventually went away. This would happen at
the C level, so cycles and __del__ methods wouldn't
be a serious problem.

--
Greg

From guido at python.org  Tue Oct 30 01:58:56 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 29 Oct 2007 17:58:56 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <47267F75.7040404@canterbury.ac.nz>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
	<ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>
	<47267F75.7040404@canterbury.ac.nz>
Message-ID: <ca471dc20710291758h529dc8b8p12cd01b0bbd95c29@mail.gmail.com>

2007/10/29, Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Guido van Rossum wrote:
>
> > That's because I don't find the synonyms a good idea.
>
> Even if it means that stream sockets then have the
> same interface as all other stream-like objects in
> the I/O system, so buffering layers can be used on
> them, etc.? That seems like a rather good reason to
> me.
>
> If you want to be pedantic about not having synonyms,
> then fix send() and recv() so that they only work
> on *non*-stream sockets, or have different classes
> for stream and non-stream sockets.
>
> In other words, to my mind, for stream sockets it's
> send and recv that are synonyms for read and write,
> not the other way around.
>
> > On Windows you can't dup() a fd.
>
> Oh, blarg. Forget that part, then.
>
> But I still think it shouldn't be necessary to share
> fds between different objects in the first place.
>
> This is the problem that would be solved by making
> sockets have an interface that is directly usable by
> higher layers of the I/O system. There would be no
> need to reach down below the socket object and grab
> its fd, so the socket would have complete ownership
> of it, and it would get closed when the socket
> object eventually went away. This would happen at
> the C level, so cycles and __del__ methods wouldn't
> be a serious problem.

Having the SocketIO wrapper works just as well. I agree we need some
refactoring to deal with the ownership issue better, but having read()
and write() methods on the _socket object is not the solution.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Oct 30 02:07:18 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 30 Oct 2007 14:07:18 +1300
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291758h529dc8b8p12cd01b0bbd95c29@mail.gmail.com>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
	<ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>
	<47267F75.7040404@canterbury.ac.nz>
	<ca471dc20710291758h529dc8b8p12cd01b0bbd95c29@mail.gmail.com>
Message-ID: <472683C6.9090405@canterbury.ac.nz>

Guido van Rossum wrote:
> having read()
> and write() methods on the _socket object is not the solution.

It's not a necessary part of the solution, I agree.
I just don't see what purpose is served by requiring
an extra layer of wrapper between a socket and the
other I/O layers. That's not a necessary part of the
solution either.

--
Greg

From rhamph at gmail.com  Tue Oct 30 02:36:14 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 29 Oct 2007 19:36:14 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <47267607.2000806@canterbury.ac.nz>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
Message-ID: <aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>

On 10/29/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Adam Olsen wrote:
> > It's not clear to me how many distinct operations you'd need though,
> > or how acceptable reflections would be.
>
> My intention was just to directly expose the tp_richcmp
> slot, so there would be six.
>
> To make things easier in the common case, there could
> perhaps be a utility function that would take a comparison
> operation code and a -1, 0, 1 value and return the
> appropriate boolean. Then a __richcmp__ method could be
> written very similarly to the way a __cmp__ method is
> now. It might even be possible for 2to3 to convert
> __cmp__ methods to __richcmp__ methods automatically.

It'd be simpler still if we only had __cmp__ and __eq__.  I just don't
understand the use cases where that's not sufficient.

Hrm.  I guess set's subset checking requires more relationships than
__cmp__ provides.  Abandoning that feature probably isn't an option,
so never mind me.

(Although, if we really wanted we could use -2/+2 to mean
subset/superset, while -1/+1 mean smaller/larger.)

-- 
Adam Olsen, aka Rhamphoryncus

From jyasskin at gmail.com  Tue Oct 30 06:19:43 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Mon, 29 Oct 2007 22:19:43 -0700
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710291304qcf5b757m64048d1fc044c42a@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<d11dcfba0710291304qcf5b757m64048d1fc044c42a@mail.gmail.com>
Message-ID: <5d44f72f0710292219h11a28c2dk4c7540bc5bd824e4@mail.gmail.com>

On 10/29/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 10/29/07, David A. Wheeler <dwheeler at dwheeler.com> wrote:
> > I think several postings have explained better than I have on why __cmp__ is still very valuable.  (See below.)
> >
> > Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here:
> >   http://www.dwheeler.com/misc/pep-cmp.txt
> > Please let me know if it makes sense.  Thanks.
>
> I think the PEP's a little misleading in that it makes it sound like
> defining __lt__, __gt__, etc. is inefficient.  I think you want to be
> explicit about where __lt__, __gt__ are efficient, and where __cmp__
> is efficient.  For example::
>
> * __lt__ is more efficient for sorting (given the current implementation)
> * __cmp__ is more efficient for comparing sequences like tuples, where
> you always need to check for equality first, and you don't want to
> have to do an == check followed by a < check if you can do them both
> at the same time. (This is basically the same argument as for Decimal
> -- why do two comparisons when you can do one?)

When implementing a large, totally ordered object (>=2 fields), both
__lt__ and __cmp__ should probably be implemented by calling __cmp__
on the fields. If you decide to implement __lt__ by letting it forward
to __cmp__, the cutoff might be at 3 fields.
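
The sequence-comparison argument quoted above can be made concrete with
a small counting experiment. This is only a sketch: Item, its cmp()
method and the two helper functions are hypothetical, and cmp() stands
in for a type that can answer a three-way comparison in a single pass.

```python
class Item:
    """Wraps a value and counts how often it takes part in a comparison."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Item.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Item.calls += 1
        return self.value < other.value

    def cmp(self, other):
        # Hypothetical three-way comparison: one call yields -1, 0 or 1.
        Item.calls += 1
        return (self.value > other.value) - (self.value < other.value)


def lt_using_eq_and_lt(xs, ys):
    """Sequence '<' built from == and <; the deciding pair is compared twice."""
    for x, y in zip(xs, ys):
        if not (x == y):
            return x < y
    return len(xs) < len(ys)


def lt_using_cmp(xs, ys):
    """Sequence '<' built from a three-way comparison; each pair compared once."""
    for x, y in zip(xs, ys):
        c = x.cmp(y)
        if c != 0:
            return c < 0
    return len(xs) < len(ys)
```

For two three-element sequences that differ only in the last item, the
first version makes four comparison calls and the second makes three;
the gap grows when the items are themselves nested sequences.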

Partial orders (what the PEP calls "asymmetric classes") cannot, of
course, be implemented with __cmp__ and should have it return
NotImplemented. Well, if we wanted to diverge from most other
languages, we could extend __cmp__ to let it return a distinguished
"Unordered" value, which returns false on all comparisons with 0. This
is similar to Fortress's approach, which returns one of 4 values from
a PartialOrder's CMP operator: EqualTo, LessThan, GreaterThan, and
Unordered. Haskell has only a total ordering class in the core
libraries, while Scala has a PartiallyOrdered trait that returns None
from its compare method for unordered values.

For Python, I think I favor reviving __cmp__ for totally ordered
types, and asking that partially ordered ones return NotImplemented
from it explicitly.
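
For comparison, the "Unordered" alternative mentioned above might look
like this for sets ordered by inclusion. It is a sketch only:
partial_cmp is a hypothetical name, and None plays the role of the
distinguished unordered value, loosely following the Scala behaviour
described above.

```python
def partial_cmp(a, b):
    """Three-way comparison for sets ordered by inclusion; None when unordered."""
    if a == b:
        return 0
    if a < b:       # proper subset
        return -1
    if a > b:       # proper superset
        return 1
    return None     # neither contains the other
```

A caller has to handle the fourth outcome explicitly, which is the cost
of extending a three-way result to partial orders.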

Jeffrey

From janssen at parc.com  Tue Oct 30 17:20:11 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Oct 2007 09:20:11 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <4726738B.2080106@canterbury.ac.nz> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
Message-ID: <07Oct30.082017pst."57996"@synergy1.parc.xerox.com>

> Bill Janssen wrote:
> 
> > Back to your initial mail (which is
> > more relevant than Greg Ewing's snipe!):

Actually, Bill Janssen didn't write that, but did write this:

> > then in some
> > cases *closes* the socket (thereby reasonably rendering the socket
> > *dead*), *then* returns the "file" to the caller as part of the
> > response.
> 
> I don't understand that. What good can returning a *closed* file
> object possibly do anyone?

Indeed.  The httplib code is relying on the fact that close(), under
certain circumstances, has no effect.  It's just that the
circumstances have changed, in Python 3K.  I think that the close() in
HTTPConnection should be removed.

Bill

From guido at python.org  Tue Oct 30 17:31:07 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Oct 2007 09:31:07 -0700
Subject: [Python-3000] socket GC worries
In-Reply-To: <274486914162998601@unknownmsgid>
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz> <274486914162998601@unknownmsgid>
Message-ID: <ca471dc20710300931t164f01d9w2deab39fd3dd8dcc@mail.gmail.com>

2007/10/30, Bill Janssen <janssen at parc.com>:
> Indeed.  The httplib code is relying on the fact that close(), under
> certain circumstances, has no effect.  It's just that the
> circumstances have changed, in Python 3K.  I think that the close() in
> HTTPConnection should be removed.

I'd like to have an opinion, but this is not my code and there don't
seem to be enough unittests to make sure that removing that close()
doesn't break anything.

I'd love to work on this more in-depth but it'll have to wait until
after PEP 3137 is finished.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Tue Oct 30 18:37:29 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Oct 2007 10:37:29 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<-5143302779702104898@unknownmsgid>
	<ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com>
Message-ID: <07Oct30.093737pst."57996"@synergy1.parc.xerox.com>

> > I don't think it's just SSL.  The problem is that it explicitly counts
> > calls to "close()".  So if you let the GC sweep up after you, that
> > close() just doesn't get called, the circular refs persist, and the
> > resource doesn't get collected till the backup GC runs (if it does).
> > Waiting for that to happen, you might run out of a scarce system
> > resource (file descriptors).  A nasty timing-dependent bug, there.
> 
> Ouch. Unfortunately adding a __del__() method that calls close()
> won't really help, as the cyclic GC refuses to do anything with
> objects having a __del__. This needs more thinking than I have time
> for right now, but I agree we need to fix it.

But if we remove SocketCloser, there's no need for the cyclic GC to be
involved.  If the count (of the number of outstanding SocketIO
instances pointing to this socket.socket) is just moved into the
socket.socket object itself, there's no cyclic reference, and normal
refcounting should work just fine.  I don't even think a __del__ method
on socket.socket is necessary.
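
Roughly, as a toy model: `CountedSocket` and `CountedFile` below are
hypothetical stand-ins, not the real socket.socket or SocketIO, and for
simplicity the count is dropped in the child's close() rather than when the
child is collected.

```python
class CountedSocket:
    """Toy stand-in for socket.socket that counts its own file children."""

    def __init__(self):
        self._io_refs = 0           # outstanding file-like children
        self._closed = False        # close() has been requested
        self.really_closed = False  # the OS-level close has happened

    def makefile(self):
        self._io_refs += 1
        return CountedFile(self)

    def _decref(self):
        if self._io_refs > 0:
            self._io_refs -= 1
        if self._closed:
            self.close()

    def close(self):
        self._closed = True
        if self._io_refs < 1:
            self.really_closed = True


class CountedFile:
    """Toy stand-in for SocketIO: points at its socket, never the reverse."""

    def __init__(self, sock):
        self._sock = sock
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self._sock._decref()
```

The child holds a reference to the socket, but the socket holds only an
integer, so there is no cycle for the collector to break.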

> > Why not move the count of how many SocketIO instances are pointing to
> > it into the socket.socket class again, as it was in 2.x?  I don't
> > think you're gaining anything with the circular data structure of
> > SocketCloser.  Add a "_closed" property, and "__del__" method to
> > socket.socket (which calls "close()").  Remove SocketCloser.  You're
> > finished, and there's one less class to maintain.
> 
> I'll look into this later.

OK.

Bill

From brett at python.org  Tue Oct 30 19:05:23 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 30 Oct 2007 11:05:23 -0700
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <ca471dc20710291041s3f66d478se6168310164e6f60@mail.gmail.com>
References: <18211.33450.332197.304601@montanaro.dyndns.org>
	<1209807056282906541@unknownmsgid>
	<ca471dc20710291041s3f66d478se6168310164e6f60@mail.gmail.com>
Message-ID: <bbaeab100710301105j4f149731sbb89b255554be85e@mail.gmail.com>

On 10/29/07, Guido van Rossum <guido at python.org> wrote:
> 2007/10/27, Bill Janssen <janssen at parc.com>:
> > > ISTR much of the plat-mac stuff was generated by Tools/bgen.  If so, that
> > > would be the place to fix things.
> >
> > Sure looks like generated code.  Be nice if that generator was run
> > during the build process, on OS X.  That way you'd be sure to get code
> > that matches the platform and codebase.
>
> ISTR that the generator needs a lot of hand-holding. Fixing it would
> be A Project.

Just so that it is publicly known, when the Great Stdlib Reorg begins,
I am seriously thinking of paring down the Mac stuff to the bare
minimum.  I think the only reason all the Mac stuff was even allowed
in to begin with was because Jack was one of the first contributors to
Python (but that is just a hunch).  It seems rather unfair to have all
of this Mac stuff in the stdlib while Windows doesn't go far beyond
_winreg and everything else is kept in win32all.  Considering it has
gone this far into Py3K and no one has noticed that it was broken kind
of says something anyway.

And no, I don't know when I am going to start doing the cleanup as I
am under time pressure for three proposals between now and late
December.

-Brett

From guido at python.org  Tue Oct 30 19:39:17 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 30 Oct 2007 11:39:17 -0700
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <bbaeab100710301105j4f149731sbb89b255554be85e@mail.gmail.com>
References: <18211.33450.332197.304601@montanaro.dyndns.org>
	<1209807056282906541@unknownmsgid>
	<ca471dc20710291041s3f66d478se6168310164e6f60@mail.gmail.com>
	<bbaeab100710301105j4f149731sbb89b255554be85e@mail.gmail.com>
Message-ID: <ca471dc20710301139v73834dd6uc8623a6033ff851f@mail.gmail.com>

Also, IMO the Mac-specific stuff was a lot more important before OSX.

The really interesting Mac stuff is the ObjC bridge which is not
maintained here anyway.

--Guido

2007/10/30, Brett Cannon <brett at python.org>:
> On 10/29/07, Guido van Rossum <guido at python.org> wrote:
> > 2007/10/27, Bill Janssen <janssen at parc.com>:
> > > > ISTR much of the plat-mac stuff was generated by Tools/bgen.  If so, that
> > > > would be the place to fix things.
> > >
> > > Sure looks like generated code.  Be nice if that generator was run
> > > during the build process, on OS X.  That way you'd be sure to get code
> > > that matches the platform and codebase.
> >
> > ISTR that the generator needs a lot of hand-holding. Fixing it would
> > be A Project.
>
> Just so that it is publicly known, when the Great Stdlib Reorg begins,
> I am seriously thinking of paring down the Mac stuff to the bare
> minimum.  I think the only reason all the Mac stuff was even allowed
> in to begin with was because Jack was one of the first contributors to
> Python (but that is just a hunch).  It seems rather unfair to have all
> of this Mac stuff in the stdlib while Windows doesn't go far beyond
> _winreg and everything else is kept in win32all.  Considering it has
> gone this far into Py3K and no one has noticed that it was broken kind
> of says something anyway.
>
> And no, I don't know when I am going to start doing the cleanup as I
> am under time pressure for three proposals between now and late
> December.
>
> -Brett
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From janssen at parc.com  Tue Oct 30 20:49:21 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Oct 2007 12:49:21 PDT
Subject: [Python-3000] socket GC worries
In-Reply-To: <07Oct30.093737pst."57996"@synergy1.parc.xerox.com> 
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<-5143302779702104898@unknownmsgid>
	<ca471dc20710291332m67019b74kb5d1205b880e790a@mail.gmail.com>
	<07Oct30.093737pst."57996"@synergy1.parc.xerox.com>
Message-ID: <07Oct30.114923pst."57996"@synergy1.parc.xerox.com>

> But if we remove SocketCloser, there's no need for the cyclic GC to be
> involved.  If the count (of the number of outstanding SocketIO
> instances pointing to this socket.socket) is just moved into the
> socket.socket object itself, there's no cyclic reference, and normal
> refcounting should work just fine.  I don't even think a __del__ method
> on socket.socket is necessary.

Here's a patch, for whenever you get back to this.  You can
ignore/remove the first hunk, which is about SSL.  I've tried all the
tests, and they work.  I've looked for leaks in test_socket and
test_ssl, no leaks.

Bill

Index: Lib/socket.py
===================================================================
--- Lib/socket.py	(revision 58714)
+++ Lib/socket.py	(working copy)
@@ -21,7 +21,6 @@
 htons(), htonl() -- convert 16, 32 bit int from host to network byte order
 inet_aton() -- convert IP addr string (123.45.67.89) to 32-bit packed format
 inet_ntoa() -- convert 32-bit packed format IP to string (123.45.67.89)
-ssl() -- secure socket layer support (only available if configured)
 socket.getdefaulttimeout() -- get the default timeout value
 socket.setdefaulttimeout() -- set the default timeout value
 create_connection() -- connects to an address, with an optional timeout
@@ -46,36 +45,6 @@
 import _socket
 from _socket import *
 
-try:
-    import _ssl
-    import ssl as _realssl
-except ImportError:
-    # no SSL support
-    pass
-else:
-    def ssl(sock, keyfile=None, certfile=None):
-        # we do an internal import here because the ssl
-        # module imports the socket module
-        warnings.warn("socket.ssl() is deprecated.  Use ssl.wrap_socket() instead.",
-                      DeprecationWarning, stacklevel=2)
-        return _realssl.sslwrap_simple(sock, keyfile, certfile)
-
-    # we need to import the same constants we used to...
-    from _ssl import SSLError as sslerror
-    from _ssl import \
-         RAND_add, \
-         RAND_egd, \
-         RAND_status, \
-         SSL_ERROR_ZERO_RETURN, \
-         SSL_ERROR_WANT_READ, \
-         SSL_ERROR_WANT_WRITE, \
-         SSL_ERROR_WANT_X509_LOOKUP, \
-         SSL_ERROR_SYSCALL, \
-         SSL_ERROR_SSL, \
-         SSL_ERROR_WANT_CONNECT, \
-         SSL_ERROR_EOF, \
-         SSL_ERROR_INVALID_ERROR_CODE
-
 import os, sys, io
 
 try:
@@ -119,49 +88,11 @@
         nfd = os.dup(fd)
         return socket(family, type, proto, fileno=nfd)
 
-class SocketCloser:
-
-    """Helper to manage socket close() logic for makefile().
-
-    The OS socket should not be closed until the socket and all
-    of its makefile-children are closed.  If the refcount is zero
-    when socket.close() is called, this is easy: Just close the
-    socket.  If the refcount is non-zero when socket.close() is
-    called, then the real close should not occur until the last
-    makefile-child is closed.
-    """
-
-    def __init__(self, sock):
-        self._sock = sock
-        self._makefile_refs = 0
-        # Test whether the socket is open.
-        try:
-            sock.fileno()
-            self._socket_open = True
-        except error:
-            self._socket_open = False
-
-    def socket_close(self):
-        self._socket_open = False
-        self.close()
-
-    def makefile_open(self):
-        self._makefile_refs += 1
-
-    def makefile_close(self):
-        self._makefile_refs -= 1
-        self.close()
-
-    def close(self):
-        if not (self._socket_open or self._makefile_refs):
-            self._sock._real_close()
-
-
 class socket(_socket.socket):
 
     """A subclass of _socket.socket adding the makefile() method."""
 
-    __slots__ = ["__weakref__", "_closer"]
+    __slots__ = ["__weakref__", "_io_refs", "_closed"]
     if not _can_dup_socket:
         __slots__.append("_base")
 
@@ -170,16 +101,17 @@
             _socket.socket.__init__(self, family, type, proto)
         else:
             _socket.socket.__init__(self, family, type, proto, fileno)
-        # Defer creating a SocketCloser until makefile() is actually called.
-        self._closer = None
+        self._io_refs = 0
+        self._closed = False
 
     def __repr__(self):
         """Wrap __repr__() to reveal the real class name."""
         s = _socket.socket.__repr__(self)
         if s.startswith("<socket object"):
-            s = "<%s.%s%s" % (self.__class__.__module__,
-                              self.__class__.__name__,
-                              s[7:])
+            s = "<%s.%s%s%s" % (self.__class__.__module__,
+                                self.__class__.__name__,
+                                (self._closed and " [closed] ") or "",
+                                s[7:])
         return s
 
     def accept(self):
@@ -196,6 +128,12 @@
             conn.close()
         return wrapper, addr
 
+    def decref_socketios(self):
+        if self._io_refs > 0:
+            self._io_refs -= 1
+        if self._closed:
+            self.close()
+
     def makefile(self, mode="r", buffering=None, *,
                  encoding=None, newline=None):
         """Return an I/O stream connected to the socket.
@@ -216,9 +154,8 @@
             rawmode += "r"
         if writing:
             rawmode += "w"
-        if self._closer is None:
-            self._closer = SocketCloser(self)
-        raw = SocketIO(self, rawmode, self._closer)
+        raw = SocketIO(self, rawmode)
+        self._io_refs += 1
         if buffering is None:
             buffering = -1
         if buffering < 0:
@@ -246,10 +183,9 @@
         return text
 
     def close(self):
-        if self._closer is None:
+        self._closed = True
+        if self._io_refs < 1:
             self._real_close()
-        else:
-            self._closer.socket_close()
 
     # _real_close calls close on the _socket.socket base class.
 
@@ -275,16 +211,14 @@
 
     # XXX More docs
 
-    def __init__(self, sock, mode, closer):
+    def __init__(self, sock, mode):
         if mode not in ("r", "w", "rw"):
             raise ValueError("invalid mode: %r" % mode)
         io.RawIOBase.__init__(self)
         self._sock = sock
         self._mode = mode
-        self._closer = closer
         self._reading = "r" in mode
         self._writing = "w" in mode
-        closer.makefile_open()
 
     def readinto(self, b):
         self._checkClosed()
@@ -308,10 +242,12 @@
     def close(self):
         if self.closed:
             return
-        self._closer.makefile_close()
         io.RawIOBase.close(self)
 
+    def __del__(self):
+        self._sock.decref_socketios()
 
+
 def getfqdn(name=''):
     """Get fully qualified domain name from name.
 

From janssen at parc.com  Tue Oct 30 20:52:42 2007
From: janssen at parc.com (Bill Janssen)
Date: Tue, 30 Oct 2007 12:52:42 PDT
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <ca471dc20710301139v73834dd6uc8623a6033ff851f@mail.gmail.com> 
References: <18211.33450.332197.304601@montanaro.dyndns.org>
	<1209807056282906541@unknownmsgid>
	<ca471dc20710291041s3f66d478se6168310164e6f60@mail.gmail.com>
	<bbaeab100710301105j4f149731sbb89b255554be85e@mail.gmail.com>
	<ca471dc20710301139v73834dd6uc8623a6033ff851f@mail.gmail.com>
Message-ID: <07Oct30.115248pst."57996"@synergy1.parc.xerox.com>

> Also, IMO the Mac-specific stuff was a lot more important before OSX.
> 
> The really interesting Mac stuff is the ObjC bridge which is not
> maintained here anyway.

I'm not so sure about that.  The IC module, for instance, plugs into
the Internet Config on the Mac, so you can read things like proxy
settings when making an HTTP or FTP connection.  To make Python as
useful on the Mac as it currently is, you'd have to refit a lot of
that in PyObjC, and bundle PyObjC into Python, wouldn't you?

And I haven't seen a lot of volunteers on the MacPython mailing list
raring to contribute to this.

Bill

From greg.ewing at canterbury.ac.nz  Tue Oct 30 21:42:35 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 31 Oct 2007 09:42:35 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz> <E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
Message-ID: <4727973B.3060203@canterbury.ac.nz>

Adam Olsen wrote:
> It'd be simpler still if we only had __cmp__ and __eq__.  I just don't
> understand the use cases where that's not sufficient.
> 
> Hrm.  I guess set's subset checking requires more relationships than
> __cmp__ provides.

Also, you might want to give the comparison operators meanings
that don't have anything to do with comparison in the usual
sense. The reason tp_richcmp was added in the first place was
so that arbitrary meanings could be given to the comparison
operators individually.
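
For instance, a query builder can make "<" and "==" construct an expression
instead of returning a truth value. A minimal sketch; `Column` and `Expr`
are hypothetical names, not taken from any real library.

```python
class Expr:
    """Hypothetical expression node produced by a comparison operator."""

    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def __str__(self):
        return "%s %s %r" % (self.left, self.op, self.right)


class Column:
    """Hypothetical column whose comparisons build Expr objects."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        return Expr(self.name, "<", other)

    def __eq__(self, other):
        return Expr(self.name, "=", other)
```

Here `str(Column("age") < 30)` gives "age < 30". A single __cmp__ that must
return an integer cannot express this.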

--
Greg

From rhamph at gmail.com  Tue Oct 30 22:36:57 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Oct 2007 15:36:57 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <4727973B.3060203@canterbury.ac.nz>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
Message-ID: <aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>

On 10/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Adam Olsen wrote:
> > It'd be simpler still if we only had __cmp__ and __eq__.  I just don't
> > understand the use cases where that's not sufficient.
> >
> > Hrm.  I guess set's subset checking requires more relationships than
> > __cmp__ provides.
>
> Also, you might want to give the comparison operators meanings
> that don't have anything to do with comparison in the usual
> sense. The reason tp_richcmp was added in the first place was
> so that arbitrary meanings could be given to the comparison
> operators individually.

Yeah.

It's clear to me that the opposition to removing __cmp__ comes down to
"make the common things easy and the rare things possible".  Removing
__cmp__ means one of the common things (total ordering) becomes hard.
__richcmp__ might solve that, but I'd like to see some larger examples
first (involving unordered types, totally ordered types, and partially
ordered types.)


-- 
Adam Olsen, aka Rhamphoryncus

From steven.bethard at gmail.com  Wed Oct 31 01:17:17 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 30 Oct 2007 18:17:17 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
Message-ID: <d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>

On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> It's clear to me that the opposition to removing __cmp__ comes down to
> "make the common things easy and the rare things possible".  Removing
> __cmp__ means one of the common things (total ordering) becomes hard.

I don't really think that's it.  I don't see much of a difference in
difficulty between writing::

    class C(TotalOrderingMixin):
        def __lt__(self, other):
            return self.foo < other.foo
        def __eq__(self, other):
            return self.foo == other.foo

or writing [1] ::

    class C(object):
        def __cmp__(self, other):
            if self.foo < other.foo:
                return -1
            elif self.foo > other.foo:
                return 1
            else:
                return 0

The main motivation seems really to be efficiency for a particular
task.  For some tasks, e.g. sorting, you really only need __lt__, so
going through __cmp__ will just be slower. For other tasks, e.g.
comparing objects with several components, you know you have to do
both the __lt__ and __eq__ comparisons, so it would be wasteful to
make two calls when you know you could do it in one through __cmp__.

So it's not really about making things easier or harder, it's about
making the most efficient tool for the task available.
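
To make the cost concrete, here is a minimal sketch of what such a mixin
might look like. `TotalOrderingMixin` is only a name used in this thread,
so this implementation and the `Version` example class are assumptions.

```python
class TotalOrderingMixin:
    """Hypothetical mixin: derive the remaining operators from the
    subclass's __lt__ and __eq__."""

    def __le__(self, other):
        # Two underlying calls in the worst case.
        return self < other or self == other

    def __gt__(self, other):
        return not (self < other or self == other)

    def __ge__(self, other):
        return not self < other

    def __ne__(self, other):
        return not self == other


class Version(TotalOrderingMixin):
    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        return self.number < other.number

    def __eq__(self, other):
        return self.number == other.number
```

Sorting only ever calls __lt__, so it pays nothing extra, while "<=" and
">" may evaluate both __lt__ and __eq__ where one __cmp__ call would do.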

Steve

[1] Yes, of course, you could just write cmp(self.foo, other.foo), but
this is how it's been written in the rest of the thread, so I have to
assume that it's more representative of real code.
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From rhamph at gmail.com  Wed Oct 31 02:57:17 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Oct 2007 19:57:17 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
Message-ID: <aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>

On 10/30/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > It's clear to me that the opposition to removing __cmp__ comes down to
> > "make the common things easy and the rare things possible".  Removing
> > __cmp__ means one of the common things (total ordering) becomes hard.
>
> I don't really think that's it.  I don't see much of a difference in
> difficulty between writing::
>
>     class C(TotalOrderingMixin):
>         def __lt__(self, other):
>             return self.foo < other.foo
>         def __eq__(self, other):
>             return self.foo == other.foo
>
> or writing [1] ::
>
>     class C(object):
>         def __cmp__(self, other):
>             if self.foo < other.foo:
>                 return -1
>             elif self.foo > other.foo:
>                 return 1
>             else:
>                 return 0
>
> The main motivation seems really to be efficiency for a particular
> task.  For some tasks, e.g. sorting, you really only need __lt__, so
> going through __cmp__ will just be slower. For other tasks, e.g.
> comparing objects with several components, you know you have to do
> both the __lt__ and __eq__ comparisons, so it would be wasteful to
> make two calls when you know you could do it in one through __cmp__.
>
> So it's not really about making things easier or harder, it's about
> making the most efficient tool for the task available.
>
> Steve
>
> [1] Yes, of course, you could just write cmp(self.foo, other.foo), but
> this is how it's been written in the rest of the thread, so I have to
> assume that it's more representative of real code.

cmp and __cmp__ are doomed, due to unorderable types now raising exceptions:

>>> cmp(3, 'hello')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
>>> 3 == 'hello'
False

A mixin for __cmp__ would be sufficient for scalars (where you can
avoid this exception and your size is constant), but not for
containers (which need to avoid inappropriate types and wish to avoid
multiple passes.)

I don't think __richcmp__ makes the process quite as simple as we want though:

    class C(RichCmpMixin):
        def __richcmp__(self, other, mode):
            if not isinstance(other, C):
                return NotImplemented
            for a, b in zip(self.data, other.data):
                result = richcmp(a, b, mode)
                # XXX how do I know when to stop if all I'm doing is a
                # <= comparison?  cmp() is much easier!
            return richcmp(len(self.data), len(other.data), mode)

If you standardize the meaning of the return values, rather than
changing meaning based upon arguments, the whole thing works much
better.  A simple ordered flag indicates the extent of your
comparison.  Returning a false value always means equal, while
returning a true value means unequal possibly with a specific
ordering.

    class C:
        def __richcmp__(self, other, ordered):
            if not isinstance(other, C):
                return NotImplemented
            for a, b in zip(self.data, other.data):
                result = richcmp(a, b, ordered)
                if result:
                    return result
            return richcmp(len(self.data), len(other.data), ordered)

It also occurs to me that, if a type doesn't use symmetric
comparisons, it should raise an exception rather than silently doing
the wrong thing.  To do that you need to know explicitly when ordering
is being done (which richcmp/__richcmp__ does.)
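
A runnable sketch of the second form. There is no richcmp() builtin or
__richcmp__ hook in Python, so both are defined here under assumed
semantics: a falsy result means equal, a truthy one means unequal, and the
sign is only meaningful when ordered is true. `Seq` is a hypothetical
container.

```python
def richcmp(a, b, ordered):
    """Hypothetical helper: 0 if equal; otherwise -1 or 1 when an ordering
    was requested, or 1 when only inequality is being reported."""
    hook = getattr(type(a), "__richcmp__", None)
    if hook is not None:
        result = hook(a, b, ordered)
        if result is not NotImplemented:
            return result
    if a == b:
        return 0
    if not ordered:
        return 1  # unequal; "<" is never evaluated
    return -1 if a < b else 1


class Seq:
    """Hypothetical container comparing element by element."""

    def __init__(self, data):
        self.data = list(data)

    def __richcmp__(self, other, ordered):
        if not isinstance(other, Seq):
            return NotImplemented
        for a, b in zip(self.data, other.data):
            result = richcmp(a, b, ordered)
            if result:
                return result
        return richcmp(len(self.data), len(other.data), ordered)
```

With ordered false, Seq([1, 'a']) against Seq(['a', 1]) just reports
"unequal"; with ordered true the same pair raises TypeError from the int
versus str comparison, in a single pass either way.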


-- 
Adam Olsen, aka Rhamphoryncus

From steven.bethard at gmail.com  Wed Oct 31 03:11:47 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 30 Oct 2007 20:11:47 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<47169E6E.7000804@canterbury.ac.nz>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
Message-ID: <d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>

On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> cmp and __cmp__ are doomed, due to unorderable types now raising exceptions:
>
> >>> cmp(3, 'hello')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unorderable types: int() < str()
> >>> 3 == 'hello'
> False
>
> A mixin for __cmp__ would be sufficient for scalars (where you can
> avoid this exception and your size is constant), but not for
> containers (which need to avoid inappropriate types and wish to avoid
> multiple passes.)

I don't understand this conclusion.  If you start comparing things
that are unorderable, you'll get an exception.  But cmp() still makes
sense when you compare other things::

    >>> cmp((1, 'a', 4.5), (1, 'a', 6.2))
    -1
    >>> cmp([6, 5, 4], [6, 4, 5])
    1

I definitely don't want any cmp/__cmp__ implementation that swallows
exceptions when the types don't align, e.g.::

    >>> cmp((1, 'a'), ('a', 1))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: int() < str()

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From rhamph at gmail.com  Wed Oct 31 03:29:09 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Oct 2007 20:29:09 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
Message-ID: <aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>

On 10/30/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions:
> >
> > >>> cmp(3, 'hello')
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: unorderable types: int() < str()
> > >>> 3 == 'hello'
> > False
> >
> > A mixin for __cmp__ would be sufficient for scalars (where you can
> > avoid this exception and your size is constant), but not for
> > containers (which need to avoid inappropriate types and wish to avoid
> > multiple passes.)
>
> I don't understand this conclusion.  If you start comparing things
> that are unorderable, you'll get an exception.  But cmp() still makes
> sense when you compare other things::
>
>     >>> cmp((1, 'a', 4.5), (1, 'a', 6.2))
>     -1
>     >>> cmp([6, 5, 4], [6, 4, 5])
>     1
>
> I definitely don't want any cmp/__cmp__ implementation that swallows
> exceptions when the types don't align, e.g.::
>
>     >>> cmp((1, 'a'), ('a', 1))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: unorderable types: int() < str()

What I meant is that you can't use a mixin to map __eq__ to __cmp__,
as you'll get TypeError even though == is defined for those types.
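
A small illustration, assuming a cmp() built only on "<" (defined locally
here) and two hypothetical classes, `EqFromCmpMixin` and `Box`:

```python
def cmp(a, b):
    """Three-way comparison built on the ordering operator only."""
    if a < b:
        return -1
    if b < a:
        return 1
    return 0


class EqFromCmpMixin:
    """Hypothetical mixin that derives == from __cmp__."""

    def __eq__(self, other):
        return self.__cmp__(other) == 0


class Box(EqFromCmpMixin):
    def __init__(self, value):
        self.value = value

    def __cmp__(self, other):
        return cmp(self.value, other.value)
```

Box(3) == Box(3) is fine, but Box(3) == Box('hello') raises TypeError even
though 3 == 'hello' is simply False.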

-- 
Adam Olsen, aka Rhamphoryncus

From steven.bethard at gmail.com  Wed Oct 31 03:56:33 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 30 Oct 2007 20:56:33 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
	<aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>
Message-ID: <d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>

On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 10/30/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> > On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions:
> > >
> > > >>> cmp(3, 'hello')
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in <module>
> > > TypeError: unorderable types: int() < str()
> > > >>> 3 == 'hello'
> > > False
> > >
> > > A mixin for __cmp__ would be sufficient for scalars (where you can
> > > avoid this exception and your size is constant), but not for
> > > containers (which need to avoid inappropriate types and wish to avoid
> > > multiple passes.)
> >
> > I don't understand this conclusion.  If you start comparing things
> > that are unorderable, you'll get an exception.  But cmp() still makes
> > sense when you compare other things::
> >
> >     >>> cmp((1, 'a', 4.5), (1, 'a', 6.2))
> >     -1
> >     >>> cmp([6, 5, 4], [6, 4, 5])
> >     1
> >
> > I definitely don't want any cmp/__cmp__ implementation that swallows
> > exceptions when the types don't align, e.g.::
> >
> >     >>> cmp((1, 'a'), ('a', 1))
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     TypeError: unorderable types: int() < str()
>
> What I meant is that you can't use a mixin to map __eq__ to __cmp__,
> as you'll get TypeError even though == is defined for those types.

I wasn't suggesting that, though I don't see why a mixin would fail
here assuming you have both __eq__ and __lt__.  Just do the __lt__
comparison first.
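
A minimal sketch of such a mixin, assuming the class supplies both
__eq__ and __lt__.  The names CmpMixin and Version are made up for
illustration.  Because the ordering test runs first, unorderable
operands still raise TypeError instead of being reported as unequal:

```python
class CmpMixin:
    """Derive a three-way __cmp__ from __lt__ and __eq__ (hypothetical helper)."""

    def __cmp__(self, other):
        # Ordering test first: unorderable operands raise TypeError here.
        if self < other:
            return -1
        if self == other:
            return 0
        return 1


class Version(CmpMixin):
    """Toy class that defines only __eq__ and __lt__."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number
```

Here the method is called directly, e.g. Version(1).__cmp__(Version(2)),
so the sketch does not depend on whether a built-in cmp() exists.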

I'm actually currently in favor of keeping __cmp__ as it is in Python
2.5.  If a class defines only __cmp__, Python will do the appropriate
dance to make <, >, ==, etc. work right.  If a class defines only
__eq__, __lt__, etc. Python will do the appropriate dance to make
cmp() work right.

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From rhamph at gmail.com  Wed Oct 31 04:13:03 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Oct 2007 21:13:03 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
	<aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>
	<d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>
Message-ID: <aac2c7cb0710302013yd1dfcc9n3ac914215ad1dec0@mail.gmail.com>

On 10/30/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 10/30/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> > > On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > > > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions:
> > > >
> > > > >>> cmp(3, 'hello')
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in <module>
> > > > TypeError: unorderable types: int() < str()
> > > > >>> 3 == 'hello'
> > > > False
> > > >
> > > > A mixin for __cmp__ would be sufficient for scalars (where you can
> > > > avoid this exception and your size is constant), but not for
> > > > containers (which need to avoid inappropriate types and wish to avoid
> > > > multiple passes.)
> > >
> > > I don't understand this conclusion.  If you start comparing things
> > > that are unorderable, you'll get an exception.  But cmp() still makes
> > > sense when you compare other things::
> > >
> > >     >>> cmp((1, 'a', 4.5), (1, 'a', 6.2))
> > >     -1
> > >     >>> cmp([6, 5, 4], [6, 4, 5])
> > >     1
> > >
> > > I definitely don't want any cmp/__cmp__ implementation that swallows
> > > exceptions when the types don't align, e.g.::
> > >
> > >     >>> cmp((1, 'a'), ('a', 1))
> > >     Traceback (most recent call last):
> > >       File "<stdin>", line 1, in <module>
> > >     TypeError: unorderable types: int() < str()
> >
> > What I meant is that you can't use a mixin to map __eq__ to __cmp__,
> > as you'll get TypeError even though == is defined for those types.
>
> I wasn't suggesting that, though I don't see why a mixin would fail
> here assuming you have both __eq__ and __lt__.  Just do the __lt__
> comparison first.
>
> I'm actually currently in favor of keeping __cmp__ as it is in Python
> 2.5.  If a class defines only __cmp__, Python will do the appropriate
> dance to make <, >, ==, etc. work right.  If a class defines only
> __eq__, __lt__, etc. Python will do the appropriate dance to make
> cmp() work right.

For some definition of "right".  A container that defines only __cmp__,
using cmp() internally, will be broken in 3.0.


-- 
Adam Olsen, aka Rhamphoryncus

From steven.bethard at gmail.com  Wed Oct 31 04:22:08 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 30 Oct 2007 21:22:08 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710302013yd1dfcc9n3ac914215ad1dec0@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
	<aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>
	<d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>
	<aac2c7cb0710302013yd1dfcc9n3ac914215ad1dec0@mail.gmail.com>
Message-ID: <d11dcfba0710302022i6f272ea3ve2af93a12ccd0433@mail.gmail.com>

On 10/30/07, Adam Olsen <rhamph at gmail.com> wrote:
> > I'm actually currently in favor of keeping __cmp__ as it is in Python
> > 2.5.  If a class defines only __cmp__, Python will do the appropriate
> > dance to make <, >, ==, etc. work right.  If a class defines only
> > __eq__, __lt__, etc. Python will do the appropriate dance to make
> > cmp() work right.
>
> For some definition of "right".  A container that defines only __cmp__,
> using cmp() internally, will be broken in 3.0.

Sure, but that's their choice.  If you don't want to raise exceptions
on equality comparisons, then you should define __eq__, in addition to
__cmp__.  Or you should only compare against comparable things.
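
A rough sketch of that choice, assuming a hypothetical two-element
container called Pair.  The cmp() helper below is a stand-in so the
example does not rely on a built-in cmp().  The ordering path is
allowed to raise, while __eq__ is defined separately and never does:

```python
def cmp(a, b):
    """Three-way comparison; raises TypeError for unorderable operands."""
    return (a > b) - (a < b)


class Pair:
    """Hypothetical container with an ordering method and a separate __eq__."""

    def __init__(self, first, second):
        self.items = (first, second)

    def __cmp__(self, other):
        # Ordering: propagates TypeError when elements cannot be ordered.
        for a, b in zip(self.items, other.items):
            result = cmp(a, b)
            if result != 0:
                return result
        return 0

    def __eq__(self, other):
        # Equality never needs an ordering, so mixed types just compare unequal.
        return self.items == other.items
```

With this, Pair(1, 'a') == Pair('a', 1) is simply False, while asking
for an ordering of the same two objects raises TypeError.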

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From greg.ewing at canterbury.ac.nz  Wed Oct 31 05:46:16 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 31 Oct 2007 17:46:16 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz> <E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
Message-ID: <47280898.7010603@canterbury.ac.nz>

Steven Bethard wrote:
>     class C(object):
>         def __cmp__(self, other):
>             if self.foo < other.foo:
>                 return -1
>             elif self.foo > other.foo:
>                 return 1
>             else:
>                 return 0

With __cmp__, in cases like that you can punt the whole
thing off to the subsidiary object, e.g.

   def __cmp__(self, other):
     return cmp(self.foo, other.foo)

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Oct 31 06:13:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 31 Oct 2007 18:13:36 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<d11dcfba0710171027tbfeb31dw6f89ff77f6d51fcd@mail.gmail.com>
	<47169E6E.7000804@canterbury.ac.nz> <E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
Message-ID: <47280F00.1050002@canterbury.ac.nz>

Adam Olsen wrote:
>             for a, b in zip(self.data, other.data):
>                 result = richcmp(a, b, ordered)
>                 if result:
>                     return result

That can't be right, because there are *three* possible
results you need to be able to distinguish from comparing
a pair of elements: "stop and return True", "stop and
return False", and "keep going". There's no way you can
get that out of a boolean return value.

Maybe what we need to do is enhance __cmp__ so that
it has *four* possible return values: -1, 0, 1 and
UnequalButNotOrdered.

The scheme for handling a comparison 'op' between
two values 'a' and 'b' would then be:

1) Try a.__richcmp__(op, b) and vice versa. If either
    of these produces a result, return it.

2) Try a.__cmp__(b) and vice versa. If either of these
    produces a result, then

    a) If the result is -1, 0 or 1, return an appropriate
       value based on the operation.

    b) If the result is UnequalButNotOrdered, and the
       operation is == or !=, return an appropriate
       value.

    c) Otherwise, raise an exception.

The pattern for comparing sequences would become:

   def __cmp__(self, other):
     for a, b in zip(self.items, other.items):
       result = cmp(a, b)
       if result != 0:
         return result
     return 0

Which is actually the same as it is now, with an added
bit of It Just Works behaviour: if any of the element
comparisons gives UnequalButNotOrdered, then the whole
sequence gets reported as such.
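
A sketch of how this could look, under stated assumptions: the names
cmp4, seq_cmp and compare are invented here, and UnequalButNotOrdered
is modelled as a plain sentinel object.  It only illustrates steps 2a
to 2c above, not a real interpreter protocol:

```python
import operator


class _Unordered:
    def __repr__(self):
        return "UnequalButNotOrdered"


# Sentinel standing in for the proposed fourth return value.
UnequalButNotOrdered = _Unordered()


def cmp4(a, b):
    """Four-way comparison: -1, 0, 1 or UnequalButNotOrdered."""
    if a == b:
        return 0
    try:
        return -1 if a < b else 1
    except TypeError:
        return UnequalButNotOrdered


def seq_cmp(xs, ys):
    """The sequence pattern from above; length differences are ignored, as there."""
    for a, b in zip(xs, ys):
        result = cmp4(a, b)
        if result != 0:
            return result
    return 0


_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def compare(op, xs, ys):
    """Turn a four-way result into the answer for one comparison operator."""
    result = seq_cmp(xs, ys)
    if result is UnequalButNotOrdered:
        if op == "==":
            return False
        if op == "!=":
            return True
        raise TypeError("unorderable operands for %s" % op)
    return _OPS[op](result, 0)
```

So compare("==", [1, 'a'], ['a', 1]) gives False, while the same
operands with "<" raise TypeError.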

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Oct 31 06:22:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 31 Oct 2007 18:22:06 +1300
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<d11dcfba0710301911l7c17f1aeqc1eec56c2c056256@mail.gmail.com>
	<aac2c7cb0710301929y1f1518f1n4d08682a787677e2@mail.gmail.com>
	<d11dcfba0710301956q37e0660fi6d8d435cd99952f@mail.gmail.com>
Message-ID: <472810FE.7060805@canterbury.ac.nz>

Steven Bethard wrote:
> If a class defines only __cmp__, Python will do the appropriate
> dance to make <, >, ==, etc. work right.  If a class defines only
> __eq__, __lt__, etc. Python will do the appropriate dance to make
> cmp() work right.

With a four-way __cmp__, I wouldn't actually mind if
the dance only worked one way, i.e. richcmp --> cmp.
In that world, the only reason to define separate
comparison operators would be if you were using them
for something radically different from normal
comparison. So defining __cmp__ could be regarded as
the standard way to implement comparison operators
unless there's some reason you really can't do it
that way, in which case you just have to live with
cmp() not working on your type.

--
Greg

From rhamph at gmail.com  Wed Oct 31 06:53:02 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 30 Oct 2007 23:53:02 -0600
Subject: [Python-3000] Please re-add __cmp__ to python 3000
In-Reply-To: <47280F00.1050002@canterbury.ac.nz>
References: <E1IiBwH-0003sz-04@fenris.runbox.com>
	<E1ImZR5-000251-4n@garm.runbox.com>
	<aac2c7cb0710291249y31a74c9cw66c5e7fbff818ea@mail.gmail.com>
	<47267607.2000806@canterbury.ac.nz>
	<aac2c7cb0710291836u28894c1epceb45b3640c794f6@mail.gmail.com>
	<4727973B.3060203@canterbury.ac.nz>
	<aac2c7cb0710301436j44c1dd26i682def6facfd9cef@mail.gmail.com>
	<d11dcfba0710301717r22a2c37t38af72e5b6a1d9b8@mail.gmail.com>
	<aac2c7cb0710301857r7a2d7148q45191a1cbd22621e@mail.gmail.com>
	<47280F00.1050002@canterbury.ac.nz>
Message-ID: <aac2c7cb0710302253y26fd3847rfa90a35a69837885@mail.gmail.com>

On 10/30/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Adam Olsen wrote:
> >             for a, b in zip(self.data, other.data):
> >                 result = richcmp(a, b, ordered)
> >                 if result:
> >                     return result
>
> That can't be right, because there are *three* possible
> results you need to be able to distinguish from comparing
> a pair of elements: "stop and return True", "stop and
> return False", and "keep going". There's no way you can
> get that out of a boolean return value.

It's not strictly a boolean value.

If ordered is false then you interpret it as either a false value or a
true value (but it may return -1 or +1 for the true values.)

If ordered is true then it may be -1, 0/false, +1, or raise a
TypeError if ordering is unsupported.


>    def __cmp__(self, other):
>      for a, b in zip(self.items, other.items):
>        result = cmp(a, b)
>        if result != 0:
>          return result
>      return 0
>
> Which is actually the same as it is now, with an added
> bit of It Just Works behaviour: if any of the element
> comparisons gives UnequalButNotOrdered, then the whole
> sequence gets reported as such.

So the difference between our two approaches is that mine uses a flag
to indicate if a TypeError should be raised, while yours adds an extra
return value.

Mine does have a small benefit: list currently exits early if it's
only testing for equality and the lengths differ, which couldn't be
done with your API.
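
A sketch of the flag-based variant, to make the comparison concrete.
This is a guess at the shape of the API rather than the code from the
earlier message: richcmp here only handles scalar elements, and Seq and
the __richcmp__ method name are used purely for illustration.

```python
def richcmp(a, b, ordered):
    """Return 0 if equal, otherwise a true value (-1 or +1 when ordered)."""
    if a == b:
        return 0
    if not ordered:
        return 1  # any true value means "unequal"
    # Raises TypeError when a and b cannot be ordered.
    return -1 if a < b else 1


class Seq:
    """Hypothetical list-like container using the flag-based protocol."""

    def __init__(self, *data):
        self.data = list(data)

    def __richcmp__(self, other, ordered):
        # Early exit: for pure equality tests a length mismatch settles it.
        if not ordered and len(self.data) != len(other.data):
            return 1
        for a, b in zip(self.data, other.data):
            result = richcmp(a, b, ordered)
            if result:
                return result
        # All shared elements are equal; the shorter sequence sorts first.
        return richcmp(len(self.data), len(other.data), ordered)
```

With ordered false, Seq(1, 2) against Seq(1, 2, 3) returns without
looking at any element.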

-- 
Adam Olsen, aka Rhamphoryncus

From nnorwitz at gmail.com  Wed Oct 31 08:13:46 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Wed, 31 Oct 2007 00:13:46 -0700
Subject: [Python-3000] status of buildbots
Message-ID: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>

We've made a lot of progress with the tests.  Several buildbots are
green.  http://python.org/dev/buildbot/3.0/

There are some tests that are unstable, at least:
   test_asynchat test_urllib2net test_xmlrpc
http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step-test/0
http://python.org/dev/buildbot/3.0/MIPS%20Debian%203.0/builds/81/step-test/0
http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step-test/0

I would really like to get these flaky tests fixed so they don't
create false positives.  It will help us greatly to move forward.  I
think these failures can occur on all platforms, so nothing special is
required and it should be just fixing the test (all python code).

Other platform specific problems:

Windows has more problems, with these tests failing:
   test_csv test_dumbdbm test_gettext test_mailbox test_netrc
   test_pep277 test_subprocess
http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

Win64 has a few more:
    test_csv test_dumbdbm test_fileinput test_format test_getargs2
    test_gettext test_mailbox test_netrc test_pep277 test_subprocess
    test_winsound
http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/183/step-test/0
(This link is old.  There were other problems with the bot.)

There might be patches for one or more of these problems, but I'm not
sure if they work.

n

From theller at ctypes.org  Wed Oct 31 13:57:08 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 31 Oct 2007 13:57:08 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
Message-ID: <fg9u34$uo6$1@ger.gmane.org>

Neal Norwitz schrieb:

> Other platform specific problems:
> 
> Win64 has a few more:
>     test_csv test_dumbdbm test_fileinput test_format test_getargs2
>     test_gettext test_mailbox test_netrc test_pep277 test_subprocess
> test_winsound
> http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/183/step-test/0
> (This link is old.  There were other problems with the bot.)

Please ignore the test_winsound result on Win64.  The failure is caused by the
machine, and I do not know how to disable the test (Martin's advice to remove the
sound driver did not help, unfortunately).  AFAICT, test_winsound succeeds on Win64
if I'm logged in with a remote desktop connection to this machine, but I cannot
stand the sudden beeping when the tests run ;-).

Thomas


From adam at hupp.org  Wed Oct 31 14:30:55 2007
From: adam at hupp.org (Adam Hupp)
Date: Wed, 31 Oct 2007 09:30:55 -0400
Subject: [Python-3000] status of buildbots
In-Reply-To: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
Message-ID: <766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com>

On 10/31/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> We've made a lot of progress with the tests.  Several buildbots are
> green.  http://python.org/dev/buildbot/3.0/
>
> There are some tests that are unstable, at least:
>    test_asynchat test_urllib2net test_xmlrpc
> http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step-test/0
> http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step-test/0

test_xmlrpc has code to ignore these but the error message has changed
slightly so it's no longer in effect.

The reason for the errors is that the test is setting a timeout on the
socket object, which puts it into non-blocking mode.  That's
incompatible with SocketServer which uses socket.makefile for IO.  I
don't think the timeout is necessary as long as one other fix is made.
 I've asked the author of the test for confirmation.

On a related note, I think socket.makefile should throw an error if
called on a non-blocking socket.  The docs are pretty unambiguous that
this is wrong:

"file objects returned by the makefile() method must only be used when
the socket is in blocking mode; in timeout or non-blocking mode file
operations that cannot be completed immediately will fail."

Throwing an error would prevent things like this CherryPy issue:

 http://www.cherrypy.org/ticket/598

This doesn't help if the socket is put into non-blocking mode after
makefile is called but it's better than nothing.

Alternatively, if a timeout is set but non-blocking is *not*
explicitly enabled the socket implementation could handle the retry
loop itself.
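
A small sketch of the first idea (raising an error up front), written
as a wrapper function rather than a change to socket.makefile itself.
The name checked_makefile is made up; it relies only on
socket.gettimeout(), which returns None for a blocking socket:

```python
import socket


def checked_makefile(sock, mode="r", **kwargs):
    """Refuse to build a file object on a timeout or non-blocking socket."""
    # gettimeout() is None in blocking mode, 0.0 when non-blocking,
    # and a positive float when a timeout is set.
    if sock.gettimeout() is not None:
        raise ValueError("makefile() requires a socket in blocking mode")
    return sock.makefile(mode, **kwargs)
```

As noted above, this cannot catch a socket that is switched to
non-blocking mode after the file object has been created.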

-- 
Adam Hupp | http://hupp.org/adam/

From lists at cheimes.de  Wed Oct 31 16:19:06 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Oct 2007 16:19:06 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
Message-ID: <47289CEA.5060505@cheimes.de>

Neal Norwitz wrote:
> Windows has more problems, with these tests failing:
>    test_csv test_dumbdbm test_gettext test_mailbox test_netrc
> test_pep277 test_subprocess
> http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

test_csv
Changing TemporaryFile("w+") to TemporaryFile("w+", newline='') in
test_csv.py readerAssertEqual() line 377 fixes the test. I'm not sure if
it's the proper way to fix the issue.
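
To illustrate what newline='' changes, here is a minimal sketch; the
roundtrip helper is hypothetical and not part of test_csv.py.  With
newline='' the text layer does no newline translation, so the "\r\n"
row terminators written by the csv module reach the file unchanged on
every platform:

```python
import csv
import tempfile


def roundtrip(rows):
    """Write rows with csv.writer, then return the raw text and the parsed rows."""
    with tempfile.TemporaryFile("w+", newline="") as fileobj:
        csv.writer(fileobj).writerows(rows)
        fileobj.seek(0)
        raw = fileobj.read()
        fileobj.seek(0)
        parsed = list(csv.reader(fileobj))
    return raw, parsed
```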

test_netrc
Adding newline='' to fp = open(temp_filename, mode) in test_netrc.py
fixes the test. Same as test_csv.

test_gettext
Index: gettext.py
===================================================================
--- gettext.py	(revision 58729)
+++ gettext.py	(working copy)
@@ -291,7 +291,7 @@
             if mlen == 0:
                 # Catalog description
                 lastk = k = None
-                for b_item in tmsg.split(os.linesep.encode("ascii")):
+                for b_item in tmsg.split('\n'.encode("ascii")):
                     item = str(b_item).strip()
                     if not item:
                         continue
Index: test/test_gettext.py
===================================================================
--- test/test_gettext.py	(revision 58729)
+++ test/test_gettext.py	(working copy)
@@ -332,6 +332,7 @@

     def test_weird_metadata(self):
         info = self.t.info()
+        self.assertEqual(len(info), 9)
         self.assertEqual(info['last-translator'],
            'John Doe <jdoe at example.com>\nJane Foobar <jfoobar at example.com>')

test_pep277
The test fails because the code in _fileio:fileio_init doesn't set name
from widename. On Windows the variable widename contains the name as
Py_UNICODE and name stays empty, but
PyErr_SetFromErrnoWithFilename(PyExc_IOError, name) still uses name.

test_subprocess
It passes on my machine

test_mailbox
It suffers from the same problem with newlines as test_csv and test_netrc

Christian

From fumanchu at aminus.org  Wed Oct 31 17:00:55 2007
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 31 Oct 2007 09:00:55 -0700
Subject: [Python-3000] status of buildbots
In-Reply-To: <766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
	<766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E64DA5220@ex10.hostedexchange.local>

Adam Hupp wrote:
> On 10/31/07, Neal Norwitz <nnorwitz at gmail.com> wrote:
> > We've made a lot of progress with the tests.  Several buildbots are
> > green.  http://python.org/dev/buildbot/3.0/
> >
> > There are some tests that are unstable, at least:
> >    test_asynchat test_urllib2net test_xmlrpc
> > http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step-test/0
> > http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step-test/0
> 
> test_xmlrpc has code to ignore these but the error message has changed
> slightly so it's no longer in effect.
> 
> The reason for the errors is that the test is setting a timeout on the
> socket object which puts it in to non-blocking mode.  That's
> incompatible with SocketServer which uses socket.makefile for IO.  I
> don't think the timeout is necessary as long as one other fix is made.
>  I've asked the author of the test for confirmation.
> 
> On a related note, I think socket.makefile should throw an error if
> called on a non-blocking socket.  The docs are pretty unambiguous that
> this is wrong:
> 
> "file objects returned by the makefile() method must only be used when
> the socket is in blocking mode; in timeout or non-blocking mode file
> operations that cannot be completed immediately will fail."
> 
> Throwing an error would prevent things like this CherryPy issue:
> 
>  http://www.cherrypy.org/ticket/598
> 
> This doesn't help if the socket is put into non-blocking mode after
> makefile is called but it's better than nothing.
> 
> Alternatively, if a timeout is set but non-blocking is *not*
> explicitly enabled the socket implementation could handle the retry
> loop itself.

That's the route I would prefer, and is probably what we're going to end
up doing for CherryPy (write our own makefile which retries). We prefer
the timeout for various reasons, yet WSGI requires file-like objects.


Robert Brewer
fumanchu at aminus.org

From lists at cheimes.de  Wed Oct 31 17:29:23 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Oct 2007 17:29:23 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To: <47289CEA.5060505@cheimes.de>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
	<47289CEA.5060505@cheimes.de>
Message-ID: <4728AD63.6030608@cheimes.de>

Christian Heimes wrote:
> Neal Norwitz wrote:
>> Windows has more problems, with these tests failing:
>>    test_csv test_dumbdbm test_gettext test_mailbox test_netrc
>> test_pep277 test_subprocess
>> http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

I forgot to mention that test_dumbdbm fails because the test replaces
\r\n line endings with \r\r\n line endings on Windows.
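
The doubling can be reproduced on any platform with a small sketch.
The write_text helper is made up for illustration; it forces
Windows-style translation by passing newline="\r\n" to
io.TextIOWrapper, so text that already contains "\r\n" comes out as
"\r\r\n":

```python
import io


def write_text(text, newline):
    """Write text through a text layer and return the bytes that hit the file."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()
```

write_text("x\r\n", "\r\n") shows the doubled carriage return, while
write_text("x\r\n", "") leaves the data alone.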

Christian

From r.m.oudkerk at gmail.com  Wed Oct 31 19:19:05 2007
From: r.m.oudkerk at gmail.com (roudkerk)
Date: Wed, 31 Oct 2007 18:19:05 +0000 (UTC)
Subject: [Python-3000] socket GC worries
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com>
	<472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid>
	<ca471dc20710291145w35180095x4893e58560bc1eb7@mail.gmail.com>
	<4726738B.2080106@canterbury.ac.nz>
	<ca471dc20710291715x1e38f090j8521c1a9c4dc8a7a@mail.gmail.com>
Message-ID: <loom.20071031T181427-51@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> 
> 2007/10/29, Greg Ewing <greg.ewing <at> canterbury.ac.nz>:
> > I don't see what's so difficult about this. Each file
> > descriptor should be owned by exactly one object. If
> > two objects need to share a fd, then you dup() it so
> > that each one has its own fd. When the object is
> > close()d or GCed, it closes its fd.
> 
> On Windows you can't dup() a fd.
> 

You can use os.dup() on an fd.  But with sockets you must 
use DuplicateHandle() instead because socket.fileno() returns 
a handle not an fd.
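
For plain fds, the "one owner per descriptor" rule quoted above can be
sketched like this.  OwnedFd is a hypothetical class, and the example
uses a pipe rather than a socket precisely because of the handle issue
just mentioned:

```python
import os


class OwnedFd:
    """Hypothetical wrapper: each instance owns exactly one file descriptor."""

    def __init__(self, fd):
        self.fd = fd

    def dup(self):
        # A second owner gets its own descriptor for the same open file.
        return OwnedFd(os.dup(self.fd))

    def close(self):
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
```

Closing one owner leaves the duplicate usable, which is the property
the socket cleanup code needs.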

socket.py has this comment:

#
# These classes are used by the socket() defined on Windows and BeOS
# platforms to provide a best-effort implementation of the cleanup
# semantics needed when sockets can't be dup()ed.
#
# These are not actually used on other platforms.
#

I don't know whether BeOS still matters to anyone...  I would just
implement _socket.socket.dup() on Windows using DuplicateHandle().

Example of DuplicateHandle():


import ctypes, socket
from _subprocess import *

# send a message to a socket object 'conn'
listener = socket.socket()
listener.bind(('localhost', 0))
listener.listen(1)
client = socket.socket()
client.connect(listener.getsockname())
conn, addr = listener.accept()
client.sendall('hello world')

# duplicate handle
handle = conn.fileno()
duphandle = DuplicateHandle(
    GetCurrentProcess(), handle, GetCurrentProcess(), 
    0, False, DUPLICATE_SAME_ACCESS
    ).Detach()

# use duplicate handle to read the message
buffer = ctypes.c_buffer(20)
ctypes.windll.ws2_32.recv(duphandle, buffer, 20, 0)
print handle, duphandle, buffer.value


BTW, on Windows can we please have a socket.fromfd()
function (or maybe that should be socket.fromhandle())?



From lists at cheimes.de  Wed Oct 31 21:03:54 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Oct 2007 21:03:54 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
References: <ee2a432c0710310013n66af2851u775e35eb7e86f43f@mail.gmail.com>
Message-ID: <4728DFAA.2040604@cheimes.de>

Neal Norwitz wrote:
> Windows has more problems, with these tests failing:
>    test_csv test_dumbdbm test_gettext test_mailbox test_netrc
> test_pep277 test_subprocess
> http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

I've used my new developer privileges to check in some fixes. I'm down
to three failing unit tests on my WinXP (SP2 i386 German, inside a
VMWare sandbox).

3 tests failed:
    test_csv test_mailbox test_netrc

All remaining failures are caused by newline madness.

Christian