From aleaxit at gmail.com  Mon Dec  1 02:14:08 2008
From: aleaxit at gmail.com (Alex Martelli)
Date: Sun, 30 Nov 2008 17:14:08 -0800
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>
	<gguqcv$q87$1@ger.gmane.org> <4932F901.6070803@gmail.com>
	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
	<49330AA9.7070005@gmail.com>
	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
Message-ID: <e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>

On Sun, Nov 30, 2008 at 2:02 PM, Filip Gruszczyński <gruszczy at gmail.com> wrote:
>> Yeah, any time someone implements their own attribute lookup process for
>> a class (be it via __getattr__, __getattribute__ or the C equivalents),
>> it is up to the reimplementation to appropriately format their error
>> message if they raise AttributeError directly.
>
> I guess, this means that I have to go to Phil Thompson at Riverbank
> and try to convince him to change the message.

Yes, but he should be able to change it in one place (in sip, the C++
to Python wrapper generator he's also authored and uses for PyQt) AND
it would make sip even better, so he may want to put it on his
backlog.

Alex

From jyasskin at gmail.com  Mon Dec  1 02:54:02 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sun, 30 Nov 2008 17:54:02 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
	PyEval_EvalFrameEx (2% on pybench)
Message-ID: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>

Tracing support shows up fairly heavily in a Python profile, even
though it's nearly always turned off. The attached patch against the
trunk speeds up PyBench by 2% for me. All tests pass. I have 2
questions:

1) Can other people corroborate this speedup on their machines? I'm
running on a MacBook Pro (Intel Core2 processor, probably Merom) with
a 32-bit build from Apple's gcc-4.0.1. (Apple's gcc consistently
produces a faster python than gcc-4.3.)

2) Assuming this speeds things up for most people, should I check it
in anywhere besides the trunk? I assume it's out for 3.0; is it in for
2.6.1 or 3.0.1?



Pybench output:

-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using CPython 2.7a0 (trunk:67458M, Nov 30 2008, 17:14:10) [GCC 4.0.1 (Apple Inc. build 5488)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

-------------------------------------------------------------------------------
Benchmark: pybench.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.5.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/jyasskin/src/python/trunk-fast-tracing/build/python.exe
       Version:        2.7.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5488)
       Bits:           32bit
       Build:          Nov 30 2008 17:14:10 (#trunk:67458M)
       Unicode:        UCS2


-------------------------------------------------------------------------------
Comparing with: ../build_orig/pybench.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.5.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/jyasskin/src/python/trunk-fast-tracing/build_orig/python.exe
       Version:        2.7.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5488)
       Bits:           32bit
       Build:          Nov 30 2008 13:51:09 (#trunk:67458)
       Unicode:        UCS2


Test                             minimum run-time        average  run-time
                                 this    other   diff    this    other   diff
-------------------------------------------------------------------------------
          BuiltinFunctionCalls:   127ms   130ms   -2.4%   129ms   132ms   -2.1%
           BuiltinMethodLookup:    90ms    93ms   -3.2%    91ms    94ms   -3.1%
                 CompareFloats:    88ms    91ms   -3.3%    89ms    93ms   -4.3%
         CompareFloatsIntegers:    97ms    99ms   -2.1%    97ms   100ms   -2.4%
               CompareIntegers:    79ms    82ms   -4.2%    79ms    85ms   -6.1%
        CompareInternedStrings:    90ms    92ms   -2.4%    94ms    94ms   -0.9%
                  CompareLongs:    86ms    83ms   +3.6%    87ms    84ms   +3.5%
                CompareStrings:    80ms    82ms   -3.1%    81ms    83ms   -2.3%
                CompareUnicode:   103ms   105ms   -2.3%   106ms   108ms   -1.5%
    ComplexPythonFunctionCalls:   139ms   137ms   +1.3%   140ms   139ms   +0.1%
                 ConcatStrings:   142ms   151ms   -6.0%   156ms   154ms   +1.1%
                 ConcatUnicode:    87ms    92ms   -5.4%    89ms    94ms   -5.7%
               CreateInstances:   142ms   144ms   -1.4%   144ms   145ms   -1.1%
            CreateNewInstances:   107ms   109ms   -2.3%   108ms   111ms   -2.1%
       CreateStringsWithConcat:   114ms   137ms  -17.1%   117ms   139ms  -16.0%
       CreateUnicodeWithConcat:    92ms   101ms   -9.2%    95ms   102ms   -7.2%
                  DictCreation:    77ms    81ms   -4.4%    80ms    85ms   -5.9%
             DictWithFloatKeys:    91ms   107ms  -14.5%    93ms   109ms  -14.6%
           DictWithIntegerKeys:    95ms    94ms   +1.4%   108ms    96ms  +12.3%
            DictWithStringKeys:    83ms    88ms   -5.8%    84ms    88ms   -4.7%
                      ForLoops:    72ms    72ms   -0.1%    79ms    74ms   +5.8%
                    IfThenElse:    83ms    80ms   +3.9%    85ms    80ms   +5.3%
                   ListSlicing:   117ms   118ms   -0.7%   118ms   121ms   -1.8%
                NestedForLoops:   116ms   119ms   -2.4%   121ms   121ms   +0.0%
          NormalClassAttribute:   106ms   115ms   -7.7%   108ms   117ms   -7.7%
       NormalInstanceAttribute:    96ms    98ms   -2.3%    97ms   100ms   -3.1%
           PythonFunctionCalls:    92ms    95ms   -3.7%    94ms    99ms   -5.2%
             PythonMethodCalls:   147ms   147ms   +0.1%   152ms   149ms   +2.1%
                     Recursion:   135ms   136ms   -0.3%   140ms   144ms   -2.9%
                  SecondImport:   101ms    99ms   +2.1%   103ms   101ms   +2.2%
           SecondPackageImport:   107ms   103ms   +3.5%   108ms   104ms   +3.3%
         SecondSubmoduleImport:   134ms   134ms   +0.3%   136ms   136ms   -0.0%
       SimpleComplexArithmetic:   105ms   111ms   -5.0%   110ms   112ms   -1.4%
        SimpleDictManipulation:    95ms   106ms  -10.6%    96ms   109ms  -12.0%
         SimpleFloatArithmetic:    90ms    99ms   -9.3%    93ms   102ms   -8.2%
      SimpleIntFloatArithmetic:    78ms    76ms   +2.3%    79ms    77ms   +2.0%
       SimpleIntegerArithmetic:    78ms    77ms   +1.8%    79ms    77ms   +2.0%
        SimpleListManipulation:    80ms    78ms   +2.4%    80ms    79ms   +1.9%
          SimpleLongArithmetic:   110ms   113ms   -2.0%   111ms   113ms   -2.1%
                    SmallLists:   128ms   117ms   +9.5%   130ms   124ms   +4.9%
                   SmallTuples:   115ms   114ms   +1.7%   117ms   114ms   +2.2%
         SpecialClassAttribute:   101ms   112ms  -10.3%   104ms   114ms   -8.9%
      SpecialInstanceAttribute:   173ms   177ms   -1.9%   176ms   179ms   -1.6%
                StringMappings:   165ms   167ms   -1.2%   168ms   169ms   -0.5%
              StringPredicates:   126ms   134ms   -5.7%   127ms   134ms   -5.6%
                 StringSlicing:   125ms   123ms   +1.9%   131ms   130ms   +0.7%
                     TryExcept:    79ms    80ms   -0.6%    80ms    80ms   -0.8%
                    TryFinally:   110ms   107ms   +3.0%   111ms   112ms   -1.1%
                TryRaiseExcept:    99ms   101ms   -1.6%   100ms   102ms   -1.7%
                  TupleSlicing:   127ms   127ms   +0.6%   137ms   137ms   +0.0%
               UnicodeMappings:   144ms   144ms   -0.3%   145ms   145ms   -0.4%
             UnicodePredicates:   116ms   114ms   +1.3%   117ms   115ms   +1.1%
             UnicodeProperties:   106ms   102ms   +3.6%   107ms   104ms   +3.1%
                UnicodeSlicing:    95ms   111ms  -14.0%    99ms   112ms  -11.8%
                   WithFinally:   157ms   152ms   +3.3%   159ms   154ms   +3.3%
               WithRaiseExcept:   123ms   125ms   -1.1%   125ms   126ms   -1.2%
-------------------------------------------------------------------------------
Totals:                          6043ms  6182ms   -2.2%  6185ms  6301ms   -1.9%

(this=pybench.out, other=../build_orig/pybench.out)


2to3 times:

Before:
$ time ./python.exe ~/src/2to3/2to3 -f all ~/src/2to3/ >/dev/null
real	0m56.685s
user	0m55.620s
sys	0m0.380s

After:
$ time ./python.exe ~/src/2to3/2to3 -f all ~/src/2to3/ >/dev/null
real	0m55.067s
user	0m53.843s
sys	0m0.376s

== 3% faster
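
For reference, the percentage can be recomputed from the `time` output above. This is only a small sketch of the arithmetic; the helper names `percent_faster` and `to_seconds` are made up for illustration. With the numbers shown, wall-clock ("real") time drops by roughly 2.9% and user time by roughly 3.2%, which rounds to the 3% quoted.

```c
#include <assert.h>

/* Percentage by which `after` is shorter than `before` (both in seconds). */
static double percent_faster(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

/* Convert a `time` reading of minutes plus seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}
```

For example, `percent_faster(to_seconds(0, 56.685), to_seconds(0, 55.067))` gives the "real" figure, and the same call with 55.620 and 53.843 gives the "user" figure.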


Gory details:

The meat of the patch is:
@@ -884,11 +891,12 @@
 	fast_next_opcode:
 		f->f_lasti = INSTR_OFFSET();

 		/* line-by-line tracing support */

-		if (tstate->c_tracefunc != NULL && !tstate->tracing) {
+		if (_Py_TracingPossible &&
+		    tstate->c_tracefunc != NULL && !tstate->tracing) {


This converts the generated assembly (produced with `gcc -S -dA ...`,
then manually annotated a bit) from:

	# basic block 17
	# ../Python/ceval.c:885
LM541:
	movl	8(%ebp), %ecx
LVL319:
	subl	-316(%ebp), %edx
	movl	%edx, 60(%ecx)
	# ../Python/ceval.c:889
LM542:
# %esi = tstate
	movl	-336(%ebp), %esi
LVL320:
# %eax = tstate->c_tracefunc
	movl	28(%esi), %eax
LVL321:
# if tstate->c_tracefunc == 0
	testl	%eax, %eax
# goto past-if ()
	je	L567
# more if conditions here

to:

	# basic block 17
	# ../Python/ceval.c:889
LM542:
	movl	8(%ebp), %ecx
LVL319:
	subl	-316(%ebp), %edx
	movl	%edx, 60(%ecx)
	# ../Python/ceval.c:893
LM543:
# %eax = _Py_TracingPossible
	movl	__Py_TracingPossible-"L00000000033$pb"(%ebx), %eax
LVL320:
# if _Py_TracingPossible != 0
	testl	%eax, %eax
# goto rest-of-if (nearby)
	jne	L2321
# opcode = NEXTOP(); continues here


The branch should be predicted accurately either way, so there are 2
things that may be contributing to the performance change.

First, adding the global caching variable halves the amount of memory
that has to be read to check the prediction. The memory that is read
is still read one instruction before it's used, but adding a local
variable to read the memory earlier doesn't affect the performance.
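
To make the caching idea concrete, here is a minimal sketch. It is not the attached patch: `thread_state`, `tracing_possible`, `set_trace` and `should_trace` are hypothetical stand-ins, and it assumes the global is kept as a count of threads that currently have a trace function installed. The point is only that the hot path tests one global word before it touches any per-thread state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread interpreter state. */
typedef int (*trace_func)(void *arg);

struct thread_state {
    trace_func c_tracefunc; /* NULL when this thread is not traced */
    int tracing;            /* nonzero while a trace function is running */
};

/* Assumed global cache: number of threads with a trace function installed. */
static int tracing_possible = 0;

/* Install or clear a trace function, keeping the global count in sync. */
static void set_trace(struct thread_state *ts, trace_func func)
{
    tracing_possible += (func != NULL) - (ts->c_tracefunc != NULL);
    assert(tracing_possible >= 0);
    ts->c_tracefunc = func;
}

/* Hot-path check: the global is tested first, so the common non-tracing
   case reads one word and never dereferences the thread state. */
static int should_trace(const struct thread_state *ts)
{
    return tracing_possible && ts->c_tracefunc != NULL && !ts->tracing;
}

/* Trivial trace function used only to exercise the sketch. */
static int dummy_trace(void *arg)
{
    (void)arg;
    return 0;
}
```

Because `&&` short-circuits, `should_trace` falls through after the first test whenever no thread anywhere is being traced, which is the usual case.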

Without the global variable, the compiler puts the tracing code
immediately after the if; with the global, it moves it away and puts
the non-tracing code immediately after the first test in the if. This
may affect branch prediction and may affect the icache. I tried using
gcc's __builtin_expect() to ensure that the tracing code is always
out-of-line. This moved it much farther away and cost about 1% in
performance (i.e. 1% instead of 2% faster than "before"). I don't know
why the __builtin_expect() version would be slower. If anyone feels
inspired to test this out on another processor or compiler version,
let me know how it goes.
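
For anyone who wants to repeat the __builtin_expect() experiment, the hint looks roughly like the sketch below. The `UNLIKELY` macro, the `step` function and the two counters are illustrative names only, not code from the patch; the counters just stand in for the tracing code and the normal opcode dispatch. How far away the compiler moves the cold branch, and whether that helps, depends on the compiler and processor.

```c
#include <assert.h>

/* GCC/Clang extension: mark the condition as expected to be false.
   Other compilers fall back to the plain condition. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical counters standing in for the two code paths. */
static int slow_path_hits = 0;
static int fast_path_hits = 0;

/* One dispatch step; the tracing branch is hinted as cold so the compiler
   may place it out of line, away from the fall-through path. */
static void step(int tracing_possible)
{
    assert(tracing_possible >= 0);
    if (UNLIKELY(tracing_possible)) {
        slow_path_hits++;   /* stands in for the tracing code */
        return;
    }
    fast_path_hits++;       /* stands in for normal opcode dispatch */
}
```

The hint changes only code layout, never behavior, so the two builds can be compared directly with the same benchmark runs.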

Jeffrey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fast-tracing.diff
Type: application/octet-stream
Size: 1658 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20081130/1a858073/attachment.obj>

From brett at python.org  Mon Dec  1 05:14:45 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 30 Nov 2008 20:14:45 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
	PyEval_EvalFrameEx (2% on pybench)
In-Reply-To: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
References: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
Message-ID: <bbaeab100811302014x74dd9ba6je5b65c3cb0ce4e6b@mail.gmail.com>

Can you toss the patch into the issue tracker, Jeffrey, so that any
patch comments can be done there?

-Brett

From jyasskin at gmail.com  Mon Dec  1 05:34:27 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sun, 30 Nov 2008 20:34:27 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
	PyEval_EvalFrameEx (2% on pybench)
In-Reply-To: <bbaeab100811302014x74dd9ba6je5b65c3cb0ce4e6b@mail.gmail.com>
References: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
	<bbaeab100811302014x74dd9ba6je5b65c3cb0ce4e6b@mail.gmail.com>
Message-ID: <5d44f72f0811302034u541a5021l6420c8bdd3f2b0ba@mail.gmail.com>

Done: http://bugs.python.org/issue4477

On Sun, Nov 30, 2008 at 8:14 PM, Brett Cannon <brett at python.org> wrote:
> Can you toss the patch into the issue tracker, Jeffrey, so that any
> patch comments can be done there?
>
> -Brett
>> performance (i.e. 1% instead of 2% faster than "before"). I don't know
>> why the __builtin_expect() version would be slower. If anyone feels
>> inspired to test this out on another processor or compiler version,
>> let me know how it goes.
>>
>> Jeffrey
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> http://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>>
>>
>



-- 
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/
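
As a quick check of the "== 3% faster" figure for the 2to3 run, here is a
minimal Python sketch that recomputes the relative gain from the quoted
`time` output. The helper name percent_faster is made up for illustration;
only the four timings come from the message.

```python
def percent_faster(before_s, after_s):
    """Return how much shorter after_s is than before_s, in percent."""
    return (before_s - after_s) / before_s * 100.0


# Timings copied from the "Before" and "After" runs quoted above.
real_gain = percent_faster(56.685, 55.067)
user_gain = percent_faster(55.620, 53.843)

print("real: %.1f%% faster, user: %.1f%% faster" % (real_gain, user_gain))
```

Both the wall-clock and the user-time figures land close to the rounded 3%
quoted in the message.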

From gruszczy at gmail.com  Mon Dec  1 10:30:27 2008
From: gruszczy at gmail.com (=?UTF-8?Q?Filip_Gruszczy=C5=84ski?=)
Date: Mon, 1 Dec 2008 10:30:27 +0100
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>
	<gguqcv$q87$1@ger.gmane.org> <4932F901.6070803@gmail.com>
	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
	<49330AA9.7070005@gmail.com>
	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>
Message-ID: <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>

> Yes, but he should be able to change it in one place (in sip, the C++
> to Python wrapper generator he's also authored and uses for PyQt) AND
> it would make sip even better, so he may want to put it on his
> backlog.

He does. It is supposed to appear in 4.8. So I guess that's it, thanks
a lot for your help.

-- 
Filip Gruszczyński

From kristjan at ccpgames.com  Mon Dec  1 16:32:24 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Mon, 1 Dec 2008 15:32:24 +0000
Subject: [Python-Dev] Python under valgrind
In-Reply-To: <492FF774.1050101@avl.com>
References: <492FE636.50905@avl.com>
	<e27efe130811280455p2c037a77g367b81d099164f47@mail.gmail.com>
	<e27efe130811280456y6316f962o6ed67fe810b2b1c2@mail.gmail.com>
	<492FF774.1050101@avl.com>
Message-ID: <4E9372E6B2234D4F859320D896059A9510E0D7D122@exchis.ccp.ad.local>

Probably because of the object memory allocator.  It reads the start of memory pages to see if a block belongs to the obmalloc system or not.
You want to remove the following line:
#define WITH_PYMALLOC 1
from pyconfig.h if you intend to run using valgrind or, say, Purify.
K

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Hrvoje Niksic
Sent: 28. nóvember 2008 13:52
Cc: Python-Dev
Subject: Re: [Python-Dev] Python under valgrind

Amaury Forgeot d'Arc wrote:
> Did you use the suppressions file as suggested in Misc/README.valgrind?

Thanks for the suggestion (as well as to Gustavo and Victor), but my 
question wasn't about how to suppress the messages, but about why the 
messages appear in the first place.  I think my last paragraph answers 
my own question, but I'm not sure.


From aleaxit at gmail.com  Mon Dec  1 16:35:34 2008
From: aleaxit at gmail.com (Alex Martelli)
Date: Mon, 1 Dec 2008 07:35:34 -0800
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>
	<gguqcv$q87$1@ger.gmane.org> <4932F901.6070803@gmail.com>
	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
	<49330AA9.7070005@gmail.com>
	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>
	<1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
Message-ID: <e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>

I wonder if there's some desiderata left for future Python versions to
make this standard behavior easier (for C-coded, Python-coded, and
Cython-coded classes, ones made by SWIG, etc) without too much black
magic...

Alex

On Mon, Dec 1, 2008 at 1:30 AM, Filip Gruszczyński <gruszczy at gmail.com> wrote:
>> Yes, but he should be able to change it in one place (in sip, the C++
>> to Python wrapper generator he's also authored and uses for PyQt) AND
>> it would make sip even better, so he may want to put it on his
>> backlog.
>
> He does. It is supposed to appear in 4.8. So I guess that's it, thanks
> a lot for your help.
>
> --
> Filip Gruszczyński
>

From dinov at microsoft.com  Mon Dec  1 18:56:08 2008
From: dinov at microsoft.com (Dino Viehland)
Date: Mon, 1 Dec 2008 09:56:08 -0800
Subject: [Python-Dev] format specification mini-language docs...
In-Reply-To: <492BF1C4.4050807@trueblade.com>
References: <350E7D38B6D819428718949920EC235556486D00A7@NA-EXMSG-C102.redmond.corp.microsoft.com>
	<492BF1C4.4050807@trueblade.com>
Message-ID: <350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>

Yep, after the thanksgiving delay I've opened bug #4482 (http://bugs.python.org/issue4482).

I either don't know how to or don't have the power to change who a bug is assigned to so it appears to be currently unassigned.

-----Original Message-----
From: Eric Smith [mailto:eric at trueblade.com]
Sent: Tuesday, November 25, 2008 4:38 AM
To: Dino Viehland
Cc: python-dev at python.org dev
Subject: Re: [Python-Dev] format specification mini-language docs...

Dino Viehland wrote:

<previously discussed cases deleted>
> Finally providing any sign character seems to cause +1.0#INF and friends to be returned instead of inf as is documented:
>
>>>> 10e667.__format__('+')
> '+1.0#INF'
>>>> 10e667.__format__('')
> 'inf'
>
>
> Are these just doc bugs?  The inf issue is the only one that seems particularly weird to me.

I think the inf one is a bug. Would you mind opening a bug and assigning
it to me? Thanks.

Eric.
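
For comparison, a minimal sketch of what the documented behaviour looks
like: a sign option should only put a sign in front of the
platform-independent 'inf', never expose the C runtime's '1.0#INF'
spelling. This assumes a CPython release in which the issue reported here
is fixed; the variable names are illustrative only.

```python
# 10e667 overflows a C double, so the literal evaluates to positive infinity.
value = 10e667

plain = format(value, "")    # same as value.__format__("")
signed = format(value, "+")  # explicit sign requested

print(repr(plain), repr(signed))
```

On such a release both results should differ only by the leading '+'.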



From eric at trueblade.com  Mon Dec  1 19:15:03 2008
From: eric at trueblade.com (Eric Smith)
Date: Mon, 01 Dec 2008 13:15:03 -0500
Subject: [Python-Dev] format specification mini-language docs...
In-Reply-To: <350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>
References: <350E7D38B6D819428718949920EC235556486D00A7@NA-EXMSG-C102.redmond.corp.microsoft.com>	<492BF1C4.4050807@trueblade.com>
	<350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>
Message-ID: <493429A7.4020004@trueblade.com>

Dino Viehland wrote:
> Yep, after the thanksgiving delay I've opened bug #4482 (http://bugs.python.org/issue4482).

Thanks!

> I either don't know how to or don't have the power to change who a bug is assigned to so it appears to be currently unassigned.

I'll take care of it.

Eric.

> -----Original Message-----
> From: Eric Smith [mailto:eric at trueblade.com]
> Sent: Tuesday, November 25, 2008 4:38 AM
> To: Dino Viehland
> Cc: python-dev at python.org dev
> Subject: Re: [Python-Dev] format specification mini-language docs...
> 
> Dino Viehland wrote:
> 
> <previously discussed cases deleted>
>> Finally providing any sign character seems to cause +1.0#INF and friends to be returned instead of inf as is documented:
>>
>>>>> 10e667.__format__('+')
>> '+1.0#INF'
>>>>> 10e667.__format__('')
>> 'inf'
>>
>>
>> Are these just doc bugs?  The inf issue is the only one that seems particularly weird to me.
> 
> I think the inf one is a bug. Would you mind opening a bug and assigning
> it to me? Thanks.
> 
> Eric.


From gerald.koenig at hp.com  Mon Dec  1 20:05:01 2008
From: gerald.koenig at hp.com (Koenig, Gerald)
Date: Mon, 1 Dec 2008 19:05:01 +0000
Subject: [Python-Dev] Python for windows.
In-Reply-To: <49331277.10003@v.loewis.de>
References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com>
	<238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net>
	<ggkat4$2q5$1@ger.gmane.org>
	<238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net>
	<042401c95012$3bff99d0$b3fecd70$@com.au>	<492DCE5E.5080602@v.loewis.de>
	<043e01c95019$9955a0a0$cc00e1e0$@com.au>	<492DDE40.2040206@v.loewis.de>
	<045801c9503a$8e85d2f0$ab9178d0$@com.au>	<492F2788.7040300@canterbury.ac.nz>
	<04ce01c9510d$3132d750$939885f0$@com.au>	<492FBB2C.5000309@gmail.com>
	<053201c95337$fa09e930$ee1dbb90$@com.au> <49331277.10003@v.loewis.de>
Message-ID: <238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net>

Hi all,

I didn't look at the thread until this morning.

The OEM Ready program requires that the installer force installation into Program Files.
But as we preinstall, we use your msi with a normal parameter: python-2.5.2.msi TARGETDIR="c:\program files\python"

That's why I didn't ask you about that.

We have already done a few weeks of testing and nothing has broken up to now :)

Now, about the 2 other issues, what would be the easiest way to fix them properly?
- For the executable without a manifest: as we are on Vista only, I can add a manifest for Vista outside the executable; it should work.
- For python_icon.exe: I do not know what is calling it in the Start menu. Can you help me with that?

Gerald

-----Original Message-----
From: python-dev-bounces+gerald.koenig=hp.com at python.org [mailto:python-dev-bounces+gerald.koenig=hp.com at python.org] On Behalf Of "Martin v. Löwis"
Sent: Sunday, November 30, 2008 2:24 PM
To: mhammond at skippinet.com.au
Cc: 'Nick Coghlan'; python-dev at python.org
Subject: Re: [Python-Dev] Python for windows.

> Of course, I don't object to that and still think we should help where we
> can, but if that is true it would make the premise of this thread a little
> misleading, as obviously HP could then make *any* necessary changes without
> our agreement or even knowledge.

Perhaps. However, "help where we can" is about right. If it's only the
changes HP discussed so far, I think we should be able to help.
For the Program Files issue, without going into the discussion whether
Python's defaults are good or not, I think there would be still a number
of technical solutions (such as providing a merge module which changes
the default).

Regards,
Martin

From gerald.koenig at hp.com  Mon Dec  1 20:08:10 2008
From: gerald.koenig at hp.com (Koenig, Gerald)
Date: Mon, 1 Dec 2008 19:08:10 +0000
Subject: [Python-Dev] Python for windows.
References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com>
	<238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net>
	<ggkat4$2q5$1@ger.gmane.org>
	<238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net>
	<042401c95012$3bff99d0$b3fecd70$@com.au>	<492DCE5E.5080602@v.loewis.de>
	<043e01c95019$9955a0a0$cc00e1e0$@com.au>	<492DDE40.2040206@v.loewis.de>
	<045801c9503a$8e85d2f0$ab9178d0$@com.au>	<492F2788.7040300@canterbury.ac.nz>
	<04ce01c9510d$3132d750$939885f0$@com.au>	<492FBB2C.5000309@gmail.com>
	<053201c95337$fa09e930$ee1dbb90$@com.au> <49331277.10003@v.loewis.de> 
Message-ID: <238A96A773B3934685A7269CC8A8D0423F80619F9F@GVW0436EXB.americas.hpqcorp.net>

Mark,

We do not install that on first boot.

I cannot tell how it is installed, but on first boot Python is already there and installed properly.

Gerald




From ncoghlan at gmail.com  Mon Dec  1 22:20:36 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 02 Dec 2008 07:20:36 +1000
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>	
	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>	
	<gguqcv$q87$1@ger.gmane.org> <4932F901.6070803@gmail.com>	
	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>	
	<49330AA9.7070005@gmail.com>	
	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>	
	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>	
	<1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
	<e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>
Message-ID: <49345524.2090409@gmail.com>

Alex Martelli wrote:
> I wonder if there's some desiderata left for future Python versions to
> make this standard behavior easier (for C-coded, Python-coded, and
> Cython-coded classes, ones made by SWIG, etc) without too much black
> magic...

Perhaps adding something like the following to the C API:

void PyErr_FormatAttributeError(PyTypeObject *type, const char *attr)
{
  /* tp_name lives on PyTypeObject, so take the type object directly */
  PyErr_Format(PyExc_AttributeError,
    "object of type %.100s has no attribute '%.200s'",
    type->tp_name, attr);
}

This could also be exposed as a class method of AttributeError itself
for use in Python code.
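
A rough Python-level sketch of that idea, for illustration only. Since a
class method cannot be added to the built-in AttributeError from pure
Python, this uses a hypothetical subclass, AttributeErrorWithType, with a
hypothetical for_type() constructor; neither name exists in the standard
library. The message format mirrors the C sketch above.

```python
class AttributeErrorWithType(AttributeError):
    """Hypothetical stand-in for an AttributeError with a standard formatter."""

    @classmethod
    def for_type(cls, type_, attr):
        # Same truncation limits as the proposed C helper.
        return cls("object of type %.100s has no attribute '%.200s'"
                   % (type_.__name__, attr))


class Proxy:
    """Example class with its own attribute lookup."""

    def __getattr__(self, name):
        # Called only when normal lookup fails.
        raise AttributeErrorWithType.for_type(type(self), name)


try:
    Proxy().missing
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Because the subclass is still an AttributeError, hasattr() and
getattr(obj, name, default) keep working for classes that raise it.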

(Interestingly, I noticed that there are still quite a few attribute
errors at least in typeobject.c that don't provide any information on
the type of the object that is missing an attribute - they appeared to
mostly be obscure errors that will only turn up if something has gone
very strange, but they're there)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Mon Dec  1 23:55:37 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 01 Dec 2008 23:55:37 +0100
Subject: [Python-Dev] Python for windows.
In-Reply-To: <238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net>
References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com>	<238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net>	<ggkat4$2q5$1@ger.gmane.org>	<238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net>	<042401c95012$3bff99d0$b3fecd70$@com.au>	<492DCE5E.5080602@v.loewis.de>	<043e01c95019$9955a0a0$cc00e1e0$@com.au>	<492DDE40.2040206@v.loewis.de>	<045801c9503a$8e85d2f0$ab9178d0$@com.au>	<492F2788.7040300@canterbury.ac.nz>	<04ce01c9510d$3132d750$939885f0$@com.au>	<492FBB2C.5000309@gmail.com>	<053201c95337$fa09e930$ee1dbb90$@com.au>
	<49331277.10003@v.loewis.de>
	<238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net>
Message-ID: <49346B69.1030901@v.loewis.de>

> The OEM ready program required that the installed force to program
> files. But as we preinstalled we use your msi with a normal
> parameter: python-2.5.2.msi TARGETDIR=c:\program files\python"

I think the debate was about whether it can be "OEM ready",
even though you still need to pass the TARGETDIR parameter.

If it works for you, it works for me, of course.

> Now about the 2 others issues what will be the easier way to fix them
> properly ? - for the executable without manifest as we are on vista
> OS only I can add a manifest for vista outside the executable it
> should work.

Please do submit an issue in the bug tracker at least, asking that the
files be renamed. Please confirm explicitly that renaming them would
also solve the problem (assuming you are still talking about the files
in distutils).

> for python_icon.exe I do not know what is calling it
> in start menu can you help me on that ?

Please look in Tools/msi/msi.py for all occurrences of python_icon.exe.

Regards,
Martin

From martin at v.loewis.de  Tue Dec  2 00:10:57 2008
From: martin at v.loewis.de (=?ISO-8859-2?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 02 Dec 2008 00:10:57 +0100
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>	<gguqcv$q87$1@ger.gmane.org>
	<4932F901.6070803@gmail.com>	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>	<49330AA9.7070005@gmail.com>	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>	<1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
	<e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>
Message-ID: <49346F01.2030607@v.loewis.de>

Alex Martelli wrote:
> I wonder if there's some desiderata left for future Python versions to
> make this standard behavior easier (for C-coded, Python-coded, and
> Cython-coded classes, ones made by SWIG, etc) without too much black
> magic...

I think the standard exception hierarchy should grow additional standard
fields. E.g. AttributeError should have attributes 'type','name', or
perhaps even 'object','name'. TypeError should have attributes
'expected', 'actual' (or, again, 'expected', 'object'). Also, some
languages support nested exceptions (attribute 'inner'); usefulness of
this concept should be reviewed.

And so on - that might produce quite a large PEP. As 3.0 missed the
chance to fix this, compatibility is also an issue. It might be possible
to overload exception constructors on the number of parameters, or using
keyword parameters for the new way of filling the exception.
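
To make the keyword-parameter variant concrete, here is a minimal sketch
under stated assumptions: TypedAttributeError is a hypothetical subclass,
and the keyword names 'type' and 'name' simply follow the attribute names
suggested above. Existing callers that pass a plain message positionally
keep working unchanged.

```python
class TypedAttributeError(AttributeError):
    """Hypothetical AttributeError carrying structured fields."""

    def __init__(self, *args, type=None, name=None):
        # 'type' deliberately shadows the builtin here to match the
        # proposed attribute name; it is only used inside this method.
        if not args and type is not None and name is not None:
            # New style: build the standard message from the fields.
            args = ("object of type %s has no attribute %r"
                    % (type.__name__, name),)
        super().__init__(*args)
        self.type = type
        self.name = name


new_style = TypedAttributeError(type=int, name="foo")
old_style = TypedAttributeError("some legacy message")

print(new_style, "|", new_style.type, new_style.name)
print(old_style, "|", old_style.type, old_style.name)
```

Storing only the type, not the object, avoids keeping the failing instance
alive for as long as the exception is referenced.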

And no, I don't volunteer to write this PEP :-)

Regards,
Martin

From tom at vector-seven.com  Tue Dec  2 07:57:11 2008
From: tom at vector-seven.com (Thomas Lee)
Date: Tue, 02 Dec 2008 17:57:11 +1100
Subject: [Python-Dev] Move encoding_decl to the top of Grammar/Grammar?
Message-ID: <4934DC47.6040508@vector-seven.com>

Hi all,

Currently, Parser/parsetok.c has a dependency on graminit.h. This can 
cause headaches when rebuilding after adding new syntax  to 
Grammar/Grammar because parsetok.c is part of pgen, which is responsible 
for *generating* graminit.h.

This circular dependency can result in parsetok.c using a different 
value for encoding_decl to what is used in ast.c, which causes 
PyAST_FromNode to fall over at runtime. It effectively looks something 
like this:

* Grammar/Grammar is modified
* build begins -- pgen compiles, parsetok.c uses encoding_decl=X
* graminit.h is rebuilt with encoding_decl=Y
* ast.c is compiled using encoding_decl=Y
* when python runs, parsetok() emits encoding_decl nodes that 
PyAST_FromNode can't recognize:

SystemError: invalid node XXX for PyAST_FromNode

A nice, easy short term solution that doesn't require unwinding this 
dependency would be to simply move encoding_decl to the top of 
Grammar/Grammar and add a big warning noting that it needs to come 
before everything else. This will help to ensure its value never changes 
when syntax is added/removed.
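
A small Python sketch of why the position matters. It assumes, as a
simplification, that nonterminals are numbered sequentially from 256
(NT_OFFSET) in the order they appear in Grammar/Grammar; the rule lists
below are made-up, shortened stand-ins for the real grammar.

```python
NT_OFFSET = 256  # number given to the first nonterminal


def number_rules(rule_names):
    """Assign sequential numbers to rules in order of appearance."""
    return {name: NT_OFFSET + i for i, name in enumerate(rule_names)}


# encoding_decl at the end: adding a rule before it shifts its number.
old_tail = number_rules(["single_input", "file_input", "stmt",
                         "encoding_decl"])
new_tail = number_rules(["single_input", "file_input", "stmt",
                         "new_stmt", "encoding_decl"])

# encoding_decl at the top: its number no longer depends on other rules.
old_head = number_rules(["encoding_decl", "single_input", "file_input",
                         "stmt"])
new_head = number_rules(["encoding_decl", "single_input", "file_input",
                         "stmt", "new_stmt"])

print("at end:", old_tail["encoding_decl"], "->", new_tail["encoding_decl"])
print("at top:", old_head["encoding_decl"], "->", new_head["encoding_decl"])
```

With the rule at the top, a stale parsetok.c and a freshly built ast.c
would still agree on the value, which is the point of the workaround.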

I'm happy to provide a patch for this (including some additional 
dependency info for files dependent upon graminit.h and Python-ast.h), 
but was wondering if there were any opinions about how this should be 
resolved.

Cheers,
Tom

From tom at vector-seven.com  Tue Dec  2 09:00:16 2008
From: tom at vector-seven.com (Thomas Lee)
Date: Tue, 02 Dec 2008 19:00:16 +1100
Subject: [Python-Dev] Move encoding_decl to the top of Grammar/Grammar?
In-Reply-To: <4934DC47.6040508@vector-seven.com>
References: <4934DC47.6040508@vector-seven.com>
Message-ID: <4934EB10.2010100@vector-seven.com>

Here's the corresponding tracker issue:

http://bugs.python.org/issue4347

I've uploaded a patch there anyway, since I'm going to need this stuff 
working for a presentation I'm giving tomorrow.

Cheers,
T

Thomas Lee wrote:
> Hi all,
>
> Currently, Parser/parsetok.c has a dependency on graminit.h. This can 
> cause headaches when rebuilding after adding new syntax  to 
> Grammar/Grammar because parsetok.c is part of pgen, which is 
> responsible for *generating* graminit.h.
>
> This circular dependency can result in parsetok.c using a different 
> value for encoding_decl to what is used in ast.c, which causes 
> PyAST_FromNode to fall over at runtime. It effectively looks something 
> like this:
>
> * Grammar/Grammar is modified
> * build begins -- pgen compiles, parsetok.c uses encoding_decl=X
> * graminit.h is rebuilt with encoding_decl=Y
> * ast.c is compiled using encoding_decl=Y
> * when python runs, parsetok() emits encoding_decl nodes that 
> PyAST_FromNode can't recognize:
>
> SystemError: invalid node XXX for PyAST_FromNode
>
> A nice, easy short term solution that doesn't require unwinding this 
> dependency would be to simply move encoding_decl to the top of 
> Grammar/Grammar and add a big warning noting that it needs to come 
> before everything else. This will help to ensure its value never 
> changes when syntax is added/removed.
>
> I'm happy to provide a patch for this (including some additional 
> dependency info for files dependent upon graminit.h and Python-ast.h), 
> but was wondering if there were any opinions about how this should be 
> resolved.
>
> Cheers,
> Tom


From ncoghlan at gmail.com  Tue Dec  2 11:37:56 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 02 Dec 2008 20:37:56 +1000
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <49346F01.2030607@v.loewis.de>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>	<gguqcv$q87$1@ger.gmane.org>
	<4932F901.6070803@gmail.com>	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>	<49330AA9.7070005@gmail.com>	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>	<1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
	<e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>
	<49346F01.2030607@v.loewis.de>
Message-ID: <49351004.9090208@gmail.com>

Martin v. Löwis wrote:
> Alex Martelli wrote:
> I think the standard exception hierarchy should grow additional standard
> fields. E.g. AttributeError should have attributes 'type','name', or
> perhaps even 'object','name'. TypeError should have attributes
> 'expected', 'actual' (or, again, 'expected', 'object').

> And so on - that might produce quite a large PEP.

I don't think there's any reason to do it in one big bang. And
approached individually, each such alternate constructor is a small RFE
consisting of:

1. Specific C API for creating exceptions of that type with a standard
message and attributes
2. Python level class method
3. New attributes on the affected object

Point 3 would be optional really, since most of the gain comes from the
better error messages. If extra attributes were included in such an RFE,
the potential lifecycle implications of including references to actual
objects rather than merely their types makes the better choice fairly
obvious to me (i.e. just include the type information, since it
generally tells you everything you need to know for TypeErrors and
AttributeErrors).

> As 3.0 missed the
> chance to fix this, compatibility is also an issue. It might be possible
> to overload exception constructors on the number of parameters, or using
> keyword parameters for the new way of filling the exception.

Or go the traditional "multiple constructor" route and provide class
methods for the alternative mechanisms.

> And no, I don't volunteer to write this PEP :-)

Assuming I understand what you mean by "nested exceptions" correctly,
they should be covered by the __context__ and __cause__ attributes in Py3k:

Exception context:
===========================
>>> try:
...   raise IOError
... except:
...   raise AttributeError
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AttributeError
===========================

Exception cause:
===========================
>>> raise AttributeError from KeyError
KeyError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError
===========================


Putting it all together:
===========================
>>> try:
...   raise IOError
... except:
...   try:
...     raise KeyError
...   except Exception as ex:
...     raise AttributeError from ex
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
KeyError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
AttributeError
===========================
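
The same chain can also be inspected programmatically rather than read
off the traceback. A minimal sketch (the function name chained and the
message strings are just for illustration):

```python
def chained():
    try:
        raise IOError("first")
    except IOError:
        try:
            raise KeyError("second")
        except KeyError as ex:
            # Explicit chaining: sets __cause__ on the new exception.
            raise AttributeError("third") from ex


try:
    chained()
except AttributeError as err:
    caught = err

cause = caught.__cause__      # the KeyError named in "from ex"
implicit = cause.__context__  # the IOError being handled when KeyError was raised

print(type(caught).__name__, "<-", type(cause).__name__,
      "<-", type(implicit).__name__)
```

Here __cause__ records the explicit "raise ... from ..." link, while
__context__ records whatever exception was being handled at the time.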

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From barry at python.org  Tue Dec  2 21:31:58 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 2 Dec 2008 15:31:58 -0500
Subject: [Python-Dev] Tomorrow's releases
Message-ID: <A55F56A2-6A2C-4578-ACCF-8D24D4DA985F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I believe we are on track for releasing Python 3.0 final and 2.6.1  
tomorrow.  There is just one release blocker for 3.0 left -- Guido  
needs to finish the What's New for 3.0.

This is bug 2306.

So that Martin can have something to work with when he wakes up  
tomorrow morning, I would like to tag and branch the tree some time  
today, Tuesday 02-Dec US/Eastern.  Therefore I am freezing both the  
2.6 and 3.0 trees, with special dispensation to Guido for the updated  
What's New.

Ping me on irc @ freenode #python-dev if you have anything else to  
check in to either tree before then.  As soon as I hear from Guido, or  
issue 2306 is closed, I'm branching 3.0 and tagging it for release.

Great work everyone, we're almost there!
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTWbPnEjvBPtnXfVAQKtOgP9EZgGkE8/UY1IRn7j0l6vX6uqbPapg+9H
MlBIZrA6mEbGiaDSvPRiwuo71jP5cg0u/xFRdDlGYl0GAzOEWvKCZVlVsndM4kbh
7UxHjHfkIOo4MUw4zz1NrJ4GRNgBQa52OOtiOKKkIhr/oMsg+GWv8Y9hRXYA9xue
s8as2AQe2QU=
=5j55
-----END PGP SIGNATURE-----

From steve at holdenweb.com  Wed Dec  3 04:39:23 2008
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 02 Dec 2008 22:39:23 -0500
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <49351004.9090208@gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>	<aac2c7cb0811301106y3a1bbcbbt705365f37be4f548@mail.gmail.com>	<gguqcv$q87$1@ger.gmane.org>	<4932F901.6070803@gmail.com>	<1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>	<49330AA9.7070005@gmail.com>	<1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>	<e8a0972d0811301714x263aebe0nf729f045f928a29f@mail.gmail.com>	<1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>	<e8a0972d0812010735i6feede8ao7068b23b11db3dc0@mail.gmail.com>	<49346F01.2030607@v.loewis.de>
	<49351004.9090208@gmail.com>
Message-ID: <gh4v13$ibo$1@ger.gmane.org>

Nick Coghlan wrote:
> Martin v. Löwis wrote:
>> Alex Martelli wrote:
>> I think the standard exception hierarchy should grow additional standard
>> fields. E.g. AttributeError should have attributes 'type','name', or
>> perhaps even 'object','name'. TypeError should have attributes
>> 'expected', 'actual' (or, again, 'expected', 'object').
> 
>> And so on - that might produce quite a large PEP.
> 
> I don't think there's any reason to do it in one big bang. And
> approached individually, each such alternate constructor is a small RFE
> consisting of:
> 
> 1. Specific C API for creating exceptions of that type with a standard
> message and attributes
> 2. Python level class method
> 3. New attributes on the affected object
> 
> Point 3 would be optional really, since most of the gain comes from the
> better error messages. If extra attributes were included in such an RFE,
> the potential lifecycle implications of including references to actual
> objects rather than merely their types makes the better choice fairly
> obvious to me (i.e. just include the type information, since it
> generally tells you everything you need to know for TypeErrors and
> AttributeErrors).
> 
>> As 3.0 missed the
>> chance to fix this, compatibility is also an issue. It might be possible
>> to overload exception constructors on the number of parameters, or using
>> keyword parameters for the new way of filling the exception.
> 
> Or go the traditional "multiple constructor" route and provide class
> methods for the alternative mechanisms.
> 
Bear in mind, though, that as new functionality none of this can go in
before 3.1/2.7. So a PEP might not be a bad idea if only to establish
best practices.
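
As an illustration only, the "multiple constructor" route Nick describes
might look roughly like the sketch below at the Python level. The names
TypedAttributeError, for_type, type_name and attr_name are hypothetical,
invented for this example; none of them is an existing or proposed
CPython API. Following Nick's lifecycle point, the exception stores only
the name of the type, not a reference to the object itself.

```python
class TypedAttributeError(AttributeError):
    """Hypothetical AttributeError that also records type and attribute names."""

    def __init__(self, *args, type_name=None, attr_name=None):
        # The plain positional form keeps working, so existing callers
        # that pass only a message are unaffected.
        super().__init__(*args)
        self.type_name = type_name
        self.attr_name = attr_name

    @classmethod
    def for_type(cls, tp, attr_name):
        # Alternative constructor: builds the standard message and fills
        # in the extra fields. Only the type's name is kept, never the object.
        msg = "%r object has no attribute %r" % (tp.__name__, attr_name)
        return cls(msg, type_name=tp.__name__, attr_name=attr_name)


class Widget:
    """Hypothetical class with a custom attribute lookup."""

    def __getattr__(self, name):
        raise TypedAttributeError.for_type(type(self), name)
```

Because the class method is the only new entry point, code that raises
AttributeError("some message") directly is untouched, which is the
compatibility property discussed above.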

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From alexander.belopolsky at gmail.com  Wed Dec  3 04:44:43 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 2 Dec 2008 22:44:43 -0500
Subject: [Python-Dev] Accessing source code in zipped packages
Message-ID: <d38f5330812021944q17b69506u83090e3a3b734616@mail.gmail.com>

About a month ago, I submitted two patches that address Pdb and
doctest inability to load source code from modules with custom loaders
such as modules loaded from zip files:

http://bugs.python.org/issue4201
http://bugs.python.org/issue4197

The patches are very simple, basically calls to linecache.getline()
need to be provided with the module's dict to enable linecache to find
the module's __loader__.

Is there a chance that these patches could make it to 2.6.1?
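
For readers unfamiliar with the mechanism, here is a minimal sketch of
what passing the module's dict to linecache.getline() makes possible. It
is not taken from either patch. The archive name and the module name
zipped_demo are invented for this example; the only library call that
matters is linecache.getline(filename, lineno, module_globals).

```python
import importlib
import linecache
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip archive containing one tiny module (hypothetical names).
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo_pkg.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo.py", "def answer():\n    return 42\n")

# Put the archive on sys.path so zipimport handles the import.
sys.path.insert(0, archive)
zipped_demo = importlib.import_module("zipped_demo")

# __file__ points inside the archive, so it cannot be opened as a normal file.
# Supplying the module's dict lets linecache find the module's loader and
# ask it for the source text instead.
first_line = linecache.getline(
    zipped_demo.__file__, 1, zipped_demo.__dict__
)
print(repr(first_line))
```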

From g.brandl at gmx.net  Wed Dec  3 07:52:02 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 03 Dec 2008 07:52:02 +0100
Subject: [Python-Dev] Accessing source code in zipped packages
In-Reply-To: <d38f5330812021944q17b69506u83090e3a3b734616@mail.gmail.com>
References: <d38f5330812021944q17b69506u83090e3a3b734616@mail.gmail.com>
Message-ID: <gh5aai$aq4$1@ger.gmane.org>

Alexander Belopolsky schrieb:
> About a month ago, I submitted two patches that address Pdb and
> doctest inability to load source code from modules with custom loaders
> such as modules loaded from zip files:
> 
> http://bugs.python.org/issue4201
> http://bugs.python.org/issue4197
> 
> The patches are very simple, basically calls to linecache.getline()
> need to be provided with the module's dict to enable linecache to find
> the module's __loader__.

There is also http://bugs.python.org/issue4223 which goes in the same
direction.

Georg


From amk at amk.ca  Wed Dec  3 16:31:28 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Wed, 3 Dec 2008 10:31:28 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
Message-ID: <20081203153128.GA6161@amk-desktop.matrixgroup.net>

The PyCon organizers are planning a Python Language Summit to be held
in Chicago just before the conference, on Thursday March 26 2009.
(This is the second day of tutorials, and the day before PyCon
officially starts.)

The purpose of the Python Language Summit is to let the developers of
Python implementations discuss issues that affect us all, and to let
the developers of a particular implementation discuss their own
project-specific issues.  PyCon brings a lot of the core developers
together into one place and there's been a "Python core" sprint for a
long time, but we haven't had a formal time and place for *discussion*
among core developers.

Attending the summit will be free; registration for PyCon is *not*
included, but won't be required to attend the summit.

I e-mailed some CPython, Jython, IronPython, PyPy, etc. developers
asking for topic suggestions, and assembled a draft of a schedule from
some of the most commonly mentioned topics; the current draft schedule
is below.  The schedule is very 'loose', leaving a fair bit of open
space so that we can hopefully begin working on ideas arising from the
discussion.

* What do you think of the selected topics?

* I'd like to have a champion for each session, who will make a brief
  presentation about the session's topic at the start, laying out the
  issues and possible courses of action to guide the resulting
  discussion.  If you wish to volunteer as the champion for a session,
  please let me know.  (Preference will be given to people actively
  working on the particular topic.)

* For CPython, invitations will be sent to everyone with committer
  status (plus a few book authors, significant patch contributors who
  aren't committers yet, etc.).  If you're not a committer but think
  you can contribute, please let me know privately.  Also, please
  suggest other

* There will probably be summit-related sponsorship opportunities for
  interested companies.


Andrew M. Kuchling
amk at amk.ca
Registration Manager, PyCon 2009
http://us.pycon.org



9:00 - 10:30   
=============

Open discussion session


11:00 - 12:30
=============

Transition plan for rest of 2.x series; goals for 2.7/3.1.
- New features & future plans?
- Is 2.7 last of the 2.x releases?
- Unicode issues
- Stdlib plans?

Champion needed.


12:30 - 14:00
=============

Lunch (probably provided by the PSF or a sponsor).


14:00 - 15:30
=============

Two tracks:

Cross-implementation issues:

  What do the various VMs want/need from CPython to help with their
  implementations?

  * Marking CPython-specific tests in the test suite?
  * Getting an implementation agnostic test suite for the Python language?  
  * Separating the language tests and the pure Python part of the stdlib into 
    a separate project?  (Or publish them as a separate package.)
  * Transition plans for 3.0?

  Champion needed.

Package distribution & installation.

  * setting up an organized network of mirrors à la CPAN
  * adding a commenting system on PyPI
  * think about a reference implementation for a PyPI client in the
    stdlib (XML-RPC client+upload and register)
  * improvements on packaging matters - this includes distutils but
    also setuptools.

  Champion needed.


16:00 - 17:30
=============

Free space for sprinting, hacking, further discussion, etc.


18:00-ish
=============

Group dinners.

From ziade.tarek at gmail.com  Wed Dec  3 16:48:12 2008
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 3 Dec 2008 16:48:12 +0100
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
Message-ID: <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com>

On Wed, Dec 3, 2008 at 4:31 PM, A.M. Kuchling <amk at amk.ca> wrote:
> The PyCon organizers are planning a Python Language Summit to be held
> in Chicago just before the conference, on Thursday March 26 2009.
> (This is the second day of tutorials, and the day before PyCon
> officially starts.)
> [cut]
>
> Package distribution & installation.
>
>  * setting up an organized network of mirrors à la CPAN
>  * adding a commenting system on PyPI
>  * think about a reference implementation for a PyPI client in the
>    stdlib (XML-RPC client+upload and register)
>  * improvements on packaging matters - this includes distutils but
>    also setuptools.

Hello,

I'd like to volunteer for that part given the fact that I am currently
working on the patches
for the mirroring thing in a branch of PyPI.

The work is described here : http://wiki.python.org/moin/PEP%20374
It changed a bit and I need to update it, but you get the idea there.

I also have some work going on for distutils.

You have a summary of the work going on in my blog
http://tarekziade.wordpress.com/2008/11/26/python-package-distribution-my-current-work/

Regards
Tarek

>
>  Champion needed.
>
>
> 16:00 - 17:30
> =============
>
> Free space for sprinting, hacking, further discussion, etc.
>
>
> 18:00-ish
> =============
>
> Group dinners.
>



-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From ncoghlan at gmail.com  Wed Dec  3 21:34:56 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 06:34:56 +1000
Subject: [Python-Dev] Accessing source code in zipped packages
In-Reply-To: <gh5aai$aq4$1@ger.gmane.org>
References: <d38f5330812021944q17b69506u83090e3a3b734616@mail.gmail.com>
	<gh5aai$aq4$1@ger.gmane.org>
Message-ID: <4936ED70.2040005@gmail.com>

Georg Brandl wrote:
> Alexander Belopolsky schrieb:
>> About a month ago, I submitted two patches that address Pdb and
>> doctest inability to load source code from modules with custom loaders
>> such as modules loaded from zip files:
>>
>> http://bugs.python.org/issue4201
>> http://bugs.python.org/issue4197
>>
>> The patches are very simple, basically calls to linecache.getline()
>> need to be provided with the module's dict to enable linecache to find
>> the module's __loader__.
> 
> There is also http://bugs.python.org/issue4223 which goes in the same
> direction.

I've assigned all 3 of those to myself, since I've been meaning to look
at some zipimport related stuff anyway (the things I'm looking at are
2.7/3.1 related though, so I was waiting for the 3.0 release to be cut
first).

We already missed the 2.6.1 deadline though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Wed Dec  3 21:44:38 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 06:44:38 +1000
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com>
Message-ID: <4936EFB6.1080808@gmail.com>

Tarek Ziadé wrote:
> Hello,
> 
> I'd like to volunteer for that part given the fact that I am currently
> working on the patches
> for the mirroring thing in a branch of PyPI.
> 
> The work is described here : http://wiki.python.org/moin/PEP%20374
> It changed a bit and I need to update it, but you get the idea there.

For the record, when working on a PEP draft on the Wiki or Google docs,
it's worth asking the PEP editors (or any of the SVN committers really)
to reserve a PEP number once things start to progress to the point where
folks need a common shorthand reference to the document.

PEP 374 for example, is already a placeholder for the SVN to DVCS
migration PEP:
http://www.python.org/dev/peps/pep-0374/

We aren't going to run out of PEP numbers anytime soon - it's OK if some
of them get "wasted" on draft PEPs that end up getting abandoned.
(Better that than having multiple draft PEPs being referred to with the
same number as appears to be the case at the moment).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ziade.tarek at gmail.com  Wed Dec  3 23:02:07 2008
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 3 Dec 2008 23:02:07 +0100
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <4936EFB6.1080808@gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com>
	<4936EFB6.1080808@gmail.com>
Message-ID: <94bdd2610812031402x152d5c7bjfb18c2c14fa5d411@mail.gmail.com>

On Wed, Dec 3, 2008 at 9:44 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Tarek Ziadé wrote:
>> Hello,
>>
>> I'd like to volunteer for that part given the fact that I am currently
>> working on the patches
>> for the mirroring thing in a branch of PyPI.
>>
>> The work is described here : http://wiki.python.org/moin/PEP%20374
>> It changed a bit and I need to update it, but you get the idea there.
>
> For the record, when working on a PEP draft on the Wiki or Google docs,
> it's worth asking the PEP editors (or any of the SVN committers really)
> to reserve a PEP number once things start to progress to the point where
> folks need a common shorthand reference to the document.
>
> PEP 374 for example, is already a placeholder for the SVN to DVCS
> migration PEP:
> http://www.python.org/dev/peps/pep-0374/
>

Right, I'll ask for a number and change it accordingly.

Regards
Tarek

From barry at python.org  Thu Dec  4 02:51:33 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Dec 2008 20:51:33 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
Message-ID: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On behalf of the Python development team and the Python community, I  
am happy to announce the release of Python 3.0 final.

Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major  
milestone in Python's history, and was nearly three years in the  
making.  This is a new version of the language that is incompatible  
with the 2.x line of releases, while remaining true to BDFL Guido van  
Rossum's vision.  Some things you will notice include:

* Fixes to many old language warts
* Removal of long deprecated features and redundant syntax
* Improvements in, and a reorganization of, the standard library
* Changes to the details of how built-in objects like strings and  
dicts work
* ...and many more new features

While these changes were made without concern for backward  
compatibility, Python 3.0 still remains very much "Pythonic".

We are confident that Python 3.0 is of the same high quality as our  
previous releases, such as the recently announced Python 2.6.  We will  
continue to support and develop both Python 3 and Python 2 for the  
foreseeable future, and you can safely choose either version (or both)  
to use in your projects.  Which you choose depends on your own needs  
and the availability of third-party packages that you depend on.  Some  
other things to consider:

* Python 3 has a single Unicode string type; there are no more 8-bit  
strings
* The C API has changed considerably in Python 3.0 and third-party  
extension modules you rely on may not yet be ported
* Tools are available in both Python 2.6 and 3.0 to help you migrate  
your code
* Python 2.6 is backward compatible with earlier Python 2.x releases

We encourage you to participate in Python 3.0's development process by  
joining its mailing list:

     http://mail.python.org/mailman/listinfo/python-3000

If you find things in Python 3.0 that are broken or incorrect, please  
submit bug reports at:

    http://bugs.python.org/

For more information, links to documentation, and downloadable  
distributions, see the Python 3.0 website:

    http://www.python.org/download/releases/3.0/

Enjoy,
- -Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTc3pXEjvBPtnXfVAQI69wP/dPHh8IL3GxziEV9QzlveKG+KyZb2X16x
fxJnTCiXAbiAhT5C+m43OEnbF1PJgMDKtcZ5b7aQb4TQ0mJxISTQh0RfLCpArmlo
tdTbzCLnh13KzB+3sUHCx+MeQNXERoWDV8hLz+4Ae71UsuUGynhtyP7ZJMJDue8j
so2gv3fOMSs=
=vkiy
-----END PGP SIGNATURE-----

From guido at python.org  Thu Dec  4 03:19:09 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Dec 2008 18:19:09 -0800
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
Message-ID: <ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>

Thanks so much for seeing this one through, Barry and co! Champagne!!!

On Wed, Dec 3, 2008 at 5:51 PM, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On behalf of the Python development team and the Python community, I am
> happy to announce the release of Python 3.0 final.
>
> Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major milestone in
> Python's history, and was nearly three years in the making.  This is a new
> version of the language that is incompatible with the 2.x line of releases,
> while remaining true to BDFL Guido van Rossum's vision.  Some things you
> will notice include:
>
> * Fixes to many old language warts
> * Removal of long deprecated features and redundant syntax
> * Improvements in, and a reorganization of, the standard library
> * Changes to the details of how built-in objects like strings and dicts work
> * ...and many more new features
>
> While these changes were made without concern for backward compatibility,
> Python 3.0 still remains very much "Pythonic".
>
> We are confident that Python 3.0 is of the same high quality as our previous
> releases, such as the recently announced Python 2.6.  We will continue to
> support and develop both Python 3 and Python 2 for the foreseeable future,
> and you can safely choose either version (or both) to use in your projects.
>  Which you choose depends on your own needs and the availability of
> third-party packages that you depend on.  Some other things to consider:
>
> * Python 3 has a single Unicode string type; there are no more 8-bit strings
> * The C API has changed considerably in Python 3.0 and third-party extension
> modules you rely on may not yet be ported
> * Tools are available in both Python 2.6 and 3.0 to help you migrate your
> code
> * Python 2.6 is backward compatible with earlier Python 2.x releases
>
> We encourage you to participate in Python 3.0's development process by
> joining its mailing list:
>
>    http://mail.python.org/mailman/listinfo/python-3000
>
> If you find things in Python 3.0 that are broken or incorrect, please submit
> bug reports at:
>
>   http://bugs.python.org/
>
> For more information, links to documentation, and downloadable
> distributions, see the Python 3.0 website:
>
>   http://www.python.org/download/releases/3.0/
>
> Enjoy,
> - -Barry
>
> Barry Warsaw
> barry at python.org
> Python 2.6/3.0 Release Manager
> (on behalf of the entire python-dev team)
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Darwin)
>
> iQCVAwUBSTc3pXEjvBPtnXfVAQI69wP/dPHh8IL3GxziEV9QzlveKG+KyZb2X16x
> fxJnTCiXAbiAhT5C+m43OEnbF1PJgMDKtcZ5b7aQb4TQ0mJxISTQh0RfLCpArmlo
> tdTbzCLnh13KzB+3sUHCx+MeQNXERoWDV8hLz+4Ae71UsuUGynhtyP7ZJMJDue8j
> so2gv3fOMSs=
> =vkiy
> -----END PGP SIGNATURE-----
>



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu Dec  4 03:24:23 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Dec 2008 21:24:23 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>
Message-ID: <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 3, 2008, at 9:13 PM, Dotan Cohen wrote:

> On this page:
> http://www.python.org/download/releases/3.0/
>
> The text "This is a proeuction release" should probably read "This is
> a production release". It would give a better first impression :)

Fixed, thanks!
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTc/WHEjvBPtnXfVAQL8TwP+M2Ryv7WY36ICEvzGU4EzlRG/gI4MolQe
cD8DJUJfQuR6INTot/t7vTcL8oDHq7q9OHbfvd3jmSwH/ZytsMz2OvJUYlKDQjwG
BcQRpioprcesoU6cufSmKAUiUP+L0RTAMmT0WDbbeCzzMZRq3Humd4Zs43nL26NT
uFb83Dk6yWA=
=qPjn
-----END PGP SIGNATURE-----

From barry at python.org  Thu Dec  4 03:25:05 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Dec 2008 21:25:05 -0500
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
Message-ID: <C3E23444-13CA-41DF-86AA-21575E9B55F7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 3, 2008, at 9:19 PM, Guido van Rossum wrote:

> Thanks so much for seeing this one through, Barry and co! Champagne!!!

Now if only I could go on vacation. :)

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTc/gXEjvBPtnXfVAQKZGgP/Y41JSlU6bQlGQKQmrjxv2jUWf2AWDLSu
4HG45m5plX/r6z1bZlxdqvpVqVRGgInoe+uw96WEgjW+F5NomU4ZKQ+YVOZFjkJY
izAWQllxZRkErdIBq158DOKTTyiJpUpRnGvwx2J67/pIBGLfFLZ+yPAu+4jT4fJ+
qFq/oGKCKIY=
=wiBX
-----END PGP SIGNATURE-----

From ed at leafe.com  Thu Dec  4 04:29:41 2008
From: ed at leafe.com (Ed Leafe)
Date: Wed, 3 Dec 2008 21:29:41 -0600
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
Message-ID: <20739B1C-8B6F-4A7A-B699-76DD938DA2E3@leafe.com>

On Dec 3, 2008, at 7:51 PM, Barry Warsaw wrote:

> On behalf of the Python development team and the Python community, I  
> am happy to announce the release of Python 3.0 final.


	Props to all the folks whose hard work made this possible! You guys  
rock!


-- Ed Leafe




From martin at v.loewis.de  Thu Dec  4 08:26:35 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 04 Dec 2008 08:26:35 +0100
Subject: [Python-Dev] 2.5.3 and 2.4.6 release schedule
Message-ID: <4937862B.8000403@v.loewis.de>

I would like to create 2.5.3 and 2.4.6 release candidates next week,
December 12, and final releases on December 19. If there are any open
issues that you think need to be considered, please create a bug in
the bug tracker, mark it as release blocker, and label it with version
2.5.3 (or 2.4). Of course, a number of such issues are already in the
tracker, some already being worked on.

Remember: 2.5.3 will be the last bug fix release for Python 2.5;
afterwards, only security patches will be accepted for the 2.5
branch. The 2.4 branch is already in that state (the 2.3 branch
is not maintained anymore; 2.4 security patches will be produced
until November 2009).

Regards,
Martin

From martin at v.loewis.de  Thu Dec  4 08:36:11 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 04 Dec 2008 08:36:11 +0100
Subject: [Python-Dev] Merging mailing lists
Message-ID: <4937886B.4000002@v.loewis.de>

I would like to merge mailing lists, now that the design and first
implementation of Python 3000 is complete. In particular, I would
like to merge the python-3000 mailing list back into python-dev,
and the python-3000-checkins mailing list back into python-checkins.
The rationale is to simplify usage of the lists, and to avoid
cross-postings.

To implement this, all subscribers of the 3000 mailing lists would
be added to the trunk mailing lists (avoiding duplicates, of course),
and all automated messages going to python-3000-checkins would then
be directed to the trunk lists. The 3000 mailing lists would change
into read-only mode (i.e. primarily leaving the archives behind).

Any objections?

Regards,
Martin

From fdrake at gmail.com  Thu Dec  4 09:04:27 2008
From: fdrake at gmail.com (Fred Drake)
Date: Thu, 4 Dec 2008 03:04:27 -0500
Subject: [Python-Dev] [Python-checkins] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <9cee7ab80812040004r54cce844lbd3728d99dc780d8@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:36 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would

+1


  -Fred

-- 
Fred L. Drake, Jr.    <fdrake at gmail.com>
"Chaos is the score upon which reality is written." --Henry Miller

From ondrej at certik.cz  Thu Dec  4 10:42:52 2008
From: ondrej at certik.cz (Ondrej Certik)
Date: Thu, 4 Dec 2008 10:42:52 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>
	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>
Message-ID: <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:24 AM, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Dec 3, 2008, at 9:13 PM, Dotan Cohen wrote:
>
>> On this page:
>> http://www.python.org/download/releases/3.0/
>>
>> The text "This is a proeuction release" should probably read "This is
>> a production release". It would give a better first impression :)
>
> Fixed, thanks!

I tried to find the documentation here:

http://python.org/doc/

but clicking on the links:

http://docs.python.org/whatsnew/3.0.html
http://docs.python.org/3.0

gives me:

404 Not Found

Ondrej

From ncoghlan at gmail.com  Thu Dec  4 11:59:25 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 20:59:25 +1000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>
	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>
Message-ID: <4937B80D.9070309@gmail.com>

Ondrej Certik wrote:
> I tried to find the documentation here:
> 
> http://python.org/doc/
> 
> but clicking on the links:
> 
> http://docs.python.org/whatsnew/3.0.html
> http://docs.python.org/3.0

These 404 for me as well, but the dev links have already rolled over to
3.1a0.

There are also no cross-links from the main 2.6 docs to the released
py3k docs.

I was going to suggest there needs to be something in PEP 101 about
checking the doc links, but it's already there :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Thu Dec  4 12:07:02 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 21:07:02 +1000
Subject: [Python-Dev] [Python-checkins] r67511 - in python/trunk:
 Doc/library/logging.rst Lib/logging/__init__.py Lib/test/test_logging.py
 Misc/NEWS
In-Reply-To: <20081203232258.7526A1E4002@bag.python.org>
References: <20081203232258.7526A1E4002@bag.python.org>
Message-ID: <4937B9D6.3020906@gmail.com>

vinay.sajip wrote:
> +def _showwarning(message, category, filename, lineno, file=None, line=None):
> +    """
> +    Implementation of showwarnings which redirects to logging, which will first
> +    check to see if the file parameter is None. If a file is specified, it will
> +    delegate to the original warnings implementation of showwarning. Otherwise,
> +    it will call warnings.formatwarning and will log the resulting string to a
> +    warnings logger named "py.warnings" with level logging.WARNING.
> +    """
> +    if file is not None:
> +        if _warnings_showwarning is not None:
> +            _warnings_showwarning(message, category, filename, lineno, file, line)
> +    else:
> +        import warnings
> +        s = warnings.formatwarning(message, category, filename, lineno, line)
> +        logger = getLogger("py.warnings")
> +        if not logger.handlers:
> +            logger.addHandler(NullHandler())
> +        logger.warning("%s", s)

I'd be careful here - this could deadlock if a thread spawned as a side
effect of importing a module happens to trigger a warning.

warnings is pulled into sys.modules as part of the interpreter startup -
having a global "import warnings" shouldn't have any real effect on
logging's import time.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From lists at cheimes.de  Thu Dec  4 12:50:21 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 04 Dec 2008 12:50:21 +0100
Subject: [Python-Dev] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <gh8g5t$fd2$1@ger.gmane.org>

Martin v. Löwis wrote:
> Any objections?

+1


From amk at amk.ca  Thu Dec  4 13:37:50 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Thu, 4 Dec 2008 07:37:50 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
Message-ID: <20081204123750.GA890@amk.local>

On Wed, Dec 03, 2008 at 08:51:33PM -0500, Barry Warsaw wrote:
> On behalf of the Python development team and the Python community, I  
> am happy to announce the release of Python 3.0 final.

Yay!

> We are confident that Python 3.0 is of the same high quality as our  
> previous releases, such as the recently announced Python 2.6.  We will  
> continue to support and develop both Python 3 and Python 2 for the  
> foreseeable future, and you can safely choose either version (or both)  
> to use in your projects.  Which you choose depends on your own needs  
> and the availability of third-party packages that you depend on.  Some  
> other things to consider:

I think we should also have a statement up on python.org about
future plans: e.g.

* that there will be a Python 2.7 that will incorporate what we learn from
  people trying to port,
* that 3.1 will rearrange the standard library in mostly-known ways, and 
* that we expect people to use 3.0 mostly for compatibility testing, 
  not going into serious production use until 3.1 or maybe even 3.2.

(The details are open to discussion, of course.)

--amk


From g.brandl at gmx.net  Thu Dec  4 13:40:19 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 04 Dec 2008 13:40:19 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4937B80D.9070309@gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>
	<4937B80D.9070309@gmail.com>
Message-ID: <gh8j4g$ol6$1@ger.gmane.org>

Nick Coghlan schrieb:
> Ondrej Certik wrote:
>> I tried to find the documentation here:
>> 
>> http://python.org/doc/
>> 
>> but clicking on the links:
>> 
>> http://docs.python.org/whatsnew/3.0.html
>> http://docs.python.org/3.0
> 
> These 404 for me as well, but the dev links have already rolled over to
> 3.1a0.
> 
> There are also no cross-links from the main 2.6 docs to the released
> py3k docs.
> 
> I was going to suggest there needs to be something in PEP 101 about
> checking the doc links, but it's already there :)

I can't find any docs built for Python 3.0 (not 3.1a0).  I would have
handled building and uploading the docs if somebody (or at least anybody)
had told me I was to do it.  Now we again have the situation that the
docs for the new release are wrecked.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From steve at holdenweb.com  Thu Dec  4 14:08:47 2008
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 04 Dec 2008 08:08:47 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <gh8j4g$ol6$1@ger.gmane.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>	<4937B80D.9070309@gmail.com>
	<gh8j4g$ol6$1@ger.gmane.org>
Message-ID: <gh8kok$tpg$1@ger.gmane.org>

Georg Brandl wrote:
> Nick Coghlan schrieb:
>> Ondrej Certik wrote:
>>> I tried to find the documentation here:
>>>
>>> http://python.org/doc/
>>>
>>> but clicking on the links:
>>>
>>> http://docs.python.org/whatsnew/3.0.html
>>> http://docs.python.org/3.0
>> These 404 for me as well, but the dev links have already rolled over to
>> 3.1a0.
>>
>> There are also no cross-links from the main 2.6 docs to the released
>> py3k docs.
>>
>> I was going to suggest there needs to be something in PEP 101 about
>> checking the doc links, but it's already there :)
> 
> I can't find any docs built for Python 3.0 (not 3.1a0).  I would have
> handled building and uploading the docs if somebody (or at least anybody)
> had told me I was to do it.  Now we again have the situation that the
> docs for the new release are wrecked.
> 
Sounds like we need a bot to check the website for each new release before
the release manager "presses the button" and makes the announcement.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From facundobatista at gmail.com  Thu Dec  4 14:18:11 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 4 Dec 2008 11:18:11 -0200
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081204123750.GA890@amk.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
Message-ID: <e04bdf310812040518t626a6ba4n79d21bd47f2fc3f5@mail.gmail.com>

2008/12/4 A.M. Kuchling <amk at amk.ca>:

> * that there will be a Python 2.7 that will incorporate what we learn from
>  people trying to port,
> * that 3.1 will rearrange the standard library in mostly-known ways, and
> * that we expect people to use 3.0 mostly for compatibility testing,
>  not going into serious production use until 3.1 or maybe even 3.2.

I think it would be fantastic to have a small set of straightforward
sentences like these, to transmit the most important stuff.

For my part, once the wording is settled, I will translate them into
Spanish and propagate them.


> (The details are open to discussion, of course.)

I think those are fine. I would add something about the migration
path, something like "If you want to start testing your library/system
in 3.0, you should first use Python 2.6, see migration details [here]"

Regards,

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From jeremy at alum.mit.edu  Thu Dec  4 14:24:57 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 4 Dec 2008 08:24:57 -0500
Subject: [Python-Dev] [Python-3000] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <e8bf7a530812040524r46178728u5ed980104155adf8@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:36 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would
> like to merge the python-3000 mailing list back into python-dev,
> and the python-3000-checkins mailing list back into python-checkins.
> The rationale is to simplify usage of the lists, and to avoid
> cross-postings.

+1

> To implement this, all subscribers of the 3000 mailing lists would
> be added to the trunk mailing lists (avoiding duplicates, of course),
> and all automated messages going to python-3000-checkins would then
> be directed to the trunk lists. The 3000 mailing lists would change
> into read-only mode (i.e. primarily leaving the archives behind).
>
> Any objections?

No

Jeremy


> Regards,
> Martin

From lists at cheimes.de  Thu Dec  4 16:12:07 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 04 Dec 2008 16:12:07 +0100
Subject: [Python-Dev] Merging flow
Message-ID: <gh8s08$p9r$1@ger.gmane.org>

Several people have asked about the patch and merge flow. Now that 
Python 3.0 is out it's a bit more complicated.

Flow diagram
------------

trunk ---> release26-maint
        \->      py3k       ---> release30-maint


Patches for all versions of Python should land in the trunk. They are 
then merged into release26-maint and py3k branches. Changes for Python 
3.0 are merged via the py3k branch.
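
As a rough illustration only, the flow above can be modelled as a small
graph.  The names MERGE_FLOW and downstream are hypothetical helpers made up
for this sketch; they are not part of svnmerge.py or of any Python tooling.

```python
# Hypothetical model of the merge flow: each branch maps to the
# branches that receive merges from it.
MERGE_FLOW = {
    "trunk": ["release26-maint", "py3k"],
    "py3k": ["release30-maint"],
}


def downstream(branch, flow=MERGE_FLOW):
    """Return every branch a change landing on `branch` may reach."""
    reached = []
    for child in flow.get(branch, []):
        reached.append(child)
        reached.extend(downstream(child, flow))
    return reached


trunk_targets = downstream("trunk")
py3k_targets = downstream("py3k")
```

Under this model a change committed to the trunk can reach all three other
branches, while a change made directly on py3k only flows on to
release30-maint.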

Christian


From tjreedy at udel.edu  Thu Dec  4 17:12:23 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 11:12:23 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <gh8j4g$ol6$1@ger.gmane.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>	<4937B80D.9070309@gmail.com>
	<gh8j4g$ol6$1@ger.gmane.org>
Message-ID: <gh8vh2$638$1@ger.gmane.org>

Georg Brandl wrote:

> I can't find any docs built for Python 3.0 (not 3.1a0). 

The Windows installation has new 3.0 doc dated Dec 3, so it was built, 
just not posted correctly.


From tjreedy at udel.edu  Thu Dec  4 17:47:22 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 11:47:22 -0500
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
Message-ID: <gh91il$f0m$1@ger.gmane.org>

Guido van Rossum wrote:

>> Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major milestone in
>> Python's history, and was nearly three years in the making.  This is a new
>> version of the language that is incompatible with the 2.x line of releases,

I think this

>> while remaining true to BDFL Guido van Rossum's vision.  Some things you
>> will notice include:
>>
>> * Fixes to many old language warts
>> * Removal of long deprecated features and redundant syntax
>> * Improvements in, and a reorganization of, the standard library
>> * Changes to the details of how built-in objects like strings and dicts work
>> * ...and many more new features
>>
>> While these changes were made without concern for backward compatibility,

and this could give some people a mis-impression, most likely negative, 
as to the magnitude and nature of the change.  Most of the code I am now 
writing would, I believe, run with 2.5 except for print(..., file=xxx).
And I know that there was concern for backward compatibility to the
point that some changes were rejected (renaming builtins) or delayed 
(deleting duplicate test asserts) for that reason.  So I would soften 
the statements to "... version of the language that is partially 
incompatible with... " and "were made without being bound by backward 
compatibility,"

tjr


From dickinsm at gmail.com  Thu Dec  4 18:23:53 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Thu, 4 Dec 2008 17:23:53 +0000
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes <lists at cheimes.de> wrote:
> Patches for all versions of Python should land in the trunk. They are then
> merged into release26-maint and py3k branches. Changes for Python 3.0 are
> merged via the py3k branch.

Thanks, Christian!

Questions:

(1) If I commit a change to the trunk that I don't want to go into
release26-maint, should I explicitly block it using svnmerge?

(2) Same question for trunk -> py3k

(3) Same question for py3k -> release30-maint.

I'm guessing that the answers are (1) No, (2) Yes, (3) No.

Mark

From musiccomposition at gmail.com  Thu Dec  4 18:30:43 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 4 Dec 2008 11:30:43 -0600
Subject: [Python-Dev] Merging flow
In-Reply-To: <5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com>
Message-ID: <1afaf6160812040930i350d44ffwaeb40b670f3da537@mail.gmail.com>

On Thu, Dec 4, 2008 at 11:23 AM, Mark Dickinson <dickinsm at gmail.com> wrote:
> On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes <lists at cheimes.de> wrote:
>> Patches for all versions of Python should land in the trunk. They are then
>> merged into release26-maint and py3k branches. Changes for Python 3.0 are
>> merged via the py3k branch.
>
> Thanks, Christian!
>
> Questions:
>
> (1) If I commit a change to the trunk that I don't want to go into
> release26-maint, should I explicitly block it using svnmerge?
>
> (2) Same question for trunk -> py3k
>
> (3) Same question for py3k -> release30-maint.
>
> I'm guessing that the answers are (1) No, (2) Yes, (3) No.

That is correct. We don't care too much about blocking for the release branches.



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From brett at python.org  Thu Dec  4 18:37:05 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Dec 2008 09:37:05 -0800
Subject: [Python-Dev] [Python-3000-checkins] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <bbaeab100812040937u6087dfcat9f0abeb52b2fde66@mail.gmail.com>

On Wed, Dec 3, 2008 at 23:36, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would
> like to merge the python-3000 mailing list back into python-dev,
> and the python-3000-checkins mailing list back into python-checkins.
> The rationale is to simplify usage of the lists, and to avoid
> cross-postings.
>
> To implement this, all subscribers of the 3000 mailing lists would
> be added to the trunk mailing lists (avoiding duplicates, of course),
> and all automated messages going to python-3000-checkins would then
> be directed to the trunk lists. The 3000 mailing lists would change
> into read-only mode (i.e. primarily leaving the archives behind).
>
> Any objections?
>

Nope; +1.

-Brett

From jeremy at alum.mit.edu  Thu Dec  4 19:18:23 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 4 Dec 2008 13:18:23 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <e8bf7a530812041018v140fb3c9h2e864a2f73a05760@mail.gmail.com>

On Thu, Dec 4, 2008 at 10:12 AM, Christian Heimes <lists at cheimes.de> wrote:
> Several people have asked about the patch and merge flow. Now that Python
> 3.0 is out it's a bit more complicated.
>
> Flow diagram
> ------------
>
> trunk ---> release26-maint
>       \->      py3k       ---> release30-maint
>
>
> Patches for all versions of Python should land in the trunk. They are then
> merged into release26-maint and py3k branches. Changes for Python 3.0 are
> merged via the py3k branch.

You say "they are then merged."  Does that mean if I commit something
on the trunk, someone else will merge it for me?  Or do I need to do
it?

The library is vastly different between 2.x and 3.x.  I'm personally
aware of the many changes related to httplib / urllib / xmlrpclib.
I'm worried that it will be hard to decide how to "merge" things
between the two versions.

Jeremy

> Christian
>

From musiccomposition at gmail.com  Thu Dec  4 19:25:30 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 4 Dec 2008 12:25:30 -0600
Subject: [Python-Dev] Merging flow
In-Reply-To: <e8bf7a530812041018v140fb3c9h2e864a2f73a05760@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<e8bf7a530812041018v140fb3c9h2e864a2f73a05760@mail.gmail.com>
Message-ID: <1afaf6160812041025y113825dfoefbee6c4b69a55f2@mail.gmail.com>

On Thu, Dec 4, 2008 at 12:18 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On Thu, Dec 4, 2008 at 10:12 AM, Christian Heimes <lists at cheimes.de> wrote:
>> Several people have asked about the patch and merge flow. Now that Python
>> 3.0 is out it's a bit more complicated.
>>
>> Flow diagram
>> ------------
>>
>> trunk ---> release26-maint
>>       \->      py3k       ---> release30-maint
>>
>>
>> Patches for all versions of Python should land in the trunk. They are then
>> merged into release26-maint and py3k branches. Changes for Python 3.0 are
>> merged via the py3k branch.
>
> You say "they are then merged."  Does that mean if I commit something
> on the trunk, someone else will merge it for me?  Or do I need to do
> it?

Generally, somebody else will do it if it is on the trunk and bound
for py3k. (Bug fixes should be backported by the original committer.)
Of course, if the change required in py3k is complicated and vastly
different, I and the other mergers would appreciate it if you did it
yourself.

>
> The library is vastly different between 2.x and 3.x.  I'm personally
> aware of the many changes related to httplib / urllib / xmlrpclib.
> I'm worried that it will be hard to decide how to "merge" things
> between the two versions.

Feel free to do it yourself.

>
> Jeremy




-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From eric at trueblade.com  Thu Dec  4 19:52:05 2008
From: eric at trueblade.com (Eric Smith)
Date: Thu, 04 Dec 2008 13:52:05 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <493826D5.3020205@trueblade.com>

Christian Heimes wrote:
> Several people have asked about the patch and merge flow. Now that 
> Python 3.0 is out it's a bit more complicated.
> 
> Flow diagram
> ------------
> 
> trunk ---> release26-maint
>        \->      py3k       ---> release30-maint
> 
> 
> Patches for all versions of Python should land in the trunk. They are 
> then merged into release26-maint and py3k branches. Changes for Python 
> 3.0 are merged via the py3k branch.

Apologies if this has been discussed before. I looked but didn't see 
anything.

Given that at least 99% of the changes for the trunk will not get merged 
into release26-maint, doesn't it make more sense to merge the other way? 
That is, anything that gets checked in to release26-maint would 
potentially be merged into trunk. That would remove the huge number of 
merge blocks that will otherwise be required. Same for py3k and 
release30-maint.

Eric.

From nicole at cats-muvva.net  Thu Dec  4 19:36:48 2008
From: nicole at cats-muvva.net (Nicole King)
Date: Thu, 4 Dec 2008 18:36:48 +0000
Subject: [Python-Dev] Taint Mode in Python 3.0
Message-ID: <200812041836.48146.nicole@cats-muvva.net>

Dear All,

I have published the diff for my implementation of tainted mode in Python for 
R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the 
bottom of the page. I apologise for past problems accessing this web site: I 
hope to have resolved all the issues with it.

Nicole

From musiccomposition at gmail.com  Thu Dec  4 19:57:34 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 4 Dec 2008 12:57:34 -0600
Subject: [Python-Dev] Merging flow
In-Reply-To: <493826D5.3020205@trueblade.com>
References: <gh8s08$p9r$1@ger.gmane.org> <493826D5.3020205@trueblade.com>
Message-ID: <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com>

On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith <eric at trueblade.com> wrote:
> Christian Heimes wrote:
>>
>> Several people have asked about the patch and merge flow. Now that Python
>> 3.0 is out it's a bit more complicated.
>>
>> Flow diagram
>> ------------
>>
>> trunk ---> release26-maint
>>       \->      py3k       ---> release30-maint
>>
>>
>> Patches for all versions of Python should land in the trunk. They are then
>> merged into release26-maint and py3k branches. Changes for Python 3.0 are
>> merged via the py3k branch.
>
> Apologies if this has been discussed before. I looked but didn't see
> anything.
>
> Given that at least 99% of the changes for the trunk will not get merged
> into release26-maint, doesn't it make more sense to merge the other way?
> That is, anything that gets checked in to release26-maint would potentially
> be merged into trunk. That would remove the huge number of merge blocks that
> will otherwise be required. Same for py3k and release30-maint.

I think the percentage is a bit lower than that. Also, we haven't been
using blocking with the maintenance branch so far; svnmerge.py is just
a convenience. (It generates commit messages and has a simpler
interface than a simple "svn merge" command.)



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From python at rcn.com  Thu Dec  4 20:12:57 2008
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 4 Dec 2008 11:12:57 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
Message-ID: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>

From: "A.M. Kuchling" <amk at amk.ca>
> I think we should also have a statement up on python.org about
> future plans: e.g.
> 
> * that there will be a Python 2.7 that will incorporate what we learn from
>  people trying to port,
> * that 3.1 will rearrange the standard library in mostly-known ways, and 
> * that we expect people to use 3.0 mostly for compatibility testing, 
>  not going into serious production use until 3.1 or maybe even 3.2.

The latter statement worries me.  It seems to unnecessarily undermine
adoption of 3.0.  It essentially says, "don't use this".  Is that what we want?
ISTM, 3.0 is in pretty good shape.  There is nothing intrinsically wrong
with it.  The number one adoption issue is external, i.e. how quickly
key third-party modules get converted.


Raymond


From fijall at gmail.com  Thu Dec  4 20:31:35 2008
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 4 Dec 2008 20:31:35 +0100
Subject: [Python-Dev] Taint Mode in Python 3.0
In-Reply-To: <200812041836.48146.nicole@cats-muvva.net>
References: <200812041836.48146.nicole@cats-muvva.net>
Message-ID: <693bc9ab0812041131o63b462e2id0d9783c2c459143@mail.gmail.com>

When I try to run this, I get:

Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/home/fijal/lang/python/Python30/Lib/encodings/__init__.py", line 31, in <module>
  File "/home/fijal/lang/python/Python30/Lib/codecs.py", line 1060, in <module>
TaintError: using tainted data
Aborted

Are there any tests showing what it should do? I didn't find any in the diff.

On Thu, Dec 4, 2008 at 7:36 PM, Nicole King <nicole at cats-muvva.net> wrote:
> Dear All,
>
> I have published the diff for my implementation of tainted mode in Python for
> R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the
> bottom of the page. I apologise for past problems accessing this web site: I
> hope to have resolved all the issues with it.
>
> Nicole

From barry at python.org  Thu Dec  4 20:41:31 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Dec 2008 14:41:31 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
Message-ID: <B2649D21-0D63-4598-B134-987B37549146@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 4, 2008, at 2:12 PM, Raymond Hettinger wrote:

> From: "A.M. Kuchling" <amk at amk.ca>
>> I think we should also have a statement upon on python.org about
>> future plans: e.g.
>> * that there will be a Python 2.7 that will incorporate what we  
>> learn from
>> people trying to port,
>> * that 3.1 will rearrange the standard library in mostly-known  
>> ways, and * that we expect people to use 3.0 mostly for  
>> compatibility testing,  not going into serious production use until  
>> 3.1 or maybe even 3.2.
>
> The latter statement worries me.  It seems to unnecessarily undermine
> adoption of 3.0.  It essentially says, "don't use this".  Is that  
> what we want?
> ISTM, 3.0 is in pretty good shape.  There is nothing intrinsically  
> wrong
> with it.  The number one adoption issue is external, i.e. how quickly
> key third-party modules get converted.

I agree.  I tried to put a positive spin on the announcement, and the  
backward compatibility issue in particular.  I probably failed.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTgybHEjvBPtnXfVAQJPjgP+NeyLY2ACryOmxeRV8qcotKrMJZYBwu6q
gtNjax3m0faRr2VrRwVLpiJqBoVkwpr+heKg7z2rR183MstsgQ9QsQpkZXBV+QnH
yK1yA18jaVZhLMR0VPT75GN1KPp5KCL+TbuT0cFtJ/SSt1LT5K356jdMYFi/ZbUP
t2YtaWoxB5o=
=4lo8
-----END PGP SIGNATURE-----

From fdrake at acm.org  Thu Dec  4 20:00:39 2008
From: fdrake at acm.org (Fred Drake)
Date: Thu, 04 Dec 2008 14:00:39 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <493826D5.3020205@trueblade.com>
References: <gh8s08$p9r$1@ger.gmane.org> <493826D5.3020205@trueblade.com>
Message-ID: <646307E1-3CB4-4538-9C4D-5ADE9C4E69F1@acm.org>

On Dec 4, 2008, at 1:52 PM, Eric Smith wrote:
> Apologies if this has been discussed before. I looked but didn't see  
> anything.

Probably has, just 'cause everything has been discussed before.

> Given that at least 99% of the changes for the trunk will not get  
> merged into release26-maint, doesn't it make more sense to merge the  
> other way? That is, anything that gets checked in to release26-maint  
> would potentially be merged into trunk. That would remove the huge  
> number of merge blocks that will otherwise be required. Same for  
> py3k and release30-maint.

The directions of merges were established in the past at some point.   
Though they feel wrong (at least to you and me), the direction is what  
it is.  I'd asked about the direction mostly because I can never  
remember after time away from working on the Python tree.

That said, don't let Python's decision on the direction keep you from  
managing your own projects the right way.  :-)

In fact, it's reasonable to fix bugs on the release26-maint branch,  
migrate the patch to the trunk, and then use svnmerge.py from there to  
propagate the changes.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From a.badger at gmail.com  Thu Dec  4 21:02:19 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 12:02:19 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
Message-ID: <4938374B.8000006@gmail.com>

I opened up bug http://bugs.python.org/issue4006 a while ago and it was
suggested in the report that it's not a bug but a feature and so I
should come here to see about getting the feature changed :-)

I have a specific problem with os.environ and a somewhat less important
architectural issue with the unicode/bytes handling in certain os.*
modules.  I'll start with the important one:

Currently in python3 there's no way to get at environment variables that
are not encoded in the system default encoding.  My understanding is
that this isn't a problem on Windows systems but on *nix this is a huge
problem.  Environment variables on *nix are a sequence of non-null
bytes.  These bytes are almost always "characters" but they do not have
to be.  Further, there is nothing that requires that the characters be
in the same encoding; some of the characters could be encoded in UTF-8
while others are in latin-1, shift-jis, or big-5.

These mixed encodings can occur for a variety of reasons.  Here's an
example that isn't too contrived :-)

Swallow is a multi-user shell server hosted at a university in Japan.
The OS installed is Fedora 10, where the encoding of all filenames
provided by the OS is UTF-8.  The administrator of the OS has kept this
convention and, among other things, has created a directory on which to
mount an NFS directory from another computer.  He calls that "ネットワーク"
("network" in Japanese).  Since it's utf-8, that gets put on the
filesystem as
'\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf'

Now the administrators of the fileserver have been maintaining it since
before Unicode was invented.  Furthermore, they don't want to suffer
from the space loss of using utf-8 to encode Japanese so they use
shift-jis everywhere.  They have a directory on the NFS share for
programs that are useful for people on the shell server to access.  It's
called "プログラム" ("programs" in Japanese).  Since they're using
shift-jis, the bytes on the filesystem are:
'\x83v\x83\x8d\x83O\x83\x89\x83\x80'

The system administrator of the shell server adds the directory of
programs to all his users' default PATH variables so then they have this:

PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80

(Note: python syntax, In the unix shell you'd likely have octal instead
of hex)
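
To make the mixed-encoding point concrete, here is a minimal sketch that
works on the raw bytes of the PATH given above.  decode_component is a
hypothetical helper, and the list of candidate encodings is an assumption
made for this example; real code would have no reliable way to know which
encodings to try.

```python
# PATH exactly as written in the example, held as raw bytes.
RAW_PATH = (
    b'/bin:/usr/bin:/usr/local/bin:/mnt/'
    b'\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf'
    b'/\x83v\x83\x8d\x83O\x83\x89\x83\x80'
)


def decode_component(component, encodings=('utf-8', 'shift_jis')):
    """Hypothetical helper: try each candidate encoding in turn."""
    for encoding in encodings:
        try:
            return component.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep a visible placeholder instead of dropping data.
    return component.decode('ascii', errors='replace')


last_entry = RAW_PATH.split(b':')[-1]

# The entry as a whole is not valid UTF-8 ...
try:
    last_entry.decode('utf-8')
    whole_entry_is_utf8 = True
except UnicodeDecodeError:
    whole_entry_is_utf8 = False

# ... but each path component decodes on its own.
decoded_parts = [decode_component(part) for part in last_entry.split(b'/')]
```

The last PATH entry cannot be decoded as a whole in either encoding, because
one path component is UTF-8 and the next is shift-jis; only the bytes form
preserves it faithfully.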

Now comes the problematic part.  One of the users on the system wants
to write a python3 program that needs to determine if a needed program
is in the user's PATH.  He tries to code it like this::

#!/usr/bin/python3.0

import os

for directory in os.environ['PATH'].split(os.pathsep):
    programs = os.listdir(directory)

That code raises a KeyError because python3 has silently discarded the
PATH due to the shift-jis encoded path elements.  Much more importantly,
there's no way the programmer can handle the KeyError and actually get
the PATH from within python.

In the bug report I opened, I listed four ways to fix this along with
the pros and cons:

1) return mixed unicode and byte types in os.environ and os.getenv
   - I think this one is a bad idea.  It's the easiest for simple code
to deal with but it's repeating the major problem with python2's Unicode
handling: mixing unicode and byte types unpredictably.

2) return only byte types in os.environ
  - This is conceptually correct but the most annoying option.
Technically we're receiving bytes from the C libraries and the C
libraries expect bytes in return.  But in the common case we will be
dealing with things in one encoding so this causes needless effort to
the application programmer in the common case.

3) silently ignore non-decodable value when accessing os.environ['PATH']
as we do now but allow access to the full information via
os.environ[b'PATH'] and os.getenvb().
  - This mirrors the practice of os.listdir('.') vs os.listdir(b'.') and
os.getcwd() vs os.getcwdb().

4) raise an exception when non-decodable values are *accessed* and
continue as in #3.  This means that os.environ wouldn't be a simple dict
as it would need to decode the values when keys are accessed (although
it could cache the values).
  - This mirrors the practice of open() which is to decode the value for
the common case but throw an exception and allow the programmer to
decide what to do if all values are not decodable.
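
For illustration, option #4 could look roughly like the sketch below.
LazyEnviron is a hypothetical class written for this message, not an existing
or proposed API, and the fixed 'utf-8' default stands in for whatever the
system default encoding would be.

```python
class LazyEnviron:
    """Hypothetical sketch of option #4: store bytes, decode on access."""

    def __init__(self, raw, encoding='utf-8'):
        self._raw = dict(raw)        # bytes name -> bytes value
        self._encoding = encoding
        self._cache = {}             # str name -> decoded str value

    def __getitem__(self, key):
        if isinstance(key, bytes):
            # Bytes keys always give access to the raw value.
            return self._raw[key]
        if key not in self._cache:
            raw_value = self._raw[key.encode(self._encoding)]
            # Raises UnicodeDecodeError if the value is not decodable,
            # instead of silently pretending the variable is absent.
            self._cache[key] = raw_value.decode(self._encoding)
        return self._cache[key]


env = LazyEnviron({
    b'HOME': b'/home/user',
    b'PATH': b'/bin:/mnt/\x83v\x83\x8d\x83O\x83\x89\x83\x80',
})
home = env['HOME']
raw_path = env[b'PATH']
```

With this shape, env['PATH'] raises UnicodeDecodeError where the value is
used, a missing variable still raises KeyError, and the programmer can fall
back to env[b'PATH'] to get the undecoded bytes.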

Either #3 or #4 will solve the major problem and both have precedent in
python3's current implementation.  The difference between them is
whether to throw an exception when a non-decodable value is encountered.
 Here's why I think that's appropriate:

One of the things I enjoy about python is the informative tracebacks
that make debugging easy.  I think that the ease of debugging is lost
when we silently ignore an error.  If we look at the difference in
coding and debugging for problems with files that aren't encoded in the
default encoding (where a traceback is issued) and os.listdir() when
filenames aren't in the default encoding (where the filenames are
silently ignored), I think we'll see that::

#!/usr/bin/python3.0
# Code with two unicode problems:
import os, sys

directory = sys.stdin.readline().strip()
for filename in os.listdir(directory):
    myfile = open(os.path.join(directory, filename), 'r')
    print('%s: %s' % (os.path.join(directory, filename), myfile.readline()))
    myfile.close()

Let's say I write the above code and test it on a directory that's all
encoded in the default encoding.  I release it to the world.  Someone
uses it on a system that has files and filenames with mixed encodings.
They immediately get a traceback like this:

  File "./test.py", line 7, in <module>
    print(myfile.readline())
  [...]
  UnicodeDecodeError: 'utf8' codec can't decode bytes in position 24-26: invalid data

With that information I can diagnose that my program is failing to read
a line from a file because the file is not written in the default
encoding (utf8 in this case).  It points out that myfile on line 7 of
test.py is the file object that has issues.  I quickly fix it by doing this:

+ unknown_encoded_files = []
[...]
+    try:
-    print(myfile.readline())
+        print('%s: %s' % (os.path.join(directory, filename),
+                          myfile.readline()))
+    except UnicodeDecodeError:
+        unknown_encoded_files.append(filename)
     myfile.close()
+if unknown_encoded_files:
+    print('These files are not in the default encoding:\n %s'
+          % '\n '.join(unknown_encoded_files))

Very simple.  The traceback has all the information I need to fix this.

A little later I get another report from that user that my code is
failing to list the first line of all the files in their home directory.
 This time there's no traceback to point out which of my files is
failing, just that some files are being ignored.  I ask for the list of
files in the directory and get back:

  ?.txt
  ?.txt

I create those files in a directory and they're processed fine.  I tell
the user that and ask if there's anything special about what's in the
files or anything that makes them different.  No... they're both text
files on his machine.  One was created there, though, and the other was
copied from another machine.  Hmm.. do the filenames show up mangled by
any chance?  Yes, one of them does but he knows it's correct since it
shows up correctly on his machine at home.

Ah ha!  That seems to point at an encoding problem.  But where?  After
writing a test and perusing my code for a while, I find my os.listdir()
call.  directory has to be converted to bytes for this to work.  So I
change the code like so:

- for filename in os.listdir(directory):
+ for filename in os.listdir(directory.encode()):
[...]
-        unknown_encoded_files.append(filename)
+        unknown_encoded_files.append(str(filename, errors='replace'))

The code for the fix is simple but the debugging to find the problem is
not.  Raising an exception instead of silently failing is much better
for getting code that works correctly.
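
A minimal sketch of the bytes-based variant, assuming the goal is simply that no filename is dropped. `first_lines_bytes` is a hypothetical helper, and `os.fsencode()` (available in later Python 3 releases) stands in for the `directory.encode()` call in the diff above.

```python
import os


def first_lines_bytes(directory):
    """List `directory` through the bytes API and decode only for display."""
    raw_dir = os.fsencode(directory)
    results = []
    for raw_name in sorted(os.listdir(raw_dir)):
        raw_path = os.path.join(raw_dir, raw_name)
        if not os.path.isfile(raw_path):
            continue
        with open(raw_path, "rb") as myfile:
            first = myfile.readline()
        # Undecodable bytes are shown as U+FFFD instead of raising.
        results.append((str(raw_name, "utf-8", errors="replace"),
                        str(first, "utf-8", errors="replace")))
    return results
```

Because the names stay as bytes until the final display step, a filename in an unexpected encoding is still listed and opened.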

The bug report I opened suggests creating a PEP to address this issue.
I think that's a good idea for whether os.listdir() and friends should
be changed to raise an exception but not having any way to get at some
environment variables seems like it's just a bug that needs to be
addressed.  What do other people think on both these issues?

-Toshio

From fwierzbicki at gmail.com  Thu Dec  4 21:05:51 2008
From: fwierzbicki at gmail.com (Frank Wierzbicki)
Date: Thu, 4 Dec 2008 15:05:51 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
Message-ID: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>

On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling <amk at amk.ca> wrote:
> 14:00 - 15:30
> =============
>
> Two tracks:
>
> Cross-implementation issues:
>
>  What do the various VMs want/need from CPython to help with their
>  implementations?
>
>  * Marking CPython-specific tests in the test suite?
>  * Getting an implementation agnostic test suite for the Python language?
>  * Separating the language tests and the pure Python part of the stdlib into
>    a separate project?  (Or publish them as a separate package.)
>  * Transition plans for 3.0?
>
>  Champion needed.
I would like to champion this one.

-Frank

From brett at python.org  Thu Dec  4 21:16:08 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Dec 2008 12:16:08 -0800
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
Message-ID: <bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>

On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki <fwierzbicki at gmail.com> wrote:
> On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling <amk at amk.ca> wrote:
>> 14:00 - 15:30
>> =============
>>
>> Two tracks:
>>
>> Cross-implementation issues:
>>
>>  What do the various VMs want/need from CPython to help with their
>>  implementations?
>>
>>  * Marking CPython-specific tests in the test suite?
>>  * Getting an implementation agnostic test suite for the Python language?
>>  * Separating the language tests and the pure Python part of the stdlib into
>>    a separate project?  (Or publish them as a separate package.)
>>  * Transition plans for 3.0?
>>
>>  Champion needed.
> I would like to champion this one.
>

I told AMK this a while back, but might as well make it more public; I
am up for chairing as well.

-Brett

From amk at amk.ca  Thu Dec  4 21:16:27 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Thu, 4 Dec 2008 15:16:27 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
Message-ID: <20081204201627.GA23627@amk-desktop.matrixgroup.net>

On Thu, Dec 04, 2008 at 03:05:51PM -0500, Frank Wierzbicki wrote:
> > Cross-implementation issues:
>
> I would like to champion this one.

Thanks!  You're now listed as the champion for it.

--amk

From p.f.moore at gmail.com  Thu Dec  4 21:20:34 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 4 Dec 2008 20:20:34 +0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <B2649D21-0D63-4598-B134-987B37549146@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
Message-ID: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>

2008/12/4 Barry Warsaw <barry at python.org>:
>>> * that 3.1 will rearrange the standard library in mostly-known ways, and
>>> * that we expect people to use 3.0 mostly for compatibility testing,  not going into serious production
>>>   use until 3.1 or maybe even 3.2.
>> The latter statement worries me.  It seems to unnecessarily undermine
>> adoption of 3.0.  It essentially says, "don't use this".  Is that what we
>> want?
>> ISTM, 3.0 is in pretty good shape.  There is nothing intrinsically wrong
>> with it.  The number one adoption issue is external, i.e. how quickly
>> key third-party modules get converted.
>
> I agree.  I tried to put a positive spin on the announcement, and the
> backward compatibility issue in particular.  I probably failed.

Hmm, looking back, the quote Raymond is referring to is just a
suggestion for additional text on the 3.0 page. I agree with him that
it's a bit too negative.

The announcement itself hits just the right note in my view. You
(Barry) seem to have got it pretty well on target.

One thing I'd like to see more clearly stated is that there's no
reason NOT to use Python 3.0 for new code. I don't think that message
has really come across yet - in spite of the warnings being all about
compatibility issues, no-one has stressed the simple point that if
your code is new, it doesn't have compatibility concerns!

Paul.

From p.f.moore at gmail.com  Thu Dec  4 21:21:28 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 4 Dec 2008 20:21:28 +0000
Subject: [Python-Dev] [Python-3000] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <79990c6b0812041221y6feaae02k87c7133b535e1ece@mail.gmail.com>

2008/12/4 "Martin v. Löwis" <martin at v.loewis.de>:
> Any objections?

The timing is right, go for it.
Paul

From rhamph at gmail.com  Thu Dec  4 21:54:03 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 13:54:03 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>

On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)
>
> I have a specific problem with os.environ and a somewhat less important
> architectural issue with the unicode/bytes handling in certain os.*
> modules.  I'll start with the important one:
>
> Currently in python3 there's no way to get at environment variables that
> are not encoded in the system default encoding.  My understanding is
> that this isn't a problem on Windows systems but on *nix this is a huge
> problem.  environment variables on *nix are a sequence of non-null
> bytes.  These bytes are almost always "characters" but they do not have
> to be.  Further, there is nothing that requires that the characters be
> in the same encoding; some of the characters could be in the UTF-8
> character set while others are in latin-1, shift-jis, or big-5.

Multiple encoding environments are best described as "batshit insane".
 It's impossible to handle any of it correctly *as text*, which is why
UTF-8 is becoming a universal standard.  For everybody's sanity python
should continue to push it.

However, some pragmatism is also possible.  Many uses of PATH may
allow it to be treated as black-box bytes, rather than text.  The
minimal solution I see is to make os.getenv() and os.putenv() switch
to byte modes when given byte arguments, as os.listdir() does.  This
use case doesn't require the ability to iterate over all environment
variables, as os.environb would allow.

I do wonder if controlling the environment given to a subprocess
requires os.environb, but it may be too obscure to really matter.
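
As a rough illustration of byte-oriented access to a single variable: the sketch below assumes a later Python 3 release in which `os.environb` and `os.supports_bytes_environ` exist (they are not in 3.0). `getenv_raw` is a hypothetical helper name, not a proposed API.

```python
import os


def getenv_raw(name):
    """Return the value of environment variable `name` (bytes) as bytes, or None."""
    if os.supports_bytes_environ:
        # POSIX: os.environb exposes the byte values without decoding them.
        return os.environb.get(name)
    # Elsewhere (e.g. Windows) the environment is natively text.
    value = os.environ.get(os.fsdecode(name))
    return None if value is None else os.fsencode(value)
```

For example, `getenv_raw(b"PATH")` hands back the raw bytes, which a caller can treat as a black box.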

-- 
Adam Olsen, aka Rhamphoryncus

From exarkun at divmod.com  Thu Dec  4 22:00:46 2008
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Thu, 4 Dec 2008 16:00:46 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
Message-ID: <20081204210046.20272.2138425533.divmod.quotient.15747@ohm>

On Thu, 4 Dec 2008 20:20:34 +0000, Paul Moore <p.f.moore at gmail.com> wrote:
>2008/12/4 Barry Warsaw <barry at python.org>:
> [snip]
>
>One thing I'd like to see more clearly stated is that there's no
>reason NOT to use Python 3.0 for new code. I don't think that message
>has really come across yet - in spite of the warnings being all about
>compatibility issues, no-one has stressed the simple point that if
>your code is new, it doesn't have compatibility concerns!

New code that wouldn't be more easily written with a dependency on a
library that hasn't been ported, you mean.

Although beyond that, there may be reasons (for example, the significant
performance degradation in the I/O library currently being discussed on
python-list).

Jean-Paul

From ncoghlan at gmail.com  Thu Dec  4 22:07:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 07:07:07 +1000
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <gh91il$f0m$1@ger.gmane.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
	<gh91il$f0m$1@ger.gmane.org>
Message-ID: <4938467B.40806@gmail.com>

Terry Reedy wrote:
> and this could give some people a mis-impression, most likely negative,
> as to the magnitude and nature of the change.  Most of the code I am now
> writing would, I believe, run with 2.5 except for print(..., file=xxx).
>  And I know that there was concern for backward compatibility to the
> point that some changes were rejected (renaming builtins) or delayed
> (deleting duplicate test asserts) for that reason.  So I would soften
> the statements to "... version of the language that is partially
> incompatible with... " and "were made without being bound by backward
> compatibility,"

I would agree with Terry - while there are backwards incompatibilities,
they aren't gratuitous.

Then again, Guido does seem to want to discourage people from trying to
target the common subset of the two languages instead of using 2to3 as a
compilation step from the python3 version.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From nd at perlig.de  Thu Dec  4 22:09:34 2008
From: nd at perlig.de (André Malo)
Date: Thu, 4 Dec 2008 22:09:34 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>
Message-ID: <200812042209.34814.nd@perlig.de>

* Adam Olsen wrote: 


> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> > I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> > suggested in the report that it's not a bug but a feature and so I
> > should come here to see about getting the feature changed :-)
> >
> > I have a specific problem with os.environ and a somewhat less important
> > architectural issue with the unicode/bytes handling in certain os.*
> > modules.  I'll start with the important one:
> >
> > Currently in python3 there's no way to get at environment variables
> > that are not encoded in the system default encoding.  My understanding
> > is that this isn't a problem on Windows systems but on *nix this is a
> > huge problem.  environment variables on *nix are a sequence of non-null
> > bytes.  These bytes are almost always "characters" but they do not have
> > to be.  Further, there is nothing that requires that the characters be
> > in the same encoding; some of the characters could be in the UTF-8
> > character set while others are in latin-1, shift-jis, or big-5.
>
> Multiple encoding environments are best described as "batshit insane".
>  It's impossible to handle any of it correctly *as text*, which is why
> UTF-8 is becoming a universal standard.  For everybody's sanity python
> should continue to push it.

Here's an example which will become popular soon, I guess: CGI scripts and,
of course, WSGI applications. All of those get their environment in an
unknown encoding. In the worst case one can blow up the application by
simply sending strange header lines over the wire. But there's more:
consider running the server in the C locale; then probably even a single
8-bit char might break something (?).

> However, some pragmatism is also possible.  Many uses of PATH may
> allow it to be treated as black-box bytes, rather than text.  The
> minimal solution I see is to make os.getenv() and os.putenv() switch
> to byte modes when given byte arguments, as os.listdir() does.  This
> use case doesn't require the ability to iterate over all environment
> variables, as os.environb would allow.
>
> I do wonder if controlling the environment given to a subprocess
> requires os.environb, but it may be too obscure to really matter.

IMHO, environment variables are not text. They are bytes by definition and
should be treated as such.
I know Windows has Unicode-enabled env vars on demand, but there's only
trouble with those over there in Apache's httpd (when passing them to CGI
scripts, oh well...).

nd

From ncoghlan at gmail.com  Thu Dec  4 22:11:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 07:11:47 +1000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081204123750.GA890@amk.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
Message-ID: <49384793.2030308@gmail.com>

A.M. Kuchling wrote:
> * that 3.1 will rearrange the standard library in mostly-known ways, and 
> * that we expect people to use 3.0 mostly for compatibility testing, 
>   not going into serious production use until 3.1 or maybe even 3.2.

As Raymond notes, this is probably too negative: for new projects, 3.0
should be fine so long as they don't need too many external libraries in
the short term.

For projects migrating from Python 2.x, the 3rd party library support
problem is likely to hold a lot of projects back for several months at
least, possibly to the point where it makes more sense to just wait for
2.7/3.1 to finalise any migration plans.

Such projects are still well-advised to start their porting efforts as
soon as possible though so they can identify *which* of their external
dependencies don't have python 3.0 compatible versions available yet.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From a.badger at gmail.com  Thu Dec  4 22:15:42 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 13:15:42 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>
Message-ID: <4938487E.7050809@gmail.com>

Adam Olsen wrote:
> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
>> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
>> suggested in the report that it's not a bug but a feature and so I
>> should come here to see about getting the feature changed :-)
>>
>> I have a specific problem with os.environ and a somewhat less important
>> architectural issue with the unicode/bytes handling in certain os.*
>> modules.  I'll start with the important one:
>>
>> Currently in python3 there's no way to get at environment variables that
>> are not encoded in the system default encoding.  My understanding is
>> that this isn't a problem on Windows systems but on *nix this is a huge
>> problem.  environment variables on *nix are a sequence of non-null
>> bytes.  These bytes are almost always "characters" but they do not have
>> to be.  Further, there is nothing that requires that the characters be
>> in the same encoding; some of the characters could be in the UTF-8
>> character set while others are in latin-1, shift-jis, or big-5.
> 
> Multiple encoding environments are best described as "batshit insane".
>  It's impossible to handle any of it correctly *as text*, which is why
> UTF-8 is becoming a universal standard.  For everybody's sanity python
> should continue to push it.
> 
Amen brother!

> However, some pragmatism is also possible.

Unfortunately, this is exactly what I'm talking about :-)

>  Many uses of PATH may
> allow it to be treated as black-box bytes, rather than text.  The
> minimal solution I see is to make os.getenv() and os.putenv() switch
> to byte modes when given byte arguments, as os.listdir() does.  This
> use case doesn't require the ability to iterate over all environment
> variables, as os.environb would allow.
> 
This would be a partial implementation of my option #3.  It allows the
programmer to work around problems but does allow subtle bugs to creep in
unawares.  For instance:

> I do wonder if controlling the environment given to a subprocess
> requires os.environb, but it may be too obscure to really matter.
> 
If you wanted to change one variable before passing it on to the
subprocess this could lead to head-scratcher bugs.  Here's a contrived
example:  Say I have an app that talks to multiple cvs repositories.  It
copies os.environ and modifies CVSROOT and CVS_RSH then calls subprocess
with env=temp_env.  If the PATH variable contains non-decodable elements
on some machines, this could lead to mysterious failures.  This is
particularly bad because we aren't directly modifying PATH anywhere in
our code so there won't be an obvious reason in the code that this is
failing.
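
The contrived example can be sketched roughly as follows. `run_with_cvs_env` and its parameters are hypothetical names. What happens to a variable that cannot be decoded depends on the Python version; under the 3.0 behaviour discussed in this thread it would not be reachable through os.environ, so the copy would silently lack it.

```python
import os
import subprocess
import sys


def run_with_cvs_env(cmd, cvsroot, cvs_rsh="ssh"):
    """Run `cmd` with CVSROOT and CVS_RSH overridden in a copy of os.environ."""
    temp_env = dict(os.environ)  # copy; PATH etc. are carried along untouched
    temp_env["CVSROOT"] = cvsroot
    temp_env["CVS_RSH"] = cvs_rsh
    return subprocess.run(cmd, env=temp_env, stdout=subprocess.PIPE,
                          check=True, universal_newlines=True)
```

Note that the code never mentions PATH, which is exactly why a missing or mangled PATH in the child would be hard to trace back to this copy.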

-Toshio

From ncoghlan at gmail.com  Thu Dec  4 22:19:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 07:19:19 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <49384957.3030102@gmail.com>

Toshio Kuratomi wrote:
> The bug report I opened suggests creating a PEP to address this issue.
> I think that's a good idea for whether os.listdir() and friends should
> be changed to raise an exception but not having any way to get at some
> environment variables seems like it's just a bug that needs to be
> addressed.  What do other people think on both these issues?

I'm pretty sure the discussion on this topic a while back decided that
where necessary Python 3 would grow parallel bytes versions of APIs
affected by environmental encoding issues (such as os.environb,
os.listdirb, os.getcwdb), but that we were OK with the idea of deferring
addition of those APIs until 3.1.

That is, this was an acknowledged limitation with a fairly
straightforward agreed solution, but it wasn't considered a common
enough issue to delay the release of 3.0 until all of those parallel
APIs had been implemented.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From g.brandl at gmx.net  Thu Dec  4 22:21:55 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 04 Dec 2008 22:21:55 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org> <493826D5.3020205@trueblade.com>
	<1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com>
Message-ID: <gh9hmk$bhb$1@ger.gmane.org>

Benjamin Peterson schrieb:
> On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith <eric at trueblade.com> wrote:
>> Christian Heimes wrote:
>>>
>>> Several people have asked about the patch and merge flow. Now that Python
>>> 3.0 is out it's a bit more complicated.
>>>
>>> Flow diagram
>>> ------------
>>>
>>> trunk ---> release26-maint
>>>       \->      py3k       ---> release30-maint
>>>
>>>
>>> Patches for all versions of Python should land in the trunk. They are then
>>> merged into release26-maint and py3k branches. Changes for Python 3.0 are
>>> merged via the py3k branch.
>>
>> Apologies if this has been discussed before. I looked but didn't see
>> anything.
>>
>> Given that at least 99% of the changes for the trunk will not get merged
>> into release26-maint, doesn't it make more sense to merge the other way?
>> That is, anything that gets checked in to release26-maint would potentially
>> be merged into trunk. That would remove the huge number of merge blocks that
>> will otherwise be required. Same for py3k and release30-maint.

I've suggested that too; the counter-argument was that "most people don't
want to care in which branch to commit something".  I'm not too comfortable
with this argument as it implies a certain ignorance on the part of our
committers.  As Fred says, it wasn't discussed anyway.

Also, with svnmerge, it is not too late to change merging direction.

> I think the percentage is a bit lower than that. Also, we haven't been
> using blocking with the maintenance branch so far; svnmerge.py is just
> a convenience. (It generates commit messages and has a simpler
> interface than a simple "svn merge" command.)

I *did* use blocking with the 2.6 branch when I last merged a whole batch
of commits.  As you say, by using svnmerge without blocking we only get a
tool that can generate commit messages.  However, with blocking we get
something more valuable: we don't overlook backportable fixes anymore.

So: yes, blocking is more work, but it gives something important in return.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Thu Dec  4 22:22:50 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 04 Dec 2008 22:22:50 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <gh9ho9$bnc$1@ger.gmane.org>

Christian Heimes schrieb:
> Several people have asked about the patch and merge flow. Now that 
> Python 3.0 is out it's a bit more complicated.
> 
> Flow diagram
> ------------
> 
> trunk ---> release26-maint
>         \->      py3k       ---> release30-maint
> 
> 
> Patches for all versions of Python should land in the trunk. They are 
> then merged into release26-maint and py3k branches. Changes for Python 
> 3.0 are merged via the py3k branch.

As a side-note: this merging flow means that bugfix and feature commits
may never be merged from trunk to py3k in one svnmerge batch.  Else,
they cannot be separated when merging from py3k to 30-maint.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From amk at amk.ca  Thu Dec  4 22:31:04 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Thu, 4 Dec 2008 16:31:04 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
Message-ID: <20081204213104.GA24509@amk-desktop.matrixgroup.net>

On Thu, Dec 04, 2008 at 08:20:34PM +0000, Paul Moore wrote:
> Hmm, looking back, the quote Raymond is referring to is just a
> suggestion for additional text on the 3.0 page. I agree with him that
> it's a bit too negative.

Actually I want it to be an entirely separate page so that we can
point people to it.

> has really come across yet - in spite of the warnings being all about
> compatibility issues, no-one has stressed the simple point that if
> your code is new, it doesn't have compatibility concerns!

Well, at least not until you decide you need some particular external
library that hasn't been ported to 3.0 yet.

For example, if you go to discussion threads such as
<http://www.reddit.com/r/programming/comments/7h7d7/python_3000_is_ready/>,
you can see people making statements like "I've been holding off
learning it until 3000 went gold."

But I think starting with Python 3.0 is a bad idea for a newbie,
because they'll be limited in what they can do until the libraries
have been ported.  They can do some tasks (command-line tools,
Fibonacci functions, Tk GUIs), but can they use the fancy new web
framework they've just read about?  Write a game?  Draw graphs with
matplotlib?  Use and extend an application such as Roundup?  Bzzt, no,
not yet!

Starting with 3.0 is starting out on an island.  While I expect the
island will grow in territory over time, I'm worried that new learners
will automatically go for the highest version number, find their
available tools are highly restricted, and get frustrated.

Perhaps the statement could say something like "we do not expect
most Python packages will be ported to the 3.x series until 
around the time 3.1 is released in X months."  (where X=12?  6?)

--amk

From dima at hlabs.spb.ru  Thu Dec  4 21:58:40 2008
From: dima at hlabs.spb.ru (Dmitry Vasiliev)
Date: Thu, 04 Dec 2008 23:58:40 +0300
Subject: [Python-Dev] [Python-3000] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <49384480.8090806@hlabs.spb.ru>

Martin v. Löwis wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would

+1

-- 
Dmitry Vasiliev (dima at hlabs.spb.ru)
  http://hlabs.spb.ru

From rhamph at gmail.com  Thu Dec  4 22:34:01 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 14:34:01 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812042209.34814.nd@perlig.de>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>
	<200812042209.34814.nd@perlig.de>
Message-ID: <aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:09 PM, André Malo <nd at perlig.de> wrote:
> * Adam Olsen wrote:
>> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
>> > I opened up bug http://bugs.python.org/issue4006 a while ago and it was
>> > suggested in the report that it's not a bug but a feature and so I
>> > should come here to see about getting the feature changed :-)
>> >
>> > I have a specific problem with os.environ and a somewhat less important
>> > architectural issue with the unicode/bytes handling in certain os.*
>> > modules.  I'll start with the important one:
>> >
>> > Currently in python3 there's no way to get at environment variables
>> > that are not encoded in the system default encoding.  My understanding
>> > is that this isn't a problem on Windows systems but on *nix this is a
>> > huge problem.  environment variables on *nix are a sequence of non-null
>> > bytes.  These bytes are almost always "characters" but they do not have
>> > to be.  Further, there is nothing that requires that the characters be
>> > in the same encoding; some of the characters could be in the UTF-8
>> > character set while others are in latin-1, shift-jis, or big-5.
>>
>> Multiple encoding environments are best described as "batshit insane".
>>  It's impossible to handle any of it correctly *as text*, which is why
>> UTF-8 is becoming a universal standard.  For everybody's sanity python
>> should continue to push it.
>
> Here's an example which will become popular soon, I guess: CGI scripts and,
> of course WSGI applications. All those get their environment in an unknown
> encoding. In the worst case one can blow up the application by simply
> sending strange header lines over the wire. But there's more: consider
> running the server in C locale, then probably even a single 8 bit char
> might break something (?).

I think that's an argument that the framework should reencode all
input text into the correct system encoding before passing it on to
the CGI script or WSGI app.  If the framework doesn't have a clear way
to determine the client's encoding then it's all just gibberish
anyway.  An HTTP 400 or 500 range error code is appropriate here.
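
A minimal sketch of that policy, assuming the framework has already settled on one expected encoding. `decode_header` is a hypothetical helper and the (status, text) return shape is chosen only for illustration; it is not part of CGI or WSGI.

```python
def decode_header(raw_value, encoding="utf-8"):
    """Return (status, text): 200 with the decoded text, or 400 if undecodable."""
    try:
        return 200, raw_value.decode(encoding)
    except UnicodeDecodeError:
        # Reject the request instead of passing gibberish to the application.
        return 400, None
```

With this shape the application only ever sees text that decoded cleanly; anything else is turned away at the boundary.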


>> However, some pragmatism is also possible.  Many uses of PATH may
>> allow it to be treated as black-box bytes, rather than text.  The
>> minimal solution I see is to make os.getenv() and os.putenv() switch
>> to byte modes when given byte arguments, as os.listdir() does.  This
>> use case doesn't require the ability to iterate over all environment
>> variables, as os.environb would allow.
>>
>> I do wonder if controlling the environment given to a subprocess
>> requires os.environb, but it may be too obscure to really matter.
>
> IMHO, environment variables are no text. They are bytes by definition and
> should be treated as such.
> I know, there's windows having unicode enabled env vars on demand, but
> there's only trouble with those over there in apache's httpd (when passing
> them to CGI scripts, oh well...).

Environment variables have textual names, are set via text, frequently
contain textual file names or paths, and my shell (bash in
gnome-terminal on ubuntu) lets me put unicode text in just fine.  The
underlying APIs may use bytes, but they're *intended* to be encoded
text.


-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Thu Dec  4 22:40:05 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 14:40:05 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49384957.3030102@gmail.com>
References: <4938374B.8000006@gmail.com> <49384957.3030102@gmail.com>
Message-ID: <aac2c7cb0812041340y3bff0f68v5dd162ee58f0242d@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:19 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Toshio Kuratomi wrote:
>> The bug report I opened suggests creating a PEP to address this issue.
>> I think that's a good idea for whether os.listdir() and friends should
>> be changed to raise an exception but not having any way to get at some
>> environment variables seems like it's just a bug that needs to be
>> addressed.  What do other people think on both these issues?
>
> I'm pretty sure the discussion on this topic a while back decided that
> where necessary Python 3 would grow parallel bytes versions of APIs
> affected by environmental encoding issues (such as os.environb,
> os.listdirb, os.getcwdb), but that we were OK with the idea of deferring
> addition of those APIs until 3.1.

It looks like most of them got into 3.0.
http://docs.python.org/3.0/library/os.html says "All functions
accepting path or file names accept both bytes and string objects, and
result in an object of the same type, if a path or file name is
returned."


> That is, this was an acknowledged limitation with a fairly
> straightforward agreed solution, but it wasn't considered a common
> enough issue to delay the release of 3.0 until all of those parallel
> APIs had been implemented

Aye.  IMO it's fairly clear that os.getenv()/os.putenv() should follow
suit in 3.1.  I'm not so sure about adding os.environb (and making
subprocess use it), unless the OP can demonstrate they really need it.
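
A minimal sketch of the os.listdir() behaviour referred to above, assuming a Python 3 interpreter that has os.fsencode(). The scratch directory and the file name are made up for illustration; the point is only that the argument type selects the result type.

```python
import os
import tempfile

# Create a scratch directory with one file so the listing is predictable.
# The directory and the file name "example.txt" are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    as_text = os.listdir(tmp)                # str argument -> list of str
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes argument -> list of bytes

print(as_text, as_bytes)
```

An os.getenv() that followed suit would presumably return bytes when handed a bytes name; that is the proposal under discussion here, not something this sketch demonstrates.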

-- 
Adam Olsen, aka Rhamphoryncus

From python at rcn.com  Thu Dec  4 22:42:35 2008
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 4 Dec 2008 13:42:35 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org><20081204123750.GA890@amk.local><6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1><B2649D21-0D63-4598-B134-987B37549146@python.org><79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
Message-ID: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>

From: "A.M. Kuchling" <amk at amk.ca>
> Perhaps the statement could say something like "we do not expect
> most Python packages will be ported to the 3.x series until 
> around the time 3.1 is released in X months."  (where X=12?  6?)

I would leave out any discussion of 3.1.  Its content and release date
have nothing to do with when third party modules get updated.

Also, we don't know the timing of the third-party updates.
Some may never get converted.  Some may convert quickly
and easily.  Someone (perhaps me) may organize a series of
funded sprints to get many of the major packages converted.

It would be better to simply say that at the present time,
most important third-party modules do not yet support 3.0.

FWIW, my father is a Python newbie and I'm pointing him
to 3.0 because it will be easier to learn than 2.6's hodgepodge
of new and old features.  The 3.0 environment is much cleaner.




From brett at python.org  Thu Dec  4 22:49:40 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Dec 2008 13:49:40 -0800
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh9hmk$bhb$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org> <493826D5.3020205@trueblade.com>
	<1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com>
	<gh9hmk$bhb$1@ger.gmane.org>
Message-ID: <bbaeab100812041349t56bd30e1sb3e4b23cb36d3b0f@mail.gmail.com>

On Thu, Dec 4, 2008 at 13:21, Georg Brandl <g.brandl at gmx.net> wrote:
> Benjamin Peterson schrieb:
>> On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith <eric at trueblade.com> wrote:
>>> Christian Heimes wrote:
>>>>
>>>> Several people have asked about the patch and merge flow. Now that Python
>>>> 3.0 is out it's a bit more complicated.
>>>>
>>>> Flow diagram
>>>> ------------
>>>>
>>>> trunk ---> release26-maint
>>>>       \->      py3k       ---> release30-maint
>>>>
>>>>
>>>> Patches for all versions of Python should land in the trunk. They are then
>>>> merged into release26-maint and py3k branches. Changes for Python 3.0 are
>>>> merged via the py3k branch.
>>>
>>> Apologies if this has been discussed before. I looked but didn't see
>>> anything.
>>>
>>> Given that at least 99% of the changes for the trunk will not get merged
>>> into release26-maint, doesn't it make more sense to merge the other way?
>>> That is, anything that gets checked in to release26-maint would potentially
>>> be merged into trunk. That would remove the huge number of merge blocks that
>>> will otherwise be required. Same for py3k and release30-maint.
>
> I've suggested that too; the counter-argument was that "most people don't
> want to care in which branch to commit something".  I'm not too comfortable
> with this argument as it implies a certain ignorance on the part of our
> committers.  As Fred says, it wasn't discussed anyway.
>

That would make the rule for choosing which branch to first commit to
be the one with the smallest version:

2.6 -> trunk -> 3.0 -> py3k

That seems reasonable to me since that is really what the code
branching is and how I suspect we will do things with a DVCS.

> Also, with svnmerge, it is not too late to change merging direction.
>

If changing it to be like above is not an issue then I vote for the change.

>> I think the percentage is a bit lower than that. Also, we haven't been
>> using blocking with the maintenance branch so far; svnmerge.py is just
>> a convenience. (It generates commit messages and has a simpler
>> interface than a simple "svn merge" command.)
>
> I *did* use blocking with the 2.6 branch when I last merged a whole batch
> of commits.  As you say, by using svnmerge without blocking we only get a
> tool that can generate commit messages.  However, with blocking we get
> something more valuable: we don't overlook backportable fixes anymore.
>
> So: yes, blocking is more work, but it gives something important in return.

The other perk of this ordering is you should be able to place a
single block along the chain where the patch should stop and
potentially be done with the merges if you are in a rush. That way
people who do mass merges can just sequentially merge and not worry
about where a patch should stop.

-Brett

From tjreedy at udel.edu  Thu Dec  4 22:51:52 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 16:51:52 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <gh9jdk$i48$1@ger.gmane.org>

Toshio Kuratomi wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)

It does you no good (and will irritate others) to conflate 'design 
decision I do not agree with' with 'mistaken documentation or 
implementation of a design decision'.  The former is opinion, the latter 
is usually fact (with occasional border cases).  The latter is what core 
developers mean by 'bug'.

> Currently in python3 there's no way to get at environment variables that
> are not encoded in the system default encoding.  My understanding is
> that this isn't a problem on Windows systems but on *nix this is a huge
> problem.  environment variables on *nix are a sequence of non-null
> bytes.  These bytes are almost always "characters" but they do not have
> to be.  Further, there is nothing that requires that the characters be
> in the same encoding; some of the characters could be in the UTF-8
> character set while others are in latin-1, shift-jis, or big-5.

To me, mixing encodings within a string is at least slightly insane.  If 
by design, maybe even a 'design bug' ;-).

> These mixed encodings can occur for a variety of reasons.  Here's an
> example that isn't too contrived :-)
> 
> Swallow is a multi-user shell server hosted at a university in Japan.
> The OS installed is Fedora 10 where the encoding of all filenames
> provided by the OS are UTF-8.  The administrator of the OS has kept this
> convention and, among other things has created a directory to mount an
> NFS directory from another computer.  He calls that "ネットワーク"
> ("network" in Japanese).  Since it's utf-8, that gets put on the
> filesystem as
> '\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf'
> 
> Now the administrators of the fileserver have been maintaining it since
> before Unicode was invented.  Furthermore, they don't want to suffer
> from the space loss of using utf-8 to encode Japanese so they use
> shift-jis everywhere.  They have a directory on the nfs share for
> programs that are useful for people on the shell server to access.  It's
> called "プログラム" ("programs" in Japanese).  Since they're using
> shift-jis, the bytes on the filesystem are:
> '\x83v\x83\x8d\x83O\x83\x89\x83\x80'
> 
> The system administrator of the shell server adds the directory of
> programs to all his users' default PATH variables so then they have this:
> 
> PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80

I would think life would be ultimately easier if either the file server 
or the shell server automatically translated file names from jis and 
utf8 and back, so that the PATH on the *nix shell server is entirely 
utf8.  How would you ever display a mixture to users?  What if there 
were an ambiguous component that could be legally decoded more than one way?

> Now comes the problematic part.  One of the users on the system wants
> to write a python3 program that needs to determine if a needed program
> is in the user's PATH.  He tries to code it like this::
> 
> #!/usr/bin/python3.0
> 
> import os
> 
> for directory in os.environ['PATH'].split(os.pathsep):
>     programs = os.listdir(directory)
> 
> That code raises a KeyError because python3 has silently discarded the
> PATH due to the shift-jis encoded path elements.  Much more importantly,
> there's no way the programmer can handle the KeyError and actually get
> the PATH from within python.

Have you tried os.system or os.popen or the subprocess module to use and 
get a response from a native *nix command?  On Windows

 >>> import subprocess as sp
 >>> s=sp.Popen('path', shell=True, stdout=sp.PIPE)
 >>> s.stdout.read()
b'PATH=C:\\temp\\WatconPermanent\\binnt;C:\\temp\\WatconPermanent\\binw;C:\\WINDOWS\\System32;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\Program 
Files\\PC-Doctor for Windows\\services;C:\\Program Files\\ATI 
Technologies\\ATI.ACE\\Core-Static;C:\\Program 
Files\\Python25;C:\\Program Files\\QuickTime\\QTSystem\\\r\n'

There are the bytes.  This took me 10 minutes and a few mistakes as a 
first time subprocess user.

Another 10 minutes and I figured out how to get the entire environment 
as bytes *and* convert them to a dict.  This is a bit trickier

s=sp.Popen('set', shell=True, stdout=sp.PIPE) # set with no arguments (env on *nix) prints the whole environment
e1= s.stdout.read()
e2=e1.split(b'\r\n')
e2.pop() # get rid of trailing b'' from trailing '\r\n'
e3=[i.split(b'=', 1) for i in e2] # split on the first '=' only; values may contain '='
env = dict(e3)

Whether either of these should be wrapped in os, I'll leave for others 
to discuss and decide, but if you can do the same in *nix, you should be 
able to do what you need to for now.
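
For the *nix side, a rough sketch of the same idea. It assumes an env(1) that accepts -0 for NUL-terminated output, as the GNU coreutils one does. The helper names parse_env_block and read_env_bytes are made up for this illustration, and the sample data reuses the shift-jis directory name from the example PATH.

```python
import subprocess

def parse_env_block(raw, sep=b"\0"):
    """Turn b'NAME=value<sep>NAME=value<sep>' into a dict of bytes -> bytes."""
    env = {}
    for entry in raw.split(sep):
        if not entry:
            continue  # skip the empty piece after the final separator
        name, _, value = entry.partition(b"=")  # split on the first '=' only
        env[name] = value
    return env

def read_env_bytes():
    """Ask an external env(1) for the environment as raw bytes.

    Assumes an env that accepts -0 (NUL-terminated output), as GNU
    coreutils does; other platforms may need a different command.
    """
    out = subprocess.run(["env", "-0"], stdout=subprocess.PIPE, check=True).stdout
    return parse_env_block(out)

# Hand-written sample in the same format, so the parser can be checked
# without depending on the machine's real environment.
sample = b"LANG=C\0PATH=/bin:/mnt/\x83v\x83\x8d\x83O\x83\x89\x83\x80\0OPTS=a=b\0"
parsed = parse_env_block(sample)
```

NUL is used as the separator because it is the one byte an environment entry cannot contain, whereas a value may legitimately contain a newline.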

Terry Jan Reedy


From brett at python.org  Thu Dec  4 23:03:52 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 4 Dec 2008 14:03:52 -0800
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <4938467B.40806@gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>
	<gh91il$f0m$1@ger.gmane.org> <4938467B.40806@gmail.com>
Message-ID: <bbaeab100812041403n387060eem8ee5f6cfc347d38b@mail.gmail.com>

On Thu, Dec 4, 2008 at 13:07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Terry Reedy wrote:
>> and this could give some people a mis-impression, most likely negative,
>> as to the magnitude and nature of the change.  Most of the code I am now
>> writing would, I believe, run with 2.5 except for print(..., file=xxx).
>>  And I know that there was concern for backward compatibility to the
>> point that some changes were rejected (renaming builtins) or delayed
>> (deleting duplicate test asserts) for that reason.  So I would soften
>> the statements to "... version of the language that is partially
>> incompatible with... " and "were made without being bound by backward
>> compatibility,"
>
> I would agree with Terry - while there are backwards incompatibilities,
> they aren't gratuitous.
>
> Then again, Guido does seem to want to discourage people from trying to
> target the common subset of the two languages instead of using 2to3 as a
> compilation step from the python3 version.
>

It makes sense if your code would have required jumping through hoops
to keep the base use-case. But if the only major difference is
something easily covered by a __future__ statement (think
print_function or unicode_literals, I believe, although that __future__
statement is not documented anywhere according to Google), then I
honestly think it's okay to try to target the subset.

-Brett

From a.badger at gmail.com  Thu Dec  4 23:13:35 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 14:13:35 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>
References: <4938374B.8000006@gmail.com>	<aac2c7cb0812041254n19b0332sd0a16385855e4ebc@mail.gmail.com>	<200812042209.34814.nd@perlig.de>
	<aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>
Message-ID: <4938560F.8070903@gmail.com>

Adam Olsen wrote:
> On Thu, Dec 4, 2008 at 2:09 PM, André Malo <nd at perlig.de> wrote:
>> * Adam Olsen wrote:
>>> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi <a.badger at gmail.com>
>> wrote:
>>>> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
>>>> suggested in the report that it's not a bug but a feature and so I
>>>> should come here to see about getting the feature changed :-)
>>>>
>>>> I have a specific problem with os.environ and a somewhat less important
>>>> architectural issue with the unicode/bytes handling in certain os.*
>>>> modules.  I'll start with the important one:
>>>>
>>>> Currently in python3 there's no way to get at environment variables
>>>> that are not encoded in the system default encoding.  My understanding
>>>> is that this isn't a problem on Windows systems but on *nix this is a
>>>> huge problem.  environment variables on *nix are a sequence of non-null
>>>> bytes.  These bytes are almost always "characters" but they do not have
>>>> to be.  Further, there is nothing that requires that the characters be
>>>> in the same encoding; some of the characters could be in the UTF-8
>>>> character set while others are in latin-1, shift-jis, or big-5.
>>> Multiple encoding environments are best described as "batshit insane".
>>>  It's impossible to handle any of it correctly *as text*, which is why
>>> UTF-8 is becoming a universal standard.  For everybody's sanity python
>>> should continue to push it.
>> Here's an example which will become popular soon, I guess: CGI scripts and,
>> of course WSGI applications. All those get their environment in an unknown
>> encoding. In the worst case one can blow up the application by simply
>> sending strange header lines over the wire. But there's more: consider
>> running the server in C locale, then probably even a single 8 bit char
>> might break something (?).
> 
> I think that's an argument that the framework should reencode all
> input text into the correct system encoding before passing it on to
> the CGI script or WSGI app.  If the framework doesn't have a clear way
> to determine the client's encoding then it's all just gibberish
> anyway.  A HTTP 400 or 500 range error code is appropriate here.
> 
The framework can't always encode input bytes into the system encoding
for text.  Sometimes the framework can be dealing with actual bytes.
For instance, if the framework is being asked to reference an actual
file on a *NIX filesystem the bytes have to match up with the bytes in
the filename whether or not those bytes agree with the system encoding.

> 
>>> However, some pragmatism is also possible.  Many uses of PATH may
>>> allow it to be treated as black-box bytes, rather than text.  The
>>> minimal solution I see is to make os.getenv() and os.putenv() switch
>>> to byte modes when given byte arguments, as os.listdir() does.  This
>>> use case doesn't require the ability to iterate over all environment
>>> variables, as os.environb would allow.
>>>
>>> I do wonder if controlling the environment given to a subprocess
>>> requires os.environb, but it may be too obscure to really matter.
>> IMHO, environment variables are no text. They are bytes by definition and
>> should be treated as such.
>> I know, there's windows having unicode enabled env vars on demand, but
>> there's only trouble with those over there in apache's httpd (when passing
>> them to CGI scripts, oh well...).
> 
> Environment variables have textual names, are set via text, frequently
> contain textual file names or paths, and my shell (bash in
> gnome-terminal on ubuntu) lets me put unicode text in just fine.  The
> underlying APIs may use bytes, but they're *intended* to be encoded
> text.
> 
The example I've started using recently is this: text files on my system
contain character data and I expect them to be read into a string type
when I open them in python3.  However, if a text file contains text that
is not encoded in the system default encoding I should still be able to
get at the data and perform my own conversion.  So I agree with the
default of treating environment variables as text.  We just need to be
able to treat them as bytes when these corner cases come up.
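
The text-file analogy can be sketched roughly as follows. The file name is made up, and the payload is the shift-jis directory name from the PATH example; nothing here touches the environment, it only shows the "read the bytes, convert them yourself" escape hatch that files already have.

```python
import os
import tempfile

# "programs" in Japanese, encoded as shift-jis: the same bytes as in the
# PATH example.
payload = b"\x83v\x83\x8d\x83O\x83\x89\x83\x80"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # illustrative file name
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode hands back the bytes untouched, whatever the locale says.
    with open(path, "rb") as f:
        raw = f.read()

# The caller, who knows the real encoding, does the conversion.
text = raw.decode("shift_jis")
```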

-Toshio

From nd at perlig.de  Thu Dec  4 23:47:52 2008
From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=)
Date: Thu, 4 Dec 2008 23:47:52 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de>
	<aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>
Message-ID: <200812042347.52388.nd@perlig.de>

* Adam Olsen wrote: 

> On Thu, Dec 4, 2008 at 2:09 PM, André Malo <nd at perlig.de> wrote:

> > Here's an example which will become popular soon, I guess: CGI scripts
> > and, of course WSGI applications. All those get their environment in an
> > unknown encoding. In the worst case one can blow up the application by
> > simply sending strange header lines over the wire. But there's more:
> > consider running the server in C locale, then probably even a single 8
> > bit char might break something (?).
>
> I think that's an argument that the framework should reencode all
> input text into the correct system encoding before passing it on to
> the CGI script or WSGI app.  If the framework doesn't have a clear way
> to determine the client's encoding then it's all just gibberish
> anyway.  A HTTP 400 or 500 range error code is appropriate here.

Duh.
See, you're already mixing different encodings and creating issues here! 
You're talking about client encoding (whatever that is) with correct system 
encoding (whatever that is, too) in the same paragraph and assume they are 
the same or compatible.

There are several points here:

- there is no clear way to get a single client encoding for the whole HTTP 
  transaction (headers + body), because *there is none*. If the whole 
  header set matches the same encoding, it's more or less luck.

- there is no correct system encoding either. As said, I prefer running my 
  servers in C locale, so it's all ascii. In fact, it shouldn't matter. The 
  locale should not have anything to do with an application called over the 
  network.

- A 400 or 500 response for a header containing something like my name is 
  not appropriate.

- Octets in HTTP headers are allowed. And they are what they are -
  octets. The interpretation has to be left to the application, not the 
  framework.
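
One possible way to leave the interpretation to the application, sketched under the assumption that the framework has to hand the value over through a str-based API. The header value is made up, and the latin-1 pass-through is an illustration of the idea rather than something any framework in this thread is known to do.

```python
# Header value as it might arrive on the wire; the value is made up.
raw_value = "Andr\u00e9".encode("utf-8")

# A framework can carry the octets through a str API without interpreting
# them: latin-1 maps every byte to exactly one code point and back.
carried = raw_value.decode("latin-1")

# The application, which knows what it expects, recovers the octets and
# applies its own interpretation.
recovered = carried.encode("latin-1")
name = recovered.decode("utf-8")
```

The carried string is not meaningful text in itself; it is only a lossless container for the original octets.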


>
> >> However, some pragmatism is also possible.  Many uses of PATH may
> >> allow it to be treated as black-box bytes, rather than text.  The
> >> minimal solution I see is to make os.getenv() and os.putenv() switch
> >> to byte modes when given byte arguments, as os.listdir() does.  This
> >> use case doesn't require the ability to iterate over all environment
> >> variables, as os.environb would allow.
> >>
> >> I do wonder if controlling the environment given to a subprocess
> >> requires os.environb, but it may be too obscure to really matter.
> >
> > IMHO, environment variables are no text. They are bytes by definition
> > and should be treated as such.
> > I know, there's windows having unicode enabled env vars on demand, but
> > there's only trouble with those over there in apache's httpd (when
> > passing them to CGI scripts, oh well...).
>
> Environment variables have textual names, are set via text, frequently

Well, think about my example again. The friendly way to maintain them is not 
the issue. The problems arise at least when the variables are set by an 
attacker.

> contain textual file names or paths, and my shell (bash in
> gnome-terminal on ubuntu) lets me put unicode text in just fine.  The
> underlying APIs may use bytes, but they're *intended* to be encoded
> text.

Yes, encoded text == bytes. No, they're intended to be c-strings. And well,  
even if we assume that they should contain text (as in encoded unicode), 
their meaning is application specific and so is the encoding (even if it's 
mixed).

What I'm saying is: I don't see much use for unicode APIs for the 
environment at all, because I don't know what's in there before inspecting 
them. And apparently the only reliable way to inspect them is via a byte 
oriented API.

nd

From a.badger at gmail.com  Thu Dec  4 23:51:25 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 14:51:25 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <gh9jdk$i48$1@ger.gmane.org>
References: <4938374B.8000006@gmail.com> <gh9jdk$i48$1@ger.gmane.org>
Message-ID: <49385EED.9040004@gmail.com>

Terry Reedy wrote:
> Toshio Kuratomi wrote:
>> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
>> suggested in the report that it's not a bug but a feature and so I
>> should come here to see about getting the feature changed :-)
> 
> It does you no good (and will irritate others) to conflate 'design
> decision I do not agree with' with 'mistaken documentation or
> implementation of a design decision'.  The former is opinion, the latter
> is usually fact (with occasional border cases).  The latter is what core
> developers mean by 'bug'.
> 
Noted.  However, there's also a difference between "Prevents us from
doing useful things" and "Allows doing a useful thing in a non-trivial
manner".  The latter I would call a difference in design decision and
the former I would call a bug in the design.

>> Currently in python3 there's no way to get at environment variables that
>> are not encoded in the system default encoding.  My understanding is
>> that this isn't a problem on Windows systems but on *nix this is a huge
>> problem.  environment variables on *nix are a sequence of non-null
>> bytes.  These bytes are almost always "characters" but they do not have
>> to be.  Further, there is nothing that requires that the characters be
>> in the same encoding; some of the characters could be in the UTF-8
>> character set while others are in latin-1, shift-jis, or big-5.
> 
> To me, mixing encodings within a string is at least slightly insane.  If
> by design, maybe even a 'design bug' ;-).
> 
As an application level developer I echo your sentiment :-)  I
recognize, though, that *nix filesystem semantics were designed many
years before unicode and the decision to treat filenames, environment
variables, and so much else as bytes follows naturally from the C
definition of a char.  It's up to a higher level than the OS to decide
how to display the bytes.

[shell server and fileserver result in this insane PATH]
>> PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80
>>
> 
> I would think life would be ultimately easier if either the file server
> or the shell server automatically translated file names from jis and
> utf8 and back, so that the PATH on the *nix shell server is entirely
> utf8.

This is not possible because no part of the computer knows what the
encoding is.  To the computer, it's just a sequence of bytes.  Unlike
xml or the windows filesystem (winfs? ntfs?) where the encoding is
specified as part of the document/filesystem there's nothing to tell
what encoding the filenames are in.

>  How would you ever display a mixture to users?

This is up to the application.  My recommendation would be to keep the
raw bytes (to access the file on the filesystem) and display the results
of str(filename, errors='replace') to the user.
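
A small sketch of that recommendation, using the mixed-encoding path from the example. Passing "utf-8" explicitly is an assumption made here so that the result does not depend on the interpreter's default.

```python
# The mixed-encoding path from the example: a UTF-8 directory name
# followed by a shift-jis one.
raw = (b"/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf"
       b"/\x83v\x83\x8d\x83O\x83\x89\x83\x80")

# For display only: undecodable bytes become U+FFFD instead of raising.
shown = str(raw, "utf-8", errors="replace")

# For file access, keep using the untouched bytes, e.g. os.listdir(raw).
```

The displayed string is lossy and cannot be turned back into the original bytes, which is why the raw value has to be kept alongside it.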

>  What if there
> were an ambiguous component that could be legally decoded more than one
> way?
> 
The ambiguity is the reason that the fileserver and shell server can't
automatically translate the filename (many encodings merely use all of
the 2^8 byte combinations available in a C char type.  This makes the
byte decodable in any one of those encodings).  In the application, only
using the raw bytes to access the file also prevents ambiguity because
the raw bytes only references one file.
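
To make the ambiguity concrete, a short sketch with one byte string and two single-byte encodings. The choice of latin-1 and koi8-r is illustrative; any pair of encodings that define every high byte would do.

```python
# One byte string, two single-byte encodings that both define every byte
# value from 0x80 to 0xFF.
raw = b"caf\xe9"

as_latin1 = raw.decode("latin-1")
as_koi8r = raw.decode("koi8_r")

# Both decodes succeed, but they disagree about the last character.
ambiguous = as_latin1 != as_koi8r

# Round-tripping through either codec gives back the same bytes, so the
# bytes themselves still name exactly one file.
same_bytes = as_latin1.encode("latin-1") == as_koi8r.encode("koi8_r") == raw
```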

>> Now comes the problematic part.  One of the users on the system wants
>> to write a python3 program that needs to determine if a needed program
>> is in the user's PATH.  He tries to code it like this::
>>
>> #!/usr/bin/python3.0
>>
>> import os
>>
>> for directory in os.environ['PATH'].split(os.pathsep):
>>     programs = os.listdir(directory)
>>
>> That code raises a KeyError because python3 has silently discarded the
>> PATH due to the shift-jis encoded path elements.  Much more importantly,
>> there's no way the programmer can handle the KeyError and actually get
>> the PATH from within python.
> 
> Have you tried os.system or os.popen or the subprocess module to use and
> get a response from a native *nix command?  On Windows
> 
Sure, you can subprocess your way out of a lot of sticky situations
since you're essentially delegating the task to a C routine.  But there
are drawbacks:

* You become dependent on an external program being available.  What
happens if your code is run in a chroot, for instance?
* Do we want anyone writing programs that access the environment on *NIX
to have to discover this pattern themselves and implement it?

As for wrapping this up in os.*, that isn't necessary -- the python3
interpreter already knows about the byte-oriented environment; it just
isn't making it available to people programming in python.
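
For readers on later interpreters: Python 3.2 and newer expose a bytes view of the environment as os.environb on platforms where os.supports_bytes_environ is true. A hedged sketch of the PATH lookup on top of that follows; the helper name path_entries_bytes is made up for this illustration, and none of it is available in 3.0.

```python
import os

def path_entries_bytes():
    """Return the PATH entries as bytes, without going through a subprocess."""
    if os.supports_bytes_environ:
        # POSIX: os.environb maps bytes names to bytes values.
        raw = os.environb.get(b"PATH", b"")
    else:
        # Windows: the environment is natively text, so encode it.
        raw = os.fsencode(os.environ.get("PATH", ""))
    # Drop empty entries produced by leading, trailing or doubled separators.
    return [p for p in raw.split(os.fsencode(os.pathsep)) if p]

entries = path_entries_bytes()
```

Each entry can then be passed straight to os.listdir(), which returns bytes names for a bytes argument.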

-Toshio

From p.f.moore at gmail.com  Thu Dec  4 23:52:41 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 4 Dec 2008 22:52:41 +0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
Message-ID: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>

2008/12/4 Raymond Hettinger <python at rcn.com>:
> Also, we don't know the timing of the third-party updates.
> Some may never get converted.  Some may convert quickly
> and easily.  Someone (perhaps me) may organize a series of
> funded sprints to get many of the major packages converted.

One piece of encouraging news I heard today is that mod_wsgi
apparently works with 3.0 already - which may well mean that more web
software than I'd originally anticipated will work sooner rather than
later.

But it's certainly true that Python (all versions, not just 3.0) is
more of an ecosystem than just the CPython core. "Batteries included"
notwithstanding. And it'll take longer for the 3.0 ecosystem to grow
than the 2.6 one.

Paul.

From rhamph at gmail.com  Fri Dec  5 00:15:47 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 16:15:47 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812042347.52388.nd@perlig.de>
References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de>
	<aac2c7cb0812041334r40e7f0d7k23376d74e3adfd04@mail.gmail.com>
	<200812042347.52388.nd@perlig.de>
Message-ID: <aac2c7cb0812041515p32306a75le69254335156198@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:47 PM, André Malo <nd at perlig.de> wrote:
> * Adam Olsen wrote:
>
>> On Thu, Dec 4, 2008 at 2:09 PM, André Malo <nd at perlig.de> wrote:
>
>> > Here's an example which will become popular soon, I guess: CGI scripts
>> > and, of course WSGI applications. All those get their environment in an
>> > unknown encoding. In the worst case one can blow up the application by
>> > simply sending strange header lines over the wire. But there's more:
>> > consider running the server in C locale, then probably even a single 8
>> > bit char might break something (?).
>>
>> I think that's an argument that the framework should reencode all
>> input text into the correct system encoding before passing it on to
>> the CGI script or WSGI app.  If the framework doesn't have a clear way
>> to determine the client's encoding then it's all just gibberish
>> anyway.  A HTTP 400 or 500 range error code is appropriate here.
>
> Duh.
> See, you're already mixing different encodings and creating issues here!
> You're talking about client encoding (whatever that is) with correct system
> encoding (whatever that is, too) in the same paragraph and assume they are
> the same or compatible.

Mixing can work so long as the encoding is clearly specified and
unambiguous.  It limits your character set to a common subset of both
encodings, but that's a lesser problem.


> There are several points here:
>
> - there is no clear way to get a single client encoding for the whole HTTP
>  transaction (headers + body), because *there is none*. If the whole
>  header set matches the same encoding, it's more or less luck.

If there is no way, via official standards or de facto standards, you
should assume ascii and blow up if anything else is given.  At that
point it's meaningless garbage anyway.


> - there is no correct system encoding either. As said, I prefer running my
>  servers in C locale, so it's all ascii. In fact, it shouldn't matter. The
>  locale should not have anything to do with an application called over the
>  network.

I half agree: the network should be unaffected by the C locale.
However, using a C locale should limit you to ascii file names and
environment variables.


> - A 400 or 500 response for a header containing something like my name is
>  not appropriate.
>
> - Octets in HTTP headers are allowed. And they are what they are -
>  octets. The interpretation has to be left to the application, not the
>  framework.

If there is no clear interpretation then they're garbage.  If there is
a clear interpretation it could be done just as well in the framework,
which also lets all the apps benefit from a single implementation,
rather than trying to reimplement it for each one.


>> >> However, some pragmatism is also possible.  Many uses of PATH may
>> >> allow it to be treated as black-box bytes, rather than text.  The
>> >> minimal solution I see is to make os.getenv() and os.putenv() switch
>> >> to byte modes when given byte arguments, as os.listdir() does.  This
>> >> use case doesn't require the ability to iterate over all environment
>> >> variables, as os.environb would allow.
>> >>
>> >> I do wonder if controlling the environment given to a subprocess
>> >> requires os.environb, but it may be too obscure to really matter.
>> >
>> > IMHO, environment variables are not text. They are bytes by definition
>> > and should be treated as such.
>> > I know, there's windows having unicode enabled env vars on demand, but
>> > there's only trouble with those over there in apache's httpd (when
>> > passing them to CGI scripts, oh well...).
>>
>> Environment variables have textual names, are set via text, frequently
>
> Well, think about my example again. The friendly way to maintain them is not
> the issue. The problems arise at least when the variables are set by an
> attacker.

Maintaining them *IS* the issue.  The whole reason they're text in the
first place is to display them to and receive them back from the user.
 Otherwise we'd use meaningless serial numbers for directories or
something.

It may not seem to matter in this use case, but that's only because
they're communicated to/from the user on another system.


>> contain textual file names or paths, and my shell (bash in
>> gnome-terminal on ubuntu) lets me put unicode text in just fine.  The
>> underlying APIs may use bytes, but they're *intended* to be encoded
>> text.
>
> Yes, encoded text == bytes. No, they're intended to be c-strings. And well,
> even if we assume that they should contain text (as in encoded unicode),
> their meaning is application specific and so is the encoding (even if it's
> mixed).
>
> What I'm saying is: I don't see much use for unicode APIs for the
> environment at all, because I don't know what's in there before inspecting
> them. And apparently the only reliable way to inspect them is via a byte
> oriented API.

If you don't think your paths should contain text then please alter
your other systems to stop using japanese names.  Standardize on ascii
serial numbers or something equally meaningless.

Treating it as bytes is a bodge.  It's worth getting your use case to
"just work", but in the end it is text, and the *only* broad solution
to text is unicode.


-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Fri Dec  5 00:16:27 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 15:16:27 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041340y3bff0f68v5dd162ee58f0242d@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49384957.3030102@gmail.com>
	<aac2c7cb0812041340y3bff0f68v5dd162ee58f0242d@mail.gmail.com>
Message-ID: <493864CB.1040604@gmail.com>

Adam Olsen wrote:
> On Thu, Dec 4, 2008 at 2:19 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Toshio Kuratomi wrote:
>>> The bug report I opened suggests creating a PEP to address this issue.
>>> I think that's a good idea for whether os.listdir() and friends should
>>> be changed to raise an exception but not having any way to get at some
>>> environment variables seems like it's just a bug that needs to be
>>> addressed.  What do other people think on both these issues?
>> I'm pretty sure the discussion on this topic a while back decided that
>> where necessary Python 3 would grow parallel bytes versions of APIs
>> affected by environmental encoding issues (such as os.environb,
>> os.listdirb, os.getcwdb), but that we were OK with the idea of deferring
>> addition of those APIs until 3.1.
> 
> It looks like most of them got into 3.0.
> http://docs.python.org/3.0/library/os.html says "All functions
> accepting path or file names accept both bytes and string objects, and
> result in an object of the same type, if a path or file name is
> returned."
> 
<nod>  I'm very glad this is coming along.  Just want to make sure the
environment is also handled in 3.1.
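
The behaviour described in the quoted documentation sentence can be
sketched as follows.  os.fsencode() is used only to build a bytes path
and is an assumption of this example: it comes from later Python
releases and was not available in 3.0.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    as_text = os.listdir(tmp)                # str in, str out
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes in, bytes out

print(as_text)
print(as_bytes)
```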
> 
>> That is, this was an acknowledged limitation with a fairly
>> straightforward agreed solution, but it wasn't considered a common
>> enough issue to delay the release of 3.0 until all of those parallel
>> APIs had been implemented
> 
> Aye.  IMO it's fairly clear that os.getenv()/os.putenv() should follow
> suit in 3.1.  I'm not so sure about adding os.environb (and making
> subprocess use it), unless the OP can demonstrate they really need it.
> 
Note: subprocess currently uses the "real" environment (the raw
environment as given to the python interpreter) when it is started
without the `env` parameter.  So the question would be what people
overriding the env parameter on their own need to do.

To be non-surprising I'd think they'd want to have a way to override
just a few variables from the raw environment.  Otherwise you have to
know which variables the program you're calling relies on and make sure
that those are set or call os.getenvb() to retrieve the byte version and
add it to your copy of os.environ before passing that to subprocess.

One example of something that would be even harder to implement without
access to the os.environb dictionary would be writing a program that
wraps make.  Since make takes all the variables from the environment and
transforms them into make variables you need to pass everything from the
environment that you are not modifying into the command.
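
A sketch of the "override just a few variables" case, under loud
assumptions: it relies on os.environb and os.supports_bytes_environ,
which exist only in later Python releases on POSIX-like systems and were
not available when this was written.  child_env and EXAMPLE_VAR are
hypothetical names.

```python
import os
import subprocess
import sys


def child_env(overrides):
    """Copy the raw byte environment and apply a few byte overrides."""
    if not os.supports_bytes_environ:
        raise RuntimeError("os.environb is not available on this platform")
    env = dict(os.environb)  # every inherited variable, undecoded
    env.update(overrides)
    return env


# The value is deliberately not valid UTF-8.
env = child_env({b"EXAMPLE_VAR": b"caf\xe9"})
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environb[b'EXAMPLE_VAR'])"],
    env=env,
    capture_output=True,
    check=True,
)
print(result.stdout)
```

Everything the caller does not touch, decodable or not, reaches the
child unchanged, which is what a make wrapper would need.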

-Toshio

From martin at v.loewis.de  Fri Dec  5 00:21:50 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 00:21:50 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <gh8vh2$638$1@ger.gmane.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>	<4937B80D.9070309@gmail.com>	<gh8j4g$ol6$1@ger.gmane.org>
	<gh8vh2$638$1@ger.gmane.org>
Message-ID: <4938660E.9080809@v.loewis.de>

>> I can't find any docs built for Python 3.0 (not 3.1a0). 
> 
> The Windows installation has new 3.0 doc dated Dec 3, so it was built,
> just not posted correctly.

That doesn't mean very much. I built it on my local machine. Anybody
with subversion and python could do that; the documentation is in
subversion.

Whether or not it appears on the web site as part of the release
process is an entirely different matter. It used to be that the
doc maintainer (Fred Drake) was part of the release team and release
process. I think Georg is complaining that he is doc maintainer,
but not part of the release process.

Regards,
Martin

From martin at v.loewis.de  Fri Dec  5 00:24:26 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 00:24:26 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
Message-ID: <493866AA.7000806@v.loewis.de>

> ISTM, 3.0 is in pretty good shape.  There is nothing intrinsically wrong
> with it.

I think it has many bugs, some known before the release, but many more
yet to show up. I agree that the design is good; the implementation will
certainly improve (I deliberately didn't say "could have been better",
because it could not have been better, given the time available to the
contributors).

Regards,
Martin

From barry at python.org  Fri Dec  5 00:25:35 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 4 Dec 2008 18:25:35 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4938660E.9080809@v.loewis.de>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>	<4937B80D.9070309@gmail.com>	<gh8j4g$ol6$1@ger.gmane.org>
	<gh8vh2$638$1@ger.gmane.org> <4938660E.9080809@v.loewis.de>
Message-ID: <AF6A07DF-C986-4FDE-AE9A-7B679F78A76F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 4, 2008, at 6:21 PM, Martin v. Löwis wrote:

>>> I can't find any docs built for Python 3.0 (not 3.1a0).
>>
>> The Windows installation has new 3.0 doc dated Dec 3, so it was  
>> built,
>> just not posted correctly.
>
> That doesn't mean very much. I built it on my local machine. Anybody
> with subversion and python could do that; the documentation is in
> subversion.
>
> Whether or not it appears on the web site as part of the release
> process is an entirely different matter. It used to be that the
> doc maintainer (Fred Drake) was part of the release team and release
> process. I think Georg is complaining that he is release maintainer,
> but not part of the release process.

I've asked Georg to update PEP 101 to make his role as Documentation  
Expert explicit.  Unfortunately we only debug major releases once (or  
twice) every 18 months.  But next time, we'll get that part right for  
sure!

In the meantime, I'll make sure Georg is involved in point releases  
moving forward.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSThm8HEjvBPtnXfVAQJgGgP/eiAUroHbxvpJLT8JRpW5H+nmyU5yGGCY
NZYrU/Vm2vRPFyfDevOFErQX9Jr1LqO0x4Qgxm4PpIj3OVwM16INz98as6nONEhC
MfTjf8Pp7f5BrF7HYh1XfPqTy5qpVhAkzKrCcjUk2VT/JHzJ4wrAl+29VhDTjvrz
3SXphnxWi6w=
=dfm7
-----END PGP SIGNATURE-----

From amauryfa at gmail.com  Fri Dec  5 00:29:43 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 5 Dec 2008 00:29:43 +0100
Subject: [Python-Dev] Taint Mode in Python 3.0
In-Reply-To: <200812041836.48146.nicole@cats-muvva.net>
References: <200812041836.48146.nicole@cats-muvva.net>
Message-ID: <e27efe130812041529x72d900f8xcb62cd5d8b48bd27@mail.gmail.com>

Hello,

On Thu, Dec 4, 2008 at 19:36, Nicole King <nicole at cats-muvva.net> wrote:
> Dear All,
>
> I have published the diff for my implementation of tainted mode in Python for
> R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the
> bottom of the page. I apologise for past problems accessing this web site: I
> hope to have resolved all the issues with it.

The patch is indeed huge! It seems that every function that returns a
PyObject must be modified.
And it seems very difficult to check for its correctness.

Did you look at the PyPy project? The C code of the interpreter is
generated, and it already proposes a "Taint" option at translation
time.
http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html#taint
With only 300 lines of elegant python code...

-- 
Amaury Forgeot d'Arc

From martin at v.loewis.de  Fri Dec  5 00:31:59 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 00:31:59 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh9ho9$bnc$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org> <gh9ho9$bnc$1@ger.gmane.org>
Message-ID: <4938686F.1090508@v.loewis.de>


>> trunk ---> release26-maint
>>         \->      py3k       ---> release30-maint
>>
>>
> 
> As a side-note: this merging flow means that bugfix and feature commits
> may never be merged from trunk to py3k in one svnmerge batch.  Else,
> they cannot be separated when merging from py3k to 30-maint.

True. However, the same would be true for the merge flow

26 -> trunk -> 3.0 -> 3k

In fact, that merge flow wouldn't support merging features *at all*:
a feature added to trunk would need to flow through 3.0, which can't
accept new features.

Regards,
Martin

From fijall at gmail.com  Fri Dec  5 00:38:25 2008
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 5 Dec 2008 00:38:25 +0100
Subject: [Python-Dev] Taint Mode in Python 3.0
In-Reply-To: <e27efe130812041529x72d900f8xcb62cd5d8b48bd27@mail.gmail.com>
References: <200812041836.48146.nicole@cats-muvva.net>
	<e27efe130812041529x72d900f8xcb62cd5d8b48bd27@mail.gmail.com>
Message-ID: <693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com>

Hello,

The thing is, PyPy's taint code is broken. You need to patch not only
every place that returns a PyObject, but also every place that might
modify anything (all side effects). For example, an innocent-looking
call to addition might end up calling arbitrary Python code (and have
arbitrary side effects). The question is: how do you
approach such things?
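
To illustrate the point about addition, here is a small sketch (Sneaky
and audit_log are made-up names): the value returned by `+` is harmless,
yet the call leaks data through a side effect that tracking only return
values would miss.

```python
audit_log = []


class Sneaky:
    """An object whose `+` runs arbitrary code as a side effect."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Side effect outside the return value.
        audit_log.append(("leaked", self.value))
        # The returned object itself carries nothing from self.value.
        return 0


result = Sneaky("secret") + 1
print(result, audit_log)
```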

Cheers,
fijal

On Fri, Dec 5, 2008 at 12:29 AM, Amaury Forgeot d'Arc
<amauryfa at gmail.com> wrote:
> Hello,
>
> On Thu, Dec 4, 2008 at 19:36, Nicole King <nicole at cats-muvva.net> wrote:
>> Dear All,
>>
>> I have published the diff for my implementation of tainted mode in Python for
>> R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the
>> bottom of the page. I apologise for past problems accessing this web site: I
>> hope to have resolved all the issues with it.
>
> The patch is indeed huge! it seems that every function that returns a
> PyObject must be modified.
> And it seems very difficult to check for its correctness.
>
> Did you look at the Pypy project? The C code of the interpreter is
> generated, and it already proposes a "Taint" option at translation
> time.
> http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html#taint
> With only 300 lines of elegant python code...
>
> --
> Amaury Forgeot d'Arc

From martin at v.loewis.de  Fri Dec  5 00:39:24 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 00:39:24 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <49386A2C.60208@v.loewis.de>

> In the bug report I opened, I listed four ways to fix this along with
> the pros and cons:

I'm in favour of a different, fifth solution:

5) represent all environment variables in Unicode strings,
   including the ones that currently fail to decode.
   (then do the same to file names, then drop the byte-oriented
    file operations again)
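
One way such a representation could work, offered as a sketch and not
as the scheme proposed here: later Python releases provide a
"surrogateescape" error handler that maps each undecodable byte to a
lone surrogate code point and back.  That it matches this proposal is an
assumption.

```python
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# Undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(repr(text))
print(round_tripped == raw)
```

The resulting string is not valid for strict encoders, so code that
prints or transmits it still has to decide what to do with the escaped
characters.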

Regards,
Martin

From janzert at janzert.com  Fri Dec  5 01:20:57 2008
From: janzert at janzert.com (Janzert)
Date: Thu, 04 Dec 2008 19:20:57 -0500
Subject: [Python-Dev] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <gh9s59$f50$1@ger.gmane.org>

Martin v. Löwis wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would
> like to merge the python-3000 mailing list back into python-dev,
> and the python-3000-checkins mailing list back into python-checkins.
> The rationale is to simplify usage of the lists, and to avoid
> cross-postings.
> 
> To implement this, all subscribers of the 3000 mailing lists would
> be added to the trunk mailing lists (avoiding duplicates, of course),
> and all automated messages going to python-3000-checkins would then
> be directed to the trunk lists. The 3000 mailing lists would change
> into read-only mode (i.e. primarily leaving the archives behind).
> 
> Any objections?
> 
> Regards,
> Martin

I like the general sentiment, but I think it may be a bad idea to
automatically bring all the subscribers from the -3000 lists over to the
more general lists. For instance if someone has an address subscribed
specifically to archive the -3000 list suddenly it will begin archiving
the other. I would rather just see a final announcement to switch to the
other list and then close the list to further submissions. Let people
join the new appropriate list manually if needed.

Otherwise +1 on getting the discussion and checkins back into combined
lists.

Janzert


From fwierzbicki at gmail.com  Fri Dec  5 02:02:59 2008
From: fwierzbicki at gmail.com (Frank Wierzbicki)
Date: Thu, 4 Dec 2008 20:02:59 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
	<bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
Message-ID: <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon <brett at python.org> wrote:
> On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki <fwierzbicki at gmail.com> wrote:
>> On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling <amk at amk.ca> wrote:
>>> 14:00 - 15:30
>>> =============
>>>
>>> Two tracks:
>>>
>>> Cross-implementation issues:
>>>
>>>  What do the various VMs want/need from CPython to help with their
>>>  implementations?
>>>
>>>  * Marking CPython-specific tests in the test suite?
>>>  * Getting an implementation agnostic test suite for the Python language?
>>>  * Separating the language tests and the pure Python part of the stdlib into
>>>    a separate project?  (Or publish them as a separate package.)
>>>  * Transition plans for 3.0?
>>>
>>>  Champion needed.
>> I would like to champion this one.
>>
>
> I told AMK this a while back, but might as well make it more public; I
> am up for chairing as well.
Brett,

Are you saying you've already called the cross-implementation champion
role?  If so I'm happy to defer or co-chair.

-Frank

From tjreedy at udel.edu  Fri Dec  5 02:16:55 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 20:16:55 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49385EED.9040004@gmail.com>
References: <4938374B.8000006@gmail.com> <gh9jdk$i48$1@ger.gmane.org>
	<49385EED.9040004@gmail.com>
Message-ID: <gh9vdu$n0u$1@ger.gmane.org>

Toshio Kuratomi wrote:
>
>> I would think life would be ultimately easier if either the file server
>> or the shell server automatically translated file names from jis and
>> utf8 and back, so that the PATH on the *nix shell server is entirely
>> utf8.
> 
> This is not possible because no part of the computer knows what the
> encoding is.  To the computer, it's just a sequence of bytes.  Unlike
> xml or the windows filesystem (winfs? ntfs?) where the encoding is
> specified as part of the document/filesystem there's nothing to tell
> what encoding the filenames are in.

I thought you said that the file server keeps all filenames in shift-jis, 
and the shell server all in utf-8.  If so, then the shell server could 
know if it were told so.
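
If the source encoding really is known out of band, the translation is
mechanical.  A minimal sketch, assuming the file server's names are
Shift-JIS and the shell server wants UTF-8 (transcode_name is a
hypothetical helper, and the sample name is invented):

```python
def transcode_name(raw: bytes, source="shift_jis", target="utf-8") -> bytes:
    """Re-encode a file name from the declared source encoding to the target."""
    return raw.decode(source).encode(target)


# "\u65e5\u672c\u8a9e" is the three-character word for "Japanese (language)".
sjis_name = "\u65e5\u672c\u8a9e.txt".encode("shift_jis")
utf8_name = transcode_name(sjis_name)
print(utf8_name)
```

The hard part raised earlier in the thread is the declaration itself:
nothing in the bytes says which encoding they are in.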


From python at rcn.com  Fri Dec  5 02:29:31 2008
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 4 Dec 2008 17:29:31 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
Message-ID: <F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>

> 2008/12/4 Raymond Hettinger <python at rcn.com>:
>> Also, we don't know the timing of the third-party updates.
>> Some may never get converted.  Some may convert quickly
>> and easily.  Someone (perhaps me) may organize a series of
>> funded sprints to get many of the major packages converted.

From: "Paul Moore" <p.f.moore at gmail.com>
> One piece of encouraging news I heard today is that mod_wsgi
> apparently works with 3.0 already - which may well mean that more web
> software than I'd originally anticipated will work sooner rather than
> later.
> 
> But it's certainly true that Python (all versions, not just 3.0) is
> more of an ecosystem than just the CPython core. "Batteries included"
> notwithstanding. And it'll take longer for the 3.0 ecosystem to grow
> than the 2.6 one.

Here's a bright idea.  On the 3.0 release page, include a box listing
which major third-party apps have been converted.  Update it
once every couple of weeks.  That way, we're not explicitly
discouraging adoption of 3.0, we're just listing what support is
then currently available (if you need twisted and it's not on the list,
then that would be your guide).


Raymond


From tjreedy at udel.edu  Fri Dec  5 02:33:56 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 20:33:56 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org><20081204123750.GA890@amk.local><6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1><B2649D21-0D63-4598-B134-987B37549146@python.org><79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
Message-ID: <gha0dr$pkl$1@ger.gmane.org>

Raymond Hettinger wrote:
> From: "A.M. Kuchling" <amk at amk.ca>
>> Perhaps the statement could say something like "we do not expect
>> most Python packages will be ported to the 3.x series until around the 
>> time 3.1 is released in X months."  (where X=12?  6?)
> 
> I would leave out any discussion of 3.1.  Its content and release date
> have nothing to do with when third party modules get updated.
> 
> Also, we don't know the timing of the third-party updates.
> Some may never get converted.  Some may convert quickly
> and easily.  Someone (perhaps me) may organize a series of
> funded sprints to get many of the major packages converted.
> 
> It would be better to simply say that at the present time,
> most important third-party modules do not yet support 3.0.
> 
> FWIW, my father is a Python newbie and I'm pointing him
> to 3.0 because it will be easier to learn than 2.6's hodgepodge
> of new and old features.  The 3.0 environment is much cleaner.

I agree with all 4 points, especially the last. I think newcomers should 
be informed of the +/- of different versions and then choose for 
themselves.  For full battery availability, 2.5 is it and will be for 
some months.  For a fresh start without need of extras, 3.0 wins in my 
experience so far.

tjr


From tjreedy at udel.edu  Fri Dec  5 02:36:41 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 20:36:41 -0500
Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final
In-Reply-To: <4938467B.40806@gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<ca471dc20812031819l28ed7463n955267b935602c3@mail.gmail.com>	<gh91il$f0m$1@ger.gmane.org>
	<4938467B.40806@gmail.com>
Message-ID: <gha0iv$pkl$2@ger.gmane.org>

Nick Coghlan wrote:
> Terry Reedy wrote:
>> and this could give some people a mis-impression, most likely negative,
>> as to the magnitude and nature of the change.  Most of the code I am now
>> writing would, I believe, run with 2.5 except for print(..., file=xxx).
>>  And I know that there was concern for backward compatibility to the
>> point that some changes were rejected (renaming builtins) or delayed
>> (deleting duplicate test asserts) for that reason.  So I would soften
>> the statements to "... version of the language that is partially
>> incompatible with... " and "were made without being bound by backward
>> compatibility,"
> 
> I would agree with Terry - while there are backwards incompatibilities,
> they aren't gratuitous.
> 
> Then again, Guido does seem to want to discourage people from trying to
> target the common subset of the two languages instead of using 2to3 as a
> compilation step from the python3 version.

I am not restricting myself to that subset.  There simply is an 
unchanged core that happens to include what I currently need (except 
print, which is isolated in one module).  I might need 'except x as y:' 
someday and will use it if I do, but so far 'except x:' is enough (and 
backward compatible).
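
For readers who have not seen the two constructs mentioned, a small
sketch in 3.0 syntax (parse_int and the messages are invented for the
example):

```python
import io


def parse_int(text, log):
    """Parse an integer, reporting failures to the `log` stream."""
    try:
        return int(text)
    except ValueError as exc:  # the 'except x as y:' form
        # print() is a function in 3.0 and takes a file= keyword.
        print("bad input:", exc, file=log)
        return None


log = io.StringIO()
print(parse_int("42", log), parse_int("forty-two", log))
print(log.getvalue(), end="")
```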

tjr


From foom at fuhm.net  Fri Dec  5 02:14:40 2008
From: foom at fuhm.net (James Y Knight)
Date: Thu, 4 Dec 2008 20:14:40 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49386A2C.60208@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
Message-ID: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>

On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
> I'm in favour of a different, fifth solution:
>
> 5) represent all environment variables in Unicode strings,
>   including the ones that currently fail to decode.
>   (then do the same to file names, then drop the byte-oriented
>    file operations again)

Yay, maybe we can have this whole discussion all over again!

Let's bring out all the same arguments, come to no conclusion, and let  
it taper off unresolved, yet again! :)

FWIW, I still agree with Martin that that's the most reasonable  
solution.

James

From tjreedy at udel.edu  Fri Dec  5 03:08:03 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 21:08:03 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
Message-ID: <gha2dq$u0m$1@ger.gmane.org>

James Y Knight wrote:
> On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
>> I'm in favour of a different, fifth solution:
>>
>> 5) represent all environment variables in Unicode strings,
>>   including the ones that currently fail to decode.
>>   (then do the same to file names, then drop the byte-oriented
>>    file operations again)
> 
> Yay, maybe we can have this whole discussion all over again!
> 
> Let's bring out all the same arguments, come to no conclusion, and let 
> it taper off unresolved, yet again! :)

My impression was that there was not enough time to do something like 
that for the soon-to-be-released 3.0, so it was deferred.  Now or soon 
is the time to reconsider.

> FWIW, I still agree with Martin that that's the most reasonable solution.

FWIW2, I have much the same feeling.


From rhamph at gmail.com  Fri Dec  5 03:32:22 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 19:32:22 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
Message-ID: <aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>

On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote:
> On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
>>
>> I'm in favour of a different, fifth solution:
>>
>> 5) represent all environment variables in Unicode strings,
>>  including the ones that currently fail to decode.
>>  (then do the same to file names, then drop the byte-oriented
>>   file operations again)
>
> Yay, maybe we can have this whole discussion all over again!
>
> Let's bring out all the same arguments, come to no conclusion, and let it
> taper off unresolved, yet again! :)
>
> FWIW, I still agree with Martin that that's the most reasonable solution.

It died because nobody presented a viable solution, and I maintain no
solution is possible.  All suggestions involve arbitrary
transformations that fail to round trip correctly at some point or
another.  They're simply about shuffling the failure around to
somewhere the poster happens to like.
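
A sketch of the simplest kind of round-trip failure, assuming the
transformation is Python's "replace" error handler (my choice for
illustration; the earlier proposals are not spelled out in this
message): the invalid byte is turned into U+FFFD and the original value
cannot be recovered.

```python
raw = b"caf\xe9"  # not valid UTF-8

# The invalid byte becomes U+FFFD, the replacement character.
lossy = raw.decode("utf-8", errors="replace")
back = lossy.encode("utf-8")

print(repr(lossy))
print(back == raw)
```

This shows only that one naive mapping loses information; it does not
by itself settle whether every possible mapping must fail somewhere.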

Please, if you have a *new* idea that doesn't have a failure mode, by
all means post it.  But don't resurrect a pointless bikeshed.


-- 
Adam Olsen, aka Rhamphoryncus

From amk at amk.ca  Fri Dec  5 03:35:14 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Thu, 4 Dec 2008 21:35:14 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
Message-ID: <20081205023514.GA1723@amk.local>

On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote:
> Here's a bright idea.  On the 3.0 release page, include a box listing
> which major third-party apps have been converted.  Update it
> once every couple of weeks.  That way, we're not explicitly

That's an excellent idea.  We could have a webpage, or start a
topic-specific weblog for posting announcements.

I've started a draft of a 3.0 FAQ in the wiki at
<http://wiki.python.org/moin/Python3000/FAQ>.  Once it's finished we
can move it into the 3.0 release pages.  Everyone please edit and
improve it!

--amk

From dinov at microsoft.com  Fri Dec  5 04:24:08 2008
From: dinov at microsoft.com (Dino Viehland)
Date: Thu, 4 Dec 2008 19:24:08 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
Message-ID: <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com>

Does anyone know what Mono does here?  Presumably they have the exact same
problem as all strings in .NET are Unicode, and filenames/env vars/etc...
are always strings.

Maybe if it's gotta be broken at least it can be broken in a manner
that's consistent with others :)

> -----Original Message-----
> From: python-dev-bounces+dinov=microsoft.com at python.org [mailto:python-
> dev-bounces+dinov=microsoft.com at python.org] On Behalf Of Adam Olsen
> Sent: Thursday, December 04, 2008 6:32 PM
> To: James Y Knight
> Cc: "Martin v. Löwis"; python-dev List
> Subject: Re: [Python-Dev] Python-3.0, unicode, and os.environ
>
> On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote:
> > On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
> >>
> >> I'm in favour of a different, fifth solution:
> >>
> >> 5) represent all environment variables in Unicode strings,
> >>  including the ones that currently fail to decode.
> >>  (then do the same to file names, then drop the byte-oriented
> >>   file operations again)
> >
> > Yay, maybe we can have this whole discussion all over again!
> >
> > Let's bring out all the same arguments, come to no conclusion, and
> let it
> > taper off unresolved, yet again! :)
> >
> > FWIW, I still agree with Martin that that's the most reasonable
> solution.
>
> It died because nobody presented a viable solution, and I maintain no
> solution is possible.  All suggestions involve arbitrary
> transformations that fail to round trip correctly at some point or
> another.  They're simply about shuffling the failure around to
> somewhere the poster happens to like.
>
> Please, if you have a *new* idea that doesn't have a failure mode, by
> all means post it.  But don't resurrect a pointless bikeshed.
>
>
> --
> Adam Olsen, aka Rhamphoryncus


From rhamph at gmail.com  Fri Dec  5 04:47:22 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 20:47:22 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com>
Message-ID: <aac2c7cb0812041947p47324ffav3c86e661905aa8d1@mail.gmail.com>

On Thu, Dec 4, 2008 at 8:24 PM, Dino Viehland <dinov at microsoft.com> wrote:
> Does anyone know what Mono does here?  Presumably they have the exact same
> problem as all strings in .NET are Unicode, and filenames/env vars/etc...
> are always strings.
>
> Maybe if it's gotta be broken at least it can be broken in a manner
> that's consistent with others :)

Many of the Windows APIs use UTF-16 without validating it.  They'll
pass through invalid strings until they hit something that does
validate, at which point it'll blow up.
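
As a rough illustration of "passes through until something validates",
here is a small Python sketch.  The sample bytes and the helper name
is_valid_utf16le are made up for the example; it only exercises the
standard codecs and makes no claim about how any particular Windows API
behaves.

```python
# A lone high surrogate (U+D800) followed by "A", as UTF-16-LE code units.
# This stands in for ill-formed data that an unvalidated API might return.
RAW = b"\x00\xd8A\x00"


def is_valid_utf16le(data):
    """Return True if data decodes as well-formed UTF-16-LE."""
    try:
        data.decode("utf-16-le")  # strict decoding checks surrogate pairing
    except UnicodeDecodeError:
        return False
    return True


# A layer that does not validate can carry the code units along unchanged:
passed_through = RAW.decode("utf-16-le", errors="surrogatepass")
round_tripped = passed_through.encode("utf-16-le", errors="surrogatepass")
```

The strict decode is the point where things "blow up"; the surrogatepass
round trip shows the same data surviving as long as nothing checks it.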

I suspect that it doesn't happen very often in practice, as having
only one encoding makes it quite clear that it's a broken file name,
not a mixed encoding environment.


-- 
Adam Olsen, aka Rhamphoryncus

From glyph at divmod.com  Fri Dec  5 04:52:36 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 03:52:36 -0000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <gha2dq$u0m$1@ger.gmane.org>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<gha2dq$u0m$1@ger.gmane.org>
Message-ID: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>

On 02:08 am, tjreedy at udel.edu wrote:
>James Y Knight wrote:
>>On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
>>>I'm in favour of a different, fifth solution:
>>>
>>>5) represent all environment variables in Unicode strings,
>>>   including the ones that currently fail to decode.
>>>   (then do the same to file names, then drop the byte-oriented
>>>    file operations again)

>>FWIW, I still agree with Martin that that's the most reasonable 
>>solution.
>
>FWIW2, I have much the same feeling.

And I still disagree, but I re-read the old thread and didn't see much 
of a clear argument there, so at least I'm not re-treading old ground 
:).

The only strategy that would allow us to encode all inputs as unicode 
(including the invalid ones) is to abuse NUL to mean "ha ha, this isn't 
actually a unicode string, it's something I couldn't decode".  This is 
nice because it allows the type of the returned value to be the same, so 
a Python program that expects a unicode object will be able to 
manipulate this object (as long as it doesn't split it up too close to a 
NUL).

It seems to me that this convenient, but clever-clever type distinction 
will inevitably be a bug magnet.  For the most basic example, see the 
caveat above.  But more realistically - not too much code splits 
filenames on anything but "." or os.sep, after all - if you pass this to 
an extension module which then wants to invoke a C library function 
which passes the file name to open() and friends, what is the right 
thing for the extension module to do?  There would need to be a new API 
which could get the "right" bytes out of a unicode string which 
potentially has NULs in it.  This can't just be an encoding, either, 
because you might need to get the Shift-JIS bytes (some foreign system's 
encoding) for some got-NULs-in-it filename even though your locale says 
the encoding is UTF-8.  And what if those bytes happen to be valid 
Shift-JIS?  Decoding bytes makes a lot more sense to me than transcoding 
strings.

Filenames and environment variables would all need to be encoded or 
decoded according to this magic encoding.  And what happens if you get 
some garbage data from elsewhere and pass it to a function that 
*generates* a filename?  Now, you get a pleasant error message, 
"TypeError: file() argument 1 must be (encoded string without NULL 
bytes), not str".  In the future, I can only assume (if you're lucky) 
that you'll get some weird thing out of the guts of an encoding module; 
or, more likely, some crazy mojibake filename containing PUA code points 
or whatever will silently get opened.  You can make this less likely 
(and harder to debug in the odd cases where it does happen) by making 
the encoding more clever, but eventually your luck will run out: most 
likely on somebody's computer who doesn't speak English well enough to 
report the problem clearly.

The scenario gets progressively more nightmarish as you start putting 
more libraries into the mix.  You pass some environment variable into 
some library which knows all about unicode and happily handles it 
correctly, but a second library which doesn't know about this proposed 
tricky NUL convention gets the unicode filename and transcodes it 
literally, causing an error return from open().  This puts the apparent 
error very far away from the responsible code.

Ultimately it makes sense to expose the underlying bytes as bytes 
without forcing everyone to pretend that they make sense as anything but 
bytes, and allow different applications to make appropriately educated 
guesses about their character format.  In any case, programmers who 
don't know about these kinds of issues are going to make mistakes in 
handling invalid filenames on UNIXy systems, and some users won't be 
able to open some files.  If there is an easy and straightforward way to 
get the bytes out, it's more likely that programmers who know what they 
are doing will be able to get the correct behavior.

Of course, the NUL-encoding trick will make it *possible* to do the 
right thing, but our hypothetically savvy programmer now needs to learn 
about the bytes/unicode distinction between 
windows/mac+linux+everythingelse, and Python's special convention for 
invalid data, and how to mix it with encoding/decoding/transcoding, 
rather than just Python's distinct API for the distinct types that may 
represent a filename.  I think this is significantly harder to document 
than just having two parallel APIs (environ, environb, open(str), 
open(bytes), listdir(str), listdir(bytes)) to reflect the very subtle, 
but nevertheless very real, distinction between the Windows and UNIX 
worlds.

This distinct API can still provide the same illusion of "it usually 
works" portability that the encoding convention can: for Windows, 
environb can be the representation of the environment in a particular 
encoding; for UNIX, environ(u) can be all of the variables which 
correctly decode.  And so on for each other API.
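
A minimal sketch of what the parallel str/bytes API looks like in
practice, assuming a current Python 3 where os.listdir() returns names
in the same type as its argument.  The file name "example.txt" is
invented for the example, and os.supports_bytes_environ is used only to
report whether a bytes view of the environment (os.environb) is
available on the running platform.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so the listing is not empty.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    # str in, str out: names are decoded for the caller.
    names_as_str = os.listdir(tmp)

    # bytes in, bytes out: names come back exactly as the OS stores them.
    names_as_bytes = os.listdir(os.fsencode(tmp))

# True where the platform's native environment type is bytes.
has_bytes_environ = os.supports_bytes_environ
```

Code that only cares about well-formed names can stay on the str side;
code that must handle every name byte-for-byte can opt into the bytes
side without learning a special escaping convention.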

At least this time I think I've encapsulated pretty much my entire 
argument here, so if you don't buy it, we can probably just agree to 
disagree :).

From glyph at divmod.com  Fri Dec  5 04:55:50 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 03:55:50 -0000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
Message-ID: <20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com>


On 02:32 am, rhamph at gmail.com wrote:
>On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote:

>>FWIW, I still agree with Martin that that's the most reasonable 
>>solution.
>
>It died because nobody presented a viable solution, and I maintain no
>solution is possible.  All suggestions involve arbitrary
>transformations that fail to round trip correctly at some point or
>another.  They're simply about shuffling the failure around to
>somewhere the poster happens to like.
>
>Please, if you have a *new* idea that doesn't have a failure mode, by
>all means post it.  But don't resurrect a pointless bikeshed.

Despite my objection to the funny-encoding strategy (which I've 
documented thoroughly in my other message to this thread) this isn't 
accurate.  The PUA solution doesn't work, but using NUL does.  This was 
proposed last time, as a copy of what Mono does.  You can't get a NUL in 
os.environ or a filename; it's not valid.  So, it works fine as an 
escape character.  It can round-trip perfectly.
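
To make the round-trip claim concrete, here is a minimal sketch of one
possible NUL-escape scheme.  It is an assumption made up for
illustration, not the scheme Mono uses and not anything proposed in
detail in this thread: undecodable input is marked with a leading NUL
and its bytes are carried one-to-one as Latin-1 code points.  The
function names are hypothetical, and the input is assumed to contain no
NUL bytes (which holds for filenames and environment variables).

```python
def decode_with_nul_escape(raw, encoding="utf-8"):
    """Decode raw bytes; mark undecodable input with a leading NUL."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to exactly one code point, so no
        # information is lost in the escaped form.
        return "\0" + raw.decode("latin-1")


def encode_with_nul_escape(text, encoding="utf-8"):
    """Inverse of decode_with_nul_escape."""
    if text.startswith("\0"):
        return text[1:].encode("latin-1")
    return text.encode(encoding)


bad = b"caf\xe9.txt"                      # Latin-1 bytes, invalid as UTF-8
escaped = decode_with_nul_escape(bad)
restored = encode_with_nul_escape(escaped)

# A component that does not know the convention just encodes the string.
naive = escaped.encode("utf-8")
```

The pair of functions round-trips, but the naive encoding at the end
yields different bytes from the original name, which is the kind of
failure my other message describes.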

From glyph at divmod.com  Fri Dec  5 04:59:42 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 03:59:42 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205023514.GA1723@amk.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
Message-ID: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>


On 02:35 am, amk at amk.ca wrote:
>On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote:
>>Here's a bright idea.  On the 3.0 release page, include a box listing
>>which major third-party apps have been converted.  Update it
>>once every couple of weeks.  That way, we're not explicitly
>
>That's an excellent idea.  We could have a webpage, or start a
>topic-specific weblog for posting announcements.
>
>I've started a draft of a 3.0 FAQ in the wiki at
><http://wiki.python.org/moin/Python3000/FAQ>.  Once it's finished we
>can move it into the 3.0 release pages.  Everyone please edit and
>improve it!

It occurs to me that this specific idea (the box with the list of 
supported applications / libraries) should be implementable as a simple 
query against PyPI.  I don't know if it actually is :), but it should 
be.  In general it would be nice to know whether one's favorite tools 
were available for *any* new Python version.

From fdrake at acm.org  Fri Dec  5 05:15:06 2008
From: fdrake at acm.org (Fred Drake)
Date: Thu, 04 Dec 2008 23:15:06 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
Message-ID: <C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>

On Dec 4, 2008, at 10:59 PM, glyph at divmod.com wrote:
> It occurs to me that this specific idea (the box with the list of  
> supported applications / libraries) should be implementable as a  
> simple query against PyPI.  I don't know if it actually is :), but  
> it should be.  In general it would be nice to know whether one's  
> favorite tools were available for *any* new Python version.


I agree, this would be ideal.  I'm not sure the metadata is there to  
support it, though.

Each (version of each) package would need to register metadata  
recording which versions of Python it's known to be compatible with  
("has been tested with").  I'd love for this to be available, and  
would be more proactive about testing software I've been involved in  
releasing against more Python versions.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From guido at python.org  Fri Dec  5 05:16:45 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Dec 2008 20:16:45 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
Message-ID: <ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>

I hear some folks are considering advertising 3.0 as experimental or
not ready for serious use yet.

I think that's too negative -- we should encourage people to use it,
period. They'll have to decide for themselves whether they can live
with the lack of ported 3rd party libraries -- which may resolve
itself soon enough. We should make it clear that it's perfectly fine
to stick with 2.6, but at the same time encourage people to try 3.0
and see for themselves -- IMO it's as solid as 2.6. (2.6.1 being more
solid, of course, as will be 3.0.1).

Especially from the education front I've heard a lot of positive
noises about 3.0. See e.g. an early review, posted 8 months ago:
http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html.

I also want to remind folks that I've promised left and right that
post-3.0 we'll stick to the same backwards compatibility strategy that
we used for the 2.x series. No new incompatibilities. No new features
in 3.0.1 etc.; those go in 3.1, 3.2, etc.

The only compromise I'd be willing to make is that 3.1 can be rather
sooner than the typical 18-24 months cycle. But any API that exists in
3.0 will have to take the regular deprecation route, and if we start
having releases close together we should be careful to measure the
deprecation time in years, not releases.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Fri Dec  5 05:42:10 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 21:42:10 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com>
Message-ID: <aac2c7cb0812042042y237a8f1bm540507b17ca29341@mail.gmail.com>

On Thu, Dec 4, 2008 at 8:55 PM,  <glyph at divmod.com> wrote:
>
> On 02:32 am, rhamph at gmail.com wrote:
>>
>> On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote:
>
>>> FWIW, I still agree with Martin that that's the most reasonable solution.
>>
>> It died because nobody presented a viable solution, and I maintain no
>> solution is possible.  All suggestions involve arbitrary
>> transformations that fail to round trip correctly at some point or
>> another.  They're simply about shuffling the failure around to
>> somewhere the poster happens to like.
>>
>> Please, if you have a *new* idea that doesn't have a failure mode, by
>> all means post it.  But don't resurrect a pointless bikeshed.
>
> Despite my objection to the funny-encoding strategy (which I've documented
> thoroughly in my other message to this thread) this isn't accurate.  The PUA
> solution doesn't work, but using NUL does.  This was proposed last time, as
> a copy of what Mono does.  You can't get a NUL in os.environ or a filename;
> it's not valid.  So, it works fine as an escape character.  It can
> round-trip perfectly.

The failure is more subtle, in that a path from the filesystem cannot
round trip via a different return path.  i.e. list the dir via python,
pass it to an external lib to open.

If you don't need that to work it's quite easy to explicitly encode
byte strings into text for storage or whatever.


-- 
Adam Olsen, aka Rhamphoryncus

From barry at python.org  Fri Dec  5 06:07:53 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 5 Dec 2008 00:07:53 -0500
Subject: [Python-Dev] RELEASED Python 2.6.1
Message-ID: <6898A62C-3BA0-4EF1-BDB5-07B2961BF026@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hot on the heels of Python 3.0 comes the Python 2.6.1 bug-fix  
release.  This is the latest production-ready version in the Python  
2.6 family.  Dozens of issues have been fixed since Python 2.6 final was  
released in October.  Please see the NEWS file for details:

     http://www.python.org/download/releases/2.6.1/NEWS.txt

For more information on Python 2.6 please see

     http://docs.python.org/dev/whatsnew/2.6.html

Source tarballs and Windows installers can be downloaded from the  
Python 2.6.1 page:

    http://www.python.org/download/releases/2.6.1/

Bugs can be reported in the Python bug tracker:

    http://bugs.python.org

Enjoy,
- -Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTi3KnEjvBPtnXfVAQLhQAP7BR8eqlVLDlu/bp2tGaRRQS8GW5X8KQQk
h0RwCcAKK19WH6YS6zH+VoIpD8LnD37YqZL3m5MQZ/rDf0o3e6152CZ6GJvWE+0i
6w0cSvDqdWuOpfUfpYR21eQnoFuC6x/yfI//yWCnu8bZCypjmJCLKZAvu4pMjYgD
ceChg4lLE68=
=u/iW
-----END PGP SIGNATURE-----

From guido at python.org  Fri Dec  5 06:14:39 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Dec 2008 21:14:39 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
Message-ID: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>

>> On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote:
>>> I'm in favour of a different, fifth solution:
>>>
>>> 5) represent all environment variables in Unicode strings,
>>>  including the ones that currently fail to decode.
>>>  (then do the same to file names, then drop the byte-oriented
>>>   file operations again)

> On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight <foom at fuhm.net> wrote:
[...]
>> FWIW, I still agree with Martin that that's the most reasonable solution.

On Thu, Dec 4, 2008 at 6:32 PM, Adam Olsen <rhamph at gmail.com> wrote:
> It died because nobody presented a viable solution, and I maintain no
> solution is possible.  All suggestions involve arbitrary
> transformations that fail to round trip correctly at some point or
> another.  They're simply about shuffling the failure around to
> somewhere the poster happens to like.
>
> Please, if you have a *new* idea that doesn't have a failure mode, by
> all means post it.  But don't resurrect a pointless bikeshed.

I don't like Martin's solution at all. Glyph's message nails the
problem -- the "funny encoding" solution breaks as soon as filenames
get passed to other components, and as that's what Python is often all
about, it's likely to happen all the time.

The simplest example I can think of is a program that prints a
directory listing to stdout -- printing the "funny" encoding to stdout
isn't going to be what users expect. So the program has to be aware of
the possibility of "funny" encoded filenames, and the roundtripping
isn't useful at all.

At the risk of bringing up something that was already rejected, let me
propose something that follows the path taken in 3.0 for filenames,
rather than doubling back:

For os.environ, os.getenv() and os.putenv(), I think a similar
approach as used for os.listdir() and os.getcwd() makes sense: let
os.environ skip variables whose name or value is undecodable, and have
a separate os.environb() which contains bytes; let os.getenv() and
os.putenv() do the right thing when the arguments passed in are bytes.
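
A minimal sketch of the "skip what doesn't decode" behaviour, assuming
the raw environment is available as a mapping of bytes to bytes.  The
helper name decode_environ and the sample variables are hypothetical;
this is not the actual os implementation.

```python
def decode_environ(raw_env, encoding="utf-8"):
    """Build a str-keyed environment, skipping undecodable entries."""
    text_env = {}
    for name, value in raw_env.items():
        try:
            decoded_name = name.decode(encoding)
            decoded_value = value.decode(encoding)
        except UnicodeDecodeError:
            # Leave this variable out of the text view; it remains
            # reachable through the bytes mapping.
            continue
        text_env[decoded_name] = decoded_value
    return text_env


raw_env = {b"HOME": b"/home/user", b"BROKEN": b"\xff\xfe"}
text_env = decode_environ(raw_env)
```

The bytes mapping stays complete, so nothing is lost; the text mapping
simply shows the subset that is safe to treat as strings.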

For sys.argv, because it's positional, you can't skip undecodable
values, so I propose to use errors='replace' for the decoding; again, we
can add sys.argvb that contains the raw bytes values. The various
os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
and the subprocess module) should all accept bytes as well as strings.
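
A small sketch of the sys.argv idea, assuming the raw arguments are
available as a list of bytes.  The helper name decode_argv and the
sample arguments are made up for the example.

```python
def decode_argv(raw_argv, encoding="utf-8"):
    """Decode every argument, substituting U+FFFD for undecodable bytes."""
    return [arg.decode(encoding, errors="replace") for arg in raw_argv]


raw_argv = [b"prog", b"caf\xe9", b"ok"]   # second argument is not UTF-8
argv = decode_argv(raw_argv)
```

Every position is preserved, so argv[i] still corresponds to
raw_argv[i]; a program that needs the exact bytes can fall back to the
raw list.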

On Windows, the bytes APIs should probably not exist.

I predict that most developers can get away with not using the bytes
APIs at all. The small minority that needs to be robust if not all
filenames use the system encoding can use the bytes APIs. This would
be developers on various Unix systems except OSX (which uses UTF-8 for
its filesystems), and perhaps the occasional developer on OSX whose
app needs to work with files on mounted filesystems that use a
different encoding.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From glyph at divmod.com  Fri Dec  5 06:40:46 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 05:40:46 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
Message-ID: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>


On 4 Dec, 07:12 pm, python at rcn.com wrote:
>The latter statement worries me.  It seems to unnecessarily undermine
>adoption of 3.0.  It essentially says, "don't use this".  Is that what 
>we want?

I think so.  The default case, the case of the user without the 
wherewithal to understand the nuances of the distinction between 2.x and 
3.x, is a user who should use 2.x.  If the user understands what's going 
on, they're not going to pay attention to such a notice anyway.  I think 
Barry did a great job phrasing this; the language in this comment has to 
be strong enough to counter the prevailing wisdom that "higher version 
number = better".  I think it did that without being overly negative.

For most users, especially new users who have yet to be impressed with 
Python's power, 2.x is much better.  It's not like "library support" is 
one small check-box on the language's feature sheet: most of the 
attractive things about Python are libraries.  Of course I am not free 
from bias, being the author of many libraries myself, but it was other 
libraries that drew me to Python in the first place.

If you're writing an application with 2.x, you get GTK, Qt, PyGame, PIL, 
NumPy, and of course the wonderful Twisted.  With 3.0, you get... 
Tkinter, and ... pywin32, I guess, although I can't find the download on 
sourceforge?  A fork of django that "just barely works"?  A "half 
broken" email module in the stdlib?  All things which you can *also* get 
on 2.x, modulo the "barely works" and "half broken".

If you're writing a library, even if you intend to support py3 as a 
platform on day one, you could reach a much wider audience by simply 
writing in 2to3-friendly style and releasing 2.x source.  Writing a 
3.x-only library will artificially limit your audience and make it much 
harder to combine your library with *other* useful Python libraries 
which have not yet been ported.  There's no 3to2 yet, and maybe there 
never will be.  ("py3to2" looks like an interesting project, but seems 
to be misleadingly named, since I don't think it will help you run your 
3.x-source programs on a stock 2.x VM).

The third (albeit much less likely) option is that you're learning 
Python to learn to interact with a system that's scriptable in embedded 
Python, like Blender or Gimp.  I don't think there's a single system of 
that variety which uses 3.0 yet, and these will likely be even slower to 
move than libraries.  So if the user downloads Python 3 and the 
accompanying tutorial they're likely to be confused when they try to use 
their newly-acquired knowledge to script the tool in question.

Of course, in the long term, maintenance for 2.x is going away and we 
are all being gently herded to 3.x.  Aren't the things I just talked 
about the reason for the continued maintenance of 2.x, though?

It makes sense to talk about 3.1 and beyond, because that points to some 
continuity with 3.0.  It doesn't make sense to say "don't use it", but 
it does make sense to say "use it to get ready for the eventual 
direction of the language".  For example, my experience so far suggests 
that the only motion on Twisted towards 3.x during the 3.0.x/2.6.x cycle 
will be us reporting bugs in 2to3 and in the new version of the stdlib. 
3.1 is likely to be the first version we could realistically target.  I 
am sure that many other libraries are in a similar situation, since 2to3 
has not yet been exposed to a wide variety of ugly, real-world code, and 
nobody's maintaining an #ifdef'd up C extension module yet.  By the time 
3.1 rolls around, we will all know how this migration strategy is really 
working out, and will be able to predict the likely migration timetable 
for various libraries with some degree of accuracy.

From rhamph at gmail.com  Fri Dec  5 06:46:14 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 4 Dec 2008 22:46:14 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
Message-ID: <aac2c7cb0812042146lcea6107v7f5a48a29bdd1891@mail.gmail.com>

On Thu, Dec 4, 2008 at 10:14 PM, Guido van Rossum <guido at python.org> wrote:
> At the risk of bringing up something that was already rejected, let me
> propose something that follows the path taken in 3.0 for filenames,
> rather than doubling back:
>
> For os.environ, os.getenv() and os.putenv(), I think a similar
> approach as used for os.listdir() and os.getcwd() makes sense: let
> os.environ skip variables whose name or value is undecodable, and have
> a separate os.environb() which contains bytes; let os.getenv() and
> os.putenv() do the right thing when the arguments passed in are bytes.

+1 (as that's what I suggested)


> For sys.argv, because it's positional, you can't skip undecodable
> values, so I propose to use errors='replace' for the decoding; again, we
> can add sys.argvb that contains the raw bytes values. The various
> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
> and the subprocess module) should all accept bytes as well as strings.

+1.  I wish there was a better solution to sys.argv.


> On Windows, the bytes APIs should probably not exist.

-0.  I'd prefer byte APIs return UTF-16 bytes and the unicode APIs
become validating.


> I predict that most developers can get away with not using the bytes
> APIs at all. The small minority that needs to be robust if not all
> filenames use the system encoding can use the bytes APIs. This would
> be developers on various Unix systems except OSX (which uses UTF-8 for
> its filesystems), and perhaps the occasional developer on OSX whose
> app needs to work with files on mounted filesystems that use a
> different encoding.


-- 
Adam Olsen, aka Rhamphoryncus

From guido at python.org  Fri Dec  5 07:05:05 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Dec 2008 22:05:05 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
Message-ID: <ca471dc20812042205q72fcebd1xd8e6e589c06bc3a1@mail.gmail.com>

On Thu, Dec 4, 2008 at 9:40 PM,  <glyph at divmod.com> wrote:
> The default case, the case of the user without the wherewithal
> to understand the nuances of the distinction between 2.x and 3.x, is a user
> who should use 2.x.

Not at all clear. If they're not sensitive to those nuances it's just
as likely that they're a casual developer (e.g. a student just
learning to program). Such users are unlikely to start using major 3rd
party packages like Twisted or Django, which would be completely
overwhelming to someone just learning. As shown in
http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html, Python 3.0
removes quite a few warts that are likely to trip up learners.

Once they are ready (probably under the wings of some guru) to dive
deeper, they may have to learn about 2.6 and how it differs -- that's
a useful exercise by itself, but if I'm right, most learners won't
have to go there because by the time they get to that point, the 3.0
ecosystem has matured enough to support their needs.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri Dec  5 07:06:17 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 4 Dec 2008 22:06:17 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812042146lcea6107v7f5a48a29bdd1891@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<aac2c7cb0812042146lcea6107v7f5a48a29bdd1891@mail.gmail.com>
Message-ID: <ca471dc20812042206g6faedbb0p5ea30472a722a380@mail.gmail.com>

On Thu, Dec 4, 2008 at 9:46 PM, Adam Olsen <rhamph at gmail.com> wrote:
> On Thu, Dec 4, 2008 at 10:14 PM, Guido van Rossum <guido at python.org> wrote:
>> On Windows, the bytes APIs should probably not exist.
>
> -0.  I'd prefer byte APIs return UTF-16 bytes and the unicode APIs
> become validating.

-1 on UTF-16 bytes, as this seems extremely useless and confusing to me.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Fri Dec  5 07:55:29 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 07:55:29 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
Message-ID: <4938D061.2020105@v.loewis.de>

> Let's bring out all the same arguments, come to no conclusion, and let
> it taper off unresolved, yet again! :)

This time, it will be different. I will write a PEP, and will request
that anybody proposing an alternative solution also write a PEP (and
no change is made to the code before the PEPs have been fully specified,
discussed, and a BDFL pronouncement has been made).

Regards,
Martin

From martin at v.loewis.de  Fri Dec  5 08:00:35 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 08:00:35 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>	
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
Message-ID: <4938D193.4080608@v.loewis.de>

> Please, if you have a *new* idea that doesn't have a failure mode, by
> all means post it.  But don't resurrect a pointless bikeshed.

While I completely agree that it is pointless to reiterate the same
arguments over and over, I disagree that the bikeshed metaphor applies.
This metaphor (IIUC) describes a trivial design issue that is merely
a matter of taste, rather than having deep technical implications.
Using Unicode or bytes for strings is not of that kind.

Regards,
Martin

From glyph at divmod.com  Fri Dec  5 08:27:05 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 07:27:05 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
Message-ID: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>

On 04:16 am, guido at python.org wrote:
>I hear some folks are considering advertising 3.0 as experimental or
>not ready for serious use yet.

With all due respect, for me, "library support" and "serious use" are 
synonymous.  When prompted I would say that 2.5 is probably the version 
that a new Python user should use.  It's what's already installed on 
their Mac or their Ubuntu box, and it's easiest to get libraries for. 
I've already said in my other note why I think the python website should 
say the same.

Speaking of respect, out of respect for all of you folks I have 
refrained from shouting this opinion from the rooftops.  I have avoided 
blogging about it, I've kept all my public feedback on this list, and I 
plan to continue saying nothing (elsewhere) until I have something nice 
to say.  (The occasional snide comment on IRC notwithstanding.)

That doesn't mean I'm going to tell people who have real problems to 
solve to mess around trying out 3.0, just to see if it has the library 
support that they need, when I already know that it doesn't.  Sorry, but 
community spirit only goes so far: when people ask for my 
recommendation, I'm going to tell the truth.

For example, I recently helped my sister do some work that involved 
running a Fourier transform over a large amount of data.  Doing this 
with python 2.5 took only a few minutes (numpy apparently preinstalled 
on leopard!); much faster than trying to debug the obscure errors she 
was getting out of Fortran.  Doing it with Python 3.0 would have been an 
exercise in frustration (no numpy yet at all), and even 2.6 would have 
been a pain (download, compile, install, get numpy, compile, install, 
etc etc).  If python 3.0 had for some reason *been* the preinstalled 
version, we would have needed to download 2.6 or 2.5.  For this reason I 
don't want to encourage the upstream, in this case Apple, to consider 
3.0 "ready" yet either.  2.x is still a necessity, even if they want to 
start shipping 3.0 soon.

In my experience this is an entirely typical usage of Python.  I know 
very few people who have learned the language for its own sake (and in 
fact, the two I can think of right now have long since switched to 
Haskell); it's almost always for this or that library.  In the cases 
where it is for the language itself, the conversation almost always 
begins, "Hey, I've been thinking about learning Python.  Can it do 
$TASK?".  If the answer is (as it often is) "Sure, just use Py$TASK" 
then they're immediately sold.  If not, "learn python" remains one of 
their never-done back-burner projects like "clean out the garage".  Even 
in my own case, I learned Python because it was easier to write GTK+ 
programs in than C; Java's GUI libraries having been demonstrated 
deficient, I wanted something better.  The networking stuff was a
side-effect.

Given that this is my typical experience of Python introductions (of 
which I have done quite a few), until a majority of Py$TASK for $TASKs 
that I'm interested in have been ported to py3, then even in the 
abstract, py3 remains "experimental" and "not ready for serious use".

That's not the same thing as "bad":
>IMO it's as solid as 2.6. (2.6.1 being more solid, of course, as will 
>be 3.0.1).

I have not heard anyone saying that 3.0 is flaky, broken, or "beta".  I 
certainly haven't said that, or even thought it.  Library support is 
_the_ problem.

>Especially from the education front I've heard a lot of positive
>noises about 3.0. See e.g. an early review, posted 8 months ago:
>http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html.

To be fair, if someone asked me specifically about educating
non-programmer adults about programming, I would probably at least *mention*
py3, if not recommend it outright.  The improved consistency is worth a 
lot in an educational setting.  (But, if one is educating children and 
interested in soliciting their genuine enthusiasm, whiz-bang graphics 
are really a must-have, not a negotiable extra.)

Note, however, that even this paper specifically mentions several 
libraries which must be available, or they will have to "abandon these 
examples entirely or (reluctantly) delay adoption of version 3.0".  I 
hope for Mr. Efford's sake that these libraries will all become 
available shortly.  They have all taken steps to produce 3.0-compatible 
versions.  However, none are available today, making it still a 
difficult choice to use 3 rather than 2.

From martin at v.loewis.de  Fri Dec  5 08:26:03 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 08:26:03 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<20081204123750.GA890@amk.local>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
Message-ID: <4938D78B.6010406@v.loewis.de>

> Here's a bright idea.  On the 3.0 release page, include a box listing
> which major third-party apps have been converted.  Update it
> once every couple of weeks.  That way, we're not explicitly
> discouraging adoption of 3.0, we're just listing what support is
> then currently available (if you need twisted and it's not on the list,
> then that would be your guide).

As a slight variation: that should be a wiki page (or, as AMK suggests,
a weblog). The release page should link to it.

If maintenance of this list was in the hands of a single person (the
release manager), or a few (the pydotorg editors), it would always
be outdated.

FWIW, there is also the py3 category in PyPI:

http://pypi.python.org/pypi?:action=browse&c=533

Regards,
Martin

From martin at v.loewis.de  Fri Dec  5 08:27:53 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 08:27:53 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<20081204123750.GA890@amk.local>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
Message-ID: <4938D7F9.80908@v.loewis.de>

> I agree, this would be ideal.  I'm not sure the metadata is there to
> support it, though.

There is. There have been the following trove classifiers defined for
a few weeks now:

Programming Language :: Python :: 2
Programming Language :: Python :: 2.3
Programming Language :: Python :: 2.4
Programming Language :: Python :: 2.5
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.0
Programming Language :: Python :: 3.1
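
A minimal sketch of how a tool might read these classifiers to decide
whether a package claims Python 3 support. The classifier strings come
from the list above; the sample metadata and the helper names
(declared_versions, claims_python3) are hypothetical, not part of PyPI
or distutils:

```python
# Hypothetical metadata for an imaginary project; only the classifier
# strings themselves are taken from the list above.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.6",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.0",
]

PREFIX = "Programming Language :: Python :: "


def declared_versions(classifiers):
    """Return the Python versions named by the classifiers, in order."""
    return [c[len(PREFIX):] for c in classifiers if c.startswith(PREFIX)]


def claims_python3(classifiers):
    """True if any classifier names Python 3 or a 3.x release."""
    return any(v == "3" or v.startswith("3.")
               for v in declared_versions(classifiers))
```

A listing page could apply something like claims_python3() to each
package's metadata rather than relying on a hand-maintained list.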

Regards,
Martin

From fdrake at acm.org  Fri Dec  5 08:40:20 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 05 Dec 2008 02:40:20 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4938D7F9.80908@v.loewis.de>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
	<4938D7F9.80908@v.loewis.de>
Message-ID: <F46AC351-ACAF-471E-846A-5BD17F8F103F@acm.org>

On Dec 5, 2008, at 2:27 AM, Martin v. Löwis wrote:
> There is. There have been the following trove classifiers defined for
> a few weeks now:


Wonderful!  Thanks for clueing me in.  I'll update my projects to use  
those in future releases.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From glyph at divmod.com  Fri Dec  5 08:58:30 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 07:58:30 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812042205q72fcebd1xd8e6e589c06bc3a1@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
	<ca471dc20812042205q72fcebd1xd8e6e589c06bc3a1@mail.gmail.com>
Message-ID: <20081205075830.12555.1834157056.divmod.xquotient.1370@weber.divmod.com>


On 06:05 am, guido at python.org wrote:
>On Thu, Dec 4, 2008 at 9:40 PM,  <glyph at divmod.com> wrote:
>>The default case, the case of the user without the wherewithal
>>to understand the nuances of the distinction between 2.x and 3.x, is a
>>user who should use 2.x.
>
>Not at all clear. If they're not sensitive to those nuances it's just
>as likely that they're a casual developer (e.g. a student just
>learning to program). Such users are unlikely to start using major 3rd
>party packages like Twisted or Django, which would be completely
>overwhelming to someone just learning. As shown in
>http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html, Python 3.0
>removes quite a few warts that are likely to trip up learners.
>
>Once they are ready (probably under the wings of some guru) to dive
>deeper, they may have to learn about 2.6 and how it differs -- that's
>a useful exercise by itself, but if I'm right, most learners won't
>have to go there because by the time they get to that point, the 3.0
>ecosystem has matured enough to support their needs.

Well, ultimately the way you want to position this is your decision, but 
you haven't convinced me.  My experience of casual developers suggests 
that they are _extremely_ sensitive to such nuances.  Library support is 
a big one, but even bigger than that is the reporting of errors when 
mismatched versions don't work together.  Are they going to understand 
that 3.0 and 2.6 are actually different languages, or are they just 
going to think that something's broken when they double-click on a .pyw 
file they got from some random python 2.x tutorial, with python 3 for 
windows installed?

My interest is not hypothetical.  I am trying to avoid hearing someone 
say this to me: "Oh yeah, Python, I tried that, but it didn't work.  I 
use Visual Basic now and it's pretty good.  It has good graphics."

This type of confusion will persist for years.  It will probably be 
worst at the point where both versions are enjoying equal popularity, 
but at least by then all the tutorials and tools will loudly say "python 
TWO" or "python THREE" on them.  At least now, at the outset, it is 
pretty clear what direction the confusatron's going to tilt in.

From steve at holdenweb.com  Fri Dec  5 09:06:03 2008
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 05 Dec 2008 03:06:03 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938D193.4080608@v.loewis.de>
References: <4938374B.8000006@gmail.com>
	<49386A2C.60208@v.loewis.de>		<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<4938D193.4080608@v.loewis.de>
Message-ID: <ghancu$bl1$1@ger.gmane.org>

Martin v. Löwis wrote:
>> Please, if you have a *new* idea that doesn't have a failure mode, by
>> all means post it.  But don't resurrect a pointless bikeshed.
> 
> While I completely agree that it is pointless to reiterate the same
> arguments over and over, I disagree that the bikeshed metaphor applies.
> This metaphor (IIUC) describes a trivial design issue that is merely
> a matter of taste, rather than having deep technical implications.
> Using Unicode or bytes for strings is not of that kind.
> 
+1

These issues are very important because they affect everyone. Even
though very few people actually understand them. Including me, which is
why I've been so quiet on this thread.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From g.brandl at gmx.net  Fri Dec  5 09:21:04 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 05 Dec 2008 09:21:04 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <AF6A07DF-C986-4FDE-AE9A-7B679F78A76F@python.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com>	<46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>	<85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>	<4937B80D.9070309@gmail.com>	<gh8j4g$ol6$1@ger.gmane.org>	<gh8vh2$638$1@ger.gmane.org>
	<4938660E.9080809@v.loewis.de>
	<AF6A07DF-C986-4FDE-AE9A-7B679F78A76F@python.org>
Message-ID: <ghaoai$gh7$1@ger.gmane.org>

Barry Warsaw schrieb:
> On Dec 4, 2008, at 6:21 PM, Martin v. Löwis wrote:
> 
>>>> I can't find any docs built for Python 3.0 (not 3.1a0).
>>>
>>> The Windows installation has new 3.0 doc dated Dec 3, so it was built,
>>> just not posted correctly.
> 
>> That doesn't mean very much. I built it on my local machine. Anybody
>> with subversion and python could do that; the documentation is in
>> subversion.
> 
>> Whether or not it appears on the web site as part of the release
>> process is an entirely different matter. It used to be that the
>> doc maintainer (Fred Drake) was part of the release team and release
>> process. I think Georg is complaining that he is release maintainer,
>> but not part of the release process.
> 
> I've asked Georg to update PEP 101 to make his role as Documentation  
> Expert explicit.  Unfortunately we only debug major releases once (or  
> twice) every 18 months.  But next time, we'll get that part right for  
> sure!

Done that now. Since release.py builds the docs all right, there's not
much left for me to do except check that everything is ok.

> In the meantime, I'll make sure Georg is involved in point releases  
> moving forward.

That's good. Thanks!

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From rhamph at gmail.com  Fri Dec  5 09:23:13 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 5 Dec 2008 01:23:13 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938D193.4080608@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<4938D193.4080608@v.loewis.de>
Message-ID: <aac2c7cb0812050023p5084cef8wf4a4ee275e4c2e6d@mail.gmail.com>

On Fri, Dec 5, 2008 at 12:00 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Please, if you have a *new* idea that doesn't have a failure mode, by
>> all means post it.  But don't resurrect a pointless bikeshed.
>
> While I completely agree that it is pointless to reiterate the same
> arguments over and over, I disagree that the bikeshed metaphor applies.
> This metaphor (IIUC) describes a trivial design issue that is merely
> a matter of taste, rather than having deep technical implications.
> Using Unicode or bytes for strings is not of that kind.

That we need to support both unicode and bytes is important, but
already seems to have consensus.  However, they present two distinct
usage patterns:

* unicode text, presentable to the user, interacts with all manner of
standardized APIs
* bytes, limited to local, internal use.  Only approximated forms can
be presented to the user, only custom formats can be saved externally

None of the proposals have turned these into a single use case.  All
they do is trade off various forms of subtly switching back and forth,
which leads to failure.  Debating which subtle failure is better is a
bikeshed.

Not only that, but we already have a solution that makes the choice
explicit, avoiding the subtle failure.  This is the solution already
in use for os file & path functions.  It's the solution Guido
supports.


-- 
Adam Olsen, aka Rhamphoryncus

From ncoghlan at gmail.com  Fri Dec  5 10:01:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 19:01:08 +1000
Subject: [Python-Dev] Taint Mode in Python 3.0
In-Reply-To: <693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com>
References: <200812041836.48146.nicole@cats-muvva.net>	<e27efe130812041529x72d900f8xcb62cd5d8b48bd27@mail.gmail.com>
	<693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com>
Message-ID: <4938EDD4.5000001@gmail.com>

Maciej Fijalkowski wrote:
> Hello,
> 
> The thing is pypy's taint code is broken. Basically you don't only
> need to patch all places that return pyobject, but also all places
> that might modify anything. (All side effects) For example innocently
> looking call to addition might end up calling arbitrary python code
> (and have arbitrary side effects). There is a question how do you
> approach such things?

Taint isn't an easy problem, but PyPy is still a *much* better platform
for that kind of experimentation than CPython.

RPython, object spaces, the code generation, etc all give you much more
powerful tools to play with than the raw C code of the reference
interpreter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Dec  5 10:21:32 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 19:21:32 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
References: <4938374B.8000006@gmail.com>
	<49386A2C.60208@v.loewis.de>	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>	<gha2dq$u0m$1@ger.gmane.org>
	<20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
Message-ID: <4938F29C.4050706@gmail.com>

glyph at divmod.com wrote:
> At least this time I think I've encapsulated pretty much my entire
> argument here, so if you don't buy it, we can probably just agree to
> disagree :).

Glyph, the only point I would add to your message is this one:

Adding a "blessed" way to encode arbitrary binary data into a Python 3.0
str object strikes me as giving up on one of the key advances in the new
version of the language.

8-bit strings were a problem in Python 2.x because they blurred the
boundary between arbitrary binary data and ASCII or latin-1 character data.

One of the most interesting aspects of Python 3.0 is its attempt to get
developers to be explicit about this distinction (both in the code and
in their own minds) by enforcing separation between arbitrary binary
data (held in bytes and bytearray instances) and character data (held in
str instances).

I don't understand how tunneling arbitrary binary data through str
instances (*regardless* of encoding mechanism) can possibly fail to
recreate exactly the same "is it text or binary data?" ambiguity
problems that the str/bytes split is intended to eliminate. And if that
happens, then what exactly was the point in moving to an all Unicode
string model for Py3k?
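
As a small illustration of the separation described above (a sketch
only; the helper name try_concat and the sample values are made up for
this example), Python 3 refuses to combine the two types implicitly,
and crossing the boundary takes an explicit encode or decode:

```python
text = "caf\u00e9"            # character data: a str
data = text.encode("utf-8")   # binary data: bytes, produced explicitly


def try_concat(a, b):
    """Return a + b, or the name of the exception raised by mixing types."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


mixed = try_concat(text, data)       # str + bytes is rejected
round_trip = data.decode("utf-8")    # explicit conversion back to str
```

Any scheme that smuggles undecodable bytes into a str has to give up
some of this guarantee, which is the concern raised above.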

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From victor.stinner at haypocalc.com  Fri Dec  5 10:43:10 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 5 Dec 2008 10:43:10 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49386A2C.60208@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
Message-ID: <200812051043.10938.victor.stinner@haypocalc.com>

On Friday 05 December 2008 00:39:24, Martin v. Löwis wrote:
> 5) represent all environment variables in Unicode strings,
>    including the ones that currently fail to decode.
>    (then do the same to file names, then drop the byte-oriented
>     file operations again)

Please, don't do that! Bytes are not characters!

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From eckhardt at satorlaser.com  Fri Dec  5 10:35:50 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 5 Dec 2008 10:35:50 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
References: <4938374B.8000006@gmail.com> <gha2dq$u0m$1@ger.gmane.org> 
	<20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
Message-ID: <200812051035.50493.eckhardt@satorlaser.com>

On Friday 05 December 2008, glyph at divmod.com wrote:
> Filenames and environment variables would all need to be encoded or
> decoded according to this magic encoding.

Those, and commandline arguments, too.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published, or otherwise used.
E-mails can be read by third parties and may contain viruses as well as unauthorized changes. Sator Laser GmbH is not responsible for these consequences.

**************************************************************************************


From eckhardt at satorlaser.com  Fri Dec  5 10:41:05 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 5 Dec 2008 10:41:05 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812041947p47324ffav3c86e661905aa8d1@mail.gmail.com>
References: <4938374B.8000006@gmail.com> 
	<350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com>
	<aac2c7cb0812041947p47324ffav3c86e661905aa8d1@mail.gmail.com>
Message-ID: <200812051041.05992.eckhardt@satorlaser.com>

On Friday 05 December 2008, Adam Olsen wrote:
> Many of the windows APIs use UTF-16 without validating it.  They'll
> pass through invalid strings until they hit something that does
> validate, at which point it'll blow up.
>
> I suspect that it doesn't happen very often in practice, as having
> only one encoding makes it quite clear that it's a broken file name,
> not a mixed encoding environment.

Actually, I wouldn't say that's a problem at all. The point is that stuff that 
is blissfully unaware of encodings typically uses some ASCII-de(p)rived text. 
Those char-strings are translated according to the current locale, which then 
does the filtering and validation. The result may be gibberish (GIGO 
principle) but at least it's UTF-16 gibberish. ;)

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



From victor.stinner at haypocalc.com  Fri Dec  5 11:18:48 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 5 Dec 2008 11:18:48 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <200812051118.48096.victor.stinner@haypocalc.com>

Hi,

On Thursday 04 December 2008 21:02:19, Toshio Kuratomi wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)

Yeah, I prefer to discuss such changes on the mailing list.

> These mixed encodings can occur for a variety of reasons.  Here's an
> example that isn't too contrived :-)
> (...)
> Furthermore, they don't want to suffer from the space loss of using 
> utf-8 to encode Japanese so they use shift-jis everywhere.

"space loss"? Really? If you configure your server correctly, you should get 
UTF-8 even if the file system is Shift-JIS. But it would be much easier to 
use UTF-8 everywhere.

Hum... I don't think that the discussion is about one specific server, but the 
lack of bytes environment variables in Python3 :-)

> 1) return mixed unicode and byte types in ...

NO!

> 2) return only byte types in os.environ

Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and 
Python3 already uses Unicode everywhere (input(), open(), filenames, ...).

> 3) silently ignore non-decodable value when accessing os.environ['PATH']
> as we do now but allow access to the full information via
> os.environ[b'PATH'] and os.getenvb()

I don't like os.environ[b'PATH']. I prefer to always get the same result 
type... But os.listdir() doesn't respect that :-(

   os.listdir(str) -> list of str
   os.listdir(bytes) -> list of bytes

I would prefer a similar API for easier migration from Python2 to Python3
(unicode). os.environb sounds like the best choice for me.
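
A short sketch of the os.listdir() convention mentioned above, where
the result type mirrors the argument type. The helper name
listdir_both is invented for this example, and os.fsencode() is a
convenience from later Python 3 releases than the one discussed in
this thread:

```python
import os


def listdir_both(path):
    """List a directory twice: once with str names, once with bytes names."""
    as_text = sorted(os.listdir(path))                # str in, list of str out
    as_bytes = sorted(os.listdir(os.fsencode(path)))  # bytes in, list of bytes out
    return as_text, as_bytes
```

An os.environb mapping would follow the same idea: a parallel,
bytes-typed view next to the str-typed os.environ, instead of one
container returning mixed types.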


But there are open questions (already asked in the bug tracker):

(a) Should os.environ be updated if os.environb is changed? If yes, how?
   os.environb['PATH'] = '\xff' (or any invalid string in the system 
                                 default encoding)
   => os.environ['PATH'] = ???

(b) Should os.environb be updated if os.environ is changed? If yes, how?

The problem comes with non-Unicode locales (eg. latin-1 or ASCII): most charsets
are unable to encode the whole Unicode charset (eg. code points >= 65536).

   os.environ['PATH'] = chr(0x10000)
   => os.environb['PATH'] = ???

(c) Same question when a key is deleted (del os.environ['PATH']).
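
The two failure directions behind questions (a) and (b) can be
reproduced with plain codecs, without touching the environment. This
is only a sketch; can_decode and can_encode are hypothetical helper
names:

```python
def can_decode(raw, encoding):
    """True if the bytes value is valid text in the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def can_encode(text, encoding):
    """True if the str value can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# (a) a bytes value with no str equivalent under a UTF-8 locale
bytes_ok = can_decode(b"\xff", "utf-8")          # False

# (b) a str value with no bytes equivalent under a latin-1 locale
text_ok = can_encode(chr(0x10000), "latin-1")    # False
```

Whatever rule keeps the two mappings in sync has to say what happens
in both of these cases.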

If Python 3.1 has both os.environ and os.environb, I'm quite sure that some
modules will use os.environ and others will prefer os.environb. If the two
environments differ, the two sets of modules will behave differently :-/

It would be maybe easier if os.environ supports bytes and unicode keys. But we 
have to keep these assertions:
   os.environ[bytes] -> bytes
   os.environ[str] -> str

> 4) raise an exception when non-decodable values are *accessed* and
> continue as in #3.

I like os.listdir() behaviour: just *ignore* non-decodable files. If you 
really want to access these files, use a bytes directory name ;-)

> I think that the ease of debugging is lost when we silently ignore an error.

Guido gave a good example. If your directory contains a non-decodable
filename (eg. "???.txt") and listing raised an error, glob('*.py') would fail
because of the evil filename. With the current behaviour, you're unable to
list all files, but glob('*.py') will list all Python scripts!

And Python3 is already released, so it's maybe a bad idea to change the
behaviour (of os.environ) in Python 3.1 :-/

> The bug report I opened suggests creating a PEP to address this issue.

Please, try to answer my questions about os.environ and os.environb
consistency.

I also like bytes environment variables. I need them for my fuzzing program. 
The lack of bytes variables is a regression from Python2 (for my program). On 
UNIX, filenames are bytes and the environment variables are bytes. For the 
best interoperability, Python3 should support bytes. But the default choice 
should always be characters (unicode) and to never mix the bytes and str 
types ;-)

---

As usual, it goes faster if someone writes a patch :-) I could try to work on 
it.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From eckhardt at satorlaser.com  Fri Dec  5 11:27:35 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 5 Dec 2008 11:27:35 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
References: <4938374B.8000006@gmail.com> 
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com> 
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
Message-ID: <200812051127.35880.eckhardt@satorlaser.com>

On Friday 05 December 2008, Guido van Rossum wrote:
> At the risk of bringing up something that was already rejected, let me
> propose something that follows the path taken in 3.0 for filenames,
> rather than doubling back:
>
> For os.environ, os.getenv() and os.putenv(), I think a similar
> approach as used for os.listdir() and os.getcwd() makes sense: let
> os.environ skip variables whose name or value is undecodable, and have
> a separate os.environb() which contains bytes; let os.getenv() and
> os.putenv() do the right thing when the arguments passed in are bytes.
>
> For sys.argv, because it's positional, you can't skip undecodable
> values, so I propose to use error=replace for the decoding; again, we
> can add sys.argvb that contains the raw bytes values. The various
> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
> and the subprocess module) should all accept bytes as well as strings.
>
> On Windows, the bytes APIs should probably not exist.
>
> I predict that most developers can get away with not using the bytes
> APIs at all. The small minority that needs to be robust if not all
> filenames use the system encoding can use the bytes APIs.
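
A rough sketch of the decoding rules proposed above, in Python. The names
decode_environ and decode_argv and the raw_env/raw_argv inputs are made up
for illustration; none of this is an existing os or sys API, and UTF-8 merely
stands in for the system encoding:

```python
def decode_environ(raw_env, encoding="utf-8"):
    """Return a str-only view of a bytes environment.

    Variables whose name or value cannot be decoded are skipped,
    which is the behaviour proposed for os.environ.
    """
    decoded = {}
    for name, value in raw_env.items():
        try:
            decoded[name.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            continue  # undecodable entry: only visible through the bytes API
    return decoded


def decode_argv(raw_argv, encoding="utf-8"):
    """Decode positional arguments; nothing may be skipped, so replace."""
    return [arg.decode(encoding, errors="replace") for arg in raw_argv]


raw_env = {b"HOME": b"/home/user", b"BROKEN": b"\xff\xfe"}
print(decode_environ(raw_env))           # only HOME survives
print(decode_argv([b"prog", b"\xff"]))   # second item becomes U+FFFD
```

The raw mapping stays available next to the decoded one, so a program that
needs the skipped entries can still reach them as bytes.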

I know some of those developers, you can contact them via 
python-dev at python.org. Seriously, what would you suggest to someone that 
wants to handle paths in a portable way? Using the Unicode variants of 
functions is fubar, because encoding/decoding is not universally possible. 
Using the byte variant is equally fubar, because e.g. on MS Windows it is not 
supported, except through a very lossy roundtrip through the locale's 
codepage, limiting your functionality.

I actually think it is about time to give up on trying to think about a path 
as a string. Ditto for data received from os.environ or sys.argv. There are 
only very few things that are universal to them and a reliable encoding is 
none of them. Then, once you have let that idea go, meditate a bit over the 
Zen.

What I propose is that paths must be treated as OS-specific, with the only 
common reliable operations being joining them, concatenating them and 
splitting them into segments divided by the (again, OS-specific) separator. 
Other operations, like e.g. appending a string or converting it to a string 
in order to display it can fail. And if they fail, they should fail noisily. 
In 99% of all cases, using the default encoding will work and do what people 
expect, which is why I would make this conversion automatic. In all other 
cases, it will at least not fail silently (which would lead to garbage and 
data loss) and allow more sophisticated applications to handle it.
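
As a minimal sketch of what such a path type could look like (the class name
OSPath, its methods, and the hard-coded POSIX separator are all assumptions
made for illustration, not an existing API):

```python
class OSPath:
    """Hypothetical opaque path that keeps the raw bytes from the OS."""

    SEP = b"/"  # assumption: POSIX separator; a real type would ask the OS

    def __init__(self, raw):
        self.raw = bytes(raw)

    def join(self, other):
        """Join two paths; this needs no knowledge of any encoding."""
        left = self.raw.rstrip(self.SEP)
        right = other.raw.lstrip(self.SEP)
        return OSPath(left + self.SEP + right)

    def segments(self):
        """Split into the segments divided by the separator."""
        return [OSPath(part) for part in self.raw.split(self.SEP) if part]

    def to_text(self, encoding="utf-8"):
        """Strict conversion: raises UnicodeDecodeError instead of guessing."""
        return self.raw.decode(encoding)


path = OSPath(b"/home").join(OSPath(b"caf\xe9"))  # Latin-1 name, not UTF-8
print([segment.raw for segment in path.segments()])
try:
    print(path.to_text())
except UnicodeDecodeError as exc:
    print("cannot display path:", exc)  # fails noisily, no silent garbage
```

Joining and splitting always work here; only the conversion to text can fail,
and when it does the caller gets an exception to handle.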

Uli

-- 
Sator Laser GmbH
Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932



From Fabien.Bouleau at ses-engineering.com  Fri Dec  5 11:42:08 2008
From: Fabien.Bouleau at ses-engineering.com (Fabien.Bouleau at ses-engineering.com)
Date: Fri, 5 Dec 2008 11:42:08 +0100
Subject: [Python-Dev] Fix for frame_setlineno() in frameobject.c function
Message-ID: <OF12E2E14D.9724EDC8-ONC1257516.003A14C8-C1257516.003ABE8B@LocalDomain>

Hello,

This concerns a known bug in the frame_setlineno() function for Python 
2.5.x and 2.6.x (maybe in earlier versions too). It is not possible to use 
this function when the address or line offset is greater than 127. The 
problem comes from the lnotab variable, which is typed char* and is 
therefore treated as signed char* on platforms where plain char is signed. 
Any value above 127 becomes a negative number.

The fix is very simple (applied on the Python 2.6.1 version of the source 
code):

--- frameobject.c       Thu Oct 02 19:39:50 2008
+++ frameobject_fixed.c Fri Dec 05 11:27:42 2008
@@ -119,8 +119,8 @@
        line = f->f_code->co_firstlineno;
        new_lasti = -1;
        for (offset = 0; offset < lnotab_len; offset += 2) {
-               addr += lnotab[offset];
-               line += lnotab[offset+1];
+               addr += ((unsigned char*)lnotab)[offset];
+               line += ((unsigned char*)lnotab)[offset+1];
                if (line >= new_lineno) {
                        new_lasti = addr;
                        new_lineno = line;


It would be nice to fix it for Python 2.5 and above, in order to have a 
proper MSI installer for Windows.
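
The effect of the sign interpretation can be reproduced outside of C. The
following is only an illustrative sketch in Python, not the interpreter's
code: the helper decode_lnotab and the sample table are made up, and struct
is used to read each byte either as a signed or as an unsigned char:

```python
import struct


def decode_lnotab(lnotab, first_lineno, signed):
    """Walk (address increment, line increment) byte pairs.

    Returns the running (addr, line) totals, reading each byte as a
    signed char ("b") or an unsigned char ("B").
    """
    fmt = "bb" if signed else "BB"
    addr, line = 0, first_lineno
    rows = []
    for offset in range(0, len(lnotab), 2):
        addr_incr, line_incr = struct.unpack_from(fmt, lnotab, offset)
        addr += addr_incr
        line += line_incr
        rows.append((addr, line))
    return rows


table = bytes([6, 1, 200, 3])  # second address increment is above 127
print(decode_lnotab(table, 1, signed=True))   # [(6, 2), (-50, 5)]
print(decode_lnotab(table, 1, signed=False))  # [(6, 2), (206, 5)]
```

With the signed reading the address goes backwards as soon as an increment
exceeds 127, which is the situation the cast in the patch avoids.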

Best regards,
Fabien Bouleau




From LambertDW at Corning.com  Fri Dec  5 10:40:16 2008
From: LambertDW at Corning.com (Lambert, David W (S&T))
Date: Fri, 05 Dec 2008 04:40:16 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final  FFT
Message-ID: <84B204FFB016BA4984227335D8257FBA5A3860@CVCV0XI05.na.corning.com>

http://code.activestate.com/recipes/576550/ 

This recipe shows how to use gsl FFT with python 3.

ctypes is really good!

From exarkun at divmod.com  Fri Dec  5 13:30:59 2008
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 5 Dec 2008 07:30:59 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812042205q72fcebd1xd8e6e589c06bc3a1@mail.gmail.com>
Message-ID: <20081205123059.20272.808184471.divmod.quotient.16127@ohm>

On Thu, 4 Dec 2008 22:05:05 -0800, Guido van Rossum <guido at python.org> wrote:
>On Thu, Dec 4, 2008 at 9:40 PM,  <glyph at divmod.com> wrote:
>> The default case, the case of the user without the wherewithal
>> to understand the nuances of the distinction between 2.x and 3.x, is a user
>> who should use 2.x.
>
>Not at all clear. If they're not sensitive to those nuances it's just
>as likely that they're a casual developer (e.g. a student just
>learning to program). Such users are unlikely to start using major 3rd
>party packages like Twisted or Django, which would be completely
>overwhelming to someone just learning.

That seems like it would be right to me, but two or three times a month
someone shows up in the Twisted IRC channel who is learning both Python
and Twisted at the same time.  So apparently there are a lot of people
for whom this isn't overwhelming.

Jean-Paul

From eduardo.padoan at gmail.com  Fri Dec  5 13:38:36 2008
From: eduardo.padoan at gmail.com (Eduardo O. Padoan)
Date: Fri, 5 Dec 2008 10:38:36 -0200
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205023514.GA1723@amk.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
Message-ID: <dea92f560812050438x55610feevd421a8a432f3818c@mail.gmail.com>

On Fri, Dec 5, 2008 at 12:35 AM, A.M. Kuchling <amk at amk.ca> wrote:
> On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote:
>> Here's a bright idea.  On the 3.0 release page, include a box listing
>> which major third-party apps have been converted.  Update it
>> once every couple of weeks.  That way, we're not explicitly
>
> That's an excellent idea.  We could have a webpage, or start a
> topic-specific weblog for posting announcements.
>
> I've started a draft of a 3.0 FAQ in the wiki at
> <http://wiki.python.org/moin/Python3000/FAQ>.  Once it's finished we
> can move it into the 3.0 release pages.  Everyone please edit and
> improve it!

Some time ago I started a page on the wiki to collect reports of early
migrations by the community:
http://wiki.python.org/moin/Early2to3Migrations

Maybe this would be worth linking from the FAQ.

> --amk



-- 
    Eduardo de Oliveira Padoan
http://djangopeople.net/edcrypt/
"Distrust those in whom the desire to punish is strong." -- Goethe,
Nietzsche, Dostoevsky

From musiccomposition at gmail.com  Fri Dec  5 13:51:33 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 5 Dec 2008 06:51:33 -0600
Subject: [Python-Dev] Fix for frame_setlineno() in frameobject.c function
In-Reply-To: <OF12E2E14D.9724EDC8-ONC1257516.003A14C8-C1257516.003ABE8B@LocalDomain>
References: <OF12E2E14D.9724EDC8-ONC1257516.003A14C8-C1257516.003ABE8B@LocalDomain>
Message-ID: <1afaf6160812050451l286b5f6bw9332bc3ade886926@mail.gmail.com>

Hi,
Please post this on the issue tracker. http://bugs.python.org

On Fri, Dec 5, 2008 at 4:42 AM,  <Fabien.Bouleau at ses-engineering.com> wrote:
> Hello,
>
> This concerns a known bug in the frame_setlineno() function for Python
> 2.5.x and 2.6.x (maybe in earlier version too). It is not possible to use
> this function when the address or line offset are greater than 127. The
> problem comes from the lnotab variable which is typed char*, therefore
> implicitely signed char*. Any value above 127 becomes a negative number.
>
> The fix is very simple (applied on the Python 2.6.1 version of the source
> code):
>
> --- frameobject.c       Thu Oct 02 19:39:50 2008
> +++ frameobject_fixed.c Fri Dec 05 11:27:42 2008
> @@ -119,8 +119,8 @@
>        line = f->f_code->co_firstlineno;
>        new_lasti = -1;
>        for (offset = 0; offset < lnotab_len; offset += 2) {
> -               addr += lnotab[offset];
> -               line += lnotab[offset+1];
> +               addr += ((unsigned char*)lnotab)[offset];
> +               line += ((unsigned char*)lnotab)[offset+1];
>                if (line >= new_lineno) {
>                        new_lasti = addr;
>                        new_lineno = line;
>




-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From foom at fuhm.net  Fri Dec  5 15:27:37 2008
From: foom at fuhm.net (James Y Knight)
Date: Fri, 5 Dec 2008 09:27:37 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
Message-ID: <0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net>

On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote:
> Using the byte variant is equally fubar, because e.g. on MS Windows  
> it is not
> supported, except through a very lossy roundtrip through the locale's
> codepage, limiting your functionality.


Yeah, IMO the whole mess could have been avoided by keeping the filename/ 
args/environ simply *bytes*, like it really is, on unix. Then, make  
the Windows version of python use (always! not dependent upon locale!)  
utf-8 to convert the bytestring to the UTF-16 that the Windows  
platform APIs expect (and vice versa). And never use the ANSI ("A")  
variant of the windows APIs.

This would mean that all *inputs* would succeed, but some *outputs*  
would not, on Windows. But that's not a new kind of failure: NUL has  
never been allowed in argv/environ, and filenames have all sorts of  
platform-dependent restrictions.
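
A small sketch of that conversion, assuming well-formed UTF-16 on the
Windows side. The function names are hypothetical, and the real work would
of course happen in the C layer rather than in Python:

```python
def unix_to_windows(name: bytes) -> bytes:
    """Bytestring -> UTF-16-LE; fails if the bytes are not valid UTF-8."""
    return name.decode("utf-8").encode("utf-16-le")


def windows_to_unix(wide: bytes) -> bytes:
    """UTF-16-LE from the platform API -> UTF-8 bytestring."""
    return wide.decode("utf-16-le").encode("utf-8")


# Input direction: a name coming from Windows is representable as UTF-8.
wide = "caf\u00e9.txt".encode("utf-16-le")
print(windows_to_unix(wide))

# Output direction: arbitrary bytes may not be valid UTF-8.
try:
    unix_to_windows(b"caf\xe9.txt")  # Latin-1 bytes, not UTF-8
except UnicodeDecodeError as exc:
    print("cannot pass this name to Windows:", exc)
```

The failure in the second case is the "some outputs would not succeed" part:
the program is told so by an exception, much like any other name the
platform refuses.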

But unfortunately, it's too late for that solution...

James

From a.badger at gmail.com  Fri Dec  5 16:06:06 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 07:06:06 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <gh9vdu$n0u$1@ger.gmane.org>
References: <4938374B.8000006@gmail.com>
	<gh9jdk$i48$1@ger.gmane.org>	<49385EED.9040004@gmail.com>
	<gh9vdu$n0u$1@ger.gmane.org>
Message-ID: <4939435E.3020103@gmail.com>

Terry Reedy wrote:
> Toshio Kuratomi wrote:
>>
>>> I would think life would be ultimately easier if either the file server
>>> or the shell server automatically translated file names from jis and
>>> utf8 and back, so that the PATH on the *nix shell server is entirely
>>> utf8.
>>
>> This is not possible because no part of the computer knows what the
>> encoding is.  To the computer, it's just a sequence of bytes.  Unlike
>> xml or the windows filesystem (winfs? ntfs?) where the encoding is
>> specified as part of the document/filesystem there's nothing to tell
>> what encoding the filenames are in.
> 
> I thought you said that the file server keeps all filenames in shift-jis,
> and the shell server all in utf-8.

Yes.  But this is part of the setup of the example to keep things
simple.  The fileserver or shell server could themselves be of mixed
encodings (for instance, if it was serving home directories to users all
over the world each user might be using a different encoding.)

>  If so, then the shell server could
> know if it were told so.
> 

Where are you going to store that information?  In order for python to
run without errors, will it have to be configured on each system it's
installed on to know the encoding of each filename?  Or are we going to
try to talk each *NIX vendor into creating new filesystems that record
that information and after a five year span of time declare that python
will not run on other filesystems in corner cases?

I think that this way does not hold a reasonable expectation of keeping
python a portable language.

-Toshio


From victor.stinner at haypocalc.com  Fri Dec  5 16:09:19 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 5 Dec 2008 16:09:19 +0100
Subject: [Python-Dev] Python security: draft article on the wiki
Message-ID: <200812051609.19822.victor.stinner@haypocalc.com>

Hi,

I started to write a short article about Python security on the wiki:

   http://wiki.python.org/moin/Security

Nothing useful yet.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From skip at pobox.com  Fri Dec  5 16:25:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 5 Dec 2008 09:25:01 -0600
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4938D7F9.80908@v.loewis.de>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
	<4938D7F9.80908@v.loewis.de>
Message-ID: <18745.18381.364105.121084@montanaro-dyndns-org.local>


    Martin> There is. There have been the following trove classifiers
    Martin> defined for a few weeks now:

    Martin> Programming Language :: Python :: 2
    Martin> Programming Language :: Python :: 2.3
    Martin> Programming Language :: Python :: 2.4
    Martin> Programming Language :: Python :: 2.5
    Martin> Programming Language :: Python :: 2.6
    Martin> Programming Language :: Python :: 2.7
    Martin> Programming Language :: Python :: 3
    Martin> Programming Language :: Python :: 3.0
    Martin> Programming Language :: Python :: 3.1

Good.  Now we just need to populate them.  I take it the classifiers without
minor numbers imply any known minor version (e.g., 2 ==> 2.3 and greater)?

Skip

From amk at amk.ca  Fri Dec  5 17:40:53 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Fri, 5 Dec 2008 11:40:53 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
Message-ID: <20081205164053.GA10632@amk-desktop.matrixgroup.net>

On Fri, Dec 05, 2008 at 05:40:46AM -0000, glyph at divmod.com wrote:
> For most users, especially new users who have yet to be impressed with  
> Python's power, 2.x is much better.  It's not like "library support" is  
> one small check-box on the language's feature sheet: most of the  
> attractive things about Python are libraries.  Of course I am not free  

Here I agree, sort of.  Newbies may not understand what they're giving
up in terms of libraries.  (The 'sort of' is because, having learned
3.0, learning the changes for 2.6 is certainly much easier than
learning a first programming language is.)

> The third (albeit much less likely) option is that you're learning  
> Python to learn to interact with a system that's scriptable in embedded  
> Python, like Blender or Gimp.  I don't think there's a single system of  
> that variety which uses 3.0 yet, and these will likely be even slower to  
> move than libraries.  

Let me note that if some application embeds Python for a specialized
purpose, where the only modules imported are either user-written or
part of the application, it seems much *easier* to move to Python 3
because the scripts don't use arbitrary third-party libraries.  Python
embedded in an e-mail MTA might use libraries for DNS or file I/O or
databases and has to be cautious about versions; Python in Gimp
probably doesn't, in practice.

--amk

From janssen at parc.com  Fri Dec  5 17:39:57 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 5 Dec 2008 08:39:57 PST
Subject: [Python-Dev] Python + Java Integration
In-Reply-To: <B53BCFBF-4FF1-42A8-B668-CB3E5513486E@snowtide.com>
References: <B53BCFBF-4FF1-42A8-B668-CB3E5513486E@snowtide.com>
Message-ID: <8291.1228495197@parc.com>

> One thing that would help Python in this "debate" (or, perhaps simply  
> put it in the running, at least as a "next Java" candidate) would be  
> if Python had an easier migration path for Java developers that  
> currently rely upon various third-party libraries.  The wealth of  
> third-party libraries available for Java has always been one of its  
> great strengths.  Ergo, if Python had an easy-to-use, recommended way  
> to use those libraries within the Python environment, that would be a  
> significant advantage to present to Java developers and those who  
> would choose Ruby over Java.  Platform compatibility is always a huge  
> motivator for those looking to migrate or upgrade.

Personally, I'm using Andi Vajda's JCC for this purpose.  Recommended.
The nice thing about it is that it turns jar files into Python modules;
you don't need the source.

http://pypi.python.org/pypi/JCC

Bill

From status at bugs.python.org  Fri Dec  5 18:06:58 2008
From: status at bugs.python.org (Python tracker)
Date: Fri,  5 Dec 2008 18:06:58 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20081205170658.63FCD780B1@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (11/28/08 - 12/05/08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2233 open (+55) / 14139 closed (+41) / 16372 total (+96)

Open issues with patches:   753

Average duration of open issues: 705 days.
Median duration of open issues: 2193 days.

Open Issues Breakdown
   open  2214 (+54)
pending    19 ( +1)

Issues Created Or Reopened (96)
_______________________________

Coding cookie crashes IDLE                                       11/28/08
CLOSED http://bugs.python.org/issue4454    created  tjreedy                   
                                                                               

No Windows List in IDLE if several windows have the same title   11/28/08
CLOSED http://bugs.python.org/issue4455    created  amaury.forgeotdarc        
       patch                                                                   

xmlrpc is broken                                                 11/28/08
CLOSED http://bugs.python.org/issue4456    created  benjamin.peterson         
                                                                               

__import__ documentation obsolete                                11/29/08
       http://bugs.python.org/issue4457    created  stevenjd                  
                                                                               

getopt.gnu_getopt() loses dash argument                          11/29/08
CLOSED http://bugs.python.org/issue4458    created  muntyan                   
                                                                               

bdist_rpm assumes python                                         11/29/08
       http://bugs.python.org/issue4459    created  John5342                  
                                                                               

The parameter of PyInt_AsSsize_t() is not checked to see if it i 11/29/08
CLOSED http://bugs.python.org/issue4460    created  CWRU_Researcher1          
                                                                               

parameters of PyLong_FromString() are not checked for NULL       11/29/08
       http://bugs.python.org/issue4461    created  CWRU_Researcher1          
       patch                                                                   

result of PyList_GetItem() not validated                         11/29/08
CLOSED http://bugs.python.org/issue4462    created  CWRU_Researcher1          
                                                                               

Parameters and result of PyList_GetItem() are not validated      11/29/08
CLOSED http://bugs.python.org/issue4463    created  CWRU_Researcher1          
                                                                               

PyList_GetItem() result and parameters not fully validated       11/29/08
CLOSED http://bugs.python.org/issue4464    created  CWRU_Researcher1          
                                                                               

The result of set_copy() is not checked for NULL                 11/29/08
CLOSED http://bugs.python.org/issue4465    created  CWRU_Researcher1          
                                                                               

The return value of PyFile_FromFile is not checked for NULL      11/29/08
CLOSED http://bugs.python.org/issue4466    created  CWRU_Researcher1          
                                                                               

return value of PyUnicode_AsEncodedString() is not checked for N 11/29/08
CLOSED http://bugs.python.org/issue4467    created  CWRU_Researcher1          
                                                                               

Restore chapter enumeration in Python docs                       11/30/08
CLOSED http://bugs.python.org/issue4468    created  schluehk                  
                                                                               

CVE-2008-5031 multiple integer overflows                         11/30/08
       http://bugs.python.org/issue4469    created  doko                      
                                                                               

smtplib SMTP_SSL not working.                                    11/30/08
       http://bugs.python.org/issue4470    created  lcatucci                  
       patch                                                                   

IMAP4 missing support for starttls                               11/30/08
       http://bugs.python.org/issue4471    created  lcatucci                  
       patch                                                                   

Is shared lib building broken on trunk?                          11/30/08
       http://bugs.python.org/issue4472    created  skip.montanaro            
                                                                               

POP3 missing support for starttls                                11/30/08
       http://bugs.python.org/issue4473    created  lcatucci                  
       patch                                                                   

PyUnicode_FromWideChar incorrect for characters outside the BMP  11/30/08
       http://bugs.python.org/issue4474    created  marketdickinson           
                                                                               

More verbose error message for Py_FindMethod                     11/30/08
       http://bugs.python.org/issue4475    created  gpolo                     
       patch                                                                   

compileall.py  fails if current dir has a "types" subdir with 3. 12/01/08
       http://bugs.python.org/issue4476    created  aivazis                   
                                                                               

Speed up PyEval_EvalFrameEx when tracing is off.                 12/01/08
       http://bugs.python.org/issue4477    created  jyasskin                  
       patch, needs review                                                     

shutil.copyfile documentation                                    12/01/08
CLOSED http://bugs.python.org/issue4478    created  steve21                   
                                                                               

True division is not smart -> proposing smart True division      12/01/08
CLOSED http://bugs.python.org/issue4479    created  nassrat                   
                                                                               

bdist_msi and bdist_wininst are missing an uninstaller icon      12/01/08
       http://bugs.python.org/issue4480    created  lemburg                   
                                                                               

Windows installer crash                                          12/01/08
       http://bugs.python.org/issue4481    created  Konam                     
                                                                               

10e667.__format__('+') should return 'inf'                       12/01/08
       http://bugs.python.org/issue4482    created  DinoV                     
                                                                               

Error to build _dbm module during make                           12/01/08
       http://bugs.python.org/issue4483    created  legerf                    
       patch, easy                                                             

struct: per item endianess specification                         12/02/08
       http://bugs.python.org/issue4484    created  da4an1qu1                 
                                                                               

fast swap of "default" Windows python versions                   12/02/08
       http://bugs.python.org/issue4485    created  v+python                  
                                                                               

Exception traceback is incorrect for strange exception handling  12/02/08
       http://bugs.python.org/issue4486    created  ncoghlan                  
                                                                               

Add utf8 alias for email charsets                                12/02/08
       http://bugs.python.org/issue4487    created  maxua                     
       patch                                                                   

Python Documentation not Newb Friendly                           12/02/08
       http://bugs.python.org/issue4488    created  mez                       
                                                                               

shutil.rmtree is vulnerable to a symlink attack                  12/02/08
       http://bugs.python.org/issue4489    created  mrts                      
                                                                               

xml/sax/expatreader.py raises AttributeError when run            12/02/08
       http://bugs.python.org/issue4490    created  exarkun                   
                                                                               

email.Header.decode_header() doesn't work if encoded-word was se 12/02/08
       http://bugs.python.org/issue4491    created  ishimoto                  
       patch                                                                   

httplib code thinks it closes connection, but does not           12/02/08
       http://bugs.python.org/issue4492    created  jjlee                     
                                                                               

urllib2 doesn't always supply / where URI path component is empt 12/02/08
       http://bugs.python.org/issue4493    created  jjlee                     
                                                                               

Python 2.6 fails to build with Py_NO_ENABLE_SHARED               12/02/08
       http://bugs.python.org/issue4494    created  snaury                    
       patch                                                                   

Fix signed/unsigned warning                                      12/02/08
CLOSED http://bugs.python.org/issue4495    created  rhettinger                
                                                                               

misleading comment in urllib2                                    12/02/08
       http://bugs.python.org/issue4496    created  jjlee                     
                                                                               

Compiler warnings in longobject.c                                12/02/08
       http://bugs.python.org/issue4497    created  rhettinger                
       patch                                                                   

Compiler warning "signed/unsigned comparion in mmapmodule"       12/02/08
       http://bugs.python.org/issue4498    created  rhettinger                
                                                                               

redefinition of TILDE macro on AIX platform                      12/02/08
       http://bugs.python.org/issue4499    created  apaprocki                 
                                                                               

Compiler warnings when compiling Python 3.0 with a C89 compiler  12/03/08
       http://bugs.python.org/issue4500    created  christian.heimes          
                                                                               

asyncore's urgent data management and connection closed events   12/03/08
       http://bugs.python.org/issue4501    created  giampaolo.rodola          
       patch                                                                   

Allowing get_pre_input_hook from Readline                        12/03/08
       http://bugs.python.org/issue4502    created  Conrad.Irwin              
       patch                                                                   

exception traceback sometimes slow                               12/03/08
       http://bugs.python.org/issue4503    created  ocean-city                
                                                                               

Doc/includes out of date                                         12/03/08
CLOSED http://bugs.python.org/issue4504    created  exe                       
                                                                               

ob_size not removed from docs                                    12/03/08
CLOSED http://bugs.python.org/issue4505    created  exe                       
                                                                               

3.0 make test failures on Solaris 10                             12/03/08
       http://bugs.python.org/issue4506    created  skip.montanaro            
       64bit                                                                   

3.0 test failure on Mac OS X 10.5.5                              12/03/08
       http://bugs.python.org/issue4507    created  skip.montanaro            
                                                                               

distutils compiler not handling spaces in path to output/src fil 12/03/08
       http://bugs.python.org/issue4508    created  Thorney                   
       patch                                                                   

possible memoryview bug                                          12/03/08
       http://bugs.python.org/issue4509    created  gumpy                     
                                                                               

ValueError for list.remove() not very helpful                    12/03/08
       http://bugs.python.org/issue4510    created  brett.cannon              
       easy                                                                    

Decorators should have an index entry                            12/04/08
CLOSED http://bugs.python.org/issue4511    created  dvusboy                   
                                                                               

Add get_filename method to zipimport                             12/04/08
       http://bugs.python.org/issue4512    created  belopolsky                
       patch                                                                   

Finish updating zip docstring                                    12/04/08
CLOSED http://bugs.python.org/issue4513    created  tjreedy                   
                                                                               

test_binascii is failing                                         12/04/08
CLOSED http://bugs.python.org/issue4514    created  rhettinger                
                                                                               

Formatting error in "What's New in Python 3.0"                   12/04/08
CLOSED http://bugs.python.org/issue4515    created  pwang                     
                                                                               

Another formatting error in "What's New in Python 3.0"           12/04/08
CLOSED http://bugs.python.org/issue4516    created  pwang                     
                                                                               

improve __getattribute__ documentation                           12/04/08
CLOSED http://bugs.python.org/issue4517    created  LambertDW                 
                                                                               

broken link to python 3 doc on main doc page                     12/04/08
CLOSED http://bugs.python.org/issue4518    created  cleary                    
                                                                               

.pyc files included in 2.6 and 3.0 release tarballs              12/04/08
CLOSED http://bugs.python.org/issue4519    created  doko                      
                                                                               

Online 3.0 documentation says it's for 3.1a0                     12/04/08
CLOSED http://bugs.python.org/issue4520    created  paulmelis                 
                                                                               

"What's New in Python 3.0" mentions "getcwdu" instead of "getcwd 12/04/08
CLOSED http://bugs.python.org/issue4521    created  hagen                     
       patch                                                                   

Module wsgiref is not python3000 ready (unicode issues)          12/04/08
       http://bugs.python.org/issue4522    created  tordmor                   
       patch                                                                   

logging module __init__ uses has_key                             12/04/08
       http://bugs.python.org/issue4523    created  bitdancer                 
       patch                                                                   

Build fails at running build_scripts                             12/04/08
       http://bugs.python.org/issue4524    created  chaz6                     
       patch, needs review                                                     

metaclass fixer fails with AttributeError, causing 2to3 to exit  12/04/08
CLOSED http://bugs.python.org/issue4525    created  exarkun                   
                                                                               

Clarify documentation for binary literals                        12/04/08
CLOSED http://bugs.python.org/issue4526    created  nneonneo                  
                                                                               

Obsolete 'string or unicode' in fractions doc                    12/04/08
CLOSED http://bugs.python.org/issue4527    created  tjreedy                   
       easy                                                                    

test_httpservers consistently fails on OS X                      12/04/08
       http://bugs.python.org/issue4528    created  mwdiers                   
                                                                               

parser module failure on valid try/except/finally blocks         12/04/08
CLOSED http://bugs.python.org/issue4529    created  kaiw                      
                                                                               

IDLE crashes with Japanese text on print command                 12/04/08
CLOSED http://bugs.python.org/issue4530    created  Vultaire                  
                                                                               

Deprecation warnings in lib\compiler\ast.py                      12/04/08
CLOSED http://bugs.python.org/issue4531    created  edreamleo                 
                                                                               

Fails to build on QNX 6.3.2                                      12/04/08
       http://bugs.python.org/issue4532    created  kraai                     
                                                                               

3.0 file.read dreadfully slow                                    12/04/08
       http://bugs.python.org/issue4533    created  tjreedy                   
       patch                                                                   

problem with str.join - should work with list input, error says  12/04/08
CLOSED http://bugs.python.org/issue4534    created  lopgok                    
                                                                               

Build / Test Py3K failed on Ubuntu 8.10                          12/04/08
       http://bugs.python.org/issue4535    created  lbhudda                   
                                                                               

SystemError if invalid arguments passed to range() and step=-1   12/04/08
       http://bugs.python.org/issue4536    created  laszlo                    
       patch, needs review                                                     

webbrowser.UnixBrowser should use builtins.open                  12/05/08
       http://bugs.python.org/issue4537    reopened amaury.forgeotdarc        
                                                                               

ctypes could include data type limits                            12/04/08
       http://bugs.python.org/issue4538    created  roysmith                  
                                                                               

askdirectory() in tkinter.filedialog is broken                   12/04/08
       http://bugs.python.org/issue4539    created  dogtato                   
                                                                               

typo in a module describes utf-8 as uft-8                        12/04/08
       http://bugs.python.org/issue4540    created  john.weldon               
       patch, needs review                                                     

Add str method for removing leading or trailing substrings       12/05/08
CLOSED http://bugs.python.org/issue4541    created  zhirsch                   
       patch                                                                   

test_binascii fails on windows                                   12/05/08
CLOSED http://bugs.python.org/issue4542    created  amaury.forgeotdarc        
       patch, easy                                                             

container constructors destroy argument                          12/05/08
CLOSED http://bugs.python.org/issue4543    created  kjwcode                   
                                                                               

textwrap: __all__ atribute missing 'dedent' function             12/05/08
CLOSED http://bugs.python.org/issue4544    created  wolfdown                  
                                                                               

doctest seems to always fail on numpy.array2string               12/05/08
CLOSED http://bugs.python.org/issue4545    created  ekorn                     
                                                                               

Small thingy in "What's New in Python 3.0"                       12/05/08
CLOSED http://bugs.python.org/issue4546    created  paulmelis                 
                                                                               

Long jumps with frame_setlineno                                  12/05/08
       http://bugs.python.org/issue4547    created  fboule                    
       patch, needs review                                                     

OptionParser : Weird comportement in args processing             12/05/08
CLOSED http://bugs.python.org/issue4548    created  ohervieu                  
                                                                               

A defect in <The Python Tutorial>-<Python Scopes and Name Spaces 12/05/08
       http://bugs.python.org/issue4549    created  PyTiger                   
                                                                               



Issues Now Closed (86)
______________________

urllib fail to read URL contents, urllib2 crash Python            433 days
       http://bugs.python.org/issue1205    jjlee                     
                                                                               

httplib does not handle ssl end of file properly                  431 days
       http://bugs.python.org/issue1223    georg.brandl              
       patch                                                                   

popen spawned process may not write to stdout under windows       401 days
       http://bugs.python.org/issue1366    georg.brandl              
                                                                               

urllib2 302 POST                                                  391 days
       http://bugs.python.org/issue1401    jjlee                     
                                                                               

Victor Stinner's GMP patch for longs                              328 days
       http://bugs.python.org/issue1814    haypo                     
       patch                                                                   

Update What's new in 3.0                                          262 days
       http://bugs.python.org/issue2306    gvanrossum                
                                                                               

Update the ACKS file                                              264 days
       http://bugs.python.org/issue2311    georg.brandl              
                                                                               

'exceptions' import fixer                                         261 days
       http://bugs.python.org/issue2350    brett.cannon              
       patch, needs review                                                     

Backport buffer interface in Python 3.0 to Python 2.6             262 days
       http://bugs.python.org/issue2393    georg.brandl              
                                                                               

Python 2.6 refleak test issues                                    259 days
       http://bugs.python.org/issue2447    georg.brandl              
                                                                               

urllib2 can't handle http://www.wikispaces.com                    254 days
       http://bugs.python.org/issue2464    jjlee                     
       patch                                                                   

operator.*slice() should be deprecated in 2.6                     166 days
       http://bugs.python.org/issue3171    georg.brandl              
                                                                               

Multiprocessing docs are not 3.0-ready                            156 days
       http://bugs.python.org/issue3256    georg.brandl              
       patch                                                                   

multiprocessing: BaseManager.from_address documented but doesn't  113 days
       http://bugs.python.org/issue3518    jnoller                   
                                                                               

listreverseiterator has a decreasing len()                         98 days
       http://bugs.python.org/issue3689    rhettinger                
                                                                               

os.getenv silently discards env variables with non-UTF-8 values    65 days
       http://bugs.python.org/issue4006    Rhamphoryncus             
                                                                               

Minor errors in multiprocessing docs                               58 days
       http://bugs.python.org/issue4012    jnoller                   
                                                                               

distutils build_scripts and install_data commands need 2to3 supp   54 days
       http://bugs.python.org/issue4073    loewis                    
       patch                                                                   

Multiprocessing example                                            35 days
       http://bugs.python.org/issue4193    jnoller                   
                                                                               

BSD support for multiprocessing.cpu_count                          30 days
       http://bugs.python.org/issue4238    jnoller                   
                                                                               

cycle created by profile.run                                       27 days
       http://bugs.python.org/issue4273    amaury.forgeotdarc        
                                                                               

Error in docs of urllib.request and urllib.parse                   16 days
       http://bugs.python.org/issue4355    georg.brandl              
       patch                                                                   

Add CRT version info in msvcrt module                              11 days
       http://bugs.python.org/issue4365    loewis                    
       patch                                                                   

Add a warnings.showwarning replacement for logging                 12 days
       http://bugs.python.org/issue4384    vsajip                    
       easy                                                                    

binascii b2a functions accept strings (unicode) as data            10 days
       http://bugs.python.org/issue4387    loewis                    
       patch, needs review                                                     

Uninstaller Lacks an Icon                                           9 days
       http://bugs.python.org/issue4389    lemburg                   
                                                                               

os.extsep status? doc or os bug?                                   11 days
       http://bugs.python.org/issue4401    georg.brandl              
                                                                               

Windows Installer Error 1722 when opting for compilation at inst    7 days
       http://bugs.python.org/issue4407    keldonin                  
       patch                                                                   

re.compile(regexp).groups not documented                           11 days
       http://bugs.python.org/issue4408    georg.brandl              
                                                                               

Dangling asterisks in Python 3.0 subprocess docs                   11 days
       http://bugs.python.org/issue4409    georg.brandl              
                                                                               

Docs for 'y' Py_BuildValue tag are wrong                           10 days
       http://bugs.python.org/issue4427    georg.brandl              
       patch, patch                                                            

Improve os open flag options doc                                    9 days
       http://bugs.python.org/issue4441    georg.brandl              
                                                                               

2to3 run changed multiprocessing.Queue() to multiprocessing.queu    0 days
       http://bugs.python.org/issue4450    benjamin.peterson         
                                                                               

Coding cookie crashes IDLE                                          0 days
       http://bugs.python.org/issue4454    tjreedy                   
                                                                               

No Windows List in IDLE if several windows have the same title      0 days
       http://bugs.python.org/issue4455    amaury.forgeotdarc        
       patch                                                                   

xmlrpc is broken                                                    1 days
       http://bugs.python.org/issue4456    benjamin.peterson         
                                                                               

getopt.gnu_getopt() loses dash argument                             6 days
       http://bugs.python.org/issue4458    georg.brandl              
                                                                               

The parameter of PyInt_AsSsize_t() is not checked to see if it i    0 days
       http://bugs.python.org/issue4460    marketdickinson           
                                                                               

result of PyList_GetItem() not validated                            0 days
       http://bugs.python.org/issue4462    rhettinger                
                                                                               

Parameters and result of PyList_GetItem() are not validated         0 days
       http://bugs.python.org/issue4463    rhettinger                
                                                                               

PyList_GetItem() result and parameters not fully validated          0 days
       http://bugs.python.org/issue4464    rhettinger                
                                                                               

The result of set_copy() is not checked for NULL                    0 days
       http://bugs.python.org/issue4465    rhettinger                
                                                                               

The return value of PyFile_FromFile is not checked for NULL         0 days
       http://bugs.python.org/issue4466    marketdickinson           
                                                                               

return value of PyUnicode_AsEncodedString() is not checked for N    0 days
       http://bugs.python.org/issue4467    marketdickinson           
                                                                               

Restore chapter enumeration in Python docs                          5 days
       http://bugs.python.org/issue4468    georg.brandl              
                                                                               

shutil.copyfile documentation                                       4 days
       http://bugs.python.org/issue4478    georg.brandl              
                                                                               

True division is not smart -> proposing smart True division         0 days
       http://bugs.python.org/issue4479    nassrat                   
                                                                               

Fix signed/unsigned warning                                         0 days
       http://bugs.python.org/issue4495    rhettinger                
                                                                               

Doc/includes out of date                                            2 days
       http://bugs.python.org/issue4504    georg.brandl              
                                                                               

ob_size not removed from docs                                       2 days
       http://bugs.python.org/issue4505    georg.brandl              
                                                                               

Decorators should have an index entry                               2 days
       http://bugs.python.org/issue4511    dvusboy                   
                                                                               

Finish updating zip docstring                                       1 days
       http://bugs.python.org/issue4513    georg.brandl              
                                                                               

test_binascii is failing                                            1 days
       http://bugs.python.org/issue4514    amaury.forgeotdarc        
                                                                               

Formatting error in "What's New in Python 3.0"                      1 days
       http://bugs.python.org/issue4515    georg.brandl              
                                                                               

Another formatting error in "What's New in Python 3.0"              1 days
       http://bugs.python.org/issue4516    georg.brandl              
                                                                               

improve __getattribute__ documentation                              1 days
       http://bugs.python.org/issue4517    georg.brandl              
                                                                               

broken link to python 3 doc on main doc page                        1 days
       http://bugs.python.org/issue4518    georg.brandl              
                                                                               

.pyc files included in 2.6 and 3.0 release tarballs                 1 days
       http://bugs.python.org/issue4519    barry                     
                                                                               

Online 3.0 documentation says it's for 3.1a0                        0 days
       http://bugs.python.org/issue4520    georg.brandl              
                                                                               

"What's New in Python 3.0" mentions "getcwdu" instead of "getcwd    0 days
       http://bugs.python.org/issue4521    georg.brandl              
       patch                                                                   

metaclass fixer fails with AttributeError, causing 2to3 to exit     0 days
       http://bugs.python.org/issue4525    benjamin.peterson         
                                                                               

Clarify documentation for binary literals                           0 days
       http://bugs.python.org/issue4526    georg.brandl              
                                                                               

Obsolete 'string or unicode' in fractions doc                       0 days
       http://bugs.python.org/issue4527    georg.brandl              
       easy                                                                    

parser module failure on valid try/except/finally blocks            1 days
       http://bugs.python.org/issue4529    georg.brandl              
                                                                               

IDLE crashes with Japanese text on print command                    0 days
       http://bugs.python.org/issue4530    amaury.forgeotdarc        
                                                                               

Deprecation warnings in lib\compiler\ast.py                         0 days
       http://bugs.python.org/issue4531    edreamleo                 
                                                                               

problem with str.join - should work with list input, error says     0 days
       http://bugs.python.org/issue4534    amaury.forgeotdarc        
                                                                               

Add str method for removing leading or trailing substrings          0 days
       http://bugs.python.org/issue4541    rhettinger                
       patch                                                                   

test_binascii fails on windows                                      0 days
       http://bugs.python.org/issue4542    amaury.forgeotdarc        
       patch, easy                                                             

container constructors destroy argument                             0 days
       http://bugs.python.org/issue4543    rhettinger                
                                                                               

textwrap: __all__ atribute missing 'dedent' function                0 days
       http://bugs.python.org/issue4544    georg.brandl              
                                                                               

doctest seems to always fail on numpy.array2string                  0 days
       http://bugs.python.org/issue4545    amaury.forgeotdarc        
                                                                               

Small thingy in "What's New in Python 3.0"                          0 days
       http://bugs.python.org/issue4546    georg.brandl              
                                                                               

OptionParser : Weird comportement in args processing                0 days
       http://bugs.python.org/issue4548    georg.brandl              
                                                                               

registry functions don't handle null characters                  2144 days
       http://bugs.python.org/issue672132  amaury.forgeotdarc        
                                                                               

SIGSEGV in _sre.c (IRIX 6.5.20)                                  1948 days
       http://bugs.python.org/issue783789  amaury.forgeotdarc        
                                                                               

pickletools support for multiple pickles in a string             1792 days
       http://bugs.python.org/issue873150  fdrake                    
                                                                               

catch invalid chunk length in httplib read routine               1748 days
       http://bugs.python.org/issue900744  jjlee                     
       patch                                                                   

cgi.py does not correctly handle fields with ';'                 1499 days
       http://bugs.python.org/issue1055234 fdrake                    
       patch                                                                   

attempting to use urllib2 on some URLs fails starting on 2.4     1385 days
       http://bugs.python.org/issue1123695 amaury.forgeotdarc        
                                                                               

httplib patch to make _read_chunked() more robust                1047 days
       http://bugs.python.org/issue1411097 jjlee                     
       patch                                                                   

pdb find_function does not find Class methods.                    681 days
       http://bugs.python.org/issue1643369 georg.brandl              
                                                                               

Draft implementation for PEP 364                                  640 days
       http://bugs.python.org/issue1675334 barry                     
       patch                                                                   

'exec' does not accept what 'open' returns                        495 days
       http://bugs.python.org/issue1762972 georg.brandl              
                                                                               

urllib2 hangs with some documents.                                479 days
       http://bugs.python.org/issue1772481 amaury.forgeotdarc        
                                                                               

glob doesn't return unicode with unicode parameter                473 days
       http://bugs.python.org/issue1777458 georg.brandl              
                                                                               



Top Issues Most Discussed (10)
______________________________

 17 Error to build _dbm module during make                             4 days
open    http://bugs.python.org/issue4483   

 15 AssertionError in Doc/includes/mp_benchmarks.py                    7 days
open    http://bugs.python.org/issue4449   

 10 3.0 make test failures on Solaris 10                               2 days
open    http://bugs.python.org/issue4506   

 10 BaseHTTPRequestHandler depends on GC to close connections         87 days
open    http://bugs.python.org/issue3826   

  9 Update What's new in 3.0                                         262 days
closed  http://bugs.python.org/issue2306   

  8 POP3 missing support for starttls                                  5 days
open    http://bugs.python.org/issue4473   

  7 typo in a module describes utf-8 as uft-8                          1 days
open    http://bugs.python.org/issue4540   

  7 3.0 file.read dreadfully slow                                      1 days
open    http://bugs.python.org/issue4533   

  7 bdist_msi and bdist_wininst are missing an uninstaller icon        4 days
open    http://bugs.python.org/issue4480   

  7 Speed up PyEval_EvalFrameEx when tracing is off.                   5 days
open    http://bugs.python.org/issue4477   




From a.badger at gmail.com  Fri Dec  5 18:37:14 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 09:37:14 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051118.48096.victor.stinner@haypocalc.com>
References: <4938374B.8000006@gmail.com>
	<200812051118.48096.victor.stinner@haypocalc.com>
Message-ID: <493966CA.2010801@gmail.com>

Victor Stinner wrote:
> Hi,
> 
> On Thursday 04 December 2008 21:02:19, Toshio Kuratomi wrote:
> 
>> These mixed encodings can occur for a variety of reasons.  Here's an
>> example that isn't too contrived :-)
>> (...)
>> Furthermore, they don't want to suffer from the space loss of using 
>> utf-8 to encode Japanese so they use shift-jis everywhere.
> 
> "space loss"? Really? If you configure your server correctly, you should get 
> UTF-8 even if the file system is Shift-JIS. But it would be much easier to 
> use UTF-8 everywhere.
> 
> Hum... I don't think that the discussion is about one specific server, but the 
> lack of bytes environment variables in Python3 :-)
>
Yep.  I can't change the logicalness of the policies of a different
organization, only code my application to deal with it :-)

>> 1) return mixed unicode and byte types in ...
> 
> NO!
> 
It's nice that we agree... but I would prefer if you leave enough
context so that others can see that we agree as well :-)

>> 2) return only byte types in os.environ
> 
> Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and 
> Python3 already uses Unicode everywhere (input(), open(), filenames, ...).
>
We're also in agreement here.

>> 3) silently ignore non-decodable value when accessing os.environ['PATH']
>> as we do now but allow access to the full information via
>> os.environ[b'PATH'] and os.getenvb()
> 
> I don't like os.environ[b'PATH']. I prefer to always get the same result 
> type... But os.listdir() doesn't respect that :-(
> 
>    os.listdir(str) -> list of str
>    os.listdir(bytes) -> list of bytes
> 
> I would prefer a similar API for easier migration from Python2/Python3
> (unicode). os.environb sounds like the best choice for me.
> 
<nod>.  After thinking about how it would be used in subprocess calls I
agree.  os.environb would allow us to retrieve the full dict as bytes.
os.environ[b''] only works on individual keys.  Also os.getenv serves
the same purpose as os.environ[b''] would whereas os.environb would have
its own uses.
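
For the subprocess case, a minimal sketch of why having the whole dict
matters.  It assumes an os.environb along the lines discussed here (the
code uses the os.environb and os.supports_bytes_environ names that later
Python 3 releases provide); child_environment and MYVAR are made up for
illustration.

```python
import os

def child_environment(extra):
    """Return a bytes-to-bytes copy of the environment plus `extra`.

    `extra` maps bytes names to bytes values and may hold values that
    no locale can decode.
    """
    if os.supports_bytes_environ:
        env = dict(os.environb)       # full environment, nothing dropped
    else:
        # Platforms without a bytes environment: encode the str view.
        env = {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}
    env.update(extra)
    return env

env = child_environment({b'MYVAR': b'\xff'})
# The copy could then be handed on unchanged on POSIX, e.g.
# subprocess.call(['printenv', 'MYVAR'], env=env)
```

With only os.environ[b''] the caller would have to rebuild this dict one
key at a time, and would need some other way to learn the key names.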

> 
> But they are open questions (already asked in the bug tracker):
> 
I answered these in the bug tracker.  Here are the answers for the
mailing list:

> (a) Should os.environ be updated if os.environb is changed? If yes, how?
>    os.environb['PATH'] = '\xff' (or any invalid string in the system 
>                                  default encoding)
>    => os.environ['PATH'] = ???
> 
The underlying environment that both variables reflect should be updated
but what is displayed by os.environ should continue to follow the same
rules.  So if we follow option #3::

     os.environb['PATH'] = b'\xff'
     os.environ['PATH'] => raises KeyError because PATH is not a key in
                           the unicode-decoded environment.

(option #4 would issue a UnicodeDecodeError instead of a KeyError)

Similarly, if you start with a variable in os.environb that can only be
represented as bytes and your program transforms it into something that
is decodable it should then show up in os.environ.
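
To make that concrete, here is a rough model of the read side.  It is
not real os code: EnvironViews, get_str and get_bytes are invented for
illustration, and UTF-8 stands in for the locale encoding.

```python
class EnvironViews:
    """Toy model: one bytes store, read through a bytes and a str view."""

    def __init__(self, raw, encoding='utf-8', strict=False):
        self.raw = dict(raw)        # stands in for the underlying environment
        self.encoding = encoding
        self.strict = strict        # False: option #3, True: option #4

    def get_bytes(self, name):
        return self.raw[name]

    def get_str(self, name):
        value = self.raw[name.encode(self.encoding)]  # KeyError if absent
        try:
            return value.decode(self.encoding)
        except UnicodeDecodeError:
            if self.strict:
                raise                 # option #4: the decode error surfaces
            raise KeyError(name)      # option #3: hidden from the str view

env = EnvironViews({b'PATH': b'\xff'})
# env.get_bytes(b'PATH') returns b'\xff'; env.get_str('PATH') raises KeyError.
env.raw[b'PATH'] = b'/usr/bin'
# The value is now decodable, so env.get_str('PATH') returns '/usr/bin'.
```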

> (b) Should os.environb be updated if os.environ is changed? If yes, how?
> 
> The problem comes with non-Unicode locales (eg. latin-1 or ASCII): most charsets
> are unable to encode the whole Unicode charset (eg. codes >= 65535).
> 
>    os.environ['PATH'] = chr(0x10000)
>    => os.environb['PATH'] = ???
>
Ah, this is a good question.  I misunderstood what you were getting at
when you posted this to the bug report.  I see several options but the
one that seems the most sane is to raise UnicodeEncodeError when setting
the value.  With that, proper code to set an environment variable might
look like this::

    $ LANG=C python3.0
    >>> import os
    >>> variable = chr(0x10000)
    >>> try:
    ...     # Unicode aware locales
    ...     os.environ['MYVAR'] = variable
    ... except UnicodeEncodeError:
    ...     # Non-Unicode locales
    ...     os.environb['MYVAR'] = bytes(variable, encoding='utf8')

> (c) Same question when a key is deleted (del os.environ['PATH']).
> 
Update the underlying env so both os.environ and os.environb reflect the
change.  Deleting should not hold the problems that updating does.
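
A rough model of the write side, covering (b) and (c).  Again this is
only a sketch under my own assumptions: SharedEnviron and its methods
are invented names, and the 'ascii' codec stands in for a LANG=C locale.

```python
class SharedEnviron:
    """Toy model of one bytes store written through str or bytes."""

    def __init__(self, encoding):
        self.raw = {}
        self.encoding = encoding    # stands in for the locale encoding

    def set_str(self, name, value):
        # Encode both parts first, so a failure leaves the store untouched.
        key = name.encode(self.encoding)
        data = value.encode(self.encoding)   # may raise UnicodeEncodeError
        self.raw[key] = data

    def set_bytes(self, name, value):
        self.raw[name] = value

    def delete_str(self, name):
        # Deleting touches the single store, so both views lose the key.
        del self.raw[name.encode(self.encoding)]

env = SharedEnviron('ascii')          # roughly LANG=C
try:
    env.set_str('MYVAR', chr(0x10000))
except UnicodeEncodeError:
    env.set_bytes(b'MYVAR', chr(0x10000).encode('utf-8'))
```

Because there is only one store, there is nothing to keep in sync; the
only question is what each view does when a conversion fails.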

> If Python 3.1 will have os.environ and os.environb, I'm quite sure that some
> modules will use os.environ and others will prefer os.environb. If both
> environments are different, the two sets of modules will work differently :-/
> 
Exactly.  So making sure they hold the same information is a priority.

> It would be maybe easier if os.environ supports bytes and unicode keys. But we 
> have to keep these assertions:
>    os.environ[bytes] -> bytes
>    os.environ[str] -> str
> 
I think the same choices have to be made here.  If LANG=C, we still have
to decide what to do when os.environ[str] is set to a non-ASCII string.

Additionally, the subprocess question makes using the key value
undesirable compared with having a separate os.environb that accesses
the same underlying data.

>> 4) raise an exception when non-decodable values are *accessed* and
>> continue as in #3.
> 
> I like os.listdir() behaviour: just *ignore* non-decodable files. If you 
> really want to access these files, use a bytes directory name ;-)
> 
Since you wrote the code for that I would hope so ;-)

Here's my problem with it, though.  With these semantics any program
that works on arbitrary files and runs on *NIX has to check
os.listdir(b'') and do the conversion manually.  The only code that
doesn't have to care is code that is working on files that the program
created and thus controls.

Since it is not obvious that this has to be done, most programs won't do
this by default.  There will be subtle bugs in a lot of code that
individual application authors will have to discover and fix when a
user realizes something is wrong.  Since there's no traceback being
issued, the process of discovery and debugging will be longer.

>> I think that the ease of debugging is lost when we silently ignore an error.
> 
> Guido gave a good example. If your directory contains a non-decodable
> filename (eg. "???.txt"): glob('*.py') will fail because of the evil 
> filename. With the current behaviour, you're unable to list all files but 
> glob('*.py') will list all Python scripts!
> 
Current behaviour is this:

os.listdir('.')   => Only decodable filenames
glob.glob('*')    => Only decodable filenames
os.listdir(b'.')  => All filenames as bytes
glob.glob(b'*')   => All filenames as bytes

I think the desired behaviour, assuming the existence of a nondecodable
file, is this:

os.listdir('.')    => traceback
glob.glob('*')     => traceback
os.listdir(b'.')   => All filenames as bytes
glob.glob(b'*')    => All filenames as bytes

Both of these approaches are internally consistent.  Why do you think
that glob.glob('*.py') is special and should not traceback?
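
The difference between the two tables can be sketched on a plain list
of bytes names, without touching the filesystem.  visible_names is a
made-up helper and the listing is invented; UTF-8 is assumed as the
filesystem encoding.

```python
def visible_names(raw_names, encoding='utf-8', strict=False):
    """Decode directory entries given as bytes.

    strict=False drops undecodable names (the current behaviour);
    strict=True lets UnicodeDecodeError propagate (the desired one).
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if strict:
                raise
    return names

entries = [b'setup.py', b'\xff\xfe.txt']   # made-up listing
print(visible_names(entries))              # the .txt entry silently vanishes
```

With strict=True the same call raises UnicodeDecodeError, which at
least tells the caller that the str listing is incomplete.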

> And Python3 is released, it's maybe a bad idea to change the behaviour (of 
> os.environ) in Python 3.1 :-/
> 
As you've pointed out, os.environ will have to change slightly.  But
others have already said that this is on the agenda to fix in 3.1.  The
current state is just broken as the environment is currently only
partially readable from python.

>> The bug report I opened suggests creating a PEP to address this issue.
> 
> Please, try to answer to my questions about os.environ and os.environb 
> consistency.
> 
I have.  Twice now :-)

> I also like bytes environment variables. I need them for my fuzzing program. 
> The lack of bytes variables is a regression from Python2 (for my program). On 
> UNIX, filenames are bytes and the environment variables are bytes. For the 
> best interoperability, Python3 should support bytes. But the default choice 
> should always be characters (unicode) and to never mix the bytes and str 
> types ;-)
> 
I agree 100%.

* Never mixing bytes and str is a *huge* benefit of python3 over python2.
* Unicode str everywhere possible is a python3 benefit that helps to get
conversion done at the border.

I just differ in that I think lack of tracebacks when
UnicodeDecodeErrors are encountered is a wart in python3 that did not
exist in python2.

-Toshio

From guido at python.org  Fri Dec  5 18:59:52 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Dec 2008 09:59:52 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
Message-ID: <ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>

On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt <eckhardt at satorlaser.com> wrote:
> Seriously, what would you suggest to someone that
> wants to handle paths in a portable way? Using the Unicode variants of
> functions is fubar, because encoding/decoding is not universally possible.
> Using the byte variant is equally fubar, because e.g. on MS Windows it is not
> supported, except through a very lossy roundtrip through the locale's
> codepage, limiting your functionality.

Write a lightweight abstraction layer that uses Unicode when possible
and bytes otherwise. You'd need to write a few functions for the path
handling code you need, with a platform check or two sprinkled in.

Writing such an abstraction for the purpose of one specific
application is usually simple enough. However, writing a similar
abstraction that serves all apps and all use cases is hard. I hope
that eventually someone will come up with one though -- the failure of
earlier path object proposals notwithstanding.
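
A rough sketch of what such an app-specific layer might look like.  The
names native_name, join and display are made up, UTF-8 is assumed where
a real app would consult the filesystem encoding, and no platform check
is shown; it uses os.fsencode, which later Python 3 releases provide.

```python
import os

def native_name(raw, encoding='utf-8'):
    """Return `raw` as str when it decodes cleanly, else leave it as bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw

def join(directory, name):
    """Join two components, falling back to bytes if either one is bytes."""
    if isinstance(directory, bytes) or isinstance(name, bytes):
        directory = os.fsencode(directory)
        name = os.fsencode(name)
    return os.path.join(directory, name)

def display(name, encoding='utf-8'):
    """Best-effort text for messages; never raises on odd bytes."""
    if isinstance(name, bytes):
        return name.decode(encoding, 'replace')
    return name
```

The app keeps str wherever it can, carries bytes only for the odd
names, and can still show every name in an error message.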

> I actually think it is about time to give up on trying to think about a path
> as a string. Ditto for data received from os.environ or sys.argv. There are
> only very few things that are universal to them and a reliable encoding is
> none of them. Then, once you have let that idea go, meditate a bit over the
> Zen.

This sounds too pessimistic to me. I expect that in five years it will
be universally accepted that these variables must be encoded in a
standard encoding. People are never going to give up thinking about
filenames etc. as strings, because that's what they are conceptually.
The problem is purely one of encoding, and that's where Unix/Linux are
behind the curve, since (so far) they haven't taken the plunge and
picked a universal standard encoding, the way Windows and Mac OS X
have done.

> What I propose is that paths must be treated as OS-specific, with the only
> common reliable operations being joining them, concatenating them and
> splitting them into segments divided by the (again, OS-specific) separator.
> Other operations, like e.g. appending a string or converting it to a string
> in order to display it can fail. And if they fail, they should fail noisily.

That's bad though, since filenames are being displayed all the time
(e.g. in error messages).

> In 99% of all cases, using the default encoding will work and do what people
> expect, which is why I would make this conversion automatic. In all other
> cases, it will at least not fail silently (which would lead to garbage and
> data loss) and allow more sophisticated applications to handle it.

I think the "always fail noisily" approach isn't the best approach.
E.g. if I am globbing for *.py, and there's an undecodable .txt file
in a directory, its presence shouldn't cause the glob to fail.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From Ted.Leung at Sun.COM  Fri Dec  5 18:48:07 2008
From: Ted.Leung at Sun.COM (Ted Leung)
Date: Fri, 05 Dec 2008 09:48:07 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
Message-ID: <30F5311C-8857-4486-99ED-7380BAC51B29@sun.com>

On Dec 4, 2008, at 7:59 PM, glyph at divmod.com wrote:

>
> On 02:35 am, amk at amk.ca wrote:
>> On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote:
>>> Here's a bright idea.  On the 3.0 release page, include a box  
>>> listing
>>> which major third-party apps have been converted.  Update it
>>> once every couple of weeks.  That way, we're not explicitly
>>
>> That's an excellent idea.  We could have a webpage, or start a
>> topic-specific weblog for posting announcements.
>>
>> I've started a draft of a 3.0 FAQ in the wiki at
>> <http://wiki.python.org/moin/Python3000/FAQ>.  Once it's finished we
>> can move it into the 3.0 release pages.  Everyone please edit and
>> improve it!
>
> It occurs to me that this specific idea (the box with the list of  
> supported applications / libraries) should be implementable as a  
> simple query against PyPI.  I don't know if it actually is :), but  
> it should be.  In general it would be nice to know whether one's  
> favorite tools were available for *any* new Python version.

I agree with this.   Plus it might act as an incentive for people to  
port libraries faster...
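
A sketch of the kind of filter such a query would boil down to,
assuming packages declare a "Programming Language :: Python :: 3"
classifier.  The package names below are invented sample data, and no
real PyPI interface is used or implied.

```python
PY3_CLASSIFIER = 'Programming Language :: Python :: 3'

def ported(packages):
    """Return sorted names of packages whose classifiers mention Python 3."""
    return sorted(
        name for name, classifiers in packages.items()
        if any(c.startswith(PY3_CLASSIFIER) for c in classifiers)
    )

# Hypothetical metadata, for illustration only.
sample = {
    'example-a': ['Programming Language :: Python :: 2'],
    'example-b': ['Programming Language :: Python :: 2',
                  'Programming Language :: Python :: 3'],
}
print(ported(sample))   # ['example-b']
```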

Ted

From guido at python.org  Fri Dec  5 19:10:03 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Dec 2008 10:10:03 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
Message-ID: <ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>

On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:
> With all due respect, for me, "library support" and "serious use" are
> synonymous.

Glyph, I cannot have a discussion with you if every single post of
yours is longer than my combined daily output. Please spend some time
writing shorter posts. I'm sure I'm not the only one here with a short
attention span. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fdrake at acm.org  Fri Dec  5 19:16:35 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 05 Dec 2008 13:16:35 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <18745.18381.364105.121084@montanaro-dyndns-org.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
	<4938D7F9.80908@v.loewis.de>
	<18745.18381.364105.121084@montanaro-dyndns-org.local>
Message-ID: <5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org>

On Dec 5, 2008, at 10:25 AM, skip at pobox.com wrote:
> Good.  Now we just need to populate them.  I take it the classifiers  
> without
> minor numbers imply any known minor version (e.g., 2 ==> 2.3 and  
> greater)?


This is an excellent question, Skip.

There was already "Programming Language :: Python", provided by many  
packages.  I think version compatibility relationships meant by each  
of these classifiers should be made explicit, wherever it is that  
documentation for classifiers is provided.

I don't recall having seen any such documentation; hopefully I just  
need to be hit by another clue.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From g.brandl at gmx.net  Fri Dec  5 19:24:27 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 05 Dec 2008 19:24:27 +0100
Subject: [Python-Dev] __import__ docs follow-up
Message-ID: <ghbrls$df5$1@ger.gmane.org>

Hi,

as a follow-up to the thread a few days ago, and the bug report, I've
rewritten most of the __import__ docs.  I've attached the suggested patch
to the issue <http://bugs.python.org/issue4457>.

I'd be glad for reviews. Also, I'd like to ask for opinions on whether this
"winning idiom" (as a bug comment states) should be in it, instead of
the getattr() helper function:

>>> import sys
>>> __import__('x.y.z')
>>> mod = sys.modules['x.y.z']
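
Wrapped up as a function, the idiom might look like this (import_dotted
is just a name picked for the example; xml.dom.minidom is used because
it ships with the standard library):

```python
import sys

def import_dotted(name):
    """Import `name` and return the module it names, not the top package."""
    __import__(name)            # returns the top-level package; ignored here
    return sys.modules[name]

mod = import_dotted('xml.dom.minidom')
print(mod.__name__)             # xml.dom.minidom
```

The point of the idiom is that __import__('x.y.z') on its own hands
back x, so the lookup in sys.modules replaces the getattr() walk.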

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Fri Dec  5 19:36:24 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 05 Dec 2008 19:36:24 +0100
Subject: [Python-Dev] ANN: new python-porting mailing list
Message-ID: <ghbscc$h9t$1@ger.gmane.org>

Hi all,

to facilitate discussion about porting Python code between different versions
(mainly of course from 2.x to 3.x), we've created a new mailing list

   python-porting at python.org

It is a public mailing list open to everyone.  We expect active participation
of many people porting their libraries/programs, and hope that the list can
be a help to all wanting to go this (not always smooth :-) way.

@python-dev: it would of course be nice to have more than a few developers
on that list ;-)

regards,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From dickinsm at gmail.com  Fri Dec  5 20:20:56 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Fri, 5 Dec 2008 19:20:56 +0000
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes <lists at cheimes.de> wrote:
> Flow diagram
> ------------
>
> trunk ---> release26-maint
>       \->      py3k       ---> release30-maint
>

I'm running into problems making this work, with a trivial change:
I committed r67590 (which adds a single assert to ast.c) to the
trunk, then merged to 2.6 and py3k in r67592 and r67595 respectively.
Then I tried:

../svnmerge.py merge -r67595

from the root directory of a clean copy of the release30-maint
branch (svn status gives no output), and got conflicts on '.':

property 'svnmerge-integrated' set on '.'

property 'svnmerge-blocked' set on '.'

--- Merging r67595 into '.':
U    Python/ast.c
 C   .

property 'svnmerge-integrated' set on '.'

property 'svnmerge-blocked' deleted from '.'.

I now have a new file dir_conflicts.prej that looks something like:

Trying to change property 'svnmerge-integrated' from
'/python/trunk:1-61437,...,67528,67590', but property has been locally
changed from
'/python/branches/py3k:1-67498,67522-67524,67539,67541,67559,67588' to
'/python/trunk:1-61437,...,67467,67484,67528'.

(where the ... abbreviates a big long list of revision numbers).

Did I mess up somewhere, or does svnmerge not work on
a revision that was itself the result of an svnmerge?

Mark

From brett at python.org  Fri Dec  5 20:21:28 2008
From: brett at python.org (Brett Cannon)
Date: Fri, 5 Dec 2008 11:21:28 -0800
Subject: [Python-Dev] ANN: new python-porting mailing list
In-Reply-To: <ghbscc$h9t$1@ger.gmane.org>
References: <ghbscc$h9t$1@ger.gmane.org>
Message-ID: <bbaeab100812051121j179befa1i59827ecff2d591d7@mail.gmail.com>

On Fri, Dec 5, 2008 at 10:36, Georg Brandl <g.brandl at gmx.net> wrote:
> Hi all,
>
> to facilitate discussion about porting Python code between different versions
> (mainly of course from 2.x to 3.x), we've created a new mailing list
>
>   python-porting at python.org
>
> It is a public mailing list open to everyone.  We expect active participation
> of many people porting their libraries/programs, and hope that the list can
> be a help to all wanting to go this (not always smooth :-) way.
>

The mailing list URL is
http://mail.python.org/mailman/listinfo/python-porting for those who
don't want to search on the mail.python.org home page (which looks
really dated at this point).

-Brett

From brett at python.org  Fri Dec  5 20:36:19 2008
From: brett at python.org (Brett Cannon)
Date: Fri, 5 Dec 2008 11:36:19 -0800
Subject: [Python-Dev] Merging flow
In-Reply-To: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
Message-ID: <bbaeab100812051136m4e5b156dufbc5b75a355bccb6@mail.gmail.com>

On Fri, Dec 5, 2008 at 11:20, Mark Dickinson <dickinsm at gmail.com> wrote:
> On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes <lists at cheimes.de> wrote:
>> Flow diagram
>> ------------
>>
>> trunk ---> release26-maint
>>       \->      py3k       ---> release30-maint
>>
>
> I'm running into problems making this work, with a trivial change:
> I committed r67590 (which adds a single assert to ast.c) to the
> trunk, then merged to 2.6 and py3k in r67592 and r67595 respectively.
> Then I tried:
>
> ../svnmerge.py merge -r67595
>
> from the root directory of a clean copy of the release30-maint
> branch (svn status gives no output), and got conflicts on '.':
>
> property 'svnmerge-integrated' set on '.'
>
> property 'svnmerge-blocked' set on '.'
>
> --- Merging r67595 into '.':
> U    Python/ast.c
>  C   .
>
> property 'svnmerge-integrated' set on '.'
>
> property 'svnmerge-blocked' deleted from '.'.
>
> I now have a new file dir_conflicts.prej that looks something like:
>
> Trying to change property 'svnmerge-integrated' from
> '/python/trunk:1-61437,...,67528,67590', but property has been locally
> changed from
> '/python/branches/py3k:1-67498,67522-67524,67539,67541,67559,67588' to
> '/python/trunk:1-61437,...,67467,67484,67528'.
>
> (where the ... abbreviates a big long list of revision numbers).
>
> Did I mess up somewhere, or does svnmerge not work on
> a revision that was itself the result of an svnmerge?

Someone might know better than me, but I am willing to bet you can't
svnmerge a svnmerge revision, since the svnmerge revision contains
changes to the metadata on '.' that will conflict with the new svnmerge
values that the svnmerge you are trying to do produces. But if I am
right about this, then won't that require blocking on release30-maint
the svnmerge revision from py3k?

Ugh. Is this getting to the point that we can only svnmerge between
trunk and py3k and the maintenance branches just have to be managed
the old-fashioned way?

And I have pinged the people helping me with the DVCS PEP in hopes of
getting us moved off of svn sooner rather than later.

-Brett

From gregor.lingl at aon.at  Fri Dec  5 20:36:26 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Fri, 05 Dec 2008 20:36:26 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
Message-ID: <493982BA.6090604@aon.at>



Guido van Rossum schrieb:
> I hear some folks are considering advertising 3.0 as experimental or
> not ready for serious use yet.
>
> I think that's too negative -- we should encourage people to use it,
> period. They'll have to decide for themselves whether they can live
> with the lack of ported 3rd party libraries -- which may resolve
> itself soon enough. 
I'd find it useful to have a special regularly updated index of 
libraries already ported to 3.0 somewhere on python.org

Gregor

From fdrake at acm.org  Fri Dec  5 20:38:45 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 05 Dec 2008 14:38:45 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
Message-ID: <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org>

On Dec 5, 2008, at 2:20 PM, Mark Dickinson wrote:
> Did I mess up somewhere, or does svnmerge not work on
> a revision that was itself the result of an svnmerge?

I ran into this yesterday as well with my patch to the cgi module.   
The work-around was to revert the change to that property and edit it  
manually.

I think this is a significant issue, since editing that property is  
about as error-prone as it can be.  I've not really looked at the code  
in svnmerge.py, so I'm not sure how hard it would be to fix.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From gregor.lingl at aon.at  Fri Dec  5 20:44:09 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Fri, 05 Dec 2008 20:44:09 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
Message-ID: <49398489.3060907@aon.at>



glyph at divmod.com schrieb:
>
> To be fair, if someone asked me specifically about educating non- 
> programmer adults about programming, I would probably at least 
> *mention* py3, if not recommend it outright.  The improved consistency 
> is worth a lot in an educational setting.  (But, if one is educating 
> children and interested in soliciting their genuine enthusiasm, 
> whiz-bang graphics are really a must-have, not a negotiable extra.)
As a non-native English speaker, I'm not sure I understand correctly
what you mean by whiz-bang graphics. Nevertheless, I'd like to point
you to the new turtle graphics module (which has been part of the
standard library since 2.6). It was designed especially for use in the
educational domain. Moreover, the source distribution also contains
about ten example scripts.

Regards,
Gregor


From skip at pobox.com  Fri Dec  5 20:53:42 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 5 Dec 2008 13:53:42 -0600
Subject: [Python-Dev] ANN: new python-porting mailing list
In-Reply-To: <ghbscc$h9t$1@ger.gmane.org>
References: <ghbscc$h9t$1@ger.gmane.org>
Message-ID: <18745.34502.329661.301314@montanaro-dyndns-org.local>


    Georg>    python-porting at python.org

    Georg> It is a public mailing list open to everyone.  We expect active
    Georg> participation of many people porting their libraries/programs,
    Georg> and hope that the list can be a help to all wanting to go this
    Georg> (not always smooth :-) way.

I trust you will announce this in python-list and python-announce-list if
you haven't already?

Skip

From g.brandl at gmx.net  Fri Dec  5 20:57:43 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 05 Dec 2008 20:57:43 +0100
Subject: [Python-Dev] ANN: new python-porting mailing list
In-Reply-To: <18745.34502.329661.301314@montanaro-dyndns-org.local>
References: <ghbscc$h9t$1@ger.gmane.org>
	<18745.34502.329661.301314@montanaro-dyndns-org.local>
Message-ID: <ghc14o$219$1@ger.gmane.org>

skip at pobox.com schrieb:
>     Georg>    python-porting at python.org
> 
>     Georg> It is a public mailing list open to everyone.  We expect active
>     Georg> participation of many people porting their libraries/programs,
>     Georg> and hope that the list can be a help to all wanting to go this
>     Georg> (not always smooth :-) way.
> 
> I trust you will announce this in python-list and python-announce-list if
> you haven't already?

I've sent it to python-announce, it's in the moderator queue.  I'm not on
python-list so I can't answer followups.  If you'd like to do an
announcement there, I'd be happy :)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From mike.klaas at gmail.com  Fri Dec  5 21:01:35 2008
From: mike.klaas at gmail.com (Mike Klaas)
Date: Fri, 5 Dec 2008 12:01:35 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081205164053.GA10632@amk-desktop.matrixgroup.net>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com>
	<20081205164053.GA10632@amk-desktop.matrixgroup.net>
Message-ID: <B3453012-6C10-4ADE-B0E7-ED4C364ACD16@gmail.com>


On 5-Dec-08, at 8:40 AM, A.M. Kuchling wrote:

> On Fri, Dec 05, 2008 at 05:40:46AM -0000, glyph at divmod.com wrote:
>> For most users, especially new users who have yet to be impressed  
>> with
>> Python's power, 2.x is much better.  It's not like "library  
>> support" is
>> one small check-box on the language's feature sheet: most of the
>> attractive things about Python are libraries.  Of course I am not  
>> free
>
> Here I agree, sort of.  Newbies may not understand what they're giving
> up in terms of libraries.  (The 'sort of' is because, having learned
> 3.0, learning the changes for 2.6 is certainly much easier than
> learning a first programming language is.)

For possible insight, here is a current discussion on the topic:

http://www.reddit.com/r/programming/comments/7hlra/ask_progit_ive_got_the_itch_to_learn_python_since/

(note that these would be programmers interested in learning python,  
not people trying to learn programming)

-Mike

From a.badger at gmail.com  Fri Dec  5 21:05:20 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 12:05:20 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
References: <4938374B.8000006@gmail.com>	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
Message-ID: <49398980.7050209@gmail.com>

Guido van Rossum wrote:
> On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt <eckhardt at satorlaser.com> wrote:
>> In 99% of all cases, using the default encoding will work and do what people
>> expect, which is why I would make this conversion automatic. In all other
>> cases, it will at least not fail silently (which would lead to garbage and
>> data loss) and allow more sophisticated applications to handle it.
> 
> I think the "always fail noisily" approach isn't the best approach.
> E.g. if I am globbing for *.py, and there's an undecodable .txt file
> in a directory, its presence shouldn't cause the glob to fail.
> 
But why should it make glob() fail?  This sounds like an implementation
detail of glob.  Here's some pseudo-code::

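import fnmatch
import os
import sys

def glob(pattern):
    # Work in bytes (POSIX); decode only the names that actually match.
    want_str = isinstance(pattern, str)
    if want_str:
        pattern = pattern.encode(sys.getfilesystemencoding())
    dirname, basename = os.path.split(pattern)
    rawfiles = os.listdir(dirname or b'.')
    matches = [f for f in rawfiles if fnmatch.fnmatchcase(f, basename)]
    if want_str:
        # Strict decode: raises UnicodeDecodeError for an undecodable match.
        return [f.decode(sys.getfilesystemencoding()) for f in matches]
    return matches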

This way the traceback occurs if anything in the result set is
undecodable.  What am I missing?

-Toshio

From guido at python.org  Fri Dec  5 21:11:28 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Dec 2008 12:11:28 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49398980.7050209@gmail.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
Message-ID: <ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>

On Fri, Dec 5, 2008 at 12:05 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> Guido van Rossum wrote:
>> On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt <eckhardt at satorlaser.com> wrote:
>>> In 99% of all cases, using the default encoding will work and do what people
>>> expect, which is why I would make this conversion automatic. In all other
>>> cases, it will at least not fail silently (which would lead to garbage and
>>> data loss) and allow more sophisticated applications to handle it.
>>
>> I think the "always fail noisily" approach isn't the best approach.
>> E.g. if I am globbing for *.py, and there's an undecodable .txt file
>> in a directory, its presence shouldn't cause the glob to fail.
>>
> But why should it make glob() fail?  This sounds like an implementation
> detail of glob.

Glob was just an example. Many use cases for directory traversal
couldn't care less if they see *all* files.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From a.badger at gmail.com  Fri Dec  5 21:40:51 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 12:40:51 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
References: <4938374B.8000006@gmail.com>	
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>	
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	
	<200812051127.35880.eckhardt@satorlaser.com>	
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>	
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
Message-ID: <493991D3.9030003@gmail.com>

Guido van Rossum wrote:
> Glob was just an example. Many use cases for directory traversal
> couldn't care less if they see *all* files.
> 
Okay.  Makes it harder to prove correct or not if I don't know what the
use case is :-)  I can't think of a single use case off-hand.

Even your example of a ??.txt file making retrieval of *.py files fail
is a little broken.  If there was a ??.py file that was undecodable the
program would most likely want to know that file existed.

-Toshio

From tseaver at palladion.com  Fri Dec  5 21:49:44 2008
From: tseaver at palladion.com (Tres Seaver)
Date: Fri, 05 Dec 2008 15:49:44 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <49398489.3060907@aon.at>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<49398489.3060907@aon.at>
Message-ID: <493993E8.5000807@palladion.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gregor Lingl wrote:
> 
> glyph at divmod.com schrieb:
>> To be fair, if someone asked me specifically about educating non- 
>> programmer adults about programming, I would probably at least 
>> *mention* py3, if not recommend it outright.  The improved consistency 
>> is worth a lot in an educational setting.  (But, if one is educating 
>> children and interested in soliciting their genuine enthusiasm, 
>> whiz-bang graphics are really a must-have, not a negotiable extra.)
> As a non native English speaker I'm not sure if I understand correctly, 
> what you mean with whiz-bang graphics. Nevertheless I'd like to point 
> you to the new turtle graphics module (which is part of the standard 
> library since 2.6). At least it was designed especially for use in the 
> educational domain. Moreover the source-distribution also contains a 
> bunch of some ten example scripts.

I'm pretty sure he means that turtle graphics are not "whiz-bang" (in this
century, at least).  Being able to do pygame-style OpenGL stuff would be
"whiz bang"[1] in my book.


[1] http://www.merriam-webster.com/dictionary/whizbang


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOZPn+gerLs4ltQ4RAnE1AKCl+Z51tACSJLBmAOcp5q534Mx+2ACg1I28
re6gaV7AFEU0WS1yvUIiZS0=
=4Pda
-----END PGP SIGNATURE-----


From a.badger at gmail.com  Fri Dec  5 21:57:35 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 12:57:35 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
References: <4938374B.8000006@gmail.com>
	<49386A2C.60208@v.loewis.de>	<25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
Message-ID: <493995BF.3000705@gmail.com>

Guido van Rossum wrote:
> At the risk of bringing up something that was already rejected, let me
> propose something that follows the path taken in 3.0 for filenames,
> rather than doubling back:
> 
> For os.environ, os.getenv() and os.putenv(), I think a similar
> approach as used for os.listdir() and os.getcwd() makes sense: let
> os.environ skip variables whose name or value is undecodable, and have
> a separate os.environb() which contains bytes; let os.getenv() and
> os.putenv() do the right thing when the arguments passed in are bytes.
> 
I prefer the method used by file.read() where an error is thrown when
accessing undecodable data.  I think in time python programmers will
consider not throwing an exception a wart in python3.  However, this is
enough to allow programmers to do the right thing once an error is
reported by users and the cause has been tracked down so it doesn't
block fixing errors as the current code does.

And it's not like anyone expected python3 to be wart-free just because
the python2 warts were fixed ;-)

> For sys.argv, because it's positional, you can't skip undecodable
> values, so I propose to use error=replace for the decoding; again, we
> can add sys.argvb that contains the raw bytes values. The various
> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
> and the subprocess module) should all accept bytes as well as strings.
> 
This also seems sane with the same comment about throwing errors.

-Toshio

From victor.stinner at haypocalc.com  Fri Dec  5 19:20:59 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 5 Dec 2008 19:20:59 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493966CA.2010801@gmail.com>
References: <4938374B.8000006@gmail.com>
	<200812051118.48096.victor.stinner@haypocalc.com>
	<493966CA.2010801@gmail.com>
Message-ID: <200812051920.59463.victor.stinner@haypocalc.com>

Hi,

> > But they are open questions (already asked in the bug tracker):
>
> I answered these in the bug tracker.  Here are the answers for the
> mailing list:

Oh, sorry. I didn't follow the end of the discussion on the bug tracker.

> >    os.environb['PATH'] = '\xff'
> >    => os.environ['PATH'] = ???
>
>      os.environ['PATH'] => raises KeyError because PATH is not a key in
> the unicode decoded environment.

Ok, good answer :-)

> >    os.environ['PATH'] = chr(0x10000)
> >    => os.environb['PATH'] = ???
>
> raise UnicodeEncodeError when setting the value.

Ok, it's consistent with the current behaviour.

$ LANG=C ./python
Python 3.0rc3+ (py3k:67498M, Dec  4 2008, 17:45:54)
>>> import os
>>> os.environ['x'] = '\xff'
>>> os.environ['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/py3k/Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/home/haypo/prog/py3k/Lib/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)

Oh, that's strange :-p The error is delayed when we read the value.

> > It would be maybe easier if os.environ supports bytes and unicode keys.
> > But we have to keep these assertions:
> >    os.environ[bytes] -> bytes
> >    os.environ[str] -> str
>
> I think the same choices have to be made here.  If LANG=C, we still have
> to decide what to do when os.environ[str] is set to a non-ASCII string.

If the charset is US-ASCII, os.environ will drop non-ASCII values. But most 
variables are ASCII only. Examples with my shell:

$ env
XCURSOR_THEME=kubuntu
LANG=fr_FR.UTF-8
EDITOR=vim
HOME=/home/haypo
...

> Additionally, the subprocess question makes using the key value
> undesirable compared with having a separate os.environb that accesses
> the same underlying data.

The user should be able to choose bytes or unicode. Examples:
 - subprocess.Popen('ls') => use unicode environment (os.environ)
 - subprocess.Popen(b'ls') => use bytes environment (os.environb)

> Here's my problem with it, though.  With these semantics any program
> that works on arbitrary files and runs on *NIX has to check
> os.listdir(b'') and do the conversion manually.

Only programs that have to support a strange environment like yours (mixing 
Shift-JIS and UTF-8) :-) Most programs don't have to support such a charset 
mixture.

We can imagine a higher-level library working on both UNIX and Windows (bytes
or Unicode). But that would come later.

> I think the desired behaviour assuming the existence of a nondecodable
> file is this:

I prefer the current behaviour :-)

> Why do you think that glob.glob('*.py') is special and should not traceback?

It's not special. glob() reuses listdir(), and it was an example to show 
that "it just works".

> I just differ in that I think lack of tracebacks when
> UnicodeDecodeErrors are encountered is a wart in python3 that did not
> exist in python2.

Right.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ncoghlan at gmail.com  Fri Dec  5 23:18:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 08:18:47 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493991D3.9030003@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com>
Message-ID: <4939A8C7.6050209@gmail.com>

Toshio Kuratomi wrote:
> Guido van Rossum wrote:
>> Glob was just an example. Many use cases for directory traversal
>> couldn't care less if they see *all* files.
>>
> Okay.  Makes it harder to prove correct or not if I don't know what the
> use case is :-)  I can't think of a single use case off-hand.
> 
> Even your example of a ??.txt file making retrieval of *.py files fail
> is a little broken.  If there was a ??.py file that was undecodable the
> program would most likely want to know that file existed.

Why? Most programs won't be able to do anything with it. And if the
program *can* do something with it... that's what the bytes version of
the APIs are for.

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Dec  5 23:21:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 08:21:55 +1000
Subject: [Python-Dev] __import__ docs follow-up
In-Reply-To: <ghbrls$df5$1@ger.gmane.org>
References: <ghbrls$df5$1@ger.gmane.org>
Message-ID: <4939A983.2060400@gmail.com>

Georg Brandl wrote:
> Hi,
> 
> as a follow-up to the thread a few days ago, and the bug report, I've
> rewritten most of the __import__ docs.  I've attached the suggested patch
> to the issue <http://bugs.python.org/issue4457>.
> 
> I'd be glad for reviews. Also, I'd like to ask about opinions if this
> "winning idiom" (as a bug comment states) should be in it, instead of
> the getattr() helper function:
> 
>>>> import sys
>>>> __import__('x.y.z')
>>>> mod = sys.modules['x.y.z']

That way is a lot cleaner than other mechanisms I've seen (including the
current mechanism in the docs). Making that the recommended way of doing
a dynamic import seems like a good idea to me.
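
For comparison, here is a minimal sketch of the idiom wrapped in a helper.
The name import_dotted is made up for illustration, and xml.dom.minidom is
used only because it ships with the standard library.
importlib.import_module(), available in later Python releases, does the
same job in one call:

```python
import importlib
import sys

def import_dotted(name):
    """Import a dotted module name and return the leaf module."""
    __import__(name)            # returns the top-level package, so ignore it
    return sys.modules[name]    # the fully qualified module is registered here

mod = import_dotted('xml.dom.minidom')
top = __import__('xml.dom.minidom')   # for comparison: this is the 'xml' package
same = importlib.import_module('xml.dom.minidom')
```

The lookup in sys.modules avoids the chain of getattr() calls otherwise
needed to walk from the top-level package down to the submodule.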

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From a.badger at gmail.com  Fri Dec  5 23:21:50 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 14:21:50 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051920.59463.victor.stinner@haypocalc.com>
References: <4938374B.8000006@gmail.com>	<200812051118.48096.victor.stinner@haypocalc.com>	<493966CA.2010801@gmail.com>
	<200812051920.59463.victor.stinner@haypocalc.com>
Message-ID: <4939A97E.9030609@gmail.com>

Victor Stinner wrote:
>>> It would be maybe easier if os.environ supports bytes and unicode keys.
>>> But we have to keep these assertions:
>>>    os.environ[bytes] -> bytes
>>>    os.environ[str] -> str
>> I think the same choices have to be made here.  If LANG=C, we still have
>> to decide what to do when os.environ[str] is set to a non-ASCII string.
> 
> If the charset is US-ASCII, os.environ will drop non-ASCII values. But most 
> variables are ASCII only. Examples with my shell:
> 
Yes.  But you still have the question of what to do when:
os.environ[str] = chr(0x10000)

So I don't think it makes things simpler than having separate os.environ
and os.environb that update the same data behind the scenes.
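
A toy model of that arrangement, assuming LANG=C (ASCII) and using a
hypothetical TextEnv class layered over a plain dict of bytes.  This is only
a sketch of the semantics discussed in this thread, not how CPython
implements os.environ:

```python
class TextEnv:
    """Decoded view over a bytes mapping; undecodable entries are invisible."""

    def __init__(self, raw, encoding='ascii'):
        self._raw = raw              # shared bytes data (the os.environb role)
        self._encoding = encoding

    def __getitem__(self, key):
        try:
            return self._raw[key.encode(self._encoding)].decode(self._encoding)
        except (KeyError, UnicodeError):
            # Missing or undecodable: either way it is not a key in this view.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Strict encoding: raises UnicodeEncodeError for unencodable text.
        self._raw[key.encode(self._encoding)] = value.encode(self._encoding)

    def __iter__(self):
        for k, v in self._raw.items():
            try:
                name = k.decode(self._encoding)
                v.decode(self._encoding)
            except UnicodeDecodeError:
                continue             # skip undecodable names or values
            yield name

raw = {b'HOME': b'/home/haypo', b'PATH': b'\xff'}
text = TextEnv(raw)

visible = list(text)      # PATH is skipped because its value is undecodable
text['EDITOR'] = 'vim'    # writes through to the shared bytes data
```

With this model, text['PATH'] raises KeyError and
text['PATH'] = chr(0x10000) raises UnicodeEncodeError, matching the answers
given earlier in the thread.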

>> Additionally, the subprocess question makes using the key value
>> undesirable compared with having a separate os.environb that accesses
>> the same underlying data.
> 
> The user should be able to choose bytes or unicode. Examples:

The subprocess question was posed further up the thread as basically --
does the user need to access os.environb in order to override things in
the environment when calling subprocess?  I think the answer to that is
yes since you might want to start with your environment and modify it
slightly when you call programs via subprocess.  If you just try to copy
os.environ and os.environ only iterates through the decodable env vars,
that doesn't work.  If you have an os.environb to copy it becomes possible.
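
A sketch of that copy-and-modify pattern, assuming a Python 3 release recent
enough to provide os.environb and os.supports_bytes_environ (neither existed
when this thread was written).  The helper name child_env and the variable
DEMO_VAR are illustrative only:

```python
import os
import subprocess
import sys

def child_env(overrides):
    """Copy the whole environment and apply a few str overrides."""
    if os.supports_bytes_environ:
        env = dict(os.environb)          # bytes view of every variable
        env.update({os.fsencode(k): os.fsencode(v)
                    for k, v in overrides.items()})
    else:
        env = dict(os.environ)           # platforms without a bytes environment
        env.update(overrides)
    return env

env = child_env({'DEMO_VAR': 'hello'})
out = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["DEMO_VAR"])'],
    env=env, capture_output=True, check=True,
).stdout
```

Starting from the bytes copy means the child inherits every variable of the
parent, whether or not its value decodes in the current locale.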

>  - subprocess.Popen('ls') => use unicode environment (os.environ)
>  - subprocess.Popen(b'ls') => use bytes environment (os.environb)
> 
That's... not expected to me :-(

If I never touch os.environ and invoke subprocess the normal way, I'd
still expect the whole environment to be passed on to the program being
called.  This is how invoking programs manually, shell scripting,
invoking programs from perl, python2, etc work.

Also, it's not really a good fit with the other things that key off of
the initial argument.  os.listdir(b'.') changes the output to bytes.
subprocess.Popen(b'ls') would change what environment gets input into
the call.

>> Here's my problem with it, though.  With these semantics any program
>> that works on arbitrary files and runs on *NIX has to check
>> os.listdir(b'') and do the conversion manually.
> 
> Only programs that have to support a strange environment like yours (mixing 
> Shift-JIS and UTF-8) :-) Most programs don't have to support such a charset 
> mixture.
> 
Any program that is intended to be distributed, accesses arbitrary
files, and works on *nix platforms needs to take this into account.
Just because the environment inside of my organization is sane doesn't
mean that when we release the code to customers, clients, or the free
software community that the places it runs will be as strict about these
things.

Are most programs specific to one organization or are they distributed
to other people?  I can't answer that... everything I work on (except
passwords:-) is distributed -- from sys admin cronjobs to web
applications since I'm lucky that my whole job is devoted to working on
free software.

-Toshio

From ncoghlan at gmail.com  Fri Dec  5 23:31:27 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 08:31:27 +1000
Subject: [Python-Dev] Merging flow
In-Reply-To: <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org>
References: <gh8s08$p9r$1@ger.gmane.org>	<5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
	<9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org>
Message-ID: <4939ABBF.90400@gmail.com>

Fred Drake wrote:
> On Dec 5, 2008, at 2:20 PM, Mark Dickinson wrote:
>> Did I mess up somewhere, or does svnmerge not work on
>> a revision that was itself the result of an svnmerge?
> 
> I ran into this yesterday as well with my patch to the cgi module.  The
> work-around was to revert the change to that property and edit it manually.
> 
> I think this is a significant issue, since editing that property is
> about as error-prone as it can be.  I've not really looked at the code
> in svnmerge.py, so I'm not sure how hard it would be to fix.

I think we're discovering the real reasons why people generally prefer
to use a DVCS when trying to manage multiple branches :P

For now it looks like we might have to maintain 3.0 manually, with
svnmerge only helping out for trunk->2.6 and trunk->py3k...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Dec  5 23:34:25 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 08:34:25 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939A97E.9030609@gmail.com>
References: <4938374B.8000006@gmail.com>	<200812051118.48096.victor.stinner@haypocalc.com>	<493966CA.2010801@gmail.com>	<200812051920.59463.victor.stinner@haypocalc.com>
	<4939A97E.9030609@gmail.com>
Message-ID: <4939AC71.7010702@gmail.com>

Toshio Kuratomi wrote:
> Are most programs specific to one organization or are they distributed
> to other people?

The former. That's pretty well documented in assorted IT literature
('shrink-wrap' and open source commodity software are still relatively
new players on the scene that started to shift the balance the other
way, but now the server side elements of web services are shifting it
back again).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From lists at cheimes.de  Fri Dec  5 23:47:49 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 05 Dec 2008 23:47:49 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <4939ABBF.90400@gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>	<5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
	<9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org>
	<4939ABBF.90400@gmail.com>
Message-ID: <4939AF95.3050506@cheimes.de>

Nick Coghlan wrote:
> I think we're discovering the real reasons why people generally prefer
> to use a DVCS when trying to manage multiple branches :P
> 
> For now it looks like we might have to maintain 3.0 manually, with
> svnmerge only helping out for trunk->2.6 and trunk->py3k...

The problem seems to be trunk -> py3k -> 3.0. I had no issues with py3k 
-> 3.0.

Christian

From a.badger at gmail.com  Fri Dec  5 23:47:54 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 14:47:54 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939AC71.7010702@gmail.com>
References: <4938374B.8000006@gmail.com>	<200812051118.48096.victor.stinner@haypocalc.com>	<493966CA.2010801@gmail.com>	<200812051920.59463.victor.stinner@haypocalc.com>
	<4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com>
Message-ID: <4939AF9A.50809@gmail.com>

Nick Coghlan wrote:
> Toshio Kuratomi wrote:
>> Are most programs specific to one organization or are they distributed
>> to other people?
> 
> The former. That's pretty well documented in assorted IT literature
> ('shrink-wrap' and open source commodity software are still relatively
> new players on the scene that started to shift the balance the other
> way, but now the server side elements of web services are shifting it
> back again).
> 
Cool.  So it's only people writing code to be shared with the larger
community or written for multiple customers that are affected by bugs
like this. :-/

-Toshio

From a.badger at gmail.com  Fri Dec  5 23:48:38 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 05 Dec 2008 14:48:38 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939A8C7.6050209@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
Message-ID: <4939AFC6.7000106@gmail.com>

Nick Coghlan wrote:
> Toshio Kuratomi wrote:
>> Guido van Rossum wrote:
>>> Glob was just an example. Many use cases for directory traversal
>>> couldn't care less if they see *all* files.
>>>
>> Okay.  Makes it harder to prove correct or not if I don't know what the
>> use case is :-)  I can't think of a single use case off-hand.
>>
>> Even your example of a ??.txt file making retrieval of *.py files fail
>> is a little broken.  If there was a ??.py file that was undecodable the
>> program would most likely want to know that file existed.
> 
> Why? Most programs won't be able to do anything with it. And if the
> program *can* do something with it... that's what the bytes version of
> the APIs are for.
> 
Nonsense.  A program can do tons of things with a non-decodable
filename.  Where it's limited is non-decodable filedata.

For instance, if you have a graphical text editor, you need to let the
user select files to load.  To do that you need to list all the files in
a directory, even the ones that aren't decodable.  For the ones that
aren't decodable, it needs to substitute something like:
  str(filename, errors='replace') + '(Filename not encoded in UTF8)'
in the file listing that the user sees.  When the file is loaded, it
needs to access the actual raw filename.  The file can then be loaded
and operated upon and even saved back to disk using the raw, undecodable
filename.

If you have a file manager, you need to code something that lets the
user move the file around.  Once again, the program loads the raw
filenames.  It transforms the name into something representable to the
user.  It displays that.  The user selects it and asks that it be moved
to another location.  Then the program uses the raw filename to move
from one location to another.

If you have a backup program, you need to list all the files in a
directory.  Then you need to copy those files to another location.  Once
again you have to retrieve the byte version of any non-decodable filenames.
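
The three cases share one pattern: list and operate on raw bytes names, and
decode only for display.  A minimal sketch, assuming UTF-8 as the display
encoding; the helper names display_name and backup are made up for
illustration:

```python
import os
import shutil

def display_name(raw):
    """Return a printable label for a raw (bytes) filename."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return (str(raw, 'utf-8', errors='replace')
                + ' (Filename not encoded in UTF8)')

def backup(src_dir, dst_dir):
    """Copy every regular file, addressing each one by its raw bytes name."""
    src_dir, dst_dir = os.fsencode(src_dir), os.fsencode(dst_dir)
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for raw in os.listdir(src_dir):          # bytes in, bytes out
        src = os.path.join(src_dir, raw)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dst_dir, raw))
            copied.append(display_name(raw))  # decoded only for reporting
    return copied
```

The decoded label is never used to reach the file again; every filesystem
call goes through the bytes name, so an undecodable file is still copied.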

-Toshio

From fdrake at acm.org  Sat Dec  6 00:09:53 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 05 Dec 2008 18:09:53 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <4939ABBF.90400@gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com>
	<9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org>
	<4939ABBF.90400@gmail.com>
Message-ID: <DFB67303-D93A-49B6-97CE-F1314F595D34@acm.org>

On Dec 5, 2008, at 5:31 PM, Nick Coghlan wrote:
> I think we're discovering the real reasons why people generally prefer
> to use a DVCS when trying to manage multiple branches :P

Really?  I don't.  The issue has nothing to do with someone  
maintaining private change sets, or wanting to do development with  
local commits without having access to commit to the project.

I expect (and someone from work has said they do as well) that  
Subversion 1.5's merge tracking would have handled this situation.

> For now it looks like we might have to maintain 3.0 manually, with
> svnmerge only helping out for trunk->2.6 and trunk->py3k...


I don't know if I'll have time to look at svnmerge this weekend (with  
house guests and all), but I really don't expect it's a difficult  
problem to solve in the tool.  The behavior suggests that this tiered  
set of branch relationships wasn't expected.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From jimjjewett at gmail.com  Sat Dec  6 00:12:05 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 5 Dec 2008 18:12:05 -0500
Subject: [Python-Dev] Merging flow
Message-ID: <fb6fbf560812051512v4e23cecdwbd69b019c30d9f54@mail.gmail.com>

Nick Coghlan wrote:

> For now it looks like we might have to maintain 3.0 manually, with
> svnmerge only helping out for trunk->2.6 and trunk->py3k

Does it make the bookkeeping horrible if you merge from trunk straight
to 3.0, and then block the svnmerged changes from propagating?

-jJ

From martin at v.loewis.de  Sat Dec  6 00:46:22 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 06 Dec 2008 00:46:22 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <18745.18381.364105.121084@montanaro-dyndns-org.local>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204123750.GA890@amk.local>
	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>
	<B2649D21-0D63-4598-B134-987B37549146@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>
	<4938D7F9.80908@v.loewis.de>
	<18745.18381.364105.121084@montanaro-dyndns-org.local>
Message-ID: <4939BD4E.5020004@v.loewis.de>

> Good.  Now we just need to populate them.  I take it the classifiers without
> minor numbers imply any known minor version (e.g., 2 ==> 2.3 and greater)?

Perhaps. As usual, they mean what people use them for.

I intended them to mean 2.x and 3.x, respectively, with no constraint on
x (i.e. including possibly 2.0 and 2.1). In particular, presence of "2"
and absence of "3" is meant to indicate "I know that it won't work on
Python 3".

Regards,
Martin

From rdmurray at bitdance.com  Sat Dec  6 01:04:01 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Fri, 5 Dec 2008 19:04:01 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0812051850350.1160@kimball.webabinitio.net>

On Fri, 5 Dec 2008 at 12:11, Guido van Rossum wrote:
> On Fri, Dec 5, 2008 at 12:05 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
>> Guido van Rossum wrote:
>>> On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt <eckhardt at satorlaser.com> wrote:
>>>> In 99% of all cases, using the default encoding will work and do what people
>>>> expect, which is why I would make this conversion automatic. In all other
>>>> cases, it will at least not fail silently (which would lead to garbage and
>>>> data loss) and allow more sophisticated applications to handle it.
>>>
>>> I think the "always fail noisily" approach isn't the best approach.
>>> E.g. if I am globbing for *.py, and there's an undecodable .txt file
>>> in a directory, its presence shouldn't cause the glob to fail.
>>>
>> But why should it make glob() fail?  This sounds like an implementation
>> detail of glob.
>
> Glob was just an example. Many use cases for directory traversal
> couldn't care less if they see *all* files.

I agree with Toshio.  The only use case I can think of for not seeing
all files is when selecting a subset, and if the thing that does the
selecting only generates a traceback if a file that falls into the
subset is undecodable, then I don't see a problem.  That is, if I'm
selecting a subset of the files in a directory, and one of that subset
is undecodable, I _want_ a traceback, because I'll be wanting _all_
of the files that match my selection criteria.(*)

So I'm curious to hear your use cases where undecodable files are
"don't care".

(*) More specifically, I want the program of a developer who didn't think
about the fact that users might have files with undecodable filenames
in their directory to generate a traceback rather than silently losing
those files.  (This is spoken to both by the principle of least
surprise and the zen rule that errors should never pass silently :)
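
A minimal sketch of the behaviour described above: match on the raw
bytes first, and decode strictly only the names that fall into the
selected subset. The function name `select_names` and the default
encoding are assumptions made for illustration; they are not part of
any API discussed in this thread.

```python
import fnmatch


def select_names(raw_names, pattern, encoding="utf-8"):
    """Return decoded names matching `pattern`; raise only for a match.

    `raw_names` is a list of bytes file names and `pattern` is a bytes
    glob pattern (both hypothetical inputs for this sketch).
    """
    selected = []
    for raw in raw_names:
        # Match on the raw bytes, so names outside the subset are never
        # decoded and therefore can never raise.
        if not fnmatch.fnmatchcase(raw, pattern):
            continue
        # Strict decoding: an undecodable name inside the subset raises
        # UnicodeDecodeError instead of being dropped silently.
        selected.append(raw.decode(encoding))
    return selected
```

With this shape, an undecodable .txt name does not disturb a *.py
selection, while an undecodable .py name produces a traceback.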

--RDM

From ncoghlan at gmail.com  Sat Dec  6 01:48:27 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 10:48:27 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939AFC6.7000106@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com>
Message-ID: <4939CBDB.30305@gmail.com>

Toshio Kuratomi wrote:
> Nick Coghlan wrote:
>> Toshio Kuratomi wrote:
>>> Guido van Rossum wrote:
>>>> Glob was just an example. Many use cases for directory traversal
>>>> couldn't care less if they see *all* files.
>>>>
>>> Okay.  Makes it harder to prove correct or not if I don't know what the
>>> use case is :-)  I can't think of a single use case off-hand.
>>>
>>> Even your example of a ??.txt file making retrieval of *.py files fail
>>> is a little broken.  If there was a ??.py file that was undecodable the
>>> program would most likely want to know that file existed.
>> Why? Most programs won't be able to do anything with it. And if the
>> program *can* do something with it... that's what the bytes version of
>> the APIs are for.
>>
> Nonsense.  A program can do tons of things with a non-decodable
> filename.  Where it's limited is non-decodable filedata.

You can't display a non-decodable filename to the user, hence the user
will have no idea what they're working on. Non-filesystem related apps
have no business trying to deal with insane filenames.

Linux is moving towards a standard of UTF-8 for filenames, and once we
get to the point where the idea of encoding filenames and environment
variables any other way is seen as crazy, then the Python 3 approach
will work seamlessly.

In the meantime, raw bytes APIs will provide an alternative for those
that disagree with that philosophy.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From thomas at python.org  Sat Dec  6 01:49:08 2008
From: thomas at python.org (Thomas Wouters)
Date: Sat, 6 Dec 2008 01:49:08 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
Message-ID: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>

On Fri, Dec 5, 2008 at 19:10, Guido van Rossum <guido at python.org> wrote:

> On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:
> > With all due respect, for me, "library support" and "serious use" are
> > synonymous.
>
> Glyph, I cannot have a discussion with you if every single post of
> yours is longer than my combined daily output. Please spend some time
> writing shorter posts. I'm sure I'm not the only one here with a short
> attention span. :-)


Allow me to paraphrase glyph (with whom I'm in complete agreement, for what
it's worth): many newbies will be disappointed by Python if they start with
Python 3.0 and discover that most of the cool possibilities they had heard
about are 'being worked on' and not quite ready. I don't doubt that 3.0 will
be easier for the new programmer to learn, but I do not believe the average
"Oh, I heard about Python, let's learn it" person should be pointed to 3.0
right now. They should be encouraged to learn 2.6 -- or even 2.5.

In spite of Python being a programming language, there is a difference
between 'casual user of the language' and 'library developer'; 3.0 is
certainly a must for all actual library developers, and I'm sure most of
them know about 3.0 by now. We're talking about first impressions for people
without that knowledge.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From murman at gmail.com  Sat Dec  6 02:00:45 2008
From: murman at gmail.com (Michael Urman)
Date: Fri, 5 Dec 2008 19:00:45 -0600
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CBDB.30305@gmail.com>
References: <4938374B.8000006@gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
Message-ID: <dcbbbb410812051700v6ea1b834l8dbed8c243409dc4@mail.gmail.com>

On Fri, Dec 5, 2008 at 18:48, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Toshio Kuratomi wrote:
>> Nick Coghlan wrote:
>>> Toshio Kuratomi wrote:
>>>> Guido van Rossum wrote:
>>>>> Glob was just an example. Many use cases for directory traversal
>>>>> couldn't care less if they see *all* files.
>>>>>
>>>> Okay.  Makes it harder to prove correct or not if I don't know what the
>>>> use case is :-)  I can't think of a single use case off-hand.
>>>>
>>>> Even your example of a ??.txt file making retrieval of *.py files fail
>>>> is a little broken.  If there was a ??.py file that was undecodable the
>>>> program would most likely want to know that file existed.
>>> Why? Most programs won't be able to do anything with it. And if the
>>> program *can* do something with it... that's what the bytes version of
>>> the APIs are for.
>>>
>> Nonsense.  A program can do tons of things with a non-decodable
>> filename.  Where it's limited is non-decodable filedata.
>
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.

And what of python's batteries---does a library that takes filenames
or directories from a controlling program and processes the contents
of the file need to care whether the file can be encoded properly? Is
said library filesystem related or not?

Won't it be awful when it's the directory name, and processing the
file works if you change into its directory, but not if you're outside
of it? And if there's an error during processing and the library
reports a full filename using os.path.abspath("file.ext"), but cannot get
the results?

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.
>
> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.

And until that time, it's agony for the library writers who didn't
think they needed to care, but find that their users (other
developers) do.
-- 
Michael Urman

From steve at pearwood.info  Sat Dec  6 02:03:55 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 6 Dec 2008 12:03:55 +1100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939A8C7.6050209@gmail.com>
References: <4938374B.8000006@gmail.com> <493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>
Message-ID: <200812061203.55624.steve@pearwood.info>

On Sat, 6 Dec 2008 09:18:47 am Nick Coghlan wrote:
> Toshio Kuratomi wrote:
> > Guido van Rossum wrote:
> >> Glob was just an example. Many use cases for directory traversal
> >> couldn't care less if they see *all* files.
> >
> > Okay.  Makes it harder to prove correct or not if I don't know what
> > the use case is :-)  I can't think of a single use case off-hand.
> >
> > Even your example of a ??.txt file making retrieval of *.py files
> > fail is a little broken.  If there was a ??.py file that was
> > undecodable the program would most likely want to know that file
> > existed.
>
> Why? Most programs won't be able to do anything with it.

But the program can report a sensible error message, so the user can fix 
the problem.

I'd rather have the Python API report errors than silence them, at least 
by default. I don't suppose it's on the table for functions to grow an 
extra argument that tells them to skip broken file names and 
environment variables? 

What I have in mind is something like:

os.listdir(path, silence_errors=False) -> list_of_strings

By default, if a filename in path is not a valid string, an exception is 
raised, with the guilty file name given in bytes as an attribute of the 
exception. If silence_errors is true, the invalid file names are 
silently skipped.
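
A rough sketch of how that could look, built on top of the bytes API.
Everything named here (`listdir_checked`, `decode_names`,
`UndecodableNameError`, the `raw_name` attribute) is hypothetical and
only illustrates the idea; it is not an existing os function.

```python
import os
import sys


class UndecodableNameError(Exception):
    """Hypothetical exception carrying the offending name as bytes."""

    def __init__(self, raw_name):
        super().__init__("undecodable file name: %r" % (raw_name,))
        self.raw_name = raw_name


def decode_names(raw_names, encoding, silence_errors=False):
    """Decode bytes names strictly; skip or raise on invalid ones."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if silence_errors:
                continue  # caller asked for invalid names to be skipped
            raise UndecodableNameError(raw)
    return names


def listdir_checked(path, silence_errors=False):
    """List `path` via the bytes API, then decode each entry."""
    raw_names = os.listdir(os.fsencode(path))
    return decode_names(raw_names, sys.getfilesystemencoding(),
                        silence_errors)
```

The default is the noisy behaviour; a caller that really does not care
about broken names has to say so explicitly.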



-- 
Steven

From ncoghlan at gmail.com  Sat Dec  6 02:05:24 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 11:05:24 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939AF9A.50809@gmail.com>
References: <4938374B.8000006@gmail.com>	<200812051118.48096.victor.stinner@haypocalc.com>	<493966CA.2010801@gmail.com>	<200812051920.59463.victor.stinner@haypocalc.com>
	<4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com>
	<4939AF9A.50809@gmail.com>
Message-ID: <4939CFD4.1050203@gmail.com>

Toshio Kuratomi wrote:
> Nick Coghlan wrote:
>> Toshio Kuratomi wrote:
>>> Are most programs specific to one organization or are they distributed
>>> to other people?
>> The former. That's pretty well documented in assorted IT literature
>> ('shrink-wrap' and open source commodity software are still relatively
>> new players on the scene that started to shift the balance the other
>> way, but now the server side elements of web services are shifting it
>> back again).
>>
> Cool.  So it's only people writing code to be shared with the larger
> community or written for multiple customers that are affected by bugs
> like this. :-/

True, but it's still a fairly important problem to have a solution to.
Even internally in large organisations there can be some pretty insane
environments as cruft accumulates over the years.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Sat Dec  6 02:19:24 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 06 Dec 2008 02:19:24 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<20081204123750.GA890@amk.local>	<6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1>	<B2649D21-0D63-4598-B134-987B37549146@python.org>	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<C8F99A02-9501-40FA-99F2-76E8435BC69D@acm.org>	<4938D7F9.80908@v.loewis.de>	<18745.18381.364105.121084@montanaro-dyndns-org.local>
	<5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org>
Message-ID: <4939D31C.4010101@v.loewis.de>

> There was already "Programming Language :: Python", provided by many
> packages.  I think version compatibility relationships meant by each of
> these classifiers should be made explicit, wherever it is that
> documentation for classifiers is provided.
> 
> I don't recall having seen any such documentation; hopefully I just need
> to be hit by another clue.

There is no documentation for classifiers whatsoever. I don't think
nuances matter much, anyway.

Regards,
Martin

From martin at v.loewis.de  Sat Dec  6 02:22:29 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 06 Dec 2008 02:22:29 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051043.10938.victor.stinner@haypocalc.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<200812051043.10938.victor.stinner@haypocalc.com>
Message-ID: <4939D3D5.1030403@v.loewis.de>

>> 5) represent all environment variables in Unicode strings,
>>    including the ones that currently fail to decode.
>>    (then do the same to file names, then drop the byte-oriented
>>     file operations again)
> 
> Please, don't do that! Bytes are not characters!

And environment variables, command line arguments, and file names
are not bytes, but characters.
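
One way option 5 could be realised is sketched below. It relies on the
"surrogateescape" error handler, which was added to Python after this
message was written (3.1), so it is shown purely as an illustration and
not as what was being proposed here. The helper names `to_text` and
`to_bytes` are made up for the example.

```python
def to_text(raw, encoding="utf-8"):
    # Bytes that fail to decode become lone surrogates (U+DC80..U+DCFF)
    # instead of raising, so every value has a str representation.
    return raw.decode(encoding, errors="surrogateescape")


def to_bytes(text, encoding="utf-8"):
    # The same handler turns those surrogates back into the original
    # bytes, so the round trip is lossless.
    return text.encode(encoding, errors="surrogateescape")


raw = b"caf\xe9"            # Latin-1 bytes, invalid as UTF-8
text = to_text(raw)         # 'caf\udce9'
assert to_bytes(text) == raw
```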

Regards,
Martin

From foom at fuhm.net  Sat Dec  6 02:37:45 2008
From: foom at fuhm.net (James Y Knight)
Date: Fri, 5 Dec 2008 20:37:45 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CBDB.30305@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
Message-ID: <EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>

On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote:
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.

Sigh, same arguments, all over again.

Again, *both* KDE and Gnome apps display non-decodable filenames to  
the user, and let the user work with the files. They display as good a  
rendition as they can, using a replacement character as appropriate.  
In some earlier versions, KDE did not work at all on poorly-encoded  
files, and users submitted bug reports. People do care, it does  
happen in real life, and it is a bug in your software if you cannot  
deal with the users' files. They just want the software to work. If it  
shows something weird in the window titlebar, that's a bit irritating  
but at least it doesn't get in the way of working.

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.

I seriously doubt that Linux would ever enforce utf-8 filenames/env
vars/command arguments. Oddly encoded strings will always be with us in
some form or another.

Now, perhaps you use crontab? At least on the systems I have, programs  
run by cron don't have any locale environment variables set, and so  
default to the "C" locale. So utf-8 encoded filenames/etc will fail,  
by default, for any python3 program run under cron.
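
To make the failure concrete, here is a small sketch. It assumes that
the "C" locale makes the program decode names as ASCII, which is an
assumption about the setup described above rather than something
measured here; `try_decode` is a helper invented for the example.

```python
# A UTF-8 encoded file name, as the bytes stored on disk.
utf8_name = "caf\u00e9.txt".encode("utf-8")


def try_decode(raw_name, encoding):
    """Return the decoded name, or None if the bytes do not decode."""
    try:
        return raw_name.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_decode(utf8_name, "utf-8"))  # decodes to the accented name
print(try_decode(utf8_name, "ascii"))  # None: ASCII cannot decode it
```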

I'd like to make an analogy: what if Python3 couldn't deal with  
filenames with spaces in them on unix? Most filenames don't have  
spaces in them, so it should be okay, right? And those people who  
really need to deal with space-containing filenames can use this other  
API variant, instead of the recommended and most obvious one. That'd  
be okay, right? No, of course it wouldn't be okay!

James

From guido at python.org  Sat Dec  6 02:47:45 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Dec 2008 17:47:45 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
Message-ID: <ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>

On Fri, Dec 5, 2008 at 4:49 PM, Thomas Wouters <thomas at python.org> wrote:
> On Fri, Dec 5, 2008 at 19:10, Guido van Rossum <guido at python.org> wrote:
>>
>> On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:
>> > With all due respect, for me, "library support" and "serious use" are
>> > synonymous.
>>
>> Glyph, I cannot have a discussion with you if every single post of
>> yours is longer than my combined daily output. Please spend some time
>> writing shorter posts. I'm sure I'm not the only one here with a short
>> attention span. :-)
>
> Allow me to paraphrase glyph (with whom I'm in complete agreement, for what
> it's worth): many newbies will be disappointed by Python if they start with
> Python 3.0 and discover that most of the cool possibilities they had heard
> about are 'being worked on' and not quite ready. I don't doubt that 3.0 will
> be easier for the new programmer to learn, but I do not believe the average
> "Oh, I heard about Python, let's learn it" person should be pointed to 3.0
> right now. They should be encouraged to learn 2.6 -- or even 2.5.

Thanks for the summary! Maybe Glyph should just pipe his email through you. :-)

Without more context it's impossible to make a good recommendation.
Most people probably want to learn Python because they want to access
some system for which Python is required -- whether that's Blender,
Google App Engine, their Nokia cell phone, or something that some of
their colleagues have written (most Googlers learning Python fall in
that category :-). In that case they don't have a choice -- they
should learn the version that is used by the system they want to use.
Obviously that's going to be 2.x in most cases, at least for a while.

But I disagree that "most of the cool possibilities they have heard
about" are necessarily third party libraries. Python's standard
library has lots of stuff to offer.

> In spite of Python being a programming language, there is a difference
> between 'casual user of the language' and 'library developer'; 3.0 is
> certainly a must for all actual library developers, and I'm sure most of
> them know about 3.0 by now. We're talking about first impressions for people
> without that knowledge.

Well, if most library developers already know 3.0 by now, I would hope
they aren't going to sit on their hands, but will solve the issues at
hand! In the meantime, I don't mind if people learn 3.0 first and 2.6
second. It's probably easier that way than the other way around. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From murman at gmail.com  Sat Dec  6 02:55:44 2008
From: murman at gmail.com (Michael Urman)
Date: Fri, 5 Dec 2008 19:55:44 -0600
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939D3D5.1030403@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<200812051043.10938.victor.stinner@haypocalc.com>
	<4939D3D5.1030403@v.loewis.de>
Message-ID: <dcbbbb410812051755i1d7ea378k1ebe87c2444a09c0@mail.gmail.com>

On Fri, Dec 5, 2008 at 19:22, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Please, don't do that! Bytes are not characters!
>
> And environment variables, command line arguments, and file names
> are not bytes, but characters.

On Windows NT, sure. On Unix they're still bytes no matter how much we
want them to be characters.

This difference, and secondarily the way python 3 tries to sweep it
under the rug, seem to be the roots of the problem.

-- 
Michael Urman

From steve at pearwood.info  Sat Dec  6 02:58:27 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 6 Dec 2008 12:58:27 +1100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
Message-ID: <200812061258.27507.steve@pearwood.info>

On Sat, 6 Dec 2008 12:47:45 pm Guido van Rossum wrote:
> But I disagree that "most of the cool possibilities they have heard
> about" are necessarily third party libraries. Python's standard
> library has lots of stuff to offer.

+1 on that. I've been using Python for a decade now, and the first
third-party library I downloaded and used was Pyparsing, a month or two
ago. I'll be the first to admit that my programs tend to be on the
small side, but they're useful to me. The lack of third-party libraries
for Python 3 is not necessarily a show-stopper.


-- 
Steven

From martin at v.loewis.de  Sat Dec  6 03:02:49 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 06 Dec 2008 03:02:49 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <dcbbbb410812051755i1d7ea378k1ebe87c2444a09c0@mail.gmail.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>	
	<200812051043.10938.victor.stinner@haypocalc.com>	
	<4939D3D5.1030403@v.loewis.de>
	<dcbbbb410812051755i1d7ea378k1ebe87c2444a09c0@mail.gmail.com>
Message-ID: <4939DD49.7030600@v.loewis.de>

>> And environment variables, command line arguments, and file names
>> are not bytes, but characters.
> 
> On Windows NT, sure. On Unix they're still bytes no matter how much we
> want them to be characters.

Only in the API of the OS itself. Treating them as bytes in the
application is a mistake. The bytes are intended to represent
characters, so Python should treat them as what they are.

Regards,
Martin

From steve at pearwood.info  Sat Dec  6 03:06:40 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 6 Dec 2008 13:06:40 +1100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CBDB.30305@gmail.com>
References: <4938374B.8000006@gmail.com> <4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>
Message-ID: <200812061306.40613.steve@pearwood.info>

On Sat, 6 Dec 2008 11:48:27 am Nick Coghlan wrote:
> Toshio Kuratomi wrote:
> > Nick Coghlan wrote:
...
> >> Why? Most programs won't be able to do anything with it. And if
> >> the program *can* do something with it... that's what the bytes
> >> version of the APIs are for.
> >
> > Nonsense.  A program can do tons of things with a non-decodable
> > filename.  Where it's limited is non-decodable filedata.
>
> You can't display a non-decodable filename to the user, hence the
> user will have no idea what they're working on. Non-filesystem
> related apps have no business trying to deal with insane filenames.

I don't agree. Putting my user's hat on, I know what I would expect: the 
app should display *some* name, it doesn't matter exactly what, so long 
as:

* it's as close as possible to the "real" name; 

* it is unique in that directory (doesn't shadow another file); and

* it's enough to identify the file so I can read/save/delete/rename the 
file.

I think there are analogous situations: long-time Windows users will be 
used to seeing files listed as "longfilename.txt" in some applications 
and "longfi~1.txt" in another. Under POSIX, file names can contain 
unprintable control characters, and the shell will print them at least 
three ways, depending on context. E.g. for a file name containing a 
formfeed, I get one of ? \f or ^L in bash.

Applications can deal with such weird file names. KDE's file manager 
(konqueror) and file selection dialog both show the character as a 
small square, presumably the font's missing character glyph, and KDE 
apps can open and save the file. Still speaking as a user, I think it 
is quite reasonable to expect applications to deal with undisplayable 
filenames: displaying the name and opening the file are orthogonal 
concepts, although I accept that command-line interfaces will have 
difficulty with file names that can't be typed by the user!

I appreciate that broken unicode is more difficult to deal with than 
unprintable control characters, but the basic principle is the same.
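
The three requirements listed above can be sketched as follows. This is
only one possible approach, with assumptions spelled out: the function
name `display_names` is made up, UTF-8 is assumed as the expected
encoding, and the " (2)" suffix is an arbitrary way to keep labels
unique. The raw bytes stay available as the dictionary keys, so the file
can still be opened, renamed or deleted.

```python
def display_names(raw_names, encoding="utf-8"):
    """Map each raw bytes name to a unique, displayable string."""
    shown = {}
    used = set()
    for raw in raw_names:
        # Stay close to the real name: only undecodable bytes are
        # replaced, each by U+FFFD (the replacement character).
        base = raw.decode(encoding, errors="replace")
        label = base
        counter = 1
        # Keep the label unique within the directory so that it never
        # shadows another file.
        while label in used:
            counter += 1
            label = "%s (%d)" % (base, counter)
        used.add(label)
        shown[raw] = label
    return shown
```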


-- 
Steven

From janssen at parc.com  Sat Dec  6 04:22:18 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 5 Dec 2008 19:22:18 PST
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
Message-ID: <27924.1228533738@parc.com>

Thomas Wouters <thomas at python.org> wrote:

> Allow me to paraphrase glyph (with whom I'm in complete agreement, for what
> it's worth): many newbies will be disappointed by Python if they start with
> Python 3.0 and discover that most of the cool possibilities they had heard
> about are 'being worked on' and not quite ready. I don't doubt that 3.0 will
> be easier for the new programmer to learn, but I do not believe the average
> "Oh, I heard about Python, let's learn it" person should be pointed to 3.0
> right now. They should be encouraged to learn 2.6 -- or even 2.5.

I think that's right.

I was asked this question today, and it comes up (to me) fairly often at
PARC.  I usually suggest using the Python version that's standard for
the user's platform, if they use OS X or Linux (and most do), which is
typically 2.5 (for OS X Leopard), and 2.4 (for Linux -- may be out of date).
For Windows users, I suggest the latest release (2.6).

Bill

From tseaver at palladion.com  Sat Dec  6 05:57:01 2008
From: tseaver at palladion.com (Tres Seaver)
Date: Fri, 05 Dec 2008 23:57:01 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
Message-ID: <493A061D.1060406@palladion.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ulrich Eckhardt wrote:
> On Friday 05 December 2008, Guido van Rossum wrote:
>> At the risk of bringing up something that was already rejected, let me
>> propose something that follows the path taken in 3.0 for filenames,
>> rather than doubling back:
>>
>> For os.environ, os.getenv() and os.putenv(), I think a similar
>> approach as used for os.listdir() and os.getcwd() makes sense: let
>> os.environ skip variables whose name or value is undecodable, and have
>> a separate os.environb() which contains bytes; let os.getenv() and
>> os.putenv() do the right thing when the arguments passed in are bytes.
>>
>> For sys.argv, because it's positional, you can't skip undecodable
>> values, so I propose to use error=replace for the decoding; again, we
>> can add sys.argvb that contains the raw bytes values. The various
>> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
>> and the subprocess module) should all accept bytes as well as strings.
>>
>> On Windows, the bytes APIs should probably not exist.
>>
>> I predict that most developers can get away with not using the bytes
>> APIs at all. The small minority that needs to be robust if not all
>> filenames use the system encoding can use the bytes APIs.
> 
> I know some of those developers, you can contact them via 
> python-dev at python.org. Seriously, what would you suggest to someone that 
> wants to handle paths in a portable way? Using the Unicode variants of 
> functions is fubar, because encoding/decoding is not universally possible. 
> Using the byte variant is equally fubar, because e.g. on MS Windows it is not 
> supported, except through a very lossy roundtrip through the locale's 
> codepage, limiting your functionality.
> 
> I actually think it is about time to give up on trying to think about a path 
> as a string. Ditto for data received from os.environ or sys.argv. There are 
> only very few things that are universal to them and a reliable encoding is 
> none of them. Then, once you have let that idea go, meditate a bit over the 
> Zen.
> 
> What I propose is that paths must be treated as OS-specific, with the only 
> common reliable operations being joining them, concatenating them and 
> splitting them into segments divided by the (again, OS-specific) separator. 
> Other operations, like e.g. appending a string or converting it to a string 
> in order to display it can fail. And if they fail, they should fail noisily. 
> In 99% of all cases, using the default encoding will work and do what people 
> expect, which is why I would make this conversion automatic. In all other 
> cases, it will at least not fail silently (which would lead to garbage and 
> data loss) and allow more sophisticated applications to handle it.

Amen!  The idea that paths, environment variables, and stuff pulled off
of sockets can be treated as text rather than strings is just wishful
thinking.
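
For illustration, the proposal quoted above (opaque, OS-specific paths
whose only reliable operations are joining and splitting, with
conversion to text failing noisily) might look roughly like this. The
class `OSPath`, its method names and the UTF-8 default are all
assumptions made for this sketch; nothing like it is being claimed to
exist in the standard library.

```python
import os


class OSPath:
    """Hypothetical opaque path: wraps OS-native bytes, not text."""

    def __init__(self, raw):
        if not isinstance(raw, bytes):
            raise TypeError("OSPath wraps bytes, got %s"
                            % type(raw).__name__)
        self.raw = raw

    def join(self, other):
        # Joining never needs to know the encoding.
        return OSPath(os.path.join(self.raw, other.raw))

    def split(self):
        # Split into segments on the OS-specific separator.
        sep = os.sep.encode("ascii")
        return [OSPath(part) for part in self.raw.split(sep) if part]

    def to_text(self, encoding="utf-8"):
        # Strict decoding: raises UnicodeDecodeError rather than
        # producing garbage or silently dropping the path.
        return self.raw.decode(encoding)
```

Joining and splitting work for any byte sequence; only `to_text()` can
fail, and when it does, it fails with an exception.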


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOgYd+gerLs4ltQ4RArQFAKDUZLXjwsIvNfNji4hbqM/aOZ0lMQCfRBq/
DHdYt2GGA1CrYA4a5pj+AZ4=
=4CcT
-----END PGP SIGNATURE-----


From rdmurray at bitdance.com  Sat Dec  6 06:15:44 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Sat, 6 Dec 2008 00:15:44 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812061306.40613.steve@pearwood.info>
References: <4938374B.8000006@gmail.com> <4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com> <200812061306.40613.steve@pearwood.info>
Message-ID: <Pine.LNX.4.64.0812060004230.1160@kimball.webabinitio.net>

On Sat, 6 Dec 2008 at 13:06, Steven D'Aprano wrote:
> Applications can deal with such weird file names. KDE's file manager
> (konqueror) and file selection dialog both show the character as a
> small square, presumably the font's missing character glyph, and KDE
> apps can open and save the file. Still speaking as a user, I think it
> is quite reasonable to expect applications to deal with undisplayable
> filenames: displaying the name and opening the file are orthogonal

Agreed.  I would file a bug report if an application couldn't
handle a file that validly exists in my file system, no matter
how broken the filename might appear to be.

> concepts, although I accept that command-line interfaces will have
> difficulty with file names that can't be typed by the user!

Difficult, but not impossible: tab completion in the shell can allow
the user to submit otherwise difficult to type filenames to a program.
Which means python should be able to handle such things in argument
strings, so that my python utilities can manipulate such files when
specified as command line arguments....and a sensible error should be
generated by default if the program hasn't been written in such a way
that it can handle such input.
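
As an illustration of what "handle such things in argument strings" could
look like: later Python 3 releases added the surrogateescape error handler
(PEP 383), which lets undecodable bytes survive a round trip through str.
The sketch below assumes UTF-8 as the system encoding, and the three helper
names are made up for the example.

```python
def argv_to_text(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates instead of raising
    # or being dropped.
    return raw.decode("utf-8", "surrogateescape")


def text_to_argv(text: str) -> bytes:
    # The inverse: lone surrogates turn back into the original bytes.
    return text.encode("utf-8", "surrogateescape")


def printable(text: str) -> str:
    # A strict encode is where the "sensible error" shows up: it raises
    # UnicodeEncodeError if the text carries smuggled bytes.
    text.encode("utf-8")
    return text
```

A utility can then pass the text form around, recover the exact bytes when
it opens the file, and still get an exception if it tries to print the name
without thinking about it.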

It would be wonderful if all Unix variants would switch to all UTF-8 (I
have done so on my own machines...I think :).  But it is a slow process.

--RDM

From glyph at divmod.com  Sat Dec  6 06:28:44 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 05:28:44 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com>
	<20081204213104.GA24509@amk-desktop.matrixgroup.net>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
Message-ID: <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>


On 5 Dec, 06:10 pm, guido at python.org wrote:
>On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:
>>With all due respect, for me, "library support" and "serious use" are
>>synonymous.
>
>Glyph, I cannot have a discussion with you if every single post of
>yours is longer than my combined daily output. Please spend some time
>writing shorter posts. I'm sure I'm not the only one here with a short
>attention span. :-)

I already spend a lot of time trying to remove extraneous details.  The 
drafts of these messages are usually 3x as long :).  So, trying to keep 
it short:

Thomas paraphrased my point pretty well.  The importance of libraries 
cannot be overemphasized.  Maybe you're right and the stdlib is enough 
for a large audience, but I don't know that audience.  Everyone I know 
who uses Python, uses it because of a library.  In some cases, an 
equivalent library exists for another language, and Python wins because 
it has a nicer syntax.  But, in no case does Python win where it 
*doesn't* have the library.

I think that the marketing for py3 needs to target library vendors 
before targeting novices.  If the novices are targeted first, they are 
going to have a bad experience when "python" libraries don't work with 
py3, and library maintainers are going to have a bad experience when 
clueless newbies harass them to update their software without 
understanding the magnitude of the work to do so.

I've been predicting this for years, but two days into Python 3's 
release, I've already seen real-world examples of this pattern in 
#twisted.  I can tell these people to "downgrade" to py2 when they come 
ask me for help, but I don't think most of them ask for help.  They just 
get angry and learn Java instead.

From stephen at xemacs.org  Sat Dec  6 06:31:51 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 06 Dec 2008 14:31:51 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CFD4.1050203@gmail.com>
References: <4938374B.8000006@gmail.com>
	<200812051118.48096.victor.stinner@haypocalc.com>
	<493966CA.2010801@gmail.com>
	<200812051920.59463.victor.stinner@haypocalc.com>
	<4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com>
	<4939AF9A.50809@gmail.com> <4939CFD4.1050203@gmail.com>
Message-ID: <874p1hq37c.fsf@xemacs.org>

Nick Coghlan writes:

 > True, but it's still a fairly important problem to have a solution to.
 > Even internally in large organisations there can be some pretty insane
 > environments as cruft accumulates over the years.

M&A and globalization make it inevitable.

Toshio will remember the Mizuho April Fool's Day fiasco (a couple of
large banks merged, and when they reopened as a merged entity called
"Mizuho", the ATM system immediately crashed).

Japan being a country that doesn't believe in GAAP, such mergers are a
very difficult problem.  I don't know the details, but I wouldn't even
be surprised if encodings played a role in that mess because Japanese
companies often have their own internal variants of the national
standard JIS encoding.

From stephen at xemacs.org  Sat Dec  6 06:39:39 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 06 Dec 2008 14:39:39 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939D3D5.1030403@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<200812051043.10938.victor.stinner@haypocalc.com>
	<4939D3D5.1030403@v.loewis.de>
Message-ID: <873ah1q2uc.fsf@xemacs.org>

"Martin v. Löwis" writes:
 > >> 5) represent all environment variables in Unicode strings,
 > >>    including the ones that currently fail to decode.
 > >>    (then do the same to file names, then drop the byte-oriented
 > >>     file operations again)
 > > 
 > > Please, don't do that! Bytes are not characters!
 > 
 > And environment variables, command line arguments, and file names
 > are not bytes, but characters.

Unfortunately, both POSIX and OS implementation practice (including,
for example, VFAT file systems: NT-derived OSes are not safe!) say
otherwise, and that makes your line of argument extremely dangerous.

Remember, in a fight between human custom and machine programming, the
machine can always win by crashing.  For that reason, bytes must be
the underlying representation, always available, although I think it's
essential to make a text representation easily accessible, and even
the default.  Humans who would rather kvetch about the machine's
breakage than get a useful answer can (and should---problems will be
rare for most usage patterns) use the text representation.  Humans who
want reliability or debuggability, on the other hand, should have
something that cannot be mistaken for text immediately available.



From stephen at xemacs.org  Sat Dec  6 07:04:22 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 06 Dec 2008 15:04:22 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
Message-ID: <871vwlq1p5.fsf@xemacs.org>

Guido van Rossum writes:

 > This sounds too pessimistic to me. I expect that in five years it will
 > be universally accepted that these variables must be encoded in a
 > standard encoding.

Archival material will not catch up until the plastic rots.  And I bet
it takes ten years before the Japanese accept the same standard
encoding as the rest of the world (the Japanese cellphone system and
iMode still speak Shift JIS).  Five years should be plenty of time,
but big Japanese companies are very sensitive to (and resistant to)
anything that might tend to open their turf to invaders.

 > People are never going to give up thinking about filenames etc. as
 > strings, because that's what they are conceptually.

People can't win this one 100%, they have to choose between
convenience with occasional fatal errors, or reliability.  Python
should not make it hard to achieve either.  The default should be
convenience, of course, but there should be a layer where "decodable
per standard" values and "not decoded" values are different types.
This is why Martin's proposal (or any other proposal to use strings
with invalid values) is nearly unacceptable, really.

What those who want reliability would have to do is to immediately
decode all strings from the system into something like what Toshio
proposes.  This would be a lot more reliable if done by Python rather
than an explicitly imported library, though, and would be available
for debugging of cases where the default "values are text"
representation falls down.

The same "text on the surface, bytes in the background" type could be
used by the email module (which already implements something like
this).

 > The problem is purely one of encoding,

No, it's not.  It's that strings (as understood by people) and system
"text" are different types (even on Mac: VFAT and NFS filesystem
filenames for example), and Python is not type-safe in this sense.

There ought to be a "you think this is text but I'm keeping an
accurate backup just in case" type for this.
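
A rough sketch of such a type, under the assumption that a str subclass is
an acceptable shape for it. The name SystemText and its attributes are
hypothetical, not an existing Python API.

```python
class SystemText(str):
    """Hypothetical str subclass that remembers the bytes it came from."""

    def __new__(cls, raw: bytes, encoding: str = "utf-8"):
        # Text on the surface: undecodable bytes show up as U+FFFD.
        self = super().__new__(cls, raw.decode(encoding, "replace"))
        self.raw = raw              # the accurate backup
        self.encoding = encoding
        return self

    @property
    def lossless(self) -> bool:
        # True only if the bytes decode strictly, i.e. nothing was replaced.
        try:
            self.raw.decode(self.encoding, "strict")
        except UnicodeDecodeError:
            return False
        return True
```

Code that only wants convenience treats the value as an ordinary string;
code that wants reliability checks `lossless` or works from `raw`.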

From glyph at divmod.com  Sat Dec  6 07:03:55 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 06:03:55 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
Message-ID: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>

On 01:47 am, guido at python.org wrote:
>>In spite of Python being a programming language, there is a difference
>>between 'casual user of the language' and 'library developer'; 3.0 is
>>certainly a must for all actual library developers, and I'm sure most of
>>them know about 3.0 by now. We're talking about first impressions for
>>people without that knowledge.
>
>Well if most library developers already know 3.0 by now, I would hope
>they aren't going to sit on their hands, and solve the issues at hand!

The best thing for 3.0 adoption would be a 3.0 "welcoming committee".  A 
group of hackers wandering from one popular open source library to 
another, writing patches for 3.x compatibility issues.  There must be 
lots of people who care about 3.x adoption, and this is probably the 
most effective way they can reach that goal.

Each time I am going to fix a 3.0 compatibility issue, I have a choice: 
I can either make Twisted itself better (add features, fix bugs), or I 
can keep Twisted exactly the same but do lots of work so it will work on 
3.0.  It seems pretty clear to me that, to the extent that I have time 
for Twisted, fixing bugs in the HTTP implementation would be a better 
deal than puzzling through a megabyte of diffs generated by 2to3, trying 
to understand where it went wrong, and how.

This doesn't mean I'm "sitting on my hands".  It just means I have 
better things to be doing with my hands.  (To be precise, 1054 better 
things to do, re: Twisted.  Add in the Divmod projects and it's more 
like 3000.)

Of course the distant threat of an unmaintained 2.x series is enough to 
motivate me to push a *little* in this direction, but it doesn't make me 
happy about it.

I think this is exactly what the marketing effort around 3.0 needs to be 
doing: making a positive case for library and application authors to 
spend time to update to 3.x.  This is a lot of work, and many (I might 
even say most) of us need a lot of cajoling.  Free patches are a good 
incentive :).

From larry.bugbee at boeing.com  Sat Dec  6 07:18:37 2008
From: larry.bugbee at boeing.com (Bugbee, Larry)
Date: Fri, 5 Dec 2008 22:18:37 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <mailman.27161.1228543139.3486.python-dev@python.org>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
Message-ID: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>


There has been some discussion here that users should use the str or
byte function variant based on what is relevant to their system, for
example when getting a list of file names or opening a file.  That
thought process really doesn't do much for those of us that write code
that needs to run on any platform type, without alteration or the
addition of complex if-statements and/or exceptions.

Whatever the resolution here, and those of you addressing this thorny
issue have my admiration, the solution should be such that it gives
consistent behavior regardless of platform type and doesn't require the
programmer to know of all the minute details of each possible target
platform.  

That may not be possible for a while, so interim solutions should be
such that it minimizes later pain.  If that means hiding "implementation
details" behind a new function, so be it.  Then, at least, the body of
one's app is not burdened with this problem later when conditions
change.

I'm glad I'm not the only one with hard problems.  ;-)

Larry


From ncoghlan at gmail.com  Sat Dec  6 09:10:05 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 06 Dec 2008 18:10:05 +1000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <27924.1228533738@parc.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>	<20081204213104.GA24509@amk-desktop.matrixgroup.net>	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<27924.1228533738@parc.com>
Message-ID: <493A335D.7000007@gmail.com>

Bill Janssen wrote:
> Thomas Wouters <thomas at python.org> wrote:
> 
>> Allow me to paraphrase glyph (with whom I'm in complete agreement, for what
>> it's worth): many newbies will be disappointed by Python if they start with
>> Python 3.0 and discover that most of the cool possibilities they had heard
>> about are 'being worked on' and not quite ready. I don't doubt that 3.0 will
>> be easier for the new programmer to learn, but I do not believe the average
>> "Oh, I heard about Python, let's learn it" person should be pointed to 3.0
>> right now. They should be encouraged to learn 2.6 -- or even 2.5.
> 
> I think that's right.
> 
> I was asked this question today, and it comes up (to me) fairly often at
> PARC.  I usually suggest using the Python version that's standard for
> the user's platform, if they use OS X or Linux (and most do), which is
> typically 2.5 (for OS X Leopard), and 2.4 (for Linux -- may be out of date).

For Linux, it depends on the distro. I think Ubuntu has been on 2.5
since 7.04 or so.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From thomas at python.org  Sat Dec  6 11:12:22 2008
From: thomas at python.org (Thomas Wouters)
Date: Sat, 6 Dec 2008 11:12:22 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
Message-ID: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>

On Sat, Dec 6, 2008 at 02:47, Guido van Rossum <guido at python.org> wrote:

> In the mean time, I don't mind if people learn 3.0 first and 2.6
> second. It's probably easier that way than the other way around. :-)


It may be easier in a vacuum -- although I don't think it is. 3.0 is more
logical than 2.x, and I don't think it's easier to learn about the better
way first, and then realize that you have to use some archaic form later. In
fact, we had someone on #python just last week who had learned Python from a
2.6 tutorial, then found out he had to use 2.5, and he was actually tripping
over some 2.6-only features he'd been taught. When he learned he had to go
back and relearn and fix them by hand, his actual words were "if thats the
case, I'm gonna be forced to use another language". I hope that isn't a
typical example of such a case, but I can partly understand the sentiment.

But even if it's true, people don't learn in a vacuum. Almost everybody else
will be thinking of 3.0 in terms of 'changes since 2.x', tools such as 2to3
are oriented that way, and explanations on bits and pieces of Python
available to be googled are by and large about 2.x, not 3.0. Right now, it's
just much easier to go from 2.x to 3.0 than the other way 'round.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From phd at phd.pp.ru  Sat Dec  6 15:34:54 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 6 Dec 2008 17:34:54 +0300
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
Message-ID: <20081206143454.GA15293@phd.pp.ru>

On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote:
> On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote:
>> You can't display a non-decodable filename to the user, hence the user
>> will have no idea what they're working on. Non-filesystem related apps
>> have no business trying to deal with insane filenames.
>
> Sigh, same arguments, all over again.
>
> Again, *both* KDE and Gnome apps display non-decodable filenames to the 
> user, and let the user work with the files. They display as good a  
> rendition as they can, using a replacement character as appropriate. In 
> some earlier versions, KDE did not work at all on poorly-encoded files, 
> and, users submitted bug reports. People do care, it does happen in real 
> life, and it is a bug in your software if you cannot deal with the users' 
> files. They just want the software to work. If it shows something weird 
> in the window titlebar, that's a bit irritating but at least it doesn't 
> get in the way of working.

   I agree 100%. Russian Unix users use at least 5 different encodings
(koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and
iso-8859-5 are less frequent). I have an FTP server with some filenames in
koi8 encoding - these filenames are for unix clients, - and some filenames
in cp1251 for w32 clients. Sometimes I run utf-8 xterm (I am
a commandline/console unixhead) for my needs (read email, write files in
utf-8 with characters beyond koi8-r, which is my primary encoding) - and
I still can work with filenames in koi8/cp1251 encodings. My filemanager
(Midnight Commander, for that matter) shows these files and directories as
"?????.???", but I can chdir to such directories, and I can open such
files. It would be a big bad blow for me if filemanagers (or other
programs) start to filter these filenames.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru  Sat Dec  6 15:37:47 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 6 Dec 2008 17:37:47 +0300
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812061203.55624.steve@pearwood.info>
References: <4938374B.8000006@gmail.com> <493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>
	<200812061203.55624.steve@pearwood.info>
Message-ID: <20081206143747.GB15293@phd.pp.ru>

On Sat, Dec 06, 2008 at 12:03:55PM +1100, Steven D'Aprano wrote:
> I'd rather have the Python API report errors than silence them, at least 
> by default.

   +1 for encoding errors by default.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru  Sat Dec  6 15:43:12 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 6 Dec 2008 17:43:12 +0300
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939D3D5.1030403@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de>
	<200812051043.10938.victor.stinner@haypocalc.com>
	<4939D3D5.1030403@v.loewis.de>
Message-ID: <20081206144312.GC15293@phd.pp.ru>

On Sat, Dec 06, 2008 at 02:22:29AM +0100, "Martin v. Löwis" wrote:
> And environment variables, command line arguments, and file names
> are not bytes, but characters.

   "There is no such thing as plain text!" If you say "these are
characters" you must also name the encoding for them. LANG/LC_ALL/LC_CTYPE
provide a sensible default, but if a program has problems decoding bytes to
characters there must be a way for the user to override the default. But
the user must be notified about the error, so programs must not silently
filter out non-decodable characters.
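
Something along these lines, as a sketch only: the function name
decode_name is invented for the example, and the locale's preferred
encoding stands in for whatever LANG/LC_ALL/LC_CTYPE select.

```python
import locale
from typing import Optional


def decode_name(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode strictly; the caller may override the locale's default."""
    chosen = encoding or locale.getpreferredencoding(False)
    try:
        return raw.decode(chosen, "strict")
    except UnicodeDecodeError as exc:
        # Report, never silently drop: say which encoding failed on what.
        raise ValueError(
            "cannot decode %r as %s; pass a different encoding" % (raw, chosen)
        ) from exc
```

With a koi8-r filename, decode_name(raw, "koi8-r") gives the intended text,
while decode_name(raw, "utf-8") raises and tells the user which encoding
was tried.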

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From skip at pobox.com  Sat Dec  6 16:17:43 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 Dec 2008 09:17:43 -0600
Subject: [Python-Dev] Where/how should I check this in?
Message-ID: <18746.38807.422664.986710@montanaro-dyndns-org.local>

I have a change to check in from this issue:

    http://bugs.python.org/issue4483

It is a build error for _dbmmodule.c which was reported against Python 3.0
involving a change to the layout of symbols in libgdbm.  (There is now a
libgdbm_compat in some systems which holds the dbm_* symbols.)  With one
tweak, I'm certain it needs to be applied to both 2.6 and trunk.  Do I just
check it in on all three branches and run svnmerge block to keep it from
being considered again?

Thanks,

Skip

From a.badger at gmail.com  Sat Dec  6 16:52:38 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Sat, 06 Dec 2008 07:52:38 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CBDB.30305@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
Message-ID: <493A9FC6.8090201@gmail.com>

Nick Coghlan wrote:
> Toshio Kuratomi wrote:
>>>
>> Nonsense.  A program can do tons of things with a non-decodable
>> filename.  Where it's limited is non-decodable filedata.
> 
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.
> 
This is where we disagree.  There are many ways to display the
non-decodable filename to the user because the user is not a machine.
The computer must know the unique sequence of bytes in order to access a
file. The user, OTOH, usually only needs to know that the file exists.
In most GUI-based end-user oriented desktop apps, it's enough to do
str(filename, errors='replace').  For instance, the GNOME file manager
displays:
  "? (Invalid encoding)"
and Konqueror, the KDE file manager just displays:
  "?"

The file can still be displayed this way, accessed via the raw bytes
that the program keeps internally, and operated upon by applications.

For applications in which the user needs more information to
differentiate the files the program has the option to display the raw
byte sequences as if they were the filename.  The *NIX shell and command
line tools have this ability.

$ LANG=en_US.utf8 ls -b
á
í
$ LANG=C ls -b
.
..
\303\241
\303\255
$ mv $'\303\241' $'\303\263'
$ LANG=C ls -b
\303\255
\303\263
$ LANG=en_US.utf8 ls -b
í
ó
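
The same two display strategies in Python, as a sketch. Both function names
are made up, UTF-8 is an assumed default, and escaped_name only
approximates what `ls -b` prints (it octal-escapes spaces and backslashes
too, which ls renders differently).

```python
def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    # What a GUI might show: undecodable bytes become U+FFFD.
    return str(raw, encoding, errors="replace")


def escaped_name(raw: bytes) -> str:
    # Printable ASCII is kept as-is; every other byte becomes a backslash
    # followed by three octal digits.
    return "".join(
        chr(b) if 0x20 < b < 0x7F and b != 0x5C else "\\%03o" % b
        for b in raw
    )


# One valid UTF-8 name and one Latin-1 name that is not valid UTF-8.
for raw in [b"\xc3\xa1", b"\xe1"]:
    # The program keeps `raw` for open()/rename(); only the label differs.
    print(display_name(raw), escaped_name(raw))
```

Either label is only for the human; the file is still reached through the
bytes the program kept.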

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.
> 
<nod>  With the caveat that I haven't seen movement by Linux and other
Unix variants to enforce UTF-8.  What I have seen are statements by
kernel programmers that having the filesystem use bytes and not know
about encoding is the correct thing to do.

This means that utf-8 will be a convention rather than a necessity for a
very long time and consequently programs will need to worry about the
problems of mixed encoding systems for an equally long time.  (Remember,
encoding is something that can be changed per user and per file.  So on
a multiuser OS, mixed encodings can be out of the control of the system
administrator for perfectly valid reasons.)

> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.
> 
Oh I agree with the UTF-8 everywhere philosophy.  I just know that
there's tons of real-world systems out there that don't conform to my
expectations for sanity and my code has to account for those :-)

-Toshio


From guido at python.org  Sat Dec  6 18:00:58 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 09:00:58 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>
Message-ID: <ca471dc20812060900h42acb016w4d79bf4a13fe7fbb@mail.gmail.com>

On Fri, Dec 5, 2008 at 10:18 PM, Bugbee, Larry <larry.bugbee at boeing.com> wrote:
> There has been some discussion here that users should use the str or
> byte function variant based on what is relevant to their system, for
> example when getting a list of file names or opening a file.  That
> thought process really doesn't do much for those of us that write code
> that needs to run on any platform type, without alteration or the
> addition of complex if-statements and/or exceptions.
>
> Whatever the resolution here, and those of you addressing this thorny
> issue have my admiration, the solution should be such that it gives
> consistent behavior regardless of platform type and doesn't require the
> programmer to know of all the minute details of each possible target
> platform.

My prediction is that it won't ever be possible to completely hide
this difference between platforms. The platforms differ fundamentally
in how they see filenames. An elaborate abstraction can certainly be
created that smooths out most of the differences, but at some point
useful functionality will have to be lost in order to maintain strict
platform independence. This is the fate of most platform-independence
abstractions by the way. For example, there are many elaborate
packages for platform-independent I/O, but they generally don't
provide access to all functionality that is available on a platform.
Where they do, the application is once again placed in the position of
having to use complex if-statements and/or exceptions.

Consider just this example. Many programs have a need to ask their
user for a filename to be created by the program. On systems where
filenames are raw byte strings, do you want to provide the user with a
way to specify an arbitrary byte string? (That is, in addition to the
normal case of entering a text string that will be transformed into a
filename using some encoding.) Your choices are either not to support
the case of bytes that aren't a valid encoding in the current
encoding, or add a UI element to select an encoding, or add a UI
element to enter raw bytes. An abstraction package is likely to only
support the first option (this is what Java does BTW), but this is not
acceptable to all applications.

> That may not be possible for a while, so interim solutions should be
> such that it minimizes later pain.  If that means hiding "implementation
> details" behind a new function, so be it.  Then, at least, the body of
> one's app is not burdened with this problem later when conditions
> change.

I believe the problem's severity is actually overstated. The interim
solution with the least amount of pain that will work for almost all
apps is to treat filenames as text strings encoded in some default
encoding, and ignore filenames that aren't valid encodings of any text
string. Yes, it is possible that you'll find that you can't completely
remove or traverse certain directory trees. But that's a fact of life
anyway (filesystems have many hidden failure modes), so you're better
off dealing with *that* possibility than worrying over the issue of
undecodable filenames.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Dec  6 18:05:23 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 09:05:23 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493A061D.1060406@palladion.com>
References: <4938374B.8000006@gmail.com>
	<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>
	<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<493A061D.1060406@palladion.com>
Message-ID: <ca471dc20812060905j4bcef3dbva1d0ed4e71e9759a@mail.gmail.com>

On Fri, Dec 5, 2008 at 8:57 PM, Tres Seaver <tseaver at palladion.com> wrote:
> Amen!  the idea that paths, environment variables, and stuff pulled off
> of sockets can be treated as text rather than strings is just wishful
> thinking.

Unfortunately most of the programmers of the world *do* think that
way(*), and it's not easy to wean them off the idea. It's a powerful
meme that you can use your own name as a file name, even if you happen
to be Czech or Vietnamese -- and it's promoted by the two most popular
consumer operating systems.

(*) With the exception of sockets. Sockets are typically dealt with
through protocols and APIs that provide guidance about how to convert
between bytes and strings, and whether that is even a meaningful
operation.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From a.badger at gmail.com  Sat Dec  6 18:18:30 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Sat, 06 Dec 2008 09:18:30 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>
Message-ID: <493AB3E6.7070806@gmail.com>

Bugbee, Larry wrote:
> There has been some discussion here that users should use the str or
> byte function variant based on what is relevant to their system, for
> example when getting a list of file names or opening a file.  That
> thought process really doesn't do much for those of us that write code
> that needs to run on any platform type, without alteration or the
> addition of complex if-statements and/or exceptions.
> 
> Whatever the resolution here, and those of you addressing this thorny
> issue have my admiration, the solution should be such that it gives
> consistent behavior regardless of platform type and doesn't require the
> programmer to know of all the minute details of each possible target
> platform.  
> 
I've been thinking about this and I can only see one option.  I don't
think that it really makes less work for the programmer, though -- it
just shifts the problem and makes it more apparent what your code is doing.

To avoid exceptions and if-then's in program code when accessing
filenames, environment variables, etc, you would need to access each of
these resources via the byte API.  Then, to avoid having to keep track
of what's a string and what's a byte in your other code, you probably
want to convert those bytes to strings.  This is where the burden gets
shifted.  You'll have your own routine(s) to do the conversion and have
to have exception handling code to deal with undecodable filenames.
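
A rough sketch of such a conversion routine, assuming UTF-8 as the expected encoding; `decode_names` is an invented name, and what to do with the undecodable leftovers is left to the application:

```python
def decode_names(raw_names, encoding="utf-8"):
    """Split bytes filenames into decoded strings and undecodable leftovers."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the original bytes so the caller can still report
            # or operate on the file through the bytes API.
            undecodable.append(raw)
    return decoded, undecodable
```

The exception handling lives in one place, so the rest of the code can work with str and only consult the second list when it cares about the odd cases.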

Note 1: your particular app might be able to get away without doing the
conversion from bytes to string -- it depends on what you're planning on
doing with the filename/environment data.

Note 2: If there isn't a parallel API on all platforms, for instance,
Guido's proposal to not have os.environb on Windows, then you'll still
have to have a platform specific check. (Likely you should try to access
os.environb in this instance and if it doesn't exist, use os.environ
instead... and remember that you need to either change os.environ's data
into str type or change os.environb's data into byte type.)
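
A sketch of that check, written against later Python releases: `os.environb` was only a proposal when this was posted and appeared on POSIX platforms in Python 3.2, along with `os.fsencode`. The function name `environ_as_bytes` is an assumption for illustration:

```python
import os

def environ_as_bytes():
    """Return the environment as a dict of bytes keys and bytes values."""
    try:
        # Present only on platforms whose native environment is bytes.
        source = os.environb
    except AttributeError:
        # No bytes view (e.g. Windows): convert the str mapping instead.
        return {os.fsencode(key): os.fsencode(value)
                for key, value in os.environ.items()}
    return dict(source)
```

Callers then see one data type on every platform, and the platform check is confined to this single function.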

-Toshio


From guido at python.org  Sat Dec  6 18:54:18 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 09:54:18 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
Message-ID: <ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>

On Fri, Dec 5, 2008 at 9:28 PM,  <glyph at divmod.com> wrote:
> On 5 Dec, 06:10 pm, guido at python.org wrote:
>> On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:
>>> With all due respect, for me, "library support" and "serious use" are
>>> synonymous.
>>
>> Glyph, I cannot have a discussion with you if every single post of
>> yours is longer than my combined daily output. Please spend some time
>> writing shorter posts. I'm sure I'm not the only one here with a short
>> attention span. :-)
>
> I already spend a lot of time trying to remove extraneous details.  The
> drafts of these messages are usually 3x as long :).  So, trying to keep it
> short:

Thanks!

> Thomas paraphrased my point pretty well.  The importance of libraries cannot
> be overemphasized.  Maybe you're right and the stdlib is enough for a large
> audience, but I don't know that audience.  Everyone I know who uses Python,
> uses it because of a library.  In some cases, an equivalent library exists
> for another language, and Python wins because it has a nicer syntax.  But,
> in no case does Python win where it *doesn't* have the library.

Clearly you're not reading the edu-sig list. :-)

> I think that the marketing for py3 needs to target library vendors before
> targeting novices.  If the novices are targeted first, they are going to
> have a bad experience when "python" libraries don't work with py3, and
> library maintainers are going to have a bad experience when clueless newbies
> harass them to update their software without understanding the magnitude of
> the work to do so.

I think it's great to have specific marketing targeted towards library
developers. I know we haven't done enough -- for example I promised a
C extension porting guide which didn't materialize. :-(

But I do *not* think it is a good idea to emphasize elsewhere that
most people shouldn't use Python 3.0. Py3k will have a hard enough
time gaining mindshare without the very developers who created it
discouraging its use. If you can't find it in your heart to recommend
3.0, can you at least keep that within your circle of
library-producing friends?

Whenever someone asks me which version to use, I always respond with
a question -- what do you want to use it for? And then I'll give them
an answer based on what's available for their needs. Sometimes I have
to recommend Python 2.2. It's been a while since I had to recommend
1.5.2 but a few years ago that was still common. (A large company I
know still has servers where 1.5.2 is the default, although 2.4 is
also installed.)

> I've been predicting this for years, but two days into Python 3's release,
> I've already seen real-world examples of this pattern in #twisted.  I can
> tell these people to "downgrade" to py2 when they come ask me for help, but
> I don't think most of them ask for help.  They just get angry and learn Java
> instead.

If they're that easily convinced that Java is better they probably
were a lost cause anyway, so I won't mourn their departure too much.

The one thing I would warn against is replacing a default Python 2.x
with Python 3.0 -- if you find 2.x pre-installed, it's likely that
other parts of the OS infrastructure depend on it, and *any* upgrade
except to 2.x.n is likely to cause trouble.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Dec  6 19:09:28 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 10:09:28 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>
Message-ID: <ca471dc20812061009nd2737c4l79d9e49f4b701334@mail.gmail.com>

On Sat, Dec 6, 2008 at 2:12 AM, Thomas Wouters <thomas at python.org> wrote:

> On Sat, Dec 6, 2008 at 02:47, Guido van Rossum <guido at python.org> wrote:
>> In the mean time, I don't mind if people learn 3.0 first and 2.6
>> second. It's probably easier that way than the other way around. :-)
>
> It may be easier in a vacuum -- although I don't think it is. 3.0 is more
> logical than 2.x, and I don't think it's easier to learn about the better
> way first, and then realize that you have to use some archaic form later.

True, though (at least when writing new 2.x code) it's often not
needed to use the archaic forms. E.g. you don't have to use backticks
or __cmp__ or string exceptions. And if you can live with 2.6 it gets
even better (e.g. relative import, "except ... as ...").

> In
> fact, we had someone on #python just last week who had learned Python from a
> 2.6 tutorial, then found out he had to use 2.5, and he was actually tripping
> over some 2.6-only features he'd been taught. When he learned he had to go
> back and relearn and fix them by hand, his actual words were "if thats the
> case, I'm gonna be forced to use another language". I hope that isn't a
> typical example of such a case, but I can partly understand the sentiment.

You can't prevent this kind of thing happening occasionally. I don't
generally lie awake over it -- I don't expect a massive exodus. I
think some people like to say this kind of thing (especially publicly)
because they expect us to be insecure about Python adoption and
worried about the competition. Don't fall for the troll bait! When
they go home they'll realize that learning Ruby or Java is a lot more
work than learning the differences between Python 2.5 and 2.6. Or
they'll learn one of those and find that it's not all roses there
either. (Ruby is also going through a language transition, and the
choice of which version of Java to learn isn't that easy either,
despite the strict backwards compatibility -- you can choose to use a
somewhat awkward older version, or use the latest and find it's not
supported on the next platform you're porting to.)

> But even if it's true, people don't learn in a vacuum. Almost everybody else
> will be thinking of 3.0 in terms of 'changes since 2.x', tools such as 2to3
> are oriented that way, and explanations on bits and pieces of Python
> available to be googled are by and large about 2.x, not 3.0. Right now, it's
> just much easier to go from 2.x to 3.0 than the other way 'round.

True, but we should work on fixing this rather than giving up. What
happened to the 3to2 project? Wasn't someone planning to write a 3.0
to 2.6 (or 2.5?) converter using the same technology in 2to3?

We probably need two different marketing/PR streams: one aimed at
*existing* Python users (reaffirming we will be supporting 2.x fully
for many years to come), another at *new* users (suggesting that now
is a better time than ever to learn Python, with 3.0 available and new
packages being ported to it all the time).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Dec  6 19:16:21 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 10:16:21 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
Message-ID: <ca471dc20812061016u3c0c49b7q23d8647ae5a49aca@mail.gmail.com>

On Fri, Dec 5, 2008 at 10:03 PM,  <glyph at divmod.com> wrote:
> The best thing for 3.0 adoption would be a 3.0 "welcoming committee".  A
> group of hackers wandering from one popular open source library to another,
> writing patches for 3.x compatibility issues.  There must be lots of people
> who care about 3.x adoption, and this is probably the most effective way
> they can reach that goal.
>
> Each time I am going to fix a 3.0 compatibility issue, I have a choice: I
> can either make Twisted itself better (add features, fix bugs), or I can
> keep Twisted exactly the same but do lots of work so it will work on 3.0.
>  It seems pretty clear to me that, to the extent that I have time for
> Twisted, fixing bugs in the HTTP implementation would be a better deal than
> puzzling through a megabyte of diffs generated by 2to3, trying to understand
> where it went wrong, and how.
>
> This doesn't mean I'm "sitting on my hands".  It just means I have better
> things to be doing with my hands.  (To be precise, 1054 better things to do,
> re: Twisted.  Add in the Divmod projects and it's more like 3000.)
>
> Of course the distant threat of an unmaintained 2.x series is enough to
> motivate me to push a *little* in this direction, but it doesn't make me
> happy about it.
>
> I think this is exactly what the marketing effort around 3.0 needs to be
> doing: making a positive case for library and application authors to spend
> time to update to 3.x.  This is a lot of work, and many (I might even say
> most) of us need a lot of cajoling.  Free patches are a good incentive :).

This is a really good idea. I hope and expect that the information and
tools available for porting to 3.0 will dramatically improve over the
next half year or so (hopefully the situation is a lot less gloomy
already by the time we meet again at PyCon). The porting list that was
just created also sounds like a step in the right direction.

I do think that in many cases *some* support from the regular
maintainers of a library would be needed -- for example if you (in
particular) were to express a negative attitude towards porting
Twisted to 3.0 (I'm not saying that you do, it's just a hypothetical
that would apply to any "BDFL" for any sizable library) then this
would discourage others from trying to contribute. OTOH if you made a
branch available where you check in the results of running 2to3 over
Twisted, with instructions for people to contribute fixes, that would
be great -- at almost no cost to you! (Assuming you can get someone
else to work on merging trunk improvements into that branch.) Remember
the open source mantra -- reap the benefit of all those eyeballs!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From glyph at divmod.com  Sat Dec  6 19:48:04 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 18:48:04 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>
Message-ID: <20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com>

On 10:12 am, thomas at python.org wrote:
>When he learned he had to go
>back and relearn and fix them by hand, his actual words were "if thats the
>case, I'm gonna be forced to use another language". I hope that isn't a
>typical example of such a case, but I can partly understand the sentiment.

This is an overreaction, but it's a very typical overreaction.  It's 
difficult to recover from a negative first impression even if you have 
lots of opportunities; in the case of an anonymous user trying out 
Python, the user will often stop using it, without telling anyone, and 
never come back.  There's no opportunity to recover.

From glyph at divmod.com  Sat Dec  6 19:53:19 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 18:53:19 -0000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081206143454.GA15293@phd.pp.ru>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
Message-ID: <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>

On 02:34 pm, phd at phd.pp.ru wrote:
>On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote:
>>On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote:
>>>You can't display a non-decodable filename to the user, hence the user
>>>will have no idea what they're working on. Non-filesystem related apps
>>>have no business trying to deal with insane filenames.

>>Sigh, same arguments, all over again.

>>People do care, it does happen in real life, and it is a bug in your
>>software if you cannot deal with the users' files. They just want the
>>software to work. If it shows something weird in the window titlebar,
>>that's a bit irritating but at least it doesn't get in the way of
>>working.

>   I agree 100%. Russian Unix users use at least 5 different encodings
>(koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and
>iso-8859-5 are less frequent). I have an FTP server with some filenames in
>koi8 encoding - these filenames are for unix clients, - and some filenames
>in cp1251 for w32 clients. Sometimes I run utf-8 xterm (I am
>a commandline/console unixhead) for my needs (read email, write files in
>utf-8 with characters beyond koi8-r, which is my primary encoding) - and
>I still can work with filenames in koi8/cp1251 encodings. My filemanager
>(Midnight Commander, for the matter) shows these files and directories as
>"?????.???", but I can chdir to such directories, and I can open such
>files. It would be a big bad blow for me if filemanagers (or other
>programs) start to filter these filenames.

I find it interesting to note that the only users in this discussion who 
actually have these problems in real life all have this attitude.  It is 
expected that in an imperfect world we will have imperfect encodings, 
but it is super important that software which can open files can deal 
with not understanding the character translation of the filename.

From guido at python.org  Sat Dec  6 20:04:44 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 11:04:44 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com>
	<20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com>
Message-ID: <ca471dc20812061104qda59b49sa822a57e827130c9@mail.gmail.com>

On Sat, Dec 6, 2008 at 10:48 AM,  <glyph at divmod.com> wrote:
> On 10:12 am, thomas at python.org wrote:
>> When he learned he had to go
>> back and relearn and fix them by hand, his actual words were "if thats the
>> case, I'm gonna be forced to use another language". I hope that isn't a
>> typical example of such a case, but I can partly understand the sentiment.
>
> This is an overreaction, but it's a very typical overreaction.  It's
> difficult to recover from a negative first impression even if you have lots
> of opportunities; in the case of an anonymous user trying out Python, the
> user will often stop using it, without telling anyone, and never come back.
>  There's no opportunity to recover.

Sorry, but I really don't see it that dark. Either they weren't ready
to learn a new language anyway, or they'll try something else, and
find that the grass isn't actually that green on the other side of the
fence either.

In general I don't worry about losing one individual potential user;
there are plenty of others. I'd be more worried if someone wrote a
nasty blog rant or a Slashdot article after such an experience -- but
there will always be lots of people pointing out the other side, so
the negative effect of such blogs is usually neutralized quite well.

The one overreaction that would really worry me is if influential
people inside the Python developer community were to start dissing
Python 3.0 based on the response of someone in #python.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Dec  6 20:13:38 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 11:13:38 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
Message-ID: <ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>

On Sat, Dec 6, 2008 at 10:53 AM,  <glyph at divmod.com> wrote:
> On 02:34 pm, phd at phd.pp.ru wrote:
>>  I agree 100%. Russian Unix users use at least 5 different encodings
>> (koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and
>> iso-8859-5 are less frequent). I have an FTP server with some filenames in
>> koi8 encoding - these filenames are for unix clients, - and some filenames
>> in cp1251 for w32 clients. Sometimes I run utf-8 xterm (I am
>> a commandline/console unixhead) for my needs (read email, write files in
>> utf-8 with characters beyond koi8-r, which is my primary encoding) - and
>> I still can work with filenames in koi8/cp1251 encodings. My filemanager
>> (Midnight Commander, for the matter) shows these files and directories as
>> "?????.???", but I can chdir to such directories, and I can open such
>> files. It would be a big bad blow for me if filemanagers (or other
>> programs) start to filter these filenames.
>
> I find it interesting to note that the only users in this discussion who
> actually have these problems in real life all have this attitude.  It is
> expected that in an imperfect world we will have imperfect encodings, but it
> is super important that software which can open files can deal with not
> understanding the character translation of the filename.

For file managers and similar tools I am absolutely 100% in agreement
-- that's why the binary APIs are there.

Most apps aren't file managers or ftp clients though. The sky is not falling.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ctb at msu.edu  Sat Dec  6 20:43:42 2008
From: ctb at msu.edu (C. Titus Brown)
Date: Sat, 6 Dec 2008 11:43:42 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
Message-ID: <20081206194342.GB26208@idyll.org>

On Sat, Dec 06, 2008 at 06:03:55AM -0000, glyph at divmod.com wrote:
-> On 01:47 am, guido at python.org wrote:
-> >>In spite of Python being a programming language, there is a difference
-> >>between 'casual user of the language' and 'library developer'; 3.0 is
-> >>certainly a must for all actual library developers, and I'm sure most of
-> >>them know about 3.0 by now. We're talking about first impressions for
-> >>people without that knowledge.
-> >
-> >Well if most library developers already know 3.0 by now, I would hope
-> >they aren't going to sit on their hands, and solve the issues at hand!
-> 
-> The best thing for 3.0 adoption would be a 3.0 "welcoming committee".  A 
-> group of hackers wandering from one popular open source library to 
-> another, writing patches for 3.x compatibility issues.  There must be 
-> lots of people who care about 3.x adoption, and this is probably the 
-> most effective way they can reach that goal.

Does anyone smell a few GSoC projects?  (And maybe GHOP if Google
decides to run it again; no word yet.)

--titus
-- 
C. Titus Brown, ctb at msu.edu

From warren at delsci.com  Sat Dec  6 20:38:51 2008
From: warren at delsci.com (Warren DeLano)
Date: Sat, 6 Dec 2008 11:38:51 -0800
Subject: [Python-Dev] "as" keyword woes
Message-ID: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>

 
> Date: Fri, 05 Dec 2008 22:22:38 -0800
> From: Dennis Lee Bieber <wlfraed at ix.netcom.com>
> Subject: Re: "as" keyword woes
> To: python-list at python.org
> Message-ID: <bqadnTS6jM21h6fUnZ2dnUVZ_uydnZ2d at earthlink.com>
> 
> 	I'm still in the dark as to what type of data could 
> even inspire the
> use of "as" as an object name... A collection of "a" objects? In which
> case, what are the "a"s? <G>

Please let me clarify.  It is not "as" as a standalone object that we
specifically miss in 2.6/3, but rather, the ability to use ".as" as
a method or attribute name.

In other words we have lost the ability to refer to "as" as the
generalized OOP-compliant/syntax-independent method name for casting:

new_object = old_object.as(class_hint)

# For example:

float_obj = int_obj.as("float")

# or 

float_obj = int_obj.as(float_class)

# as opposed to something like

float_obj = int_obj.asFloat()

# which requires a separate method for each cast, or

float_obj = (float)int_obj  

# which requires syntax-dependent casting [language-based rather than
# object-based].

Of course, use of explicit casting syntax "(float)" is fine if you're
restricting yourself to Python and other languages which support
casting, but that solution is unavailable inside of a pure OOP
message-passing paradigm where object.method(argument) invocations are
all you have to work with.  

Please note that use of object.asClassname(...) is a ubiquitous
convention for casting objects to specific classes (seen in ObjectiveC,
Java, SmallTalk, etc.).  

There, I assert that 'object.as(class_reference)' is the simplest and
most elegant generalization of this widely-used convention.  Indeed, it
is the only obvious concise answer, if you are limited to using methods
for casting.

Although there are other valid domain-specific uses for "as" as either a
local variable or attribute names (e.g. systematic naming: as, bs, cs),
those aren't nearly as important compared to "as" being available as the
name of a generalized casting method -- one that is now strictly denied
to users of Python 2.6 and 3.

As someone somewhat knowledgeable of how parsers work, I do not
understand why a method/attribute name "object_name.as(...)" must
necessarily conflict with a standalone keyword " as ".  It seems to me
that it should be possible to unambiguously separate the two without
ambiguity or undue complication of the parser.

So, assuming I now wish to propose a corrective PEP to remedy this
situation for Python 3.1 and beyond, what is the best way to get started
on such a proposal?  

Cheers,
Warren

From scott+python-dev at scottdial.com  Sat Dec  6 21:06:42 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Sat, 06 Dec 2008 15:06:42 -0500
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
Message-ID: <493ADB52.7090608@scottdial.com>

Warren DeLano wrote:
> There, I assert that 'object.as(class_reference)' is the simplest and
> most elegant generalization of this widely-used convention.  Indeed, it
> is the only obvious concise answer, if you are limited to using methods
> for casting.

How about "to"? Almost every language I have ever used uses "to" and not
"as". Python predominately uses "to" already, so why would you fight
that? And moreover, I have never seen a language or library that
preferred "as", so I remain to be convinced that "as" is a good choice.

> As someone somewhat knowledgeable of how parsers work, I do not
> understand why a method/attribute name "object_name.as(...)" must
> necessarily conflict with a standalone keyword " as ".  It seems to me
> that it should be possible to unambiguously separate the two without
> ambiguity or undue complication of the parser.

It's not a matter of whether it is possible. It's a matter of simplicity
and a lack of a worthy use-case for allowing it. In general, the trend
has been to not allow any keywords as identifiers in the Python
language. Even if there were such a worthy use-case, what is really
important is that allowing it increases the complexity of /the language/
a human programmer needs to parse.

> So, assuming I now wish to propose a corrective PEP to remedy this
> situation for Python 3.1 and beyond, what is the best way to get started
> on such a proposal?

I think you will need to work on making a convincing argument as to why
the keyword "as" is any more special than, say, "for", or any other
keyword for that matter, unless you plan on proposing a reversal of the
current keyword/identifier ideology, which is likely to be rejected outright.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From glyph at divmod.com  Sat Dec  6 21:19:15 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 20:19:15 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
Message-ID: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>

As far as the original point of this thread, I started off just 
defending the cautionary text already present in the announcements and 
on the website.  Since I'm not advocating any changes to that (the brief 
caveat on the "download" page is fine), we'll just have to agree to 
disagree on the abstractly appropriate audience for 3.0.  I'll respond 
to some other points though:

On 05:54 pm, guido at python.org wrote:
>On Fri, Dec 5, 2008 at 9:28 PM,  <glyph at divmod.com> wrote:
>>On 5 Dec, 06:10 pm, guido at python.org wrote:
>>>On Thu, Dec 4, 2008 at 11:27 PM,  <glyph at divmod.com> wrote:

>I think it's great to have specific marketing targeted towards library
>developers. I know we haven't done enough -- for example I promised a
>C extension porting guide which didn't materialize. :-(

Well, get cracking, then! :)

>If you can't find it in your heart to recommend
>3.0, can you at least keep that within your circle of
>library-producing friends?

In another (longer) message, I already said this is what I'm doing. 
Assuming that we are all my "library-producing friends" here :).  I am 
deliberately refraining from blogging about 3.0 until I have something 
nice to say.

But still, you can't honestly expect me to recommend 3.0 until someone 
has gotten at least a basic skeleton of Twisted up and running under it 
:).  My own attempts to do so have failed miserably, to the point where 
I can't even produce a useful bug report without a lot more work.

Would you recommend a C compiler that couldn't build Python, or link 
with it?

>Whenever someone asks me which version to use, I always respond with
>a question -- what do you want to use it for?

In the longer term, I think that you should look at this as a symptom of 
a problem.  If you learn Java, you learn the most recent version.  If 
you need your software to work with an older version, you just pass a 
special option to the compiler.  If you want your *old* software to work 
with a *new* version, it basically just does (at least, 99% of the 
time).

I don't think there's anything about the 3.0 language which couldn't be 
supported in a VM that understood both 2 and 3.  "py3to2" seems at least 
a rough proof of concept of that idea, although it still has some 
issues.  Library availability should be a separate concern from a clean 
source language.

I also don't think 3.0 is perfect, and five years on, there will be a 
temptation to make more "just this once" incompatible changes.  Of 
course, you've promised these changes won't be made, and *this* set of 
design mistakes will be with us forever.  It would be nice if there were 
a way for evolution to continue without another reboot of the world.

>If they're that easily convinced that Java is better they probably
>were a lost cause anyway, so I won't mourn their departure too much.

I really believe that *all* new users are fickle, if they don't have a 
mandate as to what they need to be learning.  Personally, I learned 
Python because of a memory leak in Swing.

From guido at python.org  Sat Dec  6 21:29:09 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 12:29:09 -0800
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
Message-ID: <ca471dc20812061229v1e33584ycb06eb0389637a51@mail.gmail.com>

On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano <warren at delsci.com> wrote:
[...]
> There, I assert that 'object.as(class_reference)' is the simplest and
> most elegant generalization of this widely-used convention.  Indeed, it
> is the only obvious concise answer, if you are limited to using methods
> for casting.

Well, that's too bad, as 'as' is now a reserved word.

> Although there are other valid domain-specific uses for "as" as either a
> local variable or attribute names (e.g. systematic naming: as, bs, cs),
> those aren't nearly as important compared to "as" being available as the
> name of a generalized casting method -- one that is now strictly denied
> to users of Python 2.6 and 3.

If you had brought this up 5-10 years ago when we first introduced
'as' as a semi-keyword (in the import statement) we might have been
able to avert this disaster. As it was, nobody ever brought this up
AFAICR, so I don't think it's *that* obvious.

> As someone somewhat knowledgeable of how parsers work, I do not
> understand why a method/attribute name "object_name.as(...)" must
> necessarily conflict with a standalone keyword " as ".  It seems to me
> that it should be possible to unambiguously separate the two without
> ambiguity or undue complication of the parser.

That's possible with sufficiently powerful parser technology, but
that's not how the Python parser (and most parsers, in my experience)
treats reserved words. Reserved words are reserved in all contexts,
regardless of whether ambiguity could arise. Otherwise *every*
reserved word would have to be allowed right after a '.', and many
keywords would have to be allowed as identifiers in other contexts.
That way lies PL/1...
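
A minimal sketch of the point above, assuming an interpreter where 'as'
is a full keyword (2.6 or later): the compiler rejects the word wherever
an identifier is expected, including right after a '.'. The helper name
parses() and the sample source strings are made up for illustration.

```python
import keyword


def parses(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        # compile() only parses; the names in `source` need not exist.
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# 'as' is listed among the reserved words.
print(keyword.iskeyword("as"))

# Used as a keyword, it is accepted.
print(parses("import os as operating_system"))

# Used as an attribute name or a method name, it is rejected,
# even though no ambiguity could arise after the '.'.
print(parses("x = obj.as(float)"))
print(parses("def as(self, target): pass"))

# The *string* "as" is not affected, so dynamic lookup still parses.
print(parses("x = getattr(obj, 'as')"))
```

The last line is why existing objects with an attribute literally named
"as" remain reachable through getattr(), at the cost of readability.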

Furthermore, how would you define the 'as' method? Would you also want
to be allowed to write

def as(self, target): ...

??? Trust me, it's a slippery slope, and you don't want to start going
down there.

> So, assuming I now wish to propose a corrective PEP to remedy this
> situation for Python 3.1 and beyond, what is the best way to get started
> on such a proposal?

Don't bother writing a PEP to make 'as' available as an attribute
again. It has no chance of being accepted. Instead, think of a
different word you could use.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From glyph at divmod.com  Sat Dec  6 21:37:25 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 20:37:25 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812061016u3c0c49b7q23d8647ae5a49aca@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
	<ca471dc20812061016u3c0c49b7q23d8647ae5a49aca@mail.gmail.com>
Message-ID: <20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com>


On 06:16 pm, guido at python.org wrote:
>On Fri, Dec 5, 2008 at 10:03 PM,  <glyph at divmod.com> wrote:

>I do think that in many cases *some* support from the regular
>maintainers of a library would be needed -- for example if you (in
>particular) were to express a negative attitude towards porting
>Twisted to 3.0 (I'm not saying that you do, it's just a hypothetical
>that would apply to any "BDFL" for any sizable library) then this
>would discourage others from trying to contribute.

Of course.  Grumpy as we are, we're preparing for the 3.0 migration, and 
have been for a while.  There are tickets open in the tracker, a 
buildslave reporting 2.6's -3 warnings, and soon, apparently, a 
buildslave that will attempt to run the tests with 3.0, although getting 
anything but a traceback bootstrapping the testing tool is a ways off.

My attitude in every public statement I've ever made about 3.0 has been 
that there is too much migration work for our tiny team to do, but we 
are very open to getting help from the community.

>OTOH if you made a
>branch available where you check in the results of running 2to3 over
>Twisted, with instructions for people to contribute fixes, that would
>be great -- at almost no cost to you! (Assuming you can get someone
>else to work on merging trunk improvements into that branch.) Remember
>the open source mantra -- reap the benefit of all those eyeballs!

This isn't really the way our development process works on Twisted - we 
don't have enough developers to support more than one line of 
development.  Modules and subsystems can be patched individually, and 
the whole idea with 2to3 is that source changes should remain compatible 
with 2.6 (and an appropriate level of swaddling can paper over library
changes back to 2.3) so those fixes can just go into trunk, right?
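
As a rough illustration of the kind of "swaddling" meant here, the sketch
below shows a small compatibility shim that lets one source file run on
both 2.6 and 3.x. The names text_type, binary_type and to_text are
hypothetical and are not taken from Twisted; this is only one possible
approach, not a description of how Twisted does it.

```python
import sys

# Hypothetical shim: pick the text and byte-string types once, so the
# rest of the code does not have to care which major version is running.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # only evaluated on 2.x, where the name exists
    binary_type = str


def to_text(data, encoding="utf-8"):
    """Return `data` as text, decoding it first if it is a byte string."""
    if isinstance(data, binary_type):
        return data.decode(encoding)
    return data


print(to_text(b"twisted"))
print(to_text(u"twisted") if sys.version_info[0] < 3 else to_text("twisted"))
```

Code written against such a shim can go straight into trunk, since it
behaves the same under either interpreter.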

Nevertheless the sentiment is the same.  If someone were desperately
interested in getting Twisted to work on 3.0, there would be lots of 
work for them to do and a clear place for them to go do it.

From guido at python.org  Sat Dec  6 21:51:55 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 12:51:55 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
Message-ID: <ca471dc20812061251o71c6c7abod6bf2c2f19cb3a97@mail.gmail.com>

On Sat, Dec 6, 2008 at 12:19 PM,  <glyph at divmod.com> wrote:
> I also don't think 3.0 is perfect, and five years on, there will be a
> temptation to make more "just this once" incompatible changes.  Of course,
> you've promised these changes won't be made, and *this* set of design
> mistakes will be with us forever.  It would be nice if there were a way for
> evolution to continue without another reboot of the world.

It would be nice indeed. But we (and any other language that's alive)
will need to walk a careful line between evolving too slow and too
fast. Hopefully we'll be able to evolve mostly through deprecation and
eventual removal of misfeatures rather than through a series of
hiccups like 3.0. But it will still be too slow for some and too fast
for others.

Since one of your favorite themes is that your team is too small, I
would like to reuse that idea. If we had as many Python core
developers as Sun and IBM have working on Java, we could most likely
have introduced all Python 3.0 features gradually, with compiler flags
and __future__ imports to support different versions. But despite
being a bit bigger than Twisted, we're still severely constrained by
resources. My estimation when we started was that it would be easier
for the core team to maintain two separate versions over a long time,
than to try and produce a single binary capable of running both
versions of the language. (Maybe Jython and/or IronPython provide a
better platform for doing that though.)

Hopefully by the time Python 4000 rolls along, technology will be
available to make the transition smoother. But we'll still have
to break some eggs...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Dec  6 21:53:16 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 12:53:16 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
	<ca471dc20812061016u3c0c49b7q23d8647ae5a49aca@mail.gmail.com>
	<20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com>
Message-ID: <ca471dc20812061253x4ca0a506q6b19583a8dbc1b00@mail.gmail.com>

On Sat, Dec 6, 2008 at 12:37 PM,  <glyph at divmod.com> wrote:
> Of course.  Grumpy as we are, we're preparing for the 3.0 migration, and
> have been for a while.  There are tickets open in the tracker, a buildslave
> reporting 2.6's -3 warnings, and soon, apparently, a buildslave that will
> attempt to run the tests with 3.0, although getting anything but a traceback
> bootstrapping the testing tool is a ways off.

Thank you very much for this.

> My attitude in every public statement I've ever made about 3.0 has been that
> there is too much migration work for our tiny team to do, but we are very
> open to getting help from the community.

If I were a Twisted user I wouldn't hesitate to help. Open source to the rescue!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From glyph at divmod.com  Sat Dec  6 22:26:49 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sat, 06 Dec 2008 21:26:49 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812061251o71c6c7abod6bf2c2f19cb3a97@mail.gmail.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
	<ca471dc20812061251o71c6c7abod6bf2c2f19cb3a97@mail.gmail.com>
Message-ID: <20081206212649.12555.749860363.divmod.xquotient.1720@weber.divmod.com>

On 08:51 pm, guido at python.org wrote:
>On Sat, Dec 6, 2008 at 12:19 PM,  <glyph at divmod.com> wrote:
>>I also don't think 3.0 is perfect, and five years on, there will be a
>>temptation to make more "just this once" incompatible changes.  Of 
>>course,
>>you've promised these changes won't be made, and *this* set of design
>>mistakes will be with us forever.  It would be nice if there were a 
>>way for
>>evolution to continue without another reboot of the world.

>Since one of your favorite themes is that your team is too small, I
>would like to reuse that idea. If we had as many Python core
>developers as Sun and IBM have working on Java, we could most likely
>have introduced all Python 3.0 features gradually, with compiler flags
>and __future__ imports to support different versions. But despite
>being a bit bigger than Twisted, we're still severely constrained by
>resources.

Ah, the dangers of over-editing.  I originally had a whole paragraph 
about how I understood that the Python dev team was also resource 
constrained, but I deleted it for brevity.  Now you see why my posts are 
so long! :)

From brett at python.org  Sat Dec  6 23:36:11 2008
From: brett at python.org (Brett Cannon)
Date: Sat, 6 Dec 2008 14:36:11 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com>
	<ca471dc20812051747p488b49beobb02bb3e9856b8e6@mail.gmail.com>
	<20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com>
Message-ID: <bbaeab100812061436k606f21a5p7cc92c011da5011d@mail.gmail.com>

On Fri, Dec 5, 2008 at 22:03,  <glyph at divmod.com> wrote:
> On 01:47 am, guido at python.org wrote:
>>>
>>> In spite of Python being a programming language, there is a difference
>>> between 'casual user of the language' and 'library developer'; 3.0 is
>>> certainly a must for all actual library developers, and I'm sure most of
>>> them know about 3.0 by now. We're talking about first impressions for
>>> people
>>> without that knowledge.
>>
>> Well if most library developers already know 3.0 by now, I would hope
>> they aren't going to sit on their hands, and solve the issues at hand!
>
> The best thing for 3.0 adoption would be a 3.0 "welcoming committee".  A
> group of hackers wandering from one popular open source library to another,
> writing patches for 3.x compatibility issues.  There must be lots of people
> who care about 3.x adoption, and this is probably the most effective way
> they can reach that goal.
>

The welcoming committee has somewhat already started. Martin announced
on python-porting that he ported psycopg2 himself and submitted the
patch. Martin also mostly ported Django at the last PyCon.

-Brett

From brett at python.org  Sat Dec  6 23:42:38 2008
From: brett at python.org (Brett Cannon)
Date: Sat, 6 Dec 2008 14:42:38 -0800
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
	<bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
	<4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>
Message-ID: <bbaeab100812061442j10a30baat3caeb922eb6c93e8@mail.gmail.com>

On Thu, Dec 4, 2008 at 17:02, Frank Wierzbicki <fwierzbicki at gmail.com> wrote:
> On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon <brett at python.org> wrote:
>> On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki <fwierzbicki at gmail.com> wrote:
>>> On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling <amk at amk.ca> wrote:
>>>> 14:00 - 15:30
>>>> =============
>>>>
>>>> Two tracks:
>>>>
>>>> Cross-implementation issues:
>>>>
>>>>  What do the various VMs want/need from CPython to help with their
>>>>  implementations?
>>>>
>>>>  * Marking CPython-specific tests in the test suite?
>>>>  * Getting an implementation agnostic test suite for the Python language?
>>>>  * Separating the language tests and the pure Python part of the stdlib into
>>>>    a separate project?  (Or publish them as a separate package.)
>>>>  * Transition plans for 3.0?
>>>>
>>>>  Champion needed.
>>> I would like to champion this one.
>>>
>>
>> I told AMK this a while back, but might as well make it more public; I
>> am up for chairing as well.
> Brett,
>
> Are you saying you've already called the cross-implementation champion
> role?

No, I am saying I had told AMK I was interested in championing the
session. He chose you, and that's that. One less thing for me to worry
about. =)

>  If so I'm happy to defer or co-chair.

Your call. I will definitely be there representing CPython as best as
I can, so I will be making noise regardless of whether I am standing
in front of the room or not.

-Brett

From jnoller at gmail.com  Sat Dec  6 23:48:58 2008
From: jnoller at gmail.com (jnoller at gmail.com)
Date: Sat, 06 Dec 2008 22:48:58 +0000
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
Message-ID: <0016361e89b66d8dd8045d689975@google.com>

On Dec 6, 2008 5:42pm, Brett Cannon <brett at python.org> wrote:
> On Thu, Dec 4, 2008 at 17:02, Frank Wierzbicki wrote:
> > On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon wrote:
> >> On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki wrote:
> >>> On Wed, Dec 3, 2008 at 10:31 AM, AM Kuchling wrote:
> >>>> 14:00 - 15:30
> >>>> =============
> >>>>
> >>>> Two tracks:
> >>>>
> >>>> Cross-implementation issues:
> >>>>
> >>>> What do the various VMs want/need from CPython to help with their
> >>>> implementations?
> >>>>
> >>>> * Marking CPython-specific tests in the test suite?
> >>>> * Getting an implementation agnostic test suite for the Python language?
> >>>> * Separating the language tests and the pure Python part of the stdlib into
> >>>> a separate project? (Or publish them as a separate package.)
> >>>> * Transition plans for 3.0?
> >>>>
> >>>> Champion needed.
> >>> I would like to champion this one.
> >>>
> >>
> >> I told AMK this a while back, but might as well make it more public; I
> >> am up for chairing as well.
> > Brett,
> >
> > Are you saying you've already called the cross-implementation champion
> > role?
>
> No, I am saying I had told AMK I was interested in championing the
> session. He chose you, and that's that. One less thing for me to worry
> about. =)
>
> > If so I'm happy to defer or co-chair.
>
> Your call. I will definitely be there representing CPython as best as
> I can, so I will be making noise regardless of whether I am standing
> in front of the room or not.
>
> -Brett

Is heckling covered as an official obligation? :)

-jesse

From hsoft at hardcoded.net  Sun Dec  7 00:01:39 2008
From: hsoft at hardcoded.net (Virgil Dupras)
Date: Sun, 7 Dec 2008 00:01:39 +0100
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
Message-ID: <1BA80D7C-44A6-4DE0-AC43-A99B50DF3F5E@hardcoded.net>

On 06 Dec 2008, at 20:38, Warren DeLano wrote:

>
>> Date: Fri, 05 Dec 2008 22:22:38 -0800
>> From: Dennis Lee Bieber <wlfraed at ix.netcom.com>
>> Subject: Re: "as" keyword woes
>> To: python-list at python.org
>> Message-ID: <bqadnTS6jM21h6fUnZ2dnUVZ_uydnZ2d at earthlink.com>
>>
>> 	I'm still in the dark as to what type of data could
>> even inspire the
>> use of "as" as an object name... A collection of "a" objects? In  
>> which
>> case, what are the "a"s? <G>
>
> Please let me clarify.  It is not "as" as a standalone object that we
> specifically miss in 2.6/3, but rather, the ability to use ".as" as
> a method or attribute name.
>
> In other words we have lost the ability to refer to "as" as the
> generalized OOP-compliant/syntax-independent method name for casting:
>
> new_object = old_object.as(class_hint)
>
> # For example:
>
> float_obj = int_obj.as("float")
>
> # or
>
> float_obj = int_obj.as(float_class)
>
> # as opposed to something like
>
> float_obj = int_obj.asFloat()
>
> # which requires a separate method for each cast, or
>
> float_obj = (float)int_obj
>
> # which requires syntax-dependent casting [language-based rather than
> # object-based].
>
> Of course, use of explicit casting syntax "(float)" is fine if you're
> restricting yourself to Python and other languages which support
> casting, but that solution is unavailable inside of a pure OOP
> message-passing paradigm where object.method(argument) invocations are
> all you have to work with.
>
> Please note that use of object.asClassname(...) is a ubiquitous
> convention for casting objects to specific classes (seen in  
> ObjectiveC,
> Java, SmallTalk, etc.).
>
> There, I assert that 'object.as(class_reference)' is the simplest and
> most elegant generalization of this widely-used convention.  Indeed,  
> it
> is the only obvious concise answer, if you are limited to using  
> methods
> for casting.
>
> Although there are other valid domain-specific uses for "as" as  
> either a
> local variable or attribute names (e.g. systematic naming: as, bs,  
> cs),
> those aren't nearly as important compared to "as" being available as  
> the
> name of a generalized casting method -- one that is now strictly  
> denied
> to users of Python 2.6 and 3.
>
> As someone somewhat knowledgeable of how parsers work, I do not
> understand why a method/attribute name "object_name.as(...)" must
> necessarily conflict with a standalone keyword " as ".  It seems to me
> that it should be possible to unambiguously separate the two without
> ambiguity or undue complication of the parser.
>
> So, assuming I now wish to propose a corrective PEP to remedy this
> situation for Python 3.1 and beyond, what is the best way to get  
> started
> on such a proposal?
>
> Cheers,
> Warren
>

As long as "as" is widely known as a keyword, I don't see the problem.  
Every python developer knows that the convention is to add a trailing  
underscore when you want to use a reserved word in your code. Besides,  
your examples are quite abstract. I'm sure it's possible to find good  
examples for "while", "with", "import", "from" (I often use "from_")  
or "try" as well. Or perhaps the Python keyword should be "as_",
so we leave "as" free for eventual methods?
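
A minimal sketch of the trailing-underscore convention mentioned above.
The Number class and its as_() method are hypothetical, made up only to
show how a generalized conversion method could be spelled without
colliding with the reserved word.

```python
class Number(object):
    """Hypothetical wrapper, used only to illustrate the naming convention."""

    def __init__(self, value):
        self.value = value

    def as_(self, target):
        """Convert the wrapped value by calling `target` on it.

        `target` is any callable that accepts the value, typically a
        class such as float or str.
        """
        return target(self.value)


n = Number(3)
print(n.as_(float))  # conversion through the underscore-suffixed method
print(n.as_(str))
```

The call site reads almost the same as the requested obj.as(float), and
the definition needs no change to the grammar.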

As for the implicit proposal of allowing keywords only for methods,
I agree with Guido about it being a slippery slope. So we would end up
with a language where it is allowed to name methods after keywords,
but not functions (they can be declared in the local scope)? Yikes! Oh
well, maybe it's possible for an intelligent parser to distinguish
between keywords and function references, but think of the poor
syntax highlighters in all source editors! What a nightmare it will
be for them. Anyway, is there any language that does this, allowing
keywords as method names? I don't know, but if not, there's probably a
reason for it.

Your views on code elegance are also rather Javaish. I'd go for
"class_reference(object)" (and why the heck would you "be limited to
using methods for casting"?).

Ciao,
Virgil

From solipsis at pitrou.net  Sun Dec  7 00:15:12 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 6 Dec 2008 23:15:12 +0000 (UTC)
Subject: [Python-Dev] Buildbots for 2.6 and 3.0
Message-ID: <loom.20081206T231226-64@post.gmane.org>


Hello people,

Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots
for the release26-maint and release30-maint branches. Is someone working on that?

Regards

Antoine.





From musiccomposition at gmail.com  Sun Dec  7 00:18:05 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sat, 6 Dec 2008 17:18:05 -0600
Subject: [Python-Dev] 3.0.1 possibilities
Message-ID: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>

Since the release of 3.0, several critical issues have come to our
attention. Namely, the builtin cmp function wasn't removed [1] and the
new IO library proved to be (as expected) abysmally slow [2][3][4].
Christian proposed that we release 3.0.1 within the next week to patch
up these critical issues. Thoughts?


[1] http://bugs.python.org/1717
[2] http://bugs.python.org/4533
[3] http://bugs.python.org/4561
[4] http://bugs.python.org/4565

-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From python at rcn.com  Sun Dec  7 00:19:39 2008
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 6 Dec 2008 15:19:39 -0800
Subject: [Python-Dev] Buildbots for 2.6 and 3.0
References: <loom.20081206T231226-64@post.gmane.org>
Message-ID: <8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1>

BTW, 3.0 went out the door with test_binascii failing on windows.
Was surprised that some buildbot wasn't complaining.

----- Original Message ----- 
From: "Antoine Pitrou" <solipsis at pitrou.net>
To: <python-dev at python.org>
Sent: Saturday, December 06, 2008 3:15 PM
Subject: [Python-Dev] Buildbots for 2.6 and 3.0


> 
> Hello people,
> 
> Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots
> for the release26-maint and release30-maint branches. Is someone working on that?
> 
> Regards
> 
> Antoine.
> 
> 
> 
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/python%40rcn.com

From python at rcn.com  Sun Dec  7 00:25:06 2008
From: python at rcn.com (Raymond Hettinger)
Date: Sat, 6 Dec 2008 15:25:06 -0800
Subject: [Python-Dev] 3.0.1 possibilities
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
Message-ID: <B389435910D44A728F008569E25586F4@RaymondLaptop1>

Strong +1 
Are the RMs on board?

----- Original Message ----- 
From: "Benjamin Peterson" <musiccomposition at gmail.com>
To: <python-dev at python.org>
Sent: Saturday, December 06, 2008 3:18 PM
Subject: [Python-Dev] 3.0.1 possibilities


> Since the release of 3.0, several critical issues have come to our
> attention. Namely, the builtin cmp function wasn't removed [1] and the
> new IO library proved to be (as expected) abysmally slow [2][3][4].
> Christian proposed that we release 3.0.1 within the next week to patch
> up this critical issues. Thoughts?
> 
> 
> [1] http://bugs.python.org/1717
> [2] http://bugs.python.org/4533
> [3] http://bugs.python.org/4561
> [4] http://bugs.python.org/4565
> 
> -- 
> Cheers,
> Benjamin Peterson
> "There's nothing quite as beautiful as an oboe... except a chicken
> stuck in a vacuum cleaner."

From guido at python.org  Sun Dec  7 00:25:41 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 15:25:41 -0800
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
Message-ID: <ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>

+1

On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> Since the release of 3.0, several critical issues have come to our
> attention. Namely, the builtin cmp function wasn't removed [1] and the
> new IO library proved to be (as expected) abysmally slow [2][3][4].
> Christian proposed that we release 3.0.1 within the next week to patch
> up this critical issues. Thoughts?
>
>
> [1] http://bugs.python.org/1717
> [2] http://bugs.python.org/4533
> [3] http://bugs.python.org/4561
> [4] http://bugs.python.org/4565

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From solipsis at pitrou.net  Sun Dec  7 00:39:07 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 6 Dec 2008 23:39:07 +0000 (UTC)
Subject: [Python-Dev] 3.0.1 possibilities
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
Message-ID: <loom.20081206T233249-605@post.gmane.org>

Benjamin Peterson <musiccomposition <at> gmail.com> writes:
> 
> Since the release of 3.0, several critical issues have come to our
> attention. Namely, the builtin cmp function wasn't removed [1] and the
> new IO library proved to be (as expected) abysmally slow [2][3][4].
> Christian proposed that we release 3.0.1 within the next week to patch
> up this critical issues.

The IO library needs a lot of work to make it as fast as in 2.6, one week isn't
enough. I'm not sure an emergency release with the linked patches is very useful
honestly.




From barry at python.org  Sun Dec  7 00:41:41 2008
From: barry at python.org (Barry Warsaw)
Date: Sat, 6 Dec 2008 18:41:41 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>
Message-ID: <E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote:

> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson
> <musiccomposition at gmail.com> wrote:
>> Since the release of 3.0, several critical issues have come to our
>> attention. Namely, the builtin cmp function wasn't removed [1] and  
>> the
>> new IO library proved to be (as expected) abysmally slow [2][3][4].
>> Christian proposed that we release 3.0.1 within the next week to  
>> patch
>> up this critical issues. Thoughts?
>>
>>
>> [1] http://bugs.python.org/1717
>> [2] http://bugs.python.org/4533
>> [3] http://bugs.python.org/4561
>> [4] http://bugs.python.org/4565

I've set the priority on all these to release blockers, but I have my  
reservations about 4561 and 4565.  Resolution of those seems like more  
than a week or so away.

If we want to do a bug fix release for 3.0.1, I'd like to do it no  
later than the 19th.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTsNtXEjvBPtnXfVAQKI4AP8CNQEEb2KuN8cvd+t6YK39jFPxEo8j/YV
022zAWX3nNgj/R88C7OwoP6nYLx+zz4D3USj65OZN4NS9W9tJYKs+Lv6CnjIJi2X
cVceihcJHVYbyx8r14mYt6VjSmpTuNBD8uPZGv23WLZJZ5pNpWeuEMqI6XR27bY2
NYxbwSEUQpw=
=3wZN
-----END PGP SIGNATURE-----

From aahz at pythoncraft.com  Sun Dec  7 01:20:32 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 6 Dec 2008 16:20:32 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
References: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
Message-ID: <20081207002032.GA13190@panix.com>

On Sat, Dec 06, 2008, Guido van Rossum wrote:
>
> But I do *not* think it is a good idea to emphasize elsewhere that
> most people shouldn't use Python 3.0. Py3k will have a hard enough
> time gaining mindshare without the very developers who created
> it discouraging its use. If you can't find it in your heart to
> recommend 3.0, can you at least keep that within your circle of
> library-producing friends?

Sorry, I don't think I can do that.  It's difficult-to-impossible to leap
straight from Python 2.2 or 2.3 to 3.0, and I think that most released
Python software still ought to support versions going back that far.
Unless someone plans to use Python only on machines where they can
guarantee availability of 3.0, I think that sticking with 2.x is the
prudent course.

Then again, until the release of 3.0, I was still advocating the use of
classic classes in the 2.x series, and I haven't yet decided whether I
should change that stance now that there is a released version of Python
where new-style classes are the default.

I believe that it would be a shame and a disservice to Python if there
were a large proportion of the Python community that discouraged the use
of 3.0; I also believe it would be a shame and a disservice to Python if
you (and other people) tell conservatives like me that we should keep our
mouths shut.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From aahz at pythoncraft.com  Sun Dec  7 01:23:57 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 6 Dec 2008 16:23:57 -0800
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
Message-ID: <20081207002357.GB13190@panix.com>

On Sat, Dec 06, 2008, Benjamin Peterson wrote:
>
> Since the release of 3.0, several critical issues have come to our
> attention. Namely, the builtin cmp function wasn't removed [1] and the
> new IO library proved to be (as expected) abysmally slow [2][3][4].
> Christian proposed that we release 3.0.1 within the next week to patch
> up this critical issues. Thoughts?

Seems overly aggressive to me.  These prohibit use of 3.0 in production
environments; they do not prohibit development in 3.0.  I think we should
target early January and make it public that we are doing so.  That will
give more time for any additional similar bugs to get fixed at once.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From amauryfa at gmail.com  Sun Dec  7 01:32:00 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Sun, 7 Dec 2008 01:32:00 +0100
Subject: [Python-Dev] Buildbots for 2.6 and 3.0
In-Reply-To: <8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1>
References: <loom.20081206T231226-64@post.gmane.org>
	<8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1>
Message-ID: <e27efe130812061632x1c2e432djade3849a57586ed5@mail.gmail.com>

Hello,

On Sun, Dec 7, 2008 at 00:19, Raymond Hettinger <python at rcn.com> wrote:
> BTW, 3.0 went out the door with test_binascii failing on windows.
> Was surprised that some buildbot wasn't complaining.

They were complaining. But not loud enough to stop the release.
(see bottom of
http://www.python.org/dev/buildbot/3.0/x86%20W2k8%203.0/builds/486/step-test/0
)

-- 
Amaury Forgeot d'Arc

From ncoghlan at gmail.com  Sun Dec  7 02:02:21 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 11:02:21 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081206143454.GA15293@phd.pp.ru>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<200812051127.35880.eckhardt@satorlaser.com>	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
Message-ID: <493B209D.5070306@gmail.com>

Oleg Broytmann wrote:
> My filemanager
> (Midnight Commander, for the matter) shows these files and directories as
> "?????.???", but I can chdir to such directories, and I can open such
> files. It would be a big bad blow for me if filemanagers (or other
> programs) start to filter these filenames.

Summary for those without the time to read the longer version below:
- File managers, backup managers and similar apps should use the binary
APIs worldwide
- Most apps in countries where encoding problems are common will also
need to use the binary APIs to be acceptable to their users
- Many apps in countries where the 'native' encoding is UTF-8, ASCII or
latin-1 will be able to use the Unicode APIs without any issues whatsoever
- Apps targeting a limited, well-controlled execution environment (e.g.
web services) will also be able to use the Unicode APIs
- I think the binary and Unicode APIs should be available (and fully
functional) on all platforms (including Windows) so that app developers
don't create portability problems for themselves when they make the
decision as to which API to use
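
As a rough illustration of the parallel APIs in the summary above (a sketch
only: the temporary directory and the file name are made up for the example,
and os.fsencode comes from later Python 3 releases, not from 3.0), passing a
bytes path to os.listdir returns bytes names, while a str path returns str
names:

```python
import os
import tempfile


def list_names(directory):
    """Return (text_names, byte_names) for the same directory."""
    # str path in, str names out: the "Unicode API".
    text_names = sorted(os.listdir(directory))
    # bytes path in, bytes names out: the "binary API".
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names


# Hypothetical directory contents, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names, byte_names = list_names(tmp)
```

A file manager would keep the bytes names for every operation and decode them
only for display, so an undecodable name can still be opened.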

-------------

The point about *filesystem* apps (i.e. file managers, backup tools,
indexing engines) needing to deal with the imperfect world of dodgy
filesystem encodings isn't in dispute at all - that's why the binary
alternative APIs were added.

The point is that there is a spectrum from providing a completely clean
solution that addresses only the ideal case of "file paths and other
items such as environment variable names and values retrieved from the
OS are always well-formed text in the appropriate default encoding"
(which will actually work for large chunks of the planet - those where
the locals are native ASCII speakers and those where computers didn't
start to enter widespread use until after Unicode was already available)
to addressing only the most pessimistic case of "you can't trust the
default encoding at all, and need to assume that all strings retrieved
from the OS contain arbitrary binary data" (which is actually true for
some parts of the planet, but thankfully not for all of it).

Hopefully people can at least agree that the first extreme is
unacceptable because that ideal world doesn't exist. I personally think
that the other extreme is *also* unacceptable, because it burdens every
single application developer with dealing with a potential problem that
quite simply may not be a problem for them because they're in a
situation where the naive assumption of a sane operating environment is
actually a valid one for their particular application.

The idea of parallel Unicode and bytes APIs means that for those with an
appropriately limited target environment and/or audience, the Unicode
APIs will "just work", while the developers that aren't so lucky can
rely on the binary APIs instead.

That's actually the one place where I disagree with Guido: I agree with
Adam that the binary APIs *should* be available on Windows.

The difference would be that whereas on *nix type systems, the bytes
APIs are the 'lower level' that more accurately represents the
underlying OS, on Windows it would be the other way around, with the
Unicode APIs as the lower level ones, and the binary APIs as wrappers
around them that automatically decoded the bytes representation to a
Unicode one when writing to the OS, and encoded from Unicode to bytes
when reading from the OS.

If the binary APIs are missing from a major platform (i.e. Windows) then
the choice to use them brings with it a major cross-platform portability
problem that should really be handled by the standard library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From brett at python.org  Sun Dec  7 02:10:04 2008
From: brett at python.org (Brett Cannon)
Date: Sat, 6 Dec 2008 17:10:04 -0800
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>
	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
Message-ID: <bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>

On Sat, Dec 6, 2008 at 15:41, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote:
>
>> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson
>> <musiccomposition at gmail.com> wrote:
>>>
>>> Since the release of 3.0, several critical issues have come to our
>>> attention. Namely, the builtin cmp function wasn't removed [1] and the
>>> new IO library proved to be (as expected) abysmally slow [2][3][4].
>>> Christian proposed that we release 3.0.1 within the next week to patch
>>> up this critical issues. Thoughts?
>>>
>>>
>>> [1] http://bugs.python.org/1717
>>> [2] http://bugs.python.org/4533
>>> [3] http://bugs.python.org/4561
>>> [4] http://bugs.python.org/4565
>
> I've set the priority on all these to release blockers, but I have my
> reservations about 4561 and 4565.  Resolution of those seem like more than a
> week or so away.
>
> If we want to do a bug fix release for 3.0.1, I'd like to do it no later
> than the 19th.
>

+1 just to get rid of cmp(). And if io speedups can happen, great, but
they can also wait for 3.0.2.

-Brett

From ncoghlan at gmail.com  Sun Dec  7 02:12:24 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 11:12:24 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493AB3E6.7070806@gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com>
	<493AB3E6.7070806@gmail.com>
Message-ID: <493B22F8.8090902@gmail.com>

Toshio Kuratomi wrote:
> Note 2: If there isn't a parallel API on all platforms, for instance,
> Guido's proposal to not have os.environb on Windows, then you'll still
> have to have a platform specific check. (Likely you should try to access
> os.environb in this instance and if it doesn't exist, use os.environ
> instead... and remember that you need to either change os.environ's data
> into str type or change os.environb's data into byte type.)

Note that this is why I personally think the binary API variants
*should* exist on Windows, just with the sense of the system encoding
flipped around.

That is, on *nix:
- underlying OS API uses bytes
- binary API just passes values straight through
- Unicode API uses the system encoding to encode Unicode names and
values to be passed to the OS API and to decode bytes names and values
received from the OS API

While on Windows:
- underlying OS API uses Unicode
- Unicode API just passes values straight through
- binary API uses the system encoding to decode bytes names and values
to be passed to the OS API and to encode Unicode names and values
received from the OS API
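
The two layerings above can be sketched roughly as follows. This is only a
model, not an existing API: the function names and the sample environments
are hypothetical, and the "utf-8" default merely stands in for whatever
encoding the wrapper would use.

```python
def unicode_view(bytes_env, encoding="utf-8"):
    """*nix style: the OS hands out bytes, the Unicode API decodes them."""
    return {name.decode(encoding): value.decode(encoding)
            for name, value in bytes_env.items()}


def bytes_view(unicode_env, encoding="utf-8"):
    """Windows style: the OS hands out Unicode, the binary API encodes it."""
    return {name.encode(encoding): value.encode(encoding)
            for name, value in unicode_env.items()}


# Hypothetical "native" environments for each platform.
native_posix = {b"HOME": b"/home/nick"}
native_windows = {"USERPROFILE": "C:\\Users\\nick"}

posix_text = unicode_view(native_posix)        # wrapper on *nix
windows_bytes = bytes_view(native_windows)     # wrapper on Windows
```

In both cases the native mapping is passed straight through by one API and
converted by the other, so an application can pick one flavour and use it on
every platform.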

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sun Dec  7 02:12:47 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 7 Dec 2008 01:12:47 +0000 (UTC)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<200812051127.35880.eckhardt@satorlaser.com>	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com>
Message-ID: <loom.20081207T011013-279@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> If the binary APIs are missing from a major platform (i.e. Windows) then
> the choice to use them brings with it a major cross-platform portability
> problem that should really be handled by the standard library.

+1

I might also add that providing binary APIs does not prevent us from implementing
some special representation of broken filenames when using the unicode APIs (for
example using private Unicode characters - I'm not sure what the right
terminology is - as sometimes suggested).
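
One way such a representation can work, sketched with the "surrogateescape"
error handler that later Python 3 releases provide. Note the assumption: that
handler maps undecodable bytes to lone surrogate code points rather than to
private-use characters, so it is an analogue of the idea, not the exact scheme
suggested here. The file name is made up for the example.

```python
raw_name = b"caf\xe9.txt"   # Latin-1 bytes, not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
text_name = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", "surrogateescape")
```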

Regards

Antoine.



From ncoghlan at gmail.com  Sun Dec  7 02:27:56 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 11:27:56 +1000
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local>
Message-ID: <493B269C.9020303@gmail.com>

Warren DeLano wrote:
> In other words we have lost the ability to refer to "as" as the
> generalized OOP-compliant/syntax-independent method name for casting:

Other possible spellings:

# Use the normal Python idiom for avoiding keyword clashes
# and append a trailing underscore
new_object = old_object.as_(class_hint)
float_obj = int_obj.as_("float")
float_obj = int_obj.as_(float_class)

# Use a different word (such as, oh, "cast" perhaps?)
new_object = old_object.cast(class_hint)
float_obj = int_obj.cast("float")
float_obj = int_obj.cast(float_class)
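
A minimal runnable sketch of the first spelling, assuming a hypothetical
Celsius class that exists only for illustration:

```python
import keyword


class Celsius:
    """Hypothetical value type used only to show the as_ spelling."""

    def __init__(self, degrees):
        self.degrees = degrees

    def as_(self, target_type):
        # The trailing underscore avoids the clash with the "as" keyword.
        return target_type(self.degrees)


reading = Celsius(21)
as_float = reading.as_(float)

# The keyword module confirms which of the two names is reserved.
as_is_reserved = keyword.iskeyword("as")
as_underscore_is_reserved = keyword.iskeyword("as_")
```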

You could make a PEP if you really wanted to, but it's going to be rejected.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From nd at perlig.de  Sun Dec  7 02:35:41 2008
From: nd at perlig.de (André Malo)
Date: Sun, 7 Dec 2008 02:35:41 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B22F8.8090902@gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
Message-ID: <200812070235.41321@news.perlig.de>

* Nick Coghlan wrote:

> Toshio Kuratomi wrote:
> > Note 2: If there isn't a parallel API on all platforms, for instance,
> > Guido's proposal to not have os.environb on Windows, then you'll still
> > have to have a platform specific check. (Likely you should try to
> > access os.environb in this instance and if it doesn't exist, use
> > os.environ instead... and remember that you need to either change
> > os.environ's data into str type or change os.environb's data into byte
> > type.)
>
> Note that this is why I personally think the binary API variants
> *should* exist on Windows, just with the sense of the system encoding
> flipped around.
>
> That is, on *nix:
> - underlying OS API uses bytes
> - binary API just passes values straight through
> - Unicode API uses the system encoding to encode Unicode names and
> values to be passed to the OS API and to decode bytes names and values
> received from the OS API
>
> While on Windows:
> - underlying OS API uses Unicode
> - Unicode API just passes values straight through
> - binary API uses the system encoding to decode bytes names and values
> to be passed to the OS API and to encode Unicode names and values
> received from the OS API

Now that is somewhat strange. That way you'll have two unreliable APIs and 
need to switch depending on the platform again.

nd
-- 
+++++[>++++++<-]>++>++++++[>++++++++++++<-]>++.<++++[>++++++++++<-]>+++.--.
+.<
<.>++++[>----<-]>---.<+++[>++++<-]>+.+.
+++++.<+++[>----<-]>.---.<+++[>++++<-]>
+.<<.>+++++[>-------<-]>+.<+++++[>++++<-]>+.<+++[>++++<-]>+.------.<<.>++++++
[>------<-]>.<+++++[>+++++<-]>.++.++++++++.------.<+++[>++++<-]>+.

From ncoghlan at gmail.com  Sun Dec  7 02:45:52 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 11:45:52 +1000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081207002032.GA13190@panix.com>
References: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081207002032.GA13190@panix.com>
Message-ID: <493B2AD0.30303@gmail.com>

Aahz wrote:
> I believe that it would be a shame and a disservice to Python if there
> were a large proportion of the Python community that discouraged the use
> of 3.0; I also believe it would be a shame and a disservice to Python if
> you (and other people) tell conservatives like me that we should keep our
> mouths shut.

I don't think being honest about the situation is going to hurt anything
in the long run. There are lots of advantages to 3.0, but also plenty of
good reasons to stick with 2.x as well.

At this point in time, my own recommendation would be that if someone
doesn't have time to do a proper evaluation of the situation (talking
production development here, not "learning for fun"), then I would
probably still point them at 2.5. That recommendation will probably
change to 2.6 in a couple of months (since it usually takes a few months
after a release for the rest of the Python ecosystem to catch up with a
new 2.x release).

If they have the time though, my recommendation would be for them to do
their *own* evaluation, looking both at things that favour 3.0 like
Unicode handling and general developer convenience, as well as the
things that currently favour 2.x like IO speed and availability of 3rd
party libraries.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Sun Dec  7 02:51:30 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 11:51:30 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812070235.41321@news.perlig.de>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<493AB3E6.7070806@gmail.com>
	<493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de>
Message-ID: <493B2C22.5060907@gmail.com>

André Malo wrote:
>> While on Windows:
>> - underlying OS API uses Unicode
>> - Unicode API just passes values straight through
>> - binary API uses the system encoding to decode bytes names and values
>> to be passed to the OS API and to encode Unicode names and values
>> received from the OS API
> 
> Now that is somewhat strange. That way you'll have two unreliable APIs and 
> need to switch depending on the platform again.

Sorry, system encoding was probably a poor choice of words there, since
that generally means mbcs when talking about windows (which would indeed
be a very poor choice of encoding).

For binary wrappers around the Windows Unicode APIs, I was thinking
specifically of using UTF-8, since that should be able to encode
anything the Unicode APIs can handle.
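
A small sketch of why the choice of encoding matters here. The file name is
invented, and cp1252 is used only as a stand-in for an mbcs-style code page
(the "mbcs" codec itself exists only on Windows):

```python
name = "\u0444\u0430\u0439\u043b.txt"   # a Cyrillic file name

# UTF-8 can represent any well-formed Unicode text, and round-trips it.
utf8_bytes = name.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

# A legacy code page cannot represent characters outside its repertoire.
try:
    name.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False
```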

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Sun Dec  7 02:56:44 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 07 Dec 2008 02:56:44 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081207002032.GA13190@panix.com>
References: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>	<20081205023514.GA1723@amk.local>	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081207002032.GA13190@panix.com>
Message-ID: <493B2D5C.4090505@v.loewis.de>

> Sorry, I don't think I can do that.  It's difficult-to-impossible to leap
> straight from Python 2.2 or 2.3 to 3.0

My experience is different. That is very well possible (of course, I
haven't heard in a long time of a project that needs to maintain
compatibility with 2.2).

Regards,
Martin

From martin at v.loewis.de  Sun Dec  7 02:58:00 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 07 Dec 2008 02:58:00 +0100
Subject: [Python-Dev] Buildbots for 2.6 and 3.0
In-Reply-To: <loom.20081206T231226-64@post.gmane.org>
References: <loom.20081206T231226-64@post.gmane.org>
Message-ID: <493B2DA8.2000105@v.loewis.de>

> Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots
> for the release26-maint and release30-maint branches. Is someone working on that?

Yes. I won't enable 2.6 build slaves until 2.5.3 is released, but will
afterwards.

Regards,
Martin

From aahz at pythoncraft.com  Sun Dec  7 04:31:16 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 6 Dec 2008 19:31:16 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B209D.5070306@gmail.com>
References: <ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com>
Message-ID: <20081207033116.GB12097@panix.com>

On Sun, Dec 07, 2008, Nick Coghlan wrote:
>
> If the binary APIs are missing from a major platform (i.e. Windows) then
> the choice to use them brings with it a major cross-platform portability
> problem that should really be handled by the standard library.

+1
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From warren at delsci.com  Sun Dec  7 05:19:08 2008
From: warren at delsci.com (Warren DeLano)
Date: Sat, 6 Dec 2008 20:19:08 -0800
Subject: [Python-Dev] "as" keyword woes
Message-ID: <896B75251BA19745A529B1B867893FA5DB11@planet.delsci.local>

> Date: Sat, 6 Dec 2008 12:13:16 -0800 (PST)
> From: Carl Banks <pavlovevidence at gmail.com>
> Subject: Re: "as" keyword woes
> To: python-list at python.org
> Message-ID:
>
> (snip)
> 	
> If you write a PEP, I advise you to try to sound less whiny and than
> you have in this thread.  
>
> (snip)

Ehem, well, such comments notwithstanding, I thank everyone who
responded to my latest post on this topic for taking my inquiry
seriously, and for providing cogent, focused, well-reasoned feedback
while not resorting to name-calling, to false accusations on top of
baseless assumptions, or to explicit personal attacks on my competence,
sincerity, experience, credibility, or form.  

To you especially, I am grateful for your input and for your years of
service to the community and to the noble ideals you embody in the
Python project.  May the rest of us (not just myself) be ashamed of our
lesser conduct and learn from your exemplary performance.

So to summarize, having assimilated all responses over the past several
days (python-list as well as python-dev, for the newcomers), I now
accept the following as self-evident:

-> "as", as a Python keyword, is here to stay:  Love it or leave it.

-> Likewise ditto for the GIL: if you truly need Python concurrency
within a single process, then use a Python implementation other than
CPython.

Season's greetings to all!  Peace.

Cheers,
Warren

From guido at python.org  Sun Dec  7 05:20:07 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Dec 2008 20:20:07 -0800
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <493B2AD0.30303@gmail.com>
References: <E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081207002032.GA13190@panix.com> <493B2AD0.30303@gmail.com>
Message-ID: <ca471dc20812062020o44a3d98er446b7c786252cb2f@mail.gmail.com>

On Sat, Dec 6, 2008 at 5:45 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Aahz wrote:
>> I believe that it would be a shame and a disservice to Python if there
>> were a large proportion of the Python community that discouraged the use
>> of 3.0; I also believe it would be a shame and a disservice to Python if
>> you (and other people) tell conservatives like me that we should keep our
>> mouths shut.

I hope I am not perceived as telling you to keep your mouth shut. I am
merely hoping that you will decide for yourself after having heard me
out.

> I don't think being honest about the situation is going to hurt anything
> in the long run. There are lots of advantages to 3.0, but also plenty of
> good reasons to stick with 2.x as well.
>
> At this point in time, my own recommendation would be that if someone
> doesn't have time to do a proper evaluation of the situation (talking
> production development here, not "learning for fun"), then I would
> probably still point them at 2.5. That recommendation will probably
> change to 2.6 in a couple of months (since it usually takes a few months
> after a release for the rest of the Python ecosystem to catch up with a
> new 2.x release).
>
> If they have the time though, my recommendation would be for them to do
> their *own* evaluation, looking both at things that favour 3.0 like
> Unicode handling and general developer convenience, as well as the
> things that currently favour 2.x like IO speed and availability of 3rd
> party libraries.

That sounds right. I just heard (via Martin) that PEP 3131 (Unicode
letters in identifiers) is already a big hit in Japan.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Sun Dec  7 05:53:07 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 6 Dec 2008 21:53:07 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B2C22.5060907@gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>
Message-ID: <aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>

On Sat, Dec 6, 2008 at 6:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> André Malo wrote:
>>> While on Windows:
>>> - underlying OS API uses Unicode
>>> - Unicode API just passes values straight through
>>> - binary API uses the system encoding to decode bytes names and values
>>> to be passed to the OS API and to encode Unicode names and values
>>> received from the OS API
>>
>> Now that is somewhat strange. That way you'll have two unreliable APIs and
>> need to switch depending on the platform again.
>
> Sorry, system encoding was probably a poor choice of words there, since
> that generally means mbcs when talking about Windows (which would indeed
> be a very poor choice of encoding).
>
> For binary wrappers around the Windows Unicode APIs, I was thinking
> specifically of using UTF-8, since that should be able to encode
> anything the Unicode APIs can handle.

If the Unicode APIs only have correct unicode, sure.  If not you'll
get errors translating to UTF-8 (and the byte APIs are supposed to
pass bad names through unaltered.)  Kinda ironic, no?


-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Sun Dec  7 07:07:08 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Sat, 06 Dec 2008 22:07:08 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
Message-ID: <493B680C.6010605@gmail.com>

Guido van Rossum wrote:
> On Sat, Dec 6, 2008 at 10:53 AM,  <glyph at divmod.com> wrote:

>> I find it interesting to note that the only users in this discussion who
>> actually have these problems in real life all have this attitude.  It is
>> expected that in an imperfect world we will have imperfect encodings, but it
>> is super important that software which can open files can deal with not
>> understanding the character translation of the filename.
> 
> For file managers and similar tools I am absolutely 100% in agreement
> -- that's why the binary APIs are there.
> 
> Most apps aren't file managers or ftp clients though. The sky is not falling.
> 
I agree that the sky is not falling (as long as we get a binary API for
env vars in 3.1) but I'm still wondering what the use case you see is.
Most apps aren't file managers or ftp clients but when they interact
with files (for instance, a file selection dialog) they need to be able
to show the user all the relevant files.  So on an app-by-app basis the
need for this is high.  On a code basis, I'd hope that most file
selection dialogs are pulled out into libraries... but that still
doesn't help me identify when someone would expect that asking python
for a list of all files in a directory or a specific set of files in a
directory should, without warning, return only a subset of them.  In
what situations is this appropriate behaviour?

-Toshio

From glyph at divmod.com  Sun Dec  7 08:05:48 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sun, 07 Dec 2008 07:05:48 -0000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B680C.6010605@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
	<493B680C.6010605@gmail.com>
Message-ID: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>


On 06:07 am, a.badger at gmail.com wrote:
>Guido van Rossum wrote:
>>On Sat, Dec 6, 2008 at 10:53 AM,  <glyph at divmod.com> wrote:
>
>>>I find it interesting to note that the only users in this discussion 
>>>who
>>>actually have these problems in real life all have this attitude.

>>For file managers and similar tools I am absolutely 100% in agreement
>>-- that's why the binary APIs are there.

>>Most apps aren't file managers or ftp clients though. The sky is not 
>>falling.

>Most apps aren't file managers or ftp clients but when they interact
>with files (for instance, a file selection dialog) they need to be able
>to show the user all the relevant files.  So on an app-by-app basis the
>need for this is high.

While I tend to agree emphatically with this, the *real* solution here 
is a path-abstraction library.  In separate discussions, the difficulty 
of getting such a thing into the standard library has been discussed, 
due to the wide variety of opinions as to what it should look like (and 
the shocking level of difficulty involved in making such a thing really 
work correctly).

I'd be very happy to talk to you off-list about my ideas for such a 
thing, but I'd rather not resurrect yet another tedious discussion here 
just now :).

>On a code basis, I'd hope that most file
>selection dialogs are pulled out into libraries... but that still
>doesn't help me identify when someone would expect that asking python
>for a list of all files in a directory or a specific set of files in a
>directory should, without warning, return only a subset of them.  In
>what situations is this appropriate behaviour?

If you say listdir(unicode) on a POSIX OS, your program is saying "I 
only know how to deal with unicode results from this function, so please 
only give me those.".  If your program is smart enough to deal with 
bytes, then you would have asked for bytes, no?  Returning only 
filenames which can be properly decoded makes sense.  Otherwise everyone 
needs to learn about this highly confusing issue, even for the simplest 
scripts.

Skipping undecodable values is good enough that it will work 90% of the 
time.  When you need to get to 100%, it won't be impossible - the bytes 
APIs will be there.  In the longer term, hopefully some path abstraction 
will eventually be there too.  We should not wait for a perfectly 
correct path abstraction to arrive before providing the primitives to do 
it yourself, though.
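
A minimal sketch of the "skip undecodable values" behaviour described
above, assuming UTF-8 as the filesystem encoding.  The helper name
decodable_only is hypothetical (it is not a stdlib function); it only
illustrates the filtering on a list of raw byte names, not what any
particular Python release does inside os.listdir().

```python
def decodable_only(raw_names, encoding="utf-8"):
    """Return the names that decode cleanly; silently drop the rest."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Undecodable name: skipped without any warning to the caller.
            continue
    return result


# The second name is Latin-1 encoded, so it is not valid UTF-8.
names = [b"report.txt", b"caf\xe9.txt", b"notes.txt"]
print(decodable_only(names))
```

A caller who needs every entry would work with the raw bytes names
instead of the filtered list.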

From hfuerstenau at gmx.net  Sun Dec  7 10:19:52 2008
From: hfuerstenau at gmx.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=)
Date: Sun, 07 Dec 2008 10:19:52 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<493AB3E6.7070806@gmail.com>
	<493B22F8.8090902@gmail.com>	<200812070235.41321@news.perlig.de>
	<493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
Message-ID: <493B9538.9080107@gmx.net>

> If the Unicode APIs only have correct unicode, sure.  If not you'll
> get errors translating to UTF-8 (and the byte APIs are supposed to
> pass bad names through unaltered.)  Kinda ironic, no?

As far as I can see all Python Unicode strings can be encoded to UTF-8,
even things like lone surrogates because Python doesn't care about them.
So both the Unicode API and the binary API would be fail-safe on Windows.

- Hagen


From rhamph at gmail.com  Sun Dec  7 10:21:01 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 7 Dec 2008 02:21:01 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B923F.6010706@gmx.net>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
Message-ID: <aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>

On Sun, Dec 7, 2008 at 2:07 AM, Hagen Fürstenau <hfuerstenau at gmx.net> wrote:
>> If the Unicode APIs only have correct unicode, sure.  If not you'll
>> get errors translating to UTF-8 (and the byte APIs are supposed to
>> pass bad names through unaltered.)  Kinda ironic, no?
>
> As far as I can see all Python Unicode strings can be encoded to UTF-8,
> even things like lone surrogates because Python doesn't care about them.
> So both the Unicode API and the binary API would be fail-safe on Windows.

Python is broken and needs to be fixed.

http://bugs.python.org/issue3672
http://bugs.python.org/issue3297


-- 
Adam Olsen, aka Rhamphoryncus

From hfuerstenau at gmx.net  Sun Dec  7 10:35:15 2008
From: hfuerstenau at gmx.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=)
Date: Sun, 07 Dec 2008 10:35:15 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>	
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>	
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>	
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
Message-ID: <493B98D3.8070405@gmx.net>

>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>> even things like lone surrogates because Python doesn't care about them.
>> So both the Unicode API and the binary API would be fail-safe on Windows.
> 
> Python is broken and needs to be fixed.
> 
> http://bugs.python.org/issue3672
> http://bugs.python.org/issue3297

But the question of whether Python should care about lone surrogates or
not is at best tangential to the issue at hand.  If you have lone
surrogates in the Unicode API (and didn't raise an exception on the way
getting there), then the sensible thing is to encode them into lone
UTF-8 surrogates.  Even if you wanted to prevent lone surrogates,
encoding to UTF-8 for the binary API would not be the place to enforce it.

- Hagen

From g.brandl at gmx.net  Sun Dec  7 12:41:03 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 07 Dec 2008 12:41:03 +0100
Subject: [Python-Dev] Rewrite map for old URLs in place
Message-ID: <ghgcpk$b90$1@ger.gmane.org>

Hi,

with a bit of delay I finally got around to creating a mod_rewrite map of
the 2.5 URLs.  URLs like http://docs.python.org/tut/node3.html will now
point permanently to the new URL.

Let me know if you find a problem.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From ncoghlan at gmail.com  Sun Dec  7 13:55:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 07 Dec 2008 22:55:08 +1000
Subject: [Python-Dev] Rewrite map for old URLs in place
In-Reply-To: <ghgcpk$b90$1@ger.gmane.org>
References: <ghgcpk$b90$1@ger.gmane.org>
Message-ID: <493BC7AC.50405@gmail.com>

Georg Brandl wrote:
> Hi,
> 
> with a bit of delay I finally got around to creating a mod_rewrite map of
> the 2.5 URLs.  URLs like http://docs.python.org/tut/node3.html will now
> point permanently to the new URL.
> 
> Let me know if you find a problem.

Excellent news!

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From steve at holdenweb.com  Sun Dec  7 14:38:58 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sun, 07 Dec 2008 08:38:58 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>
Message-ID: <493BD1F2.5080300@holdenweb.com>

Brett Cannon wrote:
> On Sat, Dec 6, 2008 at 15:41, Barry Warsaw <barry at python.org> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote:
>>
>>> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson
>>> <musiccomposition at gmail.com> wrote:
>>>> Since the release of 3.0, several critical issues have come to our
>>>> attention. Namely, the builtin cmp function wasn't removed [1] and the
>>>> new IO library proved to be (as expected) abysmally slow [2][3][4].
>>>> Christian proposed that we release 3.0.1 within the next week to patch
>>>> up these critical issues. Thoughts?
>>>>
>>>>
>>>> [1] http://bugs.python.org/1717
>>>> [2] http://bugs.python.org/4533
>>>> [3] http://bugs.python.org/4561
>>>> [4] http://bugs.python.org/4565
>> I've set the priority on all these to release blockers, but I have my
>> reservations about 4561 and 4565.  Resolution of those seems like more than a
>> week or so away.
>>
>> If we want to do a bug fix release for 3.0.1, I'd like to do it no later
>> than the 19th.
>>
> 
> +1 just to get rid of cmp(). And if io speedups can happen, great, but
> they can also wait for 3.0.2.
> 
A point release just to remove a function whose withdrawal has been
advertised as a 3.0 change hardly seems worth the substantial effort of
cutting a release. If cmp() shouldn't have been in 3.0 and was, then
there's surely no problem about removing it later as promised: anyone
who uses it in 3.0 code shouldn't be.

If it doesn't have to wait for a major release then is there any real
need to cut the minor release immediately?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From ziade.tarek at gmail.com  Sun Dec  7 18:27:54 2008
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Sun, 7 Dec 2008 18:27:54 +0100
Subject: [Python-Dev] distutils patches, request for review
Message-ID: <94bdd2610812070927o5154c4edx9114c4a006edb9d@mail.gmail.com>

Hi,

I am looking for a core developer to review a few patches for distutils.

#1 is mandatory (it removes a bad bug)
#2 is very nice to have
#3 to #5 are test coverage and code beautification

In order:

1. #4400 : the default generated .pypirc is broken. This patch fixes
it: http://bugs.python.org/issue4400
2. #4394 : no need to store the password in pypirc anymore : using the
prompt if not stored. http://bugs.python.org/issue4394
3. #2461 : more test coverage. http://bugs.python.org/issue2461
4. #3992 : removes custom log implementation -> uses logging instead.
http://bugs.python.org/issue3992
5. #3985 : more cleanup. http://bugs.python.org/issue3985
6. #3986 : http://bugs.python.org/issue3986

Some of them are a few months old so I can refresh the patch on the
current trunk(s) as soon as they are picked.

Regards
Tarek

-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From guido at python.org  Sun Dec  7 18:32:51 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Dec 2008 09:32:51 -0800
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493BD1F2.5080300@holdenweb.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>
	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>
	<493BD1F2.5080300@holdenweb.com>
Message-ID: <ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>

On Sun, Dec 7, 2008 at 5:38 AM, Steve Holden <steve at holdenweb.com> wrote:
> A point release just to remove a function whose withdrawal has been
> advertised as a 3.0 change hardly seems worth the substantial effort of
> cutting a release. If cmp() shouldn't have been in 3.0 and was, then
> there's surely no problem about removing it later as promised: anyone
> who uses it in 3.0 code shouldn't be.
>
> If it doesn't have to wait for a major release then is there any real
> need to cut the minor release immediately?

Well, since 2to3 doesn't remove cmp, and it actually works, it's
likely that people will be accidentally depending on it in code
converted from 2.x. In the past, where there was a discrepancy between
docs and code, we've often ruled in favor of the code using arguments
like "it always worked like this so we'll break working code if we
change it now". There's clearly an argument of timeliness there, which
is why we'd like to get this fixed ASAP. The alternative, which nobody
likes, would be to keep it around, deprecate it in 3.1, and remove it
in 3.2 or 3.3.
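
For code converted from 2.x that turns out to depend on cmp, a minimal
sketch of a local replacement is shown here.  It uses the
(a > b) - (a < b) idiom; defining it under the name cmp in the calling
module is an illustrative choice, not something 2to3 does for you.

```python
def cmp(a, b):
    """Three-way comparison: -1, 0 or 1, like the 2.x builtin."""
    # Each comparison yields a bool; subtracting them gives an int.
    return (a > b) - (a < b)


pairs = [(1, 2), (2, 2), (3, 2)]
print([cmp(a, b) for a, b in pairs])
```

This only covers operands that support < and >; it does not restore the
2.x ordering of unrelated types.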

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Sun Dec  7 18:35:53 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 7 Dec 2008 10:35:53 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B98D3.8070405@gmx.net>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
Message-ID: <aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>

On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau <hfuerstenau at gmx.net> wrote:
>>> As far as I can see all Python Unicode strings can be encoded to UTF-8,
>>> even things like lone surrogates because Python doesn't care about them.
>>> So both the Unicode API and the binary API would be fail-safe on Windows.
>>
>> Python is broken and needs to be fixed.
>>
>> http://bugs.python.org/issue3672
>> http://bugs.python.org/issue3297
>
> But the question of whether Python should care about lone surrogates or
> not is at best tangential to the issue at hand.  If you have lone
> surrogates in the Unicode API (and didn't raise an exception on the way
> getting there), then the sensible thing is to encode them into lone
> UTF-8 surrogates.  Even if you wanted to prevent lone surrogates,
> encoding to UTF-8 for the binary API would not be the place to enforce it.

No.  Unicode *requires* them to be treated as errors.  If you want to
pass them through then you're creating a custom encoding... which you
might argue for in this case, but it needs to be clearly separate from
the real UTF-8.
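
The distinction can be sketched as follows, assuming a current CPython 3
interpreter (the 3.0 behaviour debated in this thread differed).  The
function names are hypothetical; "surrogatepass" is the codec error
handler that later releases provide for the pass-through variant, kept
separate from strict UTF-8.

```python
LONE_SURROGATE = "\ud800"  # a lone high surrogate, not a Unicode scalar value


def encode_strict(text):
    """Encode with the standard UTF-8 codec; lone surrogates are an error."""
    return text.encode("utf-8")


def encode_passthrough(text):
    """Encode with the 'surrogatepass' handler, which lets surrogates through."""
    return text.encode("utf-8", "surrogatepass")


try:
    encode_strict(LONE_SURROGATE)
except UnicodeEncodeError as exc:
    print("strict UTF-8 refused the string:", exc.reason)

print(encode_passthrough(LONE_SURROGATE))
```

The bytes produced by the second function are not valid UTF-8, so a
strict decoder on the other side will reject them.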


-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Sun Dec  7 19:03:13 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Sun, 07 Dec 2008 10:03:13 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
Message-ID: <493C0FE1.30506@gmail.com>

glyph at divmod.com wrote:
> 
> On 06:07 am, a.badger at gmail.com wrote:
>> Most apps aren't file managers or ftp clients but when they interact
>> with files (for instance, a file selection dialog) they need to be able
>> to show the user all the relevant files.  So on an app-by-app basis the
>> need for this is high.
> 
> While I tend to agree emphatically with this, the *real* solution here
> is a path-abstraction library.

Why don't you send me some information offlist.  I'm not sure I agree
that a path-abstraction library can work correctly but if it can it
would be nice to have that at a level higher than the file-dialog
libraries that I was envisioning.

[snip]

>> ... but that still
>> doesn't help me identify when someone would expect that asking python
>> for a list of all files in a directory or a specific set of files in a
>> directory should, without warning, return only a subset of them.  In
>> what situations is this appropriate behaviour?
> 
> If you say listdir(unicode) on a POSIX OS, your program is saying "I
> only know how to deal with unicode results from this function, so please
> only give me those.".

No.  (explained below)

>  If your program is smart enough to deal with
> bytes, then you would have asked for bytes, no?

Yes (explained below)

>  Returning only
> filenames which can be properly decoded makes sense.  Otherwise everyone
> needs to learn about this highly confusing issue, even for the simplest
> scripts.
>
os.listdir(unicode) (currently) means that the *programmer* is asking
that the stdlib return the decodable filenames from this directory.  The
question is whether the programmer understood that this is what they
were asking for and whether it is what they most likely want.  I would
make the following statements with regard to this:

1) The programmer most likely does not want decodable filenames and only
decodable filenames.  If they did, we'd see a lot of python2.x code that
turns pathnames into unicode and discards everything that wasn't
decodable.  No one has given a use case for finding only the *decodable*
subset of files.  If I request to see all *.py files in a directory, I
want to see all of the *.py files in the directory, decodable or not.
If you can show how programmers intend "90%" of their calls to
os.listdir()/glob.glob('*.txt') to show only the decodable subset of the
results, then the foundation of my arguments is gone.  So please, give
examples to prove this wrong.

  - If this is true, a definition of os.listdir(<type 'str'>) that would
better meet programmer expectation would be: "Give me all files in a
directory with the output as str type".  The definition of
os.listdir(<type 'bytes'>) would be "Give me all files in a directory
with the output as bytes type".  Raising an exception when the filenames
are undecodable is perfectly reasonable in this situation.

2) For the programmer to understand the difference between
os.listdir(<type 'bytes'>) and os.listdir(<type 'str'>) they have to
understand the "highly confusing issue" and what it means for their
code.  So the current method is forcing programmers to understand it
even for the simplest scripts if their environment is not uniform with
no clue from the interpreter that there is an issue.

  - Similarly, raising an exception on undecodable values means that the
programmer can ignore the issue in any scripts in sane environments and
will be told that they need to deal with it (via an exception) when
their script runs in a non-sane environment.

3) The usage of unicode vs bytes is easy to miss for someone starting
with py2.x or windows and moving to a multi-platform or unix project.
Even simple testing won't reveal the problem unless the programmer knows
that they have to test what happens when encodings are mixed.  Once
again, this is requiring the programmer to understand the encoding issue
 without help from the interpreter.
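
The alternative argued for in the points above (raise instead of skip)
can be sketched like this.  Both helper names are hypothetical and the
code is only an illustration of the proposal, not the behaviour of
os.listdir() in any release: names are fetched as bytes and decoded
strictly, so the first undecodable filename raises UnicodeDecodeError.

```python
import os
import sys


def decode_names_strict(raw_names, encoding="utf-8"):
    """Decode every name; raise UnicodeDecodeError on the first bad one."""
    return [raw.decode(encoding, "strict") for raw in raw_names]


def listdir_strict(path, encoding=None):
    """List a directory as str, failing loudly on undecodable names."""
    encoding = encoding or sys.getfilesystemencoding()
    raw_names = os.listdir(os.fsencode(path))  # a bytes path gives bytes names
    return decode_names_strict(raw_names, encoding)


try:
    decode_names_strict([b"index.html", b"caf\xe9.html"])
except UnicodeDecodeError as exc:
    # The traceback (or this message) points at the offending bytes.
    print("undecodable filename:", exc.object)
```

With this shape, a script in a uniform environment never sees the
exception, and one in a mixed environment is told about the problem on
its first run.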

> Skipping undecodable values is good enough that it will work 90% of the
> time.

You and Guido have now made this claim to defend not raising an
exception but I still don't have a use case.

Here are use cases that I see:

* Bill is coding an application for use inside his company.  His company
only uses utf-8.  His code naively uses os.listdir(<type 'str'>).

  - The code does not throw an exception whether we use the current
os.listdir() or one that could throw an exception because the system
admins have sanitised the environment.  Bill did not need to understand
the implications of encoding for his code to work, whether the script
is simple or complex.

* Mary is coding an application for use inside her company.  It finds
all html files on a system and updates her company's copyright, privacy
policy, and other legal boilerplate.  Her expectation is that after her
program runs every file will have been updated.  Her environment is a
mixture of different filename encodings due to having many legacy
documents for users in different locales.  Mary's code also naively uses
os.listdir(<type 'str'>).  Her test case checks that the code does the
right thing on many languages but unfortunately doesn't check with
different encodings because she'd have to already understand the
encoding issue to check for that.

  - With the current approach, the code will silently do the wrong thing
in production for years, until someone notices and alerts the company
that something is wrong with certain files in certain locales.  By then,
Mary may no longer be involved with the company and there are thousands
of users who thought they were operating under the old legal terms
instead of the new ones.

  - With exceptions raised, Mary will be alerted of the problem when she
tries to run the code in production for the first time.  She can then do
a little research and fix it to run correctly.  The traceback that's
issued can be googled and the line that it points to will show where the
error is occurring.

* Arthur's company has shipped some of his code in a product.  The code
uses os.listdir(<type 'str'>) to find images and movies in a directory
prior to deciding if they contain pornography.  A cron job runs the
code and the messages it prints are sent by cron to the system admins to
take action on.  A customer calls to complain that the code did not
detect that a recently fired employee had a 30 minute pornographic movie
on his office computer.  Arthur has to figure out why.

  - With the current code, Arthur might start with the algorithm that
examines the movies, try to get samples of the pornography from the
company, and look in many wrong places before finding out that the code
that searches for files is not listing all the files in directories.

  - With tracebacks raised, the system admins, at least, will have
received messages from cron stating that the undecodable filenames are
causing errors that need to be addressed.  They can call Arthur's
company when they notice this and Arthur can fix it quickly because the
traceback contains all the necessary information.

-Toshio

From murman at gmail.com  Sun Dec  7 19:18:19 2008
From: murman at gmail.com (Michael Urman)
Date: Sun, 7 Dec 2008 12:18:19 -0600
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
Message-ID: <dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>

On Sun, Dec 7, 2008 at 11:35, Adam Olsen <rhamph at gmail.com> wrote:
>>> http://bugs.python.org/issue3672
>>> http://bugs.python.org/issue3297
>
> No.  Unicode *requires* them to be treated as errors.  If you want to
> pass them through then you're creating a custom encoding... which you
> might argue for in this case, but it needs to be clearly separate from
> the real UTF-8.

I suspect it is a common and convenient but (according to what you
say) misconceived expectation that using UTF-8 to encode any Unicode
string will not raise an exception. This behavior is not something
which should be discarded lightly.

I see little reason that this couldn't be a new codec or error handler
that allowed people to choose between correct pure UTF-8 behavior or
the technically incorrect but very practical behavior it currently
has.

[My apologies, Adam, for sending this only to you the first time]
-- 
Michael Urman

From rhamph at gmail.com  Sun Dec  7 19:56:35 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 7 Dec 2008 11:56:35 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de>
	<493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
Message-ID: <aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>

On Sun, Dec 7, 2008 at 11:18 AM, Michael Urman <murman at gmail.com> wrote:
> On Sun, Dec 7, 2008 at 11:35, Adam Olsen <rhamph at gmail.com> wrote:
>>>> http://bugs.python.org/issue3672
>>>> http://bugs.python.org/issue3297
>>
>> No.  Unicode *requires* them to be treated as errors.  If you want to
>> pass them through then you're creating a custom encoding... which you
>> might argue for in this case, but it needs to be clearly separate from
>> the real UTF-8.
>
> I suspect it is a common and convenient but (according to what you
> say) misconceived expectation that using UTF-8 to encode any Unicode
> string will not raise an exception. This behavior is not something
> which should be discarded lightly.

It is *not* a valid Unicode string in the first place.  Therein lies
the problem.


> I see little reason that this couldn't be a new codec or error handler
> that allowed people to choose between correct pure UTF-8 behavior or
> the technically incorrect but very practical behavior it currently
> has.

Note that many of the restrictions were added for security reasons.
You might receive a UTF-8 encoded file name from a malicious user,
check if it contains something dangerous (like
"../../../../../etc/password"), then decode it.  If your decoder isn't
compliant (ie doesn't check for overly long sequences) then a
b'\xC0\xAF' gets translated into u'/', bypassing your previous check.
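
A minimal sketch of the overlong-sequence problem described above.  The
function naive_decode_two_byte is hypothetical and deliberately
non-compliant; it exists only to show how a sloppy decoder turns
b'\xC0\xAF' into '/'.  The strict check assumes the built-in utf-8 codec
of a current CPython 3 release, which rejects the sequence.

```python
def naive_decode_two_byte(data):
    """Hypothetical NON-compliant decoder for a single 2-byte sequence.

    It unpacks the payload bits without checking that the result
    actually needed two bytes, so overlong forms slip through.
    """
    first, second = data  # iterating over bytes yields integers
    return chr(((first & 0x1F) << 6) | (second & 0x3F))


def strict_decode_rejects(data):
    """Return True if the built-in UTF-8 codec refuses to decode data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_slash = b"\xc0\xaf"
print(repr(naive_decode_two_byte(overlong_slash)))
print(strict_decode_rejects(overlong_slash))
```

A path check that runs before such a naive decode never sees the slash,
which is why a compliant decoder has to treat the sequence as an error.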

However, in this context we only need to allow lone surrogates.
CESU-8 comes to mind.  (It is a perverse world we live in.)
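
As an illustration of keeping lone surrogates separate from real UTF-8,
here is a small sketch.  It assumes a later CPython 3 release than the
3.0 discussed in this thread: there the plain utf-8 codec refuses lone
surrogates, and the 'surrogatepass' error handler (added after this
discussion) is the explicit opt-in for passing them through.

```python
lone_surrogate = "\ud800"  # half of a surrogate pair, not a valid scalar value


def encodes_strictly(text):
    """Return True if text can be encoded as standard-compliant UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Explicit opt-in: the caller asks for the non-standard pass-through.
passed_through = lone_surrogate.encode("utf-8", "surrogatepass")
round_tripped = passed_through.decode("utf-8", "surrogatepass")

print(encodes_strictly(lone_surrogate))
print(passed_through)
print(round_tripped == lone_surrogate)
```

The point of the design is that the lenient behavior has to be requested
by name, so code that asks for plain UTF-8 still gets strict checking.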

-- 
Adam Olsen, aka Rhamphoryncus

From paul at boddie.org.uk  Sun Dec  7 22:06:21 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Sun, 7 Dec 2008 22:06:21 +0100
Subject: [Python-Dev] "as" keyword woes
Message-ID: <200812072206.21908.paul@boddie.org.uk>

On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote:
>
> On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano <warren at delsci.com>
> wrote:
> > As someone somewhat knowledgable of how parsers work, I do not
> > understand why a method/attribute name "object_name.as(...)" must
> > necessarily conflict with a standalone keyword " as ".  It seems to me
> > that it should be possible to unambiguously separate the two without
> > ambiguity or undue complication of the parser.
>
> That's possible with sufficiently powerful parser technology, but
> that's not how the Python parser (and most parsers, in my experience)
> treat reserved words. Reserved words are reserved in all contexts,
> regardless of whether ambiguity could arise.

Just a quick aside from someone who merely lurks on this list: in SQL, it's 
quite possible to use keywords in a fashion similar to that desired by the 
inquirer, and it's actually possible to double-quote keywords and use them as 
names for things. I'm not advocating more complicated parsing technology for 
any Python implementation, but I think it's pertinent to point out that the 
technology isn't particularly obscure.
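
A small sketch of the SQL behavior mentioned above, using the sqlite3
module from the standard library as a convenient SQL engine.  The table
and column names are made up for illustration; other database engines
may differ in detail.

```python
import sqlite3

# In-memory database; "items" and its columns are hypothetical names.
conn = sqlite3.connect(":memory:")

# "as" and "select" are SQL keywords, but double-quoting makes them
# ordinary identifiers.
conn.execute('CREATE TABLE items ("as" TEXT, "select" INTEGER)')
conn.execute('INSERT INTO items ("as", "select") VALUES (?, ?)', ("alias", 1))

row = conn.execute('SELECT "as", "select" FROM items').fetchone()
print(row)
conn.close()
```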

Apologies for the interruption,

Paul

From martin at v.loewis.de  Sun Dec  7 22:10:18 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 07 Dec 2008 22:10:18 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>
	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
Message-ID: <493C3BBA.1040106@v.loewis.de>

> There's clearly an argument of timeliness there, which
> is why we'd like to get this fixed ASAP.

I think it is still timely when fixed in January or February.
In fact, releasing it still in December might not be possible,
due to the limited time available.

Regards,
Martin

From tjreedy at udel.edu  Sun Dec  7 22:20:06 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 07 Dec 2008 16:20:06 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493C0FE1.30506@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com>
Message-ID: <ghhem0$doj$1@ger.gmane.org>

Toshio Kuratomi wrote:

>   - If this is true, a definition of os.listdir(<type 'str'>) that would
> better meet programmer expectation would be: "Give me all files in a
> directory with the output as str type".  The definition of
> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
> with the output as bytes type".  Raising an exception when the filenames
> are undecodable is perfectly reasonable in this situation.

Your examples (snipped) pretty well convince me that there is a use case 
for raising exceptions.  We should move beyond arguing over which one 
way is right.  I think there should be a second argument 
'ignorebad=False' to ignore undecodable files rather than raise the 
exception (or 'strict=True' to stop and raise exception on non-decodable 
names -- then code is 'if strict: raise ...').  I believe other 
functions have a similar parameter.

tjr


From guido at python.org  Sun Dec  7 22:33:57 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 7 Dec 2008 13:33:57 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghhem0$doj$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
Message-ID: <ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>

On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Toshio Kuratomi wrote:
>
>>  - If this is true, a definition of os.listdir(<type 'str'>) that would
>> better meet programmer expectation would be: "Give me all files in a
>> directory with the output as str type".  The definition of
>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>> with the output as bytes type".  Raising an exception when the filenames
>> are undecodable is perfectly reasonable in this situation.
>
> Your examples (snipped) pretty well convince me that there is a use case for
> raising exceptions.  We should move beyond arguing over which one way is
> right.  I think there should be a second argument 'ignorebad=False' to
> ignore undecodable files rather than raise the exception (or 'strict=True'
> to stop and raise exception on non-decodable names -- then code is 'if
> strict: raise ...').  I believe other functions have a similar parameter.

If you want the exceptions, just use the bytes API and try to decode
the byte strings using the system encoding.
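
A minimal sketch of that approach.  The helper names are hypothetical,
and sys.getfilesystemencoding() is assumed to be the "system encoding"
meant here.

```python
import os
import sys


def decode_names_strict(raw_names, encoding):
    """Decode every bytes name; the first undecodable one raises
    UnicodeDecodeError."""
    return [raw.decode(encoding, "strict") for raw in raw_names]


def listdir_strict(directory_bytes):
    """List a directory through the bytes API, then decode strictly.

    directory_bytes must be a bytes path so that os.listdir() returns
    the raw bytes names.
    """
    encoding = sys.getfilesystemencoding()
    return decode_names_strict(os.listdir(directory_bytes), encoding)
```

A caller who wants the exception uses listdir_strict(b"/some/dir");
everyone else keeps calling os.listdir() with a str argument.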

My problem with raising exceptions *by default* when an undecodable
name exists is that it may render an app completely useless in a
situation where the developer is no longer around. This happened all
the time with the 2.x Unicode API, where the developer hadn't
anticipated a particular input potentially containing non-ASCII bytes,
and the user fed the application non-ASCII text. Making os.listdir
raise an exception when a directory contains a single undecodable file
means that the entire directory can't be read, and most likely the
entire app crashes at that point. Most likely the developer never
anticipated this situation (since in most places it is either
impossible or very unlikely) -- after all, if they had anticipated it
they would have used the bytes API in the first place. (It's worse
because the exception being raised would be UnicodeError -- most
people expect os.listdir to raise OSError, not other errors.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fabiofz at gmail.com  Sun Dec  7 22:46:57 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Sun, 7 Dec 2008 19:46:57 -0200
Subject: [Python-Dev] Nonlocal shortcut
Message-ID: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>

Hi,

I'm currently implementing a parser to handle Python 3.0, and one of
the points I found conflicting with the grammar specification is the
PEP 3104.

It says that a shortcut would be added to Python 3.0 so that "nonlocal
x = 0" can be written. However, the latest grammar specification
(http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
doesn't seem to take that into account... So, can someone enlighten me
on what should be the correct treatment for that on a grammar that
wants to support Python 3.0?

Thanks,

Fabio

From ncoghlan at gmail.com  Sun Dec  7 22:49:41 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 08 Dec 2008 07:49:41 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghhem0$doj$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>
Message-ID: <493C44F5.80806@gmail.com>

Terry Reedy wrote:
> Toshio Kuratomi wrote:
> 
>>   - If this is true, a definition of os.listdir(<type 'str'>) that would
>> better meet programmer expectation would be: "Give me all files in a
>> directory with the output as str type".  The definition of
>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>> with the output as bytes type".  Raising an exception when the filenames
>> are undecodable is perfectly reasonable in this situation.
> 
> Your examples (snipped) pretty well convince me that there is a use case
> for raising exceptions.  We should move beyond arguing over which one
> way is right.  I think there should be a second argument
> 'ignorebad=False' to ignore undecodable files rather than raise the
> exception (or 'strict=True' to stop and raise exception on non-decodable
> names -- then code is 'if strict: raise ...').  I believe other
> functions have a similar parameter.

If we were going to do anything like that for os.listdir() and other
filesystem APIs (like glob) that return multiple paths, we'd probably be
best advised to just have a normal Unicode 'errors' parameter which allowed:

'strict' - raise an Exception for malformed binary data
'replace' - insert '?' or some other symbol in place of malformed binary
data
'ignore' - simply leave out the malformed binary data
'skip' - run the underlying codec in strict mode, but skip over any
items which raise UnicodeDecodeError (default/current Py3k behaviour)

Obviously, 'skip' doesn't make any sense for APIs like getcwd() that
return a single value - a case could be made for those defaulting to
either replace or strict.
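
To make the four modes concrete, here is a rough sketch that works on a
list of bytes names rather than on a real directory.  The function name
and signature are hypothetical, not a proposed API.  Note that Python's
own 'replace' handler inserts U+FFFD when decoding rather than '?'.

```python
def decode_names(raw_names, encoding="utf-8", errors="skip"):
    """Decode bytes filenames according to the requested error mode.

    'strict', 'replace' and 'ignore' are handed to the codec unchanged;
    'skip' decodes strictly and silently drops any name that fails.
    """
    decoded = []
    for raw in raw_names:
        if errors == "skip":
            try:
                decoded.append(raw.decode(encoding, "strict"))
            except UnicodeDecodeError:
                continue  # leave the whole name out of the result
        else:
            decoded.append(raw.decode(encoding, errors))
    return decoded


names = [b"ok.txt", b"bad\xff.txt"]
for mode in ("skip", "replace", "ignore"):
    print(mode, decode_names(names, errors=mode))
```

With 'replace' and 'ignore' the caller gets a name back that no longer
matches the file on disk, which is worth keeping in mind if the result
is later passed to open().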

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From amauryfa at gmail.com  Sun Dec  7 23:45:09 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Sun, 7 Dec 2008 23:45:09 +0100
Subject: [Python-Dev] Nonlocal shortcut
In-Reply-To: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
References: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
Message-ID: <e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>

Hello,

Fabio Zadrozny  wrote:
> Hi,
>
> I'm currently implementing a parser to handle Python 3.0, and one of
> the points I found conflicting with the grammar specification is the
> PEP 3104.
>
> It says that a shortcut would be added to Python 3.0 so that "nonlocal
> x = 0" can be written. However, the latest grammar specification
> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
> doesn't seem to take that into account... So, can someone enlighten me
> on what should be the correct treatment for that on a grammar that
> wants to support Python 3.0?

An issue was already filed about this:
http://bugs.python.org/issue4199
It should be ready for inclusion in 3.0.1.

-- 
Amaury Forgeot d'Arc

From greg.ewing at canterbury.ac.nz  Mon Dec  8 00:42:50 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 08 Dec 2008 12:42:50 +1300
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B2C22.5060907@gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com>
Message-ID: <493C5F7A.9070105@canterbury.ac.nz>

Nick Coghlan wrote:

> For binary wrappers around the Windows Unicode APIs, I was thinking
> specifically of using UTF-8, since that should be able to encode
> anything the Unicode APIs can handle.

Why shouldn't the binary interface just expose the raw
utf16 as bytes?

-- 
Greg

From tjreedy at udel.edu  Mon Dec  8 00:53:37 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 07 Dec 2008 18:53:37 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
Message-ID: <ghhnlr$6lf$1@ger.gmane.org>

Guido van Rossum wrote:
> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> Toshio Kuratomi wrote:
>>
>>>  - If this is true, a definition of os.listdir(<type 'str'>) that would
>>> better meet programmer expectation would be: "Give me all files in a
>>> directory with the output as str type".  The definition of
>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>> with the output as bytes type".  Raising an exception when the filenames
>>> are undecodable is perfectly reasonable in this situation.
>> Your examples (snipped) pretty well convince me that there is a use case for
>> raising exceptions.  We should move beyond arguing over which one way is
>> right.  I think there should be a second argument 'ignorebad=False' to
>> ignore undecodable files rather than raise the exception (or 'strict=True'
>> to stop and raise exception on non-decodable names -- then code is 'if
>> strict: raise ...').  I believe other functions have a similar parameter.

I was thinking of the "normal Unicode 'errors' parameter", as described 
by Nick.

> If you want the exceptions, just use the bytes API and try to decode
> the byte strings using the system encoding.

If it was a matter of adding a new method, I might agree.  But:

1. We already have a method that does exactly what you describe.  It is 
only a matter of adding flexibility to the response to problems, for 
which there is already precedent.

2. Suggesting that people who want strings and not bytes should have to 
deal with bytes, just to get an error notification, seems to negate the 
point of moving to 3.0.

3. A builtin would probably do so better than most programmers would, 
with little touches such as the one suggested below.

4. An error parameter would ALERT programmers to the possibility of a 
PROBLEM, both in the present and future.  As you say below, people need 
to better anticipate the future.

> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all
> the time with the 2.x Unicode API, where the developer hadn't
> anticipated a particular input potentially containing non-ASCII bytes,
> and the user fed the application non-ASCII text. Making os.listdir
> raise an exception when a directory contains a single undecodable file
> means that the entire directory can't be read, and most likely the
> entire app crashes at that point. Most likely the developer never
> anticipated this situation (since in most places it is either
> impossible or very unlikely) -- after all, if they had anticipated it
> they would have used the bytes API in the first place. (It's worse
> because the exception being raised would be UnicodeError -- most
> people expect os.listdir to raise OSError, not other errors.)

This to me is an argument for keeping the default the current behavior, 
but not for rejecting flexibility.  The computing world seems to be 
messier than we would like and worse than I realized until this week. 
As you say below, people need to better anticipate the future, and an 
errors parameter would help do that.


Is Windows really immune?  What about when it reads the directory of 
possibly old removable media with whatever byte name encodings?  Is this 
a possible source of 'unanticipated' problems?

As to your last sentence, os.listdir() with an errors parameter could 
convert a decoding UnicodeError to "OSError: undecodable file name 
<ascii+hex repr>", thereby supplying the expected exception as well as 
an extractable representation of the problematic raw bytes.
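
A rough sketch of that conversion, operating on a list of bytes names.
The helper name and the exact message format are hypothetical.

```python
def decode_or_oserror(raw_names, encoding="utf-8"):
    """Decode bytes filenames, reporting failures as OSError.

    The message carries the repr of the raw bytes so that the
    offending name can be recovered from a log.
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError as exc:
            raise OSError("undecodable file name %r" % (raw,)) from exc
    return decoded


try:
    decode_or_oserror([b"ok.txt", b"bad\xff.txt"])
except OSError as err:
    print(err)
```

Chaining with "from exc" keeps the original UnicodeDecodeError available
to anyone who wants the codec-level details.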

Here is a possible use case: I want filenames as 3.0 strings and I 
anticipate no problems at present but, as you say above, something might 
happen years in the future.  I am using 3.0 *because* of the strings == 
unicode feature.  I would like to write

try:
   files = os.listdir(somedir, errors='strict')
except OSError as e:
   log(<verbose error message that includes somedir and e>)
   files = os.listdir(somedir)

and go on without the problem file but not without logging the problem 
so a future maintainer can consider what to do about it, but only when 
there is an actual need to think about it.

Terry Jan Reedy


From tjreedy at udel.edu  Mon Dec  8 01:02:01 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 07 Dec 2008 19:02:01 -0500
Subject: [Python-Dev] Nonlocal shortcut
In-Reply-To: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
References: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
Message-ID: <ghho5i$81e$1@ger.gmane.org>

Fabio Zadrozny wrote:
> Hi,
> 
> I'm currently implementing a parser to handle Python 3.0, and one of
> the points I found conflicting with the grammar specification is the
> PEP 3104.
> 
> It says that a shortcut would be added to Python 3.0 so that "nonlocal
> x = 0" can be written. 

As near as I can tell from testing, that did not happen. The PEP needs 
revision to delete that or push it to a later version.
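
A small sketch of the kind of test meant here.  It shows the two-line
form that the grammar does accept and uses compile() to ask the running
interpreter whether it accepts the shortcut; the function names are made
up for illustration.

```python
def make_counter():
    """The accepted spelling: declare with nonlocal, then assign."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


def shortcut_is_accepted():
    """Return True if this interpreter parses 'nonlocal x = 0'."""
    source = (
        "def outer():\n"
        "    x = 1\n"
        "    def inner():\n"
        "        nonlocal x = 0\n"
        "    return inner\n"
    )
    try:
        compile(source, "<nonlocal-shortcut>", "exec")
    except SyntaxError:
        return False
    return True


counter = make_counter()
print(counter(), counter())
print(shortcut_is_accepted())
```

A parser that follows the published grammar should therefore reject the
shortcut and accept only the declaration followed by a separate
assignment.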

> However, the latest grammar specification
> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
> doesn't seem to take that into account... So, can someone enlighten me
> on what should be the correct treatment for that on a grammar that
> wants to support Python 3.0?


From lists at cheimes.de  Mon Dec  8 01:05:13 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 08 Dec 2008 01:05:13 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C3BBA.1040106@v.loewis.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de>
Message-ID: <493C64B9.2040701@cheimes.de>

Martin v. Löwis wrote:
> I think it is still timely when fixed in January or February.
> In fact, releasing it still in December might not be possible,
> due to the limited time available.

The cmp() / PyObject_Compare() removal patch is almost done. With some 
help I can finish it until Tuesday evening. We can have another release 
by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed 
before people are spending their Xmas holidays with 3.0. The defects include

* cmp(), PyObject_Compare() and friends
* global/nonlocal shortcuts (global x = 0) aren't working
* unnecessary slowdown of read() due to slow buffer resizing.

An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases 
again. If we release it now we can have a combined release of 2.6.2 and 
3.0.2 in two months from now. Two months are quite some time to fix the 
performance issue of the new IO library.

If Guido and Barry are fine with a lax policy on performance fixes we 
can integrate more tweaks. I believe performance patches were 
considered as features in the past. For this reason they weren't allowed 
for minor releases. Mark's work on long integer optimizations and json 
speedup are good candidates.

Christian

From musiccomposition at gmail.com  Mon Dec  8 01:11:45 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sun, 7 Dec 2008 18:11:45 -0600
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C64B9.2040701@cheimes.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>
	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>
	<493BD1F2.5080300@holdenweb.com>
	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
Message-ID: <1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com>

On Sun, Dec 7, 2008 at 6:05 PM, Christian Heimes <lists at cheimes.de> wrote:
> Martin v. Löwis wrote:
>>
>> I think it is still timely when fixed in January or February.
>> In fact, releasing it still in December might not be possible,
>> due to the limited time available.
>
> The cmp() / PyObject_Compare() removal patch is almost done. With some help
> I can finish it until Tuesday evening. We can have another release by Monday
> Dec 15th. Python 3.0.0 has some defects that should be fixed before people
> are spending their Xmas holidays with 3.0. The defects include
>
> * cmp(), PyObject_Compare() and friends
> * global/nonlocal shortcuts (global x = 0) aren't working

I have a patch for this [1], but I don't think this should be
considered a release blocker or even backported to 3.0. It's merely a
convenience feature and doesn't inhibit the usefulness of the PEP in
any way.

> * unnecessary slowdown of read() due to slow buffer resizing.




-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From lists at cheimes.de  Mon Dec  8 01:14:53 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 08 Dec 2008 01:14:53 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>
	<493C64B9.2040701@cheimes.de>
	<1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com>
Message-ID: <493C66FD.2000506@cheimes.de>

Benjamin Peterson wrote:
> I have a patch for this [1], but I don't think this should be
> considered a release blocker or even backported to 3.0. It's merely a
> convenience feature and doesn't inhibit the usefulness of the PEP in
> any way.

Amaury said:
An issue was already filed about this:
http://bugs.python.org/issue4199
It should be ready for inclusion in 3.0.1.

I'm +0 for the patch. Given the nature of Python 3.0 I'm fine with 
getting it right.

Christian

From barry at python.org  Mon Dec  8 01:52:53 2008
From: barry at python.org (Barry Warsaw)
Date: Sun, 7 Dec 2008 19:52:53 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C64B9.2040701@cheimes.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
Message-ID: <BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 7, 2008, at 7:05 PM, Christian Heimes wrote:

> Martin v. Löwis wrote:
>> I think it is still timely when fixed in January or February.
>> In fact, releasing it still in December might not be possible,
>> due to the limited time available.
>
> The cmp() / PyObject_Compare() removal patch is almost done. With  
> some help I can finish it until Tuesday evening. We can have another  
> release by Monday Dec 15th. Python 3.0.0 has some defects that  
> should be fixed before people are spending their Xmas holidays with  
> 3.0. The defects include
>
> * cmp(), PyObject_Compare() and friends
> * global/nonlocal shortcuts (global x = 0) aren't working
> * unnecessary slowdown of read() due to slow buffer resizing.
>
> An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases
> again. If we release it now we can have a combined release of 2.6.2
> and 3.0.2 in two months from now. Two months are quite some time to  
> fix the performance issue of the new IO library.
>
> If Guido and Barry are fine with a lax policy on performance fixes  
> we can integrate more tweaks. I believe performance patches were
> considered as features in the past. For this reason they weren't  
> allowed for minor releases. Mark's work on long integer  
> optimizations and json speedup are good candidates.

I'm personally okay with performance fixes in point releases, as long as
it doesn't change API or add additional features.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSTxv5XEjvBPtnXfVAQIu6AQAkxyGwhapcREx5/E3yHUf8lWvM4lh/FdR
AfHwwp7hs+yX8rR05CWAUfllY9dHcHKHvBCwTCgfuIrc4GJWbJHcx9/b19GTpzre
7fcikjQ0sk6zUq85DiJah7qL5AkA6Jmiby+rol7iudHlmQO/+6F6+aeL+vSKG8IC
vYbLILAFapI=
=ScYg
-----END PGP SIGNATURE-----

From lists at cheimes.de  Mon Dec  8 01:56:25 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 08 Dec 2008 01:56:25 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>
Message-ID: <493C70B9.2030601@cheimes.de>

Barry Warsaw wrote:
> I'm personally okay with performance fixes in point releases, as long as it
> doesn't change API or add additional features.

Does your okay include or exclude new internal APIs like new helper 
functions or new C modules?

Christian

From fabiofz at gmail.com  Mon Dec  8 02:06:21 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Sun, 7 Dec 2008 23:06:21 -0200
Subject: [Python-Dev] Nonlocal shortcut
In-Reply-To: <e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>
References: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
	<e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>
Message-ID: <cfb578b20812071706w755f76faoc24ab90f5421d602@mail.gmail.com>

>> I'm currently implementing a parser to handle Python 3.0, and one of
>> the points I found conflicting with the grammar specification is the
>> PEP 3104.
>>
>> It says that a shortcut would be added to Python 3.0 so that "nonlocal
>> x = 0" can be written. However, the latest grammar specification
>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
>> doesn't seem to take that into account... So, can someone enlighten me
>> on what should be the correct treatment for that on a grammar that
>> wants to support Python 3.0?
>
> An issue was already filed about this:
> http://bugs.python.org/issue4199
> It should be ready for inclusion in 3.0.1.
>

Thanks for pointing that out.
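
For a parser that has to work in the meantime, the two-statement
spelling is the form the published grammar already covers.  A minimal
sketch (make_counter and increment are names made up for illustration):

```python
def make_counter():
    count = 0

    def increment():
        # Declare the enclosing binding first, then assign to it in a
        # separate statement; no "nonlocal count = ..." shortcut is used.
        nonlocal count
        count = count + 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
```

Each call rebinds count in the enclosing scope, so successive calls
return increasing values.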

Fabio

From v+python at g.nevcal.com  Mon Dec  8 03:17:04 2008
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 07 Dec 2008 18:17:04 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<493B22F8.8090902@gmail.com>
	<200812070235.41321@news.perlig.de>	<493B2C22.5060907@gmail.com>	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>	<493B923F.6010706@gmx.net>	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>	<493B98D3.8070405@gmx.net>	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
Message-ID: <493C83A0.5020606@g.nevcal.com>

On approximately 12/7/2008 10:56 AM, came the following characters from 
the keyboard of Adam Olsen:

> You might receive a UTF-8 encoded file name from a malicious user,
> check if it contains something dangerous (like
> "../../../../../etc/password"), then decode it.  If your decoder isn't
> compliant (ie doesn't check for overly long sequences) then a
> b'\xC0\xAF' gets translated into u'/', bypassing your previous check.


You might indeed.

But if you are interested in checking for security issues, shouldn't you 
_first_ decode into some canonical form, specifying what sorts of 
Unicode strictness (such as overlong sequences) to check for during the 
decode process, and once the string is in canonical form, _then_ do 
checks for various attacks, such as the ../ sequence you mention?

And with that order of operation, even if you don't reject overlong 
sequences, you have canonized them, and can recognize the resulting 
characters as good or bad.
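
A minimal sketch of that order of operations, assuming current Python 3
behaviour, where bytes.decode() uses the 'strict' error handler unless
told otherwise.  The function name is_safe_relative_path and the
particular checks are made up for illustration; a real path check would
need more than this.

```python
def is_safe_relative_path(raw: bytes) -> bool:
    """Decode strictly first, then inspect the decoded text."""
    try:
        # Default error handling is 'strict': ill-formed input, such as
        # an overlong sequence, raises UnicodeDecodeError.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # All checks run on the decoded form, never on the raw bytes.
    if text.startswith("/"):
        return False
    return ".." not in text.split("/")


plain_ok = is_safe_relative_path(b"docs/readme.txt")
traversal = is_safe_relative_path(b"../../etc/passwd")
overlong = is_safe_relative_path(b"..\xc0\xaf..\xc0\xafetc\xc0\xafpasswd")
```

Here the strict decoder already refuses the overlong input, so the
traversal check only ever sees well-formed text.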


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From stephen at xemacs.org  Mon Dec  8 03:34:50 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 08 Dec 2008 11:34:50 +0900
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
Message-ID: <87tz9fo0mt.fsf@xemacs.org>

glyph at divmod.com writes:

 > But still, you can't honestly expect me to recommend 3.0 until someone 
 > has gotten at least a basic skeleton of Twisted up and running under it 
 > :).  My own attempts to do so have failed miserably, to the point where 
 > I can't even produce a useful bug report without a lot more work.

How about an issue in the Python tracker---or the Twisted one, with a
xref from the Python tracker to the Twisted tracker where the work
will be done---that says "Twisted wants to be ported but we don't have
enough developers, please help"?  Maybe with some encouraging
statement about how you can provide X amount of advice.

In general, maybe there should be some sort of (semi-)formal process
for proposing ports of libraries and coordinating work on them.  Even
just a focal point for where to make such requests, and a way to
search for them so you can find others with similar interests.

 > I don't think there's anything about the 3.0 language which
 > couldn't be supported in a VM that understood both 2 and 3.

Strings vs. bytes.<shudder>  It can't do both 2-style "bytes are text"
and 3-style "no way are bytes text" simultaneously AFAICS.
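
A small sketch of the 3-style side of that conflict, assuming a Python 3
interpreter run without the -b/-bb flags (variable names are
illustrative only):

```python
text = "caf\u00e9"
data = text.encode("utf-8")

# A str and a bytes object never compare equal in Python 3.
same = (text == data)

# Mixing the two types in an operation is an error, not an implicit
# conversion.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Crossing between the two always goes through an explicit codec.
round_trip = (data.decode("utf-8") == text)
```

A 2-style runtime would need the opposite answers for the first two
checks, which is the difficulty with supporting both at once.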

 > I also don't think 3.0 is perfect, and five years on, there will be
 > a temptation to make more "just this once" incompatible changes.
 > Of course, you've promised these changes won't be made, and *this*
 > set of design mistakes will be with us forever.

For values of "forever" approximating ten years.<wink>

 > It would be nice if there were a way for evolution to continue
 > without another reboot of the world.

Stephen J. Gould says not.<wink>

I think Java is a very different case from Python.  It is the product
of a language evolution that goes back to the early 1970s or so, and
the standardization effort was carefully shepherded by a powerful
company which provided resources to ensure that things went its way.

For that reason, I think it's a remarkable compliment to Python and to
Python 3 in particular that you consider Java an appropriate standard
of comparison for Python.

There's also the danger of stasis.  I think Lisp will never die, and
Common Lisp has done a good job of avoiding reboots.  But for
precisely that reason there continues to be a lively evolution of
seriously incompatible dialects, both Lisp-1 (Scheme) and Lisp-2.  I
see Python 3 as an attempt to bridle and ride this tiger, without
turning the rope into a noose and strangling the beast.

 > >If they're that easily convinced that Java is better they probably
 > >were a lost cause anyway, so I won't mourn their departure too much.
 > 
 > I really believe that *all* new users are fickle, if they don't have a 
 > mandate as to what they need to be learning.  Personally, I learned 
 > Python because of a memory leak in Swing.

Sure, but what Guido is saying, I think, is that as long as prominent
Python developers don't announce its funeral, the other things we
could do to encourage them are going to get lost in the noise of
inherent fickleness.  Which isn't just random, it depends on things
like availability of just the right library for one's app, etc.  But
there are too many of those to do them all, or even just to list them
up and try to prioritize them "objectively"---might as well be random.


From stephen at xemacs.org  Mon Dec  8 05:13:38 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 08 Dec 2008 13:13:38 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493C83A0.5020606@g.nevcal.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de>
	<493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
	<493C83A0.5020606@g.nevcal.com>
Message-ID: <87prk3nw25.fsf@xemacs.org>

Glenn Linderman writes:

 > But if you are interested in checking for security issues, shouldn't you 
 >   _first_ decode into some canonical form,

Yes.  That's all that is being asked for: that Python do strict
decoding to a canonical form by default.  That's a lot to ask, as it
turns out, but that is what we (the minority of strict Unicode
adherents, that is) want.

If you want the convenience and risk, I believe you should ask for it
by name (I suggest a name like "own_me" for the relaxed decoding
flag<wink>).  Failing that, it would be nice to have a global flag to
change the default.


From martin at v.loewis.de  Mon Dec  8 05:17:43 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 08 Dec 2008 05:17:43 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C64B9.2040701@cheimes.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
Message-ID: <493C9FE7.7040908@v.loewis.de>

>> I think it is still timely when fixed in January or February.
>> In fact, releasing it still in December might not be possible,
>> due to the limited time available.
> 
> The cmp() / PyObject_Compare() removal patch is almost done.

I wasn't (primarily) talking about fixing this particular issue.
Time needs to be made available also for the upcoming 2.4.6 and 2.5.3
releases (which should, IMO, get priority over a 3.0 bugfix release
at this point).

> With some
> help I can finish it until Tuesday evening. We can have another release
> by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed
> before people are spending their Xmas holidays with 3.0. The defects
> include
> 
> * cmp(), PyObject_Compare() and friends
> * global/nonlocal shortcuts (global x = 0) aren't working
> * unnecessary slowdown of read() due to slow buffer resizing.

I think 3.0.1 should also address other serious bugs in 3.0, such
as
- various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410)
- various ways to crash Python through the buffer protocol
  (4583, 4509; also 4580)

> An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases
> again.

IIUC, you want the bugfix version number to be sync'ed. I don't
think that is a useful thing to have.

> If Guido and Barry are fine with a lax policy on performance fixes we
> can integrate more tweaks. I believe performance patches were
> considered as features in the past. For this reason they weren't allowed
> for minor releases. Mark's work on long integer optimizations and json
> speedup are good candidates.

I don't recall such policy, and I can't see anything wrong with
including performance fixes in a bug fix release. Maybe you were
confusing this with whether performance fixes can be considered
release-critical (which they shouldn't, IMO)?

Regards,
Martin

From v+python at g.nevcal.com  Mon Dec  8 05:45:12 2008
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 07 Dec 2008 20:45:12 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <87prk3nw25.fsf@xemacs.org>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<493B22F8.8090902@gmail.com>	<200812070235.41321@news.perlig.de>	<493B2C22.5060907@gmail.com>	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>	<493B923F.6010706@gmx.net>	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>	<493B98D3.8070405@gmx.net>	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>	<493C83A0.5020606@g.nevcal.com>
	<87prk3nw25.fsf@xemacs.org>
Message-ID: <493CA658.6030106@g.nevcal.com>

On approximately 12/7/2008 8:13 PM, came the following characters from 
the keyboard of Stephen J. Turnbull:
> Glenn Linderman writes:
> 
>  > But if you are interested in checking for security issues, shouldn't you 
>  >   _first_ decode into some canonical form,
> 
> Yes.  That's all that is being asked for: that Python do strict
> decoding to a canonical form by default.  That's a lot to ask, as it
> turns out, but that is what we (the minority of strict Unicode
> adherents, that is) want.


I have no problem with having strict validation available.  But doesn't 
validation take significantly longer than decoding?  So I think it 
should be logically decoupled... do validation when/where it is needed 
for security reasons, and allow internal [de]coding to be faster.

I'm mostly indifferent about which should be the default... maybe there 
shouldn't be a default!  Use the "vUTF-8" decoder for strict validation, 
and the "fUTF-8" decoder for the faster, non-validating version.  Or 
something like that.  With appropriate documentation.  Of course, 
"UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it 
shouldn't change... but it could be deprecated.


You didn't address the issue that if the decoding to a canonical form is 
done first, many of the insecurities just go away, so why throw errors?


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From rhamph at gmail.com  Mon Dec  8 06:11:21 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 7 Dec 2008 22:11:21 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493CA658.6030106@g.nevcal.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
	<493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org>
	<493CA658.6030106@g.nevcal.com>
Message-ID: <aac2c7cb0812072111m549402dcma544990ed8233fff@mail.gmail.com>

On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 12/7/2008 8:13 PM, came the following characters from the
> keyboard of Stephen J. Turnbull:
>>
>> Glenn Linderman writes:
>>
>>  > But if you are interested in checking for security issues, shouldn't you
>>  > _first_ decode into some canonical form,
>>
>> Yes.  That's all that is being asked for: that Python do strict
>> decoding to a canonical form by default.  That's a lot to ask, as it
>> turns out, but that is what we (the minority of strict Unicode
>> adherents, that is) want.
>
>
> I have no problem with having strict validation available.  But doesn't
> validation take significantly longer than decoding?  So I think it should be
> logically decoupled... do validation when/where it is needed for security
> reasons, and allow internal [de]coding to be faster.

I'd like to see benchmarks of such a claim.


> I'm mostly indifferent about which should be the default... maybe there
> shouldn't be a default!  Use the "vUTF-8" decoder for strict validation, and
> the "fUTF-8" decoder for the faster, non-validating version.  Or something
> like that.  With appropriate documentation.  Of course, "UTF-8" already
> exists... as "fUTF-8", so for compatibility, I guess it shouldn't change...
> but it could be deprecated.
>
>
> You didn't address the issue that if the decoding to a canonical form is
> done first, many of the insecurities just go away, so why throw errors?

Unicode is intended to allow interaction between various bits of
software.  It may be that a library checked it in UTF-8, then passed
it to python.  It would be nice if the library validated too, but a
major advantage of UTF-8 is older libraries (or protocols!) intended
for ASCII need only be 8-bit clean to be repurposed for UTF-8.  Their
security checks continue to work, so long as nobody down stream
introduces problems with a non-validating decoder.


-- 
Adam Olsen, aka Rhamphoryncus

From v+python at g.nevcal.com  Mon Dec  8 07:04:08 2008
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Sun, 07 Dec 2008 22:04:08 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812072111m549402dcma544990ed8233fff@mail.gmail.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	
	<493B923F.6010706@gmx.net>	
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>	
	<493B98D3.8070405@gmx.net>	
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>	
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>	
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>	
	<493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org>	
	<493CA658.6030106@g.nevcal.com>
	<aac2c7cb0812072111m549402dcma544990ed8233fff@mail.gmail.com>
Message-ID: <493CB8D8.604@g.nevcal.com>

On approximately 12/7/2008 9:11 PM, came the following characters from 
the keyboard of Adam Olsen:
> On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>> On approximately 12/7/2008 8:13 PM, came the following characters from the
>> keyboard of Stephen J. Turnbull:
>>> Glenn Linderman writes:
>>>
>>>  > But if you are interested in checking for security issues, shouldn't you
>>>  > _first_ decode into some canonical form,
>>>
>>> Yes.  That's all that is being asked for: that Python do strict
>>> decoding to a canonical form by default.  That's a lot to ask, as it
>>> turns out, but that is what we (the minority of strict Unicode
>>> adherents, that is) want.
>>
>> I have no problem with having strict validation available.  But doesn't
>> validation take significantly longer than decoding?  So I think it should be
>> logically decoupled... do validation when/where it is needed for security
>> reasons, and allow internal [de]coding to be faster.
> 
> I'd like to see benchmarks of such a claim.


"significantly" seems to be the only word at question; it seems that 
there are a fair number of validation checks that could be performed; 
the numeric part of UTF-8 decoding is just a sequence of shifts, masks, 
and ORs, so can be coded pretty tightly in C or assembly language.
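
To make "shifts, masks, and ORs" concrete, here is a sketch of the
arithmetic for 2- and 3-byte sequences, written in Python rather than C
for brevity.  The function names are made up, and the code deliberately
performs no validation at all.

```python
def decode_two_byte(lead: int, trail: int) -> int:
    """Combine a 2-byte UTF-8 sequence into a code point (unvalidated)."""
    # 110xxxxx 10yyyyyy -> xxxxxyyyyyy
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


def decode_three_byte(lead: int, mid: int, trail: int) -> int:
    """Combine a 3-byte UTF-8 sequence into a code point (unvalidated)."""
    # 1110xxxx 10yyyyyy 10zzzzzz -> xxxxyyyyyyzzzzzz
    return ((lead & 0x0F) << 12) | ((mid & 0x3F) << 6) | (trail & 0x3F)


e_acute = decode_two_byte(0xC3, 0xA9)          # bytes of U+00E9
euro = decode_three_byte(0xE2, 0x82, 0xAC)     # bytes of U+20AC
```

Both results agree with what the built-in codec produces for the same
well-formed input.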

Anything extra would be slower; how much slower is hard to predict prior 
to the implementation.  My "significantly" was just the expectation that 
the larger code with more conditional branches that is required for 
validation is less likely to stay in cache, and take longer to load into 
cache, and take longer to execute.  This also seems to be supported by 
Stephen's comment "That's a lot to ask, as it turns out."

Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, 
I wonder if I could find that code?  Can you supply a validated decoder? 
Then we could run some benchmarks, eh?


>> I'm mostly indifferent about which should be the default... maybe there
>> shouldn't be a default!  Use the "vUTF-8" decoder for strict validation, and
>> the "fUTF-8" decoder for the faster, non-validating version.  Or something
>> like that.  With appropriate documentation.  Of course, "UTF-8" already
>> exists... as "fUTF-8", so for compatibility, I guess it shouldn't change...
>> but it could be deprecated.
>>
>>
>> You didn't address the issue that if the decoding to a canonical form is
>> done first, many of the insecurities just go away, so why throw errors?
> 
> Unicode is intended to allow interaction between various bits of
> software.  It may be that a library checked it in UTF-8, then passed
> it to python.  It would be nice if the library validated too, but a
> major advantage of UTF-8 is older libraries (or protocols!) intended
> for ASCII need only be 8-bit clean to be repurposed for UTF-8.  Their
> security checks continue to work, so long as nobody down stream
> introduces problems with a non-validating decoder.


So I don't understand how this is responsive to the "decoding removes 
many insecurities" issue?

Yes, you might use libraries.  Either they have insecurities, or not. 
Either they validate, or not.  Either they decode, or not.  They may be 
immune to certain attacks, because of their structure and code, or not.

So when you examine a library for potential use, you have documentation 
or code to help you set your expectations about what it does, and 
whether or not it may have vulnerabilities, and whether or not those 
vulnerabilities are likely or unlikely, whether you can reduce the 
likelihood or prevent the vulnerabilities by wrapping the API, etc.  And 
so you choose to use the library, or not.

This whole discussion about libraries seems somewhat irrelevant to the 
question at hand, although it is certainly true that understanding how a 
library handles Unicode is an important issue for the potential user of 
a library.

So how does a non-validating decoder introduce problems?  I can see that 
it might not solve all problems, but how does it introduce problems? 
Wouldn't the problems be introduced by something else, and the use of a 
non-validating decoder may not catch the problem... but not be the cause 
of the problem?

And then, if you would like to address the original issue, that would be 
fine too.


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From rhamph at gmail.com  Mon Dec  8 08:04:15 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 8 Dec 2008 00:04:15 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493CB8D8.604@g.nevcal.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
	<493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org>
	<493CA658.6030106@g.nevcal.com>
	<aac2c7cb0812072111m549402dcma544990ed8233fff@mail.gmail.com>
	<493CB8D8.604@g.nevcal.com>
Message-ID: <aac2c7cb0812072304s433a8c16va24d05e2396513f6@mail.gmail.com>

On Sun, Dec 7, 2008 at 11:04 PM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On approximately 12/7/2008 9:11 PM, came the following characters from the
> keyboard of Adam Olsen:
>> On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman <v+python at g.nevcal.com>
>> wrote:
>
> Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, I
> wonder if I could find that code?  Can you supply a validated decoder?  Then
> we could run some benchmarks, eh?

There is no point for me, as the behaviour of a real UTF-8 codec is
clear.  It is you who needs to justify a second non-standard UTF-8-ish
codec.  See below.


>>> You didn't address the issue that if the decoding to a canonical form is
>>> done first, many of the insecurities just go away, so why throw errors?
>>
>> Unicode is intended to allow interaction between various bits of
>> software.  It may be that a library checked it in UTF-8, then passed
>> it to python.  It would be nice if the library validated too, but a
>> major advantage of UTF-8 is older libraries (or protocols!) intended
>> for ASCII need only be 8-bit clean to be repurposed for UTF-8.  Their
>> security checks continue to work, so long as nobody down stream
>> introduces problems with a non-validating decoder.
>
>
> So I don't understand how this is responsive to the "decoding removes many
> insecurities" issue?
>
> Yes, you might use libraries.  Either they have insecurities, or not. Either
> they validate, or not.  Either they decode, or not.  They may be immune to
> certain attacks, because of their structure and code, or not.
>
> So when you examine a library for potential use, you have documentation or
> code to help you set your expectations about what it does, and whether or
> not it may have vulnerabilities, and whether or not those vulnerabilities
> are likely or unlikely, whether you can reduce the likelihood or prevent the
> vulnerabilities by wrapping the API, etc.  And so you choose to use the
> library, or not.
>
> This whole discussion about libraries seems somewhat irrelevant to the
> question at hand, although it is certainly true that understanding how a
> library handles Unicode is an important issue for the potential user of a
> library.
>
> So how does a non-validating decoder introduce problems?  I can see that it
> might not solve all problems, but how does it introduce problems? Wouldn't
> the problems be introduced by something else, and the use of a
> non-validating decoder may not catch the problem... but not be the cause of
> the problem?
>
> And then, if you would like to address the original issue, that would be
> fine too.

Your non-validating decoder is translating an invalid sequence into a
valid one, thus you are introducing the problem.  A completely naive
environment (8-bit clean ASCII) would leave it as an invalid sequence
throughout.

This is not a theoretical problem.  See
http://tools.ietf.org/html/rfc3629#section-10 .  We MUST reject
invalid sequences, or else we are not using UTF-8.  There is no wiggle
room, no debate.

(The absoluteness is why the standard behaviour doesn't need a
benchmark.  You are essentially arguing that, when logging in as root
over the internet, it's a lot faster if you use telnet rather than
ssh.  One is simply not an option.)
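
A sketch of the failure mode, assuming an upstream filter that inspects
raw bytes and a hypothetical non-validating decoder (lax_decode, written
here only for 1- and 2-byte sequences).  The last step uses Python's
current strict UTF-8 codec for comparison.

```python
def lax_decode(data: bytes) -> str:
    """Hypothetical non-validating decoder; 1- and 2-byte sequences only."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            # No check that the lead byte is legal, or that the result
            # could have been encoded in fewer bytes.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
    return "".join(out)


payload = b"..\xc0\xaf..\xc0\xafetc"

# An ASCII-era filter looking for "/" in the raw bytes finds nothing.
passes_byte_filter = b"/" not in payload

# The lax decoder then manufactures the slashes downstream.
lax_result = lax_decode(payload)

# A strict decoder refuses the input instead.
try:
    payload.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

The byte-level filter was correct for the data it saw; the "/" only
comes into existence in the lax decoding step.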


-- 
Adam Olsen, aka Rhamphoryncus

From stephen at xemacs.org  Mon Dec  8 09:57:19 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 08 Dec 2008 17:57:19 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493CA658.6030106@g.nevcal.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de>
	<493B2C22.5060907@gmail.com>
	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
	<493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org>
	<493CA658.6030106@g.nevcal.com>
Message-ID: <87ljurnixc.fsf@xemacs.org>

Glenn Linderman writes:
 > On approximately 12/7/2008 8:13 PM, came the following characters from 

 > I have no problem with having strict validation available.  But
 > doesn't validation take significantly longer than decoding?

I think you're thinking of XML, where validation can take significant
resources over and above syntax checking.  For Unicode, not unless
you're seriously CPU-bound.  Unicode validation is a matter of a few
range checks and a couple of flags to handle things like lone
surrogates.
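
Roughly, the checks in question look like the following sketch.  The
helper names are made up; the numeric limits are the ones given in RFC
3629.

```python
# Smallest code point that legitimately needs each encoded length.
MIN_FOR_LENGTH = {1: 0x0000, 2: 0x0080, 3: 0x0800, 4: 0x10000}


def is_unicode_scalar(cp: int) -> bool:
    """Range checks: inside the code space and not a surrogate."""
    if cp < 0 or cp > 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:
        return False
    return True


def is_shortest_form(cp: int, length: int) -> bool:
    """Reject 'excess length': cp must actually need `length` bytes."""
    return cp >= MIN_FOR_LENGTH[length]


slash_as_two_bytes = is_shortest_form(0x2F, 2)   # the 0xC0 0xAF case
e_acute_as_two_bytes = is_shortest_form(0xE9, 2)
lone_surrogate = is_unicode_scalar(0xD800)
beyond_range = is_unicode_scalar(0x110000)
```

Each check is a comparison or two per decoded character.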

In the case of "excess length" in UTF-8, you can actually often do it
in *zero* time if you use a table to analyze the leading byte (eg,
0xC0 and 0xC1 are invalid UTF-8 leading bytes because they would
necessarily decode to U+0000 to U+007F, ie, the ASCII range), because
you have to make a check for 0xFE and 0xFF anyway, which can't be
UTF-8 leading bytes.  (I'm not sure this generalizes to longer UTF-8
sequences, but it would reject the use of 0xC0 0xAF to sneak in a "/"
in zero time!)
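
A sketch of such a table, assuming the byte ranges from RFC 3629 (the
names LEAD_LENGTH and build_lead_table are made up).  A zero entry means
"not a legal leading byte", so 0xC0 and 0xC1 are rejected by the same
lookup that already rejects 0xFE and 0xFF.

```python
def build_lead_table():
    """Map each possible leading byte to its sequence length (0 = invalid)."""
    table = [0] * 256
    for b in range(0x00, 0x80):
        table[b] = 1          # ASCII
    # 0x80-0xBF are continuation bytes and 0xC0-0xC1 could only start
    # overlong encodings of ASCII, so all of them stay 0.
    for b in range(0xC2, 0xE0):
        table[b] = 2
    for b in range(0xE0, 0xF0):
        table[b] = 3
    for b in range(0xF0, 0xF5):
        table[b] = 4
    # 0xF5-0xFF stay 0: they would encode values above U+10FFFF or are
    # never used in UTF-8 at all.
    return table


LEAD_LENGTH = build_lead_table()

overlong_slash_lead = LEAD_LENGTH[0xC0]
e_acute_lead = LEAD_LENGTH[0xC3]
```

For longer sequences the leading byte alone is not enough: 0xE0 followed
by 0x80-0x9F, and 0xF0 followed by 0x80-0x8F, are also overlong, so
those cases need a range check on the second byte as well.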

 > So I think it should be logically decoupled... do validation
 > when/where it is needed for security reasons,

Security is an important application, but the real issue is that
naively decoded text is a bomb with a sensitive impact fuse.  Pass it
around long enough, and it will blow up eventually.

The whole point of the fairly complex rules about Unicode formats and
the *requirement* that broken coding be a fatal error *in a
conforming Unicode process* is intended to ensure that Unicode
exceptions[1] only ever occur on input (or memory corruption and the
like, which is actually a form of I/O, of course).  That's where
efficiency comes from.

I think Python 3 should aspire to (eventually) be a conforming process
by default, with lax behavior an option.

 > and allow internal [de]coding to be faster.

"Internal decoding" is (or should be) an oxymoron.  Why would your
software be passing around text in any format other than internal?  So
decoding will happen (a) on I/O, which is itself almost certainly
slower than making a few checks for Unicode hygiene, or (b) on receipt
of data from other software whose sanitation you shouldn't trust
more than you trust the Internet.

Encoding isn't a problem, AFAICS.

 > You didn't address the issue that if the decoding to a canonical
 > form is done first, many of the insecurities just go away, so why
 > throw errors?

Because as long as you're decoding anyway, it costs no more to do it
right, except in rare cases.  Why do you think Python should aspire to
"quick and dirty" in a context where dirty is known to be unhealthy,
and there is no known need for speed?  Why impose "doing it right" on
the application programmer when there's a well-defined spec for that
that we could implement in the standard library?

It's the errors themselves that people are objecting to.  See Guido's
posts for concisely stated arguments for a "don't ask, don't tell"
policy toward Unicode breakage.  I agree that Python should implement
that policy as an option, but I think that the user should have to
request it either with a runtime option or (in the case of user == app
programmer) by deliberately specifying a lax codec.  The default
*Unicode* codecs should definitely aspire to full Unicode conformance
within their sphere of responsibility.
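
As a sketch of what "strict unless you ask otherwise" can look like,
here is the errors= argument of bytes.decode() as it behaves in current
Python 3 releases (this is not a claim about 3.0.0 as shipped):

```python
bad = b"abc\xc0\xafdef"

# No errors= argument: the default handler is 'strict'.
try:
    bad.decode("utf-8")
    default_is_strict = False
except UnicodeDecodeError:
    default_is_strict = True

# Laxness has to be requested by name.
replaced = bad.decode("utf-8", errors="replace")
ignored = bad.decode("utf-8", errors="ignore")
```

Neither lax handler turns the ill-formed bytes into a "/"; they are
either substituted with U+FFFD or dropped.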

Footnotes: 
[1]  A character outside the repertoire that the app can handle is not
a "Unicode exception", unless the reason the app can't handle it is
that the Unicode handler blew up.


From stephen at xemacs.org  Mon Dec  8 10:21:32 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 08 Dec 2008 18:21:32 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493CB8D8.604@g.nevcal.com>
References: <mailman.27161.1228543139.3486.python-dev@python.org>
	<493B923F.6010706@gmx.net>
	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>
	<493B98D3.8070405@gmx.net>
	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>
	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>
	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>
	<493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org>
	<493CA658.6030106@g.nevcal.com>
	<aac2c7cb0812072111m549402dcma544990ed8233fff@mail.gmail.com>
	<493CB8D8.604@g.nevcal.com>
Message-ID: <87k5abnhsz.fsf@xemacs.org>

Glenn Linderman writes:

 > "significantly" seems to be the only word at question; it seems that 
 > there are a fair number of validation checks that could be performed; 
 > the numeric part of UTF-8 decoding is just a sequence of shifts, masks, 
 > and ORs, so can be coded pretty tightly in C or assembly language.
 > 
 > Anything extra would be slower; how much slower is hard to predict prior 
 > to the implementation.

Not much, see my previous response.

 > This also seems to be supported by Stephen's comment "That's a lot
 > to ask, as it turns out."

Not what I meant.  Inefficiency is not an objection to checking for
validity at the level a codec can handle.  The objection is that "we
don't want *any* exceptions thrown that we didn't explicitly ask for",
and adding validation certainly will violate that.

 > So I don't understand how this is responsive to the "decoding removes 
 > many insecurities" issue?

Because you have to recheck every time the data crosses from Python
into your code.  To the extent that Python codecs promise validation
and keep that promise, internal code *never* has to make those checks.
That is a significant savings in programmer effort, because auditing a
large body of code for *any* I/O from Python is going to be costly.

 > So when you examine a library for potential use, you have documentation 
 > or code to help you set your expectations about what it does, and 
 > whether or not it may have vulnerabilities, and whether or not those 
 > vulnerabilities are likely or unlikely, whether you can reduce the 
 > likelihood or prevent the vulnerabilities by wrapping the API, etc.  And 
 > so you choose to use the library, or not.

Python is precisely such a component that people will choose to use,
or not, based on whether they can expect that when Python hands them a
Unicode object freshly input from the outside world, it won't contain
lone surrogates, or invalid UTF-8 characters that got through a
3rd-party spam filter, or whatever.
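
To make that expectation concrete, here is a small sketch of what a
strict codec buys the caller.  It relies on the behaviour of the
strict UTF-8 codec in current CPython 3 releases (an assumption about
the reader's interpreter); the helper name is_valid_utf8 and the
sample byte strings are made up for illustration.

```python
def is_valid_utf8(raw):
    """Return True if raw decodes under the strict UTF-8 codec."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative inputs only; none of these come from the thread.
samples = {
    "plain ASCII": b"hello",
    "well-formed two-byte sequence": b"caf\xc3\xa9",
    "encoded lone surrogate": b"\xed\xa0\x80",
    "overlong slash": b"\xc0\xaf",
    "truncated sequence": b"caf\xc3",
}
for label, raw in samples.items():
    print(label, "->", is_valid_utf8(raw))
```

Code that only ever sees strings produced by such a codec does not
have to repeat these checks itself.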

 > This whole discussion about libraries seems somewhat irrelevant to the 
 > question at hand,

No, it's the *only* point that matters.  IMO, speed is not relevant
here.  The question is whether throwing a Unicode exception on invalid
encoding by default generally does more good than harm.  Guido seems
to think "not!", which gives me pause.<wink>  I still disagree, though.


From eckhardt at satorlaser.com  Mon Dec  8 10:20:42 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Mon, 8 Dec 2008 10:20:42 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net>
References: <4938374B.8000006@gmail.com> 
	<200812051127.35880.eckhardt@satorlaser.com> 
	<0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net>
Message-ID: <200812081020.42448.eckhardt@satorlaser.com>

On Friday 05 December 2008, James Y Knight wrote:
> On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote:
> > Using the byte variant is equally fubar, because e.g. on MS Windows
> > it is not supported, except through a very lossy roundtrip through
> > the locale's codepage, limiting your functionality.
>
> Yeah, IMO the whole mess could have been avoided by keeping the filename/
> args/environ simply *bytes*, like it really is, on unix. Then, make
> the Windows version of python use (always! not dependent upon locale!)
> utf-8 to decode the utf-8 bytestring to the UTF-16 that the Windows
> platform APIs expect (and vice versa).

If possible, I would try to avoid this useless roundtrip from UTF-16 to UTF-8 
and back.

> And never use the ASCII variant of the windows APIs.

That's okay, but I'm afraid it's not possible. The problem is not so much 
doing it, but finding all those places where it is currently done. Those 
could be outside of Python itself. So, even to Python code, there could still 
be APIs that would need the MBCS-encoded strings.

Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Hamburg District Court HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
This e-mail, including all attachments, is intended only for the addressee
and may contain confidential information. Please notify the sender
immediately if you are not the intended recipient. In that case the e-mail
must be deleted and may not be read, forwarded, published or otherwise used.
E-mails can be read by third parties and may contain viruses and
unauthorized changes. Sator Laser GmbH is not responsible for these
consequences.

**************************************************************************************


From v+python at g.nevcal.com  Mon Dec  8 10:54:54 2008
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 08 Dec 2008 01:54:54 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <87ljurnixc.fsf@xemacs.org>
References: <mailman.27161.1228543139.3486.python-dev@python.org>	<493B22F8.8090902@gmail.com>	<200812070235.41321@news.perlig.de>	<493B2C22.5060907@gmail.com>	<aac2c7cb0812062053i61aebef0tb7de3362abfc464d@mail.gmail.com>	<493B923F.6010706@gmx.net>	<aac2c7cb0812070121w3645e475o6b3801c44e5b01eb@mail.gmail.com>	<493B98D3.8070405@gmx.net>	<aac2c7cb0812070935g6a901b71qed4c4461e31a1a1@mail.gmail.com>	<dcbbbb410812071018q3e28c1fdg314fb1623b284c7@mail.gmail.com>	<aac2c7cb0812071056y7cd92f42k38bc5d7f3fb05c26@mail.gmail.com>	<493C83A0.5020606@g.nevcal.com>	<87prk3nw25.fsf@xemacs.org>	<493CA658.6030106@g.nevcal.com>
	<87ljurnixc.fsf@xemacs.org>
Message-ID: <493CEEEE.6010308@g.nevcal.com>

On approximately 12/8/2008 12:57 AM, came the following characters from 
the keyboard of Stephen J. Turnbull:

> "Internal decoding" is (or should be) an oxymoron.  Why would your
> software be passing around text in any format other than internal?  So
> decoding will happen (a) on I/O, which is itself almost certainly
> slower than making a few checks for Unicode hygiene, or (b) on receipt
> of data from other software whose sanitation you shouldn't trust
> more than you trust the Internet.
> 
> Encoding isn't a problem, AFAICS.


So I can see validating user supplied data, which always comes in via I/O.

But during manipulation of internal data, including file and database 
I/O, there is a need for encoding and decoding also.  If all the data 
has already been validated, then there would be no need to revalidate on 
every conversion.

I hear you when you say that clever coding can make the validation 
nearly free, and I applaud that: the UTF-8 coder that I wrote predated 
most of the rules that have been created since, so I didn't attempt to 
be clever in that regard.

Thanks to you and Adam for your explanations; I see your points, and if 
it is nearly free, I withdraw most of my negativity on this topic.


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From ncoghlan at gmail.com  Mon Dec  8 11:12:05 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 08 Dec 2008 20:12:05 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghhnlr$6lf$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
Message-ID: <493CF2F5.9000904@gmail.com>

Terry Reedy wrote:
> This to me is an argument for keeping the default the current behavior,
> but not for rejecting flexibility.  The computing world seems to be
> messier than we would like and worse than I realized until this week. As
> you say below, people need to better anticipate the future, and an
> errors parameter would help do that.

It just occurred to me that this seems like a perfect situation to
address via the warning system. The normal warnings mechanics can then
be used to turn it into an exception if so desired, and this can be done
once per application rather than having to pass a separate argument
every time the affected APIs are called.

And the decoding problems don't pass silently either - they just get
emitted as a warning by default instead of causing the application to crash.
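
A minimal sketch of how that could look, using the standard warnings
module.  The category UndecodableNameWarning and the helper
decode_names are hypothetical names invented for this example; no
such API exists in the os module.

```python
import warnings


class UndecodableNameWarning(UnicodeWarning):
    """Hypothetical category for names that cannot be decoded."""


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte names; warn about undecodable ones and skip them."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn("skipping undecodable name %r" % (raw,),
                          UndecodableNameWarning, stacklevel=2)
    return decoded


names = [b"ok.txt", b"caf\xe9.txt"]   # the second is not valid UTF-8

# Default policy: the problem is reported, the program keeps running.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(decode_names(names), len(caught))

# Opt-in policy: one filter turns the warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", UndecodableNameWarning)
    try:
        decode_names(names)
    except UndecodableNameWarning as exc:
        print("raised:", exc)
```

In a real application the filter would be installed once at startup
(or with the -W command line option) rather than inside a context
manager as done here for demonstration.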

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From eckhardt at satorlaser.com  Mon Dec  8 11:20:49 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Mon, 8 Dec 2008 11:20:49 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<ghhem0$doj$1@ger.gmane.org> 
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
Message-ID: <200812081120.49409.eckhardt@satorlaser.com>

On Sunday 07 December 2008, Guido van Rossum wrote:
> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all
> the time with the 2.x Unicode API, where the developer hadn't
> anticipated a particular input potentially containing non-ASCII bytes,
> and the user fed the application non-ASCII text. Making os.listdir
> raise an exception when a directory contains a single undecodable file
> means that the entire directory can't be read, and most likely the
> entire app crashes at that point. Most likely the developer never
> anticipated this situation (since in most places it is either
> impossible or very unlikely) -- after all, if they had anticipated it
> they would have used the bytes API in the first place.

There is another way to handle this that noisily signals errors but doesn't 
cause programs to suddenly fail. Using os.listdir as example, the problem 
there is that the OS actually returns a list of strings that can not be 
reliably decoded, so I would propose to simply not decode them.

Now, the idea: what if this function returned neither a byte string nor a 
Unicode string, but e.g. an environment string type (called env_str)? 
os.listdir would only fail if it really failed to read the dir. If a user 
wants to display an element from the returned list, they would get something 
akin to what repr() returns, i.e. a recognisable string that can be written 
to a logfile. However, this thing will also include additional markup that 
makes it clear that it is not just a piece of text and not suitable to 
display to the end user.

This type distinction is important, because it means that any developer will 
immediately see that something unexpected is going on here. They will 
invoke "type(lst[0])" and see the unexpected type env_str, which will (via 
documentation) redirect them to the issue with different encodings and that 
all they have to do is 'map( unicode, lst)' in order to get at a list of real 
text strings, but they will also read that this operation might fail, forcing 
an informed decision.

If they don't care about a textual representation at all but only want to 
invoke os.popen with arguments received from the commandline, then everything 
is fine, too, because that function will take the strings as they are and 
just give them back to the OS. This allows roundtripping from the OS through 
Python and back to the OS without any conversions, and thus without anything 
that could fail. In the case of e.g. a backup program, this is exactly what 
is needed.

Now, if you have any hard-coded strings in your program but a function like 
os.popen needs an env_str object, this string is converted via a default 
encoding, i.e. the same that is used when converting an env_str object to 
Unicode. In this case, I would go so far as to say that os.popen should accept 
normal str strings, too, and perform that conversion itself. An alternative 
way would be to reject the string because it is the wrong type, but since 
this internal string's encoding is known, there is no reason to force users 
to convert explicitly, it is just that the conversion might fail.

The same applies when modifying such an env_str object, e.g. "bak = 
sys.argv[1]+'.backup'". In this case, the string '.backup' is converted 
according to the default encoding and then appended to the commandline 
argument; the result would again be an env_str object.
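
A rough sketch of such a type in Python 3 terms follows.  Everything
in it is an assumption made for illustration: the method names, the
use of UTF-8 as the default encoding, and the choice to make text
conversion an explicit decode() call (standing in for the
'map( unicode, lst)' step above) while display falls back to the
repr-like form.

```python
class env_str:
    """Hypothetical opaque wrapper around bytes received from the OS."""

    default_encoding = "utf-8"   # assumption; the proposal leaves this open

    def __init__(self, value):
        if isinstance(value, env_str):
            value = value._raw
        elif isinstance(value, str):
            # Hard-coded text is converted via the default encoding.
            value = value.encode(self.default_encoding)
        elif not isinstance(value, bytes):
            raise TypeError("expected env_str, str or bytes")
        self._raw = value

    def __bytes__(self):
        # Lossless round trip back to the OS.
        return self._raw

    def decode(self):
        # Explicit conversion to text; may raise UnicodeDecodeError.
        return self._raw.decode(self.default_encoding)

    def __repr__(self):
        # Recognisable in a log file, but clearly not plain text.
        return "env_str(%r)" % (self._raw,)

    def __add__(self, other):
        return env_str(self._raw + env_str(other)._raw)

    def __radd__(self, other):
        return env_str(env_str(other)._raw + self._raw)


arg = env_str(b"caf\xe9")        # stands in for sys.argv[1]
bak = arg + ".backup"
print(bak)                       # env_str(b'caf\xe9.backup')
print(bytes(bak) == b"caf\xe9.backup")
```

Printing the object shows the marked-up form because no __str__ is
defined, and bytes(bak) hands the original bytes back unchanged.
Calling arg.decode() raises UnicodeDecodeError here, which is the
informed decision the proposal wants to force.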


Note: There is an option in this design, and that is to make the default 
behaviour in case of non-convertible env_str objects configurable. A 
filemanager would then replace the undecodable bytes by an approximation, a 
backup program would use strict mode and a music player would perhaps simply 
skip and ignore such strings. The problem there is that changing this option 
would possibly affect other library code that one doesn't even know about 
because it is only used indirectly and its implementation is unknown. For 
that reason, I would rather not make this policy a configurable element. If 
you want that, you can easily code it yourself.

BTW: there was a PEP that proposed a new path class, which was rejected. This 
class was actually pretty similar, except that it also included several other 
features (globbing, path handling, opening files and the kitchen sink) which 
eventually made it too bloated. Otherwise, the idea of creating a separate 
type for these strings is the same.


Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Hamburg District Court HR B62 932


From lists at cheimes.de  Mon Dec  8 11:53:09 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 08 Dec 2008 11:53:09 +0100
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C9FE7.7040908@v.loewis.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>
	<493C64B9.2040701@cheimes.de> <493C9FE7.7040908@v.loewis.de>
Message-ID: <493CFC95.1050306@cheimes.de>

Martin v. Löwis wrote:
> I wasn't (primarily) talking about fixing this particular issue.
> Time needs to be made available also for the upcoming 2.4.6 and 2.5.3
> releases (which should, IMO, get priority over a 3.0 bugfix release
> at this point)

I've no opinion on the priority of the releases. Since you are 
responsible for the 2.4 and 2.5 releases as well as the Windows 
binaries, it's your choice. In the future we should find somebody to 
assist you with the Windows installers in order to take some pressure 
off you.

> I think 3.0.1 should also address other serious bugs in 3.0, such
> as
> - various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410)
> - various ways to crash Python through the buffer protocol
>   (4583, 4509; also 4580)

My list wasn't complete. I'm +1 for your additions.

> IIUC, you want the bugfix version number to be sync'ed. I don't
> think that is a useful thing to have.

Yeah. Barry also said it's a neat thing to have - but just a neat thing.

> I don't recall such policy, and I can't see anything wrong with
> including performance fixes in a bug fix release. Maybe you were
> confusing this with whether performance fixes can be considered
> release-critical (which they shouldn't, IMO)?

Maybe I'm a confused person? :]

Christian


From barry at python.org  Mon Dec  8 14:11:10 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Dec 2008 08:11:10 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C70B9.2030601@cheimes.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>
	<493C70B9.2030601@cheimes.de>
Message-ID: <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>

On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:

> Barry Warsaw wrote:
>> I'm personally okay with performance fixes in point releases, as
>> long as it doesn't change the API or add additional features.
>
> Does your okay include or exclude new internal APIs like new helper
> functions or new C modules?

I /personally/ don't have a problem with that, but we need consensus
before that becomes policy.

-Barry

From barry at python.org  Mon Dec  8 14:12:04 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Dec 2008 08:12:04 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <493C9FE7.7040908@v.loewis.de>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<ca471dc20812061525p2a4432b3y448dc53139e5da0d@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
	<493C9FE7.7040908@v.loewis.de>
Message-ID: <59E06D94-4596-46C4-BFF1-BA10A46C76E0@python.org>

On Dec 7, 2008, at 11:17 PM, Martin v. Löwis wrote:

> I don't recall such policy, and I can't see anything wrong with
> including performance fixes in a bug fix release. Maybe you were
> confusing this with whether performance fixes can be considered
> release-critical (which they shouldn't, IMO)?

I agree with that.
-Barry

From skip at pobox.com  Mon Dec  8 14:13:31 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 8 Dec 2008 07:13:31 -0600
Subject: [Python-Dev] Deciding on dbm API in setup.py
Message-ID: <18749.7547.117133.919493@montanaro-dyndns-org.local>

Several packages provide a dbm-compatible API.  Currently, the code in
Python's setup.py hardcodes the order of consideration: ndbm, then gdbm,
then Berkeley DB.  While the APIs are compatible, the file formats are all
different as far as I know.  If you have ndbm but want to use Berkeley DB
format, you're stuck.  Right now editing setup.py is the only way to
influence the order.

I opened an issue on the bug tracker about this: 

    http://bugs.python.org/issue4587

It includes a patch which adds an optional environment variable
(PYDBMLIBORDER) which builders can use to override the order of the default
library checks.  I'm not sure that's the "correct" way to do this, but I'm
at a loss to figure out how else to do it.  Is it possible to easily add a
flag to setup.py, say --dbm-order=gdbm:bdb:ndbm?
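
For illustration, here is a sketch of how such an override might be
read.  It is not the code from the patch: the helper name dbm_order is
made up, and the colon-separated value format is an assumption
borrowed from the flag syntax shown above.

```python
import os

# Order that setup.py currently hardcodes, per the description above.
DEFAULT_DBM_ORDER = ("ndbm", "gdbm", "bdb")


def dbm_order(environ=None):
    """Return the dbm libraries to try, in order of preference.

    Reads PYDBMLIBORDER as a colon-separated list (assumed format) and
    falls back to the default order when it is unset or empty.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("PYDBMLIBORDER", "")
    if not value:
        return list(DEFAULT_DBM_ORDER)
    order = []
    for name in value.split(":"):
        name = name.strip()
        if name not in DEFAULT_DBM_ORDER:
            raise ValueError("unknown dbm library %r" % name)
        if name not in order:          # ignore accidental repeats
            order.append(name)
    return order


print(dbm_order({"PYDBMLIBORDER": "gdbm:bdb:ndbm"}))
```

Passing the environment in as a mapping keeps the helper easy to try
out without touching the real process environment.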

If you've got any -- even passing -- interest in this, please read the issue
and add a comment if you feel so moved.

This grew out of a change to adapt to new gdbm library organization:

    http://bugs.python.org/issue4487

Unbeknownst to me, I apparently wound up fixing a previously reported issue
about the change:

    http://bugs.python.org/issue1167

Skip


From mal at egenix.com  Mon Dec  8 15:54:44 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 08 Dec 2008 15:54:44 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4939CBDB.30305@gmail.com>
References: <4938374B.8000006@gmail.com>		<aac2c7cb0812041832l52cb4af5n1a3532ab66739460@mail.gmail.com>		<ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>		<200812051127.35880.eckhardt@satorlaser.com>		<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>		<49398980.7050209@gmail.com>	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>	<493991D3.9030003@gmail.com>
	<4939A8C7.6050209@gmail.com>	<4939AFC6.7000106@gmail.com>
	<4939CBDB.30305@gmail.com>
Message-ID: <493D3534.30505@egenix.com>

On 2008-12-06 01:48, Nick Coghlan wrote:
> You can't display a non-decodable filename to the user, hence the user
> will have no idea what they're working on. Non-filesystem related apps
> have no business trying to deal with insane filenames.

This is not entirely true: OSes, shells, and applications will
typically represent the file names using either ?-replacements or
some form of hex or decimal escapes for the characters they can't
decode. Since humans are usually very good at pattern recognition,
this goes a long way.

Of course, how the application maps that partially converted file name
back to the real thing is another issue and that's something that
Python should not make harder than it should be.
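
A small sketch of the two display styles, using the codec error
handlers available in current Python 3 releases (decoding with
"backslashreplace" needs Python 3.5 or later).  The helper name
display_name and the sample file names are assumptions made for this
example.

```python
def display_name(raw, encoding="utf-8", errors="backslashreplace"):
    """Return a printable label for a file name held as bytes."""
    return raw.decode(encoding, errors=errors)


raw_names = [b"report.txt", b"caf\xe9.txt"]   # second one is Latin-1

# Keep the raw bytes next to the label: the label is for humans, the
# bytes are what goes back to the OS.
listing = [(display_name(raw), raw) for raw in raw_names]
for label, raw in listing:
    print(label)

# The "?"-style alternative substitutes U+FFFD for each bad byte.
print(ascii(display_name(b"caf\xe9.txt", errors="replace")))
```

Neither label can be mapped back to the original name reliably, which
is why the sketch carries the bytes along instead of trying to
re-encode the label.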

> Linux is moving towards a standard of UTF-8 for filenames, and once we
> get to the point where the idea of encoding filenames and environment
> variables any other way is seen as crazy, then the Python 3 approach
> will work seamlessly.

It's going to take a long time before file names, environment variables
and command line parameters are all encoded using UTF-8, so "practicality
beats purity" will have to get more attention in this thread.

Python APIs should work out of the box most of the time.

Currently, if you live in a non-ASCII and non-pure-UTF-8 environment,
you have to deal with different and mixed encodings on a regular
basis.

Whether it's a USB stick you're trying to read, a ZIP file you're
trying to open, or a mounted network drive, the problem pops up in
many different areas.

If I write "do_something.py *" I expect Python to indeed work on
all the files in my directory, not just the ones that happen to
fit a particular encoding.

If I hook up a CGI script written in Python with a web server,
I expect all data to be received by the script, not just data
that happens to be UTF-8 encoded.

> In the meantime, raw bytes APIs will provide an alternative for those
> that disagree with that philosophy.

I think that's a wrong way to put it: The problems are not made
up by people who disagree with the one-encoding-for-everything
strategy.

The problems occur in real-life IT processing all the time - maybe
not so much in places where English scripts dominate, but certainly
in most other places with non-English scripts.

-- 
Marc-Andre Lemburg
eGenix.com

From solipsis at pitrou.net  Mon Dec  8 17:18:01 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 16:18:01 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
Message-ID: <loom.20081208T161109-997@post.gmane.org>


Hello,

The Py_buffer struct has two pointers named `shape` and `strides`. Each points
to an array of Py_ssize_t values whose length is equal to the number of
dimensions of the buffer object. Unfortunately, the buffer protocol spec doesn't
explain how allocation of these arrays should be handled.

Right now this is circumvented by either pointing them to an externally-managed
piece of memory (e.g. a Py_ssize_t field in the original PyObject), or by
pointing them to another field in the Py_buffer (because in the case of a
one-dimensional buffer with itemsize == 1, shape[0] is simply equal to the
length of the buffer in bytes).

Of course this is not flexible, and it makes it difficult to fix the situation
for buffers with an itemsize larger than 1. For those buffers, we can't simply
point the shape array to the byte length, and if we are taking a slice of the
memoryview, we can't point it to the size of the original object (for example
an array.array) either. Hence the problem of allocating the shape array.

For the one-dimensional case, I had in mind a simple scheme where the Py_buffer
struct has an additional two-member Py_ssize_t array. Then `shape` and `strides`
can point to the first and second member of this array, respectively. This
wouldn't solve the multi-dimensional case, however.
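
To show why shape[0] stops being the byte length once itemsize is
larger than 1, here is the same situation seen from Python.  This
assumes the memoryview attributes of current CPython 3 releases; it
only illustrates the values involved and says nothing about how the
arrays are allocated in C.

```python
import array

values = array.array("i", range(6))   # itemsize is platform dependent
view = memoryview(values)

# shape counts items, nbytes counts bytes; they differ when itemsize > 1.
print(view.itemsize, view.nbytes)
print(view.shape, view.strides)

# A slice needs its own shape and strides, distinct from the original's.
sliced = view[::2]
print(sliced.shape, sliced.strides)
```

The slice describes three items spaced two items apart, so neither
number can be found in the original object or in the byte length.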

Thanks for any ideas on how to solve this.

Regards

Antoine.



From rdmurray at bitdance.com  Mon Dec  8 18:30:39 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Mon, 8 Dec 2008 12:30:39 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0812081215460.1160@kimball.webabinitio.net>

On Sun, 7 Dec 2008 at 13:33, Guido van Rossum wrote:
> My problem with raising exceptions *by default* when an undecodable
> name exists is that it may render an app completely useless in a
> situation where the developer is no longer around. This happened all

I think Nick Coghlan's suggestion of emitting warnings would be an
excellent solution that addresses both your concerns and the concerns
Toshio has expressed (and with which I agree 100%).

The above is the only use case I've heard in this thread for ignoring
files with names that can't be decoded:  so that a user can use the
program on those files whose names can be decoded even when the user does
not have the resources to get the program fixed to handle undecodable
filenames.  I agree that that is a worthwhile goal.

If warnings were emitted, then files would not be silently ignored,
yet the program could still be used.

--RDM

PS: I'd like to see a similar warning issued when an access attempt
is made through os.environ to a variable that cannot be decoded.

From janssen at parc.com  Mon Dec  8 18:56:21 2008
From: janssen at parc.com (Bill Janssen)
Date: Mon, 8 Dec 2008 09:56:21 PST
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493B209D.5070306@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812051127.35880.eckhardt@satorlaser.com>
	<ca471dc20812050959m62828ee7me69a4e8fa225aedc@mail.gmail.com>
	<49398980.7050209@gmail.com>
	<ca471dc20812051211j11af7bfbkbed149ca82c13f68@mail.gmail.com>
	<493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com>
	<4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com>
Message-ID: <33922.1228758981@parc.com>

Nick Coghlan <ncoghlan at gmail.com> wrote:

> - I think the binary and Unicode APIs should be available (and fully
> functional) on all platforms (including Windows) so that app developers
> don't create portability problems for themselves when they make the
> decision as to which API to use

+1

I'm perhaps biased here; most of my Python programs don't have user
interfaces, because they don't "talk" to people, they talk to other
programs.  The binary APIs for the OS are essential.  I use and
deeply appreciate all the string handling features in Python,
particularly its firm grip on Unicode issues, but that's *useful*
instead of *essential*.

Bill

From tjreedy at udel.edu  Mon Dec  8 19:16:03 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 08 Dec 2008 13:16:03 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493CF2F5.9000904@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>
	<493CF2F5.9000904@gmail.com>
Message-ID: <ghjo94$tv0$1@ger.gmane.org>

Nick Coghlan wrote:
> Terry Reedy wrote:
>> This to me is an argument for keeping the default the current behavior,
>> but not for rejecting flexibility.  The computing world seems to be
>> messier than we would like and worse than I realized until this week. As
>> you say below, people need to better anticipate the future, and an
>> errors parameter would help do that.
> 
> It just occurred to me that this seems like a perfect situation to
> address via the warning system.

I disagree.

> The normal warnings mechanics can then
> be used to turn it into an exception if so desired, and this can be done
> once per application rather than having to pass a separate argument
> every time the affected APIs are called.

The warning mechanism, as far as I know (I have never dealt with it and 
do not want to), is for version issues.  In any case, the snippet that 
you clipped

try:
    files = os.listdir(somedir, errors='strict')
except OSError as e:
    log(<verbose error message that includes somedir and e>)
    files = os.listdir(somedir)

specifically requires a per call parameter.

> And the decoding problems don't pass silently either - they just get
> emitted as a warning by default instead of causing the application to crash.

Do they get automatically logged?  In any case, the errors parameter has 
an in-between option that neither ignores nor raises but replaces, giving 
*something* printable.

This seems like an ideal situation for a parameter that gives the 
application programmer who uses Python a range of options for working 
with a non-ideal world.  I am really flabbergasted that there is so much 
opposition to doing so in favor of more difficult or less functional 
alternatives.

Terry Jan Reedy


From guido at python.org  Mon Dec  8 19:26:46 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Dec 2008 10:26:46 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghhnlr$6lf$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
Message-ID: <ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>

On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Guido van Rossum wrote:
>>
>> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>
>>> Toshio Kuratomi wrote:
>>>
>>>>  - If this is true, a definition of os.listdir(<type 'str'>) that would
>>>> better meet programmer expectation would be: "Give me all files in a
>>>> directory with the output as str type".  The definition of
>>>> os.listdir(<type 'bytes'>) would be "Give me all files in a directory
>>>> with the output as bytes type".  Raising an exception when the filenames
>>>> are undecodable is perfectly reasonable in this situation.
>>>
>>> Your examples (snipped) pretty well convince me that there is a use case
>>> for
>>> raising exceptions.  We should move beyond arguing over which one way is
>>> right.  I think there should be a second argument 'ignorebad=False' to
>>> ignore undecodable files rather than raise the exception (or
>>> 'strict=True'
>>> to stop and raise exception on non-decodable names -- then code is 'if
>>> strict: raise ...').  I believe other functions have a similar parameter.
>
> I was thinking of the "normal Unicode 'errors' parameter", as described by
> Nick.
>
>> If you want the exceptions, just use the bytes API and try to decode
>> the byte strings using the system encoding.
>
> If it was a matter of adding a new method, I might agree.  But:
>
> 1. We already have a method that does exactly what you describe.  It is only
> a matter of adding flexibility to the response to problems, for which there
> is already precedent.
>
> 2. Suggesting that people who want strings and not bytes should have to deal
> with bytes, just to get an error notification, seems to negate the point of
> moving to 3.0.
>
> 3. A builtin would probably do so better than most programmers would, with
> little touches such as the one suggested below.
>
> 4. An error parameter would ALERT programmers to the possibility of a
> PROBLEM, both in the present and future.  As you say below, people need to
> better anticipate the future.
>
>> My problem with raising exceptions *by default* when an undecodable
>> name exists is that it may render an app completely useless in a
>> situation where the developer is no longer around. This happened all
>> the time with the 2.x Unicode API, where the developer hadn't
>> anticipated a particular input potentially containing non-ASCII bytes,
>> and the user fed the application non-ASCII text. Making os.listdir
>> raise an exception when a directory contains a single undecodable file
>> means that the entire directory can't be read, and most likely the
>> entire app crashes at that point. Most likely the developer never
>> anticipated this situation (since in most places it is either
>> impossible or very unlikely) -- after all, if they had anticipated it
>> they would have used the bytes API in the first place. (It's worse
>> because the exception being raised would be UnicodeError -- most
>> people expect os.listdir to raise OSError, not other errors.)
>
> This to me is an argument for keeping the default the current behavior, but
> not for rejecting flexibility.  The computing world seems to be messier than
> we would like and worse than I realized until this week. As you say below,
> people need to better anticipate the future, and an errors parameter would
> help do that.

I'm fine with whatever API enhancements you can come up with (assuming
others like them too :-) as long as the default remains the current
behavior.
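
For reference, the bytes-API route mentioned earlier in the thread might
look roughly like the sketch below.  The helper name listdir_strict is
hypothetical, and os.fsencode() is assumed to be available (it is in
later 3.x releases).

```python
import os
import sys


def listdir_strict(path):
    """List `path` via the bytes API, decoding each name strictly.

    Hypothetical helper, not part of the os module.
    """
    encoding = sys.getfilesystemencoding()
    names = []
    for raw in os.listdir(os.fsencode(path)):
        # Raises UnicodeDecodeError on the first undecodable name.
        names.append(raw.decode(encoding, "strict"))
    return names
```

A caller that wants the current default behavior keeps calling
os.listdir(path) with a str argument.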

> Is Windows really immune?  What about when it reads the directory of
> possibly old removable media with whatever byte name encodings?  Is this a
> possible source of 'unanticipated' problems?
>
> As to your last sentence, os.listdir() with an errors parameter could
> convert a decoding UnicodeError to "OSError: undecodable file name
> <ascii+hex repr>", thereby supplying the expected exception as well as an
> extractable representation of the problematic raw bytes.
>
> Here is a possible use case: I want filenames as 3.0 strings and I
> anticipate no problems at present but, as you say above, something might
> happen years in the future.  I am using 3.0 *because* of the strings ==
> unicode feature.  I would like to write
>
> try:
>  files = os.listdir(somedir, errors='strict')
> except OSError as e:
>  log(<verbose error message that includes somedir and e>)
>  files = os.listdir(somedir)
>
> and go on without the problem file but not without logging the problem so a
> future maintainer can consider what to do about it, but only when there is
> an actual need to think about it.
>
> Terry Jan Reedy



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rdmurray at bitdance.com  Mon Dec  8 19:34:37 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Mon, 8 Dec 2008 13:34:37 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghjo94$tv0$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<4939CBDB.30305@gmail.com>
	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>
	<20081206143454.GA15293@phd.pp.ru>
	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>
	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org> <493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
Message-ID: <Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>

On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:
>>  And the decoding problems don't pass silently either - they just get
>>  emitted as a warning by default instead of causing the application to
>>  crash.
>
> Do they get automatically logged?  In any case, the errors parameter has an
> in-between option: neither ignore nor raise, but replace and give
> *something* printable.
>
> This seems like an ideal situation for a parameter which gives the
> application programmer who uses Python a range of options for working with an
> un-ideal world.  I am really flabbergasted that there is so much opposition to
> doing so in favor of more difficult or less functional alternatives.

I'm in favor of an option to control what happens.

I just really really don't want the _default_ to be "ignore".  Defaulting
to a warning is fine with me, as would be defaulting to a traceback.

But defaulting to "silently ignore", as we have now, is just asking for user
confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
experience, IMO, than having a program fail when undecodable filenames
match the selection criteria.

--RDM

From brett at python.org  Mon Dec  8 20:14:24 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Dec 2008 11:14:24 -0800
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>
	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>
	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>
	<493BD1F2.5080300@holdenweb.com>
	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>
	<493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de>
	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>
	<493C70B9.2030601@cheimes.de>
	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>
Message-ID: <bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>

On Mon, Dec 8, 2008 at 05:11, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:
>
>> Barry Warsaw wrote:
>>>
>>> I'm personally okay with performance fixes in point releases, as long as it
>>> doesn't change API or add additional features.
>>
>> Does your okay include or exclude new internal APIs like new helper
>> functions or new C modules?
>
> I /personally/ don't have a problem with that, but we need consensus before
> that becomes policy.
>

Internal as in just for us I am fine with, but nothing publicly available.

As for new C modules, I am fine with that as well as long as they add
no new build dependencies.

-Brett

From guido at python.org  Mon Dec  8 20:25:18 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Dec 2008 11:25:18 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org> <493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
Message-ID: <ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>

On Mon, Dec 8, 2008 at 10:34 AM,  <rdmurray at bitdance.com> wrote:
> On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:
>>>
>>>  And the decoding problems don't pass silently either - they just get
>>>  emitted as a warning by default instead of causing the application to
>>>  crash.
>>
>> Do they get automatically logged?  In any case, the errors parameter has
>> an in-between option: neither ignore nor raise, but replace and give
>> *something* printable.
>>
>> This seems like an ideal situation for a parameter which gives
>> the application programmer who uses Python a range of options for working
>> with an un-ideal world.  I am really flabbergasted that there is so much
>> opposition to doing so in favor of more difficult or less functional
>> alternatives.
>
> I'm in favor of an option to control what happens.
>
> I just really really don't want the _default_ to be "ignore".  Defaulting
> to a warning is fine with me, as would be defaulting to a traceback.
>
> But defaulting to "silently ignore", as we have now, is just asking for user
> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
> experience, IMO, than having a program fail when undecodable filenames
> match the selection criteria.

Do you really not care about the risk where apps that weren't written
to be prepared to handle this will be rendered completely useless if a
single file in a directory has an unencodable name? This is similar to
an issue that Python had for a long time where it wouldn't start up if
the current directory contained non-ASCII characters.

Given that most developers will not have this issue in their own
environment, most apps will not be prepared for this issue, and that
makes it worse for the app's user!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From scott+python-dev at scottdial.com  Mon Dec  8 20:39:13 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Mon, 08 Dec 2008 14:39:13 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>
	<493CF2F5.9000904@gmail.com>	<ghjo94$tv0$1@ger.gmane.org>	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
	<ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
Message-ID: <493D77E1.2000401@scottdial.com>

Guido van Rossum wrote:
> On Mon, Dec 8, 2008 at 10:34 AM,  <rdmurray at bitdance.com> wrote:
>> On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote:
>>>>  And the decoding problems don't pass silently either - they just get
>>>>  emitted as a warning by default instead of causing the application to
>>>>  crash.
>>> Do they get automatically logged?  In any case, the errors parameter has
>>> an in-between option: neither ignore nor raise, but replace and give
>>> *something* printable.
>>
>> I just really really don't want the _default_ to be "ignore".  Defaulting
>> to a warning is fine with me, as would be defaulting to a traceback.
> 
> Do you really not care about the risk where apps that weren't written
> to be prepared to handle this will be rendered completely useless if a
> single file in a directory has an unencodable name?

Since when do warnings cause apps to be rendered completely useless? I
think it's easy to agree that defaulting to an exception is not good for
the reason you give, but I don't see how that applies to a warning. And,
it seems like a warning covers the issues that the other people want as
well. If there is a warning, then there is at least a record of the fact
that some filenames were ignored. Presumably if I was responsible for
the correctness of some piece of code, I would see the warning in a log
of some sort and could investigate it further (if I cared), otherwise I
could choose to ignore it. I don't see os.listdir(name) as one of
those situations where emitting a warning is a nuisance at all.
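
A rough sketch of what "skip the name but leave a record" could look
like.  The helper decode_names and the choice of UnicodeWarning as the
category are my assumptions, not an existing API.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte file names, warning about (and skipping) bad ones.

    Hypothetical helper; the UnicodeWarning category is an assumption.
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The name is dropped, but not silently.
            warnings.warn("undecodable file name skipped: %r" % (raw,),
                          UnicodeWarning, stacklevel=2)
    return decoded
```

With the default filters the message goes to stderr once per call site,
so the application keeps running and the skipped name is still visible.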

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From larry.bugbee at boeing.com  Mon Dec  8 20:49:17 2008
From: larry.bugbee at boeing.com (Bugbee, Larry)
Date: Mon, 8 Dec 2008 11:49:17 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <mailman.27726.1228763668.3486.python-dev@python.org>
References: <mailman.27726.1228763668.3486.python-dev@python.org>
Message-ID: <9418DB6C0B9D434190E54A78E931C3D1087D7A0B@XCH-NW-7V1.nw.nos.boeing.com>

> I'm perhaps biased here; most of my Python programs don't have user 
> interfaces, because they don't "talk" to people, they talk to other 
> programs.  The binary APIs for the OS are essential.  I use and 
> deeply appreciate all the string handling features in Python, 
> particularly its firm grip on Unicode issues, but that's *useful* 
> instead of *essential*.

Exactly!  Another +1.

Larry



From rdmurray at bitdance.com  Mon Dec  8 21:07:16 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Mon, 8 Dec 2008 15:07:16 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org> 
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com> 
	<ghhnlr$6lf$1@ger.gmane.org> <493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
	<ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0812081504320.1160@kimball.webabinitio.net>

On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:
> On Mon, Dec 8, 2008 at 10:34 AM,  <rdmurray at bitdance.com> wrote:
>> I'm in favor of an option to control what happens.
>>
>> I just really really don't want the _default_ to be "ignore".  Defaulting
>> to a warning is fine with me, as would be defaulting to a traceback.
>>
>> But defaulting to "silently ignore", as we have now, is just asking for user
>> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
>> experience, IMO, than having a program fail when undecodable filenames
>> match the selection criteria.
>
> Do you really not care about the risk where apps that weren't written
> to be prepared to handle this will be rendered completely useless if a
> single file in a directory has an unencodable name? This is similar to
> an issue that Python had for a long time where it wouldn't start up if
> the current directory contained non-ASCII characters.

No, I do care.  In another message I agreed with you that having the
app not fail was a reasonable goal.  What I'm saying is that having it
ignore the undecodable files _silently_ is bad.  And not picking
up a file that matches some selection criteria (ex: *.py) because it is
undecodable is a _failure_, in my opinion, that is _worse_ than getting
a traceback because there's an undecodable file in the directory.

But I'm happy with just issuing a warning by default.  That would mean
it doesn't fail silently, but neither does it crash.  Seems like the
best compromise with the broken nature of the real world IT
environment.

> Given that most developers will not have this issue in their own
> environment, most apps will not be prepared for this issue, and that
> makes it worse for the app's user!

It is exactly because most developers won't have the issue in their own
environment that ignoring files silently is a problem.  If they did,
they'd fix their code before it went out the door.  Since they don't,
when their code is used by somebody in a mixed encoding environment,
the programs _will_ fail by ignoring files that they should process.
The question, it seems to me, is do they fail silently and mysteriously
by failing to process files they are supposed to, or do they fail with
at least a little bit of noise?

--RDM

From guido at python.org  Mon Dec  8 21:12:57 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Dec 2008 12:12:57 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <Pine.LNX.4.64.0812081504320.1160@kimball.webabinitio.net>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org> <493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
	<ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
	<Pine.LNX.4.64.0812081504320.1160@kimball.webabinitio.net>
Message-ID: <ca471dc20812081212t34853ae1q6ad3a04f1ddf6544@mail.gmail.com>

On Mon, Dec 8, 2008 at 12:07 PM,  <rdmurray at bitdance.com> wrote:
> On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:
>>
>> On Mon, Dec 8, 2008 at 10:34 AM,  <rdmurray at bitdance.com> wrote:
>>>
>>> I'm in favor of an option to control what happens.
>>>
>>> I just really really don't want the _default_ to be "ignore".  Defaulting
>>> to a warning is fine with me, as would be defaulting to a traceback.
>>>
>>> But defaulting to "silently ignore", as we have now, is just asking for
>>> user
>>> confusion and debugging headaches, as detailed by Toshio.  A _worse_ user
>>> experience, IMO, than having a program fail when undecodable filenames
>>> match the selection criteria.
>>
>> Do you really not care about the risk where apps that weren't written
>> to be prepared to handle this will be rendered completely useless if a
>> single file in a directory has an unencodable name? This is similar to
>> an issue that Python had for a long time where it wouldn't start up if
>> the current directory contained non-ASCII characters.
>
> No, I do care.  In another message I agreed with you that having the
> app not fail was a reasonable goal.  What I'm saying is that having it
> ignore the undecodable files _silently_ is bad.  And not picking
> up a file that matches some selection criteria (ex: *.py) because it is
> undecodable is a _failure_, in my opinion, that is _worse_ than getting
> a traceback because there's an undecodable file in the directory.
>
> But I'm happy with just issuing a warning by default.  That would mean
> it doesn't fail silently, but neither does it crash.  Seems like the
> best compromise with the broken nature of the real world IT
> environment.

OK, I can live with that too.

>> Given that most developers will not have this issue in their own
>> environment, most apps will not be prepared for this issue, and that
>> makes it worse for the app's user!
>
> It is exactly because most developers won't have the issue in their own
> environment that ignoring files silently is a problem.  If they did,
> they'd fix their code before it went out the door.  Since they don't,
> when their code is used by somebody in a mixed encoding environment,
> the programs _will_ fail by ignoring files that they should process.
> The question, it seems to me, is do they fail silently and mysteriously
> by failing to process files they are supposed to, or do they fail with
> at least a little bit of noise?

A warning is fine. Whether the app *fails* or *succeeds* when the
warning is issued depends on what the app is trying to do and what the
user expects. There certainly are valid use cases for both, but I
expect that succeeding noisily is going to be at least as common as
failing (in the sense of not doing the right thing, not necessarily
crashing) noisily. This is an improvement over always crashing.
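
To make the two outcomes concrete, here is a sketch of how one
application-level switch could choose between them using the standard
warnings filters.  UndecodableNameWarning and report_skipped are
hypothetical names standing in for whatever a listing function would
actually emit.

```python
import warnings


class UndecodableNameWarning(UnicodeWarning):
    """Hypothetical category for skipped file names."""


def report_skipped(raw_name):
    # Stand-in for what a listing function might do on a bad name.
    warnings.warn("skipped %r" % (raw_name,), UndecodableNameWarning)


def run(strict):
    with warnings.catch_warnings():
        # "error" turns the warning into an exception; "default" prints it.
        warnings.simplefilter("error" if strict else "default",
                              UndecodableNameWarning)
        try:
            report_skipped(b"bad\xff.py")
        except UndecodableNameWarning:
            return "failed"
        return "succeeded"
```

Here run(True) takes the exception path and run(False) carries on after
printing the warning.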

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Mon Dec  8 21:35:35 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 06:35:35 +1000
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>
	<493C64B9.2040701@cheimes.de>	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>	<493C70B9.2030601@cheimes.de>	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>
	<bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>
Message-ID: <493D8517.60904@gmail.com>

Brett Cannon wrote:
> On Mon, Dec 8, 2008 at 05:11, Barry Warsaw <barry at python.org> wrote:
>> On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote:
>>> Barry Warsaw wrote:
>>>> I'm personally okay with performance fixes in point releases, as long as it
>>>> doesn't change API or add additional features.
>>> Does your okay include or exclude new internal APIs like new helper
>>> functions or new C modules?
>> I /personally/ don't have a problem with that, but we need consensus before
>> that becomes policy.
> Internal as in just for us I am fine with, but nothing publicly available.

Where would adding an (undocumented) get_filename() method to ZipImporter
objects for the benefit of the -m switch fit then? There are a few
things which don't always work properly because runpy doesn't currently
know how to set __file__ properly when the module comes from a zipfile.

Although now that I think about it, I could actually fix that "the right
way" (with a documented get_filename() method on ZipImporter) for 2.7
and 3.1, while using a runpy internal workaround specifically for
ZipImporter instances in the maintenance branches...
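
A sketch of what the documented method gives a caller, assuming a
release where zipimporter.get_filename() is available.  The archive and
module names are made up for illustration.

```python
import os
import tempfile
import zipfile
import zipimport

# Build a throwaway archive containing one module (names are illustrative).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.py", "print('hi')\n")

importer = zipimport.zipimporter(archive)
# The value a runner could use for the module's __file__.
filename = importer.get_filename("hello")
print(filename)
```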

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From dato at net.com.org.es  Mon Dec  8 19:51:57 2008
From: dato at net.com.org.es (Adeodato Simó)
Date: Mon, 8 Dec 2008 19:51:57 +0100
Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg.
	execution bit)
Message-ID: <20081208185157.GA19135@chistera.yi.org>

Hello,

after using 2to3 --write over some scripts, I found it very cumbersome
having to run `chmod +x` on each of them afterwards.

The attached patch is a possible way to fix this issue. It'd be great if
somebody could apply it, or write a more appropriate fix.
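
For list readers who cannot open the attachment, the general idea can be
sketched as below.  This is not the code from the patch: the helper
rewrite_preserving_mode is hypothetical, and POSIX permission bits are
assumed.

```python
import os
import stat


def rewrite_preserving_mode(path, new_text):
    """Overwrite `path` with `new_text`, keeping its permission bits.

    Hypothetical helper; not the code from the attached patch.
    """
    # Remember the permission bits before touching the file.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    with open(path, "w") as f:
        f.write(new_text)
    # Restore them in case the write step replaced the file.
    os.chmod(path, mode)
```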

Many thanks in advance!

P.S.: Please CC me on replies.

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
                            Listening to: Manolo García - Prendí la flor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2to3_preserve_mode.diff
Type: text/x-diff
Size: 584 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20081208/747f7773/attachment.diff>

From mal at egenix.com  Mon Dec  8 21:37:52 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 08 Dec 2008 21:37:52 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
Message-ID: <493D85A0.6060601@egenix.com>

On 2008-12-08 19:26, Guido van Rossum wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> Here is a possible use case: I want filenames as 3.0 strings and I
>> anticipate no problems at present but, as you say above, something might
>> happen years in the future.  I am using 3.0 *because* of the strings ==
>> unicode feature.  I would like to write
>>
>> try:
>>  files = os.listdir(somedir, errors='strict')
>> except OSError as e:
>>  log(<verbose error message that includes somedir and e>)
>>  files = os.listdir(somedir)
>>
>> and go on without the problem file but not without logging the problem so a
>> future maintainer can consider what to do about it, but only when there is
>> an actual need to think about it.

If that error parameter is the same as in unicode(value, errors),
then this would be a useful feature:

People could then choose among the already existing error handlers
('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
their own ones via the codecs module.

Such application specific error handlers could then also apply
whatever fancy round-trip safe encoding of non-decodable bytes
to Unicode escapes, private code points, etc. as seen fit by the
application.
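
A minimal sketch of such an application-specific handler, registered
through codecs.register_error().  The handler name "hexescape" and its
<XX> output format are made up for this example.

```python
import codecs


def hex_escape(exc):
    """Replace each undecodable byte with a visible <XX> marker.

    Hypothetical handler; the name "hexescape" is made up for this sketch.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    text = "".join("<%02X>" % b for b in bad)
    # Return the replacement and the position at which to resume decoding.
    return text, exc.end


codecs.register_error("hexescape", hex_escape)

name = b"caf\xff.txt".decode("utf-8", errors="hexescape")
print(name)
```

This particular marker format is lossy only in the sense that a real
name containing "<FF>" would be ambiguous; a round-trip safe scheme
needs a more careful mapping.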

Perhaps we should also add an ''encoding'' parameter that can be
set on a per directory basis (if necessary) and defaults to the
global file system encoding.

If an application hits a directory that is known to cause problems,
it could then choose to receive the file names in a different,
more suitable encoding. This allows implementing fallback
mechanisms with a list of common encodings for a locale.
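
Such a fallback mechanism might be sketched as follows, working on the
raw bytes of a name.  The helper decode_with_fallback and the candidate
list are illustrative assumptions, not a proposal for the actual API.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in turn; return (text, encoding used).

    Hypothetical helper; the candidate list is an illustrative assumption.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode %r" % (raw,))
```

Note that a single-byte encoding such as latin-1 accepts any byte
sequence, so it only makes sense as the last entry in the list.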

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 08 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From solipsis at pitrou.net  Mon Dec  8 21:39:07 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 20:39:07 +0000 (UTC)
Subject: [Python-Dev] 3.0.1 possibilities
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>
	<493C64B9.2040701@cheimes.de>	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>	<493C70B9.2030601@cheimes.de>	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>
	<bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>
	<493D8517.60904@gmail.com>
Message-ID: <loom.20081208T203826-911@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> Where would adding an (undocumented) get_filename() method to ZipImporter
> objects for the benefit of the -m switch fit then?

Why not call it _get_filename() in 3.0 and get_filename() in 3.1?




From solipsis at pitrou.net  Mon Dec  8 21:45:50 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 20:45:50 +0000 (UTC)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
Message-ID: <loom.20081208T204431-761@post.gmane.org>

M.-A. Lemburg <mal <at> egenix.com> writes:
> 
> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.

I'd argue that such a fancy round-trip safe error handler should be provided by
Python. It's not reasonable to expect application coders to come up with their
own codec variation based on subtle details of the unicode spec.
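
For reference, later Python 3 releases do ship a handler of this kind,
named "surrogateescape" (PEP 383).  A minimal round-trip sketch,
assuming such a release; the byte string is made up.

```python
raw_name = b"caf\xe9.txt"  # not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising.
text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")
print(round_tripped == raw_name)
```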

Regards

Antoine.



From ncoghlan at gmail.com  Mon Dec  8 21:46:53 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 06:46:53 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T161109-997@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
Message-ID: <493D87BD.90106@gmail.com>

Antoine Pitrou wrote:
> For the one-dimensional case, I had in mind a simple scheme where the Py_buffer
> struct has an additional two-member Py_ssize_t array. Then `shape` and `strides`
> can point to the first and second member of this array, respectively. This
> wouldn't solve the multi-dimensional case, however.
> 
> Thanks for any ideas on how to solve this.

Actually, I think your suggested scheme for the one-dimensional case
shows the way forward: ownership of the shape and strides memory belongs
to the object issuing the Py_buffer struct, and that object needs to
deal with it when the buffer is released. Defining a larger memory chunk
with the Py_buffer as the first item and the shape and stride info
tacked onto the end and returning that from PyObject_GetBuffer() means
that the shape/stride info will be released automatically when the view
is released via PyBuffer_Release().

For more complicated cases, the object providing the views may need to
do some internal bookkeeping to map from Py_buffer pointers to
separately allocated shape/stride information and release those when the
views are released.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Mon Dec  8 21:50:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 06:50:47 +1000
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <loom.20081208T203826-911@post.gmane.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>	<493C64B9.2040701@cheimes.de>	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>	<493C70B9.2030601@cheimes.de>	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>	<bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>	<493D8517.60904@gmail.com>
	<loom.20081208T203826-911@post.gmane.org>
Message-ID: <493D88A7.60701@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> Where would adding a (undocumented) get_filename() method to ZipImporter
>> objects for the benefit of the -m switch fit then?
> 
> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?

Actually, since it should only be a fairly trivial couple of lines of
code, I think I'm going to put it in the runpy._get_filename() helper
function in the maintenance branches and only move it over to
ZipImporter on the trunk and the py3k branch. That way it's completely
unambiguous that this is just a bug fix for runpy rather than a new
feature for ZipImporter.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From dickinsm at gmail.com  Mon Dec  8 21:56:25 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Mon, 8 Dec 2008 20:56:25 +0000
Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg.
	execution bit)
In-Reply-To: <20081208185157.GA19135@chistera.yi.org>
References: <20081208185157.GA19135@chistera.yi.org>
Message-ID: <5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com>

On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Simó <dato at net.com.org.es> wrote:
>
> The attached patch is a possible way to fix this issue. It'd be great if
> somebody could apply it, or write a more appropriate fix.

Please could you submit your patch to the bug tracker, at

http://bugs.python.org

That way it's less likely to get lost. :)

Thanks,

Mark

From barry at python.org  Mon Dec  8 22:01:29 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 8 Dec 2008 16:01:29 -0500
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <loom.20081208T203826-911@post.gmane.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>
	<493C64B9.2040701@cheimes.de>	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>	<493C70B9.2030601@cheimes.de>	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>
	<bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>
	<493D8517.60904@gmail.com>
	<loom.20081208T203826-911@post.gmane.org>
Message-ID: <EE8BD06B-9AAA-4AC8-B390-B7E9B2114D86@python.org>

On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote:

> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>
>> Where would adding a (undocumented) get_filename() method to ZipImporter
>> objects for the benefit of the -m switch fit then?
>
> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?

+1
-Barry

From mal at egenix.com  Mon Dec  8 22:01:40 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 08 Dec 2008 22:01:40 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <loom.20081208T204431-761@post.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>	<493D85A0.6060601@egenix.com>
	<loom.20081208T204431-761@post.gmane.org>
Message-ID: <493D8B34.1070506@egenix.com>

On 2008-12-08 21:45, Antoine Pitrou wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>> Such application specific error handlers could then also apply
>> whatever fancy round-trip safe encoding of non-decodable bytes
>> to Unicode escapes, private code points, etc. as seen fit by the
>> application.
> 
> I'd argue that such fancy round-trip safe error handler should be provided by
> Python. It's not reasonable to expect application coders to come up with their
> own codec variation based on subtle details of the unicode spec.

Fair enough. We could add some e.g.

 * a round-trip safe escape error handler that uses a Unicode private
   code point area which we officially reserve for the Python
   interpreter

 * a human readable escape error handler that encodes the problem
   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
   encoded directory name instead of failing

 * a warning error handler that replaces the problem cases with
   a question mark and issues a warning through the warning
   framework
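
As a rough illustration of the second bullet, here is a minimal sketch of a
decode-side error handler built on codecs.register_error(). The handler name
"hexescape_sketch" and the function hex_escape_errors are made up for this
example; they are not an existing or proposed Python API.

```python
import codecs


def hex_escape_errors(exc):
    """Decode-side handler: turn each undecodable byte into a \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only deals with decoding errors
    bad = exc.object[exc.start:exc.end]
    escaped = "".join("\\x%02x" % byte for byte in bad)
    # Return the replacement text and the position at which to resume.
    return escaped, exc.end


# "hexescape_sketch" is a hypothetical handler name used only here.
codecs.register_error("hexescape_sketch", hex_escape_errors)

latin1_name = "Andr\xe9".encode("latin-1")  # the bytes b'Andr\xe9'
readable = latin1_name.decode("utf-8", "hexescape_sketch")
print(readable)  # Andr\xe9
```

The result is readable but not round-trip safe: a name that already contains
a literal backslash escape cannot be told apart from an escaped byte.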

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 08 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ncoghlan at gmail.com  Mon Dec  8 22:03:56 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 07:03:56 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghjo94$tv0$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>	<493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
Message-ID: <493D8BBC.10503@gmail.com>

Terry Reedy wrote:
> Nick Coghlan wrote:
>> Terry Reedy wrote:
>>> This to me is an argument for keeping the default the current behavior,
>>> but not for rejecting flexibility.  The computing world seems to be
>>> messier than we would like and worse than I realized until this week. As
>>> you say below, people need to better anticipate the future, and an
>>> errors parameter would help do that.
>>
>> It just occurred to me that this seems like a perfect situation to
>> address via the warning system.
> 
> I disagree.
> 
>> The normal warnings mechanics can then
>> be used to turn it into an exception if so desired, and this can be done
>> once per application rather than having to pass a separate argument
>> every time the affected APIs are called.
> 
> The warning mechanism, as far as I know, because I have never dealt with
> it (and do not want to) is for version issues.

No, it's just DeprecationWarning in particular that is specific to
versioning issues. That's obviously the one that comes up most often for
core development, but there are other warnings as well (e.g. the
off-by-default ImportWarning when potential packages are skipped because
__init__.py is missing).

For this particular case, I would suggest adding something like
EnvironmentWarning (to parallel the EnvironmentError that is the common
parent of OSError and IOError).
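
To make the idea concrete, here is a minimal sketch of how such a category
could behave. EnvironmentWarning does not exist in the standard library; it
and the helper decode_name() are defined here purely for illustration.

```python
import warnings


class EnvironmentWarning(Warning):
    """Hypothetical warning category; not part of the standard library."""


def decode_name(raw, encoding="utf-8"):
    """Decode a byte string, warning instead of failing on bad bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        warnings.warn("could not decode %r as %s" % (raw, encoding),
                      EnvironmentWarning, stacklevel=2)
        return raw.decode(encoding, "replace")


# Default-style handling: the problem is reported, the call still succeeds.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name = decode_name(b"Andr\xe9")

# Application-wide opt-in: turn the same warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", EnvironmentWarning)
    try:
        decode_name(b"Andr\xe9")
        escalated = False
    except EnvironmentWarning:
        escalated = True
```

The filter is set once per application, which is the point made above about
not having to pass an extra argument on every call.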

>  In any case, the snippet
> that you clipped
> 
> try:
>   files = os.listdir(somedir, errors='strict')
> except OSError as e:
>   log(<verbose error message that includes somedir and e>)
>   files = os.listdir(somedir)
> 
> specifically requires a per call parameter.

True, but the decision to have "errors=warn" as the default behaviour is
independent of the decision of whether or not to allow the behaviour to
be changed on a case-by-case basis. There is nothing stopping us from
doing both.
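
A sketch of what "doing both" might look like from the caller's side. The
wrapper listdir_text() and its errors values ("strict", "warn", "replace")
are assumptions made for this example; os.listdir() itself has no errors
parameter.

```python
import os
import tempfile
import warnings


def listdir_text(path, encoding="utf-8", errors="warn"):
    """List a directory as text, with a per-call policy for bad names."""
    names = []
    for raw in os.listdir(os.fsencode(path)):  # bytes in, bytes out
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if errors == "strict":
                raise
            if errors == "warn":
                warnings.warn("undecodable name %r in %r" % (raw, path),
                              UnicodeWarning, stacklevel=2)
            # "warn" and "replace" both fall back to a lossy decoding.
            names.append(raw.decode(encoding, "replace"))
    return names


# Usage: the default warns, a caller can still ask for errors="strict".
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    listing = listdir_text(tmp)
```

The warning filters remain the global switch, while the errors argument
overrides the behaviour for a single call.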

>> And the decoding problems don't pass silently either - they just get
>> emitted as a warning by default instead of causing the application to
>> crash.
> 
> Do they get automatically logged?

By default warnings are written to sys.stderr. Whether that gets logged
or not will depend on the nature of the application.

There are also mechanisms in warnings that allow an application to
override the handling of warnings (and for 2.7/3.1, there are mechanisms
in logging to make it easy to hook the warning system and the logging
system together, so that warnings are automatically logged).
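
For reference, the logging hook mentioned here is logging.captureWarnings(),
which routes warnings to the "py.warnings" logger. A minimal sketch follows;
the in-memory stream and the message text are only there for the example.

```python
import io
import logging
import warnings

# Collect log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("py.warnings")
logger.addHandler(handler)

logging.captureWarnings(True)  # warnings now go through the logging system
try:
    with warnings.catch_warnings():
        warnings.simplefilter("always")
        warnings.warn("undecodable file name skipped", UserWarning)
finally:
    logging.captureWarnings(False)  # restore the normal stderr behaviour
    logger.removeHandler(handler)

logged = stream.getvalue()
```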

>  In any case, the errors parameter has
> an in-between option to neither ignore nor raise but to replace and give
> *something* printable.

That's true, and why I would actually support doing both. Adding the
warning is a more pressing need though, since it is what will prevent
the errors from passing silently in the default case.

> This situation seems like an ideal situation for a parameter which gives
> the application program who uses Python a range of options to working
> with an un-ideal world.  I am really flabbergasted why there is so much
> opposition to doing so in favor of more difficult or less functional
> alternatives.

A warning will stop the failure from passing silently in the default
case - that's solving a different problem to the one that the error
handling argument will solve. I do agree that being able to override the
handling on a per-call basis could be a useful feature.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From alexander.belopolsky at gmail.com  Mon Dec  8 22:05:08 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 8 Dec 2008 16:05:08 -0500
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493D87BD.90106@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>
Message-ID: <d38f5330812081305m6281b112n12dfa187db27cfb1@mail.gmail.com>

I don't have much to add to Nick's reply other than to point you to
numpy, <http://projects.scipy.org/scipy/numpy>, as a reference
implementation.  You may also get better responses on the numpy list,
< numpy-discussion at scipy.org>.

On Mon, Dec 8, 2008 at 3:46 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Antoine Pitrou wrote:
>> For the one-dimensional case, I had in mind a simple scheme where the Py_buffer
>> struct has an additional two-member Py_ssize_t array. Then `shape` and `strides`
>> can point to the first and second member of this array, respectively. This
>> wouldn't solve the multi-dimensional case, however.
>>
>> Thanks for any ideas on how to solve this.
>
> Actually, I think your suggested scheme for the one-dimensional case
> shows the way forward: ownership of the shape and strides memory belongs
> to the object issuing the Py_buffer struct, and that object needs to
> deal with it when the buffer is released. Defining a larger memory chunk
> with the Py_buffer as the first item and the shape and stride info
> tacked onto the end and returning that from PyObject_GetBuffer() means
> that the shape/stride info will be released automatically when the view
> is released via PyBuffer_Release().
>
> For more complicated cases, the object providing the views may need to
> do some internal bookkeeping to map from Py_buffer pointers to
> separately allocated shape/stride information and release those when the
> views are released.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------

From rhamph at gmail.com  Mon Dec  8 22:06:28 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 8 Dec 2008 14:06:28 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <loom.20081208T204431-761@post.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
	<loom.20081208T204431-761@post.gmane.org>
Message-ID: <aac2c7cb0812081306u519e736cl66e28cc210b161ef@mail.gmail.com>

On Mon, Dec 8, 2008 at 1:45 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>>
>> Such application specific error handlers could then also apply
>> whatever fancy round-trip safe encoding of non-decodable bytes
>> to Unicode escapes, private code points, etc. as seen fit by the
>> application.
>
> I'd argue that such fancy round-trip safe error handler should be provided by
> Python. It's not reasonable to expect application coders to come up with their
> own codec variation based on subtle details of the unicode spec.

Except they're clearly NOT part of the unicode spec.

Moreover, whatever tricks you use vary depending on if your garbage
input is from UTF-8, UTF-16, or UTF-32 (or any other arbitrary
encoding, like CP-1252 or Shift-JIS.)

At this point someone suggests we have a type that can store an
arbitrary mix of unicode and bytes, so the undecodable portions stay
in their original form. :P

-- 
Adam Olsen, aka Rhamphoryncus

From ncoghlan at gmail.com  Mon Dec  8 22:06:49 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 07:06:49 +1000
Subject: [Python-Dev] 3.0.1 possibilities
In-Reply-To: <EE8BD06B-9AAA-4AC8-B390-B7E9B2114D86@python.org>
References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com>	<E34F81CD-E973-46C5-B2B0-B1B5BC603BE8@python.org>	<bbaeab100812061710s68976d3dwb9d2541005238a64@mail.gmail.com>	<493BD1F2.5080300@holdenweb.com>	<ca471dc20812070932v6b0ea1ew7f02c8557d33e571@mail.gmail.com>	<493C3BBA.1040106@v.loewis.de>	<493C64B9.2040701@cheimes.de>	<BE66EB72-DB62-48B9-801A-3854E4F339E2@python.org>	<493C70B9.2030601@cheimes.de>	<133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org>	<bbaeab100812081114h1c1a8c14ld82cee0ffd0c75df@mail.gmail.com>	<493D8517.60904@gmail.com>	<loom.20081208T203826-911@post.gmane.org>
	<EE8BD06B-9AAA-4AC8-B390-B7E9B2114D86@python.org>
Message-ID: <493D8C69.4010708@gmail.com>

Barry Warsaw wrote:
> On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote:
> 
>> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>>>
>>> Where would adding a (undocumented) get_filename() method to ZipImporter
>>> objects for the benefit of the -m switch fit then?
> 
>> Why not call it _get_filename() in 3.0 and get_filename() in 3.1?
> 
> +1

Well, with release manager blessing I'll go with that approach then :)

Now, where are those round tuits to actually get it implemented...

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Mon Dec  8 22:12:25 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 21:12:25 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>
Message-ID: <loom.20081208T211114-616@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> Actually, I think your suggested scheme for the one-dimensional case
> shows the way forward: ownership of the shape and strides memory belongs
> to the object issuing the Py_buffer struct, and that object needs to
> deal with it when the buffer is released. Defining a larger memory chunk
> with the Py_buffer as the first item and the shape and stride info
> tacked onto the end and returning that from PyObject_GetBuffer() means
> that the shape/stride info will be released automatically when the view
> is released via PyBuffer_Release().

Ok, so another question: given that this will change the Py_buffer layout a bit,
can it go into 3.0.1 and 2.6.2?




From solipsis at pitrou.net  Mon Dec  8 22:14:46 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 21:14:46 +0000 (UTC)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<493B680C.6010605@gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
	<loom.20081208T204431-761@post.gmane.org>
	<aac2c7cb0812081306u519e736cl66e28cc210b161ef@mail.gmail.com>
Message-ID: <loom.20081208T211313-210@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> 
> Except they're clearly NOT part of the unicode spec.

This is always the same discussion going in circles. I know they're not part of
the unicode spec, but practicality beats purity and if the said error handler
comes with an appropriate warning in the official doc, then why not?

In any case, +1 to Marc-André's proposal.



From rhamph at gmail.com  Mon Dec  8 22:32:00 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 8 Dec 2008 14:32:00 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493D8B34.1070506@egenix.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>
	<493C0FE1.30506@gmail.com> <ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
	<loom.20081208T204431-761@post.gmane.org>
	<493D8B34.1070506@egenix.com>
Message-ID: <aac2c7cb0812081332w1032859dkbfaf168c3d6af9a7@mail.gmail.com>

On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-12-08 21:45, Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Such application specific error handlers could then also apply
>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>> to Unicode escapes, private code points, etc. as seen fit by the
>>> application.
>>
>> I'd argue that such fancy round-trip safe error handler should be provided by
>> Python. It's not reasonable to expect application coders to come up with their
>> own codec variation based on subtle details of the unicode spec.
>
> Fair enough. We could add some e.g.
>
>  * a round-trip safe escape error handler that uses a Unicode private
>   code point area which we officially reserve for the Python
>   interpreter

This would of course alter the behaviour of those private code points,
preventing them from round-tripping properly.

I don't think round-tripping can be done from an error handler.  You
need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
bytes.  This has long since gotten repetitive..
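
A short check of the 8859-1 point, as a sketch: Latin-1 maps every byte to
exactly one code point and back, whereas decoding with errors="replace"
loses the original bytes. The sample name is arbitrary.

```python
# Every possible byte value survives a Latin-1 decode/encode cycle.
every_byte = bytes(range(256))
as_text = every_byte.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == every_byte

# A lossy handler cannot give the original bytes back.
raw_name = b"Andr\xe9"
lossy = raw_name.decode("utf-8", "replace")
lossy_ok = lossy.encode("utf-8") == raw_name

print(round_trip_ok, lossy_ok)  # True False
```

The price of Latin-1 is that names really encoded in UTF-8 show up as
mojibake, so it round-trips without being human readable.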


>  * a human readable escape error handler that encodes the problem
>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>   encoded directory name instead of failing

Similar to '?'.encode('ascii', 'backslashreplace')?  I'm +1 on making that work.


>  * a warning error handler that replaces the problem cases with
>   a question mark and issues a warning through the warning
>   framework

I dub thee errors='warnreplace'.
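
A minimal sketch of what such a handler might do, using
codecs.register_error(). The registered name "warnreplace_sketch" and the
choice of UnicodeWarning as the category are assumptions made for this
example; no handler called 'warnreplace' exists in Python.

```python
import codecs
import warnings


def warnreplace_errors(exc):
    """Replace undecodable bytes with '?' and report them as a warning."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only deals with decoding errors
    bad = exc.object[exc.start:exc.end]
    warnings.warn("replaced undecodable bytes %r" % (bad,), UnicodeWarning)
    return "?" * len(bad), exc.end


# Hypothetical handler name, registered only for this illustration.
codecs.register_error("warnreplace_sketch", warnreplace_errors)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = b"Andr\xe9.txt".decode("utf-8", "warnreplace_sketch")

print(text)  # Andr?.txt
```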


-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Mon Dec  8 22:36:30 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Mon, 08 Dec 2008 13:36:30 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812081212t34853ae1q6ad3a04f1ddf6544@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>
	<493CF2F5.9000904@gmail.com>	<ghjo94$tv0$1@ger.gmane.org>	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>	<ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>	<Pine.LNX.4.64.0812081504320.1160@kimball.webabinitio.net>
	<ca471dc20812081212t34853ae1q6ad3a04f1ddf6544@mail.gmail.com>
Message-ID: <493D935E.9030800@gmail.com>

Guido van Rossum wrote:
> On Mon, Dec 8, 2008 at 12:07 PM,  <rdmurray at bitdance.com> wrote:
>> On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote:
>> But I'm happy with just issuing a warning by default.  That would mean
>> it doesn't fail silently, but neither does it crash.  Seems like the
>> best compromise with the broken nature of the real world IT
>> environment.
> 
> OK, I can live with that too.
> 
Same here.  This lets the application specify globally what should
happen (exception, warning, ignore via the warnings filters) and should
give enough context that it doesn't become a mysterious error in the
program.

The per-method addition of an errors argument so that this is overridable
locally as well as globally is also a nice touch but can be done
separately from this step.

-Toshio

From victor.stinner at haypocalc.com  Mon Dec  8 22:39:18 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 8 Dec 2008 22:39:18 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493D85A0.6060601@egenix.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
Message-ID: <200812082239.18802.victor.stinner@haypocalc.com>

> ('strict', 'ignore', 'replace', 'xmlcharrefreplace')

replace (or xmlcharrefreplace) is just useless because you will be unable
to open or rename the file... You just know that there is a strange file in
the directory.

From ncoghlan at gmail.com  Mon Dec  8 22:42:37 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 07:42:37 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T211114-616@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>
Message-ID: <493D94CD.5040209@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> Actually, I think your suggested scheme for the one-dimensional case
>> shows the way forward: ownership of the shape and strides memory belongs
>> to the object issuing the Py_buffer struct, and that object needs to
>> deal with it when the buffer is released. Defining a larger memory chunk
>> with the Py_buffer as the first item and the shape and stride info
>> tacked onto the end and returning that from PyObject_GetBuffer() means
>> that the shape/stride info will be released automatically when the view
>> is released via PyBuffer_Release().
> 
> Ok, so another question: given that this will change the Py_buffer layout a bit,
> can it go into 3.0.1 and 2.6.2?

No, you misunderstand what I meant. Py_buffer doesn't need to be changed
at all. The *issuing type* would define a new structure with the
additional fields, such as:

struct _my_Py_buffer {
  Py_buffer     view;
  SHAPE_TYPE    shape;
  STRIDES_TYPE  strides;
};

Internally, the object would use these instead of vanilla Py_buffer
objects, and set the shape and strides pointers inside the view field to
refer to the shape and strides fields.

Clients wouldn't need to know or care that the shape and stride
information had been tacked on to the end of the Py_buffer struct. When
the buffer was released via PyBuffer_Release, the object would throw
away the whole _my_Py_buffer structure (since the pointers are the same).
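
The layout can be modelled from Python with ctypes, as a sketch only.
FakeBuffer is a simplified stand-in for Py_buffer (not its real field list),
and OwnedBuffer, member_pointer() and the values 10 and 1 are made up for
the example.

```python
import ctypes


class FakeBuffer(ctypes.Structure):
    """Simplified stand-in for Py_buffer (only the fields needed here)."""
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("ndim", ctypes.c_int),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
    ]


class OwnedBuffer(ctypes.Structure):
    """Issuer-private wrapper: the view first, its storage right behind."""
    _fields_ = [
        ("view", FakeBuffer),
        ("shape", ctypes.c_ssize_t),
        ("strides", ctypes.c_ssize_t),
    ]


def member_pointer(struct, name):
    """Return a pointer to the Py_ssize_t member `name` inside `struct`."""
    offset = getattr(type(struct), name).offset
    address = ctypes.addressof(struct) + offset
    return ctypes.cast(address, ctypes.POINTER(ctypes.c_ssize_t))


owned = OwnedBuffer()
owned.shape, owned.strides = 10, 1
owned.view.ndim = 1
# Point the view's shape/strides at the storage tacked on after it.
owned.view.shape = member_pointer(owned, "shape")
owned.view.strides = member_pointer(owned, "strides")

# The view is the first member, so both share one address: freeing the
# wrapper frees the shape/stride storage along with the view.
same_address = ctypes.addressof(owned) == ctypes.addressof(owned.view)
seen_by_client = (owned.view.shape[0], owned.view.strides[0])
```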

Alexander's suggestion of going and looking at what the numpy folks have
done in this area is probably a good idea too.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From mal at egenix.com  Mon Dec  8 22:44:30 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 08 Dec 2008 22:44:30 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812081332w1032859dkbfaf168c3d6af9a7@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>	<493D85A0.6060601@egenix.com>	<loom.20081208T204431-761@post.gmane.org>	<493D8B34.1070506@egenix.com>
	<aac2c7cb0812081332w1032859dkbfaf168c3d6af9a7@mail.gmail.com>
Message-ID: <493D953E.10107@egenix.com>

On 2008-12-08 22:32, Adam Olsen wrote:
> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2008-12-08 21:45, Antoine Pitrou wrote:
>>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>>> Such application specific error handlers could then also apply
>>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>>> to Unicode escapes, private code points, etc. as seen fit by the
>>>> application.
>>> I'd argue that such fancy round-trip safe error handler should be provided by
>>> Python. It's not reasonable to expect application coders to come up with their
>>> own codec variation based on subtle details of the unicode spec.
>> Fair enough. We could add some e.g.
>>
>>  * a round-trip safe escape error handler that uses a Unicode private
>>   code point area which we officially reserve for the Python
>>   interpreter
> 
> This would of course alter the behaviour of those private code points,
> preventing them from round-tripping properly.
> 
> I don't think round-tripping can be done from an error handler.  You
> need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
> bytes.  This has long since gotten repetitive..

The error handler would just map the problem bytes to the private
area. The application would then have to decide what to do with
them, ie. the error handler only provides one half of the round-
tripping.
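
A sketch of both halves, under loudly stated assumptions: the block
U+F700..U+F7FF is an arbitrary choice for illustration (nothing reserves it
for Python), and the names pua_escape_errors, "pua_escape_sketch" and
pua_unescape are hypothetical.

```python
import codecs

PUA_BASE = 0xF700  # arbitrary private-use block, chosen for this sketch


def pua_escape_errors(exc):
    """First half: map each undecodable byte to a private-use code point."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only deals with decoding errors
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + byte) for byte in bad), exc.end


codecs.register_error("pua_escape_sketch", pua_escape_errors)


def pua_unescape(text, encoding="utf-8"):
    """Second half, left to the application: restore the original bytes."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if PUA_BASE <= code < PUA_BASE + 0x100:
            out.append(code - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)


raw = b"Andr\xe9.txt"
text = raw.decode("utf-8", "pua_escape_sketch")
restored = pua_unescape(text)
```

As noted earlier in the thread, input that legitimately contains characters
from the chosen block would be mangled by pua_unescape(), so this is not a
general solution.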

And that's on purpose: I don't believe we can come up with some magic
solution for the encodings problem. This is essentially something
that applications will have to solve on a case-by-case basis.

>>  * a human readable escape error handler that encodes the problem
>>   bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1
>>   encoded directory name instead of failing
> 
> Similar to '?'.encode('ascii', 'backslashreplace')?  I'm +1 on making that work.

Yes.

>>  * a warning error handler that replaces the problem cases with
>>   a question mark and issues a warning through the warning
>>   framework
> 
> I dub thee errors='warnreplace'.

Yep, something along those lines.

Perhaps there are more and better alternatives. These suggestions
are just to show how the idea could be put to some real-life use.

-- 
Marc-Andre Lemburg
eGenix.com

From rhamph at gmail.com  Mon Dec  8 22:47:01 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 8 Dec 2008 14:47:01 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ca471dc20812081212t34853ae1q6ad3a04f1ddf6544@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org> <493CF2F5.9000904@gmail.com>
	<ghjo94$tv0$1@ger.gmane.org>
	<Pine.LNX.4.64.0812081326290.1160@kimball.webabinitio.net>
	<ca471dc20812081125n4544b67am182193e4fb207d7@mail.gmail.com>
	<Pine.LNX.4.64.0812081504320.1160@kimball.webabinitio.net>
	<ca471dc20812081212t34853ae1q6ad3a04f1ddf6544@mail.gmail.com>
Message-ID: <aac2c7cb0812081347j2f78e8c3v871c3f3cf63d1a4b@mail.gmail.com>

On Mon, Dec 8, 2008 at 1:12 PM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Dec 8, 2008 at 12:07 PM,  <rdmurray at bitdance.com> wrote:
>> But I'm happy with just issuing a warning by default.  That would mean
>> it doesn't fail silently, but neither does it crash.  Seems like the
>> best compromise with the broken nature of the real world IT
>> environment.
>
> OK, I can live with that too.

+1


-- 
Adam Olsen, aka Rhamphoryncus

From mal at egenix.com  Mon Dec  8 22:47:21 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 08 Dec 2008 22:47:21 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812082239.18802.victor.stinner@haypocalc.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>	<493D85A0.6060601@egenix.com>
	<200812082239.18802.victor.stinner@haypocalc.com>
Message-ID: <493D95E9.4000104@egenix.com>

On 2008-12-08 22:39, Victor Stinner wrote:
>> ('strict', 'ignore', 'replace', 'xmlcharrefreplace')
> 
> replace (or xmlcharrefreplace) is just useless because you will not be able
> to open or rename the file... You just know that there is a strange file in 
> the directory.

Right, but that's already a lot better than not knowing of the
file's existence at all :-)

Note that the above are standard error handlers for Unicode
conversions. The rest of the email you cut away has more useful
error handlers for the purpose in question.

-- 
Marc-Andre Lemburg
eGenix.com


From dato at net.com.org.es  Mon Dec  8 22:45:29 2008
From: dato at net.com.org.es (Adeodato Simó)
Date: Mon, 8 Dec 2008 22:45:29 +0100
Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode
	(eg.	execution bit)
In-Reply-To: <5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com>
References: <20081208185157.GA19135@chistera.yi.org>
	<5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com>
Message-ID: <20081208214529.GA23974@chistera.yi.org>

* Mark Dickinson [Mon, 08 Dec 2008 20:56:25 +0000]:

> On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Simó <dato at net.com.org.es> wrote:

> > The attached patch is a possible way to fix this issue. It'd be great if
> > somebody could apply it, or write a more appropriate fix.

> Please could you submit your patch to the bug tracker, at

> http://bugs.python.org

> That way it's less likely to get lost. :)

Ok, submitted as #4602.

Thanks,

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
As scarce as truth is, the supply has always been in excess of the demand.
                -- Josh Billings


From guido at python.org  Mon Dec  8 22:54:41 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Dec 2008 13:54:41 -0800
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <200812072206.21908.paul@boddie.org.uk>
References: <200812072206.21908.paul@boddie.org.uk>
Message-ID: <ca471dc20812081354l270f6753r778154d8a4e9c04b@mail.gmail.com>

On Sun, Dec 7, 2008 at 1:06 PM, Paul Boddie <paul at boddie.org.uk> wrote:
> On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote:
>>
>> On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano <warren at delsci.com>
>> wrote:
>> > As someone somewhat knowledgable of how parsers work, I do not
>> > understand why a method/attribute name "object_name.as(...)" must
>> > necessarily conflict with a standalone keyword " as ".  It seems to me
>> > that it should be possible to unambiguously separate the two without
>> > ambiguity or undue complication of the parser.
>>
>> That's possible with sufficiently powerful parser technology, but
>> that's not how the Python parser (and most parsers, in my experience)
>> treat reserved words. Reserved words are reserved in all contexts,
>> regardless of whether ambiguity could arise.
>
> Just a quick aside from someone who merely lurks on this list: in SQL, it's
> quite possible to use keywords in a fashion similar to that desired by the
> inquirer, and it's actually possible to double-quote keywords and use them as
> names for things. I'm not advocating more complicated parsing technology for
> any Python implementation, but I think it's pertinent to point out that the
> technology isn't particularly obscure.

From my experience with SQL, it's nearly as bad as Python in that
every single one of the 200+ reserved words in a typical
implementation cannot be used as a name in any context without using
double quotes. While the double-quote escape is handy (especially
given there are so many obscure reserved words) this is not exactly
what the OP wanted -- they would have to say x."as"('float'), except
using some other notation instead of double quotes. Having to escape
it completely kills the OP's claim that 'as' is "simplest and most
elegant".
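
A minimal sketch of what the restriction looks like from Python itself,
assuming a current Python 3 where 'as' is a full keyword. The Converter
class and the convert function are hypothetical, made up for illustration
only; getattr() with a string plays the role of the "other notation" for
escaping the name.

```python
import keyword


class Converter:
    """Hypothetical class, used only for illustration."""


def convert(type_name):
    # Stand-in for whatever an 'as' method would do.
    return "converted to " + type_name


obj = Converter()

# Attribute names are plain strings to setattr/getattr, so a reserved
# word is accepted here even though the parser rejects it in source.
setattr(obj, "as", convert)
result = getattr(obj, "as")("float")

# The direct spelling is a syntax error regardless of context.
try:
    compile("obj.as('float')", "<example>", "eval")
    direct_spelling_ok = True
except SyntaxError:
    direct_spelling_ok = False

print(keyword.iskeyword("as"), result, direct_spelling_ok)
```

The escaped form works, but it is clearly heavier than a plain method
call, which is the point being made about the claim of elegance.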

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From gruszczy at gmail.com  Mon Dec  8 22:55:21 2008
From: gruszczy at gmail.com (Filip Gruszczyński)
Date: Mon, 8 Dec 2008 22:55:21 +0100
Subject: [Python-Dev] Self in method body
Message-ID: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>

There is a large discussion on python-list about Guido's article about
new self syntax, therefore I would like to use that to raise a similar
question: self in the body. Some time ago I was coding in the Magik
language (http://en.wikipedia.org/wiki/Magik_(programming_language)),
which is dynamically typed and similar to Smalltalk and actually to
Python too - although the syntax is far less appalling. As you can see
in the examples, defining methods is very similar to what Guido
proposed in his blog, though you don't provide the name of the
argument, but the name of the class. Then you just precede attributes
with a '.', which is 4 letters less than self. And, well, this rocks
;-)

It is really not a problem to type 4 letters (well, six with a comma
and a space) in the signature, but it takes a lot of time to type all
those selfs inside the function's body. So I was wondering whether this
issue could be raised too, when new self syntax is proposed. A simple
example looks like this:

class bar:

   def bar.foo():
      .x = 5

This could really save a lot of code, while attributes are still
easily distinguishable.

-- 
Filip Gruszczyński

From guido at python.org  Mon Dec  8 23:07:49 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Dec 2008 14:07:49 -0800
Subject: [Python-Dev] Nonlocal shortcut
In-Reply-To: <e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>
References: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
	<e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>
Message-ID: <ca471dc20812081407t14139758p5335414cc8a4a2ba@mail.gmail.com>

On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
> Hello,
>
> Fabio Zadrozny  wrote:
>> Hi,
>>
>> I'm currently implementing a parser to handle Python 3.0, and one of
>> the points I found conflicting with the grammar specification is the
>> PEP 3104.
>>
>> It says that a shortcut would be added to Python 3.0 so that "nonlocal
>> x = 0" can be written. However, the latest grammar specification
>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
>> doesn't seem to take that into account... So, can someone enlighten me
>> on what should be the correct treatment for that on a grammar that
>> wants to support Python 3.0?
>
> An issue was already filed about this:
> http://bugs.python.org/issue4199
> It should be ready for inclusion in 3.0.1.

No it should not. It should be put in 3.1.

I strongly object against the addition of features of *any* kind to
3.0.1, no matter whether they were promised or announced in a PEP or
in the docs or on the 8 o'clock news.  This would make 3.0.0 forever a
"loser" release.

(I find the removal of 'cmp' hard to swallow too, but in a sense the
addition of features is worse, as it makes downgrading a risk.
Upgrades, no matter how minimal, always represent risks -- however
downgrading shouldn't represent risks, unless you happen to depend on
a bugfix that wasn't present in the downgrade -- but we're not talking
about a bugfix here no matter how you bend the English language.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From paul at boddie.org.uk  Mon Dec  8 23:18:55 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Mon, 8 Dec 2008 23:18:55 +0100
Subject: [Python-Dev] "as" keyword woes
In-Reply-To: <ca471dc20812081354l270f6753r778154d8a4e9c04b@mail.gmail.com>
References: <200812072206.21908.paul@boddie.org.uk>
	<ca471dc20812081354l270f6753r778154d8a4e9c04b@mail.gmail.com>
Message-ID: <200812082318.55524.paul@boddie.org.uk>

On Monday 08 December 2008 22:54:41 Guido van Rossum wrote:
>
> From my experience with SQL, it's nearly as bad as Python in that
> every single one of the 200+ reserved words in a typical
> implementation cannot be used as a name in any context without using
> double quotes.

SQL is a big language; I won't disagree with that! That said, you don't always 
have to quote names like "end" as I mention below.

> While the double-quote escape is handy (especially 
> given there are so many obscure reserved words) this is not exactly
> what the OP wanted -- they would have to say x."as"('float'), except
> using some other notation instead of double quotes. Having to escape
> it completely kills the OP's claim that 'as' is "simplest and most
> elegant".

You can do what the OP wants, at least in PostgreSQL, which is fairly 
conformant. As I wrote on comp.lang.python...

create table "create" (
  "select" varchar
);

select "select" from "create";
select "create".select from "create";

(This from a PostgreSQL 8.2 session.)

I don't know whether SQL 1992 actually allows dropping the double-quotes for 
column names, but this is the kind of thing he has in mind.

Paul

From rhamph at gmail.com  Mon Dec  8 23:25:03 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 8 Dec 2008 15:25:03 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493D953E.10107@egenix.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<ghhem0$doj$1@ger.gmane.org>
	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<ghhnlr$6lf$1@ger.gmane.org>
	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
	<loom.20081208T204431-761@post.gmane.org>
	<493D8B34.1070506@egenix.com>
	<aac2c7cb0812081332w1032859dkbfaf168c3d6af9a7@mail.gmail.com>
	<493D953E.10107@egenix.com>
Message-ID: <aac2c7cb0812081425i5ca23599yfaf121b3301c0d88@mail.gmail.com>

On Mon, Dec 8, 2008 at 2:44 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-12-08 22:32, Adam Olsen wrote:
>> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 2008-12-08 21:45, Antoine Pitrou wrote:
>>>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>>>> Such application specific error handlers could then also apply
>>>>> whatever fancy round-trip safe encoding of non-decodable bytes
>>>>> to Unicode escapes, private code points, etc. as seen fit by the
>>>>> application.
>>>> I'd argue that such fancy round-trip safe error handler should be provided by
>>>> Python. It's not reasonable to expect application coders to come up with their
>>>> own codec variation based on subtle details of the unicode spec.
>>> Fair enough. We could add some e.g.
>>>
>>>  * a round-trip safe escape error handler that uses a Unicode private
>>>   code point area which we officially reserve for the Python
>>>   interpreter
>>
>> This would of course alter the behaviour of those private code points,
>> preventing them from round-tripping properly.
>>
>> I don't think round-tripping can be done from an error handler.  You
>> need a full codec to do it.  A simple option is 8859-1.  Or, ya know,
>> bytes.  This has long since gotten repetitive..
>
> The error handler would just map the problem bytes to the private
> area. The application would then have to decide what to do with
> them, ie. the error handler only provides one half of the round-
> tripping.

By that point it's already too late.  You've already conflated garbage
PUA with legitimate PUA.

To make it work you need to treat those legitimate PUA scalars as
errors too, transforming them.  A common example is how escaping
replaces a single '\' with '\\'.

Hrm.  nul-escaping should work.  Obviously it can't be used outside
the filesystem though, as they may introduce a legitimate nul.
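
A minimal sketch of the escaping idea, assuming Python 3.5 or later
(where the 'backslashreplace' error handler also works for decoding).
The function names escape and unescape are hypothetical and this is not
a scheme proposed in the thread; it only illustrates why the escape
character itself has to be transformed for the round trip to work.

```python
def escape(raw: bytes) -> str:
    """Decode UTF-8, writing undecodable bytes as \\xNN.

    Literal backslashes are doubled first, so an escape sequence can
    never be confused with text that was already in the input.
    """
    doubled = raw.replace(b"\\", b"\\\\")
    return doubled.decode("utf-8", errors="backslashreplace")


def unescape(text: str) -> bytes:
    """Inverse of escape(); only meant for strings escape() produced."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif text[i + 1] == "\\":
            # A doubled backslash stands for one literal backslash.
            out += b"\\"
            i += 2
        else:
            # "\xNN" stands for one raw, undecodable byte.
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
    return bytes(out)


raw_name = b"caf\xc3\xa9 \\x41 \xff\xfe.txt"
round_tripped = unescape(escape(raw_name))
```

Without the doubling step, the literal text \x41 in the input would come
back as the single byte 0x41, which is the conflation described above.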


-- 
Adam Olsen, aka Rhamphoryncus

From ironfroggy at gmail.com  Mon Dec  8 23:52:12 2008
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Mon, 8 Dec 2008 17:52:12 -0500
Subject: [Python-Dev] Nonlocal shortcut
In-Reply-To: <ca471dc20812081407t14139758p5335414cc8a4a2ba@mail.gmail.com>
References: <cfb578b20812071346o15288b7bqc4d16a1fb3847f1@mail.gmail.com>
	<e27efe130812071445o4ada427fx42bfba551aa23d4@mail.gmail.com>
	<ca471dc20812081407t14139758p5335414cc8a4a2ba@mail.gmail.com>
Message-ID: <76fd5acf0812081452h26ee56aaxbe736013ca6458b2@mail.gmail.com>

Did the original PEP discussion cover debates about the shortcut
working for all assignment operators (like += and x[i] =) and the
difference between it being one-shot (doesn't affect x for the rest of
the function) or simply the unrolling into nonlocal x; x = y as it is?
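
For reference, a minimal sketch of the unrolled two-statement form,
assuming Python 3. The names make_counter and increment are made up for
illustration. The declaration stands on its own line and applies to the
whole function body, not just to the next assignment.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count       # declaration on its own line
        count = count + 1    # followed by an ordinary assignment
        return count

    return increment


counter = make_counter()
first, second = counter(), counter()
```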

On Mon, Dec 8, 2008 at 5:07 PM, Guido van Rossum <guido at python.org> wrote:
> On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc <amauryfa at gmail.com> wrote:
>> Hello,
>>
>> Fabio Zadrozny  wrote:
>>> Hi,
>>>
>>> I'm currently implementing a parser to handle Python 3.0, and one of
>>> the points I found conflicting with the grammar specification is the
>>> PEP 3104.
>>>
>>> It says that a shortcut would be added to Python 3.0 so that "nonlocal
>>> x = 0" can be written. However, the latest grammar specification
>>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar)
>>> doesn't seem to take that into account... So, can someone enlighten me
>>> on what should be the correct treatment for that on a grammar that
>>> wants to support Python 3.0?
>>
>> An issue was already filed about this:
>> http://bugs.python.org/issue4199
>> It should be ready for inclusion in 3.0.1.
>
> No it should not. It should be put in 3.1.
>
> I strongly object against the addition of features of *any* kind to
> 3.0.1, no matter whether they were promised or announced in a PEP or
> in the docs or on the 8 o'clock news.  This would make 3.0.0 forever a
> "loser" release.
>
> (I find the removal of 'cmp' hard to swallow too, but in a sense the
> addition of features is worse, as it makes downgrading a risk.
> Upgrades, no matter how minimal, always represent risks -- however
> downgrading shouldn't represent risks, unless you happen to depend on
> a bugfix that wasn't present in the downgrade -- but we're not talking
> about a bugfix here no matter how you bend the English language.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)



-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

From steve at pearwood.info  Mon Dec  8 23:52:20 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 9 Dec 2008 09:52:20 +1100
Subject: [Python-Dev] Self in method body
In-Reply-To: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>
References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>
Message-ID: <200812090952.21310.steve@pearwood.info>

On Tue, 9 Dec 2008 08:55:21 am Filip Gruszczyński wrote:
> There is a large discussion on python-list about Guido's article
> about new self syntax, therefore I would like to use that to raise
> similar question: self in the body. Some time ago I was coding in
> Magik language
> (http://en.wikipedia.org/wiki/Magik_(programming_language), which is
> dynamically typed and similar to Smalltalk and actually to Python too
> - although the syntax is far less appalling. As you can see in the
> examples, defining methods is very similar to what Guido proposed in
> his blog, though you don't provide the name of the argument, but the
> name of the class. Then you just precede attributes with a '.', which
> is 4 letters less than self. And, well, this rocks ;-)
>
> It is really not a problem to type 4 letters (well, six with a comma
> and a space) in the signature, but it takes a lot of time to type all
> those selfs inside the function's body. 

For some definition of "a lot".

I've just grabbed a random, heavily OO module from my own code library. 
It has 60 instances of "self", or 240 characters, out of 18,839 
characters in total (including newlines). Removing self will decrease 
the number of my keystrokes and the amount of pure typing time 
(excluding thinking time, debugging time) by about 1.2%. I don't call 
that "a lot" -- it's actually quite small. And it becomes vanishingly 
trivial when you factor in that most of the time spent programming is 
not typing but thinking, testing, debugging, etc.

Doing the same calculation for BaseHTTPServer.py and SimpleHTTPServer.py 
in the standard library, I get 1.9% and 2.0% respectively.
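
A minimal sketch of how such a figure can be computed, for anyone who
wants to repeat the measurement on their own modules. The function name
self_share is hypothetical, and counting whole-word matches of "self" is
an assumption about the method, which the message does not spell out.

```python
import re


def self_share(source: str) -> float:
    """Fraction of the characters in `source` spent on the word 'self'."""
    occurrences = len(re.findall(r"\bself\b", source))
    return occurrences * len("self") / len(source)


# The figures quoted above: 60 occurrences in 18,839 characters.
quoted_share = 60 * 4 / 18839
print(f"{quoted_share:.2%}")
```

To measure a real file, read it as text and pass the contents to
self_share().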


> This could really save a lot of code, while attributes are still
> easily distinguishable.

I don't think so.



-- 
Steven

From gruszczy at gmail.com  Tue Dec  9 00:18:49 2008
From: gruszczy at gmail.com (Filip Gruszczyński)
Date: Tue, 9 Dec 2008 00:18:49 +0100
Subject: [Python-Dev] Self in method body
In-Reply-To: <200812090952.21310.steve@pearwood.info>
References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>
	<200812090952.21310.steve@pearwood.info>
Message-ID: <1be78d220812081518s7e064430v414da0a73399ac5f@mail.gmail.com>

> I've just grabbed a random, heavily OO module from my own code library.
> It has 60 instances of "self", or 240 characters, out of 18,839
> characters in total (including newlines). Removing self will decrease
> the number of my keystrokes and the amount of pure typing time
> (excluding thinking time, debugging time) by about 1.2%. I don't call
> that "a lot" -- it's actually quite small. And it becomes vanishingly
> trivial when you factor in that most of the time spent programming is
> not typing but thinking, testing, debugging, etc.

Well, maybe I don't program in Python the "right way" ;-), because
it's a bit more in my code. I repeated this test for a random
module holding some GUI stuff (built using PyQt), and it's more than 5%
(213 selfs out of 16204 characters). With a small app for creating
dungeon tiles for role playing games I astonishingly got a very
similar value (484 * 4 / 35000) ;-) Maybe it's a feature of
programming with a lot of GUI stuff, which I do. But 1 of every 20 chars
used for a self is quite a lot for me.

-- 
Filip Gruszczyński

From solipsis at pitrou.net  Tue Dec  9 00:25:20 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Dec 2008 23:25:20 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>
Message-ID: <loom.20081208T231050-480@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> No, you misunderstand what I meant. Py_buffer doesn't need to be changed
> at all. The *issuing type* would define a new structure with the
> additional fields, such as:

With the current buffer API, this is not possible. It's the caller who
allocates the Py_buffer struct (usually on the stack), not the callee. Therefore
the callee (e.g. the getbufferproc of the issuing type) cannot choose to
allocate a different structure.

(of course complex schemes can be devised where the callee maintains its own
separate storage for shape and strides, but I don't think we want to go there)

> Alexander's suggestion of going and looking at what the numpy folks have
> done in this area is probably a good idea too.

Well, I'm open to others doing this, but I won't do it myself. My interest is in
fixing the most glaring bugs of the buffer API and memoryview object. The numpy
folks are welcome to voice their opinions and give advice on python-dev.

Regards

Antoine.



From tjreedy at udel.edu  Tue Dec  9 00:58:09 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 08 Dec 2008 18:58:09 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493D85A0.6060601@egenix.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<ghhnlr$6lf$1@ger.gmane.org>	<ca471dc20812081026i7fe4b609yf5abea2f5249fe6b@mail.gmail.com>
	<493D85A0.6060601@egenix.com>
Message-ID: <ghkcah$4mv$1@ger.gmane.org>

M.-A. Lemburg wrote:

>> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:

>>> try:
>>>  files = os.listdir(somedir, errors = strict)
>>> except OSError as e:
>>>  log(<verbose error message that includes somedir and e>)
>>>  files = os.listdir(somedir)

> If that error parameter is the same as in unicode(value, errors),
> then this would be a useful feature:

Except that unicode becomes str in 3.0, that is exactly my intention.

> People could then choose among the already existing error handlers
> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register
> their own ones via the codecs module.

These could be passed through from listdir or getenv to str.
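
A minimal sketch of registering a custom error handler through the
codecs module, as mentioned in the quoted text. codecs.register_error is
the existing registry call; the handler name "hexplaceholder" and the
<xx> notation are made up for illustration.

```python
import codecs


def hex_placeholder(exc):
    """Replace each undecodable byte with a visible <xx> marker."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<%02x>" % byte for byte in bad)
    # Return the replacement text and the position to resume decoding at.
    return replacement, exc.end


codecs.register_error("hexplaceholder", hex_placeholder)

decoded = b"abc\xffdef".decode("utf-8", "hexplaceholder")
print(decoded)
```

Any call that accepts an errors argument can then be given the
registered name in the same way as 'strict' or 'replace'.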

[Side questions:
1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string. 
Should it be, or is 'xmlcharrefreplace' an addition for a later version?
2. A garbage value for errors (such as 'blah') is silently ignored (so I 
cannot test the above).  Intended or a bug?]

Someone else proposed a new option 'warn', which Guido has accepted to 
be the default instead of the current 'ignore'.  It could not be passed 
through (unless str were changed or something registered).  I believe 
the implementation of that would be to call str with 'strict' but catch 
errors and warn instead.  Whether there should be 1 warning for each 
problematic bytes encountered or 1 for each listdir (or whatever) call, 
possibly with the number of problems, I leave to others to decide.
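
A minimal sketch of the "decode strictly, catch errors, warn instead"
idea, using one warning per call that carries the number of problems.
The function decode_names and its arguments are hypothetical; this is
not the implementation that was adopted, only an illustration of the
behaviour described above.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte names strictly; skip failures and warn once per call."""
    decoded = []
    skipped = 0
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            skipped += 1
    if skipped:
        warnings.warn(
            "%d undecodable name(s) skipped" % skipped, UnicodeWarning
        )
    return decoded


# Record warnings so the effect is visible in this example.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = decode_names([b"ok.txt", b"bad\xff.txt"])
```

Warning once per problematic name instead would only require moving the
warnings.warn call into the except branch.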

> Such application specific error handlers could then also apply
> whatever fancy round-trip safe encoding of non-decodable bytes
> to Unicode escapes, private code points, etc. as seen fit by the
> application.
> 
> Perhaps we should also add an ''encoding'' parameter that can be
> set on a per directory basis (if necessary) and defaults to the
> global file system encoding.

That could also be passed through, but I will let others make the 
argument for it.
> 
> If an application hits a directory that is known to cause problems,
> it could then choose to receive the file names in a different,
> more suitable encoding. This allows implementing fallback
> mechanisms with a list of common encodings for a locale.

Terry Jan Reedy



From tjreedy at udel.edu  Tue Dec  9 01:18:26 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 08 Dec 2008 19:18:26 -0500
Subject: [Python-Dev] Self in method body
In-Reply-To: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>
References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com>
Message-ID: <ghkdgj$80b$1@ger.gmane.org>

Filip Gruszczyński wrote:
> There is a large discussion on python-list about Guido's article about

That discussion should stay there.

> new self syntax, therefore I would like to use that to raise similar
> question: self in the body.

That has also been heavily discussed, many times, there and here.

> ... Then you just precede attributes with a '.',

Guido has specifically rejected that, more than once, I believe.

> which is 4 letters less than self.

As has been said *many* times in previous discussions, you can use 1 
letter instead of 4 if you really wish, if saving keystrokes is your 
highest priority.  But please don't rehash these discussions, at least 
not here.
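
A minimal sketch of the one-letter spelling: the name of the first
parameter is only a convention, so any identifier works. The Point class
is made up for illustration.

```python
class Point:
    def __init__(s, x, y):
        # 's' is an ordinary parameter name bound to the instance.
        s.x = x
        s.y = y

    def norm2(s):
        return s.x * s.x + s.y * s.y


squared_length = Point(3, 4).norm2()
```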

Terry Jan Reedy


From amk at amk.ca  Tue Dec  9 03:53:17 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Mon, 8 Dec 2008 21:53:17 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <bbaeab100812061442j10a30baat3caeb922eb6c93e8@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
	<bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
	<4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>
	<bbaeab100812061442j10a30baat3caeb922eb6c93e8@mail.gmail.com>
Message-ID: <20081209025317.GA1080@amk.local>

On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote:
> No, I am saying I had told AMK I was interested in championing the
> session. He chose you, and that's that. One less thing for me to worry
> about. =)

Brett, I actually think you'd be a good champion for the 11AM
transition-planning session.  As a reminder, the topics we came up with
were:

Transition plan for rest of 2.x series; goals for 2.7/3.1.
- New features & future plans?
- Is 2.7 last of the 2.x releases?
- Unicode issues
- Stdlib plans?

(Possibly this is too much material for one session, and something
will have to be pruned.)

--amk

From alexander.belopolsky at gmail.com  Tue Dec  9 04:01:18 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 8 Dec 2008 22:01:18 -0500
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T231050-480@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
Message-ID: <d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>

On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
..
>> Alexander's suggestion of going and looking at what the numpy folks have
>> done in this area is probably a good idea too.
>
> Well, I'm open to others doing this, but I won't do it myself. My interest is in
> fixing the most glaring bugs of the buffer API and memoryview object. The numpy
> folks are welcome to voice their opinions and give advice on python-dev.
>

I did not follow numpy development for the last year or more, so I
won't qualify as "the numpy folks," but my understanding is that numpy
does exactly what Nick recommended: the viewed object owns shape and
strides just as it owns the data.  The viewing object increases the
reference count of the viewed object and thus assures that data, shape
and strides don't go away prematurely.
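
A minimal sketch of that ownership model as seen from Python, assuming
Python 3.3 or later for the obj attribute. It uses bytearray and
memoryview rather than numpy, and says nothing about how numpy implements
this in C; it only shows a view holding on to its exporter and the
exporter refusing to change size while the view exists.

```python
data = bytearray(b"abcdef")
view = memoryview(data)

# The view keeps a reference to the object it was taken from.
owner_is_data = view.obj is data
shape, strides = view.shape, view.strides

# While the view is alive, the exporter refuses to resize itself,
# so the memory the view points at cannot go away underneath it.
try:
    data.extend(b"gh")
    resized_while_viewed = True
except BufferError:
    resized_while_viewed = False

# Once the view is released, the exporter is free to change again.
view.release()
data.extend(b"gh")
```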

I am copying Travis, the author of the PEP 3118, hoping that he would
step in on behalf of "the numpy folks."

From brett at python.org  Tue Dec  9 04:31:56 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 8 Dec 2008 19:31:56 -0800
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <20081209025317.GA1080@amk.local>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
	<bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
	<4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>
	<bbaeab100812061442j10a30baat3caeb922eb6c93e8@mail.gmail.com>
	<20081209025317.GA1080@amk.local>
Message-ID: <bbaeab100812081931l71e903bbwb9cb818a050ca687@mail.gmail.com>

On Mon, Dec 8, 2008 at 18:53, A.M. Kuchling <amk at amk.ca> wrote:
> On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote:
>> No, I am saying I had told AMK I was interested in championing the
>> session. He chose you, and that's that. One less thing for me to worry
>> about. =)
>
> Brett, I actually think you'd be a good champion for the 11AM
> transition-planning session.

OK, so I guess I do have one more thing to worry about. =) I'd be
happy to do that session.

> As a reminder, the topics we came up with
> were:
>
> Transition plan for rest of 2.x series; goals for 2.7/3.1.
> - New features & future plans?
> - Is 2.7 last of the 2.x releases?
> - Unicode issues
> - Stdlib plans?

Probably the last two will be wishy-washy in terms of whether they
will be reached.

-Brett

From greg.ewing at canterbury.ac.nz  Tue Dec  9 06:46:14 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 09 Dec 2008 18:46:14 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T231050-480@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
Message-ID: <493E0626.3090301@canterbury.ac.nz>

Antoine Pitrou wrote:

> (of course complex schemes can be devised where the callee maintains its own
> separate storage for shape and strides, but I don't think we want to go there)

But that's exactly where you're supposed to be going.
If the object providing the buffer has variable-sized
shape and strides arrays, it has to manage the memory
for them somehow.

-- 
Greg

From v+python at g.nevcal.com  Tue Dec  9 07:20:15 2008
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 08 Dec 2008 22:20:15 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <Pine.LNX.4.64.0812081215460.1160@kimball.webabinitio.net>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>
	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>
	<Pine.LNX.4.64.0812081215460.1160@kimball.webabinitio.net>
Message-ID: <493E0E1F.4090009@g.nevcal.com>

On approximately 12/8/2008 9:30 AM, came the following characters from 
the keyboard of rdmurray at bitdance.com:

> If warnings were emitted, then files would not be silently ignored,
> yet the program could still be used.


Yep, this is sounding useful.


> PS: I'd like to see a similar warning issued when an access attempt
> is made through os.environ to a variable that cannot be decoded.


And argv ?  Seems like the warning technique could be useful for _any_ 
interface that has been traditionally bytes, because that's the kind of 
characters that were, but now should move to (Unicode) characters.

The warnings could be the same, or very similar.

The question is if one global control should handle all types of bytes 
problems, or if there should be individual controls for each bytes 
problem, or both.  I tend to believe in both; the paranoid can set 
exactly the ones they've coded for, the aggressive can set the global 
one.  In this manner, new cases can be added to the global settings over 
time, if more are discovered -- it should be documented to handle future 
similar issues in a similar manner.


-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking

From ajm at flonidan.dk  Tue Dec  9 09:41:09 2008
From: ajm at flonidan.dk (Anders J. Munch)
Date: Tue, 9 Dec 2008 09:41:09 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghkcah$4mv$1@ger.gmane.org>
Message-ID: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>

On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>> try:
>>>  files = os.listdir(somedir, errors = strict)
>>> except OSError as e:
>>>  log(<verbose error message that includes somedir and e>)
>>>  files = os.listdir(somedir)

Instead of a codecs error handler name, how about a callback for
converting bytes to str?

os.listdir(somedir, decoder=bytes.decode)
os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, errors='xmlcharrefreplace'))
os.listdir(somedir, decoder=repr)

ISTM that would be simpler and more flexible than going over the
codecs registry.  One caveat though is that there's no obvious way of
telling listdir to skip a name.  But if the default behaviour for
decoder=None is to skip with a warning, then the need to explicitly
ask for files to be skipped would be small.

Terry's example would then be:

>>> try:
>>>  files = os.listdir(somedir, decoder=bytes.decode)
>>> except UnicodeDecodeError as e:
>>>  log(<verbose error message that includes somedir and e>)
>>>  files = os.listdir(somedir)
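
A minimal sketch of the callback idea, assuming a hypothetical wrapper named listdir_decoded (os.listdir itself has no decoder parameter) and assuming UTF-8 as the default encoding. It lists the directory as bytes and applies the callback to each name; with decoder=None it skips undecodable names and warns, as suggested above:

```python
import os
import warnings


def listdir_decoded(path, decoder=None):
    """Hypothetical wrapper: list `path` as bytes, then decode each name.

    decoder=None mimics the proposed default: skip undecodable names and
    emit a warning. Any other callable is applied to every raw name, and
    whatever it raises propagates to the caller.
    """
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    if decoder is not None:
        return [decoder(name) for name in raw_names]
    names = []
    for name in raw_names:
        try:
            names.append(name.decode("utf-8"))  # assumed default encoding
        except UnicodeDecodeError:
            warnings.warn("skipping undecodable name %r" % (name,))
    return names
```

With this shape, decoder=bytes.decode gives strict behaviour and decoder=repr never fails, at the cost of returning names that cannot be passed back to the OS directly.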

- Anders

From ncoghlan at gmail.com  Tue Dec  9 10:01:17 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 19:01:17 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493E0E1F.4090009@g.nevcal.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<4939CBDB.30305@gmail.com>	<EC9F52C2-E6ED-4163-8459-B3783D099230@fuhm.net>	<20081206143454.GA15293@phd.pp.ru>	<20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com>	<ca471dc20812061113n3c62857ds865e1b43757d0368@mail.gmail.com>	<493B680C.6010605@gmail.com>	<20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com>	<493C0FE1.30506@gmail.com>	<ghhem0$doj$1@ger.gmane.org>	<ca471dc20812071333l6a588d19i3b7d535cc0dbfe53@mail.gmail.com>	<Pine.LNX.4.64.0812081215460.1160@kimball.webabinitio.net>
	<493E0E1F.4090009@g.nevcal.com>
Message-ID: <493E33DD.5010604@gmail.com>

Glenn Linderman wrote:
> On approximately 12/8/2008 9:30 AM, came the following characters from
> the keyboard of rdmurray at bitdance.com:
>> PS: I'd like to see a similar warning issued when an access attempt
>> is made through os.environ to a variable that cannot be decoded.
> 
> 
> And argv ?  Seems like the warning technique could be useful for _any_
> interface that has been traditionally bytes, because that's the kind of
> characters that were, but now should move to (Unicode) characters.
> 
> The warnings could be the same, or very similar.
> 
> The question is if one global control should handle all types of bytes
> problems, or if there should be individual controls for each bytes
> problem, or both.  I tend to believe in both; the paranoid can set
> exactly the ones they've coded for, the aggressive can set the global
> one.  In this manner, new cases can be added to the global settings over
> time, if more are discovered -- it should be documented to handle future
> similar issues in a similar manner.

The warnings system provides that level of granularity for 'free' (so
long as we set the stack level appropriately in the C-API warnings call).
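
As a rough illustration of that granularity, here is a sketch using the standard warnings filters. The three warning classes and the report() helper are hypothetical names invented for this example; only the warnings module calls are real:

```python
import warnings


class UndecodableBytesWarning(UnicodeWarning):
    """Hypothetical parent category for every bytes-decoding problem."""


class UndecodableEnvironWarning(UndecodableBytesWarning):
    """Hypothetical category for environment variables."""


class UndecodableFilenameWarning(UndecodableBytesWarning):
    """Hypothetical category for file names."""


def report(category, what):
    # stacklevel=2 attributes the warning to the caller of report().
    warnings.warn("cannot decode %s" % what, category, stacklevel=2)


def demo():
    outcomes = []
    with warnings.catch_warnings():
        # Global control: ignore the whole family of problems...
        warnings.simplefilter("ignore", UndecodableBytesWarning)
        # ...individual control: but make file name problems fatal.
        # Later filters are inserted in front, so this one wins.
        warnings.simplefilter("error", UndecodableFilenameWarning)
        report(UndecodableEnvironWarning, "environment variable")
        outcomes.append("environ ignored")
        try:
            report(UndecodableFilenameWarning, "file name")
        except UndecodableFilenameWarning:
            outcomes.append("filename raised")
    return outcomes
```

Filtering on the parent class acts as the single global switch, while filtering on a subclass targets one interface.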

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Tue Dec  9 10:07:53 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 19:07:53 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T231050-480@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>
Message-ID: <493E3569.6010408@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> No, you misunderstand what I meant. Py_buffer doesn't need to be changed
>> at all. The *issuing type* would define a new structure with the
>> additional fields, such as:
> 
> With the current buffer API, this is not possible. It's the caller who
> allocates the Py_buffer struct (usually on the stack), not the callee. Therefore
> the callee (e.g. the getbufferproc of the issuing type) cannot choose to
> allocate a different structure.
> 
> (of course complex schemes can be devised where the callee maintains its own
> separate storage for shape and strides, but I don't think we want to go there)

In that case, as Greg noted, this is exactly what the callee should be
doing. Maintaining a PyDict instance to map from view pointers to shapes
and strides info doesn't strike me as a "complex scheme" though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From mal at egenix.com  Tue Dec  9 10:22:59 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Dec 2008 10:22:59 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>
References: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>
Message-ID: <493E38F3.7020002@egenix.com>

On 2008-12-09 09:41, Anders J. Munch wrote:
> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>> try:
>>>>  files = os.listdir(somedir, errors = strict)
>>>> except OSError as e:
>>>>  log(<verbose error message that includes somedir and e>)
>>>>  files = os.listdir(somedir)
> 
> Instead of a codecs error handler name, how about a callback for
> converting bytes to str?
> 
> os.listdir(somedir, decoder=bytes.decode)
> os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, errors='xmlcharrefreplace'))
> os.listdir(somedir, decoder=repr)
> 
> ISTM that would be simpler and more flexible than going over the
> codecs registry.  One caveat though is that there's no obvious way of
> telling listdir to skip a name.  But if the default behaviour for
> decoder=None is to skip with a warning, then the need to explicitly
> ask for files to be skipped would be small.
> 
> Terry's example would then be:
> 
>>>> try:
>>>>  files = os.listdir(somedir, decoder=bytes.decode)
>>>> except UnicodeDecodeError as e:
>>>>  log(<verbose error message that includes somedir and e>)
>>>>  files = os.listdir(somedir)

Well, this is not too far away from just putting the whole decoding
logic into the application directly:

files = [filename.decode(filesystemencoding, errors='warnreplace')
         for filename in os.listdir(dir)]

(or os.listdirb() if that's where the discussion is heading)

... and that also tells us something about this discussion: we're
trying to come up with some magic to work around writing two
lines of Python code.

I'd just have all the os APIs return bytes and leave whatever
conversion to Unicode might be necessary to a higher level API.

Think of it: You really only need the Unicode values if you
ever want to output those values in text form somewhere.

In those cases, it's usually a human reading a log file or
screen output. Most other cases just care about getting
some form of file identifier in order to open the file
and don't really care about the encoding of the file name
at all.

It's probably better to have two helper functions in the os module
that take care of the conversion on demand rather than trying
to force this conversion even in cases where the application
never really needs to write the filename somewhere, e.g.
os.decodefilename() and os.encodefilename().

These should then provide some reasonable default logic, e.g.
use a 'warnreplace' error handler. Applications are then
free to use these converters or implement their own.
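
A sketch of what such a default could look like, assuming a 'warnreplace' handler that warns and substitutes U+FFFD. Neither 'warnreplace' nor decodefilename() exists in the standard library; both are hypothetical here, while codecs.register_error is the real registration hook:

```python
import codecs
import sys
import warnings


def warnreplace(exc):
    """Hypothetical 'warnreplace' handler: warn, then substitute U+FFFD."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only covers decoding
    bad = exc.object[exc.start:exc.end]
    warnings.warn("replacing undecodable bytes %r" % (bad,), UnicodeWarning)
    # Return the replacement text and the position at which to resume.
    return ("\ufffd", exc.end)


codecs.register_error("warnreplace", warnreplace)


def decodefilename(raw, encoding=None):
    """Stand-in for the proposed os.decodefilename()."""
    return raw.decode(encoding or sys.getfilesystemencoding(), "warnreplace")
```

Note that the replacement is lossy: a name decoded this way cannot be encoded back to the original bytes, so it suits display and logging rather than reopening the file.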

-- 
Marc-Andre Lemburg
eGenix.com


From nd at perlig.de  Tue Dec  9 10:42:32 2008
From: nd at perlig.de (André Malo)
Date: Tue, 9 Dec 2008 10:42:32 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493E38F3.7020002@egenix.com>
References: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>
	<493E38F3.7020002@egenix.com>
Message-ID: <200812091042.32911.nd@perlig.de>

* M.-A. Lemburg wrote: 


> On 2008-12-09 09:41, Anders J. Munch wrote:
> > On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> >>>> try:
> >>>>  files = os.listdir(somedir, errors = strict)
> >>>> except OSError as e:
> >>>>  log(<verbose error message that includes somedir and e>)
> >>>>  files = os.listdir(somedir)
> >
> > Instead of a codecs error handler name, how about a callback for
> > converting bytes to str?
> >
> > os.listdir(somedir, decoder=bytes.decode)
> > os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, errors='xmlcharrefreplace'))
> > os.listdir(somedir, decoder=repr)
> >
> > ISTM that would be simpler and more flexible than going over the
> > codecs registry.  One caveat though is that there's no obvious way of
> > telling listdir to skip a name.  But if the default behaviour for
> > decoder=None is to skip with a warning, then the need to explicitly
> > ask for files to be skipped would be small.
> >
> > Terry's example would then be:
> >>>> try:
> >>>>  files = os.listdir(somedir, decoder=bytes.decode)
> >>>> except UnicodeDecodeError as e:
> >>>>  log(<verbose error message that includes somedir and e>)
> >>>>  files = os.listdir(somedir)
>
> Well, this is not too far away from just putting the whole decoding
> logic into the application directly:
>
> files = [filename.decode(filesystemencoding, errors='warnreplace')
>          for filename in os.listdir(dir)]
>
> (or os.listdirb() if that's where the discussion is heading)
>
> ... and that also tells us something about this discussion: we're
> trying to come up with some magic to work around writing two
> lines of Python code.
>
> I'd just have all the os APIs return bytes and leave whatever
> conversion to Unicode might be necessary to a higher level API.

[...]

What I'm saying ;-)

+1.

nd

From ajm at flonidan.dk  Tue Dec  9 12:04:48 2008
From: ajm at flonidan.dk (Anders J. Munch)
Date: Tue, 9 Dec 2008 12:04:48 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493E38F3.7020002@egenix.com>
Message-ID: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net>

M.-A. Lemburg wrote:
> 
> Well, this is not too far away from just putting the whole decoding
> logic into the application directly:
> 
> files = [filename.decode(filesystemencoding, errors='warnreplace')
>          for filename in os.listdir(dir)]
> 
> (or os.listdirb() if that's where the discussion is heading)

I see what you mean, and yes, I think os.listdirb will do just as
well.  There is no need for any extra parameters to os.listdir.  The
typical application will just obliviously use os.listdir(dir) and get
the default elide-and-warn behaviour for un-decodable names.  That
rare special application that needs more control can use os.listdirb
and handle decoding itself.

Using a global registry of error handlers would just get in the way of
an application that needs more control.

- Anders

From solipsis at pitrou.net  Tue Dec  9 12:21:43 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Dec 2008 11:21:43 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
Message-ID: <loom.20081209T112013-381@post.gmane.org>

Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:
> 
> I did not follow numpy development for the last year or more, so I
> won't qualify as "the numpy folks," but my understanding is that numpy
> does exactly what Nick recommended: the viewed object owns shape and
> strides just as it owns the data.  The viewing object increases the
> reference count of the viewed object and thus assures that data, shape
> and strides don't go away prematurely.

That doesn't work if e.g. you take a slice of a memoryview object, since the
shape changes in the process.
See http://bugs.python.org/issue4580




From ncoghlan at gmail.com  Tue Dec  9 13:33:53 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 22:33:53 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T112013-381@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
Message-ID: <493E65B1.5020004@gmail.com>

Antoine Pitrou wrote:
> Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

I have zero problem whatsoever if slice assignment TO a memoryview
object is permitted only if the shape stays the same (i.e. I think that
issue should be closed as "not a bug").

The buffer protocol permits you to edit the DATA held by another object.
It doesn't let you edit the *structure* of that object (which is what
would be implied by changing the shape of the object).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Tue Dec  9 14:37:11 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 23:37:11 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T112013-381@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
Message-ID: <493E7487.3050300@gmail.com>

Antoine Pitrou wrote:
> Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

Note that the PEP is unambiguous as to who owns the pointers in the view
object:
"The exporter is responsible for making sure that any memory pointed to
by buf, format, shape, strides, and suboffsets is valid until
releasebuffer is called. If the exporter wants to be able to change an
object's shape, strides, and/or suboffsets before releasebuffer is
called then it should allocate those arrays when getbuffer is called
(pointing to them in the buffer-info structure provided) and free them
when releasebuffer is called."

The problem with memoryview appears to be related to the way it
calculates its own length (since that is the check that is failing when
the view blows up):

>>> a = array('i', range(10))
>>> m = memoryview(a)
>>> len(m) # This is the length in bytes, which is WRONG!
40
>>> m2 = memoryview(a)[2:8]
>>> len(m2) # This is correct
6
>>> a2 = array('i', range(6))
>>> m[:] = a    # But this works
>>> m2[:] = a2  # and this does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
>>> len(memoryview(a2)) # Ah, 24 != 6 is our problem!
24

Looks to me like there are a couple of bugs here:

The first is that memoryview is treating the len field in the Py_buffer
struct as the number of objects in the view in a few places instead of
as the total number of bytes being exposed (it is actually the latter,
as defined in PEP 3118).

The second is that the getbuf implementation in array.array is broken.
It is ONLY OK for shape to be null when ndim=0 (i.e. a scalar value). An
array is NOT a scalar value, so the array objects should be setting the
shape pointer to point to a single-item array (where shape[0] is the
length of the array).

memoryview can then be fixed to use shape[0] instead of len to get the
number of objects in the view.

memoryview also currently gets the shape wrong on slices:

>>> m.shape
(10,)
>>> m2.shape
(10,)
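
For comparison, a short sketch of the item count versus byte count distinction, assuming a current CPython 3 release rather than the 3.0 build shown in the session above. It uses only the array and memoryview built-ins; the variable names are illustrative:

```python
from array import array

a = array("i", range(10))
m = memoryview(a)
m2 = m[2:8]

item_count = len(m)       # number of items, i.e. shape[0]
byte_count = m.nbytes     # total bytes exposed (the Py_buffer len field)
slice_shape = m2.shape    # the slice carries its own shape

# Same-shape slice assignment through the view edits the data in place.
m2[:] = array("i", range(100, 106))
```

Here the slice reports a shape of its own, and assignment succeeds because the source and destination agree in both format and shape.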


Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Tue Dec  9 15:27:56 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 09 Dec 2008 15:27:56 +0100
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493E65B1.5020004@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
Message-ID: <1228832876.18857.11.camel@localhost>

On Tuesday 09 December 2008 at 22:33 +1000, Nick Coghlan wrote:
> I have zero problem whatsoever if slice assignment TO a memoryview
> object is permitted only if the shape stays the same (i.e. I think that
> issue should be closed as "not a bug").

I'm not even talking about slice /assignment/ here, just read-only
slicing.
Slicing a memoryview must produce another memoryview with a different
shape but with the same underlying object. That's why I have to modify
the shape field /after/ the new Py_buffer is initialized.

> The buffer protocol permits you to edit the DATA held by another
> object. It doesn't let you edit the *structure* of that object

Perhaps, but it's necessary for slicing.

> The first is that memoryview is treating the len field in the
> Py_buffer struct as the number of objects in the view in a few places
> instead of as the total number of bytes being exposed (it is actually
> the latter, as defined in PEP 3118).

I don't understand the difference between "the number of objects in the
view" and "the total number of bytes being exposed". For me it should be
the same and the "buf" and "len" fields in the Py_buffer should be
usable by any other C function, otherwise they are useless.

> memoryview also currently gets the shape wrong on slices:

I know, that's what I'm trying to fix...




From solipsis at pitrou.net  Tue Dec  9 15:56:42 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Dec 2008 14:56:42 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
Message-ID: <loom.20081209T145152-685@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:
> 
> > The first is that memoryview is treating the len field in the
> > Py_buffer struct as the number of objects in the view in a few places
> > instead of as the total number of bytes being exposed (it is actually
> > the latter, as defined in PEP 3118).
> 
> I don't understand the difference between "the number of objects in the
> view" and "the total number of bytes being exposed". For me it should be
> the same and the "buf" and "len" fields in the Py_buffer should be
> usable by any other C function, otherwise they are useless.

Sorry, I had misread your message. Yes, indeed "len" should be the number of bytes,
not the number of objects. This is also solved as part of the patch I proposed
in the aforementioned bug entry.

Regards

Antoine.



From rdmurray at bitdance.com  Tue Dec  9 17:38:18 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Tue, 9 Dec 2008 11:38:18 -0500 (EST)
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
References: <A45F9DC2-2114-4B81-9C69-37EAA3A356C9@python.org>
	<E78372F56890411285B0E90CFD1D3DDD@RaymondLaptop1>
	<79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com>
	<F015775A5EE84A38AB44F7C02D3F9C2F@RaymondLaptop1>
	<20081205023514.GA1723@amk.local>
	<20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
	<ca471dc20812042016m46f68638i6c8fd4c8ccb0643d@mail.gmail.com>
	<20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>
	<ca471dc20812051010l3bc4ca5aqfa3e6e60a0208b10@mail.gmail.com>
	<20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com>
	<ca471dc20812060954p578d55acj95aba6fc18bafc4a@mail.gmail.com>
	<20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
Message-ID: <Pine.LNX.4.64.0812091105060.1160@kimball.webabinitio.net>

On Sat, 6 Dec 2008 at 20:19, glyph at divmod.com wrote:
> On 05:54 pm, guido at python.org wrote:
>> On Fri, Dec 5, 2008 at 9:28 PM,  <glyph at divmod.com> wrote:
>> Whenever someone asks me which version to use, I always respond with
>> a question -- what do you want to use it for?
>
> In the longer term, I think that you should look at this as a symptom of a 
> problem.  If you learn Java, you learn the most recent version.  If you need 
> your software to work with an older version, you just pass a special option

Sometimes this even works.  But it isn't always easy to get it right,
and if you are mixing libraries....well, in my real-world experience we
wound up upgrading the VM.

> to the compiler.  If you want your *old* software to work with a *new* 
> version, it basically just does (at least, 99% of the time).

If you specify the source option correctly.

It seems to me that 3to2 and 2to3 are the python equivalent to the javac
'target' and 'source' options.  Like Guido said, the python community
just doesn't have the resources to make them perfect :(.

Based on a quick google, the Java community appears to be grappling
with these same issues:

     http://blog.adjective.org/post/2008/02/21/Java-Backwards-Compatability

the poster seems intent on maintaining more backward compatibility
than we have with python2/3, until you remember that java uses a
compile-and-distribute-binaries paradigm and python does not.  Once you
realize that, the differences in backward compatibility don't
seem so large...at least to me.

--RDM

From foom at fuhm.net  Tue Dec  9 18:01:10 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 9 Dec 2008 12:01:10 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net>
References: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net>
Message-ID: <A08CA2DE-F346-4F27-BE58-FC8648D5AC3D@fuhm.net>

On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote:
> The typical application will just obliviously use os.listdir(dir)  
> and get the default elide-and-warn behaviour for un-decodable names.  
> That rare special application

I guess this is a new definition of rare special application: "an  
application which deals with user-specified files".

This is the problem I see in having two parallel APIs: people keep  
saying "most applications can just go ahead and use the [broken]  
unicode string API". If there was a unicode API and a bytes API, but  
everyone was clear that "always use the bytes API" is the right thing  
to do, that'd be okay... But, since even python-dev members are saying  
that only a rare special app needs to care about working with users'  
existing files, I'm rather worried this API design will cause most  
programs written in python to be broken. Which seems a shame.

> that needs more control can use os.listdirb and handle decoding  
> itself.

James

From steve at holdenweb.com  Tue Dec  9 18:15:53 2008
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 09 Dec 2008 12:15:53 -0500
Subject: [Python-Dev] Floating-point implementations
Message-ID: <ghm940$ev4$1@ger.gmane.org>

Is anyone aware of any implementations that use other than 64-bit
floating-point? I'd be particularly interested in any that use greater
precision than the usual 56-bit mantissa. Do modern 64-bit systems
implement anything wider than the normal double?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From dickinsm at gmail.com  Tue Dec  9 18:24:44 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 9 Dec 2008 17:24:44 +0000
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <ghm940$ev4$1@ger.gmane.org>
References: <ghm940$ev4$1@ger.gmane.org>
Message-ID: <5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com>

On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <steve at holdenweb.com> wrote:
> Is anyone aware of any implementations that use other than 64-bit
> floating-point? I'd be particularly interested in any that use greater
> precision than the usual 56-bit mantissa. Do modern 64-bit systems
> implement anything wider than the normal double?

I don't know of any.  There are certainly places in the codebase that
assume 56 bits are enough.  (I seem to recall it's something like
56 bits for IBM, 53 bits for IEEE 754, 48 for Cray, and 52 or 56 for VAX.)

Many systems have a "long double" type, which usually seems to
be either 80-bit (with a 64-bit mantissa) or 128-bit.  The latter is
sometimes implemented as a pair of doubles, effectively giving
a 106-bit mantissa, and sometimes as an IEEE extended precision
type;  I don't know how many bits the mantissa would have in that
case, but surely not more than 117.

I asked a related question a while ago:

http://mail.python.org/pipermail/python-dev/2008-February/076680.html

Mark

From dickinsm at gmail.com  Tue Dec  9 18:33:14 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Tue, 9 Dec 2008 17:33:14 +0000
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <ghm940$ev4$1@ger.gmane.org>
References: <ghm940$ev4$1@ger.gmane.org>
Message-ID: <5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com>

On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <steve at holdenweb.com> wrote:
> precision than the usual 56-bit mantissa. Do modern 64-bit systems
> implement anything wider than the normal double?

I may have misinterpreted your question.  Are you asking simply
about what the hardware provides, or about what the C compiler
and library support?  Or something else entirely?

It looks like IEEE-conforming 128-bit floats would have a 113-bit
mantissa (including the implicit leading '1' bit).

Mark

From steve at holdenweb.com  Tue Dec  9 18:43:28 2008
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 09 Dec 2008 12:43:28 -0500
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com>
References: <ghm940$ev4$1@ger.gmane.org>
	<5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com>
Message-ID: <493EAE40.5060909@holdenweb.com>

Mark Dickinson wrote:
> On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden <steve at holdenweb.com> wrote:
>> precision than the usual 56-bit mantissa. Do modern 64-bit systems
>> implement anything wider than the normal double?
> 
> I may have misinterpreted your question.  Are you asking simply
> about what the hardware provides, or about what the C compiler
> and library support?  Or something else entirely?
> 
> It looks like IEEE-conforming 128-bit floats would have a 113-bit
> mantissa (including the implicit leading '1' bit).
> 
I was actually asking about Python implementations, and read your
original answer as meaning "no, there aren't any". I had assumed,
correctly or otherwise, that the C library would have to offer
well-integrated support to enable its use in Python. In fact I had
assumed it would need to be pretty much a drop-in replacement, but it
sounds as though there are some hard-coded assumptions about float size
that would not allow that.
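
One way to check what a particular Python build actually uses is sys.float_info, which reports the parameters of the underlying C double. This is only a sketch for inspection; the variable names are illustrative, and the values in the comments are what an IEEE 754 platform is expected to report:

```python
import sys

info = sys.float_info
mantissa_bits = info.mant_dig   # 53 on IEEE 754 double platforms
radix = info.radix              # 2 on essentially all current hardware

# Cross-check: epsilon is the gap between 1.0 and the next float,
# which equals radix ** (1 - mant_dig).
epsilon_matches = info.epsilon == float(radix) ** (1 - mantissa_bits)
```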

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From eckhardt at satorlaser.com  Tue Dec  9 19:31:29 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Tue, 9 Dec 2008 19:31:29 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812081306u519e736cl66e28cc210b161ef@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<loom.20081208T204431-761@post.gmane.org> 
	<aac2c7cb0812081306u519e736cl66e28cc210b161ef@mail.gmail.com>
Message-ID: <200812091931.29905.eckhardt@satorlaser.com>

On Monday 08 December 2008, Adam Olsen wrote:
> At this point someone suggests we have a type that can store an
> arbitrary mix of unicode and bytes, so the undecodable portions stay
> in their original form. :P

Well, not an arbitrary mix, but a type that just stores whatever comes from 
the system without further specifying it as either bytes or Unicode:

* If you want a string for displaying it, you first have to extract a string 
from that thing and there you optionally specify the encoding and error 
behaviour.
* If you want to append a string to it, it is automatically encoded in the 
default encoding, which obviously can fail.
* Similarly, e.g. globbing is done on the underlying representation's level, 
so "*.py" will first have to be converted according to the default encoding.
* If you just print it, you will get something that you can make out the 
decodable parts from, but it will probably be like "{Unicode:u'abcde'}" 
or "{bytes:b'ab\xf0\x0fcd'}".
* If you don't want to display it, but just want to pass it to the system, 
just use it as is.
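
As a rough illustration of the type described in the list above, here is a
minimal Python 3 sketch. The class name SystemString, its to_text() method
and the use of sys.getfilesystemencoding() as the "default encoding" are all
assumptions made for this example; nothing like it exists in the standard
library.

```python
import sys


class SystemString:
    """Hypothetical opaque wrapper around whatever bytes the system returned."""

    def __init__(self, raw, encoding=None):
        self.raw = bytes(raw)
        # Assumption: the "default encoding" is the filesystem encoding.
        self.encoding = encoding or sys.getfilesystemencoding()

    def to_text(self, encoding=None, errors="strict"):
        # Explicit extraction of a displayable string; the caller picks
        # the encoding and the error behaviour.
        return self.raw.decode(encoding or self.encoding, errors)

    def __add__(self, other):
        if isinstance(other, SystemString):
            return SystemString(self.raw + other.raw, self.encoding)
        if isinstance(other, str):
            # Implicit encoding of appended text; this can raise
            # UnicodeEncodeError, as the proposal admits.
            return SystemString(self.raw + other.encode(self.encoding),
                                self.encoding)
        return NotImplemented

    def __repr__(self):
        # Show the decoded form when possible, the raw bytes otherwise.
        try:
            return "{Unicode:%r}" % self.to_text()
        except UnicodeDecodeError:
            return "{bytes:%r}" % self.raw


name = SystemString(b"ab\xf0\x0fcd", "utf-8")
backup = name + ".backup"                    # still undecodable, still usable
shown = name.to_text(errors="replace")       # explicit, lossy conversion
```

Passing backup.raw to the system needs no conversion at all, which is the
point of the last bullet.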

Yes, this inconveniences application programmers who until now have always
assumed that they received a list of strings from os.listdir(), but that's
the way with false assumptions. In any case, they will be aware (from reading
the docs) of what the problem is and why there is no way to return text.
Further, they will get tools to convert these paths or environment variables
to text, so it will simply be a matter of replacing "os.listdir()"
with "map(to_unicode, os.listdir())".


Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
This e-mail, including all attachments, is intended solely for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published or otherwise used.
E-mails can be read by third parties and may contain viruses and unauthorized changes. Sator Laser GmbH is not responsible for these consequences.

**************************************************************************************


From lists at larsko.org  Tue Dec  9 20:26:51 2008
From: lists at larsko.org (Lars Kotthoff)
Date: Tue, 9 Dec 2008 19:26:51 +0000
Subject: [Python-Dev] Forking and pipes
Message-ID: <20081209192651.7dfbcf7b@ronin.larsko.net>

Dear list,

 I recently noticed a python program which uses forks and pipes for
communication between the processes not behaving as expected. The minimal
example program:

--------------------------------------------------------------------------------
#!/usr/bin/python

import os, sys

r, w = os.pipe()
write = os.fdopen(w, 'w')
print >> write, "foo"
pid = os.fork()
if pid:
    os.waitpid(pid, 0)
else:
    sys.exit(0)
write.close()
read = os.fdopen(r)
print read.read()
read.close()
--------------------------------------------------------------------------------

This prints out "foo" twice although it's only written once to the pipe. It
seems that python doesn't flush file descriptors before copying them to the
child process, thus resulting in the duplicate message. The equivalent C
program behaves as expected:

--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */
#include <unistd.h>

int main(void) {
    int fds[2];
    pid_t pid;
    char* buf = (char*) calloc(4, sizeof(char));

    pipe(fds);
    write(fds[1], "foo", 3);

    pid = fork();
    if(pid) {
        waitpid(pid, NULL, 0);
    } else {
        return EXIT_SUCCESS;
    }

    close(fds[1]);

    read(fds[0], buf, 3);
    printf("%s\n", buf);
    close(fds[0]);

    free(buf);
    
    return EXIT_SUCCESS;
}
--------------------------------------------------------------------------------

Is this behaviour intentional? I've tested both python and C on Linux, OpenBSD
and Solaris (python versions 2.5.2 and 2.3.3), the behaviour was the same
everywhere.

Thanks,

Lars

From rhamph at gmail.com  Tue Dec  9 20:22:35 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 9 Dec 2008 12:22:35 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812091931.29905.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<loom.20081208T204431-761@post.gmane.org>
	<aac2c7cb0812081306u519e736cl66e28cc210b161ef@mail.gmail.com>
	<200812091931.29905.eckhardt@satorlaser.com>
Message-ID: <aac2c7cb0812091122o116d189aq5032a3e94c96ee87@mail.gmail.com>

On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt
<eckhardt at satorlaser.com> wrote:
> On Monday 08 December 2008, Adam Olsen wrote:
>> At this point someone suggests we have a type that can store an
>> arbitrary mix of unicode and bytes, so the undecodable portions stay
>> in their original form. :P
>
> Well, not an arbitrary mix, but a type that just stores whatever comes from
> the system without further specifying it as either bytes or Unicode:
>
> * If you want a string for displaying it, you first have to extract a string
> from that thing and there you optionally specify the encoding and error
> behaviour.
> * If you want to append a string to it, it is automatically encoded in the
> default encoding, which obviously can fail.

So the 2.x str, but with a more interesting default encoding than
ASCII.  It'll work fine on the developer's system, but one day a user
will present it with strange input, and boom.

You have to be pessimistic here.  The default operations should either
always work or never work.  Using unicode internally and skipping
garbage input means the operations always work.  Using a bytes API
means mixing with unicode never works, unless the programmer
explicitly converts, in which case the onus is on them to use proper
error handling.

The only thing separating this from a bikeshed discussion is that a
bikeshed has many equally good solutions, while we have no good
solutions.  Instead we're trying to find the least-bad one.  The
unicode/bytes separation is pretty close to that.  Adding a warning
gets even closer.  Adding magic makes it worse.


-- 
Adam Olsen, aka Rhamphoryncus

From foom at fuhm.net  Tue Dec  9 20:40:11 2008
From: foom at fuhm.net (James Y Knight)
Date: Tue, 9 Dec 2008 14:40:11 -0500
Subject: [Python-Dev] Forking and pipes
In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net>
References: <20081209192651.7dfbcf7b@ronin.larsko.net>
Message-ID: <3E4D576A-5E49-4FE0-9AF2-34FFFC3B1594@fuhm.net>


On Dec 9, 2008, at 2:26 PM, Lars Kotthoff wrote:

> Dear list,
>
> I recently noticed a python program which uses forks and pipes for
> communication between the processes not behaving as expected. The  
> minimal
> example program:
>
> [snip]

> This prints out "foo" twice although it's only written once to the  
> pipe. It
> seems that python doesn't flush file descriptors before copying them  
> to the
> child process, thus resulting in the duplicate message. The  
> equivalent C
> program behaves as expected,
>
> [snip]
>
> Is this behaviour intentional? I've tested both python and C on  
> Linux, OpenBSD
> and Solaris (python versions 2.5.2 and 2.3.3), the behaviour was the  
> same
> everywhere.


Yes, it's intentional. And, no, your programs aren't equivalent.

Rewrite your C program to use fdopen, and fread/fwrite. *Then* it will  
be equivalent and have the same behavior as the python program.

Alternatively, you can change your python program to use os.read/ 
os.write instead of fdopen and fileobject.read/fileobject.write, if  
you want your python program to work like the C program.
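
A minimal sketch of that second alternative, written for Python 3 on a
POSIX system (an assumption; the original program targets Python 2). The
function name unbuffered_roundtrip is made up for the example. Because
os.write() goes straight to the file descriptor, nothing is left sitting in
a userspace buffer for fork() to duplicate.

```python
import os


def unbuffered_roundtrip():
    r, w = os.pipe()
    os.write(w, b"foo")          # written to the pipe immediately
    pid = os.fork()
    if pid == 0:
        os._exit(0)              # child: nothing buffered, nothing to flush
    os.waitpid(pid, 0)
    os.close(w)                  # close the write end so the read sees EOF
    data = os.read(r, 1024)
    os.close(r)
    return data


result = unbuffered_roundtrip()
```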

James

From shigin at rambler-co.ru  Tue Dec  9 20:35:16 2008
From: shigin at rambler-co.ru (Alexander Shigin)
Date: Tue, 09 Dec 2008 22:35:16 +0300
Subject: [Python-Dev] Forking and pipes
In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net>
References: <20081209192651.7dfbcf7b@ronin.larsko.net>
Message-ID: <1228851316.24594.5.camel@jenner>

On Tue, 09/12/2008 at 19:26 +0000, Lars Kotthoff wrote:
> Dear list,
> 
>  I recently noticed a python program which uses forks and pipes for
> communication between the processes not behaving as expected. The minimal
> example program:

If you write 
====
r, w = os.pipe()
os.write(w, 'foo')
pid = os.fork()
====

You'll get the same result as the C program. Or, if you use fdopen in the C
program, you'll get the same result as the Python one.

The problem with the example is libc buffering. If you call write.flush()
before forking, the buffer will be empty when it is copied into the child
process, and you'll see only one 'foo'.


From a.badger at gmail.com  Tue Dec  9 21:25:01 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 09 Dec 2008 12:25:01 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <A08CA2DE-F346-4F27-BE58-FC8648D5AC3D@fuhm.net>
References: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net>
	<A08CA2DE-F346-4F27-BE58-FC8648D5AC3D@fuhm.net>
Message-ID: <493ED41D.3020900@gmail.com>

James Y Knight wrote:
> On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote:
>> The typical application will just obliviously use os.listdir(dir) and
>> get the default elide-and-warn behaviour for un-decodable names. That
>> rare special application
> 
> I guess this is a new definition of rare special application: "an
> application which deals with user-specified files".
> 
> This is the problem I see in having two parallel APIs: people keep
> saying "most applications can just go ahead and use the [broken] unicode
> string API". If there was a unicode API and a bytes API, but everyone
> was clear that "always use the bytes API" is the right thing to do,
> that'd be okay... But, since even python-dev members are saying that
> only a rare special app needs to care about working with users' existing
> files, I'm rather worried this API design will cause most programs
> written in python to be broken. Which seems a shame.
> 
I agree with you, which was part of why I raised this subject, but I also
think that using the warnings module to issue a warning and ignore the
entire problematic entry is a reasonable compromise.  Hopefully it will
become obvious to people that it's a python3 wart at some point in the
future and we'll re-examine the default.  But until then, having a
printed warning that individual apps can turn into an exception seems
like it is less broken than the other alternatives the "rare special
application" people can live with :-)
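
To make the compromise concrete, here is a small sketch of the
"warn and skip" behaviour using the warnings module. The helper name
decode_names and the choice of UnicodeWarning as the category are
assumptions for illustration, not what os.listdir() actually does.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte names, warning about and dropping undecodable ones."""
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn("skipping undecodable name %r" % (raw,),
                          UnicodeWarning)
    return names


# Default policy: the bad entry is dropped and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = decode_names([b"ok.py", b"bad\xff"])

# An application that cares can turn the warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", UnicodeWarning)
    try:
        decode_names([b"bad\xff"])
        escalated = False
    except UnicodeWarning:
        escalated = True
```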

-Toshio

From ncoghlan at gmail.com  Tue Dec  9 22:22:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Dec 2008 07:22:47 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <1228832876.18857.11.camel@localhost>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
Message-ID: <493EE1A7.6050405@gmail.com>

Antoine Pitrou wrote:
> On Tuesday 09 December 2008 at 22:33 +1000, Nick Coghlan wrote:
>> memoryview also currently gets the shape wrong on slices:
> 
> I know, that's what I'm trying to fix...

Yes, I was slightly misled by your use of slice assignment to
demonstrate the problem. It also turns out that while assignment to
memoryviews has issues, and so does slicing, there is a fundamental
problem with the length calculation when a memoryview is first created
which is further confusing matters.

For the slicing problem in particular, memoryview is currently trying to
get away with only one Py_buffer object when it needs TWO.

The first Py_buffer object needs to describe the view the memoryview has
of the target object (i.e. it describes the entire data area of the
target). The shape/strides/etc pointers in that struct are owned by the
target object. The existing self->view tends to fill this role fairly well.

The *second* (currently nonexistent) Py_buffer object needs to describe
the memory layout that the memoryview exposes to the rest of the world.
The pointers in *this* struct will be owned by the memoryview object and
accurately reflect any changes in shape due to slicing operations.

Currently, memoryview is trying to make the first Py_buffer also fill
the role of the second one, and that obviously isn't going to work for
subviews.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From greg.ewing at canterbury.ac.nz  Tue Dec  9 23:31:48 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 11:31:48 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493E3569.6010408@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<493E3569.6010408@gmail.com>
Message-ID: <493EF1D4.5090803@canterbury.ac.nz>

Nick Coghlan wrote:
> Maintaining a PyDict instance to map from view pointers to shapes
> and strides info doesn't strike me as a "complex scheme" though.

I don't see why a given buffer provider should ever need
more than one set of shape/strides arrays at a time. It
can allocate them on creation, reallocate them as needed
if the shape of its internal data changes, and deallocate
them when it goes away.

If you are creating view objects that present slices or
some other alternative perspective, then the view object
itself is a buffer provider and should maintain shape/stride
arrays for its particular view of the underlying object.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Dec  9 23:45:03 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 11:45:03 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T112013-381@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
Message-ID: <493EF4EF.6080600@canterbury.ac.nz>

Antoine Pitrou wrote:

> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580

I haven't looked in detail at how memoryview is currently
implemented, but it seems to me that the way it should work
is that whenever you access a slice, it obtains a fresh
Py_buffer from the underlying object, and does the right
thing based on the shape/strides from that together with
the slice ranges.

The only time it should need to allocate its own shape/strides
is if you request a Py_buffer from the memoryview itself,
at which time it should obtain a Py_buffer from the underlying
object, update its own shape/strides and pass them to the
caller. The underlying Py_buffer lock should be held until
the caller releases the memoryview's Py_buffer, ensuring
that its shape/strides remain valid for as long as they're
needed.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Tue Dec  9 23:54:08 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 11:54:08 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493E7487.3050300@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E7487.3050300@gmail.com>
Message-ID: <493EF710.3060509@canterbury.ac.nz>

Nick Coghlan wrote:

> [from the PEP] "If the exporter wants to be able to change an
> object's shape, strides, and/or suboffsets before releasebuffer is
> called then it should allocate those arrays when getbuffer is called
> (pointing to them in the buffer-info structure provided) and free them
> when releasebuffer is called."

Even allowing this seems rather dubious to me. I suppose
there's no serious danger as long as the block of memory
ultimately holding the data doesn't move or change size,
but changing the shape could confuse a buffer user that's
iterating over the data.

-- 
Greg

From solipsis at pitrou.net  Wed Dec 10 00:15:58 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Dec 2008 23:15:58 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
	<493EE1A7.6050405@gmail.com>
Message-ID: <loom.20081209T230035-355@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> For the slicing problem in particular, memoryview is currently trying to
> get away with only one Py_buffer object when it needs TWO.

Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
needs of the memoryview object? If the memoryview can't be a relatively thin
object-oriented wrapper around a Py_buffer, then this all screams failure to me.

----

In all honesty, I admit I am annoyed by all the problems with the buffer API /
memoryview object, many of which are caused by its utterly bizarre design (and
the fact that the design team went missing in action after imposing such a
bizarre and complex design on us), and I'm reluctant to add yet another level of
byzantine complexity in order to solve those problems. That explains why I may
sound a bit angry at times :-)

If we really need to change things a lot to make them work, we should re-work
the buffer API from the ground up, make the Py_buffer struct a true PyObject
(that is, a true variable-length object so as to solve the shape and strides
allocation issue) and merge it with the current memoryview implementation. It
would make things both simpler and more flexible.

But of course it would destroy C-level compatibility with 2.6 / 3.0.

Regards

Antoine.



From greg.ewing at canterbury.ac.nz  Wed Dec 10 00:55:43 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 12:55:43 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T230035-355@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
Message-ID: <493F057F.4070806@canterbury.ac.nz>

Antoine Pitrou wrote:

> Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
> needs of the memoryview object? 

Two things here:

   1) The memoryview should *not* be holding onto a Py_buffer
      in between calls to its getitem and setitem methods. It
      should request one from the underlying object when needed
      and release it again as soon as possible.

   2) The "second" Py_buffer referred to above only needs to
      be materialized when someone makes a GetBuffer request on
      the memoryview itself. It's not needed for Python getitem
      and setitem calls. (The implementation might choose to
      implement these by creating a temporary Py_buffer, but
      again, it would only last as long as the call.)

> If the memoryview can't be a relatively thin
> object-oriented wrapper around a Py_buffer, then this all screams failure to me.

It shouldn't be a wrapper around a Py_buffer, it should be a
wrapper around the buffer *interface* of the underlying object.

> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design

It sounds to me like whoever wrote the memoryview implementation
didn't understand how the buffer interface is meant to be used.
That doesn't mean there's anything wrong with the buffer interface.

I have some doubts myself about whether it needs to be as
complicated as it is, but I think the basic idea is sound:
that Py_buffer objects are ephemeral, to be obtained when
needed and not kept for any longer than necessary.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Dec 10 01:00:40 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 13:00:40 +1300
Subject: [Python-Dev] Forking and pipes
In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net>
References: <20081209192651.7dfbcf7b@ronin.larsko.net>
Message-ID: <493F06A8.9010100@canterbury.ac.nz>

Lars Kotthoff wrote:

> This prints out "foo" twice although it's only written once to the pipe. It
> seems that python doesn't flush file descriptors before copying them to the
> child process, thus resulting in the duplicate message. The equivalent C
> program behaves as expected,

Your Python and C programs are not equivalent -- the C one is
writing directly to the file descriptor, whereas the Python one
is effectively using a buffered stdio stream. The unflushed stdio
buffer is getting copied by the fork, hence the duplicate output.

Solution: either (a) flush the Python file object before forking
or (b) use os.write() directly on the fd to avoid the buffering.
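
Solution (a) can be sketched as follows for Python 3 on a POSIX system
(an assumption; the original program is Python 2). The function name
roundtrip and its flush_before_fork flag are invented for the example. The
child closes its inherited copy of the file object explicitly, which is
what flushes the duplicated buffer when the parent did not flush first.

```python
import os


def roundtrip(flush_before_fork):
    r, w = os.pipe()
    writer = os.fdopen(w, "w")
    writer.write("foo\n")        # sits in the userspace buffer for now
    if flush_before_fork:
        writer.flush()           # buffer is empty when fork() copies it
    pid = os.fork()
    if pid == 0:
        writer.close()           # child flushes whatever it inherited
        os._exit(0)
    os.waitpid(pid, 0)
    writer.close()               # parent flushes its own copy
    with os.fdopen(r) as reader:
        return reader.read()


with_flush = roundtrip(True)
without_flush = roundtrip(False)
```

Without the flush, both processes write their copy of the buffer to the
pipe, so the parent reads the line twice.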

-- 
Greg

From solipsis at pitrou.net  Wed Dec 10 01:21:54 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 10 Dec 2008 00:21:54 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
	<493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
	<493F057F.4070806@canterbury.ac.nz>
Message-ID: <loom.20081210T002139-470@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
>    1) The memoryview should *not* be holding onto a Py_buffer
>       in between calls to its getitem and setitem methods. It
>       should request one from the underlying object when needed
>       and release it again as soon as possible.

If the memoryview wasn't holding onto a Py_buffer, one couldn't rely on its
length or anything else because the underlying object could be mutated at any
moment (even by another thread). It would make memoryview objects basically
unusable for anything except bytes objects (which are immutable).
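
A short sketch of what holding onto the buffer buys, as observed from Python
code. This assumes a later CPython 3 release (memoryview.release() appeared
in 3.2), not the 3.0 implementation under discussion; the function name is
made up for the example.

```python
def resize_while_exported():
    data = bytearray(b"abcdefgh")
    view = memoryview(data)      # the view holds a buffer on `data`
    try:
        data.append(0)           # resizing is refused while it is held
        blocked = False
    except BufferError:
        blocked = True
    length_seen = len(view)      # so the view's length stays reliable
    view.release()               # give the buffer back
    data.append(0)               # now the resize goes through
    return blocked, length_seen, len(data)


blocked, length_seen, final_length = resize_while_exported()
```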

>    2) The "second" Py_buffer referred to above only needs to
>       be materialized when someone makes a GetBuffer request on
>       the memoryview itself.

It's already what is being done, but that's got nothing to do with the problem
at hand. We are talking about slicing the memoryview, not taking a (non-sliced)
buffer of it.

>       It's not needed for Python getitem
>       and setitem calls.

What is needed for Python getitem and setitem calls is proper shape information
in the embedded Py_buffer struct, otherwise memoryview slices are buggy. In the
case of a memoryview slice, the proper shape information can only be computed
*after* the Py_buffer is obtained.

> It sounds to me like whoever wrote the memoryview implementation
> didn't understand how the buffer interface is meant to be used.

Perhaps, perhaps not, but without any concrete suggestion we won't go anywhere.

As I said, I don't think it would be foolish to revamp the current spec and/or
implementation /if we have a precise plan of how to do better/. The /if/ part is
important :-)

Regards

Antoine.



From martin at v.loewis.de  Wed Dec 10 07:31:29 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 10 Dec 2008 07:31:29 +0100
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <ghm940$ev4$1@ger.gmane.org>
References: <ghm940$ev4$1@ger.gmane.org>
Message-ID: <493F6241.3050500@v.loewis.de>

> Is anyone aware of any implementations that use other than 64-bit
> floating-point?

As I understand it, you are asking about Python implementations:
sure, the gmpy package supports arbitrary-precision floating point.

> I'd be particularly interested in any that use greater
> precision than the usual 56-bit mantissa. 

Nit-pickingly: it's usual that the mantissa is 53-bit.

> Do modern 64-bit systems implement anything wider
> than the normal double?

As Mark said: sure. x86 systems have supported 80-bit
"extended" precision for ages. Some architectures have
architecture support for 128-bit floats (e.g. Itanium, SPARC v9);
it's not clear to me whether they actually implement the
long double operations in hardware, or whether they trap
and get software-emulated.
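
For what it's worth, the mantissa width of the platform double can be
checked from Python itself. The sketch below assumes an interpreter that
provides sys.float_info (2.6 and later); the helper name loses_precision is
invented for the example.

```python
import sys

# Number of mantissa bits in a Python float, including the implicit
# leading bit on IEEE 754 hardware.
mant_bits = sys.float_info.mant_dig


def loses_precision(bits):
    """True if 2**bits + 1 is not representable as a float."""
    n = 2 ** bits
    return float(n + 1) == float(n)


at_limit = loses_precision(mant_bits)        # one bit too many
below_limit = loses_precision(mant_bits - 1) # still fits exactly
```

On an IEEE 754 double, mant_bits is 53, so 2**53 + 1 is the first integer
that a float cannot hold exactly.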

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Wed Dec 10 11:21:40 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 23:21:40 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081210T002139-470@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
	<493F057F.4070806@canterbury.ac.nz>
	<loom.20081210T002139-470@post.gmane.org>
Message-ID: <493F9834.8030100@canterbury.ac.nz>

Antoine Pitrou wrote:

> If the memoryview wasn't holding onto a Py_buffer, one couldn't rely on its
> length or anything else because the underlying object could be mutated at any
> moment

Hmm, it seems there are two different approaches that could
be taken here to the design of a memoryview object.

You seem to be thinking of an "eager" approach where the
memoryview keeps the underlying object's memory locked for
as long as it exists, thus preventing it from being
resized.

Whereas I've been thinking of it as being "lazy", in
the sense that the memoryview simply remembers the slice
parameters it was given, and waits until you access it
before making any GetBuffer calls.

The lazy version would have the characteristic that
creating a slice could succeed even though accessing it
later fails due to a range error. I'm not sure that's
necessarily a fatally bad thing.
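
The lazy variant can be mimicked in pure Python to show the behaviour being
described. LazyView is a hypothetical class written for this example, and
the sketch assumes a later CPython 3 where memoryview supports the "with"
statement (3.2 onwards).

```python
class LazyView:
    """Remembers slice bounds; touches the target only on access."""

    def __init__(self, target, start=0, stop=None):
        self.target = target
        self.start = start
        self.stop = stop

    def __getitem__(self, index):
        # Acquire the buffer, apply the remembered slice, read one item,
        # then release everything again before returning.
        with memoryview(self.target) as buf:
            with buf[self.start:self.stop] as window:
                return window[index]


data = bytearray(b"abcdefgh")
lazy = LazyView(data, 2, 6)
first = lazy[0]                  # reads data[2]

del data[3:]                     # allowed: no buffer is held between calls
still_there = lazy[0]            # data[2] still exists
try:
    lazy[1]                      # the slice now has only one element
    range_error = False
except IndexError:
    range_error = True
```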

I'm also not sure that the eager version would be totally
immune to such things. The PEP seems to permit the shape
to change while the buffer is locked as long as the overall
size and location of the memory doesn't change, so a
subsequent access to a formerly-valid slice could still
fail.

In any case, I think it should be possible to implement
either version without the memoryview having to own
more than one Py_buffer and one set of shape/strides
at a time. Slicing the memoryview creates another
memoryview with its own Py_buffer and shape/strides.
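
From the Python side, that last point looks like this. The sketch assumes a
later CPython 3 release in which slicing reports the right shape, so it
shows the intended outcome rather than the 3.0 behaviour from issue 4580;
the function name is made up for the example.

```python
def slice_shapes():
    data = bytearray(b"abcdefgh")
    view = memoryview(data)
    sub = view[2:6]              # a new memoryview with its own shape
    shapes = (view.shape, sub.shape)
    sub[0] = ord("X")            # writes through to the underlying object
    contents = bytes(data)
    sub.release()
    view.release()
    return shapes, contents


shapes, contents = slice_shapes()
```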

-- 
Greg

From eckhardt at satorlaser.com  Wed Dec 10 11:39:37 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Wed, 10 Dec 2008 11:39:37 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812091122o116d189aq5032a3e94c96ee87@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812091931.29905.eckhardt@satorlaser.com> 
	<aac2c7cb0812091122o116d189aq5032a3e94c96ee87@mail.gmail.com>
Message-ID: <200812101139.37301.eckhardt@satorlaser.com>

On Tuesday 09 December 2008, Adam Olsen wrote:
> On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt
>
> <eckhardt at satorlaser.com> wrote:
> > On Monday 08 December 2008, Adam Olsen wrote:
> >> At this point someone suggests we have a type that can store an
> >> arbitrary mix of unicode and bytes, so the undecodable portions stay
> >> in their original form. :P
> >
> > Well, not an arbitrary mix, but a type that just stores whatever comes
> > from the system without further specifying it as either bytes or Unicode:
> >
> > * If you want a string for displaying it, you first have to extract a
> > string from that thing and there you optionally specify the encoding and
> > error behaviour.
> > * If you want to append a string to it, it is automatically encoded in
> > the default encoding, which obviously can fail.
>
> So the 2.x str, but with a more interesting default encoding than
> ASCII.  It'll work fine on the developer's system, but one day a user
> will present it with strange input, and boom.

If the system's representation of filenames can not represent a Unicode 
codepoint that the user entered, trying to open such a file must fail. If it 
can be represented, for convenience I would allow an implicit conversion.

  for i in readdir():
      copy( i, i+".backup")
      ...

> You have to be pessimistic here.  The default operations should either
> always work or never work.  Using unicode internally and skipping
> garbage input means the operations always work.  Using a bytes API
> means mixing with unicode never works, unless the programmer
> explicitly converts, in which case the onus is on them to use proper
> error handling.

So, if I understand you correctly, you would prefer an explicit conversion to 
the system's representation:

  for i in readdir():
      copy( i, i+path(".backup"))
      ...

> The only thing separating this from a bikeshed discussion is that a
> bikeshed has many equally good solutions, while we have no good
> solutions.  Instead we're trying to find the least-bad one.  The
> unicode/bytes separation is pretty close to that.  Adding a warning
> gets even closer.  Adding magic makes it worse.

Well, I see two cases:
1. Converting from an uncertain representation to a known one.
2. Converting from a known representation to a known one.

The uncertain one is the one used by the filesystem or environment. The known 
representations are the expected(!) encoding for filesystem and environment 
and the internal text in Unicode. For case 1, I would require an explicit 
conversion to make the programmer really aware of the fact that it can fail. 
For the second case, I would allow an implicit conversion even though it can 
fail. Anyhow, that is a matter of taste, and I can actually live with your 
point of view.

However, one question still remains: What about the approach in general, i.e. 
that these texts with an uncertain representation are handled as a separate 
type? I find this much more appealing than duplicating APIs like readdir() 
using either overloading on the arguments or a separate readdirb().

Uli

-- 
Sator Laser GmbH

From dickinsm at gmail.com  Wed Dec 10 11:42:10 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Wed, 10 Dec 2008 10:42:10 +0000
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com>
References: <ghm940$ev4$1@ger.gmane.org>
	<5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com>
Message-ID: <5c6f2a5d0812100242m11042672q17d1a52027c54f68@mail.gmail.com>

On Tue, Dec 9, 2008 at 5:24 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
> I don't know of any.  There are certainly places in the codebase that
> assume 56 bits are enough.  (I seem to recall it's something like
> 56 bits for IBM, 53 bits for IEEE 754, 48 for Cray, and 52 or 56 for VAX.)

Quick correction, after actually bothering to look things up rather
than relying on my poor memory:  VAX doubles have either *53*
(not 52) or 56 bit mantissas.  More precisely, the VAX G_floating
format has a 53-bit mantissa (52 bits stored directly, one implicit
'hidden' bit), while the (now rare) D_floating format has a 56-bit
mantissa (again, including the implicit 'hidden' bit).
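
For comparison, a small sketch of how to ask a running interpreter for the same figure, assuming a current Python with sys.float_info. On IEEE 754 platforms this typically reports 53; the variable names are just for illustration:

```python
import sys

# Number of significand ("mantissa") bits in this interpreter's float,
# counting the implicit leading bit where the format has one.
mantissa_bits = sys.float_info.mant_dig

# With p significand bits, every integer up to 2**p is exactly
# representable, but 2**p + 1 is not: it rounds back to 2**p.
limit = 2 ** mantissa_bits
exact = float(limit - 1) == limit - 1
collapses = float(limit + 1) == float(limit)

print(mantissa_bits, exact, collapses)
```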

Mark

From regebro at gmail.com  Wed Dec 10 11:55:42 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 10 Dec 2008 11:55:42 +0100
Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time"
In-Reply-To: <ac2200130811160843y2292fad4he6cf78bbb696d4d@mail.gmail.com>
References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com>
	<7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com>
	<ac2200130811160843y2292fad4he6cf78bbb696d4d@mail.gmail.com>
Message-ID: <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>

A funny thing just happened to me. I tried out causing this error,
just to see how the error message was somehow different, by creating a
time.py in /tmp, and running python from there. Then I removed the
time.py, and went on working.

Two days later, my use of zc.buildout broke with a "module time
has no attribute time" error. Huh?

Turns out, I created an empty time.py in /tmp, just to see the error
message. But buildout will, when creating eggs from checked-out modules,
copy them to a directory under /tmp, and evidently run python from
/tmp to create the eggs. So that process finds the time.pyc, created
from the empty time.py, which I hadn't deleted, and breaks!

Heh. That was funny. Moral of the story: Don't create python modules
with names that clash with built-in modules in /tmp, even for testing.
Or at least, if you do, remember to remove the pyc. :-P Or, reboot
your Linux every night.  Or well. I guess this could have been avoided
in many ways. ;-)
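
One way to check a directory for this kind of stray file, sketched with the import machinery of current Python 3 (which differs from the 2.x import system in use here). The helper name shadowing_origin is made up for this example, and it only reports what is lying around in a directory; whether the interpreter would actually prefer that file depends on the Python version and build:

```python
import importlib
import importlib.machinery
import os
import tempfile


def shadowing_origin(module_name, directory):
    """Return the path in `directory` that `import module_name` would pick
    up if that directory were searched, or None when nothing matches."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, [directory])
    return spec.origin if spec is not None else None


with tempfile.TemporaryDirectory() as scratch:
    # Recreate the accident: an empty time.py left in a scratch directory.
    open(os.path.join(scratch, "time.py"), "w").close()
    importlib.invalidate_caches()  # make sure the new file is noticed
    found = shadowing_origin("time", scratch)
    missing = shadowing_origin("no_such_module_for_demo", scratch)

print(found)    # path to the stray time.py
print(missing)  # None
```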

On Sun, Nov 16, 2008 at 17:43, Guilherme Polo <ggpolo at gmail.com> wrote:
> On Sun, Nov 16, 2008 at 11:55 AM, Tal Einat <taleinat at gmail.com> wrote:
>> Steve Holden wrote:
>>> Tal Einat wrote:
>>>> Is this desired behavior?
>>>>
>>>> At the very least the exception should be more detailed, perhaps to
>>>> the point of suggesting the probable cause of the error (i.e.
>>>> overriding the time module).
>>>>
>>> How is this different from any other case where you import a module with
>>> a standard library name conflict, thereby confusing modules loaded later
>>> from the standard library? Should we do the same for any error induced in such a way?
>>
>> The difference is that here the exception is generated directly in the
>> C code so you don't get an intelligible traceback. The C code for
>> datetime imports the time module via the Python C API.
>>
>> In other words, here a function from a module in the stdlib, datetime,
>> barfs unexpectedly because I happen to have a file named time.py
>> hanging around in some directory. There is no traceback and no
>> intelligible exception message, just "AttributeError: time". I had to
>> dig through datetime's C code to figure out which module was being
>> imported via the Python C API, which turned out to be time.
>
> Just like Steve told you, this isn't different from other cases. But,
> at least you get a message a bit more verbose in most cases, like:
>
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> AttributeError: 'module' object has no attribute 'time'
>
> Then I went to look why this wasn't happening with datetime too, and I
> found out that PyObject_CallMethod in abstract.c resets the exception
> message that would have been set by PyObject_GetAttr by now. Maybe
> someone can tell me why it is doing that; for now a patch is attached
> here (I couldn't resist removing two trailing whitespaces).
>
>>
>>  This is rare enough that I've never had something like this happen to
>> me in seven years of heavy Python programming.
>>
>> - Tal



-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From victor.stinner at haypocalc.com  Wed Dec 10 12:06:49 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 10 Dec 2008 12:06:49 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
Message-ID: <200812101206.49316.victor.stinner@haypocalc.com>

Hi,

I published a new version of my fault handler: it installs a handler for 
signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and 
continue the execution of your Python program. Example:

   try:
      call_evil_code()
   except MemoryError:
      print "A segfault? Haha, I don't care!"
   print "continue the execution"

(yes, it's possible to continue the execution after a segmentation fault!)

Handled errors:

 - Segmentation fault:
   * invalid memory read
   * invalid memory write
   * stack overflow (stack pointer outside the stack memory)

 - SIGFPE
   * division by zero
   * floating point error?

Such errors may come from external libraries (written in C)... or Python 
built-in libraries (e.g. imageop). The handler is currently only used in 
PyEval_EvalFrameEx(), but it could be used anywhere.

The patch uses sigsetjmp() in PyEval_EvalFrameEx() to set a "check point", and 
siglongjmp() in the signal handler to go back to the check point. It also 
uses a separate stack for the signal handler, because on stack overflow you 
cannot use the stack (e.g. unable to call any function!). With MAXDEPTH=100, 
the memory footprint is ~20 KB. If you call PyEval_EvalFrameEx() more than 
MAXDEPTH times, the handler will go back to frame #MAXDEPTH on error (you 
lose the last entries in the Python traceback).

sigsetjmp()/siglongjmp() should be available on many OSes. I only know that it 
works perfectly on Linux. sigaltstack() is needed to recover after a stack 
overflow, but other errors can be caught without it.

I haven't run any benchmark yet, but it would be interesting ;-) Changing the 
MAXDEPTH constant may change the speed with many recursive calls (e.g. 
MAXDEPTH=1 only sets a check point for the first call to PyEval_EvalFrameEx()).

I would appreciate a review, especially for the patch in Python/ceval.c.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ncoghlan at gmail.com  Wed Dec 10 12:49:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Dec 2008 21:49:47 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T230035-355@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>	<493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost>	<493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
Message-ID: <493FACDB.1030607@gmail.com>

Antoine Pitrou wrote:
> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design (and
> the fact that the design team went missing in action after imposing such a
> bizarre and complex design on us), and I'm reluctant to add yet another level of
> byzantine complexity in order to solve those problems. It explains I may sound a
> bit angry at times :-)
> 
> If we really need to change things a lot to make them work, we should re-work
> the buffer API from the ground up, make the Py_buffer struct a true PyObject
> (that is, a true variable-length object so as to solve the shape and strides
> allocation issue) and merge it with the current memoryview implementation. It
> would make things both simpler and more flexible.

I don't see anything wrong with the PEP 3118 protocol. It does exactly
what it is designed to do: allow the number crunching crowd to share
large datasets between different libraries without copying things around
in memory. Yes, the protocol is complicated, but that is because it is
trying to handle a complicated problem.

The memoryview implementation on the other hand is pretty broken. I do
have a theory on how it ended up in such an unusable state, but I'm not
particularly inclined to share it - this kind of thing can happen
sometimes, and the important question now is how we fix it.

As I see it, memoryview is actually trying to do two things, but the
design for supporting the second of them doesn't appear to have been
adequately thought through in the current implementation.

The first use of a memoryview object is merely to allow access to the
Py_buffer of a data store. This is pretty simple, and aside from
currently getting len() wrong when itemsize > 1, memoryview isn't
terrible at it.

If we left memoryview at that it *would* just be a simple wrapper around
a Py_buffer struct, and its implementation wouldn't be difficult at all.

Where it gets a bit more complicated is if we want to support slices
(rather than just indexing) on memoryview objects. When you do that, the
memoryview is no longer a simple wrapper around the Py_buffer of the
underlying data store, because it isn't exposing the whole data store
any more - it is only exposing part of it.

Requesting access to only part of a data buffer is NOT part of the PEP
3118 API, and it doesn't need to be: it can be part of a separate object
that adapts from the underlying data store to the desired subview.

The object that is meant to be performing at least simple 1-dimensional
cases of that adaptation is memoryview (or more to the point, memoryview
slices), but it currently *sucks* at this because it relies too heavily
on the info in the Py_buffer that it got from the underlying object.
That Py_buffer describes the *whole* data store, but a memoryview slice
may only be exposing part of it - so while the info in the Py_buffer is
accurate for the underlying object, it is *not* accurate for the
memoryview itself.

Fixing that for the 1 dimensional case shouldn't actually be all that
difficult - the memoryview just needs to maintain its own shape[0] entry
that reflects the number of items in the view rather than the number in
the underlying object.
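
For illustration, this is roughly the behaviour being asked for, shown with the memoryview of a current Python 3 interpreter (an assumption about the reader's setup; it is not the 2008 implementation under discussion). The variable names are invented for the example:

```python
from array import array

items = array("i", range(10))   # itemsize > 1 on common platforms
view = memoryview(items)        # exposes the whole data store
part = view[2:6]                # exposes only part of it

# The slice reports its own shape and length, counted in items ...
whole_shape, part_shape = view.shape, part.shape
part_values = part.tolist()

# ... while still referring to the same underlying object, without a copy.
same_base = part.obj is items
part[0] = 42                    # writes through to items[2]

print(whole_shape, part_shape, len(part), part.nbytes, same_base, items[2])
```

Here len(part) counts items rather than bytes, and part.shape describes the view rather than the underlying array, which is the distinction drawn above.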

The multi-dimensional cases get pretty tricky though, since they will
almost always end up dealing with non-contiguous data. The PEP 3118
protocol is up to handling the task, but the implementation of the index
mapping to handle these multi-dimensional cases is highly non-trivial,
and probably best left to third party libraries like numpy.
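
To give a feel for the index mapping involved, here is a deliberately tiny pure-Python sketch of strided addressing for a 2-dimensional, C-ordered block of native ints. The helper names item_offset and read_item are hypothetical, and real implementations such as numpy also deal with suboffsets, negative strides and much more:

```python
import struct


def item_offset(index, strides, start=0):
    """Byte offset of the item at `index`, given per-dimension strides."""
    return start + sum(i * s for i, s in zip(index, strides))


rows, cols = 3, 4
itemsize = struct.calcsize("i")
data = struct.pack("%di" % (rows * cols), *range(rows * cols))

# C order: moving one row skips a whole row of items,
# moving one column skips a single item.
strides = (cols * itemsize, itemsize)


def read_item(index):
    return struct.unpack_from("i", data, item_offset(index, strides))[0]


# A "column view" touches non-contiguous memory: one item out of every row.
column_1 = [read_item((row, 1)) for row in range(rows)]
print(column_1, read_item((2, 3)))
```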

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Wed Dec 10 12:54:01 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Dec 2008 21:54:01 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493F9834.8030100@canterbury.ac.nz>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com>	<1228832876.18857.11.camel@localhost>
	<493EE1A7.6050405@gmail.com>	<loom.20081209T230035-355@post.gmane.org>	<493F057F.4070806@canterbury.ac.nz>	<loom.20081210T002139-470@post.gmane.org>
	<493F9834.8030100@canterbury.ac.nz>
Message-ID: <493FADD9.4010109@gmail.com>

Greg Ewing wrote:
> In any case, I think it should be possible to implement
> either version without the memoryview having to own
> more than one Py_buffer and one set of shape/strides
> at a time. Slicing the memoryview creates another
> memoryview with its own Py_buffer and shape/strides.

The important point is that the shape information in the Py_buffer
filled in by the underlying object is the shape of *that* object.

Except in the trivial case where the memoryview is exposing the entire
underlying data buffer, the shape information in the Py_buffer has
nothing to do with the shape of the memoryview object itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Wed Dec 10 13:58:20 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 10 Dec 2008 12:58:20 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>	<493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost>	<493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
	<493FACDB.1030607@gmail.com>
Message-ID: <loom.20081210T121749-165@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> I don't see anything wrong with the PEP 3118 protocol.

Apart from the fact that:
- it uses something (Py_buffer) which is not a PyObject and has totally
different allocation/lifetime semantics (which makes it non-trivial to adapt to
for anyone used to the rest of the C API)
- it has unsolved issues like allocation of the underlying shape and strides 
members
- it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an
existing one (which seem to be rather fundamental actions to me)

... I agree there's nothing wrong with it!

> That Py_buffer describes the *whole* data store, but a memoryview slice
> may only be exposing part of it - so while the info in the Py_buffer is
> accurate for the underlying object, it is *not* accurate for the
> memoryview itself.

And the problem here is that Py_buffer is/was (*) not flexible enough to allow
easy modification in order to take a sub-buffer without some annoying problems.

(*) my patch solves the one-dimensional case. People interested in the
multi-dimensional case will have to do their homework themselves!

Regards

Antoine.



From fwierzbicki at gmail.com  Wed Dec 10 15:18:39 2008
From: fwierzbicki at gmail.com (Frank Wierzbicki)
Date: Wed, 10 Dec 2008 09:18:39 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <bbaeab100812081931l71e903bbwb9cb818a050ca687@mail.gmail.com>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
	<4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>
	<bbaeab100812041216w16a653efv4a2c7dfd8ad03403@mail.gmail.com>
	<4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com>
	<bbaeab100812061442j10a30baat3caeb922eb6c93e8@mail.gmail.com>
	<20081209025317.GA1080@amk.local>
	<bbaeab100812081931l71e903bbwb9cb818a050ca687@mail.gmail.com>
Message-ID: <4dab5f760812100618x6dbca5e5o80895aa5c4aa73a5@mail.gmail.com>

On Mon, Dec 8, 2008 at 10:31 PM, Brett Cannon <brett at python.org> wrote:
> On Mon, Dec 8, 2008 at 18:53, A.M. Kuchling <amk at amk.ca> wrote:
>> On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote:
>>> No, I am saying I had told AMK I was interested in championing the
>>> session. He chose you, and that's that. One less thing for me to worry
>>> about. =)
>>
>> Brett, I actually think you'd be a good champion for the 11AM
>> transition-planning session.
>
> OK, so I guess I do have one more thing to worry about. =) I'd be
> happy to do that session.
Sounds good, and I'm still happy to do the other session even with all
of the heckling :)

-Frank

From lie.1296 at gmail.com  Tue Dec  9 22:48:38 2008
From: lie.1296 at gmail.com (Lie Ryan)
Date: Tue, 9 Dec 2008 21:48:38 +0000 (UTC)
Subject: [Python-Dev] Floating-point implementations
References: <ghm940$ev4$1@ger.gmane.org>
Message-ID: <ghmp3l$80n$5@ger.gmane.org>

On Tue, 09 Dec 2008 12:15:53 -0500, Steve Holden wrote:

> Is anyone aware of any implementations that use other than 64-bit
> floating-point? I'd be particularly interested in any that use greater
> precision than the usual 56-bit mantissa. Do modern 64-bit systems
> implement anything wider than the normal double?
> 
> regards
>  Steve

Why don't we create a DecimalFloat datatype, which would be a variable-width 
floating-point number? Decimal is a variable-precision fixed-point number, 
while the plain ol' float would be system-dependent floating point.


From oliphant.travis at ieee.org  Wed Dec 10 16:44:06 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 09:44:06 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081208T161109-997@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
Message-ID: <ghoo47$n3b$1@ger.gmane.org>

Antoine Pitrou wrote:
> Hello,
> 
> The Py_buffer struct has two pointers named `shape` and `strides`. Each points
> to an array of Py_ssize_t values whose length is equal to the number of
> dimensions of the buffer object. Unfortunately, the buffer protocol spec doesn't
> explain how allocation of these arrays should be handled.
>

I'm coming in late to this discussion, so I apologize for being out of 
order.   But, as Nick later clarifies, the PEP *does* specify how 
allocation of these arrays is handled.

Specifically, it is the responsibility of the exporter to do it and keep 
them correct as long as the buffer is shared.

I have not been able to keep up with the python-dev mailing lists since 
I have been working full time outside of academia.   I apologize for the 
difficulty this may have caused.  But, I have been available via email 
and am happy to respond to specific questions regarding the buffer 
protocol and its implementation.

I will make some time during December to help clean up confusing issues. 
  There are still pieces to implement as well (the enhancements to the 
struct module, for example), but I will not have time for this in the 
next 6 months because I would like to spend any time I can find on 
porting NumPy to use the new buffer protocol as part of getting NumPy 
ready for 3.0.

-Travis


From steve at holdenweb.com  Wed Dec 10 16:46:55 2008
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 10 Dec 2008 10:46:55 -0500
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <ghmp3l$80n$5@ger.gmane.org>
References: <ghm940$ev4$1@ger.gmane.org> <ghmp3l$80n$5@ger.gmane.org>
Message-ID: <ghoo95$nl5$1@ger.gmane.org>

Lie Ryan wrote:
> On Tue, 09 Dec 2008 12:15:53 -0500, Steve Holden wrote:
> 
>> Is anyone aware of any implementations that use other than 64-bit
>> floating-point? I'd be particularly interested in any that use greater
>> precision than the usual 56-bit mantissa. Do modern 64-bit systems
>> implement anything wider than the normal double?
>>
>> regards
>>  Steve
> 
> Why don't we create a DecimalFloat datatype which is a variable-width 
> floating point number. Decimal is variable precision fixed-point number, 
> while the plain ol' float would be system dependent floating point.
> 
Because it's a large amount of work? For a limited return ... the
implementation is bound to be hugely slow compared with hardware
floating point, and as Martin already pointed out gmpy provides
higher-precision arithmetic where required, and the Decimal module
provides arbitrary-range fixed-point arithmetic.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From oliphant.travis at ieee.org  Wed Dec 10 16:49:01 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 09:49:01 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
Message-ID: <ghoode$o5e$1@ger.gmane.org>

Alexander Belopolsky wrote:
> On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> ..
>>> Alexander's suggestion of going and looking at what the numpy folks have
>>> done in this area is probably a good idea too.
>> Well, I'm open to others doing this, but I won't do it myself. My interest is in
>> fixing the most glaring bugs of the buffer API and memoryview object. The numpy
>> folks are welcome to voice their opinions and give advice on python-dev.
>>
> 
> I did not follow numpy development for the last year or more, so I
> won't qualify as "the numpy folks," but my understanding is that numpy
> does exactly what Nick recommended: the viewed object owns shape and
> strides just as it owns the data.  The viewing object increases the
> reference count of the viewed object and thus assures that data, shape
> and strides don't go away prematurely.
> 
> I am copying Travis, the author of the PEP 3118, hoping that he would
> step in on behalf of "the numpy folks."

I appreciate the copy. As I mentioned, I have not had time to follow 
python-dev in detail this year, but I'm glad to help maintain the buffer 
protocol and share any information I can.

I think Nick understands the situation: the exporter is responsible for 
allocating and freeing shape, strides, and suboffsets memory (as well 
as format and buf memory). How it does this is not specified and is 
open for interpretation by the objects. In the standard library there 
is nothing that needs anything complicated, and I'm comfortable with what 
I wrote previously to support the objects in the standard library.

There is a length bug in the memoryview implementation, but that is a 
separate issue and being handled.

NumPy will have to handle sharing shape and strides information and will 
serve as a reference implementation when that support is added.

-Travis


From dickinsm at gmail.com  Wed Dec 10 16:51:29 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Wed, 10 Dec 2008 15:51:29 +0000
Subject: [Python-Dev] Floating-point implementations
In-Reply-To: <ghmp3l$80n$5@ger.gmane.org>
References: <ghm940$ev4$1@ger.gmane.org> <ghmp3l$80n$5@ger.gmane.org>
Message-ID: <5c6f2a5d0812100751w47c7eeefqdb33968d067e384e@mail.gmail.com>

On Tue, Dec 9, 2008 at 9:48 PM, Lie Ryan <lie.1296 at gmail.com> wrote:
> Why don't we create a DecimalFloat datatype which is a variable-width
> floating point number. Decimal is variable precision fixed-point number,
> while the plain ol' float would be system dependent floating point.

Decimal is *already* floating-point.  Its handling of exponents
and significant zeros mean that it can do a pretty good job of
imitating fixed-point as well, but it's still at root a floating-point
type.
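
A quick sketch of both halves of that statement, using the standard decimal module. The precision of 5 is an arbitrary choice for the demonstration:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 5  # five significant digits, wherever the point falls

    small = Decimal(1) / Decimal(3)
    large = Decimal("1E+20") / Decimal(3)

    # Floating point: the digit count is fixed and the exponent moves.
    small_digits = len(small.as_tuple().digits)
    large_digits = len(large.as_tuple().digits)

    # Imitating fixed point: significant trailing zeros are preserved.
    total = Decimal("1.30") + Decimal("1.20")

print(small, large, total)
```

Both quotients carry five significant digits even though they differ by twenty orders of magnitude, while the sum keeps its two decimal places.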

Mark

From oliphant.travis at ieee.org  Wed Dec 10 16:54:01 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 09:54:01 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T112013-381@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
Message-ID: <ghoomq$p4v$1@ger.gmane.org>

Antoine Pitrou wrote:
> Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:
>> I did not follow numpy development for the last year or more, so I
>> won't qualify as "the numpy folks," but my understanding is that numpy
>> does exactly what Nick recommended: the viewed object owns shape and
>> strides just as it owns the data.  The viewing object increases the
>> reference count of the viewed object and thus assures that data, shape
>> and strides don't go away prematurely.
> 
> That doesn't work if e.g. you take a slice of a memoryview object, since the
> shape changes in the process.
> See http://bugs.python.org/issue4580
>


I think there was some confusion about how to support slicing with 
memoryview objects.  I remember thinking about it but not getting to 
the code to write it.   The memoryview object is both an exporter and 
consumer of the buffer protocol.  It can have its own semantics about 
storing shape and strides information separate from the buffer protocol.

The memory view object needs some way to translate the information it 
gets from the underlying object to the consumer of the information.

My thinking is that the memory view object itself will allocate shape 
and strides information as it needs it.

-Travis


From oliphant.travis at ieee.org  Wed Dec 10 17:12:10 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 10:12:10 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081209T230035-355@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>	<493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost>	<493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
Message-ID: <ghopor$t54$1@ger.gmane.org>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> For the slicing problem in particular, memoryview is currently trying to
>> get away with only one Py_buffer object when it needs TWO.
> 
> Why should it need two? Why couldn't the embedded Py_buffer fulfill all the
> needs of the memoryview object? If the memoryview can't be a relatively thin
> object-oriented wrapper around a Py_buffer, then this all screams failure to me.
>

The advice to look at NumPy is good because memoryview is modeled after 
NumPy -- and never completed.

When a slice view is made, a new memoryview object is created with a 
Py_buffer structure that needs to allocate its own shape and strides 
(or something that will allow correct shape and strides to be reported 
to any consumer).  In this way, there are two Py_buffer structures.

I do not remember implementing slicing for memoryview objects and it 
looks like the problem is there.


> ----
> 
> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design (and
> the fact that the design team went missing in action after imposing such a
> bizarre and complex design on us), and I'm reluctant to add yet another level of
> byzantine complexity in order to solve those problems. It explains I may sound a
> bit angry at times :-)

I understand your frustration, but I've been here (just not able to 
follow python-dev), and I've tried to respond to issues that came to my 
attention.   I did not have time to complete the memoryview 
implementation, but that does not mean the buffer API is "bizarre".

Yes, the cobbled together memoryview object itself may be "bizarre", but 
that is sometimes the reality of volunteer work.  Just ignore the 
memoryview object if it does not meet your needs.

Please let me know what other problems exist.

> 
> If we really need to change things a lot to make them work, we should re-work
> the buffer API from the ground up, make the Py_buffer struct a true PyObject
> (that is, a true variable-length object so as to solve the shape and strides
> allocation issue) and merge it with the current memoryview implementation. It
> would make things both simpler and more flexible.
> 

The only place there is a shape/strides allocation issue is with the 
memoryview object itself.   There is not an issue as far as I can see 
with the buffer protocol itself.

I'm glad you are trying to help clean up the memoryview implementation. 
I welcome the eyes and the keystrokes.  Are you familiar at all 
with NumPy?  That may help you understand what you currently consider to 
be "utterly bizarre".

Best regards,

-Travis


From oliphant.travis at ieee.org  Wed Dec 10 17:30:09 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 10:30:09 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493F057F.4070806@canterbury.ac.nz>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com>	<1228832876.18857.11.camel@localhost>
	<493EE1A7.6050405@gmail.com>	<loom.20081209T230035-355@post.gmane.org>
	<493F057F.4070806@canterbury.ac.nz>
Message-ID: <ghoqqi$1ah$1@ger.gmane.org>

Greg Ewing wrote:
> Antoine Pitrou wrote:
> 
>> Why should it need two? Why couldn't the embedded Py_buffer fulfill
>> all the needs of the memoryview object?
> 
> Two things here:
> 
>   1) The memoryview should *not* be holding onto a Py_buffer
>      in between calls to its getitem and setitem methods. It
>      should request one from the underlying object when needed
>      and release it again as soon as possible.
>

This is actually a different design than the PEP calls for.  From the PEP:

    This is functionally similar to the current buffer object except a
    reference to base is kept and the memory view is not re-grabbed.
    Thus, this memory view object holds on to the memory of base until it
    is deleted.

I'm open to this changing, but it is the current PEP.
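
The practical effect of a view "holding on" to the base can be seen from Python with a resizable exporter. A minimal sketch, assuming a current Python 3 interpreter where memoryview has a release() method (an API that postdates this thread):

```python
data = bytearray(b"abc")
view = memoryview(data)      # the view now holds an export of data's buffer

# While the export is held, the exporter refuses to move its memory.
try:
    data.append(0x64)
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# Once the view lets go, the base object can be resized again.
view.release()
data.append(0x64)

print(resized_while_exported, data)
```

A memoryview that re-grabbed the buffer on every access would hold the export only briefly, which is the trade-off being weighed here.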


>   2) The "second" Py_buffer referred to above only needs to
>      be materialized when someone makes a GetBuffer request on
>      the memoryview itself. It's not needed for Python getitem
>      and setitem calls. (The implementation might choose to
>      implement these by creating a temporary Py_buffer, but
>      again, it would only last as long as the call.)

The memoryview object will need to store some information for 
re-calculating strides, shape, and sub-offsets for consumers.

> 
>> If the memoryview can't be a relatively thin
>> object-oriented wrapper around a Py_buffer, then this all screams 
>> failure to me.
> 
> It shouldn't be a wrapper around a Py_buffer, it should be a
> wrapper around the buffer *interface* of the underlying object.
> 

This is a different object than what was proposed, but I'm not opposed 
to it.

> It sounds to me like whoever wrote the memoryview implementation
> didn't understand how the buffer interface is meant to be used.
> That doesn't mean there's anything wrong with the buffer interface.
> 
> I have some doubts myself about whether it needs to be as
> complicated as it is, but I think the basic idea is sound:
> that Py_buffer objects are ephemeral, to be obtained when
> needed and not kept for any longer than necessary.
> 

I'm all for simplifying as much as possible.  There are some things I 
understand very well (like how strides and shape information can be 
shared with views), but others that I'm trying to understand better 
(like whether holding on to a view or re-grabbing the view is better).

I think I'm leaning toward the re-grabbing concept.   I'm all for 
improving the memoryview object, but let's not confuse that effort with 
the buffer API implementation.

I do not think we need to worry about changes to the memoryview object, 
because I doubt anything outside of the standard library is using it yet.


-Travis



From oliphant.travis at ieee.org  Wed Dec 10 17:34:09 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 10:34:09 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493FACDB.1030607@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>	<493E65B1.5020004@gmail.com>	<1228832876.18857.11.camel@localhost>	<493EE1A7.6050405@gmail.com>	<loom.20081209T230035-355@post.gmane.org>
	<493FACDB.1030607@gmail.com>
Message-ID: <ghor21$1ah$2@ger.gmane.org>

Nick Coghlan wrote:
> Antoine Pitrou wrote:
>> In all honesty, I admit I am annoyed by all the problems with the buffer API /
>> memoryview object, many of which are caused by its utterly bizarre design (and
>> the fact that the design team went missing in action after imposing such a
>> bizarre and complex design on us), and I'm reluctant to add yet another level of
>> byzantine complexity in order to solve those problems. It explains I may sound a
>> bit angry at times :-)
>>
>> If we really need to change things a lot to make them work, we should re-work
>> the buffer API from the ground up, make the Py_buffer struct a true PyObject
>> (that is, a true variable-length object so as to solve the shape and strides
>> allocation issue) and merge it with the current memoryview implementation. It
>> would make things both simpler and more flexible.
> 
> I don't see anything wrong with the PEP 3118 protocol. It does exactly
> what it is designed to do: allow the number crunching crowd to share
> large datasets between different libraries without copying things around
> in memory. Yes, the protocol is complicated, but that is because it is
> trying to handle a complicated problem.
> 
> The memoryview implementation on the other hand is pretty broken. I do
> have a theory on how it ended up in such an unusable state, but I'm not
> particularly inclined to share it - this kind of thing can happen
> sometimes, and the important question now is how we fix it.
> 

Thank you Nick.   This is a correct assessment of the situation.  I'd 
like to help improve memoryview as I can.  It does need thought about 
what you want memoryview to be.

I wanted memoryview to be able to be sliced and diced (much like NumPy 
arrays).  But I was only able to get around to implementing the simple 
view of the Py_buffer struct.


-Travis


From oliphant.travis at ieee.org  Wed Dec 10 17:37:22 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 10:37:22 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081210T121749-165@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>	<loom.20081208T231050-480@post.gmane.org>	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>	<loom.20081209T112013-381@post.gmane.org>	<493E65B1.5020004@gmail.com>	<1228832876.18857.11.camel@localhost>	<493EE1A7.6050405@gmail.com>	<loom.20081209T230035-355@post.gmane.org>	<493FACDB.1030607@gmail.com>
	<loom.20081210T121749-165@post.gmane.org>
Message-ID: <ghor82$1ah$3@ger.gmane.org>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> I don't see anything wrong with the PEP 3118 protocol.
> 
> Apart from the fact that:
> - it uses something (Py_buffer) which is not a PyObject and has totally
> different allocation/lifetime semantics (which makes it non-trivial to adapt to
> for anyone used to the rest of the C API)

  * this is a non-issue.   The Py_buffer struct is just a place-holder 
for a bunch of variables.  It could be a Python-object but that was seen 
as unnecessary.

> - it has unsolved issues like allocation of the underlying shape and strides 
> members

  * this is false.  It does specify how this is handled.

> - it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an
> existing one (which seem to be rather fundamental actions to me)

  * this is not part of the PEP.  Whether it's a deficiency or not is 
open to interpretation.

> 
> ... I agree there's nothing wrong with it!

I'm glad you agree.

> 
>> That Py_buffer describes the *whole* data store, but a memoryview slice
>> may only be exposing part of it - so while the info in the Py_buffer is
>> accurate for the underlying object, it is *not* accurate for the
>> memoryview itself.
> 
> And the problem here is that Py_buffer is/was (*) not flexible enough to allow
> easy modification in order to take a sub-buffer without some annoying problems.
> 

You are confusing the intent of the memoryview with the Py_buffer struct.

-Travis


From oliphant.travis at ieee.org  Wed Dec 10 17:39:45 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 10 Dec 2008 10:39:45 -0600
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493EF1D4.5090803@canterbury.ac.nz>
References: <loom.20081208T161109-997@post.gmane.org>	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>	<493E3569.6010408@gmail.com>
	<493EF1D4.5090803@canterbury.ac.nz>
Message-ID: <ghorch$1ah$4@ger.gmane.org>

Greg Ewing wrote:
> Nick Coghlan wrote:
>> Maintaining a PyDict instance to map from view pointers to shapes
>> and strides info doesn't strike me as a "complex scheme" though.
> 
> I don't see why a given buffer provider should ever need
> more than one set of shape/strides arrays at a time. It
> can allocate them on creation, reallocate them as needed
> if the shape of its internal data changes, and deallocate
> them when it goes away.
> 

I agree.  NumPy has a single shape/strides array.  The intent was to 
share this through the buffer interface.


> If you are creating view objects that present slices or
> some other alternative perspective, then the view object
> itself is a buffer provider and should maintain shape/stride
> arrays for its particular view of the underlying object.

Yes, that is correct.
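
As a rough illustration, assuming a later CPython (3.3 or newer) where
memoryview exposes shape, strides and obj attributes: a sliced view reports
its own shape and strides, while the view of the whole object keeps the
original ones.

```python
# Sketch: each view carries the shape/strides for its own perspective.
base = bytearray(range(10))
whole = memoryview(base)    # view of the entire buffer
part = whole[2:8:2]         # every second item from index 2 up to 8

whole_layout = (whole.shape, whole.strides)
part_layout = (part.shape, part.strides)
part_items = part.tolist()  # items reachable through the sliced view
```

Both views still refer to the same underlying object (part.obj is base); only
the layout information differs.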

-Travis


From foom at fuhm.net  Wed Dec 10 18:49:50 2008
From: foom at fuhm.net (James Y Knight)
Date: Wed, 10 Dec 2008 12:49:50 -0500
Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time"
In-Reply-To: <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>
References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com>
	<7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com>
	<ac2200130811160843y2292fad4he6cf78bbb696d4d@mail.gmail.com>
	<319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>
Message-ID: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net>


On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote:

> Turns out, I created an empty time.py in /tmp, just to see the error
> message. But buildout will, when creating eggs from checked-out modules,
> copy them to a directory under /tmp, and evidently run python from
> /tmp to create the eggs. So that process finds the time.pyc, created
> from the empty time.py, which I hadn't deleted, and breaks!

Sounds like a security hole in zc.buildout. Imagine someone *else*  
made a time.py in /tmp...

James

From regebro at gmail.com  Wed Dec 10 19:05:47 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 10 Dec 2008 19:05:47 +0100
Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time"
In-Reply-To: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net>
References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com>
	<7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com>
	<ac2200130811160843y2292fad4he6cf78bbb696d4d@mail.gmail.com>
	<319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>
	<6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net>
Message-ID: <319e029f0812101005l898243ta05152f09fce92fc@mail.gmail.com>

On Wed, Dec 10, 2008 at 18:49, James Y Knight <foom at fuhm.net> wrote:
>
> On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote:
>
>> Turns out, I created an empty time.py in /tmp, just to see the error
>> message. But buildout will, when creating eggs from checked-out modules,
>> copy them to a directory under /tmp, and evidently run python from
>> /tmp to create the eggs. So that process finds the time.pyc, created
>> from the empty time.py, which I hadn't deleted, and breaks!
>
> Sounds like a security hole in zc.buildout. Imagine someone *else* made a
> time.py in /tmp...

Yup. Adam Olsen also reminded me of this, and I have filed a bug report.

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From schmir at gmail.com  Wed Dec 10 19:19:19 2008
From: schmir at gmail.com (Ralf Schmitt)
Date: Wed, 10 Dec 2008 19:19:19 +0100
Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time"
In-Reply-To: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net>
References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com>
	<7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com>
	<ac2200130811160843y2292fad4he6cf78bbb696d4d@mail.gmail.com>
	<319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>
	<6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net>
Message-ID: <932f8baf0812101019p6c08798du34c3038d1e4cd83f@mail.gmail.com>

On Wed, Dec 10, 2008 at 6:49 PM, James Y Knight <foom at fuhm.net> wrote:
>
> On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote:
>
>> Turns out, I created an empty time.py in /tmp, just to see the error
>> message. But buildout will, when creating eggs from checked-out modules,
>> copy them to a directory under /tmp, and evidently run python from
>> /tmp to create the eggs. So that process finds the time.pyc, created
>> from the empty time.py, which I hadn't deleted, and breaks!
>
> Sounds like a security hole in zc.buildout. Imagine someone *else* made a
> time.py in /tmp...
>

the current working directory is also added to sys.path if PYTHONPATH
contains an empty element. might be the case here...
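
A small sketch of the shadowing effect, under a few assumptions: it uses
colorsys rather than time (time is compiled into many current interpreters
and so cannot be shadowed there), and it relies on "python -c" putting the
current directory first on sys.path. The helper name is made up for the
example.

```python
import os
import subprocess
import sys
import tempfile

def import_sees_real_module(workdir):
    """Import colorsys with workdir as the cwd and report whether the real
    stdlib module (which defines rgb_to_hsv) was the one that got imported."""
    # Drop PYTHONSAFEPATH so the child keeps the default sys.path behaviour.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
    proc = subprocess.run(
        [sys.executable, "-c",
         "import colorsys; print(hasattr(colorsys, 'rgb_to_hsv'))"],
        cwd=workdir, env=env, capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip() == "True"

with tempfile.TemporaryDirectory() as clean, \
        tempfile.TemporaryDirectory() as dirty:
    # An empty colorsys.py plays the role of the stray time.py in /tmp.
    open(os.path.join(dirty, "colorsys.py"), "w").close()
    clean_ok = import_sees_real_module(clean)
    shadowed_ok = import_sees_real_module(dirty)
```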

- Ralf

From rhamph at gmail.com  Wed Dec 10 19:14:30 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 10 Dec 2008 11:14:30 -0700
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812101206.49316.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>

On Wed, Dec 10, 2008 at 4:06 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Hi,
>
> I published a new version of my fault handler: it installs a handler for
> signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and
> continue the execution of your Python program. Example:

This will of course leave the program in an undefined state.  It is
very likely to crash again, emit garbage, hang, or otherwise be
useless.

sigsetjmp() is only safe for code explicitly designed for it.  That
will never be the case for CPython, let alone all the arbitrary
libraries that may be used with it.


-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Wed Dec 10 19:31:45 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 10 Dec 2008 11:31:45 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812101139.37301.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812091931.29905.eckhardt@satorlaser.com>
	<aac2c7cb0812091122o116d189aq5032a3e94c96ee87@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
Message-ID: <aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>

On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt
<eckhardt at satorlaser.com> wrote:
> On Tuesday 09 December 2008, Adam Olsen wrote:
>> The only thing separating this from a bikeshed discussion is that a
>> bikeshed has many equally good solutions, while we have no good
>> solutions.  Instead we're trying to find the least-bad one.  The
>> unicode/bytes separation is pretty close to that.  Adding a warning
>> gets even closer.  Adding magic makes it worse.
>
> Well, I see two cases:
> 1. Converting from an uncertain representation to a known one.
> 2. Converting from a known representation to a known one.

Not quite:
1. Using a garbage file name locally (within a single process, not
talking to any libs)
2. Using a unicode filename everywhere (libs, saved to config files,
displayed to the user, etc.)

Note that if you have a GUI doing the former, all you technically need
is a placeholder like "<undecodable filename>".  You might try to
extract some ASCII out of it, but that's just a minor bonus.

On linux the bytes/unicode separation is perfect for this.  You decide
which approach you're using and use it consistently.  If you mess up
(mixing bytes and unicode) you'll consistently get an error.
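
A minimal sketch of the two approaches, assuming UTF-8 as the expected
encoding; display_name and PLACEHOLDER are names invented for the example.

```python
PLACEHOLDER = "<undecodable filename>"

def display_name(raw, encoding="utf-8"):
    """Return text suitable for showing to a user; the raw bytes remain the
    only thing that is ever passed back to the OS."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return PLACEHOLDER

good = "café.txt".encode("utf-8")
garbage = b"caf\xe9.txt"    # Latin-1 bytes, not valid UTF-8

# Mixing the two worlds fails loudly instead of silently producing garbage.
try:
    garbage + ".bak"
    mixing_raised = False
except TypeError:
    mixing_raised = True
```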

We currently don't follow this model on windows, so a garbage file
name gets passed around as if it was unicode, but fails when passed to
a lib, saved to a config file, is displayed to a user, etc.
(Depending on the API, as many won't validate either.)


-- 
Adam Olsen, aka Rhamphoryncus

From victor.stinner at haypocalc.com  Wed Dec 10 19:37:16 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 10 Dec 2008 19:37:16 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
Message-ID: <200812101937.16467.victor.stinner@haypocalc.com>

Oh, I forgot the issue URL:
   http://bugs.python.org/issue3999

I also attached an example of catching segfaults.

> > I published a new version of my fault handler: it installs a handler for
> > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and
> > continue the execution of your Python program. Example:
>
> This will of course leave the program in an undefined state.  It is
> very likely to crash again, emit garbage, hang, or otherwise be
> useless.

Recovering after a segfault is dangerous, but my first goal was to get the Python 
backtrace instead of just one line: "Segmentation fault". It helps a lot for 
debugging!

I didn't try it on a real-world application, but with a small script the program 
continues its execution without any problem.

But yes, there is a big risk of:
 - memory leaks
 - deadlocks
 - context problems, e.g. for the GIL, I call PyGILState_Ensure()
 - etc.

I chose the exceptions MemoryError and ArithmeticError, but we could use 
specific exceptions based on BaseException instead of Exception to avoid 
catching them with "except Exception: ...".
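
A sketch of that idea. SegfaultError is a hypothetical name, and the
exception is raised by hand here rather than by a signal handler.

```python
class SegfaultError(BaseException):
    """Hypothetical exception for a trapped SIGSEGV. Deriving from
    BaseException keeps it out of reach of 'except Exception'."""

def risky():
    raise SegfaultError("simulated SIGSEGV")

caught_by = None
try:
    try:
        risky()
    except Exception:           # does not match: not an Exception subclass
        caught_by = "except Exception"
except SegfaultError:           # only an explicit handler sees it
    caught_by = "except SegfaultError"
```

KeyboardInterrupt and SystemExit use the same approach for the same reason.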

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From musiccomposition at gmail.com  Wed Dec 10 19:42:56 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 10 Dec 2008 12:42:56 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
Message-ID: <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>

On Wed, Dec 10, 2008 at 12:37 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Oh, I forgot the issue URL:
>   http://bugs.python.org/issue3999
>
> I also attached an example of catching segfaults.
>
>> > I published a new version of my fault handler: it installs a handler for
>> > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and
>> > continue the execution of your Python program. Example:
>>
>> This will of course leave the program in an undefined state.  It is
>> very likely to crash again, emit garbage, hang, or otherwise be
>> useless.
>
> Recovering after a segfault is dangerous, but my first goal was to get the Python
> backtrace instead of just one line: "Segmentation fault". It helps a lot for
> debugging!

Exactly! That's why it doesn't belong in the Python core. We can't
guarantee anything about its effects or encourage it.

>
> I didn't try it on a real-world application, but with a small script the program
> continues its execution without any problem.

But as you say, it would be used on real world programs!



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From rhamph at gmail.com  Wed Dec 10 19:59:09 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 10 Dec 2008 11:59:09 -0700
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0812101059h14bee038v154732aeeb9b119b@mail.gmail.com>

On Wed, Dec 10, 2008 at 11:37 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Oh, I forgot the issue URL:
>   http://bugs.python.org/issue3999
>
> I also attached an example of catching segfaults.
>
>> > I published a new version of my fault handler: it installs a handler for
>> > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and
>> > continue the execution of your Python program. Example:
>>
>> This will of course leave the program in an undefined state.  It is
>> very likely to crash again, emit garbage, hang, or otherwise be
>> useless.
>
> Recovering after a segfault is dangerous, but my first goal was to get the Python
> backtrace instead of just one line: "Segmentation fault". It helps a lot for
> debugging!

It's possible to print the Python stack purely from C, without
invoking any Python code.  Even better, you could print the C stack
while you're at it!  Doing that in a signal handler, and then killing
the process, could be seriously considered.

Take a look at http://www.linuxjournal.com/article/6391 .  You'll
probably need #ifdef's to only use it on certain supported platforms,
and probably disable it by default anyway (configure option?  Not
sure).  Still, it'd be useful to have it there.
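
For comparison, a sketch of the "dump the stack from the handler, then die"
behaviour. It assumes a later CPython (3.3 or newer), whose standard library
has a faulthandler module, and a POSIX system where os.kill can deliver
SIGSEGV to the current process. The crash happens in a child process so the
caller survives.

```python
import subprocess
import sys

# The child enables the fault handler and then sends itself SIGSEGV.
CRASH = (
    "import faulthandler, os, signal\n"
    "faulthandler.enable()\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)

proc = subprocess.run([sys.executable, "-c", CRASH],
                      capture_output=True, text=True)
report = proc.stderr        # traceback written by the handler before exit
```

The handler only writes the traceback and lets the process die; it makes no
attempt to resume execution.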


-- 
Adam Olsen, aka Rhamphoryncus

From tjreedy at udel.edu  Wed Dec 10 20:04:00 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 10 Dec 2008 14:04:00 -0500
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>	<200812101937.16467.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
Message-ID: <ghp3qu$49e$2@ger.gmane.org>

Benjamin Peterson wrote:
> On Wed, Dec 10, 2008 at 12:37 PM, Victor Stinner

>>> This will of course leave the program in an undefined state.  It is
>>> very likely to crash again, emit garbage, hang, or otherwise be
>>> useless.
>> Recovering after a segfault is dangerous, but my first goal was to get the Python
>> backtrace instead of just one line: "Segmentation fault". It helps a lot for
>> debugging!
> 
> Exactly! That's why it doesn't belong in the Python core. We can't
> guarantee anything about its effects or encourage it.

Would it be safe to catch SIGSEGV, output a trace, and then exit?
IE, make the 'first goal' the only goal?


From bjourne at gmail.com  Wed Dec 10 20:22:13 2008
From: bjourne at gmail.com (BJörn Lindqvist)
Date: Wed, 10 Dec 2008 20:22:13 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
Message-ID: <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com>

One thing I think it would be useful for in the real world is
unit testing extension modules. You can't profitably write unit tests
for segfaults because that breaks the test harness. In situations like
those, recovering would be likely (caveat emptor of course).


2008/12/10, Adam Olsen <rhamph at gmail.com>:
> On Wed, Dec 10, 2008 at 4:06 AM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> Hi,
>>
>> I published a new version of my fault handler: it installs a handler for
>> signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and
>> continue the execution of your Python program. Example:
>
> This will of course leave the program in an undefined state.  It is
> very likely to crash again, emit garbage, hang, or otherwise be
> useless.
>
> sigsetjmp() is only safe for code explicitly designed for it.  That
> will never be the case for CPython, let alone all the arbitrary
> libraries that may be used with it.
>
>
> --
> Adam Olsen, aka Rhamphoryncus


-- 
mvh Björn

From rhamph at gmail.com  Wed Dec 10 21:05:17 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 10 Dec 2008 13:05:17 -0700
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com>
Message-ID: <aac2c7cb0812101205m4d3c55c7kd64b59d9a7c0b070@mail.gmail.com>

On Wed, Dec 10, 2008 at 12:22 PM, BJörn Lindqvist <bjourne at gmail.com> wrote:
> One thing I think it would be useful for in the real world is
> unit testing extension modules. You can't profitably write unit tests
> for segfaults because that breaks the test harness. In situations like
> those, recovering would be likely (caveat emptor of course).

The only safe option there is a subprocess.
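
A minimal sketch of such a harness; run_isolated is a name made up for the
example, and the crash is simulated by the child sending itself SIGSEGV
(which assumes a POSIX system).

```python
import subprocess
import sys

def run_isolated(source):
    """Run a snippet in a fresh interpreter; return (returncode, stdout).
    A crash takes down only the child, never the test harness."""
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout

ok_code, ok_out = run_isolated("print('fine')")
crash_code, _ = run_isolated(
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)")
# On POSIX, a negative return code is minus the number of the fatal signal.
```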


-- 
Adam Olsen, aka Rhamphoryncus

From mal at egenix.com  Wed Dec 10 22:09:00 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 10 Dec 2008 22:09:00 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812101205m4d3c55c7kd64b59d9a7c0b070@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>	<740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com>
	<aac2c7cb0812101205m4d3c55c7kd64b59d9a7c0b070@mail.gmail.com>
Message-ID: <49402FEC.7070303@egenix.com>

On 2008-12-10 21:05, Adam Olsen wrote:
> On Wed, Dec 10, 2008 at 12:22 PM, BJörn Lindqvist <bjourne at gmail.com> wrote:
>> One thing I think it would be useful for in the real world is
>> unit testing extension modules. You can't profitably write unit tests
>> for segfaults because that breaks the test harness. In situations like
>> those, recovering would be likely (caveat emptor of course).
> 
> The only safe option there is a subprocess.

True, but that still makes it a little difficult to report the errors
found in the module.

mxTools has an optional safecall() function that allows calling
functions which potentially segfault and still returns control
back to the calling application:

http://www.egenix.com/products/python/mxBase/mxTools/

It's not (yet) documented, but fairly straightforward to use
once you've enabled it in egenix_mx_base.py:

result = mx.Tools.safecall(callable, args, kws)

Using such a function is handy in situations where you have a
multi-process application setup that sometimes needs to call
out to external libraries of varying quality - a situation that's
not uncommon in real-life situations.

-- 
Marc-Andre Lemburg
eGenix.com


From martin at v.loewis.de  Thu Dec 11 00:12:43 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 11 Dec 2008 00:12:43 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812101206.49316.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
Message-ID: <49404CEB.8040900@v.loewis.de>

> I would appreciate a review, especially for the patch in Python/ceval.c.

In this specific case, it is not clear for what purpose you want such a
review. For inclusion into Python?

Several people already said (essentially) that: -1. I don't think such
code should be added to the Python core, no matter how smart or correct
it is.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Thu Dec 11 00:56:04 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 11 Dec 2008 12:56:04 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <ghopor$t54$1@ger.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org> <ghopor$t54$1@ger.gmane.org>
Message-ID: <49405714.6030108@canterbury.ac.nz>

Travis Oliphant wrote:

> When a slice view is made, a new memoryview object is created with a 
> Py_buffer  structure that needs to allocate its own shape and strides 
> (or something that will allow correct shape and strides to be reported 
> to any consumer).  In this way, there are two Py_buffer structures.

To be precise, the important thing is for the memoryview to allocate
its own shape and strides. It's not strictly necessary to keep them
internally in a Py_buffer struct, although that may be a convenient
way to do it.

-- 
Greg

From jyasskin at gmail.com  Thu Dec 11 01:12:02 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Wed, 10 Dec 2008 16:12:02 -0800
Subject: [Python-Dev] Merging flow
In-Reply-To: <gh8s08$p9r$1@ger.gmane.org>
References: <gh8s08$p9r$1@ger.gmane.org>
Message-ID: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>

Was there ever a conclusion to this? I need to merge the patches
associated with issue 4597 from trunk to all the maintenance branches,
and I'd like to avoid messing anyone up if possible. If I don't hear
back, I'll plan to svnmerge directly from trunk to each of the
branches, and then block my merge to py3k from being merged again to
release30-maint.

Thanks,
Jeffrey

On Thu, Dec 4, 2008 at 7:12 AM, Christian Heimes <lists at cheimes.de> wrote:
> Several people have asked about the patch and merge flow. Now that Python
> 3.0 is out it's a bit more complicated.
>
> Flow diagram
> ------------
>
> trunk ---> release26-maint
>       \->      py3k       ---> release30-maint
>
>
> Patches for all versions of Python should land in the trunk. They are then
> merged into release26-maint and py3k branches. Changes for Python 3.0 are
> merged via the py3k branch.
>
> Christian

From alexander.belopolsky at gmail.com  Thu Dec 11 01:21:06 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 10 Dec 2008 19:21:06 -0500
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <49404CEB.8040900@v.loewis.de>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
Message-ID: <d38f5330812101621j4a1d7a54ie77775e92c5afd62@mail.gmail.com>

On Wed, Dec 10, 2008 at 6:12 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I would appreciate a review, especially for the patch in Python/ceval.c.
>
> In this specific case, it is not clear for what purpose you want such a
> review. For inclusion into Python?
>

Even if it does not result in an inclusion into Python, I personally
would be quite interested in following this thread if discussion of
Victor's patch continues.  It may quite possibly yield some
improvements to python development tools (core and libraries'
development).  Graceful handling of hard errors is an unsolved problem
in Python and it has become more important since ctypes made it to the
standard library and therefore it has become possible to easily
trigger a hard error from pure python code.

> Several people already said (essentially) that: -1. I don't think such
> code should be added to the Python core, no matter how smart or correct
> it is.
>

Looking up the thread, I don't see anyone taking such an extreme
position: never recover from SEGV even if it can be done 100%
correctly.  The sentiment that I see and the one that I share is that
it is extremely difficult (and maybe impossible) to do correctly.
However, if someone comes up with a smart solution, I would be very
much interested to see it.

While by the time you get a SIGSEGV, your process is likely to be
beyond recovery, I don't think the same applies to SIGFPE.   It may
also be possible to get rid of the arbitrary recursion limit on Linux
(I've heard this problem is solved on Windows) by being smart about
handling SIGSEGV.

Finally, providing some diagnostic before exiting on hard errors is
not without precedent: I believe R has such a feature.  It may be
worthwhile to compare Victor's approach to what is done in R.

It may, however, be better to move further discussion to the tracker
(I understand that the patch is at
<http://bugs.python.org/issue3999>).

From greg.ewing at canterbury.ac.nz  Thu Dec 11 01:21:48 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 11 Dec 2008 13:21:48 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493FACDB.1030607@gmail.com>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org> <493FACDB.1030607@gmail.com>
Message-ID: <49405D1C.60207@canterbury.ac.nz>

Nick Coghlan wrote:

> The multi-dimensional cases get pretty tricky though, since they will
> almost always end up dealing with non-contiguous data. The PEP 3118
> protocol is up to handling the task, but the implementation of the index
> mapping to handle these multi-dimensional cases is highly non-trivial,
> and probably best left to third party libraries like numpy.

I'm wondering whether there should be some kind of utility
function provided with the buffer API for doing this. It
would take the shape/strides info from a Py_buffer together
with a set of slicing parameters, and create you another
set of shape/strides info describing the slice.

It seems sensible to put the effort into doing this
correctly once, rather than leave everyone implementing
a memoryview-like object to come up with their own
half-working and/or broken implementation.
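
A rough sketch of what such a utility might compute, written in Python
for brevity. The name slice_layout and its signature are hypothetical,
not part of the buffer API; it assumes one slice object per dimension
and strides expressed in bytes, as in a Py_buffer.

```python
def slice_layout(shape, strides, slices, offset=0):
    """Return (offset, shape, strides) describing a slice of a strided buffer.

    Hypothetical helper: shape and strides are per-dimension tuples
    (strides in bytes), slices holds one slice object per dimension.
    """
    if not (len(shape) == len(strides) == len(slices)):
        raise ValueError("shape, strides and slices must have the same length")
    new_shape, new_strides = [], []
    for extent, stride, sl in zip(shape, strides, slices):
        start, stop, step = sl.indices(extent)
        # Number of elements selected along this dimension.
        new_shape.append(len(range(start, stop, step)))
        # One step along the slice skips `step` original elements.
        new_strides.append(stride * step)
        # The first selected element shifts the start of the view.
        offset += start * stride
    return offset, tuple(new_shape), tuple(new_strides)
```

For a C-contiguous 4x6 array of 8-byte items (strides 48 and 8), taking
rows 1 and 3 and every third column gives a byte offset of 48, shape
(2, 2) and strides (96, 24). Index items and ellipsis handling are left
out of this sketch.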

-- 
Greg

From greg.ewing at canterbury.ac.nz  Thu Dec 11 01:21:56 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 11 Dec 2008 13:21:56 +1300
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <loom.20081210T121749-165@post.gmane.org>
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com> <loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com> <loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org> <493E65B1.5020004@gmail.com>
	<1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org> <493FACDB.1030607@gmail.com>
	<loom.20081210T121749-165@post.gmane.org>
Message-ID: <49405D24.3010607@canterbury.ac.nz>

Antoine Pitrou wrote:

> - it uses something (Py_buffer) which is not a PyObject and has totally
> different allocation/lifetime semantics

This was a deliberate decision -- in fact I argued for it myself.
The buffer interface is meant to be a minimal-overhead way for
C code to get at the underlying data. Requiring allocation of
a PyObject would be too expensive.

The way to think about the Py_buffer struct is not as an
object in its own right, but just a place to put some output
parameters from the GetBuffer call.

The lifetime of the information pointed to by the Py_buffer
is the same as the lifetime of the underlying object, and that
object is responsible for managing it.
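
As a loose illustration of the owner managing the exported memory, here
is how this looks from Python in current CPython (an assumption about
present-day behaviour, not something the 3.0 memoryview guaranteed): a
bytearray refuses to resize while a view onto it is outstanding. The
function name is made up for the example.

```python
def resize_while_exported():
    data = bytearray(b"abc")
    view = memoryview(data)      # the bytearray now has one exported buffer
    try:
        data.append(0x64)        # resizing would invalidate the view
    except BufferError:
        blocked = True
    else:
        blocked = False
    view.release()               # hand the buffer back to its owner
    data.append(0x64)            # resizing is allowed again
    return blocked, bytes(data)
```

The view itself carries no storage of its own; it only describes memory
that the bytearray continues to own.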

> - it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an
> existing one (which seem to be rather fundamental actions to me)

I don't think they're as fundamental as all that. But some
utilities for doing things like this could be useful, as I
mentioned in another post.

-- 
Greg

From solipsis at pitrou.net  Thu Dec 11 01:35:05 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Dec 2008 00:35:05 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <loom.20081208T161109-997@post.gmane.org>
	<493D87BD.90106@gmail.com>
	<loom.20081208T211114-616@post.gmane.org>
	<493D94CD.5040209@gmail.com>
	<loom.20081208T231050-480@post.gmane.org>
	<d38f5330812081901l1b0a59delf8f73995c0db2ab9@mail.gmail.com>
	<loom.20081209T112013-381@post.gmane.org>
	<493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
	<493EE1A7.6050405@gmail.com>
	<loom.20081209T230035-355@post.gmane.org>
	<493FACDB.1030607@gmail.com>
	<loom.20081210T121749-165@post.gmane.org>
	<49405D24.3010607@canterbury.ac.nz>
Message-ID: <loom.20081211T002856-879@post.gmane.org>

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> 
> This was a deliberate decision -- in fact I argued for it myself.
> The buffer interface is meant to be a minimal-overhead way for
> C code to get at the underlying data. Requiring allocation of
> a PyObject would be too expensive.

Tuples are used everywhere throughout the interpreter and yet they are proper
PyObjects. Even simple integers are often wrapped into PyLong objects (see the
getitem/setitem protocol in Py3k). I doubt Py_buffers are more critical for
performance than tuples and integers are.




From martin at v.loewis.de  Thu Dec 11 01:56:56 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 11 Dec 2008 01:56:56 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
Message-ID: <49406558.7020005@v.loewis.de>

Jeffrey Yasskin wrote:
> Was there ever a conclusion to this? I need to merge the patches
> associated with issue 4597 from trunk to all the maintenance branches,
> and I'd like to avoid messing anyone up if possible. If I don't hear
> back, I'll plan to svnmerge directly from trunk to each of the
> branches, and then block my merge to py3k from being merged again to
> release30-maint.

No - you should merge from the py3k branch to the release30-maint branch.

Regards,
Martin

From rhamph at gmail.com  Thu Dec 11 02:01:45 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 10 Dec 2008 18:01:45 -0700
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <d38f5330812101621j4a1d7a54ie77775e92c5afd62@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
	<d38f5330812101621j4a1d7a54ie77775e92c5afd62@mail.gmail.com>
Message-ID: <aac2c7cb0812101701i1956b8fcu4eef9d09869dacd6@mail.gmail.com>

On Wed, Dec 10, 2008 at 5:21 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> On Wed, Dec 10, 2008 at 6:12 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Several people already said (essentially) that: -1. I don't think such
>> code should be added to the Python core, no matter how smart or correct
>> it is.
>>
>
> Looking up the thread, I don't see anyone taking such an extreme
> position: never recover from SEGV even if it can be done 100%
> correctly.  The sentiment that I see and the one that I share is that
> it is extremely difficult (and maybe impossible) to do correctly.
> However, if someone comes up with a smart solution, I would be very
> much interested to see it.

It is impossible to do in general, and I am -1 on any misguided
attempts to do so.


> While by the time you get a SIGSEGV, your process is likely to be
> beyond recovery, I don't think the same applies to SIGFPE.

No, it's as much about the context as it is the error.  We could write
our own floating point code that can recover from SIGFPE (which isn't
portable, but still mostly doable), but enabling it for arbitrary
third-party libraries is completely unsafe.

Printing a stack trace and then aborting would be possible and useful though.


> It may
> also be possible to get rid of the arbitrary recursion limit on Linux
> (I've heard this problem is solved on Windows) by being smart about
> handling SIGSEGV.

If we could calculate how much stack is left we'd have a much more
robust way of doing recursion limits.  I suppose this could be done by
reading a byte from each page with a temporary SIGSEGV handler
installed, but I'm not convinced you can't ask the platform directly
somehow.  I'd also be concerned about thread-safety.
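
On the "ask the platform directly" point, a minimal sketch of what
Unix exposes to Python through the resource module. It reports the
configured stack limit for the process, not how much stack is left at
a given moment, and it says nothing about threads other than the main
one, so treat it as a partial answer only. The function name is
illustrative.

```python
import resource
import threading

def stack_limits():
    # Soft and hard limits for the main thread's stack, in bytes
    # (resource.RLIM_INFINITY means "unlimited"). Unix only.
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    # Stack size used for newly created threads; 0 means platform default.
    return soft, hard, threading.stack_size()
```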


> Finally, providing some diagnostic before exiting on hard errors is
> not without precedent: I believe R has such a feature.  It may be
> worthwhile to compare Victor's approach to what is done in R.
>
> It may, however, be better to move further discussion to the tracker
> (I understand that the patch is at
> <http://bugs.python.org/issue3999>).


-- 
Adam Olsen, aka Rhamphoryncus

From alexander.belopolsky at gmail.com  Thu Dec 11 02:22:23 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 10 Dec 2008 20:22:23 -0500
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812101701i1956b8fcu4eef9d09869dacd6@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
	<d38f5330812101621j4a1d7a54ie77775e92c5afd62@mail.gmail.com>
	<aac2c7cb0812101701i1956b8fcu4eef9d09869dacd6@mail.gmail.com>
Message-ID: <d38f5330812101722t518be028i964763f8d1d5c6ba@mail.gmail.com>

On Wed, Dec 10, 2008 at 8:01 PM, Adam Olsen <rhamph at gmail.com> wrote:
..
> It is impossible to do in general, and I am -1 on any misguided
> attempts to do so.
>

I agree, recovering from segfaults caused by buggy third party C
modules is a losing proposition, but for a limited number of
conditions that can be triggered from python code running on a
non-buggy interpreter (hopefully ctypes included, but that would be
hard), converting signals into exceptions may be possible.
..
> Printing a stack trace and then aborting would be possible and useful though.
>
Even a simple dialog in the interactive session, such as "Python has
encountered a segfault, would you like to dump core? y/n", would be
quite useful.

From hodgestar+pythondev at gmail.com  Thu Dec 11 08:21:32 2008
From: hodgestar+pythondev at gmail.com (Simon Cross)
Date: Thu, 11 Dec 2008 09:21:32 +0200
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
Message-ID: <fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>

On Wed, Dec 10, 2008 at 8:37 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Recovering after a segfault is dangerous, but my first goal was to get the Python
> backtrace instead of just one line: "Segmentation fault". It helps a lot for
> debugging!

This would be extremely useful. I've had PyGTK segfault on me a number
of times in an app I'm writing and I keep meaning to try to get to the
bottom of the issue, but it happens infrequently and somehow I never
get around to it. Some indication of what Python was executing when
the segfault occurred would help narrow down the possibilities rapidly.

Schiavo
Simon

From eckhardt at satorlaser.com  Thu Dec 11 10:19:16 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Thu, 11 Dec 2008 10:19:16 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812101139.37301.eckhardt@satorlaser.com> 
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
Message-ID: <200812111019.16950.eckhardt@satorlaser.com>

On Wednesday 10 December 2008, Adam Olsen wrote:
> On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt
>
> <eckhardt at satorlaser.com> wrote:
> > On Tuesday 09 December 2008, Adam Olsen wrote:
> >> The only thing separating this from a bikeshed discussion is that a
> >> bikeshed has many equally good solutions, while we have no good
> >> solutions.  Instead we're trying to find the least-bad one.  The
> >> unicode/bytes separation is pretty close to that.  Adding a warning
> >> gets even closer.  Adding magic makes it worse.
> >
> > Well, I see two cases:
> > 1. Converting from an uncertain representation to a known one.
> > 2. Converting from a known representation to a known one.
>
> Not quite:
> 1. Using a garbage file name locally (within a single process, not
> talking to any libs)
> 2. Using a unicode filename everywhere (libs, saved to config files,
> displayed to the user, etc.)

I think there is some misunderstanding. I was referring to conversions and 
whether it is good to perform them implicitly. For that, I saw the above two 
cases.

> On linux the bytes/unicode separation is perfect for this.  You decide
> which approach you're using and use it consistently.  If you mess up
> (mixing bytes and unicode) you'll consistently get an error.
>
> We currently don't follow this model on windows, so a garbage file
> name gets passed around as if it was unicode, but fails when passed to
> a lib, saved to a config file, is displayed to a user, etc.

I'm not sure I agree with this. Facts I know are:
1. On POSIX systems, there is no reliable encoding for filenames while the 
system APIs use char/byte strings.
2. On MS Windows, the encoding for filenames is Unicode/UTF-16.

Returning Unicode strings from readdir() is wrong because it can't handle the 
case 1 above. Returning byte strings is wrong because it can't handle case 2 
above because it gives you useless roundtrips from UTF-16 to either UTF-8 or, 
worst case, to the locale-dependent MBCS. Returning something different 
depending on the system is also broken because that would make Python code 
that uses this function and assumes a certain type unportable.

Note that this doesn't get much better if you provide a separate readdirb() 
API or one that simply returns a byte string or Unicode string depending on 
its argument. It just shifts the brokenness from readdir() to the code that 
uses it, unless this code makes a distinction between the target systems. 
Since way too many programmers are not aware of the problem, they will not 
handle these systems differently, so code will become non-portable.

What I'd like some feedback on is the approach of returning a distinct type 
(neither a byte string nor a Unicode string) from readdir(). In order to use 
it, a programmer will have to convert it explicitly; otherwise, e.g., 
printing it will just produce <env_string at 0x01234567>. This will 
immediately confront every programmer with the issue of unknown 
encodings, and they will have to make the application-specific choice of whether 
an approximation of the filename, an exception, or ignoring the file is the 
right response. It also puts the options for doing this conversion in a 
single class, which I personally find much better than providing overloads 
for hundreds of functions.
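
To make the idea concrete, here is a minimal sketch of such a type in
Python. The class name FSName and its methods are hypothetical, chosen
only for illustration; nothing like this exists in the standard
library under that name.

```python
class FSName:
    """Hypothetical opaque file name: neither a byte string nor a str."""

    def __init__(self, raw):
        if not isinstance(raw, bytes):
            raise TypeError("FSName wraps the raw bytes from the OS")
        self._raw = raw

    def __repr__(self):
        # Deliberately unhelpful, so printing forces an explicit choice.
        return "<FSName at 0x%08x>" % id(self)

    def to_bytes(self):
        # Lossless form, suitable for handing back to the filesystem.
        return self._raw

    def decode(self, encoding, errors="strict"):
        # The caller picks the encoding and the error policy:
        # "strict" raises, "replace" approximates, "ignore" drops bytes.
        return self._raw.decode(encoding, errors)
```

With name = FSName(b"caf\xe9"), name.decode("utf-8") raises
UnicodeDecodeError, name.decode("utf-8", "replace") yields an
approximation, and name.to_bytes() always round-trips.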


Sorry for ranting, but I'm a bit confused and desperate, because either I'm 
unable to explain what I mean or I'm really not understanding something that 
everybody else here seems to agree upon. I just know that using a distinct 
path type has helped me in C++ in the past, and I don't see why it shouldn't 
in Python.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


From victor.stinner at haypocalc.com  Thu Dec 11 10:34:24 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 11 Dec 2008 10:34:24 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <ghp3qu$49e$2@ger.gmane.org>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
Message-ID: <200812111034.24319.victor.stinner@haypocalc.com>

On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
> >> Recovering after a segfault is dangerous, but my first goal was to get the
> >> Python backtrace instead of just one line: "Segmentation fault". It helps a
> >> lot for debugging!
> >
> > Exactly! That's why it doesn't belong in the Python core. We can't
> > guarantee anything about its effects or encourage it.
>
> Would it be safe to catch SIGSEGV, output a trace, and then exit?
> IE, make the 'first goal' the only goal?

Oh yeah, good idea :-) Does it mean that the Python interpreter can't be used 
to display the trace? It would be nice to at least use the Python stderr 
(which is written in pure Python in Python 3). It would be better if the user 
could set up a callback, like sys.excepthook. But if, as many people wrote, 
Python is totally broken after a segfault, that may not be a good idea :-)

I guess that the sigsetjmp() and siglongjmp() hack can be avoided in 
PyEval_EvalFrameEx(), so ceval.c could be left unchanged.

New pseudocode:
  set checkpoint
  if error:
     get the backtrace
     display the backtrace
     fast exit (eg. don't call atexit, don't free memory, ...)
  else:
     normal execution
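
For comparison, later CPython releases (3.3 and up) ship a faulthandler
module that follows roughly this dump-then-exit shape. The sketch below
is not the patch discussed in this thread; it only shows the behaviour
from the outside, assuming a Unix system where a process can send
itself SIGSEGV with os.kill(). The helper name run_crashing_child is
made up for the example.

```python
import subprocess
import sys
import textwrap

# Source of a child process that enables faulthandler and then "crashes".
CHILD = textwrap.dedent("""
    import faulthandler, os, signal
    faulthandler.enable()      # install handlers for SIGSEGV, SIGFPE, ...
    def crash():
        # Simulate a hard error by signalling ourselves (Unix only).
        os.kill(os.getpid(), signal.SIGSEGV)
    crash()
""")

def run_crashing_child():
    # Run the child in a separate interpreter so the parent survives.
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr
```

The child's stderr should contain a "Fatal Python error" line followed
by the Python-level stack, and the process still dies from the signal
rather than trying to continue.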

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From fijall at gmail.com  Thu Dec 11 11:10:14 2008
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 11 Dec 2008 11:10:14 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812101701i1956b8fcu4eef9d09869dacd6@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
	<d38f5330812101621j4a1d7a54ie77775e92c5afd62@mail.gmail.com>
	<aac2c7cb0812101701i1956b8fcu4eef9d09869dacd6@mail.gmail.com>
Message-ID: <693bc9ab0812110210i5174ce77u4309e5841b897a1a@mail.gmail.com>

>
> If we could calculate how much stack is left we'd have a much more
> robust way of doing recursion limits.  I suppose this could be done by
> reading a byte from each page with a temporary SIGSEGV handler
> installed, but I'm not convinced you can't ask the platform directly
> somehow.  I'd also be concerned about thread-safety.
>

It's only as hard as taking the address of a local variable at the
beginning of the program and again at any arbitrary point. Of course,
'how much is left' requires additional arithmetic.

Cheers,
fijal

From steve at holdenweb.com  Thu Dec 11 13:13:49 2008
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 11 Dec 2008 07:13:49 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812111019.16950.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
Message-ID: <494103FD.5000101@holdenweb.com>

Ulrich Eckhardt wrote:
> On Wednesday 10 December 2008, Adam Olsen wrote:
>> On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt
>>
>> <eckhardt at satorlaser.com> wrote:
>>> On Tuesday 09 December 2008, Adam Olsen wrote:
>>>> The only thing separating this from a bikeshed discussion is that a
>>>> bikeshed has many equally good solutions, while we have no good
>>>> solutions.  Instead we're trying to find the least-bad one.  The
>>>> unicode/bytes separation is pretty close to that.  Adding a warning
>>>> gets even closer.  Adding magic makes it worse.
>>> Well, I see two cases:
>>> 1. Converting from an uncertain representation to a known one.
>>> 2. Converting from a known representation to a known one.
>> Not quite:
>> 1. Using a garbage file name locally (within a single process, not
>> talking to any libs)
>> 2. Using a unicode filename everywhere (libs, saved to config files,
>> displayed to the user, etc.)
> 
> I think there is some misunderstanding. I was referring to conversions and 
> whether it is good to perform them implicitly. For that, I saw the above two 
> cases.
> 
>> On linux the bytes/unicode separation is perfect for this.  You decide
>> which approach you're using and use it consistently.  If you mess up
>> (mixing bytes and unicode) you'll consistently get an error.
>>
>> We currently don't follow this model on windows, so a garbage file
>> name gets passed around as if it was unicode, but fails when passed to
>> a lib, saved to a config file, is displayed to a user, etc.
> 
> I'm not sure I agree with this. Facts I know are:
> 1. On POSIX systems, there is no reliable encoding for filenames while the 
> system APIs use char/byte strings.
> 2. On MS Windows, the encoding for filenames is Unicode/UTF-16.
> 
> Returning Unicode strings from readdir() is wrong because it can't handle the 
> case 1 above. Returning byte strings is wrong because it can't handle case 2 
> above because it gives you useless roundtrips from UTF-16 to either UTF-8 or, 
> worst case, to the locale-dependent MBCS. Returning something different 
> depending on the system is also broken because that would make Python code 
> that uses this function and assumes a certain type unportable.
> 
> Note that this doesn't get much better if you provide a separate readdirb() 
> API or one that simply returns a byte string or Unicode string depending on 
> its argument. It just shifts the brokenness from readdir() to the code that 
> uses it, unless this code makes a distinction between the target systems. 
> Since way too many programmers are not aware of the problem, they will not 
> handle these systems differently, so code will become non-portable.
> 
> What I'd just like some feedback on is the approach to return a distinct type 
> (neither a byte string nor a Unicode string) from readdir(). In order to use 
> this, a programmer will have to convert it explicitly, otherwise e.g. 
> printing it will just produce <env_string at 0x01234567>. This will 
> immediately bump each programmer with their heads on the issue of unknown 
> encodings and they will have to make the application-specific choice whether 
> an approximation of the filename, an exception or ignoring the file is the 
> right choice. Also, it presents the options for doing this conversion in a 
> single class, which I personally find much better than providing overloads 
> for hundreds of functions.
> 
> 
> Sorry for ranting, but I'm a bit confused and desperate, because either I'm 
> unable to explain what I mean or I'm really not understanding something that 
> everybody else here seems to agree upon. I just know that using a distinct 
> path type has helped me in C++ in the past, and I don't see why it shouldn't 
> in Python.
> 
Seems to me this just threatens to add to the confusion.

If you know what your filesystem produces, you can take the appropriate
action to convert it into a type that makes sense to the user. If you
don't, then at least if you have the string in its bytes form you can
re-present it to the filesystem to manipulate the file. What are we
supposed to do with the "special type"?
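
A small sketch of the bytes round trip described above, using only
standard-library calls (os.listdir, os.fsencode, os.path.join, open).
The function name and the temporary file are illustrative; the point is
that a bytes name obtained from the filesystem can be handed straight
back to it without ever choosing an encoding for display.

```python
import os
import tempfile

def roundtrip_bytes_name():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("hello")
        # A bytes argument makes os.listdir() return bytes names.
        names = os.listdir(os.fsencode(tmp))
        # The bytes name can be joined and re-presented to the OS as-is.
        with open(os.path.join(os.fsencode(tmp), names[0]), "rb") as f:
            return names, f.read()
```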

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From ncoghlan at gmail.com  Thu Dec 11 13:18:31 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 11 Dec 2008 22:18:31 +1000
Subject: [Python-Dev] Merging flow
In-Reply-To: <49406558.7020005@v.loewis.de>
References: <gh8s08$p9r$1@ger.gmane.org>	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
	<49406558.7020005@v.loewis.de>
Message-ID: <49410517.1030601@gmail.com>

Martin v. Löwis wrote:
> Jeffrey Yasskin wrote:
>> Was there ever a conclusion to this? I need to merge the patches
>> associated with issue 4597 from trunk to all the maintenance branches,
>> and I'd like to avoid messing anyone up if possible. If I don't hear
>> back, I'll plan to svnmerge directly from trunk to each of the
>> branches, and then block my merge to py3k from being merged again to
>> release30-maint.
> 
> No - you should merge from the py3k branch to the release30-maint branch.

I believe that's difficult when you previously merged from the trunk to
the py3k branch - the merged change to the svnmerge related properties
on the root directory gets in the way when svnmerge attempts to update
them on the maintenance branch.

That's what started this thread, and so far nobody has come up with a
workaround. It seems to me that svnmerge.py should just be able to do a
svn revert on the affected properties in the maintenance branch before
it attempts to modify them, but my svn-fu isn't strong enough for me to
say that for sure.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From skip at pobox.com  Thu Dec 11 13:57:03 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 11 Dec 2008 06:57:03 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
Message-ID: <18753.3615.21624.999357@montanaro-dyndns-org.local>


    Simon> Some indication of what Python was executing when the segfault
    Simon> occurred would help narrow down the possibilities rapidly.

The Python distribution comes with a Misc/gdbinit file (you can grab it from
the Subversion source tree via the web as well) that defines a pystack
command.  It will work with core files as well as running processes and
should give you a very good idea where your Python code was executing when
the segfault occurred.

-- 
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/

From solipsis at pitrou.net  Thu Dec 11 14:10:48 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Dec 2008 13:10:48 +0000 (UTC)
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
	<18753.3615.21624.999357@montanaro-dyndns-org.local>
Message-ID: <loom.20081211T130601-166@post.gmane.org>

<skip <at> pobox.com> writes:
> 
> The Python distribution comes with a Misc/gdbinit file (you can grab it from
> the Subversion source tree via the web as well) that defines a pystack
> command.  It will work with core files as well as running processes and
> should give you a very good idea where your Python code was executing when
> the segfault occurred.

Still, it would be much better if the stack trace could be printed by Python
itself rather than having to resort to gdb wizardry. Especially if the problem
is reported by one of your non-developer users.



From skip at pobox.com  Thu Dec 11 14:28:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 11 Dec 2008 07:28:01 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <loom.20081211T130601-166@post.gmane.org>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
	<18753.3615.21624.999357@montanaro-dyndns-org.local>
	<loom.20081211T130601-166@post.gmane.org>
Message-ID: <18753.5473.853788.617528@montanaro-dyndns-org.local>


    Antoine> Still, it would be much better if the stack trace could be
    Antoine> printed by Python itself rather than having to resort to gdb
    Antoine> wizardry. Especially if the problem is reported by one of your
    Antoine> non-developer users.

I understand.  The guy has a problem today for which there is a solution
that I posted.  If he's "been meaning to look into the problem" and he's
posting to python-dev I presume he knows at least a little about running gdb
if he's operating in a Unix environment.  These two gdb commands

    source .gdbinit
    pystack

shouldn't be too much of a barrier.

Skip

From solipsis at pitrou.net  Thu Dec 11 14:37:39 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 11 Dec 2008 13:37:39 +0000 (UTC)
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<aac2c7cb0812101014q7d93dba9ped4a7511f45654f2@mail.gmail.com>
	<200812101937.16467.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
	<18753.3615.21624.999357@montanaro-dyndns-org.local>
	<loom.20081211T130601-166@post.gmane.org>
	<18753.5473.853788.617528@montanaro-dyndns-org.local>
Message-ID: <loom.20081211T133305-853@post.gmane.org>

<skip <at> pobox.com> writes:
> 
> I understand.  The guy has a problem today for which there is a solution
> that I posted.  If he's "been meaning to look into the problem" and he's
> posting to python-dev I presume he knows at least a little about running gdb
> if he's operating in a Unix environment.  These two gdb commands
> 
>     source .gdbinit
>     pystack
> 
> shouldn't be too much of a barrier.

Well, but sometimes you don't have a core file (because you didn't run ulimit
before launching Python and the crash wasn't expected; if the crash is very
erratic, by the time you've fixed the system limits, you don't manage to
reproduce it anymore, or it takes hours because it's at the end of a very long
workload). Sometimes you don't have the gdbinit file around (for example,
Mandriva doesn't ship it with any Python-related package). Sometimes you are
under Windows.

etc. :-)



From eckhardt at satorlaser.com  Thu Dec 11 14:41:46 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Thu, 11 Dec 2008 14:41:46 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <494103FD.5000101@holdenweb.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812111019.16950.eckhardt@satorlaser.com> 
	<494103FD.5000101@holdenweb.com>
Message-ID: <200812111441.46739.eckhardt@satorlaser.com>

On Thursday 11 December 2008, Steve Holden wrote:
> Ulrich Eckhardt wrote:
> > What I'd just like some feedback on is the approach to return a distinct
> > type (neither a byte string nor a Unicode string) from readdir(). In
> > order to use this, a programmer will have to convert it explicitly,
> > otherwise e.g. printing it will just produce <env_string at 0x01234567>.
> > This will immediately bump each programmer with their heads on the issue
> > of unknown encodings and they will have to make the application-specific
> > choice whether an approximation of the filename, an exception or ignoring
> > the file is the right choice. Also, it presents the options for doing
> > this conversion in a single class, which I personally find much better
> > than providing overloads for hundreds of functions.
[...]
>
> Seems to me this just threatens to add to the confusion.
>
> If you know what your filesystem produces, you can take the appropriate
> action to convert it into a type that makes sense to the user. If you
> don't, then at least if you have the string in its bytes form you can
                                       ^^^^^^^^^^^^^^^^^^^

There are operating systems that don't use bytes to represent a file path, 
namely all the MS Windows variants. Even worse, when you use a byte string 
there, it typically means that you want to use the obsolete encoding that is 
based on codepages.

Why can we not preserve the representation of a path as it is? Why do we 
_have_ to convert it to anything at all, without even knowing if this 
conversion is needed? I just want to do something to a file's content, why 
does its path have to be converted to something and then be converted back in 
order for the system to digest it?

> re-present it to the filesystem to manipulate the file. What are we
> supposed to do with the "special type"?

You receive it from readdir() and pass it to stat(), simple as that. No 
conversions from the native representation needed. If you need a textual 
representation, then you have to convert it and you have to do so explicitly 
according to whatever logic your application requires.

If readdir() returned Unicode text, people would start taking that for 
granted. If it returned bytes, just the same. Returning a completely 
unrelated type will give them enough hint that for this thing they have to 
rethink their assumptions. This runs along the lines of "In the face of 
ambiguity, refuse the temptation to guess.", as it makes guessing rather 
impossible.

I just don't see a case where using a separate path class would break things. 
Further, the special handling that is required would be made even clearer by 
using such a class.
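
A minimal sketch of the idea, written against present-day Python 3 (3.6 or later, for the `__fspath__` hook). The class name NativePath, its to_text() method and the helper list_native() are made up for illustration and are not part of any proposal or library. The object carries the raw bytes the OS returned, can be handed straight back to os.stat(), and has to be converted explicitly before it can be shown as text.

```python
import os
import tempfile


class NativePath:
    """Hypothetical opaque wrapper around a path exactly as the OS returned it."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __fspath__(self) -> bytes:
        # os.stat(), open() etc. accept the object through this hook,
        # so no decoding happens on the way back to the filesystem.
        return self._raw

    def to_text(self, encoding: str, errors: str = "strict") -> str:
        # The only way to get text: the caller picks encoding and error policy.
        return self._raw.decode(encoding, errors)


def list_native(directory: str):
    """Return the entries of `directory` as NativePath objects (full paths)."""
    raw_dir = os.fsencode(directory)
    # Passing bytes to os.listdir() makes it return undecoded bytes names.
    return [NativePath(os.path.join(raw_dir, name)) for name in os.listdir(raw_dir)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.txt"), "w") as f:
            f.write("hello")
        for entry in list_native(tmp):
            size = os.stat(entry).st_size              # no conversion needed
            shown = entry.to_text("utf-8", "replace")  # explicit, lossy on purpose
            print(shown, size)
```

Printing a NativePath directly only gives the default object repr, which is the "hint" described above: text appears only where the application has chosen an encoding and an error policy.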

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



From krstic at solarsail.hcs.harvard.edu  Thu Dec 11 14:44:57 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Thu, 11 Dec 2008 14:44:57 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <49404CEB.8040900@v.loewis.de>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
Message-ID: <B5342F9C-6344-4390-AA07-91945A82AF3B@solarsail.hcs.harvard.edu>

Hi Martin,

On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote:
> Several people already said (essentially) that: -1. I don't think such
> code should be added to the Python core, no matter how smart or correct
> it is.


Does your -1 apply only to attempts to resume execution after SIGSEGV,  
or also to the idea of dumping the stack and immediately exiting? The  
former strikes me as crazy talk, while the latter is genuinely useful.

Cheers,

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From victor.stinner at haypocalc.com  Thu Dec 11 15:19:07 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 11 Dec 2008 15:19:07 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <18753.3615.21624.999357@montanaro-dyndns-org.local>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
	<18753.3615.21624.999357@montanaro-dyndns-org.local>
Message-ID: <200812111519.07899.victor.stinner@haypocalc.com>

On Thursday 11 December 2008 13:57:03, skip at pobox.com wrote:
>     Simon> Some indication of what Python was executing when the segfault
>     Simon> occurred would help narrow down the possibilities rapidly.
>
> The Python distribution comes with a Misc/gdbinit file

Hmm, do you really run *all* programs in gdb? Most of the time, you don't 
expect a crash (because you trust your software). You will have to try to 
reproduce the crash, but sometimes that's very hard (e.g. Heisenbugs!).

My new proposal is to display the backtrace instead of just the 
message "segmentation fault". It's not a problem if displaying the backtrace 
produces a new fault, because that is still better than just the 
message "segmentation fault". Even with my SIGSEGV handler, you can still use 
gdb because gdb catches the signal before the program does.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ijmorlan at uwaterloo.ca  Thu Dec 11 14:58:51 2008
From: ijmorlan at uwaterloo.ca (Isaac Morland)
Date: Thu, 11 Dec 2008 08:58:51 -0500 (EST)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com>
	<200812111441.46739.eckhardt@satorlaser.com>
Message-ID: <Pine.GSO.4.64.0812110846200.28468@core.cs.uwaterloo.ca>

On Thu, 11 Dec 2008, Ulrich Eckhardt wrote:

> On Thursday 11 December 2008, Steve Holden wrote:
>> Ulrich Eckhardt wrote:
>> Seems to me this just threatens to add to the confusion.
>>
>> If you know what your filesystem produces, you can take the appropriate
>> action to convert it into a type that makes sense to the user. If you
>> don't, then at least if you have the string in its bytes form you can
>                                       ^^^^^^^^^^^^^^^^^^^
>
> There are operating systems that don't use bytes to represent a file path,
> namely all the MS Windows variants. Even worse, when you use a byte string
> there, it typically means that you want to use the obsolete encoding that is
> based on codepages.
>
> Why can we not preserve the representation of a path as it is? Why do we
> _have_ to convert it to anything at all, without even knowing if this
> conversion is needed? I just want to do something to a file's content, why
> does its path have to be converted to something and then be converted back in
> order for the system to digest it?
>
>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
>
> You receive from readdir() and pass it to stat(), simple as that. No
> conversions from the native representation needed. If you need a textual
> representation, then you have to convert it and you have to do so explicitly
> according to whatever logic your application requires.

Not only would this address the issue with the local filesystem, it would 
also provide a principled way to deal with remote filesystems.  For 
example, an FTP interface library for Python could use this type to 
return paths of the sort actually supported by the raw FTP protocol.

Thinking of "the" filesystem is actually a misconception - always 
referring to "a" filesystem opens up all sorts of possibilities.  There is 
a lot of coding to do to allow this, but allowing programs to work with 
paths and files in the local filesystem, remote filesystems, and 
filesystems constructed from others (e.g., by expanding symlinks, changing 
the root similar to chroot, or encoding/decoding pathnames) would open 
up lots of possibilities, including better test environments.

This is an interesting case of separating byte strings from character 
strings.  As long as the two are conflated, everything appears simple. 
But when they are separated, not only are there two types where before 
there was only one, it turns out that which type is correct in some 
circumstances depends on the platform.  Also, many objects which are byte 
strings at the protocol level are usually or always meant to be character 
strings of some sort, but how to translate them simply cannot be nailed 
down once and for all.

Isaac Morland			CSCF Web Guru
DC 2554C, x36650		WWW Software Specialist

From skip at pobox.com  Thu Dec 11 15:27:27 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 11 Dec 2008 08:27:27 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812111519.07899.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>
	<18753.3615.21624.999357@montanaro-dyndns-org.local>
	<200812111519.07899.victor.stinner@haypocalc.com>
Message-ID: <18753.9039.975554.300631@montanaro-dyndns-org.local>


    >> The Python distribution comes with a Misc/gdbinit file

    Victor> Hmm, do you really run *all* programs in gdb? Most of the time,
    Victor> you don't expect a crash (because you trust your software). You
    Victor> will have to try to reproduce the crash, but sometimes that's very
    Victor> hard (e.g. Heisenbugs!).

Please folks!  Get real.  I was trying to help out a guy who responded to
this thread saying that he gets intermittent segfaults in his PyGTK
programs.  I don't presume that he runs his app in gdb.  If he has a core
file this will work.  I apologize profusely for any implication that a set
of gdb commands is in any way superior to your patch.

OTOH, it works today if you have a core file and are running Python at least
as far back as 2.4.  It doesn't require any changes to the interpreter.  I
use it frequently at work (a couple times a month anyway).  We get
notifications of all core files dropped each day.  I make at least a cursory
check of all core files dumped by Python.  For that I use the pystack
command defined in Misc/gdbinit.

    Victor> My new proposal is to display the backtrace instead of just
    Victor> the message "segmentation fault". It's not a problem if
    Victor> displaying the backtrace produces a new fault, because that is
    Victor> still better than just the message "segmentation fault". Even
    Victor> with my SIGSEGV handler, you can still use gdb because gdb
    Victor> catches the signal before the program does.

Again, I meant no disrespect to your proposal.  I was *simply trying to help
the guy out*.

Skip

From jyasskin at gmail.com  Thu Dec 11 17:08:53 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 11 Dec 2008 08:08:53 -0800
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812111034.24319.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
	<200812111034.24319.victor.stinner@haypocalc.com>
Message-ID: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com>

On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> But if -as many people wrote-
> Python is totally broken after a segfault, it is maybe not a good idea :-)

While it's true that after a segfault or unexpected longjmp, there are
no guarantees whatsoever about the state of the python program, the
program will often just happen to work, and there are at least some
programs I've worked on that would rather take the risk in order to
try to shut down gracefully. For example, an interactive app may want
to give the user a chance to save her (not necessarily corrupted) work
into a new file rather than unconditionally losing it. Or a webserver
might want to catch the segfault, finish replying to the other
requests that were in progress at the time, maybe reply to the request
that caused the segfault, and then restart. Yes there's a possibility
that the events around the segfault exposed some secret internal data
(and they may do so even without segfaulting), but when the
alternative is not replying to the users at all, this may be a risk
the app wants to take. It would be nice for Python to at least expose
the option so that developers (who are consenting adults, remember)
can make their own decisions. It should _not_ be on by default, but
something like sys.dangerous_turn_C_crashes_into_exceptions() would be
useful.

Jeffrey

From jyasskin at gmail.com  Thu Dec 11 17:17:49 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 11 Dec 2008 08:17:49 -0800
Subject: [Python-Dev] Merging flow
In-Reply-To: <49410517.1030601@gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
	<49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com>
Message-ID: <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com>

On Thu, Dec 11, 2008 at 4:18 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Martin v. Löwis wrote:
>> Jeffrey Yasskin wrote:
>>> Was there ever a conclusion to this? I need to merge the patches
>>> associated with issue 4597 from trunk to all the maintenance branches,
>>> and I'd like to avoid messing anyone up if possible. If I don't hear
>>> back, I'll plan to svnmerge directly from trunk to each of the
>>> branches, and then block my merge to py3k from being merged again to
>>> release30-maint.
>>
>> No - you should merge from the py3k branch to the release30-maint branch.
>
> I believe that's difficult when you previously merged from the trunk to
> the py3k branch - the merged change to the svnmerge related properties
> on the root directory gets in the way when svnmerge attempts to update
> them on the maintenance branch.
>
> That's what started this thread, and so far nobody has come up with a
> workaround. It seems to me that svnmerge.py should just be able to do a
> svn revert on the affected properties in the maintenance branch before
> it attempts to modify them, but my svn-fu isn't strong enough for me to
> say that for sure.

Yeah, that's why I asked. I tried what Martin suggested with r67698 by
just saying I'd resolved the conflict, which added the single revision
I was merging from to the svnmerge-integrated property. It didn't add
the two original revisions. I don't know enough about how svnmerge
works to know if that's the right outcome or who it's going to cause
trouble for.

Jeffrey

From foom at fuhm.net  Thu Dec 11 17:40:28 2008
From: foom at fuhm.net (James Y Knight)
Date: Thu, 11 Dec 2008 11:40:28 -0500
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
	<200812111034.24319.victor.stinner@haypocalc.com>
	<5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com>
Message-ID: <FB5A9178-65AD-46E3-B89E-704E1BBA3F97@fuhm.net>


On Dec 11, 2008, at 11:08 AM, Jeffrey Yasskin wrote:

> On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> But if -as many people wrote-
>> Python is totally broken after a segfault, it is maybe not a good idea :-)
>
> While it's true that after a segfault or unexpected longjmp, there are
> no guarantees whatsoever about the state of the python program, the
> program will often just happen to work, and there are at least some
> programs I've worked on that would rather take the risk in order to
> try to shut down gracefully.

I ran an interactive game for years (written in C, mind you, not  
python), where the SIGSEGV handler simply recursively reinvoked the  
main loop, after disabling the command that caused a SEGV if it had  
caused a SEGV twice already. It almost always worked and continued  
running without issue. YMMV, of course. :)

James

From musiccomposition at gmail.com  Thu Dec 11 17:38:36 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 11 Dec 2008 10:38:36 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
	<200812111034.24319.victor.stinner@haypocalc.com>
	<5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com>
Message-ID: <1afaf6160812110838j2064385byca69bd01f7d9d06@mail.gmail.com>

On Thu, Dec 11, 2008 at 10:08 AM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> But if -as many people wrote-
>> Python is totally broken after a segfault, it is maybe not a good idea :-)
>
> While it's true that after a segfault or unexpected longjmp, there are
> no guarantees whatsoever about the state of the python program, the
> program will often just happen to work, and there are at least some
> programs I've worked on that would rather take the risk in order to
> try to shut down gracefully. For example, an interactive app may want
> to give the user a chance to save her (not necessarily corrupted) work
> into a new file rather than unconditionally losing it. Or a webserver
> might want to catch the segfault, finish replying to the other
> requests that were in progress at the time, maybe reply to the request
> that caused the segfault, and then restart. Yes there's a possibility
> that the events around the segfault exposed some secret internal data
> (and they may do so even without segfaulting), but when the
> alternative is not replying to the users at all, this may be a risk
> the app wants to take. It would be nice for Python to at least expose
> the option so that developers (who are consenting adults, remember)
> can make their own decisions. It should _not_ be on by default, but
> something like sys.dangerous_turn_C_crashes_into_exceptions() would be
> useful.

Trying to recover (or save work etc.) is incredibly unpredictable,
though. It could very well end up making the situation worse!

I'm -1 on putting this in the core.



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From steve at holdenweb.com  Thu Dec 11 18:46:57 2008
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 11 Dec 2008 12:46:57 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com>
	<200812111441.46739.eckhardt@satorlaser.com>
Message-ID: <ghrjma$h4f$1@ger.gmane.org>

Ulrich Eckhardt wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
>> Ulrich Eckhardt wrote:
>>> What I'd just like some feedback on is the approach to return a distinct
>>> type (neither a byte string nor a Unicode string) from readdir(). In
>>> order to use this, a programmer will have to convert it explicitly,
>>> otherwise e.g. printing it will just produce <env_string at 0x01234567>.
>>> This will immediately bump each programmer with their heads on the issue
>>> of unknown encodings and they will have to make the application-specific
>>> choice whether an approximation of the filename, an exception or ignoring
>>> the file is the right choice. Also, it presents the options for doing
>>> this conversion in a single class, which I personally find much better
>>> than providing overloads for hundreds of functions.
> [...]
>> Seems to me this just threatens to add to the confusion.
>>
>> If you know what your filesystem produces, you can take the appropriate
>> action to convert it into a type that makes sense to the user. If you
>> don't, then at least if you have the string in its bytes form you can
>                                        ^^^^^^^^^^^^^^^^^^^
> 
> There are operating systems that don't use bytes to represent a file path, 
> namely all the MS Windows variants. Even worse, when you use a byte string 
> there, it typically means that you want to use the obsolete encoding that is 
> based on codepages.
> 
> Why can we not preserve the representation of a path as it is? Why do we 
> _have_ to convert it to anything at all, without even knowing if this 
> conversion is needed? I just want to do something to a file's content, why 
> does its path have to be converted to something and then be converted back in 
> order for the system to digest it?
> 
You don't: that was my point. You only need to perform any kind of
conversion when the filename has to be presented to something other than
the file system.

>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
> 
> You receive from readdir() and pass it to stat(), simple as that. No 
> conversions from the native representation needed. If you need a textual 
> representation, then you have to convert it and you have to do so explicitly 
> according to whatever logic your application requires.
> 
Exactly.

> If readdir() returned Unicode text, people would start taking that for 
> granted. If it returned bytes, just the same. Returning a completely 
> unrelated type will give them enough hint that for this thing they have to 
> rethink their assumptions. This runs along the lines of "In the face of 
> ambiguity, refuse the temptation to guess.", as it makes guessing rather 
> impossible.
> 
So you are suggesting this "special object" be used only to represent
files to users? Now I understand.

> I just don't see a case where using a separate path class would break things. 
> Further, the special handling that is required would be made even clearer by 
> using such a class.
> 
But it does have to be implemented ...

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From rhamph at gmail.com  Thu Dec 11 19:04:20 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 11:04:20 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com>
	<200812111441.46739.eckhardt@satorlaser.com>
Message-ID: <aac2c7cb0812111004t56cd6d0fxcdb5877299309b8a@mail.gmail.com>

On Thu, Dec 11, 2008 at 6:41 AM, Ulrich Eckhardt
<eckhardt at satorlaser.com> wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
>
> You receive from readdir() and pass it to stat(), simple as that. No
> conversions from the native representation needed. If you need a textual
> representation, then you have to convert it and you have to do so explicitly
> according to whatever logic your application requires.

The simplest solution there is to have Windows bytes APIs that return
raw UTF-16 bytes (note that Windows does NOT guarantee them to be valid
Unicode, although that is much more likely than on Linux).  The only real
issue I see is that UTF-16 isn't an ASCII superset, so it won't print
nicely.

In other words, bytes can be your special type.
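
To make the two caveats concrete, here is a small sketch in present-day Python 3 (the surrogatepass error handler for UTF-16 needs 3.4 or later). The sample names are invented. It shows that UTF-16 bytes are not ASCII-compatible, and that a name containing a lone surrogate survives as bytes but fails a strict decode.

```python
# UTF-16 is not an ASCII superset: every ASCII character gains a NUL byte.
ascii_name = "a.txt".encode("utf-16-le")
print(ascii_name)  # b'a\x00.\x00t\x00x\x00t\x00'

# A name with a lone surrogate, which is not valid Unicode text. This is an
# invented example of the kind of name a Windows filesystem may hold.
odd_name = "bad\ud800name"
raw = odd_name.encode("utf-16-le", "surrogatepass")

# The raw bytes round-trip only if the caller explicitly allows surrogates.
assert raw.decode("utf-16-le", "surrogatepass") == odd_name

# A strict decode rejects the same bytes.
try:
    raw.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False
```

Keeping the name as bytes therefore loses nothing, while any conversion to text forces a decision about what to do with the invalid part.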


-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Thu Dec 11 19:15:22 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 11:15:22 -0700
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812111034.24319.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
	<200812111034.24319.victor.stinner@haypocalc.com>
Message-ID: <aac2c7cb0812111015y2705ec9fp387c77f033ad6da0@mail.gmail.com>

On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
>> >> Recovering after a segfault is dangerous, but my first goal was to get the
>> >> Python backtrace instead of just one line: "Segmentation fault". It helps a
>> >> lot for debugging!
>> >
>> > Exactly! That's why it doesn't belong in the Python core. We can't
>> > guarantee anything about its effects or encourage it.
>>
>> Would it be safe to catch SIGSEGV, output a trace, and then exit?
>> IE, make the 'first goal' the only goal?
>
> Oh yeah, good idea :-) Does it mean that the Python interpreter can't be used to
> display the trace? It would be nice to -at least- use the Python stderr
> (which is written in pure Python for Python3). It would be better if the user
> could set up a callback, like sys.excepthook. But if -as many people wrote-
> Python is totally broken after a segfault, it is maybe not a good idea :-)

You have to use the low-level stderr, nothing that invokes Python.
I'd hate to get a second segfault while printing the first.

Just think about how indirect refcounting bugs tend to be.  Another
example is messing up GIL handling.  There's heaps of things for which
we'd want good stack traces, which can't be done from Python.


-- 
Adam Olsen, aka Rhamphoryncus

From daniel at stutzbachenterprises.com  Thu Dec 11 20:39:30 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 11 Dec 2008 13:39:30 -0600
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812111015y2705ec9fp387c77f033ad6da0@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>
	<ghp3qu$49e$2@ger.gmane.org>
	<200812111034.24319.victor.stinner@haypocalc.com>
	<aac2c7cb0812111015y2705ec9fp387c77f033ad6da0@mail.gmail.com>
Message-ID: <eae285400812111139y3f4e217do55e2b08ef468bded@mail.gmail.com>

On Thu, Dec 11, 2008 at 12:15 PM, Adam Olsen <rhamph at gmail.com> wrote:

> You have to use the low-level stderr, nothing that invokes Python.
> I'd hate to get a second segfault while printing the first.
>
> Just think about how indirect refcounting bugs tend to be.  Another
> example is messing up GIL handling.  There's heaps of things for which
> we'd want good stack traces, which can't be done from Python.
>

+1 on functionality to print a stack trace on a fault
-1 on translating the fault into an exception

I suggest exposing some functions to control the functionality.  Here are
some things the user may wish to control:

1. Disable/enable the functionality altogether
2. Set the file descriptor that the stack trace should be written to
3. Set a file name that should be created and written to instead
4. Specify whether a core dump should be generated
5. Specify a program to run after the stack trace has been printed

#3 combined with #5 would be very useful for automated bug reporting.

For what it's worth, the functionality could be implemented under Windows
using Structured Exception Handling.
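
For readers today, part of this list can be tried with the faulthandler module that later entered the standard library (Python 3.3). The sketch below covers items 1 to 3 only; the helper names enable_crash_log and disable_crash_log and the log file name are invented for illustration, and items 4 and 5 are not something this sketch attempts.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path):
    """Hypothetical helper: send fault tracebacks to the file at `path`."""
    # faulthandler writes through the file descriptor, so the file must
    # stay open for as long as the handler is enabled.
    log = open(path, "w+")
    faulthandler.enable(file=log, all_threads=True)   # items 2 and 3
    return log


def disable_crash_log(log):
    """Hypothetical helper: switch the handler off again and close the log."""
    faulthandler.disable()                            # item 1
    log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "crash.log")
    log = enable_crash_log(path)
    print(faulthandler.is_enabled())
    # dump_traceback() writes the same kind of report on demand, which
    # lets us look at the format without actually crashing.
    faulthandler.dump_traceback(file=log)
    log.seek(0)
    print(log.read())
    disable_crash_log(log)
```

The report is written with low-level calls on the file descriptor rather than through Python-level I/O, which matches the concern raised earlier in the thread about not invoking Python while handling the fault.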

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From mal at egenix.com  Thu Dec 11 21:00:43 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 11 Dec 2008 21:00:43 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <aac2c7cb0812111015y2705ec9fp387c77f033ad6da0@mail.gmail.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>	<1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com>	<ghp3qu$49e$2@ger.gmane.org>	<200812111034.24319.victor.stinner@haypocalc.com>
	<aac2c7cb0812111015y2705ec9fp387c77f033ad6da0@mail.gmail.com>
Message-ID: <4941716B.6030401@egenix.com>

On 2008-12-11 19:15, Adam Olsen wrote:
> On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
>> On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote:
>>>>> Recover after a segfault is dangerous, but my first goal was to get the
>>>>> Python backtrace instead just one line: "Segmentation fault". It helps a
>>>>> lot for debug!
>>>> Exactly! That's why it doesn't belong in the Python core. We can't
>>>> guarantee anything about its effects or encourage it.
>>> Would it be safe to catch SIGSEGV, output a trace, and then exit?
>>> IE, make the 'first goal' the only goal?
>> Oh yeah, good idea :-) Does it mean that Python interpreter can't be used to
>> display the trace? It would be nice to -at least- use the Python stderr
>> (which is written in pure Python for Python3). It would be better if the user
>> can setup a callback, like sys.excepthook. But if -as many people wrote-
>> Python is totally broken after a segfault, it is maybe not a good idea :-)
> 
> You have to use the low-level stderr, nothing that invokes Python.
> I'd hate to get a second segfault while printing the first.
> 
> Just think about how indirect refcounting bugs tend to be.  Another
> example is messing up GIL handling.  There's heaps of things for which
> we'd want good stack traces, which can't be done from Python.

Experience with mx.Tools.safecall() shows that there's a lot you can
still do after a segfault in some library, including print the
traceback in Python, so things are not as bad.

However, I'd disable such functionality in Python per default,
if it should ever get introduced. This has got to stay an expert
option, unless we want to risk messing up user systems completely,
e.g. by having some logging manager unintentionally overwrite
important files on the disk.

-- 
Marc-Andre Lemburg
eGenix.com


From martin at v.loewis.de  Thu Dec 11 21:03:03 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 11 Dec 2008 21:03:03 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <49410517.1030601@gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
	<49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com>
Message-ID: <494171F7.7050208@v.loewis.de>

> I believe that's difficult when you previously merged from the trunk to
> the py3k branch - the merged change to the svnmerge related properties
> on the root directory gets in the way when svnmerge attempts to update
> them on the maintenance branch.
> 
> That's what started this thread, and so far nobody has come up with a
> workaround.

The work-around is fairly straight-forward:

- inspect the conflict file (I forgot its name - something like
  dir-props), and verify that the only conflict is in the missing
  merge info from trunk to py3k
- svn resolved .

> It seems to me that svnmerge.py should just be able to do a
> svn revert on the affected properties in the maintenance branch before
> it attempts to modify them, but my svn-fu isn't strong enough for me to
> say that for sure.

See above. svnmerge overwrites the property after it has conflicted,
so the only additional action to take is to declare that a resolution.
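
The two steps above can be written down as a small script. This is only a sketch: the helper name resolve_prop_conflict_commands and the revision number are made up, and dir_conflicts.prej is an assumption for the conflict file whose name is not recalled above. The function only builds the commands, so they can be reviewed before anything is run.

```python
def resolve_prop_conflict_commands(revision, branch_root="."):
    """Return the shell steps for the work-around, without running them."""
    return [
        # Merge the revision; svnmerge reports a property conflict on the root.
        ["svnmerge.py", "merge", "-r", str(revision)],
        # Inspect the conflict record by hand (the file name is an assumption).
        ["cat", "dir_conflicts.prej"],
        # Declare the root directory's property conflict resolved.
        ["svn", "resolved", branch_root],
    ]


if __name__ == "__main__":
    # 12345 is a placeholder revision, not one from this thread.
    for cmd in resolve_prop_conflict_commands(12345):
        print(" ".join(cmd))
```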

Regards,
Martin

From martin at v.loewis.de  Thu Dec 11 21:05:39 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 11 Dec 2008 21:05:39 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <B5342F9C-6344-4390-AA07-91945A82AF3B@solarsail.hcs.harvard.edu>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
	<B5342F9C-6344-4390-AA07-91945A82AF3B@solarsail.hcs.harvard.edu>
Message-ID: <49417293.50506@v.loewis.de>

> On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote:
>> Several people already said (essentially) that: -1. I don't think such
>> code should be added to the Python core, no matter how smart or correct
>> it is.
> 
> 
> does your -1 apply only to attempts to resume execution after SIGSEGV,
> or also to the idea of dumping the stack and immediately exiting? The
> former strikes me as crazy talk, while the latter is genuinely useful.

Only to the former. If it is actually possible to print a stack trace,
that could be useful indeed. I'm then skeptical that this is possible
in the general case (i.e. displaying the full C stack), but displaying
(parts of) the Python stack might be possible. I think it should still
proceed to dump core, so that you can then inspect the core with a
proper debugger.

Regards,
Martin

From martin at v.loewis.de  Thu Dec 11 21:10:07 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 11 Dec 2008 21:10:07 +0100
Subject: [Python-Dev] Merging flow
In-Reply-To: <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>	
	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>	
	<49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com>
	<5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com>
Message-ID: <4941739F.6020701@v.loewis.de>

> Yeah, that's why I asked. I tried what Martin suggested with r67698 by
> just saying I'd resolved the conflict, which added the single revision
> I was merging from to the svnmerge-integrated property. It didn't add
> the two original revisions. 

Can you elaborate? What are the "two original revisions" it didn't add?

If you are referring to the trunk revisions - that's fine. As far
as svnmerge is concerned, we merge revisions from the 3k branch
to the 3.0 maintenance branch. The original revisions don't exist
on the 3k branch (they have an empty changeset), so it's not a
problem that they didn't get recorded as merged.

Regards,
Martin

From jyasskin at gmail.com  Thu Dec 11 21:33:09 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 11 Dec 2008 12:33:09 -0800
Subject: [Python-Dev] Merging flow
In-Reply-To: <4941739F.6020701@v.loewis.de>
References: <gh8s08$p9r$1@ger.gmane.org>
	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
	<49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com>
	<5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com>
	<4941739F.6020701@v.loewis.de>
Message-ID: <5d44f72f0812111233p2cae1249n31e8ddab857c1e03@mail.gmail.com>

On Thu, Dec 11, 2008 at 12:10 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Yeah, that's why I asked. I tried what Martin suggested with r67698 by
>> just saying I'd resolved the conflict, which added the single revision
>> I was merging from to the svnmerge-integrated property. It didn't add
>> the two original revisions.
>
> Can you elaborate? What are the "two original revisions" it didn't add?
>
> If you are referring to the trunk revisions - that's fine. As far
> as svnmerge is concerned, we merge revisions from the 3k branch
> to the 3.0 maintenance branch. The original revisions don't exist
> on the 3k branch (they have an empty changeset), so it's not a
> problem that they didn't get recorded as merged.

Yes, I was referring to the trunk revisions. Sounds like this (marking
the conflicting property as resolved without changing it) is the way
to go then. Thanks!

From ncoghlan at gmail.com  Thu Dec 11 21:39:29 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Dec 2008 06:39:29 +1000
Subject: [Python-Dev] Merging flow
In-Reply-To: <494171F7.7050208@v.loewis.de>
References: <gh8s08$p9r$1@ger.gmane.org>	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>
	<49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com>
	<494171F7.7050208@v.loewis.de>
Message-ID: <49417A81.8050505@gmail.com>

Martin v. Löwis wrote:
>> I believe that's difficult when you previously merged from the trunk to
>> the py3k branch - the merged change to the svnmerge related properties
>> on the root directory gets in the way when svnmerge attempts to update
>> them on the maintenance branch.
>>
>> That's what started this thread, and so far nobody has come up with a
>> workaround.
> 
> The work-around is fairly straight-forward:
> 
> - inspect the conflict file (I forgot its name - something like
>   dir-props), and verify that the only conflict is in the missing
>   merge info from trunk to py3k
> - svn resolved .

Ah, that's the missing piece of info - thanks :)

This should probably go in the dev FAQ somewhere though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From eric at trueblade.com  Thu Dec 11 21:45:09 2008
From: eric at trueblade.com (Eric Smith)
Date: Thu, 11 Dec 2008 15:45:09 -0500
Subject: [Python-Dev] Merging flow
In-Reply-To: <49417A81.8050505@gmail.com>
References: <gh8s08$p9r$1@ger.gmane.org>	<5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com>	<49406558.7020005@v.loewis.de>
	<49410517.1030601@gmail.com>	<494171F7.7050208@v.loewis.de>
	<49417A81.8050505@gmail.com>
Message-ID: <49417BD5.10109@trueblade.com>

Nick Coghlan wrote:
> Martin v. Löwis wrote:
>>> I believe that's difficult when you previously merged from the trunk to
>>> the py3k branch - the merged change to the svnmerge related properties
>>> on the root directory gets in the way when svnmerge attempts to update
>>> them on the maintenance branch.
>>>
>>> That's what started this thread, and so far nobody has come up with a
>>> workaround.
>> The work-around is fairly straight-forward:
>>
>> - inspect the conflict file (I forgot its name - something like
>>   dir-props), and verify that the only conflict is in the missing
>>   merge info from trunk to py3k
>> - svn resolved .
> 
> Ah, that's the missing piece of info - thanks :)
> 
> This should probably go in the dev FAQ somewhere though.

Indeed! Preferably with an example, if someone who understands it has
the time. I have some changes I've been holding off on checking in until
I see how someone else handles this.

Eric.

From martin at v.loewis.de  Thu Dec 11 22:02:16 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 11 Dec 2008 22:02:16 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <200812111519.07899.victor.stinner@haypocalc.com>
References: <200812101206.49316.victor.stinner@haypocalc.com>	<fb73205e0812102321g50dd6ee0l8aeb61486bfed3df@mail.gmail.com>	<18753.3615.21624.999357@montanaro-dyndns-org.local>
	<200812111519.07899.victor.stinner@haypocalc.com>
Message-ID: <49417FD8.7050307@v.loewis.de>

>> The Python distribution comes with a Misc/gdbinit file
> 
> Hum, do you really run *all* programs in gdb? Most of the time, you don't 
> expect a crash (because you trust your software). You will have to try to
> reproduce the crash, but sometimes it's very hard (eg. Heisenbugs!).

You don't have to run the program in gdb. You can also use the core dump
that the operating system will generate, and study the crash after it
happened.
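
Whether the operating system actually writes a core file depends on the process's core-size limit, which many systems set to zero by default. A minimal sketch of raising it from within Python, assuming a Unix-like platform where the standard resource module is available; the helper name allow_core_dumps is illustrative:

```python
import resource


def allow_core_dumps():
    """Raise the soft core-file size limit to the hard limit (Unix only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A process may always raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(allow_core_dumps())
```

If the hard limit is itself zero, this changes nothing and the limit has to be raised outside the process. The name and location of the resulting core file are system-dependent.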

Regards,
Martin

From stephen at xemacs.org  Fri Dec 12 02:55:52 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 12 Dec 2008 10:55:52 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <494103FD.5000101@holdenweb.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com>
Message-ID: <871vwe9mxj.fsf@xemacs.org>

Steve Holden writes:
 > Ulrich Eckhardt writes:

 > > What I'd just like some feedback on is the approach to return a
 > > distinct type (neither a byte string nor a Unicode string) from
 > > readdir().

This is presumably unacceptable on the grounds that it will break
existing code that does something more or less useful more or less
some of the time.<wink>

 > If you know what your filesystem produces, you can take the appropriate
 > action to convert it into a type that makes sense to the user.

Unfortunately, even programmers experienced in I18N like Martin, and
those with intuition-that-has-the-force-of-law<wink> like Guido,
express deliberate disbelief on this point.  They say that filesystem
names and environment variable values are text, which is true from the
semantic viewpoint but can't be fully supported by any implementation.

The implementation issue is why you want bytes, but I don't think it
is going to overcome the tide of (semantically-oriented) pragmatism.

 > If you don't, then at least if you have the string in its bytes
 > form you can re-present it to the filesystem to manipulate the
 > file. What are we supposed to do with the "special type"?

Trivially convert it back to bytes and re-present it to the
filesystem, of course.

I gather that the BDFL's line on this thread of discussion is that
forcing programmers to think about encodings every time they call out
to the OS is unacceptable when most programs will work acceptably
almost all of the time with a rather naive approach.  This means that
almost all Python programs will be technically broken for the
foreseeable future, sorry, Ulrich.

And for the same pragmatic reasons, these functions are going to
return strings (ie, Unicode), not bytes, I expect.  Sorry, Steve.

What needs to be determined here is the best way to provide
reliability to those who will go to the effort of asking for it if
it's available.  I don't think "just return bytes" fits the bill for
the reason above.

What I would like to see is a type that is derived from string (so if
you present it to an API expecting string, it is silently treated as
string), but from which the original bytes can always be extracted on
request.  If the original bytes cannot be sensibly decoded to a
string, then the string field in the object would either contain
something that should normally cause an error in a string API, or some
made-up string (presumably it would attempt to be a more or less
faithful representation of the bytes) at the caller's option.
Probably they'd also contain some metadata useful in guessing
encodings (the read time locale in particular).

These objects probably shouldn't support string-like operations in a
general way (ie, maintaining both the string representation and the
bytes "correctly").  Rather, using "proper" string operations on them
would use the string content and produce strings.  People who really
want to handle mixed-encoding pathnames and the like would have to
keep collections of these objects and handle them in an ad-hoc way.

Unfortunately, implementing this is way beyond my skills and time
availability.
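
For illustration only, the core of the type described above might look like the sketch below. It is a rough approximation under stated assumptions, not a worked-out design: the class name OSString, its attributes, and the choice of U+FFFD replacement for the "made-up string" are all hypothetical, and the locale metadata is reduced to a single encoding name.

```python
class OSString(str):
    """A str that remembers the bytes it was decoded from (hypothetical)."""

    def __new__(cls, raw, encoding="utf-8"):
        try:
            text = raw.decode(encoding)
            clean = True
        except UnicodeDecodeError:
            # "Made-up string" option: undecodable bytes become U+FFFD.
            text = raw.decode(encoding, errors="replace")
            clean = False
        self = super().__new__(cls, text)
        self.raw = raw                # original bytes, always recoverable
        self.encoding = encoding      # metadata for later encoding guesses
        self.decoded_cleanly = clean
        return self


if __name__ == "__main__":
    name = OSString(b"caf\xe9.txt")   # Latin-1 bytes read as UTF-8
    print(repr(str(name)), name.raw, name.decoded_cleanly)
    print(type(name.upper()))         # ordinary str operations return str
```

Because str methods such as upper() return plain str, the bytes are dropped as soon as the value is manipulated, which is the "use the string content and produce strings" behaviour suggested above.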

From sturla at molden.no  Fri Dec 12 02:13:13 2008
From: sturla at molden.no (Sturla Molden)
Date: Fri, 12 Dec 2008 02:13:13 +0100 (CET)
Subject: [Python-Dev] The endless GIL debate: why not remove thread support
	instead?
Message-ID: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>

Last month there was a discussion on Python-Dev regarding removal of
reference counting to remove the GIL. I hope you forgive me for continuing
the debate.

I think reference counting is a good feature. It prevents huge piles of
garbage from building up. It makes the interpreter run more smoothly. It
is not just important for games and multimedia applications, but also
servers under high load. Python does not pause to look for garbage like
Java or .NET. It only pauses to look for dead reference cycles. This can
be safely turned off temporarily; it can be turned off completely if you
do not create reference cycles. With Java and .NET, no garbage is ever
reclaimed except by the intermittent garbage collection. Python always
reclaims an object when the reference count drops to zero, whether the GC
is enabled or not. This makes Python programs well-behaved. For this
reason, I think removing reference counting is a genuinely bad idea. Even
if the GIL is evil, this remedy is even worse.
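
The distinction between reference counting and the cycle collector can be checked directly. The sketch below assumes CPython (other implementations manage memory differently); the class name Node and the function name are illustrative.

```python
import gc
import weakref


class Node:
    """Plain object that can hold a reference to itself."""


def demo_refcount_vs_cycle_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # turns off the cycle detector only
    try:
        # Acyclic garbage: freed the moment the last reference goes away.
        obj = Node()
        probe = weakref.ref(obj)
        del obj
        acyclic_freed = probe() is None

        # A reference cycle keeps its own count above zero.
        loop = Node()
        loop.me = loop
        loop_probe = weakref.ref(loop)
        del loop
        cycle_survives = loop_probe() is not None

        gc.collect()  # an explicit pass still works while disabled
        cycle_freed = loop_probe() is None
    finally:
        if was_enabled:
            gc.enable()
    return acyclic_freed, cycle_survives, cycle_freed


if __name__ == "__main__":
    print(demo_refcount_vs_cycle_gc())
```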

I am not a Python core developer; I am a research scientist who uses Python
because Matlab is (or used to be) a bad programming language, albeit a
good computing environment. As most people who have worked with scientific
computing know, there are better paradigms for concurrency than threads.
In particular, there are message-passing systems like MPI and Erlang, and
there are autovectorizing compilers for OpenMP and Fortran 90/95. There
are special LAPACK, BLAS and FFT libraries for parallel computer
architectures. There are fork-join systems like cilk and
java.util.concurrent. Threads seem to be used only because mediocre
programmers don't know what else to use.

I genuinely think the use of threads should be discouraged. It leads to
code that is full of bugs and difficult to maintain - race conditions,
deadlocks, and livelocks are common pitfalls. Very few developers are
capable of implementing efficient load-balancing by hand. Multi-threaded
programs tend to scale badly because they are badly written. If the GIL
discourages the abuse of threads, it serves a purpose albeit being evil
like the Linux kernel's BKL.

Python could be better off doing what tcl does. Allow each process to
embed multiple interpreters; run each interpreter in its own thread.
Implement a fast message-passing system between the interpreters (e.g.
copy-on-write by making communicated objects immutable), and Python would
be closer to Erlang than Java.

I thus think the main offender is the thread and threading modules - not
the GIL. Without thread support in the interpreter, there would be no
threads. Without threads, there would be no need for a GIL. Both sources
of evil can be removed by just removing thread support from the Python
interpreter. In addition, it would make Python faster at executing linear
code. Just copy the concurrency model of Erlang instead of Java and get
rid of those nasty threads. In the meantime, I'll continue to experiment
with multiprocessing.

Removing reference counting to encourage the use of threads is like
shooting ourselves in the leg twice. That's my two cents on this issue.

There is another issue to note as well: If you can endure a 200x loss of
efficiency by using Python instead of Fortran, scalability on dual or
quad-core processors may not be that important. Just move the bottlenecks
out of Python and you are much better off.


Regards,
Sturla Molden



From rhamph at gmail.com  Fri Dec 12 06:22:37 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 22:22:37 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <871vwe9mxj.fsf@xemacs.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
Message-ID: <aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>

On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Unfortunately, even programmers experienced in I18N like Martin, and
> those with intuition-that-has-the-force-of-law<wink> like Guido,
> express deliberate disbelief on this point.  They say that filesystem
> names and environment variable values are text, which is true from the
> semantic viewpoint but can't be fully supported by any implementation.

With all the focus on backup tools and file managers I think we've
lost perspective.  They're an important use case, but hardly the
dominant one.

Please, as a user, if your app is creating new files, do NOT use
bytes!  You have no excuse for creating garbage, and garbage doesn't
help the user any.  Get the encoding right, use the unicode APIs,
and don't pass the buck on to everything else.

The fact that unicode is easier is a bonus for doing the right thing.
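
A short sketch of what using the unicode APIs looks like in Python 3: the name is passed as str and the interpreter encodes it with the filesystem encoding. The file name and the helper name create_with_text_name are illustrative, and the example assumes the filesystem encoding can represent the name (true for UTF-8 setups).

```python
import os
import tempfile


def create_with_text_name(name="日本語.txt"):
    """Create a file by str name; list the directory as str and as bytes."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, name), "w", encoding="utf-8") as f:
            f.write("hello\n")
        as_text = os.listdir(d)                 # str in, str out
        as_bytes = os.listdir(os.fsencode(d))   # bytes in, bytes out
    return as_text, as_bytes


if __name__ == "__main__":
    print(create_with_text_name())
```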

-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Fri Dec 12 06:41:57 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 11 Dec 2008 21:41:57 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<200812101139.37301.eckhardt@satorlaser.com>	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>	<200812111019.16950.eckhardt@satorlaser.com>	<494103FD.5000101@holdenweb.com>
	<871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
Message-ID: <4941F9A5.5040704@gmail.com>

Adam Olsen wrote:
> On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> Unfortunately, even programmers experienced in I18N like Martin, and
>> those with intuition-that-has-the-force-of-law<wink> like Guido,
>> express deliberate disbelief on this point.  They say that filesystem
>> names and environment variable values are text, which is true from the
>> semantic viewpoint but can't be fully supported by any implementation.
> 
> With all the focus on backup tools and file managers I think we've
> lost perspective.  They're an important use case, but hardly the
> dominant one.
> 
> Please, as a user, if your app is creating new files, do NOT use
> bytes!  You have no excuse for creating garbage, and garbage doesn't
> help the user any.  Get the encoding right, use the unicode APIs,
> and don't pass the buck on to everything else.
> 
Uhmmm.... That's good advice but doesn't solve any problems :-(.  No
matter what I create, the filenames will be bytes when the next person
reads them in.  If my locale is Shift-JIS and the person I'm sharing the
file with uses utf-8, things won't work.  Even if my locale is utf-8
(since I come from a European nation) and their locale is utf-16
(because they're from an Asian nation) the Unicode API won't work.
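
The mismatch described above is easy to reproduce without touching a filesystem. A sketch with an arbitrary example name: bytes produced under a Shift-JIS locale are not valid UTF-8, so a reader in a UTF-8 locale cannot decode them. The function name is illustrative.

```python
def decode_under_other_locale(name="日本語.txt"):
    """Encode as the creator's locale would, decode as the reader's would."""
    raw = name.encode("shift_jis")       # what the creator's locale writes
    try:
        return raw, raw.decode("utf-8")  # what the reader's locale attempts
    except UnicodeDecodeError:
        return raw, None                 # the name is not valid UTF-8


if __name__ == "__main__":
    print(decode_under_other_locale())
```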

-Toshio


From rhamph at gmail.com  Fri Dec 12 07:19:27 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 23:19:27 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4941F9A5.5040704@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
Message-ID: <aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>

On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> Adam Olsen wrote:
>> On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>>> Unfortunately, even programmers experienced in I18N like Martin, and
>>> those with intuition-that-has-the-force-of-law<wink> like Guido,
>>> express deliberate disbelief on this point.  They say that filesystem
>>> names and environment variable values are text, which is true from the
>>> semantic viewpoint but can't be fully supported by any implementation.
>>
>> With all the focus on backup tools and file managers I think we've
>> lost perspective.  They're an important use case, but hardly the
>> dominant one.
>>
>> Please, as a user, if your app is creating new files, do NOT use
>> bytes!  You have no excuse for creating garbage, and garbage doesn't
>> help the user any.  Get the encoding right, use the unicode APIs,
>> and don't pass the buck on to everything else.
>>
> Uhmmm.... That's good advice but doesn't solve any problems :-(.  No
> matter what I create, the filenames will be bytes when the next person
> reads them in.  If my locale is Shift-JIS and the person I'm sharing the
> file with uses utf-8, things won't work.  Even if my locale is utf-8
> (since I come from a European nation) and their locale is utf-16
> (because they're from an Asian nation) the Unicode API won't work.

So you'll open up the dir and find this collection:

??????.txt
????????.png
???????.html
????????.html
???.png
??????.txt
??????.txt
??????.txt

A half-broken setup is still a broken setup.  Eventually you have to
tell people to stop screwing around and pick one encoding.

I doubt that UTF-16 is used very much (other than on windows).  I
haven't found any statistics on what distros use, but did find this
one of the web itself:
http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html

I can't wait for next year's statistics.

-- 
Adam Olsen, aka Rhamphoryncus

From curt at hagenlocher.org  Fri Dec 12 07:25:08 2008
From: curt at hagenlocher.org (Curt Hagenlocher)
Date: Thu, 11 Dec 2008 22:25:08 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
Message-ID: <d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>

On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen <rhamph at gmail.com> wrote:

>
> I doubt that UTF-16 is used very much (other than on windows).
>

There's this other obscure platform called "Java"... ;)

--
Curt Hagenlocher
curt at hagenlocher.org

From rhamph at gmail.com  Fri Dec 12 07:25:31 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 23:25:31 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
Message-ID: <aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>

On Thu, Dec 11, 2008 at 10:22 PM, Adam Olsen <rhamph at gmail.com> wrote:
> On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> Unfortunately, even programmers experienced in I18N like Martin, and
>> those with intuition-that-has-the-force-of-law<wink> like Guido,
>> express deliberate disbelief on this point.  They say that filesystem
>> names and environment variable values are text, which is true from the
>> semantic viewpoint but can't be fully supported by any implementation.
>
> With all the focus on backup tools and file managers I think we've
> lost perspective.  They're an important use case, but hardly the
> dominant one.
>
> Please, as a user, if your app is creating new files, do NOT use
> bytes!  You have no excuse for creating garbage, and garbage doesn't
> help the user any.  Get the encoding right, use the unicode APIs,
> and don't pass the buck on to everything else.
>
> The fact that unicode is easier is a bonus for doing the right thing.

As a data point, firefox (when pointed at my home dir) DOES skip over
garbage files.


-- 
Adam Olsen, aka Rhamphoryncus

From rhamph at gmail.com  Fri Dec 12 07:26:46 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 11 Dec 2008 23:26:46 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
Message-ID: <aac2c7cb0812112226l38dbe646ya0ff2c76b02995bb@mail.gmail.com>

On Thu, Dec 11, 2008 at 11:25 PM, Curt Hagenlocher <curt at hagenlocher.org> wrote:
> On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen <rhamph at gmail.com> wrote:
>>
>> I doubt that UTF-16 is used very much (other than on windows).
>
> There's this other obscure platform called "Java"... ;)

Sorry, I should have said "for interchange". :)

(CPython doesn't use UTF-8 internally either.  It uses UTF-16 or UTF-32.)


-- 
Adam Olsen, aka Rhamphoryncus

From a.badger at gmail.com  Fri Dec 12 08:16:38 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 11 Dec 2008 23:16:38 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	
	<200812101139.37301.eckhardt@satorlaser.com>	
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>	
	<200812111019.16950.eckhardt@satorlaser.com>	
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>	
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>	
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
Message-ID: <49420FD6.1040901@gmail.com>

Adam Olsen wrote:

> A half-broken setup is still a broken setup.  Eventually you have to
> tell people to stop screwing around and pick one encoding.
> 
But it's not a broken setup.  It's the way the world is because people
share things with each other.

> I doubt that UTF-16 is used very much (other than on windows).  I
> haven't found any statistics on what distros use, but did find this
> one of the web itself:
> http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html
> 
UTF-16 is popular in Asian locales for the same reason that Shift-JIS and
Big5 are hanging in there.  utf-8 takes many more bytes to encode Asian
Unicode characters than utf-16.
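
The size difference can be measured directly. A small sketch with an arbitrary three-character sample; "utf-16-le" is used so that the byte-order mark, which plain "utf-16" would add, does not inflate the count. The function name is illustrative.

```python
def encoded_sizes(text="日本語"):
    """Return the encoded length in bytes of text under several codecs."""
    codecs = ("utf-8", "utf-16-le", "shift_jis")
    return {codec: len(text.encode(codec)) for codec in codecs}


if __name__ == "__main__":
    print(encoded_sizes())
```

For this sample utf-8 needs three bytes per character, while utf-16 and Shift-JIS need two.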

-Toshio


From a.badger at gmail.com  Fri Dec 12 08:33:28 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 11 Dec 2008 23:33:28 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<200812101139.37301.eckhardt@satorlaser.com>	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>	<200812111019.16950.eckhardt@satorlaser.com>	<494103FD.5000101@holdenweb.com>
	<871vwe9mxj.fsf@xemacs.org>	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>
Message-ID: <494213C8.7040809@gmail.com>

Adam Olsen wrote:
> As a data point, firefox (when pointed at my home dir) DOES skip over
> garbage files.
> 
> 
That's not true.  However, it looks like Firefox is actually broken.
Take a look at this screenshot:
  firefox.png

That shows a directory with a folder that's not decodable in my utf-8
locale.  What's interesting to note is that I actually have two
nondecodable folders there but only one of them showed up.  So firefox
is inconsistent with its treatment, rendering some non-decodable files
and ignoring others.
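
What "not decodable in my utf-8 locale" means can be sketched in Python 3
as follows.  This assumes the names arrive as raw bytes (which is what
os.listdir() returns when it is given a bytes path); the helper name
split_decodable and the sample names are hypothetical.

```python
def split_decodable(raw_names, encoding="utf-8"):
    """Partition raw byte names into (decoded text, undecodable bytes)."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the original bytes so the entry can still be opened.
            undecodable.append(raw)
    return decoded, undecodable


raw_names = [
    b"notes.txt",
    "\u00bd\u00f1".encode("utf-8"),    # valid UTF-8
    "\u00bd\u00f1".encode("latin-1"),  # same characters, not valid UTF-8
]
decoded, undecodable = split_decodable(raw_names)
print(decoded, undecodable)
```

A consistent tool would treat every entry in the second list the same way,
whether that means showing an escaped form or hiding them all.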

Also interesting, if you point your browser at:
  http://toshio.fedorapeople.org/u/

You should see two other test files.  They're both
(one-half)(enyei).html but one's encoded in utf-8 and the other in
latin-1.  Firefox has some bugs in it related to this.  For instance, if
you mouseover the two links you'll see that firefox displays the same
symbolic names for each of the files (even though they're in two
different encodings).  Sometimes firefox is able to load both files and
sometimes it only loads one of them.  Firefox seems to be translating
the characters from ASCII percent encoding of bytes into their unicode
symbols and back to utf-8 in some circumstances related to whether it
has the pages in its cache or not.  In this case, it should be leaving
things as percent encoded bytes as it's the only way that apache is
going to know what to retrieve.
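
The ambiguity can be sketched in Python 3 with urllib.parse.  This assumes
the two characters in the file name are U+00BD (one-half) and U+00F1 (enye);
the variable names are illustrative only.

```python
from urllib.parse import quote, unquote_to_bytes

name = "\u00bd\u00f1.html"  # assumed file name, same text in both cases

# The same text yields different percent-encoded paths per encoding.
utf8_url = quote(name.encode("utf-8"))
latin1_url = quote(name.encode("latin-1"))

# Decoding the escapes back to bytes is lossless ...
latin1_bytes = unquote_to_bytes(latin1_url)
# ... but re-encoding the displayed text as UTF-8 asks for another file.
reencoded_url = quote(latin1_bytes.decode("latin-1").encode("utf-8"))

print(utf8_url, latin1_url, reencoded_url)
```

Once the latin-1 path has been turned into display text and written back as
UTF-8, the request names the UTF-8 file instead, which would match the
behaviour described above.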

-Toshio


From rhamph at gmail.com  Fri Dec 12 09:00:26 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 12 Dec 2008 01:00:26 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <494213C8.7040809@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>
	<494213C8.7040809@gmail.com>
Message-ID: <aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>

On Fri, Dec 12, 2008 at 12:33 AM, Toshio Kuratomi <a.badger at gmail.com> wrote:
> Adam Olsen wrote:
>> As a data point, firefox (when pointed at my home dir) DOES skip over
>> garbage files.
>>
>>
> That's not true.  However, it looks like Firefox is actually broken.
> Take a look at this screenshot:
>  firefox.png
>
> That shows a directory with a folder that's not decodable in my utf-8
> locale.  What's interesting to note is that I actually have two
> nondecodable folders there but only one of them showed up.  So firefox
> is inconsistent with its treatment, rendering some non-decodable files
> and ignoring others.
>
> Also interesting, if you point your browser at:
>  http://toshio.fedorapeople.org/u/
>
> You should see two other test files.  They're both
> (one-half)(enyei).html but one's encoded in utf-8 and the other in
> latin-1.  Firefox has some bugs in it related to this.  For instance, if
> you mouseover the two links you'll see that firefox displays the same
> symbolic names for each of the files (even though they're in two
> different encodings).  Sometimes firefox is able to load both files and
> sometimes it only loads one of them.  Firefox seems to be translating
> the characters from ASCII percent encoding of bytes into their unicode
> symbols and back to utf-8 in some circumstances related to whether it
> has the pages in its cache or not.  In this case, it should be leaving
> things as percent encoded bytes as it's the only way that apache is
> going to know what to retrieve.

UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
the browser has to display the percent escapes in the address bar,
rather than the intended text.

IOW, inconsistent behaviour is a bug, but translating into UTF-8 is not. ;)


-- 
Adam Olsen, aka Rhamphoryncus

From eckhardt at satorlaser.com  Fri Dec 12 09:19:05 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 12 Dec 2008 09:19:05 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <ghrjma$h4f$1@ger.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812111441.46739.eckhardt@satorlaser.com> 
	<ghrjma$h4f$1@ger.gmane.org>
Message-ID: <200812120919.05389.eckhardt@satorlaser.com>

On Thursday 11 December 2008, Steve Holden wrote:
> Ulrich Eckhardt wrote:
> > If readdir() returned Unicode text, people would start taking that for
> > granted. If it returned bytes, just the same. Returning a completely
> > unrelated type will give them enough hint that for this thing they have
> > to rethink their assumptions. This runs along the lines of "In the face
> > of ambiguity, refuse the temptation to guess.", as it makes guessing
> > rather impossible.
>
> So you are suggesting this "special object" be used only to represent
> files to users? Now I understand.

Not only files, the same problem crops up when handling sys.argv and 
os.environ.

> > I just don't see a case where using a separate path class would break
> > things. Further, the special handling that is required would be made even
> > clearer by using such a class.
>
> But it does have to be implemented ...

Well, it isn't really terribly difficult to do so, after all it's just a 
container for either a byte string or Unicode string plus some helper code to 
convert it to/from Unicode.
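
A minimal sketch of such a container in Python 3.  The class name EnvString
and its method names are hypothetical, chosen only for illustration; the
point is that every conversion is explicit and the stored value is never
altered.

```python
class EnvString:
    """Hypothetical container for an OS-provided string (bytes or text)."""

    def __init__(self, value):
        if not isinstance(value, (bytes, str)):
            raise TypeError("expected bytes or str")
        self._value = value  # stored exactly as the OS delivered it

    def to_unicode(self, encoding="utf-8", errors="strict"):
        """Return text; raises UnicodeDecodeError if strict decoding fails."""
        if isinstance(self._value, str):
            return self._value
        return self._value.decode(encoding, errors)

    def to_bytes(self, encoding="utf-8"):
        """Return the raw bytes, encoding only if text was stored."""
        if isinstance(self._value, bytes):
            return self._value
        return self._value.encode(encoding)


good = EnvString(b"caf\xc3\xa9")
bad = EnvString(b"\xbd\xf1")
print(good.to_unicode(), bad.to_bytes())
```

Because the object is neither bytes nor str, code that assumes one or the
other fails immediately instead of guessing.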

Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published, or otherwise used.
E-mails can be read by third parties and may contain viruses and unauthorized changes. Sator Laser GmbH is not responsible for these consequences.

**************************************************************************************


From stefan_ml at behnel.de  Fri Dec 12 09:35:25 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 12 Dec 2008 09:35:25 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
Message-ID: <ght7od$vaa$1@ger.gmane.org>

Hi,

replying to the topic only: because many C libraries support threading and
Python extension modules can integrate them in a way that allows
concurrency in a safe way (although 'safe' is definitely something that is
paid for in developer days).

Stefan


From eckhardt at satorlaser.com  Fri Dec 12 09:31:16 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 12 Dec 2008 09:31:16 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812111004t56cd6d0fxcdb5877299309b8a@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812111441.46739.eckhardt@satorlaser.com> 
	<aac2c7cb0812111004t56cd6d0fxcdb5877299309b8a@mail.gmail.com>
Message-ID: <200812120931.16231.eckhardt@satorlaser.com>

On Thursday 11 December 2008, Adam Olsen wrote:
> The simplest solution there is to have windows bytes APIs that return
> raw UTF-16 bytes (note that on windows these are NOT guaranteed to be valid
> unicode, despite that being much more likely than on linux).

Actually, I'm not aware of this case. I only know that the OS refuses to mount 
media it can't decode, but that is on the OS-level. Can you give me a hint?

> The only real issue I see is that UTF-16 isn't an ASCII superset, so it
> won't print nicely.

True, but I personally couldn't care less. Actually, I would even prefer if 
printing a byte string always produced \x escaped byte values, that way it 
would at least be consistent. 

> In other words, bytes can be your special type.

That would actually be a lot of work to do, but I do agree that it would be a 
way. 

The problem though is that I have seen quite a few places in Python where such 
a byte string is passed as 'char*' and treated with the assumption that 
strlen() would yield a meaningful value there, so this calls at least for a 
distinct 'Py_Byte' type. Also, this still doesn't even remotely handle the 
problem that you do have two valid encodings on win32, even though the MBCS 
one could be called deprecated. People will try to interface to other 
libraries that use win32 CHAR strings and that will be much harder or even 
impossible. Further, and that is IMHO the worst part of it, things will fail 
too silently and programmers aren't encouraged to write portable code, but 
maybe I'm just too pessimistic.

Uli


From stephen at xemacs.org  Fri Dec 12 09:57:20 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 12 Dec 2008 17:57:20 +0900
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4941F9A5.5040704@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
Message-ID: <87oczh93f3.fsf@xemacs.org>

Toshio Kuratomi writes:
 > Adam Olsen wrote:
 > > On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > >> Unfortunately, even programmers experienced in I18N like Martin, and
 > >> those with intuition-that-has-the-force-of-law<wink> like Guido,
 > >> express deliberate disbelief on this point.  They say that filesystem
 > >> names and environment variable values are text, which is true from the
 > >> semantic viewpoint but can't be fully supported by any implementation.
 > > 
 > > With all the focus on backup tools and file managers I think we've
 > > lost perspective.  They're an important use case, but hardly the
 > > dominant one.

True.

 > > Please, as a user, if your app is creating new files, do NOT use
 > > bytes!  You have no excuse for creating garbage, and garbage doesn't
 > > help the user any.  Get the encoding right, use the unicode APIs,
 > > and don't pass the buck on to everything else.
 > > 
 > Uhmmm.... That's good advice but doesn't solve any problems :-(.

Exactly.  Furthermore, the problems *already exist*.  My current
locale is UTF-8 and all files dated since about 2002 have UTF-8 names,
*except* in my MIME-bodies garbage can, where only recently have I got
around to coercing my MUA to doing the right thing.  And of course
there are still legacy files names in EUC-JP, which I suppose I could
search for but since I only access a directory containing one once in
a pale blue moon, I'm not gonna bother.

It's just not reasonable to expect users or even sysadmins to go
around cleaning up legacy data.

From nd at perlig.de  Fri Dec 12 10:11:09 2008
From: nd at perlig.de (André Malo)
Date: Fri, 12 Dec 2008 10:11:09 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<494213C8.7040809@gmail.com>
	<aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>
Message-ID: <200812121011.09427.nd@perlig.de>

* Adam Olsen wrote: 

> UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
> the browser has to display the percent escapes in the address bar,
> rather than the intended text.

Duh! The address bar should contain the URL, which *is* the intended text. 
The escapes are there for a reason. If I pass some octets using percent 
escapes via the query string or request body, it's not text, not even 
intended. It's still a collection of octets. Translating them back (and 
forth when I press enter in the address bar) is a pretty ambiguous 
operation and therefore pretty wrong.

The de facto standard does not exist. There's a real one instead: RFC 2396.

nd

From rhamph at gmail.com  Fri Dec 12 10:12:26 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 12 Dec 2008 02:12:26 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812120931.16231.eckhardt@satorlaser.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812111441.46739.eckhardt@satorlaser.com>
	<aac2c7cb0812111004t56cd6d0fxcdb5877299309b8a@mail.gmail.com>
	<200812120931.16231.eckhardt@satorlaser.com>
Message-ID: <aac2c7cb0812120112rec02ecdjd9436801c28568e@mail.gmail.com>

On Fri, Dec 12, 2008 at 1:31 AM, Ulrich Eckhardt
<eckhardt at satorlaser.com> wrote:
> On Thursday 11 December 2008, Adam Olsen wrote:
>> The simplest solution there is to have windows bytes APIs that return
>> raw UTF-16 bytes (note that on windows these are NOT guaranteed to be valid
>> unicode, despite that being much more likely than on linux).
>
> Actually, I'm not aware of this case. I only know that the OS refuses to mount
> media it can't decode, but that is on the OS-level. Can you give me a hint?

Only pages like this, which indicate the underlying API is an array of WCHAR:

http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx


>> The only real issue I see is that UTF-16 isn't an ASCII superset, so it
>> won't print nicely.
>
> True, but I personally couldn't care less. Actually, I would even prefer if
> printing a byte string always produced \x escaped byte values, that way it
> would at least be consistent.
>
>> In other words, bytes can be your special type.
>
> That would actually be a lot of work to do, but I do agree that it would be a
> way.
>
> The problem though is that I have seen quite a few places in Python where such
> a byte string is passed as 'char*' and treated with the assumption that
> strlen() would yield a meaningful value there, so this calls at least for a
> distinct 'Py_Byte' type. Also, this still doesn't even remotely handle the
> problem that you do have two valid encodings on win32, even though the MBCS
> one could be called deprecated. People will try to interface to other
> libraries that use win32 CHAR strings and that will be much harder or even
> impossible. Further, and that is IMHO the worst part of it, things will fail
> too silently and programmers aren't encouraged to write portable code, but
> maybe I'm just too pessimistic.

char * is just fine.  You need only pass a length along with it.  All
internal APIs *must* already do this, as they support nul bytes.  Also
note that the underlying POSIX APIs prohibit nul bytes in filenames,
so it's irrelevant for them.
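
The difference between strlen() semantics and an explicit length can be seen
from Python 3 with ctypes.  A minimal sketch: the variable names are
illustrative, and only create_string_buffer, .value and .raw from the
standard ctypes module are used.

```python
import ctypes

data = b"abc\x00def"                     # bytes with an embedded NUL
buf = ctypes.create_string_buffer(data)  # C char array plus a trailing NUL

# Treating the buffer as a C string stops at the first NUL, like strlen().
c_string_view = buf.value

# Carrying the length alongside the pointer preserves every byte.
full_view = buf.raw[:len(data)]

print(len(c_string_view), len(full_view))
```

The same char array serves both cases; what changes is whether the caller
trusts the terminator or the length it was given.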

If your concern is that people will use MBCS byte strings (produced
how?) in a WCHAR API.. I agree it would be confusing, but not nearly
enough to warrant a special type (which would probably get passed a
MBCS byte string anyway.)

Although I haven't found an official claim that MBCS is deprecated, I
see no reason why it wouldn't be effectively obsoleted by the UTF-16
APIs.  (Certain outdated APIs may be the exception.)  We could have a
way to convert (locale-dependent codec?), but that's as much as we
should care.

-- 
Adam Olsen, aka Rhamphoryncus

From eckhardt at satorlaser.com  Fri Dec 12 10:10:13 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 12 Dec 2008 10:10:13 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <871vwe9mxj.fsf@xemacs.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
Message-ID: <200812121010.13157.eckhardt@satorlaser.com>

On Friday 12 December 2008, Stephen J. Turnbull wrote:
> I gather that the BFDL's line on this thread of discussion is that
> forcing programmers to think about encodings every time they call out
> to the OS is unacceptable

Exactly that is not necessary.

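  for n in os.readdir('.'):      # hypothetical call returning opaque path objects
      f = open(n)                # the name is handed back to the OS untouched
      if grep('foo', f):         # grep() stands for any search over the contents
          print('found "foo"!')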

Now, if you actually wanted to output the filename, you could never do so 
reliably anyway, because even though it is supposed to be text, the encoding 
isn't known. So, an archiving program will probably do something like this:

   try:
       for n in os.readdir():
           b = n.encode('UTF-8')    # explicit conversion, fails if undecodable
           f = open(n)
           archive.write_file_header(b)
           archive.write_file(f)
   except UnicodeError:
       print("oops, couldn't decode file '%s'" % n.unicode(error='replace'))

If you're writing a filemanager, you would store the path alongside an 
approximated Unicode representation.


> when most programs will work acceptably 
> almost all of the time with a rather naive approach.  This means that
> almost all Python programs will be technically broken for the
> foreseeable future, sorry, Ulrich.

Actually, they are already broken, only that few people notice it. :|

> And for the same pragmatic reasons, these functions are going to
> return strings (ie, Unicode), not bytes, I expect.  Sorry, Steve.
>
> What needs to be determined here is the best way to provide
> reliability to those who will go to the effort of asking for it if
> it's available.  I don't think "just return bytes" fits the bill for
> the reason above.
>
> What I would like to see is a type that is derived from string (so if
> you present it to an API expecting string, it is silently treated as
> string), but from which the original bytes can always be extracted on
> request.

I like that idea, this type would behave pretty much like the env_string I 
proposed. The main difference is that it does several implicit conversions 
where I personally would rather see explicit conversions. Other than that, 
I'm all for it.
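
A rough sketch of the str-derived variant in Python 3, for contrast with an
explicit container.  The class name OSString and the attributes raw and
lossy are hypothetical; nothing like this exists in the standard library,
and falling back to replacement characters is just one of the options
mentioned in the quoted text.

```python
class OSString(str):
    """Hypothetical str subclass that remembers the bytes it came from."""

    def __new__(cls, raw, encoding="utf-8"):
        # Undecodable bytes become U+FFFD in the text view.
        text = raw.decode(encoding, errors="replace")
        self = super().__new__(cls, text)
        self.raw = raw  # the original bytes, always recoverable
        # True when the text view cannot reproduce the original bytes.
        self.lossy = text.encode(encoding, errors="replace") != raw
        return self


good = OSString(b"caf\xc3\xa9.txt")
bad = OSString(b"\xbd\xf1.html")
print(good, good.lossy, bad.raw, bad.lossy)
```

Any API expecting str accepts these objects silently, which is the implicit
conversion mentioned above.  Note that ordinary string methods such as
upper() return a plain str, so the raw bytes are lost as soon as the value
is transformed.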

> If the original bytes cannot be sensibly decoded to a 
> string, then the string field in the object would either contain
> something that should normally cause an error in a string API, or some
> made-up string (presumably it would attempt to be a more or less
> faithful representation of the bytes) at the caller's option.
> Probably they'd also contain some metadata useful in guessing
> encodings (the read time locale in particular).

Well, I wouldn't provide an approximation. Considering the archiving software 
above, you would end up with a file name "<undecodable file name>" in an 
archive. For that kind of software, it would be fatal. But, and that is much 
more important than my preference, at least your approach would allow writing 
reliable software that properly handles such environment strings. Further, 
and that is where it differs from just returning bytes, it even makes it easy 
by using a distinct type.

Uli


From rhamph at gmail.com  Fri Dec 12 10:19:14 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 12 Dec 2008 02:19:14 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812121011.09427.nd@perlig.de>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<494213C8.7040809@gmail.com>
	<aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>
	<200812121011.09427.nd@perlig.de>
Message-ID: <aac2c7cb0812120119s44ba4264ne7d2edf112188768@mail.gmail.com>

On Fri, Dec 12, 2008 at 2:11 AM, André Malo <nd at perlig.de> wrote:
> * Adam Olsen wrote:
>
>> UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
>> the browser has to display the percent escapes in the address bar,
>> rather than the intended text.
>
> Duh! The address bar should contain the URL, which *is* the intended text.
> The escapes are there for a reason. If I pass some octets using percent
> escapes via the query string or request body, it's not text, not even
> intended. It's still a collection of octets. Translating them back (and
> forth when I press enter in the address bar) is a pretty ambiguous
> operation and therefore pretty wrong.
>
> The de facto standard does not exist. There's a real one instead: RFC 2396.

All the heaps of people using non-english wikipedia sites might
disagree with you.  There's only, what, a few *million* pages that
would be affected?

It'd be very interesting if someone at Google could provide some
statistics on URL encodings.


-- 
Adam Olsen, aka Rhamphoryncus

From p.f.moore at gmail.com  Fri Dec 12 11:03:14 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 12 Dec 2008 10:03:14 +0000
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
Message-ID: <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>

2008/12/12 Sturla Molden <sturla at molden.no>:
> Last month there was a discussion on Python-Dev regarding removal of
> reference counting to remove the GIL. I hope you forgive me for continuing
> the debate.
[...]
> Python could be better off doing what tcl does. Allow each process to
> embed multiple interpreters; run each interpreter in its own thread.
> Implement a fast message-passing system between the interpreters (e.g.
> copy-on-write by making communicated objects immutable), and Python would
> be closer to Erlang than Java.

Too much to comment individually here, but I'd agree that
message-passing approaches are a better model in general. Some
specific points:

1. The Queue module gives the bones of a message-passing model,
building something based on that is possible now (and may already
exist). You have to do isolation by convention rather than having it
enforced by the system, but that's OK for coding. (It doesn't help the
"remove the GIL" debate, though).
2. I'd like to see isolation based on multiple interpreters, but the
problem lies with extensions (and at a lower level with the Python C
API) which weren't designed with isolation in mind. Changing that may
be nice, but it's probably too late (or if not, it's likely to be a
lot of work to do it in a compatible manner).
3. Exposing multiple interpreters at the Python level would let most
of this be done outside the core. But it may result in pure Python
code being able to crash the application if not done carefully.

And of course, the overriding points:
- This needs to be done in a backward compatible manner (Python 3.0 is out now!)
- A working patch is hugely more likely to make progress, as all the
evidence shows that the current core developers don't find this issue
important enough to spend their limited coding time on.

Paul.

From ncoghlan at gmail.com  Fri Dec 12 11:07:54 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Dec 2008 20:07:54 +1000
Subject: [Python-Dev] The endless GIL debate: why not remove thread
 support instead?
In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
Message-ID: <494237FA.7090500@gmail.com>

Sturla Molden wrote:
> Last month there was a discussion on Python-Dev regarding removal of
> reference counting to remove the GIL. I hope you forgive me for continuing
> the debate.

Anything to do with removing the GIL/threads/whatever other core
language feature someone doesn't like really belongs on c.l.p. or
python-ideas rather than here. Ideas should be at least remotely
feasible before they're brought to python-dev.

That said, I'll bite anyway...

Treating threads as communicating sequential processes (via the Queue
module) actually makes them pretty easy to use correctly.
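
A minimal sketch of that style, assuming Python 3 (where the Queue module is
named queue).  The worker function and the None sentinel are illustrative
conventions, not part of any API.

```python
import queue
import threading


def worker(inbox, outbox):
    """Square each number from inbox until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        outbox.put(item * item)


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

# The threads share nothing but the two queues.
for n in range(5):
    inbox.put(n)
inbox.put(None)  # ask the worker to stop
thread.join()

results = [outbox.get() for _ in range(5)]
print(results)
```

The isolation here is by convention only: nothing stops the worker from
touching other shared state, but keeping all communication on the queues
removes the need for explicit locks.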

They are then extraordinarily handy for performing multiple non-GIL
bound tasks (such as IO operations or number crunching using an
extension module like numpy) in parallel.

For GIL bound tasks, switching from the threading module to the
multiprocessing module now allows the activity to scale to multiple CPUs.

Removing thread support merely because concurrent programming is hard
(no matter how you do it) would be... odd (to say the least).

Changing the underlying concurrency mechanism from threads to
subinterpreters to processes to whole computers doesn't make
understanding and coping with the concepts involved in concurrency any
easier (and in fact will often make them harder to handle by increasing
the communications latency).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Dec 12 11:09:26 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 12 Dec 2008 20:09:26 +1000
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
Message-ID: <49423856.30705@gmail.com>

Paul Moore wrote:
> 2. I'd like to see isolation based on multiple interpreters, but the
> problem lies with extensions (and at a lower level with the Python C
> API) which weren't designed with isolation in mind. Changing that may
> be nice, but it's probably too late (or if not, it's likely to be a
> lot of work to do it in a compatible manner).

Actually, I believe 3.0 already took a big step towards allowing this by
changing the way modules are initialised.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From regebro at gmail.com  Fri Dec 12 11:52:46 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 12 Dec 2008 11:52:46 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
Message-ID: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>

On Fri, Dec 12, 2008 at 02:13, Sturla Molden <sturla at molden.no> wrote:
> I genuinely think the use of threads should be discouraged. It leads to
> code that is full of bugs and difficult to maintain - race conditions,
> deadlocks, and livelocks are common pitfalls.

The use of threads for load balancing should be discouraged, yes. That
is not what they are designed for. Threads are designed to allow
blocking processes to go on in the background without blocking the
main process. This, they are very useful for. Removing thread support
would therefore be a very big mistake. It's needed, it has its uses,
just not the one *you* want.
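
A small sketch of that use in Python 3.  The time.sleep() call merely stands
in for a blocking read or network request, and the names blocking_fetch and
result are hypothetical.

```python
import threading
import time

result = {}


def blocking_fetch():
    """Pretend to wait on slow I/O, then store the outcome."""
    time.sleep(0.2)  # stands in for a blocking system call
    result["data"] = "done"


background = threading.Thread(target=blocking_fetch)
background.start()

# The main thread stays responsive while the other one is blocked.
ticks = sum(range(1000))

background.join()  # collect the result once it is needed
print(ticks, result["data"])
```

A blocking call like this releases the GIL while it waits, so the main
thread is not held up by it.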

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From sturla at molden.no  Fri Dec 12 12:23:34 2008
From: sturla at molden.no (Sturla Molden)
Date: Fri, 12 Dec 2008 12:23:34 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
 support instead?
In-Reply-To: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
Message-ID: <494249B6.6040206@molden.no>

On 12/12/2008 11:52 AM, Lennart Regebro wrote:

> The use of threads for load balancing should be discouraged, yes. That
> is not what they are designed for. Threads are designed to allow
> blocking processes to go on in the background without blocking the
> main process.

It seems that most programmers with Java or Windows experience don't 
understand this; hence the everlasting GIL debate.

With multiple interpreters - one interpreter per thread - this could 
still be accomplished. Let one interpreter block while another continues 
to work. Then the result of the blocking operation is messaged back. 
Multi-threaded C libraries could be used in the same way. But there 
would be no need for a GIL, because each interpreter would be a 
single-threaded compartment.

.NET has something similar in what is called 'appdomains'.

I am not suggesting removing threads, but rather removing the Java 
threading model. I just think it is a mistake to let multiple OS threads 
touch the same interpreter.

Sturla Molden

From vext01 at gmail.com  Fri Dec 12 13:29:11 2008
From: vext01 at gmail.com (Edd Barrett)
Date: Fri, 12 Dec 2008 12:29:11 +0000
Subject: [Python-Dev] Build failure on OpenBSD 4.4-current
Message-ID: <a6e8c1d0812120429k459b169ne66724165e93afda@mail.gmail.com>

Hi,

I just had to move the "extern lstat..." outside the "ifndef
HAVE_UNISTD_H" to get python 2.6.1 to build on OpenBSD 4.4-current/i386.

I'm not suggesting this is correct, but it fixes the build for my
platform at least.

--- Modules/posixmodule.c.orig     Fri Dec 12 11:08:54 2008
+++ Modules/posixmodule.c       Fri Dec 12 11:54:16 2008
@@ -208,10 +208,11 @@
 #ifdef HAVE_SYMLINK
 extern int symlink(const char *, const char *);
 #endif /* HAVE_SYMLINK */
+#endif /* !HAVE_UNISTD_H */
+
 #ifdef HAVE_LSTAT
 extern int lstat(const char *, struct stat *);
 #endif /* HAVE_LSTAT */
-#endif /* !HAVE_UNISTD_H */

 #endif /* !_MSC_VER */


I'm using gcc-4.2.

Thanks

-- 

Best Regards

Edd

http://students.dec.bournemouth.ac.uk/ebarrett

From solipsis at pitrou.net  Fri Dec 12 14:06:35 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Dec 2008 13:06:35 +0000 (UTC)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812101139.37301.eckhardt@satorlaser.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
Message-ID: <loom.20081212T130614-77@post.gmane.org>

Curt Hagenlocher <curt <at> hagenlocher.org> writes:

> 
> 
> On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen <rhamph <at> gmail.com> wrote:
> 
> 
> I doubt that UTF-16 is used very much (other than on windows).
> 
> There's this other obscure platform called "Java"... ;)

Does it have a filesystem?



From solipsis at pitrou.net  Fri Dec 12 14:17:36 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Dec 2008 13:17:36 +0000 (UTC)
Subject: [Python-Dev] Build failure on OpenBSD 4.4-current
References: <a6e8c1d0812120429k459b169ne66724165e93afda@mail.gmail.com>
Message-ID: <loom.20081212T131515-992@post.gmane.org>

Hello,

Edd Barrett <vext01 <at> gmail.com> writes:
> 
> I just had to move the "extern lstat..." outside the "ifndef
> HAVE_UNISTD_H" to get Python 2.6.1 to build on OpenBSD 4.4-current/i386.

Could you please open an issue in http://bugs.python.org ? That way the problem
is less likely to be overlooked.

By the way, there are other bug entries regarding OpenBSD; at least one of them
has a patch waiting for review: http://bugs.python.org/issue3920

Regards

Antoine.



From vext01 at gmail.com  Fri Dec 12 15:12:38 2008
From: vext01 at gmail.com (Edd Barrett)
Date: Fri, 12 Dec 2008 14:12:38 +0000
Subject: [Python-Dev] Build failure on OpenBSD 4.4-current
In-Reply-To: <loom.20081212T131515-992@post.gmane.org>
References: <a6e8c1d0812120429k459b169ne66724165e93afda@mail.gmail.com>
	<loom.20081212T131515-992@post.gmane.org>
Message-ID: <a6e8c1d0812120612mbe4d42w37e1b06e36e7fed1@mail.gmail.com>

Hi,

On Fri, Dec 12, 2008 at 1:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Could you please open an issue in http://bugs.python.org ? That way the problem
> is less likely to be overlooked.

http://bugs.python.org/issue4639

Thanks


-- 

Best Regards

Edd

http://students.dec.bournemouth.ac.uk/ebarrett

From curt at hagenlocher.org  Fri Dec 12 15:14:53 2008
From: curt at hagenlocher.org (Curt Hagenlocher)
Date: Fri, 12 Dec 2008 06:14:53 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <loom.20081212T130614-77@post.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
	<loom.20081212T130614-77@post.gmane.org>
Message-ID: <d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>

On Fri, Dec 12, 2008 at 5:06 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Curt Hagenlocher <curt <at> hagenlocher.org> writes:
>
> > There's this other obscure platform called "Java"... ;)
>
> Does it have a filesystem?

No, but it also has to interact with filesystems of possibly invalid
or indeterminate encodings.  What does java.io do?

--
Curt Hagenlocher
curt at hagenlocher.org

From solipsis at pitrou.net  Fri Dec 12 15:19:30 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 12 Dec 2008 14:19:30 +0000 (UTC)
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>
	<200812111019.16950.eckhardt@satorlaser.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
	<loom.20081212T130614-77@post.gmane.org>
	<d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>
Message-ID: <loom.20081212T141839-940@post.gmane.org>

Curt Hagenlocher <curt <at> hagenlocher.org> writes:
> 
> No, but it also has to interact with filesystems of possibly invalid
> or indeterminate encodings.  What does java.io do?

My point was that Python doesn't have to interact with the Java IO libraries,
while it has to interact with the Unix and Windows IO APIs.




From curt at hagenlocher.org  Fri Dec 12 15:23:16 2008
From: curt at hagenlocher.org (Curt Hagenlocher)
Date: Fri, 12 Dec 2008 06:23:16 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <loom.20081212T141839-940@post.gmane.org>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
	<loom.20081212T130614-77@post.gmane.org>
	<d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>
	<loom.20081212T141839-940@post.gmane.org>
Message-ID: <d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>

On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Curt Hagenlocher <curt <at> hagenlocher.org> writes:
>>
>> No, but it also has to interact with filesystems of possibly invalid
>> or indeterminate encodings.  What does java.io do?
>
> My point was that Python doesn't have to interact with the Java IO libraries,
> while it has to interact with the Unix and Windows IO APIs.

Of course.  But the Java IO libraries have to interact with the Unix
and Windows IO APIs as well. It might be interesting to know how they
handle similar situations.

--
Curt Hagenlocher
curt at hagenlocher.org

From lists at cheimes.de  Fri Dec 12 15:50:13 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 12 Dec 2008 15:50:13 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <49423856.30705@gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
	<49423856.30705@gmail.com>
Message-ID: <ghttn5$3o0$1@ger.gmane.org>

Nick Coghlan wrote:
> Actually, I believe 3.0 already took a big step towards allowing this by
> changing the way modules are initialised.

You believe correctly. Martin has designed and implemented a
nicely working API to store extension module data per interpreter state.
For now, interpreter states are used for sub-interpreters only.

http://www.python.org/dev/peps/pep-3121/

Christian


From scott+python-dev at scottdial.com  Fri Dec 12 16:21:35 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Fri, 12 Dec 2008 10:21:35 -0500
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	<494103FD.5000101@holdenweb.com>
	<871vwe9mxj.fsf@xemacs.org>	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>	<4941F9A5.5040704@gmail.com>	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>	<loom.20081212T130614-77@post.gmane.org>	<d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>	<loom.20081212T141839-940@post.gmane.org>
	<d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>
Message-ID: <4942817F.20202@scottdial.com>

Curt Hagenlocher wrote:
> On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Curt Hagenlocher <curt <at> hagenlocher.org> writes:
>>> No, but it also has to interact with filesystems of possibly invalid
>>> or indeterminate encodings.  What does java.io do?
>> My point was that Python doesn't have to interact with the Java IO libraries,
>> while it has to interact with the Unix and Windows IO APIs.
> 
> Of course.  But the Java IO libraries have to interact with the Unix
> and Windows IO APIs as well. It might be interesting to know how they
> handle similar situations.

See the following email for a summary of existing practice (as of 2004):

http://www.mail-archive.com/unicode at unicode.org/msg27352.html

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From regebro at gmail.com  Fri Dec 12 17:39:33 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 12 Dec 2008 17:39:33 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
Message-ID: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>

The fix_imports fix seems to fix only the first import per line that you have.
So if you do for example
   import urllib2, cStringIO
it will not fix cStringIO.

Is this a bug or a feature? :-) If it's a feature it should warn at
least, right?
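
To make the expected behaviour concrete, here is a toy sketch of a rewrite that handles every name on the line. It is not the lib2to3 fixer: RENAMES is a small illustrative subset of the Python 2 to 3 module renames, and fix_import_line is a hypothetical helper that only understands plain "import a, b" statements without "as" aliases.

```python
# Illustrative subset of renames, assumed for this sketch only.
RENAMES = {"cStringIO": "io", "urllib2": "urllib.request"}


def fix_import_line(line):
    """Rewrite every module named in a plain 'import a, b' statement."""
    stripped = line.strip()
    if not stripped.startswith("import "):
        return line  # 'from x import y' and other statements are left alone
    names = [name.strip() for name in stripped[len("import "):].split(",")]
    fixed = [RENAMES.get(name, name) for name in names]
    return "import " + ", ".join(fixed)


result = fix_import_line("import urllib2, cStringIO")
```

The point is only that the mapping is applied to each comma-separated name, not just to the first one.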

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From victor.stinner at haypocalc.com  Fri Dec 12 17:54:49 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 12 Dec 2008 17:54:49 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
Message-ID: <200812121754.50123.victor.stinner@haypocalc.com>

On Friday 12 December 2008 17:39:33, Lennart Regebro wrote:
> The fix_imports fix seems to fix only the first import per line that you
> have. So if you do for example
>    import urllib2, cStringIO
> it will not fix cStringIO.
>
> Is this a bug or a feature? :-)

I would rather treat that as a bug, so that cStringIO also gets replaced by
StringIO. Can you open an issue?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From a.badger at gmail.com  Fri Dec 12 17:56:19 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Fri, 12 Dec 2008 08:56:19 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>	
	<200812101139.37301.eckhardt@satorlaser.com>	
	<aac2c7cb0812101031l7ca0221l708b25db3171c526@mail.gmail.com>	
	<200812111019.16950.eckhardt@satorlaser.com>	
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>	
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>	
	<aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>	
	<494213C8.7040809@gmail.com>
	<aac2c7cb0812120000u6134d3bcyb863e77cbec86a4d@mail.gmail.com>
Message-ID: <494297B3.3000204@gmail.com>

Adam Olsen wrote:
> UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
> the browser has to display the percent escapes in the address bar,
> rather than the intended text.
> 
> IOW, inconsistent behaviour is a bug, but translating into UTF-8 is not. ;)
> 
> 
I think we should let this tangent drop because it's about a bug in
Firefox, not in Python :-)

-Toshio
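
For reference, a small sketch of what "UTF-8 in percent encodings" looks like from Python 3's urllib.parse. The sample strings are made up for illustration; the second escape is a Latin-1 byte, which is not valid UTF-8 on its own.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"
escaped = quote(text)          # quote() encodes as UTF-8 by default
round_trip = unquote(escaped)  # decodes the escapes as UTF-8 again

# A Latin-1 escape cannot be decoded as UTF-8; unquote() substitutes
# U+FFFD because its default error handler is 'replace'.
legacy = unquote("caf%E9")
```

An escape sequence that is valid UTF-8 can be shown as the intended text, while the legacy one cannot be decoded without guessing its encoding.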


From theller at ctypes.org  Fri Dec 12 18:32:07 2008
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 12 Dec 2008 18:32:07 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <ghttn5$3o0$1@ger.gmane.org>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>	<49423856.30705@gmail.com>
	<ghttn5$3o0$1@ger.gmane.org>
Message-ID: <ghu76n$6na$1@ger.gmane.org>

Christian Heimes wrote:
> Nick Coghlan wrote:
>> Actually, I believe 3.0 already took a big step towards allowing this by
>> changing the way modules are initialised.
> 
> You are believing correctly. Martin has designed and implemented a
> nicely working API to store extension module data per interpreter state.
>  For now interpreter states are used for sub interpreters only.
> 
> http://www.python.org/dev/peps/pep-3121/

But the extension modules still have to be changed to use this mechanism, right?
-- 
Thanks,
Thomas


From regebro at gmail.com  Fri Dec 12 19:10:14 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 12 Dec 2008 19:10:14 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <494249B6.6040206@molden.no>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
	<494249B6.6040206@molden.no>
Message-ID: <319e029f0812121010p8dd97b9t8ccde78c037a42c2@mail.gmail.com>

On Fri, Dec 12, 2008 at 12:23, Sturla Molden <sturla at molden.no> wrote:
> It seems that most programmers with Java or Windows experience don't
> understand this; hence the ever lasting GIL debate.

Yes. Maybe writing this with big letters in the thread module docs would help?

> I am not suggesting removal of threads but rather the Java threading model.
> I just think it is a mistake to let multiple OS threads touch the same
> interpreter.

Does Python have a Java threading model? I don't know Java well enough
to know what that is. :)

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From regebro at gmail.com  Fri Dec 12 19:21:31 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Fri, 12 Dec 2008 19:21:31 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4942817F.20202@scottdial.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
	<loom.20081212T130614-77@post.gmane.org>
	<d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>
	<loom.20081212T141839-940@post.gmane.org>
	<d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>
	<4942817F.20202@scottdial.com>
Message-ID: <319e029f0812121021v9214e89n506da07d347839a0@mail.gmail.com>

On Fri, Dec 12, 2008 at 16:21, Scott Dial
<scott+python-dev at scottdial.com> wrote:
> See the following email for a summary of existing practice (as of 2004):
>
> http://www.mail-archive.com/unicode at unicode.org/msg27352.html

Interesting. Quite a lot of them just drop the undecodable
filenames. The Java solution of replacing the undecodable characters
seems to be a better idea at first glance, but what if you then end up
with two filenames that are the same? Possibly replacing with the <?>
character is a good idea to signal that the file is there, but then fail
to open it.
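
A minimal sketch of the collision concern, assuming a UTF-8 system that meets two Latin-1 encoded names. The file names and the decode_listing helper are invented for this example, and it says nothing about how java.io actually behaves.

```python
def decode_listing(raw_names, encoding="utf-8"):
    """Decode byte filenames, substituting U+FFFD for undecodable bytes."""
    return [name.decode(encoding, errors="replace") for name in raw_names]


# Two distinct Latin-1 encoded names as they might appear in a directory.
raw = [b"caf\xe9.txt", b"caf\xe8.txt"]
decoded = decode_listing(raw)

# Distinct byte names that decode to the same text can no longer be
# told apart, so the decoded name cannot be mapped back to one file.
collision = len(set(decoded)) < len(set(raw))
```

Both names come out as the same string containing U+FFFD, so the listing shows that something is there but cannot say which file to open.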

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From status at bugs.python.org  Fri Dec 12 18:06:44 2008
From: status at bugs.python.org (Python tracker)
Date: Fri, 12 Dec 2008 18:06:44 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20081212170644.9951B785BC@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (12/05/08 - 12/12/08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2261 open (+58) / 14206 closed (+37) / 16467 total (+95)

Open issues with patches:   763

Average duration of open issues: 699 days.
Median duration of open issues: 2499 days.

Open Issues Breakdown
   open  2242 (+57)
pending    19 ( +1)

Issues Created Or Reopened (97)
_______________________________

Remove mimetools usage from the stdlib                           12/06/08
       http://bugs.python.org/issue2848    reopened brett.cannon              
       patch                                                                   

improve linecache: reuse tokenize.detect_encoding() and io.open( 12/12/08
       http://bugs.python.org/issue4016    reopened benjamin.peterson         
       patch                                                                   

Deprecated python 2.x syntax in "HOWTO Use Python in the web"    12/05/08
CLOSED http://bugs.python.org/issue4550    created  jcsalterego               
       patch                                                                   

The python 2.6.1 source distribution is missing Doc/tools/sphinx 12/05/08
CLOSED http://bugs.python.org/issue4551    created  andreask                  
                                                                               

Doc/tools/sphinxext not included in the 2.6.1 tarball            12/05/08
CLOSED http://bugs.python.org/issue4552    created  doko                      
                                                                               

Results from os.path.islink and os.stat S_ISLNK do not match     12/05/08
CLOSED http://bugs.python.org/issue4553    created  npatters                  
                                                                               

Missing  make altframeworkinstall for Mac OS X                   12/06/08
       http://bugs.python.org/issue4554    created  christian.heimes          
                                                                               

Smelly exports                                                   12/06/08
       http://bugs.python.org/issue4555    created  christian.heimes          
                                                                               

cmp() function erroneously noted as gone in "What's New"         12/06/08
CLOSED http://bugs.python.org/issue4556    created  mwatkins                  
                                                                               

array('c') in python 3.0 produces error, doc says it is ok       12/06/08
CLOSED http://bugs.python.org/issue4557    created  lopgok                    
                                                                               

with_stdc89                                                      12/06/08
       http://bugs.python.org/issue4558    created  christian.heimes          
       patch, patch                                                            

Whats new recommendation error                                   12/06/08
CLOSED http://bugs.python.org/issue4559    created  lregebro                  
                                                                               

"Flouted", not "flaunted"                                        12/06/08
CLOSED http://bugs.python.org/issue4560    created  jdf                       
                                                                               

Optimize new io library                                          12/06/08
       http://bugs.python.org/issue4561    created  christian.heimes          
       patch                                                                   

zip() documentation was not updated                              12/06/08
CLOSED http://bugs.python.org/issue4562    created  mchouza                   
                                                                               

Wrong formatting of contributor list in About page               12/06/08
       http://bugs.python.org/issue4563    created  salty-horse               
                                                                               

bytearray.fromhex doesn't respect bytearray subclass             12/06/08
CLOSED http://bugs.python.org/issue4564    created  pitrou                    
                                                                               

io write() performance very slow                                 12/06/08
       http://bugs.python.org/issue4565    created  ialbert                   
                                                                               

2.6.1 breaks many applications that embed Python on Windows      12/06/08
       http://bugs.python.org/issue4566    created  craigh                    
                                                                               

Registry key not set if unattended installation used             12/06/08
       http://bugs.python.org/issue4567    created  stuaxo                    
                                                                               

Improved optparse "varargs" callback example                     12/06/08
       http://bugs.python.org/issue4568    created  gregg.lind                
       patch                                                                   

Segfault when mutating a memoryview to an array.array            12/07/08
CLOSED http://bugs.python.org/issue4569    created  pitrou                    
                                                                               

Bad example in set tutorial                                      12/07/08
CLOSED http://bugs.python.org/issue4570    created  jmarter                   
                                                                               

write to stdout in binary mode - is it possible?                 12/07/08
CLOSED http://bugs.python.org/issue4571    created  lopgok                    
                                                                               

add SEEK_* values to io and/or io.IOBase                         12/07/08
       http://bugs.python.org/issue4572    created  gumpy                     
                                                                               

zsh-style subpattern matching for fnmatch/glob                   12/07/08
       http://bugs.python.org/issue4573    created  erickt                    
       patch                                                                   

reading UTF16-encoded text file crashes if \r on 64-char boundar 12/07/08
       http://bugs.python.org/issue4574    created  sjmachin                  
       patch                                                                   

Py_IS_INFINITY defect causes test_cmath failure on x86           12/07/08
       http://bugs.python.org/issue4575    created  marketdickinson           
       patch                                                                   

"Defining new types" little outdated                             12/07/08
CLOSED http://bugs.python.org/issue4576    created  exe                       
                                                                               

distutils: -3 warnings (apply)                                   12/07/08
       http://bugs.python.org/issue4577    created  srittau                   
       patch                                                                   

compiler: -3 warnings                                            12/07/08
       http://bugs.python.org/issue4578    created  srittau                   
       patch                                                                   

.read() and .readline() differ in failing                        12/07/08
       http://bugs.python.org/issue4579    created  eggy                      
       patch, needs review                                                     

slicing of memoryviews when itemsize != 1 is wrong               12/07/08
       http://bugs.python.org/issue4580    created  pitrou                    
       patch, needs review                                                     

failed to import module from lib-dynload                         12/07/08
CLOSED http://bugs.python.org/issue4581    created  legerf                    
                                                                               

type of __builtins__ changes if in main module or not            12/07/08
CLOSED http://bugs.python.org/issue4582    created  nnorwitz                  
                                                                               

segfault when mutating memoryview to array.array when array is r 12/07/08
       http://bugs.python.org/issue4583    created  gumpy                     
                                                                               

doctest fails to display bytes type                              12/07/08
CLOSED http://bugs.python.org/issue4584    created  msyang                    
                                                                               

Build failure on OS X 10.5.5: make: *** [sharedmods] Error 1     12/07/08
CLOSED http://bugs.python.org/issue4585    created  marketdickinson           
                                                                               

"Extending Embedded Python" documention uses removed Py_InitModu 12/07/08
CLOSED http://bugs.python.org/issue4586    created  blakemadden               
                                                                               

Need to rework the dbm lib/include selection process             12/08/08
       http://bugs.python.org/issue4587    created  skip.montanaro            
       patch, needs review                                                     

Need a way to make my own bytes                                  12/08/08
CLOSED http://bugs.python.org/issue4588    created  lopgok                    
                                                                               

'with' loses ->bool exceptions                                   12/08/08
CLOSED http://bugs.python.org/issue4589    created  jyasskin                  
       patch                                                                   

2to3 strips trailing L for long iterals in two fixers            12/08/08
CLOSED http://bugs.python.org/issue4590    created  aronacher                 
       patch, needs review                                                     

32-bits unsigned user/group identifier                           12/08/08
       http://bugs.python.org/issue4591    created  sjoerd                    
       patch, needs review                                                     

Embedding example does not add created module                    12/08/08
CLOSED http://bugs.python.org/issue4592    created  blakemadden               
       patch, needs review                                                     

Documentation for multiprocessing - Pool.apply()                 12/08/08
       http://bugs.python.org/issue4593    created  beazley                   
       easy                                                                    

Can't compile with -O3, on ARM, with gcc 3.4.4                   12/08/08
       http://bugs.python.org/issue4594    created  metageek                  
                                                                               

new types example is out of date                                 12/08/08
       http://bugs.python.org/issue4595    created  blakemadden               
                                                                               

2to3 does not fail as early as possible.                         12/08/08
       http://bugs.python.org/issue4596    created  LambertDW                 
                                                                               

EvalFrameEx fails to set 'why' for some exceptions               12/10/08
CLOSED http://bugs.python.org/issue4597    reopened amaury.forgeotdarc        
       patch                                                                   

IDLE not opening                                                 12/08/08
CLOSED http://bugs.python.org/issue4598    created  ec2929                    
                                                                               

Strings undisplayable with repr                                  12/08/08
CLOSED http://bugs.python.org/issue4599    created  mfoord                    
                                                                               

__class__ assignment: new-style? heap? == confusing              12/08/08
       http://bugs.python.org/issue4600    created  tjreedy                   
                                                                               

directory permission error with make install in 3.0              12/08/08
       http://bugs.python.org/issue4601    created  legerf                    
       patch                                                                   

2to3 drops executable bit with --write                           12/08/08
CLOSED http://bugs.python.org/issue4602    created  dato                      
       patch                                                                   

3.0 document tab interpretation change                           12/08/08
       http://bugs.python.org/issue4603    created  tjreedy                   
                                                                               

close() seems to have limited effect                             12/09/08
       http://bugs.python.org/issue4604    created  skip.montanaro            
       patch                                                                   

3.0 documentation mentions using maketrans from within the strin 12/09/08
       http://bugs.python.org/issue4605    created  suicideducky              
                                                                               

Passing 'None' if argtype is set to POINTER(...) doesn't always  12/09/08
       http://bugs.python.org/issue4606    created  robertluce                
       patch                                                                   

uuid behavior with multiple threads                              12/09/08
       http://bugs.python.org/issue4607    created  mortenab                  
                                                                               

urllib.request.urlopen does not return an iterable object        12/09/08
       http://bugs.python.org/issue4608    created  jwilk                     
                                                                               

Allow use of > 256 FD's on solaris in 32 bit mode                12/09/08
       http://bugs.python.org/issue4609    created  pajs at fodder.org.uk        
                                                                               

Unicode case mappings are incorrect                              12/09/08
       http://bugs.python.org/issue4610    created  alexs                     
                                                                               

Small error in "Extending Python with C or C++"                  12/09/08
       http://bugs.python.org/issue4611    created  jakamkon                  
                                                                               

PyModule_Create() doesn't add/import module                      12/09/08
CLOSED http://bugs.python.org/issue4612    created  blakemadden               
                                                                               

Can't figure out where SyntaxError: can not delete variable 'x'  12/09/08
       http://bugs.python.org/issue4613    created  marduk                    
       patch                                                                   

Document PyModule_Create()                                       12/09/08
       http://bugs.python.org/issue4614    created  brett.cannon              
       needs review                                                            

de-duping function in itertools                                  12/10/08
       http://bugs.python.org/issue4615    created  thomaspinckney3           
                                                                               

tarfile does not set the creation date and time of the extracted 12/10/08
CLOSED http://bugs.python.org/issue4616    created  throbi                    
                                                                               

SyntaxError when free variable name is also an exception target  12/10/08
       http://bugs.python.org/issue4617    created  amaury.forgeotdarc        
       patch, needs review                                                     

print_function and unicode_literals don't work together          12/10/08
       http://bugs.python.org/issue4618    created  exarkun                   
                                                                               

Invalid Behaviour When a Default Argument is a Mutable Object    12/10/08
CLOSED http://bugs.python.org/issue4619    created  rhr                       
                                                                               

Memory leak with datetime used with time.strptime                12/10/08
CLOSED http://bugs.python.org/issue4620    created  sebegue                   
                                                                               

zipfile returns string but expects binary                        12/10/08
       http://bugs.python.org/issue4621    created  francescor                
                                                                               

SequenceMatcher bug with long sequences                          12/10/08
       http://bugs.python.org/issue4622    created  eliben                    
                                                                               

IDLE shutdown if I run an edited file contains chinese           12/11/08
       http://bugs.python.org/issue4623    created  bianpeng                  
                                                                               

Can not import readline on python3.0 (ubuntu 8.04)               12/11/08
CLOSED http://bugs.python.org/issue4624    created  xxiao                     
                                                                               

IDLE won't open anymore, .idlerc unaccessible                    12/11/08
       http://bugs.python.org/issue4625    created  skcheng                   
                                                                               

compile() doesn't ignore the source encoding when a string is pa 12/11/08
       http://bugs.python.org/issue4626    created  brett.cannon              
                                                                               

Add Mac OS X Disk Images to Python.org homepage                  12/11/08
       http://bugs.python.org/issue4627    created  carlj                     
                                                                               

No universal newline support for compile() when using bytes      12/11/08
       http://bugs.python.org/issue4628    created  brett.cannon              
                                                                               

getopt should not accept no_argument that ends with '='          12/11/08
       http://bugs.python.org/issue4629    created  wangchun                  
       patch                                                                   

IDLE no longer respects .Xdefaults insertOffTime                 12/11/08
       http://bugs.python.org/issue4630    created  mark                      
                                                                               

urlopen returns extra, spurious bytes                            12/11/08
       http://bugs.python.org/issue4631    created  dato                      
                                                                               

Wrong fix for range(42)[::-1]                                    12/11/08
CLOSED http://bugs.python.org/issue4632    created  theller                   
                                                                               

file.tell() gives wrong result                                   12/11/08
CLOSED http://bugs.python.org/issue4633    created  yavuz164                  
                                                                               

2to3 should fix "import HTMLParser"                              12/11/08
CLOSED http://bugs.python.org/issue4634    created  mastrodomenico            
                                                                               

no reference for optparse methods                                12/11/08
       http://bugs.python.org/issue4635    created  techtonik                 
                                                                               

bdist_wininst installer with install script raises exception     12/11/08
       http://bugs.python.org/issue4636    created  theller                   
                                                                               

Binary floating point and decimal floating point arithmetic      12/11/08
CLOSED http://bugs.python.org/issue4637    created  Retro                     
                                                                               

1 is 1 is allways true while 1.0 is 1.0 may sometimes be true    12/12/08
CLOSED http://bugs.python.org/issue4638    created  nassrat                   
                                                                               

Build failure on OpenBSD 4.4-current regarding lstat()           12/12/08
       http://bugs.python.org/issue4639    created  vext01                    
                                                                               

optparse - dosn't distinguish between '--option' and '-option'   12/12/08
       http://bugs.python.org/issue4640    created  kszawala                  
                                                                               

optparse - dosn't distinguish between '--option' and '-option'   12/12/08
       http://bugs.python.org/issue4641    created  kszawala                  
                                                                               

optparse - dosn't distinguish between '--option' and '-option'   12/12/08
       http://bugs.python.org/issue4642    created  kszawala                  
                                                                               

cgitb.html fails if getattr call raises exception                12/12/08
       http://bugs.python.org/issue4643    created  amc1                      
                                                                               

Minor documentation fault in 2to3 script                         12/12/08
       http://bugs.python.org/issue4644    created  amc1                      
                                                                               



Issues Now Closed (74)
______________________

gdbm/ndbm 1.8.1+ needs libgdbm_compat.so                          449 days
       http://bugs.python.org/issue1167    ocean-city                
       patch                                                                   

Victor Stinner's GMP patch for longs                              328 days
       http://bugs.python.org/issue1814    marketdickinson           
       patch                                                                   

Python fails silently on bad locale                               291 days
       http://bugs.python.org/issue2173    marketdickinson           
       patch                                                                   

Full precision summation                                          214 days
       http://bugs.python.org/issue2819    marketdickinson           
       patch                                                                   

Incorrect rounding in floating-point operations with gcc/x87      198 days
       http://bugs.python.org/issue2937    marketdickinson           
                                                                               

math test fails on Solaris 10                                     173 days
       http://bugs.python.org/issue3167    marketdickinson           
       patch                                                                   

Multiprocessing Array and sharedctypes.Array error in docs/imple  165 days
       http://bugs.python.org/issue3206    amaury.forgeotdarc        
       patch                                                                   

BufferedWriter not thread-safe                                    129 days
       http://bugs.python.org/issue3476    wplappert                 
       patch                                                                   

expm1 missing                                                     123 days
       http://bugs.python.org/issue3501    marketdickinson           
                                                                               

test_math: math.log(-ninf) fails to raise exception on OpenBSD    108 days
       http://bugs.python.org/issue3682    marketdickinson           
                                                                               

math.log(x, 10) gives different result than math.log10(x)          98 days
       http://bugs.python.org/issue3724    marketdickinson           
       patch                                                                   

_lsprof: clear() should call flush_unmatched()                     79 days
       http://bugs.python.org/issue3952    haypo                     
       patch                                                                   

tokenize.detect_encoding(): raise SyntaxError on codecs.lookup()   70 days
       http://bugs.python.org/issue4021    benjamin.peterson         
       patch, patch, needs review                                              

Decimal.max(NaN, x) gives incorrect results when x is finite and   63 days
       http://bugs.python.org/issue4084    marketdickinson           
       patch                                                                   

ihooks incompatible with absolute_import feature                   35 days
       http://bugs.python.org/issue4244    georg.brandl              
                                                                               

Update pydoc URLs                                                  36 days
       http://bugs.python.org/issue4259    loewis                    
                                                                               

smtplib.py initialisation defect                                   28 days
       http://bugs.python.org/issue4302    ocean-city                
       patch                                                                   

state_reset not called on 'state' before sre_search invoked        16 days
       http://bugs.python.org/issue4416    amaury.forgeotdarc        
                                                                               

String allocations waste 3 bytes of memory on average.              9 days
       http://bugs.python.org/issue4445    marketdickinson           
       patch                                                                   

__import__ documentation obsolete                                   9 days
       http://bugs.python.org/issue4457    georg.brandl              
       patch                                                                   

parameters of PyLong_FromString() are not checked for NULL          6 days
       http://bugs.python.org/issue4461    marketdickinson           
       patch                                                                   

Windows installer crash                                             4 days
       http://bugs.python.org/issue4481    tjreedy                   
                                                                               

Error to build _dbm module during make                              6 days
       http://bugs.python.org/issue4483    skip.montanaro            
       patch, easy                                                             

Python Documentation not Newb Friendly                              4 days
       http://bugs.python.org/issue4488    georg.brandl              
                                                                               

Compiler warnings in longobject.c                                   3 days
       http://bugs.python.org/issue4497    marketdickinson           
       patch                                                                   

3.0 test failure on Mac OS X 10.5.5                                 2 days
       http://bugs.python.org/issue4507    marketdickinson           
                                                                               

Decorators should have an index entry                               2 days
       http://bugs.python.org/issue4511    georg.brandl              
                                                                               

problem with str.join - should work with list input, error says     1 days
       http://bugs.python.org/issue4534    lopgok                    
                                                                               

webbrowser.UnixBrowser should use builtins.open                     1 days
       http://bugs.python.org/issue4537    amaury.forgeotdarc        
                                                                               

A defect in <The Python Tutorial>-<Python Scopes and Name Spaces    0 days
       http://bugs.python.org/issue4549    georg.brandl              
                                                                               

Deprecated python 2.x syntax in "HOWTO Use Python in the web"       0 days
       http://bugs.python.org/issue4550    georg.brandl              
       patch                                                                   

The python 2.6.1 source distribution is missing Doc/tools/sphinx    0 days
       http://bugs.python.org/issue4551    georg.brandl              
                                                                               

Doc/tools/sphinxext not included in the 2.6.1 tarball               3 days
       http://bugs.python.org/issue4552    georg.brandl              
                                                                               

Results from os.path.islink and os.stat S_ISLNK do not match        0 days
       http://bugs.python.org/issue4553    christian.heimes          
                                                                               

cmp() function erroneously noted as gone in "What's New"            0 days
       http://bugs.python.org/issue4556    georg.brandl              
                                                                               

array('c') in python 3.0 produces error, doc says it is ok          0 days
       http://bugs.python.org/issue4557    georg.brandl              
                                                                               

Whats new recommendation error                                      1 days
       http://bugs.python.org/issue4559    lregebro                  
                                                                               

"Flouted", not "flaunted"                                           0 days
       http://bugs.python.org/issue4560    georg.brandl              
                                                                               

zip() documentation was not updated                                 0 days
       http://bugs.python.org/issue4562    georg.brandl              
                                                                               

bytearray.fromhex doesn't respect bytearray subclass                0 days
       http://bugs.python.org/issue4564    amaury.forgeotdarc        
                                                                               

Segfault when mutating a memoryview to an array.array               1 days
       http://bugs.python.org/issue4569    pitrou                    
                                                                               

Bad example in set tutorial                                         0 days
       http://bugs.python.org/issue4570    rhettinger                
                                                                               

write to stdout in binary mode - is it possible?                    1 days
       http://bugs.python.org/issue4571    christian.heimes          
                                                                               

"Defining new types" little outdated                                0 days
       http://bugs.python.org/issue4576    georg.brandl              
                                                                               

failed to import module from lib-dynload                            0 days
       http://bugs.python.org/issue4581    loewis                    
                                                                               

type of __builtins__ changes if in main module or not               0 days
       http://bugs.python.org/issue4582    loewis                    
                                                                               

doctest fails to display bytes type                                 0 days
       http://bugs.python.org/issue4584    georg.brandl              
                                                                               

Build failure on OS X 10.5.5: make: *** [sharedmods] Error 1        1 days
       http://bugs.python.org/issue4585    marketdickinson           
                                                                               

"Extending Embedded Python" documention uses removed Py_InitModu    2 days
       http://bugs.python.org/issue4586    georg.brandl              
                                                                               

Need a way to make my own bytes                                     0 days
       http://bugs.python.org/issue4588    loewis                    
                                                                               

'with' loses ->bool exceptions                                      3 days
       http://bugs.python.org/issue4589    amaury.forgeotdarc        
       patch                                                                   

2to3 strips trailing L for long iterals in two fixers               1 days
       http://bugs.python.org/issue4590    aronacher                 
       patch, needs review                                                     

Embedding example does not add created module                       1 days
       http://bugs.python.org/issue4592    georg.brandl              
       patch, needs review                                                     

EvalFrameEx fails to set 'why' for some exceptions                  0 days
       http://bugs.python.org/issue4597    jyasskin                  
       patch                                                                   

IDLE not opening                                                    2 days
       http://bugs.python.org/issue4598    loewis                    
                                                                               

Strings undisplayable with repr                                     0 days
       http://bugs.python.org/issue4599    loewis                    
                                                                               

2to3 drops executable bit with --write                              3 days
       http://bugs.python.org/issue4602    benjamin.peterson         
       patch                                                                   

PyModule_Create() doesn't add/import module                         0 days
       http://bugs.python.org/issue4612    amaury.forgeotdarc        
                                                                               

tarfile does not set the creation date and time of the extracted    2 days
       http://bugs.python.org/issue4616    lars.gustaebel            
                                                                               

Invalid Behaviour When a Default Argument is a Mutable Object       0 days
       http://bugs.python.org/issue4619    loewis                    
                                                                               

Memory leak with datetime used with time.strptime                   1 days
       http://bugs.python.org/issue4620    skip.montanaro            
                                                                               

Can not import readline on python3.0 (ubuntu 8.04)                  0 days
       http://bugs.python.org/issue4624    benjamin.peterson         
                                                                               

Wrong fix for range(42)[::-1]                                       0 days
       http://bugs.python.org/issue4632    benjamin.peterson         
                                                                               

file.tell() gives wrong result                                      0 days
       http://bugs.python.org/issue4633    QuantumTim                
                                                                               

2to3 should fix "import HTMLParser"                                 0 days
       http://bugs.python.org/issue4634    benjamin.peterson         
                                                                               

Binary floating point and decimal floating point arithmetic         0 days
       http://bugs.python.org/issue4637    gvanrossum                
                                                                               

1 is 1 is allways true while 1.0 is 1.0 may sometimes be true       0 days
       http://bugs.python.org/issue4638    tim_one                   
                                                                               

Proto 2 pickle vs dict subclass                                  1873 days
       http://bugs.python.org/issue826897  benjamin.peterson         
                                                                               

Python interpreter stalled on _PyPclose.WaitForSingleObject      1708 days
       http://bugs.python.org/issue928332  amaury.forgeotdarc        
                                                                               

Fix for #777597 - socketmodule.c connection handling incorec     1647 days
       http://bugs.python.org/issue965036  amaury.forgeotdarc        
       patch                                                                   

distutils' dry-run wants to create some real build dirs          1545 days
       http://bugs.python.org/issue1030250 amaury.forgeotdarc        
       patch                                                                   

correct/clarify documentation for super                          1363 days
       http://bugs.python.org/issue1163367 rhettinger                
                                                                               

sys.settrace cause curried parms to show up as attributes         796 days
       http://bugs.python.org/issue1569356 loewis                    
                                                                               

thread + import => crashes?                                       292 days
       http://bugs.python.org/issue1720705 forest                    
                                                                               



Top Issues Most Discussed (10)
______________________________

 42 Get rid of more refercenes to __cmp__                             57 days
open    http://bugs.python.org/issue1717   

 23 slicing of memoryviews when itemsize != 1 is wrong                 5 days
open    http://bugs.python.org/issue4580   

 17 Make conversions from long to float correctly rounded.           174 days
open    http://bugs.python.org/issue3166   

 13 Optimize new io library                                            6 days
open    http://bugs.python.org/issue4561   

 12 bugs in array.array with exports (buffer protocol)                 9 days
open    http://bugs.python.org/issue4509   

 11 Whats new recommendation error                                     1 days
closed  http://bugs.python.org/issue4559   

 11 Error to build _dbm module during make                             6 days
closed  http://bugs.python.org/issue4483   

 10 with_stdc89                                                        7 days
open    http://bugs.python.org/issue4558   

  9 tarfile does not set the creation date and time of the extracte    2 days
closed  http://bugs.python.org/issue4616   

  9 'with' loses ->bool exceptions                                     3 days
closed  http://bugs.python.org/issue4589   




From glyph at divmod.com  Fri Dec 12 21:54:07 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 12 Dec 2008 20:54:07 -0000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org>
	<aac2c7cb0812112122l6d91f47dh285f81efe5342ab6@mail.gmail.com>
	<4941F9A5.5040704@gmail.com>
	<aac2c7cb0812112219u33559597qae3f7f723c8b67e4@mail.gmail.com>
	<d2155e360812112225ta575b8bwd41374474e65fb91@mail.gmail.com>
	<loom.20081212T130614-77@post.gmane.org>
	<d2155e360812120614n4fcb3ae7n9326b496b46bfd60@mail.gmail.com>
	<loom.20081212T141839-940@post.gmane.org>
	<d2155e360812120623x748fe687r3a473a5cf7e26741@mail.gmail.com>
Message-ID: <20081212205407.12555.311547571.divmod.xquotient.2122@weber.divmod.com>

On 02:23 pm, curt at hagenlocher.org wrote:
>On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>Curt Hagenlocher <curt <at> hagenlocher.org> writes:
>>>
>>>No, but it also has to interact with filesystems of possibly invalid
>>>or indeterminate encodings.  What does java.io do?
>>
>>My point was that Python doesn't have to interact with the Java IO libraries,
>>while it has to interact with the Unix and Windows IO APIs.
>
>Of course.  But the Java IO libraries have to interact with the Unix
>and Windows IO APIs as well. It might be interesting to know how they
>handle similar situations.

Apparently Java has the facilities to do the right thing, but actually 
it's just broken.

My locale says UTF-8.  However, if I create a non-decodable file with 
Python (2), there are three ways I can tell Java to open it: I can ask 
for it with a string (that won't work, because no valid UTF-8 string 
maps to an undecodable string, pretty much by definition).  I can list 
the directory that it's in (presuming that *that's* a directory) and get 
a java.io.File, which could be retaining all the interesting 
information, or I can use a URI, which is a string that resolves to 
octets before it resolves to characters again.

However, it looks like Java screws up in every case.

Here's a transcript from the ever-helpful jython:

glyph at nhuvasarim:~/tmp$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>file("\xff\xff", "wb").write("lolz\n")
glyph at nhuvasarim:~/tmp$ jython
Jython 2.2.1 on java1.6.0_07
Type "copyright", "credits" or "license" for more information.
>>>from java.io import File
>>>fileList = File(".").listFiles()
>>>fileList
array(java.io.File,[./
>>>fileList[0].__class__
<jclass java.io.File 1>
>>>from java.io import FileReader
>>>FileReader(fileList[0])
Traceback (innermost last):
  File "<console>", line 1, in ?
         at java.io.FileInputStream.open(Native Method)
         at java.io.FileInputStream.<init>(FileInputStream.java:106)
         at java.io.FileReader.<init>(FileReader.java:55)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

java.io.FileNotFoundException: java.io.FileNotFoundException: ./?FD?FD (No such file or directory)
>>>from java.net import URI
>>>u = URI("file:///home/glyph/tmp/%ff%ff")
>>>FileReader(File(u))
Traceback (innermost last):
  File "<console>", line 1, in ?
         at java.io.FileInputStream.open(Native Method)
         at java.io.FileInputStream.<init>(FileInputStream.java:106)
         at java.io.FileReader.<init>(FileReader.java:55)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)

java.io.FileNotFoundException: java.io.FileNotFoundException: /home/glyph/tmp/?FD?FD (No such file or directory)
>>>
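
For contrast, here is a rough sketch of the same experiment done from Python 3
using bytes paths throughout, so the name is never decoded at all. This is
only an illustration under assumptions: the helper name
roundtrip_undecodable_name is made up for this example, os.fsencode and
tempfile.TemporaryDirectory come from later Python 3 releases than the one
discussed in this thread, and the result depends on the filesystem accepting
arbitrary octets in names (typical on Linux, not guaranteed elsewhere).

```python
import os
import tempfile


def roundtrip_undecodable_name(payload=b"lolz\n"):
    """Create a file named b'\\xff\\xff', find it via a bytes listing, read it back.

    Returns the bytes read, or None if the platform refuses such a name.
    (Hypothetical helper, written only for this illustration.)
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_bytes = os.fsencode(tmp)  # keep every path as bytes from here on
        target = os.path.join(tmp_bytes, b"\xff\xff")
        try:
            with open(target, "wb") as handle:
                handle.write(payload)
        except (OSError, ValueError):
            # Some filesystems or platforms only accept decodable names.
            return None
        # Passing bytes to os.listdir makes it return undecoded bytes names.
        names = os.listdir(tmp_bytes)
        with open(os.path.join(tmp_bytes, names[0]), "rb") as handle:
            return handle.read()


if __name__ == "__main__":
    print(roundtrip_undecodable_name())
```

The point of the sketch is only that no step has to turn the octets into
characters; whether that is the right default for an API is the question
being argued in this thread.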

From ncoghlan at gmail.com  Fri Dec 12 22:34:01 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 13 Dec 2008 07:34:01 +1000
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <ghu76n$6na$1@ger.gmane.org>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>	<49423856.30705@gmail.com>	<ghttn5$3o0$1@ger.gmane.org>
	<ghu76n$6na$1@ger.gmane.org>
Message-ID: <4942D8C9.5080203@gmail.com>

Thomas Heller wrote:
> Christian Heimes wrote:
>> Nick Coghlan wrote:
>>> Actually, I believe 3.0 already took a big step towards allowing this by
>>> changing the way modules are initialised.
>> You are believing correctly. Martin has designed and implemented a
>> nicely working API to store extension module data per interpreter state.
>>  For now interpreter states are used for sub interpreters only.
>>
>> http://www.python.org/dev/peps/pep-3121/
> 
> But the extension modules still have to be changed to use this mechanism, right?

Yep, but at least it's *possible* now. With 2.x, it isn't possible for
an extension module to support subinterpreters properly, even if they
want to.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From nd at perlig.de  Sat Dec 13 05:47:47 2008
From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=)
Date: Sat, 13 Dec 2008 05:47:47 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812120119s44ba4264ne7d2edf112188768@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812121011.09427.nd@perlig.de>
	<aac2c7cb0812120119s44ba4264ne7d2edf112188768@mail.gmail.com>
Message-ID: <200812130547.47351@news.perlig.de>

* Adam Olsen wrote:

> On Fri, Dec 12, 2008 at 2:11 AM, André Malo <nd at perlig.de> wrote:
> > * Adam Olsen wrote:
> >> UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
> >> the browser has to display the percent escapes in the address bar,
> >> rather than the intended text.
> >
> > Duh! The address bar should contain the URL, which *is* the intended
> > text. The escapes are there for a reason. If I pass some octets using
> > percent escapes via the query string or request body, it's not text,
> > not even intended. It's still a collection of octets. Translating them
> > back (and forth when I press enter in the address bar) is a pretty
> > ambigious operation and therefore pretty wrong.
> >
> > The defacto standard does not exist. There's a real one instead: RFC
> > 2396.
>
> All the heaps of people using non-english wikipedia sites might
> disagree with you.  There's only, what, a few *million* pages that
> would be affected?

I'm not sure what you're trying to pull here. Is that supposed to be an 
argument? There's no page affected at all. It's a browser UI issue, not a 
page issue.

And even if it were at all interesting how the URL escapes are displayed in
the address bar, those millions of people would favour KOI8-R or Big 5
over UTF-8 if you asked them.

Which leads to the exact point: The browser cannot know, nor should it.
It's opaque. The only entity which needs to understand the encoding of URL 
percent escapes in query or request body is the *server* selecting the 
resource.

But I'm sure I'm not telling you any news here.
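
The ambiguity argued above can be sketched in a few lines of Python. This is only an illustration: the escaped string is a made-up example, and the three encodings are simply the ones named in this thread. The same percent escapes yield the same octets, but different text depending on which encoding the reader guesses.

```python
from urllib.parse import unquote_to_bytes

# Percent escapes carry octets, not characters.
escaped = "%C3%B1"  # hypothetical query-string fragment
octets = unquote_to_bytes(escaped)

# The same two octets read as text under three different guesses.
as_utf8 = octets.decode("utf-8")      # one character (n with tilde)
as_latin1 = octets.decode("latin-1")  # two characters
as_koi8r = octets.decode("koi8-r")    # two different characters again

for label, text in (("utf-8", as_utf8), ("latin-1", as_latin1), ("koi8-r", as_koi8r)):
    print(label, len(text), ascii(text))
```

Only the side that produced the escapes knows which of these readings, if any, was meant.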

nd
-- 
"Gates's behaviour had proven to me that I did not need to count on him
and his two companions" -- Karl May, "Winnetou III"

Something new in the West: <http://pub.perlig.de/books.html#apache2>

From rhamph at gmail.com  Sat Dec 13 07:12:47 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 12 Dec 2008 23:12:47 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812130547.47351@news.perlig.de>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812121011.09427.nd@perlig.de>
	<aac2c7cb0812120119s44ba4264ne7d2edf112188768@mail.gmail.com>
	<200812130547.47351@news.perlig.de>
Message-ID: <aac2c7cb0812122212j71c31b04xb8f56f89f1534c66@mail.gmail.com>

On Fri, Dec 12, 2008 at 9:47 PM, André Malo <nd at perlig.de> wrote:
> * Adam Olsen wrote:
>> On Fri, Dec 12, 2008 at 2:11 AM, André Malo <nd at perlig.de> wrote:
>> > * Adam Olsen wrote:
>> >> UTF-8 in percent encodings is becoming a de facto standard.  Otherwise
>> >> the browser has to display the percent escapes in the address bar,
>> >> rather than the intended text.
>> >
>> > Duh! The address bar should contain the URL, which *is* the intended
>> > text. The escapes are there for a reason. If I pass some octets using
>> > percent escapes via the query string or request body, it's not text,
>> > not even intended. It's still a collection of octets. Translating them
>> > back (and forth when I press enter in the address bar) is a pretty
>> > ambiguous operation and therefore pretty wrong.
>> >
>> > The de facto standard does not exist. There's a real one instead: RFC
>> > 2396.
>>
>> All the heaps of people using non-English Wikipedia sites might
>> disagree with you.  There's only, what, a few *million* pages that
>> would be affected?
>
> I'm not sure what you're trying to pull here. Is that supposed to be an
> argument? There's no page affected at all. It's a browser UI issue, not a
> page issue.
>
> And even if it were at all interesting how the URL escapes are displayed in
> the address bar, those millions of people would favour KOI8-R or Big 5
> over UTF-8 if you asked them.
>
> Which leads to the exact point: The browser cannot know, nor should it.
> It's opaque. The only entity which needs to understand the encoding of URL
> percent escapes in query or request body is the *server* selecting the
> resource.
>
> But I'm sure I'm not telling you any news here.

You're arguing that text should be an opaque entity..

We've wasted enough of everybody's time on this already, I'm not going
to continue on this thread.  Send me a private email if you think it's
really important.


-- 
Adam Olsen, aka Rhamphoryncus

From nd at perlig.de  Sat Dec 13 07:34:06 2008
From: nd at perlig.de (André Malo)
Date: Sat, 13 Dec 2008 07:34:06 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812122212j71c31b04xb8f56f89f1534c66@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<200812130547.47351@news.perlig.de>
	<aac2c7cb0812122212j71c31b04xb8f56f89f1534c66@mail.gmail.com>
Message-ID: <200812130734.06774@news.perlig.de>

* Adam Olsen wrote:

> On Fri, Dec 12, 2008 at 9:47 PM, André Malo <nd at perlig.de> wrote:
> > * Adam Olsen wrote:
> >> On Fri, Dec 12, 2008 at 2:11 AM, André Malo <nd at perlig.de> wrote:
> >> > * Adam Olsen wrote:
> >> >> UTF-8 in percent encodings is becoming a de facto standard.
> >> >> Otherwise the browser has to display the percent escapes in the
> >> >> address bar, rather than the intended text.
> >> >
> >> > Duh! The address bar should contain the URL, which *is* the intended
> >> > text. The escapes are there for a reason. If I pass some octets
> >> > using percent escapes via the query string or request body, it's not
> >> > text, not even intended. It's still a collection of octets.
> >> > Translating them back (and forth when I press enter in the address
> >> > bar) is a pretty ambiguous operation and therefore pretty wrong.
> >> >
> >> > The de facto standard does not exist. There's a real one instead: RFC
> >> > 2396.
> >>
> >> All the heaps of people using non-English Wikipedia sites might
> >> disagree with you.  There's only, what, a few *million* pages that
> >> would be affected?
> >
> > I'm not sure what you're trying to pull here. Is that supposed to be an
> > argument? There's no page affected at all. It's a browser UI issue, not
> > a page issue.
> >
> > And even if it were at all interesting how the URL escapes are
> > displayed in the address bar, those millions of people would favour
> > KOI8-R or Big 5 over UTF-8 if you asked them.
> >
> > Which leads to the exact point: The browser cannot know, nor should
> > it. It's opaque. The only entity which needs to understand the
> > encoding of URL percent escapes in query or request body is the
> > *server* selecting the resource.
> >
> > But I'm sure I'm not telling you any news here.
>
> You're arguing that text should be an opaque entity..

No, actually I'm not. I'm arguing that escapes are opaque.

> We've wasted enough of everybody's time on this already, I'm not going
> to continue on this thread. 

Agreed.

nd
-- 
That reminds me, why is there actually no "i" in Unicode with
a little heart as its dot? That would be sooo sweet!

                                 -- Björn Höhrmann in darw

From lie.1296 at gmail.com  Sat Dec 13 08:57:28 2008
From: lie.1296 at gmail.com (Lie Ryan)
Date: Sat, 13 Dec 2008 07:57:28 +0000 (UTC)
Subject: [Python-Dev] Psyco for -OO or -O
Message-ID: <ghvpt8$qq0$1@ger.gmane.org>

I'm sure most of you know about psyco[1], the optimizer. Python
has -O and -OO flags that are intended to be optimization flags, but we
know that currently they don't do much. Why not add psyco to the standard
library and let -O or -OO invoke psyco?

[1] http://psyco.sourceforge.net/index.html


From fuzzyman at voidspace.org.uk  Sat Dec 13 14:28:37 2008
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 13 Dec 2008 13:28:37 +0000
Subject: [Python-Dev] Psyco for -OO or -O
In-Reply-To: <ghvpt8$qq0$1@ger.gmane.org>
References: <ghvpt8$qq0$1@ger.gmane.org>
Message-ID: <4943B885.1070605@voidspace.org.uk>

Lie Ryan wrote:
> I'm sure most of you know about psyco[1], the optimizer. Python
> has -O and -OO flags that are intended to be optimization flags, but we
> know that currently they don't do much. Why not add psyco to the standard
> library and let -O or -OO invoke psyco?
>   

This really belongs on Python-ideas and not Python-dev.

The main reason why not is that someone(s) from the Python core team 
would then need to 'own' maintaining Psyco (which is x86 only as well). 
Psyco is so hard to maintain that even the original author wants to drop 
it. :-)

Michael Foord

> [1] http://psyco.sourceforge.net/index.html


-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog



From fuzzyman at voidspace.org.uk  Sat Dec 13 14:32:36 2008
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 13 Dec 2008 13:32:36 +0000
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
Message-ID: <4943B974.6020407@voidspace.org.uk>

Lennart Regebro wrote:
> On Fri, Dec 12, 2008 at 02:13, Sturla Molden <sturla at molden.no> wrote:
>   
>> I genuinely think the use of threads should be discouraged. It leads to
>> code that is full of bugs and difficult to maintain - race conditions,
>> deadlocks, and livelocks are common pitfalls.
>>     
>
> The use of threads for load balancing should be discouraged, yes. That
> is not what they are designed for. Threads are designed to allow
> blocking processes to go on in the background without blocking the
> main process. This, they are very useful for. Removing thread support
> would therefore be a very big mistake. It's needed, it has its uses,
> just not the one *you* want.
>
>   

That's an interesting assertion about what threads were designed for. Do 
you have anything to back it up?

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog



From lie.1296 at gmail.com  Sat Dec 13 16:19:47 2008
From: lie.1296 at gmail.com (Lie Ryan)
Date: Sat, 13 Dec 2008 15:19:47 +0000 (UTC)
Subject: [Python-Dev] Psyco for -OO or -O
References: <ghvpt8$qq0$1@ger.gmane.org> <4943B885.1070605@voidspace.org.uk>
Message-ID: <gi0jqj$q63$1@ger.gmane.org>

On Sat, 13 Dec 2008 13:28:37 +0000, Michael Foord wrote:

> Lie Ryan wrote:
>> I'm sure most of you know about psyco[1], the optimizer.
>> Python has -O and -OO flags that are intended to be optimization flags,
>> but we know that currently they don't do much. Why not add psyco to
>> the standard library and let -O or -OO invoke psyco?
>>   
>>   
> This really belongs on Python-ideas and not Python-dev.

Ah yes, sorry about that, I'm new here. This will be my last post about 
this here...


From guido at python.org  Sat Dec 13 17:14:57 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Dec 2008 08:14:57 -0800
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <4943B974.6020407@voidspace.org.uk>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
	<4943B974.6020407@voidspace.org.uk>
Message-ID: <ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>

Yes, this is what threads were designed for. As an abstraction to have
multiple "threads of control" on a *single* processor (in a single
process). The whole multi-core business came decades later. (Classic
multi-processors have something called threads too, but they, too,
came later than the original single-core-single-CPU thread concept,
and often threads on those systems have properties that don't match
how threads work on modern multi-core CPUs.)
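
A minimal sketch of that original use, assuming nothing beyond the standard threading module: one thread sits in a blocking call while the main thread carries on. The names blocking_task and results are made up for illustration, and time.sleep merely stands in for a blocking read or similar call.

```python
import threading
import time

results = []

def blocking_task():
    # Placeholder for a call that blocks, such as a slow read.
    time.sleep(0.2)
    results.append("background done")

worker = threading.Thread(target=blocking_task)
worker.start()

# The main thread is not held up while the worker is blocked.
results.append("main continued")

# Wait for the background work before using its result.
worker.join()
print(results)
```

Nothing here runs in parallel on purpose; the point is only that the blocking call does not stall the main thread of control.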

On Sat, Dec 13, 2008 at 5:32 AM, Michael Foord
<fuzzyman at voidspace.org.uk> wrote:
> Lennart Regebro wrote:
>>
>> On Fri, Dec 12, 2008 at 02:13, Sturla Molden <sturla at molden.no> wrote:
>>
>>>
>>> I genuinely think the use of threads should be discouraged. It leads to
>>> code that is full of bugs and difficult to maintain - race conditions,
>>> deadlocks, and livelocks are common pitfalls.
>>>
>>
>> The use of threads for load balancing should be discouraged, yes. That
>> is not what they are designed for. Threads are designed to allow
>> blocking processes to go on in the background without blocking the
>> main process. This, they are very useful for. Removing thread support
>> would therefore be a very big mistake. It's needed, it has its uses,
>> just not the one *you* want.
>>
>>
>
> That's an interesting assertion about what threads were designed for. Do you
> have anything to back it up?
>
> Michael
>
> --
> http://www.ironpythoninaction.com/
> http://www.voidspace.org.uk/blog



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steve at holdenweb.com  Sat Dec 13 17:57:44 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 13 Dec 2008 11:57:44 -0500
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>	<4943B974.6020407@voidspace.org.uk>
	<ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
Message-ID: <gi0pi4$f0j$1@ger.gmane.org>

If I remember correctly (when threading was invented in the mid-1980s)
threads were originally described as "lightweight processes". The
perceived advantage at the time was the ability to have multiple threads
of control with shared memory: this was much faster than the available
inter-process communication mechanisms. On a single-processor computer
synchronization was much less of a problem.

regards
 Steve


Guido van Rossum wrote:
> Yes, this is what threads were designed for. As an abstraction to have
> multiple "threads of control" on a *single* processor (in a single
> process). The whole multi-core business came decades later. (Classic
> multi-processors have something called threads too, but they, too,
> came later than the original single-core-single-CPU thread concept,
> and often threads on those systems have properties that don't match
> how threads work on modern multi-core CPUs.)
> 
> On Sat, Dec 13, 2008 at 5:32 AM, Michael Foord
> <fuzzyman at voidspace.org.uk> wrote:
>> Lennart Regebro wrote:
>>> On Fri, Dec 12, 2008 at 02:13, Sturla Molden <sturla at molden.no> wrote:
>>>
>>>> I genuinely think the use of threads should be discouraged. It leads to
>>>> code that is full of bugs and difficult to maintain - race conditions,
>>>> deadlocks, and livelocks are common pitfalls.
>>>>
>>> The use of threads for load balancing should be discouraged, yes. That
>>> is not what they are designed for. Threads are designed to allow
>>> blocking processes to go on in the background without blocking the
>>> main process. This, they are very useful for. Removing thread support
>>> would therefore be a very big mistake. It's needed, it has its uses,
>>> just not the one *you* want.
>>>
>>>
>> That's an interesting assertion about what threads were designed for. Do you
>> have anything to back it up?
>>

-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From wilk at flibuste.net  Sat Dec 13 16:31:00 2008
From: wilk at flibuste.net (William Dode)
Date: Sat, 13 Dec 2008 15:31:00 +0000 (UTC)
Subject: [Python-Dev] Psyco for -OO or -O
References: <ghvpt8$qq0$1@ger.gmane.org> <4943B885.1070605@voidspace.org.uk>
Message-ID: <gi0kfk$us3$1@ger.gmane.org>

On 13-12-2008, Michael Foord wrote:
> Lie Ryan wrote:
>> I'm sure most of you know about psyco[1], the optimizer. Python
>> has -O and -OO flags that are intended to be optimization flags, but we
>> know that currently they don't do much. Why not add psyco to the standard
>> library and let -O or -OO invoke psyco?
>>   
>
> This really belongs on Python-ideas and not Python-dev.
>
> The main reason why not is that someone(s) from the Python core team 
> would then need to 'own' maintaining Psyco (which is x86 only as well). 
> Psyco is so hard to maintain that even the original author wants to drop 
> it. :-)

It could be the killer feature which will push Python 3 adoption ;-)
Bloggers like benchmarks so much!

Sorry...

-- 
William Dodé - http://flibuste.net
Independent IT consultant


From roy.lowrance at gmail.com  Sat Dec 13 18:08:59 2008
From: roy.lowrance at gmail.com (Roy Lowrance)
Date: Sat, 13 Dec 2008 12:08:59 -0500
Subject: [Python-Dev] beginning developer: fastest way to learn how Python
	3.0 works
Message-ID: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>

I'd like to learn how Python 3.0 works. I've downloaded the svn.

I am wondering what the best way to learn is:
- Just jump in?
- Or perhaps learn A before B?
- Or maybe there is a tutorial for those new to the internals?

What's the best way to learn how Python 3.0 works?

Roy

From lists at cheimes.de  Sat Dec 13 18:13:55 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 13 Dec 2008 18:13:55 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <gi0pi4$f0j$1@ger.gmane.org>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>	<4943B974.6020407@voidspace.org.uk>	<ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
	<gi0pi4$f0j$1@ger.gmane.org>
Message-ID: <gi0qgk$iak$1@ger.gmane.org>

Steve Holden wrote:
> If I remember correctly (when threading was invented in the mid-1980s)
> threads were originally described as "lightweight processes". The
> perceived advantage at the time was the ability to have multiple threads
> of control with shared memory: this was much faster than the available
> inter-process communication mechanisms. On a single-processor computer
> synchronization was much less of a problem.

Initially one of Java's main target platforms was set-top boxes. Back
in the '90s set-top boxes had limited hardware and dumb processors.
Most of the boxes had no MMU and so didn't support multiple processes.
Threads were the easiest way to have some kind of concurrency.

Back in those days threads were the only solution for concurrency but
today - about 15 years later with powerful processors even in cheap
mobile phones - people are still indoctrinated with the same philosophy ...

Christian


From guido at python.org  Sat Dec 13 18:48:03 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Dec 2008 09:48:03 -0800
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <gi0qgk$iak$1@ger.gmane.org>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>
	<4943B974.6020407@voidspace.org.uk>
	<ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
	<gi0pi4$f0j$1@ger.gmane.org> <gi0qgk$iak$1@ger.gmane.org>
Message-ID: <ca471dc20812130948hfba3268l6316666387485f3c@mail.gmail.com>

On Sat, Dec 13, 2008 at 9:13 AM, Christian Heimes <lists at cheimes.de> wrote:
> Steve Holden wrote:
>> If I remember correctly (when threading was invented in the mid-1980s)
>> threads were originally described as "lightweight processes". The
>> perceived advantage at the time was the ability to have multiple threads
>> of control with shared memory: this was much faster than the available
>> inter-process communication mechanisms. On a single-processor computer
>> synchronization was much less of a problem.
>
> Initially one of Java's main target platforms was set-top boxes. Back
> in the '90s set-top boxes had limited hardware and dumb processors.
> Most of the boxes had no MMU and so didn't support multiple processes.
> Threads were the easiest way to have some kind of concurrency.

Just let's not rewrite history and believe Java invented threads. They
were around well before that.

> Back in those days threads were the only solution for concurrency but
> today - about 15 years later with powerful processors even in cheap
> mobile phones - people are still indoctrinated with the same philosophy ...

It's not so much indoctrination. Threads are a useful tool. The
problem is that some people perceive threads as the *only* tool.
There's a whole spectrum of tools, from event handling to multiple
processes, and they don't all solve the same problem. (I guess it
doesn't help that the word process is given new meanings by some
languages.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steve at pearwood.info  Fri Dec 12 13:01:29 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 12 Dec 2008 23:01:29 +1100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <494213C8.7040809@gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com>
	<aac2c7cb0812112225sc6b41fatd379df47f1ef84de@mail.gmail.com>
	<494213C8.7040809@gmail.com>
Message-ID: <200812122301.29729.steve@pearwood.info>

On Fri, 12 Dec 2008 06:33:28 pm Toshio Kuratomi wrote:

> Also interesting, if you point your browser at:
>   http://toshio.fedorapeople.org/u/
>
> You should see two other test files.  They're both
> (one-half)(enyei).html but one's encoded in utf-8 and the other in
> latin-1.

For what it's worth, Konqueror 3.5 displays the two files as

(1/2)(n+tilde).html
(A+caret)(1/2)(A+tilde)(plusminus).html

It doesn't seem to have any trouble opening either of them.
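
One possible reading of the second name, sketched in Python under the assumption that the UTF-8 file name's bytes were displayed as if they were Latin-1 (the variable names are illustrative only):

```python
# The intended two-character stem: one-half followed by n-with-tilde.
name = "\u00bd\u00f1"

utf8_bytes = name.encode("utf-8")      # four bytes
latin1_bytes = name.encode("latin-1")  # two bytes

# Reading the UTF-8 bytes as Latin-1 gives four characters:
# A-circumflex, one-half, A-tilde, plus-minus.
as_seen = utf8_bytes.decode("latin-1")

print(ascii(name), utf8_bytes, latin1_bytes, ascii(as_seen))
```

That four-character result matches the second name listed above, which suggests the browser is interpreting both file names as Latin-1.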


-- 
Steven

From aahz at pythoncraft.com  Sat Dec 13 19:18:51 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 13 Dec 2008 10:18:51 -0800
Subject: [Python-Dev] beginning developer: fastest way to learn
	how	Python 3.0 works
In-Reply-To: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>
References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>
Message-ID: <20081213181851.GA23531@panix.com>

On Sat, Dec 13, 2008, Roy Lowrance wrote:
> 
> What's the best way to learn how Python 3.0 works?

Post to the correct mailing list.  ;-)

Use comp.lang.python or python-tutor or python-help

python-dev is for people creating new versions of Python
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From roy.lowrance at gmail.com  Sat Dec 13 19:30:10 2008
From: roy.lowrance at gmail.com (Roy Lowrance)
Date: Sat, 13 Dec 2008 13:30:10 -0500
Subject: [Python-Dev] beginning developer: fastest way to learn how
	Python 3.0 works
In-Reply-To: <20081213181851.GA23531@panix.com>
References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>
	<20081213181851.GA23531@panix.com>
Message-ID: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com>

Maybe this is the correct list, as my inquiry is about how to learn
how the current implementation works so that I could consider how to
implement new features.

So, here's a modified question: If you want to learn how python works
(not how to program in the python language), what's a productive way
to proceed?

Roy

On Sat, Dec 13, 2008 at 1:18 PM, Aahz <aahz at pythoncraft.com> wrote:
> On Sat, Dec 13, 2008, Roy Lowrance wrote:
>>
>> What's the best way to learn how Python 3.0 works?
>
> Post to the correct mailing list.  ;-)
>
> Use comp.lang.python or python-tutor or python-help
>
> python-dev is for people creating new versions of Python
> --
> Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
> "It is easier to optimize correct code than to correct optimized code."
> --Bill Harlan
>



-- 
Roy Lowrance
home: 212 674 9777
mobile: 347 255 2544

From tjreedy at udel.edu  Sat Dec 13 21:13:55 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 13 Dec 2008 15:13:55 -0500
Subject: [Python-Dev] beginning developer: fastest way to learn how
	Python 3.0 works
In-Reply-To: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com>
References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>	<20081213181851.GA23531@panix.com>
	<162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com>
Message-ID: <gi1520$fau$1@ger.gmane.org>

Roy Lowrance wrote:
> Maybe this is the correct list, as my inquiry is about how to learn
> how the current implementation works so that I could consider how to
> implement new features.
> 
> So, here's a modified question: If you want to learn how python works
> (not how to program in the python language), what's a productive way
> to proceed?

There are developer pages on the site, a wiki page on the ceval loop, 
the extending and embedding manual, and the code itself.


From solipsis at pitrou.net  Sat Dec 13 22:22:16 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 13 Dec 2008 21:22:16 +0000 (UTC)
Subject: [Python-Dev] Reindenting the C code base?
Message-ID: <loom.20081213T211524-493@post.gmane.org>


Hello,

I remember there was some talk of reindenting the C code base (from tabs to
4-space indents) after py3k is released, but I can't find the discussion thread
again. Was a decision ever taken about it?

Regards

Antoine.



From guido at python.org  Sat Dec 13 22:26:50 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 13 Dec 2008 13:26:50 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <loom.20081213T211524-493@post.gmane.org>
References: <loom.20081213T211524-493@post.gmane.org>
Message-ID: <ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>

On Sat, Dec 13, 2008 at 1:22 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> I remember there was some talk of reindenting the C code base (from tabs to
> 4-space indents) after py3k is released, but I can't find the discussion thread
> again. Was a decision ever taken about it?

I think we should not do this. We should use 4 space indents for new
files, but existing files should not be reindented. If you reindent,
much of the history of the file is essentially lost -- "svn blame"
will blame whoever reindented the code, and it's a pain to go back.
There's also the issue of merging between the 2.x and 3.x branches,
which we still do.

As far as a decision, I think the de facto decision is to keep the
status quo, and I'm all for sticking with that.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From solipsis at pitrou.net  Sat Dec 13 23:11:47 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 13 Dec 2008 22:11:47 +0000 (UTC)
Subject: [Python-Dev] Reindenting the C code base?
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
Message-ID: <loom.20081213T220617-917@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> 
> I think we should not do this. We should use 4 space indents for new
> files, but existing files should not be reindented.

Well, right now many files are indented with a mix of spaces and tabs, depending
on who did the edit and how their editor was configured at the time.

Perhaps a graceful policy would be to mandate that all new edits be made with
spaces without touching other functions in the file. Then hopefully the code
base would gradually converge to a tabless scheme.

Regards

Antoine.



From martin at v.loewis.de  Sat Dec 13 23:28:32 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 13 Dec 2008 23:28:32 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
Message-ID: <49443710.3060102@v.loewis.de>

On behalf of the Python development team and the Python community, I'm
happy to announce the release candidates of Python 2.4.6 and 2.5.3.

2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases
will only include security fixes. According to the release notes, over
100 bugs and patches have been addressed since Python 2.5.1, many of
them improving the stability of the interpreter, and improving its
portability.

2.4.6 includes only a small number of security fixes. Python 2.6 is
the latest version of Python; we're making this release for people who
are still running Python 2.4.

See the release notes at the website (also available as Misc/NEWS in
the source distribution) for details of bugs fixed; most of them prevent
interpreter crashes (and now cause proper Python exceptions in cases
where the interpreter may have crashed before).

Assuming no major problems crop up, a final release of Python 2.4.6
and 2.5.3 will follow in about a week's time.

For more information on Python 2.4.6 and 2.5.3, including download
links for various platforms, release notes, and known issues, please
see:

    http://www.python.org/2.4.6
    http://www.python.org/2.5.3

Highlights of the previous major Python releases are available
from the Python 2.4 and 2.5 pages, at

    http://www.python.org/2.4/highlights.html
    http://www.python.org/2.5/highlights.html

Enjoy this release,
Martin

Martin v. Loewis
martin at v.loewis.de
Python Release Manager
(on behalf of the entire python-dev team)

From mlobol at gmail.com  Sat Dec 13 23:35:00 2008
From: mlobol at gmail.com (Miguel Lobo)
Date: Sat, 13 Dec 2008 22:35:00 +0000
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
Message-ID: <10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com>

> I think we should not do this. We should use 4 space indents for new
> files, but existing files should not be reindented. If you reindent,
> much of the history of the file is essentially lost -- "svn blame"
> will blame whoever reindented the code, and it's a pain to go back.

I believe "svn blame -x -w" ignores whitespace changes.
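
A rough illustration of why ignoring whitespace helps here. This is not how svn implements it; changed_lines is a hypothetical helper that compares two equal-length versions of a file line by line, optionally with all whitespace stripped.

```python
def changed_lines(old, new, ignore_whitespace=False):
    """Return indices of lines that differ between two equal-length versions."""
    def key(line):
        # Dropping all whitespace makes tab and space indents compare equal.
        return "".join(line.split()) if ignore_whitespace else line
    return [i for i, (a, b) in enumerate(zip(old, new)) if key(a) != key(b)]

# The same C snippet before and after a tabs-to-spaces reindent.
old = ["if (x) {", "\treturn 1;", "}"]
new = ["if (x) {", "    return 1;", "}"]

plain = changed_lines(old, new)
ignoring = changed_lines(old, new, ignore_whitespace=True)
print(plain, ignoring)
```

With a plain comparison the reindented line counts as changed, so its history would point at the reindent; with whitespace ignored it does not count as changed.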

-- 
Miguel
Check out Gleam, an LGPL sound synthesizer library, at http://gleamsynth.sf.net

From lists at cheimes.de  Sat Dec 13 23:39:36 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 13 Dec 2008 23:39:36 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <49443710.3060102@v.loewis.de>
References: <49443710.3060102@v.loewis.de>
Message-ID: <494439A8.2030208@cheimes.de>

Martin v. Löwis wrote:
> 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases
> will only include security fixes. According to the release notes, over
> 100 bugs and patches have been addressed since Python 2.5.1, many of
                                                          ^^^^

Do you really mean 2.5.1?

Christian

From martin at v.loewis.de  Sat Dec 13 23:47:27 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 13 Dec 2008 23:47:27 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <gi0pi4$f0j$1@ger.gmane.org>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>	<4943B974.6020407@voidspace.org.uk>	<ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
	<gi0pi4$f0j$1@ger.gmane.org>
Message-ID: <49443B7F.8020602@v.loewis.de>

> If I remember correctly (when threading was invented in the mid-1980s)
> threads were originally described as "lightweight processes".

According to

http://www.serpentine.com/blog/threads-faq/the-history-of-threads/

that's when threads were *reinvented*. They were originally invented
in 1965; on Multics (1970) they were used to perform compilation in the
background. When Unix came along, it *added* address space separation,
introducing what is now known as processes.

> The
> perceived advantage at the time was the ability to have multiple threads
> of control with shared memory: this was much faster than the available
> inter-process communication mechanisms. On a single-processor computer
> synchronization was much less of a problem.

Historically, it was vice versa. First there were
threads/processes/tasks with shared variables, semaphores, etc, and
later address space separation was added.

Regards,
Martin

From martin at v.loewis.de  Sat Dec 13 23:51:25 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 13 Dec 2008 23:51:25 +0100
Subject: [Python-Dev] beginning developer: fastest way to learn
 how	Python 3.0 works
In-Reply-To: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com>
References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com>	<20081213181851.GA23531@panix.com>
	<162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com>
Message-ID: <49443C6D.8040005@v.loewis.de>

> Maybe this is the correct list, as my inquiry is about how to learn
> how the current implementation works so that I could consider how to
> implement new features.
> 
> So, here's a modified question: If you want to learn how python works
> (not how to program in the python language), what's a productive way
> to proceed?

Well, the question is what you want to learn it *for*. If you want to
learn in order to contribute, I suggest you pick an old bug on the bug
tracker and try to solve it.

If you have a specific new feature in mind that you want to implement,
I again suggest that you just start implementing it. If you don't know
how, then you should ask on python-list how to do the specific things
the feature needs, or even explain to python-list readers what feature
you want to implement and ask how people would go about implementing
it.

Regards,
Martin

From martin at v.loewis.de  Sat Dec 13 23:55:38 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 13 Dec 2008 23:55:38 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <494439A8.2030208@cheimes.de>
References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de>
Message-ID: <49443D6A.9020308@v.loewis.de>

Christian Heimes wrote:
> Martin v. Löwis wrote:
>> 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases
>> will only include security fixes. According to the release notes, over
>> 100 bugs and patches have been addressed since Python 2.5.1, many of
>                                                           ^^^^
>
> Do you really mean 2.5.1?

Oops, no - although the statement is technically correct; since 2.5.2,
only 80 bugs have been added :-)

Thanks for pointing that out.

Martin

From skip at pobox.com  Sun Dec 14 04:04:09 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 13 Dec 2008 21:04:09 -0600
Subject: [Python-Dev] Problem with svn on community buildbot
Message-ID: <18756.30633.439039.977094@montanaro-dyndns-org.local>

I have a community buildbot:

    http://www.python.org/dev/buildbot/community/all/g5%20OSX%202.5/builds/14/step-svn/0

which is failing the svn checkout of the 2.5 branch:

    svn: PROPFIND request failed on '/projects/python/branches/release25-maint'
    svn: PROPFIND of '/projects/python/branches/release25-maint': Could not resolve hostname `svn.python.org': Temporary failure in name resolution (http://svn.python.org)

The svn command is:

    /opt/local/bin/svn checkout --revision 67742 --non-interactive http://svn.python.org/projects/python/branches/release25-maint build

Any idea what the problem might be?

Thanks,

Skip

From martin at v.loewis.de  Sun Dec 14 05:40:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 05:40:19 +0100
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <18756.30633.439039.977094@montanaro-dyndns-org.local>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
Message-ID: <49448E33.9080506@v.loewis.de>

>     svn: PROPFIND of '/projects/python/branches/release25-maint': Could not resolve hostname `svn.python.org': Temporary failure in name resolution (http://svn.python.org)
> 
> Any idea what the problem might be?

Well - can you resolve `svn.python.org' on that machine (e.g. when
using ping(1))?

Regards,
Martin

From skip at pobox.com  Sun Dec 14 14:58:25 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 14 Dec 2008 07:58:25 -0600
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <49448E33.9080506@v.loewis.de>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
Message-ID: <18757.4353.603639.60602@montanaro-dyndns-org.local>


    Martin> Well - can you resolve `svn.python.org' on that machine
    Martin> (e.g. when using ping(1))?

Yup:

    $ host svn.python.org
    svn.python.org has address 82.94.164.164
    svn.python.org has IPv6 address 2001:888:2000:d::a4
    $ ping svn.python.org
    PING svn.python.org (82.94.164.164): 56 data bytes
    64 bytes from 82.94.164.164: icmp_seq=0 ttl=50 time=134.041 ms
    64 bytes from 82.94.164.164: icmp_seq=1 ttl=50 time=135.441 ms
    64 bytes from 82.94.164.164: icmp_seq=2 ttl=50 time=135.352 ms
    ^C
    --- svn.python.org ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 134.041/134.945/135.441/0.640 ms
    $ telnet svn.python.org 80
    Trying 82.94.164.164...
    Connected to svn.python.org.
    Escape character is '^]'.
    ^]
    telnet> quit
    Connection closed.

Skip

From alexander.belopolsky at gmail.com  Sun Dec 14 17:07:30 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 11:07:30 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <18757.4353.603639.60602@montanaro-dyndns-org.local>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
Message-ID: <d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>

I don't know if this is related, but from my end, access to
svn.python.org has been extremely slow recently:

$ time curl -o /dev/null http://svn.python.org
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   353  100   353    0     0      4      0  0:01:28  0:01:15  0:00:13     0

real    1m15.045s
user    0m0.004s
sys     0m0.004s


I've seen similar slowdowns accessing bugs.python.org, but not now.

It looks like it has something to do with IPv6:

$ host svn.python.org
svn.python.org has address 82.94.164.164
svn.python.org has IPv6 address 2001:888:2000:d::a4

$ time curl -v -o /dev/null http://svn.python.org
* About to connect() to svn.python.org port 80 (#0)
*   Trying 2001:888:2000:d::a4... Operation timed out
*   Trying 82.94.164.164... connected
...

No slowdown when IPv6 lookup is disabled with -4 option to curl:

$ time curl -4 -o /dev/null http://svn.python.org
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   353  100   353    0     0    774      0 --:--:-- --:--:-- --:--:--     0

real    0m0.463s
user    0m0.004s
sys     0m0.004s
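
For anyone who wants to repeat the comparison without curl, here is a
minimal Python sketch of the same check. It groups the addresses a host
resolves to by family and times a plain TCP connect to one of them. The
helper names (addresses_by_family, timed_connect) and the 5 second
timeout are my own assumptions for illustration, not part of any
existing tool.

```python
import socket
import time


def addresses_by_family(host, port=80, resolver=socket.getaddrinfo):
    """Group the addresses `host` resolves to into IPv4 and IPv6 lists.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the function can be exercised without network access.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in resolver(
            host, port, type=socket.SOCK_STREAM):
        label = labels.get(family)
        # sockaddr[0] is the textual address for both families.
        if label is not None and sockaddr[0] not in grouped[label]:
            grouped[label].append(sockaddr[0])
    return grouped


def timed_connect(address, family, port=80, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((address, port))
    except OSError:  # includes timeouts and "no route to host"
        return None
    return time.monotonic() - start
```

Calling timed_connect once with socket.AF_INET6 and once with
socket.AF_INET on the addresses returned by addresses_by_family should
show whether only the IPv6 path is slow or failing.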

On Sun, Dec 14, 2008 at 8:58 AM,  <skip at pobox.com> wrote:
>
>    Martin> Well - can you resolve `svn.python.org' on that machine
>    Martin> (e.g. when using ping(1))?
>
> Yup:
>
>    $ host svn.python.org
>    svn.python.org has address 82.94.164.164
>    svn.python.org has IPv6 address 2001:888:2000:d::a4
>    $ ping svn.python.org
>    PING svn.python.org (82.94.164.164): 56 data bytes
>    64 bytes from 82.94.164.164: icmp_seq=0 ttl=50 time=134.041 ms
>    64 bytes from 82.94.164.164: icmp_seq=1 ttl=50 time=135.441 ms
>    64 bytes from 82.94.164.164: icmp_seq=2 ttl=50 time=135.352 ms
>    ^C
>    --- svn.python.org ping statistics ---
>    3 packets transmitted, 3 packets received, 0% packet loss
>    round-trip min/avg/max/stddev = 134.041/134.945/135.441/0.640 ms
>    $ telnet svn.python.org 80
>    Trying 82.94.164.164...
>    Connected to svn.python.org.
>    Escape character is '^]'.
>    ^]
>    telnet> quit
>    Connection closed.
>
> Skip

From guido at python.org  Sun Dec 14 17:26:06 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 14 Dec 2008 08:26:06 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <loom.20081213T220617-917@post.gmane.org>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
Message-ID: <ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>

On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Guido van Rossum <guido <at> python.org> writes:
>>
>> I think we should not do this. We should use 4 space indents for new
>> files, but existing files should not be reindented.
>
> Well, right now many files are indented with a mix of spaces and tabs, depending
> on who did the edit and how their editor was configured at the time.

That's a shame. We used to have more rigorous standards than allowing that.

> Perhaps a graceful policy would be to mandate that all new edits be made with
> spaces without touching other functions in the file. Then hopefully the code
> base would gradually converge to a tabless scheme.

I don't think so. I find local consistency more important than global
consistency. A file can become really hard to read when different
indentation schemes are used in random parts of the code.

If you have a problem configuring your editor, just say so and someone
will explain how to do it.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sun Dec 14 17:30:02 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 14 Dec 2008 10:30:02 -0600
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
Message-ID: <18757.13450.115714.797824@montanaro-dyndns-org.local>


    Alexander> It looks like it has something to do with IPv6:

    Alexander> $ host svn.python.org svn.python.org has address
    Alexander> 82.94.164.164 svn.python.org has IPv6 address
    Alexander> 2001:888:2000:d::a4
    ...
    Alexander> No slowdown when IPv6 lookup is disabled with -4 option to
    Alexander> curl:
    ...

But I have no problem on my laptop which is sitting right next to the G5
which is having problems.  Both show an IPv6 address for svn.python.org.

Skip

From jyasskin at gmail.com  Sun Dec 14 18:43:28 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sun, 14 Dec 2008 09:43:28 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
Message-ID: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>

On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum <guido at python.org> wrote:
> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Guido van Rossum <guido <at> python.org> writes:
>>>
>>> I think we should not do this. We should use 4 space indents for new
>>> files, but existing files should not be reindented.
>>
>> Well, right now many files are indented with a mix of spaces and tabs, depending
>> on who did the edit and how their editor was configured at the time.
>
> That's  a shame. We used to have more rigorous standards than allowing that.
>
>> Perhaps a graceful policy would be to mandate that all new edits be made with
>> spaces without touching other functions in the file. Then hopefully the code
>> base would gradually converge to a tabless scheme.
>
> I don't think so. I find local consistency more important than global
> consistency. A file can become really hard to read when different
> indentation schemes are used in random parts of the code.
>
> If you have a problem configuring your editor, just say so and someone
> will explain how to do it.

I've never figured out how to configure emacs to deduce whether the
current file uses spaces or tabs and has a 4 or 8 space indent. I
always try to get it right anyway, but it'd be a lot more convenient
if my editor did it for me. If there are such instructions, perhaps
they should be added to PEPs 7 and 8?

Thanks,
Jeffrey

From solipsis at pitrou.net  Sun Dec 14 18:49:39 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 14 Dec 2008 17:49:39 +0000 (UTC)
Subject: [Python-Dev] Reindenting the C code base?
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID: <loom.20081214T174723-667@post.gmane.org>

Jeffrey Yasskin <jyasskin <at> gmail.com> writes:
> 
> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent.

Same question for Kate! Although I guess that if emacs isn't able to do it, Kate
won't do it either...

(Kate allows configuring on a directory basis, on a file extension basis, but
not on a filename basis)

Regards

Antoine.



From alexandre at peadrop.com  Sun Dec 14 18:54:15 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 12:54:15 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <loom.20081213T220617-917@post.gmane.org>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
Message-ID: <acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>

On Sat, Dec 13, 2008 at 5:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Guido van Rossum <guido <at> python.org> writes:
>>
>> I think we should not do this. We should use 4 space indents for new
>> files, but existing files should not be reindented.
>
> Well, right now many files are indented with a mix of spaces and tabs, depending
> on who did the edit and how their editor was configured at the time.
>

Personally, I think the indentation of, at least,
Objects/unicodeobject.c should be fixed. This file has become so
mixed-up with tab and space indents that I have no idea what to use
when I edit it. Just to give an idea how messy it is, there are 5214
lines indented with tabs and 4272 indented with spaces (out of the 9733
lines in the file).
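
A count like this can be reproduced with a few lines of Python. The
following is only a sketch of one way to do it (not the exact method
used for the numbers above); count_indent_styles is a hypothetical
helper name, and a line is classified purely by the characters in its
leading whitespace.

```python
def count_indent_styles(lines):
    """Classify each line by the characters used in its leading whitespace."""
    counts = {"tabs": 0, "spaces": 0, "mixed": 0, "none": 0}
    for line in lines:
        # Leading run of spaces and tabs, if any.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent:
            counts["none"] += 1
        elif set(indent) == {"\t"}:
            counts["tabs"] += 1
        elif set(indent) == {" "}:
            counts["spaces"] += 1
        else:
            counts["mixed"] += 1
    return counts
```

To run it on a file, pass an open file object, for example
count_indent_styles(open("Objects/unicodeobject.c")).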

-- Alexandre

From alexandre at peadrop.com  Sun Dec 14 18:57:14 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 12:57:14 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID: <acd65fa20812140957s86af12eh64663759d725e5d4@mail.gmail.com>

On Sun, Dec 14, 2008 at 12:43 PM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent. I
> always try to get it right anyway, but it'd be a lot more convenient
> if my editor did it for me. If there are such instructions, perhaps
> they should be added to PEPs 7 and 8?
>

I know python-mode is able to detect indent configuration of python
code automatically, but I don't know if c-mode is able to. Personally,

From alexandre at peadrop.com  Sun Dec 14 19:03:40 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:03:40 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <acd65fa20812140957s86af12eh64663759d725e5d4@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<acd65fa20812140957s86af12eh64663759d725e5d4@mail.gmail.com>
Message-ID: <acd65fa20812141003u78d6eai3d22234003c44953@mail.gmail.com>

On Sun, Dec 14, 2008 at 12:57 PM, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> On Sun, Dec 14, 2008 at 12:43 PM, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
>> I've never figured out how to configure emacs to deduce whether the
>> current file uses spaces or tabs and has a 4 or 8 space indent. I
>> always try to get it right anyway, but it'd be a lot more convenient
>> if my editor did it for me. If there are such instructions, perhaps
>> they should be added to PEPs 7 and 8?
>>
>
> I know python-mode is able to detect indent configuration of python
> code automatically, but I don't know if c-mode is able to. Personally,
>

[sorry, <tab><space> in gmail made it send my unfinished email]

Personally, I use auto-mode-alist to make Emacs choose the indent
configuration to use automatically.

Here's what it looks like for me:

(defmacro def-styled-c-mode (name style &rest body)
  "Define styled C modes."
  `(defun ,name ()
     (interactive)
     (c-mode)
     (c-set-style ,style)
     ,@body))

(def-styled-c-mode python-c-mode "python"
  (setq indent-tabs-mode t
        tab-width 8
        c-basic-offset 8))

(def-styled-c-mode py3k-c-mode "python"
  (setq indent-tabs-mode nil
        tab-width 4
        c-basic-offset 4))

(setq auto-mode-alist
  (append '(("/python.org/python/.*\\.[ch]\\'" . python-c-mode)
            ("/python.org/.*/.*\\.[ch]\\'" . py3k-c-mode)) auto-mode-alist))

From alexandre at peadrop.com  Sun Dec 14 19:19:17 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:19:17 -0500
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
Message-ID: <acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>

On Fri, Dec 12, 2008 at 11:39 AM, Lennart Regebro <regebro at gmail.com> wrote:
> The fix_imports fix seems to fix only the first import per line that you have.
> So if you do for example
>   import urllib2, cStringIO
> it will not fix cStringIO.
>
> Is this a bug or a feature? :-) If it's a feature it should warn at
> least, right?
>

Which revision of python are you using? I tried the test-case you gave
and 2to3 translated it perfectly.

-- Alexandre

alex at helios:~$ cat test.py
import urllib2, cStringIO

s = cStringIO.StringIO(urllib2.randombytes(100))
alex at helios:~$ 2to3 test.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- test.py (original)
+++ test.py (refactored)
@@ -1,3 +1,3 @@
-import urllib2, cStringIO
+import urllib.request, urllib.error, io

-s = cStringIO.StringIO(urllib2.randombytes(100))
+s = io.StringIO(urllib2.randombytes(100))
RefactoringTool: Files that need to be modified:
RefactoringTool: test.py

From regebro at gmail.com  Sun Dec 14 19:34:35 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 14 Dec 2008 19:34:35 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
	<acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>
Message-ID: <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>

On Sun, Dec 14, 2008 at 19:19, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> Which revision of python are you using? I tried the test-case you gave
> and 2to3 translated it perfectly.

3.0; I haven't tried with trunk yet, and possibly it's a more
complicated use case.
-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From alexandre at peadrop.com  Sun Dec 14 19:49:06 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:49:06 -0500
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
	<acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>
	<319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
Message-ID: <acd65fa20812141049i49db36e7y63c22b731fee1e7f@mail.gmail.com>

On Sun, Dec 14, 2008 at 1:34 PM, Lennart Regebro <regebro at gmail.com> wrote:
> On Sun, Dec 14, 2008 at 19:19, Alexandre Vassalotti
> <alexandre at peadrop.com> wrote:
>> Which revision of python are you using? I tried the test-case you gave
>> and 2to3 translated it perfectly.
>
> 3.0, I haven't tried with trunk yet, and possibly it's a more
> complicated usecase.

Strange, fix_imports in Python 3.0 (final) looks fine. If you can come
up with a reproducible example, please open a bug on bugs.python.org
and set me as the assignee (my user id is alexandre.vassalotti).

Thanks,
-- Alexandre

From regebro at gmail.com  Sun Dec 14 20:02:01 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 14 Dec 2008 20:02:01 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <acd65fa20812141049i49db36e7y63c22b731fee1e7f@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
	<acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>
	<319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
	<acd65fa20812141049i49db36e7y63c22b731fee1e7f@mail.gmail.com>
Message-ID: <319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com>

On Sun, Dec 14, 2008 at 19:49, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
>> 3.0, I haven't tried with trunk yet, and possibly it's a more
>> complicated usecase.
>
> Strange, fix_imports in Python 3.0 (final) looks fine. If you can come
> up with a reproducible example, please open a bug on bugs.python.org
> and set me as the assignee (my user id is alexandre.vassalotti).

Actually, it wasn't more complex, but it was completely different. It
has nothing to do with the number of imports per statement; it's
specifically having urlparse in the imports that breaks it. I'll
open a bug report.

From regebro at gmail.com  Sun Dec 14 20:08:09 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 14 Dec 2008 20:08:09 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
	<acd65fa20812141019i59416ed9t725dea5df86009d7@mail.gmail.com>
	<319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
	<acd65fa20812141049i49db36e7y63c22b731fee1e7f@mail.gmail.com>
	<319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com>
Message-ID: <319e029f0812141108j291e3fb1n70512fb7c20b0947@mail.gmail.com>

On Sun, Dec 14, 2008 at 20:02, Lennart Regebro <regebro at gmail.com> wrote:
> On Sun, Dec 14, 2008 at 19:49, Alexandre Vassalotti
> <alexandre at peadrop.com> wrote:
>>> 3.0, I haven't tried with trunk yet, and possibly it's a more
>>> complicated usecase.
>>
>> Strange, fix_imports in Python 3.0 (final) looks fine. If you can come
>> up with a reproducible example, please open a bug on bugs.python.org
>> and set me as the assignee (my user id is alexandre.vassalotti).
>
> Actually, it wasn't more complex, but it was completely different. It
> doesn't have anything with the amount of statements, but it's
> specifically if you have urlparse in the imports that breaks it. I'll
> open a bug report.

I couldn't assign it to you, so here goes:
http://bugs.python.org/issue4664


-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From martin at v.loewis.de  Sun Dec 14 20:55:40 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 20:55:40 +0100
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>	
	<49448E33.9080506@v.loewis.de>	
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
Message-ID: <494564BC.4020000@v.loewis.de>

> I don't know is this is related

It shouldn't be. AFAIK, buildbot makes its internet connections
through twisted, and twisted doesn't use IPv6. Also, the diagnostic
(cannot resolve name) doesn't match a connectivity problem.

> $ time curl -v -o /dev/null http://svn.python.org
> * About to connect() to svn.python.org port 80 (#0)
> *   Trying 2001:888:2000:d::a4... Operation timed out

Hmm. Can you debug this further?

Do you have IPv6 connectivity at all? Do you have a global v6
address? What happens if you do a v6 traceroute?

Regards,
Martin


From martin at v.loewis.de  Sun Dec 14 21:42:10 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 21:42:10 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID: <49456FA2.70900@v.loewis.de>

> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent.

If it is now official policy that different files use different styles,
then I think it would be helpful to put Emacs variables at the end of
each file. See the end of Objects/unicodeobject.c for an example.

I'm not aware of a builtin function that adjusts c-mode automatically;
I could find a package that does some basic guessing, though:

http://members.iinet.net.au/~bethandmark/elisp/mst-guess-indentation.el
http://www.emacswiki.org/cgi-bin/emacs/guess-offset.el

I've tried the second one briefly. It guesses c-basic-offset fairly
well, but doesn't attempt to guess indent-tabs-mode. The following
package does; I haven't tried it yet:

https://savannah.nongnu.org/projects/dtrt-indent/
http://git.savannah.gnu.org/gitweb/?p=dtrt-indent.git;a=blob_plain;f=dtrt-indent.el;hb=HEAD
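
To illustrate the kind of guessing involved, here is a rough Python
sketch. It is not how any of the packages above work internally; it is
only an assumed heuristic: take the most common increase in indentation
between consecutive non-blank lines as the offset, and say the file
uses tabs if more indented lines start with a tab than with a space.
The name guess_indent and the default tab width of 8 are my own
choices.

```python
from collections import Counter


def guess_indent(lines, tab_width=8):
    """Return (offset, uses_tabs) guessed from the given source lines.

    offset is the most common indentation increase between consecutive
    non-blank lines (None if there is none); uses_tabs is True when
    tab-led indentation outnumbers space-led indentation.
    """
    steps = Counter()
    tab_lines = space_lines = 0
    previous_width = 0
    for line in lines:
        stripped = line.lstrip(" \t")
        if not stripped.strip():
            continue  # ignore blank lines
        indent = line[:len(line) - len(stripped)]
        if indent.startswith("\t"):
            tab_lines += 1
        elif indent:
            space_lines += 1
        width = len(indent.expandtabs(tab_width))
        if width > previous_width:
            steps[width - previous_width] += 1
        previous_width = width
    offset = steps.most_common(1)[0][0] if steps else None
    return offset, tab_lines > space_lines
```

The two results correspond roughly to c-basic-offset and
indent-tabs-mode. A real implementation would also have to cope with
continuation lines and comments, which this sketch ignores.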

Regards,
Martin

From martin at v.loewis.de  Sun Dec 14 21:43:47 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 21:43:47 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>
	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
Message-ID: <49457003.5060104@v.loewis.de>

> Personally, I think the indentation of, at least,
> Objects/unicodeobject.c should be fixed. This file has become so
> mixed-up with tab and space indents that I have no-idea what to use
> when I edit it. Just to give an idea how messy it is, they are 5214
> lines indented with tabs and 4272 indented with spaces (out the 9733
> of the file).

As an Emacs variables block is present in the file, I would consider
this normative, and declare that the official indenting is 4 spaces
for the file, no tabs.

Regards,
Martin

From dickinsm at gmail.com  Sun Dec 14 21:49:31 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 14 Dec 2008 20:49:31 +0000
Subject: [Python-Dev] How to force export of a particular symbol from
	python.exe?
Message-ID: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com>

Hi all,

I'm having some trouble making some bits of the Python core code
available to extension modules.  Specifically,  I'm trying to add a
function 'Py_force_to_memory' to Python/pymath.c and then use
it (via a macro) from Modules/cmathmodule.c. But importing of
the cmath module fails with a 'Symbol not found' error.  The
function is declared with a 'PyAPI_FUNC' in Python/pymath.h.

Here's the relevant portion of the make output:

*** WARNING: renaming "cmath" since importing it failed:
dlopen(build/lib.macosx-10.3-i386-2.7/cmath.so, 2): Symbol not found:
_Py_force_to_memory
  Referenced from:
/Users/dickinsm/python_source/branches/trunk/build/lib.macosx-10.3-i386-2.7/cmath.so
  Expected in: dynamic lookup

This is a non-debug trunk build, on OS X (10.5.5), with all
the defaults.  I'm using Apple's standard toolchain (gcc 4.0.1,
Darwin linker).  The patch I'm building with can be seen at:

http://bugs.python.org/issue4575

(It's the first of the two patches there, called 'force_to_memory.patch'.)

I think I understand the cause of this problem;  I just don't know how
to fix it.  The cause seems to be that none of the symbols in pymath.o
is used in the Python executable;  they're used only in the extension
modules.  So while the '_Py_force_to_memory' symbol appears in
libpython2.7.a, it doesn't appear in the python.exe executable;  hence
the above error.
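
A quick way to confirm this kind of diagnosis from inside the
interpreter is to ask ctypes whether the running Python exports a given
symbol. This is only a sketch; symbol_is_exported is a hypothetical
helper name, and it relies on ctypes.pythonapi raising AttributeError
when a symbol cannot be found. Note that the lookup uses the C-level
name, without the leading underscore that the OS X linker message
shows.

```python
import ctypes


def symbol_is_exported(name, lib=None):
    """Return True if `name` can be looked up in `lib`.

    `lib` defaults to ctypes.pythonapi, which refers to the Python
    executable or shared library of the running interpreter.
    """
    if lib is None:
        lib = ctypes.pythonapi
    try:
        getattr(lib, name)  # ctypes resolves the symbol on attribute access
    except AttributeError:
        return False
    return True
```

With the patch applied, symbol_is_exported("Py_force_to_memory") would
be expected to return False in the failing build and True once the
function is actually linked into the executable.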

If I move the definition of Py_force_to_memory from Python/pymath.c
to Objects/floatobject.c then everything works as expected.

Questions:

(1) Is this an OS X only problem?

(2) Is there an easy way to force a particular symbol (or all the
symbols from a particular object file) to be exported in the Python
executable, so that it's available to a dynamically loaded extension
module?

I've found the -u option to gcc, but this doesn't seem like a
particularly portable solution.  Of course, if this problem exists
only on OS X, then the solution doesn't need to be portable.

Thanks,

Mark

From martin at v.loewis.de  Sun Dec 14 21:53:09 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 21:53:09 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <loom.20081214T174723-667@post.gmane.org>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<loom.20081214T174723-667@post.gmane.org>
Message-ID: <49457235.7060701@v.loewis.de>

> Same question for Kate! Although I guess that if emacs isn't able to do it, Kate
> won't do it either...
> 
> (Kate allows configuring on a directory basis, on a file extension basis, but
> not on a filename basis)

I guess it would be possible to write a Kate plugin that does that.

Regards,
Martin

From martin at v.loewis.de  Sun Dec 14 22:06:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 22:06:19 +0100
Subject: [Python-Dev] How to force export of a particular symbol from
 python.exe?
In-Reply-To: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com>
References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com>
Message-ID: <4945754B.3010201@v.loewis.de>

> (1) Is this an OS X only problem?

Probably not. If nothing of pymath.c is actually needed when linking
the python executable, pymath.o will be excluded by the linker.

> (2) Is there an easy way to force a particular symbol (or all the
> symbols from a particular object file) to be exported in the Python
> executable, so that it's available to a dynamically loaded extension
> module?

That's not the issue. Had pymath.o been linked into python, its
symbols would have been exported (is that proper use of English
tenses?)

To fix this, I see three solutions:

1. Explicitly link the module to extensions which are known to
   require it, e.g. by explicitly adding it to the sources in
   setup.py. That might cause duplications, but would IMO be
   the cleanest solution (python.exe has no business in exporting
   standard math functions, IMO)

2. Explicitly link pymath.o to python.exe, instead of integrating
   it into libpythonxy.a. If the symbols need to be exposed through
   python.exe (for whatever reason), this is the clean way to do it.

3. Implicitly force linkage, by adding a dummy symbol to pymath.o
   which gets referenced from an object known to be linked into
   the interpreter. This has the least impact on the build process,
   but is the most hackish approach (IMO).

Regards,
Martin

From solipsis at pitrou.net  Sun Dec 14 22:08:14 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 14 Dec 2008 21:08:14 +0000 (UTC)
Subject: [Python-Dev] Reindenting the C code base?
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<loom.20081214T174723-667@post.gmane.org>
	<49457235.7060701@v.loewis.de>
Message-ID: <loom.20081214T210649-728@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> I guess it would be possible to write a Kate plugin that does that.

Or perhaps more simply, Kate allows modelines at the beginning and at the end of
source files. I don't know if it's ok to add these to the code base though.




From alexander.belopolsky at gmail.com  Sun Dec 14 22:52:36 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 16:52:36 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <494564BC.4020000@v.loewis.de>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
	<494564BC.4020000@v.loewis.de>
Message-ID: <d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>

Please see below for more svn debugging, but now I also traced down
the delays I observe when I go to bugs.python.org to the same issue.
The offending download is the style sheet and that explains why curl
does not show it when pointed to the main page:

$ curl -v -o /dev/null http://python.org/styles/screen-switcher-default.css
* About to connect() to python.org port 80 (#0)
*   Trying 2001:888:2000:d::a2... Operation timed out

The offending main page element is:
$ curl  http://bugs.python.org 2>/dev/null | grep screen-switcher-default
<link media="screen"
href="http://python.org/styles/screen-switcher-default.css"
type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />


On Sun, Dec 14, 2008 at 2:55 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
..
>> $ time curl -v -o /dev/null http://svn.python.org
>> * About to connect() to svn.python.org port 80 (#0)
>> *   Trying 2001:888:2000:d::a4... Operation timed out
>
> Hmm. Can you debug this further?
>
> Do you have IPv6 connectivity at all?
I don't think so.

> Do you have a global v6 address?
No, only private inet6 address:

$ ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	inet6 fe80::21f:5bff:fef3:c0a4%en0 prefixlen 64 scopeid 0x4
	inet 192.168.1.6 netmask 0xffffff00 broadcast 192.168.1.255
...

> What happens if you do a v6 traceroute?
>
$ traceroute6 -v svn.python.org
traceroute6 to svn.python.org (2001:888:2000:d::a4) from
fdbd:a375:403a:51c6:21f:5bff:fef3:c0a4, 30 hops max, 12 byte packets
 1  *
24 bytes from fe80::216:cbff:fec1:c94c%en0 to
fe80::21f:5bff:fef3:c0a4: icmp type 136 (Neighbor Advertisement) code
0
0000: fe800000 00000000 0216cbff fec1c94c
0010: 00000000 00000000

32 bytes from fe80::216:cbff:fec1:c94c%en0 to
fe80::21f:5bff:fef3:c0a4: icmp type 135 (Neighbor Solicitation) code 0
0000: fe800000 00000000 021f5bff fef3c0a4
0010: 01010016 cbc1c94c 00000000 00000000
 * *
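
The addresses in the output above fall into different scopes, which is
why there is no usable route to svn.python.org over v6. As a rough
sketch (Python 3 syntax, standard ipaddress module; the classify()
helper and its labels are made up for illustration and are not part of
any tool used in this thread), the scopes can be checked like this:

```python
import ipaddress


def classify(addr):
    """Return a rough scope label for an IPv4 or IPv6 address string."""
    # Drop a zone id such as "%en0" before parsing.
    ip = ipaddress.ip_address(addr.split("%")[0])
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return "other"


# Addresses copied from the ifconfig and traceroute6 output above.
for a in ("fe80::21f:5bff:fef3:c0a4%en0",
          "fdbd:a375:403a:51c6:21f:5bff:fef3:c0a4",
          "192.168.1.6",
          "2001:888:2000:d::a4"):
    print(a, "->", classify(a))
```

Only the last address (the server's) lands outside the link-local and
private ranges; none of the local interface addresses do.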

From dickinsm at gmail.com  Sun Dec 14 22:57:41 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 14 Dec 2008 21:57:41 +0000
Subject: [Python-Dev] How to force export of a particular symbol from
	python.exe?
In-Reply-To: <4945754B.3010201@v.loewis.de>
References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com>
	<4945754B.3010201@v.loewis.de>
Message-ID: <5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com>

On Sun, Dec 14, 2008 at 9:06 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> That's not the issue. Had pymath.o been linked into python, its
> symbols would have been exported (is that proper use of English
> tenses?)

Sounds right to me.

>
> To fix this, I see three solutions
>
> [...]

Thanks for this;  this gives me a clearer idea of how things might
be solved.

> (python.exe has no business in exporting
> standard math functions, IMO)

It's a little bit messy:  some bits of pymath.c (hypot, and possibly
copysign) are needed in the core, but only on platforms whose
math libraries haven't caught up with C99.  The rest is only
(possibly) needed in the math and cmath modules.  In fact,
on OS X none of pymath.c is needed at all, which results in
lots of "ranlib: file: libpython2.7.a(pymath.o) has no symbols"
in the build output...

I'll try to find a non-hackish solution.

Mark

From alexander.belopolsky at gmail.com  Sun Dec 14 23:03:17 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 17:03:17 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
	<494564BC.4020000@v.loewis.de>
	<d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>
Message-ID: <d38f5330812141403k3115d6bbs5d84dfc191302af0@mail.gmail.com>

I've found a work-around in Firefox: go to the about:config page and change
network.dns.disableIPv6 to true.

Does anyone know a similar setting in Safari?

On Sun, Dec 14, 2008 at 4:52 PM, Alexander Belopolsky
<alexander.belopolsky at gmail.com> wrote:
> Please see below for more svn debugging, but now I also traced down
> the delays I observe when I go to bugs.python.org to the same issue.
> The offending download is the style sheet and that explains why curl
> does not show it when pointed to the main page:
>
> $ curl -v -o /dev/null http://python.org/styles/screen-switcher-default.css
> * About to connect() to python.org port 80 (#0)
> *   Trying 2001:888:2000:d::a2... Operation timed out
>
> The offending main page element is:
> $ curl  http://bugs.python.org 2>/dev/null | grep screen-switcher-default
> <link media="screen"
> href="http://python.org/styles/screen-switcher-default.css"
> type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />
>
>
> On Sun, Dec 14, 2008 at 2:55 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> ..
>>> $ time curl -v -o /dev/null http://svn.python.org
>>> * About to connect() to svn.python.org port 80 (#0)
>>> *   Trying 2001:888:2000:d::a4... Operation timed out
>>
>> Hmm. Can you debug this further?
>>
>> Do you have IPv6 connectivity at all?
> I don't think so.
>
>> Do you have a global v6 address?
> No, only private inet6 address:
>
> $ ifconfig en0
> en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>        inet6 fe80::21f:5bff:fef3:c0a4%en0 prefixlen 64 scopeid 0x4
>        inet 192.168.1.6 netmask 0xffffff00 broadcast 192.168.1.255
> ...
>
>> What happens if you do a v6 traceroute?
>>
> $ traceroute6 -v svn.python.org
> traceroute6 to svn.python.org (2001:888:2000:d::a4) from
> fdbd:a375:403a:51c6:21f:5bff:fef3:c0a4, 30 hops max, 12 byte packets
>  1  *
> 24 bytes from fe80::216:cbff:fec1:c94c%en0 to
> fe80::21f:5bff:fef3:c0a4: icmp type 136 (Neighbor Advertisement) code
> 0
> 0000: fe800000 00000000 0216cbff fec1c94c
> 0010: 00000000 00000000
>
> 32 bytes from fe80::216:cbff:fec1:c94c%en0 to
> fe80::21f:5bff:fef3:c0a4: icmp type 135 (Neighbor Solicitation) code 0
> 0000: fe800000 00000000 021f5bff fef3c0a4
> 0010: 01010016 cbc1c94c 00000000 00000000
>  * *
>

From martin at v.loewis.de  Sun Dec 14 23:15:41 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 14 Dec 2008 23:15:41 +0100
Subject: [Python-Dev] How to force export of a particular symbol from
 python.exe?
In-Reply-To: <5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com>
References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com>	
	<4945754B.3010201@v.loewis.de>
	<5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com>
Message-ID: <4945858D.4050309@v.loewis.de>

> It's a little bit messy:  some bits of pymath.c (hypot, and possibly
> copysign) are needed in the core, but only on platforms whose
> math libraries haven't caught up with C99.

It would be possible to only build the module if it defines any
functions; that should be checked in configure.

Alternatively, I believe that autoconf offers a mechanism to
have fallback functions in files named like the function; autoconf
will then build itself a list of all additional source files.
Using that would require splitting pymath.c into multiple files.

Regards,
Martin


From martin at v.loewis.de  Sun Dec 14 23:18:33 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 23:18:33 +0100
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <d38f5330812141403k3115d6bbs5d84dfc191302af0@mail.gmail.com>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>	
	<49448E33.9080506@v.loewis.de>	
	<18757.4353.603639.60602@montanaro-dyndns-org.local>	
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>	
	<494564BC.4020000@v.loewis.de>	
	<d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>
	<d38f5330812141403k3115d6bbs5d84dfc191302af0@mail.gmail.com>
Message-ID: <49458639.4020507@v.loewis.de>

> I've found a work-around in Firefox: go to the about:config page and change
> network.dns.disableIPv6 to true.

I'd advise against using such a work-around. The infrastructure is
designed to cope with that case transparently; if it is not transparent,
your system must be somehow misconfigured (it could also be the case
that applications are buggy - but I don't think this is the case you
are facing). The proper solution is to fix your system (although I'm
still uncertain what precisely the problem might be).

Regards,
Martin

From alexander.belopolsky at gmail.com  Sun Dec 14 23:38:12 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 17:38:12 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <49458639.4020507@v.loewis.de>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>
	<49448E33.9080506@v.loewis.de>
	<18757.4353.603639.60602@montanaro-dyndns-org.local>
	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>
	<494564BC.4020000@v.loewis.de>
	<d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>
	<d38f5330812141403k3115d6bbs5d84dfc191302af0@mail.gmail.com>
	<49458639.4020507@v.loewis.de>
Message-ID: <d38f5330812141438x667a9572x87c074a7abe75c63@mail.gmail.com>

On Sun, Dec 14, 2008 at 5:18 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I've found a work-around in Firefox: go to the about:config page and change
>> network.dns.disableIPv6 to true.
>
> I'd advise against using such a work-around. The infrastructure is
> designed to cope with that case transparently; if it is not transparent,
> your system must be somehow misconfigured ...

I've never had similar issues with any site other than those in
python.org domain and I had these problems with bugs.python.org on
several systems in different locations.

Another work-around, which happens to work for all browsers and svn, is
to disable IPv6 in network preferences (my system is Mac OS 10.5.5).
Since I don't have IPv6 connectivity, I think this is a solution I can
live with, but I wonder why is it necessary for python.org to be
registered as both an IPv4 and v6 domain?  Google does not do that:

$ host google.com
google.com has address 72.14.205.100
google.com has address 74.125.45.100
google.com has address 209.85.171.100
google.com mail is handled by 10 smtp4.google.com.
google.com mail is handled by 10 smtp1.google.com.
google.com mail is handled by 10 smtp2.google.com.
google.com mail is handled by 10 smtp3.google.com.
$ host ipv6.google.com
ipv6.google.com is an alias for ipv6.l.google.com.
ipv6.l.google.com has IPv6 address 2001:4860:0:2001::68

From martin at v.loewis.de  Sun Dec 14 23:57:59 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 14 Dec 2008 23:57:59 +0100
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <d38f5330812141438x667a9572x87c074a7abe75c63@mail.gmail.com>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local>	<49448E33.9080506@v.loewis.de>	<18757.4353.603639.60602@montanaro-dyndns-org.local>	<d38f5330812140807s3e81128pb9b9c23c7f2d6858@mail.gmail.com>	<494564BC.4020000@v.loewis.de>	<d38f5330812141352q2cddfb8ex7bc295733342d78b@mail.gmail.com>	<d38f5330812141403k3115d6bbs5d84dfc191302af0@mail.gmail.com>	<49458639.4020507@v.loewis.de>
	<d38f5330812141438x667a9572x87c074a7abe75c63@mail.gmail.com>
Message-ID: <49458F77.7050709@v.loewis.de>

> live with, but I wonder why is it necessary for python.org to be
> registered as both an IPv4 and v6 domain?  Google does not do that:

Google is working on changing that:

http://www3.ietf.org/proceedings/08jul/slides/plenaryw-4.pdf

Other systems have been doing it for many years now:

martin at mira:~$ host www.freebsd.org
www.freebsd.org has address 69.147.83.33
www.freebsd.org has IPv6 address 2001:4f8:fff6::21

Regards,
Martin

From alexander.belopolsky at gmail.com  Mon Dec 15 04:12:41 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 22:12:41 -0500
Subject: [Python-Dev] sys.stdout.write encoding failure
Message-ID: <d38f5330812141912o6ca13f56n40c680655ee350c5@mail.gmail.com>

There is currently a unit test in the trunk that fails in verbose mode:

$ ./python.exe Lib/test/test_doctest.py -v
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position
338-339: ordinal not in range(128)

Apparently, the problem is that stdout cannot encode non-ascii characters:

>>> sys.stdout.write(u'f\xf6\xf6')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
1-2: ordinal not in range(128)

which is strange because

>>> sys.stdout.encoding
'UTF-8'

and print has no problem with the same string:
>>> print u'f\xf6\xf6'
föö


Where does the 'ascii' codec come from?

From jeremy at alum.mit.edu  Mon Dec 15 05:06:51 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Sun, 14 Dec 2008 23:06:51 -0500
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
Message-ID: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>

This bug is pretty serious, because urllib will insert garbage into
the application-visible data for a chunked response.  It simply
ignores the fact that it's reading a chunked response and includes the
chunked header data as payload data.  The original bug was reported in
September, but no one noticed it.  It was reported again recently.
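
To make the failure mode concrete, here is a minimal sketch of what
decoding a chunked body involves. It is not the urllib/http.client code
and not the eventual patch; decode_chunked() is a hypothetical helper
written for illustration, it assumes the whole body is already in memory
as bytes, and it ignores trailers.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked message body (bytes) into its payload."""
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk starts with "<hex size>[;extensions]\r\n".
        eol = body.index(b"\r\n", pos)
        size_field = body[pos:eol].split(b";", 1)[0]
        size = int(size_field.strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; optional trailers are ignored in this sketch
        payload += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(payload)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(raw))
```

A reader that skips this step hands the size lines ("4", "5", "0") to
the application mixed in with the real data, which is the garbage
described above.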

http://bugs.python.org/issue3761
http://bugs.python.org/issue4631

I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
that's not my call.

Jeremy

From g.brandl at gmx.net  Mon Dec 15 09:20:44 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 15 Dec 2008 09:20:44 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID: <gi5429$ov$1@ger.gmane.org>

Jeffrey Yasskin schrieb:
> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum <guido at python.org> wrote:
>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> Guido van Rossum <guido <at> python.org> writes:
>>>>
>>>> I think we should not do this. We should use 4 space indents for new
>>>> files, but existing files should not be reindented.
>>>
>>> Well, right now many files are indented with a mix of spaces and tabs, depending
>>> on who did the edit and how their editor was configured at the time.
>>
>> That's  a shame. We used to have more rigorous standards than allowing that.
>>
>>> Perhaps a graceful policy would be to mandate that all new edits be made with
>>> spaces without touching other functions in the file. Then hopefully the code
>>> base would gradually converge to a tabless scheme.
>>
>> I don't think so. I find local consistency more important than global
>> consistency. A file can become really hard to read when different
>> indentation schemes are used in random parts of the code.
>>
>> If you have a problem configuring your editor, just say so and someone
>> will explain how to do it.
> 
> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent. I
> always try to get it right anyway, but it'd be a lot more convenient
> if my editor did it for me. If there are such instructions, perhaps
> they should be added to PEPs 7 and 8?

I use this little hack to detect indentation in Python's C files:

(defun c-select-style ()
  "Hack: Select the C style to use from buffer indentation."
  (save-excursion
    (if (re-search-forward "^\t" 3000 t)
        (c-set-style "python")
      (c-set-style "python-new"))))

(add-hook 'c-mode-hook 'c-select-style)

-- where "python" and "python-new" are two appropriate c-mode styles.
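
For readers who don't use Emacs, the same heuristic can be sketched in
Python (Python 3 syntax). This is only an illustration of the idea, not
an existing tool: indent_style() and count_indents() are hypothetical
names, and the 3000-character limit simply mirrors the bound used in the
elisp above.

```python
def indent_style(text, limit=3000):
    """Guess 'tabs' or 'spaces' from the first `limit` characters."""
    for line in text[:limit].splitlines():
        if line.startswith("\t"):
            return "tabs"
    return "spaces"


def count_indents(text):
    """Return (tab_indented, space_indented) line counts for a file's text."""
    lines = text.splitlines()
    tabs = sum(1 for line in lines if line.startswith("\t"))
    spaces = sum(1 for line in lines if line.startswith(" "))
    return tabs, spaces


sample = "int f(void)\n{\n\treturn 0;\n}\n"
print(indent_style(sample), count_indents(sample))
```

count_indents() gives the kind of per-file tally quoted elsewhere in
this thread for Objects/unicodeobject.c.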

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From eckhardt at satorlaser.com  Mon Dec 15 09:40:00 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Mon, 15 Dec 2008 09:40:00 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <aac2c7cb0812120112rec02ecdjd9436801c28568e@mail.gmail.com>
References: <ca471dc20812042114hf9b2c44t436c5a4e9b3e3831@mail.gmail.com> 
	<200812120931.16231.eckhardt@satorlaser.com> 
	<aac2c7cb0812120112rec02ecdjd9436801c28568e@mail.gmail.com>
Message-ID: <200812150940.00352.eckhardt@satorlaser.com>

On Friday 12 December 2008, Adam Olsen wrote:
> Only pages like this, which indicate the underlying API is an array of
> WCHAR:
>
> http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx

Hmm, true. So even there, the encoding isn't known...

> char * is just fine.  You need only pass a length along with it.  All
> internal APIs *must* already do this, as they support nul bytes.  Also
> note that the underlying POSIX APIs prohibit nul bytes in filenames,
> so it's irrelevant for them.

Hmmm, I see things like Py_GetPath() in the 2.7 source code, which returns a 
plain char*. I really need to check if 3.0 is better.

Thanks for the info

Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



From amauryfa at gmail.com  Mon Dec 15 09:47:31 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Mon, 15 Dec 2008 09:47:31 +0100
Subject: [Python-Dev] sys.stdout.write encoding failure
In-Reply-To: <d38f5330812141912o6ca13f56n40c680655ee350c5@mail.gmail.com>
References: <d38f5330812141912o6ca13f56n40c680655ee350c5@mail.gmail.com>
Message-ID: <e27efe130812150047l793412f0x28475e99cbd0bab6@mail.gmail.com>

Hi,

Alexander Belopolsky wrote:
> There is currently a unit test in the trunk that fails in verbose mode:
>
> $ ./python.exe Lib/test/test_doctest.py -v
> ...
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 338-339: ordinal not in range(128)
>
> Apparently, the problem is that stdout cannot encode non-ascii characters:
>
>>>> sys.stdout.write(u'f\xf6\xf6')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 1-2: ordinal not in range(128)
>
> which is strange because
>
>>>> sys.stdout.encoding
> 'UTF-8'
>
> and print has no problem with the same string:
>>>> print u'f\xf6\xf6'
> föö
>
>
> Where does the 'ascii' codec come from?

It's the default value of sys.getdefaultencoding().

sys.stdout.write() expects a byte string. What you see here is the
implicit coercion of the unicode object to a byte string, which uses
that default encoding rather than sys.stdout.encoding.
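
A small sketch of the difference, written in Python 3 syntax so it runs
on a current interpreter (the thread itself is about the 2.x trunk).
to_bytes() is a hypothetical helper that stands in for the coercion; the
encoding is passed explicitly instead of being picked up implicitly.

```python
def to_bytes(text, encoding):
    """Encode text explicitly, standing in for the implicit 2.x coercion."""
    return text.encode(encoding)


text = "f\xf6\xf6"

try:
    # Roughly what the implicit coercion does with the 'ascii' default.
    to_bytes(text, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc)

# Encoding explicitly with the terminal's encoding succeeds.
encoded = to_bytes(text, "utf-8")
print(encoded)
```

In 2.x terms, encoding the string yourself before calling
sys.stdout.write() avoids relying on the default encoding.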

-- 
Amaury Forgeot d'Arc

From mal at egenix.com  Mon Dec 15 11:27:29 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 15 Dec 2008 11:27:29 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <49457003.5060104@v.loewis.de>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
	<49457003.5060104@v.loewis.de>
Message-ID: <49463111.6040800@egenix.com>

On 2008-12-14 21:43, Martin v. Löwis wrote:
>> Personally, I think the indentation of, at least,
>> Objects/unicodeobject.c should be fixed. This file has become so
>> mixed-up with tab and space indents that I have no idea what to use
>> when I edit it. Just to give an idea how messy it is, there are 5214
>> lines indented with tabs and 4272 indented with spaces (out of the 9733
>> lines in the file).
> 
> As an Emacs variables block is present in the file, I would consider
> this normative, and declare that the official indenting is 4 spaces
> for the file, no tabs.

All the Unicode C code I wrote at the time used 4 space indents. I
would welcome this being restored. It got diluted over time.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


From steve at holdenweb.com  Mon Dec 15 14:59:21 2008
From: steve at holdenweb.com (Steve Holden)
Date: Mon, 15 Dec 2008 08:59:21 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com>
Message-ID: <gi5nro$165$1@ger.gmane.org>

Miguel Lobo wrote:
>> I think we should not do this. We should use 4 space indents for new
>> files, but existing files should not be reindented. If you reindent,
>> much of the history of the file is essentially lost -- "svn blame"
>> will blame whoever reindented the code, and it's a pain to go back.
> 
> I believe "svn blame -x -w" ignores whitespace changes.
> 
Sounds like Uncle Timmy's whitespace management needs to become a little
 more draconian.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From josiah.carlson at gmail.com  Mon Dec 15 19:50:42 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Mon, 15 Dec 2008 10:50:42 -0800
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <49443D6A.9020308@v.loewis.de>
References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de>
	<49443D6A.9020308@v.loewis.de>
Message-ID: <e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>

Would anyone mind terribly if I backported a version of:
http://bugs.python.org/issue4501 to 2.4 and 2.5?

It fixes some strange duplicate data issues on poll() with packets
with a nonstandard flag set.

 - Josiah

On Sat, Dec 13, 2008 at 2:55 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Christian Heimes wrote:
>> Martin v. Löwis schrieb:
>>> 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases
>>> will only include security fixes. According to the release notes, over
>>> 100 bugs and patches have been addressed since Python 2.5.1, many of
>>                                                           ^^^^
>>
>> Do you really mean 2.5.1?
>
> Oops, no - although the statement is technically correct; since 2.5.2,
> only 80 bugs have been added :-)
>
> Thanks for pointing that out.
>
> Martin

From jeremy at alum.mit.edu  Mon Dec 15 20:19:39 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Mon, 15 Dec 2008 14:19:39 -0500
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
Message-ID: <e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>

I have a patch that appears to fix this bug
http://bugs.python.org/file12361/urllib-chunked.diff
but I'm not sure about its interaction with the io module and
RawIOBase.  Is there a new IO expert who could take a look at it for
me?

Jeremy

On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> This bug is pretty serious, because urllib will insert garbage into
> the application-visible data for a chunked response.  It simply
> ignores the fact that it's reading a chunked response and includes the
> chunked header data as payload data.  The original bug was reported in
> September, but no one noticed it.  It was reported again recently.
>
> http://bugs.python.org/issue3761
> http://bugs.python.org/issue4631
>
> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
> that's not my call.
>
> Jeremy
>

From brett at python.org  Mon Dec 15 20:21:27 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 15 Dec 2008 11:21:27 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <gi5429$ov$1@ger.gmane.org>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<gi5429$ov$1@ger.gmane.org>
Message-ID: <bbaeab100812151121v3103b0e0qd260b0e22215856a@mail.gmail.com>

On Mon, Dec 15, 2008 at 00:20, Georg Brandl <g.brandl at gmx.net> wrote:
> Jeffrey Yasskin schrieb:
>> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum <guido at python.org> wrote:
>>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>>> Guido van Rossum <guido <at> python.org> writes:
>>>>>
>>>>> I think we should not do this. We should use 4 space indents for new
>>>>> files, but existing files should not be reindented.
>>>>
>>>> Well, right now many files are indented with a mix of spaces and tabs, depending
>>>> on who did the edit and how their editor was configured at the time.
>>>
>>> That's  a shame. We used to have more rigorous standards than allowing that.
>>>
>>>> Perhaps a graceful policy would be to mandate that all new edits be made with
>>>> spaces without touching other functions in the file. Then hopefully the code
>>>> base would gradually converge to a tabless scheme.
>>>
>>> I don't think so. I find local consistency more important than global
>>> consistency. A file can become really hard to read when different
>>> indentation schemes are used in random parts of the code.
>>>
>>> If you have a problem configuring your editor, just say so and someone
>>> will explain how to do it.
>>
>> I've never figured out how to configure emacs to deduce whether the
>> current file uses spaces or tabs and has a 4 or 8 space indent. I
>> always try to get it right anyway, but it'd be a lot more convenient
>> if my editor did it for me. If there are such instructions, perhaps
>> they should be added to PEPs 7 and 8?
>
> I use this little hack to detect indentation in Python's C files:
>
> (defun c-select-style ()
>  "Hack: Select the C style to use from buffer indentation."
>  (save-excursion
>    (if (re-search-forward "^\t" 3000 t)
>        (c-set-style "python")
>      (c-set-style "python-new"))))
>
> (add-hook 'c-mode-hook 'c-select-style)
>
> -- where "python" and "python-new" are two appropriate c-mode styles.
>

Anyone have something similar for Vim?

-Brett

From mike.klaas at gmail.com  Mon Dec 15 20:40:47 2008
From: mike.klaas at gmail.com (Mike Klaas)
Date: Mon, 15 Dec 2008 11:40:47 -0800
Subject: [Python-Dev] Psyco for -OO or -O
In-Reply-To: <4943B885.1070605@voidspace.org.uk>
References: <ghvpt8$qq0$1@ger.gmane.org> <4943B885.1070605@voidspace.org.uk>
Message-ID: <8EEA1438-116A-4226-8C01-E32F36445D00@gmail.com>


On 13-Dec-08, at 5:28 AM, Michael Foord wrote:

> Lie Ryan wrote:
>> I'm sure most of you know about psyco[1], the optimizer.
>> Python has -O and -OO flags that are intended to be optimization
>> flags, but we know that currently they don't do much. Why not add
>> psyco to the standard library and let -O or -OO invoke psyco?
>>
>
> This really belongs on Python-ideas and not Python-dev.
>
> The main reason why not is that someone(s) from the Python core team  
> would then need to 'own' maintaining Psyco (which is x86 only as well

Worse, it is 32bit only, which has greatly diminished its usefulness  
in the last few years.

-Mike

From guido at python.org  Mon Dec 15 21:59:30 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Dec 2008 12:59:30 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
Message-ID: <ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>

Aha! A specific file. I'm supportive of fixing that specific file. Now
if you can figure out how to do it and still allow merging between 2.6
and 3.0 that would be cool.

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Sun, Dec 14, 2008 at 9:54 AM, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> On Sat, Dec 13, 2008 at 5:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Guido van Rossum <guido <at> python.org> writes:
>>>
>>> I think we should not do this. We should use 4 space indents for new
>>> files, but existing files should not be reindented.
>>
>> Well, right now many files are indented with a mix of spaces and tabs, depending
>> on who did the edit and how their editor was configured at the time.
>>
>
> Personally, I think the indentation of, at least,
> Objects/unicodeobject.c should be fixed. This file has become so
> mixed-up with tab and space indents that I have no idea what to use
> when I edit it. Just to give an idea of how messy it is, there are 5214
> lines indented with tabs and 4272 indented with spaces (out of the 9733
> in the file).
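
The tab/space tally quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: `count_indent_styles` is a hypothetical helper name, not a tool mentioned in the thread, and it classifies a line purely by its first character.

```python
def count_indent_styles(lines):
    """Return (tab_indented, space_indented) line counts.

    Hypothetical helper for illustration; a line is classified only by
    its first character, so unindented and blank lines are ignored.
    """
    tabs = spaces = 0
    for line in lines:
        if line.startswith("\t"):
            tabs += 1
        elif line.startswith(" "):
            spaces += 1
    return tabs, spaces


sample = ["\tint x;", "    int y;", "int z;", "\treturn x;"]
print(count_indent_styles(sample))  # (2, 1)
```

Passing an open file object instead of a list should work the same way, since files iterate line by line. The figures quoted in the message were not re-derived here.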

From jmurphy41 at mac.com  Mon Dec 15 20:59:51 2008
From: jmurphy41 at mac.com (Jim Murphy)
Date: Mon, 15 Dec 2008 14:59:51 -0500
Subject: [Python-Dev] How to force export of a particular symbol from
 python.exe?
Message-ID: <E28CD280-3CCC-4848-8499-19BCB3A47C2D@mac.com>

Martin:

You wrote:

   "That's not the issue. Had pymath.o been linked into python, it's
    symbols would have been exported (is that proper use of English
    tenses?)"

Yes, it's a proper and idiomatic use of the subjunctive mood, which
many native (American) English speakers manage to mangle.

I also noticed you wrote the following a few emails later on the
python-dev list:

     "Using that would require to split pymath.c into multiple files."

My ear tells me that either "that would require splitting pymath.c ..."
or "that would require one to split pymath.c ..." is much more grammatical
than "that would require to split ...," but I can't cite a rule. It is
frequently acceptable to use either the infinitive or the gerund form of
a verb, which would imply that "to split" should be interchangeable with
"splitting," but I believe that some verbs have preferences for one form
over the other.

My ear seems to be thinking of the template "require someone to do
something," and rebels at hearing the "to" without a "someone."  That's
the best excuse for a rule I could come up with.

I actually spent a half-hour trying to find rules on the uses in English
of infinitives versus gerunds and did not find anything definitive. I
realize now, to my disgust, that English usage is very badly afflicted
with "special casing."


Jim Murphy
326 Sunnyview Lane
Ithaca, New York  14850-6258
Tel (home): +1 607-319-4161




From martin at v.loewis.de  Mon Dec 15 22:08:22 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 15 Dec 2008 22:08:22 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>
References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de>	
	<49443D6A.9020308@v.loewis.de>
	<e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>
Message-ID: <4946C746.3020009@v.loewis.de>

> Would anyone mind terribly if I backported a version of:
> http://bugs.python.org/issue4501 to 2.4 and 2.5?

Yes, I would. These branches are frozen right now until the
final release is made. Afterwards, only security-critical patches
are allowed, which this one is not, AFAICT.

> It fixes some strange duplicate data issues on poll() with packets
> with a nonstandard flag set.

People experiencing this should upgrade to 2.6 (when it is fixed there).

Regards,
Martin

From steve at holdenweb.com  Mon Dec 15 22:09:49 2008
From: steve at holdenweb.com (Steve Holden)
Date: Mon, 15 Dec 2008 16:09:49 -0500
Subject: [Python-Dev] How to force export of a particular symbol from
	python.exe?
In-Reply-To: <E28CD280-3CCC-4848-8499-19BCB3A47C2D@mac.com>
References: <E28CD280-3CCC-4848-8499-19BCB3A47C2D@mac.com>
Message-ID: <4946C79D.5060109@holdenweb.com>

Jim Murphy wrote:
> Martin:
> 
> You wrote:
> 
>   "That's not the issue. Had pymath.o been linked into python, it's
>    symbols would have been exported (is that proper use of English
>    tenses?)"
> 
It does, however, make the common mistake of putting an apostrophe in a
possessive personal pronoun.

> Yes, it's a proper and idiomatic use of the subjunctive mood, which
> many native (American) English speakers manage to mangle.
> 
> I also noticed you wrote the following a few emails later on the
> python-dev list:
> 
>     "Using that would require to split pymath.c into multiple files."
> 
> My ear tells me that either "that would require splitting pymath.c ..."
> or "that would require one to split pymath.c ..." is much more grammatical
> than "that would require to split ...," but I can't cite a rule. It is
> frequently acceptable to use either the infinitive or the gerund form of
> a verb, which would imply that "to split" should be interchangeable with
> "splitting," but I believe that some verbs have preferences for one form
> over the other.
> 
> My ear seems to be thinking of the template "require someone to do
> something," and rebels at hearing the "to" without a "someone."  That's
> the  best excuse for a rule I could come up with.
> 
> I actually spent a half-hour trying to find rules on the uses in English of
> infinitives versus gerunds and did not find anything definitive. I realize
> now, to my disgust,  that English usage is very badly afflicted with
> "special casing."
> 
This is only significant because Martin is a perfectionist who wants to
write better English. I can't remember a time when his
slightly-less-than-perfect command of the language rendered anything he
wrote incomprehensible.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From victor.stinner at haypocalc.com  Mon Dec 15 22:14:16 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 15 Dec 2008 22:14:16 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>
References: <49443710.3060102@v.loewis.de> <49443D6A.9020308@v.loewis.de>
	<e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>
Message-ID: <200812152214.16260.victor.stinner@haypocalc.com>

On Monday 15 December 2008 19:50:42, Josiah Carlson wrote:
> Would anyone mind terribly if I backported a version of:
> http://bugs.python.org/issue4501 to 2.4 and 2.5?

First the patch has to be reviewed and at least applied to trunk :-)

Can you give a short example to describe the bug? Maybe write a unit test?

I don't know poll(), so I can't help with this issue.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ncoghlan at gmail.com  Mon Dec 15 22:17:57 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 16 Dec 2008 07:17:57 +1000
Subject: [Python-Dev] How to force export of a particular symbol from
 python.exe?
In-Reply-To: <4946C79D.5060109@holdenweb.com>
References: <E28CD280-3CCC-4848-8499-19BCB3A47C2D@mac.com>
	<4946C79D.5060109@holdenweb.com>
Message-ID: <4946C985.40701@gmail.com>

Steve Holden wrote:
> This is only significant because Martin is a perfectionist who wants to
> write better English. I can't remember a time when his
> slightly-less-than-perfect command of the language rendered anything he
> wrote incomprehensible.

I'd actually criticise the written communication abilities of many of my
native English speaking friends long before I'd criticise the English
writing of most of the non-native English speakers on this list (i.e.
most of the writing here is of a higher standard than many native
English speakers could manage).

This particular thread of discussion does appear to be veering a little
off topic though :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Mon Dec 15 22:24:18 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 15 Dec 2008 22:24:18 +0100
Subject: [Python-Dev] How to force export of a particular symbol from
 python.exe?
In-Reply-To: <4946C985.40701@gmail.com>
References: <E28CD280-3CCC-4848-8499-19BCB3A47C2D@mac.com>	<4946C79D.5060109@holdenweb.com>
	<4946C985.40701@gmail.com>
Message-ID: <4946CB02.6000206@v.loewis.de>

> This particular thread of discussion does appear to be veering a little
> off topic though :)

And I apologize for starting it :-)

Martin

From scott+python-dev at scottdial.com  Mon Dec 15 22:25:24 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Mon, 15 Dec 2008 16:25:24 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
	<ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
Message-ID: <4946CB44.9070800@scottdial.com>

Guido van Rossum wrote:
> Aha! A specific file. I'm supportive of fixing that specific file. Now
> if you can figure out how to do it and still allow merging between 2.6
> and 3.0 that would be cool.

Like "svn blame", you can use "svn merge -x -w" to avoid merging
whitespace changes. However, svnmerge.py does not support any of these
command-line flags being passed along to the svn command-line. It should
be pretty easy to hack in, if it was desirable.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From martin at v.loewis.de  Mon Dec 15 22:28:42 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 15 Dec 2008 22:28:42 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>	<loom.20081213T220617-917@post.gmane.org>	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
	<ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
Message-ID: <4946CC0A.6050109@v.loewis.de>

> Aha! A specific file. I'm supportive of fixing that specific file. Now
> if you can figure out how to do it and still allow merging between 2.6
> and 3.0 that would be cool.

In the specific case, I think it's best to fix the 2.7 source, and then
merge the changes into 3k. The 3.x version is still similar to the 2.x
version, except for a number of additions (such as interning).

The changes should probably then also be merged into the 2.6 and 3.0
branches, to allow easy merging in the future. Backporting to 2.5 will
become difficult; it will also become unnecessary.

Regards,
Martin

From alexandre at peadrop.com  Mon Dec 15 22:40:47 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 15 Dec 2008 16:40:47 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<acd65fa20812140954s349b14fpd989cc08fd60bd86@mail.gmail.com>
	<ca471dc20812151259u5cf555c3xad75d894a099fba1@mail.gmail.com>
Message-ID: <acd65fa20812151340v5243479arca03a79c2fd035cf@mail.gmail.com>

On Mon, Dec 15, 2008 at 3:59 PM, Guido van Rossum <guido at python.org> wrote:
> Aha! A specific file. I'm supportive of fixing that specific file. Now
> if you can figure out how to do it and still allow merging between 2.6
> and 3.0 that would be cool.
>

Here's the simplest solution I've thought of so far to allow smooth merging
subsequently. First, fix the 2.6 version with 4-space indent. Over a
third of the file is already using spaces for indentation, so I don't
think losing consistency is a big deal. Then, block the trunk commit
with svnmerge to prevent it from being merged back to the py3k branch.
Finally, fix the 3.0 version.

-- Alexandre

From josiah.carlson at gmail.com  Mon Dec 15 22:58:34 2008
From: josiah.carlson at gmail.com (Josiah Carlson)
Date: Mon, 15 Dec 2008 13:58:34 -0800
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1
In-Reply-To: <200812152214.16260.victor.stinner@haypocalc.com>
References: <49443710.3060102@v.loewis.de> <49443D6A.9020308@v.loewis.de>
	<e6511dbf0812151050u3e848b14pe38fd83be9e18718@mail.gmail.com>
	<200812152214.16260.victor.stinner@haypocalc.com>
Message-ID: <e6511dbf0812151358h6ec96273w3b71f0f02fa30d73@mail.gmail.com>

On Mon, Dec 15, 2008 at 1:14 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Monday 15 December 2008 19:50:42, Josiah Carlson wrote:
>> Would anyone mind terribly if I backported a version of:
>> http://bugs.python.org/issue4501 to 2.4 and 2.5?
>
> First the patch has to be reviewed and at least applied to trunk :-)
>
> Can you give a short example to describe the bug? Maybe write a unit test?
>
> I don't know poll(), so I can't help with this issue.

One of our 3rd party users of asyncore, pyftpdlib by Giampaolo Rodola,
discovered a duplicate data issue related to data with the urgent data
flag attached to TCP packets.  I don't know the underlying source of
the issue (it smells like a buffer duplication bug, but I can't see
that asyncore is doing it), but the patch does fix the issue.

But with policies being "only security issues are backported to 2.4
and 2.5", and this is definitely not a security issue, I won't
backport it.

 - Josiah

From skip at pobox.com  Tue Dec 16 01:44:05 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 15 Dec 2008 18:44:05 -0600
Subject: [Python-Dev] [Python-3000] python-3000 list is closed
In-Reply-To: <4946D870.7000308@v.loewis.de>
References: <4946D870.7000308@v.loewis.de>
Message-ID: <18758.63957.659035.419926@montanaro-dyndns-org.local>


    Martin> The mailing list python-3000 at python.org is now closed. All
    Martin> further discussion of Python 3.x takes place on
    Martin> python-dev at python.org.

Maybe set up a simple email alias reflecting python-3000 to python-dev?

Skip

From barry at python.org  Tue Dec 16 02:07:19 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 15 Dec 2008 20:07:19 -0500
Subject: [Python-Dev] [Python-3000] python-3000 list is closed
In-Reply-To: <18758.63957.659035.419926@montanaro-dyndns-org.local>
References: <4946D870.7000308@v.loewis.de>
	<18758.63957.659035.419926@montanaro-dyndns-org.local>
Message-ID: <22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 15, 2008, at 7:44 PM, skip at pobox.com wrote:

>
>    Martin> The mailing list python-3000 at python.org is now closed. All
>    Martin> further discussion of Python 3.x takes place on
>    Martin> python-dev at python.org.
>
> Maybe set up a simple email alias reflecting python-3000 to python-dev?


Or,

https://launchpad.net/replybot

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSUb/R3EjvBPtnXfVAQJ4sgP/Wy8ma4nzcYQ5gXVCw2TpODq5l/duzB+I
f3ej5tSyvI2wzf+OTQQwth5A0xySB8LoGbSQsYwhvbA+3xXOe1lIYeVYUGru9Y4T
xs1axRgydTwxAFgHBdjrY7tLhXH4GOed0xYvbu6b3tRslb+4agmOhluX4WCBRZH+
sgIW0XL7nsI=
=Nrdo
-----END PGP SIGNATURE-----

From brad at python.org  Tue Dec 16 02:55:09 2008
From: brad at python.org (Brad Knowles)
Date: Mon, 15 Dec 2008 19:55:09 -0600
Subject: [Python-Dev] [Python-3000] python-3000 list is closed
In-Reply-To: <22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org>
References: <4946D870.7000308@v.loewis.de>
	<18758.63957.659035.419926@montanaro-dyndns-org.local>
	<22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org>
Message-ID: <49470A7D.3000301@python.org>

Barry Warsaw wrote:

>> Maybe set up a simple email alias reflecting python-3000 to python-dev?
> 
> 
> Or,
> 
> https://launchpad.net/replybot

If we're going to leave something configured in Mailman, it already has an 
auto-reply functionality.  It would be nearly trivial to set that up.

-- 
Brad Knowles <brad at python.org>
Member of the Python.org Postmaster Team & Co-Moderator of the
mailman-users and mailman-developers mailing lists

From bharat.satsangi at gmail.com  Tue Dec 16 06:02:18 2008
From: bharat.satsangi at gmail.com (bharat satsangi)
Date: Tue, 16 Dec 2008 10:32:18 +0530
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <bbaeab100812151121v3103b0e0qd260b0e22215856a@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<gi5429$ov$1@ger.gmane.org>
	<bbaeab100812151121v3103b0e0qd260b0e22215856a@mail.gmail.com>
Message-ID: <15c21a910812152102i6e0bf349k6981f4f04ff5808d@mail.gmail.com>

please unsubscribe me



On Tue, Dec 16, 2008 at 12:51 AM, Brett Cannon <brett at python.org> wrote:

> On Mon, Dec 15, 2008 at 00:20, Georg Brandl <g.brandl at gmx.net> wrote:
> > Jeffrey Yasskin wrote:
> >> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum <guido at python.org> wrote:
> >>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >>>> Guido van Rossum <guido <at> python.org> writes:
> >>>>>
> >>>>> I think we should not do this. We should use 4 space indents for new
> >>>>> files, but existing files should not be reindented.
> >>>>
> >>>> Well, right now many files are indented with a mix of spaces and tabs, depending
> >>>> on who did the edit and how their editor was configured at the time.
> >>>
> >>> That's a shame. We used to have more rigorous standards than allowing that.
> >>>
> >>>> Perhaps a graceful policy would be to mandate that all new edits be made with
> >>>> spaces without touching other functions in the file. Then hopefully the code
> >>>> base would gradually converge to a tabless scheme.
> >>>
> >>> I don't think so. I find local consistency more important than global
> >>> consistency. A file can become really hard to read when different
> >>> indentation schemes are used in random parts of the code.
> >>>
> >>> If you have a problem configuring your editor, just say so and someone
> >>> will explain how to do it.
> >>
> >> I've never figured out how to configure emacs to deduce whether the
> >> current file uses spaces or tabs and has a 4 or 8 space indent. I
> >> always try to get it right anyway, but it'd be a lot more convenient
> >> if my editor did it for me. If there are such instructions, perhaps
> >> they should be added to PEPs 7 and 8?
> >
> > I use this little hack to detect indentation in Python's C files:
> >
> > (defun c-select-style ()
> >  "Hack: Select the C style to use from buffer indentation."
> >  (save-excursion
> >    (if (re-search-forward "^\t" 3000 t)
> >        (c-set-style "python")
> >      (c-set-style "python-new"))))
> >
> > (add-hook 'c-mode-hook 'c-select-style)
> >
> > -- where "python" and "python-new" are two appropriate c-mode styles.
> >
>
> Anyone have something similar for Vim?
>
> -Brett



-- 
Thanks and Regards

Bharat
+91-9888674137

From tjreedy at udel.edu  Tue Dec 16 08:15:33 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 16 Dec 2008 02:15:33 -0500
Subject: [Python-Dev] [Python-3000] python-3000 list is closed
In-Reply-To: <18758.63957.659035.419926@montanaro-dyndns-org.local>
References: <4946D870.7000308@v.loewis.de>
	<18758.63957.659035.419926@montanaro-dyndns-org.local>
Message-ID: <gi7kim$q7e$1@ger.gmane.org>

skip at pobox.com wrote:
>     Martin> The mailing list python-3000 at python.org is now closed. All
>     Martin> further discussion of Python 3.x takes place on
>     Martin> python-dev at python.org.
> 
> Maybe set up a simple email alias reflecting python-3000 to python-dev?

It is currently mirrored to news.gmane.org (which is how I have read it 
and all other Python lists).  I presume they will want to keep their 
archive (or maybe not), but whatever bot is set up might take their bots 
into consideration, unless there is a way to explicitly close it on 
their end.  (I am ignorant of this sort of stuff, but appreciate the 
mirror.)

tjr


From kirklin.mcdonald at gmail.com  Tue Dec 16 08:58:09 2008
From: kirklin.mcdonald at gmail.com (Kirk McDonald)
Date: Mon, 15 Dec 2008 23:58:09 -0800
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <bbaeab100812151121v3103b0e0qd260b0e22215856a@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
	<loom.20081213T220617-917@post.gmane.org>
	<ca471dc20812140826n125fbb95r33f14fa727d39333@mail.gmail.com>
	<5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
	<gi5429$ov$1@ger.gmane.org>
	<bbaeab100812151121v3103b0e0qd260b0e22215856a@mail.gmail.com>
Message-ID: <25bd58d10812152358n38928e51j89fe19288f5a7cfd@mail.gmail.com>

On Mon, Dec 15, 2008 at 11:21 AM, Brett Cannon <brett at python.org> wrote:

> On Mon, Dec 15, 2008 at 00:20, Georg Brandl <g.brandl at gmx.net> wrote:
> > Jeffrey Yasskin wrote:
> >> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum <guido at python.org> wrote:
> >>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >>>> Guido van Rossum <guido <at> python.org> writes:
> >>>>>
> >>>>> I think we should not do this. We should use 4 space indents for new
> >>>>> files, but existing files should not be reindented.
> >>>>
> >>>> Well, right now many files are indented with a mix of spaces and tabs, depending
> >>>> on who did the edit and how their editor was configured at the time.
> >>>
> >>> That's a shame. We used to have more rigorous standards than allowing that.
> >>>
> >>>> Perhaps a graceful policy would be to mandate that all new edits be made with
> >>>> spaces without touching other functions in the file. Then hopefully the code
> >>>> base would gradually converge to a tabless scheme.
> >>>
> >>> I don't think so. I find local consistency more important than global
> >>> consistency. A file can become really hard to read when different
> >>> indentation schemes are used in random parts of the code.
> >>>
> >>> If you have a problem configuring your editor, just say so and someone
> >>> will explain how to do it.
> >>
> >> I've never figured out how to configure emacs to deduce whether the
> >> current file uses spaces or tabs and has a 4 or 8 space indent. I
> >> always try to get it right anyway, but it'd be a lot more convenient
> >> if my editor did it for me. If there are such instructions, perhaps
> >> they should be added to PEPs 7 and 8?
> >
> > I use this little hack to detect indentation in Python's C files:
> >
> > (defun c-select-style ()
> >  "Hack: Select the C style to use from buffer indentation."
> >  (save-excursion
> >    (if (re-search-forward "^\t" 3000 t)
> >        (c-set-style "python")
> >      (c-set-style "python-new"))))
> >
> > (add-hook 'c-mode-hook 'c-select-style)
> >
> > -- where "python" and "python-new" are two appropriate c-mode styles.
> >
>
> Anyone have something similar for Vim?
>
> -Brett
>

Something along the lines of:

:fu! Select_c_style()
:   " 'n' keeps the cursor in place while looking for a tab-indented line
:   if search('^\t', 'n')
:       setlocal noet
        " etc.
:   el
:       setlocal et
        " etc.
:   en
:endf
:au BufRead *.[ch] call Select_c_style()

-Kirk McDonald

From syfou at users.sourceforge.net  Tue Dec 16 09:44:06 2008
From: syfou at users.sourceforge.net (Sylvain Fourmanoit)
Date: Tue, 16 Dec 2008 03:44:06 -0500 (EST)
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
Message-ID: <alpine.LNX.2.00.0812160325400.10838@Turing>

On Sat, 13 Dec 2008, Guido van Rossum wrote:
> If you reindent, much of the history of the file is essentially lost -- 
> "svn blame" will blame whoever reindented the code, and it's a pain to 
> go back.

I am not a subversion specialist, but it appears this part can be handled 
gracefully by passing -b (ignore space change) to an external diff command 
svn blame can rely on (svn blame -x -ub ...). At least, it seems to work 
on my station (GNU Diffutils, Subversion 1.5.1)!

--
Sylvain

From techtonik at gmail.com  Tue Dec 16 10:26:55 2008
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 16 Dec 2008 11:26:55 +0200
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
References: <loom.20081213T211524-493@post.gmane.org>
	<ca471dc20812131326x13384e3eu8583445ea9aa1995@mail.gmail.com>
Message-ID: <d34314100812160126m1bb840ak7a5b9130aecd9ad4@mail.gmail.com>

On Sat, Dec 13, 2008 at 11:26 PM, Guido van Rossum <guido at python.org> wrote:
>
> I think we should not do this. We should use 4 space indents for new
> files, but existing files should not be reindented. If you reindent,
> much of the history of the file is essentially lost -- "svn blame"
> will blame whoever reindented the code, and it's a pain to go back.
> There's also the issue of merging between the 2.x and 3.x branches,
> which we still do.

"svnadmin dump" produces pretty munchable text file to pretend that
there were no tabs at all. The problem may be syncing existing working
copies with the new repository.
http://svnbook.red-bean.com/en/1.5/svn.ref.svnadmin.c.dump.html

An svn pre-commit hook can be used to reject any unescaped tabs in future commits.
http://svnbook.red-bean.com/en/1.5/svn.ref.reposhooks.pre-commit.html

Adding a pre-commit hook is better than adding editor-specific comments,
because it doesn't require your editor to support the syntax -
regardless of editor you will have to convert tabs to spaces
anyway.
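
A pre-commit check of the kind described above could look roughly like this. It is a sketch under assumptions: `tab_indented_additions` is a hypothetical helper, the input is assumed to be unified-diff text for the pending transaction (for example the output of `svnlook diff`), and a real hook would also have to exit non-zero to reject the commit.

```python
def tab_indented_additions(diff_text):
    """Return the added lines of a unified diff that start with a tab.

    Hypothetical helper for illustration only. An added line whose own
    content begins with "++" would be mistaken for a file header here.
    """
    offending = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" marks a file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            content = line[1:]
            if content.startswith("\t"):
                offending.append(content)
    return offending


diff = "+++ Objects/example.c\n+\tint x;\n+    int y;\n-\told line;\n"
print(tab_indented_additions(diff))  # ['\tint x;']
```

Only added lines are inspected, so existing tab-indented code would not block a commit that leaves it untouched.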

-- 
--anatoly t.

From krstic at solarsail.hcs.harvard.edu  Tue Dec 16 22:37:26 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Tue, 16 Dec 2008 16:37:26 -0500
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
In-Reply-To: <49417293.50506@v.loewis.de>
References: <200812101206.49316.victor.stinner@haypocalc.com>
	<49404CEB.8040900@v.loewis.de>
	<B5342F9C-6344-4390-AA07-91945A82AF3B@solarsail.hcs.harvard.edu>
	<49417293.50506@v.loewis.de>
Message-ID: <68D5B02F-A716-4E66-86FF-B50A0FAEFF4E@solarsail.hcs.harvard.edu>

On Dec 11, 2008, at 3:05 PM, Martin v. Löwis wrote:
> If it is actually possible to print a stack trace, that could be  
> useful indeed. I'm then skeptical that this is possible in the  
> general case (i.e. displaying the full C stack), but displaying  
> (parts of) the Python stack might be possible. I think it should  
> still proceed to dump core, so that you can then inspect the core  
> with a proper debugger.


+1. Victor, any interest in attempting to retool your patch in this  
direction?

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From krstic at solarsail.hcs.harvard.edu  Tue Dec 16 22:43:40 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Tue, 16 Dec 2008 16:43:40 -0500
Subject: [Python-Dev] The endless GIL debate: why not remove
	thread	support instead?
In-Reply-To: <49443B7F.8020602@v.loewis.de>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com>	<4943B974.6020407@voidspace.org.uk>	<ca471dc20812130814l374f0a37y82d4e7c1dffa596f@mail.gmail.com>
	<gi0pi4$f0j$1@ger.gmane.org> <49443B7F.8020602@v.loewis.de>
Message-ID: <8C214F23-C8D3-49F3-BC9B-0D945218EB0E@solarsail.hcs.harvard.edu>

On Dec 13, 2008, at 5:47 PM, Martin v. Löwis wrote:
> They were originally invented in 1965, on Multics (1970) they were  
> used to perform compilation in the background. When Unix came along,  
> it *added* address space separation, introducing what is now known  
> as processes.


Yes, and a lot of the subsequent interest in threads came due to the  
historically debilitating overhead of fork() on some important Unices,  
notably Solaris.

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From solipsis at pitrou.net  Tue Dec 16 22:53:23 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 16 Dec 2008 21:53:23 +0000 (UTC)
Subject: [Python-Dev] Calling the GC less often when there are lots of
	long-lived objects
Message-ID: <loom.20081216T214305-45@post.gmane.org>


Hello,

There are recurring complaints about the garbage collector degrading performance
when lots of objects are created in a row. In issue #4074, I've proposed a patch
which basically implements Martin's suggestion in
http://mail.python.org/pipermail/python-dev/2008-June/080579.html to base the
decision to do a full collection on the ratio between the number of objects
surviving the (n-1) generation collection and the number of long-lived objects.
I've also added a condition so that this new behaviour is only triggered when
there are more than 10000 long-lived objects -- therefore, cycles will still get
collected quickly in lightweight programs. In Gregory's simple test of storing
many tuples in a list, the behaviour has indeed changed from exponential to
linear.
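
The heuristic described above can be sketched roughly as follows. This is a
toy model, not the code from the patch: the names long_lived_pending and
long_lived_total, the 25% ratio and the batch size are all assumptions made
for illustration.

```python
def count_full_collections(n_objects, batch=7000, ratio=0.25, use_ratio=True):
    """Count full collections while n_objects long-lived objects pile up.

    Every `batch` allocations a full collection becomes a candidate. With
    use_ratio=False it always runs (the old policy in this toy model). With
    use_ratio=True it runs only when the objects that survived since the
    last full collection exceed `ratio` times the long-lived total.
    """
    long_lived_total = 0
    long_lived_pending = 0
    full_collections = 0
    for _ in range(0, n_objects, batch):
        long_lived_pending += batch
        if not use_ratio or long_lived_pending > long_lived_total * ratio:
            full_collections += 1
            long_lived_total += long_lived_pending
            long_lived_pending = 0
    return full_collections


old_policy = count_full_collections(700_000, use_ratio=False)
new_policy = count_full_collections(700_000)
```

Because each full collection walks every long-lived object, running it on
every candidate makes the total work grow much faster than the number of
objects, while the ratio test spaces the full collections further and
further apart as the heap grows.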

Is anybody opposed to the principle of this proposal?

Antoine.



From greg.ewing at canterbury.ac.nz  Wed Dec 17 01:27:39 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 17 Dec 2008 13:27:39 +1300
Subject: [Python-Dev] Calling the GC less often when there are lots of
 long-lived objects
In-Reply-To: <loom.20081216T214305-45@post.gmane.org>
References: <loom.20081216T214305-45@post.gmane.org>
Message-ID: <4948477B.20609@canterbury.ac.nz>

Antoine Pitrou wrote:
> I've proposed a patch
> which basically implements Martin's suggestion in
> http://mail.python.org/pipermail/python-dev/2008-June/080579.html
> 
> Is anybody opposed to the principle of this proposal?

Sounds okay to me.

-- 
Greg

From lists at cheimes.de  Wed Dec 17 01:51:21 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 17 Dec 2008 01:51:21 +0100
Subject: [Python-Dev] Calling the GC less often when there are lots of
 long-lived objects
In-Reply-To: <loom.20081216T214305-45@post.gmane.org>
References: <loom.20081216T214305-45@post.gmane.org>
Message-ID: <49484D09.4040202@cheimes.de>

Antoine Pitrou schrieb:
> Is anybody opposed to the principle of this proposal?

Is it reasonable to implement multiple policies so the user can switch
between them? Or is the new algorithm superior in all cases?

From solipsis at pitrou.net  Wed Dec 17 02:00:56 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 17 Dec 2008 01:00:56 +0000 (UTC)
Subject: [Python-Dev] Calling the GC less often when there are lots of
	long-lived objects
References: <loom.20081216T214305-45@post.gmane.org>
	<49484D09.4040202@cheimes.de>
Message-ID: <loom.20081217T005930-637@post.gmane.org>

Christian Heimes <lists <at> cheimes.de> writes:
> 
> Is it reasonable to implement multiple policies so the user can switch
> between them? Or is the new algorithm superior in all cases?

We could let the user configure the threshold between the old policy and the new
policy. Currently it is hard-wired to a value of 10000 (that is, 10000
long-lived objects tracked by the GC).




From martin.hellwig at dcuktec.org  Tue Dec 16 21:12:51 2008
From: martin.hellwig at dcuktec.org (Martin P. Hellwig)
Date: Tue, 16 Dec 2008 20:12:51 +0000
Subject: [Python-Dev] =?windows-1252?q?=5BANN=5D_EuroPython_2009_=96_Call_?=
 =?windows-1252?q?for_Participation!?=
Message-ID: <49480BC3.9030002@dcuktec.org>

On behalf of the EuroPython 2009 organisation it is my privilege and 
honour to announce the 'Call for Participation' for EuroPython 2009!
EuroPython is the conference for the communities around Python, 
including the Django, Zope and Plone communities.
This year's conference will be held in Birmingham, UK from Monday 29th 
June to Saturday 4th July 2009.

Talk & Themes
Do you have something you wish to present at EuroPython? Go to 
http://www.europython.eu/talks/cfp/ for this year's themes and 
submission criteria. The deadline is 5th April 2009.

Other Talks, Activities and Events
Have you got something which does not fit the above? Visit 
http://www.europython.eu/talks/ .

Help Us Out
We could use a hand; any contribution is welcome. Please take a look at 
http://www.europython.eu/contact/ .

Sponsors
A unique opportunity to affiliate with the prestigious EuroPython 
conference!
http://www.europython.eu/sponsors/

Spread the Word
Improve our publicity by distributing this announcement in your corner 
of the community, please coordinate this with the organizers: 
http://www.europython.eu/contact/

General Information
For more information about the conference, please visit 
http://www.europython.eu/

Looking forward to seeing you!

The EuroPython Team


From bioinformed at gmail.com  Wed Dec 17 16:37:08 2008
From: bioinformed at gmail.com (Kevin Jacobs <jacobs@bioinformed.com>)
Date: Wed, 17 Dec 2008 10:37:08 -0500
Subject: [Python-Dev] Calling the GC less often when there are lots of
	long-lived objects
In-Reply-To: <loom.20081217T005930-637@post.gmane.org>
References: <loom.20081216T214305-45@post.gmane.org>
	<49484D09.4040202@cheimes.de>
	<loom.20081217T005930-637@post.gmane.org>
Message-ID: <2e1434c10812170737q603acb03va3d3a46a459546dd@mail.gmail.com>

On Tue, Dec 16, 2008 at 8:00 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Christian Heimes <lists <at> cheimes.de> writes:
> >
> > Is it reasonable to implement multiple policies so the user can switch
> > between them? Or is the new algorithm superior in all cases?

I'll test your patch, as I currently have to micro-manage the garbage
collector in several of my algorithms or else they degenerate into
almost continuous collection.
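
For context, the kind of micro-management meant here usually looks like the
sketch below: switch the cyclic collector off around a burst of allocations
and restore it afterwards. The function build_without_gc is a hypothetical
example, not code from any of the algorithms mentioned.

```python
import gc


def build_without_gc(n):
    """Build n small tuples with the cyclic garbage collector paused."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return [(i, str(i)) for i in range(n)]
    finally:
        # Restore the previous state even if the allocation fails.
        if was_enabled:
            gc.enable()


state_before = gc.isenabled()
records = build_without_gc(1000)
state_after = gc.isenabled()
```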

Results in a day or two.

~Kevin

From guido at python.org  Wed Dec 17 19:05:29 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 17 Dec 2008 10:05:29 -0800
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
	<e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
Message-ID: <ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>

The inheritance from io.RawIOBase seems fine.

--Guido van Rossum (home page: http://www.python.org/~guido/)



On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> I have a patch that appears to fix this bug
> http://bugs.python.org/file12361/urllib-chunked.diff
> but I'm not sure about its interaction with the io module and
> RawIOBase.  Is there a new IO expert who could take a look at it for
> me?
>
> Jeremy
>
> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>> This bug is pretty serious, because urllib will insert garbage into
>> the application-visible data for a chunked response.  It simply
>> ignores the fact that it's reading a chunked response and includes the
>> chunked header data as payload data.  The original bug was reported in
>> September, but no one noticed it.  It was reported again recently.
>>
>> http://bugs.python.org/issue3761
>> http://bugs.python.org/issue4631
>>
>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
>> that's not my call.
>>
>> Jeremy
>>
>
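
To make the bug concrete: in a chunked response each piece of payload is
preceded by a line giving its size in hexadecimal and followed by CRLF, and
a zero-size chunk ends the body. A reader that ignores this framing hands
the size lines to the application as if they were data. The decoder below is
a minimal sketch of the framing only. The name decode_chunked is
hypothetical and this is not the code from the patch.

```python
import io


def decode_chunked(stream):
    """Return the payload of a chunked HTTP body read from a binary stream."""
    payload = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("truncated chunked body")
        # Anything after ';' on the size line is a chunk extension; ignore it.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        payload += chunk
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer lines up to the terminating blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(payload)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
naive = io.BytesIO(raw).read()            # framing leaks into the data
decoded = decode_chunked(io.BytesIO(raw))
```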

From martin at v.loewis.de  Wed Dec 17 22:05:50 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 17 Dec 2008 22:05:50 +0100
Subject: [Python-Dev] Please test OSX installer
Message-ID: <494969AE.3060805@v.loewis.de>

I just created an OSX installer for 2.5.3c1. As it's the first time
I do that, I'd appreciate if somebody could test it and report whether
it works (as well as the 2.5.2 one did).

http://www.python.org/download/releases/2.5.3/

Regards,
Martin

From solipsis at pitrou.net  Wed Dec 17 23:02:10 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 17 Dec 2008 22:02:10 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?Calling_the_GC_less_often_when_there_are_l?=
	=?utf-8?q?ots_of=09long-lived_objects?=
References: <loom.20081216T214305-45@post.gmane.org>
	<49484D09.4040202@cheimes.de>
	<loom.20081217T005930-637@post.gmane.org>
Message-ID: <loom.20081217T214619-273@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:
> 
> We could let the user configure the threshold between the old policy and the new
> policy. Currently it is hard-wired to a value of 10000 (that is, 10000
> long-lived objects tracked by the GC).

I've removed the threshold in the latest patches because it didn't make much
sense when a few long-lived objects contained a lot of objects not tracked by
the GC.

Another improvement I've included in the latest patches (but which is
orthogonal to the algorithmic change) is that simple tuples and even simple
dicts are not tracked by the GC if they don't need to be. A few examples
(gc.is_tracked() is a new function which returns True if an object is tracked
by the GC):

>>> import gc
>>> gc.is_tracked(())
False
>>> gc.is_tracked((1,2))
False
>>> gc.is_tracked((1,(2, "a", None)))
False
>>> gc.is_tracked((1,(2, "a", None, {})))
True

>>> d = {}
>>> gc.is_tracked(d)
False
>>> d[1,2] = 3,4
>>> gc.is_tracked(d)
False
>>> d[5] = None, "a", (1,2,3)
>>> gc.is_tracked(d)
False
>>> d[6] = {}
>>> gc.is_tracked(d)
True
>>> gc.is_tracked(d[6])
False

Regards

Antoine.



From guido at python.org  Wed Dec 17 23:03:38 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 17 Dec 2008 14:03:38 -0800
Subject: [Python-Dev] Please test OSX installer
In-Reply-To: <494969AE.3060805@v.loewis.de>
References: <494969AE.3060805@v.loewis.de>
Message-ID: <ca471dc20812171403r16a9745ci3452801edbdac75e@mail.gmail.com>

Worked flawlessly both on an x86 MacBook Pro running Leopard (10.5)
and a ppc PowerBook G4 running Tiger (10.4).

The only issue is that the Python logo makes the text in the sidebar
of the installer hard to read.

I didn't test the GUI app.

Thanks for doing this!

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Wed, Dec 17, 2008 at 1:05 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I just created an OSX installer for 2.5.3c1. As it's the first time
> I do that, I'd appreciate if somebody could test it and report whether
> it works (as well as the 2.5.2 one did).
>
> http://www.python.org/download/releases/2.5.3/

From alexander.belopolsky at gmail.com  Wed Dec 17 23:04:12 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 17 Dec 2008 17:04:12 -0500
Subject: [Python-Dev] Please test OSX installer
In-Reply-To: <494969AE.3060805@v.loewis.de>
References: <494969AE.3060805@v.loewis.de>
Message-ID: <d38f5330812171404w35fc1538o714d70ee8d88b52e@mail.gmail.com>

I've installed it on a MacBook Air running Leopard (10.5.6).
Installer ran like a charm, but when I ran the following in IDLE:

>>> from test.regrtest import main
>>> main()

I got a "Problem Report for Python" pop-up.  Skip to "///" for
"Problem Details".  Interestingly, the test completed with the
following report:

286 tests OK.
3 tests failed:
    test_descr test_file test_subprocess
...
3 skips unexpected on darwin:
    test_ioctl test_bsddb185 test_univnewlines

This suggests that the crash was in a subprocess.


///

Process:         Python [1203]
Path:            /Applications/MacPython 2.5/IDLE.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  Python [1027]

Date/Time:       2008-12-17 16:57:52.804 -0500
OS Version:      Mac OS X 10.5.6 (9G55)
Report Version:  6

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Crashed Thread:  0

Thread 0 Crashed:
0   libSystem.B.dylib    0x90c70e42 __kill + 10
1   libSystem.B.dylib    0x90ce323a raise + 26
2   libSystem.B.dylib    0x90cef679 abort + 73
3   org.python.python    0x004bd33f posix_getloadavg + 0 (posixmodule.c:7961)
4   org.python.python    0x0048571e PyEval_EvalFrameEx + 18973 (ceval.c:3596)
5   org.python.python    0x00487731 PyEval_EvalCodeEx + 1819 (ceval.c:2875)
6   org.python.python    0x004878e5 PyEval_EvalCode + 87 (ceval.c:520)
7   org.python.python    0x004ab810 PyRun_StringFlags + 243 (pythonrun.c:1273)
8   org.python.python    0x004ab8d7 PyRun_SimpleStringFlags + 72 (pythonrun.c:900)
9   org.python.python    0x004b84c5 Py_Main + 1296 (main.c:521)
10  Python               0x00001f8e 0x1000 + 3982
11  Python               0x00001eb5 0x1000 + 3765
Thread 0 crashed with X86 Thread State (32-bit):
  eax: 0x00000000  ebx: 0x90cef639  ecx: 0xbffff1dc  edx: 0x90c70e42
  edi: 0x008001c0  esi: 0x00000000  ebp: 0xbffff1f8  esp: 0xbffff1dc
   ss: 0x0000001f  efl: 0x00000282  eip: 0x90c70e42   cs: 0x00000007
   ds: 0x0000001f   es: 0x0000001f   fs: 0x00000000   gs: 0x00000037
  cr2: 0x0048d191

Binary Images:
    0x1000 -     0x1fff +Python ??? (???) /Applications/MacPython 2.5/IDLE.app/Contents/MacOS/Python
  0x3f1000 -   0x4e7fe3 +org.python.python 2.5a0 (2.5) /Library/Frameworks/Python.framework/Versions/2.5/Python
0x8fe00000 - 0x8fe2db43  dyld 97.1 (???) <100d362e03410f181a34e04e94189ae5> /usr/lib/dyld
0x90c02000 - 0x90d69ff3  libSystem.B.dylib ??? (???) <d68880dfb1f8becdbdac6928db1510fb> /usr/lib/libSystem.B.dylib
0x946bd000 - 0x946c1fff  libmathCommon.A.dylib ??? (???) /usr/lib/system/libmathCommon.A.dylib
0xffff0000 - 0xffff1780  libSystem.B.dylib ??? (???) /usr/lib/libSystem.B.dylib



On Wed, Dec 17, 2008 at 4:05 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> I just created an OSX installer for 2.5.3c1. As it's the first time
> I do that, I'd appreciate if somebody could test it and report whether
> it works (as well as the 2.5.2 one did).
>
> http://www.python.org/download/releases/2.5.3/
>
> Regards,
> Martin

From greg.ewing at canterbury.ac.nz  Wed Dec 17 23:52:36 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 18 Dec 2008 11:52:36 +1300
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <49423856.30705@gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
	<49423856.30705@gmail.com>
Message-ID: <494982B4.5040602@canterbury.ac.nz>

Nick Coghlan wrote:

> Actually, I believe 3.0 already took a big step towards allowing this by
> changing the way modules are initialised.

It's a step, but I wouldn't call it a big one. There are
many other problems to be solved before fully independent
interpreters are possible.

-- 
Greg

From martin at v.loewis.de  Wed Dec 17 23:55:54 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 17 Dec 2008 23:55:54 +0100
Subject: [Python-Dev] Calling the GC less often when there are lots of
 long-lived objects
In-Reply-To: <loom.20081217T214619-273@post.gmane.org>
References: <loom.20081216T214305-45@post.gmane.org>	<49484D09.4040202@cheimes.de>	<loom.20081217T005930-637@post.gmane.org>
	<loom.20081217T214619-273@post.gmane.org>
Message-ID: <4949837A.7080900@v.loewis.de>

> I've removed the threshold in the latest patches because it didn't make much
> sense when a few long-lived objects contained a lot of objects not tracked by
> the GC.
> 
> Another improvement I've included in the latest patches (but which is
> orthogonal to the algorithmic change) is that simple tuples and even simple
> dicts are not tracked by the GC if they don't need to be. A few examples
> (gc.is_tracked() is a new function which returns True if an object is tracked
> by the GC):

As they are orthogonal, I think they should be considered separately,
but in particular committed separately. FWIW, I'm in favor of both
(but haven't reviewed the non-cyclic tuples one yet).

So despite the organizational overhead, I'd appreciate if you could
create separate patches, if not separate issues.

Regards,
Martin

From solipsis at pitrou.net  Thu Dec 18 00:06:53 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 17 Dec 2008 23:06:53 +0000 (UTC)
Subject: [Python-Dev] Calling the GC less often when there are lots of
	long-lived objects
References: <loom.20081216T214305-45@post.gmane.org>	<49484D09.4040202@cheimes.de>	<loom.20081217T005930-637@post.gmane.org>
	<loom.20081217T214619-273@post.gmane.org>
	<4949837A.7080900@v.loewis.de>
Message-ID: <loom.20081217T230625-782@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> So despite the organizational overhead, I'd appreciate if you could
> create separate patches, if not separate issues.

Ok, I'm gonna do that.

Regards

Antoine.



From arnarbi at gmail.com  Thu Dec 18 02:33:35 2008
From: arnarbi at gmail.com (Arnar Birgisson)
Date: Thu, 18 Dec 2008 01:33:35 +0000
Subject: [Python-Dev] Atomic instructions for reference count
	increment/decrement
Message-ID: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com>

Hi all,

I'm new here, so bear with me. I tried googling this, but the closest
I came up with was a post from 2000.

From the discussion about getting rid of the GIL lately, what I read
from it is that reference counting is the main obstacle. My question
is, why aren't hardware supported atomic increments and decrements
being used for the reference counters? As far as I'm told they are
available on most modern platforms (on x86 it is the LOCK instruction
prefix) and these incur little overhead.

I'd be very happy with pointers to previous discussion on the matter
or simple arguments why this would not apply to the Python reference
counting mechanism.

cheers,
Arnar

From daniel at stutzbachenterprises.com  Thu Dec 18 04:18:26 2008
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Wed, 17 Dec 2008 21:18:26 -0600
Subject: [Python-Dev] Atomic instructions for reference count
	increment/decrement
In-Reply-To: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com>
References: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com>
Message-ID: <eae285400812171918h356749abv2411b0a4d320e0a7@mail.gmail.com>

On Wed, Dec 17, 2008 at 7:33 PM, Arnar Birgisson <arnarbi at gmail.com> wrote:

> From the discussion about getting rid of the GIL lately, what I read
> from it is that reference counting is the main obstacle. My question
> is, why aren't hardware supported atomic increments and decrements
> being used for the reference counters? As far as I'm told they are
> available on most modern platforms (on x86 it is the LOCK instruction
> prefix)

True.

> and these incur little overhead.

False, due to the costs of maintaining cache coherency.

> I'd be very happy with pointers to previous discussion on the matter
> or simple arguments why this would not apply to the Python reference
> counting mechanism.

Adam Olsen actually tried it.  See:
http://mail.python.org/pipermail/python-dev/2007-September/074645.html

Other messages in that thread describe the problem in more detail.
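
As a rough way to see why even a small per-operation cost matters, the
sketch below uses sys.getrefcount on CPython to show how many reference
count updates a single ordinary statement can cause. It measures nothing
about atomic instructions themselves. It only illustrates how often the
counters are touched.

```python
import sys

marker = object()
before = sys.getrefcount(marker)

aliases = [marker] * 1000      # one statement, 1000 reference count increments
during = sys.getrefcount(marker)

del aliases                    # and 1000 decrements when the list goes away
after = sys.getrefcount(marker)
```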

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From p.f.moore at gmail.com  Thu Dec 18 12:47:37 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 18 Dec 2008 11:47:37 +0000
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <494982B4.5040602@canterbury.ac.nz>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
	<49423856.30705@gmail.com> <494982B4.5040602@canterbury.ac.nz>
Message-ID: <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>

2008/12/17 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Nick Coghlan wrote:
>
>> Actually, I believe 3.0 already took a big step towards allowing this by
>> changing the way modules are initialised.
>
> It's a step, but I wouldn't call it a big one. There are
> many other problems to be solved before fully independent
> interpreters are possible.

Do you know if these remaining problems are listed anywhere? AIUI,
certain software (for example mod_python) has been using multiple
interpreters for a long while now - admittedly not without issues, but
certainly enough to imply that multiple interpreters are at least
"possible" - although not perfect. Experience with such software would
probably be a great guide to where the issues exist.

Maybe a page on the Python Wiki, or a FAQ entry, would be useful here.
If only to make things explicit, and clear up some of the FUD around
multiple interpreters.

Paul.

From jeremy at alum.mit.edu  Thu Dec 18 14:22:29 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Dec 2008 08:22:29 -0500
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
	<e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
	<ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>
Message-ID: <e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>

On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:
> The inheritance from io.RawIOBase seems fine.

There is a small problem with the interaction between HTTPResponse and
RawIOBase, but I think the problem is more on the http side.  You may
recall that the HTTP code has a habit of closing the connection for
you.  In a variety of cases, once you've read the last bytes of the
response, the HTTPResponse object calls its own close() method.  This
interacts poorly with RawIOBase, because it raises a ValueError for
any operation on a closed io object.  This prevents iterators from
working correctly.  The iterator implementation expects the final call
to readline() to return an empty string and converts that to a
StopIteration.  Instead, it's seeing a ValueError that propagates out.
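
The interaction can be reproduced in miniature with the sketch below. The
class SelfClosingBody is a hypothetical stand-in for the response object,
not the real HTTPResponse code: it calls its own close() as soon as the last
byte has been handed out, so the readline() call that should report the end
of the data hits a closed object instead.

```python
import io


class SelfClosingBody(io.RawIOBase):
    """Hypothetical body that closes itself once its last byte is read."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self._size = len(data)

    def readable(self):
        return True

    def readinto(self, buf):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        n = self._src.readinto(buf)
        if self._src.tell() == self._size:
            self.close()  # mimics the self-closing habit described above
        return n


def iterate(body):
    """Collect lines from body and report the error that ended the loop."""
    lines = []
    try:
        for line in body:
            lines.append(line)
    except ValueError as exc:
        return lines, exc
    return lines, None


lines, error = iterate(SelfClosingBody(b"first\nsecond\n"))
```

Both lines are delivered, but the loop ends with a ValueError rather than a
clean StopIteration.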

It's always been odd to me that the connection closed itself.  It's
going to be tricky to fix the current bug (chunked responses) and keep
the self-closing behavior, but I worry that changing the self-closing
behavior too dramatically isn't appropriate for a bug fix.  Will look
some more at this tomorrow.

Jeremy


From guido at python.org  Thu Dec 18 18:27:42 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 18 Dec 2008 09:27:42 -0800
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
	<e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
	<ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>
	<e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>
Message-ID: <ca471dc20812180927g226c9080q3a796959656d4792@mail.gmail.com>

It sounds like the self-closing is an implementation detail, meant to
make sure the socket is closed as early as possible (which I suppose
is a good thing if there's a server waiting for the final ACK on the
other side). Perhaps it should not use close() but something slightly
lower level that affects the socket directly?
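
One way to picture that idea is the sketch below. QuietlyReleasingBody is a
hypothetical class, with a BytesIO standing in for the socket: it releases
the underlying resource as soon as the data is exhausted but leaves the io
object itself open, so later reads simply report end of file.

```python
import io


class QuietlyReleasingBody(io.RawIOBase):
    """Hypothetical body that frees its source at EOF without closing itself."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)  # stands in for the socket file
        self._size = len(data)

    def readable(self):
        return True

    def readinto(self, buf):
        if self._src is None:
            return 0                  # source already released: plain EOF
        n = self._src.readinto(buf)
        if self._src.tell() == self._size:
            self._src.close()         # release the low-level resource only
            self._src = None
        return n


body = QuietlyReleasingBody(b"first\nsecond\n")
lines = list(body)                    # ends with a normal StopIteration
tail = body.readline()
```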

--Guido van Rossum (home page: http://www.python.org/~guido/)


From janssen at parc.com  Thu Dec 18 19:12:50 2008
From: janssen at parc.com (Bill Janssen)
Date: Thu, 18 Dec 2008 10:12:50 PST
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
	<e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
	<ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>
	<e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>
Message-ID: <8926.1229623970@parc.com>

Jeremy Hylton <jeremy at alum.mit.edu> wrote:

> but I worry that changing the self-closing
> behavior too dramatically isn't appropriate for a bug fix.  Will look
> some more at this tomorrow.

Reading through the code, it looks like you've already fixed bug 1348.

Thanks!

Bill

From jeremy at alum.mit.edu  Thu Dec 18 20:10:28 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Dec 2008 14:10:28 -0500
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To: <ca471dc20812180927g226c9080q3a796959656d4792@mail.gmail.com>
References: <e8bf7a530812142006k38737e41m236030b7da6a432b@mail.gmail.com>
	<e8bf7a530812151119i6531322bld942551e669043a9@mail.gmail.com>
	<ca471dc20812171005v6497edebj29cd8ae96438378@mail.gmail.com>
	<e8bf7a530812180522s11ae2e4cib4a3406493de2dd5@mail.gmail.com>
	<ca471dc20812180927g226c9080q3a796959656d4792@mail.gmail.com>
Message-ID: <e8bf7a530812181110y789ce124nae52d3881597a47e@mail.gmail.com>

On Thu, Dec 18, 2008 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:
> It sounds like the self-closing is an implementation detail, meant to
> make sure the socket is closed as early as possible (which I suppose
> is a good thing if there's a server waiting for the final ACK on the
> other side). Perhaps it should not use close() but something slightly
> lower level that affects the socket directly?

That's what I'm thinking, too.  I had 10 minutes last night after the
kids went to bed, and my first attempt didn't work :-).

Jeremy

>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
> On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>> On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:
>>> The inheritance from io.RawIOBase seems fine.
>>
>> There is a small problem with the interaction between HTTPResponse and
>> RawIOBase, but I think the problem is more on the http side.  You may
>> recall that the HTTP code has a habit of closing the connection for
>> you.  In a variety of cases, once you've read the last bytes of the
>> response, the HTTPResponse object calls its own close() method.  This
>> interacts poorly with RawIOBase, because it raises a ValueError for
>> any operation on a closed io object.  This prevents iterators from
>> working correctly.  The iterator implementation expects the final call
>> to readline() to return an empty string and converts that to a
>> StopIteration.  Instead, it's seeing a ValueError that propagates out.
>>
>> It's always been odd to me that the connection closed itself.  It's
>> going to be tricky to fix the current bug (chunked responses) and keep
>> the self-closing behavior, but I worry that changing the self-closing
>> behavior too dramatically isn't appropriate for a bug fix.  Will look
>> some more at this tomorrow.
>>
>> Jeremy
>>
>>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>>
>>>
>>>
>>> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>> I have a patch that appears to fix this bug
>>>> http://bugs.python.org/file12361/urllib-chunked.diff
>>>> but I'm not sure about its interaction with the io module and
>>>> RawIOBase.  Is there a new IO expert who could take a look at it for
>>>> me?
>>>>
>>>> Jeremy
>>>>
>>>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>>> This bug is pretty serious, because urllib will insert garbage into
>>>>> the application-visible data for a chunked response.  It simply
>>>>> ignores the fact that it's reading a chunked response and includes the
>>>>> chunked header data as payload data.  The original bug was reported in
>>>>> September, but no one noticed it.  It was reported again recently.
>>>>>
>>>>> http://bugs.python.org/issue3761
>>>>> http://bugs.python.org/issue4631
>>>>>
>>>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
>>>>> that's not my call.
>>>>>
>>>>> Jeremy
>>>>>
>
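
For readers unfamiliar with the chunked bug quoted above: in a chunked HTTP response each piece of payload is preceded by a hexadecimal size line, and a reader that ignores the framing passes those size lines through as if they were data. A minimal sketch of the decoding step follows, assuming the whole body is already in memory. decode_chunked is a hypothetical helper, not the function in the patch; it drops chunk extensions and ignores trailers.

```python
def decode_chunked(raw):
    """Strip chunked transfer framing from a complete response body."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is an extension.
        size_field = raw[pos:eol].split(b";", 1)[0]
        size = int(size_field.decode("ascii").strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last chunk; trailers, if any, are ignored here
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)
```

Feeding it b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" yields the payload without the "4" and "5" size lines, which are the "garbage" the bug report describes.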

From greg.ewing at canterbury.ac.nz  Thu Dec 18 23:52:34 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 19 Dec 2008 11:52:34 +1300
Subject: [Python-Dev] The endless GIL debate: why not remove thread
 support instead?
In-Reply-To: <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
	<49423856.30705@gmail.com> <494982B4.5040602@canterbury.ac.nz>
	<79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
Message-ID: <494AD432.5060406@canterbury.ac.nz>

Paul Moore wrote:
> Do you know if these remaining problems are listed anywhere?

There was a big discussion about this in comp.lang.python
not long ago. Basically all the built-in types and constants
are shared between interpreters, which means you still need
a GIL to stop different interpreters stepping on each other's
toes.

> AIUI,
> certain software (for example mod_python) has been using multiple
> interpreters for a long while now

Multiple interpreters are possible, they're just not completely
independent. Whether this is a problem depends on the reason
you want multiple interpreters. In the Apache case, it's
probably more about providing virtual Python environments than
free-threading between interpreters.

-- 
Greg

From ncoghlan at gmail.com  Fri Dec 19 00:05:17 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 19 Dec 2008 09:05:17 +1000
Subject: [Python-Dev] The endless GIL debate: why not remove thread
 support instead?
In-Reply-To: <494AD432.5060406@canterbury.ac.nz>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>	<49423856.30705@gmail.com>
	<494982B4.5040602@canterbury.ac.nz>	<79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
	<494AD432.5060406@canterbury.ac.nz>
Message-ID: <494AD72D.8050208@gmail.com>

Greg Ewing wrote:
> Paul Moore wrote:
>> Do you know if these remaining problems are listed anywhere?
> 
> There was a big discussion about this in comp.lang.python
> not long ago. Basically all the built-in types and constants
> are shared between interpreters, which means you still need
> a GIL to stop different interpreters stepping on each other's
> toes.

That kind of thing is under the core's control though - the 2.x module
initialisation problem means that you can't write a multiple interpreter
friendly extension module even if you want to.

The new per-interpreter state mechanism could also be used internally by
the core to duplicate some of that global state for each new interpreter.

I see the introduction of the interpreter specific state mechanism as a
big step because it provides an underlying mechanism that makes the
problem solvable *in principle* through a combination of per-interpreter
state and finer grained shared locking, making it just a practical
implementation problem to see if that can be done without adversely
impacting single interpreter performance.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From p.f.moore at gmail.com  Fri Dec 19 00:18:07 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 18 Dec 2008 23:18:07 +0000
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <494AD432.5060406@canterbury.ac.nz>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
	<49423856.30705@gmail.com> <494982B4.5040602@canterbury.ac.nz>
	<79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
	<494AD432.5060406@canterbury.ac.nz>
Message-ID: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>

2008/12/18 Greg Ewing <greg.ewing at canterbury.ac.nz>:
> Paul Moore wrote:
>>
>> Do you know if these remaining problems are listed anywhere?
>
> There was a big discussion about this in comp.lang.python
> not long ago. Basically all the built-in types and constants
> are shared between interpreters, which means you still need
> a GIL to stop different interpreters stepping on each other's
> toes.
>
>> AIUI,
>> certain software (for example mod_python) has been using multiple
>> interpreters for a long while now
>
> Multiple interpreters are possible, they're just not completely
> independent. Whether this is a problem depends on the reason
> you want multiple interpreters. In the Apache case, it's
> probably more about providing virtual Python environments than
> free-threading between interpreters.

OK, but how close is it to providing isolation for threads running
under the control of the GIL? I'm thinking of something along the
lines of an in-process version of fork(), which spawns a new
interpreter and runs the 2 interpreters as threads, still using the
GIL to enforce serialisation, but otherwise independent. I believe
that Perl uses this model for its "interpreter threads"
implementation.

Paul.

From lists at cheimes.de  Fri Dec 19 00:28:13 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 19 Dec 2008 00:28:13 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread
	support instead?
In-Reply-To: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>	<49423856.30705@gmail.com>
	<494982B4.5040602@canterbury.ac.nz>	<79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>	<494AD432.5060406@canterbury.ac.nz>
	<79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
Message-ID: <giemac$8e8$1@ger.gmane.org>

Paul Moore schrieb:
> OK, but how close is it to providing isolation for threads running
> under the control of the GIL? I'm thinking of something along the
> lines of an in-process version of fork(), which spawns a new
> interpreter and runs the 2 interpreters as threads, still using the
> GIL to enforce serialisation, but otherwise independent. I believe
> that Perl uses this model for its "interpreter threads"
> implementation.

How is your idea different from subinterpreters? Today you can have
multiple subinterpreters inside a single process. Each subinterpreter
has its own state and can see only its own objects.

Christian


From martin at v.loewis.de  Fri Dec 19 00:55:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 19 Dec 2008 00:55:19 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove
 thread	support instead?
In-Reply-To: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>	<79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>	<49423856.30705@gmail.com>
	<494982B4.5040602@canterbury.ac.nz>	<79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>	<494AD432.5060406@canterbury.ac.nz>
	<79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
Message-ID: <494AE2E7.2080401@v.loewis.de>

> OK, but how close is it to providing isolation for threads running
> under the control of the GIL? 

They won't be independent. If an extension module has a global variable,
that will be shared across interpreters. If that variable supports
modifiable state, such modifications will "leak" across interpreters.

For example, there will be only a single object class. With that in
mind, take a look at object.__subclasses__(); it would provide access
to all classes, including those in the other interpreters. Likewise,
gc.get_objects() will give you the complete list of all objects. So
the isolation is not strong enough to run untrusted code isolated
from other code.

Regards,
Martin
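
The two introspection hooks mentioned above are easy to try out. The sketch below runs in a single interpreter and does not create subinterpreters; it only shows that code which never received a reference to a class or object can still reach it. DefinedElsewhere, private_data and the two helpers are hypothetical names used for illustration.

```python
import gc


class DefinedElsewhere:
    """Stands in for a class created by unrelated code in the same process."""


private_data = ["not meant to be shared"]  # lists are tracked by the collector


def reachable_class_names():
    # Every direct subclass of object is listed, wherever it was defined.
    return {cls.__name__ for cls in object.__subclasses__()}


def is_visible_to_gc(obj):
    # gc.get_objects() returns the objects tracked by the collector.
    return any(candidate is obj for candidate in gc.get_objects())
```

Both reachable_class_names() and is_visible_to_gc(private_data) find their target without being handed a reference, which is why sharing these structures between interpreters weakens isolation.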

From kristjan at ccpgames.com  Fri Dec 19 11:25:00 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 19 Dec 2008 10:25:00 +0000
Subject: [Python-Dev] try/except in io.py
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>

Greetings!

Yesterday, I committed revision r67843 to py3k.
Re-enabling the Windows CRT runtime checks showed me that close() was being called with an invalid file descriptor.
Now, the problem was in tokenizer.c, but the reason this wasn't caught earlier was:

1)      Incorrect error checking for close() in _fileio.c, which I fixed, and

2)      Line 384 in io.py, where all exceptions are caught for self.close().

Fixing 1 and patching 2 would bring the problem to light when running the test_imp.py part of the testsuite and, indeed, applying the fix to tokenizer.c would then remove it again.

I am a bit worried about 2) though.  I didn't modify that, but having a catch-all clause just to be clean on system exit seems shaky to me.  I wonder, is there a way to make such behaviour, if it is indeed necessary, active only when exit is in progress?

Something like:
try:
    self.close()
except:
    try:
        if not sys.exiting(): raise
    except:
        pass


Or better yet, do as we have often done here and catch just the particular problems that occur during shutdown, most often NameError:
try:
    self.close()
except (AttributeError, NameError):
    pass


What do you think?

From amauryfa at gmail.com  Fri Dec 19 11:49:00 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 19 Dec 2008 11:49:00 +0100
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
Message-ID: <e27efe130812190249g24b74772t170544fe88ebe1f7@mail.gmail.com>

Hello,
Kristján Valur Jónsson wrote:
> Greetings!
>
> Yesterday, I committed revision r67843 to py3k.
>
> Re-enabling the Windows CRT runtime checks showed me that close() was being
> called with an invalid file descriptor.
>
> Now, the problem was in tokenizer.c, but the reason this wasn't caught
> earlier was,
>
> 1)      Incorrect error checking for close() in _fileio.c, which I fixed,
> and
>
> 2)      Line 384 in io.py, where all exceptions are caught for self.close().
>
>
>
> Fixing 1 and patching 2 would bring the problem to light when running the
> test_imp.py part of the testsuite and, indeed, applying the fix to
> tokenizer.c would then remove it again.
>
> I am a bit worried about 2) though.  I didn't modify that, but having a catch
> all clause just to be clean on system exit seems shaky to me.   I wonder, is
> there a way to make such behaviour, if it is indeed necessary, just to be
> active when exit is in progress?
>
> Something like:
>
> try:
>     self.close()
> except:
>     try:
>         if not sys.exiting(): raise
>     except:
>         pass
>
>
> Or better yet, do as we have done often here, just catch the particular
> problem that occurs during shutdown, most often name error:
>
> try:
>     self.close()
> except (AttributeError, NameError):
>     pass

I suggest "except Exception": SystemExit and KeyboardInterrupt inherit
from BaseException, not from Exception.
And close() is likely to raise IOErrors.


-- 
Amaury Forgeot d'Arc
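
A short sketch of the difference, for reference. close_quietly and the two failing callables are hypothetical helpers, not code from io.py; the point is only that "except Exception" swallows I/O errors while letting KeyboardInterrupt and SystemExit through.

```python
def close_quietly(close):
    """Call close(); return False if it raised an ordinary exception."""
    try:
        close()
    except Exception:
        # OSError (IOError) lands here; BaseException-only types do not.
        return False
    return True


def failing_close():
    raise OSError("bad file descriptor")


def interrupted_close():
    raise KeyboardInterrupt
```

close_quietly(failing_close) returns False, whereas close_quietly(interrupted_close) lets the KeyboardInterrupt propagate to the caller.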

From kristjan at ccpgames.com  Fri Dec 19 11:56:46 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 19 Dec 2008 10:56:46 +0000
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <e27efe130812190249g24b74772t170544fe88ebe1f7@mail.gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
	<e27efe130812190249g24b74772t170544fe88ebe1f7@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>



> > try:
> >     self.close()
> > except:
> >     try:
> >         if not sys.exiting(): raise
> >     except:
> >         pass
> >
> >
> > Or better yet, do as we have done often here, just catch the particular
> > problem that occurs during shutdown, most often name error:
> >
> > try:
> >     self.close()
> > except (AttributeError, NameError):
> >     pass
>
> From: Amaury Forgeot d'Arc [mailto:amauryfa at gmail.com]
> I suggest "except Exception": SystemExit and KeyboardInterrupt inherit
> from BaseException, not from Exception.
> And close() is likely to raise IOErrors.

Ah, but that is not what the intent is to guard against, according to the comments.
During exit, modules have been deleted and all sorts of things have gone away.
It is therefore likely that code that executes during exit will encounter
NameErrors (when a module is being cleaned up and its globals removed)
and AttributeErrors.
ImportErrors too, in fact.

It would be good to see the actual repro case that caused this to be added in the first place, so that we could selectively catch those errors.

Kristján

From ncoghlan at gmail.com  Fri Dec 19 14:50:37 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 19 Dec 2008 23:50:37 +1000
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>	<e27efe130812190249g24b74772t170544fe88ebe1f7@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>
Message-ID: <494BA6AD.2090208@gmail.com>

Kristján Valur Jónsson wrote:
> Ah, but that is not what the intent is to guard against, according to the
> comments. During exit, modules have been deleted and all sorts of
> things have gone away. It is therefore likely that code that executes
> during exit will encounter NameErrors (when a module is being cleaned
> up and its globals removed) and AttributeErrors. ImportErrors too, in
> fact.
> 
> It would be good to see the actual repro case that caused this to be
> added in the first place, so that we could selectively catch those
> errors.

Generally speaking, close() and __del__() methods that can be invoked
during interpreter shutdown should avoid referencing module globals at
all. Necessary globals (including members of other modules) should
either be cached on the relevant class or captured in a closure.

Now, it may be that the relevant close() method in io.py touches too
much code for that to be practical, but it certainly isn't the case in
general that encountering Name/Attribute/ImportError during shutdown is
inevitable.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From dima at hlabs.spb.ru  Fri Dec 19 15:20:55 2008
From: dima at hlabs.spb.ru (Dmitry Vasiliev)
Date: Fri, 19 Dec 2008 17:20:55 +0300
Subject: [Python-Dev] Py3k: magical dir()
Message-ID: <494BADC7.7040404@hlabs.spb.ru>

Hello!

I think it's a strange behavior:

Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 >>> hash(range(10))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'range'
 >>> dir(range(10))
['__class__', '__delattr__', '__doc__', '__eq__', '__format__', 
'__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', 
'__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', 
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', 
'__setattr__', '__sizeof__', '__str__', '__subclasshook__']
 >>> hash(range(10))
-1211318616
 >>> hash(range(1000))
-1211318472

-- 
Dmitry Vasiliev (dima at hlabs.spb.ru)
   http://hlabs.spb.ru

From lists at cheimes.de  Fri Dec 19 16:02:24 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 19 Dec 2008 16:02:24 +0100
Subject: [Python-Dev] Py3k: magical dir()
In-Reply-To: <494BADC7.7040404@hlabs.spb.ru>
References: <494BADC7.7040404@hlabs.spb.ru>
Message-ID: <gigd1v$uqk$1@ger.gmane.org>

Dmitry Vasiliev schrieb:
> Hello!
> 
> I think it's a strange behavior:
> 
> Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32)
> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> hash(range(10))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'range'
>>>> dir(range(10))
> ['__class__', '__delattr__', '__doc__', '__eq__', '__format__',
> '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__',
> '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__',
> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__',
> '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>>> hash(range(10))
> -1211318616
>>>> hash(range(1000))
> -1211318472

Yes, it is. I'm able to reproduce the problem.

Christian


From eric at trueblade.com  Fri Dec 19 16:22:06 2008
From: eric at trueblade.com (Eric Smith)
Date: Fri, 19 Dec 2008 10:22:06 -0500
Subject: [Python-Dev] Py3k: magical dir()
In-Reply-To: <gigd1v$uqk$1@ger.gmane.org>
References: <494BADC7.7040404@hlabs.spb.ru> <gigd1v$uqk$1@ger.gmane.org>
Message-ID: <494BBC1E.1090209@trueblade.com>

Christian Heimes wrote:
> Dmitry Vasiliev schrieb:
>> Hello!
>>
>> I think it's a strange behavior:
>>
>> Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32)
>> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> hash(range(10))
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: unhashable type: 'range'
>>>>> dir(range(10))
>> ['__class__', '__delattr__', '__doc__', '__eq__', '__format__',
>> '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__',
>> '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__',
>> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__',
>> '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>>>> hash(range(10))
>> -1211318616
>>>>> hash(range(1000))
>> -1211318472
> 
> Yes, it is. I'm able to reproduce the problem.

It's not just dir(). Same behavior with help():

Python 3.1a0 (py3k:67856, Dec 19 2008, 10:18:03)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> hash(range(10))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'range'
[43173 refs]
 >>> help(range(10))

[77213 refs]
 >>> hash(range(10))
5041912
[77215 refs]
 >>>

From ggpolo at gmail.com  Fri Dec 19 16:23:55 2008
From: ggpolo at gmail.com (Guilherme Polo)
Date: Fri, 19 Dec 2008 13:23:55 -0200
Subject: [Python-Dev] Py3k: magical dir()
In-Reply-To: <494BADC7.7040404@hlabs.spb.ru>
References: <494BADC7.7040404@hlabs.spb.ru>
Message-ID: <ac2200130812190723o2e26787fgf4f86790cd83b5b0@mail.gmail.com>

On Fri, Dec 19, 2008 at 12:20 PM, Dmitry Vasiliev <dima at hlabs.spb.ru> wrote:
> Hello!
>
> I think it's a strange behavior:
>
> Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32)
> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> hash(range(10))
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'range'
>>>> dir(range(10))
> ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',
> '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__',
> '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__',
> '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__',
> '__sizeof__', '__str__', '__subclasshook__']
>>>> hash(range(10))
> -1211318616
>>>> hash(range(1000))
> -1211318472
>

There are other ways to reproduce it without using dir, like
range(10).__class__; hash(range(10))

Is there some reason not to set tp_hash for rangeobject to
PyObject_HashNotImplemented?

> --
> Dmitry Vasiliev (dima at hlabs.spb.ru)
>  http://hlabs.spb.ru



-- 
-- Guilherme H. Polo Goncalves
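
For comparison, the Python-level way to mark a type unhashable is to set __hash__ to None in the class body; as far as I know this is the counterpart of pointing tp_hash at PyObject_HashNotImplemented in C. The sketch below uses a hypothetical class, not range itself, and shows the behaviour one would expect: hash() fails consistently, before and after dir().

```python
class Unhashable:
    """Hypothetical class that is explicitly unhashable."""
    __hash__ = None


def try_hash(obj):
    """Return hash(obj), or the TypeError message if obj is unhashable."""
    try:
        return hash(obj)
    except TypeError as exc:
        return str(exc)


sample = Unhashable()
before = try_hash(sample)
names = dir(sample)      # '__hash__' is listed, with the value None
after = try_hash(sample)
```

Here before and after are the same "unhashable type" message, in contrast to the session above, where the result of hash(range(10)) changed after dir() had been called.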

From hagenf at CoLi.Uni-SB.DE  Fri Dec 19 16:27:48 2008
From: hagenf at CoLi.Uni-SB.DE (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=)
Date: Fri, 19 Dec 2008 16:27:48 +0100
Subject: [Python-Dev] Py3k: magical dir()
In-Reply-To: <ac2200130812190723o2e26787fgf4f86790cd83b5b0@mail.gmail.com>
References: <494BADC7.7040404@hlabs.spb.ru>
	<ac2200130812190723o2e26787fgf4f86790cd83b5b0@mail.gmail.com>
Message-ID: <494BBD74.7030605@coli.uni-saarland.de>

> Is there some reason not to set tp_hash for rangeobject to
> PyObject_HashNotImplemented?

http://bugs.python.org/issue4701

- Hagen


From status at bugs.python.org  Fri Dec 19 18:06:43 2008
From: status at bugs.python.org (Python tracker)
Date: Fri, 19 Dec 2008 18:06:43 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20081219170643.2994C7857C@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (12/12/08 - 12/19/08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2266 open (+37) / 14258 closed (+20) / 16524 total (+57)

Open issues with patches:   762

Average duration of open issues: 704 days.
Median duration of open issues: 2530 days.

Open Issues Breakdown
   open  2248 (+37)
pending    18 ( +0)

Issues Created Or Reopened (58)
_______________________________

Doctest module does not work with zipped packages                12/15/08
CLOSED http://bugs.python.org/issue4197    reopened ncoghlan                  
       patch                                                                   

configparser DEFAULT                                             12/12/08
CLOSED http://bugs.python.org/issue4645    created  shawn.ashlee              
                                                                               

distutils chokes on empty options arg in the setup function      12/12/08
       http://bugs.python.org/issue4646    created  theller                   
       patch, patch                                                            

Builtin parser module fails to parse relative imports            12/12/08
CLOSED http://bugs.python.org/issue4647    created  schluehk                  
                                                                               

Fix n//x to n/x in the Docs                                      12/12/08
CLOSED http://bugs.python.org/issue4648    created  Retro                     
                                                                               

Fix a+b to a + b                                                 12/13/08
CLOSED http://bugs.python.org/issue4649    created  Retro                     
                                                                               

getopt need re-factor...                                         12/13/08
       http://bugs.python.org/issue4650    created  wangchun                  
                                                                               

getopt need re-factor...                                         12/13/08
CLOSED http://bugs.python.org/issue4651    created  wangchun                  
                                                                               

IDLE does not work with Unicode                                  12/13/08
       http://bugs.python.org/issue4652    created  zzyzx                     
                                                                               

Patch to fix typos for Py3K                                      12/13/08
       http://bugs.python.org/issue4653    created  typo.pl                   
                                                                               

os.path.realpath() get the wrong result                          12/13/08
       http://bugs.python.org/issue4654    created  dirlt                     
                                                                               

during Python installation, setup.py should not use .pydistutils 12/14/08
       http://bugs.python.org/issue4655    created  jah                       
                                                                               

Python 3 tutorial has old information about dicts                12/14/08
CLOSED http://bugs.python.org/issue4656    created  mdcowles                  
                                                                               

Doctest gets line numbers wrongs with <> in name                 12/14/08
       http://bugs.python.org/issue4657    created  ncoghlan                  
                                                                               

missing closing bracket in Functional Programming HOWTO          12/14/08
CLOSED http://bugs.python.org/issue4658    created  bgeron                    
                                                                               

compilation warning in Modules/zipimport.c                       12/14/08
       http://bugs.python.org/issue4659    created  pitrou                    
                                                                               

multiprocessing.JoinableQueue task_done() issue                  12/14/08
       http://bugs.python.org/issue4660    created  merrellb                  
                                                                               

email.parser: impossible to read messages encoded in a different 12/14/08
       http://bugs.python.org/issue4661    created  dato                      
                                                                               

posix module lacks several DeprecationWarning's                  12/14/08
       http://bugs.python.org/issue4662    created  mishok13                  
       patch                                                                   

Increase TextIOWrapper._CHUNK_SIZE                               12/14/08
CLOSED http://bugs.python.org/issue4663    created  pitrou                    
                                                                               

Regression fix_imports does not refactor multiple imports correc 12/14/08
CLOSED http://bugs.python.org/issue4664    created  lregebro                  
                                                                               

Failure to compile trunk on Solaris10/SPARC using C++ compiler   12/14/08
CLOSED http://bugs.python.org/issue4665    created  skip.montanaro            
                                                                               

test_bad_address in test_urllib2_localnet often fails            12/14/08
CLOSED http://bugs.python.org/issue4666    created  pitrou                    
                                                                               

Patch with a couple of 2.0isms in tutorial                       12/14/08
CLOSED http://bugs.python.org/issue4667    created  sgala                     
       patch                                                                   

examples in the functional howto are not consistent with 3.X beh 12/14/08
CLOSED http://bugs.python.org/issue4668    created  sgala                     
       patch                                                                   

bytes,join and bytearray.join not in manual; help for bytes.join 12/15/08
       http://bugs.python.org/issue4669    created  sjmachin                  
                                                                               

setup.py exception when db_setup_debug = True                    12/15/08
       http://bugs.python.org/issue4670    created  djmdjm                    
                                                                               

pydoc executes the code to be documented                         12/15/08
       http://bugs.python.org/issue4671    created  Jim_C                     
                                                                               

Distutils SWIG support blocks use of SWIG -outdir option         12/15/08
       http://bugs.python.org/issue4672    created  andybuckley               
                                                                               

Distutils should provide an uninstall command                    12/15/08
       http://bugs.python.org/issue4673    created  andybuckley               
                                                                               

test_normalization failures on some buildbot                     12/16/08
CLOSED http://bugs.python.org/issue4674    created  pitrou                    
                                                                               

urllib's splitpasswd does not accept newline chars in passwords  12/16/08
       http://bugs.python.org/issue4675    created  mibanescu                 
       patch                                                                   

python3 closes + home keys                                       12/16/08
       http://bugs.python.org/issue4676    created  Somelauw                  
                                                                               

a list comprehensions tests for pybench                          12/16/08
       http://bugs.python.org/issue4677    created  pitrou                    
       patch                                                                   

Unicode: multiple chars for high code points                     12/16/08
CLOSED http://bugs.python.org/issue4678    created  ede                       
                                                                               

Fork + shelve causes shelve corruption and backtrace             12/16/08
       http://bugs.python.org/issue4679    created  calmofthestorm            
                                                                               

deque class should include high-water mark                       12/17/08
CLOSED http://bugs.python.org/issue4680    created  roysmith                  
                                                                               

mmap offset should be off_t instead of ssize_t, and size calcula 12/17/08
       http://bugs.python.org/issue4681    created  saa                       
       patch                                                                   

'b' formatter is actually unsigned char                          12/17/08
       http://bugs.python.org/issue4682    created  vt                        
                                                                               

urllib2.HTTPDigestAuthHandler fails on third hostname?           12/17/08
       http://bugs.python.org/issue4683    created  cmb                       
                                                                               

sys.exit() exits program when non-daemonic threads are still run 12/17/08
       http://bugs.python.org/issue4684    created  eggy                      
                                                                               

IDLE will not open (2.6.1 on WinXP pro)                          12/17/08
       http://bugs.python.org/issue4685    created  Yo                        
                                                                               

Exceptions in ConfigParser don't set .args                       12/17/08
       http://bugs.python.org/issue4686    created  beazley                   
                                                                               

GC stats not accurate because of debug overhead                  12/17/08
       http://bugs.python.org/issue4687    created  pitrou                    
       patch                                                                   

GC optimization: don't track simple tuples and dicts             12/17/08
       http://bugs.python.org/issue4688    created  pitrou                    
       patch                                                                   

Typo in PyObjC URL on "GUI Programming on the Mac"               12/17/08
       http://bugs.python.org/issue4689    created  mevans                    
                                                                               

asyncore calls handle_write() on closed sockets when use_poll=Tr 12/18/08
       http://bugs.python.org/issue4690    created  forest                    
                                                                               

IDLE Code Caching Windows                                        12/18/08
CLOSED http://bugs.python.org/issue4691    created  brandon.dixon             
                                                                               

Framework build fails if OS X on case-sensitive file system      12/18/08
CLOSED http://bugs.python.org/issue4692    created  nad                       
       patch                                                                   

Idle for Python 3.0 is default even without doing make fullinsta 12/18/08
       http://bugs.python.org/issue4693    created  orsenthil                 
                                                                               

_call_method() in multiprocessing documentation                  12/18/08
CLOSED http://bugs.python.org/issue4694    created  beazley                   
                                                                               

Bad AF_PIPE address in multiprocessing documentation             12/18/08
       http://bugs.python.org/issue4695    created  beazley                   
                                                                               

email module does not fold headers                               12/18/08
       http://bugs.python.org/issue4696    created  bromine                   
       patch                                                                   

Clarification needed for subprocess convenience functions in Pyt 12/18/08
       http://bugs.python.org/issue4697    created  Erik Sternerson           
                                                                               

Solaris buildbot failure on trunk in test_hostshot               12/18/08
       http://bugs.python.org/issue4698    created  pitrou                    
                                                                               

Typo in documentation of "signal"                                12/19/08
CLOSED http://bugs.python.org/issue4699    created  yam850                    
                                                                               

UnicodeEncodeError in license()                                  12/19/08
       http://bugs.python.org/issue4700    created  mnewman                   
                                                                               

range objects becomes hashable after attribute access            12/19/08
       http://bugs.python.org/issue4701    created  hagen                     
       patch                                                                   



Issues Now Closed (54)
______________________

Thread local storage and PyGILState_* mucked up by os.fork()       15 days
       http://bugs.python.org/issue1683    loewis                    
                                                                               

optimize list comprehensions                                      297 days
       http://bugs.python.org/issue2183    pitrou                    
       patch, patch                                                            

gc.DEBUG_STATS reports invalid "elapsed" times                    269 days
       http://bugs.python.org/issue2467    pitrou                    
       patch                                                                   

create a numbits() method for int and long types                  148 days
       http://bugs.python.org/issue3439    marketdickinson           
       patch, needs review                                                     

use string_print() in gdb                                         116 days
       http://bugs.python.org/issue3632    amaury.forgeotdarc        
       patch                                                                   

urllib.request and urllib.response cannot handle HTTP1.1 chunked  103 days
       http://bugs.python.org/issue3761    jhylton                   
                                                                               

2.6rc1: test_threading hangs on FreeBSD 6.3 i386                   90 days
       http://bugs.python.org/issue3863    loewis                    
       patch                                                                   

_hotshot: invalid error control in logreader()                     83 days
       http://bugs.python.org/issue3954    amaury.forgeotdarc        
       patch                                                                   

__main__.__file__ not set correctly when -m switch gets __main__   67 days
       http://bugs.python.org/issue4082    ncoghlan                  
                                                                               

textwrap wordsep_re Unicode                                        53 days
       http://bugs.python.org/issue4163    pitrou                    
       patch                                                                   

Doctest module does not work with zipped packages                   0 days
       http://bugs.python.org/issue4197    ncoghlan                  
       patch                                                                   

Pdb cannot access source code in zipped packages.                  51 days
       http://bugs.python.org/issue4201    ncoghlan                  
       patch                                                                   

inspect.getsource doesn't work on functions imported from a zipf   47 days
       http://bugs.python.org/issue4223    ncoghlan                  
                                                                               

cycle created by profile.run                                       40 days
       http://bugs.python.org/issue4273    darrenr                   
                                                                               

[2.5 regression] ctypes fails to build on arm-linux-gnu            31 days
       http://bugs.python.org/issue4303    loewis                    
                                                                               

(Tkinter) Please backport these                                    26 days
       http://bugs.python.org/issue4342    loewis                    
                                                                               

A bug in ncurses.h still exists in FreeBSD 4.9 - 4.11              23 days
       http://bugs.python.org/issue4368    loewis                    
       patch                                                                   

Distutils Metadata Documentation Missing "platforms" Keyword       18 days
       http://bugs.python.org/issue4446    georg.brandl              
       patch                                                                   

CVE-2008-5031 multiple integer overflows                           13 days
       http://bugs.python.org/issue4469    loewis                    
                                                                               

Speed up PyEval_EvalFrameEx when tracing is off.                   12 days
       http://bugs.python.org/issue4477    jyasskin                  
       patch                                                                   

logging module __init__ uses has_key                                9 days
       http://bugs.python.org/issue4523    benjamin.peterson         
       patch                                                                   

Registry key not set if unattended installation used               13 days
       http://bugs.python.org/issue4567    loewis                    
                                                                               

Improved optparse "varargs" callback example                        9 days
       http://bugs.python.org/issue4568    georg.brandl              
       patch                                                                   

reading UTF16-encoded text file crashes if \r on 64-char boundar    7 days
       http://bugs.python.org/issue4574    pitrou                    
       patch                                                                   

compiler: -3 warnings                                               8 days
       http://bugs.python.org/issue4578    georg.brandl              
       patch                                                                   

segfault when mutating memoryview to array.array when array is r   11 days
       http://bugs.python.org/issue4583    pitrou                    
       patch, needs review                                                     

new types example is out of date                                    7 days
       http://bugs.python.org/issue4595    georg.brandl              
                                                                               

3.0 document tab interpretation change                              6 days
       http://bugs.python.org/issue4603    georg.brandl              
                                                                               

3.0 documentation mentions using maketrans from within the strin    4 days
       http://bugs.python.org/issue4605    benjamin.peterson         
                                                                               

Small error in "Extending Python with C or C++"                     6 days
       http://bugs.python.org/issue4611    georg.brandl              
                                                                               

tarfile does not set the creation date and time of the extracted    3 days
       http://bugs.python.org/issue4616    lars.gustaebel            
                                                                               

optparse - dosn't distinguish between '--option' and '-option'      0 days
       http://bugs.python.org/issue4641    marketdickinson           
                                                                               

optparse - dosn't distinguish between '--option' and '-option'      0 days
       http://bugs.python.org/issue4642    marketdickinson           
                                                                               

Minor documentation fault in 2to3 script                            0 days
       http://bugs.python.org/issue4644    benjamin.peterson         
                                                                               

configparser DEFAULT                                                2 days
       http://bugs.python.org/issue4645    loewis                    
                                                                               

Builtin parser module fails to parse relative imports               0 days
       http://bugs.python.org/issue4647    benjamin.peterson         
                                                                               

Fix n//x to n/x in the Docs                                         0 days
       http://bugs.python.org/issue4648    rhettinger                
                                                                               

Fix a+b to a + b                                                    1 days
       http://bugs.python.org/issue4649    gvanrossum                
                                                                               

getopt need re-factor...                                            0 days
       http://bugs.python.org/issue4651    gvanrossum                
                                                                               

Python 3 tutorial has old information about dicts                   0 days
       http://bugs.python.org/issue4656    benjamin.peterson         
                                                                               

missing closing bracket in Functional Programming HOWTO             0 days
       http://bugs.python.org/issue4658    benjamin.peterson         
                                                                               

Increase TextIOWrapper._CHUNK_SIZE                                  1 days
       http://bugs.python.org/issue4663    pitrou                    
                                                                               

Regression fix_imports does not refactor multiple imports correc    0 days
       http://bugs.python.org/issue4664    benjamin.peterson         
                                                                               

Failure to compile trunk on Solaris10/SPARC using C++ compiler      1 days
       http://bugs.python.org/issue4665    loewis                    
                                                                               

test_bad_address in test_urllib2_localnet often fails               1 days
       http://bugs.python.org/issue4666    pitrou                    
                                                                               

Patch with a couple of 2.0isms in tutorial                          0 days
       http://bugs.python.org/issue4667    georg.brandl              
       patch                                                                   

examples in the functional howto are not consistent with 3.X beh    0 days
       http://bugs.python.org/issue4668    georg.brandl              
       patch                                                                   

test_normalization failures on some buildbot                        0 days
       http://bugs.python.org/issue4674    pitrou                    
                                                                               

Unicode: multiple chars for high code points                        0 days
       http://bugs.python.org/issue4678    lemburg                   
                                                                               

deque class should include high-water mark                          1 days
       http://bugs.python.org/issue4680    tim_one                   
                                                                               

IDLE Code Caching Windows                                           0 days
       http://bugs.python.org/issue4691    amaury.forgeotdarc        
                                                                               

Framework build fails if OS X on case-sensitive file system         0 days
       http://bugs.python.org/issue4692    marketdickinson           
       patch                                                                   

_call_method() in multiprocessing documentation                     0 days
       http://bugs.python.org/issue4694    benjamin.peterson         
                                                                               

Typo in documentation of "signal"                                   0 days
       http://bugs.python.org/issue4699    benjamin.peterson         
                                                                               



Top Issues Most Discussed (10)
______________________________

 49 create a numbits() method for int and long types                 148 days
closed  http://bugs.python.org/issue3439   

 14 Optimize new io library                                           13 days
open    http://bugs.python.org/issue4561   

 13 GC optimization: don't track simple tuples and dicts               2 days
open    http://bugs.python.org/issue4688   

 11 urlopen returns extra, spurious bytes                              8 days
open    http://bugs.python.org/issue4631   

 10 Py_IS_INFINITY defect causes test_cmath failure on x86            12 days
open    http://bugs.python.org/issue4575   

 10 Building a list of tuples has non-linear performance              73 days
open    http://bugs.python.org/issue4074   

 10 optimize list comprehensions                                     297 days
closed  http://bugs.python.org/issue2183   

  8 deque class should include high-water mark                         1 days
closed  http://bugs.python.org/issue4680   

  8 datetime module missing some important methods                   656 days
open    http://bugs.python.org/issue1673409

  7 mmap offset should be off_t instead of ssize_t, and size calcul    2 days
open    http://bugs.python.org/issue4681   




From ziade.tarek at gmail.com  Fri Dec 19 19:55:28 2008
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 19 Dec 2008 19:55:28 +0100
Subject: [Python-Dev] Distutils maintenance
Message-ID: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com>

Hello

I would like to request commit access to work specifically on
distutils maintenance.

Regards
Tarek

-- 
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/

From musiccomposition at gmail.com  Fri Dec 19 19:59:26 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 19 Dec 2008 12:59:26 -0600
Subject: [Python-Dev] Distutils maintenance
In-Reply-To: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com>
References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com>
Message-ID: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com>

On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> Hello
>
> I would like to request a commit access to work specifically on
> distutils maintenance.

+1

We are currently without an active distutils maintainer, and many
stale distutils tickets are in need of attention that I'm sure Tarek
could provide. Tarek has also been providing many useful patches of his own.



-- 
Cheers,
Benjamin

From martin at v.loewis.de  Fri Dec 19 21:45:17 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 19 Dec 2008 21:45:17 +0100
Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3 (final)
Message-ID: <494C07DD.5090702@v.loewis.de>

On behalf of the Python development team and the Python community, I'm
happy to announce the release of Python 2.4.6 and 2.5.3 (final).

2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases
will only include security fixes. According to the release notes, about
80 bugs and patches have been addressed since Python 2.5.2, many of
them improving the stability of the interpreter, and improving its
portability.

Since the release candidate, the only change was an update to the
Macintosh packaging procedure.

2.4.6 includes only a small number of security fixes. Python 2.6 is
the latest version of Python; we're making this release for people who
are still running Python 2.4.

See the release notes at the website (also available as Misc/NEWS in
the source distribution) for details of bugs fixed; most of them prevent
interpreter crashes (and now cause proper Python exceptions in cases
where the interpreter may have crashed before).

For more information on Python 2.4.6 and 2.5.3, including download
links for various platforms, release notes, and known issues, please
see:

    http://www.python.org/2.4.6
    http://www.python.org/2.5.3

Highlights of the previous major Python releases are available
from the Python 2.4 and 2.5 pages, at

    http://www.python.org/2.4/highlights.html
    http://www.python.org/2.5/highlights.html

Enjoy this release,
Martin

Martin v. Loewis
martin at v.loewis.de
Python Release Manager
(on behalf of the entire python-dev team)

From kristjan at ccpgames.com  Fri Dec 19 22:00:29 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 19 Dec 2008 21:00:29 +0000
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <494BA6AD.2090208@gmail.com>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
	<e27efe130812190249g24b74772t170544fe88ebe1f7@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>
	<494BA6AD.2090208@gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BBF@exchis.ccp.ad.local>

Ok, in this case I move that we remove this try/except and see where it leads us.
If we see problems during teardown, we should deal with them in a more targeted manner.

Kristján

-----Original Message-----
From: Nick Coghlan [mailto:ncoghlan at gmail.com] 
Sent: 19. desember 2008 13:51
To: Kristján Valur Jónsson
Cc: Amaury Forgeot d'Arc; Python-Dev
Subject: Re: [Python-Dev] try/except in io.py


Generally speaking, close() and __del__() methods that can be invoked
during interpreter shutdown should avoid referencing module globals at
all. Necessary globals (including members of other modules) should
either be cached on the relevant class or captured in a closure.

Now, it may be that the relevant close() method in io.py touches too
much code for that to be practical, but it certainly isn't the case in
general that encountering Name/Attribute/ImportError during shutdown is
inevitable.
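
As a minimal sketch of the advice above (the ScratchFile class and the
make_remover helper are hypothetical names invented for illustration,
not code from io.py), the needed function can be cached on the class or
captured when a closure is defined, so that cleanup does not have to
look up a module global that may already be gone:

```python
import os
import tempfile


class ScratchFile:
    """Hypothetical resource whose cleanup may run during shutdown."""

    # Cached on the class: close() goes through self._unlink instead of
    # looking up the module global `os` at cleanup time.
    _unlink = staticmethod(os.unlink)
    path = None  # default so close() is safe even if __init__ failed

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def close(self):
        path, self.path = self.path, None
        if path is not None:
            self._unlink(path)

    def __del__(self):
        self.close()


def make_remover(path, unlink=os.unlink):
    """Closure variant: `unlink` is bound when the function is defined."""
    def remove():
        unlink(path)
    return remove


# Usage: both variants delete their file without touching `os` again.
scratch = ScratchFile()
saved_path = scratch.path
scratch.close()

fd, other_path = tempfile.mkstemp()
os.close(fd)
make_remover(other_path)()
```

Whether this is practical for the close() path in io.py is exactly the
open question raised above; the sketch only shows the general pattern.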


From martin at v.loewis.de  Fri Dec 19 22:20:22 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 19 Dec 2008 22:20:22 +0100
Subject: [Python-Dev] Please test OSX installer
In-Reply-To: <d38f5330812171404w35fc1538o714d70ee8d88b52e@mail.gmail.com>
References: <494969AE.3060805@v.loewis.de>
	<d38f5330812171404w35fc1538o714d70ee8d88b52e@mail.gmail.com>
Message-ID: <494C1016.2000803@v.loewis.de>

> I got a "Problem Report for Python" pop-up.  Skip to "///" for
> "Problem Details".  Interestingly, the test completed with the
> following report:

Thanks for the report. I have tested that with 2.5.2, which fails
in the same way. So this is not a regression, and I have not attempted
to fix it.

Regards,
Martin

From fabiofz at gmail.com  Fri Dec 19 22:43:01 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Fri, 19 Dec 2008 19:43:01 -0200
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
Message-ID: <cfb578b20812191343v3a13b6adx674297a7064eb8aa@mail.gmail.com>

Hi,

I'm currently having problems getting the output of Python 3.0 into the
Eclipse console (integrating it into Pydev).

The problem appears to be that stdout and stderr are not running
unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and
the content only appears to me when a flush() is done or when the
process finishes.

So, in search of a solution, I found a suggestion from
http://stackoverflow.com/questions/107705/python-output-buffering

to use the following construct:

sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

But that gives the error below in Python 3.0:

    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
  File "D:\bin\Python30\lib\os.py", line 659, in fdopen
    return io.open(fd, *args, **kwargs)
  File "D:\bin\Python30\lib\io.py", line 243, in open
    raise ValueError("can't have unbuffered text I/O")
ValueError: can't have unbuffered text I/O

So, I'd like to know if there's some way I can make it run unbuffered
(to get the output contents without having to flush() after each
write).

Thanks,

Fabio

From brett at python.org  Fri Dec 19 23:03:01 2008
From: brett at python.org (Brett Cannon)
Date: Fri, 19 Dec 2008 14:03:01 -0800
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <cfb578b20812191343v3a13b6adx674297a7064eb8aa@mail.gmail.com>
References: <cfb578b20812191343v3a13b6adx674297a7064eb8aa@mail.gmail.com>
Message-ID: <bbaeab100812191403m2841ddfbm7e48e8d4e5854a97@mail.gmail.com>

On Fri, Dec 19, 2008 at 13:43, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> Hi,
>
> I'm currently having problems to get the output of Python 3.0 into the
> Eclipse console (integrating it into Pydev).
>
> The problem appears to be that stdout and stderr are not running
> unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and
> the content only appears to me when a flush() is done or when the
> process finishes.
>
> So, in the search of a solution, I found a suggestion from
> http://stackoverflow.com/questions/107705/python-output-buffering
>
> to use the following construct:
>
> sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
>
> But that gives the error below in Python 3.0:
>
>    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
>  File "D:\bin\Python30\lib\os.py", line 659, in fdopen
>    return io.open(fd, *args, **kwargs)
>  File "D:\bin\Python30\lib\io.py", line 243, in open
>    raise ValueError("can't have unbuffered text I/O")
> ValueError: can't have unbuffered text I/O
>
> So, I'd like to know if there's some way I can make it run unbuffered
> (to get the output contents without having to flush() after each
> write).

Notice how the exception specifies test I/O cannot be unbuffered. This
restriction does not apply to bytes I/O. Simply open it as 'wb'
instead of 'w' and it works.
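
A small sketch of that suggestion, using a pipe rather than the real
stdout so it can be run anywhere (open_unbuffered is a made-up helper
name). Note that a stream opened this way accepts bytes, so the caller
has to encode any text before writing it:

```python
import os


def open_unbuffered(fd):
    """Wrap a file descriptor in an unbuffered binary writer."""
    # Mode 'wb' with buffering 0 is accepted; mode 'w' with buffering 0
    # is what raises "can't have unbuffered text I/O".
    return os.fdopen(fd, "wb", 0)


read_end, write_end = os.pipe()
out = open_unbuffered(write_end)

# Text must be encoded explicitly because the stream is binary.
out.write("hello\n".encode("utf-8"))

# The bytes are readable right away, without calling out.flush().
data = os.read(read_end, 1024)

out.close()
os.close(read_end)
```

The same call shape applies to sys.stdout.fileno(), with the caveat
that anything written to the replacement stream must then be bytes.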

-Brett

From ncoghlan at gmail.com  Fri Dec 19 23:15:00 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Dec 2008 08:15:00 +1000
Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter
	startup?
Message-ID: <494C1CE4.5080102@gmail.com>

Some strangeness was recently reported for the range() type in Py3k
where instances are unhashable until an attribute is retrieved from the
range type itself, and then they become hashable. [1]

While there is definitely an associated bug in the range implementation
(it doesn't block inheritance of the default object.__hash__
implementation), there's also the fact that when the interpreter
*starts* the hash implementation hasn't been inherited yet, but it does
get inherited later.
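
For readers less familiar with the slot machinery, a rough Python-level
analogue of "blocking inheritance" of the default hash is sketched
below. It only illustrates the intended end state and is not the C code
in question; both class names are hypothetical.

```python
class InheritsHash:
    """Defines no __hash__, so object.__hash__ is inherited."""


class BlocksHash:
    """Setting __hash__ to None marks instances as unhashable."""
    __hash__ = None


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


inherits = is_hashable(InheritsHash())
blocks = is_hashable(BlocksHash())
```

The C-level counterpart would presumably be an explicit entry in the
type's tp_hash slot rather than leaving the slot empty, since an empty
slot is what gets filled in from the base class.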

It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
the builtin types - they're left to have it called implicitly when an
operation using them needs tp_dict filled in.

Such operations (which include retrieving an attribute from the type
object) will implicitly call PyType_Ready to populate tp_dict, which
also has the side effect of inheriting slot implementations from base
classes.

Is there a specific reason for not fully initialising the builtin types?
Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?

Cheers,
Nick.

[1] http://bugs.python.org/issue4701

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Fri Dec 19 23:18:14 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Dec 2008 08:18:14 +1000
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <bbaeab100812191403m2841ddfbm7e48e8d4e5854a97@mail.gmail.com>
References: <cfb578b20812191343v3a13b6adx674297a7064eb8aa@mail.gmail.com>
	<bbaeab100812191403m2841ddfbm7e48e8d4e5854a97@mail.gmail.com>
Message-ID: <494C1DA6.1080202@gmail.com>

Brett Cannon wrote:
> Notice how the exception specifies test I/O cannot be unbuffered. This
> restriction does not apply to bytes I/O. Simply open it as 'wb'
> instead of 'w' and it works.

s/test/text/ :)

(For anyone else that is like me and skipped over the exception detail
on first reading, thus becoming a little confused...)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From fabiofz at gmail.com  Fri Dec 19 23:20:22 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Fri, 19 Dec 2008 20:20:22 -0200
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <bbaeab100812191403m2841ddfbm7e48e8d4e5854a97@mail.gmail.com>
References: <cfb578b20812191343v3a13b6adx674297a7064eb8aa@mail.gmail.com>
	<bbaeab100812191403m2841ddfbm7e48e8d4e5854a97@mail.gmail.com>
Message-ID: <cfb578b20812191420l14198fc4j449f778fa5c88f7e@mail.gmail.com>

You're right, thanks (guess I'll use that option then).

Now, is it a bug that Python 3.0 doesn't run unbuffered when
specifying -u or PYTHONUNBUFFERED, or was this support dropped?

Thanks,

Fabio

On Fri, Dec 19, 2008 at 8:03 PM, Brett Cannon <brett at python.org> wrote:
> On Fri, Dec 19, 2008 at 13:43, Fabio Zadrozny <fabiofz at gmail.com> wrote:
>> Hi,
>>
>> I'm currently having problems to get the output of Python 3.0 into the
>> Eclipse console (integrating it into Pydev).
>>
>> The problem appears to be that stdout and stderr are not running
>> unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and
>> the content only appears to me when a flush() is done or when the
>> process finishes.
>>
>> So, in the search of a solution, I found a suggestion from
>> http://stackoverflow.com/questions/107705/python-output-buffering
>>
>> to use the following construct:
>>
>> sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
>>
>> But that gives the error below in Python 3.0:
>>
>>    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
>>  File "D:\bin\Python30\lib\os.py", line 659, in fdopen
>>    return io.open(fd, *args, **kwargs)
>>  File "D:\bin\Python30\lib\io.py", line 243, in open
>>    raise ValueError("can't have unbuffered text I/O")
>> ValueError: can't have unbuffered text I/O
>>
>> So, I'd like to know if there's some way I can make it run unbuffered
>> (to get the output contents without having to flush() after each
>> write).
>
> Notice how the exception specifies test I/O cannot be unbuffered. This
> restriction does not apply to bytes I/O. Simply open it as 'wb'
> instead of 'w' and it works.
>
> -Brett
>

From barry at python.org  Fri Dec 19 23:28:32 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 19 Dec 2008 17:28:32 -0500
Subject: [Python-Dev] Python 3.0.1
Message-ID: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>

I'd like to get Python 3.0.1 out before the end of the year.  There  
are no showstoppers, but I haven't yet looked at the deferred blockers  
or the buildbots.

Do you think we can get 3.0.1 out on December 24th?  Or should we wait  
until after Christmas and get it out, say on the 29th?  Do we need an  
rc?

This question goes mostly to Martin and Georg.  What would work for  
you guys?

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSUwgEXEjvBPtnXfVAQIthgP7BDS6xfBHhADKc50ANvZ5aAfWhGSU9GH/
DR+IRduVmvosu9gm92hupCOaLCN4IbtyFx27A8LQuPNVc4BVrhWfDKDSzpxO2MJu
xLJntkF2BRWODSbdrLGdZ6H6WDT0ZAhn6ZjlWXwxhGxQ5FwEJb7moMuY7jAIEeor
5n6Ag5zT+e8=
=oU/g
-----END PGP SIGNATURE-----

From bcannon at gmail.com  Fri Dec 19 23:33:38 2008
From: bcannon at gmail.com (bcannon at gmail.com)
Date: Fri, 19 Dec 2008 22:33:38 +0000
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
Message-ID: <0016e64f68207a52a5045e6de625@google.com>

On Dec 19, 2008 2:20pm, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> You're right, thanks (guess I'll use that option then).
>
> Now, is it a bug that Python 3.0 doesn't run unbuffered when
> specifying -u or PYTHONUNBUFFERED, or was this support dropped?
>

Well, ``python -h`` still lists it. That means either the output for -h  
needs to be fixed or the feature needs to be supported.

-Brett

From ncoghlan at gmail.com  Fri Dec 19 23:42:49 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Dec 2008 08:42:49 +1000
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
Message-ID: <494C2369.5030901@gmail.com>

Barry Warsaw wrote:
> I'd like to get Python 3.0.1 out before the end of the year.  There are
> no showstoppers, but I haven't yet looked at the deferred blockers or
> the buildbots.
> 
> Do you think we can get 3.0.1 out on December 24th?  Or should we wait
> until after Christmas and get it out, say on the 29th?  Do we need an rc?

There are some memoryview issues [1] I'd like to have fixed for 3.0.1 -
the 29th would be a much easier date to hit. A quick review pass through
the other 3.0 highs and criticals might also be worthwhile.

Cheers,
Nick.

[1] http://bugs.python.org/issue4580

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From barry at python.org  Fri Dec 19 23:46:30 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 19 Dec 2008 17:46:30 -0500
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <494C2369.5030901@gmail.com>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
	<494C2369.5030901@gmail.com>
Message-ID: <16D50043-22B0-4711-BE91-E752953444EA@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 19, 2008, at 5:42 PM, Nick Coghlan wrote:

> Barry Warsaw wrote:
>> I'd like to get Python 3.0.1 out before the end of the year.  There  
>> are
>> no showstoppers, but I haven't yet looked at the deferred blockers or
>> the buildbots.
>>
>> Do you think we can get 3.0.1 out on December 24th?  Or should we  
>> wait
>> until after Christmas and get it out, say on the 29th?  Do we need  
>> an rc?
>
> There are some memoryview issues [1] I'd like to have fixed for  
> 3.0.1 -
> the 29th would be a much easier date to hit. A quick review pass  
> through
> the other 3.0 highs and criticals might also be worthwhile.

Thanks.  I've bumped that to release blocker for now.  If there are  
any other 'high' bugs that you want considered for 3.0.1, please make  
them release blockers too, for now.

- -Barry


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSUwkRnEjvBPtnXfVAQKQ4QP/eRmWBgyuijbe9vnXkRkTkAmd4qyrAD2s
Forp4hKGvoc4A03Q4x2uVweI4oSdFrKIN2NlcM3JVlSrsU07DTElFoCEA/A8DB3N
+6Sp9bC98iVqGUmle54rFIm0F/iCoFQ59mp9jNGeiKVwjojUDkbJNXulHuYIb1co
RuICfsatRc0=
=zjQz
-----END PGP SIGNATURE-----

From solipsis at pitrou.net  Fri Dec 19 23:47:27 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 19 Dec 2008 22:47:27 +0000 (UTC)
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
References: <0016e64f68207a52a5045e6de625@google.com>
Message-ID: <loom.20081219T224445-192@post.gmane.org>


> Well, ``python -h`` still lists it.

Precisely, it says:

-u     : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
         see man page for details on internal buffering relating to '-u'

Note the "binary". And indeed:

./python -u
Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdout.buffer.write(b"y")
y1
>>> 

I don't know what it would take to enable unbuffered text IO while keeping the
current TextIOWrapper implementation...

Regards

Antoine.



From solipsis at pitrou.net  Fri Dec 19 23:59:49 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 19 Dec 2008 22:59:49 +0000 (UTC)
Subject: [Python-Dev] Python 3.0.1
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
	<494C2369.5030901@gmail.com>
Message-ID: <loom.20081219T225442-354@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> There are some memoryview issues [1] I'd like to have fixed for 3.0.1 -
> the 29th would be a much easier date to hit. A quick review pass through
> the other 3.0 highs and criticals might also be worthwhile.

What about #1717 "Get rid of more refercenes to __cmp__"?
(although I like the typo a lot)



From guido at python.org  Sat Dec 20 00:03:10 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 19 Dec 2008 15:03:10 -0800
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <loom.20081219T224445-192@post.gmane.org>
References: <0016e64f68207a52a5045e6de625@google.com>
	<loom.20081219T224445-192@post.gmane.org>
Message-ID: <ca471dc20812191503w3475ac17sb430c099cff62457@mail.gmail.com>

For truly unbuffered text output you'd have to make changes to the
io.TextIOWrapper class to flush after each write() call. That's an API
change -- the constructor currently has a line_buffering option but no
option for completely unbuffered mode. It would also require some
changes to io.open() which currently rejects buffering=0 in text mode.
All that suggests that it should wait until 3.1.

However it might make sense to at least turn on line buffering when -u
or PYTHONUNBUFFERED is given; that doesn't require API changes and so
can be considered a bug fix.

--Guido van Rossum (home page: http://www.python.org/~guido/)
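
The two options described above can be sketched as follows. This is only an
illustration under stated assumptions: FlushOnWrite is a hypothetical wrapper
class, not part of the io module, and in-memory BytesIO objects stand in for
the real stdout so the flushed bytes can be inspected:

```python
import io


class FlushOnWrite:
    """Hypothetical wrapper that flushes the wrapped stream after every write()."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        written = self._stream.write(text)
        self._stream.flush()
        return written

    def __getattr__(self, name):
        # Delegate everything else (flush, fileno, encoding, ...) unchanged.
        return getattr(self._stream, name)


# Line buffering: the existing TextIOWrapper constructor option.
line_sink = io.BytesIO()
line_buffered = io.TextIOWrapper(
    line_sink, encoding="utf-8", newline="\n", line_buffering=True
)
line_buffered.write("partial")       # no newline yet, so it may stay buffered
line_buffered.write(" line\n")       # the newline forces a flush
after_newline = line_sink.getvalue()

# Flush-on-write: approximates unbuffered text output without an API change.
eager_sink = io.BytesIO()
eager = FlushOnWrite(io.TextIOWrapper(eager_sink, encoding="utf-8", newline="\n"))
eager.write("x")
after_write = eager_sink.getvalue()
```

A script could apply the wrapper with sys.stdout = FlushOnWrite(sys.stdout),
at the cost of one flush per write() call.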



On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> Well, ``python -h`` still lists it.
>
> Precisely, it says:
>
> -u     : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
>         see man page for details on internal buffering relating to '-u'
>
> Note the "binary". And indeed:
>
> ./python -u
> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import sys
>>>> sys.stdout.buffer.write(b"y")
> y1
>>>>
>
> I don't know what it would take to enable unbuffered text IO while keeping the
> current TextIOWrapper implementation...
>
> Regards
>
> Antoine.
>
>

From solipsis at pitrou.net  Sat Dec 20 00:16:10 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 19 Dec 2008 23:16:10 +0000 (UTC)
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
References: <0016e64f68207a52a5045e6de625@google.com>
	<loom.20081219T224445-192@post.gmane.org>
Message-ID: <loom.20081219T231505-35@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:
> 
> Note the "binary". And indeed:
[...]

And I realize I should have thought a bit before giving that "proof".
Sorry!




From amauryfa at gmail.com  Sat Dec 20 00:38:04 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Sat, 20 Dec 2008 00:38:04 +0100
Subject: [Python-Dev] Distutils maintenance
In-Reply-To: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com>
References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com>
	<1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com>
Message-ID: <e27efe130812191538o2ad6d002o8fc1d7ec462d94ee@mail.gmail.com>

On Fri, Dec 19, 2008 at 19:59, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
>> Hello
>>
>> I would like to request a commit access to work specifically on
>> distutils maintenance.
>
> +1
>
> We are currently without an active distutils maintainer, and many
> stale distutil tickets are in need of attention I'm sure Tarek could
> provide. Tarek has also been providing many useful patches of his own.

+1 from me as well.

-- 
Amaury Forgeot d'Arc

From tutufan at gmail.com  Sat Dec 20 00:29:38 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Fri, 19 Dec 2008 17:29:38 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict
	(python 2.5.2)
Message-ID: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>

I have a program that creates a huge (45GB) defaultdict.  (The keys
are short strings, the values are short lists of pairs (string, int).)
 Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).  That is, after executing the final
statement (a print), it is apparently spending a huge amount of time
cleaning up before exiting.  I haven't installed any exit handlers or
anything like that, all files are already closed and stdout/stderr
flushed, and there's nothing special going on.  I have done
'gc.disable()' for performance (which is hideous without it)--I have
no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is
immediate, but this seems like a bit of hack.  Is this something that
needs fixing, or that has already been fixed?

Mike

From martin at v.loewis.de  Sat Dec 20 03:44:22 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 20 Dec 2008 03:44:22 +0100
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
Message-ID: <494C5C06.30109@v.loewis.de>

> Do you think we can get 3.0.1 out on December 24th?

I won't have physical access to my build machine from December 24th to
January 3rd.

> Or should we wait
> until after Christmas and get it out, say on the 29th?  Do we need an rc?

If you want to get it quickly, it should happen on December 23rd (my
time, meaning that the tag should be created on December 22nd). December
29th might work as well; I'd create the binaries remotely (in this case,
the tag would need to be created on December 28th).

Overall, I think a week more or less doesn't really matter, and would
prefer to see the release created in January. There are 13 release
blockers, and I'm skeptical that they can all get resolved within
the next few days.

Regards,
Martin

From martin at v.loewis.de  Sat Dec 20 04:12:21 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 20 Dec 2008 04:12:21 +0100
Subject: [Python-Dev] 2.6 and 3.0 buildbot slaves
Message-ID: <494C6295.9080506@v.loewis.de>

I have now set up buildbot slaves for 2.6 and 3.0,
and turned off the 2.5 ones.

Regards,
Martin

From ncoghlan at gmail.com  Sat Dec 20 08:17:28 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Dec 2008 17:17:28 +1000
Subject: [Python-Dev] Call PyType_Ready on builtin types during
	interpreter startup?
In-Reply-To: <494C1CE4.5080102@gmail.com>
References: <494C1CE4.5080102@gmail.com>
Message-ID: <494C9C08.5030702@gmail.com>

Nick Coghlan wrote:
> Is there a specific reason for not fully initialising the builtin types?
> Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?

I need to correct this slightly: some builtin types *are* initialised
properly by _Py_ReadyTypes.

So the question is actually whether or not the missing builtin types
should be added to that function.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From kristjan at ccpgames.com  Sat Dec 20 11:02:38 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Sat, 20 Dec 2008 10:02:38 +0000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict	(python 2.5.2)
In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>

Can you distill the program into something reproducible?
Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance?
I can try to point our commercial profiling tools at it and see what it is doing.
K

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Mike Coleman
Sent: 19 December 2008 23:30
To: python-dev at python.org
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

I have a program that creates a huge (45GB) defaultdict.  (The keys
are short strings, the values are short lists of pairs (string, int).)
 Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).  That is, after executing the final
statement (a print), it is apparently spending a huge amount of time
cleaning up before exiting.  I haven't installed any exit handlers or
anything like that, all files are already closed and stdout/stderr
flushed, and there's nothing special going on.  I have done
'gc.disable()' for performance (which is hideous without it)--I have
no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is
immediate, but this seems like a bit of hack.  Is this something that
needs fixing, or that has already been fixed?

Mike


From steve at pearwood.info  Sat Dec 20 11:55:26 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 20 Dec 2008 21:55:26 +1100
Subject: [Python-Dev]
	=?iso-8859-1?q?extremely_slow_exit_for_program_havin?=
	=?iso-8859-1?q?g_huge_=2845G=29_dict_=28python_2=2E5=2E2=29?=
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
Message-ID: <200812202155.28024.steve@pearwood.info>

On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote:

> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting
> some degradation of exit performance? I can try to point our
> commercial profiling tools at it and see what it is doing. K

In November 2007, a similar problem was reported on the comp.lang.python 
newsgroup. 370MB was large enough to demonstrate the problem. I don't 
know if a bug was ever reported.

The thread starts here:
http://mail.python.org/pipermail/python-list/2007-November/465498.html

or if you prefer Google Groups:
http://preview.tinyurl.com/97xsso

and it describes extremely long times to populate and destroy large 
dicts even with garbage collection turned off.

My summary at the time was:

"On systems with multiple CPUs or 64-bit systems, or both, creating 
and/or deleting a multi-megabyte dictionary in recent versions of 
Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ 
minutes, compared to seconds if the system only has a single CPU. 
Turning garbage collection off doesn't help."

I make no guarantee that the above is a correct description of the 
problem, only that this is what I believed at the time.

I'm afraid it is a very long thread, with multiple red herrings, lots of 
people unable to reproduce the problem, and the usual nonsense that 
happens on comp.lang.python.

I was originally one of the skeptics until I reproduced the original 
poster's problem. I generated a sample file of 8 million key/value pairs as 
a 370MB text file. Reading it into a dict took two and a half minutes 
on my relatively slow computer. But deleting the dict took more than 30 
minutes even with garbage collection switched off. Sample code 
reproducing the problem on my machine is here:

http://mail.python.org/pipermail/python-list/2007-November/465513.html

According to this post of mine:

http://mail.python.org/pipermail/python-list/2007-November/466209.html

deleting 8 million (key, value) pairs stored as a list of tuples was 
very fast. It was only if they were stored as a dict that deleting it 
was horribly slow.
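
A rough sketch of the kind of measurement described above, not the code from
the linked posts. The function name time_build_and_delete and its parameters
are hypothetical, the keys and values are synthetic strings rather than data
read from a file, and n should be scaled up considerably (at the cost of
memory) before drawing any conclusions:

```python
import gc
import time


def time_build_and_delete(n, container="dict"):
    """Build n (key, value) string pairs, then time how long deletion takes.

    Returns (size, build_seconds, delete_seconds). Cyclic garbage collection
    is switched off for the duration, as in the reports above.
    """
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        start = time.perf_counter()
        pairs = ((str(i), str(i + 1)) for i in range(n))
        data = dict(pairs) if container == "dict" else list(pairs)
        built = time.perf_counter()
        size = len(data)
        del data                      # drops the last reference to the container
        deleted = time.perf_counter()
    finally:
        if gc_was_enabled:
            gc.enable()
    return size, built - start, deleted - built


for kind in ("dict", "list"):
    size, build_s, delete_s = time_build_and_delete(100000, kind)
    print("%s: %d items, build %.3fs, delete %.3fs" % (kind, size, build_s, delete_s))
```

Comparing the "dict" and "list" rows at the same n mirrors the dict versus
list-of-tuples comparison; the timings depend heavily on the Python version,
the platform malloc() and the amount of free memory.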

Please note that other people have tried and failed to replicate the 
problem. I suspect the fault (if it is one, and not human error) is 
specific to some combinations of Python version and hardware.

Even if this is a Will Not Fix, I'd be curious if anyone else can 
reproduce the problem.

Hope this is helpful,

Steven.



> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com at python.org
> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On
> Behalf Of Mike Coleman Sent: 19 December 2008 23:30
> To: python-dev at python.org
> Subject: [Python-Dev] extremely slow exit for program having huge
> (45G) dict (python 2.5.2)
>
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string,
> int).) Nothing but possibly the strings and ints is shared.
>
> The program takes around 10 minutes to run, but longer than 20
> minutes to exit (I gave up at that point).  That is, after executing
> the final statement (a print), it is apparently spending a huge
> amount of time cleaning up before exiting.  I haven't installed any
> exit handlers or anything like that, all files are already closed and
> stdout/stderr flushed, and there's nothing special going on.  I have
> done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
>
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of hack.  Is this something that
> needs fixing, or that has already been fixed?
>
> Mike




-- 
Steven D'Aprano

From andymac at bullseye.apana.org.au  Sat Dec 20 11:08:00 2008
From: andymac at bullseye.apana.org.au (Andrew MacIntyre)
Date: Sat, 20 Dec 2008 21:08:00 +1100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict	(python 2.5.2)
In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
Message-ID: <494CC400.7070404@bullseye.andymac.org>

Mike Coleman wrote:
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string, int).)
>  Nothing but possibly the strings and ints is shared.
> 
> The program takes around 10 minutes to run, but longer than 20 minutes
> to exit (I gave up at that point).  That is, after executing the final
> statement (a print), it is apparently spending a huge amount of time
> cleaning up before exiting.  I haven't installed any exit handlers or
> anything like that, all files are already closed and stdout/stderr
> flushed, and there's nothing special going on.  I have done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
> 
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of hack.  Is this something that
> needs fixing, or that has already been fixed?

You don't mention the platform, but...

This behaviour was not unknown in the distant past, with much smaller
datasets.  Most of the problems then related to the platform malloc()
doing funny things as stuff was free()ed, like coalescing free space.

[I once sat and watched a Python script run in something like 30 seconds
  and then take nearly 10 minutes to terminate, as you describe (Python
  2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
  hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
  properties from Python's point of view]

PyMalloc effectively removed this as an issue for most cases and platform
malloc()s have also become considerably more sophisticated since then,
but I wonder whether the sheer size of your dataset is unmasking related
issues.

Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
might have odd behaviour with 45GB of arenas returned to it piecemeal.
This is something that could be checked with a small C program.
Calling os._exit() circumvents the free()ing of the arenas.

Also consider that, with the exception of small integers (-5..256), no
interning of integers is done.  If your data contains large quantities
of integers with non-unique values (that aren't in the small integer
range) you may find it useful to do your own interning.

-- 
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia

From steve at holdenweb.com  Sat Dec 20 14:14:49 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 20 Dec 2008 08:14:49 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <494CC400.7070404@bullseye.andymac.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org>
Message-ID: <giir4b$90m$1@ger.gmane.org>

Andrew MacIntyre wrote:
> Mike Coleman wrote:
>> I have a program that creates a huge (45GB) defaultdict.  (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>>  Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point).  That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting.  I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on.  I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of hack.  Is this something that
>> needs fixing, or that has already been fixed?
> 
> You don't mention the platform, but...
> 
> This behaviour was not unknown in the distant past, with much smaller
> datasets.  Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
> 
> [I once sat and watched a Python script run in something like 30 seconds
>  and then take nearly 10 minutes to terminate, as you describe (Python
>  2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
>  hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
>  properties from Python's point of view]
> 
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
> 
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
> 
> Also consider that, with the exception of small integers (-5..256), no
> interning of integers is done.  If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.
> 
It's a pity a simplistic approach that redefines all space reclamation
activities as null functions won't work. I hate to think of all the
cycles that are being wasted reclaiming space just because a program has
terminated, when in fact an os.exit() call would work just as well from
the user's point of view.

Unfortunately there are doubtless programs out there that do rely on
actions being taken at shutdown.

Maybe os.exit() could be more widely advertised, though ...

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From g.brandl at gmx.net  Sat Dec 20 14:26:22 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 20 Dec 2008 14:26:22 +0100
Subject: [Python-Dev] Distutils maintenance
In-Reply-To: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com>
References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com>
	<1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com>
Message-ID: <giirrj$b6r$1@ger.gmane.org>

Benjamin Peterson schrieb:
> On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
>> Hello
>>
>> I would like to request a commit access to work specifically on
>> distutils maintenance.
> 
> +1
> 
> We are currently without an active distutils maintainer, and many
> stale distutil tickets are in need of attention I'm sure Tarek could
> provide. Tarek has also been providing many useful patches of his own.

FWIW, +1.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Sat Dec 20 14:29:15 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 20 Dec 2008 14:29:15 +0100
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
Message-ID: <giis10$b6r$2@ger.gmane.org>

Barry Warsaw schrieb:
> I'd like to get Python 3.0.1 out before the end of the year.  There
> are no showstoppers, but I haven't yet looked at the deferred blockers
> or the buildbots.
> 
> Do you think we can get 3.0.1 out on December 24th?  Or should we wait
> until after Christmas and get it out, say on the 29th?  Do we need an
> rc?
> 
> This question goes mostly to Martin and Georg.  What would work for
> you guys?

Since the 24th is the most important Christmas day around here, I'll not
be available then :)

Either 23rd or 29th is fine with me.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From skip at pobox.com  Sat Dec 20 16:55:32 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 20 Dec 2008 09:55:32 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <giir4b$90m$1@ger.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org> <giir4b$90m$1@ger.gmane.org>
Message-ID: <18765.5492.200918.790182@montanaro-dyndns-org.local>


    Steve> Unfortunately there are doubtless programs out there that do rely
    Steve> on actions being taken at shutdown.

Indeed.  I believe any code which calls atexit.register.

    Steve> Maybe os.exit() could be more widely advertised, though ...

That would be os._exit().  Calling it avoids calls to exit functions
registered with atexit.register().  I believe it is both safe, and
reasonable programming practice for modules to register exit functions.
Both the logging and multiprocessing modules call atexit.register().  It's
incumbent on the application programmer to know these details of the modules
the app uses
(perhaps indirectly) to know whether or not it's safe/wise to call
os._exit().
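
The difference can be observed with a small experiment. This is a sketch
assuming Python 3.7 or later (for subprocess.run with capture_output); the
CHILD script and the run_child helper are made up for illustration:

```python
import subprocess
import sys

# Child script: registers an exit function, prints, flushes by hand,
# then terminates through whichever call is substituted in.
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("atexit handler ran"))
print("work done")
sys.stdout.flush()
{exit_call}
"""


def run_child(exit_call):
    """Run the child with the given exit statement; return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(exit_call=exit_call)],
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


normal_exit = run_child("sys.exit(0)")    # exit functions run
hard_exit = run_child("os._exit(0)")      # exit functions are skipped
```

The explicit sys.stdout.flush() matters in the second case: os._exit() also
skips the flushing of stdio buffers, so unflushed output would be lost.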

-- 
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/

From aahz at pythoncraft.com  Sat Dec 20 18:01:55 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 20 Dec 2008 09:01:55 -0800
Subject: [Python-Dev] Call PyType_Ready on builtin types
	during	interpreter startup?
In-Reply-To: <494C1CE4.5080102@gmail.com>
References: <494C1CE4.5080102@gmail.com>
Message-ID: <20081220170154.GA28166@panix.com>

On Sat, Dec 20, 2008, Nick Coghlan wrote:
> 
> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
> the builtin types - they're left to have it called implicitly when an
> operation using them needs tp_dict filled in.

This seems like a release blocker for 3.0.1 to me
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan

From tutufan at gmail.com  Sat Dec 20 17:57:47 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 10:57:47 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
Message-ID: <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>

On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
<kristjan at ccpgames.com> wrote:
> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance?
> I can try to point our commercial profiling tools at it and see what it is doing.

I will try next week to see if I can come up with a smaller,
submittable example.  Thanks.

From tutufan at gmail.com  Sat Dec 20 18:09:03 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 11:09:03 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494CC400.7070404@bullseye.andymac.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org>
Message-ID: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>

Andrew, this is on an (intel) x86_64 box with 64GB of RAM.  I don't
recall the maker or details of the architecture off the top of my
head, but it would be something "off the rack" from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy, and
nothing that gave me any reason to think it was affecting my program.

It's running CentOS 5 I think, so that might make glibc several years
old.  Your malloc idea sounds plausible to me.  If it is a libc
problem, it would be nice if there was some way we could tell malloc
to "live for today because there is no tomorrow" in the terminal phase
of the program.

I'm not sure exactly how to attack this.  Callgrind is cool, but no
way will work on something this size.  Timed ltrace output might be
interesting.  Or maybe a gprof'ed Python, though that's more work.

Regarding interning, I thought this only worked with strings.  Is
there some way to intern integers?  I'm probably creating 300M
integers more or less uniformly distributed across range(10000).

Mike





On Sat, Dec 20, 2008 at 4:08 AM, Andrew MacIntyre
<andymac at bullseye.apana.org.au> wrote:
> Mike Coleman wrote:
>>
>> I have a program that creates a huge (45GB) defaultdict.  (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>>  Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point).  That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting.  I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on.  I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of hack.  Is this something that
>> needs fixing, or that has already been fixed?
>
> You don't mention the platform, but...
>
> This behaviour was not unknown in the distant past, with much smaller
> datasets.  Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
>
> [I once sat and watched a Python script run in something like 30 seconds
>  and then take nearly 10 minutes to terminate, as you describe (Python
>  2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
>  hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
>  properties from Python's point of view]
>
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
>
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
>
> Also consider that, with the exception of small integers (-1..256), no
> interning of integers is done.  If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.
>
> --
> -------------------------------------------------------------------------
> Andrew I MacIntyre                     "These thoughts are mine alone..."
> E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
>       andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
> Web:    http://www.andymac.org/               |        Australia
>

From Scott.Daniels at Acm.Org  Sat Dec 20 18:41:39 2008
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Sat, 20 Dec 2008 09:41:39 -0800
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<494CC400.7070404@bullseye.andymac.org>
	<3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
Message-ID: <gijann$mrl$1@ger.gmane.org>

Mike Coleman wrote:
> ... Regarding interning, I thought this only worked with strings. 
> Is there some way to intern integers?  I'm probably creating 300M
> integers more or less uniformly distributed across range(10000)?

held = list(range(10000))
...
     troublesome_dict[string] = held[number_to_hold]
...

--Scott David Daniels
Scott.Daniels at Acm.Org


From kristjan at ccpgames.com  Sat Dec 20 19:25:25 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Sat, 20 Dec 2008 18:25:25 +0000
Subject: [Python-Dev] extremely slow exit for program having huge
	(45G)	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org>
	<3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BDD@exchis.ccp.ad.local>

You can always try poor-man's profiling, which is surprisingly useful in the face of massive performance problems.
Just attach a debugger to the program and, while it is suffering from the performance problem, break the execution at regular intervals. You are statistically very likely to get a call stack representative of the problem you are having.
Do this a few times and you will get a fair impression of what the program is spending its time on.
>From the debugger, you can also examine the Python call stack of the program by examining the 'f' local variable in the frame evaluation function.
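
The sampling idea described above can be scripted. What follows is only a rough sketch, assuming gdb is installed and that you are allowed to attach to the process; the helper names (gdb_backtrace_argv, sample_stacks) are made up for illustration and are not part of any existing tool.

```python
import subprocess
import time


def gdb_backtrace_argv(pid):
    """Build a gdb command line that attaches to `pid`, prints a C-level
    backtrace in batch mode, and then detaches again."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def sample_stacks(pid, samples=5, interval=2.0):
    """Interrupt the process `samples` times, `interval` seconds apart,
    and return the raw backtrace text captured each time.

    Assumption: gdb is on PATH and attaching to `pid` is permitted.
    """
    backtraces = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_argv(pid), capture_output=True, text=True
        )
        backtraces.append(result.stdout)
        time.sleep(interval)
    return backtraces
```

If most of the captured backtraces end in the same function (for example somewhere inside free()), that is a strong hint about where the time goes.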

Have fun,

K

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Mike Coleman
Sent: 20. desember 2008 17:09
To: Andrew MacIntyre
Cc: Python Dev
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)


I'm not sure exactly how to attack this.  Callgrind is cool, but no
way will work on something this size.  Timed ltrace output might be
interesting.  Or maybe a gprof'ed Python, though that's more work.



From mikko+python at redinnovation.com  Sat Dec 20 20:27:15 2008
From: mikko+python at redinnovation.com (Mikko Ohtamaa)
Date: Sat, 20 Dec 2008 21:27:15 +0200
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
Message-ID: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>

Hi fellow snakemen and lizard ladies,

We have recently done lots of Python work on Nokia Series 60 phones and
even managed to roll out some commercial Python-based applications. In the
future we also plan to create some iPhone Python apps.

Python runs fine on phones - after it has been launched. Currently the
biggest issue preventing the world dominance of Python-based mobile
applications is the start-up time. We cope with the issue by using fancy
splash screens and progress indicators, but that doesn't cure the fact that it
takes a minute to show the main user interface of the application. Most of
the time is spent in imports, executing opcodes and forming function and class
structures in memory - something which cannot be easily sped up.

Now, we have been thinking. Maemo has a fork()-based Python launcher (
http://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/) which
greatly speeds up the start-up time by holding Python in memory all the
time. We cannot afford such luxury on Symbian and iPhone, since we do not
control the operating system. So how about this:

1. A Python application is launched normally

2. After the VM has finished importing modules and reached a static launch
state (meaning that the state is the same on every launch), the VM state is
written to disk

3. Application continues execution and starts doing dynamic stuff

4. On the following launches, special init code is used which directly blits
the VM image from disk back to memory, and we have reached the static state
again without going through the hoops of executing import-related opcodes

5. Also, I have heard a suggestion that the VM image could be defragmented and
analyzed offline

Any opinions?

Cheers,
Mikko


-- 
Mikko Ohtamaa
Red Innovation Ltd.
Oulu, Finland
http://www.redinnovation.com

From solipsis at pitrou.net  Sat Dec 20 20:45:11 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Dec 2008 19:45:11 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?extremely_slow_exit_for_program_having_hug?=
	=?utf-8?b?ZSAoNDVHKSBkaWN0IChweXRob24gMi41LjIp?=
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<200812202155.28024.steve@pearwood.info>
Message-ID: <loom.20081220T193826-31@post.gmane.org>

Steven D'Aprano <steve <at> pearwood.info> writes:
> 
> In November 2007, a similar problem was reported on the comp.lang.python 
> newsgroup. 370MB was large enough to demonstrate the problem. I don't 
> know if a bug was ever reported.

Do you still reproduce it on trunk?
I've tried your scripts on my machine and they work fine, even if I leave
garbage collecting enabled during the process.
(dual core 64-bit machine but in 32-bit mode)




From mal at egenix.com  Sat Dec 20 21:04:32 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 20 Dec 2008 21:04:32 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
Message-ID: <494D4FD0.4020202@egenix.com>

On 2008-12-20 17:57, Mike Coleman wrote:
> On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
> <kristjan at ccpgames.com> wrote:
>> Can you distill the program into something reproducible?
>> Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance?
>> I can try to point our commercial profiling tools at it and see what it is doing.
> 
> I will try next week to see if I can come up with a smaller,
> submittable example.  Thanks.

These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 20 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From leif.walsh at gmail.com  Sat Dec 20 21:20:22 2008
From: leif.walsh at gmail.com (Leif Walsh)
Date: Sat, 20 Dec 2008 15:20:22 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494D4FD0.4020202@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
Message-ID: <cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>

On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.

In that case, the question would be "why is the interpreter collecting
garbage when it knows we're trying to exit anyway?".

-- 
Cheers,
Leif

From fuzzyman at voidspace.org.uk  Sat Dec 20 21:25:42 2008
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 20 Dec 2008 20:25:42 +0000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
Message-ID: <494D54C6.3000500@voidspace.org.uk>

Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>   
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>>     
>
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".
>
>   

Presumably because finalizers are only called when an object is destroyed.

Michael

-- 
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog



From skip at pobox.com  Sat Dec 20 21:26:20 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 20 Dec 2008 14:26:20 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
Message-ID: <18765.21740.137339.943481@montanaro-dyndns-org.local>


    Leif> In that case, the question would be "why is the interpreter
    Leif> collecting garbage when it knows we're trying to exit anyway?".

Because useful side effects are sometimes performed as a result of this
activity (flushing disk buffers, closing database connections, etc).

Skip

From tim.peters at gmail.com  Sat Dec 20 21:34:19 2008
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 20 Dec 2008 15:34:19 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
Message-ID: <1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com>

[M.-A. Lemburg]
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.

[Leif Walsh]
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Because user-defined destructors (like __del__ methods and weakref
callbacks) may be associated with garbage, and users presumably want
those to execute.  Doing so requires identifying garbage
and releasing it, same as if the interpreter didn't happen to be
exiting.

BTW, the original poster should try this:  use whatever tools the OS
supplies to look at CPU and disk usage during the long exit.  What I
/expect/ is that almost no CPU time is being used, while the disk is
grinding itself to dust.  That's what happens when a large number of
objects have been swapped out to disk, and exit processing has to page
them all back into memory again (in order to decrement their
refcounts).  Python's cyclic gc (the `gc` module) has nothing to do
with this -- it's typically the been-there-forever refcount-based
non-cyclic gc that accounts for supernaturally long exit times.

If that is the case here, there's no evident general solution.  If you
have millions of objects still alive at exit, refcount-based
reclamation has to visit all of them, and if they've been swapped out
to disk it can take a very long time to swap them all back into memory
again.
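
To make the distinction between cyclic gc and refcount-based reclamation concrete, here is a minimal sketch. It assumes CPython (other implementations may delay destructors), and the class name Noisy is invented for the example. With the `gc` module disabled, dropping the last reference still runs __del__ right away.

```python
import gc


class Noisy(object):
    """Records its name in a shared log when it is destroyed."""
    log = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Noisy.log.append(self.name)


def destructor_runs_without_cyclic_gc():
    """Disable the cyclic collector, drop the only reference to an object,
    and report which destructors ran."""
    del Noisy.log[:]
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        obj = Noisy("a")
        del obj  # refcount reaches zero; CPython calls __del__ here
        return list(Noisy.log)
    finally:
        if was_enabled:
            gc.enable()
```

So gc.disable() only switches off the search for reference cycles; it does not stop the per-object teardown that has to touch every object at exit.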

From mal at egenix.com  Sat Dec 20 21:50:19 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 20 Dec 2008 21:50:19 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
Message-ID: <494D5A8B.8060000@egenix.com>

On 2008-12-20 21:20, Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
> 
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

It cannot know until the very end, because there may still be
some try: ... except SystemExit: ... somewhere in the code
waiting to trigger and stop the system exit.

If you want a really fast exit, try this:

import os
os.kill(os.getpid(), 9)

But you better know what you're doing if you take this approach...
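
Both points can be tried out safely in a small script. This is just a sketch: the first function shows that SystemExit can be caught like any other exception, and the second runs a child interpreter that leaves via os._exit() (the workaround mentioned earlier in the thread), which skips atexit handlers and object teardown. The function names and the child script are illustrative only.

```python
import subprocess
import sys


def exit_can_be_intercepted():
    """Return True if the SystemExit raised by sys.exit() was caught."""
    try:
        sys.exit(1)
    except SystemExit:
        return True
    return False


CHILD_SCRIPT = r'''
import atexit, os, sys
atexit.register(lambda: sys.stdout.write("cleanup ran\n"))
sys.stdout.write("work done\n")
sys.stdout.flush()  # os._exit() does not flush stdio buffers for us
os._exit(0)         # leave immediately: no atexit handlers, no teardown
'''


def run_child_with_fast_exit():
    """Run CHILD_SCRIPT in a fresh interpreter; return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT], capture_output=True, text=True
    )
    return result.returncode, result.stdout
```

The child prints "work done" but never "cleanup ran", which is exactly the trade-off: the exit is fast because nothing gets a chance to clean up.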

-- 
Marc-Andre Lemburg
eGenix.com


From leif.walsh at gmail.com  Sat Dec 20 22:01:59 2008
From: leif.walsh at gmail.com (Leif Walsh)
Date: Sat, 20 Dec 2008 16:01:59 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <18765.21740.137339.943481@montanaro-dyndns-org.local>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
Message-ID: <cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>

(@Skip, Michael, Tim)

On Sat, Dec 20, 2008 at 3:26 PM,  <skip at pobox.com> wrote:
> Because useful side effects are sometimes performed as a result of this
> activity (flushing disk buffers, closing database connections, etc).

Of course they are.  But what about the case given above:

On Sat, Dec 20, 2008 at 5:55 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> I was originally one of the skeptics until I reproduced the original
> poster's problem. I generated a sample file of 8 million key/value pairs as
> a 370MB text file. Reading it into a dict took two and a half minutes
> on my relatively slow computer. But deleting the dict took more than 30
> minutes even with garbage collection switched off.

It might be a semantic change that I'm looking for here, but it seems
to me that if you turn off the garbage collector, you should be able
to expect that either it also won't run on exit, or it should have a
way of letting you tell it not to run on exit.  If I'm running without
a garbage collector, that assumes I'm at least cocky enough to think I
know when I'm done with my objects, so I should know to delete the
objects that have __del__ functions I care about before I exit.  Well,
maybe; I'm sure one of you could drag out a programmer that would make
that mistake, but turning off the garbage collector to me seems to
send the experience message, at least a little.

Does the garbage collector run any differently when the process is
exiting?  It seems that it wouldn't need to do anything more than run
through all objects in the heap and delete them, which doesn't require
anything fancy, and should be able to sort by address to aid with
caching.  If it's already this fast, then I guess it really is the
sheer number of function calls necessary that are causing such a
slowdown in the cases we've seen, but I find this hard to believe.

-- 
Cheers,
Leif

From tim.peters at gmail.com  Sat Dec 20 22:03:11 2008
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 20 Dec 2008 16:03:11 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <gijann$mrl$1@ger.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org>
	<3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
	<gijann$mrl$1@ger.gmane.org>
Message-ID: <1f7befae0812201303x39b21d00qcda6e897a29371db@mail.gmail.com>

[Mike Coleman]
>> ... Regarding interning, I thought this only worked with strings.

Implementation details.  Recent versions of CPython also, e.g.,
"intern" the empty tuple, and very small integers.

>> Is there some way to intern integers?  I'm probably creating 300M
>> integers more or less uniformly distributed across range(10000)?

Interning would /vastly/ reduce memory use for ints in that case, from
gigabytes down to less than half a megabyte.


[Scott David Daniels]
> held = list(range(10000))
> ...
>    troublesome_dict[string] = held[number_to_hold]
> ...

More generally, but a bit slower, for objects usable as dict keys,
change code of the form:

    x = whatever_you_do_to_get_a_new_object()
    use(x)

to:

    x = whatever_you_do_to_get_a_new_object()
    x = intern_it(x, x)
    use(x)

where `intern_it` is defined like so once at the start of the program:

    intern_it = {}.setdefault

This snippet may make the mechanism clearer:

>>> intern_it = {}.setdefault
>>> x = 3000
>>> id(intern_it(x, x))
36166156
>>> x = 1000 + 2000
>>> id(intern_it(x, x))
36166156
>>> x = "works for computed strings too"
>>> id(intern_it(x, x))
27062696
>>> x = "works for computed strings t" + "o" * 2
>>> id(intern_it(x, x))
27062696
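
To see the effect on the number of live objects, here is a small self-contained sketch of the same trick. The sizes are scaled down and the helper names (build_values, distinct_objects) are invented for the example; it relies on CPython's id() identifying distinct live objects.

```python
def build_values(n_items, n_distinct, use_interning):
    """Create n_items ints drawn from range(n_distinct), optionally
    routing each one through a setdefault-based intern table."""
    intern_it = {}.setdefault
    values = []
    for i in range(n_items):
        # int(str(...)) forces a freshly computed int object each time
        x = int(str(i % n_distinct))
        if use_interning:
            x = intern_it(x, x)  # returns the first object stored for this value
        values.append(x)
    return values


def distinct_objects(values):
    """Count how many separate objects the list actually refers to."""
    return len(set(map(id, values)))


raw = build_values(50000, 10000, use_interning=False)
interned = build_values(50000, 10000, use_interning=True)
print(distinct_objects(raw), distinct_objects(interned))
```

With interning the list refers to exactly one object per distinct value, so there are far fewer objects to allocate and, at exit, far fewer to free.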

From tim.peters at gmail.com  Sat Dec 20 22:11:30 2008
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 20 Dec 2008 16:11:30 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
Message-ID: <1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com>

[Leif Walsh]
> ...
> It might be a semantic change that I'm looking for here, but it seems
> to me that if you turn off the garbage collector, you should be able
> to expect that either it also won't run on exit,

It won't then, but "the garbage collector" is the gc module, and that
only performs /cyclic/ garbage collection.  There is no way to stop
refcount-based garbage collection.  Read my message again.


> or it should have a
> way of letting you tell it not to run on exit.  If I'm running without
> a garbage collector, that assumes I'm at least cocky enough to think I
> know when I'm done with my objects, so I should know to delete the
> objects that have __del__ functions I care about before I exit.  Well,
> maybe; I'm sure one of you could drag out a programmer that would make
> that mistake, but turning off the garbage collector to me seems to
> send the experience message, at least a little.

This probably isn't a problem with cyclic gc (reread my msg).


> Does the garbage collector run any differently when the process is
> exiting?

No.


> It seems that it wouldn't need to do anything more than run
> through all objects in the heap and delete them, which doesn't require
> anything fancy,

Reread my msg -- already explained the likely cause here (if "all the
objects in the heap" have in fact been swapped out to disk, it can
take an enormously long time to just "run through" them all).


> and should be able to sort by address to aid with
> caching.

That one isn't possible.  There is no list of "all objects" to /be/
sorted.  The only way to find all the objects is to traverse the
object graph from its roots, which is exactly what non-cyclic gc does
anyway.


>  If it's already this fast, then I guess it really is the
> sheer number of function calls necessary that are causing such a
> slowdown in the cases we've seen, but I find this hard to believe.

My guess remains that CPU usage is trivial here, and 99.99+% of the
wall-clock time is consumed waiting for disk reads.  Either that, or
that platform malloc is going nuts.

From solipsis at pitrou.net  Sat Dec 20 22:13:11 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Dec 2008 21:13:11 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?extremely_slow_exit_for_program_having_hug?=
	=?utf-8?b?ZSAoNDVHKQlkaWN0IChweXRob24gMi41LjIp?=
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
Message-ID: <loom.20081220T210531-211@post.gmane.org>

Leif Walsh <leif.walsh <at> gmail.com> writes:
> 
> It might be a semantic change that I'm looking for here, but it seems
> to me that if you turn off the garbage collector, you should be able
> to expect that either it also won't run on exit, or it should have a
> way of letting you tell it not to run on exit. 
[...]

I'm skeptical that it's a garbage collector problem. The script creates one dict
containing lots of strings and ints. The thing is, strings and ints aren't
tracked by the GC as they are simple atomic objects. Therefore, the /only/
object created by the script which is tracked by the GC is the dict. Moreover,
since there is no cycle created, the dict should be directly destroyed when its
last reference dies (the "del" statement), not go through the garbage collection
process.
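
On newer Python versions the claim about atomic objects can be checked directly with gc.is_tracked(). Note that this function did not exist in the 2.5 interpreter discussed in this thread, so the sketch below assumes a later release; the sample values are arbitrary.

```python
import gc


def tracked_report():
    """Report which sample objects the cyclic collector keeps track of."""
    samples = {
        "int": 300000,
        "str": "spam",
        "list of pairs": [("a", 1)],
    }
    return dict((label, gc.is_tracked(obj)) for label, obj in samples.items())


print(tracked_report())
```

Ints and strings cannot take part in reference cycles, so the cyclic collector ignores them; containers such as lists are tracked.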

Given that the problem is reproduced on certain systems and not others, it may
be related to an interaction between allocation patterns of the dict
implementation, the Python memory allocator, and the implementation of the C
malloc() / free() functions. I'm not expert enough to find out more on the
subject.



From fabiofz at gmail.com  Sat Dec 20 22:45:18 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Sat, 20 Dec 2008 19:45:18 -0200
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <ca471dc20812191503w3475ac17sb430c099cff62457@mail.gmail.com>
References: <0016e64f68207a52a5045e6de625@google.com>
	<loom.20081219T224445-192@post.gmane.org>
	<ca471dc20812191503w3475ac17sb430c099cff62457@mail.gmail.com>
Message-ID: <cfb578b20812201345l331f918g3f654d3478f4c21b@mail.gmail.com>

It appears that this bug was already reported: http://bugs.python.org/issue4705

Any chance that it gets into the next 3.0.x bugfix release?

Just as a note, if I do: sys.stdout._line_buffering = True, it also
works, but doesn't seem right as it's accessing an internal attribute.

Note 2: the suggested solution of passing 'wb' does not work for me, because I
need the output as text, not binary; otherwise non-ASCII text becomes
garbled.
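
For reference, line buffering can be requested through the public constructor rather than the private attribute. The following is only a sketch using an in-memory stream so the effect is visible; on a real program one would presumably wrap sys.stdout.buffer instead, which is an assumption about the use case and not something tested here.

```python
import io


def line_buffered_demo():
    """Write through a line-buffered text wrapper and record what has
    reached the underlying raw stream after each write."""
    raw = io.BytesIO()
    buffered = io.BufferedWriter(raw)
    text = io.TextIOWrapper(
        buffered, encoding="utf-8", newline="\n", line_buffering=True
    )
    text.write("no newline yet")
    before = raw.getvalue()  # nothing flushed so far
    text.write("\n")         # the newline triggers a flush
    after = raw.getvalue()
    return before, after


print(line_buffered_demo())
```

Output stays text (so non-ASCII characters are encoded properly) while each completed line is pushed down to the raw stream.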

Thanks,

Fabio

On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum <guido at python.org> wrote:
> For truly unbuffered text output you'd have to make changes to the
> io.TextIOWrapper class to flush after each write() call. That's an API
> change -- the constructor currently has a line_buffering option but no
> option for completely unbuffered mode. It would also require some
> changes to io.open() which currently rejects buffering=0 in text mode.
> All that suggests that it should wait until 3.1.
>
> However it might make sense to at least turn on line buffering when -u
> or PYTHONUNBUFFERED is given; that doesn't require API changes and so
> can be considered a bug fix.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>>> Well, ``python -h`` still lists it.
>>
>> Precisely, it says:
>>
>> -u     : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
>>         see man page for details on internal buffering relating to '-u'
>>
>> Note the "binary". And indeed:
>>
>> ./python -u
>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54)
>> [GCC 4.3.2] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> import sys
>>>>> sys.stdout.buffer.write(b"y")
>> y1
>>>>>
>>
>> I don't know what it would take to enable unbuffered text IO while keeping the
>> current TextIOWrapper implementation...
>>
>> Regards
>>
>> Antoine.
>>
>>

From martin at v.loewis.de  Sat Dec 20 22:55:30 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 20 Dec 2008 22:55:30 +0100
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>
Message-ID: <494D69D2.5090601@v.loewis.de>

> Any opinions?

I would use a different marshal implementation. Instead of defining
a stream format for marshal, make marshal dump its graph of objects
along with the actual memory layout. On load, copying can
be avoided; just a few pointers need to be updated. The resulting
marshal files would be platform-specific (wrt. endianness and pointer
width).

On marshaling, you copy all objects into a contiguous block
of memory (8-aligned), and dump that. On unmarshaling, you just
map that block. If the target supports true memory mapping with
page boundaries, you might be able to store multiple .pyc files
into a single page. This reformatting could be done offline
also.

A few things need to be considered:
- compatibility. The original marshal code would probably
  need to be preserved for the "marshal" module.
- relative pointers. Code objects, tuples, etc. contain
  pointers. Assuming the marshaled object cannot be loaded
  back into the same address, you need to adjust pointers.
  A common trick is to put a desired load address into the
  memory block, then try to load into that address. If the
  address is already taken, load into a different address,
  and walk through all objects, adjusting pointers.
- type references. On loading, you will need to patch all
  ob_type fields. Put the marshal codes into the ob_type
  field on marshalling, then switch on unmarshalling.
- references to interned strings. On loading, you can
  either intern them all, or you have a "fast interning"
  algorithm that assigns a fixed table of interned-string
  numbers.
- reference counting. Make sure all these objects start
  out with a reference count of 1, so they will never
  become garbage.
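
The "desired load address" trick from the relative-pointers item can be
sketched in a few lines of Python. The image layout here (a header holding
the preferred base address and a slot count, followed by 8-byte slots) is
purely an assumption for illustration and is not CPython's marshal format;
dump_image and load_image are hypothetical names.

```python
import struct

# Hypothetical image layout (an assumption, not CPython's format):
#   8 bytes: preferred load address
#   8 bytes: number of pointer slots N
#   N * 8 bytes: absolute "pointers", valid only at the preferred address
HEADER = struct.Struct("<QQ")
SLOT = struct.Struct("<Q")


def dump_image(preferred_base, offsets):
    """Serialize each offset as an absolute address at preferred_base."""
    out = bytearray(HEADER.pack(preferred_base, len(offsets)))
    for off in offsets:
        out += SLOT.pack(preferred_base + off)
    return bytes(out)


def load_image(image, actual_base):
    """Return pointer values that are valid at actual_base."""
    preferred_base, count = HEADER.unpack_from(image, 0)
    # delta is 0 when the image landed at its preferred address,
    # in which case the adjustment below changes nothing.
    delta = actual_base - preferred_base
    pointers = []
    for i in range(count):
        (ptr,) = SLOT.unpack_from(image, HEADER.size + i * SLOT.size)
        pointers.append(ptr + delta)
    return pointers


image = dump_image(0x10000000, [16, 48, 96])
same = load_image(image, 0x10000000)    # loaded where requested
moved = load_image(image, 0x20000000)   # address taken, so relocated
```

A real loader would skip the walk entirely when the mapping succeeds at the
preferred address; the loop is kept unconditional here only for brevity.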

If you use a container file for multiple .pyc files,
you can have additional savings by sharing strings
across modules; this should help in particular for
reference to builtin symbols, and for common method
names. A fixed interning might become unnecessary as
the unique single string object in the container will
either become the interned string itself, or point to
it after being interned once.
With such a container system, unmarshalling should be
lazy; e.g. for each object, the value of ob_type can
be used to determine whether the object was
unmarshalled.
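
A rough sketch of the string sharing described above, assuming a container
that stores every distinct string once and lets each module refer to it by
index. The function name and the input shape are hypothetical; nothing like
this exists in the marshal module.

```python
def build_container(modules):
    """modules maps a module name to the list of names it references."""
    table = []   # each distinct string is stored exactly once
    index = {}   # string -> position in table
    refs = {}    # module name -> list of positions in table
    for mod, names in modules.items():
        ids = []
        for name in names:
            if name not in index:
                index[name] = len(table)
                table.append(name)
            ids.append(index[name])
        refs[mod] = ids
    return table, refs


table, refs = build_container({
    "a": ["len", "append", "self"],
    "b": ["self", "len", "join"],
})
```

Six references collapse into four stored strings here; the saving would
presumably grow with the number of modules that share builtin and method
names.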

Of course, you still have the actual interpretation of
the top-level module code - if it's not the marshalling
but this part that actually costs performance, this
efficient marshalling algorithm won't help. It would be
interesting to find out which modules have a particularly
high startup cost - perhaps they can be rewritten.
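
One way to look for expensive modules is to time each import in a fresh
interpreter. This is only a sketch: import_seconds is a hypothetical helper,
the two module names are arbitrary picks, and a module that the interpreter
already loads at startup will show up as nearly free. Newer interpreters
(3.7 and later) also offer "python -X importtime" for the same purpose.

```python
import subprocess
import sys


def import_seconds(module_name):
    """Time a single import in a fresh interpreter, in seconds."""
    code = (
        "import time; t = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - t)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


# Arbitrary example modules; substitute the ones your application imports.
timings = {name: import_seconds(name) for name in ("json", "decimal")}
```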

Regards,
Martin

From tutufan at gmail.com  Sat Dec 20 23:06:02 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 16:06:02 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494D5A8B.8060000@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<494D5A8B.8060000@egenix.com>
Message-ID: <3c6c07c20812201406j198acad7y8e04bae80324be0a@mail.gmail.com>

On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> If you want a really fast exit, try this:
>
> import os
> os.kill(os.getpid(), 9)
>
> But you better know what you're doing if you take this approach...

This would work, but I think os._exit(os.EX_OK) is probably just as fast,
and allows you to control the exit status...
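
A small sketch of the difference, run in child processes so the current
interpreter is not killed. The exit status 3 is an arbitrary choice
(os.EX_OK exists only on Unix), and the child script is made up for the
demonstration.

```python
import subprocess
import sys

CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran"))
print("working")
sys.stdout.flush()  # os._exit() does not flush stdio buffers
if sys.argv[1] == "fast":
    os._exit(3)     # skips atexit handlers and interpreter teardown
sys.exit(3)         # normal exit: handlers run, objects are finalized
"""


def run(mode):
    """Run the child in the given mode; return (status, output lines)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout.splitlines()


fast = run("fast")
normal = run("normal")
```

Both children report the chosen status, but only the normal exit runs the
registered handler.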

From martin at v.loewis.de  Sat Dec 20 23:16:09 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 20 Dec 2008 23:16:09 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <494D4FD0.4020202@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
Message-ID: <494D6EA9.2040201@v.loewis.de>

>> I will try next week to see if I can come up with a smaller,
>> submittable example.  Thanks.
> 
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.

I doubt that. The long exit times are usually caused by a bad
malloc implementation.

Regards,
Martin


From arfrever.fta at gmail.com  Sat Dec 20 23:28:18 2008
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Sat, 20 Dec 2008 23:28:18 +0100
Subject: [Python-Dev] 2.6.1 documentation not available for download
Message-ID: <200812202328.20045.Arfrever.FTA@gmail.com>

Python 2.6.1 documentation currently isn't available for download at:
http://docs.python.org/ftp/python/doc/

Additionally please include version numbers in documentation
archives (e.g. python-docs-html-2.6.1.tar.bz2).

-- 
Arfrever Frehtes Taifersar Arahesis

From steve at holdenweb.com  Sat Dec 20 23:37:12 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 20 Dec 2008 17:37:12 -0500
Subject: [Python-Dev] 2.6.1 license
Message-ID: <gijs2p$61a$2@ger.gmane.org>

It might be helpful if

  http://www.python.org/download/releases/2.6.1/license/

said it was also the official license for the 2.6.1 release (though I
don't suppose it matters that it's still called the 2.5 license, since
that's its origin).

Another detail to go into the release management PEP?

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From steve at holdenweb.com  Sat Dec 20 23:44:28 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 20 Dec 2008 17:44:28 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <loom.20081220T210531-211@post.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>	<18765.21740.137339.943481@montanaro-dyndns-org.local>	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<loom.20081220T210531-211@post.gmane.org>
Message-ID: <494D754C.1050109@holdenweb.com>

Antoine Pitrou wrote:
> Leif Walsh <leif.walsh <at> gmail.com> writes:
>> It might be a semantic change that I'm looking for here, but it seems
>> to me that if you turn off the garbage collector, you should be able
>> to expect that either it also won't run on exit, or it should have a
>> way of letting you tell it not to run on exit. 
> [...]
> 
> I'm skeptical that it's a garbage collector problem. The script creates one dict
> containing lots of strings and ints. The thing is, strings and ints aren't
> tracked by the GC as they are simple atomic objects. Therefore, the /only/
> object created by the script which is tracked by the GC is the dict. Moreover,
> since there is no cycle created, the dict should be directly destroyed when its
> last reference dies (the "del" statement), not go through the garbage collection
> process.
> 
> Given that the problem is reproduced on certain systems and not others, it can
> be related to an interaction between allocation patterns of the dict
> implementation, the Python memory allocator, and the implementation of the C
> malloc() / free() functions. I'm not expert enough to find out more on the
> subject.
> 
I believe the OP engendered a certain amount of confusion by describing
object deallocation as being performed by the garbage collector. So he
perhaps didn't understand that even decref'ing all the objects only
referenced by the dict will take a huge amount of time unless there's
enough real memory to hold it.
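
The distinction can be seen directly in CPython: with the cyclic collector
switched off, dropping the last reference still frees everything the dict
owned, immediately, through reference counting. The Tracer class is a
made-up helper, and the immediate timing is a CPython implementation detail
rather than a language guarantee.

```python
import gc

events = []


class Tracer:
    """Hypothetical helper that records when an instance is destroyed."""

    def __del__(self):
        events.append("freed")


gc.disable()                 # turns off only the cyclic collector
try:
    d = {"key": [Tracer(), Tracer()]}
    del d                    # last reference gone, refcounts drop to zero
    freed_immediately = len(events)
finally:
    gc.enable()
```

Every object reachable only from the dict gets decref'ed and deallocated
during the del statement, which is the per-object work that adds up at exit.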

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From musiccomposition at gmail.com  Sat Dec 20 23:46:15 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sat, 20 Dec 2008 16:46:15 -0600
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <200812202328.20045.Arfrever.FTA@gmail.com>
References: <200812202328.20045.Arfrever.FTA@gmail.com>
Message-ID: <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>

On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis
<arfrever.fta at gmail.com> wrote:
> Python 2.6.1 documentation currently isn't available for download at:
> http://docs.python.org/ftp/python/doc/

It is available here, though:

http://www.python.org/ftp/python/doc/current/

>
> Additionally please include version numbers in documentation
> archives (e.g. python-docs-html-2.6.1.tar.bz2).

>



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From musiccomposition at gmail.com  Sat Dec 20 23:56:46 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sat, 20 Dec 2008 16:56:46 -0600
Subject: [Python-Dev] 2.6.1 license
In-Reply-To: <gijs2p$61a$2@ger.gmane.org>
References: <gijs2p$61a$2@ger.gmane.org>
Message-ID: <1afaf6160812201456kf192bf6r389fd6896bfb4fbd@mail.gmail.com>

On Sat, Dec 20, 2008 at 4:37 PM, Steve Holden <steve at holdenweb.com> wrote:
> It might be helpful if
>
>  http://www.python.org/download/releases/2.6.1/license/
>
> said it was also the official license for the 2.6.1 release (though I
> don't suppose it matters that it's still called the 2.5 license, since
> that's its origin).

I've updated the website and the PEP.



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From arfrever.fta at gmail.com  Sun Dec 21 00:02:05 2008
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Sun, 21 Dec 2008 00:02:05 +0100
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>
References: <200812202328.20045.Arfrever.FTA@gmail.com>
	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>
Message-ID: <200812210002.05587.Arfrever.FTA@gmail.com>

2008-12-20 23:46:15 Benjamin Peterson wrote:
> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis
> <arfrever.fta at gmail.com> wrote:
> > Python 2.6.1 documentation currently isn't available for download at:
> > http://docs.python.org/ftp/python/doc/
> 
> It is available here, though:
> 
> http://www.python.org/ftp/python/doc/current/

I need documentation created from the 'r261' tag, not from the HEAD of
the 'release26-maint' branch.

-- 
Arfrever Frehtes Taifersar Arahesis

From brett at python.org  Sun Dec 21 00:15:22 2008
From: brett at python.org (Brett Cannon)
Date: Sat, 20 Dec 2008 15:15:22 -0800
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <cfb578b20812201345l331f918g3f654d3478f4c21b@mail.gmail.com>
References: <0016e64f68207a52a5045e6de625@google.com>
	<loom.20081219T224445-192@post.gmane.org>
	<ca471dc20812191503w3475ac17sb430c099cff62457@mail.gmail.com>
	<cfb578b20812201345l331f918g3f654d3478f4c21b@mail.gmail.com>
Message-ID: <bbaeab100812201515x7ab7f261r542f11efe9fb82c9@mail.gmail.com>

On Sat, Dec 20, 2008 at 13:45, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> It appears that this bug was already reported: http://bugs.python.org/issue4705
>
> Any chance that it gets in the next 3.0.x bugfix release?
>
> Just as a note, if I do: sys.stdout._line_buffering = True, it also
> works, but doesn't seem right as it's accessing an internal attribute.
>
> Note 2: the solution that said to pass 'wb' does not work, because I
> need the output as text and not binary or text becomes garbled when
> it's not ascii.
>

Can't you decode the bytes after you receive them?
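
A sketch of that approach, assuming the child writes UTF-8 to the binary
layer of its stdout and the parent knows the encoding. The child script is
made up for the demonstration.

```python
import subprocess
import sys

# The child writes UTF-8 bytes straight to the binary layer of stdout.
CHILD = 'import sys; sys.stdout.buffer.write("caf\\u00e9\\n".encode("utf-8"))'

proc = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                        stdout=subprocess.PIPE)
raw = proc.stdout.readline()     # bytes, one line as sent by the child
proc.stdout.close()
proc.wait()
text = raw.decode("utf-8")       # decode on the receiving side
```

Reading whole lines keeps multi-byte characters intact; a reader that pulls
arbitrary chunks would need an incremental decoder instead.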

-Brett

> Thanks,
>
> Fabio
>
> On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum <guido at python.org> wrote:
>> For truly unbuffered text output you'd have to make changes to the
>> io.TextIOWrapper class to flush after each write() call. That's an API
>> change -- the constructor currently has a line_buffering option but no
>> option for completely unbuffered mode. It would also require some
>> changes to io.open() which currently rejects buffering=0 in text mode.
>> All that suggests that it should wait until 3.1.
>>
>> However it might make sense to at least turn on line buffering when -u
>> or PYTHONUNBUFFERED is given; that doesn't require API changes and so
>> can be considered a bug fix.
>>
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>>
>>
>> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>>
>>>> Well, ``python -h`` still lists it.
>>>
>>> Precisely, it says:
>>>
>>> -u     : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
>>>         see man page for details on internal buffering relating to '-u'
>>>
>>> Note the "binary". And indeed:
>>>
>>> ./python -u
>>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54)
>>> [GCC 4.3.2] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> import sys
>>>>>> sys.stdout.buffer.write(b"y")
>>> y1
>>>>>>
>>>
>>> I don't know what it would take to enable unbuffered text IO while keeping the
>>> current TextIOWrapper implementation...
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>

From solipsis at pitrou.net  Sun Dec 21 00:25:11 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 20 Dec 2008 23:25:11 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>	<18765.21740.137339.943481@montanaro-dyndns-org.local>	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<loom.20081220T210531-211@post.gmane.org>
	<494D754C.1050109@holdenweb.com>
Message-ID: <loom.20081220T232401-433@post.gmane.org>

Steve Holden <steve <at> holdenweb.com> writes:
> I believe the OP engendered a certain amount of confusion by describing
> object deallocation as being performed by the garbage collector. So he
> perhaps didn't understand that even decref'ing all the objects only
> referenced by the dict will take a huge amount of time unless there's
> enough real memory to hold it.

He said he has 64GB RAM so I assume all his working set was in memory, not
swapped out.




From alexandre at peadrop.com  Sun Dec 21 00:28:23 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sat, 20 Dec 2008 18:28:23 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
Message-ID: <acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>

On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman <tutufan at gmail.com> wrote:
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string, int).)
>  Nothing but possibly the strings and ints is shared.
>



> That is, after executing the final statement (a print), it is apparently spending a
> huge amount of time cleaning up before exiting.


> I have done 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.

From alexandre at peadrop.com  Sun Dec 21 00:40:25 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sat, 20 Dec 2008 18:40:25 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
Message-ID: <acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>

[Sorry for the previous garbage post.]

> On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman <tutufan at gmail.com> wrote:
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string, int).)
> Nothing but possibly the strings and ints is shared.

Could you give us more information about the dictionary? For example,
how many objects does it contain? Is 45GB the actual size of the
dictionary or of the Python process?

> That is, after executing the final statement (a print), it is apparently
> spending a huge amount of time cleaning up before exiting.

Most of this time is probably spent on DECREF'ing objects in the
dictionary. As others mentioned, it would be useful to have a
self-contained example to examine the behavior more closely.

> I have done 'gc.disable()' for performance (which is hideous without it)--I
> have no reason to think there are any loops.

Have you seen any significant difference in the exit time when the
cyclic GC is disabled or enabled?

-- Alexandre

From ncoghlan at gmail.com  Sun Dec 21 01:14:44 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 21 Dec 2008 10:14:44 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com>
Message-ID: <494D8A74.9050306@gmail.com>

Tim Peters wrote:
> If that is the case here, there's no evident general solution.  If you
> have millions of objects still alive at exit, refcount-based
> reclamation has to visit all of them, and if they've been swapped out
> to disk it can take a very long time to swap them all back into memory
> again.

In that case, it sounds like using os._exit() to get out of the program
without visiting all that memory *is* the right answer (or as right an
answer as is available at least).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From andrew-pythondev at puzzling.org  Sun Dec 21 01:15:30 2008
From: andrew-pythondev at puzzling.org (Andrew Bennetts)
Date: Sun, 21 Dec 2008 11:15:30 +1100
Subject: [Python-Dev] extremely slow exit for program having huge
	(45G)	dict (python 2.5.2)
In-Reply-To: <18765.5492.200918.790182@montanaro-dyndns-org.local>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494CC400.7070404@bullseye.andymac.org>
	<giir4b$90m$1@ger.gmane.org>
	<18765.5492.200918.790182@montanaro-dyndns-org.local>
Message-ID: <20081221001530.GA32606@steerpike.home.puzzling.org>

skip at pobox.com wrote:
> 
>     Steve> Unfortunately there are doubtless programs out there that do rely
>     Steve> on actions being taken at shutdown.
> 
> Indeed.  I believe any code which calls atexit.register.
> 
>     Steve> Maybe os.exit() could be more widely advertised, though ...
> 
> That would be os._exit().  Calling it avoids calls to exit functions
> registered with atexit.register().  I believe it is both safe, and
> reasonable programming practice for modules to register exit functions.
> Both the logging and multiprocessing modules call it.  It's incumbent on the
> application programmer to know these details of the modules the app uses
> (perhaps indirectly) to know whether or not it's safe/wise to call
> os._exit().

You could call sys.exitfunc() just before os._exit().
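
In Python 3, sys.exitfunc no longer exists. A sketch of the same idea there,
assuming the private CPython helper atexit._run_exitfuncs() is acceptable
(it is undocumented and may change), run in a child process:

```python
import subprocess
import sys

CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran"))
atexit._run_exitfuncs()   # private CPython helper, stands in for sys.exitfunc()
sys.stdout.flush()        # os._exit() will not flush for us
os._exit(0)               # then leave without the normal teardown
"""

proc = subprocess.run([sys.executable, "-c", CHILD],
                      capture_output=True, text=True)
lines = proc.stdout.splitlines()
```

The handlers run once, and the process still avoids the rest of the normal
interpreter shutdown.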

-Andrew.


From ncoghlan at gmail.com  Sun Dec 21 01:28:25 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 21 Dec 2008 10:28:25 +1000
Subject: [Python-Dev] Call PyType_Ready on builtin
 types	during	interpreter startup?
In-Reply-To: <20081220170154.GA28166@panix.com>
References: <494C1CE4.5080102@gmail.com> <20081220170154.GA28166@panix.com>
Message-ID: <494D8DA9.6010307@gmail.com>

Aahz wrote:
> On Sat, Dec 20, 2008, Nick Coghlan wrote:
>> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
>> the builtin types - they're left to have it called implicitly when an
>> operation using them needs tp_dict filled in.
> 
> This seems like a release blocker for 3.0.1 to me

The problem isn't actually as bad as I first thought (it turns out most
of the builtin types *are* fully initialised in _Py_ReadyTypes, which is
called from Py_InitializeEx). However, xrange/range are definitely
missing from that function (which is the actual proximate cause of the
strange range() hashing  behaviour in Py3k), and I'm still hoping
someone knows why the numeric types aren't being readied there when
certain parts of the core need additional handling to cope with the
possibility that those types aren't fully initialised (e.g.
PyObject_Format has a lazy call to PyType_Ready with a comment noting
that it may be asked to format floating point numbers before
PyType_Ready has otherwise been called for the float type).

That said, I have still added the range() hashing problem to the list of
release blockers.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From tutufan at gmail.com  Sun Dec 21 01:05:19 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 18:05:19 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
Message-ID: <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>

Tim, I left out some details that I believe probably rule out the
"swapped out" theory.  The machine in question has 64GB RAM, but only
16GB swap.  I'd prefer more swap, but in any case only around ~400MB
of the swap was actually in use during my program's entire run.
Furthermore, during my program's exit, it was using 100% CPU, and I'm
95% sure there was no significant "system" or "wait" CPU time for the
system.  (All observations via 'top'.)  So, I think that the problem
is entirely a computational one within this process.

The system does have 8 CPUs.  I'm not sure about its memory
architecture, but if it's some kind of NUMA box, I guess access to
memory could be slower than what we'd normally expect.  I'm skeptical
about that being a significant factor here, though.

Just to clarify, I didn't gc.disable() to address this problem, but
rather because it destroys performance during the creation of the huge
dict.  I don't have a specific number, but I think disabling gc
reduced construction from something like 70 minutes to 5 (or maybe
10).  Quite dramatic.
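
A minimal sketch of pausing the cyclic collector only while the dict is
built and restoring the previous setting afterwards. build_index and the
record layout are hypothetical and not taken from the actual program.

```python
import gc
from collections import defaultdict


def build_index(records):
    """Build a large dict with the cyclic collector paused."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        index = defaultdict(list)
        for key, name, score in records:
            index[key].append((name, score))
        return index
    finally:
        if was_enabled:
            gc.enable()   # restore the caller's setting


index = build_index([
    ("ABCDEFG", "p1", 7),
    ("ABCDEFG", "p2", 9),
    ("HIJKLMN", "p1", 3),
])
```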

Mike


>From Tim Peters:
BTW, the original poster should try this:  use whatever tools the OS
supplies to look at CPU and disk usage during the long exit.  What I
/expect/ is that almost no CPU time is being used, while the disk is
grinding itself to dust.  That's what happens when a large number of
objects have been swapped out to disk, and exit processing has to page
them all back into memory again (in order to decrement their
refcounts).

From tutufan at gmail.com  Sun Dec 21 01:22:40 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 18:22:40 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>
Message-ID: <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com>

Re "held" and "intern_it":  Haha!  That's evil and extremely evil,
respectively.  :-)

I will add these to the Python wiki if they're not already there...

Mike

From leif.walsh at gmail.com  Sun Dec 21 01:34:35 2008
From: leif.walsh at gmail.com (Leif Walsh)
Date: Sat, 20 Dec 2008 19:34:35 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com>
Message-ID: <cc7430500812201634lb2adc35y690fd5c2ed59484a@mail.gmail.com>

On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters <tim.peters at gmail.com> wrote:
> [Lots of answers]

Thanks.  Wish I could have offered something useful.

-- 
Cheers,
Leif

From solipsis at pitrou.net  Sun Dec 21 01:35:40 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 21 Dec 2008 00:35:40 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>
Message-ID: <loom.20081221T003354-848@post.gmane.org>

Mike Coleman <tutufan <at> gmail.com> writes:
> 
> Just to clarify, I didn't gc.disable() to address this problem, but
> rather because it destroys performance during the creation of the huge
> dict.  I don't have a specific number, but I think disabling gc
> reduced construction from something like 70 minutes to 5 (or maybe
> 10).  Quite dramatic.

There's a pending patch which should fix that problem:
http://bugs.python.org/issue4074

Regards

Antoine.



From tutufan at gmail.com  Sun Dec 21 02:09:00 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Sat, 20 Dec 2008 19:09:00 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
Message-ID: <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>

On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
<alexandre at peadrop.com> wrote:
> Could you give us more information about the dictionary. For example,
> how many objects does it contain? Is 45GB the actual size of the
> dictionary or of the Python process?

The 45G was the VM size of the process (resident size was similar).

The dict keys were all uppercase alpha strings of length 7.  I don't
have access at the moment, but maybe something like 10-100M of them
(not sure how redundant the set is).  The values are all lists of
pairs, where each pair is a (string, int).  The pair strings are of
length around 30, and drawn from a "small" fixed set of around 60K
strings ().  As mentioned previously, I think the ints are drawn
pretty uniformly from something like range(10000).  The length of the
lists depends on the redundancy of the key set, but I think there are
around 100-200M pairs total, for the entire dict.

(If you're curious about the application domain, see 'http://greylag.org'.)

> Have you seen any significant difference in the exit time when the
> cyclic GC is disabled or enabled?

Unfortunately, with GC enabled, the application is too slow to be
useful, because of the greatly increased time for dict creation.  I
suppose it's theoretically possible that with this increased time, the
long time for exit will look less bad by comparison, but I'd be
surprised if it makes any difference at all.  I'm confident that there
are no loops in this dict, and nothing for cyclic gc to collect.
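
For illustration, here is a minimal sketch of building a large container with the cyclic GC paused, written for current Python 3 rather than 2.5.2. The key format, the pair contents and the helper name build_without_gc are hypothetical stand-ins, not the real application's data.

```python
import gc
import time


def build_without_gc(n_keys):
    """Build a dict of lists of (str, int) pairs with the cyclic GC paused."""
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic collections are triggered while we allocate
    try:
        table = {}
        for i in range(n_keys):
            # Hypothetical stand-in data; the real keys and pairs differ.
            table["K%06d" % i] = [("name-%d" % (i % 60), i % 10000)]
        return table
    finally:
        if was_enabled:
            gc.enable()  # restore the previous setting even on error


start = time.time()
table = build_without_gc(100000)
print("built %d entries in %.3f s" % (len(table), time.time() - start))
```

Restoring the collector in a finally block keeps the rest of the program unaffected if construction fails part way through.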

Mike

From solipsis at pitrou.net  Sun Dec 21 02:18:52 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 21 Dec 2008 01:18:52 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
Message-ID: <loom.20081221T011745-512@post.gmane.org>

Mike Coleman <tutufan <at> gmail.com> writes:
> 
> The 45G was the VM size of the process (resident size was similar).

Can you reproduce it with a smaller working set? Something between 1 and 2GB,
possibly randomly-generated, and post both the generation script and the
problematic script on the bug tracker?
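
A generation script along these lines might be a starting point. It is only a sketch under assumptions taken from the description earlier in the thread (7-letter uppercase keys, lists of (string, int) pairs, strings of length 30 from a fixed pool, ints from range(10000)). The function name make_dataset, the pool size and the counts are made up and would need scaling to reach 1-2GB. It is written for current Python 3.

```python
import random
import string


def make_dataset(n_keys, n_pairs, pool_size=1000, seed=0):
    """Return a dict mapping 7-letter keys to lists of (str, int) pairs."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    # Small fixed pool of 30-character strings shared by all pairs.
    pool = ["".join(rng.choice(string.ascii_lowercase) for _ in range(30))
            for _ in range(pool_size)]
    keys = ["".join(rng.choice(string.ascii_uppercase) for _ in range(7))
            for _ in range(n_keys)]
    table = {}
    for _ in range(n_pairs):
        key = rng.choice(keys)
        pair = (rng.choice(pool), rng.randrange(10000))
        table.setdefault(key, []).append(pair)
    return table


data = make_dataset(n_keys=10000, n_pairs=20000)
print(len(data), "keys,", sum(len(v) for v in data.values()), "pairs")
```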




From musiccomposition at gmail.com  Sun Dec 21 04:25:16 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sat, 20 Dec 2008 21:25:16 -0600
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <200812210002.05587.Arfrever.FTA@gmail.com>
References: <200812202328.20045.Arfrever.FTA@gmail.com>
	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>
	<200812210002.05587.Arfrever.FTA@gmail.com>
Message-ID: <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>

On Sat, Dec 20, 2008 at 5:02 PM, Arfrever Frehtes Taifersar Arahesis
<arfrever.fta at gmail.com> wrote:
> 2008-12-20 23:46:15 Benjamin Peterson wrote:
>> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis
>> <arfrever.fta at gmail.com> wrote:
>> > Python 2.6.1 documentation currently isn't available for download at:
>> > http://docs.python.org/ftp/python/doc/
>>
>> It is available here, though:
>>
>> http://www.python.org/ftp/python/doc/current/
>
> I need documentation created from the 'r261' tag, not from the HEAD of
> the 'release26-maint' branch.

I've made documentation for 2.6.1 now. It's at
http://www.python.org/ftp/python/doc/2.6.1



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From jeremy at alum.mit.edu  Sun Dec 21 05:21:59 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Sat, 20 Dec 2008 23:21:59 -0500
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
Message-ID: <e8bf7a530812202021t2bde652bk49b81283d23bc2d@mail.gmail.com>

4631 should be a release blocker.  I'll have a bit of time on Monday
and Tuesday to wrap it up.

Jeremy

On Fri, Dec 19, 2008 at 5:28 PM, Barry Warsaw <barry at python.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'd like to get Python 3.0.1 out before the end of the year.  There are no
> showstoppers, but I haven't yet looked at the deferred blockers or the
> buildbots.
>
> Do you think we can get 3.0.1 out on December 24th?  Or should we wait until
> after Christmas and get it out, say on the 29th?  Do we need an rc?
>
> This question goes mostly to Martin and Georg.  What would work for you
> guys?
>
> - -Barry
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Darwin)
>
> iQCVAwUBSUwgEXEjvBPtnXfVAQIthgP7BDS6xfBHhADKc50ANvZ5aAfWhGSU9GH/
> DR+IRduVmvosu9gm92hupCOaLCN4IbtyFx27A8LQuPNVc4BVrhWfDKDSzpxO2MJu
> xLJntkF2BRWODSbdrLGdZ6H6WDT0ZAhn6ZjlWXwxhGxQ5FwEJb7moMuY7jAIEeor
> 5n6Ag5zT+e8=
> =oU/g
> -----END PGP SIGNATURE-----
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu
>

From andymac at bullseye.apana.org.au  Sun Dec 21 07:30:29 2008
From: andymac at bullseye.apana.org.au (Andrew MacIntyre)
Date: Sun, 21 Dec 2008 17:30:29 +1100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	
	<494CC400.7070404@bullseye.andymac.org>
	<3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
Message-ID: <494DE285.6040301@bullseye.andymac.org>

Mike Coleman wrote:
> Andrew, this is on an (intel) x86_64 box with 64GB of RAM.  I don't
> recall the maker or details of the architecture off the top of my
> head, but it would be something "off the rack" from Dell or maybe HP.
> There were other users on the box at the time, but nothing heavy or
> that gave me any reason to think was affecting my program.
> 
> It's running CentOS 5 I think, so that might make glibc several years
> old.  Your malloc idea sounds plausible to me.  If it is a libc
> problem, it would be nice if there was some way we could tell malloc
> to "live for today because there is no tomorrow" in the terminal phase
> of the program.
> 
> I'm not sure exactly how to attack this.  Callgrind is cool, but there's
> no way it will work on something this size.  Timed ltrace output might be
> interesting.  Or maybe a gprof'ed Python, though that's more work.

Some malloc()s (notably FreeBSD's) can be externally tuned at runtime
via options in environment variables or other mechanisms - the malloc
man page on your system might be helpful if your platform has something
like this.

It is likely that PyMalloc would be better with a way to disable the
free()ing of empty arenas, or move to an arrangement where (like the
various type free-lists in 2.6+) explicit action can force pruning of
empty arenas - there are other usage patterns than yours which would
benefit (performance wise) from not freeing arenas automatically.

-- 
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia

From yinon.me at gmail.com  Sun Dec 21 10:19:39 2008
From: yinon.me at gmail.com (Yinon Ehrlich)
Date: Sun, 21 Dec 2008 11:19:39 +0200
Subject: [Python-Dev] os.defpath for Windows
Message-ID: <494E0A2B.4080704@gmail.com>

Hi,

just saw that os.defpath for Windows is defined as
	Lib/ntpath.py:30:defpath = '.;C:\\bin'

Most Windows machines I have seen have no C:\bin directory.
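
The value can be inspected from any platform, since ntpath is importable everywhere. The following is a small sketch for current Python 3. It only reports what the running interpreter defines and which entries exist locally, and it makes no claim about why the default was chosen.

```python
import ntpath
import os

# ntpath is importable on every platform, so the Windows default can be
# inspected even when running elsewhere.
print("ntpath.defpath =", repr(ntpath.defpath))
print("os.defpath     =", repr(os.defpath))

# Report which entries of the Windows default are not directories locally.
missing = [p for p in ntpath.defpath.split(";") if not os.path.isdir(p)]
print("entries that are not directories here:", missing)
```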

Is there any reason why it was defined this way?
Thanks,
	Yinon

From martin at v.loewis.de  Sun Dec 21 10:46:46 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 21 Dec 2008 10:46:46 +0100
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>
References: <200812202328.20045.Arfrever.FTA@gmail.com>	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>	<200812210002.05587.Arfrever.FTA@gmail.com>
	<1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>
Message-ID: <494E1086.5030608@v.loewis.de>

> I've made documentation for 2.6.1 now. It's at
> http://www.python.org/ftp/python/doc/2.6.1

In previous releases (back to 1.2), these files had version
numbers in them. It would be good if those could be added for
the more recent documentation sets as well.

Regards,
Martin

From martin at v.loewis.de  Sun Dec 21 10:48:13 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 21 Dec 2008 10:48:13 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <494DE285.6040301@bullseye.andymac.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>		<494CC400.7070404@bullseye.andymac.org>	<3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com>
	<494DE285.6040301@bullseye.andymac.org>
Message-ID: <494E10DD.2000505@v.loewis.de>

> It is likely that PyMalloc would be better with a way to disable the
> free()ing of empty arenas, or move to an arrangement where (like the
> various type free-lists in 2.6+) explicit action can force pruning of
> empty arenas - there are other usage patterns than yours which would
> benefit (performance wise) from not freeing arenas automatically.

Before such a mechanism is added, I'd like to establish for a fact
that this is an actual problem.

Regards,
Martin

From fabiofz at gmail.com  Sun Dec 21 12:28:39 2008
From: fabiofz at gmail.com (Fabio Zadrozny)
Date: Sun, 21 Dec 2008 09:28:39 -0200
Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
In-Reply-To: <bbaeab100812201515x7ab7f261r542f11efe9fb82c9@mail.gmail.com>
References: <0016e64f68207a52a5045e6de625@google.com>
	<loom.20081219T224445-192@post.gmane.org>
	<ca471dc20812191503w3475ac17sb430c099cff62457@mail.gmail.com>
	<cfb578b20812201345l331f918g3f654d3478f4c21b@mail.gmail.com>
	<bbaeab100812201515x7ab7f261r542f11efe9fb82c9@mail.gmail.com>
Message-ID: <cfb578b20812210328n21ef88aaj6bbe61aa5fd8af11@mail.gmail.com>

>> It appears that this bug was already reported: http://bugs.python.org/issue4705
>>
>> Any chance that it gets in the next 3.0.x bugfix release?
>>
>> Just as a note, if I do: sys.stdout._line_buffering = True, it also
>> works, but doesn't seem right as it's accessing an internal attribute.
>>
>> Note 2: the solution that said to pass 'wb' does not work, because I
>> need the output as text and not binary or text becomes garbled when
>> it's not ascii.
>>
>
> Can't you decode the bytes after you receive them?
>

Well, in short, no (long answer is that I probably could if I spent a
long time doing my own console instead of relying on what's already
done and working in Eclipse for all the current available languages it
supports, but that just doesn't seem right).

Also, it seems easily solvable on the Python side (enabling line
buffering for the Python streams when -u is passed)... My current
workaround is doing that in a custom site initialization when a Python
3 interpreter is found, but I find that this is not the right way of
doing it, and it really feels like a Python bug.
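
To illustrate what line buffering on a text stream means, here is a minimal sketch using the public line_buffering argument of io.TextIOWrapper instead of the private _line_buffering attribute. The in-memory BytesIO stands in for the real stdout pipe; this is an assumption for demonstration and not how the interpreter wires up sys.stdout.

```python
import io

raw = io.BytesIO()  # stand-in for the binary pipe behind sys.stdout
# line_buffering=True makes the text layer flush whenever a newline is written.
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\n",
                          line_buffering=True)
stream.write("caf\u00e9 ready\n")

# The bytes have already reached the underlying binary stream, without an
# explicit flush() and without giving up text mode.
flushed = raw.getvalue()
print(flushed)
```

The output stays text on the writing side and is encoded once, so non-ASCII characters are not garbled the way they can be when the stream is reopened in binary mode.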

-- Fabio

From dima at hlabs.spb.ru  Sun Dec 21 12:56:31 2008
From: dima at hlabs.spb.ru (Dmitry Vasiliev)
Date: Sun, 21 Dec 2008 14:56:31 +0300
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <16D50043-22B0-4711-BE91-E752953444EA@python.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>	<494C2369.5030901@gmail.com>
	<16D50043-22B0-4711-BE91-E752953444EA@python.org>
Message-ID: <494E2EEF.3080207@hlabs.spb.ru>

Barry Warsaw wrote:
> Thanks.  I've bumped that to release blocker for now.  If there are any
> other 'high' bugs that you want considered for 3.0.1, please make them
> release blockers too, for now.

I think the wsgiref package needs to be fixed. For now it's totally broken.
I've already found 4 issues about that:

http://bugs.python.org/issue3348
http://bugs.python.org/issue3401
http://bugs.python.org/issue3795
http://bugs.python.org/issue4522


What needs to be fixed:

1. Header handling in wsgiref.simple_server. Not so hard, actually: in
a few places headers are expected to be a list object instead of a dict.

2. wsgiref.handlers should support bytes instead of str. I think WSGI
applications must return bytes as a result, but we can allow Unicode
strings in start_response() because the resulting encoding for headers
is known and strings can be safely encoded. So this fix won't be hard
either: a few asserts need to be fixed and header output needs to be
routed through an auxiliary encoding method.

3. Tests

4. Documentation examples.
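
The convention proposed in item 2 (text for the status and headers, bytes for the body) can be sketched as follows. This is an illustration for current Python 3, not the eventual patch. The helper call_app and the sample application are hypothetical, and only wsgiref.util.setup_testing_defaults comes from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Status and header values are text; the response body is bytes.
    body = "Hello, w\u00f6rld\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(application):
    """Invoke a WSGI app once with a default test environ."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = list(application(environ, start_response))
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, body = call_app(app)
print(status, headers, body)
```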


I can create the patch before December 24th if needed.

-- 
Dmitry Vasiliev <dima at hlabs.spb.ru>
http://hlabs.spb.ru

From musiccomposition at gmail.com  Sun Dec 21 17:37:03 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sun, 21 Dec 2008 10:37:03 -0600
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <494E1086.5030608@v.loewis.de>
References: <200812202328.20045.Arfrever.FTA@gmail.com>
	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>
	<200812210002.05587.Arfrever.FTA@gmail.com>
	<1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>
	<494E1086.5030608@v.loewis.de>
Message-ID: <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com>

On Sun, Dec 21, 2008 at 3:46 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> In previous releases (back to 1.2), these files had version
> numbers in them. It would be good if those could be added for
> the more recent documentation sets as well.

I agree that adding version numbers would be nice, but I'm also afraid
of breaking people's automatic downloads of the documentation. Perhaps
add symlinks?



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

From stijn.deweirdt at ugent.be  Sun Dec 21 17:35:38 2008
From: stijn.deweirdt at ugent.be (Stijn De Weirdt)
Date: Sun, 21 Dec 2008 17:35:38 +0100
Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2
Message-ID: <1229877338.14751.34.camel@spike.ugent.be>

hi all,

i'm trying to build python 2.5.3 on centos5.2 x86_64 (base gcc is
4.1.2) 

output of env, configure, make -j and make test at
http://users.ugent.be/~stdweird/python-gcc-seg.tar.gz


this all seems ok (at least to me ;)
but the following code gives a segfault instead of an IOError
fname='test123'
f=open(fname,'w')
f.read()

(test123 doesn't exist. it is a reduced problem from a scipy unittest).

with system python (2.4.3) i get:
IOError: [Errno 9] Bad file descriptor
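
As a sketch of a regression check for this behaviour: the expected outcome is an exception, never a crash. The version below targets current Python 3, where read() on a write-only file raises io.UnsupportedOperation, a subclass of OSError (of which IOError is an alias). The helper name and the temporary file are assumptions for illustration.

```python
import os
import tempfile


def read_from_write_only_file():
    """Return the exception raised by read() on a file opened with 'w'."""
    fd, fname = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(fname, "w") as f:
            try:
                f.read()
            except IOError as exc:  # IOError is an alias of OSError in Python 3
                return exc
        return None
    finally:
        os.remove(fname)


err = read_from_write_only_file()
print(type(err).__name__, err)
```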

any hints what might cause this (or how i can figure it out). i have a
coredump, but have no clue what to look for.

many thanks,

stijn
-- 
The system will shutdown in 5 minutes.


From skip at pobox.com  Sun Dec 21 18:53:18 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 21 Dec 2008 11:53:18 -0600
Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2
In-Reply-To: <1229877338.14751.34.camel@spike.ugent.be>
References: <1229877338.14751.34.camel@spike.ugent.be>
Message-ID: <18766.33422.496164.601910@montanaro-dyndns-org.local>


    Stijn> any hints what might cause this (or how i can figure it out). i
    Stijn> have a coredump, but have no clue what to look for.

I can reproduce it on my Mac.  The croak happens while it is attempting to
raise the exception about a bad file descriptor.  Unfortunately, in
PyErr_Restore the call to PyThreadState_GET() returns NULL which means that
_PyThreadState_Current is NULL.  I see no differences between pystate.[ch]
in the 2.5 and 2.6 branches.  There must be something different about the
way PyThreadState_Swap or PyThreadState_DeleteCurrent are used.  Those are
the only two routines which appear to set it.  

Did this not happen with 2.5.2?

-- 
Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/

From martin at v.loewis.de  Sun Dec 21 18:57:38 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sun, 21 Dec 2008 18:57:38 +0100
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com>
References: <200812202328.20045.Arfrever.FTA@gmail.com>	
	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>	
	<200812210002.05587.Arfrever.FTA@gmail.com>	
	<1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>	
	<494E1086.5030608@v.loewis.de>
	<1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com>
Message-ID: <494E8392.40102@v.loewis.de>

> I agree that adding version numbers would be nice, but I'm also afraid
> of breaking people's automatic downloads of the documentation. Perhaps
> add symlinks?

For the releases that have been made, yes (or, actually, hard links
would work as well). For the releases yet to come, it would be good
if the release process created version-numbered files.

Regards,
Martin

From lists at cheimes.de  Sun Dec 21 19:27:29 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 21 Dec 2008 19:27:29 +0100
Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2
In-Reply-To: <18766.33422.496164.601910@montanaro-dyndns-org.local>
References: <1229877338.14751.34.camel@spike.ugent.be>
	<18766.33422.496164.601910@montanaro-dyndns-org.local>
Message-ID: <gim1qh$ser$1@ger.gmane.org>

skip at pobox.com schrieb:
>     Stijn> any hints what might cause this (or how i can figure it out). i
>     Stijn> have a coredump, but have no clue what to look for.
> 
> I can reproduce it on my Mac.  The croak happens while it is attempting to
> raise the exception about a bad file descriptor.  Unfortunately, in
> PyErr_Restore the call to PyThreadState_GET() returns NULL which means that
> _PyThreadState_Current is NULL.  I see no differences between pystate.[ch]
> in the 2.5 and 2.6 branches.  There must be something different about the
> way PyThreadState_Swap or PyThreadState_DeleteCurrent are used.  Those are
> the only two routines which appear to set it.  
> 
> Did this not happen with 2.5.2?

Wild guess: the bug might be related to
http://bugs.python.org/issue1683. Off the top of my head it's the only
major change in the thread state code that I can recall.

Christian


From rhamph at gmail.com  Sun Dec 21 19:44:12 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 21 Dec 2008 11:44:12 -0700
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
Message-ID: <aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>

On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman <tutufan at gmail.com> wrote:
> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
>> Have you seen any significant difference in the exit time when the
>> cyclic GC is disabled or enabled?
>
> Unfortunately, with GC enabled, the application is too slow to be
> useful, because of the greatly increased time for dict creation.  I
> suppose it's theoretically possible that with this increased time, the
> long time for exit will look less bad by comparison, but I'd be
> surprised if it makes any difference at all.  I'm confident that there
> are no loops in this dict, and nothing for cyclic gc to collect.

Try putting an explicit gc.collect() at the end, with the usual
timestamps before and after.

After that try deleting your dict, then calling gc.collect(), with
timestamps throughout.
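
The two experiments could be instrumented roughly like this. It is a sketch for current Python 3; the small dict named big is a hypothetical stand-in for the real 45G structure, and timed is a made-up helper.

```python
import gc
import time


def timed(label, func):
    """Run func(), print the elapsed wall-clock time, return func's result."""
    start = time.time()
    result = func()
    print("%-30s %.3f s" % (label, time.time() - start))
    return result


# Hypothetical stand-in for the real dict.
big = dict(("K%06d" % i, [("name", i % 10000)]) for i in range(200000))

# Step 1: explicit collection while the dict is still alive.
unreachable_before = timed("gc.collect() with dict alive", gc.collect)

# Step 2: drop the dict (plain refcounting frees it), then collect again.
start = time.time()
del big
print("%-30s %.3f s" % ("del big", time.time() - start))
unreachable_after = timed("gc.collect() after del", gc.collect)
```

Comparing the three timings separates the cost of the cyclic collector from the cost of plain reference-count deallocation.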


-- 
Adam Olsen, aka Rhamphoryncus

From musiccomposition at gmail.com  Sun Dec 21 19:54:07 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Sun, 21 Dec 2008 12:54:07 -0600
Subject: [Python-Dev] 2.6.1 documentation not available for download
In-Reply-To: <494E8392.40102@v.loewis.de>
References: <200812202328.20045.Arfrever.FTA@gmail.com>
	<1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com>
	<200812210002.05587.Arfrever.FTA@gmail.com>
	<1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com>
	<494E1086.5030608@v.loewis.de>
	<1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com>
	<494E8392.40102@v.loewis.de>
Message-ID: <1afaf6160812211054x1f6fba0eg8336280eaeba0284@mail.gmail.com>

On Sun, Dec 21, 2008 at 11:57 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> I agree that adding version numbers would be nice, but I'm also afraid
>> of breaking people's automatic downloads of the documentation. Perhaps
>> add symlinks?
>
> For the releases that have been made, yes (or, actually, hard links
> would work as well). For the releases yet to come, it would be good
> if the release process created version-numbered files.

Ok. I will add hardlinks for past releases and modify the Doc/Makefile
to add version numbers.
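
For the past releases, the hard links could be created with something like the sketch below (current Python 3). The file name python-docs-html.zip, the versioned naming scheme and the helper add_versioned_link are hypothetical; the real archive names on the server may differ.

```python
import os
import shutil
import tempfile


def add_versioned_link(directory, filename, version):
    """Add a second, version-numbered name for an existing archive."""
    prefix, rest = filename.split("-", 1)
    versioned = "%s-%s-%s" % (prefix, version, rest)
    src = os.path.join(directory, filename)
    dst = os.path.join(directory, versioned)
    if not os.path.exists(dst):
        try:
            os.link(src, dst)       # hard link: both names share one file
        except (OSError, AttributeError, NotImplementedError):
            shutil.copy2(src, dst)  # fallback where hard links are unavailable
    return dst


# Demonstration in a scratch directory with a placeholder archive.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "python-docs-html.zip"), "wb") as f:
    f.write(b"placeholder")
linked = add_versioned_link(workdir, "python-docs-html.zip", "2.6.1")
print(os.path.basename(linked))
```

The old unversioned name keeps working, so existing automatic downloads are not broken.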


-- 
Regards,
Benjamin Peterson

From scott+python-dev at scottdial.com  Sun Dec 21 20:33:51 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Sun, 21 Dec 2008 14:33:51 -0500
Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2
In-Reply-To: <18766.33422.496164.601910@montanaro-dyndns-org.local>
References: <1229877338.14751.34.camel@spike.ugent.be>
	<18766.33422.496164.601910@montanaro-dyndns-org.local>
Message-ID: <494E9A1F.80808@scottdial.com>

skip at pobox.com wrote:
> Did this not happen with 2.5.2?

I have 2.5.1 and 2.5.2 and it produces an IOError, just as it should. So
this was indeed introduced by 2.5.3.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From scott+python-dev at scottdial.com  Sun Dec 21 21:57:29 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Sun, 21 Dec 2008 15:57:29 -0500
Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2
In-Reply-To: <1229877338.14751.34.camel@spike.ugent.be>
References: <1229877338.14751.34.camel@spike.ugent.be>
Message-ID: <494EADB9.4020408@scottdial.com>

Stijn De Weirdt wrote:
> but the following code gives a segfault instead of an IOError
> fname='test123'
> f=open(fname,'w')
> f.read()

I've tracked this down to r67740:

"""
Issue #1706039: Support continued reading from a file even after
EOF was hit.
"""

Looking at the diff, I question the correctness of this patch. I believe
the actual issue is that Py_UniversalNewlineFread() was changed to make
calls to PyErr_SetFromErrno(), but then these calls occur within an
ALLOW_THREADS block.

I was going to try to make a new patch, but the test case that was added
for it succeeded *before* the patch was applied (I reverted fileobject.c
to r67739) on many platforms. I don't have access to a platform which
exhibits the problem described in the tracker.

Reading people's assessment, I *think* the correct patch is merely to
add a call to clearerr() just before calling fread() in each function
(to clear the EOF flag before performing the fread()). I don't really
understand what the point of all the other changes in the diff is. I
can't test my assessment because it seems the only platform discussed
that had a problem was OS X (and I don't have one of those).

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From martin at v.loewis.de  Mon Dec 22 11:06:10 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 11:06:10 +0100
Subject: [Python-Dev] Releasing 2.5.4
Message-ID: <494F6692.8000001@v.loewis.de>

It seems r67740 shouldn't have been committed. Since this
is a severe regression, I think I'll have to revert it, and
release 2.5.4 with just that change.

Unless I hear otherwise, I would release Python 2.5.4
(without a release candidate) tomorrow.

Regards,
Martin

From mal at egenix.com  Mon Dec 22 13:20:59 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 22 Dec 2008 13:20:59 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <494D6EA9.2040201@v.loewis.de>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>
	<494D6EA9.2040201@v.loewis.de>
Message-ID: <494F862B.60701@egenix.com>

On 2008-12-20 23:16, Martin v. Löwis wrote:
>>> I will try next week to see if I can come up with a smaller,
>>> submittable example.  Thanks.
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
> 
> I doubt that. The long exit times are usually caused by a bad
> malloc implementation.

With "garbage collection" I meant the process of Py_DECREF'ing the
objects in large containers or deeply nested structures, not the GC
mechanism for breaking circular references in Python.

This will usually also involve free() calls, so the malloc
implementation affects this as well. However, I've seen such long
exit times on Linux and Windows, which both have rather good
malloc implementations.

I don't think there's anything much we can do about it at the
interpreter level. Deleting millions of objects takes time and that's
not really surprising at all. It takes even longer if you have
instances with .__del__() methods written in Python.

Applications can choose other mechanisms for speeding up the
exit process in various (less clean) ways, if they have a need for
this.
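
One such less clean mechanism is os._exit(), which leaves the process without running interpreter teardown. The sketch below (current Python 3) runs a small child program that uses it, so the parent can check that output flushed by hand still arrives. The child script and the helper run_child are illustrative assumptions.

```python
import subprocess
import sys

# Child program: build a container, flush output by hand, then leave via
# os._exit(), which skips interpreter teardown (no atexit handlers, no
# __del__ calls, no flushing of buffered streams).
CHILD = """
import os, sys
data = dict((i, [(str(i), i)]) for i in range(100000))
sys.stdout.write("results written\\n")
sys.stdout.flush()
os._exit(0)
"""


def run_child():
    """Run the child script and return (exit status, captured stdout)."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("ascii").strip()


returncode, output = run_child()
print(returncode, output)
```

Anything that relies on normal shutdown (open files, atexit hooks, destructors) has to be handled explicitly before the call.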

BTW: Rather than using a huge in-memory dict, I'd suggest using either
an on-disk dictionary such as the ones found in mxBeeBase or
a database.
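
As a sketch of the database option, using the standard library's sqlite3 module as a stand-in (mxBeeBase has its own API, which is not shown here). The table layout, the sample rows and the lookup helper are hypothetical. Written for current Python 3.

```python
import os
import sqlite3
import tempfile

# On-disk database file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "pairs.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE pairs (key TEXT, name TEXT, value INTEGER)")
conn.execute("CREATE INDEX pairs_key ON pairs (key)")

# Hypothetical sample rows: each key maps to several (name, value) pairs.
rows = [("ABCDEFG", "first-name", 17),
        ("ABCDEFG", "second-name", 4242),
        ("HIJKLMN", "first-name", 9)]
conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", rows)
conn.commit()


def lookup(connection, key):
    """Return the list of (name, value) pairs stored for key."""
    cur = connection.execute(
        "SELECT name, value FROM pairs WHERE key = ? ORDER BY rowid", (key,))
    return cur.fetchall()


found = lookup(conn, "ABCDEFG")
conn.close()
print(found)
```

With the data on disk, process exit no longer has to deallocate millions of Python objects.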

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 22 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From ggpolo at gmail.com  Mon Dec 22 13:45:36 2008
From: ggpolo at gmail.com (Guilherme Polo)
Date: Mon, 22 Dec 2008 10:45:36 -0200
Subject: [Python-Dev] [capi-sig] Exceptions with additional instance
	variables
In-Reply-To: <bfb4c52d0812220406x5877db66ud10d64b2407415d4@mail.gmail.com>
References: <bfb4c52d0812220406x5877db66ud10d64b2407415d4@mail.gmail.com>
Message-ID: <ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>

On Mon, Dec 22, 2008 at 10:06 AM,  <chojrak11 at gmail.com> wrote:
> On Mon, Dec 22, 2008 at 03:29, Guilherme Polo <ggpolo at gmail.com> wrote:
>> On Sun, Dec 21, 2008 at 11:02 PM,  <chojrak11 at gmail.com> wrote:
>>> Hello,
>>>
>>> I'm trying to implement a custom exception that has to carry some
>>> useful info by means of instance members, to be used like:
>>>
>>> try:
>>>    // some code
>>> except MyException, data:
>>>    // use data.errorcode, data.errorcategory, data.errorlevel,
>>> data.errormessage and some others
>>>
>>> The question is - how to implement the instance variables with
>>> PyErr_NewException?
>>
>> Using PyErr_NewException is fine. You must understand that an
>> exception is a class, and thus PyErr_NewException creates one for you
>> and returns it.
>> Just like you would do with a class that has __dict__, set some
>> attributes to what you want. That is, use PyObject_SetAttrString or
>> something more appropriate for you.
>
> Ok so I did the following. In init function (forget refcounting and
> error checking for a moment ;-)
>
> PyObject *dict = PyDict_New();
> PyDict_SetItemString(dict, "errorcode", PyInt_FromLong(0));
> static PyObject *myexception =
> PyErr_NewException("module.MyException", NULL, dict);

You do not really have to create a dict here, one will be created for
you if you pass a NULL there.

> PyModule_AddObject(module, "MyException", myexception);
>
> It worked more or less as expected, the help showed:
>
>  |  ----------------------------------------------------------------------
>  |  Data and other attributes defined here:
>  |
>  |  errorcode = 0
>  |
>  |  ----------------------------------------------------------------------
>
> Then I did the following when raising the exception:
>
> PyObject_SetAttrString(myexception, "errorcode", PyInt_FromLong(111));
> PyErr_SetString(myexception, "Bad thing happened");
> return NULL;
>
> and the test code was:
> try:
>    do_bad_thing();
> except MyException, data:
>
> and you surely already guessed it -- data.errorcode was 0.... Not only
> that, module.MyException.errorcode was also 0...
>
> What am I doing wrong? I certainly don't get the idea of exceptions in
> Python, especially what is being raised - a class or an instance?

There are two forms raise can take, both will end up involving a class
and an instance.

> If
> the latter - how's the class instantiated?

You can call a class to instantiate it.
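
In pure Python terms, the two forms and the difference between a class attribute and an instance attribute look roughly like this. The sketch uses current Python 3 syntax (except ... as data instead of except ..., data), and the names are taken from the question only for illustration.

```python
class MyException(Exception):
    errorcode = 0          # class attribute: shared default for all instances


def raise_class():
    raise MyException      # Python instantiates the class with no arguments


def raise_instance():
    exc = MyException("Bad thing happened")
    exc.errorcode = 111    # instance attribute: belongs to this one exception
    raise exc


def catch(func):
    """Call func() and return the MyException instance it raised."""
    try:
        func()
    except MyException as data:
        return data


first = catch(raise_class)
second = catch(raise_instance)
print(first.errorcode, second.errorcode, MyException.errorcode)
```

Either way the handler receives an instance; values stored on the instance travel with that one exception, while values stored on the class are visible to every instance.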

> If not - what about values
> in different threads? The docs are so vague about that...
>
>
> Thanks again in advance,
> Chojrak
>

Again, an exception is a class, so you could create a new type in C,
and do anything you wanted. But you probably don't want to create a
new type to achieve this, so there are two simple ways I'm going to
paste below:

#include "Python.h"

static PyObject *MyErr;

static PyMethodDef module_methods[] = {
	{"raise_test", (PyCFunction)raise_test, METH_NOARGS, NULL},
	{NULL},
};

PyMODINIT_FUNC
initfancy_exc(void)
{
	PyObject *m;

	m = Py_InitModule("fancy_exc", module_methods);
	if (m == NULL)
		return;

	MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL);
	if (MyErr == NULL)
		return;

	/* PyModule_AddObject steals a reference; keep one for the static. */
	Py_INCREF(MyErr);
	if (PyModule_AddObject(m, "err", MyErr) < 0)
		return;
}

The raise_test function is missing; pick one of these (and define it
above module_methods):

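static PyObject *
raise_test(PyObject *self)
{
	/* PyObject_SetAttrString does not steal references, so drop ours. */
	PyObject *code = PyInt_FromLong(42);
	PyObject *category = PyString_FromString("nice one");

	if (code != NULL && category != NULL) {
		PyObject_SetAttrString(MyErr, "code", code);
		PyObject_SetAttrString(MyErr, "category", category);
		PyErr_SetString(MyErr, "All is good, I hope");
	}
	Py_XDECREF(code);
	Py_XDECREF(category);
	return NULL;
}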

or

static PyObject *
raise_test(PyObject *self)
{
	
	PyObject *t = PyTuple_New(3);
	PyTuple_SetItem(t, 0, PyString_FromString("error message"));
	PyTuple_SetItem(t, 1, PyInt_FromLong(10));
	PyTuple_SetItem(t, 2, PyString_FromString("category name here"));
	PyErr_SetObject(MyErr, t);
	Py_DECREF(t);
	return NULL;
}

In this second form you check for the args attribute of the exception.
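
On the Python side, reading the values back from args might look like this. It is a sketch in current Python 3 syntax, where FancyError is a pure-Python stand-in for the fancy_exc.err class created in C, and raise_test mimics the second C variant.

```python
class FancyError(Exception):
    """Python-level stand-in for the C-created exception class."""


def raise_test():
    # Same three values the C code packs into its tuple.
    raise FancyError("error message", 10, "category name here")


try:
    raise_test()
except FancyError as exc:
    # The tuple passed at raise time is available as exc.args.
    message, code, category = exc.args

print(message, code, category)
```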

-- 
-- Guilherme H. Polo Goncalves

From ggpolo at gmail.com  Mon Dec 22 13:48:46 2008
From: ggpolo at gmail.com (Guilherme Polo)
Date: Mon, 22 Dec 2008 10:48:46 -0200
Subject: [Python-Dev] [capi-sig] Exceptions with additional instance
	variables
In-Reply-To: <ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>
References: <bfb4c52d0812220406x5877db66ud10d64b2407415d4@mail.gmail.com>
	<ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>
Message-ID: <ac2200130812220448q26992132r5c028dcac71e0357@mail.gmail.com>

On Mon, Dec 22, 2008 at 10:45 AM, Guilherme Polo <ggpolo at gmail.com> wrote:
> On Mon, Dec 22, 2008 at 10:06 AM,  <chojrak11 at gmail.com> wrote:
>> On Mon, Dec 22, 2008 at 03:29, Guilherme Polo <ggpolo at gmail.com> wrote:
>>> On Sun, Dec 21, 2008 at 11:02 PM,  <chojrak11 at gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I'm trying to implement a custom exception that has to carry some
>>>> useful info by means of instance members, to be used like:
>>>>
>>>> try:
>>>>    // some code
>>>> except MyException, data:
>>>>    // use data.errorcode, data.errorcategory, data.errorlevel,
>>>> data.errormessage and some others
>>>>
>>>> The question is - how to implement the instance variables with
>>>> PyErr_NewException?
>>>
>>> Using PyErr_NewException is fine. You must understand that an
>>> exception is a class, and thus PyErr_NewException creates one for you
>>> and returns it.
>>> Just like you would do with a class that has __dict__, set some
>>> attributes to what you want. That is, use PyObject_SetAttrString or
>>> something more appropriate for you.
>>
>> Ok so I did the following. In init function (forget refcounting and
>> error checking for a moment ;-)
>>
>> PyObject *dict = PyDict_New();
>> PyDict_SetItemString(dict, "errorcode", PyInt_FromLong(0));
>> static PyObject *myexception =
>> PyErr_NewException("module.MyException", NULL, dict);
>
> You do not really have to create a dict here, one will be created for
> you if you pass a NULL there.
>
>> PyModule_AddObject(module, "MyException", myexception);
>>
>> It worked more or less as expected, the help showed:
>>
>>  |  ----------------------------------------------------------------------
>>  |  Data and other attributes defined here:
>>  |
>>  |  errorcode = 0
>>  |
>>  |  ----------------------------------------------------------------------
>>
>> Then I did the following when raising the exception:
>>
>> PyObject_SetAttrString(myexception, "errorcode", PyInt_FromLong(111));
>> PyErr_SetString(myexception, "Bad thing happened");
>> return NULL;
>>
>> and the test code was:
>> try:
>>    do_bad_thing();
>> except MyException, data:
>>
>> and you surely already guessed it -- data.errorcode was 0.... Not only
>> that, module.MyException.errorcode was also 0...
>>
>> What am I doing wrong? I certainly don't get the idea of exceptions in
>> Python, especially what is being raised - a class or an instance?
>
> There are two forms raise can take; both end up involving a class
> and an instance.
>
>> If
>> the latter - how's the class instantiated?
>
> You can call a class to instantiate it.
>
>> If not - what about values
>> in different threads? The docs are so vague about that...
>>
>>
>> Thanks again in advance,
>> Chojrak
>>
>
> Again, an exception is a class, so you could create a new type in C,
> and do anything you wanted. But you probably don't want to create a
> new type to achieve this

By creating a type I mean one that involves defining a tp_init and
everything else your type needs, not the simple one created by
PyErr_NewException.

> , so there are two simple ways I'm going to
> paste below:
>
> #include "Python.h"
>
> static PyObject *MyErr;
>
> static PyMethodDef module_methods[] = {
>        {"raise_test", (PyCFunction)raise_test, METH_NOARGS, NULL},
>        {NULL},
> };
>
> PyMODINIT_FUNC
> initfancy_exc(void)
> {
>        PyObject *m;
>
>        m = Py_InitModule("fancy_exc", module_methods);
>        if (m == NULL)
>                return;
>
>        MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL);
>
>        Py_INCREF(MyErr);
>        if (PyModule_AddObject(m, "err", MyErr) < 0)
>                return;
> }
>
> the raise_test function is missing, pick one of these:
>
> static PyObject *
> raise_test(PyObject *self)
> {
>        PyObject_SetAttrString(MyErr, "code", PyInt_FromLong(42));
>        PyObject_SetAttrString(MyErr, "category", PyString_FromString("nice one"));
>        PyErr_SetString(MyErr, "All is good, I hope");
>        return NULL;
> }
>
> or
>
> static PyObject *
> raise_test(PyObject *self)
> {
>
>        PyObject *t = PyTuple_New(3);
>        PyTuple_SetItem(t, 0, PyString_FromString("error message"));
>        PyTuple_SetItem(t, 1, PyInt_FromLong(10));
>        PyTuple_SetItem(t, 2, PyString_FromString("category name here"));
>        PyErr_SetObject(MyErr, t);
>        Py_DECREF(t);
>        return NULL;
> }
>
> In this second form you check for the args attribute of the exception.
>
> --
> -- Guilherme H. Polo Goncalves
>



-- 
-- Guilherme H. Polo Goncalves

From skip at pobox.com  Mon Dec 22 15:35:25 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 08:35:25 -0600
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <494F6692.8000001@v.loewis.de>
References: <494F6692.8000001@v.loewis.de>
Message-ID: <18767.42413.194130.39594@montanaro-dyndns-org.local>


    Martin> It seems r67740 shouldn't have been committed. Since this is a
    Martin> severe regression, I think I'll have to revert it, and release
    Martin> 2.5.4 with just that change.

    Martin> Unless I hear otherwise, I would release Python 2.5.4 (without a
    Martin> release candidate) tomorrow.

I don't think there is a test case which fails with it applied and passes
with it removed.  If not, I think it might be worthwhile to write such a
test even if it's used temporarily just to test the change.  I wrote a
trivial test case:

Index: Lib/test/test_file.py
===================================================================
--- Lib/test/test_file.py       (revision 67899)
+++ Lib/test/test_file.py       (working copy)
@@ -116,6 +116,8 @@
         except:
             self.assertEquals(self.f.__exit__(*sys.exc_info()), None)

+    def testReadWhenWriting(self):
+        self.assertRaises(IOError, self.f.read)

 class OtherFileTests(unittest.TestCase):

which segfaults (on Solaris 10 at least) when run with the 2.5.3 released
code and which passes after I undo r67740.
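
A standalone sketch of the behaviour the test checks, for trying outside the test suite. It is written for a current Python 3, where IOError is an alias of OSError; the helper name and the temporary file are illustrative assumptions and not part of test_file.py.

```python
import os
import tempfile


def read_from_write_only_file():
    """Open a file for writing only, try to read, and report the exception type."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            try:
                f.read()
            except OSError as exc:  # IOError is an alias of OSError on Python 3.3+
                return type(exc).__name__
            return None  # no exception: the read was (wrongly) allowed
    finally:
        os.remove(path)


print(read_from_write_only_file())
```

A return value of None would mean the read went through without raising.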

Should we add this to the active branches (2.6, trunk, py3k, 3.0)?

Skip


From martin at v.loewis.de  Mon Dec 22 15:52:59 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 15:52:59 +0100
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <18767.42413.194130.39594@montanaro-dyndns-org.local>
References: <494F6692.8000001@v.loewis.de>
	<18767.42413.194130.39594@montanaro-dyndns-org.local>
Message-ID: <494FA9CB.4010802@v.loewis.de>

> Should we add this to the active branches (2.6, trunk, py3k, 3.0)?

Sure! Go ahead.

For 2.5.3, I'd rather not add an additional test case, but merely
revert the patch.

Regards,
Martin

From fdrake at acm.org  Mon Dec 22 15:39:18 2008
From: fdrake at acm.org (Fred Drake)
Date: Mon, 22 Dec 2008 09:39:18 -0500
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <18767.42413.194130.39594@montanaro-dyndns-org.local>
References: <494F6692.8000001@v.loewis.de>
	<18767.42413.194130.39594@montanaro-dyndns-org.local>
Message-ID: <D05D28D5-440F-43F8-A7DE-4B5302DA9512@acm.org>

On Dec 22, 2008, at 9:35 AM, skip at pobox.com wrote:
> I don't think there is a test case which fails with it applied and  
> passes
> with it removed.  If not, I think it might be worthwhile to write  
> such a
> test even if it's used temporarily just to test the change.  I wrote a
> trivial test case:

If this is sufficient to drive a release, then whatever test there is  
should be part of the release as well.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From barry at python.org  Mon Dec 22 17:15:27 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 22 Dec 2008 11:15:27 -0500
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <494C5C06.30109@v.loewis.de>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>
	<494C5C06.30109@v.loewis.de>
Message-ID: <A18EA0B8-F263-4B90-8276-A59B76873D80@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 19, 2008, at 9:44 PM, Martin v. Löwis wrote:

>> Do you think we can get 3.0.1 out on December 24th?
>
> I won't have physical access to my build machine from December 24th to
> January 3rd.


Okay.  Let's just push it until after the new year then.  In the mean  
time, please continue to work on fixes for 3.0.1.  I'm thinking  
tentatively to do a release the week of January 5th.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSU+9IHEjvBPtnXfVAQL0vQQAmxcMDP1GUuhCOxCVHqnSGaywdG1mz3f0
iNCNs4lVsRLYV/AVdf/tbpWyLbcUvFL0hUyLDp8PCScOjZReKwe6VpnujL/BwU5E
4P7RtUn493QGqkFJDjHNJ2SIcxOfzk9Y7E3qyS0QHPmsqmNpSD6ZQQd0PkdCoqQo
f08Z9HrKZZw=
=ujaK
-----END PGP SIGNATURE-----

From barry at python.org  Mon Dec 22 17:16:19 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 22 Dec 2008 11:16:19 -0500
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <494E2EEF.3080207@hlabs.spb.ru>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>	<494C2369.5030901@gmail.com>
	<16D50043-22B0-4711-BE91-E752953444EA@python.org>
	<494E2EEF.3080207@hlabs.spb.ru>
Message-ID: <D4C35C7F-D411-48CA-B42A-4FB3D9CD7133@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 21, 2008, at 6:56 AM, Dmitry Vasiliev wrote:

> Barry Warsaw wrote:
>> Thanks.  I've bumped that to release blocker for now.  If there are  
>> any
>> other 'high' bugs that you want considered for 3.0.1, please make the
>> release blockers too, for now.
>
> I think wsgiref package needs to be fixed. For now it's totally  
> broken.
> I've already found 4 issues about that:
>
> http://bugs.python.org/issue3348
> http://bugs.python.org/issue3401
> http://bugs.python.org/issue3795
> http://bugs.python.org/issue4522

Please make sure these issues are release blockers.  Fixes before  
January 5th would be able to make it into 3.0.1.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSU+9U3EjvBPtnXfVAQII5wP+M9tyL169XMIwoibqupyPErAjHNL+zWD1
wydak1MKc/gF6KvSFfs9t6uuI3p8GI42dNxeHXIXsCb1he16YfUgu7xG210ZJ9C3
YkDcr6vDDMYUvMI8XdVJGh9ASnQhrQRiyMI/TtiJTh16t3wnn78EH2F2IyrYcDrD
0xaKQjaK1+k=
=t6EL
-----END PGP SIGNATURE-----

From solipsis at pitrou.net  Mon Dec 22 17:38:24 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 22 Dec 2008 16:38:24 +0000 (UTC)
Subject: [Python-Dev] Python 3.0.1
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>	<494C2369.5030901@gmail.com>
	<16D50043-22B0-4711-BE91-E752953444EA@python.org>
	<494E2EEF.3080207@hlabs.spb.ru>
	<D4C35C7F-D411-48CA-B42A-4FB3D9CD7133@python.org>
Message-ID: <loom.20081222T163730-849@post.gmane.org>

Barry Warsaw <barry <at> python.org> writes:
> 
> Please make sure these issues are release blockers.  Fixes before  
> January 5th would be able to make it into 3.0.1.

Should http://bugs.python.org/issue4486 be a release blocker as well?
(I don't think so, but...)




From barry at python.org  Mon Dec 22 17:59:44 2008
From: barry at python.org (Barry Warsaw)
Date: Mon, 22 Dec 2008 11:59:44 -0500
Subject: [Python-Dev] Python 3.0.1
In-Reply-To: <loom.20081222T163730-849@post.gmane.org>
References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org>	<494C2369.5030901@gmail.com>
	<16D50043-22B0-4711-BE91-E752953444EA@python.org>
	<494E2EEF.3080207@hlabs.spb.ru>
	<D4C35C7F-D411-48CA-B42A-4FB3D9CD7133@python.org>
	<loom.20081222T163730-849@post.gmane.org>
Message-ID: <99E28236-03DE-4A21-93BF-B94B3114A6DE@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Dec 22, 2008, at 11:38 AM, Antoine Pitrou wrote:

> Barry Warsaw <barry <at> python.org> writes:
>>
>> Please make sure these issues are release blockers.  Fixes before
>> January 5th would be able to make it into 3.0.1.
>
> Should http://bugs.python.org/issue4486 be a release blocker as well?
> (I don't think so, but...)

I don't think so either.  It would be nice to have, but it needn't  
hold up the release.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSU/HgnEjvBPtnXfVAQKzcAP+NThqngryODxF/bKpeMs/EhpjfI9HV4eC
Lul5LMocaxEe91ontMjhfnZQo6Tx/jJCGECzVLCLXVmrjKg7/d6/9TFEByc9OWFm
zODpRvQ+4u+jd8c8DcBQmEwuFJF4MQZ5x6SUP8HxRTLmWq1KMcGM5WTNHCxMoOVw
Gkg8JmknqjM=
=6teE
-----END PGP SIGNATURE-----

From tutufan at gmail.com  Mon Dec 22 19:01:56 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 12:01:56 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
Message-ID: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>

Thanks for all of the useful suggestions.  Here are some preliminary results.

With gc still disabled (gc.disable()), at the end of the program I first did a
gc.collect(), which took about five minutes.  (So, reason enough not
to gc.enable(), at least without Antoine's patch.)
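
A rough sketch of how such timings can be taken, for anyone who wants to repeat the measurement. The helper names and the small stand-in dict are assumptions for illustration, not the real program, and time.perf_counter requires Python 3.

```python
import gc
import time


def timed(label, func):
    """Run func() and return (label, elapsed seconds)."""
    start = time.perf_counter()
    func()
    return label, time.perf_counter() - start


def measure_teardown(big):
    """Time an explicit collection, then clearing the dict, then another collection."""
    results = []
    results.append(timed("gc.collect() before clear", gc.collect))
    results.append(timed("big.clear()", big.clear))
    results.append(timed("gc.collect() after clear", gc.collect))
    return results


if __name__ == "__main__":
    gc.disable()  # same setup as described above
    # Small stand-in data; the real dict is far larger.
    big = dict((i, [i, (i, str(i))]) for i in range(100000))
    for label, seconds in measure_teardown(big):
        print("%-28s %.3fs" % (label, seconds))
```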

After that, I did a .clear() on the huge dict.  That's where the time
is being spent.  Doing the suggested "poor man's profiling" (repeated
backtraces via gdb), for 20 or so samples, one is within libc free,
but all of the rest are in the same place (same source line) within
PyObject_Free (see below), sometimes within list_dealloc and sometimes
within tuple_dealloc.  So, apparently a lot of time is being spent in
this loop:


                        /* Case 3:  We have to move the arena towards the end
                         * of the list, because it has more free pools than
                         * the arena to its right.

                       ...

                        /* Locate the new insertion point by iterating over
                         * the list, using our nextarena pointer.
                         */
                        while (ao->nextarena != NULL &&
                                        nf > ao->nextarena->nfreepools) {
                                ao->prevarena = ao->nextarena;
                                ao->nextarena = ao->nextarena->nextarena;
                        }

Investigating further, from one stop, I used gdb to follow the chain
of pointers in the nextarena and prevarena directions.  There were
5449 and 112765 links, respectively.  maxarenas is 131072.

Sampling nf at different breaks gives values in the range(10,20).

This loop looks like an insertion sort.  If it's the case that only a
"few" iterations are ever needed for any given free, this might be
okay--if not, it would seem that this must be quadratic.
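
To make the concern concrete, here is a toy model of that loop. It assumes the arenas are kept ordered by ascending nfreepools and, as a simplification, uses a plain Python list of counts in place of the linked list; the function names are made up for this sketch and nothing here is taken from obmalloc.c.

```python
def steps_to_reinsert(free_counts, index):
    """Bump one arena's free-pool count and walk it right to restore order.

    free_counts is ascending; returns how many neighbours were walked past.
    """
    nf = free_counts[index] + 1
    steps = 0
    j = index
    # Mirrors: while (next != NULL && nf > next->nfreepools) advance.
    while j + 1 < len(free_counts) and nf > free_counts[j + 1]:
        free_counts[j] = free_counts[j + 1]
        j += 1
        steps += 1
    free_counts[j] = nf
    return steps


def total_steps(n_arenas, start_free=10):
    """Worst-case flavour: every arena starts equal, frees always hit the head."""
    counts = [start_free] * n_arenas
    return sum(steps_to_reinsert(counts, 0) for _ in range(n_arenas))


for n in (100, 200, 400):
    print(n, total_steps(n))
```

In this model, one free per arena costs n*(n-1)/2 steps in total, so doubling the number of arenas roughly quadruples the work. Whether the real workload hits this pattern is exactly what still needs to be measured.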

I attempted to look further by setting a silent break with counter
within the loop and another break after the loop to inspect the
counter, but gdb's been buzzing away on that for 40 minutes without
coming back.  That might mean that there are a lot of passes through
this loop per free (i.e., that gdb is taking a long time to process
100,000 silent breaks), or perhaps I've made a mistake, or gdb isn't
handling this well.

In any case, this looks like the problem locus.

It's tempting to say "don't do this arena ordering optimization if
we're doing final cleanup", but really the program could have done
this .clear() at any point.  Maybe there needs to be a flag to disable
it altogether?  Or perhaps there's a smarter way to manage the list of
arena/free pool info.

Mike



Program received signal SIGINT, Interrupt.
0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at Objects/obmalloc.c:1064
1064				while (ao->nextarena != NULL &&
(gdb) bt
#0  0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at
Objects/obmalloc.c:1064
#1  0x0000000000433478 in list_dealloc (op=0x5ec043dd0) at
Objects/listobject.c:281
#2  0x000000000044075b in PyDict_Clear (op=0x74c7cd0) at
Objects/dictobject.c:757
#3  0x00000000004407b9 in dict_clear (mp=0x5ec043db0) at
Objects/dictobject.c:1776
#4  0x0000000000485905 in PyEval_EvalFrameEx (f=0x746ca50,
throwflag=<value optimized out>)
    at Python/ceval.c:3557
#5  0x000000000048725f in PyEval_EvalCodeEx (co=0x72643f0,
globals=<value optimized out>,
    locals=<value optimized out>, args=0x1, argcount=0, kws=0x72a5770,
kwcount=0, defs=0x743eba8,
    defcount=1, closure=0x0) at Python/ceval.c:2836
#6  0x00000000004855bc in PyEval_EvalFrameEx (f=0x72a55f0,
throwflag=<value optimized out>)
    at Python/ceval.c:3669
#7  0x000000000048725f in PyEval_EvalCodeEx (co=0x72644e0,
globals=<value optimized out>,
    locals=<value optimized out>, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0,
    closure=0x0) at Python/ceval.c:2836
#8  0x00000000004872a2 in PyEval_EvalCode (co=0x5ec043db0,
globals=0x543e41f10, locals=0x543b969c0)
    at Python/ceval.c:494
#9  0x00000000004a844e in PyRun_FileExFlags (fp=0x7171010,
    filename=0x7ffffaf6b419
"/home/mkc/greylag/main/greylag_reannotate.py", start=<value optimized
out>,
    globals=0x7194510, locals=0x7194510, closeit=1,
flags=0x7ffffaf69080) at Python/pythonrun.c:1273
#10 0x00000000004a86e0 in PyRun_SimpleFileExFlags (fp=0x7171010,
    filename=0x7ffffaf6b419
"/home/mkc/greylag/main/greylag_reannotate.py", closeit=1,
    flags=0x7ffffaf69080) at Python/pythonrun.c:879
#11 0x0000000000412275 in Py_Main (argc=<value optimized out>,
argv=0x7ffffaf691a8) at Modules/main.c:523
#12 0x00000030fea1d8b4 in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000411799 in _start ()





On Sun, Dec 21, 2008 at 12:44 PM, Adam Olsen <rhamph at gmail.com> wrote:
> On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman <tutufan at gmail.com> wrote:
>> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
>>> Have you seen any significant difference in the exit time when the
>>> cyclic GC is disabled or enabled?
>>
>> Unfortunately, with GC enabled, the application is too slow to be
>> useful, because of the greatly increased time for dict creation.  I
>> suppose it's theoretically possible that with this increased time, the
>> long time for exit will look less bad by comparison, but I'd be
>> surprised if it makes any difference at all.  I'm confident that there
>> are no loops in this dict, and nothing for cyclic gc to collect.
>
> Try putting an explicit gc.collect() at the end, with the usual
> timestamps before and after.
>
> After that try deleting your dict, then calling gc.collect(), with
> timestamps throughout.
>
>
> --
> Adam Olsen, aka Rhamphoryncus
>

From tutufan at gmail.com  Mon Dec 22 19:13:33 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 12:13:33 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494F862B.60701@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
Message-ID: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>

On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> BTW: Rather than using a huge in-memory dict, I'd suggest to either
> use an on-disk dictionary such as the ones found in mxBeeBase or
> a database.

I really want this to work in-memory.  I have 64G RAM, and I'm only
trying to use 45G of it ("only" 45G :-), and I don't need the results
to persist after the program finishes.

Python should be able to do this.  I don't want to hear "Just use Perl
instead" from my co-workers...  ;-)

From rhamph at gmail.com  Mon Dec 22 21:22:56 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 22 Dec 2008 13:22:56 -0700
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
Message-ID: <aac2c7cb0812221222y408a139diebcd04795eabb13c@mail.gmail.com>

On Mon, Dec 22, 2008 at 11:01 AM, Mike Coleman <tutufan at gmail.com> wrote:
> Thanks for all of the useful suggestions.  Here are some preliminary results.
>
> With gc still disabled (gc.disable()), at the end of the program I first did a
> gc.collect(), which took about five minutes.  (So, reason enough not
> to gc.enable(), at least without Antoine's patch.)
>
> After that, I did a .clear() on the huge dict.  That's where the time
> is being spent.  Doing the suggested "poor man's profiling" (repeated
> backtraces via gdb), for 20 or so samples, one is within libc free,
> but all of the rest are in the same place (same source line) within
> PyObject_Free (see below), sometimes within list_dealloc and sometimes
> within tuple_dealloc.  So, apparently a lot of time is being spent in
> this loop:
>
>
>                        /* Case 3:  We have to move the arena towards the end
>                         * of the list, because it has more free pools than
>                         * the arena to its right.
>
>                       ...
>
>                        /* Locate the new insertion point by iterating over
>                         * the list, using our nextarena pointer.
>                         */
>                        while (ao->nextarena != NULL &&
>                                        nf > ao->nextarena->nfreepools) {
>                                ao->prevarena = ao->nextarena;
>                                ao->nextarena = ao->nextarena->nextarena;
>                        }
>
> Investigating further, from one stop, I used gdb to follow the chain
> of pointers in the nextarena and prevarena directions.  There were
> 5449 and 112765 links, respectively.  maxarenas is 131072.
>
> Sampling nf at different breaks gives values in the range(10,20).
>
> This loop looks like an insertion sort.  If it's the case that only a
> "few" iterations are ever needed for any given free, this might be
> okay--if not, it would seem that this must be quadratic.
>
> I attempted to look further by setting a silent break with counter
> within the loop and another break after the loop to inspect the
> counter, but gdb's been buzzing away on that for 40 minutes without
> coming back.  That might mean that there are a lot of passes through
> this loop per free (i.e., that gdb is taking a long time to process
> 100,000 silent breaks), or perhaps I've made a mistake, or gdb isn't
> handling this well.

To make sure that's the correct line please recompile python without
optimizations.  GCC happily reorders and merges different parts of a
function.

Adding a counter in C and recompiling would be a lot faster than using
a gdb hook.


-- 
Adam Olsen, aka Rhamphoryncus

From martin at v.loewis.de  Mon Dec 22 21:38:55 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 21:38:55 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
Message-ID: <494FFADF.7020609@v.loewis.de>

> Or perhaps there's a smarter way to manage the list of
> arena/free pool info.

If that code is the real problem (in a reproducible test case),
then this approach is the only acceptable solution. Disabling
long-running code is not acceptable.

Regards,
Martin

From tutufan at gmail.com  Mon Dec 22 21:43:33 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 14:43:33 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494FFADF.7020609@v.loewis.de>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
	<494FFADF.7020609@v.loewis.de>
Message-ID: <3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com>

On Mon, Dec 22, 2008 at 2:38 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Or perhaps there's a smarter way to manage the list of
>> arena/free pool info.
>
> If that code is the real problem (in a reproducible test case),
> then this approach is the only acceptable solution. Disabling
> long-running code is not acceptable.

By "disabling", I meant disabling the optimization that's trying to
rearrange the arenas so that more memory can be returned to the OS.
This presumably wouldn't be any worse than things were in Python 2.4,
when memory was never returned to the OS.

(I'm working on a test case.)

From krstic at solarsail.hcs.harvard.edu  Mon Dec 22 21:54:35 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Mon, 22 Dec 2008 15:54:35 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
Message-ID: <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>

On Dec 22, 2008, at 1:13 PM, Mike Coleman wrote:
> On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> BTW: Rather than using a huge in-memory dict, I'd suggest to either
>> use an on-disk dictionary such as the ones found in mxBeeBase or
>> a database.
>
> I really want this to work in-memory.  I have 64G RAM, and I'm only
> trying to use 45G of it ("only" 45G :-), and I don't need the results
> to persist after the program finishes.

It's still not clear to me, from reading the whole thread, precisely  
what you're seeing. A self-contained test case, preferably with  
generated random data, would be great, and save everyone a lot of  
investigation time. In the meantime, can you 1) turn off all swap  
files and partitions, and 2) confirm positively that your CPU cycles  
are burning up in userland?

(In general, unless you know exactly why your workload needs swap, and  
have written your program to take swapping into account, having _any_  
swap on a machine with 64GB RAM is lunacy. The machine will grind to a  
complete standstill long before filling up gigabytes of swap.)
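
As a starting point for the self-contained test case asked for above, something along these lines might do. It is only a sketch: the sizes, the list/tuple mix of the values, and the function names are assumptions rather than a description of the real data, and it is written for Python 3.

```python
import random
import time


def build_dict(n_keys, seed=0):
    """Build a dict of generated random data, alternating tuple and list values."""
    rng = random.Random(seed)
    d = {}
    for i in range(n_keys):
        # Both lists and tuples, since list_dealloc and tuple_dealloc both
        # appeared in the backtraces reported earlier in the thread.
        items = [rng.random() for _ in range(rng.randint(1, 8))]
        d[i] = items if i % 2 else tuple(items)
    return d


def time_clear(n_keys):
    """Return the seconds spent in dict.clear() for a dict of n_keys entries."""
    d = build_dict(n_keys)
    start = time.perf_counter()
    d.clear()
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10000, 100000, 1000000):
        print(n, "%.3fs" % time_clear(n))
```

If the clear time grows much faster than the number of keys as n is scaled up, that would point at the deallocation path rather than at swapping.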

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From mal at egenix.com  Mon Dec 22 22:07:38 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 22 Dec 2008 22:07:38 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>
	<494D6EA9.2040201@v.loewis.de>	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
Message-ID: <4950019A.7030509@egenix.com>

On 2008-12-22 19:13, Mike Coleman wrote:
> On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> BTW: Rather than using a huge in-memory dict, I'd suggest to either
>> use an on-disk dictionary such as the ones found in mxBeeBase or
>> a database.
> 
> I really want this to work in-memory.  I have 64G RAM, and I'm only
> trying to use 45G of it ("only" 45G :-), and I don't need the results
> to persist after the program finishes.
> 
> Python should be able to do this.  I don't want to hear "Just use Perl
> instead" from my co-workers...  ;-)

What kinds of objects are you storing in your dictionary ? Python
instances, strings, integers ?

The time it takes to deallocate the objects in your dictionary
depends a lot on the types you are using.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 22 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-12-02: Released mxODBC.Connect 1.0.0      http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From martin at v.loewis.de  Mon Dec 22 22:11:56 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 22:11:56 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>	
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>	
	<494FFADF.7020609@v.loewis.de>
	<3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com>
Message-ID: <4950029C.5050107@v.loewis.de>

>> If that code is the real problem (in a reproducible test case),
>> then this approach is the only acceptable solution. Disabling
>> long-running code is not acceptable.
> 
> By "disabling", I meant disabling the optimization that's trying to
> rearrange the arenas so that more memory can be returned to the OS.

I meant the same thing - I'm opposed to giving up one feature or
optimization in favor of a different feature or optimization.

> This presumably wouldn't be any worse than things were in Python 2.4,
> when memory was never returned to the OS.

Going back to the state of Python 2.4 would not be acceptable.

Regards,
Martin

From chojrak11 at gmail.com  Mon Dec 22 22:21:21 2008
From: chojrak11 at gmail.com (chojrak11 at gmail.com)
Date: Mon, 22 Dec 2008 22:21:21 +0100
Subject: [Python-Dev] [capi-sig] Exceptions with additional instance
	variables
In-Reply-To: <ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>
References: <bfb4c52d0812220406x5877db66ud10d64b2407415d4@mail.gmail.com>
	<ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>
Message-ID: <bfb4c52d0812221321x1a0c0008p2b05c5aa32a3f141@mail.gmail.com>

2008/12/22 Guilherme Polo <ggpolo at gmail.com>:
> On Mon, Dec 22, 2008 at 10:06 AM,  <chojrak11 at gmail.com> wrote:
>
> #include "Python.h"
>
> static PyObject *MyErr;
>
> static PyMethodDef module_methods[] = {
>        {"raise_test1", (PyCFunction)raise_test1, METH_NOARGS, NULL},
>        {"raise_test2", (PyCFunction)raise_test2, METH_NOARGS, NULL},
>        {"raise_test3", (PyCFunction)raise_test3, METH_NOARGS, NULL},
>        {NULL},
> };
>
> PyMODINIT_FUNC
> initfancy_exc(void)
> {
>        PyObject *m;
>
>        m = Py_InitModule("fancy_exc", module_methods);
>        if (m == NULL)
>                return;
>
>        MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL);
>
>        Py_INCREF(MyErr);
>        if (PyModule_AddObject(m, "err", MyErr) < 0)
>                return;
> }
>
> static PyObject *
> raise_test1(PyObject *self)
> {
>        PyObject_SetAttrString(MyErr, "code", PyInt_FromLong(42));
>        PyObject_SetAttrString(MyErr, "category", PyString_FromString("nice one"));
>        PyErr_SetString(MyErr, "All is good, I hope");
>        return NULL;
> }
>
> static PyObject *
> raise_test2(PyObject *self)
> {
>
>        PyObject *t = PyTuple_New(3);
>        PyTuple_SetItem(t, 0, PyString_FromString("error message"));
>        PyTuple_SetItem(t, 1, PyInt_FromLong(10));
>        PyTuple_SetItem(t, 2, PyString_FromString("category name here"));
>        PyErr_SetObject(MyErr, t);
>        Py_DECREF(t);
>        return NULL;
> }
>
> In this second form you check for the args attribute of the exception.

static PyObject *
raise_test3(PyObject *self)
{
        PyObject *d = PyDict_New();
        PyDict_SetItemString(d, "category", PyInt_FromLong(111));
        PyDict_SetItemString(d, "message", PyString_FromString("error message"));
        PyErr_SetObject(MyErr, d);
        Py_DECREF(d);
        return NULL;
}

(Small changes in the above code to be able to call more variants of
raise_test methods simultaneously.)

Yes! I finally understood this (I think...) So to explain things for
people like me:

1) PyErr_NewException creates *the class* in the module, it's a simple
method of creating exception classes, but classes created that way are
limited in features (i.e. cannot be manipulated from the module in all
ways a 'full' type can). Third argument to PyErr_NewException can be
NULL, in which case API will create an empty dictionary. After
creating the class you need to add it to the module with
PyModule_AddObject. Side note: if you want to give the class a
docstring, call PyObject_SetAttrString on the class with the key
'__doc__'.

2) there's no instantiation anywhere:
    a. PyErr_SetString and PyErr_SetObject set the exception *class*
(exception type) and exception data -- see
http://docs.python.org/c-api/exceptions.html which notes that
exceptions are similar in concept to the global 'errno' variable, so
you just set what type of last error was and what error message (or
other data) you want to associate with it
    b. the "code" and "category" variables from raise_test1() in the
above example inserted with PyObject_SetAttrString() are *class*
variables, not instance variables:

try:
    fancy_exc.raise_test1()
except fancy_exc.err, e:
    print e.code, fancy_exc.err.code
print fancy_exc.err.code

it prints:
42 42
42

    c. the data is still present in the fancy_exc.err class after
exception handling is finished, which is ok for now but may be
problematic in case of multithreaded usage patterns (however I
probably don't understand how multithreading in Python works)

3) alternative to the above is to pass all required data to the
exception with PyErr_SetObject - you can prepare a dictionary or a
tuple earlier, which will be accessible with 'args' member:

try:
    fancy_exc.raise_test2()
except fancy_exc.err, e:
    print e.args[0]

If it's a dictionary, the syntax is a bit weird because e.args is always a tuple:

try:
    fancy_exc.raise_test3()
except fancy_exc.err, e:
    print e.args[0]['category']

The 'args' values are unavailable outside of 'except' clause, however
you can still use the 'e' variable which retains the values. So it's
an instance variable.
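
A rough Python-level sketch of points 2 and 3, for illustration only. It
uses Python 3 syntax (`except ... as e`) rather than the Python 2 syntax
above, and the class name `Err` is a made-up stand-in for fancy_exc.err,
not the real extension type:

```python
class Err(Exception):
    """Stand-in for fancy_exc.err (hypothetical, pure Python)."""


# Roughly what PyObject_SetAttrString(MyErr, "code", ...) does: the
# attribute is set on the class itself, so every instance sees it.
Err.code = 42


def raise_with_class_attr():
    try:
        raise Err("All is good, I hope")
    except Err as e:
        return e.code, Err.code


def raise_with_args():
    # Roughly like PyErr_SetObject(MyErr, t) with a 3-tuple: the data
    # travels in the instance's args.
    try:
        raise Err("error message", 10, "category name here")
    except Err as e:
        return e.args


def raise_with_dict():
    # A dict passed as the single argument ends up as args[0].
    try:
        raise Err({"category": 111, "message": "error message"})
    except Err as e:
        return e.args[0]["category"]


print(raise_with_class_attr())
print(Err.code)  # still set after the handler has finished
print(raise_with_args()[0])
print(raise_with_dict())
```

The class attribute outlives the handler, while the args belong to the
one exception instance that was raised.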

4) creating the exception class using a new type in C (PyTypeObject
structure) would give the most robust solution because every nuance of
the class can be manipulated, but it's not worth the trouble now. I
can switch to it transparently at a later time. Transparently means
that nothing will need to be updated in Python solutions written by
the module users.
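
To make the contrast in point 4 concrete, here is what per-instance data
looks like when the exception class is written out in full. This is a
Python 3 sketch, not the C PyTypeObject version, and `FancyError`, `code`
and `category` are names invented for the example:

```python
class FancyError(Exception):
    """Exception carrying per-instance data (illustrative names only)."""

    def __init__(self, message, code, category):
        # Keep the message in args so str(e) still works as usual.
        super().__init__(message)
        self.code = code
        self.category = category


def catch(code, category):
    try:
        raise FancyError("something failed", code, category)
    except FancyError as e:
        return e


first = catch(42, "nice one")
second = catch(7, "other")
# Each instance keeps its own values; nothing is stored on the class.
print(first.code, second.code, hasattr(FancyError, "code"))
```

Code that only reads e.args keeps working with such a class, which is
presumably what makes a later switch transparent to module users.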

5) most of the projects I've inspected with Google Code Search use the
PyErr_NewException approach.

6) there's the option of using Cython which simplifies creating
extensions and hides many unnecessary internals.

Many thanks Guilherme and Stefan for your help and for the patience.


Kind regards,
Chojrak

From krstic at solarsail.hcs.harvard.edu  Mon Dec 22 22:23:12 2008
From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=)
Date: Mon, 22 Dec 2008 16:23:12 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <4950019A.7030509@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>
	<494D6EA9.2040201@v.loewis.de>	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
	<4950019A.7030509@egenix.com>
Message-ID: <8E0B5EC0-229C-4B31-80F3-569FFC2F43D7@solarsail.hcs.harvard.edu>

On Dec 22, 2008, at 4:07 PM, M.-A. Lemburg wrote:
> What kinds of objects are you storing in your dictionary ? Python
> instances, strings, integers ?

Answered in a previous message:

On Dec 20, 2008, at 8:09 PM, Mike Coleman wrote:
> The dict keys were all uppercase alpha strings of length 7.  I don't
> have access at the moment, but maybe something like 10-100M of them
> (not sure how redundant the set is).  The values are all lists of
> pairs, where each pair is a (string, int).  The pair strings are of
> length around 30, and drawn from a "small" fixed set of around 60K
> strings ().  As mentioned previously, I think the ints are drawn
> pretty uniformly from something like range(10000).  The length of the
> lists depends on the redundancy of the key set, but I think there are
> around 100-200M pairs total, for the entire dict.
>
> (If you're curious about the application domain, see
> 'http://greylag.org'.)


--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From chojrak11 at gmail.com  Mon Dec 22 22:25:12 2008
From: chojrak11 at gmail.com (chojrak11 at gmail.com)
Date: Mon, 22 Dec 2008 22:25:12 +0100
Subject: [Python-Dev] [capi-sig] Exceptions with additional instance
	variables
In-Reply-To: <bfb4c52d0812221321x1a0c0008p2b05c5aa32a3f141@mail.gmail.com>
References: <bfb4c52d0812220406x5877db66ud10d64b2407415d4@mail.gmail.com>
	<ac2200130812220445v524d8288lf0df0d435433a87e@mail.gmail.com>
	<bfb4c52d0812221321x1a0c0008p2b05c5aa32a3f141@mail.gmail.com>
Message-ID: <bfb4c52d0812221325l61e8daa5n6e5128bbed248eb@mail.gmail.com>

Not this list, sorry....

From steve at pearwood.info  Mon Dec 22 22:45:42 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 23 Dec 2008 08:45:42 +1100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <494F862B.60701@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com>
Message-ID: <200812230845.42805.steve@pearwood.info>

On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
> On 2008-12-20 23:16, Martin v. Löwis wrote:
> >>> I will try next week to see if I can come up with a smaller,
> >>> submittable example.  Thanks.
> >>
> >> These long exit times are usually caused by the garbage collection
> >> of objects. This can be a very time consuming task.
> >
> > I doubt that. The long exit times are usually caused by a bad
> > malloc implementation.
>
> With "garbage collection" I meant the process of Py_DECREF'ing the
> objects in large containers or deeply nested structures, not the GC
> mechanism for breaking circular references in Python.
>
> This will usually also involve free() calls, so the malloc
> implementation affects this as well. However, I've seen such long
> exit times on Linux and Windows, which both have rather good
> malloc implementations.
>
> I don't think there's anything much we can do about it at the
> interpreter level. Deleting millions of objects takes time and that's
> not really surprising at all. It takes even longer if you have
> instances with .__del__() methods written in Python.


This behaviour appears to be specific to deleting dicts, not deleting 
random objects. I haven't yet confirmed that the problem still exists 
in trunk (I hope to have time tonight or tomorrow), but in my previous 
tests deleting millions of items stored in a list of tuples completed 
in a minute or two, while deleting the same items stored as key:item 
pairs in a dict took 30+ minutes. I say plus because I never had the 
patience to let it run to completion, it could have been hours for all 
I know.
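
A scaled-down sketch of this kind of comparison, written for Python 3.
The function names and the item count are made up for the example, and
at this size both timings will normally be tiny; it only shows how the
two cases can be timed side by side, not the pathological behaviour
itself:

```python
import time


def make_items(n):
    # Distinct string keys with small tuple values, roughly like the
    # data discussed in this thread.
    return [("%07d" % i, ("x" * 30, i % 10000)) for i in range(n)]


def time_list_delete(n):
    data = make_items(n)
    start = time.perf_counter()
    del data  # drops the last reference, freeing the list and its items
    return time.perf_counter() - start


def time_dict_delete(n):
    data = dict(make_items(n))
    start = time.perf_counter()
    del data  # same items, but stored as key:value pairs in a dict
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200000
    print("list: %.4f s" % time_list_delete(n))
    print("dict: %.4f s" % time_dict_delete(n))
```

Raising n until the machine runs out of RAM is the obvious way to look
for the point where the two curves separate.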

> Applications can choose other mechanisms for speeding up the
> exit process in various (less clean) ways, if they have a need for
> this.
>
> BTW: Rather than using a huge in-memory dict, I'd suggest to either
> use an on-disk dictionary such as the ones found in mxBeeBase or
> a database.

The original poster's application uses 45GB of data. In my earlier 
tests, I've experienced the problem with ~ 300 *megabytes* of data: 
hardly what I would call "huge".



-- 
Steven D'Aprano

From chambon.pascal at wanadoo.fr  Mon Dec 22 22:49:58 2008
From: chambon.pascal at wanadoo.fr (Pascal Chambon)
Date: Mon, 22 Dec 2008 22:49:58 +0100
Subject: [Python-Dev] Hello everyone + little question around
	Cpython/stackless
Message-ID: <49500B86.1070605@wanadoo.fr>


Hello snakemen and snakewomen

I'm Pascal Chambon, a French engineer just leaving my Telecom School, 
blatantly fond of Python, of its miscellaneous offspring and of 
everything around dynamic languages and high level programming concepts.


I'm currently studying all I can find on Stackless Python, PyPy and the 
concepts they've brought to Python, and so far I wonder: since 
Stackless Python claims to be 100% compatible with CPython's extensions, 
faster, and brings lots of fun stuff (tasklets, coroutines and no C 
stack), how come it hasn't been merged back to become the standard 
'fast' Python implementation? Have I missed some crucial point 
there? Isn't it a pity to maintain two separate branches if 
they actually complement each other very well?

Looking forward to your insights on this subject,
regards,
Pascal




From martin at v.loewis.de  Mon Dec 22 22:58:24 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 22:58:24 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
Message-ID: <49500D80.2090201@v.loewis.de>

> Investigating further, from one stop, I used gdb to follow the chain
> of pointers in the nextarena and prevarena directions.  There were
> 5449 and 112765 links, respectively.  maxarenas is 131072.

To reduce the time for keeping sorted lists of arenas, I was first
thinking of a binheap. I had formulated it all, and don't want to
waste that effort, so I attach it below in case my second idea (right
below) is flawed.

It then occurred to me that there are only 64 different values for
nfreepools, as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than
keeping the list sorted, I now propose to maintain 64 lists, accessible
through an array of doubly-linked lists indexed by nfreepools. Whenever
nfreepools changes, the arena_object is unlinked from its current list, and
linked
into the new list. This should reduce the overhead for keeping the lists
sorted down from O(n) to O(1), with a moderate overhead of 64 pointers
(512 Bytes in your case).

Allocation of a new pool would have to do a linear search in these
pointers (finding the arena with the least number of pools); this
could be sped up with a finger pointing to the last index where
a pool was found (-1, since that pool will have moved).
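
The bucket idea above can be sketched in Python as follows. This is only
an illustration under assumptions: the class and method names
(`ArenaBuckets`, `set_nfreepools`, `fullest_usable`) are invented here, a
dict stands in for each doubly-linked list, the "finger" is left out, and
arenas whose pools are all free are not treated specially:

```python
NBUCKETS = 64  # ARENA_SIZE / POOL_SIZE = 256 KiB / 4 KiB


class Arena:
    def __init__(self, name):
        self.name = name
        self.nfreepools = 0  # 0 means fully allocated, in no bucket


class ArenaBuckets:
    """Usable arenas grouped by nfreepools (1..64)."""

    def __init__(self):
        # buckets[k] holds the arenas with exactly k free pools. A dict
        # is used as an ordered set to get O(1) link and unlink,
        # standing in for a doubly-linked list.
        self.buckets = [dict() for _ in range(NBUCKETS + 1)]

    def set_nfreepools(self, arena, new_count):
        # Unlink from the old bucket and link into the new one: O(1).
        if arena.nfreepools > 0:
            del self.buckets[arena.nfreepools][arena]
        arena.nfreepools = new_count
        if new_count > 0:
            self.buckets[new_count][arena] = None

    def fullest_usable(self):
        # Linear probe over at most 64 bucket heads, looking for the
        # arena with the fewest free pools.
        for k in range(1, NBUCKETS + 1):
            if self.buckets[k]:
                return next(iter(self.buckets[k]))
        return None


arenas = ArenaBuckets()
a, b, c = Arena("a"), Arena("b"), Arena("c")
arenas.set_nfreepools(a, 5)
arenas.set_nfreepools(b, 2)
arenas.set_nfreepools(c, 40)
print(arenas.fullest_usable().name)
arenas.set_nfreepools(b, 0)  # b became completely allocated
print(arenas.fullest_usable().name)
```

The cost of a change in nfreepools no longer depends on the number of
arenas, only the probe in `fullest_usable` has a (small, fixed) bound.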

Regards,
Martin

a) usable_arenas becomes an arena_object**, pointing to an array of
   maxarenas+1 arena*. A second variable max_usable_arenas is added.
   arena_object loses the prevarena pointer, and gains a usable_index
   value of type size_t (which is 0 for unused or completely allocated
   arena_objects).
   usable_arenas should stay heap-sorted, with the arena_object with
   the smallest nfreepools at index 1.

b) sink and swim operations are added, which keep usable_index intact
   whenever arena_object pointers get swapped.

c) whenever a pool is allocated in an arena, nfreepools decreases,
   and swim is called for the arena. whenever a pool becomes free,
   sink is called.

d) when the last pool was allocated in an arena, it is removed from
   the heap. likewise, when all pools are freed in an arena, it is
   removed from the heap and returned to the system.

e) when the first pool gets freed in an arena, it is added to the
   heap.

On each pool allocation/deallocation, this should get the O(n)
complexity of keeping the arena list sorted down to O(log n).
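
A compact Python sketch of steps a) to e), under assumptions: the class
`ArenaHeap` and its method names are invented for illustration, a Python
list replaces the arena_object** array, and returning empty arenas to
the system is left to the caller:

```python
class Arena:
    def __init__(self, name, nfreepools):
        self.name = name
        self.nfreepools = nfreepools
        self.usable_index = 0  # 0 means "not in the heap"


class ArenaHeap:
    """1-based binary min-heap of arenas keyed on nfreepools."""

    def __init__(self):
        self.slots = [None]  # index 0 is unused

    def _key(self, i):
        return self.slots[i].nfreepools

    def _place(self, i, arena):
        # Every move goes through here so usable_index stays intact.
        self.slots[i] = arena
        arena.usable_index = i

    def swim(self, arena):
        # Move up while the parent has more free pools.
        i = arena.usable_index
        while i > 1 and self._key(i // 2) > arena.nfreepools:
            self._place(i, self.slots[i // 2])
            i //= 2
        self._place(i, arena)

    def sink(self, arena):
        # Move down while some child has fewer free pools.
        i = arena.usable_index
        last = len(self.slots) - 1
        while 2 * i <= last:
            child = 2 * i
            if child < last and self._key(child + 1) < self._key(child):
                child += 1
            if self._key(child) >= arena.nfreepools:
                break
            self._place(i, self.slots[child])
            i = child
        self._place(i, arena)

    def add(self, arena):
        self.slots.append(arena)
        arena.usable_index = len(self.slots) - 1
        self.swim(arena)

    def remove(self, arena):
        i = arena.usable_index
        tail = self.slots.pop()
        arena.usable_index = 0
        if tail is not arena:
            # Fill the hole with the last element, then restore order.
            self._place(i, tail)
            self.swim(tail)
            self.sink(tail)

    def top(self):
        return self.slots[1] if len(self.slots) > 1 else None


heap = ArenaHeap()
a, b, c = Arena("a", 5), Arena("b", 2), Arena("c", 40)
for arena in (a, b, c):
    heap.add(arena)
c.nfreepools = 1   # pools were allocated in c
heap.swim(c)
print(heap.top().name)
c.nfreepools = 50  # pools were freed in c
heap.sink(c)
print(heap.top().name)
```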

From skip at pobox.com  Mon Dec 22 23:02:06 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 16:02:06 -0600
Subject: [Python-Dev] If I check something in ...
Message-ID: <18768.3678.749094.475868@montanaro-dyndns-org.local>


I have this trivial little test case for test_file.py:

    +    def testReadWhenWriting(self):
    +        self.assertRaises(IOError, self.f.read)

I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x
trunk and the py3k branch.  What is the preferred way to do that?  Do I
really have to do the same task four times or can I check it in once (or
twice) secure in the belief that someone will come along and do a monster
merge?

Thx,

Skip


From musiccomposition at gmail.com  Mon Dec 22 23:06:41 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 22 Dec 2008 16:06:41 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <18768.3678.749094.475868@montanaro-dyndns-org.local>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local>
Message-ID: <1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com>

On Mon, Dec 22, 2008 at 4:02 PM,  <skip at pobox.com> wrote:
>
> I have this trivial little test case for test_file.py:
>
>    +    def testReadWhenWriting(self):
>    +        self.assertRaises(IOError, self.f.read)
>
> I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x
> trunk and the py3k branch.  What is the preferred way to do that?  Do I
> really have to do the same task four times or can I check it in once (or
> twice) secure in the belief that someone will come along and do a monster
> merge?

If you check it into the trunk, it will find its way into 2.6, 3.1, and 3.0.



-- 
Regards,
Benjamin Peterson

From martin at v.loewis.de  Mon Dec 22 23:23:58 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 23:23:58 +0100
Subject: [Python-Dev] Hello everyone + little question
	around	Cpython/stackless
In-Reply-To: <49500B86.1070605@wanadoo.fr>
References: <49500B86.1070605@wanadoo.fr>
Message-ID: <4950137E.8040506@v.loewis.de>

> I'm currently studying all I can find on Stackless Python, PyPy and the
> concepts they've brought to Python, and so far I wonder: since
> Stackless Python claims to be 100% compatible with CPython's extensions,
> faster, and brings lots of fun stuff (tasklets, coroutines and no C
> stack), how come it hasn't been merged back to become the standard
> 'fast' Python implementation?

There is a long history to it, and multiple reasons influenced that
status. In summary, some of the reasons were:
- Stackless Python was never officially proposed for inclusion into
  Python (it may be that parts of it were, and some of those parts
  actually did get added).
- Stackless Python originally was fairly unmaintainable; this prevented
  its inclusion.
- in its current form, it has limited portability, as it needs to
  be ported to each microprocessor and operating system separately.
  CPython has so far avoided using assembler code, and is fairly
  portable.

Regards,
Martin

From martin at v.loewis.de  Mon Dec 22 23:27:07 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 22 Dec 2008 23:27:07 +0100
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <18768.3678.749094.475868@montanaro-dyndns-org.local>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local>
Message-ID: <4950143B.50100@v.loewis.de>

> I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x
> trunk and the py3k branch.  What is the preferred way to do that?  Do I
> really have to do the same task four times or can I check it in once (or
> twice) secure in the belief that someone will come along and do a monster
> merge?

You shouldn't check it in four times. But (IMO) you also shouldn't wait
for somebody else to merge it (I know some people disagree with that
recommendation).

Instead, you should commit it into trunk, and then run svnmerge.py three
times, namely:

- in a release26-maint checkout, run

    svnmerge.py -r<yourrev>
    svn commit -F svnmerge-commit-something-press-tab

- in a py3k checkout, run

    svnmerge.py -r<yourrev>
    svn commit -F svnmerge-commit-something-press-tab

- in a release30-maint checkout, then run

   svnmerge.py -r<revfrom3k>
   svn revert .
   svn commit -F svnmerge-commit-something-press-tab

Regards,
Martin

From solipsis at pitrou.net  Mon Dec 22 23:35:30 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 22 Dec 2008 22:35:30 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
	<49500D80.2090201@v.loewis.de>
Message-ID: <loom.20081222T222256-318@post.gmane.org>

Martin v. Löwis <martin <at> v.loewis.de> writes:
> 
> It then occurred to me that there are only 64 different values for
> nfreepools, as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than
> keeping the list sorted, I now propose to maintain 64 lists, accessible
> through an array of doubly-linked lists indexed by nfreepools. Whenever
> nfreepools changes, the arena_object is unlinked from its current list, and
> linked
> into the new list. This should reduce the overhead for keeping the lists
> sorted down from O(n) to O(1), with a moderate overhead of 64 pointers
> (512 Bytes in your case).
> 
> Allocation of a new pool would have to do a linear search in these
> pointers (finding the arena with the least number of pools);

You mean the least number of free pools, right? IIUC, the heuristic is to favour
a small number of busy arenas rather than a lot of sparse ones.
And, by linear search in these pointers, do you mean just probe the 64 lists for
the first non-NULL list head? If so, then it's likely fast enough for a rather
infrequent operation.

Now, we should find a way to benchmark this without having to steal Mike's
machine and wait 30 minutes every time.

Regards

Antoine.



From musiccomposition at gmail.com  Mon Dec 22 23:39:08 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 22 Dec 2008 16:39:08 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <4950143B.50100@v.loewis.de>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local>
	<4950143B.50100@v.loewis.de>
Message-ID: <1afaf6160812221439p49931977jad433cf95369c071@mail.gmail.com>

On Mon, Dec 22, 2008 at 4:27 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> You shouldn't check it in four times. But (IMO) you also shouldn't wait
> for somebody else to merge it (I know some people disagree with that
> recommendation).

I don't completely disagree. Certainly, if you want to make sure your
change is merged correctly into every branch, then please do merge
it yourself. It's also nice if platform-specific merges (i.e. Windows
build files) are handled by the original committer. However, minor
changes to the documentation or code formatting and even simple bug
fixes are trivial to merge all at once between branches. In the end, I
suppose it doesn't really matter; everyone can do what they are
comfortable with.

-- 
Regards,
Benjamin

From martin at v.loewis.de  Mon Dec 22 23:55:40 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 22 Dec 2008 23:55:40 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <loom.20081222T222256-318@post.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>	<49500D80.2090201@v.loewis.de>
	<loom.20081222T222256-318@post.gmane.org>
Message-ID: <49501AEC.3010805@v.loewis.de>

>> Allocation of a new pool would have to do a linear search in these
>> pointers (finding the arena with the least number of pools);
> 
> You mean the least number of free pools, right?

Correct.

> IIUC, the heuristic is to favour
> a small number of busy arenas rather than a lot of sparse ones.

Correct. Or, more precisely, the hope is indeed to make most arenas
sparse, so that they eventually see all their pools freed.

> And, by linear search in these pointers, do you mean just probe the 64 lists for
> the first non-NULL list head?

Correct.

> If so, then it's likely fast enough for a rather infrequent operation.

I would hope so, yes. However, the same hope applied to the current
code (how much time can it take to sink an arena in a linear list?),
so if we have the prospect of using larger arenas some day, this might
change.

> Now, we should find a way to benchmark this without having to steal Mike's
> machine and wait 30 minutes every time.

I think this can be simulated by using just arena objects, with no
associated arenas, and just adjusting pool counters. Allocate 100,000
arena objects, and start out with them all being completely allocated.
Then randomly choose one arena to deallocate a pool from; from time to
time, also allocate a new pool. Unfortunately, this will require some
hacking of the code to take the measurements.

Alternatively, make the arena size 4k, and the pool size 32 bytes, and
then come up with a pattern to allocate and deallocate 8 byte blocks.
Not sure whether the code works for these parameters, though (but
it might be useful to fix it for non-standard sizes). This would require
only 400MiB of memory to run the test.

I think obmalloc is fairly independent from the rest of Python,
so it should be possible to link it with a separate main() function,
and nothing else of Python.
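
The first variant (arena objects with only pool counters) can be
approximated in pure Python to get a feel for the numbers. This is a
sketch under assumptions, not a port of obmalloc: the function name
`simulate` and all its parameters are invented, arenas are just integer
ids, completely free arenas are never returned to the system, and only
the list steps needed to keep the order are counted:

```python
import random


def simulate(num_arenas=1000, operations=5000, pools_per_arena=64,
             alloc_ratio=0.1, seed=1):
    """Count list steps needed to keep usable arenas sorted by nfree."""
    rng = random.Random(seed)
    nfree = [0] * num_arenas  # every arena starts completely allocated
    usable = []               # arena ids, ascending by nfree
    steps = 0
    for _ in range(operations):
        if usable and rng.random() < alloc_ratio:
            # Allocate a pool from the arena with the fewest free pools;
            # it stays at the front or drops out when it becomes full.
            first = usable[0]
            nfree[first] -= 1
            if nfree[first] == 0:
                usable.pop(0)
            continue
        arena = rng.randrange(num_arenas)
        if nfree[arena] == pools_per_arena:
            continue          # nothing left to free in this arena
        if nfree[arena] == 0:
            usable.insert(0, arena)  # becomes usable with one free pool
            pos = 0
        else:
            # A real arena_object is linked in place; the lookup here is
            # a shortcut of the sketch and is not counted.
            pos = usable.index(arena)
        nfree[arena] += 1
        # Move right past every arena that now has fewer free pools.
        while (pos + 1 < len(usable)
               and nfree[usable[pos + 1]] < nfree[arena]):
            usable[pos], usable[pos + 1] = usable[pos + 1], usable[pos]
            pos += 1
            steps += 1
    return steps


for n in (100, 200, 400):
    print(n, simulate(num_arenas=n))
```

Comparing the step counts for growing numbers of arenas gives a cheap
indication of how the linear list behaves, without needing a large
machine.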

Regards,
Martin

From tutufan at gmail.com  Tue Dec 23 00:28:48 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 17:28:48 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
	<13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>
Message-ID: <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>

On Mon, Dec 22, 2008 at 2:54 PM, Ivan Krstić
<krstic at solarsail.hcs.harvard.edu> wrote:
> It's still not clear to me, from reading the whole thread, precisely what
> you're seeing. A self-contained test case, preferably with generated random
> data, would be great, and save everyone a lot of investigation time.

I'm still working on a test case.  The first couple of attempts, using
a half-hearted attempt to model the application behavior wrt this dict
didn't demonstrate bad behavior.

My impression is that no one's burning much time on this but me at the
moment, aside from offering helpful advice.  If you are, you might
want to wait.  I noticed just now that the original hardware was
throwing some chipkills, so I'm retesting on something else.


> In the
> meantime, can you 1) turn off all swap files and partitions, and 2) confirm
> positively that your CPU cycles are burning up in userland?

For (1), I don't have that much control over the machine.  Plus, based
on watching with top, I seriously doubt the process is using swap in
any way.  For (2), yes, 100% CPU usage.

> (In general, unless you know exactly why your workload needs swap, and have
> written your program to take swapping into account, having _any_ swap on a
> machine with 64GB RAM is lunacy. The machine will grind to a complete
> standstill long before filling up gigabytes of swap.)

The swap is not there to support my application per se.  Clearly if
you're swapping, generally you're crawling.  This host is used by a
reasonably large set of non- and novice programmers, who sometimes
vacuum up VM without realizing it.  If you have a nice, big swap
space, you can 'kill -STOP' these offenders, and allow them to swap
out while you have a leisurely discussion with the owner and possibly
'kill -CONT' later, as opposed to having to do a quick 'kill -KILL' to
save the machine.  That's my thinking, anyway.

Mike

From tutufan at gmail.com  Tue Dec 23 01:19:10 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 18:19:10 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <aac2c7cb0812221222y408a139diebcd04795eabb13c@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
	<aac2c7cb0812221222y408a139diebcd04795eabb13c@mail.gmail.com>
Message-ID: <3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>

On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen <rhamph at gmail.com> wrote:
> To make sure that's the correct line please recompile python without
> optimizations.  GCC happily reorders and merges different parts of a
> function.
>
> Adding a counter in C and recompiling would be a lot faster than using
> a gdb hook.

Okay, I did this.  The results are the same, except that now sampling
selects the different source statements within this loop, instead of
just the top of the loop (which makes sense).

I added a counter (static volatile long) as suggested, and a
breakpoint to sample it.  Not every pass through PyObject_Free takes
case 3, but for those that do, this loop runs around 100-25000 times.
I didn't try to graph it, but based on a quick sample, it looks like
more than 5000 iterations on most occasions.

The total counter is 12.4 billion at the moment, and still growing.
That seems high, but I'm not sure what would be expected or hoped for.

I have a script that demonstrates the problem, but unfortunately the
behavior isn't clearly bad until large amounts of memory are used.  I
don't think it shows at 2G, for example.  (A 32G machine is
sufficient.)  Here is a log of running the program at different sizes
($1):

1 4.04686999321 0.696660041809
2 8.1575551033 1.46393489838
3 12.6426320076 2.30558800697
4 16.471298933 3.80377006531
5 20.1461620331 4.96685886383
6 25.150053978 5.48230814934
7 28.9099609852 7.41244196892
8 32.283219099 6.31711483002
9 36.6974511147 7.40236377716
10 40.3126089573 9.01174497604
20 81.7559120655 20.3317198753
30 123.67071104 31.4815018177
40 161.935647011 61.4484620094
50 210.610441923 88.6161060333
60 248.89805007 118.821491003
70 288.944771051 194.166989088
80 329.93295002 262.14109993
90 396.209988832 454.317914009
100 435.610564947 564.191882133

If you plot this, it is clearly quadratic (or worse).
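
As a quick check that does not need a plot, the local growth exponent
can be estimated from pairs of rows in the log. This is a Python 3
sketch; the helper name `growth_exponent` is made up, and the columns
are taken to be size argument, fill time and clear time, as printed by
the script:

```python
import math

# (size argument, fill seconds, clear seconds) copied from the log above
runs = [
    (10, 40.3126089573, 9.01174497604),
    (50, 210.610441923, 88.6161060333),
    (100, 435.610564947, 564.191882133),
]


def growth_exponent(x1, t1, x2, t2):
    # If t is proportional to x**k, then k = log(t2/t1) / log(x2/x1).
    return math.log(t2 / t1) / math.log(x2 / x1)


fill_k = growth_exponent(runs[0][0], runs[0][1], runs[2][0], runs[2][1])
clear_k = growth_exponent(runs[1][0], runs[1][2], runs[2][0], runs[2][2])
print("fill exponent 10->100:  %.2f" % fill_k)
print("clear exponent 50->100: %.2f" % clear_k)
```

The fill time comes out close to linear (exponent near 1), while the
clear time between sizes 50 and 100 grows with an exponent above 2.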

Here is the script:

#!/usr/bin/env python


"""
Try to trigger quadratic (?) behavior during .clear() of a large but simple
defaultdict.

"""


from collections import defaultdict
import time
import sys

import gc; gc.disable()


print >> sys.stderr, sys.version

h = defaultdict(list)

n = 0

lasttime = time.time()


megs = int(sys.argv[1])

print megs,
sys.stdout.flush()

# 100M iterations -> ~24GB? on my 64-bit host

for i in xrange(megs * 1024 * 1024):
    s = '%0.7d' % i
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
#   if (i % 1000000) == 0:
#       t = time.time()
#       print >> sys.stderr, t-lasttime
#       lasttime = t

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

h.clear()

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

print

From krstic at solarsail.hcs.harvard.edu  Tue Dec 23 01:32:25 2008
From: krstic at solarsail.hcs.harvard.edu (=?ISO-8859-2?Q?Ivan_Krsti=E6?=)
Date: Mon, 22 Dec 2008 19:32:25 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
	<13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>
	<3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
Message-ID: <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>

On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
> For (2), yes, 100% CPU usage.

100% _user_ CPU usage? (I'm trying to make sure we're not chasing some  
particular degeneration of kmalloc/vmalloc and friends.)

--
Ivan Krstić <krstic at solarsail.hcs.harvard.edu> | http://radian.org


From solipsis at pitrou.net  Tue Dec 23 01:34:53 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Dec 2008 00:34:53 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>	<49500D80.2090201@v.loewis.de>
	<loom.20081222T222256-318@post.gmane.org>
	<49501AEC.3010805@v.loewis.de>
Message-ID: <loom.20081223T002448-972@post.gmane.org>


> Now, we should find a way to benchmark this without having to steal Mike's
> machine and wait 30 minutes every time.

So, I seem to reproduce it. The following script takes about 15 seconds to
run and allocates a 2 GB dict which it deletes at the end (gc disabled of
course).
With 2.4, deleting the dict takes ~1.2 seconds while with 2.5 and higher
(including 3.0), deleting the dict takes ~3.5 seconds. Nothing spectacular
but the difference is clear.

Also, after the dict is deleted and before the program exits, you can witness
(with `ps` or `top`) that 2.5 and higher have reclaimed 1GB, while 2.4 has
reclaimed nothing. There is a sleep() call at the end so that you have the
time :-)

You can tune memory occupation at the beginning of the script, but the lower
it is, the more difficult it will be to witness a difference.

Regards

Antoine.


#######


import random
import time
import gc
import itertools


# Adjust this parameter according to your system RAM!
target_size = int(2.0  * 1024**3)  # 2.0 GB

pool_size = 4 * 1024
# This is a ballpark estimate: 60 bytes overhead for each
# { dict entry struct + float object + tuple object header },
# 1.3 overallocation factor for the dict.
target_length = int(target_size / (1.3 * (pool_size + 60)))


def make_dict():
    print ("filling dict up to %d entries..." % target_length)

    # 1. Initialize the dict from a set of pre-computed random keys.
    keys = [random.random() for i in range(target_length)]
    d = dict.fromkeys(keys)

    # 2. Build the values that will constitute the dict. Each value will, as
    #    far as possible, span a contiguous `pool_size` memory area.

    # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc()
    # We avoid that by allocating tuples of smaller longs.
    int_size = 200
    # 24 roughly accounts for the long object overhead (YMMV)
    int_start = 1 << ((int_size - 24) * 8 - 7)
    int_range = range(1, 1 + pool_size // int_size)

    values = [None] * target_length
    # Maximize allocation locality by pre-allocating the values
    for n in range(target_length):
        values[n] = tuple(int_start + j for j in int_range)
        if n % 10000 == 0:
            print ("  %d iterations" % n)

    # The keys are iterated over in their original order rather than in
    # dict order, so as to randomly spread the values in the internal dict
    # table wrt. allocation address.
    for n, k in enumerate(keys):
        d[k] = values[n]

    print ("dict filled!")
    return d

if __name__ == "__main__":
    gc.disable()
    t1 = time.time()
    d = make_dict()
    t2 = time.time()
    print (" -> %.3f s." % (t2 - t1))
    print ("deleting dict...")
    t2 = time.time()
    del d
    t3 = time.time()
    print (" -> %.3f s." % (t3 - t2))
    print ("Finished, you can press Ctrl+C.")
    time.sleep(10.0)




From skip at pobox.com  Tue Dec 23 01:41:41 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 18:41:41 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local>
	<1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com>
Message-ID: <18768.13253.753399.192276@montanaro-dyndns-org.local>


    Benjamin> If you check it into the trunk, it will find its way into
    Benjamin> 2.6, 3.1, and 3.0.

Outstanding!

Thx,

Skip

From ncoghlan at gmail.com  Tue Dec 23 02:24:44 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 11:24:44 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <200812230845.42805.steve@pearwood.info>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<200812230845.42805.steve@pearwood.info>
Message-ID: <49503DDC.7080107@gmail.com>

Steven D'Aprano wrote:
> This behaviour appears to be specific to deleting dicts, not deleting 
> random objects. I haven't yet confirmed that the problem still exists 
> in trunk (I hope to have time tonight or tomorrow), but in my previous 
> tests deleting millions of items stored in a list of tuples completed 
> in a minute or two, while deleting the same items stored as key:item 
> pairs in a dict took 30+ minutes. I say plus because I never had the 
> patience to let it run to completion, it could have been hours for all 
> I know.

There's actually an interesting comment in list_dealloc:

	/* Do it backwards, for Christian Tismer.
	   There's a simple test case where somehow this reduces
	   thrashing when a *very* large list is created and
	   immediately deleted. */

The "backwards" the comment is referring to is the fact that it invokes
DECREF on the last item in the list first and counts back down to the
first item, instead of starting at the first item and incrementing the
index each time around the loop.

The revision number on that (13452) indicates that it predates the
implementation of PyObject_Malloc and friends, so it was probably
avoiding pathological behaviour in platform malloc() implementations by
free'ing memory in the reverse order to which it was allocated (assuming
the list was built initially starting with the first item).

However, I'm now wondering if it also has the side effect of avoiding
the quadratic behaviour Mike has found inside the more recent code to
release arenas back to the OS.

I'm working on a simple benchmark that looks for non-linear scaling of
the deallocation times - I'll include a case of deallocation of a
reversed list along with a normal list and a dictionary.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From tutufan at gmail.com  Tue Dec 23 03:05:06 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 20:05:06 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
	<13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>
	<3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
	<002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>
Message-ID: <3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>

2008/12/22 Ivan Krstić <krstic at solarsail.hcs.harvard.edu>:
> On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
>>
>> For (2), yes, 100% CPU usage.
>
> 100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
> particular degeneration of kmalloc/vmalloc and friends.)

Yes, user.  No noticeable sys or wait CPU going on.

From tutufan at gmail.com  Tue Dec 23 03:32:07 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 20:32:07 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com>
	<13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu>
	<3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
	<002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>
	<3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>
Message-ID: <3c6c07c20812221832q79295e4au7e7ba9471749e743@mail.gmail.com>

I unfortunately don't have time to work out how obmalloc works myself,
but I wonder if any of the constants in that file might need to scale
somehow with memory size.  That is, is it possible that some of them
that work okay with 1G RAM won't work well with (say) 128G or 1024G
(coming soon enough)?

From ncoghlan at gmail.com  Tue Dec 23 04:25:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 13:25:47 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>	<aac2c7cb0812221222y408a139diebcd04795eabb13c@mail.gmail.com>
	<3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>
Message-ID: <49505A3B.2000101@gmail.com>

Mike Coleman wrote:
> If you plot this, it is clearly quadratic (or worse).

Here's another comparison script that tries to probe the vagaries of the
obmalloc implementation. It looks at the proportional increases in
deallocation times for lists and dicts as the number of contained items
increases when using a variety of deallocation orders:
- in hash order (dict)
- in reverse order of allocation (list)
- in order of allocation (list, reversed in place)
- in random order (list, shuffled in place using the random module)

I've included the final output from a run on my own machine below [1],
but here are the main points I get out of it:
- at the sizes I can test (up to 20 million items in the containers),
this version of the script doesn't show any particularly horrible
non-linearity with deallocation of dicts, lists or reversed lists.
- when the items in a list are deallocated in *random* order, however,
the deallocation times are highly non-linear - by the time we get to 20
million items, deallocating in random order takes nearly twice as long
as deallocation in either order of allocation or in reverse order.
- after the list of items had been deallocated in random order,
subsequent deallocation of a dict and the list took significantly longer
than when those operations took place on a comparatively "clean"
obmalloc state.

I'm going to try making a new version of the script that uses random
integers with a consistent number of digits in place of the monotonically
increasing values that are currently used and see what effect that has
on the dict scaling (that's where I expect to see the greatest effect,
since the hash ordering is the one which will be most affected by the
change to the item contents).

Cheers,
Nick.

[1] Full final results from local test run:

Dict: (Baseline=0.003135 seconds)
  100000=100.0%
  1000000=1020.9%
  2000000=2030.5%
  5000000=5026.7%
  10000000=10039.7%
  20000000=20086.4%
List: (Baseline=0.005764 seconds)
  100000=100.0%
  1000000=1043.7%
  2000000=2090.1%
  5000000=5227.2%
  10000000=10458.1%
  20000000=20942.7%
ReversedList: (Baseline=0.005879 seconds)
  100000=100.0%
  1000000=1015.0%
  2000000=2023.5%
  5000000=5057.1%
  10000000=10114.0%
  20000000=20592.6%
ShuffledList: (Baseline=0.028241 seconds)
  100000=100.0%
  1000000=1296.0%
  2000000=2877.3%
  5000000=7960.1%
  10000000=17216.9%
  20000000=37599.9%
PostShuffleDict: (Baseline=0.016229 seconds)
  100000=100.0%
  1000000=1007.9%
  2000000=2018.4%
  5000000=5075.3%
  10000000=10217.5%
  20000000=20873.1%
PostShuffleList: (Baseline=0.020551 seconds)
  100000=100.0%
  1000000=1021.9%
  2000000=1978.2%
  5000000=4953.6%
  10000000=10262.3%
  20000000=19854.0%

Baseline changes for Dict and List after deallocation of list in random
order:
  Dict: 517.7%
  List: 356.5%

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dealloc_timing.py
Type: text/x-python
Size: 2003 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20081223/4abc8b9b/attachment-0001.py>
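
The attachment itself is not preserved in this archive. As a rough
stand-in, here is a minimal sketch of a benchmark in the same spirit; it
is not the original dealloc_timing.py. The function names (build_dict,
time_dealloc, scaling and so on), the item shape and the sizes are all
assumptions made for illustration. It times only the final `del` of each
container, with the cyclic gc disabled, and reports each time as a
percentage of the time measured for the smallest size.

```python
import gc
import random
import time


def build_list(n):
    # Each entry is a freshly allocated tuple holding a fresh string.
    return [(i, str(i)) for i in range(n)]


def build_dict(n):
    # Same items as build_list, stored as key:item pairs.
    return dict((i, (i, str(i))) for i in range(n))


def build_reversed_list(n):
    # Reversed in place, so items are released in order of allocation
    # if the list is torn down from its last element to its first.
    items = build_list(n)
    items.reverse()
    return items


def build_shuffled_list(n):
    # Shuffled in place, so items are released in random order.
    items = build_list(n)
    random.shuffle(items)
    return items


def time_dealloc(build, n):
    container = build(n)
    start = time.perf_counter()
    del container  # last reference: container and items are freed here
    return time.perf_counter() - start


def scaling(build, sizes):
    """Return [(size, percent of the first size's deallocation time)]."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        times = [time_dealloc(build, n) for n in sizes]
    finally:
        if was_enabled:
            gc.enable()
    baseline = times[0] or 1e-9  # guard against a zero timer reading
    return [(n, 100.0 * t / baseline) for n, t in zip(sizes, times)]


if __name__ == "__main__":
    sizes = [50000, 100000, 200000]  # assumed sizes; scale up as RAM allows
    builders = [build_dict, build_list, build_reversed_list,
                build_shuffled_list]
    for build in builders:
        print("%s:" % build.__name__)
        for n, percent in scaling(build, sizes):
            print("  %d=%.1f%%" % (n, percent))
```

Linear scaling shows up as percentages that track the size ratios (here
roughly 100%, 200%, 400%). With monotonically increasing integer keys
the dict case is not a good probe of hash-order deallocation, which is
the limitation noted above for the current version of the script.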

From alexandre at peadrop.com  Tue Dec 23 04:26:29 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 22 Dec 2008 22:26:29 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <loom.20081223T002448-972@post.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
	<49500D80.2090201@v.loewis.de>
	<loom.20081222T222256-318@post.gmane.org>
	<49501AEC.3010805@v.loewis.de>
	<loom.20081223T002448-972@post.gmane.org>
Message-ID: <acd65fa20812221926j6ab5019djb24746a9b295b382@mail.gmail.com>

On Mon, Dec 22, 2008 at 7:34 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> Now, we should find a way to benchmark this without having to steal Mike's
>> machine and wait 30 minutes every time.
>
> So, I seem to reproduce it. The following script takes about 15 seconds to
> run and allocates a 2 GB dict which it deletes at the end (gc disabled of
> course).
> With 2.4, deleting the dict takes ~1.2 seconds while with 2.5 and higher
> (including 3.0), deleting the dict takes ~3.5 seconds. Nothing spectacular
> but the difference is clear.
>

I modified your script to delete the dictionary without actually
deallocating the items in it. You can speed up a dictionary
deallocation significantly if you keep a reference to its items and
delete the dictionary before deleting its items. In Python 2.4, the
same behavior exists, but is not as strongly marked as in Python 2.6
with pymalloc enabled.

I can understand that deallocating the items in the order (or
actually, the reverse order) they were allocated is faster than doing
so in a rather haphazard manner (i.e., like dict). However, I am not
sure why pymalloc accentuates this behavior.

-- Alexandre

Python 2.6 with pymalloc, without pydebug

alex at helios:~$ python2.6 dict_dealloc_test.py
creating 397476 items...
 -> 6.613 s.
building dict...
 -> 0.230 s.
deleting items...
 -> 0.059 s.
deleting dict...
 -> 2.299 s.
total deallocation time: 2.358 seconds.

alex at helios:~$ python2.6 dict_dealloc_test.py
creating 397476 items...
 -> 6.530 s.
building dict...
 -> 0.228 s.
deleting dict...
 -> 0.089 s.
deleting items...
 -> 0.971 s.
total deallocation time: 1.060 seconds.


Python 2.6 without pymalloc, without pydebug

alex at helios:release26-maint$ ./python /home/alex/dict_dealloc_test.py
creating 397476 items...
 -> 5.921 s.
building dict...
 -> 0.244 s.
deleting items...
 -> 0.073 s.
deleting dict...
 -> 1.502 s.
total deallocation time: 1.586 seconds.

alex at helios:release26-maint$ ./python /home/alex/dict_dealloc_test.py
creating 397476 items...
 -> 6.122 s.
building dict...
 -> 0.237 s.
deleting dict...
 -> 0.092 s.
deleting items...
 -> 1.238 s.
total deallocation time: 1.330 seconds.


alex at helios:~$ python2.4 dict_dealloc_test.py
creating 397476 items...
 -> 6.164 s.
building dict...
 -> 0.218 s.
deleting items...
 -> 0.057 s.
deleting dict...
 -> 1.185 s.
total deallocation time: 1.243 seconds.

alex at helios:~$ python2.4 dict_dealloc_test.py
creating 397476 items...
 -> 6.202 s.
building dict...
 -> 0.218 s.
deleting dict...
 -> 0.090 s.
deleting items...
 -> 0.852 s.
total deallocation time: 0.943 seconds.



######

import random
import time
import gc


# Adjust this parameter according to your system RAM!
target_size = int(2.0  * 1024**3)  # 2.0 GB

pool_size = 4 * 1024
# This is a ballpark estimate: 60 bytes overhead for each
# { dict entry struct + float object + tuple object header },
# 1.3 overallocation factor for the dict.
target_length = int(target_size / (1.3 * (pool_size + 60)))

def make_items():
    print ("creating %d items..." % target_length)

    # 1. Initialize a set of pre-computed random keys.
    keys = [random.random() for i in range(target_length)]

    # 2. Build the values that will constitute the dict. Each value will, as
    #    far as possible, span a contiguous `pool_size` memory area.

    # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc()
    # We avoid that by allocating tuples of smaller longs.
    int_size = 200
    # 24 roughly accounts for the long object overhead (YMMV)
    int_start = 1 << ((int_size - 24) * 8 - 7)
    int_range = range(1, 1 + pool_size // int_size)

    values = [None] * target_length
    # Maximize allocation locality by pre-allocating the values
    for n in range(target_length):
        values[n] = tuple(int_start + j for j in int_range)
    return list(zip(keys,values))

if __name__ == "__main__":
    gc.disable()
    t1 = time.time()
    items = make_items()
    t2 = time.time()
    # Single-argument print with parentheses runs on both 2.x and 3.x.
    print (" -> %.3f s." % (t2 - t1))

    print ("building dict...")
    t1 = time.time()
    testdict = dict(items)
    t2 = time.time()
    print (" -> %.3f s." % (t2 - t1))

    def delete_testdict():
        global testdict
        print ("deleting dict...")
        t1 = time.time()
        del testdict
        t2 = time.time()
        print (" -> %.3f s." % (t2 - t1))

    def delete_items():
        global items
        print ("deleting items...")
        t1 = time.time()
        del items
        t2 = time.time()
        print (" -> %.3f s." % (t2 - t1))

    t1 = time.time()
    # Swap these, and look at the total time
    delete_items()
    delete_testdict()
    t2 = time.time()
    print ("total deallocation time: %.3f seconds." % (t2 - t1))

From skip at pobox.com  Tue Dec 23 04:56:18 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 21:56:18 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <4950143B.50100@v.loewis.de>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local>
	<4950143B.50100@v.loewis.de>
Message-ID: <18768.24930.356203.736710@montanaro-dyndns-org.local>

    Martin> Instead, you should commit it into trunk, and then run svnmerge.py three
    Martin> times, namely:
    ...

Thanks for that cheat sheet.  I never would have figured that out on my
own.  Well, at least not in a timely fashion.

Skip

From scott+python-dev at scottdial.com  Mon Dec 22 15:47:01 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Mon, 22 Dec 2008 09:47:01 -0500
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <494F6692.8000001@v.loewis.de>
References: <494F6692.8000001@v.loewis.de>
Message-ID: <494FA865.2050009@scottdial.com>

Martin v. Löwis wrote:
> It seems r67740 shouldn't have been committed. Since this
> is a severe regression, I think I'll have to revert it, and
> release 2.5.4 with just that change.

My understanding of the problem is that clearerr() needs to be called
before any FILE read operations on *some* platforms. The only platform I
saw mentioned was OS X. Towards that end, I have attached a much simpler
patch onto the tracker issue, which maybe somebody can verify solves the
problem because I do not have access to a platform which fails the test
that was originally given.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From lance.ellinghaus at eds.com  Tue Dec 23 07:28:19 2008
From: lance.ellinghaus at eds.com (Ellinghaus, Lance)
Date: Tue, 23 Dec 2008 01:28:19 -0500
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
Message-ID: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>

I am hoping someone can assist me. I normally don't care if the _ctypes
module builds or not, but I now need to have it build.
I am running Solaris 10 with Sun's C compiler under SunStudio 11.

After running 'configure' and 'make', the _ctypes module fails with the
following error:

cc -xcode=pic32 -DNDEBUG -O -I. -I/data/python/Python-2.6.1/./Include
-Ibuild/temp.solaris-2.10-sun4u-2.6/libffi/include
-Ibuild/temp.solaris-2.10-sun4u-2.6/libffi
-I/data/python/Python-2.6.1/Modules/_ctypes/libffi/src
-I/usr/local/python/include -I. -IInclude -I./Include
-I/usr/local/include -I/data/python/Python-2.6.1/Include
-I/data/python/Python-2.6.1 -c
/data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c -o
build/temp.solaris-2.10-sun4u-2.6/data/python/Python-2.6.1/Modules/_ctyp
es/_ctypes.o
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257:
syntax error before or at: __attribute__
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257:
warning: old-style declaration or incorrect type for: __attribute__
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257:
warning: syntax error:  empty declaration
"/data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c", line 187: cannot
recover from previous errors
cc: acomp failed for /data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c

Is there anything special I have to do to get it to compile under
Solaris 10 and SunStudio 11?
BTW: I cannot use GCC.

Thank you very much,

Lance


From martin at v.loewis.de  Tue Dec 23 10:37:57 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Dec 2008 10:37:57 +0100
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <494FA865.2050009@scottdial.com>
References: <494F6692.8000001@v.loewis.de> <494FA865.2050009@scottdial.com>
Message-ID: <4950B175.1020704@v.loewis.de>

> My understanding of the problem is that clearerr() needs to be called
> before any FILE read operations on *some* platforms. The only platform I
> saw mentioned was OS X. Towards that end, I have attached a much simpler
> patch onto the tracker issue, which maybe somebody can verify solves the
> problem because I do not have access to a platform which fails the test
> that was originally given.

Thanks. I won't then reject the patch outright, only revert it from 2.5.
I can't give this a second try, as 2.5.3 was already supposed to be the
last release - I don't want to find myself reverting your patch two
weeks from now.

Is the approach that a clearerr call is added for each read
operation?

Regards,
Martin

From martin at v.loewis.de  Tue Dec 23 10:44:33 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Dec 2008 10:44:33 +0100
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
In-Reply-To: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
Message-ID: <4950B301.4020702@v.loewis.de>

> I am hoping someone can assist me. I normally don't care if the _ctypes
> module builds or not, but I now need to have it build.
> 
> I am running Solaris 10 with Sun's C compiler under SunStudio 11.

I don't think ctypes (rather, libffi) supports Sun C. You will need to
port it (as you have already ruled out the other options, such as using
gcc, or not using ctypes).

Regards,
Martin

From ncoghlan at gmail.com  Tue Dec 23 11:43:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 20:43:55 +1000
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
In-Reply-To: <4950B301.4020702@v.loewis.de>
References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
	<4950B301.4020702@v.loewis.de>
Message-ID: <4950C0EB.9030901@gmail.com>

Martin v. Löwis wrote:
>> I am hoping someone can assist me. I normally don't care if the _ctypes
>> module builds or not, but I now need to have it build.
>>
>> I am running Solaris 10 with Sun's C compiler under SunStudio 11.
> 
> I don't think ctypes (rather, libffi) supports Sun C. You will need to
> port it (as you have already ruled out the other options, such as using
> gcc, or not using ctypes).

There is also an existing issue relating to this:

http://bugs.python.org/issue2552

(although it doesn't add much beyond what Martin already said)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From rocky at gnu.org  Tue Dec 23 12:55:40 2008
From: rocky at gnu.org (Rocky Bernstein)
Date: Tue, 23 Dec 2008 06:55:40 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
Message-ID: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>

Now that there is a package mechanism (are package mechanisms?) like
zipimporter that bundles source code into a single file, should the
notion of a "file" location be adjusted to include the package
and/or importer?

Is there a standard API or routine which can extract this information
given a code object?

A use case I am thinking of is a stack trace, a debugger, or a tool
which wants to show detailed information about a code object,
possibly via a frame. For example, does the code come from a zipped
egg? And if so, which one?

For concreteness, here is what I did and what I saw.  Select one of
the zipimporter eggs at http://code.google.com/p/pytracer and install
it.

I did this on GNU/Linux with Python 2.5 and looked at the co_filename
of one of the methods:

>>> import tracer
>>> tracer.__dict__['size'].func_code.co_filename
'build/bdist.linux-i686/egg/tracer.py'

But there is no file called "build/bdist.linux-i686/egg/tracer.py" in
the filesystem. Instead there is a member "tracer.py" inside
/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.

It's possible I caused this egg to get built incorrectly or that
setuptools has a bug which entered that misleading information.
However, shouldn't there be a standard way to untangle package
location, loader and member inside the package?

As best as I can tell, PEP 302 discusses importer hooks and
suggests a standard way to get file data. But it doesn't address a
standard way to get container package and/or loader information.

Also I'm not sure there *is* a standard printable string form for
showing a member inside a package. zipimporter may insert co_filename
strings like:

  /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py

but the trouble with this is that it means file routines have to scan
the path and notice say that
/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*,
not a directory. And a file stat/reading routine needs to understand
what kind of packager that is in order to get tracer.py information.

(Are there any file routines in place for doing this?)
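
One way to handle such a combined path can be sketched as follows. This
is only an illustration, not an existing standard routine:
split_archive_path and read_member are hypothetical helper names, and
the sketch assumes the container is a zip file (as zipped eggs are). It
walks up the path until it reaches something that exists on disk, then
checks whether that thing is a zip archive.

```python
import os
import zipfile


def split_archive_path(path):
    """Split path into (archive, member) if a leading part is a zip file.

    Returns (None, path) when no prefix of the path is a zip archive.
    Hypothetical helper, not part of the standard library.
    """
    head, parts = path, []
    while head and not os.path.exists(head):
        head, tail = os.path.split(head)
        if not tail:
            break
        parts.append(tail)
    if parts and os.path.isfile(head) and zipfile.is_zipfile(head):
        # Zip member names always use forward slashes.
        return head, "/".join(reversed(parts))
    return None, path


def read_member(path):
    """Read bytes from a plain file or from a member inside a zip file."""
    archive, member = split_archive_path(path)
    if archive is None:
        with open(path, "rb") as f:
            return f.read()
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)
```

A path such as 'build/bdist.linux-i686/egg/tracer.py', which has no
existing prefix at all, comes back as (None, path), so the caller can at
least tell that the recorded location is not usable.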

Thanks.

From mal at egenix.com  Tue Dec 23 13:47:15 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 23 Dec 2008 13:47:15 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <200812230845.42805.steve@pearwood.info>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<494D6EA9.2040201@v.loewis.de>
	<494F862B.60701@egenix.com>
	<200812230845.42805.steve@pearwood.info>
Message-ID: <4950DDD3.7030601@egenix.com>

On 2008-12-22 22:45, Steven D'Aprano wrote:
> On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
>> On 2008-12-20 23:16, Martin v. Löwis wrote:
>>>>> I will try next week to see if I can come up with a smaller,
>>>>> submittable example.  Thanks.
>>>> These long exit times are usually caused by the garbage collection
>>>> of objects. This can be a very time consuming task.
>>> I doubt that. The long exit times are usually caused by a bad
>>> malloc implementation.
>> With "garbage collection" I meant the process of Py_DECREF'ing the
>> objects in large containers or deeply nested structures, not the GC
>> mechanism for breaking circular references in Python.
>>
>> This will usually also involve free() calls, so the malloc
>> implementation affects this as well. However, I've seen such long
>> exit times on Linux and Windows, which both have rather good
>> malloc implementations.
>>
>> I don't think there's anything much we can do about it at the
>> interpreter level. Deleting millions of objects takes time and that's
>> not really surprising at all. It takes even longer if you have
>> instances with .__del__() methods written in Python.
> 
> 
> This behaviour appears to be specific to deleting dicts, not deleting 
> random objects. I haven't yet confirmed that the problem still exists 
> in trunk (I hope to have time tonight or tomorrow), but in my previous 
> tests deleting millions of items stored in a list of tuples completed 
> in a minute or two, while deleting the same items stored as key:item 
> pairs in a dict took 30+ minutes. I say plus because I never had the 
> patience to let it run to completion, it could have been hours for all 
> I know.

That's interesting. The dictionary dealloc routine doesn't give
any hint as to why this should take longer than deallocating
a list of tuples.

However, due to the way dictionary tables are allocated, it is
possible that you create a table that is nearly twice the size
of the actual number of items needed by the dictionary. At those
dictionary sizes, this can result in a lot of extra memory being
allocated, certainly more than the corresponding list of tuples
would use.

>> Applications can choose other mechanisms for speeding up the
>> exit process in various (less clean) ways, if they have a need for
>> this.
>>
>> BTW: Rather than using a huge in-memory dict, I'd suggest to either
>> use an on-disk dictionary such as the ones found in mxBeeBase or
>> a database.
> 
> The original poster's application uses 45GB of data. In my earlier 
> tests, I've experienced the problem with ~ 300 *megabytes* of data: 
> hardly what I would call "huge".

Times have changed, that's true :-)

-- 
Marc-Andre Lemburg
eGenix.com


From scott+python-dev at scottdial.com  Tue Dec 23 15:04:51 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Tue, 23 Dec 2008 09:04:51 -0500
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <4950B175.1020704@v.loewis.de>
References: <494F6692.8000001@v.loewis.de> <494FA865.2050009@scottdial.com>
	<4950B175.1020704@v.loewis.de>
Message-ID: <4950F003.2060802@scottdial.com>

Martin v. Löwis wrote:
>> My understanding of the problem is that clearerr() needs to be called
>> before any FILE read operations on *some* platforms. The only platform I
>> saw mentioned was OS X. Towards that end, I have attached a much simpler
>> patch onto the tracker issue, which maybe somebody can verify solves the
>> problem because I do not have access to a platform which fails the test
>> that was originally given.
> 
> Thanks. I won't then reject the patch outright, only revert it from 2.5.
> I can't give this a second try, as 2.5.3 was already supposed to be the
> last release - I don't want to find myself reverting your patch two
> weeks from now.

I agree, and as far as I can tell, the bug (assuming the report is
accurate) only occurs on a few platforms and since it's received little
attention over the life of the issue on the tracker, I imagine it's not
very important to many people. And since I don't have an affected
platform to test, I can't even be sure that it really solves the bug.
So, I agree: leave it out.

> Is the approach that a clearerr call is added for each read
> operation?

Yes, I merely added clearerr() calls just prior to the first fread,
fgets, and getc calls in each of the read methods for files. I'll make a
clean patch against the trunk and update the issue on the tracker, then
maybe the reporter or someone else with an affected platform can verify
my patch.

-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From p.f.moore at gmail.com  Tue Dec 23 15:06:31 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 23 Dec 2008 14:06:31 +0000
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
Message-ID: <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>

2008/12/23 Rocky Bernstein <rocky at gnu.org>:
> Now that there is a package mechanism (are package mechanisms?) like
> zipimporter that bundle source code into a single file, should the
> notion of a "file" location be adjusted to include the package
> and/or importer?

Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically
the get_source (optional) method. It's not exactly what you describe,
but it may help. Please note that it's optional - if you loaded the
code from a zipfile containing only bytecode files, there is no source
to get, so you have to be prepared for that case. But if the source is
available, this should give you a way of getting to it.

Paul.

From ncoghlan at gmail.com  Tue Dec 23 16:29:23 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Dec 2008 01:29:23 +1000
Subject: [Python-Dev] Should there be a way or API for retrieving from
 a	code object a loader method and package file where the code	comes from?
In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
Message-ID: <495103D3.9000505@gmail.com>

Rocky Bernstein wrote:
> As best as I can tell, PEP 302 discusses importer hooks and
> suggests a standard way to get file data. But it doesn't address a
> standard way to get container package and/or loader information.

If a "filename" may not be an actual filename, but instead a
pseudo-filename created based on the __file__ attribute of a Python
module, then there are a few mechanisms for accessing it:

1. Use the package/module name and the relative path from that location,
then use pkgutil.get_data to retrieve it. This has the advantage of
correctly handling the case where no __loader__ attribute is present (or
it is None), which can happen for standard filesystem imports. However,
it only works in Python 2.6 and above (since get_data() is a new
addition to pkgutil).

2. Implement your own version of pkgutil.get_data - more work, but it is
the only way to get something along those lines that works for versions
prior to Python 2.6

3. Do what a number of standard library APIs (e.g. linecache) that
accept filenames do and also accept an optional "module globals"
argument. If the globals argument is passed in and contains a
"__loader__" entry, use the appropriate loader method when processing
the "filename" that was passed in.
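
As a rough illustration of mechanisms 1 and 3, here is a minimal sketch
against a throwaway zip archive. The archive name, the demo_pkg package
and its contents are all invented for the example; only
pkgutil.get_data() and linecache.getlines() are the real APIs under
discussion.

```python
import importlib
import linecache
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway archive; demo_bundle.zip and demo_pkg are made-up names.
archive = os.path.join(tempfile.mkdtemp(), "demo_bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "def hello():\n    return 'hi'\n")
    zf.writestr("demo_pkg/notes.txt", "bundled data\n")

sys.path.insert(0, archive)
demo_pkg = importlib.import_module("demo_pkg")

# Mechanism 1: package name plus a path relative to it.
data = pkgutil.get_data("demo_pkg", "notes.txt")

# Mechanism 3: hand linecache the module globals so it can consult the
# loader when the pseudo-filename is not a real file on disk.
pseudo_filename = demo_pkg.__file__
lines = linecache.getlines(pseudo_filename, vars(demo_pkg))

print(repr(data))
print("".join(lines))
```

Without the globals argument, linecache has only the pseudo-filename to
go on and cannot find the loader, which is the gap being discussed in
this thread.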

> Also I'm not sure there *is* a standard print string way to show
> member inside a package. zipimporter may insert co_filename strings
> like:
> 
>   /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py
> 
> but the trouble with this is that it means file routines have to scan
> the path and notice say that
> /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*,
> not a directory. And a file stat/reading routine needs to understand
> what kind of packager that is in order to get tracer.py information.
> 
> (Are there any file routines in place for doing this?)

Finding a loader given only a pseudo-filename and no module is actually
possible in the specific case of zipimport, but is still pretty obscure
at this point in time:

1. Scan sys.path looking for an entry that matches the start of the
pseudo-filename (remembering to use os.path.normpath).

2. Once such a path entry has been found, use PEP 302 to find the
associated importer object (the undocumented pkgutil.get_importer
function does exactly that - although, as with any undocumented feature,
the promises of API compatibility across major version changes aren't as
strong as they would be for an officially documented and supported
interface).

3. Hope that the importer is one like zipimport that allows get_data()
to be invoked directly on the importer object, rather than only
providing it on a separate loader object after the module has been
loaded. If it needs a real loader instead of just the importer, then
you're back to the original problem of needing a module or package name
(or globals dictionary) in addition to the pseudo filename.
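
The three steps above can be sketched roughly as follows. The helper
name read_via_path_entry and the demo archive are invented for the
example, and passing an archive-relative member name to get_data() is
an assumption that holds for zipimport but may not for other importers.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_via_path_entry(pseudo_filename):
    """Return the bytes behind pseudo_filename, or None if nothing helps."""
    target = os.path.normpath(pseudo_filename)
    for entry in sys.path:
        if not entry:
            continue
        prefix = os.path.normpath(entry)
        # Step 1: does this sys.path entry match the start of the name?
        if not target.startswith(prefix + os.sep):
            continue
        # Step 2: ask the PEP 302 machinery for that entry's importer.
        importer = pkgutil.get_importer(entry)
        # Step 3: hope the importer itself offers get_data().
        get_data = getattr(importer, "get_data", None)
        if get_data is None:
            continue
        member = target[len(prefix) + 1:]  # path inside the archive
        try:
            return get_data(member)
        except OSError:
            continue
    return None


# Demo with a made-up egg-like archive that is never actually imported.
archive = os.path.join(tempfile.mkdtemp(), "tracer_demo.egg")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("tracer.py", "print('tracing')\n")
sys.path.insert(0, archive)

print(read_via_path_entry(os.path.join(archive, "tracer.py")))
print(read_via_path_entry(os.path.join(os.sep, "no", "such", "place.py")))
```

For an ordinary directory on sys.path the importer has no get_data(),
so the sketch gives up and returns None, which is the "back to the
original problem" case from step 3.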

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From p.f.moore at gmail.com  Tue Dec 23 16:41:56 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 23 Dec 2008 15:41:56 +0000
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <18768.63272.61558.985690@panix5.panix.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
	<18768.63272.61558.985690@panix5.panix.com>
Message-ID: <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com>

2008/12/23  <rocky at gnu.org>:
> What is wanted is a uniform way to get and describe a file location
> from a code object that takes into account the file might be a member
> of an archive.

But a code object may not have come from a file. Ignoring the
interactive prompt (not because it's unimportant, just because people
have a tendency to assume it's the only special case :-)) you need to
consider code loaded via a PEP302 importer from (say) a sqlite
database, or code created using compile(), or possibly even more
esoteric means.

So I'm not sure your request is clearly specified.

> Are there even guidelines for saying what string goes into a code
> object's co_filename? Clearly it should be related to the source code
> that generated the code, and there are various conventions that seem
> to exist when the code comes from an "eval" or an "exec".

I'm not aware of guidelines - the documentation for compile() says
"The filename argument should give the file from which the code was
read; pass some recognizable value if it wasn't read from a file
('<string>' is commonly used)" which is pretty noncommittal.
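
A small sketch of how little compile() constrains that value; the
filename strings other than '<string>' are invented for illustration.

```python
# co_filename is simply whatever string was handed to compile().
anonymous = compile("x = 1 + 1", "<string>", "exec")
labelled = compile("y = 2", "<rule 3 from settings table>", "exec")

print(anonymous.co_filename)
print(labelled.co_filename)

# Nothing stops a misleading value either: no file need exist here.
misleading = compile("z = 3", "/no/such/dir/real_looking.py", "exec")
print(misleading.co_filename)
```

So any tool that reads co_filename has to treat it as a hint rather
than as a path it can open.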

> But empirically it seems as though there's some variation. It could be
> an absolute file or a file with no root directory specified. (But is
> it possible to have things like "." and ".."?). And in the case of a
> member of a package what happens? Should it be just the member without
> the package? Or should it include the package name like
>   /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ?
>
> Or be unspecified? If left unspecified as I gather it is now, it makes
> it more important to have some sort of common routine to be able to
> pick out the archive part in a filesystem from the member name inside
> the archive.

I think you need to be clear on *why* you want to know this
information. Once it's clear what you're trying to achieve, it will be
easier to say what the options are.

It sounds like you're trying to propose a stronger convention, to be
enforced in the future. (At least, your suggestion of producing stack
traces implies that you want stack trace code not to have to deal with
the current situation). When PEP 302 was being developed, we were
looking at similar issues. That's why I pointed you at get_source() -
it was the best we could do with all the various conflicting
requirements, and the fact that it's optional is because we had to
cater for cases where there simply wasn't a meaningful answer.
Frankly, backward compatibility requirements kill a lot of the options
here.

Maybe what you want is a *pair* of linked conventions:

    - co_filename (or a replacement) returns a (notionally opaque, but
in practice a filename for file-based cases) token representing "the
file or other object the code came from"
    -  xxx.get_source_code(token) is a function (I don't know where,
xxx is a placeholder for some "suitable" module) which, given such a
token, returns the source, or None if there's no viable concept of
"the source".

Or maybe you want a (possibly separate) attribute of a code object,
which holds a string containing a human-readable (but quite possibly
not machine-parseable) value representing the "place the code came
from" - co_filename is essentially this at the moment, and maybe your
complaint is merely that you don't find its contents sufficiently
human-readable in the case of the zipimport module (in which case you
might want to search some of the archives for the discussions on the
constraints imposed on zipimport, because objects on sys.path must be
strings and cannot be arbitrary objects...)

I'm sorry if this is a little rambling. I can appreciate that there's
some sort of issue that you see here, but I don't yet see any
practical way of changing things that would help. And as always,
there's backward compatibility to consider - existing code isn't going
to change, so new code has to be prepared to handle that.

I hope this is of some help,
Paul.

From ncoghlan at gmail.com  Tue Dec 23 16:42:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Dec 2008 01:42:07 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <4950DDD3.7030601@egenix.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<494D6EA9.2040201@v.loewis.de>	<494F862B.60701@egenix.com>	<200812230845.42805.steve@pearwood.info>
	<4950DDD3.7030601@egenix.com>
Message-ID: <495106CF.5070302@gmail.com>

M.-A. Lemburg wrote:
> On 2008-12-22 22:45, Steven D'Aprano wrote:
>> This behaviour appears to be specific to deleting dicts, not deleting 
>> random objects. I haven't yet confirmed that the problem still exists 
>> in trunk (I hope to have time tonight or tomorrow), but in my previous 
>> tests deleting millions of items stored in a list of tuples completed 
>> in a minute or two, while deleting the same items stored as key:item 
>> pairs in a dict took 30+ minutes. I say plus because I never had the 
>> patience to let it run to completion, it could have been hours for all 
>> I know.
> 
> That's interesting. The dictionary dealloc routine doesn't give
> any hint as to why this should take longer than deallocating
> a list of tuples.

Shuffling the list with random.shuffle before deleting it makes a
*massive* difference to how long the deallocation takes.

Not only that, but after the shuffled list has been deallocated,
deleting an unshuffled list subsequently takes significantly longer.

(I posted numbers and a test script showing these effects elsewhere in
the thread).

The important factor seems to be deallocation order relative to
allocation order.

A simple list deletes objects in the reverse of the order of creation,
while a reversed list deletes them in order of creation. Both of these
seem to scale fairly linearly.

A dict with a hash order that I believe is a fair approximation of
creation order also didn't appear to exhibit particularly poor scaling
(at least not within the 20 million objects I could test).

The shuffled list, on the other hand, was pretty atrocious, taking
nearly twice as long to be destroyed as an unshuffled list of the same size.

I'd like to add another dict to the test which eliminates the current
coupling between hash order and creation order, and see if it exhibits
poor behaviour which is similar to that of the shuffled list, but I'm
not sure when I'll get to that (probably post-Christmas).

Note that I think these results are consistent with the theory that the
problem lies in the way partially allocated memory pools are tracked in
the obmalloc code - it makes sense that deallocating in creation order
or in reverse of creation order would tend to clean up each arena in
order and keep the obmalloc internal state neat and tidy, while
deallocating objects effectively at random would lead to a lot of
additional bookkeeping as the "most used" and "least used" arenas change
over the course of the deallocation.
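
For anyone who wants to try this at a smaller scale, a minimal sketch
of the comparison follows. It is not the test script posted earlier in
the thread; the function names, the use of plain object() instances and
the default size are all assumptions made for illustration, and timings
will vary by machine and Python version.

```python
import random
import time


def time_clear(items):
    """Time how long it takes to drop every object held by the list."""
    start = time.perf_counter()
    items.clear()  # the list holds the only references to its items
    return time.perf_counter() - start


def run(n=200_000, seed=0):
    """Return deallocation timings for three orderings of n objects."""
    rng = random.Random(seed)
    results = {}

    in_order = [object() for _ in range(n)]
    results["creation order list"] = time_clear(in_order)

    rev = [object() for _ in range(n)]
    rev.reverse()
    results["reversed list"] = time_clear(rev)

    shuffled = [object() for _ in range(n)]
    rng.shuffle(shuffled)
    results["shuffled list"] = time_clear(shuffled)
    return results


if __name__ == "__main__":
    for label, seconds in run().items():
        print(f"{label}: {seconds:.4f}s")
```

Increasing n until the process uses a meaningful fraction of memory is
probably needed before any ordering effect becomes visible.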

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From p.f.moore at gmail.com  Tue Dec 23 17:00:25 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 23 Dec 2008 16:00:25 +0000
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <495103D3.9000505@gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<495103D3.9000505@gmail.com>
Message-ID: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com>

2008/12/23 Nick Coghlan <ncoghlan at gmail.com>:
> Finding a loader given only a pseudo-filename and no module is actually
> possible in the specific case of zipimport, but is still pretty obscure
> at this point in time:
>
> 1. Scan sys.path looking for an entry that matches the start of the
> pseudo-filename (remembering to use os.path.normpath).
>
> 2. Once such a path entry has been found, use PEP 302 to find the
> associated importer object (the undocumented pkgutil.get_importer
> function does exactly that - although, as with any undocumented feature,
> the promises of API compatibility across major version changes aren't as
> strong as they would be for an officially documented and supported
> interface).
>
> 3. Hope that the importer is one like zipimport that allows get_data()
> to be invoked directly on the importer object, rather than only
> providing it on a separate loader object after the module has been
> loaded. If it needs a real loader instead of just the importer, then
> you're back to the original problem of needing a module or package name
> (or globals dictionary) in addition to the pseudo filename.

There were lots of proposals tossed around on python-dev at the time
PEP 302 was being developed, which might have made all this easier.
Most, if not all, were killed by backward compatibility requirements.

I have some hopes that when Brett completes his "import in Python"
work, that will add sufficient flexibility to allow people to
experiment with all of this machinery, and ultimately maybe move
forward with a more modular import mechanism. But the timescales for
Brett's changes won't be until at least Python 3.1, and it'll be a
release or two after that before any significant change can be eased
in in a compatible manner. That's going to take a lot of energy on
someone's part.

Paul.

PS One of these days, I'm going to write an insanely useful importer
which takes the least-convenient option wherever PEP 302 allows
flexibility. It'll be adopted by everyone because it's so great, and
all the software that currently makes unwarranted assumptions about
importers will break and get fixed to support it because otherwise its
users will rebel, and we'll live in a paradise where everything
follows the specs to the letter. Oh, yes, and I'm going to win the
lottery every week for the next month :-)

PPS Seriously, setuptools and the adoption of eggs have pushed a lot
of code to be much more careful about unwarranted assumptions that
code lives in the filesystem. That's an incredibly good thing, and
very hard to do right (witness the setuptools "zip_safe" parameter
which acts as a get-out clause). Much kudos to setuptools for getting
as far as it has.

From rocky at gnu.org  Tue Dec 23 15:35:20 2008
From: rocky at gnu.org (rocky at gnu.org)
Date: Tue, 23 Dec 2008 09:35:20 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
Message-ID: <18768.63272.61558.985690@panix5.panix.com>

Paul Moore writes:
 > 2008/12/23 Rocky Bernstein <rocky at gnu.org>:
 > > Now that there is a package mechanism (are package mechanisms?) like
 > > zipimporter that bundle source code into a single file, should the
 > > notion of a "file" location be adjusted to include the package
 > > and/or importer?
 > 
 > Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically
 > the get_source (optional) method. 

Yes, that's one of the things I was thinking when I wrote:

  As best as I can tell, PEP 302 discusses importer hooks and
  suggests a standard way to get file data.

And by "suggests" I was implying that yes, I know this is
optional.


 > It's not exactly what you describe,
 > but it may help. 

Yes, it's not exactly what is desired. 

 > Please note that it's optional - if you loaded the
 > code from a zipfile containing only bytecode files, there is no source
 > to get, so you have to be prepared for that case. But if the source is
 > available, this should give you a way of getting to it.

What is wanted is a uniform way to get and describe a file location
from a code object that takes into account the file might be a member
of an archive. 

Are there even guidelines for saying what string goes into a code
object's co_filename? Clearly it should be related to the source code
that generated the code, and there are various conventions that seem
to exist when the code comes from an "eval" or an "exec". 

But empirically it seems as though there's some variation. It could be
an absolute file or a file with no root directory specified. (But is
it possible to have things like "." and ".."?). And in the case of a
member of a package what happens? Should it be just the member without
the package? Or should it include the package name like
   /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ? 

Or be unspecified? If left unspecified as I gather it is now, it makes
it more important to have some sort of common routine to be able to
pick out the archive part in a filesystem from the member name inside 
the archive.


 > 
 > Paul.
 > 

From pje at telecommunity.com  Tue Dec 23 17:19:52 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Dec 2008 11:19:52 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
 code object a loader method and package file where the code comes from?
In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
Message-ID: <20081223161810.B3B513A409D@sparrow.telecommunity.com>

At 06:55 AM 12/23/2008 -0500, Rocky Bernstein wrote:
>Now that there is a package mechanism (are package mechanisms?) like
>zipimporter that bundle source code into a single file, should the
>notion of a "file" location be adjusted to include the package
>and/or importer?
>
>Is there a standard API or routine which can extract this information
>given a code object?

The inspect module (in 2.5 and up) supports retrieving the source 
lines for any object that has module globals.  So you could do it for 
a class, a function, a method, module-level code, or even a frame, 
but not for a standalone code object.
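
A minimal sketch of that, using a made-up module inside a throwaway
zip archive (the names inspect_demo.zip, zipped_mod and greet are
invented for the example):

```python
import importlib
import inspect
import os
import sys
import tempfile
import zipfile

SOURCE = "def greet(name):\n    return 'hello ' + name\n"

archive = os.path.join(tempfile.mkdtemp(), "inspect_demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_mod.py", SOURCE)

sys.path.insert(0, archive)
zipped_mod = importlib.import_module("zipped_mod")

# A function carries its module globals, so inspect can reach the loader.
print(inspect.getsource(zipped_mod.greet))

# The bare code object only offers co_filename. With no .pyc in the
# archive, that is the archive path followed by the member name.
print(zipped_mod.greet.__code__.co_filename)
```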

I believe there are also certain inspect module APIs that will return 
a pseudo-filename, i.e. the zipfile name followed by the path within 
the zipfile.


>Also I'm not sure there *is* a standard print string way to show
>member inside a package. zipimporter may insert co_filename strings
>like:
>
>   /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py

AFAIK, it'll only do this if the zipfile doesn't contain a usable 
.pyc or .pyo.  Ordinarily, co_filename will be the name of the 
original source file before the zipfile was created.


From pje at telecommunity.com  Tue Dec 23 17:29:22 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Dec 2008 11:29:22 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
 code object a loader method and package file where the code comes from?
In-Reply-To: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<495103D3.9000505@gmail.com>
	<79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com>
Message-ID: <20081223162739.BA5E83A409D@sparrow.telecommunity.com>

At 04:00 PM 12/23/2008 +0000, Paul Moore wrote:
>PPS Seriously, setuptools and the adoption of eggs have pushed a lot
>of code to be much more careful about unwarranted assumptions that
>code lives in the filesystem. That's an incredibly good thing, and
>very hard to do right (witness the setuptools "zip_safe" parameter
>which acts as a get-out clause). Much kudos to setuptools for getting
>as far as it has.

And ironically, if I ever get the time to actually work on a new 
version of easy_install (as opposed to perpetually tweaking the old 
one), the default zipping and default sys.path munging will be among 
the first things to go.  ;-)

Ironically, my choice of isolated directories and zipfiles for 
quick-and-dirty uninstall support has ended up costing far too much, 
compared to if I'd just taken the time to design a decent uninstall 
feature.  Of course, hindsight is 20-20; in order to fully understand 
the requirements of a problem, you sometimes have to get a rather 
long way towards solving it the simple, obvious...  and wrong way.

(And, it didn't help that I had significant time constraints pushing 
me in the direction of the Seemingly-Simplest-At-The-Moment Thing 
That Could Possibly Work.)


From rocky at panix.com  Tue Dec 23 17:36:48 2008
From: rocky at panix.com (R. Bernstein)
Date: Tue, 23 Dec 2008 11:36:48 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
	<18768.63272.61558.985690@panix5.panix.com>
	<79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com>
Message-ID: <18769.5024.990970.46864@panix5.panix.com>

Paul Moore writes:
 > 2008/12/23  <rocky at gnu.org>:
 > > What is wanted is a uniform way to get and describe a file location
 > > from a code object that takes into account the file might be a member
 > > of an archive.
 > 
 > But a code object may not have come from a file. 

Right. That's why I mentioned for example "eval" and "exec" that you
cite below. So remove the "file" in what is cited above. Replace with:
"a uniform way to get information (not necessarily just the source
text) about the location/origin of code from a code object".

 > Ignoring the
 > interactive prompt (not because it's unimportant, just because people
 > have a tendency to assume it's the only special case :-)) you need to
 > consider code loaded via a PEP302 importer from (say) a sqlite
 > database, or code created using compile(), or possibly even more
 > esoteric means.
 > 
 > So I'm not sure your request is clearly specified.

Is the above any more clear? 

 > 
 > > Are there even guidelines for saying what string goes into a code
 > > object's co_filename? Clearly it should be related to the source code
 > > that generated the code, and there are various conventions that seem
 > > to exist when the code comes from an "eval" or an "exec".
 > 
 > I'm not aware of guidelines - the documentation for compile() says
 > "The filename argument should give the file from which the code was
 > read; pass some recognizable value if it wasn't read from a file
 > ('<string>' is commonly used)" which is pretty noncommittal.
 > 
 > > But empirically it seems as though there's some variation. It could be
 > > an absolute file or a file with no root directory specified. (But is
 > > it possible to have things like "." and ".."?). And in the case of a
 > > member of a package what happens? Should it be just the member without
 > > the package? Or should it include the package name like
 > >   /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ?
 > >
 > > Or be unspecified? If left unspecified as I gather it is now, it makes
 > > it more important to have some sort of common routine to be able to
 > > pick out the archive part in a filesystem from the member name inside
 > > the archive.
 > 
 > I think you need to be clear on *why* you want to know this
 > information. Once it's clear what you're trying to achieve, it will be
 > easier to say what the options are.

This is what I wrote originally (slightly modified):
  
  A use case I am thinking of here is in a stack trace or a
  debugger, or a tool which wants to show in great detail, information
  from a code object obtained possibly via a frame object.

I find it kind of sucky to see in a traceback: "<string>" as opposed
to the text (or prefix of the text) of the actual string that was
passed. Or something that has been referred to as a "pseudo-file" like
/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py
when it is really member foo/bar.py of zipped egg
/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.

(As a separate issue, it seems that zipimporter file locations inside
setuptools may have a problem.)

Inside a debugger or an IDE, it is conceivable a person might want
loader, and module information, and if the code is part of an archive
file, then member information. (If part of an eval string then, the
eval string.)

 > 
 > It sounds like you're trying to propose a stronger convention, to be
 > enforced in the future. 

Well, I wasn't sure if there was one. But I gather from what you write,
there isn't. :-)

Yes, I would suggest a stronger convention. Or a more up-front
statement that none is desired/forthcoming.

 > (At least, your suggestion of producing stack
 > traces implies that you want stack trace code not to have to deal with
 > the current situation). When PEP 302 was being developed, we were
 > looking at similar issues. That's why I pointed you at get_source() -
 > it was the best we could do with all the various conflicting
 > requirements, and the fact that it's optional is because we had to
 > cater for cases where there simply wasn't a meaningful answer.
 > Frankly, backward compatibility requirements kill a lot of the options
 > here.
 > 
 > Maybe what you want is a *pair* of linked conventions:
 > 
 >     - co_filename (or a replacement) returns a (notionally opaque, but
 > in practice a filename for file-based cases) token representing "the
 > file or other object the code came from"

This would be nice.

 >     -  xxx.get_source_code(token) is a function (I don't know where,
 > xxx is a placeholder for some "suitable" module) which, given such a
 > token, returns the source, or None if there's no viable concept of
 > "the source".

There always is a viable concept of a source. It's whatever was done
to get the code. For example, if it was via an eval then the source
was the eval function and a string, same for exec. If it's via
database access, well that then and some summary info about what's
known about that. 

 > 
 > Or maybe you want a (possibly separate) attribute of a code object,
 > which holds a string containing a human-readable (but quite possibly
 > not machine-parseable) value representing the "place the code came
 > from" - co_filename is essentially this at the moment, and maybe your
 > complaint is merely that you don't find its contents sufficiently
 > human-readable in the case of the zipimport module (in which case you
 > might want to search some of the archives for the discussions on the
 > constraints imposed on zipimport, because objects on sys.path must be
 > strings and cannot be arbitrary objects...)

There are two problems. One is displaying location information in an
unambiguous way -- the pseudo-file above is ambiguous and so is
<string>, since no OS guarantees that a file cannot be given that
name. The second problem is programmatically getting information such
as a debugger or an IDE might do so that the information can be
conveyed back to a user who might want to inspect surrounding source
code or modules.

 > 
 > I'm sorry if this is a little rambling. I can appreciate that there's
 > some sort of issue that you see here, but I don't yet see any
 > practical way of changing things that would help. And as always,
 > there's backward compatibility to consider - existing code isn't going
 > to change, so new code has to be prepared to handle that.
 > 
 > I hope this is of some help,

Yes, thanks. At least I now have a clearer idea of the state of
where things stand. 

 > Paul.
 > 

From p.f.moore at gmail.com  Tue Dec 23 17:55:36 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 23 Dec 2008 16:55:36 +0000
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <18769.5024.990970.46864@panix5.panix.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
	<18768.63272.61558.985690@panix5.panix.com>
	<79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com>
	<18769.5024.990970.46864@panix5.panix.com>
Message-ID: <79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com>

2008/12/23 R. Bernstein <rocky at panix.com>:
>  A use case I am thinking of here is in a stack trace or a
>  debugger, or a tool which wants to show in great detail, information
>  from a code object obtained possibly via a frame object.

Thanks for the clarifications. I see what you're after much better now.

> I find it kind of sucky to see in a traceback: "<string>" as opposed
> to the text (or prefix of the text) of the actual string that was
> passed. Or something that has been referred to as a "pseudo-file" like
> /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py
> when it is really member foo/bar.py of zipped egg
> /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.

Fair comment. That points to a "human readable" type of string. It's
not available at the moment, but I guess it could be.

But see below.

>  >     -  xxx.get_source_code(token) is a function (I don't know where,
>  > xxx is a placeholder for some "suitable" module) which, given such a
>  > token, returns the source, or None if there's no viable concept of
>  > "the source".
>
> There always is a viable concept of a source. It's whatever was done
> to get the code. For example, if it was via an eval then the source
> was the eval function and a string, same for exec. If it's via
> database access, well that then and some summary info about what's
> known about that.

Hmm, "source" colloquially, yes "bytecode loaded from ....\xxx.pyc",
for example. But not "source" in the sense of "source code". Some
applications run with only bytecode shipped, no source code available
at all.

> There are two problems. One is displaying location information in an
> unambiguous way -- the pseudo-file above is ambiguous and so is
> <string>, since no OS guarantees that a file cannot be given that
> name. The second problem is programmatically getting information such
> as a debugger or an IDE might do so that the information can be
> conveyed back to a user who might want to inspect surrounding source
> code or modules.

This is more than you were asking for above.

The first problem is addressed with a "human readable" (narrative)
description, as above.

The second, however, requires machine-readable access to source code
(if it exists). That's what the loader get_source() call does for you.
But you have to be prepared for the fact that it may not be possible
to get source code, and decide what you want to happen in that case.
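
A minimal sketch of what "being prepared" might look like for a
debugger that has a frame's globals in hand. The helper
source_from_globals and both loader classes are hypothetical stand-ins
written for this example; only the get_source(fullname) method name
comes from PEP 302.

```python
def source_from_globals(module_globals):
    """Return source text via the module's loader, or None if unavailable."""
    loader = module_globals.get("__loader__")
    name = module_globals.get("__name__")
    get_source = getattr(loader, "get_source", None)
    if get_source is None or name is None:
        return None  # get_source() is optional under PEP 302
    try:
        return get_source(name)  # may itself be None, e.g. bytecode only
    except (ImportError, OSError):
        return None


class BytecodeOnlyLoader:
    """Hypothetical loader standing in for an archive with no source."""

    def get_source(self, fullname):
        return None


class StringLoader:
    """Hypothetical loader that keeps source text in memory."""

    def __init__(self, text):
        self.text = text

    def get_source(self, fullname):
        return self.text


for g in (
    {"__name__": "demo", "__loader__": StringLoader("x = 1\n")},
    {"__name__": "demo", "__loader__": BytecodeOnlyLoader()},
    {"__name__": "demo"},
):
    source = source_from_globals(g)
    print(source if source is not None else "<no source available>")
```

In a real tool the dictionary would typically be frame.f_globals, and
the fallback text is where a narrative description of the code's origin
could go.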

>  > I hope this is of some help,
>
> Yes, thanks. At least I now have a clearer idea of the state of
> where things stand.

Good. Sorry it's not better news :-)

Paul

From rocky at panix.com  Tue Dec 23 17:55:00 2008
From: rocky at panix.com (R. Bernstein)
Date: Tue, 23 Dec 2008 11:55:00 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <495103D3.9000505@gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<495103D3.9000505@gmail.com>
Message-ID: <18769.6116.425537.968778@panix5.panix.com>

Nick Coghlan writes:
 > 3. Do what a number of standard library APIs (e.g. linecache) that
 > accept filenames do and also accept an optional "module globals"
 > argument. 

Actually, I did this and committed a change (to pydb) before posting
any of these queries. ;-)

If "a number of standard library APIs" are doing the *same* thing,
then shouldn't this be exposed as a common routine?

If on the other hand, by "a number" you mean "one" as in linecache --
1 *is* a number too! -- then perhaps the relevant code that is buried
inside the "updatecache" should be exposed on its own.  (As a side
benefit that code can be tested separately too!)

Should I file a feature request for this? 

From lance.ellinghaus at eds.com  Tue Dec 23 18:03:02 2008
From: lance.ellinghaus at eds.com (Ellinghaus, Lance)
Date: Tue, 23 Dec 2008 12:03:02 -0500
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
In-Reply-To: <4950B301.4020702@v.loewis.de>
References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
	<4950B301.4020702@v.loewis.de>
Message-ID: <752A61D5C34D41478E638FC92AF9051B035636C9@usahm207.amer.corp.eds.com>

Martin,
Thank you very much. At least I know what I need to do now. 

> From: "Martin v. Löwis" [mailto:martin at v.loewis.de] 
> I don't think ctypes (rather, libffi) supports Sun C. You will need to
> port it (as you have already ruled out the other options, such as using
> gcc, or not using ctypes).

Lance


From tutufan at gmail.com  Tue Dec 23 18:54:11 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Tue, 23 Dec 2008 11:54:11 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>
	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>
	<494D4FD0.4020202@egenix.com>
	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>
	<18765.21740.137339.943481@montanaro-dyndns-org.local>
	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>
	<3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>
	<3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com>
Message-ID: <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com>

On Sat, Dec 20, 2008 at 6:22 PM, Mike Coleman <tutufan at gmail.com> wrote:
> Re "held" and "intern_it":  Haha!  That's evil and extremely evil,
> respectively.  :-)

P.S.  I tried the "held" idea out (interning integers in a list), and
unfortunately it didn't make that much difference.  In the example I
tried, there were 104465178 instances of integers from range(33467).
I guess if ints are 12 bytes (per Beazley's book, but not sure if that
still holds), then that would correspond to a 1GB reduction.  Judging
by 'top', it might have been 2 or 3GB instead, from a total of 45G.
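
The back-of-the-envelope estimate can be written out as follows. This
is only a sketch of the arithmetic; the 12-byte figure is the assumed
per-int size quoted above, not a measured value.

```python
INT_SIZE_BYTES = 12          # assumed size of one int object (see above)
total_instances = 104465178  # integer objects seen in the example
distinct_values = 33467      # number of values in range(33467)

# Sharing one object per distinct value removes every duplicate instance.
duplicates_removed = total_instances - distinct_values
saved_bytes = duplicates_removed * INT_SIZE_BYTES
saved_gib = saved_bytes / 2.0 ** 30

print("%d bytes saved (about %.2f GiB)" % (saved_bytes, saved_gib))
```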

Mike

From tutufan at gmail.com  Tue Dec 23 18:59:21 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Tue, 23 Dec 2008 11:59:21 -0600
Subject: [Python-Dev] suggest change to "Failed to find the necessary bits
	to build these modules" message
Message-ID: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com>

I was thrown by the "Failed to find the necessary bits to build these
modules" message at the end of newer Python builds, and thought that
this indicated that the Python executable itself was not built.
That's arguably stupidity on my part, but I wonder if others will not
trip on this, too.

Would it be possible to change this wording slightly, to something like

    Python built, but failed to find the necessary bits to build these modules

?

From brett at python.org  Tue Dec 23 19:12:09 2008
From: brett at python.org (Brett Cannon)
Date: Tue, 23 Dec 2008 10:12:09 -0800
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<495103D3.9000505@gmail.com>
	<79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com>
Message-ID: <bbaeab100812231012t2b10f91alf71796e4f82ce1d@mail.gmail.com>

On Tue, Dec 23, 2008 at 08:00, Paul Moore <p.f.moore at gmail.com> wrote:
> 2008/12/23 Nick Coghlan <ncoghlan at gmail.com>:
>> Finding a loader given only a pseudo-filename and no module is actually
>> possible in the specific case of zipimport, but is still pretty obscure
>> at this point in time:
>>
>> 1. Scan sys.path looking for an entry that matches the start of the
>> pseudo-filename (remembering to use os.path.normpath).
>>
>> 2. Once such a path entry has been found, use PEP 302 to find the
>> associated importer object (the undocumented pkgutil.get_importer
>> function does exactly that - although, as with any undocumented feature,
>> the promises of API compatibility across major version changes aren't as
>> strong as they would be for an officially documented and supported
>> interface).
>>
>> 3. Hope that the importer is one like zipimport that allows get_data()
>> to be invoked directly on the importer object, rather than only
>> providing it on a separate loader object after the module has been
>> loaded. If it needs a real loader instead of just the importer, then
>> you're back to the original problem of needing a module or package name
>> (or globals dictionary) in addition to the pseudo filename.
>
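
The three steps above can be sketched roughly as follows, using
present-day Python 3. The function name read_via_importer and the demo
archive are made up for illustration; the sketch only succeeds for
importers that, like zipimport, offer get_data() directly.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_via_importer(pseudo_filename):
    """Return the bytes behind a pseudo-filename, or None if no importer helps."""
    target = os.path.normpath(pseudo_filename)
    for entry in sys.path:
        if not entry:
            continue
        prefix = os.path.normpath(entry)
        # Step 1: does this sys.path entry match the start of the pseudo-filename?
        if not target.startswith(prefix + os.sep):
            continue
        # Step 2: ask the PEP 302 machinery for the importer of that entry.
        importer = pkgutil.get_importer(entry)
        # Step 3: only some importers (zipimport is one) offer get_data() directly.
        get_data = getattr(importer, "get_data", None)
        if get_data is None:
            continue
        try:
            return get_data(target)
        except OSError:
            continue
    return None


# Demo: build a throwaway zip archive, put it on sys.path, and read a member.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("foo/bar.py", "ANSWER = 42\n")
sys.path.insert(0, archive)

data = read_via_importer(os.path.join(archive, "foo", "bar.py"))
```
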
> There were lots of proposals tossed around on python-dev at the time
> PEP 302 was being developed, which might have made all this easier.
> Most, if not all, were killed by backward compatibility requirements.
>
> I have some hopes that when Brett completes his "import in Python"
> work, that will add sufficient flexibility to allow people to
> experiment with all of this machinery, and ultimately maybe move
> forward with a more modular import mechanism.

I have actually made a good amount of progress as of late. It's a New
Year's resolution to get importlib done, but I am actually aiming for
before January 1 (sans the damn compile() problem I am having). This
goal does ignore everything but a compatible __import__, though.

> But the timescales for
> Brett's changes won't be until at least Python 3.1, and it'll be a
> release or two after that before any significant change can be eased
> in in a compatible manner.

I suspect that any import work will be a Pending/DeprecationWarning
deal, so 3.3 would be the first version that could have any real
changes as the default.

> That's going to take a lot of energy on
> someone's part.

That would be me. =) After importlib is finished I have a couple of
PEPs planned plus properly documenting how the import machinery works
in the language spec. And I suspect this will lead to some discussions
about things, e.g. requirements of the format for __file__ and
__path__ in regards to when they point inside of an archive, etc.

-Brett

From brett at python.org  Tue Dec 23 19:13:17 2008
From: brett at python.org (Brett Cannon)
Date: Tue, 23 Dec 2008 10:13:17 -0800
Subject: [Python-Dev] suggest change to "Failed to find the necessary
	bits to build these modules" message
In-Reply-To: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com>
References: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com>
Message-ID: <bbaeab100812231013u1ba98749ifa32d09ff6edd90a@mail.gmail.com>

On Tue, Dec 23, 2008 at 09:59, Mike Coleman <tutufan at gmail.com> wrote:
> I was thrown by the "Failed to find the necessary bits to build these
> modules" message at the end of newer Python builds, and thought that
> this indicated that the Python executable itself was not built.
> That's arguably stupidity on my part, but I wonder if others will not
> trip on this, too.
>
> Would it be possible to change this wording slightly, to something like
>
>    Python built, but failed to find the necessary bits to build these modules
>
> ?

Sounds reasonable to me. Can you file a bug report at bugs.python.org,
Mike, so this doesn't get lost?

-Brett

From tutufan at gmail.com  Tue Dec 23 19:22:42 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Tue, 23 Dec 2008 12:22:42 -0600
Subject: [Python-Dev] suggest change to "Failed to find the necessary
	bits to build these modules" message
In-Reply-To: <bbaeab100812231013u1ba98749ifa32d09ff6edd90a@mail.gmail.com>
References: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com>
	<bbaeab100812231013u1ba98749ifa32d09ff6edd90a@mail.gmail.com>
Message-ID: <3c6c07c20812231022y53f267bcg1c86339b30b0e074@mail.gmail.com>

Done: http://bugs.python.org/issue4731


On Tue, Dec 23, 2008 at 12:13 PM, Brett Cannon <brett at python.org> wrote:
> On Tue, Dec 23, 2008 at 09:59, Mike Coleman <tutufan at gmail.com> wrote:
>> I was thrown by the "Failed to find the necessary bits to build these
>> modules" message at the end of newer Python builds, and thought that
>> this indicated that the Python executable itself was not built.
>> That's arguably stupidity on my part, but I wonder if others will not
>> trip on this, too.
>>
>> Would it be possible to change this wording slightly, to something like
>>
>>    Python built, but failed to find the necessary bits to build these modules
>>
>> ?
>
> Sounds reasonable to me. Can you file a bug report at bugs.python.org,
> Mike, so this doesn't get lost?
>
> -Brett
>

From martin at v.loewis.de  Tue Dec 23 21:20:40 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Dec 2008 21:20:40 +0100
Subject: [Python-Dev] [ANN] Python 2.5.4 (final)
Message-ID: <49514818.7060103@v.loewis.de>

On behalf of the Python development team and the Python community, I'm
happy to announce the release of Python 2.5.4 (final).

Python 2.5.3 unfortunately contained an incorrect patch that could
cause interpreter crashes; the only change in Python 2.5.4 relative
to 2.5.3 is the reversal of this patch.

2.5.4 is the last bug fix release of Python 2.5. Future 2.5.x releases
will only include security fixes. According to the release notes, about
80 bugs and patches have been addressed since Python 2.5.2, many of
them improving the stability of the interpreter, and improving its
portability.

See the release notes at the website (also available as Misc/NEWS in
the source distribution) for details of bugs fixed; most of them prevent
interpreter crashes (and now cause proper Python exceptions in cases
where the interpreter may have crashed before).

For more information on Python 2.5.4, including download
links for various platforms, release notes, and known issues, please
see:

    http://www.python.org/2.5.4

Highlights of the previous major Python releases are available
from the Python 2.5 page, at

    http://www.python.org/2.5/highlights.html

Enjoy this release,
Martin

Martin v. Loewis
martin at v.loewis.de
Python Release Manager
(on behalf of the entire python-dev team)

From chambon.pascal at wanadoo.fr  Tue Dec 23 21:55:10 2008
From: chambon.pascal at wanadoo.fr (Pascal Chambon)
Date: Tue, 23 Dec 2008 21:55:10 +0100
Subject: [Python-Dev] Hello everyone + little question
	around	Cpython/stackless
In-Reply-To: <4950137E.8040506@v.loewis.de>
References: <49500B86.1070605@wanadoo.fr> <4950137E.8040506@v.loewis.de>
Message-ID: <4951502E.2030805@wanadoo.fr>

All right then, I understand the problem...

Thanks a lot,
regards,
Pascal





From kristjan at ccpgames.com  Tue Dec 23 22:08:16 2008
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Tue, 23 Dec 2008 21:08:16 +0000
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <49501AEC.3010805@v.loewis.de>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>
	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>
	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>
	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>
	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
	<49500D80.2090201@v.loewis.de>	<loom.20081222T222256-318@post.gmane.org>
	<49501AEC.3010805@v.loewis.de>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local>

I'd like to suggest here, if you are giving this code a facelift, that
on Windows you use VirtualAlloc and friends to allocate the arenas.
This gives you the most direct access to the VM manager and makes sure
that a released arena is immediately available to the rest of the
system.  It also makes sure that you don't mess with the regular heap
and fragment it.
Kristján

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of "Martin v. Löwis"
Sent: 22. desember 2008 22:56
To: Antoine Pitrou
Cc: python-dev at python.org
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

>> Allocation of a new pool would have to do a linear search in these
>> pointers (finding the arena with the least number of pools);
> 
> You mean the least number of free pools, right?

Correct.


From martin at v.loewis.de  Tue Dec 23 22:52:31 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Dec 2008 22:52:31 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<acd65fa20812201528y5ff28ccanff95eae1280f9e3b@mail.gmail.com>	<acd65fa20812201540l3daa9adayfaaa24faba25f81@mail.gmail.com>	<3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com>	<aac2c7cb0812211044y78dc4c9bvefc296d968426da3@mail.gmail.com>	<3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>	<49500D80.2090201@v.loewis.de>	<loom.20081222T222256-318@post.gmane.org>
	<49501AEC.3010805@v.loewis.de>
	<930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local>
Message-ID: <49515D9F.4020207@v.loewis.de>

> I'd like to suggest here, if you are giving this code a facelift,
> that on Windows you use VirtualAlloc and friends to allocate the
> arenas.  This gives you the most direct access to the VM manager and
> makes sure that a released arena is immediately available to the rest
> of the system.  It also makes sure that you don't mess with the
> regular heap and fragment it.

While I'd like to see this done myself, I believe it is independent
from the problem at hand. Contributions are welcome.

Regards,
Martin

From tjreedy at udel.edu  Tue Dec 23 23:03:51 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Dec 2008 17:03:51 -0500
Subject: [Python-Dev] [ANN] Python 2.5.4 (final)
In-Reply-To: <49514818.7060103@v.loewis.de>
References: <49514818.7060103@v.loewis.de>
Message-ID: <girn86$8cv$1@ger.gmane.org>

Martin v. Löwis wrote:

> For more information on Python 2.5.4, including download
> links for various platforms, release notes, and known issues, please
> see:
> 
>     http://www.python.org/2.5.4

http://www.python.org/download/releases/2.5.4/


From ncoghlan at gmail.com  Tue Dec 23 23:29:36 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 24 Dec 2008 08:29:36 +1000
Subject: [Python-Dev] Should there be a way or API for retrieving from a
 code object a loader method and package file where the code comes from?
In-Reply-To: <18769.6116.425537.968778@panix5.panix.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>	<495103D3.9000505@gmail.com>
	<18769.6116.425537.968778@panix5.panix.com>
Message-ID: <49516650.7010005@gmail.com>

R. Bernstein wrote:
> Nick Coghlan writes:
>  > 3. Do what a number of standard library APIs (e.g. linecache) that
>  > accept filenames do and also accept an optional "module globals"
>  > argument. 
> 
> Actually, I did this and committed a change (to pydb) before posting
> any of these queries. ;-)
> 
> If "a number of standard library APIs" are doing the *same* thing,
> then shouldn't this be exposed as a common routine?
> 
> If on the other hand, by "a number" you mean "one" as in linecache --
> 1 *is* a number too! -- then perhaps the relevant code that is buried
> inside the "updatecache" should be exposed on its own.  (As a side
> benefit that code can be tested separately too!)
> 
> Should I file a feature request for this? 

The reason for my slightly odd phrasing is that all of the examples I
was originally going to mention (traceback, pdb, doctest, inspect)
actually all end up calling linecache to do the heavy lifting.

So it is possible that linecache.getlines() actually *is* the common
routine you're looking for - it just needs to be added to the
documentation and the __all__ attribute for linecache to be officially
supported. Currently, only the single line getline() function is
documented and exposed via __all__, but I don't see any reason for that
restriction - linecache.getlines() has been there with a stable API
since at least Python 2.5.
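
A short sketch of the two linecache calls, written for present-day
Python 3. The temporary file and the globals dictionary are made up for
illustration; real callers would pass a frame's f_globals so that
linecache can consult a PEP 302 loader when the name is not a real file.

```python
import linecache
import os
import tempfile

# Write a small throwaway source file so the example is self-contained.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sample.py")
with open(path, "w") as handle:
    handle.write("first = 1\nsecond = 2\n")

# getlines() returns every line, line endings included.
all_lines = linecache.getlines(path)

# getline() returns a single line; line numbers start at 1.
second_line = linecache.getline(path, 2)

# The optional second argument is the "module globals" dictionary. With no
# __loader__ in it, linecache simply falls back to the filesystem.
same_lines = linecache.getlines(path, {"__name__": "sample"})
```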

For cases where you have an appropriate Python object (i.e. a module,
function, method, class, traceback, frame or code object) rather than a
pseudo-filename, then inspect.getsource() actually jumps through a lot
of hoops to try to find the actual source code for that object - in
those cases, using the appropriate inspect function is generally a much
better idea than trying to interpret __file__ yourself.
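
For example, a minimal sketch in present-day Python 3. The module name
getsource_demo_mod and its contents are invented so that the example
has real source on disk to find.

```python
import importlib
import inspect
import os
import sys
import tempfile

# Create and import a throwaway module so there is real source to locate.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "getsource_demo_mod.py"), "w") as handle:
    handle.write("def greet(name):\n    return 'hello ' + name\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()
demo_mod = importlib.import_module("getsource_demo_mod")

# Start from the object itself; no parsing of __file__ is needed.
source_text = inspect.getsource(demo_mod.greet)
source_file = inspect.getsourcefile(demo_mod.greet)
```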

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From martin at v.loewis.de  Tue Dec 23 23:43:45 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Dec 2008 23:43:45 +0100
Subject: [Python-Dev] [ANN] Python 2.5.4 (final)
In-Reply-To: <girn86$8cv$1@ger.gmane.org>
References: <49514818.7060103@v.loewis.de> <girn86$8cv$1@ger.gmane.org>
Message-ID: <495169A1.6000205@v.loewis.de>

>> For more information on Python 2.5.4, including download
>> links for various platforms, release notes, and known issues, please
>> see:
>>
>>     http://www.python.org/2.5.4
> 
> http://www.python.org/download/releases/2.5.4/

Thanks for pointing that out; the original URL now works as well
(as it does for all other releases).

Regards,
Martin

From steve at pearwood.info  Wed Dec 24 00:39:30 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 24 Dec 2008 10:39:30 +1100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
In-Reply-To: <loom.20081220T193826-31@post.gmane.org>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>
	<200812202155.28024.steve@pearwood.info>
	<loom.20081220T193826-31@post.gmane.org>
Message-ID: <200812241039.31452.steve@pearwood.info>

On Sun, 21 Dec 2008 06:45:11 am Antoine Pitrou wrote:
> Steven D'Aprano <steve <at> pearwood.info> writes:
> > In November 2007, a similar problem was reported on the
> > comp.lang.python newsgroup. 370MB was large enough to demonstrate
> > the problem. I don't know if a bug was ever reported.
>
> Do you still reproduce it on trunk?
> I've tried your scripts on my machine and they work fine, even if I
> leave garbage collecting enabled during the process.
> (dual core 64-bit machine but in 32-bit mode)

I'm afraid that sometime over the last year, I replaced my computer's 
motherboard, and now I can't reproduce the behaviour at all. I've tried 
two different boxes, with both Python 2.6.1 and 2.5.1.


-- 
Steven D'Aprano

From rocky at panix.com  Wed Dec 24 05:22:09 2008
From: rocky at panix.com (R. Bernstein)
Date: Tue, 23 Dec 2008 23:22:09 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
 code object a loader method and package file where the code comes from?
In-Reply-To: <49516650.7010005@gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<495103D3.9000505@gmail.com>
	<18769.6116.425537.968778@panix5.panix.com>
	<49516650.7010005@gmail.com>
Message-ID: <18769.47345.382346.169427@panix5.panix.com>

Nick Coghlan writes:
 > R. Bernstein wrote:
 > > Nick Coghlan writes:
 > >  > 3. Do what a number of standard library APIs (e.g. linecache) that
 > >  > accept filenames do and also accept an optional "module globals"
 > >  > argument. 
 > > 
 > > Actually, I did this and committed a change (to pydb) before posting
 > > any of these queries. ;-)
 > > 
 > > If "a number of standard library APIs" are doing the *same* thing,
 > > then shouldn't this be exposed as a common routine?
 > > 
 > > If on the other hand, by "a number" you mean "one" as in linecache --
 > > 1 *is* a number too! -- then perhaps the relevant code that is buried
 > > inside the "updatecache" should be exposed on its own.  (As a side
 > > benefit that code can be tested separately too!)
 > > 
 > > Should I file a feature request for this? 
 > 
 > The reason for my slightly odd phrasing is that all of the examples I
 > was originally going to mention (traceback, pdb, doctest, inspect)
 > actually all end up calling linecache to do the heavy lifting.
 > 
 > So it is possible that linecache.getlines() actually *is* the common
 > routine you're looking for 

I never asked about getting the text lines for the source code, no
matter how many times people suggest that as an alternative. :-)

Instead, I was asking about a common way to get information about the
source location for say a frame or traceback object (which might
include package name and type) and suggest that there should be a more
unambiguous way to display this information than seems to be in use at
present.

Part of the work of retrieving or displaying that information has to
do some of the same things that are done inside linecache.updatecache()
*before* it retrieves the lines of the source code (when
possible). And possibly it includes parts of what's done in
pieces of the inspect module.
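
To make the idea concrete, here is a rough sketch of the kind of
routine I mean, written for present-day CPython 3. Nothing like
describe_location exists in the standard library; the name and the
returned keys are invented for illustration, and sys._getframe() is a
CPython detail.

```python
import sys


def describe_location(frame):
    """Collect what a frame reveals about where its code came from."""
    module_globals = frame.f_globals
    loader = module_globals.get("__loader__")
    loader_type = None
    if loader is not None:
        # Some loaders are classes used directly rather than instances.
        loader_cls = loader if isinstance(loader, type) else type(loader)
        loader_type = loader_cls.__name__
    return {
        "filename": frame.f_code.co_filename,  # may be a pseudo-filename
        "line": frame.f_lineno,
        "function": frame.f_code.co_name,
        "module": module_globals.get("__name__"),
        "package": module_globals.get("__package__"),
        # The loader's type hints at the container: plain file, zip archive, ...
        "loader_type": loader_type,
    }


def where_am_i():
    return describe_location(sys._getframe())


info = where_am_i()
```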

 > - it just needs to be added to the
 > documentation and the __all__ attribute for linecache to be officially
 > supported. Currently, only the single line getline() function is
 > documented and exposed via __all__, but I don't see any reason for that
 > restriction - linecache.getlines() has been there with a stable API
 > since at least Python 2.5.
 > 
 > For cases where you have an appropriate Python object (i.e. a module,
 > function, method, class, traceback, frame or code object) rather than a
 > pseudo-filename, then inspect.getsource() actually jumps through a lot
 > of hoops to try to find the actual source code for that object - in
 > those cases, using the appropriate inspect function is generally a much
 > better idea than trying to interpret __file__ yourself.
 > 
 > Cheers,
 > Nick.

Thanks for the information. I will keep in mind those inspect routines. 

They will probably be helpful for another problem I had been
wondering about -- how one can determine whether there is no code
associated with a given line and file. (In other words, an invalid
location for a debugger line breakpoint, such as when the line is
part of a comment or an interior line of a string that spans many
lines.)
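
One possible approach, sketched for present-day CPython 3 and not
necessarily what inspect itself would offer: compile the source and
collect the line numbers at which bytecode starts. The helper name
lines_with_code and the sample source are invented for illustration.

```python
import dis
import types


def lines_with_code(source, filename="<example>"):
    """Return the set of line numbers on which some bytecode starts."""
    found = set()
    pending = [compile(source, filename, "exec")]
    while pending:
        code = pending.pop()
        for _offset, lineno in dis.findlinestarts(code):
            if lineno is not None:
                found.add(lineno)
        # Functions and classes are nested code objects stored as constants.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return found


SAMPLE = (
    "x = 1\n"             # line 1: a statement
    "# a comment\n"       # line 2: comment only
    'text = """first\n'   # line 3: start of a multi-line string
    "middle\n"            # line 4: interior line of that string
    'last"""\n'           # line 5: end of the string
    "def f():\n"          # line 6
    "    return x\n"      # line 7
)

valid = lines_with_code(SAMPLE)
```

A debugger could refuse a breakpoint on any line that is not in the
returned set.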

 > 
 > -- 
 > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
 > ---------------------------------------------------------------

From steve at holdenweb.com  Wed Dec 24 05:37:13 2008
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 23 Dec 2008 23:37:13 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
 code object a loader method and package file where the code comes from?
In-Reply-To: <18769.47345.382346.169427@panix5.panix.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>	<495103D3.9000505@gmail.com>	<18769.6116.425537.968778@panix5.panix.com>	<49516650.7010005@gmail.com>
	<18769.47345.382346.169427@panix5.panix.com>
Message-ID: <gise9r$pem$1@ger.gmane.org>

R. Bernstein wrote:
> Nick Coghlan writes:
>  > R. Bernstein wrote:
>  > > Nick Coghlan writes:
>  > >  > 3. Do what a number of standard library APIs (e.g. linecache) that
>  > >  > accept filenames do and also accept an optional "module globals"
>  > >  > argument. 
>  > > 
>  > > Actually, I did this and committed a change (to pydb) before posting
>  > > any of these queries. ;-)
>  > > 
>  > > If "a number of standard library APIs" are doing the *same* thing,
>  > > then shouldn't this be exposed as a common routine?
>  > > 
>  > > If on the other hand, by "a number" you mean "one" as in linecache --
>  > > 1 *is* a number too! -- then perhaps the relevant code that is buried
>  > > inside the "updatecache" should be exposed on its own.  (As a side
>  > > benefit that code can be tested separately too!)
>  > > 
>  > > Should I file a feature request for this? 
>  > 
>  > The reason for my slightly odd phrasing is that all of the examples I
>  > was originally going to mention (traceback, pdb, doctest, inspect)
>  > actually all end up calling linecache to do the heavy lifting.
>  > 
>  > So it is possible that linecache.getlines() actually *is* the common
>  > routine you're looking for 
> 
> I never asked about getting the text lines for the source code, no
> matter how many times people suggest that as an alternative. :-)
> 
> Instead, I was asking about a common way to get information about the
> source location for say a frame or traceback object (which might
> include package name and type) and suggest that there should be a more
> unambiguous way to display this information than seems to be in use at
> present.
> 
I agree. Since PEP 302 many parts of Python are rather too file-centric
for my liking. I noted almost four years ago, for example, that the
interpreter assumes that the os module will be imported from filestore
in order to set the prefix. This issue appears to have received no
attention since, and I'm certainly not the one with the best skills or
knowledge to solve this problem.

  http://bugs.python.org/issue1116520

> Part of the work of retrieving or displaying that information has to
> do some of the same things that are done inside linecache.updatecache()
> *before* it retrieves the lines of the source code (when
> possible). And possibly it includes parts of what's done in
> pieces of the inspect module.
> 
>  > - it just needs to be added to the
>  > documentation and the __all__ attribute for linecache to be officially
>  > supported. Currently, only the single line getline() function is
>  > documented and exposed via __all__, but I don't see any reason for that
>  > restriction - linecache.getlines() has been there with a stable API
>  > since at least Python 2.5.
>  > 
>  > For cases where you have an appropriate Python object (i.e. a module,
>  > function, method, class, traceback, frame or code object) rather than a
>  > pseudo-filename, then inspect.getsource() actually jumps through a lot
>  > of hoops to try to find the actual source code for that object - in
>  > those cases, using the appropriate inspect function is generally a much
>  > better idea than trying to interpret __file__ yourself.
>  > 
>  > Cheers,
>  > Nick.
> 
> Thanks for the information. I will keep in mind those inspect routines. 
> 
> They will probably be helpful for another problem I had been
> wondering about -- how one can determine whether there is no code
> associated with a given line and file. (In other words, an invalid
> location for a debugger line breakpoint, such as when the line is
> part of a comment or an interior line of a string that spans many
> lines.)
> 
Looks like the start of some necessary attention to this issue. The
inspect module might indeed offer the right facilities. I'm still
wondering what we do about the various prefix settings in an environment
where there are no filestore imports at all.

In the event I can assist feel free to rope me in.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From rocky at gnu.org  Wed Dec 24 06:03:37 2008
From: rocky at gnu.org (rocky at gnu.org)
Date: Wed, 24 Dec 2008 00:03:37 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a
	code object a loader method and package file where the code
	comes from?
In-Reply-To: <79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com>
References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>
	<79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com>
	<18768.63272.61558.985690@panix5.panix.com>
	<79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com>
	<18769.5024.990970.46864@panix5.panix.com>
	<79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com>
Message-ID: <18769.49833.290713.414067@panix5.panix.com>

Paul Moore writes:
 > 2008/12/23 R. Bernstein <rocky at panix.com>:
 > >  A use case I am thinking of here is in a stack trace or a
 > >  debugger, or a tool which wants to show in great detail, information
 > >  from a code object obtained possibly via a frame object.
 > 
 > Thanks for the clarifications. I see what you're after much better now.
 > 
 > > I find it kind of sucky to see in a traceback: "<string>" as opposed
 > > to the text (or prefix of the text) of the actual string that was
 > > passed. Or something that has been referred to as a "pseudo-file" like
 > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py
 > > when it is really member foo/bar.py of zipped egg
 > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.
 > 
 > Fair comment. That points to a "human readable" type of string. It's
 > not available at the moment, but I guess it could be.
 > 
 > But see below.
 > 
 > >  >     -  xxx.get_source_code(token) is a function (I don't know where,
 > >  > xxx is a placeholder for some "suitable" module) which, given such a
 > >  > token, returns the source, or None if there's no viable concept of
 > >  > "the source".
 > >
 > > There always is a viable concept of a source. It's whatever was done
 > > to get the code. For example, if it was via an eval then the source
 > > was the eval function and a string, same for exec. If it's via
 > > database access, well that then and some summary info about what's
 > > known about that.
 > 
 > Hmm, "source" colloquially, yes "bytecode loaded from ....\xxx.pyc",
 > for example. But not "source" in the sense of "source code". Some
 > applications run with only bytecode shipped, no source code available
 > at all.
 > 
 > > There are two problems. One is displaying location information in an
 > > unambiguous way -- the pseudo-file above is ambiguous and so is
 > > <string> since there's no guarantee that an OS won't allow a file named
 > > that. The second problem is programmatically getting information such
 > > as a debugger or an IDE might do so that the information can be
 > > conveyed back to a user who might want to inspect surrounding source
 > > code or modules.
 > 
 > This is more than you were asking for above.
 > 
 > The first problem is addressed with a "human readable" (narrative)
 > description, as above.
 > 
 > The second, however, requires machine-readable access to source code
 > (if it exists). That's what the loader get_source() call does for you.
 > But you have to be prepared for the fact that it may not be possible
 > to get source code, and decide what you want to happen in that case.

I'm missing your point here. 

When one uses information from a traceback, or is in a debugger, or is
in an IDE, it is assumed that in order to use the information given
you'll need access to the source code. And IDE's and debuggers have
had to deal with the fact that source code is not available from day
one, even before there was zipimporter.

In order to get the strings of source text that linecache.getlines()
gives, it has to prowl around for other information, possibly looking
for a loader along the protocol defined in PEP 302 and/or others. And
it's that information that a debugger, IDE or some tool of that ilk
might need.
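
As a rough illustration of that lookup, here is a minimal sketch. The
helper name source_lines is made up for this example; linecache.getlines()
and its optional module_globals argument are the real stdlib calls, and
passing the module's globals is what lets linecache fall back to a PEP 302
loader's get_source() when the file cannot be read directly:

```python
import linecache

def source_lines(module):
    """Hypothetical helper: return the source lines linecache can find
    for a module, or an empty list when no source is available."""
    filename = getattr(module, "__file__", None)
    if filename is None:
        return []  # e.g. a built-in module with no file at all
    # Passing the module's globals lets linecache consult __loader__
    # (and its get_source()) if the plain file read fails.
    return linecache.getlines(filename, module.__dict__)
```

An empty list is the "no source" case a tool still has to handle, which
is also what you get for bytecode-only installs.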

Many IDE's and debuggers nowadays open a socket and pass information
back and forth over that. An obvious advantage is that it means you
can debug remotely. But in order for this to work, some information is
generally passed back and forth regarding the location of the source
text. In the Java world and Eclipse for example, it is possible for
the jar to be in a different location from the one on the machine you
might be debugging on. And probably too often that jar isn't the same
one. So it is helpful in this kind of scenario to break out a location
into the name of a jar and the member inside the jar. Perhaps also
some information about that jar.
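
For the egg case, breaking a location out into container and member
might look like the sketch below. The function name and the list of
suffixes are assumptions for illustration, and it only inspects the
string; it does not check whether the egg is really a zip file rather
than an unpacked directory:

```python
def split_archive_location(pseudo_path, suffixes=(".egg", ".zip")):
    """Hypothetical helper: split 'archive/member' pseudo-paths.

    Returns (archive, member); member is None when no path component
    looks like an archive.
    """
    parts = pseudo_path.split("/")
    for index, part in enumerate(parts):
        if part.endswith(suffixes):
            archive = "/".join(parts[: index + 1])
            member = "/".join(parts[index + 1:])
            return archive, member
    return pseudo_path, None

archive, member = split_archive_location(
    "/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py"
)
```

With the two parts separate, a front end can look for its own copy of
the egg and then open the member inside it.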

It is possible that instead of passing around locations, debuggers and
such tools use get_source(), because that's what Python has to
offer.  :-)

I jest here, but honestly I've been surprised that there is no IDE
that I know of that in fact works this way. The machine running the
code clearly may have more accurate access to the source than a
front-end IDE. Undeterred by the harsh facts of reality, I have hope
that someday there *might* be an IDE that has provision for this. So
in a Ruby debugger (ruby-debug) one can request checksum information
on the files the debugger thinks are loaded in order to facilitate
checking that the source an IDE might be showing in fact matches
the source for that part of the code that is currently under
investigation.
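
A check of that kind could look something like the sketch below. This
is not how ruby-debug does it; the function names and the choice of
SHA-1 are assumptions made purely for illustration:

```python
import hashlib

def source_checksum(path):
    """Hypothetical helper: hex SHA-1 digest of a source file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()

def same_source(debugger_checksum, ide_path):
    """True if the IDE's local copy matches what the debugged process reports."""
    return source_checksum(ide_path) == debugger_checksum
```

The debugged process would send source_checksum() of its own file over
the socket, and the front end would compare that against its local copy
before showing any lines.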


 > 
 > >  > I hope this is of some help,
 > >
 > > Yes, thanks. At least I now have a clearer idea of the state of
 > > where things stand.
 > 
 > Good. Sorry it's not better news :-)
 > 
 > Paul
 > 

From skip at pobox.com  Thu Dec 25 16:41:54 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 25 Dec 2008 09:41:54 -0600
Subject: [Python-Dev] test message - please ignore
Message-ID: <18771.43458.868053.174950@montanaro-dyndns-org.local>

Merry Christmas everyone.  Still, just hit 'd'.  I'm testing the mpo spam
filter.

Skip

From list at qtrac.plus.com  Fri Dec 26 09:55:49 2008
From: list at qtrac.plus.com (Mark Summerfield)
Date: Fri, 26 Dec 2008 08:55:49 +0000
Subject: [Python-Dev] Python 3 - Mac Installer?
Message-ID: <200812260855.49518.list@qtrac.plus.com>

Hi,

Just wondered if/when there'd be a Mac installer for Python 3?

Thanks!

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Programming in Python 3" - ISBN 0137129297


From techtonik at gmail.com  Fri Dec 26 15:25:34 2008
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 26 Dec 2008 16:25:34 +0200
Subject: [Python-Dev] os.defpath for Windows
In-Reply-To: <494E0A2B.4080704@gmail.com>
References: <494E0A2B.4080704@gmail.com>
Message-ID: <d34314100812260625r223660cfkb2d8003fb6891791@mail.gmail.com>

I can't see any logical reason for that. There should not be such a
hack to avoid "magical bugs" when PATH is empty.

On Sun, Dec 21, 2008 at 11:19 AM, Yinon Ehrlich <yinon.me at gmail.com> wrote:
> Hi,
>
> just saw that os.defpath for Windows is defined as
>        Lib/ntpath.py:30:defpath = '.;C:\\bin'
>
> Most Windows machines I saw has no c:\bin directory.
>
> Any reason why it was defined this way ?
> Thanks,
>        Yinon
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com
>



-- 
--anatoly t.

From status at bugs.python.org  Fri Dec 26 18:07:11 2008
From: status at bugs.python.org (Python tracker)
Date: Fri, 26 Dec 2008 18:07:11 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20081226170711.DF02C78301@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (12/19/08 - 12/26/08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue 
number.  Do NOT respond to this message.


 2295 open (+38) / 14279 closed (+12) / 16574 total (+50)

Open issues with patches:   776

Average duration of open issues: 701 days.
Median duration of open issues: 2752 days.

Open Issues Breakdown
   open  2277 (+38)
pending    18 ( +0)

Issues Created Or Reopened (51)
_______________________________

IDLE Code Caching Windows                                        12/19/08
       http://bugs.python.org/issue4691    reopened amaury.forgeotdarc        
                                                                               

[PATCH] msvc9compiler raises IOError when no compiler found inst 12/19/08
       http://bugs.python.org/issue4702    created  pjenvey                   
       patch                                                                   

Syntax error in sample code for enumerate in documentation.      12/20/08
CLOSED http://bugs.python.org/issue4703    created  trenholmes                
                                                                               

Update pybench for python 3.0                                    12/20/08
       http://bugs.python.org/issue4704    created  marketdickinson           
       patch                                                                   

python3.0 -u: unbuffered stdout                                  12/20/08
       http://bugs.python.org/issue4705    created  haypo                     
                                                                               

try to build a C module, but don't worry if it doesn't work      12/20/08
       http://bugs.python.org/issue4706    created  zooko                     
                                                                               

round() shows undocumented behaviour                             12/20/08
       http://bugs.python.org/issue4707    created  dingo                     
       patch                                                                   

os.pipe should return inheritable descriptors (Windows)          12/21/08
       http://bugs.python.org/issue4708    created  castironpi                
                                                                               

Mingw-w64 and python on windows x64                              12/21/08
       http://bugs.python.org/issue4709    created  cdavid                    
       patch                                                                   

[PATCH] zipfile.ZipFile does not extract directories properly    12/21/08
       http://bugs.python.org/issue4710    created  faw                       
       patch                                                                   

Wide literals in the table of contents overflow in documentation 12/21/08
       http://bugs.python.org/issue4711    created  scottdial                 
                                                                               

Document pickle behavior for subclasses of dicts/lists           12/21/08
       http://bugs.python.org/issue4712    created  georg.brandl              
                                                                               

Installing sgmlop can crash xmlrpclib                            12/21/08
       http://bugs.python.org/issue4713    created  cito                      
       patch                                                                   

print opcode stats at the end of pybench runs                    12/21/08
       http://bugs.python.org/issue4714    created  pitrou                    
       patch                                                                   

optimize bytecode for conditional branches                       12/21/08
       http://bugs.python.org/issue4715    created  pitrou                    
       patch                                                                   

Python 3.0 halts on shutdown when settrace is set                12/22/08
       http://bugs.python.org/issue4716    created  fabioz                    
                                                                               

execfile conversion is not correct                               12/22/08
CLOSED http://bugs.python.org/issue4717    created  fabioz                    
                                                                               

wsgiref package totally broken                                   12/22/08
       http://bugs.python.org/issue4718    created  hdima                     
       patch                                                                   

sys.exc_clear() not flagged in any way                           12/22/08
CLOSED http://bugs.python.org/issue4719    created  fabioz                    
                                                                               

Extension function optional argument specification | causes Runt 12/22/08
CLOSED http://bugs.python.org/issue4720    created  pearu                     
                                                                               

pythonw.exe crash in GUI application(PythonWX)                   12/22/08
CLOSED http://bugs.python.org/issue4721    created  george                    
                                                                               

_winreg.QueryValue fault while reading mangled registry values   12/22/08
       http://bugs.python.org/issue4722    created  malicious.wizard          
                                                                               

os.path.basename error on directory names with numbers           12/22/08
CLOSED http://bugs.python.org/issue4723    created  kle_py                    
                                                                               

setting f_exc_traceback aborts in debug builds                   12/22/08
       http://bugs.python.org/issue4724    created  benjamin.peterson         
                                                                               

reporting file locations in egg (and other package) files        12/22/08
CLOSED http://bugs.python.org/issue4725    created  rocky                     
                                                                               

doctest gets line numbers wrong due to quotes in comments        12/22/08
       http://bugs.python.org/issue4726    created  guyer                     
       patch                                                                   

pickle/copyreg doesn't support keyword only arguments in __new__ 12/23/08
       http://bugs.python.org/issue4727    created  erickt                    
                                                                               

Endianness and universal builds problems                         12/23/08
       http://bugs.python.org/issue4728    created  cdavid                    
                                                                               

Documentation under 'pass' statement talks about exception very  12/23/08
CLOSED http://bugs.python.org/issue4729    created  orsenthil                 
                                                                               

cPickle corrupts high-unicode strings                            12/23/08
       http://bugs.python.org/issue4730    created  njs                       
                                                                               

suggest change to "Failed to find the necessary bits to build th 12/23/08
       http://bugs.python.org/issue4731    created  mkc                       
                                                                               

Object allocation stress leads to segfault on RHEL               12/23/08
       http://bugs.python.org/issue4732    created  ajg                       
                                                                               

Add a "decode to declared encoding" version of urlopen to urllib 12/23/08
       http://bugs.python.org/issue4733    created  ajaksu2                   
       patch                                                                   

broken link for 2.5.3 doc download                               12/24/08
CLOSED http://bugs.python.org/issue4734    created  quiver                    
                                                                               

An error occurred during the installation of assembly            12/24/08
       http://bugs.python.org/issue4735    created  rwpjr66                   
                                                                               

io.BufferedRWPair.closed broken; tries to call bool writer.close 12/24/08
CLOSED http://bugs.python.org/issue4736    created  semanticist               
                                                                               

documentation and noddy*.c                                       12/24/08
CLOSED http://bugs.python.org/issue4737    created  exe                       
                                                                               

Patch to make zlib-objects better support threads                12/24/08
       http://bugs.python.org/issue4738    created  ebfe                      
       patch                                                                   

[patch] Let users do help('@') and so on for confusing syntax co 12/24/08
       http://bugs.python.org/issue4739    created  alsuren                   
       patch                                                                   

pickle test for protocol 3 (HIGHEST_PROTOCOL in py3k)            12/24/08
       http://bugs.python.org/issue4740    created  ocean-city                
       patch, easy                                                             

winsound.SND_PURGE has no effect                                 12/24/08
CLOSED http://bugs.python.org/issue4741    created  Ultrasick                 
                                                                               

3.0 distutils byte-compiling -> Syntax error: unknown encoding:  12/24/08
       http://bugs.python.org/issue4742    created  sjmachin                  
                                                                               

intra-pkg multiple import (import local1, local2) not fixed      12/25/08
       http://bugs.python.org/issue4743    created  sjmachin                  
                                                                               

asynchat documentation needs to be more precise                  12/25/08
       http://bugs.python.org/issue4744    created  beazley                   
                                                                               

socket.send obscure error message                                12/25/08
       http://bugs.python.org/issue4745    created  Luther                    
                                                                               

Misguiding wording 3.0 c-api reference                           12/25/08
       http://bugs.python.org/issue4746    created  ebfe                      
                                                                               

SyntaxError executing a script containing non-ASCII characters i 12/26/08
       http://bugs.python.org/issue4747    created  gagenellina               
                                                                               

yield expression vs lambda                                       12/26/08
       http://bugs.python.org/issue4748    created  georg.brandl              
                                                                               

Issue with RotatingFileHandler logging handler on Windows        12/26/08
       http://bugs.python.org/issue4749    created  mramahi77                 
                                                                               

tarfile keeps excessive dir structure in compressed files        12/26/08
       http://bugs.python.org/issue4750    created  techtonik                 
       patch                                                                   

Patch for better thread support in hashlib                       12/26/08
       http://bugs.python.org/issue4751    created  ebfe                      
       patch                                                                   



Issues Now Closed (27)
______________________

ctypes function pointer enhancements                              349 days
       http://bugs.python.org/issue1797    haypo                     
       patch                                                                   

[distutils] - error when processing the "--formats=tar" option    340 days
       http://bugs.python.org/issue1885    techtonik                 
       patch                                                                   

IDLE "find in files" output not formatted optimally               206 days
       http://bugs.python.org/issue2996    loewis                    
       patch                                                                   

speedup some comparisons                                          190 days
       http://bugs.python.org/issue3106    pitrou                    
       patch                                                                   

Cannot start wsgiref simple server in Py3k                        163 days
       http://bugs.python.org/issue3348    pitrou                    
       patch                                                                   

create a numbits() method for int and long types                  148 days
       http://bugs.python.org/issue3439    marketdickinson           
       patch, needs review                                                     

wsgiref.simple_server fails to run demo_app                       107 days
       http://bugs.python.org/issue3795    pitrou                    
                                                                               

Tkinter cannot find Tcl/Tk on Mac OS X                             79 days
       http://bugs.python.org/issue4017    benjamin.peterson         
                                                                               

library.pdf - Section 17.6.4 Examples - Multiprocessing - Format   62 days
       http://bugs.python.org/issue4162    benjamin.peterson         
                                                                               

library/turtle.rst does not format properly in PDF mode            62 days
       http://bugs.python.org/issue4169    benjamin.peterson         
                                                                               

2to3 drops executable bit with --write                             15 days
       http://bugs.python.org/issue4602    benjamin.peterson         
       patch                                                                   

Add Mac OS X Disk Images to Python.org homepage                    10 days
       http://bugs.python.org/issue4627    benjamin.peterson         
                                                                               

test_bad_address in test_urllib2_localnet often fails               7 days
       http://bugs.python.org/issue4666    rpetrov                   
                                                                               

Typo in PyObjC URL on "GUI Programming on the Mac"                  3 days
       http://bugs.python.org/issue4689    loewis                    
                                                                               

UnicodeEncodeError in license()                                     0 days
       http://bugs.python.org/issue4700    amaury.forgeotdarc        
                                                                               

Syntax error in sample code for enumerate in documentation.         0 days
       http://bugs.python.org/issue4703    benjamin.peterson         
                                                                               

execfile conversion is not correct                                  0 days
       http://bugs.python.org/issue4717    benjamin.peterson         
                                                                               

sys.exc_clear() not flagged in any way                              0 days
       http://bugs.python.org/issue4719    benjamin.peterson         
                                                                               

Extension function optional argument specification | causes Runt    0 days
       http://bugs.python.org/issue4720    benjamin.peterson         
                                                                               

pythonw.exe crash in GUI application(PythonWX)                      0 days
       http://bugs.python.org/issue4721    loewis                    
                                                                               

os.path.basename error on directory names with numbers              0 days
       http://bugs.python.org/issue4723    loewis                    
                                                                               

reporting file locations in egg (and other package) files           1 days
       http://bugs.python.org/issue4725    loewis                    
                                                                               

Documentation under 'pass' statement talks about exception very     1 days
       http://bugs.python.org/issue4729    benjamin.peterson         
                                                                               

broken link for 2.5.3 doc download                                  0 days
       http://bugs.python.org/issue4734    loewis                    
                                                                               

io.BufferedRWPair.closed broken; tries to call bool writer.close    0 days
       http://bugs.python.org/issue4736    benjamin.peterson         
                                                                               

documentation and noddy*.c                                          0 days
       http://bugs.python.org/issue4737    benjamin.peterson         
                                                                               

winsound.SND_PURGE has no effect                                    2 days
       http://bugs.python.org/issue4741    georg.brandl              
                                                                               



Top Issues Most Discussed (10)
______________________________

 23 wsgiref package totally broken                                     4 days
open    http://bugs.python.org/issue4718   

 23 round() shows undocumented behaviour                               6 days
open    http://bugs.python.org/issue4707   

 10 range objects becomes hashable after attribute access              7 days
open    http://bugs.python.org/issue4701   

 10 Added clearerr() to clear EOF state                              613 days
open    http://bugs.python.org/issue1706039

  9 sys.exc_clear() not flagged in any way                             0 days
closed  http://bugs.python.org/issue4719   

  6 zipfile returns string but expects binary                         16 days
open    http://bugs.python.org/issue4621   

  5 Get rid of more refercenes to __cmp__                             71 days
open    http://bugs.python.org/issue1717   

  5 subprocess is not EINTR-safe                                    1500 days
open    http://bugs.python.org/issue1068268

  4 os.path.basename error on directory names with numbers             0 days
closed  http://bugs.python.org/issue4723   

  4 Permit to easily use distutils "--formats=tar,gztar,bztar" on a  340 days
open    http://bugs.python.org/issue1886   




From mikko+python at redinnovation.com  Fri Dec 26 22:46:14 2008
From: mikko+python at redinnovation.com (Mikko Ohtamaa)
Date: Fri, 26 Dec 2008 23:46:14 +0200
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <gu4vdtc7c8g.fsf@ee.oulu.fi.DONT_SPAM>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>
	<gu4vdtc7c8g.fsf@ee.oulu.fi.DONT_SPAM>
Message-ID: <7b5b293c0812261346v746a95e9r41751cf29c8d3c55@mail.gmail.com>

On Mon, Dec 22, 2008 at 12:09 PM, Erno Kuusela <erno at iki.fi> wrote:

>
> unexec probably work out of the box on symbian, but...:
>
> http://mail.python.org/pipermail/python-dev/2003-May/035727.html
>
>
unexec() is pretty much what I was looking for. However, it looks like an
old hack from the 80s and cannot be applied as is to a modern environment.
Basically unexec() dumps the running application code (not specific to any
interpreter) and data segments out as a.out binary.

1) Generating a binary file is not possible on Symbian and iPhone
environments, because all binaries must be signed - however, we can probably
use a generic stub exe which loads data segment only

2) a.out format is deprecated

3) Dynamic DLLs are not managed - basically a show stopper

I hope I can find someone with enough OS fu to tell whether this is
possible with DLLs at all and what data pointers must be patched on each
unexec() call.

-Mikko

From mikko+python at redinnovation.com  Fri Dec 26 22:56:13 2008
From: mikko+python at redinnovation.com (Mikko Ohtamaa)
Date: Fri, 26 Dec 2008 23:56:13 +0200
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <494D69D2.5090601@v.loewis.de>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>
	<494D69D2.5090601@v.loewis.de>
Message-ID: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>

>
>
>
> Of course, you still have the actual interpretation of
> the top-level module code - if it's not the marshalling
> but this part that actually costs performance, this
> efficient marshalling algorithm won't help. It would be
> interesting to find out which modules have a particularly
> high startup cost - perhaps they can be rewritten


I am afraid this is the case. I hope we could marshal an arbitrary
application state (not even Python specific) into a fast loading dump file
(hibernation/snapshot).

We have tried to use lazy importing as much as possible to distribute the
importing cost across the application UI states.

Off the top of my head I know at least two particular modules which could
be refactored. I'd recommend as the best practice that everything should be
imported lazily if possible. However, it looks like the Python community is
currently moving in another direction, since doing explicit imports in
__init__ etc. makes APIs cleaner (think Django) and debugging a saner task.
Python is mainly used on the server, and limited environments haven't been
particularly interesting until lately.

logging - defines lots of classes which are used only if they are specified
by logging options. I once hacked this for my personal use to be a little
lighter.

urllib - particularly heavy, imports httplib, ftplib and stuff even if it is
not used

Nokia has just released Python 2.5 based PyS60. I think we'll come back to
this after a while with a nice generic profiler which will tell the import cost.

Merry XMas,
-Mikko

From ncoghlan at gmail.com  Sat Dec 27 00:06:49 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Dec 2008 09:06:49 +1000
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>	<494D69D2.5090601@v.loewis.de>
	<7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>
Message-ID: <49556389.8080206@gmail.com>

Mikko Ohtamaa wrote:
> Out of my head I know at least two particular module which could be
> refactored. I'd recommend as the best practice that everything should be
> imported lazily if it's possible.

We actually have a reason for discouraging lazy imports - using them
carelessly makes it much easier to accidentally deadlock yourself on the
import lock.

I agree that this contributes to the problem of long startup times though.

One sledgehammer approach to lazy imports is to modify the actual import
system to use lazy imports by default, rather than having to explicitly
enable them in a given module or for each particular import.

Mercurial does this quite nicely by overriding the __import__
implementation [1].

Perhaps PyS60 could install something similar in site.py? The trade-off
will be whether enough time is saved in avoiding "wasted" module loads
to make up for the extra time spent managing the bookkeeping for the
lazy imports.
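
To make the bookkeeping concrete, here is a minimal sketch of the
general idea. It is not Mercurial's demandimport and it does not hook
__import__; the LazyModule class is a hypothetical stand-in that defers
the real import until the first attribute access:

```python
import importlib

class LazyModule:
    """Hypothetical proxy that imports the real module on first use."""

    def __init__(self, name):
        # Write through __dict__ so these never trigger __getattr__.
        self.__dict__["_name"] = name
        self.__dict__["_module"] = None

    def _load(self):
        if self._module is None:
            self.__dict__["_module"] = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for names not found on the proxy itself.
        return getattr(self._load(), attr)

colorsys = LazyModule("colorsys")   # nothing imported yet
hsv = colorsys.rgb_to_hsv(0, 0, 0)  # real import happens here
```

Every attribute access now pays for one extra indirection, which is the
overhead that has to be weighed against the module loads avoided.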

Cheers,
Nick.

[1] From a recent thread on Python-Ideas that Google found for me:
http://selenic.com/repo/index.cgi/hg-stable/file/967adcf5910d/mercurial/demandimport.py#l1

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From benjamin at python.org  Sat Dec 27 00:30:45 2008
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 26 Dec 2008 17:30:45 -0600
Subject: [Python-Dev] Python 3 - Mac Installer?
In-Reply-To: <200812260855.49518.list@qtrac.plus.com>
References: <200812260855.49518.list@qtrac.plus.com>
Message-ID: <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com>

On Fri, Dec 26, 2008 at 2:55 AM, Mark Summerfield <list at qtrac.plus.com> wrote:
> Hi,
>
> Just wondered if/when there'd be a Mac installer for Python 3?

I think there should be one eventually. Unfortunately, the 3.x build
process is not ironed out. If somebody wants to make a patch which
makes the build script in Mac/BuildScript/ work, I'd be very happy. :)

>
> Thanks!




-- 
Regards,
Benjamin Peterson

From skip at pobox.com  Sat Dec 27 00:40:51 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 26 Dec 2008 17:40:51 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
Message-ID: <18773.27523.297588.265405@montanaro-dyndns-org.local>

The doc for os.path.commonprefix states:

    Return the longest path prefix (taken character-by-character) that is a
    prefix of all paths in list. If list is empty, return the empty string
    (''). Note that this may return invalid paths because it works a
    character at a time.

I remember encountering this in an earlier version of Python 2.x (maybe 2.2
or 2.3?) and "fixed" it to work by pathname components instead of by
characters.  That had to be reverted because it was a behavior change and
broke code which used it for strings which didn't represent paths.  After
the reversion I then forgot about it.
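
A small sketch of the difference, using posixpath so the result does not
depend on the platform. The common_components helper is hypothetical and
only illustrates what a by-component version might do; it is not the
code that was reverted:

```python
import posixpath

paths = ["/usr/lib", "/usr/local"]

# Character-by-character: the result need not name a real directory.
char_prefix = posixpath.commonprefix(paths)

def common_components(paths):
    """Hypothetical helper: longest prefix in whole path components."""
    split = [p.split("/") for p in paths]
    common = []
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    return "/".join(common)

component_prefix = common_components(paths)
```

Here char_prefix is '/usr/l', while component_prefix is '/usr'. The
sketch ignores details such as normalisation and a bare root directory.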

I just stumbled upon it again.  It seems to me this would have been a good
thing to fix in 3.0.  Is this something which could change in 3.1 (or be
deprecated in 3.1 with deletion in 3.2)?

Skip


From martin at v.loewis.de  Sat Dec 27 00:52:28 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sat, 27 Dec 2008 00:52:28 +0100
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>	<494D69D2.5090601@v.loewis.de>
	<7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>
Message-ID: <49556E3C.7030903@v.loewis.de>

>     Of course, you still have the actual interpretation of
>     the top-level module code - if it's not the marshalling
>     but this part that actually costs performance, this
>     efficient marshalling algorithm won't help. It would be
>     interesting to find out which modules have a particularly
>     high startup cost - perhaps they can be rewritten
> 
> 
> I am afraid this is the case.

Is that an unfounded or a founded fear? IOW, do you have hard numbers
proving that it is the actual interpretation time (rather than the
marshaling time) that causes the majority of the startup cost?

> I hope we could marshal an arbitrary
> application state (not even Python specific) into a fast loading dump
> file (hibernation/snapshot).

I understand that this is what you want to get. I'm proposing that
there might be a different approach to achieve a similar speedup.

> logging - defines lots of classes which are used only if they are
> specified by logging options. I once hacked this for my personal use to
> be a little lighter.

So what speedup did you gain by rewriting it? (i.e. how many
microseconds did "import logging" take before, how much afterwards?)
How much of it was parsing/unmarshaling, and how much was byte
code interpretation? Of the byte code interpretation, what opcodes
in particular?

> urllib - particular heavy, imports httplib, ftplib and stuff even if it
> is not used

Same questions here. This doesn't sound like any heavy computation is
being done during startup.

> Nokia has just released Python 2.5 based PyS60. I think we'll come back
> to this after a while with a nice generic profiler which will tell the
> import cost.

Looking forward to hearing your numbers!

Regards,
Martin

From ncoghlan at gmail.com  Sat Dec 27 00:58:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 27 Dec 2008 09:58:07 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
Message-ID: <49556F8F.5090709@gmail.com>

skip at pobox.com wrote:
> The doc for os.path.commonprefix states:
> 
>     Return the longest path prefix (taken character-by-character) that is a
>     prefix of all paths in list. If list is empty, return the empty string
>     (''). Note that this may return invalid paths because it works a
>     character at a time.
> 
> I remember encountering this in an earlier version of Python 2.x (maybe 2.2
> or 2.3?) and "fixed" it to work by pathname components instead of by
> characters.  That had to be reverted because it was a behavior change and
> broke code which used it for strings which didn't represent paths.  After
> the reversion I then forgot about it.
> 
> I just stumbled upon it again.  It seems to me this would have been a good
> thing to fix in 3.0.  Is this something which could change in 3.1 (or be
> deprecated in 3.1 with deletion in 3.2)?

Why can't we add an "allow_fragment" keyword that defaults to True? Then
"allow_fragment=False" will stop at the last full directory name and
ignore any partial matches on the filenames or the next subdirectory
(depending on where the common prefix ends).
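
A minimal sketch of how such a keyword might behave, written as a wrapper
rather than a change to os.path itself. The wrapper name, the
"allow_fragment" handling and the '/'-only separator are all assumptions
made for illustration, not an agreed design.

```python
import posixpath


def commonprefix(paths, allow_fragment=True):
    """Hypothetical wrapper: character-wise prefix, optionally trimmed
    back to the last complete path component."""
    prefix = posixpath.commonprefix(paths)
    if allow_fragment or not prefix:
        return prefix
    # The prefix already ends on a component boundary if every path either
    # stops exactly there or continues with a separator.
    if all(len(p) == len(prefix) or p[len(prefix)] == '/' for p in paths):
        return prefix
    # Otherwise drop the partial component after the last separator.
    cut = prefix.rfind('/')
    if cut < 0:
        return ''
    return prefix[:cut] or '/'


fragment = commonprefix(['/usr/bin/ls', '/usr/bin/ln'])
whole = commonprefix(['/usr/bin/ls', '/usr/bin/ln'], allow_fragment=False)
```

With the default, existing callers see no change; only callers that pass
allow_fragment=False get the trimmed result.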

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From skip at pobox.com  Sat Dec 27 01:04:40 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 26 Dec 2008 18:04:40 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
Message-ID: <18773.28952.937116.329215@montanaro-dyndns-org.local>


    skip> I just stumbled upon it again.  It seems to me this would have
    skip> been a good thing to fix in 3.0.  Is this something which could
    skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)?

Hmmm...  I didn't really mean "deletion".  I meant, could a behavior change
be implemented in 3.2 with a warning emitted in 3.1?

Skip

From skip at pobox.com  Sat Dec 27 03:49:55 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 26 Dec 2008 20:49:55 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <49556F8F.5090709@gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
Message-ID: <18773.38867.117021.560152@montanaro-dyndns-org.local>


    Nick> Why can't we add an "allow_fragment" keyword that defaults to
    Nick> True? Then "allow_fragment=False" will stop at the last full
    Nick> directory name and ignore any partial matches on the filenames or
    Nick> the next subdirectory (depending on where the common prefix ends).

You could, I suppose, though that would just be adding another hack on top of
existing questionable behavior.  I wasn't so concerned with implementation
as whether or not a change to the semantics of the function was possible.

Skip


From skip at pobox.com  Sat Dec 27 05:03:03 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 26 Dec 2008 22:03:03 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
Message-ID: <18773.43255.196790.18980@montanaro-dyndns-org.local>


    skip> I just stumbled upon it again.  It seems to me this would have
    skip> been a good thing to fix in 3.0.  Is this something which could
    skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)?

This new issue in the tracker:

    http://bugs.python.org/issue4755

implements a commonpathprefix function.  As explained in the submission this
would be my second choice should it be decided that something should change.

Skip


From steve at pearwood.info  Sat Dec 27 07:37:20 2008
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Dec 2008 17:37:20 +1100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <49556F8F.5090709@gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
Message-ID: <200812271737.22101.steve@pearwood.info>

On Sat, 27 Dec 2008 10:58:07 am Nick Coghlan wrote:
> skip at pobox.com wrote:
> > The doc for os.path.commonprefix states:
> >
> >     Return the longest path prefix (taken character-by-character)
> > that is a prefix of all paths in list. If list is empty, return the
> > empty string (''). Note that this may return invalid paths because
> > it works a character at a time.
> >
> > I remember encountering this in an earlier version of Python 2.x
> > (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components
> > instead of by characters.  That had to be reverted because it was a
> > behavior change and broke code which used it for strings which
> > didn't represent paths.  After the reversion I then forgot about
> > it.
> >
> > I just stumbled upon it again.  It seems to me this would have been
> > a good thing to fix in 3.0.  Is this something which could change
> > in 3.1 (or be deprecated in 3.1 with deletion in 3.2)?
>
> Why can't we add an "allow_fragment" keyword that defaults to True?
> Then "allow_fragment=False" will stop at the last full directory name
> and ignore any partial matches on the filenames or the next
> subdirectory (depending on where the common prefix ends).

For what it's worth, I think that the two pieces of functionality are 
different enough that in an ideal world they should be two different 
functions rather than one function with a switch. I think 
os.path.commonprefix should only operate on path components, and if 
character-by-character prefix matching on general strings is useful, 
then it should be a string method.



-- 
Steven D'Aprano

From solipsis at pitrou.net  Sat Dec 27 17:10:20 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Dec 2008 16:10:20 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
Message-ID: <loom.20081227T160809-547@post.gmane.org>

<skip <at> pobox.com> writes:
> You could I suppose though that would just be adding another hack on top of
> existing questionable behavior.

Agreed. We should fix the original function so that it has the obvious, intended
effect. Leaving the buggy function in place and adding another function with the
proper behaviour sounds ridiculous.




From skip at pobox.com  Sat Dec 27 17:57:40 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 27 Dec 2008 10:57:40 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <loom.20081227T160809-547@post.gmane.org>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
Message-ID: <18774.24196.530208.708594@montanaro-dyndns-org.local>


    >> You could I suppose though that would just be adding another hack on
    >> top of existing questionable behavior.

    Antoine> Agreed. We should fix the original function so that it has the
    Antoine> obvious, intended effect. Leaving the buggy function in place
    Antoine> and adding another function with the proper behaviour sounds
    Antoine> ridiculous.

If we add commonpath or commonpathprefix or pathprefix, or whatever, then
find someplace to move the existing commonprefix function (maybe to the
string module or as a class method of string objects?) then could we make a
2to3 fixer for this?

Skip

From solipsis at pitrou.net  Sat Dec 27 18:07:15 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Dec 2008 17:07:15 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
Message-ID: <loom.20081227T170641-131@post.gmane.org>

<skip <at> pobox.com> writes:
> 
> If we add commonpath or commonpathprefix or pathprefix, or whatever, then
> find someplace to move the existing commonprefix function (maybe to the
> string module or as a class method of string objects?) then could we make a
> 2to3 fixer for this?

IMHO it's a bug, the py3k migration process needn't apply.




From ncoghlan at gmail.com  Sat Dec 27 21:44:00 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 06:44:00 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <loom.20081227T170641-131@post.gmane.org>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49556F8F.5090709@gmail.com>	<18773.38867.117021.560152@montanaro-dyndns-org.local>	<loom.20081227T160809-547@post.gmane.org>	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org>
Message-ID: <49569390.1080805@gmail.com>

Antoine Pitrou wrote:
> <skip <at> pobox.com> writes:
>> If we add commonpath or commonpathprefix or pathprefix, or whatever, then
>> find someplace to move the existing commonprefix function (maybe to the
>> string module or as a class method of string objects?) then could we make a
>> 2to3 fixer for this?
> 
> IMHO it's a bug, the py3k migration process needn't apply.

The current behaviour is exactly what one would need to implement
bash-style tab completion [1], so I don't get why anyone is calling it
"useless" or "obviously broken". Its brokenness isn't obvious at all to
me - it just doesn't do what you want it to do.

Adding a separate function called "os.path.commonpath" with the
behaviour Skip wants sounds like *exactly* the right answer to me.

Cheers,
Nick.

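[1]

import os

typed = ""  # text entered so far (placeholder value for illustration)
entries = os.listdir()
candidates = [e for e in entries if e.startswith(typed)]
if len(candidates) > 1:
  # several matches: complete as far as all candidates agree
  tab_result = os.path.commonprefix(candidates)
elif candidates:
  tab_result = candidates[0]
else:
  tab_result = typed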


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sat Dec 27 21:59:28 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Dec 2008 20:59:28 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49556F8F.5090709@gmail.com>	<18773.38867.117021.560152@montanaro-dyndns-org.local>	<loom.20081227T160809-547@post.gmane.org>	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org>
	<49569390.1080805@gmail.com>
Message-ID: <loom.20081227T204733-481@post.gmane.org>

Nick Coghlan <ncoghlan <at> gmail.com> writes:
> 
> The current behaviour is exactly what one would need to implement
> bash-style tab completion [1], so I don't get why anyone is calling it
> "useless" or "obviously broken".

Point taken. 
Although the fact that it lives in os.path suggests that the function should
know about path components instead of ignoring their existence... A generic
longest common prefix function would belong elsewhere.

The issue people are having with the proposal to create a separate function is
that it's a bloat of the API. I don't think the os.path module claims to give
utilities for implementing bash-style tab completion, however it is supposed to
make manipulation of paths easier -- which returning invalid answers (or, worse,
valid but intuitively wrong answers) does not.



From ncoghlan at gmail.com  Sun Dec 28 07:26:10 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 16:26:10 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <loom.20081227T204733-481@post.gmane.org>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49556F8F.5090709@gmail.com>	<18773.38867.117021.560152@montanaro-dyndns-org.local>	<loom.20081227T160809-547@post.gmane.org>	<18774.24196.530208.708594@montanaro-dyndns-org.local>	<loom.20081227T170641-131@post.gmane.org>	<49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org>
Message-ID: <49571C02.7090205@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> The current behaviour is exactly what one would need to implement
>> bash-style tab completion [1], so I don't get why anyone is calling it
>> "useless" or "obviously broken".
> 
> Point taken. 
> Although the fact that it lives in os.path suggests that the function should
> know about path components instead of ignoring their existence... A generic
> longest common prefix function would belong elsewhere.
> 
> The issue people are having with the proposal to create a separate function is
> that it's a bloat of the API. I don't think the os.path module claims to give
> utilities for implementing bash-style tab completion, however it is supposed to
> make manipulation of paths easier -- which returning invalid answers (or, worse,
> valid but intuitively wrong answers) does not.

True, but it's a matter of weighing up the migration cost of the two
options:

a) Add a new function (e.g. os.path.commonpath) which works on a path
component basis. Zero migration cost, minor ongoing cost in explaining
the difference between commonpath (with path component based semantics)
and commonprefix (with character based semantics). That ongoing cost can
largely be handled just by referencing the two functions from each
other's documentation (note that they will actually be next to each
other in the alphabetical list of os.path functions, and the path
component based one will appear before the character based one).

b) Deprecate the current semantics of os.path.commonprefix (which will
likely involve changing the name anyway, since it is easier to deprecate
the old semantics when the new semantics have a different name), add the
new path component based semantics, add the character-based semantics
back somewhere else. This imposes a major migration cost (since the old
commonprefix will at least change its name) with significant potential
for confusion due to the semantic changes across versions (if the
commonprefix name is reused for the new semantics).

If we're going to end up with two functions anyway, why mess with the
one which is already there and in use for real programs? Just add a new
function with the new semantics and be done with it. Anything else will
just cause migration pain without any significant counterbalancing benefit.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sun Dec 28 11:29:01 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Dec 2008 11:29:01 +0100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <49571C02.7090205@gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org>	<49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org> <49571C02.7090205@gmail.com>
Message-ID: <1230460141.6361.4.camel@localhost>

On Sunday 28 December 2008 at 16:26 +1000, Nick Coghlan wrote:
> If we're going to end up with two functions anyway, why mess with the
> one which is already there and in use for real programs?

Well, agreed.
I was just hoping we could get away with "fixing" the existing function
and voilà :)




From ncoghlan at gmail.com  Sun Dec 28 11:47:46 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 20:47:46 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1230460141.6361.4.camel@localhost>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49556F8F.5090709@gmail.com>	<18773.38867.117021.560152@montanaro-dyndns-org.local>	<loom.20081227T160809-547@post.gmane.org>	<18774.24196.530208.708594@montanaro-dyndns-org.local>	<loom.20081227T170641-131@post.gmane.org>	<49569390.1080805@gmail.com>	<loom.20081227T204733-481@post.gmane.org>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
Message-ID: <49575952.7070405@gmail.com>

Antoine Pitrou wrote:
> On Sunday 28 December 2008 at 16:26 +1000, Nick Coghlan wrote:
>> If we're going to end up with two functions anyway, why mess with the
>> one which is already there and in use for real programs?
> 
> Well, agreed.
> I was just hoping we could get away with "fixing" the existing function
> and voilà :)

I'm all for breaking backwards compatibility when it allows some genuine
improvements that would otherwise be impossible, but in this particular
case a little API bloat seems like the least of the available evils :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net  Sun Dec 28 11:51:48 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Dec 2008 10:51:48 +0000 (UTC)
Subject: [Python-Dev] Hello everyone + little question around
	Cpython/stackless
References: <49500B86.1070605@wanadoo.fr>
Message-ID: <loom.20081228T104805-946@post.gmane.org>


Hello,

> I'm currently studying all I can find on stackless python, PYPY and the 
> concepts they've brought to Python, and so far I wonder : since 
> stackless python claims to be 100% compatible with CPython's extensions, 
> faster, and brings lots of fun stuffs (tasklets, coroutines and no C 
> stack), how comes it hasn't been merged back, to become the standard 
> 'fast' python implementation ?

I'm not sure Stackless ever claimed to be faster than CPython for standard tasks
(i.e., not coroutine-related). Do you have any pointers to this?

As for coroutines, the greenlets (*) package is said to bring them to the
standard interpreter.

(*) http://codespeak.net/py/dist/greenlet.html

Regards

Antoine.



From ncoghlan at gmail.com  Sun Dec 28 13:09:44 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 22:09:44 +1000
Subject: [Python-Dev] Call PyType_Ready on builtin types during
	interpreter startup?
In-Reply-To: <494C9C08.5030702@gmail.com>
References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com>
Message-ID: <49576C88.30503@gmail.com>

Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Is there a specific reason for not fully initialising the builtin types?
>> Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?
> 
> I need to correct this slightly: some builtin types *are* initialised
> properly by _Py_ReadyTypes.
> 
> So the question is actually whether or not the missing builtin types
> should be added to that function.

I'm probably going to fix the specific problem with hashing of range
objects in Py3k just by initialising xrange/range properly in
_Py_ReadyTypes.

However, I wonder how many other builtin types have the same problem -
for example, the enumerate type is also missing a call to PyType_Ready:

Python 3.1a0 (py3k, Dec 14 2008, 21:35:11)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = enumerate([])
>>> hash(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'enumerate'
>>> enumerate.__name__ # implicit call to PyType_Ready
'enumerate'
>>> hash(x)
-1212398692

Rather than playing whack-a-mole with this, does anyone have any ideas
on how to systematically find types which are defined in the core, but
are missing an explicit PyType_Ready call? (I guess one way would be to
remove all the implicit calls in a local build and see what blows up...
that seems a little drastic though)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Sun Dec 28 13:46:29 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 22:46:29 +1000
Subject: [Python-Dev] Call PyType_Ready on builtin types during
	interpreter startup?
In-Reply-To: <49576C88.30503@gmail.com>
References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com>
	<49576C88.30503@gmail.com>
Message-ID: <49577525.1080800@gmail.com>

Nick Coghlan wrote:
> Rather than playing whack-a-mole with this, does anyone have any ideas
> on how to systematically find types which are defined in the core, but
> are missing an explicit PyType_Ready call? (I guess one way would be to
> remove all the implicit calls in a local build and see what blows up...
> that seems a little drastic though)

The whack-a-mole tactic did pick up a couple more though - the two
"builtin" types that iter() can return (the basic sequence iterator and
the callable with sentinel result iterator).

Perhaps the path of least resistance is to change PyObject_Hash to be
yet another place where PyType_Ready will be called implicitly if it
hasn't been called already?

That approach would get us back to the Python 2.x status quo where
calling PyType_Ready was only absolutely essential if you wanted to
correctly inherit a slot from a type other than object itself.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From eric at trueblade.com  Sun Dec 28 14:54:01 2008
From: eric at trueblade.com (Eric Smith)
Date: Sun, 28 Dec 2008 08:54:01 -0500
Subject: [Python-Dev] Call PyType_Ready on builtin types
 during	interpreter startup?
In-Reply-To: <49577525.1080800@gmail.com>
References: <494C1CE4.5080102@gmail.com>
	<494C9C08.5030702@gmail.com>	<49576C88.30503@gmail.com>
	<49577525.1080800@gmail.com>
Message-ID: <495784F9.8010008@trueblade.com>

Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Rather than playing whack-a-mole with this, does anyone have any ideas
>> on how to systematically find types which are defined in the core, but
>> are missing an explicit PyType_Ready call? (I guess one way would be to
>> remove all the implicit calls in a local build and see what blows up...
>> that seems a little drastic though)
> 
> The whack-a-mole tactic did pick up a couple more though - the two
> "builtin" types that iter() can return (the basic sequence iterator and
> the callable with sentinel result iterator).
> 
> Perhaps the path of least resistance is to change PyObject_Hash to be
> yet another place where PyType_Ready will be called implicitly if it
> hasn't been called already?

I think that's the best thing to do. It would bring PyObject_Hash in 
line with PyObject_Format, for example.

Eric.

From solipsis at pitrou.net  Sun Dec 28 20:11:55 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Dec 2008 19:11:55 +0000 (UTC)
Subject: [Python-Dev] Use -M option on buildbots?
Message-ID: <loom.20081228T191015-744@post.gmane.org>

Hi all,

Could we use the -M option (with a suitable value depending on the amount of
physical RAM) for regression tests on the buildbots? It would help avoid the
kind of situation described in http://bugs.python.org/issue3700

cheers

Antoine.



From martin at v.loewis.de  Sun Dec 28 21:21:25 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 28 Dec 2008 21:21:25 +0100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <49575952.7070405@gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49556F8F.5090709@gmail.com>	<18773.38867.117021.560152@montanaro-dyndns-org.local>	<loom.20081227T160809-547@post.gmane.org>	<18774.24196.530208.708594@montanaro-dyndns-org.local>	<loom.20081227T170641-131@post.gmane.org>	<49569390.1080805@gmail.com>	<loom.20081227T204733-481@post.gmane.org>	<49571C02.7090205@gmail.com>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
Message-ID: <4957DFC5.9030405@v.loewis.de>

> I'm all for breaking backwards compatibility when it allows some genuine
> improvements that would otherwise be impossible, but in this particular
> case a little API bloat seems like the least of the available evils :)

I don't think any change is necessary. os.path.commonprefix works just
fine on path components:

py> p = ["/usr/bin/ls", "/usr/bin/ln"]
py> os.path.commonprefix([f.split('/') for f in p])
['', 'usr', 'bin']
py> p.append("/usr/local/bin/ls")
py> os.path.commonprefix([f.split('/') for f in p])
['', 'usr']

Of course, using it that way would require a library function that
reliably splits a path into components; I think one would have to do
abspath on arbitrary inputs.
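
A minimal sketch of that approach follows. The helper names
split_components and common_components are hypothetical. The sketch uses
posixpath.normpath in place of abspath so that the result does not depend
on the current working directory; relative inputs are therefore left
relative, which is a simplification.

```python
import posixpath


def split_components(path):
    """Normalise a POSIX-style path and split it into its components.

    Hypothetical helper; normpath collapses '//' and '..' but, unlike
    abspath, does not make relative paths absolute.
    """
    return posixpath.normpath(path).split('/')


def common_components(paths):
    """Apply commonprefix to lists of components instead of to strings."""
    return posixpath.commonprefix([split_components(p) for p in paths])


two = common_components(['/usr/bin/ls', '/usr/bin/ln'])
three = common_components(['/usr/bin/ls', '/usr/bin/ln', '/usr/local/bin/ls'])
```

The result is a list of components; '/'.join(result) turns it back into a
path string, with the caveat that a result of [''] stands for the root
directory.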

Regards,
Martin

From rhamph at gmail.com  Sun Dec 28 21:59:07 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 28 Dec 2008 13:59:07 -0700
Subject: [Python-Dev] Call PyType_Ready on builtin types during
	interpreter startup?
In-Reply-To: <49576C88.30503@gmail.com>
References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com>
	<49576C88.30503@gmail.com>
Message-ID: <aac2c7cb0812281259t797aa7a1ye3f6c983faad7d49@mail.gmail.com>

On Sun, Dec 28, 2008 at 5:09 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Nick Coghlan wrote:
>> Nick Coghlan wrote:
>>> Is there a specific reason for not fully initialising the builtin types?
>>> Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?
>>
>> I need to correct this slightly: some builtin types *are* initialised
>> properly by _Py_ReadyTypes.
>>
>> So the question is actually whether or not the missing builtin types
>> should be added to that function.
>
> I'm probably going to fix the specific problem with hashing of range
> objects in Py3k just by initialising xrange/range properly in
> _Py_ReadyTypes.
>
> However, I wonder how many other builtin types have the same problem -
> for example, the enumerate type is also missing a call to PyType_Ready:
>
> Python 3.1a0 (py3k, Dec 14 2008, 21:35:11)
> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = enumerate([])
>>>> hash(x)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'enumerate'
>>>> enumerate.__name__ # implicit call to PyType_Ready
> 'enumerate'
>>>> hash(x)
> -1212398692
>
> Rather than playing whack-a-mole with this, does anyone have any ideas
> on how to systematically find types which are defined in the core, but
> are missing an explicit PyType_Ready call? (I guess one way would be to
> remove all the implicit calls in a local build and see what blows up...
> that seems a little drastic though)

What I did with safethread was replace the implicit calls with
assertions.  That with the test suite should pick everything up.


-- 
Adam Olsen, aka Rhamphoryncus

From skip at pobox.com  Mon Dec 29 00:01:52 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 28 Dec 2008 17:01:52 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <4957DFC5.9030405@v.loewis.de>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org> <49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org> <49571C02.7090205@gmail.com>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
Message-ID: <18776.1376.724926.669345@montanaro-dyndns-org.local>


    Martin> I don't think any change is necessary. os.path.commonprefix
    Martin> works just fine on path components:
    ...

Ummm...

    >>> os.path.commonprefix(["/export/home", "/etc/passwd"])
    '/e'

I suppose that's correct given the defined behavior of the function, but
it certainly doesn't seem to be very path-like to me.

    Martin> Of course, using it that way would require a library function
    Martin> that reliably splits a path into components; I think one would
    Martin> have to do abspath on arbitrary inputs.

See <http://bugs.python.org/issue4755> for what I think is a function with
more predictable behavior given that we are discussing paths and not just
strings.

Skip

From skip at pobox.com  Mon Dec 29 00:14:00 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 28 Dec 2008 17:14:00 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18776.1376.724926.669345@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org> <49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org> <49571C02.7090205@gmail.com>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
Message-ID: <18776.2104.439166.518935@montanaro-dyndns-org.local>

>>>>> "skip" == skip  <skip at pobox.com> writes:

    Martin> I don't think any change is necessary. os.path.commonprefix
    Martin> works just fine on path components:

    skip> Ummm...

    >>> os.path.commonprefix(["/export/home", "/etc/passwd"])
    '/e'

    skip> I suppose that's correct given the defined behavior of the
    skip> function, but it certainly doesn't seem to be very path-like to
    skip> me.

I should also point out that most people will not have the foresight to use
it the way Martin demonstrated.  Documentation or not, I'll bet a fair
fraction of all usage assumes the return value represents a valid path.

    Martin> Of course, using it that way would require a library function
    Martin> that reliably splits a path into components; I think one would
    Martin> have to do abspath on arbitrary inputs.

Kinda what I think os.path.split ought to do.  Should I tackle that next?
;-)

Skip


From martin at v.loewis.de  Mon Dec 29 00:16:22 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 29 Dec 2008 00:16:22 +0100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18776.1376.724926.669345@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org>
	<49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
Message-ID: <495808C6.4050304@v.loewis.de>

>     Martin> I don't think any change is necessary. os.path.commonprefix
>     Martin> works just fine on path components:
>     ...
> 
> Ummm...
> 
>     >>> os.path.commonprefix(["/export/home", "/etc/passwd"])
>     '/e'

This calls it with strings, not with path components. As I said, it
works fine for path components:

py> os.path.commonprefix([f.split('/') for f in ["/export/home",
"/etc/passwd"]])
['']

> See <http://bugs.python.org/issue4755> for what I think is a function with
> more predictable behavior given that we are discussing paths and not just
> strings.

See above: the function works for lists as well.
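
To make the string-versus-components distinction concrete, here is a minimal
sketch. It calls posixpath directly so the result does not depend on the
platform it runs on. The helper name common_components and the second pair of
paths are illustrative assumptions, not something taken from this thread.

```python
import posixpath


def common_components(paths):
    """Return the leading components shared by all POSIX-style paths."""
    # Split each path into components; a leading '/' yields an empty first item.
    return posixpath.commonprefix([p.split("/") for p in paths])


paths = ["/usr/lib/python", "/usr/local/lib"]
by_chars = posixpath.commonprefix(paths)   # compares character by character
by_parts = common_components(paths)        # compares whole components
print(by_chars)             # /usr/l
print("/".join(by_parts))   # /usr
```

The character-wise result stops in the middle of a directory name, while the
component-wise result can be joined back into a real directory.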

Regards,
Martin

From skip at pobox.com  Mon Dec 29 00:21:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 28 Dec 2008 17:21:11 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <495808C6.4050304@v.loewis.de>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49556F8F.5090709@gmail.com>
	<18773.38867.117021.560152@montanaro-dyndns-org.local>
	<loom.20081227T160809-547@post.gmane.org>
	<18774.24196.530208.708594@montanaro-dyndns-org.local>
	<loom.20081227T170641-131@post.gmane.org> <49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org> <49571C02.7090205@gmail.com>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
Message-ID: <18776.2535.459306.987378@montanaro-dyndns-org.local>


    >> See <http://bugs.python.org/issue4755> for what I think is a function
    >> with more predictable behavior given that we are discussing paths and
    >> not just strings.

    Martin> See above: the function works for lists as well.

But as you yourself pointed out, Python lacks a reliable split function for
filesystem paths.  The patch implements different versions for Windows and
other platforms because Python supports two separators on Windows.

Skip

From hall.jeff at gmail.com  Mon Dec 29 22:49:26 2008
From: hall.jeff at gmail.com (Jeff Hall)
Date: Mon, 29 Dec 2008 16:49:26 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18776.2535.459306.987378@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49569390.1080805@gmail.com> <loom.20081227T204733-481@post.gmane.org>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
Message-ID: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>

I think Nick's solution is "Don't let the best be the enemy of the good"

Had this been caught before the 3.0 release it might be a different solution

Let's just add a new function that works "correctly"

Martin, it seems to me that an os.path function shouldn't require me to pass
path components but instead should accept a "path" as its input (or in this
case multiple paths). The current usage feels like a string method to me. Not
saying it's not useful but it isn't "intuitive".

For those that prefer not to add functions all willy-nilly, would it not be
better to add a "delimiter" keyword that defaults to False? Then
"delimiter=False" will function with the current functionality unchanged
while

os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/")


would properly return

'bob/'

From Scott.Daniels at Acm.Org  Mon Dec 29 23:02:20 2008
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Mon, 29 Dec 2008 14:02:20 -0800
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org>	<49571C02.7090205@gmail.com>
	<1230460141.6361.4.camel@localhost>	<49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>	<18776.1376.724926.669345@montanaro-dyndns-org.local>	<495808C6.4050304@v.loewis.de>	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
Message-ID: <gjbhc2$24d$1@ger.gmane.org>

Jeff Hall wrote:
>... For those that prefer not to add functions all willy-nilly, would it not 
> be better to add a "delimiter" keyword that defaults to False? Then 
> "delimiter=False" will function with the current functionality unchanged 
> while
> 
> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = 
> "/")

The proper call should be:
os.path.commonprefix(["bob/example", "bob/etc/passwd"], delimiter=True)

and output:
        'bob'   (path to the common directory)

Perhaps even call the keyword arg "delimited," rather than "delimiter."
On Windows, I'd like to see:
   os.path.commonprefix(['a/b/c.d/e/f', r'a\b\c.d\eve'], delimited=True)
return either
      'a/b/c.d'
  or  r'a\b\c.d'
Perhaps even ['a', 'b', 'c.d'] (suitable for os.path.join).
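
A rough sketch of what such a keyword might do, written as a standalone
wrapper rather than a change to os.path. The function name common_prefix and
the "delimited" keyword are hypothetical (the keyword is only the proposal
discussed here, not an existing os.path API), and this version returns the
list-of-components form.

```python
import ntpath
import re

# Character class matching either Windows separator ('\\' or '/').
_SEPS = re.compile("[" + re.escape(ntpath.sep) + re.escape(ntpath.altsep) + "]+")


def common_prefix(paths, delimited=False):
    """Hypothetical wrapper; 'delimited' is the keyword proposed in this thread."""
    if not delimited:
        # Unchanged, character-wise behaviour.
        return ntpath.commonprefix(paths)
    # Compare whole components instead of characters.
    parts = [[c for c in _SEPS.split(p) if c] for p in paths]
    return ntpath.commonprefix(parts)


example = ["a/b/c.d/e/f", r"a\b\c.d\eve"]
print(common_prefix(example))                  # a
print(common_prefix(example, delimited=True))  # ['a', 'b', 'c.d']
```

Returning a list sidesteps the question of which separator to use when
rebuilding the path; the caller can pass it to os.path.join.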

--Scott David Daniels
Scott.Daniels at Acm.Org


From hall.jeff at gmail.com  Mon Dec 29 23:07:50 2008
From: hall.jeff at gmail.com (Jeff Hall)
Date: Mon, 29 Dec 2008 17:07:50 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <gjbhc2$24d$1@ger.gmane.org>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<gjbhc2$24d$1@ger.gmane.org>
Message-ID: <1bc395c10812291407i53edf28bv60af385f405df4a9@mail.gmail.com>

I was thinking that the user could just define the delimiter character due
to the differences amongst delimiters used in OS's... but if that isn't a
problem (Skip seemed to think it wouldn't be) then my solution is
functionally identical to the first one he proposed

From skip at pobox.com  Mon Dec 29 23:46:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 29 Dec 2008 16:46:01 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49569390.1080805@gmail.com> <loom.20081227T204733-481@post.gmane.org>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
Message-ID: <18777.21289.504321.865439@montanaro-dyndns-org.local>


    Jeff> For those that prefer not to add functions all willy-nilly, would
    Jeff> it not be better to add a "delimiter" keyword that defaults to
    Jeff> False? Then "delimiter=False" will function with the current
    Jeff> functionality unchanged while

    Jeff> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/")

    Jeff> would properly return

    Jeff> 'bob/'

On Windows what would you do with this crazy, but valid, path?

    c:/etc\\passwd

I don't do Windows, so don't have any idea if there is even an /etc/passwd
file on Windows.  I'd guess not, but that's not the point.  The point is
that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in
Windows pathnames.  See my patch (issue 4755) for a version of
os.path.<whatever> which works at least as I expect and should work
cross-platform.
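
For readers without a Windows machine, the two separators can be inspected
from any platform by importing ntpath directly. This is only a small
illustration; it tries the path above both with a single backslash and with a
literal doubled backslash, since the thread does not say which was meant.

```python
import ntpath

# Primary and alternate separators for Windows paths.
print(ntpath.sep, ntpath.altsep)   # \ /

single = "c:/etc\\passwd"     # one backslash (escaped in the literal)
doubled = r"c:/etc\\passwd"   # two backslashes (raw string)

# normpath rewrites '/' as '\' and collapses repeated separators.
for raw in (single, doubled):
    print(ntpath.normpath(raw))    # c:\etc\passwd
```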

Skip


From pje at telecommunity.com  Tue Dec 30 02:02:07 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 29 Dec 2008 20:02:07 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18777.21289.504321.865439@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<49569390.1080805@gmail.com>
	<loom.20081227T204733-481@post.gmane.org>
	<49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
Message-ID: <20081230010023.B46883A406C@sparrow.telecommunity.com>

You know, all this path separator and list complication isn't really 
necessary, when you can just take the os.path.dirname() of the return 
from commonprefix().

Perhaps we could just add that recommendation to the docs?


At 04:46 PM 12/29/2008 -0600, skip at pobox.com wrote:

>     Jeff> For those that prefer not to add functions all willy-nilly, would
>     Jeff> it not be better to add a "delimiter" keyword that defaults to
>     Jeff> False? Then "delimiter=False" will function with the current
>     Jeff> functionality unchanged while
>
>     Jeff> os.path.commonprefix(["bob/export/home", 
> "bob/etc/passwd"], delimiter = "/")
>
>     Jeff> would properly return
>
>     Jeff> 'bob/'
>
>On Windows what would you do with this crazy, but valid, path?
>
>     c:/etc\\passwd
>
>I don't do Windows, so don't have any idea if there is even an /etc/passwd
>file on Windows.  I'd guess not, but that's not the point.  The point is
>that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in
>Windows pathnames.  See my patch (issue 4755) for a version of
>os.path.<whatever> which works as at least I expect and should work
>cross-platform.
>
>Skip


From ncoghlan at gmail.com  Tue Dec 30 08:35:45 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Dec 2008 17:35:45 +1000
Subject: [Python-Dev] Commands for correctly merging to the python 3.0
	maintenance branch
Message-ID: <4959CF51.4030507@gmail.com>

Getting the svnmerge-integrated property right when merging
trunk->py3k->release30 is a little tricky. The most concise set of
instructions I have found which gets it right is to do the following in
the 3.0 maintenance branch after committing to the py3k branch:

svn update
svnmerge merge -r <py3k-rev>
svn revert .
svnmerge -M -F <py3k-rev>
<test change still works, etc>
svn commit -F svnmerge-commit-message.txt

Reverting any property changes on "." and running that second svnmerge line
is also useful if you do a "svn update" after the first svnmerge and get
a conflict on the svnmerge-integrated property. The -M option tells the
utility to only make the property changes, while the -F tells it to go
ahead despite the existence of local modifications.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From p.f.moore at gmail.com  Tue Dec 30 10:36:26 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 30 Dec 2008 09:36:26 +0000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081230010023.B46883A406C@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
Message-ID: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>

2008/12/30 Phillip J. Eby <pje at telecommunity.com>:
> You know, all this path separator and list complication isn't really
> necessary, when you can just take the os.path.dirname() of the return from
> commonprefix().
>
> Perhaps we could just add that recommendation to the docs?

Actually, consider the following (on Windows):

>python
Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"])
'foo'
>>>

This very clearly shows that commonprefix is a string operation rather
than a path operation, as it does not respect the equivalence of
os.sep and os.altsep. In path semantics, the common prefix is
"foo/bar" (or equivalently "foo\\bar").

I'm not sure how to deal with this, except by recommending that all
paths passed to os.path.commonprefix should at the very least be
normalised via os.path.normpath first - which starts to get clumsy
fast. So the "recommended" usage to get the common directory is

    paths = [...]
    common = os.path.dirname(os.path.commonprefix(
        [os.path.normpath(p) for p in paths]))

Hmm...
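
Spelled out as a runnable sketch, the recipe looks roughly like this. It uses
ntpath explicitly so it behaves the same on any platform; the helper name
common_dir is an assumption made for the example.

```python
import ntpath


def common_dir(paths):
    """Normalise separators, take the string prefix, then trim to a directory."""
    normalised = [ntpath.normpath(p) for p in paths]
    return ntpath.dirname(ntpath.commonprefix(normalised))


print(common_dir(["foo\\bar\\baz", "foo/bar/boink"]))   # foo\bar
```

The intermediate commonprefix result here is 'foo\\bar\\b', which is why the
final dirname step is needed.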

Paul.

From martin at v.loewis.de  Tue Dec 30 10:42:23 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 30 Dec 2008 10:42:23 +0100
Subject: [Python-Dev] Commands for correctly merging to the python 3.0
 maintenance branch
In-Reply-To: <4959CF51.4030507@gmail.com>
References: <4959CF51.4030507@gmail.com>
Message-ID: <4959ECFF.2010803@v.loewis.de>

> svn revert .
> svnmerge -M -F <py3k-rev>

[are you sure you don't need a command for svnmerge here?]

Instead of these two, I always do

  svn resolved .

Regards,
Martin

From skip at pobox.com  Tue Dec 30 13:14:24 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 30 Dec 2008 06:14:24 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
Message-ID: <18778.4256.215698.798495@montanaro-dyndns-org.local>


Paul demonstrates the shortcoming of commonprefix:

    >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"])
    'foo'

With the patch in issue4755:

    >>> import ntpath
    >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"])
    'foo\\bar'

Ta da ...
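
Note that commonpathprefix exists only in the issue4755 patch. For
comparison, Python 3.5 and later ship a separate function, os.path.commonpath,
that also compares whole components. The sketch below assumes such a Python
version and is not the patch itself.

```python
import ntpath
import posixpath

# Windows rules: '/' and '\' are treated as the same separator.
win = ntpath.commonpath(["foo\\bar\\baz", "foo/bar/boink"])
print(win)     # foo\bar

# POSIX rules: only the root is shared by these two paths.
posix = posixpath.commonpath(["/export/home", "/etc/passwd"])
print(posix)   # /
```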

Skip

From pje at telecommunity.com  Tue Dec 30 13:33:36 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Dec 2008 07:33:36 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18778.4256.215698.798495@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<18778.4256.215698.798495@montanaro-dyndns-org.local>
Message-ID: <20081230123153.252893A406C@sparrow.telecommunity.com>

At 06:14 AM 12/30/2008 -0600, skip at pobox.com wrote:

>Paul demonstrates the shortcoming of commonprefix:
>
>     >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"])
>     'foo'
>
>With the patch in issue4755:
>
>     >>> import ntpath
>     >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"])
>     'foo\\bar'

But it doesn't handle the fact that Windows paths are 
case-insensitive, or that Posix paths can have symlinks...  or that 
one path might be relative and another absolute...

As soon as you move away from being a string operation, you get an 
endless series of gotchas...  none of which are currently documented.
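
The case-sensitivity gotcha is easy to reproduce. The sketch below is only an
illustration with made-up paths; it folds case with ntpath.normcase before
comparing, and it does nothing about symlinks or about mixing relative and
absolute paths.

```python
import ntpath

paths = ["C:\\Temp\\a.txt", "c:/temp/b.txt"]

# Character-wise comparison sees 'C' != 'c' and gives up immediately.
naive = ntpath.commonprefix(paths)

# normpath unifies the separators; normcase additionally lower-cases the string.
folded = [ntpath.normcase(ntpath.normpath(p)) for p in paths]
shared = ntpath.dirname(ntpath.commonprefix(folded))

print(repr(naive))    # ''
print(repr(shared))   # 'c:\\temp'
```

The price of folding is that the result no longer preserves the case of
either input.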


From doomster at knuut.de  Tue Dec 30 13:18:51 2008
From: doomster at knuut.de (Ulrich Eckhardt)
Date: Tue, 30 Dec 2008 13:18:51 +0100
Subject: [Python-Dev] WinCE port (issues #4075 #4051)
Message-ID: <200812301318.51367.doomster@knuut.de>

Hi!

I'm currently working again on the CE port, and since 2.6 and 3.0 are now out 
of the door, could you apply the patches in #4075 & #4051? Both patches are 
fairly isolated and easy to review and I'm pretty sure they won't cause any 
inconveniences.

Note: this is far from everything that is necessary for Python to rock on CE, 
but these are prerequisites, as explained in both bugs' histories.

thanks

Uli

From skip at pobox.com  Tue Dec 30 13:58:11 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 30 Dec 2008 06:58:11 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081230123153.252893A406C@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<18778.4256.215698.798495@montanaro-dyndns-org.local>
	<20081230123153.252893A406C@sparrow.telecommunity.com>
Message-ID: <18778.6883.440406.175729@montanaro-dyndns-org.local>


    Phillip> But it doesn't handle the fact that Windows paths are
    Phillip> case-insensitive, or that Posix paths can have symlinks...  or
    Phillip> that one path might be relative and another absolute...

    Phillip> As soon as you move away from being a string operation, you get
    Phillip> an endless series of gotchas...  none of which are currently
    Phillip> documented.

Well, then we can document (some of?) the gotchas* and work on a better
implementation of commonpathprefix.  I don't do Windows.  You're lucky I got
as far as I did with the Windows side of things. ;-)

Skip

* I would argue that symlinks should be transparent.  By the very nature of
  the operations and the fact that they might be performed on other
  platforms (import posixpath on Windows for instance) there is not much, if
  anything, you can infer about the paths themselves other than their
  structure.


From ncoghlan at gmail.com  Tue Dec 30 15:19:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Dec 2008 00:19:08 +1000
Subject: [Python-Dev] Commands for correctly merging to the python 3.0
 maintenance branch
In-Reply-To: <4959ECFF.2010803@v.loewis.de>
References: <4959CF51.4030507@gmail.com> <4959ECFF.2010803@v.loewis.de>
Message-ID: <495A2DDC.6080401@gmail.com>

Martin v. Löwis wrote:
>> svn revert .
>> svnmerge -M -F <py3k-rev>
> 
> [are you sure you don't need a command for svnmerge here?]

D'oh, I thought I fixed that before sending the message. Yes, that line
should indeed be:

svnmerge merge -M -F <py3k-rev>

> Instead of these two, I always do
> 
>   svn resolved .

That's what I had been doing before today, and I believe it works
correctly so long as you never get the svn update and svnmerge merge
operations out of sequence (i.e. always update and only then merge).

However, I encountered the case today where I had already merged to the
maintenance branch and did the svn update afterwards. In that situation,
reverting the property changes and reapplying them was the only way for
me to avoid losing the record of the changes everyone else had already
merged.

If I hadn't checked the property diff and noticed that several merged
revisions were no longer listed in the property in my working copy, then
svnmerge may have become very confused. The
revert+redo-merge-bookkeeping approach is definitely slower than just
marking the conflict as resolved, but has a definite advantage in doing
the right thing even if the earlier update+merge operations were
performed out of sequence (or if an extra update becomes necessary due
to checkins after the merge was performed).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com  Tue Dec 30 22:20:25 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Dec 2008 07:20:25 +1000
Subject: [Python-Dev] test_subprocess and sparc buildbots
Message-ID: <495A9099.1030907@gmail.com>

Does anyone have local access to a sparc machine to try to track down
the ongoing buildbot failures in test_subprocess?

(I think the problem is specific to 3.x builds on sparc machines, but I
haven't checked the buildbots all that closely - that assessment is just
based on what I recall of the buildbot failure emails).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From barry at barrys-emacs.org  Tue Dec 30 22:45:44 2008
From: barry at barrys-emacs.org (Barry Scott)
Date: Tue, 30 Dec 2008 21:45:44 +0000
Subject: [Python-Dev] Python 3 - Mac Installer?
In-Reply-To: <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com>
References: <200812260855.49518.list@qtrac.plus.com>
	<1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com>
Message-ID: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org>


On 26 Dec 2008, at 23:30, Benjamin Peterson wrote:

> On Fri, Dec 26, 2008 at 2:55 AM, Mark Summerfield  
> <list at qtrac.plus.com> wrote:
>> Hi,
>>
>> Just wondered if/when there'd be a Mac installer for Python 3?
>
> I think there should be one eventually. Unfortunately, the 3.x build
> process is not ironed out. If somebody wants to make a patch which
> makes the build script in Mac/BuildScript/ work, I'd be very happy. :)

Since I've been building 3.0 for a while now I looked at the script.

build-install.py seems to have been half converted to py 3.0.
Going full 3.0 was not hard but then there is the problem of
the imports.

Python 3.0 does not have MacOS or Carbon modules.

Seems that there are two ways to go.

Put back the Carbon and MacOS modules into 3.0.
Use Python 2 to build the python 3 package.

Barry


From benjamin at python.org  Tue Dec 30 22:59:51 2008
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 30 Dec 2008 15:59:51 -0600
Subject: [Python-Dev] Python 3 - Mac Installer?
In-Reply-To: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org>
References: <200812260855.49518.list@qtrac.plus.com>
	<1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com>
	<74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org>
Message-ID: <1afaf6160812301359r36d3b5b9k98afb21b517a69ce@mail.gmail.com>

On Tue, Dec 30, 2008 at 3:45 PM, Barry Scott <barry at barrys-emacs.org> wrote:
>
> build-install.py seems to have been half converted to py 3.0.
> Going full 3.0 was not hard but then there is the problem of
> the imports.

Thanks for your help, but just today Ronald Oussoren, the Mac
maintainer, spent some time making the installer work. As a result, we
should be ready to go for 3.0.1!

>
> Python 3.0 does not have MacOS or Carbon modules.
>
> Seems that there are two ways to go.
>
> Put back the Carbon and MacOS modules into 3.0.
> Use Python 2 to build the python 3 package.

I've converted it back to 2.x for the time being. Eventually, I think
some 3.x bindings should be released.



-- 
Regards,
Benjamin Peterson

From Scott.Daniels at Acm.Org  Tue Dec 30 23:32:02 2008
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Tue, 30 Dec 2008 14:32:02 -0800
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com>	<4957DFC5.9030405@v.loewis.de>	<18776.1376.724926.669345@montanaro-dyndns-org.local>	<495808C6.4050304@v.loewis.de>	<18776.2535.459306.987378@montanaro-dyndns-org.local>	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>	<18777.21289.504321.865439@montanaro-dyndns-org.local>	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
Message-ID: <gje7fm$rgi$1@ger.gmane.org>

Paul Moore wrote:
> 2008/12/30 Phillip J. Eby <pje at telecommunity.com>:
>> You know, all this path separator and list complication isn't really
>> necessary, when you can just take the os.path.dirname() of the return from
>> commonprefix()....
> 
> Actually, consider: ...
>>>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"])
> 'foo'
> 
> ... I'm not sure how to deal with this, except by recommending that all
> paths passed to os.path.commonprefix should at the very least be
> normalised via os.path.normpath first - which starts to get clumsy
> fast. So the "recommended" usage to get the common directory is
> 
>     paths = [...]
>     common = os.path.dirname(os.path.commonprefix([
>                    os.path.normpath(p) for p in paths]))


More trouble with the "just take the dirname":

     paths = ['/a/b/c', '/a/b/d', '/a/b']
     os.path.dirname(os.path.commonprefix([
                         os.path.normpath(p) for p in paths]))

gives '/a', not '/a/b'.
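
A small sketch that reproduces this with posixpath, next to a component-wise
comparison for contrast. Which of the two answers is the desired one depends
on whether '/a/b' is meant as a directory or as an entry inside '/a'; the code
only shows the difference.

```python
import posixpath

paths = ["/a/b/c", "/a/b/d", "/a/b"]

# String prefix followed by dirname, as in the recipe above.
prefix = posixpath.commonprefix([posixpath.normpath(p) for p in paths])
via_dirname = posixpath.dirname(prefix)

# Component-wise comparison of the same paths.
parts = posixpath.commonprefix([p.split("/") for p in paths])
via_parts = "/".join(parts)

print(prefix)        # /a/b
print(via_dirname)   # /a
print(via_parts)     # /a/b
```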

--Scott David Daniels
Scott.Daniels at Acm.Org


From pje at telecommunity.com  Tue Dec 30 23:51:48 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Dec 2008 17:51:48 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <gje7fm$rgi$1@ger.gmane.org>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
Message-ID: <20081230225006.B5D043A405E@sparrow.telecommunity.com>

At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>More trouble with the "just take the dirname":
>
>     paths = ['/a/b/c', '/a/b/d', '/a/b']
>     os.path.dirname(os.path.commonprefix([
>                         os.path.normpath(p) for p in paths]))
>
>give '/a', not '/a/b'.

...because that's the correct answer.


From jcea at jcea.es  Wed Dec 31 01:08:38 2008
From: jcea at jcea.es (Jesus Cea)
Date: Wed, 31 Dec 2008 01:08:38 +0100
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
 dict (python 2.5.2)
In-Reply-To: <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>	<18765.21740.137339.943481@montanaro-dyndns-org.local>	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>	<3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>	<3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com>
	<3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com>
Message-ID: <495AB806.7050603@jcea.es>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike Coleman wrote:
> I guess if ints are 12 bytes (per Beazley's book, but not sure if that
> still holds), then that would correspond to a 1GB reduction.

Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07)
[GCC 4.2.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(0)
12

- --
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQCVAwUBSVq3+Zlgi5GaxT1NAQLYUAP+Jc0JPYf2GPdNCKypORO+mD887xs81hQ0
MM7QBbRgLflcQ6g2tijpWPhN2/INscbtFn41lptHEYFTv/kka9EICuxgoNP1COYT
Or+1uChnSsx1Z7Xxr8YwLFe6ZW/LDyvPjCMpIT32mGSlc1/mfPZk3WjpqTJPeCwY
vqu9xD0T0iw=
=gXQ5
-----END PGP SIGNATURE-----

From solipsis at pitrou.net  Wed Dec 31 01:40:02 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 31 Dec 2008 00:40:02 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G)
	dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com>	<930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local>	<3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com>	<494D4FD0.4020202@egenix.com>	<cc7430500812201220j3a444f5fr7bbb43bbdd2c37e2@mail.gmail.com>	<18765.21740.137339.943481@montanaro-dyndns-org.local>	<cc7430500812201301n3c522886o1ca4ca03b38bb665@mail.gmail.com>	<3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com>	<3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com>
	<3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com>
	<495AB806.7050603@jcea.es>
Message-ID: <loom.20081231T003849-329@post.gmane.org>

Jesus Cea <jcea <at> jcea.es> writes:
> 
> Mike Coleman wrote:
> > I guess if ints are 12 bytes (per Beazley's book, but not sure if that
> > still holds), then that would correspond to a 1GB reduction.
> 
> Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07)
> [GCC 4.2.3] on sunos5
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.getsizeof(0)
> 12

On a 32-bit system, sure, but given Mike creates a 45 GB dict, he has a 64-bit
system, where ints are 24 bytes:

>>> sys.getsizeof(0)
24
>>> sys.getsizeof(100000)
24
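
To check which case applies on a given interpreter, something like the
following sketch may help. The helper name int_overhead_bytes is made up
for illustration, and the sizes reported by sys.getsizeof vary between
Python versions and builds, so no particular output is implied:

```python
import struct
import sys

# Pointer width of the running interpreter: 32 or 64 bits.
pointer_bits = struct.calcsize("P") * 8


def int_overhead_bytes(count, sample=0):
    """Rough estimate of the memory used by `count` int objects,
    assuming each one is as large as `sample` (hypothetical helper)."""
    return count * sys.getsizeof(sample)


print("%d-bit build, sys.getsizeof(0) = %d bytes"
      % (pointer_bits, sys.getsizeof(0)))
print("approx. bytes for 10**6 ints: %d" % int_overhead_bytes(10 ** 6))
```

This only counts the int objects themselves, not the dict slots that
reference them.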

cheers

Antoine.



From victor.stinner at haypocalc.com  Wed Dec 31 01:49:32 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 31 Dec 2008 01:49:32 +0100
Subject: [Python-Dev] Missing FAQ about Python3 and unicode
Message-ID: <200812310149.32686.victor.stinner@haypocalc.com>

Hi,

We are slowly getting recurring questions about Python3 and unicode. Maybe
it's time to start a FAQ? Here is an ugly draft to start it ;-)


(1) Exit on undecodable command line arguments

   $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
   Could not convert argument 2 to string$

Is this expected behaviour? Yes!

Example of the question: http://bugs.python.org/issue3023


(2) Undecodable filenames

os.listdir(str)->str raises an exception on undecodable filenames.

Solution: use os.listdir(bytes)->bytes. To display the filename to the user, 
use a function like:

   import sys
   def humanFilename(filename):
      # filename is bytes (from os.listdir(bytes)); decode it for display
      encoding = sys.getfilesystemencoding()
      return filename.decode(encoding, "replace")

See also http://bugs.python.org/issue3187
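
A slightly fuller sketch of the same idea, listing the current directory as
bytes and decoding each name only for display. The function name
human_filename is just an illustrative spelling, and the "replace" error
handler is one possible choice among several:

```python
import os
import sys


def human_filename(filename):
    """Decode a bytes filename for display; undecodable bytes are
    replaced instead of raising an exception."""
    encoding = sys.getfilesystemencoding()
    return filename.decode(encoding, "replace")


# Passing bytes to os.listdir() gives bytes names back, so every entry
# is returned even if it cannot be decoded.
for raw_name in os.listdir(b"."):
    print(human_filename(raw_name))
```

The bytes name should still be the one passed to open() or os.stat(); the
decoded string is only meant for the user.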


(3) Bytes environment variables

Python 3.0 only supports decodable variables for os.environ. Undecodable 
variables are skipped for the creation of os.environ but original variables 
still exist at the C level.

$ A=$(echo -e "\xff") B=c ./python
Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49)
>>> import os
>>> os.environ.get('A'), os.environ.get('B')
(None, 'c')
>>> retcode=os.system('echo -n $A|hexdump -C')
00000000  ff                                                |.|
00000001
>>> retcode=os.system('echo -n $B|hexdump -C')
00000000  63                                                |c|
00000001

Discussion to support bytes environment variables:
http://mail.python.org/pipermail/python-dev/2008-December/083856.html
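
For reference, later Python 3 releases (3.2 and newer) expose the raw values
through os.environb on platforms where os.supports_bytes_environ is true.
This is not available in 3.0. A minimal sketch, where get_raw_env is a
hypothetical helper and variable names are assumed to be ASCII:

```python
import os


def get_raw_env(name):
    """Return the value of environment variable `name` (bytes) as bytes,
    or None if it is not set (hypothetical helper)."""
    if os.supports_bytes_environ:
        # POSIX: the bytes view of the environment, no decoding involved.
        return os.environb.get(name)
    # Elsewhere: fall back to the str mapping and encode the result.
    value = os.environ.get(name.decode("ascii"))
    return None if value is None else os.fsencode(value)


print(get_raw_env(b"HOME"))
```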

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From victor.stinner at haypocalc.com  Wed Dec 31 01:55:40 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 31 Dec 2008 01:55:40 +0100
Subject: [Python-Dev] I would like an svn account
Message-ID: <200812310155.40206.victor.stinner@haypocalc.com>

Hi,

I already asked in September to get an svn account to be able to commit 
directly patches to trunk (or other branches like py3k). My query was 
rejected because I didn't know Python core enough (and maybe other reasons 
that I don't know).

I helped to fix many issues using the bug tracker. The bigger patch was the 
bytes filename support for Python3, accepted by Guido (after a long 
review ;-)).

Why an svn account instead of just using the amazing bug tracker? Just because 
there are not enough people to review/commit patches on the tracker and so 
there are more and more open issues (and so more and more lost patches) :-( I 
will be able to work faster using the svn.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ncoghlan at gmail.com  Wed Dec 31 02:30:06 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Dec 2008 11:30:06 +1000
Subject: [Python-Dev] I would like an svn account
In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com>
References: <200812310155.40206.victor.stinner@haypocalc.com>
Message-ID: <495ACB1E.1020300@gmail.com>

Victor Stinner wrote:
> Hi,
> 
> I already asked in September to get an svn account to be able to commit 
> directly patches to trunk (or other branches like py3k). My query was 
> rejected because I didn't know Python core enough (and maybe other reasons 
> that I don't know).
> 
> I helped to fix many issues using the bug tracker. The bigger patch was the 
> bytes filename support for Python3, accepted by Guido (after a long 
> review ;-)).
> 
> Why an svn account instead of just using the amazing bug tracker? Just because 
> there are not enough people to review/commit patches on the tracker and so 
> there are more and more open issues (and so more and more lost patches) :-( I 
> will be able to work faster using the svn.

+1 here

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From jnoller at gmail.com  Wed Dec 31 02:47:55 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 30 Dec 2008 20:47:55 -0500
Subject: [Python-Dev] I would like an svn account
In-Reply-To: <495ACB1E.1020300@gmail.com>
References: <200812310155.40206.victor.stinner@haypocalc.com>
	<495ACB1E.1020300@gmail.com>
Message-ID: <42758A82-7A0D-4079-889A-EE5D618E76C0@gmail.com>



On Dec 30, 2008, at 8:30 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Victor Stinner wrote:
>> Hi,
>>
>> I already asked in September to get an svn account to be able to  
>> commit
>> directly patches to trunk (or other branches like py3k). My query was
>> rejected because I didn't know Python core enough (and maybe other  
>> reasons
>> that I don't know).
>>
>> I helped to fix many issues using the bug tracker. The bigger patch  
>> was the
>> bytes filename support for Python3, accepted by Guido (after a long
>> review ;-)).
>>
>> Why an svn account instead of just using the amazing bug tracker?  
>> Just because
>> there are not enough people to review/commit patches on the tracker  
>> and so
>> there are more and more open issues (and so more and more lost  
>> patches) :-( I
>> will be able to work faster using the svn.
>
> +1 here
>
> Cheers,
> Nick.
>
>

Also +1 FWIW

Jesse

From rdmurray at bitdance.com  Wed Dec 31 03:30:21 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Tue, 30 Dec 2008 21:30:21 -0500 (EST)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081230225006.B5D043A405E@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
Message-ID: <Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>

On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>> More trouble with the "just take the dirname":
>>
>>      paths = ['/a/b/c', '/a/b/d', '/a/b']
>>      os.path.dirname(os.path.commonprefix([
>>                          os.path.normpath(p) for p in paths]))
>> 
>> give '/a', not '/a/b'.
>
> ...because that's the correct answer.

But not the answer that is wanted.

So the challenge now is to write a single expression that will yield
'/a/b' when passed the above paths list, and also produce '/a/b' when
passed the following paths list:

     paths = ['/a/b/c', '/a/b/cd']

--RDM

From alexandre at peadrop.com  Wed Dec 31 03:37:01 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Tue, 30 Dec 2008 21:37:01 -0500
Subject: [Python-Dev] test_subprocess and sparc buildbots
In-Reply-To: <495A9099.1030907@gmail.com>
References: <495A9099.1030907@gmail.com>
Message-ID: <acd65fa20812301837v7b207fe0na8868e6d075eb609@mail.gmail.com>

Here is what I found just by analyzing the logs. It seems the first
failures appeared after this change:

http://svn.python.org/view/python/branches/release30-maint/Objects/object.c?rev=67888&view=diff&r1=67888&r2=67887&p1=python/branches/release30-maint/Objects/object.c&p2=/python/branches/release30-maint/Objects/object.c

The logs of failing test runs all show the same error message:

[31481 refs]
* ob
object  : <refcnt 0 at 0x3a97728>
type    : str
refcount: 0
address : 0x3a97728
* op->_ob_prev->_ob_next
object  : <refcnt 0 at 0x3a97728>
type    : str
refcount: 0
address : 0x3a97728
* op->_ob_next->_ob_prev
object  : [31776 refs]

This is the output of _Py_ForgetReference (which calls _PyObject_Dump)
called either from _PyUnicode_New or unicode_subtype_new. In both
cases, this implies PyObject_MALLOC returned NULL when allocating the
internal array of a str object. However, I have no idea why malloc()
is failing there.

By counting the number of [reftotal] printed in the log, I found that
the failing test could be one of the following: test_invalid_args,
test_invalid_bufsize, test_list2cmdline, test_no_leaking. Looking at
the tests, it seems only test_no_leaking could be problematic:

* test_list2cmdline checks if the subprocess.list2cmdline function
  works correctly, only Python code is involved here;
* test_invalid_args checks if using an option unsupported by a
  platform raises an exception, only Python code is involved here;
* test_invalid_bufsize only checks whether Popen rejects non-integer
  bufsize, only Python code is involved here.

And unsurprisingly, that is the failing test:

test test_subprocess failed -- Traceback (most recent call last):
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/test/test_subprocess.py",
line 423, in test_no_leaking
    data = p.communicate(b"lime")[0]
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py",
line 671, in communicate
    return self._communicate(input)
  File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py",
line 1171, in _communicate
    bytes_written = os.write(self.stdin.fileno(), chunk)
OSError: [Errno 32] Broken pipe

It seems one of the spawned processes runs out of memory while
allocating a new PyUnicode object. I believe we don't see the usual
MemoryError because the parent process captures stderr and stdout of
the children.
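
To illustrate that last point, here is a small sketch (not the actual test
code) where the child raises a simulated MemoryError. Because stderr is a
pipe, the traceback ends up in the parent's buffer rather than in the test
log:

```python
import subprocess
import sys

# The child fails right away; the MemoryError here is simulated.
child_code = "raise MemoryError('simulated')"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()

# The traceback is only visible if the parent chooses to show it.
print("return code:", proc.returncode)
print("captured stderr mentions MemoryError:", b"MemoryError" in err)
```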

Also, only klose-*-sparc buildbots are failing this way; loewis-sun is
failing too but for a different reason. So, how much memory is
available on this machine (or actually, on this virtual machine)?

Now, I wonder why manipulating the GIL caused the bug to appear in
3.0, but not in 2.x. Maybe it is related to the new I/O library in
Python 3.0.

Regards,
-- Alexandre

On Tue, Dec 30, 2008 at 4:20 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Does anyone have local access to a sparc machine to try to track down
> the ongoing buildbot failures in test_subprocess?
>
> (I think the problem is specific to 3.x builds on sparc machines, but I
> haven't checked the buildbots all that closely - that assessment is just
> based on what I recall of the buildbot failure emails).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>

From rdmurray at bitdance.com  Wed Dec 31 03:40:07 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Tue, 30 Dec 2008 21:40:07 -0500 (EST)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
	<Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
Message-ID: <Pine.LNX.4.64.0812302134261.14037@kimball.webabinitio.net>

On Tue, 30 Dec 2008 at 21:30, rdmurray at bitdance.com wrote:
> On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
>>  At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>> >  More trouble with the "just take the dirname":
>> > 
>> >       paths = ['/a/b/c', '/a/b/d', '/a/b']
>> >       os.path.dirname(os.path.commonprefix([
>> >                           os.path.normpath(p) for p in paths]))
>> > 
>> >  give '/a', not '/a/b'.
>>
>>  ...because that's the correct answer.
>
> But not the answer that is wanted.
>
> So the challenge now is to write a single expression that will yield
> '/a/b' when passed the above paths list, and also produce '/a/b' when
> passed the following paths list:
>
>    paths = ['/a/b/c', '/a/b/cd']

Sorry, now I see what you are saying: that in '/a/b' the 'b' is the
filename.  Clearly that wasn't what I intuitively expected our
notional 'commonpathprefix' command to produce, for whatever
that is worth :)

--RDM

From skip at pobox.com  Wed Dec 31 03:57:45 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 30 Dec 2008 20:57:45 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081230225006.B5D043A405E@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
Message-ID: <18778.57257.227598.592245@montanaro-dyndns-org.local>


    Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
    >> More trouble with the "just take the dirname":
    >> 
    >> paths = ['/a/b/c', '/a/b/d', '/a/b']
    >> os.path.dirname(os.path.commonprefix([
    >> os.path.normpath(p) for p in paths]))
    >> 
    >> give '/a', not '/a/b'.

    Phillip> ...because that's the correct answer.

I don't understand.  If you search for os.path.commonprefix at
codesearch.google.com you'll find uses like this:

    if os.path.commonprefix([basedir, somepath]) != basedir:
        ...

which leads me to believe that other people using the current function in
the real world would be confused by your interpretation.

Skip



From benjamin at python.org  Wed Dec 31 04:29:19 2008
From: benjamin at python.org (Benjamin Peterson)
Date: Tue, 30 Dec 2008 21:29:19 -0600
Subject: [Python-Dev] Missing FAQ about Python3 and unicode
In-Reply-To: <200812310149.32686.victor.stinner@haypocalc.com>
References: <200812310149.32686.victor.stinner@haypocalc.com>
Message-ID: <1afaf6160812301929u509378fbxd0794c76ee13af82@mail.gmail.com>

On Tue, Dec 30, 2008 at 6:49 PM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Hi,
>
> Slowly, we get recurrent questions about Python3 and unicode. It's maybe time
> to start a FAQ? Here is an ugly draft to start it ;-)

Looks like good stuff! It would probably make a good addition to the
meager porting docs in development on the wiki. [1]

...

[1] http://wiki.python.org/moin/PortingToPy3k



-- 
Regards,
Benjamin Peterson

From ajaksu at gmail.com  Wed Dec 31 04:41:41 2008
From: ajaksu at gmail.com (Daniel (ajax) Diniz)
Date: Wed, 31 Dec 2008 01:41:41 -0200
Subject: [Python-Dev] test_subprocess and sparc buildbots
In-Reply-To: <acd65fa20812301837v7b207fe0na8868e6d075eb609@mail.gmail.com>
References: <495A9099.1030907@gmail.com>
	<acd65fa20812301837v7b207fe0na8868e6d075eb609@mail.gmail.com>
Message-ID: <2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com>

Alexandre Vassalotti wrote:
> The logs of failing test runs all show the same error message:
>
> [31481 refs]
> * ob
> object  : <refcnt 0 at 0x3a97728>
> type    : str
> refcount: 0
> address : 0x3a97728
> * op->_ob_prev->_ob_next
> object  : <refcnt 0 at 0x3a97728>
> type    : str
> refcount: 0
> address : 0x3a97728
> * op->_ob_next->_ob_prev
> object  : [31776 refs]

A reliable way to get that in a --with-pydebug build seems to be:

~/py3k$ ./python -c "import locale; locale.format_string(1,1)"
* ob
object  : <refcnt 0 at 0x825c76c>
type    : tuple
refcount: 0
address : 0x825c76c
* op->_ob_prev->_ob_next
NULL
* op->_ob_next->_ob_prev
object  : <refcnt 0 at 0x825c76c>
type    : tuple
refcount: 0
address : 0x825c76c
Fatal Python error: UNREF invalid object
TypeError: expected string or buffer
Aborted

Found using Fusil in a very quick run on top of:
Python 3.1a0 (py3k:68055M, Dec 31 2008, 01:34:52)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2

So kudos to Victor again :)

HTH,
Daniel

From python at rcn.com  Wed Dec 31 05:05:12 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 30 Dec 2008 20:05:12 -0800
Subject: [Python-Dev] I would like an svn account
References: <200812310155.40206.victor.stinner@haypocalc.com>
Message-ID: <9A42531762714CADABF8A6F40C08AD23@RaymondLaptop1>

From: "Victor Stinner" <victor.stinner at haypocalc.com>

> Why an svn account instead of just using the amazing bug tracker? Just because 
> there are not enough people to review/commit patches on the tracker and so 
> there are more and more open issues (and so more and more lost patches) :-( I 
> will be able to work faster using the svn.

Based on the work I've seen so far, my preference is that you continue to use the
tracker instead of directly committing patches.


Raymond


From pje at telecommunity.com  Wed Dec 31 05:08:04 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Dec 2008 23:08:04 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
	<Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
Message-ID: <20081231040622.4C8143A405E@sparrow.telecommunity.com>

At 09:30 PM 12/30/2008 -0500, rdmurray at bitdance.com wrote:
>On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
>>At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>>>More trouble with the "just take the dirname":
>>>
>>>      paths = ['/a/b/c', '/a/b/d', '/a/b']
>>>      os.path.dirname(os.path.commonprefix([
>>>                          os.path.normpath(p) for p in paths]))
>>>give '/a', not '/a/b'.
>>
>>...because that's the correct answer.
>
>But not the answer that is wanted.
>
>So the challenge now is to write a single expression that will yield
>'/a/b' when passed the above paths list, and also produce '/a/b' when
>passed the following paths list:
>
>     paths = ['/a/b/c', '/a/b/cd']

Change that to [os.path.normpath(p)+'/' for p in paths] and you've 
got yourself a winner.
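
Spelled out as a function, that suggestion might look like the sketch
below. The name common_dir is made up for illustration, and posixpath is
used instead of os.path only so that the '/'-style examples from this
thread behave the same on every platform:

```python
import posixpath


def common_dir(paths):
    """Most specific directory containing all `paths` (hypothetical
    helper following the "append a separator" suggestion)."""
    # The trailing separator makes the character-wise commonprefix()
    # stop at a path component boundary before dirname() is applied.
    normalized = [posixpath.normpath(p) + posixpath.sep for p in paths]
    return posixpath.dirname(posixpath.commonprefix(normalized))


print(common_dir(['/a/b/c', '/a/b/d', '/a/b']))
print(common_dir(['/a/b/c', '/a/b/cd']))
```

Both calls give '/a/b', i.e. here '/a/b' is treated as a directory rather
than as a file named 'b' inside '/a'.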


From pje at telecommunity.com  Wed Dec 31 05:11:34 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Dec 2008 23:11:34 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18778.57257.227598.592245@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com>
	<4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
	<18778.57257.227598.592245@montanaro-dyndns-org.local>
Message-ID: <20081231040951.36EB43A410E@sparrow.telecommunity.com>

At 08:57 PM 12/30/2008 -0600, skip at pobox.com wrote:

>     Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>     >> More trouble with the "just take the dirname":
>     >>
>     >> paths = ['/a/b/c', '/a/b/d', '/a/b']
>     >> os.path.dirname(os.path.commonprefix([
>     >> os.path.normpath(p) for p in paths]))
>     >>
>     >> give '/a', not '/a/b'.
>
>     Phillip> ...because that's the correct answer.
>
>I don't understand.  If you search for os.path.commonprefix at
>codesearch.google.com you'll find uses like this:
>
>     if os.path.commonprefix([basedir, somepath]) != basedir:
>         ...
>
>which leads me to believe that other people using the current function in
>the real world would be confused by your interpretation.

It never would've occurred to me to use it for that, versus checking 
for somepath.startswith(basedir+sep).

The only thing I've ever used commonprefix for is to find the 
most-specific directory that contains all the specified paths.  Never 
occurred to me that there was any other use for it, actually.
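
A sketch of the difference between the two checks. The function names and
the example paths are made up, and posixpath is used so the result does not
depend on the platform:

```python
import posixpath


def is_inside_naive(basedir, somepath):
    # Character-wise comparison: also accepts sibling directories whose
    # name merely starts with the same characters as basedir.
    return posixpath.commonprefix([basedir, somepath]) == basedir


def is_inside(basedir, somepath):
    # Component-aware check along the lines of startswith(basedir+sep).
    base = posixpath.normpath(basedir)
    path = posixpath.normpath(somepath)
    prefix = base.rstrip(posixpath.sep) + posixpath.sep
    return path == base or path.startswith(prefix)


print(is_inside_naive('/srv/data', '/srv/database/secret'))
print(is_inside('/srv/data', '/srv/database/secret'))
print(is_inside('/srv/data', '/srv/data/file.txt'))
```

The naive check accepts '/srv/database/secret' as being under '/srv/data',
while the startswith variant rejects it. Neither version resolves symlinks.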


From stephen at xemacs.org  Wed Dec 31 08:46:09 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 31 Dec 2008 16:46:09 +0900
Subject: [Python-Dev]  I would like an svn account
In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com>
References: <200812310155.40206.victor.stinner@haypocalc.com>
Message-ID: <87fxk4ss72.fsf@xemacs.org>

Victor Stinner writes:

 > I already asked in September to get an svn account to be able to
 > commit directly patches to trunk (or other branches like py3k). My
 > query was rejected because I didn't know Python core enough (and
 > maybe other reasons that I don't know).

One possible reason is that commit privilege is not about quality of
code, it's about quality of review.  Would you review your own code in
the same way that other committers review their own?  Would you make
the same decisions about which fixes to commit, which changes to wait
for others' review, and which to propose on Python-Dev first?
Remember, to be appropriate for Python, a patch needs not only to be
good code, it must also be "Pythonic".  Does your personal sense of
code quality result in Pythonic patches?  (I can't answer that,
because my own sense of Pythonicity is dubiously reliable at
best.<wink>)

Another possible reason is that, while it's not an absolute
requirement, in my projects I'm always a lot more supportive of
candidates who have a track record of helping others get their patches
committed.  Of course if your patches have a history of being accepted
often without substantial change, then implicitly you are doing good
self-review, and that might be enough.  But in my book, that path
*should* take longer and demand higher standards than the "review
others' patches" path.

 > The bigger patch was the bytes filename support for Python3,
 > accepted by Guido (after a long review ;-)).

Would you have committed that patch if nobody else had reviewed it?

 > Just because there are not enough people to review/commit patches
 > on the tracker and

Are you planning to review and commit other people's patches, and help
reduce this backlog?  Or just your own?  Your emphasis on your own
working speed suggests the latter.  Again, I'm more supportive of
people who want commit privileges in part to help improve the
project's process, as well as to remove obstacles to their own work.

 > so there are more and more open issues (and so more and more lost
 > patches) :-(

An open issue is not a lost patch.  It's an open issue.  In my own
projects, I oppose candidates who seem to think that the presumption
is that a patch should be applied quickly unless there's good reason
given not to.  Your phrasing suggests that attitude to me.



You don't have to pay attention to me, since I don't have a vote in
the matter.  And I don't mean to be negatively critical of you,
because I'm not in a position to speak for the Powers That Be in
Python.  Those are my criteria, and other people and projects use
different ones.  But it seems to me that the committers in Python do
mostly conform to my criteria, and thus it's *possible* that those
criteria are somewhat representative of the "maybe other reasons [you]
don't know."

If so, I suppose an explicit explanation may be of use to you (and
others in your position).

Happy New Year to you!

From alexandre at peadrop.com  Wed Dec 31 08:50:54 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Wed, 31 Dec 2008 02:50:54 -0500
Subject: [Python-Dev] test_subprocess and sparc buildbots
In-Reply-To: <2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com>
References: <495A9099.1030907@gmail.com>
	<acd65fa20812301837v7b207fe0na8868e6d075eb609@mail.gmail.com>
	<2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com>
Message-ID: <acd65fa20812302350h432530buf3c13d3bb6f3389a@mail.gmail.com>

On Tue, Dec 30, 2008 at 10:41 PM, Daniel (ajax) Diniz <ajaksu at gmail.com> wrote:
> A reliable way to get that in a --with-pydebug build seems to be:
>
> ~/py3k$ ./python -c "import locale; locale.format_string(1,1)"
> * ob
> object  : <refcnt 0 at 0x825c76c>
> type    : tuple
> refcount: 0
> address : 0x825c76c
> * op->_ob_prev->_ob_next
> NULL
> * op->_ob_next->_ob_prev
> object  : <refcnt 0 at 0x825c76c>
> type    : tuple
> refcount: 0
> address : 0x825c76c
> Fatal Python error: UNREF invalid object
> TypeError: expected string or buffer
> Aborted
>

Nice catch! I reduced your example to: "import _sre;  _sre.compile(0,
0, [])". And, it doesn't seem to be an input validation problem with
_sre. From what I saw, it's actually a bug in Py_TRACE_REFS's code.
Now, it's getting interesting!

It seems something is breaking the refchain. However, I don't know
what is causing the problem exactly.

> Found using Fusil in a very quick run on top of:
> Python 3.1a0 (py3k:68055M, Dec 31 2008, 01:34:52)
> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>
> So kudos to Victor again :)
>

Could you share the details on how you used Fusil to find another crasher?
It sounds like a useful tool.

Thanks!

-- Alexandre

From p.f.moore at gmail.com  Wed Dec 31 09:49:43 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 31 Dec 2008 08:49:43 +0000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081231040622.4C8143A405E@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
	<Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
	<20081231040622.4C8143A405E@sparrow.telecommunity.com>
Message-ID: <79990c6b0812310049g9c22991n21356ccba1cf6376@mail.gmail.com>

2008/12/31 Phillip J. Eby <pje at telecommunity.com>:
> Change that to [os.path.normpath(p)+'/' for p in paths] and you've got
> yourself a winner.

s#'/'#os.sep# to make it work on Windows as well :-)

Have we established yet that this is hard enough to get right to
warrant a stdlib implementation?
Paul

From solipsis at pitrou.net  Wed Dec 31 14:08:50 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 31 Dec 2008 13:08:50 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>
	<1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de>
	<18776.1376.724926.669345@montanaro-dyndns-org.local>
	<495808C6.4050304@v.loewis.de>
	<18776.2535.459306.987378@montanaro-dyndns-org.local>
	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
	<18777.21289.504321.865439@montanaro-dyndns-org.local>
	<20081230010023.B46883A406C@sparrow.telecommunity.com>
	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>
	<gje7fm$rgi$1@ger.gmane.org>
	<20081230225006.B5D043A405E@sparrow.telecommunity.com>
	<18778.57257.227598.592245@montanaro-dyndns-org.local>
Message-ID: <loom.20081231T130331-715@post.gmane.org>

<skip <at> pobox.com> writes:
> 
> which leads me to believe that other people using the current function in
> the real world would be confused by your interpretation.

... and are vulnerable to security hazards.




From steve at holdenweb.com  Wed Dec 31 14:21:49 2008
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 31 Dec 2008 08:21:49 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081231040622.4C8143A405E@sparrow.telecommunity.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local>	<1230460141.6361.4.camel@localhost>
	<49575952.7070405@gmail.com>	<4957DFC5.9030405@v.loewis.de>	<18776.1376.724926.669345@montanaro-dyndns-org.local>	<495808C6.4050304@v.loewis.de>	<18776.2535.459306.987378@montanaro-dyndns-org.local>	<1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>	<18777.21289.504321.865439@montanaro-dyndns-org.local>	<20081230010023.B46883A406C@sparrow.telecommunity.com>	<79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com>	<gje7fm$rgi$1@ger.gmane.org>	<20081230225006.B5D043A405E@sparrow.telecommunity.com>	<Pine.LNX.4.64.0812302126260.14037@kimball.webabinitio.net>
	<20081231040622.4C8143A405E@sparrow.telecommunity.com>
Message-ID: <gjfrld$f38$1@ger.gmane.org>

Phillip J. Eby wrote:
> At 09:30 PM 12/30/2008 -0500, rdmurray at bitdance.com wrote:
>> On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote:
>>> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote:
>>>> More trouble with the "just take the dirname":
>>>>
>>>>      paths = ['/a/b/c', '/a/b/d', '/a/b']
>>>>      os.path.dirname(os.path.commonprefix([
>>>>                          os.path.normpath(p) for p in paths]))
>>>> give '/a', not '/a/b'.
>>>
>>> ...because that's the correct answer.
>>
>> But not the answer that is wanted.
>>
>> So the challenge now is to write a single expression that will yield
>> '/a/b' when passed the above paths list, and also produce '/a/b' when
>> passed the following paths list:
>>
>>     paths = ['/a/b/c', '/a/b/cd']
> 
> Change that to [os.path.normpath(p)+'/' for p in paths] and you've got
> yourself a winner.
> 
Or possibly [os.path.normpath(p)+os.path.sep for p in paths]?
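
A minimal sketch of the combined suggestion, for illustration only. It
uses posixpath so the result is the same on any platform, and the helper
name common_dir is made up here, not part of os.path:

```python
import posixpath


def common_dir(paths):
    """Return the deepest directory shared by all paths (hypothetical helper)."""
    # Appending the separator makes every path end on a component boundary,
    # so the character-wise commonprefix() cannot stop on a partial name
    # that dirname() would then strip one level too far.
    with_sep = [posixpath.normpath(p) + posixpath.sep for p in paths]
    return posixpath.dirname(posixpath.commonprefix(with_sep))


# The two lists from the thread.
first = common_dir(['/a/b/c', '/a/b/d', '/a/b'])
second = common_dir(['/a/b/c', '/a/b/cd'])

# The "just take the dirname" version, without the trailing separator.
naive = posixpath.dirname(posixpath.commonprefix(
    [posixpath.normpath(p) for p in ['/a/b/c', '/a/b/d', '/a/b']]))
```

Here first and second are both '/a/b', while naive is '/a'.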

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From victor.stinner at haypocalc.com  Wed Dec 31 14:26:58 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 31 Dec 2008 14:26:58 +0100
Subject: [Python-Dev] I would like an svn account
In-Reply-To: <87fxk4ss72.fsf@xemacs.org>
References: <200812310155.40206.victor.stinner@haypocalc.com>
	<87fxk4ss72.fsf@xemacs.org>
Message-ID: <200812311426.58779.victor.stinner@haypocalc.com>

On Wednesday 31 December 2008 08:46:09, Stephen J. Turnbull wrote:
> Would you review your own code in the same way that other committers 
> review their own?

I'm unable to review my own code. I always re-read my code and test it, but I
cannot see every possible case. That's why I prefer external eyes to
review my code for parts of the code that I don't understand/know well
enough.

> Would you make the same decisions about which fixes to commit, 
> which changes to wait for others' review, and which to propose 
> on Python-Dev first?

I think that I'm able to know if a patch needs a review or not. Especially if 
the patch changes the behaviour or the API (or if the patch is complex), I 
always prefer a review.

I will not use svn as I use the tracker. Sometimes, I write a quick and dirty 
patch to demonstrate a feature or to propose a solution to fix the bug. If 
the solution is accepted, I try to write a better patch.

>  > The bigger patch was the bytes filename support for Python3,
>  > accepted by Guido (after a long review ;-)).
>
> Would you have committed that patch if nobody else had reviewed it?

Certainly not. The patch changed the behaviour of most functions related to 
files. The mailing list + the bug tracker were the right tools.

>  > Just because there are not enough people to review/commit patches
>  > on the tracker and
>
> Are you planning to review and commit other people's patches, and help
> reduce this backlog?  Or just your own?

It depends on the issue. There are many trivial fixes that don't change the
behaviour / API but just improve the project and are waiting for a review or
are reviewed but not committed yet.

About my own patches: yes, I would like to commit directly to svn without using
the tracker to fix trivial bugs. Example: for one month, there were two
gcc warnings in the _testcapi module. The fix was trivial and it requires too
much effort to open an issue for such a stupid warning.

> Again, I'm more supportive of
> people who want commit privileges in part to help improve the
> project's process, as well as to remove obstacles to their own work.

My not-so-secret goal is also to improve Python stability against fuzzing. I
stopped working on fuzzing because it sometimes took months to fix a dummy
bug (dummy: easy to understand but also easy to fix without side effects).

Example of such an issue: "import _tkinter; _tkinter.mainloop()" crashes Python
(maybe not directly but later on garbage collection). I opened the issue
(with a patch) in August, gpolo reviewed the patch ("Looks fine to me.") two
weeks later, but 4 months later the issue is still open:
   http://bugs.python.org/issue3638

Is that what you called "An open issue is not a lost patch."?

> An open issue is not a lost patch.  It's an open issue.  In my own
> projects, I oppose candidates who seem to think that the presumption
> is that a patch should be applied quickly unless there's good reason
> given not to.  Your phrasing suggests that attitude to me.

Even after a review, some issues stay open for months or years.

Another example of an issue: nntplib doesn't support IPv6. dmorr proposed a
simple and good patch reusing the nice function socket.create_connection()
one year ago. In this case, I think that nobody was able to test the change.
But even without testing it, I'm sure that the patch is better than the current
situation. Well, if I have to commit the patch, I will test it first. My
computer has a public IPv6 address :-)
   http://bugs.python.org/issue1664
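
For context, socket.create_connection() resolves the host with
getaddrinfo() and tries each returned address in turn, so one call can
cover both IPv4 and IPv6. The sketch below is not the patch from the
tracker: the helper name connect_any_family is made up, and a loopback
listener with an invented greeting stands in for a real NNTP server.

```python
import socket


def connect_any_family(host, port, timeout=5.0):
    """Hypothetical helper: connect without hard-coding an address family."""
    # create_connection() picks the family from the resolved addresses,
    # unlike socket.socket(socket.AF_INET, ...) which is IPv4 only.
    return socket.create_connection((host, port), timeout)


# Loopback demo so the sketch runs without network access.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

client = connect_any_family("127.0.0.1", port)
conn, _addr = server.accept()
conn.sendall(b"hello\r\n")           # invented greeting, not real NNTP
greeting = client.recv(64)

for sock in (conn, client, server):
    sock.close()
```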

> You don't have to pay attention to me,

No, your opinion is interesting. I hope that my answers will help you to
understand my expectations about an svn account :-)

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From solipsis at pitrou.net  Wed Dec 31 14:47:30 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 31 Dec 2008 13:47:30 +0000 (UTC)
Subject: [Python-Dev] opcode dispatch optimization
Message-ID: <loom.20081231T133902-247@post.gmane.org>


Hello,

I would like to mention that I've written a patch which enables "threaded
interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2
3600+), it is good for a 15-20% speedup of the interpreter on pystone and
pybench. I also had the opportunity to test it on a Core2-derived CPU, where it
doesn't make a difference (I conjecture it's because Core2 CPUs have
hardware-based indirect branch optimizations). It will make no difference if the
interpreter is compiled with something other than gcc (I tested on Windows).

The additional complexity is very small. There's a separate script which is run
to build the dispatch table (only if needed, that is if dis.py has been
modified). In ceval.c, there are a couple of macros and some #ifdef's. That's
all. It breaks no test in the regression suite.

Could other people test and report their results here? (the patch is for py3k,
btw). Also, what are your thoughts for/against integrating this patch in the
standard interpreter?

Regards

Antoine.


(*) please note: it has nothing to do with multithreading.



From solipsis at pitrou.net  Wed Dec 31 14:49:53 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 31 Dec 2008 13:49:53 +0000 (UTC)
Subject: [Python-Dev] opcode dispatch optimization
References: <loom.20081231T133902-247@post.gmane.org>
Message-ID: <loom.20081231T134907-200@post.gmane.org>

Antoine Pitrou <solipsis <at> pitrou.net> writes:
> 
> I would like to mention that I've written a patch which enables "threaded
> interpretation"

... and I forgot to give the URL:
http://bugs.python.org/issue4753

Regards

Antoine.



From stephen at xemacs.org  Wed Dec 31 16:04:42 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 01 Jan 2009 00:04:42 +0900
Subject: [Python-Dev] I would like an svn account
In-Reply-To: <200812311426.58779.victor.stinner@haypocalc.com>
References: <200812310155.40206.victor.stinner@haypocalc.com>
	<87fxk4ss72.fsf@xemacs.org>
	<200812311426.58779.victor.stinner@haypocalc.com>
Message-ID: <87y6xwqtbp.fsf@xemacs.org>

Victor Stinner writes:
 > On Wednesday 31 December 2008 08:46:09, Stephen J. Turnbull wrote:

 > > Would you review your own code in the same way that other committers 
 > > review their own?
 > 
 > I'm unable to review my own code.

Of course not, in the formal "software process" sense.  But in some
sense to commit code you have to have reviewed it, that's all I meant.

 > Is that what you called "An open issue is not a lost patch."?

Yes, and I'll say it again:

 > > An open issue is not a lost patch.  It's an open issue.

 > Even after a review, some issues stay open for months or years.

There *is* a process problem, though I don't claim to have an idea how
to solve it.  Some developers (especially well-known is Martin von
Loewis) are trying to address this with the "one committer's review
for five reviews" offer, but maybe there are even better ways to do
it.  However, this is a *different problem* from "lost patches", which
many projects do suffer from, and shouldn't be called by that name,
which is insulting to the Python committers.

In particular, we know that effort is devoted to tracking open issues
by the developers, both individually and as a formal matter (the
weekly report).  It is insufficient in some sense, but way better
than, say, in XEmacs (a project I'm supposed to be leading :-/ ).  And
IIRC the statistics show that the number of issues closed is of the
same order of magnitude as those opened, although consistently lower
by 10-20%.  Actually, I think that's pretty amazing for a project
that has nobody whose salary depends on getting the numbers up.

 > > You don't have to pay attention to me,
 > 
 > No, your opinion is interesting. I hope that my answers will help you to
 > understand my expectations about an svn account :-)

Well, as I say I have no vote.  But I hope your answers will help to
convince any doubters among the committers.

From solipsis at pitrou.net  Wed Dec 31 16:11:31 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 31 Dec 2008 15:11:31 +0000 (UTC)
Subject: [Python-Dev] lost patches
References: <200812310155.40206.victor.stinner@haypocalc.com>
	<87fxk4ss72.fsf@xemacs.org>
	<200812311426.58779.victor.stinner@haypocalc.com>
	<87y6xwqtbp.fsf@xemacs.org>
Message-ID: <loom.20081231T150610-46@post.gmane.org>


Hi,

Stephen J. Turnbull <stephen <at> xemacs.org> writes:
> 
> There *is* a process problem, though I don't claim to have an idea how
> to solve it.  Some developers (especially well-known is Martin von
> Loewis) are trying to address this with the "one committer's review
> for five reviews" offer, but maybe there are even better ways to do
> it.  However, this is a *different problem* from "lost patches", which
> many projects do suffer from, and shouldn't be called by that name,
> which is insulting to the Python committers.

I don't think it is insulting (I say that as a young Python committer), and I do
think it is fair to call them "lost patches". Perhaps not after four months, but
when a good patch hasn't been committed after two years, it is potentially lost
because the code base has changed a lot since then and 1) the patch doesn't
apply completely anymore 2) it must be reassessed whether the patch is
good/useful/necessary with respect to the current code base, which can be tricky.

As for reviews, we don't seem to use Rietveld a lot, although it offers a nice
interface for comfortably viewing changes, and possibly commenting on them. The
overhead of having to open a separate issue in Rietveld and upload the patch
there is a bit annoying, though.

Regards

Antoine.



From lists at cheimes.de  Wed Dec 31 18:44:27 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Dec 2008 18:44:27 +0100
Subject: [Python-Dev] opcode dispatch optimization
In-Reply-To: <loom.20081231T133902-247@post.gmane.org>
References: <loom.20081231T133902-247@post.gmane.org>
Message-ID: <495BAF7B.5090405@cheimes.de>

Antoine Pitrou wrote:
> I would like to mention that I've written a patch which enables "threaded
> interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2
> 3600+), it is good for a 15-20% speedup of the interpreter on pystone and
> pybench. I also had the opportunity to test it on a Core2-derived CPU, where it
> doesn't make a difference (I conjecture it's because Core2 CPUs have
> hardware-based indirect branch optimizations). It will make no difference if the
> interpreter is compiled with something other than gcc (I tested on Windows).

The patch makes use of a GCC feature where labels can be used as values:
http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know
about the feature and got confused by the unary && operator.
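
As a rough illustration of the idea: with labels as values, the
interpreter can look up the handler for the next opcode in a table and
jump to it directly, instead of returning to one central switch. Python
has no such feature, so the sketch below is only a loose analogy that
shows the table lookup, not the branch-prediction benefit. The toy
opcodes PUSH/ADD/MUL and the name run_table are made up and have nothing
to do with the actual patch.

```python
def run_table(program):
    """Evaluate a toy stack program by looking up each handler in a table."""
    stack = []

    def op_push(arg):
        stack.append(arg)

    def op_add(_arg):
        right = stack.pop()
        left = stack.pop()
        stack.append(left + right)

    def op_mul(_arg):
        right = stack.pop()
        left = stack.pop()
        stack.append(left * right)

    # The table plays the role of the array of label addresses in C:
    # one lookup per instruction, no chain of comparisons.
    handlers = {"PUSH": op_push, "ADD": op_add, "MUL": op_mul}
    for opcode, arg in program:
        handlers[opcode](arg)
    return stack.pop()


# (2 + 3) * 4
result = run_table([("PUSH", 2), ("PUSH", 3), ("ADD", None),
                    ("PUSH", 4), ("MUL", None)])
```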

A happy new year to you all!

Christian

From jason.orendorff at gmail.com  Wed Dec 31 19:51:28 2008
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Wed, 31 Dec 2008 12:51:28 -0600
Subject: [Python-Dev] opcode dispatch optimization
In-Reply-To: <495BAF7B.5090405@cheimes.de>
References: <loom.20081231T133902-247@post.gmane.org>
	<495BAF7B.5090405@cheimes.de>
Message-ID: <bb8868b90812311051tf45e405oeefc761cd38aad4e@mail.gmail.com>

On Wed, Dec 31, 2008 at 11:44 AM, Christian Heimes <lists at cheimes.de> wrote:
> The patch makes use of a GCC feature where labels can be used as values:
> http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know
> about the feature and got confused by the unary && operator.

Right.  SpiderMonkey (Mozilla's JavaScript interpreter) does this, and
it was good for a similar win on platforms that use GCC.  (It took me
a while to figure out why it was so much faster, so I think this patch
would be better with a few very specific comments!)

SpiderMonkey calls this optimization "threaded code" too, but this
isn't the standard meaning of that term. See:
http://en.wikipedia.org/wiki/Threaded_code

-j

From brett at python.org  Wed Dec 31 21:19:41 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 31 Dec 2008 12:19:41 -0800
Subject: [Python-Dev] lost patches
In-Reply-To: <loom.20081231T150610-46@post.gmane.org>
References: <200812310155.40206.victor.stinner@haypocalc.com>
	<87fxk4ss72.fsf@xemacs.org>
	<200812311426.58779.victor.stinner@haypocalc.com>
	<87y6xwqtbp.fsf@xemacs.org> <loom.20081231T150610-46@post.gmane.org>
Message-ID: <bbaeab100812311219o73c971d1h2d638a2fb7d35da5@mail.gmail.com>

On Wed, Dec 31, 2008 at 07:11, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>>
>> There *is* a process problem, though I don't claim to have an idea how
>> to solve it.  Some developers (especially well-known is Martin von
>> Loewis) are trying to address this with the "one committer's review
>> for five reviews" offer, but maybe there are even better ways to do
>> it.  However, this is a *different problem* from "lost patches", which
>> many projects do suffer from, and shouldn't be called by that name,
>> which is insulting to the Python committers.
>
> I don't think it is insulting (I say that as a young Python committer), and I do
> think it is fair to call them "lost patches". Perhaps not after four months, but
> when a good patch hasn't been committed after two years, it is potentially lost
> because the code base has changed a lot since then and 1) the patch doesn't
> apply completely anymore 2) it must be reassessed whether the patch is
> good/useful/necessary with respect to the current code base, which can be tricky.
>

It is unfortunate when a good patch for a real issue doesn't get
applied during the current development cycle. But I honestly think, in
general, the important ones do get looked at and handled. Yes, some
slip through the cracks, but overall I think we do pretty well.

> As for reviews, we don't seem to use Rietveld a lot, although it offers a nice
> interface for comfortably viewing changes, and possibly commenting on them. The
> overhead of having to open a separate issue in Rietveld and upload the patch
> there is a bit annoying, though.

My hope is that some day we get around to fixing this and getting a
code review application tied into the issue workflow so it is no more
than pressing a button.

-Brett

From brett at python.org  Wed Dec 31 22:20:54 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 31 Dec 2008 13:20:54 -0800
Subject: [Python-Dev] I would like an svn account
In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com>
References: <200812310155.40206.victor.stinner@haypocalc.com>
Message-ID: <bbaeab100812311320s2c1d4ee4vdab0517051efa674@mail.gmail.com>

On Tue, Dec 30, 2008 at 16:55, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Hi,
>
> I already asked in September to get an svn account to be able to commit
> directly patches to trunk (or other branches like py3k). My query was
> rejected because I didn't know Python core enough (and maybe other reasons
> that I don't know).
>

I am going to stick my neck out on this one and say why I have not
spoken up for giving you commit privs, Victor, and my general thoughts
on handing them out since I don't think this has been stated by anyone
before.

When it comes to commit privs in general, I am of the school that they
should be handed out carefully. I for one do not want to have to
babysit other committers to make sure that they did something
correctly. That's a waste of my time since that defeats the purpose of
having more committers. This is why I think Benjamin got his privs too
soon. Luckily Georg took it upon himself, I assume because he gave
Benjamin the privileges, to double-check all of Benjamin's checkins
and fix them until Benjamin absorbed enough of the development process
to no longer need to be watched over. But I was honestly rather close
to suggesting Benjamin lose his privileges early on until he had more
time to figure out how things worked. Luckily it didn't come to that
and Benjamin has turned out to be a good developer.

I also want people who have no agenda. It's okay to have an area you
care about, but that doesn't mean you should necessarily say "I will
only work on math, ever, even if something is staring me right in the
face!", etc.

There is also dedication. I don't like giving commit privileges to
people who I don't think will definitely stick around. It's fine if
they come and go, but if I am not sure if they will typically come
back I would prefer to not bother giving them the privilege of saying
they are a developer of Python. Typically this takes a year of regular
contributions for me to believe this.

And lastly, general cohesion with the other committers. Once you
become a committer you become a co-worker in a way and that means
getting along with everybody. And since we don't have some manager who
forces a new co-worker down our throats we tend to be very picky about
this. Plus I already lived through high school and I don't want that
kind of drama here.

So that is my personal criteria on whether or not I speak up for
someone getting commit privileges. How do you play into all of this in
my head? To start, your focus on security, for me at least, goes too
far sometimes. I have disagreed with some of your decisions in the
name of security in the past and I am not quite ready to say that if
you committed something I wouldn't feel compelled to double-check it
to make sure you didn't go too far. This worry, though, has gone down
a lot compared to the last time you asked for commit privs.

And I do worry about your attitude. I remember at one point you
basically threatened to stop helping because your patches were not
being looked at quickly. That really pissed me off personally. You have
improved here and are a lot less abrasive than you were, but I am
still smarting a little from some comments you made a few months back
that came off as pushy.

And as I said, I prefer to give commit privileges to people who I
think will stick around and have been contributing regularly for a
year (I just checked bugs.python.org and it looks like you got really
involved only five months ago). Saying you stopped doing your fuzzing
work simply because the turn-around was not to your liking does not
cause me to instantly think you will stick around when it gets nasty
around here (which it invariably does a couple of times a year).

In other words I think you are on the right track to get commit
privileges in the future, but just not right now (although if you did
get them right now I wouldn't throw up a roadblock).

-Brett

From nicko at nicko.org  Wed Dec 31 23:34:40 2008
From: nicko at nicko.org (Nicko van Someren)
Date: Wed, 31 Dec 2008 14:34:40 -0800
Subject: [Python-Dev] Python 3 - Mac Installer?
In-Reply-To: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org>
References: <200812260855.49518.list@qtrac.plus.com>
	<1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com>
	<74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org>
Message-ID: <B7B058C6-43E7-410C-BA2C-E2FBDE29A05D@nicko.org>

On 30 Dec 2008, at 13:45, Barry Scott wrote:
...
> Since I've been building 3.0 for a while now I looked at the script.
>
> build-install.py seems to have been half converted to py 3.0.
> Going full 3.0 was not hard but then there is the problem of
> the imports.
>
> Python 3.0 does not have MacOS or Carbon modules.
>
> Seems that there are two ways to go.
>
> Put back the Carbon and MacOS modules into 3.0.
> Use Python 2 to build the python 3 package.

As far as I can tell the Carbon and MacOS modules are _only_ used in  
the setIcon() function, which is used to give a pretty icon to the
python folder.  Perhaps it might be better to have a fully Python 3
build system and lose the prettiness for the time being.

	Nicko