> The point isn't about my suffering as such. The point is more that
> python-dev owns a tiny amount of the code out there, and I don't believe we
> should put Python's users through this.
> Sure - I would be happy to "upgrade" all the win32all code, no problem. I
> am also happy to live in the bleeding edge and take some pain that will
> The issue is simply the user base, and giving Python a reputation of not
> being able to painlessly upgrade even dot revisions.
I agree with all this.
[As I imagined, explicit syntax did not catch on and would require a
lot of discussion.]
> > Another way is to use special rules
> > (similar to those for class defs), e.g. having
> > <frag>
> > y = 3
> > def f():
> >     exec "y=2"
> >     def g():
> >         return y
> >     return g()
> > print f()
> > </frag>
> > # prints 3.
> > Is that confusing for users? Maybe they would more naturally expect 2
> > as the outcome (given nested scopes).
> This seems the best compromise to me. It will lead to the least
> broken code, because this is the behavior that we had before nested
> scopes! It is also quite easy to implement given the current
> implementation, I believe.
> Maybe we could introduce a warning rather than an error for this
> situation though, because even if this behavior is clearly documented,
> it will still be confusing to some, so it is better if we outlaw it in
> some future version.
Yes, this would be easy to implement, but more confusing situations can
arise. What should this print? The situation does not lead to a canonical
solution the way class def scopes do:
from foo import *
> > This probably won't be a very popular suggestion, but how about pulling
> > nested scopes (I assume they are at the root of the problem)
> > until this can be solved cleanly?
> Agreed. While I think nested scopes are kinda cool, I have lived without
> them, and really without missing them, for years. At the moment the cure
> appears worse than the symptoms in at least a few cases. If nothing else,
> it compromises the elegant simplicity of Python that drew me here in the
> first place!
> Assuming that people really _do_ want this feature, IMO the bar should be
> raised so there are _zero_ backward compatibility issues.
I won't say anything about pulling nested scopes (I don't think my opinion
can change things in this respect),
but I must insist that without explicit syntax, IMO, raising the bar
has too high an implementation cost (both performance and complexity) or creates
> >Assuming that people really _do_ want this feature, IMO the bar should be
> >raised so there are _zero_ backward compatibility issues.
> Even at the cost of additional implementation complexity? At the cost
> of having to learn "scopes are nested, unless you do these two things
> in which case they're not"?
> Let's not waffle. If nested scopes are worth doing, they're worth
> breaking code. Either leave exec and from..import illegal, or back
> out nested scopes, or think of some better solution, but let's not
> introduce complicated backward compatibility hacks.
IMO breaking code would be OK if we issue warnings today and implement
nested scopes issuing errors tomorrow. But this is simply a statement
about principles and the impression it raises.
IMO import * in an inner scope should end up being an error,
not sure about 'exec's.
We will need a final BDFL statement.
regards, Samuele Pedroni.
Looking at the recent burst of checkins for the Unicode implementation
completely bypassing the standard SF procedure and possible comments
I might have on the different approaches, I guess I've been ruled out
as maintainer and designer of the Unicode implementation.
Well, I guess that's how things go. Was nice working for you guys,
but no longer is... I'm tired of having to defend myself against
meta-comments about the design, uncontrolled checkins and no true
backup about my standing in all this from Guido.
Perhaps I am misunderstanding the role of a maintainer and
implementation designer, but as it is all respect for the work I've
put into all this seems faded. That's the conclusion I draw from recent
postings by Martin and Fredrik and their nightly "takeover".
CEO eGenix.com Software GmbH
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/
Slow python-dev day... consider this exciting new proposal to deal
with important new characters like the Japanese dentistry symbols and
ecological symbols (but not Klingon).
-------- Original Message --------
Subject: PEP: Support for "wide" Unicode characters
Date: Thu, 28 Jun 2001 15:33:00 -0700
From: Paul Prescod <paulp(a)ActiveState.com>
To: "python-list(a)python.org" <python-list(a)python.org>
Title: Support for "wide" Unicode characters
Version: $Revision: 1.3 $
Author: paulp(a)activestate.com (Paul Prescod)
Type: Standards Track
Post-History: 27-Jun-2001, 28-Jun-2001
Python 2.1 Unicode characters can have ordinals only up to 2**16 - 1.
These characters are known as Basic Multilingual Plane characters.
There are now characters in Unicode that live on other "planes".
The largest addressable character in Unicode has the ordinal 17 *
2**16 - 1 (0x10ffff). For readability, we will call this TOPCHAR
and call characters in this range "wide characters".
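[A quick arithmetic check of that bound, not from the PEP itself:]

```python
# 17 planes of 2**16 code points each; the largest ordinal is one
# less than the total count.
TOPCHAR = 17 * 2**16 - 1
assert TOPCHAR == 0x10FFFF
assert 2**16 == 0x10000  # size of one plane (the BMP is plane 0)
```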
Character

    Used by itself, means the addressable units of a Python
    Unicode string.

Code point

    If you imagine Unicode as a mapping from integers to
    characters, each integer represents a code point. Some are
    really used for characters. Some will someday be used for
    characters. Some are guaranteed never to be used for
    characters.

Unicode character

    A code point defined in the Unicode standard, whether it is
    already assigned or not. Identified by an integer.

Code unit

    An integer representing a character in some encoding.

Surrogate pair

    Two code units that represent a single Unicode character.
One solution would be to merely increase the maximum ordinal to a
larger value. Unfortunately the only straightforward
implementation of this idea is to increase the character code unit
to 4 bytes. This has the effect of doubling the size of most
Unicode strings. In order to avoid imposing this cost on every
user, Python 2.2 will allow 4-byte Unicode characters as a
build-time option. Users can choose whether they care about
wide characters or prefer to preserve memory.
The 4-byte option is called "wide Py_UNICODE". The 2-byte option
is called "narrow Py_UNICODE".
Most things will behave identically in the wide and narrow worlds.
* unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
length-one string.
* unichr(i) for 2**16 <= i <= TOPCHAR will return a
length-one string representing the character on wide Python
builds. On narrow builds it will raise ValueError.
ISSUE: Python currently allows \U literals that cannot be
represented as a single character. It generates two
characters known as a "surrogate pair". Should this be
disallowed on future narrow Python builds?
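[The surrogate-pair arithmetic behind that issue can be sketched as
follows; this is the standard UTF-16 calculation, not code from the PEP:]

```python
def to_surrogate_pair(cp):
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    offset = cp - 0x10000            # 20 bits of payload
    high = 0xD800 + (offset >> 10)   # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate
    return high, low

print([hex(u) for u in to_surrogate_pair(0x10000)])  # ['0xd800', '0xdc00']
```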
ISSUE: Should Python allow the construction of characters
that do not correspond to Unicode characters?
Unassigned Unicode characters should obviously be legal
(because they could be assigned at any time). But
code points above TOPCHAR are guaranteed never to
be used by Unicode. Should we allow access to them anyway?
* ord() is always the inverse of unichr()
* There is an integer value in the sys module that describes the
largest ordinal for a Unicode character on the current
interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds
of Python and TOPCHAR on wide builds.
ISSUE: Should there be distinct constants for accessing
TOPCHAR and the real upper bound for the domain of
unichr (if they differ)? There has also been a
suggestion of sys.unicodewidth, which can take the
values 'wide' and 'narrow'.
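[A sketch of how a program might branch on the build width using only
sys.maxunicode, which the PEP does define; the width constant itself is
still just a suggestion:]

```python
import sys

TOPCHAR = 0x10FFFF

# sys.maxunicode distinguishes the two build flavours described above.
if sys.maxunicode == TOPCHAR:
    width = 'wide'    # 4-byte Py_UNICODE
else:
    width = 'narrow'  # 2-byte Py_UNICODE, sys.maxunicode == 0xFFFF
print(width)
```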
* codecs will be upgraded to support "wide characters"
(represented directly in UCS-4, as surrogate pairs in UTF-16 and
as multi-byte sequences in UTF-8). On narrow Python builds, the
codecs will generate surrogate pairs, on wide Python builds they
will generate a single character. This is the main part of the
implementation left to be done.
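[On a wide (or modern) Python, the representations described above can be
observed directly; a small illustration, not from the PEP:]

```python
# U+10000, the first character beyond the BMP.
c = '\U00010000'

utf16 = c.encode('utf-16-be')  # surrogate pair: d8 00 dc 00
utf8 = c.encode('utf-8')       # four-byte sequence: f0 90 80 80

print(utf16.hex(), utf8.hex())
```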
* there are no restrictions on constructing strings that use
code points "reserved for surrogates" improperly. These are
called "isolated surrogates". The codecs should disallow reading
these but you could construct them using string literals or
unichr(). unichr() is not restricted to values less than either
TOPCHAR or sys.maxunicode.
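[Today's Python behaves essentially as described: isolated surrogates can
be constructed but the codecs refuse them. A small illustration in Python 3
spelling, where chr plays the role of unichr:]

```python
s = chr(0xD800)           # an isolated high surrogate: constructible
assert len(s) == 1

try:
    s.encode('utf-8')     # but the codec refuses to encode it
    rejected = False
except UnicodeEncodeError:
    rejected = True
print(rejected)  # True
```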
There is a new (experimental) define:
#define PY_UNICODE_SIZE 2
There are new configure options:
--enable-unicode=ucs2 configures a narrow Py_UNICODE, and uses
wchar_t if it fits
--enable-unicode=ucs4 configures a wide Py_UNICODE, and uses
wchar_t if it fits
--enable-unicode same as "=ucs2"
The intention is that --disable-unicode, or --enable-unicode=no
removes the Unicode type altogether; this is not yet implemented.
This PEP does NOT imply that people using Unicode need to use a
4-byte encoding. It only allows them to do so. For example,
ASCII is still a legitimate (7-bit) Unicode-encoding.
Rationale for Surrogate Creation Behaviour
Python currently supports the construction of a surrogate pair
for a large unicode literal character escape sequence. This is
basically designed as a simple way to construct "wide characters"
even in a narrow Python build.
ISSUE: surrogates can be created this way but the user still
needs to be careful about slicing, indexing, printing
etc. Another option is to remove knowledge of
surrogates from everything other than the codecs.
There were two primary solutions that were rejected. The first was
more or less the status-quo. We could officially say that Python
characters represent UTF-16 code units and require programmers to
implement wide characters in their application logic. This is a
heavy burden because emulating 32-bit characters is likely to be
very inefficient if it is coded entirely in Python. Plus these
abstracted pseudo-strings would not be legal as input to the
regular expression engine.
The other class of solution is to use some efficient storage
internally but present an abstraction of wide characters
to the programmer. Any of these would require a much more complex
implementation than the accepted solution. For instance consider
the impact on the regular expression engine. In theory, we could
move to this implementation in the future without breaking Python
code. A future Python could "emulate" wide Python semantics on
narrow builds.
This document has been placed in the public domain.
Paul Prescod <paulp(a)ActiveState.com> writes:
> "M.-A. Lemburg" wrote:
> > I'd suggest not to use the term character in this PEP at all;
> > this is also what Mark Davis recommends in his paper on Unicode.
> That's fine, but Python does have a concept of character and I'm going
> to use the term character for discussing these.
As a Unicode Idiot (tm) can I please beg you to reconsider? There are
so many possible meanings for "character" that I really think it's
best to avoid the word altogether. Call Python characters "length 1
strings" or even "length 1 Python strings".
> > Please note that you are mixing terms: you don't construct
> > characters, you construct code points. Whether the concatenation
> > of these code points makes a valid Unicode character string
> > is an issue which applications and codecs have to decide.
> unichr() does not construct code points. It constructs 1-char Python
> Unicode strings
This is what I think you should be saying.
> ...also known as Python Unicode characters.
Which I'm suggesting you forget!
I'm a keen cyclist and I stop at red lights. Those who don't need
hitting with a great big slapping machine.
-- Colin Davidson, cam.misc
A week ago I posted this on jython-dev, but no one was able to give any
advice on the best way to fix it. Maybe you can help.
For some time now, our [jython] web CVS has not worked correctly:
Finally I managed to track the problem to the Java2Accessibility.py,v
file in the CVS repository. The "rlog" command cannot be executed on
Thanks to Rob Collins (implementer) and Greg Smith (profiler), Cygwin now
provides enough pthreads support so that Cygwin Python builds OOTB *and*
functions reasonably well even with threads enabled. Unfortunately,
there are still a few issues that need to be resolved.
The one that I would like to address in this posting prevents a threaded
Cygwin Python from building the standard extension modules (without some
kind of intervention). :,( Specifically, the build would frequently
hang during the Distutils part when Cygwin Python is attempting to execvp
a gcc process.
See the first attachment, test.py, for a minimal Python script that
exhibits the hang. See the second attachment, test.c, for a rewrite
of test.py in C. Since test.c did not hang, I was able to conclude that
this was not just a straight Cygwin problem.
Further tracing uncovered that the hang occurs in _execvpe() (in os.py),
when the child tries to import tempfile. If I apply the third attachment,
os.py.patch, then the hang is avoided. Hence, it appears that importing a
module (or specifically the tempfile module) in a threaded Cygwin Python
child causes a hang.
I saw the following comment in _execvpe():
# Process handling (fork, wait) under BeOS (up to 5.0)
# doesn't interoperate reliably with the thread interlocking
# that happens during an import. The actual error we need
# is the same on BeOS for posix.open() et al., ENOENT.
The above makes me think that possibly Cygwin is having a similar problem.
Can anyone offer suggestions on how to further debug this problem?
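[One workaround pattern, independent of the eventual fix, is to make sure
the child never has to take the import lock at all by importing in the
parent before forking. A hypothetical sketch, POSIX only; whether this
matches the spirit of the attached os.py.patch is an assumption:]

```python
import os
import tempfile  # pre-import in the parent, before any fork

pid = os.fork()
if pid == 0:
    # Child: tempfile is already loaded, so no import happens here
    # and the import lock cannot be the point of contention.
    tempfile.gettempdir()
    os._exit(0)

_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # 0
```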
Director, Software Engineering Phone: 732.264.8770 x235
Dot Hill Systems Corp. Fax: 732.264.8798
82 Bethany Road, Suite 7 Email: Jason.Tishler(a)dothill.com
Hazlet, NJ 07730 USA WWW: http://www.dothill.com
Short version: I can confirm that bug under linux, but the patch breaks
nis module on solaris.
Linux machine is:
Linux malhar 2.2.16-3smp #1 SMP Mon Jun 19 17:37:04 EDT 2000 i686 unknown
with python version from recent CVS. I see the reported bug and the
suggested patch does fix the problem.
Sparc box looks like this:
SunOS cfa0 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-Enterprise
using python2.0 source tree. The nis module works out of the box, but
applying the suggested patch breaks it: 'nis.error: No such key in map'.
Correct me, but AFAICS there are only 186 days left until Python's MAGIC
number overflows:
/* XXX Perhaps the magic number should be frozen and a version field
added to the .pyc file header? */
/* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */
#define MAGIC (60202 | ((long)'\r'<<16) | ((long)'\n'<<24))
I couldn't find this problem in the SF bug tracking system. Should I
submit a new bug entry?
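[The arithmetic behind the 186 days, assuming the message was posted around
28 Jun 2001, the date of the PEP above: the decimal field
(YEAR-1995)*10000 + MONTH*100 + DAY must fit in the low 16 bits of MAGIC,
i.e. stay below 65536, which last holds on 2001-12-31:]

```python
from datetime import date

def magic_field(d):
    # Decimal date encoding used in the low 16 bits of MAGIC.
    return (d.year - 1995) * 10000 + d.month * 100 + d.day

assert magic_field(date(2001, 2, 2)) == 60202      # the current MAGIC
assert magic_field(date(2001, 12, 31)) <= 0xFFFF   # still fits
assert magic_field(date(2002, 1, 1)) > 0xFFFF      # overflows

print((date(2001, 12, 31) - date(2001, 6, 28)).days)  # 186
```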
Recently, Jack Jansen <jack(a)oratrix.nl> said:
> Just noted (that's Just-the-person, not me-just-noting:-) that on the
> Mac time.strftime() can blow up with an access violation if you pass
> silly values to it (such as 9 zeroes).
Following up to myself, after I just noticed (just-me-noticing, not
Just-the-person this time) that all zeros is a legal C value:
gettmarg() converts this all-zeroes tuple to
(0, 0, 0, 0, -1, 100, 0, -1, 0)
Fine with me; apparently Python wants to have human-understandable
(1-based) month numbers and year-day numbers, but then I think it really
should also check that the values are in range.
What do others think?
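[A sketch of the kind of range check being suggested; this is hypothetical
code, with field order as in Python's 9-element time tuple:]

```python
def check_tm(t):
    """Validate a 9-field time tuple before handing it to C strftime."""
    year, mon, mday, hour, minute, sec, wday, yday, isdst = t
    if not 1 <= mon <= 12:
        raise ValueError("month out of range (1-12)")
    if not 1 <= mday <= 31:
        raise ValueError("day of month out of range (1-31)")
    if not (0 <= hour <= 23 and 0 <= minute <= 59 and 0 <= sec <= 61):
        raise ValueError("time of day out of range")
    return t

check_tm((2001, 6, 28, 15, 33, 0, 3, 179, -1))  # passes silently
```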
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen(a)oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
Just noted (that's Just-the-person, not me-just-noting:-) that on the
Mac time.strftime() can blow up with an access violation if you pass
silly values to it (such as 9 zeroes).
Does anyone know enough of the ANSI standard to tell me how strftime
should behave with out-of-range values? I.e. should I report this as a
bug to MetroWerks or should we rig up time.strftime() to check that
all the values are in range?