I keep getting regular requests from people looking for Python coders
(and this is in addition to Google asking me to hand over my contacts
:-). This is good news because it suggests Python is on the uptake
(always good to know). At the same time it is disturbing because
apparently there aren't enough Python programmers out there. (At least
none of them looking for work.) What's up with that?
I wonder if we should start maintaining a list of Python developers
for hire somewhere on python.org, beyond the existing Jobs page. Is
anyone interested in organizing this?
--Guido van Rossum (home page: http://www.python.org/~guido/)
I've just received a private email from Christian Jacobsen (we were discussing
some ctypes bugs/deficiencies that do not matter in this context). He wrote:
> [...] The bug
> reporting procedures for documentation are a bit inconsistent:
> http://wiki.python.org/moin/SubmittingBugs, says: "If you find errors
> in the documentation, please use either the Add a comment or the
> Suggest a change features of the relevant page in the most recent
> online documentation at http://docs.python.org/.", but the most
> recent online documentation points to the SF bugtracker or
> docs(a)python.org. The SF bugtracker in turn points back to bugs.python.org
I sympathize with him. Further, there is no 'Add a comment' or 'Suggest a change' link
in the 2.5 documentation shown at http://docs.python.org.
Based on Jean-Yves Mengant's work on previous versions, I have ported
Python 2.5.1 to z/OS. A patch against current svn head is attached to
<http://bugs.python.org/issue1298>. The same patch should work with very
few changes against pristine 2.5.1 sources as well. (The only failing
hunk is for Modules/makesetup, and it is quite trivial.)
I have no opinion on whether the patch should eventually be incorporated
into the main distribution. The port was motivated by internal reasons,
and I'm merely offering it as a community service to anyone else who
might be interested. If Jean-Yves wishes to distribute it from his
z/OS-page, that is fine with me. In general, anyone can do what they
want with the patch, but please give credit.
I'll describe some of the porting issues below.
The biggest difficulty with z/OS is of course the character set.
There are lots of ASCII-dependencies in Python code, and z/OS uses
CP1047, an EBCDIC variant, which is utterly incompatible with ASCII.
There are two possible approaches in this situation. One is to keep on
using ASCII as the execution character set (and also as the default
encoding of string objects), and to add conversion support everywhere
we do text-based I/O, so that communication with the external
world still happens in EBCDIC. This was feasible since the z/OS C
compiler does support ASCII as the execution character set. (The source
character set would still remain EBCDIC, though. If you've ever wondered
why the C standard makes a distinction between these, here's a prime
example of a situation where they're different.)
However, I decided against this approach. The I/O conversions would have
been deeply magical, and would have required the classic "text mode vs.
binary mode" distinction, which would have been rather confusing.
Instead, I followed Jean-Yves' example and kept Python as a "native"
EBCDIC application: 8-bit data is treated by default as EBCDIC
everywhere. This only required fixing various ASCII-specific bits in the
code, e.g. stuff like this (in PyString_DecodeEscape):
- else if (c < ' ' || c >= 0x7f)
+ else if (!isprint((unsigned char) c))
Of course, this now allows unescaped printing of characters that are
printable in the platform's encoding even if they wouldn't be printable
in ASCII. I'm not sure if this is desirable or not. It would be simple
to fix this so that only characters in the ASCII _character set_ are
printed unescaped.
A result of making strings EBCDIC-native is that it breaks any code that
depends on string literals being in ASCII. This probably applies to most
network protocol implementations written in Python. On the other hand,
making string literals use ASCII would break code that does ordinary
text processing on local files. Damned if you do, damned if you don't.
The real issue is that strings in Python are rather underspecified.
String objects are really just octet sequences without any _inherent_
textual interpretation for them. This is apparent from the fact that
strings are what are read from and written to binary files, and also
what unicode strings are encoded to and decoded from. However, Python
syntax allows specifying an octet sequence with a _character_ sequence
(i.e. a string literal), and the relationship between the source
characters and the resulting octets has been left implicit. So
programmers aren't really encouraged to think about character set issues,
and the end result is code that only works on platforms that use ASCII.
Python already has the property that the meaning of a source file
depends on its encoding: if I write a string literal with some latin-1
characters, the resulting octet sequence depends on whether my source
was encoded in latin-1 or utf-8. I'm not sure if this is a good idea,
but my approach with the z/OS port continues the tradition: when your
source is in EBCDIC, the string literals get encoded in EBCDIC.
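The dependency described above is easy to illustrate in modern Python 3 terms (a small illustration I've added, not from the original post): the octets produced for the same characters differ with the encoding, so a Python 2 byte-string literal's value depends on its source file's encoding.

```python
# The same characters yield different octets under different encodings,
# which is exactly why a literal's value depends on the source encoding.
literal_chars = 'h\xe4t'    # contains a latin-1 character (U+00E4)
assert literal_chars.encode('latin-1') == b'h\xe4t'
assert literal_chars.encode('utf-8') == b'h\xc3\xa4t'
```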
All this just shows that treating plain octet sequences as "strings"
simply won't work in the long run. You have to have separate type for
_textual_ data (i.e. Unicode strings, in Python), and encode and decode
between those and octet sequences using some _explicit_ encoding. Of
course, all non-English-speaking people have been keenly aware of this
already for ages. The relative universality of ASCII is an exception
amongst encodings rather than the norm. It's only reasonable for English
text to require the same attention to encodings as text in all other
languages does.
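The explicit model argued for above can be sketched in a few lines of modern Python, where textual data lives in unicode strings and octet sequences in bytes, related only through a named encoding. (The stdlib cp500 EBCDIC codec stands in here for CP1047, which the post's port adds but the stock library lacks.)

```python
# Textual data and octets are distinct; an explicit encoding mediates.
text = "naive"                          # textual data
octets_ascii = text.encode("ascii")     # explicit encoding to octets
octets_ebcdic = text.encode("cp500")    # cp500: a stdlib EBCDIC codec
# The same characters yield different octets under different encodings:
assert octets_ascii != octets_ebcdic
# ...and decoding with the matching codec recovers the same text:
assert octets_ebcdic.decode("cp500") == text
```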
The biggest hurdle by far (at least LoC-wise) in the porting was
Unicode. The code assumed that the execution character set was not only
ASCII, but ISO-8859-1, since there was lots of casting back and forth
between Py_UNICODE and char. I added the following conversion operations;
on ASCII platforms they are trivial casts:

# define Py_UNICODE_FROM_CHAR(c) ((Py_UNICODE)(unsigned char)(c))
# define Py_UNICODE_AS_CHAR(u) (u < 0x80 ? (char)(unsigned char)(u) : '\0')

while on non-ASCII platforms they go through real conversion functions:

# define Py_UNICODE_FROM_CHAR(c) _PyUnicode_FromChar(c)
# define Py_UNICODE_AS_CHAR(u) _PyUnicode_AsChar(u)
The Py_UNICODE_AS_CHAR operation maps a unicode character to a char in
the execution character set's encoding, or to '\0' if it's not
representable there.
When on a non-ASCII platform, I used the simplest trick of all:
/* Map from ASCII codes to the platform's execution character set, or to
'\0' if the corresponding character is not known. */
static const char unicode_ascii_table[] = /* ...table contents elided... */;
(This is reasonably portable, as all the printable ASCII characters
except `, @ and $ are required by C to be present in any source or
execution character set, and of those, Python requires all but $.)
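The key idea of the table trick is that writing character literals in the source lets the compiler supply their codes in the *execution* character set. A Python sketch of the same idea (my illustration; the stdlib cp500 EBCDIC codec stands in for the platform's native encoding):

```python
# Build the ASCII -> native table by letting a codec (standing in for the
# compiler's execution character set) supply the native codes.
PRINTABLE_ASCII = ''.join(chr(i) for i in range(0x20, 0x7f))

# ASCII code -> native code (the forward table)...
ascii_to_native = dict(zip(range(0x20, 0x7f), PRINTABLE_ASCII.encode('cp500')))
# ...and the corresponding reverse index.
native_to_ascii = {n: a for a, n in ascii_to_native.items()}

assert ascii_to_native[ord('a')] == 0x81   # 'a' is 0x81 in EBCDIC
assert native_to_ascii[0x81] == ord('a')
```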
This, and the corresponding reverse index, are good enough for all
purposes in the Python core: converting unicode string literals into
unicode objects, detecting special escape characters, and calculating
digit values. It doesn't allow writing string or unicode literals that
directly contain characters that don't exist in ASCII, though. But since
such code wouldn't be portable across character sets anyway, this isn't
much of a problem.
I also added a Lib/encodings/cp1047.py that does proper recoding outside
the core. It was generated from jdk-1.5.0/CP1047.TXT. This map seems to
best correspond to the actual conventions I have seen in use.
Now, strings and unicode seem to work together fairly well, even though
the results may be a bit surprising to anyone used to ASCII and its
supersets:
>>> ord('a')
129

Here 129 is the EBCDIC value of the letter 'a'. The unicode literal
u'a', like all textual input, is itself represented in EBCDIC:

>>> map(ord, "u'a'")
[164, 125, 129, 125]
But when such a literal is parsed, the resulting unicode object has the
correct value for the corresponding unicode character:

>>> ord(u'a')
97

And, of course, when this unicode literal is printed back or its repr is
taken, it is again encoded to EBCDIC so it shows correctly:

>>> map(ord, repr(u'a'))
[164, 125, 129, 125]
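The octet values quoted above can be checked against a stdlib EBCDIC codec (cp500, whose values for these characters match CP1047; this check is mine, not part of the original port):

```python
# The literal u'a', as EBCDIC octets, is exactly the sequence shown above.
source = "u'a'".encode('cp500')
assert list(source) == [164, 125, 129, 125]
# Parsing yields the character's true unicode value...
assert ord('a') == 97
# ...while re-encoding displays correctly on an EBCDIC terminal.
assert list("u'a'".encode('cp500')) == [164, 125, 129, 125]
```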
This seems to me to be the Right Thing. Now, as long as no exotic
characters are used directly in the source, source can be translated
between ASCII and EBCDIC so that strings and unicode strings retain
their correct semantic character values, even though the encoding of the
literals themselves is different. String objects have a
platform-dependent encoding, but unicode objects behave the same
everywhere.
One problem with this approach is that it is completely incompatible
with Python's UTF-8 support. The parser assumes that utf-8 (or latin-1)
is a superset of the platform's native encoding, and this of course
isn't true with EBCDIC.
A consequence is that the z/OS port cannot support eval of unicode
strings:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
SyntaxError: invalid syntax
This is because internally evaluation of unicode strings is implemented
by first encoding the unicode string as utf-8, and then trying to parse
that. And this of course fails.
This seems like a rather complicated and limited way of going about it.
It would be much cleaner and more portable to first decode input into
unicode by various means, and then to parse the unicode. Then unicode
strings would be the ones that don't need any special processing. But
this would require heavy changes to Python's parsing machinery, and I
tried to keep my changes as minimal as possible for now.
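The cleaner decode-first pipeline argued for above can be sketched in a few lines (modern Python terms; the stdlib cp500 EBCDIC codec stands in for the platform encoding):

```python
# Decode the source octets into unicode first, then parse the unicode.
# The parser then never needs to care about the source encoding.
source_octets = '1 + 1'.encode('cp500')       # source as EBCDIC octets
source_text = source_octets.decode('cp500')   # explicit decode step
code = compile(source_text, '<sketch>', 'eval')
assert eval(code) == 2
```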
One more character set issue arose with pickling. The pickle protocols
are a bit schizophrenic in the sense that they can't quite decide
whether to be textual or binary protocols. A textual protocol should be
readable, and recodable across platforms to preserve the semantic
character values correctly, whereas a binary protocol should be based on
specific octet values whose readability is not an issue.
The original pickle protocol 0 can be seen either as a textual protocol
(all the pickles are readable), or a binary protocol (when characters
get mapped to their corresponding octet values in ASCII). The other
protocol versions, though extensions of protocol 0, are clearly binary,
since the pickled data is at least partially specified as specific octet
values.
Now, on an EBCDIC platform, it's impossible to have protocol 0 be
textual while still compatible with the other protocols. This is because
e.g. the following opcodes get the same value if we let 'a' be textual
(i.e. encoded in the host platform's encoding):
APPEND = 'a' # append stack top to list below it
NEWOBJ = '\x81' # build object by applying cls.__new__ to argtuple
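The collision can be checked directly against the opcode constants in the pickle module (using the stdlib cp500 EBCDIC codec, whose value for 'a' matches CP1047):

```python
# APPEND is the character 'a'; NEWOBJ is the fixed octet 0x81. In EBCDIC
# a "textual" 'a' lands on the same octet as NEWOBJ.
import pickle

assert pickle.APPEND == b'a'
assert pickle.NEWOBJ == b'\x81'
assert 'a'.encode('cp500') == pickle.NEWOBJ   # the clash
```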
In the end, for now, I made protocol 0 textual, and disabled support for
protocol versions > 0 on non-ASCII platforms. This seems like the safest
choice. It's certainly possible to add support for the binary protocols
and make them explicitly use ASCII, but that again would require more
invasive changes.
Incidentally, modified_EncodeRawUnicodeEscape in cPickle.c seems to be
out of sync with the one in unicodeobject.c, in that it lacks support
for Py_UNICODE_WIDE. Also, both versions generate a latin-1 string as
output, which doesn't seem portable enough. My patch recodes characters
in ASCII to the execution character set, and escapes everything else,
even characters in the U+0080 to U+00FF range. (Though strictly, all the
latin-1 characters happen to be representable in CP1047. But this is not
something that I think it's good to depend upon.)
There were quite a number of places where (hex) digits were parsed
nonportably. I added the following to longobject.h, and used that:
PyAPI_FUNC(int) _PyLong_DigitValue(char c);
This resulted in some nice cleanups. From PyString_DecodeEscape:
- unsigned int x = 0;
- c = Py_CHARMASK(*s);
- s++;
- if (isdigit(c))
-     x = c - '0';
- else if (islower(c))
-     x = 10 + c - 'a';
- else
-     x = 10 + c - 'A';
- x = x << 4;
- c = Py_CHARMASK(*s);
- s++;
- if (isdigit(c))
-     x += c - '0';
- else if (islower(c))
-     x += 10 + c - 'a';
- else
-     x += 10 + c - 'A';
- *p++ = x;
+ int xh = _PyLong_DigitValue(*s++);
+ int xl = _PyLong_DigitValue(*s++);
+ *p++ = Py_CHARMASK(xh * 16 + xl);
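The portability idea behind _PyLong_DigitValue can be sketched in Python as well (my illustration): look characters up in a literal digit string instead of doing arithmetic on their code points, since the literal's characters are always in the execution character set, ASCII or EBCDIC alike.

```python
# Position in the literal string gives the digit value; no assumptions
# about the numeric codes of '0'..'9' or 'a'..'z' being contiguous.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def digit_value(c):
    """Value of c as a digit in bases up to 36, or -1 if it isn't one."""
    return DIGITS.find(c.lower())

assert digit_value('0') == 0
assert digit_value('f') == 15
assert digit_value('F') == 15
assert digit_value('#') == -1
```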
Most of the other changes are boring build-technical issues and tweaks
to make things compile on z/OS's very spartan support for Unix-like
facilities. I hard-coded various #ifdef __MVS__ bits here and there to
make things compile. I guess these things should properly be checked by
configure, but I'm not very good at autoconf magic, and besides, running
configure takes _ages_ on the machine I'm using, so I wasn't inclined to
tweak the scripts any more than I had to.
The dynamic loading support in dynload_mvs.c is verbatim from Jean-Yves'
modifications. I just cleaned it up a little.
I have only tested this with --enable-shared (which does what
--with-zdll did in Jean-Yves' version, i.e. enables shared libraries).
Without shared libraries the building of extensions may well fail
because of some linkage tweaks in Lib/distutils/unixccompiler.py. I hope
there is some way of deciding what to do depending on whether shared
libraries are enabled or not.
One nasty difficulty was that the makefile implicitly assumes that
shared libraries are named libpython2.x.dll only on Windows. However,
they have that name on z/OS, too. I resolved this with a simple "case
$(MACHDEP)" in the rule for building the library, but hopefully someone
can come up with a prettier solution.
Various wrappers for external libraries are untested. Certainly it might
be possible to install zlib, libbz2, openssl and various other nifty
libraries on z/OS, and see if the Python wrappers work, but that is an
undertaking that I will pass on, at least for now.
Quite a number of tests fail simply because they assume that strings are
encoded in ASCII. For instance, Lib/test/test_calendar.py fails because
the expected result is:
result_2004_html = """
<?xml version="1.0" encoding="ascii"?>
And the real result begins with:
<?xml version="1.0" encoding="cp1047"?> ...
There were so many of these kinds of failures that there may be some
_actual_ problems among them that I've overlooked.
That is about all. Comments are welcome. I'd be especially interested in
hearing if my patch works on any other machine besides the one I was
using.
Lauri Alanko Software Engineer
SSH Communications Security Corp Mobile: +358-40-864-3037
Valimotie 17, FI-00380, Helsinki, Finland Tel: +358-20-500-7000
http://www.ssh.com/ Fax: +358-20-500-7001
I think the latest patch for fixing Issue 708374 (adding offset to mmap)
should be committed to SVN.
I will do it, if nobody opposes the plan. I think it is a very
important addition and greatly increases the capability of the mmap module.
P.S. Initially sent this to the wrong group (I've been doing that a lot
lately --- too many groups seen through gmane...). Apologies for the
noise.
smtpd.SMTPChannel contains a bug such that when connected to an SMTPServer
(or any subclass thereof), issuing a MAIL command with no argument closes
the socket and gives this error on the server:
error: uncaptured python exception, closing channel <smtpd.SMTPChannel connected
127.0.0.1:58587 at 0x847d8> (<type 'exceptions.TypeError'>:'NoneType' object
The desired result is of course to respond with a 501 Syntax error. The
problem arises because the arg parameter passed to each smtp_* command
handler function is None when there is no argument. arg is passed on to
__getaddr which attempts a slice on the None object.
I think the most elegant solution would be to insert an "if not arg" check in
__getaddr and return. The existing smtp_MAIL code will then issue the 501
response.
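A minimal sketch of the suggested guard (a standalone helper mirroring the address-parsing logic in smtpd's __getaddr; the exact body of the real method may differ):

```python
# With the guard, a MAIL command that arrives without an argument yields
# no address, so smtp_MAIL replies 501 instead of crashing on None.
def getaddr(keyword, arg):
    if not arg:              # the proposed "if not arg" check
        return ''
    keylen = len(keyword)
    if arg[:keylen].upper() == keyword:
        address = arg[keylen:].strip()
        return address.strip('<>')
    return ''

assert getaddr('FROM:', None) == ''
assert getaddr('FROM:', 'FROM:<a@example.com>') == 'a@example.com'
```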
as you may have seen in the checkins, I just created a new chapter
in the docs called "Using Python."
We, the doc team, would like that section to contain, roughly,
information how to install Python on different platforms, how to
configure it, how to invoke it, what extra things and quirks you
should know about, etc.
Currently, there are two sections: one about the Python command-line
options, newly distilled from the manpage and other sources, and the
old "Using Python on Mac" document which had been in the HOWTO section
since the migration.
Now it would of course be nice to have at least two other documents
for Unixy platforms and Windows...
I don't doubt that this is written down somewhere, but it isn't in the
official docs currently -- so it'd be wonderful if you could help us
to collect this information, give it a proper shape and add it to
the new chapter.
Even links to wiki pages, blog articles etc. can help!
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
In trunk after 2.5, equality and hashing for TestCase were added, changing the behavior so that two instances of TestCase for the same test method hash the same and compare equal. This means two instances of TestCase for the same test method cannot be added to a single set.
Here's the change:
The implementations aren't even very good, since they prevent another type from deciding that it wants to customize comparison against TestCase (or TestSuite,
or FunctionTestCase) instances.
Is there a real use case for this functionality? If not, I'd like it to be
removed to restore the old behavior.
I've got a tricky problem with a self-compiled python2.4.4 under Windows
Vista. I compiled with Visual Studio 8 Standard (.NET 2005).
Python greets with the right header:
Python 2.4.4 (#71, Oct 19 2007, 18:49:44) [MSC v.1400 32 bit (Intel)]
I started the Visual Studio Shell and tried to get some extensions using
python ez_setup.py readline
Best match: readline 2.4.2
Running readline-2.4.2\setup.py -q bdist_egg --dist-dir
error: Setup script exited with error: Python was built with version
8.0 of Visual Studio, and extensions need to be built with the same
version of the compiler, but it isn't installed.
Does anyone know why setuptools cannot find VS8? I'm sure the system
variables are set correctly, as in %VCDIR%\bin\vcvars32.bat.
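For reference, distutils decides which compiler built Python by parsing the "MSC v.NNNN" tag in sys.version; a sketch of that check (mirroring the logic in distutils.msvccompiler.get_build_version, simplified here):

```python
# Derive the required Visual Studio version from the MSC tag: MSC v.1400
# means VS 8.0 (2005). The real code lives in distutils.msvccompiler.
import re

def msvc_build_version(version_string):
    m = re.search(r'MSC v\.(\d+)', version_string)
    if not m:
        return None
    s = m.group(1)
    major = int(s[:2]) - 6          # "14" -> 8 (VS 2005)
    minor = int(s[2:3]) / 10.0      # "0"  -> 0.0
    return major + minor

# The interpreter header quoted above reports MSC v.1400, i.e. VS 8.0:
assert msvc_build_version('2.4.4 (#71) [MSC v.1400 32 bit (Intel)]') == 8.0
```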
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale · germany
www.gocept.com · work. +49 345 122988912 · fax. +49 345 12298891
The 64-bit windows trunk buildbot now only fails the test_winsound test.
This is because for whatever reasons the machine cannot play any sounds.
I have no idea why this is so, and I'm not too inclined to fix it. The
buildbot is running Windows XP 64-bit in a vmware image under ubuntu.
Is there a way to disable the winsound test on this machine? I'm annoyed
by the red color ;-).