At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, Phillip J. Eby <pje(a)telecommunity.com> wrote:
> > I didn't mean that it was the only purpose. In Python 2.x, practical code
> > has to sometimes deal with "string-like" objects. That is, code that takes
> > either strings or unicode. If such code calls bytes(), it's going to want
> > to include an encoding so that unicode conversions won't fail.
>
>That sounds like a rather hypothetical example. Have you thought it
>through? Presumably code that accepts both str and unicode either
>doesn't care about encodings, but simply returns objects of the same
>type as the arguments -- and then it's unlikely to want to convert the
>arguments to bytes; or it *does* care about encodings, and then it
>probably already has to special-case str vs. unicode because it has to
>control how str objects are interpreted.
Actually, it's the other way around. Code that wants to output
uninterpreted bytes right now and accepts either strings or Unicode has to
special-case *unicode* -- not str, because str is the only "bytes type" we
currently have.
This creates an interesting issue in WSGI for Jython, which of course only
has one (unicode-based) string type now. Since there's no bytes type in
Python in general, the only solution we could come up with was to treat
such strings as latin-1:
http://www.python.org/peps/pep-0333.html#unicode-issues
This is why I'm biased towards latin-1 encoding of unicode to bytes; it's
"the same thing" as an uninterpreted string of bytes.
I think the difference in our viewpoints is that you're still thinking
"string" thoughts, whereas I'm thinking "byte" thoughts. Bytes are just
bytes; they don't *have* an encoding.
So, if you think of "converting a string to bytes" as meaning "create an
array of numerals corresponding to the characters in the string", then this
leads to a uniform result whether the characters are in a str or a unicode
object. In other words, to me, bytes(str_or_unicode) should be treated as:
bytes(map(ord, str_or_unicode))
In other words, without an encoding, bytes() should simply treat str and
unicode objects *as if they were a sequence of integers*, and produce an
error when an integer is out of range. This is a logical and consistent
interpretation in the absence of an encoding, because in that case you
don't care about the encoding - it's just raw data.
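Concretely, here's a minimal sketch of that no-encoding behaviour for Python 2.x (to_bytes and its list-of-ints result are just stand-ins for the proposed type, not an existing API):

def to_bytes(str_or_unicode):
    # treat the argument purely as a sequence of ordinals
    ords = [ord(ch) for ch in str_or_unicode]
    for o in ords:
        if not 0 <= o <= 255:
            raise ValueError("ordinal %d out of range(256)" % o)
    return ords

to_bytes('abc\xf0')       # [97, 98, 99, 240]
to_bytes(u'abc\xf0')      # [97, 98, 99, 240] -- same result for str or unicode
to_bytes(u'abc\u1234')    # raises ValueError: ordinal 4660 is out of range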
If, however, you include an encoding, then you're stating that you want to
encode the *meaning* of the string, not merely its integer values.
>What would bytes("abc\xf0", "latin-1") *mean*? Take the string
>"abc\xf0", interpret it as being encoded in XXX, and then encode from
>XXX to Latin-1. But what's XXX? As I showed in a previous post,
>"abc\xf0".encode("latin-1") *fails* because the source for the
>encoding is assumed to be ASCII.
I'm saying that XXX would be the same encoding as you specified. i.e.,
including an encoding means you are encoding the *meaning* of the string.
However, I believe I mainly proposed this as an alternative to having
bytes(str_or_unicode) work like bytes(map(ord,str_or_unicode)), which I
think is probably a saner default.
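In today's Python 2.x the nearest analogue of the with-encoding form is unicode.encode(), which is a fair illustration of the distinction (the bytes() calls themselves remain hypothetical):

u = u'abc\u20ac'
raw = u.encode('utf-8')            # encode the *meaning* of the text
print [ord(c) for c in raw]        # [97, 98, 99, 226, 130, 172]
# whereas the ordinal treatment, map(ord, u), would reject u'\u20ac'
# (ordinal 8364) as out of the 0-255 byte range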
>Your argument for symmetry would be a lot stronger if we used Latin-1
>for the conversion between str and Unicode. But we don't.
But that's because we're dealing with its meaning *as a string*, not merely
as ordinals in a sequence of bytes.
> I like the
>other interpretation (which I thought was yours too?) much better: str
><--> bytes conversions don't use encodings but simply change the type
>without changing the bytes;
I like it better too. The part you didn't like was where MAL and I believe
this should be extended to Unicode characters in the 0-255 range also. :)
>There's one property that bytes, str and unicode all share: type(x[0])
>== type(x), at least as long as len(x) >= 1. This is perhaps the
>ultimate test for string-ness.
>
>Or should b[0] be an int, if b is a bytes object? That would change
>things dramatically.
+1 for it being an int. Heck, I'd want to at least consider the
possibility of introducing a character type (chr?) in Python 3.0, and
getting rid of the "iterating a string yields strings"
characteristic. I've found it to be a bit of a pain when dealing with
heterogeneous nested sequences that contain strings.
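For reference, the property as it stands today, and how the int choice would deliberately break it (the bytes behaviour is of course still just the proposal):

s = 'abc'
print type(s[0]) is type(s)     # True: indexing a str yields a str
u = u'abc'
print type(u[0]) is type(u)     # True for unicode too
# With b[0] returning an int, bytes would not pass this "string-ness" test:
# b[0] would be e.g. 97, not a length-1 bytes object.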
>There's also the consideration for APIs that, informally, accept
>either a string or a sequence of objects. Many of these exist, and
>they are probably all being converted to support unicode as well as
>str (if it makes sense at all). Should a bytes object be considered as
>a sequence of things, or as a single thing, from the POV of these
>types of APIs? Should we try to standardize how code tests for the
>difference? (Currently all sorts of shortcuts are being taken, from
>isinstance(x, (list, tuple)) to isinstance(x, basestring).)
I'm inclined to think of certain features at least in terms of the buffer
interface, but that's not something that's really exposed at the Python level.
Since I was on a streak of implementing not-quite-the-right-thing, I checked
in my PEP 308 implementation *with* backward compatibility -- just to spite
Guido's latest change to the PEP. It jumps through a minor hoop (two new
grammar rules) in order to be backwardly compatible, but that hoop can go
away in Python 3.0, and that shouldn't be too long from now. I apologize for
the test failures of compile, transform and parser: they seem to all depend
on the parsermodule being updated. If no one feels responsible for it, I'll
do it later in the week (I'll be sprinting until Thursday anyway.)
--
Thomas Wouters <thomas(a)xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
At PyCon, there was general reluctance to incorporate
the ast-objects branch, primarily because people were
concerned about what the reference counting would do to
maintainability, and what (potentially troublesome)
options direct exposure of AST objects would open up.
OTOH, the approach of creating a shadow tree did not
find opposition, so I implemented that.
Currently, you can use compile() to create an AST
out of source code, by passing PyCF_ONLY_AST (0x400)
to compile. The mapping of AST to Python objects
is as follows:
- There is a Python type for every sum, product,
  and constructor.
- The constructor types inherit from their sum
  types (e.g. ClassDef inherits from stmt).
- Each constructor and product type has an
  _fields member, giving the names of the fields
  of the product.
- Each node in the AST has members with the names
  given in _fields.
- If the field is optional, it might be None.
- If the field is zero-or-more, it is represented
  as a list.
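For instance, the shadow tree can be driven from Python roughly like this (assuming a build with this checkin; the exact reprs may differ):

PyCF_ONLY_AST = 0x400
tree = compile("class C(object):\n    x = 1\n", "<string>", "exec",
               PyCF_ONLY_AST)
print type(tree)           # the Module node
classdef = tree.body[0]    # a ClassDef, which inherits from stmt
print classdef._fields     # the names of the node's fields
print classdef.name        # 'C'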
It might be reasonable to expose this through
a separate module, in particular to provide
access to the type objects.
Regards,
Martin
Why would you change the Python scoping rules, instead of using
function attributes, available since release 2.1 (PEP 232)?
For example, you may write:
def incgen(start, inc):
    def incrementer():
        incrementer.a += incrementer.b
        return incrementer.a
    incrementer.a = start - inc
    incrementer.b = inc
    return incrementer

f = incgen(100, 2)
g = incgen(200, 3)

for i in range(5):
    print f(), g()
The result is:
100 200
102 203
104 206
106 209
108 212
I just noticed that cProfile (like profile) prints to stdout. Yuck. I
guess that's to be expected because the pstats module does the actual
printing and it's used by both modules. I'm willing to give up backward
compatibility to achieve a little more sanity and flexibility here. I
propose rewriting the necessary bits to add a stream= keyword argument where
necessary and using stream.write(...) or print >> stream, ... instead of the
current bare print. I'd prefer the default for the stream be sys.stderr as
well.
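Something along these lines, purely as an illustration of the interface change (FakeStats is a made-up stand-in, not the real pstats.Stats):

import sys

class FakeStats:
    def __init__(self, stream=sys.stderr):
        self.stream = stream

    def print_stats(self):
        # instead of a bare print, everything goes through the stream
        print >> self.stream, "ncalls  tottime  percall ..."

FakeStats().print_stats()                   # stderr by default
FakeStats(stream=sys.stdout).print_stats()  # but the caller can redirect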
Thoughts?
Skip
2006/2/25, martin(a)v.loewis.de <martin(a)v.loewis.de>:
> Translating the library reference as such is more difficult, because
> it can't be translated in small chunks very well.
The SVN directory "python/dist/src/Doc/lib/" has 276 .tex files, with an
average of 250 lines each.
Maybe managing each file independently could work.
> Some group of French translators once translated everything for 1.5.2,
> and that translation never got updated.
We're afraid of this. And that's why we think it'd be necessary
to have some automated system that tells us if the original file got
updated, if there are new files to translate, and that shows the state of the
translation (in progress, finished, not even started, etc.).
I think that a system like this is not so difficult, but if the wheel
is already invented...
. Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Since I implemented[*] PEP 328, Aahz suggested I take over editing the PEP,
too, as there were some minor discussion points to add still. I haven't been
around for the discussions, though, and it's been a while for everyone else,
I think, so I'd like to rehash and ask for any other open points.
The one open point that Aahz forwarded me, and is expressed somewhat in
http://mail.python.org/pipermail/python-dev/2004-September/048695.html , is
the case where you have a package that you want to transparently supply a
particular version of a module for forward/backward compatibility, replacing
a version elsewhere on sys.path (if any). I see three distinct situations for
this:
1) Replacing a stdlib module (or a set of them) with a newer version, if the
stdlib module is too old, where you want the whole stdlib to use the
newer version.
2) Same as 1), but private to your package; modules not in your package
should get the stdlib version when they import the 'replaced' module.
3) Providing a module (or a set of them) that the stdlib might be missing
(but which will be a new enough version if it's there)
1) and 3) are easy to solve: put the module in a separate directory, insert
that into sys.path; at the front for 1), at the end for 3). Mailman, IIRC,
does this, and I think it works fine.
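For concreteness, the sys.path arrangement for 1) and 3) looks roughly like this (directory names invented for the example):

import os, sys

_here = os.path.dirname(os.path.abspath(__file__))

# case 1: bundled copies must win over the stdlib -> front of sys.path
sys.path.insert(0, os.path.join(_here, 'pythonlib-newer'))

# case 3: bundled copies are only fallbacks -> end of sys.path
sys.path.append(os.path.join(_here, 'pythonlib-fallback'))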
2) is easy if it's a single module; include it in your package and import it
relatively. If it's a package itself, it's again pretty easy; include the
package and include it relatively. The package itself is hopefully already
using relative imports to get sibling packages. If the package is using
absolute imports to get sibling packages, well, crap. I don't think we can
solve that issue whatever we do: that already breaks.
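Case 2, sketched with the explicit relative syntax PEP 328 provides (package layout entirely hypothetical):

#   mypkg/
#       __init__.py
#       feeble.py
#       _vendor/
#           __init__.py
#           foo.py        # private, newer copy of foo
#
# mypkg/feeble.py:
from __future__ import absolute_import
from ._vendor import foo   # always our bundled copy; nothing else sees it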
The real problem with 2) is when you have tightly coupled modules that are
not together in a package and not using relative imports, or perhaps when
you want to *partially* override a package. I would argue that tightly
coupled modules should always use relative imports, whether they are
together in a package or not (even though they should probably be in a
package anyway.) I'd also argue that having different modules import
different versions of existing modules is a bad idea. It's workable if the
modules are only used internally, but exposing anything is troublesome. For
instance, an instance of a class defined in foo (1.0) imported by bar will
not be an instance of the same class defined in foo (1.1) imported by
feeble.
Am I missing anything?
([*] incorrectly, to be sure, but I have a 'correct' version ready that I'll
upload in a second; I was trying to confuse Guido into accepting my version,
instead.)
--
Thomas Wouters <thomas(a)xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Martin and I were talking about dropping support for older versions of
Windows (of the non-NT flavor). We both thought that it was
reasonable to stop supporting Win9x (including WinME) in Python 2.6.
I updated PEP 11 to reflect this.
The Python 2.5 installer will present a warning message on the systems
which will not be supported in Python 2.6.
n
There's a problem with genexps that I think really needs to get
fixed. See http://python.org/sf/1167751; the details are below. This
code:
>>> foo(a = i for i in range(10))
generates "NameError: name 'i' is not defined" when run because:
2 0 LOAD_GLOBAL 0 (foo)
3 LOAD_CONST 1 ('a')
6 LOAD_GLOBAL 1 (i)
9 CALL_FUNCTION 256
12 POP_TOP
13 LOAD_CONST 0 (None)
16 RETURN_VALUE
If you add parens around the genexp: foo(a = (i for i in range(10)))
You get something quite different:
2 0 LOAD_GLOBAL 0 (foo)
3 LOAD_CONST 1 ('a')
6 LOAD_CONST 2 (<code object <generator
expression> at 0x2a960baae8, file "<stdin>", line 2>)
9 MAKE_FUNCTION 0
12 LOAD_GLOBAL 1 (range)
15 LOAD_CONST 3 (10)
18 CALL_FUNCTION 1
21 GET_ITER
22 CALL_FUNCTION 1
25 CALL_FUNCTION 256
28 POP_TOP
29 LOAD_CONST 0 (None)
32 RETURN_VALUE
I agree with the bug report that the code should either raise a
SyntaxError or do the right thing.
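For comparison, the parenthesized call does what you'd expect:

def foo(a):
    return list(a)

print foo(a=(i for i in range(10)))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]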
n
Guido's on_missing() proposal is pretty good for what it is, but it is
not a replacement for setdefault(). The use cases for a derivable,
definition- or instantiation-time framework are different from the
call-site based decision being made with setdefault(). The difference
is that in the former case, the class designer or instantiator gets to
decide what the default is, and in the latter (i.e. current) case, the
user gets to decide.
Going back to first principles, the two biggest problems with today's
setdefault() are 1) the default object gets instantiated whether you need
it or not, and 2) the idiom is not very readable.
To directly address these two problems, I propose a new method called
getdefault() with the following signature:
def getdefault(self, key, factory)
This yields the following idiom:
d.getdefault('foo', list).append('bar')
Clearly this completely addresses problem #1. The implementation is
simple and obvious, and there's no default object instantiated unless
the key is missing.
I think #2 is addressed nicely too because "getdefault()" shifts the
focus to what the method returns rather than the effect of the method on
the target dict. Perhaps that's enough to make the chained operation on
the returned value feel more natural. "getdefault()" also looks more
like "get()" so maybe that helps it be less jarring.
This approach also seems to address Raymond's objections because
getdefault() isn't "special" the way on_missing() would be.
Anyway, I don't think it's an either/or choice with Guido's subclass.
Instead I think they are different use cases. I would add getdefault()
to the standard dict API, remove (eventually) setdefault(), and add
Guido's subclass in a separate module. But I /wouldn't/ clutter the
built-in dict's API with on_missing().
-Barry
P.S.
_missing = object()
def getdefault(self, key, factory):
    value = self.get(key, _missing)
    if value is _missing:
        value = self[key] = factory()
    return value
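And a quick demonstration, grafting that function onto a dict subclass since the method doesn't exist on dict today:

class defaultingdict(dict):
    getdefault = getdefault        # the function defined above

d = defaultingdict()
d.getdefault('foo', list).append('bar')
print d                            # {'foo': ['bar']}

# versus today's idiom, which builds the default object on every call:
e = {}
e.setdefault('foo', []).append('bar')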