[Python-Dev] Re: String methods... finally

Fredrik Lundh fredrik at pythonware.com
Tue Jun 15 11:48:40 CEST 1999


> > >     a) Python raises an exception
> > >     b) result is an ordinary string object
> > >     c) result is a unicode string object
> > 
> > Well, we could take this to the extreme, and allow _every_ object to grow a
> > join method, where join attempts to coerce to the same type.

well, I think that unicode strings and ordinary strings
should behave like "strings" where possible, just like
integers, floats, long integers and complex values
behave like "numbers" in many (but not all) situations.

if we make unicode strings easier to mix with ordinary
strings, we don't necessarily have to make integers and
lists easier to mix with strings too...

(people who want that can use Tcl instead ;-)
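
to illustrate the analogy, here is a minimal sketch in present-day
Python 3 (an assumption; the 1.6 behaviour discussed in this thread
may differ).  the variable names are made up for the example.  numeric
types coerce among themselves, but a string and an integer do not mix:

```python
# Numeric types are coerced to a common type when mixed.
mixed_numbers = 1 + 2.5 + (3 + 0j)   # int + float + complex

# A string and an integer are not coerced; the addition is rejected.
try:
    "1" + 1
    str_plus_int_ok = True
except TypeError:
    str_plus_int_ok = False

print(type(mixed_numbers).__name__, str_plus_int_ok)
```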

> I think i'd agree with Mark's answer for this situation, though
> i don't know about adding 'join' methods to other types.  I see two
> arguments that can be made here:
> 
>     For b): the result should match the type of the object
>     on which the method was called.  This way the type of
>     the result is more easily determinable by the programmer
>     or reader.  Also, since the type of the result is
>     immediately known to the "join" code, each member of the 
>     passed-in sequence need only be fetched once, and a
>     __getitem__-style generator can easily stand in for the
>     sequence.
> 
>     For c): the result should match the "biggest" type among
>     the operands.  This behaviour is consistent with what
>     you would get if you added all the operands together.
>     Unfortunately this means you have to see all the operands
>     before you know the type of the result, which means you
>     either scan twice or convert potentially the whole result.
> 
> b) weighs more strongly in my opinion, so i think the right
> thing to do is to match the type of the separator.
> 
> (But if a Unicode string contains characters outside of
> the Latin-1 range, is it supposed to raise an exception
> on an attempt to convert to an ordinary string?  In that
> case, the actual behaviour of the above example would be
> a) and i'm not sure if that would get annoying fast.)

exactly.  there are some major issues hidden in here,
including:

1) what should "str" do for unicode strings?
2) should join really try to convert its arguments?
3) can "str" really raise an exception for a built-in type?
4) should code written by americans fail when used
   in other parts of the world?

based on string-sig input, the unicode class currently
solves (1) by returning a UTF-8 encoded version of the
unicode string contents.  this was chosen to make sure
that the answer to (3) is "no, never", and that the answer
to (4) is "not always, at least" -- we've had enough of
that, thank you:
http://www.lysator.liu.se/%e5ttabitars/7bit-example.txt
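
a rough sketch of why UTF-8 makes the answer to (3) "no", written
against present-day Python 3 rather than the unicode class described
here (so treat it as an analogy, not as that class's code).  the
helper names to_utf8 and to_latin1 are hypothetical:

```python
def to_utf8(text):
    """Encode text as UTF-8; every ordinary character has an encoding."""
    return text.encode("utf-8")

def to_latin1(text):
    """Encode text as Latin-1; fails for characters above U+00FF."""
    return text.encode("latin-1")

# ASCII, a Latin-1 character (a-ring), and a character outside Latin-1 (euro sign).
samples = ["abc", "\u00e5ttabitars", "\u20ac"]

# UTF-8 handles all three samples without raising.
utf8_results = [to_utf8(s) for s in samples]

# Latin-1 raises for the sample it cannot represent.
latin1_failures = []
for s in samples:
    try:
        to_latin1(s)
    except UnicodeEncodeError:
        latin1_failures.append(s)

print(utf8_results)
print(latin1_failures)
```

with a Latin-1 conversion, the same code would work on some input
and raise on other input, which is the kind of failure (4) is about.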

if (1) is a reasonable solution (I think it is), I think the
answer to (2) should be no, based on the rule of least
surprise.  Python has always required me to explicitly
state when I want to convert things in a way that may
radically change their meaning.  I see little reason to
abandon that in 1.6.
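
a small sketch of what "explicit" could look like, again in
present-day Python 3, where str.join refuses to convert byte strings
on its own.  the helper join_text and its encoding parameter are
hypothetical names for this example, not an existing API:

```python
def join_text(separator, items, encoding="utf-8"):
    """Hypothetical helper: decode byte strings explicitly, then join."""
    decoded = [
        item.decode(encoding) if isinstance(item, bytes) else item
        for item in items
    ]
    return separator.join(decoded)

mixed = ["caf\u00e9", b"au", "lait"]

# join does not convert its arguments; mixing the two types is an error.
try:
    ", ".join(mixed)
    implicit_ok = True
except TypeError:
    implicit_ok = False

# The caller states the conversion (and the encoding) explicitly.
joined = join_text(", ", mixed)

print(implicit_ok, joined)
```

the point of the design is that the encoding is named at the call
site, so nothing changes meaning behind the caller's back.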

</F>




