[Python-Dev] Auto-str and auto-unicode in join
Tim Peters
tim.peters at gmail.com
Sun Aug 29 03:51:48 CEST 2004
If we were to do auto-str, it would be better to rewrite str.join() as
a 1-pass algorithm, using the kind of "double allocated space as
needed" gimmick unicode.join uses. It would be less efficient if
auto-promotion to Unicode turns out to be required, but it's hard to
measure how little I care about that; it might even be faster when
neither auto-str nor Unicode promotion is needed (as only 1 pass over
the sequence would be made).
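A rough Python model of the "double allocated space as needed" approach may help. It is a sketch, not the unicode.join C code: `GrowableBuffer` is a hypothetical name, and the starting capacity of 100 bytes simply mirrors the "(say) 100 bytes" in the outline further down. The point is that doubling keeps the number of reallocations logarithmic in the output size, so one pass suffices without pre-measuring every piece.

```python
class GrowableBuffer:
    """Hypothetical byte buffer that doubles its capacity on overflow."""

    def __init__(self, initial_capacity: int = 100) -> None:
        self._data = bytearray(initial_capacity)  # preallocated space
        self.used = 0            # bytes actually written so far
        self.reallocations = 0   # how many times we had to grow

    @property
    def capacity(self) -> int:
        return len(self._data)

    def append(self, piece: bytes) -> None:
        needed = self.used + len(piece)
        if needed > self.capacity:
            # Double (at least once) until the new piece fits.
            new_capacity = max(self.capacity, 1)
            while new_capacity < needed:
                new_capacity *= 2
            self._data.extend(bytes(new_capacity - self.capacity))
            self.reallocations += 1
        self._data[self.used:needed] = piece  # copy the piece's guts
        self.used = needed

    def getvalue(self) -> bytes:
        # "Cut back" to the space actually used.
        return bytes(self._data[:self.used])
```

Appending 1000 one-byte pieces starting from 100 bytes grows the buffer only four times (100, 200, 400, 800, 1600).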
auto-str couldn't really *mean* string.join(map(str, seq)) either.
The problem with the latter is that if a seq element x is a unicode
instance, str(x) will convert it into an encoded (8-bit) str, which
would not be backward compatible. So the logic would be more like
this (in outline):
    class string:
        def join(self, seq):
            seq = PySequence_Fast(seq)
            if seq is NULL:
                return NULL
            if len(seq) == 0:
                return ""
            elif len(seq) == 1 and type(seq[0]) is str:
                return seq[0]
            allocate a string object with (say) 100 bytes of space
            let p point to the first free byte
            for x in seq:
                if type(x) is str:
                    copy x's guts into p, getting more space if needed
                elif isinstance(x, unicode):
                    return unicode.join(self, seq)
                else:
                    x = PyObject_Str(x)
                    if x is NULL:
                        return NULL
                    copy x's guts into p, etc
                if not the last element:
                    copy the separator's guts into p, etc
            cut p back to the space actually used
            return p's string object
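For concreteness, here is a runnable Python 3 sketch of the same control flow. It rests on stated assumptions: Python 3 `bytes` stands in for the 2004 8-bit `str`, Python 3 `str` stands in for `unicode`, ASCII stands in for the default encoding, and `auto_str_join` / `_unicode_join` are hypothetical names. The unicode fallback also assumes auto-unicode behavior for non-string items, which the outline leaves open. `bytearray`'s own over-allocation takes the place of the manual space doubling.

```python
def _unicode_join(sep: bytes, items) -> str:
    # Hypothetical "unicode.join" fallback: decode 8-bit pieces as ASCII,
    # apply str() to everything else, and join in the text domain.
    def as_text(y):
        return y.decode("ascii") if isinstance(y, bytes) else str(y)
    return sep.decode("ascii").join(as_text(y) for y in items)


def auto_str_join(sep: bytes, seq):
    """Hypothetical one-pass str.join with auto-str (Python 3 model)."""
    items = list(seq)  # plays the role of PySequence_Fast
    if not items:
        return b""
    if len(items) == 1 and type(items[0]) is bytes:
        return items[0]  # reuse the lone element unchanged
    buf = bytearray()  # grows (with over-allocation) as pieces arrive
    last = len(items) - 1
    for i, x in enumerate(items):
        if isinstance(x, bytes):
            piece = x
        elif isinstance(x, str):
            # A "unicode" element: abandon the 8-bit result and redo
            # the whole join as text.
            return _unicode_join(sep, items)
        else:
            # PyObject_Str analogue: text from str() is encoded as ASCII.
            piece = str(x).encode("ascii")
        buf += piece
        if i != last:
            buf += sep
    return bytes(buf)
```

For example, `auto_str_join(b", ", [b"a", 1, 2.5])` gives `b"a, 1, 2.5"`, while a text element anywhere in the sequence switches the whole result to text.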
Note a peculiarity: if x is neither str nor unicode, but has a
__str__ or __repr__ method that returns a unicode object,
PyObject_Str() will convert that into an 8-bit str (via the default
encoding, so non-ASCII text can raise UnicodeEncodeError). That may be
surprising. It would be ugly to duplicate most of the logic from
PyObject_Unicode() to try to guess whether there's "a natural" Unicode
spelling of x. I think I'd rather say "tough luck -- use unicode.join
if that's what you want".
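The peculiarity can be modelled in Python 3 under the same assumptions as before (`bytes` for the old 8-bit `str`, `str` for `unicode`, ASCII as the default encoding). `Greeting`, `pyobject_str_analogue`, and `text_join` are hypothetical names used only for illustration: an object whose `__str__` returns text is silently squeezed into 8 bits, and non-ASCII text fails outright, whereas joining in the text domain (the "use unicode.join" route) keeps it intact.

```python
class Greeting:
    """Hypothetical object whose __str__ returns text ("unicode")."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __str__(self) -> str:
        return self.text


def pyobject_str_analogue(x) -> bytes:
    # Rough model of the 2004 PyObject_Str: a text result from __str__
    # is encoded to 8 bits with the (assumed ASCII) default encoding.
    return str(x).encode("ascii")


def text_join(sep: str, seq) -> str:
    # The "use unicode.join" escape hatch: stay in the text domain.
    return sep.join(str(x) for x in seq)
```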