[Python-Dev] Re: Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220
Tim Peters
tim.peters at gmail.com
Fri Aug 27 16:30:27 CEST 2004
[M.-A. Lemburg]
> Hmm, you've now made PyUnicode_Join() to work with iterators
> whereas PyString_Join() only works for sequences.
They have both worked with iterators since the release in which
iterators were introduced. Nothing changed now in this respect.
> What are the performance implications of this for PyUnicode_Join() ?
None.
> Since the string and Unicode implementations have to be in sync,
> we'd also need to convert PyString_Join() to work on iterators.
It already does. I replied earlier this week on the same topic --
maybe you didn't see that, or maybe you misunderstand what
PySequence_Fast does.
> Which brings up the second question:
> What are the performance implications of this for PyString_Join() ?
None.
> The join operation is a widely used method, so both implementations
> need to be as fast as possible. It may be worthwhile making the
> PySequence_Fast() approach a special case in both routines and
> using the iterator approach as fallback if no sequence is found.
string_join uses PySequence_Fast already; the Unicode join didn't, and
still doesn't. For exact list or tuple arguments, PySequence_Fast
would be quicker in the Unicode join. In every other case,
PySequence_Fast materializes a concrete tuple holding the entire
iteration, so it could consume more memory. That's probably a good
tradeoff, though.
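
A rough Python model of the behaviour just described may make the tradeoff
concrete. It assumes, as stated above, that an exact list or tuple is handed
back unchanged and that anything else is iterated once into a new tuple. The
name sequence_fast is hypothetical; the real PySequence_Fast is a C API
function and this sketch is not its implementation.

```python
def sequence_fast(obj):
    """Hypothetical Python stand-in for the C function PySequence_Fast."""
    # Exact list or tuple: hand back the very same object, no copy made.
    if type(obj) is list or type(obj) is tuple:
        return obj
    # Any other iterable: run the iteration once and keep every item.
    return tuple(obj)


items = ['a', 'b', 'c']
same = sequence_fast(items)          # the original list, no extra memory
copied = sequence_fast(iter(items))  # a new tuple holding all the items
```

The second call is where the extra memory goes: the whole iteration is held
in the tuple before any joining starts.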
> Note that PyString_Join() with iterator support will also
> have to be careful about not trying to iterate twice,
It already is. Indeed, the primary reason it uses PySequence_Fast is
to guarantee that it never iterates over an iterator argument more
than once. The Unicode join doesn't have that potential problem.
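
Why a single iteration matters can be sketched in Python, assuming a join
that needs two passes over its items: one to compute the result size and one
to copy the pieces. The function two_pass_join is hypothetical and only
models the idea; it is not string_join's actual code.

```python
def two_pass_join(sep, iterable):
    """Hypothetical model: materialize once, then make two passes."""
    # Materialize first so both passes see the same items
    # (a stand-in for what PySequence_Fast provides).
    if type(iterable) in (list, tuple):
        seq = iterable
    else:
        seq = tuple(iterable)

    # Pass 1: compute the total size of the result.
    total = sum(len(s) for s in seq) + len(sep) * max(len(seq) - 1, 0)

    # Pass 2: copy the pieces into place, separator between items.
    out = []
    for i, s in enumerate(seq):
        if i:
            out.append(sep)
        out.append(s)
    result = ''.join(out)
    assert len(result) == total  # both passes agreed on the size
    return result


gen = (p for p in ('a', 'b', 'c'))
joined = two_pass_join(' ', gen)

# By contrast, a second pass over the raw generator finds it exhausted.
leftover = list(gen)
```

Had the two passes run over the generator directly, the second pass would
have seen nothing, which is the problem the materialization step avoids.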
> so it will have to use a similar logic to the one applied
> in PyString_Format() where the work already done up to the
> point where it finds a Unicode string is reused when calling
> PyUnicode_Format().
>>> def g():
...     for piece in 'a', 'b', u'c', 'd':  # force Unicode promotion on 3rd yield
...         yield piece
...
>>> ' '.join(g())
u'a b c d'
>>>
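
The session above can be modelled in a rough sketch, under loud assumptions:
in current Python there is no implicit promotion from byte strings to
Unicode, so bytes and str stand in for the 8-bit and Unicode string types
here. The names narrow_join and wide_join are hypothetical. The point
illustrated is only that the generator is consumed once and the already
materialized items are handed over when a Unicode piece turns up.

```python
def wide_join(sep, seq):
    """Hypothetical Unicode-side join over already materialized items."""
    return sep.join(s.decode('ascii') if isinstance(s, bytes) else s
                    for s in seq)


def narrow_join(sep, iterable):
    """Hypothetical 8-bit join that promotes when it meets a str item."""
    seq = tuple(iterable)  # the iterable is consumed exactly once, here
    for item in seq:
        if isinstance(item, str):
            # Promotion: reuse the materialized tuple, do not re-iterate.
            return wide_join(sep.decode('ascii'), seq)
    return sep.join(seq)


def g():
    for piece in b'a', b'b', 'c', b'd':  # force promotion on 3rd yield
        yield piece


result = narrow_join(b' ', g())
```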