[Python-Dev] Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220
M.-A. Lemburg
mal at egenix.com
Fri Aug 27 16:47:49 CEST 2004
Tim Peters wrote:
> [M.-A. Lemburg]
>
>>Hmm, you've now made PyUnicode_Join() work with iterators,
>>whereas PyString_Join() only works for sequences.
>
>
> They have both worked with iterators since the release in which
> iterators were introduced. Nothing changed now in this respect.
>
>
>>What are the performance implications of this for PyUnicode_Join() ?
>
>
> None.
>
>
>>Since the string and Unicode implementations have to be in sync,
>>we'd also need to convert PyString_Join() to work on iterators.
>
>
> It already does. I replied earlier this week on the same topic --
> maybe you didn't see that, or maybe you misunderstand what
> PySequence_Fast does.
Indeed. At the time Fredrik added this API, it was optimized
for lists and tuples and had a fallback mechanism for arbitrary
sequences. I didn't know that it now also works for iterators. Nice!
>>Which brings up the second question:
>>What are the performance implications of this for PyString_Join() ?
>
>
> None.
>
>
>>The join operation is a widely used method, so both implementations
>>need to be as fast as possible. It may be worthwhile making the
>>PySequence_Fast() approach a special case in both routines and
>>using the iterator approach as fallback if no sequence is found.
>
>
> string_join uses PySequence_Fast already; the Unicode join didn't, and
> still doesn't. In the cases of exact list or tuple arguments,
> PySequence_Fast would be quicker in Unicode join. But in any cases
> other than those, PySequence_Fast materializes a concrete tuple
> containing the full materialized iteration, so could be more
> memory-consuming. That's probably a good tradeoff, though.
Indeed. I'd opt for going the PySequence_Fast() way
for Unicode as well.
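
To make the tradeoff concrete, here is a minimal Python sketch of the
behaviour Tim describes for PySequence_Fast. The name sequence_fast is
hypothetical; it only mimics what is said in this thread (exact lists and
tuples pass through untouched, anything else is materialized into a tuple)
and is not the C implementation.

```python
def sequence_fast(obj):
    """Hypothetical Python analogue of PySequence_Fast, as described above."""
    # Exact list or tuple: hand back the very same object, no copy made.
    if type(obj) in (list, tuple):
        return obj
    # Any other iterable (generator, iterator, custom sequence) is consumed
    # once and stored in a new tuple, which costs extra memory.
    return tuple(obj)


pieces = ['a', 'b', 'c']
same = sequence_fast(pieces)                      # fast path, same list
copied = sequence_fast(p for p in 'abc')          # generator is materialized
```

The fast path costs nothing for the common list/tuple case, while the
fallback trades memory for the ability to index the items repeatedly.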
>>Note that PyString_Join() with iterator support will also
>>have to be careful about not trying to iterate twice,
>
>
> It already is. Indeed, the primary reason it uses PySequence_Fast is
> to guarantee that it never iterates over an iterator argument more
> than once. The Unicode join doesn't have that potential problem.
>
>
>>so it will have to use logic similar to that applied
>>in PyString_Format() where the work already done up to the
>>point where it finds a Unicode string is reused when calling
>>PyUnicode_Format().
>
>
> >>> def g():
> ...     for piece in 'a', 'b', u'c', 'd':  # force Unicode promotion on 3rd yield
> ...         yield piece
> ...
> >>> ' '.join(g())
> u'a b c d'
Nice :-)
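
As a rough illustration of why iterating only once matters, here is a toy
join in Python that, by assumption, sizes its result in one pass and fills
it in a second. The function join_two_pass is hypothetical and works on
bytes objects only; it is a sketch of the idea, not the code in
stringobject.c. Without the materialization step, the second pass would
find a generator already exhausted.

```python
def join_two_pass(sep, iterable):
    """Toy two-pass join over bytes pieces (hypothetical, for illustration)."""
    # Materialize first so that a one-shot iterator is consumed exactly once.
    if type(iterable) in (list, tuple):
        items = iterable
    else:
        items = tuple(iterable)

    # Pass 1: compute the total size of the result.
    total = sum(len(piece) for piece in items)
    total += len(sep) * max(len(items) - 1, 0)

    # Pass 2: copy separators and pieces into a preallocated buffer.
    buf = bytearray(total)
    pos = 0
    for index, piece in enumerate(items):
        if index:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    return bytes(buf)


def g():
    for piece in b'a', b'b', b'c', b'd':
        yield piece


result = join_two_pass(b' ', g())
```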
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Aug 27 2004)