[Python-Dev] Re: [Python-checkins] python/dist/src/Objects unicodeobject.c, 2.219, 2.220

M.-A. Lemburg mal at egenix.com
Fri Aug 27 16:47:49 CEST 2004


Tim Peters wrote:
> [M.-A. Lemburg]
> 
>>Hmm, you've now made PyUnicode_Join() work with iterators
>>whereas PyString_Join() only works for sequences.
> 
> 
> They have both worked with iterators since the release in which
> iterators were introduced.  Nothing changed now in this respect.
> 
>
>>What are the performance implications of this for PyUnicode_Join() ?
> 
> 
> None.
> 
> 
>>Since the string and Unicode implementations have to be in sync,
>>we'd also need to convert PyString_Join() to work on iterators.
> 
> 
> It already does.  I replied earlier this week on the same topic --
> maybe you didn't see that, or maybe you misunderstand what
> PySequence_Fast does.

Indeed. At the time Fredrik added this API, it was optimized
for lists and tuples and had a fallback mechanism for arbitrary
sequences. I didn't know that it now also works for iterators. Nice!
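
A rough Python model of the behaviour Tim describes may help here. It assumes the C helper's logic is: return exact lists and tuples unchanged, otherwise drain the argument once into a new tuple. The name `sequence_fast` is hypothetical (the real PySequence_Fast is a C API function), and later CPython versions may materialize into a list rather than a tuple; the tuple follows the description given in this thread.

```python
def sequence_fast(obj):
    """Hypothetical Python model of the C helper PySequence_Fast.

    Exact list/tuple objects are returned as-is (no copy); any other
    iterable is consumed exactly once and stored in a new tuple.
    """
    if type(obj) is list or type(obj) is tuple:
        return obj
    return tuple(obj)  # raises TypeError if obj is not iterable


data = [b"a", b"b"]
print(sequence_fast(data) is data)   # True: fast path, no copy made
gen = (piece for piece in data)
print(sequence_fast(gen))            # (b'a', b'b')
print(list(gen))                     # []: the generator was drained once
```

Note that subclasses of list or tuple do not take the fast path in this sketch, mirroring the "exact list or tuple" wording below.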

>>Which brings up the second question:
>>What are the performance implications of this for PyString_Join() ?
> 
> 
> None.
> 
> 
>>The join operation is a widely used method, so both implementations
>>need to be as fast as possible. It may be worthwhile making the
>>PySequence_Fast() approach a special case in both routines and
>>using the iterator approach as fallback if no sequence is found.
> 
> 
> string_join uses PySequence_Fast already; the Unicode join didn't, and
> still doesn't.  In the cases of exact list or tuple arguments,
> PySequence_Fast would be quicker in Unicode join.  But in all other
> cases, PySequence_Fast materializes a concrete tuple
> containing the full materialized iteration, so could be more
> memory-consuming.  That's probably a good tradeoff, though.

Indeed. I'd opt for going the PySequence_Fast() way
for Unicode as well.
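
To illustrate why the materialization matters, here is a sketch of a two-pass join in the spirit of the string join discussed above: the first pass checks item types and sums lengths, the second copies into a preallocated buffer. This is an illustrative Python 3 model using bytes, not the actual C code, and the name `two_pass_join` is hypothetical. Without the single materialization step, a generator argument would already be exhausted by the second pass.

```python
def two_pass_join(sep: bytes, iterable) -> bytes:
    """Hypothetical model of a size-first, copy-second join."""
    # Materialize once (the PySequence_Fast idea); exact lists/tuples are reused.
    seq = iterable if type(iterable) in (list, tuple) else tuple(iterable)
    if not seq:
        return b""
    # Pass 1: validate item types and compute the exact output size.
    total = len(sep) * (len(seq) - 1)
    for item in seq:
        if not isinstance(item, bytes):
            raise TypeError(f"expected bytes, got {type(item).__name__}")
        total += len(item)
    # Pass 2: copy separator and pieces into a preallocated buffer.
    out = bytearray(total)
    pos = 0
    for i, item in enumerate(seq):
        if i:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
        out[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(out)


print(two_pass_join(b" ", (p for p in [b"a", b"b", b"c"])))  # b'a b c'
```

The cost Tim mentions is visible here: for a generator, `seq` holds every piece in memory at once, in exchange for a single allocation of the result.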

>>Note that PyString_Join() with iterator support will also
>>have to be careful about not trying to iterate twice,
> 
> 
> It already is.  Indeed, the primary reason it uses PySequence_Fast is
> to guarantee that it never iterates over an iterator argument more
> than once.  The Unicode join doesn't have that potential problem.
> 
> 
>>so it will have to use similar logic to the one applied
>>in PyString_Format() where the work already done up to the
>>point where it finds a Unicode string is reused when calling
>>PyUnicode_Format().
> 
> 
> >>> def g():
> ...     for piece in 'a', 'b', u'c', 'd': # force Unicode promotion on 3rd yield
> ...         yield piece
> ...
> >>> ' '.join(g())
> u'a b c d'

Nice :-)
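
The promotion in Tim's example can be modeled in Python 3, with bytes standing in for the 2.x str type and str standing in for unicode. This is an assumption-laden sketch: the function name `join_with_promotion` is hypothetical, and the ASCII decoding is an assumption mimicking the 2.x default encoding. What it shows is the point raised above: the input is materialized once, and on meeting a text item the same materialized sequence is handed to the text join, so a generator is never iterated twice.

```python
def join_with_promotion(sep, iterable):
    """Hypothetical model of 2.x str.join promoting its result to unicode.

    bytes plays the role of the 2.x str type, str the role of unicode.
    """
    # Materialize exactly once; everything below reuses `seq`.
    seq = iterable if type(iterable) in (list, tuple) else tuple(iterable)
    for item in seq:
        if isinstance(item, str):
            # Found a text item: hand the already materialized sequence to
            # the text join, decoding bytes pieces as ASCII (assumed codec).
            return sep.decode("ascii").join(
                piece.decode("ascii") if isinstance(piece, bytes) else piece
                for piece in seq
            )
    return sep.join(seq)


def g():
    for piece in b"a", b"b", "c", b"d":  # text item forces promotion on 3rd yield
        yield piece


print(join_with_promotion(b" ", g()))  # a b c d
```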

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 27 2004)

