[Python-Dev] Auto-str and auto-unicode in join
M.-A. Lemburg
mal at egenix.com
Fri Aug 27 11:02:05 CEST 2004
Nick Coghlan wrote:
> Tim Peters wrote:
>
>> I needed a break from intractable database problems, and am almost
>> done with PyUnicode_Join(). I'm not doing auto-unicode(), though, so
>> there will still be plenty of fun left for Nick!
>
>
> I actually got that mostly working (off slightly out-of-date CVS though).
>
> Joining a sequence of 10 integers with auto-str seems to take about 60%
> of the time of a str(x) list comprehension on that same sequence (and
> the PySequence_Fast call means that a generator is slightly slower than
> a list comp!). For a sequence which mixed strings and non-strings, the
> gains could only increase.
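For anyone wanting to repeat the comparison between the list comprehension and the generator, a minimal timing sketch follows. It assumes a current Python 3 interpreter, where join has no auto-str, so it only measures the two explicit str(x) spellings; the function names and the repeat count are illustrative and not from the original measurements.

```python
import timeit

numbers = list(range(10))  # a sequence of 10 integers, as in the post

def join_list_comp():
    # join receives a ready-made list
    return "".join([str(x) for x in numbers])

def join_generator():
    # join has to materialise the generator before it can size the result
    return "".join(str(x) for x in numbers)

list_time = timeit.timeit(join_list_comp, number=20000)
gen_time = timeit.timeit(join_generator, number=20000)
print("list comp: %.4fs  generator: %.4fs" % (list_time, gen_time))
```

The absolute numbers depend on the machine and interpreter version, so only the ratio between the two is worth reading.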
>
> However, there is one somewhat curly problem I'm not sure what to do about.
>
> To avoid slowing down the common case of string join (a list of only
> strings) it is necessary to do the promotion to string in the type-check
> & size-calculation pass.
>
> That's fine in the case of a list that consists of only strings and
> non-basestrings, or the case of a unicode separator - every
> non-basestring is converted using either PyObject_Str or PyObject_Unicode.
>
> Where it gets weird is something like this:
> ''.join([an_int, a_unicode_str])
> u''.join([an_int, a_unicode_str])

This gives you a TypeError, so it's a non-issue (.join() does
not do an implicit call to str(obj) on the list elements).

The real issue is the case where you have [a_str, a_unicode_obj]
and for that the current implementation already does the right
thing, namely to look for Unicode objects in the length checking pass.

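The TypeError is easy to check. The sketch below assumes a current Python 3 interpreter, where str is already the Unicode type, so it shows only that join makes no implicit str() call; it cannot reproduce the 2004 str/unicode hand-over. try_join is a hypothetical helper written for this example.

```python
def try_join(sep, items):
    """Return the joined string, or the name of the exception join raised."""
    try:
        return sep.join(items)
    except TypeError as exc:
        return type(exc).__name__

mixed = [42, "abc"]

# join does not convert the int for us
result_plain = try_join("", mixed)

# the caller has to do the conversion explicitly
result_explicit = "".join(str(x) for x in mixed)
```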
> In the first case, the int will first be converted to a string via
> PyObject_Str, and then that string representation is what will get
> converted to Unicode after the detection of the unicode string causes
> the join to be handed over to Unicode join.
>
> In the latter case, the int is converted directly to Unicode.
>
> So my question would be, is it reasonable to expect that
> PyObject_Unicode(PyObject_Str(some_object)) give the same answer as
> PyObject_Unicode(some_object)?
>
> If not, then the string join would have to do something whereby it kept
> a 'pristine' version of the sequence around to hand over to the Unicode
> join.
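The question can be made concrete with a small simulation. It runs on Python 3, so to_str and to_unicode are stand-ins written for this example rather than the real PyObject_Str and PyObject_Unicode, and it assumes that the Unicode conversion prefers an object's __unicode__ method when one is defined. Temperature is a hypothetical type chosen so that the two conversions disagree.

```python
class Temperature:
    """Hypothetical type whose two text conversions disagree."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "%d deg C" % self.degrees

    def __unicode__(self):
        # Python 2 style hook; Python 3 itself never calls this
        return "%d\u00b0C" % self.degrees


def to_str(obj):
    # stand-in for PyObject_Str
    return str(obj)


def to_unicode(obj):
    # stand-in for PyObject_Unicode: use __unicode__ if the type has one,
    # otherwise fall back to the str conversion
    hook = getattr(type(obj), "__unicode__", None)
    if hook is not None:
        return hook(obj)
    return str(obj)


t = Temperature(21)
via_str = to_unicode(to_str(t))  # the str form is what gets converted
direct = to_unicode(t)           # __unicode__ is consulted directly
```

Under these assumptions the two routes agree for an int but not for a type like Temperature, which is the case where a pristine copy of the sequence would matter.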
>
> My first attempt at implementing this feature had that property, but
> also had the effect of introducing about a 1% slowdown of the standard
> sequence-of-strings case (it introduced an extra if statement to see if
> a 'stringisation' pass was required after the initial type checking and
> sizing pass). For longer sequences than 10 strings, I imagine the
> relative slowdown would be much less.
>
> Hmm. . . I think I see a way to implement this, while still avoiding
> adding any code to the standard path through the function. It'd be
> slower for the case where an iterator is passed in, and we automatically
> invoke PyObject_Str but don't end up delegating to Unicode join, though,
> as it involves making a copy of the sequence that only gets used if the
> Unicode join is invoked. (If the original object is a real sequence,
> rather than an iterator, there is no extra overhead - we have to make
> the copy anyway, to avoid mutating the user's sequence).
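A rough Python rendering of that idea, as a sketch only: the real change would be in C inside the string join, and auto_str_join is a hypothetical name. The untouched sequence is kept as is, and a converted copy is made only once the first non-string turns up, so the all-strings path does no extra copying and the caller's sequence is never mutated.

```python
def auto_str_join(sep, iterable):
    # Materialise the input once. (PySequence_Fast would reuse an existing
    # list or tuple; list() always copies, which is fine for a sketch.)
    pristine = list(iterable)
    converted = None
    for i, item in enumerate(pristine):
        if not isinstance(item, str):
            if converted is None:
                # first non-string seen: copy now, leave pristine untouched
                converted = list(pristine)
            converted[i] = str(item)
    # all-strings input never pays for the copy or the conversions
    return sep.join(pristine if converted is None else converted)


data = [1, "a", 2.5]
joined = auto_str_join("-", data)
```

In the 2004 design the pristine sequence is what would be handed to the Unicode join; Python 3 has no such hand-over, so here it only serves to show that the original items survive.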
>
> If people are definitely interested in this feature, I could probably
> put a patch together next week.
>
> Regards,
> Nick.
--
Marc-Andre Lemburg
eGenix.com