[issue26057] Avoid nonneeded use of PyUnicode_FromObject()

Serhiy Storchaka report at bugs.python.org
Sat Jan 9 03:41:42 EST 2016


New submission from Serhiy Storchaka:

In Python 2, PyUnicode_FromObject() was used to coerce 8-bit strings to unicode by decoding them with the default encoding. In Python 3 there is no such coercion. The only effect of PyUnicode_FromObject() in Python 3 is to ensure that the argument is a string and to convert an instance of a str subtype to an exact str. The latter is often just a waste of memory and time, since the resulting string is used only for retrieving its UTF-8 representation or raw data.
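The behaviour described above can be observed from Python itself. The following is only a sketch and assumes CPython, where ctypes.pythonapi gives access to the C API; the class S and the variable names are made up for illustration and are not part of the patch.

```python
import ctypes

# Assumption: CPython only. ctypes.pythonapi exposes the interpreter's C API.
from_object = ctypes.pythonapi.PyUnicode_FromObject
from_object.argtypes = [ctypes.py_object]
from_object.restype = ctypes.py_object


class S(str):
    """Hypothetical str subtype, used only for this illustration."""


exact = "spam" * 10
sub = S(exact)

# An exact str is returned as-is (just a new reference to the same object).
same = from_object(exact)
print(same is exact)

# An instance of a str subtype is copied into a new exact str.
copy = from_object(sub)
print(type(copy) is str, copy is sub, copy == sub)

# Anything that is not a str (or a subtype of str) is rejected.
try:
    from_object(b"spam")
except TypeError as exc:
    print("TypeError:", exc)
```

The copy made for the subtype instance is the cost that the patch aims to avoid where only the string's data is needed.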

The proposed patch does the following:

1. Avoids unneeded copying of string's content.
2. Avoids raising some unneeded exceptions.
3. Gets rid of unneeded incref/decref.
4. Makes some error messages more correct or informative.
5. Converts runtime PyBytes_Check() checks on the results of string encoding into asserts.

Example of performance gain:

Unpatched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.404 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.723 usec per loop

Patched:
$ ./python -m timeit -s "a = 'a'*100; b = 'b'*1000" -- "a in b"
1000000 loops, best of 3: 0.383 usec per loop
$ ./python -m timeit -s "class S(str): pass" -s "a = S('a'*100); b = S('b'*1000)" -- "a in b"
1000000 loops, best of 3: 0.387 usec per loop
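
The same comparison can be run from a script with the timeit module instead of the command line. This is a rough sketch, not part of the patch: the helper name best_usec and the loop counts are arbitrary choices, and the absolute numbers depend on the machine and the interpreter build.

```python
import timeit

# Same setups as in the command-line runs above.
SETUP_EXACT = "a = 'a'*100; b = 'b'*1000"
SETUP_SUBTYPE = "class S(str): pass\na = S('a'*100); b = S('b'*1000)"


def best_usec(setup, stmt="a in b", number=100000, repeat=3):
    """Return the best time per loop in microseconds (hypothetical helper)."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e6


exact_usec = best_usec(SETUP_EXACT)
subtype_usec = best_usec(SETUP_SUBTYPE)

print("exact str:   %.3f usec per loop" % exact_usec)
print("str subtype: %.3f usec per loop" % subtype_usec)
```

Running it on an unpatched and on a patched build should show whether the gap between the two cases narrows, as in the figures above.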

----------
components: Interpreter Core
files: no_unicode_copy.patch
keywords: patch
messages: 257806
nosy: haypo, ncoghlan, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Avoid nonneeded use of PyUnicode_FromObject()
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file41541/no_unicode_copy.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26057>
_______________________________________

