[issue13570] Expose faster unicode<->ascii functions in the C-API

Fri Dec 9 22:12:32 CET 2011

New submission from Stefan Krah <stefan-usenet at bytereef.org>:

I just ran the telco benchmark ...

  http://www.bytereef.org/mpdecimal/quickstart.html#telco-benchmark

... on _decimal to see how the PEP-393 changes affect the module.
The benchmark reads numbers from a binary file, does some calculations
and prints the result strings to a file.

Average results (10 iterations each):

Python 2.7:            5.87s
Revision 1726fa560112: 6.07s
Revision 7ffe3d304487: 6.56s

The bottleneck in telco.py is the line that writes a Decimal to the
output file:

  outfil.write("%s\n" % t)

The bottleneck in _decimal is this call (res is an ASCII string):

   PyUnicode_FromString(res);

PyUnicode_DecodeASCII(res) has the same performance.

With this function ...

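/* The caller must guarantee that s holds only ASCII bytes (< 128);
   no validation is done here. */
static PyObject*
unicode_fromascii(const char* s, Py_ssize_t size)
{
    PyObject *res;
    /* maxchar 127 selects the compact ASCII representation */
    res = PyUnicode_New(size, 127);
    if (!res)
        return NULL;
    memcpy(PyUnicode_1BYTE_DATA(res), s, size);
    return res;
}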

... I get the same performance as with Python 2.7 (5.85s)!
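
Since the function above skips validation, its safety depends on the
caller's promise that the buffer is pure ASCII. A minimal sketch of how
that precondition could be checked in debug builds only follows. The
names buffer_is_ascii and copy_ascii_unchecked are hypothetical and not
part of the CPython C-API; the second one is a plain-C stand-in for the
memcpy fast path so that the sketch compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if every byte in s[0..size) is
   7-bit ASCII, 0 otherwise. */
static int
buffer_is_ascii(const char *s, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++) {
        if ((unsigned char)s[i] > 127)
            return 0;
    }
    return 1;
}

/* Hypothetical stand-in for the fast path: the precondition is
   asserted in debug builds (compiled out with -DNDEBUG), then the
   bytes are copied with no further checks. */
static void
copy_ascii_unchecked(char *dst, const char *s, size_t size)
{
    assert(buffer_is_ascii(s, size));
    memcpy(dst, s, size);
}
```

In unicode_fromascii the same assert could sit directly before the
memcpy, so release builds keep the unchecked speed while debug builds
catch a caller that passes non-ASCII data.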

I think it would be really beneficial for C-API users to have
more low-level ASCII functions that skip error checking and
are simply as fast as possible.

----------
components: Unicode
messages: 149124
nosy: ezio.melotti, haypo, loewis, skrah
priority: normal
severity: normal
status: open
title: Expose faster unicode<->ascii functions in the C-API
type: performance
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13570>
_______________________________________