Re: [Cython] [cython] Initial startswith / endswith optimization (#35)
On Wed, May 25, 2011 at 12:37 PM, jpe <reply+i-954759-90be1c778e144f2c17b3665667d3d62b01062479@reply.github.com> wrote:
This optimizes startswith / endswith for str.
Cool.
What's unclear to me is how str will be mapped to either bytes or unicode; I assume at some point cython will have a python3 syntax mode where str is unicode, print is a function, etc (if it doesn't have one already). Should I be using the type name bytes instead of str?
I'm glad you're thinking about this question, some explanation of the various string types is at http://wiki.cython.org/enhancements/stringliterals
Probably the way to do this is have one optimization for bytes, one for unicode, and then have a third type for str that dispatches to the one or the other depending on the python version (using #define). - Robert
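A rough illustration of the dispatch idea above, written in plain Python rather than C: the choice between a bytes and a unicode implementation is made once, based on the Python major version, much as a #define would select it at compile time. The helper names below are hypothetical and are not Cython's actual code.

```python
import sys


def bytes_startswith(s, prefix):
    # Hypothetical stand-in for a bytes-specific fast path.
    return s[:len(prefix)] == prefix


def unicode_startswith(s, prefix):
    # Hypothetical stand-in for a unicode-specific fast path.
    return s[:len(prefix)] == prefix


# Selected once at import time, analogous to a version check done
# with #define when the generated C code is compiled.
if sys.version_info[0] >= 3:
    str_startswith = unicode_startswith   # "str" is unicode on Python 3
else:
    str_startswith = bytes_startswith     # "str" is bytes on Python 2
```

Code that is typed as "str" would then call str_startswith and get whichever implementation matches the running Python version.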
On 5/25/11 3:51 PM, Robert Bradshaw wrote:
I'm glad you're thinking about this question, some explanation of the various string types is at http://wiki.cython.org/enhancements/stringliterals
Probably the way to do this is have one optimization for bytes, one for unicode, and then have a third type for str that dispatches to the one or the other depending on the python version (using #define).
I think this means that the current unicode optimizations aren't used when variables are declared as str and a python 3 runtime is used. Should all unicode optimizations support str eventually? Thanks, John
On Wed, May 25, 2011 at 1:41 PM, John Ehresman <jpe@wingware.com> wrote:
I think this means that the current unicode optimizations aren't used when variables are declared as str and a python 3 runtime is used. Should all unicode optimizations support str eventually?
Yes.
Robert Bradshaw, 25.05.2011 22:52:
On Wed, May 25, 2011 at 1:41 PM, John Ehresman wrote:
I think this means that the current unicode optimizations aren't used when variables are declared as str and a python 3 runtime is used. Should all unicode optimizations support str eventually?
Yes.
Well, minus those that are not portable. For example, the return type of indexing and iteration is the C type "Py_UCS4" for unicode, but the Python type "str" (i.e. bytes/unicode) for "str". I also didn't take a thorough look through the C-API functions for the str type in Py2 and Py3. Things certainly become more ugly when trying to optimise Python code into C for both platforms, than when leaving things at the Python type level. Stefan
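The difference in result types can be observed at the Python level. The sketch below assumes a Python 3 runtime and only mirrors the point made above; it does not show how Cython types these expressions internally.

```python
text = "caf\u00e9"            # a unicode string on Python 3
data = text.encode("utf-8")   # the corresponding bytes object

# Indexing a "str" yields another str object of length 1 ...
first_char = text[0]

# ... so the integer code point has to be extracted explicitly.
code_points = [ord(ch) for ch in text]

# Indexing bytes on Python 3 yields a plain int instead.
first_byte = data[0]
```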
On Thu, May 26, 2011 at 12:27 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Robert Bradshaw, 25.05.2011 22:52:
On Wed, May 25, 2011 at 1:41 PM, John Ehresman wrote:
I think this means that the current unicode optimizations aren't used when variables are declared as str and a python 3 runtime is used. Should all unicode optimizations support str eventually?
Yes.
Well, minus those that are not portable. For example, the return type of indexing and iteration is the C type "Py_UCS4" for unicode, but the Python type "str" (i.e. bytes/unicode) for "str". I also didn't take a thorough look through the C-API functions for the str type in Py2 and Py3. Things certainly become more ugly when trying to optimise Python code into C for both platforms, than when leaving things at the Python type level.
I was referring to Python-level things like startswith. On this note, the pattern of swapping out builtin methods (and perhaps functions) with more optimized C versions is something that perhaps it would be good to be able to do more generally, rather than hard coding the list into Optimize.py. We floated such "overlay" ideas way back in the day. - Robert
Robert Bradshaw, 26.05.2011 09:40:
the pattern of swapping out builtin methods (and perhaps functions) with more optimized C versions is something that perhaps it would be good to be able to do more generally, rather than hard coding the list into Optimize.py.
Right. All that would really be needed is a way to define default values for arguments of builtin methods. Then most of the method optimisations could be moved into Builtin.py. Stefan
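One way to picture the "default values for arguments" idea is a table that maps a builtin method to a helper together with the defaults for its optional trailing arguments. This is only a sketch in plain Python: the table layout, tailmatch and call_optimised are made-up names for illustration and do not reflect how Builtin.py or Optimize.py are actually structured.

```python
import sys


def tailmatch(s, prefix, start, end):
    # Hypothetical stand-in for a C-level helper that needs every argument.
    return s.startswith(prefix, start, end)


# Hypothetical table:
# (type name, method name) -> (helper, number of arguments, trailing defaults)
OPTIMISED_METHODS = {
    ("unicode", "startswith"): (tailmatch, 3, (0, sys.maxsize)),
}


def call_optimised(type_name, method_name, obj, *args):
    helper, n_args, defaults = OPTIMISED_METHODS[(type_name, method_name)]
    missing = n_args - len(args)
    if missing < 0 or missing > len(defaults):
        raise TypeError("wrong number of arguments")
    # Fill in only the trailing arguments that the caller left out.
    filled = args + defaults[len(defaults) - missing:]
    return helper(obj, *filled)
```

With such a table, s.startswith(p) and s.startswith(p, 3) would both end up calling the same helper with a full argument list.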
On 05/26/2011 10:12 AM, Stefan Behnel wrote:
Robert Bradshaw, 26.05.2011 09:40:
the pattern of swapping out builtin methods (and perhaps functions) with more optimized C versions is something that perhaps it would be good to be able to do more generally, rather than hard coding the list into Optimize.py.
Right. All that would really be needed is a way to define default values for arguments of builtin methods. Then most of the method optimisations could be moved into Builtin.py.
BTW, the idea of the overlay stuff Robert referred to was that we could add syntax to pxd files so that the "unicode" type and its alternative method implementations could be fleshed out in a pxd file (and the same with other standard library or third-party types that are not written with Cython support in mind, but may have a C API that we want to dispatch to instead of their Python API). Dag Sverre
Dag Sverre Seljebotn, 26.05.2011 10:24:
BTW, the idea of the overlay stuff Robert referred to was that we could add syntax to pxd files so that the "unicode" type and its alternative method implementations could be fleshed out in a pxd file (and the same with other standard library or third-party types that are not written with Cython support in mind, but may have a C API that we want to dispatch to instead of their Python API).
Well, dispatching to existing C-API functions is easy and can be done in Builtin.py. Moving that to .pxd files or not is a minor detail. The problems, as in this case, are that we often need to implement our own C-API and that we need to match defaulted arguments to suitable default values. I guess this could be done with inline methods, once we have something like overlay types in .pxd files. What's the status of Cython implemented utility code, BTW? Stefan
On 05/26/2011 10:43 AM, Stefan Behnel wrote:
What's the status of Cython implemented utility code, BTW?
It's still bit-rotting in Kurt's branch, but it is working there. The plans are that I set aside a couple of days in June for merging that branch. I need to do so before Mark resumes his GSoC in July, so there's some pressure on me to finally get it done. Dag Sverre
On Thu, May 26, 2011 at 1:43 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Well, dispatching to existing C-API functions is easy and can be done in Builtin.py. Moving that to .pxd files or not is a minor detail. The problems, as in this case, are that we often need to implement our own C-API and that we need to match defaulted arguments to suitable default values. I guess this could be done with inline methods, once we have something like overlay types in .pxd files.
Yes, I was thinking that many of these could be implemented in Cython. We sometimes do tricky stuff with #defines though, but usually on the Python version which could possibly be supported somehow (?). At least, as we move away from the low-level ones and start expanding, implementing them in Cython makes more and more sense.
What's the status of Cython implemented utility code, BTW?
Stefan
On 5/26/11 3:27 AM, Stefan Behnel wrote:
I think this means that the current unicode optimizations aren't used when variables are declared as str and a python 3 runtime is used. Should all unicode optimizations support str eventually?
Yes.
Well, minus those that are not portable. For example, the return type of indexing and iteration is the C type "Py_UCS4" for unicode, but the Python type "str" (i.e. bytes/unicode) for "str". I also didn't take a thorough look through the C-API functions for the str type in Py2 and Py3. Things certainly become more ugly when trying to optimise Python code into C for both platforms, than when leaving things at the Python type level.
Would it work for these methods to return Py_UCS4 in all 3 cases (unicode, bytes, str)? In the bytes case, the multibyte int would simply be cast to char if that was what it was assigned to but the value wouldn't be above 255 in any case. The case I worry about is losing optimizations w/ a Python3 runtime if str is used rather than unicode. John
John Ehresman, 26.05.2011 22:02:
Would it work for these methods to return Py_UCS4 in all 3 cases (unicode, bytes, str)?
There are two sides to this: what the C compiler eventually sees and what Cython makes of the types internally. Letting Cython assume that the result is Py_UCS4 is incorrect in the Py2 case. Amongst other problems, it would make the value turn into a unicode string when coercing to a Python object.
In the bytes case, the multibyte int would simply be cast to char if that was what it was assigned to but the value wouldn't be above 255 in any case.
Sure it could, "str" is unicode in Py3, so you get a Unicode string with all possible values, e.g. when using unicode escapes.
The case I worry about is losing optimizations w/ a Python3 runtime if str is used rather than unicode.
You should expect that. If you want optimised code, use a suitable type. Stefan
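The two objections above can be checked with a few lines of plain Python 3. This is a sketch only; fits_in_char is a hypothetical helper, and the code says nothing about the C that Cython generates.

```python
euro = "\u20ac"              # a str written with a unicode escape
code_point = ord(euro[0])    # the character's integer value


def fits_in_char(value):
    # Hypothetical check: would the value survive a cast to an 8-bit char?
    return 0 <= value <= 255


# Turning an integer code point back into an object gives a unicode
# string, which is not the same thing as a byte string.
as_text = chr(0x61)
as_bytes = bytes([0x61])
```

Here code_point is 0x20AC, so it does not fit into a char, and as_text and as_bytes compare unequal on Python 3 even though both stand for the letter "a".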
participants (4)
- Dag Sverre Seljebotn
- John Ehresman
- Robert Bradshaw
- Stefan Behnel