
Eric Snow wrote on 04.02.22 at 17:35:
> On Fri, Feb 4, 2022 at 8:21 AM Stefan Behnel wrote:
>> Correct. We (intentionally) have our own way to intern strings and do not depend on CPython's identifier framework.
>
> You're talking about __Pyx_StringTabEntry (and __Pyx_InitString())?
Yes, that's what we generate. The C code parsing is done here:

https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978b...

The deduplication is a bit complex on our side because it needs to handle Python source encodings, and it also distinguishes between identifiers (that become 'str' in Py2), plain Unicode strings and byte strings. You don't need most of that for plain C code. But it's done here:

https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978b...

And then there's a whole bunch of code that helps in getting Unicode character code points and arbitrary byte values in very long strings pushed through C compilers, while keeping it mostly readable for interested users. :)

https://github.com/cython/cython/blob/master/Cython/Compiler/StringEncoding....

You probably don't need that either, as long as you only deal with ASCII strings.

Anyway, have fun. Feel free to ask if I can help.

Stefan
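
P.S. To make the two ideas above a bit more concrete, here is a minimal sketch in Python. It is not the Cython implementation, just an illustration under simplifying assumptions: the names StringTable, intern(), c_string_literal() and the "const_..." naming scheme are all made up for this example, and source encodings are ignored (values are assumed to be already-encoded bytes). It deduplicates constants by (kind, value), so that an identifier and a byte string with the same content stay separate, and it renders bytes as a C string literal using three-digit octal escapes.

```python
from typing import Dict, List, Tuple

# Hypothetical kinds for this sketch, mirroring the distinction in the text.
KINDS = ("identifier", "unicode", "bytes")


class StringTable:
    """Deduplicate string constants by (kind, encoded value)."""

    def __init__(self) -> None:
        self._index: Dict[Tuple[str, bytes], str] = {}
        # One (c_name, kind, value) tuple per unique constant, in first-seen order.
        self.entries: List[Tuple[str, str, bytes]] = []

    def intern(self, kind: str, value: bytes) -> str:
        """Return the C-level name for this constant, creating it on first use."""
        if kind not in KINDS:
            raise ValueError("unknown string kind: %r" % (kind,))
        key = (kind, value)
        name = self._index.get(key)
        if name is None:
            name = "const_%s_%d" % (kind, len(self.entries))
            self._index[key] = name
            self.entries.append((name, kind, value))
        return name


def c_string_literal(data: bytes) -> str:
    """Render bytes as a C string literal.

    Printable ASCII is kept readable. Backslash, double quote and '?'
    are backslash-escaped ('?' to avoid trigraphs). Everything else
    becomes a three-digit octal escape, which cannot accidentally
    absorb a following digit the way a hex escape can.
    """
    parts = []
    for byte in data:
        ch = chr(byte)
        if ch in '\\"?':
            parts.append("\\" + ch)
        elif 32 <= byte < 127:
            parts.append(ch)
        else:
            parts.append("\\%03o" % byte)
    return '"' + "".join(parts) + '"'


if __name__ == "__main__":
    table = StringTable()
    table.intern("identifier", b"spam")
    table.intern("identifier", b"spam")   # same entry as above
    table.intern("bytes", b"spam")        # different kind, separate entry
    for c_name, kind, value in table.entries:
        print("static const char %s[] = %s;  /* %s */"
              % (c_name, c_string_literal(value), kind))
```

For ASCII-only strings in plain C code, the escaping function reduces to quoting plus a few backslashes, which is why most of the real machinery is not needed there.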