I'll answer with general concerns.
A dedicated builder API that allocates the unicode object at the end is really a good idea (PyUnicode_Join is really too slow for high-performance string building)
The builder itself should ideally be a stack variable (even if the allocated string payload is malloc'ed)
There could be separate builder types:
- UCS1, UCS2 and UCS4 builder types (for when you know the width upfront)
- a dynamic width builder type
Builders should support presizing and/or reserving more space on the fly.
Builders should support variants of appending with or without implicit reallocation (the latter for the case where the right size is fully preallocated).
I'm biased, but I suggest you look at Arrow's BufferBuilder API (C++, but it should be relatively easy to do a C equivalent): https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer_builder.h#L...
It has been serving us well.
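To make the points above concrete, here is a minimal sketch of a fixed-width (UCS1) builder in C. Every name in it (ucs1_builder, ucs1_builder_reserve, and so on) is hypothetical and invented for illustration; it is neither Arrow's API nor a proposed HPy API. The struct is meant to live on the caller's stack while the payload is malloc'ed, and appending comes in two variants, one that reallocates implicitly and one that assumes the space was reserved beforehand. The final step of handing the payload over to a unicode object is left out, since no such API is specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical UCS1 builder: the struct itself can be a stack variable,
 * only the payload is heap-allocated. */
typedef struct {
    unsigned char *data;  /* malloc'ed payload, NULL while empty */
    size_t len;           /* bytes written so far */
    size_t cap;           /* bytes allocated */
} ucs1_builder;

static void ucs1_builder_init(ucs1_builder *b) {
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Make room for `extra` more bytes (presizing or growing on the fly).
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int ucs1_builder_reserve(ucs1_builder *b, size_t extra) {
    if (extra > (size_t)-1 - b->len) return -1;
    size_t needed = b->len + extra;
    if (needed <= b->cap) return 0;
    size_t new_cap = b->cap ? b->cap : 16;
    while (new_cap < needed) {
        if (new_cap > (size_t)-1 / 2) { new_cap = needed; break; }
        new_cap *= 2;  /* geometric growth keeps appends amortized O(1) */
    }
    unsigned char *p = realloc(b->data, new_cap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Append without any reallocation: the caller must have reserved enough. */
static void ucs1_builder_unsafe_append(ucs1_builder *b, const void *src, size_t n) {
    assert(n <= b->cap - b->len);
    if (n == 0) return;
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

/* Append with implicit reallocation. Returns 0 on success, -1 on failure. */
static int ucs1_builder_append(ucs1_builder *b, const void *src, size_t n) {
    if (ucs1_builder_reserve(b, n) < 0) return -1;
    ucs1_builder_unsafe_append(b, src, n);
    return 0;
}

/* Release the payload and reset the builder to its empty state. */
static void ucs1_builder_clear(ucs1_builder *b) {
    free(b->data);
    ucs1_builder_init(b);
}
```

A caller that knows the final size would call ucs1_builder_reserve once and then use only ucs1_builder_unsafe_append in its hot loop; a caller that does not would use ucs1_builder_append throughout.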
Separately from the builder API, there are cases where the data already exists somewhere as a full-blown UTF8 string (this is of course more and more common, since UTF8 is ubiquitous). There should be a fast conversion method from a UTF8 memory area to a unicode object.
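As an illustration of what such a conversion has to work out, here is a small C sketch of the sizing pass. CPython stores a str in 1, 2 or 4 bytes per character depending on its largest code point, so a converter needs the code point count and the required width before it can allocate the final object once. The function name utf8_scan and its result struct are hypothetical, and the sketch assumes the input is already valid UTF-8; a real implementation would have to validate as well.

```c
#include <stddef.h>

/* Hypothetical result of the sizing pass over a UTF-8 buffer. */
typedef struct {
    size_t n_codepoints;  /* number of code points in the buffer */
    int width;            /* bytes per character needed: 1, 2 or 4 */
} utf8_scan_result;

/* Scan `n` bytes at `s`, ASSUMED to be valid UTF-8 (no validation here). */
static utf8_scan_result utf8_scan(const unsigned char *s, size_t n) {
    utf8_scan_result r = {0, 1};
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        if (c < 0x80) {
            i += 1;                       /* ASCII */
        } else if (c < 0xE0) {
            /* 2-byte sequence, U+0080..U+07FF. Lead bytes 0xC2 and 0xC3
             * encode U+0080..U+00FF, which still fit in one byte. */
            if (c > 0xC3 && r.width < 2) r.width = 2;
            i += 2;
        } else if (c < 0xF0) {
            if (r.width < 2) r.width = 2; /* 3-byte sequence, up to U+FFFF */
            i += 3;
        } else {
            r.width = 4;                  /* 4-byte sequence, above U+FFFF */
            i += 4;
        }
        r.n_codepoints++;
    }
    return r;
}
```

With the count and width known, the payload can be allocated at its exact final size and filled in a second pass, which is the fully preallocated case from the builder discussion.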
Regards
Antoine.
On 10/06/2021 at 16:48, Antonio Cuni wrote:
Hi all, at the language summit many people told me that the HPy team should try to communicate more with the CPython developers, so let's try :).
In HPy we want to design an API to build bytes/str objects in two steps, to avoid the problem that currently in CPython they are not really immutable.
Before making any proposal, I spent quite a lot of time researching how the current APIs are used to construct bytes/str objects, and I summarized my results here: https://docs.hpyproject.org/en/latest/misc/str-builder-api.html
I think that my survey could be interesting for the people on this ML, independently of HPy.
That said, I also opened an issue to discuss concrete proposals for the HPy API to do that: https://github.com/hpyproject/hpy/issues/214
I would be glad to receive comments and suggestions about that, and especially to know whether I missed some important use case in my analysis.
Also, if you think that this kind of mail is off-topic on this ML, please let me know and I'll stop.
Antonio