Re: Automatic PyUnicode to 'const char*'
To: c++-sig@python.org
From: David Abrahams <dave@boost-consulting.com>
Date: Thu, 31 Jul 2003 12:18:34 -0400
Subject: [C++-sig] Re: Automatic PyUnicode to 'const char*'
Reply-To: c++-sig@python.org
Stefan Seefeld <seefeld@sympatico.ca> writes:
David Abrahams wrote:
"Lijun Qin" <qinlj@solidshare.com> writes:
Hi all,
I'm using Boost.Python to wrap the WTL (Windows Template Library) with VC 7.1, and porting some old code that previously used win32ui.pyd. Basically it is easy to do, though gccxml failed to parse the ATL/WTL code, so I cannot use Pyste. But there is one problem: does anybody know how to automatically convert PyUnicode to 'const char *'? Without this, I have to change a lot of lines of code to explicitly use the str() function.
I'm not an expert in it, but I thought Unicode used 16- or 32-bit wchar_t characters. How would you convert it to char const*?
Unicode allows different encodings, some (such as UTF-8) with variable-width character representations. This means that conversions usually can't be done on the fly without helper constructs, i.e. some memory management is needed.
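To make the point about variable-width encodings concrete, here is a minimal standalone sketch (not code from the thread or from any library) that encodes UTF-32 code points as UTF-8. Each code point becomes one to four bytes, so the output length is only known after scanning the input, and the result has to live in storage that somebody owns, here a std::string. The name to_utf8 is hypothetical, and the sketch assumes valid input (it does not reject surrogates).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8.
// Assumes every input value is a Unicode scalar value; surrogates are not rejected.
std::string to_utf8(const std::u32string& in)
{
    std::string out;          // owns the converted bytes
    out.reserve(in.size());   // lower bound: at least one byte per code point
    for (char32_t cp : in) {
        assert(cp < 0x110000 && "not a Unicode code point");
        if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx + 2 continuation bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes: 11110xxx + 3 continuation bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the four code points U+0041, U+00E9, U+20AC and U+1F600 take 1 + 2 + 3 + 4 = 10 bytes, which is why a char const* view of Unicode text needs a separately allocated buffer.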
As an example, I'm doing such conversions in an XML library. The C implementation uses 'xmlChar *', which is just an alias for 'char *' but really holds UTF-8 encoded text. I convert it to various string types (the public API is parameterized on the string type):
To do that kind of narrowing, at the moment, the only way would be to write a thin wrapper function:
void f(char const*);
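void f_thin_wrapper(object unicode)
{
    str narrowed(unicode);               // like Python's str(u): encodes with the default codec
    f(extract<char const*>(narrowed));   // 'narrowed' owns the bytes for the duration of f()
}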
We may have some support for doing this automatically in the version of Boost.Python that integrates with Luabind, but that's a ways off yet.
-- Dave Abrahams Boost Consulting www.boost-consulting.com
"Lijun Qin" <qinlj@solidshare.com> writes:
Hi, all:
I have found a way to do this, by registering a conversion function:
inline void* convert_to_cstring_from_unicode(PyObject* obj)
{
    if (!PyUnicode_Check(obj))
        return 0;
    PyObject* str = PyUnicode_AsEncodedString(obj, "mbcs", NULL);
    if (!str)
        throw_error_already_set();

    // We must release str before we return to Python; guard here
    // (leak_gard and unicode_str_map are helpers defined elsewhere, not shown)
    static leak_gard _gard;

    unicode_str_map[obj] = str;
    return PyString_AsString(str);
}
converter::registry::insert(convert_to_cstring_from_unicode, type_id<char>());
But the problem is that the PyString object must be freed when the call is completed. I currently do this by applying a custom call policy to the methods taking 'const char*' parameters, but if there were reclusive
"recursive", I think.
calls into the same function, it gets much more complex.
While doing this, I found that if we could apply the call policies before the argument conversion step, and in the context of the call policy object (a call policy object is always attached to a method, right?)
Right.
the problem could be solved much more easily and safely. We would be able to replace the PyUnicode object with a PyString object (on the Windows platform, always MBCS-encoded), perhaps saving the original args tuple in the call policy object
This is the key thing which I had forgotten: call policies can have state. You can keep a stack (linked list) of "original args tuples" in the call policy in order to deal with recursion.
then convert the args to C++ (which will succeed because the arg is now a PyString) and call the C++ function. When postcall(args, result) is called, we can simply release the newly allocated args tuple and replace it with the original one. This would give us more control over how the args are processed.
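The stateful-policy idea in this exchange (save the original args tuple, substitute narrowed arguments, restore the original in postcall, and keep a stack of saved tuples so recursion works) can be modeled in plain C++. This is a sketch under stated assumptions, not Boost.Python's CallPolicies interface: running the policy before argument conversion is a proposal in the thread, so the sketch uses hypothetical types Arg and ArgsTuple in place of Python objects, a hypothetical unicode_narrowing_policy class, and an ASCII-only narrow() function in place of the "mbcs" codec.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins: a narrow string plays the role of PyString,
// a wide string plays the role of PyUnicode; an args tuple is a vector.
using Arg = std::variant<std::string, std::wstring>;
using ArgsTuple = std::vector<Arg>;

// Assumed narrowing rule for the sketch: keep ASCII, map anything else to '?'.
// The thread uses the Windows "mbcs" codec instead.
std::string narrow(const std::wstring& w)
{
    std::string out;
    for (wchar_t c : w)
        out += (static_cast<unsigned long>(c) < 0x80 ? static_cast<char>(c) : '?');
    return out;
}

// Stateful policy: precall() pushes the caller's tuple onto a stack and
// narrows wide arguments in place; postcall() restores the saved tuple.
// Because saved tuples form a stack, nested calls through the same policy
// object each get their own original tuple back.
class unicode_narrowing_policy {
public:
    void precall(ArgsTuple& args)
    {
        saved_.push_back(args);
        for (Arg& a : args)
            if (const std::wstring* w = std::get_if<std::wstring>(&a))
                a = narrow(*w);   // narrowed copy is built before 'a' is overwritten
    }
    void postcall(ArgsTuple& args)
    {
        assert(!saved_.empty() && "postcall without matching precall");
        args = std::move(saved_.back());   // drop the narrowed tuple, restore the original
        saved_.pop_back();
    }
    std::size_t depth() const { return saved_.size(); }
private:
    std::vector<ArgsTuple> saved_;
};

// Records the narrow argument the wrapped "char const*" function received.
std::vector<std::string> received;

// Simulated wrapped call that recurses through the same policy object.
void call_through_policy(unicode_narrowing_policy& policy, ArgsTuple& args, int depth)
{
    policy.precall(args);
    received.push_back(std::get<std::string>(args.at(0)));
    if (depth > 0) {
        ArgsTuple inner{std::wstring(L"inner")};
        call_through_policy(policy, inner, depth - 1);
    }
    policy.postcall(args);
}
```

With a recursion depth of 1 and an outer argument of L"outer\u00E9", the wrapped function sees "outer?" and then "inner"; after the outer postcall the policy's stack is empty again and the caller's tuple holds its original wide string.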
Yes, that sounds brilliant! So this approach does not use the convert_to_cstring_from_unicode converter, correct?
I think it has also inspired me to consider expanding the conversion interface. I was looking for a way for converters to maintain state (e.g. to keep a PyString alive during the conversion, or to fabricate a vector<int> from a Python list and then write the modified elements of the vector back into the list after the call) without incurring the cost of a polymorphic postcall operation for each converter in a function call.
One possibility might be to define an abstract postcall object, and pass a pointer to a chain of those, which lives in the call policy, to all converters during function calls. Any converter that needed to maintain state could add a postcall object to the chain. That would essentially reduce the postcall check to a single step for N arguments, in the usual case where there are no postcall operations. In non-function contexts, such as
char const* p = extract<char*>(s);
the pointer to the postcall chain would be NULL, so any converters that required a postcall chain could simply report "no match".
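The postcall-chain suggestion can be sketched in plain C++ to see how the pieces fit. This is only a model of the idea, not Boost.Python code: postcall_action, postcall_chain, write_back, convert_list, double_all and py_list (a std::vector<int> standing in for a Python list) are all hypothetical names. It follows the vector<int> case from the message: the converter fabricates a vector, appends a write-back action to the chain, and reports "no match" when it is handed a null chain.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical abstract postcall object: work a converter asks to run after the call.
struct postcall_action {
    virtual ~postcall_action() = default;
    virtual void run() = 0;
};

// The chain lives with the call (e.g. in the call policy). In the common
// case it stays empty, so the after-call check is one test for N arguments.
struct postcall_chain {
    std::vector<std::unique_ptr<postcall_action>> actions;
    void run_all()
    {
        // Run in reverse order of registration, then reset for the next call.
        for (auto it = actions.rbegin(); it != actions.rend(); ++it)
            (*it)->run();
        actions.clear();
    }
};

// Stand-in for a Python list of ints.
using py_list = std::vector<int>;

// Action that copies the (possibly modified) vector back into the source list.
struct write_back : postcall_action {
    py_list& target;
    std::vector<int>& value;
    write_back(py_list& t, std::vector<int>& v) : target(t), value(v) {}
    void run() override { target = value; }
};

// Hypothetical stateful converter: fabricate a vector<int> from a list.
// With no chain (a non-function context such as a plain extract), it
// reports "no match" by returning false.
bool convert_list(py_list& src, std::vector<int>& storage, postcall_chain* chain)
{
    if (!chain)
        return false;
    storage = src;
    chain->actions.push_back(std::make_unique<write_back>(src, storage));
    return true;
}

// The wrapped C++ function modifies its argument in place.
void double_all(std::vector<int>& v)
{
    for (int& x : v)
        x *= 2;
}
```

In use, the list is unchanged while double_all runs on the fabricated vector; only when the chain is run after the call does {1, 2, 3} become {2, 4, 6}. The design choice is that converters without state never touch the chain, so they pay nothing beyond the single emptiness check.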
Participants (2): David Abrahams, Lijun Qin