Hi all!
I'm trying to write a TargetParser in Cython, just to compare performance.
The problem is with data types. If I define the data method as "def
data(self, char *data):", I can't use the class as a TargetParser. I get
this error:
" def data(self, char *data):
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-4: ordinal not in range(128)"
If I instantiate the class myself and call the data() and close() methods
directly, everything works fine, but it refuses to work with lxml.
A small test case follows:
----- _target.pyx -----------
cdef class Target:
    cdef list _data

    def __init__(self):
        self._data = []

    def data(self, char *data):
        self._data.append(data)

    def close(self):
        return ''.join(self._data)
---- end of _target.pyx ------
---- test.py -------
# -*- encoding: utf-8 -*-
import lxml.html
from lxml import etree
from _target import Target
res = etree.HTML(u"<span>ABCD</span>",
                 parser=lxml.html.HTMLParser(target=Target()))
------- end of test.py ------
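
For the performance comparison, a pure-Python counterpart of the same target could look roughly like the sketch below. It is only a baseline, not a fix for the Cython version: the data argument has no declared type, on the assumption that the parser may pass unicode text to data(). The direct calls at the end stand in for what I do when calling the methods by hand.

```python
class Target(object):
    """Pure-Python counterpart of the Cython Target (baseline sketch)."""

    def __init__(self):
        self._data = []

    def data(self, data):
        # No C type is declared here, so the text is stored as whatever
        # string type the caller passes in (assumed to be unicode).
        self._data.append(data)

    def close(self):
        # Join the collected chunks into a single unicode string.
        return u''.join(self._data)


# Direct calls, without lxml involved.
target = Target()
target.data(u"AB")
target.data(u"CD")
result = target.close()
```

The same class should be usable in test.py in place of the Cython one (target=Target()), which would give a timing to compare against.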