[lxml-dev] Why can't I turn all xpath node types into string?

XML defines several node types that get returned by XPATH: http://www.w3.org/TR/xpath#data-model

When I use the xpath feature in lxml and then try to read out the data, it won't work for certain node types. How can I get a string version of all node types? Examples follow.

The following works:
Domain/HTML: http://www.citiesxl.com/
XPATH: //div//a

    from lxml import etree
    self.xpath_list = self.tree.xpath(xpath)
    for entry in self.xpath_list:
        tmp = ""
        try:
            tmp = etree.tostring(entry)
        except:
            tmp = str(entry)
        print tmp

In this example, xpath_list is of type: <class 'lxml.etree._Element'>
The etree.tostring() function works for this type.

The following will NOT work for etree.tostring()
XPATH: //div//a/attribute::href
It says it can't handle the type or something. In this case, xpath_list is of type: <class 'lxml.etree._ElementStringResult'> or something like that (I forget the actual type). The str() function will work on this one.

The following will NOT work at all using either str() or etree.tostring()
XPATH: //self::text()
One of the types is: <type 'lxml.etree._ElementUnicodeResult'> and I get this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 38: ordinal not in range(128)

Is there an lxml function that will convert all types to strings properly instead of trying to use this try/except hack to handle all the types?

Hi,
> When I use the xpath feature in lxml and then try to read out the data, it won't work for certain node types. How can I get a string version of all node types? Examples follow.
Please take a look at http://codespeak.net/lxml/xpathxslt.html#xpath (the section on XPath return values).
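To make the different return values concrete, here is a minimal sketch. It assumes Python 3 and a recent lxml (the thread itself uses Python 2), and the sample document and variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical sample document, not the page from the original question.
root = etree.fromstring('<div><a href="http://example.com/">link text</a></div>')

elements = root.xpath("//div//a")             # list of elements
hrefs = root.xpath("//div//a/@href")          # list of string results (attribute values)
texts = root.xpath("//div//a/text()")         # list of string results (text nodes)
count = root.xpath("count(//div//a)")         # a float, not a list
has_link = root.xpath("boolean(//div//a)")    # a bool, not a list

# String results are "smart": they remember the element they came from.
href_owner = hrefs[0].getparent()
```

So a node-set expression yields a list whose items are either elements or strings, while functions such as count() and boolean() yield a plain number or boolean.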
> In this example, xpath_list is of type: <class 'lxml.etree._Element'>
> The etree.tostring() function works for this type.
etree.tostring() is lxml's serialization function; it takes an element or an ElementTree (see the API reference).
> The following will NOT work for etree.tostring()
> XPATH: //div//a/attribute::href
> It says it can't handle the type or something. In this case, xpath_list is of type: <class 'lxml.etree._ElementStringResult'> or something like that (I forget the actual type). The str() function will work on this one.
See above.
> The following will NOT work at all using either str() or etree.tostring()
> XPATH: //self::text()
> One of the types is:
> <type 'lxml.etree._ElementUnicodeResult'> and I get this error:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 38: ordinal not in range(128)
You can't safely call str() on a unicode string, because that implicitly tries to encode it with your Python installation's default encoding (which is usually ASCII). For background on Unicode in Python, see e.g. http://www.amk.ca/python/howto/unicode.
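The failure can be reproduced without lxml. The sketch below is written for Python 3, where the encoding step has to be spelled out explicitly; in Python 2, str() on a unicode string performed the ASCII encoding implicitly. The sample text is made up.

```python
# Text containing a no-break space (U+00A0), the character named in the error.
text = "price:\u00a0100"

# Encoding to ASCII fails, because U+00A0 is outside the ASCII range.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Choosing a suitable encoding explicitly works and is reversible.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```

Keeping the data as text for as long as possible and encoding only at output time avoids this class of error.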
> Is there an lxml function that will convert all types to strings properly instead of trying to use this try/except hack to handle all the types?
Well, you can use unicode() to convert all of these to unicode strings (and afterwards encode to the encoding you need), but I suppose you want a serialized representation of an element result. So you will have to treat the types separately, e.g. by using isinstance() tests.

brgds
Holger
participants (2): jholg@gmx.de, Kevin Ar18