Re text() vs text_content(): have you investigated including text() as part of the XPath expression?
Re Python 3 vs Python 2: sorry it didn't work out; it was just a suggestion of a path to follow.
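For the XPath route, here is a minimal sketch. It assumes Python 3 with lxml installed and UTF-8 input. The sample HTML is hypothetical because the original test document was not preserved in this thread. string() returns the concatenated text of the first div in a single expression, and //text() returns the individual text nodes:

```python
from lxml import etree

# Hypothetical sample input, UTF-8 encoded ("\xce\xb1" is the Greek letter alpha)
SAMPLE = b"<html><body><div>NT-PGC-1<b>\xce\xb1</b> test</div></body></html>"

# Assumption: the input is UTF-8, so the encoding is stated explicitly
parser = etree.HTMLParser(encoding="utf-8")
root = etree.fromstring(SAMPLE, parser)

# string(...) gives the string value of the first div:
# all descendant text nodes concatenated in document order
as_string = root.xpath("string((//div)[1])")

# .../text() returns each descendant text node as a separate string
pieces = [str(t) for t in root.xpath("(//div)[1]//text()")]

print(as_string)
print(pieces)
```

The string() form yields one string, which should be close to what text_content() returns for an lxml.html element.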

Best, /PA

On 14 February 2018 at 15:40, Peng Yu <pengyu.ut@gmail.com> wrote:
> You might be bitten by the behaviour described in this bug report:
>         https://bugs.launchpad.net/lxml/+bug/1002581
> Maybe the workarounds sketched there are of some help for you.
> It looks like libxml2 behaves differently for XML vs HTML parsing
> with respect to encodings, e.g. it makes different default encoding assumptions
> (also depending on iconv support in your environment).
> You can see this if you try etree.parse() instead of html.parse(),
> which works for this simple example as the HTML happens to be well-formed
> XML:
> $ cat main_etree.py
> import sys
> from lxml import html, etree
> doc = etree.parse(sys.stdin)
> print doc.xpath('//div')[0].text
> $ python2.7 main_etree.py < main.html
> NT-PGC-1α
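The script above is Python 2, so here is a minimal Python 3 counterpart. In Python 3, sys.stdin is a text stream and sys.stdin.buffer gives the raw bytes, so this version reads bytes. It also pins the HTML parser's encoding to UTF-8 instead of relying on libxml2's default guess. The helper name first_div_text and the sample bytes are made up for illustration:

```python
import io
from lxml import etree


def first_div_text(stream, as_html: bool = True) -> str:
    """Parse a binary stream and return the .text of the first <div>.

    Hypothetical helper; pass sys.stdin.buffer (bytes) rather than
    sys.stdin (text) when reading from standard input in Python 3.
    """
    if as_html:
        # State the encoding instead of relying on libxml2's HTML default
        parser = etree.HTMLParser(encoding="utf-8")
    else:
        # Without an XML declaration, the XML parser assumes UTF-8
        parser = etree.XMLParser()
    tree = etree.parse(stream, parser)
    return tree.xpath("//div")[0].text


# Hypothetical sample input that also happens to be well-formed XML
SAMPLE = b"<html><body><div>NT-PGC-1\xce\xb1</div></body></html>"
html_text = first_div_text(io.BytesIO(SAMPLE), as_html=True)
xml_text = first_div_text(io.BytesIO(SAMPLE), as_html=False)
print(html_text, xml_text)
```

From a script, this would be called as first_div_text(sys.stdin.buffer).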

I need text_content(), not just .text. But text_content() does not exist
in etree. What is the substitute for text_content() in etree?

$ cat main.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import sys
from lxml import etree
tree = etree.parse(sys.stdin, parser=etree.HTMLParser(encoding='utf-8'))

$ cat main.sh
#!/usr/bin/env bash
# vim: set noexpandtab tabstop=2:

./main.py <<EOF
$ ./main.sh
Traceback (most recent call last):
  File "./main.py", line 8, in <module>
AttributeError: 'lxml.etree._Element' object has no attribute 'text_content'
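The sketch below shows three possible ways to get the equivalent of text_content() on plain etree elements. It assumes Python 3, lxml, and UTF-8 input. The helper name text_content and the sample HTML are hypothetical.

1. Join the strings from itertext().
2. Serialise the element with method="text".
3. Parse with lxml.html's own HTMLParser. It accepts the same encoding argument but builds lxml.html elements, and those do have text_content().

```python
from lxml import etree, html

# Hypothetical sample input, UTF-8 encoded ("\xce\xb1" is the Greek letter alpha)
SAMPLE = b"<html><body><div>NT-PGC-1<b>\xce\xb1</b> protein</div></body></html>"


def text_content(element) -> str:
    """Hypothetical helper: concatenate all text inside an etree element.

    itertext() yields the element's own .text plus the .text and .tail of
    every descendant in document order (but not the element's own tail).
    """
    return "".join(element.itertext())


root = etree.fromstring(SAMPLE, etree.HTMLParser(encoding="utf-8"))
div = root.find(".//div")

# Option 1: join itertext()
joined = text_content(div)

# Option 2: serialise as text; with_tail=False leaves out div's own tail text
serialised = etree.tostring(div, method="text", encoding="unicode", with_tail=False)

# Option 3: lxml.html's parser takes the same encoding argument but builds
# lxml.html elements, which provide text_content() directly
html_root = html.fromstring(SAMPLE, parser=html.HTMLParser(encoding="utf-8"))
via_html = html_root.find(".//div").text_content()

print(joined)
print(serialised)
print(via_html)
```

For this sample, all three should give the same string. Option 3 might be the smallest change to main.py, since it only swaps etree.HTMLParser for lxml.html's HTMLParser.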

Mailing list for the lxml Python XML toolkit - http://lxml.de/

Questions aren't there to be answered,
questions are there to be asked.
Georg Kreisler