Hi guys,

I am trying to parse HTML encoded in 'iso-8859-2' and extract a specific piece of content with XPath. The text I get back from XPath is normally a proper Python unicode object, but in this case it does not contain the correct Unicode code points. Instead it holds the raw 'iso-8859-2' byte values, as if the text had never been decoded and had been put into the unicode object as is.
Take this URL as an example: 'http://www.pkm.jaworzno.pl/rozklady/rozklad.php?kat=302_20100628&nr=14&kier=1', and run the following in an interactive session:

>>> from lxml import html
>>> import urllib2
>>> root = html.parse(urllib2.urlopen('http://www.pkm.jaworzno.pl/rozklady/rozklad.php?kat=302_20100628&nr=14&kier=1'))
>>> root.docinfo.encoding
'iso-8859-2'
>>> header = root.xpath('/html/body/center/center[1]/table/tr/td/table')[3].text_content().strip()
>>> header
u'Soboty, niedziele i \xb6wi\xeata'
>>> uc = u'Soboty, niedziele i święta'
>>> uc
u'Soboty, niedziele i \u015bwi\u0119ta'
>>> uc == header
False

I expect the header and uc variables to be equal, but they are not, even though uc is the correct unicode representation of my string.
I use this code in a script that runs on Windows with an English locale, and the script has the # -*- coding: utf-8 -*- directive.
Interestingly, the script passes the comparison uc == header on http://www.pkm.jaworzno.pl/rozklady/rozklad.php?kat=302_20100628&nr=13&kier=1 but fails on http://www.pkm.jaworzno.pl/rozklady/rozklad.php?kat=302_20100628&nr=14&kier=1. The content I am trying to get (Soboty, niedziele i święta) is byte-for-byte identical on both pages, the declared encoding is the same, and both pages render correctly in a web browser.
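
A quick check that may help narrow this down. The sketch below assumes that the bytes were decoded as Latin-1 (ISO-8859-1) instead of ISO-8859-2 somewhere along the way. That is only my guess, not something lxml reports. The names raw and repaired are just for illustration.

```python
# -*- coding: utf-8 -*-
# Value that lxml returned for the header (copied from the session above).
header = u'Soboty, niedziele i \xb6wi\xeata'
# Value I expected, written with escapes so the source encoding does not matter.
uc = u'Soboty, niedziele i \u015bwi\u0119ta'

# Assumption: the page bytes were decoded as Latin-1 at some point.
# Latin-1 maps each byte to the code point with the same number, so
# encoding back to Latin-1 recovers the original bytes unchanged.
raw = header.encode('latin-1')

# Decode those bytes with the encoding the page actually declares.
repaired = raw.decode('iso-8859-2')

print(repaired == uc)
```

This prints True for the value above, so the text is recoverable, but it only works around the problem and does not explain why the two pages behave differently.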
Can anybody help me with this?

OS: Windows XP (English) 32 bit
Python: 2.6.5
lxml.etree:        (2, 2, 0, 0)
libxml used:       (2, 7, 2)
libxml compiled:   (2, 7, 2)
libxslt used:      (1, 1, 24)
libxslt compiled:  (1, 1, 24)

Regards
Piotr