data:image/s3,"s3://crabby-images/175c3/175c3a322cb1ac9cb31e9951254c7c695c996391" alt=""
Hi, I'm reading a CSV file encoded in cp1250 with the Python csv module and then converting it to XML. This is the code I use:

```python
....
for i in range(len(row)):
    child = etree.SubElement(jednostka, csvHeaders[i])
    # decode the raw cp1250 bytes returned by the csv module
    child.text = unicode(row[csvHeaders[i]].strip(), 'cp1250')
    print type(child.tag), child.tag, type(child.text), child.text
....
```

National characters can appear in some child.text values but not in all of them; it depends on the data. I would expect every child.text to end up as unicode (?), but that is not the case. Only the values that contain national characters are unicode; the rest are of type str. Why? Is lxml selective here? It looks strange. Example:

```xml
<a>
  <b>this is english</b>
</a>
<a>
  <b>źdźbło - polish</b>
</a>
```

The types lxml reports for these elements are as follows: all tags are str, which is fine, but the text 'this is english' is of type str while 'źdźbło - polish' is of type unicode. Is this normal?

In the end, after serialization, the UTF-8 encoded XML file looks fine and the national characters are correct, so maybe it is not a problem, but I'm still curious what is going on.

P.
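For comparison, here is a minimal sketch of the same cp1250 CSV to UTF-8 XML pipeline written for Python 3. It uses the standard-library xml.etree.ElementTree in place of lxml (an assumption, so the example runs without extra packages), and the helper name csv_to_xml with its root_tag/row_tag parameters is hypothetical. Because the whole input is decoded from cp1250 once before parsing, every child.text is a str, so the str/unicode split cannot occur. As far as I know, lxml under Python 2 deliberately returns plain str for ASCII-only text (the same behaviour as ElementTree), since ASCII byte strings mix transparently with unicode there; only text with non-ASCII characters has to be unicode.

```python
import csv
import io
import xml.etree.ElementTree as etree


def csv_to_xml(raw_bytes, root_tag="root", row_tag="jednostka"):
    """Hypothetical helper: decode cp1250 CSV bytes and build one element per row."""
    # Decode once at the boundary; after this every CSV value is a str.
    text_stream = io.StringIO(raw_bytes.decode("cp1250"), newline="")
    reader = csv.DictReader(text_stream)
    root = etree.Element(root_tag)
    for row in reader:
        jednostka = etree.SubElement(root, row_tag)
        # One child element per CSV column, named after the header.
        for header in reader.fieldnames:
            child = etree.SubElement(jednostka, header)
            child.text = row[header].strip()
    return root


# Usage: ASCII-only and Polish values both come back as str.
sample = "b\nthis is english\nźdźbło - polish\n".encode("cp1250")
root = csv_to_xml(sample)
for child in root.iter("b"):
    print(type(child.text).__name__, child.text)
xml_bytes = etree.tostring(root, encoding="utf-8")  # UTF-8 encoded bytes
```

Serializing with encoding="utf-8" writes the Polish characters as UTF-8 bytes, just like the final file described above.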