Hi,

My problem is, I suppose, well known, but I couldn't find any solution through my searches.

I have a regular HTML link with a ? and an &. When I print the variable in Python, it looks fine (like: http://www.somelink.com/site.html?param1=test&param2=hello), BUT when I add it to my root XML element with:

adId1 = etree.SubElement(tagAd, "originalAdUrl")
adId1.text = adUrl

and then later write the XML to a file with this:

toStringValue = etree.tostring(xmlTagRoot, encoding="utf-8", method="xml", xml_declaration=True, pretty_print=True)

the tag has as its value the link with an &amp; instead of & !! How can I use the correct signs for persistent storage in an XML file?

Thank you very much.
On Thu, 04 Dec 2008 12:46:34 +0100, Daniel Jirku <nepi@gmx.ch> wrote:
Hi,
My problem is, I suppose, well known, but I couldn't find any solution through my searches.
I have a regular HTML link with a ? and an &. When I print the variable in Python, it looks fine (like: http://www.somelink.com/site.html?param1=test&param2=hello), BUT when I add it to my root XML element with:
adId1 = etree.SubElement(tagAd, "originalAdUrl")
adId1.text = adUrl
and then later write the XML to a file with this:
toStringValue = etree.tostring(xmlTagRoot, encoding="utf-8", method="xml", xml_declaration=True, pretty_print=True)
the tag has as its value the link with an &amp; instead of & !! How can I use the correct signs for persistent storage in an XML file?

The XML processor has correctly escaped your "&" character. If you deserialise (aka load) the file with an XML parser of your choice, it will restore your "&" character.

see http://en.wikipedia.org/wiki/Character_encodings_in_HTML#XML_character_entit...

--dirk
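The round trip described in this reply can be sketched as follows. This is only an illustration, not code from the thread: the element names "ads" and "ad" are hypothetical, and the import falls back to the standard library's xml.etree.ElementTree (which offers the same calls used here) when lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib fallback; the calls used below exist in both modules.
    import xml.etree.ElementTree as etree

ad_url = "http://www.somelink.com/site.html?param1=test&param2=hello"

# Build a small tree; "ads" and "ad" are hypothetical element names.
root = etree.Element("ads")
tag_ad = etree.SubElement(root, "ad")
original = etree.SubElement(tag_ad, "originalAdUrl")
original.text = ad_url  # assign the raw URL, no manual escaping

# Serialising escapes the "&" as "&amp;", which XML requires in text content.
serialized = etree.tostring(root, encoding="utf-8")
print(serialized)

# Parsing the serialised bytes again restores the original "&".
reparsed = etree.fromstring(serialized)
restored_url = reparsed.find("ad/originalAdUrl").text
print(restored_url == ad_url)
```

The "&amp;" exists only in the serialised file. Any XML parser that reads the file back hands the application the plain "&" again, so nothing needs to be decoded by hand.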
When str is the HTML, I use:

htmldecode( unicode(str,'utf-8') ).encode('utf-8')

# Python 2 code (uses unicode, unichr and the htmlentitydefs module).
import re
from htmlentitydefs import name2codepoint

# This pattern matches a character entity reference (a decimal numeric
# reference, a hexadecimal numeric reference, or a named reference).
charrefpat = re.compile(r'&(#(\d+|x[\da-fA-F]+)|[\w.:-]+);?')

def htmldecode(text):
    """Decode HTML entities in the given text."""
    if type(text) is unicode:
        uchr = unichr
    else:
        uchr = lambda value: value > 255 and unichr(value) or chr(value)

    def entitydecode(match, uchr=uchr):
        entity = match.group(1)
        if entity.startswith('#x'):
            return uchr(int(entity[2:], 16))
        elif entity.startswith('#'):
            return uchr(int(entity[1:]))
        elif entity in name2codepoint:
            return uchr(name2codepoint[entity])
        else:
            return match.group(0)

    return charrefpat.sub(entitydecode, text)

On Thu, 2008-12-04 at 12:57 +0100, Dirk Rothe wrote:
The XML processor has correctly escaped your "&" character. If you deserialise (aka load) the file with an XML parser of your choice, it will restore your "&" character.
see http://en.wikipedia.org/wiki/Character_encodings_in_HTML#XML_character_entit...
--dirk
_______________________________________________
lxml-dev mailing list
lxml-dev@codespeak.net
http://codespeak.net/mailman/listinfo/lxml-dev

--
Sérgio M. B.
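The htmldecode helper in the message above targets Python 2. On Python 3, a rough equivalent for decoding HTML character references is the standard library's html.unescape. The sketch below is an assumption about how one might do the same thing today, not part of the original thread, and it applies to HTML-escaped text rather than to a file that an XML parser will read anyway.

```python
import html

# An HTML-escaped URL, as it would appear inside serialised markup.
escaped = "http://www.somelink.com/site.html?param1=test&amp;param2=hello"

# html.unescape decodes named and numeric character references in one pass.
decoded = html.unescape(escaped)
print(decoded)

# Named, decimal and hexadecimal references for the same character.
samples = ["&eacute;", "&#233;", "&#xe9;"]
decoded_samples = [html.unescape(s) for s in samples]
print(decoded_samples)
```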
participants (3): Daniel Jirku, Dirk Rothe, Sergio Monteiro Basto