[lxml-dev] is tostring() confusing?
Having tostring() as a function and not a method seems a bit odd to me. I know it's from ElementTree, but at least for HTML it's awkward -- using lxml.etree.tostring on HTML is almost certain to create bad output; the output won't be real XHTML (lacking namespaces and it'll probably be invalid), and it will parse quite badly as HTML (<script src="..."/> for instance will typically break the entire page in a browser).

When I was first using ElementTree, I remember being a bit baffled by the lack of a serializing method. I then found tostring and kind of forgot about it, but as I copy tostring methods around (e.g., lxml.html.tostring) it's starting to seem like a problem again.

--
Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
Write code, do good | http://topp.openplans.org/careers
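[Editor's sketch] The <script src="..."/> problem described above can be reproduced with the standard-library xml.etree.ElementTree, used here as a stand-in for lxml.etree. This is a minimal sketch assuming a current Python 3 interpreter, not the 2007 lxml API; the "app.js" value is a placeholder, and the method="html" argument is the standard library's, shown only to contrast the two outputs.

```python
import xml.etree.ElementTree as ET

# An empty <script> element like the one in the message above.
# "app.js" is a placeholder value, not taken from the thread.
script = ET.Element("script", {"src": "app.js"})

# XML serialization collapses the empty element into a self-closing tag,
# which browsers parsing the result as HTML handle badly.
as_xml = ET.tostring(script, encoding="unicode", method="xml")

# HTML serialization keeps an explicit closing tag instead.
as_html = ET.tostring(script, encoding="unicode", method="html")

print(as_xml)
print(as_html)
```

The only difference between the two calls is the serialization method, which is why an XML-oriented tostring() is easy to misuse on HTML trees.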
Hi Ian,

Ian Bicking wrote:
> Having tostring() as a function and not a method seems a bit odd to me. I know it's from ElementTree, but at least for HTML it's awkward -- using lxml.etree.tostring on HTML is almost certain to create bad output; the output won't be real XHTML (lacking namespaces and it'll probably be invalid), and it will parse quite badly as HTML (<script src="..."/> for instance will typically break the entire page in a browser).
>
> When I was first using ElementTree, I remember being a bit baffled by the lack of a serializing method. I then found tostring and kind of forgot about it, but as I copy tostring methods around (e.g., lxml.html.tostring) it's starting to seem like a problem again.

I prefer keeping it the way it is in ET. After all, lxml.html is a module that is based on the existing ET API in lxml.etree. It would feel funny to have it one way in etree and another way in lhtml. It should be straightforward to change from one to the other depending on the problem to solve, and that requires consistency.

Stefan
Stefan Behnel wrote:
> Hi Ian,
>
> Ian Bicking wrote:
>> Having tostring() as a function and not a method seems a bit odd to me. I know it's from ElementTree, but at least for HTML it's awkward -- using lxml.etree.tostring on HTML is almost certain to create bad output; the output won't be real XHTML (lacking namespaces and it'll probably be invalid), and it will parse quite badly as HTML (<script src="..."/> for instance will typically break the entire page in a browser).
>>
>> When I was first using ElementTree, I remember being a bit baffled by the lack of a serializing method. I then found tostring and kind of forgot about it, but as I copy tostring methods around (e.g., lxml.html.tostring) it's starting to seem like a problem again.
>
> I prefer keeping it the way it is in ET. After all, lxml.html is a module that is based on the existing ET API in lxml.etree. It would feel funny to have it one way in etree and another way in lhtml. It should be straight forward to change from one to the other depending on the problem to solve and that requires consistency.

I wouldn't propose adding it to just lxml.html, but it feels missing in all contexts. That is, it seems like tostring would be a better method (on all kinds of elements) than a function.

Ian
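[Editor's sketch] To make the function-versus-method distinction concrete, here is one way a method form could look. SerializableElement and its tostring() method are hypothetical names invented for this illustration; neither ElementTree nor lxml is claimed to provide them. The sketch uses the standard-library xml.etree.ElementTree on Python 3 as a stand-in for lxml.etree.

```python
import xml.etree.ElementTree as ET


class SerializableElement(ET.Element):
    """Hypothetical Element subclass that exposes serialization as a method."""

    def tostring(self, **kwargs):
        # Delegate to the existing module-level function, so both
        # spellings produce identical output.
        return ET.tostring(self, **kwargs)


para = SerializableElement("p")
para.text = "hello"

via_function = ET.tostring(para, encoding="unicode")  # ElementTree style
via_method = para.tostring(encoding="unicode")        # proposed method style

print(via_function)
print(via_method)
```

Because the method only delegates, the function form stays available and code written against the ElementTree API keeps working.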
On 2007-06-08 19:09:30 +0200, Ian Bicking <ianb@colorstudy.com> said:
> Having tostring() as a function and not a method seems a bit odd to me. I know it's from ElementTree, but at least for HTML it's awkward -- using lxml.etree.tostring on HTML is almost certain to create bad output; the output won't be real XHTML (lacking namespaces and it'll probably be invalid), and it will parse quite badly as HTML (<script src="..."/> for instance will typically break the entire page in a browser).
>
> When I was first using ElementTree, I remember being a bit baffled by the lack of a serializing method. I then found tostring and kind of forgot about it, but as I copy tostring methods around (e.g., lxml.html.tostring) it's starting to seem like a problem again.

What I wonder about is why str(tree) or unicode(tree) isn't supported. I see that str/unicode cannot take arguments (e.g. pretty-print, encoding). But still, there are suitable defaults, are there not?

Regards
--
Christian Zagrodnick
gocept gmbh & co. kg · forsterstrasse 29 · 06112 halle/saale
www.gocept.com · fon. +49 345 12298894 · fax. +49 345 12298891
Hi,

Christian Zagrodnick wrote:
> What I wonder about is, why str(tree) or unicode(tree) isn't supported. I see that str/unicode cannot have arguments (i.e. pretty-print, encoding). But still there are suitable defaults, are there not.

It's not really the defaults (plain UTF-8, sure); it's more a concern about having str() do something unexpectedly recursive. You could argue that repr() should do a simple thing and str() should go for recursion, but then "print" calls str(), so that would flood your console with UTF-8 stuff if you accidentally printed something for debugging.

Stefan
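[Editor's sketch] The size difference behind this concern is easy to see. The example below uses the standard-library xml.etree.ElementTree on Python 3 as a stand-in for lxml.etree; the tree of 1000 empty "item" elements is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Build a flat tree with many children (made-up data).
root = ET.Element("root")
for _ in range(1000):
    ET.SubElement(root, "item")

# str() describes only the element itself and does not descend into children.
summary = str(root)

# tostring() walks the whole tree and serializes every descendant.
full = ET.tostring(root, encoding="unicode")

print(summary)
print(len(summary), len(full))
```

If str() produced the full serialization, a stray print(root) while debugging would emit all 1000 children instead of one short line.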
participants (3)
- Christian Zagrodnick
- Ian Bicking
- Stefan Behnel