[lxml-dev] HTML Meta Content-Type Tag not created as documentation states?

So, I was trying to figure out what happened to my meta tags when using the lxml.html module, and saw the note in the documentation that html.tostring will handle them as so:
However, that doesn't seem to actually be the case. It looks like etree.tostring is never creating the meta tag as html.tostring appears to expect, and instead the include_meta_content_type flag is simply controlling whether any found meta tag is removed from the output (with an re!).

Python 2.5.2 (r252:60911, Sep 22 2008, 12:08:38)
[GCC 4.1.2 (Gentoo 4.1.2 p1.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Not present there, so I figure maybe it's because it's not being treated as a complete document?
Parsing as document doesn't create it either.
Okay, maybe it's because I'm using the default encoding for HTML (us-ascii)? Nope, trying something else doesn't cause it to exist either.
Maybe wrapping in an ElementTree? Get a doctype declaration out of that, but still no meta tag.
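The interactive session output from the original message did not survive in this archive. As a rough sketch only, the four attempts described above (fragment, full document, a non-default encoding, ElementTree wrapper) could be repeated like this. The sample markup, the function names and the choice of "utf-8" as the alternative encoding are assumptions, and the script makes no claim about what a given lxml/libxml2 version will print:

```python
from lxml import etree
import lxml.html

# Hypothetical sample markup; the original test input is not in the archive.
SAMPLE = b"<html><head><title>Test</title></head><body><p>Hello</p></body></html>"


def serialise_variants(markup=SAMPLE):
    """Serialise the markup in the four ways tried in the message."""
    fragment = lxml.html.fromstring(b"<p>Hello</p>")
    document = lxml.html.document_fromstring(markup)
    tree = etree.ElementTree(document)
    return {
        # 1. a bare fragment
        "fragment": lxml.html.tostring(
            fragment, include_meta_content_type=True),
        # 2. parsed as a complete document
        "document": lxml.html.tostring(
            document, include_meta_content_type=True),
        # 3. an explicit, non-default encoding (assumed: utf-8)
        "utf-8": lxml.html.tostring(
            document, include_meta_content_type=True, encoding="utf-8"),
        # 4. wrapped in an ElementTree
        "elementtree": lxml.html.tostring(
            tree, include_meta_content_type=True),
    }


def has_meta(serialised):
    """Report whether any <meta> tag appears in the serialised bytes."""
    return b"<meta" in serialised.lower()


if __name__ == "__main__":
    for label, output in serialise_variants().items():
        print(label, "meta present:", has_meta(output))
```

Running it against the installed lxml shows, per variant, whether a meta tag was generated.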
In further testing, it appeared that if a Meta Content-Type tag was specified, it was passed through as is, as long as include_meta_content_type was True.

The really weird part of this for me, though, is that I've set include_meta_content_type on my much more complicated application server, and it does in fact appear to be generating meta tags automatically (or at least something in my XSLT-heavy processing chain is). My testing was an attempt to duplicate that, and I was quite surprised when I couldn't.

I've tried this on boxes with both libxml2 2.6.26 (RHEL5) and 2.6.32, and didn't see a difference there.

--
John Krukoff <jkrukoff@ltgc.com>
Land Title Guarantee Company
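To make the flag's observed behaviour concrete, here is a minimal sketch that serialises a document already containing a Content-Type meta tag with the flag on and off. The sample markup and the helper name are assumptions for illustration. According to the message, the tag should pass through unchanged with the flag set and be stripped otherwise:

```python
import lxml.html

# Hypothetical input that already carries a Content-Type meta tag.
WITH_META = (
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    b'<title>Test</title></head><body><p>Hello</p></body></html>'
)


def serialise_both_ways(markup=WITH_META):
    """Return the serialised document with the flag on, then off."""
    doc = lxml.html.document_fromstring(markup)
    kept = lxml.html.tostring(doc, include_meta_content_type=True)
    stripped = lxml.html.tostring(doc, include_meta_content_type=False)
    return kept, stripped


if __name__ == "__main__":
    kept, stripped = serialise_both_ways()
    print(kept.decode("ascii", "replace"))
    print(stripped.decode("ascii", "replace"))
```

Comparing the two printed documents shows whether the flag only filters an existing tag or also creates one.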

Hi, John Krukoff wrote:
This hint you gave makes me wonder if this functionality wasn't lost when I switched from the original XSLT based generation to the one based on tostring(method="html"). AFAIR, that was long before 2.0 was released...

I assume that HTML generation using xsl:output generates the <meta> tag and the normal HTML serialisation does not do it. There are some new features in libxml2 2.7.2 that would allow moving the serialisation to the xmlSave*() API, but that's not backportable to older versions (lxml currently runs with libxml2 2.6.21).

IMHO, your current best bet is to always serialise using XSLT if you want to have a <meta> tag. When pre-parsed, the obvious stylesheet that does that shouldn't really be slower than a call to tostring().

Stefan
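One way the suggestion above might look in practice, as a sketch rather than the exact stylesheet Stefan had in mind: an identity transform with xsl:output method="html", compiled once and reused. The stylesheet text, the UTF-8 encoding and the names META_SERIALISER and serialise_with_meta are assumptions. The XSLT 1.0 html output method calls for a META element to be added when a HEAD element exists, so the input here includes one:

```python
from lxml import etree
import lxml.html

# Assumed identity stylesheet; only xsl:output matters for the meta tag.
IDENTITY_HTML = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""

# Parse and compile the stylesheet once ("pre-parsed") so each call is cheap.
META_SERIALISER = etree.XSLT(etree.XML(IDENTITY_HTML))


def serialise_with_meta(root):
    """Serialise an HTML tree through the XSLT html output method."""
    result = META_SERIALISER(root)
    # str() serialises the result according to xsl:output.
    return str(result)


if __name__ == "__main__":
    doc = lxml.html.document_fromstring(
        b"<html><head><title>Test</title></head>"
        b"<body><p>Hello</p></body></html>")
    print(serialise_with_meta(doc))
```

Keeping the compiled transform at module level avoids re-parsing the stylesheet on every serialisation, which is the point of the "pre-parsed" remark.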

participants (2)
- John Krukoff
- Stefan Behnel