Hi,
I noticed that lxml.html.parse() currently takes a string as argument that is
interpreted as HTML code. However, lxml.etree.parse() also takes a string,
which is interpreted as a filename. I think we should not diverge from the
other lxml APIs here. At least, it surprised me when I called
etree.tostring(html.parse("doc/html/api.html")) and got
"<p>doc/html/api.html</p>" as a result.
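
For reference, a minimal sketch of the convention I mean. It uses the standard library's xml.etree.ElementTree as a stand-in, since lxml.etree follows the same API for these two calls. The file name and markup are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

markup = "<p>hello</p>"

# fromstring() interprets its string argument as markup.
from_text = ET.fromstring(markup)

# parse() interprets a string argument as a filename and returns a tree.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "api.html")  # illustrative file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    from_file = ET.parse(path).getroot()

print(from_text.tag, from_file.tag)
```
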
I really want lxml to stay an integrated set of tools, things that work
together smoothly. And a common base API is very important here. I don't mind
Elements having different APIs in different packages (that's the main idea
after all), but I would like to keep functions and methods with similar names
semantically close wherever possible.
It's hard to come up with a good name for the functions, though, as the
function that comes closest is called HTML(). Not really a perfect name for a
function (but ok for a factory). What about "parse_chunk()" or
"parse_string()"? Then the other functions would become something like
"parse_string_element()" and "parse_string_elements()". I know, that's long,
but I wouldn't mind that, since the meaning is clear and most of the time,
you'd use "parse_string()" anyway.
We could then add a "parse()" function that basically does what the current
"parse()" function does, but for files as input.
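
To make the proposal concrete, here is a rough sketch of how the renamed functions could behave. The names are only the suggestions from this mail, and nothing exists in lxml.html under them. The exact semantics (in particular, raising on anything but one element) are my assumption. The standard library's XML parser stands in for the HTML parser, so the inputs have to be well-formed.

```python
import io
import xml.etree.ElementTree as ET


def parse_string_elements(text):
    """Hypothetical: parse a fragment of sibling elements into a list."""
    # Wrap the fragment so the XML parser sees a single root element.
    wrapper = ET.fromstring("<wrapper>" + text + "</wrapper>")
    return list(wrapper)


def parse_string_element(text):
    """Hypothetical: parse a fragment that must hold exactly one element."""
    elements = parse_string_elements(text)
    if len(elements) != 1:
        raise ValueError("expected exactly one element, got %d" % len(elements))
    return elements[0]


def parse_string(text):
    """Hypothetical: interpret the string as a whole document."""
    return ET.fromstring(text)


def parse(source):
    """Hypothetical: take a filename or file-like object, as etree.parse() does."""
    return ET.parse(source)


root = parse_string("<html><body><p>some text</p></body></html>")
single = parse_string_element("<p>one</p>")
items = parse_string_elements("<li>a</li><li>b</li>")
tree = parse(io.StringIO("<html><body><p>from a file</p></body></html>"))
```

With this split, a string passed to parse() is always a filename, and markup always goes through one of the parse_string*() functions.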
Would that be ok for you?
Stefan