[Catalog-sig] HTML in long description

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 21 17:05:47 CEST 2009

> FWIW lxml.html is pretty convenient for removing any dangerous tag; it's
> a one-liner that will get rid of any <form>, <script>, <embed>, etc.

Hmm. Is there a library whose *explicit* purpose is to create "safe"
HTML? I would hesitate to implement it myself.
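What a dedicated "safe HTML" library does at its core is whitelisting: keep a small set of known-good tags and attributes and drop everything else, instead of hunting for specific dangerous tags. The sketch below illustrates that idea with the standard library's html.parser only. The tag and attribute lists (ALLOWED_TAGS, ALLOWED_ATTRS) and the http/https-only rule for links are assumptions chosen for illustration. This is not a vetted sanitizer and not a substitute for a maintained library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only.
ALLOWED_TAGS = {"p", "a", "em", "strong", "code", "pre", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose text content is dropped along with the tags themselves.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags (<form>, <embed>, ...) are dropped, text kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick=..., style=..., etc.
            if name == "href" and not value.strip().lower().startswith(
                    ("http://", "https://")):
                continue  # rejects javascript: and other schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # re-escape decoded text


def sanitize(fragment):
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)
```

For example, sanitize('<p onclick="x()">Hi <script>alert(1)</script>there</p>') keeps the paragraph and its text but drops the handler and the script. The design choice is that anything not explicitly allowed disappears, so forgetting a tag fails safe rather than open.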

> The reStructuredText format is an implicit rule from PyPI, and trying an
> rst2html process on the server side, no matter what long_description
> contains, seems like a bad practice to me.

I think it's not too bad. Since the long_description is either plain
text or ReST, the cost of misinterpretation is really low: ReST may
get mis-rendered as preformatted plain text, in which case it still
remains readable.
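The low-cost fallback described above can be sketched as "try ReST, otherwise show the text escaped inside <pre>". In this sketch, render_rest is a hypothetical hook: any callable that turns ReST into an HTML fragment and raises an exception when it cannot.

```python
import html
from typing import Callable


def render_long_description(text: str, render_rest: Callable[[str], str]) -> str:
    """Render text as ReST, falling back to escaped preformatted text.

    render_rest is a hypothetical hook: it should return an HTML fragment,
    or raise an exception if the text cannot be rendered as ReST.
    """
    try:
        return render_rest(text)
    except Exception:
        # Misinterpretation is cheap: plain text stays readable as-is.
        return "<pre>" + html.escape(text) + "</pre>"
```

With docutils installed, one possible render_rest is lambda s: docutils.core.publish_parts(s, writer_name="html")["html_body"]. Note, however, that by default docutils reports most problems inline in its output rather than raising. A server would therefore need to tighten its error settings for the fallback to trigger on merely malformed ReST.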

> I'd like to see the nature of long_description explicitly declared in
> the metadata. For example, we could have a "long_description_format"
> field that would be 'text', 'html' or 'restructuredtext'.

Sounds fairly complex to me. I think I could accept it, but if html
is removed from the list of allowed formats (which I think it should be),
then I don't think this overhead is really needed.
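To make the trade-off concrete, here is a minimal sketch of validating the proposed field with 'html' excluded. The field name "long_description_format" comes from the quoted proposal and is hypothetical. Defaulting to 'restructuredtext' when the field is absent is an assumption that matches the current "try ReST first" behaviour.

```python
from typing import Dict

# Hypothetical field from the proposal; 'html' deliberately left out.
ALLOWED_FORMATS = {"text", "restructuredtext"}


def pick_format(metadata: Dict[str, str]) -> str:
    # Assumption: a missing field means "try ReST", as PyPI does today.
    fmt = metadata.get("long_description_format", "restructuredtext")
    fmt = fmt.strip().lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported long_description_format: {fmt!r}")
    return fmt
```

Once 'html' is gone, the only remaining choices are the two formats the server can already tell apart by attempting ReST and falling back. That is why the extra field buys little.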

> Last, notice that there's a new command in distutils called "check",
> that can be used to check whether the long_description field content
> compiles well as reStructuredText. This client-side process is
> convenient for avoiding any error or warning on the PyPI page.

That could be done, either way, IMO. It might also be useful to have a
distutils command that generates a pypi-like page, so that people can
preview the rendered description.

