[Catalog-sig] HTML in long description

Tarek Ziadé ziade.tarek at gmail.com
Fri Aug 21 16:51:37 CEST 2009

On Fri, Aug 21, 2009 at 4:35 PM, Fred Drake<fdrake at gmail.com> wrote:
> On Fri, Aug 21, 2009 at 10:33 AM, "Martin v. Löwis"<martin at v.loewis.de> wrote:
>> Which way should PyPI go: escape all markup if ReST rendering fails?
>> Or else allow arbitrary HTML to be embedded? I'm worried that somebody
>> would create a cross-site attack out of that...
> Same here; the text in the <pre> should be properly escaped.

FWIW, lxml.html makes it easy to remove dangerous tags: it's a one-liner
that gets rid of any <form>, <script>, <embed>, etc.

But in any case, I find the current situation fuzzy:

reStructuredText is an implicit rule of PyPI, and running rst2html on the
server side, no matter what long_description contains, seems like bad
practice to me.

I'd like to see the nature of long_description explicitly declared in
the metadata.

For example, we could have a "long_description_format" field whose value
would be 'text', 'html' or 'restructuredtext'.

If present, PyPI could use this info to decide what to do with the
content (although this does not remove the need to clean it up on the
server side for security reasons, of course).
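A rough sketch of how a server could dispatch on such a field. Everything here is hypothetical: `long_description_format` is only a proposal, and `render_long_description`, `as_preformatted` and the `renderers` mapping are illustrative names. It also assumes, as a design choice, that a missing or unrecognised value is treated as plain text, and that any failure of the declared renderer falls back to escaping all markup inside a <pre>, as suggested earlier in the thread.

```python
import html
from typing import Callable, Dict, Optional

# A renderer turns long_description text into HTML. Any renderer plugged in
# here (e.g. for 'html' or 'restructuredtext') must still sanitize its output.
Renderer = Callable[[str], str]


def as_preformatted(text: str) -> str:
    """Escape every bit of markup and wrap the result in <pre>."""
    return "<pre>" + html.escape(text) + "</pre>"


def render_long_description(
    text: str,
    fmt: Optional[str],
    renderers: Dict[str, Renderer],
) -> str:
    """Pick a rendering strategy from the (proposed) long_description_format."""
    # 'text', a missing field, or an unknown value: treat as plain text.
    if fmt is None or fmt == "text" or fmt not in renderers:
        return as_preformatted(text)
    try:
        return renderers[fmt](text)
    except Exception:
        # The declared renderer failed (e.g. invalid ReST): escape everything
        # rather than passing raw markup through.
        return as_preformatted(text)


# Usage: a stand-in ReST renderer that always fails.
def failing_rst(text: str) -> str:
    raise ValueError("simulated ReST error")


print(render_long_description("<script>x</script>", "restructuredtext",
                              {"restructuredtext": failing_rst}))
# prints <pre>&lt;script&gt;x&lt;/script&gt;</pre>
```

Keeping the renderers pluggable means the escaping fallback does not depend on docutils or lxml being available on the server.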

Last, notice that there's a new command in distutils called "check" that
can be used to verify that the long_description field content compiles
cleanly as reStructuredText. Running it on the client side is a convenient
way to avoid errors or warnings on the PyPI page.

(It's only available if docutils is installed, of course.)
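The command is run as `python setup.py check --restructuredtext` (or `-r`), and adding `--strict` makes it exit with an error when a check fails. As a rough approximation of what the ReST check does, not distutils' actual code, one can ask docutils directly for warnings. This assumes docutils is installed and that messages at WARNING level (2) and above are the ones worth fixing; `rst_warnings` is a hypothetical helper name.

```python
import io

from docutils.core import publish_doctree  # requires docutils


def rst_warnings(text: str) -> str:
    """Parse text as reStructuredText and return any warnings or errors."""
    stream = io.StringIO()
    publish_doctree(
        text,
        settings_overrides={
            "warning_stream": stream,  # collect messages instead of printing to stderr
            "report_level": 2,  # report WARNING (2) and above
            "halt_level": 5,  # never raise, just collect
        },
    )
    return stream.getvalue()


# An underline shorter than its title is a typical problem.
print(rst_warnings("A Long Title\n====\n") or "OK")
```

An empty result means docutils had nothing to report at that level.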

>  -Fred
> --
> Fred L. Drake, Jr.    <fdrake at gmail.com>
> "Chaos is the score upon which reality is written." --Henry Miller
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

Tarek Ziadé | http://ziade.org
