Regular Expressions: Can't quite figure this problem out

Robert Dailey rcdailey at gmail.com
Tue Sep 25 21:14:34 EDT 2007


On 9/25/07, J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>
> Robert Dailey top posted:
> > Hmm, ElementTree.tostring() also adds a space between the last
> > character of the element name and the />. Not sure why it is doing
> > this.
> >
> > Something like <root/> will become <root /> after the tostring().
>
>
> The space was common practice in pseudo-XHTML code when people still
> had to routinely support browsers like Netscape 4, which had no clue
> about XML.  It lets a uniquely XML construct pass through HTML
> parsers: the space makes an XML-unaware parser treat the / as the
> next attribute.  Since that attribute has no known meaning, the
> standard practice is to ignore it, so the tag is parsed properly by
> both XHTML parsers and plain HTML parsers.  I guess the practice just
> caught on beyond the XHTML world.
>
> I don't know if there's a flag to get rid of it, but you can always
> dig into the code....
>
> Cheers,
> Cliff
>
>
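The behavior described above is easy to reproduce on a current Python. This is a minimal sketch that assumes Python 3.4 or later (the thread predates it) and uses only the standard `xml.etree.ElementTree.tostring` function. The standard-library documentation lists no option that removes just the space, but the `short_empty_elements=False` argument (3.4+) sidesteps the self-closed form entirely by writing an explicit start/end pair.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")

# Default serialization: an empty element becomes a self-closed tag,
# with a space written before the "/>".
default_xml = ET.tostring(root, encoding="unicode")
print(default_xml)  # <root />

# Python 3.4+ only: emit "<root></root>" instead of a self-closed tag.
expanded_xml = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(expanded_xml)  # <root></root>
```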
Right now I just run a trivial regular expression on the result of
tostring() to remove the spaces.
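The thread does not show the actual regular expression, so the following is only a hypothetical sketch of that post-processing step, assuming Python 3; the helper name `tostring_compact` is invented here for illustration. It serializes with `ET.tostring` and then uses `re.sub` to replace `" />"` with `"/>"`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: ElementTree writes exactly one space before "/>".
_SELF_CLOSE_SPACE = re.compile(r" />")


def tostring_compact(elem):
    """Serialize elem with ElementTree, then drop the space before "/>"."""
    xml = ET.tostring(elem, encoding="unicode")
    # ElementTree escapes ">" in text and attribute values as "&gt;", so a
    # literal " />" in its output comes from a self-closed empty element
    # (verbatim comments and processing instructions are the exception).
    return _SELF_CLOSE_SPACE.sub("/>", xml)


root = ET.Element("root")
ET.SubElement(root, "child", name="a")
ET.SubElement(root, "empty")
print(tostring_compact(root))  # <root><child name="a"/><empty/></root>
```

Because the substitution runs on ElementTree's own output rather than on arbitrary markup, a trivial pattern is enough here; it is not meant as a general-purpose XML rewriter.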


More information about the Python-list mailing list