[XML-SIG] Re: RSS and stuff

Wed, 02 Jun 1999 16:14:23 -0700


Lars,

Thanks for your response.  I have forwarded it to others here who are involved
with RSS.  Below are my responses.

> This is definitely a good idea. Sadly, though, many of the RSS files
> on the net are not even well-formed. The ones for WebMonkey and
> python.org spring to mind.
>

I assume you mean they are not well-formed because they embed entities?
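
If undeclared entities are indeed the cause, it is easy to confirm with any
non-validating parser.  The following is only a sketch using Python's standard
xml.etree.ElementTree module; the helper name well_formedness_error is made up
for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# The five predefined entities and numeric character references are fine.
print(well_formedness_error("<title>News &amp; views &#160;</title>"))
# An HTML entity such as &nbsp; is not declared in plain XML, so this
# document is rejected as not well-formed.
print(well_formedness_error("<title>News&nbsp;views</title>"))
```

A feed that uses HTML entities would need either a DTD that declares them or
numeric references in their place.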

> | However, in my spare time I've been working on a generic validator
> | that will read in a schema file (of my own devise, not a real XML
> | schema) that's written in XML, and then validate a document based on
> | that.
>
> Hmmm. Why not use a real XML schema? It should support everything I
> can imagine you would want anyway. Or is it too complex?

1) It's a spec.  A very complex spec.  I don't know of any software that
implements it.  I don't have time to write such software, given our development
schedules which are measured in days.  I just want something that is flexible
enough that we can change our format without having to write a bunch of new code.
When XML schemas are well supported, then we should be able to move to those quite
easily, provided they have a superset of our functionality.  Besides, if I tried,
I would probably end up with something that is close to XML schemas, but not
exact, so then we would have unexpected behavior, etc.  This way, it is obviously
not an XML schema, just a "Dan's validation rules" DTD.

2) I may have just missed it, but I didn't see any support for limiting length of
strings.

3) The time support is ISO 8601 only, which is itself a very complicated subject.
(Aside: does anyone know of a Python module to parse dates according to 8601?)  I
would like to see support for Unix/C-style integer timestamps (seconds since 1970
UTC, as returned by time() ).  We tend to use these a lot.  Also for the Unix/C-style
date string as returned by `date`, e.g.: Sun May 30 19:24:15 PDT 1999.  I already
forwarded this request to the XML schema folks.
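
To make the three date forms mentioned above concrete, here is a rough sketch in
Python using only the standard datetime module.  The function names and the small
TZ_OFFSETS table are assumptions made up for this example; zone abbreviations
such as PDT are ambiguous in general, so a real validator would need a fuller
table.  The month-name parsing also assumes an English (C) locale.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a tiny hand-written table of zone abbreviations (hours from UTC).
TZ_OFFSETS = {"UTC": 0, "GMT": 0, "PST": -8, "PDT": -7, "EST": -5, "EDT": -4}


def parse_unix_timestamp(text):
    """Parse an integer count of seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(int(text), tz=timezone.utc)


def parse_date_output(text):
    """Parse a string in the style printed by `date`."""
    parts = text.split()
    if len(parts) != 6:
        raise ValueError("expected: weekday month day time zone year")
    _weekday, month, day, clock, zone, year = parts
    if zone not in TZ_OFFSETS:
        raise ValueError("unknown zone abbreviation: " + zone)
    naive = datetime.strptime(f"{month} {day} {clock} {year}",
                              "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=TZ_OFFSETS[zone])))


def parse_iso_basic_date(text):
    """Parse only the ISO 8601 basic calendar date form, e.g. 19990602."""
    return datetime.strptime(text, "%Y%m%d").date()


print(parse_unix_timestamp("928365263"))
print(parse_date_output("Sun May 30 19:24:15 PDT 1999"))
print(parse_iso_basic_date("19990602"))
```

The integer form is the simplest of the three to validate, since it needs no
zone table and no locale.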

> Seriously, these things aren't as important as many people think. And
> it's also worth remembering that XML comes from a document background
> where such things are not all that relevant. (Imagine trying to do
> this for HTML. Actually enforcing correct use of DFN, H1-H6, ABBR,
> ACRONYM, VAR, ADDRESS and all the other elements would require a
> serious number of years of AI development in Prolog or Common Lisp.)
>

They are important to us. We need to store this stuff in a database.  We need to
make sure some joker hasn't given us a string that is 20 megabytes long, and
further that we won't be putting HTML into our generated page that breaks the
entire page.  We also need to be able to tell end-users (webmasters) whether the
data they have given us will actually be displayed correctly or not.  I think that
as XML becomes used for data transfer, as opposed to document transfer, people
will be more and more concerned about this.  E-commerce especially is going to
require a very specific set of enforceable rules for validity.  For some reason,
people tend to become very upset when money is involved.  ;-)
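
The two checks described above (a length limit, and keeping stray HTML from
breaking the generated page) can be sketched in a few lines of Python.  The
limits in MAX_LENGTHS and both function names are hypothetical, chosen only for
illustration; they are not the rules My Netscape actually applies.

```python
import html

# Hypothetical per-element limits, in characters.
MAX_LENGTHS = {"title": 100, "description": 500, "link": 500}


def field_problems(name, value, limits=MAX_LENGTHS):
    """Return a list of human-readable problems for one field."""
    problems = []
    limit = limits.get(name)
    if limit is not None and len(value) > limit:
        problems.append(f"{name}: {len(value)} characters, limit is {limit}")
    return problems


def safe_for_page(value):
    """Escape markup so the text cannot alter the surrounding page."""
    return html.escape(value, quote=True)


print(field_problems("title", "x" * 101))
print(safe_for_page("<b>New foo!</b>"))
```

Returning messages rather than a bare true/false makes it possible to tell a
webmaster exactly which field was rejected and why.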

>
> | What would you like to see / not see in the format?  It really is
> | just supposed to be a summary.
>
> The first thing I'd like to see is a date element for items. Many RSS
> providers currently use something like:
>
>   <item>
>     <title>(19990602) New foo!</title>
>     <link>...
>
> and it would be useful to formalize that as:
>
>   <item>
>     <date>19990602</date>
>     <title>...
>

Agreed.  I had this in the original spec, but it was removed for public release,
since we were not actually going to use the value.  What do you think of
<timestamp>  (seconds since 1970) </timestamp>  instead?  Again, I'm not fond of
parsing ISO 8601.

> The second thing is descriptions for items. I'm thinking of providing
> an RSS feed for my home page, and when I do I know I will want to be
> able to have entries like:
>
>   <item>
>     <date>19990602</date>
>     <title>RSS feed available!</title>
>     <description>I now provide an RSS feed which lists all updates to
>     my home page. This will hopefully make it easier for people
>

This should be possible.  Again, we didn't support stuff like this originally,
because we will not actually use the data in the "description" tag anywhere on My
Netscape, and because our (old) validator code had to know about description rules
for each location where it is used.  As others are now using the format, I can see
where it would make sense, and it should be easy to add this as an optional element
if I can convince people to use my new validation code.

> A third thing is a place to put the email address of the maintainer so
> that I know where to complain when a document isn't well-formed.
>

hmm.  I assume you think this should be inside the <channel> tag?  This is where
<dc:creator> would be nice...

>   - "RSS 0.9 supports the full ASCII character set, as well as all
>   legal decimal and HTML entities. RSS 0.9 does not support other
>   types of character data, such as UTF-8. For a list of legal HTML and
>   decimal entities, refer to Special Symbols and Entities on DevEdge,
>   Netscape's information resource for developers."
>

We are updating this to support UTF-8 soon, and possibly other encodings.  I
promise to post a DTD soon.  ;-)

>   - Also, what's the relationship with RDF? RSS uses the RDF root
>   element, but does not conform to the RDF syntax or actually use
>   anything meaningful from RDF.

This boils down to internal politics.  If you click on the "Future Directions"
link in the quickstart (http://my.netscape.com/publish/help/futures.html), I have
an example of the original RSS format I came up with, which does make meaningful
use of RDF (channels have IDs, all nodes connect, Dublin Core is used, etc.)
However, apparently this is "overly complicated".  There are other technical
reasons I can't really go into.  Anyway, for now, RSS is basically an XML format,
and it may eventually have an RDF superset.

[regarding posted RSS DTD]

Thanks.  I'll take a look at this, run it through a validating parser, etc.  Do
you mind if we post it, or a slightly modified version, as the "official" DTD?

> <!ELEMENT channel (title, description, link)>

This implies ordering, correct?  I.e., title, then description, then link?  A
problem I had with DTDs is that I couldn't figure out how to say that an element
is required, and that ordering is unimportant.  Therefore, if I posted this DTD
now, it would mean that a whole bunch of existing channels are invalid.  The other
option is to use (title | description | link), but this means that each one is
optional (only one of the three need appear), which is even less correct.
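
For what it is worth, XML DTDs have no "all of these, in any order" connector, so
the only way to express it in a content model is to list every permutation.  The
sketch below generates that model and also shows the same rule checked in code
instead.  It is an illustration only: the function names are made up, and it
assumes the elements carry no namespace.

```python
from itertools import permutations
import xml.etree.ElementTree as ET

REQUIRED = ("title", "description", "link")


def any_order_content_model(names):
    """Build a DTD content model that accepts the names in any order."""
    groups = ["(" + ", ".join(p) + ")" for p in permutations(names)]
    return "(" + " | ".join(groups) + ")"


def missing_or_repeated(channel, required=REQUIRED):
    """List required children that do not appear exactly once."""
    problems = []
    for name in required:
        count = len(channel.findall(name))
        if count != 1:
            problems.append(f"<{name}> appears {count} times, expected 1")
    return problems


print("<!ELEMENT channel " + any_order_content_model(REQUIRED) + ">")

channel = ET.fromstring(
    "<channel><link>http://example.org/</link><title>Example</title></channel>"
)
print(missing_or_repeated(channel))
```

Three elements already need six alternatives in the DTD, and the count grows
factorially, which is one argument for doing this particular check outside the
DTD.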

> I've done exactly the same for XSA and have exactly the same problem
> as you. I provided a DTD and have special validating software that
> rides on top of a validator (xmlproc). If I were to do it again
> there's no question that I would do the same thing. So far there has
> been no confusion at all (although I've seen HTML users become
> confused by this).

What is this special validating software?  Is it generic, or does it know
specifically about your format?  If generic, what do you use as input to define
the validation rules?  My apologies if this is all explained in detail somewhere...
;-)

-dan

Dan Libby
Netscape Communications
danda@netscape.com