[XML-SIG] Re: RSS and stuff

Lars Marius Garshol larsga@ifi.uio.no
05 Jun 1999 12:50:00 +0200


* Lars Marius Garshol
|
| Sadly, though, many of the RSS files on the net are not even
| well-formed. The ones for WebMonkey and python.org spring to mind.

* Dan Libby
| 
| I assume you mean they are not well-formed because they embed
| entities?

Actually, no. The WebMonkey file is not well-formed because the XML
declaration does not begin the document (if they removed it all would
be well; I've emailed them, but to no avail) and the python.org one is
not well-formed because it has a <link>...</linK> pair.
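
As a rough illustration of both kinds of error, here is a minimal Python
sketch using the standard library's non-validating parser. The sample
documents are made-up stand-ins, not the actual WebMonkey or python.org
files, and the blank line before the declaration is only an assumption
about how a declaration might fail to begin a document.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None

# Hypothetical stand-ins for the two kinds of error described above.
late_declaration = '\n<?xml version="1.0"?>\n<rss><channel/></rss>'
mismatched_case = ('<rss><channel><link>http://example.org/</linK>'
                   '</channel></rss>')
fixed = ('<?xml version="1.0"?>\n<rss><channel>'
         '<link>http://example.org/</link></channel></rss>')

for name, doc in [("late declaration", late_declaration),
                  ("mismatched case", mismatched_case),
                  ("fixed", fixed)]:
    print(name, "->", well_formedness_error(doc) or "well-formed")
```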

* Lars Marius Garshol
|
| Hmmm. Why not use a real XML schema? It should support everything I
| can imagine you would want anyway. Or is it too complex?
 
* Dan Libby
|
| 1) It's a spec.  A very complex spec.  I don't know of any software
| that implements it.  I don't have time to write such software, given
| our development schedules which are measured in days.  I just want
| something that is flexible enough that we can change our format
| without having to write a bunch of new code.  When XML schemas are
| well supported, then we should be able to move to those quite
| easily, provided they have a superset of our functionality.
| Besides, if I tried, I would probably end up with something that is
| close to XML schemas, but not exact, so then we have unexpected
| behavior, etc.  This way, it is obviously not an xml schema, just
| "Dan's validation rules" DTD.
 
| 2) I may have just missed it, but I didn't see any support for
| limiting length of strings.

I don't think there is any.
 
| 3) The time support is ISO 8601 only, which is itself a very
| complicated subject.

Walter Underwood and AMK have already dealt with this, so I'll just
skip it here.

* Lars Marius Garshol
|
| [on the topic of XML and data typing]
|
| Seriously, these things aren't as important as many people think.
| And it's also worth remembering that XML comes from a document
| background where such things are not all that relevant.
 
* Dan Libby
|
| They are important to us. We need to store this stuff in a database.
| We need to make sure some joker hasn't given us a string that is 20
| megabytes long,

Sure, but in the original SGML context this wasn't a problem in the
same way.

| I think that as XML becomes used for data transfer, as opposed to
| document transfer, people will be more and more concerned about
| this.  E-commerce especially is going to require a very specific set
| of enforceable rules for validity.  

Definitely, and for this very reason I've been advocating that the W3C
schema language should be extensible, so that the e-commerce and EDI
communities (and other communities with special needs) can build on
what's already defined.

| For some reason, people tend to become very upset when money is
| involved.  ;-)

Strange. Can't think why that would be. :)

| [dates in RSS]
| 
| Agreed.  I had this in the original spec, but it was removed for public
| release, since we were not actually going to use the value.  What do
| you think of <timestamp> (seconds since 1970) </timestamp> instead?

I don't like it. Most people will be authoring RSS by hand or generating
it automatically from some hand-written source. When writing RSS by
hand, seconds since 1970 is out of the question, and when generating it
with XSL I don't think this transformation is possible.

Also, seconds since 1970 is not human-readable or intuitive in any way.

| Again, I'm not fond of parsing ISO 8601.

A simple requirement like YYYYMMDD would be sufficient, I think. (Even
not requiring anything at all should be acceptable, but in this case
YYYYMMDD might be the best choice.)
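
To show how little code a fixed YYYYMMDD field needs, here is a minimal
Python sketch. The function names are hypothetical, and treating the date
as midnight UTC when converting to seconds since 1970 is an assumption,
since neither format discussed here says anything about time zones.

```python
import calendar
import re
from datetime import datetime

def parse_yyyymmdd(text):
    """Return a datetime for a strict YYYYMMDD string, or raise ValueError."""
    # Check the shape first: strptime alone tolerates one-digit fields.
    if not re.fullmatch(r"[0-9]{8}", text):
        raise ValueError(f"expected exactly 8 digits, got {text!r}")
    return datetime.strptime(text, "%Y%m%d")

def to_epoch_seconds(text):
    """Seconds since 1970-01-01 for midnight UTC (assumed) on that date."""
    return calendar.timegm(parse_yyyymmdd(text).timetuple())

print(parse_yyyymmdd("19990605").date())
print(to_epoch_seconds("19990605"))
```

A processor that wants seconds since 1970 can derive them from the
readable form, while the reverse burden would fall on people writing the
files by hand.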
 
| [item descriptions in RSS]
| 
| This should be possible.  Again, we didn't support stuff like this
| originally, because we will not actually use the data in the
| "description" tag anywhere on My Netscape, and because our (old)
| validator code had to know about description rules for each location
| it is used. As others are now using the format, I can see where it
| would make sense, and it should be easy to add this as an optional
| element if I can convince people to use my new validation code.

Good! I'm crossing my fingers here. :)
 
* Lars Marius Garshol
|
| A third thing is a place to put the email address of the maintainer so
| that I know where to complain when a document isn't well-formed.
 
* Dan Libby
|
| hmm.  I assume you think this should be inside the <channel> tag?

Yes.

| This is where <dc:creator> would be nice...

Ouch, no. <contact-email>, perhaps. Dublin Core doesn't mandate the
syntax of DC element contents, but using the email address here
doesn't feel very right.

Also: one thing I detest about this use of namespaces is that it gives
you no choice in naming (except in the prefix, which I don't think
should be abused). Something like:

  <!ATTLIST contact-email dublin-core CDATA #FIXED "creator">

would be much better.
 
* Lars Marius Garshol
|
| - "RSS 0.9 supports the full ASCII character set, as well as all
| legal decimal and HTML entities. RSS 0.9 does not support other
| types of character data, such as UTF-8. For a list of legal HTML and
| decimal entities, refer to Special Symbols and Entities on DevEdge,
| Netscape's information resource for developers."
 
* Dan Libby
|
| We are updating this to support UTF-8 soon, and possibly other
| encodings.  

Hmmm. Which parser(s) are you using?

| I promise to post a DTD soon.  ;-)

Good. :)

| [RSS and RDF] 
| 
| If you click on the "Future Directions" link in the quickstart
| (http://my.netscape.com/publish/help/futures.html), I have an
| example of the original RSS format I came up with, which does make
| meaningful use of RDF (channels have IDs, all nodes connect, dublin
| core is used, etc.)

Hmmm. Maybe there's something about RDF I've missed, but this doesn't
appear to be correct RDF either. Shouldn't the RDF document be just a
sequence of RDF statements, with custom elements inside the statements?

| However, apparently this is "overly complicated". 

I think that's correct. Do you think this proposal would have caught
on the way RSS 0.9 has? (Sometimes I think we should all re-read
worse-is-better every morning. :)

| [regarding posted RSS DTD]
| 
| Thanks.  I'll take a look at this, run it through a validating
| parser, etc.  Do you mind if we post it, or a slightly modified
| version, as the "official" DTD?

Not at all. Does this mean that I captured your view of RSS correctly?
 
* Lars Marius Garshol
|
| <!ELEMENT channel (title, description, link)>
 
* Dan Libby
|
| This implies ordering, correct?  ie, title, then description, then
| link?  

Yes.

| A problem I had with DTDs is that I couldn't figure out how to say
| that an element is required, and that ordering is unimportant.

In XML there isn't any way. Schemas currently allow this, as do SGML DTDs.
In an XML DTD you can do it by explicitly allowing choices between all the
possible different sequences, but for n elements the number of sequences
equals n factorial.
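
A minimal Python sketch of that enumeration follows. The function names
are hypothetical. One assumption worth flagging: a flat list of all
sequences, such as (a, b) | (a, c), is not a deterministic content model,
which XML asks for, so the sketch groups alternatives by their first
element. The number of sequences covered is still n factorial.

```python
from itertools import permutations
from math import factorial

def any_order_model(names):
    """Content model accepting each name exactly once, in any order."""
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        # Pick the first element, then allow any order of the remainder.
        branches.append(f"({first}, {any_order_model(rest)})")
    return "(" + " | ".join(branches) + ")"

def element_decl(name, children):
    """Build an <!ELEMENT ...> declaration for the given child names."""
    model = any_order_model(children)
    if not model.startswith("("):
        model = f"({model})"
    return f"<!ELEMENT {name} {model}>"

names = ["title", "description", "link"]
print(element_decl("channel", names))
print(len(list(permutations(names))), "sequences; n! =", factorial(len(names)))
```

With three children this is still manageable, but ten children would
already mean 3628800 sequences.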

| Therefore, if I posted this DTD now, it would mean that a whole
| bunch of existing channels are invalid.  

Ouch. Not good.

However, why did you allow any ordering? If the order doesn't matter
it may as well be fixed, especially as this causes much less pain in
specifying a DTD. I don't see the harm anywhere either.

| The other option is to use (title | description | link), but this
| means that they are optional, which is even less correct.

I agree, this is an ugly problem, but it's mainly caused by being
insufficiently restrictive to begin with. 

| [XSA custom validator]
| 
| What is this special validating software?  Is it generic, or does it
| know specifically about your format? If generic, what do you use as
| input to define the validation rules? 

I use a DTD as a declarative means of specifying the hard bits
(allowed elements and nesting), and then Python code to deal with
element content typing. (This is not generic at the moment. After
reading the XML schemas draft I'm working on an implementation of the
data types part which would be completely generic and not even depend
on schemas.) Since the DTD handles everything except element content
this works well and is really easy.
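
The content-typing half of such a split might look roughly like the
Python sketch below. This is an illustration under stated assumptions and
not the XSA validator itself: the rule table, the 100-character limit and
the <date> element are all hypothetical, and the structure is assumed to
have been checked against the DTD already by a validating parser, since
the standard library parser used here does not validate.

```python
import re
import xml.etree.ElementTree as ET

MAX_TITLE = 100  # assumed limit, not taken from any RSS spec

# Hypothetical rules: element name -> predicate on its text content.
RULES = {
    "title": lambda s: len(s) <= MAX_TITLE,
    "link": lambda s: s.startswith(("http://", "ftp://")),
    "date": lambda s: re.fullmatch(r"[0-9]{8}", s) is not None,
}

def content_errors(xml_text):
    """Return (element name, content) pairs whose text breaks a rule."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        check = RULES.get(elem.tag)
        text = (elem.text or "").strip()
        if check is not None and not check(text):
            errors.append((elem.tag, text))
    return errors

good = ("<channel><title>News</title><description>Daily</description>"
        "<link>http://example.org/</link></channel>")
bad = ("<channel><title>" + "x" * 500 + "</title>"
       "<description>Daily</description><link>example</link></channel>")

print(content_errors(good))
print([name for name, _ in content_errors(bad)])
```

Elements with no entry in the table, such as <description> here, are left
unchecked, which keeps the rules independent of where an element is used.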

Also, the DTD works well as documentation and people can also use it
to guide XML-aware editors and so on.

| My apologies if this is all explained in detail somewhere...  ;-)

It's not. :)

--Lars M.