Managing Google Groups headaches

rusi rustompmody at
Fri Dec 6 14:03:57 CET 2013

On Friday, December 6, 2013 1:06:30 PM UTC+5:30, Roy Smith wrote:
>  Rusi  wrote:

> > On Thursday, December 5, 2013 6:28:54 AM UTC+5:30, Roy Smith wrote:

> > > The real problem with web forums is they conflate transport and 
> > > presentation into a single opaque blob, and are pretty much universally 
> > > designed to be a closed system.  Mail and usenet were both engineered to 
> > > make a sharp division between transport and presentation, which meant it 
> > > was possible to evolve each at their own pace.
> > > Mostly that meant people could go off and develop new client 
> > > applications which interoperated with the existing system.  But, it also 
> > > meant that transport layers could be switched out (as when NNTP 
> > > gradually, but inexorably, replaced UUCP as the primary usenet transport 
> > > layer).
> > There is a deep assumption hovering round-about the above -- what I
> > will call the 'Unix assumption(s)'.

> It has nothing to do with Unix.  The separation of transport from 
> presentation is just as valid on Windows, Mac, etc.

> > But before that, just a check on
> > terminology. By 'presentation' you mean what people normally call
> > 'mail-clients': thunderbird, mutt etc. And by 'transport' you mean
> > sendmail, exim, qmail etc etc -- what normally are called
> > 'mail-servers.'  Right??

> Yes.

> > Assuming this is the intended meaning of the terminology (yeah its
> > clearer terminology than the usual and yeah Im also a 'Unix-guy'),
> > here's the 'Unix-assumption':
> >   - human communication�
> > (is not very different from)
> >   - machine communication�
> > (can be done by)
> >   - text�
> > (for which)
> >   - ASCII is fine�
> > (which is just)
> >   - bytes�
> > (inside/between byte-memory-organized)
> >   - von Neumann computers
> > To the extent that these assumptions are invalid, the 'opaque-blob'
> > may well be preferable.

> I think you're off on the wrong track here.  This has nothing to do with 
> plain text (ascii or otherwise).  It has to do with divorcing how you 
> store and transport messages (be they plain text, HTML, or whatever) 
> from how a user interacts with them.

Evidently (and completely inadvertently) this exchange has just
illustrated one of the inadmissible assumptions:

"unicode as a medium is universal in the same way that ASCII used to be"

I wrote a number of ellipsis characters, i.e. codepoint U+2026, as in:

  - human communication…
(is not very different from)
  - machine communication… 

Somewhere between my sending and your quoting, those ellipses became
the replacement character U+FFFD:

> >   - human communication�
> > (is not very different from)
> >   - machine communication�

Leaving aside whose fault this is (very likely buggy google groups),
this mojibaking cannot happen if the assumption "All text is ASCII"
were to uniformly hold.
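
One plausible path for this kind of damage is a charset mismatch somewhere
along the way. This is only a sketch under an assumption: the choice of
Windows-1252 as the mislabelled encoding is a guess for illustration, not a
diagnosis of what actually happened to the message above.

```python
# Sketch of one possible mojibake path; cp1252 is an assumed culprit,
# chosen only because it maps the ellipsis to a single byte.
ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# In Windows-1252 the ellipsis is the single byte 0x85.
wire_bytes = ELLIPSIS.encode("cp1252")

# 0x85 on its own is not valid UTF-8 (it is a bare continuation byte),
# so a decoder asked to "replace" errors emits U+FFFD instead.
garbled = wire_bytes.decode("utf-8", errors="replace")

# Pure ASCII text goes through the same mismatch unchanged.
ascii_text = "human communication..."
survived = ascii_text.encode("cp1252").decode("utf-8", errors="replace")

print(repr(garbled), survived == ascii_text)
```

The ASCII string survives because cp1252 and UTF-8 agree on every byte
below 128; the disagreement only starts above that range.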

Of course with unicode also this can be made to not happen, but that
is fragile and error-prone.  And that is because ASCII (not extended)
is ONE thing in a way that unicode is hopelessly a motley inconsistent

With unicode there are in-memory formats, transportation formats e.g.
UTF-8, strange beasties like FSR (which then hopelessly and
inveterately tickle our resident trolls!), multi-layer encodings (in
html), BOMs and unnecessary/inconsistent BOMs (in microsoft-notepad).
With ASCII, ASCII is ASCII; i.e. "ABC" is 65,66,67 whether it's in-core,
in-file, in-pipe or whatever.  Ok, there are a few wrinkles to this,
e.g. the null-terminator in C-strings. I think this is the exception to
the rule that in classic Unix, ASCII is completely inter-operable and
therefore a universal data-structure for inter-process or inter-machine
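
The contrast can be checked in Python with nothing but the standard codecs.
This is a small sketch; the sample strings are chosen for illustration only.

```python
import codecs

# ASCII text has one byte representation, whichever of these codecs carries it.
abc = "ABC"
assert list(abc.encode("ascii")) == [65, 66, 67]
assert abc.encode("ascii") == abc.encode("utf-8") == abc.encode("latin-1")

# A single non-ASCII character has several serialized forms.
ellipsis = "\u2026"
forms = {
    "utf-8": ellipsis.encode("utf-8"),          # three bytes
    "utf-8-sig": ellipsis.encode("utf-8-sig"),  # same bytes, preceded by a BOM
    "utf-16-le": ellipsis.encode("utf-16-le"),  # two bytes, low byte first
    "utf-16-be": ellipsis.encode("utf-16-be"),  # two bytes, high byte first
}
for name, data in forms.items():
    print(name, data.hex())

# The BOM variant differs from plain UTF-8 only by the leading marker.
assert forms["utf-8-sig"] == codecs.BOM_UTF8 + forms["utf-8"]
```

A reader of the bytes has to be told, or has to guess, which of these forms
it is looking at; with "ABC" there is nothing to guess.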

It is this universal data structure that makes classic unix pipes and
filters possible and easy (of which your separation of presentation
and transportation is just one case).

Give it up and the composability goes with it.
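
As a rough illustration of that composability, here is a sketch in Python
rather than shell. The names grep, upper and pipeline are made up for this
example; the point is only that the stages can be combined freely because
each one consumes and produces the same thing, plain lines of text.

```python
from functools import reduce

def grep(word):
    """Return a filter that keeps only the lines containing word."""
    return lambda lines: (ln for ln in lines if word in ln)

def upper(lines):
    """Filter that upper-cases every line."""
    return (ln.upper() for ln in lines)

def pipeline(*filters):
    """Chain filters left to right, like cmd1 | cmd2 | cmd3."""
    return lambda lines: reduce(lambda acc, f: f(acc), filters, lines)

lines = ["transport: nntp", "presentation: mutt", "transport: uucp"]

# Roughly: grep transport | tr a-z A-Z
result = list(pipeline(grep("transport"), upper)(lines))

# The stages can be reordered because they share one interchange format.
reordered = list(pipeline(upper, grep("TRANSPORT"))(lines))

print(result)
```

Neither filter knows anything about the other; they only agree on what
flows between them.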

Go up from the ASCII -> Unicode level to the plain-text -> hypertext
(aka html) level and these composability problems hit with redoubled

> Take something like Wikipedia (by which, I really mean, MediaWiki, which 
> is the underlying software package).  Most people think of Wikipedia as 
> a web site.  But, there's another layer below that which lets you get 
> access to the contents of articles, navigate all the rich connections 
> like category trees, and all sorts of metadata like edit histories.  
> Which means, if I wanted to (and many examples of this exist), I can 
> write my own client which presents the same information in different 
> ways.

Not sure what's your point.
Html is a universal data-structuring format -- ok for presentation, bad for
SQL databases (assuming that's the mediawiki backend) is another -- ok for 
data-structuring bad for presentation.

Mediawiki mediates between the two formats.

Beyond that I lost you... what are you trying to say??
