[Python-3000] Draft pre-PEP: function annotations

Fri Aug 11 15:10:31 CEST 2006

Collin Winter wrote:
> The idea is that each developer can pick the notation/semantics that's
> most natural to them. I'll go even further: say one library offers a
> semantics you find handy for task A, while another library's ideas
> about type annotations are best suited for task B. Without a single
> standard, you're free to mix and match these libraries to give you a
> combination that allows you to best express the ideas you're going
> for.

Let me tell you a story.

Once upon a time, there was a little standard called Midi (Musical
Instrument Digital Interface). The Midi standard was small and
lightweight, containing fewer than a dozen commands of 2-3 bytes each.
However, its designers realized that they needed a way to allow hardware vendors
to add their own custom message types, so they created a special message
type called "System Exclusive Message" or SysEx for short. The idea is
that you would send a 3-byte manufacturer ID, and then any subsequent
bytes would be considered to be in a vendor-specific format. The MMA
(Midi Manufacturers Association) did not provide any guidelines or
suggestions as to what the format of those bytes should be - it would be
completely up to the vendors to decide what the format of their system
exclusive message would be.

Since the Midi standard did not define a way to save and load the
instrument's memory, vendors typically would use the SysEx message to
allow a "bulk dump" of patch information - essentially it was a way to
access the instrument's internal state of sounds, programs, sequences,
and so on.

This would have worked fine, except for the fact that the vendors and
the MMA were not the only stakeholders. Just about this time (mid-80s)
there began to rise a new type of music company: companies like Mark of
the Unicorn, Steinberg Audio and Blue Ribbon Soundworks that created
professional music software for personal computers. Some companies made
sequencer programs that would allow you to enter musical scores on the
computer screen and play them back through your Midi instrument. Other
companies worked on a different type of product - a "Universal
Librarian", essentially a computer program which would store all of your
patches and sound programs for all your different instruments.

In 1987 I created a program for the Amiga called Music-X, which was a
combination of sequencer and Universal Librarian. In order to create the
librarian module, I needed to get information about all of the various
vendor-specific protocols

    Interrupt - as I was typing this last sentence, I knocked over my
    glass of ice water onto my Powerbook G4, completely toasting the
    motherboard and damaging the display. 24 hours, and $2700 later, I
    have completed my "forced upgrade" and can now continue this posting.
    Lesson to be learned: Internet rants and prescription pain meds do
    not mix! Be warned!

...which was not that difficult, since most of the vendors would include
an appendix in the back of the user's manual (generally written in very
bad English) describing the SysEx protocol for that device. I was also
able to get my hands on "The Big Midi Book of SysEx protocols", which
was essentially a Xerox copy of all of these various appendices, bound up
in book form and sold commercially.

At the time there were approximately 150 registered vendor IDs, but my 
idea was that I wouldn't have to implement every protocol - I figured, 
since all I wanted to do was load and store the resulting information, I 
didn't really need to *interpret* the data, I just needed to store it. 
Of course, I would need to interpret any transport-layer instructions 
(commands, block headers, checksums and so on), since a lot of 
instruments sent their "data dumps" as multiple SysEx messages which 
would need to be stored together.

But I figured, since I was only supporting two vendor-specific commands 
for each vendor - bulk dump and bulk load - how different can they all 
be? Sure, there were likely to be individual variations on how things 
were done, but I could solve that by creating a per-instrument 
"personality file" - essentially a set of parameters which would tweak 
the behavior of my transport module. So for example, one parameter would 
indicate the type of checksum algorithm to be used, the second would 
indicate the number of checksum bytes, and so on.
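
A minimal Python sketch of what such a personality file might have held,
purely for illustration. The field names, the `Personality` class and the
`split_block` helper are all invented here; the original program was not
written in Python and its actual file format is not described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    """Hypothetical per-instrument parameters for a generic transport module."""
    instrument: str          # label for the instrument (made up)
    checksum_algorithm: str  # name the transport module would look up, e.g. "xor"
    checksum_bytes: int      # how many checksum bytes trail each block


def split_block(p: Personality, block: bytes) -> tuple[bytes, bytes]:
    """Separate a received block into (payload, trailing checksum bytes)."""
    if p.checksum_bytes == 0:
        return block, b""
    return block[:-p.checksum_bytes], block[-p.checksum_bytes:]


synth_a = Personality("synth_a", "xor", 1)
payload, check = split_block(synth_a, bytes([1, 2, 3, 4]))
```

The design only works if every instrument differs in the *values* of these
fields and not in the *shape* of the protocol itself.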

For instruments that I couldn't borrow to test, I would rely on my users 
to fill in the holes (Ah, the heady optimism of the early days of the 
computer revolution!) and I would then add the user-contributed 
parameters to each update of the product.

I think by now you can start to see where this all goes wrong.

I started with a small set of 3 instruments, each from a different 
manufacturer. I analyzed their bulk data protocols, and came up with an 
abstract model that encompassed all of them as a superset. Then I added 
a 4th synth, only to discover that its bulk dump protocol was completely 
different from the previous three, and so my model had to be rebuilt
from scratch. No problem, I thought, 3 is too small a sample size 
anyway. Then I added a 5th synth, and the same thing happened. And a 
6th. And so on.

For example, every vendor I investigated used a *completely different* 
algorithm for computing checksums. Some used CRCs, some did simple 
addition, others used XOR - and some had odd ideas of *which* bytes 
should be checksummed. Some of the algorithms were really bad too.
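
To make the variation concrete, here is a small Python sketch of two of the
checksum styles mentioned (simple addition and XOR), plus the "which bytes"
problem. The function names, the 7-bit masking and the sample message are
assumptions made for the example, not any particular vendor's protocol.

```python
from functools import reduce


def sum_checksum(data: bytes) -> int:
    """Add all bytes and keep the low 7 bits so the result fits a Midi data byte."""
    return sum(data) & 0x7F


def xor_checksum(data: bytes) -> int:
    """XOR all bytes together, again masked to 7 bits."""
    return reduce(lambda a, b: a ^ b, data, 0) & 0x7F


def checksum_over(message: bytes, start: int, algorithm) -> int:
    """Apply `algorithm` only to the bytes from index `start` onward."""
    return algorithm(message[start:])


# Made-up message: two header bytes followed by three data bytes.
message = bytes([0x41, 0x10, 0x01, 0x02, 0x03])

whole = sum_checksum(message)                       # header included
data_only = checksum_over(message, 2, sum_checksum)  # header skipped
```

Same bytes, same algorithm, two different answers depending on where the
vendor decided the checksummed region starts.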

Different vendors also used different byte encodings. Because Midi is 
designed to work in an environment where cables can be unplugged at any 
moment, and because all other Midi messages (other than SysEx) were at 
most 3 bytes long, the Midi standard required that only 7 bits of each 
byte could be used to carry data; the 8th bit was reserved for a "start
of new message" flag.

Different vendors adapted to this challenge with surprising creativity. 
Some would simply slice the whole dump into units of 7 bits each, 
crossing the normal byte boundaries. Some would only send 4 bits per 
Midi Byte. Some did things like: For each 7 bytes of input data, send 
the bottom 7 bits of each input byte as the first 7 bytes, and then send 
an 8th byte containing the missing top-bits from the first seven. And 
then there were those clever manufacturers who simply decided to design 
their instruments so that no control parameter could have a magnitude 
greater than 127.
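
Two of these encodings are easy to sketch in Python. The nibble order and
the bit order inside the "top bits" byte are guesses made for the example;
real instruments could (and presumably did) pick either order.

```python
def encode_nibbles(data: bytes) -> bytes:
    """Send 4 bits per Midi byte: high nibble first, then low (order assumed)."""
    out = bytearray()
    for b in data:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return bytes(out)


def encode_7_plus_1(data: bytes) -> bytes:
    """For each group of up to 7 input bytes, send their low 7 bits,
    then one extra byte collecting the stripped top bits."""
    out = bytearray()
    for i in range(0, len(data), 7):
        group = data[i:i + 7]
        out.extend(b & 0x7F for b in group)
        top = 0
        for j, b in enumerate(group):
            top |= (b >> 7) << j  # top bit of byte j goes to bit j (assumed)
        out.append(top)
    return bytes(out)


def decode_7_plus_1(encoded: bytes) -> bytes:
    """Inverse of encode_7_plus_1."""
    out = bytearray()
    for i in range(0, len(encoded), 8):
        chunk = encoded[i:i + 8]
        low, top = chunk[:-1], chunk[-1]
        for j, b in enumerate(low):
            out.append(b | (((top >> j) & 1) << 7))
    return bytes(out)
```

Every output byte stays below 128, so the 8th bit remains free for the
"start of new message" flag. A librarian that guesses the wrong scheme gets
back garbage that still looks like perfectly legal Midi data.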

Another example of variation was in timing. Roland machines (of certain 
models) were notorious for rejecting messages if they were sent too fast 
- you had to wait at least 20 ms from the time you received a message to 
the time you sent the response. Others would "time out" if you waited 
too long.

There were half-duplex and full-duplex, stateless and stateful 
protocols, and I could go on. The point is that there was no way for me
to come up with a single algorithmic description of all of these
protocols - the only way to do it was in code, with a separate
implementation for each and every protocol. Nowadays, I'd simply embed 
Python into the program and make each personality file a Python script, 
but I didn't have that option back then. I toyed around with the idea of 
inventing a custom scripting language specifically for representing dump 
protocols, but the idea was infeasible at the time.

So, if you have had the patience to read through this long-winded 
anecdote and are wondering how in the hell this relates to Collin's
question, I can sum it up in a very short motto (and potential QOTW):

   "Never question the creative power of an infinite number of monkeys."

Or to put it another way: If you create a tool, and you assume that tool 
will only be used in certain specific ways, but you fail to enforce that 
limitation, then your assumption will be dead wrong. The idea that there 
will only be a few type annotation providers who will all nicely 
cooperate with one another is just as naive as I was in the SysEx debacle.

I'll have more focused things to say about this later, but I need to 
rest. (Had to get that out before all the rant energy dissipated.)

-- Talin