[Python-3000] Draft pre-PEP: function annotations
talin at acm.org
Fri Aug 11 15:10:31 CEST 2006
Collin Winter wrote:
> The idea is that each developer can pick the notation/semantics that's
> most natural to them. I'll go even further: say one library offers a
> semantics you find handy for task A, while another library's ideas
> about type annotations are best suited for task B. Without a single
> standard, you're free to mix and match these libraries to give you a
> combination that allows you to best express the ideas you're going
Let me tell you a story.
Once upon a time, there was a little standard called Midi (Musical
Instrument Digital Interface). The Midi standard was small and
lightweight, containing fewer than a dozen commands of 2-3 bytes each.
However, they realized that they needed a way to allow hardware vendors
to add their own custom message types, so they created a special message
type called "System Exclusive Message" or SysEx for short. The idea is
that you would send a 3-byte manufacturer ID, and then any subsequent
bytes would be considered to be in a vendor-specific format. The MMA
(Midi Manufacturers Association) did not provide any guidelines or
suggestions as to what the format of those bytes should be - it would be
completely up to the vendors to decide what the format of their system
exclusive message would be.
Since the Midi standard did not define a way to save and load the
instrument's memory, vendors typically would use the SysEx message to
allow a "bulk dump" of patch information - essentially it was a way to
access the instrument's internal state of sounds, programs, sequences,
and so on.
This would have worked fine, except for the fact that the vendors and
the MMA were not the only stakeholders. Just about this time (mid-80s)
there began to rise a new type of music company: companies like Mark of
the Unicorn, Steinberg Audio and Blue Ribbon Soundworks that created
professional music software for personal computers. Some companies made
sequencer programs that would allow you to enter musical scores on the
computer screen and play them back through your Midi instrument. Other
companies worked on a different type of product - a "Universal
Librarian", essentially a computer program which would store all of your
patches and sound programs for all your different instruments.
In 1987 I created a program for the Amiga called Music-X, which was a
combination of sequencer and Universal Librarian. In order to create the
librarian module, I needed to get information about all of the various
Interrupt - as I was typing this last sentence, I knocked over my
glass of ice water onto my Powerbook G4, completely toasting the
motherboard and damaging the display. 24 hours, and $2700 later, I
have completed my "forced upgrade" and can now continue this posting.
Lesson to be learned: Internet rants and prescription pain meds do
not mix! Be warned!
...which was not that difficult, since most of the vendors would include
an appendix in the back of the user's manual (generally written in very
bad English) describing the SysEx protocol for that device. I was also
able to get my hands on "The Big Midi Book of SysEx protocols", which
was essentially the xerox of all of these various appendices, bound up
in book form and sold commercially.
At the time there were approximately 150 registered vendor IDs, but my
idea was that I wouldn't have to implement every protocol - I figured,
since all I wanted to do was load and store the resulting information, I
didn't really need to *interpret* the data, I just needed to store it.
Of course, I would need to interpret any transport-layer instructions
(commands, block headers, checksums and so on), since a lot of
instruments sent their "data dumps" as multiple SysEx messages which
would need to be stored together.
But I figured, since I was only supporting two vendor-specific commands
for each vendor - bulk dump and bulk load - how different can they all
be? Sure, there were likely to be individual variations on how things
were done, but I could solve that by creating a per-instrument
"personality file" - essentially a set of parameters which would tweak
the behavior of my transport module. So for example, one parameter would
indicate the type of checksum algorithm to be used, the second would
indicate the number of checksum bytes, and so on.
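
To make the "personality file" idea concrete, here is a minimal Python sketch of
a parameter-driven transport setting. It is not the Music-X format (which is not
shown in this post); the names Personality, checksum_kind, checksum_bytes and
compute_checksum, and the two checksum choices, are assumptions made up purely
for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Personality:
    """Hypothetical per-instrument parameters for a generic transport module."""
    name: str
    checksum_kind: str   # "sum" or "xor" (illustrative choices only)
    checksum_bytes: int  # how many 7-bit bytes the checksum occupies


def compute_checksum(p: Personality, payload: bytes) -> bytes:
    """Compute the checksum selected by the personality over the payload."""
    if p.checksum_kind == "sum":
        value = sum(payload)
    elif p.checksum_kind == "xor":
        value = reduce(lambda a, b: a ^ b, payload, 0)
    else:
        raise ValueError(f"unknown checksum kind: {p.checksum_kind}")
    # Emit the value 7 bits at a time, least significant group first,
    # so that every output byte stays at or below 0x7F.
    out = []
    for _ in range(p.checksum_bytes):
        out.append(value & 0x7F)
        value >>= 7
    return bytes(out)


# Two made-up instruments that differ only in their parameters.
synth_a = Personality("synth A", "sum", 1)
synth_b = Personality("synth B", "xor", 2)
payload = bytes([0x10, 0x22, 0x7F])
```

A scheme like this only works as long as every instrument differs in ways the
parameter set already anticipates.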
For instruments that I couldn't borrow to test, I would rely on my users
to fill in the holes (Ah, the heady optimism of the early days of the
computer revolution!) and I would then add the user-contributed
parameters to each update of the product.
I think by now you can start to see where this all goes wrong.
I started with a small set of 3 instruments, each from a different
manufacturer. I analyzed their bulk data protocols, and came up with an
abstract model that encompassed all of them as a superset. Then I added
a 4th synth, only to discover that its bulk dump protocol was completely
different from the previous three, and so my model had to be rebuilt
from scratch. No problem, I thought, 3 is too small a sample size
anyway. Then I added a 5th synth, and the same thing happened. And a
6th. And so on.
For example, every vendor I investigated used a *completely different*
algorithm for computing checksums. Some used CRCs, some did simple
addition, others used XOR - and some had odd ideas of *which* bytes
should be checksummed. Some of the algorithms were really bad too.
Different vendors also used different byte encodings. Because Midi is
designed to work in an environment where cables can be unplugged at any
moment, and because all other Midi messages (other than SysEx) were at
most 3 bytes long, the Midi standard required that only 7 bits of each
byte could be used to carry data; the 8th bit was reserved for a "start
of new message" flag.
Different vendors adapted to this challenge with surprising creativity.
Some would simply slice the whole dump into units of 7 bits each,
crossing the normal byte boundaries. Some would only send 4 bits per
Midi Byte. Some did things like: For each 7 bytes of input data, send
the bottom 7 bits of each input byte as the first 7 bytes, and then send
an 8th byte containing the missing top-bits from the first seven. And
then there were those clever manufacturers who simply decided to design
their instruments so that no control parameter could have a magnitude
greater than 127.
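
Two of the encodings described above can be sketched in a few lines of Python.
This is only an illustration of the general technique, not any particular
vendor's protocol: the function names are invented here, and the bit order of
the "top bits" byte and the nibble order are assumptions, since real instruments
differed on exactly such details.

```python
def pack_7_in_8(data: bytes) -> bytes:
    """For each group of up to 7 input bytes, emit the low 7 bits of each
    byte, followed by one extra byte holding the stripped top bits."""
    out = bytearray()
    for i in range(0, len(data), 7):
        group = data[i:i + 7]
        top_bits = 0
        for n, b in enumerate(group):
            out.append(b & 0x7F)
            # Assumed order: bit n of the extra byte is the top bit of byte n.
            top_bits |= (b >> 7) << n
        out.append(top_bits)
    return bytes(out)


def unpack_7_in_8(packed: bytes) -> bytes:
    """Inverse of pack_7_in_8: restore the top bit of every data byte."""
    out = bytearray()
    for i in range(0, len(packed), 8):
        group = packed[i:i + 8]
        low, top_bits = group[:-1], group[-1]
        for n, b in enumerate(low):
            out.append(b | (((top_bits >> n) & 1) << 7))
    return bytes(out)


def pack_nibbles(data: bytes) -> bytes:
    """Send only 4 bits per Midi byte (assumed order: high nibble first)."""
    out = bytearray()
    for b in data:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return bytes(out)
```

Both encoders produce only bytes with the 8th bit clear, but they cost different
amounts of bandwidth: 8 bytes on the wire per 7 bytes of data in the first case,
2 per 1 in the second.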
Another example of variation was in timing. Roland machines (of certain
models) were notorious for rejecting messages if they were sent too fast
- you had to wait at least 20 ms from the time you received a message to
the time you sent the response. Others would "time out" if you waited
too long.
There were half-duplex and full-duplex, stateless and stateful
protocols, and I could go on. The point is that there was no way for me
to come up with some sort of algorithmic way to describe all of these
protocols - the only way to do it was in code, with a separate
implementation for each and every protocol. Nowadays, I'd simply embed
Python into the program and make each personality file a Python script,
but I didn't have that option back then. I toyed around with the idea of
inventing a custom scripting language specifically for representing dump
protocols, but the idea was infeasible at the time.
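
As a rough sketch of what "a separate implementation for each protocol" might
look like with Python available, one could keep a registry of decoder functions
keyed by the 3-byte manufacturer ID. Everything here is hypothetical: the IDs
are placeholders, the two decoders are toy stand-ins rather than real vendor
protocols, and the names HANDLERS, handler and decode_dump are invented for
this example.

```python
from typing import Callable, Dict

# Registry mapping a manufacturer ID to the function that decodes its dumps.
HANDLERS: Dict[bytes, Callable[[bytes], bytes]] = {}


def handler(manufacturer_id: bytes):
    """Decorator that registers a decoder for one (placeholder) vendor ID."""
    def register(func):
        HANDLERS[manufacturer_id] = func
        return func
    return register


@handler(b"\x00\x00\x01")
def decode_vendor_one(body: bytes) -> bytes:
    # Toy protocol: data arrives as nibble pairs, high nibble first.
    return bytes((body[i] << 4) | body[i + 1]
                 for i in range(0, len(body) - 1, 2))


@handler(b"\x00\x00\x02")
def decode_vendor_two(body: bytes) -> bytes:
    # Toy protocol: data is already 7-bit clean; drop a trailing checksum byte.
    return body[:-1]


def decode_dump(message: bytes) -> bytes:
    """Split off the 3-byte manufacturer ID and dispatch on it."""
    manufacturer_id, body = message[:3], message[3:]
    try:
        decode = HANDLERS[manufacturer_id]
    except KeyError:
        raise LookupError(
            f"no handler for manufacturer {manufacturer_id.hex()}") from None
    return decode(body)
```

The generic part shrinks to the dispatch; everything vendor-specific lives in
code rather than in parameters.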
So, if you have had the patience to read through this long-winded
anecdote and are wondering how in the hell this relates to Collin's
question, I can sum it up in a very short motto (and potential QOTW):
"Never question the creative power of an infinite number of monkeys."
Or to put it another way: If you create a tool, and you assume that tool
will only be used in certain specific ways, but you fail to enforce that
limitation, then your assumption will be dead wrong. The idea that there
will only be a few type annotation providers who will all nicely
cooperate with one another is just as naive as I was in the SysEx debacle.
I'll have more focused things to say about this later, but I need to
rest. (Had to get that out before all the rant energy dissipated.)