Number of languages known [was Re: Python is readable] - somewhat OT

Thu Apr 5 14:17:48 EDT 2012

Re-trolling.

On Wed, Apr 4, 2012 at 1:49 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>> As part of my troll-outreach effort, I will indulge here.  I was
>> specifically thinking about some earlier claims that programming
>> languages as they currently exist are somehow inherently superior to a
>> formalized natural language in expressive power.
>
> I would argue that they are, but only for the very limited purpose for
> which they are written. With the possible exception of Inform 7, most
> programming languages are useless at describing (say) human interactions.

I was thinking about this statement this morning.  Compression is just
the process of finding a more efficient encoding for information.  I
suppose you could say, then, that language developers are, in theory,
trying to compress natural language representations of information.
The problem is that everyone is doing a horrible job, because they
are approaching the subject in an ad hoc manner.  There are multiple
branches of mathematics and computer science (information theory and
coding theory, for instance) that deal with this exact subject in a
rigorous way.  The trick is to find an encoding that has low space
complexity and that human beings can efficiently transform back into
knowledge.

Let's assume that the inputs to be encoded are logical
(propositional/predicate) statements.  The first thing that came to mind
when thinking this way was radix trees and directed acyclic word graphs
(a form of DFA).  These structures are fairly easy to work out on
paper given a set of inputs, and it is fairly easy to reconstruct the
set of inputs from the structure.  Perhaps we could use natural
language statements, plus some very minimal extended syntax to indicate
a data structure (which fans out to a set of statements).  Here is a
quick example of what I mean (mimicking some Python syntax so it looks
familiar):

in the context of chess:

    a color is either white or black

    the board:
        is a cartesian grid having dimension (8, 8)
        has squares, representing points on the grid

    a square:
        has a color
        contains a piece or is empty

    a piece:
        has a color
        is located in a square or has been captured

    a { king, queen, rook, bishop, knight, pawn } is a type of piece

It should be clear that this is a directed acyclic phrase graph: if
you select a phrase fragment, then one phrase fragment from each child
level until you reach a leaf, the concatenation of those phrase
fragments forms a logical phrase.  Note that the set braces are
shorthand for multiple statements.  This was really easy to write, and
I bet even non-programmers would have little or no trouble
understanding what was going on.  Additionally, I could make a full
statement elsewhere, and if we had an algorithm to transform it into a
canonical phrase structure and merge synonyms, it could be inserted
into the phrase graph just as neatly as if I had written it there in
the first place.  The sexy thing about that is that it lets you take
two sets of propositional statements, perform set-theoretic operations
on them (union, complement, etc.), and get a phrase graph structure out
at the end that looks just like a nice neat little "program".  You
could even get really crazy: if you could define equivalence relations
(other than the natural relation) for the union (Set1.A ~ Set2.B),
that would let you compose the graphs in arbitrarily many ways.  If
you're dealing with processes, you would also want to be able to
specify temporal equivalence (Process1.T1 ~ Process2.T6).
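The round trip described above (phrase graph to statements, a set operation, then back to a phrase graph) can be sketched in Python under some loudly stated assumptions: the outline is hand-encoded as nested dicts mapping each fragment to the fragments that may follow it, rather than parsed from indented text; "{ x, y }" is the only extended syntax; synonyms are not merged; only union is shown; and no statement may be a prefix of another, because the trie has no end-of-statement marker. `expand_braces`, `statements`, `to_radix_tree`, `compress` and `render` are hypothetical helpers written for this illustration, not an existing library:

```python
import re

# Hypothetical hand-encoding of two outlines from the chess example:
# each key is a phrase fragment, each value maps the fragments that may
# follow it (an empty dict marks a leaf).
board = {
    "the board": {
        "is a cartesian grid having dimension (8, 8)": {},
        "has squares, representing points on the grid": {},
    },
}
pieces = {
    "a piece": {
        "has a color": {},
        "is located in a square or has been captured": {},
    },
    "a { king, queen, rook, bishop, knight, pawn } is a type of piece": {},
}

BRACES = re.compile(r"\{([^}]*)\}")


def expand_braces(fragment):
    """Expand each '{ x, y }' group into one fragment per alternative."""
    match = BRACES.search(fragment)
    if match is None:
        return [fragment]
    expanded = []
    for option in match.group(1).split(","):
        text = fragment[:match.start()] + option.strip() + fragment[match.end():]
        expanded.extend(expand_braces(text))  # handle any later groups
    return expanded


def statements(graph, prefix=""):
    """Concatenate one fragment per level along every root-to-leaf path."""
    result = set()
    for fragment, children in graph.items():
        for text in expand_braces(fragment):
            phrase = f"{prefix} {text}".strip()
            if children:
                result |= statements(children, phrase)
            else:
                result.add(phrase)
    return result


def to_radix_tree(sentences):
    """Build a word-level trie, then merge single-child chains (radix tree).

    Assumes no sentence is a prefix of another: there is no end marker.
    """
    trie = {}
    for sentence in sentences:
        node = trie
        for word in sentence.split():
            node = node.setdefault(word, {})
    return compress(trie)


def compress(trie):
    """Join each run of single-child nodes into one multi-word fragment."""
    merged = {}
    for word, child in trie.items():
        fragment = word
        while len(child) == 1:
            next_word, child = next(iter(child.items()))
            fragment += " " + next_word
        merged[fragment] = compress(child)
    return merged


def render(graph, depth=0):
    """Return the phrase graph as indented outline lines, like the example."""
    lines = []
    for fragment, children in sorted(graph.items()):
        lines.append("    " * depth + fragment)
        lines.extend(render(children, depth + 1))
    return lines


# Union of the two statement sets, folded back into one outline.
union = statements(board) | statements(pieces)
merged_graph = to_radix_tree(union)
print("\n".join(render(merged_graph)))

# Reading the merged graph back out recovers exactly the same statements.
assert statements(merged_graph) == union
```

The printed outline has two roots, "a" and "the board", with "piece" nested under "a" and its two leaves under that. Because fragments are split at word boundaries, the shared word "a" becomes its own level, and the king/queen/etc. alternatives come back as six separate lines rather than being re-folded into braces. Complement, synonym merging, and custom equivalence relations would need more machinery than this sketch has.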