No-syntax Web-programming-IDE (was: Does turtle graphics have the wrong associations?)

Sun Nov 22 07:03:28 EST 2009

> >>> My proposed no-syntax
> >>> IDE *also* gets rid of the need to bother with any programming-language
> >>> syntax. I've been proposing it for years, but nobody has shown any
> >>> interest
> From: Terry Reedy <tjre... at udel.edu>
> What you describe below is similar to various systems that have
> been proposed and even implemented, including visual programming
> systems.

Are any of them integrated with tutorial material and available
over the Web, as mine will be? If so, will you tell me the URLs so
that I can play with them?

> And there has been some success with non-programmers.

The purpose of *my* system will be to start with mostly
non-programmers and *teach* them algorithm design from examples of
tasks they get paid (labor-credits, not $money$) to perform,
without needing to simultaneously bother them with
programming-language syntax. They learn how to take one step at a
time towards a long journey (Chinese proverb) without needing to
first learn a language for stating with ultra-precision *how*
exactly to take each step. Thus they learn algorithm design
with maximal efficiency, because nearly their whole attention is on
*that* without distraction of syntax too.

> But for most people, it is simply easier to say/write what one
> means rather than point and click.

That works only after you already know how to say/write what you mean
for the computer *precisely*exactly*analRetentively* what you want
the computer to do. If you don't already know *how* to say
precisely what you mean for the computer to do, it's impossible.
For example, show somebody this data:
   [12 15 19 26 43 62 92 71 78 85 93]
Point to this item ...^^
Ask that person, who has *never* even seen a computer program much
less written one him/herself, and who has never studied formal
mathematics such as set theory, to say in precise terms how a
computer should find that indicated item among the data. Look at
the data. You see what's special about that one
item among the data, right? So how to express to a computer how to
find it? Answer: It's the item that's out-of-sequence relative to
its neighbors.

How many non-computer non-math beginners would even notice what's
special about that item? (my guess: about half)

How many would be able to express either the mathematical
definition of the desired result, or an algorithm for finding it,
clearly enough that even a human seeing the expression but *not*
seeing the sample would be able to write a computer program per
that spec to solve the problem? (my guess: less than 1%)

Example of a valid informal mathematical expression: There's a
sequence of numbers. Mostly they are in correct sequence. But
exactly one of them is in the wrong place relative to the others
around it. Find that one that's out of place.

Example of a valid semi-formal mathematical expression:
Given an index set [0..N], and a function F from that index set
into the integers;
Such that there is exactly one index k for which F restricted to
the index set minus k is strictly increasing;
Find that index k.
(The simpler predicate "lambda (i) F(i-1) < F(i) < F(i+1)" is not
quite enough by itself: in the sample it fails at both 92 and its
neighbor 71.)

Formal mathematical expression depends on the notation conventions,
so I won't bother to even attempt to concoct such here for example.
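
A minimal sketch of the informal statement above, in Python purely
for the benefit of readers of this thread (students would never see
it). The name find_out_of_place, and the reading of "out of place" as
"the one item whose removal leaves the rest strictly ascending", are
assumptions made for the sketch:

```python
def find_out_of_place(seq):
    """Return the index of the single item whose removal leaves seq
    strictly ascending, or None if there is no such unique item."""
    candidates = []
    for k in range(len(seq)):
        rest = seq[:k] + seq[k + 1:]
        # Check that every adjacent pair in the remainder is ascending.
        if all(a < b for a, b in zip(rest, rest[1:])):
            candidates.append(k)
    return candidates[0] if len(candidates) == 1 else None


data = [12, 15, 19, 26, 43, 62, 92, 71, 78, 85, 93]
k = find_out_of_place(data)
print(k, data[k])  # prints: 6 92
```

This brute-force version tries every removal, which is wasteful but
stays close to the wording of the spec. It returns None when the
spec's premise (exactly one misplaced item) does not hold.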

Computer algorithms come in several overall styles, depending on
which primitives are available from declarative or imperative
programming, from functional or procedural etc. programming, and
within each choice of *that*, the actual syntax can be any of
several per programming language such as Lisp or APL or Forth etc.
(Yeah, I deliberately avoided mentioning C or Fortran or Java etc.)

If my guess is correct that less than 1% of absolute beginners can
even state what is desired, so that a human can understand it
unambiguously, much less how to obtain it, then expecting that
such an absolute beginner would simply "say/write what one means"
is IMO unreasonable.

Hence my idea is a sort of friendly "wizard" to take as much of a
crude ambiguous statement as the student is capable of and willing
to key in, use that to narrow and/or prioritize the number of
possible data-processing steps that are reasonably possible given
the data we have to work from, and then show the prioritized
options to the student, clearly expressed moderately verbosely, and
let the student either pick one of them or clarify what had been
keyed in just before. At the start, before the student has
said *anything* about what the next D/P step will be, *all*
possible operations on existing data are available, organized in
some meaningful way that would support a fully menu-driven method
as you presume. But a hybrid of a vague statement of what to do (such
as my first "answer" above, which said nothing about there being any
sequence or that the numbers were ascending except for that one out
of place) and a limited set of known options available a priori,
would be available whenever the student *can* at least partially
express what to do with the data.
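
To make the narrowing step concrete for readers of this thread, here
is a rough Python sketch under loudly stated assumptions: the
OPERATIONS catalog, its wording, and the word-overlap scoring are all
hypothetical stand-ins, not a description of how the real wizard
would rank anything.

```python
# Hypothetical catalog: each available operation with a verbose description.
OPERATIONS = {
    "sort-ascending": "put the items of a sequence into ascending order",
    "find-descent": "find the place in a sequence where an item is "
                    "smaller than the one before it",
    "split-words": "break a line of text into separate words",
    "read-line": "read one line of text from an open file",
}


def prioritize(statement, operations=OPERATIONS):
    """Rank operations by how many words they share with the statement.

    An empty statement leaves every operation available, in catalog
    order, which is the fully menu-driven case."""
    words = set(statement.lower().split())
    scored = []
    for name, description in operations.items():
        overlap = len(words & set(description.lower().split()))
        scored.append((overlap, name))
    # Stable sort: higher overlap first, ties keep catalog order.
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored]


print(prioritize("find the one that is out of place"))
```

Nothing is ever removed from the list, only reordered, so the student
can still pick any operation from the menu if the crude statement
pointed the wrong way.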

Now in the example given above, the desired processing step is too
complicated to be directly available in most programming languages,
and so I would treat that as a high-level problem to be broken into
pieces rather than a low-level task to be directly expressed in
primitives available at the start. So in a sense it wasn't a fair
example, but it's what I thought of at the moment while composing
this article. And it *does* serve as a good example of the *other*
part of my proposed integrated NewEco contract-work
team-programming system, namely the discussion forum and
consensus-achiever process of how to break a complicated task down
into pieces so that each piece can be solved separately by
recursive application of the system and then the pieces can be
applied in combination to solve the given toplevel problem.
(Caveat: The problem example I stated above is probably not the
 actual highest-level problem, but more likely some intermediate
 problem that comprises the toplevel problem. But recursively it's
 the toplevel problem at the moment.)

> Point-and-click writing reminds me of point-and-click speaking.
> Great for those who need it but a hindrance to those who do not.

The hybrid system would satisfy people at both ends of your
spectrum, as well as people anywhere in the middle, such as a
person who knew only fifty words and had only a vague idea of just
a little of the grammar and needed to express something that used
about half known words that can be keyed in and half unknown words
that must be obtained from a thesaurus or menu. And actually if you
compare programming with ordering food in a restaurant, unless you
know in advance the entire menu of the restaurant or the entire set
of functions/methods of the programming language, you really *do*
need to consult the menu from time to time to see what's available,
rather than make a fool of yourself by trying to ask for thousands
of entrees not available in that restaurant
 (just **try** going into your local Jack in the Box and standing
  at the counter (or using the drive-thru intercom) and ordering a
  plate of escargot, then a plate of lasagna, then a pizza, then
  some marzipan candies, then a sundae, then a T-bone steak, then a
  souffle, then a mug of beer, then some fresh walnuts in the
  shell, then a can of corned beef hash, then some potstickers, then
  a bowl of chili, then a half pound of chicken almond chow mein,
  then a glass of Grey Riesling, then a plate of spaghetti, then a
  baked potato with sour cream, then ... see if you even get that
  far before the manager comes out to tell you to either read the
  menu or go away and stop bothering them)

So yeah I have a grand idea of a system that represents data per
intentional datatype rather than any specific implementational
datatype used in any particular programming language, and uses
generic intentional-datatype functions/methods not directly related
to anything in any regular programming language (at least it'll be
rare when by chance there's a function in an existing language that
is *exactly* as general as what I offer), but uses a hierarchy of
general-to-specific datatypes so that algorithms can be expressed
very generally at first then at some point pinned down if necessary
to a specific type of representation in order to be able to write
algorithms dependent on that representation, or just left in
abstract form if my system happens to have a built-in function to
do exactly what is needed so that no further splitting of the task
is necessary. For example, Java has the concept of a "Set", and
functions to do various set-theoretic operations on it, with the
*same* method specification regardless of whether the Set is
implemented as a TreeSet or HashSet. By comparison, Common Lisp
doesn't have such a generic set of methods defined, instead has
separate functions defined for linked-list acting as a set or
bitmap acting as a set or hashtable acting as a set. So depending
on how large a system I have implemented, various of those
operations might or might not be available at the get-go.

Let me go back to that example I gave earlier. If users are
presented with that problem, by example, and asked to think of ways
to solve it, some might prefer the mathematical approach, similar
to what I expressed earlier, similar to a declarative programming
language, while others might prefer a procedural or functional
expression of the algorithm. If both viewpoints had sufficient
voters, the project might split into two different programming
teams, one analyzing the problem mathematically, and the other
analyzing the problem imperatively. These teams would then break
the main goal into sub-tasks in somewhat different ways. And within
each team, there might be further differences of opinion, such as
whether to iterate sequentially (as might be done in C) or to use a
mapping operation to emulate set-theoretic operations (as might be
done in Lisp) or to use set-theoretical operations directly (as in
SQL). At an even finer level of analysis, the iterative team might
prefer linear search or parallel-processing (process-forking)
recursive binary search, depending on their perception of the CPU
and system-level task software available. And the linear-search
team might prefer either explicit array indexing, i.e.
  for (ix=0; ix<length(sequence); ix++) { ... sequence[ix] ...}
or stream primitives, i.e.
  stream=open(sequence); while (row=getNext(stream)) { ... row ...}
For the purpose of learning how to write algorithms, *all* these
various approaches have instructional value, and in some cases it
might be worthwhile for a given student to switch teams from time
to time to get a "feel" for other ways to analyze the very same
problem.
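
For readers who want the two linear-search styles side by side, here
is a small Python sketch (again only for this thread, not something
students would see). The function names are made up for
illustration, and a Python iterator stands in for the open/getNext
stream primitives. Both functions report the position where the
first drop is observed, which in the sample is the item just after
the out-of-place one.

```python
def first_descent_indexed(sequence):
    """Explicit indexing: compare sequence[ix] with sequence[ix - 1]."""
    for ix in range(1, len(sequence)):
        if sequence[ix] < sequence[ix - 1]:
            return ix
    return None


def first_descent_streamed(stream):
    """Stream style: pull items one at a time, remembering the last."""
    position = 0
    previous = None
    for item in stream:
        if previous is not None and item < previous:
            return position
        previous = item
        position += 1
    return None


data = [12, 15, 19, 26, 43, 62, 92, 71, 78, 85, 93]
print(first_descent_indexed(data), first_descent_streamed(iter(data)))
# prints: 7 7
```

The indexed version needs the whole sequence in memory with random
access; the streamed version needs only the previous item, so it
would also work on data arriving one row at a time.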

> This is not to say that traditional editors cannot be improved
> with better backup memory aids. Indeed, even IDLE recognizes
> function names and pops up a bar listing parameters.

But I assume the student using IDLE is *forced* into one particular
syntax for the given programming language?

> Feel free to develop a Visual Python environment.

Like I said, there won't be *any* programming-language syntax.
No Lisp, no PHP, no Java, no C
 (the C examples of array indexing vs. streams were just the best
  way I could think of expressing the *algorithm* style to readers
  of this thread, no intention that students would actually *see*
  that syntax anywhere in my system)
and no Python, sorry, but I have to emphasize that point again.

> I might even give it a try.

Would you give it a try if it didn't show you any Python syntax at
any point, but after an algorithm is *completed* it gave you the
option of rendering the algorithm in any of several different
programming languages, maybe even more than one way in each
language if the abstract intentional datatypes were never resolved
to specific implementational datatypes, hence could be emulated
multiple ways per multiple implementational datatypes in each
single language? For example, the sequence given in the example
above could be represented in Java as a generic Vector using
instance methods from the Vector class, or as a generic Array using
primitive C-like code within static methods, or as an explicit
user-defined class with user-defined methods, or as a user-defined
sub-class of Vector that used a mix of Vector methods and sub-class
methods, or in Java version 5 or later as a parameterized type of
Vector such as Vector<Integer>.

Oops, somehow when I downloaded the above and below articles, for
purpose of later responding to each, the newsgroup header from the
article below was lost. Google Groups advanced search for phrase:
  I did too and had the same question
turns up *nothing*, as if your article was never posted!
  Processing a single data file
turns up only one false match. (I hate how Google Groups has been
grossly broken for the past several months, as if they just don't
care any more.) So I have no way to find your article (below) to get
the header, hence no way to post a followup to it, so I'll just
append my reply here:

> >> From: Steven D'Aprano <st... at REMOVE-THIS-cybersource.com.au>
> >> I'm interested. No-syntax IDE? How is this even possible?
> > I guess you missed what I previously posted.
> I did too and had the same question.

;Note the following is just *one* example of a test-rig set-up:

> > The basic idea is that
> > you start with test data, and you use menus to select appropriate
> > data-processing actions to perform on that data. For example, you
> > manually key in a file name containing test data, or copy and paste
> > that same file name. Then you select "open file by that name" or
> > "load all lines from file by that name" etc. from a menu. If you
> > just opened the file, you now have a stream of input, and you can
> > select to read one line or one s-expression or one character etc.
> > from that file. After loading the whole file or one unit of data,
> > you now have some *real* data to work from. For example, with a
> > line of input, you might break it into words.
> Processing a single data file is a common programming task, but
> not the only general category.

That was just an example, but in fact often when devising
algorithms the sample test input data *is* given either in a disk
file or in a Web page, so when writing *most* of the code, the
input data might indeed come from such a test-data-file, even
though the finished algorithm will in practice get its data *live*
from some other source such as the URL-encoded form contents. For
purpose of teaching absolute-beginning computer algorithm design, I
think working from a disk-file of test-data is sufficiently
general. That test-data file can in fact have either the
URL-encoded form contents, or associative arrays for GET and POST
and COOKIE already decoded as given. Then later after the student
practices setting up CGI-hello-world demos on his/her own personal
Web site, simply putting the two pieces together is enough to
achieve an online server-side Web application.
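
As a hedged illustration of the test-data-file idea, the following
Python sketch decodes URL-encoded form contents using the standard
library's urllib.parse.parse_qs. The helper names decode_form and
load_form_test_data, and the sample field names, are assumptions
invented for the example.

```python
from urllib.parse import parse_qs


def decode_form(encoded):
    """Decode URL-encoded form contents into {name: [values, ...]}."""
    return parse_qs(encoded, keep_blank_values=True)


def load_form_test_data(path):
    """Read one URL-encoded submission from a test-data file."""
    with open(path, encoding="ascii") as handle:
        return decode_form(handle.read().strip())


print(decode_form("name=Ada+Lovelace&lang=lisp&lang=apl&note="))
# prints: {'name': ['Ada Lovelace'], 'lang': ['lisp', 'apl'], 'note': ['']}
```

The algorithm under development only ever sees the decoded
dictionary, so swapping the test-data file for live CGI input later
changes the first step and nothing else.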

> A specialized Visual Data Analysis with Python might be a better
> and more focused project.

How would that be presented to the user via CGI or PHP?

> When I was doing statistical data analysis on a daily basis for a
> living, I would have loved to have had a system that would read
> in the first line and let me define (and name) the fields by
> point and click. (I usually worked with fixed-width,
> column-delimited fields.)

The first task is to *recognize* the boundaries between the various
column-delimited fields. If there is always a blank column between
adjacent data columns, and there's never an accidental blank column
throughout all rows of any data column, it's pretty simple by
algorithm to find where the columns are located, so as to show the
user the columns already delimited and then all the user has to do
is name each one (and if the first row is column headers, with no
repeat names, then even the naming can be done automatically).

In August 2008 I wrote such a column-finder function-set in Lisp:

;Given a list of rows of the table, such as lines read from a file:
;Build an array of column counts. Each count is the number of lines that
; have non-white in that column. Short lines don't ever count past the
; end, hence as if all white after end.
;Note: Omit any header line(s) when passing data to this function,
; so that only the actual data columns will be tabulated.
;Return that array, of length exactly equal to the longest line in the list.
(defun lines-count-nonwhite-cols (lines) ...)

;Given an array listing counts of non-white characters per column:
;Find start and end of each non-white multi-column.
;Return alternating list (startix1 endix1 startix2 endix2 ...)
(defun arrcolnw-to-ixpairs (arr) ...)

;Given alternating list of start..end indices for multi-columns:
;Make a function object to parse strings per those multi-columns.
(defun ixpairs-make-splitter-function-object (ixpairs) ...)

Those three together effect a parse of each line into a vector,
whereby positional indexing can pull out the nth field of each.
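
Since the Lisp bodies are elided above, here is an independent Python
sketch of the same three steps, written only from the comments. It is
not a translation of 2008-8-cols.lisp; the function names and the
sample rows are hypothetical, and it returns (start, end) pairs with
an exclusive end rather than an alternating list.

```python
def count_nonwhite_cols(lines):
    """Per column, count the lines having a non-white character there."""
    width = max((len(line) for line in lines), default=0)
    counts = [0] * width
    for line in lines:
        for ix, ch in enumerate(line):
            if not ch.isspace():
                counts[ix] += 1
    return counts


def counts_to_index_pairs(counts):
    """Return [(start, end), ...] for each run of nonzero counts."""
    pairs = []
    start = None
    for ix, count in enumerate(counts):
        if count and start is None:
            start = ix
        elif not count and start is not None:
            pairs.append((start, ix))
            start = None
    if start is not None:
        pairs.append((start, len(counts)))
    return pairs


def make_splitter(pairs):
    """Build a function that cuts a line into trimmed fields."""
    def split(line):
        return [line[start:end].strip() for start, end in pairs]
    return split


rows = [
    "alpha  ttro   22032",
    "beta   APPL  141160",
    "gamma  poco     812",
]
splitter = make_splitter(counts_to_index_pairs(count_nonwhite_cols(rows)))
print([splitter(row) for row in rows])
```

As with the Lisp version, header lines should be left out of the
rows passed to the counting step, and a data column that happens to
contain a blank in every row will be reported as two columns.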

Then to automatically match user-defined column-header strings
against column headers given as top line of the file:

;Given the parsed form of the header line, hopefully with each field
; trimmed already, and a list of strings that are supposed to match
; some of these fields:
;Make sure each given string actually does match one of the fields.
;Note: The given string need only be a substring of the field name.
;Return a list of corresponding indexes into the record.
(defun hdrrec+hdrstrs-find-field-indexes (hdrrec hdrstrs) ...)

Now we have a way to map user-defined names to fields within
records, thus retrieve a field by name instead of index.
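
The header-matching step might look like the following in Python.
This is a sketch from the comment above, not the Lisp code itself:
find_field_indexes is a hypothetical name, the header is assumed to
be already parsed and trimmed, and taking the first matching field
is an assumption about how ambiguous substrings are resolved.

```python
def find_field_indexes(header_fields, wanted):
    """Map each wanted string to the index of the first header field
    that contains it as a substring.

    Raises ValueError if a wanted string matches no field."""
    indexes = []
    for name in wanted:
        for ix, field in enumerate(header_fields):
            if name in field:
                indexes.append(ix)
                break
        else:
            raise ValueError("no header field contains %r" % name)
    return indexes


header = ["FILE NAME", "TYPE", "CREA", "BYTES",
          "CREATED", "MODIFIED", "VOLUME", "PATH"]
record = ["About System 7.5", "ttro", "ttxt", "22032",
          "96/05/31 12:00:00", "96/05/31 12:00:00", "HD:", ""]
ix_bytes, ix_modified = find_field_indexes(header, ["BYTES", "MODIF"])
print(record[ix_bytes], record[ix_modified])
# prints: 22032 96/05/31 12:00:00
```

Note that a short string such as "CREA" is a substring of both "CREA"
and "CREATED"; with first-match-wins it resolves to the earlier
field, so callers wanting the later one need a longer string.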

The rest of 2008-8-cols.lisp was a specific application of that to
parsing and further processing of the report from FileList, which
is a directory listing for Macintosh. Here's what one of those
reports looks like (just the header line and a few file-lines):
FILE NAME                       TYPE CREA   BYTES CREATED           MODIFIED          VOLUME                      PATH
About System 7.5                ttro ttxt   22032 96/05/31 12:00:00 96/05/31 12:00:00 HD:
About System 7.5.5 Update       ttro ttxt   17592 96/09/11 12:00:00 96/09/11 12:00:00 HD:
About the Control Panels folder ttro ttxt   16410 96/05/28 12:00:00 96/05/28 12:00:00 HD:Apple Extras:About the MacOS:
About the Extensions folder     ttro ttxt   20558 96/05/28 12:00:00 96/05/28 12:00:00 HD:Apple Extras:About the MacOS:
About the System Folder         ttro ttxt    4618 96/01/17 12:00:00 96/01/17 12:00:00 HD:Apple Extras:About the MacOS:
AppleCD Audio Player            APPL aucd  141160 95/06/19 12:00:00 95/06/19 12:00:00 HD:Apple Extras:AppleCD Audio Player:
AppleCD Audio Player Guide      poco reno  114948 95/12/22 12:00:00 95/12/22 12:00:00 HD:Apple Extras:AppleCD Audio Player:
About Automated Tasks           ttro ttxt    5742 96/05/28 12:00:00 96/05/28 12:00:00 HD:Apple Extras:AppleScript:Automated Tasks:
Add Alias to Apple Menu         APPL dplt    8559 94/08/02 00:00:00 94/08/02 00:00:00 HD:Apple Extras:AppleScript:Automated Tasks:
Find Original from Alias        APPL dplt    8580 94/08/02 00:00:00 94/08/02 00:00:00 HD:Apple Extras:AppleScript:Automated Tasks:
..
Note the blank column between "Apple" and "Extras" in the last
field, which would cause that field to appear to be two different
fields, if this were the *whole* datafile. There are various
work-arounds for such a case if it occurs.

> Instead, I had to write a format statement and some summary
> analysis code, run it, look at it for sanity, and decide if my
> format had been correct.

Pain! Too bad you didn't have 2008-8-cols.lisp available to use,
and didn't think to invent something similar yourself.