What to use for finding as many syntax errors as possible.
Chris Angelico
rosuav at gmail.com
Tue Oct 11 07:39:07 EDT 2022
On Tue, 11 Oct 2022 at 18:12, <avi.e.gross at gmail.com> wrote:
>
> Thanks for a rather detailed explanation of some of what we have been
> discussing, Chris. The overall outline is about what I assumed was there but
> some of the details were, to put it politely, fuzzy.
>
> I see resemblances to something like how a web page is loaded and operated.
> I mean very different but at some level not so much.
>
> I mean a typical web page is read in as HTML with various keyword regions
> expected such as <BODY> ... </BODY> or <DIV ...> ... </DIV> with things
> often cleanly nested in others. The browser makes nodes galore in some kind
> of tree format with an assortment of objects whose attributes or methods
> represent aspects of what it sees. The resulting treelike structure has
> names like DOM.
Yes. The basic idea of "tokenize, parse, compile" can be used for
pretty much any language - even English, although its grammar is a bit
more convoluted than most programming languages, with many weird
backward compatibility features! I'll parse your last sentence above:
LETTERS The
SPACE
LETTERS resulting
SPACE
... you get the idea
LETTERS like
SPACE
LETTERS DOM
FULLSTOP # or call this token PERIOD if you're American
Now, we can group those tokens into meaningful sets.
Sentence(type=Statement,
subject=Noun(name="structure", addenda=[
Article(type=The),
Adjective(name="treelike"),
]),
verb=Verb(type=Being, name="has", addenda=[]),
object=Noun(name="name", plural=True, addenda=[
Adjective(phrase=Phrase(verb=Verb(name="like"), object=Noun(name="DOM"),
]),
)
Grammar nerds will probably dispute some of the awful shorthanding I
did here, but I didn't want to devise thousands of AST nodes just for
this :)
> To a certain approximation, this tree starts a certain way but is regularly
> being manipulated (or perhaps a copy is) as it regularly is looked at to see
> how to display it on the screen at the moment based on the current tree
> contents and another set of rules in Cascading Style Sheets.
Yep; the DOM tree is initialized from the HTML (usually - it's
possible to start a fresh tree with no HTML) and then can be
manipulated afterwards.
> These are not at all the same thing but share a certain set of ideas and
> methods and can be very powerful as things interact.
Oh absolutely. That's why there are languages designed to help you
define other languages.
> In effect the errors in the web situation have such analogies too as in what
> happens if a region of HTML is not well-formed or uses a keyword not
> recognized.
Aaaaand they're horribly horribly messy, due to a few decades of
sloppy HTML programmers and the desire to still display the page even
if things are messed up :) But, again, there's a huge difference
between syntactic errors (like omitting a matching angle bracket) and
semantic errors (a keyword not known, like using <spam> when you
should have used <span>). In the latter case, you can still build a
DOM tree, but you have an unknown element; in the former case, you
have to guess at what the author meant, just to get anything going at
all.
> There was a guy around a few years ago who suggested he would create a
> system where you could create a series of some kind of configuration files
> for ANY language and his system would them compile or run programs for each
> and every such language? Was that on this forum? What ever happened to him?
That was indeed on this forum, and I have no idea what happened to
him. Maybe he realised that all he'd invented was the Unix shebang?
ChrisA
More information about the Python-list
mailing list