What about regexp string literals: re".*" ?

Hello,

After some French discussions about this idea, I subscribed here to suggest adding a new string literal for regexps, inspired by other prefixes like u"", r"", b"", br"", f""… The regexp string literal could be represented by re"". It would ease the use of regexps in Python, allowing regexp literals like in Perl or JavaScript. We may end up with an integration like:
    >>> import re
    >>> if re".k" in 'ok':
    ...     print("ok")
    ok
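For comparison, here is the same check written with today's re module (standard API, nothing hypothetical), which is what the proposed literal would abbreviate:

    import re

    # re.search scans the string for the pattern; the truthiness of the
    # match object drives the condition, just like the proposed literal.
    if re.search(r".k", "ok"):
        print("ok")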
Regexps are part of the language in Perl, and the rather complicated integration of regexps in other languages, especially in Python, is something that comes up easily in language-comparison discussions. I've always felt JavaScript's integration to be half of what it should be, and the new string literal types in Python (like f"") looked like a good compromise for a tight integration of regexps without asking to make them part of the language (as I imagine has already been discussed years ago, and obviously rejected…).

As the XKCD strip illustrates, using a regexp may be a problem of its own, but really, the "each language a new and complicated approach" is another difficulty, on the level of writing regexps itself, I think. And even once you know the trick for Python, it still feels like too many letters to type given the numerous problems one can solve with regexps.

I know regexps are slower than string-based workflows (like .startswith), but regexps can do the most and the least, so they are quick to come up with once you have started to think with them. As the Python philosophy is to spare brain cycles by sacrificing CPU cycles, making regexps easy to use is a brain-cycle-saving trick.

What do you think?

-- Simon Descarpentries +336 769 702 53 http://acoeuro.com

On 27.03.17 18:17, Simon D. wrote:
After some French discussions about this idea, I subscribed here to suggest adding a new string literal for regexps, inspired by other prefixes like u"", r"", b"", br"", f""…
The regexp string literal could be represented by re""
It would ease the use of regexps in Python, allowing regexp literals like in Perl or JavaScript.
There are several regular expression libraries for Python. One of them is included in the stdlib, but it is not the first regular expression library in the stdlib and may not be the last. A particular project can choose to use an alternative regular expression library (because it has additional features or is faster for particular cases).
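For instance, the third-party regex module on PyPI deliberately mirrors the re API, so a project can swap it in with a one-line change (standard usage of that module):

    import regex  # pip install regex; a drop-in replacement for re

    # Same call signature as re.match; (?i) is the usual inline
    # case-insensitive flag.
    m = regex.match(r"(?i)ok", "OK")
    print(m.group())   # -> OK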

* Serhiy Storchaka <storchaka@gmail.com> [2017-03-27 18:39:19 +0300]:
There are several regular expression libraries for Python. One of them is included in the stdlib, but it is not the first regular expression library in the stdlib and may not be the last. A particular project can choose to use an alternative regular expression library (because it has additional features or is faster for particular cases).
I believe that the u"" notation in Python 2.7 is defined by while importing the unicode_litterals module. Each regexp lib could provide its instanciation of regexp litteral notation. And if only the default one does, it would still be won for the beginers, and the majority of persons using the stdlib. -- Simon Descarpentries +336 769 702 53 http://s.d12s.fr

On 28 March 2017 at 08:54, Simon D. <simon@acoeuro.com> wrote:
I believe that the u"" notation in Python 2.7 is defined by while importing the unicode_litterals module.
That's not true. The u"..." syntax is part of the language. from __future__ import unicode_literals is something completely different.
Each regexp lib could provide its instantiation of the regexp literal notation.
The Python language has no way of doing that - user (or library) defined literals are not possible.
And if only the default one does, it would still be a win for beginners, and for the majority of people using the stdlib.
How? You've yet to prove that having a regex literal form is an improvement over re.compile(r'put your regex here'). You've asserted it, but that's a matter of opinion. We'd need evidence of real-life code that was clearly improved by the existence of your proposed construct. Paul

My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. I would much rather see Python adopt something like Verbal Expressions (https://github.com/VerbalExpressions/PythonVerbalExpressions) into the standard library than add special syntax support for normal REs.
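For readers who haven't seen Verbal Expressions, a small sketch in the style of the PythonVerbalExpressions README (the import path and method names are taken from that project's documentation as I remember it, so treat them as assumptions):

    from verbalexpressions import VerEx  # assumed import, per the project README

    # Build a URL matcher out of named steps instead of regex symbols.
    tester = (VerEx()
              .start_of_line()
              .find('http')
              .maybe('s')
              .find('://')
              .maybe('www.')
              .anything_but(' ')
              .end_of_line())

    if tester.match('https://www.google.com'):
        print('Valid URL')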

On Mar 29, 2017 23:31, "Abe Dillon" <abedillon@gmail.com> wrote:

My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. I would much rather see Python adopt something like Verbal Expressions (https://github.com/VerbalExpressions/PythonVerbalExpressions) into the standard library than add special syntax support for normal REs.

I've never heard of this before, looks *awesome*. Thanks, if it's as good as it sounds, I too would love something like this added to the standard library.

Abe Dillon writes:
My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. I would much rather see Python adopt something like Verbal Expressions (https://github.com/VerbalExpressions/PythonVerbalExpressions) into the standard library than add special syntax support for normal REs.
You think that example is more readable than the proposed translation

    ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$

which is better written

    ^https?://(www\.)?[^ ]*$

or even

    ^https?://[^ ]*$

which makes it obvious that the regexp is not very useful from the word "^"? (It matches only URLs which are the only thing, including whitespace, on the line, probably not what was intended.)

Are those groups capturing in Verbal Expressions? The use of "find" (~ "search") rather than "match" is disconcerting to the experienced user. What does alternation look like? How about alternation of non-trivial regular expressions? Etc, etc.

As far as I can see, Verbal Expressions are basically a way of making it so painful to write regular expressions that people will restrict themselves to regular expressions that would be quite readable in traditional notation! I don't think that this failure to respect the developer's taste is restricted to this particular implementation, either. They *are* regular expressions, just with a verbose, obstructive notation.

Far more important than "more readable" regular expressions would be a parsing library in the stdlib, reducing the developer's temptation to parse using complex regular expressions.

IMHO YMMV etc.

Steve
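A quick concrete check of the anchoring point (plain re behavior, nothing hypothetical):

    import re

    pat = re.compile(r'^https?://[^ ]*$')
    print(pat.search('https://example.com'))           # matches: the URL is the whole string
    print(pat.search('see https://example.com here'))  # None: the ^...$ anchors reject it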

On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote:
The regexp string literal could be represented by re""
It would ease the use of regexps in Python, allowing regexp literals like in Perl or JavaScript.
We may end up with an integration like:

    >>> import re
    >>> if re".k" in 'ok':
    ...     print("ok")
    ok
I dislike the suggested syntax re".k". It looks ugly and not different enough from a raw string. I can easily see people accidentally writing:

    if r".k" in 'ok': ...

and wondering why their regex isn't working.

Javascript uses /regex/ as a literal syntax for creating RegExp objects. That's the closest equivalent to the way Python would have to operate, although I don't think we can use the /.../ syntax without breaking the rule that Python's parser will not be more complex than LL(1). So I think /.../ is definitely out. Perl 6 uses m/regex/ and a number of other variations: https://docs.perl6.org/language/regexes

I doubt that this will actually be useful. It *seems* useful if you just write trivial regexes like your example, but without Perl's rich set of terse (cryptic?) operators, I don't know that literal regexes make enough difference to be worth the trouble. There's not very much difference between (say) these:

    mo = re.search(r'.k', mystring)
    if mo:
        print(mo.group())

    mo = re'.k'.search(mystring)
    if mo:
        print(mo.group())

You effectively save two parentheses, that's all. That doesn't seem like much of a win for introducing new syntax. Can you show some example code where a regex literal will have a worthwhile advantage?
Regexps are part of the language in Perl, and the rather complicated integration of regexps in other languages, especially in Python, is something that comes up easily in language-comparison discussions.
Surely you are joking? Regex integration in Python is simple. Regular expression objects are ordinary objects, like lists and dicts and floats. The only difference is that you don't call the Regex object constructor directly; you either pass a string to a module-level function:

    re.match(r'my regex', mystring)

or you create a regex object:

    regex = re.compile(r'my regex')
    regex.match(mystring)

That's very neat, Pythonic and simple. The regex itself is very close to the same syntax used by Perl, Javascript or other variations; the only complication is that due to Python's escaping rules you should use a raw string r'' instead of doubling up all backslashes.

I wouldn't call that "rather complicated" -- it is a lot less complicated than Perl:

- m// can be abbreviated //
- when do you use // directly and when do you use qr// ?
- the s/// operator implicitly defines a regex

In Perl 6, I *think* they use rx// instead of qr//, or are they different things? Both m// and the s/// operator can use arbitrary delimiters, e.g. ! or , (but not : or parentheses) instead of the slashes, and m// regexes will implicitly match against $_ if you don't explicitly match against something else.

Compared to Perl, I don't think Python's regexes are complicated.

-- Steve
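A minimal illustration of the raw-string point (standard re behavior):

    import re

    # The same "digits, dot, digits" pattern written both ways:
    doubled = re.compile('\\d+\\.\\d+')   # every backslash doubled
    raw     = re.compile(r'\d+\.\d+')     # raw string: written as in Perl/JavaScript

    assert doubled.pattern == raw.pattern    # identical patterns, one far more readable
    print(raw.search('pi is 3.14').group())  # -> 3.14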

On Mar 28, 2017 06:08, "Steven D'Aprano" <steve@pearwood.info> wrote: On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote:
The regexp string literal could be represented by re""
It would ease the use of regexps in Python, allowing regexp literals like in Perl or JavaScript.
We may end up with an integration like:

    >>> import re
    >>> if re".k" in 'ok':
    ...     print("ok")
    ok
I dislike the suggested syntax re".k". It looks ugly and not different enough from a raw string. I can easily see people accidentally writing:

    if r".k" in 'ok': ...

and wondering why their regex isn't working.

While I agree with most of your arguments, surely you must be the one joking here? "Ugly" is obviously a matter of opinion; I personally find the proposed syntax more beautiful than the // used in many other languages. But claiming it's bad because people would mix it up with raw strings without realizing is nonsense. Not only does it look very different, but attempting to call match() or any other regex method on it would surely give a reasonable error:

    AttributeError: 'str' object has no attribute 'match'

which _in the worst case scenario_ results in googling, where the top-rated StackOverflow answer clearly explains the difference between r'' and re''.

On Tue, Mar 28, 2017 at 2:24 PM, Markus Meskanen <markusmeskanen@gmail.com> wrote:
While I agree with most of your arguments, surely you must be the one joking here? "Ugly" is obviously a matter of opinion; I personally find the proposed syntax more beautiful than the // used in many other languages. But claiming it's bad because people would mix it up with raw strings without realizing is nonsense. Not only does it look very different, but attempting to call match() or any other regex method on it would surely give a reasonable error:

    AttributeError: 'str' object has no attribute 'match'

which _in the worst case scenario_ results in googling, where the top-rated StackOverflow answer clearly explains the difference between r'' and re''.
Yes, but if the "in" operator is used, it would still work, because r"..." is a str, and "str" in "string" is meaningful.

But I think a better solution will be for regex literals to be syntax-highlighted differently. If they're a truly-supported syntactic feature, they can be made visually different in your editor, making the distinction blatantly obvious.

That said, though, I'm -1 on this. Currently, every prefix letter has its own meaning, and broadly speaking, combining them combines their meanings. An re"..." literal should be a raw "e-string", whatever that is, so I would expect that e"..." is the same kind of thing but with different backslash handling.

ChrisA
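To make the silent-success failure mode concrete (plain str and re semantics, nothing hypothetical):

    import re

    # A raw string is still just a str, so `in` does a substring test:
    print(r".k" in 'ok')                        # False, and no error to warn you
    # ...while the regex the author presumably intended does match:
    print(re.search(r".k", 'ok') is not None)   # True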

On Tue, Mar 28, 2017 at 8:37 AM, Chris Angelico <rosuav@gmail.com> wrote:
Yes, but if the "in" operator is used, it would still work, because r"..." is a str, and "str" in "string" is meaningful.
But I think a better solution will be for regex literals to be syntax-highlighted differently. If they're a truly-supported syntactic feature, they can be made visually different in your editor, making the distinction blatantly obvious.
That said, though, I'm -1 on this. Currently, every prefix letter has its own meaning, and broadly speaking, combining them combines their meanings. An re"..." literal should be a raw "e-string", whatever that is, so I would expect that e"..." is the same kind of thing but with different backslash handling.
Fair enough, I haven't followed this thread too closely and didn't consider the "in" operator being used. Even then I find it unlikely that confusing re'...' with r'...' and not noticing would turn out to be an issue. That being said, I'm also -1 on this, especially now after your point on "e-string". Adding these re-strings would straight out prevent e-string from ever being implemented.

* Chris Angelico <rosuav@gmail.com> [2017-03-28 16:37:16 +1100]:
But I think a better solution will be for regex literals to be syntax-highlighted differently. If they're a truly-supported syntactic feature, they can be made visually different in your editor, making the distinction blatantly obvious.
That said, though, I'm -1 on this. Currently, every prefix letter has its own meaning, and broadly speaking, combining them combines their meanings. An re"..." literal should be a raw "e-string", whatever that is, so I would expect that e"..." is the same kind of thing but with different backslash handling.
First, I would like to state that the "module-static" version of the regexp functions, avoiding the compile step, is a great idea (e.g. mo = re.search(r'.k', mystring)). The str-integrated one also, but maybe confusing: which regexp lib is used? (It must be the default one.)

Then, re"" being two letters looks like a real problem. Let's pick one among the 22 remaining free alphabet letters. What about:

- g"", x"" (like in regex)?
- m"" (like shown for Perl, meaning Match?)
- q"" (for Query?)
- k"" (in memory of Stephen Cole Kleene? https://en.wikipedia.org/wiki/Regular_expression)
- /"" (to be halfway toward the /regexp/ syntax)
- ~"", ?"" (other symbols; I avoid regexp-starting symbols, which would be ugly in real usage)

And what about an approach with flags first? (Or where to put them?):

- i"" (regexp with the ignorecase flag on)
- AILMSX"" (regexp with all flags on)

It would consume a lot of letters, but would use them for a good reason :-)

Personally, I think a JavaScript-like syntax would be great, though I feel it is asking too much…:

- it would naturally be highlighted differently;
- it would not be the first (happy) similarity (https://hackernoon.com/javascript-vs-python-in-2017-d31efbb641b4#.ky9it5hph)
- it's a working integration, including flag matters.

-- Simon Descarpentries +336 769 702 53 http://s.d12s.fr

* Simon D. <simon@acoeuro.com> [2017-03-28 09:56:05 +0200]:
The str-integrated one also, but maybe confusing: which regexp lib is used? (It must be the default one.)
Ok, this was a mistake, based on JavaScript memories… There are no regexp-aware functions around str, but some hints to go find your happiness in the re module.

-- Simon Descarpentries +336 769 702 53 http://acoeuro.com

On 28 March 2017 at 01:17, Simon D. <simon@acoeuro.com> wrote:
It would ease the use of regexps in Python
We don't really want to ease the use of regexps in Python - while they're an incredibly useful tool in a programmer's toolkit, they're so cryptic that they're almost inevitably a maintainability nightmare.

Baking them directly into the language runtime also locks people in to a particular regex engine implementation, rather than being able to swap in a third party one if they choose to do so (as many folks currently do with the `regex` PyPI module).

So it's appropriate to keep them as a string-based library level capability, and hence on a relatively level playing field with less comprehensive, but typically easier to maintain, options like string methods and third party text parsing libraries (such as https://pypi.python.org/pypi/parse for something close to the inverse of str.format).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
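As a taste of that parse library (the pattern style follows its README; the field names here are just illustrative):

    from parse import parse  # pip install parse

    # str.format in reverse: extract named fields instead of writing a regex.
    result = parse('{scheme}://{host}/{path}', 'https://example.com/index.html')
    print(result['scheme'], result['host'])   # -> https example.com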

A huge advantage of REs is that they are common to many languages. You can take a regex from grep to Perl to your editor to Python. They're not absolutely identical, of course, but the basics are all the same. Creating a new search language means everyone has to learn anew. ChrisA

1) I'm not suggesting we get rid of the re module (the VE implementation I linked requires it)
2) You can easily output regex from verbal expressions
3) Verbal expressions are implemented in many different languages too: https://verbalexpressions.github.io/
4) It even has a generic interface that all implementations are meant to follow: https://github.com/VerbalExpressions/implementation/wiki/List-of-methods-to-...

Note that the entire documentation is 250 words while just the syntax portion of the Python docs for the re module is over 3000 words.
You think that example is more readable than the proposed translation

    ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$

which is better written

    ^https?://(www\.)?[^ ]*$

or even

    ^https?://[^ ]*$

Yes. I find it *far* more readable. It's not a soup of symbols like Perl code. I can only surmise that you're fluent in regex because it seems difficult for you to see how the above could be less readable than English words.

which makes it obvious that the regexp is not very useful from the word "^"? (It matches only URLs which are the only thing, including whitespace, on the line, probably not what was intended.)

I could tell it only matches URLs that are the only thing inside the string because it clearly says: start_of_line() and end_of_line(). I would have had to refer to a reference to know that "^" doesn't always mean "not", it sometimes means "start of string" and probably other things. I would also have to check a reference to know that "$" can mean "end of string" (and probably other things).

Are those groups capturing in Verbal Expressions? The use of "find" (~ "search") rather than "match" is disconcerting to the experienced user.

You can alternately use the word "then". The source code is just one Python file. It's very easy to read. I actually like "then" over "find" for the example:

    verbal_expression.start_of_line()
                     .then('http')
                     .maybe('s')
                     .then('://')
                     .maybe('www.')
                     .anything_but(' ')
                     .end_of_line()

What does alternation look like?

    .OR(option1).OR(option2).OR(option3)...

How about alternation of non-trivial regular expressions?

    .OR(other_verbal_expression)

As far as I can see, Verbal Expressions are basically a way of making it so painful to write regular expressions that people will restrict themselves to regular expressions

What's so painful to write about them? Does your IDE not have autocompletion? I find REs so painful to write that I usually just use string methods if at all feasible.

I don't think that this failure to respect the developer's taste is restricted to this particular implementation, either.

I generally find it distasteful to write a pseudolanguage in strings inside of other languages (this applies to SQL as well). Especially when the design principles of that pseudolanguage are *diametrically opposed* to the design principles of the host language. A key principle of Python's design is: "you read code a lot more often than you write code, so emphasize readability". Regex seems to be based on: "Do the most with the fewest key-strokes. Readability be damned!". It makes a lot more sense to wrap the pseudolanguage in constructs that bring it in line with the host language than to take on the mental burden of trying to comprehend two different languages at the same time. If you disagree, nothing's stopping you from continuing to write REs the old-fashioned way.

Can we at least agree that baking special re syntax directly into the language is a bad idea?

On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 28 March 2017 at 01:17, Simon D. <simon@acoeuro.com> wrote:
It would ease the use of regexps in Python
We don't really want to ease the use of regexps in Python - while they're an incredibly useful tool in a programmer's toolkit, they're so cryptic that they're almost inevitably a maintainability nightmare.

Abe Dillon writes:
Note that the entire documentation is 250 words while just the syntax portion of Python docs for the re module is over 3000 words.
Since Verbal Expressions (below, VEs, indicating notation) "compile" to regular expressions (spelling out indicates the internal matching implementation), the documentation of VEs presumably ignores everything except the limited language it's useful for. To actually understand VEs, you need to refer to the RE docs. Not a win IMO.
You think that example is more readable than the proposed translation

    ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$

which is better written

    ^https?://(www\.)?[^ ]*$

or even

    ^https?://[^ ]*$
Yes. I find it *far* more readable. It's not a soup of symbols like Perl code. I can only surmise that you're fluent in regex because it seems difficult for you to see how the above could be less readable than English words.
Yes, I'm fairly fluent in regular expression notation (below, REs). I've maintained a compiler for one dialect. I'm not interested in the difference between words and punctuation though. The reason I find the middle RE most readable is that it "looks like" what it's supposed to match, in a contiguous string as the object it will match will be contiguous. If I need to parse it to figure out *exactly* what it matches, yes, that takes more effort. But to understand a VE's semantics correctly, I'd have to look it up as often as you have to look up REs because many words chosen to notate VEs have English meanings that are (a) ambiguous, as in all natural language, and (b) only approximate matches to RE semantics.
I could tell it only matches URLs that are the only thing inside the string because it clearly says: start_of_line() and end_of_line().
That's not the problem. The problem is the semantics of the method "find". "then" would indeed read better, although it doesn't exactly match the semantics of concatenation in REs.
I would have had to refer to a reference to know that "^" doesn't always mean "not", it sometimes means "start of string" and probably other things. I would also have to check a reference to know that "$" can mean "end of string" (and probably other things).
And you'll still have to do that when reading other people's REs.
Are those groups capturing in Verbal Expressions? The use of "find" (~ "search") rather than "match" is disconcerting to the experienced user.
You can alternately use the word "then". The source code is just one python file. It's very easy to read. I actually like "then" over "find" for the example:
You're missing the point. The reader does not get to choose the notation, the author does. I do understand what several varieties of RE mean, but the variations are of two kinds: basic versus extended (ie, what tokens need to be escaped to be taken literally, which ones have special meaning if escaped), and extensions (which can be ignored). Modern RE facilities are essentially all of the extended variety. Once you've learned that, you're in good shape for almost any RE that should be written outside of an obfuscated code contest. This is a fundamental principle of Python design: don't make readers of code learn new things. That includes using notation developed elsewhere in many cases.
What does alternation look like?
.OR(option1).OR(option2).OR(option3)...
How about alternation of
non-trivial regular expressions?
.OR(other_verbal_expression)
Real examples, rather than pseudo code, would be nice. I think you, too, will find that examples of even fairly simple nested alternations containing other constructs become quite hard to read, as they fall off the bottom of the screen. For example, the VE equivalent of

    scheme = "(https?|ftp|file):"

would be (AFAICT):

    scheme = VerEx().then(VerEx().then("http")
                                 .maybe("s")
                                 .OR("ftp")
                                 .OR("file"))
                    .then(":")

which is pretty hideous, I think. And the colon is captured by a group. If perversely I wanted to extract that group from a match, what would its index be? I guess you could keep the linear arrangement with

    scheme = (VerEx().add("(")
                     .then("http")
                     .maybe("s")
                     .OR("ftp")
                     .OR("file")
                     .add(")")
                     .then(":"))

but is that really an improvement over

    scheme = VerEx().add("(https?|ftp|file):")

;-)
As far as I can see, Verbal Expressions are basically a way of making it so painful to write regular expressions that people will restrict themselves to regular expressions
What's so painful to write about them?
One thing that's painful is that VEs "look like" context-free grammars, but clumsy and without the powerful semantics. You can get the readability you want with greater power using grammars, which is why I would prefer we work on getting a parser module into the stdlib. But if one doesn't know about grammars, it's still not great. The main pains about writing VEs for me are (1) reading what I just wrote, (2) accessing capturing groups, and (3) verbosity. Even a VE to accurately match what is normally a fairly short string, such as the scheme, credentials, authority, and port portions of a "standard" URL, is going to be hundreds of characters long and likely dozens of lines if folded as in the examples.

Another issue is that we already have a perfectly good poor man's matching library: glob. The URL example becomes

    http{,s}://{,www.}*

Granted you lose the anchors, but how often does that matter? You apparently don't use them often enough to remember them.
Does your IDE not have autocompletion?
I don't want an IDE. I have Emacs.
I find REs so painful to write that I usually just use string methods if at all feasible.
Guess what? That's the right thing to do anyway. They're a lot more readable and efficient when partitioning a string into two or three parts, or recognizing a short list of affixes. But chaining many methods, as VEs do, is not a very Pythonic way to write a program.
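For instance, the two-or-three-parts case is exactly what str.partition covers (standard str API):

    url = 'https://example.com/index.html'
    scheme, _, rest = url.partition('://')
    print(scheme)   # -> https
    print(rest)     # -> example.com/index.html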
I don't think that this failure to respect the developer's taste is restricted to this particular implementation, either.
I generally find it distasteful to write a pseudolanguage in strings inside of other languages (this applies to SQL as well).
You mean like arithmetic operators? (Lisp does this right, right? Only one kind of expression, the function call!) It's a matter of what you're used to. I understand that people new to text-processing, or who don't do so much of it, don't find REs easy to read. So how is this a huge loss? They don't use regular expressions very often! In fact, they're far more likely to encounter, and possibly need to understand, REs written by others!
Especially when the design principals of that pseudolanguage are *diametrically opposed* to the design principals of the host language. A key principal of Python's design is: "you read code a lot more often than you write code, so emphasize readability". Regex seems to be based on: "Do the most with the fewest key-strokes.
So is all of mathematics. There's nothing wrong with concise expression for use in special cases.
Readability be dammed!". It makes a lot more sense to wrap the psudolanguage in constructs that bring it in-line with the host language than to take on the mental burden of trying to comprehend two different languages at the same time.
If you disagree, nothing's stopping you from continuing to write res the old-fashion way.
I don't think that RE and SQL are "pseudo" languages, no. And I, and most developers, will continue to write regular expressions using the much more compact and expressive RE notation. (In fact, with the exception of the "word" method, in VEs you still need to use RE notation to express most of the Python extensions.) So what you're saying is that you don't read much code, except maybe your own. Isn't that your problem? Those of us who cooperate widely on applications using regular expressions will continue to communicate using REs. If that leaves you out, that's not good. But adding VEs to the stdlib (and thus encouraging their use) will split the community into RE users and VE users, if VEs are at all useful. That's bad. I don't see the potential usefulness of VEs to infrequent users of regular expressions outweighing the downsides of "many ways to do it" in the stdlib.
Can we at least agree that baking special re syntax directly into the language is a bad idea?
I agree that there's no particular need for RE literals. If one wants to mark an RE as some special kind of object, re.compile() does that very well both by converting to a different type internally and as a marker syntactically.
On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
We don't really want to ease the use of regexps in Python - while they're an incredibly useful tool in a programmer's toolkit, they're so cryptic that they're almost inevitably a maintainability nightmare.
I agree with Nick. Regular expressions, whatever the notation, are a useful tool (no suspension of disbelief necessary for me, though!). But they are cryptic, and it's not just the notation. People (even experienced RE users) are often surprised by what fairly simple regular expressions match in a given text, because people want to read a regexp as instructions to a one-pass greedy parser, and it isn't. For example, above I wrote

    scheme = "(https?|ftp|file):"

rather than

    scheme = "(\w+):"

because it's not unlikely that I would want to treat those differently from other schemes such as mailto, news, and doi. In many applications of regular expressions (such as tokenization for a parser) you need many expressions. Compactness really is a virtue in REs.

Steve

Hi all,

FWIW, I also strongly prefer the Verbal Expression style and consider "normal" regular expressions to become quickly unreadable and unmaintainable. Verbal Expressions are also much more composable.

Stephan

On 31 March 2017 at 09:20, Stephan Houben <stephanh42@gmail.com> wrote:
FWIW, I also strongly prefer the Verbal Expression style and consider "normal" regular expressions to become quickly unreadable and unmaintainable.
Do you publish your code widely? What's the view of 3rd party users of your code? Until this thread, I'd never even heard of the Verbal Expression style, and I read a *lot* of open source Python code. While it's purely anecdotal, that suggests to me that the style isn't particularly commonly used. (OTOH, there's also a lot less use of REs in Python code than in other languages. Much string manipulation in Python avoids using regular languages at all, in my experience. I think that's a good thing - use simpler tools when appropriate and keep the power tools for the hard cases where they justify their complexity). Paul

Stephan Houben writes:
FWIW, I also strongly prefer the Verbal Expression style and consider "normal" regular expressions to become quickly unreadable and unmaintainable.
Verbal Expressions are also much more composable.
So are grammars. But REs aren't so bad or incomposable if you build them up slowly in a grammar-like fashion and with a specific convention for groups:

    atom = r"[-%A-Za-z0-9]+"    # incorrect, for example only:
                                # each component has different lexical
                                # restrictions
    scheme = user = password = rf"({atom})"
    domain = rf"((?:{atom}\.)+{atom})"
    port = r"([0-9]+)"
    authority = rf"(?:{user}(?::{password})?@)?{domain}(?::{port})?"
    path = rf"((?:/{atom})+/?)"    # incorrect, but handles many common URIs
    url = rf"{scheme}://(?:{authority})?({path})"

Of course this is parsing with regular expressions, which is generally frowned upon, and it would be even uglier without f-strings. The non-capturing groups admittedly are a significant distraction when reading. It's about the limit of what I would do if I didn't have a parsing library but did have REs (more complex than this and I'd write my own parser).

I will concede that it took me 15 minutes to write that, of which 4 were spent testing and fixing one bug (which was a real bug; there were no syntax errors in the REs). Some of the time was spent deciding how closely to follow the RFC 3986 generic syntax, though. I will also concede that I've been writing REs since 1981, although not as frequently in the last 15 years as in the first 20.

Would you like to write that using VEs and show us the result? Don't forget to document the indices for extracting the scheme, user, password, domain, port, and path (in my RE, they are 1-6).

Steve
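A quick sanity check of the composed pattern above (the sample URL is made up; group numbering as described):

    import re

    m = re.match(url, 'https://alice:secret@www.example.com:8080/a/b/')
    if m:
        # Groups 1-6: scheme, user, password, domain, port, path.
        print(m.group(1, 2, 3, 4, 5, 6))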

Same. One day, Python will have a decent parsing library.

On 03/04/2017 02:22, Neil Girdhar wrote:
Same. One day, Python will have a decent parsing library.
Nothing here https://wiki.python.org/moin/LanguageParsing suits your needs? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On Mon, Apr 3, 2017 at 2:31 AM Mark Lawrence via Python-ideas < python-ideas@python.org> wrote:
On 03/04/2017 02:22, Neil Girdhar wrote:
Same. One day, Python will have a decent parsing library.
Nothing here https://wiki.python.org/moin/LanguageParsing suits your needs?
No, unfortunately. I tried to make a simple grammar that parses LaTeX code, and it was basically impossible with these tools.
From what I remember, you need the match objects to be able to accept or reject their matched sub-nodes.
It's the same thing if you want to parse Python in one pass (not the usual two passes that CPython does, whereby it creates an AST and then validates it). It would be cooler to validate as you go, since the errors can be much richer when you have the whole parsing context. It's been a while, so I might be forgetting something, but I remember thinking that I'll check back in five years and see if anything new has come out.

Have you tried PyParsing and/or Grako? They're some of my favorites (well, I like PLY too, but I'm thinking you wouldn't like it too much).

-- Ryan (ライアン) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com

I've tried PyParsing. I haven't tried Grako.

On Mon, Apr 3, 2017 at 8:57 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I've tried PyParsing. I haven't tried Grako.
Caveat: I'm the author of Grako.

It's very easy to do complex parsing with Grako. The grammar can be embedded in a Python string, and the compiled grammar can be used for parsing without generating any Python code. Most of the unit tests under the distribution's grako/grako/test use those features. https://pypi.org/project/grako/

One of the ways in which a top-down grammar (as those accepted by Grako) can be used is to organize a series of regular expressions into a tree to handle complex cases with clarity, as sketched below.

-- Juancarlo *Añez*
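For readers who haven't used Grako, a rough sketch of the embedded-grammar style (the genmodel call and the EBNF details are from memory of Grako's README, so treat them as assumptions rather than the definitive API):

    import grako  # pip install grako

    # A grammar embedded in a plain string; each ?/.../? token is a small,
    # readable regular expression, organized by the grammar's tree structure.
    GRAMMAR = '''
        start  = scheme '://' host path $ ;
        scheme = ?/[a-z][a-z0-9+.-]*/? ;
        host   = ?/[^\\s:\\/]+/? ;
        path   = ?/\\S*/? ;
    '''

    model = grako.genmodel('url', GRAMMAR)   # assumed entry point
    ast = model.parse('https://example.com/index.html')
    print(ast)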
participants (14)

- Abe Dillon
- Chris Angelico
- Juancarlo Añez
- Mark Lawrence
- Markus Meskanen
- Neil Girdhar
- Nick Coghlan
- Paul Moore
- Ryan Gonzalez
- Serhiy Storchaka
- Simon D.
- Stephan Houben
- Stephen J. Turnbull
- Steven D'Aprano