On 2022-08-16 22:14, Barry Scott wrote:
On 16 Aug 2022, at 21:24, MRAB <python@mrabarnett.plus.com> wrote:
Other regex implementations have escape sequences for horizontal whitespace (`\h` and `\H`) and vertical whitespace (`\v` and `\V`).
The regex module already supports `\h`, but I can't use `\v` because it represents `\x0b`, as it does in the re module.
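To make the gap concrete, here is a minimal sketch using only the re module. It assumes the Perl/PCRE definition of vertical whitespace (LF, VT, FF, CR, NEL, LS, PS); the name `VERTICAL_WS` is hypothetical and is not part of re or regex.

```python
import re

# Assumption: vertical whitespace as Perl/PCRE define it for \v:
# LF, VT, FF, CR, NEL (U+0085), LS (U+2028), PS (U+2029).
VERTICAL_WS = r"[\n\x0b\f\r\x85\u2028\u2029]"  # hypothetical helper name

# In re, `\v` inside a pattern is just the VT character (U+000B).
vt_only = re.compile(r"\v")
vertical = re.compile(VERTICAL_WS)

sample = "a\nb\x0bc\u2028d"
print(vt_only.findall(sample))   # only the VT is found
print(vertical.findall(sample))  # LF, VT and LS are all found
```

An explicit character class like this works today in both modules, at the cost of being much longer than a two-character escape.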
You seem to be mixing the use of \ as the escape for strings with the \ that re uses. Is it the behaviour that '\<unknown>' becomes '\\<unknown>' that means this is a breaking change?
Won't this work?
```
re.compile('\v:\\v')
# which is the same as
re.compile(r'\x0b:\v')
```
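As a quick check of what those two calls do in the current re module, the sketch below compares the two pattern strings and what they match. The variable names are illustrative only.

```python
import re

cooked = '\v:\\v'   # characters: VT, ':', backslash, 'v'
raw = r'\x0b:\v'    # characters: backslash, 'x', '0', 'b', ':', backslash, 'v'

# The two pattern strings differ character by character...
print(cooked == raw)

# ...but re interprets both as "VT, colon, VT".
subject = '\x0b:\x0b'
m1 = re.fullmatch(cooked, subject)
m2 = re.fullmatch(raw, subject)
print(m1 is not None, m2 is not None)
```

Both patterns match only a vertical tab on each side of the colon, so neither spelling leaves room for `\v` to mean "any vertical whitespace".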
Some languages, e.g. Perl, have a dedicated syntax for writing regexes, and they take `\n` (a backslash followed by 'n') to mean "match a newline". Other languages, including Python, write regexes as string literals, which can contain an actual newline, but their regex engines also take `\n` (a backslash followed by 'n') to mean "match a newline". Thus:
print(re.match('\n', '\n'))   # Literal newline.
<re.Match object; span=(0, 1), match='\n'>

print(re.match('\\n', '\n'))  # `\n` sequence.
<re.Match object; span=(0, 1), match='\n'>
On the other hand:
print(re.match('\b', '\b'))   # Literal backspace.
<re.Match object; span=(0, 1), match='\x08'>

print(re.match('\\b', '\b'))  # `\b` sequence, which means a word boundary.
None
The problem is that the re and regex modules already have the `\v` (a backslash followed by 'v') sequence to mean "match the '\v' character", so:

re.compile('\v')

and:

re.compile('\\v')

mean exactly the same.
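A small sketch that checks this equivalence against a few subjects, using only the re module (the variable names are illustrative):

```python
import re

literal_vt = re.compile('\v')    # pattern string holds an actual VT character
escaped_vt = re.compile('\\v')   # pattern string is a backslash followed by 'v'

# Both patterns should accept a VT and reject everything else tried here.
for subject in ('\x0b', '\n', 'v'):
    print(repr(subject),
          bool(literal_vt.fullmatch(subject)),
          bool(escaped_vt.fullmatch(subject)))
```

Because the escaped form already has this meaning, redefining `\v` as "any vertical whitespace" would silently change what existing patterns match.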
Now that someone has asked for it, I'm trying to find a nice way of adding it, and I'm currently thinking that maybe I could use `\y` and `\Y` instead as they look a little like `\v` and `\V`, and, also, vertical whitespace is sort-of in the y-direction.
As far as I can tell, only PostgreSQL uses `\y` and `\Y`, and, even then, it's for what everyone else writes as `\b` and `\B`.
I want the regex module to remain compatible with the re module, in case they get added there sometime in the future.
Opinions?