[docs] [issue10713] re module doesn't describe string boundaries for \b
Ralph Corderoy
report at bugs.python.org
Sun May 8 16:27:10 CEST 2011
Ralph Corderoy <ralph-pythonbugs at inputplus.co.uk> added the comment:
Examining the source of Ubuntu's python2.6 2.6.6-5ubuntu1 package
suggests that positions beyond the limits of the string are treated as \W, as in Perl.
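A quick way to observe this from Python is to list every position where \b matches. The sketch below is written for Python 3; boundary_positions() is a hypothetical helper, not part of the re module. A word character at either end of the string produces a boundary there, as if the string were padded with \W characters. A non-word character at an end does not.

```python
import re

def boundary_positions(text):
    """Return every index (0..len(text)) at which a word boundary matches."""
    return [m.start() for m in re.finditer(r'\b', text)]

# Word characters at both ends: boundaries at 0 and len(text).
print(boundary_positions('foo bar'))  # [0, 3, 4, 7]
# Spaces at both ends: no boundary at 0 or len(text).
print(boundary_positions(' foo '))    # [1, 4]
```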
Modules/_sre.c:
336 LOCAL(int)
337 SRE_AT(SRE_STATE* state, SRE_CHAR* ptr, SRE_CODE at)
338 {
339     /* check if pointer is at given position */
340
341     Py_ssize_t thisp, thatp;
...
365     case SRE_AT_BOUNDARY:
366         if (state->beginning == state->end)
367             return 0;
368         thatp = ((void*) ptr > state->beginning) ?
369             SRE_IS_WORD((int) ptr[-1]) : 0;
370         thisp = ((void*) ptr < state->end) ?
371             SRE_IS_WORD((int) ptr[0]) : 0;
372         return thisp != thatp;
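The C logic above can be modelled in a few lines of Python. This is a rough sketch, not the engine's code: it assumes ASCII-only word characters (re.ASCII in Python 3 stands in for the 2.6 byte-string default), and is_word() and at_boundary() are hypothetical names. The loop at the end compares the model with the real engine at every position of a few strings.

```python
import re
import string

# Assumption: ASCII word characters only (A-Z, a-z, 0-9 and '_').
WORD_CHARS = frozenset(string.ascii_letters + string.digits + '_')

def is_word(ch):
    """Hypothetical stand-in for SRE_IS_WORD()."""
    return ch in WORD_CHARS

def at_boundary(text, pos):
    """Hypothetical model of the SRE_AT_BOUNDARY case in SRE_AT()."""
    if len(text) == 0:       # mirrors line 366: empty string, never a boundary
        return False
    # Mirrors lines 368-371: a position outside the string counts as non-word.
    that = is_word(text[pos - 1]) if pos > 0 else False
    this = is_word(text[pos]) if pos < len(text) else False
    return this != that      # mirrors line 372

# Pattern.match(text, pos) still sees the characters before pos for \b.
BOUNDARY = re.compile(r'\b', re.ASCII)
for s in ['foo bar', ' foo ', '_x-1', '']:
    agree = all(at_boundary(s, p) == bool(BOUNDARY.match(s, p))
                for p in range(len(s) + 1))
    print(repr(s), agree)    # each line should print True
```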
SRE_IS_WORD() returns 16 for the 63 \w characters and 0 otherwise.
This is borne out by tests.
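The count of 63 can be checked from Python 3 by testing every ASCII code point against \w. This sketch assumes re.ASCII is a fair stand-in for the 2.6 byte-string behaviour examined above; it does not call SRE_IS_WORD() itself.

```python
import re

# Collect the ASCII code points that \w matches when restricted to ASCII.
word_chars = [chr(c) for c in range(128) if re.match(r'\w', chr(c), re.ASCII)]
print(len(word_chars))      # 63
print(''.join(word_chars))  # 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
```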
Note that line 366 above confirms \b is never true for an empty string. The
documentation states that \B "is just the opposite of \b", yet
re.match(r'\b', '') returns None and so does re.match(r'\B', ''), so \B isn't
the opposite of \b in all cases.
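The sketch below checks the "opposite" claim position by position. For a non-empty string exactly one of \b and \B matches at every position; the empty string is the exception described above. complementary() is a hypothetical helper. Whether re.match(r'\B', '') matches may depend on the Python version in use, so no output is claimed for that line.

```python
import re

B = re.compile(r'\b')
NB = re.compile(r'\B')

def complementary(text):
    """True if exactly one of the two assertions matches at every position."""
    return all(bool(B.match(text, p)) != bool(NB.match(text, p))
               for p in range(len(text) + 1))

print(complementary('foo bar'))  # True
print(re.match(r'\b', ''))       # None
print(re.match(r'\B', ''))       # version-dependent; see the report above
```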
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10713>
_______________________________________