[issue10713] re module doesn't describe string boundaries for \b

Ralph Corderoy report at bugs.python.org
Sun May 8 16:27:10 CEST 2011


Ralph Corderoy <ralph-pythonbugs at inputplus.co.uk> added the comment:

Examining the source of Ubuntu's python2.6 2.6.6-5ubuntu1 package
suggests that positions beyond the limits of the string are treated as
\W, as in Perl.

    Modules/_sre.c:
       336  LOCAL(int)
       337  SRE_AT(SRE_STATE* state, SRE_CHAR* ptr, SRE_CODE at)
       338  {
       339      /* check if pointer is at given position */
       340
       341      Py_ssize_t thisp, thatp;
       ...
       365      case SRE_AT_BOUNDARY:
       366          if (state->beginning == state->end)
       367              return 0;
       368          thatp = ((void*) ptr > state->beginning) ?
       369              SRE_IS_WORD((int) ptr[-1]) : 0;
       370          thisp = ((void*) ptr < state->end) ?
       371              SRE_IS_WORD((int) ptr[0]) : 0;
       372          return thisp != thatp;

SRE_IS_WORD() returns 16 for the 63 \w characters, 0 otherwise.

This is borne out by tests.
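
A minimal sketch of such a test, for anyone wanting to reproduce it.
The helper name boundary_positions is made up for illustration and is
not part of the re module; it only lists the offsets at which \b
matches.

```python
import re

def boundary_positions(s):
    # Hypothetical helper: offsets in s where \b matches.
    return [m.start() for m in re.finditer(r'\b', s)]

# A word character at either edge of the string gives a boundary
# there, i.e. the position beyond the string behaves like \W.
print(boundary_positions('foo'))    # [0, 3]

# A non-word character at the edge gives no boundary there.
print(boundary_positions(' foo '))  # [1, 4]

# No boundary anywhere in an empty string.
print(boundary_positions(''))       # []
```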

Note that line 366 above confirms \b never matches in an empty string.
The documentation states that \B "is just the opposite of \b", yet
re.match(r'\b', '') and re.match(r'\B', '') both return None, so \B
isn't the opposite of \b in all cases.
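
The SRE_AT_BOUNDARY case quoted above can be modelled in a few lines
of Python. This is only a sketch of my reading of that C excerpt;
is_word and at_boundary are illustrative names, not anything exported
by the re module. It covers the \b case only, since the \B code was
not quoted.

```python
import re

def is_word(ch):
    # Stand-in for SRE_IS_WORD(): true for a \w character.
    return re.match(r'\w', ch) is not None

def at_boundary(s, pos):
    # Lines 366-367: an empty string never has a boundary.
    if len(s) == 0:
        return False
    # Lines 368-371: outside the string counts as non-word.
    thatp = is_word(s[pos - 1]) if pos > 0 else False
    thisp = is_word(s[pos]) if pos < len(s) else False
    # Line 372: a boundary is where the two sides differ.
    return thisp != thatp

def model_positions(s):
    # Every offset, including len(s), where the model reports \b.
    return [p for p in range(len(s) + 1) if at_boundary(s, p)]

def re_positions(s):
    # The same offsets according to the re module itself.
    return [m.start() for m in re.finditer(r'\b', s)]

for s in ['', 'foo', ' foo ', 'a b', '--']:
    print(repr(s), model_positions(s) == re_positions(s))
```

If the model is right, each line printed ends in True, the empty
string included.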

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10713>
_______________________________________

