Regular expression and substitution, unexpected duplication

MRAB python at mrabarnett.plus.com
Wed Aug 19 00:08:46 CEST 2015


On 2015-08-18 22:42, Laurent Pointal wrote:
> Hello,
>
> I want to make a replacement in a string, to ensure that ellipses are
> surrounded by spaces (this is not a typographical problem, but a preparation
> for later text chunking).
>
> I tried with regular expressions and the SRE_Pattern.sub() method, but I
> have an unexpected duplication of the replacement pattern:
>
>
> The code:
>
> ellipfind_re = re.compile(r"((?=\.\.\.)|…)", re.IGNORECASE|re.VERBOSE)
> ellipfind_re.sub(' ... ',
>         "C'est un essai... avec différents caractères… pour voir.")
>
> And I retrieve:
>
> "C'est un essai ... ... avec différents caractères ...  pour voir."
>                      ^^^
>
> I tested with/without group capture, same result.
>
> My Python version:
> Python 3.4.3 (default, Mar 26 2015, 22:03:40)
> [GCC 4.9.2] on linux
>
> Any idea?
>
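(?=...) is a lookahead, not a non-capture group; a non-capture group is
(?:...). A lookahead matches without consuming any characters, so the
substitution inserts ' ... ' in front of the three dots and leaves the dots
themselves in place. That is the duplication you're seeing.

The regex should be r"((?:\.\.\.)|…)", which can be simplified to just
r"\.\.\.|…" for your use-case. (You don't need the
re.IGNORECASE|re.VERBOSE either: the pattern contains no letters, and no
whitespace or comments.)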
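
As a minimal sketch of the difference, using the sample string from the
question (the variable names are only illustrative, and the last pattern is
an optional variant not taken from the original post):

```python
import re

text = "C'est un essai... avec différents caractères… pour voir."

# Original pattern: (?=\.\.\.) is a zero-width lookahead, so the three dots
# are never consumed; the replacement is inserted in front of them.
lookahead_re = re.compile(r"((?=\.\.\.)|…)")
broken = lookahead_re.sub(" ... ", text)

# Corrected pattern: plain alternation, the dots are matched and replaced.
ellipsis_re = re.compile(r"\.\.\.|…")
fixed = ellipsis_re.sub(" ... ", text)

# Optional variant (an assumption about what is wanted): also absorb any
# whitespace already around the ellipsis, so no double spaces are produced.
padded_re = re.compile(r"\s*(?:\.\.\.|…)\s*")
normalized = padded_re.sub(" ... ", text)

print(broken)
print(fixed)
print(normalized)
```

Note that the corrected pattern still leaves two spaces after each ellipsis
in this sample, because the input already has a space there; the last
variant is one possible way to avoid that if it matters for the chunking.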


