[New-bugs-announce] [issue25760] TextWrapper fails to split 'two-and-a-half-hour' correctly

Samwyse report at bugs.python.org
Sat Nov 28 21:07:24 EST 2015

New submission from Samwyse:

Single-character words in a hyphenated phrase are not split correctly. The root issue is the wordsep_re class variable. To reproduce, run the following:

>>> import textwrap
>>> textwrap.TextWrapper.wordsep_re.split('two-and-a-half-hour')
['', 'two-', 'and-a', '-half-', 'hour']

It works if 'a' is replaced with two or more alphabetic characters:

>>> textwrap.TextWrapper.wordsep_re.split('two-and-aa-half-hour')
['', 'two-', '', 'and-', '', 'aa-', '', 'half-', 'hour']

The problem is in this part of the pattern:  (?=\w+[^0-9\W])
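A minimal sketch that isolates the quoted lookahead from the rest of wordsep_re. The helper name `succeeds` and the sample strings are illustrative only, not part of textwrap. `(?=\w+[^0-9\W])` asks for one or more word characters followed by a word character that is not a digit 0-9, so at least two word characters must follow the hyphen. The lone 'a' in 'a-half-hour' cannot satisfy it, while `(?=\w)` can.

```python
import re

# The lookahead quoted in the report: "\w+" followed by one more word
# character that is not a digit 0-9, so at least two word characters
# must be present for it to succeed.
report_lookahead = re.compile(r'(?=\w+[^0-9\W])')

# The simpler lookahead suggested in the report: any single word character.
simple_lookahead = re.compile(r'(?=\w)')


def succeeds(pattern, text):
    """Return True if the zero-width lookahead matches at the start of text."""
    return pattern.match(text) is not None


# Each string is what follows a hyphen in the examples from the report.
for rest in ['a-half-hour', 'aa-half-hour', 'half-hour']:
    print(rest, succeeds(report_lookahead, rest), succeeds(simple_lookahead, rest))
```

Run as written, this prints False for the quoted lookahead and True for `(?=\w)` on 'a-half-hour', and True for both lookaheads on the other two strings.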

I confess that I don't understand the situation that would require such a complicated pattern. Why wouldn't (?=\w) work?
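For context, the sketch below rebuilds the whole pattern so the proposed lookahead can be tried in place. The full regular expression is an assumption: it is reconstructed from Lib/textwrap.py in the affected versions rather than taken from this report, and the helper `make_wordsep` is hypothetical. It does reproduce both splits shown above.

```python
import re


def make_wordsep(lookahead):
    """Build a wordsep_re-style splitter using the given hyphen lookahead.

    ASSUMPTION: the surrounding pattern is a reconstruction of
    TextWrapper.wordsep_re; only the lookahead is quoted in the report.
    """
    return re.compile(
        r'(\s+|'                                   # any whitespace
        r'[^\s\w]*\w+[^0-9\W]-' + lookahead + '|'  # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')    # em-dash


original = make_wordsep(r'(?=\w+[^0-9\W])')
print(original.split('two-and-a-half-hour'))
# -> ['', 'two-', 'and-a', '-half-', 'hour']
print(original.split('two-and-aa-half-hour'))
# -> ['', 'two-', '', 'and-', '', 'aa-', '', 'half-', 'hour']

# Swap in only the lookahead proposed in the report.
simpler = make_wordsep(r'(?=\w)')
print(simpler.split('two-and-a-half-hour'))
# -> ['', 'two-', '', 'and-', 'a', '-half-', 'hour']
```

Under that assumption, swapping in `(?=\w)` alone lets 'and-' split off, but 'a' still comes out as a separate fragment followed by '-half-', because the hyphenated-words branch also requires `\w+[^0-9\W]`, i.e. at least two word characters, immediately before the hyphen.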

components: Library (Lib)
messages: 255558
nosy: samwyse
priority: normal
severity: normal
status: open
title: TextWrapper fails to split 'two-and-a-half-hour' correctly
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4

Python tracker <report at bugs.python.org>
