I just noticed the textwrap module in the standard library will break and line-wrap hyphenated words given the opportunity:
>>> from textwrap import wrap
>>> wrap('yaba daba-doo', width=10)
['yaba daba-', 'doo']
I have two questions about that:
1) Wouldn't it be worth mentioning this in the Python Library Reference (or is it just too obvious)?
2) Wouldn't it be useful to have a simple way to turn it off? Something like:
>>> from textwrap import wrap
>>> wrap('yaba daba-doo', width=10, break_hyphenated_words=False)
['yaba', 'daba-doo']
Since proper line-wrapping of hyphenated words is language-dependent and can interact with other orthographic and typesetting practices, I think it would be nicer to have a documented way to turn it off completely.
Granted, it's not hard to manually do either; on Python 2.5.2 (as well as on Python 2.6 r62386), it's just a matter of setting the "TextWrapper.wordsep_re" attribute to "re.compile('(\s+)')"... I think having a publicly documented attribute wouldn't hurt anyway.
-- Sylvain <syfou@users.sourceforge.net>
The IBM 2250 is impressive ... if you compare it with a system selling for a tenth its price. -- D. Cohen
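The workaround described above can be sketched roughly as follows. This is only an illustration, assuming that TextWrapper splits its input with the "wordsep_re" attribute as the message says; that attribute is not a documented interface and may behave differently between versions. The last call uses the "break_on_hyphens" option that later Python releases document for TextWrapper; it is not something proposed in this thread.

```python
import re
import textwrap

text = 'yaba daba-doo'

# Default behaviour: the hyphenated word is split after the hyphen.
default_lines = textwrap.wrap(text, width=10)

# Workaround from the message: split on whitespace only by overriding
# the (undocumented) wordsep_re attribute on a single instance.
wrapper = textwrap.TextWrapper(width=10)
wrapper.wordsep_re = re.compile(r'(\s+)')
patched_lines = wrapper.wrap(text)

# Later Python releases document a break_on_hyphens option that has the
# same effect without touching the regular expression.
option_lines = textwrap.wrap(text, width=10, break_on_hyphens=False)

print(default_lines)
print(patched_lines)
print(option_lines)
```

Setting the attribute on an instance rather than on the class keeps the change local, so other users of textwrap in the same process are not affected.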
On Fri, Apr 18, 2008 at 11:18 PM, Sylvain Fourmanoit <syfou@users.sourceforge.net> wrote:
> I just noticed the textwrap module in the standard library will break and line-wrap hyphenated words given the opportunity:
> >>> from textwrap import wrap
> >>> wrap('yaba daba-doo', width=10)
> ['yaba daba-', 'doo']
> I have two questions about that:
> 1) Wouldn't it be worth mentioning this in the Python Library Reference (or is it just too obvious)?
I think it is obvious, but I am sure a patch against the docs mentioning this would be welcome.
> 2) Wouldn't it be useful to have a simple way to turn it off? Something like:
> >>> from textwrap import wrap
> >>> wrap('yaba daba-doo', width=10, break_hyphenated_words=False)
> ['yaba', 'daba-doo']
I personally don't think so, as you could easily walk the list and concatenate the hyphenated words. So -0 from me. And if you do pursue this, you might want to come up with a shorter keyword argument name.
-Brett
> Since proper line-wrapping of hyphenated words is language-dependent and can interact with other orthographic and typesetting practices, I think it would be nicer to have a documented way to turn it off completely.
> Granted, it's not hard to manually do either; on Python 2.5.2 (as well as on Python 2.6 r62386), it's just a matter of setting the "TextWrapper.wordsep_re" attribute to "re.compile('(\s+)')"... I think having a publicly documented attribute wouldn't hurt anyway.
> -- Sylvain <syfou@users.sourceforge.net>
> The IBM 2250 is impressive ... if you compare it with a system selling for a tenth its price. -- D. Cohen
> _______________________________________________
> stdlib-sig mailing list
> stdlib-sig@python.org
> http://mail.python.org/mailman/listinfo/stdlib-sig
On Saturday, 19 April 2008 at 00:08 -0700, Brett Cannon wrote:
> On Fri, Apr 18, 2008 at 11:18 PM, Sylvain Fourmanoit <syfou@users.sourceforge.net> wrote:
>> I just noticed the textwrap module in the standard library will break and line-wrap hyphenated words given the opportunity:
>> >>> from textwrap import wrap
>> >>> wrap('yaba daba-doo', width=10)
>> ['yaba daba-', 'doo']
> [...]
> I personally don't think so as you could easily just walk the list and just concatenate the hyphenated words.
But then the words wouldn't be wrapped properly, would they? In the above example, if you join the two strings together, the result is more than 10 chars long.
I think this feature makes sense, and doesn't really clutter the API.
In the meantime, a workaround is to use other Unicode hyphens (*) in order to get the desired result, e.g.:
>>> import textwrap
>>> print(" | ".join(textwrap.wrap('yaba daba-doo', width=10)))
yaba daba- | doo
>>> print(" | ".join(textwrap.wrap('yaba daba\u2010doo', width=10)))
yaba | daba‐doo
(*) http://www.fileformat.info/info/unicode/char/00ad/index.htm
    http://www.fileformat.info/info/unicode/char/2010/index.htm
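The objection to walking the list and concatenating the pieces can be checked with a small sketch. The helper name rejoin_hyphen_breaks is hypothetical and only illustrates one naive reading of that suggestion; it treats every line ending in '-' as a broken word, which is an assumption and not something textwrap guarantees.

```python
import textwrap


def rejoin_hyphen_breaks(lines):
    """Hypothetical helper: glue a line ending in '-' onto the next line."""
    merged = []
    for line in lines:
        if merged and merged[-1].endswith('-'):
            # Naive assumption: a trailing hyphen always marks a broken word.
            merged[-1] += line
        else:
            merged.append(line)
    return merged


width = 10
lines = textwrap.wrap('yaba daba-doo', width=width)
rejoined = rejoin_hyphen_breaks(lines)

# Lines that no longer fit in the requested width after rejoining.
too_long = [line for line in rejoined if len(line) > width]

print(lines)
print(rejoined)
print(too_long)
```

Rejoining after the fact restores the word but leaves a 13-character line for a requested width of 10, so the wrapping decision has to be made inside the wrapper and not patched up afterwards.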
participants (3): Antoine Pitrou, Brett Cannon, Sylvain Fourmanoit