Where to put wrap_text()?
Hidden away in distutils.fancy_getopt is an exceedingly handy function called wrap_text(). It does just what you might expect from the name:

    def wrap_text (text, width):
        """wrap_text(text : string, width : int) -> [string]

        Split 'text' into multiple lines of no more than 'width' characters
        each, and return the list of strings that results.
        """

Surprise surprise, Optik uses this. I've never been terribly happy about importing it from distutils.fancy_getopt, and putting Optik into the standard library as OptionParser is a great opportunity for putting wrap_text somewhere more sensible.

I happen to think that wrap_text() is useful for more than just auto-formatting --help messages, so hiding it away in OptionParser.py doesn't seem right. Also, Perl has a Text::Wrap module that's been part of the standard library for not-quite-forever -- so shouldn't Python have one too?

Proposal: a new standard library module, wrap_text, which combines the best of distutils.fancy_getopt.wrap_text() and Text::Wrap. Right now, I'm thinking of an interface something like this:

    wrap(text : string, width : int) -> [string]

        Split 'text' into multiple lines of no more than 'width' characters each, and return the list of strings that results. Tabs in 'text' are expanded with string.expandtabs(), and all other whitespace characters (including newline) are converted to space.

        [This is identical to distutils.fancy_getopt.wrap_text(), but the docstring is more complete.]

    wrap_nomunge(text : string, width : int) -> [string]

        Same as wrap(), without munging whitespace.

        [Not sure if this is really useful to expose publicly. Opinions?]

    fill(text : string, width : int, initial_tab : string = "", subsequent_tab : string = "") -> string

        Reformat the paragraph in 'text' to fit in lines of no more than 'width' columns. The first line is prefixed with 'initial_tab', and subsequent lines are prefixed with 'subsequent_tab'; the lengths of the tab strings are accounted for when wrapping lines to fit in 'width' columns.

        [This is just a glorified "\n".join(wrap(...)); the idea to add initial_tab and subsequent_tab was stolen from Perl's Text::Wrap.]

I'll go whip up some code and submit a patch to SF. If people like it, I'll even write some tests and documentation too.

Greg
--
Greg Ward - Unix nerd gward@python.net http://starship.python.net/~gward/
Support bacteria -- it's the only culture some people have!
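For readers who want to try the shape of this interface, here is a minimal sketch using the `textwrap` module found in present-day Python. Two assumptions to keep in mind: that module spells the prefix parameters `initial_indent` and `subsequent_indent` (not `initial_tab` and `subsequent_tab` as in the proposal above), and the sample text and the "* " prefix are made up for illustration.

```python
import textwrap

text = ("Split 'text' into multiple lines of no more than 'width' "
        "characters each, and return the list of strings that results.")

# wrap() returns a list of lines, none longer than `width`.
lines = textwrap.wrap(text, width=40)

# fill() returns a single string; the prefix lengths count toward `width`.
para = textwrap.fill(text, width=40,
                     initial_indent="* ", subsequent_indent="  ")
print(para)
```

Without the prefixes, `fill()` behaves like `"\n".join(wrap(...))`, which matches the "glorified join" remark in the proposal.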
On Sat, Jun 01, 2002, Greg Ward wrote:
Proposal: a new standard library module, wrap_text, which combines the best of distutils.fancy_getopt.wrap_text() and Text::Wrap.
Personally, I'd like to at least get the functionality of some versions of 'fmt', which have both goal and maxlength parameters. If you feel like getting ambitious, there's the 'par' program that can wrap quoted text, but that can always be added to a later version of the library. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In the end, outside of spy agencies, people are far too trusting and willing to help." --Ira Winkler
Proposal: a new standard library module, wrap_text, which combines the best of distutils.fancy_getopt.wrap_text() and Text::Wrap.
I think this is a fine idea. But *please* don't put an underscore in the name. I'd say "wrap" or "wraptext" are better than "wrap_text". --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Proposal: a new standard library module, wrap_text, which combines the best of distutils.fancy_getopt.wrap_text() and Text::Wrap.
I think this is a fine idea. But *please* don't put an underscore in the name. I'd say "wrap" or "wraptext" are better than "wrap_text".
Some possibilities are:

* a string method
* a UserString method
* a new module text, with a function wrap()
* add function wrap() to UserString

Should it work on unicode strings too?

Neal
Some possibilities are:
* a string method * a UserString method
This should *definitely* not be a method. Too specialized, too many possibilities for tweaking the algorithm.
* a new module text, with a function wrap() * add function wrap() to UserString
Should it work on unicode strings too?
Yes. --Guido van Rossum (home page: http://www.python.org/~guido/)
"GW" == Greg Ward <gward@python.net> writes:
    GW> Proposal: a new standard library module, wrap_text, which
    GW> combines the best of distutils.fancy_getopt.wrap_text() and
    GW> Text::Wrap. Right now, I'm thinking of an interface something
    GW> like this:

You might consider a text package with submodules for various wrapping algorithms. The text package might even grow other functionality later too. I say this because in Mailman I also have a wrap() function (big surprise, eh?) that implements the Python FAQ wizard rules for wrapping:

    def wrap(text, column=70, honor_leading_ws=1):
        """Wrap and fill the text to the specified column.

        Wrapping is always in effect, although if it is not possible to wrap a
        line (because some word is longer than `column' characters) the line is
        broken at the next available whitespace boundary. Paragraphs are also
        always filled, unless honor_leading_ws is true and the line begins with
        whitespace. This is the algorithm that the Python FAQ wizard uses, and
        seems like a good compromise.
        """

There's nothing at all Mailman specific about it, so I wouldn't mind donating it to the standard library.

-Barry
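The rule in Barry's docstring can be sketched roughly as follows. This is not Mailman's code: `wrap_paragraphs` is a hypothetical name, paragraphs are assumed to be separated by blank lines, and the filling is delegated to the present-day `textwrap` module with `break_long_words=False` so that an overlong word is left whole.

```python
import textwrap

def wrap_paragraphs(text, column=70, honor_leading_ws=True):
    """Fill each blank-line-separated paragraph to `column` columns.

    When honor_leading_ws is true, a line that begins with whitespace is
    passed through unchanged (one possible reading of the docstring above).
    """
    def flush(pending, result):
        # Fill the lines collected so far as a single run of text.
        if pending:
            result.append(textwrap.fill(" ".join(pending), column,
                                        break_long_words=False))
            del pending[:]

    out = []
    for para in text.split("\n\n"):
        pending, result = [], []
        for line in para.split("\n"):
            if honor_leading_ws and line[:1].isspace():
                flush(pending, result)
                result.append(line)      # keep the indented line as it is
            else:
                pending.append(line)
        flush(pending, result)
        out.append("\n".join(result))
    return "\n\n".join(out)

sample = "one two three\nfour five\n    indented stays\nsix"
print(wrap_paragraphs(sample, column=12))
```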
[Greg Ward]
Proposal: a new standard library module, wrap_text, which combines the best of distutils.fancy_getopt.wrap_text() and Text::Wrap.
[Aahz]
Personally, I'd like to at least get the functionality of some versions of 'fmt'
[Guido van Rossum]
I think this is a fine idea. But *please* don't put an underscore in the name. I'd say "wrap" or "wraptext" are better than "wrap_text".
One thing that I would love to have available in Python is a function able to wrap text using Knuth's filling algorithm. GNU `fmt' does it, and it is _so_ better than dumb refilling, in my eyes at least, that I managed so Emacs own filling algorithm is short-circuited with an external call (I do not mind the small fraction of a second it takes). Also, is there some existing module in which `wraptext' would fit nicely? That might be better than creating a new module for not many functions. -- François Pinard http://www.iro.umontreal.ca/~pinard
On Sat, Jun 01, 2002, François Pinard wrote:
Also, is there some existing module in which `wraptext' would fit nicely? That might be better than creating a new module for not many functions.
I'd prefer to create a package called 'text', with wrap being a module inside it. That way, as we add parsing (e.g. mxTextTools) and other features to the standard library, they can be stuck in the package. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In the end, outside of spy agencies, people are far too trusting and willing to help." --Ira Winkler
On 01 June 2002, François Pinard said:
One thing that I would love to have available in Python is a function able to wrap text using Knuth's filling algorithm. GNU `fmt' does it, and it is _so_ better than dumb refilling, in my eyes at least, that I managed so Emacs own filling algorithm is short-circuited with an external call (I do not mind the small fraction of a second it takes).
Damn, I had no idea there was a body of computer science (however small) devoted to the art of filling text. Trust Knuth to be there first. Do you have a reference for this algorithm apart from GNU fmt's source code? Google'ing for "knuth text fill algorithm" was unhelpful, ditto with s/fill/wrap/. Anyways, despite being warned just today on the conceptual/philosophical danger of classes whose names end in "-er" [1], I'm leaning towards a TextWrapper class, so that everyone may impose their desires through subclassing. I'll start with my simple naive text-wrapping algorithm, and then we can see who wants to contribute fancy/clever algorithms to the pot.
Also, is there some existing module in which `wraptext' would fit nicely? That might be better than creating a new module for not many functions.
Not if it grows to accommodate Optik/OptionParser, Mailman, regrtest, etc.

Greg

[1] objects should *be*, not *do*, and class names like HelpFormatter and TextWrapper are impositions of procedural abstraction onto OOP. It's something to be aware of, but still a useful idiom (IMHO).
--
Greg Ward - Unix geek gward@python.net http://starship.python.net/~gward/
No problem is so formidable that you can't just walk away from it.
[Greg Ward]
Damn, I had no idea there was a body of computer science (however small) devoted to the art of filling text.
I take it you don't spend much time surveying the range of computer science literature <wink>.
Trust Knuth to be there first. Do you have a reference for this algorithm apart from GNU fmt's source code? Google'ing for "knuth text fill algorithm" was unhelpful, ditto with s/fill/wrap/.
Search for Knuth hyphenation instead. Three months later, the best advice you'll have read is to avoid hyphenation entirely. But then you're stuck fighting snaky little rivers of vertical whitespace without the biggest gun in the arsenal. Avoid right justification entirely too, and let the whitespace fall where it may. Doing justification with fixed-width fonts is like juggling dirt anyway <wink>.
Anyways, despite being warned just today on the conceptual/philosophical danger of classes whose names end in "-er" [1], I'm leaning towards a TextWrapper class, so that everyone may impose their desires through subclassing.
LOL! Resolved, that the world would be a better place if all classes ended with "-ist".
On 01 June 2002, Tim Peters said:
I take it you don't spend much time surveying the range of computer science literature <wink>.
*snort* I went to grad school in CS. Wasn't that enough? ;->
Search for Knuth hyphenation instead. Three months later, the best advice you'll have read is to avoid hyphenation entirely.
I have no desire to put auto-hyphenation into the Python standard library -- isn't the whole world trying to get *away* from (natural) language-specific code? It's wonderful that Knuth came up with the algorithm, and even more wonderful that Andrew implemented it for us in Python. My wrapping algorithm respects hyphens according to the English-language conventions I learned in school, augmented by my peculiar need to handle strings like "-b" and "--file". But that's all I need.
Doing justification with fixed-width fonts is like juggling dirt anyway <wink>.
Don't worry, I have even less intention of going there. Greg -- Greg Ward - Linux nerd gward@python.net http://starship.python.net/~gward/ Never put off till tomorrow what you can put off till the day after tomorrow.
[Greg Ward]
Do you have a reference for this algorithm apart from GNU fmt's source code?
Surely not handy. I heard about it, and others even more capable, many years ago. If I remember well, Knuth's algorithm plays by moving line cuts and optimising a global function through dynamic programming, giving more points, say, when punctuation coincides with end of lines, removing points when a single letter words appear at end of lines, and such thing. So lines are not guaranteed to be as filled as possible, but the overall appearance of the paragraph gets better, sometimes much better. I'm Cc:ing Ross Paterson, who wrote GNU `fmt', in hope he could shed some light about references, or otherwise. Some filling algorithms used by typographers (or so I heard) are even careful about dismantling vertical or diagonal (aliased) white lines which sometimes build up across paragraphs by the effect of dumbier filling.
I'm leaning towards a TextWrapper class, so that everyone may impose their desires through subclassing.
Distutils experience speaking here? :-) By the way, I would like if the module was not named `text'. I use `text' all over in my programs already as a common variable name, as a way to not use `string' for a common variable name, for obvious reasons. Granted that `string' is progressively becoming available again :-). Maybe Python should try to not name modules with likely to be use-everywhere local variables.
Not if it grows to accommodate Optik/OptionParser, Mailman, regrtest, etc.
At some places in my things, I have unusual wrapping/filling needs, and wonder if they could all fit in a generic scheme. An interesting question and exercise, surely. -- François Pinard http://www.iro.umontreal.ca/~pinard
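The idea recalled earlier in this message (moving line cuts and optimising a global function through dynamic programming) can be illustrated with a far simpler cost than TeX or GNU fmt use. The sketch below is a toy under stated assumptions: the name `wrap_min_raggedness` and the squared-trailing-gap cost are chosen purely for illustration, the last line is not penalised, and punctuation bonuses and hyphenation are ignored. It is not Knuth's algorithm.

```python
def wrap_min_raggedness(text, width):
    """Choose line breaks minimising the sum of squared trailing gaps."""
    words = text.split()
    n = len(words)
    inf = float("inf")
    best = [inf] * n + [0.0]   # best[i]: minimal cost of laying out words[i:]
    nxt = [n] * (n + 1)        # nxt[i]: index of the first word on the next line
    for i in range(n - 1, -1, -1):
        length = -1            # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break          # too long (a lone overlong word is tolerated)
            gap = max(width - length, 0)
            cost = 0 if j == n - 1 else gap * gap
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines

print(wrap_min_raggedness("aaa bb cc ddddd", 6))
# -> ['aaa', 'bb cc', 'ddddd']
```

A greedy wrapper would produce "aaa bb", "cc", "ddddd" for the same input: its first line is fuller, but the paragraph as a whole is more ragged, which is the trade-off described above.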
Do you have a reference for this algorithm apart from GNU fmt's source code?
Can we focus on getting the module/package structure and a basic algorithm first? It's fine to design the structure for easy extensibility with other algorithms, but implementing Knuth's algorithm seems hopelessly out of scope. Even Emacs' fill-paragraph is too fancy-schmancy for my taste (for inclusion as a Python standard library). Simply breaking lines at a certain limit is all that's needed. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido van Rossum]
Do you have a reference for this algorithm apart from GNU fmt's source code?
Can we focus on getting the module/package structure and a basic algorithm first? It's fine to design the structure for easy extensibility with other algorithms, but implementing Knuth's algorithm seems hopelessly out of scope.
There is no emergency for Knuth's algorithm, of course. However, if I mentioned it, this was as an invitation for the package to be designed with an opened mind about extensibility. And it usually helps opening the mind, pondering various avenues. What should we read in your "hopelessly out of scope" comment, above? Do you mean you would object beforehand that Python offers it?

-- François Pinard http://www.iro.umontreal.ca/~pinard
On Sun, Jun 02, 2002 at 09:09:57AM -0400, François Pinard wrote:
years ago. If I remember well, Knuth's algorithm plays by moving line cuts and optimising a global function through dynamic programming, giving more points, say, when punctuation coincides with end of lines, ...
If that's the same algorithm that's used by TeX, see http://www.amk.ca/python/code/tex_wrap.html . --amk
greg wrote:
Damn, I had no idea there was a body of computer science (however small) devoted to the art of filling text. Trust Knuth to be there first. Do you have a reference for this algorithm apart from GNU fmt's source code? Google'ing for "knuth text fill algorithm" was unhelpful, ditto with s/fill/wrap/.
http://www.amk.ca/python/code/tex_wrap.html
Also, is there some existing module in which `wraptext' would fit nicely? That might be better than creating a new module for not many functions.
"string" (yes, I'm serious). </F>
Greg Ward <gward@python.net>:
despite being warned just today on the conceptual/philosophical danger of classes whose names end in "-er" [1]
[1] objects should *be*, not *do*, and class names like HelpFormatter and TextWrapper are impositions of procedural abstraction onto OOP.
I disagree with this statement completely. Surely the concept of objects *doing* things is central to the whole idea of OO! Why do you think objects have things called "methods"?-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Hi, people. For this incoming text wrapper facility, there is a feature that appears really essential to me, and many others: the protection of full stops[1]. In a previous message, I spoke of Knuth's algorithm as a nice possibility, but this is merely whipped cream and cherry over the ice cream. Protection of full stops does not fall in that decoration category, it is essential. I mean, for those who care, a wrapper without full stop protection would be rather unusable when there is more than one sentence to refill. ---------- [1] Full stops are punctuation ending sentences with two spaces guaranteed. Full stops are defined that way for typography based on fixed width fonts, like when we say "this many characters to a line". -- François Pinard http://www.iro.umontreal.ca/~pinard
On Tue, Jun 04, 2002 at 09:29:51AM -0400, François Pinard wrote:
[1] Full stops are punctuation ending sentences with two spaces guaranteed. Full stops are defined that way for typography based on fixed width fonts, like when we say "this many characters to a line".
I don't think this really matters, because I doubt anyone will be implementing full justification. Left justification is just a matter of inserting newlines at particular points, so if the input data has two spaces after punctuation, line-breaking won't introduce any errors. --amk
[Andrew Kuchling]
On Tue, Jun 04, 2002 at 09:29:51AM -0400, François Pinard wrote:
[1] Full stops are punctuation ending sentences with two spaces guaranteed. Full stops are defined that way for typography based on fixed width fonts, like when we say "this many characters to a line".
I don't think this really matters, because I doubt anyone will be implementing full justification.
This is an orthogonal matter, unrelated to full stops. Simultaneous left and right justification for fixed fonts texts is _not_ to be praised[1]. The real goal of any typographical device, like wrapping, is improving the legibility of text. Maybe simultaneous left and right justification is more "good looking", some would even say "beautiful", but I think it is considered well known that such simultaneous justification significantly decreases legibility for fixed width fonts. If a typographical device aims at beauty instead of legibility, it misses the real goal.
Left justification is just a matter of inserting newlines at particular points, so if the input data has two spaces after punctuation, line-breaking won't introduce any errors.
Excellent if it could be done exactly this way. However, things are not always that simple. If a newline is inserted at some point for wrapping purposes, it is desirable and usual to remove what was whitespace around that point, so we do not have unwelcome spaces at start of the beginning line, or spurious trailing whitespace at end of the previous line. If the wrapping device otherwise replaces sequences of many spaces by one, it should be careful at replacing many space by two, in context of full stops. ---------- [1] I think, shudder and horror, that `man' does simultaneous left and right justification when producing ASCII pages, this is especially bad since `man' is about documentation to start with. Of course, when generating pages for laser printers, with proportional fonts and micro-spacing, things are pretty different, and _then_ simultaneous left and right justification makes sense for legibility, if kept within reasonable bounds of course. I'm almost sure that all of us have seen dubious and unreasonable usages. -- François Pinard http://www.iro.umontreal.ca/~pinard
Excellent if it could be done exactly this way. However, things are not always that simple. If a newline is inserted at some point for wrapping purposes, it is desirable and usual to remove what was whitespace around that point, so we do not have unwelcome spaces at start of the beginning line, or spurious trailing whitespace at end of the previous line. If the wrapping device otherwise replaces sequences of many spaces by one, it should be careful at replacing many space by two, in context of full stops.
Emacs does it this way because you reformat the same paragraph over and over. The downside is that sometimes a line is shorter than it could be because it would end in a period. For what we're doing here (producing tidy output) I prefer not to do the Emacs fiddling. --Guido van Rossum (home page: http://www.python.org/~guido/)
On 04 June 2002, François Pinard said:
Hi, people.
For this incoming text wrapper facility, there is a feature that appears really essential to me, and many others: the protection of full stops[1].
If you mean reformatting this:

    """
    This is a sentence ending.
    If we convert each newline to a single space, there won't be enough
    space after that period.
    """

to this:

    """
    This is a sentence ending.  If we convert each newline to a single
    space, there won't be enough space after that period.
    """

then my wrapping algorithm handles it. However, it's currently limited to English, because it relies on string.lowercase to detect sentence ending periods -- this needs to be fixed, but I was going to post the code and let someone who understands locales tell me what to do. ;-)

Greg
--
Greg Ward - programmer-at-big gward@python.net http://starship.python.net/~gward/
"... but in the town it was well known that when they got home their fat and psychopathic wives would thrash them to within inches of their lives ..."
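The behaviour described in this message can be tried with the `fix_sentence_endings` option of present-day Python's `textwrap` module. That this option corresponds to the code Greg had at the time is an assumption; as the message itself warns, the detection is a lowercase-letter heuristic geared to English.

```python
import textwrap

text = ("This is a sentence ending.\n"
        "If we convert each newline to a single space, there won't be\n"
        "enough space after that period.")

# Default: the newline after "ending." becomes one space.
plain = textwrap.fill(text, width=70)

# With fix_sentence_endings=True, a lowercase letter followed by a
# period and whitespace is treated as a sentence end and gets two spaces.
fixed = textwrap.fill(text, width=70, fix_sentence_endings=True)
print(fixed)
```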
[Greg Ward, on wrapping text]
...
Note that regrtest.py also has a wrapper:

    def printlist(x, width=70, indent=4):
        """Print the elements of a sequence to stdout.

        Optional arg width (default 70) is the maximum line length.
        Optional arg indent (default 4) is the number of blanks with which to
        begin each line.
        """

This kind of thing gets reinvented too often, so +1 on a module from me. Just make sure it handle the union of all possible desires, but has a simple and intuitive interface <wink>.
    Tim> Note that regrtest.py also has a wrapper:

Me too...

    def wrap(s, col=74, startcol=0, hangindent=0):
        """Insert newlines into 's' so it doesn't extend past 'col'.

        All lines are indented to 'startcol'. The indentation of the first
        line is adjusted further by hangindent.
        """

I guess everybody has one of these laying about... I'll be happy to dump mine once something mostly equivalent is available. I love to throw out code.

Skip
Another place for a text wrap function would be part of pprint. I agree with M. Pinard that it doesn't deserve an entire module. RE seems a little off-task for formatting text. PPRINT seems more closely related to the core problem. And it leaves room for adding additional formatting and pretty-printing features. Perhaps a small class hierarchy with different wrapping algorithms (filling and justifying, no filling, etc.) --- Tim Peters <tim.one@comcast.net> wrote:
[Greg Ward, on wrapping text]
...
Note that regrtest.py also has a wrapper:
def printlist(x, width=70, indent=4): """Print the elements of a sequence to stdout.
Optional arg width (default 70) is the maximum line length. Optional arg indent (default 4) is the number of blanks with which to begin each line. """
This kind of thing gets reinvented too often, so +1 on a module from me. Just make sure it handle the union of all possible desires, but has a simple and intuitive interface <wink>.
===== -- S. Lott, CCP :-{) S_LOTT@YAHOO.COM http://www.mindspring.com/~slott1 Buccaneer #468: KaDiMa Macintosh user: drinking upstream from the herd.
On 01 June 2002, Tim Peters said:
[Greg Ward, on wrapping text]
...
Note that regrtest.py also has a wrapper:
def printlist(x, width=70, indent=4): """Print the elements of a sequence to stdout.
Optional arg width (default 70) is the maximum line length. Optional arg indent (default 4) is the number of blanks with which to begin each line. """
I think this one will probably stand; I've gotten to the point with my text-wrapping code where I'm reimplementing the various other text-wrappers people have mentioned on top of it, and regrtest.printlist() is just not a good fit. It's for printing lists compactly, not for filling text. Whatever.
Just make sure it handle the union of all possible desires, but has a simple and intuitive interface <wink>.
Right. Gotcha. Code coming up soon. Greg -- Greg Ward - Unix weenie gward@python.net http://starship.python.net/~gward/ Quick!! Act as if nothing has happened!
[Tim]
Note that regrtest.py also has a wrapper:
def printlist(x, width=70, indent=4): """Print the elements of a sequence to stdout.
Optional arg width (default 70) is the maximum line length. Optional arg indent (default 4) is the number of blanks with which to begin each line. """
[Greg Ward]
I think this one will probably stand; I've gotten to the point with my text-wrapping code where I'm reimplementing the various other text-wrappers people have mentioned on top of it, and regrtest.printlist() is just not a good fit. It's for printing lists compactly, not for filling text. Whatever.
regrtest's printlist is trivial to implement on top of the code you posted:

    def printlist(x, width=70, indent=4):
        guts = map(str, x)
        blanks = ' ' * indent
        w = textwrap.TextWrapper()
        print w.fill(' '.join(guts), width, blanks, blanks)

TextWrapper certainly doesn't have to worry about changing the list into a string; all I want is that it wrap a string, and it does.
Just make sure it handle the union of all possible desires, but has a simple and intuitive interface <wink>.
Right. Gotcha. Code coming up soon.
It's no more than 10x more elaborate than necessary, so ship it <wink>.
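The printlist rewrite in this message targets the code Greg had posted at the time and uses the Python 2 print statement. A rough equivalent for present-day Python might look like the sketch below; it assumes the current `textwrap` module, where the prefixes are passed as `initial_indent` and `subsequent_indent`, and it adds a hypothetical helper `format_list` so the result can be inspected without capturing stdout.

```python
import textwrap

def format_list(x, width=70, indent=4):
    """Return the elements of a sequence as one wrapped, indented string."""
    blanks = " " * indent
    return textwrap.fill(" ".join(map(str, x)), width,
                         initial_indent=blanks, subsequent_indent=blanks)

def printlist(x, width=70, indent=4):
    """Print the elements of a sequence to stdout, wrapped to `width`."""
    print(format_list(x, width, indent))

printlist(range(30), width=20)
```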
participants (13)

- Aahz
- akuchlin@mems-exchange.org
- Andrew Kuchling
- barry@zope.com
- Fredrik Lundh
- Greg Ewing
- Greg Ward
- Guido van Rossum
- Neal Norwitz
- pinard@iro.umontreal.ca
- Skip Montanaro
- Steven Lott
- Tim Peters