How to find the best solution?

Tim Chase python.list at
Tue Mar 23 18:31:09 CET 2010

Johny wrote:
> I have a text and would like to split it into smaller parts,
> say 100 characters each. But if the 100th character is not a
> blank (i.e. it falls inside a word), the part must be shorter than
> 100 characters. That means a word itself cannot be split.
> These smaller parts must contain only whole (not split) words.
> I was thinking about a regex but do not know how to find the correct
> regular expression.

While I suspect you can come close with a regular expression:

   import re, random
   size = 100
   r = re.compile(r'.{1,%i}\b' % size)
   # generate a random text string with a mix of word-lengths
   words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
   data = ' '.join(random.choice(words) for _ in range(200))
   # for each chunk of 100 characters (or fewer
   # if on a word-boundary), do something
   for bit in r.finditer(data):
     chunk = bit.group(0)  # the full text matched by the pattern
     print("%i: [%s]" % (len(chunk), chunk))

it may have an EOF fencepost error, so you might have to clean up 
the last item.  My simple test seemed to show it worked without 
cleanup though.
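
For anyone who wants to check the end-of-text case rather than trust a simple test, here is a minimal sketch of a variant, not the pattern above. The function name split_on_words and the lookahead-based pattern are my own assumptions for illustration. It starts each chunk on a non-blank character and ends it only where whitespace or the end of the text follows, so trailing punctuation on the last word should be kept. It assumes no single word is longer than the size limit; a longer word would lose its leading characters.

```python
import re

def split_on_words(text, size=100):
    """Return chunks of at most `size` characters that never split a word.

    Hypothetical helper: assumes every word in `text` is at most
    `size` characters long.
    """
    # \S          : begin each chunk on a non-whitespace character
    # .{0,size-1} : then take up to size-1 more characters (greedy)
    # (?=\s|\Z)   : but stop only where whitespace or end-of-text follows
    pattern = re.compile(r'\S.{0,%d}(?=\s|\Z)' % (size - 1), re.DOTALL)
    return pattern.findall(text)

# Deterministic sample text with a mix of word lengths
words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
sample = ' '.join(words[i % len(words)] for i in range(200))

chunks = split_on_words(sample, size=100)
for chunk in chunks:
    print("%i: [%s]" % (len(chunk), chunk))
```

If regular expressions are not a requirement, the standard library's textwrap.wrap(text, width=100, break_long_words=False) is another option, with the caveat that by default it also normalizes whitespace.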

