How to find the best solution?
Tim Chase
python.list at tim.thechases.com
Tue Mar 23 13:31:09 EDT 2010
Johny wrote:
> I have a text and would like to split the text into smaller parts,
> say into 100 characters each. But if the 100th character is not a
> blank (but part of a word), the part must be shorter than 100
> characters. That means the word itself cannot be split.
> These smaller parts must contain only whole (not split) words.
> I was thinking about RegEx but do not know how to find the correct
> Regular Expression.
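For reference, the requirement as stated (chunks of at most 100 characters, no word split) is also what the standard library's textwrap module does. This is a minimal sketch, not part of the original reply. `split_words` is a hypothetical helper name, and it assumes words are whitespace-separated. With `break_long_words=False`, a single word longer than the limit stays whole on its own over-long line rather than being split. `break_on_hyphens=False` keeps hyphenated words together.

```python
import textwrap


def split_words(text, size=100):
    """Split text into chunks of at most `size` characters without
    breaking any word (hypothetical helper; assumes no single word is
    longer than `size`)."""
    # break_long_words=False: never cut inside a word, even an over-long one.
    # break_on_hyphens=False: treat "well-known" as one unsplittable word.
    return textwrap.wrap(text, width=size,
                         break_long_words=False,
                         break_on_hyphens=False)


sample = "the quick brown fox jumps over the lazy dog " * 10
for chunk in split_words(sample, 30):
    print(len(chunk), repr(chunk))
```

Unlike the regular-expression approach below, `textwrap.wrap` drops whitespace at the start and end of each chunk. By default it also replaces tabs and newlines with spaces.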
While I suspect you can come close with a regular expression:
import random
import re

size = 100
# greedy: up to `size` characters, backing off until the match ends on a \b
r = re.compile(r'.{1,%i}\b' % size)
# generate a random text string with a mix of word-lengths
words = ['a', 'an', 'the', 'four', 'fives', 'sixsix']
data = ' '.join(random.choice(words) for _ in range(200))
# for each chunk of at most 100 characters (fewer when needed
# so that the chunk ends on a word boundary), do something
for bit in r.finditer(data):
    chunk = bit.group(0)
    print("%i: [%s]" % (len(chunk), chunk))
it may have an end-of-string fencepost error, so you might have to
clean up the last item. My simple test seemed to show it worked
without any cleanup, though.
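The fencepost concern can be made concrete. `\b` can only match at the very end of the string when the text ends in a word character. Random words joined by spaces always do, which would explain why the test above needed no cleanup. Text ending in "." or a trailing space loses that tail, however. What follows is a minimal sketch of one possible cleanup, not the original poster's code. `regex_chunks` is a hypothetical helper, `re.DOTALL` is an addition not in the original (it lets `.` match newlines), and the sketch assumes no word or run of spaces/punctuation is longer than `size`.

```python
import re


def regex_chunks(text, size=100):
    """Split `text` with the pattern from the post, then clean up the end.

    Hypothetical helper; assumes no word (or run of spaces/punctuation)
    is longer than `size`.
    """
    # re.DOTALL (not in the original) lets '.' match newlines as well.
    pattern = re.compile(r'.{1,%d}\b' % size, re.DOTALL)
    chunks = []
    end = 0
    for m in pattern.finditer(text):
        chunks.append(m.group(0))
        end = m.end()
    # Fencepost clean-up: \b cannot match at the end of the string when the
    # text ends in a non-word character ("." or a space), so that tail is
    # never matched. Re-attach it instead of silently dropping it.
    tail = text[end:]
    if tail:
        if chunks and len(chunks[-1]) + len(tail) <= size:
            chunks[-1] += tail
        else:
            chunks.append(tail)
    return chunks


text = "The quick brown fox jumps over the lazy dog."
print(re.findall(r'.{1,20}\b', text))
# ['The quick brown fox ', 'jumps over the lazy ', 'dog']   (the "." is lost)
print(regex_chunks(text, 20))
# ['The quick brown fox ', 'jumps over the lazy ', 'dog.']
```

Note that chunks from this pattern keep whatever whitespace falls at their edges (here a trailing space), so you may still want to `strip()` each one.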
-tkc
More information about the Python-list mailing list