split string into multi-character "letters"

Jussi Piitulainen jpiitula at ling.helsinki.fi
Wed Aug 25 16:05:39 EDT 2010


Jed writes:

> alphabet = ['a','b','c','ch','d','u','r','rr','o'] #this would
> include the whole alphabet but I shortened it here
> theword = 'churro'
> 
> I would like to split the string 'churro' into a list containing:
> 
> 'ch','u','rr','o'

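re.findall returns all non-overlapping matches, scanning left to right.
Alternatives are tried in order, so the two-character letters have to
come before '.', which then catches any remaining single character
(anything except a newline, by default):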

  >>> import re
  >>> re.findall('ch|ll|rr|.', 'churro')
  ['ch', 'u', 'rr', 'o']
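
If the pattern should come from Jed's alphabet list rather than being typed by hand, something along these lines might do. It is only a sketch: split_letters is a made-up name, and it assumes that the longest letter should always win. Sorting by length puts 'ch' and 'rr' ahead of 'c' and 'r', and re.escape guards against letters that happen to be regex metacharacters.

```python
import re

def split_letters(word, alphabet):
    # Hypothetical helper, not part of the original reply.
    # Longest letters first: alternation takes the first alternative
    # that matches at a position, not the longest one.
    ordered = sorted(alphabet, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(letter) for letter in ordered))
    return pattern.findall(word)

alphabet = ['a', 'b', 'c', 'ch', 'd', 'u', 'r', 'rr', 'o']
print(split_letters('churro', alphabet))
```

Unlike the version with '.', this one silently skips any character that is not in the alphabet, which may or may not be what is wanted.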


