split string into multi-character "letters"
Jed
jedmeltzer at gmail.com
Wed Aug 25 15:46:25 EDT 2010
Hi, I'm seeking help with a fairly simple string processing task.
I've simplified what I'm actually doing into a hypothetical
equivalent.
Suppose I want to take a word in Spanish, and divide it into
individual letters. The problem is that there are a few 2-character
combinations that are considered single letters in Spanish - for
example 'ch', 'll', 'rr'.
Suppose I have:
# this would include the whole alphabet, but I shortened it here
alphabet = ['a','b','c','ch','d','u','r','rr','o']
theword = 'churro'
I would like to split the string 'churro' into a list containing:
'ch','u','rr','o'
So at each letter I want to look ahead and see if it can be combined
with the next letter to make a single 'letter' of the Spanish
alphabet. I think this could be done with a regular expression, for
example by passing the list called "alphabet" to re.match(), but I'm
not sure how to use the contents of a whole list as the search pattern
in a regular expression, or if it's even possible. My real application
is a bit more complex than the Spanish alphabet so I'm looking for a
fairly general solution.
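To make the idea concrete, here is a rough sketch of what I imagine the regular-expression version might look like. It is only an illustration, not a tested solution for my real data: the function name split_letters is made up, and it assumes that trying the longer "letters" first is enough to get the look-ahead behaviour described above. It joins the list into one alternation pattern and uses re.escape() in case a "letter" contains characters that are special in a regular expression.

```python
import re

# Shortened alphabet from the example above.
alphabet = ['a', 'b', 'c', 'ch', 'd', 'u', 'r', 'rr', 'o']

def split_letters(word, letters):
    """Split word into the multi-character letters listed in letters.

    Hypothetical helper for illustration only.
    """
    # Put longer letters first so 'ch' is tried before 'c' at each position.
    ordered = sorted(letters, key=len, reverse=True)
    # Escape each letter, then join them into one alternation: ch|rr|a|b|...
    pattern = re.compile('|'.join(re.escape(s) for s in ordered))
    # The pattern has no groups, so findall returns the matched substrings.
    return pattern.findall(word)

print(split_letters('churro', alphabet))  # ['ch', 'u', 'rr', 'o']
```

One caveat with this sketch: findall() silently skips any character that is not in the list, so a real version would probably need to check for that.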
Thanks,
Jed
More information about the Python-list
mailing list