string / split method on ASCII code?

Steven D'Aprano steve at
Thu Mar 13 03:54:53 CET 2008

Sorry for breaking threading by replying to a reply, but I don't seem to 
have the original post.

On Wed, 2008-03-12 at 15:29 -0500, Michael Wieher wrote:
> Hey all,
> I have these annoying textfiles that are delimited by the ASCII char
> for << (only it's a single character) and >> (again a single character)
> Their codes are 174 and 175, respectively.
> My datafiles are in the moronic form
> X<<Y>>Z

The glyph that looks like "<<" is a left quote in some European countries 
(and a right quote in others, sigh...), and similarly for ">>". They are 
usually known as left and right "angle quotation marks", chevrons or 
guillemets. And yes, that certainly looks like a moronic form for a data 
file.

But whatever the characters are, we can work with them as normal, if you 
don't mind ignoring that they don't display properly everywhere:

>>> lq = chr(174)
>>> rq = chr(175)
>>> s = "x" + lq + "y" + rq + "z"
>>> print s
>>> s.split(lq)
['x', 'y\xafz']
>>> s.split(rq)
['x\xaey', 'z']

And you can use regular expressions as well. Assuming that the quotes are 
never nested:

>>> import re
>>> r = re.compile(lq + '(.*?)' + rq)
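
As a rough sketch of how such a pattern might be used (assuming, as above, 
that the quotes are never nested), findall() returns whatever sits between 
each left/right pair. The names s2, inner and inner2 are only illustrative, 
and re.escape() is not strictly needed for these two characters; it is 
included in case the delimiters ever change to something the regex engine 
treats specially.

```python
import re

lq = chr(174)  # left delimiter, same code as in the session above
rq = chr(175)  # right delimiter

s = "x" + lq + "y" + rq + "z"

# Non-greedy group: capture the shortest run between an lq and the next rq.
r = re.compile(re.escape(lq) + "(.*?)" + re.escape(rq))

inner = r.findall(s)  # text found between each delimiter pair

# Hypothetical record with two delimited fields, to show repeated matches.
s2 = "a" + lq + "b" + rq + "c" + lq + "d" + rq + "e"
inner2 = r.findall(s2)
```

Note that "." does not match a newline by default, so a field spanning 
several lines would need the re.DOTALL flag.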

If you want to treat both characters the same:

>>> s = s.replace(lq, rq)
>>> s.split(rq)
['x', 'y', 'z']
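
An alternative to the replace-then-split trick, sketched here on the 
assumption that both delimiters really are interchangeable, is to split on 
a character class containing both of them with re.split(). The names 
either and fields are made up for the example.

```python
import re

lq = chr(174)  # left delimiter
rq = chr(175)  # right delimiter

s = "x" + lq + "y" + rq + "z"

# Character class matching either delimiter; escape() guards against
# delimiters that happen to be special inside a regex.
either = re.compile("[" + re.escape(lq) + re.escape(rq) + "]")

fields = either.split(s)  # one pass, original string left untouched
```

This leaves s unmodified, which may matter if you still need to tell the 
left and right delimiters apart later.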

