Attempting to parse free-form ANSI text.
Paul McGuire
ptmcg at austin.rr._bogus_.com
Sun Oct 22 20:44:25 EDT 2006
"Michael B. Trausch" <"mike$#at^&nospam!%trauschus"> wrote in message
news:GsGdnTIYc-lXaafYnZ2dnUVZ_sCdnZ2d at comcast.com...
> Alright... I am attempting to find a way to parse ANSI text from a
> telnet application. However, I am experiencing a bit of trouble.
>
> What I want to do is have all ANSI sequences _removed_ from the output,
> save for those that manage color codes or text presentation (in short,
> the ones that are ESC[#m (with additional #s separated by ; characters).
> The ones that are left, the ones that are the color codes, I want to
> act on, and remove from the text stream, and display the text.
>
Here is a pyparsing-based scanner/converter, along with some test code at
the end. It handles partial escape sequences (an incomplete sequence at the
end of the input is returned so it can be prepended to the next chunk), and
strips any sequence of the form
"<ESC>[##;##;...<alpha>" unless the trailing letter is 'm'.
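For comparison, the classification rule just described can be sketched with Python's standard `re` module instead of pyparsing. This is not the code from this post. `classify_sequence` and `CSI_SEQUENCE` are hypothetical names, and the pattern assumes the same shape as the pyparsing grammar (ESC, '[', optional ';'-separated integers, one ASCII letter).

```python
import re

# Hypothetical pattern approximating the pyparsing grammar with re:
# ESC '[' , optional ';'-separated integers, then exactly one ASCII letter.
CSI_SEQUENCE = re.compile(r"\x1b\[([0-9]+(?:;[0-9]+)*)?([A-Za-z])")


def classify_sequence(seq):
    """Classify one complete escape sequence (hypothetical helper).

    Returns ("color", [ints]) for ESC[...m, ("drop", None) for any other
    final letter, or None if seq is not a complete sequence of this form.
    """
    match = CSI_SEQUENCE.fullmatch(seq)
    if match is None:
        return None
    params, final = match.groups()
    if final == "m":
        # A bare ESC[m has no parameters, so return an empty list for it.
        return ("color", [int(p) for p in params.split(";")] if params else [])
    return ("drop", None)


print(classify_sequence("\x1b[10;12m"))  # ('color', [10, 12])
print(classify_sequence("\x1b[3;4h"))    # ('drop', None)
print(classify_sequence("\x1b[m"))       # ('color', [])
print(classify_sequence("\x1b[7"))       # None (incomplete sequence)
```

One design difference: the post's colorCode wraps the parameters in Combine(), which joins the tokens after delimitedList has suppressed the ';' separators, so "10;12" comes out as '1012' in the output. Splitting on ';' keeps the parameters as separate integers.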
The pyparsing project wiki is at http://pyparsing.wikispaces.com.
-- Paul
from pyparsing import *
ESC = chr(27)
escIntro = Literal(ESC + '[').suppress()
integer = Word(nums)
colorCode = Combine(escIntro +
                    Optional(delimitedList(integer, delim=';')) +
                    Suppress('m')).setResultsName("colorCode")
# define search pattern that will match non-color ANSI command
# codes - these will just get dropped on the floor
otherAnsiCode = Suppress(Combine(escIntro +
                                 Optional(delimitedList(integer, delim=';')) +
                                 oneOf(list(alphas))))
partialAnsiCode = Combine(Literal(ESC) +
                          Optional('[') +
                          Optional(delimitedList(integer, delim=';') +
                                   Optional(';')) +
                          StringEnd()).setResultsName("partialCode")
ansiSearchPattern = colorCode | otherAnsiCode | partialAnsiCode
# preserve tabs in incoming text
ansiSearchPattern.parseWithTabs()
def processInputString(inputString):
    lastEnd = 0
    for t, start, end in ansiSearchPattern.scanString(inputString):
        # pass inputString[lastEnd:start] to wxTextControl - font styles
        # would have been set by a parse action (none is defined here)
        print(inputString[lastEnd:start])
        # process color codes, if any:
        if t.getName() == "colorCode":
            if t:
                print("<change color attributes to %s>" % t.asList())
            else:
                print("<empty color sequence detected>")
        elif t.getName() == "partialCode":
            print("<found partial escape sequence %s, tack it on front of "
                  "next>" % t)
            # return partial code, to be prepended to the next string
            # sent to processInputString
            return t[0]
        else:
            # other kind of ANSI code found, do nothing
            pass
        lastEnd = end
    # pass inputString[lastEnd:] to wxTextControl - this is the last bit
    # of the input string after the last escape sequence
    print(inputString[lastEnd:])
test = """\
This is a test string containing some ANSI sequences.
Sequence 1: ~[10;12m
Sequence 2: ~[3;4h
Sequence 3: ~[4;5m
Sequence 4; ~[m
Sequence 5; ~[24HNo more escape sequences.
~[7""".replace('~',chr(27))
leftOver = processInputString(test)
Prints:
This is a test string containing some ANSI sequences.
Sequence 1:
<change color attributes to ['1012']>
Sequence 2:
Sequence 3:
<change color attributes to ['45']>
Sequence 4;
<change color attributes to ['']>
Sequence 5;
No more escape sequences.
<found partial escape sequence ['\x1b[7'], tack it on front of next>
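The comments in processInputString say the returned partial code should be prepended to the next string passed in. The following is a stdlib-only sketch of that carry-over loop, using `re` in place of pyparsing. `process_chunk`, `feed`, and both patterns are hypothetical; the patterns are an approximation of the post's colorCode/otherAnsiCode/partialAnsiCode definitions, not a drop-in replacement.

```python
import re

# Complete sequence: ESC '[' , optional ';'-separated integers, one letter.
CSI_SEQUENCE = re.compile(r"\x1b\[([0-9]+(?:;[0-9]+)*)?([A-Za-z])")
# Incomplete sequence at the very end of a chunk: ESC, optional '[',
# optional integers with an optional trailing ';' (like partialAnsiCode).
PARTIAL_AT_END = re.compile(r"\x1b(?:\[(?:[0-9]+(?:;[0-9]+)*;?)?)?\Z")


def process_chunk(chunk):
    """Split one chunk into events and return (events, leftover).

    events holds ("text", str) and ("color", params_str) items; any other
    complete sequence is dropped. leftover is a trailing partial sequence
    to prepend to the next chunk ("" if there is none).
    """
    leftover = ""
    partial = PARTIAL_AT_END.search(chunk)
    if partial:
        chunk, leftover = chunk[:partial.start()], partial.group(0)
    events = []
    pos = 0
    for match in CSI_SEQUENCE.finditer(chunk):
        if match.start() > pos:
            events.append(("text", chunk[pos:match.start()]))
        if match.group(2) == "m":
            # A bare ESC[m has no parameters; report them as "".
            events.append(("color", match.group(1) or ""))
        pos = match.end()  # non-'m' sequences are skipped here
    if pos < len(chunk):
        events.append(("text", chunk[pos:]))
    return events, leftover


def feed(chunks):
    """Process chunks in order, carrying any partial sequence forward."""
    leftover = ""
    all_events = []
    for chunk in chunks:
        events, leftover = process_chunk(leftover + chunk)
        all_events.extend(events)
    return all_events, leftover


# ESC[1;31m is split across two reads; ESC[2J is a non-color code.
events, rest = feed(["red: \x1b[1;", "31mhot\x1b[2Jdone"])
print(events)
# [('text', 'red: '), ('color', '1;31'), ('text', 'hot'), ('text', 'done')]
print(repr(rest))  # ''
```

Only a sequence sitting at the very end of a chunk is held back; anything that is still incomplete in the middle of a chunk is passed through as ordinary text, which is a simplification.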