Attempting to parse free-form ANSI text.

Paul McGuire ptmcg at austin.rr._bogus_.com
Sun Oct 22 20:44:25 EDT 2006


"Michael B. Trausch" <"mike$#at^&nospam!%trauschus"> wrote in message 
news:GsGdnTIYc-lXaafYnZ2dnUVZ_sCdnZ2d at comcast.com...
> Alright... I am attempting to find a way to parse ANSI text from a
> telnet application.  However, I am experiencing a bit of trouble.
>
> What I want to do is have all ANSI sequences _removed_ from the output,
> save for those that manage color codes or text presentation (in short,
> the ones of the form ESC[#m, with additional #s separated by ; characters).
> The ones that are left, the color codes, I want to act on and remove
> from the text stream, and then display the text.
>
Here is a pyparsing-based scanner/converter, with some test code at the
end. It handles partial escape sequences (an incomplete sequence at the end
of the input is returned, so it can be prepended to the next chunk of text),
and strips any sequence of the form
"<ESC>[##;##;...<alpha>" unless the trailing alpha is 'm'.
The pyparsing project wiki is at http://pyparsing.wikispaces.com.
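For readers without pyparsing, the same three-way classification (color codes kept, other command codes dropped, a trailing partial sequence handed back) can be sketched with the standard-library re module. This is a swapped-in regex technique, not the pyparsing code below; the function name split_ansi and its (pieces, leftover) return value are hypothetical, and it only covers the "<ESC>[##;##;...<alpha>" form described above.

```python
import re

# A complete sequence: ESC [ optional ';'-separated integers, then one letter.
COMPLETE = re.compile(r"\x1b\[([0-9;]*)([A-Za-z])")
# An escape sequence cut off at the very end of the buffer, e.g. "\x1b[7".
PARTIAL = re.compile(r"\x1b(?:\[[0-9;]*)?\Z")


def split_ansi(text):
    """Split text into ("text", str) and ("color", [int, ...]) pieces.

    Sequences whose final letter is not 'm' are dropped.
    Returns (pieces, leftover); leftover is an incomplete trailing escape
    sequence to prepend to the next chunk ("" if there is none).
    """
    leftover = ""
    match = PARTIAL.search(text)
    if match:
        leftover = text[match.start():]
        text = text[:match.start()]

    pieces = []
    pos = 0
    for m in COMPLETE.finditer(text):
        if m.start() > pos:
            pieces.append(("text", text[pos:m.start()]))
        params, final = m.groups()
        if final == "m":
            # empty fields count as 0, so ESC[m reads as [0] (reset)
            codes = [int(p) if p else 0 for p in params.split(";")]
            pieces.append(("color", codes))
        pos = m.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces, leftover
```

Feeding a telnet stream in chunks then amounts to calling split_ansi(leftover + chunk) on each new chunk, mirroring how the post suggests prepending the return value of processInputString.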

-- Paul

from pyparsing import *

ESC = chr(27)
escIntro = Literal(ESC + '[').suppress()
integer = Word(nums)

# color/text-attribute codes: <ESC>[##;##;...m
colorCode = Combine(escIntro +
                Optional(delimitedList(integer,delim=';')) +
                Suppress('m')).setResultsName("colorCode")

# define search pattern that will match non-color ANSI command
# codes - these will just get dropped on the floor
otherAnsiCode = Suppress( Combine(escIntro +
                            Optional(delimitedList(integer,delim=';')) +
                            oneOf(list(alphas)) ) )

# escape sequence cut off at the very end of the input
partialAnsiCode = Combine(Literal(ESC) +
                    Optional('[') +
                    Optional(delimitedList(integer,delim=';') + Optional(';')) +
                    StringEnd()).setResultsName("partialCode")

# order matters: colorCode must be tried before otherAnsiCode,
# since otherAnsiCode would also match a trailing 'm'
ansiSearchPattern = colorCode | otherAnsiCode | partialAnsiCode


# preserve tabs in incoming text
ansiSearchPattern.parseWithTabs()

def processInputString(inputString):
    lastEnd = 0
    for t,start,end in ansiSearchPattern.scanString( inputString ):
        # pass inputString[lastEnd:start] to wxTextControl, using the
        # font style set by the most recent color code
        print(inputString[lastEnd:start])

        # process color codes, if any:
        if t.getName() == "colorCode":
            if t:
                print("<change color attributes to %s>" % t.asList())
            else:
                print("<empty color sequence detected>")
        elif t.getName() == "partialCode":
            print("<found partial escape sequence %s, tack it on front of next>" % t)
            # return partial code, to be prepended to the next string
            # sent to processInputString
            return t[0]
        else:
            # other kind of ANSI code found, do nothing
            pass

        lastEnd = end

    # pass inputString[lastEnd:] to wxTextControl - this is the last bit
    # of the input string after the last escape sequence
    print(inputString[lastEnd:])
    # no partial sequence left over, so there is nothing to prepend
    return ""


test = """\
This is a test string containing some ANSI sequences.
Sequence 1: ~[10;12m
Sequence 2: ~[3;4h
Sequence 3: ~[4;5m
Sequence 4; ~[m
Sequence 5; ~[24HNo more escape sequences.
~[7""".replace('~',chr(27))

leftOver = processInputString(test)


Prints:
This is a test string containing some ANSI sequences.
Sequence 1:
<change color attributes to ['1012']>

Sequence 2:

Sequence 3:
<change color attributes to ['45']>

Sequence 4;
<change color attributes to ['']>

Sequence 5;
No more escape sequences.

<found partial escape sequence ['\x1b[7'], tack it on front of next>
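Note that in the printed output, Combine has joined the parameters of ESC[10;12m into '1012' and those of ESC[4;5m into '45', so the individual attribute numbers are lost. To act on the codes they need to stay separate, for example by splitting the raw text between "ESC[" and "m" on ';'. The sketch below assumes that raw parameter string is available and maps a common subset of SGR (Select Graphic Rendition) codes onto a dictionary of display attributes; the function apply_sgr and the attribute names are hypothetical, and codes it does not recognize (such as 10 and 12, which select fonts, or 5, blink) are ignored.

```python
# SGR color numbers 0-7 in standard ANSI order (30+n foreground, 40+n background)
COLOR_NAMES = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]

# attribute state after a reset (ESC[0m or ESC[m)
DEFAULT_STATE = {"bold": False, "underline": False, "reverse": False,
                 "fg": None, "bg": None}


def apply_sgr(state, params):
    """Return a new attribute dict after applying ESC[<params>m to state.

    params is the raw text between "ESC[" and "m", e.g. "1;31" or "".
    Empty fields count as 0, so ESC[m acts as a reset.
    Only a common subset of SGR codes is handled; others are ignored.
    """
    new = dict(state)
    for field in params.split(";"):
        code = int(field) if field else 0
        if code == 0:
            new = dict(DEFAULT_STATE)
        elif code == 1:
            new["bold"] = True
        elif code == 4:
            new["underline"] = True
        elif code == 7:
            new["reverse"] = True
        elif code == 22:
            new["bold"] = False       # normal intensity
        elif code == 24:
            new["underline"] = False
        elif code == 27:
            new["reverse"] = False
        elif 30 <= code <= 37:
            new["fg"] = COLOR_NAMES[code - 30]
        elif code == 39:
            new["fg"] = None          # default foreground
        elif 40 <= code <= 47:
            new["bg"] = COLOR_NAMES[code - 40]
        elif code == 49:
            new["bg"] = None          # default background
    return new


state = apply_sgr(DEFAULT_STATE, "1;31")   # bold red text
state = apply_sgr(state, "44")             # ... on a blue background
```

Returning a fresh dict instead of mutating the old one keeps each text run's style independent, which makes it easy to store alongside the text passed to the display widget.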





More information about the Python-list mailing list