[Tutor] strings & splitting
Alan Gauld
alan.gauld at freenet.co.uk
Wed Jan 25 23:44:07 CET 2006
Liam wrote:
------------------------
Alan - the data is of the form -
a = {
b = 1
c = 2
d = { e = { f = 4 g = "Ultimate Showdown of Ultimate Destiny" } h =
{ i j k } }
}
Everything is whitespace delimited. I'd like to turn it into
["a", "=", "{", "b", "=", "1",
"c", "=", "2", "d", "=", "{",
"e", "=", "{", "f", "=", "4", "g", "=",
"\"Ultimate Showdown of Ultimate Destiny\"",
"}", "h", "=", "{", "i", "j", "k", "}", "}"]
-----------------------
Liam,
I'd probably tackle this on a character-by-character basis using
traditional tokenising code, i.e. build a state machine to determine
what kind of token I'm in and keep reading chars until the token
completes. In your sample most tokens are single-char tokens; the
others are quoted tokens. Set the token type appropriately and read
till end-of-token.
I might even use some classes to define the token types,
but I'd keep them as simple as possible.
Rough pseudo code:
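tokens = []      # finished tokens
token = []       # chars of the token being built
state = None     # None (between tokens), 'quoted' or 'simple'
for char in tokenString:
    if state == 'quoted':
        token.append(char)
        if char == '"':             # closing quote completes the token
            tokens.append(''.join(token))
            token, state = [], None
    elif state == 'simple':
        if char in '\n\t ':         # whitespace completes the token
            tokens.append(''.join(token))
            token, state = [], None
        else:
            token.append(char)
    else:   # not in a token
        if char in '\n\t ,.':       # other non token chars
            continue
        elif char == '"':
            state = 'quoted'
        else:
            state = 'simple'
        token.append(char)
if token:                           # flush a token that runs to end of input
    tokens.append(''.join(token))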
You can tidy that up with functions and a proper state jump table,
but even as it stands it might be faster than trying to build complex
pattern matches and doing lots of insertions into lists etc. It does
rely on the data being as simple as your sample in the variety of
token types.
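
For what it's worth, here is one way the "functions plus state jump
table" tidy-up might look. Treat it as a sketch, not a finished parser:
the state names, the handler functions and the tokenise() name are all
made up for illustration. It assumes tokens are separated by whitespace
only, that a quoted token runs to the next double quote, and that there
are no escaped quotes inside strings.

```python
WHITESPACE = " \t\n"

def between_tokens(char, token, tokens):
    """Not inside a token: skip whitespace or start a new token."""
    if char in WHITESPACE:
        return "between"
    token.append(char)
    return "quoted" if char == '"' else "simple"

def in_simple(char, token, tokens):
    """Inside an unquoted token: whitespace completes it."""
    if char in WHITESPACE:
        tokens.append("".join(token))
        token.clear()
        return "between"
    token.append(char)
    return "simple"

def in_quoted(char, token, tokens):
    """Inside a quoted token: the closing quote completes it."""
    token.append(char)
    if char == '"':
        tokens.append("".join(token))
        token.clear()
        return "between"
    return "quoted"

# The jump table: current state name -> function handling the next char.
# Each handler returns the name of the state to move to.
JUMP_TABLE = {
    "between": between_tokens,
    "simple": in_simple,
    "quoted": in_quoted,
}

def tokenise(text):
    tokens, token, state = [], [], "between"
    for char in text:
        state = JUMP_TABLE[state](char, token, tokens)
    if token:  # flush a token that runs to the end of the input
        tokens.append("".join(token))
    return tokens

sample = '''a = {
b = 1
c = 2
d = { e = { f = 4 g = "Ultimate Showdown of Ultimate Destiny" } h =
{ i j k } }
}'''
print(tokenise(sample))
```

Adding a new kind of token then means writing one more handler and
adding one more entry to the table, rather than growing the if/elif
chain inside the loop.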
HTH,
Alan G
Author of the learn to program web tutor
http://www.freenetpages.co.uk/hp/alan.gauld