Help needed with nested parsing of file into objects
richard
pullenjenna10 at gmail.com
Tue Jun 5 16:18:13 EDT 2012
On Jun 5, 8:50 pm, Eelco <hoogendoorn.ee... at gmail.com> wrote:
> > thank you both for your replies. Unfortunately it is a pre-existing
> > file format imposed by an external system that I can't
> > change. Thank you for the code snippet.
>
> Hi Richard,
>
> Despite the fact that it is a preexisting format, it is very close
> indeed to valid YAML code.
>
> Writing your own whitespace-aware parser can be a bit of a pain, but
> since YAML does this for you, I would argue the cleanest solution
> would be to bootstrap that functionality, rather than roll your own
> solution, or to resort to hard-to-maintain regex voodoo.
>
> Here is my solution. As a bonus, it directly constructs a custom
> object hierarchy (obviously you would want to expand on this, but the
> essentials are there). One caveat: at the moment, the conversion to
> YAML relies on the apparent convention that instances never directly
> contain other instances, and lists never directly contain lists. This
> means all instances are list entries and get a '-' prepended, and this
> just works. If this is not a general rule, you'd have to keep track of
> an enclosing scope stack and emit dashes based on that. Anyway, the
> idea is there, and I believe it to be one worth looking at.
>
> <code>
> import yaml
>
> class A(yaml.YAMLObject):
>     yaml_tag = u'!A'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'A' + str(self.__dict__)
>
> class B(yaml.YAMLObject):
>     yaml_tag = u'!B'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'B' + str(self.__dict__)
>
> class C(yaml.YAMLObject):
>     yaml_tag = u'!C'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'C' + str(self.__dict__)
>
> class TestArray(yaml.YAMLObject):
>     yaml_tag = u'!TestArray'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'TestArray' + str(self.__dict__)
>
> class myList(yaml.YAMLObject):
>     yaml_tag = u'!myList'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'myList' + str(self.__dict__)
>
> data = \
> """
> An instance of TestArray
> a=a
> b=b
> c=c
> List of 2 A elements:
> Instance of A element
> a=1
> b=2
> c=3
> Instance of A element
> d=1
> e=2
> f=3
> List of 1 B elements
> Instance of B element
> a=1
> b=2
> c=3
> List of 2 C elements
> Instance of C element
> a=1
> b=2
> c=3
> Instance of C element
> a=1
> b=2
> c=3
> An instance of TestArray
> a=1
> b=2
> c=3
> """.strip()
>
> # remove trailing whitespace and the seemingly erroneous colon in line 5
> lines = [' '+line.rstrip().rstrip(':') for line in data.split('\n')]
>
> def transform(lines):
>     """transform text line by line"""
>     for line in lines:
>         # regular mapping lines: 'key=value' becomes 'key: value'
>         if line.find('=') > 0:
>             yield line.replace('=', ': ')
>         # instance lines: emit a tagged list entry
>         p = line.find('nstance of')
>         if p > 0:
>             s = p + 11
>             e = line[s:].find(' ')
>             if e == -1: e = len(line[s:])
>             tag = line[s:s+e]
>             whitespace= line.partition(line.lstrip())[0]
>             yield whitespace[:-2]+' -'+ ' !'+tag
>         # list lines: emit a 'myList:' mapping key
>         p = line.find('List of')
>         if p > 0:
>             whitespace= line.partition(line.lstrip())[0]
>             yield whitespace[:-2]+' '+ 'myList:'
>
> ##transformed = (transform( lines))
> ##for i,t in enumerate(transformed):
> ##    print('{:>3}{}'.format(i,t))
>
> transformed = '\n'.join(transform( lines))
> print(transformed)
>
> # yaml.Loader is needed so the custom !A, !B, ... tags construct objects
> res = yaml.load(transformed, Loader=yaml.Loader)
> print(res)
> print(yaml.dump(res))
> </code>
Hi Eelco, many thanks for the reply and the solution; it definitely
looks like a clean way to go about it. However, I don't think installing
third-party libraries like yaml on the server is on the cards at the moment.
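
For reference, the "enclosing scope stack" idea mentioned above can be tried with the standard library alone. What follows is only a minimal sketch under stated assumptions, not a tested solution for the real file: it assumes nesting is expressed purely by leading spaces, that instances only appear at top level or inside lists, and that lists only appear inside instances. The names Node, parse and SAMPLE are hypothetical, and the indentation in SAMPLE is made up for illustration.

```python
import re

# Assumed line shapes: "An instance of X", "Instance of X element",
# "List of N X elements" (optionally ending in a colon), and "key=value".
INSTANCE_RE = re.compile(r'^(?:An\s+)?[Ii]nstance of (\w+)')
LIST_RE = re.compile(r'^List of \d+ (\w+) elements:?$')


class Node(object):
    """Generic instance: a kind name, its attributes, and its child lists."""

    def __init__(self, kind):
        self.kind = kind
        self.attrs = {}   # key -> value, both kept as strings
        self.lists = {}   # element kind -> list of Node

    def __repr__(self):
        return '%s(attrs=%r, lists=%r)' % (self.kind, self.attrs, self.lists)


def parse(text):
    """Build a list of top-level Node objects from space-indented text."""
    roots = []
    stack = []  # enclosing scopes as (indent, container); container is a Node or a list
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(' '))
        line = raw.strip()
        # Leave every scope that is not strictly shallower than this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None

        m = INSTANCE_RE.match(line)
        if m:
            node = Node(m.group(1))
            if parent is None:
                roots.append(node)
            elif isinstance(parent, list):
                parent.append(node)
            else:
                raise ValueError('instance directly inside an instance: %r' % raw)
            stack.append((indent, node))
            continue

        m = LIST_RE.match(line)
        if m:
            if not isinstance(parent, Node):
                raise ValueError('list outside of an instance: %r' % raw)
            items = []
            parent.lists[m.group(1)] = items
            stack.append((indent, items))
            continue

        if '=' in line and isinstance(parent, Node):
            key, _, value = line.partition('=')
            parent.attrs[key.strip()] = value.strip()
            continue

        raise ValueError('unrecognised line: %r' % raw)
    return roots


# Made-up sample; the indentation here is an assumption about the real format.
SAMPLE = """\
An instance of TestArray
 a=a
 List of 2 A elements:
  Instance of A element
   a=1
  Instance of A element
   d=1
An instance of TestArray
 a=1
"""

roots = parse(SAMPLE)
for root in roots:
    print(root)
```

The stack holds one entry per open scope, so a line's parent is simply the innermost scope with a smaller indent. If the real file uses tabs or allows instances nested directly inside instances, the indent calculation and the ValueError checks would need to change.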