Parsing file format to ensure file meets criteria

MRAB python at mrabarnett.plus.com
Thu Dec 17 20:42:56 EST 2009


seafoid wrote:
> Hi folks,
> 
> I am new to python and am having some trouble parsing a file.
> 
> I wish to parse a file and ensure that the format meets certain
> restrictions.
> 
> The file format is as below (abbreviated):
> 
> c this is a comment
> p wcnf 1468 817439 186181
> 286 32 0
> 186191 -198 -1098 0
> 186191 98 -1098 1123 0
> 
> Lines beginning c are comment lines and must precede all other lines.
> 
> Lines beginning p are header lines with the numbers being 'nvar', 'nclauses'
> and 'hard' respectively.
> 
> All other lines are clause lines. These must contain at least two integers
> followed by zero. There is no limit on the number of clause lines.
> 
> Header lines must precede clause lines.
> 
> In the above example:
> nvar = 1468
> nclauses = 817439
> hard = 186191
> 
> Now for the interesting part...........
> 
> The first number in a clause line = weight.
> All else are literals.
> Therefore, clause = weight + literals
> 
> weight <= hard
> |literal| > 0
> |literal| <= nvar
> number of clause lines = nclauses
> 
> My attempts thus far have been a dismal failure, computing is so viciously
> logical :confused:
> 
> My main problem is that below:
> 
> fname = raw_input('Please enter the name of the file: ')
> 
> z = open(fname, 'r')
> 
> z_list = [i.strip().split() for i in z]
> 
> # Here each line is converted to a list, all nested within a list. All
> # elements of the list are strings; even integers are converted to strings.
> 
> Question - how are nested lists indexed?
> 
A list is indexed by integers:

 >>> my_list = ['a', 'b', 'c']
 >>> my_list[0]
'a'

A list of lists requires 2 subscripts, one for the outer list and the
other for the inner list it selects:

 >>> my_list = [['a', 'b'], ['c', 'd']]
 >>> my_list[0]
['a', 'b']
 >>> my_list[0][1]
'b'

> I then attempted to extract the comment, headers and clauses from the nested
> list and assign them to a variable.
> 
> I tried:
> 
z_list is a list of lines, where each line is a list of words.

For example, if the file contains:

     c this is a comment
     p wcnf 1468 817439 186181

then z_list contains:

     [['c', 'this', 'is', 'a', 'comment'],
      ['p', 'wcnf', '1468', '817439', '186181']]

> for inner in z_list:
>     for lists in inner:
>         if lists[0] == 'c':
>             comment = lists[:]
>         elif lists[0] == 'p':
>             header = lists[:]
>         else:
>             clause = lists[:]
>         print comment, header, clause   
> 
> This does not work for some reasons which I understand. I have messed up the
> indexing and my assignment of variables is wrong.
> 
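The inner loop is the problem. 'for lists in inner' iterates over the
words of one line, so 'lists' is a single string and lists[0] is its
first character. One loop over z_list is enough. The print would also
raise NameError on the first line, because header and clause have not
been assigned yet. A minimal sketch of the sorting step, assuming you
want three separate lists (the names comments, headers and clauses are
my own, and the data is hard-coded here in place of the file):

```python
# Stand-in for the list built from the file; in the real program this
# would be z_list = [line.strip().split() for line in z].
z_list = [
    ['c', 'this', 'is', 'a', 'comment'],
    ['p', 'wcnf', '1468', '817439', '186181'],
    ['286', '32', '0'],
    ['186191', '-198', '-1098', '0'],
]

comments = []
headers = []
clauses = []

for words in z_list:
    if not words:
        # A blank line splits to an empty list; skip it.
        continue
    if words[0] == 'c':
        comments.append(words)
    elif words[0] == 'p':
        headers.append(words)
    else:
        clauses.append(words)
```

Appending to lists keeps every line of each kind; assigning to a single
variable, as in your version, would keep only the last one.
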
> The aim was to extract the headers and comments and then be left with a
> nested list of clauses.
> 
> Then I intended to convert the strings within the clauses nested list back
> to integers and via indexing, check that all conditions are met. This would
> have involved also converting the numerical strings within the header to
> integers but the actual strings are proving a difficult problem to ignore.
> 
> Any suggestions?
> 
> If my mistakes are irritatingly stupid, please feel free to advise that I
> r.t.f.m (read the f**king manual). However, thus far the manual has helped
> me little.
> 
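It may be simpler to check each line as it is read than to sort the
lines first and check them afterwards. What follows is only a sketch of
the rules as you describe them, not a definitive checker. The function
name check_wcnf is hypothetical, and I am assuming that blank lines are
ignored, that there is exactly one header line of the form
'p wcnf nvar nclauses hard', and that a clause needs a weight, at least
one literal and a closing 0. The header values in the sample are altered
so that the sample passes.

```python
def check_wcnf(lines):
    """Return a list of error messages; an empty list means no errors."""
    errors = []
    header_seen = False
    seen_non_comment = False
    clause_count = 0
    nvar = nclauses = hard = 0

    for lineno, line in enumerate(lines, 1):
        words = line.split()
        if not words:
            continue  # assumption: blank lines are ignored
        if words[0] == 'c':
            if seen_non_comment:
                errors.append('line %d: comment after other lines' % lineno)
            continue
        seen_non_comment = True
        if words[0] == 'p':
            if header_seen:
                errors.append('line %d: second header line' % lineno)
                continue
            try:
                if len(words) != 5 or words[1] != 'wcnf':
                    raise ValueError
                nvar, nclauses, hard = [int(w) for w in words[2:]]
            except ValueError:
                errors.append('line %d: malformed header' % lineno)
                return errors  # nothing else can be checked without it
            header_seen = True
            continue
        # Anything else is a clause line: weight, literals, then 0.
        try:
            numbers = [int(w) for w in words]
        except ValueError:
            errors.append('line %d: non-integer in clause' % lineno)
            continue
        clause_count += 1
        if not header_seen:
            errors.append('line %d: clause before header' % lineno)
            continue
        if len(numbers) < 3 or numbers[-1] != 0:
            errors.append('line %d: need weight, literal(s) and final 0' % lineno)
            continue
        weight, literals = numbers[0], numbers[1:-1]
        if weight > hard:
            errors.append('line %d: weight %d exceeds hard' % (lineno, weight))
        for lit in literals:
            if lit == 0 or abs(lit) > nvar:
                errors.append('line %d: literal %d out of range' % (lineno, lit))

    if not header_seen:
        errors.append('no header line found')
    elif clause_count != nclauses:
        errors.append('expected %d clauses, found %d' % (nclauses, clause_count))
    return errors


sample = """\
c this is a comment
p wcnf 1468 3 186191
286 32 0
186191 -198 -1098 0
186191 98 -1098 1123 0
""".splitlines()

print(check_wcnf(sample))
```

For a real file you could pass the open file object itself, for example
check_wcnf(open(fname)), since iterating over a file yields its lines.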