More Help with python .find function

Sat Jan 8 00:06:55 EST 2011

On Fri, Jan 7, 2011 at 8:43 PM, Keith Anthony <kanthony at woh.rr.com> wrote:
> My previous question asked how to read a file into a structure
> a line at a time.  Figured it out.  Now I'm trying to use .find
> to separate out the PDF objects.  (See code)  PROBLEM/QUESTION:
> My call to lines[i].find does NOT find all instances of endobj.
> Any help available?  Any insights?
>
> #!/usr/bin/python
>
> inputfile =  file('sample.pdf','rb')            # This is PDF with which we will work
> lines = inputfile.readlines()                   # read file one line at a time
>
> linestart = []                                  # Starting address for each line
> lineend = []                                    # Ending address for each line
> linetype = []
>
> print len(lines)                                # print number of lines
>
> i = 0                                           # define an iterator, i
> addr = 0                                        # and address pointer
>
> while i < len(lines):                           # Go through each line
>    linestart = linestart + [addr]
>    length = len(lines[i])
>    lineend = lineend + [addr + (length-1)]
>    addr = addr + length
>    i = i + 1
>
> i = 0
> while i < len(lines):                           # Initialize line types as normal
>    linetype = linetype + ['normal']
>    i = i + 1
>
> i = 0
> while i < len(lines):                           #
>    if lines[i].find(' obj') > 0:
>        linetype[i] = 'object'
>        print "At address ",linestart[i],"object found at line ",i,": ", lines[i]
>    if lines[i].find('endobj') > 0:
>        linetype[i] = 'endobj'
>        print "At address ",linestart[i],"endobj found at line ",i,": ", lines[i]
>    i = i + 1

Your code can be simplified significantly.
In particular:
- Don't grow a list by concatenating single-element lists. Use the
list.append() method instead.
- One seldom manually tracks counters like `i` in Python; use range()
or enumerate() instead.
- Lists support the * operator: multiplying a list by n gives the
concatenation of n copies of the list.

Revised version (untested obviously):

inputfile =  file('sample.pdf','rb')            # This is PDF with which we will work
lines = inputfile.readlines()                   # read file one line at a time

linestart = []                                  # Starting address for each line
lineend = []                                    # Ending address for each line
linetype = ['normal']*len(lines)

print len(lines)                                # print number of lines

addr = 0                                        # and address pointer

for line in lines: # Go through each line
   linestart.append(addr)
   length = len(line)
   lineend.append(addr + (length-1))
   addr += length

for i, line in enumerate(lines):
   if line.find(' obj') > 0:
       linetype[i] = 'object'
       print "At address ",linestart[i],"object found at line ",i,": ", line
   if line.find('endobj') > 0:
       linetype[i] = 'endobj'
       print "At address ",linestart[i],"endobj found at line ",i,": ", line

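As to the bug: I think you want "!= -1" rather than "> 0" for your
conditionals. str.find() returns -1 when the substring is absent and
the index of the first match otherwise; since Python list/string
indices are 0-based, a match at the very start of a line returns 0,
which "> 0" rejects. Any line that begins with endobj is therefore
missed.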
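
A minimal sketch of the corrected test, written so that it runs on both
Python 2 and Python 3. The sample lines are made up for illustration
(they are not taken from sample.pdf), and classify() is a hypothetical
helper name, not something from the original script:

```python
# Made-up sample lines standing in for the contents of a PDF file.
lines = ['1 0 obj\n', '<< /Type /Catalog >>\n', 'endobj\n', 'xref\n']

def classify(line):
    """Return 'endobj', 'object', or 'normal' for one line of text."""
    # str.find returns the index of the first match, or -1 if there is
    # none. A match at the very start of the line gives 0, so compare
    # against -1 rather than testing for "> 0".
    if line.find('endobj') != -1:
        return 'endobj'
    if line.find(' obj') != -1:
        return 'object'
    return 'normal'

linetype = [classify(line) for line in lines]
```

With these sample lines, 'endobj\n'.find('endobj') is 0, so a "> 0"
test would leave that line marked 'normal'. If the position of the
match is not needed, the membership test ('endobj' in line) expresses
the same check more directly.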

Cheers,
Chris
--
http://blog.rebertia.com