More Help with python .find function
Chris Rebert
clp2 at rebertia.com
Sat Jan 8 00:06:55 EST 2011
On Fri, Jan 7, 2011 at 8:43 PM, Keith Anthony <kanthony at woh.rr.com> wrote:
> My previous question asked how to read a file into a structure
> a line at a time. Figured it out. Now I'm trying to use .find
> to separate out the PDF objects. (See code) PROBLEM/QUESTION:
> My call to lines[i].find does NOT find all instances of endobj.
> Any help available? Any insights?
>
> #!/usr/bin/python
>
> inputfile = file('sample.pdf','rb') # This is PDF with which we will work
> lines = inputfile.readlines() # read file one line at a time
>
> linestart = [] # Starting address for each line
> lineend = [] # Ending address for each line
> linetype = []
>
> print len(lines) # print number of lines
>
> i = 0 # define an iterator, i
> addr = 0 # and address pointer
>
> while i < len(lines): # Go through each line
>     linestart = linestart + [addr]
>     length = len(lines[i])
>     lineend = lineend + [addr + (length-1)]
>     addr = addr + length
>     i = i + 1
>
> i = 0
> while i < len(lines): # Initialize line types as normal
>     linetype = linetype + ['normal']
>     i = i + 1
>
> i = 0
> while i < len(lines): #
>     if lines[i].find(' obj') > 0:
>         linetype[i] = 'object'
>         print "At address ",linestart[i],"object found at line ",i,": ", lines[i]
>     if lines[i].find('endobj') > 0:
>         linetype[i] = 'endobj'
>         print "At address ",linestart[i],"endobj found at line ",i,": ", lines[i]
>     i = i + 1
Your code can be simplified significantly.
In particular:
- Don't grow a list by concatenating single-element lists. Use the
list.append() method instead.
- One seldom manually tracks counters like `i` in Python; use range()
or enumerate() instead.
- Lists support multiplication: some_list * n gives the concatenation
of n copies of the list.
Revised version (untested obviously):
inputfile = file('sample.pdf','rb') # This is PDF with which we will work
lines = inputfile.readlines() # read file one line at a time
linestart = [] # Starting address for each line
lineend = [] # Ending address for each line
linetype = ['normal']*len(lines)
print len(lines) # print number of lines
addr = 0 # and address pointer
for line in lines: # Go through each line
    linestart.append(addr)
    length = len(line)
    lineend.append(addr + (length-1))
    addr += length
for i, line in enumerate(lines):
    # find() returns -1 for "no match"; 0 is a match at the start of the line
    if line.find(' obj') != -1:
        linetype[i] = 'object'
        print "At address ",linestart[i],"object found at line ",i,": ", line
    if line.find('endobj') != -1:
        linetype[i] = 'endobj'
        print "At address ",linestart[i],"endobj found at line ",i,": ", line
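As to the bug: I think you want "!= -1" rather than "> 0" for your
conditionals. find() returns the index of the first match, or -1 if
there is none, and Python list/string indices are 0-based, so a line
that starts with 'endobj' gives 0 and fails the "> 0" test.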
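To illustrate the find() point, here is a minimal sketch. The sample lines and the classify() helper are made up for the example (they are not from your script), and it assumes the lines are ordinary text strings:

```python
# Hypothetical sample lines, roughly what readlines() might return for a PDF.
sample_lines = ['1 0 obj\n', '<< /Type /Catalog >>\n', 'endobj\n', 'xref\n']

def classify(line):
    """Illustrative helper: label one line as 'endobj', 'object' or 'normal'."""
    # find() returns -1 when the substring is absent; 0 is a valid match position.
    if line.find('endobj') != -1:
        return 'endobj'
    # The `in` operator is an equivalent (and more readable) substring test.
    if ' obj' in line:
        return 'object'
    return 'normal'

# Lines where 'endobj' sits at index 0: exactly the ones a "> 0" test skips.
missed_by_gt_zero = [line for line in sample_lines if line.find('endobj') == 0]

linetype = [classify(line) for line in sample_lines]
```

If you only need a yes/no answer and not the position, the `in` form is probably the simplest choice.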
Cheers,
Chris
--
http://blog.rebertia.com