[Tutor] Simple text file processing using fileinput module. "Grabbing successive lines" failure
Peter Otten
__peter__ at web.de
Tue Jul 3 11:35:22 CEST 2012
Flynn, Stephen (L & P - IT) wrote:
> Tutors,
>
> Whilst having a play around with reading in textfiles and reformatting
> them I tried to write a python 3.2 script to read a CSV file, looking for
> any records which were short (indicating that the data may well contain an
> embedded CR/LF). I've attached a small sample file with a "split record" at
> line 3, and my code.
>
> Call the code with
>
> Python pipesmoker.py MyFile.txt ,
>
> (first parameter is the file being read, second parameter is the field
> separator... a comma in this case)
>
> I can read the file in, I can determine that I'm looking for records which
> have 13 fields and I can find a record which is too short (line 3).
>
> What I can't do is read the successive line to a short line in order to
> append it onto the end of short line before writing the entire amended
> line out. I'm still thinking about how to persuade the fileinput module to
> leap over the successor line so it doesn't get processed again.
>
> When I run the code as it stands, I get a traceback as I'm obviously not
> using fileinput.FileInput.readline() correctly.
>
> value of file is C:\myfile.txt
> value of the delimiter is ,
> I'm looking for 13 , in each currentLine...
> "1","0000000688 ","ABCD","930020854","34","0","1"," ","930020854
> "," ","0","0","0","0"
>
> "2","0000000688 ","ABCD","930020854","99","0","1"," ","930020854 ","
> ","0","0","0","0"
>
> short line found at line 3
> Traceback (most recent call last):
> File "C:\Documents and
> Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line
> 35, in <module>
> nextLine = fileinput.FileInput.readline(args.file)
> File "C:\Python32\lib\fileinput.py", line 301, in readline
> line = self._buffer[self._bufindex]
> AttributeError: 'str' object has no attribute '_buffer'
>
>
> Can someone explain to me how I am supposed to make use of readline() to
> grab the next line of a text file please? It may be that I should be using
> some other module, but chose fileinput as I was hoping to make the little
> routine as generic as possible; able to spot short lines in tab separated,
> comma separated, pipe separated, ^~~^ separated and anything else which my
> clients feel like sending me.
As you already learned, the csv module is the best tool to address your
problem.
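For reference, a minimal sketch of the csv approach, using made-up sample
data in place of the attached file. csv.reader treats a line break inside a
quoted field as part of that field, so a record split that way comes back as
a single row:

```python
import csv
import io

# Made-up sample: the first record has a line break inside a quoted field.
sample = '"1","ABCD","930020854\n","0"\n"2","ABCD","930020854 ","0"\n'

# io.StringIO stands in for a real file here; for a file on disk,
# open it with newline="" as the csv documentation recommends.
reader = csv.reader(io.StringIO(sample), delimiter=",")
rows = list(reader)

for row in rows:
    print(len(row), row)
```

Both records come back with four fields; the embedded line break is kept
inside the third field of the first row.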
However, I'd like to show a generic way to get an extra item in a for-loop.
Instead of iterating over the "iterable" (a list or a FileInput object or
whatever) you first convert it into an iterator explicitly with the iter()
built-in function and keep the reference around:

    iterable = ...
    it = iter(iterable)

Then inside the for-loop you get an extra item with the next() function:
    for item in it:
        if some_condition():
            extra = next(it)

next() also allows you to provide a default value; without it you get a
StopIteration exception when you apply it to an exhausted iterator.
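A minimal sketch of the default-value form, with made-up data whose last item
has no successor. The second argument to next() is returned instead of
raising StopIteration:

```python
items = ["alpha-", "beta", "gamma-"]  # made-up data; "gamma-" has no successor
it = iter(items)

joined = []
for item in it:
    if item.endswith("-"):
        # "" is returned once the iterator is exhausted,
        # so no StopIteration is raised for the trailing "gamma-".
        item += next(it, "")
    joined.append(item)

print(joined)  # ['alpha-beta', 'gamma-']
```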
Here's a self-contained example:
>>> items = "alpha- beta gamma- delta- epsilon zeta".split()
>>> it = iter(items)
>>> for item in it:
...     while item.endswith("-"):
...         item += next(it)
...     print(item)
...
alpha-beta
gamma-delta-epsilon
zeta
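
Applied to the short-line problem, the same idea might look like the sketch
below. The function name join_short_lines, its parameters, and the sample
data are made up for illustration, and counting delimiters is an assumption
that fails as soon as a delimiter appears inside a quoted field (the csv
module does not have that limitation):

```python
def join_short_lines(lines, delimiter=",", expected=13):
    """Yield records, gluing continuation lines onto short ones.

    Hypothetical helper: `expected` is the number of delimiters
    a complete record should contain.
    """
    it = iter(lines)
    for line in it:
        # Keep pulling lines while the record has too few delimiters.
        while line.count(delimiter) < expected:
            extra = next(it, None)
            if extra is None:  # input ended in the middle of a record
                break
            line = line.rstrip("\r\n") + extra
        yield line


# Made-up sample: the first record is split across two lines.
sample = [
    '"1","a","b\n',
    '","c"\n',
    '"2","a","b","c"\n',
]
fixed = list(join_short_lines(sample, delimiter=",", expected=3))
for record in fixed:
    print(record, end="")
```

Because the for-loop and next() share one iterator, the continuation line is
consumed when it is appended and is not processed a second time. Any iterable
of lines should work as the first argument, including an open file or the
object returned by fileinput.input().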