[Tutor] Simple text file processing using fileinput module. "Grabbing successive lines" failure
Joel Goldstick
joel.goldstick at gmail.com
Mon Jul 2 16:20:52 CEST 2012
On Mon, Jul 2, 2012 at 10:03 AM, Flynn, Stephen (L & P - IT)
<Steve.Flynn at capita.co.uk> wrote:
> Tutors,
>
> Whilst having a play around with reading in text files and reformatting them, I tried to write a Python 3.2 script to read a CSV file, looking for any records which were short (indicating that the data may well contain an embedded CR/LF). I've attached a small sample file with a "split record" at line 3, and my code.
>
> Call the code with
>
> Python pipesmoker.py MyFile.txt ,
>
> (first parameter is the file being read, second parameter is the field separator... a comma in this case)
>
> I can read the file in, I can determine that I'm looking for records which have 13 fields and I can find a record which is too short (line 3).
>
> What I can't do is read the line that follows a short line in order to append it onto the end of the short line before writing the entire amended line out. I'm still thinking about how to persuade the fileinput module to leap over the successor line so it doesn't get processed again.
>
> When I run the code as it stands, I get a traceback as I'm obviously not using fileinput.FileInput.readline() correctly.
>
> value of file is C:\myfile.txt
> value of the delimiter is ,
> I'm looking for 13 , in each currentLine...
> "1","0000000688 ","ABCD","930020854","34","0","1"," ","930020854 "," ","0","0","0","0"
>
> "2","0000000688 ","ABCD","930020854","99","0","1"," ","930020854 "," ","0","0","0","0"
>
> short line found at line 3
> Traceback (most recent call last):
> File "C:\Documents and Settings\flynns\workspace\PipeSmoker\src\pipesmoker\pipesmoker.py", line 35, in <module>
> nextLine = fileinput.FileInput.readline(args.file)
> File "C:\Python32\lib\fileinput.py", line 301, in readline
> line = self._buffer[self._bufindex]
> AttributeError: 'str' object has no attribute '_buffer'
>
>
> Can someone explain to me how I am supposed to make use of readline() to grab the next line of a text file please? It may be that I should be using some other module, but chose fileinput as I was hoping to make the little routine as generic as possible; able to spot short lines in tab separated, comma separated, pipe separated, ^~~^ separated and anything else which my clients feel like sending me.
>
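About the traceback: fileinput.FileInput.readline(args.file) calls
readline() on the class itself, so the filename string ends up being
passed as "self", which is why you get "'str' object has no attribute
'_buffer'". readline() needs to be called on the object that
fileinput.input() returns. Here is a rough sketch of that idea, not a
drop-in fix for your script. The function name join_short_lines and its
parameters are made up for illustration, and it assumes a record is
"short" when it has fewer delimiters than expected:

```python
import fileinput


def join_short_lines(path, delimiter=",", expected_delims=13):
    """Yield records, gluing a short line to the line(s) that follow it.

    Hypothetical helper for illustration only.
    """
    # fileinput.input() returns a FileInput *instance*; readline() has to
    # be called on that instance, not on the FileInput class with a filename.
    with fileinput.input(files=[path]) as stream:
        for line in stream:
            record = line.rstrip("\r\n")
            # Pull in following lines until the record has enough
            # delimiters or the input runs out (readline() returns ''
            # at end of input).
            while record.count(delimiter) < expected_delims:
                extra = stream.readline()
                if not extra:
                    break
                record += extra.rstrip("\r\n")
            yield record
```

Because readline() advances the same stream the for loop is reading, the
successor line is consumed and will not be processed a second time. Note
that simply counting delimiters will miscount if the delimiter can also
appear inside a quoted field.
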
Take a look at csv.reader:
http://docs.python.org/library/csv.html#csv.reader. It comes with
Python, and according to the text near this link, it will handle a
situation where EOL characters are contained in quoted fields. Will
that help you?
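
A minimal sketch of what that could look like, assuming the embedded
CR/LF always falls inside a quoted field (the sample text and the
read_records name below are made up for illustration, not taken from
your file):

```python
import csv
import io

# Made-up sample: the second record has a CR/LF inside a quoted field.
sample = '"1","ABCD","ok"\r\n"2","AB\r\nCD","split"\r\n'


def read_records(text_stream, delimiter=","):
    """Return all rows from an open text stream as lists of fields.

    Hypothetical helper; the delimiter is passed straight to csv.reader.
    """
    return list(csv.reader(text_stream, delimiter=delimiter))


# For a real file, open it with newline="" as the csv docs recommend,
# e.g. open(path, newline=""), so the reader sees the raw line endings.
rows = read_records(io.StringIO(sample, newline=""))

for row in rows:
    print(len(row), row)
```

The split record comes back as a single row of three fields, with the
line break kept inside the second field. If the break is not inside
quotes, csv.reader has no way to know the record continues, so this
would not help in that case.
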
--
Joel Goldstick