finding out the number of rows in a CSV file [Resolved]
sjmachin at lexicon.net
Thu Aug 28 02:48:25 CEST 2008
On Aug 28, 7:51 am, norseman <norse... at hughes.net> wrote:
> Peter Otten wrote:
> > John S wrote:
> >> [OP] Jon Clements wrote:
> >>> On Aug 27, 12:54 pm, SimonPalmer <simon.pal... at gmail.com> wrote:
> > >>>> after reading the file through the csv.reader for the length I cannot
> >>>> iterate over the rows. How do I reset the row iterator?
> >> A CSV file is just a text file. Don't use csv.reader for counting rows
> >> -- it's overkill. You can just read the file normally, counting lines
> >> (lines == rows).
> > Wrong. A field may have embedded newlines:
> >>>> import csv
> >>>> csv.writer(open("tmp.csv", "w")).writerow(["a" + "\n"*10 + "b"])
> >>>> sum(1 for row in csv.reader(open("tmp.csv")))
> > 1
> >>>> sum(1 for line in open("tmp.csv"))
> > 11
> > Peter
> > --
> Well..... a semantics problem here.
> A blank line is just an EOL by itself. Yes.
Or a line containing blanks. Yes what?
> I may want to count these. Could be indicative of a problem.
If you use the csv module to read the file, a "blank line" will come
out as a row with one field, the contents of which you can check.
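As a rough illustration of checking for blank rows through the csv module, here is a minimal sketch. It assumes Python 3; the sample text and the name classify_rows are made up for the example. One caveat worth knowing: in current Python a completely empty line comes back as an empty list, while a line containing only spaces comes back as a row with one field, so the check below covers both.

```python
import csv
import io

# Hypothetical sample: a data row, an empty line, a line of spaces, a data row.
sample = "a,b\n\n   \nc,d\n"

def classify_rows(text):
    """Return (total_rows, blank_rows) as seen by csv.reader."""
    total = 0
    blank = 0
    for row in csv.reader(io.StringIO(text, newline="")):
        total += 1
        # An empty line yields []; a whitespace-only line yields one field.
        if not row or (len(row) == 1 and not row[0].strip()):
            blank += 1
    return total, blank

total_rows, blank_rows = classify_rows(sample)
print(total_rows, blank_rows)
```

Whether blank rows are counted, skipped, or reported as a data problem is then a decision made in one place.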
> Besides sum(1 for line in ... if line.strip()) handles problem if I'm not
> counting blanks and still avoids tossing, re-opening etc...
What is "tossing", apart from the English slang meaning?
> Again - it's how you look at it, but I don't want EOLs in my dbase
Most people don't want them, but many do have them, as well as Ctrl-Zs
and NBSPs and dial-up line noise (and umlauts/accents/suchlike
inserted by the temporarily-employed backpacker to ensure that her
compatriots' names and addresses were spelled properly) ... and the IT
department fervently believes the content is ASCII even though they
have done absolutely SFA to ensure that.
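For anyone who wants to test the "it's all ASCII" belief rather than trust it, a minimal sketch follows. It assumes Python 3 and data read in binary mode; the function name first_non_ascii and the sample bytes are hypothetical. Note that Ctrl-Z (0x1A) is itself an ASCII control character, so a check like this will not flag it.

```python
def first_non_ascii(data):
    """Return (offset, byte value) of the first non-ASCII byte, or None."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset, byte
    return None

# Hypothetical sample bytes: a name with a Latin-1 u-umlaut, then a Ctrl-Z.
sample = b"name\nM\xfcller\n\x1a"
print(first_non_ascii(sample))
```

In practice the bytes would come from open(path, "rb").read() or from reading the file in chunks.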
> csv was designed to 'dump' data base fields into text for those
> not affording a data base program and/or to convert between data base
> programs. By the way - has anyone seen a good spread sheet dumper? One
> that dumps the underlying formulas and such along with the display
> value? That would greatly facilitate portability, wouldn't it? (Yeah -
> the receiving would have to be able to read it. But it would be a start
> - yes?) Everyone got the point? Just because it gets abused doesn't
> mean .... Are we back on track? Number of lines equals number of
> reads - which is what was requested. No bytes magically disappearing. No
> sleight of hand, no one dictating how to or what with ....
> The good part is everyone who reads this now knows two ways to approach
> the problem and the pros/cons of each. No losers.
IMHO it is very hard to discern from all that ramble what the alleged
problem is, let alone what are the ways to approach it.
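For the record, the two counts discussed in this thread, and one way to answer the original question about iterating again after counting, can be sketched as below. This assumes Python 3 and a temporary file created just for the demonstration. Rewinding the file with seek(0) and building a fresh csv.reader is only one option; reading the rows into a list once and taking its length is another.

```python
import csv
import os
import tempfile

# Build a throwaway file whose single record has embedded newlines.
path = os.path.join(tempfile.mkdtemp(), "tmp.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["a" + "\n" * 10 + "b"])

with open(path, newline="") as f:
    physical_lines = sum(1 for line in f)      # counts line endings
    f.seek(0)                                  # rewind the underlying file
    records = sum(1 for row in csv.reader(f))  # counts CSV records
    f.seek(0)                                  # rewind again before re-reading
    rows = list(csv.reader(f))                 # a fresh reader starts at the top

print(physical_lines, records, len(rows))
```

The reader object itself has no reset; it is the file underneath it that gets rewound.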