[Csv] Problems with CSV Module

Skip Montanaro skip at pobox.com
Wed May 21 16:28:29 CEST 2003


    Andreas> 1. Documentation:
    Andreas> What's a row? (The word row means a list or a tuple.)
    Andreas> How does DictReader & DictWriter work? Having a couple of examples would
    Andreas> help ;-))

Thanks, I'll add a couple of examples and define "row" more precisely.  DictReader works
pretty much like dict cursors in the various Python database packages,
returning a dictionary instead of a tuple for each row of data.  Here's an
example of using csv.DictReader.  This particular snippet parses CSV files
dumped by Checkpoint Software's Firewall-1 product.

    import csv

    class fw1dialect(csv.Dialect):
        lineterminator = '\n'
        escapechar = '\\'
        skipinitialspace = False
        quotechar = '"'
        quoting = csv.QUOTE_ALL
        delimiter = ';'
        doublequote = True

    csv.register_dialect("fw1", fw1dialect)

    fieldnames = ("num;date;time;orig;type;action;alert;i/f_name;"
                  "i/f_dir;product;src;s_port;dst;service;proto;"
                  "rule;th_flags;message_info;icmp-type;icmp-code;"
                  "sys_msgs;cp_message;sys_message").split(';')
    # f, nrows and action are defined elsewhere in the program
    rdr = csv.DictReader(f, fieldnames=fieldnames, dialect="fw1")

    for row in rdr:
        if row["num"] is None:
            continue
        nrows += 1
        if action is not None and row["action"] != action:
            continue
        source = row.get("src", "unknown")
        ...

Note that each row comes back as a dictionary instead of a tuple.  Its keys
are the elements of the fieldnames parameter passed to the constructor.
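
Since the question covered DictWriter as well, here is a minimal
self-contained sketch of a round trip through both classes.  It is written
for current Python 3 and uses an in-memory buffer; the field names and the
two records are made up for illustration and have nothing to do with the
Firewall-1 data above.

```python
import csv
import io

# Hypothetical field names and records, chosen only for illustration.
fieldnames = ["num", "action", "src"]
records = [
    {"num": "1", "action": "accept", "src": "10.0.0.1"},
    {"num": "2", "action": "drop", "src": "10.0.0.2"},
]

# DictWriter maps each dictionary onto a row, ordered by fieldnames.
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter=";")
for record in records:
    writer.writerow(record)

# DictReader reverses the mapping: each row comes back as a dictionary
# keyed by the same fieldnames.
buf.seek(0)
reader = csv.DictReader(buf, fieldnames=fieldnames, delimiter=";")
rows_read = [dict(row) for row in reader]

# Fields are then addressed by name instead of by position.
dropped = [row["src"] for row in rows_read if row["action"] == "drop"]
```

Because fieldnames is passed explicitly, the reader treats the first line as
data rather than as a header.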

    Andreas> 2. Locale:
    Andreas> The CSV module doesn't use locale. The default delimiter for Austria
    Andreas> (+Germany) in Windows is a semicolon ';' not a comma ','.
    Andreas> Having the result, that you can't import a list generated by csv.writer()
    Andreas> in Excel without changing your regional settings, or using
    Andreas> csv.writer(delimiter=';').
    Andreas> It would be nice if the CSV module would adapt to the language settings.

How can I get that from Python, or do I have to know that if the locale is
de the default Excel delimiter is a semicolon?  What other locales have a
semicolon as the default?  I suspect that if we have to enumerate them all
it may not get done.  Also, note that the

    Andreas> This could be really simple to implement using the locale
    Andreas> module. But I took a short look at the locale module and it
    Andreas> seems like there is no way to get the list separator sign
    Andreas> (probably it's not POSIX compliant).

That would make it difficult to do.

    Andreas> Another possibility would be to have a dialect like 'excel_ger'
    Andreas> with the correct settings.

But what about all the other locales which must use a semicolon as the
default delimiter?

How about this in your code:

    class excel(csv.excel):
        delimiter = ';'
    csv.register_dialect("excel", excel)
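
A variant of the same idea, sketched for current Python 3: register the
semicolon dialect under its own name so that the stock "excel" dialect stays
available.  The name "excel_semicolon" and the class name are hypothetical,
picked only for this example.

```python
import csv
import io

class ExcelSemicolon(csv.excel):
    """Excel dialect with ';' as the delimiter (hypothetical name)."""
    delimiter = ';'

csv.register_dialect("excel_semicolon", ExcelSemicolon)

# Write one row; the last field contains the delimiter, so the writer
# quotes it under the default QUOTE_MINIMAL setting inherited from excel.
buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel_semicolon").writerow((1, 5, 10, "a;b"))
written = buf.getvalue()

# Read it back with the same dialect; all fields come back as strings.
buf.seek(0)
row = next(csv.reader(buf, dialect="excel_semicolon"))
```

Readers and writers that should follow the local Excel convention then ask
for the dialect by name, and everything else keeps using the comma.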

    Andreas> 3. There is no .close()

Note that the "file-like object" can be any object which supports the
iterator protocol, so it need not have a close() method.  In the test code
we often use lists, e.g.:

    def test_read_with_blanks(self):
        reader = csv.DictReader(["1,2,abc,4,5,6\r\n","\r\n",
                                 "1,2,abc,4,5,6\r\n"],
                                fieldnames="1 2 3 4 5 6".split())
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})

    Andreas> f=file(FILE_CSV,'w')
    Andreas> w=csv.writer(f,dialect='excel',delimiter=';')
    Andreas> w.writerow((1,5,10,25,100,250,500,1000,1500))
    Andreas> f.close()

    Andreas> f=file(FILE_CSV,'r')
    Andreas> r=csv.reader(file(FILE_CSV,'r'),dialect='excel',delimiter=';')
    Andreas> print r.next()
    Andreas> f.close()

Yes, this is what you'll have to do, though note that if you reuse f the
first call to f.close() is unnecessary.
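
In current Python 3 the same write-then-read sequence might look like the
sketch below, where a with block closes the underlying file and the reader
and writer objects are simply dropped.  The temporary file and the shortened
row are assumptions made to keep the example self-contained.

```python
import csv
import os
import tempfile

# Create a scratch file to stand in for FILE_CSV.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

try:
    # The writer has no close(); leaving the with block closes the file.
    with open(path, "w", newline="") as f:
        w = csv.writer(f, dialect="excel", delimiter=";")
        w.writerow((1, 5, 10, 25, 100))

    # Open the file once and hand that same object to the reader.
    with open(path, "r", newline="") as f:
        r = csv.reader(f, dialect="excel", delimiter=";")
        first_row = next(r)
finally:
    os.remove(path)
```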

    Andreas> 4. There is no .readrow()

    Andreas> This should be just another name for .next(). It's more
    Andreas> intuitive if you write a row via .writerow() and read it via
    Andreas> .readrow().

I think we can probably squeeze this in.
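
Until then, a thin wrapper gives the same effect.  This is only a sketch for
current Python 3: the RowReader class and its readrow() method are
hypothetical and are not part of the csv module.

```python
import csv

class RowReader:
    """Hypothetical wrapper that adds readrow() to a csv.reader."""

    def __init__(self, iterable, **kwargs):
        # Any iterable of strings works, exactly as with csv.reader.
        self._reader = csv.reader(iterable, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._reader)

    def readrow(self):
        # Just another name for fetching the next row.
        return next(self)

reader = RowReader(["1;5;10\r\n", "25;100;250\r\n"], delimiter=";")
first = reader.readrow()
rest = list(reader)
```

Like the underlying reader, readrow() raises StopIteration once the input is
exhausted.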

Skip

