csv read _csv.Error: line contains NULL byte

Tim Golden mail at timgolden.me.uk
Fri Mar 21 14:39:37 CET 2014


On 21/03/2014 13:29, chip9munk at gmail.com wrote:
> Hi all!
> 
> I am reading from a huge csv file (> 20 Gb), so I have to read line by line:
> 
> for i, row in enumerate(input_reader):
>       #  and I do something on each row
> 
> Everything works fine until i get to a row with some strange symbols "0I`00�^"
> at that point I get an error: _csv.Error: line contains NULL byte
> 
> How can i skip such row and continue going, or "decipher" it in some way?

Well you have several options:

Without disturbing your existing code too much, you could wrap the
input_reader in a generator which skips malformed lines. That would look
something like this:

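import csv

def unfussy_reader(reader):
    # Yield rows from a csv reader, skipping any that raise csv.Error.
    while True:
        try:
            yield next(reader)
        except StopIteration:
            # Input exhausted: end the generator cleanly (needed since PEP 479)
            return
        except csv.Error:
            # log the problem or whatever
            continue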


If you know what to do with the malformed data, you could strip it out
and carry on. Whatever works best for you.
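
As a rough sketch of the "strip it out" route, assuming the NUL
characters are stray junk that can simply be dropped: filter each line
before the csv module sees it. The name strip_nulls and the sample data
are made up for illustration.

```python
import csv

def strip_nulls(lines):
    """Yield each line of text with any NUL characters removed."""
    for line in lines:
        yield line.replace("\0", "")

# csv.reader accepts any iterable of strings, so the filter can sit
# between the file object and the reader. In-memory sample data here:
sample = ["a,b\n", "1,\0x\0\n", "2,y\n"]
rows = list(csv.reader(strip_nulls(sample)))
```

With a real file you would pass the open file object (opened with
newline="") in place of sample; it is still read one line at a time.
If the NULs turn out to come from something like a UTF-16 encoded file,
opening it with the right encoding is probably the better fix.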

Alternatively you could wrap the standard reader in a small class of
your own and do something equivalent to the above in its __next__ method.
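
A minimal sketch of the class-based variant. Since csv.reader is a
factory function rather than a class you can inherit from, this wraps a
reader instead of subclassing it. UnfussyReader and its skipped
attribute are hypothetical names, not part of the csv module.

```python
import csv

class UnfussyReader:
    """Iterator that wraps a csv.reader and skips rows raising csv.Error."""

    def __init__(self, lines, **kwargs):
        # kwargs are passed straight through to csv.reader
        self._reader = csv.reader(lines, **kwargs)
        self.skipped = []  # (line number, error message) for each bad row

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                # StopIteration from the underlying reader propagates,
                # which ends iteration as usual.
                return next(self._reader)
            except csv.Error as exc:
                self.skipped.append((self._reader.line_num, str(exc)))
```

It is used just like the plain reader, e.g.
"for i, row in enumerate(UnfussyReader(f)):", and afterwards skipped
tells you which lines were dropped and why.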

TJG




