file data => array(s)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Dec 14 18:27:58 EST 2011


On Wed, 14 Dec 2011 14:20:40 -0800, Eric wrote:

> I'm trying to read some file data into a set of arrays.  The file data
> is just four columns of numbers, like so:
> 
>    1.2    2.2   3.3  0.5
>    0.1   0.2    1.0  10.1
>    ... and so on
> 
> I'd like to read this into four arrays, one array for each column.
> Alternatively, I guess something like this is okay too:
> 
>    [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1], ... and so on]

First thing: due to the fundamental nature of binary floating point 
numbers, if you convert text like "0.1" to a float, you don't get exactly 
0.1, you get roughly 0.1000000000000000055 (older Pythons display it as 
0.10000000000000001). That is because this value is the closest you can 
get to 1/10 with a finite sum of the fractions 1/2, 1/4, 1/8, ...

If this fact disturbs you, you can import the decimal module and use 
decimal.Decimal instead; otherwise forget I said anything and continue 
using float. I will assume you're happy with floats.
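
If you want to see the difference for yourself, here is a minimal sketch. The variable names are made up for illustration, and it assumes Python 2.7 or later, where Decimal can be built directly from a float:

```python
from decimal import Decimal

as_float = float("0.1")      # nearest binary float to 1/10
as_decimal = Decimal("0.1")  # exactly 1/10

# Converting the float to Decimal reveals the value it really stores.
exact = Decimal(as_float)
print(exact)

# Float arithmetic carries the tiny error along; Decimal does not.
float_sum_matches = (as_float + float("0.2") == float("0.3"))
decimal_sum_matches = (as_decimal + Decimal("0.2") == Decimal("0.3"))
print(float_sum_matches)    # False
print(decimal_sum_matches)  # True
```

For columns of measurements the float error is almost always too small to matter, which is why the rest of this post sticks with float.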

Assuming the file is small, say, less than 50MB, I'd do it like this:


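# Works in Python 2.6+ and 3.x; filename is the path to your data file.
with open(filename, 'r') as f:
    text = f.read()  # Grab the whole file at once.
# Split on any whitespace and convert each piece to a float.
numbers = [float(s) for s in text.split()]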

That gives you a single list [1.2, 2.2, 3.3, 0.5, 0.1, 0.2, ...] which 
you can now split into groups of four. There are lots of ways to do this. 
Here's an inefficient way which hopefully will be simple to understand:

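result = []
while numbers:  # Loop until the list is empty.
    result.append(numbers[0:4])  # Copy the first four items...
    # ...then delete them. Every del shifts all the remaining items
    # down, which is what makes this slow. It also empties numbers.
    del numbers[0:4]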


Here is a much more efficient method which is only a tad harder to 
understand:

result = []
for start in range(0, len(numbers), 4):
    result.append(numbers[start:start+4])
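
To try the slicing approach without a file, here is a small self-contained sketch. The helper name chunks and the sample text are made up for illustration; the sample just stands in for whatever f.read() would return:

```python
def chunks(items, size):
    """Split a list into consecutive groups of `size` items (hypothetical helper)."""
    return [items[start:start + size] for start in range(0, len(items), size)]

# Sample data standing in for the contents of the file.
text = """
   1.2    2.2   3.3  0.5
   0.1   0.2    1.0  10.1
"""
numbers = [float(s) for s in text.split()]
rows = chunks(numbers, 4)
print(rows)  # [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1]]

# A leftover partial group is kept as a shorter final list.
print(chunks([1, 2, 3, 4, 5], 4))  # [[1, 2, 3, 4], [5]]
```

Note that, unlike the while loop, slicing leaves the original list untouched.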


And just for completeness, here is an advanced technique using itertools:

from itertools import islice
n = len(numbers)//4  # Number of complete groups of four.
numbers = iter(numbers)
result = [list(islice(numbers, 4)) for i in range(n)]

Be warned that this version throws away any partial group left over at 
the end; if you don't want that, change the line defining n to this 
instead:

n = len(numbers)//4 + (len(numbers)%4 > 0)
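
The question also asked for one array per column. Assuming you already have the list of rows and every row has exactly four numbers, one way to sketch that is to transpose with zip. The names rows and columns are only for illustration:

```python
# Rows as produced by any of the grouping methods.
rows = [[1.2, 2.2, 3.3, 0.5], [0.1, 0.2, 1.0, 10.1]]

# zip(*rows) pairs up the first items, the second items, and so on.
# It yields tuples, so convert each one back to a list.
columns = [list(col) for col in zip(*rows)]
print(columns)  # [[1.2, 0.1], [2.2, 0.2], [3.3, 1.0], [0.5, 10.1]]

first, second, third, fourth = columns
```

Be aware that zip stops at the shortest row, so a partial group at the end would silently truncate every column to match it.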


-- 
Steven


