Out of memory while reading excel file
Mahmood Naderan
nt_mahmood at yahoo.com
Wed May 10 13:11:54 EDT 2017
Well, actually the cells are treated as strings, not integer or float numbers.
One way to overcome this is to get the number of rows, split the data into 4 or 5 arrays, and process them separately. However, I was looking for a better solution.
I have read that large Excel files are on the order of a million rows; mine is about 100k. Currently, the task manager shows about 4 GB of RAM usage while working with numpy.
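A minimal sketch of that chunked approach could look like the following (the file name 'beta.xlsx', sheet name 'alpha', and chunk size are assumptions carried over from the example below, and the string-to-float conversion is explicit since the cells arrive as strings):

from openpyxl import load_workbook
import numpy as np

CHUNK_ROWS = 20000  # assumed chunk size: process ~20k rows at a time

wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']

chunk = []
for row in ws.rows:
    # cell values come back as strings here, so convert explicitly
    chunk.append([float(cell.value) for cell in row])
    if len(chunk) == CHUNK_ROWS:
        a = np.array(chunk, dtype=float)
        # ... process this chunk, then discard it ...
        chunk = []

if chunk:
    a = np.array(chunk, dtype=float)
    # ... process the final partial chunk ...

This keeps only one chunk in memory at a time instead of building a list of lists for the whole sheet.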
Regards,
Mahmood
--------------------------------------------
On Wed, 5/10/17, Peter Otten <__peter__ at web.de> wrote:
Subject: Re: Out of memory while reading excel file
To: python-list at python.org
Date: Wednesday, May 10, 2017, 3:48 PM
Mahmood Naderan via Python-list wrote:

> Thanks for your reply. The openpyxl part (reading the workbook) works
> fine. I printed some debug information and found that when it reaches
> the np.array, after some 10 seconds, the memory usage goes high.
>
> So, I think numpy is unable to manage the memory.

Hm, I think numpy is designed to manage huge arrays if you have enough RAM.

Anyway: are all values of the same type? Then the numpy array may be kept
much smaller than in the general case (I think). You can also avoid the
intermediate list of lists:
wb = load_workbook(filename='beta.xlsx', read_only=True)
ws = wb['alpha']
a = numpy.zeros((ws.max_row, ws.max_column), dtype=float)
for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]
--