[Tutor] Pointers Towards Appropriate Python Methods

Mats Wichmann mats at wichmann.us
Sun Sep 29 18:12:14 EDT 2019


On 9/29/19 12:28 PM, Stephen P. Molnar wrote:

> At this point what I would like are pointers towards python method for 
> processing a large number of data files. I'm not asking anyone to write 
> the coed for me.

not quite ignoring the slightly amusing typo :)

If you have to deal with lots of data files, things will probably get a 
bit slow.  Many of the steps involved in handling text files - and your 
recent samples have been text files with fields separated by a 
particular separator character (commonly called csv files, after the 
case where the comma is the separator) - aren't that speedy.  There's 
the call out to the operating system, which does the work of reading 
data off permanent storage and getting it back to you; there's the type 
conversion; and there's the stepping through line by line, which happens 
even if the particular methods involved hide it from you.  That's just 
the way it is.  One performance idea is to make sure you deal with each 
file in its entirety and then get rid of it (close it, or whatever suits 
the circumstances - I don't mean delete the file); otherwise your memory 
usage will become a problem as well.
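
As a rough sketch of the "finish with one file, then move on" idea, 
something along these lines might do, using only the standard library. 
The function names, the "*.csv" pattern and the choice of summing the 
first column are all made up for illustration, and it assumes the files 
have no header line and that the chosen column is numeric:

```python
import csv
from pathlib import Path


def column_total(path, column=0, delimiter=","):
    """Sum one numeric column of a delimited text file (hypothetical helper)."""
    total = 0.0
    # The with-block closes the file as soon as the loop is finished,
    # so only one file is ever open at a time.
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if not row:
                continue  # skip blank lines
            total += float(row[column])  # the type conversion happens here
    return total


def process_directory(directory, pattern="*.csv"):
    """Return {filename: total}, handling the files one after another."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        # Keep only the small per-file result, not the file contents.
        results[path.name] = column_total(path)
    return results
```

Calling process_directory("some/data/dir") would give back one number 
per file; the point is that nothing from a file stays in memory once 
its total has been computed.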

There's a fairly popular library called Pandas that you could take a 
look at to see if it suits your purposes; it might keep you from having 
to design the entire application yourself.

https://pandas.pydata.org/
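
To give a flavour of it, here is a minimal sketch of reading several 
comma-separated sources with Pandas and stacking them into one table. 
The load_frames helper and the "energy" and "count" column names are 
invented for the example, and the in-memory StringIO objects just stand 
in for real file paths:

```python
import io

import pandas as pd


def load_frames(sources, sep=","):
    """Read each source into a DataFrame and stack them (hypothetical helper)."""
    # read_csv does the line splitting and type conversion for us.
    frames = [pd.read_csv(src, sep=sep) for src in sources]
    # ignore_index=True renumbers the rows of the combined table from 0.
    return pd.concat(frames, ignore_index=True)


# Stand-ins for two data files that share the same (made-up) header.
sample_a = io.StringIO("energy,count\n1.5,3\n2.5,4\n")
sample_b = io.StringIO("energy,count\n4.0,5\n")

combined = load_frames([sample_a, sample_b])
print(combined["energy"].mean())
```

With real data you would pass a list of file names instead. Note that 
this holds every file's contents in memory at once, so for a very large 
collection it may be better to reduce each file to a summary before 
combining.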



