[Numpy-discussion] record data previous to Numpy use
paul.carrico at free.fr
Fri Jul 7 10:24:08 EDT 2017
Once again I would like to thank the community for the support.
I am progressing in moving my code to Python.
In my mind some parts remain quite ugly (and burn my eyes), but they
work and I'll optimize them in the future; so far I can work with
the data in a single read.
I built some blocks in a text file and used Astropy to read it (works
fine now - I'll test pandas as a next step).
Not finished yet, but significant progress compared to yesterday :-)
Have a good weekend.
ps: I'd like to use the following code, which is much more familiar to me:
COMP_list = np.asarray(COMP_list, dtype=np.float64)
i = np.arange(1, NumberOfRecords, 2)
COMP_list = np.delete(COMP_list, i)
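For what it's worth, dropping every odd-indexed element can also be done with a plain slice, which avoids building the index array. A minimal sketch (the sample `COMP_list` values and `NumberOfRecords` below are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for the COMP_list values read from the file
COMP_list = ["1.0", "2.0", "3.0", "4.0", "5.0", "6.0"]
NumberOfRecords = len(COMP_list)

arr = np.asarray(COMP_list, dtype=np.float64)

# Original approach: delete the elements at odd indices
i = np.arange(1, NumberOfRecords, 2)
kept = np.delete(arr, i)

# Equivalent slice: keep every second element starting at index 0
kept_slice = arr[::2]

print(kept)                               # [1. 3. 5.]
print(np.array_equal(kept, kept_slice))   # True
```

The slice also returns a view rather than a copy, which matters for large arrays.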
On 2017-07-07 12:04, Derek Homeier wrote:
> On 7 Jul 2017, at 1:59 am, Chris Barker <Chris.Barker at noaa.gov> wrote:
>> On Thu, Jul 6, 2017 at 10:55 AM, <paul.carrico at free.fr> wrote:
>> It is just a reflection, but for huge files one solution might be to first split/write/build the array in a dedicated file (2x O(n) iterations - one to identify the block sizes, an additional one to read and write), and then to load it into memory and work with numpy -
>> I may have your use case confused, but if you have a huge file with multiple "blocks" in it, there shouldn't be any problem with loading it in one go -- start at the top of the file and load one block at a time (accumulating in a list) -- then you only have the memory overhead issues for one block at a time, should be no problem.
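The block-at-a-time reading Chris describes might look like the following sketch; the blank-line block separator and the sample data are assumptions for illustration, and in the real case `f` would be the open file object for the huge input file:

```python
import io
import numpy as np

# Hypothetical input: two blocks separated by a blank line
text = """1.0 2.0
3.0 4.0

5.0 6.0
7.0 8.0
"""

def read_blocks(f):
    """Yield one float64 array per blank-line-separated block,
    accumulating only the current block's rows in memory."""
    block = []
    for line in f:
        line = line.strip()
        if line:
            block.append([float(x) for x in line.split()])
        elif block:
            yield np.asarray(block, dtype=np.float64)
            block = []
    if block:                      # last block may lack a trailing separator
        yield np.asarray(block, dtype=np.float64)

blocks = list(read_blocks(io.StringIO(text)))
print(len(blocks))        # 2
print(blocks[0].shape)    # (2, 2)
```

Only one block is held as Python lists at any moment; the finished arrays in `blocks` are the compact NumPy representation.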
>> at this stage the dimension is known and some packages will be fast and more adapted (pandas or astropy as suggested).
>> pandas at least is designed to read variations of CSV files, not sure you could use the optimized part to read an array out of part of an open file from a particular point or not.
> The fragmented structure indeed would probably be the biggest challenge, although astropy,
> while it cannot read from an open file handle, at least should be able to directly parse a block
> of input lines, e.g. collected with readline() in a list. Guess pandas could do the same.
> Alternatively the line positions of the blocks could be directly passed to the data_start and
> data_end keywords, but that would require opening and at least partially reading the file
> multiple times. In fact, if the blocks are relatively small, the overhead may be too large to
> make it worth using the faster parsers - if you look at the timing notebooks I had linked to
> earlier, it takes at least ~100 input lines before they show any speed gains over genfromtxt,
> and ~1000 to see roughly linear scaling. In that case writing your own customised reader
> could be the best option after all.
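Along the lines Derek sketches, a block of lines collected with `readline()` can be handed straight to `numpy.genfromtxt`, which accepts any iterable of strings, so no temporary file or re-opening is needed. A minimal sketch with a made-up two-block layout (one header line per block, a known row count):

```python
import io
import numpy as np

# Hypothetical file: a header line introduces each block of known length
f = io.StringIO("header\n1.0 2.0\n3.0 4.0\nheader\n5.0 6.0\n")

f.readline()                            # skip the block header
lines = [f.readline() for _ in range(2)]  # collect this block's lines

# genfromtxt parses the list of strings directly
block = np.genfromtxt(lines, dtype=np.float64)
print(block)          # [[1. 2.]
                      #  [3. 4.]]
```

As Derek notes, for small blocks the per-call overhead of the generic parsers may dominate, and a hand-rolled `float()`/`split()` loop can end up faster.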
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org