[Numpy-discussion] record data previous to Numpy use
paul.carrico at free.fr
Thu Jul 6 06:19:16 EDT 2017
Thanks Robert for your effort; I'll have a look at it.
... the goal is to be guided in how to proceed (and to understand it), not
to have a "ready-made solution" ... but I honestly appreciate it :-)
Paul
On 2017-07-06 11:51, Robert Kern wrote:
> On Thu, Jul 6, 2017 at 1:49 AM, <paul.carrico at free.fr> wrote:
>>
>> Dear All
>>
>> First of all, thanks for the answers and the information (I'll dig into it). Let me try to add some comments on what I want to do:
>>
>> My ASCII file mainly contains data (floats and ints) in a single column
>> (this is not always the case, but I can easily handle it; I also saw that I can use the 'split' method if necessary)
>> Comment/text lines indicate the beginning of a block and are immediately followed by the number of sub-blocks
>> So I need to read/record all the values in order to build a matrix before working on it (using Numpy & vectorization)
>>
>> Columns 2 and 3 have been added for further processing
>> The '0' values will be treated specially afterward
>>
>>
>> I guess NumPy won't be a problem (I did some basic tests and I'm quite confident about how to proceed), but I'm really stuck on recording the data ... I'm trying to find a way to efficiently read and store data in a matrix:
>>
>> avoiding dynamic memory allocation (here meaning 'append' in the Python sense, not np.append),
>
> Although you can avoid some list appending in your case (because the blocks self-describe their length), I would caution you against prematurely avoiding it. It's often the most natural way to write the code in Python, so go ahead and write it that way first. Once you have it working correctly, if it turns out to be too slow or memory-intensive, then you can puzzle over how to preallocate the numpy arrays. But quite often, it's fine. In this case, the reading and handling of the text data itself is probably the bottleneck, not appending to the lists. As I said, Python lists are cleverly implemented to make appending fast. Accumulating numbers in a list and then converting to an array afterwards is a well-accepted numpy idiom.
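To make the two options concrete, here is a minimal sketch contrasting the list-then-convert idiom with filling a preallocated array when the count is known in advance (as it is for a self-describing sub-block). It assumes each line holds one or more whitespace-separated numbers; the function names are hypothetical and not taken from the thread.

```python
import numpy as np

def accumulate_then_convert(lines):
    """Append parsed values to a Python list, then convert once at the end."""
    values = []
    for line in lines:
        # A line may hold one or several whitespace-separated numbers.
        values.extend(float(tok) for tok in line.split())
    return np.array(values, dtype=float)

def fill_preallocated(lines, n):
    """Fill an array of known length n; only possible when n is known up front."""
    out = np.empty(n, dtype=float)
    i = 0
    for line in lines:
        for tok in line.split():
            out[i] = float(tok)  # raises IndexError if there are more than n values
            i += 1
    if i != n:
        raise ValueError(f"expected {n} values, got {i}")
    return out

lines = ["1.0", "2.5 3", "4"]
a = accumulate_then_convert(lines)
b = fill_preallocated(lines, 4)
```

Both produce the same array. The first version is simpler and is the natural one to write first; the second only pays off if profiling shows that list growth actually matters.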
>
>> dealing with huge ASCII files: the latest file I got contains more than 60 million lines
>>
>> Please find attached an extract of the input format ('example_of_input') and the matrix I'm trying to create and manage with NumPy
>>
>> Thanks again for your time
>
> Try something like the attached. The function will return a list of blocks. Each block will itself be a list of numpy arrays, which are the sub-blocks themselves. I didn't bother adding the first three columns to the sub-blocks or trying to assemble them all into a uniform-width matrix by padding with trailing 0s. Since you say that the trailing 0s are going to be "specially treated afterwards", I suspect that you can more easily work with the lists of arrays instead. I assume floating-point data rather than trying to figure out from the data whether each value is an int or a float. The code can handle multiple data values on one line (not especially well-tested, but it ought to work), but it assumes that the number of sub-blocks, the index of the sub-block, and the sub-block size are each on their own line. The code gets a little more complicated if that's not the case.
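The following is not Robert's attached code. It is a minimal sketch of a parser along the lines he describes, under a hypothetical layout inferred from the thread: a non-numeric text line opens a block, the next line gives the number of sub-blocks, and each sub-block is an integer index line, an integer size line, and then that many float values spread over one or more lines. The names `read_blocks` and `pad_to_matrix` are illustrative. `pad_to_matrix` shows the optional zero-padding step Robert chose to skip; like his version, it does not add the extra columns.

```python
import numpy as np

def _is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False

def _next_line(it):
    # Fail with a clear message if the input ends in the middle of a block.
    line = next(it, None)
    if line is None:
        raise ValueError("unexpected end of input inside a block")
    return line

def read_blocks(lines):
    """Parse the hypothetical layout:

        <text line>      starts a block
        <n_sub>          number of sub-blocks
        then, n_sub times:
            <index>      sub-block index (parsed but not stored here)
            <size>       number of values in the sub-block
            <values...>  `size` floats, one or more per line

    Returns a list of blocks; each block is a list of 1-D float arrays.
    """
    # Strip whitespace and drop blank lines lazily, so a huge file is
    # never loaded into memory all at once.
    it = (ln for ln in (raw.strip() for raw in lines) if ln)
    blocks = []
    for header in it:
        if _is_number(header.split()[0]):
            raise ValueError(f"expected a text header line, got {header!r}")
        n_sub = int(_next_line(it))
        block = []
        for _ in range(n_sub):
            int(_next_line(it))  # sub-block index: must parse as int, otherwise unused
            size = int(_next_line(it))
            values = []
            while len(values) < size:
                values.extend(float(tok) for tok in _next_line(it).split())
            if len(values) != size:
                raise ValueError(f"sub-block expected {size} values, got {len(values)}")
            block.append(np.array(values, dtype=float))
        blocks.append(block)
    return blocks

def pad_to_matrix(sub_blocks, fill=0.0):
    """Stack 1-D arrays into a 2-D array, padding short rows with `fill`."""
    width = max((len(a) for a in sub_blocks), default=0)
    out = np.full((len(sub_blocks), width), fill, dtype=float)
    for row, a in enumerate(sub_blocks):
        out[row, :len(a)] = a
    return out
```

For a very large file, pass the open file object directly, e.g. `with open(path) as f: blocks = read_blocks(f)`, since iterating over a file yields its lines lazily.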
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion