[Numpy-discussion] record data previous to Numpy use

paul.carrico at free.fr paul.carrico at free.fr
Thu Jul 6 04:49:26 EDT 2017


Dear All

First of all, thanks for the answers and the information (I'll dig
into it). Let me try to add some comments on what I want to do:

 	* My ASCII file mainly contains data (float and int) in a single column
(it is not always the case but I can easily manage it; I also saw that
I can use the 'split' instruction if necessary)
 	* Comment/text lines indicate the beginning of a block, immediately
followed by the number of sub-blocks

	* So I need to read/record all the values in order to build a matrix
before working on it (using Numpy & vectorization) 

 	* Columns 2 and 3 have been added for further processing
 	* The '0' values will be handled specifically afterward

Numpy itself won't be a problem, I guess (I did some basic tests and I'm
quite confident about how to proceed), but I'm really stuck on reading the
data. I'm trying to find a way to efficiently read the data and store it
in a matrix while:

 	* avoiding dynamic memory allocation (by that I mean 'append' on a
Python list, not np.append),
 	* dealing with huge ASCII files: the latest file I got contains more
than 60 million lines
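
One possible way to meet both constraints is a two-pass read: count the
numeric lines first, allocate the array once, then fill it in place. The
sketch below is only an illustration under assumptions that are not in the
original message: the function name load_single_column is hypothetical,
every data line is assumed to hold exactly one number, and any line that
does not parse as a number (such as '##BEGIN' or a blank line) is skipped.

```python
import numpy as np


def _is_number(line):
    """Return True if the line parses as a float (hypothetical helper)."""
    try:
        float(line)
        return True
    except ValueError:
        return False


def load_single_column(path):
    """Read every numeric line of `path` into one preallocated array."""
    # Pass 1: count the numeric lines so the final size is known up front.
    with open(path, "r") as f:
        n_values = sum(1 for line in f if _is_number(line))

    # Allocate once; no list.append and no array resizing afterwards.
    values = np.zeros(n_values, dtype=np.float64)

    # Pass 2: fill the array in place, skipping comment and blank lines.
    i = 0
    with open(path, "r") as f:
        for line in f:
            try:
                values[i] = float(line)
            except ValueError:
                continue
            i += 1
    return values
```

The file is read twice, which trades some I/O time for a single allocation;
whether that is acceptable for a 60-million-line file would need to be
measured.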

Please find attached an extract of the input format
('example_of_input') and the matrix I'm trying to create and manage
with Numpy.

Thanks again for your time 

Paul 

####################################### 

##BEGIN _-> line number x in the original file_ 

42   _-> indicates the number of sub-blocks_ 

1     _-> number of the first sub-block_ 

6     _-> gives how many values belong to the sub-block_ 

12 

47 

2 

46 

3 

51 

…. 

13  _-> another type of sub-block, with 25 values_ 

25 

15 

88 

21 

42 

22 

76 

19 

89 

0 

18 

80 

23 

38 

24 

73 

20 

81 

0 

90 

0 

41 

0 

39 

0 

77 

… 

42 _-> another type of sub-block, with 2 values_ 

2 

115 

109 

 ####################################### 

THE MATRIX RESULT 

1 0 0 6 12 47 2 46 3 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

2 0 0 6 3 50 11 70 12 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

3 0 0 8 11 50 3 49 4 54 5 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

4 0 0 8 12 70 11 66 9 65 10 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

5 0 0 8 2 47 12 68 10 44 1 43 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

6 0 0 8 5 56 6 58 7 61 11 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

7 0 0 8 11 61 7 60 8 63 9 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

8 0 0 19 12 47 2 46 3 51 0 13 97 14 92 15 96 0 72 0 48 0 52 0 0 0 0 0 0 

9 0 0 19 13 97 14 92 15 96 0 16 86 17 82 18 85 0 95 0 91 0 90 0 0 0 0 0 0 

10 0 0 19 3 50 11 70 12 51 0 15 89 19 94 13 96 0 52 0 71 0 72 0 0 0 0 0 0 

11 0 0 19 15 89 19 94 13 96 0 18 81 20 84 16 85 0 90 0 77 0 95 0 0 0 0 0 0 

12 0 0 25 3 49 4 54 5 57 11 50 0 15 88 21 42 22 76 19 89 0 52 0 53 0 55 0 71 

13 0 0 25 15 88 21 42 22 76 19 89 0 18 80 23 38 24 73 20 81 0 90 0 41 0 39 0 77 

14 0 0 25 11 66 9 65 10 68 12 70 0 19 78 25 99 26 98 13 94 0 71 0 67 0 69 0 72 

…. 

####################################### 
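
The mapping from the input format to the matrix above can be sketched as
follows, once the integers of one block are available as a flat sequence.
This is a sketch under assumptions, not a confirmed description of the
file: the name build_matrix and the parameter n_extra_cols are
hypothetical, and the layout assumed is the one suggested by the example
(number of sub-blocks, then for each sub-block its number, its count, and
that many values). The matrix width is derived from the longest sub-block.

```python
import numpy as np


def build_matrix(values, n_extra_cols=2):
    """Turn one flat block of integers into a zero-padded 2-D array.

    Assumed layout of `values`: values[0] is the number of sub-blocks,
    then each sub-block is <number>, <count>, followed by <count> values.
    """
    values = np.asarray(values, dtype=np.int64)
    n_sub = int(values[0])

    # Pass 1: walk the headers only, recording where each sub-block starts
    # and how long it is, so the matrix size is known before allocating.
    starts = np.empty(n_sub, dtype=np.int64)
    counts = np.empty(n_sub, dtype=np.int64)
    pos = 1
    for k in range(n_sub):
        count = int(values[pos + 1])
        starts[k] = pos
        counts[k] = count
        pos += 2 + count

    # Columns: number, the extra zero columns, count, then padded values.
    first_value_col = 2 + n_extra_cols
    width = first_value_col + (int(counts.max()) if n_sub else 0)
    matrix = np.zeros((n_sub, width), dtype=np.int64)

    # Pass 2: copy each sub-block into its row with one slice assignment.
    for k in range(n_sub):
        s, c = int(starts[k]), int(counts[k])
        matrix[k, 0] = values[s]
        matrix[k, first_value_col - 1] = c
        matrix[k, first_value_col:first_value_col + c] = values[s + 2:s + 2 + c]
    return matrix
```

For instance, build_matrix([2, 1, 6, 12, 47, 2, 46, 3, 51, 2, 2, 115, 109])
describes a made-up block of two sub-blocks and gives a 2 x 10 array whose
first row starts with 1 0 0 6 12 47 2 46 3 51. The only allocation is the
single np.zeros call; the remaining Python loops run once per sub-block,
not once per value.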

AN EXAMPLE OF THE CODE I STARTED TO WRITE 

# -*- coding: utf-8 -*-

import time, sys, os, re
import itertools
import numpy as np

PATH = os.path.abspath('')
input_file_name = '/example_of_input.txt'
full_path = PATH + input_file_name

## check if the file exists, then if it's empty or not
if os.path.isfile(full_path):
    if os.stat(full_path).st_size > 0:
        ## go through the file in order to find specific sentences
        ## specific blocks will be defined afterward
        Block_position = []
        with open(full_path, "r") as data:
            # j is the 0-based line number of each '##BEGIN' marker
            for j, line in enumerate(data):
                if '##BEGIN' in line:
                    Block_position.append(j)

        ## just some tests to get all the values (not working yet)
        ## note: f.read(n) reads n characters, it does not jump to line n
#        i = 0
#        data = np.zeros( (505), dtype=int )
#        with open(full_path, "r") as f:
#            for i in range (0,505):
#                data[i] = int(f.read(Block_position[0]+1+i))
#                print ("i = ", i)
#           for line in itertools.islice(f,Block_position[0],516):
#               data[i]=f.read(0+i)
#               i=i+1
    else:
        print("The file %s is empty : post-processing cannot be performed !!!\n" % input_file_name)
else:
    print("Error : the file %s does not exist: post-processing stops !!!\n" % input_file_name)
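
The commented-out test above tries to read a fixed number of values that
follow a '##BEGIN' line. One way this might be done is with
itertools.islice and np.fromiter, as sketched below. The function name
read_values_after is hypothetical, and the sketch assumes that the lines
after the marker are either blank or hold exactly one integer each.

```python
import itertools

import numpy as np


def read_values_after(path, marker_line, n_values):
    """Read `n_values` integers following line `marker_line` (0-based)."""
    with open(path, "r") as f:
        # Skip the marker line and everything before it; the rest of the
        # file is consumed lazily, it is never loaded in full.
        window = itertools.islice(f, marker_line + 1, None)
        # Ignore blank lines and convert the remaining lines to int.
        numbers = (int(line) for line in window if line.strip())
        # `count` lets NumPy allocate the result once, at its final size.
        return np.fromiter(numbers, dtype=np.int64, count=n_values)
```

With the names used in the script above, a call could look like
read_values_after(full_path, Block_position[0], 505). np.fromiter raises
a ValueError if fewer than n_values numbers are available.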