Better way to reconstruct a continuous and repeated array with low time complexity?
lampahome
pahome.chen at mirlab.org
Thu Dec 20 04:28:40 EST 2018
I am writing a program to run machine-learning experiments on a weekly
time series.
Every day I record the Count for each ID.
I read the CSV in as a dataset like the one below:
ID, Count
1,30 // First Day
2,33
3,45
4,11
5,66
7,88
1,32 // 2nd Day
2,35
3,55
4,21
5,36
7,48
I have two arrays, X and y. I want to put the ID and Count of the first day
in X, and the Count of the 2nd day in y.
The elements of X and y correspond by ID.
e.g. X[0] and y[0] hold the values where ID == 1,
X[1] and y[1] hold the values where ID == 2, etc.
So X looks like this:
array([
[1,30],
[2,33],
[3,45],
[4,11],
[5,66],
[7,88]
])
and y looks like this:
array([
[32],
[35],
[55],
[21],
[36],
[48]
])
The program I wrote always costs O(n^2) time.
Code:
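import numpy as np
import pandas

def create_data(dataset):
    # Unique IDs found in the first column
    uni = np.unique(dataset[:, 0])
    # Each ID yields one (X, y) pair per pair of consecutive days
    n_pairs = len(dataset) - len(uni)
    X = np.zeros((n_pairs, 2))
    y = np.zeros((n_pairs, 1))
    offset = 0
    for i in uni:
        # Select all rows for this ID, in day order
        index = dataset[:, 0] == i
        data = dataset[index]
        for j in range(len(data) - 1):
            X[offset] = data[j]         # ID and Count on day j
            y[offset] = data[j + 1, 1]  # Count on the following day
            offset += 1
    return X, y

# path is the location of the CSV file
dataframe = pandas.read_csv(path, header=0, engine='python')
dataset = dataframe.dropna().values.astype('float64')
X, y = create_data(dataset)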
I use two nested for loops, so I estimate the complexity to be O(n^2).
Is there a better way to rewrite the code and reduce the time complexity?
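
One possible direction, given only as a sketch: sort the rows by ID with a stable sort, then pair each row with the row right after it whenever both share the same ID. The sort is O(n log n) and the rest is a single pass, and it avoids rebuilding a boolean mask over the whole dataset for every ID. The function name create_data_sorted is hypothetical, and the sketch assumes the rows of each ID already appear in day order in the file.

```python
import numpy as np

def create_data_sorted(dataset):
    # Hypothetical helper: pair each row with the next row of the same ID.
    # A stable sort by ID keeps each ID's rows in their original (day) order.
    order = np.argsort(dataset[:, 0], kind="stable")
    data = dataset[order]
    # True where a row and the row after it belong to the same ID.
    same_id = data[:-1, 0] == data[1:, 0]
    X = data[:-1][same_id]                   # ID and Count on one day
    y = data[1:, 1][same_id].reshape(-1, 1)  # Count on the following day
    return X, y

# The sample data from above: two days, six IDs.
dataset = np.array([
    [1, 30], [2, 33], [3, 45], [4, 11], [5, 66], [7, 88],
    [1, 32], [2, 35], [3, 55], [4, 21], [5, 36], [7, 48],
], dtype="float64")

X, y = create_data_sorted(dataset)
print(X)
print(y)
```

With more than two days per ID, the same code should give one (X, y) pair for every pair of consecutive days, which is what the loop version aims for.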