Better way to reconstruct a continuous and repeated array with low time complexity?

lampahome pahome.chen at mirlab.org
Thu Dec 20 04:28:40 EST 2018


I am writing a program to run machine-learning experiments on a (weekly)
time series.
Every day I record the Count for each ID.
I read the CSV as a dataset like the one below:
ID, Count
1,30 // First Day
2,33
3,45
4,11
5,66
7,88
1,32 // 2nd Day
2,35
3,55
4,21
5,36
7,48

I have two arrays, X and y. I want to put the ID and Count of one day in X,
and the Count of the following (2nd) day in y.
The elements of X and y correspond to each other by ID.
e.g. X[0] and y[0] hold the values where ID == 1,
X[1] and y[1] hold the values where ID == 2, etc.

So X is like below:
array([
[1,30],
[2,33],
[3,45],
[4,11],
[5,66],
[7,88]
])

y is like below:
array([
[32],
[35],
[55],
[21],
[36],
[48]
])
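
For illustration, the X and y above can be reproduced from the sample rows
with a reshape. This is only a minimal sketch: it assumes every day lists
the same IDs in the same order, which holds for the sample but is an
assumption about the real CSV.

```python
import numpy as np

# Sample rows from the post: (ID, Count), first day followed by second day.
dataset = np.array([
    [1, 30], [2, 33], [3, 45], [4, 11], [5, 66], [7, 88],   # first day
    [1, 32], [2, 35], [3, 55], [4, 21], [5, 36], [7, 48],   # second day
], dtype='float64')

# Assumption: every day contains the same IDs in the same order.
n_ids = len(np.unique(dataset[:, 0]))
days = dataset.reshape(-1, n_ids, 2)    # shape: (n_days, n_ids, 2)

X = days[:-1].reshape(-1, 2)            # ID and Count of every day but the last
y = days[1:, :, 1].reshape(-1, 1)       # Count of the following day
```

With more than two days, the rows come out ordered by day and then by ID,
not grouped by ID as in a per-ID loop.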

The program I wrote always costs O(n^2) time.
Code:

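import numpy as np
import pandas

def create_data(dataset):
    # One (X, y) row per pair of consecutive days of the same ID.
    uni = np.unique(dataset[:, 0])
    n_pairs = len(dataset) - len(uni)
    X = np.zeros((n_pairs, 2))
    y = np.zeros((n_pairs, 1))
    offset = 0
    for i in uni:
        index = dataset[:, 0] == i    # boolean mask, scans the whole dataset
        data = dataset[index]         # rows of this ID, in day order
        for j in range(len(data) - 1):
            X[offset] = data[j]           # ID and Count of day j
            y[offset] = data[j + 1, 1]    # Count of the following day
            offset += 1
    return X, y

# path is the location of the CSV file
dataframe = pandas.read_csv(path, header=0, engine='python')
dataset = dataframe.dropna().values.astype('float64')
X, y = create_data(dataset)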


I use two for loops and estimate the complexity to be O(n^2).

Is there a better way to rewrite the code and reduce the time complexity?
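
One possible direction, sketched here under assumptions and not as a tested
answer: the outer loop scans the whole dataset once per unique ID, so a
stable sort on the ID column could group the rows in a single O(n log n)
step. The name create_data_sorted is hypothetical, and the sketch assumes
the rows of the CSV are in chronological order.

```python
import numpy as np

def create_data_sorted(dataset):
    """Hypothetical helper: pair each row with the next row of the same ID."""
    # A stable sort groups rows by ID and keeps day order within each ID.
    order = np.argsort(dataset[:, 0], kind='stable')
    grouped = dataset[order]
    # A row has a successor when the next grouped row carries the same ID.
    has_next = grouped[:-1, 0] == grouped[1:, 0]
    X = grouped[:-1][has_next]           # ID and Count of a day
    y = grouped[1:][has_next][:, 1:2]    # Count of the following day
    return X, y

# Sample rows from the post: (ID, Count), first day followed by second day.
dataset = np.array([
    [1, 30], [2, 33], [3, 45], [4, 11], [5, 66], [7, 88],
    [1, 32], [2, 35], [3, 55], [4, 21], [5, 36], [7, 48],
], dtype='float64')

X, y = create_data_sorted(dataset)
```

Unlike a reshape, this does not need every ID to appear on every day. An ID
that is missing on some day is still paired with its next recorded day,
which may or may not be what the experiment wants.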

