[scikit-learn] Batch Incremental Learning from Scikit-Multiflow

Farzana Anowar fad469 at uregina.ca
Fri Feb 25 14:00:00 EST 2022


Hello Scikit-learn community,

I hope you all are doing well!

I am currently working with BatchIncrementalClassifier from the 
scikit-multiflow package. The following example is given for this 
classifier:

# Imports
from skmultiflow.data import SEAGenerator
from skmultiflow.meta import BatchIncrementalClassifier

# Set up a data stream
stream = SEAGenerator(random_state=1)

# Pre-train the classifier with 200 samples
X, y = stream.next_sample(200)
batch_incremental_cfier = BatchIncrementalClassifier()
batch_incremental_cfier.partial_fit(X, y)

# Process 5000 samples one at a time and count correct predictions
n_samples = 0
correct_cnt = 0
while n_samples < 5000 and stream.has_more_samples():
    X, y = stream.next_sample()
    # Test first: predict the label before the model has seen it
    y_pred = batch_incremental_cfier.predict(X)
    if y[0] == y_pred[0]:
        correct_cnt += 1
    # Then train: update the same model with the labelled sample
    batch_incremental_cfier.partial_fit(X, y)
    n_samples += 1

# Display results
print('Batch Incremental ensemble classifier example')
print('{} samples analyzed'.format(n_samples))
print('Performance: {}'.format(correct_cnt / n_samples))


Now my questions are:

1. The classifier is pre-trained on 200 samples from the stream and is 
then evaluated prequentially (test-then-train) on 5000 samples. Are the 
200 samples treated as the first batch of data from the stream, used 
only for pre-training, with the evaluation then run on the second batch 
(the 5000 samples) once it becomes available, starting from the 
pre-trained model? (This would make sense to me, because the pre-trained 
model would then influence the later predictions.)

or

2. Is this a single batch (200 + 5000 samples) from the stream, in which 
the first 200 samples are used for pre-training and the remaining 
samples for evaluation? And when the next batch arrives from the stream, 
will it do the same thing (200 samples for pre-training and the rest for 
evaluation)? (If so, aren't we training from scratch each time, which 
would mean that BatchIncrementalClassifier is no longer an incremental 
classifier?)
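
To make the two readings concrete, here is a minimal sketch of the same 
pre-train / test-then-train loop that runs without scikit-multiflow. 
MajorityClassLearner, toy_stream and next_sample are hypothetical 
stand-ins written only for this illustration; they are not part of any 
library, and the sketch says nothing about how BatchIncrementalClassifier 
handles batches internally. It only shows that, in a loop of this shape, 
one model object keeps accumulating state across every partial_fit call 
on one continuous stream.

```python
import random


class MajorityClassLearner:
    """Hypothetical toy incremental learner (not a scikit-multiflow class).

    It predicts the most frequent label seen so far and keeps its counts
    across partial_fit calls, so it is never reset between calls.
    """

    def __init__(self):
        self.counts = {}
        self.n_seen = 0

    def partial_fit(self, X, y):
        # Update the running label counts with the new labelled samples.
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1
            self.n_seen += 1
        return self

    def predict(self, X):
        # Fall back to label 0 if nothing has been learned yet.
        if not self.counts:
            return [0 for _ in X]
        majority = max(self.counts, key=self.counts.get)
        return [majority for _ in X]


def toy_stream(seed=1):
    """Hypothetical endless stream of (features, label) pairs."""
    rng = random.Random(seed)
    while True:
        x = [rng.random(), rng.random()]
        yield x, int(x[0] + x[1] > 1.0)


def next_sample(stream, batch_size=1):
    """Draw batch_size samples from the stream as (X, y) lists."""
    pairs = [next(stream) for _ in range(batch_size)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


stream = toy_stream()
model = MajorityClassLearner()

# Pre-training: the first 200 samples drawn from the stream.
X, y = next_sample(stream, 200)
model.partial_fit(X, y)
seen_after_pretraining = model.n_seen

# Test-then-train on the next 5000 samples of the SAME stream,
# one sample at a time, updating the SAME model object.
n_samples = 0
correct_cnt = 0
while n_samples < 5000:
    X, y = next_sample(stream)
    y_pred = model.predict(X)
    if y[0] == y_pred[0]:
        correct_cnt += 1
    model.partial_fit(X, y)
    n_samples += 1

print('Samples seen after pre-training:', seen_after_pretraining)
print('Samples seen after evaluation:', model.n_seen)
print('Performance: {}'.format(correct_cnt / n_samples))
```

In this sketch the model is created once, so the 200 pre-training 
samples and the 5000 evaluation samples all contribute to the same 
state. Whether a later batch would start from scratch depends on whether 
a new model object is created for it, which is exactly what I would like 
to confirm for BatchIncrementalClassifier.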


Thanks!

-- 
Best Regards,

Farzana Anowar,
PhD Candidate
Department of Computer Science
University of Regina

