NIML model outputs are easy to evaluate with Scikit-Learn scoring functions when the built-in functions lack the granularity required for a specific model.
We'll quickly build and train a NIML Model, return its predictions, and then show how to run that output through a few different Scikit-Learn metric functions. More information on these metrics can be found in the Scikit-Learn documentation for the sklearn.metrics module.
Making the Model:
Load The Data
Just as we have done in many of the other demonstrations, load the dataset you wish to use and split it into train and test splits. Below, we're using the simple Iris dataset.
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the dataset
data_file = datasets.load_iris()

# Extract the data, outcome variable, and character labels from the source dataset
X = pd.DataFrame(data_file.data)
Y = pd.DataFrame(data_file.target)

# Combine the features and label into one dataset to work with NIML
data = pd.concat([Y, X], axis=1, ignore_index=True)

# Create train and test splits of the data
train, test = train_test_split(data, test_size=0.20, random_state=451)
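The per-class support that later appears in a classification report depends on how this split happens to fall. As a minimal sketch, the snippet below repeats the split and prints the class counts of the test set. The `stratify` variation is an optional assumption for comparison only; it is not part of this walkthrough, and the names `strat_train` and `strat_test` are illustrative.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Rebuild the combined dataset; column 0 holds the class label
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
Y = pd.DataFrame(iris.target)
data = pd.concat([Y, X], axis=1, ignore_index=True)

# Same split as in the walkthrough
train, test = train_test_split(data, test_size=0.20, random_state=451)
print("Test class counts (unstratified):")
print(test[0].value_counts().sort_index())

# Optional variation (not used in this walkthrough): keep class
# proportions equal across the splits by stratifying on the label column
strat_train, strat_test = train_test_split(
    data, test_size=0.20, random_state=451, stratify=data[0]
)
print("Test class counts (stratified):")
print(strat_test[0].value_counts().sort_index())
```

Because Iris has 50 samples per class, the stratified 20% test split holds 10 samples of each class, while the unstratified split may be uneven. An uneven test split is one reason to look at balanced or per-class metrics rather than accuracy alone.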
Encode The Data
Now that we have the split data, we must encode it before sending it through our NIML model.
from niml.encoder import encoder

# Create an encoder object
iris_encoder = encoder.Encoder(
    set_bits=3,
    sparsity=0.10,
    field_types=["N"] * 4,      # 4 numeric features
    cyclic_flags=[False] * 4,   # None of the fields are cyclic
    spans=[0] * 4,              # Use simple/basic encoding bit-patterns
    cat_overlaps=[0] * 4,       # N/A, data is numeric, not categorical. Set all features to 0
    cat_values=[None] * 4,      # N/A, data is numeric, not categorical. Set all features to None
)

# Configure the encoder according to the training dataset's distribution
iris_encoder.config_encoder(input_data=train)

# Encode the training data
train_labels, train_isdrs, sdr_width = iris_encoder.encode(input_data=train, label_col=0)

# Encode the test data
test_labels, test_isdrs, sdr_width = iris_encoder.encode(input_data=test, label_col=0)
Create The Model
Now that we have the encoded data, we will build a model for training.
from niml.model import model
my_model = model.Model(
# Encoded data parameters
sdr_width=sdr_width,  # Received from encoding the data
sdr_set_bits=3,
# NPU
neurons=200,
active_neurons=10,
input_pct=0.85,
learning=True,
synapse_inc=10,
synapse_dec=3,
# Boosting
boost_frequency=6,
boost_strength=0.09,
boost_bend_factor=0.175,
boost_table_length=21,
# Classifier
subclass_thresh=0.5,
min_overlap=0.0,
seed=123,
)
Training The Model
With the model constructed, we can pass the training split to the model.fit() function so the model can train against the data.
my_model.fit(labels=train_labels, isdrs=train_isdrs, epochs=15)
Using Sklearn Metrics
The NIML Model predict function generates a predicted class label for each observation. Comparing these predictions against test_labels allows many different metrics to be calculated.
While the sklearn metrics module can compute many classification metrics, we find the functions listed below the most useful for computing standard data science metrics that give a deeper understanding of the model's behavior.
- metrics.accuracy_score()
- metrics.balanced_accuracy_score()
- metrics.f1_score()
- metrics.precision_score()
- metrics.recall_score()
- metrics.classification_report()
Below, we generate predictions for the test set and compute several of these metrics.
# Get predictions
preds = my_model.predict(isdrs=test_isdrs)
from sklearn import metrics
acc = metrics.accuracy_score(test_labels, preds)
print("Accuracy: ",acc)
f1 = metrics.f1_score(test_labels, preds, average='weighted')
print("F1: ", f1)
pres_score = metrics.precision_score(test_labels, preds, average='weighted')
print("Precision:", pres_score)
class_rep = metrics.classification_report(test_labels, preds)
print("\nClassification Report:\n", class_rep)
Accuracy: 0.8333333333333334
F1: 0.8429951690821257
Precision: 0.9166666666666666
Classification Report:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        11
         1.0       1.00      0.64      0.78        14
         2.0       0.50      1.00      0.67         5

    accuracy                           0.83        30
   macro avg       0.83      0.88      0.82        30
weighted avg       0.92      0.83      0.84        30
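The list above also names metrics.balanced_accuracy_score() and metrics.recall_score(), which the example does not call. The sketch below shows them on stand-in arrays, so it runs without a NIML model. The arrays `y_true` and `y_pred` are an assumption: they are rebuilt from the report above (11 class-0 samples all correct, 14 class-1 samples with 5 predicted as class 2, and 5 class-2 samples all correct) and are not real model output.

```python
import numpy as np
from sklearn import metrics

# Stand-in labels rebuilt from the classification report (assumption):
# class 0: 11 samples, all correct
# class 1: 14 samples, 9 correct and 5 predicted as class 2
# class 2: 5 samples, all correct
y_true = np.array([0] * 11 + [1] * 14 + [2] * 5)
y_pred = np.array([0] * 11 + [1] * 9 + [2] * 5 + [2] * 5)

# Rows are true classes, columns are predicted classes
cm = metrics.confusion_matrix(y_true, y_pred)
print("Confusion matrix:\n", cm)

# Balanced accuracy is the unweighted mean of the per-class recall
bal_acc = metrics.balanced_accuracy_score(y_true, y_pred)
print("Balanced accuracy:", bal_acc)

# Recall per class, and the support-weighted average
per_class_recall = metrics.recall_score(y_true, y_pred, average=None)
weighted_recall = metrics.recall_score(y_true, y_pred, average="weighted")
print("Recall per class:", per_class_recall)
print("Weighted recall:", weighted_recall)

# Weighted precision computed by hand from the confusion matrix:
# per-class precision = correct predictions / all predictions of that class,
# then averaged using each class's support as the weight
support = cm.sum(axis=1)
per_class_precision = np.diag(cm) / cm.sum(axis=0)
weighted_precision = float((support * per_class_precision).sum() / support.sum())
print("Weighted precision:", weighted_precision)
```

On these stand-in arrays the balanced accuracy matches the macro-average recall of about 0.88 in the report, and the hand-computed weighted precision matches the 0.9167 printed earlier. With a real model, pass test_labels and preds in place of `y_true` and `y_pred`.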