How to Use Scikit-Learn Metrics with the NIML Model

The outputs of a NIML model are easy to evaluate with Scikit-Learn scoring functions when the built-in functions lack the level of granularity required for a specific model.

We'll quickly build and train a NIML Model, generate its predictions, and then show how you can run that output through a few different Scikit-Learn metric functions. More information regarding sklearn metrics can be found in the Scikit-Learn metrics documentation.

Making the Model:

Load The Data

As in many of the other demonstrations, load the dataset you wish to use and split it into training and test sets. Below, we're using the simple Iris dataset.

import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

#Load the dataset
data_file = datasets.load_iris()

#Extract the features and the outcome variable from the source dataset
X = pd.DataFrame(data_file.data)
Y = pd.DataFrame(data_file.target)

#Combine the features and label into one dataset to work with NIML
data = pd.concat([Y,X], axis=1, ignore_index=True)

#Create train and test splits of the data
train, test = train_test_split(data, test_size=0.20, random_state=451)
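Because the encoder is later told that the label lives in column 0 (`label_col=0`), it can help to confirm the layout of the combined frame before moving on. The following is a minimal sketch that repeats the loading steps above and inspects the result. The names `label_column` and `feature_columns` are introduced here purely for illustration and are not part of NIML.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Repeat the loading steps from above
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
Y = pd.DataFrame(iris.target)

# ignore_index=True renumbers the columns 0..4, so the label ends up in column 0
data = pd.concat([Y, X], axis=1, ignore_index=True)

train, test = train_test_split(data, test_size=0.20, random_state=451)

# Illustrative helper names (assumptions, not NIML API)
label_column = data.columns[0]
feature_columns = list(data.columns[1:])

print("Full dataset shape:", data.shape)
print("Train / test shapes:", train.shape, test.shape)
print("Label column:", label_column, "Feature columns:", feature_columns)
print("Distinct labels:", sorted(data[label_column].unique()))
```

If the label were not in the first column, the `label_col` argument passed to the encoder would presumably need to change to match.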

Encode The Data

Now that we have the split data, we must encode it before sending it through our NIML model.

from niml.encoder import encoder

# Create an encoder object
iris_encoder = encoder.Encoder(
    set_bits=3, 
    sparsity=0.10,
    field_types= ["N"]* 4, # 4 numeric features
    cyclic_flags=[False]* 4, # None of the fields are cyclic
    spans=       [    0]* 4, # Use simple/basic encoding bit-patterns
    cat_overlaps=[    0]* 4, # N/A, data is numeric, not categorical. Set all features to 0
    cat_values=  [ None]* 4, # N/A, data is numeric, not categorical. Set all features to None
    )

# Configure the encoder according to the training dataset's distribution
iris_encoder.config_encoder(input_data=train)

# Encode the training data
train_labels, train_isdrs, sdr_width = iris_encoder.encode(input_data=train, label_col=0)

# Encode the test data
test_labels, test_isdrs, sdr_width = iris_encoder.encode(input_data=test, label_col=0)

Create The Model

Now that we have the encoded data, we will build a model for training.

from niml.model import model
my_model = model.Model(
    # Encoded data parameters
    sdr_width=sdr_width, # Received from encoding the data
    sdr_set_bits=3,

    # NPU
    neurons=200,
    active_neurons=10,
    input_pct=0.85,
    learning=True,
    synapse_inc=10,
    synapse_dec=3,

    # Boosting
    boost_frequency=6,
    boost_strength=0.09,
    boost_bend_factor=0.175,
    boost_table_length=21,

    # Classifier
    subclass_thresh=0.5,
    min_overlap=0.0,
    seed=123,
)

Training The Model

With the model constructed, we can pass the training split to the model.fit() function so the model can train on the data.

my_model.fit(labels=train_labels, isdrs=train_isdrs, epochs=15)

Using Sklearn Metrics

Using the NIML Model predict function, we can generate a predicted class label for each observation. These predictions can then be compared to test_labels, which allows many different metrics to be calculated.

While many classification metrics can be computed using the sklearn metrics module, we find the functions listed below especially useful for computing standard data science metrics that help us understand the behavior of the model more deeply.

metrics.accuracy_score()
metrics.balanced_accuracy_score()
metrics.f1_score()
metrics.precision_score()
metrics.recall_score()
metrics.classification_report()
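Iris is a three-class problem, so f1_score, precision_score, and recall_score need an `average` argument to combine the per-class scores into a single number. The sketch below uses a small set of made-up labels (not NIML output) to show how `average=None`, `'macro'`, and `'weighted'` differ for precision.

```python
from sklearn import metrics

# Made-up labels for illustration only; these are not NIML predictions
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

# average=None returns one score per class
per_class = metrics.precision_score(y_true, y_pred, average=None)

# 'macro' is the unweighted mean of the per-class scores
macro = metrics.precision_score(y_true, y_pred, average='macro')

# 'weighted' weights each class score by its number of true samples
weighted = metrics.precision_score(y_true, y_pred, average='weighted')

print("Per class:", per_class)
print("Macro:    ", macro)
print("Weighted: ", weighted)
```

When the classes are imbalanced, the macro and weighted values can differ noticeably, which is worth keeping in mind when reading a single summary number.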

Below, we generate predictions for the test set and compute several of the metrics listed above.

# Get predictions
preds = my_model.predict(isdrs=test_isdrs)

from sklearn import metrics
acc = metrics.accuracy_score(test_labels, preds)
print("Accuracy: ",acc)

f1 = metrics.f1_score(test_labels, preds, average='weighted')
print("F1: ", f1)

pres_score = metrics.precision_score(test_labels, preds, average='weighted')
print("Precision:", pres_score)

class_rep = metrics.classification_report(test_labels, preds)
print("\nClassification Report:\n", class_rep)

Accuracy:  0.8333333333333334
F1:  0.8429951690821257
Precision: 0.9166666666666666

Classification Report:
               precision    recall  f1-score   support

         0.0       1.00      1.00      1.00        11
         1.0       1.00      0.64      0.78        14
         2.0       0.50      1.00      0.67         5

    accuracy                           0.83        30
   macro avg       0.83      0.88      0.82        30
weighted avg       0.92      0.83      0.84        30
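The list of functions also names balanced_accuracy_score() and recall_score(), which the code above does not call. The sketch below shows how they, along with confusion_matrix(), might be used. So that it runs without NIML, it uses stand-in arrays `y_true` and `y_pred` whose counts were chosen to be consistent with the classification report above (11, 14, and 5 test samples per class, with five class-1 samples predicted as class 2). These arrays are an assumption for illustration, not the actual model output; in practice you would pass `test_labels` and `preds` instead.

```python
import numpy as np
from sklearn import metrics

# Illustrative stand-ins, NOT real NIML output: counts chosen to be
# consistent with the classification report shown above
y_true = np.array([0.0] * 11 + [1.0] * 14 + [2.0] * 5)
y_pred = np.array([0.0] * 11 + [1.0] * 9 + [2.0] * 5 + [2.0] * 5)

# Balanced accuracy is the unweighted mean of the per-class recalls
bal_acc = metrics.balanced_accuracy_score(y_true, y_pred)
print("Balanced accuracy:", bal_acc)

# Weighted recall weights each class's recall by its support
rec = metrics.recall_score(y_true, y_pred, average='weighted')
print("Recall:", rec)

# Rows are true classes, columns are predicted classes
conf_mat = metrics.confusion_matrix(y_true, y_pred)
print("Confusion matrix:\n", conf_mat)
```

The confusion matrix makes the weak spot in the report easier to see: every error comes from class 1 samples being predicted as class 2, which explains both the lower recall for class 1 and the lower precision for class 2.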