
Building Your First NIML Model

In this article you will learn how to create a new NIML model, set the required hyperparameters, and run the fit, evaluate, and predict functions.

This article will get you familiar with creating a new model and running a simple classification problem on the NIML system. We'll focus on the pipeline components and parameters that are required to run the model. Once you've reviewed this article and are comfortable with the basics, we recommend checking out our guides on tuning parameters for the encoder and the NPU. If you're already a pro at NIML, click here to view the same full demo with less introductory context.

[Figure: NIML model pipeline, showing the three core steps: Encoding, Neural Processing, and Classification]

As shown above, the NIML model has three core steps: Encoding, Neural Processing, and Classification. These steps form the NIML pipeline and are outlined in more detail below. 

Step 1: Load and Encode the Data

First, you want to read in a dataset and perform the encoding step on the data. Data cannot be used to train the NPU until it has been encoded into a pattern-based representation known as an iSDR. The NIML software library contains an encoder that can be used for this purpose. In this example, we will load the Iris dataset from Scikit-Learn and divide it into train and test splits.

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset
data_file = datasets.load_iris()

# Extract the features and the outcome variable from the source dataset
X = pd.DataFrame(data_file.data)
Y = pd.DataFrame(data_file.target)

# Combine the label and features into one dataset to work with NIML (the label becomes column 0)
data = pd.concat([Y, X], axis=1, ignore_index=True)

# Create train and test splits of the data
train, test = train_test_split(data, test_size=0.20, random_state=451)
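
Before encoding, it can be helpful to confirm that the split looks reasonable. The optional snippet below only uses standard pandas operations; column 0 is the label column created by the concat step above.

# Optional sanity check: split sizes and class balance (column 0 holds the label)
print("Train shape:", train.shape, " Test shape:", test.shape)
print("Train class counts:")
print(train[0].value_counts())
print("Test class counts:")
print(test[0].value_counts())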

Now that the data has been loaded and split, we are ready to move on to the encoding step. We will create an encoder object and use it to encode both splits.

The encoder takes global parameters that apply to all features. The global parameters will be set as follows:

- set_bits set to 3
- sparsity set to 0.10 (10%)

The per-feature parameters are specified as lists, with each list entry corresponding to the feature in the same position. These will be set as follows:

- field_types: An array of four "N" values, since all four features in the Iris dataset are numeric
- cyclic_flags: None of our data is cyclic, so an array of four False values will be set
- spans: An array of four 0 values, which selects the simple/basic encoding bit-patterns
- cat_overlaps: Because the Iris dataset is numeric (not categorical), this field does not apply, so an array of four 0 values will be set
- cat_values: Because the Iris dataset is numeric (not categorical), this field does not apply, so an array of four None values will be set

from niml.encoder import encoder

# Create an encoder object
iris_encoder = encoder.Encoder(
    set_bits=3, 
    sparsity=0.10,
    field_types= ["N"]* 4, # 4 numeric features
    cyclic_flags=[False]* 4, # None of the fields are cyclic
    spans=       [    0]* 4, # Use simple/basic encoding bit-patterns
    cat_overlaps=[    0]* 4, # N/A, data is numeric, not categorical. Set all features to 0
    cat_values=  [ None]* 4, # N/A, data is numeric, not categorical. Set all features to None
    )

# Configure the encoder according to the training dataset's distribution
iris_encoder.config_encoder(input_data=train)

Now that the encoder has been created and configured, we can proceed to encoding the training and test data splits that were created earlier. Once we've done this, our data is ready for training and testing the NPU.

# Encode the training data
train_labels, train_isdrs, sdr_width = iris_encoder.encode(input_data=train, label_col=0)

# Encode the test data
test_labels, test_isdrs, sdr_width = iris_encoder.encode(input_data=test, label_col=0)
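
If you are curious about what the encoder produced, you can inspect the values returned by encode. This snippet is optional and only assumes that the returned label and iSDR sequences support len(); the structure of an iSDR itself is covered in other articles.

# Optional: inspect the encoder output
print("SDR width:", sdr_width)
print("Training samples encoded:", len(train_isdrs), " labels:", len(train_labels))
print("Test samples encoded:", len(test_isdrs), " labels:", len(test_labels))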

Step 2: Set up the Model

In order to send the encoded data through a NIML NPU, we must create a model object and initialize it with reasonable hyperparameters. The model essentially consists of neurons that learn similarities within the encoded input data. Neurons "learn" the input patterns by strengthening and weakening their synaptic connections, resulting in meaningful outputs. 

We will set the model's sdr_width to the value received from encoding the data. The other hyperparameters (neurons, active_neurons, input_pct, synapse_inc, and synapse_dec) must be set by the user based on their analysis of the dataset.

A number of different classifiers can be used with this model. If the F34 classifier is being used, the parameters associated with it must also be set when the model object is created. If a different classifier is needed, this article illustrates how to use a third-party classifier.

from niml.model import model

my_model = model.Model(
    # Encoded data parameters
    sdr_width=sdr_width,    # Received from encoding the data
    sdr_set_bits=12,

    # NPU
    neurons=1024,
    active_neurons=20,
    input_pct=0.6,
    learning=True,
    synapse_inc=15,
    synapse_dec=3,

    # Boosting
    boost_strength=0.9,
    boost_table_length=21,
    boost_bend_factor=0.01,
    boost_frequency=6,

    # Classifier
    subclass_thresh=0.5,
    min_overlap=0.1,
    seed=123,
)

Step 3: Train the Model

Now that we have a model, we can begin to train it. We do this by running the NIML model's fit function, passing in the training data and the training labels. We also need to define the number of epochs to run. For the NIML model, 10 epochs is usually sufficient; in this example we will run 15.

my_model.fit(labels=train_labels, isdrs=train_isdrs, epochs=15)

Step 4: Evaluate the Model

We can now assess the trained model by running a labeled test set through it and gathering metrics. To do this, we'll call the NIML model's evaluate function, providing the labels and data as inputs.

# Run the model's evaluate method
metrics = my_model.evaluate(labels=test_labels, isdrs=test_isdrs)

# Display the gathered metrics
print("Accuracy:", metrics["accuracy_score"])
print("F1 :", metrics["f1_score"])
print("Confusion matrix:")
for row in metrics["confusion_matrix"]:
    for value in row:
        print("%4d" % value, end="")
    print(" | total: %d" % sum(row))
Accuracy: 0.9666666666666667
F1 : 0.9675645342312009
Confusion matrix:
11 0 0 | total: 11
0 13 1 | total: 14
0 0 5 | total: 5
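
The metrics dictionary already provides overall accuracy and F1, but you can also derive per-class precision and recall from the returned confusion matrix. This optional sketch assumes the matrix is laid out as printed above, with true classes on the rows and predicted classes on the columns.

# Optional: per-class precision and recall derived from the confusion matrix
cm = metrics["confusion_matrix"]
for cls in range(len(cm)):
    tp = cm[cls][cls]                        # correctly predicted samples of this class
    actual = sum(cm[cls])                    # all samples whose true class is cls
    predicted = sum(row[cls] for row in cm)  # all samples predicted as cls
    recall = tp / actual if actual else 0.0
    precision = tp / predicted if predicted else 0.0
    print("class %d  precision: %.3f  recall: %.3f" % (cls, precision, recall))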

Step 5: Generate Predictions

To illustrate the use of the NIML model's predict function, we will send the test split of the dataset through the model again, this time without labels.

We will print the ground truth (from the test labels list) along with the model's prediction. We will tag each line of output to show whether it is correct or not.

# Run the model's predict method
predictions = my_model.predict(isdrs=test_isdrs)

# Display the result of predictions
all_predictions= []
missed_predictions = []
for idx in range(len(predictions)):
    pstring  = "%3d " % idx
    pstring += "gt: %-7s" % test_labels[idx]
    pstring += "prediction: %-7s " % predictions[idx]
    if test_labels[idx] == predictions[idx]:
        pstring += "correct"
    else:
        pstring += "missed prediction"
        missed_predictions.append(pstring)
    all_predictions.append(pstring)

print("Missed predictions (%d)" % len(missed_predictions))
print("\n".join(missed_predictions))

print("\nAll predictions")
print("\n".join(all_predictions))
Missed predictions (1)
18 gt: 1.0 prediction: 2.0 missed prediction

All predictions
0 gt: 0.0 prediction: 0.0 correct
1 gt: 1.0 prediction: 1.0 correct
2 gt: 2.0 prediction: 2.0 correct
3 gt: 1.0 prediction: 1.0 correct
4 gt: 0.0 prediction: 0.0 correct
5 gt: 0.0 prediction: 0.0 correct
6 gt: 2.0 prediction: 2.0 correct
7 gt: 1.0 prediction: 1.0 correct
8 gt: 1.0 prediction: 1.0 correct
9 gt: 0.0 prediction: 0.0 correct
10 gt: 2.0 prediction: 2.0 correct
11 gt: 1.0 prediction: 1.0 correct
12 gt: 1.0 prediction: 1.0 correct
13 gt: 0.0 prediction: 0.0 correct
14 gt: 1.0 prediction: 1.0 correct
15 gt: 2.0 prediction: 2.0 correct
16 gt: 1.0 prediction: 1.0 correct
17 gt: 1.0 prediction: 1.0 correct
18 gt: 1.0 prediction: 2.0 missed prediction
19 gt: 0.0 prediction: 0.0 correct
20 gt: 0.0 prediction: 0.0 correct
21 gt: 1.0 prediction: 1.0 correct
22 gt: 0.0 prediction: 0.0 correct
23 gt: 0.0 prediction: 0.0 correct
24 gt: 1.0 prediction: 1.0 correct
25 gt: 1.0 prediction: 1.0 correct
26 gt: 2.0 prediction: 2.0 correct
27 gt: 0.0 prediction: 0.0 correct
28 gt: 1.0 prediction: 1.0 correct
29 gt: 0.0 prediction: 0.0 correct
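
Because the test labels are still available, you can cross-check the predict output against the evaluate results by recomputing accuracy directly from the predictions and test_labels lists used above.

# Optional cross-check: accuracy recomputed from the predict output
correct = sum(1 for gt, pred in zip(test_labels, predictions) if gt == pred)
print("Predict accuracy: %.4f (%d of %d)" % (correct / len(predictions), correct, len(predictions)))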

At this point, we can continue to tune the model parameters manually or through hyperparameter searches. Test out different parameter combinations to get a sense of the relationships between parameters, and give the model a try on some simple datasets of your own choosing.
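
As a starting point for that kind of experimentation, the sketch below runs a small manual sweep over a single hyperparameter, creating a fresh model for each trial and comparing the accuracy reported by evaluate. The neuron counts used here are arbitrary examples for illustration, not recommended values.

# Illustrative manual sweep over one hyperparameter (values are examples only)
best_acc, best_neurons = 0.0, None
for n in (512, 1024, 2048):
    trial = model.Model(
        sdr_width=sdr_width, sdr_set_bits=12,
        neurons=n, active_neurons=20, input_pct=0.6, learning=True,
        synapse_inc=15, synapse_dec=3,
        boost_strength=0.9, boost_table_length=21, boost_bend_factor=0.01, boost_frequency=6,
        subclass_thresh=0.5, min_overlap=0.1, seed=123,
    )
    trial.fit(labels=train_labels, isdrs=train_isdrs, epochs=15)
    acc = trial.evaluate(labels=test_labels, isdrs=test_isdrs)["accuracy_score"]
    print("neurons=%d  accuracy=%.4f" % (n, acc))
    if acc > best_acc:
        best_acc, best_neurons = acc, n
print("Best result: neurons=%d (accuracy %.4f)" % (best_neurons, best_acc))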

Check out the rest of our knowledge base articles for details on how to select parameters based on your data, how to use different types of classifiers, and how to build models making use of NIML's noise and missing data resilience!