Modeling the Wisconsin Breast Cancer Dataset

Below, we'll walk through each step needed to encode the Wisconsin Breast Cancer (WBC) dataset, train a NIML model on it, and evaluate the resulting classifier.

Before we begin, note that this demo uses Jupyter notebooks and NIML release 0.7.1.

Step 1: Read in and format the data

The Wisconsin Breast Cancer data can be loaded directly from scikit-learn, as shown below. To make things simpler for the NIML model, we combine the labels and the observation values into a single DataFrame, with the label in column 0, and then split the data into training and testing sets.

import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the dataset
data_file = datasets.load_breast_cancer()

# Extract the features and the outcome variable from the source dataset
X = pd.DataFrame(data_file.data)
Y = pd.DataFrame(data_file.target)

# Combine the label and the features into one dataset (label in column 0) to work more easily with NIML
data = pd.concat([Y,X], axis=1, ignore_index=True)

# Create train and test splits of the data
train, test = train_test_split(data, test_size=0.25, random_state=592)
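Before encoding, it can help to confirm the layout of the combined frame, since the later steps pass label_col=0. The following is a minimal sketch that repeats the loading code above and adds a few checks; the printed checks are our own addition and are not part of the NIML workflow.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Same loading and splitting steps as in the walkthrough.
data_file = datasets.load_breast_cancer()
X = pd.DataFrame(data_file.data)
Y = pd.DataFrame(data_file.target)
data = pd.concat([Y, X], axis=1, ignore_index=True)
train, test = train_test_split(data, test_size=0.25, random_state=592)

# Column 0 holds the label; columns 1 through 30 hold the numeric features.
print(data.shape)                # (569, 31)
print(train.shape, test.shape)   # (426, 31) (143, 31)

# scikit-learn encodes the two classes as 0 = malignant, 1 = benign.
print([str(name) for name in data_file.target_names])

# Class balance of the training split.
print(train[0].value_counts().to_dict())
```

Because ignore_index=True renumbers the columns after concatenation, the label is addressed by position (0) rather than by name.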

Step 2: Encode the data

Next, we'll create an instance of the encoder and configure it from the training data using the encoder.config_encoder() method. Once configured, the encoder can turn the training data into pattern-based iSDRs with the encoder.encode() method. This returns the training labels, the training iSDRs, and the SDR width, all of which are used when we train and evaluate the model.

from niml.encoder import encoder
# Calculate the number of features given the number of columns of data, minus the label_col
num_features = train.shape[1]-1
# Build the encoder
wsbc_encoder = encoder.Encoder(set_bits=10, sparsity=.06,
    field_types = ["N"] * num_features, # 30 numeric features
    cyclic_flags = [False] * num_features, # none of the fields are cyclic
    spans = [0] * num_features, # use simple/basic encoding bit-patterns
    cat_overlaps = [0] * num_features, # N/A, data is numeric, not categorical. Set all features to 0
    cat_values = [None] * num_features, # N/A, data is numeric, not categorical. Set all features to None
)
# Configure the encoder according to the training dataset's distribution
wsbc_encoder.config_encoder(input_data=train, label_col=0)
# Encode the training data -> produce encoded inputs (train_isdrs) to be sent to the Pooler for learning
train_labels, train_isdrs, sdr_width = wsbc_encoder.encode(input_data=train, label_col=0)
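To build some intuition for what it means to configure an encoder from the training data, here is a toy scalar encoder. It is not NIML's encoder: the function name encode_scalar, the 160-bit field width, and the sliding-window scheme are all assumptions chosen for illustration. The only idea it is meant to convey is that the value range is learned from the training set and then reused unchanged for any later data.

```python
def encode_scalar(value, lo, hi, n_bits=160, set_bits=10):
    """Toy encoder (hypothetical): return the indices of the active bits for one value."""
    # Clip so that later values outside the training range still encode.
    clipped = min(max(value, lo), hi)
    # Position within the training range, from 0.0 to 1.0.
    frac = 0.0 if hi == lo else (clipped - lo) / (hi - lo)
    # Slide a window of `set_bits` contiguous active bits across the field.
    start = int(frac * (n_bits - set_bits))
    return list(range(start, start + set_bits))

# "Configure" from training values only: learn the range once.
train_values = [10.0, 12.5, 13.0, 20.0]
lo, hi = min(train_values), max(train_values)

near_a = set(encode_scalar(12.5, lo, hi))
near_b = set(encode_scalar(13.0, lo, hi))
far = set(encode_scalar(20.0, lo, hi))

# Nearby values share active bits; distant values do not.
print(len(near_a & near_b) > 0)   # True
print(len(near_a & far))          # 0

# A later value above the training maximum reuses the learned range.
print(encode_scalar(25.0, lo, hi) == encode_scalar(20.0, lo, hi))  # True
```

In this sketch, similar inputs produce overlapping bit patterns, which is the property a downstream learner relies on. If the range were re-learned from a different dataset, the same raw value would map to different bits.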

Step 3: Create and train the model

Now we're ready to set up the NIML model. To instantiate the model, choose values for the hyperparameters listed below. Note that in this demo we're using our F34 classifier, so the classification parameters are also set in this step.

Tip: Choosing optimal values for the NPU can be challenging! To learn the most efficient way to tune your model, please refer to the tutorial XXXXX

from niml.model import model

# Build a NIML model
my_model = model.Model(
    # Encoder/Data parameters
    sdr_width=sdr_width,
    sdr_set_bits=11,

    # NPU parameters
    neurons=1024,
    active_neurons=25,
    input_pct=0.6,
    synapse_inc=15,
    synapse_dec=3,
    seed=123,

    # Boosting parameters
    boost_frequency=6,
    boost_strength=0.09,
    boost_bend_factor=0.175,
    boost_table_length=21,

    # Classifier parameters
    subclass_thresh=0.4,
    min_overlap=0.01,
)

# Fit the model to the training data (iSDRs)
my_model.fit(labels=train_labels, isdrs=train_isdrs, epochs=15, verbose=True)

starting epoch 0
starting epoch 1
starting epoch 2
starting epoch 3
starting epoch 4
starting epoch 5
starting epoch 6
starting epoch 7
starting epoch 8
starting epoch 9
starting epoch 10
starting epoch 11
starting epoch 12
starting epoch 13
starting epoch 14

Step 4: Evaluate the model's metrics

To evaluate the model on the test data, we'll first need to encode it into patterns. We already configured our encoder in step 2, and we'll use that same encoder to encode the test observations.

Finally, we can assess the model's performance by running the model.evaluate() method, which returns an F1 score, an accuracy score, and a confusion matrix. For other metric options, check out this demo.

Caution: Do NOT instantiate and configure a separate encoder for the test data. Doing so produces encoder settings that are inconsistent with the ones used during training and leads to data leakage.

# Encode the test data using the encoder we configured with the training dataset
test_labels, test_isdrs, sdr_width = wsbc_encoder.encode(input_data=test, label_col=0)

# Pass the test data through the trained model and evaluate its performance
results = my_model.evaluate(labels=test_labels, isdrs=test_isdrs)

print("Model metrics on test dataset: ", results)

Model metrics on test dataset:  {'f1_score': 0.9376902273927445, 'accuracy_score': 0.9370629370629371, 'confusion_matrix': [[53, 1], [8, 81]]}
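As a quick sanity check, the reported scores can be recomputed from the printed confusion matrix alone. The sketch below assumes that rows are true labels and columns are predicted labels (the scikit-learn convention; whether NIML follows it is an assumption here), and the helper name class_f1 is ours.

```python
# Confusion matrix copied from the printed results above.
# Assumption: rows are true labels, columns are predicted labels.
cm = [[53, 1], [8, 81]]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

def class_f1(cm, k):
    """F1 for class k, computed as 2*TP / (2*TP + FP + FN)."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    return 2 * tp / (2 * tp + fp + fn)

f1_per_class = [class_f1(cm, k) for k in range(len(cm))]
support = [sum(row) for row in cm]        # number of true samples per class

macro_f1 = sum(f1_per_class) / len(f1_per_class)
weighted_f1 = sum(f * s for f, s in zip(f1_per_class, support)) / total

print(f"accuracy    = {accuracy:.4f}")     # 0.9371
print(f"macro F1    = {macro_f1:.4f}")     # 0.9346
print(f"weighted F1 = {weighted_f1:.4f}")  # 0.9377
```

Under that row/column assumption, the accuracy matches the printed accuracy_score, and the printed f1_score agrees with the support-weighted average of the per-class F1 values rather than with the unweighted average.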

Step 5: Get predictions

There are two ways that we can make use of the predict functionality. When a model is ready to be deployed, we can use it with unlabeled incoming data to get classifications. When a model is still being tuned, we can use it with our test data to identify specific observations that are being misclassified.

To get predictions, run the model.predict() function.

Below, we've printed the prediction, the actual correct label, and whether our model was correct or incorrect in its prediction. This information can help us better understand our model's behavior and further tune it for superior performance.

y_pred = my_model.predict(isdrs=test_isdrs)
# Display the predictions and ground truth labels for the test dataset
for ground_truth, prediction in zip(test_labels, y_pred):
    print("Truth: %6s prediction: %6s" % (ground_truth, prediction), end=" ")
    if ground_truth != prediction:
        print(" -- result: MISSED PREDICTION")
    else:
        print(" -- result: correct")

Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    1.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    0.0  -- result: MISSED PREDICTION
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    1.0 prediction:    1.0  -- result: correct
Truth:    0.0 prediction:    0.0  -- result: correct
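When the test set is large, scanning a printout like the one above gets tedious. A minimal sketch of an alternative is to collect only the positions of the misses and tally which kind of error occurred. The helper name summarize_misses and the short stand-in lists are ours; in the notebook, test_labels and y_pred would take their place, assuming both can be indexed by position.

```python
from collections import Counter

def summarize_misses(truth, predicted):
    """Return the positions of wrong predictions and a count per (truth, prediction) pair."""
    missed = [i for i, (t, p) in enumerate(zip(truth, predicted)) if t != p]
    error_pairs = Counter((truth[i], predicted[i]) for i in missed)
    return missed, error_pairs

# Stand-in values for illustration only.
truth = [1.0, 1.0, 0.0, 1.0, 0.0]
predicted = [1.0, 0.0, 0.0, 1.0, 1.0]

missed, error_pairs = summarize_misses(truth, predicted)
print(missed)             # [1, 4]
print(dict(error_pairs))  # {(1.0, 0.0): 1, (0.0, 1.0): 1}
```

In the run shown above, eight of the nine misses have a true label of 1.0 and a prediction of 0.0, so a tally like this makes the direction of the errors visible at a glance. If the encoder preserves row order (an assumption), the returned positions could be passed to test.iloc to inspect the raw feature values of the missed observations.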

Feel free to check out our API documentation for more parameter details, and check out our tutorials for more details on model tuning and capabilities!