In this guide, we’ll share our best practices for setting up your NPU so that it functions well and so that subsequent hyperparameter searches can be narrowed to appropriate ranges.
Before you create initial settings for your NPU, you’ll want to assess the following about your data and encoding:
- The number of observations and features in your dataset
- The sparsity that you chose for your data encoding
- How “complicated” the problem is to solve—think class separability, informative features, etc.
These factors, which should come out of your exploratory analysis and your encoding process, will help ensure that you provide adequate model resources without overfitting.
The guidance below isn't intended to produce an optimal model on a first pass, but rather to establish realistic ranges of values that you can then search during tuning.
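As a concrete starting point, here’s a minimal Python sketch for gathering those quantities, assuming your raw features and labels live in NumPy arrays `X` and `y` and your encoded iSDRs in a binary array `X_encoded`. The helper name and the silhouette-score separability proxy are our illustrative choices, not part of the NIML API.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def summarize_dataset(X, y, X_encoded):
    """Illustrative helper: gather the quantities used to size the NPU.

    X          -- raw feature matrix, shape (n_observations, n_features)
    y          -- class labels, shape (n_observations,)
    X_encoded  -- binary iSDR matrix produced by your encoder
    """
    n_obs, n_features = X.shape
    n_classes = len(np.unique(y))
    # Fraction of active (set) bits in the encoding -- the encoder sparsity.
    encoding_sparsity = X_encoded.mean()
    # Rough separability proxy; values near 1 indicate well-separated classes.
    separability = silhouette_score(X, y)
    return {
        "n_observations": n_obs,
        "n_features": n_features,
        "n_classes": n_classes,
        "encoding_sparsity": encoding_sparsity,
        "separability": separability,
    }
```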
Step 1: Decide how many neurons should be in your NPU
Typically, between 500 and 2,000 neurons are required for a successful model. The specific choice for your data should be based on the following considerations (a rough sizing sketch follows the list):
- The more separable your data, the fewer neurons you’ll need. Separability can be assessed by computing clustering quality metrics or by visually inspecting histograms of each class.
- The more classes you have, the more neurons you’ll need. Sufficient neurons need to be available to learn each distinct class.
- The smaller your dataset, the fewer neurons you should use to avoid overfitting. For most models, you will want fewer neurons than you have observations.
- The maximum number of neurons that can be run with hardware acceleration is 2,000.
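Putting these rules of thumb together, here’s an illustrative (and deliberately rough) way to pick a starting neuron count. The class multiplier and the separability threshold are assumptions you should adjust for your own data.

```python
import numpy as np

def suggest_neuron_count(n_observations, n_classes, separability):
    """Rough starting NPU size based on the rules of thumb above."""
    base = 100 * n_classes        # more classes -> more neurons (illustrative multiplier)
    if separability > 0.5:        # well-separated data (e.g. a high silhouette score)
        base = int(base * 0.5)    # separable classes need fewer neurons
    # Stay in the 500-2,000 range; 2,000 is also the hardware-acceleration cap.
    neurons = int(np.clip(base, 500, 2000))
    # Avoid overfitting small datasets: keep fewer neurons than observations.
    return min(neurons, max(1, n_observations - 1))
```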
Step 2: Decide how many neurons should activate for each observation
Active neurons should represent those neurons that are naturally most similar to a given input pattern. These neurons will then specifically "learn" that input pattern even more precisely, allowing them to specialize.
In general, this should be 1-10% of the total number of neurons in the NPU. If only 500 neurons are used, 10 active neurons may be appropriate for a simple dataset; if 2,000 are used, 25 may be a better number. Again, consider the following:
- How many classes are there in your data?
- How complex are the classes in your data? Again, the more separable your classes, the fewer active neurons you’ll need.
Each neuron learns different components of the patterns in your dataset. If you select very few active neurons, only the most basic, dominant patterns will be identified and learned. If you select many active neurons, some of them will contribute little to no value because they won't naturally align well with the input.
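As a starting point, you might translate the 1-10% guideline into a small helper like the one below. The exact fractions are illustrative assumptions, so expect to adjust them during tuning.

```python
def suggest_active_neurons(n_neurons, separability):
    """Pick a starting number of active neurons in the 1-10% band."""
    # Simple, well-separated data sits near the low end of the band;
    # harder, less separable data gets a larger fraction.
    fraction = 0.02 if separability > 0.5 else 0.05
    return max(1, int(n_neurons * fraction))
```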
Step 3: Set the neuron input percentage
Neurons in the NIML model rely on strengthening and weakening synaptic connections to determine when they should fire—just like in the human brain’s Hebbian learning processes. The input percentage controls the proportion of locations in each neuron where synaptic connections are allowed to grow.
To tune this, refer back to the encoder sparsity that you selected when making your iSDRs. If you initialized your encoder with a sparsity value in the recommended 5-15% range, an input percentage between 50-70% should be appropriate. If your sparsity is higher, say 30%, you’ll want a lower input percentage, and vice versa.
A higher input percentage allows neurons to learn very specific patterns from the inputs, but those patterns may not generalize well, while a low input percentage may not capture much of the nuance present in each class.
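One way to encode this inverse relationship as a starting value is sketched below. The linear interpolation and the 40% fallback for very dense encodings are our assumptions, not NIML-specified values.

```python
def suggest_input_pct(encoding_sparsity):
    """Map encoder sparsity to a starting neuron input percentage."""
    if encoding_sparsity <= 0.05:
        return 0.70                  # very sparse encodings -> high input percentage
    if encoding_sparsity <= 0.15:
        # Interpolate from 70% down to 50% across the recommended 5-15% sparsity band.
        return 0.70 - 0.20 * (encoding_sparsity - 0.05) / 0.10
    return 0.40                      # denser encodings (e.g. ~30%) -> lower input percentage
```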
Step 4: Determine the learning rates
Synaptic connection strengths are incremented (strengthened) or decremented (weakened) to help neurons specialize to specific patterns present in the input data. The increment learning rate is applied to synaptic connections where similarity between the iSDR and the neuron is found, and the decrement learning rate is applied where no similarity is found.
Typically, your increment rate should be about 3 times that of your decrement rate to account for the less frequent detection of similarities.
The actual values of these rates should be determined by the size of your dataset. For a small dataset, say 400 observations with relatively little noise, larger rates are appropriate, such as a decrement/increment pair of 8 and 24. For a larger dataset, say 10,000 observations with more noise, a 1/3 or 2/6 pair will work better.
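A starting pair that follows this guidance might look like the sketch below; the 1,000-observation threshold is an assumption interpolated from the two examples above.

```python
def suggest_learning_rates(n_observations):
    """Return an (increment, decrement) pair keeping the ~3:1 ratio."""
    if n_observations <= 1000:    # small, relatively clean dataset (e.g. ~400 observations)
        decrement, increment = 8, 24
    else:                         # larger, noisier dataset (e.g. ~10,000 observations)
        decrement, increment = 2, 6
    return increment, decrement
```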
Step 5: Set final parameters
Number of Epochs: The NIML model doesn't require a large number of epochs to reach a fully trained state, and 10 epochs is sufficient for most problems. With a large dataset (10k observations or more), only 5-7 epochs are needed for the model to learn the different input patterns. For smaller datasets (under 1,000), increasing the number of epochs to 15 can help ensure that each pattern has been examined by the NPU. Keep in mind that other parameters, such as the NPU size and synaptic learning rates, should be adjusted to align with dataset size before the number of epochs is changed.
Batch size: The batch size default is 1 and is appropriate for most models, but for larger datasets, batch sizes up to 30 may show some improvements.
Boosting: Boosting is a mechanism designed to keep the model from converging too quickly to a non-optimal solution. Boosting parameters should typically be set to the following defaults:
- Boost Max: 0.25
- Boost Strength: 0.05
- Boost Frequency: 3
A deeper discussion of boosting parameters will be posted soon with suggestions of when and how these may be adjusted for more extreme problems, so stay tuned for that!
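Pulling the previous steps together, a first-pass configuration might be assembled like this. Apart from active_neurons and input_pct (which appear again in Step 6), the key names here are placeholders we've chosen for illustration, so map them onto whatever your NPU setup code actually expects.

```python
# Reuses the illustrative helpers sketched in the earlier steps, plus the
# placeholder arrays X, y, and X_encoded from the dataset summary.
stats = summarize_dataset(X, y, X_encoded)
n_neurons = suggest_neuron_count(stats["n_observations"],
                                 stats["n_classes"],
                                 stats["separability"])
increment, decrement = suggest_learning_rates(stats["n_observations"])

npu_config = {
    "neurons": n_neurons,
    "active_neurons": suggest_active_neurons(n_neurons, stats["separability"]),
    "input_pct": suggest_input_pct(stats["encoding_sparsity"]),
    "increment_rate": increment,
    "decrement_rate": decrement,
    "epochs": 10,              # 5-7 for 10k+ observations, up to 15 for under 1,000
    "batch_size": 1,           # default; up to 30 may help on larger datasets
    "boost_max": 0.25,         # boosting defaults listed above
    "boost_strength": 0.05,
    "boost_frequency": 3,
}
```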
Step 6: Model tuning
After parameterizing your model and running it for the first time, you're ready to start fine-tuning. We typically do a few iterations of manual tuning and then attempt to optimize further with parameter searches.
Certain parameters tend to have a much larger effect on model performance than others, so focusing on them is the fastest way to improve your model. We advise focusing specifically on set_bits, sparsity, active_neurons, and input_pct. Changing the set bits and sparsity will help you home in on the best encoding for your specific data, while adjusting the active neurons and input percentage will help increase the entropy of the model, leading to higher accuracy. Other parameters can certainly help beyond this, but these four are a very good start.
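If you then want to automate part of that search, a simple grid over those four parameters is a reasonable place to start. The grid values below are only examples, and `train_and_score` stands in for whatever encode/train/evaluate loop you already have, so treat this as a sketch rather than a recipe.

```python
from itertools import product

# Example search grid over the highest-impact parameters.
search_space = {
    "set_bits":       [10, 15, 20],
    "sparsity":       [0.05, 0.10, 0.15],
    "active_neurons": [10, 25, 50],
    "input_pct":      [0.50, 0.60, 0.70],
}

best_score, best_params = float("-inf"), None
for values in product(*search_space.values()):
    params = {**npu_config, **dict(zip(search_space.keys(), values))}
    score = train_and_score(params)   # placeholder for your own train/evaluate loop
    if score > best_score:
        best_score, best_params = score, params

print("Best parameters found:", best_params, "with score", best_score)
```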