Feature selection is the process of choosing which inputs (or features) are necessary when creating and training a machine learning model. By discovering which features are the most important, the less important ones can be dropped as inputs, reducing computation time, model complexity, and memory usage.
However, this becomes even more important in embedded machine learning, where resources are scarce and a “feature” might mean an entire sensor. Dropping an entire sensor while achieving the same results can mean saving cost, board space, and power!
You can read a written tutorial demonstrating these concepts here: https://www.digikey.com/en/maker/projects/feature-selection-for-embedded-machine-learning/d9b3815901824489af5f46a023e25145
All code for this demonstration can be found here: https://github.com/ShawnHymel/perfect-toast-machine
Feature selection is similar to dimensionality reduction in that both aim to reduce the number of values used as inputs to a machine learning model. Such techniques can reduce the computational complexity of the model. Ideally, we want to reduce the number of inputs while minimizing any accuracy loss that might occur.
While dimensionality reduction usually requires a transformation of the data (thus incurring some computational cost), feature selection lets us determine which inputs we can drop altogether. The difficult part is figuring out which features are unimportant.
Feature selection is a wide (and still active) area of research that includes a large number of techniques. In the video, we focus on two techniques: Pearson correlation coefficient (PCC) for unsupervised feature selection and Least Absolute Shrinkage and Selection Operator (LASSO) for supervised feature selection.
Correlation simply looks at the relationship between each pair of input features. We can use scatter plots to visualize those relationships or calculate a “correlation strength” number, such as the PCC. The PCC only provides a relative indication of linear correlation; it does not take non-linear relationships into account.
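As a quick sketch of the idea (using NumPy with made-up sensor data, not the actual gas sensor readings from the project), you can compute the pairwise PCC matrix and flag feature pairs whose correlation exceeds a threshold as candidates for dropping one of the pair:

```python
import numpy as np

# Hypothetical sensor readings: three features, where f2 is almost
# an exact scaled copy of f0 (i.e., a redundant sensor channel)
rng = np.random.default_rng(42)
f0 = rng.normal(size=100)
f1 = rng.normal(size=100)
f2 = 2.0 * f0 + 0.01 * rng.normal(size=100)

X = np.stack([f0, f1, f2])

# Pairwise Pearson correlation coefficients between features
corr = np.corrcoef(X)
print(np.round(corr, 2))

# A common filter rule: if |PCC| between two features exceeds a
# threshold (e.g., 0.95), one of the pair can likely be dropped
redundant = abs(corr[0, 2]) > 0.95
print(redundant)
```

Because PCC only captures linear relationships, a pair of features with a strong non-linear dependence can still show a PCC near 0, which is why plotting the scatter plots alongside the numbers is worthwhile.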
On the other hand, we can use supervised feature selection techniques to train a model and examine which of the inputs were most important in the decision-making process within the model. LASSO relies on adding an L1 regularization term to the first layer of nodes and then examining the resultant weights of that first layer after training. Larger absolute values indicate higher importance in the decision-making process, while weights closer to 0 indicate that the corresponding features were relatively unimportant.
Once we have figured out which features to keep, we can train the model again using just those features to confirm that we did not lose much accuracy. In the video, I apply these techniques to the perfect toast machine (https://youtu.be/meYZOXQo5mY) and eliminate two of its sensors while achieving the same results!
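Continuing the synthetic example above, the sanity check is just: fit once on all features, fit again on only the kept subset, and compare the scores (here R² for a linear model; the feature indices in `keep` stand in for whatever a prior selection step chose):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))  # four hypothetical sensor channels
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)

# Full model vs. model restricted to the selected features
full = LinearRegression().fit(X, y)
keep = [0, 2]  # indices chosen by a prior feature-selection step
reduced = LinearRegression().fit(X[:, keep], y)

print(round(full.score(X, y), 3))
print(round(reduced.score(X[:, keep], y), 3))
```

If the reduced model's score stays close to the full model's, the dropped features (or sensors) were indeed carrying little information.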
AI Toaster That Makes Perfect Toast Using Smell: https://www.youtube.com/watch?v=meYZOXQo5mY
Using Sensor Fusion and Machine Learning to Create an AI Nose: https://www.youtube.com/watch?v=KyMC0LsLZms
Intro to TinyML Part 1: https://www.youtube.com/watch?v=BzzqYNYOcWc
Intro to TinyML Part 2: https://www.youtube.com/watch?v=dU01M61RW8s
Related Project Links:
How to Build an AI-powered Toaster: https://www.digikey.com/en/maker/projects/how-to-build-an-ai-powered-toaster/2268be5548e74ceca6830bf35f0f0f9e
How to Make an AI-powered Artificial Nose: https://www.digikey.com/en/maker/projects/how-to-make-an-ai-powered-artificialnose/3fcf88a89efa47a1b231c5ad2097716a
What is Edge AI? Machine Learning + IoT: https://www.digikey.com/en/maker/projects/what-is-edge-ai-machine-learning-iot/4f655838138941138aaad62c170827af
Edge-Based Machine Learning Application Development is Getting a Whole Lot Easier: https://www.digikey.com/en/blog/edge-based-machine-learning-application-development