Feature Selection for Embedded Machine Learning | Digi-Key Electronics

Feature selection is the process of choosing which inputs (or features) are necessary when creating and training a machine learning model. By discovering which features are the most important, the less important ones can be dropped as inputs, which reduces computation time, model complexity, and memory usage.

However, this becomes even more important in embedded machine learning, where resources are scarce and a “feature” might mean an entire sensor. Dropping an entire sensor to achieve the same results can mean saving costs, board space, and power!

You can read a written tutorial demonstrating these concepts here:

All code for this demonstration can be found here:

Feature selection is similar to dimensionality reduction where we try to reduce the number of values used as inputs to a machine learning model. Such techniques can reduce the computational complexity of the model. Ideally, we want to reduce the number of inputs while minimizing any accuracy loss that might occur.

While dimensionality reduction usually requires a transformation of the data (thus incurring some computational cost), feature selection allows us to determine which inputs we can drop altogether. The difficult part is figuring out which features are unimportant.

Feature selection is a wide (and still active) area of research that includes a large number of techniques. In the video, we focus on two techniques: Pearson correlation coefficient (PCC) for unsupervised feature selection and Least Absolute Shrinkage and Selection Operator (LASSO) for supervised feature selection.

Correlation simply looks at the relationship between each pair of input features. We can use scatter plots to visualize those relationships or calculate a “correlation strength” number, such as the PCC, which ranges from -1 to 1. Note that the PCC measures only linear correlation; it does not capture non-linear relationships.
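As an illustration, here is a minimal sketch of pairwise PCC computation with NumPy on made-up sensor data (the variable names and data are hypothetical, not the dataset from the video):

```python
import numpy as np

# Hypothetical sensor readings: temperature, humidity, and a third channel
# that is almost a linear function of temperature (i.e., redundant).
rng = np.random.default_rng(42)
n = 200
temperature = rng.normal(25.0, 3.0, n)
humidity = rng.normal(50.0, 10.0, n)
redundant = 1.8 * temperature + rng.normal(0.0, 0.5, n)

features = np.stack([temperature, humidity, redundant], axis=0)

# np.corrcoef returns the matrix of pairwise Pearson correlation
# coefficients; entries near +/-1 flag strongly correlated feature pairs.
pcc = np.corrcoef(features)
print(np.round(pcc, 2))
```

A |PCC| close to 1 between the temperature and redundant channels suggests that one of the two could be dropped with little information loss, while humidity remains uncorrelated with both.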

On the other hand, we can use supervised feature selection techniques to train a model and examine which of the inputs were most important in the decision-making process within the model. LASSO relies on adding an L1 regularization term to the first layer of nodes and then examining the resultant weights of that first layer after training. Larger absolute values indicate higher importance in the decision-making process, while weights close to 0 indicate inputs that were relatively unimportant.

Once we have figured out which features we want to keep, we can retrain the model using just those features to verify that we did not lose much accuracy. In the video, I apply these techniques to the perfect toast machine to eliminate two of the sensors while achieving the same results!
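The retrain-and-compare step can be sketched in a few lines. This toy example uses synthetic data and ordinary least squares (not the toast-machine dataset or model): fit once with all features, once with only the selected subset, and check that the error barely changes.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
# Hypothetical data: only the first two features drive the target.
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.1, 300)

def fit_mse(X, y):
    # Ordinary least squares, then report the mean squared error of the fit.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

mse_all = fit_mse(X, y)
mse_selected = fit_mse(X[:, [0, 1]], y)  # keep only the chosen features
print(mse_all, mse_selected)
```

If the two error values are close, the dropped features (or sensors) were indeed unnecessary.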

Product Links:

Related Videos:
AI Toaster That Makes Perfect Toast Using Smell:
Using Sensor Fusion and Machine Learning to Create an AI Nose:
Intro to TinyML Part 1:
Intro to TinyML Part 2:

Related Project Links:
How to Build an AI-powered Toaster:
How to Make an AI-powered Artificial Nose:

Related Articles:
What is Edge AI? Machine Learning + IoT:
Edge-Based Machine Learning Application Development is Getting a Whole Lot Easier:
