Bringing machine learning models to production is a challenging process, and there are several mistakes you can make along the way. Here are three of the biggest ones to avoid, along with solutions to help you overcome them.
Not meeting CPU/Memory requirements
One of the biggest mistakes when bringing machine learning models to production is not adequately considering the CPU and memory requirements of your model. If your model is too complex or requires too much processing power, it may not be able to run on the device or platform you are using.
By deploying the (still untrained) model on the device before training, you can check right away whether it meets the requirements. This is far better than running a costly training only to discover afterwards that the model does not fit.
Especially when you are using an NPU, you might find that some operations are much faster than others. It can be smarter to run, say, a larger vanilla depthwise convolution (which might be accelerated by 20x) rather than your SOTA operator of choice that is not accelerated at all.
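As a minimal sketch, assuming a TensorFlow/TFLite workflow: the snippet below converts a freshly initialized (untrained) model and times a single inference, so size and latency problems surface before any training budget is spent. The architecture is a hypothetical placeholder for your own model; on the actual target you would run the same converted model through the vendor runtime or the TFLite benchmark tool.

```python
# Sketch: convert an untrained Keras model to TFLite and time one inference,
# so device-fit problems show up before any training is done.
# The architecture below is a placeholder, not a recommendation.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.DepthwiseConv2D(3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(16, 1, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Convert with default settings; the weights are still random at this point.
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
print(f"Model size: {len(tflite_model) / 1024:.1f} KiB")

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
dummy = np.random.rand(*inp["shape"]).astype(np.float32)

# Rough latency estimate on the host; repeat this on the real target
# (or its benchmark tool) to get numbers that actually matter.
interpreter.set_tensor(inp["index"], dummy)
start = time.perf_counter()
interpreter.invoke()
print(f"Single inference: {(time.perf_counter() - start) * 1000:.1f} ms")
```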
Unsupported kernels
Another common mistake is not ensuring that all of the kernels in your model are supported on the device or platform you are using. If your model uses unsupported kernels, it will not run correctly, which can be frustrating and time-consuming to fix.
To avoid this mistake, stick to basic operations such as Convolution and Dense when targeting edge accelerators, as these are typically the most widely supported. Support for recurrent kernels such as LSTM/GRU is marginal, especially once 8-bit quantization is used.
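A hedged sketch of how to catch this early, again assuming a TFLite toolchain: attempting a full-integer (int8) conversion on the desktop makes unsupported kernels fail loudly before you ever touch the device. The model and representative dataset here are placeholders; swapping in, say, an LSTM layer is a quick way to see the failure path.

```python
# Sketch: try a full-integer (int8) conversion early, so unsupported kernels
# fail on the desktop instead of on the device.
# The model and representative dataset below are random placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_data():
    # Calibration samples for quantization; random data is only a stand-in.
    for _ in range(10):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Restrict the converter to int8 builtin kernels only; any op that cannot be
# expressed as an int8 builtin makes the conversion fail right here.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

try:
    tflite_int8 = converter.convert()
    print("All kernels map to int8 builtins.")
except Exception as err:  # conversion errors list the offending op(s)
    print(f"Unsupported kernel(s): {err}")
```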
Implementation errors
Lastly, a common mistake when bringing machine learning models to production is not checking whether every kernel of your model produces the correct output on the edge device. If a kernel has an implementation error, your model will not run correctly, which can lead to inaccurate results and poor performance.
To avoid this mistake, verify that every kernel outputs the correct result by comparing the outputs on the edge device against the outputs on your desktop. This will help you identify errors and fix them before deploying your model.
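A minimal sketch of that comparison, using the TFLite interpreter as a stand-in for the device runtime (on a real project the second half would execute on the edge device itself, with the tensors shipped back for comparison):

```python
# Sketch: run identical inputs through the desktop reference model and the
# converted model, then compare the outputs.
# The model is the same placeholder architecture as in the earlier sketches.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.DepthwiseConv2D(3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(16, 1, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

x = np.random.rand(1, 96, 96, 3).astype(np.float32)

# Desktop reference output (float Keras model).
ref = model(x).numpy()

# "Edge" output via the TFLite interpreter, standing in for the device runtime.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
edge = interpreter.get_tensor(out["index"])

print(f"Max abs difference: {np.max(np.abs(ref - edge)):.6f}")
print("Outputs match" if np.allclose(ref, edge, atol=1e-3)
      else "Mismatch - inspect kernels")
```

If the final outputs disagree, you can narrow the problem down to a single kernel by repeating the same check on truncated versions of the model, cutting it off after successive layers until the mismatch appears.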
By watching out for these common mistakes and applying the solutions outlined above, you can avoid many of the challenges of bringing machine learning models to production. When deploying your model to an edge device, carefully assess the requirements and capabilities of your model to ensure it runs smoothly and accurately.