4

I have done mostly machine learning with big data: GPUs on EC2 VMs, Kubernetes clusters, etc. But this new assignment is on the other end of the scale.

Basically, it is a time-series forecasting and regression problem with some body signals. The problem itself is simple enough. I developed some moderate-sized transformer/LSTM models using frameworks like TensorFlow and Darts. But the deployment constraint says

the model has to be deployed on a low power proprietary wearable device with <50 ms latency

So the questions are

  • What kinds of frameworks or programming languages do I need to support this?
  • I am not an Android/iOS developer either. Suppose I wrap my model inference (using TensorFlow) inside a simple Python function that takes some features as arguments and spits out a prediction. Can I assume the device/firmware engineer can invoke that Python function to consume my model? Or could I run a Dockerised service inside the device that serves inference on a port (this is what I have done previously in a big-data context)?
  • What kind of interface can I use to retrain and update the model regularly? Again, the challenge is in pushing the model to the end device.
  • Or, in some scenarios, may I assume that the device stays internet connected, hence can just make an HTTP request to a server?

I am not sure if I am asking the right questions here, as it is obviously a big context switch from my previous setups using big cloud services for inference. So any help, resources, and standard practices will be greatly appreciated.

Della
  • 169

3 Answers

6

But the deployment constraint says

the model has to be deployed on a low power proprietary wearable device with <50 ms latency

My biggest piece of advice is to get more information about this wearable device. How much RAM does it have, and how much of it are you allowed to use? How much storage (flash)? What operating-system facilities does it have?

Based on the stated deployment constraint, I would not be surprised if the device uses an embedded OS that does not support the concept of executable files. If so, you can forget about using Python or any kind of containerized deployment. You can create and train your model in TensorFlow, but you will need a specialized consumer of the model, most likely written in C or C++.

6

I'm making a wild guess that "low power proprietary wearable" means a microcontroller, and probably an ARM-based one. The good news is that there are libraries like LiteRT (formerly known as TensorFlow Lite) that are freely available and not difficult to integrate. Even better, you can probably export the model as a .tflite file, hand it to a C++ developer, and have them integrate the library. The challenge is that it's your task to make the model small enough to run on the target device within its time and memory constraints.
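To give a sense of where the size reduction comes from: post-training quantization (what the LiteRT converter can apply for you) maps float32 weights to int8 with a per-tensor scale and zero point, so each weight takes 1 byte instead of 4. Below is a stdlib-only sketch of the affine scheme, purely conceptual — the real converter does this per tensor or per channel and much more carefully:

```python
# Affine (asymmetric) int8 quantization: r ≈ scale * (q - zero_point).
# Conceptual sketch only; use the LiteRT converter for real models.

def quantize(weights, num_bits=8):
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # -128..127
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # representable range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (qi - zero_point) for qi in q]

weights = [-0.51, 0.0, 0.27, 1.03, -0.98]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)

# ~4x smaller storage, at the cost of a small rounding error per weight:
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error stays below one quantization step, which is usually tolerable for inference; if accuracy drops too much, quantization-aware training is the next step up.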

This kind of device wouldn't necessarily update itself. Instead, there is usually a companion app running on a mobile phone or desktop computer that handles downloading new weights and deploying them to the device.
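A minimal sketch of what that companion-app update path might look like, assuming the usual pattern of verifying an integrity digest before handing the blob to the device's transport. Every name here is illustrative, not a real vendor API:

```python
import hashlib

# Hypothetical update flow: download a new .tflite blob, check it against a
# server-published SHA-256 digest, and only then push it to the device over
# whatever transport the vendor provides (BLE, USB, ...).

def verify_model_blob(blob: bytes, expected_sha256: str) -> bool:
    """Return True if the downloaded weights match the published digest."""
    return hashlib.sha256(blob).hexdigest() == expected_sha256

def push_if_valid(blob: bytes, expected_sha256: str, send_to_device) -> bool:
    if not verify_model_blob(blob, expected_sha256):
        return False          # refuse to flash a corrupted/tampered model
    send_to_device(blob)      # vendor-specific transfer, e.g. over BLE
    return True

# Example with a fake "download" and a stub transport:
blob = b"\x1c\x00\x00\x00TFL3" + b"\x00" * 64   # stand-in for a .tflite file
digest = hashlib.sha256(blob).hexdigest()
sent = []
ok = push_if_valid(blob, digest, sent.append)
```

The device-side half (accepting the blob, writing it to flash, swapping it in atomically) is firmware territory and belongs to the device team.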

ojs
  • 208
3

the model has to be deployed on a low power proprietary wearable device with <50 ms latency

Those are performance constraints. They have nothing to do with frameworks or programming languages. What they constrain is your model itself: how many nodes it can have and how deeply they can be layered.

A key thing to find out here is whether the device has anything like a GPU capable of parallel processing, or whether you're stuck with a CPU. Also find out how much memory is left for your code and model once the device is running its operating system.

may I assume that the device stays internet connected, hence can just make an HTTP request to a server?

Well sure, but then you're not deploying the model to a wearable device; you're deploying it to a data center. And as @pjc50 points out, 50 ms latency might be ambitious:

Latency      Medium
0-10 ms      T1
5-40 ms      cable internet
10-70 ms     DSL
100-220 ms   dial-up

pingplotter.com

Also, most wearables are not wired. Adding Wi-Fi isn't going to help with the latency.

candied_orange
  • 119,268