How would you deploy a Computer Vision model for streaming?

Suppose you’re using Kubernetes and have brainstormed the following options:

  1. A microservice where the model is served as a standard image detection REST API, with another service splitting the video into frames and making one REST call per frame.

  2. Embedding the model in an app that pulls data in from a source and publishes detection events to a ZeroMQ bus.

  3. Serving the model over gRPC.

What considerations would you make for deployment? What are some questions you would ask yourself?

Thanks in advance!


Good questions. I would definitely want to know what the video quality is expected to be. Higher resolution and higher framerate both multiply the amount of data every service has to move and process, so it helps to size bandwidth and compute for them up front.
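To put rough numbers on that, here is a minimal sketch of the uncompressed data rate for a given resolution and framerate. The function name and the 3 bytes per pixel (8-bit RGB) default are my assumptions; real streams are usually compressed, so treat this as an upper bound on what arrives after decoding, not as what goes over the wire.

```python
def raw_bitrate_mbps(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video data rate in megabits per second.

    bytes_per_pixel=3 assumes 8-bit RGB frames (an assumption for illustration).
    """
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


# Compare a low-end and a high-end camera configuration.
for label, (w, h, fps) in {
    "640x480 @ 15 FPS": (640, 480, 15),
    "1920x1080 @ 30 FPS": (1920, 1080, 30),
}.items():
    print(f"{label}: {raw_bitrate_mbps(w, h, fps):.1f} Mbit/s uncompressed")
```

Going from the first configuration to the second is more than a tenfold jump in decoded data, which is the kind of difference that changes whether per-frame REST calls are realistic.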

It would also be good to know whether there is any temporal dependence in the application's processing pipeline, i.e. whether I will need information from previous frames in order to process later frames.
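If the answer is yes, the worker needs per-stream state, which matters for the deployment choice: frames from one stream would have to keep landing on the same worker instead of being sprayed across stateless replicas. A minimal sketch of that state, assuming a fixed-size sliding window (the `FrameWindow` class is hypothetical, just for illustration):

```python
from collections import deque


class FrameWindow:
    """Keeps the most recent `size` frames of a single stream."""

    def __init__(self, size):
        # deque with maxlen drops the oldest frame automatically.
        self._frames = deque(maxlen=size)

    def push(self, frame):
        """Add a frame; return the full window once enough frames have arrived."""
        self._frames.append(frame)
        if len(self._frames) == self._frames.maxlen:
            return list(self._frames)
        return None  # not enough history yet


window = FrameWindow(size=3)
for frame_id in range(5):
    clip = window.push(frame_id)
    if clip is not None:
        print("run temporal model on frames", clip)
```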

Is latency critical to the application? Two examples that come to mind are a crime detection system and an accident alert system in an elderly person’s home. These would definitely require low-latency responses, which would probably also put a limit on the framerate and video resolution.
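One common way to keep latency bounded when inference is slower than the camera (not something from the original question, just a pattern I would consider) is to drop stale frames so the model always works on the newest one. A minimal sketch using a standard-library queue; `put_latest` is a hypothetical helper name:

```python
import queue


def put_latest(q, frame):
    """Enqueue `frame`, discarding the oldest queued frame if the queue is full."""
    while True:
        try:
            q.put_nowait(frame)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the stale frame to make room
            except queue.Empty:
                pass  # a consumer took it in the meantime; just retry


# With maxsize=1 the consumer only ever sees the most recent frame.
frames = queue.Queue(maxsize=1)
for frame_id in range(10):
    put_latest(frames, frame_id)
print("next frame to infer on:", frames.get_nowait())
```

The trade-off is that frames are skipped under load, which is usually acceptable for alerting but not if every frame has to be analysed.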

It may also be important to consider how many cameras will be part of the system. If there is more than one, some sort of distributed computing solution may be needed.
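A back-of-the-envelope way to see when one worker stops being enough is to estimate the number of model replicas from the camera count, the inference rate per camera, and the time per inference. This is a rough sketch; the function name, the example numbers, and the 70% target utilisation are all assumptions for illustration:

```python
import math


def replicas_needed(num_cameras, inference_hz, seconds_per_inference, headroom=0.7):
    """Estimate how many single-threaded model replicas are needed.

    `headroom` is the target utilisation per replica (assumed 70% here) so
    that bursts do not immediately build up a backlog.
    """
    # Seconds of inference work generated per second of wall-clock time.
    load = num_cameras * inference_hz * seconds_per_inference
    return max(1, math.ceil(load / headroom))


# Hypothetical numbers: 20 cameras, 5 inferences/s each, 40 ms per inference.
print(replicas_needed(num_cameras=20, inference_hz=5, seconds_per_inference=0.04))
```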


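When you are using gRPC on Kubernetes, you need to keep in mind one key requirement of gRPC that is not native to Kubernetes: layer 7 load balancing.

gRPC needs a layer 7 load balancer to spread load properly. It multiplexes many requests over a single long-lived HTTP/2 connection, so a connection-level (layer 4) balancer, such as a default Kubernetes Service, pins all of a client's requests to one pod. A layer 7 load balancer works at the application layer and can balance individual requests, but it requires greater integration between applications and services.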
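To make the difference concrete, here is a toy simulation (no real networking, and the pod names are made up) of one client sending many requests over a single long-lived connection. With connection-level balancing the backend is picked once per connection; with request-level balancing it is picked for every request:

```python
from itertools import cycle


def connection_level(backends, clients, requests_per_client):
    """Layer 4 style: a backend is chosen once per client connection."""
    counts = {b: 0 for b in backends}
    picker = cycle(backends)
    for _ in range(clients):
        backend = next(picker)  # chosen when the connection opens
        counts[backend] += requests_per_client  # every request follows it
    return counts


def request_level(backends, clients, requests_per_client):
    """Layer 7 style: a backend is chosen for each individual request."""
    counts = {b: 0 for b in backends}
    picker = cycle(backends)
    for _ in range(clients * requests_per_client):
        counts[next(picker)] += 1
    return counts


pods = ["pod-a", "pod-b", "pod-c"]
print("connection-level:", connection_level(pods, clients=1, requests_per_client=300))
print("request-level:   ", request_level(pods, clients=1, requests_per_client=300))
```

For a video pipeline this is the typical case: a single frame-splitting service holding one connection open, so without request-level balancing the extra model replicas would sit idle.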

There are a few good options, the best of which would be nginx’s gRPC-aware layer 7 load balancing, or Envoy, the proxy that Istio runs as a sidecar in the pods of your application.

If you are planning on using managed Kubernetes, you may find that managed Istio – such as Google Anthos – provides you with the easiest access to layer 7 load balancing.

There are a lot of questions to weigh when considering a move to a service mesh architecture and, for an application this simple, you may not need the observability into your services that Istio or any other service mesh provides. However, managed Istio abstracts away much of this complexity, and might actually give you an easier way to ensure that load balancing works properly.


@lu.riera is asking the right questions.

When I was applying deep learning in real time to robots in ROS, the key questions I asked were:

What framerate do you need to infer at? Is 2 Hz inference enough (1 in 15 frames of a 30 FPS stream)? Or do you need to infer on all 30 frames per second, knowing that in most regular applications the information in consecutive frames is probably very similar (unless you’re talking about high-speed footage, e.g. Coca-Cola bottling plants)?
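The frame-skipping arithmetic is simple enough to sketch. The helper names below are hypothetical, and the frames are just integers standing in for decoded images:

```python
def frame_stride(camera_fps, target_hz):
    """How many frames to advance between inferences (30 FPS at 2 Hz -> 15)."""
    if camera_fps <= 0 or target_hz <= 0:
        raise ValueError("camera_fps and target_hz must be positive")
    # Never go below 1: you cannot infer more often than frames arrive.
    return max(1, round(camera_fps / target_hz))


def sample_frames(frames, stride):
    """Yield (index, frame) for every `stride`-th frame of an iterable."""
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame


stride = frame_stride(camera_fps=30, target_hz=2)
two_seconds_of_video = range(60)  # stand-in for 60 decoded frames
for index, frame in sample_frames(two_seconds_of_video, stride):
    print("infer on frame", index)
```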

I did find that having enough compute at the edge and deploying my model within a ROS node worked pretty well, but to get real speed you have to consider making the model smaller. The downside of the ROS approach was that it made the deployment more complicated (it wasn’t dockerized etc. because of the ROS deployment).