There are many computer vision applications in which the speed at which an image or video is analyzed plays a crucial role. For example, if image analysis is part of a production process, the process can usually only continue once the image has been analyzed, which can result in expensive waiting times. Other applications require the analysis to be so fast that human users, for example, do not notice any delays.
In the following, we have summarized a few examples from the computer vision sector for which resource-efficient deployment is necessary. Optimizing the deployment for resource efficiency achieves two main goals simultaneously:
- Lower model request times: Optimizing the entire deployment setup (from server selection through model optimization for inference to setting the right deployment parameters) can lead to significantly lower request times
- Lower computing-resource requirements: This directly translates into lower server costs
Our toolbox supports the entire deployment process – from hardware selection and optimization of models and deployment configurations to the final deployment – helping you achieve an efficient solution for your use-case.
Quality control / quality assurance
Thanks to digitalization and the introduction of artificial intelligence, more and more processes in companies can be automated. This includes quality control, for example in manufacturing companies.
It is important that the analysis by artificial intelligence is fast enough not to disrupt, slow down, or interrupt production processes. To achieve this, the underlying models must be deployed correctly. This applies not only to the response time of an individual model, but also to the provisioning of computing resources under fluctuating load, for example when the number of quality assurance stations operating simultaneously changes.
Real-time video analysis
The real-time analysis of camera data plays a role in many areas. Recording all raw data from a camera is often too storage-intensive; instead, machine learning models can be used to filter out the relevant moments. Many other applications are also based on a live camera feed.
In this use-case, it is important that the deployment is designed for fast inference. For example, servers must be provisioned so that cold starts are avoided, and the model should be optimized to save server costs and keep inference latency as low as possible.
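One common way to avoid paying the cold-start cost on the first real request is to load the model and run a warm-up inference before the server accepts traffic. The sketch below illustrates the idea with a hypothetical stand-in model (the class name, sleep durations, and `start_server` helper are illustrative assumptions, not part of our toolbox):

```python
import time


class DummyVisionModel:
    """Hypothetical stand-in for a real vision model: loading and the
    first inference are slow, subsequent inferences are fast."""

    def __init__(self):
        time.sleep(0.2)  # simulate loading weights from disk
        self._compiled = False

    def predict(self, frame):
        if not self._compiled:
            time.sleep(0.1)  # simulate one-time kernel compilation
            self._compiled = True
        return "ok"


def start_server():
    """Load and warm up the model *before* accepting traffic, so the
    first real request does not pay the cold-start cost."""
    model = DummyVisionModel()
    model.predict([0] * 16)  # warm-up inference on a dummy frame
    return model


model = start_server()
t0 = time.perf_counter()
model.predict([1] * 16)  # first real request: model is already warm
latency = time.perf_counter() - t0
print(f"request latency: {latency * 1000:.1f} ms")
```

The same principle applies regardless of the serving framework: whatever one-time work the model does (weight loading, graph compilation, cache population) should happen at startup, not on the first user-facing request.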
Applications on mobile devices
Nowadays, many applications run on mobile devices, such as smartphones. Computing resources are particularly limited on mobile devices, which is why machine learning computations are often offloaded to a server. To ensure that users do not experience annoying delays, it is important that the model execution is as fast as possible.
Different machine learning models usually have to be used in different scenarios, and it is often not clear which model will be requested next. It is therefore a good strategy to deploy several models on one server so that they share resources and every model is ready to respond quickly. This is especially important for user-facing or real-time applications, where fast model response times are a necessity.
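The co-location strategy above can be sketched as a small registry that loads each model once and then keeps it resident in the shared process, so any of the deployed models can answer subsequent requests without a reload. The model names and loader functions below are hypothetical examples, not part of our toolbox:

```python
import threading


class ModelRegistry:
    """Keeps several models resident in one process so they share the
    server's resources and each stays ready to respond quickly."""

    def __init__(self, loaders):
        self._loaders = loaders  # model name -> zero-arg loader function
        self._models = {}
        self._lock = threading.Lock()

    def get(self, name):
        # Load on first use, then reuse the resident instance;
        # the lock makes concurrent first requests safe.
        with self._lock:
            if name not in self._models:
                self._models[name] = self._loaders[name]()
            return self._models[name]


# Hypothetical loaders; in practice these would load real weights.
registry = ModelRegistry({
    "detector":   lambda: (lambda frame: "boxes"),
    "classifier": lambda: (lambda frame: "label"),
})

print(registry.get("detector")(None))    # -> boxes
print(registry.get("classifier")(None))  # -> label
```

Lazy loading keeps startup cheap while ensuring that, once a model has been requested, it stays warm for all later requests; an eager variant would simply call every loader at startup.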
Deployment done right
Deploying artificial intelligence / machine learning models the right way is easy with our tools. Just define your models in our web-app, find the right server architecture, and deploy your models with a single click, on-premises or in the cloud. Our all-in-one solution eliminates the hassle, optimizing both your models and deployment configurations to deliver maximum inference speed while minimizing server costs.