Resource-efficient deployment matters for many use cases, and its benefits can be viewed from two sides. On the one hand, deploying ML/AI models in a resource-efficient way lowers server costs. On the other hand, models optimized for deployment run inference faster. Together, this reduces both operational cost and request latency.
The following pages give an overview of which use cases can profit from deploying models in a resource-efficient way.