Serving machine learning inference workloads on the cloud is still a challenging task on the production level. Optimal configuration of the inference workload to meet SLA requirements while optimizing the infrastructure costs is highly complicated due to the complex interaction between batch configuration, resource configurations, and variable arrival process. Serverless computing has emerged in recent years to automate most infrastructure management tasks. In this work, we present MLProxy, an adaptive reverse proxy to support efficient machine learning serving workloads on serverless computing systems.
Analytical performance models are very effective in ensuring the quality of service and cost of service deployment remain desirable under different conditions and workloads. While various analytical performance models have been proposed for previous paradigms in cloud computing, serverless computing lacks such models that can provide developers with performance guarantees. In this work, we aim to develop analytical performance models for the latest trend in serverless computing platforms that use concurrency value and the rate of requests per second for autoscaling decisions.
Developing accurate and extendable performance models for serverless platforms, aka Function-as-a-Service (FaaS) platforms, is a very challenging task. Also, implementation and experimentation on real serverless platforms is both costly and time-consuming. However, at the moment, there is no comprehensive simulation tool or framework to be used instead of the real platform. As a result, in this paper, we fill this gap by proposing a simulation platform, called SimFaaS, which assists serverless application developers to develop optimized Function-as-a-Service applications in terms of cost and performance.
In this work, we propose an analytical performance model that is capable of predicting several key performance metrics for serverless workloads using only their average response time for warm and cold requests. The introduced model uses realistic assumptions, which makes it suitable for online analysis of real-world platforms. We validate the proposed model through extensive experimentation on AWS Lambda.