Imagine a puzzle game similar to Tetris with pieces rapidly falling onto a stack. Some fit perfectly. Others don’t. The goal is to pack the blocks as tightly and efficiently as possible. This game is a loose analogy to the challenge faced by cloud data centers several times every second as they try to allocate processing jobs (called virtual machines or VMs) as efficiently as possible. But in this case, the “pieces” (or VMs) appear and disappear, some with a lifespan of only minutes and others lasting days. Even though VM lifespans are initially unknown, we still want to fill as much of each physical server as possible with these VMs for the sake of efficiency. If we knew even the approximate lifespan of each VM, we could allocate far more effectively.
At the scale of large data centers, efficient resource use is especially critical for both economic and environmental reasons. Poor VM allocation can lead to “resource stranding”, where a server’s remaining resources are too small or unbalanced to host new VMs, effectively wasting capacity. Poor VM allocation also reduces the number of “empty hosts”, which are essential for tasks like system updates and provisioning large, resource-intensive VMs.
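To make “resource stranding” concrete, here is a minimal Python sketch. The `Shape` type, the `VM_SHAPES` catalog, and the helper names are hypothetical and chosen only for illustration; they are not taken from any production scheduler. The sketch treats a host's free capacity as stranded when something is left over but no VM shape in the catalog fits into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """CPU and memory, used both for VM requests and for a host's free capacity."""
    cpus: int
    memory_gb: int


def fits(free: Shape, vm: Shape) -> bool:
    """A VM fits only if every resource dimension fits."""
    return vm.cpus <= free.cpus and vm.memory_gb <= free.memory_gb


def is_stranded(free: Shape, vm_shapes: list[Shape]) -> bool:
    """Free capacity is stranded if some is left but no VM shape can use it."""
    has_free = free.cpus > 0 or free.memory_gb > 0
    return has_free and not any(fits(free, vm) for vm in vm_shapes)


# Hypothetical catalog of VM shapes offered to customers (an assumption).
VM_SHAPES = [Shape(2, 8), Shape(4, 16), Shape(8, 32)]

# Six free CPUs but only 4 GB of memory: unbalanced, so nothing fits.
print(is_stranded(Shape(6, 4), VM_SHAPES))
# Two free CPUs and 8 GB: the smallest shape still fits.
print(is_stranded(Shape(2, 8), VM_SHAPES))
```

The first host illustrates the unbalanced case: its six idle CPUs cannot be sold because memory ran out first.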
VM allocation is a classic bin packing problem, made more complex by incomplete information about VM behavior, in particular how long each VM will run. AI can help by using learned models to predict VM lifetimes. However, such approaches often rely on a single prediction made at the VM’s creation. The challenge is that a single misprediction can tie up an entire host for an extended period, degrading efficiency.
In “LAVA: Lifetime-Aware VM Allocation with Learned Distributions and Adaptation to Mispredictions”, we introduce a trio of algorithms designed to solve the bin packing problem of efficiently fitting VMs onto physical servers: non-invasive lifetime-aware scoring (NILAS), lifetime-aware VM allocation (LAVA), and lifetime-aware rescheduling (LARS). The system uses a process we call “continuous reprediction”: rather than relying on the initial, one-time guess of a VM’s lifespan made at its creation, the model constantly and automatically updates its prediction of the VM’s expected remaining lifetime as the VM continues to run.
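The idea behind continuous reprediction can be illustrated with a small Python sketch. This is not the learned model from the paper: as a stand-in assumption, it uses an empirical sample of past VM lifetimes (the hypothetical `history` list) and computes the conditional expectation E[T − t | T > t], that is, the expected remaining lifetime of a VM that has already run for t hours. The function name and the data are invented for illustration.

```python
from bisect import bisect_right


def expected_remaining_lifetime(lifetimes_hours, uptime_hours):
    """Estimate E[T - t | T > t] from an empirical sample of lifetimes."""
    samples = sorted(lifetimes_hours)
    # Keep only the lifetimes that outlast the uptime observed so far.
    survivors = samples[bisect_right(samples, uptime_hours):]
    if not survivors:
        # The VM has outlived every sample, so the sample has nothing left to say.
        return 0.0
    return sum(survivors) / len(survivors) - uptime_hours


# Hypothetical history (hours): mostly short-lived VMs, a few long-running ones.
history = [0.1, 0.2, 0.5, 1.0, 2.0, 24.0, 72.0, 168.0]

# Re-evaluate the estimate as the same VM keeps running.
for uptime in (0.0, 3.0, 100.0):
    print(uptime, expected_remaining_lifetime(history, uptime))
```

With this sample, the estimate for a VM that has survived its first three hours is higher than the estimate made at creation, because surviving rules out the many short-lived cases. A one-time prediction at creation cannot capture that shift, which is the motivation for updating the estimate while the VM runs.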