GPU Partitioning: Fair Share Scheduling

Jun 15, 2022 Patrick Fu


Author: Patrick Fu (CEO of Gemini Open Cloud)

GPU computation is asynchronous with respect to the POD itself. Typically, the process running in the POD copies data to GPU memory and issues a CUDA call that tells the GPU to execute the calculation (known as a GPU kernel). When the GPU kernel finishes the computation, a sync request wakes up the POD process, which then copies the computation results back to main memory.
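
The copy, launch, sync, and copy-back sequence described above can be sketched in plain Python. This is only a simulation under stated assumptions: a single-worker thread pool stands in for the GPU, and `vector_add_kernel` and `run_on_gpu` are hypothetical names chosen for illustration. No real CUDA API is called.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: one worker thread stands in for the GPU, which runs one kernel at a time.
gpu = ThreadPoolExecutor(max_workers=1)


def vector_add_kernel(a, b):
    """Hypothetical 'GPU kernel': element-wise addition of two device buffers."""
    return [x + y for x, y in zip(a, b)]


def run_on_gpu(host_a, host_b):
    # 1. Copy the inputs from host memory into (simulated) GPU memory.
    dev_a, dev_b = list(host_a), list(host_b)
    # 2. Launch the kernel. submit() returns immediately, like an asynchronous launch,
    #    so the POD process could do other CPU work at this point.
    pending = gpu.submit(vector_add_kernel, dev_a, dev_b)
    # 3. Synchronize: block until the kernel has finished.
    dev_out = pending.result()
    # 4. Copy the results back to host memory.
    return list(dev_out)


result = run_on_gpu([1, 2, 3], [10, 20, 30])
print(result)
```

The point of the sketch is that the launching process does not control how long step 2 occupies the device; it only learns the kernel is done when step 3 returns.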

GPU kernels are non-preemptible: once launched, they cannot be interrupted. Therefore, even after GPU partitioning, the actual GPU usage of each POD remains unpredictable, which can still lead to underutilization or performance delays. For this reason, a scheduler front end, a device manager, and a scheduler back end need to work together to achieve fair share scheduling.

Figure 5 Fair share scheduling for Gemini GPU Partitioning

Figure 5 shows how the Gemini scheduler achieves fair share scheduling for ML workloads. An event-driven monitoring subsystem (#1) collects GPU utilization for the Device Manager. The Gemini scheduler then calculates, in real time, which POD should be scheduled next. To do so, the scheduler needs to determine two pieces of information:

  1. The POD that is currently furthest away from its target GPU utilization percentage.
  2. The amount of time this POD should be given to run on the target GPU.
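
The two calculations listed above can be sketched as follows. The article does not give the formula Gemini uses for the time slice, so the rule here (a quantum proportional to the POD's relative deficit, with a lower bound) is purely an assumption for illustration, as are the names `PodUsage`, `pick_next`, `base_quantum_ms`, and `min_quantum_ms`.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    target_share: float  # requested fraction of the GPU, between 0 and 1
    used_share: float    # measured fraction over the monitoring window


def pick_next(pods, base_quantum_ms=100.0, min_quantum_ms=10.0):
    """Return (pod name, time slice in ms) for the POD furthest below its target."""
    # Only PODs that are still below their target are candidates.
    eligible = [p for p in pods if p.used_share < p.target_share]
    if not eligible:
        return None
    # 1. The POD with the largest gap between target and measured share.
    pod = max(eligible, key=lambda p: p.target_share - p.used_share)
    # 2. Assumed rule: scale the quantum by the relative deficit, with a floor.
    deficit = pod.target_share - pod.used_share
    quantum_ms = max(min_quantum_ms, base_quantum_ms * deficit / pod.target_share)
    return pod.name, quantum_ms


pods = [
    PodUsage("pod-a", target_share=0.5, used_share=0.2),
    PodUsage("pod-b", target_share=0.3, used_share=0.25),
    PodUsage("pod-c", target_share=0.2, used_share=0.25),  # already over its target
]
choice = pick_next(pods)
print(choice)
```

In this example pod-a has the largest deficit, so it is selected and receives a longer slice than pod-b would, while pod-c is not considered at all.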

This information is encoded in a token and dispatched to the target worker node (#2 and #3). As this process repeats, each POD should get closer to its target GPU quota. If a POD exceeds its quota, its token is revoked (#4) and the POD is no longer eligible to be scheduled.
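
A small simulation can show why repeating the issue, run, and revoke cycle pulls each POD toward its quota. This is a sketch under assumptions, not Gemini's implementation: the `Token` and `FairShareLoop` classes, the fixed 10 ms quantum, and the rule that a POD becomes eligible again once its share falls back to its quota are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    pod: str
    quantum_ms: float
    valid: bool = True


class FairShareLoop:
    def __init__(self, quotas):
        self.quotas = dict(quotas)                    # pod -> target GPU share (0..1)
        self.used_ms = {pod: 0.0 for pod in quotas}   # monitored GPU time per pod

    def share(self, pod):
        total = sum(self.used_ms.values())
        return self.used_ms[pod] / total if total else 0.0

    def issue_token(self, quantum_ms=10.0):
        # PODs above their quota are not eligible for a new token.
        eligible = [p for p in self.quotas if self.share(p) <= self.quotas[p]]
        if not eligible:
            return None
        # Give the token to the POD furthest below its target share.
        pod = max(eligible, key=lambda p: self.quotas[p] - self.share(p))
        return Token(pod, quantum_ms)

    def report_usage(self, token, ran_ms):
        # Monitoring feedback: record the GPU time, then revoke if over quota.
        self.used_ms[token.pod] += ran_ms
        if self.share(token.pod) > self.quotas[token.pod]:
            token.valid = False
        return token.valid


loop = FairShareLoop({"pod-a": 0.6, "pod-b": 0.4})
for _ in range(1000):
    token = loop.issue_token()
    if token is None:
        break
    loop.report_usage(token, ran_ms=token.quantum_ms)

shares = {pod: round(loop.share(pod), 3) for pod in loop.quotas}
print(shares)
```

After enough iterations the measured shares settle close to the configured quotas, which is the convergence behaviour the paragraph above describes.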

We will briefly explain each of these 3 subsystems:

Event-driven Monitoring

Token-based time sharing scheduler

Token Revocation


In summary, we have explained how we customize the default Kubernetes scheduler to allow a physical GPU to be shared by multiple PODs, and how we collect their GPU utilization to dynamically adjust the time slices allocated to the PODs running ML workloads.


Gemini Open Cloud is a CNCF member and a CNCF-certified Kubernetes service provider. With more than ten years of experience in cloud technology, Gemini Open Cloud is an early leader in cloud technology in Taiwan.

We are currently a national-level provider of AI cloud software and Kubernetes technology and services. We are also a partner for many enterprises that are adopting and managing container platforms.

In addition to its existing products, “Gemini AI Console” and “GOC API Gateway”, Gemini Open Cloud provides enterprises with consulting and adoption services for cloud native and Kubernetes-related technologies, helping them embrace Cloud Native and achieve their digital transformation goals.
