GPU Partitioning: Fair Share Scheduling

Jun 15, 2022 Patrick Fu


Author: Patrick Fu (CEO of Gemini Open Cloud)

GPU computation is asynchronous with respect to the POD itself. Typically, the process running in the POD copies data into GPU memory and issues a CUDA call that tells the GPU to execute the computation (known as a GPU kernel). When the GPU kernel finishes the computation, it issues a sync request to wake up the POD, which then copies the results back to main memory.
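
To make this launch-and-sync flow concrete, here is a minimal CUDA sketch (the kernel, buffer size, and scale factor are illustrative only, not Gemini code): the host copies data to device memory, launches the kernel asynchronously, and later blocks at a synchronization point before copying the results back.

```cuda
// Minimal sketch of the copy / launch / sync pattern described above.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // the actual "GPU kernel" work
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // copy data into GPU memory

    // The kernel launch is asynchronous: the CPU thread (the POD's process)
    // returns immediately while the GPU executes the kernel.
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);

    // The process blocks here until the kernel finishes (the "sync" point),
    // then copies the results back to main memory.
    cudaDeviceSynchronize();
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);
    cudaFree(dev);
    free(host);
    return 0;
}
```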

GPU kernels are non-preemptible: once launched, they cannot be interrupted. Therefore, even after GPU partitioning, the actual amount of GPU time each POD consumes is still unpredictable, which may still lead to underutilization or performance delays. For this reason, we need a collaboration between a scheduler front end, a device manager, and a scheduler back end to achieve fair share scheduling.

Figure 5 Fair share scheduling for Gemini GPU Partitioning

Figure 5 shows how the Gemini scheduler achieves fair share scheduling for ML workloads. It consists of an event-driven monitoring subsystem (#1) that collects GPU utilization for the Device Manager. The Gemini scheduler calculates, in real time, which POD should be scheduled next. This calculation needs two pieces of information (a simplified sketch follows the list):

  1. The POD that is currently furthest away from its target GPU utilization.
  2. The amount of time this POD should be given to run on the target GPU.
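
The following C++ sketch illustrates one way this selection step could look; the data structure, the base quantum, and the deficit-scaling rule are assumptions for illustration, not Gemini's actual scheduler code.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct PodUsage {
    std::string name;
    double targetShare;    // requested fraction of the GPU, e.g. 0.25
    double observedShare;  // fraction measured by the monitoring subsystem
};

// Hypothetical base time slice; the real scheduler's value is not documented here.
constexpr double kBaseQuantumMs = 100.0;

// Pick the POD furthest below its target share and size its time slice
// in proportion to the deficit. Returns nullptr if no POD is below target.
const PodUsage *pickNextPod(const std::vector<PodUsage> &pods, double *quantumMs) {
    const PodUsage *best = nullptr;
    double bestDeficit = 0.0;
    for (const auto &p : pods) {
        double deficit = p.targetShare - p.observedShare;  // distance from target
        if (deficit > bestDeficit) {
            bestDeficit = deficit;
            best = &p;
        }
    }
    if (best != nullptr)
        *quantumMs = kBaseQuantumMs * (1.0 + bestDeficit / best->targetShare);
    return best;
}

int main() {
    std::vector<PodUsage> pods = {
        {"pod-a", 0.50, 0.55},   // already over its target
        {"pod-b", 0.30, 0.10},   // furthest below its target
        {"pod-c", 0.20, 0.18},
    };
    double quantumMs = 0.0;
    if (const PodUsage *next = pickNextPod(pods, &quantumMs))
        std::printf("schedule %s for %.0f ms\n", next->name.c_str(), quantumMs);
    return 0;
}
```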

This information is encoded in a token and dispatched to the target worker node (#2 & #3). As this process repeats, each POD should converge toward its target GPU quota. If a POD exceeds its quota, its token is revoked (#4) and the POD becomes ineligible to be scheduled.
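
As a rough illustration, such a token could carry the POD, the target GPU, and the granted time slice, and the node-side agent would run the POD's kernels only while the token is valid. The field names and validity rule below are assumptions, not Gemini's actual token format.

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical token layout for illustration only.
struct Token {
    std::string podName;
    std::string gpuID;
    std::chrono::steady_clock::time_point issuedAt;
    std::chrono::milliseconds quantum;  // how long the POD may run on the GPU
    bool revoked = false;               // set by the scheduler (#4) when the POD exceeds its quota
};

// The worker node honors the token only while it is unrevoked and unexpired.
bool tokenValid(const Token &t) {
    auto elapsed = std::chrono::steady_clock::now() - t.issuedAt;
    return !t.revoked && elapsed < t.quantum;
}

int main() {
    Token t{"pod-b", "gpu-0", std::chrono::steady_clock::now(),
            std::chrono::milliseconds(150)};
    std::printf("token valid: %s\n", tokenValid(t) ? "yes" : "no");
    return 0;
}
```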

We will briefly explain each of these 3 subsystems:

Event-driven Monitoring

Token-based time-sharing scheduler

Token Revocation

Summary

In summary, we have explained how we customize the default Kubernetes scheduler to allow a physical GPU to be shared by multiple PODs, and how we collect their GPU utilization to dynamically adjust the time slices we allocate to PODs running ML workloads.



Gemini Open Cloud is a CNCF member and a CNCF-certified Kubernetes service provider. With more than ten years of experience in cloud technology, Gemini Open Cloud is an early leader in the field in Taiwan.

We are currently a national-level AI cloud software and Kubernetes technology and service provider, and the best partner for enterprises to adopt and manage container platforms.

In addition to our existing products, “Gemini AI Console” and “GOC API Gateway”, Gemini Open Cloud also provides enterprises with consulting and adoption services for cloud native and Kubernetes-related technologies, helping them embrace Cloud Native and achieve the goal of digital transformation.

