Prioritize researchers' workloads on their nodes

Context

This project aims to incentivize researchers to add their machines to our cluster. We are designing a prioritization mechanism that gives researchers priority on their own machines, while non-priority users can opt in to that extra capacity at the risk of preemption.

Assumptions

  1. External nodes prioritized for a specific group do NOT add general capacity to our HSRN cluster; they only add spot capacity.

Requirements

  1. All external nodes need to be tainted (see the example below).
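
A minimal sketch of what such a taint could look like, assuming a hypothetical taint key hsrn.nyu.edu/priority-group whose value names the owning group; the real key, value scheme, and effect are still to be decided:

```yaml
# Taint an external node so that only pods tolerating the group's taint can land on it.
# Equivalent imperative form:
#   kubectl taint nodes external-node-01 hsrn.nyu.edu/priority-group=research-lab-a:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: external-node-01            # placeholder node name
spec:
  taints:
    - key: hsrn.nyu.edu/priority-group
      value: research-lab-a         # placeholder group identifier
      effect: NoSchedule
```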

Pre-work

  • Research what Kyverno is capable of; we may not need to develop an admission webhook ourselves.

Implementation

  • Create a new PriorityClass with a value lower than the globalDefault => spot-low-priority (see the sketch after this list)
  • Create a namespaced validate Policy (see the sketch after this list)
  • Create a namespaced mutate Policy (see the sketch after this list)
  • Test the policies
  • Extend the rules to pod controllers (Deployments, Jobs, etc.)
  • Set up CI with kyverno test, https://kyverno.io/docs/testing-policies/ (see the example after this list)
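
A possible shape for the spot PriorityClass from the first item above. The numeric value is a placeholder; it only needs to be lower than the value of whichever PriorityClass is currently marked globalDefault, and preemptionPolicy: Never is an optional extra so that spot workloads never preempt anything:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: spot-low-priority
value: -100                      # placeholder; must be lower than the globalDefault class's value
globalDefault: false
preemptionPolicy: Never          # optional: spot workloads should not preempt other pods
description: "Opt-in class for workloads that accept preemption on external (spot) nodes."
```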
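
A sketch of the namespaced validate Policy, assuming the hypothetical taint key hsrn.nyu.edu/priority-group and a hypothetical priority namespace research-lab-a. It only pins the shape of the external-node toleration; the priority/namespace cross-checks from the acceptance criteria would be added as further rules:

```yaml
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: restrict-external-node-toleration
  namespace: research-lab-a           # hypothetical priority namespace
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-group-toleration
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Tolerations for hsrn.nyu.edu/priority-group in this namespace must use
          operator Equal and this group's own value.
        pattern:
          spec:
            # =() anchor: only checked if the pod declares tolerations at all.
            =(tolerations):
              # (key) conditional anchor: only tolerations targeting the
              # external-node taint are validated; other tolerations are ignored.
              - (key): hsrn.nyu.edu/priority-group
                operator: Equal
                value: research-lab-a
```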
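
A sketch of the namespaced mutate Policy for a priority namespace, assuming opt-in is signalled with a hypothetical pod label hsrn.nyu.edu/spot-opt-in: "true" (the actual opt-in mechanism is still to be decided). Opted-in pods get the spot-low-priority class and a broad Exists toleration; everything else only tolerates this group's own nodes:

```yaml
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: add-external-node-toleration
  namespace: research-lab-a                  # hypothetical priority namespace
spec:
  rules:
    # Pods that opt in to spot capacity: low priority plus an Exists toleration.
    - name: opt-in-spot
      match:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  hsrn.nyu.edu/spot-opt-in: "true"
      mutate:
        # Note: the production policy may need patchesJson6902 to append a toleration
        # instead of replacing the pod's existing tolerations list.
        patchStrategicMerge:
          spec:
            priorityClassName: spot-low-priority
            tolerations:
              - key: hsrn.nyu.edu/priority-group
                operator: Exists
                effect: NoSchedule
    # Pods that do not opt in: only tolerate this group's own external nodes.
    - name: default-group-toleration
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  hsrn.nyu.edu/spot-opt-in: "true"
      mutate:
        patchStrategicMerge:
          spec:
            tolerations:
              - key: hsrn.nyu.edu/priority-group
                operator: Equal
                value: research-lab-a
                effect: NoSchedule
```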
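
A sketch of a kyverno test file for the mutate policy, following the Test resource described at the link above; file names, rule names, and the patched-resource fixture are placeholders, and the exact result fields depend on the CLI version:

```yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: external-node-policies
policies:
  - add-external-node-toleration.yaml          # the mutate Policy sketched above
resources:
  - resources/opt-in-pod.yaml                  # a Pod labeled hsrn.nyu.edu/spot-opt-in: "true"
results:
  - policy: add-external-node-toleration
    rule: opt-in-spot
    resource: opt-in-pod
    kind: Pod
    result: pass
    # For mutate rules, the mutated output is compared against this expected fixture.
    patchedResource: resources/opt-in-pod-patched.yaml
```

CI then only needs to run kyverno test against the policy directory (for example, a GitLab CI job that uses the Kyverno CLI image and runs `kyverno test .`).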

Acceptance Criteria

Mutate

  • Opt-in to low priority for spot instances: a toleration with operator Exists is added (all namespaces)
  • No opt-in: a toleration with operator Equal is added (priority namespaces)
  • No opt-in: no toleration is added (non-priority namespaces)

Validate

  • High priority and a bad toleration key (non-priority namespace)
  • High priority and a bad toleration (priority namespace)
    • bad toleration value
    • bad toleration operator
  • Low priority and a good toleration (all namespaces)
  • High priority and a good toleration (priority namespace)

Deliverables

  • Two Kyverno policies (one validate, one mutate)
  • An example of CI testing

Appendix

  1. How does this prioritization mechanism work? See Prioritization_sequence_diagram__non-priority_user_.svg and Prioritization_sequence_diagram__priority_user_.svg.

  2. How do priority users get eviction priority? A PriorityClass alone does not determine the order of node-pressure eviction: the kubelet first considers whether a pod's usage exceeds its requests (which Guaranteed pods can never do), and only then its priority. Thus, priority users must run their workloads at the Guaranteed QoS level to be evicted last (see the sketch after this list). Eviction takes place in the following order:
  • BestEffort or Burstable pods whose usage exceeds their requests. These pods are evicted based on their Priority and then by how much their usage exceeds the requests.
  • Guaranteed pods, and Burstable pods whose usage is below their requests, are evicted last, based on their Priority.
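
For reference, a pod is classified as Guaranteed only when every container's CPU and memory limits equal its requests. A minimal sketch of a priority user's workload; the pod name, image, and high-priority class name are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: priority-workload                    # placeholder name
spec:
  priorityClassName: high-priority           # placeholder; the group's high-priority class
  containers:
    - name: main
      image: registry.example.com/research/app:latest   # placeholder image
      resources:
        # Requests equal to limits for every resource => Guaranteed QoS,
        # so this pod is in the last group to be evicted under node pressure.
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi
```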

Resources

Kubernetes node-pressure eviction: https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#pod-selection-for-kubelet-eviction
