Prioritize researchers' workloads on their nodes

Context

This project aims to incentivize researchers to add their machines to our cluster. We are designing a prioritization mechanism that gives researchers priority on their own machines, while non-priority users can opt in to that extra capacity at the risk of preemption.

Assumptions

  1. External nodes prioritized for a specific group do NOT add general capacity to our HSRN cluster; they only add spot capacity.

Requirements

  1. All external nodes need to be tainted (see the example below).
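
A minimal sketch of what such a taint could look like, assuming a hypothetical taint key hsrn.nyu.edu/priority-group whose value names the owning group; the real key, value scheme, and effect are still to be decided:

```yaml
# Taint an external node so that only pods tolerating the group's taint can land on it.
# Equivalent imperative form:
#   kubectl taint nodes external-node-01 hsrn.nyu.edu/priority-group=research-lab-a:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: external-node-01            # placeholder node name
spec:
  taints:
    - key: hsrn.nyu.edu/priority-group
      value: research-lab-a         # placeholder group identifier
      effect: NoSchedule
```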

Pre-work

  • Research what Kyverno is capable of; we may not need to develop an admission webhook ourselves.

Implementation

  • Create a new PriorityClass with a value lower than the globalDefault => spot-low-priority (see the sketch after this list)
  • Create a namespaced validate Policy (see the sketch after this list)
  • Create a namespaced mutate Policy (see the sketch after this list)
  • Test the policies
  • Extend the rules to pod controllers (Deployments, Jobs, etc.)
  • Set up CI with kyverno test, https://kyverno.io/docs/testing-policies/ (see the example after this list)
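
A possible shape for the spot PriorityClass from the first item above. The numeric value is a placeholder; it only needs to be lower than the value of whichever PriorityClass is currently marked globalDefault, and preemptionPolicy: Never is an optional extra so that spot workloads never preempt anything:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: spot-low-priority
value: -100                      # placeholder; must be lower than the globalDefault class's value
globalDefault: false
preemptionPolicy: Never          # optional: spot workloads should not preempt other pods
description: "Opt-in class for workloads that accept preemption on external (spot) nodes."
```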
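
A sketch of the namespaced validate Policy, assuming the hypothetical taint key hsrn.nyu.edu/priority-group and a hypothetical priority namespace research-lab-a. It only pins the shape of the external-node toleration; the priority/namespace cross-checks from the acceptance criteria would be added as further rules:

```yaml
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: restrict-external-node-toleration
  namespace: research-lab-a           # hypothetical priority namespace
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-group-toleration
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Tolerations for hsrn.nyu.edu/priority-group in this namespace must use
          operator Equal and this group's own value.
        pattern:
          spec:
            # =() anchor: only checked if the pod declares tolerations at all.
            =(tolerations):
              # (key) conditional anchor: only tolerations targeting the
              # external-node taint are validated; other tolerations are ignored.
              - (key): hsrn.nyu.edu/priority-group
                operator: Equal
                value: research-lab-a
```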
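
A sketch of the namespaced mutate Policy for a priority namespace, assuming opt-in is signalled with a hypothetical pod label hsrn.nyu.edu/spot-opt-in: "true" (the actual opt-in mechanism is still to be decided). Opted-in pods get the spot-low-priority class and a broad Exists toleration; everything else only tolerates this group's own nodes:

```yaml
apiVersion: kyverno.io/v1
kind: Policy
metadata:
  name: add-external-node-toleration
  namespace: research-lab-a                  # hypothetical priority namespace
spec:
  rules:
    # Pods that opt in to spot capacity: low priority plus an Exists toleration.
    - name: opt-in-spot
      match:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  hsrn.nyu.edu/spot-opt-in: "true"
      mutate:
        # Note: the production policy may need patchesJson6902 to append a toleration
        # instead of replacing the pod's existing tolerations list.
        patchStrategicMerge:
          spec:
            priorityClassName: spot-low-priority
            tolerations:
              - key: hsrn.nyu.edu/priority-group
                operator: Exists
                effect: NoSchedule
    # Pods that do not opt in: only tolerate this group's own external nodes.
    - name: default-group-toleration
      match:
        any:
          - resources:
              kinds:
                - Pod
      exclude:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  hsrn.nyu.edu/spot-opt-in: "true"
      mutate:
        patchStrategicMerge:
          spec:
            tolerations:
              - key: hsrn.nyu.edu/priority-group
                operator: Equal
                value: research-lab-a
                effect: NoSchedule
```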
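
A sketch of a kyverno test file for the mutate policy, following the Test resource described at the link above; file names, rule names, and the patched-resource fixture are placeholders, and the exact result fields depend on the CLI version:

```yaml
apiVersion: cli.kyverno.io/v1alpha1
kind: Test
metadata:
  name: external-node-policies
policies:
  - add-external-node-toleration.yaml          # the mutate Policy sketched above
resources:
  - resources/opt-in-pod.yaml                  # a Pod labeled hsrn.nyu.edu/spot-opt-in: "true"
results:
  - policy: add-external-node-toleration
    rule: opt-in-spot
    resource: opt-in-pod
    kind: Pod
    result: pass
    # For mutate rules, the mutated output is compared against this expected fixture.
    patchedResource: resources/opt-in-pod-patched.yaml
```

CI then only needs to run kyverno test against the policy directory (for example, a GitLab CI job that uses the Kyverno CLI image and runs `kyverno test .`).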

Acceptance Criteria

Mutate

  • Opt-in to low priority for spot instances: a toleration with operator Exists is added (all namespaces)
  • No opt-in: a toleration with operator Equal is added (priority namespaces)
  • No opt-in: no toleration is added (non-priority namespaces)

Validate

  • High priority and a bad toleration key (non-priority namespace)
  • High priority and a bad toleration (priority namespace)
    • bad toleration value
    • bad toleration operator
  • Low priority and a good toleration (all namespaces)
  • High priority and a good toleration (priority namespace)

Deliverables

  • Two Kyverno policies (one validate, one mutate)
  • An example of CI testing

Appendix

  1. How does this prioritization mechanism work? See Prioritization_sequence_diagram__non-priority_user_.svg and Prioritization_sequence_diagram__priority_user_.svg.

  2. How do priority users get eviction priority? A PriorityClass alone does not determine the order of node-pressure eviction: the kubelet first considers whether a pod's usage exceeds its requests (which Guaranteed pods can never do), and only then its priority. Thus, priority users must run their workloads at the Guaranteed QoS level to be evicted last (see the sketch after this list). Eviction takes place in the following order:
  • BestEffort or Burstable pods whose usage exceeds their requests. These pods are evicted based on their Priority and then by how much their usage exceeds the requests.
  • Guaranteed pods, and Burstable pods whose usage is below their requests, are evicted last, based on their Priority.
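
For reference, a pod is classified as Guaranteed only when every container's CPU and memory limits equal its requests. A minimal sketch of a priority user's workload; the pod name, image, and high-priority class name are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: priority-workload                    # placeholder name
spec:
  priorityClassName: high-priority           # placeholder; the group's high-priority class
  containers:
    - name: main
      image: registry.example.com/research/app:latest   # placeholder image
      resources:
        # Requests equal to limits for every resource => Guaranteed QoS,
        # so this pod is in the last group to be evicted under node pressure.
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "2"
          memory: 4Gi
```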

Resources

Kubernetes node-pressure eviction: https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#pod-selection-for-kubelet-eviction
