Application resource tuning
As touched upon in the resiliency guidelines, deploying applications with appropriate CPU and memory requests and limits is critical to ensure:
- Resource availability for your own applications
- Resource availability for other tenant applications
While resource quotas are quite generous, they must be seen as a tool that allows tenants to temporarily burst usage for experimentation, rather than an upper limit for sustained use. The platform is not sized to support every tenant fully utilizing their resource quota.
Resource requests
Resource requests are guaranteed and reserved for the pod. Pod scheduling decisions are made based on the request to ensure that a node has enough capacity available to meet the requested value. Inefficient use of requests leads to having to buy more licenses and hardware for the platform.
Resource limits
Resource limits set an upper limit of what a pod can burst to if the resources are available on the node.
Setting requests and limits
❗ If you set a resource limit, you should also set a resource request. Otherwise the request will match the limit. For example, a deployment with no defined CPU request and a defined CPU limit of 1 core will result in a pod with a request of 1 CPU and a limit of 1 CPU.
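For illustration, here is a minimal sketch of a pod spec fragment that sets both values explicitly; the container name and the numbers are placeholders, not platform recommendations:

spec:
  containers:
    - name: my-app           # hypothetical container name
      resources:
        requests:
          cpu: "250m"        # reserved for the pod and used for scheduling
          memory: "256Mi"
        limits:
          cpu: "500m"        # the pod may burst up to this if the node has spare capacity
          memory: "512Mi"

If the requests block were omitted, the requests would default to the limit values, as described above.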
General guidelines
☑ Set requests and limits.
☑ Set requests to the minimum of what your application needs.
☑ Set limits to a reasonable burstable number of what a single pod should support.
☑ Use horizontal pod autoscalers where possible, rather than large CPU and memory limits (see the example sketch below).
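For example, here is a minimal sketch of a HorizontalPodAutoscaler; the names, replica counts and threshold are illustrative assumptions, and the autoscaling/v2 API assumes a reasonably recent OpenShift version:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # hypothetical deployment to scale
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80 # add replicas when average CPU reaches 80% of requests

The same can be created imperatively with `oc autoscale deployment/my-app --min=2 --max=5 --cpu-percent=80`.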
Being a good resource citizen
⭐ ⭐ ⭐
Having a 3:1 ratio of CPU request:CPU utilization is a good starting place for new applications that haven't yet been tuned. Using a 3:1 ratio makes you a good community member.
⭐ ⭐ ⭐ ⭐
Having a 2:1 ratio of CPU request:CPU utilization is a great next step for teams whose projects are working and stable, and who are in a position to start tuning their application more effectively - especially those who are seeking to make better use of horizontal scaling. Using a 2:1 ratio makes you a great community member.
⭐ ⭐ ⭐ ⭐ ⭐
Having a 1.5:1 ratio of CPU request:CPU utilization is an amazing goal for teams who have already started tuning their applications and are looking to make the best possible use of the platform's capabilities. Using a 1.5:1 ratio makes you an amazing community member.
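As a worked example with hypothetical numbers: if a pod's typical CPU utilization is around 100m, a 3:1 ratio corresponds to a CPU request of roughly 300m, a 2:1 ratio to roughly 200m, and a 1.5:1 ratio to roughly 150m.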
CPU and Memory Utilization
Watch a four-minute video that includes an example of resource tuning for a sample OpenShift application.
Resources
Deploying pods without specifying a limit or a request
If you deploy pods without setting limits or requests, they will be deployed with the following defaults:
- CPU request: 100m
- CPU limit: 250m
- Memory request: 256Mi
- Memory limit: 1Gi
This is NOT the same as specifying a resource request or limit of 0.
Setting the request and limits to 0
If you set requests and limits to 0, pods will run under the BestEffort QoS class, using whatever spare capacity is available on the node.
🔺 Assigning `0` as a request or limit must be done through the CLI or directly in the manifest. The Web Console will not accept `0` as a request or limit while editing the resources on a deployment; it will apply the platform defaults outlined in the previous answer.
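As a sketch, and assuming the project's LimitRange permits it, an explicit zero looks like this in the manifest (the container name is a placeholder) and can be applied with `oc apply -f`:

spec:
  containers:
    - name: my-batch-job     # hypothetical container name
      resources:
        requests:
          cpu: "0"           # explicitly zero: nothing reserved, BestEffort QoS
          memory: "0"
        limits:
          cpu: "0"
          memory: "0"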
Specifying a limit value only
If you only specify a limit value, but not a request value, your pods will be deployed with a request that is identical to the limit.
Specifying a request value only
If you only specify a request value, pods will be deployed with the configured request, and will have the default limit applied of:
- CPU limit: 250m
- Memory limit: 1Gi
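For example, a sketch of a pod spec fragment (placeholder container name) that sets only requests:

spec:
  containers:
    - name: my-app           # hypothetical container name
      resources:
        requests:
          cpu: "200m"        # honoured as written
          memory: "256Mi"
        # no limits block: the platform defaults above are applied
        # (CPU limit 250m, memory limit 1Gi)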
Creating a deployment with a request that is higher than the default limit
If you create a deployment with a request value that is higher than the default limits above, you will be required to define a limit.
Checking CPU consumption of running pods in a namespace
This method requires the `oc` client rather than the web console. Some additional math is required afterwards, as there is no cross-platform way to automate this using just `oc`.
$ oc adm top pod
NAME CPU(cores) MEMORY(bytes)
<redacted> 3m 285Mi
<redacted> 3m 299Mi
<redacted> 3m 285Mi
<redacted> 0m 13Mi
<redacted> 9m 61Mi
<redacted> 4m 98Mi
<redacted> 0m 28Mi
<redacted> 2 26Mi
For the above, the column of numbers under `CPU(cores)` is what you want to add up. The `m` suffix stands for millicores, so add up the millicore values and divide by 1000 to get the actual consumption of CPU cores by the pods in the current project. If a CPU usage value has no `m` suffix, it is measured in whole cores, not millicores. For the above example, the total would be 2 + (3+3+3+9+4)/1000 = 2.022 CPU cores of actual CPU consumption.
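If a POSIX shell with awk is available, the arithmetic can be scripted; this is a sketch rather than something `oc` provides, and it assumes the CPU column only contains whole cores or m-suffixed millicore values:

$ oc adm top pod --no-headers | awk '{v=$2; if (v ~ /m$/) { sub(/m$/, "", v); total += v/1000 } else { total += v }} END { printf "%.3f cores\n", total }'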
Note: unless you're running a scientific application, or you know your application is multithreaded, you should not give an app more than one core.
The vertical pod autoscaling tool can be used to calculate resource recommendations based on the real-time resources used by your pods. This video demonstration shows how it can be done.
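As a minimal sketch of recommendation-only mode (the names are placeholders, and the vertical pod autoscaler operator must be installed on the cluster):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa           # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # hypothetical deployment to analyze
  updatePolicy:
    updateMode: "Off"        # produce recommendations only; never evict or resize pods

The resulting recommendations can then be read with `oc describe vpa my-app-vpa`.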
Get current request values with the oc command
The following one-liner displays the total CPU requests currently allocated (used against the quota) for the project you are logged into with `oc`:
oc get quota compute-long-running-quota -o=custom-columns=Requests:.status.used."requests\.cpu"
Below is an example output of the above command. The `m` at the end again means millicores, so dividing the number by 1000 tells us that, in this example, the project's workloads have a total of 14.5 CPU cores allocated in requests.
oc get quota compute-long-running-quota -o=custom-columns=Requests:.status.used."requests\.cpu"
Requests
14500m
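Assuming the same quota name applies to your project, the used CPU limits can be read the same way:

oc get quota compute-long-running-quota -o=custom-columns=Limits:.status.used."limits\.cpu"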
Note: unless you're running a scientific application, or you know your application is multithreaded, you should not give an app more than one core.
Jenkins resource configuration recommendations
Tuning the resources of Jenkins deployments can have a large effect on the available resources of the platform. As of writing, Jenkins is the single largest consumer of CPU requests and limits on the platform. Recent analysis has indicated:
- 15-25% of CPU requests on the platform are related to Jenkins
- Only about 7% of that requested CPU is actually used, on average, over one day
- 10% or more of the platform's overall CPU requests could be freed by tuning Jenkins resources
Recommended configuration
Based on the performance testing details below, the following recommendations are suggested for Jenkins deployments:
- CPU request: 100m
- CPU limit: 1000m (May vary depending on usage)
- Memory request: 512Mi
- Memory limit: 1-2Gi (May vary depending on usage)
On a typical Jenkins deployment, the following snippet can be used if you are editing the yaml:
spec:
  resources:
    requests:
      cpu: "100m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "1Gi"
The following command can also be used to update a Jenkins DeploymentConfig:
oc patch dc/jenkins -p '{"spec": {"template": {"spec": {"containers":[{"name":"jenkins", "resources":{"requests": {"cpu":"100m", "memory":"512Mi"}, "limits": {"cpu":"1", "memory":"1Gi"}}}]}}}}'
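Equivalently, a sketch using `oc set resources`, assuming the container in the DeploymentConfig is named jenkins as in the patch above (either approach will typically trigger a new rollout of Jenkins):

oc set resources dc/jenkins -c jenkins --requests=cpu=100m,memory=512Mi --limits=cpu=1,memory=1Gi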
Performance testing details
Jenkins is often deployed with such high CPU and memory requests because of previous scheduler issues that have since been fixed on the platform. As a result, the templates and existing Jenkins deployments should be tuned to reduce their CPU requests.
A test was performed to collect the startup time of Jenkins under various resource configurations. Each test was performed three times and the startup time was averaged across the iterations. The name of each test follows the format `[cpu_request_in_millicores]-req-[cpu_limit_in_millicores]-limit-[memory_request_in_mb]`.
| Test Name | Average Startup Time (s) |
| --- | --- |
| 100m-req-500m-limit-128m | 295 |
| 100m-req-500m-limit-512m | 248 |
| 100m-req-500m-limit-128m | 368 |
| 100m-req-1000m-limit-128m | 163 |
| 100m-req-500m-limit-512m | 185 |
| 100m-req-1000m-limit-512m | 77 |
| 100m-req-2000m-limit-512m | 80 |
| 500m-req-1000m-limit-128m | 137 |
| 500m-req-1000m-limit-512m | 91 |
| 1000m-req-2000m-limit-128m | 131 |
| 1000m-req-2000m-limit-512m | 73 |
The observations from the testing can be summarized as follows:
- CPU limit has the largest effect on startup performance
- CPU request has little effect on startup performance
- The gain from a CPU limit of 500m to 1000m is major
- The gain from a CPU limit of 1000m to 2000m is minor
- One ideal configuration looks like this:
- CPU request: 100m
- CPU limit: 1000m+
- Memory request: 512Mi
- Memory limit: 1-2Gi (May vary depending on usage)
Advanced Jenkins resource tuning
Consider monitoring the upper and lower bounds of CPU and memory usage of Jenkins instances over time. When idle, it has been observed that Jenkins uses under `5m` of CPU and about `650Mi` of memory. As per the General guidelines above, "set requests to the minimum of what your application needs." It is ideal to reserve resources conservatively (especially for workloads that are often idle), and leverage resource limits to burst when active.
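For a quick point-in-time reading (a sketch; the label selector assumes the standard Jenkins template, which labels its pods with name=jenkins):

$ oc adm top pod -l name=jenkins

Sampling this over time gives a picture of the upper and lower bounds when idle and under load.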
Also, consider any other workloads you need to run in the tools namespace when allocating requests and limits, so that the combined total stays within the allotted maximums.
Tools namespaces resource quota recommendations
Every product in a cluster is provided a license plate and a namespace for each environment (dev, test, prod). These products also have a tools namespace, defined as `<license>-tools`, where tooling such as Jenkins is deployed.
As of writing, there is a discrepancy between the compute resources (especially CPU) that are requested and what is actually used.
On average, Jenkins instances in tools namespaces across the cluster are requesting much more resources than they are utilizing. These overcommitted Jenkins instances are one of the largest contributors to this over-allocation problem.
In short, the recommendation is to lower compute resource requests and leverage resource limits in a burstable fashion.
This section identifies the problem of resource over-allocation in tools namespaces and recommends mitigations.
Decoupling tools namespaces quotas and limit ranges
It is recommended to decouple the quota and limit range sizing of the tools namespace from the other environment namespaces (dev, test, prod), so that the quotas and limits of tools namespaces can be adjusted separately.
OpenShift template considerations for reduced quota
When deploying a workload such as Jenkins from the OpenShift catalog, you may not be prompted to configure all of the CPU and memory requests and limits. In the case of Jenkins, you may only define the memory limit (defaults to 1Gi) which will set the memory requests to the same value.
To accommodate a reduced project quota, the `oc patch` command (depicted above) should be used to set more appropriate CPU and memory requests and limits for all workloads in the tools project. Otherwise, these workloads may not be schedulable if their combined total requests/limits exceed the maximums defined by the project quotas.
Viewing quota usage
You can identify current resource quota consumption and properly size resource requests and limits of existing/new workloads using either the OpenShift web console or the `oc` command-line tool.
Viewing quota usage (GUI)
From the OpenShift web console, in the Administrator perspective, go to Administration > ResourceQuotas and select the appropriate `ResourceQuota` (e.g., `compute-long-running-quota`).
Viewing quota usage (CLI)
To describe a specific quota, use the `oc` tool:
$ oc describe resourcequotas compute-long-running-quota # -n <project>
Name: compute-long-running-quota
Namespace: <license>-tools
Scopes: NotBestEffort, NotTerminating
* Matches all pods that have at least one resource requirement set. These pods have a burstable or guaranteed quality of service.
* Matches all pods that do not have an active deadline. These pods usually include long running pods whose container command is not expected to terminate.
Resource Used Hard
-------- ---- ----
limits.cpu 1860m 8
limits.memory 5184Mi 32Gi
pods 9 100
requests.cpu 610m 4
requests.memory 2752Mi 16Gi
Risks associated with reducing resource reservation
Consider these risks when reducing resource quotas (and subsequently, requests/limits).
See Configuring your cluster to place pods on overcommitted nodes for more details.
Simultaneous resource claiming
Reducing resource requests "works on the assumption that not all the pods will claim all of their usable resources at the same time".
Consider whether several workloads across the cluster are likely to burst above their requests simultaneously at a specific time of day.
Node CPU saturation
Very low CPU requests (e.g., `5m`) may be assigned to workloads such as Jenkins that have minuscule CPU usage when idle and that rely on CPU limits to burst during pipeline runs. A potential risk with this configuration is that the node a workload is scheduled on becomes heavily utilized. In that case, the workload will not be able to burst much higher than its CPU request, potentially causing a significant slowdown.
However, this will not cause pod evictions. CPU throttling (running well below the configured CPU limits) can be mitigated by ensuring nodes across the cluster are evenly balanced and not over-utilized.
Node memory saturation
Nodes running out of memory can be more troublesome than CPU saturation. Regardless of node capacity, workloads that consume beyond their configured memory limits will immediately be terminated. However, if the workload is above its memory requests (but within its memory limits) and its node is running out of memory, the workload may be evicted (depending on the scheduler and priority) to reclaim that memory.
Because memory is incompressible, memory requests and limits should be a little more generous to mitigate pod eviction/termination.
Related training content
The OpenShift 101 & 201 training covers some of the principles of resource tuning. There is a related OpenShift 201 resource management lab exercise and video demonstration.
Related links:
- Autoscaling with HPA and VPA presentation (IDIR required)