gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster Cluster Resource Health

2 Troubleshooting Commands

Last updated 9 weeks ago

Contributed by stewartshea



Troubleshooting Commands

Identify High Utilization Nodes for Cluster gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster

What does it do?

This bash script gathers resource allocation and usage data for the nodes in a Kubernetes cluster. It reads each node's allocatable CPU and memory, reads current usage from kubectl top nodes, computes CPU and memory utilization as a percentage of allocatable capacity, and writes every node at or above 90% on either measure to a JSON file named high_use_nodes.json.
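The utilization check described above comes down to one division and one threshold comparison. The following is a minimal sketch of that arithmetic, assuming usage and allocatable capacity are already expressed in the same unit; the helper names utilization_pct and is_high_use are hypothetical and do not appear in the script, and the numbers are illustrative only.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): percentage of
# allocatable capacity in use. Both arguments must share a unit
# (millicores for CPU, MiB for memory).
utilization_pct() {
    local used="$1" allocatable="$2"
    awk -v u="$used" -v a="$allocatable" \
        'BEGIN { if (a > 0) printf "%.1f", u / a * 100; else printf "0.0" }'
}

# Hypothetical helper: succeeds (exit 0) when the percentage is at or above
# the 90% threshold that the script applies.
is_high_use() {
    awk -v p="$1" 'BEGIN { exit !(p >= 90) }'
}

# Illustrative values only: 3600m used out of 3920m allocatable.
cpu_pct=$(utilization_pct 3600 3920)
if is_high_use "$cpu_pct"; then
    echo "CPU utilization ${cpu_pct}% is at or above 90%"
fi
```

awk is used here only because bash arithmetic is integer-only; the script itself performs the same calculation in jq.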

Command
CONTEXT="gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster" KUBERNETES_DISTRIBUTION_BINARY="kubectl"  bash -c "$(curl -s https://raw.githubusercontent.com/runwhen-contrib/rw-cli-codecollection/main/codebundles/k8s-cluster-resource-health/get_high_use_nodes.sh)" _

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

#!/bin/bash

# Define Kubernetes binary and context with dynamic defaults
KUBERNETES_DISTRIBUTION_BINARY="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}" # Default to 'kubectl' if not set in the environment
DEFAULT_CONTEXT=$(${KUBERNETES_DISTRIBUTION_BINARY} config current-context)
CONTEXT="${CONTEXT:-$DEFAULT_CONTEXT}" # Use environment variable or the current context from kubectl

# Function to process nodes and their resource usage
process_nodes_and_usage() {
    # Get Node Details including allocatable resources
    nodes=$(${KUBERNETES_DISTRIBUTION_BINARY} get nodes --context ${CONTEXT} -o json | jq '[.items[] | {
        name: .metadata.name,
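        # Allocatable CPU may be reported in whole cores ("4") or millicores ("3920m"); normalize to millicores
        cpu_allocatable: (.status.allocatable.cpu | if test("m$") then rtrimstr("m") | tonumber else tonumber * 1000 end),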
        memory_allocatable: (.status.allocatable.memory | gsub("Ki"; "") | tonumber / 1024)
    }]')

    # Fetch node usage details
    usage=$(${KUBERNETES_DISTRIBUTION_BINARY} top nodes --context ${CONTEXT} | awk 'BEGIN { printf "[" } NR>1 { printf "%s{\"name\":\"%s\",\"cpu_usage\":\"%s\",\"memory_usage\":\"%s\"}", (NR>2 ? "," : ""), $1, ($2 == "<unknown>" ? "0" : $2), ($4 == "<unknown>" ? "0" : $4) } END { printf "]" }' | jq '.')

    # Combine and process the data
    jq -n --argjson nodes "$nodes" --argjson usage "$usage" '{
        nodes: $nodes | map({name: .name, cpu_allocatable: .cpu_allocatable, memory_allocatable: .memory_allocatable}),
        usage: $usage | map({name: .name, cpu_usage: (.cpu_usage | rtrimstr("m") | tonumber // 0), memory_usage: (.memory_usage | rtrimstr("Mi") | tonumber // 0)})
    } | .nodes as $nodes | .usage as $usage | 
    $nodes | map(
        . as $node | 
        $usage[] | 
        select(.name == $node.name) | 
        {
            name: .name, 
            cpu_utilization_percentage: (.cpu_usage / $node.cpu_allocatable * 100),
            memory_utilization_percentage: (.memory_usage / $node.memory_allocatable * 100)
        }
    ) | map(select(.cpu_utilization_percentage >= 90 or .memory_utilization_percentage >= 90))'
}

# Execute the function and save the output to a file
process_nodes_and_usage > high_use_nodes.json

# Output the contents of the generated file
cat high_use_nodes.json
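The awk step in the script turns the tabular output of kubectl top nodes into a JSON array before jq processes it. The sketch below runs the same awk program against a hypothetical sample of that output, so the handling of the header row and of "<unknown>" readings can be seen without a cluster. The node names and values are invented for illustration, and the function names sample_top_output and top_nodes_to_json are assumptions, not part of the script.

```bash
#!/bin/bash
# Hypothetical sample of `kubectl top nodes` output; names and values are invented.
sample_top_output() {
    cat <<'EOF'
NAME     CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
node-a   3600m        91%         5120Mi          40%
node-b   <unknown>    <unknown>   <unknown>       <unknown>
EOF
}

# Same awk program as in the script: skip the header row (NR>1), emit one
# JSON object per node, and substitute "0" for "<unknown>" readings.
# Column 2 is CPU usage and column 4 is memory usage.
top_nodes_to_json() {
    awk 'BEGIN { printf "[" } NR>1 { printf "%s{\"name\":\"%s\",\"cpu_usage\":\"%s\",\"memory_usage\":\"%s\"}", (NR>2 ? "," : ""), $1, ($2 == "<unknown>" ? "0" : $2), ($4 == "<unknown>" ? "0" : $4) } END { printf "]" }'
}

sample_top_output | top_nodes_to_json
echo
```

Because a node without metrics is recorded with zero usage, it can never reach the 90% threshold, so it is silently left out of high_use_nodes.json rather than reported as unknown.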

Identify Pods Causing High Node Utilization in Cluster gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster

What does it do?

This script compares what pods are actually using against what they requested. It fetches the CPU and memory requests of pods in all namespaces, reads current per-container usage from kubectl top pods, normalizes both sides to common units (millicores for CPU, MiB for memory), and writes every pod whose usage exceeds its request, grouped by namespace and annotated with the node it runs on, to pods_exceeding_requests.json.
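The normalization step is what makes the comparison meaningful, since a usage of "750m" and a request of "0.5" cannot be compared as strings. Below is a minimal sketch of that idea in plain bash. The helper names cpu_to_millicores, mem_to_mib, and exceeds are hypothetical, the values are illustrative only, and the memory helper assumes whole-number Mi or Gi quantities; the script itself does this work in jq and accepts more suffixes.

```bash
#!/bin/bash
# Hypothetical helpers sketching the unit handling described above.

# Convert a CPU quantity ("250m" or "2") to millicores.
cpu_to_millicores() {
    case "$1" in
        *m) echo "${1%m}" ;;
        *)  awk -v c="$1" 'BEGIN { printf "%d", c * 1000 }' ;;
    esac
}

# Convert a memory quantity ("512Mi" or "2Gi") to MiB.
# Assumption: whole-number quantities with a Mi or Gi suffix only.
mem_to_mib() {
    case "$1" in
        *Gi) echo $(( ${1%Gi} * 1024 )) ;;
        *Mi) echo "${1%Mi}" ;;
        *)   echo "unsupported memory quantity: $1" >&2; return 1 ;;
    esac
}

# exceeds USAGE REQUEST: succeeds when usage is strictly greater than request.
exceeds() { [ "$1" -gt "$2" ]; }

# Illustrative values only.
if exceeds "$(cpu_to_millicores 750m)" "$(cpu_to_millicores 0.5)"; then
    echo "CPU usage exceeds request"
fi
if ! exceeds "$(mem_to_mib 300Mi)" "$(mem_to_mib 1Gi)"; then
    echo "memory usage within request"
fi
```

The comparison is strict, matching the script: a container using exactly what it requested is not reported.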

Command
CONTEXT="gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster" KUBERNETES_DISTRIBUTION_BINARY="kubectl"  bash -c "$(curl -s https://raw.githubusercontent.com/runwhen-contrib/rw-cli-codecollection/main/codebundles/k8s-cluster-resource-health/pods_impacting_high_use_nodes.sh)" _

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

#!/bin/bash

# Define Kubernetes binary and context with dynamic defaults
KUBERNETES_DISTRIBUTION_BINARY="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}" # Default to 'kubectl' if not set in the environment
DEFAULT_CONTEXT=$(${KUBERNETES_DISTRIBUTION_BINARY} config current-context)
CONTEXT="${CONTEXT:-$DEFAULT_CONTEXT}" # Use environment variable or the current context from kubectl

process_nodes_and_usage() {
    # Get Node Details including allocatable resources
    nodes=$(${KUBERNETES_DISTRIBUTION_BINARY} get nodes --context ${CONTEXT} -o json | jq '[.items[] | {
        name: .metadata.name,
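        # Allocatable CPU may be reported in whole cores ("4") or millicores ("3920m"); normalize to millicores
        cpu_allocatable: (.status.allocatable.cpu | if test("m$") then rtrimstr("m") | tonumber else tonumber * 1000 end),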
        memory_allocatable: (.status.allocatable.memory | gsub("Ki"; "") | tonumber / 1024)
    }]')

    # Fetch node usage details
    usage=$(${KUBERNETES_DISTRIBUTION_BINARY} top nodes --context ${CONTEXT} | awk 'BEGIN { printf "[" } NR>1 { printf "%s{\"name\":\"%s\",\"cpu_usage\":\"%s\",\"memory_usage\":\"%s\"}", (NR>2 ? "," : ""), $1, ($2 == "<unknown>" ? "0" : $2), ($4 == "<unknown>" ? "0" : $4) } END { printf "]" }' | jq '.')

    # Combine and process the data
    jq -n --argjson nodes "$nodes" --argjson usage "$usage" '{
        nodes: $nodes | map({name: .name, cpu_allocatable: .cpu_allocatable, memory_allocatable: .memory_allocatable}),
        usage: $usage | map({name: .name, cpu_usage: (.cpu_usage | rtrimstr("m") | tonumber // 0), 
        memory_usage: (.memory_usage | rtrimstr("Mi") | tonumber // 0)})
    } | .nodes as $nodes | .usage as $usage | 
    $nodes | map(
        . as $node | 
        $usage[] | 
        select(.name == $node.name) | 
        {
            name: .name, 
            cpu_utilization_percentage: (.cpu_usage / $node.cpu_allocatable * 100),
            memory_utilization_percentage: (.memory_usage / $node.memory_allocatable * 100)
        }
    ) | map(select(.cpu_utilization_percentage >= 90 or .memory_utilization_percentage >= 90))'
}

# Fetch pod resource requests, one record per container, so that multi-container pods
# do not produce a cross product of CPU and memory requests
${KUBERNETES_DISTRIBUTION_BINARY} get pods --context "${CONTEXT}" --all-namespaces -o json | jq -r '.items[] | . as $pod | .spec.containers[] | {namespace: $pod.metadata.namespace, 
pod: $pod.metadata.name, container: .name, nodeName: $pod.spec.nodeName, cpu_request: (.resources.requests.cpu // "0m"), memory_request: (.resources.requests.memory // "0Mi")} 
| select(.cpu_request != "0m" and .memory_request != "0Mi")' | jq -s '.' > pod_requests.json

# Fetch current pod metrics (columns: namespace, pod, container, CPU, memory)
${KUBERNETES_DISTRIBUTION_BINARY} top pods --context "${CONTEXT}" --all-namespaces --containers | awk 'BEGIN { printf "[" } 
NR>1 { printf "%s{\"namespace\":\"%s\",\"pod\":\"%s\",\"container\":\"%s\",\"cpu_usage\":\"%s\",\"memory_usage\":\"%s\"}", (NR>2 ? "," : ""), $1, $2, $3, $4, $5 } 
END { printf "]" }' | jq '.' > pod_usage.json

# Normalize units and compare each container's usage with that same container's request
jq -s '[
    .[0][] as $usage | 
    .[1][] | 
    select(.pod == $usage.pod and .namespace == $usage.namespace and .container == $usage.container) |
    {
        pod: .pod,
        namespace: .namespace,
        node: .nodeName,
        cpu_usage: $usage.cpu_usage,
        cpu_request: .cpu_request,
        cpu_usage_exceeds: (
            # Convert CPU usage to millicores, assuming all inputs need to be converted from milli-units if they end with 'm'
            ($usage.cpu_usage | 
                if test("m$") then rtrimstr("m") | tonumber 
                else tonumber * 1000 
                end
            ) > (
                # Convert CPU request to millicores, assuming it may already be in millicores if it ends with 'm'
                .cpu_request | 
                if test("m$") then rtrimstr("m") | tonumber 
                else tonumber * 1000 
                end
            )
        ),
        memory_usage: $usage.memory_usage,
        memory_request: .memory_request,
        memory_usage_exceeds: (
            # Normalize memory usage to MiB, handling MiB and GiB
            ($usage.memory_usage | 
                if test("Gi$") then rtrimstr("Gi") | tonumber * 1024
                elif test("G$") then rtrimstr("G") | tonumber * 1024
                elif test("Mi$") then rtrimstr("Mi") | tonumber
                elif test("M$") then rtrimstr("M") | tonumber
                else tonumber
                end
            ) > (
                # Normalize memory request to MiB; a bare number is a byte count in Kubernetes
                .memory_request | 
                if test("Gi$") then rtrimstr("Gi") | tonumber * 1024
                elif test("G$") then rtrimstr("G") | tonumber * 1024
                elif test("Mi$") then rtrimstr("Mi") | tonumber
                elif test("M$") then rtrimstr("M") | tonumber
                elif test("Ki$") then rtrimstr("Ki") | tonumber / 1024
                else tonumber / 1048576
                end
            )
        )
    }
    | select(.cpu_usage_exceeds or .memory_usage_exceeds)
] | group_by(.namespace) | map({(.[0].namespace): .}) | add' pod_usage.json pod_requests.json > pods_exceeding_requests.json

cat pods_exceeding_requests.json