recipes Namespace Health

9 Troubleshooting Commands

Last updated 3 weeks ago

Contributed by stewartshea

Troubleshooting Commands

Inspect Warning Events in Namespace `recipes`

What does it do?

This command uses kubectl to fetch events of type Warning from a specific context and namespace as JSON. It then uses jq to group the events by namespace, kind, and base name (for Pods, the name with its final dash-separated segment removed), summarize each group, and keep only the groups whose most recent event falls within the configured event age.

Command

kubectl get events --field-selector type=Warning --context gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster -n recipes -o json > $HOME/warning_events.json && cat $HOME/warning_events.json | jq -r '[.items[] | {namespace: .involvedObject.namespace, kind: .involvedObject.kind, baseName: ((if .involvedObject.kind == "Pod" then (.involvedObject.name | split("-")[:-1] | join("-")) else .involvedObject.name end) // ""), count: .count, firstTimestamp: .firstTimestamp, lastTimestamp: .lastTimestamp, reason: .reason, message: .message}] | group_by(.namespace, .kind, .baseName) | map({object: (.[0].namespace + "/" + .[0].kind + "/" + .[0].baseName), total_events: (reduce .[] as $event (0; . + $event.count)), summary_messages: (map(.message) | unique | join("; ")), oldest_timestamp: (map(.firstTimestamp) | sort | first), most_recent_timestamp: (map(.lastTimestamp) | sort | last)}) | map(select((now - ((.most_recent_timestamp | fromdateiso8601)))/60 <= 5))'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set the kubectl context and namespace variables
export CONTEXT=my-context
export NAMESPACE=my-namespace

# Use kubectl to get events of type Warning in JSON format and save it to a file in the user's home directory
kubectl get events --field-selector type=Warning --context $CONTEXT -n $NAMESPACE -o json > $HOME/warning_events.json

# Use jq to parse and manipulate the JSON output from kubectl to extract relevant information
cat $HOME/warning_events.json | jq -r '
  # Create an array of objects containing selected fields from the original JSON
  [.items[] |
    {
      namespace: .involvedObject.namespace,
      kind: .involvedObject.kind,
      baseName: (
        (if .involvedObject.kind == "Pod" then 
          (.involvedObject.name | split("-")[:-1] | join("-"))
         else
          .involvedObject.name
        end) // ""
      ),
      count: .count,
      firstTimestamp: .firstTimestamp,
      lastTimestamp: .lastTimestamp,
      reason: .reason,
      message: .message
    }
  ] 
  # Group the objects by namespace, kind, and baseName
  | group_by(.namespace, .kind, .baseName) 
  # Map the grouped objects into new objects with aggregated values
  | map(
      {
        object: (.[0].namespace + "/" + .[0].kind + "/" + .[0].baseName),
        total_events: (reduce .[] as $event (0; . + $event.count)),
        summary_messages: (map(.message) | unique | join("; ")),
        oldest_timestamp: (map(.firstTimestamp) | sort | first),
        most_recent_timestamp: (map(.lastTimestamp) | sort | last)
      }
    )
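  # Keep only groups whose most recent event is within the event age, measured in minutes.
  # Assumes EVENT_AGE is exported in a form such as "5m"; jq reads it through $ENV because the shell does not expand variables inside single quotes.
  | map(select((now - ((.most_recent_timestamp | fromdateiso8601)))/60 <= (($ENV.EVENT_AGE // "5m") | rtrimstr("m") | tonumber)))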
'

In this multi-line command, we set the kubectl context and namespace, retrieve warning events as JSON with kubectl, and then use jq to group, summarize, and filter them by age. Comments explain the purpose of each step for newer or less experienced DevOps engineers.
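The grouping key is easiest to see in isolation. The following is a minimal sketch, in plain bash, of what the jq expression `split("-")[:-1] | join("-")` does to a Pod name; `pod_base_name` is a hypothetical helper that is not part of the original command, and the sample Pod names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the jq expression split("-")[:-1] | join("-")
pod_base_name() {
  local name=$1
  case $name in
    *-*) printf '%s\n' "${name%-*}" ;;  # drop the last dash-separated segment
    *)   printf '\n' ;;                  # no dash at all: jq would yield an empty string
  esac
}

# Two replicas from the same ReplicaSet collapse to one grouping key
pod_base_name "web-7d9f8b6c5-x2x4z"
pod_base_name "web-7d9f8b6c5-q8r7t"
```

Because both replicas map to the same base name, their warning events are counted together rather than reported once per Pod.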

Inspect Container Restarts In Namespace `recipes`

What does it do?

This command pulls container restart data from a Kubernetes cluster and formats it into readable output. It reports only pods that have a container with at least one restart whose last termination is more recent than a threshold derived from the given time period, and it maps common exit codes to short explanations.

Command

TIME_PERIOD="${CONTAINER_RESTART_AGE}"; TIME_PERIOD_UNIT=$(echo $TIME_PERIOD | awk '{print substr($0,length($0),1)}'); TIME_PERIOD_VALUE=$(echo $TIME_PERIOD | awk '{print substr($0,1,length($0)-1)}'); if [[ $TIME_PERIOD_UNIT == "m" ]]; then DATE_CMD_ARG="$TIME_PERIOD_VALUE minutes ago"; elif [[ $TIME_PERIOD_UNIT == "h" ]]; then DATE_CMD_ARG="$TIME_PERIOD_VALUE hours ago"; else echo "Unsupported time period unit. Use 'm' for minutes or 'h' for hours."; exit 1; fi; THRESHOLD_TIME=$(date -u --date="$DATE_CMD_ARG" +"%Y-%m-%dT%H:%M:%SZ"); $KUBERNETES_DISTRIBUTION_BINARY get pods --context=$CONTEXT -n $NAMESPACE -o json | jq -r --argjson exit_code_explanations '{"0": "Success", "1": "Error", "2": "Misconfiguration", "130": "Pod terminated by SIGINT", "134": "Abnormal Termination SIGABRT", "137": "Pod terminated by SIGKILL - Possible OOM", "143":"Graceful Termination SIGTERM"}' --arg threshold_time "$THRESHOLD_TIME" '.items[] | select(.status.containerStatuses != null) | select(any(.status.containerStatuses[]; .restartCount > 0 and (.lastState.terminated.finishedAt // "1970-01-01T00:00:00Z") > $threshold_time)) | "---\npod_name: \(.metadata.name)\n" + (.status.containerStatuses[] | "containers: \(.name)\nrestart_count: \(.restartCount)\nmessage: \(.state.waiting.message // "N/A")\nterminated_reason: \(.lastState.terminated.reason // "N/A")\nterminated_finishedAt: \(.lastState.terminated.finishedAt // "N/A")\nterminated_exitCode: \(.lastState.terminated.exitCode // "N/A")\nexit_code_explanation: \($exit_code_explanations[.lastState.terminated.exitCode | tostring] // "Unknown exit code")") + "\n---\n"'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set the variable TIME_PERIOD to the value of CONTAINER_RESTART_AGE
TIME_PERIOD="${CONTAINER_RESTART_AGE}"

# Get the unit of time from the TIME_PERIOD: 'm' for minutes or 'h' for hours
TIME_PERIOD_UNIT=$(echo $TIME_PERIOD | awk '{print substr($0,length($0),1)}')

# Extract the numerical value from TIME_PERIOD
TIME_PERIOD_VALUE=$(echo $TIME_PERIOD | awk '{print substr($0,1,length($0)-1)}')

# Depending on the unit of time, create the argument for the date command
if [[ $TIME_PERIOD_UNIT == "m" ]]; then 
  DATE_CMD_ARG="$TIME_PERIOD_VALUE minutes ago"
elif [[ $TIME_PERIOD_UNIT == "h" ]]; then 
  DATE_CMD_ARG="$TIME_PERIOD_VALUE hours ago"
else 
  echo "Unsupported time period unit. Use 'm' for minutes or 'h' for hours."
  exit 1
fi

# Calculate the threshold time based on the DATE_CMD_ARG
THRESHOLD_TIME=$(date -u --date="$DATE_CMD_ARG" +"%Y-%m-%dT%H:%M:%SZ")

# Use the Kubernetes distribution binary to get pods in JSON format
# (line breaks inside the single-quoted JSON and jq program need no trailing backslashes)
$KUBERNETES_DISTRIBUTION_BINARY get pods --context=$CONTEXT -n $NAMESPACE -o json | \
  jq -r --argjson exit_code_explanations '{"0": "Success", "1": "Error", "2": "Misconfiguration",
    "130": "Pod terminated by SIGINT", "134": "Abnormal Termination SIGABRT",
    "137": "Pod terminated by SIGKILL - Possible OOM", "143":"Graceful Termination SIGTERM"}' \
    --arg threshold_time "$THRESHOLD_TIME" '.items[] | select(.status.containerStatuses != null) |
      select(any(.status.containerStatuses[]; .restartCount > 0 and (.lastState.terminated.finishedAt // "1970-01-01T00:00:00Z") > $threshold_time)) |
      "---\npod_name: \(.metadata.name)\n" +
      (.status.containerStatuses[] | "containers: \(.name)\nrestart_count: \(.restartCount)\nmessage: \(.state.waiting.message // "N/A")\nterminated_reason: \(.lastState.terminated.reason // "N/A")\nterminated_finishedAt: \(.lastState.terminated.finishedAt // "N/A")\nterminated_exitCode: \(.lastState.terminated.exitCode // "N/A")\nexit_code_explanation: \($exit_code_explanations[.lastState.terminated.exitCode | tostring] // "Unknown exit code")") + "\n---\n"'
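The time-period handling can be tried without a cluster. Below is a small sketch that does the same unit/value split with bash parameter expansion instead of awk; `restart_age_to_date_arg` is a hypothetical name introduced here for illustration, and the result is meant to be passed to `date --date`, which assumes GNU date as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a period such as "10m" or "2h" into a `date --date` argument
restart_age_to_date_arg() {
  local period=$1
  local unit=${period: -1}   # last character, e.g. "m"
  local value=${period%?}    # everything before it, e.g. "10"
  case $unit in
    m) echo "$value minutes ago" ;;
    h) echo "$value hours ago" ;;
    *) echo "Unsupported time period unit. Use 'm' for minutes or 'h' for hours." >&2
       return 1 ;;
  esac
}

restart_age_to_date_arg 10m
restart_age_to_date_arg 2h
```

Returning a non-zero status instead of calling `exit 1` keeps the helper safe to use in an interactive shell.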

Inspect Pending Pods In Namespace `recipes`

What does it do?

This command uses kubectl to get information about pods in a specific context and namespace that are pending, and then uses jq to format the output into a more readable JSON format with specific details about each pod's status and containers.

Command

kubectl get pods --context=gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster -n recipes --field-selector=status.phase=Pending --no-headers -o json | jq -r '[.items[] | {pod_name: .metadata.name, status: (.status.phase // "N/A"), message: (.status.conditions[0].message // "N/A"), reason: (.status.conditions[0].reason // "N/A"), containerStatus: (.status.containerStatuses[0].state // "N/A"), containerMessage: (.status.containerStatuses[0].state.waiting?.message // "N/A"), containerReason: (.status.containerStatuses[0].state.waiting?.reason // "N/A")}]'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# This command uses kubectl to get information about pods in a specific namespace that are in a Pending state
# It then uses jq to format the output into a more readable and structured JSON format

# Set the context and namespace for the kubectl command
CONTEXT="your_context" # replace with the actual context name
NAMESPACE="your_namespace" # replace with the actual namespace name

# Get the pods in the specified context and namespace that are in a Pending state
kubectl get pods --context=${CONTEXT} -n ${NAMESPACE} --field-selector=status.phase=Pending --no-headers -o json \
  | jq -r '[.items[] |
  # Select relevant fields from pod metadata and status
  {pod_name: .metadata.name,
   status: (.status.phase // "N/A"),
   message: (.status.conditions[0].message // "N/A"),
   reason: (.status.conditions[0].reason // "N/A"),
   containerStatus: (.status.containerStatuses[0].state // "N/A"),
   containerMessage: (.status.containerStatuses[0].state.waiting?.message // "N/A"),
   containerReason: (.status.containerStatuses[0].state.waiting?.reason // "N/A")}]'

In this multi-line command, each step is explained with helpful comments to guide newer or less experienced devops engineers through the process and explain what each part of the command does.
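To see how the `//` fallback behaves when a field is absent, the filter can be run against a hand-written sample instead of live cluster output. This is a sketch only: `summarize_pending` is a hypothetical wrapper around a shortened version of the jq filter, the sample document is invented, and jq must be installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around a shortened version of the jq filter above
summarize_pending() {
  jq -c '[.items[] | {
    pod_name: .metadata.name,
    reason: (.status.conditions[0].reason // "N/A"),
    containerReason: (.status.containerStatuses[0].state.waiting?.reason // "N/A")
  }]'
}

# Invented sample: an unschedulable pod that has no containerStatuses yet
sample='{"items":[{"metadata":{"name":"web-0"},"status":{"phase":"Pending","conditions":[{"type":"PodScheduled","status":"False","reason":"Unschedulable"}]}}]}'

echo "$sample" | summarize_pending
```

The missing `containerStatuses` array evaluates to null at every step of the path, so `containerReason` falls back to "N/A" rather than raising an error.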

Inspect Failed Pods In Namespace `recipes`

What does it do?

This command retrieves information about failed pods in a Kubernetes cluster, including the pod name, restart count, termination message, and exit code explanation. It uses jq to map exit codes to human-readable explanations.

Command

kubectl get pods --context=gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster -n recipes --field-selector=status.phase=Failed --no-headers -o json | jq -r --argjson exit_code_explanations '{"0": "Success", "1": "Error", "2": "Misconfiguration", "130": "Pod terminated by SIGINT", "134": "Abnormal Termination SIGABRT", "137": "Pod terminated by SIGKILL - Possible OOM", "143":"Graceful Termination SIGTERM"}' '[.items[] | {pod_name: .metadata.name, restart_count: (.status.containerStatuses[0].restartCount // "N/A"), message: (.status.message // "N/A"), terminated_finishedAt: (.status.containerStatuses[0].state.terminated.finishedAt // "N/A"), exit_code: (.status.containerStatuses[0].state.terminated.exitCode // "N/A"), exit_code_explanation: ($exit_code_explanations[.status.containerStatuses[0].state.terminated.exitCode | tostring] // "Unknown exit code")}]'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set the context and namespace for the kubectl command
context=${CONTEXT}
namespace=${NAMESPACE}

# Use kubectl to get pods in a specific context and namespace that have failed
# Then parse the output as JSON and store the data using jq
kubectl get pods --context=${context} -n ${namespace} --field-selector=status.phase=Failed --no-headers -o json | \

  # Use jq to format the output for easier readability and understanding
  jq -r --argjson exit_code_explanations '{"0": "Success", "1": "Error", "2": "Misconfiguration", "130": "Pod terminated by SIGINT", "134": "Abnormal Termination SIGABRT", "137": "Pod terminated by SIGKILL - Possible OOM", "143":"Graceful Termination SIGTERM"}' \
  '[.items[] | 
   {pod_name: .metadata.name, 
    restart_count: (.status.containerStatuses[0].restartCount // "N/A"), 
    message: (.status.message // "N/A"), 
    terminated_finishedAt: (.status.containerStatuses[0].state.terminated.finishedAt // "N/A"), 
    exit_code: (.status.containerStatuses[0].state.terminated.exitCode // "N/A"), 
    exit_code_explanation: ($exit_code_explanations[.status.containerStatuses[0].state.terminated.exitCode | tostring] // "Unknown exit code")}]
  '
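The signal-related entries in the exit code table follow the common convention that a process killed by signal N is reported with exit status 128 + N (130 = 128 + 2 for SIGINT, 137 = 128 + 9 for SIGKILL, 143 = 128 + 15 for SIGTERM). The sketch below applies that arithmetic; `explain_exit_code` is a hypothetical helper, not part of the original command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an exit code using the 128 + signal-number convention
explain_exit_code() {
  local code=$1
  if (( code == 0 )); then
    echo "success"
  elif (( code > 128 )); then
    echo "terminated by signal $(( code - 128 ))"
  else
    echo "application error (exit status $code)"
  fi
}

explain_exit_code 137
explain_exit_code 143
explain_exit_code 1
```

This also gives a reasonable answer for codes that are missing from the lookup table, which the jq mapping reports as "Unknown exit code".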

Inspect Workload Status Conditions In Namespace `recipes`

What does it do?

This command retrieves pods in a specific namespace and context, then keeps only those whose Ready condition is False for a reason other than PodCompleted, displaying their kind, name, and conditions as a JSON array.

Command

kubectl get pods --context gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster -n recipes -o json | jq -r '.items[] | select(.status.conditions[]? | select(.type == "Ready" and .status == "False" and .reason != "PodCompleted")) | {kind: .kind, name: .metadata.name, conditions: .status.conditions}' | jq -s '.'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set the context and namespace for the kubernetes command
CONTEXT="my-kube-context"
NAMESPACE="my-namespace"

# Get the existing pods in JSON format using kubectl
kubectl get pods --context ${CONTEXT} -n ${NAMESPACE} -o json |
  # Use jq to keep only pods whose Ready condition is False for a reason other than PodCompleted
  jq -r '.items[]
    | select(.status.conditions[]? | select(.type == "Ready" and .status == "False" and .reason != "PodCompleted"))
    # Extract relevant information such as kind, name, and status conditions
    | {kind: .kind, name: .metadata.name, conditions: .status.conditions}' |
  # Use jq with the -s option to collect the individual objects into a single JSON array
  jq -s '.'

This multi-line command breaks down each step of the original command with comments explaining what each part does. It's useful for newer or less experienced devops engineers who may not be familiar with all the commands used.

Get Listing Of Resources In Namespace `recipes`

What does it do?

This command uses kubectl to list every namespaced API resource type that supports the list verb in the given context, then uses xargs and bash to get the resources of each type in the specified namespace.

Command

kubectl api-resources --verbs=list --namespaced -o name --context=gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster | xargs -n 1 bash -c 'kubectl get $0 --show-kind --ignore-not-found -n recipes --context=gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# First, use kubectl to list all the available API resources in the cluster that support the "list" verb and are namespaced
api_resources=$(kubectl api-resources --verbs=list --namespaced -o name --context=${CONTEXT})

# Then, for each of the listed resources, use xargs to iterate through them one by one and execute the following command.
# The inner bash reads NAMESPACE and CONTEXT from its environment, so both must be exported beforehand.
echo $api_resources | xargs -n 1 bash -c '
# Use kubectl to get information about the current resource being processed ($0 holds the resource name)
kubectl get $0 --show-kind --ignore-not-found -n ${NAMESPACE} --context=${CONTEXT}'
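The `xargs -n 1 bash -c '... $0 ...'` pattern works because the first argument after the script string becomes `$0` of the inner shell. The following sketch shows this without touching a cluster: it only echoes the command that would run. `preview_resource_commands` is a hypothetical helper and the resource names are sample input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kubectl command each resource type would trigger
preview_resource_commands() {
  local namespace=$1
  # The prefix assignment exports NAMESPACE only to xargs and the bash it spawns
  NAMESPACE="$namespace" xargs -n 1 bash -c \
    'echo "kubectl get $0 --show-kind --ignore-not-found -n ${NAMESPACE}"'
}

printf '%s\n' pods services configmaps | preview_resource_commands recipes
```

Replacing `echo` with the real `kubectl` call gives the behavior of the command above, one invocation per resource type.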

Check Event Anomalies in Namespace `recipes`

What does it do?

This command retrieves events from a Kubernetes cluster, excluding those of type Warning, groups them by object, and flags as anomalies any objects whose event rate exceeds a specified events-per-minute threshold. The results are formatted as JSON for further analysis.

Command

kubectl get events --field-selector type!=Warning --context gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster -n recipes -o json > $HOME/events.json && cat $HOME/events.json | jq -r '[.items[] | {namespace: .involvedObject.namespace, kind: .involvedObject.kind, name: ((if .involvedObject and .involvedObject.kind == "Pod" then (.involvedObject.name | split("-")[:-1] | join("-")) else .involvedObject.name end) // ""), count: .count, firstTimestamp: .firstTimestamp, lastTimestamp: .lastTimestamp, reason: .reason, message: .message}] | group_by(.namespace, .kind, .name) | .[] | {(.[0].namespace + "/" + .[0].kind + "/" + .[0].name): {events: .}}' | jq -r --argjson threshold "3.0" 'to_entries[] | {object: .key, oldest_timestamp: ([.value.events[] | .firstTimestamp] | min), most_recent_timestamp: ([.value.events[] | .lastTimestamp] | max), events_per_minute: (reduce .value.events[] as $event (0; . + $event.count) / (((([.value.events[] | .lastTimestamp | fromdateiso8601] | max) - ([.value.events[] | .firstTimestamp | fromdateiso8601] | min)) / 60) | if . < 1 then 1 else . end)), total_events: (reduce .value.events[] as $event (0; . + $event.count)), summary_messages: [.value.events[] | .message] | unique | join("; ")} | select(.events_per_minute > $threshold)' | jq -s '.'

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set the kubectl context and namespace to use for the command
CONTEXT="example-context"
NAMESPACE="example-namespace"

# Retrieve events from the Kubernetes cluster using the specified context and namespace, and output the results to a JSON file
kubectl get events --field-selector type!=Warning --context ${CONTEXT} -n ${NAMESPACE} -o json > $HOME/events.json

# Process the JSON file with jq to extract relevant event information and format it into a new JSON structure
cat $HOME/events.json | jq -r '[
  .items[] | {
    namespace: .involvedObject.namespace,
    kind: .involvedObject.kind,
    name: (
      (if .involvedObject and .involvedObject.kind == "Pod" then
        (.involvedObject.name | split("-")[:-1] | join("-"))
      else
        .involvedObject.name
      end) // ""
    ),
    count: .count,
    firstTimestamp: .firstTimestamp,
    lastTimestamp: .lastTimestamp,
    reason: .reason,
    message: .message
  }
] | group_by(.namespace, .kind, .name) | .[] | {(.[0].namespace + "/" + .[0].kind + "/" + .[0].name): {events: .}}' |

# Further process the formatted JSON data to calculate events per minute and compare it with the threshold
# (the threshold defaults to 3.0 here, the value used in the one-line command, when ANOMALY_THRESHOLD is unset)
jq -r --argjson threshold "${ANOMALY_THRESHOLD:-3.0}" 'to_entries[] | {
  object: .key,
  oldest_timestamp: ([.value.events[] | .firstTimestamp] | min),
  most_recent_timestamp: ([.value.events[] | .lastTimestamp] | max),
  events_per_minute: (
    reduce .value.events[] as $event (0; . + $event.count) / (
      (
        (
          ([.value.events[] | .lastTimestamp | fromdateiso8601] | max) - 
          ([.value.events[] | .firstTimestamp | fromdateiso8601] | min)
        ) / 60
      ) | if . < 1 then 1 else . end
    )
  ),
  total_events: (reduce .value.events[] as $event (0; . + $event.count)),
  summary_messages: [.value.events[] | .message] | unique | join("; ")
} | select(.events_per_minute > $threshold)' |

# Combine all individual JSON objects into a single array and output the final processed data
jq -s '.'
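The rate calculation is the core of the anomaly check: total event count divided by the elapsed minutes between the oldest and newest timestamps, with the elapsed time floored at one minute. A minimal sketch of that arithmetic follows, using awk for floating-point division; `events_per_minute` is a hypothetical helper and the epoch values are made-up inputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: events per minute with the same one-minute floor as the jq expression
events_per_minute() {
  local total=$1 first_epoch=$2 last_epoch=$3
  awk -v t="$total" -v f="$first_epoch" -v l="$last_epoch" 'BEGIN {
    minutes = (l - f) / 60
    if (minutes < 1) minutes = 1   # avoid inflating the rate for very short windows
    printf "%.2f\n", t / minutes
  }'
}

events_per_minute 30 0 300    # 30 events over 5 minutes
events_per_minute 4 100 110   # 4 events over 10 seconds, floored to 1 minute
```

With the threshold of 3.0 used in the one-line command, both sample objects would be reported, since their rates are 6.00 and 4.00 events per minute.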

Check Missing or Risky PodDisruptionBudget Policies in Namespace `recipes`

What does it do?

This command checks the health of deployments and statefulsets in a Kubernetes cluster by evaluating the corresponding PodDisruptionBudgets (PDBs) to determine if they are missing, risky, or OK based on certain criteria. It then prints the status of each deployment and statefulset along with their associated PDBs.

Command

context="gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster"; namespace="recipes"; check_health() { local type=$1; local name=$2; local replicas=$3; local selector=$4; local pdbs=$(kubectl --context "$context" --namespace "$namespace" get pdb -o json | jq -c --arg selector "$selector" '.items[] | select(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value == $selector)'); if [[ $replicas -gt 1 && -z "$pdbs" ]]; then printf "%-30s %-30s %-10s\n" "$type/$name" "" "Missing"; else echo "$pdbs" | jq -c . | while IFS= read -r pdb; do local pdbName=$(echo "$pdb" | jq -r '.metadata.name'); local minAvailable=$(echo "$pdb" | jq -r '.spec.minAvailable // ""'); local maxUnavailable=$(echo "$pdb" | jq -r '.spec.maxUnavailable // ""'); if [[ "$minAvailable" == "100%" || "$maxUnavailable" == "0" || "$maxUnavailable" == "0%" ]]; then printf "%-30s %-30s %-10s\n" "$type/$name" "$pdbName" "Risky"; elif [[ $replicas -gt 1 && ("$minAvailable" != "100%" || "$maxUnavailable" != "0" || "$maxUnavailable" != "0%") ]]; then printf "%-30s %-30s %-10s\n" "$type/$name" "$pdbName" "OK"; fi; done; fi; }; echo "Deployments:"; echo "_______"; printf "%-30s %-30s %-10s\n" "NAME" "PDB" "STATUS"; kubectl --context "$context" --namespace "$namespace" get deployments -o json | jq -c '.items[] | "\(.metadata.name) \(.spec.replicas) \(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value)"' | while read -r line; do check_health "Deployment" $(echo $line | tr -d '"'); done; echo ""; echo "Statefulsets:"; echo "_______"; printf "%-30s %-30s %-10s\n" "NAME" "PDB" "STATUS"; kubectl --context "$context" --namespace "$namespace" get statefulsets -o json | jq -c '.items[] | "\(.metadata.name) \(.spec.replicas) \(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value)"' | while read -r line; do check_health "StatefulSet" $(echo $line | tr -d '"'); done

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

# Set context and namespace variables for kubectl commands
context="${CONTEXT}"
namespace="${NAMESPACE}"

# Function to check the health of a resource
check_health() {
    local type=$1
    local name=$2
    local replicas=$3
    local selector=$4
    # Get relevant PodDisruptionBudgets (pdb) using jq filtering
    local pdbs=$(kubectl --context "$context" --namespace "$namespace" get pdb -o json | jq -c --arg selector "$selector" '.items[] | select(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value == $selector)')

    # Check if replicas are greater than 1 and pdb is missing
    if [[ $replicas -gt 1 && -z "$pdbs" ]]; then
        printf "%-30s %-30s %-10s\n" "$type/$name" "" "Missing"
    else
        # Loop over each pdb
        echo "$pdbs" | jq -c . | while IFS= read -r pdb; do
            local pdbName=$(echo "$pdb" | jq -r '.metadata.name')
            local minAvailable=$(echo "$pdb" | jq -r '.spec.minAvailable // ""')
            local maxUnavailable=$(echo "$pdb" | jq -r '.spec.maxUnavailable // ""')

            # Check if minAvailable is 100% or maxUnavailable is 0 or 0%
            if [[ "$minAvailable" == "100%" || "$maxUnavailable" == "0" || "$maxUnavailable" == "0%" ]]; then
                printf "%-30s %-30s %-10s\n" "$type/$name" "$pdbName" "Risky"
            # Check if replicas are greater than 1 and minAvailable or maxUnavailable are other than risky values
            elif [[ $replicas -gt 1 && ("$minAvailable" != "100%" || "$maxUnavailable" != "0" || "$maxUnavailable" != "0%") ]]; then
                printf "%-30s %-30s %-10s\n" "$type/$name" "$pdbName" "OK"
            fi
        done
    fi
}

# Get deployments and check their health
echo "Deployments:"
echo "_______"
printf "%-30s %-30s %-10s\n" "NAME" "PDB" "STATUS"
kubectl --context "$context" --namespace "$namespace" get deployments -o json | jq -c '.items[] | "\(.metadata.name) \(.spec.replicas) \(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value)"' | while read -r line; do 
    check_health "Deployment" $(echo $line | tr -d '"'); 
done

echo ""

# Get statefulsets and check their health
echo "Statefulsets:"
echo "_______"
printf "%-30s %-30s %-10s\n" "NAME" "PDB" "STATUS"
kubectl --context "$context" --namespace "$namespace" get statefulsets -o json | jq -c '.items[] | "\(.metadata.name) \(.spec.replicas) \(.spec.selector.matchLabels | to_entries[] | .key + "=" + .value)"' | while read -r line; do 
    check_health "StatefulSet" $(echo $line | tr -d '"'); 
done
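The status decision inside `check_health` can be isolated from the kubectl calls. The sketch below is a simplified restatement under one observation: in the `elif` above, the parenthesized comparisons are always true (a value cannot equal both "0" and "0%"), so the branch effectively reduces to `replicas > 1`. `classify_pdb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one PDB given the workload replica count
classify_pdb() {
  local replicas=$1 min_available=$2 max_unavailable=$3
  if [[ "$min_available" == "100%" || "$max_unavailable" == "0" || "$max_unavailable" == "0%" ]]; then
    echo "Risky"   # the budget never permits a voluntary eviction
  elif [[ $replicas -gt 1 ]]; then
    echo "OK"
  fi               # single-replica workloads with a permissive PDB print nothing
}

classify_pdb 3 "100%" ""
classify_pdb 3 "" 1
```

A PDB with `minAvailable: 100%` or `maxUnavailable: 0` is flagged because it blocks voluntary disruptions such as node drains.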

Check Resource Quota Utilization in Namespace `recipes`

What does it do?

This is a Bash script that calculates resource usage and generates recommendations based on the usage of memory and CPU in a Kubernetes environment. It converts memory to Mi (mebibytes) and CPU to millicores, then checks the usage against set limits and generates corresponding recommendations for adjusting the resource quotas.

Command

KUBERNETES_DISTRIBUTION_BINARY="kubectl" NAMESPACE="recipes" CONTEXT="gke_runwhen-nonprod-sandbox_us-central1_sandbox-cluster-1-cluster" ANOMALY_THRESHOLD="3.0" EVENT_AGE="5m"  bash -c "$(curl -s https://raw.githubusercontent.com/runwhen-contrib/rw-cli-codecollection/main/codebundles/k8s-namespace-healthcheck/resource_quota_check.sh)" _

Learn more

This multi-line content is auto-generated and used for educational purposes. Copying and pasting the multi-line text might not function as expected.

#!/bin/bash

# Initialize recommendations array
declare -a recommendations

# Function to convert memory to Mi
convert_memory_to_mib() {
    local memory=$1

    # Extract the number and unit separately
    local number=${memory//[!0-9]/}
    local unit=${memory//[0-9]/}

    case $unit in
        Gi)
            echo $(( number * 1024 ))  # Convert Gi to Mi
            ;;
        Mi)
            echo $number  # Already in Mi
            ;;
        Ki)
            echo $(( number / 1024 ))  # Convert Ki to Mi
            ;;
        *)
            echo $(( number / (1024 * 1024) ))  # Convert bytes to Mi
            ;;
    esac
}

# Function to convert CPU to millicores
convert_cpu_to_millicores() {
    local cpu=$1
    if [[ $cpu =~ ^[0-9]+m$ ]]; then
        echo ${cpu%m}
    else
        echo $(($cpu * 1000))  # Convert CPU cores to millicores
    fi
}

# Function to calculate and display resource usage status with recommendations
check_usage() {
    local quota_name=$1
    local resource=$2
    local used=$3
    local hard=$4

    # Convert memory and CPU to a common unit (Mi and millicores respectively)
    if [[ $resource == *memory* ]]; then
        used=$(convert_memory_to_mib $used)
        hard=$(convert_memory_to_mib $hard)
    elif [[ $resource == *cpu* ]]; then
        used=$(convert_cpu_to_millicores $used)
        hard=$(convert_cpu_to_millicores $hard)
    fi

    # Calculating percentage
    local percentage=0
    if [ $hard -ne 0 ]; then
        percentage=$(( 100 * used / hard ))
    fi

    # Generate recommendation based on usage
    local recommendation=""
    local increase_percentage=0
    local increased_value=0
    if [ $percentage -ge 100 ]; then
        if [ $used -gt $hard ]; then
            # If usage is over 100%, match the current usage
            echo "$resource: OVER LIMIT ($percentage%) - Adjust resource quota to match current usage with some headroom for $resource in $NAMESPACE"
            increase_percentage="${CRITICAL_INCREASE_LEVEL:-40}"
            increased_value=$(( used * increase_percentage / 100 ))
            suggested_value=$(( increased_value + used ))
        else
            echo "$resource: AT LIMIT ($percentage%) - Immediately increase the resource quota for $resource in $NAMESPACE"
            increase_percentage="${CRITICAL_INCREASE_LEVEL:-40}"
            increased_value=$(( hard * increase_percentage / 100 ))
            suggested_value=$(( increased_value + hard ))
        fi
        recommendation="{\"remediation_type\":\"resourcequota_update\",\"increase_percentage\":\"$increase_percentage\",\"limit_type\":\"hard\",\"current_value\":\"$hard\",\"suggested_value\":\"$suggested_value\",\"quota_name\": \"$quota_name\", \"resource\": \"$resource\", \"usage\": \"at or above 100%\", \"severity\": \"1\", \"next_step\": \"Increase the resource quota for $resource in \`$NAMESPACE\`\"}"
    #... (and so on)

# Fetching resource quota details
quota_json=$(${KUBERNETES_DISTRIBUTION_BINARY} get quota -n "$NAMESPACE" --context "$CONTEXT" -o json)

# Processing the quota JSON
echo "Resource Quota and Usage for Namespace: $NAMESPACE in Context: $CONTEXT"
echo "==========================================="

# Parsing quota JSON
while IFS= read -r item; do
    quota_name=$(echo "$item" | jq -r '.metadata.name')
    echo "Quota: $quota_name"

    # Create temporary files
    hard_file=$(mktemp)
    used_file=$(mktemp)

    echo "$item" | jq -r '.status.hard | to_entries | .[] | "\(.key) \(.value)"' > "$hard_file"
    echo "$item" | jq -r '.status.used | to_entries | .[] | "\(.key) \(.value)"' > "$used_file"

    # Process 'hard' limits and 'used' resources
    while read -r key value; do
        hard=$(grep "^$key " "$hard_file" | awk '{print $2}')
        used=$(grep "^$key " "$used_file" | awk '{print $2}')
        check_usage "$quota_name" "$key" "${used:-0}" "$hard"
    done < "$hard_file"

    echo "-----------------------------------"

    # Clean up temporary files
    rm "$hard_file" "$used_file"
done < <(echo "$quota_json" | jq -c '.items[]')

# Outputting recommendations as JSON
if [ -n "$recommendations" ]; then
    echo "Recommended Next Steps:"
    echo "[$recommendations]" | jq .
else
    echo "No recommendations."
fi
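The suggested quota in the at-limit and over-limit branches comes down to a few lines of integer arithmetic. This sketch restates only that part of `check_usage`, assuming values already converted to Mi or millicores; `suggest_quota` is a hypothetical helper, and the default of 40 mirrors the `CRITICAL_INCREASE_LEVEL` fallback in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggested hard limit once usage is at or above 100%
suggest_quota() {
  local used=$1 hard=$2 increase_percentage=${3:-40}
  local base=$hard
  if (( used > hard )); then
    base=$used   # over limit: size the new quota from actual usage
  fi
  echo $(( base + base * increase_percentage / 100 ))
}

suggest_quota 1000 1000   # at limit
suggest_quota 1500 1000   # over limit
```

At the limit the suggestion is the current hard value plus 40%, and over the limit it is the current usage plus 40%, which leaves headroom in both cases.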
