Agent Local Checks

Configure local health checks on your agents to monitor internal services, processes, disk usage, memory, CPU, and more.

Overview

Local checks run directly on the host where the upti.my agent is installed. They monitor services, resources, and certificates from inside your infrastructure, giving you visibility into systems that external probes cannot reach. Each check type has its own configuration fields, and all checks share common settings for interval, timeout, and tagging.

ℹ️ No Inbound Access Required

Local checks operate entirely from within your network. The agent only needs outbound HTTPS access to report results back to upti.my. No inbound firewall rules or open ports are required.

Common Settings

Every local check type shares the following configuration options:

Setting	Type	Description
interval	integer (seconds)	How often the check runs. Minimum 10 seconds, default 30 seconds.
timeout	integer (seconds)	Maximum time a check is allowed to run before being marked as failed. Default 10 seconds.
tags	string array	Optional labels for organizing and filtering checks (e.g., "production", "database").
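The following Python sketch shows how these rules could be applied when a check's configuration is loaded. It is illustrative only, not the agent's actual code. `CommonSettings` and `from_config` are hypothetical names, and the sketch encodes only the documented minimum and defaults. The table gives no default for `tags`, so the sketch assumes an empty list. It also assumes that a timeout must be positive.

```python
from dataclasses import dataclass, field

MIN_INTERVAL_S = 10       # documented minimum interval
DEFAULT_INTERVAL_S = 30   # documented default interval
DEFAULT_TIMEOUT_S = 10    # documented default timeout


@dataclass
class CommonSettings:
    """Hypothetical container for the settings every local check shares."""
    interval: int = DEFAULT_INTERVAL_S
    timeout: int = DEFAULT_TIMEOUT_S
    tags: list[str] = field(default_factory=list)  # assumed default: no tags


def from_config(config: dict) -> CommonSettings:
    """Build CommonSettings from a check's JSON config, applying defaults."""
    settings = CommonSettings(
        interval=int(config.get("interval", DEFAULT_INTERVAL_S)),
        timeout=int(config.get("timeout", DEFAULT_TIMEOUT_S)),
        tags=list(config.get("tags", [])),
    )
    if settings.interval < MIN_INTERVAL_S:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_S} seconds")
    if settings.timeout <= 0:  # assumption: not stated in the table
        raise ValueError("timeout must be a positive number of seconds")
    return settings
```

For example, `from_config({"type": "process", "interval": 15})` keeps the 15-second interval and fills in the 10-second timeout and an empty tag list. An interval of 5 is rejected.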

Check Types Reference

1. HTTP Check

Send an HTTP request to a local or internal endpoint and validate the response. This is ideal for monitoring internal APIs, admin panels, and microservices that are not publicly accessible.

Field	Type	Description
url	string	Full URL to check, e.g., `http://localhost:8080/health`
method	string	HTTP method: GET, POST, PUT, DELETE, HEAD. Default: GET.
expected_status	integer	Expected HTTP status code. Default: 200.
expected_body	string	Optional substring that must appear in the response body.
headers	object	Optional custom headers to include in the request.

HTTP Check Example

{
  "type": "http",
  "url": "http://localhost:3000/api/health",
  "method": "GET",
  "expected_status": 200,
  "expected_body": "\"status\":\"ok\"",
  "interval": 30,
  "timeout": 10,
  "tags": ["api", "internal"]
}
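The pass/fail rule for an HTTP check can be sketched with the Python standard library. This approximates the behavior described above and is not the agent's implementation. `evaluate_http` and `run_http_check` are hypothetical names. The sketch assumes the check passes only when the status code equals `expected_status` and, if `expected_body` is set, that string appears as a plain substring of the response body.

```python
import urllib.error
import urllib.request
from typing import Optional


def evaluate_http(status: int, body: str,
                  expected_status: int = 200,
                  expected_body: Optional[str] = None) -> bool:
    """Pass only if the status matches and the optional substring is present."""
    if status != expected_status:
        return False
    if expected_body and expected_body not in body:
        return False
    return True


def run_http_check(config: dict) -> bool:
    """Send the configured request and evaluate the response (hypothetical helper)."""
    request = urllib.request.Request(
        config["url"],
        method=config.get("method", "GET"),
        headers=config.get("headers", {}),
    )
    try:
        with urllib.request.urlopen(request, timeout=config.get("timeout", 10)) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses, which may still be the expected status.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, DNS failure, or timeout: the check fails.
        return False
    return evaluate_http(status, body,
                         config.get("expected_status", 200),
                         config.get("expected_body"))
```

For example, `run_http_check({"url": "http://localhost:3000/api/health", "expected_body": '"status":"ok"'})` returns `True` only if the local service answers 200 with that substring in its body.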

2. Process Check

Verify that a specific process is running on the host. The agent scans the process list and matches by process name. If the process is not found, the check fails.

Field	Type	Description
process_name	string	Name of the process to look for, e.g., `nginx` or `postgres`

Process Check Example

{
  "type": "process",
  "process_name": "nginx",
  "interval": 15,
  "timeout": 5,
  "tags": ["web-server"]
}
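On Linux, a process-name lookup can be done by scanning `/proc`. The Python sketch below is an assumption about how such a check might work, not the agent's actual method, and `list_process_names` and `process_running` are hypothetical names. It compares names exactly against `/proc/<pid>/comm`. The kernel truncates that value to 15 characters, so a real implementation may also need to inspect the full command line.

```python
import os


def list_process_names(proc_root: str = "/proc") -> list[str]:
    """Collect process names from /proc/<pid>/comm (Linux only)."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # the process exited or is not readable
    return names


def process_running(process_name: str, names: list[str]) -> bool:
    """Pass if any running process has exactly this name."""
    return process_name in names
```

On a host running nginx, `process_running("nginx", list_process_names())` would return `True`.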

3. Docker Container Check

Monitor the status of a Docker container by name. The check verifies that the container is running and, if a health check is configured on the container, that it reports a healthy state.

Field	Type	Description
container_name	string	Name of the Docker container to monitor, e.g., `redis-cache`

Docker Container Check Example

{
  "type": "docker_container",
  "container_name": "redis-cache",
  "interval": 30,
  "timeout": 10,
  "tags": ["cache", "docker"]
}
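The running-and-healthy rule can be sketched against the JSON that `docker inspect` prints. The agent may query the Docker API directly instead. This Python version shells out to the Docker CLI only for illustration, and `evaluate_container_state` and `check_container` are hypothetical names. In `docker inspect` output, `State.Health` is present only when the container defines a health check. The sketch treats a `starting` health status as a failure, which is an assumption.

```python
import json
import subprocess


def evaluate_container_state(state: dict) -> bool:
    """Pass if the container is running and, when a health check exists, healthy."""
    if not state.get("Running", False):
        return False
    health = state.get("Health")
    if health is None:
        return True  # no HEALTHCHECK configured, so running is enough
    # "starting" and "unhealthy" both count as failures in this sketch.
    return health.get("Status") == "healthy"


def check_container(container_name: str, timeout: float = 10) -> bool:
    """Inspect the named container with the Docker CLI and evaluate its state."""
    try:
        result = subprocess.run(
            ["docker", "inspect", container_name],
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # docker CLI missing or daemon not responding in time
    if result.returncode != 0:
        return False  # no such container, or no permission to reach the daemon
    info = json.loads(result.stdout)[0]
    return evaluate_container_state(info["State"])
```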

4. Disk Usage Check

Monitor disk usage on a specified file system path. The check fails when the usage percentage exceeds your configured threshold, helping you prevent disk-full outages before they happen.

Field	Type	Description
path	string	File system path to monitor, e.g., `/` or `/var/log`
threshold_percent	integer	Usage percentage that triggers a failure. Default: 90.

Disk Usage Check Example

{
  "type": "disk_usage",
  "path": "/",
  "threshold_percent": 85,
  "interval": 60,
  "timeout": 5,
  "tags": ["infrastructure"]
}
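The threshold comparison can be sketched with Python's `shutil.disk_usage`. The helper names are hypothetical, and the sketch defines usage as used space divided by total capacity. That definition is an assumption. `df` computes its Use% as used / (used + available), which accounts for blocks reserved for root, so the two figures can differ slightly. As described above, the check fails only when usage exceeds the threshold.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Used space as a percentage of total capacity for the filesystem at path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def disk_check_passes(usage_percent: float, threshold_percent: int = 90) -> bool:
    """Fail once usage exceeds the threshold; usage equal to it still passes."""
    return usage_percent <= threshold_percent
```

With the example configuration, the check would evaluate `disk_check_passes(disk_usage_percent("/"), 85)`.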

5. Memory Check

Monitor system memory utilization. When total memory usage exceeds the configured threshold, the check fails. This helps you detect memory leaks and resource exhaustion early.

Field	Type	Description
threshold_percent	integer	Memory usage percentage that triggers a failure. Default: 90.

Memory Check Example

{
  "type": "memory",
  "threshold_percent": 90,
  "interval": 30,
  "timeout": 5,
  "tags": ["infrastructure"]
}
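This section does not define exactly how "memory usage" is calculated. One common Linux definition is shown below as an assumption: (MemTotal − MemAvailable) / MemTotal from `/proc/meminfo`. This definition does not count reclaimable page cache as used memory. The function names are hypothetical, and the parser takes the file's text as input so it can be tested without a live system.

```python
def memory_usage_percent(meminfo_text: str) -> float:
    """Compute memory usage from /proc/meminfo contents (Linux).

    Usage is (MemTotal - MemAvailable) / MemTotal, so page cache that the
    kernel can reclaim is not counted as used.
    """
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key] = int(fields[0])  # values are reported in kB
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return (total - available) / total * 100


def memory_check_passes(usage_percent: float, threshold_percent: int = 90) -> bool:
    """Fail once usage exceeds the threshold."""
    return usage_percent <= threshold_percent
```

On a live host, the input would be read with `open("/proc/meminfo").read()`. `MemAvailable` is available on Linux 3.14 and later.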

6. CPU Check

Monitor CPU utilization across all cores. The check samples CPU usage over the timeout window and fails if the average exceeds the configured threshold. Useful for detecting runaway processes and unexpected load spikes.

Field	Type	Description
threshold_percent	integer	CPU usage percentage that triggers a failure. Default: 90.

CPU Check Example

{
  "type": "cpu",
  "threshold_percent": 85,
  "interval": 30,
  "timeout": 10,
  "tags": ["infrastructure"]
}
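Averaging CPU usage over a window usually means taking two snapshots of the kernel's cumulative CPU counters and comparing the differences. The Python sketch below does this with Linux `/proc/stat`. It is an assumption about the technique, not the agent's code, and all names are hypothetical. It counts idle and iowait as not busy. It also ignores the guest columns, because guest time is already included in user and nice.

```python
import time


def read_cpu_times(stat_path: str = "/proc/stat") -> list[int]:
    """Return aggregate CPU counters from the first line of /proc/stat (Linux)."""
    with open(stat_path) as f:
        fields = f.readline().split()  # "cpu user nice system idle iowait irq softirq steal ..."
    # Keep user..steal; guest time is already included in user/nice.
    return [int(value) for value in fields[1:9]]


def cpu_usage_percent(before: list[int], after: list[int]) -> float:
    """Average busy percentage across all cores between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + deltas[4]  # idle + iowait
    return (total - idle) / total * 100


def sample_cpu(window_seconds: float) -> float:
    """Measure average CPU usage over a window of the given length."""
    before = read_cpu_times()
    time.sleep(window_seconds)
    return cpu_usage_percent(before, read_cpu_times())


def cpu_check_passes(usage_percent: float, threshold_percent: int = 90) -> bool:
    """Fail once average usage exceeds the threshold."""
    return usage_percent <= threshold_percent
```

Because the check is described as sampling over the timeout window, the window length is a parameter here rather than a fixed value.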

7. Certificate Check

Monitor local TLS certificate files for upcoming expiration. The agent reads the certificate file from disk and checks how many days remain before it expires. If the remaining days fall below the warning threshold, the check fails.

Field	Type	Description
cert_path	string	Absolute path to the TLS certificate file, e.g., `/etc/ssl/certs/app.crt`
warning_days	integer	Number of days remaining before expiry below which the check fails. Default: 30.

Certificate Check Example

{
  "type": "certificate",
  "cert_path": "/etc/ssl/certs/app.crt",
  "warning_days": 30,
  "interval": 3600,
  "timeout": 5,
  "tags": ["ssl", "security"]
}
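The days-remaining rule can be sketched in Python. This version reads the certificate's `notAfter` date with the `openssl x509 -enddate` command and parses it with the standard library's `ssl.cert_time_to_seconds`. Using the openssl CLI is an assumption made only for illustration, and `read_not_after`, `days_remaining`, and `certificate_check_passes` are hypothetical names. As described above, the check fails once the remaining days fall below `warning_days`.

```python
import ssl
import subprocess
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def read_not_after(cert_path: str) -> str:
    """Read the expiry date of a PEM certificate using the openssl CLI."""
    output = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        capture_output=True, text=True, check=True,
    ).stdout
    # Output looks like: notAfter=Jan  1 00:00:00 2030 GMT
    return output.strip().split("=", 1)[1]


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days between now and the certificate's notAfter time."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def certificate_check_passes(remaining_days: float, warning_days: int = 30) -> bool:
    """Fail once the remaining days fall below warning_days."""
    return remaining_days >= warning_days
```

With the example configuration, the check would evaluate `certificate_check_passes(days_remaining(read_not_after("/etc/ssl/certs/app.crt")), 30)`.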

💡 Choosing Check Intervals

Use shorter intervals (10 to 30 seconds) for critical service checks such as HTTP, Process, and Docker Container. Memory and CPU checks typically run every 30 seconds, while disk usage changes more slowly and can be checked every 60 seconds or more. Certificate checks only need to run about once per hour, because the number of days remaining before expiry changes slowly.

Summary Table

Check Type	Key Config Fields	Typical Interval
HTTP	url, method, expected_status, expected_body, headers	30s
Process	process_name	15s
Docker Container	container_name	30s
Disk Usage	path, threshold_percent	60s
Memory	threshold_percent	30s
CPU	threshold_percent	30s
Certificate	cert_path, warning_days	3600s

⚠️ Agent Permissions

Some checks require elevated permissions. Docker Container checks need access to the Docker socket. Process checks may need root access to see all running processes. Certificate checks need read access to the certificate file. Make sure your agent runs with the appropriate permissions.
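Before you enable these checks, you can confirm that the agent's user has the required access. The Python preflight sketch below is a hypothetical helper, not part of upti.my. It assumes the Docker socket is at the common default path `/var/run/docker.sock`, which may differ on your host. It reports whether the process runs as root, whether the socket is readable and writable, and whether each certificate file is readable.

```python
import os
from typing import Optional

DOCKER_SOCKET = "/var/run/docker.sock"  # common default location; may differ on your host


def preflight(cert_paths: list[str],
              docker_socket: Optional[str] = DOCKER_SOCKET) -> dict[str, bool]:
    """Report whether the current user has the access the checks above rely on."""
    report = {"running_as_root": hasattr(os, "geteuid") and os.geteuid() == 0}
    if docker_socket is not None:
        # Talking to the Docker daemon needs read and write access to its socket.
        report["docker_socket"] = os.access(docker_socket, os.R_OK | os.W_OK)
    for path in cert_paths:
        # Certificate checks only need to read the file.
        report[f"cert:{path}"] = os.access(path, os.R_OK)
    return report
```

Run this preflight as the same user the agent runs as. Any `False` entry points to a permission that needs to be granted, for example membership in the `docker` group or read access to the certificate file.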