upti.my

Agent Local Checks

Configure local health checks on your agents to monitor internal services, processes, disk usage, memory, CPU, and more.

Overview

Local checks run directly on the host where the upti.my agent is installed. They monitor services, resources, and certificates from inside your infrastructure, giving you visibility into systems that external probes cannot reach. Each check type has its own configuration fields, and all checks share common settings for interval, timeout, and tagging.

ℹ️ No Inbound Access Required

Local checks operate entirely from within your network. The agent only needs outbound HTTPS access to report results back to upti.my. No inbound firewall rules or open ports are required.

Common Settings

Every local check type shares the following configuration options:

| Setting | Type | Description |
| --- | --- | --- |
| interval | integer (seconds) | How often the check runs. Minimum 10 seconds, default 30 seconds. |
| timeout | integer (seconds) | Maximum time a check is allowed to run before being marked as failed. Default 10 seconds. |
| tags | string array | Optional labels for organizing and filtering checks (e.g., "production", "database"). |
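
As a rough illustration of how these shared settings interact, the sketch below fills in the documented defaults and rejects an interval under the 10-second minimum. The function name `apply_common_defaults` and the exact error messages are hypothetical; this is not the agent's actual validation code.

```python
from typing import Any

MIN_INTERVAL = 10      # seconds, documented minimum
DEFAULT_INTERVAL = 30  # seconds, documented default
DEFAULT_TIMEOUT = 10   # seconds, documented default


def apply_common_defaults(check: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `check` with the shared settings filled in and validated.

    Hypothetical helper: it mirrors the documented defaults, not agent internals.
    """
    result = dict(check)
    result.setdefault("interval", DEFAULT_INTERVAL)
    result.setdefault("timeout", DEFAULT_TIMEOUT)
    result.setdefault("tags", [])

    if not isinstance(result["interval"], int) or result["interval"] < MIN_INTERVAL:
        raise ValueError(f"interval must be an integer >= {MIN_INTERVAL} seconds")
    if not isinstance(result["timeout"], int) or result["timeout"] <= 0:
        raise ValueError("timeout must be a positive integer (seconds)")
    tags = result["tags"]
    if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
        raise ValueError("tags must be an array of strings")
    return result
```

Calling `apply_common_defaults({"type": "process", "process_name": "nginx"})` would yield a check with `interval` 30, `timeout` 10, and an empty `tags` list.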

Check Types Reference

1. HTTP Check

Send an HTTP request to a local or internal endpoint and validate the response. This is ideal for monitoring internal APIs, admin panels, and microservices that are not publicly accessible.

| Field | Type | Description |
| --- | --- | --- |
| url | string | Full URL to check, e.g., http://localhost:8080/health |
| method | string | HTTP method: GET, POST, PUT, DELETE, HEAD. Default: GET. |
| expected_status | integer | Expected HTTP status code. Default: 200. |
| expected_body | string | Optional substring that must appear in the response body. |
| headers | object | Optional custom headers to include in the request. |

HTTP Check Example
{
  "type": "http",
  "url": "http://localhost:3000/api/health",
  "method": "GET",
  "expected_status": 200,
  "expected_body": "\"status\":\"ok\"",
  "interval": 30,
  "timeout": 10,
  "tags": ["api", "internal"]
}
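
To make the pass/fail rule concrete, here is a minimal sketch of how a response might be judged against `expected_status` and `expected_body`. It assumes a plain, case-sensitive substring match, as the field description suggests; the function `evaluate_http_response` is hypothetical and not part of the agent.

```python
from typing import Optional


def evaluate_http_response(
    status: int,
    body: str,
    expected_status: int = 200,
    expected_body: Optional[str] = None,
) -> bool:
    """Return True when the response satisfies both expectations."""
    if status != expected_status:
        return False
    # Assumption: an unset or empty expected_body means "do not inspect the body".
    if expected_body and expected_body not in body:
        return False
    return True
```

Under this assumption, a body of `{"status": "ok"}` (with a space after the colon) would not match the substring `"status":"ok"`, so it is safest to copy the expected text from a real response.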

2. Process Check

Verify that a specific process is running on the host. The agent scans the process list and matches by process name. If the process is not found, the check fails.

| Field | Type | Description |
| --- | --- | --- |
| process_name | string | Name of the process to look for, e.g., nginx or postgres |

Process Check Example
{
  "type": "process",
  "process_name": "nginx",
  "interval": 15,
  "timeout": 5,
  "tags": ["web-server"]
}

3. Docker Container Check

Monitor the status of a Docker container by name. The check verifies that the container is running and, if a health check is configured on the container, that it reports a healthy state.

| Field | Type | Description |
| --- | --- | --- |
| container_name | string | Name of the Docker container to monitor, e.g., redis-cache |

Docker Container Check Example
{
  "type": "docker_container",
  "container_name": "redis-cache",
  "interval": 30,
  "timeout": 10,
  "tags": ["cache", "docker"]
}

4. Disk Usage Check

Monitor disk usage on a specified file system path. The check fails when the usage percentage exceeds your configured threshold, helping you prevent disk-full outages before they happen.

| Field | Type | Description |
| --- | --- | --- |
| path | string | File system path to monitor, e.g., / or /var/log |
| threshold_percent | integer | Usage percentage that triggers a failure. Default: 90. |

Disk Usage Check Example
{
  "type": "disk_usage",
  "path": "/",
  "threshold_percent": 85,
  "interval": 60,
  "timeout": 5,
  "tags": ["infrastructure"]
}
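
The threshold comparison can be sketched with Python's standard `shutil.disk_usage`. This assumes usage is computed as used bytes divided by total bytes; the agent may calculate it differently (for example, excluding reserved blocks the way `df` does). The helper names `usage_percent` and `disk_check_passes` are hypothetical.

```python
import shutil


def usage_percent(used: int, total: int) -> float:
    """Used space as a percentage of total space (0.0 for an empty file system)."""
    return 100.0 * used / total if total else 0.0


def disk_check_passes(path: str, threshold_percent: int = 90) -> bool:
    """Pass while usage is at or below the threshold; fail once it exceeds it."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.used, usage.total) <= threshold_percent
```

With the example configuration above, `disk_check_passes("/", 85)` would return False as soon as the root file system is more than 85% full.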

5. Memory Check

Monitor system memory utilization. When total memory usage exceeds the configured threshold, the check fails. This helps you detect memory leaks and resource exhaustion early.

| Field | Type | Description |
| --- | --- | --- |
| threshold_percent | integer | Memory usage percentage that triggers a failure. Default: 90. |

Memory Check Example
{
  "type": "memory",
  "threshold_percent": 90,
  "interval": 30,
  "timeout": 5,
  "tags": ["infrastructure"]
}

6. CPU Check

Monitor CPU utilization across all cores. The check samples CPU usage over the timeout window and fails if the average exceeds the configured threshold. Useful for detecting runaway processes and unexpected load spikes.

| Field | Type | Description |
| --- | --- | --- |
| threshold_percent | integer | CPU usage percentage that triggers a failure. Default: 90. |

CPU Check Example
{
  "type": "cpu",
  "threshold_percent": 85,
  "interval": 30,
  "timeout": 10,
  "tags": ["infrastructure"]
}
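
Because the check compares an average rather than a single reading, a brief spike does not necessarily fail it. The sketch below shows that averaging step on a list of already-collected samples; how the agent gathers and weights its samples is not documented here, so `cpu_check_passes` and the unweighted mean are assumptions.

```python
from statistics import fmean
from typing import Sequence


def cpu_check_passes(samples: Sequence[float], threshold_percent: int = 90) -> bool:
    """Pass while the mean of the sampled CPU percentages is at or below the threshold."""
    if not samples:
        raise ValueError("at least one CPU sample is required")
    return fmean(samples) <= threshold_percent
```

For instance, samples of 50, 100, and 60 percent average to 70 percent, which passes an 85 percent threshold even though one sample hit 100.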

7. Certificate Check

Monitor local TLS certificate files for upcoming expiration. The agent reads the certificate file from disk and checks how many days remain before it expires. If the remaining days fall below the warning threshold, the check fails.

| Field | Type | Description |
| --- | --- | --- |
| cert_path | string | Absolute path to the TLS certificate file, e.g., /etc/ssl/certs/app.crt |
| warning_days | integer | Number of days before expiry to trigger a warning. Default: 30. |

Certificate Check Example
{
  "type": "certificate",
  "cert_path": "/etc/ssl/certs/app.crt",
  "warning_days": 30,
  "interval": 3600,
  "timeout": 5,
  "tags": ["ssl", "security"]
}
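
The expiry arithmetic can be illustrated as follows. The sketch takes the certificate's expiry timestamp (its "not after" date) as an input and leaves parsing the file out of scope; it also assumes whole days are counted by truncating partial days. The function names are hypothetical.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining before `not_after` (negative once expired)."""
    return (not_after - now).days


def certificate_check_passes(
    not_after: datetime, now: datetime, warning_days: int = 30
) -> bool:
    """Fail once the remaining days fall below `warning_days`."""
    return days_until_expiry(not_after, now) >= warning_days


# Example: evaluated on 1 January 2025, a certificate expiring on 31 January 2025
# has exactly 30 days left, which still passes a 30-day warning threshold.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = datetime(2025, 1, 31, tzinfo=timezone.utc)
still_ok = certificate_check_passes(expires, now, warning_days=30)
```

One day later the same certificate would have 29 days left and the check would fail under this rule.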

💡 Choosing Check Intervals

Use shorter intervals (10 to 30 seconds) for critical service checks like HTTP and Process. Resource checks like Disk, Memory, and CPU can usually run less often (30 to 60 seconds or more). Certificate checks need to run only about once per hour, because expiry dates change slowly.

Summary Table

| Check Type | Key Config Fields | Typical Interval |
| --- | --- | --- |
| HTTP | url, method, expected_status, expected_body, headers | 30s |
| Process | process_name | 15s |
| Docker Container | container_name | 30s |
| Disk Usage | path, threshold_percent | 60s |
| Memory | threshold_percent | 30s |
| CPU | threshold_percent | 30s |
| Certificate | cert_path, warning_days | 3600s |

⚠️ Agent Permissions

Some checks require elevated permissions. Docker Container checks need access to the Docker socket. Process checks may need root access to see all running processes. Certificate checks need read access to the certificate file. Make sure your agent runs with the appropriate permissions.