Monitoring

Breaking Changes

To facilitate access to metrics, we changed how we publish controller-manager metrics in version v0.14.1 of the Signadot Operator. Documentation for older operator versions is available below, under Operator v0.14.0 and Prior.

The Signadot Operator exposes several Prometheus endpoints that can be used to collect metrics about the status of the application. Each endpoint is exposed via a service in the signadot namespace whose name ends with a -metrics suffix and which serves on port 9090.

The following metrics services are available:

  1. agent-metrics at path /metrics with component name agent
  2. io-context-server-metrics at path /metrics with component name io_context_server
  3. signadot-controller-manager-metrics at path /metrics/signadot with component name manager
  4. tunnel-proxy-metrics at path /metrics with component name tunnel_proxy
  5. routeserver-metrics at path /metrics (v0.15+)

Except for the Route Server, whose metrics use the grpc_server_ and http_server_ prefixes, Signadot Operator metric names all take the form signadot_operator_<component>_<metric-name>.
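
As a rough sketch of the naming scheme, the following Python snippet builds scrape URLs and full metric names from the services listed above. The in-cluster DNS form <service>.signadot.svc and the plain http scheme are assumptions, and the helper names scrape_url and metric_name are hypothetical.

```python
# Services, paths and component names as listed above.
# routeserver-metrics has no component name in the list, so it is None here.
METRICS_SERVICES = {
    "agent-metrics": ("/metrics", "agent"),
    "io-context-server-metrics": ("/metrics", "io_context_server"),
    "signadot-controller-manager-metrics": ("/metrics/signadot", "manager"),
    "tunnel-proxy-metrics": ("/metrics", "tunnel_proxy"),
    "routeserver-metrics": ("/metrics", None),
}


def scrape_url(service: str) -> str:
    """Build the scrape URL for a metrics service (hypothetical helper).

    Assumes plain HTTP and the in-cluster DNS name <service>.signadot.svc.
    """
    path, _component = METRICS_SERVICES[service]
    return f"http://{service}.signadot.svc:9090{path}"


def metric_name(component: str, name: str) -> str:
    """Build a full name of the form signadot_operator_<component>_<metric-name>."""
    return f"signadot_operator_{component}_{name}"


if __name__ == "__main__":
    for service in METRICS_SERVICES:
        print(scrape_url(service))
    print(metric_name("agent", "connection_attempts"))
```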

Agent Metrics

Agent metrics are served by the service agent-metrics at port 9090 at path /metrics. The metric names are prefixed by signadot_operator_agent_.

| Metric name | Type | Description |
| --- | --- | --- |
| connection_attempts | Counter | Total number of times the agent attempts to connect |

A sudden increase of more than 1 in this metric could indicate connectivity problems to Signadot or problems with the agent deployment.
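
One way to spot such a jump is to compare successive scraped values of the counter. The sketch below is illustrative only: the function name sudden_increases is hypothetical, and the samples are assumed to be one counter value per scrape. In practice an alerting rule in the monitoring system would usually do this job.

```python
def sudden_increases(samples, threshold=1):
    """Return the indices where the counter grew by more than `threshold`
    compared with the previous sample.

    `samples` is a list of successive counter values, e.g. one per scrape.
    A decrease is treated as a counter reset (process restart) and skipped.
    """
    flagged = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            continue  # counter reset, not an increase
        if delta > threshold:
            flagged.append(i)
    return flagged
```

For example, with samples [1, 1, 2, 6, 6, 0, 1] only index 3 is flagged, because the counter jumps from 2 to 6 there; the drop to 0 is treated as a restart.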

IO Context Server Metrics

IO Context Server metrics are served by the service io-context-server-metrics at port 9090 at path /metrics. The metric names are prefixed by signadot_operator_io_context_server_.

| Metric name | Type | Description |
| --- | --- | --- |
| requests{method,result} | Counter | Count of requests by HTTP method and HTTP status code of the result |

Controller Manager Metrics

Controller Manager metrics are served by the service signadot-controller-manager-metrics at port 9090 at path /metrics/signadot. The metric names are prefixed by signadot_operator_manager_.

The following metrics are exported:

| Metric name | Type | Description |
| --- | --- | --- |
| sandboxes | Gauge | Total number of active sandbox resources by Readiness and ReadinessReason |
| routegroups | Gauge | Total number of active routegroup resources by Readiness and ReadinessReason |

Many lower-level metrics exported by controller-runtime are available at the same host and port, but under the path /metrics.

Tunnel Proxy Metrics

Tunnel Proxy metrics are served by the service tunnel-proxy-metrics at port 9090 at path /metrics. The metric names are prefixed by signadot_operator_tunnel_proxy_.

| Metric name | Type | Description |
| --- | --- | --- |
| forward_tunnel_connection_errors{method} | Counter | Total number of errors in forward tunnel connections, by method |
| reverse_tunnels{method} | Gauge | Total number of reverse tunnels, by method (xap or ssh) |
| reverse_tunnel_connections{method} | Gauge | Total number of reverse tunnel connections, by method |
| reverse_tunnel_errors{method} | Counter | Total number of errors setting up reverse tunnels, by method |
| requests_users_total{type,user} | Counter | Total number of connection requests, by type (forward or reverse) and user |
| requests_endpoints_total{type,host} | Counter | Total number of connection requests, by type (forward or reverse) and host endpoint |
| requests_protocol_total{type,protocol} | Counter | Total number of connection requests, by type (forward or reverse) and protocol |
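
To illustrate how the labels can be used, the following sketch sums requests_protocol_total by its type label from a scrape in the Prometheus text format. It is a simplified parser written for this example, not a replacement for a Prometheus client library. The sample scrape output is hypothetical, and the protocol values in it are made up.

```python
import re
from collections import defaultdict

# Matches sample lines such as: name{label="value",other="x"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (name, labels, value) per sample line, skipping comments and blanks."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def total_by_label(text, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical scrape output; the protocol values are invented for illustration.
SAMPLE = '''\
# TYPE signadot_operator_tunnel_proxy_requests_protocol_total counter
signadot_operator_tunnel_proxy_requests_protocol_total{type="forward",protocol="tcp"} 7
signadot_operator_tunnel_proxy_requests_protocol_total{type="forward",protocol="http"} 3
signadot_operator_tunnel_proxy_requests_protocol_total{type="reverse",protocol="tcp"} 2
'''

if __name__ == "__main__":
    metric = "signadot_operator_tunnel_proxy_requests_protocol_total"
    print(total_by_label(SAMPLE, metric, "type"))
```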

Route Server Metrics (v0.15+)

Route Server metrics are served by the service routeserver-metrics at port 9090 at path /metrics. The metric names for the gRPC endpoint are prefixed by grpc_server_.

| Metric name | Type | Description |
| --- | --- | --- |
| grpc_server_started_total | Counter | Total number of RPCs started on the server, by type, service and method |
| grpc_server_handled_total | Counter | Total number of RPCs completed on the server, regardless of success or failure, by type, service, method and code |
| grpc_server_msg_received_total | Counter | Total number of gRPC stream messages received on the server, by type, service and method |
| grpc_server_msg_sent_total | Counter | Total number of gRPC stream messages sent by the server, by type, service and method |
| grpc_server_handling_seconds_count | Histogram | Count of all completed RPCs by type, service and method |
| grpc_server_handling_seconds_sum | Histogram | Cumulative time of RPCs by type, service and method |
| grpc_server_handling_seconds_bucket | Histogram | The counts of RPCs by type, service and method in respective handling-time buckets |
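
The _sum and _count series of the histogram can be combined into a mean handling time. The sketch below shows the arithmetic, both over the whole lifetime of the counters and between two scrapes. The function names are hypothetical, and the inputs are assumed to come from series with identical labels.

```python
def mean_handling_seconds(sum_seconds, count):
    """Mean RPC handling time from grpc_server_handling_seconds_sum and _count.

    Returns None when no RPC has completed yet.
    """
    if count == 0:
        return None
    return sum_seconds / count


def mean_over_window(prev, curr):
    """Mean handling time between two scrapes.

    `prev` and `curr` are (sum_seconds, count) pairs from the earlier and
    later scrape. Returns None if no RPC completed in between.
    """
    prev_sum, prev_count = prev
    curr_sum, curr_count = curr
    return mean_handling_seconds(curr_sum - prev_sum, curr_count - prev_count)
```

For instance, going from (10.0, 40) to (12.5, 50) means 10 RPCs took 2.5 seconds in total, a mean of 0.25 seconds per RPC in that window.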

For the HTTP endpoint, metric names are prefixed by http_server_.

| Metric name | Type | Description |
| --- | --- | --- |
| http_server_requests_total | Counter | Count of HTTP requests received, by status code, method and path |
| http_server_requests_total | Histogram | The counts of HTTP requests by status code, method and path in respective handling-time buckets |

Prometheus Integration

A full example of integration with the Prometheus Operator is available here

Operator v0.14.0 and Prior

Operator v0.14.0 and prior only export general controller-runtime metrics, and only via RBAC-authenticated HTTPS at:

https://signadot-controller-manager-metrics-service.signadot.svc:8443/metrics

More information about authenticating to this endpoint is available below.

Tunnel Proxy Metrics

The Tunnel Proxy exposes an HTTP endpoint at:

http://tunnel-proxy.signadot.svc:8001/metrics

The following metrics are exported:

| Metric name | Type | Description |
| --- | --- | --- |
| inbound_connections | Gauge | Total number of active inbound connections by tunnel method |
| inbound_revtuns | Gauge | Total number of inbound reverse tunnels by tunnel method |

Authenticating for Controller Manager Metrics v0.14.0 and Prior

Deprecated

To facilitate access to metrics, we changed how we publish controller-manager metrics in version v0.14.1 of the Signadot Operator. This section is only relevant to operator versions v0.14.0 and prior.

Prior to v0.14.1 of the Signadot Operator, the controller-manager metrics endpoint was protected by kube-rbac-proxy, so accessing it requires some authentication setup. In particular, clients need to:

  1. Authenticate to the https metrics endpoint using the ClusterRole signadot-metrics-reader; and
  2. Accept a self-signed certificate.
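
For a quick manual check, the two requirements above can be sketched in Python using only the standard library. This is a minimal illustration, not a supported client. It assumes the code runs in a pod whose service account is bound to the signadot-metrics-reader ClusterRole, and the helper names are hypothetical. Disabling certificate verification is what accepts the self-signed certificate, so it should be limited to this endpoint.

```python
import ssl
import urllib.request

METRICS_URL = "https://signadot-controller-manager-metrics-service.signadot.svc:8443/metrics"
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_request(token, url=METRICS_URL):
    """Attach the service account token as a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def insecure_tls_context():
    """TLS context that accepts the self-signed certificate by skipping verification."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_metrics(token_path=TOKEN_PATH):
    """Read the mounted token and fetch the metrics page as text."""
    with open(token_path) as f:
        token = f.read().strip()
    request = build_request(token)
    with urllib.request.urlopen(request, context=insecure_tls_context(), timeout=10) as resp:
        return resp.read().decode("utf-8")
```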

Below are instructions for accomplishing this with Prometheus and the Datadog Agent.

Authenticating with Prometheus

  1. Grant the required permissions to the service account used by Prometheus:

     kubectl create clusterrolebinding signadot-metrics-reader --clusterrole=signadot-metrics-reader --serviceaccount=<namespace>:<service-account-name>

  2. Configure the ServiceMonitor:

     apiVersion: monitoring.coreos.com/v1
     kind: ServiceMonitor
     metadata:
       labels:
         control-plane: controller-manager
       name: controller-manager-metrics-monitor
       namespace: signadot
     spec:
       endpoints:
         - path: /metrics
           port: https
           scheme: https
           bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
           tlsConfig:
             insecureSkipVerify: true
       selector:
         matchLabels:
           control-plane: controller-manager

Datadog Agent

If you are using the Datadog Agent, apply the following patch to the signadot-controller-manager deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: signadot-controller-manager
spec:
  template:
    metadata:
      annotations:
        ad.datadoghq.com/kube-rbac-proxy.check_names: |
          ["openmetrics"]
        ad.datadoghq.com/kube-rbac-proxy.init_configs: |
          [{}]
        ad.datadoghq.com/kube-rbac-proxy.instances: |
          [{
            "openmetrics_endpoint": "https://%%host%%:8443/metrics",
            "namespace": "signadot",
            "metrics": [".*"],
            "auth_token": {
              "reader": {
                "type": "file",
                "path": "/var/run/secrets/kubernetes.io/serviceaccount/token"
              },
              "writer": {
                "type": "header",
                "name": "Authorization",
                "value": "Bearer <TOKEN>",
                "placeholder": "<TOKEN>"
              }
            },
            "tls_verify": "false"
          }]