TSB - Application Owner Troubleshooting Guide
Version: Latest

Service Metrics

In this section, we will review the tools provided by the Tetrate platform to observe traffic (requests and responses) for your services.

Background to Metrics

Metrics and traces are collected from the Envoy sidecar proxies and gateways that handle traffic for your services. They are stored in an Elasticsearch database and retained for an admin-defined period of time.

Metrics and traces can be viewed in the TSB user interface, or may be exported and rendered in an enterprise dashboard.

You can observe metrics in near-real time. Note that there is typically a delay while metrics are collected and stored, so the metrics lag real-time activity accordingly.
In the TSB user interface, you can look at historical metrics and traces over a time window that you choose, limited by the data retention period your administrator has configured.

Getting Started with Metrics

Start from the dashboard, and select the service instance you are interested in.

Summary Report

Explore the dashboard. You can begin with the Average Metrics summary report:

ApDex Score - a value in the range [0, 1] that summarizes the health of your service
Throughput - is this what you would expect? Are requests reaching the service (and are you monitoring the correct service instance)?
Response codes - are you seeing 200 OK responses, or a large number of 4xx or 5xx errors?
- Note the meaning of the status codes (4xx suggests an authentication or other client error, 5xx suggests an application error)
Latency - is the response time within expected limits?
- Percentile metrics provide the response time for the median (P50) request, as well as additional bands up to the slowest 1% (P99) of requests.
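
To make the summary figures above more concrete, the following is a minimal sketch of how such values are commonly derived from raw request samples. It uses the standard Apdex formula and a nearest-rank percentile. The threshold value, the function names, and the sample data are assumptions for illustration; the source does not state which threshold or percentile method TSB uses internally.

```python
import math
from collections import Counter


def apdex(latencies_ms, threshold_ms):
    """Standard Apdex: (satisfied + tolerating / 2) / total.

    A request is 'satisfied' at or below the threshold T and 'tolerating'
    between T and 4T. The threshold is an assumption chosen by the caller.
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    satisfied = sum(1 for v in latencies_ms if v <= threshold_ms)
    tolerating = sum(
        1 for v in latencies_ms if threshold_ms < v <= 4 * threshold_ms
    )
    return (satisfied + tolerating / 2) / len(latencies_ms)


def status_classes(codes):
    """Group HTTP status codes into classes such as '2xx', '4xx', '5xx'."""
    return Counter(f"{code // 100}xx" for code in codes)


def percentile(values, p):
    """Nearest-rank percentile: smallest value covering p percent of samples."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical samples, not taken from a real TSB deployment.
sample_latencies_ms = [100, 200, 600, 2500]
sample_codes = [200, 200, 404, 503]

score = apdex(sample_latencies_ms, threshold_ms=500)
classes = status_classes(sample_codes)
median_ms = percentile(sample_latencies_ms, 50)
```

A score close to 1 means most requests completed within the chosen threshold. A drop in the score alongside a rise in the 5xx count points towards the application rather than the client.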

High Latency - is it the sidecar, or is it the service?

When observing high-latency transactions, it's always tempting to blame the sidecar that envelops the service. TSB provides metrics to determine the latency added by the sidecar process for requests and responses.

Later on, when we look at how traces can monitor the progress of a request through multiple tiers of gateways and services, you'll see how to correlate latency with gateway and service performance as well.
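
The reasoning behind the sidecar-versus-service question can be sketched as simple arithmetic: if you know the total time a request took and the time spent inside the service itself, the difference is the time attributable to the proxy and network hop. The function and field names below are hypothetical and do not correspond to specific TSB metric names.

```python
def attribute_latency(total_ms, service_ms):
    """Split a request's latency into service time and proxy overhead.

    total_ms:   end-to-end duration observed for the request (assumed input)
    service_ms: time spent in the service itself (assumed input)
    """
    if total_ms < 0 or service_ms < 0:
        raise ValueError("durations must be non-negative")
    if service_ms > total_ms:
        raise ValueError("service time cannot exceed total time")
    proxy_ms = total_ms - service_ms
    proxy_share = proxy_ms / total_ms if total_ms else 0.0
    return {
        "service_ms": service_ms,
        "proxy_ms": proxy_ms,
        "proxy_share": proxy_share,
    }


# Hypothetical figures: a 120 ms request, of which 110 ms was the service.
breakdown = attribute_latency(total_ms=120.0, service_ms=110.0)
```

If the proxy share is small while total latency is high, the service is the more likely place to investigate.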

Service Metrics

The Tetrate platform collects a wealth of metrics, monitoring both the performance of your service and the performance of the platform itself.

Metrics may 'tail off'

In RPS-related (requests per second) metrics, a tail-off in the most recent minute is expected. It is a consequence of how the metrics platform collects and merges data across clusters, and does not indicate a drop in traffic.
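
If you export RPS data and process it yourself, one way to avoid misreading the tail-off is to ignore data points that are too recent to be complete. The following is a minimal sketch under that assumption. The 60-second settling period mirrors the "most recent minute" mentioned above, and the data layout is hypothetical.

```python
def drop_unsettled_points(points, now_s, settle_s=60):
    """Keep only data points at least settle_s seconds older than now_s.

    points: list of (timestamp_s, rps) tuples (hypothetical export format)
    """
    cutoff = now_s - settle_s
    return [(t, rps) for t, rps in points if t <= cutoff]


# Hypothetical series: the last point looks low only because it is incomplete.
series = [(0, 100.0), (60, 102.0), (120, 98.0), (180, 40.0)]
settled = drop_unsettled_points(series, now_s=200)
```

Any alerting or comparison logic then works on `settled` rather than on the raw series.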

Time Windows

On many of the TSB metrics displays, you can select a time window for the metrics to show.

If you have identified an error condition, you can then look back through longer-term metrics to determine when the error began. For example, an increase in 404 Not Found errors may be correlated with a recent application redeployment.
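
The look-back described above can be sketched as a search for the first time bucket where the error rate rises above a threshold and stays there. The bucket format, threshold, and run length below are assumptions for illustration, not values taken from TSB.

```python
def first_sustained_breach(buckets, threshold, min_run=3):
    """Return the label of the bucket where a sustained error-rate breach began.

    buckets:   list of (label, requests, errors), oldest first (hypothetical)
    threshold: error rate (errors / requests) treated as abnormal
    min_run:   consecutive breaching buckets required, to ignore brief spikes
    """
    run_start = None
    run_len = 0
    for label, requests, errors in buckets:
        rate = errors / requests if requests else 0.0
        if rate > threshold:
            if run_len == 0:
                run_start = label
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0
    return None


# Hypothetical five-minute buckets of 404 counts around a redeployment.
history = [
    ("10:00", 1000, 2),
    ("10:05", 1000, 3),
    ("10:10", 1000, 80),
    ("10:15", 1000, 95),
    ("10:20", 1000, 90),
]
onset = first_sustained_breach(history, threshold=0.05)
```

The returned label can then be compared with your deployment history to check whether the error onset lines up with a release.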
