TQ
dev.com

Blog about software development

Subscribe

High frequency metrics in PHP with TCP sockets

23 Sep 2024 - by 'Maurits van der Schee'

When you are running a high traffic website you may run multiple PHP servers to handle the load. When you want to track a performance metric (such as API or database calls) you may need to do some high frequent logging for performance analysis. In this blog post we present a standardized (and tested) way to do this with minimal impact. A line logging performance may look like this:

log( [metric name], [label name], [label value], [duration in seconds] )

The current implementation supports labeling each measurement with a single label.

Logging API and DB calls

Here is a first practical example for a HTTP call to a REST end-point that took less than a second:

MetricObserver::log("api", "req", "GET /api/v1/users", 0.2857);

In case of an API you may simply instrument the router of your framework, but in case of a database you may need to instrument the generic database class to log the requested SQL:

MetricObserver::log("db", "sql", "SELECT name FROM user WHERE id = ?", 0.014285);

or you may log the PHP file and line number calling the database class:

MetricObserver::log("db", "file", "src/Controller/UserController.php@L123", 0.014285);

In this last case you may find these values by using the "debug_backtrace(2,15)" call. It returns "frames" that contain "file" (full path) and "line" (number). Which frame is the most relevant frame (and how to find that frame) is dependent on the PHP framework you use. It is important that you only search the last 10-15 stack frames, as indicated by the second parameter of "debug_backtrace" to avoid high costs. You may use the "MetricObserver::isConnected()" function to avoid calling "debug_backtrace" when the logging is not active.

Logging to a local TCP socket

We prefer to never store high frequency log data, therefor we ship the high frequent log data to a metric aggregator. In the first example above, the TCP socket (on localhost) will receive the following NDJSON when called 3 times:

["api","req","GET /api/v1/users",0.2857]
["api","req","GET /api/v1/users",0.2857]
["api","req","GET /api/v1/users",0.2857]

This aggregator converts the log lines into metrics (counters and sums) with a name (where the name also contains a label). In the first example above, the metrics are (when called 3 times):

api_seconds_count[req="GET /api/v1/users"]: 3
api_seconds_sum[req="GET /api/v1/users"]: 0.8571

This (text) format is introduced by Prometheus and served over a HTTP endpoint on the "/metrics" path. This has become a standard as defined by the OpenMetrics specification. These metrics can be scraped every 5 seconds and are stored in new rows together with their timestamp (a so called "time-series"). When your servers produce thousands of log lines per second you now only store the metrics. It should be obvious that the size of the metrics (even in time-series) is several magnitudes smaller than the raw log lines (and can be controlled by the configurable scraping interval and the number of different label values).

High frequency metrics in PHP

High frequency metric aggregation requires several features:

When logging metrics to a TCP socket on the localhost there are 2 cases:

We want our code to gracefully handle both cases, and we want the instrumented application to be as fast as possible if there is no connected socket. Therefor we try to connect only once per second and return the connected status from the "MetricObserver::isConnected()" call. You can turn off the logging of metrics by simply stopping the metrics aggregation server (which closes the port on the localhost).

Source code of the MetricObserver

Here is the PHP class you may load into your project. It is framework independent.

class MetricObserver
{
  public static int $port = 7777;

  private static ?Socket $socket = null;
  private static bool $connected = false;
  private static int $connectAt = 0;

  public static function isConnected(): bool
  {
    if (self::$socket === null) {
      self::$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP) ?: null;
      self::$connected = false;
    }
    if (!self::$connected) {
      $now = time();
      if (self::$connectAt != $now) {
        self::$connectAt = $now;
        self::$connected = @socket_connect(self::$socket, 'localhost', self::$port);
      }
    }
    return self::$connected;
  }

  public static function log(string $metricName, string $labelName, string $labelValue, ?float $duration = null)
  {
    if (self::isConnected()) {
      if ($duration === null) {
        $line = json_encode([$metricName, $labelName, $labelValue]);
      } else {
        $line = json_encode([$metricName, $labelName, $labelValue, (string)$duration]);
      }
      if (!@socket_write(self::$socket, $line . "\n", strlen($line) + 1)) {
        self::$socket = null;
        self::$connected = false;
      }
    }
  }
}

Note that you can log to a different port using:

MetricObserver::$port = 8888;

NB: This is a static variable in the class.

Measurement performance

The aggregator implementation (in Go) is tested with up to 1 million logged metrics per second from over 1000 processes and serves gzipped metrics in text format. I also did some measurements to find that logging to a closed port was about 13 times cheaper than logging to an open port. Also logging to a closed port was cheaper than formatting a float with "sprintf()", see:

MetricObserver::log() to a closed port:
0.09 usec
MetricObserver::log() to an open port:
1.18 usec
sprintf('%.2f', 3.14) for comparison:
0.11 usec

These tests can be run from the "perf.php" file that included in the project. Check out the Github link below.

https://github.com/mevdschee/php-observability

Read more...

Links


PS: Liked this article? Please share it on Facebook, Twitter or LinkedIn.