TQ
dev.com

Blog about software development

Subscribe

High frequency metrics in PHP with TCP sockets

23 Sep 2024 - by 'Maurits van der Schee'

When you are running a high traffic website you may run multiple PHP servers to handle the load. When you want to track a performance metric (such as API or database calls) you may need to do some high frequent logging for performance analysis. In this blog post we present a standardized (and tested) way to do this with minimal impact. A line logging performance may look like this:

log( [metric name], [label name], [label value], [duration in seconds] )

The current implementation supports labeling each measurement with a single label.

Logging API and DB calls

Here is a first practical example for a HTTP call to a REST end-point that took less than a second:

MetricObserver::log("api", "req", "GET /api/v1/users", 0.2857);

In case of an API you may simply instrument the router of your framework, but in case of a database you may need to instrument the generic database class to log the requested SQL:

MetricObserver::log("db", "sql", "SELECT name FROM user WHERE id = ?", 0.014285);

or you may log the PHP file and line number calling the database class:

MetricObserver::log("db", "file", "src/Controller/UserController.php@L123", 0.014285);

In this last case you may find these values by using the "debug_backtrace(2,15)" call. It returns "frames" that contain "file" (full path) and "line" (number). Which frame is the most relevant frame (and how to find that frame) is dependent on the PHP framework you use. It is important that you only search the last 10-15 stack frames, as indicated by the second parameter of "debug_backtrace" to avoid high costs. You may use the "MetricObserver::logging()" function to avoid calling "debug_backtrace" when the logging is not active.

Logging to TCP socket

We prefer to never store high frequency log data, therefor we ship the high frequent log data to a metric aggregator. In the first example above, the TCP socket (on localhost) will receive the following NDJSON when called 3 times:

["api","req","GET /api/v1/users",0.2857]
["api","req","GET /api/v1/users",0.2857]
["api","req","GET /api/v1/users",0.2857]

This aggregator converts the log lines into metrics (counters and sums) with a name (where the name also contains a label). In the first example above, the metrics are (when called 3 times):

api_seconds_count[req="GET /api/v1/users"]: 3
api_seconds_sum[req="GET /api/v1/users"]: 0.8571

This (text) format is introduced by Prometheus and served over a HTTP endpoint on the "/metrics" path. This has become a standard as defined by the OpenMetrics specification. These metrics can be scraped every 5 seconds and are stored in new rows together with their timestamp (a so called "time-series"). When your servers produce thousands of log lines per second you now only store the metrics. It should be obvious that the size of the metrics (even in time-series) is several magnitudes smaller than the raw log lines (and can be controlled by the configurable scraping interval and the number of different label values).

High frequency logging in PHP

High frequency requires several features:

When logging to a TCP socket there are 3 cases:

We need our code to gracefully handle all 3 cases, as we don't want the instrumented application to lock up due to a blocking call. The most tricky part is that you want the "socket_connect" call to be non-blocking, but while still returning the connecting status if succeeded. A non-blocking connect always reports failure we can't use that. It can be achieved using:

socket_set_option($socket, SOL_SOCKET, SO_SNDTIMEO, ['sec' => 0, 'usec' => 1]);
$connected = socket_connect($socket, $ipAddress, $portNumber);
socket_set_option($socket, SOL_SOCKET, SO_SNDTIMEO, ['sec' => 30, 'usec' => 0]);

As you see we are blocking for 1 microsecond, but only during connect, to allow the connect to report success. By supporting all 3 cases gracefully we can turn off the logging by simply stopping the log aggregation server.

Source code of the MetricObserver

Here is the PHP class you may load into your project. It is framework independent.

class MetricObserver
{
  public static string $address = 'localhost';
  public static int $port = 7777;

  private static ?Socket $socket = null;
  private static bool $connected = false;
  private static int $connectAt = 0;

  public static function logging(): bool
  {
    if (self::$socket === null) {
      self::$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP) ?: null;
      self::$connected = false;
    }
    if (!self::$connected) {
      $now = time();
      if (self::$connectAt != $now) {
        self::$connectAt = $now;
        socket_set_option(self::$socket, SOL_SOCKET, SO_SNDTIMEO, ['sec' => 0, 'usec' => 1]);
        self::$connected = @socket_connect(self::$socket, self::$address, self::$port);
        socket_set_option(self::$socket, SOL_SOCKET, SO_SNDTIMEO, ['sec' => 30, 'usec' => 0]);
      }
    }
    return self::$connected;
  }

  public static function log(string $metricName, string $labelName, string $labelValue, float $duration)
  {
    if (self::logging()) {
      $line = json_encode([$metricName, $labelName, $labelValue, (string)$duration]);
      if (!@socket_write(self::$socket, $line . "\n", strlen($line) + 1)) {
        self::$socket = null;
        self::$connected = false;
      }
    }
  }
}

Note that you can set the server and port to log to a different port or even another host (not recommended) using:

MetricObserver::$address = '127.0.1.1';
MetricObserver::$port = 8888;

The aggregator implementation (in Go) is tested with up to 1 million log lines per second from over 1000 processes and serves gzipped metrics in text format. Check out the Github link below.

https://github.com/mevdschee/php-observability

Read more...

Links


PS: Liked this article? Please share it on Facebook, Twitter or LinkedIn.