Blog about software development


Porting PHP-CRUD-API to Python

12 Nov 2017 - by 'Maurits van der Schee'

I have ported the core of PHP-CRUD-API to Python and the results are encouraging. In PHP 7 the core executes at 6500 requests per second, while in Python I can get it to do about 9000 requests per second. It must be noted that I'm not using connection pooling, but just reusing the same single connection per thread. With connection pooling (as PHP and other implementations do) I can't get it above 2600 requests per second.

Hello world web service in Python

This is the "hello world" code I wrote for Bjoern, a fast Python-based web server:

import bjoern, os

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return b'Hello World!'

if __name__ == "__main__":
    bjoern.listen(application, "localhost", 8080)

This is very fast (27k req/sec), which is about as fast as a helloworld.php file performs (with mod_php on Apache) or the implementations of hello world in Java or Go.

A MySQL REST API in Python

Now I wanted to port the core of the full program to Python and see how it performs. This is what I came up with:

import bjoern, os
import re
import json
import mysql.connector

# connect to the mysql database
link = mysql.connector.connect(host='localhost', user='php-crud-api',
                               password='php-crud-api', db='php-crud-api',
                               charset='utf8', autocommit=True)
cursor = link.cursor(dictionary=True)

def app(environ, start_response):
    # get the HTTP method, path and body of the request
    method = environ['REQUEST_METHOD']
    path = environ.get('PATH_INFO', '').split('/')
    size = int(environ.get('CONTENT_LENGTH', '0'))
    if size > 0:
        data = json.loads(environ['wsgi.input'].read(size))
        data = {}
    # retrieve the table and key from the path
    table = re.sub(r'[^a-zA-Z0-9_]+', '', path[1] if len(path) > 1 else '')
    key = int(path[2] if len(path) > 2 else '0')
    # escape the columns and values from the input object
    columns = list(re.sub(r'[^a-zA-Z0-9_]+', '', k) for k in data.keys())
    values = list(link.escape_string(str(v)) for v in data.values())
    # build the SET part of the SQL command
    sql = ''
    for i in range(0, len(columns)):
        sql += (',' if i > 0 else '')+'`'+columns[i]+'`="'+values[i]+'"'
    # create SQL based on HTTP method
    if method == 'GET':
        sql = 'select * from `'+table+'`'
        if key > 0:
            sql += ' WHERE id='+str(key)
    elif method == 'PUT':
        sql = 'update `'+table+'` set '+sql+' where id='+str(key)
    elif method == 'POST':
        sql = 'insert into `'+table+'` set '+sql
    elif method == 'DELETE':
        sql = 'delete from `'+table+'` where id='+str(key)
    # execute SQL statement
        # print results, insert id or affected row count
        start_response('200 OK', [('Content-Type', 'text/html')])
        if method == 'GET':
            if key == 0:
                yield str('[')
            i = 0
            for row in cursor:
                yield str((',' if i > 0 else '')+json.dumps(row))
                i += 1
            if key == 0:
                yield str(']')
        elif method == 'POST':
            yield str(cursor.lastrowid)
            yield str(cursor.rowcount)
    except (mysql.connector.errors.DatabaseError) as err:
        # die if SQL statement failed
        start_response('404 Not Found', [('Content-Type', 'text/html')])
        yield str(err.args[1])
    # close mysql connection

if __name__ == "__main__":
    bjoern.listen(app, '', 8000)

To install dependencies run:

sudo apt-get install python pip libev-dev libprotobuf-dev protobuf-compiler
export MYSQLXPB_PROTOC=/usr/bin/protoc
export MYSQLXPB_PROTOBUF_INCLUDE_DIR=/usr/include/google/protobuf
export MYSQLXPB_PROTOBUF_LIB_DIR=/usr/lib/x86_64-linux-gnu
pip install bjoern mysql-connector gunicorn meinheld

It does 5k requests per second when running single core:

python api.py

When running multi core (workers should match core count) I see 9k requests per second:

gunicorn --workers=4 --worker-class="egg:meinheld#gunicorn_worker" api:app

I applied gunicorn and meinheld as suggested in the Python readme on Techempower's Github account. This should guarantee great performance as the benchmarks in the Techempower benchmark are often highly optimized.

Disabling connection pooling

The connection pooling is disabled. This normally takes care of:

and does so before re-using a connection (so, on every request). We are skipping these to get the performance from 2600 requests per second to an impressive 9000 requests per second. This is not entirely fair as other implementations are not taking this shortcut, but in most situations this optimization is acceptable as you control the executed SQL. If you have pressing reasons why this would not be a good idea, please let me know on Github.