1.15.3 Writing Middleware

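Middleware classes usually do one of three things: modify the environ dictionary, modify the status, headers or exc_info values passed to start_response(), or transform the output returned from further down the chain.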

The web.wsgi.base module provides a class, BaseMiddleware, with methods that let you easily accomplish each of these things.

class BaseMiddleware( application)

application should always be the first parameter to a derived middleware class, but you may also wish to have other parameters in derived classes to allow the middleware to be configured.

The class defines the following attributes:

application
The WSGI application (or middleware stack) to which this middleware should be added.
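As a rough illustration of the "application first, configuration after" convention, here is a minimal sketch written as plain WSGI for Python 3. It does not derive from BaseMiddleware; the class name AddHeaderMiddleware, its name and value parameters, and hello_app are all hypothetical and exist only for this example.

```python
class AddHeaderMiddleware:
    """Hypothetical plain-WSGI middleware; not part of web.wsgi.base."""

    def __init__(self, application, name="X-Example", value="demo"):
        # application comes first; the remaining parameters configure the middleware.
        self.application = application
        self.name = name
        self.value = value

    def __call__(self, environ, start_response):
        def wrapped_start_response(status, headers, exc_info=None):
            # Append the configured header before passing the call on.
            extra = [(self.name, self.value)]
            return start_response(status, list(headers) + extra, exc_info)

        return self.application(environ, wrapped_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


# The wrapped object is itself a WSGI application and can be wrapped again.
stack = AddHeaderMiddleware(hello_app, name="X-Served-By", value="sketch")
```

Because the configured stack is just another WSGI callable, it can be handed to a server or to a further layer of middleware in the same way as the original application.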

The class defines the following methods:

start( )
This method should be over-ridden in derived classes to provide your application's functionality.

output( *text)
Takes one or more strings and appends them to the _output attribute, from where they will be returned at the end of program execution to display the program output. For example:

self.output('one')
self.output('one', 'two')

__call__( environ, start_response)
You should not need to modify this method, but it is documented here for a complete understanding, as it provides the functionality which makes derived classes WSGI middleware.

This method intercepts the environ dictionary sent by the WSGI server as well as the status and headers parameters passed to the start_response() function. It then sends the environ dictionary to the environ() method for modification. The status, headers and exc_info parameters are sent to the response() method, which controls the order in which the parameters are modified. The response() method sends the parameters to the status(), headers() and exc_info() methods for modification. The new values are then returned to __call__(), where a modified application object is returned.
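The flow described above can be sketched roughly as follows. This is an illustrative stand-in written for Python 3, not the actual source of BaseMiddleware; the names SketchMiddleware, Teapot and plain_app are hypothetical, and the real class may differ in its details.

```python
class SketchMiddleware:
    """Illustrative stand-in for the flow described above; not the library's code."""

    def __init__(self, application):
        self.application = application

    # Hooks: each receives a value and must return it, possibly modified.
    def environ(self, environ):
        return environ

    def status(self, status):
        return status

    def headers(self, headers):
        return headers

    def exc_info(self, exc_info):
        return exc_info

    def transform(self, output):
        return output

    def response(self, status, headers, exc_info=None):
        # Controls the order in which the three parameters are modified.
        return self.status(status), self.headers(headers), self.exc_info(exc_info)

    def __call__(self, environ, start_response):
        def intercepted(status, headers, exc_info=None):
            # Let the hooks modify the values before they travel back up the chain.
            status, headers, exc_info = self.response(status, headers, exc_info)
            return start_response(status, headers, exc_info)

        output = self.application(self.environ(environ), intercepted)
        return self.transform(output)


class Teapot(SketchMiddleware):
    def status(self, status):
        return "418 I'm a teapot"


def plain_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"body"]
```

In this sketch, Teapot(plain_app) behaves like plain_app except that the status string is replaced on its way back to the server.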

response( status, headers, [exc_info=None])
Calls the status(), headers() and exc_info() methods to modify the respective parameters, then returns the modified values in the order status, headers, exc_info to the __call__() method. Can be over-ridden to change the order in which the parameters are modified.

environ( environ)
Provides the dictionary environ for modification. Must return the environ dictionary to be passed on down the middleware chain.

status( status)
Provides the status string for modification. Must return the status string to be passed on down the middleware chain.

headers( headers)
Provides the headers list for modification. Must return the headers list to be passed on down the middleware chain.

exc_info( exc_info)
Provides the exc_info tuple object generated by a previous error (if one exists) for modification. Must return the exc_info tuple to be passed on down the middleware chain.

transform( output)
Used to transform the body of output returned from the previous item in the middleware stack.

Be aware that if you intend to change the length of the returned information, you may need to check the content-type header first and update the content-length header if it is set.

output is an iterable, and an iterable should be returned from the method.
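The content-type and content-length caveat can be illustrated with a minimal plain-WSGI sketch for Python 3. It does not use BaseMiddleware; FooterMiddleware, its footer parameter and sized_app are hypothetical names. The sketch assumes the wrapped application calls start_response() and that buffering the whole body in memory is acceptable.

```python
class FooterMiddleware:
    """Hypothetical middleware that appends a footer to text/plain responses."""

    def __init__(self, application, footer=b"\n-- end --\n"):
        self.application = application
        self.footer = footer

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the response details back until the body has been transformed.
            captured["response"] = (status, list(headers), exc_info)

        result = self.application(environ, capture)
        try:
            body = b"".join(result)  # consume the iterable, whatever its type
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers, exc_info = captured["response"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if content_type.startswith("text/plain"):
            body += self.footer
            # The body grew, so any existing Content-Length must be updated.
            headers = [
                (k, str(len(body))) if k.lower() == "content-length" else (k, v)
                for k, v in headers
            ]
        start_response(status, headers, exc_info)
        return [body]  # an iterable, as required


def sized_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", "5")])
    return [b"hello"]
```

Delaying the call to start_response() until the new body is known is one way to keep the header consistent; a streaming middleware would instead have to remove the content-length header.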

To produce your own middleware class, simply over-ride the appropriate methods in your class derived from the BaseMiddleware class, remembering to return the value you wish to be passed on along the middleware chain. If you wish to pass information between the various methods, you should set member variables of the class.

With long middleware chains and functions being passed as parameters down the chain it can get a bit confusing to keep track of program flow.

Program flow is actually very straightforward. The first piece of middleware is run first, any changes to the environ dictionary are passed on to the next piece of middleware, and so on down the chain. Once the start_response function is called, the status, headers and application output are sent back up the chain to the server, where they are sent to the web browser.

Here is a test application demonstrating middleware and program flow (the headers are not valid HTTP headers obviously):

#!/usr/bin/env python

import sys; sys.path.append('../../../')
import web.wsgi.base, time

class Application(web.wsgi.base.BaseApplication):
    def start(self):
        self.output('Environ Order:\n')
        self.environ['Application'] = time.time()
        time.sleep(1)
        self.headers.append(('Appliction',str(time.time())))
        self.output('Middleware1 ',self.environ['Middleware1'])
        self.output('\n')
        self.output('Middleware2 ',self.environ['Middleware2'])
        self.output('\n')
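        self.output('Application ', self.environ['Application'])
        self.output('\n')
        # Heading for the lines appended by each middleware's transform()
        self.output('\nTransform Order:\n')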
        
class Middleware1(web.wsgi.base.BaseMiddleware):
    def environ(self, environ):
        time.sleep(1)
        environ['Middleware1'] = time.time()
        return environ
        
    def headers(self, headers):
        time.sleep(1)
        headers.append(('Middleware1',str(time.time())))
        return headers
        
    def transform(self, output):
        return output + ['Middleware1\n']

class Middleware2(web.wsgi.base.BaseMiddleware):
    def environ(self, environ):
        time.sleep(1)
        environ['Middleware2'] = time.time()
        return environ
        
    def headers(self, headers):
        time.sleep(1)
        headers.append(('Middleware2',str(time.time())))
        return headers

    def transform(self, output):
        return output + ['Middleware2\n']
        
print "Running test..."
application = web.wsgi.runCGI(Middleware1(Middleware2(Application())))

The program will not run from a WSGI server because of the incorrect HTTP headers, but you can run it from the command line. The output should look something like this:

Status: 200 OK
Content-type: text/html
Appliction: 1105847968.69
Middleware2: 1105847969.69
Middleware1: 1105847970.69

Environ Order:
Middleware1 1105847966.68
Middleware2 1105847967.69
Application 1105847967.69

Transform Order:
Middleware2
Middleware1

You can see that environ is modified by Middleware1 then Middleware2 then Application. Headers and return transforms are made in exactly the opposite order.
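The same ordering can be reproduced without the web modules. The following is a minimal sketch for Python 3 using plain WSGI callables; make_middleware, the events list and the stand-in start_response are hypothetical names used only for this illustration.

```python
events = []  # records the order in which each stage runs


def make_middleware(name, application):
    """Wrap application in a hypothetical middleware that logs what it does."""
    def middleware(environ, start_response):
        events.append(name + " environ")  # on the way down the chain

        def wrapped(status, headers, exc_info=None):
            events.append(name + " headers")  # on the way back up
            return start_response(status, headers, exc_info)

        output = application(environ, wrapped)
        events.append(name + " transform")  # after the inner layer has returned
        return output
    return middleware


def application(environ, start_response):
    events.append("Application environ")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done\n"]


stack = make_middleware("Middleware1", make_middleware("Middleware2", application))

# A stand-in for the server's start_response, so the stack can be run directly.
stack({}, lambda status, headers, exc_info=None: None)
print(events)
```

The environ entries appear outermost first, while the headers and transform entries appear innermost first, matching the behaviour described above.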

At each stage of the application and middleware chain the component can either return a list of strings in one go or return an iterable.
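The transform() methods in the test application use output + [...], which assumes the previous stage returned a list. A middleware that only iterates over what it receives works with either kind of return value. The following is a minimal plain-WSGI sketch for Python 3 (where response bodies are byte strings); list_app, generator_app and exclaim are hypothetical names.

```python
def list_app(environ, start_response):
    # Returns the whole body in one go as a list.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"one\n", b"two\n"]


def generator_app(environ, start_response):
    # Produces the body piece by piece as a generator.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"one\n"
    yield b"two\n"


def exclaim(application):
    """Hypothetical middleware that only iterates, so lists and generators both work."""
    def middleware(environ, start_response):
        for chunk in application(environ, start_response):
            yield chunk.replace(b"\n", b"!\n")
    return middleware


def ignore(status, headers, exc_info=None):
    # Stand-in for the server's start_response.
    return None


from_list = list(exclaim(list_app)({}, ignore))
from_generator = list(exclaim(generator_app)({}, ignore))
```

Both from_list and from_generator end up holding the same chunks, because the middleware never relies on list-only operations such as concatenation.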