1.14.5 Writing Your Own Middleware

See first the WSGI Middleware Introduction earlier in this document.

Earlier in this document we saw some simple middleware components and learned that for an object to be valid WSGI middleware it must take a WSGI application object as a parameter and behave exactly like a WSGI application itself.
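As a reminder of that shape, here is a minimal sketch that uses plain WSGI only and does not depend on the web.wsgi modules. The names PassThroughMiddleware and call_wsgi are hypothetical and exist only for this illustration.

```python
def application(environ, start_response):
    # A plain WSGI application: call start_response, then return an iterable of bytes.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello from the application\n']


class PassThroughMiddleware(object):
    """Hypothetical middleware that changes nothing; it only shows the required shape."""

    def __init__(self, application):
        # Middleware takes the wrapped WSGI application as its first parameter.
        self.application = application

    def __call__(self, environ, start_response):
        # Middleware is itself called exactly like a WSGI application.
        return self.application(environ, start_response)


def call_wsgi(app, environ=None):
    """Tiny stand-in for a server: call app once and collect what it produced."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = list(headers)

    body = b''.join(app({} if environ is None else environ, start_response))
    return captured['status'], captured['headers'], body


wrapped = PassThroughMiddleware(application)
```

Because the wrapped object is called in the same way as the application, a server cannot tell the two apart, and middleware can be stacked to any depth.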

With long middleware chains and functions being passed as parameters down the chain it can get a bit confusing to keep track of program flow.

Program flow is actually very straightforward. The first piece of middleware runs first; any changes it makes to the environ dictionary are passed on to the next piece of middleware, and so on down the chain. Once the application at the end of the chain calls the start_response function, the status, headers and application output are sent back up the chain to the server, which sends them to the web browser.

Here is a test application demonstrating middleware and program flow (the headers are not valid HTTP headers obviously):

#!/usr/bin/env python

import sys
import time

sys.path.append('../../../')  # make the web package importable
import web.wsgi.base

class Application(web.wsgi.base.BaseApplication):
    def start(self):
        self.output('Environ Order:\n')
        self.environ['Application'] = time.time()
        time.sleep(1)
        self.headers.append(('Appliction',str(time.time())))
        self.output('Middleware1 ',self.environ['Middleware1'])
        self.output('\n')
        self.output('Middleware2 ',self.environ['Middleware2'])
        self.output('\n')
        self.output('Application ', self.environ['Application'])
        self.output('\n')
        
class Middleware1(web.wsgi.base.BaseMiddleware):
    def environ(self, environ):
        time.sleep(1)
        environ['Middleware1'] = time.time()
        return environ
        
    def headers(self, headers):
        time.sleep(1)
        headers.append(('Middleware1',str(time.time())))
        return headers
        
    def transform(self, output):
        return output + ['Middleware1\n']

class Middleware2(web.wsgi.base.BaseMiddleware):
    def environ(self, environ):
        time.sleep(1)
        environ['Middleware2'] = time.time()
        return environ
        
    def headers(self, headers):
        time.sleep(1)
        headers.append(('Middleware2',str(time.time())))
        return headers

    def transform(self, output):
        return output + ['Middleware2\n']
        
print("Running test...")
application = web.wsgi.runCGI(Middleware1(Middleware2(Application())))

The program will not run from a WSGI server because of the incorrect HTTP headers but you can run it from the command line. The output should look something like this:

Status: 200 OK
Content-type: text/html
Appliction: 1105847968.69
Middleware2: 1105847969.69
Middleware1: 1105847970.69

Environ Order:
Middleware1 1105847966.68
Middleware2 1105847967.69
Application 1105847967.69

Transform Order:
Middleware2
Middleware1

You can see that environ is modified by Middleware1 then Middleware2 then Application. Headers and return transforms are made in exactly the opposite order.
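The same ordering can be reproduced without the web.wsgi modules. The following is a minimal sketch in plain WSGI; NamedMiddleware, the 'demo.order' environ key and the X-Seen-By header are all made up for this illustration and are not part of the library.

```python
class NamedMiddleware(object):
    """Hypothetical middleware that records where it sits in the chain."""

    def __init__(self, application, name):
        self.application = application
        self.name = name

    def __call__(self, environ, start_response):
        # On the way down: record this component before calling the next one.
        environ.setdefault('demo.order', []).append(self.name)

        def wrapped_start_response(status, headers, exc_info=None):
            # On the way up: add our header after the inner components added theirs.
            headers = list(headers) + [('X-Seen-By', self.name)]
            return start_response(status, headers, exc_info)

        return self.application(environ, wrapped_start_response)


def application(environ, start_response):
    environ.setdefault('demo.order', []).append('Application')
    start_response('200 OK', [('X-Seen-By', 'Application')])
    return [', '.join(environ['demo.order']).encode('ascii')]


def run(app):
    """Stand-in for a server: returns the body and the X-Seen-By values in order."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['headers'] = headers

    body = b''.join(app({}, start_response))
    seen_by = [value for name, value in captured['headers'] if name == 'X-Seen-By']
    return body, seen_by


chain = NamedMiddleware(NamedMiddleware(application, 'Middleware2'), 'Middleware1')
body, header_order = run(chain)
# body lists the environ order: Middleware1, Middleware2, Application
# header_order lists the header order: Application, Middleware2, Middleware1
```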

At each stage of the application and middleware chain, the component can either return a list of strings in one go or return some other iterable, such as a generator, which produces the output a piece at a time.
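The two styles look like this in plain WSGI. This is only a sketch: list_app, generator_app and collect are hypothetical names, and byte strings are used so that the example also runs on Python 3.

```python
def list_app(environ, start_response):
    # Return the whole body in one go as a list.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'one\n', b'two\n']


def generator_app(environ, start_response):
    # Return an iterable that produces the body a piece at a time.
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def chunks():
        yield b'one\n'
        yield b'two\n'

    return chunks()


def collect(app):
    """Stand-in for a server: ignore the status and headers, join the body."""
    return b''.join(app({}, lambda status, headers, exc_info=None: None))
```

Whichever style a component uses, the next item up the chain only needs to iterate over what it receives.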

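We also learned earlier that WSGI middleware can be implemented as a class and usually performs one or a combination of the following actions: changing the environ dictionary, changing the status, headers or exc_info passed to start_response(), or transforming the output returned by the application.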

The web.wsgi.base module provides a BaseMiddleware class with methods to accomplish these tasks so that you don't need to worry quite so much about program flow or how to implement your middleware.

class BaseMiddleware( application)

application should always be the first parameter to a derived middleware class, but you may also wish to have other parameters in derived classes to allow the middleware to be configured.

Warning: It is important you carefully read the documentation for the __init__() and setup() methods to understand where to configure variables.

The class defines the following attributes:

application
The WSGI application (or middleware stack) to which this middleware should be added.

The class defines the following methods:

__init__( application)
You can override the __init__() method, but the first parameter should always be the application object. Parameters used to configure the class at load time can be specified in the __init__() method, but any variables which need to be reset every time the middleware is used should be set in the setup() method. This is because a WSGI server loads the middleware only once but runs it many times, so a variable set in the __init__() method would be set only once and on subsequent calls would retain its value from the previous call.

setup( )
The setup() method is used to configure any class attributes which need to be configured every time the middleware is run and not just when the middleware is loaded. See the documentation for the __init__() method too.
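The difference between the two methods can be seen in a small sketch. SketchMiddleware below is a hypothetical stand-in written for this example, not the real BaseMiddleware; it assumes only that setup() is called at the start of every request, as described for __call__().

```python
class SketchMiddleware(object):
    """Stand-in base class (an assumption, not the library source)."""

    def __init__(self, application):
        self.application = application

    def setup(self):
        pass

    def environ(self, environ):
        return environ

    def __call__(self, environ, start_response):
        self.setup()  # runs on every request
        return self.application(self.environ(environ), start_response)


class RequestNotes(SketchMiddleware):
    def __init__(self, application, label='demo'):
        SketchMiddleware.__init__(self, application)
        self.label = label  # load-time configuration: set once, never reset
        self.leaky = []     # set once, so it keeps growing between requests

    def setup(self):
        self.notes = []     # per-request state: reset on every call

    def environ(self, environ):
        self.notes.append(self.label)
        self.leaky.append(self.label)
        environ['notes'] = list(self.notes)
        environ['leaky'] = list(self.leaky)
        return environ


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    counts = '%d %d' % (len(environ['notes']), len(environ['leaky']))
    return [counts.encode('ascii')]


middleware = RequestNotes(application)
ignore = lambda status, headers, exc_info=None: None
first = b''.join(middleware({}, ignore))   # b'1 1'
second = b''.join(middleware({}, ignore))  # b'1 2'
```

On the second request the list created in setup() has been reset, while the list created in __init__() still holds the entry from the first request.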

__call__( environ, start_response)
You should not need to modify this method, but it is documented here for a complete understanding, as it provides the functionality which makes derived classes WSGI middleware.

The first task of this method is to call setup() to re-initialise any variables which need to be set every time the class is run. It then intercepts the environ dictionary, as well as the status, headers and exc_info parameters that the wrapped application sends to the start_response() function. The environ dictionary is sent to the environ() method for modification. The status, headers and exc_info parameters are sent to the response() method, which controls the order in which they are modified by passing them to the status(), headers() and exc_info() methods. The new values are then returned to the __call__() method, where a modified application object is returned.
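The flow described above can be approximated as follows. This is a hypothetical reconstruction from the description, not the source of web.wsgi.base.BaseMiddleware; in particular, applying result() to the returned iterable inside __call__() is an assumption, and the Tagging class and its header are invented for the example.

```python
class SketchBaseMiddleware(object):
    """Hypothetical stand-in; NOT the real web.wsgi.base.BaseMiddleware source."""

    def __init__(self, application):
        self.application = application

    def setup(self):
        pass

    def environ(self, environ):
        return environ

    def status(self, status):
        return status

    def headers(self, headers):
        return headers

    def exc_info(self, exc_info):
        return exc_info

    def result(self, result):
        return result

    def response(self, status, headers, exc_info=None):
        # Default order: status, then headers, then exc_info.
        return self.status(status), self.headers(headers), self.exc_info(exc_info)

    def __call__(self, environ, start_response):
        self.setup()
        environ = self.environ(environ)

        def intercepted(status, headers, exc_info=None):
            # Modify what the wrapped application sent before passing it upwards.
            status, headers, exc_info = self.response(status, headers, exc_info)
            return start_response(status, headers, exc_info)

        # Assumption: the body is passed through result() on its way back up.
        return self.result(self.application(environ, intercepted))


class Tagging(SketchBaseMiddleware):
    def environ(self, environ):
        environ['demo.tagged'] = True
        return environ

    def headers(self, headers):
        return list(headers) + [('X-Tagged', 'yes')]


def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'tagged' if environ.get('demo.tagged') else b'untagged']


seen = {}

def server_start_response(status, headers, exc_info=None):
    seen['status'], seen['headers'] = status, headers

body = b''.join(Tagging(application)({}, server_start_response))
```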

response( status, headers, [exc_info=None])
Calls the status(), headers() and exc_info() methods to modify the respective parameters, then returns the modified values in the order status, headers, exc_info to the __call__() method. Can be overridden to change the order in which the parameters are modified.

environ( environ)
Provides the dictionary environ for modification. Must return the environ dictionary to be passed on down the middleware chain.

status( status)
Provides the status string for modification. Must return the status string to be passed on down the middleware chain.

headers( headers)
Provides the headers list for modification. Must return the headers list to be passed on down the middleware chain.

exc_info( exc_info)
Provides the exc_info tuple object generated by a previous error (if one exists) for modification. Must return the exc_info tuple to be passed on down the middleware chain.

result( result)
Used to transform the body of output returned from the previous item in the middleware stack.

Be aware that if you intend to change the length of the returned information you may need to check the Content-Type header first and update the Content-Length header if it is set.

result is an iterable, and an iterable should be returned from the method.
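The header bookkeeping involved can be sketched in plain WSGI, without BaseMiddleware. FooterMiddleware and hello_app are hypothetical names. The sketch buffers the whole body so that the new length is known before the headers are passed on, which gives up streaming, and it does not support the write() callable that start_response may return.

```python
class FooterMiddleware(object):
    """Hypothetical middleware that appends a footer to text responses."""

    def __init__(self, application, footer=b'\n-- footer --\n'):
        self.application = application
        self.footer = footer

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the new body length is known.
            captured['status'] = status
            captured['headers'] = list(headers)
            captured['exc_info'] = exc_info

        result = self.application(environ, capture)
        try:
            body = b''.join(result)
        finally:
            if hasattr(result, 'close'):
                result.close()

        headers = captured['headers']
        content_type = ''
        for name, value in headers:
            if name.lower() == 'content-type':
                content_type = value

        if content_type.startswith('text/'):
            body += self.footer
            # Only rewrite Content-Length if the application set it.
            headers = [(name, str(len(body)) if name.lower() == 'content-length' else value)
                       for name, value in headers]

        start_response(captured['status'], headers, captured['exc_info'])
        return [body]


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain'), ('Content-Length', '5')])
    return [b'hello']


seen = {}

def server_start_response(status, headers, exc_info=None):
    seen['headers'] = dict(headers)

body = b''.join(FooterMiddleware(hello_app)({}, server_start_response))
```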

To produce your own middleware class, simply override the appropriate methods in your class derived from the BaseMiddleware class. If you wish to pass information between the various methods, you should set member variables of the class which can then be read by all the methods. You can change the order in which some of the methods are called by overriding response() and calling the methods in the order you wish.
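As an illustration of both points, the sketch below passes information from headers() to status() through a member variable and overrides response() so that headers() runs first. SketchBaseMiddleware is a hypothetical stand-in for BaseMiddleware, written from the method descriptions in this section, and RedirectStatus is an invented example class.

```python
class SketchBaseMiddleware(object):
    """Hypothetical stand-in; NOT the real web.wsgi.base.BaseMiddleware source."""

    def __init__(self, application):
        self.application = application

    def setup(self):
        pass

    def status(self, status):
        return status

    def headers(self, headers):
        return headers

    def exc_info(self, exc_info):
        return exc_info

    def response(self, status, headers, exc_info=None):
        return self.status(status), self.headers(headers), self.exc_info(exc_info)

    def __call__(self, environ, start_response):
        self.setup()

        def intercepted(status, headers, exc_info=None):
            status, headers, exc_info = self.response(status, headers, exc_info)
            return start_response(status, headers, exc_info)

        return self.application(environ, intercepted)


class RedirectStatus(SketchBaseMiddleware):
    def setup(self):
        self.saw_location = False  # per-request member variable

    def headers(self, headers):
        # Record what was seen so that status() can read it later.
        self.saw_location = any(name.lower() == 'location' for name, value in headers)
        return headers

    def status(self, status):
        if self.saw_location and status.startswith('200'):
            return '302 Found'
        return status

    def response(self, status, headers, exc_info=None):
        # Changed order: headers() must run before status() can use its findings.
        headers = self.headers(headers)
        status = self.status(status)
        return status, headers, self.exc_info(exc_info)


def plain_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'plain']


def moved_app(environ, start_response):
    start_response('200 OK', [('Location', '/elsewhere')])
    return [b'moved']


def status_for(app):
    """Stand-in for a server: returns the status that reached the top of the chain."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status

    b''.join(app({}, start_response))
    return seen['status']
```

With the default order, status() would run before headers() had recorded anything, so the status of moved_app would be left unchanged.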

For some examples of how to write middleware components using this class, look at the source code of the web.wsgi middleware classes.