request / response

a blog about the web, Go, and building things

(by Matt Silverlock)


Using Buffer Pools with Go

•••

Buffers are extremely useful in Go and I’ve written a little about them before.

Part of that has been around rendering HTML templates: ExecuteTemplate returns an error, but if you’ve passed it your http.ResponseWriter it’s too late to do anything about the error. The response is gone and you end up with a malformed page. You might also use a buffer as the destination for json.NewEncoder when encoding (marshalling) a value before writing bytes out to the wire—another case where you want to catch the error before writing to the response.

Here’s a quick refresher:

buf := new(bytes.Buffer)
// Write to the buffer first so we can catch the error
// (tmpl here is our parsed *template.Template).
err := tmpl.ExecuteTemplate(buf, "forms/create.html", user)
// or err := json.NewEncoder(buf).Encode(value)
if err != nil {
    return err
}

buf.WriteTo(w)

In this case (and the JSON case) however, we’re creating and then implicitly throwing away a temporary buffer when the function exits. This is wasteful, and because we need a buffer on every request, we’re just adding an increasing amount of garbage collector (GC) pressure by generating garbage that we might be able to avoid.

So we use a buffer pool—otherwise known as a free list or leaky buffer—that maintains a set of buffers we get from and put back to as needed. The pool will issue an existing buffer if one is available, else create one for us, and it will optionally discard buffers once the pool reaches a certain size to keep it from growing unbounded. This has some clear benefits, including:

  • Trading some additional static memory usage (pre-allocation) in exchange for reduced pressure on the garbage collector (GC)
  • Reducing ad-hoc makeSlice calls (and some CPU hits as a result) from re-sizing fresh buffers on a regular basis—the buffers going back into the pool have already been grown

So a buffer pool definitely has its uses. We could implement this by creating a chan of bytes.Buffers that we Get() and Put() from/to. We also set the size on the channel, allowing us to discard excess buffers when our channel is full, avoiding repeated busy-periods from blowing up our memory usage. We’re also still free to issue additional buffers beyond the size of the pool (when business is good), knowing that they’ll be dropped when the pool is full. This is simple to implement and already nets us some clear benefits over throwing away a buffer on every request.

Enter SizedBufferPool

But there’s a slight quirk: if we do have the odd “large” response, that (now large) buffer returns to the pool, and the extra memory we allocated for it isn’t released until that particular buffer is dropped. Buffers are only dropped when the pool is full, which requires us to have issued more buffers than the pool was sized for—something that never happens if our concurrent demand for buffers stays below the pool size. Over enough requests, we’re likely to end up with a number of buffers in the pool sized for our largest responses and consuming (wasting) additional memory as a result.

Further, all of our initial (“cold”) buffers might require a few rounds of makeSlice to resize (via copying) into a final buffer large enough to hold our content. It’d be nice if we could avoid this as well by setting the capacity of our buffers on creation, making the memory usage of our application over time more consistent. The typical response size across requests within a web service is unlikely to vary wildly in size either, so “pre-warming” our buffers is a useful trick.

Let’s see how we can address these concerns—which is thankfully pretty straightforward:

package bpool

type SizedBufferPool struct {
    c chan *bytes.Buffer
    a int
}

// NewSizedBufferPool creates a new SizedBufferPool bounded to the given size.
// size defines the number of buffers to be retained in the pool and alloc sets
// the initial capacity of new buffers to minimize calls to make().
func NewSizedBufferPool(size int, alloc int) (bp *SizedBufferPool) {
    return &SizedBufferPool{
        c: make(chan *bytes.Buffer, size),
        a: alloc,
    }
}

// Get gets a Buffer from the SizedBufferPool, or creates a new one if none are
// available in the pool. Buffers have a pre-allocated capacity.
func (bp *SizedBufferPool) Get() (b *bytes.Buffer) {
    select {
    case b = <-bp.c:
        // reuse existing buffer
    default:
        // create new buffer
        b = bytes.NewBuffer(make([]byte, 0, bp.a))
    }
    return
}

// Put returns the given Buffer to the SizedBufferPool.
func (bp *SizedBufferPool) Put(b *bytes.Buffer) {
    b.Reset()

    // Release buffers over our maximum capacity and re-create a pre-sized
    // buffer to replace it.
    if cap(b.Bytes()) > bp.a {
        b = bytes.NewBuffer(make([]byte, 0, bp.a))
    }

    select {
    case bp.c <- b:
    default: // Discard the buffer if the pool is full.
    }
}

This isn’t a significant deviation from the simple implementation and is the code I pushed to the (ever-useful) oxtoacart/bpool package on GitHub.

  • We create buffers as needed (providing one from the pool first), except we now pre-allocate buffer capacity based on the alloc param we provided when we created the pool.
  • When a buffer is returned via Put we reset it (discard the contents) and then check the capacity.
  • If the buffer capacity has grown beyond our defined maximum, we discard the buffer itself and re-create a new buffer in its place before returning that to the pool. If not, the reset buffer is recycled into the pool.

Note: dominikh pointed out a new buffer.Cap() method coming in Go 1.5, which differs from calling cap(b.Bytes()). The latter returns the capacity of the unread portion (see this CL) of the buffer’s underlying slice, which may not be the total capacity if you’ve read from it during its lifetime. This doesn’t affect our implementation, however, as we call b.Reset() (which resets the read offset) before we check the capacity, which means we get the “correct” (full) capacity of the underlying slice.

Setting the Right Buffer Size

What would be especially nice is if we could pre-set the size of our buffers to represent our real-world usage so we’re not just estimating it.

So: how do we determine what our usage is? If you have test data that’s representative of your production data, a simple approach is to collect the buffer sizes used throughout your application (e.g. the size of a typical HTTP response body) and calculate an appropriate value.

Approaches to this would include:

  • Measuring the (e.g.) 80th percentile Content-Length header across your application. This solution can be automated by hitting your routes with a http.Client and analysing the results from resp.Header.Get("Content-Length").
  • Instrumenting your application and measuring the capacity of your buffers before returning them to the pool. Set your starting capacity to a low value, then call buf.Reset() and cap(buf.Bytes()) as we did in the example above. Write the output to a log file (simple), or aggregate the values into a structure safe for concurrent writes that can be analysed later.

Determining whether to set the value as the average (influenced by outliers), median or an upper percentile will depend on the architecture of your application and the memory characteristics you’re after. Too low and you’ll increase GC pressure by discarding a greater number of buffers, but too high and you’ll increase static memory usage.

Postscript

We now have an approach to more consistent memory use that we can take home with us and use across our applications.

  • You can import and use the SizedBufferPool from the oxtoacart/bpool package (mentioned previously). Just go get -u github.com/oxtoacart/bpool and then call bufPool := bpool.NewSizedBufferPool(x, y) to create a new pool. Make sure to measure the size of the objects you’re storing into the pool to help guide the per-buffer capacity.
  • Worth reading is CloudFlare’s “Recycling Memory Buffers in Go” article that talks about an alternative approach to re-usable buffer pools.

It’s also worth mentioning Go’s own sync.Pool type, which landed in Go 1.3 and is a building block for creating your own pool. The differences are that it handles dynamic resizing of the pool (rather than having you define a size) and discards objects between GC runs.

In contrast, the buffer pool in this article retains objects and functions as a free list and explicitly zeroes (resets) the contents of each buffer (meaning they are safe to use upon issue), as well as discarding those that have grown too large. There’s a solid discussion on the go-nuts list about sync.Pool that covers some of the implementation quirks.


http.Handler and Error Handling in Go

•••

I wrote an article a while back on implementing custom handler types to avoid a few common problems with the existing http.HandlerFunc—the func MyHandler(w http.ResponseWriter, r *http.Request) signature you often see. It’s a useful “general purpose” handler type that covers the basics, but—as with anything generic—there are a few shortcomings:

  • Having to remember to explicitly call a naked return when you want to stop processing in the handler. This is a common case when you want to raise a re-direct (301/302), not found (404) or internal server error (500) status. Failing to do so can be the cause of subtle bugs (the function will continue) and because the function signature doesn’t require a return value, the compiler won’t alert you.
  • You can’t easily pass in additional arguments (i.e. database pools, configuration values). You end up having to either use a bunch of globals (not terrible, but tracking them can scale poorly) or stash those things into a request context and then type assert each of them out. Can be clunky.
  • You end up repeating yourself. Want to log the error returned by your DB package? You can either call log.Printf in your database package (in each query func), or in every handler when an error is returned. It’d be great if your handlers could just return that to a function that centrally logs errors and raise a HTTP 500 on the ones that call for it.

My previous approach used the func(http.ResponseWriter, *http.Request) (int, error) signature. This has proven to be pretty neat, but a quirk is that returning “non error” status codes like 200, 302 or 303 was often superfluous: you’re either setting the code elsewhere or it’s effectively unused, e.g.

func SomeHandler(w http.ResponseWriter, r *http.Request) (int, error) {
    db, err := someDBcall()
    if err != nil {
        // This makes sense.
        return 500, err
    }

    if user.LoggedIn {
        http.Redirect(w, r, "/dashboard", 302)
        // Superfluous! Our http.Redirect call handles the 302, not
        // our return value (which is effectively ignored).
        return 302, nil
    }

    // ... and we still have to return a (meaningless) 200 here just
    // to satisfy the signature.
    return 200, nil
}

It’s not terrible, but we can do better.

A Little Different

So how can we improve on this? Let’s lay out some code:

package handler

// Error represents a handler error. It provides methods for a HTTP status 
// code and embeds the built-in error interface.
type Error interface {
	error
	Status() int
}

// StatusError represents an error with an associated HTTP status code.
type StatusError struct {
	Code int
	Err  error
}

// Allows StatusError to satisfy the error interface.
func (se StatusError) Error() string {
	return se.Err.Error()
}

// Returns our HTTP status code.
func (se StatusError) Status() int {
	return se.Code
}

// A (simple) example of our application-wide configuration.
type Env struct {
	DB   *sql.DB
	Port string
	Host string
}

// The Handler struct that takes a configured Env and a function matching
// our useful signature.
type Handler struct {
	*Env
	H func(e *Env, w http.ResponseWriter, r *http.Request) error
}

// ServeHTTP allows our Handler type to satisfy http.Handler.
func (h Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	err := h.H(h.Env, w, r)
	if err != nil {
		switch e := err.(type) {
		case Error:
			// We can retrieve the status here and write out a specific
			// HTTP status code.
			log.Printf("HTTP %d - %s", e.Status(), e)
			http.Error(w, e.Error(), e.Status())
		default:
			// Any error types we don't specifically look out for default
			// to serving a HTTP 500
			http.Error(w, http.StatusText(http.StatusInternalServerError),
				http.StatusInternalServerError)
		}
	}
}

The code above should be self-explanatory, but to clarify any outstanding points:

  • We create a custom Error type (an interface) that embeds Go’s built-in error interface and also has a Status() int method.
  • We provide a simple StatusError type (a struct) that satisfies our handler.Error type. Our StatusError type accepts a HTTP status code (an int) and an error that allows us to wrap the root cause for logging/inspection.
  • Our ServeHTTP method contains a type switch (the e := err.(type) part), which tests for the errors we care about and allows us to handle those specific cases. In our example that’s just the handler.Error type. Other error types—be they from other packages (e.g. net.Error) or additional error types we have defined—can also be inspected (if we care about their details).

If we don’t want to inspect them, our default case catches them. Remember that the ServeHTTP method allows our Handler type to satisfy the http.Handler interface and be used anywhere http.Handler is accepted: Go’s net/http package and all good third party frameworks. This is what makes custom handler types so useful: they’re flexible about where they can be used.

Note that the net package does something very similar. It has a net.Error interface that embeds the built-in error interface and then a handful of concrete types that implement it. Functions return the concrete type that suits the type of error they’re returning (a DNS error, a parsing error, etc). A good example would be defining a DBError type with a Query() string method in a ‘datastore’ package that we can use to log failed queries.

Full Example

What does the end result look like? And how would we split it up into packages (sensibly)?

package handler

import (
    "database/sql"
    "fmt"
    "log"
    "net/http"
)

// Error represents a handler error. It provides methods for a HTTP status 
// code and embeds the built-in error interface.
type Error interface {
	error
	Status() int
}

// StatusError represents an error with an associated HTTP status code.
type StatusError struct {
	Code int
	Err  error
}

// Allows StatusError to satisfy the error interface.
func (se StatusError) Error() string {
	return se.Err.Error()
}

// Returns our HTTP status code.
func (se StatusError) Status() int {
	return se.Code
}

// A (simple) example of our application-wide configuration.
type Env struct {
	DB   *sql.DB
	Port string
	Host string
}

// The Handler struct that takes a configured Env and a function matching
// our useful signature.
type Handler struct {
	*Env
	H func(e *Env, w http.ResponseWriter, r *http.Request) error
}

// ServeHTTP allows our Handler type to satisfy http.Handler.
func (h Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	err := h.H(h.Env, w, r)
	if err != nil {
		switch e := err.(type) {
		case Error:
			// We can retrieve the status here and write out a specific
			// HTTP status code.
			log.Printf("HTTP %d - %s", e.Status(), e)
			http.Error(w, e.Error(), e.Status())
		default:
			// Any error types we don't specifically look out for default
			// to serving a HTTP 500
			http.Error(w, http.StatusText(http.StatusInternalServerError),
				http.StatusInternalServerError)
		}
	}
}

func GetIndex(env *Env, w http.ResponseWriter, r *http.Request) error {
    users, err := env.DB.GetAllUsers()
    if err != nil {
        // We return a status error here, which conveniently wraps the error
        // returned from our DB queries. We can clearly define which errors 
        // are worth raising a HTTP 500 over vs. which might just be a HTTP 
        // 404, 403 or 401 (as appropriate). It's also clear where our 
        // handler should stop processing by returning early.
        return StatusError{500, err}
    }

    fmt.Fprintf(w, "%+v", users)
    return nil
}

… and in our main package:

package main

import (
    "database/sql"
    "log"
    "net/http"
    "os"

    "github.com/you/somepkg/handler"
    // ... plus your SQL driver, registered via a blank import,
    // e.g. _ "github.com/lib/pq"
)

func main() {
    // sql.Open takes a driver name and a connection string.
    db, err := sql.Open("postgres", "connectionstringhere")
    if err != nil {
        log.Fatal(err)
    }

    // Initialise our app-wide environment with the services/info we need.
    env := &handler.Env{
        DB: db,
        Port: os.Getenv("PORT"),
        Host: os.Getenv("HOST"),
        // We might also have a custom log.Logger, our 
        // template instance, and a config struct as fields 
        // in our Env struct.
    }

    // Note that we're using http.Handle, not http.HandleFunc. The 
    // latter only accepts the http.HandlerFunc type, which is not 
    // what we have here.
    http.Handle("/", handler.Handler{env, handler.GetIndex})

    // Logs the error if ListenAndServe fails.
    log.Fatal(http.ListenAndServe(":8000", nil))
}

In the real world, you’re likely to define your Handler and Env types in a separate file (of the same package) from your handler functions, but I’ve kept it simple here for the sake of brevity. So what did we end up getting from this?

  • A practical Handler type that satisfies http.Handler, and can therefore be used with net/http and frameworks like gorilla/mux, Goji and any others that sensibly accept a http.Handler type.
  • Clear, centralised error handling. We inspect the errors we want to handle specifically—our handler.Error type—and fall back to a default for generic errors. If you’re interested in better error handling practices in Go, read Dave Cheney’s blog post, which dives into defining package-level Error interfaces.
  • A useful application-wide “environment” via our Env type. We don’t have to scatter a bunch of globals across our applications: instead we define them in one place and pass them explicitly to our handlers.

If you have questions about the post, drop me a line via @elithrar on Twitter, or the Gopher community on Slack.


simple-scrypt

•••

simple-scrypt is a convenience wrapper around Go’s existing scrypt library. The existing library has a limited API and doesn’t facilitate generating salts, comparing keys or retrieving the parameters used to generate a key. The last point is a limitation of the scrypt specification, which doesn’t require the parameters to be stored alongside the derived key. Using Go’s bcrypt library as inspiration, I pulled together a more complete scrypt package with some “sane defaults” to make it easier to get started. The library lets you supply your own parameters via the scrypt.Params type, and the public API should be rock solid (I’m planning to tag a v1.0 very soon).

scrypt itself, for those that don’t know, is a memory-hard key derivation function (KDF) entirely suitable for deriving strong keys from ‘weak’ input (i.e. user passwords). It’s often described as a way to ‘hash’ passwords, but unlike traditional hashes (SHA-1, the SHA-2 family, etc.) that are designed to be fast, it’s designed to be “configurably” slow. This makes it ideal for storing user passwords in a way that makes it very hard to brute force or generate rainbow tables against.

Here’s an example of how to get started with it for deriving strong keys from user passwords (e.g. via a web form):

package main

import (
    "fmt"
    "log"

    "github.com/elithrar/simple-scrypt"
)

func main() {
    // e.g. r.PostFormValue("password")
    passwordFromForm := "prew8fid9hick6c"

    // Generates a derived key of the form "N$r$p$salt$dk" where N, r and p are defined as per
    // Colin Percival's scrypt paper: http://www.tarsnap.com/scrypt/scrypt.pdf
    // scrypt.DefaultParams (N=16384, r=8, p=1) provides sane defaults, and you can
    // (should you wish) provide your own values via the scrypt.Params type.
    hash, err := scrypt.GenerateFromPassword([]byte(passwordFromForm), scrypt.DefaultParams)
    if err != nil {
        log.Fatal(err)
    }

    // Print the derived key with its parameters prepended.
    fmt.Printf("%s\n", hash)

    // Uses the parameters from the existing derived key. Returns an error if they don't match.
    err = scrypt.CompareHashAndPassword(hash, []byte(passwordFromForm))
    if err != nil {
        log.Fatal(err)
    }
}

The package also provides functions to compare a password with an existing key using scrypt.CompareHashAndPassword and to retrieve the parameters used in a previously generated key via scrypt.Cost. The latter is designed to make it easy to upgrade parameters as hardware improves.

Pull requests are welcome, and I have a few things on the to-do list to make it configurable based on hardware performance.


Running Go Applications in the Background

•••

A regular question on the go-nuts mailing list, in the #go-nuts IRC channel and on StackOverflow seems to be: how do I run my Go application in the background? Developers eventually reach the stage where they need to deploy something, keep it running, log it and manage crashes. So where to start?

There’s a huge number of options here, but we’ll look at a stable, popular and cross-distro approach called Supervisor. Supervisor is a process management tool that handles restarting, recovering and managing logs, without requiring anything from your application (i.e. no PID files!).

Pre-Requisites

We’re going to assume a basic understanding of the Linux command line, which in this case is understanding how to use a text-editor like vim, emacs or even nano, and the importance of not running your application as root—which I will re-emphasise throughout this article! We’re also going to assume you’re on an Ubuntu 14.04/Debian 7 system (or newer), but I’ve included a section for those on RHEL-based systems.

I should also head off any questions about daemonizing Go applications (in the Unix sense of daemonize): it interacts poorly with Go’s threaded runtime on most systems (see Issue #227).

Note: I’m well aware of the “built in” options like Upstart (Debian/Ubuntu) and systemd (CentOS/RHEL/Fedora/Arch). I’d originally written this article to provide examples for all three options, but it wasn’t opinionated enough and was therefore confusing for the newcomers this article is aimed at.

For what it’s worth, Upstart leans on start-stop-daemon too much for my liking (if you want it to work across versions), and although I really like systemd’s configuration language, my primary systems are running Debian/Ubuntu LTS so it’s not a viable option (until next year!). Supervisor’s cross-platform nature, well documented configuration options and extra features (log rotation, email notification) make it well suited to running production applications (or even just simple side-projects).

Installing Supervisor

I’ve been using Supervisor for a long while now, and I’m a big fan of its centralised approach: it will monitor your process, restart it when it crashes, redirect stdout to a log file and rotate that log, all within a single configuration.

There’s no need to write a separate logrotate config, and there’s even a decent web interface (that you should only expose over authenticated HTTPS!) included. The project itself has been around since 2004 and is well maintained.

Anyway, let’s install it. The below will assume Ubuntu 14.04, which has a recent (>= 3.0) version of Supervisor. If you’re running an older version of Ubuntu, or an OS that doesn’t package a recent version of Supervisor, it may be worth installing it via pip and writing your own Upstart/systemd service file.

$ sudo apt-get install supervisor

Now, we also want our application user to be able to invoke supervisorctl (the management interface) as necessary, so we’ll need to create a supervisor group, make our user a member of that group and modify Supervisor’s configuration file to give the supervisor group the correct permissions on the socket.

$ sudo addgroup --system supervisor
# i.e. 'sudo adduser deploy supervisor'
$ sudo adduser <yourappuser> supervisor
$ logout
# Log back in and run 'groups', which should now list 'supervisor':
$ groups

That’s the group taken care of. Let’s modify the Supervisor configuration file to take this into account:

[unix_http_server]
file=/var/run/supervisor.sock   
chmod=0770                       # ensure our group has read/write privs
chown=root:supervisor            # add our group

[supervisord]
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
childlogdir=/var/log/supervisor

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock

[include]
files = /etc/supervisor/conf.d/*.conf # default location on Ubuntu

And now we’ll restart Supervisor:

$ sudo service supervisor restart

If it doesn’t restart, check the log with the below:

$ sudo tail /var/log/supervisor/supervisord.log

Typos are the usual culprit here. Otherwise, with the core configuration out of the way, let’s create a configuration for our Go app.

Configuring It

Supervisor is infinitely configurable, but we’ll aim to keep things simple. Note that you will need to modify the configuration below to suit your application: I’ve commented the lines you’ll need to change.

Create a configuration file at the default (Ubuntu) includes directory:

# where 'mygoapp' is the name of your application
$ sudo vim /etc/supervisor/conf.d/mygoapp.conf 

… and pull in the below:

[program:yourapp]
command=/home/yourappuser/bin/yourapp # the location of your app
autostart=true
autorestart=true
startretries=10
user=yourappuser # the user your app should run as (i.e. *not* root!)
directory=/srv/www/yourapp.com/ # where your application runs from
environment=APP_SETTINGS="/srv/www/yourapp.com/prod.toml" # environment variables
redirect_stderr=true
stdout_logfile=/var/log/supervisor/yourapp.log # the name of the log file.
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=10

Let’s step through it:

  • user is who we want the application to run as. I typically create a “deploy” user for this purpose. We should never run an Internet-facing application as root, so this is arguably the most important line of our configuration.
  • logfile_maxbytes and logfile_backups handle log rotation for us. This saves us having to learn another configuration language and keeps our configuration in one place. If your application generates a lot of logs (say, HTTP request logs) then it may be worth pushing maxbytes up a little.
  • autostart runs our program when supervisord starts (i.e. on system boot).
  • autorestart=true will restart our application regardless of the exit code.
  • startretries will attempt to restart our application if it crashes.
  • environment defines the environment variables to pass to the application. In this case, we tell it where the settings file is (a TOML config file, in my case).
  • redirect_stderr will re-direct error output to our log file. You can keep a separate error log if your application generates significant amounts of log data (i.e. HTTP requests) via stdout.

Now, let’s reload Supervisor so it picks up our app’s config file, and check that it’s running as expected:

$ supervisorctl reload
$ supervisorctl status yourapp

We should see a “running/started” message and our application should be ready to go. If not, check the logs in /var/log/supervisor/supervisord.log or run supervisorctl tail yourapp to show our application logs. A quick Google for the error message will go a long way if you get stuck.

Fedora/CentOS/RHEL

If you’re running CentOS 7 or Fedora 20, the directory layout is a little different than Ubuntu’s (rather, Ubuntu has a non-standard location), so keep that in mind. Specifically:

  • The default configuration file lives at /etc/supervisord.conf
  • The includes directory lives at /etc/supervisord.d/

Otherwise, Supervisor is much the same: you’ll need to install it, create a system group, add your user to the group, and then update the config file and restart the service using sudo systemctl restart supervisord.

Summary

Pretty easy, huh? If you’re using a configuration management tool (e.g. Ansible, Salt, et al.) for your production machines, then it’s easy to automate this completely, and I definitely recommend doing so. Being able to recreate your production environment like-for-like after a failure (or when moving hosts, or just for testing) is a Big Deal and worth the time investment.

This guide also shows how easy it is to add more Go applications to Supervisor’s stable: add a new configuration file, reload Supervisor, and off you go. You can choose how aggressive restarts need to be, and set log rotation and environment variables on a per-application basis, which is always useful.


HTTP Request Contexts & Go

•••

Alternatively titled map[string]interface{}.

Request contexts, for those new to the terminology, are typically a way to pass data alongside a HTTP request as it is processed by composable handlers (or middleware). This data could be a user ID, a CSRF token, whether a user is logged in or not—something typically derived from logic that you don’t want to repeat over-and-over again in every handler. If you’ve ever used Django, the request context is synonymous with the request.META dictionary.

As an example:


func CSRFMiddleware(h http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        maskedToken, err := csrf.GenerateNewToken(r)
        if err != nil {
            http.Error(w, "No good!", http.StatusInternalServerError)
            return
        }

        // How do we pass the maskedToken from here...
        h.ServeHTTP(w, r)
    })
}

func MyHandler(w http.ResponseWriter, r *http.Request) {
    // ... to here, without relying on the overhead of a session store,
    // and without either handler being explicitly tied to the other?
    // What about a CSRF token? Or an auth-key from a request header?
    // We certainly don't want to re-write that logic in every handler!
}

There are three ways that Go’s web libraries/frameworks have attacked the problem of request contexts:

  1. A global map, with *http.Request as the key, mutexes to synchronise writes, and middleware to clean up old requests (gorilla/context)
  2. A strictly per-request map, implemented by creating custom handler types (goji)
  3. Structs, and creating middleware as methods with pointer receivers or passing the struct to your handlers (gocraft/web)

So how do these approaches differ, what are the benefits, and what are the downsides?

Global Context Map

gorilla/context’s approach is the simplest, and the easiest to plug into an existing architecture.

Gorilla actually uses a map[interface{}]interface{}, which means you need to (and should) create types for your keys. The benefit is that you can use any type that supports equality as a key; the downside is that you need to define your key types in advance if you want to avoid run-time issues.

You also often want to create setters for the types you store in the context map, to avoid littering your handlers with the same type assertions.


import (
    "net/http"
    "github.com/gorilla/context"
)

type contextKey int

// Define keys that support equality.
const csrfKey contextKey = 0
const userKey contextKey = 1

var ErrCSRFTokenNotPresent = errors.New("CSRF token not present in the request context.")

// We'll need a helper function like this for every key:type
// combination we store in our context map else we repeat this
// in every middleware/handler that needs to access the value.
func GetCSRFToken(r *http.Request) (string, error) {
	val, ok := context.GetOk(r, csrfKey)
	if !ok {
		return "", ErrCSRFTokenNotPresent
	}

	token, ok := val.(string)
	if !ok {
		return "", ErrCSRFTokenNotPresent
	}

	return token, nil
}

// A bare-bones example. GenerateToken is a stand-in for your own
// token-generation logic.
func CSRFMiddleware(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, err := GenerateToken(r)
		if err != nil {
			http.Error(w, "No good!", http.StatusInternalServerError)
			return
		}

		// The map is global, so we just call the Set function
		// with the request as the key.
		context.Set(r, csrfKey, token)

		h.ServeHTTP(w, r)
	})
}

func ShowSignupForm(w http.ResponseWriter, r *http.Request) {
	// We'll use our helper function so we don't have to type assert
	// the result in every handler that triggers/handles a POST request.
	csrfToken, err := GetCSRFToken(r)
	if err != nil {
		http.Error(w, "No good!", http.StatusInternalServerError)
		return
	}

	// We can access this token in every handler we wrap with our
	// middleware. No need to set/get from a session multiple times per
	// request (which is slow!)
	fmt.Fprintf(w, "Our token is %v", csrfToken)
}


func main() {
	r := http.NewServeMux()
	r.Handle("/signup", CSRFMiddleware(http.HandlerFunc(ShowSignupForm)))
	// Critical that we call context.ClearHandler here, else
	// we leave old requests in the map.
	http.ListenAndServe("localhost:8000", context.ClearHandler(r))
}

Full Example

The plusses? It’s flexible, loosely coupled, and easy for third party packages to use. You can tie it into almost any net/http application since all you need is access to http.Request—the rest relies on the global map.

The downsides? The global map and its mutexes may result in contention at high loads, and you need to call context.Clear() at the end of every request (i.e. in each handler). Forget to do that (or to wrap your top-level server handler) and you’ll open yourself up to a memory leak where old requests remain in the map. If you’re writing middleware that uses gorilla/context, you also need to make sure the package user calls context.ClearHandler on their handlers/router, since they may not import gorilla/context directly themselves.

Per-Request map[string]interface{}

As another take, Goji provides a request context as part of an (optional) handler type that embeds Go’s usual http.Handler. Because it’s tied to Goji’s (fast) router, the context no longer needs to be a global map, which avoids the need for mutexes.

Goji provides a web.HandlerFunc type that extends the default http.HandlerFunc with a request context: func(c web.C, w http.ResponseWriter, r *http.Request).


var ErrTypeNotPresent = errors.New("Expected type not present in the request context.")

// A little simpler: we just need this for every *type* we store.
func GetContextString(c web.C, key string) (string, error) {
	val, ok := c.Env[key].(string)
	if !ok {
		return "", ErrTypeNotPresent
	}

	return val, nil
}

// A bare-bones example
func CSRFMiddleware(c *web.C, h http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {
		maskedToken, err := GenerateToken(r)
		if err != nil {
			http.Error(w, "No good!", http.StatusInternalServerError)
			return
		}

		// Goji only allocates a map when you ask for it.
		if c.Env == nil {
			c.Env = make(map[string]interface{})
		}

		// Not a global - a reference to the context map
		// is passed to our handlers explicitly.
		c.Env["csrf_token"] = maskedToken

		h.ServeHTTP(w, r)
	}

	return http.HandlerFunc(fn)
}

// Goji's web.HandlerFunc type is an extension of net/http's
// http.HandlerFunc, except it also passes in a request
// context (aka web.C.Env)
func ShowSignupForm(c web.C, w http.ResponseWriter, r *http.Request) {
	// We'll use our helper function so we don't have to type assert
	// the result in every handler.
	csrfToken, err := GetContextString(c, "csrf_token")
	if err != nil {
		http.Error(w, "No good!", http.StatusInternalServerError)
		return
	}

	// We can access this token in every handler we wrap with our
	// middleware. No need to set/get from a session multiple times per
	// request (which is slow!)
	fmt.Fprintf(w, "Our token is %v", csrfToken)
}

Full Example

The biggest immediate gain is the performance improvement, since Goji only allocates a map when you ask it to: there’s no global map with locks. Note that for many applications, your database or template rendering will be the bottleneck (by far), so the “real” impact is likely pretty small, but it’s a sensible touch.

Most useful is that you retain the ability to write modular middleware that doesn’t need further information about your application: if you want to use the request context, you can do so, but for anything else it’s just http.Handler. The downside is that you still need to type assert anything you retrieve from the context, although like gorilla/context we can simplify this by writing helper functions. A map[string]interface{} also restricts us to string keys: simpler for most (myself included), but potentially less flexible for some.

Context Structs

A third approach is to initialise a struct per-request and define our middleware/handler as methods on the struct. The big plus here is type-safety: we explicitly define the fields of our request context, and so we know the type (unless we do something naive like setting a field to interface{}).

Of course, what you gain in type safety you lose in flexibility. You can’t create “modular” middleware that uses the popular func(http.Handler) http.Handler pattern, because that middleware can’t know what your request context struct looks like. It could provide its own struct that you embed into yours, but that still doesn’t solve re-use: not ideal. Still, it’s a good approach: no need to type assert things out of interface{}.


import (
	"fmt"
	"log"
	"net/http"

	"github.com/gocraft/web"
)

type Context struct {
	CSRFToken string
	User      string
}

// Our middleware *and* handlers must be defined as methods on our context struct,
// or accept the type as their first argument. This ties handlers/middlewares to our
// particular application structure/design.
func (c *Context) CSRFMiddleware(w web.ResponseWriter, r *web.Request, next web.NextMiddlewareFunc) {
	token, err := GenerateToken(r)
	if err != nil {
		http.Error(w, "No good!", http.StatusInternalServerError)
		return
	}

	c.CSRFToken = token
	next(w, r)
}

func (c *Context) ShowSignupForm(w web.ResponseWriter, r *web.Request) {
	// No need to type assert it: we know the type.
	// We can just use the value directly.
	fmt.Fprintf(w, "Our token is %v", c.CSRFToken)
}

func main() {
	router := web.New(Context{}).Middleware((*Context).CSRFMiddleware)
	router.Get("/signup", (*Context).ShowSignupForm)

	err := http.ListenAndServe(":8000", router)
	if err != nil {
		log.Fatal(err)
	}
}

Full Example

The plus here is obvious: no type assertions! We have a struct with concrete types that we initialise on every request and pass to our middleware/handlers. But the downside is that we can no longer “plug and play” middleware from the community, because it’s not defined on our own context struct.

We could anonymously embed their type into ours, but that starts to become pretty messy and doesn’t help if their fields share the same names as our own. The real solution is to fork and modify the code to accept your struct, at the cost of time/effort. gocraft/web also wraps the ResponseWriter interface/Request struct with its own types, which ties things a little more closely to the framework itself.

How Else?

One suggestion would be to provide a Context field on Go’s http.Request struct, but actually implementing it in a “sane” way that suits the common use case is easier said than done.

The field would likely end up being a map[string]interface{} (or a map with interface{} keys). That means we’d either need to initialise the map for every request—wasted work on all of the requests that never touch the request context—or require package users to check that the map is initialised before using it, which can be a big “gotcha” for newbies who will wonder (at first) why their application panics on some requests but not others.

I don’t think these are huge barriers in themselves, but Go’s strong preference for being clear and understandable—at the cost of a little verbosity now and then—is potentially at odds with this approach. I also don’t believe that having options in the form of third-party packages/frameworks is a Bad Thing: you choose the approach that best fits your idioms or requirements.

Wrap

So which approach should you choose for your own projects? It’s going to depend on your use-case (as always). Writing a standalone package and want to provide a request context that the package user can easily access? gorilla/context is probably going to be a good fit (just document the need to call ClearHandler!). Writing something from scratch, or have a net/http app you want to extend easily? Goji is easy to drop in. Starting from nothing? gocraft/web’s “inclusive” approach might fit.

Personally, I like Goji’s approach: I don’t mind writing a couple of helpers to type-assert things I commonly store in a request context (CSRF tokens, usernames, etc), and I get to avoid the global map. It’s also easy for me to write middleware that others can plug into their applications (and for me to use theirs). But those are my use cases, so do your research first!



© 2018 Matt Silverlock | Code snippets are MIT licensed