XMPP in Go

23 Jun, 2020

Note: This post is about the mellium.im/xmpp project, an XMPP library for Go. Some parts of this post may apply exclusively to v0.16.0. Because this version is pre-1.0, things may change by the time you read this. The most up-to-date version of this document can always be found at mellium.im/docs/overview.

Go is a great language if you need to get a project off the ground quickly. It has its confusing aspects, and its type system allows for lots of abuse thanks to optional dynamic typing, but overall it’s easy to read and well suited to quickly building projects that value clear code over absolute type safety. Similarly, XMPP¹ has its warts, but overall is the best choice to get a chat product off the ground quickly if you want a system that’s well understood and has a robust ecosystem and sustainable standards body.

Since Go shines at handling I/O bound services (like asynchronous network protocols used for instant messaging), an XMPP library in Go seems like a great fit. There are a handful of libraries to handle XMPP already in existence, but most of them are small high-level libraries designed only to work with the legacy version of XMPP that was supported by Google Talk, or don’t follow Go idioms and best practices. When I started looking into a Go XMPP implementation around 5 years ago, there wasn’t a low-level library meant to act as a building block from which higher level systems could be created, and that’s what I wanted: the equivalent of the standard library’s net/http, but for XMPP. This is why I created mellium.im/xmpp. This post will be about some of the design decisions I made while building the library, and about some of the trade-offs made along the way.

Stream Features

Let’s start by talking about feature negotiation. An XMPP session can broadly be divided into two parts: the synchronous initial handshake, and the actual asynchronous session. Within this initial handshake, a series of common features are negotiated in a certain order. For example, if TLS isn’t already in use, opportunistic TLS (StartTLS) might be negotiated, followed by authentication. This ends up being a loop where the server sends any features it wants to advertise at the current moment (eg. just TLS), then the client chooses one to negotiate and proceeds with that feature’s specific negotiation steps. Then the server sends another list (possibly with new features, eg. now that TLS has been negotiated the server might advertise that authentication is ready to proceed) and the client selects one and moves forward.

This loop is easy enough to write in Go, but representing the features themselves was tricky. A feature needs to encode the name it goes by when the server lists it, along with any information that should be included in that listing; it needs to parse that payload on the client’s side; and it needs to perform the actual negotiation on both the server’s and the client’s side. To handle this I created a struct containing three functions: one for listing the feature, one for parsing the features list, and one for negotiating the feature. This means less boilerplate and more type safety than using an interface to represent a stream feature. It also makes it less likely that a user of this API will get confused and write stateful stream features, but if necessary the functions can still close over external state or resources (but don’t do this; you may think you need it, but you’re almost certainly wrong).

type StreamFeature struct {
	Name xml.Name

	Necessary SessionState
	Prohibited SessionState

	List func(ctx context.Context, e xmlstream.TokenWriter, start xml.StartElement) (req bool, err error)
	Parse func(ctx context.Context, r xml.TokenReader, start *xml.StartElement) (req bool, data interface{}, err error)
	Negotiate func(ctx context.Context, session *Session, data interface{}) (mask SessionState, rw io.ReadWriter, err error)
}

We also need a way to encode the order features should appear in (eg. auth should not be attempted before TLS). I decided that features would order themselves based on the state of the session at the moment when feature negotiation happens. The feature says what properties of the session are or are not allowed, and the thing doing negotiation can determine whether the session currently meets those criteria. Session state information only has four properties that are useful for session negotiation:

Is a security layer in place (eg. TLS),
has authentication been performed,
is feature negotiation complete, and
was the session initiated by a remote entity?

These are part of the SessionState bits, so in the stream features we can encode what bits are necessary and what bits are prohibited and the state machine that handles session negotiation will be able to figure out when to advertise or negotiate the feature using simple bit math.

const (
	// Secure indicates that the underlying connection has been secured. For
	// instance, after STARTTLS has been performed or if a pre-secured connection
	// is being used such as websockets over HTTPS.
	Secure SessionState = 1 << iota

	// Authn indicates that the session has been authenticated (probably with
	// SASL).
	Authn

	// Ready indicates that the session is fully negotiated and that XMPP stanzas
	// may be sent and received.
	Ready

	// Received indicates that the session was initiated by a foreign entity.
	Received

	…
)
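The "simple bit math" can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's code: the SessionState type and constants are redeclared locally, the feature struct is a hypothetical trimmed-down stand-in for StreamFeature, canNegotiate is a made-up helper name, and the masks chosen for StartTLS and SASL are assumptions made for the example.

```go
package main

import "fmt"

// SessionState is redeclared here so the sketch compiles on its own.
type SessionState uint8

const (
	Secure SessionState = 1 << iota
	Authn
	Ready
	Received
)

// feature is a hypothetical stand-in for StreamFeature that keeps only the
// two masks used for ordering.
type feature struct {
	Name       string
	Necessary  SessionState
	Prohibited SessionState
}

// canNegotiate reports whether every necessary bit is set in state and no
// prohibited bit is set.
func canNegotiate(state SessionState, f feature) bool {
	return state&f.Necessary == f.Necessary && state&f.Prohibited == 0
}

func main() {
	// Assumed masks: StartTLS only makes sense before the session is secure,
	// and SASL requires a secure session that is not yet authenticated.
	startTLS := feature{Name: "starttls", Prohibited: Secure}
	sasl := feature{Name: "sasl", Necessary: Secure, Prohibited: Authn}

	var state SessionState // nothing negotiated yet
	fmt.Println(canNegotiate(state, startTLS), canNegotiate(state, sasl)) // true false

	state |= Secure // pretend StartTLS has just been negotiated
	fmt.Println(canNegotiate(state, startTLS), canNegotiate(state, sasl)) // false true
}
```

With masks like these, the order "TLS first, then auth" falls out of the session state itself and never has to be written down as an explicit list.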

Session Negotiation

Once we have a set of features that we can negotiate, we need to do the actual session negotiation. Normally, XMPP negotiates a session over TCP using the features loop that we already described. However, sometimes an alternative mechanism might be required for negotiation, such as the WebSocket subprotocol defined in RFC 7395 or the legacy XEP-0114: Jabber Component Protocol. Generalizing session negotiation meant allowing the user to provide a special negotiator function and writing a default one for the basic XMPP stream negotiation protocol.

type Negotiator func(
	ctx context.Context, session *Session, data interface{},
) (mask SessionState, rw io.ReadWriter, cache interface{}, err error)

Because the negotiator can’t change the session state if it’s written in another package (since the session state bits aren’t exported), it returns any changes it wants to be made to the session such as the new session state mask, or any changes to the underlying reader and writer (eg. if we negotiate StartTLS it might return a new reader and writer that speak TLS). The internal code that calls the negotiator function can then create a new session with the requested changes.

The built-in negotiator can be created with NewNegotiator and supports various options such as setting the stream language and copying the input and output streams somewhere else (such as an XML console):

// StreamConfig contains options for configuring the default Negotiator. 
type StreamConfig struct {
	// The native language of the stream.
	Lang string

	// S2S causes the negotiator to negotiate a server-to-server (s2s) connection.
	S2S bool

	// A list of stream features to attempt to negotiate.
	Features []StreamFeature

	// If set a copy of any reads from the session will be written to TeeIn and
	// any writes to the session will be written to TeeOut (similar to the tee(1)
	// command).
	// This can be used to build an "XML console", but users should be careful
	// since this bypasses TLS and could expose passwords and other sensitive data.
	TeeIn, TeeOut io.Writer
}

// NewNegotiator creates a Negotiator that uses a collection of StreamFeatures
// to negotiate an XMPP client-to-server (c2s) or server-to-server (s2s)
// session.
// If StartTLS is one of the supported stream features, the Negotiator attempts
// to negotiate it whether the server advertises support or not.
func NewNegotiator(cfg StreamConfig) Negotiator

It uses stream features as discussed in the previous section, but custom negotiators could be written that use a different type for stream features, making session negotiation and stream features entirely modular. You could replace them with your own implementations, and still use the xmpp package to handle the lower level XMPP protocol.

An example of a custom stream negotiator can be found in the xmpp/component package which negotiates a XEP-0114: Jabber Component Protocol connection.

Receiving Data

Once the session is negotiated, we need to be able to receive stanzas (the primitive types of XMPP) and other top level XML elements over the session. Because the main xmpp package is meant to be lower level than many other XMPP libraries written for Go, it does not contain callbacks or any way to register handlers for different types of top level XML element.

Instead, it contains a single Session.Serve method that decodes all incoming XML tokens and delegates handling them to a single Handler.

// A Handler triggers events or responds to incoming elements in an XML stream.
type Handler interface {
	HandleXMPP(t xmlstream.TokenReadEncoder, start *xml.StartElement) error
}

The Serve method also handles stanza semantics such as always responding to IQs as required to be compliant with the XMPP protocol. Because the handler is provided with a stream to use when writing back to the session, the underlying library can man-in-the-middle the token stream and check if an IQ response was written and automatically send one if not, or add required IDs to stanzas that are missing them.

This design also keeps the number of methods on the session relatively low and keeps the entire library more modular, because multiplexing elements to more specific handlers can be delegated to other packages such as the built-in xmpp/mux package. If a user wants a more advanced multiplexer that buffers the stream and matches stream elements based on RELAX NG, Clark notation, or XPath, they could write it themselves and it would have all the same powers and access as the built-in muxer! The mux package also provides more specific handlers for the basic XMPP stanza types: IQHandler, MessageHandler, and PresenceHandler, which should be reused by third party multiplexers.

Sending Data

Naturally, receiving data isn’t enough. We also need to send it. This happens by calling methods directly on the Session. The full list of methods for sending data is:

Encode(v interface{}) error
EncodeElement(v interface{}, start xml.StartElement) error
Send(ctx context.Context, r xml.TokenReader) error
SendElement(ctx context.Context, r xml.TokenReader, start xml.StartElement) error
SendIQ(ctx context.Context, r xml.TokenReader) (xmlstream.TokenReadCloser, error)
SendIQElement(ctx context.Context, payload xml.TokenReader, iq stanza.IQ) (xmlstream.TokenReadCloser, error)
TokenWriter() xmlstream.TokenWriteFlushCloser

This collection of methods gives you a low level way to take out a lock on the output stream and write tokens with TokenWriter, a high level way to send Go types without worrying about the underlying XML (the Encode methods), and methods for copying a token reader (provided by many types meant to be marshaled to XML) with the Send methods. The SendIQ methods differ a tiny bit from the other methods because all IQ stanzas in XMPP receive a reply. Having separate methods for SendIQ lets you block a goroutine waiting for that reply so that you can write asynchronous code in a synchronous style, which is Go’s superpower.

One slightly confusing aspect of this is that the Serve goroutine mentioned in the previous section must be running for the SendIQ methods to work. This is because Serve also handles receiving IQs and matching their IDs to the list of sent IQs. Though this is well documented, it is often confusing for new users of the library. In a future version of the library, the SendIQ methods may learn to return an error if Serve isn’t running, but for now their behavior without the Serve goroutine running is undefined (and will likely lead to them blocking forever).

Conclusion

That’s it! You should now be acquainted enough with the xmpp package to follow the examples in the documentation and generally understand how the various more advanced connection mechanisms we didn’t discuss here work. The module does so much more than just low-level XMPP connections though, and more functionality can be found in the subdirectories. If you want to write your own extension (or learn more about why the existing extensions were written the way that they have been), see my previous post on the matter: Extensions in Mellium. Finally, be sure to let me know if you build anything interesting with Mellium!

¹ XMPP is sometimes referred to as “Jabber” for historical reasons. I prefer to refer to the federated network as Jabber and the protocol as XMPP. From this point on, Jabber is to Email what XMPP is to SMTP.