XMPP Addresses

08 Feb, 2021

This post is copied from an article I originally wrote for Stack Overflow Documentation, a wiki-style documentation site. As it is no longer available, I have copied it here and modified it to work in blog post form.

XMPP addresses, more commonly known as JIDs, are defined in RFC 7622 and identify clients, servers, and other entities on the XMPP network. They look like email addresses, but may have an optional “resourcepart” at the end that identifies a particular client logged in as the account represented by the rest of the address (since XMPP may have multiple clients connected per account). For example, my JID with the resourcepart “blog” would be:

sam@samwhited.com/blog

Unlike email, where the account is the lowest level of entity that can be addressed, individual clients on the XMPP network are globally addressable thanks to the addition of the resourcepart.

JID Types

JIDs are sometimes broken down into two types: the “full JID” and the “bare JID”. Full JIDs always have a resourcepart and uniquely identify an individual client or session on the network:

romeo@example.org/orchard
example.org/da863ab

Bare JIDs never have a resourcepart and identify an account or host on the network:

romeo@example.org
example.org

From a protocol perspective, the presence or absence of a resourcepart or localpart tells you very little: only whether the JID addresses one specific entity, or an account that may have multiple network entities with current sessions.
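The distinction between the two types can be sketched in a few lines of Go. The helper names isFullJID and bareJID are made up for this illustration and are not part of any library; they only look for the first “/” and perform no validation.

```go
package main

import (
	"fmt"
	"strings"
)

// isFullJID reports whether addr has a non-empty resourcepart, that is,
// whether anything follows the first "/". Hypothetical helper; it does
// not validate the address.
func isFullJID(addr string) bool {
	_, rp, found := strings.Cut(addr, "/")
	return found && rp != ""
}

// bareJID returns addr with any resourcepart removed.
// Hypothetical helper; it does not validate the address.
func bareJID(addr string) string {
	bare, _, _ := strings.Cut(addr, "/")
	return bare
}

func main() {
	for _, addr := range []string{
		"romeo@example.org/orchard",
		"example.org/da863ab",
		"romeo@example.org",
		"example.org",
	} {
		fmt.Printf("%-26s full=%-5t bare=%s\n", addr, isFullJID(addr), bareJID(addr))
	}
}
```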

Splitting a JID

To split a JID into its component parts (the localpart, domainpart, and resourcepart), the following algorithm should be used:

1. If the JID does not contain a “/”, the resourcepart is the empty string. Goto 5.
2. Otherwise split on the first “/”; the remainder of the JID is the part before it.
3. If the part after the slash is the empty string, error.
4. Otherwise the resourcepart is the part after the slash.
5. If the remainder of the JID does not contain an “@”, the localpart is empty and the domainpart is the remainder of the JID. Goto 9.
6. Otherwise, split on the first “@”.
7. If the part before the “@” is the empty string, error.
8. Otherwise the localpart is the part before the “@” and the domainpart is the part after the “@”.
9. Trim any “.” suffix from the domainpart.
10. Done!

Note that the localpart and resourcepart are optional and may be empty strings (a JID may consist of just a domainpart).

In Go, the mellium.im/xmpp/jid package implements operations on JIDs. To split a JID string into its component parts, the SplitString function may be used:

lp, dp, rp, err := jid.SplitString("romeo@example.net")

No validation is performed by SplitString and the parts are not guaranteed to be valid.

To manually split a string without depending on the jid package, the underlying code (with annotations from RFC 7622) looks like this:

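// Imports and error values that this excerpt relies on. The error messages
// are placeholders for illustration, not necessarily the package's own text.
import (
	"errors"
	"strings"
)

var (
	errNoResourcepart = errors.New("jid: empty resourcepart")
	errNoLocalpart    = errors.New("jid: empty localpart")
)

func SplitString(s string) (localpart, domainpart, resourcepart string, err error) {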
	// RFC 7622 §3.1.  Fundamentals:
	//
	//    Implementation Note: When dividing a JID into its component parts,
	//    an implementation needs to match the separator characters '@' and
	//    '/' before applying any transformation algorithms, which might
	//    decompose certain Unicode code points to the separator characters.
	//
	// so let's do that now. First we'll parse the domainpart using the rules
	// defined in §3.2:
	//
	//    The domainpart of a JID is the portion that remains once the
	//    following parsing steps are taken:
	//
	//    1.  Remove any portion from the first '/' character to the end of the
	//        string (if there is a '/' character present).
	sep := strings.Index(s, "/")

	if sep == -1 {
		resourcepart = ""
	} else {
		// If the resource part exists, make sure it isn't empty.
		if sep == len(s)-1 {
			err = errNoResourcepart
			return
		}
		resourcepart = s[sep+1:]
		s = s[:sep]
	}

	//    2.  Remove any portion from the beginning of the string to the first
	//        '@' character (if there is an '@' character present).

	sep = strings.Index(s, "@")

	switch {
	case sep == -1:
		// There is no @ sign, and therefore no localpart.
		localpart = ""
		domainpart = s
	case sep == 0:
		// The JID starts with an @ sign (invalid empty localpart)
		err = errNoLocalpart
		return
	default:
		domainpart = s[sep+1:]
		localpart = s[:sep]
	}

	// We'll throw out any trailing dots on domainparts, since they're ignored:
	//
	//    If the domainpart includes a final character considered to be a label
	//    separator (dot) by [RFC1034], this character MUST be stripped from
	//    the domainpart before the JID of which it is a part is used for the
	//    purpose of routing an XML stanza, comparing against another JID, or
	//    constructing an XMPP URI or IRI [RFC5122].  In particular, such a
	//    character MUST be stripped before any other canonicalization steps
	//    are taken.

	domainpart = strings.TrimSuffix(domainpart, ".")

	return
}
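
To see how the edge cases fall out, here is a compact, self-contained restatement of the same steps, run over a few inputs. The name splitJID and the two error values are assumptions made for this sketch; it is not the package's code, only an illustration that uses strings.Cut (Go 1.18 or later).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Placeholder errors for this sketch.
var (
	errEmptyResourcepart = errors.New("empty resourcepart")
	errEmptyLocalpart    = errors.New("empty localpart")
)

// splitJID is a hypothetical, compact restatement of the splitting steps.
// Like the function above, it performs no validation of the parts.
func splitJID(s string) (localpart, domainpart, resourcepart string, err error) {
	// Match the first "/" before anything else, so that an "@" inside the
	// resourcepart is never mistaken for the localpart separator.
	rest, rp, hasSlash := strings.Cut(s, "/")
	if hasSlash && rp == "" {
		return "", "", "", errEmptyResourcepart
	}

	lp, dp, hasAt := strings.Cut(rest, "@")
	switch {
	case !hasAt:
		// No "@": Cut left the whole remainder in lp, but it is the domainpart.
		lp, dp = "", rest
	case lp == "":
		return "", "", "", errEmptyLocalpart
	}

	// Strip a single trailing dot from the domainpart.
	return lp, strings.TrimSuffix(dp, "."), rp, nil
}

func main() {
	for _, in := range []string{
		"romeo@example.org/orchard",
		"example.org.",
		"romeo@example.org/a@b/c",
		"romeo@example.org/",
		"@example.org",
	} {
		lp, dp, rp, err := splitJID(in)
		fmt.Printf("%q -> lp=%q dp=%q rp=%q err=%v\n", in, lp, dp, rp, err)
	}
}
```

The third input shows why the slash is matched first: everything after the first “/” belongs to the resourcepart, including any further “@” or “/” characters.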

Validating a JID

Unlike email addresses, JIDs were defined with internationalization (i18n) in mind using the Preparation, Enforcement, and Comparison of Internationalized Strings (PRECIS) framework. PRECIS is defined in RFC 8264 and is a framework for comparing strings safely in a variety of contexts. For instance, imagine you have registered the nickname “Richard IV” (Latin capital letters I and V) in a group chat. Using PRECIS, the chat application could ensure that no one else comes along and registers the nickname “Richard Ⅳ” (the single character U+2163, Roman numeral four) and uses it to impersonate you.
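
To make the problem concrete, the short Go program below prints the code points of the two nicknames. It uses only the standard library and does not implement PRECIS; the helper codePoints is invented for this illustration. It simply shows that a plain byte-wise comparison treats the two look-alike strings as different.

```go
package main

import "fmt"

// codePoints lists the Unicode code points of s in U+XXXX notation.
// Hypothetical helper for this illustration.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	latin := "Richard IV"       // ends in the two letters I and V
	numeral := "Richard \u2163" // ends in the single character U+2163

	fmt.Println(latin == numeral)     // false
	fmt.Println(codePoints("IV"))     // [U+0049 U+0056]
	fmt.Println(codePoints("\u2163")) // [U+2163]
}
```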

The algorithm for validating a JID that has already been split into its localpart, domainpart, and resourcepart is as follows:

1. Take the localpart, domainpart, and resourcepart as the inputs.
2. If the localpart or resourcepart is not valid UTF-8, error.
3. Otherwise run the IDNA “To Unicode” algorithm on the domainpart.
4. If the output domainpart is not valid UTF-8, error.
5. Otherwise run the PRECIS “UsernameCaseMapped” profile on the localpart, and the “OpaqueString” profile on the resourcepart.
6. Perform other validation steps (in no particular order).

The final validation step should perform the following checks (in no particular order):

Check that the localpart is less than 1024 bytes.
Check that the localpart does not contain any of the following characters: " & ' / : < > @
Check that the resourcepart is less than 1024 bytes.
Check that the domainpart is greater than zero bytes and less than 1024 bytes (and possibly validate that parts of the domainpart meet hostname, DNS, or IP address requirements as appropriate).
If the domainpart is a valid IPv6 address, ensure that it uses bracketed notation (e.g. [::1] instead of ::1).

Conclusion

As a user of XMPP, you will mostly find that addresses just look like email addresses, and you won’t need to think about them except when occasionally adding one to your address book. Even as an XMPP developer you mostly won’t need to worry about the details of splitting or validating JIDs, as robust libraries for handling XMPP addresses exist in most or all popular programming languages. However, sometimes it’s fun to take a bit of a dive into seemingly simple technologies and learn how they work under the hood. Hopefully this article gave you a deeper appreciation of how to handle addresses in XMPP.
