Go and Sum Types

29 Jun, 2019

Introduction

Last year I had the privilege of attending the second annual Go Contributors Summit in Denver, CO. There we got a sneak peek at the new Go 2 Design Documents and had the opportunity to provide early feedback. At the summit I wondered if it would be possible to achieve 90% of what users would get from adding generics to the existing type system without adding as many new concepts as contracts require. Many of the use cases I hear for generics don’t actually require parametric polymorphism at all. In the past I’ve mostly thought that these use cases could be solved by adding some sort of enumerated type that also acts as a sum type, but now I wonder if you need the data side of the equation at all: maybe sum types alone will give us the flexibility we need without having to add new constructs for storing data like an enum.

Syntax

To see how this would work we’ll borrow syntax from TypeScript’s union types (though this syntax may not be the best fit for Go; a real proposal would have to think more about that):

[]byte|string

This represents a distinct type with member types “string” and “[]byte”. Values of the member types, or of other sum types with exactly the same member types, may be converted to the sum type. Assigning values to sum types works just as it would for primitive types and the type definitions that use them. For example, we could create a byte slice and a variable of a defined type based on a byte slice, then assign to either from a literal or from another []byte value:

package main

import "fmt"

type bytes []byte

func main() {
	var a bytes
	var b []byte
	a = []byte{'a'}
	b = []byte{'b'}

	a = b

	fmt.Printf("%T: %[1]s\n", a)
	fmt.Printf("%T: %[1]s\n", b)

	// Output:
	// main.bytes: b
	// []uint8: b
}

And the same applies to sum types and their member types:

type StringOrBytes []byte|string

func main() {
	var a []byte|string
	var b StringOrBytes
	a = []byte{'a'}
	b = "b"

	a = b

	fmt.Printf("%T: %[1]s\n", a)
	fmt.Printf("%T: %[1]s\n", b)

	// Output:
	// []byte|string: b
	// main.StringOrBytes: b
}

Naturally, type conversions are also valid:

// Unnecessary, but valid, type conversions:
a = ([]byte|string)(b)
b = StringOrBytes(a)

Where there would be ambiguity, such as with untyped constants, the default type is used, just as it would be when assigning to interface{} or in some other ambiguous context:

var a, b interface{}
a = 1
b = 2.0

fmt.Printf("%T: %[1]v\n", a) // int: 1
fmt.Printf("%T: %[1]v\n", b) // float64: 2

var c, d int|float64
c = 1   // c is stored as an int
d = 2.0 // d is stored as a float64

If sum types were added to the language, we would also need some way to use them. Go already has a construct similar to pattern matching that works for interfaces: the type switch. Overloading the same syntax for our sum type makes sense:

var a []byte|string = "abc"

switch a.(type) {
case string:
	fmt.Println("String")
case []byte:
	fmt.Println("Bytes")
}

// Output: String
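For comparison, the closest approximation I know of in today's Go is an interface with an unexported marker method, matched with an ordinary type switch. The following is only a rough sketch: the names Str, Bytes, isStringOrBytes, and describe are made up for illustration, and the wrapper types are needed because methods can't be declared on string or []byte directly.

```go
package main

import "fmt"

// StringOrBytes stands in for the proposed []byte|string. The unexported
// marker method means only types in this package can be members.
type StringOrBytes interface{ isStringOrBytes() }

// Str and Bytes are hypothetical wrapper types acting as the two variants.
type Str string
type Bytes []byte

func (Str) isStringOrBytes()   {}
func (Bytes) isStringOrBytes() {}

// describe matches on the variant. The compiler does not check that every
// variant is covered, so a default case guards against nil and new members.
func describe(v StringOrBytes) string {
	switch x := v.(type) {
	case Str:
		return fmt.Sprintf("String: %s", string(x))
	case Bytes:
		return fmt.Sprintf("Bytes: %s", string(x))
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(Str("abc"))) // String: abc
	fmt.Println(describe(Bytes{'a'})) // Bytes: a
}
```

This works, but it costs a wrapper type per member and gives no exhaustiveness checking, which is the gap a real sum type would close.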

When switching on a sum type, leaving off a variant when no default case exists could be a compile error (although this means that when using an assignment all member types must always be covered), or it could simply indicate an empty branch. The second feels more consistent with the rest of the language, but a little compile-time safety goes a long way towards correct programs, so personally I prefer the first. Type assertions would also work just like they do for interfaces:

var a []byte|string = "abc"

// b is a string
b := a.(string)

// ok is false
if _, ok := a.([]byte); ok {
	panic("unreachable code reached")
}

// panic
d := a.([]byte)
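The behavior being borrowed here can be checked against the empty interface in current Go. This is a small sketch rather than anything sum-type specific; mustBytes and inspect are hypothetical helpers, and the recover exists only so the panic can be observed without crashing the program.

```go
package main

import "fmt"

// mustBytes uses the single-value assertion, which panics when v does not
// hold a []byte. The deferred recover reports that panic as a boolean.
func mustBytes(v interface{}) (b []byte, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return v.([]byte), false
}

// inspect summarizes what each assertion form does for v.
func inspect(v interface{}) string {
	s, isString := v.(string) // comma-ok form: never panics
	_, panicked := mustBytes(v)
	return fmt.Sprintf("string=%q ok=%t bytesPanicked=%t", s, isString, panicked)
}

func main() {
	fmt.Println(inspect("abc"))
	// string="abc" ok=true bytesPanicked=true
}
```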

Like the type switch, I think it would be best to make this stricter than type assertions on the empty interface, so that it is a compile-time error to assert that the variable is any non-member type. For example:

var a []byte|string = "abc"

// Does not compile.
b := a.(int)

Similarly, trying to define a method on a sum type would result in an “invalid receiver” error at compile time, just like defining a method on an interface type.

Error handling

Let’s look at how this could be used to improve error handling. I’m not the biggest fan of the way Go traditionally does error handling (though I do think that in real life it works most of the time and is “good enough”). For example, the encoding/xml package has a Token method that looks like the following:

type Token interface{}

func (d *Decoder) Token() (Token, error)

When errors are returned this way, it’s easy to temporarily skip assigning the error value and forget to go back and add error handling. The signature also doesn’t imply that if the error is non-nil, the token is nil; this is only assumed by convention, which is confusing. These values can’t easily be passed around as a pair either (Go has no concept of tuples; these are multiple distinct return values). If you want to make sure the error was handled, right now you have to use static analysis after the fact. But if we had sum types and matching was required, we could guarantee that the error is handled and could pass around the token or error as a single value:

type Token StartElement|EndElement|CharData|Comment|ProcInst|Directive

func (d *Decoder) Token() Token|error
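To make the contrast concrete, here is roughly what consuming tokens looks like with the current encoding/xml API. It is a minimal sketch: tokenKinds and its label strings are made up for illustration, while the Decoder calls and token types are the real ones.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// tokenKinds decodes doc and returns one label per token. Note that the
// error has to be checked by hand, separately from the token, on every call.
func tokenKinds(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var kinds []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return kinds, nil
		}
		if err != nil {
			return kinds, fmt.Errorf("decoding token: %w", err)
		}
		switch t := tok.(type) {
		case xml.StartElement:
			kinds = append(kinds, "start:"+t.Name.Local)
		case xml.EndElement:
			kinds = append(kinds, "end:"+t.Name.Local)
		case xml.CharData:
			kinds = append(kinds, "chardata:"+string(t))
		default:
			// Comment, ProcInst, and Directive are ignored in this sketch.
		}
	}
}

func main() {
	kinds, err := tokenKinds("<a>hi</a>")
	fmt.Println(kinds, err) // [start:a chardata:hi end:a] <nil>
}
```

Nothing stops a caller from writing tok, _ := d.Token() and carrying on, and because Token is an empty interface the type switch can list any type at all.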

Let’s look at an example where we parse a Go modules version string (e.g. v2) and return the major version as an int:

func parseVersion(tag string) int|error {
	switch {
	case len(tag) != 2:
		return fmt.Errorf("invalid version %q, expected length 2", tag)
	case tag[0] != 'v':
		return fmt.Errorf("invalid version %q, expected first char to be 'v'", tag)
	}

	// Let's assume strconv.Atoi returns int|error in this new world too, or
	// that we have some special case for things that return (int, error) so
	// that this is possible.
	return strconv.Atoi(tag[1:])
}

Now when we call “parseVersion” we can pass the result around, and if we use it we have to handle the error:

ver := parseVersion("v2")
switch v := ver.(type) {
case int:
	logger.Printf("Found major version: %d", v)
case error:
	logger.Printf("Error parsing major version: %q", v)
}

We could also type assert if we don’t care about the actual error. This is useful in tests where a panic is acceptable, or in places where we don’t care about the value of the error and just want to check if an error occurred (so that we can use a default value, for instance):

ver, ok := parseVersion("v2").(int)
if !ok {
	// Use a default.
	ver = 1
}
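The same flow can be approximated in today's Go by returning an empty interface, which gives a feel for the ergonomics even though the compiler no longer knows that only int and error are possible. This is a sketch under that assumption; majorOrDefault is a hypothetical helper added to show the comma-ok fallback.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseVersion returns either an int or an error inside an empty interface,
// standing in for the proposed int|error return type.
func parseVersion(tag string) interface{} {
	switch {
	case len(tag) != 2:
		return fmt.Errorf("invalid version %q, expected length 2", tag)
	case tag[0] != 'v':
		return fmt.Errorf("invalid version %q, expected first char to be 'v'", tag)
	}
	// Without sum types, the (int, error) pair must be unpacked by hand.
	major, err := strconv.Atoi(tag[1:])
	if err != nil {
		return err
	}
	return major
}

// majorOrDefault ignores the error's value and falls back to 1.
func majorOrDefault(tag string) int {
	major, ok := parseVersion(tag).(int)
	if !ok {
		return 1
	}
	return major
}

func main() {
	switch v := parseVersion("v2").(type) {
	case int:
		fmt.Printf("Found major version: %d\n", v)
	case error:
		fmt.Printf("Error parsing major version: %q\n", v)
	}
	fmt.Println(majorOrDefault("x9")) // 1
}
```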

Zero Values

Sum types also introduce some ambiguity which must be resolved. For example, what is the zero value of bar in the following snippet?

// What is the zero value of bar?
var bar int|error

Because this will often be used for error handling, I think the best way to handle this would be to let the zero value be the zero value of the first variant type:

var bar int|error
fmt.Printf("%T: %[1]v", bar)

// Output: int|error: 0

This way we can choose the default return value, and even encode a concept of “no return” (using a nil error value). This may be useful for replacing sentinel values that aren’t actually errors, like io.EOF:

// EOFReader is an io.Reader that never reads any data and acts as if it's
// already reached the end of the data.
type EOFReader struct{}

func (EOFReader) Read(p []byte) (error|int) {
	var result error|int
	return result
}
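For contrast with the first-variant rule suggested in this section, the nearest stand-in in current Go, an interface, always has a nil zero value that holds no variant at all. A tiny sketch (zeroDescription is a made-up helper) shows what the same Printf reports today:

```go
package main

import "fmt"

// zeroDescription formats the zero value of an empty interface using the
// same verbs as the bar example above.
func zeroDescription() string {
	var bar interface{}
	return fmt.Sprintf("%T: %[1]v", bar)
}

func main() {
	fmt.Println(zeroDescription()) // <nil>: <nil>
}
```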

Untyped Nil

Sum types also lead to ambiguity around untyped nils:

var a error|interface{}

// What does this assignment do?
a = nil

Luckily, we already have precedent for this case. If we try to build a program containing the snippet:

a := nil

We currently get the compile error “use of untyped nil”. The same error could apply when assigning an untyped nil to a sum type. It may even be possible to relax this rule in the future if only one variant is nil-able. We could also use the zero-values rule and make nil always take the type of the first nil-able variant, but this seems unnecessarily confusing to me. I would be curious to know if anyone can think of a compelling use case for anything other than failing to compile on untyped nils.

Ambiguous interfaces

Finally, consider the following:

// Which variant type is used for the stored value?
var rw io.Reader|io.Writer = struct{
  io.Reader
  io.Writer
}{}

Without some rules we don’t know which variant type is used when storing the value, because the assigned type implements both interfaces. We can’t create a concept of specificity here (where the interface with the most matching methods is selected) because the assigned type may implement both variant types exactly. For example, if we had two different interfaces that both had an identical Read method, we would be implementing both.
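Today's type switch already runs into a version of this problem and resolves it by case order: the first matching case wins. The sketch below uses *bytes.Buffer, which implements both interfaces; newBoth, readerFirst, and writerFirst are hypothetical names used only for this demonstration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// newBoth returns a *bytes.Buffer, whose method set satisfies both
// io.Reader and io.Writer.
func newBoth() *bytes.Buffer { return bytes.NewBufferString("data") }

// readerFirst and writerFirst differ only in the order of their cases.
func readerFirst(v interface{}) string {
	switch v.(type) {
	case io.Reader:
		return fmt.Sprintf("%T matched io.Reader", v)
	case io.Writer:
		return fmt.Sprintf("%T matched io.Writer", v)
	}
	return "no match"
}

func writerFirst(v interface{}) string {
	switch v.(type) {
	case io.Writer:
		return fmt.Sprintf("%T matched io.Writer", v)
	case io.Reader:
		return fmt.Sprintf("%T matched io.Reader", v)
	}
	return "no match"
}

func main() {
	fmt.Println(readerFirst(newBoth())) // *bytes.Buffer matched io.Reader
	fmt.Println(writerFirst(newBoth())) // *bytes.Buffer matched io.Writer
}
```

Order-dependence is tolerable in a switch, where the order is visible at the use site, but it is less obviously acceptable for deciding which variant a sum type stores.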

My initial thought is that this type of ambiguity would be ignored as long as it is never a problem. The following would be fine up until w is defined:

var rw io.Reader|io.Writer = struct{
  io.Reader
  io.Writer
}{}

// No variant is selected here.

// This is okay, and implies that io.Reader is the variant used by rw.
var r io.Reader = rw

// This is not okay because rw has had the io.Reader variant selected. We need
// to assert that rw is also an io.Writer.
var w io.Writer = rw

The exact rules for what is and is not allowed here would have to be worked out. It may also make sense to just disallow creating sum types with conflicting interfaces from the get-go and require that the author fix the ambiguity.

Conclusion

Most of the time when I hear someone complaining that Go doesn’t have generics, they really just want a more expressive type system (as do I) and don’t need generics in particular. Thinking through alternative ways to add expressiveness to the type system is always fun, and I think a concept of sum types is a good way to do it while still maintaining Go’s simplicity.

Update

2019-07-14: It was recently pointed out to me that this post is very similar to some of the discussion from issue 19412. This previous discussion may also be a valuable resource to anyone considering how sum types would work in Go.