Good, specific type hints with a good type checker can make programming much easier. However, the most common languages that we use sometimes are missing useful type features. “Existential types” is one such feature. Its main purpose is to abstract away the underlying representation of a data type, while still exposing the operations you can use that data type for. In other words, it hides what the data is, but exposes how you can use it. In that way, it shares a lot in common with abstract classes, but existential types work in some cases where abstract classes fail.
While most mainstream languages don’t have existential types built in, it turns out that many of them contain the tools that allow you to build existential types yourself. In this post I’ll show you why you might want to use existential types (feel free to skip if you’re already familiar with them from other languages), and how you can go about encoding them in Python1 (TypeScript implementation in the appendix).
What are Existential Types and Why Would you Want Them?
Let’s start with a problem. Let’s say you want to encode the idea of a graph into your program. Graphs have nodes which have neighbors, so let’s define a few basic operations that we want for our graph.
- First, if you give me a node, I should be able to give you the set of all neighbors of that node.
- Second, if you give me a set of nodes, I should be able to give you the set of all neighbors of all those nodes. This second operation should be definable in terms of the first (and in some cases it is), but, as we’ll see, sometimes it makes sense to define it separately for efficiency reasons.
- Finally, we’ll need a node to start with, so we’ll say each of our graphs has a “source” node.
We want this to be abstract, in that it really shouldn’t matter how exactly the nodes/graphs are represented, as long as we can do these operations. Maybe we have two different graph implementations: one using a linked-list style in-memory graph, and another where the nodes are represented by rows in a SQL database.
Alright, seems easy enough. Let’s see how we might define this in Python.
First Attempt: Abstract Node Class
Let’s try simply defining an abstract Node class:
class Node(ABC):
@abstractmethod
def get_neighbors(self) -> set[Self]: ...
@classmethod
@abstractmethod
def get_source(cls) -> Self: ...
### get_all_neighbors???This works pretty well until we try to define the “all neighbors” operation2. We can easily define abstract methods for get_neighbors and get_source, but what should the type of get_all_neighbors be? We could try something like this:
@classmethod
@abstractmethod
def get_all_neighbors(cls, nodes: set[Self]) -> set[Self]: ...But we’ll run into trouble when we try to implement it.
class InMemoryNode(Node):
# other methods omitted
@classmethod
def get_all_neighbors(cls, nodes: set[X]) -> set[X]: # what do we put for 'X' here?
return set().union(*(node._get_neighbors() for node in nodes))What could we put for X? If we put Self (or equivalently, InMemoryNode), our type checker will reject it since the abstract class wants set[Node], not set[InMemoryNode]3. If we instead try to put Node, our type checker would be happy with us, but we’ve introduced a subtle issue: Now InMemoryNode.get_all_neighbors can accept any Node type. Why is this an issue? Going back to our problem statement, we might want to also encode graphs that use database rows as nodes. If we were to use InMemoryNode.get_all_neighbors on those kinds of nodes, we’d end up sending a separate database call per node, which is super inefficient!
This reveals another constraint on our problem: for the “get all neighbors” operation to make sense as an abstract operation, implementations need to be able to only take in nodes of its own type.
Second Attempt: A Generic Graph Protocol
Let’s try something else then. What if we made a Graph protocol4 with a generic type parameter for the node type:
class Graph[NodeT](Protocol):
def get_neighbors(self, node: NodeT) -> set[NodeT]: ...
def get_all_neighbors(self, nodes: set[NodeT]) -> set[NodeT]: ...
def get_source(self) -> NodeT: ...So far so good, but this introduces another issue: now anything that wants to use a Graph has to provide it with a type parameter. There are three ways to handle this:
- Give the graph a concrete type, like
Graph[int]: but this defeats the purpose of having it be abstract. We want consumers to not have to worry about what particular kind ofGraphit is using. - Leave the type argument out, or use
Any: but this gets rid of our type-checker’s ability to help us find errors. - Give it a
TypeVar, likeGraph[OuterNodeT]: butOuterNodeTwould need to be defined somewhere. The most obvious place is to make the function/class that is using the graph generic as well. For functions and methods, this isn’t a big deal, but if we wanted to define some other class that has aGraphas a field, that class would now also need a generic node type parameter. And any class that has that class as a field, and so on. TheNodeTtype parameter is now infecting anything that contains aGraph. Gross.
Enter: Existential Types
What we actually need is some way, in the type system, to say “There is some type T, and some operations that you can do with type T, but don’t worry about what T actually is. Just use the operations I provide, and you will be fine”. That is what an “existential type” is. Now, the bad news is that Python (and most mainstream programming languages5) don’t actually have existential types built in. The good news is there is a way to encode them using features that Python does have6.
How to Encode Existential Types in Python
There’s some fun theory behind how we can write existential types in python, but for those of you who just want to see how to do it, here we go:
class Graph(Protocol):
class Continuation[OutT](Protocol):
def __call__[NodeT](
self,
source: NodeT,
get_neighbors: Callable[[NodeT], set[NodeT]],
get_all_neighbors: Callable[[set[NodeT]], set[NodeT]]
) -> OutT: ...
def run[OutT](self, continuation: Continuation[OutT]) -> OutT: ...This looks pretty strange! But let’s break down what’s going on here. The Continuation defines the type for a function (hence the __call__) saying “give me a source node, along with functions for the two operations, and I’ll do something with them and return something of type OutT”. Anything you could want to do with graph nodes can be done in a function whose signature matches Continuation. run, as we’ll see, is mostly just there to make sure we tie the source node and its operations together.
To get a better idea of what’s going on, let’s look at how you would use such a Graph. Let’s say you want to get all nodes that are 2 steps away from the source node and print them out. Here’s how you’d do that:
def print_two_step_neighbors(graph: Graph) -> None:
def continuation[NodeT](source: NodeT, get_neighbors: Callable[[NodeT], set[NodeT]], get_all_neighbors: Callable[[set[NodeT]], set[NodeT]]) -> None:
neighbors = get_neighbors(source) # set[NodeT]
two_step_neighbors = get_all_neighbors(neighbors) # set[NodeT]
for n in two_step_neighbors:
print(n)
graph.run(continuation)All of the node handling is done inside a continuation helper function, which is then given to graph to execute. It is a bit unfortunate that we need to use this kind of indirect-helper-function-style7 to get this to work, but it does solve the problem1. Some things to note:
continuationis the only place whereNodeTneeds to be defined.print_two_step_neighborsitself has no idea thatNodeTeven exists, and neither would anything consumingprint_two_step_neighbors. No more “infection”!- The names
continuationandNodeThere aren’t important, I just use them for consistency. continuationreturnsNonehere, but in principle it could return anything! For instance, you could imagine a very similar function that instead of printing the nodes, just returns their string representations.
Now what does an implementation of Graph look like?
@dataclass
class InMemoryNode:
neighbors: set['InMemoryNode']
@dataclass
class InMemoryGraph(Graph):
source: InMemoryNode
def _get_neighbors(self, node: InMemoryNode) -> set[InMemoryNode]:
return node.neighbors
def _get_all_neighbors(self, nodes: set[InMemoryNode]) -> set[InMemoryNode]:
return set().union(*(self._get_neighbors(node) for node in nodes))
def run[OutT](self, continuation: Graph.Continuation[OutT]) -> OutT:
return continuation(self.source, self._get_neighbors, self._get_all_neighbors) # this is where the magic happens!
# and a database version for good measure
@dataclass
class DbGraph(Graph):
source: str
def _get_neighbors(self, row_id: str) -> set[str]:
rows = db.query("SELECT dst FROM edges WHERE src = ?", (row_id,))
return {str(id) for (id, ) in rows} # `db` defined elsewhere
def _get_all_neighbors(self, row_ids: set[str]) -> set[str]:
rows = db.query(f"SELECT DISTINCT dst FROM edges WHERE src IN ({','.join('?'*len(row_ids))})", list(row_ids)) # note the separate, more efficient implementation
return {str(id) for (id,) in rows}
def run[OutT](self, continuation: Graph.Continuation[OutT]) -> OutT:
return continuation(self.source, self._get_neighbors, self._get_all_neighbors)
# ----- USAGE -------
# "diamond" shaped graph
node_d = InMemoryNode([])
node_c = InMemoryNode([node_d])
node_b = InMemoryNode([node_d])
node_a = InMemoryNode([node_b, node_c])
memory_graph = InMemoryGraph(node_a)
db_graph = DbGraph()
print_two_step_neighbors(memory_graph)
print_two_step_neighbors(db_graph)The names of the private methods and variables don’t matter here. What matters is that they can be passed into continuation in run. Because of how we defined Continuation, that function call forces source, get_neighbors, and get_all_neighbors to use the same node type, without having to actually explicitly declare that node type anywhere!
Template for Existential Types
In general, here is how you define an existential type in Python:
class {ModuleName}(Protocol):
class Continuation[OutT](Protocol):
def __call__[{RepresentationT}](self, {list_of_operations_that_use_RepresentationT}) -> OutT: ...
def run[OutT](self, continuation: Continuation[OutT]) -> OutT: ...
class {ImplementationName}({ModuleName}):
{implementations_of_operations}
def run[OutT](self, continuation: {ModuleName}.Continuation[OutT]) -> OutT:
return continuation({operations})Note that run will look basically identical in every implementation. This is an unfortunate bit of boiler-plate you’ll have to deal with to use existential types.
Conclusion
Existential types allow a general way to define a type with a list of operations on that type, without having to tell consumers what the type actually is. This is useful for abstraction and decomposition. Python and other mainstream programming languages don’t have existential types built in, but some of them do have features that let you encode them. There are some downsides to doing so, including extra boilerplate for the run functions and the indirect style. Most of the time, using simple abstract classes or generic protocols will get you what you need. But if you’re ever in a situation where 1) you need operations that take multiple instances of your abstract data type, or use the abstract data type in wrapper types and 2) you need to store a value of the abstract type in a field of a non-generic class, existential types can come to the rescue.
Appendix
TypeScript Implementation
There are lots of different ways you could do this in TypeScript, but it boils down to the same pattern: Define a “continuation” function type, along with a module that has a function that ties the operations together. TypeScript allows us to nest the continuation directly inside the main module function. Here is an example:
type Graph = <OutT>(
continuation: <NodeT>(
source: NodeT,
getNeighbors: (node: NodeT) => Set<NodeT>,
getAllNeighbors: (nodes: Set<NodeT>) => Set<NodeT>
) => OutT
) => OutT
type InMemoryNode = {
neighbors: Set<InMemoryNode>
}
function makeInMemoryGraph(source: InMemoryNode): Graph {
function getNeighbors(node: InMemoryNode) {
return node.neighbors
}
function getAllNeighbors(nodes: Set<InMemoryNode>) {
return Array.from(nodes).reduce((acc, curr) => acc.union(curr.neighbors), new Set<InMemoryNode>()) // union requires ES2025
}
return continuation => continuation(source, getNeighbors, getAllNeighbors)
}Using that pattern, the template would be
type ${ModuleName} = <OutT>(
continuation: <${RepresentationT}>(
${operations using RepresentationT}
) => OutT
) => OutT
function make${ImplementationName}(${constructor parameters}): ${ModuleName} {
${operation definitions}
return continuation => continuation(${operations})
}Theoretical Background
Existential types are called “existential” because they are the type-theory equivalent of existential quantifiers in logic: (“there exists some x such that proposition P is true”). Its counterpart is “universal quantification”: (“for all xs, proposition P is true”). In logic, if you have access to , you can define like so: (so means “give me a proof that you can use P to prove y with any x substituted in the P, and I’ll tell you that y is true”). This makes sense. You’re saying that “anything I could want to do with proposition P” (the part, corresponding to the operations on x in the code) I should be able to do regardless of the x that is used in P. Another way of putting it: it’s enough for me to know that there exists some x such that P is true. I don’t have to know what x actually is to then go on to use P.
What does this have to do with programming? The Curry-Howard Correspondence tells us that logical propositions are the same thing as types. So, if we can define a logical proposition with , that tells us there is a corresponding type using existentials. And luckily, many mainstream languages have the equivalent of in their type system, which allows us to use it to encode : namely, generics (sometimes called type polymorphism). This is where the template came from:
class {ModuleName}(Protocol):
class Continuation[OutT](Protocol): # OutT <-> y, "forall y"
def __call__[{RepresentationT}](self, # RepresentationT <-> x, "forall x"
{list_of_operations_that_use_RepresentationT} # "P"
) -> OutT: ... # " -> y"
def run[OutT](self, continuation: Continuation[OutT]) -> OutT: ... # "forall y. (Continuation y) -> y".The nested in the definition is why encoding existential types requires the type system to support rank-2 polymorphism (see footnote6). The term is why we switch to continuation passing style, (see footnote7, implications correspond with functions/continuations). The term “Module” here comes from Mitchell & Plotkin’s “Abstract Types have Existential Type”.
One more note: in the template, Continuation is defined separately, then applied in the run signature. Python requires this because Protocols are, as far as I know, the only way to do rank-2 polymorphism like this in Python. If your language has ways to do rank-2 polymorphism inline, you don’t need to separate them out. TypeScript can do inline rank-2 polymorphism, which is why its template is simpler:
type ${ModuleName} = <OutT>( // OutT <-> y. "forall y"
continuation: <${RepresentationT}>( // RepresentationT <-> x, "forall x"
${operations using RepresentationT} // "P"
) => OutT // " -> y"
) => OutT // " -> y"Footnotes
-
I use pyright for my type checker. Unfortunately, mypy doesn’t consistently work with the generic
__call__’s that I use here as of the time of writing. ↩ ↩2 -
In fact, it’s the “all neighbors” operation, and things like it, that mean abstract classes sometimes fail where existential types succeed. Abstract classes work as long as the operations you’re performing only ever use a single, raw instance of the data type you’re trying to represent. They allow you to access that instance through Python’s
self, TypeScript’sthis, or similar. But as soon as you need to use two or more instances, or you need to use the type wrapped in some other generic type (setin our case), abstract classes break down. ↩ -
The actual error you’d get if you tried this is a variance error.
set[Node] </: set[InMemoryNode]becauseset’s type parameter is invariant. If you switch toContainerto get covariance instead, you get the error I described, which isMethod "get_all_neighbors" overrides class "Node" in an incompatible manner\n\tParameter 2 type mismatch: base parameter is type "Container[Node]", override parameter is type "Container[InMemoryNode]"↩ -
In case you’re not familiar: for the purposes of this post, you can think of
Protocols as abstract classes. The important difference here is thatProtocols don’t require explicit inheritance, which is important for ourContinuations later. I start using them now in order to not surprise you. ↩ -
Some do, such as Rust’s
dyn Trait, or Java’s wildcards. Function-oriented languages such as Haskell and ML also often include them. ↩ -
Any language that has “rank-2 polymorphism” can encode existential types. “Rank-2 polymorphism” just means you can define the type for a generic function whose parameters and/or return value are themselves generic functions. ↩ ↩2
-
This is called “continuation passing style” or CPS. It’s actually very powerful! In some ways it’s more expressive than “direct style”, but it can definitely be harder to read. JavaScript likes to use these quite a bit, often referring to them as “callbacks”. ↩ ↩2