$ emrebener

Interpreter Pattern

author: emre bener read time: 6 min about: interpreter pattern, java repository: https://github.com/Emrebener/GoF-Design-Patterns-Behavioural
published: updated: mentions: software design pattern, behavioral pattern, abstract syntax tree, backus–naur form, visitor pattern, antlr

1. What the interpreter pattern is

The Interpreter pattern represents each rule of a small grammar as its own class, so a sentence in that grammar becomes a tree of objects. You evaluate the sentence by calling a single method, interpret(), on the root of the tree, and each node recursively interprets its children.

That’s the whole pattern. The Gang of Four put it in the behavioural bucket because it’s about how a tree of objects collaborates to compute a result, not about how those objects get built. The grammar’s structure is encoded directly in the class hierarchy: one class per rule, composition for nesting, recursion for evaluation.

Two things up front. “Interpreter” here does not mean a programming-language runtime like the JVM or CPython. It means something much smaller: a way to evaluate expressions in a tiny, fixed grammar you control. And the pattern is one of the least-used in the GoF book, for reasons covered in §6. Still worth knowing, because the idea (a grammar rule maps to a class, a sentence maps to a tree) underlies tools you use every day, even when you’d never hand-write the pattern yourself.

2. The boolean-expression grammar

I’ll interpret boolean expressions over named variables: a small grammar with AND, OR, NOT, literals, and variables. The pattern needs a concrete grammar before any code makes sense, and a good one is small, has a handful of rules, and is unlikely to grow. The classic teaching example is arithmetic. Boolean expressions fit the same brief, keep the evaluation logic to one line per rule, and still exercise both kinds of grammar rule the pattern distinguishes.

The grammar, in BNF (Backus-Naur Form, the standard notation for grammar rules):

expression  ::= variable | constant | orExpr | andExpr | notExpr
variable    ::= a name bound to true/false in the context
constant    ::= "true" | "false"
orExpr      ::= expression "OR" expression
andExpr     ::= expression "AND" expression
notExpr     ::= "NOT" expression

This grammar has two kinds of rule, and the distinction drives the whole class design:

  • Terminal rules are leaves. variable and constant don’t contain other expressions; they bottom out the recursion.
  • Non-terminal rules are branches. orExpr, andExpr, and notExpr are defined in terms of other expressions, so their classes hold references to child expressions.

A sentence like cloudy AND (NOT raining) is a tree: an AND node whose left child is the variable cloudy and whose right child is a NOT wrapping the variable raining.

3. The expression hierarchy in Java

Every grammar rule becomes a class, and every class implements one shared interface. That interface is the entire contract of the pattern:

public interface Expression {
    boolean interpret(Context context);
}

One method. Given a context (the variable bindings, covered in §4), an expression returns its boolean value. Every node in the tree, leaf or branch, answers the same question the same way.

The terminal expressions implement interpret() without recursing. A constant ignores the context entirely:

public final class Constant implements Expression {
    private final boolean value;

    public Constant(boolean value) {
        this.value = value;
    }

    @Override
    public boolean interpret(Context context) {
        return value;
    }
}

A variable is also a leaf, but it does consult the context to look up its binding:

public final class Variable implements Expression {
    private final String name;

    public Variable(String name) {
        this.name = name;
    }

    @Override
    public boolean interpret(Context context) {
        return context.lookup(name);
    }
}

The non-terminal expressions are where the recursion lives. Each one holds its child expressions as fields and, in interpret(), evaluates those children before combining the results. And and Or hold two children each:

public final class And implements Expression {
    private final Expression left;
    private final Expression right;

    public And(Expression left, Expression right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean interpret(Context context) {
        return left.interpret(context) && right.interpret(context);
    }
}

Or is identical apart from the || operator, so I won’t repeat it. Not holds a single child and inverts it:

public final class Not implements Expression {
    private final Expression child;

    public Not(Expression child) {
        this.child = child;
    }

    @Override
    public boolean interpret(Context context) {
        return !child.interpret(context);
    }
}

Notice that no class knows the concrete type of its children. And holds two Expression references; whether a child is a Variable, another And, or a Not is invisible to it. That’s what lets the same five classes compose into arbitrarily deep trees. The recursion terminates because every path down the tree eventually hits a terminal expression, which returns without calling interpret() again.

«interface»Expressionterminal expressionsnon-terminal expressionsConstantVariableAndOrNot«interface»Expressionterminal expressionsnon-terminal expressionsConstantVariableAndOrNot

expression: cloudy AND (NOT raining)AndVariablecloudyNotVariablerainingleftrightchildgreen = terminal (leaf) purple = non-terminal (branch)expression: cloudy AND (NOT raining)AndVariablecloudyNotVariablerainingleftrightchildgreen = terminal (leaf) purple = non-terminal (branch)

4. The context

The context carries everything an expression needs that isn’t baked into the tree itself. For this grammar that’s just the variable bindings: which names are currently true and which are false. It’s passed into interpret() as a parameter rather than stored on the expression nodes, and that choice matters.

public final class Context {
    private final Map<String, Boolean> bindings = new HashMap<>();

    public Context bind(String name, boolean value) {
        bindings.put(name, value);
        return this;
    }

    public boolean lookup(String name) {
        Boolean value = bindings.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unbound variable: " + name);
        }
        return value;
    }
}

Keeping bindings in the context, not in the Variable nodes, means the same tree can be evaluated against different inputs. Build the tree for cloudy AND (NOT raining) once, then interpret it with cloudy=true, raining=false and again with cloudy=true, raining=true without rebuilding anything. If Variable stored its own value, the tree would be a single-use throwaway. The tree is the grammar sentence, the context is the environment it runs in, and separating the two is what makes the sentence reusable.

In a richer interpreter the context grows. An arithmetic grammar would store numeric variables; a scripting grammar might hold function definitions or an output buffer. The principle holds regardless: state that varies per evaluation goes in the context, structure that’s fixed once parsed goes in the tree.

5. Building and evaluating the tree

The Interpreter pattern covers the tree and its evaluation. It deliberately says nothing about how the tree gets built from input text. That’s the parser’s job, and the GoF book is explicit that parsing is out of scope for the pattern. The pattern assumes the tree already exists.

The simplest way to get a tree, and the honest one for a small fixed expression, is to construct it by hand:

// cloudy AND (NOT raining)
Expression sentence = new And(
        new Variable("cloudy"),
        new Not(new Variable("raining")));

Context context = new Context()
        .bind("cloudy", true)
        .bind("raining", false);

boolean result = sentence.interpret(context); // true

The nesting of the constructor calls is the shape of the tree. new And(left, right) with a new Not(...) as the right argument produces exactly the AST (abstract syntax tree) from §2. Evaluation is the single sentence.interpret(context) call, which fans out recursively through the nodes.

For input that arrives as a string at runtime, you’d write a parser that produces the same object tree. That parser is a separate component with its own design; it could be a hand-written recursive-descent routine or generated by a tool. The key point is the boundary: the parser’s output and the interpreter’s input are the same Expression tree, and neither side needs to know how the other works. Keeping that seam clean is what lets you swap a hand-rolled parser for a generated one later without touching a single expression class.

6. When not to use it

The Interpreter pattern is the least-used pattern in this series, and the reasons are structural.

  • The class count explodes with the grammar. One class per rule is fine for five rules. A grammar with full arithmetic, comparisons, precedence levels and function calls is dozens of rules, hence dozens of nearly-identical small classes. The pattern scales linearly in classes with grammar size, and that gets unwieldy fast.
  • It only covers evaluation, and only one kind. The tree knows how to interpret itself. The moment you want a second operation over the same tree (pretty-printing it, type-checking it, optimising it) you’re adding another method to every one of those classes, or reaching for the Visitor pattern to keep operations separate from the node classes. For anything beyond a single evaluation pass, Visitor over a plain AST is usually the better structure.
  • It says nothing about parsing or errors. Real languages need a lexer, a parser, precedence handling, and useful error messages. The pattern hands you none of that. You build it all yourself, around the edges of the part the pattern actually covers.
  • Parser generators exist. Past a handful of rules, a tool like ANTLR generates the lexer, the parser, and the tree-walking scaffolding from a grammar file. It’s less code, it handles precedence and error recovery, and the grammar lives in one readable place instead of being smeared across a class hierarchy. I’d reach for a generator long before hand-writing twenty expression classes.

So the pattern earns its place in a narrow band: the grammar is small, fixed, and unlikely to grow; there’s exactly one operation (evaluation) you need over it; and pulling in a parser-generator dependency would be heavier than the problem deserves. Think a configuration filter, a feature-flag rule, a small query predicate, or an allow/deny expression in a security policy. Hit that band and the pattern is clean and direct. Miss it, especially on the “unlikely to grow” clause, and you’ve signed up to maintain a grammar as a sprawl of classes when a .g4 file would have done.