The Closed World Assumption: Not a Philosophy, a Proof Theory

There is a persistent tendency to frame the debate between the Closed World Assumption (CWA) and the Open World Assumption (OWA) as a philosophical disagreement: a choice between epistemological stances, between humility about what we know and confidence about what we don't. In the tradition of Reiter, Date, and Darwen, the question is formal rather than philosophical, and that formal reading is worth taking seriously. The distinction was introduced in the database literature as a proof-theoretic one, even if it was later often discussed in epistemic or philosophical terms. It was established not by philosophers but by a logician working on database query evaluation, and it has precise formal consequences.

Reiter, 1978

The term Closed World Assumption was coined by Raymond Reiter in a 1978 paper, "On Closed World Data Bases," published in the volume Logic and Data Bases (Gallaire and Minker, eds., Plenum Press). This was not a work of epistemology. It was a formal treatment of what a deductive system is licensed to conclude when querying a database.

Reiter defined both assumptions precisely. The Open World Assumption corresponds to "the usual first order approach to query evaluation: given a database DB and a query Q, the only answers to Q are those which obtain from proofs of Q given DB as hypotheses." The Closed World Assumption goes further: "certain answers are admitted as a result of failure to find a proof. More specifically, if no proof of a positive ground literal exists, then the negation of that literal is assumed true."

These are proof-theoretic definitions. The question they answer is: what is a formal inference system entitled to conclude from a given set of facts? That is a question about the mechanics of logical deduction, not about the nature of reality or the proper attitude of a knowledge-seeker.

Note also the asymmetry in naming. Reiter coined Closed World Assumption as a marked, named concept precisely because he was introducing a departure from the standard approach. What we now call the Open World Assumption largely corresponds to the unaugmented proof behavior of first-order query evaluation in Reiter's setup: the default that existed before his intervention. The term OWA gained currency later, retroactively, as a label for that prior default once the contrast became worth naming.

What the Definitions Actually Mean

Consider a simple employee database containing exactly two facts:

Alice is an employee.
Bob is an employee.

Now pose the query: Is Carol an employee?

Under open-world reasoning, Carol's employment is not entailed by the database. The system is not licensed to conclude either that she is an employee or that she is not. Her absence from the database proves nothing; additional facts might exist that are simply not recorded here.

Under the Closed World Assumption, the failure to find a proof is treated as sufficient grounds for the corresponding negative conclusion. If Carol were an employee, she would appear in the database. She doesn't. Therefore she is not an employee. Under the CWA, the database is treated as defining the complete extent of what is true within its domain, and what falls outside that extent is taken to be false.

C.J. Date, drawing on Reiter's framework, states the CWA this way: Everything stated or implied by the database is true; everything else is false. More formally: if a tuple could appear in a relation at a given time but doesn't, then the proposition corresponding to that tuple is assumed to be false at that time.
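
The two evaluation rules can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not Reiter's formalism: the database is a plain set of ground facts, "finding a proof" is reduced to set membership with no deduction rules, and the names Answer, provable, ask_open_world, and ask_closed_world are invented for this example.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    # A statement about what the system can prove, not a third truth value.
    NOT_ENTAILED = "not entailed"


# Hypothetical database: a set of ground facts of the form (predicate, argument).
DB = {("employee", "Alice"), ("employee", "Bob")}


def provable(db, fact):
    """Simplified stand-in for 'a proof exists': the fact is stored in db."""
    return fact in db


def ask_open_world(db, fact):
    # Only answers backed by a proof are returned; absence licenses nothing.
    return Answer.TRUE if provable(db, fact) else Answer.NOT_ENTAILED


def ask_closed_world(db, fact):
    # Failure to prove a positive ground literal licenses its negation.
    return Answer.TRUE if provable(db, fact) else Answer.FALSE


carol = ("employee", "Carol")
print(ask_open_world(DB, carol).name)    # NOT_ENTAILED
print(ask_closed_world(DB, carol).name)  # FALSE
```

Both functions run the same proof search; they differ only in what they are licensed to conclude when that search fails, which is exactly where the two definitions part ways.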

"I Don't Know" Is Not a Third Truth Value

Here the proof-theoretic grounding connects to a deeper point, one that follows from the classical logic on which the relational model, in the Date/Darwen tradition, is built.

The relational model, as articulated by Date and Darwen, presupposes classical two-valued logic. In that tradition, this is not treated as an arbitrary design choice. The classical propositional and predicate calculus on which that relational theory depends recognizes exactly two truth values: TRUE and FALSE. That commitment has roots going back roughly 2,300 years in the classical logical tradition, and it underpins everything from the definition of a predicate to the rules of inference used to evaluate queries and check integrity constraints.

Within that classical framework, a proposition is taken to be either true or false, even when an observer does not know which. When someone says "I don't know whether Carol is an employee," they are describing an epistemic condition: a limitation of their own knowledge. They are not altering the logical status of the proposition Carol is an employee. That proposition has a truth value. The observer simply doesn't know what it is.

The distinction that matters is between:

"Unknown": an epistemic state of the observer. You don't know which truth value the proposition has.
"True" or "False": properties of the proposition itself.

These are not the same kind of thing. A proposition does not acquire a third status because you haven't looked it up. Your ignorance is a fact about you, not a logical property of the proposition. Classical logic captures this in the principle of bivalence and the closely related Law of the Excluded Middle: for any proposition p, either p is true or NOT p is true. There is no third state for the proposition to occupy.

What "Closed" Actually Means

The word "closed" carries rhetorical baggage: closed minds, closed systems, closed thinking. Date explicitly acknowledges this; if you want to argue that closed is better, you can find yourself on the defensive before you've said anything. The pejorative connotation is a distraction.

What is actually closed under the CWA is not the world, and not reality. What is closed is the model. As Date puts it, the world of a database is characterized by the predicates for its relations: the formal statements of what each table means. That model covers exactly the portion of reality it was designed to cover, and nothing else. The suppliers-and-parts database has nothing to say about the weather. The library catalog has nothing to say about books not held by the library.

If a book doesn't appear in a library catalog, we reasonably conclude it isn't in the library. That is the CWA applied to the library's world: the world defined by the catalog's predicate. Under open-world reasoning, we could not draw that conclusion from the catalog alone; the book's presence in the library would remain undetermined by the catalog, which would make the catalog useless for establishing that the library does not hold a given book.

No metaphysical claim about the completeness of the universe is required. The CWA, in effect, says: within the scope of this model, what is not derivable as true is treated as false.

Why NULLs Fail: Under Either Assumption

SQL's NULL is commonly defended as a way of handling incomplete or missing information: a placeholder for values the database doesn't currently hold. As a consequence, SQL employs three-valued logic (3VL), in which a predicate can evaluate to TRUE, FALSE, or UNKNOWN. Date's response is direct: NULLs can indeed be used to represent missing information; that is exactly what they are intended for. But is this a good idea? His answer is no, and the objection holds whichever assumption one adopts.
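
The effect of 3VL on an ordinary query can be observed directly. The following sketch uses Python's standard sqlite3 module with an in-memory database; the supplier table and its rows are hypothetical, chosen only to show a predicate that is a tautology in two-valued logic failing to select a row whose CITY is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?)",
    [("S1", "London"), ("S2", "Paris"), ("S4", None)],  # None is stored as NULL
)

# In two-valued logic, "city = 'London' OR city <> 'London'" holds for every row.
rows = conn.execute(
    "SELECT sno FROM supplier "
    "WHERE city = 'London' OR city <> 'London' "
    "ORDER BY sno"
).fetchall()
returned = [sno for (sno,) in rows]

# Comparing NULL with a value yields NULL (UNKNOWN), which sqlite3 maps to None.
comparison = conn.execute("SELECT NULL = 'London'").fetchone()[0]

print(returned)    # ['S1', 'S2']
print(comparison)  # None
```

S4 is missing from the result because both comparisons evaluate to UNKNOWN for its row, and a WHERE clause keeps only rows for which the predicate is TRUE.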

The first problem is that NULL is the wrong response given the CWA. Under the CWA, the absence of a tuple licenses the corresponding negative conclusion; it is not a suspended or "unknown" case. Introducing a NULL marker implies that the system is withholding judgment, yet the CWA leaves nothing to withhold: a proposition that cannot be derived is simply taken to be false.

But there is a deeper problem, one that Date presses directly: SQL-style missing-information markers fit awkwardly even with open-world reasoning. Suppose you try to represent "supplier S4's city is unknown" by inserting a tuple with a special UNK value in the CITY column. Under open-world reasoning, the database alone does not license you to exclude the possibility that S4 is also in some other city; absence from the database proves nothing. You have stated something definite, and the open-world assumption re-opens it. The attempt to represent missing information explicitly is, in Date's argument, undermined by the very assumption you are operating under. The CWA, by contrast, allows missing information to be represented explicitly and unambiguously: if the database records UNK as S4's city, that is everything the database asserts about S4's city, and any other city assignment for S4 is taken to be false.

There is a further compounding problem. "Missing information" is not one thing. Date distinguishes at least two logically distinct cases: a value that is unknown but applicable (the supplier has a city, we just don't know which one), and a value that is inapplicable (the attribute simply does not apply to this entity). SQL's NULL collapses both into a single marker, which introduces its own class of logical anomalies on top of everything else.

The deeper argument is therefore not merely that NULLs introduce 3VL, which departs from the classical two-valued framework the relational model presupposes. It is that NULLs are the wrong mechanism even for the problem they are meant to solve, and that the CWA, combined with explicit design, handles genuinely missing information more honestly and more precisely than OWA and NULLs combined. In the Date/Darwen view, the correct response to missing information is an explicit design decision: a separate relation, a proper domain value that represents "unspecified," or a reconsideration of what the predicate is actually meant to say.
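
One way to read the "separate relation" and "reconsidered predicate" options is sketched below. It is only an illustration under stated assumptions, not a design taken from Date or Darwen: the relation names SUPPLIER, KNOWN_CITY, and CITY_UNKNOWN, their predicates, and the sample rows are all hypothetical, and relations are modelled as plain Python sets.

```python
# Hypothetical relations; every name and value here is an assumption for illustration.
# SUPPLIER:     "sno is a supplier."
# KNOWN_CITY:   "supplier sno is known to be located in city."
# CITY_UNKNOWN: "supplier sno has a city, but it is not known which."
SUPPLIER = {"S1", "S2", "S4"}
KNOWN_CITY = {("S1", "London"), ("S2", "Paris")}
CITY_UNKNOWN = {"S4"}


def known_to_be_in(sno, city):
    """CWA lookup: an absent tuple means the proposition is taken to be false."""
    return (sno, city) in KNOWN_CITY


def constraint_holds():
    """Each supplier has a known city or is marked unknown, never both."""
    with_city = {sno for (sno, _city) in KNOWN_CITY}
    covers_all = (with_city | CITY_UNKNOWN) == SUPPLIER
    disjoint = not (with_city & CITY_UNKNOWN)
    return covers_all and disjoint


print(known_to_be_in("S1", "London"))  # True
print(known_to_be_in("S4", "London"))  # False: S4 is not *known* to be in London
print("S4" in CITY_UNKNOWN)            # True: the gap itself is a recorded fact
print(constraint_holds())              # True
```

Every query here has a two-valued answer, because the predicate talks about what is known rather than leaving a hole in a tuple. An inapplicable attribute would, on the same reasoning, get a relation and predicate of its own instead of sharing a marker with the unknown case.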

A Practical Example

Consider a flight booking system. A passenger asks: does flight BA123 to New York depart tomorrow at 3pm?

Under the CWA, the answer is determinate: if the flight is in the database, it exists and is scheduled; if it is not, no such flight is scheduled. The database is treated as defining the complete set of flights that exist within this system's model.

Under open-world reasoning, the absence of the flight from the database does not settle the question. The system is not entitled to conclude that the flight doesn't exist merely from its absence; it might exist but simply not yet be recorded. The proposition is not settled by the database alone.

As Date observes: that doesn't seem like a very practical basis for running an airline.

A Note on Other Contexts

It is worth acknowledging that open-world reasoning has been proposed and explored in other contexts: knowledge graphs, ontology languages, and aspects of the semantic web among them, where the goal is to integrate information from heterogeneous, incomplete, and potentially evolving sources rather than to query a closed and well-defined model. Whether the term OWA means the same thing in those contexts as what Reiter defined is a separate question, and not a simple one. Date himself, having raised the issue, was careful to note that he lacked sufficient knowledge of the semantic web to say with confidence how it actually operates.

The point here is narrower. In integration settings of that kind, the inability to infer falsity from absence may be a feature rather than a defect; but ordinary database query evaluation is not such a setting. For database query evaluation, that is, for any system in which a relation is defined by a predicate and queries are answered against it, the Closed World Assumption is, in the Date/Darwen tradition, the more logically coherent choice. Open-world reasoning applied to query evaluation produces a system that cannot, in general, draw definitive negative conclusions from absence. And NULLs, whatever justification is offered for them, introduce a third truth value into a framework originally conceived in two-valued terms: a move that, in the relational tradition, is argued to compromise its coherence.

What Reiter settled in 1978 was the formal character of the distinction. The argument from that point forward is not primarily about philosophical temperament; it is about what follows if one takes classical two-valued logic seriously as the foundation of a query system.

Key references:

Raymond Reiter, "On Closed World Data Bases," in Logic and Data Bases, Gallaire and Minker (eds.), Plenum Press, 1978.
C.J. Date, "The Closed World Assumption," in Logic and Relational Theory: Thoughts and Essays on Database Matters, Technics Publications, 2020.
C.J. Date and Hugh Darwen, Databases, Types, and the Relational Model: The Third Manifesto (3rd ed.), Addison-Wesley, 2006.