Meaning in a Database Has Three Layers
Many explanations of why an AI tool works well focus on the quality of the model and context management. These may be true but are not very helpful. I have been using Snowflake Cortex Code since its release, and I think I can explain why it works as well as it does in terms that I hope provide some unique insight: database theory and twentieth-century philosophy.
I recognized this progression long before any of the current AI tools existed. You start with the tables and schemas, even poorly designed ones with cryptic column names. You interrogate the data to figure out what each column actually contains. And then, over time, you develop a feel for how the data is used: what it implies, what questions it answers reliably, and which ones it does not. That third stage is where real understanding lives.
When you work with a database, meaning does not live in one place. It is distributed across three things that are related but not the same.
The first is the schema. This approximates what logicians would call the formal intension: the specified meaning of the relation, expressed partly through its structure and constraints, although not exhausted by them. It is typically an incomplete encoding of the full predicate meaning of the relation, which is the intended meaning of the data. C.J. Date and Hugh Darwen have argued, convincingly in my view, that a table in a relational database should be understood as a predicate statement. A table called ACCOUNT_PAYOUTS with a column payout_amount FLOAT is not just a container. It is a claim: for every row in this table, there exists an account for which a payout of this amount was calculated under these conditions. That last part, "calculated under these conditions," is not in the schema. The schema says payout_amount FLOAT. The predicate supplies the meaning; the schema is the contract. The schema defines not just what values are permitted, but what the data is permitted to mean, even if it does not fully embody that meaning on its own.
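To make the gap between schema and predicate concrete, here is a minimal sketch in Python with SQLite. The table and column names follow the example above, but the negative-payout row is my own illustration: the engine enforces only the shape of the data, not the intended meaning.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The schema encodes only part of the intension: a name and a type.
con.execute("CREATE TABLE account_payouts (account_id TEXT, payout_amount REAL)")

# The intended predicate is richer: "for this account, a payout of this
# amount was calculated under these conditions." None of that is visible
# to the engine, so a row that violates the intended meaning (a negative
# payout, say) is still a perfectly legal row.
con.execute("INSERT INTO account_payouts VALUES ('ACC-1', -250.0)")

rows = con.execute("SELECT * FROM account_payouts").fetchall()
print(rows)  # [('ACC-1', -250.0)] -- legal by the schema, false by the predicate
```

The point of the sketch is that nothing in the DDL stops the insert: the predicate lives outside the contract the engine can check.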
The second is the data itself. This is the extension: the actual population of records, the values that exist right now. The data tells you what is true about the world as the system currently understands it. It is the world as we know it, bounded by the closed-world assumption that if something is not in the database, it did not happen for the purposes of this system.
The third is the query history. This is where the philosopher Ludwig Wittgenstein becomes relevant. Wittgenstein argued that meaning is not in definitions; it is in use. The schema tells you what a term is allowed to mean. The data tells you what values exist. But the actual meaning of a field in practice, how it was calculated, what business logic produced it, what it implies for a decision, is often clarified by how the system has been used over time. The query history is the record of that use. In my experience, it is often closer to the practical meaning of your data than either the schema or the data alone.
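A toy illustration of meaning-as-use, with an entirely hypothetical query log (in a real Snowflake environment the source would be the platform's query history views, and the status filter here is invented for the example). The schema never says that payouts only count once settled; the recorded usage does.

```python
# Hypothetical query-history entries for the ACCOUNT_PAYOUTS table.
history = [
    "SELECT SUM(payout_amount) FROM account_payouts WHERE status = 'SETTLED'",
    "SELECT account_id FROM account_payouts WHERE payout_amount > 0 AND status = 'SETTLED'",
    "SELECT AVG(payout_amount) FROM account_payouts WHERE status = 'SETTLED'",
]

# The practical meaning of payout_amount -- "only settled payouts count" --
# shows up as a convention in how the table has actually been queried.
settled_filter = sum("status = 'SETTLED'" in q for q in history)
print(f"{settled_filter}/{len(history)} queries filter on settled status")
# 3/3 queries filter on settled status
```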
Snowflake Cortex Code works well for data investigation in large part because it operates across all three of these simultaneously. It reads the schema. It queries the data. And it reasons over the query history and transformation lineage to surface meaning that neither of the other two layers can provide on their own.
This is also why the phrase "semantic layer," as it is commonly used in the data industry, only gets you partway there. A semantic layer typically captures intension well, and it gives you a cleaner view of the data. But it stops short of the third layer, where meaning lives in practice. Going beyond the semantic layer means reasoning over how the data was actually produced and used, not just how it was defined. That is, at least in my experience, where Cortex Code reaches.
What This Looks Like in Practice
My own use of Cortex Code has evolved through a few stages, and each stage has pushed it further into this three-layer territory.
I started using it for complex queries and simple Streamlit apps. That is the obvious use case, and it delivers well there. But the more interesting capability showed up when I started using it for troubleshooting.
Two examples stand out.
In the first, I was investigating a set of accounts and provided a list of account identifiers. One of them had a single incorrect digit, an 8 where there should have been a 6, in the middle of a twelve-digit number. Cortex Code could not find it, which is expected. What was not expected was that it did not simply return an error. It came back and told me that this account number did not exist, but that I might mean a specific other one, which it named. That other one was the correct account. I believe it was doing some form of fuzzy or approximate matching against the actual data, but the behavior from my side was: it formed a hypothesis about what I probably meant and surfaced it rather than failing. That is closer to how a colleague who knows the data behaves than how a query engine behaves.
Traditional query engines reason deductively: given exact inputs, they derive exact outputs. What Cortex Code did in that moment was closer to what Charles Sanders Peirce called abduction: forming the most plausible hypothesis from incomplete evidence. It could not find the account, but it could reason about what I probably meant. That is a different kind of reasoning, and it is part of why the tool behaves more like a knowledgeable assistant than a query engine.
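A rough sketch of the kind of approximate matching I suspect was involved, using Python's difflib. The account numbers here are made up, and I do not know what Cortex Code actually does internally; this only shows how a single-digit error can fail an exact lookup yet still yield a confident best guess.

```python
from difflib import get_close_matches

# Hypothetical known account numbers (the real ones were twelve digits).
known = ["482916375064", "482916375164", "990114227831"]

# The identifier I supplied, with one wrong digit: an 8 where a 6 belongs.
typed = "482918375064"

# An exact lookup fails under the closed-world assumption...
assert typed not in known

# ...but approximate matching can surface the row I probably meant.
guess = get_close_matches(typed, known, n=1, cutoff=0.8)
print(guess)  # ['482916375064']
```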
In the second, I was reconciling a metric called subscription_payout that was not adding up across two independent billing systems. The explanation only emerged when Cortex Code traced the transformation lineage and surfaced that the two systems applied different fallback logic when account-level rate data was missing. The meaning of subscription_payout was not in the schema. It was in the query history, and Cortex Code found it there.
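A hypothetical reconstruction of that divergence in Python. The function names, default rate, and fallback rules are my own illustration, not the actual billing logic; the shape of the bug is what matters.

```python
# Both systems compute subscription_payout, but disagree on what to do
# when the account-level rate is missing.
DEFAULT_RATE = 0.10

def payout_system_a(usage, rate):
    # System A falls back to a global default rate.
    return usage * (rate if rate is not None else DEFAULT_RATE)

def payout_system_b(usage, rate):
    # System B treats a missing rate as "no payout".
    return usage * rate if rate is not None else 0.0

# Any account with missing rate data reconciles differently in each system.
print(payout_system_a(1000, None))  # 100.0
print(payout_system_b(1000, None))  # 0.0
```

Neither schema records either fallback rule; it lives in the transformation logic, which is exactly why lineage rather than DDL surfaced it.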
More recently I have been using it for PRDs and technical specifications for intelligent data applications. This is where the three-layer understanding pays off most directly. Cortex Code can identify gaps in metric formulas, propose how to close them, generate a conceptual requirements document from the research, and produce a technical specification from that. The next step in my current work is a proof of concept for a solution to trial with stakeholders, with parts of the production implementation running inside a Snowflake Streamlit app.
One capability worth mentioning specifically: Cortex Code creates and updates reusable SQL for repeatable queries. Reusable, cost-efficient queries are a byproduct of treating the query history as a governed artifact rather than a disposable trace of investigation. That is a direct consequence of the three-layer model.
The Closed-World Assumption and Its Limit
There is one practical limitation worth mentioning, because it connects to something important about how AI-assisted data systems work in general.
Databases operate under what logicians call the closed-world assumption. If something is not in the database, it is not true for the purposes of the system. This is what makes reconciliation possible. You can declare a gap closed, say the numbers add up, and trust absence to mean something.
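The assumption is easy to demonstrate with a minimal SQLite sketch (hypothetical table and account identifiers): the absence of a row licenses a negative conclusion, for the purposes of this system.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payouts (account_id TEXT)")
con.execute("INSERT INTO payouts VALUES ('ACC-1')")

# Under the closed-world assumption, the query below does not mean
# "unknown" -- it means "no payout exists for ACC-2", full stop.
exists = con.execute(
    "SELECT EXISTS(SELECT 1 FROM payouts WHERE account_id = 'ACC-2')"
).fetchone()[0]
print(bool(exists))  # False
```

That declarative False is what makes reconciliation possible, and it is also exactly what breaks when a relevant system sits outside the database's closed world.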
Cortex Code operates under the same assumption, because it is working within the databases you have given it access to. In the subscription_payout investigation, the two billing systems had no natural awareness of each other. From inside the closed world of the initial investigation, System B effectively did not exist, and Cortex Code had no path to it; the database could not tell it what it did not know about itself.
What closed the gap was two things working together. The data from System B had been made available through Snowflake Data Shares, which is what actually extended the closed world to include it. But Cortex Code did not know that share existed or that it was relevant until I told it. The Data Share closed the technical gap and I closed the knowledge gap. Once I explained the context and pointed Cortex Code toward the share, the investigation could continue. That handoff, from a system that had no visibility into System B to a human who knew the organizational landscape, is not a sign that something went wrong. It is the system working within its limits correctly. But it only works if the human recognizes that a relevant system exists outside what Cortex Code can see, which is organizational knowledge, not something a tool can be expected to carry.
A note on scale
The approach described in this article works because enterprise data environments are bounded. The number of relevant systems is finite, identifiable, and extensible by a human who understands the organizational landscape.
There is a version of this problem that gets much harder. Imagine not two billing systems but dozens, hundreds, or a practically uncountable number of data sources, each with its own schema and no shared identifiers. This is roughly the situation the Semantic Web was designed to address, and it is the domain where the open-world assumption becomes necessary: you cannot declare anything closed when the space of potentially relevant information is unbounded.
At web scale, or in domains where the relevant knowledge space is genuinely open, a different architecture is required.
What This Means in Practice
The reason I find Snowflake Cortex Code worth writing about is not that it is a capable query generator, though it is. It is that it reasons over the layer of a database where meaning actually lives: the query history, the transformation lineage, the record of how the system has been used. Most tools stop at the schema or the data; Cortex Code goes further, and that is where the harder problems tend to be.
There is still a lot of industry conversation about Text-to-SQL accuracy: whether the system can produce correct SQL from a natural language question. That is a useful capability, but it is a narrow way of looking at it. My own use has shifted from Text-to-SQL and quick Streamlit apps, to troubleshooting and anomaly detection, to problem-finding and root cause analysis, to producing informed PRDs and business cases. That trajectory, from query tool to reasoning partner in a broader workflow, may be closer to where intelligent data platforms are actually heading than the benchmarks suggest.
These are observations from sustained use, not controlled testing. The framework is an attempt to explain why the tool works as it does, grounded in the theoretical tradition of Date and Darwen and in Wittgenstein's account of meaning as use, rather than a claim to have validated that explanation systematically.
The framework is general: any system with equivalent access to schema, data, and query history should, in principle, be able to reason across all three layers. I am using Snowflake Cortex Code as the concrete example because it is the system where I have seen this architecture at work most directly, but the argument is not specific to Snowflake. It should apply to BigQuery and other data platforms where an AI tool can reason across schema, data, and query history.
The schema sets the rules, the data shows the world as the system knows it, and the query history shows what the data means in practice. Cortex Code works across all three, and that, I think, is why it works as well as it does.
References
- Reiter, Raymond. "On Closed World Data Bases." Technical Report 77-16, Department of Computer Science, University of British Columbia, 1977. Revised version in H. Gallaire and J. Minker, eds., Logic and Data Bases, 1978. Freely available: https://www.cs.ubc.ca/sites/default/files/tr/1977/TR-77-16.pdf
- Date, C. J. Logic and Relational Theory: Thoughts and Essays on Database Matters. Technics Publications, 2020.
- Date, C. J., and Hugh Darwen. Databases, Types, and the Relational Model: The Third Manifesto. Addison-Wesley, 1997. Freely available as PDF (2014 revised edition): https://www.dcs.warwick.ac.uk/~hugh/TTM/DTATRM.pdf
- Wittgenstein, Ludwig. Philosophical Investigations. 4th ed. Translated by G. E. M. Anscombe, P. M. S. Hacker, and J. Schulte. Wiley-Blackwell, 2009. For an accessible overview see: https://plato.stanford.edu/entries/wittgenstein/
- Peirce, Charles Sanders. For background on deduction, induction, and abduction, see the Stanford Encyclopedia of Philosophy entry: https://plato.stanford.edu/entries/peirce/
- "Intension and Extension." New World Encyclopedia. https://www.newworldencyclopedia.org/entry/Intension_and_Extension