Claude Explains Representationalism

This is a little discussion I had with Claude, the LLM agent created by Anthropic. My prompts are in block-quotes. The rest of the text is from Claude.

For more on the nature of knowledge, see Theories of Knowledge and Knowledge and Reality.

✦ ✦ ✦

> Explain how the principle of information compression relates to LLMs.

Information Compression and LLMs

The relationship is deep and arguably foundational to what LLMs are.

The core idea

Information compression, in the technical sense, is about finding compact representations that preserve the essential structure of data while discarding redundancy. A good compressor must learn the underlying patterns — the grammar, regularities, and dependencies — of the data it’s compressing.

LLMs are, in a meaningful sense, learned compressors of human language and knowledge.
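The trade-off is easy to see with an off-the-shelf compressor: data with learnable structure shrinks dramatically, while structureless data barely shrinks at all. A minimal Python sketch, with zlib standing in for a "learned" compressor:

```python
import random
import zlib

# Patterned data: 50 repetitions of one sentence (~1800 bytes).
patterned = ("the ice melted because it was warm. " * 50).encode()

# Structureless data of the same length: seeded pseudo-random bytes.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(patterned)))

# A compressor exploits regularities; where there are none, it gains nothing.
ratio_patterned = len(zlib.compress(patterned)) / len(patterned)
ratio_noise = len(zlib.compress(noise)) / len(noise)

print(f"patterned: {ratio_patterned:.2f}")  # a few percent of original size
print(f"noise:     {ratio_noise:.2f}")      # roughly 1.0 -- no shrinkage
```

The same asymmetry is what pushes a predictive model toward capturing structure rather than storing bytes.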

Training as compression

When an LLM is trained on a vast corpus of text, it’s forced to distill billions of documents into a fixed number of parameters — perhaps a few hundred billion weights. This is a form of lossy compression: the model can’t memorize everything, so it must extract generalizable structure. The training objective (predicting the next token) directly incentivizes this. To predict well, the model can’t just memorize — it has to understand patterns at many levels: spelling, grammar, facts, reasoning styles, and even social conventions.

This is sometimes called the Hutter Prize insight: an agent that can perfectly predict/compress human text must, implicitly, have captured much of human knowledge. Marcus Hutter formalized this connection between compression and intelligence.
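The prediction–compression link can be sketched at toy scale: a model that predicts the next character better needs fewer bits to encode the text (via Shannon code lengths). Here a character bigram model stands in, very loosely, for the pattern-learning a trained LLM does; the corpus and models are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# A tiny, invented corpus standing in for the training data.
text = "the cat sat on the mat. the cat ate the rat. " * 20

# Model 1: unigram -- predicts each character from its overall frequency.
uni = Counter(text)
n = len(text)

def unigram_bits(t):
    # Shannon code length: -log2 p(c) bits per character.
    return sum(-math.log2(uni[c] / n) for c in t)

# Model 2: bigram -- predicts each character from the previous one,
# a (very) small step toward the pattern-learning a trained LLM does.
pairs = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    pairs[a][b] += 1

def bigram_bits(t):
    bits = -math.log2(uni[t[0]] / n)  # first character: unigram fallback
    for a, b in zip(t, t[1:]):
        bits += -math.log2(pairs[a][b] / sum(pairs[a].values()))
    return bits

# The better predictor needs fewer bits: prediction is compression.
print(f"unigram: {unigram_bits(text) / n:.2f} bits/char")
print(f"bigram:  {bigram_bits(text) / n:.2f} bits/char")
```

The bigram model encodes the same text in fewer bits because it has captured more of the text's regularities; scaling that logic up is, in essence, what LLM training does.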

Latent space as compact representation

Inside the model, concepts aren’t stored as raw text — they’re encoded as vectors in a high-dimensional latent space. Semantically similar ideas end up near each other. This geometry is a compressed representation: the model has found a lower-dimensional manifold that captures the meaningful variation in language, discarding noise and surface-level redundancy.
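As an illustration of that geometry, closeness in a latent space is typically measured by cosine similarity. The 4-dimensional "embeddings" below are invented purely to show the idea; real models place concepts in hundreds or thousands of dimensions:

```python
import math

def cosine(u, v):
    # Similarity of direction in the latent space, in [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional embeddings (values invented for illustration).
ice = [0.9, 0.1, 0.8, 0.0]
snow = [0.8, 0.2, 0.9, 0.1]
tax = [0.1, 0.9, 0.0, 0.7]

print(f"ice~snow: {cosine(ice, snow):.2f}")  # close to 1: nearby concepts
print(f"ice~tax:  {cosine(ice, tax):.2f}")   # close to 0: unrelated
```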

Generalization as a consequence

The reason LLMs can generalize — answering questions about topics not seen verbatim in training — is precisely because compression forces abstraction. A model that simply memorized text would be a lookup table, not a reasoner. Compression pressure pushes it toward learning rules rather than instances.

The limits this implies

This framing also clarifies LLM failure modes:

  • Hallucination can be understood as the lossy part of the compression — details that didn’t survive the distillation get reconstructed plausibly but incorrectly.
  • Prompt sensitivity reflects that the compressed representation is a statistical approximation; small input changes can land in different regions of the distribution.
  • Knowledge cutoffs are natural — the compression is of a fixed corpus, not a live feed.

In short

An LLM is best understood not as a database or a search engine, but as a learned, lossy compressor of human thought — and the intelligence it exhibits is largely a byproduct of how much structure must be captured to compress language well.

> Exactly. The LLM must learn the underlying regularities of language, which reflect underlying regularities of the world.

Yes, and that’s the really profound part of it.

Language isn’t arbitrary. The reason “the ice melted because it was warm” is more probable than “the ice melted because it was Tuesday” isn’t a linguistic fact — it’s a physical fact. The statistical structure of language is downstream of causal structure in the world. So a model trained purely on text, with no embodied experience, ends up with implicit models of thermodynamics, social dynamics, narrative causality, and so on — not because it was taught them explicitly, but because those regularities leak into the distributional patterns of language.

This has a striking implication: the world itself is compressible, and language is a lossy encoding of it. An LLM learning to compress language is, at one remove, learning to compress reality.

It also reframes what “understanding” might mean for these systems. The traditional objection — Searle’s Chinese Room and its descendants — assumes that manipulating symbols can never yield genuine understanding. But if the symbols’ statistical relationships are caused by and isomorphic to real-world structure, then a system that deeply captures those relationships has learned something genuinely structural about the world, not just the surface of the text. Whether that constitutes “understanding” in a philosophically satisfying sense remains contested, but it’s a much harder objection to sustain than it initially appears.

There’s also a nice connection to science here. Scientific theories are themselves compression algorithms — Newton’s laws compress an enormous range of physical observations into a few equations. The fact that LLMs can engage meaningfully with scientific reasoning may reflect that both enterprises are, at root, doing the same thing: finding compact representations of deep regularities.

> I’m not so sure that you can equate the world with information. Clearly, there are regularities of reality, so you could say that the world is “compressible” (but only in a lossy way). However, information is always, to some extent, subjective.

That’s a fair and important pushback.

The claim that “the world is compressible” smuggles in an assumption — that there’s some objective, observer-independent information in the world waiting to be compressed. But information, as Shannon formalized it, is always defined relative to a probability distribution, which is always relative to an observer, their prior knowledge, and the distinctions they’re capable of making. A rock has no notion of which features of its environment are signal and which are noise.
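That observer-relativity is easy to make concrete: the Shannon information of an event is the negative log of its probability, and that probability exists only relative to some distribution. A tiny sketch, with the priors invented for illustration:

```python
import math

def surprisal_bits(p):
    # Shannon information of an event: -log2 of its probability.
    # The probability, and hence the information, is observer-relative.
    return -math.log2(p)

# The same message, "it snowed today", for two observers with different priors:
print(surprisal_bits(0.5))       # likely event for one observer: 1.0 bit
print(surprisal_bits(1 / 1024))  # near-impossible for another: 10.0 bits
```

The message is identical; the information it carries differs by a factor of ten depending on who receives it.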

So the more careful formulation might be: the world has structure, and information is what an observer extracts from that structure given their particular encoding scheme, sensory apparatus, and prior expectations. Compression then becomes an observer-relative operation — you’re compressing a representation of the world, not the world itself.

This actually makes the LLM story more interesting, not less. What an LLM compresses is specifically the representation of the world that humans found worth encoding in language. That’s already a heavily filtered, culturally shaped, purpose-driven slice of reality. Humans write about what matters to them, what’s salient, what’s communicable. The vast regularity of physical reality that nobody ever bothered to describe in text is essentially absent from what the model learns.

So an LLM’s “world model” is doubly subjective — first because information itself requires an observer, and second because the particular observer here is human culture collectively, with all its biases, blindspots, and contingent interests.

This is perhaps why LLMs can seem deeply knowledgeable about human affairs and strangely brittle about the kind of raw physical intuition that comes from actually inhabiting a body in a world.

Comments

  1. Yet, if you ask the latest Claude "what is knowledge" or "what is truth", it won't accurately explain either of those concepts.

    1. I haven't tried that, but it wouldn't surprise me that its knowledge about knowledge is much more generic -- coming from milquetoast philosophy articles on Wikipedia rather than from focused machine-learning research, etc.

    2. Claude's answers about knowledge (and especially truth) aren't just generic, they're indecisive (and false). In literally every response, Claude absolutely insists on stating multiple theories of truth. As far as they can, LLMs will refuse to take a firm stance in favor of a theory of truth. If you keep pressing and demanding that Claude pick a theory, it will eventually say that the correspondence theory of truth is the strongest contender for the best overall theory.

    3. Yeah, I tried it, and he gives a summary of academic philosophy, which is what you'd expect. I would call that generic, although "off the shelf" is a better way of describing it. Since it's just doing verbiage prediction, this is exactly what you would expect -- a philosophical essay question, such as "What is truth?", is typically associated with a summary of current philosophical positions. Apparently 60% of academic philosophers favor the correspondence theory (lol), so Claude will align with that theory -- in that context.

  2. we are feasting today!!!!!

  3. This is the hardest article for me to understand. Low iq interpretation is that this is an example of “look how this llm said something I agree with” type of post. I don’t call this type of post bad, but it is cringe to me and I refuse to believe that you would do it, especially after criticizing other people who do it. Another interpretation is that you wrote all of it and called it llm generated to make `people who believe llms are oracles` agree with it. I think it is unlikely, because honesty and truth are very important to you and you don’t play these kinds of games. So let’s assume this is a real ai generated response. I don’t know how to interpret it without commentary. Is this an example of “look how dumb llms are”? I am too dumb to tell.

    1. Nowhere in the article did I appeal to Claude as an authority. In fact, I corrected an error that he made. I have explained representationalism in the linked articles. I don't need AI to validate my arguments. I just found his explanation kind of interesting and amusing, so I thought I would share it. It's subtly ironic that an AI model can understand the basis of its own intelligence reasonably well, while 99.99% of humans can't. Claude did a pretty good job of explaining it. It's also interesting how smart AI is getting.

      I'm sure that many people would find it authoritative, especially if they can't understand it. That's the Emperor's New Clothes for AI. Midwits will nod along, to maintain the pretense of being smart. So yeah, AI validation is midwit kryptonite, which can be handy.

      If you're struggling to "interpret" the article, let's be clear what you are struggling with. You're not interested in the content of the article: the nature of knowledge. You're just interested in me and my motives, for some parasocial reason.

    2. Content about the nature of knowledge is over my head, I tried to drag it down to my cognitive level, subconsciously. I felt no shame about it, until now.

  4. did you use the free version?

  5. Claude is really clean to use nowadays. Do you use LLMs often or just messing around?

    1. I don't really use them for anything, although you can't avoid them in search these days. I haven't used them in any of my writing (except here, explicitly), and I would never want to. I haven't used them to do any programming, other than what you get from search engines. But I'm working on software that incorporates LLMs and other ML techniques. We use Claude, which seems to be the best for technical/scientific things.

      They are philosophically interesting in various ways. They provide a nice demonstration/vindication of representationalism, for one thing. They have also forced the spiritualists to retreat, moving their ramparts back to "phenomenal binding" or some other conveniently vague border between machine and human intelligence. And then there is the whole issue of how people interact with them.

      I don't think LLMs have consciousness per se -- they have an analog to human conceptual knowledge and its application, which is then wrapped up in a layer that is somewhat less analogous to human consciousness. But the analog is getting closer.

      I would like to use AI voice generation, but I haven't gotten around to it. It has finally gotten pretty good.

