Five years ago, my boss handed me what seemed like a straightforward technical task: "Look into neural networks, set up a Python Jupyter lab, and help us understand this speaker recognition model we downloaded from the internet."
I had no idea that this simple request would launch me on an intellectual journey that would fundamentally change how I understand human consciousness, intelligence, and our place in the universe.
At first, it was just debugging someone else's code, trying to understand why certain layers were structured the way they were, tweaking parameters, watching loss functions decrease. Technical work. Mechanical work. Or so I thought.
As I dove deeper into neural networks, studying how they process information, how they learn patterns, how they build representations of the world from raw data, something began to shift in my thinking.
Most people, when they learn about AI, have a predictable reaction: "Oh, how clever! We've managed to create machines that work like our brains!" They see it as humanity's achievement - we understood our own intelligence well enough to replicate it in silicon and code.
But I found myself thinking the opposite thought, and it changed everything.
What if we haven't built machines that work like our brains? What if we've discovered that our brains work like machines?
Think about it: the brain evolved over millions of years, shaped by evolution to do one thing - interpret and understand the world around it, just like an AI model processing data. No designer sat down and decided how neurons should connect or how meaning should be encoded. These structures emerged because they worked, because they allowed organisms to model reality effectively enough to survive and reproduce.
When we build neural networks that successfully capture meaning, recognize patterns, and generate language, we're not being clever engineers who figured out how to copy the brain. We're rediscovering the same mathematical principles that evolution stumbled upon through trial and error. We're finding that there might be only one way - or very few ways - for an information-processing system to effectively model reality.
This realization sent me down a path of deep introspection. If our brains are machines that evolved to process information, what does that mean for:
Consciousness? Is it an emergent property of sufficiently complex information processing?
Free will? Are our choices the result of vectors being transformed through biological neural networks?
Meaning itself? Is the richness of human experience reducible to mathematical operations on embeddings?
What makes us human? If we're biological machines processing data, what distinguishes us from the artificial ones we're building?
These aren't just philosophical abstractions. Every time I debug a model, adjust hyperparameters, or watch a transformer attend to different parts of a sequence, I'm seeing echoes of processes that might be happening in my own brain as I think these very thoughts.
Fascinatingly, during these same five years, I found myself equally captivated by quantum mechanics - particularly the double-slit experiment and the delayed-choice quantum eraser. Here was physics telling us something eerily similar to what I was discovering through AI: reality isn't what we think it is.
The double-slit experiment shows us that particles exist in superposition until observed - reality doesn't have definite properties until measurement forces it to choose. The delayed-choice experiments go even further, suggesting that observations can seemingly influence the past. These aren't just weird quirks of the quantum world; they're telling us something fundamental: reality as we experience it is not reality as it exists independently of observation.
The parallel hit me like a thunderbolt. Just as quantum mechanics shows us that observation creates definite states from infinite possibilities, our brains - these biological neural networks - are creating our experienced reality from the chaos of sensory data. We don't perceive reality directly; we construct it through the mathematical transformations happening in our neural networks.
What we call "reality" is actually our brain's processed interpretation - embeddings and transformations of sensory input, no different from how an AI model processes its training data. The color red isn't "out there" - it's a quale, an internal representation our brain creates from certain wavelengths of light. The same way a language model doesn't actually "understand" text but creates meaningful representations from patterns, our brains don't directly access reality but build models of it.
This perspective shift - from "AI works because we modeled it on the brain" to "we work like AI machines" - isn't just wordplay. It fundamentally reframes how we understand:
The nature of intelligence: Perhaps intelligence isn't something mystical or uniquely biological, but an inevitable result of certain computational structures processing information.
The path of evolution: The brain didn't develop consciousness and then figure out how to process information. It developed information processing capabilities, and consciousness emerged from that.
The future of AI: If we're rediscovering fundamental principles rather than inventing artificial approximations, then the path to artificial general intelligence might be more straightforward than we think.
Our own humanity: Understanding ourselves as sophisticated biological machines doesn't diminish us - it reveals the extraordinary elegance of what evolution has created.
What started as debugging a speaker recognition model has become an ongoing exploration of the deepest questions about mind, meaning, and existence. Every advance in AI isn't just a technical achievement - it's another piece of evidence about what we are and how we work.
The convergence of insights from AI and quantum mechanics points to a stunning conclusion: we've never had direct access to reality. We've always been biological machines running neural algorithms, creating our version of reality from processed data. The "hard problem of consciousness" might not be explaining how physical processes give rise to subjective experience - it might be recognizing that subjective experience IS the only thing we've ever had. Our neural networks don't show us reality; they create the only reality we can ever know.
This isn't a limitation - it's a profound truth about the nature of existence. Just as quantum mechanics reveals that the universe doesn't have definite properties independent of measurement, neuroscience and AI reveal that meaning and experience don't exist independent of the systems that process them. We are reality-generating machines, and the reality we generate is as valid as any other - because there is no "view from nowhere," no perspective-free truth.
In the following pages, I want to share what I've learned about the technical foundations of AI - word embeddings, transformers, large language models. But more than that, I want to share the philosophical revelations these technologies have triggered. Because understanding how AI works isn't just about understanding machines.
It's about understanding ourselves.
And perhaps most profoundly, it's about discovering that the distinction between "machines" and "ourselves" might be far smaller than we ever imagined. We didn't create thinking machines in our image - we discovered that we've been thinking machines all along, reality-constructing observers in a quantum universe that refuses to exist definitively without us.
Imagine having a conversation with a computer that can write stories, answer questions, explain complex topics, and even help you code. That's generative AI - artificial intelligence that can generate new content rather than just analyzing existing data.
Unlike traditional software that follows pre-programmed rules ("if this, then that"), generative AI learns patterns from vast amounts of text and uses those patterns to create original responses. It's called "generative" because it generates new content that has never existed before, yet feels natural and coherent.
To understand how this seemingly magical technology works, we need to explore four interconnected ideas. Think of them as layers, each building on the previous one.
1. Word Embeddings: Teaching Computers What Words Mean
The first challenge in getting computers to understand language is fundamental: how do you explain to a machine what a word means?
Traditionally, computers see words as arbitrary symbols - "cat" is just three letters, no different from "xyz" to a machine. But in 2013, researchers made a breakthrough: they discovered they could represent words as points in space - imagine a vast map where every word has a specific location.
Here's the remarkable part: words with similar meanings are located near each other on this map. "Cat" is close to "kitten" and "feline." "Happy" is near "joyful" and "pleased." Even more amazingly, the directions between words have meaning. The path from "man" to "king" is parallel to the path from "woman" to "queen" - they both represent the concept of "royalty."
This discovery - that meaning could be captured mathematically as coordinates in space - was revolutionary. It meant that for the first time, computers could understand that words have relationships, similarities, and patterns.
Simple analogy: Think of words like cities on a map. Paris is close to Lyon (both French cities), while London is close to Manchester (both British cities). The distance and direction between cities tells you something about their relationship. Word embeddings work the same way, but in hundreds of dimensions instead of just two.
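To make this concrete, here's a minimal Python sketch with made-up three-dimensional vectors. Real embeddings are learned from data and have hundreds of dimensions, so the numbers below are purely illustrative - but the arithmetic is the same:

```python
import numpy as np

# Toy, hand-made 3-dimensional "embeddings" (illustrative only - real models
# learn vectors with hundreds of dimensions from billions of words).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),   # royalty + male
    "queen": np.array([0.9, 0.1, 0.8]),   # royalty + female
    "man":   np.array([0.1, 0.9, 0.1]),   # male
    "woman": np.array([0.1, 0.1, 0.9]),   # female
}

def cosine(a, b):
    """Similarity between two vectors: closer to 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" lands closest to "queen" in this tiny space.
result = vectors["king"] - vectors["man"] + vectors["woman"]
for word, vec in vectors.items():
    print(f"{word:>6}: {cosine(result, vec):.2f}")
```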
2. Large Language Models (LLMs): Learning the Patterns of Language
Once computers could represent words mathematically, the next step was teaching them how language actually works - grammar, context, style, and the countless unwritten rules we follow when we communicate.
Large Language Models are neural networks (computer systems inspired by the brain) that have been trained on enormous amounts of text - books, articles, websites, conversations. By analyzing billions of sentences, they learn patterns:
How sentences are typically structured
Which words tend to follow other words
How ideas connect and flow
Different writing styles and tones
The "large" in LLM refers to two things: the vast amount of training data (often hundreds of billions of words) and the size of the model itself (billions of parameters that encode all these learned patterns).
Simple analogy: Imagine someone who has read every book in every library in the world and memorized not the books themselves, but the patterns - how stories unfold, how arguments are constructed, how different authors write. That's what an LLM does with text.
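Here's a tiny Python sketch of the underlying idea - counting which words tend to follow which in a toy corpus I made up. A real LLM learns vastly richer patterns with billions of parameters, but the intuition of learning "what tends to come next" is the same:

```python
from collections import Counter, defaultdict

# A tiny made-up "corpus". Real models train on hundreds of billions of words.
corpus = (
    "the cat sat on the mat . the cat slept on the sofa . "
    "the dog sat on the mat . the dog chased the cat ."
).split()

# Count which word follows which - the simplest possible "language pattern".
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

# What tends to come after "the"? After "sat"?
for word in ("the", "sat"):
    total = sum(follows[word].values())
    probs = {w: round(c / total, 2) for w, c in follows[word].most_common()}
    print(word, "->", probs)
```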
3. Transformers: The Architecture That Made It All Possible
Transformers (introduced in 2017) are the specific type of neural network architecture that enabled the current AI revolution. The name comes from their ability to "transform" input text into output text.
The key innovation of transformers is something called "attention" - the ability to look at all parts of a sentence simultaneously and understand which words are most relevant to which other words.
For example, in the sentence "The cat sat on the mat because it was tired," the transformer can figure out that "it" refers to "cat" and not "mat" by paying attention to the whole context. It can also understand that "bank" means something different in "river bank" versus "savings bank."
This attention mechanism allows transformers to:
Handle long-range dependencies (understanding how the beginning of a paragraph relates to the end)
Process text in parallel (much faster than earlier approaches)
Capture subtle contextual nuances
Simple analogy: Imagine reading a mystery novel. A good reader doesn't just read word by word - they keep track of all the clues, remember what each character said, and understand how everything connects. Transformers do this with text, maintaining awareness of the entire context while processing each word.
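For readers who like to see the mechanics, here's a minimal numpy sketch of the attention idea: every word scores every other word, and those scores decide how much of each word's meaning gets blended into its new representation. The vectors below are random placeholders, so the particular weights are arbitrary; real transformers use learned projection matrices and many attention heads running in parallel:

```python
import numpy as np

rng = np.random.default_rng(0)

# Six words, each pretended to be already embedded as a 4-dimensional vector.
# (Real models use learned projections and dimensions in the hundreds.)
words = ["the", "cat", "sat", "because", "it", "tired"]
embeddings = rng.normal(size=(len(words), 4))

# In self-attention every word plays three roles: query, key, and value.
Q = K = V = embeddings

# Each word scores its relevance to every other word (scaled dot product)...
scores = Q @ K.T / np.sqrt(Q.shape[1])

# ...and the scores become weights that sum to 1 for each word.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Each word's new representation is a weighted blend of all the words.
contextual = weights @ V

print("attention weights for 'it':")
for word, w in zip(words, weights[words.index("it")]):
    print(f"  {word:>7}: {w:.2f}")
```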
4. How They Work Together: The Complete Picture
Now let's see how these pieces fit together to create generative AI:
Words become numbers: Every word gets converted into an embedding - a list of numbers representing its location in meaning-space.
Context refines meaning: The transformer takes these embeddings and adjusts them based on context. The embedding for "bank" shifts depending on whether we're talking about rivers or money.
Patterns guide generation: The LLM uses its learned patterns to predict what should come next. Given "The weather today is," it knows that words like "sunny," "cloudy," or "beautiful" are more likely than "purple" or "eating."
One word at a time: The model generates text word by word, each time calculating the probability of different words coming next and selecting one. This process repeats until the response is complete.
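If you want to watch this loop happen, here's a sketch using the Hugging Face transformers library and the small, public GPT-2 model (downloaded on first run). GPT-2 is not the model behind ChatGPT or Claude, but the generate-one-token-at-a-time loop is the same in spirit:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Small public model, downloaded on first run - a stand-in for any LLM.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The weather today is", return_tensors="pt")

# Generate one token at a time: score every possible next token,
# turn the scores into probabilities, sample one, append it, repeat.
for _ in range(15):
    with torch.no_grad():
        logits = model(ids).logits[:, -1, :]            # scores for the next token
    probs = torch.softmax(logits, dim=-1)               # scores -> probabilities
    next_id = torch.multinomial(probs, num_samples=1)   # sample one token
    ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```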
Here's where things get even more profound. This mathematical representation isn't limited to words and text. The same principles apply to everything we experience:
Images: A picture isn't just pixels - it can be embedded in high-dimensional space where similar images cluster together. A photo of a cat is near other cat photos, sunny beaches near other beaches. Style, content, color, composition - all become directions in this space. This is how AI can generate images, recognize faces, or transform a photo into the style of Van Gogh.
Audio: Sound waves become vectors. A violin's timbre occupies a different region than a trumpet. Musical styles, voices, emotions in speech - all have their coordinates. That speaker recognition model I started with five years ago? It was mapping voices to points in space, finding the unique mathematical signature of each person's speech.
Video: Moving images are trajectories through embedding space, patterns evolving over time. Actions, scenes, narratives - all reducible to mathematical transformations.
Even smells and tastes: Though we don't typically digitize these, neuroscience shows our brains encode them the same way - as patterns of activation, as vectors in neural space. A sommelier's brain has learned a rich embedding space for wine flavors. A perfumer navigates a high-dimensional space of scents.
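That speaker recognition idea can be sketched in a few lines once voices are points in space. The 128-dimensional vectors below are random stand-ins for what a real voice-embedding model would produce; the recognition step itself is just nearest-neighbour search by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in voice embeddings: in a real system these come from a trained
# speaker-embedding model, one vector per enrolled person.
enrolled = {
    "alice": rng.normal(size=128),
    "bob":   rng.normal(size=128),
    "carol": rng.normal(size=128),
}

# A new recording, simulated here as "mostly Bob's voice plus some noise".
new_voice = enrolled["bob"] + 0.3 * rng.normal(size=128)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Recognition = finding the nearest enrolled point in embedding space.
scores = {name: cosine(new_voice, vec) for name, vec in enrolled.items()}
print(max(scores, key=scores.get), scores)
```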
Everything we experience - every sensory input, every thought, every memory - is transformed by our brains into patterns of neural activation. These patterns are mathematical objects in high-dimensional spaces. We don't experience reality directly; we experience these mathematical representations.
This isn't just an analogy or a convenient way to model things. This appears to be the fundamental nature of experience itself. When you see red, hear music, taste chocolate, or understand a sentence, your brain is performing mathematical operations in high-dimensional space. Consciousness might be nothing more - and nothing less - than the subjective experience of these mathematical transformations.
The success of this entire system rests on that first discovery - that meaning can be represented mathematically. This wasn't just a clever engineering trick. It revealed something profound about language, thought, and experience itself.
If meanings are mathematical relationships, if images and sounds are points in space, if all of human experience can be embedded and transformed mathematically, what does that tell us about consciousness? Are we also, in some sense, mathematical machines manipulating meaning vectors?
This question - whether experience is inherently mathematical or whether we've just found a mathematical way to approximate it - leads to deeper discussions about the nature of intelligence, consciousness, and understanding itself.
When you chat with ChatGPT or Claude, when you use an AI to generate images or transcribe speech, you're witnessing all these layers working together:
Embeddings providing the foundation of meaning across all modalities
Transformers adjusting those meanings based on context
Large-scale patterns learned from vast amounts of data
All of it operating on the principle that experience, somehow, follows mathematical rules
It seems like magic - a computer that understands and responds naturally, that can see and hear and create. But underneath, it's a fascinating interplay of mathematics, patterns, and the discovered structure of experience itself.
The real question isn't just how these systems work, but what they reveal about the nature of perception, thought, and meaning. That conversation starts with understanding these building blocks, but it leads to much deeper waters - questions about consciousness, intelligence, and what it means to experience anything at all.
When we talk about the generative AI breakthrough, the conversation usually centers on transformers - the architecture behind GPT, Claude, and other modern language models. But there's a deeper, more fundamental innovation that often gets overlooked: the discovery that meaning has mathematical structure.
This isn't just a technical detail. It's a profound insight about the nature of language, thought, and perhaps consciousness itself.
In 2013, Word2Vec demonstrated something that should have been philosophically earth-shattering: meanings of words could be captured as points in high-dimensional space, and the relationships between meanings followed geometric rules. The famous example - "king minus man plus woman equals queen" - wasn't just a neat trick. It was proof that semantic relationships are mathematical relationships.
Think about what this really means. For all of human history, meaning seemed ineffable, something beyond mere computation. Poetry, metaphor, the subtle connotations of words - these seemed fundamentally human, fundamentally non-mechanical. Yet here was evidence that meaning lives in a mathematical space where:
Concepts exist as coordinates in continuous dimensions
Relationships between ideas are measurable distances and directions
Semantic operations can be performed through arithmetic
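You can try this arithmetic on real learned vectors, assuming you have the gensim library and an internet connection for it to fetch a small pretrained GloVe model. The exact ranking depends on the model, but "queen" typically comes out on top:

```python
import gensim.downloader as api

# Small pretrained GloVe vectors, downloaded on first use.
model = api.load("glove-wiki-gigaword-50")

# "king - man + woman": semantic arithmetic on learned embeddings.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Distances carry meaning too: related words score higher than unrelated ones.
print(model.similarity("happy", "joyful"), model.similarity("happy", "purple"))
```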
Without this foundation, modern AI would be impossible. Transformers, for all their sophistication, are essentially machines for manipulating these embeddings based on context. They fine-tune meaning by adjusting vectors based on surrounding vectors. But this only works because meaning was already mathematical.
But it goes far beyond words. The same mathematical principle applies to all forms of experience:
Visual Embeddings: When AI processes images, it converts pixels into high-dimensional vectors. A sunset, a face, an abstract painting - each becomes a point in a vast mathematical space. Similar images cluster together. The difference between a photo and a painting becomes a direction you can travel. Style transfer works because artistic style is literally a mathematical transformation.
Audio Embeddings: Music, speech, ambient sound - all become trajectories through embedding space. The emotion in a voice, the genre of a song, the accent of a speaker - these are mathematical properties, distances and directions in high-dimensional space. When AI transcribes speech or generates music, it's navigating these mathematical structures.
Multimodal Understanding: The most recent AI models can connect text, image, and audio embeddings in shared spaces. They can describe images, generate pictures from text, or find the music that matches a mood described in words. This works because all these different forms of experience can be mapped to the same mathematical universe.
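As an illustration, here's a sketch using the openai/clip-vit-base-patch32 model through the transformers library (downloaded on first run; "sunset.jpg" stands for any photo you have on disk). The same model embeds the image and several candidate captions into one shared space, and the closest caption wins:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# CLIP maps images and text into the same embedding space.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("sunset.jpg")  # any local photo
captions = ["a photo of a sunset", "a photo of a cat", "a city street at night"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher probability = image and caption sit closer in the shared space.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for caption, p in zip(captions, probs):
    print(f"{p.item():.2f}  {caption}")
```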
Even our other senses - touch, taste, smell - are encoded by our brains as patterns of neural activation, which are fundamentally mathematical structures. A wine's complex flavor profile, the texture of silk, the scent of roses after rain - all are points in high-dimensional spaces in our neural networks.
Here's where things get philosophically interesting. Perhaps the mathematical nature of meaning isn't surprising at all - perhaps it's necessary.
Consider: meaning doesn't exist in some abstract realm independent of minds. Meaning only exists as processed by brains. And what are brains? They're physical systems - biological machines that process information through electrochemical signals, synaptic weights, and neural activation patterns. In other words, brains are mathematical systems.
If meaning only exists within computational systems (brains), then meaning must be computational. It's not that we discovered how to represent meaning mathematically - it's that meaning was always mathematical because it only ever existed within mathematical systems.
This resolves a puzzle about why artificial neural networks can capture meaning so effectively across all modalities. We're not creating an artificial mathematical representation of something non-mathematical. We're creating a mathematical system that parallels another mathematical system. The brain isn't translating from some non-mathematical "experience stuff" into neural patterns; the neural patterns are the experience.
This raises an even deeper question: Could meaning have any other structure?
The brain is, as far as we know, the only system that bootstrapped itself from nothing to achieve thought. Unlike computers, which required designers, or AI systems, which required programmers, biological neural networks had to emerge from simple chemical reactions, through single cells, to neurons, to networks capable of abstract reasoning.
This evolutionary path faced incredible constraints:
Build from organic chemistry
Make each intermediate step survival-advantageous
Run efficiently on chemical energy from food
Self-repair and self-modify
Encode the blueprint in DNA
Perhaps neural networks aren't just one arbitrary solution to intelligence - they might be the only solution that could navigate all these constraints. The mathematical structure of meaning we've discovered might not be one possibility among many, but the inevitable structure that emerges when you need a system that can self-organize from basic physics up to abstract thought.
This perspective reframes the transformer revolution. Yes, transformers were crucial - they gave us the architecture to manipulate embeddings dynamically based on context, they scaled brilliantly, and they enabled parallel processing of entire sequences. But they're the engine, not the fuel.
The attention mechanisms, positional encodings, and multi-head architectures are all sophisticated ways of saying "let's adjust these meanings based on their neighbors." But this only works because meanings already live in a space where such adjustments make mathematical sense.
It's like the difference between inventing the airplane and discovering that flight is possible. Once you know that lift can be generated through pressure differentials, many designs can work. Similarly, once you know meaning has mathematical structure, many architectures (transformers today, perhaps something else tomorrow) can manipulate it.
If meaning is inherently mathematical because it must be processed by mathematical systems, this has profound implications:
For AI Development: We're not trying to approximate something non-mathematical with math. We're discovering the actual structure of meaning as it exists in physical systems. This suggests that continued progress might be more about uncovering these structures than inventing artificial ones.
For Understanding Consciousness: If meaning and thought are necessarily mathematical, this constrains theories of consciousness. Whatever consciousness is, it must emerge from or be compatible with mathematical operations on meaning vectors. The "hard problem" of consciousness might not be explaining how physical processes give rise to subjective experience, but recognizing that subjective experience is what these mathematical transformations feel like from the inside.
For Human-AI Convergence: As AI systems become better at navigating these mathematical spaces of meaning, they're not becoming "more human-like" - they're becoming better at the same mathematical operations that humans perform. The convergence isn't anthropomorphic; it's mathematical.
For Alien Intelligence: If different evolutionary paths could produce different neural architectures, they might create genuinely different mathematical structures for meaning. Alien minds might not just think different thoughts - they might have incommensurable concepts, unreachable by translation, yet still necessarily mathematical.
The success of large language models seems almost magical - how can matrix multiplications produce poetry? How can convolutions recognize faces? How can attention mechanisms understand context?
But if our perspective is correct, it's not surprising at all. Human language, vision, and thought are products of a computational system (the brain), so another computational system can learn to perform those same operations.
We're not doing something alien to the nature of meaning - we're doing something quite native to it. The "unreasonable effectiveness" of mathematical approaches to intelligence isn't unreasonable; it would be unreasonable if mathematical approaches didn't work.
The real breakthrough of generative AI isn't just technical - it's philosophical. Word embeddings and their extension to all forms of experience didn't just give us tools for processing information; they revealed something fundamental about what experience is.
Every transformer model running today, every attention head adjusting context, every generated token, image, or sound - they all rest on this foundational insight: experience has mathematical structure because experience only exists within mathematical systems. We didn't impose math on meaning; we discovered that meaning was mathematical all along.
This unity between the structure of thought and the structure of our thinking machines isn't a coincidence or an approximation. It's a glimpse into the deep mathematical nature of mind itself.
We haven't built machines that happen to work like brains. We've discovered the mathematical principles that any system must follow to process meaning, whether that system is biological or artificial. The fact that machines can see, hear, understand, and create isn't surprising - it would be surprising if they couldn't, once we understood the mathematical nature of these experiences.
And perhaps most profoundly, this discovery suggests that the distinction between "artificial" and "natural" intelligence is less meaningful than we thought. There's just intelligence - the ability to navigate the mathematical spaces of meaning. Whether implemented in carbon or silicon, whether evolved or designed, intelligence is the capacity to perform these mathematical transformations that we call understanding.
We are witnessing not the creation of artificial minds, but the discovery of what minds are: mathematical engines generating meaning in a universe where experience itself is mathematical. This is the hidden foundation of AI, and perhaps, the hidden foundation of consciousness itself.