[Gardner Analytics Apartment — January 2014, 11:47 PM]
The apartment door stuck on the third try, the lock grinding against the frame where the wood had swollen from San Francisco's persistent damp. Ethan shouldered it open, tossed the observer badge on the kitchen counter, and went straight for the laptop.
No transition. No decompression from the day. The faces from Disrupt were already fading — the fours and fives, the buzzword keynotes, the mediocre coffee. What remained was the sharp image of Richard's compression demo and the quiet certainty of what came next.
He needed to build.
Ethan sat at the desk, opened the laptop, and closed his eyes.
The Transformer materialized.
Not gradually, not piece by piece — the full architecture bloomed in his mind like a building he'd walked through a hundred times. He could see the entry point where raw text became embeddings, numbers assigned to words, position woven in through sinusoidal encoding. From there, the data split into parallel streams, eight attention heads in the base configuration, each one a different lens examining the same input from a different angle.
He focused on the first attention head and the resolution sharpened. Queries, keys, values — three projections of the same input, transformed by learned weight matrices. The queries asked: what am I looking for? The keys answered: what do I contain? The values delivered: here's the information. Dot products between queries and keys produced attention weights. Softmax normalized them into probabilities. Multiply by values. Sum. Output.
The entire mechanism was, at its core, a sophisticated way for every word in a sentence to decide how much it cared about every other word. Elegant. Simple in concept, staggering in implication.
He pulled back. The eight attention heads fed into a concatenation layer, then a linear projection. Residual connections arced over the whole block — data highways that let gradients flow clean during training. Layer normalization smoothed the signal. Then a feed-forward network: two linear transformations with a ReLU activation sandwiched between, expanding the dimension to four times its size and compressing it back.
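For readers tracing the architecture along with him, the feed-forward sublayer and the post-norm residual described above can be sketched in plain NumPy. This is an illustrative modern rendering, not the story's Theano code; the dimensions and variable names are chosen here for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token vector to zero mean, unit variance
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def feed_forward(x, W1, b1, W2, b2):
    # expand to 4x the model width, apply ReLU, compress back down
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

d_model = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((3, d_model))                    # 3 tokens
W1 = 0.1 * rng.standard_normal((d_model, 4 * d_model))   # expansion weights
b1 = np.zeros(4 * d_model)
W2 = 0.1 * rng.standard_normal((4 * d_model, d_model))   # compression weights
b2 = np.zeros(d_model)

# post-norm, as in the original Transformer: sublayer, residual add, then normalize
out = layer_norm(x + feed_forward(x, W1, b1, W2, b2))
```

The residual add happens before the normalization; that ordering matters later in the chapter.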
One Transformer block. Stack six of them for the encoder. Six more for the decoder. Each decoder block had an extra sublayer — cross-attention, where it looked at the encoder's output while generating the next token.
The whole thing. Perfect. Complete. Available for examination from any angle, at any magnification.
And already, after four minutes of focused attention, a dull ache was forming behind his left eye.
Ethan opened his eyes and started typing.
The Accelerated ML Cognition engaged. Not like a switch — more like a current. One moment he was thinking about what to write. The next, his fingers were already writing it. The gap between intention and execution collapsed. Import statements flowed into class definitions. Matrix operations materialized on screen as if someone were dictating them directly into the editor.
He started with the core: the scaled dot-product attention function. In 2025, this was a one-liner in PyTorch. Here, in 2014, he was writing it from scratch in Theano, wrestling with the framework's computational graph abstraction layer, manually defining symbolic variables and shared parameters.
```python
import theano.tensor as T

def scaled_dot_product_attention(Q, K, V, mask=None):
    # cast the key dimension so the scaling stays in float32
    d_k = T.cast(K.shape[-1], 'float32')
    scores = T.batched_dot(Q, K.dimshuffle(0, 2, 1)) / T.sqrt(d_k)
    if mask is not None:
        scores = T.switch(mask, scores, -1e9)
    weights = T.nnet.softmax(scores)
    return T.batched_dot(weights, V)
```
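A NumPy rendering of the same function is useful for sanity-checking shapes outside the Theano graph. This is a sketch: the batched `(batch, seq_len, d_k)` layout is an assumption here, not taken from the story's code.

```python
import numpy as np

def softmax(x):
    # numerically stabilized softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q, K, V: (batch, seq_len, d_k)
    d_k = K.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ V
```

With a single key per query, the softmax weight is 1 and the output equals V exactly, which makes a convenient smoke test.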
The code came fast. Too fast. He checked the clock — eleven minutes had passed, and he'd written the attention mechanism, the multi-head wrapper, and half the positional encoding. In his previous life, this would have taken a day of careful implementation and debugging. Here, his hands knew what Theano needed before his conscious mind formulated the thought.
The positional encoding was tricky. Sinusoidal functions that gave the model information about token order — without them, the Transformer couldn't distinguish "dog bites man" from "man bites dog." He implemented it, checked it against the architecture in his head, adjusted two indices that were off by one, and moved on.
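The sinusoidal scheme he implements can be sketched in a few lines of NumPy. The variable names here are chosen for the example; only the formula itself comes from the architecture described above.

```python
import numpy as np

def positional_encoding(seq_len, d_model, base=10000.0):
    # each even dimension gets a sine, each odd dimension the matching cosine,
    # at geometrically spaced frequencies
    pos = np.arange(seq_len, dtype=np.float64)[:, None]       # (seq_len, 1)
    i = np.arange(0, d_model, 2, dtype=np.float64)[None, :]   # even dimensions
    angles = pos / np.power(base, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

Every value is bounded in [-1, 1], and position 0 encodes as alternating zeros and ones, which makes off-by-one index errors like the ones he caught easy to spot.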
By 1 AM, the encoder was taking shape. Six stacked blocks, each containing multi-head attention and feed-forward layers, connected by residual streams and normalization. The code was dense, technical, and — he was increasingly certain — correct. Not because he'd tested it. Because the architecture in his mind and the code on his screen had converged to the point where he could feel the correspondence. Like overlaying two transparencies and watching the lines match.
---
[Same Apartment — 3:12 AM]
The headache had graduated from dull ache to sharp throb.
Ethan leaned back from the laptop and pressed the heels of his palms against his eye sockets. The architecture was still there in his mind, but the edges had gone soft. When he tried to examine the decoder's cross-attention layer, the dimensions blurred. Numbers that had been razor-sharp an hour ago now wavered, like text viewed through water.
He'd been pushing too hard. Four-plus hours of sustained architectural examination, layered on top of continuous accelerated coding. The combination was draining him in a way that transcended normal fatigue. This wasn't just tiredness. This was the ability itself hitting a wall — a cognitive throttle that tightened the harder he pushed against it.
He stood. His knees popped. His lower back screamed from four hours in a chair with no lumbar support. The apartment was freezing — the heat had shut off at midnight again. He grabbed the North Face jacket, draped it over his shoulders, and walked to the bathroom.
Aspirin. He found a bottle in the medicine cabinet — the original Ethan's, half-empty. He shook out two, cupped water from the tap with his hand, swallowed. The pills went down rough against a dry throat.
The mirror showed the face that was becoming his. Thinner than he was used to. Younger. The shadows under the eyes were his own contribution — four days in this body and he'd already added exhaustion to its features. The broken nose gave him a slightly crooked look, like someone who'd been in exactly one fight and lost.
This is it, he thought. This face. This life. Stop comparing.
He splashed cold water on his cheeks. Dried off with the towel that was starting to smell like it needed washing. Added "buy towels" to the growing mental list of things a living person needed that a dead man's apartment lacked.
Back at the desk. The code waited. The cursor blinked at the end of line 847.
He couldn't examine the architecture anymore tonight — the mental image was too degraded, too blurry at the edges. But the code he'd already written was on the screen. He didn't need the ability to debug what was already typed.
He switched from building to reviewing. Line by line. The attention mechanism looked right. The multi-head wrapper was clean. The positional encoding — he paused. Line 203. The frequency calculation was running at the wrong precision. He'd let the exponent compute in float32 where the architecture called for float64. A minor difference on small models. A training catastrophe at scale.
He fixed it.
Line 412. The layer normalization was applied before the residual connection instead of after. The architecture in his head — even blurry — told him this was wrong for the original Transformer. Pre-norm was a later innovation. The 2017 paper used post-norm.
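The distinction he's checking can be written out in a few lines. This is a sketch in NumPy; `sublayer` here stands in for either attention or the feed-forward network.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each vector to zero mean, unit variance
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def post_norm_block(x, sublayer):
    # the 2017 paper's ordering: sublayer, residual add, normalize last
    return layer_norm(x + sublayer(x))

def pre_norm_block(x, sublayer):
    # the later variant: normalize first, so the residual path stays an identity
    return x + sublayer(layer_norm(x))
```

Post-norm guarantees the block's output is normalized; pre-norm instead keeps a clean identity path for gradients, which is why it became the later default.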
He fixed that too.
The debugging was slower than the writing. Each correction required him to hold the code's logic in working memory and check it against what he could still access from the architecture. Like proofreading a document in a language he was forgetting — the knowledge was there, but it took effort to retrieve.
By 4:30 AM, the encoder was reviewed and corrected. The decoder was half-built and unreviewed. The training loop didn't exist yet. The data preprocessing pipeline didn't exist. The tokenizer was a placeholder.
Months of work remained, even with accelerated coding. The ability made him fast — inhumanly fast — but "fast" was relative. A full Transformer implementation in Theano, without the modern tooling ecosystem, was like building a car from raw steel. Every component needed fabrication. Every interface needed custom engineering. Nothing was off-the-shelf because the shelf hadn't been invented.
And then came training. Which required compute. Which required ChronoCloud. Which required money.
Twelve thousand dollars. Minus rent in two weeks. Minus food. Minus the Disrupt registration fee. Call it ten thousand dollars of usable capital.
At $50 per hour for V100-equivalents, that bought him two hundred hours of training time. A basic Transformer — small model, limited dataset, proof-of-concept only — might converge in a hundred hours. Maybe. If the implementation was perfect. If the hyperparameters were dialed in. If nothing crashed, nothing diverged, nothing caught fire in the sixty-seven ways that training runs caught fire in the era before MLOps was a word.
One shot. Maybe one and a half. And if the first training run failed — wrong learning rate, unstable gradients, data corruption, any of the thousand failure modes that had tormented ML engineers for decades — he'd be out of money with nothing to show for it.
A pressure formed in his chest. Not the headache. Something deeper. The cold mathematical reality that knowledge and ability weren't enough. He could see the architecture. He could write the code. He could even rent the hardware. But the margin for error was razor-thin, and this body didn't come with a safety net.
The first pale light of dawn was leaking through the window. Gray-blue, San Francisco's version of sunrise — more suggestion than event. The fire escape outside cast iron shadows across the wall. Somewhere below, a garbage truck rumbled through the alley.
Ethan saved the code. Eight hundred and forty-seven lines of Theano implementation, representing roughly forty percent of a complete Transformer. Written in one night. By one person. In a framework that shouldn't be asked to do this.
He closed the laptop. Stood. His body cataloged its complaints: headache (fading), back pain (persistent), eye strain (moderate), hunger (acute — he'd forgotten to eat since the burrito outside Moscone Center eight hours ago).
The coffee maker sat on the counter. Same Folgers. Same mug — DISRUPT EVERYTHING. He brewed a pot because the alternative was collapsing into bed and losing the thread of momentum that had carried him through the night.
While the coffee dripped, he stood at the window and watched San Francisco wake up. Cars multiplied on the street below. A jogger passed. A woman walked two dogs, both pulling in different directions. Across the street, a neon OPEN sign flickered to life in a diner that served breakfast and didn't judge you for looking like you hadn't slept.
The smell of his first cup in this apartment — four days ago, the morning after arriving in this body, standing in this same spot — came back to him. Steam curling in cold air. The strangeness of being alive. The vertigo of a world that was familiar and alien in equal measure.
Four days. In four days, he'd confirmed four supernatural abilities, mapped his meta-knowledge as unreliable, assessed the 2014 tech landscape as hostile to everything he planned to build, watched the Pied Piper team in person, written half a Transformer, and discovered that his gifts came with physical costs that couldn't be cheated.
The coffee finished. He poured. Took a sip. Burned his tongue, same as the first time.
Somewhere nearby, there was a coffee shop. He needed better coffee. He needed food that wasn't stale cereal or street burritos. He needed to leave this apartment and interact with actual human beings instead of code editors and dead men's email threads.
And he needed help. The implementation could be done alone — theoretically — but building a company, securing funding, training models, and navigating the social minefield of Silicon Valley while hiding four impossible abilities and the knowledge of a future that might not happen the way he remembered it?
That required people. Specific people. People who scored high enough on the ability's scale to build what he envisioned, and human enough to trust with pieces of the truth.
Ethan drained the coffee. Grabbed the jacket. Reached for the door.
Tomorrow, he'd find better coffee. And maybe — if the numbers aligned — the first person worth hiring.
