In recent weeks, researchers from Google and Sakana have unveiled two groundbreaking neural network designs that could revolutionize the AI industry.
Both aim to challenge the dominance of transformers, the neural network architecture that weighs every part of an input against its surrounding context, and the technology that has defined AI for the past six years.
The new approaches are Google’s ‘Titans’ and ‘Transformer Squared’ from Sakana, a Tokyo AI startup known for using nature as a model for engineering solutions. Both Google and Sakana tackled the transformer problem by studying the human brain. Their designs rely on distinct memory stages and on activating specialized expert modules on demand, rather than engaging the entire model at once for every problem.
The net result makes AI systems smarter, faster, and more versatile than ever before, without necessarily making them larger or more expensive to operate.
As for context, the transformer architecture, the technology that gave ChatGPT the “T” in its name, is designed for sequence-to-sequence tasks such as language modeling, translation, and image processing. Transformers rely on “attention mechanisms,” tools for gauging how important a concept is in a given context, to model the dependencies between input tokens. That lets them process data in parallel rather than sequentially, unlike recurrent neural networks, the dominant technology in AI before transformers appeared. This gave models insight into context and marked a before-and-after moment in AI development.
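To make that concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of a transformer. The shapes and names are illustrative, not taken from any particular model:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model). Every token is scored against
    # every other token in one matrix multiply, which is why transformers
    # can process a whole sequence in parallel.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)  # how much each token "attends" to the others
    return weights @ v

q = k = v = torch.randn(1, 8, 64)  # toy batch: 8 tokens, 64-dim embeddings
out = attention(q, k, v)           # (1, 8, 64)
```

Because the attention scores for the whole sequence come out of a single matrix multiplication, there is no step-by-step loop over tokens the way a recurrent network requires.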
However, despite their remarkable success, transformers have faced significant challenges in scalability and adaptability. Making a model more flexible and versatile has generally meant making it bigger and more expensive to run, and once a model is trained, it cannot learn anything new: developers must build a successor, or users must bolt on third-party tools. That’s why “bigger is better” has become the general rule in AI today.
But this may soon change, thanks to Google and Sakana.
Titans: A new memory architecture for dumb AI
Google Research’s Titans architecture takes a different approach to improving AI’s adaptability. Rather than changing how models process information, Titans changes how they store and access it. The architecture introduces a neural long-term memory module that learns to memorize at test time, that is, while the model is actually in use, much as human memory works.
Currently, models read your entire prompt and everything they have generated so far, predict a token, read it all again, predict the next token, and so on until they arrive at an answer. They have incredible short-term memory, but they suck at long-term memory: ask them to remember something outside their context window, or to find one specific detail buried in a pile of noise, and they are likely to fail.
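A toy decoding loop makes the limitation visible. The `model` callable and the `window` size below are hypothetical stand-ins, but the slicing step is exactly why older tokens become invisible:

```python
def generate(model, prompt_tokens, window=4096, max_new=64):
    # `model` is a hypothetical predict-one-token callable; `window` stands
    # in for the context window. Anything older than `window` tokens is
    # sliced away before every prediction, so the model simply cannot see it.
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        context = tokens[-window:]   # older tokens fall out of view here
        tokens.append(model(context))
    return tokens
```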
Titans, on the other hand, combines three types of memory systems: short-term memory (similar to traditional transformers), long-term memory (for storing historical context), and persistent memory (for task-specific knowledge). This multi-layered approach allows the model to process sequences of more than 2 million tokens, far more than what current transformers can efficiently process.

According to the research paper, Titans shows significant improvements in several tasks, including language modeling, common sense reasoning and genomics. The architecture has proven particularly effective in “needle-in-haystack” tasks, which require locating specific information within very long contexts.
The system mimics how the human brain activates specific regions for different tasks and dynamically reconfigures its networks based on changing demands.
In other words, just as different neurons in your brain specialize in different functions and activate depending on the task you are performing, Titans emulates this idea with interconnected memory systems. Its short-term, long-term, and persistent memories work together to dynamically store, retrieve, and process information based on the task at hand.
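As a rough sketch of that idea, loosely following the paper’s description, the toy block below pairs ordinary attention (short-term memory) with a small linear module whose weights are nudged at test time by a “surprise” signal (long-term memory) and a set of learned tokens (persistent memory). The class, the update rule, and every hyperparameter here are simplified assumptions, not the paper’s actual implementation:

```python
import torch
import torch.nn as nn

class ToyTitansBlock(nn.Module):
    def __init__(self, d=64, n_persistent=4):
        super().__init__()
        # Persistent memory: learned tokens carrying task-specific knowledge.
        self.persistent = nn.Parameter(torch.randn(n_persistent, d))
        # Long-term memory: a small module whose weights keep updating at test time.
        self.long_term = nn.Linear(d, d, bias=False)
        # Short-term memory: ordinary attention over the current window.
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, x, lr=1e-2):
        # Retrieve from long-term memory, then nudge its weights toward the
        # input using the reconstruction error as a "surprise" signal:
        # the memory literally learns while the model is being used.
        retrieved = self.long_term(x)
        surprise = ((retrieved - x) ** 2).mean()
        grad = torch.autograd.grad(surprise, self.long_term.weight, retain_graph=True)[0]
        with torch.no_grad():
            self.long_term.weight -= lr * grad
        # Attend over persistent tokens, retrieved memories, and the input together.
        mem = self.persistent.unsqueeze(0).expand(x.size(0), -1, -1)
        seq = torch.cat([mem, retrieved, x], dim=1)
        out, _ = self.attn(seq, seq, seq)
        return out[:, -x.size(1):]  # keep positions aligned with the input

block = ToyTitansBlock()
y = block(torch.randn(1, 16, 64))  # (1, 16, 64)
```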
Transformer Squared: Self-adaptive AI is here
Just two weeks after Google’s paper, a team of researchers from Sakana AI and the Institute of Science Tokyo introduced Transformer Squared, a framework that lets AI models adjust their behavior in real time based on the task at hand. The system works by selectively adjusting only the singular components of the model’s weight matrices during inference, making it more efficient than traditional fine-tuning methods.
Transformer Squared “uses a two-pass mechanism: first a dispatch system identifies the task properties, and then task-specific ‘expert’ vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt,” the research paper said.
It trades inference time (the model thinks for longer) for specialization (knowing which expertise to apply).
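In pseudocode form, the two passes might look like the hedged sketch below, where `classify_task`, `expert_vectors`, and the conditioned `model` call are illustrative stand-ins rather than Sakana’s actual API:

```python
import torch

def two_pass_inference(model, classify_task, expert_vectors, prompt):
    # Pass 1: the dispatch system identifies what kind of task the prompt is.
    task_weights = classify_task(prompt)           # e.g. {"math": 0.8, "code": 0.2}
    # Pass 2: blend the matching expert vectors and condition the model on the mix.
    mixed = sum(w * expert_vectors[name] for name, w in task_weights.items())
    return model(prompt, expert=mixed)             # hypothetical conditioned call

# Toy stand-ins so the sketch runs end to end.
experts = {"math": torch.tensor([1.0, 0.0]), "code": torch.tensor([0.0, 1.0])}
dispatch = lambda prompt: {"math": 0.8, "code": 0.2}
model = lambda prompt, expert: f"answer steered by expert mix {expert.tolist()}"
print(two_pass_inference(model, dispatch, experts, "integrate x^2"))
```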

What makes Transformer Squared particularly innovative is its ability to adapt without the need for extensive retraining. The system uses what the researchers call Singular Value Fine-tuning (SVF), which focuses on adjusting only the essential components needed for a specific task. This approach significantly reduces computational requirements while maintaining or improving performance compared to current methods.
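The underlying linear algebra is easy to sketch. A weight matrix can be factored with singular value decomposition, and SVF-style adaptation then amounts to rescaling only the vector of singular values while the factors stay frozen. The matrix sizes and the scaling vector `z` below are purely illustrative:

```python
import torch

W = torch.randn(64, 64)              # a frozen, pretrained weight matrix
U, S, Vh = torch.linalg.svd(W)       # decompose once, offline
z = torch.ones_like(S)               # a learned, task-specific expert vector
z[:8] = 1.5                          # e.g. amplify the top components for this task

W_task = U @ torch.diag(S * z) @ Vh  # task-adapted weight, same shape as W
```

Because only the 64 entries of `z` change per task, rather than all 4,096 entries of `W`, the storage and compute cost of each new expert stays tiny.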
During testing, Sakana’s Transformer demonstrated remarkable versatility across different tasks and model architectures. The framework showed particular promise in dealing with out-of-distribution applications, suggesting it could help AI systems become more flexible and responsive to new situations.
Here is our attempt at an analogy. Your brain forms new neural connections when you learn a new skill, without having to rewire everything. When you learn to play the piano, for example, your brain doesn’t rewrite all of its knowledge; it adapts specific neural circuits for that task while retaining its other capabilities. Sakana’s idea was that developers shouldn’t have to retrain a model’s entire network to adapt it to new tasks.
Instead, the model selectively adjusts specific components (via Singular Value Fine-tuning) to become more efficient at certain tasks, while maintaining overall capabilities.
Overall, the era of AI companies boasting about the sheer size of their models may soon be a thing of the past. As this new generation of neural networks gains popularity, future models will no longer need to rely on massive scale to achieve greater versatility and performance.
Today, transformers dominate the landscape, often supplemented with external tools such as Retrieval-Augmented Generation (RAG) or LoRAs to increase their capabilities. But in the rapidly evolving AI industry, all it takes is one breakthrough implementation to pave the way for a seismic shift – and once that happens, the rest of the field will surely follow.
Edited by Andrew Hayward