Researchers have designed a new type of large language model (LLM) that they say could help bridge the gap between artificial intelligence (AI) and more human-like cognition.
The model, called "Dragon Hatchling," was developed by researchers at AI startup Pathway, who say it is designed to more accurately simulate how neurons in the brain connect and strengthen through learning. They describe it as the first model that can "generalize over time," meaning it can automatically adjust its neural wiring in response to new information.
“There’s a lot of discussion going on right now, especially around reasoning models and synthetic reasoning, about whether you can extend reasoning beyond the patterns seen in the training data, and whether reasoning generalizes to more complex or longer reasoning patterns,” Adrian Kosowski, co-founder and chief scientific officer at Pathway, said on the Super Data Science podcast on October 7.
“The evidence is largely inconclusive and the answer is generally no. Currently, machines do not generalize reasoning the way humans do. We believe this is a major challenge. The architecture we are proposing has the potential to bring about significant change.”
A step towards AGI?
Teaching AI to think like humans is one of the field’s most important goals. However, reaching this level of simulated cognition, often referred to as artificial general intelligence (AGI), remains difficult.
A key challenge is that human thinking is inherently messy. Our thoughts rarely appear as neat, linear sequences of connected information. Rather, the human brain is a chaotic tangle of overlapping thoughts, sensations, emotions and impulses constantly competing for our attention.
In recent years, LLMs have brought the AI industry closer to simulating human-like reasoning. LLMs are typically powered by transformers, a type of deep learning architecture that allows a model to connect words and ideas during a conversation. Transformers are the “brains” behind generative AI tools like ChatGPT, Gemini and Claude, allowing them to interact with and respond to users with (at least most of the time) convincing levels of “awareness.”
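For readers who want a concrete picture, the sketch below shows the self-attention step at the heart of a transformer layer in a few lines of Python. The dimensions, variable names and random weights are illustrative assumptions for this article, not code from any of the models mentioned.

```python
# Toy sketch of the self-attention step inside a transformer layer.
# Real models stack many such layers with learned, not random, weights.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Relate every token in the sequence to every other token.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: projection matrices (learned during training, then frozen)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # how strongly tokens attend to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # each token becomes a blend of the others

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))                          # a 5-token "sentence"
out = self_attention(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)                                     # (5, 8): same tokens, now context-aware
```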
Transformers are extremely sophisticated, but they also mark the limits of current generative AI capabilities. One reason is that they don’t learn continuously: once an LLM is trained, the parameters that govern it are locked, so new knowledge can only be added through retraining or fine-tuning. When an LLM encounters something new, it simply generates a response based on what it already knows.
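The toy class below illustrates that limitation under a deliberately simplified assumption: generation only reads the stored parameters, and the only way to add knowledge is a separate fine-tuning step that rewrites them. It is a sketch of the general idea, not Pathway's or any vendor's actual code.

```python
# Minimal sketch of why a conventional LLM stops learning once training ends:
# inference reads the parameters, but never writes anything back into them.
class FrozenLLM:
    def __init__(self, trained_params):
        self.params = dict(trained_params)   # fixed after training

    def generate(self, prompt):
        # Stand-in for a real forward pass: the answer depends only on the
        # frozen parameters plus the prompt.
        return self.params.get(prompt, "best guess from existing knowledge")

    def fine_tune(self, new_facts):
        # Adding knowledge requires another optimization pass over the weights.
        self.params.update(new_facts)

model = FrozenLLM({"capital of France?": "Paris"})
print(model.generate("who won yesterday's match?"))  # falls back on what it already knows
model.fine_tune({"who won yesterday's match?": "the home team"})
print(model.generate("who won yesterday's match?"))  # only after retraining does it "know"
```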
Imagine dragons
Dragon Hatchling, on the other hand, is designed to dynamically adapt its understanding beyond its training data. It does this by updating its internal connections in real time as it processes new inputs, similar to how connections between neurons strengthen or weaken as we learn. This may support continuous learning, the researchers said.
Unlike typical transformer architectures, which process information sequentially through stacked layers of nodes, Dragon Hatchling’s architecture behaves like a flexible web that reorganizes itself as new information arrives. Tiny “neuronal particles” continually exchange information and adjust their connections, strengthening some and weakening others.
Over time, new pathways form that help the model retain what it has learned and apply it to future situations, effectively giving it a kind of short-term memory that shapes how it handles new inputs. Unlike a traditional LLM, however, Dragon Hatchling’s memory comes from continuous adaptation within the architecture itself, rather than from context stored in its training data.
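As a rough illustration of that idea, the sketch below applies a Hebbian-style rule in which connections between co-active units strengthen a little on every input while unused ones decay. The update rule, learning rate and decay constant are illustrative assumptions, not the equations from the Dragon Hatchling paper.

```python
# Toy sketch of local, Hebbian-style plasticity: the "wiring" is nudged while
# inputs are processed, rather than being fixed after training.
import numpy as np

rng = np.random.default_rng(0)
n_units = 6
weights = rng.normal(scale=0.1, size=(n_units, n_units))  # "synapses" between neuron-like units

def process(x, weights, lr=0.05, decay=0.01):
    """One step of inference that also reshapes the connections it used."""
    activity = np.tanh(weights @ x)                        # units respond to the input
    # Links between co-active units strengthen; everything else slowly decays,
    # giving a crude short-term memory of recent inputs.
    weights += lr * np.outer(activity, x) - decay * weights
    return activity, weights

for step in range(20):
    x = rng.normal(size=n_units)                           # a stream of new inputs
    activity, weights = process(x, weights)

print(np.round(weights, 2))                                # the wiring now reflects recent inputs
```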
In testing, Dragon Hatchling performed similarly to GPT-2 on benchmark language modeling and translation tasks. This is an impressive feat for a brand new prototype architecture, the researchers note.
Although the paper has not yet been peer-reviewed, the team hopes the model will serve as a foundational step toward AI systems that learn and adapt autonomously. In theory, this could mean that AI models get smarter the longer they’re online, for better or worse.
