Scientists have developed a foundational architecture for the next generation of optical computing, which uses light rather than electricity to carry out computation on chips. The work has the potential to revolutionize the way artificial intelligence (AI) models are trained and run.
At the heart of large language models (LLMs) and other deep learning models are weighted numerical structures called tensors, which act like a filing cabinet with sticky notes marking which drawers are used most.
When an AI model is trained to perform a task, such as recognizing images or predicting strings of text, it sorts data into these tensors. In modern AI systems, the speed at which a model can process tensor data (or, in the analogy, sort through the filing cabinet) is a fundamental performance bottleneck that severely limits model size.
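Behind the filing-cabinet analogy, the dominant tensor operation is matrix multiplication between learned weights and incoming data. The snippet below is a minimal, illustrative NumPy sketch, not code from the study; the array names and shapes are invented for the example.

```python
import numpy as np

# Illustrative toy example: the weight-times-input product that dominates
# training and inference in deep learning models.
rng = np.random.default_rng(0)

weights = rng.normal(size=(4, 3))  # a learned weight tensor: 4 inputs -> 3 outputs
inputs = rng.normal(size=(2, 4))   # a batch of 2 input vectors

# The bottleneck operation: a matrix-matrix multiplication.
outputs = inputs @ weights         # shape (2, 3)
print(outputs.shape)
```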
In typical light-based computing, models analyze tensors by firing laser arrays many times over. These work like machines that scan the barcode on a package to determine its contents, except that here each package represents a math problem. The amount of processing required to crunch these numbers varies with the specific features of the model.
Light-based computing is faster and more energy efficient at small scales, but most optical systems cannot run in parallel. Unlike graphics processing units (GPUs), which can be chained together to dramatically increase available processing power, light-based systems typically run operations one after another. For this reason, most developers have passed over optical computing in favor of the raw parallel processing power that GPUs provide at scale.
This scaling bottleneck is why the most powerful models created by OpenAI, Anthropic, Google, xAI, and others require thousands of GPUs to run in parallel for training and operation.
But a new architecture called Parallel Optical Matrix-Matrix Multiplication (POMMM) has the potential to negate the problems that have held back optical computing. Unlike previous optical methods, POMMM uses a single laser burst to perform multiple tensor operations simultaneously.
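In software terms, and only as a loose analogy to the optical physics, the difference between the older schemes and POMMM resembles the difference between looping over matrix-vector products, one per laser shot, and evaluating the whole matrix-matrix product in a single pass. The sketch below is hypothetical; the matrix sizes are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(256, 256))  # weight matrix
B = rng.normal(size=(256, 64))   # 64 input vectors stacked as columns

# Conventional optical schemes: one matrix-vector product per laser shot,
# so 64 inputs require 64 sequential passes through the hardware.
out_sequential = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])

# POMMM-style operation, conceptually: the full matrix-matrix product
# is produced in a single shot.
out_single_shot = A @ B

# Both routes give the same answer; the difference is how many passes it takes.
assert np.allclose(out_sequential, out_single_shot)
```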
The result is a foundational AI hardware design with the potential to scale the tensor processing speed of a given AI system beyond state-of-the-art electronic hardware capabilities while reducing its energy footprint.
Next generation optical computing and AI hardware
The study, published November 14 in the journal Nature Photonics, details the results of an experimental optical computing prototype, along with a series of comparative tests against standard optical and GPU processing schemes.
The scientists used a specific arrangement of traditional optical hardware components, combined with new encoding and processing methods, to capture and analyze tensor data with a single laser shot.
They encoded digital data into the amplitude and phase of light waves, converting the data into physical properties of the optical field. These light waves are then combined to perform mathematical operations such as matrix and tensor multiplication.
These optical operations happen passively as the light propagates, so the computation itself requires no additional power. This eliminates the need for active control or switching during processing, along with the energy those functions would consume.
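As a rough numerical illustration of that encoding (not the actual optical setup, and with an invented sign-to-phase scheme), a real number can ride on a light wave as an amplitude plus a phase, and the interference of such waves naturally carries out the multiply-and-add steps of matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two small real-valued matrices to "multiply with light" (toy example).
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

def encode(M):
    """Encode each entry as the amplitude and phase of a complex field:
    positive values get phase 0, negative values get phase pi."""
    amplitude = np.abs(M)
    phase = np.where(M >= 0, 0.0, np.pi)
    return amplitude * np.exp(1j * phase)

field_A = encode(A)
field_B = encode(B)

# Interference and summation of the modulated fields perform the
# multiply-accumulate steps as the light propagates; numerically this
# is just a complex matrix product.
field_out = field_A @ field_B

# Read the real-valued result back out of the interfered field.
result = field_out.real
assert np.allclose(result, A @ B)
```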
“This approach can be implemented on almost any optical platform,” Zhipei Sun, lead author of the study and leader of Aalto University’s photonics group, said in a statement. “In the future, we plan to integrate this computational framework directly into photonic chips, allowing light-based processors to perform complex AI tasks with very low power consumption.”
Zhang estimates that this approach could be integrated into major AI platforms within three to five years.
General purpose artificial intelligence accelerator
Representatives described the work as a step toward artificial general intelligence (AGI), a hypothetical future AI system that is smarter than humans and capable of learning across many disciplines, regardless of its training data.
“This will usher in a new generation of optical computing systems that will significantly accelerate complex AI tasks across a myriad of fields,” Zhang added in a statement.
The paper itself does not specifically mention AGI, but it does refer to general-purpose computing several times.
The idea that scaling current AI development techniques is a viable path to achieving AGI is so pervasive in certain corners of the computer science community that you can buy T-shirts that proclaim, “All you need is scaling.”
Other scientists, such as Meta's outgoing chief AI scientist Yann LeCun, disagree, arguing that the current gold-standard AI architecture, the large language model, will never reach AGI no matter how far it is scaled.
POMMM provides a key piece of the hardware puzzle needed to remove one of the field’s biggest bottlenecks, scientists say, and could allow developers to scale far beyond the fundamental limitations of the current paradigm.
