
PrismML’s Technology Drastically Improves the Power-to-Compute Equation in Datacenters
Breakthrough 1-bit Bonsai 8B Model Enables Advanced Intelligence to Run Locally on Phones, Laptops, and Other Edge Devices
PASADENA, Calif., 2026 — PrismML, a pioneer in high-performance AI models, today emerged from stealth to introduce the world’s first commercially viable 1-bit large language models, built on groundbreaking research developed at Caltech. PrismML’s goal is to enable a future where powerful AI runs locally, efficiently, and securely, and where datacenter buildouts can do more with fewer resources while avoiding ballooning energy costs.
Its flagship model, 1-bit Bonsai 8B, represents a fundamental shift in how AI is deployed: delivering cutting-edge capabilities while operating efficiently on consumer and industrial edge devices, including smartphones, laptops, and embedded systems.
“AI’s future will not be defined by who can build the largest datacenters,” said Vinod Khosla, Founder of Khosla Ventures and an investor in the company. “It will be defined by who can deliver the most intelligence per unit of energy and cost. PrismML represents that kind of breakthrough.”
As AI models grow larger and more computationally intensive, deploying advanced intelligence has increasingly required massive datacenter infrastructure. This limits real-time, on-device AI experiences due to latency, hardware, and privacy constraints.
PrismML addresses this challenge by fundamentally rethinking neural networks at the mathematical level. Instead of traditional 16- or 32-bit architectures, the company creates models with a native 1-bit structure. This dramatically reduces inference compute and memory requirements without sacrificing reasoning performance.
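The core idea can be illustrated with a small sketch. This is illustrative only, assuming weights constrained to {-1, +1} as in typical 1-bit schemes; it is not PrismML’s actual implementation:

```python
import numpy as np

# Illustrative sketch (not PrismML's implementation): with weights
# constrained to {-1, +1}, each weight fits in a single bit, and a dot
# product reduces to sign-based additions and subtractions.

rng = np.random.default_rng(0)

# A toy 4x8 weight matrix with sign (1-bit) weights.
W = rng.choice([-1, 1], size=(4, 8)).astype(np.int8)

# Pack each row of signs into bits: +1 -> 1, -1 -> 0.
packed = np.packbits((W > 0).astype(np.uint8), axis=1)  # shape (4, 1), one byte per row

# Unpack and reconstruct the signs at inference time.
unpacked = np.unpackbits(packed, axis=1, count=8).astype(np.int8) * 2 - 1
assert np.array_equal(unpacked, W)

# A matrix-vector product with sign weights needs no multiplications:
x = rng.standard_normal(8).astype(np.float32)
y = unpacked @ x  # equivalent to adding/subtracting entries of x
```

Packing eight sign weights per byte is what yields the 16x storage reduction relative to 16-bit formats; the arithmetic simplification is where the compute and energy savings come from.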
On a range of intelligence benchmarks, 1-bit Bonsai 8B is competitive with leading full-precision 8B models, including Llama 3 8B, while being:
- 14x smaller
- 8x faster
- 4-5x more energy efficient
This efficiency enables developers to build sophisticated AI applications that execute directly on devices, reducing reliance on the cloud and unlocking a new generation of edge-first applications in robotics, wearables, and personal computing that were previously impractical.
“We spent years developing the mathematical theory required to compress a neural network without losing its reasoning capabilities,” said Babak Hassibi, CEO and Founder of PrismML and Professor at Caltech. “We see 1-bit not as an endpoint, but as a starting point. We are creating a new paradigm for AI: one that adapts to diverse hardware environments and delivers maximum intelligence per unit of compute and energy.”
While the immediate impact is at the edge, the implications extend to the cloud. The same efficiency gains that enable local deployment also allow datacenters to operate more effectively by improving hardware utilization, lowering operating costs, and reducing energy consumption.
“From a systems perspective, reducing models to 1-bit representations changes the optimization equation,” said Ion Stoica, Databricks Co-Founder and Professor at UC Berkeley. “It enables a new class of AI systems that can both operate efficiently at the edge and scale economically in the cloud.”
Bill Jia, VP of Engineering at Google, Core ML/AI, added: “When advanced models can run on constrained devices, it reshapes system design end to end. Efficiency at the model level compounds across infrastructure.”
PrismML’s technology also has implications for future AI hardware design.
Amir Salek of Cerberus Ventures, an investor in the company who founded and led the TPU program at Google, commented: “Power has become the ultimate bottleneck for scaling AI datacenters, and PrismML is fundamentally transforming the power-to-compute equation. Moreover, by reducing the memory footprint and bandwidth demands, this breakthrough technology has the potential to do more than just improve the economics of AI infrastructure; it can unlock a new frontier for innovation in computer architecture for AI inference and the next generation of AI models.”
With today’s launch, PrismML moves this architectural breakthrough from research to reality, placing the power of 1-bit AI directly into the hands of users, developers, and researchers.
Technical Details:
The 1-bit Bonsai 8B model is an 8-billion-parameter large language model in which each parameter has 1-bit precision. It was trained on Google TPU v4 chips. It is designed for seamless integration with existing AI workflows and is optimized for low-latency inference on consumer-grade CPUs, NPUs, and edge GPUs. The model achieves high-fidelity reasoning and language understanding comparable to FP16 (16-bit floating point) 8B models, but with a fraction of the memory footprint (1GB vs. 16GB). PrismML is also releasing 1-bit Bonsai 4B and 1.7B models, with memory footprints of 0.5GB and 0.24GB, respectively.
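As a back-of-the-envelope check, the quoted footprints follow from parameter count times bits per weight. This is illustrative arithmetic, not an official specification; real checkpoints can run slightly larger when some tensors (e.g. embeddings or normalization parameters) are kept at higher precision:

```python
# Rough weight-storage estimate: parameters x bits per weight / 8 bits per byte,
# expressed in decimal gigabytes. Illustrative arithmetic only.

def weight_gb(params: float, bits_per_weight: int) -> float:
    """Raw weight storage in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

print(weight_gb(8e9, 16))  # FP16 8B model  -> 16.0 GB
print(weight_gb(8e9, 1))   # 1-bit 8B model -> 1.0 GB
print(weight_gb(4e9, 1))   # 1-bit 4B model -> 0.5 GB
```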
Pricing and Availability:
Developers, researchers, and other users can download the 1-bit Bonsai models under the Apache 2.0 license for free starting today.




