
Toronto-based semiconductor startup Taalas has raised $169 million in new funding to accelerate the development of AI chips tailored to specific models, bringing its total capital raised to $219 million. The round includes backing from Quiet Capital, Fidelity, and veteran venture capitalist Pierre Lamond.
Taalas is positioning itself as a challenger to industry heavyweights such as Nvidia by designing silicon optimized for targeted AI workloads rather than building general-purpose GPUs.
According to CEO Ljubisa Bajic, the company “prints” portions of AI models directly onto chips and pairs them with high-speed SRAM memory to significantly improve processing speed. Instead of manufacturing fully fixed chips months in advance, Taalas assembles a nearly complete processor and performs final customization in roughly two months using TSMC’s fabrication process. By comparison, chips from larger vendors can take about six months to finalize.
The startup’s first commercial product is a processor optimized to run the open-source Llama 3.1 8B language model. Taalas claims the chip can generate 17,000 output tokens per second—approximately 73 times more than Nvidia’s H200 GPU—while consuming just one-tenth of the power.
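A quick back-of-the-envelope check puts these claims in perspective. The implied H200 baseline below is derived from the two figures Taalas reports, not independently measured:

```python
# Sanity-check the reported throughput figures from Taalas' claims.
taalas_tokens_per_sec = 17_000      # claimed Llama 3.1 8B output rate
speedup_vs_h200 = 73                # claimed multiple over Nvidia's H200

# Implied H200 baseline for the same model under the same conditions
implied_h200_rate = taalas_tokens_per_sec / speedup_vs_h200
print(f"Implied H200 rate: {implied_h200_rate:.0f} tokens/s")   # ~233

# At one-tenth the power, the claimed gain compounds into a
# performance-per-watt multiple:
perf_per_watt_multiple = speedup_vs_h200 * 10
print(f"Implied perf-per-watt multiple: {perf_per_watt_multiple}x")
```

The derived ~233 tokens/s baseline is roughly in line with single-request decode rates reported for 8B-class models on a single GPU, which suggests the comparison is against unbatched inference.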
The efficiency gains stem from tailoring hardware to the specific requirements of a single AI model. General-purpose graphics cards include components a given model may never fully exploit, such as more DRAM capacity than it needs. By eliminating those unused elements and reallocating silicon toward performance-critical transistors, Taalas improves both speed and energy efficiency.
Building a fully custom processor is typically cost-prohibitive, but Taalas says it reduces expenses by customizing only two of the more than 100 layers that comprise its chips. In most processors, only a handful of layers contain transistors, while the remainder house interconnect wiring and supporting infrastructure. The company modifies select layers to include what it describes as a mask ROM recall fabric.
Mask ROM is read-only memory whose contents are fixed during fabrication by the photolithographic mask: once manufactured, the data can be read repeatedly but never changed. According to The Next Platform, each mask ROM recall fabric module stores four bits, and Taalas' architecture uses a single transistor to process that data. Those transistors perform matrix multiplications, the core mathematical operations underpinning AI inference.
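The architectural idea can be sketched in software. The following is a hypothetical illustration, not Taalas' actual design: it contrasts a conventional inference step, where weights are fetched from external memory before each matrix multiplication, with a "baked-in" step where the weights are frozen constants, analogous to storing them in mask ROM next to the compute logic. The names (`weight_store`, `FIXED_W`) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conventional path: weights live in external memory (a dict standing
# in for DRAM/HBM here) and must be loaded on every inference call.
weight_store = {"layer0": rng.standard_normal((8, 8))}

def infer_conventional(x):
    w = weight_store["layer0"]       # runtime memory fetch
    return x @ w                     # the core matrix multiplication

# Model-specific path: the weight matrix is fixed at "fabrication"
# time, so the runtime fetch disappears entirely.
FIXED_W = weight_store["layer0"].copy()   # frozen, read-only weights

def infer_baked_in(x):
    return x @ FIXED_W               # same matmul, no weight lookup

x = rng.standard_normal((1, 8))
assert np.allclose(infer_conventional(x), infer_baked_in(x))
```

In silicon, removing the fetch step eliminates the memory-bandwidth bottleneck rather than a dictionary lookup, but the trade-off is the same: the hardware can only ever run the one model whose weights were baked in.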
A notable advantage of this architecture is that it eliminates the need for high-bandwidth memory (HBM), which is commonly used in GPUs to store AI model data. Transferring data between processors and HBM modules introduces latency and requires additional supporting components. Taalas’ approach bypasses these bottlenecks, reducing both delay and system complexity.
Looking ahead, the company is developing a processor capable of running a Llama model with 20 billion parameters, expected to be ready this summer. It also plans a more advanced chip, HC2, designed for frontier-scale AI models, with the goal of supporting systems as sophisticated as GPT-5.2 by the end of 2026.
By focusing on model-specific silicon and minimizing architectural overhead, Taalas is attempting to redefine how AI chips are built—prioritizing efficiency, speed, and power optimization in a market currently dominated by general-purpose accelerators.
