Microsoft's Chiplet Cloud To Bring The Cost Of LLMs Way Down
The Next Platform, Thursday, July 13, 2023
If NVIDIA and AMD are licking their lips thinking about all of the GPUs they can sell to the hyperscalers and cloud builders to support their huge aspirations in generative AI - particularly when it comes to the OpenAI GPT large language model that is the centerpiece of all of Microsoft's future software and services - they had better think again.
We have been saying from the beginning of this generative AI explosion that if inference requires the same hardware to run as the training, then it cannot be productized. No one, not even the deep-pocketed hyperscalers and cloud builders, can afford this.
Which is why researchers at the University of Washington and the University of Sydney have cooked up a little something called the Chiplet Cloud, which in theory at least looks like it can beat the pants off an NVIDIA 'Ampere' A100 GPU (and to a lesser extent a 'Hopper' H100 GPU) and a Google TPUv4 accelerator running OpenAI's GPT-3 175B and Google's PaLM 540B models when it comes to inference.