Google Cloud announced the launch of Cloud TPU v5e, the fifth generation of its tensor processing units, which it will use to accelerate large-scale AI inference. Google has built industry-leading AI capabilities over the past decade, including the Transformer architecture that underpins modern generative AI, alongside AI-optimized infrastructure for products such as YouTube, Gmail, Google Maps, Google Play, and Android, serving billions of users worldwide.
Google has added Cloud TPU v5e, now available in preview, to its AI-optimized infrastructure portfolio, calling it its most cost-efficient, versatile, and scalable Cloud TPU to date. TPU v5e integrates with Google Kubernetes Engine (GKE), Vertex AI, and popular frameworks such as PyTorch, JAX, and TensorFlow, letting users start with familiar and user-friendly interfaces.
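As an illustration of that framework integration, a minimal JAX sketch like the one below would run unchanged on a TPU v5e VM; the device check and matrix sizes here are illustrative assumptions, not details from the announcement:

```python
import jax
import jax.numpy as jnp

# List the accelerators JAX can see; on a TPU v5e VM these show up as TPU devices.
print(jax.devices())

# A jit-compiled matrix multiply; XLA compiles it for whichever backend is present.
@jax.jit
def matmul(a, b):
    return jnp.dot(a, b)

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (1024, 1024))
b = jax.random.normal(key, (1024, 1024))
out = matmul(a, b)
print(out.shape, jax.devices()[0].platform)
```

The same script runs on CPU or GPU backends as well, which is the sense in which users can start from familiar interfaces and move to TPU hardware without rewriting model code.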
Mark Lohmeyer, VP and GM for compute and ML infrastructure at Google Cloud, announced the launch, and early customer AssemblyAI shared results from its evaluation. “Cloud TPU v5e consistently delivered up to 4X greater performance per dollar than comparable solutions in the market for running inference on our production ASR model,” said Domenic Donato, VP of Technology at AssemblyAI. “The Google Cloud software stack is well-suited for production AI workloads and we are able to take full advantage of the TPU v5e hardware, which is purpose-built for running advanced deep-learning models. This powerful combination of hardware and software dramatically accelerated our ability to provide cost-effective AI solutions to our customers.”
According to Google, TPU v5e delivers up to 2x higher training performance per dollar and up to 2.5x higher inference performance per dollar than Cloud TPU v4, particularly for LLMs and generative AI models, at less than half the cost of TPU v4. This enables more organizations to train and deploy larger, more complex AI models.
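To make the price-performance framing concrete: performance per dollar is throughput divided by price, so the claimed ratio can be written as

\[
\frac{T_{\text{v5e}}/P_{\text{v5e}}}{T_{\text{v4}}/P_{\text{v4}}}
= \frac{T_{\text{v5e}}}{T_{\text{v4}}} \cdot \frac{P_{\text{v4}}}{P_{\text{v5e}}}
\]

where the symbols and the sample numbers below are hypothetical, since the announcement does not publish per-chip prices or throughputs. For example, if a v5e chip matched a v4 chip's inference throughput (T_v5e = T_v4) at 40% of its price (P_v5e = 0.4 P_v4), the ratio would be 1/0.4 = 2.5x.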
Google emphasizes the balance of performance, flexibility, and efficiency in TPU v5e pods, which interconnect up to 256 chips with over 400 Tb/s of aggregate bandwidth and 100 petaOps of INT8 performance. TPU v5e is also highly versatile, supporting eight virtual machine (VM) configurations, so customers can select the configuration that suits a wide range of LLM and generative AI model sizes.
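To illustrate how a multi-chip slice is exposed to a framework, the following JAX sketch shards a computation across a hypothetical 8-chip slice; the 4x2 mesh shape, axis names, and array sizes are assumptions for illustration, not a configuration from the announcement:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Arrange the slice's chips into a logical 4x2 mesh. This assumes 8 visible
# TPU devices (e.g., an 8-chip v5e slice); adapt the shape to your slice.
devices = np.array(jax.devices()).reshape(4, 2)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the weight matrix along the "model" axis and the batch along "data".
w = jnp.ones((1024, 1024))
x = jnp.ones((256, 1024))
w_sharded = jax.device_put(w, NamedSharding(mesh, PartitionSpec(None, "model")))
x_sharded = jax.device_put(x, NamedSharding(mesh, PartitionSpec("data", None)))

# XLA inserts any cross-chip communication needed for the sharded matmul.
y = jnp.dot(x_sharded, w_sharded)
print(y.shape)
```

The same pattern scales to larger slices by changing the mesh shape, which is how different VM configurations can be matched to different model sizes.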
Additionally, Google Cloud announced a partnership with NVIDIA to offer cutting-edge AI infrastructure and software, aimed at helping customers build and deploy large generative AI models and accelerate their data science workloads.