The NVIDIA® A30 Tensor Core GPU is designed to meet the growing computational needs of enterprise data centers with a balanced combination of performance, flexibility, and efficiency. Built on the powerful NVIDIA Ampere architecture, the A30 brings together cutting-edge technologies such as advanced Tensor Cores and Multi-Instance GPU (MIG) capabilities to handle a wide range of workloads from high-throughput AI inference to complex high-performance computing (HPC) tasks.
Its compact PCIe form factor ensures compatibility with mainstream server architectures, while its optimized power consumption (165W) makes it ideal for energy-conscious environments. The A30 is a go-to solution for businesses looking to modernize their data centers without overhauling existing infrastructure.
It provides high memory bandwidth with 24 GB of HBM2 memory and 933 GB/s throughput, enabling efficient handling of large data sets and AI models. By supporting various AI and HPC precision formats (including FP64, TF32, and INT8), the A30 allows enterprises to deploy flexible AI solutions without sacrificing accuracy or speed. Whether deployed in financial services, healthcare, or research environments, the A30 delivers enterprise-grade compute acceleration in a scalable, secure, and cost-effective package.
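The precision formats mentioned above trade mantissa bits for speed. As a rough illustration, TF32 keeps FP32's sign bit and 8-bit exponent but only 10 of its 23 mantissa bits. The sketch below emulates that rounding step in pure Python; it is an illustration of the format, not NVIDIA's hardware implementation, and it ignores special values such as inf/NaN.

```python
import struct

def to_tf32(x: float) -> float:
    """Round an FP32 value to TF32 precision (sketch).

    TF32 keeps FP32's sign and 8-bit exponent but only the top 10 of
    the 23 mantissa bits: round to nearest by adding half of the
    lowest kept bit, then clear the 13 dropped bits.
    Ignores inf/NaN for simplicity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + (1 << 12)) & 0xFFFFFFFF  # round-to-nearest on dropped bits
    bits &= ~0x1FFF                         # clear the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(1.0009765625))  # 1 + 2**-10 is exactly representable: unchanged
print(to_tf32(3.14159265))    # pi rounds to 3.140625 at 10 mantissa bits
```

Running the snippet shows why TF32 training usually needs no code changes: the dynamic range matches FP32, and only the last few decimal digits of each operand are lost.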
The NVIDIA A30 is purpose-built to accelerate AI and high-performance computing workloads across a variety of enterprise applications. With support for multiple floating-point precisions, including TF32 for AI training and FP64 for scientific computations, the A30 delivers consistent, high-throughput performance in both inference and training scenarios.
It also integrates NVIDIA’s full hardware-software stack, providing seamless support for major frameworks like TensorFlow, PyTorch, and MXNet, as well as containerized environments like NVIDIA NGC and Kubernetes. One of the key innovations of the A30 is its support for Multi-Instance GPU (MIG) technology, which allows the GPU to be partitioned into up to four isolated instances.
Each MIG instance behaves like an independent GPU, ensuring that multiple users or applications can share a single physical card without interference or performance degradation. This enables consistent Quality of Service (QoS), even in multi-tenant environments. Enterprises benefit from reduced hardware overhead, better resource utilization, and simplified management. Whether the task is real-time speech recognition, fraud detection, drug discovery, or seismic analysis, the A30 provides the reliable, scalable performance needed for today’s diverse AI and HPC applications.
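The four-way partitioning described above constrains which instance layouts are possible. The small sketch below validates a requested set of MIG profiles against the A30's four compute slices. The profile names and slice counts follow NVIDIA's published A30 MIG geometry, but treat them as assumptions to confirm against `nvidia-smi mig -lgip` on real hardware.

```python
# Hypothetical validator for A30 MIG layouts. Profile names and slice
# counts follow NVIDIA's published A30 geometry; verify against
# `nvidia-smi mig -lgip` on an actual system before relying on them.
A30_PROFILES = {
    "1g.6gb": 1,   # 1 compute slice, 6 GB framebuffer
    "2g.12gb": 2,  # 2 compute slices, 12 GB
    "4g.24gb": 4,  # the whole GPU as a single MIG instance
}
A30_SLICES = 4  # the A30 exposes four compute slices

def fits(profiles):
    """Return True if the requested MIG instances fit on one A30."""
    used = sum(A30_PROFILES[p] for p in profiles)
    return used <= A30_SLICES

print(fits(["1g.6gb"] * 4))                    # four isolated instances
print(fits(["2g.12gb", "1g.6gb", "1g.6gb"]))   # mixed layout
print(fits(["2g.12gb", "2g.12gb", "1g.6gb"]))  # five slices: does not fit
```

The key point the sketch makes concrete: slices are a hard budget, so QoS guarantees come from the hardware partition itself rather than from a software scheduler.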
| Feature | Specification |
| --- | --- |
| CUDA Cores | 3,584 |
| Tensor Cores | 224 |
| Peak FP64 Performance | 5.2 TFLOPS |
| FP64 Tensor Core Performance | 10.3 TFLOPS |
| Peak FP32 Performance | 10.3 TFLOPS |
| TF32 Tensor Core Performance | 82 TFLOPS (165 TFLOPS with sparsity*) |
| BFLOAT16 Tensor Core Performance | 165 TFLOPS (330 TFLOPS with sparsity*) |
| FP16 Tensor Core Performance | 165 TFLOPS (330 TFLOPS with sparsity*) |
| INT8 Tensor Core Performance | 330 TOPS (661 TOPS with sparsity*) |
| GPU Memory | 24 GB HBM2 |
| Memory Bandwidth | 933 GB/s |
| Thermal Design | Passive Cooling |
| Max Power Consumption | 165 W |
| System Interface | PCIe Gen 4.0 (64 GB/s) |
| Multi-Instance GPU (MIG) | Supported |
| vGPU Support | Supported |
The NVIDIA A30 is meticulously optimized for deep learning across a full range of precisions, from high-accuracy FP64 computations to ultra-fast INT4 inference operations. With the integration of Tensor Cores capable of delivering up to 330 TFLOPS (FP16, with sparsity), the A30 dramatically accelerates both training and inference workflows. Structural sparsity support enables models to run more efficiently by skipping zero values in matrices, resulting in double the throughput with minimal accuracy loss.
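The sparsity pattern behind this speedup is 2:4 structured sparsity: in every group of four weights, two are zeroed so the Tensor Cores can skip them. A minimal pure-Python sketch of that pruning step follows; real pruning is done by the training framework (e.g. during fine-tuning), so this only illustrates the selection rule.

```python
def prune_2_to_4(weights):
    """Apply 2:4 structured sparsity (sketch): in each group of four
    weights, keep the two largest magnitudes and zero the other two.
    Ampere Tensor Cores can then skip the zeros, doubling throughput.
    """
    assert len(weights) % 4 == 0, "weights must come in groups of 4"
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]))[2:]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.0, -0.7, 0.2]
print(prune_2_to_4(row))  # → [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, -0.7, 0.0]
```

Because exactly half of each group is zero, the hardware stores the surviving values densely plus a small index, which is where the 2x math-throughput figure comes from.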
The A30 can host up to four MIG instances, each operating as a separate GPU, which means that different neural networks can be trained or deployed concurrently without affecting each other’s performance. When paired with the NVIDIA Triton Inference Server, enterprises can deploy scalable, high-throughput AI services across various platforms.
This enables real-time inference applications such as autonomous driving, intelligent video analytics, and conversational AI. By combining deep learning optimization with a robust software ecosystem, including support for NVIDIA TensorRT, CUDA, and cuDNN, the A30 stands out as an ideal platform for developers and researchers aiming to push the boundaries of AI innovation.
Modern enterprises increasingly rely on virtualization to maximize resource efficiency and streamline operations, and the NVIDIA A30 is built with this shift in mind. With support for NVIDIA Multi-Instance GPU (MIG), the A30 can be partitioned into up to four isolated GPU instances, each operating independently. This level of granularity allows organizations to assign right-sized GPU resources to different users or workloads, leading to improved GPU utilization and cost efficiency.
These virtual GPU instances can be used in conjunction with containerized applications and orchestrated using platforms like Kubernetes, enabling flexible deployment in cloud-native and on-premise data centers. The A30 also integrates smoothly with hypervisors such as VMware vSphere and Citrix Hypervisor, ensuring compatibility with existing IT infrastructure.
By allowing multiple users to share a single GPU without performance contention, the A30 enables a broader rollout of GPU acceleration across virtual desktops, AI services, and computational tasks. This makes it particularly valuable for industries like education, finance, and healthcare, where workload diversity and security isolation are critical.
Today’s data-driven enterprises require powerful platforms to process and analyze ever-growing volumes of information. The NVIDIA A30 rises to this challenge with high-performance compute capabilities, 24 GB of high-bandwidth memory, and robust software support for data analytics frameworks. It enables large-scale analytics workflows by accelerating the processing of structured and unstructured data alike, making it ideal for use cases in retail analytics, genomics, financial modeling, and fraud detection.
With support for RAPIDS, an open-source suite of data science libraries built on CUDA, the A30 allows data scientists to perform end-to-end data processing, machine learning, and visualization workflows entirely on the GPU. Moreover, when used in multi-GPU environments, the A30 takes advantage of NVIDIA NVLink and Magnum IO to enable seamless data sharing between GPUs and fast I/O communication between nodes.
This results in reduced bottlenecks and accelerated time-to-insight. Whether deployed in a standalone system or as part of a distributed computing cluster, the A30 ensures organizations can scale their data pipelines while maintaining high performance and reliability.
The NVIDIA A30 is fully compatible with the latest NVIDIA virtual GPU (vGPU) software solutions, including:

- NVIDIA AI Enterprise
- NVIDIA Virtual Compute Server (vCS)
Customizable vGPU profiles range from 1 GB to 24 GB, allowing flexibility for a variety of virtualized workloads from basic virtual desktops to compute-intensive applications.
Whether the environment requires lightweight graphics acceleration or high-throughput compute tasks, vGPU profiles can be dynamically adjusted to optimize performance and user experience.
This capability not only simplifies resource allocation but also reduces hardware requirements by enabling multiple virtual GPUs per physical card. With vGPU support, the A30 extends its reach to more users, departments, and applications making GPU-accelerated computing accessible and manageable in even the most complex enterprise environments.
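The trade-off between profile size and user density on the 24 GB card is simple arithmetic, sketched below. This assumes profile sizes divide the framebuffer evenly and ignores any memory the vGPU software reserves; the actual set of supported profiles depends on the vGPU software edition.

```python
TOTAL_GB = 24  # A30 framebuffer

def instances_per_card(profile_gb: int) -> int:
    """How many vGPU instances of a given framebuffer size fit on one
    A30. Sketch only: assumes sizes divide the framebuffer evenly and
    ignores reserved overhead."""
    return TOTAL_GB // profile_gb

for gb in (1, 2, 4, 6, 12, 24):
    print(f"{gb:>2} GB profile -> {instances_per_card(gb)} instances")
```

Smaller profiles maximize user density for virtual desktops; larger ones reserve more framebuffer for compute-heavy workloads.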
Versatile Ampere‑Architecture Performance
Built on NVIDIA’s Ampere architecture, the A30 excels across AI training, inference, HPC, and data analytics workloads in a single efficient GPU platform.
Third‑Generation Tensor Cores with Multi‑Precision Support
Equipped with 224 third-generation Tensor Cores, the A30 delivers up to 165 TFLOPS of TF32 and 330 TFLOPS of FP16/BF16 compute (both with sparsity), accelerating deep learning workloads by up to 20X over the previous generation, and boosting FP64 HPC performance to 10.3 TFLOPS via FP64 Tensor Cores.
Hardware‑Partitioning via Multi‑Instance GPU (MIG)
Up to four fully isolated GPU instances per card enable secure, dynamically scalable multi-tenant environments with guaranteed quality of service.
24 GB HBM2 Memory with 933 GB/s Bandwidth
Massive high-bandwidth memory delivers fast access for large models and complex simulations, with nearly 3x the memory bandwidth of the T4.
Support for Structural Sparsity
Leverages model sparsity to double inference throughput for compatible neural networks, enhancing efficiency for AI inference workloads.
High‑Speed Connectivity: PCIe Gen 4 & NVLink
Provides twice the system bandwidth of PCIe Gen 3 via PCIe Gen 4 (64 GB/s), with optional NVLink support (200 GB/s) for multi‑GPU scaling.
Energy‑Efficient 165 W TDP
Delivers exceptional performance in a low‑power, 165 W envelope—ideal for mainstream servers aiming for high efficiency.
Broad Enterprise Software Compatibility
Compatible with NVIDIA AI Enterprise, vGPU, Virtual Compute Server, CUDA-X, TensorRT, and RAPIDS, supporting containerized workloads, virtualization (VMware, Red Hat), and production AI pipelines.
Discover the countless ways that Q9 technology can solve your network challenges and transform your business – with a free 30-minute discovery call.
At Q9, we have the skills, the experience, and the passion to help you achieve your business goals and transform your organization.
© Q9 technologies. All rights reserved.