The 2025 PCIe GPU in Server Guide
Credit: Supermicro and STH, source: servethehome.com

We often discuss AI GPUs in the context of high-end integrated racks like the NVIDIA GB200 NVL72 or NVIDIA HGX 8-GPU platforms that are used for AI factory applications. Still, a lot of AI happens outside of these huge installations. Today, we are going to talk about those deployments, both in the context of an “Enterprise AI Factory” and down to edge servers and workstations. I have been wanting to do this piece for several months, and Supermicro managed to get all of the components together in one room so we can show folks the answers to simple questions like “which PCIe GPU should I use, where, and when?”
As always, we suggest watching the video in its own tab or app for the best viewing experience. We also need to say thank you to Supermicro, who got all of these systems together. NVIDIA’s GPUs are the most popular these days, and the company supplied a number of GPUs for this piece, so we will note that NVIDIA is sponsoring this. With that, let us get to it.
8x PCIe GPU Systems
The 8x PCIe GPU systems are in some ways parallel to the SXM-based systems, with a few big caveats. PCIe GPU systems typically use GPUs rated between 300W and 600W each, making them lower power than the SXM-based solutions. Beyond that, we typically see ratios of two PCIe GPUs per East-West 400GbE NIC, while SXM-based systems are closer to a 1:1 GPU-to-NIC ratio. Also, removing the NVLink switch architecture means that the systems can be produced at a lower cost (albeit without that increased performance) and at lower power. While it may sound like these systems are simply lower-power versions of the SXM-based platforms, that is not necessarily the case. There are additional options to customize the GPUs used in the platform and to add graphics capabilities.
Typical PCIe GPUs we see used are:
- NVIDIA H100 NVL / H200 NVL with NVIDIA AI Enterprise software
- NVIDIA RTX PRO 6000 Blackwell Server Edition
- NVIDIA L40S
The NVIDIA H100 NVL and H200 NVL with NVIDIA AI Enterprise software are designed to include NVLink interconnect technology across up to four GPUs. These are solutions typically used for post-training models and for AI inference, potentially at a lower power per GPU than the SXM systems. Perhaps the biggest reason to choose the H200 NVL over the H100 NVL is the newer HBM memory subsystem, with 141GB of HBM3e versus 94GB of HBM3, which is a big improvement for memory-bound workloads.

The NVIDIA RTX PRO 6000 Blackwell is used for something slightly different. This is NVIDIA’s solution for those running a large mix of workloads. While these cards do not have high-bandwidth memory, they gain RT cores, encoders, and even video outputs. That means the RTX PRO 6000 can be used for graphics workloads such as engineering, VDI, rendering, and so forth. They can also be used for AI inference, with each card packing 96GB of GDDR7 memory. In an 8-GPU system, one can partition each GPU into four instances using Multi-Instance GPU (MIG) for up to 32 logical GPUs. In total, eight of these GPUs offer 768GB of combined GPU memory for AI inference applications. One can also use the GPUs for different applications based on the time of day (e.g. VDI during the day and AI inference in the evening.) The key here is the flexibility in what you can do with these, since each GPU has a lot of memory, but critically also the NVIDIA RTX graphics capabilities that AI-focused GPUs do not have.
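The partitioning math above can be sketched quickly. This is illustrative arithmetic using the figures in this article (8 GPUs, 96GB each, 4-way MIG), not an NVIDIA API:

```python
# Sketch of the MIG partitioning math for an 8x RTX PRO 6000
# Blackwell Server Edition system, per the figures above.

GPUS_PER_SYSTEM = 8
MIG_INSTANCES_PER_GPU = 4      # 4-way MIG partitioning per GPU
MEMORY_PER_GPU_GB = 96         # GDDR7 per RTX PRO 6000

logical_gpus = GPUS_PER_SYSTEM * MIG_INSTANCES_PER_GPU
total_memory_gb = GPUS_PER_SYSTEM * MEMORY_PER_GPU_GB
memory_per_instance_gb = MEMORY_PER_GPU_GB / MIG_INSTANCES_PER_GPU

print(logical_gpus)            # 32 logical GPUs
print(total_memory_gb)         # 768 GB combined GPU memory
print(memory_per_instance_gb)  # 24.0 GB per MIG instance
```

Each MIG instance in this configuration would see roughly 24GB of memory, which is plenty for many smaller inference workloads or VDI sessions.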

The NVIDIA L40S is effectively a lower-cost GPU for this platform based on the Ada Lovelace architecture. These GPUs have 48GB of memory and graphics capability but do not have some of the newer features like MIG.
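To frame the “which PCIe GPU, where, and when?” question, here is a small, hypothetical helper. The 96GB and 48GB figures and the MIG/graphics capabilities come from this article; the H100 NVL (94GB HBM3) and H200 NVL (141GB HBM3e) capacities are NVIDIA’s published numbers. The selection rules themselves are our own illustration, not NVIDIA sizing guidance:

```python
# Hypothetical PCIe GPU picker based on the capabilities discussed above.
GPUS = {
    "H100 NVL":            {"mem_gb": 94,  "hbm": True,  "graphics": False, "mig": True},
    "H200 NVL":            {"mem_gb": 141, "hbm": True,  "graphics": False, "mig": True},
    "RTX PRO 6000 Server": {"mem_gb": 96,  "hbm": False, "graphics": True,  "mig": True},
    "L40S":                {"mem_gb": 48,  "hbm": False, "graphics": True,  "mig": False},
}

def pick_gpu(need_graphics=False, need_mig=False, min_mem_gb=0):
    """Return GPUs matching the requirements, largest memory first."""
    matches = [name for name, s in GPUS.items()
               if (not need_graphics or s["graphics"])
               and (not need_mig or s["mig"])
               and s["mem_gb"] >= min_mem_gb]
    return sorted(matches, key=lambda n: GPUS[n]["mem_gb"], reverse=True)

print(pick_gpu(need_graphics=True, need_mig=True))  # ['RTX PRO 6000 Server']
print(pick_gpu(min_mem_gb=100))                     # ['H200 NVL']
```

A mixed graphics-plus-AI shop lands on the RTX PRO 6000, while a memory-bound inference deployment lands on the H200 NVL, which matches the positioning described above.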
Supermicro has the SYS-522GA-NRT, an RTX PRO Server that supports 8x RTX PRO 6000 Blackwell Server Edition GPUs. Inside the platform, we have two PCIe switches along with two CPUs, 32 DDR5 DIMMs, and room for multiple NICs and SSDs.

Power consumption varies widely based on the configuration, but the advantage of these platforms is that they tend to use less power than the SXM systems leading to lower operating costs. Acquisition costs are often lower than the SXM-based systems as well.

Something new for 2025 is the new NVIDIA MGX PCIe Switch Board with NVIDIA ConnectX-8 SuperNICs for 8x PCIe GPU Servers. This is a big change to the platform that Supermicro is adopting in its Supermicro SYS-422GL-NR.

Instead of using two or four larger switches, the new platform utilizes ConnectX-8 SuperNICs and their built-in switches to provide high-speed networking for the GPUs. This is, by far, the biggest change in the platform in many years.

Next, let us take a look at some standard compute servers that are designed to accommodate GPUs.
Standard Servers with PCIe GPUs
While the 8-GPU platform is really designed primarily for GPU compute, the prospects for AI extend beyond those types of servers. Organizations are deploying these platforms because if you believe that AI will be in virtually every workflow, the question becomes how to address that. Deploying servers without GPUs today means the only option is to move work off the server and onto a dedicated AI server. An alternative model is to add GPUs to traditional servers so that they can accelerate the parts of a workload that benefit most.

Like the 8-GPU systems, the GPUs used often range between the NVIDIA H100 NVL, H200 NVL, RTX PRO 6000 Blackwell, and L40S. A big difference is that a 2U server typically only fits two GPUs side-by-side.

As a result, 4-way NVLink is less common in traditional servers; finding one or two GPUs in each domain is more typical. Some are also deploying lower-power GPUs like the NVIDIA L4, which adds a smaller amount of GPU compute and GPU memory, but at a lower power consumption and cost.

As an example, we showed the Supermicro SYS-212GB-NR. This is one of Supermicro’s Hyper line of high-end servers where one can add many different types of GPUs. The idea is that if AI becomes part of your workflow because the software you are running is increasingly implementing AI, then adding a GPU to the server can make sense to keep the AI inference local.

Supermicro also has 2U GPU servers that are inspired by the NVIDIA MGX architecture. We have looked at a number of these previously, but we saw a new Xeon-based design to house multiple GPUs while we were in the demo room.

Next, let us get to the high-density servers.
High-Density Servers with PCIe GPUs
In the video, we also showed high-density servers that can utilize PCIe GPUs. An example we showed was the Supermicro SuperBlade with NVIDIA L4 GPUs. The L4 is versatile because it is a low-profile GPU with minimal cooling needs.

Over the years, we have seen SuperBlade and other high-density Supermicro platforms take a range of GPUs from single-width low-profile GPUs to double-width GPUs. The reasoning is usually the same as with the Standard Servers, but just in a higher-density design.
Next, let us get to the edge.
Edge Servers with PCIe GPUs
Edge servers present a different opportunity. Applications such as computer vision become more prevalent at the edge. A great example of this is in many retail locations where self-checkout is powered by edge servers with GPUs. Other typical applications in retail can be inventory analytics, shopper analytics, and more.

In the video, we showed an example with two NVIDIA L4 GPUs, the Supermicro SYS-E403-14B-FRN2T.

These servers are often highly power and space-constrained so having a 75W TDP or lower single-width low-profile GPU is the go-to.

Beyond the L4 GPUs, there are other edge use cases from network infrastructure to smart cities where larger GPUs can be utilized, often with higher-end networking.
Workstations with PCIe GPUs
Workstations have become a hot topic in the era of AI. Folks want to develop AI tools locally. Perhaps the bigger shift will be that as AI becomes a bigger part of everyday workflows, having bigger GPUs can yield outsized productivity gains.

When NVIDIA launched the RTX PRO 6000 Blackwell, there were three versions. One was a 600W version designed to provide maximum performance from a single PCIe card. There were then two more double-width cards: a 300W actively cooled Max-Q version, and the passively cooled Server Edition that we often see in the 8-GPU systems.

Recently we reviewed the Supermicro AS-2115HV-TNRT, a 2U server that can handle up to four double-width GPUs. The innovation here is that most other workstations on the market, even when converted to 4U or 5U rackmount form factors, only handle up to three GPUs. With this system, we can get up to four GPUs, along with IPMI for management, and then pack them into data center racks.

Supermicro also has other options such as the AS-531AW-TC and SYS-532AW-C that are designed to handle either a single 600W NVIDIA RTX PRO 6000 or multiple 300W versions like the Max-Q edition.
Final Words
Ultimately, if you believe in AI and are using new tools daily, then the idea that AI will be part of most workflows going forward will seem very familiar. We showed a number of GPUs, their use cases, and how to deploy them. AI is not just going to happen in enormous AI factories. Latency needs, workflow needs, data security requirements, and even deployment preferences are going to push GPUs into most servers going forward.

We spend a lot of time focusing on large AI cluster deployments, but the writing is on the wall. As we transition to an era of AI, GPUs are coming to many other locations and server form factors. It felt like it was time to show folks a sample of different options in different categories. Over time, there will be new GPUs, new networking, and new architectures, but hopefully this helps frame some of the common use cases and deployments for STH readers today.
Configurator
Supermicro A+ Workstation AS-2115HV-TNRT
- Form Factor: 2U Rackmount
- Supported CPU: AMD Threadripper PRO 7000
- CPU Sockets: 1
- Max Supported Memory: 2 TB DDR5 ECC
- Drive Bays: 8x hot-swap 2.5"
- Power Supply: 2600 W Titanium, redundant

SuperWorkstation SYS-532AW-C
- Form Factor: Tower
- Supported CPU: Intel Core Ultra Series 2
- CPU Sockets: 1
- Max Supported Memory: 192 GB DDR5 ECC
- Power Supply: 1000 W Gold

IoT SuperServer SYS-212GB-NR
- Form Factor: 2U Rackmount
- Supported CPUs: Intel Xeon 6700 E-cores, Intel Xeon 6700 P-cores
- CPU Sockets: 1
- Max Supported Memory: 2 TB DDR5 ECC
- Power Supply: 2000 W Titanium, redundant

IoT SuperServer SYS-E403-14B-FRN2T
- Form Factor: IoT/Edge
- Supported CPUs: Intel Xeon 6500 P-cores, Intel Xeon 6700 E-cores, Intel Xeon 6700 P-cores
- CPU Sockets: 1
- Max Supported Memory: 2 TB DDR5 ECC
- Power Supply: 800 W Platinum, redundant

A+ Workstation AS-531AW-TC
- Form Factor: Tower
- Supported CPU: AMD Threadripper PRO 7000
- CPU Sockets: 1
- Max Supported Memory: 512 GB DDR5 ECC
- Drive Bays: 6x internal fixed 2.5" SATA
- Power Supply: 2000 W Gold

GPU SuperServer SYS-522GA-NRT
- Form Factor: 5U Rackmount
- Supported CPU: Intel Xeon 6900 P-cores
- CPU Sockets: 2
- Max Supported Memory: 6 TB DDR5 ECC
- Drive Bays: 12x hot-swap 2.5"
- Power Supply: 2700 W Titanium, redundant