The NVIDIA DGX Spark: Honest Three-Month Review


This is the NVIDIA DGX Spark. And it is not just any DGX Spark; it is mine. I have lived with this machine for about three months now, and after putting it through its paces, here is my honest take.

Design and Hardware: A Supercomputer in Disguise

First, let us talk about the aesthetics. This thing is stunning. NVIDIA's designers knocked it out of the park. From the front grille to the champagne gold finish, the build quality is incredible. Sometimes I genuinely forget that I am looking at a supercomputer. It feels like a piece of industrial art.

Under the hood, we are looking at the GB10 chip. That is Grace for the CPU and Blackwell for the GPU, and the "10" denotes the scale. For context, the GB200 is a single Grace chip paired with two massive Blackwell chips, and it is that larger Superchip that powers the data center clusters behind today's biggest AI models. The GB10 is the same idea shrunk down to fit on a desk.

The Market Landscape

My unit packs 4TB of SSD storage, which is plenty for 100B-parameter models at Q4 or even Q8 quantization (roughly 50GB and 100GB of weights respectively). If you are shopping for a Spark, you will see a few variants:

NVIDIA Founders Edition: The official DGX Spark.
ASUS Ascent GX10: Available in 1TB, 2TB, and 4TB flavors.
Dell Pro Max GB10: This is where the naming confusion begins.
Gigabyte AI TOP Atom and MSI EDGE Xpert: Yes, Xpert without the ‘e’. Never change, MSI.

These all share the same chip. You are mostly choosing based on storage and cooling solutions.

The Memory Bandwidth Bottleneck

Every config includes 128GB of LPDDR5X. LPDDR5X is power-efficient, but it creates a bit of a bottleneck: the system has 273GB/s of memory bandwidth. That sounds fast, but a 3090 Ti hits about 1TB/s (1,008GB/s, roughly 3.7x the speed).

However, the Spark is not a 3090 Ti. It is built for professionals fine-tuning models and doing data science. If you are strictly looking for inference speed, you might be better off with a high-spec Mac with unified memory or an AMD Strix Halo PC.
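A rough way to see what the bandwidth gap means for inference is to treat token generation as purely memory-bound: every generated token has to read every weight once, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below makes that assumption and ignores compute, KV cache traffic, and batching. The 8B model at 8 bits per weight is a hypothetical example chosen because it fits on both devices, and the 3090 Ti figure used is its 1,008GB/s spec.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s, params_billion, bits_per_weight):
    """Best-case tokens/s when each generated token reads every weight once."""
    gb_read_per_token = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical example model: 8B dense parameters stored at 8 bits per weight.
PARAMS_BILLION = 8
BITS_PER_WEIGHT = 8

for name, bandwidth_gb_s in [("DGX Spark", 273), ("RTX 3090 Ti", 1008)]:
    ceiling = decode_ceiling_tokens_per_s(bandwidth_gb_s, PARAMS_BILLION, BITS_PER_WEIGHT)
    print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

Real numbers will land below these ceilings, but the ratio between the two machines is what matters here.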

Performance: The Dual-Spark Setup

I actually have two of these. To get them, I basically had to sell a kidney. Why two? Because when we were building PCIFIC, we originally planned to fine-tune our own models for the marketplace backend.

To link them, a standard Ethernet cable in the regular network port will not do. You need a QSFP networking cable (specifically a Direct Attach Copper, or DAC, cable) plugged into the NVIDIA ConnectX-7 ports. It is specialized data center hardware that lets the two systems pool their memory for a single model. The 200Gb/s interconnect is fantastic.

With both Sparks hooked up, I was running MiniMax m2.7 (roughly 227B parameters). It is not blistering, at about 20 tokens per second, but having a local model that rivals Opus 4.5 sitting on your desk is insane.
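As a back-of-the-envelope check on why a model this size wants two machines, here is a minimal sketch that converts parameter count and bits per weight into a weight footprint and compares it with 128GB per Spark. It counts weights only; the KV cache, activations, and the operating system all need room on top, so treat the node counts as a lower bound. The function names are illustrative, not from any library.

```python
import math

NODE_MEMORY_GB = 128  # memory per Spark, from the review


def weight_footprint_gb(params_billion, bits_per_weight):
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def sparks_needed(params_billion, bits_per_weight):
    """Minimum number of 128GB nodes needed just to hold the weights."""
    footprint = weight_footprint_gb(params_billion, bits_per_weight)
    return math.ceil(footprint / NODE_MEMORY_GB)


for bits in (4, 8, 16):
    size = weight_footprint_gb(227, bits)
    print(f"227B at {bits}-bit: {size:.1f} GB of weights, "
          f"at least {sparks_needed(227, bits)} Spark(s)")
```

At 4-bit the weights nominally squeeze into one box with very little left over for context, which is why the second Spark makes such a difference in practice.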

The Shipping Pallet Hack: Understanding NVFP4

The Spark supports NVFP4, NVIDIA's own 4-bit floating-point format and its preferred low-precision format for inference with TensorRT. To explain why this matters, let us use a marketplace example like PCIFIC:

FP16 (The Gram Scale): You weigh every laptop to the exact decimal (1,452.34g). It is flawless for calculating 4.5% fees and shipping insurance, but storing millions of these hyper-accurate numbers slows the backend to a crawl.
FP4 (The T-Shirt Sizes): You use 16 simple categories. It is fast and tiny, but a 17-inch gaming rig and a 15-inch ultrabook both get rounded to category 14. The AI loses nuance and starts outputting garbage.
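NVFP4 (The Shipping Pallet): This uses microscaling. You pack 16 devices onto a pallet and record one shared scale for the whole pallet at higher precision (an FP8 value per pallet, plus one FP32 scale for the entire shipment). For the individual devices inside, you use the cheap, fast FP4 T-shirt sizes. Because the system knows each pallet's scale, the T-shirt sizes only need to tell devices on the same pallet apart, and it can recover most of the accuracy.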

You get the speed of 4-bit with accuracy that stays close to 16-bit. The only downside is that it is not great for dense media like image generation, where you lose too much detail.
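The pallet idea can be sketched in a few lines of plain Python. This is a toy simulation, not NVIDIA's implementation: it snaps values to the eight magnitudes a 4-bit E2M1 float can represent and compares one scale for the whole tensor against one scale per block. The block size is a parameter (16 is the size NVIDIA's published NVFP4 description uses), the scale is kept as an ordinary Python float rather than FP8, the second tensor-level scale is left out, and the weights are made-up random numbers with one deliberate outlier.

```python
import random

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = FP4_GRID[-1]


def round_to_fp4(x):
    """Snap x to the nearest representable FP4 value, keeping its sign."""
    magnitude = min(FP4_GRID, key=lambda g: abs(g - abs(x)))
    return magnitude if x >= 0 else -magnitude


def quantize_block_scaled(values, block_size=16):
    """Quantize to FP4 with one scale per block, then dequantize again."""
    restored = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        peak = max(abs(v) for v in block)
        # Map the largest value in the block onto the largest FP4 magnitude.
        scale = peak / FP4_MAX if peak > 0 else 1.0
        restored.extend(round_to_fp4(v / scale) * scale for v in block)
    return restored


def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


random.seed(0)
# Hypothetical weights: mostly small values plus a single large outlier.
weights = [random.gauss(0, 0.02) for _ in range(1024)]
weights[100] = 1.5

per_tensor = quantize_block_scaled(weights, block_size=len(weights))
per_block = quantize_block_scaled(weights, block_size=16)

print("one scale for everything:", mean_abs_error(weights, per_tensor))
print("one scale per 16 values: ", mean_abs_error(weights, per_block))
```

With a single scale, the outlier stretches the grid so far that almost every small weight rounds to zero. With per-block scales, only the one block containing the outlier pays that price.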

The Software Nightmare: The ARM Problem

Now for the frustrating part. The DGX Spark runs on an ARM CPU, not x86. While ARM is the king of performance-per-watt, most of the deep learning ecosystem was built and tested on x86 first.

Because the Spark sits in this weird middle ground between a gaming chip and a data center chip, the software support is spotty:

PyTorch can be temperamental.
Flash Attention (2, 3, and 4) often refuses to compile.
vLLM requires the --enforce-eager flag, which disables CUDA graphs and adds 5 to 10ms of latency per token.
Unsloth Studio does not support it yet.
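To put the 5 to 10ms figure in perspective, here is a small sketch that converts a per-token overhead into lost throughput. It borrows the roughly 20 tokens per second from my dual-Spark run purely as an illustrative baseline; the function name is made up, and the result is arithmetic, not a benchmark.

```python
def tokens_per_s_with_overhead(baseline_tokens_per_s, overhead_ms):
    """Throughput after adding a fixed latency to every generated token."""
    baseline_ms_per_token = 1000 / baseline_tokens_per_s
    return 1000 / (baseline_ms_per_token + overhead_ms)


BASELINE_TOKENS_PER_S = 20  # illustrative baseline, 50 ms per token

for overhead_ms in (5, 10):
    slowed = tokens_per_s_with_overhead(BASELINE_TOKENS_PER_S, overhead_ms)
    print(f"+{overhead_ms} ms per token: {slowed:.1f} tokens/s")
```

The faster the baseline, the more a fixed per-token cost hurts, so smaller models feel this overhead proportionally more.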

For a £4,000 machine from the world's most valuable company, six months after launch, this is unacceptable.

The Verdict

Despite the software headaches, I love it. The form factor is unique and the power density is mind-boggling. Running huge models like MiniMax m2.7 with a full context window locally is a glimpse into the future.

It is an almost perfect product if you have the patience and the budget to handle the growing pains of a new architecture.