Nebius September digest: Microsoft deal, NVIDIA Exemplar Status & benchmark results
September was a landmark month for Nebius. From a major new customer for our AI infrastructure to industry-leading performance recognition, we’ve made strides that directly strengthen the systems you rely on.
Nebius achieves NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads
We’re proud to announce that Nebius is one of the first NVIDIA Cloud Partners to achieve NVIDIA Exemplar Status on NVIDIA H200 GPUs for training workloads. This recognition validates that Nebius meets NVIDIA’s rigorous standards for performance, resiliency, and scalability — addressing one of the most pressing challenges in AI infrastructure: ensuring consistent workload performance and predictable cost across clouds.
Build a multi-agent AI customer support system
This guide walks you through building a production-ready, multi-agent AI system by using the Google Agent Development Kit (ADK) and the Agent2Agent (A2A) protocol, powered by Nebius AI Studio models. With sentiment detection, RAG-powered answers and escalation handling, you can automate customer queries end-to-end.
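To give a flavor of the routing logic, here’s a minimal sketch in plain Python: classify the sentiment of an incoming ticket first, then either answer it or escalate. It uses the OpenAI-compatible client rather than the full ADK/A2A stack from the guide, and the endpoint URL and model name are illustrative assumptions.

```python
# Minimal sketch of the routing idea: classify sentiment first, then either
# answer directly or escalate. Endpoint and model name below are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",   # assumed AI Studio endpoint
    api_key=os.environ["NEBIUS_API_KEY"],
)
MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"     # hypothetical model choice

def classify_sentiment(ticket: str) -> str:
    """Ask the model for a one-word sentiment label."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Reply with one word: positive, neutral or negative."},
            {"role": "user", "content": ticket},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

def handle_ticket(ticket: str) -> str:
    """Route negative tickets to a human; answer the rest automatically."""
    if classify_sentiment(ticket) == "negative":
        return "Escalated to a human agent."
    answer = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": ticket}],
    )
    return answer.choices[0].message.content

print(handle_ticket("How do I reset my API key?"))
```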
How tokenizers work in AI models: A beginner-friendly guide
Before AI can generate text, answer questions or summarize information, it first needs to read and understand human language. That’s where tokenization comes in. A tokenizer takes raw text and breaks it into smaller pieces, or tokens. These tokens may represent whole words, parts of words or even individual characters, and each is mapped to a unique numerical ID that models can process mathematically. In this article, we’ll explore how tokenizers work, examine common approaches and walk through the basics of building one yourself.
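The core idea fits in a few lines. Here’s a toy whitespace tokenizer that builds a vocabulary and maps words to IDs; production tokenizers such as BPE or WordPiece split into subword units instead, but the encode/decode mechanics are the same.

```python
# A toy whitespace tokenizer: split text into pieces and map each piece to a
# numeric ID, the same basic idea subword tokenizers (BPE, WordPiece) build on.
class SimpleTokenizer:
    def __init__(self):
        self.token_to_id = {"<unk>": 0}
        self.id_to_token = {0: "<unk>"}

    def fit(self, corpus):
        """Build the vocabulary from a list of texts."""
        for text in corpus:
            for token in text.lower().split():
                if token not in self.token_to_id:
                    idx = len(self.token_to_id)
                    self.token_to_id[token] = idx
                    self.id_to_token[idx] = token

    def encode(self, text):
        """Turn text into a list of token IDs; unknown words map to <unk>."""
        return [self.token_to_id.get(t, 0) for t in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_token.get(i, "<unk>") for i in ids)

tok = SimpleTokenizer()
tok.fit(["Tokenizers turn text into numbers", "Models process numbers"])
print(tok.encode("tokenizers process text"))   # -> [1, 7, 3]
```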
Model distillation with compute: How to set it up
Model distillation is a practical way to shrink large models into efficient versions that run faster and cost less. As parameter counts climb into the billions, distilling LLMs makes it possible to cut GPU memory use, speed up inference and simplify deployment. In this blog, we’ll explain how the method works, why GPU compute matters, and what to keep in mind when moving from research models to production systems.
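As a taste of the mechanics, here’s a minimal sketch of the classic distillation loss in PyTorch: a KL-divergence term that pulls the student’s temperature-softened logits toward the teacher’s, blended with ordinary cross-entropy on the hard labels. It’s illustrative only, not the exact setup from the article.

```python
# Minimal distillation loss: soft-target KL term plus hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                        # rescale to keep gradient magnitudes comparable
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Example with random tensors standing in for real model outputs.
student = torch.randn(4, 32000)        # (batch, vocab) student logits
teacher = torch.randn(4, 32000)        # teacher logits
labels = torch.randint(0, 32000, (4,))
print(distillation_loss(student, teacher, labels))
```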
Setting up RAG-powered content generation with Nebius AI Studio and Qdrant
Learn how to build a smart, scalable content generator by using Nebius AI’s Llama 3.3-70B and Qdrant’s vector search. This RAG-based system lets you upload brand-specific documents and get custom social posts, article drafts and more, rooted in your actual company data.
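Condensed to its essentials, the retrieval step looks roughly like the sketch below: embed the prompt, fetch the closest brand documents from Qdrant and pass them to the generator as context. The endpoint, embedding model, collection name and payload field are assumptions for illustration, not taken from the tutorial.

```python
# Sketch of the RAG flow: embed the topic, retrieve brand docs, generate a post.
import os
from openai import OpenAI
from qdrant_client import QdrantClient

llm = OpenAI(base_url="https://api.studio.nebius.ai/v1/",  # assumed endpoint
             api_key=os.environ["NEBIUS_API_KEY"])
qdrant = QdrantClient(url="http://localhost:6333")

def generate_post(topic: str) -> str:
    # 1. Embed the topic (embedding model name is a placeholder).
    vec = llm.embeddings.create(model="BAAI/bge-en-icl", input=topic).data[0].embedding
    # 2. Retrieve the most similar brand documents (collection name is assumed).
    hits = qdrant.search(collection_name="brand_docs", query_vector=vec, limit=3)
    context = "\n\n".join(h.payload["text"] for h in hits)   # assumed payload field
    # 3. Generate the post grounded in the retrieved context.
    resp = llm.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[
            {"role": "system", "content": f"Write in our brand voice. Context:\n{context}"},
            {"role": "user", "content": f"Draft a short social post about {topic}."},
        ],
    )
    return resp.choices[0].message.content
```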
Incident post-mortem analysis: us-central1 service disruption on September 3, 2025
A detailed analysis of the September 3, 2025 incident that caused service outages in the us-central1 region. The disruption affected API operations and Console functionality due to persistent routing loops between network domains, while other regions remained operational.
What is Jupyter Notebook in the context of AI
Jupyter Notebook is a browser-based tool for interactive coding, data exploration and documentation. It lets you run code step by step while combining results, visualizations and explanations in one place. Widely used in machine learning, it speeds up experimentation, ensures reproducibility and makes collaboration easier. This article looks at how Jupyter supports ML workflows, its key features and the tasks it handles best.
Nebius proves bare-metal-class performance for AI inference workloads in MLPerf® Inference v5.1
Today, we’re happy to share our new performance milestone — the latest submission to the MLPerf® Inference v5.1 benchmarks, where Nebius achieved leading performance results for three AI systems accelerated by the most in-demand NVIDIA platforms on the market: NVIDIA GB200 NVL72, HGX B200 and HGX H200.
Clusters vs single nodes: which to use in training and inference scenarios
Choosing between a single node and a cluster is one of the core infrastructure decisions when working with LLMs. The choice directly affects training speed, resource efficiency and operational costs. In this article we’ll explain how single-node and cluster configurations differ, when each works best and what to consider before choosing one.
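For a concrete sense of the difference, here’s a minimal PyTorch sketch: the training loop stays the same whether you run on one node or many; what changes is wrapping the model in DistributedDataParallel and launching one process per GPU with torchrun. The model, data and launch commands are stand-ins, not a recommended production setup.

```python
# Minimal multi-GPU / multi-node training skeleton with PyTorch DDP.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process it spawns.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)    # stand-in model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                                   # stand-in data and loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        loss.backward()        # gradients are all-reduced across processes here
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

# Single node, 8 GPUs:  torchrun --nproc_per_node=8 train.py
# Two nodes: add --nnodes=2 --node_rank=<0|1> --rdzv_backend=c10d --rdzv_endpoint=<host:port>
```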
Nebius monthly digest: August 2025
In August, we introduced self-service NVIDIA Blackwell GPUs in Nebius AI Cloud and published several in-depth technical articles, including ones on cluster reliability and liquid cooling. We also continued to cover customer success — all this and more in the latest digest.
A beginner’s guide to Virtual Private Cloud (VPC) and its benefits
A Virtual Private Cloud (VPC) is a cornerstone of modern cloud infrastructure. It gives organizations the ability to isolate resources, control traffic and configure security much like a private data center — while keeping the flexibility of the cloud. In this article, we’ll look at what a VPC is, how it’s built and why it has become the standard environment for running applications and machine learning workloads.
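As a small illustration of the isolation idea, the sketch below carves a VPC’s private address range into non-overlapping subnets for different tiers, using only Python’s standard ipaddress module. The CIDR block and tier names are arbitrary examples, not Nebius defaults.

```python
# One private address block, carved into non-overlapping subnets per tier.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")          # the VPC's private range (example)
subnets = list(vpc.subnets(new_prefix=24))         # 256 possible /24 subnets

plan = {
    "public-web": subnets[0],      # e.g. load balancers with internet access
    "private-app": subnets[1],     # application servers, no public IPs
    "private-data": subnets[2],    # databases, reachable only from the app tier
}

for tier, net in plan.items():
    print(f"{tier:12s} {net}  ({net.num_addresses} addresses)")
```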