Red Hat AI Inference Server

Red Hat® AI Inference Server optimizes model inference across the hybrid cloud for faster, cost-effective model deployments. 


What is an inference server?

An inference server is the piece of software that allows artificial intelligence (AI) applications to communicate with large language models (LLMs) and generate a response based on data. This process is called inference. It’s where the business value happens and the end result is delivered.
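For illustration, here is a minimal sketch of how an application talks to an inference server. It assumes a server that exposes an OpenAI-compatible API, as vLLM-based servers do; the endpoint URL, API key, and model name are placeholders.

    # Querying an inference server over an OpenAI-compatible API.
    # The endpoint URL, API key, and model name are illustrative.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Summarize this support ticket."}],
    )
    print(response.choices[0].message.content)  # the inference result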

To perform effectively, LLMs need extensive storage, memory, and infrastructure to run inference at scale, which is why inference can consume the majority of your AI budget. 

As part of the Red Hat AI platform, Red Hat AI Inference Server optimizes inference to drive down its traditionally high costs and infrastructure demands. 

Video: Fast, cost-effective AI inference with Red Hat AI Inference Server (duration: 2:28)

Interactive demo: Introduction to Red Hat AI Inference Server

How does Red Hat AI Inference Server work?

Red Hat AI Inference Server provides fast and cost-effective inference at scale. Its open source nature allows it to support any generative AI (gen AI) model, on any AI accelerator, in any cloud environment. 

Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times. Combined with LLM Compressor capabilities, it increases inference efficiency without sacrificing performance. With cross-platform adaptability and a growing community of contributors, vLLM is emerging as the Linux® of gen AI inference. 
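As a sketch of what this looks like in practice, vLLM's offline Python API serves any Hugging Face-compatible model; the model name below is a placeholder.

    # A minimal sketch of vLLM's offline inference API (pip install vllm).
    from vllm import LLM, SamplingParams

    # vLLM batches requests and pages the KV cache (PagedAttention)
    # to keep the GPU busy across concurrent prompts.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["What is AI inference?", "Why compress a model?"], params)
    for out in outputs:
        print(out.outputs[0].text)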

50%

Some customers who used LLM Compressor experienced 50% cost savings without sacrificing performance.* 

*Zelenović, Saša. “Unleash the full potential of LLMs: Optimize for performance with vLLM.” Red Hat Blog, 27 Feb. 2025. 

Your models are your choice

Red Hat AI Inference Server supports all leading open source models and maintains flexible GPU portability. You can use any gen AI model or choose from our optimized collection of validated, open source, third-party models.  

Plus, as part of Red Hat AI, Red Hat AI Inference Server is certified for all Red Hat products. It can also be deployed on other Linux and Kubernetes platforms with support under Red Hat’s third-party support policy. 


Increased efficiency with vLLM

Optimize the deployment of any gen AI model, on any AI accelerator, with vLLM.

LLM Compressor

Compress models of any size to reduce compute utilization and its related costs while maintaining high model response accuracy. 
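For illustration, a one-shot quantization pass with the open source llm-compressor library might look like the sketch below; the model, calibration dataset, and quantization scheme are illustrative, and module paths can vary by release.

    # A sketch of one-shot weight quantization with llm-compressor
    # (pip install llmcompressor); names below are illustrative.
    from llmcompressor.modifiers.quantization import GPTQModifier
    from llmcompressor.transformers import oneshot

    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        dataset="open_platypus",            # calibration samples
        recipe=GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"]),
        output_dir="TinyLlama-1.1B-W4A16",  # compressed model, ready for vLLM
        max_seq_length=2048,
        num_calibration_samples=512,
    )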

Hybrid cloud flexibility

Maintain portability across different GPUs and run models on-premises, in the cloud, or at the edge.

Red Hat AI repository

Third-party validated and optimized models are ready for inference deployment, helping you achieve faster time to value while keeping costs low.
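Because these models are published in standard formats, serving one is a one-line change; the repository ID below is illustrative of the validated, quantized models Red Hat publishes on Hugging Face.

    # A sketch of serving a pre-optimized model from the Red Hat AI
    # repository; the repository ID is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")
    outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)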

Red Hat AI Support

As one of the largest commercial contributors to vLLM, we have a deep understanding of the technology. Our AI consultants have the vLLM expertise to help you achieve your enterprise AI goals. 


How to buy

Red Hat AI Inference Server is available as a standalone product, or as part of Red Hat AI. It is included in both Red Hat Enterprise Linux® AI and Red Hat OpenShift® AI. 


Deploy with partners

Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify the interoperability of their technologies with our solutions. 

Dell Technologies
Lenovo
Intel
NVIDIA
AMD

Frequently asked questions

Do I need to buy Red Hat Enterprise Linux AI or Red Hat OpenShift AI to use Red Hat AI Inference Server?

No. You can purchase Red Hat AI Inference Server as a standalone Red Hat product. 

Do I need to buy Red Hat AI Inference Server to use Red Hat Enterprise Linux AI?

No. Red Hat AI Inference Server is included when you purchase either Red Hat Enterprise Linux AI or Red Hat OpenShift AI. 

Can Red Hat AI Inference Server run on Red Hat Enterprise Linux or Red Hat OpenShift?

Yes, it can. It can also run in third-party Linux environments with support under our third-party support policy.

How is Red Hat AI Inference Server priced?

It is priced per accelerator.

Explore more AI resources

How to get started with AI at the enterprise

Get Red Hat Consulting for AI

Maximize AI innovation with open source models

Red Hat Consulting: AI Platform Foundation

Contact Sales

Talk to a Red Hatter about Red Hat AI