Red Hat AI Inference Server

Red Hat® AI Inference Server optimizes model inference across the hybrid cloud for faster, cost-effective model deployments. 


What is an inference server?

An inference server is the piece of software that allows artificial intelligence (AI) applications to communicate with large language models (LLMs) and generate a response based on data. This process is called inference. It’s where the business value happens and the end result is delivered.
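For illustration, here is a minimal sketch of how an application talks to an inference server. It assumes a server that exposes an OpenAI-compatible API, as vLLM-based servers do; the endpoint URL, API key, and model name are placeholders.

    # Querying an inference server over an OpenAI-compatible API.
    # The endpoint URL, API key, and model name are illustrative.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "Summarize this support ticket."}],
    )
    print(response.choices[0].message.content)  # the inference result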

To perform effectively, LLMs need extensive storage, memory, and infrastructure to run inference at scale, which is why inference can consume the majority of your AI budget. 

As part of the Red Hat AI platform, Red Hat AI Inference Server optimizes inference to drive down its traditionally high costs and infrastructure demands. 

Video: Fast, cost-effective AI inference with Red Hat AI Inference Server (duration: 2:28)

Interactive demo: Introduction to Red Hat AI Inference Server

How does Red Hat AI Inference Server work?

Red Hat AI Inference Server provides fast and cost-effective inference at scale. Its open source nature allows it to support any generative AI (gen AI) model, on any AI accelerator, in any cloud environment. 

Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times. Combined with LLM Compressor capabilities, it increases inference efficiency without sacrificing performance. With cross-platform adaptability and a growing community of contributors, vLLM is emerging as the Linux® of gen AI inference. 
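As a sketch of what this looks like in practice, vLLM's offline Python API serves any Hugging Face-compatible model; the model name below is a placeholder.

    # A minimal sketch of vLLM's offline inference API (pip install vllm).
    from vllm import LLM, SamplingParams

    # vLLM batches requests and pages the KV cache (PagedAttention)
    # to keep the GPU busy across concurrent prompts.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["What is AI inference?", "Why compress a model?"], params)
    for out in outputs:
        print(out.outputs[0].text)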

50%

Some customers who used LLM Compressor experienced 50% cost savings without sacrificing performance.* 

*Zelenović, Saša. “Unleash the full potential of LLMs: Optimize for performance with vLLM.” Red Hat Blog, 27 Feb. 2025. 

Your models are your choice

Red Hat AI Inference Server supports all leading open source models and maintains flexible GPU portability. You can use any gen AI model or choose from our optimized collection of validated, open source, third-party models.  

Plus, as part of Red Hat AI, Red Hat AI Inference Server is certified for all Red Hat products. It can also be deployed on other Linux and Kubernetes platforms with support under Red Hat’s third-party support policy. 


Increased efficiency with vLLM

Optimize the deployment of any gen AI model, on any AI accelerator, with vLLM.

LLM Compressor

Compress models of any size to reduce compute utilization and its related costs while maintaining high model response accuracy. 
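For illustration, a one-shot quantization pass with the open source llm-compressor library might look like the sketch below; the model, calibration dataset, and quantization scheme are illustrative, and module paths can vary by release.

    # A sketch of one-shot weight quantization with llm-compressor
    # (pip install llmcompressor); names below are illustrative.
    from llmcompressor.modifiers.quantization import GPTQModifier
    from llmcompressor.transformers import oneshot

    oneshot(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        dataset="open_platypus",            # calibration samples
        recipe=GPTQModifier(scheme="W4A16", targets="Linear", ignore=["lm_head"]),
        output_dir="TinyLlama-1.1B-W4A16",  # compressed model, ready for vLLM
        max_seq_length=2048,
        num_calibration_samples=512,
    )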

Hybrid cloud flexibility

Maintain portability across different GPUs and run models on-premises, in the cloud, or at the edge.

Red Hat AI repository

Third-party validated and optimized models are ready for inference deployment, helping you achieve faster time to value while keeping costs low.
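Because these models are published in standard formats, serving one is a one-line change; the repository ID below is illustrative of the validated, quantized models Red Hat publishes on Hugging Face.

    # A sketch of serving a pre-optimized model from the Red Hat AI
    # repository; the repository ID is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16")
    outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)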

Red Hat AI Support

As one of the largest commercial contributors to vLLM, we have a deep understanding of the technology. Our AI consultants have the vLLM expertise to help you achieve your enterprise AI goals. 


How to buy

Red Hat AI Inference Server is available as a standalone product, or as part of Red Hat AI. It is included in both Red Hat Enterprise Linux® AI and Red Hat OpenShift® AI. 


Deploy with partners

Experts and technologies are coming together so our customers can do more with AI. Explore all of the partners working with Red Hat to certify the interoperability of their technologies with our solutions. 

Dell Technologies
Lenovo
Intel
NVIDIA
AMD

Frequently asked questions

Do I need to buy Red Hat Enterprise Linux AI or Red Hat OpenShift AI to use Red Hat AI Inference Server?

No. You can purchase Red Hat AI Inference Server as a standalone Red Hat product. 

Do I need to buy Red Hat AI Inference Server to use Red Hat Enterprise Linux AI?

No. Red Hat AI Inference Server is included when you purchase either Red Hat Enterprise Linux AI or Red Hat OpenShift AI. 

Can Red Hat AI Inference Server run on Red Hat Enterprise Linux or Red Hat OpenShift?

Yes, it can. It can also run in third-party Linux environments with support under our third-party support policy.

How is Red Hat AI Inference Server priced?

It is priced per accelerator.

Explore more AI resources

How to get started with AI at the enterprise

Get Red Hat Consulting for AI

Maximize AI innovation with open source models

Red Hat Consulting: AI Platform Foundation

Contact Sales

Talk to a Red Hatter about Red Hat AI