Red Hat AI Inference Server
Red Hat® AI Inference Server optimizes model inference across the hybrid cloud for faster, cost-effective model deployments.
What is an inference server?
An inference server is the piece of software that allows artificial intelligence (AI) applications to communicate with large language models (LLMs) and generate a response based on data. This process is called inference. It’s where the business value happens and the end result is delivered.
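To make that request-and-response flow concrete, here is a minimal sketch of an application calling an inference server over an OpenAI-compatible HTTP API, the style of API exposed by vLLM-based servers. The endpoint URL and model name are placeholders for illustration only.

```python
import requests

# Hypothetical endpoint; any OpenAI-compatible inference server
# (such as one powered by vLLM) accepts requests in this format.
INFERENCE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-deployed-model",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize this quarter's sales data in two sentences."}
    ],
}

response = requests.post(INFERENCE_URL, json=payload, timeout=60)
response.raise_for_status()

# The generated text is the inference result returned to the application.
print(response.json()["choices"][0]["message"]["content"])
```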
To perform effectively, LLMs need extensive storage, memory, and infrastructure to run inference at scale, which is why inference can consume the majority of an AI budget.
As part of the Red Hat AI platform, Red Hat AI Inference Server optimizes inference to drive down its traditionally high cost and infrastructure requirements.
How does Red Hat AI Inference Server work?
Red Hat AI Inference Server provides fast and cost-effective inference at scale. Its open source nature allows it to support any generative AI (gen AI) model, on any AI accelerator, in any cloud environment.
Powered by vLLM, the inference server maximizes GPU utilization and enables faster response times. Combined with LLM Compressor capabilities, it increases inference efficiency without sacrificing performance. With cross-platform adaptability and a growing community of contributors, vLLM is emerging as the Linux® of gen AI inference.
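As a rough illustration of the vLLM engine underneath the server, the sketch below loads a model and generates text with vLLM's offline Python API. The model name is a placeholder; a quantized checkpoint produced with LLM Compressor could be loaded the same way.

```python
from vllm import LLM, SamplingParams

# Placeholder model identifier; substitute any supported open source
# model or a quantized checkpoint produced with LLM Compressor.
llm = LLM(model="org/example-llm")

sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what an inference server does."], sampling)

# vLLM batches and schedules requests to keep the GPU busy; each result
# carries the generated text for its prompt.
print(outputs[0].outputs[0].text)
```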



Some customers who used LLM Compressor experienced 50% cost savings without sacrificing performance.*
*Zelenović, Saša. “Unleash the full potential of LLMs: Optimize for performance with vLLM.” Red Hat Blog, 27 Feb. 2025.
Your models are your choice
Red Hat AI Inference Server supports all leading open source models and maintains flexible GPU portability. You can run any gen AI model or choose from our optimized collection of validated, open source, third-party models.
Plus, as part of Red Hat AI, Red Hat AI Inference Server is certified for all Red Hat products. It can also be deployed across other Linux and Kubernetes platforms with support under Red Hat’s third-party support policy.



Red Hat AI Support
As one of the largest commercial contributors to vLLM, we have a deep understanding of the technology. Our AI consultants have the vLLM expertise to help you achieve your enterprise AI goals.
How to buy
Red Hat AI Inference Server is available as a standalone product, or as part of Red Hat AI. It is included in both Red Hat Enterprise Linux® AI and Red Hat OpenShift® AI.
Deploy with partners
Experts and technologies are coming together so our customers can do more with AI. Explore the partners working with Red Hat to certify the interoperability of their technologies with our solutions.
Frequently asked questions
Do I need to buy Red Hat Enterprise Linux AI or Red Hat OpenShift AI to use Red Hat AI Inference Server?
No. You can purchase Red Hat AI Inference Server as a standalone Red Hat product.
Do I need to buy Red Hat AI Inference Server to use Red Hat Enterprise Linux AI?
No. Red Hat AI Inference Server is included when you purchase Red Hat Enterprise Linux AI as well as Red Hat OpenShift AI.
Can Red Hat AI Inference Server run on Red Hat Enterprise Linux or Red Hat OpenShift?
Yes, it can. It can also run on third-party Linux environments under our third-party support policy.
How is Red Hat AI Inference Server priced?
It is priced per accelerator.
