# LMCache
**Repository Path**: 7n/LMCache
## Basic Information
- **Project Name**: LMCache
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: dev
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-19
- **Last Updated**: 2025-08-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
--------------------------------------------------------------------------------
| [**Blog**](https://blog.lmcache.ai/)
| [**Documentation**](https://docs.lmcache.ai/)
| [**Join Slack**](https://join.slack.com/t/lmcacheworkspace/shared_invite/zt-36x1m765z-8FgDA_73vcXtlZ_4XvpE6Q)
| [**Interest Form**](https://forms.gle/MHwLiYDU6kcW3dLj7)
| [**Roadmap**](https://github.com/LMCache/LMCache/issues/1253)
🔥 **NEW: For enterprise-scale deployment of LMCache and vLLM, please check out vLLM [Production Stack](https://github.com/vllm-project/production-stack). LMCache is also officially supported in [llm-d](https://github.com/llm-d/llm-d/) and [KServe](https://github.com/kserve/kserve)!**
## Summary
LMCache is an **LLM** serving engine extension to **reduce TTFT** (time to first token) and **increase throughput**, especially under long-context scenarios. By storing the KV caches of reusable texts across various locations (GPU, CPU DRAM, and local disk), LMCache reuses the KV caches of **_any_** reused text (not necessarily a prefix) in **_any_** serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.
By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
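
To make the idea of storing KV caches across several locations concrete, here is a minimal conceptual sketch. It is not LMCache's implementation or API: the `TieredKVStore` class, the `chunk_key` helper, and the tier names are hypothetical, and the "KV cache" is a plain byte string standing in for real tensors. It only shows a lookup that checks the fastest tier first and falls back to slower ones.

```python
import hashlib
from typing import Optional


def chunk_key(tokens: list[int]) -> str:
    """Content hash of one token chunk; used as the cache key in this sketch."""
    data = ",".join(map(str, tokens)).encode()
    return hashlib.sha256(data).hexdigest()


class TieredKVStore:
    """Toy store (hypothetical) that checks tiers from fastest to slowest."""

    TIERS = ("gpu", "cpu", "disk")

    def __init__(self) -> None:
        # One dictionary per tier, mapping chunk key -> stand-in KV bytes.
        self.tiers: dict[str, dict[str, bytes]] = {t: {} for t in self.TIERS}

    def put(self, tier: str, tokens: list[int], kv: bytes) -> None:
        self.tiers[tier][chunk_key(tokens)] = kv

    def get(self, tokens: list[int]) -> Optional[tuple[str, bytes]]:
        """Return (tier, kv) for the first tier holding this chunk, else None."""
        key = chunk_key(tokens)
        for tier in self.TIERS:
            if key in self.tiers[tier]:
                return tier, self.tiers[tier][key]
        return None


store = TieredKVStore()
store.put("cpu", [1, 2, 3, 4], b"kv-for-chunk")

hit = store.get([1, 2, 3, 4])   # found in the CPU tier
miss = store.get([9, 9, 9, 9])  # never stored, so the engine would recompute it
print(hit, miss)
```

A hit means the serving engine can skip recomputing that chunk during prefill; a miss falls back to normal computation.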

## Features
- [x] 🔥 Integration with vLLM v1 with the following features:
  * High performance CPU KVCache offloading
  * Disaggregated prefill
  * P2P KVCache sharing
- [x] LMCache is supported in the [vLLM production stack](https://github.com/vllm-project/production-stack/), [llm-d](https://github.com/llm-d/llm-d/), and [KServe](https://github.com/kserve/kserve)
- [x] Stable support for non-prefix KV caches
- [x] Storage support as follows:
  * CPU
  * Disk
  * [NIXL](https://github.com/ai-dynamo/nixl)
- [x] Installation support through pip and latest vLLM
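
As a rough illustration of what "non-prefix" reuse means in the list above, the sketch below compares two ways of keying cached chunks. Everything here is an assumption made for the example (the `prefix_keys` and `content_keys` helpers, the chunking, and the token values); it is not how LMCache computes its keys. It also ignores the fact that, in a real model, a chunk's KV values depend on the tokens before it, so reusing them at a new position needs additional work that a hash lookup alone does not capture.

```python
import hashlib


def prefix_keys(chunks: list[list[int]]) -> list[str]:
    """Key each chunk by everything up to and including it (prefix-only reuse)."""
    keys, running = [], hashlib.sha256()
    for chunk in chunks:
        running.update(repr(chunk).encode())
        keys.append(running.copy().hexdigest())
    return keys


def content_keys(chunks: list[list[int]]) -> list[str]:
    """Key each chunk by its own content only (position-independent reuse)."""
    return [hashlib.sha256(repr(chunk).encode()).hexdigest() for chunk in chunks]


# Two hypothetical requests that contain the same documents in a different order,
# as can happen when a RAG pipeline retrieves the same passages for two queries.
doc1, doc2 = [1, 2, 3, 4], [5, 6, 7, 8]
first_request = [doc1, doc2]
second_request = [doc2, doc1]

prefix_cache = set(prefix_keys(first_request))
content_cache = set(content_keys(first_request))

prefix_hits = sum(k in prefix_cache for k in prefix_keys(second_request))
content_hits = sum(k in content_cache for k in content_keys(second_request))
print(prefix_hits, content_hits)
```

With prefix-only keys the reordered request finds nothing to reuse, while content-based keys match both documents.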
## Installation
To use LMCache, simply install `lmcache` from your package manager, e.g. pip:
```bash
pip install lmcache
```
LMCache works on Linux with NVIDIA GPUs.
More [detailed installation instructions](https://docs.lmcache.ai/getting_started/installation) are available in the docs.
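
After installing, one way to confirm that the package is visible to your Python environment is to query its installed version. This is a small sketch using only the standard library; the `installed_version` helper is a name chosen for this example and is not part of LMCache.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lmcache")
print(f"lmcache {version}" if version else "lmcache is not installed")
```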
## Getting started
The best way to get started is to check out the [Quickstart Examples](https://docs.lmcache.ai/getting_started/quickstart/) in the docs.
## Documentation
Check out the LMCache [documentation](https://docs.lmcache.ai/), which is available online.
We also post regularly on the [LMCache blog](https://blog.lmcache.ai/).
## Examples
Go hands-on with our [examples](https://github.com/LMCache/LMCache/tree/dev/examples),
demonstrating how to address different use cases with LMCache.
## Interested in Connecting?
Fill out the [interest form](https://forms.gle/mQfQDUXbKfp2St1z7), [sign up for our newsletter](https://mailchi.mp/tensormesh/lmcache-sign-up-newsletter), [join LMCache slack](https://join.slack.com/t/lmcacheworkspace/shared_invite/zt-2viziwhue-5Amprc9k5hcIdXT7XevTaQ), [check out LMCache website](https://lmcache.ai/), or [drop an email](mailto:contact@lmcache.ai), and our team will reach out to you!
## Community meeting
The [community meeting](https://uchicago.zoom.us/j/6603596916?pwd=Z1E5MDRWUSt2am5XbEt4dTFkNGx6QT09) for LMCache is held bi-weekly on Tuesdays at 9:00 AM PT. All are welcome to join! ([Add to Calendar](https://drive.usercontent.google.com/u/0/uc?id=1f5EXbooGcwNwzIpTgn5u4PHqXgfypMtu&export=download))
We keep notes from each meeting on this [document](https://docs.google.com/document/d/1_Fl3vLtERFa3vTH00cezri78NihNBtSClK-_1tSrcow) for summaries of standups, discussion, and action items.
Recordings of meetings are available on the [YouTube LMCache channel](https://www.youtube.com/channel/UC58zMz55n70rtf1Ak2PULJA).
## Contributing
We welcome and value all contributions and collaborations. Please check out the [Contributing Guide](CONTRIBUTING.md) to learn how to contribute.
We continually update [[Onboarding] Welcoming contributors with good first issues!](https://github.com/LMCache/LMCache/issues/627)
## Citation
If you use LMCache for your research, please cite our papers:
```
@inproceedings{liu2024cachegen,
title={Cachegen: Kv cache compression and streaming for fast large language model serving},
author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
pages={38--56},
year={2024}
}
@article{cheng2024large,
title={Do Large Language Models Need a Content Delivery Network?},
author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
journal={arXiv preprint arXiv:2409.13761},
year={2024}
}
@inproceedings{10.1145/3689031.3696098,
author = {Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
title = {CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
year = {2025},
url = {https://doi.org/10.1145/3689031.3696098},
doi = {10.1145/3689031.3696098},
booktitle = {Proceedings of the Twentieth European Conference on Computer Systems},
pages = {94--109},
}
```
## Socials
[LinkedIn](https://www.linkedin.com/company/lmcache-lab/?viewAsMember=true) | [Twitter](https://x.com/lmcache) | [YouTube](https://www.youtube.com/@LMCacheTeam)
## License
The LMCache codebase is licensed under Apache License 2.0. See the [LICENSE](LICENSE) file for details.