The BigDL project has evolved into ipex-llm, a PyTorch library for running LLMs on Intel CPUs and GPUs with low latency. Built on top of Intel Extension for PyTorch (IPEX), ipex-llm integrates with tools such as llama.cpp and HuggingFace transformers. More than 50 models, including LLaMA2 and Mixtral, have been optimized and verified on it, and the library continues to add capabilities such as Self-Speculative Decoding and a broad set of LLM finetuning options. Recent updates include support for loading models from ModelScope and for large-scale LLM inference on Intel GPUs. See the ipex-llm repository for details and documentation.
https://github.com/intel-analytics/ipex-llm