From 71fdd9472fa91d9498e8ebc1f89a7ebbdf5cc172 Mon Sep 17 00:00:00 2001
From: tsong
Date: Tue, 15 Apr 2025 14:36:05 +0000
Subject: [PATCH] add third-party demo

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index bcaff9e..e4708bb 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@
 
 [BitNet Model on Hugging Face](https://huggingface.co/microsoft/BitNet-b1.58-2B-4T)
 
+Try it out via this [demo hosted by a third party](https://bitnet-demo.azurewebsites.net/), or [build and run](https://github.com/microsoft/BitNet?tab=readme-ov-file#build-from-source) it on your own CPU.
+
 bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support **fast** and **lossless** inference of 1.58-bit models on CPU (with NPU and GPU support coming next). The first release of bitnet.cpp supports inference on CPUs. bitnet.cpp achieves speedups of **1.37x** to **5.07x** on ARM CPUs, with larger models experiencing greater performance gains. Additionally, it reduces energy consumption by **55.4%** to **70.0%**, further boosting overall efficiency. On x86 CPUs, speedups range from **2.37x** to **6.17x**, with energy reductions between **71.9%** and **82.2%**. Furthermore, bitnet.cpp can run a 100B BitNet b1.58 model on a single CPU, achieving speeds comparable to human reading (5-7 tokens per second), significantly enhancing the potential for running LLMs on local devices. Please refer to the [technical report](https://arxiv.org/abs/2410.16144) for more details.
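
For readers wondering what "1.58-bit" means in the README text above: BitNet b1.58 constrains each weight to the ternary set {-1, 0, +1}, which carries log2(3) ≈ 1.58 bits of information per weight. Below is a minimal, illustrative NumPy sketch of the absmean ternary quantization described in the BitNet b1.58 paper. The function names are hypothetical (not part of bitnet.cpp's API), and the real kernels operate on packed bit representations with lookup tables rather than this naive float math.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-6):
    """Quantize a weight matrix to ternary values {-1, 0, +1}.

    Sketch of the absmean scheme from the BitNet b1.58 paper:
    scale by the mean absolute weight, round, and clip to [-1, 1].
    """
    scale = np.abs(w).mean() + eps                 # mean absolute value of W
    w_q = np.clip(np.round(w / scale), -1, 1)      # ternary codes
    return w_q.astype(np.int8), scale              # codes + dequantization scale

def ternary_matvec(w_q: np.ndarray, scale: float, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with ternary weights.

    Because every weight is -1, 0, or +1, each inner product reduces to
    additions and subtractions of activations; optimized CPU kernels
    exploit this property instead of doing the float multiply shown here.
    """
    return scale * (w_q.astype(np.float32) @ x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)  # toy full-precision weights
    x = rng.normal(size=8).astype(np.float32)       # toy activation vector
    w_q, scale = absmean_ternary_quantize(w)
    print("ternary weights:\n", w_q)
    print("approx matvec:", ternary_matvec(w_q, scale, x))
    print("exact  matvec:", w @ x)
```

The reduction of every dot product to additions and subtractions is what makes multiplication-free CPU kernels possible, and it is one source of the speedups and energy savings quoted in the README paragraph above.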