Optimizing a Chatbot on macOS, Part 2: Baseline
Install llama-cpp-python with Metal support, download Qwen2.5 14B, write a minimal decode loop, and measure our baseline time to first token (TTFT), inter-token latency (ITL), and eval score.
Introducing a practical series on making a local LLM chatbot fast on macOS and eventually on any laptop.