Optimizing a Chatbot on macOS, Part 2: Baseline

Install llama-cpp-python with Metal support, download Qwen2.5 14B, write a minimal decode loop, and measure the baseline: time to first token (TTFT), inter-token latency (ITL), and eval score.
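
As a rough illustration of the two latency metrics named above, here is a minimal sketch that times any token stream. It takes TTFT to be the delay from the start of the call to the first streamed item and ITL to be the mean gap between consecutive items. The names `measure_stream` and `clock` are hypothetical and are not taken from the post or from llama-cpp-python.

```python
import time
from typing import Callable, Iterable


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report TTFT and mean ITL in seconds.

    Assumption: each item yielded by `tokens` counts as one token.
    """
    start = clock()
    stamps = []
    for _ in tokens:
        # Record the arrival time of every streamed item.
        stamps.append(clock())
    if not stamps:
        raise ValueError("stream produced no tokens")

    ttft = stamps[0] - start
    # Gaps between consecutive arrivals; empty if only one token arrived.
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return {"ttft_s": ttft, "mean_itl_s": mean_itl, "tokens": len(stamps)}
```

With llama-cpp-python, the stream would presumably be the iterator returned by a streaming completion call. Streamed chunks may not map one-to-one onto tokens, so the ITL figure is best read as an approximation. Passing `clock` as a parameter keeps the function easy to test with a fake timer.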

Optimizing a Chatbot on macOS, Part 1: Introduction

Introducing a practical series on making a local LLM chatbot fast on macOS and, eventually, on any laptop.