Optimizing a Chatbot on macOS, Part 2: Baseline

In this part we install llama-cpp-python with Metal support, download Qwen2.5 14B, write a minimal decode loop, and measure our baseline numbers: time to first token (TTFT), inter-token latency (ITL), and eval score.
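
To make the two latency metrics concrete, here is a minimal sketch of how they could be measured around any streaming decode loop. It is not the post's actual harness: the function name measure_stream, the injectable clock parameter, and the returned dictionary keys are all hypothetical, chosen only for illustration. TTFT is taken as the time from the start of the request to the first streamed chunk, and ITL as the mean gap between consecutive chunks.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, object]:
    """Consume a stream of text chunks and time their arrival.

    Hypothetical helper, not part of llama-cpp-python. `clock` is
    injectable so the timing logic can be checked deterministically.
    """
    start = clock()               # request start, before the first chunk
    stamps: List[float] = []      # arrival time of each chunk
    pieces: List[str] = []

    for piece in chunks:
        stamps.append(clock())
        pieces.append(piece)

    if not stamps:
        # Nothing was generated, so neither metric is defined.
        return {"ttft": None, "itl": None, "n_chunks": 0, "text": ""}

    # TTFT: delay from request start to the first chunk.
    ttft = stamps[0] - start

    # ITL: mean gap between consecutive chunks (needs at least two).
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    itl: Optional[float] = sum(gaps) / len(gaps) if gaps else None

    return {
        "ttft": ttft,
        "itl": itl,
        "n_chunks": len(stamps),
        "text": "".join(pieces),
    }


if __name__ == "__main__":
    # Stand-in generator that simulates a slow first token, then steady decode.
    def fake_stream():
        time.sleep(0.05)
        yield "Hello"
        for word in [",", " world", "!"]:
            time.sleep(0.01)
            yield word

    print(measure_stream(fake_stream()))
```

With llama-cpp-python, the chunks would presumably come from a streaming call such as create_completion(prompt, stream=True), reading chunk["choices"][0]["text"] from each item. Note that a streamed chunk is usually, but not always, exactly one token, so ITL measured this way is an approximation.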