The new Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models offer advanced reasoning capabilities at smaller sizes, with open weights and broad deployment options.