Benchmarking

Performance Metrics, GSM8K, AIME and MATH Tests, MATH-500, MMLU, Reasoning Performance, User Feedback, Transparency Issues, Human Evaluation

6 ARTICLES

3w ago

Microsoft Unveils Phi-4 Reasoning Models That Outperform Larger AI Systems

The new Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning models deliver advanced reasoning capabilities with smaller sizes, open weights, and broad deployment options.

7 ARTICLES

last mo.

Meta Faces Backlash Over Use of Experimental AI Model for Benchmark Testing

3 ARTICLES

3mo ago

Mistral Unveils Small 3 AI Model, Rivaling Larger Competitors in Efficiency and Accuracy

5 ARTICLES

6mo ago

Alibaba Launches QwQ-32B AI Model to Challenge OpenAI's Reasoning Models

5 ARTICLES

8mo ago

Mistral Unveils Pixtral 12B, Its First Multimodal AI Model