OenoBench

OenoBench is a public benchmark of roughly five thousand wine-knowledge questions used to evaluate how well large language models understand the wine domain. The question set is balanced across four pillars: viticulture, winemaking, the wine business, and the world’s wine regions. Model answers are graded on both factual accuracy and reasoning quality. Results are refreshed about once a quarter, and each refresh ships as a versioned JSON dataset committed to this repository.
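As a rough illustration of what "balanced across four pillars" could mean in practice, the sketch below counts questions per pillar in a parsed JSON release and checks that each share is close to an even split. The schema is not documented on this page, so the `pillar` field name, the pillar labels, and the 5% tolerance are all assumptions made for the example, not the project's actual format.

```python
import json
from collections import Counter

# Hypothetical pillar labels; the real dataset may name them differently.
PILLARS = ("viticulture", "winemaking", "business", "regions")


def pillar_counts(questions):
    """Count questions per pillar.

    Assumes each question is a dict with a 'pillar' key (assumed field name).
    Questions with a label outside PILLARS are ignored.
    """
    counts = Counter(q["pillar"] for q in questions)
    return {p: counts.get(p, 0) for p in PILLARS}


def is_balanced(counts, tolerance=0.05):
    """Return True if every pillar's share is within `tolerance` of an even split."""
    total = sum(counts.values())
    if total == 0:
        return False
    target = 1 / len(counts)
    return all(abs(n / total - target) <= tolerance for n in counts.values())


# Tiny in-memory stand-in for one versioned release file.
sample = json.loads(
    '[{"pillar": "viticulture"}, {"pillar": "winemaking"},'
    ' {"pillar": "business"}, {"pillar": "regions"}]'
)
counts = pillar_counts(sample)
print(counts, is_balanced(counts))
```

With a real release, the `sample` list would be replaced by the result of `json.load` on the dataset file, with the field name adjusted to match the published schema.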

The questions themselves are produced by a multi-model generation pipeline with explicit bias-mitigation steps, so that no single model family disproportionately shapes the questions it will later be scored on. Two companion pages document the system in detail:

  • Methodology — the four-stage pipeline used to build and validate the question set.
  • Leaderboard — current model scores, broken down by domain and difficulty.