Beta Space Studio

TRLawBench

Turkish law LLM benchmark: Measures large language models' Turkish legal knowledge and reasoning across 3 stages. Comprehensive evaluation of 297 questions covering ÖSYM exam items, advanced reasoning, and open-ended analysis.

Developer: Beta Space Studio

Overview

TRLawBench is an open three-stage benchmark that measures how well large language models (GPT, Claude, Gemini, Gemma, Qwen, Grok, DeepSeek, Mistral, Llama and others) handle Turkish law. Stages 2 and 3, co-authored with attorney Esat Erbil Tavus, were compiled from print sources and never published online, eliminating training-data contamination.

Why TRLawBench?

  • Turkish legal reasoning focus: Evaluates how well answers are grounded in the Turkish Code of Obligations (TBK), the Turkish Penal Code (TCK), the Constitution, and related legislation, rather than general language fluency.
  • Contamination-free: 200 questions sourced from print, never indexed online, so they cannot appear in training data.
  • Transparent methodology: Correct-answer detection is deterministic; the LLM judge scores only reasoning quality and is blind to the model identity.
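
One way to keep a judge blind to model identity can be sketched as follows. This is a minimal illustration, not the benchmark's actual pipeline: the function name `blind_submissions`, the `answer-N` ID format, and the dictionary layout are all hypothetical.

```python
import random

def blind_submissions(answers_by_model, seed=0):
    """Swap model names for opaque IDs and shuffle the order (hypothetical helper).

    Returns (blinded, key): `blinded` is what a judge would see,
    `key` maps each opaque ID back to the model and stays with the organizer.
    """
    rng = random.Random(seed)
    models = sorted(answers_by_model)
    rng.shuffle(models)  # order must not hint at which model wrote what
    key = {}
    blinded = []
    for i, model in enumerate(models):
        opaque_id = f"answer-{i + 1}"
        key[opaque_id] = model
        blinded.append({"id": opaque_id, "text": answers_by_model[model]})
    return blinded, key

blinded, key = blind_submissions({"model-x": "Answer text 1", "model-y": "Answer text 2"})
```

Removing the name does not help if the answer text itself reveals the model (for example, a self-introduction), so a real setup would probably need to check for that as well.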

Evaluation Methodology

  • Stage 1: Foundational Legal Knowledge. 97 multiple-choice questions from ÖSYM legal exams. Correct answers are taken from the official ÖSYM answer key.
  • Stage 2: Advanced Turkish Legal Reasoning. 100 advanced multiple-choice questions across 14 categories. Sourced from print and never published online.
  • Stage 3: Open-Ended Questions. 100 open-ended questions spanning 14 branches of law. Deterministic letter-match plus Gemini 3.1 Pro blind-judge scoring on a 0 to 10 scale.
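
Deterministic letter matching for the multiple-choice stages can be sketched as below. The sketch assumes the model is instructed to output only a letter from A to E, as in the sample answer on this page; the function names and the strict "single letter or nothing" rule are assumptions, not the benchmark's published code.

```python
import re

def extract_letter(response):
    """Return the answer letter if the response is a lone A-E letter, else None."""
    match = re.fullmatch(r"\s*\(?([A-Ea-e])[\).:]?\s*", response)
    return match.group(1).upper() if match else None

def accuracy(responses, answer_key):
    """Percentage of responses whose extracted letter equals the key."""
    correct = sum(extract_letter(r) == k for r, k in zip(responses, answer_key))
    return 100.0 * correct / len(answer_key)

# Three toy questions: one exact match, one wrong letter, one non-compliant reply.
score = accuracy(["C", "b.", "I think it is D"], ["C", "A", "D"])
```

Treating any reply that is not a bare letter as wrong keeps scoring fully reproducible, at the cost of penalizing models that ignore the output format.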

Per-stage leaderboards, accuracy rates, and sample Q&A are available in the interactive panel above.

Stage 1 · 23 models

ÖSYM Legal Questions

97-question evaluation set drawn from ÖSYM legal qualification exams. Measures models' foundational knowledge of Turkish law and exam-style problem-solving.

The evaluation was run in April 2025. Results for Gemma 4 31B IT (reasoning on and off) were added in April 2026.

Score Comparison

Model                           Developer  Accuracy
o3                              OpenAI     94.85%
Gemini 2.5 Pro (preview 03-25)  Google     89.69%
Claude 3.7 Sonnet (thinking)    Anthropic  82.47%
Gemini 2.5 Flash                Google     82.47%
o1                              OpenAI     80.41%
Claude 3.7 Sonnet               Anthropic  72.16%
Gemma 4 31B IT (thinking)       Google     71.13%
Claude 3.5 Sonnet               Anthropic  71.13%
GPT-4o                          OpenAI     67.01%
Gemini 2.0 Flash                Google     64.95%
Grok 3 Beta                     xAI        61.86%
GPT-4.5                         OpenAI     61.86%
Gemma 4 31B IT                  Google     60.82%
DeepSeek R1                     DeepSeek   58.76%
GPT-4.1                         OpenAI     58.76%
Llama 4 Maverick                Meta       57.73%
DeepSeek V3                     DeepSeek   54.64%
GPT-4.1 Mini                    OpenAI     49.48%
Qwen Max                        Alibaba    49.48%
o4-mini                         OpenAI     48.45%
GPT-4o Mini                     OpenAI     45.36%
Gemma 3 27B IT                  Google     43.30%
Mistral Large 2411              Mistral    38.14%
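
Because Stage 1 has 97 questions, each percentage above corresponds to a whole number of correct answers. The sketch below recovers that count and checks that it rounds back to the published figure; the helper name `correct_count` is illustrative only.

```python
import math

STAGE1_QUESTIONS = 97  # size of the ÖSYM question set

def correct_count(accuracy_pct, total=STAGE1_QUESTIONS):
    """Convert a published accuracy percentage back to a number of correct answers."""
    n = round(accuracy_pct / 100 * total)
    # Sanity check: the count must reproduce the percentage to two decimals.
    if not math.isclose(100 * n / total, accuracy_pct, abs_tol=0.005):
        raise ValueError(f"{accuracy_pct}% is not a multiple of 1/{total}")
    return n

print(correct_count(94.85))  # → 92 (o3)
print(correct_count(38.14))  # → 37 (Mistral Large 2411)
```

One question is worth about 1.03 percentage points, so tied scores such as 82.47% mean the two models answered the same number of questions correctly (80 of 97).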

Sample Question


  • A. General terms and conditions that are foreign to the nature of the contract and the characteristics of the transaction are deemed unwritten.
  • B. The provisions of the contract other than the general terms and conditions deemed unwritten remain valid.
  • C. General terms and conditions may not contain provisions that, contrary to the rules of good faith, work to the detriment of the other party or aggravate that party's position.
  • D. The fact that the texts of contracts drawn up for the same purpose are not identical does not prevent the provisions they contain from being regarded as general terms and conditions.
  • E. If a provision in the general terms and conditions is not clear and comprehensible, or admits more than one meaning, it is interpreted against the drafter and in favor of the other party.
Expected Answer
C. This is the provision regulated under "content control" in the Turkish Code of Obligations: general terms and conditions may not contain clauses that, contrary to good faith, disadvantage the other party or aggravate their position.
Model Answer
Gemma 4 31B IT (thinking)
Google
Correct
  • Role: Assistant answering Turkish law questions based on the law of the Republic of Turkey.
  • Constraint: Output only the letter of the correct option (A, B, C, D, or E). No other text.
  • Question: Which of the following provisions regarding general terms and conditions (Genel İşlem Koşulları) is regulated under the heading "content control" (içerik denetimi) in the Turkish Code of Obligations (TBK No. 6098)?
  • TBK (Turkish Code of Obligations) regulates General Terms and Conditions in Articles 20-25.
      • Article 20: Definition.
      • Article 21: Control of the formation/incorporation (incorporation control - kurulma denetimi).
      • Article 22: Content control (içerik denetimi).
      • Article 23: Interpretation.
      • Article 24: Effects of invalidity.
      • Article 25: Limitation period.
  • Option A: "Sözleşmenin niteliğine ve işin özelliğine yabancı olan genel işlem koşulları yazılmamış sayılır." (General terms and conditions foreign to the nature of the contract and characteristics of the work are deemed not written.)
      • This is in Article 21 (Control of Incorporation/Formation).
  • Option B: "Sözleşmenin yazılmamış sayılan genel işlem koşulları dışındaki hükümleri geçerliliğini korur." (Provisions of the contract other than the general terms and conditions deemed not written maintain their validity.)
      • This is in Article 21 (Control of Incorporation/Formation).
  • Option C: "Genel işlem koşullarına, dürüstlük kurallarına aykırı olarak karşı tarafın aleyhine veya onun durumunu ağırlaştırıcı nitelikte hükümler konulamaz." (Provisions that are contrary to the rules of honesty and are against the other party or aggravate their situation cannot be included in general terms and conditions.)
      • This is the core of Article 22 (İçerik Denetimi - Content Control).
  • Option D: "Aynı amaçla düzenlenen sözleşmelerin metinlerinin özdeş olmaması, bu sözleşmelerin içerdiği hükümlerin genel işlem koşulu sayılmasını engellemez." (The fact that the texts of contracts prepared for the same purpose are not identical does not prevent the provisions contained in these contracts from being considered general terms and conditions.)
      • This is in Article 20 (Definition).
  • Option E: "Genel işlem koşullarında yer alan bir hüküm, açık ve anlaşılır değilse veya birden çok anlama geliyorsa düzenleyenin aleyhine ve karşı tarafın lehine yorumlanır." (If a provision in the general terms and conditions is not clear and understandable or has multiple meanings, it is interpreted against the drafter and in favor of the other party.)
      • This is in Article 23 (Interpretation).
  • Content control (İçerik denetimi) refers to Article 22 TBK.
  • Option C matches Article 22.
  • The correct answer is C.