Data Biz Times

Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models


DBT Bureau

Pune, 26 Sep 2025

Samsung Electronics unveiled TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a proprietary benchmark developed by Samsung Research to evaluate AI productivity.

TRUEBench provides a comprehensive set of metrics to measure how large language models (LLMs) perform in real-world workplace productivity applications. To ensure realistic evaluation, it incorporates diverse dialogue scenarios and multilingual conditions.

Drawing on Samsung’s in-house use of AI for productivity, TRUEBench evaluates commonly used enterprise tasks — such as content generation, data analysis, summarization and translation — across 10 categories and 46 sub-categories. The benchmark ensures reliable scoring with AI-powered automatic evaluation based on criteria that are collaboratively designed and refined by both humans and AI.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity and solidify Samsung’s technological leadership.”

As companies adopt AI for workplace tasks, demand has grown for ways to measure the productivity of LLMs. However, existing benchmarks primarily measure overall performance, are mostly English‑centric, and are limited to single‑turn question‑answer structures. This restricts their ability to reflect actual work environments.

To address these limitations, TRUEBench comprises 2,485 test sets across 10 categories and 12 languages, and it also supports cross-linguistic scenarios. The test sets examine what AI models can actually solve. They range from as short as 8 characters to more than 20,000 characters, reflecting tasks from simple requests to lengthy document summarization.

To evaluate the performance of AI models, it is important to have clear criteria for judging whether the AI’s responses are correct. In real-world situations, not all user intents may be explicitly stated in the instructions. TRUEBench is designed to enable realistic evaluation by considering not only the accuracy of the answers but also detailed conditions that meet the implicit needs of users.

Samsung Research verified the evaluation criteria through collaboration between humans and AI. First, human annotators create the criteria, and then the AI reviews them to check for errors, contradictions or unnecessary constraints. Human annotators then refine the criteria again, and the cycle repeats to produce increasingly precise evaluation standards. Automatic evaluation of AI models is conducted against these cross-verified criteria, minimizing subjective bias and ensuring consistency. In addition, a model passes a test only if it satisfies every condition for that test. This enables more detailed and precise scoring across tasks.
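The all-conditions-must-pass rule described above can be illustrated with a minimal Python sketch. It is not Samsung's implementation: TRUEBench uses AI-powered automatic evaluation, whereas the checks below are simple stand-in functions, and the names `Condition`, `passes` and `pass_rate`, along with the sample conditions and responses, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Condition:
    """One named requirement that a response must satisfy (hypothetical)."""
    name: str
    check: Callable[[str], bool]


def passes(response: str, conditions: List[Condition]) -> bool:
    """Strict scoring: a response passes only if every condition holds."""
    return all(c.check(response) for c in conditions)


def pass_rate(results: List[bool]) -> float:
    """Fraction of test items passed; 0.0 when there are no items."""
    return sum(results) / len(results) if results else 0.0


# Illustrative conditions for a made-up "summarize in one sentence" task.
# Real criteria would be judged by an AI evaluator, not string checks.
conditions = [
    Condition("mentions_subject", lambda r: "Samsung" in r),
    Condition("single_sentence", lambda r: r.strip().count(".") == 1),
    Condition("under_200_chars", lambda r: len(r) <= 200),
]

responses = [
    "Samsung unveiled a benchmark for LLM productivity.",
    "Samsung unveiled a benchmark. It covers 12 languages.",  # two sentences
]

results = [passes(r, conditions) for r in responses]
print(results, pass_rate(results))
```

Under this rule, the second response fails even though it meets two of the three conditions, which is why strict scoring is harsher than awarding partial credit.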

TRUEBench’s data samples and leaderboards are available on the open-source platform Hugging Face. The leaderboards let users compare up to five models at a glance. Data on the average length of model responses is also published, so performance and efficiency can be compared together.
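As a rough illustration of comparing performance and efficiency together, the Python sketch below ranks models by pass rate and uses average response length to break ties. The function `summarize`, the model names and the numbers are placeholders rather than leaderboard data, and measuring length in characters is an assumption, since the article does not state the unit.

```python
from statistics import mean
from typing import Dict, List, Tuple

MAX_MODELS = 5  # the leaderboard compares at most five models at a time

# One entry per test item: (passed?, response length in characters).
Run = List[Tuple[bool, int]]


def summarize(runs: Dict[str, Run]) -> List[Tuple[str, float, float]]:
    """Return (model, pass_rate, avg_length) rows, best pass rate first.

    Assumes every model has at least one test item.
    """
    if len(runs) > MAX_MODELS:
        raise ValueError(f"compare at most {MAX_MODELS} models at a time")
    rows = []
    for model, items in runs.items():
        passed = [ok for ok, _ in items]
        lengths = [n for _, n in items]
        rows.append((model, sum(passed) / len(passed), mean(lengths)))
    # Higher pass rate ranks first; a shorter average response breaks ties.
    return sorted(rows, key=lambda row: (-row[1], row[2]))


# Placeholder results for two hypothetical models.
runs = {
    "model_a": [(True, 120), (False, 300), (True, 180)],
    "model_b": [(True, 90), (True, 110), (False, 100)],
}

table = summarize(runs)
for model, rate, avg_len in table:
    print(f"{model}: pass rate {rate:.2f}, average length {avg_len}")
```

Here both placeholder models pass two of three items, so the one with shorter responses ranks first.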

Data Biz Times © 2024. All Rights Reserved.
