Data Biz Times
Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models


DBT Bureau

Pune, 26 Sep 2025

Samsung Electronics unveiled TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a proprietary benchmark developed by Samsung Research to evaluate AI productivity.

TRUEBench provides a comprehensive set of metrics to measure how large language models (LLMs) perform in real-world workplace productivity applications. To ensure realistic evaluation, it incorporates diverse dialogue scenarios and multilingual conditions.

Drawing on Samsung’s in-house use of AI for productivity, TRUEBench evaluates commonly used enterprise tasks — such as content generation, data analysis, summarization and translation — across 10 categories and 46 sub-categories. The benchmark ensures reliable scoring with AI-powered automatic evaluation based on criteria that are collaboratively designed and refined by both humans and AI.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity and solidify Samsung’s technological leadership.”

As companies adopt AI for their work, demand for measuring the productivity of LLMs has grown. However, existing benchmarks primarily measure overall performance, are mostly English‑centric, and are limited to single‑turn question‑answer structures. This restricts their ability to reflect actual work environments.

To address these limitations, TRUEBench comprises 2,485 test sets across 10 categories and 12 languages, and it also supports cross-linguistic scenarios. The test sets examine what AI models can actually solve. They range from as short as 8 characters to more than 20,000 characters, reflecting tasks from simple requests to lengthy document summarization.

To evaluate the performance of AI models, it is important to have clear criteria for judging whether the AI’s responses are correct. In real-world situations, not all user intents may be explicitly stated in the instructions. TRUEBench is designed to enable realistic evaluation by considering not only the accuracy of the answers but also detailed conditions that meet the implicit needs of users.

Samsung Research verified the evaluation items through collaboration between humans and AI. First, human annotators create the evaluation criteria, and then the AI reviews them to check for errors, contradictions or unnecessary constraints. Afterward, human annotators refine the criteria again, and the process repeats so that the evaluation standards become increasingly precise. Automatic evaluation of AI models is then conducted against these cross-verified criteria, which minimizes subjective bias and ensures consistency. In addition, for each test, all conditions must be satisfied for the model to pass. This enables more detailed and precise scoring across tasks.
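The all-conditions-must-pass rule described above can be illustrated with a minimal sketch. This is not Samsung's implementation: the names `Condition`, `BenchItem`, `passes` and `pass_rate` are hypothetical, and plain Python predicates stand in for the AI-powered judge that TRUEBench uses to check each criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Condition:
    """One evaluation criterion (hypothetical structure, for illustration)."""
    name: str
    check: Callable[[str], bool]  # stand-in for an AI judge's verdict


@dataclass
class BenchItem:
    """A single test: a prompt plus every condition the response must meet."""
    prompt: str
    conditions: List[Condition]


def passes(response: str, item: BenchItem) -> bool:
    # Strict scoring: one failed condition fails the whole test.
    return all(cond.check(response) for cond in item.conditions)


def pass_rate(responses: List[str], items: List[BenchItem]) -> float:
    # Fraction of tests in which the response met all conditions.
    if len(responses) != len(items):
        raise ValueError("need exactly one response per test item")
    if not items:
        return 0.0
    passed = sum(passes(r, it) for r, it in zip(responses, items))
    return passed / len(items)


# Two toy items. The second condition in each models an implicit need
# that the prompt does not spell out in full.
summary_item = BenchItem(
    prompt="Summarize the report in one sentence.",
    conditions=[
        Condition("non_empty", lambda r: bool(r.strip())),
        Condition("one_sentence", lambda r: r.strip().count(".") <= 1),
    ],
)
greeting_item = BenchItem(
    prompt="Give a short French greeting.",
    conditions=[
        Condition("is_french", lambda r: "bonjour" in r.lower()),
        Condition("is_short", lambda r: len(r) <= 10),
    ],
)

responses = ["Sales rose 5% in Q2.", "Bonjour, tout le monde"]
score = pass_rate(responses, [summary_item, greeting_item])
```

In this toy run the second response is in French but too long, so it fails the test outright even though one of its two conditions is met. That is the practical effect of strict scoring: partial compliance earns no credit.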

TRUEBench’s data samples and leaderboards are available on the global open-source platform Hugging Face. Users can compare up to five models there, giving an at-a-glance view of AI model performance. The average length of each model’s responses is also published, so performance and efficiency can be compared together.

Data Biz Times © 2024. All Rights Reserved.
