Data Biz Times

Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models


DBT Bureau

Pune, 26 Sep 2025

Samsung Electronics unveiled TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a proprietary benchmark developed by Samsung Research to evaluate AI productivity.

TRUEBench provides a comprehensive set of metrics to measure how large language models (LLMs) perform in real-world workplace productivity applications. To ensure realistic evaluation, it incorporates diverse dialogue scenarios and multilingual conditions.

Drawing on Samsung’s in-house use of AI for productivity, TRUEBench evaluates commonly used enterprise tasks — such as content generation, data analysis, summarization and translation — across 10 categories and 46 sub-categories. The benchmark ensures reliable scoring with AI-powered automatic evaluation based on criteria that are collaboratively designed and refined by both humans and AI.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity and solidify Samsung’s technological leadership.”

As companies adopt AI for workplace tasks, demand for measuring the productivity of LLMs has grown. However, existing benchmarks primarily measure overall performance, are mostly English-centric, and are limited to single-turn question-answer structures. This restricts their ability to reflect actual work environments.

To address these limitations, TRUEBench comprises 2,485 test sets across 10 categories and 12 languages, and it also supports cross-linguistic scenarios. The test sets examine what AI models can actually solve. They range in length from as few as 8 characters to more than 20,000 characters, covering tasks from simple requests to lengthy document summarization.

To evaluate the performance of AI models, it is important to have clear criteria for judging whether the AI’s responses are correct. In real-world situations, not all user intents may be explicitly stated in the instructions. TRUEBench is designed to enable realistic evaluation by considering not only the accuracy of the answers but also detailed conditions that meet the implicit needs of users.

Samsung Research verified the evaluation items through collaboration between humans and AI. First, human annotators create the evaluation criteria, and then the AI reviews them to check for errors, contradictions or unnecessary constraints. Afterward, human annotators refine the criteria again, and the cycle repeats to produce increasingly precise evaluation standards. AI models are then evaluated automatically against these cross-verified criteria, which minimizes subjective bias and ensures consistency. In addition, for each test, all conditions must be satisfied for the model to pass. This enables more detailed and precise scoring across tasks.
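The all-conditions-must-pass rule described above can be illustrated with a minimal Python sketch. This is not Samsung's implementation: the names `BenchItem`, `passes` and `pass_rate` are hypothetical, and simple functions stand in for the AI-powered automatic evaluation that the article says TRUEBench uses to judge each condition.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: in TRUEBench each condition is judged by an
# AI-based evaluator; here a condition is just a function on the response.
Condition = Callable[[str], bool]


@dataclass
class BenchItem:
    """One test: a prompt plus every condition a response must satisfy."""
    prompt: str
    conditions: List[Condition]


def passes(item: BenchItem, response: str) -> bool:
    """All-or-nothing scoring: a single failed condition fails the test."""
    return all(check(response) for check in item.conditions)


def pass_rate(items: List[BenchItem], responses: List[str]) -> float:
    """Fraction of tests passed; 0.0 when there are no tests."""
    results = [passes(item, resp) for item, resp in zip(items, responses)]
    return sum(results) / len(results) if results else 0.0


# Illustrative usage with made-up conditions (assumptions, not real test data).
item = BenchItem(
    prompt="Summarize the memo in one sentence and mention the deadline.",
    conditions=[
        lambda r: r.count(".") == 1,          # exactly one sentence
        lambda r: "deadline" in r.lower(),    # implicit need: deadline is named
    ],
)
print(passes(item, "The deadline for the report is Friday."))
print(passes(item, "The report is due soon. Please hurry."))
```

Under this rule a response that satisfies most conditions still fails, so the pass rate is stricter than a score that averages partial credit.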

TRUEBench’s data samples and leaderboards are available on the global open-source platform Hugging Face, where users can compare up to five models at a glance. Data on the average length of responses is also published, so performance and efficiency can be compared side by side.
