Data Biz Times
Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models

in Tech
Reading Time: 3 mins read
0
Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models
Share on FacebookShare on Twitter

DBT Bureau

Pune, 26 Sep 2025

Samsung Electronics unveiled TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a proprietary benchmark developed by Samsung Research to evaluate AI productivity.

TRUEBench provides a comprehensive set of metrics to measure how large language models (LLMs) perform in real-world workplace productivity applications. To ensure realistic evaluation, it incorporates diverse dialogue scenarios and multilingual conditions.

Drawing on Samsung’s in-house use of AI for productivity, TRUEBench evaluates commonly used enterprise tasks — such as content generation, data analysis, summarization and translation — across 10 categories and 46 sub-categories. The benchmark ensures reliable scoring with AI-powered automatic evaluation based on criteria that are collaboratively designed and refined by both humans and AI.
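To make the category breakdown concrete, here is a minimal sketch of how per-test pass/fail results could be rolled up into one score per category. The record format, the category and sub-category labels, and the use of a simple pass rate per category are all hypothetical; the announcement does not describe TRUEBench's actual data schema or aggregation formula.

```python
from collections import defaultdict

# Hypothetical result records: (category, sub_category, passed).
# TRUEBench's real schema and category names are not given in the announcement.
results = [
    ("summarization", "meeting notes", True),
    ("summarization", "long document", False),
    ("translation", "ko-en", True),
    ("translation", "en-ja", True),
    ("data analysis", "table QA", False),
]


def pass_rate_by_category(records):
    """Return {category: fraction of tests passed}, ignoring sub-categories."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, _sub_category, passed in records:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: passes[c] / totals[c] for c in totals}


for category, rate in sorted(pass_rate_by_category(results).items()):
    print(f"{category}: {rate:.0%}")
```

The same function could be pointed at the sub-category field instead to get the finer-grained view across the 46 sub-categories.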

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity and solidify Samsung’s technological leadership.”

As more companies adopt AI for everyday work, demand has grown for ways to measure how productive LLMs actually are. However, existing benchmarks mainly measure overall performance, are mostly English-centric and are limited to single-turn question-and-answer formats, which restricts how well they reflect real work environments.

To address these limitations, TRUEBench comprises 2,485 test sets across 10 categories and 12 languages, and also supports cross-linguistic scenarios. The test sets examine what AI models can actually solve in practice: Samsung Research included inputs ranging from as short as 8 characters to more than 20,000 characters, covering everything from simple requests to summarization of lengthy documents.

Evaluating AI models requires clear criteria for judging whether a response is correct. In real-world use, users do not always state every intention explicitly in their instructions. TRUEBench is therefore designed to assess not only the accuracy of an answer but also detailed conditions that reflect users' implicit needs.

Samsung Research verified the evaluation items through collaboration between humans and AI. First, human annotators write the evaluation criteria; then an AI reviews them for errors, contradictions or unnecessary constraints. Human annotators then refine the criteria again, and the process repeats so that the evaluation standards become increasingly precise. AI models are evaluated automatically against these cross-verified criteria, which minimizes subjective bias and keeps scoring consistent. In addition, a model passes a test only if it satisfies every one of that test's conditions, which enables more detailed and precise scoring across tasks.
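The all-conditions-must-pass rule can be illustrated with a small sketch. In TRUEBench the checks are carried out by an AI judge against human- and AI-refined criteria; here each criterion is a plain Python function, and the `Criterion` class and the example checks are hypothetical stand-ins. The sketch also computes a partial-credit score to show how the strict rule differs from simply counting satisfied conditions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    description: str
    # Hypothetical stand-in: TRUEBench uses an AI judge, not a simple function.
    check: Callable[[str], bool]


def passes(response: str, criteria: list[Criterion]) -> bool:
    """All-or-nothing: the test passes only if every criterion is satisfied."""
    return all(c.check(response) for c in criteria)


def partial_credit(response: str, criteria: list[Criterion]) -> float:
    """For contrast: the fraction of criteria satisfied."""
    return sum(c.check(response) for c in criteria) / len(criteria)


# Made-up criteria mixing an explicit request with implicit expectations.
criteria = [
    Criterion("Uses no more than 20 words", lambda r: len(r.split()) <= 20),
    Criterion("At most 3 sentences", lambda r: r.count(".") <= 3),
    Criterion("Mentions the deadline", lambda r: "Friday" in r),
]

good = "The report is due Friday. Sales rose 4%."
bad = "The report is due soon. Sales rose 4%."
print(passes(good, criteria), passes(bad, criteria))
print(f"{partial_credit(bad, criteria):.2f}")
```

Under partial credit the second response would still earn two thirds of the available score; under the all-or-nothing rule it simply fails, because it omits a condition the user needed.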

TRUEBench's data samples and leaderboards are available on the open-source platform Hugging Face, where users can compare up to five models at a time for an at-a-glance view of overall AI model performance. Data on the average length of model responses is also published, so performance and efficiency can be compared side by side.
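A side-by-side comparison of score and response length might look like the sketch below. The model names, scores and lengths are invented, and the tie-break (prefer the shorter average response when scores are equal) is a design choice made for illustration; it is not how the TRUEBench leaderboard is documented to rank models. Only the five-model limit comes from the announcement.

```python
from dataclasses import dataclass

MAX_MODELS = 5  # the leaderboard compares at most five models at once


@dataclass
class ModelResult:
    name: str
    score: float             # hypothetical overall pass rate, 0 to 1
    avg_response_len: float  # hypothetical average response length in characters


def compare(models: list[ModelResult]) -> list[ModelResult]:
    """Rank by score (high first); on ties, the shorter average response wins."""
    if len(models) > MAX_MODELS:
        raise ValueError(f"can compare at most {MAX_MODELS} models, got {len(models)}")
    return sorted(models, key=lambda m: (-m.score, m.avg_response_len))


models = [
    ModelResult("model-a", 0.70, 1200),
    ModelResult("model-b", 0.70, 800),
    ModelResult("model-c", 0.65, 500),
]
for m in compare(models):
    print(f"{m.name:<8} score={m.score:.2f} avg_len={m.avg_response_len:.0f}")
```

Showing both columns lets a reader see that two models with the same score can differ noticeably in how verbose, and therefore how costly, their answers are.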

Data Biz Times © 2024. All Rights Reserved.
