Data Biz Times

Samsung’s TRUEBench offers multilingual, real-world benchmarking for large language models


DBT Bureau

Pune, 26 Sep 2025

Samsung Electronics has unveiled TRUEBench (Trustworthy Real-world Usage Evaluation Benchmark), a proprietary benchmark developed by Samsung Research to evaluate AI productivity.

TRUEBench provides a comprehensive set of metrics to measure how large language models (LLMs) perform in real-world workplace productivity applications. To ensure realistic evaluation, it incorporates diverse dialogue scenarios and multilingual conditions.

Drawing on Samsung’s in-house use of AI for productivity, TRUEBench evaluates commonly used enterprise tasks — such as content generation, data analysis, summarization and translation — across 10 categories and 46 sub-categories. The benchmark ensures reliable scoring with AI-powered automatic evaluation based on criteria that are collaboratively designed and refined by both humans and AI.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity and solidify Samsung’s technological leadership.”

As companies adopt AI for workplace tasks, demand has grown for ways to measure the productivity of LLMs. However, existing benchmarks primarily measure overall performance, are mostly English‑centric, and are limited to single‑turn question‑answer structures. This restricts their ability to reflect actual work environments.

To address these limitations, TRUEBench comprises 2,485 test sets across 10 categories and 12 languages, and it also supports cross-linguistic scenarios. The test sets examine what AI models can actually solve. They range in length from as few as 8 characters to more than 20,000 characters, reflecting tasks from simple requests to lengthy document summarization.

To evaluate the performance of AI models, it is important to have clear criteria for judging whether the AI’s responses are correct. In real-world situations, not all user intents may be explicitly stated in the instructions. TRUEBench is designed to enable realistic evaluation by considering not only the accuracy of the answers but also detailed conditions that meet the implicit needs of users.

Samsung Research verified the evaluation items through collaboration between humans and AI. First, human annotators create the evaluation criteria, and the AI then reviews them to check for errors, contradictions or unnecessary constraints. Human annotators then refine the criteria, and the cycle repeats to produce increasingly precise evaluation standards. Automatic evaluation of AI models is conducted against these cross-verified criteria, which minimizes subjective bias and ensures consistency. In addition, a model passes a test only if it satisfies every condition for that test, which enables more detailed and precise scoring across tasks.
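The rule that every condition must be satisfied for a test to pass can be sketched in Python. This is a minimal illustration under stated assumptions, not Samsung's implementation: the `TestResult` class, the `passed` and `pass_rate` functions, and the sample condition names are all hypothetical, and the per-condition verdicts are assumed to have been produced already by an automatic judge.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    """Verdicts for one test: hypothetical condition name -> satisfied or not."""
    test_id: str
    conditions: dict[str, bool]


def passed(result: TestResult) -> bool:
    # A test passes only if every one of its conditions is satisfied.
    # Assumption: a test with no conditions counts as failed, not vacuously passed.
    return bool(result.conditions) and all(result.conditions.values())


def pass_rate(results: list[TestResult]) -> float:
    # Fraction of tests on which all conditions were satisfied.
    if not results:
        return 0.0
    return sum(passed(r) for r in results) / len(results)


# Illustrative data only; these are not real TRUEBench items or results.
sample = [
    TestResult("t1", {"answers_in_korean": True, "under_100_words": True}),
    TestResult("t2", {"answers_in_korean": True, "under_100_words": False}),
]
print(pass_rate(sample))
```

Under this all-or-nothing rule, a response that meets most conditions but misses one scores the same as a response that misses all of them, which is what makes the resulting pass rate strict.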

TRUEBench’s data samples and leaderboards are available on the global open-source platform Hugging Face, where users can compare up to five models at a glance. Data on the average length of model responses is also published, so that performance and efficiency can be compared side by side.
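A side-by-side comparison of score and average response length could look like the following Python sketch. It is illustrative only: the `compare_models` function, the model names, and all numbers are hypothetical, and the ranking rule (higher score first, shorter average response as the tie-breaker) is an assumption rather than the leaderboard's documented behavior. Only the five-model limit comes from the article.

```python
MAX_MODELS = 5  # the article states that at most five models can be compared


def compare_models(models: dict[str, tuple[float, float]]) -> list[tuple[str, float, float]]:
    """Rank models given name -> (score, average response length in characters)."""
    if len(models) > MAX_MODELS:
        raise ValueError(f"can compare at most {MAX_MODELS} models, got {len(models)}")
    rows = [(name, score, length) for name, (score, length) in models.items()]
    # Assumed ordering: higher score first; on a tie, the shorter average response wins.
    return sorted(rows, key=lambda row: (-row[1], row[2]))


# Hypothetical models and numbers, not real leaderboard data.
hypothetical = {
    "model_a": (60.0, 450.0),
    "model_b": (60.0, 300.0),
    "model_c": (72.5, 900.0),
}
for name, score, length in compare_models(hypothetical):
    print(f"{name}: score={score}, avg_length={length}")
```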
