Introducing an Enhanced AI Reasoning Technique
Researchers from AI company DeepSeek and Tsinghua University have introduced a new technique to enhance “reasoning” in large language models (LLMs).

Reasoning capabilities have emerged as a critical benchmark in the race to build top-performing generative AI systems. China and the U.S. are actively competing to develop the most powerful and practical models. According to a Stanford University report in April, China’s LLMs are rapidly closing the gap with their U.S. counterparts. In 2024, China produced 15 notable AI models compared to 40 in the U.S., but it leads in patents and academic publications.

What is DeepSeek’s new technique?

DeepSeek researchers published a paper, titled “Inference-Time Scaling for Generalist Reward Modeling,” on Cornell University’s arXiv, the archive of scientific papers. Note that papers published on arXiv are not necessarily peer-reviewed.

In the paper, the researchers detailed a combination of two AI training methods: generative reward modeling and self-principled critique tuning.

“In this work, we investigate how to improve reward modeling (RM) with more inference compute for general queries, i.e. the inference-time scalability of …









