The breakthrough comes from a new technique called Self-Principled Critique Tuning (SPCT). Rather than improving performance by simply making AI models larger, which demands enormous energy and computing power, SPCT teaches the AI to judge its own work using a set of rules it creates for itself.

The way it works is via a built-in “judge” that checks whether the AI’s response both adheres to its internal reasoning rules and is suitable as output for a human. When the AI produces a solid response, it receives positive feedback, which helps it answer similar questions better in the future.

DeepSeek implements this method in its DeepSeek-GRM system, short for Generative Reward Modeling. GRM differs from traditional reward models in that it runs several of these checks in parallel, which improves both accuracy and consistency.

“We propose Self-Principled Critique Tuning (SPCT) to foster scalable reward generation behaviors,” the researchers wrote in their paper. “SPCT enables [the model] to adaptively posit principles and critiques based on the input query and responses, leading to better outcome rewards.”

With this system, DeepSeek claims its AI can now outperform competitors such as Google’s Gemini, Meta’s Llama, and OpenAI’s GPT-4o, especially on complex tasks like reasoning and decision-making, as noted by Euronews. Importantly, DeepSeek says it plans to release these new tools as open-source software, though no release date has been shared.
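To make the idea above more concrete, here is a minimal Python sketch of that loop: a reward model writes principles for a query, critiques each candidate response against them, assigns scores, and several such judgments are sampled in parallel and aggregated by voting. Everything in it, from the function names to the prompt format and the stubbed generate_judgment, is an illustrative assumption based on the description above, not DeepSeek’s actual code or API.

```python
# Illustrative sketch of a principles -> critique -> score -> vote loop.
# All names, formats, and the fixed judgment text are assumptions for
# demonstration, not the DeepSeek-GRM implementation.
import re
from collections import Counter

N_SAMPLES = 4  # number of parallel judgments to sample and aggregate


def generate_judgment(query: str, responses: list[str]) -> str:
    """Stand-in for one sampled pass of a generative reward model.

    A real system would prompt a language model to (1) write principles
    suited to the query and (2) critique each response against them,
    ending with scores. Here we return a fixed judgment so the
    aggregation logic below can run end to end.
    """
    return (
        "Principles: be factually correct; show the reasoning steps.\n"
        "Critique: Response 1 states the result with no working. "
        "Response 2 explains each step and reaches the right answer.\n"
        "Scores: Response 1: 4/10, Response 2: 9/10"
    )


def parse_scores(judgment: str) -> list[int]:
    """Pull the per-response scores out of the judgment text."""
    return [int(s) for s in re.findall(r"Response \d+: (\d+)/10", judgment)]


def grm_rank(query: str, responses: list[str]) -> int:
    """Sample several judgments in parallel and vote on the best response."""
    votes = Counter()
    for _ in range(N_SAMPLES):
        scores = parse_scores(generate_judgment(query, responses))
        if scores:
            votes[max(range(len(scores)), key=scores.__getitem__)] += 1
    best_index, _ = votes.most_common(1)[0]
    return best_index  # index of the preferred response


if __name__ == "__main__":
    query = "What is 17 * 24?"
    responses = ["408", "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408"]
    print("Preferred response index:", grm_rank(query, responses))
```

In a full system, generate_judgment would call the reward model itself; sampling it several times and voting is what the “parallel checks” refer to, trading a bit of extra inference-time compute for accuracy and consistency rather than relying on a larger model.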