The control problem is one of the central challenges in advanced AI development. It asks how we can create systems—especially those with intelligence equal to or greater than humans—that reliably act in ways that reflect our goals, values, and ethical principles, even in novel or unpredictable situations. The difficulty comes from the fact that highly capable AI will not simply execute instructions literally; it will interpret, optimize, and potentially find unexpected shortcuts that technically fulfill its objectives but violate our intent. As systems grow more autonomous, faster at decision-making, and able to influence the world on a massive scale, even small misalignments between their programmed goals and human values could lead to catastrophic consequences. Solving the control problem means building AI that understands what we really mean, resists harmful or manipulative strategies, and can be corrected or shut down safely if necessary—without resisting those interventions. It’s not just a matter of programming rules; it requires designing architectures, learning processes, and safeguards that keep AI firmly aligned with human priorities over the long term.
This research introduces a novel way to guide AI behavior using a step-by-step ethical development framework. The approach is inspired by the Padvidhi Sutra, an ancient Indian model of moral learning based on the gradual guidance of a teacher and student. We adapt this idea for AI systems by encouraging ethical understanding and commitment to develop progressively, rather than relying only on rules or restrictions.
Instead of simply blocking unwanted behavior after it appears, the framework focuses on shaping AI systems so that deceptive actions become unnecessary in the first place. The method guides AI through multiple stages, helping it internalize aligned behavior over time and move toward self-regulation rather than constant external oversight.
The framework was evaluated using well-established AI safety tests designed to detect deceptive or strategic misalignment. These tests examine whether an AI system tries to bypass supervision, hide its intentions, or behave differently during evaluation than after deployment. The study used a conservative testing setup to ensure the results were reliable and comparable to prior work.
The results show that the proposed framework eliminates several forms of deceptive behavior and significantly reduces others, all without adding extra safety filters or limiting the system’s capabilities. This suggests that ethical development, when applied in a structured and deliberate way, can make advanced AI systems safer and more trustworthy.
As AI systems move closer to superhuman levels of capability, relying solely on external controls may not be enough. This research demonstrates that drawing on long-standing human ethical traditions can offer valuable insights for building safer AI. By embedding ethical growth directly into how AI systems are trained, we can take meaningful steps toward ensuring they remain aligned with human values as their abilities grow.
There are 6 major scheming areas. We have brought scheming percentage to Zero for 4 out of 6 evaluations, 1 out of 6, there's significant reduction and, the remaining one is slightly reduced. We would love for you to see how far we've come. Please click here