IIIT-H researchers work on bug verification tool for AI-driven programming

New Delhi: Professor Abhishek Kr Singh from IIIT Hyderabad, along with Professor Ashish Mishra from IIT Hyderabad, along with students from both institutions working full time are building a practical tool blended with deep theory to detect bugs in software. The challenge is to ensure that machine-generated code is safe, reliable and correct. The research is focused on building automated systems that can catch these bugs early, especially in modern software that runs multiple tasks at the same time, known as parallel or concurrent programmes.

Singh says, “The process of developing software itself is now automated. But in the process, there are a lot of bugs that get generated as well. Many software bugs find their roots in the transition from informal intent to formal implementation. They begin with how humans describe what they want software to do. The problem arises because you never describe your intent clearly. You provide inputs in natural language and then these AI agents produce code for you.”

Correctness by Construction

Rather than catching and fixing the bugs as the code is written, the approach advocated by Singh involves expressing the intent of the programmer in a formal, precise way, using specifications and assertions that the computers can automatically check. Singh explains, “We prove safety and reliability by making the code correct during construction itself. If you can specify your intention in a more formal language, there is a possibility of checking whether those intentions are met or not.”

The danger of Parallel Programmes

Most modern systems run tasks simultaneously, with tiny timing differences in how tasks interleave, known as race conditions, leading to bugs that appear only under very specific conditions, at times years after a software is deployed. The team is using a technique called fuzzing that automatically generates a large number of inputs to evaluate the behaviour of the software. The fuzzing is combined with a deep understanding how parallel programmes behave on real hardware.

Sing explains, “Constructing test cases by hand is not that easy. Industry spends a lot of time doing testing using input-output pairs but this is not a systematic way. If even one property is broken, you know something went wrong. Random fuzzing may not trigger the exact bug that is there inside your program. There is no tool right now that actually deals with fuzzing of weak-memory programs which run on modern architecture.” The team is targeting commonly used programming languages such as C++, and modern architectures such as x86 and ARM.

Transforming software testing

The project is a natural extension of earlier research by Singh that focused on theoretical proofs on programme correctness. Singh says, “My PhD work was mostly about formal verification and mathematical proofs. Now we want to translate those theoretical results into actual tool building. This is not something purely theoretical, It has a lot of impact in industry.” The team plans to extend the approach to GPUs and hardware accelerators, where the correctness challenges are even more severe. The team has an ambitious goal of transforming how software is tested and trusted, as humans rely increasingly on machine-generated code.