New Delhi: Researchers from IIIT Hyderabad, along with the Anusandhan National Research Foundation (ANRF), have developed an AI tool called Saral that converts research papers into video presentations that can be customised in different Indian languages. The tool is simple to use and provides a more accessible way of going through research material. Even with minimal tweaking, the end result is surprisingly engaging, and allows new research to be consumed in a capsule format. The technology is inspired by Google’s NotebookLM, but in its current form can process only one research paper at a time. The tool can also be of use to the researchers themselves. Ponnurangan Kumaraguru, Project Lead at IIITH, says, “Typically, students and faculty present slides at conferences. So, we wanted to create an application that would automatically take a research paper as its input and generate a 3-4 minute concise video based on slides in different Indian languages.”
Users have to provide API keys for Google Gemini, Sarvam and OpenAI, of which the Google Gemini key is the essential one. The user can then upload the research paper, and the tool generates five slides: Introduction, Methodology, Results, Discussion and Conclusion. At this point, the images from the paper are collected into a bank, from which they can be inserted into the slides. Users can also edit the commentary for each slide, as well as the bullet points. Unfortunately, it is not possible to add new images directly to the slides; to do so, the user has to edit the research paper and add the images to the PDF. The tool then generates a video, using AI to read the text commentary while displaying the slideshow with the bullet points and images.
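For readers who think in code, the workflow described above can be pictured roughly as follows. This is only an illustrative sketch and not Saral's actual implementation: the names `Slide`, `Deck`, `new_deck`, `attach_image` and `narration_script` are hypothetical, and the only details taken from the description are the five section titles, the image bank, and the editable bullet points and commentary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five slides the article says the tool generates.
SECTION_TITLES = ["Introduction", "Methodology", "Results", "Discussion", "Conclusion"]


@dataclass
class Slide:
    """Hypothetical model of one slide: editable bullets, commentary, optional image."""
    title: str
    bullets: List[str] = field(default_factory=list)
    commentary: str = ""
    image: Optional[str] = None  # name of an image taken from the bank


@dataclass
class Deck:
    """Hypothetical model of a deck plus the bank of images available to it."""
    slides: List[Slide]
    image_bank: Dict[str, bytes]

    def attach_image(self, slide_index: int, image_name: str) -> None:
        # Only images already in the bank can be used, mirroring the
        # limitation that new images cannot be added to slides directly.
        if image_name not in self.image_bank:
            raise KeyError(f"{image_name!r} is not in the image bank")
        self.slides[slide_index].image = image_name

    def narration_script(self) -> List[str]:
        # One commentary string per slide, in order; this is the text
        # a text-to-speech step would read aloud.
        return [slide.commentary for slide in self.slides]


def new_deck(image_bank: Dict[str, bytes]) -> Deck:
    """Create an empty five-slide deck around a given image bank."""
    return Deck([Slide(title) for title in SECTION_TITLES], dict(image_bank))
```

In this sketch, editing a slide amounts to changing its `bullets` or `commentary`, while `attach_image` refuses anything outside the bank, which is the constraint users currently work around by editing the PDF.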
Enhancing the Saral tool with additional capabilities
Shivakumar Kalyanaram, CEO of ANRF, says the goal is to foster a culture of innovation in academia and industry in the country in order to transform the domestic research landscape: “As part of ANRF’s mandate to foster excellence in research and innovation, we are committed to broadening research capabilities across a wider range of institutions, and researchers. We see the democratisation of research via technology and AI as a key contributor in this effort along with other programs, accelerating the discovery of knowledge. Through initiatives like SARAL, we aim to promote the diffusion of ideas, build strong partnerships, and use platforms like AI for social media, and demystification of research to amplify the impact of science and research.” ANRF and IIITH intend to advance the capabilities of Saral.
The research is presented in a natural and familiar voice for Indian users thanks to the use of Sarvam. Kumaraguru explains, “The research summary is created with the help of AI tools such as Gemini, Claude, GPT and so on. When the script is ready in the form of slides, we then use Sarvam’s Text-to-Speech conversion engine to create an audio in the voice (either male or female) and language of our choice.” More features to make the tool more powerful are on the roadmap. Kumaraguru adds, “Who wouldn’t like to watch an animated, engaging video? Conferences also require posters which contain all the required information on a single page. We are currently working on it. It is a difficult one but a problem we are trying to solve nevertheless right now.” Saral is available for users to try out.