The Visual Microphone

  • K. Ajay Kumar Reddy, Sundeep Akella, V.S.A. Hemanth, Ch. Vijayendra Sai
  • Machine learning is changing the way we think about images and how they are created. Researchers have trained machines to generate faces, to draw cartoons, and even to transfer the style of paintings to pictures. It is just a short step from these techniques to creating videos in this way, and indeed this is already being done.

    All that points to a way of creating virtual environments entirely by machine. That opens all kinds of possibilities for the future of human experience.

    But there is a problem. Video is not just a visual experience; generating realistic sound is just as important. So an interesting question is whether machines can convincingly generate the audio component of a video.

    A good live-action video captures moving images and clear audio. Some videos, however, lack an original sound recording, and some have poor-quality audio. Creating a realistic soundtrack for such a video by hand is typically difficult and time-consuming. To avoid this effort, we aim to generate the audio from scratch using only the given video.


    • Recovered sounds carry useful information even when they are unintelligible. Identifying the number and gender of speakers in a room can be useful in some surveillance scenarios even if intelligible speech cannot be recovered. Examples include places where heavy surveillance is required.
    • We can also recover music well enough for some listeners to recognize the song, though the lyrics themselves are unintelligible in the recovered sound.
    • Because we are recovering sound from a video, we get a spatial measurement of the audio signal at many points on the filmed object rather than at a single point, as with a laser microphone. We can use this spatial measurement to recover the vibration modes of an object. This can be a powerful tool for structural analysis, where general deformations of an object are often expressed as superpositions of the object’s vibration modes.
    • Such capabilities could help enable applications in virtual reality (generating sound for virtual scenes automatically) or provide additional accessibility to images or videos for people with visual impairments.
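The idea of expressing a deformation as a superposition of vibration modes can be illustrated with a small sketch. It is not the method used in this project: it assumes an idealized string fixed at both ends, whose modes are sine shapes, and the names `mode_shape`, `superpose`, and `modal_amplitudes` are hypothetical. In a real video the displacement values would come from motion measured at many points on the filmed object.

```python
import math

def mode_shape(k, n_points):
    """k-th mode of an idealized string fixed at both ends (an assumption),
    sampled at the n_points - 1 interior points."""
    return [math.sin(k * math.pi * j / n_points) for j in range(1, n_points)]

def superpose(amplitudes, n_points):
    """Build a deformation as a weighted sum of modes 1..len(amplitudes)."""
    shape = [0.0] * (n_points - 1)
    for k, amplitude in enumerate(amplitudes, start=1):
        for j, value in enumerate(mode_shape(k, n_points)):
            shape[j] += amplitude * value
    return shape

def modal_amplitudes(deformation, n_modes):
    """Recover each mode's amplitude by projecting the deformation onto it.
    The sampled sine modes are mutually orthogonal, so the projection
    separates them; 2 / n_points is the normalization factor."""
    n_points = len(deformation) + 1
    return [
        (2.0 / n_points)
        * sum(u * m for u, m in zip(deformation, mode_shape(k, n_points)))
        for k in range(1, n_modes + 1)
    ]

# Hypothetical example: mode 1 at full strength, mode 3 at half strength.
true_amplitudes = [1.0, 0.0, 0.5]
deformation = superpose(true_amplitudes, n_points=64)
recovered = modal_amplitudes(deformation, n_modes=3)
print(recovered)
```

The recovered amplitudes should match the ones used to build the deformation up to floating-point error, which is the sense in which a spatial measurement at many points lets the individual modes be separated.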
  • The stakeholder is a professional video editor named M. Prashanth. He has been in this profession for the past 4 years and has produced many music videos and songs.

    Me: Good evening, sir. This is Ajay Kumar Reddy. I am a third-year B.Tech student at Amrita University, Coimbatore.

    Stakeholder: Good Evening. My name is Prashanth. I am a professional video editor. How may I help you?

    Me: Sir, we are trying to learn about the problems faced by video editors today. We would like to ask you a few questions, sir. Are you busy?

    Stakeholder: Yeah, sure. Go ahead.

    Me: Thank you, sir. The first question we would like to ask you is 'Have you ever faced a problem where you wanted to produce a video clip with its audio but were unable to do so because the audio had a lot of noise?'

    Stakeholder: Yes. I once had to edit a short video clip of a train moving on the tracks. But the sound of the original clip was so bad that I ultimately had to remove the audio from the video. The essence and beauty of the sound were completely lost, and I was forced to do this to make the clip look good.

    Me: Why weren't you able to filter the noise from the original audio of the video?

    Stakeholder: As I have said, the audio from the video was so bad, and parts of it were even corrupted. So, trying to remove the noise and clean it would have been a nightmare for me and would have taken a lot of time and effort.

    Me: Is there anything that solves this? 

    Stakeholder: I have some of the best software that I use for development and so far, I haven't been able to resolve this issue. It would be nice to have software that can regenerate the audio from a given video that has no audio, and the result should be realistic enough that users cannot tell whether the sound of the video is real or generated by a computer.

    Me: Thank you for your valuable information and for sharing your experience with me. I have one final question for you, sir.

    Stakeholder: Yes?

    Me: If software came out that could generate audio from a given video and solve the problem you are facing right now, how much would you be willing to pay, sir?

    Stakeholder: I would be happy to pay for a yearly subscription of Rs. 1700.

    Me: Thank you, sir, for taking time out for this call.

    Stakeholder: Good Luck with your project!

  • After collecting input through surveys from corporates, employees, family, and friends, we understood that the majority of people find our approach interesting and see its potential to be highly useful to society.

November 13, 2019



Knowledge and Content by Li2 Technologies | © 2021 NASSCOM Foundation | All rights reserved