How to get text from mp4 and wav with Python

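Install the SpeechRecognition module (pip install SpeechRecognition)

This code grabs the text from a WAV audio file. It uses recognize_google, which sends the audio to Google's speech service, so you need an internet connection.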

It also has a function, get_wav, that extracts the WAV audio from an mp4. It uses ffmpeg, so you have to install ffmpeg first. Look at this blog to see how to do it. ffmpeg is a great free tool for manipulating audio and video files. I record my videos with it and do a lot of other things, like joining files.
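Before calling get_wav it can help to check that ffmpeg is actually installed. The snippet below is only a small sketch of that check: the helper name tool_available is made up for this example, and it assumes ffmpeg is meant to be found on your PATH.

```python
import shutil


def tool_available(name: str) -> bool:
    """Return True if an executable called `name` is found on the PATH."""
    # shutil.which returns the full path of the executable, or None if missing
    return shutil.which(name) is not None


if not tool_available("ffmpeg"):
    print("ffmpeg was not found on the PATH; install it before calling get_wav")
```

If the message is printed, install ffmpeg (or add it to your PATH) and run the check again.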

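If your files are too long, you can run into trouble getting the text, so I read the audio in chunks with a duration of 100 seconds, which works well. Each repeated call to r.record continues from where the previous one stopped.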

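import subprocess
import speech_recognition as sr

def get_wav(videoname: str):
	# Extract the audio track straight to speech.wav (no mp3 step needed).
	# -y overwrites an existing speech.wav, -vn ignores the video stream.
	# Passing a list avoids quoting problems with spaces in the file name.
	subprocess.run(["ffmpeg", "-y", "-i", videoname, "-vn", "speech.wav"], check=True)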

def wav2ytext(language="en"):
	r = sr.Recognizer()
	with sr.AudioFile("speech.wav") as source:
		# Each r.record call continues where the previous one stopped,
		# so the loop reads the file in consecutive chunks of up to 100 seconds
		while True:
			audio = r.record(source, duration=100)
			if not audio.frame_data:
				break  # nothing left to read: end of file
			try:
				print(r.recognize_google(audio, language=language))
			except sr.UnknownValueError:
				# No recognizable speech in this chunk
				print("[could not understand this chunk]")
			except sr.RequestError as e:
				# The request to the speech service failed (e.g. no connection)
				print(f"Request failed: {e}")
				break
	print("Done")
		
get_wav("marketing_mix.mp4") # comment this line out once speech.wav exists
wav2ytext("en")
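
To see how many 100-second chunks a recording needs, you can read its length with the wave module from the standard library. This is just a rough sketch: the function names wav_duration_seconds and chunk_plan are made up for this example, and it assumes speech.wav is a plain PCM WAV file, which is the kind the wave module can read.

```python
import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chunk_plan(total_seconds: float, chunk_seconds: int = 100):
    """Return (start, length) pairs, in seconds, that cover the whole recording."""
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min(chunk_seconds, total_seconds - i * chunk_seconds))
        for i in range(count)
    ]


# Example use (needs an existing speech.wav):
# plan = chunk_plan(wav_duration_seconds("speech.wav"))
# print(len(plan), "chunks")
```

The number of pairs in the plan is the number of r.record calls with duration=100 needed to reach the end of the file.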