Extract text from all PowerPoint files in a directory

Actually working

Let’s extract all the text from every PowerPoint file in a folder using python-pptx.

The documentation of python-pptx

To print the text of every shape on every slide, for each .pptx file in the current directory:


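from pptx import Presentation
import glob

# "*.pptx" is matched against the current working directory
for eachfile in glob.glob("*.pptx"):
    prs = Presentation(eachfile)
    # Print the file name as a header before its text
    print(eachfile)
    print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            # Pictures and other shapes without text have no .text attribute
            if hasattr(shape, "text"):
                print(shape.text)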
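
The loop above only sees shapes that sit directly on a slide and have a .text attribute, so text inside grouped shapes or table cells is likely to be skipped. Here is a minimal sketch of one way to reach that text too and to collect the results instead of printing them. The names iter_shape_text and extract_folder are made up for this example, and detecting a group by the presence of a .shapes attribute is an assumption about how you want to detect groups, not something taken from the python-pptx docs.

```python
from pathlib import Path


def iter_shape_text(shapes):
    """Yield non-empty text from shapes, descending into groups and tables."""
    for shape in shapes:
        # Assumption: only group shapes carry a nested .shapes collection.
        nested = getattr(shape, "shapes", None)
        if nested is not None:
            yield from iter_shape_text(nested)
        # Tables keep their text per cell rather than on the shape itself.
        elif getattr(shape, "has_table", False):
            for row in shape.table.rows:
                for cell in row.cells:
                    if cell.text:
                        yield cell.text
        # Same check as the original loop: anything with a .text attribute.
        elif hasattr(shape, "text") and shape.text:
            yield shape.text


def extract_folder(folder="."):
    """Return {file name: [text, ...]} for every .pptx file in folder."""
    # Imported here so iter_shape_text can be used without python-pptx installed.
    from pptx import Presentation

    result = {}
    for path in sorted(Path(folder).glob("*.pptx")):
        prs = Presentation(str(path))
        result[path.name] = [
            text for slide in prs.slides for text in iter_shape_text(slide.shapes)
        ]
    return result
```

Calling extract_folder("some/folder") returns a dictionary keyed by file name, which is easier to write to a file or search through than printed output.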