Extract text from all PowerPoint files in a directory


Let’s get all the text out of many PowerPoint files in a folder, using python-pptx.

The python-pptx documentation is at python-pptx.readthedocs.io.

If you want to extract text:

  • import Presentation from pptx (pip install python-pptx)
  • loop over each file in the directory (using the glob module)
  • look at every slide, and at every shape on each slide
  • if a shape has a text attribute, print shape.text

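from pptx import Presentation  # pip install python-pptx
import glob

# Every .pptx file in the current working directory
for eachfile in glob.glob("*.pptx"):
    prs = Presentation(eachfile)
    print(eachfile)
    print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            # Pictures and some other shapes have no text, so check first
            if hasattr(shape, "text"):
                print(shape.text)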
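The loop above only prints top-level shapes that have a text attribute, so text inside grouped shapes or tables does not get printed. Below is a minimal sketch of a more thorough version. It assumes python-pptx's shape attributes can be duck-typed: group shapes expose `.shapes`, table frames report `has_table` (with cells reached through `.table.rows` and `.cells`), and text boxes and placeholders report `has_text_frame`. The names `iter_shape_text`, `extract_text` and `extract_folder` are hypothetical helpers, not part of python-pptx, and the text is collected into a dict instead of being printed right away.

```python
from pathlib import Path


def iter_shape_text(shapes):
    """Yield the text of every shape, descending into groups and tables.

    Assumption: duck typing on python-pptx shapes. Group shapes have
    ``.shapes``, table frames have ``has_table``, and text boxes and
    placeholders have ``has_text_frame``.
    """
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Group shape: recurse into the shapes it contains
            yield from iter_shape_text(shape.shapes)
        elif getattr(shape, "has_table", False):
            # Table inside a graphic frame: read it cell by cell, skipping empty cells
            for row in shape.table.rows:
                for cell in row.cells:
                    if cell.text_frame.text:
                        yield cell.text_frame.text
        elif getattr(shape, "has_text_frame", False) and shape.text_frame.text:
            # Ordinary text box or placeholder with some text in it
            yield shape.text_frame.text


def extract_text(path):
    """Return all the text in one .pptx file as a list of strings."""
    # Imported here so the helpers above can be used without python-pptx installed
    from pptx import Presentation  # pip install python-pptx

    prs = Presentation(path)
    texts = []
    for slide in prs.slides:
        texts.extend(iter_shape_text(slide.shapes))
    return texts


def extract_folder(folder=".", recursive=False):
    """Map each .pptx file in *folder* (optionally its subfolders) to its text."""
    pattern = "**/*.pptx" if recursive else "*.pptx"
    files = sorted(Path(folder).glob(pattern))
    # Skip the "~$..." lock files PowerPoint leaves next to open presentations
    return {
        str(p): extract_text(str(p))
        for p in files
        if not p.name.startswith("~$")
    }


if __name__ == "__main__":
    for filename, texts in extract_folder(".").items():
        print(filename)
        print("----------------------")
        print("\n".join(texts))
```

Returning a dict rather than printing inside the loop makes it easy to write the text to a file, search it, or pass `recursive=True` to include presentations in subfolders.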

Published by pythonprogramming

Started with BASIC on the Spectrum, loved JavaScript in the '90s and Python in the 2000s; now I am back with Python, still making some JavaScript stuff when needed.