Grab the text from a Word document

This code will let you grab the text from a Word document.

Word: how do I grab the text with Python

I tried this with python-docx; to install it, run pip install python-docx. I had a Word document called example.docx with 4 lines of text, and all of them were read correctly, as you can see in the output below.

The picture shows the Word file "example.docx", saved in the same directory as the file with the code.

from docx import Document

d = Document("example.docx")

for par in d.paragraphs:
    print(par.text)

Output (the content of example.docx):

Titolo
Paragrafo 1 a titolo di esempio
This is an example of text
This is the final part, just 4 rows

Join the text of all the docx files in a folder

import os
from docx import Document

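# Keep only the Word files in the current folder
files = [f for f in os.listdir() if f.endswith(".docx")]
text_collector = []
whole_text = ''
for f in files:
    doc = Document(f)
    # Collect the text of every paragraph, in document order
    for par in doc.paragraphs:
        text_collector.append(par.text)

# Join the paragraphs, one per line
for text in text_collector:
    whole_text += text + "\n"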

print(whole_text)
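
A note on filtering file names: a substring test like ".docx" in name also matches names such as notes.docx.bak, and a folder with a document open in Word usually contains a lock file whose name starts with ~$, which is not a real document. Here is a minimal sketch of a stricter filter. The names is_docx_file and pick_docx_files are hypothetical helpers (not part of python-docx), and accepting uppercase extensions is an assumption about what you want.

```python
def is_docx_file(name):
    """Return True for names that look like real Word documents."""
    # Word keeps a temporary lock file named "~$..." next to an open document
    if name.startswith("~$"):
        return False
    # Check the extension at the end of the name only, ignoring case
    return name.lower().endswith(".docx")


def pick_docx_files(names):
    """Keep the Word documents from a list of names, sorted alphabetically."""
    return sorted(name for name in names if is_docx_file(name))


# Demo with made-up file names
names = ["b.docx", "~$b.docx", "a.DOCX", "notes.docx.bak", "x.txt"]
print(pick_docx_files(names))  # ['a.DOCX', 'b.docx']
```

In the script you would probably call it as files = pick_docx_files(os.listdir()), which also makes the order of the joined text predictable.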

 

As above, but with a choice

With this code you choose which files to join from a numbered list of the docx files in the folder.

import os
from docx import Document

# Keep only the Word files in the current folder
files = [f for f in os.listdir() if f.endswith(".docx")]

# Show a numbered list of the files, starting from 1
for n, f in enumerate(files):
    print(n + 1, f)
print()
print("Write the numbers of the files you need, separated by spaces")
inp = input("Which files do you want to join? ")

# Turn the answer into integers and look up the matching file names
desired = map(int, inp.split())
list_to_join = []
for n in desired:
    list_to_join.append(files[n - 1])


text_collector = []
whole_text = ''
for f in list_to_join:
    doc = Document(f)
    for par in doc.paragraphs:
        text_collector.append(par.text)

for text in text_collector:
    whole_text += text + "\n"

print(whole_text)
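
The number parsing in this script trusts the input: a typo raises ValueError, a number past the end of the list raises IndexError, and 0 silently picks the last file, because files[-1] is valid Python. Here is a minimal sketch of a stricter parser. The name parse_selection is a hypothetical helper, and skipping repeated numbers is a design assumption, not something the original script does.

```python
def parse_selection(raw, count):
    """Convert an answer like "1 3" into zero-based indexes [0, 2].

    `count` is the number of files shown in the list.
    """
    indexes = []
    for token in raw.split():
        if not token.isdigit():
            raise ValueError(f"not a positive whole number: {token!r}")
        number = int(token)
        # The list shown to the user starts from 1
        if not 1 <= number <= count:
            raise ValueError(f"{number} is outside 1..{count}")
        # Keep the order typed by the user, skipping repeats
        if number - 1 not in indexes:
            indexes.append(number - 1)
    return indexes


# Demo with made-up file names
files = ["a.docx", "b.docx", "c.docx"]
chosen = [files[i] for i in parse_selection("3 1 3", len(files))]
print(chosen)  # ['c.docx', 'a.docx']
```

In the script you could wrap the call in a try/except ValueError block and ask again when the answer is not valid.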

 

Utilities

Published by pythonprogramming

Started with BASIC on the Spectrum, loved JavaScript in the 90s and Python in the 2000s; now I am back with Python, still making some JavaScript stuff when needed.