Word: how do I grab the text with Python?
I tried reading a Word document with python-docx (to install it, run pip install python-docx). I had a Word document called example.docx containing 4 lines of text, and they were extracted correctly, as you can see in the output below.
The Word file “example.docx” is saved in the same directory as the script that contains the code.
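from docx import Document

# Open the Word file (it is in the same folder as this script)
d = Document("example.docx")
# Each paragraph of the document is printed on its own line
for par in d.paragraphs:
    print(par.text)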
Output (the content of example.docx; the first two lines are Italian for "Title" and "Paragraph 1, as an example"):
Titolo
Paragrafo 1 a titolo di esempio
This is an example of text
This is the final part, just 4 rows
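For the curious, here is roughly what "grabbing the text" involves, as a stdlib-only sketch that does not use python-docx and is not its actual code. It assumes the standard .docx layout: a ZIP archive whose body is in word/document.xml, with each paragraph stored as a w:p element and its text split across one or more w:t elements. read_paragraphs is a hypothetical name, and the sketch only joins plain text runs (no tabs, line breaks, tables, headers or footers).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the tags inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(path):
    """Return the text of each top-level paragraph of a .docx file.

    Hypothetical helper: a stdlib-only sketch, not python-docx's code.
    It only joins the w:t text runs, so tabs, breaks, tables,
    headers and footers are not handled.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(W_NS + "body")
    paragraphs = []
    # Only direct w:p children of w:body, so paragraphs inside tables are skipped
    for p in body.findall(W_NS + "p"):
        # A paragraph's text can be split across several runs: join every w:t
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

For a simple file like example.docx, printing each item of read_paragraphs("example.docx") should give the same lines as the python-docx loop; for anything more complex, python-docx is the safer choice.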
Join the text of all the .docx files in a folder
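import os
from docx import Document

# .docx files in the current folder; names starting with "~$" are Word's
# hidden lock files, which python-docx cannot open
files = [f for f in os.listdir() if f.endswith(".docx") and not f.startswith("~$")]
text_collector = []
whole_text = ''
# Collect the text of every paragraph of every file
for f in files:
    doc = Document(f)
    for par in doc.paragraphs:
        text_collector.append(par.text)
# Put each paragraph on its own line
for text in text_collector:
    whole_text += text + "\n"
print(whole_text)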
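The same logic can be split into two small functions, which makes it easier to reuse and to test. This is a sketch using pathlib; docx_files and join_paragraphs are hypothetical names, sorting the names is an assumption of mine (os.listdir returns them in arbitrary order), and names starting with ~$ are skipped because Word creates such hidden lock files next to documents that are open.

```python
from pathlib import Path


def docx_files(folder="."):
    """Return the .docx files in folder, sorted by name.

    Hypothetical helper: skips Word's hidden lock files (names starting
    with "~$") and anything that is not a regular file.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".docx" and not p.name.startswith("~$")
    )


def join_paragraphs(paragraph_lists):
    """Join several lists of paragraph strings into one text.

    Every paragraph is followed by a newline, one paragraph per line.
    """
    return "".join(text + "\n" for texts in paragraph_lists for text in texts)
```

With python-docx installed, the two pieces could be combined as join_paragraphs([par.text for par in Document(str(f)).paragraphs] for f in docx_files()).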
As above, but choosing the files
This code shows a numbered list of the .docx files in the folder and asks you to choose the ones you want to join.
import os
from docx import Document

# .docx files in the current folder, skipping Word's "~$" lock files
files = [f for f in os.listdir() if f.endswith(".docx") and not f.startswith("~$")]
# Show a numbered list of the files, starting from 1
for n, f in enumerate(files):
    print(n + 1, f)
print()
print("Write the numbers of the files you need, separated by spaces")
inp = input("Which files do you want to join? ")
# Convert the typed numbers to integers
desired = map(int, inp.split())
list_to_join = []
for n in desired:
    list_to_join.append(files[n - 1])  # the list shown starts at 1, indexes at 0
text_collector = []
whole_text = ''
# Collect the paragraphs of the chosen files, one per line
for f in list_to_join:
    doc = Document(f)
    for par in doc.paragraphs:
        text_collector.append(par.text)
for text in text_collector:
    whole_text += text + "\n"
print(whole_text)
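The input handling above trusts the user: typing 0 silently picks the last file (files[-1]), and a number that is too large raises IndexError. Below is a minimal sketch of stricter parsing; parse_choices is a hypothetical helper, and rejecting bad input with ValueError (rather than asking again) is my own design choice.

```python
def parse_choices(inp, count):
    """Turn input like "1 3" into 0-based indexes into a list of count files.

    Hypothetical helper: raises ValueError for anything that is not a whole
    number between 1 and count, instead of silently using files[-1] for 0.
    """
    indexes = []
    for token in inp.split():
        n = int(token)  # raises ValueError for non-numeric tokens such as "a"
        if not 1 <= n <= count:
            raise ValueError(f"{n} is not between 1 and {count}")
        indexes.append(n - 1)
    return indexes
```

In the script, list_to_join = [files[i] for i in parse_choices(inp, len(files))] would replace the map and the loop; wrapping that line in try/except ValueError lets you print a message and ask again.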