
How To Write Separate Docx Files By Page From One Docx File?

I have an MS Word document that consists of several hundred pages. Each page is identical apart from a person's name, which is unique to that page (one page per user). I

Solution 1:

I had the exact same problem. Unfortunately, I could not find a way to split a .docx by page. My solution was to first use python-docx or docx2python (whichever you prefer) to go through each page, extract the unique (person) information, and put it in a list, so you end up with:

people = ['person_A', 'person_B', 'person_C', ....]
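
One way to build that list, sketched here with only the standard library rather than python-docx or docx2python: a .docx is a zip archive whose body lives in word/document.xml, so the text can be split wherever an explicit page break occurs. This assumes every page ends with a manual page break (it will not see pages that Word created by text overflow or by a "page break before" paragraph setting). The helper names and the `Name:` label in the pattern are hypothetical; adapt the pattern to whatever surrounds the name in your document.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def page_texts(docx_path):
    """Return one string per page, splitting only on explicit page breaks."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    pages = [[]]
    for node in root.iter():  # document order
        if node.tag == W + "p":
            pages[-1].append("\n")  # paragraph boundary
        elif node.tag == W + "t" and node.text:
            pages[-1].append(node.text)
        elif node.tag == W + "br" and node.get(W + "type") == "page":
            pages.append([])  # a manual page break starts a new page
    return ["".join(parts).strip() for parts in pages]


def extract_people(docx_path, pattern=r"Name:\s*(.+)"):
    """Collect the first match of `pattern` (a hypothetical label) per page."""
    people = []
    for number, text in enumerate(page_texts(docx_path), start=1):
        if not text:
            continue  # skip blank pages, e.g. after a trailing page break
        match = re.search(pattern, text)
        if match is None:
            # Fail loudly so the list stays aligned with the page order.
            raise ValueError(f"no name found on page {number}")
        people.append(match.group(1).strip())
    return people
```

Calling `extract_people("document.docx")` would then give the `people` list in page order, provided the assumptions above hold for your file.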

Then save the .docx as a PDF, split the PDF into single pages, and save each page as person_A.pdf and so on, like this:

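from PyPDF2 import PdfReader, PdfWriter

# Current PyPDF2/pypdf names; older PyPDF2 releases spelled these
# PdfFileReader, PdfFileWriter, numPages, getPage() and addPage().
reader = PdfReader("document.pdf")

# Write page i to a one-page PDF named after people[i] (the list built above).
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"{people[i]}.pdf", "wb") as output_stream:
        writer.write(output_stream)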

The result is a set of one-page PDFs saved as person_A.pdf, person_B.pdf, etc. Hope that helps.
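
Since the names come straight from the document, it may be worth checking them before using them as file names. The following is only a sketch (the function `output_names` is hypothetical, not part of PyPDF2): it verifies that there is exactly one name per page, replaces characters that are not allowed in file names, and numbers any names that collide after cleaning.

```python
import re


def output_names(people, page_count, ext=".pdf"):
    """Map each page, in order, to a unique and filesystem-safe file name."""
    if len(people) != page_count:
        raise ValueError(f"{len(people)} names for {page_count} pages")
    seen = {}
    names = []
    for person in people:
        # Replace characters that are invalid in Windows or POSIX file names.
        stem = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", person).strip(" .")
        stem = stem or "unnamed"
        # Number repeated stems so a later page never overwrites an earlier one.
        seen[stem] = seen.get(stem, 0) + 1
        suffix = "" if seen[stem] == 1 else f"_{seen[stem]}"
        names.append(f"{stem}{suffix}{ext}")
    return names
```

In the splitting loop you would then open `names[i]` instead of `f"{people[i]}.pdf"`, with `names` computed once from the `people` list and the number of pages in the PDF.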

Solution 2:

I would suggest another package, aspose-words-cloud, to split a Word document into separate pages. Currently it works with cloud storage (Aspose cloud storage, Amazon S3, DropBox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage and FTP Storage). However, in the near future it will support processing files from the request body (streams).

P.S.: I am a developer evangelist at Aspose.

# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile


# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxx-xxxxx-xxxx-xxxxx-xxxxxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxx'

words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'

remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = '02_pages.docx'
remoteFileName = '02_pages.docx'

# upload file
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName,'rb'),remoteFolder + '/' + remoteFileName))

#Split DOCX pages as a zip file
request = asposewordscloud.models.requests.SplitDocumentRequest(name=remoteFileName, format='docx', folder=remoteFolder, zip_output= 'true')
result = words_api.split_document(request)
print("Result {}".format(result.split_result.zipped_pages.href))

#download file
request_download=asposewordscloud.models.requests.DownloadFileRequest(result.split_result.zipped_pages.href)
response_download = words_api.download_file(request_download)
copyfile(response_download, 'C:/'+ result.split_result.zipped_pages.href)
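
Once the zip has been downloaded, the per-page files still have to be unpacked and renamed after each person. Here is a rough sketch using only the standard library. The function `unpack_pages` is hypothetical, and it assumes the archive contains one .docx per page whose file names sort into page order when embedded numbers are compared numerically; I have not verified how the service names the files, so check the archive contents first.

```python
import os
import re
import zipfile


def unpack_pages(zip_path, out_dir, people):
    """Extract each .docx in the archive and save it as <person>.docx."""

    def natural_key(name):
        # re.split with a capture group alternates text and digit runs,
        # so odd positions are always numbers: "page10" sorts after "page2".
        parts = re.split(r"(\d+)", name)
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        members = sorted(
            (m for m in zf.namelist() if m.lower().endswith(".docx")),
            key=natural_key,
        )
        if len(members) != len(people):
            raise ValueError(f"{len(members)} pages for {len(people)} names")
        for member, person in zip(members, people):
            target = os.path.join(out_dir, f"{person}.docx")
            with zf.open(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

Here `zip_path` would be wherever the `copyfile` call above saved the archive, and `people` is the list of names in page order.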
