Splitting Invoices By "Client ID"

Introduction

Splitting a PDF document that contains multiple invoices into individual files, grouped by some kind of ID number, is a common task. This tutorial shows how to take a PDF file with multiple invoices (each with a variable number of pages) and split it by a "Client ID" (or any other unique identification number). Each output PDF file will contain all invoices for a single client/account.

Input Document Description

The input PDF document used in this tutorial contains multiple invoices of variable length, with one or more invoices per client. The first page of each invoice contains a "Client ID" in the following format: "CLIENT ID: XXXXXXXX". The location of the "Client ID" is not fixed and may differ from page to page. The goal is to split the input PDF file into documents that each contain all invoices for one client, and to name each document using the corresponding "Client ID".
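To make the "CLIENT ID: XXXXXXXX" format concrete, here is a minimal Python sketch of matching it in text extracted from a page. It is only an illustration and not how the plug-in is configured. The function name find_client_id is hypothetical, and the assumption that the ID is a run of letters and digits is ours, since the tutorial only shows the placeholder "XXXXXXXX".

```python
import re

# Assumption: the ID is a run of letters/digits following the "CLIENT ID:" label.
# The tutorial only shows the placeholder "XXXXXXXX", so adjust as needed.
CLIENT_ID_PATTERN = re.compile(r"CLIENT ID:\s*([A-Za-z0-9]+)")


def find_client_id(page_text):
    """Return the Client ID found in the page text, or None if there is none."""
    match = CLIENT_ID_PATTERN.search(page_text)
    return match.group(1) if match else None
```

Because the pattern is searched for anywhere in the page text, the position of the "Client ID" on the page does not matter.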

Splitting Approach

We are going to use the "Split By Text Pattern" method. Since the "Client ID" always occurs on the first page of each invoice, it is natural to use it as a reliable "separator". AutoSplit will search all pages of the input document for text that matches the "Client ID" text pattern. The document will be split at each page where the matched text differs from the previous match.
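The split-on-change idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the plug-in's actual implementation: the input is a list with one entry per page, holding the "Client ID" matched on that page or None when the page has no match, and the function name split_at_id_changes is hypothetical. The sketch also assumes that all invoices for a client are adjacent in the input document.

```python
from typing import List, Optional, Tuple


def split_at_id_changes(
    page_ids: List[Optional[str]],
) -> List[Tuple[str, List[int]]]:
    """Group 0-based page indexes into segments.

    A new segment starts whenever a page carries an ID that differs
    from the ID of the current segment. Pages without an ID are
    treated as continuations of the current segment.
    """
    segments: List[Tuple[str, List[int]]] = []
    current_id = None
    for index, page_id in enumerate(page_ids):
        if page_id is not None and page_id != current_id:
            # The matched text changed: start a new output segment.
            current_id = page_id
            segments.append((current_id, []))
        if segments:
            # Pages before the first match are skipped in this sketch.
            segments[-1][1].append(index)
    return segments
```

For example, the page IDs ["A1", None, "A1", "B2", None] produce two segments: pages 0 to 2 for "A1" (two consecutive invoices for the same client) and pages 3 to 4 for "B2". Each segment's ID can then serve as the output file name.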

Output Results

We have used the AutoSplit™ plug-in to automatically split invoices from a single PDF file and name the output files by "Client ID". Each output file contains all invoices related to a specific "Client ID".