PSS - PDF Scan Splitter
A tool to sort batch scanned PDF documents based on document number contained in barcode placed on document.
Setup
You can install the pss tool in three ways:
- Global composer installation - the tool will be available globally as the pss command
- Download the phar executable file from the GitHub Releases page
- Download the source code from GitHub to run it

For the global composer installation, run:
composer global require kduma/pdf-scan-splitter-tool
Requirements
This tool is based on the poppler-utils and zbar-tools packages.
These packages run in a Docker container, so you don't need to install them on your system, but you do need to have Docker installed.
Usage
This tool reads every PDF file provided as an argument, splits every page into a separate PDF, and looks for a Code-128 barcode on each page.
If a barcode is found, the page is saved into a file named after the barcode into the output directory.
If no barcode is found, the page is saved into a file named UNKNOWN.pdf.
If there are multiple barcodes on a page, the page is saved into a file named after the first barcode.
If there are multiple pages with the same barcode, the pages are saved into files with a number suffix for duplicates.
./pss process <output_dir> <pdf> [<pdf>...] <--dpi=200>
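The naming rules described above can be sketched in Python. This is only an illustration under stated assumptions, not the tool's actual code: the function name assign_output_names, the .pdf extension on barcode-named files, and the _2, _3 style of duplicate suffix are all hypothetical, since the exact suffix format is not stated here.

```python
from collections import defaultdict


def assign_output_names(pages):
    """Pick an output file name for each scanned page.

    `pages` is a list with one entry per page; each entry is the list of
    barcode strings found on that page, in the order they were detected.
    """
    seen = defaultdict(int)  # how many pages have used each barcode so far
    names = []
    for barcodes in pages:
        if not barcodes:
            # No barcode on the page: it goes to UNKNOWN.pdf.
            # How several such pages are combined is not modelled here.
            names.append("UNKNOWN.pdf")
            continue
        code = barcodes[0]  # with several barcodes, the first one wins
        seen[code] += 1
        if seen[code] == 1:
            names.append(f"{code}.pdf")  # assumed extension
        else:
            # Assumed suffix format for duplicates; the real one may differ.
            names.append(f"{code}_{seen[code]}.pdf")
    return names


if __name__ == "__main__":
    scanned = [["A100"], [], ["B200", "C300"], ["A100"]]
    for name in assign_output_names(scanned):
        print(name)
```

Keeping the counter per barcode makes the "first barcode wins" and "duplicates get a suffix" rules independent of each other, which is why they are handled in separate steps.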
Multipage Documents
Originally, this tool sorted only single-page documents/forms. As the need for processing multi-page documents arose, this functionality was added in the form of page-identifying barcodes.
In multi-page documents, you need to add a barcode with the following content:
[<document copy id>[@<document number>]]:<page number>[:<total pages>]
The resulting PDFs will contain all pages with page-identifying barcodes in the correct order, so you can scan them in any order.
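To make the page-identifying barcode format concrete, here is a minimal Python sketch that parses such a barcode and sorts pages by page number. It is not taken from the tool itself: the names PageBarcode, parse_page_barcode and order_pages are hypothetical, and the sketch assumes that page numbers are decimal integers and that the copy id and document number contain neither ":" nor "@".

```python
import re
from typing import NamedTuple, Optional

# [<document copy id>[@<document number>]]:<page number>[:<total pages>]
PAGE_BARCODE = re.compile(
    r"(?:(?P<copy_id>[^@:]+)(?:@(?P<doc_number>[^:]+))?)?"
    r":(?P<page>\d+)"
    r"(?::(?P<total>\d+))?"
)


class PageBarcode(NamedTuple):
    copy_id: Optional[str]
    document_number: Optional[str]
    page: int
    total: Optional[int]


def parse_page_barcode(text):
    """Return a PageBarcode, or None if text is not a page-identifying barcode."""
    match = PAGE_BARCODE.fullmatch(text)
    if match is None:
        return None
    total = match.group("total")
    return PageBarcode(
        copy_id=match.group("copy_id"),
        document_number=match.group("doc_number"),
        page=int(match.group("page")),
        total=int(total) if total is not None else None,
    )


def order_pages(barcodes):
    """Parse barcode strings and return the valid ones sorted by page number."""
    parsed = (parse_page_barcode(b) for b in barcodes)
    return sorted((p for p in parsed if p is not None), key=lambda p: p.page)


if __name__ == "__main__":
    # Pages scanned out of order still come back as 1, 2, 3.
    for page in order_pages(["x@INV-5:3:3", "x@INV-5:1:3", "x@INV-5:2:3"]):
        print(page)
```

A barcode without a colon, such as a plain document number, does not match the pattern, so it would be treated as an ordinary document barcode rather than a page identifier.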