Promena Transformer - page extractor - PDFBox
This transformer extracts ranges of pages from application/pdf documents using PDFBox 2.0.16.
Visit Promena#Transformer to understand the repository structure.
Transformation PdfBoxPageExtractorDsl, PdfBoxPageExtractorParametersDsl
The DataDescriptor has to contain at least one descriptor. If more than one descriptor is passed, the transformation will be performed on each of them separately.
Support PdfBoxPageExtractorSupport
Media type PdfBoxPageExtractorSupport.MediaTypeSupport
application/pdf; UTF-8 ➡️ application/pdf; UTF-8
Parameters PdfBoxPageExtractorSupport.ParametersSupport
pages, List<List<Int>>, optional - extracts each inner list of pages (indexed from 1) to a separate TransformedDataDescriptor.
splitByBarcodeMetadata, Boolean, optional - extracts pages based on barcode-detector-metadata produced by barcode detector transformers. When enabled, the pages between subsequent barcodes are extracted to separate TransformedDataDescriptors, each with metadata for its range of pages.
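The following Java sketch illustrates how the two parameters describe page ranges. The class PageRanges and its methods are hypothetical helpers, not part of the transformer's API. toZeroBased converts the 1-based pages lists into the 0-based indices used by PDFBox's PDDocument.getPage(int). betweenBarcodes assumes that a range starts at a page carrying a barcode and ends just before the next barcode page (or at the last page), and that pages before the first barcode are skipped; the actual transformer may handle these boundaries differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of the transformer's API: it only illustrates
// how the `pages` and `splitByBarcodeMetadata` parameters map to page numbers.
final class PageRanges {

    private PageRanges() {
    }

    // Converts the 1-based page lists of the `pages` parameter into 0-based
    // indices, as expected by PDFBox's PDDocument.getPage(int).
    // Each inner list corresponds to one output document.
    static List<List<Integer>> toZeroBased(List<List<Integer>> pages, int pageCount) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> group : pages) {
            List<Integer> indices = new ArrayList<>();
            for (int page : group) {
                if (page < 1 || page > pageCount) {
                    throw new IllegalArgumentException("Page " + page + " is outside 1.." + pageCount);
                }
                indices.add(page - 1);
            }
            result.add(indices);
        }
        return result;
    }

    // Assumption: a range starts at a page with a barcode and ends just before
    // the page with the next barcode (or at the last page). Pages before the
    // first barcode are skipped. Input and output pages are 1-based.
    static List<List<Integer>> betweenBarcodes(List<Integer> barcodePages, int pageCount) {
        // Sorted and de-duplicated: several barcodes on one page start one range.
        List<Integer> starts = new ArrayList<>(new TreeSet<>(barcodePages));
        for (int page : starts) {
            if (page < 1 || page > pageCount) {
                throw new IllegalArgumentException("Barcode page " + page + " is outside 1.." + pageCount);
            }
        }
        List<List<Integer>> ranges = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int from = starts.get(i);
            int to = (i + 1 < starts.size()) ? starts.get(i + 1) - 1 : pageCount;
            List<Integer> range = new ArrayList<>();
            for (int page = from; page <= to; page++) {
                range.add(page);
            }
            ranges.add(range);
        }
        return ranges;
    }
}
```

For example, pages = [[1, 2], [5]] describes two output documents: one with pages 1 and 2, and one with page 5.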
Dependency
<dependency>
<groupId>pl.beone.promena.transformer</groupId>
<artifactId>page-extractor-pdfbox-configuration</artifactId>
<version>1.0.1</version>
</dependency>
promena-docker-maven-plugin
<dependency>
<groupId>pl.beone.promena.transformer</groupId>
<artifactId>page-extractor-pdfbox</artifactId>
<version>1.0.1</version>
</dependency>
Properties
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.PdfBoxPageExtractorTransformer.priority=1
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.PdfBoxPageExtractorTransformer.actors=1
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.settings.memoryUsageSetting=org.apache.pdfbox.io.MemoryUsageSetting::setupMainMemoryOnly
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.settings.fallbackMemoryUsageSetting=org.apache.pdfbox.io.MemoryUsageSetting::setupTempFileOnly
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.default.parameters.split-by-barcode-metadata=true
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.default.parameters.timeout=
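Both memory settings are written as Class::method references to the static PDFBox factory methods MemoryUsageSetting.setupMainMemoryOnly() and MemoryUsageSetting.setupTempFileOnly(). The Java sketch below shows one way such a reference could be resolved with reflection, assuming the value names a public, static, no-argument method. The class StaticFactoryReference is hypothetical and not the transformer's actual implementation. It uses only the JDK, so it can be tried on java.lang.System::lineSeparator; with PDFBox 2.0.16 on the classpath, the same call on the values above would return a MemoryUsageSetting.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical resolver, not the transformer's actual code: it turns a value of
// the form "fully.qualified.Class::staticMethod" into an object by invoking a
// public, static, no-argument method via reflection.
final class StaticFactoryReference {

    private StaticFactoryReference() {
    }

    static Object resolve(String reference) {
        int separator = reference.indexOf("::");
        if (separator <= 0 || separator + 2 >= reference.length()) {
            throw new IllegalArgumentException("Expected <class>::<method>, got: " + reference);
        }
        String className = reference.substring(0, separator).trim();
        String methodName = reference.substring(separator + 2).trim();
        try {
            Class<?> type = Class.forName(className);
            // getMethod with no parameter types selects the no-argument overload,
            // e.g. setupMainMemoryOnly() rather than setupMainMemoryOnly(long).
            Method method = type.getMethod(methodName);
            if (!Modifier.isStatic(method.getModifiers())) {
                throw new IllegalArgumentException(reference + " is not a static method");
            }
            return method.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot resolve " + reference, e);
        }
    }
}
```

Usage: StaticFactoryReference.resolve("org.apache.pdfbox.io.MemoryUsageSetting::setupTempFileOnly") would return a temp-file-only setting when PDFBox is available. Restricting the lookup to no-argument static methods keeps the property value a plain reference rather than a small expression language.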