Promena Transformer - page extractor - PDFBox
This transformer extracts ranges of pages from application/pdf documents using PDFBox 2.0.16.
Visit Promena#Transformer to understand the repository structure.
Transformation PdfBoxPageExtractorDsl, PdfBoxPageExtractorParametersDsl
The DataDescriptor has to contain at least one descriptor. If more than one descriptor is passed, the transformation is performed on each of them separately.
Support PdfBoxPageExtractorSupport
Media type PdfBoxPageExtractorSupport.MediaTypeSupport
application/pdf; UTF-8 ➡️ application/pdf; UTF-8
Parameters PdfBoxPageExtractorSupport.ParametersSupport
pages, List<List<Int>>, optional - extracts the pages of each inner list (pages indexed from 1) to a separate TransformedDataDescriptor
splitByBarcodeMetadata, Boolean, optional - extracts pages based on the barcode-detector-metadata produced by barcode detector transformers. The pages between subsequent barcodes are extracted to a separate TransformedDataDescriptor, each with the metadata for its range of pages
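Both parameters end up describing groups of pages, one group per output TransformedDataDescriptor. The Java sketch below is illustrative only: the PageSelection class and its methods are hypothetical and are not part of Promena or PDFBox. It converts the 1-based pages groups into the 0-based indices that PDFBox's PDDocument#getPage(int) works with, and it derives groups from barcode positions under assumptions this README does not spell out: the barcode pages are given as ascending, distinct 1-based page numbers, each range starts at a barcode page, and pages before the first barcode are skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating the two ways of selecting pages;
// not part of the Promena or PDFBox APIs.
final class PageSelection {

    private PageSelection() {
    }

    // Converts 1-based page numbers (as in the "pages" parameter) into 0-based
    // indices, rejecting any page outside 1..pageCount.
    static List<List<Integer>> toZeroBased(List<List<Integer>> pages, int pageCount) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> group : pages) {
            List<Integer> indices = new ArrayList<>();
            for (int page : group) {
                if (page < 1 || page > pageCount) {
                    throw new IllegalArgumentException("Page " + page + " is outside 1.." + pageCount);
                }
                indices.add(page - 1);
            }
            result.add(indices);
        }
        return result;
    }

    // Assumption: barcodePages are ascending, distinct, 1-based page numbers.
    // Each range starts at a barcode page and ends just before the next one;
    // the last range ends at the last page; pages before the first barcode are skipped.
    static List<List<Integer>> rangesBetweenBarcodes(List<Integer> barcodePages, int pageCount) {
        List<List<Integer>> ranges = new ArrayList<>();
        for (int i = 0; i < barcodePages.size(); i++) {
            int start = barcodePages.get(i);
            int end = i + 1 < barcodePages.size() ? barcodePages.get(i + 1) - 1 : pageCount;
            List<Integer> range = new ArrayList<>();
            for (int page = start; page <= end; page++) {
                range.add(page);
            }
            ranges.add(range);
        }
        return ranges;
    }
}
```

For example, pages = [[1, 2], [5]] on a five-page document maps to the indices [[0, 1], [4]] (two output documents), and barcodes on pages 1, 3 and 6 of a seven-page document yield the groups [1, 2], [3, 4, 5] and [6, 7].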
Dependency
<dependency>
    <groupId>pl.beone.promena.transformer</groupId>
    <artifactId>page-extractor-pdfbox-configuration</artifactId>
    <version>1.0.1</version>
</dependency>
promena-docker-maven-plugin
<dependency>
    <groupId>pl.beone.promena.transformer</groupId>
    <artifactId>page-extractor-pdfbox</artifactId>
    <version>1.0.1</version>
</dependency>
Properties
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.PdfBoxPageExtractorTransformer.priority=1
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.PdfBoxPageExtractorTransformer.actors=1
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.settings.memoryUsageSetting=org.apache.pdfbox.io.MemoryUsageSetting::setupMainMemoryOnly
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.settings.fallbackMemoryUsageSetting=org.apache.pdfbox.io.MemoryUsageSetting::setupTempFileOnly
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.default.parameters.split-by-barcode-metadata=true
transformer.pl.beone.promena.transformer.pageextractor.pdfbox.default.parameters.timeout=
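The memoryUsageSetting and fallbackMemoryUsageSetting values are written as Class::method references to static factory methods of PDFBox's MemoryUsageSetting (setupMainMemoryOnly, setupTempFileOnly). As a rough illustration of that format only, the hypothetical MethodReferenceProperty class below splits such a value and invokes the referenced public static no-argument method via reflection. This is an assumption about how such a value could be resolved, not Promena's implementation; the README also does not state when the fallback setting is applied.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: one possible way to turn a "Class::method" property value
// into an object. Promena's actual resolution mechanism is not described here.
final class MethodReferenceProperty {

    private MethodReferenceProperty() {
    }

    // Splits "fully.qualified.ClassName::methodName" into {className, methodName}.
    static String[] split(String value) {
        String trimmed = value.trim();
        int separator = trimmed.indexOf("::");
        if (separator <= 0 || separator + 2 >= trimmed.length()) {
            throw new IllegalArgumentException("Expected <class>::<method>, got: " + value);
        }
        return new String[] {trimmed.substring(0, separator), trimmed.substring(separator + 2)};
    }

    // Calls the referenced public static no-argument method and returns its result,
    // e.g. MemoryUsageSetting.setupMainMemoryOnly() when PDFBox is on the classpath.
    static Object resolve(String value) throws ReflectiveOperationException {
        String[] parts = split(value);
        Method factory = Class.forName(parts[0]).getMethod(parts[1]);
        if (!Modifier.isStatic(factory.getModifiers())) {
            throw new IllegalArgumentException(value + " does not name a static method");
        }
        return factory.invoke(null);
    }
}
```

Resolving the PDFBox references themselves requires PDFBox on the classpath; the split step works on the property text alone.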