Pdfreader-call


Definition

The PDFReaderCall plug-in is compatible with all PDF formats and is very simple to configure. It receives as input a binary stream containing the PDF file, or reads the file from the file system, and returns a corresponding XML structure as output.

A ChangeGVBufferNode operation or an XSL transformation can then be applied to the output of the PDFReaderCall plug-in to retrieve the data of interest.

GreenVulcano® ESB provides two different tools, GV Console® and VulCon®, to configure all supported plug-ins.

VulCon / GV Console Configuration

pdfreader-call is the operation that must be configured in the System section of VulCon® or GV Console® to convert a PDF file, held in the GVBuffer.object field or on the file system, into an XML document.

To add a pdfreader-call operation, you must define the following fields:

Attribute   Type      Description
class       fixed     it.greenvulcano.gvesb.virtual.pdf.reader.GVPdfReaderCallOperation (Java class that manages the PDFReaderCall invocation).
type        fixed     This attribute must have the value call.
name        required  Identifies the operation name that you will use in the service definition.
fileName    optional  PDF file name. Can contain placeholders to be decoded at runtime. If not defined, the PDF file content must be in the GVBuffer.object field.
pageStart   optional  Starting page for the conversion. Can contain placeholders to be decoded at runtime. Defaults to -1, meaning that only the PDF metadata is extracted.
pageEnd     optional  Ending page for the conversion. Can contain placeholders to be decoded at runtime. Defaults to -1, meaning that pages up to the last page of the PDF are extracted.
embedPDF    optional  If true, the input PDF file is embedded as base64 data in the output XML. Defaults to false.

The following example shows the configuration generated from VulCon® or GV Console® when you configure a pdfreader-call operation:

<?xml version="1.0" encoding="UTF-8"?>
<GVSystems name="SYSTEMS" type="module">
    <Systems>
        <System id-system="SYSTEM-NAME" system-activation="on">
            <Channel id-channel="CHANNEL-NAME">
                <pdfreader-call class="it.greenvulcano.gvesb.virtual.pdf.reader.GVPdfReaderCallOperation"
                                name="ReadPDF" type="call" pageStart="1" pageEnd="1" embedPDF="true"/>
            </Channel>    
        </System>
    </Systems>
</GVSystems>
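
The attribute table also allows reading the PDF from the file system and extracting only the metadata. The fragment below is a minimal sketch of both variants, built only from the attributes described above. The operation names ReadPDFFromFile and ReadPDFMetadata and the path /path/to/document.pdf are hypothetical and chosen for illustration; they are not generated by VulCon® or GV Console®.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<GVSystems name="SYSTEMS" type="module">
    <Systems>
        <System id-system="SYSTEM-NAME" system-activation="on">
            <Channel id-channel="CHANNEL-NAME">
                <!-- Reads pages 1 to 3 of a PDF stored on the file system (hypothetical path) -->
                <pdfreader-call class="it.greenvulcano.gvesb.virtual.pdf.reader.GVPdfReaderCallOperation"
                                name="ReadPDFFromFile" type="call"
                                fileName="/path/to/document.pdf" pageStart="1" pageEnd="3"/>
                <!-- No fileName: the PDF content is taken from GVBuffer.object.
                     No pageStart: only the PDF metadata is extracted. -->
                <pdfreader-call class="it.greenvulcano.gvesb.virtual.pdf.reader.GVPdfReaderCallOperation"
                                name="ReadPDFMetadata" type="call"/>
            </Channel>
        </System>
    </Systems>
</GVSystems>
```

Since embedPDF is omitted in both operations, it takes its default value of false and the output XML carries no base64 copy of the input file.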

To use a pdfreader-call operation in a GreenVulcano® ESB service, define a node of type GVOperationNode in the Service section and set its operation-name field to the name defined in the pdfreader-call operation.

The following example shows the configuration generated from VulCon® or GV Console® when you configure a pdfreader-call operation in a GreenVulcano® ESB service:

<?xml version="1.0" encoding="UTF-8"?>
<GVServices name="SERVICES" type="module">
    <Groups>
        <Group group-activation="on" id-group="DEFAULT_GRP"/>
    </Groups>
    <Services>
        <Service group-name="DEFAULT_GRP" id-service="SERVICE-NAME"
                 service-activation="on">
            <Client id-system="SYSTEM-NAME" statistics="off" system-activation="on">
                <Operation name="RequestReply" operation-activation="on"
                           out-check-type="none" type="operation">
                    <Participant id-channel="CHANNEL-NAME" id-system="SYSTEM-NAME"/>
                    <Flow first-node="pdf_reader" point-x="20" point-y="112">
                        <GVOperationNode class="it.greenvulcano.gvesb.core.flow.GVOperationNode"
                                         id="pdf_reader" id-system="SYSTEM-NAME"
                                         input="input" next-node-id="end"
                                         op-type="call"
                                         operation-name="ReadPDF"
                                         output="pdf_xml" point-x="158"
                                         point-y="112" type="flow-node"/>
                        <GVEndNode class="it.greenvulcano.gvesb.core.flow.GVEndNode"
                                   end-business-process="yes" id="end" op-type="end"
                                   output="pdf_xml" point-x="358" point-y="112"
                                   type="flow-node"/>
                    </Flow>
                </Operation>
            </Client>    
        </Service>
    </Services>
</GVServices>


At this point you have configured a service with a pdfreader-call operation.

Example

This example shows the XML document generated from a simple PDF document:

<?xml version="1.0" encoding="UTF-8"?>
<pdf>
    <metadata>
        <page-count>5</page-count>
        <title>FOP Development: RTFLib (jfor)</title>
        <author/>
        <subject>Apache FOP</subject>
        <keywords/>
        <creator/>
        <producer>Apache FOP Version 0.94</producer>
        <creation-date>2008-07-31T16:06:16+02:00</creation-date>
        <modification-date/>
        <trapped/>
        <extra>
            <x:xmpmeta xmlns:x="adobe:ns:meta/">
                <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
                    <rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="">
                        <pdf:PDFVersion>1.4</pdf:PDFVersion>
                        <pdf:Producer>Apache FOP Version 0.94</pdf:Producer>
                        <pdf:Creator>Apache Forrest - http://forrest.apache.org/</pdf:Creator>
                    </rdf:Description>
                    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="">
                        <xmp:MetadataDate>2008-07-31T15:06:16+01:00</xmp:MetadataDate>
                        <xmp:CreateDate>2008-07-31T15:06:16+01:00</xmp:CreateDate>
                    </rdf:Description>
                    <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
                        <dc:date>2008-07-31T15:06:16+01:00</dc:date>
                        <dc:title>FOP Development: RTFLib (jfor)</dc:title>
                        <dc:description>Apache FOP</dc:description>
                    </rdf:Description>
                </rdf:RDF>
            </x:xmpmeta>
        </extra>
    </metadata>
    <pages end="1" start="1">
        <page num="1"
            >PDF created by Apache FOP
http://xmlgraphics.apache.org/fop/
FOP Development: RTFLib (jfor)
Version 627324
Table of contents
1 General Information............................................................................................................................. 2
  1.1 Introduction.....................................................................................................................................2
  1.2 History.............................................................................................................................................2
  1.3 Status...............................................................................................................................................2
2 User Documentation.............................................................................................................................2
  2.1 Overview.........................................................................................................................................2
  2.2 Document Structure........................................................................................................................ 3
  2.3 Attributes.........................................................................................................................................3
</page>
    </pages>
    <base64pdf>JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovVGl0bGUgKEZPUCBEZXZlbG9wbWVudDogUlRGTGli&#13;
IFwoamZvclwpKQovU3ViamVjdCAoQXBhY2hlIEZPUCkKL1Byb2R1Y2VyIChBcGFjaGUgRk9QIFZl&#13;
cnNpb24gMC45NCkKL0NyZWF0aW9uRGF0ZSAoRDoyMDA4MDczMTE1MDYxNiswMScwMCcpCj4+CmVu&#13;
ZG9iago1IDAgb2JqCjw8IC9OIDMKL0xlbmd0aCAyMiAwIFIKL0ZpbHRlciAvRmxhdGVEZWNvZGUg&#13;
Cj4+CnN0cmVhbQp4nJ2Wd1RT2RaHz703vVCSEIqU0GtoUgJIDb1IkS4qMQkQSsCQACI2RFRwRFGR&#13;
pggyKOCAo0ORsSKKhQFRsesEGUTUcXAUG5ZJZK0Z37x5782b3x/3fmufvc/dZ+991roAkPyDBcJM&#13;
WAmADKFYFOHnxYiNi2dgBwEM8AADbADgcLOzQhb4RgKZAnzYjGyZE/gXvboOIPn7KtM/jMEA/5+U&#13;
uVkiMQBQmIzn8vjZXBkXyTg9V5wlt0/JmLY0Tc4wSs4iWYIyVpNz8ixbfPaZZQ858zKEPBnLc87i&#13;
ZfDk3CfjjTkSvoyRYBkX5wj4uTK+JmODdEmGQMZv5LEZfE42ACiS3C7mc1NkbC1jkigygi3jeQDg&#13;
SMlf8NIvWMzPE8sPxc7MWi4SJKeIGSZcU4aNkxOL4c/PTeeLxcwwDjeNI+Ix2JkZWRzhcgBmz/xZ&#13;
FHltGbIiO9g4OTgwbS1tvijUf138m5L3dpZehH/uGUQf+MP2V36ZDQCwpmW12fqHbWkVAF3rAVC7&#13;
...........
</base64pdf>
</pdf>

With a ChangeGVBufferNode it is possible to parse this XML and retrieve any tag and value.
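
The output above can also be processed with an XSL transformation. The stylesheet below is a minimal XSLT 1.0 sketch that keeps the title, the page count, and the text of each extracted page, and ignores the rest of the document, including the base64pdf element. The output element names pdf-summary and page-text are hypothetical, and how the stylesheet is registered and invoked inside GreenVulcano® ESB is not covered here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

    <!-- Build a compact summary from the <pdf> root element produced by pdfreader-call -->
    <xsl:template match="/pdf">
        <pdf-summary>
            <title><xsl:value-of select="metadata/title"/></title>
            <page-count><xsl:value-of select="metadata/page-count"/></page-count>
            <!-- One element per extracted page, keeping the original page number -->
            <xsl:for-each select="pages/page">
                <page-text num="{@num}">
                    <!-- normalize-space collapses line breaks and repeated blanks into single spaces -->
                    <xsl:value-of select="normalize-space(.)"/>
                </page-text>
            </xsl:for-each>
        </pdf-summary>
    </xsl:template>
</xsl:stylesheet>
```

Replace normalize-space(.) with a plain "." in the select expression if the original line breaks of the page text need to be preserved.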