Using Big Data to generate, archive & automate documents (PDF, AFP, etc.)

Most organizations have to generate various types of documents:
Invoices to vendors
Forms filed with governments
Statements, letters, etc.
They also need to archive the generated documents, send them over FTP and email channels, and print them for physical delivery.

Organizations are creating data lakes and storing large amounts of operational data in Hadoop. They can leverage this data by creating operational jobs that generate various types of documents, archive them in distributed file systems, and deliver them through various channels, while using Hadoop's storage and massively parallel processing.

InsightLake's document generation solution aims to help companies meet their document generation goals easily, combining rich functionality with the power of Hadoop.

Generate PDF forms

Fillable PDF forms let a user fill in editable fields, then save, print, or send the form to someone easily. Companies often need to submit such forms to government entities periodically for compliance.

These companies have reporting departments where analysts collect data, finalize their Excel sheets, and fill in the PDF forms manually. This process is costly and time-consuming. Some companies have to file PDF forms at the city level every month, and for a company with a global presence the effort and investment multiply. Automating the data collection, form collection, and delivery would save significant cost and reduce processing time.

InsightLake PDF Forms

The InsightLake document generation solution enables organizations to automate data collection, mapping, PDF form generation, and delivery by using the processing power and storage of Hadoop.

PDF Form Feature

The PDF form feature allows a business user to create a form template by uploading a PDF template, mapping data from various sources to PDF form elements, provisioning an archival location, and scheduling generation and delivery.

PDF & AFP Document Generation

Besides PDF forms, organizations have to create invoices, statements, letters, and similar documents, for which they need to create a template structure and then fill in the data. For example, to generate monthly invoices or statements, a company needs a template with placeholders whose values are calculated and filled in during generation. To create the template, the company needs a template designer that can use its assets (images, logos, etc.).

Finally, they need to generate output in PDF and AFP, which are industry-standard formats for document printing. They also need to archive the generated documents and deliver them to consumers, vendors, or print vendors.

Document Generation Feature

The document generation feature allows a business user to create templates using an interactive visual designer, map data from various sources to document properties, provision an archival location, and schedule generation and delivery.

Interactive Template Designer

The document template designer allows a business user to design a document template easily by dragging and dropping visual elements from the tool set. The following elements are provided in the tool set:
Text, Paragraphs
Bar codes

The template designer allows both absolute and relative positioning of elements. Users can create document properties and use them in the template design. These properties are then filled at run time with the right values from the provisioned data source.

Property Mapping

Document properties can be mapped to elements of various data sources. At run time, these properties are filled with the mapped values to generate the document.
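The mapping step can be illustrated with a minimal Python sketch. It is not InsightLake's implementation: the property names, the source names, and the helper functions below are all hypothetical, and plain dictionaries stand in for real data sources.

```python
from string import Template

# Hypothetical mapping: document property -> (data source name, field name).
PROPERTY_MAP = {
    "customer_name": ("customers", "name"),
    "invoice_total": ("billing", "total"),
}

def resolve_properties(property_map, sources):
    """Look up each document property in its mapped data source."""
    values = {}
    for prop, (source, field) in property_map.items():
        values[prop] = sources[source][field]
    return values

def render(template_text, property_map, sources):
    """Fill the template's placeholders with the resolved property values."""
    values = resolve_properties(property_map, sources)
    # substitute() raises KeyError if a placeholder has no mapped property.
    return Template(template_text).substitute(values)

# Assumed sample data; a real job would read these rows from the data lake.
sources = {
    "customers": {"name": "Acme Corp"},
    "billing": {"total": "1250.00"},
}
text = render("Invoice for $customer_name: $invoice_total USD", PROPERTY_MAP, sources)
print(text)
```

Keeping the mapping as data, separate from the template text, means the same template can be pointed at a different source without being redesigned.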

Automation Pipelines

Document generation pipelines can be created using a visual drag-and-drop designer. These pipelines are scheduled to run at specified intervals. Operational data about each pipeline is captured, such as the number of documents generated and the number that failed.
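As a rough illustration of how a pipeline run might capture such operational data, the sketch below counts generated and failed documents for one run. The names RunStats, run_pipeline, and generate_statement are hypothetical, and scheduling is left out.

```python
from dataclasses import dataclass, field

@dataclass
class RunStats:
    """Operational data captured for a single pipeline run."""
    generated: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)

def run_pipeline(records, generate):
    """Call generate() once per record and count successes and failures."""
    stats = RunStats()
    for record in records:
        try:
            generate(record)
            stats.generated += 1
        except Exception as exc:
            # One bad record should not stop the rest of the run.
            stats.failed += 1
            stats.errors.append((record, str(exc)))
    return stats

def generate_statement(record):
    """Hypothetical generator that fails when required data is missing."""
    if "account" not in record:
        raise ValueError("missing account")
    return f"Statement for {record['account']}"

stats = run_pipeline([{"account": "A-1"}, {}, {"account": "A-2"}], generate_statement)
print(stats.generated, stats.failed)
```

Recording the failing record next to its error message makes it possible to retry only the failures in a later run.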


Generated forms or documents can be archived in distributed file systems and object stores such as S3, HDFS, Swift, and Glacier.
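One common way to organize such an archive is a path partitioned by document type and date. The sketch below assumes that layout, which the solution itself may not use, and writes to a local temporary directory as a stand-in for HDFS or an object store. The function names and the sample document ID are hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path

def archive_path(root, doc_type, run_date, doc_id, extension="pdf"):
    """Build a path partitioned by document type, year, and month."""
    return Path(root) / doc_type / f"{run_date:%Y}" / f"{run_date:%m}" / f"{doc_id}.{extension}"

def archive_document(root, doc_type, run_date, doc_id, content):
    """Write the document bytes to their archive location and return the path."""
    target = archive_path(root, doc_type, run_date, doc_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target

# A temporary local directory stands in for the real archive store.
with tempfile.TemporaryDirectory() as root:
    saved = archive_document(root, "invoices", date(2024, 3, 1), "INV-0001", b"%PDF-1.4 placeholder")
    print(saved.relative_to(root).as_posix())
```

Partitioning by date keeps each month's output together, which tends to simplify retention rules and bulk retrieval.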


Automation pipelines can deliver generated documents in the following ways:
FTP/SFTP location
S3 partner buckets
An API call that uploads documents to an external store
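The delivery options above can be sketched as a small dispatcher that routes each document to one handler per channel. This is an assumption about structure, not the product's design. The handlers are stubs that only record the request, and the channel keys and target strings are hypothetical.

```python
# Hypothetical stub channels: each records the request instead of doing network I/O.
delivery_log = []

def make_stub_channel(name):
    """Return a handler that logs the delivery request for the given channel."""
    def send(document_name, target):
        delivery_log.append((name, document_name, target))
    return send

# A real pipeline would plug in SFTP, S3, and HTTP clients here.
CHANNELS = {
    "sftp": make_stub_channel("sftp"),
    "s3": make_stub_channel("s3"),
    "api": make_stub_channel("api"),
}

def deliver(document_name, destinations):
    """Send one document to every (channel, target) destination."""
    for channel, target in destinations:
        handler = CHANNELS.get(channel)
        if handler is None:
            raise ValueError(f"unknown delivery channel: {channel}")
        handler(document_name, target)

deliver("INV-0001.pdf", [
    ("sftp", "sftp.example.com:/inbound"),
    ("s3", "partner-bucket/invoices"),
    ("api", "https://api.example.com/documents"),
])
print(len(delivery_log))
```

Registering handlers in a dictionary keeps the pipeline code unchanged when a new delivery channel is added.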