
Parallel Processing in Centerprise Integration Server

VERSION 3 Published

Created on: Sep 13, 2007 1:52 AM by mukta - Last Modified: May 15, 2008 8:13 PM by mike

Overview

Centerprise Integration Suite is designed to handle high-volume data transformation and integration tasks efficiently. The product is built as a hyper-parallel, multithreaded engine that takes advantage of multi-CPU and multi-core systems to run multiple steps of a transformation task in parallel. A key design feature of Centerprise Server is the low processing overhead of its parallel processing: as CPUs are added to the server, performance increases nearly proportionately, so doubling the number of CPUs roughly doubles the throughput of the server.

This document provides an overview of the parallel processing architecture of Centerprise Integration Server and how it enables Centerprise to efficiently process high volume data integration and transformation tasks.

Data Integration Pipeline Architecture

The Centerprise engine processes data integration tasks in a multithreaded pipeline in which each record passes through multiple transformation steps. Transformation tasks typically comprise a series of steps such as reading, parsing, validation, mapping, and writing. Centerprise pipeline steps run in a high-performance thread pool, so at any point many records may be in the mapping and writing steps at the same time.
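The pipeline model can be sketched in Python as a chain of stages joined by queues, with each stage served by its own worker threads. This is a minimal illustration under our own assumptions, not Centerprise's implementation: the names `run_stage` and `run_pipeline` and the sample steps are hypothetical, and in CPython the global interpreter lock lets threads overlap only I/O-bound work, so the sketch shows the structure rather than CPU speedups.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream


def run_stage(step, inbox, outbox, workers):
    """Start `workers` threads that apply `step` to each item from `inbox`."""
    state = {"active": workers}
    lock = threading.Lock()

    def worker():
        while True:
            item = inbox.get()
            if item is SENTINEL:
                inbox.put(SENTINEL)  # let sibling workers see the end marker
                with lock:
                    state["active"] -= 1
                    last = state["active"] == 0
                if last:
                    outbox.put(SENTINEL)  # the last worker closes the next stage
                return
            outbox.put(step(item))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(records, steps, workers_per_step=4):
    """Push records through `steps`; each step runs on its own worker threads."""
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = []
    for i, step in enumerate(steps):
        threads += run_stage(step, queues[i], queues[i + 1], workers_per_step)

    for rec in records:
        queues[0].put(rec)
    queues[0].put(SENTINEL)

    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    steps = [
        lambda line: line.split(","),                        # parse
        lambda f: {"name": f[0].strip(), "qty": int(f[1])},  # map
        lambda r: {**r, "qty": r["qty"] * 2},                # transform
    ]
    out = run_pipeline(["a,1", "b,2", "c,3"], steps)
    print(sorted(r["qty"] for r in out))  # [2, 4, 6]
```

Because several workers serve each stage, records can leave the pipeline out of order. A step that raises an exception is not handled here, so a production version would also need error propagation.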


Reading and Parsing Files

Reading, parsing, and validating flat files is optimized by a multithreaded reading and parsing pipeline. The pipeline spreads reading and parsing work among multiple threads, which yields large performance gains when processing large files. In addition, flat file parsers use an efficient single-pass algorithm, so each file is scanned only once.
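One way to picture this is a single reader that splits the file into chunks of lines, a thread pool that parses the chunks, and a parser that handles each line in one left-to-right scan. The Python sketch below is an illustration under stated assumptions, not Centerprise's parser: `parse_line`, `read_chunks`, `parse_chunk`, and `parse_file` are hypothetical names, comma-delimited input with double-quote escaping is assumed, and because of CPython's global interpreter lock, real CPU parallelism would need processes or a native parser.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def parse_line(line, delim=","):
    """Split one delimited line in a single left-to-right pass.

    Supports double-quoted fields and "" as an escaped quote inside them.
    """
    fields, buf, in_quotes, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"' and i + 1 < len(line) and line[i + 1] == '"':
                buf.append('"')  # escaped quote: consume both characters
                i += 1
            elif ch == '"':
                in_quotes = False
            else:
                buf.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == delim:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


def read_chunks(stream, chunk_size):
    """Reading stage: yield lists of up to `chunk_size` lines."""
    while chunk := list(islice(stream, chunk_size)):
        yield chunk


def parse_chunk(lines):
    """Parsing stage: parse every line of one chunk."""
    return [parse_line(line.rstrip("\r\n")) for line in lines]


def parse_file(stream, chunk_size=1000, workers=4):
    """Read chunks sequentially and parse them on a thread pool.

    Executor.map yields results in input order, so rows keep file order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rows in pool.map(parse_chunk, read_chunks(stream, chunk_size)):
            yield from rows


if __name__ == "__main__":
    data = io.StringIO('id,name\n1,"Smith, J"\n2,"say ""hi"""\n')
    for row in parse_file(data, chunk_size=2):
        print(row)
```

Two limitations of this sketch: `Executor.map` submits every chunk up front, so memory is not bounded for very large files, and quoted fields that contain line breaks are not supported.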

Data Validations

Data validations in Centerprise are defined using a high-performance formula language whose syntax is similar to that of Excel formulas. The Centerprise rules engine is designed and optimized for high-volume data processing tasks.
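As an illustration only, validation rules can be modeled as named predicates that are defined once and evaluated against each record, with failures collected per record. The `Rule` class, `validate`, `split_valid`, and the sample rules are hypothetical. Python lambdas stand in for Centerprise formula expressions, and the Excel-like formulas in the comments are illustrative rather than actual Centerprise syntax.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record is valid
    message: str


# Hypothetical rules; comments show a roughly equivalent Excel-style formula.
RULES = [
    Rule("zip_len", lambda r: len(r.get("zip", "")) == 5,  # LEN(zip) = 5
         "zip must have 5 characters"),
    Rule("qty_pos", lambda r: r.get("qty", 0) > 0,          # qty > 0
         "qty must be positive"),
]


def validate(record, rules=RULES):
    """Evaluate every rule against one record; return the failure messages."""
    return [rule.message for rule in rules if not rule.check(record)]


def split_valid(records, rules=RULES):
    """Partition records into (valid, [(rejected record, errors), ...])."""
    valid, rejected = [], []
    for rec in records:
        errors = validate(rec, rules)
        if errors:
            rejected.append((rec, errors))
        else:
            valid.append(rec)
    return valid, rejected


if __name__ == "__main__":
    good, bad = split_valid([{"zip": "12345", "qty": 2}, {"zip": "123", "qty": 0}])
    print(good)
    print(bad)
```

Keeping rules as data rather than hard-coded branches lets the same record loop run any rule set, which is the property a rules engine relies on.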

Transformation

Data transformation tasks are performed by a pool of threads, and the transformation step is designed to process a large number of records in parallel. Where appropriate, multiple transformation tasks within a single record are also processed in parallel. For instance, database lookups are carried out in parallel to minimize the impact of network and database latency. Additionally, lookups use an intelligent caching mechanism to minimize the number of database round trips.
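A minimal sketch of concurrent lookups behind a cache, assuming each lookup is a slow call to a data source. `CachedLookup`, `slow_source`, and `enrich` are hypothetical names, and `time.sleep` simulates database latency. The lookups for one record are submitted to a thread pool together, so their latencies overlap instead of adding up, and repeated keys are answered from the cache without another trip.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CachedLookup:
    """Thread-safe cache in front of a slow lookup function (hypothetical)."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self._lock = threading.Lock()
        self.trips = 0  # calls that actually reached the data source

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            self.trips += 1
        value = self._fetch(key)  # slow call made outside the lock
        with self._lock:
            return self._cache.setdefault(key, value)


def slow_source(table, delay=0.05):
    """Simulate a database table with network/database latency."""
    def fetch(key):
        time.sleep(delay)
        return table.get(key)
    return fetch


def enrich(record, lookups, pool):
    """Run all lookups for one record concurrently and merge the results.

    `lookups` maps output field -> (CachedLookup, name of the key field).
    """
    futures = {
        field: pool.submit(cache.get, record[key_field])
        for field, (cache, key_field) in lookups.items()
    }
    return {**record, **{field: f.result() for field, f in futures.items()}}


if __name__ == "__main__":
    customers = CachedLookup(slow_source({1: "Acme"}))
    products = CachedLookup(slow_source({"P1": "Widget"}))
    lookups = {"customer": (customers, "cust_id"), "product": (products, "sku")}
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(3):
            print(enrich({"cust_id": 1, "sku": "P1"}, lookups, pool))
    print(customers.trips, products.trips)  # 1 1
```

Two threads that miss on the same key at the same moment may both query the source; a production cache would also need size limits and expiry.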

Database Writing/Loading

Database writing and loading tasks are optimized using a variety of techniques. These include bulk inserts, batch updates, and data synchronization.

One key optimization is parallel database writers: this feature uses multiple database connections to write to the database in parallel. Combined with bulk inserts, batch updates, and other optimization techniques, this approach yields substantial performance and scalability improvements.
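The parallel-writer idea can be sketched as follows, assuming a connection object that offers `executemany`, `commit`, and `close` (as Python's `sqlite3.Connection` does). Records are grouped into batches, and the batches are spread round-robin across several writers, each holding its own connection. `parallel_write`, `make_batches`, `MemoryTarget`, and `MemoryConnection` are hypothetical; the in-memory target only stands in for a real database so the sketch runs on its own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


class MemoryTarget:
    """In-memory stand-in for a database table (hypothetical, for the demo)."""

    def __init__(self):
        self.rows = []
        self.batches = 0
        self.lock = threading.Lock()

    def connect(self):
        return MemoryConnection(self)


class MemoryConnection:
    """Mimics the executemany/commit/close methods of a DB-API connection."""

    def __init__(self, target):
        self.target = target

    def executemany(self, sql, rows):
        # The stand-in ignores `sql`; a real database would run the statement.
        with self.target.lock:
            self.target.rows.extend(rows)
            self.target.batches += 1

    def commit(self):
        pass

    def close(self):
        pass


def make_batches(records, size):
    """Group records into lists of at most `size` rows."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def parallel_write(records, connect, sql, writers=4, batch_size=500):
    """Write batches in parallel, one connection per writer thread."""
    batches = list(make_batches(records, batch_size))
    shares = [batches[i::writers] for i in range(writers)]  # round-robin split

    def writer(share):
        conn = connect()  # each writer owns its connection
        try:
            for batch in share:
                conn.executemany(sql, batch)  # one round trip per batch
            conn.commit()
        finally:
            conn.close()
        return sum(len(batch) for batch in share)

    with ThreadPoolExecutor(max_workers=writers) as pool:
        return sum(pool.map(writer, shares))


if __name__ == "__main__":
    target = MemoryTarget()
    n = parallel_write([(i,) for i in range(1050)], target.connect,
                       "INSERT INTO t (id) VALUES (?)", writers=3, batch_size=100)
    print(n, target.batches)  # 1050 11
```

Because each writer commits on its own connection, the load as a whole is not atomic: if one writer fails, batches written by the others stay committed unless the target is cleaned up separately.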

File Writing

File writing is another area where parallel processing is leveraged to boost overall transformation performance. Data formatting, validation, and writing tasks are handled in parallel by the thread pool, spreading transformation processing across multiple processors.
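A minimal sketch of this file-writing pattern, assuming records can be formatted independently: formatting runs on a thread pool, while a single writer appends the finished lines in input order. `format_record`, `write_parallel`, and the fixed-width layout are hypothetical. In CPython, CPU-heavy formatting would need `ProcessPoolExecutor` (same `map` interface) to use several cores.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def format_record(rec):
    """Format one record as a fixed-width line (hypothetical layout)."""
    return f"{rec['id']:>6}{rec['name']:<10.10}{rec['amount']:>10.2f}\n"


def write_parallel(records, out, workers=4):
    """Format records on a thread pool and write them in input order.

    Executor.map yields results in the order of its inputs, so one writer
    can append lines sequentially while formatting runs in parallel.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in pool.map(format_record, records):
            out.write(line)


if __name__ == "__main__":
    buf = io.StringIO()
    write_parallel([{"id": 1, "name": "Acme", "amount": 12.5},
                    {"id": 2, "name": "Globex Corporation", "amount": 3}], buf)
    print(buf.getvalue(), end="")
```

Keeping a single writer avoids interleaved output and file locking, while the formatting work, usually the expensive part, is what gets spread across workers.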

