Friday, December 4, 2015

An introduction to ESBs for Data Integration


From ETL Tools to ESBs
In the IT landscape, ETL (extract, transform, load) processes have long been used for building data warehouses and enabling reporting systems. Using business intelligence (BI) oriented ETL processes, businesses extract data from highly distributed sources, transform it through manipulation, parsing, and formatting, and load it into staging databases. From this staging area, summarization and analytical processes then populate data warehouses and data marts.
ETL tools have their place in the IT environment: many database administrators use them to streamline processes and deliver value to the business.
  • Data Warehousing: Historically, the primary use for ETL tools has been to enable business intelligence. Pulling databases, application data, and reference data into data warehouses provides businesses with visibility into their operations over time and enables management to make better decisions.
  • Data Integration: Data integration allows companies to migrate, transform, and consolidate information quickly and efficiently between systems of all kinds. ETL tools reduce the pain of manually entering data and allow dissimilar systems to communicate, all the while supplying a unified view.
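
The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration, not how any particular ETL tool works: the CSV content, the `staging_orders` table, and the function names are all made up for the example, and an in-memory SQLite database stands in for the staging database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
RAW_ORDERS = """order_id,customer,amount
1, alice ,10.50
2,BOB,4.25
3,alice,n/a
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: parse and normalize fields, skipping rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine the row
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
    conn.commit()

def summarize(conn):
    """Summarize staged data per customer, as a warehouse load step might."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM staging_orders "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_ORDERS)))
totals = summarize(conn)
print(totals)
```

The third row is dropped during the transform step because its amount cannot be parsed, so only two customers reach the summary.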

ETL Tools Get Complicated

ETL tools do provide a method of communication between databases and applications, but they pose significant challenges over time. Because creating this type of connectivity requires a comprehensive knowledge of each operational database or application, interconnectivity can get complicated, as it calls for implementing invasive custom integrations.
Over time, this approach grows increasingly complex, and the greater the number of interconnected systems, the more complicated things become. Moreover, with such tight coupling, interdependencies create the potential for big, unpredictable impacts when even the slightest changes are made. The custom point-to-point data-level integrations become a tangled web of brittle connections, quickly beginning to look like “spaghetti code”.
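
To get a feel for how quickly this grows, consider the worst case in which every system is integrated directly with every other system (an assumption for illustration; real landscapes are usually only partly connected). The number of point-to-point links is then n(n-1)/2 for n systems, which the small sketch below tabulates. The function name is hypothetical.

```python
def point_to_point_links(n_systems):
    """Number of links if every pair of systems is directly integrated."""
    return n_systems * (n_systems - 1) // 2

for n in (3, 5, 10, 20):
    print(n, "systems ->", point_to_point_links(n), "links")
```

Going from 10 to 20 systems doubles the system count but roughly quadruples the number of links (45 to 190), and each link is a custom integration that can break when either end changes.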

APIs and ESBs Simplify Data Integration

The growing popularity of APIs has also made it much easier to create connectivity. With APIs, developers can access endpoints and build connections without in-depth knowledge of the system itself, which simplifies the work considerably. As ETL tools have remained focused on BI and big data solutions, and as traditional operational data integration methods have become outdated with the rise of cloud computing, ESBs have become a better option for creating connectivity.
An enterprise service bus (ESB) provides API-based connectivity with real-time integration. Unlike traditional ETL tools used for data integration, an ESB isolates applications and databases from one another by providing a middle service layer. This abstraction layer reduces dependencies by decoupling systems, and it provides flexibility. Developers can use pre-built connectors to create integrations without extensive knowledge of specific application and database internals, and can make changes quickly without fear of the entire integrated system falling apart. Shielded by APIs, applications and databases can be modified and upgraded without unexpected consequences. Compared with using ETL tools for operational integration, an ESB provides a more logical and well-defined approach to such an initiative.
Some commonly used ESBs in the software industry:
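
The decoupling idea can be illustrated with a toy, in-process message bus. This is a minimal sketch, not the API of any ESB product: `ServiceBus`, the adapter functions, the topic name, and the message fields are all hypothetical. The point is that the publisher only knows the bus and a shared message format, while each adapter translates that format for its own system.

```python
from collections import defaultdict

class ServiceBus:
    """Toy in-process bus: routes messages by topic to subscribed handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        # The publisher never learns which systems consume the message.
        return [handler(message) for handler in self._handlers[topic]]

# Hypothetical target systems, represented here as plain lists.
crm_records = []
billing_records = []

def crm_adapter(msg):
    """Translate the shared message into the CRM's own record shape."""
    crm_records.append({"name": msg["customer_name"], "email": msg["email"]})
    return "crm"

def billing_adapter(msg):
    """Translate the shared message into the billing system's record shape."""
    billing_records.append((msg["customer_id"], msg["email"]))
    return "billing"

bus = ServiceBus()
bus.subscribe("customer.created", crm_adapter)
bus.subscribe("customer.created", billing_adapter)

delivered = bus.publish(
    "customer.created",
    {"customer_id": 42, "customer_name": "Ada", "email": "ada@example.com"},
)
print(delivered, crm_records, billing_records)
```

Adding a third consumer means writing one more adapter and one more `subscribe` call; the publisher and the existing adapters stay untouched, which is the property the paragraph above attributes to the middle service layer.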
1. Oracle Service Bus (OSB)
2. Mule ESB
3. Fuse ESB
4. Talend ESB

Data Integration in a nutshell


Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from a variety of sources.

Various data integration solutions help you understand, cleanse, monitor, transform and deliver data so you can be sure the information is trusted, consistent and governed in real time.






Data Integration Areas

Data integration is a term covering several distinct sub-areas such as:
  • Data warehousing
  • Data migration
  • Enterprise application/information integration
  • Master data management

Data Integration Techniques

There are several organizational levels at which integration can be performed. Going down the list, the level of automation increases.
Manual Integration or Common User Interface - users work with all the relevant information by accessing each source system directly or through a web page interface. No unified view of the data exists.
Application Based Integration - requires the particular applications to implement all the integration efforts. This approach is manageable only with a very limited number of applications.
Middleware Data Integration - transfers the integration logic from particular applications to a new middleware layer. Although the integration logic is no longer implemented in the applications, the applications still need to participate partially in the data integration.
Uniform Data Access or Virtual Integration - leaves data in the source systems and defines a set of views that give the customer a unified view across the whole enterprise. For example, when a user accesses customer information, the particular details of the customer are transparently acquired from the respective system. The main benefits of virtual integration are nearly zero latency in propagating data updates from the source system to the consolidated view, and no need for a separate store for the consolidated data. The drawbacks include limited support for data history and version management, the restriction of the method to "similar" data sources (e.g. the same type of database), and the fact that access to the data generates extra load on the source systems, which may not have been designed to accommodate it.
Common Data Storage or Physical Data Integration - usually means creating a new system which keeps a copy of the data from the source systems, to store and manage it independently of the original system. The best-known example of this approach is the Data Warehouse (DW). The benefits include data version management and the ability to combine data from very different sources (mainframes, databases, flat files, etc.). Physical integration, however, requires a separate system to handle the vast volumes of data.
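
The trade-off between the last two techniques can be made concrete with a small sketch. Everything here is hypothetical: two dictionaries stand in for source systems, `virtual_customer_view` plays the role of a virtual view, and `CustomerStore` plays the role of a physical copy such as a warehouse.

```python
import copy

# Hypothetical source systems holding different parts of a customer record.
crm = {101: {"name": "Ada Lovelace"}}
billing = {101: {"balance": 250.0}}

def virtual_customer_view(customer_id):
    """Virtual integration: assemble the record from the sources on every read."""
    return {**crm[customer_id], **billing[customer_id]}

class CustomerStore:
    """Physical integration: keep labelled copies, independent of the sources."""

    def __init__(self):
        self.snapshots = []  # list of (label, data) pairs, oldest first

    def load(self, label):
        merged = {cid: {**crm[cid], **billing[cid]} for cid in crm}
        self.snapshots.append((label, copy.deepcopy(merged)))

    def latest(self, customer_id):
        return self.snapshots[-1][1][customer_id]

store = CustomerStore()
store.load("day-1")

billing[101]["balance"] = 300.0  # the source system changes after the load

live = virtual_customer_view(101)["balance"]   # reads the source directly
stored = store.latest(101)["balance"]          # still the day-1 copy

store.load("day-2")
history = [(label, data[101]["balance"]) for label, data in store.snapshots]
print(live, stored, history)
```

The virtual view sees the new balance immediately but keeps no record of the old one, and every read touches the source systems. The physical store lags until the next load, yet it retains both versions and can be queried without adding load to the sources.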
Some well-known data integration vendors, tools, and software solutions are listed below:
  • Actian Pervasive
  • Adeptia
  • Clover ETL
  • Dell Boomi
  • IBM
  • Informatica
  • Microsoft
  • Oracle
  • Pentaho
  • SAP
  • SAS
  • SnapLogic
  • Talend