ETL Solution

ETL solutions are especially important in today’s world, where you need to distil meaning from terabytes of data.

ETL (extract, transform, and load) is a term used to describe the movement and transformation of data between systems handling high data volumes and complex business rules.

ETL tools are widely used in data integration, data migration, and master data management projects. They are critical for data warehouses, business intelligence systems, and big data platforms because they can be used to retrieve data from operational systems and process it for further analysis by reporting and analytics tools. The reliability and timeliness of the entire business intelligence platform depend on ETL processes. So, how do we choose one?

We did a bit of research and here is part 1., where we analyze SSIS for building ETL packages. Check out also:

part 2. where we go through KNIME platform

part 3. on Custom ETL Solution with  SQL Tables, Stored Procedures and Managed Code (C#)

SQL Server Integration Services

SQL Server Integration Services (SSIS) is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data migration tasks. SSIS is a platform for data integration and workflow applications. It features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.

microsoft-sql-server-integration-services-1

SSIS  Features

Integration Services include a rich set of built-in tasks and transformations; tools for constructing packages; and the Integration services for running and managing packages. You can use the graphical Integration Services tools to create solutions without writing a single line of code, or you can program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.

SSIS provides the following built-in transformations:

  • Aggregation
  • Audit
  • Cache Transform[5]
  • Copy/Map
  • Data Conversion
  • Data Mining Model Training
  • Data Mining Query
  • Derived Column
  • Dimension Processing
  • Export & Import Column
  • For loop Container
  • Foreach Loop Container
  • (Fuzzy) Lookup
  • Fuzzy Grouping
  • OLE DB Command
  • Partition Processing
  • Percentage Sampling
  • Sampling Transformation
  • Pivot
  • Row Count
  • Row Sampling
  • Script Component
  • Slowly Changing Dimension
  • Term Extraction
  • Term Lookup
  • Unpivot

Extensibility and programmability of SSIS

Users may write code to define their own connection objects, log providers, transforms, and tasks.

SSIS features a programmable object model that allows developers to write their own hosts for package execution. Such a host can respond to events, start and stop packages, and so on. The object model also allows developers to create, store, and load packages, as well as create, destroy, and modify any of the contained objects.

Within limits, SSIS packages can load and call CLI assembly DLLs, providing access to virtually any kind of operation permissible by the .NET CLR.

SSIS can be used on all SQL Server 2005, 2008, 2008 R2, 2012 and 2014 editions except Express and Workgroup.

Pros

  • SSIS is a Ready to use ETL framework which has all common ETL operations.
  • Small learning curve.  SSIS consists of visual drag-and-drop workflow editor and modules leveraging mostly MSSQL features to manipulate the data. So everyone who has a good level of MSSQL management and development can start using SSIS fairly quickly.
  • Available resources.  Developers who have good experience with MSSQL after a short learning period (1-2 weeks) can start developing and supporting SSIS supported ETL infrastructure.

Cons

  • Not critical limitation in flexibility and extensibility. Even though we have a large number of data manipulation modules we cannot very easily extend this functionality in case we want to get some very specific functionalities. It is possible but will require additional extensions development.
  • A paid license.  Server Integration Services (SSIS) is only available in the “Standard”, “Business Intelligence” and “Enterprise” editions of MSSQL Server.
Insert math as
Block
Inline
Additional settings
Formula color
Text color
#333333
Type math using LaTeX
Preview
\({}\)
Nothing to preview
Insert