Build a Custom ETL Solution Part 3. SQL Tables, Stored Procedures and Managed Code (C#)

Today’s article is a part 3. of our analysis on how to build ETL package. In the previous ones, we have covered SQL Server Integration Services, and KNIME analytics platform.

The process can be optimized by having modular ETL solution architecture which consists of core mini-framework functionality to handle customizable general workflows and use logical data mining and transformation blocks implementing all required ETL operations.

Customizable Workflows

Possible frameworks options to re-use:

  • Windows Workflow Foundation allows creating and hosting a long-running workflow that uses custom activities, tracking, persistence.
    building custom ETL package
  • Stateless – Simple State Machine allows building lightweight state machine-based workflows directly in .NET code:
    SimpleStateMachine, written in C# that provides an easy-to-use state machine that can easily be configured.

ETL nodes / execution blocks

ETL nodes are the modules responsible for data transformation and manipulations.

Block represents modular activity blocks implemented in .NET language like C#,  Data permanent and temporary storage done on DB level using SQL queries, tables, views, stored procedures, functions, and triggers.

Those kinds of native platforms provide the maximum level of flexibility and performance.

Pros

  • High flexibility in terms of implementation possibilities. Since we are using C# and MSSQL in the low level we are not limited by what we can implement.
  • No learning curve. Anyone who has experience with C#, MSSQL, and basic ETL principals can be involved in ETL development and support it relatively quickly.
  • Available resources. Since we are using mainstream technologies base for .NET development we do not have a shortage of development resources.

Cons

  • Higher implementation effort as we need to implement the core functionality of ETL modules. However, some components of ETL architecture can be leveraged from existing frameworks and libraries like Workflows frameworks, and data reading/writing libraries and adapters supporting multiple formats which will keep the effort reasonable.