Building an ETL Package, Part 2: KNIME

Today’s article is part 2 of our analysis of how to build an ETL package. In the previous part we covered SQL Server Integration Services; now we will go through the KNIME platform.

Check out part 3, on a custom ETL solution with SQL tables, stored procedures, and managed code (C#), as well.

KNIME is an analytics platform that helps you discover the potential hidden in your data, mine for fresh insights, or predict new futures. It is provided under the GNU General Public License (GPL), meaning it can be downloaded, distributed, and used for free.

The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis, and data mining, as well as various interactive views such as scatter plots and parallel coordinates. It integrates all of the analysis modules of the well-known Weka data mining environment, and additional plugins allow R scripts to be run, offering access to a vast library of statistical routines.

KNIME is based on the Eclipse platform and, through its modular API, is easily extensible. When desired, custom nodes and types can be implemented in KNIME within hours, thus extending KNIME to comprehend and provide first-tier support for highly domain-specific data. This modularity and extensibility enable KNIME to be employed in commercial production environments as well as teaching and research prototyping settings.

Scalable: effortlessly toggle between single computer, streaming, and big data executions. The solution allows integrating new capabilities on top of, alongside, or within your existing infrastructure.

Data Blending: simple text files, databases, documents, images, networks, and even Hadoop-based data can all be combined within the same visual workflow.

Tool Blending: Python, R, SQL, and Java scripting nodes enable legacy code and expertise to be reused, graphically documented, and shared among data scientists.

Visual: an easy-to-learn graphical interface means that coding is optional and work is visually documented.

Example use cases

  • Market Basket Analysis and Recommendation Engines
  • Combining Text and Network Mining
  • Credit Scoring / Credit Rating / Customer Risk

Pros

  • A purpose-built ETL framework with a huge number of ETL operations. It offers a wide range of modules for data transformation, analytical operations, and visualization.

Cons

  • Steeper learning curve. KNIME is not that popular, so learning resources are harder to find, which increases support risk and cost of ownership.
  • Complicated Java-based hosting infrastructure. The support team needs IT staff with Java experience available to set up and configure the KNIME server infrastructure and connectivity.
  • Low extensibility and flexibility. Because KNIME is a standalone tool built from its own specific components, extending its functionality is difficult. It is possible to write add-ons and integrate via the API, but this requires additional effort.
  • Few available specialists. Because KNIME mostly relies on its own set of technical approaches and solutions, few people with KNIME skills are available on the market.