With SQL Server 2012, Microsoft has released a major and much needed update to its SQL Server Integration Services, including features that enable much better manageability features, performance tracking (and SSIS performance tuning) features, as well as better (closer) integration with SQL Server Database Engine.
In this post I introduce the new features of Integration Services that are enabled through the new Project Deployment Model, as well as highlight some of the short-comings of this new technology, particularly with regard to Scale-Out and Distributed Processing.
SQL Server Integration Services has introduced a completely new deployment model for SSIS packages, which is now being called the Project Deployment Model as compared to the legacy Package Deployment Model.
The new deployment model takes advantage of the new SSIS Catalog Database, which is a new centralized repository for managing deployment projects, through the now dominant SQL Server Data Tools development environment, and with the SQL Server Database Engine itself. In order to start any SSIS Project Deployment Model, one needs to create a new SSIS Catalog DB within the Integration Services Catalogs folder in the DB Engine, as shown in the image below.
SSIS projects built under the legacy Package Deployment model can be easily converted to the new model within Visual Studio (SQL Server Data Tools), any new (or converted) packages being deployed to the SSIS Catalog Database through the new Project Deployment model can be done directly from Visual Studio.
The new SSIS Project Deployment model has the following advantages over the legacy Package Deployment model:
- SSIS packages can be executed from within SQL Server’s Database Engine: using stored procedures to trigger the execution of SSIS packages.
- Support SSIS Parameters: parameters allows you to change execution variables and package configurations without having to fiddle with SSIS XML Configuration Files.
- SSIS Performance and Execution Reports: The new SSIS Catalog DB comes built-in with a few performance reports, which can be accessed through SQL Server Data Tools. These reports are a much needed addition to SSIS and shows live execution as well as historical performance of your packages, highlighting key information that can allow you to debug SSIS packages in a much easier manner. In the past BI developers that heavily invested in SSIS had to resort to 3rd party tools in order to track performance to this level. All underlying data can be accessed through Managed Views that you can extend with your own reporting needs.
- SSIS Environment Concept: this new concept allows you to configure different execution environments for your SSIS, and then executing a package under a different environment as required. This is similar to creating multiple XML Configuration files, and executing a package with different XML Configuration files as parameters.
These features form the building blocks of many organizational and management features, which will much better elevate SSIS as a well controlled data processing and ETL environment.
SSIS Project Deployment Model Disadvantages
There are some short-comings that still makes SSIS a less scale-able option in terms of being a fully fledged distributed data processing engine, and centralized management of this engine.
SSIS as a Distributed Data Processing Engine and Centralized ETL Management
The Project Deployment Model closely couples SSIS with SQL Server Database Engine, this means that you cannot manage a remote SSIS instance on one machine, through the SSIS Catalog DB that exists on a SQL Server Database Engine instance on another machine, there are no options to link an IS Server with an SSIS Catalog DB, and any attempt to deploy an SSIS package through the new Project Deployment Model on a server that does not contain SSIS you will receive the following error message:
The path for ‘ISServerExec.exe’ cannot be found. The operation will now exit.
Both SQL Server Integration Service and SQL Server Database Engine Service are heavy weight services, and take up plenty of resources in terms of I/O, CPU and memory, having to install both services in order to utilize the features of the new Project Deployment model is an over-kill, and goes against effective resource compartmentalization.
With distributed processing engines such as Hadoop gaining quick popularity, SSIS needs to offer a flexible management approach, were servers running SSIS can be dedicated to purpose, while leaving the management to a centralized point (an SSIS Catalog DB). In order to be truly distributed, SSIS needs to move the code to where the data exists, not move the data to where the code exists.
SSIS Catalog DB needs to allow you to manage multiple SSIS instances across multiple servers, offering a centralized point for the management of ETL processes, while ensure effective distributed processing of data in a truly scaled-out scenario. We are so close, and the Project Deployment Model is a step in the right direction, but it is a virgin technology with alot to learn in terms of true scale-out to handle the exploding data bubble.
You can implement a distributed engine using the legacy Package Deployment Model of SSIS through commandline execution of these packages, and using PsExec to issue cross server trigger commands.
If you feel like pushing for SSIS to become a true scale-out distributed processing engine, then please vote for Decoupling IS Server and the New SSIS Catalog DB (Scale-Out) on Microsoft Connect.