SSIS - SQL Server Integration Services
we decided to produce an article on SSIS or SQL Server Integration Services, BI or Business Intelligence is becoming an ever more important business tool and SSIS is corner stone of SQL Server BI offering. We have asked Sven Blackwood, an experienced and highly qualified database professional with many years experience in SQL Server including beta testing SQL Server 2008 to produce the article for us.
In order to maintain a rational, and confident, line of information to keep and improve competitive advantages, Business requires information sharing. This has become more important in this world where managers are looking for balanced scorecards. Hence, keeping data integrity is not something that relies on the IT department alone any more.
This is where SQL Server Integration Services (SSIS) gets in the game. To see this we look at integrating data from all information systems. Whenever you try to integrate information, you face three major challenges:
So why BI?
The answer is simple; information integration is today perhaps the most used business tool to acquire competitive advantage. Let us take a look at a hardware store as an example.If we have a new buyer who only buys during times of sale, they might consider replenishing Christmas lights in the second week of November. Without proper information, he will not know the lights are already stored in the back of the Texarkana warehouse for the season, and are now going to take extra time to find and ship.
That means the warehouse will not treat the order as urgent. This is mainly because they know all the seasonal buyers placed the last of the Christmas light orders in September, and the lights are due to arrive between the first and second week of December. At this point, the buyer finds the delivery date is deep into the season and will be useless to them until the next year. Here is an example of a fact where he would be aware if he could tap into the ERP Oracle Database from the distribution company.
Additionally, in our example this particular hardware store received an unusually big order for lights from the developers of a new mall, the reason sales in Christmas lights went up showing a nice increase. Without the use of the CRM database, the new buyer will not know this and might overstock the store with items that impact all the numbers of the store, from the ROI, to Stock Rotation.
Being able to see all the data at once is what SSIS is about. No one would buy a boat to sail into the sea without also buying all the working instrumentation.
Imagine a boat sales clerk telling a customer he will only provide a compass, but since the customer does not know how to use it, he won�t provide a fuel, or oil gauge, or a sonar to detect other ships, until the customer gets the feel of the compass. As a boat customer, I know I would not buy the boat, most people would not. And one would certainly not take it for a transatlantic cruise.
Why do you expect your executives to drive the boat without all the data? The answer is simple you do not. You want all your decision makers to always make informed decisions.
Therefore, in getting all the information together for this buyer, we need to consolidate data from:
1. CRM System, that by the way is in Goldmine, and on .DB files.
2. ERP System in Oracle db.
3. The floor manager Excel sheets to provide insight on warehouse movements.
4. Several .txt files that give some insight of particular purchases, sales, and indigenous data.
In addition, you need to take into consideration that information once gathered and filtered should follow these guidelines:
1. Give out coherent and reliable data, both historical and actual, even when integrated from internal and external sources.
2. Some of the data will be accessible via Internet, or ftp, to avoid the delay in the chain of information.
3. Needs to filter the information to assure its quality, and remove duplicate entries.
4. With all the certification and regulation processes increasing now today, it is always a good move to have the information tested and certified regularly.
The Three Obstacles of Data Integration
As I said early on this article, we have to address three big challenges to achieve this integration.
The first major obstacle is the technological challenges
Today data gathering comes from a wide variety of sources. We have internet files, xml files, RSS files, proprietary documents, office documents, and a myriad of different systems to help you do specific tasks on software that is not always MSSQL based.We need to transform all this data, and assure its quality, from different sources in great volumes. Some is structured, but some is not and most are different times of arrival all funneling into a single data stream that makes sense to the corporative analysts.
Even when implementing this integration process, we find ourselves with yet another challenge. Time; when to do the job. We could try to lot process the data, in the past this was feasible but with less night hours available in the workplace, which has changed. We need to utilize a different process.Taking into account the convenience of the internet and the availability, with Windows 2003 server, to remote connect, more and more executives can work out of their homes. They work at weird hours of the night, hence the need to promote a change from lot processing to a more fluid, shorter, and on demand process.This means we must constantly be uploading data from different sources, and publishing in several destinations, in some cases without having a window of more than a few minutes to do this consolidation.
Scalability and performance are necessary. Two words that really take precedence especially when addressing companies like Amazon and other online retail companies which cannot afford an information blackout.
Even with the right technology and procedures, we need to have some alternative staging ground during the data mining and Extraction, Transformation and Loading (ETL) processes. But this warehousing also imposes larger times to close the data circle, this being a problem for even the most demanding systems.
The second major obstacle is the Organizational Challenge
The Organizational challenge is easily divided into two different issues, the energy or data flow challenge and the comfort zone challenge.
In the energy or data flow challenge we face the same major problem as when installing a new quality system, a new company wide procedure, or when we are trying to optimize resources. The problem is we need all the personal involved in the process, whether they are warehouse people capturing the data, or the manager of the call center who has to have the reports inside the system on time, every time.
If people are not involved in the process, then it is surely destined to fail.
Working within the comfort zone is the next big challenge. Let us face it, different people like to work with different programs. I have seen accountants do everything, even letters, in excel. They have word, but they feel more comfortable using Excel.
With that in mind, we need to make sure the data integration is not solved manually. It is our job to be sure the technology used to make this data integration is the best for the project, while not trying to impose on the individual user.
The final obstacle is the Economical challenges.
The two prior factors are responsible for making the data integration process the most expensive data-warehousing project available. Here are some of the major challenges:
1. The data mining process is often one of the longest and hardest processes. It is sometimes seen through the organization as leverage for power processes.
2. The cleansing of data from different sources and the transformation into rational, verifiable, and coherent data proves nearly impossible at times.
3. Since the regular tools cannot get the work accomplished in the correct way, a company expends a lot of money in proprietary technology development, by elaborating their own ETL processes.
4. And, sometimes in each part of the organization, we have different data silos.
When it comes time to combine those silos, since each part uses different technologies, the expense increases. The worst part, maintaining an organization with defective data warehousing plans, usually leads to a big white elephant for the business.
But not all is lost, with SQL Server Integration Services; we have tools to fix these issues. SSIS has a task oriented motor and a fast and scalable data flow motor. The task-oriented motor provides a stable platform for the data flow motor. This dual motor is perfect for those ETL or DW tasks as well as other operations in the data centre.
Located in the middle of the SSIS is the data channeling process. This process, based on buffer technology, allows a user to process rows of data very quickly once loaded into the memory.The approach of the SSIS data channeling process is on the contrary to all their competitors. By processing all this data quickly, the process is effectively the warehousing of temporary data.
With SSIS all the data transforms into structured data, (rows and columns) before loading into memory. This allows any process that applies to tabular data to apply to data in memory, not mattering if the source is a .txt or an .xml file. And, SSIS accepts temporary warehousing solutions in case the company strategy requires it somehow.
Perhaps the most important part of the SSIS process is its flexibility. It allows us to pull data from several sources and, not only make several complex transformations, but publishes them to several locations as well, in a single process. In addition, while the data is flowing you can merge it, divide it, recombine it, or reprocess it with other data sources.
One characteristic to the SSIS flexibility is the use of Adapters. Adapters are how SSIS obtains data from different sources, like ODBC, XML, .NET, and of course personal adapters. These are easily developed in house or by third parties. Once the data is loaded we can add, order, search, change rows to columns or columns to rows, combine, convert and audit the data.
Besides all these operations, we can also use the Small Changes Dimension. It allows for SCD type 1, type 2, the fixed attributes, and inferred members types.
Multidimensional OLAP (MOLAP) is also something we can obtain as a standard result. This allows us to feed a good analytical application and not only a data warehouse.
Microsoft Research invented this little term, and it actually helps guarantee data integrity. This point of view means all data is independent and does not depend on any other data, hence you can use this kind of transformation to clean all the data, not only the address fields.
SSIS is also involved in the Data Mining process of the Analysis Services. With these procedures, the user can encapsulate the data in a Data Mining Model, which allows us to predict which data enters into the model and which data might be potentially wrong.
The mining tool improves data. We can even replace the data we think is wrong with better data in a closed cleaning loop, or using a third party cleaning tool.
Service Oriented Architecture services (SOA)
Since it can handle xml data, via disk or directly from the web, it is easily uses for SOA.
By using Text Mining, you identify terms that are related between the categories and the text fields, allowing us to identify in one process the �interesting� information to better display the data, thus increasing customer satisfaction in our products or services.
The Data Reader is a characteristic that allows us to transport data to a Data Reader object in ADO.NET. This permits the user to obtain clean, organized, and integrated data to pass to other applications. For instance, some reporting services need this option.
This trait allows the SSIS to be used not only as an ETL tool, but also as a source of integrated data. This is the reason why SSIS is more than an ETL tool. It is part of the Business Intelligence (BI) platform of the SQL Server, which allows the development of BI Applications for a business.
Business Intelligence Development Studio
Also, used together within the Business Intelligence Development Studio, the Reporting Services and the SSIS allows the enhancement of the SSIS into an Integrated Development Environment (IDE) for Business Intelligence, programmed in Visual Basic� Visual C++� and .NET among others. With this in mind, SSIS is controlled by a series of API, in Visual C++ and win32 natives.
In short, the existence of the SSIS in the SQL services makes it possible for a company of any size, to compete in the market today and integrate their information easier and with control that is more complete.
By developing the applications with Visual Studio, the total cost of ownership is depleted very quickly. This means not only big corporations can make better decisions based on its information but so can smaller companies.