One of the biggest challenges in BI is tackling an analytics project to measure performance only to find out in the end that the data you are using is not entirely correct and hence not showing what is truly happening.
Basically, what you see is what you get. If the underlying data is inaccurate, of course the performance indicator is not accuarate too. Maybe you’ve spent countless hours building an end-to end BI solution only to produce a product that has no absolute value because the data output just isn’t right! Sound familiar? If so, what if you knew that before you started and could possibly do something about it and get that data cleaned up before you even began!
Let’s talk about Data Profiling. Data Profiling by definition depends upon which perspective you are coming from. From a Data Warehouse administration perspective, maybe you think of Data Profiling as MetaData, that is “data about the data ” – that is how your data structures and schemas of your warehouse are built and how all ties together.
If you are coming from a Business Analyitic & Performance Management Perspective however, Data Profiling is also “data about data quality” – that is how the integrity stands in the value of the business data. This is where the SSIS (SQL Server Integration Services) Data Profiling Task serves to help determine the integrity and validity of the data you are wanting to measure.
Here you can produce exception reporting of the data way before you begin building a solution. You can really know how clean your data is, determine the acceptable tolerance for data error and midigate the risks with data cleansing in the beginning with the SSIS Data Profiling task. The Data Profiling task computes various profiles that help you become familiar with a data source and identify problems in the data that have to be fixed.You can use the Data Profiling task inside an Integration Services package to profile data that is stored in SQL Server and to identify potential problems with data quality.
The Data Profiling Task can compute eight different data profiles. Five of these profiles analyze data in individual columns, and the remaining three analyze multiple columns or relationships between columns and tables.
idea tip: …. want to set up exception reporting for data cleanup? Create an SSIS package containing a data profiling task showing a column null or column pattern matching against a pattern (such as phone numbers etc.) and a FTP file transfer task for the XML output file,. Then schedule the package to run in SQL Agent on a schedule and create a report in reporting services using it as a data source and send it to clean up people on a scheduled subscription.