In the recent past, the architectural design of data warehouses has predominantly been ETL (extract, transform and load). However, with emerging technologies taking center stage, the architectural vision of data warehouse systems has gone through a radical shift. This shift has led to a new data warehousing process, as the previous data models were not equipped to handle the exponential increase in user data. Often, businesses were also unable to meet their critical data-related business needs due to their inability to incorporate changes into their large single data repositories. In turn, that translated into a longer waiting periods for reports that were requested by business users. Eventually, organizations ended up specifically dedicating resources to "extract data,” performing additional and often redundant analytics and reporting activities outside the data warehouse itself.
Thus, several manually intensive and expensive processes have to be established to fill in the gaps created by the limitations of the architecture design of the ETL-based data warehousing systems. The strategic solution was the implementation and transition to the data warehouse design technology ELT (extract, load and transform).
Importance of Data Warehouses in Banking and Financial Services
Banking and financial services involve numerous data intensive processes, from withdrawing money from an ATM to suggesting the most lucrative investment portfolio to a high value client. This makes data warehouse testing extremely critical in this domain. Figure 1 shows some examples of scenarios that involve data warehouse processes in specific financial services domains.Figure 1: Data warehouse system dependability in financial services sub domains
ETL, Focusing on Desired Data Output
The ETL data warehouse technology is primarily based on the report requested by the business user. The approach is top down. First, a decision is made regarding the desired output data (i.e., the format and content of the report demanded by the user) and then the required data is extracted, transformed and loaded, as seen in Figure 2. With the focus on output, the input data needs to be processed to remove any data anomalies, ensuring high quality of data for transformation and loading.Figure 2: The standard ETL process
Data Warehouse Technology ELT
The concept of ELT data warehouse technology came into existence because of the high business demand for error-free, high-quality input data in data warehouse systems. Here the approach is bottom-up. The data is initially extracted to a staging area, where the business rule and integrity checks are performed.With ELT, the load and transformation processes are decoupled from each other, as shown in Figure 3. Hence, the development is split into smaller manageable chunks, which make project management, change management and the development of data warehouse systems much simpler.
The ELT data warehouse systems overcome the shortcomings of ETL systems, which paves the way for this transition across industries, especially in financial services and banking.
Figure 2: The ELT Process
Achieving Thorough ETL to ELT Conversion Testing
If an organization decides to transition from ETL to ELT, from a QA perspective, a lot of planning and strategy is required to ensure that the conversion is thorough and successful. In any typical re-engineering project, especially one that includes an ETL to ELT conversion, the following validation points need to be looked at (Please note that the ETL process is mentioned as the source and ELT process as the target wherever applicable):Structure of the target table: The ELT output table structure is usually not normalized for easy accessibility to the input data. Processes and columns need to be mapped to ensure consistency between the ETL source systems and the ELT target systems.
Data type of the target table: Usually, the structural compatibility between the source and target database needs validation. Accordingly, test cases are framed to warrant data quality in the targeted database. Lastly, the output format of the data is validated to ensure no truncation has occurred to the values of data.
Intermediate phases/models: Usually, the loading or the staging area is divided into intermediate models that include:
- Input data model, which contains data that is clean and completely decoupled from end users.
- Reporting data mart, which contains data that is useful for generating reports for the end user.
- Reporting view, which contains formatted data in a predetermined template.
Figure 4: Sample intermediate stages in an ELT system
- Customer table to loan table.
- Loan to document table.
- Document to customer table.
- Static table testing of relationship codes.
Process validation: This entailstesting the data loading and scheduling of jobs by checking the error logs and output of the scripts used for loading. The analysis of error logs is done to report any inconsistencies (e.g., an actual error occurrence not reported in the error log or vice versa). The ELT jobs are then triggered in sequence to test the ELT data warehouse system behavior.
End-to-end validation of the ELT system: All the stages (i.e., the source, input data model, reporting model, reporting view and report flow) are to be tested using predefined input data to ensure that it moves through all stages in the expected way, generating the desired output. This end-to-end validation of the ELT system is essential for detecting inherent defects due to the interdependency of code/data (if any) between all the stages.
ELT isn’t a solution to all the problems of data warehouse systems. In some specific cases, a hybrid solution of both ETL and ELT methodologies would be the appropriate model to adopt, depending on business requirements. To illustrate this, consider a situation where a data warehouse has multiple source systems. The data from these sources can be brought to a common staging area through the ETL process and then specific ELT processes can be further applied to generate user-specific reports.
However, having an ETL, ELT or ETLT as a data warehousing solution is not enough. A clear understanding of the vision of the data warehouse output from an end user’s perspective is essential for a successful data warehouse testing validation. It’s also important to comprehend the business context and clearly define the input data, their interrelationships and the associated processes in the data warehouse system to generate correct and relevant reports for the business/end user.
Original Source
0 comments:
Post a Comment