Moving Legacy Data into a Warehouse
The IDB wanted to repurpose some of its data assets so that a new generation of users could benefitThe Challenge
More than 20 years ago, the Inter-American Development Bank (IDB) started to build an information system to track national trade and integration. Initial data sets were for trade flows by product; later, it added tariffs and preferential tariffs, tariffs quotas, rules of origin, sanitary and phytosanitary measures, rejected exports and more. As the data sources and data types grew, the system grew more complex and evolved into a loosely-connected hybrid of three databases connected by ad hoc data migrations. The production processes were arcane, redundant and took far too long to complete. The databases themselves contained redundant and inconsistent data, made worse by missing or corrupted fields. In fact, the three distinct databases are connected together by ad-hoc scripts that “glue” the data together as it was needed.
In addition, the system’s front-end was supposed to allow different kinds of users to query the data to obtain results relevant to them. However, the queries were slow and the workflow to complete them was cumbersome to the point of impossible. For example, changing a data filter, such as a country of interest or a product grouping or a year, required the user to restart the query from scratch, resulting in a frustrating and time-consuming process that all too often did not completely deliver the desired outcome. The consequence was that despite the inherent value of the data, few researchers were using it and those who did were barely realizing its potential.
The Solution
The IDB engaged Looi Consulting to evaluate the INTrade data architecture and make recommendations to optimize the system in order to produce interactive visualizations and reports on the fly with simpler yet more effective workflows. Rather than redesign the existing data tables and workflows (an option that was not considered economically viable), we proposed to design and develop a solution to migrate existing data without affecting the stable data curation processes or existing tables.
Based on those recommendations, Looi Consulting was further engaged to design and develop the following:
- Data Warehouse with the proper fact tables, views and dimensions to serve the correct data on-demand.
- ETLs to move and transform the data from the three databases to the Data Warehouse.
- Web services/APIs to produce JSON objects that can be consumed by applications to produce interactive visualizations and reports.
A subsequent project then leveraged these changes to perform more engaging and fully interactive experiences, providing users with better ways to explore and compare the data.
Capabilities
Looi Consulting was a unique consultancy in that it provided strategy-level advice as to how to evolve and manage the IDB’’s data assets, but didn’t stop there. In addition, we developed a plan to realize it and then implemented it from prototype to production.
Our demonstrated capabilities included:
- Data Management Consulting: Broad and deep understanding of client business and technical problems.
- Data Modeling: Capturing business problems in data and designing solutions.
- Data Migration: Use of Microsoft’s SSIS; management of semi-structured data.
- Data Systems Planning and Architecture: Data cleansing and Data extraction.
- Data Warehouse: Design and development.
- ETL, ELT and related data management skills: Enterprise data warehousing design.
Timeframe
The initial engagement for discovery, architecture and design lasted 4 months. Design and build-out of the first phase of the Enterprise Data Warehouse took 8 months; it has been successfully deployed and continues to be improved. Today, it supports the work of economists, policy advisors, governments and investors throughout the Americas.
%
Query Time Reduction
%
Redundant Data Eliminated
Bad Data Records Removed
Simplified Architecture
Performance Increased 100X
Reduced Query Time
Eliminated Redundant Data
Discovered and Removed Bad Data
We identified and cleaned 60,000 bad and duplicate records.
Simplified Architecture
- Query Times 1%
- Data Size 25%