Mónica de Oliveira has been a Data Integration Consultant at PrimeIT for over a year and a half. Despite having a degree in Civil Engineering, she fell in love with the world of data and is currently pursuing a master’s degree in Information Management with a specialization in Knowledge Management and Business Intelligence at Universidade Nova de Lisboa. Today, she explains to us the importance of good data integration:
“Have you ever stopped to think about how big companies can process large amounts of data to generate a high-value analysis for business development and growth?
A critical piece to complete this puzzle is data integration. This process makes it possible to guarantee data availability in the correct time, place and format, in a secure manner and following what was requested. In this way, data can move between different sources, applications, systems and structures, always maintaining consolidated information.
Data availability for all services and systems is essential for managing the enterprise’s information and decision-making – based on historical, clean, and consolidated information from multiple sources. With the increase, both in quantity and in diversity, of the data format, the integration process emerges as an essential step in the business intelligence methodology. This technique allows data from multiple sources and types to be represented in the same format after undergoing transformations and combinations to achieve a defined structure.
The performance of an efficient data integration process increases its consistency and transparency. It allows the fast creation of reports according to the standards and needs of the business, being also very useful in identifying problems in the quality of the information itself.
A standard procedure for performing data integration is called ETL. ETL is an acronym for the English words “Extract, Transform and Load“.
Let’s get a better look at what each stage means.
The data extraction from the sources can be done, for example, with the reception of the files (CSV, XML, txt, etc.) to be treated or the collection in a database, namely through a query with specified conditions.
In this step, data is modified to comply with rules, requirements and structures defined with the business team and end-users, making it more consistent. It can be a simple mapping or include aggregations, derivations, filters, standards, format changes, lookups, and conditional rules.
A data validation process is also fundamental. It allows, for example, to check if the mobile number field contains only digits – and the correct number of characters -, or whether the date of a past transaction is not in the future. It is also possible to create a process to identify existing errors and create a new file with that information.
The processing data loading for the end-user can be carried out in a new file or using a database. Only new or changed records (incremental load) or all records (full load) can be loaded. When creating a data integration process, it’s important to discuss some points inside the teams involved. It is necessary to define and file in the documentation the stages of the process, the inputs/outputs of each activity, the periodicity, how to manage the recovery (in case of failures) and the details about archiving the files.
In concluding, the development of quality data integration processes allows us to reduce costs and processing time, and to ensure that the information is used more efficiently, securely and quickly, bringing benefits to the company.”
Data Integration Consultant