Distributed Data Platform: Rethinking Data Pipelines & Lakes Using Domain-Driven Design

Student: Nicolette McLean, 2020-2021

Sponsor: Avanade, Seattle, WA / London, UK

For my project, I will examine the internal structure of data platforms at both large and small companies. The data these platforms hold is important for delivering individualized marketing to customers, which is informed by data science, analytics, and AI. Typical data platforms are currently built around “data lakes and pipelines”: data is ingested by the system, “cleansed” or organized into specific categories within the data lake, and then redistributed through a pipeline structure based on customer-specific preferences. However, current data platform structures tend to lose data and disregard the domain the data belongs to. I will look into restructuring data platforms to stop this loss of data, make customer data outputs more specific, and create a cleaner, more regulated platform. I will do this by researching how to move data platforms from a centralized structure back to a decentralized one. This type of data reorganization also needs to be quantified, because there is a persistent gap between the financial world and the technical world. I aim to narrow that gap by creating financial models that reflect the efficiency gained from restructuring the data.