How do you bring all your data together? Last Friday, at our IT Leadership Group Meeting, Sayeed Reza of Product Engineering and Data Solutions at Optum led an interactive discussion on the topic of the Data as a Service (DaaS) Platform. A huge thank you to Optum for hosting us!
Sayeed kicked off the meeting by defining what a Data as a Service Platform does: it provides a single, secure interface for discovering, accessing, analyzing, and visualizing data, regardless of data type, platform, source, or domain. Easy access to data is what has made digital transformation possible.
Enterprise APIs for small volumes of data, and data lakes and data warehouses for larger volumes, have been the norm in the industry.
Data warehouses and data lakes are both widely used for storing big data. While some organizations may utilize both, others may use one or the other. So, what is the difference between a data warehouse and a data lake? Sayeed provided the group with a visual to differentiate the two.
Data Warehouse | Vs. | Data Lake |
Processed, refined data | DATA STRUCTURE | Raw, unprocessed data |
Schema-on-write | PROCESSING | Schema-on-read |
Expensive for large data volumes | STORAGE | Low-cost storage |
Less agile | AGILITY | Highly agile |
Mature | SECURITY | Maturing |
Business Professionals | USERS | Data Scientists |
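To make the PROCESSING row of the table concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not something shown at the meeting: the `SCHEMA` fields and the `Warehouse` and `Lake` classes are hypothetical names invented for this example, and real systems do far more than this.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"order_id": int, "order_date": str, "amount": float}


def validate(record, schema=SCHEMA):
    """Return a copy of the record coerced to the schema, or raise ValueError."""
    clean = {}
    for name, typ in schema.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        try:
            clean[name] = typ(record[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {name}: {record[name]!r}") from exc
    return clean


class Warehouse:
    """Schema-on-write: a record must fit the schema before it is stored."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        # Bad data is rejected at load time, so everything stored is clean.
        self.rows.append(validate(record))

    def read(self):
        return list(self.rows)


class Lake:
    """Schema-on-read: raw text is stored as-is; the schema is applied on read."""

    def __init__(self):
        self.blobs = []

    def write(self, raw_line):
        # Anything can be stored, which is cheap and flexible.
        self.blobs.append(raw_line)

    def read(self, schema=SCHEMA):
        rows = []
        for line in self.blobs:
            try:
                rows.append(validate(json.loads(line), schema))
            except (TypeError, ValueError):
                continue  # records that do not fit this schema are skipped
        return rows
```

In this sketch the warehouse pays the validation cost once, when data is written, while the lake defers it to every read and lets each reader bring a different schema. That is one way to see why the table lists the lake as more agile and the warehouse as more refined.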
It all comes down to the experience you are trying to provide to your consumers. As much as we would like to provide the best data platform for our consumers, there are many challenges for both the data producer and the data consumer. Consumers may struggle to find out whether a data source exists, how to connect to it, what its intended use is, and how to request access to it. Producers face challenges of their own: annotating data sources with descriptive metadata is often a losing effort, and so is writing documentation for them.
According to Forrester Research, only 14% of business stakeholders make thorough use of customer insights. A common reason is that people in most companies cannot easily find or get to the data they need. So, why does a company need a data catalog? A data catalog helps organizations organize and find data that is kept in their various systems.
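As a rough illustration of what a data catalog records, the Python sketch below stores, for each data source, the same things consumers were said to struggle with: whether it exists, how to connect, what it is for, and whom to ask for access. The `CatalogEntry` and `DataCatalog` names and all of their fields are assumptions made up for this example; they do not come from the talk or from any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data source, described well enough for a consumer to use it."""
    name: str
    connection: str       # how to reach the source (hypothetical format)
    description: str      # intended use, in plain language
    access_contact: str   # whom to ask for access
    tags: set = field(default_factory=set)


class DataCatalog:
    """A tiny in-memory catalog with keyword search."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Registering under an existing name replaces the old entry.
        self._entries[entry.name] = entry

    def get(self, name):
        return self._entries[name]

    def search(self, keyword):
        """Return the sorted names of entries whose name, description, or tags match."""
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or kw in {t.lower() for t in e.tags}
        )
```

A consumer would call `search("orders")` to learn that a source exists, then `get(name)` to read its connection details and access contact. A real catalog would also need to keep these entries up to date, which is the producer-side burden described above.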
Streaming as part of a data platform has become very popular recently. Kafka, an open-source streaming platform, has opened up many possibilities for transporting, or even serving and distributing, data through APIs.
Because each organization is at a different place with its data, Sayeed wrapped up the meeting by sharing a few tools and technologies for building a DaaS Platform, including Dremio, Trifacta, Kafka, Confluent, Apache Drill, and Apache Arrow. If you are interested in learning more about what was discussed, check out the slide deck here.