Democratizing and simplifying the availability and storage of internal datasets is a high priority for Arrayo clients. We have significant experience in the delivery of a data persistence layer that make it possible to create data-lakes and further extend the democratization of data within companies ranging from small biotech startups to large, enterprise level big-pharma and financial services corporations. When possible, we make sure data is open within the constraints of an access policy instead of restricting access with permissions.
Almost every large organization has a data lake somewhere where their data can be retained for long periods of time at a low price and where data hungry users can go to compile many disparate sources together. A data lake is nothing more than a central location where many different forms of data reside in the same infrastructure. Data lakes have built-in fault tolerance: for example, Hadoop is set to have three replications of all data, making it extremely unlikely for information to be lost. This also allows calculation tools such as Spark to perform large and complicated calculations across distributed data at a speed not possible in the past. Data Lakes also provide connectors so that data professionals can leverage distributed computing to extract insights and perform the types of analysis that are mission-critical to your enterprise. Data Lakes have been a mature part of the data ecosystem for some time, due to the simplicity of the infrastructure, the ease of setup, and the value that companies derive from them. This maturity is bringing with it a number of tools to help bring even more value out of these massive data stores.