The explosion of big data has opened up many new opportunities for organizations and driven a rapidly growing demand for data consumption at every level. Big data applications generate an enormous amount of data every day, and analyzing these datasets creates scope for better, smarter decisions. Those decisions depend on meaningful insights and accurate predictions, which in turn help maximize the quality of services and generate healthy profits. This storm of data in the form of text, pictures, sound and video (known as "big data") demands a better strategy, architecture and design framework to source the data and move it through multiple layers of treatment before it is consumed. The 3 V's, i.e. high volume, high velocity and variety, call for a specific architecture for each specific use case.
When an organization defines a data strategy, it considers, apart from fundamentals such as data vision, principles, metrics, measurements and short- and long-term objectives, its data and analytics priorities, level of data maturity, data governance and integration. These elements are crucial to the organization's success, and much depends on the organization's data maturity. As the organization moves forward with the aim of satisfying its business needs, the data strategy must fulfill the requirements of every business use case.
Use cases differ from one another, so the architecture for one use case differs from the architecture for another. In such scenarios, big data calls for a pattern that serves as a master template for defining the architecture of any given use case. Most architecture patterns are built from the data ingestion, quality, processing, storage, and BI and analytics layers. Each of these layers offers multiple options; for example, the integration layer can use events, APIs or other options. Selecting one option for each layer, based on the use case, forms a pattern. In this way, an architecture has multiple patterns, each of which satisfies one of the use cases.
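To make the idea of a pattern more concrete, here is a minimal Python sketch, assuming a pattern can be modelled as one chosen option per layer and checked against a catalogue of allowed options. The layer names, option names, `CATALOGUE` and `Pattern` are hypothetical illustrations loosely drawn from the layers named above, not part of any standard or tool.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue: each layer and the options it offers.
# "ingestion" plays the role of the integration layer described above.
CATALOGUE: dict[str, set[str]] = {
    "ingestion": {"event", "api", "messaging", "query", "cdc"},
    "quality": {"rule_based", "profiling"},
    "processing": {"batch", "stream"},
    "storage": {"relational", "distributed", "mpp", "nosql", "in_memory"},
    "analytics": {"bi_dashboard", "predictive_model"},
}


@dataclass
class Pattern:
    """A pattern: one chosen option per layer for a given use case."""
    use_case: str
    choices: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the pattern is complete and valid."""
        problems = []
        # Every layer in the catalogue must have a chosen option.
        for layer in CATALOGUE:
            if layer not in self.choices:
                problems.append(f"missing choice for layer '{layer}'")
        # Every chosen option must exist in the catalogue for that layer.
        for layer, option in self.choices.items():
            if layer not in CATALOGUE:
                problems.append(f"unknown layer '{layer}'")
            elif option not in CATALOGUE[layer]:
                problems.append(f"'{option}' is not an option for layer '{layer}'")
        return problems


# Example: a near-real-time use case built from event ingestion and stream processing.
clickstream = Pattern(
    "clickstream",
    {"ingestion": "event", "quality": "rule_based", "processing": "stream",
     "storage": "nosql", "analytics": "bi_dashboard"},
)
print(clickstream.validate())  # []
```

Keeping the catalogue separate from the patterns mirrors the idea of a pre-agreed master template: new use cases pick from approved options rather than inventing new ones, and `validate` flags any choice that falls outside the agreed set.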
Big data architecture patterns serve many purposes and give the organization a distinct advantage. A pre-agreed, approved architecture offers several benefits, enumerated below:
1. Agreement among all stakeholders in the organization
2. Better coordination among all stakeholders within the organization, especially between Data Strategy and IT
3. Full support from all stakeholders for implementing the architecture
4. Minimal additional effort from stakeholders whenever a new architecture is implemented
5. Faster implementation of new architectures
6. Early enablement of the architecture, leading to speedier implementation of the solution
Architecture patterns can be broadly classified into the following layers:
1. Source
2. Data Integration
3. Storage
4. Data Processing
5. Data Abstraction
6. Data Schema
Each layer has multiple architecture options, with technologies tagged to each of them. Depending on its nature, a source system or application generates three broad types of data: structured, semi-structured and unstructured. This data can be acquired using methods such as messaging, events, queries, APIs or change data capture (CDC). Extraction is either push or pull, depending on the method used in the architecture pattern: generally, APIs, CDC and messaging push data, while queries pull it.
The ingested data needs storage, which can be relational, distributed, Massively Parallel Processing (MPP) or NoSQL databases. In some patterns, the data resides in memory; in-memory storage is useful when all the processing has to be done in memory without persisting the data. The processing of data can be distributed, parallel or sequential. Finally, the data abstraction and schema layers define the output format and route the data to analytics, dashboards or downstream applications.
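The push and pull mechanisms described above can be contrasted with a small Python sketch. It assumes a hypothetical `events` table whose auto-incrementing `id` serves as a high-watermark for incremental pulls, and it uses an in-process `queue.Queue` as a stand-in for a real message broker; `pull_incremental` and `PushChannel` are illustrative names, not a library API.

```python
import queue
import sqlite3
from typing import Any


def pull_incremental(conn: sqlite3.Connection, watermark: int) -> tuple[list[tuple], int]:
    """Pull: the consumer queries the source for rows newer than the last id it saw."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    # Advance the watermark to the highest id returned, or keep it if nothing is new.
    new_watermark = rows[-1][0] if rows else watermark
    return rows, new_watermark


class PushChannel:
    """Push: the source publishes each record as it is produced;
    the consumer drains whatever has arrived so far.
    (queue.Queue is a stand-in for a real message broker.)"""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def publish(self, record: Any) -> None:
        self._queue.put(record)

    def drain(self) -> list[Any]:
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


# Pull usage: the consumer controls when extraction happens.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
rows, watermark = pull_incremental(conn, 0)          # [(1, 'a'), (2, 'b')], watermark 2
conn.execute("INSERT INTO events (payload) VALUES ('c')")
rows, watermark = pull_incremental(conn, watermark)  # [(3, 'c')], watermark 3

# Push usage: the producer controls when data moves.
channel = PushChannel()
channel.publish({"id": 1, "payload": "a"})
print(channel.drain())  # [{'id': 1, 'payload': 'a'}]
```

In the pull case, the consumer decides when to query and must remember the watermark between runs; in the push case, the source decides when data moves and the consumer only drains what has arrived. A production setup would typically replace the queue with a broker and keep the watermark in durable state.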
Once the architecture pattern is defined, it can be reused for any new or modified use case, as shown in the illustration below.
As an organization expands its business, it has to deal with new applications and new data. Suppose, for example, that the organization's existing data architecture supports only structured datasets, while newly adopted applications generate semi-structured and unstructured data. In such a scenario, a well-defined architecture pattern, as part of the data strategy, can quickly absorb the requirements of the new use case. The illustration above depicts the end-to-end flow of the architecture required to bring in the semi-structured and unstructured data and support the business with the required analytics and predictive models.
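As one hedged illustration of bringing semi-structured data into a structure-oriented architecture, the Python sketch below flattens JSON-lines records into a fixed set of columns that existing structured tooling could consume. The function names `flatten` and `to_rows`, the dotted column naming and the choice to keep lists as JSON strings are all assumptions made for this example; unstructured data such as images or audio would need a different treatment.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested objects into dotted column names; lists are kept as JSON strings."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))  # recurse into nested objects
        elif isinstance(value, list):
            flat[column] = json.dumps(value)          # keep arrays as a single text column
        else:
            flat[column] = value
    return flat


def to_rows(json_lines: list[str]) -> tuple[list[str], list[list[Any]]]:
    """Turn JSON-lines input into a sorted column list plus rows; missing fields become None."""
    records = [flatten(json.loads(line)) for line in json_lines]
    columns = sorted({c for r in records for c in r})
    rows = [[r.get(c) for c in columns] for r in records]
    return columns, rows


lines = [
    '{"user": {"id": 7, "country": "IN"}, "tags": ["a", "b"]}',
    '{"user": {"id": 8}, "amount": 12.5}',
]
columns, rows = to_rows(lines)
print(columns)  # ['amount', 'tags', 'user.country', 'user.id']
print(rows)     # [[None, '["a", "b"]', 'IN', 7], [12.5, None, None, 8]]
```

Taking the union of all fields as the column set tolerates records with differing shapes, which is typical of semi-structured sources; the trade-off is a sparse table with many `None` values when records vary widely.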
Well, we have covered the architecture patterns with various options such as Kappa, Lambda, polyglot and IoT, including the major patterns in current use. We will look at other aspects of data strategy in upcoming articles. Feel free to comment or reach out to me at basu.darawan@gmail.com / https://www.linkedin.com/in/basavaraj-darawan-0823ab54/