Agile and Big Data – A Match-Winning Combination
Agile Methodology
Agile is a software development methodology for executing a project incrementally and systematically within a fixed time frame, so that the business sees benefits within a short period rather than waiting for a longer duration. The appeal of Agile is that it produces periodic output, so results can be validated and corrective action taken early. Traditional methods such as iterative or waterfall proceed from analysis and design through development to deployment, but the business does not realize any value until the end of the project life cycle. These methods are riskier, costlier and less efficient because the business must wait until the end of the project before any corrective action can be taken.

Agile develops a software solution incrementally in short cycles of two to four weeks, so that the development process stays aligned with changing business needs. Instead of a big-bang, single-pass development-to-deployment effort spanning several months, requirements and risks are discussed up front, and a workable product is delivered after each two-to-four-week iteration through a process of frequent feedback.

What a Regular Project Flow Looks Like

A typical project begins with an analysis of the various components, followed by the ETL strategy, then data model design, and ends with report development. The steps from analysis through development, testing and deployment take several months before the business can see any result or benefit. The impact of this approach is:
• Going back and changing the design after testing and before deployment is quite difficult
• Whether the product is in working condition cannot be verified until the last day
• Some of the risks remain unknown until the last day
• Any delay hampers downstream applications

BI/Big Data Projects
A BI/Big Data project gives a business the necessary framework, supported by tools and technologies, to take the required strategic, tactical and operational decisions in the form of analysis. The analysis could be reports, dashboards, scorecards and analytics. BI/Big Data provides the strategic and operational insights that are crucial for decision making, delivered through various dashboards, OLAP reports, predictive analytics and mining models. These insights help the organization take key decisions with far-reaching implications.

Agile for BI/Big Data Projects
In the software industry, the general perception is that BI/Big Data works best with a waterfall or iterative model. Considering the BI success ratio (which is lower than the failure ratio), the industry's first choice is a non-agile model. Most BI projects have the following components to be developed:
• ETL packages/mappings
• Data model
• Data quality
• Reports/dashboards
The project begins with an analysis of the various components, the ETL strategy and the design of the data model, and ends with the development of reports. The steps from analysis through development, testing and deployment take several months before the business can see the results and benefits. This leads to the following:
• It is difficult, even risky, to go back and change the design once testing is done and before deployment
• Users cannot see any working product until the last day
• Some of the risks remain unknown until the last day
• Any delay further hampers the downstream applications
So how can we avoid these issues and ensure that a BI project is executed successfully? The answer is Agile. Are you wondering how one delivers a BI project with Agile? Is this riskier than the waterfall or iterative model? How does one ensure that the BI project undertaken is a blockbuster? Without a thought-out plan, Agile is even more dangerous than any other model. The following paragraphs spell out the mantra for success in using Agile for BI projects. For illustration purposes, we use the insurance domain and related modules. Every project starts with specific business requirements so that the specific problems and pain areas can be addressed. An insurance policy is a legal contract between an insurance company and a policyholder providing liability coverage for many types of risk. It covers property and assets against loss or damage in the event of disasters such as fire, theft and natural calamities.
The insurance domain consists of many business processes that define the nature of the business; key processes include policies, claims, underwriting and renewal. The key questions from any insurance firm are the following:
• How will one analyze the data?
• What level of granularity can be applied to the analysis?
• Is there any analysis to help in setting the premium pricing?
• How will one differentiate risky from non-risky properties?
Insurers therefore rely heavily on analytics to address these key queries. The data is scattered across multiple systems in various structured and unstructured forms, and data volumes grow every day, which challenges insurance firms to continuously churn these data sets into meaningful insight. Big Data technologies and tools allow firms to find these insights and to take some of their key decisions. Big Data offers a framework for continuously ingesting these data sets, even in real time, while addressing key data issues, building models and analyzing the data.

Bringing in Modularity – The Agile Concept

By adopting modularity and functionality, any BI/Big Data program can be implemented in Agile, and the business can reap all its benefits. The Agile concept is built on the basic principle of modularity. Requirements are collected and collated in a highly structured way in the backlog, an important Agile artifact where all the user stories are stored. These backlog items (user stories) are aligned with the various business processes. This takes a good amount of time: interviewing business users through brainstorming sessions, surveys and one-on-one or group discussions, followed by verification of the understanding. Once identified and confirmed, these business processes are further separated into modules and sub-modules based on functionality. This process is called mapping, and it maps the modules to the business processes.
This mapping is a key ingredient of the overall Agile framework, and the success of an Agile project depends heavily on how well the modularity and functionality are structured. It gives the team the ability to manage changes to the project at any stage and to manage the outcome efficiently. The next big exercise for the Agile team is to identify module dependencies. Based on functionality, it is vital to identify the inter-module and inter-business-process dependencies so that a mapping is developed to capture the relationships between the modules. This exercise helps the Agile team in many ways:
• Knowing which changes impact which modules
• Divide and rule – parallel development can proceed where there are no dependencies between modules, significantly cutting down elapsed time
• Seamless integration
• Easy adaptation when requirements change at any stage of the project
Once the modularity and the constraints/dependencies are identified and baselined, the modules are prioritized so that the Agile team, together with the business, can decide the order of development and deployment.

Sprint Planning
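The module-dependency exercise described above can be sketched as a directed graph, from which the "divide and rule" waves of parallel development fall out directly. The following is a minimal sketch using Python's standard-library `graphlib`; the module names and dependencies are hypothetical, not taken from any specific project plan.

```python
from graphlib import TopologicalSorter

# Hypothetical insurance-solution modules, each mapped to the
# set of modules it depends on.
dependencies = {
    "staging_loads": set(),
    "data_quality": {"staging_loads"},
    "policy_mart": {"data_quality"},
    "claims_mart": {"data_quality"},
    "underwriting_reports": {"policy_mart"},
    "claims_dashboards": {"claims_mart"},
}

def parallel_batches(deps):
    """Group modules into waves that can be developed in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = list(ts.get_ready())   # modules with no unmet dependencies
        batches.append(sorted(ready))
        ts.done(*ready)
    return batches

for wave, modules in enumerate(parallel_batches(dependencies), 1):
    print(f"Wave {wave}: {modules}")
```

Here the policy and claims marts land in the same wave, so two sub-teams could build them simultaneously, exactly the elapsed-time saving the dependency mapping is meant to expose.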
The user stories in the backlog are prioritized but not yet estimated. Estimation requires detailed discussion as part of sprint planning, so the Agile team holds a grooming session to discuss all the user stories along with their priorities and dependencies, story points, team capacity, productivity and timelines. Once each story is assigned story points, the sprint capacity is committed based on how many stories the team can develop in each sprint. This exercise yields the number of sprints needed to develop all the stories in the backlog.

Sprint Execution
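The capacity arithmetic from sprint planning above is simple to sketch: total the story points in the backlog and divide by the team's per-sprint velocity. The point values and velocity below are hypothetical placeholders, not figures from a real project.

```python
import math

def sprints_needed(story_points, velocity):
    """Number of sprints to burn down the backlog.

    story_points: estimated points for each backlog user story
    velocity: points the team can deliver per sprint
              (capacity adjusted for productivity)
    """
    total = sum(story_points)
    return math.ceil(total / velocity)

# Hypothetical estimates for an insurance backlog (points per story)
backlog_points = [5, 3, 8, 5, 13, 3, 8, 5]   # 50 points in total
print(sprints_needed(backlog_points, velocity=12))  # 50 / 12 -> 5 sprints
```

Rounding up matters: a backlog of 50 points at a velocity of 12 needs five sprints, not four, because the last partial sprint still has to be scheduled.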
To illustrate, we consider a data lake/data mart solution for an insurance firm. The solution encompasses pricing models, product analysis, risk cost per policy, claim activity management and claim scoring/forecasting. First, all the backlog items are arranged and aligned with the various business processes; in this case, Underwriting, Reinsurance, Policy and Claims. The backlog items are then grouped under these processes, for example:

Underwriting
- Need to see the evaluation and sharing of the risk factors
- Need a report to do Primary Factor Analysis
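The grouping of backlog items under business processes described above can be sketched as a simple tagging structure. The stories below mirror the examples in the text; the extra Policy and Claims entries and the data layout itself are hypothetical.

```python
from collections import defaultdict

# Hypothetical backlog: each user story tagged with its business process
backlog = [
    ("Underwriting", "Evaluate and share the risk factors"),
    ("Underwriting", "Report for Primary Factor Analysis"),
    ("Policy", "Risk cost per policy"),
    ("Claims", "Claim activity management"),
    ("Claims", "Claim scoring/forecast"),
]

def group_by_process(items):
    """Collect backlog user stories under their business process."""
    grouped = defaultdict(list)
    for process, story in items:
        grouped[process].append(story)
    return dict(grouped)

for process, stories in group_by_process(backlog).items():
    print(process)
    for story in stories:
        print("  -", story)
```

Once grouped this way, each process bucket maps naturally onto the modules identified earlier, which is what makes prioritization and sprint assignment straightforward.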