3 steps to discover more in your data from the start

What if techniques typically used for automating delivery could be reused to improve data discovery in the early stages of a data initiative?

The way teams operate and collaborate is undergoing a massive change. Delivery is becoming more agile, and with it comes an expectation of earlier and more frequent delivery of value. Large homogeneous teams focusing exclusively on technical, functional or business tasks are becoming a thing of the past. Today, business and technical people work side-by-side rather than communicating requirements, issues and questions back-and-forth. Technology and automation are critical success factors in this collaboration. What if they could also support data discovery? When done right, this approach can be a catalyst for digital transformation: new services, better collaboration between business and technical teams, and increased satisfaction for engineers working on innovative concepts.

‘Automation drives delivery. So, let’s use it for discovery as well!’

Having business and technical people working side-by-side is changing the way data is analyzed. Functional analysts now collaborate with their business counterparts or end users to explore and study the data together. Small hackathon-like workshops create a common understanding and a way forward. However, we notice that these discussions often remain very theoretical. The approach in this article attempts to overcome this.

Having data available from the start is crucial. As soon as people see data with their own eyes, they can identify issues, find solutions and dream about opportunities. Our 3-step approach reduces the effort required in making this data available. Firstly, we define the data that needs to be analyzed. Secondly, we use automation techniques to centralize the data so that in a third stage, we can leverage self-service reporting tools during data analyses and workshops. The underlying idea is to reuse existing tools and techniques as a catalyst rather than define new complex solutions.

Step 1: Define the data

Data analysts define the data that needs to be analyzed. This data is typically spread across several databases, and each potentially interesting data set can be expressed as one or more questions. The data selected for further analysis is the input for the second step. At this stage, it is important to focus on getting the necessary input; data correctness is not yet important.

Finding interesting data to analyze
  • For each data source, the relevant questions are expressed as SQL queries: select customer_name, customer_address from customer
  • Attributes are listed when known, and wildcards are used when in doubt: select * from customer_payments
  • Filters are added when known: select * from customer_payments where type = 'credit_card'
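The inventory described above can be kept as plain data, so that the next step can process it without manual work. The following is a minimal sketch, not a prescribed format: the `ExtractDefinition` class, the source names `crm` and `billing`, and the target table names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractDefinition:
    source: str        # hypothetical logical name of the source database
    target_table: str  # hypothetical table name in the central data store
    query: str         # the question, expressed as a SQL query


# One entry per question; source and target names are assumptions.
INVENTORY = [
    ExtractDefinition(
        "crm", "customer",
        "SELECT customer_name, customer_address FROM customer",
    ),
    ExtractDefinition(
        "billing", "customer_payments_all",
        "SELECT * FROM customer_payments",
    ),
    ExtractDefinition(
        "billing", "customer_payments_credit_card",
        "SELECT * FROM customer_payments WHERE type = 'credit_card'",
    ),
]


def definitions_by_source(inventory):
    """Group the definitions per source so each source is visited once."""
    grouped = {}
    for definition in inventory:
        grouped.setdefault(definition.source, []).append(definition)
    return grouped


grouped_inventory = definitions_by_source(INVENTORY)
```

Keeping the questions in a list like this makes it easy to add, drop or refine a query between workshops without touching any extraction logic.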

Step 2: Centralize the data for analysis

After the inventory of interesting data has been made, an extraction procedure moves the data into a centralized data store. This data store makes it possible to analyze and combine data iteratively without impacting any of the operational systems. By the end of the analysis, the data store can be removed, and the implementation track can continue. The difference is that the team will know the data better, the design will be more precise and future issues will already have been anticipated.
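As an illustration, such an extraction procedure could look like the sketch below. It assumes that both the source and the central store are reachable through Python's built-in `sqlite3` module; a real source system would need its own database driver. The `extract` function and the table names are hypothetical, and the table name is assumed to come from a trusted inventory because it is interpolated into the SQL text.

```python
import sqlite3


def extract(source_conn, central_conn, query, target_table):
    """Run `query` on the source and copy the result into `target_table`.

    The target table is recreated on every run, so the central store
    always reflects the latest extraction.
    """
    cursor = source_conn.execute(query)
    columns = [description[0] for description in cursor.description]
    rows = cursor.fetchall()

    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)

    central_conn.execute(f'DROP TABLE IF EXISTS "{target_table}"')
    # SQLite accepts column definitions without a declared type.
    central_conn.execute(f'CREATE TABLE "{target_table}" ({column_list})')
    central_conn.executemany(
        f'INSERT INTO "{target_table}" VALUES ({placeholders})', rows
    )
    central_conn.commit()
    return len(rows)


# Demo with an in-memory stand-in for an operational system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customer_payments (customer_id INTEGER, type TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO customer_payments VALUES (?, ?, ?)",
    [(1, "credit_card", 25.0), (2, "bank_transfer", 40.0), (3, "credit_card", 12.5)],
)

central = sqlite3.connect(":memory:")
copied_rows = extract(
    source,
    central,
    "SELECT * FROM customer_payments WHERE type = 'credit_card'",
    "payments_credit_card",
)
```

Because the source is only read and the result is written to a separate store, the analysis can be repeated as often as needed without touching the operational system.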

Centralizing interesting data for analysis

The type and location of the data store will depend on a company’s technology choices. Whatever the choice, setting it up is a lot easier when typical infrastructure governance barriers are removed. If requesting a data store takes too long, adoption will not follow. If a data store is made available quickly and is ready to use with minimal effort, teams are more likely to start and, more importantly, to keep using it. Access rights can be given to the people who need them, and the necessary formalities and approvals can be registered in line with the company’s data governance procedures. This can be initiated easily without impacting other teams.

Distributed approach allowing different teams to work independently

The most basic deployment scenario is a local data store, which allows a proof of concept to be developed quickly and with minimal effort. Several possibilities exist, including SQLite, a lightweight, file-based database that offers the flexibility of a database but can be stored and shared like a normal file. Since JDBC (Java Database Connectivity) drivers are available for SQLite, many reporting and analysis tools can connect to it. When more advanced options and flexibility are needed, cloud storage offerings are an effective and affordable way to get started.

‘Start investing in your first automation framework today to reach maturity tomorrow’

For data extraction and analysis, we believe it is best to invest in an automation framework from the start. The more configuration, parameterization and decoupling are applied in the solution, the more mature it becomes. That maturity only grows over time, so it makes sense to start from the outset.
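To make configuration and parameterization concrete, the sketch below keeps the queries and their parameters in a JSON document, separate from the code that runs them. The configuration layout, the job name and the `load_jobs` and `run_job` helpers are hypothetical, and the data store is again assumed to be SQLite accessed through Python's `sqlite3` module.

```python
import json
import sqlite3

# Hypothetical configuration; in practice this would live in its own file.
CONFIG_TEXT = """
{
  "jobs": [
    {
      "name": "credit_card_payments",
      "query": "SELECT customer_id, amount FROM customer_payments WHERE type = :payment_type ORDER BY customer_id",
      "params": {"payment_type": "credit_card"}
    }
  ]
}
"""


def load_jobs(config_text):
    """Parse the configuration and return the list of job definitions."""
    return json.loads(config_text)["jobs"]


def run_job(conn, job):
    """Run one configured query, binding its named parameters."""
    return conn.execute(job["query"], job.get("params", {})).fetchall()


# Demo data standing in for the central data store.
store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE customer_payments (customer_id INTEGER, type TEXT, amount REAL)"
)
store.executemany(
    "INSERT INTO customer_payments VALUES (?, ?, ?)",
    [(1, "credit_card", 25.0), (2, "bank_transfer", 40.0), (3, "credit_card", 12.5)],
)

job_results = {job["name"]: run_job(store, job) for job in load_jobs(CONFIG_TEXT)}
```

Changing the payment type or adding a new job then only requires a change to the configuration, which is the kind of decoupling that lets the framework grow in small steps.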


Creating an automation framework to facilitate data analysis

As yet, there is no product on the market that offers all the required functions. Many automation frameworks and toolsets can help, but creative and innovative engineering is still needed. It is important to recognize this because the choice of technology and toolset can vary depending on the available skills and preferences. It is crucial to remain flexible and unconstrained in order to create and leverage reusable objects, and to scale in steps so that the solution evolves based on actual needs. This approach delivers value to the organization faster and more cost-effectively. Once you have created your first automation framework, you can start thinking about new functions and services, growing the solution step-by-step.

Step 3: Use the data in a workshop

Once the collected data is in a data store, queries can be executed and used as input for a data workshop. Data workshops are an effective way to encourage collaboration. Some of the questions to get you started in the investigation process include:

  • What data is available? (discovery)
  • How will we use the data? (opportunities)
  • What data do we need for this use case? (design)
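For the discovery question in particular, a few basic profiling figures per column often give the discussion a concrete starting point. The sketch below is one possible way to compute them, assuming the central store is SQLite accessed through Python's `sqlite3` module; the `profile_table` helper and the sample rows are hypothetical.

```python
import sqlite3


def profile_table(conn, table):
    """Return row, null and distinct-value counts for every column."""
    # A query that returns no rows still exposes the column names.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT 0')
    columns = [description[0] for description in cursor.description]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    profile = {}
    for column in columns:
        # COUNT(column) and COUNT(DISTINCT column) both ignore NULLs.
        non_null, distinct = conn.execute(
            f'SELECT COUNT("{column}"), COUNT(DISTINCT "{column}") FROM "{table}"'
        ).fetchone()
        profile[column] = {
            "rows": total,
            "nulls": total - non_null,
            "distinct": distinct,
        }
    return profile


# Demo data standing in for a table in the central data store.
workshop_store = sqlite3.connect(":memory:")
workshop_store.execute(
    "CREATE TABLE customer (customer_name TEXT, customer_address TEXT)"
)
workshop_store.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann", "1 Main St"), ("Bob", None), ("Ann", "1 Main St")],
)

customer_profile = profile_table(workshop_store, "customer")
```

A missing address or a duplicated customer name shows up immediately in these counts, which tends to trigger exactly the kind of questions a workshop is meant to surface.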

‘Make optimal use of self-service reporting tools in data analysis’

Make the workshop fast and efficient, and combine discussions fluently with technology. If it takes too long to find an answer in the data, people will disconnect. Data discovery and profiling boil down to writing and running queries. But when engineers have to write those queries on the fly, the discussion stalls and the value of the data does not come across. Technology lends a hand here in the form of self-service reporting and advanced analytics tools. These tools are built for drag-and-drop reporting: adding fields and filters and changing the visualization. The strength of these visualizations is that trends, patterns and content can be identified easily and can complement the discussions during the workshop. Large investments in expensive technologies or tools are not necessary. Many industry-leading self-service reporting tools offer free license options or can be evaluated free of charge for a short period before purchase.

Getting started

Start simple! Don’t get carried away and spend a lot of time and effort on building complex solutions. Bring passionate business and technical people together and allow them to gradually build an automation framework. Make sure you architect the solution and adhere to some basic principles: modularity, parameterization and configuration. The key lies in control: with endless possibilities, you need to grow in steps and learn from experience. New technologies and techniques can be integrated when relevant but should never be the drivers of change.

Want to know more about how our Accenture data foundation can help you derive more value from your data, one step at a time? Feel free to contact us!

Author: Peter Billen