Over the past 20 years, the cost of technology and the way we access it have changed dramatically. We have moved away from only working with specialized vendors to using more and more open source software. Cloud service providers now provide easy and affordable access to advanced data and analytics capabilities. At the same time, the quantity and variety of data is growing, generated by connected devices (smartphones, Internet of Things, etc.).
While companies are investing heavily to determine what they can get out of all this data, they are not investing equally in the capability to verify it. Without verification, data may be outdated, inconsistent or inaccurate. And as more data enters the enterprise from external sources beyond the company’s control, companies are exposed to a new kind of vulnerability: manipulation, misuse, or even misinterpretation. When unverified data is used to drive advanced analytics and forecasting systems, it can create skewed and inaccurate insights, with serious consequences. In particular, it makes accumulated risk harder, if not impossible, to correct in the future. If you can’t trust your own insights, how will you maximize and maintain the trust of your customers?
Data trust issue examples
Trustworthy data is essential to the success of an intelligent enterprise
Embedding trust in data, and in how it is used within a company, does not happen overnight. By combining data governance, architecture and technology, Accenture works with companies to build confidence in their data-driven insights and to stay alert to new potential threats. As Apple CEO Tim Cook answered when Kara Swisher asked him, “If you were Mark Zuckerberg, what would you do?”: “I wouldn’t be in the situation.”
Data veracity is one of the technology trends in Accenture’s Technology Vision 2018. The website and report explain the importance of trustworthy data and what companies can do to address this new vulnerability. Several good examples across different industries are worth reading to complement this article.
From data warehouse to central data platform
Companies rely on data coming from a variety of sources. It is important to verify whether these sources corroborate and are consistent with each other before trusting the data they provide. In other words, data veracity is key. In the past, companies relied on data warehousing, which implied that data would be gathered at the end of the chain and a report produced afterwards. Today, because data plays such a central role in organizations, it is stored in centrally aggregated platforms. This means that all data is gathered in one location, not only for reporting purposes, but also to facilitate immediate interaction with customers, suppliers and employees. Such a platform also lets companies build and run complex analytical models and store vast amounts of historical data, waiting for insights to be uncovered.
Where does data veracity feature inside these central data platforms? The answer is quite simply: everywhere
However, putting all the data in one place also makes it very vulnerable, which is why it needs to be secured properly. Where does data veracity feature inside these platforms? The answer is quite simply: everywhere. First, there is the traditional data quality check, i.e. checking for completeness, consistency, conformity, accuracy, integrity and timeliness. Secondly, data scientists look at the data, verifying whether it is correct, where it originated from, how much risk is involved, etc. Thirdly, there is the real-time evaluation of events, which determines whether the data can be trusted.
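The first of these layers can be made concrete with a small sketch. The function below runs three of the traditional quality dimensions mentioned above (completeness, conformity, timeliness) against a single record; the field names, the e-mail pattern and the 24-hour freshness window are illustrative assumptions, not a prescribed standard.

```python
from datetime import datetime, timedelta

def quality_checks(record: dict) -> dict:
    """Run basic completeness, conformity and timeliness checks
    on one record. Field names and thresholds are illustrative."""
    results = {}
    # Completeness: required fields must be present and non-empty
    required = ("customer_id", "email", "created_at")
    results["completeness"] = all(record.get(f) for f in required)
    # Conformity: e-mail must match a crude expected shape
    email = record.get("email", "")
    results["conformity"] = "@" in email and "." in email.split("@")[-1]
    # Timeliness: record must be less than 24 hours old
    created = record.get("created_at")
    results["timeliness"] = (
        created is not None
        and datetime.now() - created < timedelta(hours=24)
    )
    return results

record = {"customer_id": "C042", "email": "a@example.com",
          "created_at": datetime.now()}
print(quality_checks(record))
# {'completeness': True, 'conformity': True, 'timeliness': True}
```

In a real platform each failed dimension would lower the record’s quality grade rather than simply reject it, so downstream models can weigh the data accordingly.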
Accenture data platform
Just like we use our senses of sight, hearing, smell, touch and taste to verify the authenticity of something, or whether or not to trust someone, data platforms need ways to sense the truthfulness and accuracy of real-world experiences. With centrally aggregated data platforms, real-time data qualification helps to evaluate and grade the trustworthiness of the data. Machine learning and advanced analytics also verify the data’s context and calculate the risk involved using secondary models and metadata.
We need to start thinking about technologies and techniques that can help safeguard our data from inaccuracies
Depending on the context associated with data collection and its use (where, when, why, what, how and by whom?), risk scoring can be performed and the data then tagged accordingly. We need to start thinking about technologies and techniques that can help safeguard our data from inaccuracies, whether they are introduced maliciously or accidentally. For instance, when identifying a client on an online channel, a login with a distributed secure token will result in a higher level of trust than a self-service created login. By integrating these types of evaluations into advanced analytical models, the risk exposure to data-related harm can be reduced.
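The client-identification example above can be sketched as a simple mapping from authentication context to a trust score, which is then tagged onto each event. The method names and numeric weights here are invented for illustration; a real platform would calibrate them against its own risk standards.

```python
# Assumed trust weights per authentication method (illustrative only).
TRUST_LEVELS = {
    "distributed_secure_token": 0.9,  # e.g. a bank- or state-issued eID
    "two_factor": 0.7,
    "self_registered_login": 0.4,     # self-service created account
    "anonymous": 0.1,
}

def tag_with_trust(event: dict) -> dict:
    """Attach a trust score based on how the client was identified;
    unknown methods default to zero trust."""
    score = TRUST_LEVELS.get(event.get("auth_method"), 0.0)
    return {**event, "trust_score": score}

secure = tag_with_trust({"customer_id": "C042",
                         "auth_method": "distributed_secure_token"})
weak = tag_with_trust({"customer_id": "C042",
                       "auth_method": "self_registered_login"})
print(secure["trust_score"], weak["trust_score"])  # 0.9 0.4
```

Once every event carries such a tag, downstream analytical models can discount low-trust records instead of treating all inputs as equally reliable.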
Implementing data veracity touches on many aspects and needs a step-by-step approach. As part of the data platform architecture, Accenture offers a metadata foundation that integrates all relevant capabilities and allows them to grow in maturity over time. For every new initiative, available services can be used while new ones can be added to any capability in a modular way.
Accenture metadata foundation
Accenture’s point of view on facilitating ethical decisions throughout the data supply chain suggests that organizations wanting to create strong relationships with users and maximize data value should begin with strong controls at every stage of that chain. This point of view provides a set of questions that can be used as a starting point to dive deeper into specific concerns at each stage.
Data veracity not only relies on architecture and technologies. Data scientists are also needed to work with stakeholders to determine the embedded risks across data supply chains, and to set standards for how much risk is acceptable based on business priorities and the implications of automated decisions.
To reach next generation data excellence, we need data governance
Data governance typically deals with processes and technologies describing the organization’s available data, who owns it and how it is governed throughout the data life cycle. A business meaning is assigned to each individual data item, which is used as the basis to define standards with respect to security, quality, etc. With the introduction of data veracity, additional standards are added to define levels of trust and accuracy.
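As a concrete illustration of the governance metadata described above, the sketch below models a catalog entry that records a data item’s business meaning, owner, and standards, extended with a minimum-trust threshold for veracity. All field names and values are assumptions made for the example, not an Accenture specification.

```python
from dataclasses import dataclass

@dataclass
class DataItem:
    """One governed data item: its business meaning, owner, and the
    security, quality and (new) trust standards that apply to it."""
    name: str
    business_meaning: str
    owner: str
    security_class: str    # e.g. "confidential"
    quality_standard: str  # e.g. "conforms to RFC 5322 format"
    minimum_trust: float   # veracity standard: lowest acceptable score

# Hypothetical catalog entry
catalog = {
    "customer_email": DataItem(
        name="customer_email",
        business_meaning="Primary contact address for a customer",
        owner="CRM team",
        security_class="confidential",
        quality_standard="conforms to RFC 5322 format",
        minimum_trust=0.7,
    )
}

def meets_trust_standard(item_name: str, observed_trust: float) -> bool:
    """Compare an observed trust score against the catalog's standard."""
    return observed_trust >= catalog[item_name].minimum_trust

print(meets_trust_standard("customer_email", 0.9))  # True
print(meets_trust_standard("customer_email", 0.4))  # False
```

Linking such entries to the systems that store and process the data is what lets stakeholders enforce the standards automatically rather than by manual review.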
These definitions and standards are then linked with the systems that store, process and transport the data, allowing stakeholders to trace each data item’s history back to its origin. In the same way that data is aggregated centrally, this information – captured in the metadata foundation – is also centralized. Combining it with the right tools allows stakeholders to monitor behavior and context around the data life cycle, mitigating risks that threaten data integrity.
Companies don’t need to start from scratch. They can make use of their existing organizational resources – ramping up existing efforts in data integrity and security. At the same time, investments in cybersecurity and data science can be used to build a data intelligence practice to address data veracity issues. Anomaly detection, for example, can be adapted to monitor data creation, whether the data is originating from physical devices (for instance, checking connected devices for hardware and firmware compliance) or coming from virtual events (for instance, monitoring user events on a server to determine whether they are from a real human user or a bot). By investing in data veracity today, companies will maximize the value of their data – and the data-driven insights and strategies they rely on – for the future.
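The bot-versus-human example above can be illustrated with a toy anomaly detector: a stream of user events is flagged as bot-like when requests arrive both very fast and with near-machine regularity. The 200 ms and 20 ms thresholds are invented for the sketch, not tuned production values.

```python
from statistics import mean, stdev

def looks_like_bot(intervals_ms: list) -> bool:
    """Flag an event stream whose inter-event gaps are too fast and
    too regular to plausibly come from a human user."""
    if len(intervals_ms) < 3:
        return False  # not enough evidence to judge
    avg = mean(intervals_ms)
    spread = stdev(intervals_ms)
    # Sub-200 ms average gaps with almost no variation -> bot-like
    return avg < 200 and spread < 20

human = [850, 1200, 640, 2100, 930]  # irregular, second-scale gaps
bot = [101, 99, 100, 102, 98]        # fast and metronomic
print(looks_like_bot(human), looks_like_bot(bot))  # False True
```

Real deployments would of course use richer signals (device fingerprints, firmware attestation, behavioral models), but the principle is the same: score the plausibility of data at the moment it is created.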