Looking to Spur AI Democratization? Data Literacy and Data Stewardship Hold the Key
Organizations have realized that adopting AI can influence their business growth trajectory immensely. They no longer see the application of AI as the preserve of a small, centralized, highly skilled group of data scientists. Rather, they would prefer to see business analysts and citizen developers, who are intimately involved with the business domain and sit within the lines of business or support functions, applying AI to business processes as part of their daily routine. Several AutoML tools now abstract away the complexity of building AI solutions. In any large organization, though, there are multiple data sources and data platforms, several data processing engines, and thousands of data pipelines. The challenge therefore is not in selecting the right AI models or in training and testing them, but in the data foundation that lies underneath. This blog highlights how data literacy and data stewardship play a major role in making this transformation happen.
The Data Conundrum
On one hand, it is difficult to know what data sources exist and how to access them, which makes a strong case for giving citizen developers uninhibited access to search for and use data across the enterprise. On the other hand, it is essential for the organization to establish guardrails around who should have access to which datasets and whether certain data attributes should be masked or anonymized, while ensuring that regulatory compliance requirements are met. Any data breach, unauthorized access violation, or data loss can damage the market standing of the organization and invite penalties or legal action.
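To make the idea of such guardrails concrete, here is a minimal Python sketch of role-based data views with masking. It is an illustration only, not a reference to any specific product; the roles, column names, and policy table are hypothetical assumptions.

```python
import hashlib

import pandas as pd

# Hypothetical policy table: which columns each role may see,
# and which of those must be pseudonymized before release.
ROLE_POLICIES = {
    "citizen_developer": {"allowed": ["customer_id", "region", "spend"],
                          "mask": ["customer_id"]},
    "data_scientist":    {"allowed": ["customer_id", "email", "region", "spend"],
                          "mask": ["email"]},
}

def pseudonymize(value: str) -> str:
    """One-way hash so records stay joinable without exposing raw identifiers."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def guarded_view(df: pd.DataFrame, role: str) -> pd.DataFrame:
    """Return only the columns a role is entitled to, with sensitive ones masked."""
    policy = ROLE_POLICIES[role]
    view = df[[c for c in policy["allowed"] if c in df.columns]].copy()
    for col in policy["mask"]:
        if col in view.columns:
            view[col] = view[col].astype(str).map(pseudonymize)
    return view

# Toy usage: a citizen developer sees spend by region, but not raw identifiers.
customers = pd.DataFrame({
    "customer_id": ["C001", "C002"],
    "email": ["a@example.com", "b@example.com"],
    "region": ["EMEA", "APAC"],
    "spend": [1200.0, 450.0],
})
print(guarded_view(customers, "citizen_developer"))
```

In a real enterprise, the policy table would live in a data catalog or access governance platform rather than in code, but the shape of the decision, entitlement plus masking, is the same.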
How does one overcome this data conundrum? In my view, this calls for a strong campaign to drive data literacy across the organization and establish data stewardship as a strategic imperative.
Understanding Data Literacy
There are many views on what data literacy should entail in an organization. At every level, it becomes imperative to evaluate the following:
- Are we asking the right questions or solving for the right problems?
- Do we know who the data custodians are? Can the data be accessed and managed securely?
- Can we ascertain where the data is coming from? Is it the most recent version of truth?
- Is the data of good quality? Does the data have inherent biases?
- Can we be sure that our data consumers are not accessing data that compromises the organization’s regulatory compliance posture?
- Can we interpret the data well and test hypotheses, for example through A/B tests, to see which results pan out? (A minimal sketch follows below.)
- Can we create easy-to-understand visualizations and tell a story that helps decision-makers see the big picture and act on the results of analysis? Does the data storytelling go well beyond what is already known and obvious?
- Can the insights generated be integrated with business processes and workflows for end outcomes?
While a significant part of this can be addressed with advanced data engineering skills (for the purpose of this blog, we assume that is a given), there is a strong case for understanding the business context of the data and for the soft skills needed to work through the questions listed above.
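As one concrete example of the hypothesis-testing question above, here is a minimal Python sketch of an A/B test on a synthetic metric. The experiment, metric, group sizes, and effect are all invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical experiment: does a new checkout flow (variant B) change
# average order value compared with the current flow (variant A)?
rng = np.random.default_rng(42)
order_value_a = rng.normal(loc=52.0, scale=12.0, size=500)  # control group
order_value_b = rng.normal(loc=54.0, scale=12.0, size=500)  # treatment group

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(order_value_b, order_value_a, equal_var=False)

print(f"mean A = {order_value_a.mean():.2f}, mean B = {order_value_b.mean():.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the variants likely differ.")
else:
    print("Insufficient evidence that the variants differ.")
```

The data literacy point here is less about the statistics library and more about knowing which question the test answers, and which it does not.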
This reminds me of some of the early work we performed for one of our clients. We were delivering an AI solution for predictive IT operations, and it took us a significant amount of time, several months in fact, to painstakingly gather data from over 80 data sources and federate it into a data lake. However, when the analytic insights to predict anomalies and outages at a business service level were generated, we realized these insights were not integrated with the client's existing processes and service management systems. This led to significant delays in realizing end outcomes. It was only after the existing service management team was apprised of what data was being used to generate these insights, trained on how these insights ought to be used, and given the integration with their service management tooling, that we began to realize breakthrough business outcomes in terms of business service level predictability and end user experience.
The major lesson for us was that a broader campaign around data literacy, for data producers and data consumers within the organization at all levels, should become a cornerstone of any organizational data strategy. For working professionals and those aspiring to equip themselves adequately, nasscom's FutureSkills Prime is a wonderful digital skilling initiative that offers many training programs around Big Data and Analytics.
Understanding Data Stewardship
Simply put, a data steward is a subject matter expert responsible for a group of datasets or a data domain within a line of business or a support function.
The data steward ensures that business glossaries are defined, creates and maintains data quality rules, and helps execute the data governance strategy. Data stewards also oversee the data custodians, business stakeholders, and operations team members who produce and consume data, and can act as catalysts in accelerating the data literacy campaign within the remit of their function.
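To illustrate what "creates and maintains data quality rules" can look like in practice, here is a minimal Python sketch of declarative rules over a customer dataset. The domain, column names, and rules are hypothetical; in practice a steward would typically express these in a data quality or catalog tool rather than in raw code.

```python
import pandas as pd

# Hypothetical quality rules a steward might maintain for a "customer" domain:
# each rule is a name plus a predicate that must hold over the whole dataset.
QUALITY_RULES = {
    "customer_id_not_null": lambda df: df["customer_id"].notna().all(),
    "email_looks_valid": lambda df: df["email"].str.contains("@", na=False).all(),
    "spend_non_negative": lambda df: (df["spend"] >= 0).all(),
}

def run_quality_checks(df: pd.DataFrame) -> dict:
    """Evaluate every rule and report pass/fail, e.g. for a stewardship dashboard."""
    return {name: bool(rule(df)) for name, rule in QUALITY_RULES.items()}

# Toy dataset that deliberately violates all three rules.
customers = pd.DataFrame({
    "customer_id": ["C001", None],
    "email": ["a@example.com", "not-an-email"],
    "spend": [1200.0, -5.0],
})
print(run_quality_checks(customers))
# {'customer_id_not_null': False, 'email_looks_valid': False, 'spend_non_negative': False}
```

The value of making rules explicit like this is that pass/fail results can be published alongside the dataset, so data consumers can judge fitness for purpose before they build on it.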
The role of the data steward in creating good data assets cannot be overstated. Without data stewardship as a key pillar of the organization's data strategy, any data democratization or AI democratization campaign can very quickly become a pipe dream, or worse, a nightmare.
Conclusion
AI democratization is a welcome paradigm shift that is being made possible by newer technologies and solutions that are demystifying AI for broader, general application across the enterprise. But to get it right, there has to be an ongoing, conscious data literacy campaign within the organization and a sharp emphasis on data stewardship.
Written by Naveen Kamat, Executive Director & CTO, Data & AI Services, Kyndryl