Sean Nathaniel is the CEO of DryvIQ, the unstructured data management company trusted by more than 1,100 organizations worldwide.
On a recent executive communications call, a leader of a large enterprise shared how their 120,000 users were actively involved in improving data quality across the organization. Intrigued, I looked at other enterprise customers and saw a pattern: organizations that involved content owners in data quality management often saw better results than those that relied solely on automation.
As businesses navigate an increasingly data-driven world, managing the quality of unstructured data has become imperative, especially as many prepare to implement generative AI solutions like Microsoft Copilot. Without high-quality data, these solutions cannot fully deliver on their promise, but achieving the right level of data quality is a challenge. The volume and complexity of unstructured data make it impossible to rely solely on human analysis or manual management. While automation has opened up new possibilities, many businesses still struggle to leverage the content their knowledge workers create for transformative GenAI initiatives.
The key to meeting this challenge is combining human insight with technological efficiency through human-in-the-loop data quality automation. This approach helps improve data reliability and keeps processes secure and scalable, giving organizations greater confidence that their AI investments will deliver better results.
The Nuances Of Knowledge Worker Content
Unstructured data is produced at a staggering rate as both computers and humans constantly create and update it. Each form of content, along with its growing volume, creates significant data quality management challenges.
Computer-generated documents (such as automated invoices, bills of lading, transfers, and other data from line-of-business (LOB) applications) are quickly piling up in repositories, creating mountains of dark data. This dark data can become digital baggage: while it may have been relevant at some point, estimating its value is difficult at scale. Automation is essential to manage the quality of this data.
In contrast, user-generated data (design specifications, patents, sales and marketing strategies, financial forecasts, and other forms of intellectual property) has significant value but is much more difficult to manage. The challenge with this data is determining its relevance at scale, as file attributes alone do not provide enough insight for unsupervised automation. Data owners hold the key to understanding its meaning, making human involvement vital.
While fully automated systems offer efficiency, they lack the context that only humans can provide. For example, a system might automatically archive old but still relevant data, such as brand documents or engineering files, based on age or activity timestamps rather than their actual meaning, which is something only a human could interpret. However, relying on users to review and archive every document is impossible at this volume; people can’t keep up.
So how do organizations solve this challenge and ensure the highest possible data quality to power transformational initiatives? The answer lies somewhere in the middle.
Aligning Data Quality With Human-In-The-Loop Automation
Human-in-the-loop automation is an effective way to achieve high-quality unstructured data. It’s a hybrid approach that combines the contextual understanding and intuition of humans with the speed and scalability of automation. While automated systems can infer much based on document contents and file characteristics, the document owner ultimately determines whether the data is valuable and relevant.
Human-in-the-loop data management systems incorporate human judgment into automated processes, inviting users to review, improve, and validate automated results at specific points. This helps ensure accuracy and relevance at scale, improving efficiency without increasing operational risk.
By strategically incorporating human feedback into automated data management, organizations can achieve the level of data quality needed to support successful data-driven initiatives.
Human-In-The-Loop Automation Implementation Considerations
Implementing human-in-the-loop automation starts with creating data management workflows that pair automated actions with human validation before finalization. There are two main considerations:
1. What is the goal?
The goal of your data initiative will shape your workflow. Are you preparing data for a GenAI solution? Automating data retention and compliance? Reducing storage costs? Determining what makes a document relevant and where it should go (for example, to an archive or the recycle bin) depends on the business objective.
2. Who should be involved?
Who should provide feedback to your human-in-the-loop system? While the owner of the document is often the best authority on its value, that person is not always the creator of the document. Identify the data owners in each business unit and ensure they are involved in the development of the workflow, as they will have the final say in validating the automated process.
What Human-In-The-Loop Automation Looks Like
Based on my experience, the following are examples of workflows that businesses could use to implement human-in-the-loop automation in data management.
Data Classification
Using AI-powered data discovery to analyze document content, the workflow automatically applies or validates classification tags and then asks the data owner to confirm them, ensuring high accuracy.
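As a rough illustration of this classification workflow, the Python sketch below separates the automated proposal from the owner's confirmation. Everything in it is hypothetical: the keyword table stands in for an AI-powered discovery engine, and the names `Document`, `propose_tag` and `owner_review` are invented for the example rather than taken from any product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword rules standing in for an AI-powered discovery engine.
KEYWORD_TAGS = {
    "invoice": "finance",
    "patent": "intellectual-property",
    "forecast": "finance",
}


@dataclass
class Document:
    name: str
    text: str
    owner: str
    proposed_tag: Optional[str] = None   # set by automation
    confirmed_tag: Optional[str] = None  # set only after human review


def propose_tag(doc: Document) -> Document:
    """Automated step: suggest a tag based on the document content."""
    lowered = doc.text.lower()
    for keyword, tag in KEYWORD_TAGS.items():
        if keyword in lowered:
            doc.proposed_tag = tag
            break
    return doc


def owner_review(doc: Document, accept: bool, override: Optional[str] = None) -> Document:
    """Human step: the owner confirms the proposal or supplies a correction."""
    doc.confirmed_tag = doc.proposed_tag if accept else override
    return doc


# The automation proposes a tag; nothing is final until the owner responds.
doc = propose_tag(Document("q3.docx", "Q3 revenue forecast", owner="alice"))
doc = owner_review(doc, accept=True)
```

Keeping `proposed_tag` and `confirmed_tag` as separate fields is one way to make sure downstream systems act only on tags a person has validated.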
Data Retention and Archiving
The workflow manages data retention and archiving by following rules for deleting or retaining data according to legal requirements and creating an archival cache based on document age and usage. It incorporates periodic human validation, prompting data owners to confirm relevance and ensure compliance with retention policies before finalizing actions.
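A minimal sketch of such a retention workflow, under stated assumptions, might look like the following Python. The seven-year retention period, the two-year staleness threshold, the `legal_hold` flag and the rule that an owner can veto any archive or delete proposal are all illustrative choices, not requirements drawn from a specific regulation or tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed thresholds; real values would come from legal and business policy.
RETENTION_PERIOD = timedelta(days=7 * 365)
STALE_AFTER = timedelta(days=2 * 365)


@dataclass
class FileRecord:
    path: str
    created: date
    last_accessed: date
    legal_hold: bool = False


def propose_action(record: FileRecord, today: date) -> str:
    """Automated step: suggest an action from age and usage alone."""
    if record.legal_hold:
        return "retain"
    if today - record.created > RETENTION_PERIOD:
        return "delete"
    if today - record.last_accessed > STALE_AFTER:
        return "archive"
    return "retain"


def finalize(proposed: str, owner_says_relevant: bool) -> str:
    """Human step: an owner who marks the file as relevant blocks the proposal."""
    if proposed in ("archive", "delete") and owner_says_relevant:
        return "retain"
    return proposed


today = date(2025, 1, 1)
# An old brand document: untouched for years, yet still meaningful to its owner.
brand_guide = FileRecord("brand/guidelines.pdf", date(2021, 3, 1), date(2022, 6, 1))
proposal = propose_action(brand_guide, today)            # automation suggests "archive"
decision = finalize(proposal, owner_says_relevant=True)  # owner keeps it: "retain"
```

Here the timestamps alone would have archived the brand guidelines, and the owner's validation is what keeps a still-relevant file in place.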
Employee Offboarding
The workflow automates the management of departing employees' data and accounts, prompting administrators to validate the content being retained or archived, which reduces the accumulation of old data. It also improves the handling of user accounts and enforces retention policies, minimizing manual oversight while keeping storage efficient and compliant.
People And Technology: Unlocking The True Potential Of Data
High-quality data is essential for digital transformation, whether for GenAI adoption, improved compliance and security, IT modernization or optimized organizational efficiency. However, achieving the right level of quality in unstructured data requires more than just automation. A strategic, hybrid approach that combines human intelligence with the speed of automation can help prepare your document assets for a data-driven future.
Embracing this balance and sharing responsibility for data quality can better ensure that your organization is ready to fully leverage the potential of your data for future innovations. This is more than just an assumption – it’s a best practice among some of the world’s leading businesses.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.