“AI” is a lofty term that doesn’t have a universal definition. And in fact, AI is not even new. The field of research in artificial intelligence was established at Dartmouth College in 1956. The term “machine learning” was coined three years later by a man named Arthur Samuel, who taught a computer to play checkers.
What is new is that a “perfect storm” convergence of advancements in the last decade – better algorithms, increased computing power, sophisticated processing techniques, and the advent of big data – has brought AI out of theory and into operation.
The result is that industry after industry is being transformed. Two years ago, the AI researcher Andrew Ng announced that “AI is the new electricity,” and he joked that the only industry to be unaffected by AI may be hairdressing. Mr. Ng has since been proven wrong by L’Oreal, which recently launched a 3D hair color simulation app that’s powered by AI.
Legal leaders are left trying to sort through all the generalized AI hype so that we can put it to work in practical ways. Adoption is happening slowly. Last year’s study from Altman Weil found that 29% of firms are exploring their options for incorporating AI tools, but only 7.5% have actually begun to use them. At the same time, 72% of respondents believe that the pace of change in the profession will only continue to increase.
The implications of this change may be particularly profound in the areas of privilege and responsive reviews, areas where our challenges are rapidly evolving. ESI is appearing at massive scale and in a proliferating variety of forms. At these volumes, turnaround times become impossible to meet without employing armies of reviewers.
Ever since Judge Andrew J. Peck accelerated the adoption of TAR from the bench in Da Silva Moore v. Publicis Groupe, litigators and corporate counsel have been exploring what it means to introduce technology to discovery. For many of us, that has meant exploring predictive coding and technology-assisted review.
However, this technique has two significant challenges that make it difficult to employ in practice. First, predictive coding requires a subject matter expert to hand-label documents to create a “seed set.” Typically this expert is a senior attorney with deep knowledge of the matter, who is unlikely to have the time to devote to this effort. Second, predictive coding is constrained to looking within the four corners of the document, which means that it doesn’t take into account any of the broader ecosystem of documents, where the most valuable context lies.
At Caesars Entertainment, where I served for six years as Chief Counsel, E-Discovery and Information Governance, I searched for opportunities to test the most advanced technology available so we could continue to balance our growing legal risk against ever-growing volumes of electronic data.
I designed an experiment with the AI company Text IQ, whose technology traverses huge scales of unstructured text to find sensitive information, such as privileged material, responsive information, Personally Identifiable Information (PII), reputationally damaging information, and codewords.
In my experiment, I gathered the 80,000 responsive documents that a team of human reviewers had identified in a highly contentious litigation we had completed. In that case, we had employed our proven, human-managed review process to review those 80,000 documents specifically for privilege. Of these, 1,200 went to second-level privilege review.
I gave the same 80,000 documents to Text IQ, along with a list of attorney names, and the names and domains of outside counsel that we had used in the manual review. This is the same supporting documentation that was given to the human reviewers in the control portion. In a fraction of the time, Text IQ identified 1,800 of the 80,000 documents as potentially privileged.
The next question was: how much of what the humans found did the AI find? The answer was a 1:1 correlation. Every document that the humans found, the AI found – plus 600 more.
Accompanying each of these documents, Text IQ produced a “reason report” to explain why its AI had identified each document as privileged. Text IQ also reported on the “shadow network” of attorneys and law firms that it had found in the data set that were not originally listed as known inputs.
For about a year, I replaced our large-scale first pass privilege review with the Text IQ system while continuing to run our standard human review process in parallel to identify responsive privileged documents, knowing that only a fraction of what these attorneys uncovered would turn out to be responsive.
Having seen success in terms of speed and accuracy, I looked to expand my use case for this technology within our discovery process. If AI could automate privilege review, one of the most challenging stages of litigation, then what else could it automate? I applied technology from Text IQ in a pilot to test substantive responsiveness (or “first pass review”), running side by side with human review.
The results amazed me. The AI quickly culled 50% of the document population as non-responsive, allowing me to drastically reduce the time and money spent paying attorneys to review documents manually. Text IQ also assigned a score to each document to support sampling and cut-off determinations.
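To make the score-and-cutoff idea concrete, here is a minimal sketch of how per-document scores can drive a cull decision and a validation sample. Everything in it is invented for illustration – the document IDs, the random scores, the 0.5 cutoff, and the sample size are assumptions, not Text IQ’s actual output or methodology.

```python
import random

random.seed(7)

# Hypothetical per-document responsiveness scores in [0, 1),
# standing in for whatever a scoring engine might assign.
scores = {f"DOC-{i:05d}": random.random() for i in range(10_000)}

CUTOFF = 0.5  # assumed threshold: documents scoring below this are culled

review_set = [d for d, s in scores.items() if s >= CUTOFF]
culled_set = [d for d, s in scores.items() if s < CUTOFF]

# Validation sample: draw randomly from the culled population so reviewers
# can estimate how many responsive documents were wrongly culled (elusion).
sample = random.sample(culled_set, 200)

print(f"Sent to human review: {len(review_set)}")
print(f"Culled as non-responsive: {len(culled_set)}")
print(f"Validation sample size: {len(sample)}")
```

The cutoff itself is a judgment call: lowering it sends more documents to human review, raising it saves more cost but increases the risk that the validation sample surfaces missed responsive material.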
From predictive coding to “true AI”
The reason for its success is that AI can address the two biggest challenges with predictive coding that limit its effectiveness in practice. First, predictive coding systems rely on less powerful machine learning techniques than true AI does. Even the most advanced predictive coding technologies begin with a small seed set of labeled documents. Because of its size and its predefined labels, this seed set will have biases, blind spots, and idiosyncrasies that only become magnified as the resulting algorithms are applied to a large-scale set of real data.
To clarify, when I use the term “AI,” I’m referring to unsupervised machine learning, where a system ingests huge scales of unstructured data and makes meaningful deductions that no team of humans could make at that scale and velocity.
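The distinction from seed-set approaches can be sketched in a few lines: an unsupervised method is given no labels at all, and any grouping emerges from the data itself. The toy corpus and the greedy similarity-based clustering below are my own illustrative assumptions, far simpler than any production system.

```python
def tokens(text):
    """Lowercased word set for a crude bag-of-words comparison."""
    return set(text.lower().split())

def jaccard(a, b):
    """Overlap between two token sets (intersection over union)."""
    return len(a & b) / len(a | b)

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: attach each document to the first
    cluster whose seed document is similar enough, else start a new
    cluster. No labels or seed sets -- structure emerges from the data."""
    clusters = []  # each entry: (seed token set, [doc ids])
    for doc_id, text in docs.items():
        t = tokens(text)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((t, [doc_id]))
    return [members for _, members in clusters]

# Toy corpus, invented for illustration.
docs = {
    "d1": "board meeting agenda for the land development deal",
    "d2": "revised agenda for the land development board meeting",
    "d3": "lunch order for friday",
}
print(cluster(docs))  # the two land-development emails group together
```

A supervised predictive coding system would instead need a human to label examples of each category up front, and would inherit whatever biases that small labeled set contains.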
The second practical challenge with predictive coding is that it is confined to the four corners of individual documents. Even a human being will find it difficult to take a document in isolation and decide whether or not it’s sensitive. That’s because so much of what makes a document sensitive has to do with the relationships between the people who are communicating, the context of the document within its larger ecosystem, and all the non-obvious connotations and subtexts that our language carries based on the social network we’re communicating with.
AI goes beyond the four corners of the document and incorporates context and relationships between parties and concepts within the document. Imagine an email where someone says: “I talked to Becky, and she said we need to ‘be careful how we proceed.’” AI can combine Natural Language Processing with a deep understanding of the underlying social networks in order to deduce that “Becky” is “Rebecca Schmidt,” an attorney, and she is referencing a known legal risk in moving forward with a new land development deal because of the tricky regulatory environment in that jurisdiction.
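The core of that deduction is resolving an informal name against the people actually present in the communication network, then checking their role. The directory, nickname table, and email addresses below are all invented placeholders, and this is a deliberately tiny sketch of one step, not a model of Text IQ’s system.

```python
# Hypothetical directory and nickname table (all names invented).
directory = {
    "rebecca.schmidt@example.com": {"name": "Rebecca Schmidt", "attorney": True},
    "tom.ward@example.com": {"name": "Tom Ward", "attorney": False},
}
NICKNAMES = {"becky": "rebecca", "becca": "rebecca", "tommy": "tom"}

def resolve(nickname, participants):
    """Resolve an informal first name against the people in the email's
    communication network -- context the four corners of the document
    alone cannot supply."""
    canonical = NICKNAMES.get(nickname.lower(), nickname.lower())
    for email in participants:
        person = directory.get(email)
        if person and person["name"].split()[0].lower() in (canonical, nickname.lower()):
            return person
    return None

participants = ["rebecca.schmidt@example.com", "tom.ward@example.com"]
person = resolve("Becky", participants)
if person and person["attorney"]:
    print(f"'{person['name']}' is an attorney -> flag for privilege review")
```

Once “Becky” is grounded to a known attorney in the thread, the surrounding language (“be careful how we proceed”) can be weighed very differently than it would be in isolation.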
As this AI has become embedded, it has gained an “organizational fluency” that increases its accuracy over time. The result is that I could better manage latent risk, while also reducing time and costs. On certain cases with ultra-short turnaround times that would not have been manageable under my traditional consecutive workflows, I even began to deploy Text IQ to run my privilege review contemporaneously with my first pass responsiveness review. Ultimately, AI is not only transforming the results of my discovery; it is also transforming the way I manage the discovery process itself.