"How E-Discovery Software Is Helping Battle COVID-19" - Robert Ambrogi
Below is an article originally written by Robert Ambrogi and published by Above the Law on June 29, 2020. This article includes information about PowerToFly Partner Relativity.
The response contrasts with the legal profession's slow pace of adoption of cutting-edge AI technology.
Artificial intelligence software developed to help litigation attorneys get more quickly to the core of a case is now showing promise in helping medical researchers fast-track their inquiries into how to treat COVID-19.
At the University of Waterloo in Ontario, Canada, e-discovery pioneers Maura R. Grossman and Gordon V. Cormack have found a new use for machine-learning technology they developed to help attorneys more quickly sift through large collections of discovery documents — helping medical staff more quickly search massive databases of COVID-related clinical studies.
Meanwhile, data scientists and product managers at e-discovery company Relativity are employing several of their technology tools for the similar purpose of helping medical researchers more quickly review data sets of journal articles and medical literature with the goal of better equipping them to battle COVID-19.
Grossman and Cormack are well known in the e-discovery field for developing a technology-assisted review (TAR) tool that uses a continuous active learning (CAL) protocol. Of the various TAR or predictive coding tools on the market, theirs has been scientifically demonstrated to deliver the best results.
When the coronavirus crisis hit, Grossman and Cormack had already been dabbling in the use of TAR to research health topics, Grossman told me recently. Grossman, formerly e-discovery counsel at Wachtell, Lipton, Rosen & Katz in New York, is now research professor and director of the Women in Science Program in the school of computer science at Waterloo. Cormack is a professor at the computer science school.
They saw a process that had many parallels to law, in that expensive medical researchers were spending large amounts of time reviewing hundreds or thousands of clinical studies, just as expensive lawyers spend large amounts of time reviewing documents in discovery.
Seeing an opportunity to help, they began working with the knowledge synthesis team at St. Michael's Hospital in Toronto, on behalf of the Canadian Frailty Network and Health Canada, to automate literature searches related to COVID-19.
The goal, as described in an article posted by the computer science school, was to help the team quickly identify clinical studies that have evaluated the effectiveness and safety of various measures to keep nursing facilities safe, as well as treatments for patients with COVID-19.
Using their CAL technology, Grossman and Cormack have been able to help St. Michael's researchers complete in two weeks reviews that would typically take a year or more.
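For readers curious about what a continuous active learning loop looks like in general, here is a minimal sketch in Python. It is not Grossman and Cormack's tool: the scoring model (smoothed per-term log-odds), the function names, the sample documents, and the batch and budget parameters are all assumptions chosen for illustration. The idea it shows is the general CAL cycle: train on what has been reviewed so far, rank the unreviewed documents, have a human review the top-ranked ones, and retrain.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; returns the set of distinct terms."""
    return set(text.lower().split())


def train(labeled):
    """Fit smoothed per-term log-odds of relevance from (text, is_relevant) pairs."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else non).update(tokenize(text))
    n_rel = sum(1 for _, is_relevant in labeled if is_relevant)
    n_non = len(labeled) - n_rel
    return {
        term: math.log((rel[term] + 1) / (n_rel + 2))
        - math.log((non[term] + 1) / (n_non + 2))
        for term in set(rel) | set(non)
    }


def score(weights, text):
    """Sum the learned weights of the terms that appear in the text."""
    return sum(weights.get(term, 0.0) for term in tokenize(text))


def continuous_active_learning(docs, seed_id, reviewer, batch_size=2, budget=4):
    """Run a toy CAL loop.

    docs:     dict mapping document id to text
    seed_id:  id of one document already known to be relevant (assumption)
    reviewer: callable(id) -> bool, standing in for the human reviewer
    budget:   total number of documents that may be reviewed, seed included
    """
    labels = {seed_id: True}
    order = [seed_id]
    while len(labels) < budget and len(labels) < len(docs):
        # Retrain on everything reviewed so far.
        weights = train([(docs[i], lab) for i, lab in labels.items()])
        # Rank the unreviewed documents, most likely relevant first.
        unreviewed = [i for i in docs if i not in labels]
        unreviewed.sort(key=lambda i: score(weights, docs[i]), reverse=True)
        # Send the top of the ranking to the reviewer and record the answers.
        remaining = budget - len(labels)
        for i in unreviewed[:min(batch_size, remaining)]:
            labels[i] = reviewer(i)
            order.append(i)
    return order, labels


# Hypothetical mini-collection; ids and texts are invented for this sketch.
DOCS = {
    "a": "covid outbreak control in nursing homes",
    "b": "infection control measures for covid in nursing facilities",
    "c": "nursing homes and covid treatment safety",
    "d": "stock market volatility during recession",
    "e": "soil chemistry of alpine meadows",
    "f": "covid effects on airline stock prices",
    "g": "deep learning for image recognition",
    "h": "history of medieval trade routes",
}
RELEVANT = {"a", "b", "c"}

order, labels = continuous_active_learning(
    DOCS, "a", lambda doc_id: doc_id in RELEVANT, batch_size=2, budget=4
)
print("review order:", order)
print("relevant found:", [i for i in order if labels[i]])
```

In this toy run the reviewer looks at only four of the eight documents, because the ranking pushes likely-relevant studies to the front of the queue. That is the kind of saving the approach aims for, though real systems use far stronger models and much larger collections.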
"Searching and finding studies for systematic reviews has traditionally been a time-consuming and laborious process that uses keyword search, followed by manual screening of abstracts, and finally full papers," Grossman said in the article. "We are instead training a machine learning algorithm to perform the initial steps in this task."
Analyzing COVID-19 Data
At e-discovery company Relativity, data scientists and product managers likewise saw a role for their technology and skills in helping in the fight against COVID-19. Recently, I discussed Relativity's response with Rebecca BurWei, senior data scientist; Andrea Beckman, director of product management; and Trish Gleason, product manager.
They were prompted to act after the White House Office of Science and Technology Policy released a massive dataset of COVID-19 medical research and issued a call to action to the tech community to develop text- and data-mining techniques to help scientists use the data to answer high-priority questions about COVID-19.
The tech community was encouraged to submit tools through Kaggle, a machine learning and data science community owned by Google Cloud, so that the tools would be openly available for researchers anywhere in the world. Kaggle sweetened the request with a $1,000 award for the tool that best met the project criteria.
Relativity responded using its existing AI and text-mining tools. Specifically, it offered four ways in which its technology could assist in facilitating the review of the data:
Elimination of duplicates. Deduplication, a task familiar to any e-discovery attorney, eliminates duplicate and redundant copies of email messages and other documents in order to enhance the effectiveness of the AI software. When Relativity staff learned from the Kaggle forum that the COVID-19 researchers were seeing the same articles come up repeatedly, they saw a role for their deduplication technology. Using Relativity's Textual Near Duplicates and Repeated Content Identification tools, they reviewed the dataset and identified over 4,000 duplicate articles and a handful of commonly repeated phrases.
Tagging studies by language. Because the dataset included literature from throughout the world, articles were in many languages. Relativity used its Language Identification tool, which can identify text from 100 languages, and was able to tag over 52,000 COVID-19 journal articles by the language in which they were written. Relativity provided this language-tagged dataset to the Kaggle community, earning praise from a Kaggle community leader for having created a "great dataset."
Better keyword search of risk factors. Relativity's Conceptual Analytics uses a machine learning methodology called latent semantic analysis to extract insights and patterns from document data. Based on this technology, Relativity used keyword expansion to find concepts related to cancer and chronic respiratory diseases as risk factors for COVID-19. With those concepts, it was able to find 98 relevant journal articles that would otherwise have been missed.
Identifying pediatric patients. A goal of the Kaggle community's AI-powered literature review was to auto-fill summaries of COVID-19 journal articles, so that public health experts could decide quickly whether they needed to read the full article. Relativity contributed to this project by identifying and summarizing Spanish journal articles that involved asymptomatic pediatric patients.
Relativity's data scientists first used regular expression searches to filter down to a small number of relevant articles, then they experimented with new AI techniques not currently available in the e-discovery product, such as modern vectorizers and question-answer techniques, to automatically extract the ages of the study participants.
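The regular-expression filtering step described above can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the Spanish term lists, the pattern names, and the sample abstracts are invented, and none of it reflects Relativity's actual searches. Note also that Relativity's team extracted ages with vectorizers and question-answer techniques; the regex-based age extraction here is only a simple stand-in to show the shape of the task.

```python
import re

# Hypothetical Spanish terms suggesting a pediatric study population.
PEDIATRIC = re.compile(
    r"\b(?:niñ[oa]s?|pediátric[oa]s?|lactantes?|adolescentes?)\b", re.IGNORECASE
)
# Hypothetical pattern for "asymptomatic" in its Spanish inflections.
ASYMPTOMATIC = re.compile(r"\basintomátic[oa]s?\b", re.IGNORECASE)
# Matches age mentions such as "5 años", "2 a 12 años", or "18 meses".
AGE = re.compile(
    r"\b(\d{1,3})(?:\s*(?:a|-|y)\s*(\d{1,3}))?\s*(años|meses)\b", re.IGNORECASE
)


def is_candidate(text):
    """Keep only texts that mention both a pediatric term and 'asymptomatic'."""
    return bool(PEDIATRIC.search(text) and ASYMPTOMATIC.search(text))


def extract_ages(text):
    """Return (low, high, unit) for each age or age range mentioned in the text."""
    ages = []
    for match in AGE.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        ages.append((low, high, match.group(3).lower()))
    return ages


# Invented sample abstracts, not taken from the real dataset.
ABSTRACTS = {
    "es-1": "Estudio de 36 niños asintomáticos de 2 a 12 años con infección confirmada.",
    "es-2": "Pacientes adultos de 45 a 80 años hospitalizados con neumonía.",
    "es-3": "Serie de casos de lactantes asintomáticos menores de 18 meses.",
}

candidates = {
    doc_id: extract_ages(text)
    for doc_id, text in ABSTRACTS.items()
    if is_candidate(text)
}
for doc_id, ages in candidates.items():
    print(doc_id, ages)
```

A filter like this is cheap and transparent, which makes it a reasonable first pass before more expensive techniques are applied. It is also brittle, since it misses any phrasing the patterns do not anticipate, which may be one reason to try learned extraction methods on top of it.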
Rewarding Use Of Tech
For Grossman and Cormack at Waterloo and the product team at Relativity, using their e-discovery skills to help with COVID-19 research has been rewarding.
"What was most rewarding for me was the community angle and being able to help out during this crisis," said Relativity's Andrea Beckman. "We have a strong community in e-discovery, but here we got to join a different group and be part of everybody coming together in tackling a critical challenge."
Grossman drew a contrast with the legal profession's slow pace of adoption of cutting-edge AI technology such as TAR, due in part to its fear of losing the billable hour.
"Here we're in an area where the incentives are exactly the opposite, where there is receptiveness to something that will cut time and cut costs," she said. "It's refreshing to work in an area where the reception capacity and adoption rate is very different."
Robert Ambrogi is a Massachusetts lawyer and journalist who has been covering legal technology and the web for more than 20 years, primarily through his blog LawSites.com. Former editor-in-chief of several legal newspapers, he is a fellow of the College of Law Practice Management and an inaugural Fastcase 50 honoree. He can be reached by email at firstname.lastname@example.org, and you can follow him on Twitter (@BobAmbrogi).