It has been a whirlwind year for the OpenAI-backed, generative AI legal tech startup Harvey, which went from a $5 million seed round in November 2022 to a $21 million Series A in April 2023 to an $80 million Series B in December 2023 at a valuation of $715 million.
It was a year of big wins, including the decision in February 2023 by Allen & Overy, one of the world’s largest law firms, to integrate Harvey into its global practice, where it could be used by the firm’s more than 3,500 lawyers across 43 offices operating in multiple languages.
Just a month after that, Harvey and PwC announced a global partnership to give PwC’s Legal Business Solutions professionals exclusive access among the Big 4 to Harvey’s AI platform. More recently, in what they described as a “significant step beyond the exclusivity agreement,” Harvey and PwC announced a strategic alliance, which also included Harvey investor OpenAI, to train and deploy foundation models for tax, legal and human resources.
In September, another major firm, Macfarlanes, announced that it would roll out Harvey firmwide, after an initial pilot program, and last month, Forbes named Harvey to its AI 50, recognizing the most promising privately held AI companies – the only legal-specific AI on the list.
But even with such dramatic traction for a company that is still less than two years old, there has remained an air of mystery around Harvey. The product continues to be in an early-access phase; few, other than select early-access customers, have seen the Harvey product; and the company’s founders, Winston Weinberg, its CEO, and Gabriel Pereyra, its president, have given few media interviews.
But that is about to change, the two founders told me in an interview earlier this week. They will be coming out of the early-access phase during the third quarter of the year and launching versions that they say will be more affordable to firms of various sizes, depending on their needs.
They also say that they will be showing the product more often, attending more industry conferences, and speaking more often with the press.
From Custom to Commercial
Part of the reason they have been so stealthy, Weinberg and Pereyra say, is that they have been nose-to-the-grindstone building highly customized models for the large law firms they serve.
Customization of its AI model has been Harvey’s trademark, in a sense, and a key differentiator from other popular legal AI products, most notably CoCounsel, the AI legal assistant developed by Casetext and acquired by Thomson Reuters.
More recently, however, Harvey has been building custom models and related products that can be sold commercially to multiple customers. These include the models it is building with PwC for tax, legal and human resources; new case law research models it is developing in partnership with OpenAI; and a new Vault product that will allow customers to apply generative AI capabilities to large document collections.
It also recently launched its product on the Microsoft Azure Marketplace, where it is offering a Harvey on Azure version of its product.
“Deploying on Microsoft Azure is a key milestone for Harvey,” Weinberg, who is Harvey’s CEO, said in announcing that launch. “This collaboration allows us to use Azure’s robust cloud capabilities to enhance Harvey’s vision, making it more powerful and accessible for businesses across the world.”
Later this year, in a move designed to make Harvey accessible to a broader range of lawyers, it will begin offering commercial access to some of its products. It will start with its case law models, and then begin offering bundles of its products.
Customers will have the option of choosing a bundle of products that will include its AI assistant, various case law or specialized research models, its Vault for large document collections, and custom models or projects.
There will also be prepackaged bundles that will be offered at a discounted price.
Case Law Research
According to Weinberg and Pereyra, a key step in Harvey’s move towards broader commercial availability has been its partnership with OpenAI to build custom-trained case law models. They have already done this for U.S. law and will now be adding other jurisdictions.
In seeking to develop a research solution, they found that simply fine-tuning a foundation model such as GPT-4 or using retrieval augmented generation (RAG) was not sufficient to produce the level of results required for legal work.
Instead, the case law system they built is a combination of a large foundation model pre- and post-trained on all of U.S. case law and a case law search system that the model leverages.
It uses a combination of legal-specific data preprocessing, hybrid search, pre-training, post-training, multi-stage reasoning, retrieval and custom fine-tuned embeddings, and legal-specific answer postprocessing, Weinberg said.
“At a high level, the system we’ve developed performs legal research much like an associate would, taking a complex research query and performing case law searches, analyzing the results and eventually synthesizing all the information to provide an accurate result for the users,” Weinberg said.
“We’ve built a number of legal-specific solutions for the search and answer system including extracting citation graphs, procedural posture, and fact patterns from cases to improve search and detecting case hallucinations, inconsistent arguments, and instances of weak case support in answers.”
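To make one of those checks concrete, here is a minimal sketch of how a system might flag possibly hallucinated cases: it pulls citations out of a generated answer and reports any that do not appear among the cases the search step actually returned. This is an illustration only, not Harvey's implementation. The function name, the simplified citation pattern and the sample citations are all assumptions made for the example.

```python
import re

# Assumption: a deliberately simplified pattern covering a few U.S. reporters,
# e.g. "347 U.S. 483". Real citation formats are far more varied.
CITATION_RE = re.compile(r"\b\d{1,4} (?:U\.S\.|F\.2d|F\.3d|S\. Ct\.) \d{1,4}\b")


def unsupported_citations(answer: str, retrieved: set[str]) -> list[str]:
    """Return citations found in the answer that are not in the retrieved set."""
    cited = CITATION_RE.findall(answer)
    return [c for c in cited if c not in retrieved]


if __name__ == "__main__":
    # Hypothetical answer text and retrieved set, for illustration.
    answer = "See 347 U.S. 483 and 999 F.3d 123."
    retrieved = {"347 U.S. 483"}
    print(unsupported_citations(answer, retrieved))
```

A flagged citation is not necessarily fabricated; it only means the answer cites something the retrieval step did not supply, which is a signal worth a closer look.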
This legal research model can be used for traditional research and will also be able to be used for more complex workflows, such as cross-jurisdictional surveys, brief drafting, issue spotting in large sets of discovery or investigative documents, and litigation risk analysis.
Harvey also plans to partner with government entities to use these models to advance access to justice by making case law more accessible.
Already, the company has partnered with court officials in Singapore to help pro se litigants in small claims cases get answers to their legal questions in order to better understand their potential claims or defenses.
PwC Partnership
Through its partnership with PwC, Harvey is developing a series of custom-built models focused on the areas in which PwC has domain expertise – tax, legal and human resources. While Harvey provides the AI technology, PwC provides both the intellectual property and the domain experts to fine-tune and train these models.
As these models are developed, Harvey and PwC will jointly go to market with them, both selling them through their own channels. These will be sold as separate, standalone products, not part of the bundles described above.
Vault
The Vault product is being designed to enable customers to use generative AI to explore large collections of thousands of documents, either by asking natural language questions of the documents (“Ask Query”) or performing specific tasks against the set, such as finding and summarizing certain language (“Review Query”).
For example, Weinberg said, over a set of 1,000 master service agreements, an Ask Query might be, “Has the company ever executed an MSA with Oracle?” A Review Query could be, “Create a chart showing me every contract that has change-of-control provisions and, for those that do, tell me if the contract allows for termination on a change of control.”
Custom Models for Large Firms
Even as it develops these new products to make its technology more widely accessible, Harvey is continuing to build products for large law firms.
Specifically, it is building a platform that allows firms to securely train generative AI systems on all their private data, integrate with their existing legal tech software and workflows, and continuously learn from their legal workforce.
“Unfortunately, it isn’t as simple as training a single model on all their documents,” Weinberg said. “Doing so would result in data leakage.”
“Instead, we are building a suite of tools that allow law firms to train, evaluate and deploy generative AI systems that respect data privacy, ethical walls, and client privacy while still being highly accurate and performant,” he said.
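One way to picture what respecting ethical walls might involve is to filter documents by matter before any search or model sees them, rather than pooling everything into a single index. The sketch below assumes exactly that and nothing more. It is not a description of Harvey's tools; the `Document` class, the function names and the matter identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    matter_id: str  # hypothetical: the client matter the document belongs to
    text: str


def visible_documents(docs: list[Document], user_matters: set[str]) -> list[Document]:
    """Keep only documents from matters the user is cleared to see."""
    return [d for d in docs if d.matter_id in user_matters]


def search(query: str, docs: list[Document], user_matters: set[str]) -> list[str]:
    """Naive keyword search that applies the wall before matching."""
    allowed = visible_documents(docs, user_matters)
    q = query.lower()
    return [d.doc_id for d in allowed if q in d.text.lower()]


if __name__ == "__main__":
    docs = [
        Document("d1", "matter-a", "Indemnity clause for the buyer"),
        Document("d2", "matter-b", "Indemnity cap for the seller"),
    ]
    # A user cleared only for matter-a never sees d2, even though it matches.
    print(search("indemnity", docs, {"matter-a"}))
```

Filtering at retrieval time addresses only part of the leakage problem Weinberg describes, since a model trained on walled-off documents could still reveal them.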
For its largest partners, Weinberg said, Harvey is building “hyper specialized” systems for their most complex use cases.
With PwC, for example, Harvey is building foundation models in every tax jurisdiction that can answer complex tax questions over tax codes and legislation as well as perform tax due diligence and more complex scenario-based evaluations, Weinberg said.
“This system is integrated within PwC’s broader tax practice and our models leverage PwC’s other third-party vendors, their IP, and their internal software solutions and can produce reports in their historical format.”
Weinberg said that, given the complexity of such a project, the development cost could exceed $5 million, “but we are starting to find ways to provide more affordable versions to our clients and hope to do so more as we scale.”
Hiring Loads of Lawyers
When they founded Harvey in 2022, Weinberg was a former associate at law firm O’Melveny & Myers and Pereyra was a former research scientist at DeepMind and machine learning engineer at Meta AI.
When we spoke this week, they said that one of the most interesting aspects of their growth over the past year has been learning to run a company of this scale.
But they agreed that one of the aspects they are most proud of is their hiring, which has been primarily of engineers and lawyers. In fact, of the company’s 120 employees today, almost half are lawyers – many of them lawyers with big firm or corporate pedigrees – who have been brought on as domain experts.
While it is not unusual for a legal tech startup to hire lawyers, it is typically either for a non-lawyer role, such as product manager or salesperson, or for a lawyer role as in-house counsel.
But Weinberg said that most of Harvey’s lawyers are working in research roles – not to research the law, but to research lawyers’ workflows and processes around specific tasks.
“They’re saying, ‘How would I have done this task when I was at Latham or I was at Kirkland? How did I do disclosure schedules? How did I do case law research? How did I do complex summarization? How did I draft a brief?’ They’re taking all of those tasks and then mapping them onto AI.”
Weinberg and Pereyra talk about this as “process data” – the process by which a lawyer performs a task. When a lawyer sees a case for the first time, or a merger agreement, or an NDA, the lawyer has a process for analyzing that document. That is the kind of data they want to build into Harvey, and why they are hiring so many lawyers.
“Lawyers have to be trained for many years to do this, and that data – that process data – isn’t publicly available anywhere,” Weinberg said.
“The thing that’s very important is how you train the AI, and you need a combination of domain experts and AI engineers,” Weinberg said. “I think that there aren’t tons of companies that are doing that at the application layer in any vertical right now – I think the thing that’s missing in a lot of these companies is the domain experts.”
But while the lawyers on its staff have so far focused largely on building custom models for large firms, Weinberg and Pereyra said they wanted to expand access to their technology, which is the reason for the new products they will be launching in the third quarter.
“A lot of the firms out there, they can’t afford to do these large, massive customizations,” Weinberg said, “so let’s build things for them that they can also use and that are great as well.”
That commercialization will be rolled out in stages, beginning with the case law models and the Vault product. Next will come the generally available bundles of products. Towards the end of this year or early next year, they will introduce a self-service model.
Of course, all the while, they will continue to offer customizations for large enterprise customers.
A question I hear often is how Harvey compares to CoCounsel, the Thomson Reuters legal AI assistant, and so I put that question to Weinberg and Pereyra.
There are two main differences, they said. One is the customization they are doing for larger law firms, something that TR does not offer with CoCounsel.
The other is the approach they are taking to building their product. On the backend, they have trained multiple models for specific tasks and then, effectively, chained them together, so there is a model for clause extraction and a different model to handle a specific type of query.
“It’s actually similar to how a law firm works, where you have a request from the client and then the partner breaks that request into ten other requests and sends it to a specialized person who does that task, and then they all combine those together,” Weinberg said. “That’s what’s happening on the back end.”
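The partner-and-specialists analogy can be sketched as a simple dispatcher: a request is split into typed subtasks, each subtask goes to the handler registered for its type, and the results are collected in order. The code below is a toy illustration under that assumption, not Harvey's architecture. The handler functions stand in for specialized models, and every name in it is hypothetical.

```python
from typing import Callable


# Stand-ins for specialized models; each returns a labeled string for clarity.
def extract_clauses(payload: str) -> str:
    return f"clauses:{payload}"


def summarize(payload: str) -> str:
    return f"summary:{payload}"


# Hypothetical registry mapping a subtask type to its specialist.
HANDLERS: dict[str, Callable[[str], str]] = {
    "clause_extraction": extract_clauses,
    "summarization": summarize,
}


def run_request(subtasks: list[tuple[str, str]]) -> list[str]:
    """Route each (task_type, payload) pair to its handler and collect results."""
    results = []
    for task_type, payload in subtasks:
        handler = HANDLERS.get(task_type)
        if handler is None:
            raise ValueError(f"no handler registered for {task_type!r}")
        results.append(handler(payload))
    return results


if __name__ == "__main__":
    print(run_request([("clause_extraction", "MSA"), ("summarization", "MSA")]))
```

In this sketch the split into subtasks is supplied by the caller; in the system Weinberg describes, deciding how to break up the request would itself be part of the work.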
Commitment to A2J
I mentioned above Harvey’s partnership with the courts in Singapore to assist pro se litigants there, and Weinberg and Pereyra say they are committed to serving access to justice on an even broader scale.
They said they will give their product for free to court systems or A2J organizations. “This is something we want to do a lot more and in the U.S. too, to build these systems and then give them to the court for free.”