Welcome to Module 2-C.
Search is the Core Problem of Electronic Discovery.
These three essays will take you further into the fundamental problem of e-discovery. Indeed, it may even be the fundamental problem of our age: how do you find the information you need when you are flooded by too much information, most of it irrelevant or unreliable? Google solved the problem for finding popular web pages on the Internet and made billions overnight. No one has yet solved the problem in the more complicated area of evidence and the law. But fortunes and justice await those who can begin to master this core problem of e-discovery.
This module starts by examining search and Judge Paul Grimm’s landmark decision Victor Stanley I, and Judge John Facciola’s Disability Rights opinion. But it doesn’t stop there. In this class we will also include ideas from martial arts, especially Aikido, and how they might apply to e-discovery, in the context of another lesser known case, Perfect Barrier, LLC v. Woodsmart Solutions, Inc. Could it be that search is more about process and methods than magic software? Could cooperation be an essential part of search? Is search an art or a science? This is a challenging module, so take your time studying it and pondering the ideas.
Hundredth Blog: Thoughts on SEARCH and Victor Stanley, Inc. v. Creative Pipe, Inc.
I wrote this up as my One Hundredth Blog. I wanted to think that like the mythical Hundredth Monkey, my blog writing would thereafter become much easier, an innate skill. Much like the task of e-discovery search, writing a 3,000 – 5,000 word essay on e-discovery each week takes time, effort, and careful planning to do right. In a way, as I will explain, this is the basic message of the most important case in the area of legal search, Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008).
This scholarly e-discovery Order was written by Judge Paul Grimm in Baltimore. He is one of the country’s top judicial experts on e-discovery and in 2010 became a member of the Federal Rules Committee. Judge Grimm fully understands that ESI search and review is a complex, learned skill. It is not an innate ability that every lawyer somehow picks up in law school when they are taught Legal Research 101. Lawyers need to treat search and review seriously, and either take the time necessary to become adept in this complex area, or employ experts who are. If not, the consequences can be devastating, as Victor Stanley shows. The defendants waived their attorney-client and work product privileges to 165 ESI files by their botched search and review before production.
Victor Stanley, Inc. v. Creative Pipe, Inc. and Reasonable Search
Judge Grimm’s forty-three page opinion is, on one level, a detailed ruling on waiver of attorney-client privilege. On another level, it is a treatise on e-discovery search and a guide to proving reasonable efforts. As Jason R. Baron said: “what Judge Grimm has done is give a road-map to lawyers in the United States on how to present to a court how they went about searching for relevant documents.”
Such proof may be required when a search fails and you are faced with sanctions as a result, or, as in this case, loss of privilege. In these circumstances, you may be required to prove that your search was reasonable, albeit imperfect. As everyone in the industry knows, e-discovery is like golf: there is no such thing as perfect, and everybody, even a PGA tour professional, makes a few mistakes.
That is what the defendants in this case claimed, that the disclosure of the privileged documents was just an honest mistake, and there should be no waiver. Plaintiff agreed that it was a mistake, but denied it was an honest one, and even alleged that some of the ESI revealed fraud. They also challenged the adequacy of the search efforts. Judge Grimm did not directly address the dishonesty allegations, but did agree that no credible evidence was presented to establish reasonable search efforts. Primarily for that reason, Judge Grimm held that defendants’ disclosure of attorney-client and work product privileged ESI acted to waive those privileges.
That is a pretty scary ruling, especially for the vast majority of litigators in the U.S. who have strictly amateur status in the game of e-discovery. They cannot even begin to comprehend the skills and expertise developed by the likes of PGA touring professionals, much less the kind of practice and dedication they put into every round. Yet, Judge Grimm suggests that when it comes to privilege review at least, they had better improve their game. He does not expect everyone to attain the level of a top professional, but he does expect some time and attention to be put into the important task of ESI search. See, e.g., Clearone Communications, Inc. v. Chiang, 2008 WL 920336 (D. Utah, April 1, 2008) (parties and court labored over keyword search plan). You just cannot hope for the Hundredth Monkey Effect. Moreover, he suggests that some attorneys would be well advised to seek the help of a professional. For most cases, a simple club pro consult will do, but if it is a “bet the company” case, you might want to retain a touring professional.
Speaking of which, several of Jason Baron’s writings and research projects on search were cited by Judge Grimm in Victor Stanley, including The Sedona Conference, Best Practices Commentary on the Use of Search and Information Retrieval, 8 The Sedona Conf. J. 189 (2007), and the Text Retrieval Conference (TREC) sponsored by the National Institute of Standards and Technology. The TREC event is in its third year of scientific evaluations of various kinds of ESI automated search techniques, including the kind of lame keyword search that the losing defendants apparently ran in Victor Stanley. The results are surprising. They suggest that keyword searches alone, especially when poorly done without sampling and iteration, and without the help of more advanced techniques and technologies, will miss most of the documents sought. That finding should be alarming to anyone who does e-discovery, especially if you use keyword searches alone to try to protect against waiver of privilege in a massive production.
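To make "miss most of the documents sought" concrete, here is a minimal sketch of the two measurements information scientists use to grade a search: recall (how much of the relevant material the search found) and precision (how much of what it returned was relevant). The function name and the tiny document sets are my own invention for illustration. They are not TREC data and do not come from the opinion.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: ids the search returned
    relevant:  ids a human reviewer would call relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    found = retrieved & relevant
    recall = len(found) / len(relevant) if relevant else 0.0
    precision = len(found) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: ten relevant documents exist.
relevant = {f"rel{i}" for i in range(10)}
# The keyword search returns three of them plus seven irrelevant ones.
retrieved = {"rel0", "rel1", "rel2"} | {f"noise{i}" for i in range(7)}

recall, precision = recall_and_precision(retrieved, relevant)
print(f"recall={recall:.0%} precision={precision:.0%}")
```

In this made-up example the search finds only three of the ten relevant documents, so seven are missed, and most of what the reviewer reads is noise. A search can look productive, in the sense of returning plenty of hits, and still have poor recall.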
Judge Grimm’s opinion is a wake-up call to all litigators who put blind trust into simple keyword searches, and think that anyone can do it. It is a dangerous delusion, as this case shows. I call it the Myth of Google, where litigators think that since they can run a Google search, and also a Westlaw or Lexis search, they can run an e-discovery search too. They think that since they know the case, they know what the best keywords are, and that is all it takes to find what they need. After all, it works for them on Google and legal research, so it should work on email search too. It never even occurs to the average trial lawyer that special expertise and training might be needed to find the needles in today’s electronic haystacks. They do not think they need an expert to help them formulate an adequate search strategy, including, but most definitely not limited to, formulating keywords.
Grimm’s tale is that when a search fails miserably, do not expect the judge to simply take your word for it that the efforts were appropriate. It is going to require some kind of expert evidence. In Judge Grimm’s words:
Assuming that the Plaintiff’s version of how Defendants conducted their privilege review is accurate, the Defendants obtained the results of the agreed-upon ESI search protocol and ran a keyword search on the text-searchable files using approximately seventy keywords selected by M. Pappas [Defendant] and two of his attorneys. Defendants, who bear the burden of proving that their conduct was reasonable for purposes of assessing whether they waived attorney-client privilege by producing the 165 documents to the Plaintiff, have failed to provide the court with information regarding: the keywords used; the rationale for their selection; the qualifications of M. Pappas and his attorneys to design an effective and reliable search and information retrieval method; whether the search was a simple keyword search, or a more sophisticated one, such as one employing Boolean proximity operators; or whether they analyzed the results of the search to assess its reliability, appropriateness for the task, and the quality of its implementation. While keyword searches have long been recognized as appropriate and helpful for ESI search and retrieval, there are well-known limitations and risks associated with them, and proper selection and implementation obviously involves technical, if not scientific knowledge.
It cannot credibly be denied that resolving contested issues of whether a particular search and information retrieval method was appropriate –in the context of a motion to compel or motion for protective order– involves scientific, technical or specialized information. If so, then the trial judge must decide a method’s appropriateness with the benefit of information from some reliable source– whether an affidavit from a qualified expert, a learned treatise, or, if appropriate, from information judicially noticed. To suggest otherwise is to condemn the trial court to making difficult decisions on inadequate information, which cannot be an outcome that anyone would advocate. . . . Indeed, it is risky for a trial judge to attempt to resolve issues involving technical areas without the aid of expert assistance.
Judge Grimm follows in the footsteps of Judge John Facciola, who has previously warned of the need for special expertise for appropriate searches in several cases:
Some have criticized Judge Facciola for these decisions, arguing that they unnecessarily drive up the cost of litigation. These same critics will now criticize Judge Grimm for joining his camp. They think that requiring expert input in discovery unnecessarily raises the bar of professional standards and forces litigation attorneys to retain yet another set of experts, e-discovery search experts, which clients can ill afford.
Perhaps it is self-serving on my part, but I strongly disagree. In my experience, experts in this area will save more money than their fee. They can effectively cull the data set down to a more manageable level where final review and production is far less expensive. The trial lawyer with no special skills or experience in e-discovery is likely to just copy and review everything. The keyword searches that I typically see performed by novices are a model of inefficiency, producing far too high a noise-to-hit ratio.
Judge Grimm anticipated and responded to these expense criticisms in footnote 10 of Victor Stanley. The footnote, which is three pages long, and is partially quoted above, makes several additional points explaining why such search experts are needed:
Instead, Judge Facciola made the entirely self-evident observation that challenges to the sufficiency of keyword search methodology unavoidably involve scientific, technical and scientific subjects, and ipse dixit pronouncements from lawyers unsupported by an affidavit or other showing that the search methodology was effective for its intended purpose are of little value to a trial judge who must decide a discovery motion aimed at either compelling a more comprehensive search or preventing one.
Viewed in its proper context, all that O’Keefe and Equity Analytics required was that the parties be prepared to back up their positions with respect to a dispute involving the appropriateness of ESI search and information retrieval methodology–obviously an area of science or technology–with reliable information from someone with the qualifications to provide helpful opinions, not conclusory argument by counsel.
The message to be taken from O’Keefe, Equity Analytics, and this opinion is that when parties decide to use a particular ESI search and retrieval methodology, they need to be aware of literature describing the strengths and weaknesses of various methodologies, such as The Sedona Conference Best Practices, supra, n.9, and select the one that they believe is most appropriate for its intended task. Should their selection be challenged by their adversary, and the court be called upon to make a ruling, then they should expect to support their position with affidavits or other equivalent information from persons with the requisite qualifications and experience, based on sufficient facts or data and using reliable principles or methodology.
For those understandably concerned about keeping discovery costs within reasonable bounds, it is worth repeating that the cost-benefit balancing factors of Fed. R. Civ. P. 26(b)(2)(C) apply to all aspects of discovery, and parties worried about the cost of employing properly designed search and information retrieval methods have an incentive to keep the costs of this phase of discovery as low as possible, including attempting to confer with their opposing party in an effort to identify a mutually agreeable search and retrieval method. This minimizes cost because if the method is approved, there will be no dispute resolving its sufficiency, and doing it right the first time is always cheaper than doing it over if ordered to do so by the court. Additionally, cost can be minimized by entering into a court-approved agreement that would comply with Hopson, or if enacted, Proposed Evidence Rule 502. In addition, there is room for optimism that as search and information retrieval methodologies are studied and tested, this will result in identifying those that are most effective and least expensive to employ for a variety of ESI discovery tasks.
Proper search is the cornerstone of e-discovery, and key to controlling costs. Since most of the cost of e-discovery lies in review expenses (estimates range from 50% to 80%), our efforts should be focused on searches that reduce the amount of ESI to be reviewed. Obviously, the better the search, the more chaff is separated from the wheat. We do not want our reviewers reading chaff – every minute a reviewer spends reading an irrelevant email is a minute wasted. Here experts can help and should be consulted at the very beginning of the case, at the same time as the litigation hold notices. If you are going to consult a pro, it only makes sense to do so before the round begins, not on the eighteenth tee.
Judge Grimm takes pains to point out that search is not only important, but requires a high level of skill to do properly. He also provides some suggestions on how to do that:
Use of search and information retrieval methodology, for the purpose of identifying and withholding privileged or workproduct protected information from production, requires the utmost care in selecting methodology that is appropriate for the task because the consequence of failing to do so, as in this case, may be the disclosure of privileged/protected information to an adverse party, resulting in a determination by the court that the privilege/protection has been waived.
Selection of the appropriate search and information retrieval technique requires careful advance planning by persons qualified to design effective search methodology. The implementation of the methodology selected should be tested for quality assurance; and the party selecting the methodology must be prepared to explain the rationale for the method chosen to the court, demonstrate that it is appropriate for the task, and show that it was properly implemented. In this regard, compliance with the Sedona Conference Best Practices for use of search and information retrieval will go a long way towards convincing the court that the method chosen was reasonable and reliable, which, in jurisdictions that have adopted the intermediate test for assessing privilege waiver based on inadvertent production, may very well prevent a finding that the privilege or work-product protection was waived.
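The opinion does not say what form quality-assurance testing should take. One common approach, offered here only as a sketch under my own assumptions, is to draw a random sample from the documents the privilege screen did not flag, have a lawyer review that sample, and project how many privileged documents are probably still sitting in the production set. The function names, sample size, and counts below are all hypothetical.

```python
import random


def draw_qa_sample(all_ids, flagged_ids, sample_size, seed=0):
    """Randomly pick unflagged documents for a second-look human review."""
    unflagged = sorted(set(all_ids) - set(flagged_ids))
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    return rng.sample(unflagged, min(sample_size, len(unflagged)))


def estimated_misses(sample_results, unflagged_total):
    """Project the sample's miss rate onto the whole unflagged set.

    sample_results: one bool per sampled document, True if the reviewer
    found it privileged even though the screen did not flag it.
    """
    if not sample_results:
        return 0.0
    return sum(sample_results) * unflagged_total / len(sample_results)


# Hypothetical numbers: 10,000 documents, 400 flagged by the keyword screen.
all_ids = [f"doc{i}" for i in range(10_000)]
flagged = all_ids[:400]
sample = draw_qa_sample(all_ids, flagged, sample_size=500, seed=1)

# Suppose the reviewer finds 5 privileged documents among the 500 sampled.
reviewer_calls = [True] * 5 + [False] * 495
projected = estimated_misses(reviewer_calls, len(all_ids) - len(flagged))
print(f"projected privileged documents missed: {projected:.0f}")
```

A result like this would tell you, before production, that the screen needs another iteration. It would also give you something concrete to put in an affidavit if the search is later challenged.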
Since I consider search so important, many of my first 100 blogs have addressed this topic.
Defendant’s Failure in Victor Stanley to Prove Reasonable Search Efforts Results in Loss of Attorney-Client Privilege
Judge Grimm, in this case, found the defendants’ search efforts to be negligent.
In this case, the Defendants have failed to demonstrate that the keyword search they performed on the text-searchable ESI was reasonable. Defendants neither identified the keywords selected nor the qualifications of the persons who selected them to design a proper search; they failed to demonstrate that there was quality-assurance testing; and when their production was challenged by the Plaintiff, they failed to carry their burden of explaining what they had done and why it was sufficient.
Further, the Defendants’ attempt to justify what was done, by complaining that the volume of ESI needing review and time constraints presented them with no other choice is simply unpersuasive.
Since their review was negligent, or at least not proven to be adequate, the defendants were found to have waived their privilege to the 165 documents that they accidentally produced to the plaintiff. Bear in mind that the defendants produced tens of thousands of documents in this same production, and so percentage-wise, the mistake was very small, less than one percent. (Even so, Judge Grimm thought that 165 documents was a lot to miss, and suggested he might have reached a different result if only a couple had been missed.) Based on the high number of electronic files that the defendants had to review for privilege, you might be surprised by the seemingly strident tone of the opinion. The defendants were, after all, being stripped of their attorney-client privilege, which is a fundamental right recognized by the Supreme Court since 1826. Here are Judge Grimm’s words:
Thus, the disclosures were substantive- including numerous communications between defendants and their counsel. . . . any order issued now by the court to attempt to redress these disclosures would be the equivalent of closing the barn door after the animals have already run away.
Every waiver of the attorney-client privilege produces unfortunate consequences for the party that disclosed the information. If that alone were sufficient to constitute an injustice, there would never be a waiver. The only “injustice” in this matter is that done by Defendants to themselves.
But when you dig deeper into the record of this case, you see how restrained his opinion is, and how these defendants really did get what was coming to them.
The Bad Facts Behind the Victor Stanley Law
It is true that defendants produced nearly 39 gigabytes of ESI, comprising tens of thousands of documents and unsearchable image files, such as engineering drawings and photographs. It is also true that the sheer volume of the ESI involved would weigh in favor of leniency for an accidental production of 165 files. But, when you dig deeper, and not only closely study the whole opinion, but also delve into the voluminous record in this case, you find numerous countervailing considerations. You can only guess at some of these factors because parts of the record are still sealed, including the 165 documents at issue. Still, this record is filled with smoke suggestive of bad faith. The total record helps explain this decision, and makes it easy to distinguish. For instance, it is also true that:
I could go on, but you get the picture. The case itself is also interesting, involving allegations of unfair competition based on lying about whether goods were made in China, and violations of copyright. But, at this point, these are all just allegations. The remaining discovery and trial should soon reveal much more. When it does, I will look again at this case to see what, if any, fire is behind all of this smoke.
In the meantime, don’t fall for the Myth of Google, or let your friends fall for it either. Search and review are learned skills of some complexity and require adequate tools to perform correctly. Maybe 100 other lawyers and information scientists can do it, but that does not mean the skill has somehow magically transferred itself to the rest of the Legal or IT professions.
Although the Hundredth Monkey is an inspirational story and may work for ideas, it is based on bad science and does not work for skills. Complex skills of any kind, from monkeys washing sweet potatoes, to lawyers searching emails, to golfers striking a ball, all have to be individually learned. They cannot be learned by some, until a magic numerical threshold is passed, and then instantly transmuted through fields of energy and suddenly ingrained in everyone else. Sorry, it looks like we will all have to do the work. We cannot just wait for others to learn these e-discovery skills, and then hope to wake up one day with their hard earned abilities. If so, considering the number of professional golfers there now are in the world, we should all be breaking 100.
Concept Search v. Keyword Search
An opinion by Judge Facciola distinguishes between keyword searching and concept searching. Disability Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007 WL 1585452 (D.D.C. June 1, 2007). The plaintiff had proposed simple keyword searching of email by people’s names, but Judge Facciola suggested the parties instead consider concept searching. This is the first opinion to recognize the distinction between the two types of searches according to Jason R. Baron, Director of Litigation of the National Archives and Records Administration. Jason should know, as he is an expert and strong proponent of concept searching. Indeed, Judge Facciola cites to his article in the opinion. Here is the operative language from Disability Rights Council at *9:
I bring to the parties’ attention recent scholarship that argues that concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results. See George L. Paul & Jason R. Baron, Information Inflation: Can the Legal System Adapt? 13 Rich. J.L. & Tech. 10 (2007).
Concept searching is just one of many cutting edge ideas discussed in Paul & Baron’s article. It pertains to promising new software technology that may allow for far better searching than simple keyword matching. But before I go into that, a little more about the interesting Disability Rights Council case itself.
The defendant Transit Authority configured its GroupWise email system so that all emails were automatically deleted after only 60 days. The only exception was when a user went to the trouble to archive a particular email. These archived emails were not deleted. In practice, few Transit Authority users ever bothered to archive any of their emails, and so after 60 days almost all were deleted. There is nothing wrong with such a system in principle, but the problem here is that it was not suspended when suit was filed. In fact, in what the court stated was “remarkable” and “indefensible”, the defendant continued to destroy all emails for over two years after the suit was filed.
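The retention rule described above, and the missing step that caused the trouble, can be sketched in a few lines. This is only an illustration of the logic. The Email class, the should_purge function, and the litigation_hold switch are hypothetical names of mine, and nothing here reflects how the Transit Authority's actual email system was configured internally.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 60  # the auto-delete period described in the opinion


@dataclass
class Email:
    received: date
    archived: bool = False  # the user took the trouble to archive it


def should_purge(email, today, litigation_hold=False):
    """Decide whether the routine purge may delete this email."""
    if litigation_hold:
        return False  # a hold suspends routine deletion entirely
    if email.archived:
        return False  # archived emails were exempt from deletion
    return (today - email.received) > timedelta(days=RETENTION_DAYS)


old_email = Email(received=date(2005, 1, 3))
today = date(2005, 6, 1)

print(should_purge(old_email, today))                        # no hold in place
print(should_purge(old_email, today, litigation_hold=True))  # hold in place
```

The whole problem in this case was the equivalent of never setting that one switch after suit was filed.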
The opinion begins by noting that the “safe harbor” of new Rule 37(f) was not intended to apply to this situation, at least insofar as the emails destroyed after the suit was filed are concerned. The rule requires “routine” and “good faith” operation of a system. Although it was routine destruction, the court did not consider it to have been carried out in good faith after suit was filed. That is primarily because we are talking about the destruction of live ESI, namely email still on the system, and not on back-up tapes. A preservation hold should have prevented this. After the live emails are so destroyed, the only place to find them is on the backup tapes. For that reason, among others, even though the court agreed with defendant that the backup tapes were not reasonably accessible under Rule 26(b)(2)(B), it nevertheless found good cause to order that they be restored and searched at defendant’s expense. To hold otherwise would reward defendant for destroying relevant emails, leaving the backup tapes as the only remaining source of the evidence. The court rejected this at *8 with a humorous touch:
While the newly amended Federal Rules of Civil Procedure initially relieve a party from producing electronically stored information that is not reasonably accessible because of undue burden and cost, I am anything but certain that I should permit a party who has failed to preserve accessible information without cause to then complain about the inaccessibility of the only electronically stored information that remains. It reminds me too much of Leo Kosten’s definition of chutzpah: “that quality enshrined in a man who, having killed his mother and his father, throws himself on the mercy of the court because he is an orphan.”
Judge Facciola rejected Defendant’s undue burden and expense inaccessibility arguments, and granted Plaintiff’s motion to compel. He then ordered the parties to meet and discuss how the backup tapes will be restored, and as mentioned, how to search the restored emails, either through keyword as plaintiff had proposed, or via concept search as the judge suggested might be more efficient. (As a postscript, I understand the parties met, but instead of agreeing on search, they settled the case instead.)
Of course, keyword searches have been around for decades and are familiar to any lawyer who has ever done computer research. You can, for instance, run a computer search of hundreds of thousands of emails to find all emails that include one or more of a list of names, as plaintiff here proposed. This takes just seconds, but can produce a high percentage of irrelevant emails; ones that include the names but have nothing to do with the case. It can also omit many relevant emails that just do not happen to include the keywords you guessed a relevant email would have (or perhaps included them, but misspelled them, a problem not often found with computerized legal research).
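Both failure modes show up even in a toy example. The sketch below runs a simple name search over three invented emails. The messages, the name "Jones," and the keyword_hits function are all hypothetical and have nothing to do with the actual record in the case.

```python
import re


def keyword_hits(emails, keywords):
    """Return ids of emails containing at least one keyword as a whole word."""
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords
    ]
    return [eid for eid, text in emails.items() if any(p.search(text) for p in patterns)]


# Hypothetical three-message collection, invented for illustration.
emails = {
    "e1": "Jones approved the bench drawings for the bid.",     # relevant
    "e2": "Lunch with Jones on Friday? My treat.",              # irrelevant noise
    "e3": "Jonse said to copy the competitor's bench drawings.",  # relevant, name misspelled
}

hits = keyword_hits(emails, ["Jones"])
print(hits)
```

The search returns the lunch invitation, which is noise, and skips the misspelled but highly relevant third message.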
The use of complex Boolean connectors (such as directives that one term be within the same paragraph as another, or that an otherwise included email be excluded if it contains certain terms, i.e. “but not”) can sometimes improve on the search. So too can the use of so called “fuzzy logic.” But even with the use of Boolean and fuzzy logic, these keyword searches, also known as “set-theoretic” searches, are still largely a guessing game. In practice, they often fail to uncover many otherwise relevant emails, without significantly reducing the irrelevant ones.
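Here is a rough sketch of what these refinements look like when combined: a fuzzy match on a name, a proximity connector, and a "but not" exclusion. The helper names, the 0.75 similarity threshold, and the sample emails are my own assumptions. Real review platforms have their own query syntax, which this does not attempt to reproduce.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def has_term(words, term, fuzzy=0.0):
    """Exact match by default; with fuzzy > 0, near-misspellings also count."""
    if fuzzy <= 0:
        return term in words
    return any(SequenceMatcher(None, term, w).ratio() >= fuzzy for w in words)


def within(words, a, b, n):
    """Proximity connector: a and b occur within n words of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def matches(text):
    """(jones, fuzzy) AND (bench within 3 words of drawings) BUT NOT lunch."""
    words = tokens(text)
    return (
        has_term(words, "jones", fuzzy=0.75)
        and within(words, "bench", "drawings", 3)
        and not has_term(words, "lunch")
    )


# Hypothetical emails, invented for illustration.
emails = {
    "e1": "Jones approved the bench drawings for the bid.",
    "e2": "Lunch with Jones on Friday? My treat.",
    "e3": "Jonse said to copy the competitor's bench drawings.",
}

hits = [eid for eid, text in emails.items() if matches(text)]
print(hits)
```

On this tiny set the refined query does better than the bare name search. But notice that someone still had to guess in advance that "bench," "drawings," and "lunch" were the right terms, and an email that discussed the same subject in different words would still be missed.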
A search that creates a lot of noise, that is, one that produces too many irrelevant emails, can create very significant time and expense burdens on all the parties, but especially on the producing party. If for instance the search creates a list of 100,000 emails, the producing party will have to review all of these emails for possibly privileged communication before production. This is a very expensive undertaking, and although clawback agreements provide some comfort, they cannot obviate the need for, and expense of, the privilege review. It is also expensive for the receiving party who also has to spend time and money to review the irrelevant emails.
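A little arithmetic shows how quickly noise turns into money. The review speed, hourly rate, and relevance percentage below are assumptions I have picked purely for illustration. They are not figures from the case or from any survey.

```python
def review_cost(documents, docs_per_hour, hourly_rate):
    """Cost of having a human reviewer look at every document once."""
    hours = documents / docs_per_hour
    return hours * hourly_rate


# Assumed figures: 50 documents per reviewer-hour at $60 per hour.
DOCS_PER_HOUR = 50
HOURLY_RATE = 60

total = review_cost(100_000, DOCS_PER_HOUR, HOURLY_RATE)

# Assume only 20% of the hits are actually relevant; the rest is noise.
noise = review_cost(80_000, DOCS_PER_HOUR, HOURLY_RATE)

print(f"privilege review of all hits: ${total:,.0f}")
print(f"portion spent on irrelevant emails: ${noise:,.0f}")
```

Under these assumptions, four out of every five review dollars are spent reading emails that have nothing to do with the case.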
Therefore, if there is a better search method than keyword that can produce a high percentage of relevant hits, and thus less noise and less wasted time for privilege review, it is to the advantage of all parties to use it. Moreover, it is a potentially very valuable product. There are several software vendors who have created alternative search algorithms to keyword searches. All are sometimes lumped together as “concept searches.” They use a variety of methods, involving such things as contextual usages, algebraic modeling and probabilistic categories. The exact formulas are usually kept secret by the software vendors for obvious reasons, but most are prepared to provide expert testimony in court if necessary to justify the legitimacy of their search methods.
Paul & Baron’s article at pages 26-27 summarizes the existing state of information retrieval science in a “mind boggling,” but eloquent manner:
However, broadly speaking, information retrieval methods fall into three broad classes: set-theoretic (Boolean strings, supplemented by fuzzy search capabilities), algebraic (premised on the mathematical idea that the meaning of a document can be derived from the constituent terms in a document, and thus weighting retrieval by the proximity of a document’s terms in the form of two or higher dimensional maps, as in vector space modeling), and probabilistic (using language models and Bayesian belief networks, the latter of which involves making educated inferences about the relevance of future documents based on prior experience in reviewing documents in a given collection).
In thinking about retrieval problems, one can also supplement all of these methods by focusing on the language used by the creators of the records, which will include using taxonomies and ontologies, essentially synonyms of words and relevant classes of related words to be developed and built in at the front end of a search process to better refine the search, and to maximize both recall and precision. In contrast to strict set-based Boolean techniques, the above algebraic and probabilistic categories of search methods are often broadly termed under various forms of the heading “concept searching.”
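For readers who want a feel for the algebraic class, here is a bare-bones sketch of vector space modeling: each document becomes a vector of word counts, and documents are ranked by the cosine of the angle between their vector and the query's vector. This is a textbook toy under my own assumptions, with invented documents. It is not a depiction of any vendor's concept search product, whose methods are proprietary and considerably more sophisticated.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words vector of term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(docs, query):
    """Order document ids from most to least similar to the query."""
    q = vectorize(query)
    scored = [(cosine(vectorize(text), q), doc_id) for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)]


# Hypothetical documents, invented for illustration.
docs = {
    "d1": "copy the competitor bench drawings before the bid",
    "d2": "lunch on friday",
    "d3": "bench drawings approved",
}

ranking = rank(docs, "competitor bench drawings")
print(ranking)
```

The difference from a Boolean search is that nothing is simply in or out. Every document gets a score, so a reviewer can start at the top of the list and stop when the results turn to chaff. The query can also be an entire document already known to be relevant, rather than a handful of guessed keywords.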
A review of the various vendors who offer such marvels will have to await another time, as there are several of them now, and it is a growing field. Still, this is definitely something that you should look into before agreeing to simple keyword searches, especially if the volume of ESI to be reviewed is high. The concept search software fees are probably too expensive for small volume or low dollar cases, but they could be a huge money saver for a larger case. Most vendors will provide you with an idea of the price break point based on the byte size of the ESI. Of course, it is more than the number of megabytes involved. You also need to consider the case subject matter, the amount of money involved, and the importance and complexity of issues.
SUPPLEMENTAL READING AND EXERCISE: Why do you think search in e-discovery is so hard? What other course explains the generally accepted solution to this problem, at least as of 2017? Hint: it is also free, but is for advanced practitioners.
Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!
Copyright Ralph Losey 2015