Welcome to Module 4-F.
Chaos Theories and Information Science
This chapter examines the influence of Chaos Theories, Fractals and Information Science on Electronic Discovery. This class is based on a book by James Gleick: The Information: a history, a theory, a flood. I like Gleick. His first book, Chaos: making a new science (1987) was inspirational. I was hoping for more of the same from The Information. Unfortunately, the book did not deliver. But with effort I was able to find some meaning from his new book, and applications to e-discovery. This class shares these insights. Although this book is a good read, despite its difficulty, students are not required to read it to pass this course, or even to understand this module.
The Influence of the Science of Chaos on My Ideas About e-Discovery
Before I go into Gleick’s latest book, and why it was disappointing, I have to go into his first book, and why, for me at least, it was so good. Chaos was a clear and stimulating book. It was filled with big ideas that came together at the end and made sense. Gleick’s book helped motivate me, a lawyer with little or no scientific training, to read more to try to understand the new science of chaos. I started studying math and geometry, including some far-out stuff like Complex imaginary numbers and infinite recursive geometries. That was a big stretch for a liberal arts based lawyer like myself.
Much of the science went over my head, especially the advanced math, but still my long experience with computers allowed me to understand how Chaos theories emerged from their information processing power. Apparently many other people were interested in Chaos theories too, and how they might apply to their lives, as the number of science and math books on the subject in the stores exploded to meet the new demand. The butterfly’s wings had flapped and many of us have never looked at the world the same way again.
When I read Chaos, it led, among other things, to further study and appreciation of the great French mathematician, Benoit Mandelbrot, who spent most of his life employed by IBM. That led to my in-depth study of fractals, especially the famous computer generated fractal that Mandelbrot discovered and now bears his name (shown below). This fractal demonstrated the hidden order behind chaos. It taught me about recursive self-similarity over scales of magnitude.
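The rule behind the Mandelbrot set is short enough to show in a few lines. What follows is a minimal sketch in Python, not code from Mandelbrot or Gleick: it repeats the standard iteration z -> z*z + c and treats a point c as inside the set if z stays within a distance of 2 from the origin. The function names escape_time and render, the iteration limits, and the viewing window are my own illustrative assumptions.

```python
def escape_time(c: complex, max_iter: int = 100) -> int:
    """Count how many steps of z -> z*z + c pass before |z| exceeds 2.

    Returning max_iter means the point never escaped within the limit,
    so it is treated as belonging to the set (an approximation).
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width: int = 60, height: int = 24, max_iter: int = 50) -> str:
    """Draw a coarse text picture of the set; the window is an assumed choice."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)      # imaginary axis, top to bottom
        row = ""
        for i in range(width):
            x = -2.1 + 3.0 * i / (width - 1)  # real axis, left to right
            inside = escape_time(complex(x, y), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render())
```

Running it prints a rough outline of the familiar shape. The one-line rule inside the loop is the entire "simple mathematical process"; all of the intricate detail comes from repeating it.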
The ability to bring order out of chaos by iteration and simple mathematical processes is a key insight of contemporary science and math. This insight later inspired my own work in electronic discovery. It is the basis of my later invention of several new legal methodologies. These are methods designed to find relevant information through iterative processes, sampling, cost projections, and communications between counsel; new methods designed to find the needle in the haystack without breaking the bank.
The latest research in TREC Legal Track seems to be confirming the validity of this iterative sampling communicative approach. Baron, J.R., Law in the Age of Exabytes: Some Further Thoughts on “Information Inflation” and Current Issues in e-Discovery Search, Richmond Journal of Law and Technology, Vol. XVII, Issue 3 (Spring 2011) at Fn. 92. Advanced concept search software with latent semantic indexing, and the like, certainly help with the key problems of search. Id. at Fns. 123, 124. But software and other new technologies alone will not work without new processes and methods – entirely new methods of human judgment that use iteration, cooperation, sampling, and quality controls. Search depends on both advanced software algorithms and processes controlled and designed by lawyers.
Maura Grossman and Gordon Cormack explained the findings of 2010 TREC Legal Track in the Conclusion to their article, Technology Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3 (Spring 2011):
The particular processes found to be superior in this study are both interactive, employing a combination of computer and human input. While these processes require the review of orders of magnitude fewer documents than exhaustive manual review, neither entails the naïve application of technology absent human judgment.
Human judgment, combined with technology and fractal iterative sampling processes, is the most efficient way to sort through large volumes of information and find the key information needed. Modern thought on e-discovery is, to me at least, a natural outgrowth of the chaos theories that have been driving science since the 1970s.
Since Gleick had inspired me before with his book, Chaos, I was hoping for the same experience with The Information. I was hoping it would show a way to really understand the intellectual foundations of computer science and information science. I thought maybe The Information would have insights and meaning that would make me a better e-discovery lawyer, maybe even a better person. (Yes, I have high hopes for books!) The Information offered the promise of explaining the latest science and math behind the Information Age, the explosion of information that drives electronic discovery law today. The Information offered the promise of new substance and meaning.
My high hopes and history help explain my disappointment with The Information. It has some good moments. I learned things that I did not know before about the history of cybernetics, computer science, math, and the men and women involved. I got more information, but little meaning. Overall the book seems hollow, too technical, and, unlike Chaos, it did not all come together in the end. It was a struggle to read and there was no real pay-off. I finished this book of more than 500 pages tired and disappointed. I expected inspiration and meaning. I expected knowledge, perhaps even wisdom from Gleick. But instead, all I got was information, too much information (TMI).
The book in my opinion contributes to the problem that it points out: that raw information is a flood, a deluge, that is leading us to more chaos and entropy. As an e-discovery lawyer I know all too well about TMI, about the challenges of finding select information in a chaotic system of large corporate enterprises. Now, with The Information, I have more TMI about TMI. For a full and proper book review see the NY Times article. My role is not to review books, but to look for their meaning and impact on the law.
Since Gleick’s The Information did not really provide answers, I was forced to try to figure it out for myself, to find the meaning of the information presented in the book. This was my way to fight the entropy the book had created in my mind. The alternative was just to forget the whole thing, which, I suspect, is how most readers will react. Still, I had invested a lot of time in reading The Information, so I figured a few more hours to try to sort it out was worth the effort.
My Interpretation of the Ideas in The Information
Here is what I think the book is saying. Anyway, it is how I make sense of the information in The Information.
I think this is what Gleick is saying in The Information. But who knows, I could be wrong. My summary of the information in The Information could be misinformation, but for me it has meaning.
When asked about all of this in an interview by Publishers Weekly, Gleick did his best to explain:
By the technical definition, all information has a certain value, regardless of whether the message it conveys is true or false. A message could be complete nonsense, for example, and still take 1,000 bits. So while the technical definition has helped us become powerful users of information, it also instantly put us on thin ice, because everything we care about involves meaning, truth, and, ultimately, something like wisdom. And as we now flood the world with information, it becomes harder and harder to find meaning. That paradox is the final tension in my book.
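Gleick's point that the technical measure ignores meaning can be checked with a small calculation. The sketch below is my own illustration, not anything from the book. It assumes the simplest possible model, where the information in a message is estimated only from how often each character appears (Shannon entropy of the character frequencies). The function names and the sample sentence are hypothetical.

```python
import math
import random
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Shannon entropy of the character frequencies, in bits per character."""
    total = len(message)
    # Sort the counts so the floating-point sum does not depend on character order.
    counts = sorted(Counter(message).values())
    return -sum((n / total) * math.log2(n / total) for n in counts)


def total_bits(message: str) -> float:
    """Estimated size of the whole message under this simple model."""
    return entropy_bits_per_symbol(message) * len(message)


meaningful = "the relevant documents must be found"

# Scramble the same characters into nonsense (fixed seed so it is repeatable).
letters = list(meaningful)
random.Random(42).shuffle(letters)
scrambled = "".join(letters)

print(meaningful, round(total_bits(meaningful), 1))
print(scrambled, round(total_bits(scrambled), 1))
```

Under this model both lines report the same number of bits, because the scrambled version uses exactly the same characters. The measure counts the message, not its meaning, which is the tension Gleick describes.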
Chaos Theories, Information, Search, and e-Discovery
So what is this all supposed to mean to electronic discovery? In responding to lawsuits we must search through information stored in computer systems. We are searching for information relevant to a dispute. This dispute necessarily developed and took final form after the information was created and stored, and well after the information storage systems were designed. Information is not stored by anyone or any organization according to a future order of relevance that is unknown at the time of storage. For purposes of our litigation, of finding information relevant to the issues in our case, information storage systems are always too entropic. They are always inadequately ordered, as far as the lawsuit is concerned, even if they are otherwise well-ordered, which in practice is very rare (think randomly stored PST files and personal email accounts). Since time is an impenetrable barrier, for our purposes as evidence finders we are always dealing with inadequately ordered information.
Information can only be stored and ordered according to what is known. Lawsuits before filing are just latent future events whose contours and order are never fully known, even if their potential is recognized and precautions taken to avoid or minimize litigation risks. It is sort of like quantum mechanics. The exact positions of electrons and photons are just probabilities. Perhaps someday quantum computers will overcome this difficulty. (Google and others are working on limited versions of it now.) Perhaps someday we will be able to store information so that it can be easily retrieved for a dispute that has not yet materialized, for a relevancy not yet formed, not yet observed a la Heisenberg. But, with the limited capacities of today’s finite based, non-quantum computers, we cannot organize information for near infinite variables, for purposes of uncertain events that have not yet occurred. We cannot store information according to issues in a dispute or litigation that is not yet at hand.
Due to the limitations of time, and the complexity (the chaos) inherent in possible future events, we lawyers are always essentially dealing with disordered information. We search through information that has a high degree of entropy and meaninglessness to our case. The information we search through is usually not completely random. There is some order to it, some meaning. There are, for instance, custodian and time parameters that assist our search for relevance. But the ESI we search is never presented to us arranged in an order that tracks issues that were just raised by a new lawsuit. The ESI we search is arranged according to other orders. Sometimes the order behind the ESI we search is very weak, like most email systems, and sometimes very strong, like databases. But it is always disordered for purposes of relevancy. It is our job to find the hidden order, to bring order to the chaos by separating the relevant information from the irrelevant information. We search and find the documents that have meaning for our case. We use sampling, metrics, and iteration to achieve our goals of precision and recall.
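For readers who have not worked with these two metrics, here is a minimal sketch of how precision and recall are calculated, along with a sample-based estimate of precision. It is an illustration under stated assumptions, not a description of any particular review platform: the document IDs, the function names, and the idea of passing in a reviewer's judgment as a function are all hypothetical.

```python
import random
from typing import Callable


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def estimate_precision(
    retrieved: set[str],
    is_relevant: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> float:
    """Estimate precision by reviewing only a random sample of the retrieved set.

    is_relevant stands in for a human reviewer's yes-no call on one document.
    """
    if not retrieved or sample_size <= 0:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = rng.sample(sorted(retrieved), min(sample_size, len(retrieved)))
    return sum(1 for doc in sample if is_relevant(doc)) / len(sample)


# Toy collection: four documents are truly relevant, the search returned three.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
print(estimate_precision(retrieved, lambda doc: doc in relevant, sample_size=3))
```

In the toy example two of the three retrieved documents are relevant, and two of the four relevant documents were found. In a real project the full relevant set is never known in advance, which is why sampling is used to estimate these numbers and why the search is repeated and refined.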
Once we have separated the relevant from the irrelevant, which in large ESI collections is a process that iterates until budgetary constraints are reached, we have moved from information to knowledge. We have added meaning to the raw bits and bytes. But our work is not finished. Not all relevant information is produced, much less useful. Further knowledge refinement is required. More yes-no decisions must be made. Is this piece of information privileged and thus excluded from production?
Even after the knowledge is further enhanced, and a production set is made, our work is still incomplete. In litigation, sorting and gathering relevant producible information, the evidence, is not enough. There is almost always far too much of this knowledge to be useful. The knowledge must be further processed. Relevancy itself must be ranked. The relevant documents must be refined down to the seven or fewer documents that will persuade the judge and jury to rule our way, to reach the yes or no decision we seek. The vast body of knowledge, relevant evidence, must become wisdom, persuasive evidence.
Metrics of Meaning in e-Discovery
In a typical significant lawsuit, the metrics of this process are as follows: from trillions, to thousands, to a handful. (You can change the numbers if you want to fit the dispute, but what counts here are the relative proportions.) In a typical case today an enterprise stores from three trillion to seven trillion computer files in its computers (3,000,000,000,000 – 7,000,000,000,000). A competent e-discovery team is able to reduce this down to thirty-thousand to seventy-thousand files that are relevant (30,000 – 70,000). (Maybe the e-discovery team can do even better than this, and reduce to 3,000 to 7,000 files. It depends on many things, including primarily cooperation.) This is the knowledge of the lawsuit gathered from the raw information. Many think this is what e-discovery is all about: find the relevant evidence, convert information to knowledge. But it is not. It is just the first step: from 1 to 2. The next step, 2 to 3, is more difficult and far more important.
The relevant evidence, the knowledge of the case, is still too vast in today’s trillion-file world. The human brain can, at best, only keep seven items in mind at a time. Tens of thousands of documents, or even thousands of documents, are not helpful to human jurors. It may all be relevant. But it is not all important. All trial lawyers will tell you that trials are won or lost on only five to nine documents. The rest is just noise, or soon forgotten foundation.
So the final step of information processing in e-discovery is only complete when the 30,000 – 70,000 files are winnowed down to five to nine or fewer. That is the final step of information processing, the distillation from knowledge to wisdom. Our challenge as e-discovery team members is to take TMI and turn it into wisdom: the five to nine documents with powerful meaning that will produce the yes or no decisions we seek.
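The relative proportions are easier to appreciate when the arithmetic is written out. The short sketch below uses the low end of the ranges given above (three trillion stored files, thirty thousand relevant files) and seven key documents. The stage labels and the function name are my own, and the figures are illustrative, not measurements from any case.

```python
# Each stage of the funnel: (label, number of files). Figures are illustrative.
stages = [
    ("information: files stored", 3_000_000_000_000),
    ("knowledge: relevant files", 30_000),
    ("wisdom: key documents", 7),
]


def reduction_factors(stages):
    """Return (from_label, to_label, factor) for each step down the funnel."""
    return [
        (earlier[0], later[0], earlier[1] / later[1])
        for earlier, later in zip(stages, stages[1:])
    ]


for start, end, factor in reduction_factors(stages):
    print(f"{start} -> {end}: about 1 in {factor:,.0f} survives")
```

On these numbers the first step keeps about one file in a hundred million, and the second keeps about one relevant file in four thousand.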
From Three Trillion to Three, from just information to practical wisdom: that is the challenge of chaos and entropy in the law today. That is the challenge of justice in the Information Age. How to meet that challenge? How to self-organize the needed order from the chaos of TMI? Iterative, cooperative, communication processes that employ advanced technologies, sampling, metrics, and sound human judgment. The answer is fast becoming clear to every specialist. What was once a novel invention is rapidly becoming an obvious solution. That is how information works. What was novel one day, even absurd, can very quickly become commonplace and establishment. We are processing information faster than before.
The pace of change quickens as information and communication grow. New information flows and inventions propagate. The encouragement of such negentropic innovation is the basis of our patent laws, the basis of our commerce. The right information at the right time has great value.
Just ask any trial lawyer armed with four powerful documents — four smoking guns. They are what make or break a case. The rest is just so much background noise, relevant but unimportant. Wisdom is what counts, not information, not even knowledge. The challenge of Law and Justice in our Information Age is to never lose sight of this fundamental truth, this fundamental pattern. If we do, we get lost in the details. We drown in a flood of meaningless information. We lose the big picture. We lose the case.
There is wisdom from Chaos theories and iterative fractals that we need to understand and apply in the Law. Details are important, but never lose sight of the fundamental pattern. It remains the same over different scales of magnitude, from the small county court case, to the largest complex multinational actions. Justice is and must remain the fundamental fractal pattern of the Law. The key is finding the hidden order behind the apparent chaos, finding the truth of the controversy. That is the ultimate meaning of e-discovery: finding the significant relevant facts in large chaotic systems, the facts that make or break your case.
For a beautiful example of fractal iterations and self-similarity over ever-increasing scales of magnitude, see the Team Fresh videos of Mandelbrot set zooms from HD-Fractals.com. These are accurate visualizations of mathematical calculations, not fantasies.
SUPPLEMENTAL READING: Research some articles and book reviews of The Information, to get some different views of this book. You do NOT have to read the book itself. But if you get the time some day, it is a worthwhile mental exercise.
EXERCISE: Go to some of the many fractal art and video websites and enjoy the animations. Think about some of the zero – one insights from information science, and chaos theories. How does this apply to e-discovery in general, and advanced search processes in particular? Here is one of my favorites.
Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!
Copyright Ralph Losey 2015