Welcome to Module 3-N.
Should You Go Native? More on the controversy of production in native file format.
The title of Paul Gauguin’s famous primitivist painting of Tahiti above is: “Where Do We Come From? What Are We? Where Are We Going?” These are perennial questions of past, present, and future that all humans face. In the world of e-discovery at least, the answer to these questions is clear. We come from paper, we are now in “Tiff and load,” and we are going native. This class goes still deeper into the native file format debate.
Right now most of us in e-discovery live in the safe, well established world of “Tiff and load.” (By this I am referring to converted static image files of all types, including Tiff, and corresponding load files with indexes and other information on the original pre-converted files, usually including select metadata.) We have heard that some in e-discovery, from time to time, leave the familiar comfort of Tiff review and production to try to handle raw computer files that have not been converted, that still live in their untamed native state. There are tales of others who have completely “gone native” and now only review and produce in original forms. They promote a total return to the native, much like Paul Gauguin, and Henri Rousseau (whose painting “The Dream” is shown below) did in the world of Art in the late Nineteenth Century. They tempt us with primitive visions of a wild unfiltered world; a new landscape overflowing with all types of strange exotic metadata.
These nativists brag of cost savings by eliminating steps and keeping it simple. They no longer even try to convert the masses of ESI found in the wild. They instead leave them in their first forms. They openly taunt the converted, the static image. They even abandon the security and well defined order of the sacred Bates stamp. Instead, they babble on about hash, algorithms, authenticity, and dreams of digital fingerprints.
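For readers who have not seen the "digital fingerprint" idea in action, here is a minimal sketch in Python of how a hash value can identify a native file much as a Bates number identifies a page. The choice of SHA-256, the function name fingerprint, and the sample file contents are all assumptions made for illustration; they do not describe any particular vendor's tool or any court's requirement.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Two hypothetical native files, represented here as raw bytes.
original = b"Quarterly report: revenue 100, expenses 80"
exact_copy = bytes(original)
altered = b"Quarterly report: revenue 900, expenses 80"

# An exact copy shares the fingerprint; any change produces a different one.
same = fingerprint(original) == fingerprint(exact_copy)
changed = fingerprint(original) != fingerprint(altered)
print(same, changed)
```

The point of the sketch is only that the identifier is derived from the file's content, so it travels with the file and also serves as a check on authenticity.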
These wild native lovers appear to be brainwashed by overexposure to IT culture. They have succumbed to a classic, incurable case of Clientitus. Those who have lived long in the world of e-discovery, and still profit from the old ways of static image conversion and blowback, will not quickly give in to these new, unproven methods. As even Henri Rousseau once said: “I cannot now change my style, which I acquired, as you can imagine, by dint of labour.” Thus, a mighty battle is set. A war of ignorance and half-truths is in play between the old and new approaches to e-discovery.
This clash of cultures can be seen in four cases. There have been many cases like this before and after. See, e.g., Aguilar v. Immigration and Customs Enforcement Div. of U.S. Dept. of Homeland Sec., 2008 WL 5062700 (S.D.N.Y. Nov. 21, 2008) (Opinion by Magistrate Judge Frank Maas includes a scholarly review of many cases battling over metadata and an observation at *9 that “Metadata has become ‘the new black,’ with parties increasingly seeking its production in every case”). The appearance of four more cases on the issue in the past sixty days suggests that the conflict is escalating. Tiff and load is still the dominant colonial culture, but, as these cases show, the wild ways of native production are beginning to make their mark, hash mark that is.
In Re: ClassicStar Mare Lease Litigation
The first case examining this conflict comes out of Kentucky. In Re: ClassicStar Mare Lease Litigation, 2009 WL 260954 (E.D. Ky. Feb. 2, 2009). In ClassicStar, Magistrate Judge James B. Todd considered whether a defendant should be required to produce financial data in its original, native format after it had already been produced in Tiff and load. One of the defendants in this multi-district litigation, GeoStar, produced 273,000 pages of financial records to all parties. GeoStar converted the information from its original format to Tiff, with DII load files and OCR “so that the documents could be readily loaded into the parties Summation or Concordance (both litigation document management software) databases for ease of searching.” Id. at *1. In footnote two, GeoStar contends that all prior productions in the case have been made in this same format and no one had complained before.
In spite of the fact that this production of 273,000 pages of financial records was searchable and in accord with prior practice in this case, a group of plaintiffs had succumbed to the lure of native for reasons that will be explained below, and demanded that all of these documents be produced again in their original native format. GeoStar objected, and its Chief Financial Officer testified at a hearing on the issue that a second production in original native format would be unduly burdensome:
He identified three reasons that the production of the financial information would be extremely difficult and burdensome on GeoStar: (1) GeoStar would have to redact all financial information pertaining to Gastar Exploration, Inc., pursuant to a confidentiality agreement between those companies; (2) GeoStar would have to redact information outside the requested time period; and (3) GeoStar would have to review older data and repair any corrupted data.
Id. at *2.
Judge Todd correctly begins his analysis of this “Tiff v. Native” dispute by reviewing Rule 34, Federal Rules of Civil Procedure. It provides that, if the requesting party fails to specify a format, “a party must produce it in a form or forms in which it is ordinarily maintained or in a reasonably useable form or forms.” Rule 34(b)(2)(E)(ii), Fed.R.Civ.P. The Court also correctly referenced the related Rule 34(b)(2)(E)(iii): “a party need not produce the same electronically stored information in more than one form.”
Defendant GeoStar took the position that its production of financial records in Tiff format, with DII load files, satisfied the requirement under the Rule to produce the ESI “in a reasonably useable form.” GeoStar argued that the conversion to Tiff with a load file did not “significantly degrade” the searchability function of the documents. GeoStar relied upon United States v. O’Keefe, 537 F.Supp.2d 14, 23 (D.D.C. 2008) where Judge Facciola held:
[P]roduction of the electronically stored information in PDF or TIFF format would suffice, unless defendants can show that those formats are not ‘reasonably usable’ and that the native format, with the accompanying metadata, meet the criteria of ‘reasonably usable’ whereas the PDF or TIFF formats do not.
In response to this position, the plaintiffs’ group argued that production in native format would make the information “more usable.” Judge Todd commented that the “nativists” did not address the core issue in the rule as to whether or not the Tiff and load file was “reasonably useable.” To support their argument as to greater usability, the plaintiffs referred to a particular financial report as an example. The report contained 18,000 rows and 13 columns of data. As the Court said, “these materials contain an extraordinary amount of data.” In my experience, this type of report is not unusual for financial institutions these days and the situation here addressed by the Court is becoming quite common.
The plaintiffs went on to argue that unless these reports were produced in their original native format, and not just Tiff and load, they would be forced “to manually sort through tens of millions of rows of densely formatted financial data.” Yes, they said millions! Conversely, the plaintiffs argued that if the data were produced in its native format, as it is maintained in the ordinary course of GeoStar’s business, “plaintiffs could query various search reports and extract the desired information in a fraction of the time.” Id. at *3. Nothing like facing the costs involved in reviewing millions of rows of data to open the mind to new alternatives like native file review.
Still, if these plaintiffs had wanted it in native, why didn’t they ask for it that way to begin with, before the producing party went to all of the expense of conversion to Tiff? Apparently, it took them quite a while to understand the benefits of native file review. Plus, as we will see, the producing party maneuvered them into acceptance of Tiff with scary tales of proprietary software.
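To make the plaintiffs' functionality point concrete, here is a minimal sketch in Python of what "querying" structured data can look like when it arrives in a machine-readable form rather than as page images. The column names, the sample rows, and the function name query are all invented for illustration; the opinion does not describe the actual layout of GeoStar's reports or the software used to create them.

```python
import csv
import io

# Hypothetical export of a financial report (invented columns and values).
report = io.StringIO(
    "account,period,amount\n"
    "1001,2005-Q1,2500.00\n"
    "1002,2005-Q1,130.50\n"
    "1001,2005-Q2,4100.25\n"
)

def query(rows, account):
    """Return the rows for one account and the total of their amounts."""
    matches = [r for r in rows if r["account"] == account]
    total = sum(float(r["amount"]) for r in matches)
    return matches, total

rows = list(csv.DictReader(report))
matches, total = query(rows, "1001")
print(len(matches), total)
```

The same few lines work whether the report has three rows or millions, which is the practical difference between structured data and a stack of static images.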
The Court resolved this fairness quandary by holding that:
GeoStar has not necessarily erred by producing the documents in .pdf and .tif formats. Indeed, GeoStar apparently produced the documents in a format which was most likely to be usable-to some extent-by any of the parties to this lawsuit, none of whom had the necessary software for the native format data at the time of the production. In other words, the West Hills Plaintiffs have received, to the best of the Court’s knowledge, the production that their Request sought.
Id. at *3.
After making this ruling, the Court went on to say that it was, however, “very concerned” with a written exchange between counsel concerning this production. In the written exchange, GeoStar’s attorneys basically stated it would not be appropriate to produce the financial information requested because it could only be read if the plaintiffs purchased the two specialty software programs in which they were created. Defense counsel also indicated that they thought this would cost in excess of $50,000.00 to purchase these programs and, for that reason, they were going to produce them in the Summation or Concordance database formats instead, namely Tiff and load files. The letter did, however, indicate that if they chose to buy the software, then they would produce the data in its native format.
The Court seized upon this throwaway remark as a clever way around the issue of whether the Tiff and load here was “reasonably useable” as required under the rules. The Court basically held that the offer in the letter superseded the rules’ requirements and obligated the defense to produce the financial database in its native format, assuming that the plaintiffs in fact purchased the “specific software necessary to read the native data format so that they might use its imbedded metadata to better understand the data and the information that it represents.” Id. at *4. The Court did, however, shift the costs of this second native production onto the requesting plaintiffs group, holding:
Finally, as GeoStar has already produced the data once and as the original document request did not specify a native format production, it is only fair to shift the reasonable cost of copying and delivering this second production to the West Hills Plaintiffs.
Id. at *4. This ruling reminds me of a famous quote by Paul Gauguin: “A compromise is the art of dividing a cake in such a way that everyone believes that he has got the biggest piece.”
The obvious moral to this case is that if you are a requesting party, and you want native files, then you should specifically request the native files at the get-go. Do not wait until after you are slammed with Tiff and load to find religion. Do your homework up front. This case also stands for the proposition that if you want native format, you may also have to purchase special, sometimes quite expensive software on which the files were created. This depends on many facts and circumstances, including the export capacities of the original software. These are the kind of technical details that should be worked out by the parties. See Sedona Conference Cooperation Proclamation; Aguilar v. Immigration and Customs Enforcement Div. of U.S. Dept. of Homeland Sec., supra at *9 (metadata issues are an inherently “party-oriented process”).
Finally, it is significant to note that, in this particular case, the confidentiality issues raised by GeoStar were found to be without merit. The court held that there was no confidential data. In other circumstances where bona fide confidential or privileged information is involved, these concerns can raise serious issues about going native. It is impossible to redact a native file in the same way you can redact a Tiff or other image file. The only way that I know of to protect confidentiality in a native file is to alter the native file by deletion or encryption of the protected data. Thus, for instance, if the native file is an Excel spreadsheet, and only one page of the 30-page spreadsheet is relevant and not confidential, then the redaction of the native file takes place by removing the confidential and irrelevant sections of the spreadsheet. This act of removing the confidential information, of course, completely changes the file and in effect creates a new file. Thus, in the context of native file production, redaction is accomplished through alteration and creation of new files. This can be an extremely time consuming process which will often result in cost-shifting to the requesting parties, especially if native format is demanded in order to maintain the full functionality of the file.
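The way native redaction creates a new file can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any review platform works: the data is a small invented CSV rather than a real spreadsheet, the entity names are hypothetical, and redact_rows and sha256_hex are helper names made up for this example.

```python
import csv
import hashlib
import io

def sha256_hex(text: str) -> str:
    """Hash the file content so the original and the redacted copy can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def redact_rows(csv_text: str, column: str, confidential_value: str) -> str:
    """Return a new CSV that omits every row where column equals confidential_value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] != confidential_value:
            writer.writerow(row)
    return out.getvalue()

# Hypothetical native data: Company B's rows are treated as confidential.
native = (
    "entity,period,amount\n"
    "Company A,2005-Q1,2500.00\n"
    "Company B,2005-Q1,130.50\n"
)
redacted = redact_rows(native, "entity", "Company B")

# A simple log tying the produced (redacted) file back to the original.
log = {
    "original_sha256": sha256_hex(native),
    "redacted_sha256": sha256_hex(redacted),
}
print(log["original_sha256"] != log["redacted_sha256"])
```

Because the redacted copy no longer matches the original's hash, a producing party that redacts native files would likely want to keep some record like this log so that the two versions can be tied together later.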
As an aside, another recent case considered what types of information can be redacted before production. Lapin v. Goldman, Sachs & Co., 2009 WL 222788 (S.D.N.Y. Jan. 23, 2009). This is an interesting short opinion by Manhattan Magistrate Judge Douglas F. Eaton that is definitely worth reading. It primarily concerns keyword search terms. The plaintiff class took the incredible position that all of the emails of 39 past and current employees of Goldman Sachs during a certain time period should be produced without any filtering for relevance at all. Had this position been adopted, the defense would have had to review over 300,000 emails of these employees to search for privilege. They would have had to produce all of them, no matter how irrelevant they were. This appears to be a classic case of using e-discovery as a weapon, not a tool of truth. Judge Eaton agreed with the defense on this issue and ordered the parties to meet to attempt to agree upon keywords to cull out irrelevant information. He also rejected plaintiffs’ related extreme position that all of the emails found should be produced without any redaction at all. If this position had been adopted, the defendant’s privacy rights, and the privacy rights of its employees, including attorney client privilege, would have been trampled. Once again, Judge Eaton saw through this “e-discovery as weapon” tactic and held that information could be redacted by defendant if:
(1) the information is proprietary and would adversely affect Goldman Sachs’s competitive position even at this late date; (2) the information would invade the analyst’s privacy rights and is entirely irrelevant to the issue of whether the analyst perceived any pressure to slant a report to please anybody in Goldman Sachs’s investment banking division; or (3) the information is protected by the attorney-client privilege.
I applaud Judge Eaton here for recognizing and providing some protection to the privacy rights of Goldman Sachs employees. These rights are often ignored and innocent people are unnecessarily harmed or embarrassed by having private emails made public in litigation, even when they are completely irrelevant. There was no discussion in this opinion as to form of production, but I assume it was the standard Tiff and load.
Kingsway Financial Services, Inc. v. Pricewaterhouse-Coopers, LLP
A second case examining the “Tiff v. Native” conflict comes out of New York. Kingsway Financial Services, Inc. v. Pricewaterhouse-Coopers, LLP, 2008 WL 5423316 (S.D.N.Y. Dec. 31, 2008). Here Magistrate Judge Henry Pitman addresses a plaintiff’s request for the defendant’s production of metadata from documents the defendants have previously produced in Tiff. It seems to me that this request for native was not really serious. It looks more like a “strategic tactic” because, in the Court’s words:
Plaintiffs do not identify the types of metadata they seek nor do they explain why metadata is relevant in this matter. In addition, plaintiffs do not raise any questions about the authenticity of any documents produced by PwC nor do they claim that any document has been improperly “doctored” or modified.
Id. at *6.
The opinion of Judge Pitman cites to Aguilar:
As Aguilar points out, there are three different types of metadata: (1) substantive metadata; (2) system metadata, and (3) embedded metadata. 2008 WL 5062700 at *3-*4. In general, metadata is relevant when the process by which a document was created is in issue or there are questions concerning a document’s authenticity; metadata may reveal when a document was created, how many times it was edited, when it was edited and the nature of the edits. In the absence of an issue concerning the authenticity of a document or the process by which it was created, most metadata has no evidentiary value. 2008 WL 5062700 at *5.
Kingsway at *6.
I must point out that Judge Pitman in Kingsway omits another important reason for obtaining metadata; namely to improve the functionality of the file. There is much more to this “back to native” movement than the pursuit of authenticity. The added functionality of native files arises from the “embedded metadata” mentioned in the Aguilar three-fold type differentiation. The classic example of this is seen in spreadsheets, wherein the embedded metadata allows formula studies, as well as functional assortment and other internal search capabilities. This is also well demonstrated in In Re: ClassicStar Mare Lease Litigation, 2009 WL 260954 (E.D. Ky. Feb. 2, 2009).
The requesting party here (plaintiff) did not include the functionality argument. The responding party, PwC, stated that it was not possible to produce the metadata because it had been deleted from their computer system prior to the commencement of the action–it no longer existed. Apparently, they were referring to system-type metadata in making this argument.
In any event, the Court would not compel the production of metadata in this case without a showing of good cause, and none was given. In the Court’s words:
In light of the dubious value of metadata and plaintiffs’ total failure to explain its relevance to the claims and defenses in this action, plaintiffs’ application to compel its production is denied. Given the advanced stage of discovery in this litigation and the absence of any showing that production of metadata would serve any useful purpose, I am left with the strong sense that ordering the production of metadata would not shed any light on any of the claims or defenses in this action.
Kingsway at *6.
Although I am obviously a “native sympathizer,” I completely agree with Judge Pitman’s holding. In some cases metadata is important, in others it is not. In most cases the producing party should not be required to make a second production of metadata where the requesting party failed to request it in the first place, especially when, as here, they cannot show good cause.
Armor Screen Corporation v. Storm Catcher, Inc.
The next case arises in West Palm Beach, Florida. Armor Screen Corporation v. Storm Catcher, Inc., 2008 WL 5262707 (S.D.Fla. Dec. 17, 2008). Here Magistrate Judge Ann E. Vitunac considered a situation where defendants received a production in native format of reports used by plaintiff’s experts. The defendants complained that this production in native format was not “reasonably usable” under the rules, and demanded that plaintiff reproduce the information again in “hard-copy print-outs of the electronic SPSS data, including all ‘metadata’.” Id. at *2.
This kind of request has to put a smile on any experienced e-discovery professional, even a die-hard Tiff and load type. The defendants object to a native production and ask for a paper print-out of all of the computer files, including even a paper print-out of all metadata! They did not even ask for Tiff and load. They asked for paper and more paper! Electronic discovery, much like a wild jungle, can be a very scary place for some. They want everything made into paper.
Here is the Court’s, I think, very correct analysis of this bizarre request:
Defendants ask the Court to compel Plaintiff to produce hard-copy print-outs of the data relied upon by Dr. Cowan in his expert report. In an affidavit by Dr. Cowan, attached to Plaintiff’s Response, Dr. Cowan states that he conducted his survey electronically, no paper copies were ever generated, and the electronic survey data was produced in the same format in which he received and analyzed it, i.e. *.sav files. [Footnote omitted.] Dr. Cowan represents that hard-copy print-outs of the data would be an expensive, time-consuming process prone to error.
* * *
[T]he Court finds that the production to Defendants of the survey data relied upon by Plaintiff’s expert Dr. Cowan in the form of electronic SPSS data files complies with the requirements of Rule 34 and this Court’s prior Order (DE 153) requiring Plaintiff to produce such data in “a reasonably accessible format.” Consequently, the Court will not compel Plaintiff to produce hard-copy print-outs of the electronic data relied upon by Dr. Cowan and already produced to Defendants in the form of electronic SPSS files.
Id. at *3.
The plaintiffs argued, and I am inclined to agree, that the defendants’ motion was frivolous and they should be awarded their costs. The Court was obviously also concerned about the defendants’ motion here as it held that it would reserve ruling on the plaintiff’s request for an award of fees:
The Court will hold a hearing at the close of discovery to determine whether the Motion “was substantially justified or other circumstances make an award of fees unjust.” Fed.R.Civ.P. 37(a)(5)(B).
Id. at *3.
By the way, Judge Vitunac here showed strong research resourcefulness and analytic skills to understand the technicalities of what was going on here. Footnote 1 of her opinion shows that she did an Internet search to determine what a “.sav” data file means. This was information that, by all appearances, completely eluded the defendants. Judge Vitunac correctly found a file extension resources website and was able to confirm Dr. Cowan’s testimony that “.sav” data files are a common data format utilized by many statistical packages for social sciences programs (SPSS) and can be opened by various standard statistical computer packages. Kudos to Judge Vitunac and her research clerks for that one!
Superior Production Partnership v. Gordon Auto Body Parts Co., Ltd.
The last case comes out of Ohio. Superior Production Partnership v. Gordon Auto Body Parts Co. Ltd., 2008 WL 5111184 (S.D.Ohio Dec. 2, 2008). Magistrate Judge Terence P. Kemp considered a motion by plaintiff, PBSI, to compel the defendants to produce documents in native format. Once again, the defendants here had already produced the documents in hard-copy, even though the parties agreed that they were maintained by defendants in some sort of electronic format.
The defendants responded to the request for a second production in native format by claiming that the computer system “does not maintain ‘metadata’ that would provide additional information about the document beyond that shown in the hard-copy.” Id. at *1. The court notes that defendants did not address the secondary issue as to “the relative ease or difficulty of producing in native format.” Id.
Unlike the other cases discussed here, the opinion nowhere mentions whether or not native file format was requested by PBSI, so I assume the request was silent as to form. This, of course, triggers the provisions of Rule 34(b)(2)(E)(ii), which allow a responding party to produce its ESI in either “ordinarily maintained” form, in other words, “native,” or in a “reasonably useable” form. The question here is easy to decide because the defendant produced in paper, not in Tiff and load, where at least some metadata would be included and search capacity preserved. Here is how Judge Kemp dealt with PBSI’s belated request for native production:
Federal Rule of Civil Procedure 26, as amended, expresses a preference for the production of electronically stored information in its native format. As PBSI points out in its reply brief, the utility of having documents produced in this format is not limited to being able to view any metadata which might be embedded in the electronic document but not visible on the hard copy. It is often more convenient for the requesting party to receive the documents electronically in order to be able to store them and manipulate them during the litigation process. Here, there do not appear to be any obstacles to the production of the documents in native format, and PBSI has articulated at least one good reason to have them produced in that fashion. Under these circumstances, the Court will direct Gordon to produce documents in their native format.
The court here did not consider cost shifting and did not criticize PBSI for not specifically requesting native to begin with. Other courts may not be so kind. Thus it remains a good idea to specifically request native with metadata, if that is what you want, and do so at the earliest opportunity. As Judge Maas correctly noted in Aguilar:
There is a clear pattern in the case law concerning motions to compel the production of metadata. Courts generally have ordered the production of metadata when it is sought in the initial document request and the producing party has not yet produced the documents in any form. (citations omitted) … On the other hand, if metadata is not sought in the initial document request, and particularly if the producing party already has produced the documents in another form, courts tend to deny later requests, often concluding that the metadata is not relevant.
Aguilar v. Immigration and Customs Enforcement Div. of U.S. Dept. of Homeland Sec., supra at *7.
So we come back to the original question, should you go native? My view remains an emphatic yes! The reason is simple: native is usually the most efficient and cost-effective manner for ESI review and production. Usually the best, but not always. The colonialists are right, there are still many obstacles to full native review of all types of ESI in all circumstances, but these obstacles can be overcome with better technology and collaboration.
The polestar here, as always, is the balance between cost and risk management. If native saves you time and money, and improves functionality to boot, then why not request it? We have seen one negative answer to this question in ClassicStar Mare Lease Litigation. You might not have the software needed to read the native files. The cost to purchase the software might well exceed the potential cost and functionality savings of native. Other reasons to resist the move to native might be limitations built into your own systems. You may be set up for Tiff review, but not at all for native. Even if set up for native, the review in Tiff might be faster and thus make up for any conversion costs and delays. These limitations are, however, fast becoming obsolete. Most review software now includes dual modes, both native and Tiff. Still, an economic analysis may show that in some cases today, Tiff review and production is cheaper than native. But I predict this will change as the industry, which is still heavily invested in Tiff and load, bows to the inevitable and moves its focus to native. As vendors change, and new products develop, the circumstances where Tiff is cheaper will be fewer and farther between.
Even though a native format production may often be better for the requesting party, in some circumstances a native production may still be too burdensome to the responding party, or altogether impossible. For example, in the case of some custom software or databases, it might be impossible to export the relevant data requested, or if possible to export, impossible for any other party to read. The producing party may be the only one in the world with the software or systems needed to read the information. The only solution may be a non-starter from the producing party’s perspective, allowing the requesting party to use the producing party’s own proprietary systems. See, e.g., Aguilar v. Immigration and Customs Enforcement Div. of U.S. Dept. of Homeland Sec., supra at *14-*15 (government ordered to provide a demonstration of confidential law enforcement database to plaintiffs’ attorneys and expert).
Another more common example concerns confidential and privileged information. How do you stamp an electronic document with a confidentiality legend? How do you redact information contained in an electronic file? As mentioned in the discussion of the ClassicStar Mare Lease Litigation case, in essence you now have to alter the file and create a new file with the confidential data omitted. The alternative is adding legends, but that also changes the file and is often not an acceptable solution for a variety of reasons, especially where sensitive information is involved. Depending on the circumstances, the native file redactions may be too expensive and burdensome upon the responding party. Since the Tiff and load alternative to native is now usually an easier and less expensive setup for partial redaction of files and thus protection of confidentiality, if a requesting party wants to stay native with confidential files, they may well have to pay for the added expense. Still, even here, I predict new software innovations will arise to meet this challenge. Many vendors are already adding improved capacities to address this.
Aside from these few caveats, in most projects there is no need for the extra busywork of converting native to image files and creation of load files. The wise words of Paul Gauguin apply to this situation: “Stressing output is the key to improving productivity, while looking to increase activity can result in just the opposite.” Yes, our past may be paper, and our present a confused mess, but the future more productive world belongs to native.
SUPPLEMENTAL READING: Study these legal opinions carefully. Also, if you have not already done so, be sure to read the Aguilar opinion. If you were the attorneys in these cases, what would you have done differently? Do you think the judges made the right ruling? Why or why not?
EXERCISE: Many vendors dislike native file production for a variety of reasons. Talk it over with them. Try out the pro-native arguments and see what they say.
Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!
Copyright Ralph Losey 2015