Welcome to Module 1-H.

Metadata - the data about the data

What is the emerging standard of metadata production? This page addresses issues of metadata production, and the classic cases where it is compelled or prohibited. This is basic stuff that everyone needs to know, and many of you probably already do. Easy or not, it is important for everyone involved with e-discovery to understand metadata and know something about the law on this issue. For experienced legal practitioners, metadata and production format are non-issues: metadata is simply part of the original ESI document.

METADATA

First of all, what is metadata? Literally it means “data about data”. Many courts define the term by referring to the Sedona Glossary of Commonly Used Terms for E-Discovery and Digital Information Management, which defines “metadata” as

. . . information about a particular data set or document which describes how, when and by whom it was collected, created, accessed, modified and how it is formatted. Can be altered intentionally or inadvertently. Can be extracted when native files are converted to image. Some metadata, such as file dates and sizes, can easily be seen by users; other metadata can be hidden or embedded and unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed.
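The "file dates and sizes" that the Glossary says users can easily see, and its warning that metadata "can be altered intentionally or inadvertently," can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the file name memo.txt, the helper file_system_metadata, and the chosen timestamp are all hypothetical, and the fields shown are a small subset of what a file system records.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_system_metadata(path):
    """Return a few file-system metadata fields (hypothetical helper)."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway file so the example is self-contained.
    demo_path = os.path.join(tmp, "memo.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("Draft memo")

    demo_before = file_system_metadata(demo_path)

    # Reset the access and modified times to 2000-01-01 00:00:00 UTC.
    # The file contents are untouched; only the metadata changes.
    y2k = 946684800
    os.utime(demo_path, (y2k, y2k))

    demo_after = file_system_metadata(demo_path)

print("before:", demo_before)
print("after: ", demo_after)
```

The contents and size of the file are the same before and after, but the "last modified" date is not. That is one reason careless collection, such as opening or copying files without forensic care, can change the very dates a party later wants to rely on.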

All computer files have metadata associated with or within them that provides information about the files. For instance, email software includes information in email files about each email's author, creation date, attachments, and identities of all recipients, including those who received a cc or bcc. Metadata even tells you if an email has been opened by a recipient. The printout of an email, which is essentially a TIFF version of the email, may not show the blind copies, and certainly will not tell you if it has been read or not. The metadata of an email will also maintain the history of the email, its conversation thread, such as who replied and who forwarded. Also, unless it is an Outlook email stripped out of its PST file into a MSG file (as explained in my Blog of January 13, 2007, “MSG Is Bad For You”), it will tell you in what folder the email was filed by its custodian.
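To make the email example concrete, here is a minimal Python sketch using the standard library's email package. The message, the addresses, and the helper names email_metadata and printed_view are all made up for illustration. The sketch also assumes the blind-copy line is present in the stored copy, which is typically true only of the sender's copy. Read status and folder location are kept by the mail program or its store (for example a PST file) rather than in these headers, so they are not shown.

```python
from email import message_from_string
from email.utils import getaddresses

# A fabricated message, as it might be stored in the sender's mailbox.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: Carol Example <carol@example.com>
Bcc: Dan Example <dan@example.com>
Date: Mon, 05 Feb 2007 09:30:00 -0500
Subject: RE: RIF spreadsheet
Message-ID: <msg-2@example.com>
In-Reply-To: <msg-1@example.com>
References: <msg-1@example.com>

Revised spreadsheet attached.
"""


def email_metadata(raw_text):
    """Collect header metadata, including bcc and thread references."""
    msg = message_from_string(raw_text)

    def addresses(header):
        return [addr for _name, addr in getaddresses(msg.get_all(header, []))]

    return {
        "from": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),
        "date": msg["Date"],
        "message_id": msg["Message-ID"],
        "in_reply_to": msg["In-Reply-To"],  # links this email to its thread
    }


def printed_view(raw_text):
    """Rough stand-in for a printout: visible headers and body only."""
    msg = message_from_string(raw_text)
    shown = [f"{name}: {msg[name]}" for name in ("From", "To", "Cc", "Subject")]
    return "\n".join(shown) + "\n\n" + msg.get_payload()


demo_email_metadata = email_metadata(RAW_EMAIL)
demo_printout = printed_view(RAW_EMAIL)
print(demo_email_metadata)
print(demo_printout)
```

The blind-copy recipient and the reply reference appear in the metadata dictionary but nowhere in the simulated printout, which is the gap between a native email and its paper or TIFF image.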

Some programs include information within the contents of files that is hidden until you instruct the software to reveal the information. This is called “embedded” information, and courts frequently refer to such information as metadata. Technically, it is not true “metadata” because it is not “data about data.” It is not information about the file itself. Instead, it is information within a file, but hidden for some reason. It is information that the user has created, but is not visible without a command. A good example of this is the “Comments” feature in Word. Comments can be inserted into a Word document that are not visible until you use the View Command to show them. The comments are embedded into the file itself. Another example is the formula that a user can place in an Excel or other spreadsheet to calculate values within a cell. The math used to calculate the value of a spreadsheet cell is embedded in the file. Although technically “embedded data” is not “metadata,” for purposes of legal analysis “embedded data” is treated as a form or type of “metadata” because most courts, and the legal profession at large, do not grasp the distinction.
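The spreadsheet example can be sketched in Python. Modern .xlsx files are ZIP archives of XML parts, and a cell that holds a formula stores both the formula and the last calculated value. The code below builds a stripped-down stand-in for such a file in memory (a single worksheet part, not a complete workbook that Excel would open) and then reads the formula back out. The cell contents and the helper names build_demo_workbook and embedded_formulas are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by worksheet parts in .xlsx files.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# One row: A1 and B1 hold plain values, C1 holds a formula plus its result.
SHEET_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="{MAIN_NS}">
  <sheetData>
    <row r="1">
      <c r="A1"><v>2</v></c>
      <c r="B1"><v>3</v></c>
      <c r="C1"><f>A1+B1</f><v>5</v></c>
    </row>
  </sheetData>
</worksheet>
"""


def build_demo_workbook():
    """Build an in-memory ZIP holding only the worksheet part (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/worksheets/sheet1.xml", SHEET_XML.encode("utf-8"))
    return buffer.getvalue()


def embedded_formulas(workbook_bytes, sheet_path="xl/worksheets/sheet1.xml"):
    """Map each formula cell to (formula text, cached displayed value)."""
    with zipfile.ZipFile(io.BytesIO(workbook_bytes)) as archive:
        root = ET.fromstring(archive.read(sheet_path))
    found = {}
    for cell in root.iter(f"{{{MAIN_NS}}}c"):
        formula = cell.find(f"{{{MAIN_NS}}}f")
        value = cell.find(f"{{{MAIN_NS}}}v")
        if formula is not None:
            cached = value.text if value is not None else None
            found[cell.get("r")] = (formula.text, cached)
    return found


demo_formulas = embedded_formulas(build_demo_workbook())
print(demo_formulas)
```

A printout or image of this sheet would show only the displayed value in cell C1. The formula behind it survives only in the native file, which is the kind of embedded information the parties fought over in the Williams case discussed next.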

WILLIAMS I

The key case on metadata is Williams v. Sprint/United Management Company, 230 F.R.D. 640 (D. Kan. 2005). It was written by a great e-Discovery judge, David Waxse. Here, terminated employees brought a class action and sought Excel spreadsheets with all metadata intact, including embedded formulae. The court held that under “emerging standards of electronic discovery”, metadata ordinarily visible to users of Excel spreadsheets “should presumptively be treated as part of the ‘document’ and should thus be discoverable.” Id. at 652.

The court reviewed case law, the pending new rules and commentary and the Sedona Principles for Electronic Document Production, especially Principle 12, which provides that “[u]nless it is material to resolving the dispute, there is no obligation to preserve and produce metadata absent agreement of the parties or order of the court.” The commentary to this principle opined that “most of the metadata has no evidentiary value, and any time (and money) spent reviewing it is a waste of resources.” The commentary also set forth an important caveat or exception to Principle 12: “Of course, if the producing party knows or should reasonably know that particular metadata is relevant to the dispute, it should be produced.”

The court accepted Sedona Principle 12, with commentary, as an important part of the “emerging standard”, but rejected Sprint’s argument that this meant the Excel spreadsheets’ metadata should not be produced. Instead, the court found that the Excel metadata was material to the dispute, and that Sprint/United should have known that and should have produced it. The actual holding then goes even further to state:

Based on these emerging standards, the Court holds that when a party is ordered to produce electronic documents as they are maintained in the ordinary course of business, the producing party should produce the electronic documents with their metadata intact, unless that party timely objects to production of metadata, the parties agree that the metadata should not be produced, or the producing party requests a protective order. The initial burden with regard to the disclosure of the metadata would therefore be placed on the party to whom the request or order to produce is directed. The burden to object to the disclosure of metadata is appropriately placed on the party ordered to produce its electronic documents as they are ordinarily maintained because that party already has access to the metadata and is in the best position to determine whether producing it is objectionable. Placing the burden on the producing party is further supported by the fact that metadata is an inherent part of an electronic document, and its removal ordinarily requires an affirmative act by the producing party that alters the electronic document.

It looked like, in spite of Sedona Principle 12, metadata production was indeed to be the new standard, especially since the 2006 revisions to Fed.R.Civ.P.34(b)(2)(B) went into effect. The revised rule required production of electronically stored information as they “are kept in the usual course of business or in a form or forms that are reasonably usable.” The usual course of business is to keep files in their native format, because that is how they are used, i.e., .doc files created in Word and .xls files created in Excel, and native files by definition include all metadata.

WILLIAMS II

Several cases, along with a sequel to Williams itself, have, however, shown that the exact nature of the emerging standard is still in doubt.  First the sequel, Williams v. Sprint/United Management Company, 2006 WL 3691604 (D.Kan. Dec. 12, 2006)  (Williams II).  The spreadsheets in question in Williams I were produced in native format as the court ordered, but the plaintiffs wanted more.  They returned to the court a year later to try to compel production in native format of all 11,000 emails produced that transmitted spreadsheets. (The Judge here used a golf analogy for “native” format of “Play it like it lies.”  See my Blog post of February 5, 2007.)  The plaintiffs argued that the original native format of the emails was needed in order for them to determine which emails transmitted which spreadsheets.  The defendant had the burden to show why this native production should not be done, that it was permissible for it to have “improved their lie.”

Defendant met this burden and the motion to compel was denied, primarily because the emails had already been produced in paper without objection, and a second reproduction at this date would be very burdensome, especially since the emails contained many attorney-client privileged materials. Actually, Plaintiffs had originally objected to paper production of the email, but had withdrawn their objection during one of many discovery hearings based on defendant’s assurances that they would provide Spreadsheet Reports that “would match up the transmittal e-mails with their respective attachments.” Defendant argued it had done so as agreed, but the plaintiffs complained that the Spreadsheet Reports were deficient, and they were unable to match them up. The Court disagreed that the Spreadsheet Reports were deficient, noted the apparent impossibility of redacting privileged materials from native files, and held that since the plaintiffs had already received production in one format (paper), the new rules protected defendant from having to produce them again in another format (native). To continue the golf analogy, the court in effect applied new rule 34(b)(iii) to prevent plaintiffs from receiving a “mulligan,” a second request. The exact wording of the court is instructive, and to a certain extent, explains and clarifies Williams I:

Federal Rule of Civil Procedure 34(b)(iii), as amended on December 1, 2006, provides that “[u]nless the parties otherwise agree, or the court otherwise orders, … a party need not produce the same electronically stored information in more than one form.” In this case, Defendant has already produced the transmittal e-mails, as well as all the attachments to those e-mails. Defendant has further created Spreadsheet Reports to correlate the transmittal e-mails to the attachments they transmitted. The Court therefore finds that under Rule 34(b)(iii), Defendant need not re-produce the its RIF-related transmittal e-mails together with their attachments in native format, as requested by Plaintiffs.

Defendant raises legitimate concerns about producing the transmittal e-mails with their attachments in their native format, including whether production in native format would permit the redaction or removal of privileged information in the transmittal e-mail or the attachment.

Moreover, even assuming that Defendant could produce the transmittal e-mails together with their attachments in native format with the privileged information redacted, Plaintiffs have not sufficiently explained why they need the transmittal e-mails in their native format. Previously, this Court has ordered Defendant to produce the Excel RIF spreadsheets in native format, but in that instance Plaintiffs provided valid reasons for the spreadsheets to be produced in their native format. Namely, that the contents of the spreadsheet cells could not otherwise be viewed as the cells contained formulas. Also, in many instances, the column width of the cells prevented viewing of the entire content of the cells. Here, other than arguing that ordering Defendant to reproduce the transmittal e-mails together with their attachments in native format would be more helpful to Plaintiffs in matching up the transmittal e-mails with their respective attachments, Plaintiffs fail to provide any other reason why they need the transmittal e-mails produced in their native format. For these reasons, the Court denies Plaintiffs’ request for Defendant to produce all its RIF-related transmittal e-mails in native format with all attachments in native format and attached to the transmittal e-mails.

The Williams II court did, however, state that plaintiffs could pose specific interrogatories to defendant as necessary to decipher the Spreadsheet Reports and determine which emails matched with a particular spreadsheet.

WYETH

Another court quoted Williams I, but reached an opposite result disallowing production of native format.  Wyeth v. Impax Laboratories, Inc., 2006 WL 3091331, 2006 U.S. Dist. LEXIS 79761 (D.Del. Oct. 26, 2006).  The facts behind the decision are similar to those of Williams II.  Wyeth declined to require metadata production relying primarily on the failure to request metadata before the production, and a local rule making TIFF and JPEG the default format of production. In effect, the court was relying on the “one format” production limitation of revised Rule 34(b), although the new rule is not mentioned in the opinion, and had not yet gone into effect.  The Wyeth court explained its ruling as follows:

Since the parties have never agreed that electronic documents would be produced in any particular format, Wyeth complied with its discovery obligation by producing image files. Further, neither party has argued that the need for accessing metadata was foreseeable or generally necessary. Finally, Impax has not demonstrated a particularized need for the metadata or database production it has requested. Therefore, this part of Impax’s Motion is denied.

Wyeth apparently tries to buttress the decision with a quote from Williams I, that the “emerging standards of electronic discovery appear to articulate a general presumption against the production of metadata.” The Court correctly quotes Williams I, but does not point out that this was a summary of the defendant’s position which Williams I rejected. (Williams II had not yet been decided.)

KENTUCKY SPEEDWAY

A better reasoned case in Kentucky district court explicitly rejects Williams I.  Kentucky Speedway, LLC v. NASCAR, 2006 U.S. Dist. LEXIS 92028 (E.D. Ky. Dec. 18, 2006). This is an antitrust action against NASCAR where the defendants had already spent over $3 million in 5 months responding to e-discovery requests. Then, with that background, the plaintiff for the first time asks for production of all  metadata in documents already produced.  (Again, note the similarities with Williams II, which had not yet been decided.)  The plaintiff relied upon Williams I to try to justify this late request, but failed to make “any showing of a particularized need for the metadata.” The Kentucky Speedway court rejected Williams I in this context and instead followed Wyeth, holding that:

In the rapidly evolving world of electronic discovery, the holding of the Williams case is not persuasive. Having the benefit of the newly amended rules, advisory notes, and commentary of scholars, I respectfully disagree with its conclusion that a producing party “should produce the electronic documents with their metadata intact, unless that party timely objects …, the parties agree that the metadata should not be produced, or the producing party requests a protective order.”

Here, the parties clearly had no agreement that the electronic files would be produced in any particular format. Plaintiff did not notify defendant ISC that it sought metadata until seven months after ISC had produced both hard copy and electronic copies of its documents. . . .

To the extent that plaintiff seeks metadata for a specific document or documents where date and authorship information is unknown but relevant, plaintiff should identify that document or documents by Bates Number or by other reasonably identifying features. Responding to a request for additional information concerning specific documents would be far less burdensome to defendant and far more likely to produce relevant information.

The opinion in Kentucky Speedway does not answer the question of whether it would have reached the same result if the plaintiff had made the request for metadata from the beginning, and not waited until after the defendants had already spent millions of dollars to produce the same documents without metadata.

It is, however, clear from Kentucky Speedway that whenever a metadata production will create a substantial burden on the producing party (here it would have cost NASCAR another $500,000), then the requesting party will have to provide good cause. The plaintiff’s reliance on Williams I to support the production of metadata in all circumstances, and without a good cause showing, was misplaced and distorts the actual holding of Williams I (as shown for instance by Williams II). Instead, Williams stands for the proposition that the producing party must object and show undue burden, and then the burden shifts to the requesting party to prove good cause. The argument on metadata production is essentially the same as the inaccessibility argument under Rule 26(b)(2)(B). If you can show a real need to see the metadata, as the plaintiffs did in Williams I (but not Williams II), it may be possible to compel the production in spite of burden on the producing party. It will be a balancing test dependent upon the circumstances, and following something like the seven factors recommended by the Rules Committee for 26(b)(2)(B) analysis, and earlier in Zubulake I for cost shifting, but with the added dimension of the debatable feasibility at this time of redacting privileged materials from native files.

IN RE PAYMENT

The latest word in this controversy of “emerging standards” comes out of a consolidated group of class action cases styled In Re Payment Card Interchange Fee and Merchant Discount Antitrust Litigation, 2007 U.S. Dist. LEXIS 2650 (E.D.N.Y. January 12, 2007). As in Kentucky Speedway, Wyeth and Williams II, the defendants here sought the production of the metadata for documents already produced without metadata. (Actually, defendants never filed a motion to compel; they just raised the issue at a conference, and that was part of the problem.) In addition, the defendants wanted the plaintiffs to produce all metadata on documents they had not yet produced.

As to the previously produced documents, the holding here follows Kentucky Speedway, Wyeth and Williams II.   In Re Payment holds that the defendants waited too long to complain of the metadata-stripped production, and implied that there had been a waiver. The lesson here is that in order to obtain metadata you may need, you should specifically ask for that metadata to begin with, and if the production is later stripped of the metadata requested, you should immediately and vigorously object.

In Re Payment also followed Williams I to a certain extent, in that it ordered all future productions by plaintiffs to include metadata. The Court explained its reasons for requiring full metadata production:

The defendants object to Individual Plaintiffs’ production protocol on the grounds that, by failing to supply meta-data, it does not comply with amended Rule 34. The Advisory Committee on Civil Rules, in its notes to the 2006 amendment to Rule 34, wrote that a party responding to a discovery request may elect to produce a “reasonably usable” form of electronic data rather than produce the information as kept in the ordinary course of business. Fed. R. Civ. P.34(b), 2006 Amendment, Advisory Committee’s Note. That is precisely what the Individual Plaintiffs have done. By making that choice, however, they have run afoul of the Advisory Committee’s proviso that data ordinarily kept in electronically searchable form “should not be produced in a form that removes or significantly degrades this feature.” Id.

However, the same cannot be said of prospective discovery, meaning the electronic documents that the Individual Plaintiffs have not yet substantially prepared for production as searchable TIFF images. Now that the Individual Plaintiffs are aware of the defendants’ objections, their argument of undue burden is weaker; indeed, they have conceded that their concerns about the burdens of producing electronic documents in native format largely disappear with respect to the documents they have not yet processed for production.

Thus this new metadata case, like Williams I and II,  supports the proposition that the emerging standard requires metadata production when requested and not objected to, but at the same time emphasizes the need for early, clear requests, and prompt objections if the metadata is not provided.

EXERCISE: Why do you think the NSA focuses on metadata collections? Why do you think some espionage experts have said they would much rather know the metadata of a person’s communications than the actual content? On another topic, why do you think metadata and production format should be non-issues?

Discretionary Bonus Exercise: Find a case discussing metadata written by Judge Scheindlin. Then look especially at the footnotes setting out the key elements of metadata fields. Do you agree or disagree? Any guess as to who allegedly helped her to write that footnote? (Not me!)

SUPPLEMENTAL READING: Read up on the controversial NSA metadata collection program that Snowden disclosed. There is more to this debate than privacy concerns, and fighting terrorists. What kind of metadata information would you like to have about the party suing your client? About opposing counsel? About someone your client is thinking about suing? Do you Google all of these folks? Google your clients? Why not? What steps do you take to try to make sure your private communications are kept private? What do you do to try to protect your clients’ discovery data from being stolen by hackers? Cybersecurity issues are discussed at another R. Losey web site, eDiscoverySecurity.com. Also check out HackerLaw.org, another of my favorites. Every e-Discovery lawyer should be familiar with the CFAA (Computer Fraud and Abuse Act).

Students are invited to leave a public comment below. Insights that might help other students are especially welcome. Let’s collaborate!

Copyright Ralph Losey 2015

 

Ralph Losey is a practicing attorney who specializes in electronic discovery law. He is a principal in a U.S. law firm with over 50 offices & 800 lawyers where he supervises electronic discovery work and litigation support. Ralph has written over two million words on law and technology, including six books on electronic discovery. His latest books are "E-Discovery for Everyone" (ABA 2017) and "Perspectives on Predictive Coding" (ABA 2017) (ed. & contributor). His blog is widely read in the industry: "e-DiscoveryTeam.com." Ralph is the founder and principal author of "Electronic Discovery Best Practices" and "e-Discovery Team Training," a free online course covering all aspects of e-discovery. Ralph's sub-speciality is the search and review of electronic evidence using multimodal methods, including artificial intelligence. He also has a free online training program to teach these advanced methods - the "TAR Course." Ralph has devoted a month of his time each year since 2013 to research and test various AI-enhanced document review methods. In 2015 and 2016 Ralph and his Team participated in the TREC Total Recall Track experiments sponsored by the National Institute of Standards and Technology. Ralph has been involved with computers and the law since 1978. His full biography is found at RalphLosey.com. Ralph is the proud father of two children, Eva M. Losey and Adam Colby Losey, a high-tech lawyer married to another e-discovery lawyer, Cat Jackson Losey, and, best of all, Ralph has been married since 1973 to Molly Friedman Losey, a mental health counselor and life-long friend.
