SourceOne eDiscovery - Kazeon Authors

eDiscovery StraightTalk with James D. Shook, Esq. – Issue 5: The Hash Value

CSI For Lawyers – The Hash Value and Digital Fingerprints

Question: What is the technology that is the most challenging for legal professionals today?

James D. Shook, Esq., CIPP

Shook: In my opinion, hashing algorithms remain one of the greatest but least understood technologies in the legal world.  A hashing algorithm takes an electronic “file” (which could be almost anything: a word processing or spreadsheet document, email, sound, video, etc.) as input and returns a fixed-length sequence that is, for practical purposes, unique to that input: a fingerprint.  If you change even a single bit in the input file, the hash returns a completely different fingerprint.  Thus, the SHA-1 hash value of this document as I write it is “CD2EF5D9931033F54A49AF4046EDA61DAF6FFE9D”, but adding just a few words changes that value to the completely different “64C2068B3273122E42D64ED0DDB948E8941CECA9”.
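To make the "single change, completely different fingerprint" point concrete, here is a minimal sketch using Python's standard `hashlib` module. The sample sentences and the helper names `sha1_hex` and `differing_bits` are illustrative assumptions, not anything taken from the interview, and the sketch does not attempt to reproduce the two hash values quoted above.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 uppercase hex characters."""
    return hashlib.sha1(data).hexdigest().upper()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical sample inputs: the second differs from the first by one letter.
original = b"The quick brown fox jumps over the lazy dog"
edited = b"The quick brown fox jumps over the lazy cog"

print(sha1_hex(original))
print(sha1_hex(edited))

# A SHA-1 digest is 160 bits long; a small edit typically flips about half of them.
changed = differing_bits(hashlib.sha1(original).digest(),
                         hashlib.sha1(edited).digest())
print(changed, "of 160 bits differ")
```

The same function works on the raw bytes of any file, which is why the file type (document, email, audio, video) does not matter to the algorithm.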

Question: So, the SHA-1 hash value for a file is unique like a digital fingerprint for the document to prove its authenticity?

Shook: Yes, absolutely.   Also, we can quickly see that there are a vast number of uses for a tool like the hash algorithm, particularly in the legal and eDiscovery world.  One of the most useful purposes is leveraging the hash in de-duplication:  files with the same hash value are consolidated into a single object, and all files with that identical hash value “point” to that common object.  Among other benefits, this can save a substantial amount of storage.  A second common purpose is in determining and reporting the level of duplicate objects across a system, repository, custodian group or even in a document production.  Hashes can also be used to track all locations of a specific document across an entire infrastructure, and take action on those documents as needed.  For example, we have had clients use our tools to track all locations of a classified document — located by its hash value — to determine whether any copies reside outside of permitted repositories, such as on unprotected fileshares.  When located, those unauthorized copies can be deleted or reported on.
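The three uses described above (de-duplication, duplicate reporting, and tracking every location of a document) can be sketched in a few lines. This is only an illustration under stated assumptions: the in-memory `corpus`, its paths, and the function names `group_by_hash`, `duplicate_count` and `copies_outside` are all hypothetical, and it does not represent how any particular product implements these features.

```python
import hashlib
from collections import defaultdict


def group_by_hash(files: dict) -> dict:
    """Map each SHA-1 digest to the sorted list of paths whose content has it."""
    groups = defaultdict(list)
    for path, content in files.items():
        groups[hashlib.sha1(content).hexdigest()].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


def duplicate_count(groups: dict) -> int:
    """Number of files that are redundant copies of another file."""
    return sum(len(paths) - 1 for paths in groups.values())


def copies_outside(groups: dict, target: bytes, permitted: tuple) -> list:
    """List locations of `target` that are not under a permitted path prefix."""
    digest = hashlib.sha1(target).hexdigest()
    return [p for p in groups.get(digest, []) if not p.startswith(permitted)]


# Hypothetical corpus: path -> file content.
corpus = {
    "/secure/plan.doc": b"classified plan",
    "/shares/public/plan-copy.doc": b"classified plan",
    "/secure/memo.doc": b"routine memo",
}

groups = group_by_hash(corpus)
print(len(groups), "unique objects;", duplicate_count(groups), "duplicate file(s)")
print(copies_outside(groups, b"classified plan", ("/secure/",)))
```

Storing one object per digest and pointing every path at it is what yields the storage savings; the same grouping also answers the "where else does this document live?" question without opening any file twice.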

Question: Will hashing ever replace the traditional Bates Stamping?

Shook: I believe it will.  One completely underutilized area in the legal world is using the hash value, or a small portion of the value, as a document’s identifier in a case, instead of the very old-fashioned Bates Stamp.  As a litigator from “way back”, I can remember problems with Bates Stamps in cases involving mere thousands of documents: important documents, produced several times, would have several different Bates values; some would have illegible or mis-identified Bates Stamps.  In some cases, litigants would stamp each page with a separate number, while others would stamp only the first page of a “document” (with the determination made on whether that document had been stapled or paper-clipped), etc.  Sometimes the same numbers would be re-used, and there could be two or three identical numbers referring to different documents.  (Fellow Sedona Conference member Ralph Losey has written an excellent law review article that goes into great depth on hash values and the Bates Stamp, which can be found at: http://ralphlosey.files.wordpress.com/2008/07/hasharticleloseycorrected.pdf).  Substituting the hash value for the Bates Stamp is such an elegant, simple solution to the document ID problem that I am surprised each time I hear a customer requirement for Bates Stamping.
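One way to picture "a small portion of the value" as a document identifier is the sketch below. It is an assumption-laden illustration, not a scheme described in the interview or in the cited article: the function name `assign_doc_ids`, the default eight-character prefix, and the rule of lengthening the prefix until it is unique within the set are all hypothetical choices.

```python
import hashlib


def assign_doc_ids(documents, min_len=8):
    """Map each distinct document's full SHA-1 digest to a short prefix ID.

    The prefix is the shortest length (at least `min_len`) at which every
    distinct document in the set still has a different ID.
    """
    digests = sorted({hashlib.sha1(d).hexdigest().upper() for d in documents})
    length = min_len
    # Distinct digests always differ at the full 40 characters, so this ends.
    while len({d[:length] for d in digests}) < len(digests):
        length += 1
    return {d: d[:length] for d in digests}


# Hypothetical production set: the first document was produced twice.
produced = [b"Master services agreement", b"Board minutes",
            b"Master services agreement"]

for full_digest, doc_id in assign_doc_ids(produced).items():
    print(doc_id, "<-", full_digest)
```

Because the ID is derived from the content, a document produced several times receives the same identifier every time, and two different documents cannot share one within the set. That addresses both of the Bates problems described above.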

Question: I think we can all understand Bates Stamps.  However, the technical language around SHA-1 hashing is mathematical jargon.  Does a legal professional need to understand it to leverage SHA-1?

Shook: Of course, as with any technology you need to have an understanding of how it works to properly leverage its use.  One must-have bit of knowledge in the eDiscovery space is to understand exactly what is being fed into the hash algorithm to compute the fingerprint output, and how different options will affect your results.  For example, most hashing tools will use the “core” content of a file object when computing its hash value, ignoring the so-called system metadata such as the filename, date of creation, etc.  This is an extremely useful decision in most cases, because it ensures that a file will be identified as a copy regardless of whether its filename has been changed or its last date of access has been modified.  However, in some cases the name of the file may be extremely important across otherwise identical files, and you will not want to de-duplicate based solely upon the hash value.  In the case of email messages, there are so many different potential fields for hashing that you’ll want to understand exactly what is being used in a particular instance.
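The effect of "what is being fed into the hash algorithm" is easy to see in a small sketch. Everything here is an assumption for illustration: the function names, the choice to separate fields with a zero byte, and the default email fields are hypothetical and do not describe any specific tool's behavior.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Hash only the core content; filename and dates are ignored."""
    return hashlib.sha1(content).hexdigest()


def content_and_name_hash(name: str, content: bytes) -> str:
    """Hash the filename together with the content (assumed layout)."""
    h = hashlib.sha1()
    h.update(name.encode("utf-8"))
    h.update(b"\x00")  # separator so name and content cannot run together
    h.update(content)
    return h.hexdigest()


def email_hash(message: dict, fields=("from", "to", "subject", "body")) -> str:
    """Hash only the selected email fields, in the order given."""
    h = hashlib.sha1()
    for field in fields:
        value = message.get(field, "")
        h.update(field.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00")
    return h.hexdigest()


body = b"Q3 forecast"
# Same content under two names: duplicates by content, distinct once the name counts.
print(content_hash(body) == content_hash(body))
print(content_and_name_hash("draft.xls", body) == content_and_name_hash("final.xls", body))

# Two hypothetical messages that differ only in their date field.
first = {"from": "a@example.com", "to": "b@example.com",
         "subject": "Hi", "body": "See attached.", "date": "2010-01-04"}
second = dict(first, date="2010-01-05")
print(email_hash(first) == email_hash(second))
print(email_hash(first, fields=("from", "to", "subject", "body", "date"))
      == email_hash(second, fields=("from", "to", "subject", "body", "date")))
```

Whether the two messages count as duplicates depends entirely on the field list, which is why it pays to ask a vendor exactly which fields a given tool hashes.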

The EMC SourceOne eDiscovery – Kazeon solution makes extensive use of hash technologies in all of these areas.  And in keeping with our basic design philosophy of giving you choices, it does not force you into one method.  For example, while de-duplication is typically a needed and desired feature in eDiscovery, there are times when it is not useful.  In the Kazeon solution, when working with a document set, there is a choice as to whether documents should be de-duplicated based upon hash values or whether they should all be maintained separately.


We hope you have found this issue of eDiscovery StraightTalk insightful.  If you have questions that you would like to have answered in future issues, please submit them via email at david@kazeon.com.
