Updated: Jul 9, 2019
For some time now, e-discovery has been a hot-button issue, and it is not going away. Many law firms now have dedicated positions for consulting on and project-managing e-discovery work. As the digital age continues to progress, attorneys and paralegals will need to be well versed in e-discovery in order to best serve their clients. Below are some of the top terms you should know when working on an e-discovery project.
The literal definition of ‘data processing’ is taking a set of data and converting it to a ‘more usable format’. In e-discovery terms, this means taking the raw electronic evidence and converting it into a format you can easily review, typically in an e-discovery software platform.
Metadata is ‘data about data’. In e-discovery, the data is typically a document, and its metadata includes fields such as Author, Date Created, To, From, Subject, and Sent. There are hundreds of possible fields, but the key point is that metadata is data about a document.
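To make this concrete, here is a minimal sketch that reads a few metadata fields from an email using Python's standard `email` module. The sample message and the helper name `extract_metadata` are made up for illustration; real processing tools extract many more fields than this.

```python
from email import message_from_string

# Hypothetical sample message, invented for illustration only.
RAW_EMAIL = """\
From: john.doe@example.com
To: jane.roe@example.com
Subject: Q3 contract draft
Date: Tue, 09 Jul 2019 10:15:00 -0400

Please see the attached draft.
"""

def extract_metadata(raw: str) -> dict:
    """Return a few common metadata fields from a raw email string."""
    msg = message_from_string(raw)
    return {field: msg[field] for field in ("From", "To", "Subject", "Date")}

metadata = extract_metadata(RAW_EMAIL)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

Note that none of these fields come from the body of the email. They describe the message rather than being part of its content.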
Ingestion simply means taking the collected raw data and loading it into data processing software so you can work with it. Ingestion and indexing are often used interchangeably, although strictly speaking indexing is the step that makes the loaded data searchable.
Culling, in e-discovery terms, is synonymous with filtering. Collections of electronically stored information (ESI) often contain a high volume of non-responsive or duplicative documents. Culling is a technique used to get rid of those documents so attorneys do not waste time looking at them.
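As a rough sketch of what culling can look like, the example below filters a tiny collection by date range and keyword. The document records, field names, and the `cull` function are all hypothetical; review platforms offer these filters through their own interfaces.

```python
from datetime import date

# Hypothetical document records; a real collection would hold thousands.
documents = [
    {"id": "DOC-001", "sent": date(2018, 3, 14), "text": "Draft merger agreement attached."},
    {"id": "DOC-002", "sent": date(2015, 1, 2), "text": "Revised merger agreement."},
    {"id": "DOC-003", "sent": date(2018, 6, 30), "text": "Fantasy football standings."},
]

def cull(docs, start, end, keywords):
    """Keep documents sent within [start, end] that mention any keyword."""
    kept = []
    for doc in docs:
        in_range = start <= doc["sent"] <= end
        has_keyword = any(k.lower() in doc["text"].lower() for k in keywords)
        if in_range and has_keyword:
            kept.append(doc)
    return kept

review_set = cull(documents, date(2018, 1, 1), date(2018, 12, 31), ["merger", "agreement"])
print([doc["id"] for doc in review_set])
```

Here only DOC-001 survives: DOC-002 falls outside the date range and DOC-003 does not mention either keyword.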
Custodian refers to the person or thing that had ‘custody’ of the data. For example, if you collected email from John Doe, the ‘Custodian’ for that email would be John Doe.
When reviewing documents during discovery, it is not uncommon to find exact copies of the same document in the collection. De-duplication is the process by which the computer identifies these files and removes the extra copies. The computer uses a unique fingerprint called a hash value to determine which files are duplicates and which are not.
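The idea can be sketched in a few lines of Python using the standard `hashlib` module. This is only an illustration: the file names and contents are invented, and SHA-256 is an assumption here (processing tools may use other hash algorithms such as MD5 or SHA-1).

```python
import hashlib

def file_hash(content: bytes) -> str:
    """Return the hash value (the 'fingerprint') of a file's contents."""
    return hashlib.sha256(content).hexdigest()

def deduplicate(files: dict) -> dict:
    """Keep only the first file seen for each distinct hash value."""
    seen_hashes = set()
    unique = {}
    for name, content in files.items():
        digest = file_hash(content)
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique[name] = content
    return unique

# Hypothetical collection: two files are byte-for-byte identical.
collection = {
    "contract_v1.txt": b"This agreement is made on July 9, 2019.",
    "copy_of_contract.txt": b"This agreement is made on July 9, 2019.",
    "memo.txt": b"Please review the attached agreement.",
}

unique_files = deduplicate(collection)
print(sorted(unique_files))
```

Because the hash is computed from the contents, the file name does not matter. Two files with different names but identical bytes produce the same hash, so only one is kept.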
NIST stands for the National Institute of Standards and Technology. NIST maintains a reference list of known computer-generated or application-generated files, identified by their hash values. These are essentially the files that help the computer run. De-NISTing is the process of removing those files, since they were not created by users.
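In practice, De-NISTing tools generally compare each file's hash value against NIST's reference list of known files (the National Software Reference Library). The sketch below shows the general idea only. The `KNOWN_HASHES` set is a made-up stand-in for the real reference data, and the file names and contents are invented.

```python
import hashlib

def sha1_hex(content: bytes) -> str:
    """Return the SHA-1 hash of a file's contents as a hex string."""
    return hashlib.sha1(content).hexdigest()

# Hypothetical stand-in for a reference list of known system-file hashes.
# A real list would be loaded from published reference data, not built here.
KNOWN_HASHES = {sha1_hex(b"fake operating system library")}

def de_nist(files: dict) -> dict:
    """Drop any file whose hash appears in the known-file list."""
    return {
        name: content
        for name, content in files.items()
        if sha1_hex(content) not in KNOWN_HASHES
    }

# Hypothetical collection: one system file and one user-created document.
collected = {
    "system32/fake.dll": b"fake operating system library",
    "Documents/letter.docx": b"Dear Jane, please find the draft enclosed.",
}

user_files = de_nist(collected)
print(list(user_files))
```

Only the user-created letter remains, since the other file matches a hash on the known-file list.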
Computers have gotten much more sophisticated in recent years. Predictive coding (sometimes referred to as TAR, or technology-assisted review) is the process of having the computer extrapolate a human's decisions across a large data set. There is a lot more to it, but just know it is in the sphere of AI and is gaining momentum!
The load file is used to import data into a database. It typically includes the ‘metadata’ about each document. Popular formats include .DII and .DAT.
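To give a feel for what a load file looks like, here is a minimal sketch that parses a small .DAT-style file into records. It assumes the delimiters commonly seen in Concordance-style .DAT files (the thorn character `þ` around each value and ASCII character 20 between fields); your file may use different ones. The field names, sample values, and the `parse_dat` and `make_line` helpers are made up for illustration.

```python
FIELD_SEP = "\x14"  # assumed field delimiter (ASCII character 20)
QUOTE = "\u00fe"    # assumed text qualifier (the thorn character)

def make_line(values):
    """Build one load-file line from a list of values (for the sample only)."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)

def split_line(line):
    """Split one line into its field values, removing the text qualifiers."""
    return [field.strip(QUOTE) for field in line.split(FIELD_SEP)]

def parse_dat(text):
    """Turn load-file text into a list of records keyed by the header row."""
    lines = [line for line in text.split("\n") if line]
    header = split_line(lines[0])
    return [dict(zip(header, split_line(line))) for line in lines[1:]]

# Hypothetical two-document load file with three metadata fields.
sample = "\n".join([
    make_line(["DocID", "Custodian", "Subject"]),
    make_line(["DOC-001", "John Doe", "Q3 contract draft"]),
    make_line(["DOC-002", "Jane Roe", "Meeting notes"]),
])

records = parse_dat(sample)
for record in records:
    print(record["DocID"], "|", record["Custodian"], "|", record["Subject"])
```

The first line is the header naming each metadata field, and every line after it describes one document. This simple version does not handle values that contain line breaks.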
There are many more terms, but these are some of the major ones to know when working on an e-discovery project. If you have any further questions, please feel free to reach out!