When reviewing documents in an eDiscovery process, it helps to assess entire email conversations rather than isolated messages. A short message without context can be difficult or even impossible to assess, for example when the question it answers is missing. It is also more efficient to tag an entire thread at once than to tag every single message in it.
Mathematically speaking, an email thread is a directed acyclic graph whose nodes are messages and whose edges relate one message to another. Messages are related by sending, replying, and forwarding. A message in the sender’s sent folder is related to the copy of the same message in the recipient’s inbox. Analogously, a received message in the inbox is related to the reply to it in the recipient’s sent folder.
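The graph view can be made concrete with a small sketch. The class name `ThreadGraph`, the relation labels, and the mailbox-style node names below are illustrative assumptions, not part of any particular eDiscovery tool. Each edge points from a message to the message it derives from, and a standard topological check (Kahn's algorithm) confirms that the relations contain no cycle.

```python
from collections import defaultdict


class ThreadGraph:
    """Directed graph; an edge points from a message to the one it derives from."""

    def __init__(self):
        self.nodes = set()
        self.parents = defaultdict(dict)  # child -> {parent: kind of relation}

    def add_relation(self, child, parent, kind):
        self.nodes.update((child, parent))
        self.parents[child][parent] = kind

    def is_acyclic(self):
        # Kahn's algorithm: repeatedly remove nodes that no other node points to.
        indegree = {node: 0 for node in self.nodes}
        for parents in self.parents.values():
            for parent in parents:
                indegree[parent] += 1
        queue = [node for node, degree in indegree.items() if degree == 0]
        removed = 0
        while queue:
            node = queue.pop()
            removed += 1
            for parent in self.parents.get(node, {}):
                indegree[parent] -= 1
                if indegree[parent] == 0:
                    queue.append(parent)
        # If a cycle exists, its nodes never reach in-degree zero.
        return removed == len(self.nodes)


graph = ThreadGraph()
# The copy in Bob's inbox is the same message as the one in Alice's sent folder.
graph.add_relation("bob/inbox/m1", "alice/sent/m1", "delivered")
# Bob's reply in his sent folder answers the copy he received.
graph.add_relation("bob/sent/r1", "bob/inbox/m1", "reply")
print(graph.is_acyclic())
```

Storing the relation kind on the edge keeps "same message in another mailbox" distinguishable from "reply" and "forward", which may matter when a whole thread is tagged at once.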
Depending on the collection process, not all messages of a conversation may have been collected. The collected part of a thread may therefore not be a connected graph; in particular, the root message may be missing. Depending on the configuration of the mail client, the body of the root message may be quoted in a related message such as a reply or a forward. Such quoted text can be edited before the message is sent, so it cannot be assumed to match the original verbatim.
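A minimal sketch of how an incomplete collection shows up in the graph, assuming each collected message is already mapped to the identifier of the message it refers to (the function names and the single-parent mapping are simplifications chosen for illustration). It groups the collected messages into connected fragments and lists the referenced messages that were never collected.

```python
def find_fragments(parent_of):
    """Group collected messages into connected fragments.

    parent_of maps each collected message id to the id it refers to, or None.
    """
    collected = set(parent_of)
    # Undirected adjacency, restricted to messages that were actually collected.
    adjacency = {message: set() for message in collected}
    for child, parent in parent_of.items():
        if parent in collected:
            adjacency[child].add(parent)
            adjacency[parent].add(child)

    fragments = []
    seen = set()
    for start in sorted(collected):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        fragments.append(component)
    return fragments


def missing_parents(parent_of):
    """Return referenced message ids that are absent from the collection."""
    collected = set(parent_of)
    return {p for p in parent_of.values() if p is not None and p not in collected}


# Root "a" was not collected; "b" and "c" reply to it, "d" replies to "c".
collected = {"b": "a", "c": "a", "d": "c"}
fragments = find_fragments(collected)   # two fragments: {"b"} and {"c", "d"}
missing = missing_parents(collected)    # {"a"}
```

In this example the two fragments belong to the same conversation, but only the shared reference to the uncollected root "a" reveals that. One possible design is to add a placeholder node for each missing parent so that such fragments are reunited.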
The relations between email messages can be determined in several ways. Thread identifiers are sometimes, but not always, available. Messages may also carry references that point to the message identifier of a related message. If neither kind of information is available, hashing the sender, subject, and date may yield additional relations.
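The three approaches can be sketched with Python's standard `email` package. The choice of headers is an assumption about what is meant here: `Message-ID`, `In-Reply-To`, and `References` are the standard Internet mail headers for identifiers and references, and `Thread-Index` is a thread identifier written by some Microsoft clients. The helper names, the normalization rules, and the sample messages are hypothetical.

```python
import hashlib
from datetime import timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def fallback_key(msg):
    """Hash sender, subject and date; return None if the date is unusable."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    subject = " ".join(msg.get("Subject", "").split()).lower()
    try:
        sent = parsedate_to_datetime(msg.get("Date", ""))
    except (TypeError, ValueError):
        return None
    if sent.tzinfo is None:
        # Assumption: treat dates without a usable offset as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    stamp = sent.astimezone(timezone.utc).isoformat()
    joined = "\x1f".join((sender, subject, stamp))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def describe(raw):
    """Extract whatever relation information a raw message offers."""
    msg = message_from_string(raw)
    in_reply_to = msg.get("In-Reply-To", "").split()
    references = msg.get("References", "").split()
    if in_reply_to:
        parent = in_reply_to[0]
    elif references:
        parent = references[-1]  # the last reference is the nearest ancestor
    else:
        parent = None
    return {
        "id": msg.get("Message-ID", "").strip() or None,
        "thread": msg.get("Thread-Index"),
        "parent": parent,
        "fallback": fallback_key(msg),
    }


sent_copy = (
    "From: Alice <alice@example.com>\nSubject: Budget\n"
    "Date: Mon, 06 Jan 2025 10:00:00 +0100\nMessage-ID: <m1@example.com>\n\nDraft attached.\n"
)
# Same message as stored elsewhere, here without a usable Message-ID.
inbox_copy = (
    "From: alice@example.com\nSubject: budget\n"
    "Date: Mon, 06 Jan 2025 09:00:00 +0000\n\nDraft attached.\n"
)
reply = (
    "From: bob@example.com\nSubject: Re: Budget\n"
    "Date: Mon, 06 Jan 2025 09:30:00 +0000\nMessage-ID: <r1@example.com>\n"
    "In-Reply-To: <m1@example.com>\nReferences: <m1@example.com>\n\nLooks fine.\n"
)
```

Here `describe(reply)["parent"]` points at the original via its message identifier, while the two copies of the original share the same fallback hash despite different display names, letter case, and time zone notation. Because the hash is only as reliable as the normalization behind it, matches found this way are probably best treated as candidates rather than confirmed relations.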