Depending on the industry and its typical damage periods (in antitrust matters, for example, the damage period can go back 10-15 years on average), most of the data needed to respond to legal discovery will reside on tape. So even if you’ve recently implemented an archival strategy for capturing e-mail, and can use that system for legal discovery, you may still need to deal with tape restores for quite some time, years even.
It’s prudent to consider tape restoration to the archive as part of your implementation and strategy upfront.
About the picture: PowderHorn 9310 tape library.
Tape restoration is usually expensive and typically involves third parties other than your archival vendor to handle the factory-style logistics of managing hundreds or maybe thousands of tapes. The key need, however, resides within the archive tier and its ability at the component layer to handle lower-level e-mail formats such as the Internet standard RFC822, EML, DXML, and so on: basically as many formats as possible, so that the avenues for re-ingestion are flexible. More important still is the ability to separate the data that is coming from tape, or to mark it as such. Perhaps keeping more than just a virtual store makes the most sense.
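As a rough illustration of marking tape-sourced data at ingestion, here is a minimal Python sketch using only the standard library. The `IngestRecord` type, the `build_ingest_record` function, the source labels `live-capture` and `tape-restore`, and the tape label are all hypothetical names chosen for this example; they are not taken from any archive product. The sketch assumes the restored message is available as raw RFC822/EML bytes, and it records provenance next to the message instead of altering the message itself.

```python
import hashlib
from dataclasses import dataclass
from email import message_from_bytes, policy
from typing import Optional

# Hypothetical provenance labels; a real archive would define its own.
ALLOWED_SOURCES = ("live-capture", "tape-restore")


@dataclass(frozen=True)
class IngestRecord:
    """Metadata stored alongside the untouched raw message (illustrative only)."""
    message_id: str
    subject: str
    source: str                # where the message came from
    tape_label: Optional[str]  # set only for tape restores
    sha256: str                # fingerprint of the raw bytes as received


def build_ingest_record(raw: bytes, source: str,
                        tape_label: Optional[str] = None) -> IngestRecord:
    """Parse an RFC822/EML message and tag it with its origin."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    msg = message_from_bytes(raw, policy=policy.default)
    return IngestRecord(
        message_id=str(msg["Message-ID"] or "").strip(),
        subject=str(msg["Subject"] or "").strip(),
        source=source,
        tape_label=tape_label,
        sha256=hashlib.sha256(raw).hexdigest(),
    )
```

With a record like this, a search or retention policy can filter on `source` to treat tape-restored messages differently from messages captured in real time.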
One of the key pieces of information you lose when restoring from tape is the ability to unwind the address books or directory servers that contain group-based addressing information. So a search for all e-mails to [Fred Smith] will yield only those messages where Fred Smith is listed explicitly in the To: field. Messages that Fred Smith received as a member of a group will not appear in the search.
Some vendors reconstruct this information backwards: they identify all of the mailboxes that contained a given message and deduce that if a message was in your Inbox and you are not on the address line, then you were a member of the group distribution. That doesn't catch the case where Fred received a message, opened it, read it, and then deleted it before the next backup cycle. If the e-mail backup system is backing up mailboxes, then the tape restoration, even with a deduced distribution list, is completely unaware of this transaction.
Where the group lists are preserved, the system would know the more pertinent piece of data: Fred received the message, but it’s not in his Inbox.
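The backwards deduction described above, and the gap it leaves, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's actual method: `explicit_recipients` and `infer_group_recipients` are hypothetical helpers, the addresses are made up, and the set of Inbox owners is assumed to come from whatever mailbox backups were restored. Note also that a copy in an Inbox without a matching address line could be the result of a Bcc, which this simple deduction cannot distinguish from group membership.

```python
from email import message_from_string
from email.utils import getaddresses


def explicit_recipients(msg):
    """Addresses listed directly on the To: and Cc: lines, lowercased."""
    headers = [str(h) for h in msg.get_all("To", []) + msg.get_all("Cc", [])]
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def infer_group_recipients(msg, inbox_owners):
    """Owners whose restored Inbox held a copy but who are not on the address line.

    inbox_owners: addresses of mailboxes in which the backup contained this message.
    """
    return {owner.lower() for owner in inbox_owners} - explicit_recipients(msg)


raw = (
    "From: boss@example.com\n"
    "To: sales-team@example.com\n"
    "Cc: alice@example.com\n"
    "Subject: pricing\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# Fred kept the message until the backup ran, so he is deduced as a group member.
kept = infer_group_recipients(msg, {"alice@example.com", "fred@example.com"})

# Fred deleted it before the backup: his mailbox holds no copy, so the
# deduction has no way of knowing he ever received it.
deleted = infer_group_recipients(msg, {"alice@example.com"})
```

In the first call Fred is recovered as a likely group recipient; in the second he is missed entirely, which is the blind spot a preserved distribution list would close.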
This is one reason why it’s good practice not to mix newly captured e-mails, where the distribution lists are expanded in real time, with legacy data imports, where the distribution list that was active and current at the time was integrated separately. The context for errors, questions and further research really does require separate approaches to real-time vs. legacy e-mail data.
Lastly, once the tape data has been successfully ingested into the archive and is managed by policies that govern its expiration based on categories, destroy the tapes. For legacy data they no longer serve a purpose, and they represent potential risk!
