Don’t Store Your Data Like the Empire in Star Wars

Now that the hubbub has quieted down around the new Star Wars movie, Rogue One, we can finally concentrate on the important part of the film: its portrayal of digital archiving. I sat down with the Bentley’s Max Eckard to talk to him about how the Bentley archives data and how you can do a better job managing your own files than the Empire did in the film.

by Robert Havey

Q: Hi Max, can you tell us your official title at the Bentley?

A: Sure, I’m the Assistant Archivist for Digital Curation.

Q: Some people may not realize that the Bentley gets more than papers and photos—we get digital data too. Can you tell me how the Bentley stores hard drives when we receive them?

A: We don’t actually store the hard drives when we get them. Hard drives and other media like them (optical discs like CDs and DVDs, USB drives, floppy disks, etc.) are actually some of the most fragile sources of digital material we receive. They don’t have a long lifespan. Whenever we get storage media like that, we try to get the content off as soon as we can and make authentic, bit-by-bit copies. Each copy will retain key information (like last-modified times and dates) and will be in what we call a baseline preservation state.
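For readers who want to try the idea at home, here is a minimal Python sketch of a verified copy. It is not the Bentley's workflow (archives typically image an entire disk rather than copying file by file), and the function names `sha256_of` and `copy_and_verify` are made up for illustration. It copies one file, keeps its last-modified time, and uses a SHA-256 checksum to confirm the copy matches the original bit for bit.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> str:
    """Copy one file, keep its timestamps, and confirm the copy is identical."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and also tries to preserve modification times.
    shutil.copy2(source, destination)
    original, duplicate = sha256_of(source), sha256_of(destination)
    if original != duplicate:
        raise IOError(f"Copy of {source} does not match the original")
    return original  # keep this checksum so the copy can be re-checked later
```

Saving the returned checksum alongside the copy makes it possible to re-check the file years later and see whether anything has changed.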

Max Eckard, digital curation expert!

Q: Once you get the information off the hard drive, where does it go?

A: Initially we put it through a series of transfer steps, which is basic stuff like looking for viruses, extracting metadata about the file formats, extracting zipped files, and running scans that look for sensitive data like Social Security and credit card numbers. From there it goes to a secure backlog.

Eventually, after it’s fully processed, it will go into DeepBlue, which is U-M’s digital repository for long-term preservation and access.
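To give a feel for what a sensitive-data scan involves, here is a rough Python sketch. It does not represent the tools the Bentley uses; the patterns and the names `find_sensitive` and `passes_luhn` are assumptions chosen for illustration. It looks for strings shaped like Social Security numbers and for card-like digit runs that pass the Luhn checksum, which payment card numbers are designed to satisfy.

```python
import re

# Assumed patterns: 123-45-6789 style SSNs, and 13 to 19 digits with optional
# spaces or hyphens between them for card numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def passes_luhn(digits: str) -> bool:
    """Return True if a digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Collect SSN-like strings and Luhn-valid card-like strings from text."""
    cards = []
    for match in CARD_PATTERN.findall(text):
        digits = re.sub(r"[ -]", "", match)
        if passes_luhn(digits):
            cards.append(digits)
    return {"ssn": SSN_PATTERN.findall(text), "card": cards}
```

A pattern match is only a hint that a person should take a closer look: plenty of harmless numbers have the same shape, and real sensitive data can be formatted in ways these patterns miss.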

Q: So what you’re saying is that the Bentley doesn’t have a gigantic claw machine to retrieve hard drives.

A: Uh, yes [laughs]. The Bentley doesn’t have a claw machine. It might be cool, I think.

Q: So how do patrons access the Bentley’s digital material (if not through an elaborate heist)?

A: The Bentley’s “born-digital” holdings (that is, they came to us from donors as digital files) are accessible through DeepBlue (https://deepblue.lib.umich.edu/handle/2027.42/65133), or through links to particular items on a specific collection’s finding aid. Finding aids provide contextual information about the files, particularly information about the collection the file is located in.

Q: Do you have advice on how people should preserve and protect their own data?

A: First, I’d take an inventory of all the places you have digital files, and think about which of those you really care to spend time (and it does take time!) managing and preserving. These could be documents on your home or work computer or in a cloud storage provider, photos on a digital camera or phone, or on Facebook or other social media, or even health or legal records stored somewhere that are extremely sensitive and important.

After you’ve compiled all your important files, start to make a strategy for managing them going forward. An important thing to consider is that right now, you know your files. You work with them almost every day, and you know when and how they were created. The thing is, you have to think not just about yourself now, but about other people and about the future. Were your files created in a weird proprietary format? Will you or anyone else be able to open your files in the future? Even if someone could open a file, would they understand what they were looking at?

Which leads to another important point: metadata. Metadata is just a fancy word for information about the file itself. A file doesn’t tell you a lot about itself, besides its name (which may or may not be helpful) and maybe who created it or when. You want more detailed information about the file available to you or anyone else who needs to figure out what it is in the future. Things like spreadsheets can help with this, or even just a document that describes your organization scheme. File and folder naming conventions/structures can help with this too.
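The inventory and the descriptive spreadsheet can start from the same file. The sketch below is one possible approach, not a recommendation from the interview; the function name `build_inventory` and the column names are invented for illustration. It walks a folder and writes a CSV with one row per file, leaving a blank description column to fill in by hand.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["path", "extension", "size_bytes", "modified_utc", "description"]


def build_inventory(root: Path, manifest: Path) -> int:
    """Write one CSV row per file under root; return the number of files listed."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "size_bytes": info.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "description": "",  # left blank, to be filled in by hand
        })
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Saving the CSV outside the folder being inventoried keeps the inventory from listing itself on later runs. The extension column also gives a quick view of which formats might be hard to open in the future.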

Finally, think about storage. Ultimately, you don’t want your files to end up in only one place, on one hard drive where it’s really easy to spill coffee on it...

Q: Or get it stolen by rebels?

A: Sure, or get it stolen. If an important file is only in one place, there are plenty of random accidents that can happen to it. The big idea is that unlike paper, copies of digital content can be *exact* copies so there’s no reason why you shouldn’t make them and store redundant copies in geographically different locations. There are new, fancy hard drives out there that say your data will be safe on them, that they will be readable in 1,000 years, but they aren’t foolproof.

I’ve found this guide from our friends at MIT to be particularly helpful with this kind of thing: http://libraries.mit.edu/digital-archives/files/2015/10/2015_pda_handoutdissemination-v3.pdf.
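Redundant copies only help if they stay identical to the original, so it is worth checking them now and then. Here is a small Python sketch of that kind of check. It assumes each backup is an ordinary folder that mirrors the layout of the primary one, and the names `file_digest` and `check_replica` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replica(primary: Path, replica: Path) -> dict[str, list[str]]:
    """List files that are missing from, or differ in, a replica folder."""
    report = {"missing": [], "changed": []}
    for source in sorted(primary.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary)
        copy = replica / relative
        if not copy.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(copy) != file_digest(source):
            report["changed"].append(relative.as_posix())
    return report
```

Running `check_replica` against each backup location in turn shows which copies have drifted. It compares contents only, so it cannot tell which side of a mismatch is the damaged one.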

Q: Thanks, Max. May the Force be with you.