Digital Preservation conference notes

Really enjoyed this year's Digital Preservation conference in Washington, DC, the annual meeting of the National Digital Information Infrastructure and Preservation Program (NDIIPP) and the National Digital Stewardship Alliance (NDSA). It offered a good variety of perspectives and types of libraries/archives/museums/repositories throughout.

I've put together some of my notes below that might be of interest to others. Enjoy! I hope to get the chance to return to the conference down the road sometime.

Tuesday, July 22nd

An Overview of NDSA: Advancing the Capacity to Preserve our Nation's Digital Resources, National Digital Stewardship Alliance, Micah Altman (presentation slides)

No one needs digital preservation for its own sake (the hardware/tools/platforms), but rather for the long haul. As with scientific research, which needs a cumulative, traceable evidence base, we need long-term access to information in order to communicate with future generations and build further on knowledge bases.

"People don't need drills, they need holes"

Altman calls for more collaboration and coordination in stewardship. Single institutions cannot counter all the risks. With the sheer amount of data ever on the rise, requiring not only storage but curation, and with more open access models, there is a need for a higher level of performance and collaboration to achieve these goals. Should we change to a publish now, filter later model? (Data deluge problem?)

Talked about Amazon Glacier, with its 99.999999999% durability figure, but are these really good odds when you consider the content? What about the .000000001% loss? (Particularly in regard to large collections- this can be a substantial loss over time.)

We need to weigh cost against long-term loss. (Ex- Glacier may be a cheap option now, but what are the long-term measures of loss, and of the cost to get data back over time?)
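
As a rough way to put numbers on that question, here is a small Python sketch. It assumes the quoted 99.999999999% figure is an annual, per-object rate and that losses are independent; the collection size and time span are made-up numbers for illustration, not figures from the talk.

```python
import math

# Assumption: 99.999999999% is an annual, per-object durability,
# so the annual chance of losing any one object is 1e-11.
ANNUAL_LOSS = 1e-11


def expected_losses(n_objects, years, annual_loss=ANNUAL_LOSS):
    """Expected number of objects lost over the whole period."""
    return n_objects * years * annual_loss


def prob_at_least_one_loss(n_objects, years, annual_loss=ANNUAL_LOSS):
    """1 - (1 - p)^(n * years), computed with log1p/expm1 so the
    tiny probability is not swallowed by floating-point round-off."""
    return -math.expm1(n_objects * years * math.log1p(-annual_loss))


# Hypothetical collection: one billion objects kept for a century.
n_objects = 1_000_000_000
years = 100

print(expected_losses(n_objects, years))
print(prob_at_least_one_loss(n_objects, years))
```

With these made-up numbers the expected loss is about one object per century, and the chance of losing at least one object is roughly 63%. Whether that is acceptable depends on the content, which is the point raised in the talk.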

Look for the 2015 publication, 'National Agenda for Digital Stewardship' to act as a roadmap for this shared stewardship. Geared towards senior institutional decision makers, and focused on: Digital Content areas, Organizational Roles/Policies/Practice, Technical infrastructure and Research.

"Software, it's a thing", Matt Kirschenbaum (presentation slides)

Digital restoration- Using software to uncover lost Warhol images found on old disks, using GraphiCraft software. Other examples of lost knowledge/images/art include other viewpoints of the first trip to the moon, or old video games (the lost Atari E.T. game).

History of how software eventually became unbundled from its hardware counterpart, and how it changed to be a standalone, packaged piece of code.

Talked about the importance of sites like the National Software Reference Library, and the incorporation of file profiles into a reference data set for long-term preservation of software, to enable reading older file formats, which will continue to plague those involved in digital preservation. Software as expression- Without the programs to read the files, what good is preserving just the files? A more holistic look at preservation (packaging software with associated files). Software will continue to be the key to saving the past, though there are many approaches and frameworks for this idea.

Open source has changed this game, with sites like GitHub preserving the code and versioning.

The Internet Archive has a software tab, and a mechanism to run in-browser emulation.

Do we need to rethink how we are capturing and storing content? What good are mountains of files with no readers?

Preservation Aesthetics, Shannon Mattern (presentation slides)

We are at an interesting point where we have creative works that often outlive their original medium. This should be a conversation from inception- Formats are bound to change, but what do we lose (or gain) with migration and change over time? What about formats that require certain types of equipment to convey their meaning? (Digital artwork and display, in particular)

Does it become the preservation of aesthetics, or the aesthetics of preservation?

Variable media network- Guggenheim's unconventional preservation strategy.

Are some works essentially unpreservable? Or, which parts can we preserve (and should we)? These questions open up many ethical conversations for collection managers. In art, this is trickier because of the idea that the original embodies an aura. Does a migrated copy reflect that embodiment? Maybe so, or maybe not- Many fine lines here.

New Museum Transfer station- Open door artist-centered media archiving project

Stewarding Space Data panel (Multiple slide presentations: Ramapriyan- NASA's Earth Observing System Data and Information System; Byrne- Levels of Archival Stewardship at National Oceanographic Data Center; Shaw- Stewarding early space data at the University of Iowa)

The premise of the talks was the idea of freeing big data into a larger arena. There are many stages of data management, and oftentimes the working storage space available during projects is not considered permanent. How can the data be captured, accessed and stored over the long term?

Many logistics issues for NASA data- 10 satellite missions are ongoing and collecting huge amounts of data. NASA is not considered a permanent archive agency, but rather a research archive that holds data for as long as it is needed for scientific research and/or until responsibility transitions to another archive.

General mandate that there should be no loss of bits, and data sets be: discoverable, accessible, readable, understandable, usable and reproducible.

In the case of the University of Iowa, there were preservation issues, and also the challenge of capturing a medium that no one had used or heard in decades, which raised questions of whether the capture was done correctly. They collaborated with local physicists, and got input throughout the project.

Some interesting conversation and insight on database preservation- Format-centric or systems centric? What do we need to be preserving?

Community Driven Innovation (Presentation slides)
Research Data Alliance- Development and sharing of data accelerates coordinated data infrastructure. Currently a working group working on sharable code, policy, standards and "harvestable efforts"

Related- APTrust- Trusted digital repository with read/write functionality (compared with the DPN service, which is not re-writable)

Smaller local efforts- 5 colleges in Massachusetts - Focused mainly on local need and open, free workshops. Also keyed in on a small part of experimentation- Looking at some open source products to utilize for small institutions with little/no tech expertise

POWRR- Preserving digital objects with restricted resources- Compiling a tool grid for an open resource for digital libraries

Next generation: the first digital repository for museum collections (Presentation slides)
Overview of project to create MOMA's first digital repository with Artefactual Systems (Archivematica creators)

Analyzed past years of growth in digital media. MOMA is mainly a time-based media collection, so the size of the collection is growing at a rapid pace and has much different needs than other types of repositories. Many of the requirements were built on Archivematica's main actions and functionalities. Growth in GB has been at a huge annual rate since 2002.

5 departments in MOMA were looked at in depth for their particular needs, and the stakeholders were assessed. Their needs defined something beyond the current systems (TMS and DAM).

Demo of current state of the new digital repository- Due out later in the year, and code will be open and freely available.

Link to Poster Session summaries 

Wednesday, July 23rd
Tools Showcase and Demo Session
Wayne State digital collections infrastructure- Customized Fedora implementation. Didn't have Ruby or Drupal programmers to explore Hydra or Islandora. Created in-house solution for a front-end interface (named Ouroboros). Soft launch in May. They realize this may not be the over-arching solution for long-term preservation, but something that adapted to their use of Python.

Community ScanDays (ResCarta)- Non-profit organization that focuses on small institutions with little/no ability for internally created and maintained digital collections. http://www.rescarta.org/ They will come to small community collections and use a trained base of people to scan high priority items within a short timeframe. Some open source tools of interest- Audio Transcription

Dance Heritage Coalition- A different type of repository, since much of the content is not located in a single collection, but rather consists of materials from a number of content creators, stewards, dance companies, choreographers and foundations that will likely never be given to a traditional repository. They have a specialized set of metadata fields that are particular to dance.

AVPreserve Tools- 3 free tools available - Fixity, MDQC, and cost calculator. Addressing different issues apparent in digital preservation, from file verification to embedded metadata and long-term cost.
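
To make the file verification idea concrete, here is a minimal Python sketch of a checksum manifest and a later comparison against it. This is a generic illustration of fixity checking, not how the AVPreserve Fixity tool itself works; the function names and the report layout are my own assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Report files whose checksum changed, that went missing, or are new."""
    return {
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "missing": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
    }
```

A typical routine would be to run `build_manifest` at ingest, store the result somewhere separate from the files, and run `compare_manifests` against a fresh manifest on a schedule.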

Using Metadata to support the presumption of Authenticity (presentation slides)
Trustworthiness- Cloud-based storage. Talked again about the idea of loss in the cloud. Many cloud terms and conditions describe an "as is" service- no warranty or guarantee of service. Maintaining integrity does not necessarily mean authenticity.

Conversations about repeated conversion and migration are a must. There are no static digital files- Change is inevitable, and how do we build an infrastructure that can bear witness to these changes?

InterPARES Trust- Generating a theoretical and methodological framework that will support the development of integrated and consistent networks of policies, procedures, regulations, standards and legislation. Sneak peeks at flow diagrams, specifications and class diagrams.

Implementing preservation metadata (presentation slides)
PREMIS data dictionary- Common data model for organizing and thinking about preservation metadata. Guidance for local implementations. OAIS compatible.

Why the need for a conformance statement? Technical neutrality of PREMIS, and contexts like inter-repository exchange, repository certification, shared registries, automation and vendor support

New working group to draft new conformance statements. Three levels of conformance- Mapping, export and direct implementation. Refinement within each level- Object entity only, object plus Events and Agents
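
To picture the difference between describing the Object entity only and describing an object plus Events and Agents, here is a very simplified Python sketch. The class and field names are hypothetical and chosen for readability; they are not the PREMIS schema, and `described_scope` is my own helper, not part of any conformance statement.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    # Hypothetical fields: who or what carried out an event.
    agent_id: str
    name: str


@dataclass
class Event:
    # Hypothetical fields: something that happened to an object.
    event_id: str
    event_type: str
    date_time: str
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class PreservedObject:
    # Hypothetical fields: the file or bitstream being preserved.
    object_id: str
    format_name: str
    checksum: str
    events: List[Event] = field(default_factory=list)


def described_scope(obj, agents):
    """Say which refinement a record would fall under in this toy model."""
    if obj.events and agents:
        return "object plus events and agents"
    return "object only"


agents = [Agent("agent-1", "Repository ingest service")]
record = PreservedObject(
    object_id="obj-1",
    format_name="TIFF",
    checksum="(checksum value goes here)",
    events=[Event("evt-1", "fixity check", "2014-07-23T12:00:00Z", ["agent-1"])],
)
print(described_scope(record, agents))
```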

Practical and Conceptual considerations of Research Object Preservation (presentation slides)
Data changes states throughout the collection, processing, analysis and publication phases. The processes are often as important as the products. Oftentimes, this happens in a non-linear fashion.

Breaking down these states into three main segments- Live, Curated and Published. (May have many variants within each segment). Some flux throughout, and versioning is very important to capture.

Next presentation from OCLC highlighted the re-use stage of data- How opening up networks of data storage and access can indeed lead to advancing research. Notes and data that researchers may not think of as valuable during the collection stage can be helpful to other researchers down the road. (Example of a researcher who wanted to pitch field notes, until a content specialist was able to link them up with another researcher who was also studying bone density.)

Future of Web Archiving (presentation slides)
The Web in transition- Moving from HTML to Java, DocView to VM, Desktop to Mobile, etc.

General lack of policies, rights, permissions. Need to coordinate infrastructure in this volatile environment with better capture mechanisms and discovery modalities. How do we assess the quality or validity of these resources? There are many errors in the Wayback Machine (example of pulling up Weather Channel pages with an incorrect time and date on the specified pages). Even some faults in the naming conventions in these tools point to difficulties. The Memento project aims to help with some standards and best practices.
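
For context on Memento: the protocol (RFC 7089) lets a client ask for a page as it existed around a given time by sending an `Accept-Datetime` request header. The Python sketch below only builds that header value; it makes no network request, and the function name is my own.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Build the Accept-Datetime header for a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # The header uses the HTTP date format, expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = accept_datetime_header(
    datetime(2014, 7, 23, 12, 0, tzinfo=timezone.utc)
)
print(headers)
```

A Memento-aware archive would answer such a request by pointing to the capture closest to the requested time, which is one way to make the kind of date mismatch described above more visible.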

Archival Acid Test- Evaluating quality and performance- Link to more info

Other links/resources of interest
LC Recommended Formats- Link to slides
DuraSpace/Chronopolis/DPN collaboration- Link to slides
Video game source discs at LoC- Link to slides
Save your databases using SIARD- Link to slides
DIY History- Link
AVPreserve- Link to open source programs




 
