10
May
The following is a guest post by Jane Mandelbaum, IT Project Manager at the Library of Congress’s Office of Strategic Initiatives.

The Insights Interview series is an occasional feature sharing interviews and conversations between National Digital Stewardship Alliance Innovation Working Group members and individuals involved in projects related to preservation, access, and stewardship of digital information. In this post, I am excited to have the chance to talk with Joe Lambert, Founder and Executive Director of the Center for Digital Storytelling.

Joe Lambert by user cogdogblog on Flickr

Q1. Could you give us a quick overview of your organization?

The Center for Digital Storytelling was founded in the early 1990s as a community-based training center in digital media.  From the beginning we were known for encouraging a style of short (2-3 min), personal films.  The method and applications of a workshop model became our focus as we moved to UC Berkeley in 1998.  Since then we have expanded as an international organization with approximately 90 separate projects per year, helping organizations engage populations in producing and distributing stories in numerous contexts.

Q2: What do you think makes digital storytelling different than other storytelling?

Our original interest was the simple idea of affordability and distribution of video editing as a form of expression, summed up in the idea that what typing did for 20th-century literacy, video editing will do for 21st-century literacy.  But implicit in this understanding is that when you compose in film, with spoken words, sound and image, you are engaged in multi-modal communication, which has an exponentially more complex impact on you as an editor/creator, as well as on your audience. What we share with the storytelling traditions is a strong sense of formal issues in the creation of meaningful and powerful stories; we teach the elements of story as a core part of our trainings.

Q3: How did the center get started and what do you think has contributed to its growth?

The Center grew out of my theater and community arts organization, Life on the Water, in San Francisco, and a specific collaboration between myself and the late Dana Atchley, a professional video producer, designer and performing artist, called Next Exit.   We grew through several stages, and to some extent markets, starting as an arts-based organization working locally but also engaged in the media technology industries of the early and mid-nineties.  In moving to Berkeley, our emphasis became more explicitly educational and tied to discussions of digital media literacy and instructional technology. In this, our third phase, beginning in 2005, our focus became more and more human services and work with NGOs and agencies dealing with post-trauma or difficult life issues and the use of storytelling as a healing modality.

Q4: What kinds of people or examples have inspired your work?

I am inspired by numerous sources; certainly the work of Studs Terkel in popularizing stories of ordinary people and oral history.  I am deeply inspired by the general trends of community-based arts where artists engage people in the issues of their lives.  More recently, I take inspiration from StoryCorps and organizations like the Museum of the Person in Brazil, that have managed to make reflections on lives a valued and more central part of the dominant cultures in their countries.

Q5: Can you describe the different ways in which people interact with stories?

At the general level, it is of course a fundamental human activity.  Answering what happened, listening as witness to others’ experience and to your own, makes us human, and shapes us in countless ways.  In the specific sense of our work, the stories start as deeply personal artifacts, often to be shown to a limited audience of family members, friends and community, and on the other end of the spectrum become broadcast stories consumed by a general public.  We delineate for our partners and clients how the stories serve these different purposes: personal expression and the preservation of memory, tools for learning and the sharing of information, tools for organizing and mobilization, tools for advocacy and social change, and tools for evaluation and reflection.  In each way, the same story can be said to have a different role, simply by the way it is contextualized.

Q6: I can see that the work includes the recording of stories, and also outreach and education efforts. How do you evaluate the success of what you do? How do you describe your outcomes?

One fortunate part of being a process organization that makes a product is that the products themselves are tangible outcomes.  When we work with a community to discuss issues and address problems, the community ends up with a compendium of stories they can use in all the above ways.  We also develop project plans with specific additional deliverables including curricula, study guides, subtitling, web design, presentation services, etc., that are evaluated for their impact and usefulness.  When possible we like to have a thorough evaluation of the movement of the storytellers from beginning to end of the process, what changed within them in the process, and how the story continues to be used.  Many colleagues in the academic world have taken a much deeper look at the long-term impact of a digital storytelling process within educational and community environments.

Q7: You offer web media design and production services.  How does that work fit in?

It is a small portion of what we do, but inevitably organizations with which we are collaborating are also in the process of more developed media strategy, including re-design of websites, or the making of documentaries about their organization, or a project, and because we have those capacities as well, we will take on the project to assist.

Digital Storytelling Viewing Station by user MACSD on Flickr

Q8: What methods have you found useful in encouraging large-scale projects?  What do you think are keys to scalability?

Besides obviously some resource for the technology (although that is endlessly cheaper), the real issue is systems of training facilitators.  Like many honed methods, the quality of experience as well as the effectiveness of the stories, comes with the ability to adapt to a given set of individuals and the context of the environment (both in terms of how storytellers are engaged and purposing the work, as well as the issues with the technological infrastructure provided).  We take several years of collaboration with a partner to fully qualify facilitators, but once they are present, programs tend to grow, because these “lead” facilitators can then pass those skills on and create sustainable capacities for the organization/community.  The other issue is vision of the ways the stories become part of a community, in ongoing rituals of presentation events, or senses that this particular process serves as a rallying process for certain kinds of campaigns.  In University settings we have seen the projects scale because they have a clear niche in the way people make community. People have learned to gather around each other’s digital stories, as ways to recognize each other, and support each other’s interests and unique contributions.

Q9: How do you think about or encourage preservation of stories into the future?  How do you address that in your partnerships and workshops? What can be learned from how people have preserved stories in the past? What kinds of technology issues have you dealt with, and how have you dealt with them?

Our standing agreement is that all projects, the output film files, and the project resources (should the film need to be re-edited from scratch), are maintained by CDS.  So 900-1200 stories a year are archived by our organization through this process.  The archive has taken every media form, from videotape, to CD-ROM, to DVD, every kind of storage media, but mainly external hard drives.  Our centralized archive is on a DROBO 16 TB server, holding approximately 4000 stories. The main lesson is tertiary levels of backup, with no two levels being stored at the same location.  We have used some cloud-based storage, but the data flow is a bit too much for us to do at the informal level.  At six different times we have attempted putting in place a searchable system for the archive, but we still are mainly using a date process, as the files are not maintained with inherent metadata; the metadata format exists, as do the databases, but as a small independent arts organization we have not been able to afford and maintain staff focused on our archive.  As a result, when we are asked, as happens once a month, “Do you have a copy of my movie from 2001?”, we have a way of finding the film, but it is by no means at a moment’s notice, and the archive is not available to researchers and others as a rich research source, which would be our preference.

Q10: Do you work or interact with organizations such as libraries and archives that collect digital content?

Shortly after the agreement with StoryCorps to bring their material to the LOC, we approached the Folklife folks about the same idea with our archive, but we had no simple way of providing ongoing maintenance, so we gave up on that solution.  We should find an institutional partner to assist us, as it really is an amazing archive at this point.

DS106 Set Title Image by user life-long-learners on Flickr

Q11:  What do you think are currently the most pressing problems that need to be addressed for those working with digital content? To what extent do you think we are addressing these problems?

I really cannot speak for the field.  Data management has not vastly improved in terms of standardization, but the cloud-based solutions suggest we are not far from trusting the great brother in the sky to maintain everything for us.  In which case, how you move terabytes of data from a small organization up into the cloud is the problem.

Surprisingly, for our group, the issues that concern us start with digital creation.  We have no simple way to collect metadata information on the films as they are created, so we create a backlog of mind-numbing data entry work to make the documents valuable.  It seems that we need a way, even as people are titling their pieces and/or filling out their evaluations, for them to fill out the story metadata information, choosing appropriate tags and content information, so we would not have to return to the work years later to make sense of what these stories can tell us, or how we can use them as examples.

Q12: What do you think are the kinds of problems that your organization will be facing in the near future? In the longer future?

We would love help, is what it comes down to.

Q13. What do you find encouraging in our current world, in terms of your work?  What do you find discouraging?

The pendulum from a data-centric, logico-centric domination of what we consider knowledge and wisdom, to the intuitive/creative/story based wisdom, is swinging our way. We need a more balanced understanding of knowing what it means to be human, and ways for listening processes to be seen as at least as valuable as arguing processes. What is discouraging is how little listening happens, lots of noise, lots of things to listen to, but little real listening.  We are drowning in information immediacy, and we need the lifeboat of reflection.  I keep putting patches in canvas to keep it floating, but the tide keeps rising as far as I can see.

Q14. How do you think storytelling and listening affect other aspects of people’s lives?

I obviously think story is the whole enchilada of our lives.  We carry a script of ourselves, we generally do not alter the script, we are trapped inside so many levels of self-identification and self-rationalization that we cannot see how it molds all our choices, our behaviors.  If you go tell and re-tell the seven essential stories of your life (family self/becoming, community self/connecting, essential self/being, loving self/partnering, creative self/serving, thinking self/knowing, and dying self/transcending), you become a more complete person.  It is a challenge, and many more people are taking that challenge, and are made richer for it.

Q15: What have you learned from other fields or professions?

I read across history, society, cognitive science, technology, health and wellness, mindfulness and spirituality, and stories, many stories.  I take parts of each to consider different parts of the way stories work upon us.  I am currently engulfed in ethics, particularly professional ethics in the human contact professions.  Lots of words to actualize the golden rule.

Q16: We’re always trying to tell compelling stories that illustrate the importance of digital preservation.  Do you have advice for us?

“A funny thing happened on the way to the hard drive” is not a joke I have heard lately, in part because a hard drive is precisely an inconsequential thing.  Pulling a handwritten letter from a file cabinet that was written by your great uncle about his feelings on Woodrow Wilson’s role at Versailles is one thing, whipping it out of the National Archive database is another.  The visceral still has great hold on us.  Most of the stories we have about the encyclopedia internetica are about the delightful accidental discoveries: you thought you were looking for X and you found Y, and Y, it turns out, is exactly what you needed.  I think those kinds of stories rationalize the great apparatus of memory, that we can make a link that was not expected, and that turns out to be a decisive moment in understanding.  Do you have any of those?

10
May
The following is a guest post by Carla Miller, Administrative Specialist for the Office of Strategic Initiatives.

This is Part Two of a post reporting on the joint meeting on March 23, 2012 of both the Still Image and Audio Visual Working Groups of the Federal Agencies Digitization Guidelines Initiative hosted at the National Archives and Records Administration’s College Park campus.  Part One covered the Still Image meeting, and now we move on to the Audio Visual meeting.

After the Still Image Working Group meeting, attendees had lunch and then were given a tour by Kate Murray of the Digitization Services facilities at NARA.  Once the tour concluded, Carl Fleischhauer of the Library of Congress led the Audio-Visual Working Group meeting.

Seen on the digital conversion tour at NARA in College Park, MD: the Sondor Altra Film Scanner, used to capture imagery from motion picture film. Photo by Criss Kovac, courtesy of NARA.

Standards are not fixed but continue to evolve over time, and so it is with the standards for embedding metadata in Broadcast WAVE files.  In 2011, the European Broadcasting Union standards body updated its specification for Broadcast WAVE files.  In response to this, Jimi Jones led an effort to draft Version 2 of the FADGI guideline Embedding Metadata in Broadcast WAVE Files, and at this meeting the revision was adopted by the group.  Thus the FADGI document remains in sync with the revised EBU standard.

Kate Murray from NARA presented information about an XML technical metadata schema they have developed for reformatted video objects.  In support of this schema, NARA has also developed an XML metadata export/extraction tool that will organize and assemble data to meet the schema specification.  NARA has identified the best area within the AVI file header to embed limited and controlled metadata for preservation purposes and has developed a tool that supports embedding, validating and exporting of metadata in AVI files.  This is an interesting example of synergy with open source, since NARA borrows from and uses a pre-existing tool MediaInfo (available through SourceForge) and contributed a new open source tool AVI MetaEdit (available through GitHub).  Some additional information is provided on pages linked from NARA’s Products and Services Web site.

Expert consultant Chris Lacinak discussed his audio performance metrics project.  The project report Analog-to-Digital Converter Performance Specification and Testing Study and Recommended Guideline has been posted on the FADGI site for comment.

10
May

Four years ago, the National Digital Information Infrastructure and Preservation Program awarded grants to four projects involving multiple states, known as the Preserving State Government Digital Information initiative.

At that time, NDIIPP was in the process of expanding its network of partnerships through projects exploring the preservation of and future access to at-risk digital content.   State government digital information was identified as content particularly at-risk, and there was relatively little experience in cross-state preservation collaboration.  In fact, few individual states had made much headway in managing digital information at the state or local levels.

Flash forward to 2012. The Preserving State Government Digital Information projects are in the process of wrapping up. We’ve talked about the excellent results of the Model Technological and Social Architecture for the Preservation of State Government Digital Information project.   Now I have the opportunity to share some of the results of the Persistent Digital Archives and Library System research project.

Members of the Persistent Digital Archives and Library System research project team at BPE2010.

Led by the Arizona State Library, Archives and Public Records, the goal of PeDALS was to develop a shared curatorial framework for the preservation of digital public records and to investigate technical solutions for managing the records, including ingesting and cost-effective storage solutions for large collections of state government agency publications and records.

Results from the technical investigations are one of the highlights of the project.  The project adopted a distributed preservation system for a cost-effective storage solution.  Partner states benefited from sharing knowledge, increasing staff technical skills and understanding requirements for maintaining and providing access to electronic records.  Shared learning that built expertise across states was a great outcome.

The project website documents the results of the partner efforts mentioned above.  The projects also explored solutions to support the life cycle of the curatorial process.  Project members from Florida, Arizona, Wisconsin and New York developed a core metadata dictionary of the metadata elements used across multiple states; the network architecture developed for the project is laid out; and a description of LOCKSS technology concisely discusses the “digital stacks” approach of the project.

As the PeDALS final report (PDF) notes, the partner states faced many challenges due to budget cuts, making it difficult to hire, train and retain staff, to participate in project meetings and to contribute staff or technology resources.  Despite these challenges, partner states and project team members developed a strong affinity for collaborating across state lines.  A community of shared practice emerged, engaging in best practices for preserving state government records and developing and testing software, with the hope of fostering a system that could be applied to multiple states.

The PeDALS project officially wrapped up in March 2012.  Going forward, the PeDALS partnership will be a loose confederation of the four states – Alabama, Arizona and Wisconsin, with New York being an active observer.

10
May

Doug White, of NSRL

Insights is an occasional series of posts in which members of the National Digital Stewardship Alliance Innovation Working Group take a bit of time to chat with people doing novel, exciting and innovative work in and around digital preservation and stewardship. In this post, I am thrilled to have a chance to hear from Doug White, Project Leader for the National Institute of Standards and Technology National Software Reference Library. I heard Doug give a fantastic talk about his work at the CurateGear Workshop (see slides from the talk here).

Before we dig into the details of the project, you mentioned that the NSRL has already resulted in saving at least one person’s life. Could you walk us through exactly how that came about? I think it makes for a really compelling story for why software preservation matters.

Nice Paint Job, by Vicki's Picks, on Flickr

Doug: Certainly; it was an unintentional circumstance. To begin, we often were asked if software may be borrowed from the NSRL, and the response was, “No, we are a reference, not a lending library.” But then we received a call from a Food and Drug Administration agent on a Friday afternoon in December 2004.

A medical supply company in Miami had received a delivery of botulin, which was to be processed into Botox and distributed. However, it was misprocessed, and a dangerous concentrate was distributed. The FDA had all of the information needed to identify the recipients, but the information was in a file created with a 2003 version of a popular business software application. The 2004 version available to the FDA could not open the data file. The manufacturer of the software was also unable to supply the relevant version.

It so happened that one of the agents involved in the case was familiar with the NSRL, and had in fact provided software to us earlier in the year. He called, explained the situation, and asked if we had the 2003 version of the software. We did! The agent then arranged for an FDA contact to come to NIST, get the software, and put it on a jet to Miami. The people working the case in Miami were able to install the old version, open the data file, and trace the paths of the botulin.

Several fortunate events occurred to enable this story to end on a positive note. We have a process in place should this occur again, though we consider the NSRL to be a “last resource.”

Trevor: I have heard you describe the National Software Reference Library as a library of software, a database of metadata, a NIST publication and a research environment. Could you give us a little background on the project and explain how NSRL serves these different functions?

Doug: The diagram below is an overview showing several facets of the NSRL. The path using red arrows involves our core operations, green arrows designate “derivative” operations, and blue illustrates some collaborative research.

The physical library is our foundation. At the inception of the project, in 2000, organizations were creating and sharing metadata describing computer files on a very ad hoc basis. If the metadata were questioned, it was highly unlikely that the original media were available to resolve the issue. The NSRL operates in the same fashion as an evidentiary locker, with the original media available in the event of a question.

The physical library has a parallel virtual library. NSRL has created bit-for-bit copies of the original media and images of packaging materials that are kept on a network storage device. I need to point out that the NSRL runs on a network disconnected from the Internet, and in fact, also disconnected from the NIST network infrastructure, using equipment and cables we installed. The media copies can be manipulated automatically and used by multiple processes, and repeated physical contact with original objects is minimized.

From the packaging and media, we collect metadata from every application, from every file. We store the metadata in a PostgreSQL database. The database has several schemas, which act as conceptual boundaries around accession processes, the collection of software application descriptions by manual processes, the collection of content metadata by automated processes, storage processes and publication processes. The work processes and the technology are modular components that are easy to test, maintain, train, or reuse. The database metadata (with the exception of staff information) is available on request.

There is a subset of the collected metadata which is of use to investigators and researchers in the community in which NSRL participates, and the subset is published quarterly as NIST Special Database #28. The specific data includes:

  • Manufacturer Name
  • Operating System Name
  • Operating System Version
  • Product Name
  • Product Version
  • Product Language
  • Application Type
  • SHA-1 of file (digital fingerprint)
  • MD5 of file
  • CRC32 of file
  • File Name
  • File Size
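
As a rough illustration of the file-level fields in the list above, the following Python sketch computes a file's name, size, SHA-1, MD5 and CRC32 in a single pass. It is not NSRL's tooling (the interview later notes their code is mostly Perl wrapping third-party tools); the function name `file_fingerprints`, the dictionary keys and the uppercase hex formatting are assumptions made for this example.

```python
import hashlib
import zlib
from pathlib import Path


def file_fingerprints(path, chunk_size=1 << 20):
    """Return name, size, SHA-1, MD5 and CRC32 for one file.

    Hypothetical helper for illustration; the field names and the
    uppercase hex formatting are assumptions, not an NSRL format.
    """
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large media images do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return {
        "FileName": Path(path).name,
        "FileSize": size,
        "SHA-1": sha1.hexdigest().upper(),
        "MD5": md5.hexdigest().upper(),
        "CRC32": format(crc & 0xFFFFFFFF, "08X"),
    }
```

Calling `file_fingerprints("setup.exe")` on any local file returns one dictionary per file, which could then be written to a database row or a CSV line.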

The research environment allows NSRL to collaborate with researchers who wish to access the contents of the virtual library. Researchers may perform tasks on the NSRL isolated network that involve access to the copies of media, to individual files, or to “snapshots” of software installations. In addition to the media copies, NSRL has compiled a corpus of the 25,000,000 unique files found on the media, and examples of software installation and execution in virtual machines.

Trevor: Could you give us a brief overview of what exactly is the content of the library? What data and metadata do you collect and how do you work with it?

Doug: The library contains commercial software, both off-the-shelf shrink-wrapped physical packages and download-only “click-wrapped” digital objects. This includes computer operating systems, business software, games, mobile device apps, multimedia collections and malicious software tools.

Metadata, by Shira Golding, on Flickr

Most of the software in the NSRL is purchased. We try to acquire everything on the top-selling lists. Some software we hear about by word of mouth, some by schedule (like tax programs each tax year, security, antivirus) and some by requests from law enforcement and other agencies. We accept donations from manufacturers and have paperwork to state we will not use the software license. We accept donations of used software as long as it is in usable condition, but there is no guarantee that it will make it into the NSRL.

The data and metadata are detailed in documents on the NSRL website. To summarize, we collect accession data familiar to your readers: the information about the manufacturer and publisher, the minimal requirements listed, the number and types of media, etc. We also process the contents of the media to obtain metadata about the file system(s), directory structure, file types (based on signature strings) and many file-level metadata fields, as I mentioned in the previous question.

NSRL makes minimal use of this metadata. We perform mock investigations using the metadata to measure the applicability. We investigate the randomness of the cryptographic algorithm results. We are constantly seeking related collections with which we could combine an index or translate a taxonomy, to cross-reference NSRL data with other sets.

Trevor: In the context of thinking about NSRL as a research environment it seems that the key value there is the corpus of software, the 23,809,431 unique files, that you have identified. Could you tell us about some of the research uses these have served so far? The audience for the blog varies widely in technical knowledge so it would be ideal if you could unpack these concepts a bit too.

Doug: The highest value, in my opinion, is the provenance and persistence of the collection. Given the virtual library, it is easy to apply new technology and new algorithms to the entire set or to specific content automatically, while maintaining the relationship to previous work and the original media.

NSRL has applied several cryptographic algorithms against the corpus, and statistically analyzed the results. This is an interesting measurement of the algorithm properties within the relatively small scope of binary executable file types. NSRL found that indeed there were no collisions among the 25 million files.

Working with a collaborator, we are able to define precise, static content sections of executable files, obtain a digital fingerprint of those sections, then identify those sections when they are present on a running computer. This can allow an investigator to determine that a program was running, even though the files do not exist on the computer.

Working with a collaborator, we are able to provide practical feedback on the development of an algorithm called a similarity digest. Currently, if you have two digital copies of the Gettysburg Address text, one of which begins “Five score and …” instead of “Four score and …”, the two cryptographic hashes of the differing files will be extremely dissimilar, as intended. Two similarity digest results on the two Address files will be similar, and the similarity can be measured. Algorithms of this kind are also known as “fuzzy” hashes, and they tend to be impractical for very large sets. We are assisting in developing a practical implementation.
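
To make the contrast concrete, here is a small Python sketch that changes one word in the opening of the Gettysburg Address and compares the two SHA-1 digests, then measures how alike the two texts are. Note the assumptions: `difflib.SequenceMatcher` compares the texts directly and is only a stand-in for the idea of a measurable similarity score; it is not a similarity digest and is not the algorithm NSRL's collaborator is developing. The helper names are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

# Opening words of the Gettysburg Address, and a copy with one word altered.
original = ("Four score and seven years ago our fathers brought forth "
            "on this continent a new nation")
altered = original.replace("Four", "Five", 1)


def sha1_hex(text):
    """Cryptographic hash: any change should yield an unrelated digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def text_similarity(a, b):
    """Stand-in similarity score in [0, 1]; NOT a real similarity digest."""
    return SequenceMatcher(None, a, b).ratio()


print("SHA-1 original:", sha1_hex(original))
print("SHA-1 altered: ", sha1_hex(altered))
print("Digests equal: ", sha1_hex(original) == sha1_hex(altered))
print("Text similarity:", round(text_similarity(original, altered), 3))
```

The two digests share no useful relationship, while the similarity score stays close to 1. A true similarity digest aims to give that second kind of answer from compact fingerprints alone, without comparing the full files against each other.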

NSRL has in the past limited metadata collection to the content of the application media. We have now acquired the resources and defined the processes to automatically install an operating system on a virtual machine, run the OS, perform noteworthy tasks, install applications, generate content, uninstall applications, etc. This enables the collection of metadata on dynamic system files, registries, log files, memory and various versions of user-generated files. We can use some of this metadata as feedback into our core process, and we have some research opportunities.

Another imminent collaboration is the creation of many word processing documents, created with different applications and multiple versions, that contain the same text. A corpus of document tags or codes spanning versions and products has generated some interest.

Trevor: Could you tell us a little bit about the NSRL environment? What kinds of technologies and software are you currently using and what are you exploring for use in the future?

Doug: We are fortunate to have three contiguous rooms, one that houses the physical library, one that houses the data entry workstations, and one that houses servers and storage. The proximity of the rooms allowed us to pull our own cables, which makes that level of our infrastructure a controlled, known quantity.

The physical library has an alarmed, multi-factor entry control. The shelf system is a powered collapsing system which defaults to a closed, fire-retardant position. The environment is not kept within the recommended practices for archives; this was considered, but not implemented. Heat, fire, humidity and other risks are minimized to the best extent we can.

NSRL has strived to keep infrastructure implementations to hardware and technologies that can be quickly obtained and made functional in the event of a disaster. I would prefer to not name manufacturers at this time, but am willing to discuss those details with individuals.

Ad Hoc, by Steve Rhodes, on Flickr

In the second room, core work is performed using OpenSuSE Linux workstations for browser-based data entry and media copying. The Linux machines can be created in bulk or ad hoc using a net boot image. This room also contains a system used to perform software installations, so the NSRL can collect installed files, registry information and other artifacts of a running application. This room contains a computer attached to the internet on which NSRL downloads digital-only distributions of software. A photography stand and flatbed scanning stations are in this room, used to create digital photos of packaging, so these photos can be used for data entry and research instead of shelved material.

Movement of original packages and media is limited to the previous two rooms.

The third room is a computer server room with racks of equipment. The media copies are stored on a commercial, expandable network storage system (currently 42TB) that is accessible from Windows, Apple and Linux computers. We have several quad-core rack-mounted servers that perform the automated, distributed metadata collection tasks. A PostgreSQL database and an Apache webserver reside on one of the rack servers, which is dedicated to these functions. The database is on local storage in that server.

The equipment described in the previous paragraph is duplicated, and the duplicate serves as the research environment. Media images, individual files, virtual machine slices and all databases are backed up across a dedicated fiber connection to storage several buildings distant. Verification of critical files is performed nightly. We also periodically ship copies of the critical files to the NIST campus in Boulder, CO.
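Editor's illustration: the interview does not say how the nightly verification is carried out, so the following is only a minimal sketch of one common approach, a checksum manifest that is rebuilt and compared on each run. The choice of SHA-256, the function names and the file names are all assumptions made for the example, not NSRL's actual tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by relative path."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


def demo() -> list[str]:
    """Build a manifest for a throwaway directory, damage one file, re-verify."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-ins for "critical files".
        (root / "image.iso").write_bytes(b"original media image")
        (root / "notes.txt").write_bytes(b"catalog notes")
        manifest = build_manifest(root)
        assert verify(root, manifest) == []  # nothing has changed yet
        (root / "image.iso").write_bytes(b"silently corrupted")
        return verify(root, manifest)
```

In a scheduled job, the manifest would be stored separately from the files it describes, and any path returned by `verify` would be restored from one of the backup copies.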

The software we use is mostly written in Perl, with some PHP for the browser-based data entry. Reuse is key, as is flexibility; the NSRL code is essentially a wrapper or application interface which calls third-party tools to manipulate media, files or systems.

E Is For Evidence, by Howdy, I'm H. Michael Karshis, on Flickr

We have a quality assurance process that involves loading NSRL quarterly candidate releases into several third-party digital forensics tools, in each publishing cycle.

We don’t anticipate substantial changes to our technology or software in the near future. If anything, we would revisit our internal database design, and address some issues that did not scale up as well as we expected.

Trevor: If other organizations have special collections, would NSRL be interested in adding those collections to the reference library? If yes, what process would you suggest to someone interested in discussing such an arrangement?

Doug: NSRL is very interested in pursuing loan arrangements with other institutions. Transfer of materials to NIST need not be a requirement. Please contact me, or any NSRL staff, via nsrl@nist.gov or 301-975-3262.

Trevor: Are there more research uses or ways that you think the NSRL could play a role in digital preservation work and research? Further, if any of the folks who follow this blog are interested in doing research involving the software corpus, what should they put together and how should they go about getting in touch with your team?

Doug: We are new participants in the community, so I believe we are still at the point of introducing ourselves. I am hopeful that uses may be identified as our capabilities and activities are made known. This blog is a step in that direction, and I thank you for this opportunity.  Anyone with questions regarding research access should contact me.

Trevor: As a final question, could you tell us a bit about how the NSRL came about? One of the tricky parts of digital stewardship is establishing the value of, and need for, building and maintaining collections, and I think the story of the need and uses that the NSRL serves offers a powerful frame for thinking about the kinds of coalitions and common needs that digital stewardship initiatives work to support.

Doug: Prior to NIST involvement in digital forensics, law enforcement identified the need for automated methods to review the large number of files in investigations involving computers. The FBI “Known File Filter” project supplied hash values of known files, and the NDIC “Hashkeeper” project supplied hash values of installed files and of “known malicious” data files. Several commercial and open source tools existed that each used different hash values (CRC32, MD4, MD5, SHA-1).

Hash values were exchanged informally throughout the entire community via email, FTP sites, etc. Investigators had to know where to find hash sets; investigators had to judge the quality of the hash sets. There was no central, trusted repository, and there were open avenues for conflicts of interest.

NIST was contacted because of its history of impartiality in research and standards development. Among the benefits of this involvement were:

  • NIST is an unbiased organization, not in law enforcement, not a vendor
  • NIST can control quality of data
  • NIST can provide traceability by retaining original software
  • NIST can provide data in formats usable by many existing tools
  • NIST has a distribution mechanism in the Standard Reference Data service

The result of this is a data set that is court-admissible, a process that is transparent, and a collection open to researchers.
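Editor's illustration: the hash-based review described in this answer can be sketched in a few lines. This is not NSRL code or the format of its published data. It simply assumes a set of SHA-1 values for known files and uses it to separate files that match from files that still need a person's attention. The file names and contents are hypothetical, and SHA-1 is used only because it is one of the algorithms named above.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of a byte string as lowercase hex."""
    return hashlib.sha1(data).hexdigest()


def split_known_unknown(
    files: dict[str, bytes], known_hashes: set[str]
) -> tuple[list[str], list[str]]:
    """Partition file names by whether their content hash is in the reference set."""
    known, unknown = [], []
    for name, content in sorted(files.items()):
        if sha1_hex(content) in known_hashes:
            known.append(name)
        else:
            unknown.append(name)
    return known, unknown


# Hypothetical reference set, standing in for hashes of files from original software.
known_hashes = {
    sha1_hex(b"stock operating system file"),
    sha1_hex(b"stock application library"),
}

# Hypothetical files found on a system under review (name -> content).
evidence = {
    "system32/kernel.dll": b"stock operating system file",
    "users/docs/letter.txt": b"a document written by the user",
    "apps/lib/common.dll": b"stock application library",
}

known, unknown = split_known_unknown(evidence, known_hashes)
print("set aside as known:", known)
print("needs review:", unknown)
```

The lookup is only as trustworthy as the reference set behind it, which is why the quality control and traceability points in the list above matter.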

10
May

The May 2012 Library of Congress Digital Preservation Newsletter is now available.

http://www.digitalpreservation.gov/news/newsletter/201205.pdf

In this issue:

  • Exploring Collections using Viewshare
  • The challenges of extracting information from floppy disks
  • U.S. government elections and web archiving at the Spring CNI Meeting
  • Preservation of and access to federally funded scientific data
  • Help launch a digital preservation Q & A site
  • Recent interviews with Bram van de Werf, Lori Phillips, Anne Van Camp, and Ellysa Stern Cahoy
  • New reports: States of Sustainability: A Review of State Projects Funded by NDIIPP; and Preserving State Government Digital Information, Minnesota Historical Society Final Report
  • Upcoming Conferences: JCDL2012 (June 10-14) and Screening the Future 2012 (May 21-23)
10
May

The following is a guest post by Jefferson Bailey, Fellow at the Library of Congress’s Office of Strategic Initiatives.

The NDSA Content Working Group, one of the five working groups of the National Digital Stewardship Alliance, focuses on identifying content already preserved, investigating guidelines for the selection of significant content, discovering at-risk digital content or collections, and matching orphan content with NDSA partners who will acquire the content, preserve it, and provide access to it. As part of this effort, the group conducted a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the web. Conducted from October 3 through October 31, 2011, the survey aimed to better understand the landscape of web archiving activities in the United States by identifying the organizations involved, the types of web content being preserved, the tools and policies being used, and the types of access being provided.

Preliminary results of the report presented here are being released in conjunction with the International Internet Preservation Consortium 2012 General Assembly taking place this week here at the Library of Congress. The full report will be made available soon.

The survey featured 28 questions and garnered 77 unique responses from a range of institutions, with survey participants primarily representing the cultural heritage (29%, 22 of 77), government (22%, 17 of 77), and university communities (46%, 36 of 77). Of the survey respondents, 31% (24 of 77) were members of the NDSA and 8% (6 of 77) were members of the IIPC.

Web Archiving Activity

The current web archiving activities of the survey respondents were as follows:

  • 63% (49 of 77) have an active web archiving program.
  • 16% (12 of 77) are actively testing a web archiving program.
  • 17% (13 of 77) are planning on pursuing a web archiving program in the near future.
  • 4% (3 of 77) formerly managed web archiving programs, but no longer do so.

Chart 1: Status of current web archiving activities.

Interestingly, of the 71 respondents that identified their web archiving goals, 80% (57 of 71) were archiving content “from other organizations or individuals for future research,” 69% (49 of 71) were preserving their own institutional web content, and 49% (35 of 71) were doing both.

In reviewing the full survey results, a number of themes emerged.

The recent emergence of web archiving, especially at academic institutions

One surprising result was the preponderance of universities that have initiated web archiving programs in the last 5 years. Of the 68 respondents that identified the specific year their web archiving began, nearly a third, 32% (22 of 68), began their programs within the last two years, the exact same number of institutions (22, 32%) that began archiving web content in the 17 years between 1989 and 2006. The recent surge in web archiving within the last 5 years – 68% (46 of 68) of those surveyed – is primarily due to universities starting web archiving programs.

Chart 2: Year began archiving web content.

Inconsistent custodianship

One discovery of the survey was the low percentage of respondents that have transferred their archived data from their external service to their institution. Only 18% (9 of 49) of survey members have transferred their data in-house, including only 2 of the 12 government respondents and only 4 of the 25 university respondents. A total of 82% of those using an external service have not transferred data to their institution. Free text comments for this question pointed to many concerns about transferring externally harvested data to in-house systems, including “duplicate costs,” a lack of infrastructure, confidentiality concerns, and cataloging and accessibility challenges.

Chart 3: Rates of transferring web content in-house for those collecting content through an external service.

Lack of policies and unclear guidance on permissions

Internal policy documentation appeared to be an area in need of continued improvement for many institutions. While some programs had incorporated web materials into existing policies and procedures, others had not, and some seemed unsure of their institution’s current policy status for web content.

The survey also brought to light an acute lack of clarity around seeking permission from content creators, both for harvesting and for providing access to collections. Chart 4 and Chart 5 show policies related to seeking permission from content creators to harvest content and provide access to archived web sites.

Chart 4: Policies towards seeking permission to crawl websites.

Collecting trends and collaborative potential

The types of content being acquired included websites, blogs, and social media:

  • 78% (60 of 77) included or plan on including websites in their archive
  • 57% (44 of 77) included or plan on including blogs in their archive
  • 38% (29 of 77) included or plan on including social media in their archive

Chart 5: Policies towards seeking permission to provide access to archived web content.

A free-text survey question asked for respondents to “briefly describe the scope of your web archive collections.” Broadly stated, these responses fell into one of three categories: institutional self-documentation, collection enhancement, and thematic. Chart 6 shows the survey responses when asked to choose from among a variety of specific subject topics.

The potential for collaboration was a notable aspect of the survey results. Though only 23% of organizations were currently collaborating on web archiving, 96% (64 of 67) answered either “yes” (34, 51%) or “maybe” (30, 45%) when asked if they were interested in participating in future collaborative collecting activities. As these numbers demonstrate, there is a significant interest in the collaborative opportunities around joint web archiving, but little current action in this area.

Chart 6: Subjects currently or planned to be represented in respondents’ web archives.

 

Chart 7: Current participation and interest in future participation in collaborative web archiving.

While the survey sometimes exposed the continued challenges of preserving content that is created on the web, as well as the ongoing permission and management challenges of providing access, it also pointed to the growing importance of web archiving as a core function of collection development for many institutions. This, coupled with the openness towards collaboration, suggests that many of the challenges evident in the report will be addressed in due time by the combined efforts of the entire community. Events like the IIPC General Assembly and alliances like the NDSA are a key part of the knowledge-sharing and collaboration essential to organizations as they work to archive and preserve web collections.

10
May

The following is a guest post by John O’Connor, Program Assistant at the Library of Congress’s Office of Strategic Initiatives.

In April, the Office of Strategic Initiatives hosted a discussion panel on the creation of a curriculum for the new National Digital Stewardship Residency. The NDSR is a new partner venture between the Library of Congress and the Institute of Museum and Library Services wherein recent Master’s-level graduates will acquire the knowledge and skills involved in the selection, management, and long-term preservation of digital assets.

As a part of the program, eight residents will be placed at the Library of Congress and other Washington, D.C. area cultural heritage institutions for extended residencies during which they will complete a project related to digital stewardship. This residency program will serve as an innovative model for new professionals to obtain a detailed, hands-on experience working alongside expert practitioners in the field; it will introduce a set of talented future leaders to the information and cultural heritage professions. For more information on the creation of the residency, please see the earlier post about the program’s launch.

Members of the NDSR Curriculum Panel and Project Staff

The NDSR curriculum panel included deans and professors from a number of iSchools and library and information science programs, as well as directors and representatives from multiple cultural heritage and digital stewardship organizations from across the nation. The expert panel designed numerous pieces of the residency program, including the framework for an intensive, two-week immersion workshop at the beginning of the residency that will teach the core competencies of digital stewardship. An additional idea that came out of the meeting was to hold a capstone week at the end of the term, at which residents will present their projects to fellow residents, Library and IMLS staff, host institutions and members of the digital stewardship community. The panel also proposed a series of events and continuing education opportunities to take place during the six months, including expert speakers, online and social media projects by residents, and tours of organizations in the Washington, D.C. area currently involved in digital stewardship activities.

Beyond developing a curriculum, panel members helped define other program elements including qualities of potential residents and host sites, the characteristics of successful digital projects, and additional tools and ideas to help foster a community of residents and mentors. Panel members will continue to be a valuable part of the NDSR’s development and will be involved in ongoing work to build an innovative initiative to support the education and growth of the next generation of digital stewardship professionals.

We would love to hear feedback or questions about the NDSR, so please post comments!

Typo corrected, 5/2/2012.

10
May

To try and better communicate and share information about the work happening at organizations in the National Digital Stewardship Alliance we are trying out a new series for the blog that draws attention to particularly interesting and valuable born-digital collections. This series will profile particular collections and incorporate conversations with curators, archivists, librarians, historians, scholars and others working on collecting, preserving and providing access to our born digital cultural record.

The first conversation in this series is with Ben Fino-Radin, digital conservator for the Rhizome ArtBase online archive of digital art. Founded in 1999, the Rhizome ArtBase is a collection of new media art containing some 2125 art works. The ArtBase encompasses a range of projects by artists all over the world that employ materials such as software, code, websites, moving images, games and browsers to aesthetic and critical ends. As digital conservator, Ben oversees the development of Rhizome’s online archive. He actively monitors and creates new records on art works and drafts policies and procedures for the preservation of digital works. He also collaborates with a team on the repair of artworks that fall victim to obsolescence, and supports the ingest of digital archival materials.

Trevor: What kinds of stories do you think the ArtBase collection tells us? Are there some trends and changes over time in the collection that you could tell us about? It would be ideal if you could point to particular works that you think exemplify these trends.

Ben: In a big way, the collection directly reflects the narrative of our institution’s history and evolution as a community. Rhizome was established in 1996 as a listserv and served as a hub for some of the first artists that worked online. The email list was used by the community not just as a public forum, but also as a place to debut new projects. In 2001 the ArtBase was established to serve as a more permanent and accessible index to the broad catalog of web-based work emerging from the community. So in many ways the collection is reflective of the various players in this early scene. As well, I think the collection presents a very human telling of the evolution of the web and the artist’s relationship to emerging technologies in their practice. This narrative has naturally unfolded as the collection has aged, and it is something that we are actively cultivating. I’m deeply interested in how an artwork that engages what is, at the time of creation, a new technology accrues different meaning and historic value as it ages.

Here are a few examples of artists engaging new technological affordances or responding to cultural memes created by new affordances, spanning from the early web to the present.

Lev Manovich, Little Movies (1994), an experiment with early QuickTime.

Olia Lialina, My Boyfriend Came Back From the War (1996), is an oft cited example of early hypertext oriented net.art. Lialina’s early work is heavily narrative driven as she came from a film background before becoming immersed in the web. As evident in this piece, Lialina was quite fond of framesets, back in the early net.art years.

Barbara Lattanzi, The Letter and the Fly (2002), is an example of works from the early 2000’s when a lot of artists who had been producing interactive CD-ROMs began to compile these works for the web, making use of Macromedia Director and the Shockwave plugin.

Image of The Letter and the Fly (2002), Barbara Lattanzi.

Perry Bard, Man With a Movie Camera – The Global Remake (2007), is a great early example of a web-based, crowd sourced digital archive of video – here employed to produce a shot for shot remake of Vertov’s landmark piece of cinematic history.

Petra Cortright, VVEBCAM (2007), serves as a monument to the culture of YouTube – produced in 2007 only two years after YouTube’s public beta (hard to imagine that it has only been that long).

PaintFX, (2010), is emblematic of this idea of works serving as a historic document of the introduction of innovation in the creative practice. This artist collaborative sought to highlight the aesthetics inherent in the tools they were using – producing prolific quantities of “default” digital paintings – insinuating a post individual style era of production.

Sebastian Schmieg, All jQuery Effects (2012), represents a sort of ultimate web minimalism. The piece cycles through all of the animation effects of the ubiquitous jQuery, using all of the default values from the jQuery demos.

Trevor: What are some of the most popular pieces in the collection? Could you tell us a little bit about them?

Ben: Alexei Shulgin and Natalie Bookchin’s Introduction to Net.art is pretty legendary. Shulgin (who coined the term ‘net.art’) and Bookchin produced this tongue-in-cheek diatribe in 1997 both as a manifesto for art’s place in cyberspace, and as a somewhat sardonic guide to success for the aspiring net.artist.

Rafael Rozendaal, Jellotime.com (2008), is a crowd pleaser for obvious reasons.

Recently we archived several works by Takeshi Murata. I’m guessing that most readers of The Signal aren’t familiar with Takeshi’s work – but he has been hugely influential in the field. He pioneered a style of glitch video referred to as “datamosh”, which entails removing certain bits of data from .AVI files, causing them to spill out incredible artifacts. His pieces Untitled (Pink Dot) (2007), and Untitled (Silver) are two examples of this. His work has evolved quite far from his early work, as exemplified by I, Popeye or Get Your Ass to Mars, yet his early datamosh work is to this day parroted by young aspiring glitch artists.


Image from Untitled (Silver) (2006), Takeshi Murata

Trevor: How about a few of the most underappreciated pieces in the collection? Do you have a few favorites that you think people should be paying more attention to? Again, it would be ideal if you could give us a bit of context for these pieces. What kinds of stories do these works help us tell?

Ben: I wouldn’t venture to label any works as underappreciated, but two big ones for me are Alexei Shulgin’s Desktop Is (1998), and Adam Cruces’ Desktop Views (2012). Both of these projects collected screenshots of the desktop environments of various artists. No surprise that these would appeal to an archivist. The desktop is such a rich area of metaphor and personalization within a strict set of parameters, and these images reveal so much about their owners. Jason Huff wrote a great piece about just that. Unfortunately Desktop Is has suffered from significant link rot, as back in the day it was never archived by Rhizome and many of the images were hotlinked from sites that have long since passed. We are in the process of hunting and gathering the missing pieces.


Image from Desktop Is (1998), Alexei Shulgin

A recent addition that has a lot of potential is Anthony Antonellis’ Endangered GIF Preserve (2012-ongoing). Antonellis is building an ad-hoc archive of animated GIFs on Wikipedia that have been marked for deletion. Archivalism in the studio practice of artists tends to focus on what others would consider to be the chaff of culture, and this is all the more valuable when it occurs online.

Trevor: Could you tell us a bit about how the collection is being used? To what extent is the audience for the collection artists in search of inspiration? To what extent is it for the general public? To what extent is it for scholars and researchers?

Ben: Currently the collection is used most heavily in academia, and by curators and researchers. Many professors of new media integrate the ArtBase into their lesson plans, designing research and curatorial assignments centered around the students using our members’ tools to curate exhibitions.

Trevor: I don’t think there are many people out there with the title of digital conservator. Could you tell us a bit about how you define this role? To what extent do you think this role is similar and different to analog art conservation? Similarly, to what extent is this work similar or different to roles like digital archivist or digital curator?

Ben: I drew the distinction with my title for two reasons: 1) I am at the service of an institution that lives within a museum, and 2) the digital objects I am cataloging and preserving access to are not “records” by the archival definition. They are artifacts – and as such require a different kind of care.

I am responsible for the stewardship of intellectual entities that are often inseparable from their digital carriers, due to the artist’s exploitation of the inherent characteristics of the material. It calls for a high degree of regard for the creator’s intent, and a thorough understanding of the subtleties of the materials. A digital archivist tasked with preserving the records of an office probably isn’t going to wonder if the use of Comic Sans in the accountant’s email signature has artifactual significance.

Of course the lines are much blurrier than that, and there are plenty of examples of people with the title “digital archivist” or “digital curator” doing significant work on preserving the subtle artifactual quality of digital materials (not to mention the incredible people who are contributing to significant projects in their spare time). This is a new phenomenon though, where you have individuals with the title “archivist” or “curator” devoting a level of care to documents that, with paper materials, would be the work of a document conservator.

While I would hesitate to compare the two, I think that the conservation of digital artifacts, and the conservation of objects, documents and the like, at their essence hold many similarities. They both require an empathy for the artist, expertise with the medium, and understanding of the proper environment. Sometimes I go to the Greek and Roman galleries at the Met, and daydream about what net art from the 90’s will look like hundreds of years from now.

10
May

Do you wonder if cloud storage is a good option for your personal digital photographs?  Do you have questions about metadata and file formats?  Are you uneasy about the prospects of keeping your digital photos available for yourself and your family into the future?  If so, you have lots of company.

Tourist Family, by fatedsnowfox, on Flickr

On April 26, over 570 people participated in a web-based presentation about preserving digital photographs.  I had the pleasure of giving the session on behalf of the American Library Association’s Association for Library Collections & Technical Services as part of Preservation Week. A recording of the session and a copy of the slides are available on the ALCTS website.

The presentation discussed how to identify the different places where personal photographs might lurk, as well as how to decide which images are most important and ways to organize your collection.  I talked about the importance of making copies and storing the copies in separate places.  I also touched on metadata, file formats and storage media options.

The talk had a dual purpose.  The major intent was to give basic tips about what people can do to preserve their personal digital photo collection.  Digital photography has grown phenomenally popular in just the last few years, and there is a shortage of practical information about how to manage personal images.  The National Digital Information Infrastructure and Preservation Program has provided personal digital archiving guidance for the last couple of years, and the presentation drew from information on the program website.

My second goal was to encourage librarians, archivists and other information professionals to get involved in providing similar advice within their communities.  I made the point that the need for personal digital archiving advice is going to keep growing and that librarians are exactly the right people to help meet this need.

 

10
May

Technology is the easy part of digital preservation.