PACSCL Hidden Collections Processing Project » Lessons learned

Lessons learned


I Love It When a Plan Comes Together!

Thursday, November 6th, 2014

One of the lessons I’ve learned during the course of this project is that often, despite your best efforts, processing will inevitably lead to snags that slow down your pace and extend processing time. When you’re aiming for 4 hours per linear foot in order to stay under the minimal processing time requirements, this can definitely cause some problems. While my partner, Steve, and I have had collections that matched the MPLP requirements closely enough to stay within that deadline, there have been times when it was a struggle to make the timeline work. Some forced us into item level processing. Some surprised us with accessions that had been completely removed from their original home or reordered for no apparent reason. These slowed down our processing time considerably.

How not to store blueprints.

But then there were the Hahnemann University Academic Affairs records at the Drexel University College of Medicine Legacy Center Archives and Special Collections. This collection has by far best matched the MPLP requirements at this point in the project, despite being the largest collection with which we’ve worked. This collection consists of 250 linear feet of Academic Affairs records, coming from all the various iterations of Hahnemann University. These include the Homeopathic College of Pennsylvania, Hahnemann University, and even a few records from its current Drexel University College of Medicine title. This large collection also came to us quite disjointed, with multiple accessions often originating from various faculty members’ offices or departments within the college, which made for a lot of overlap. However, despite this small challenge, the records themselves were in great shape for MPLP. None had been previously processed (aside from one small “collection,” whose enterprising owner had taken out all the records from their folders and stacked them loosely into a Xerox box, destroying most of the original order). Additionally, because the records came from specific offices and departments, they were often far more consistently organized than personal papers, making it easier to find links between the contents and to figure out what certain folders contained, without excessive detective work.

Because we did not have to focus on item-level processing or learn how to rework previously written folder titles, we were free to focus on carefully constructing DACS-compliant folder titles. Physical processing was also that much easier, as many of the separate “collections” were left intact and made into series or subseries. For example, Series II of this collection consists of administration and faculty records. We created subseries based on the faculty member or department from which the records came, which meant very little reorganization, since these records were already split this way.

A student from 1883 -- what fine hair!

As a result of having to spend less time worrying about archives “detective” work, we were able to come up with some methods to streamline the process even further. My favorite of these methods arose when it came time to create the container list. Generally, we had done data entry first and then written the scope note after all the physical arrangement had been completed. This time, we wrote the scope notes as we created our container list. It seems like common sense now, because this allowed us to have a fresher memory of what each series and subseries included, and we were able to make preservation and digitization notes as we went along. It helped us track some of the connections between series, as well as to look through the material to double check records that were especially unique within their series. I thought that it would extend the data entry process, looking over all those records again, but this time around Steve and I worked separately on different series, cutting data entry time in half and allowing us to become ‘experts’ on certain sections of the collection. This reinforced the knowledge we had already gathered while working on the collection, and contributed to the ease of creating the scope note as well.

Aside from the well-suited nature of the collection to MPLP, Steve and I also divided our roles more efficiently this time around. We split up data entry, re-boxing, and physical arrangement duties.  Having more time in one institution was also helpful, although a variety of ‘snow days’ meant that, despite finishing about 8 weeks ahead of schedule, there were still a couple of wrenches thrown in that could have considerably stalled us were we working with a less-ideal collection.

250 feet of beauty!

The takeaway here is that minimal processing works much better for some collections than for others. Repositories looking to get through some of their backlog should carefully consider the fact that not all collections are going to yield a 2-4 hour per linear foot result, regardless of applying MPLP methods. Often, previously processed collections in particular make that result extremely difficult. If a processing archivist is given a previously-processed item level collection with vague folder titles and no obvious original order, MPLP is probably not going to function like one might hope. However, when the right collection is chosen, the result can be a collection ready for researchers in a fraction of the time.



A challenge from the Superintendents

Wednesday, November 5th, 2014

When I first approached the Archdiocesan Superintendent of Schools records at the Philadelphia Archdiocesan Historical Research Center (PAHRC), I was concerned to say the least.  In fact, I was panicked.  The collection, which documents the administrations of three superintendents spanning a period of thirty years, is all of 9.2 linear feet, which is small compared to most collections.  One hopes that a collection of this size can be dealt with quickly.  However, given that half of the collection consisted of loose, unsorted papers stuffed in document boxes and the other half had been processed by no fewer than 8 different LIS students at Villanova University in the 1960s, I assumed we would never meet our processing deadline.  Little did I know, in the terror of first viewing the Superintendent records, that it would be the first collection my partner and I finished processing in well under our limit of 4 hours per linear foot.  When all was said and done, MPLP processing allowed us to transform twenty-three boxes of disarrayed records into an accessible and usable collection at a swift processing speed of 2.6 hours per linear foot.

My biggest concern while processing this collection was how my partner and I could responsibly process 9 boxes of loose, unsorted records and somehow meaningfully interfile those records into the arrangement we imposed upon the collection.  We did not have enough time to view every loose record individually.  Item-level review is not a luxury afforded to MPLP processors!  Therefore, we could not be entirely sure that we were spot-on with regard to chronology.  We also realized early on that some of those unsorted records were bound to be interfiled within the wrong series, since we were not processing at item level.  Bearing in mind Greene and Meissner’s principle of processing “good enough,” we allowed ourselves to become comfortable with the idea that a few records may be misplaced, which seemed well worth the sacrifice if it meant that the majority of those loose records would finally be given an intelligible arrangement and made accessible to researchers.  When faced with a predicament such as this, it is important to remember that whatever has been done according to the MPLP methodology can be undone.  Those potentially misplaced records would not be buried and lost forever and could very easily be repositioned according to a more refined arrangement at a future point in time!

MPLP is not a final solution.  MPLP is a step in the right direction, though not without its imperfections and limitations.  With MPLP, one should always assume slight imperfections, and collections that have been processed minimally should indeed be revisited and refined when more resources become available.  While easily corrected imperfections are a real possibility of MPLP processing, access is an absolute certainty.

SAA Student Poster Re-Cap: “Reprocessing: The Trials and Tribulations of Previously Processed Collections”

Monday, August 25th, 2014

from the poster presented at the Society of American Archivists Annual Meeting, August 2014, Washington, D.C.

by Annalise Berdini, Steven Duckworth, Jessica Hoffman, Alina Josan, Amanda Mita, & Evan Peugh; Philadelphia Area Consortium of Special Collections Libraries (PACSCL)


PACSCL’s current project, “Uncovering Philadelphia’s Past: A Regional Solution to Revealing Hidden Collections,” will process 46 high research value collections, totaling 1,539 linear feet, from 16 Philadelphia-area institutions that document life in the region. Since the start of processing in October 2013, the team has completed 31 collections at 13 repositories, totaling over 1,225 linear feet. Plans have evolved over the course of the project due to previous processing in many collections. As the processing teams tackled the collections, the solutions devised for the various challenges they encountered developed into a helpful body of information regarding minimal processing. Future archivists and collaborators can use this knowledge to choose appropriate collections for minimal processing projects, and be prepared to handle unexpected challenges as they arise.


  • Novice Archivists: Volunteers and novice archivists, while well meaning, can make simple mistakes that lead to larger problems.
    • Learn about the previous processors; their background and level of knowledge with the materials. Having a better idea of their relationship to the collection helps guide decisions in the new iteration of processing.
    • “Miscellaneous.” It is a very popular word, even with seasoned archivists. Attempts should be made to more accurately describe the contents of a folder, such as “Assorted records” or “Correspondence, assorted,” followed by examples of record types or 1 to 3 names of individuals represented.
  • Losing Original Order: Processors with good intentions can disrupt original order through poor arrangement, item-level processing, and removing items for exhibits or other purposes.
    • Use what original order remains to influence arrangement in a way that might bring separated records back together.
    • Lone items may require more detailed description to provide links back to other documents.
    • Be aware of handwriting: Previous folder titling can serve as a clue for separated items and original order.
  • Item-Level Description: Item-level description can render the collection’s original order impossible to discern and greatly diminish access.
    • Gain a broad perspective of the collection in order to determine the most intelligible arrangement of materials with an awareness of grouping like with like.
    • For item-level reference materials, such as newspaper and magazine clippings, merge materials into larger subject files and include a rough date span.
    • Be cautious when merging other records, such as correspondence. Arrange materials into a loose chronological order and include in the folder title the names of recurring correspondents, if possible.
    • Make sure to account for the new arrangement in one’s arrangement note. Reuniting item-level materials and describing those materials to the new level of arrangement will greatly enhance access to the collection.
  • Legacy Finding Aids: It can be difficult to tell how accurate an existing finding aid is, and the decisions made on how much of it to preserve can be complicated.
    • Again, knowledge of the previous processors’ education and history with the collection will prove helpful.
    • Consider the fate of the legacy finding aid. If the collection will be entirely reprocessed, is anything in the legacy finding aid worth keeping? Should the old and new simply be linked or should parts of the old finding aid be incorporated into the new one?
    • Proofread! Anything retained from a legacy finding aid should be proofread very carefully.
    • Keep ideas of continuity in mind while creating new folder titles and dates.
    • Format can be a problem. Will the format (e.g., hardcopy only) prove problematic for import? Scanning and OCR can be a time-consuming process.
  • Collection Size and Type: Size and type of collection can have a drastic impact on processing speeds.
    • If possible, choose larger collections to economize on time and money. Multiple smaller collections require more effort than one larger one.
    • Institutional records average a faster processing speed than family or personal papers. Keep this in mind when choosing which collections to process.


  • Work closely with current staff; understand the history of the collection and the desired shape of its future.
  • Learn about previous processors to understand their training, background, and history with the records.
  • Edit and expand upon non-descriptive terms (e.g., miscellaneous) when possible. More detailed descriptions can assist in linking separated records back together.
  • Merge clippings and reference files together when feasible.
  • Make note of reprocessing decisions in the finding aid.
  • Proofread any reused documents or folder titles, keeping ideas of consistency in mind.
  • Be mindful of donor relationships in discussing past problems, especially in any public forum, such as a project blog.
  • Plan carefully from the outset. If possible, choose collections that best fit the project goals.
  • Remain flexible and be prepared to compromise.


"Reprocessing" poster for Society of American Archivists 2014 Annual Meeting

Average processing speed by collection size

Average processing speed by collection type

Where are they now? Part II

Friday, July 27th, 2012

The last time this blog heard from me, I had finished processing the papers of Dr. Stella Kramrisch at the Philadelphia Museum of Art. In that blog post, you can tell that I’m a little surprised that the processing went so well. I thought it would be complex to reconcile two different phases of previous processing that had separated a collection into two physical groups.  I can laugh at that now, because it turns out 1-year-ago-Sarah had no idea how complex processing could really be. (Oh, little baby archivist, just you wait.)

Since I left the Hidden Collections project, I’ve worked on two projects at The Historical Society of Pennsylvania (which participated in the CLIR grant but alas, I was not part of the team that worked there). The first was as project archivist for the Digital Center for Americana Project, Phase II. Both phases of this project had, at their heart, the drive to create access to the collections at HSP through digitization. Phase I focused on collections relating to the Civil War and Phase II on collections that documented immigrant families, individuals, and communities in the Philadelphia area. I feel especially lucky that I got to work on this project given the subject matter. Many researchers know about HSP’s treasures – and there are some amazing things in those holdings, believe me – but fewer researchers know about these collections that document the immigrant experience or represent minority groups. The history of the Philadelphia area is mostly a narrative of Western European families who, yes, were all immigrants themselves, but very well-documented immigrants. So I’m happy to be adding to the richness of that narrative by making collections of less well-documented minority and immigrant groups accessible to the public.

The project involved some MPLP and some full processing. Collections had to be arranged, described, housed, inventoried, conserved, and digitized. Some collections received full digitization, like the beautiful 18th and 19th century bound volumes in the Abraham H. Cassel collection and the tapes and transcripts in the Balch Institute’s South Asian Immigrants in the Philadelphia Area Oral History Project.  Others received “signpost” images, meaning that I selected items for digitization that represented the contents of the collection. This was actually a bit of a challenge, because I had to resist the urge to digitize the most unusual, amazing, or funniest items in a collection and just digitize things that wouldn’t mislead a researcher as to the collection’s contents.  So, for the Athena Tacha papers, rather than digitize a letter from one of Tacha’s famous artist friends, I chose one of her many letters to her family in Greece.

One of the biggest challenges with this project was the language barrier. I can read some German (but don’t ask me to speak it), as well as Japanese, Latin, and a tiny bit of Spanish, but this project also included Greek, Swedish, and French, languages that I had zero experience with. Luckily, I was able to fall back on the skills of two interns who were natives of Sweden and Greece. Without their help, the finding aids for these collections would have been a lot less informative and the processing experience a lot less fun. The interns had different levels of archives experience, so I relied on them mostly as translators rather than processors. But even our clever Swedish intern, who spoke German fluently, was stumped by some of the spidery, 18th century German handwriting and syntax we encountered.

Working on the DCAII has given me a deep respect and thankfulness for the work that Holly and Courtney did on the PACSCL CLIR project. Transitioning from a student processing intern to a project archivist had a very, very, very steep learning curve. But luckily I had some understanding coworkers who created a support system of archivists, conservators, and digital technicians, all willing to put up with my mistakes and answer my questions (although in hindsight, one of my biggest mistakes was not asking more questions). Coordinating moving collections between three departments was difficult, as was getting used to budgeting my time on a project for which I had to keep track of and participate in processing, conservation, and digitization tasks. I also managed interns, ordered supplies, blogged, helped organize an exhibit, helped arrange a talk, and generally tried to look like I knew what I was doing. (As the internet says: fail.)

Of course, I would not be where I am now — happily processing the papers of the Woodlands Cemetery Company at HSP — if I hadn’t been selected as a student processor for the Hidden Collections project. This project and others like it are truly wonderful ways for archives and LIS students to get their feet wet in the processing pool. Especially if they’re managed as well as we were, with readily available guidance and frequent on-site supervision, processing interns gain not only skills they’ll need for those first few jobs, but the confidence to use them.

For further reading, here are some links with information about the projects I’ve done since Hidden Collections:

HSP’s Digital Library: http://digitallibrary.hsp.org/

HSP’s finding aids: http://hsp.org/collections/catalogs-research-tools/finding-aids

HSP’s archives blog, “Fondly, Pennsylvania:” http://hsp.org/blogs/fondly-pennsylvania

Please feel free to contact me if you have any questions about the DCAII and its collections, Woodlands Cemetery, or my experience with the PACSCL-CLIR Hidden Collections project. snewhouse@hsp.org

Hidden Collections Initiative for Pennsylvania Small Archival Repositories

Monday, May 7th, 2012

If you’ve been following this blog of the PACSCL-CLIR Hidden Collections Processing Project, you might be interested in learning about the Hidden Collections Initiative for Pennsylvania Small Archival Repositories (HCI-PSAR, or the “Small Repository Project” for short). This post could be filed under “PACSCL-CLIR Student Processors–Where Are They Now?” since fellow former student processor Michael Gubicza and I are both currently employed on the Small Repository Project. But before you conjure up too many thoughts of drug-addicted 80s TV stars and one-hit-wonder 90s teen queens, think of this post also under the headings “Lessons Learned” and “Project Legacy.” The Small Repository Project carries on PACSCL’s commitment to uncovering hidden archival collections, and builds on the PACSCL-CLIR methodology, tools, and infrastructure–with a few new twists, of course.

Another creative storage solution at Millbrook Society! Hatboro Borough records, stored in a biscuit box.

First, some background on the Small Repository Project. It’s an initiative of the Historical Society of Pennsylvania–not coincidentally, one of the repositories where I processed for PACSCL-CLIR–with funding from the Andrew W. Mellon Foundation. The Small Repository Project aims to make better known and more accessible the important archival collections held at the many small, primarily volunteer-run historical societies, historic sites, and museums in the Philadelphia region. It was envisioned as a three-part project, and right now we’re in the midst of Phase I, which focuses on Philadelphia and Montgomery Counties. My title is Project Surveyor, so my job is to visit all of the small repositories in those two counties and survey their archival collections. There are two major components to the survey work: description and assessment.

Historical Society of Tacony: Frank Shuman, a Tacony resident, developed the world's first solar power plant in 1912-1913!

Description

In just six months of surveying, we’ve already discovered many amazing collections, from big names–like Pennsylvania Governor Samuel Pennypacker and Civil War naval engineer John Ericsson–to names that didn’t make the history books–like Frank Shuman, who built the world’s first solar power plant in 1912, or Dr. Hiram Corson, an abolitionist and prominent advocate for women physicians. To make these important resources more visible, we are creating what amount to “stub” finding aids: we don’t have the time to physically process any collections, but we can provide collection-level descriptions with very summary information. To be as fast yet thorough as possible, Michael and I use the Archivists’ Toolkit, Holly and Courtney’s data-entry best practices, and an Excel-to-XML worksheet of my own devising that was heavily inspired by Matt Herbison’s.

PACSCL and the University of Pennsylvania recently agreed to host our finding aids, so they will be on the PACSCL Finding Aid Site together with the PACSCL-CLIR “Hidden Collections” finding aids. I am personally thrilled about this detail, because it means Philadelphia will be one step closer to having one central database where all area archival collections could be searched. In one place, you will be able to search collections from the biggest professionally-run PACSCL member to the smallest all-volunteer historical society! None of the Small Repository Project finding aids are up quite yet, but keep an eye on the site…

Old York Road Historical Society

Assessment

As I mentioned, the Hidden Collections Project doesn’t have the time to physically process all the collections that we survey, but we do hope that at least some of them will be processed in the not-too-distant future! Toward that end, we not only describe but also assess each of the collections we survey. We look at the condition of the material, quality of housing, degree of intellectual access (existence of finding aids), physical accessibility (organization), and research value (a combination of an interest ranking, and a rating for how well those interesting topics are documented). These ratings help establish collection care and processing priorities–a collection with a high research value rating but low accessibility ratings should be processed first.
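
For readers who like to see a rule written down, here is a minimal Python sketch of how ratings like these could be turned into a processing queue. Everything in it is an assumption made for illustration: the field names, the numeric scale, and the choice to sort by research value first and accessibility second are hypothetical, and it is not the actual survey database or scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """One surveyed collection. All fields and scales are hypothetical."""
    name: str
    research_value: int       # higher = more valuable to researchers
    intellectual_access: int  # higher = better finding aids
    physical_access: int      # higher = better organized


def processing_priority(a: Assessment) -> tuple:
    """Sort key: highest research value first, then lowest accessibility."""
    accessibility = a.intellectual_access + a.physical_access
    return (-a.research_value, accessibility)


def rank_for_processing(assessments):
    """Return the collections in the order they might be processed."""
    return sorted(assessments, key=processing_priority)
```

A repository could weigh the ratings differently, for example by also factoring in condition or housing quality; the point is only that consistent ratings make collections comparable.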

PACSCL did the same sort of assessments for its member institutions a few years back (PACSCL Consortial Survey Initiative), based on a survey project at the Historical Society of Pennsylvania before that. The collections processed for the PACSCL-CLIR “Hidden Collections” Processing Project were those identified by the PACSCL survey as having the highest potential research value.

The assessment methodology that we use in the Small Repository Project, down to the assessment criteria and ratings descriptions, is modeled after the PACSCL survey. Check out Matthew Lyons’ blog post about our methodology. We strive for consistency so that our ratings will be comparable to PACSCL’s. Only the future can say whether anyone will undertake a large-scale, multi-repository processing project like PACSCL-CLIR “Hidden Collections.” But our assessments can help individual small repositories best allocate their own limited resources.

Social Media

While I worked on the PACSCL-CLIR project, I loved sharing my favorite “finds” from the collections I processed on the project Flickr page and blog. We do the same thing at the Small Repository Project! Check out our blog and our photo albums. For updates, follow us on Facebook or Twitter.

Finally, I’d like to take this opportunity to thank Holly, Courtney, and everyone who has worked on the PACSCL-CLIR Hidden Collections Project. The tools, techniques, and wisdom they developed and shared on their project website have proved invaluable to us in implementing the Small Repository Project. I’m sure that many other important and innovative archival projects will build on the PACSCL-CLIR project, and we all, collectively, thank you for enriching our communal knowledge.

Legacy finding aids: a trial (by any definition)!

Monday, February 13th, 2012

77 “substandard” or legacy guides are now in the Archivists’ Toolkit and final editing is underway.  And I am happy about that … however, almost none of these look as good as they could or should.  Garrett Boos, Archivists’ Toolkit cataloger, and I spoke many times about the limitations of this part of the project.

We decided that there were several problems:  working remotely from the collections; the format, structure and quality of the finding aids that were given to us; and, to be perfectly honest, our own expectations for the final product.

Before Garrett started, I decided that working remotely was going to be the most logical way to approach this part of the project.  Garrett worked in our office at Penn and entered the collections into our own instance of the Archivists’ Toolkit.  We then exported the finding aids from his AT and imported them into each repository’s instance of the Archivists’ Toolkit.  I decided to have Garrett work at Penn primarily because of logistics—otherwise, he would have had to work at 18 different repositories and, as we have learned, technology and space are two of the greatest challenges of the project.  Not to mention the instances when security clearances would need to be run, etc.  However, now that Garrett is done with the project, I have been trying to decide if it would have been better for him to work on-site and I am torn.  On the one hand, it would have made a lot of factors easier—especially checking on locations, vague titles and missing dates, to name only a few.  On the other hand, it would almost certainly have stopped being a “legacy finding aid conversion” project and turned into a “reprocessing” project. So I guess I need to stand by my decision to work off-site, even if it was limiting.

The reason I say that it would have turned into a “reprocessing” project is because Garrett and I think that at least 60% of the collections should have had some physical and intellectual work before the finding aid was considered final.  As with all aspects of this project, the legacy finding aid component was an experiment and therefore, the grant allowed repositories to send us any “substandard finding aids.” This resulted in several types of “tools.”  Garrett took them all on:  lists, card catalogs, databases and more traditional finding aids.  The biggest problem we found was that very few of these guides were organized hierarchically which meant that we had to do a lot of guessing—was something a folder, or was it an item?  Should the paragraph connected to a folder title be added as a scope note or was it actually part of the folder title?  What to do with the information about the contents of a letter, or the condition of the material?  What happens when there is no biographical/historical note and no scope and content note?  Thank goodness for email and helpful repository staff! 

I should say that there were a number of finding aids that came to us in absolutely perfect shape … putting that finding aid into the Archivists’ Toolkit was a piece of cake and the resulting finding aid was beautiful. Others that were written before finding aids were standardized did not work nearly so well. Because we forced non-hierarchical guides into AT, a system designed to organize information hierarchically, some of the finding aids are actually less user-friendly than the originals. Many of these legacy guides had item level description, something our stylesheet doesn’t handle well, resulting in what Garrett and I have termed, “really ugly finding aids.” Moreover, of 77 finding aids, only 15 did not require some enhancement of biographical/historical or scope and contents notes–which is pretty tricky when working off-site. Titles and dates almost always needed to be reformatted for DACS compliance. Our primary goal was to maintain every bit of information that was in the original, but it worries me that we have created online guides that are potentially overwhelming and off-putting to researchers.

Some repositories have told me that I should not worry—that getting the guide online is enough.  Others, though, I know are really disappointed with the result. We surveyed our participating repositories about the effectiveness of the project and their satisfaction, and while we have not heard from all, the component of the project that proved least satisfying is the legacy finding aid component. I know that it is, by far, the part of the project with which I am least pleased.

Does this mean that you should not do a legacy finding aid conversion project?  No!  Do a legacy finding aid conversion, but do it with some structure and guidelines!  In order to have a successful legacy finding aid conversion project, we learned that repository staff will have to do some (or a lot of) front line work prior to unleashing the guide on the cataloger.

Before handing over a finding aid, repository staff should identify (in pencil is okay):

• Folder title (underlined in one color)
• Folder date (underlined in another color)
• Box number
• Folder number
• If there is additional material, into what field in the Archivists’ Toolkit/EAD should it be entered?
• Biographical/historical note (does not need to be narrative, but the information should be provided by an “expert”)
• Scope and content note (same as the bio note)

If, as you go through this process, it becomes obvious that reprocessing is necessary, take the collection off your conversion list and place it on a priority list for processing.  Processing the collection may be quick and speedy and your result will almost certainly be better! In fact, I think, in some cases, we spent more time forcing data into AT than it would have taken to reprocess the collection.

Identifying these essentials should result in finding aids that are more standardized and allow researchers greater access to your awesome stuff. Don’t count on it being a quick process, however: the prep work is time consuming, the conversion is time consuming, and the proofing and editing is REALLY time consuming. This is not a task that can be placed only on the person converting the finding aid … even after the finding aid was in AT, Courtney and I, with fresh pairs of eyes, found lots of mistakes in spelling, hierarchy and grammar which would have been embarrassing and, even worse, would have potentially prevented people from finding that for which they were looking. Which is, of course, the whole point of all our work!

Description in MPLP is counter-intuitive

Tuesday, February 7th, 2012

Courtney and I both felt strongly, from the very beginning of the project, that sacrificing description for speed was a risk in this project.  Although we know that every collection could still use additional work, we worked hard to make it so that the repository did not feel that additional work was necessary before they made the collection public.  Moreover, we knew from the start that many of the collections would NEVER be worked on again.  Unfortunately, that is just how it is.

So what have we learned about description?  We learned that description takes a lot of time—in fact, that is probably the first thing we learned in this project when we tested the manual and discovered that even an experienced processor could not arrange and describe a fairly straightforward collection from start to finish in 2 hours per linear foot.  As a result, Courtney and I created processing plans that included a preliminary biographical/historical note before processing started.  In general, we have learned that it takes roughly the same amount of time to describe a collection as it does to arrange a collection.

I’m not going to lie … I am pro description … few things give me more professional pleasure than a beautifully crafted folder title or a paragraph in a scope and content note that I know will help a user determine if this collection is going to help them with their research.  That is the whole point—letting researchers know that we have the stuff that they need.  As a result, the PACSCL/CLIR team took it seriously.  Description is the one part of training that has probably evolved most over the course of the project.  We developed exercises to help our processors write better and more descriptive folder titles and structure notes so that they are both concise and informative.  The project didn’t have a lot of time, so we tried to make our processors think like a user and learn to quickly assess the contents of a folder.  For the most part, we are really pleased with our finding aids and I think, nine times out of ten, researchers will be able to determine from the finding aid whether the collection is worth their time.

One of the really interesting things we learned is, to me, still the most counter-intuitive.  A collection with extremely tidy existing arrangement usually results in a collection with less thorough description.  I am going to use two specific collections to illustrate this issue.

The first collection is the Dillwyn and Emlen family correspondence, 1770-1818, housed at the Library Company of Philadelphia (unquestionably one of my favorite collections in this project—as well as being one of my biggest disappointments, archivally speaking).  When I sat down to process this collection, I was really confident—the collection was 2 linear feet and was already arranged.  At one point in time, it had been bound in volumes and at another point in time, the letters were removed from the volumes and placed in very acidic folders.  Every letter had a catalog number written on the document.  While a few of the letters were out of chronological order, the vast majority of the collection was arranged very effectively; each folder containing letters from a span of dates.

This collection desperately needed to be re-foldered.  Not only were the folders highly acidic, but they were too small and some of the letters were showing a bit of damage.  I re-foldered the 130 folders in the collection which took about 2.5 hours.  Then I entered the folder list into the Archivists’ Toolkit which probably took only about 15 to 20 minutes.   So in roughly 3 hours (three quarters of my allotted time), I had the collection rehoused and the folder list in the Archivists’ Toolkit, which left me 1 hour to write a scope and content note.  Should have been easy, right? Well, no. Because this collection was perfectly arranged, I did not need to look at even one document in order to create the container list.  Moreover, the container list is not very helpful to a researcher.  All it contains is a list of dates which means that the scope and content note should be full of the subjects addressed in the correspondence.  Problem is, I did not know anything about the letters.  There was no way that I could read enough of the letters in an hour to discover all the topics addressed in the letters that will almost certainly be interesting to researchers.  I did my best—I valiantly scanned through as many letters as I could and wrote down key topics that popped up more than once or twice, and as each minute passed, my heart sank just a little more—I knew perfectly well that I could never do this extraordinary collection justice, even with twice the time.  Prior to beginning processing, I had performed my research for the biographical note and I had discovered that several authors had used portions of the collection in their published works … so I turned to them for expertise on this collection.  They wrote about only a tiny portion of the collection, Susanna Dillwyn Emlen’s bout with breast cancer.  
I soaked up every bit of information in their books and included it in my scope note in order to give users the most information possible, but I feel like the project failed this collection.  Perhaps I feel this so strongly because I had been so confident in significantly improving access to it.

I have beheld the second collection, the Belfield collection, 1697-1977, housed at the Historical Society of Pennsylvania, with equal amounts of awe, excitement and horror since I first laid eyes on it.  Never have I seen such a mess of a collection—please see just a few photographs as words cannot effectively describe the condition of this collection.  Courtney and I spoke with Matthew Lyons of HSP and he said that he was not expecting much more than good box level descriptions of the contents.  Even with these reduced expectations, we thought it wise to double our forces and therefore, Michael, Celia, Courtney and I all worked together on this collection.  I am happy to say that this collection will, for quite a few series, contain folder level description, but even more than that, the scope and content note for this collection is rich, deep and full of the flavor of the four generations of family who lived at Belfield.

So why does a collection that was the biggest (filthiest) mess of all time result in a better finding aid than a small and beautifully arranged collection?   I know it is because we were forced to sift through the messy collection in order to create any order, and it is amazing how much one absorbs simply by looking at the collection.  In the end, I feel that this is one of the biggest rapid maximal processing successes of the entire project.  We took the collection from utterly unusable chaos to an order that could certainly be refined, but is beyond serviceable.

When selecting collections for a minimal/rapid maximal processing project, consider your time frames and what result you want from the project.  If you want a container list in a hurry, select a well-organized collection.  If you want fuller description, a collection that needs some arrangement will probably be the best choice.  From a purely selfish perspective, I would pick a wreck of a collection over a tidy one every time—the sense of accomplishment and success is so much sweeter than the despair I still feel when I think of the Dillwyn and Emlen letters.

I mentioned in an earlier blog post that there are about 3 collections that I don’t feel benefited enormously from this project.  In every case, the collection had an existing arrangement that I felt either prevented me from starting from scratch or was in good enough order that I did not learn valuable content that I could then share with researchers.

The decision to minimally process should be a collection-by-collection decision …

Friday, January 27th, 2012

Fairly early in this project, Courtney and I determined that “MPLP 2 Hours” was not going to be a wholesale success—most collections simply cannot be processed in that time frame, regardless of the shortcuts taken (our average across the board is 3.2 hours per linear foot).  And in some cases, those shortcuts resulted in a product that we did not feel was more useful to a researcher post-processing.  What we have determined is essentially this … it is difficult, if not impossible, to say that collections can be processed in a set or determined amount of time, but it is possible to make educated estimates allowing us to allocate human resources to process collections efficiently.

There are several factors that allow us to better determine a time frame for the processing of collections:  age, type of collection, and original arrangement of the collection are the three biggies. None of these factors work independently—they are all intertwined to help determine the time frame.  So, based upon the data collected for 125 collections, processors have physically processed collections with the oldest material dating from the:

17th century at an average of 4.1 hours per linear foot;

18th century at an average of 3.3 hours per linear foot;

19th century at an average of 3.4 hours per linear foot;

20th century at an average of 2.9 hours per linear foot.

Processors have processed:

artificial collections at an average of 3.6 hours per linear foot;

institutional/corporate records at an average of 2.5 hours per linear foot;

personal papers at an average of 3.7 hours per linear foot;

family papers at an average of 4.2 hours per linear foot.

Age seems like it should be the most logical factor, but in fact, it has proven to be the least certain factor in our ability to judge the time frame for processing.  We thought originally that old collections (pre 1850s for certain) would take us significantly longer to process, but this is not necessarily the case.  The age does not seem to deter us from efficiently processing an “old” collection.  Age does, however, quite frequently deter us from describing the collections well.  Quickly skimming for content in folders of 17th, 18th and 19th century handwritten material is not easy—and it absolutely results in less thorough description.  However, if the collection is arranged and available for research use, perhaps this is where we ask for help … as researchers use the collections, we can ask them to provide more robust description of what the correspondence, journals, etc. contain.  Finding aids CAN be iterative … especially with technology such as the Archivists’ Toolkit.  “Newer” collections may or may not be easier to process … certainly there is more typewritten material that makes it immediately easier to categorize series/subseries/folders and describe the contents of the folders more thoroughly.  However, in the end, the ease of the processing relies more heavily on the type of collection than on the age.

For this project, we have divided collections into four basic types:  institutional/corporate records, personal papers, family papers and artificial collections.  Again, there is no one size fits all … each collection is unique (is that not why archival collections are so awesome?).  Generally speaking though, an institution or company’s records can be processed most quickly, followed by personal papers and then family papers.  Artificial collections are usually the fastest or the slowest depending entirely upon the collector.  Usually, they are speedy—the collector is in love with the topic they are collecting and as a result, they arrange the collection for their own personal satisfaction and use—all the letters of a children’s book author are arranged chronologically by date sent or alphabetically by the recipients’ names.  If this is the case, the artificial collection is a dream to process and it usually requires only description.  In a few instances, however, we have found collections where the collector simply collects … they probably know that the stuff is important, but they are not organizers.  At that point, trying to create a system out of a group of randomly acquired material can be quite difficult.

Institutional and business records are usually quick and easy and this is because the functions of a business or an institution generally follow the same basic structures and are fairly predictable.  Usually, you will find financial records, minutes, committee records, administrative records, subject files, correspondence, etc.  Because the function generates the records, it is logical and easy to determine a good organizational scheme for the papers.  But as always, the collections are unique and we have found that different creators generate different levels of tidiness, logical order, and structure.

Personal papers are the next quickest to process (generally speaking), especially if the creator was involved in several major movements, careers, and/or activities.  However, the ability to efficiently process a person’s personal collection often depends upon how intermingled those pursuits are with family, friends, and work.

Family papers have been, fairly consistently, the most time-consuming collections to process.  The problems that arise with family papers that generally do not exist with personal papers are the intertwining relationships that make determining to whom a certain group of materials belongs challenging, and sometimes, impossible.  When every generation in a family has a woman named Sarah, determining generations becomes a trial.   Many a day passed at the Historical Society of Pennsylvania with the following conversation: “So wait, this is Sarah Logan Wister Starr?”  “No, this is Sarah Logan Starr Blain!”  Or:  “Here is a letter to Grandma Sarah from Sarah … does that mean it is Sarah Logan Starr Blain?”  “No!  It could be Sarah Logan Starr Blain OR Sarah Logan Wister Starr OR Sarah Tyler Boas Wister!”  Egads … I wanted to buy a baby name book for this family!  Not surprisingly, this kind of questioning takes time … lots of time.

The third main factor in determining time for processing a collection is existing arrangement.  A collection of 20th century business records thrown into boxes will take longer than a collection of 18th century business records that are housed in volumes.  A collection of family papers organized by the donor into distinct family members’ papers can probably be processed more quickly than a collection of personal papers that are completely unsorted.  I have intentionally not used the term original order, which implies that the order was generated by the creator.  Existing arrangement may have been generated by the creator, but in many cases, it is generated by an archivist who starts processing the collection but does not complete the project.  Unfortunately, the hardest collections to process efficiently are often collections that someone else has started to process.  Trying to understand an undocumented order that has been imposed or continue with an arrangement scheme that does not seem logical is much more difficult than imposing order from absolute chaos.  And without question, the collections that take the absolute longest are ones in which parts of the collection have received item level treatment.  The next blog post will address how this type of existing arrangement affects description of collections.

So, basically what we have said here is that every collection is different and unique and there is absolutely no way to say that one time will work even within a date frame or a type of record. Our observations are backed by Greene and Meissner who say that “MPLP … advises vigorously against adopting cookie-cutter approaches … and [recommends] flexible approaches,” (page 176).  In order to make educated estimates for allocating resources, we believe that a base-line starting time frame is needed:  institutional/corporate collections should be given 3 hours per linear foot.  Based upon the existing arrangement, tack on another hour per linear foot if it is in a shambles.  If the bulk of the material is from the 18th century, tack on yet another hour per linear foot for increased perusal time which will result in more effective description.  So, in this case, your estimated processing time is 5 hours per linear foot.  Could you do it in three?  Yes, probably.  However, with allowances for age and existing arrangement, you will almost unquestionably have a better product, still at just over ½ the rate of traditional processing.

Based upon our experience, the PACSCL/CLIR project believes that the following base-line processing time estimates would work well:

Artificial collections:  3 hours per linear foot

Institutional/corporate collections:  3 hours per linear foot

Personal papers:  4 hours per linear foot

Family papers:  6 hours per linear foot

Our averages clearly show how quickly collections can be processed … but the base-line estimate with upgrades allows us to provide the best possible product while being mindful of available resources.
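
The base-line-plus-upgrades arithmetic above can be written out as a tiny calculator. This is only a sketch of how we read our own rule of thumb, not a tool the project actually used: the function and variable names are made up for illustration, and applying the two one-hour add-ons to every collection type (rather than just the institutional/corporate example worked through above) is an assumption.

```python
# Base-line estimates in hours per linear foot, taken from the list above.
BASELINE_HOURS_PER_FOOT = {
    "artificial": 3,
    "institutional/corporate": 3,
    "personal papers": 4,
    "family papers": 6,
}


def estimate_hours(collection_type, linear_feet,
                   in_shambles=False, bulk_18th_century=False):
    """Return a rough processing-time estimate in hours.

    Hypothetical helper: the add-ons mirror the worked example in the post
    (one extra hour per foot for a chaotic existing arrangement, one more
    when the bulk of the material is 18th century). Extending them to all
    collection types is an assumption.
    """
    rate = BASELINE_HOURS_PER_FOOT[collection_type]
    if in_shambles:
        rate += 1  # existing arrangement is in a shambles
    if bulk_18th_century:
        rate += 1  # extra perusal time for older handwritten material
    return rate * linear_feet


if __name__ == "__main__":
    # The worked example from the post: institutional records, messy, 18th century.
    print(estimate_hours("institutional/corporate", 1,
                         in_shambles=True, bulk_18th_century=True))
```

For the example in the text (one linear foot of institutional records, in a shambles, mostly 18th century) this gives the same 5 hours per linear foot; treat any number it produces as a starting point for planning, not a promise.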

27 months, 125 collections — How’d we do?

Wednesday, January 18th, 2012

After two years of speed processing across the Delaware Valley, Holly and I thought it prudent to take one last look at the collections before calling it quits. From September to the end of November we traveled from site to site reviewing our work and gathering information on the quality and accuracy of our efforts. In doing so, we learned a lot about the limitations of minimal processing AND our approach to training.

We processed 125 collections and spot checked 103.  Our approach varied a little from collection to collection, but generally speaking we followed the same protocol across the board, and created a worksheet to keep us on task. We took note of the overall condition of each collection, and recorded data on the condition of folders and whether folder labels were complete and legible.  We remeasured each collection (including counting containers and volumes), and carefully reviewed the contents of several boxes (every fifth or tenth box, for example, depending on the size of the collection).  Within boxes, we counted folders and reviewed the title and contents of at least one folder (sometimes many more) in each box, comparing the physical collection to what was recorded in the finding aid.  Here’s what we found:

  • Collections or parts of collections that benefited from new housing were infinitely easier to review than collections that remained in their original housing, particularly when it came to counting files, and reading and understanding the information provided on folder labels.
  • Inconsistent and incomplete folder labeling was a recurring issue in 32% of the collections we reviewed.  In particular, one of the more frustrating problems we encountered was that students frequently skipped recording the box and folder number or collection name on folder labels.
  • Another major issue we encountered was mistakes in box and folder numbering.  9% of the boxes we checked had numbering issues.  9% doesn’t seem like a lot, but renumbering boxes and folders (141 of ‘em, to be precise) is incredibly time consuming.  One mistake in numbering, as you probably know, means the entire box must be renumbered and updated in the database.
  • We identified 17 items that were unaccounted for in finding aids.
  • Happily, 96% of the files we checked for accuracy in description, when compared to the finding aid, were correctly described!

What we learned:

We had the good fortune to find and hire bright and enthusiastic student processors — nearly all of whom planned to become professional archivists.  We sometimes forgot, however, that they were not yet professional archivists and, though we provided a lot of training and feedback in certain areas, we placed less emphasis on others, perhaps assuming the importance of some tasks to be common knowledge.  We absolutely provided instruction on how to handle, house and label the physical collection, but in training (and in supervision) I think we inadvertently placed more importance on the quality of the finding aid.  That we employed MPLP, where less work is done physically, probably exacerbated this problem.

Though we were not able to gather data for all of these issues, anecdotally, I can say, the biggest offenders that detracted from the overall physical quality of the processed collections were: (1) failure to replace all of the damaged and/or brittle folders, (2) failure to re-record information provided on file labels with deteriorating adhesive, (3) inconsistency in folder labeling, (4) neglecting to record the collection name or number on folder labels, and (5) neglecting to record box and folder numbers on folder labels.

These issues not only made the collections look messy, but made them difficult to use.  Incomplete and inconsistent folder labels will certainly make research and reference (particularly returning files to their rightful place) difficult. And the failure to re-record information from failing adhesive labels risks losing some or all identifying information when those labels inevitably fall off and are lost.

If we had the chance to do it again, we would definitely add to our training and change how we supervised the students. At the very least, we would incorporate reference exercises into boot camp to place greater emphasis on how the condition of the physical collection impacts research and reference. I also think there would be less remote supervision, though in our case this was hard to avoid. While we pored over finding aids, making endless edits (four rounds of editing!), we should have made more time to review the actual collection together with the processors.  We had lots of conversations along the way about how to approach arrangement, but little time was made to discuss the mechanics of processing. Doing so would also have provided the opportunity for processors to fix their own mistakes (rather than Holly and me doing it for them after they had moved on), which, in my opinion, is one of the best ways to learn.

We are revamping our training and processing materials to reflect what we learned over the last few months, so be on the lookout for a tweet or blog post announcing when they are ready.