Cameron Blevins, Rutgers University
There are a dizzying number of places to learn how to write computer code, from Codecademy to Codewars to Girls Who Code. Where to begin? For those of us working in history, literature, or other humanities disciplines, general programming instruction can feel far removed from the topics we study and teach. How does learning the concept of a for-loop help us better understand, say, nineteenth-century anti-slavery movements? Enter The Programming Historian: a suite of open-access, peer-reviewed tutorials aimed at teaching programming and other digital methods within the context of history and the humanities. Under this framework, historian Caleb McDaniel doesn’t just help you write a for-loop in Python; he shows you how to use a for-loop to extract data from 7,000 anti-slavery documents from the Boston Public Library. Given the explosion of interest in the digital humanities, The Programming Historian fills a growing pedagogical need for hands-on, applied instruction in the field.
The current version of The Programming Historian is the project’s second iteration. The first was launched in 2007–2008 by Bill Turkel and Alan MacEachern at the University of Western Ontario and was a pioneering attempt to make programming accessible and relevant for historians with little to no prior technical experience. Turkel and MacEachern co-wrote a series of lessons that taught basic applications of the Python programming language, including scraping web pages, counting word frequencies, and making n-gram dictionaries. Adam Crymble later joined The Programming Historian and helped relaunch the project in its second, current iteration. Today, Crymble is joined by fellow editors Fred Gibbs, Allison Hegel, Caleb McDaniel, Ian Milligan, and Miriam Posner. More than twenty others have authored tutorials for the project, with dozens more participating as reviewers.
As of November 2015, The Programming Historian offered more than forty tutorials on a range of common tasks that a historian (or humanist) might want to do with a computer. Thinking about having your class build an online collection of primary sources? Check out Miriam Posner’s lessons Up and Running on Omeka.net and Creating an Omeka.net Exhibit. Need to get some messy scanned texts into a more usable format? Laura Turner O’Hara will guide you through Cleaning OCR’d Text with Regular Expressions. Interested in topic modeling but not sure how or why you’d use it? Shawn Graham, Scott Weingart, and Ian Milligan’s Getting Started With Topic Modeling and MALLET offers a hands-on introduction to the tool. Individual lessons vary in length, detail, and complexity, but they are all geared towards (and written by) actual humanities scholars working with actual data and actual problems from humanities projects. This is enormously refreshing for anyone who has worked through tutorials written by people who write code for a living.
Tutorials on The Programming Historian are grouped within broad categories such as Mapping and GIS, Data Manipulation, and Data Management. Although they are nominally written for historians, the bulk of the lessons are applicable across the humanities. There are too many tutorials to review them all, so I’ll focus briefly on just one example. Lately I’ve been meaning to tinker with my writing workflow, so I decided to write this review by following the instructions in Dennis Tenen and Grant Wythoff’s tutorial, Sustainable Authorship in Plain Text Using Pandoc and Markdown. The idea behind “sustainable authorship” is to strip writing down to its most basic components and sidestep the need for proprietary infrastructure such as Microsoft Word or Google Docs to write, read, share, or even open files. Tenen and Wythoff speak thoughtfully and directly to a beginner audience, offering reassurances such as “The installation of the necessary tools presents perhaps the biggest barrier to participation.” They explain not just the how of using Pandoc and Markdown but also the why: the philosophy and guiding principles behind why you might want to use these tools. This is a crucial component of learning and teaching digital methods in the humanities. It’s easy to assume the utility of a tool is self-explanatory. For students or colleagues who are not already familiar with a method, however, it isn’t always apparent why they should learn how to use it. Many of the individual lessons in The Programming Historian take great pains not only to explain the mechanics of a skill or tool but also to convey its broader utility, applications, and significance. This is tutorial writing at its best.
The Programming Historian is currently the best one-stop shop for technical skill-building in the digital humanities. But this comes with a caveat: it is heavily slanted towards a particular kind of technical skill-building, one bound up with a tradition of humanities computing and tech culture. As Natalia Cecire, Adeline Koh, and others have noted, there’s a danger in privileging this kind of technical knowledge about coding and programming. For one, it reifies a certain definition of the digital humanities and the practices that constitute it. Overwhelmingly, The Programming Historian focuses on topics like GIS, text processing, or corpus analysis rather than, say, new media studies or critical computing. To be fair, the project was never intended to cover every method or skill in digital history, much less the digital humanities. Nor is it even capable of doing so: The Programming Historian is maintained by a group of generous and thoughtful people working on an entirely volunteer basis. It is as much a community of contributors as it is a collection of tutorials. The topics of specific lessons mirror the interests and expertise of the individual people who wrote them. Taken together, however, these tutorials raise a question: who gets shut out from the project?
The Programming Historian can feel daunting for people who are not already familiar with humanities computing or tech culture. Over the past several years I’ve referred dozens of students and colleagues to The Programming Historian, but I always try to point them towards a specific tutorial that will fit a discrete need. Otherwise, they run the risk of getting lost in a forest of intimidating trees. Arriving at The Programming Historian’s landing page armed with little more than “I have a bunch of photos from the archive – what do I do now?” is a recipe for alienation. Lesson titles such as Transliterating non-ASCII Characters with Python, Supervised Classification with a Naive Bayesian, or even the tutorial I followed for this review, Sustainable Authorship in Plain Text Using Pandoc and Markdown, assume an existing familiarity with what all those words mean and how they might help you. ASCII, Python, Bayesian, Pandoc, Markdown: these are part of an insider vocabulary.
Contributing to The Programming Historian can feel even more daunting. Because the project is hosted through GitHub Pages, becoming an author or reviewer requires a working familiarity with Markdown syntax and GitHub pull requests. GitHub has advantages for this kind of large-scale, decentralized, collaborative project, but it comes with real downsides, including a steep technical learning curve and an overwhelmingly male user base. On that note, as of November 2015 fewer than 30% of authors and reviewers at The Programming Historian were women. There were even fewer people of color. Glancing through the line-up of lessons, authors, and reviewers, or reading about the GitHub submission and review process, it’s easy to end up asking yourself, “Do I belong here?” Let me be perfectly clear: none of these critiques are unique to The Programming Historian. The wider field of digital humanities is riddled with these sorts of dynamics. But it’s nevertheless important to highlight the barriers and exclusions that can sprout from an emphasis on this particular kind of technical skill-building and the communities that coalesce around it.
Unlike many digital humanities projects, however, The Programming Historian is actively and openly wrestling with questions of exclusion and inclusion. In November 2015, editor Adam Crymble posed an open question on the project’s GitHub repository: “How can we make the PH more friendly for women to contribute?” In little more than a week, the question received 48 comments offering ideas and feedback, ranging from the opaqueness of GitHub to consciously placing more women in leadership positions. This conversation exemplifies one of The Programming Historian’s most important contributions to the larger field of digital humanities: its commitment to transparency. This approach inflects the entire project. Not only does The Programming Historian jettison blind peer review; it shines a spotlight on the reviewers themselves. Under this model, reviewers act more like collaborators than gatekeepers, and their names appear alongside those of the authors in the byline of every lesson. Despite its downsides, the adoption of GitHub in 2014 further nudges the project’s workflow out into the open. Moving forward, a reader could in principle see the conversations and decisions between authors and reviewers that shaped the final version of a tutorial. This commitment to building a transparent, collaborative, and continually self-improving publication process will hopefully make The Programming Historian a sustainable pedagogical resource well beyond the impact of any individual lesson or tutorial.
About The Author
Cameron Blevins is a digital historian of the nineteenth-century United States and the American West. He received his PhD from Stanford University in 2015 and is a postdoctoral fellow at Rutgers University’s history department and the Rutgers Center for Historical Analysis. He is currently completing The Postal West, which contributes a new spatial history of the western United States and its integration into the nation.