Ethnographies of Datasets: Teaching Critical Data Analysis through R Notebooks


With the growth of data science in industry, academic research, and government planning over the past decade, there is an increasing need to equip students with skills not only in responsibly analyzing data, but also in investigating the cultural contexts from which the values reported in data emerge. A risk of several existing models for teaching data ethics and critical data literacy is that students will come to see data critique as something that one does in a compliance capacity prior to performing data analysis, or in an auditing capacity after data analysis, rather than as an integral part of data practice. This article introduces how I integrate critical data reflection with data practice in my undergraduate course Data Sense and Exploration. I introduce a series of R Notebooks that walk students through a data analysis project while encouraging them, each step of the way, to record field notes on the history and context of their data inputs, the erasures and reductions of narrative that emerge as they clean and summarize the data, and the rhetoric of the visualizations they produce from the data. I refer to the project as an “ethnography of a dataset” not only because students examine the diverse cultural forces operating within and through the data, but also because students draw out these forces through immersive, consistent, hands-on engagement with the data.


Last Spring one of my students made an important discovery regarding the politics encoded in data about California wildfires. Aishwarya Asthana was examining a dataset published by California’s Department of Forestry and Fire Protection (CalFIRE), documenting the acres burned for each government-recorded wildfire in California from 1878 to 2017. The dataset also included variables such as the fire’s name, when it started and when it was put out, which agency was responsible for it, and the reason it ignited. Asthana was practicing applying techniques for univariate data analysis in R—taking one variable in the dataset and tallying up the number of times each value in that variable appears. Such analyses help to summarize and reveal patterns in the data, prompting questions about why certain values appear more than others.
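A univariate tally of this kind can be sketched as follows (a minimal illustration using a made-up data frame with a `CAUSE` column, not the actual CalFIRE data):

```r
# Hypothetical sketch of a univariate analysis: count how many times
# each value of a single variable appears in a data frame.
library(dplyr)

fires <- data.frame(
  CAUSE = c("Lightning", "Campfire", "Lightning", "Unknown", "Lightning")
)

fires %>%
  count(CAUSE, sort = TRUE)
# The most frequent value appears first, prompting questions about
# why certain values dominate the record.
```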

Tallying up the number of times each distinct wildfire cause appeared in the dataset, Asthana discovered that CalFIRE categorizes each wildfire into one of nineteen distinct cause codes, such as “1—Lightning,” “2—Equipment Use,” “3—Smoking,” and “4—Campfire.” According to the analysis, 184 wildfires were caused by campfires, 1,543 wildfires were caused by lightning, and, in the largest category, 6,367 wildfires were categorized with a “14—Unknown/Unidentified” cause code. The cause codes that appeared the fewest number of times (and thus were attributed to the fewest number of wildfires) were “12—Firefighter Training” and the final code in the list: “19—Illegal Alien Campfire.”

fires %>% 
  ggplot(aes(x = reorder(CAUSE,CAUSE,
                     function(x)-length(x)), fill = CAUSE)) +
  geom_bar() +
  labs(title = "Count of CalFIRE-documented Wildfires since 1878 by Cause", x = "Cause", y = "Count of Wildfires") + 
  theme_minimal() +
  theme(legend.position = "none", 
        plot.title = element_text(size = 12, face = "bold"))

Figure 1. Plot of CalFIRE-documented wildfires by cause, produced in R. In the plot, the fewest fires have been attributed to illegal alien campfires and firefighter training.

Interpreting the data unreflectively, one might say, “From 1878 to 2017, four California wildfires have been caused by illegal alien campfires—making it the least frequent cause.” Toward the beginning of the quarter in Data Sense and Exploration, many students, particularly those majoring in math and statistics, compose statements like this when asked to draw insights from data analyses. However, in only reading the data on its surface, this statement obscures important cultural and political factors mediating how the data came to be reported in this way. Why are “illegal alien campfires” categorized separately from just “campfires”? Who has stakes in seeing quantitative metrics specific to campfires purportedly ignited by this subgroup of the population—a subgroup that can only be distinctly identified through systems of human classification that are also devised and debated according to diverse political commitments?

While detailing the history of the data’s collection and some potential inconsistencies in how fire perimeters are calculated, the data documentation provided by CalFIRE does not answer questions about the history and stakes of these categories. In other words, it details the provenance of the data but not the provenance of its semantics and classifications. In doing so, it naturalizes the values reported in the data in ways that inadvertently discourage recognition of the human discernment involved in their generation. Yet, even a cursory Web search of the key phrase “illegal alien campfires in California” reveals that attribution of wildfires to undocumented immigrants in California has been used to mobilize political agendas and vilify this population for more than two decades (see, for example, Hill 1996). Discerning the critical import of this data analysis thus demands more than statistical savvy; to assess the quality and significance of this data, an analyst must reflect on their own political and ethical commitments.

Data Sense and Exploration is a course designed to help students reckon with the values reported in a dataset so that they may better judge the integrity of those values. The course is part of a series of undergraduate data studies courses offered in the Science and Technology Studies Program at the University of California Davis, aiming to cultivate student skill in applying critical thinking toward data-oriented environments. Data Sense and Exploration cultivates critical data literacy by walking students through a quarter-long research project contextualizing, exploring, and visualizing a publicly-accessible dataset. We refer to the project as an “ethnography of a dataset,” not only because students examine the diverse cultural forces operating within and through the data, but also because students draw out these forces through immersive, consistent, hands-on engagement with the data, along with reflections on their own positionality as they produce analyses and visualizations. Through a series of labs in which students learn how to quantitatively summarize the features in a dataset in the coding language R (often referred to as a descriptive data analysis), students also practice researching and reflecting on the history of the dataset’s semantics and classifications. In doing so, the course encourages students to recognize how the quantitative metrics that they produce reflect not only the way things are in the world, but also how people have chosen to define them. Perhaps most importantly, the course positions data as always already structured according to diverse biases and thus aims to foster student skill in discerning which biases they should trust and how to responsibly draw meaning from data in spite of them. In this paper, I present how this project is taught in Data Sense and Exploration and some critical findings students made in their projects.

Teaching Critical Data Analysis

With the growth of data science in industry, academic research, and government planning over the past decade, universities across the globe have been investing in the expansion of data-focused course offerings. Many computationally or quantitatively-focused data science courses seek to cultivate student skill in collecting, cleaning, wrangling, modeling, and visualizing data. Simultaneously, high-profile instances of data-driven discrimination, surveillance, and misinformation have pushed universities to also consider how to expand course offerings regarding responsible and ethical data use. Some emerging courses, often taught directly in computer and data science departments, introduce students to frameworks for discerning “right from wrong” in data practice, focusing on individual compliance with rules of conduct at the expense of attention to the broader institutional cultures and contexts that propagate data injustices (Metcalf, Crawford, and Keller 2015). Other emerging courses, informed by scholarship in science and technology studies (STS) and critical data studies (CDS), take a more critical approach, broadening students’ moral reasoning by encouraging them to reflect on the collective values and commitments that shape data and their relationship to law, democracy, and sociality (Metcalf, Crawford, and Keller 2015).

While such courses help students recognize how power operates in and through data infrastructure, a risk is that students will come to see the evaluation of data politics and the auditing of algorithms as a separate activity from data practice. While seeking to cultivate student capacity to foresee the consequences of data work, coursework that divorces reflection from practice ends up positioning these assessments as something one does after data analysis in order to evaluate the likelihood of harm and discrimination. Research in critical data studies has indicated that this divide between data science and data ethics pedagogy has rendered it difficult for students to recognize how to incorporate the lessons of data and society into their work (Bates et al. 2020). Thus, Data Sense and Exploration takes a different approach—walking students through a data analysis project while encouraging them, each step of the way, to record field notes on the history and context of their data inputs, the erasures and reductions of narrative that emerge as they clean and summarize the data, and the rhetoric of the visualizations they produce. As a cultural anthropologist, I’ve structured the class to draw from my own training in and engagement with “experimental ethnography” (Clifford and Marcus 1986). Guided by literary, feminist, and postcolonial theory, cultural anthropologists engage experimental ethnographic methods to examine how systems of representation shape subject formation and power. In this sense, Data Sense and Exploration positions data inputs as cultural artifacts, data work as a cultural practice, and ethnography as a method that data scientists can and should apply in their work to mitigate the harm that may arise from them. Importantly, walking students into awareness of the diverse cultural forces operating in and through data helps them more readily recognize opportunities for intervention. Rather than criticizing the values and political commitments that they bring to their work as biasing the data, the course celebrates such judgments when bent toward advancing more equitable representation.

The course is predominantly inspired by literature in data and information infrastructure studies (Bowker et al. 2009). These fields study the cultural and political contexts of data and the infrastructures that support them by interviewing data producers, observing data practitioners, and closely reading data structures. For example, through historical and ethnographic studies of infrastructures for data access, organization, and circulation, the field of data infrastructure studies examines how data is made and how it transforms as it moves between stakeholders and institutions with diverse positionalities and vested interests (Bates, Lin, and Goodale 2016). Critiquing the notion that data can ever be pure or “raw,” this literature argues that all data emerge from sites of active mediation, where diverse epistemic beliefs and political commitments mold what ultimately gets represented and how (Gitelman 2013). Diverting from an outsized focus on data bias, Data Sense and Exploration prompts students to grapple with the “interpretive bases” that frame all data—regardless of whether it has been produced through personal data collection, institutions with strong political proclivities, or automated data collection technologies. In this sense, the course advances what Gray, Gerlitz, and Bounegru (2018) refer to as “data infrastructure literacy” and demonstrates how students can apply critical data studies techniques to critique and improve their own day-to-day data science practice (Neff et al. 2017).

Studying a Dataset Ethnographically

Data Sense and Exploration introduces students to examining a dataset and data practices ethnographically through an extended research project, carried out incrementally through a series of weekly labs.[1] While originally the labs were completed collaboratively in a classroom setting, in the move to remote instruction in Spring 2020, the labs were reformulated as a series of nine R Notebooks, hosted in a public GitHub repository that students clone into their local coding environments to complete. R Notebooks are digital documents, written in the scripting language Markdown, that enable authors to embed chunks of executable R code amidst text, images, and other media. The R Notebooks that I composed for Data Sense and Exploration include text instruction for how to find, analyze, and visualize a rectangular dataset, or a dataset in which values are structured into a series of observations (or rows) each described by a series of variables (or columns). The Notebooks also model how to apply various R functions to analyze a series of example datasets, offer warnings of the various faulty assumptions and statistical pitfalls students may encounter in their own data practice, and demonstrate the critical reflection that students will be expected to engage in as they apply the functions in their own data analysis.
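Such a rectangular structure can be illustrated with a toy data frame (the names and figures below are illustrative, not drawn from a course dataset):

```r
# A toy rectangular dataset: each row is an observation (a wildfire),
# each column is a variable describing that observation.
toy_fires <- data.frame(
  FIRE_NAME = c("Camp", "Thomas"),
  YEAR      = c(2018, 2017),
  ACRES     = c(153336, 281893)
)

nrow(toy_fires)  # 2 observations (rows)
ncol(toy_fires)  # 3 variables (columns)
```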

Interspersed throughout the written instruction, example code, and reflections, the Notebooks provide skeleton code for students to fill in as they go about applying what they have learned to a dataset they will examine throughout the course. At the beginning of the course, when many students have no prior programming experience, the skeleton code is quite controlled, asking students to “fill-in-the-blank” with a variable from their own dataset or with a relevant R function.

# Uncomment below and count the distinct values in your unique key. Note that you may need to select multiple variables. If so, separate them by a comma in the select() function.
#n_unique_keys <- _____ %>% select(_____) %>% n_distinct()

# Uncomment below and count the rows in your dataset by filling in your data frame name.
#n_rows <- nrow(_____)

# Uncomment below and then run the code chunk to make sure these values are equal.
# n_unique_keys == n_rows
Figure 2. Example of skeleton code from R Notebooks.

However, as students gain familiarity with the language, each week, they are expected to compose code more independently. Finally, in each Notebook, there are open textboxes, where students record their critical reflections in response to specific prompts.
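For illustration, a completed version of the skeleton code above might look like the following (assuming a hypothetical data frame `fires` whose `OBJECTID` column serves as the unique key):

```r
# Hypothetical filled-in skeleton code: check that the unique key
# identifies each row, i.e. that each row is a distinct observation.
library(dplyr)

fires <- data.frame(
  OBJECTID = c(101, 102, 103),
  CAUSE    = c("Lightning", "Campfire", "Unknown")
)

n_unique_keys <- fires %>% select(OBJECTID) %>% n_distinct()
n_rows <- nrow(fires)

n_unique_keys == n_rows  # TRUE when each row is a distinct observation
```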

Teaching this course in the Spring 2020 quarter, I found that the structure provided by the R Notebooks overall was particularly supportive to students who were coding in R for the first time and that, given the examples provided throughout the Notebooks, students exhibited greater depth of reflection in response to prompts. However, without the support of a classroom once we moved online, I also found that novice students struggled more to interpret what the plots they produced in R were actually showing them. Moreover, advanced students were more conservative in their depth of data exploration, closely following the prompts and relying on code templates. In future iterations of the course, I thus intend to spend more synchronous time in class practicing how to quantitatively summarize the results of their analysis. I also plan to add new sections at the end of each Notebook, prompting students to leverage the skills they learned in that Notebook in more creative and free-form data explorations.

Each time I teach the course, individual student projects are structured around a common theme. In the iteration of the course that inspired the project that opens this article, the theme was “social and environmental challenges facing California.” In the most recent iteration of the course, the theme was “social vulnerability in the wake of a pandemic.” In an early lab, I task students with identifying issues warranting public concern related to the theme, devising research questions, and searching for public data that may help answer those questions. Few students entering the course have been taught how to search for public research, let alone how to search for public data. In order to structure their search activity, I task the students with imagining and listing “ideal datasets”—intentionally delineating their topical, geographic, and temporal scope—prior to searching for any data. Examining portals like data.gov, Google’s dataset search, and city and state open data portals, students very rarely find their ideal datasets and realize that they have to restrict their research questions in order to complete the assignment. Grappling with the dearth of public data for addressing complex contemporary questions around equity and social justice provides one of the first eye-opening experiences in the course. A Notebook directive prompts students to reflect on this.

Throughout the following week, I work with groups of students to select datasets from their research that will be the focus of their analysis. This is perhaps one of the most challenging tasks of the course for me as the instructor. While a goal is to introduce students to the knowledge gaps in public data, some public datasets have so little documentation that the kinds of insights students could extrapolate from examinations of their history and content would be considerably limited. Further, not all rectangular datasets are structured in ways that will integrate well with the code templates I provide in the R Notebooks. I grapple with the tension of wanting to expose students to the messiness of real-world data, while also selecting datasets that will work for the assignment.

Once datasets have been assigned, the remainder of the labs provide opportunities for immersive engagement with the dataset. In what follows, I describe a series of concepts (i.e. routines and rituals, semantics, classifications, calculations and narrative, chrono-politics, and geo-politics) around which I have structured each lab, and provide some examples of both the data work that introduced students to these concepts and the critical reflections they were able to make as a result.

Data Routines and Rituals

In one of the earlier labs, students conduct a close reading of their dataset’s documentation—an example of what Geiger and Ribes (2011) refer to as a “trace ethnography.” They note the stakeholders involved in the data’s collection and publication, the processes through which the data was collected, the circumstances under which the data was made public, and the changes in the data’s structure. They also search for news articles and scientific articles citing the dataset to get a sense of how governing bodies have leveraged the data to inform decisions, how social movements have advocated for or against the data’s collection, and how the data has advanced other forms of research. They outline the costs and labor involved in producing and maintaining the data, the formal standards that have informed the data’s structure, and any laws that mandate the data’s collection.
From this exercise, students learn about the diverse “rituals” of data collection and publication (Ribes and Jackson 2013). For instance, studying the North American Breeding Bird Survey (BBS)—a dataset that annually records bird populations along about 4,100 roadside survey routes in the United States and Canada—Tennyson Filcek learned that the data is produced by volunteers skilled in visual and auditory bird identification. After completing training, volunteers drive to an assigned route with a pen, paper, and clipboard and count all of the bird species seen or heard over the course of three minutes along each designated stop on the route. They report the data back to the BBS Office, which aggregates the data and makes them available for public consumption. While these rituals shape how the data get produced, the unruliness of aggregating data collected on different days, by different individuals, under different weather and traffic conditions, and in different parts of the continent has prompted the BBS to implement recommendations and routines to account for disparate conditions. The BBS requires volunteers to complete counts around June, start the route a half-hour before sunrise, and avoid completing counts on foggy, rainy, or windy days. Just as these routines domesticate the data, the heterogeneity of the data’s contexts demands that the data be cared for in particular ways, in turn patterning data collection as a cultural practice. This lab is thus an important precursor to the remaining labs in that it introduces students to the diverse actors and commitments mediating the dataset’s production and affirms that the data could not exist without them.

While I have been impressed with students’ ability to outline details involving the production and structure of the data, I have found that most students rarely look beyond the data documentation for relevant information—often missing critical perspectives from outside commentators (such as researchers, activists, lobbyists, and journalists) that have detailed the consequences of the data’s incompleteness, inconsistencies, inaccuracies, or timeliness for addressing certain kinds of questions. In future iterations of the course, I intend to encourage students to characterize the viewpoints of at least three differently positioned stakeholders in this lab in order to help illustrate how datasets can become contested artifacts.

Data Semantics

In another lab, students import their assigned dataset into the R Notebook and programmatically explore its structure, using the scripting language to determine what makes one observation distinct from the next and what variables are available to describe each observation. As they develop an understanding for what each row of the dataset represents and how columns characterize each row, they refer back to the data documentation to consider how observations and variables are defined in the data (and what these definitions exclude). This focused attention to data semantics invites students to go behind-the-scenes of the observations reported in a dataset and develop a deeper understanding of how its values emerge from judgments regarding “what counts.”

ca_crimes_clearances <- read.csv("https://data-openjustice.doj.ca.gov/sites/default/files/dataset/2019-06/Crimes_and_Clearances_with_Arson-1985-2018.csv")
str(ca_crimes_clearances)

## 'data.frame':    24950 obs. of  69 variables:
##  $ Year               : int  1985 1985 1985 1985 1985 1985 1985 1985 1985 1985 ...
##  $ County             : chr  "Alameda County" "Alameda County" "Alameda County" "Alameda County" ...
##  $ NCICCode           : chr  "Alameda Co. Sheriff's Department" "Alameda" "Albany" "Berkeley" ...
##  $ Violent_sum        : int  427 405 101 1164 146 614 671 185 199 6703 ...
##  $ Homicide_sum       : int  3 7 1 11 0 3 6 0 3 95 ...
##  $ ForRape_sum        : int  27 15 4 43 5 34 36 12 16 531 ...
##  $ Robbery_sum        : int  166 220 58 660 82 86 250 29 41 3316 ...
##  $ AggAssault_sum     : int  231 163 38 450 59 491 379 144 139 2761 ...
##  $ Property_sum       : int  3964 4486 634 12035 971 6053 6774 2364 2071 36120 ...
##  $ Burglary_sum       : int  1483 989 161 2930 205 1786 1693 614 481 11846 ...
##  $ VehicleTheft_sum   : int  353 260 55 869 102 350 471 144 74 3408 ...
##  $ LTtotal_sum        : int  2128 3237 418 8236 664 3917 4610 1606 1516 20866 ...
##  $ ViolentClr_sum     : int  122 205 58 559 19 390 419 146 135 2909 ...
##  $ HomicideClr_sum    : int  4 7 1 4 0 2 4 0 1 62 ...
##  $ ForRapeClr_sum     : int  6 8 3 32 0 16 20 6 8 319 ...
##  $ RobberyClr_sum     : int  32 67 23 198 4 27 80 21 16 880 ...
##  $ AggAssaultClr_sum  : int  80 123 31 325 15 345 315 119 110 1648 ...
##  $ PropertyClr_sum    : int  409 889 166 1954 36 1403 1344 422 657 5472 ...
##  $ BurglaryClr_sum    : int  124 88 62 397 9 424 182 126 108 1051 ...
##  $ VehicleTheftClr_sum: int  7 62 16 177 8 91 63 35 38 911 ...
##  $ LTtotalClr_sum     : int  278 739 88 1380 19 888 1099 261 511 3510 ...
##  $ TotalStructural_sum: int  22 23 2 72 0 37 17 17 7 287 ...
##  $ TotalMobile_sum    : int  6 4 0 23 1 26 18 9 3 166 ...
##  $ TotalOther_sum     : int  3 5 0 5 0 61 21 64 2 22 ...
##  $ GrandTotal_sum     : int  31 32 2 100 1 124 56 90 12 475 ...
##  $ GrandTotClr_sum    : int  11 7 1 20 0 14 7 2 2 71 ...
##  $ RAPact_sum         : int  22 9 2 31 4 21 25 9 15 451 ...
##  $ ARAPact_sum        : int  5 6 2 12 1 13 11 3 1 80 ...
##  $ FROBact_sum        : int  77 56 23 242 35 38 136 13 22 1120 ...
##  $ KROBact_sum        : int  22 23 2 71 10 7 43 3 4 264 ...
##  $ OROBact_sum        : int  3 11 2 43 11 3 7 1 1 107 ...
##  $ SROBact_sum        : int  64 130 31 304 26 38 64 12 14 1825 ...
##  $ HROBnao_sum        : int  59 136 26 351 56 32 116 3 0 1676 ...
##  $ CHROBnao_sum       : int  38 48 15 150 9 21 43 4 13 253 ...
##  $ GROBnao_sum        : int  23 2 1 0 2 7 43 6 9 83 ...
##  $ CROBnao_sum        : int  32 2 2 0 0 8 21 2 2 46 ...
##  $ RROBnao_sum        : int  11 20 6 47 14 9 19 3 2 306 ...
##  $ BROBnao_sum        : int  3 2 3 21 0 2 6 0 3 37 ...
##  $ MROBnao_sum        : int  0 10 5 91 1 7 2 11 12 915 ...
##  $ FASSact_sum        : int  25 16 3 47 6 47 43 10 26 492 ...
##  $ KASSact_sum        : int  27 30 2 103 8 38 55 13 21 253 ...
##  $ OASSact_sum        : int  111 90 10 224 9 120 208 29 43 396 ...
##  $ HASSact_sum        : int  68 27 23 76 36 286 73 92 49 1620 ...
##  $ FEBURact_Sum       : int  1177 747 85 2040 161 1080 1128 341 352 9011 ...
##  $ UBURact_sum        : int  306 242 76 890 44 706 565 273 129 2835 ...
##  $ RESDBUR_sum        : int  1129 637 100 2015 89 1147 1154 411 274 8487 ...
##  $ RNBURnao_sum       : int  206 175 33 597 32 292 295 100 44 2114 ...
##  $ RDBURnao_sum       : int  599 195 44 1418 26 485 532 163 103 5922 ...
##  $ RUBURnao_sum       : int  324 267 23 0 31 370 327 148 127 451 ...
##  $ NRESBUR_sum        : int  354 352 61 915 116 639 539 203 207 3359 ...
##  $ NNBURnao_sum       : int  216 119 32 224 44 274 238 104 43 1397 ...
##  $ NDBURnao_sum       : int  47 46 21 691 14 110 45 34 26 1715 ...
##  $ NUBURnao_sum       : int  91 187 8 0 58 255 256 65 138 247 ...
##  $ MVTact_sum         : int  233 187 42 559 85 219 326 76 56 2711 ...
##  $ TMVTact_sum        : int  56 33 4 55 9 71 88 40 9 121 ...
##  $ OMVTact_sum        : int  64 40 9 255 8 60 57 28 9 576 ...
##  $ PPLARnao_sum       : int  5 31 26 133 5 10 1 4 3 399 ...
##  $ PSLARnao_sum       : int  60 20 4 163 4 14 20 6 3 251 ...
##  $ SLLARnao_sum       : int  289 664 40 1277 1 704 1058 106 435 1123 ...
##  $ MVLARnao_sum       : int  930 538 147 3153 207 1136 753 561 241 8757 ...
##  $ MVPLARnao_sum      : int  109 673 62 508 153 446 1272 155 252 901 ...
##  $ BILARnao_sum       : int  205 516 39 611 16 360 334 276 151 349 ...
##  $ FBLARnao_sum       : int  44 183 46 1877 85 493 417 187 281 4961 ...
##  $ COMLARnao_sum      : int  11 53 17 18 24 27 59 7 2 70 ...
##  $ AOLARnao_sum       : int  475 559 37 496 169 727 696 304 148 4055 ...
##  $ LT400nao_sum       : int  753 540 84 533 217 937 1089 370 235 976 ...
##  $ LT200400nao_sum    : int  437 622 68 636 122 607 802 299 262 2430 ...
##  $ LT50200nao_sum     : int  440 916 128 2793 161 1012 1102 453 464 4206 ...
##  $ LT50nao_sum        : int  498 1159 138 4274 164 1361 1617 484 555 13254 ...
Figure 3. Basic examination of the structure of the CA Crimes and Clearances dataset.

For instance, studying aggregated totals of crimes and clearances for each law enforcement agency in California in each year from 1985 to 2017, Simarpreet Singh noted how the definition of a crime gets mediated by rules in the US Federal Bureau of Investigation (FBI)’s Uniform Crime Reporting Program (UCR)—the primary source of statistics on crime rates in the US. Singh learned that one such rule, known as the hierarchy rule, states that if multiple offenses occur in the context of a single crime incident, for the purposes of crime reporting, the law enforcement agency classifies the crime only according to the most serious offense. In descending order of seriousness, these classifications are: (1) Criminal Homicide, (2) Criminal Sexual Assault, (3) Robbery, (4) Aggravated Battery/Aggravated Assault, (5) Burglary, (6) Theft, (7) Motor Vehicle Theft, and (8) Arson. This means that in the resulting data, for incidents where multiple offenses occurred, certain classes of crime are likely to be underrepresented in the counts.
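The hierarchy rule can be sketched programmatically (a simplified illustration of the logic, not the FBI’s actual coding procedure):

```r
# Simplified sketch of the UCR hierarchy rule: an incident with multiple
# offenses is classified only by its most serious offense (lowest rank).
ucr_hierarchy <- c("Criminal Homicide", "Criminal Sexual Assault", "Robbery",
                   "Aggravated Battery/Aggravated Assault", "Burglary",
                   "Theft", "Motor Vehicle Theft", "Arson")

classify_incident <- function(offenses) {
  # match() returns each offense's position in the hierarchy;
  # the minimum position is the most serious offense.
  ucr_hierarchy[min(match(offenses, ucr_hierarchy))]
}

# An incident involving both a burglary and a robbery is recorded
# only as a robbery; the burglary goes uncounted.
classify_incident(c("Burglary", "Robbery"))  # "Robbery"
```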

Singh also acknowledged how counts for individual offense types get mediated by official definitions. A change in the FBI’s definition of “forcible rape” (including only female victims) to “rape” (focused on whether there had been consent instead of whether there had been physical force) in 2014 led to an increase in the number of rapes reported in the data from that year on. From 1927 (when the original definition was documented) up until this change, male victims of rape had been left out of official statistics, and often rapes that did not involve explicit physical force (such as drug-facilitated rapes) went uncounted. Such changes come about, not in a vacuum, but in the wake of shifting norms and political stakes to produce certain types of quantitative information (Martin and Lynch 2009). By encouraging students to explore these definitions, this lab has been particularly effective in getting students to reflect not only on what counts and measures of cultural phenomena indicate, but also on the cultural underpinnings of all counts and measures.

Data Classifications

In the following lab, students programmatically explore how values get categorized in the dataset, along with the frequency with which each observation falls into each category. To do so, they select categorical variables in the dataset and produce bar plots that display the distributions of values in that variable. Studying a US Environmental Protection Agency (EPA) dataset that reported the daily air quality index (AQI) of each county in the US in 2019, Farhat Bin Aznan created a bar plot that displayed the number of counties that fell into each of the following air quality categories on January 1, 2019: Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, and Hazardous.

aqi$category <- factor(aqi$category, levels = c("Good", "Moderate", "Unhealthy for Sensitive Groups", "Unhealthy", "Very Unhealthy", "Hazardous"))

aqi %>%
  filter(date == "2019-01-01") %>%
  ggplot(aes(x = category, fill = category)) +
  geom_bar() +
  labs(title = "Count of Counties in the US by Reported AQI Category on January 1, 2019", subtitle = "Note that not all US counties reported their AQI on this date", x = "AQI Category", y = "Count of Counties") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(size = 12, face = "bold")) +
  scale_fill_brewer(palette="RdYlGn", direction=-1)

Figure 4. Barplot of counties in each AQI category on January 1, 2019. The plot displays that most counties reported Good air quality on that day.

Studying the US Department of Education’s Scorecard dataset, which documents statistics on student completion, debt, and demographics for each college and university in the US, Maxim Chiao created a bar plot that showed the number of universities that fell into each of the following ownership categories: Public, Private nonprofit, and Private for-profit.

scorecard %>%
  mutate(CONTROL_CAT = ifelse(CONTROL == 1, "Public",
                       ifelse(CONTROL == 2, "Private nonprofit",
                       ifelse(CONTROL == 3, "Private for-profit", NA)))) %>%
  ggplot(aes(x = CONTROL_CAT, fill = CONTROL_CAT)) +
  geom_bar() +
  labs(title = "Count of Colleges and Universities in the US by Ownership Model, 2018-2019", x = "Ownership Model", y = "Count of Colleges and Universities") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(size = 12, face = "bold"))

Figure 5: R output when a student plots the number of colleges and universities by their ownership model in the 2018-2019 academic year.

Figure 5. Barplot of colleges and universities in the US by ownership model.

I first ask students to interpret what they see in the plot. Which categories are more represented in the data, and why might that be the case? I then ask students to reflect on why the categories are divided the way that they are and how the categorical divisions reflect a particular cultural moment, and to consider values that may not fit neatly into the identified categories. As it turns out, the AQI categories in the EPA’s dataset are specific to the US and do not easily translate to the measured AQIs in other countries, where, for a variety of reasons, different pollutants are taken into consideration when measuring air quality (Plaia and Ruggieri 2011). The ownership models categorized in the Scorecard dataset gloss over the nuance of quasi-private universities in the US, such as the University of Pittsburgh and other universities in Pennsylvania’s Commonwealth System of Higher Education.

For some students, this Notebook was particularly effective in encouraging reflection on how all categories emerge in particular contexts to delimit insight in particular ways (Bowker and Star 1999). For example, air pollution does not know county borders, yet, as Victoria McJunkin pointed out in her labs, the EPA reports one AQI for each county based on a value reported from one air monitor that can only detect pollution within a delimited radius. AQI is also reported on a daily basis in the dataset, yet for certain pollutants in the US, pollution concentrations are monitored on an hourly basis, averaged over a series of hours, and then the highest average is taken as the daily AQI. The choice to classify AQI by county and day then is not neutral, but instead has considerable implications for how we come to understand who experiences air pollution and when.

Still, I found that, in this lab, other students struggled to confront their own assumptions about categories they consider to be neutral. For instance, many students categorizing their data by state in the US suggested that there were no cultural forces underlying these categories because states are “standard” ways of dividing the country. In doing so, they missed critical opportunities to reflect on the politics behind how state boundaries get drawn and which people and places get excluded from consideration when relying on this bureaucratic schema to classify data. Going forward, to help students place even “standard” categories in a cultural context, I intend to prompt students to produce a brief timeline outlining how the categories emerged (both institutionally and discursively) and then to identify at least one thing that remains “residual” (Star and Bowker 2007) to the categories.

Data Calculations and Narrative

The next lab prompts students to acknowledge the judgment calls they make in performing calculations with data, including how these choices shape the narrative the data ultimately conveys. Selecting a variable that represents a count or a measure of something in their data, students measure the central tendency of the variable—taking an average across the variable by calculating the mean and the median value. Noting that they are summarizing a value across a set of numbers, I remind students that such measures should only be taken across “similar” observations, which may require first filtering the data to a specific set of observations or performing the calculations across grouped observations. The Notebook instructions prompt students to apply such filters and then reflect on how they set their criteria for similarity. Where do they draw the line between relevant or irrelevant, similar or dissimilar? What narratives do these choices bring to the fore, and what do they exclude from consideration?
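The general shape of this computation can be sketched on a small invented data frame (the state names, years, and amounts below are hypothetical, not drawn from any student dataset):

```r
library(dplyr)

# Hypothetical data: annual spending observations for two states
spending <- data.frame(
  state  = c("A", "A", "A", "B", "B", "B"),
  year   = c(2009, 2010, 2011, 2009, 2010, 2011),
  amount = c(100, 120, 500, 80, 85, 90)
)

spending %>%
  filter(year %in% 2009:2011) %>%   # keep only "similar" observations
  group_by(state) %>%               # summarize within groups, not across them
  summarize(mean_amount   = mean(amount, na.rm = TRUE),
            median_amount = median(amount, na.rm = TRUE),
            .groups = "drop")
```

Note how, in this toy example, the mean for state A (240) sits well above its median (120) because of a single large year; which summary a student chooses to report is itself a judgment call.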

For instance, studying a dataset documenting changes in eligibility policies for the US Supplemental Nutrition Assistance Program (SNAP) by state since 1995, Janelle Marie Salanga sought to calculate the average spending on SNAP outreach across geographies in the US and over time. Because we could expect differences in state spending on outreach due to differences in population, state fiscal politics, and food accessibility, Salanga grouped the observations by state before calculating the average spending across time. And because the passing of the American Recovery and Reinvestment Act of 2009 considerably expanded SNAP benefits for eligible families, she filtered the data to consider only outreach spending in the 2009 through 2015 fiscal years. Through this analysis, Salanga found that California had, on average, spent the most on SNAP outreach in the designated fiscal years, while several states spent nothing.

snap %>%
  # Outreach spending is reported annually, but this dataset is reported monthly,
  # so we filter to the observations on the first month of each fiscal year (October)
  filter(month(yearmonth) == 10 & year(yearmonth) %in% 2009:2015) %>%
  group_by(statename) %>%
  summarize(median_outreach = median(outreach * 1000, na.rm = TRUE),
            num_observations = n(),
            missing_observations = paste(sum(is.na(outreach)) / n() * 100, "%"),
            .groups = 'drop') %>%
  arrange(desc(median_outreach))
statename median_outreach num_observations missing_observations
California 1129009.3990 7 0 %
New York 469595.8557 7 0 %
Texas 422051.5137 7 0 %
Washington 273772.9187 7 0 %
Minnesota 261750.3357 7 0 %
Arizona 222941.9250 7 0 %
Nevada 217808.7463 7 0 %
Illinois 195910.5835 7 0 %
Connecticut 184327.4231 7 0 %
Georgia 173554.0009 7 0 %
Pennsylvania 153474.7467 7 0 %
South Carolina 126414.4135 7 0 %
Ohio 125664.8331 7 0 %
Rhode Island 99755.1651 7 0 %
Tennessee 98411.3388 7 0 %
Massachusetts 97360.4965 7 0 %
Wisconsin 87527.9999 7 0 %
Maryland 81700.3326 7 0 %
Vermont 69279.2511 7 0 %
North Carolina 62904.8309 7 0 %
Indiana 58047.9164 7 0 %
Oregon 57951.0803 7 0 %
Michigan 53415.1688 7 0 %
Florida 37726.1696 7 0 %
Hawaii 29516.3345 7 0 %
New Jersey 23496.2501 7 0 %
Missouri 23289.1655 7 0 %
Louisiana 20072.0005 7 0 %
Colorado 19113.8344 7 0 %
Iowa 18428.9169 7 0 %
Virginia 15404.6669 7 0 %
Delaware 14571.0001 7 0 %
Alabama 11048.8329 7 0 %
District of Columbia 9289.5832 7 0 %
Kansas 8812.2501 7 0 %
North Dakota 8465.0002 7 0 %
Mississippi 4869.0000 7 0 %
Alaska 3199.3332 7 0 %
Arkansas 3075.0833 7 0 %
Nebraska 217.1667 7 0 %
Idaho 0.0000 7 0 %
Kentucky 0.0000 7 0 %
Maine 0.0000 7 0 %
Montana 0.0000 7 0 %
New Hampshire 0.0000 7 0 %
New Mexico 0.0000 7 0 %
Oklahoma 0.0000 7 0 %
South Dakota 0.0000 7 0 %
Utah 0.0000 7 0 %
West Virginia 0.0000 7 0 %
Wyoming 0.0000 7 0 %
Table 1. Median of annual SNAP outreach spending from 2009 to 2015 per US state.

The students then consider how their measures may be reductionist—that is, how the summarized values erase the complexity of certain narratives. For instance, Salanga went on to plot a series of boxplots that displayed the dispersion of outreach spending across fiscal years for each state from 2009 to 2015. She found that, while outreach spending had been fairly consistent in several states across these years, in other states there was a difference of several hundred thousand dollars between the fiscal year with the highest outreach spending and the year with the lowest.

snap %>%
  filter(month(yearmonth) == 10 & year(yearmonth) %in% 2009:2015) %>%
  ggplot(aes(x = statename, y = outreach * 1000)) +
  geom_boxplot() +
  coord_flip() +
  labs(title = "Distribution of Annual SNAP Outreach Spending per State from 2009 to 2015", x = "State", y = "Outreach Spending") +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, face = "bold")) 

Figure 6: R output when a student plots the distribution of outreach spending per state from 2009 to 2015.

Figure 6. Boxplot showing distribution of annual SNAP outreach spending from 2009 to 2015.

This nuanced story of variations in spending over time gets obfuscated when relying on a measure of central tendency alone to summarize the values.
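The point generalizes beyond this dataset. A minimal illustration with invented numbers: two states can share an identical median while one of them swings by hundreds of thousands of dollars across years.

```r
# Invented annual outreach figures for two hypothetical states
state_a <- c(100000, 100000, 100000, 100000, 100000)
state_b <- c(  5000,  60000, 100000, 250000, 400000)

median(state_a)       # 100000
median(state_b)       # 100000 (identical central tendency)
diff(range(state_a))  # 0
diff(range(state_b))  # 395000 (the variation the median erases)
```

A report built on medians alone would describe these two spending histories as interchangeable.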

This lab has been effective in getting students to recognize data work as a cultural practice that involves active discernment. Still, I have noticed that some students complete this lab feeling uncomfortable with the idea that the choices they make in data work may be framed, at least in part, by their own political and ethical commitments. In other words, in their reflections, some students describe their efforts to divorce their own views from their decision-making: they express concern that their choices may be biasing the analysis in ways that invalidate the results. To help them further grapple with the judgment calls that frame all data analyses (and especially the calls that they individually make when choosing how to filter, sort, group, and visualize the data), the next time I run the course I plan to ask students to explicitly characterize their own standpoint in relation to the analysis and reflect on how their unique positionality both influences and delimits the questions they ask, the filters they apply, and the plots they produce.

Data Chrono-Politics and Geo-Politics

In a subsequent lab, I encourage students to situate their datasets in a particular temporal and geographic context in order to consider how time and place impact the values recorded. Students first segment their data by a geographic variable or a date variable to assess how the calculations and plots vary across geographies and time. They then characterize, not only how and why there may be differences in the phenomena represented in the data across these landscapes and timescapes, but also how and why there may be differences in the data’s generation.

For instance, in Spring 2020, a group of students studied a dataset documenting the number of calls related to domestic violence received each month to each law enforcement agency in California.

dom_violence_calls %>%
  ggplot(aes(x = YEAR_MONTH, y = TOTAL_CALLS, group = 1)) +
  stat_summary(geom = "line", fun = "sum") +
  facet_wrap(~COUNTY) +
  labs(title = "Domestic Violence Calls to California Law Enforcement Agencies by County", x = "Month and Year", y = "Total Calls") +
  theme_minimal() +
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.text.x = element_text(size = 5, angle = 90, hjust = 1),
        strip.text.x = element_text(size = 6))

Figure 7: R output when a student plots the total domestic violence calls to California law enforcement agencies over time divided by county.

Figure 7. Timeseries of domestic violence calls to California law enforcement agencies by county.

One student, Laura Cruz, noted how more calls may be reported in certain counties not only because domestic violence may be more prevalent or because those counties had a higher or denser population, but also due to different cultures of police intervention in different communities. Trust in law enforcement may vary across California communities, impacting which populations feel comfortable calling their law enforcement agencies to report any issues. This creates a paradox in which the counts of calls related to domestic violence can be higher in communities that have done a better job responding to them.
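Cruz’s paradox can be made concrete with a back-of-the-envelope sketch (all numbers invented): because only a fraction of incidents ever become calls, a county with fewer incidents but greater trust in law enforcement can log more calls.

```r
# Two hypothetical counties: true prevalence is not observable in the data;
# only the calls appear in the published counts
incidents      <- c(county_a = 150, county_b = 100)  # actual incidents (unknown)
reporting_rate <- c(county_a = 0.3, county_b = 0.6)  # share of incidents phoned in

calls <- incidents * reporting_rate
calls
# county_a county_b
#       45       60
```

County B appears more affected in the dataset even though County A experienced half again as many incidents.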

Describing how the values reported may change over time, Hipolito Angel Cerros further noted that cultural norms around domestic violence have changed over time for certain social groups. As a result of this cultural change, certain communities may be more likely to call law enforcement agencies regarding domestic violence in 2020 than they were a decade ago, while other communities may be less likely to call.

This was one of the course’s more successful labs, which helped students discern the ways in which data are products of the cultural contexts of their production. Dividing the data temporally and geographically helped affirm the dictum that “all data are local” (Loukissas 2019)—that data emerge from meaning-making practices that are never completely stable. Leveraging data visualization techniques to situate data in particular times and contexts demonstrated how, when aggregated across time and place, datasets can come to tell multiple stories from multiple perspectives at once. This called on students, in their role as data practitioners, to convey data results with more care and nuance.


Ethnographically analyzing a dataset can draw to the fore insights about how various people and communities perceive difference and belonging, how people represent complex ideas numerically, and how they prioritize certain forms of knowledge over others. Programmatically exploring a dataset’s structure, schemas, and contexts helped students see datasets not just as a series of observations, counts, and measurements about their communities, but also as cultural objects, conveying meaning in ways that foreground some issues while eclipsing others. The project also helped students see data science as a practice that is always already political, as opposed to something that can potentially become politicized when placed into the wrong hands or leveraged in the wrong ways. Notably, the project helped students cultivate these insights by integrating a computational practice with critical reflection, highlighting how they can incorporate social awareness and critique into their work. Still, the course content could be strengthened to encourage more critical examinations of categories students consider to be standard, and to better connect their choices in data analysis with their own political and ethical commitments.

Notably, there is great risk in calling attention to just how messy public data is, especially in a political moment in the US when a growing culture of denialism is undermining the credibility of evidence-based research. I encourage students to see themselves as data auditors and their work in the course as responsible data stewardship, and on several occasions, we have worked together to compose emails to data publishers describing discrepancies we have found in the datasets. In this sense, rather than disparaging data for its incompleteness, inconsistencies, or biases, the project encourages students to rethink their role as critical data practitioners, responsible for considering when and how to advocate for making datasets and data analysis more comprehensive, honest, and equitable.


[1] I typically assign Joe Flood’s The Fires (2011) as the course text. The book tells a gripping and sobering story of how a statistical model and a blind trust in numbers contributed to the burning of NYC’s poorest neighborhoods in the 1970s.


Bates, Jo, David Cameron, Alessandro Checco, Paul Clough, Frank Hopfgartner, Suvodeep Mazumdar, Laura Sbaffi, Peter Stordy, and Antonio de la Vega de León. 2020. “Integrating FATE/Critical Data Studies into Data Science Curricula: Where Are We Going and How Do We Get There?” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 425–435. FAT* ’20. Barcelona, Spain: Association for Computing Machinery. https://dl.acm.org/doi/abs/10.1145/3351095.3372832.

Bates, Jo, Yu-Wei Lin, and Paula Goodale. 2016. “Data Journeys: Capturing the Socio-Material Constitution of Data Objects and Flows.” Big Data & Society 3, no. 2. https://doi.org/10.1177/2053951716654502.

Bowker, Geoffrey, Karen Baker, Florence Millerand, and David Ribes. 2009. “Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment.” In International Handbook of Internet Research, edited by Jeremy Hunsinger, Lisbeth Klastrup, and Matthew Allen, 97–117. Springer Netherlands. https://doi.org/10.1007/978-1-4020-9789-8_5.

Bowker, Geoffrey C., and Susan Leigh Star. 1999. Sorting Things Out: Classification and Its Consequences. Cambridge, Massachusetts: MIT Press.

Clifford, James, and George E. Marcus. 1986. Writing Culture: The Poetics and Politics of Ethnography: A School of American Research Advanced Seminar. Berkeley: University of California Press.

Flood, Joe. 2011. The Fires: How a Computer Formula, Big Ideas, and the Best of Intentions Burned Down New York City—and Determined the Future of Cities. New York: Riverhead Books.

Geiger, R. Stuart, and David Ribes. 2011. “Trace Ethnography: Following Coordination through Documentary Practices.” In 2011 44th Hawaii International Conference on System Sciences, 1–10. https://doi.org/10.1109/HICSS.2011.455.

Gitelman, Lisa, ed. 2013. “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT Press.

Gray, Jonathan, Carolin Gerlitz, and Liliana Bounegru. 2018. “Data Infrastructure Literacy.” Big Data & Society 5, no. 2. https://doi.org/10.1177/2053951718786316.

Hill, Jim. 1996. “Illegal Immigrants Take Heat for California Wildfires.” CNN, July 28, 1996. https://web.archive.org/web/20051202202133/https://www.cnn.com/US/9607/28/border.fires/index.html.

Loukissas, Yanni Alexander. 2019. All Data Are Local: Thinking Critically in a Data-Driven Society. Cambridge, Massachusetts: The MIT Press.

Martin, Aryn, and Michael Lynch. 2009. “Counting Things and People: The Practices and Politics of Counting.” Social Problems 56, no. 2: 243–66. https://doi.org/10.1525/sp.2009.56.2.243.

Metcalf, Jacob, Kate Crawford, and Emily F. Keller. 2015. “Pedagogical Approaches to Data Ethics.” Council for Big Data, Ethics, and Society. Council for Big Data, Ethics, and Society. https://bdes.datasociety.net/council-output/pedagogical-approaches-to-data-ethics-2/.

Neff, Gina, Anissa Tanweer, Brittany Fiore-Gartland, and Laura Osburn. 2017. “Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science.” Big Data 5, no. 2: 85–97. https://doi.org/10.1089/big.2016.0050.

Plaia, Antonella, and Mariantonietta Ruggieri. 2011. “Air Quality Indices: A Review.” Reviews in Environmental Science and Bio/Technology 10, no. 2: 165–79. https://doi.org/10.1007/s11157-010-9227-2.

Ribes, David, and Steven J. Jackson. 2013. “Data Bite Man: The Work of Sustaining a Long-Term Study.” In Gitelman 2013, 147–166.

Star, Susan Leigh, and Geoffrey C. Bowker. 2007. “Enacting Silence: Residual Categories as a Challenge for Ethics, Information Systems, and Communication.” Ethics and Information Technology 9, no. 4: 273–80. https://doi.org/10.1007/s10676-007-9141-7.


Thanks are due to the students enrolled in STS 115: Data Sense and Exploration in Spring 2019 and Spring 2020, whose work helped refine the arguments in this paper. I also want to thank Matthew Lincoln and Alex Hanna for their thoughtful reviews, which not only strengthened the arguments in the paper but also my planning for future iterations of this course.

About the Author

Lindsay Poirier is Assistant Professor of Science and Technology Studies at the University of California, Davis. As a cultural anthropologist working within the field of data studies, Poirier examines data infrastructure design work and the politics of representation emerging from data practices. She is also the Lead Platform Architect for the Platform for Experimental Collaborative Ethnography (PECE).

Runaway Quilt Project: Digital Humanities Exploration of Quilting During the Era of Slavery


Deimosa Webber-Bey, Pratt Institute School of Information and Library Science


The Runaway Quilt Project began with methods and management exercises for a course during library school, archived on a research blog. The initial goal was to use digital humanities tools to explore the plethora of data that exists regarding quilting during the era of slavery, looking for interesting trends and correlations. The “Maker Unknown” quilt preserves the results of research performed during this course. The following year, this endeavor continued with other library school projects, and the goal evolved from simply exploring quilt data to creating a meaningful interpretation and presentation of the information aggregated. The “Maker Known” quilt preserves the data visualizations created during the second year of research.


The development of the World Wide Web has had a profound impact on quilting at the turn of this century, as the development of the printing press impacted it at the turn of the last. Before the printing press was invented, quilt patterns were passed from quilter to quilter; after it was invented, quilt patterns were mass distributed in magazines and newsletters. Before the invention of the World Wide Web, quilt research was primarily qualitative, relying on the analysis of surviving pieces in museum collections; since the web’s development, it has become possible to study large swaths of surviving quilts held in museums, as well as those personally owned and itemized in various documentation projects. This presents quilt scholars with a wide range of new opportunities.

My grandmother is both a quilter and a teacher, traits which she has passed down to me. She meets weekly with an informal quilting circle, passing on knowledge and patterns to her peers. We are both members of the Empire Quilters Guild, where quilters in the metropolitan area meet monthly for professional development and inspiration. My favorite part of the monthly meeting is the show-and-tell, where twenty to thirty quilts are displayed in rapid succession. Members share recently completed quilts as well as works in progress. These objects of art vary from traditional block patterns in muted tones to innovative uses of color captured in modern abstract designs. Each quilt shown is documented by the guild, and the online gallery of photographs from each meeting expands the audience from those present in the room to anyone with an internet connection. When the meeting is over and this group of predominantly senior citizens leaves the building, folks on the sidewalk who see these women emerging with tote bags and rolling suitcases have no idea that they are crossing paths with fine artists. The meeting is inspirational.

In my second semester at the Pratt Institute School of Information and Library Science, during an introductory Digital Humanities course, our professor, Dr. Chris Alen Sula, advised that we explore the same topic throughout the semester as we learned various digital tools for scholarship. This lab course taught us how to support twenty-first century scholarship in the humanities as librarians, and since it was my opportunity to engage any topic I desired, I decided to study African American quilting in the nineteenth century and use a blog, which I named the Runaway Quilt Project, to enable public interaction and feedback. This project eclipsed the one semester and eventually took center stage during my time at Pratt, and this article reflects two years of experimentation with digital humanities tools.

Trained as both an African and African American Studies scholar and a quilter, I am intrigued by the myth that quilts were used as signs on the Underground Railroad, and I am more than aware of the controversy that surrounds it. In fact, I avoid taking sides; as an African American female quilter I want to believe that women in slavery and their quilting peers on the outside were capable of such acts of resistance, but as an academic I know that there is no smoking gun/quilt and that the case against this legend has been well established. However, I chose to create digital objects that tell the story of quilting during the era of slavery so that I understand the context of this debate, and the resulting materials neither support nor refute the myth. These digital objects serve as a foundation for exploring trends in the data and developing research questions.

The tools and experimentation process take center stage in this narrative because the main purpose of each exercise was to learn how to create a digital object for a class assignment. Sometimes a later assignment allowed me to explore an idea or discovery in more depth, and at other times the research pursued a tangent or jumped into something new. At the end of each year, following discussions surrounding the preservation challenges of digital research projects, I created a tactile object – a quilt – that captured information and visualizations that could be represented on a two-dimensional surface. This allowed me to archive my research and share it with a non-academic audience, such as during the show-and-tell at the Empire Quilters Guild meeting. Ultimately, I entered these archival quilts into the International Quilt Festival to share my experience and use these objects of interest to drive traffic to the research blog that contains two years of work. This provides me with a critical mass of views so that I can eventually evaluate the blog and research process using altmetrics.

Altogether, this article privileges process over product. At the beginning, I examine the background for this research, including Brackman numbers, digitization projects, and the myth of a quilt code. This leads into textual analysis and introduces Gracie Mitchell, a quilter and ex-slave interviewed by the WPA Federal Writers Project in 1938. I present the experiments that were inspired by Mitchell’s interview transcript individually, with a brief discussion of the digital object at the end of each section; they are not analyzed in full because my intention is to show rather than tell. At the end of this paper, I discuss documentation, preservation, and sharing, and I introduce the revised data quilt that preserves my work.


In his seminal essay, “English and the African Writer,” Chinua Achebe discusses the use of English for communication:

The African writer should aim to use English in a way that brings out his message best without altering the language so much that its value as a medium of international exchange will be lost. He should aim at fashioning an English that is at once universal and able to carry his peculiar experience. I have in mind here the writer who has something new, something different, to say. (Achebe 1997, 347)

In the twenty-first century I interpret this charge to now include the use of HTML, the medium of international exchange on the World Wide Web, and so I attempt to use the tools of Digital Humanities to say something new and different about my particular experience as an African American quilter. The simple informational displays created for this project constitute practice for me, preparing me for future in-depth research.

Data visualization and the African & African American studies scholar

The week that we learned to use Tableau Public, a free data visualization tool, I created a digital object that has nothing to do with quilting, but everything to do with my interest in resistance during slavery. I turned to the Voyages Trans-Atlantic Slave Trade Database, where there are 34,946 records in the “List of [slave] voyages” from 1514 to 1866. The website enables users to create visualizations, but users can also create and download custom tables, so I downloaded a table with only the data for voyages where the slaves resisted their captors. I experimented with several different chart configurations before I settled on a bubble map:

Figure 1: Bubble map for resistance during Middle Passage

Figure 1. Bubble map for resistance during Middle Passage

Left to right shows the continuum of days during Middle Passage, the size of the circle markers shows how many slaves were on a ship, and the color indicates the region the slaves were from. Interesting questions that emerge from the data visualization include:

  • Why did slaves from Senegambia (pale green) who resisted tend to rebel on the first third of the voyage?
  • Why did slaves from Bight of Benin (dark orange) who resisted tend to rebel during the second half of the voyage?
  • Why did slaves from Sierra Leone (purple) and the Windward Coast (lavender) who resisted tend to rebel about fifty days into Middle Passage?

Creating this visualization gave me good practice with Tableau Public, and this example demonstrates the potential for using data visualizations to develop research questions.

Quilt digitization and documentation projects

The exciting part of exploring this subject in the digital humanities realm is that, with the affordances of the Internet and quilt documentation projects, particularly the Quilt Index, I was able to sit at my desk and download data for thousands of quilts. Previously, conducting this work might have required a lifetime of traveling to museums and private homes to collect information. Since museums and research collections have posted their collections online and hosted documentation days, where private quilt owners bring in quilts to be photographed, dated, and sometimes placed geographically, the Quilt Index has been able to collect these virtual collections in one place on the web. This allowed me to build on their work and to create data visualizations and interactive digital objects that I and the public can explore, looking not for answers, but for interesting trends that led to new research questions and serendipitous discovery.

Brackman numbers

Many of the quilts in the Quilt Index and other archival collections are tagged with a Brackman number, from Barbara Brackman’s Encyclopedia of Pieced Quilting Patterns, which provides authority control for cataloging. This encyclopedia standardized the classification of quilts in 1993. Previously, patterns might be described in many different ways, but Brackman (1993) divides patterns into 25 categories, “classified and grouped into categories on the basis of the basic unit of design and the way it is repeated (its repeat). These visual categories are usually defined by seam lines that organize designs into types” (13). According to Janice Price (2011), Collection Manager at the International Quilt Study Center and Museum and a graduate of the University of Nebraska-Lincoln’s unique quilt emphasis program, Brackman numbers are universal identifiers understood across the entire quilting domain. Brackman’s The Encyclopedia of Pieced Quilting Patterns and The Encyclopedia of Applique are considered the essential guides, with the potential to make databases interoperable. When quilt archivists are cataloging a quilt with no known title, location, or quiltmaker, eliminating the question of author or title entry, they begin by dating it within a twenty-year range. To do this, curators identify patterns using the 1” x 1” pictures in Brackman’s encyclopedia. They then examine the fabric and its colors, as well as whether the item was machine-stitched. Price acknowledges that dates change frequently as quilt historians make new discoveries. The quilt revival has been able to cope with the explosion of quilt documentation due to the affordances made possible by Brackman’s meticulous work.

The myth of a quilt code

One subject that has occupied popular conversation in the quilt domain since the 1999 publication of Hidden in Plain View is the theory that quilt blocks were used as a code system on the Underground Railroad (Tobin 1999). Jacqueline Tobin advances the idea that different block patterns had unique meanings and that, when the quilts were hung outside on clotheslines, fences, and porch railings, runaway slaves potentially interpreted the coded messages and acted accordingly. This controversial theory has two camps: one which asserts that there is no proof, and one which believes that this act of resistance is obvious precisely because of the lack of proof, since success required discretion. It therefore occupies what Mary Louise Pratt refers to as a “contact zone,” the space between oral history and the written record, where these two distinct camps meet, clash, and grapple with each other (Sharpe 2003, 4). An example of this is the March 2003 Traditional Quiltworks article, “Betsy Ross redux: the Underground Railroad ‘Quilt Code,’” expanded into an e-book on the author’s website, in which Leigh Fellner (2006) meticulously refutes the details included in Hidden in Plain View and devotes space to a “Hall of Shame” for “A seemingly endless cavalcade of retailers (almost exclusively white) who use slavery, African-Americans, and the ‘Underground Railroad Quilt Code’ as a marketing tool.” However, in this current global culture, picture books such as Sweet Clara and the Freedom Quilt, in which a slave creates a map quilt with the Underground Railroad route stitched onto its surface, and the proliferation of quilt code lesson plans on the Internet ensure that this idea will persist into subsequent generations and feed popular consumption (Sharpe 2003, 40).

Jenny Sharpe (2003), in Ghosts of Slavery, argues that in the absence of written information historians must turn to oral histories to examine conjecture and understand what may have happened (25). It is therefore important to note that the data I use for this project is incomplete: the quilts that have survived and inform this research represent a fraction of the quilts and quilters active in the era I am studying, and the experiments I conducted with this data are as significant for what they do not say as for what they do. Like Sharpe, I am using an incomplete dataset to better understand a subject we cannot definitively know, since both the quilts and the slave narratives cited in this study represent only what has survived the passage of time. Sharpe (2003) writes:

Rather than equating a black female subjectivity with individual consciousness or modes of self-expression like songs and testimonies, I locate it between written and oral histories, first-person and third-person accounts, pro- and antislavery writings, and at the point where the unspoken narratives of everyday life intersect with the known stories of slavery. In noting the inadequacy of language, I also denote the limits of this study as an effort to describe the everyday lives of female slaves, about which we have much to learn but can never fully know (xxvi).

I have been aware for years of the controversy regarding whether or not quilts were used as signposts on the Underground Railroad, so rather than subscribe to the arguments of either side, I used this opportunity in library school, while learning tools for twenty-first century scholarship, to research quilting during and directly after the era of slavery within the general American and specifically African American population. Recognizing my limits, as Sharpe does, my goal was to create objects that stimulate conversation so that I can learn from scholars and quilters alike, facilitating serendipitous discovery and recording the cultural knowledge possessed by my grandmother and her quilting peers.

In Ghostly Matters, Avery Gordon (2008) writes about the need to investigate “that which makes its mark by being there and not there at the same time,” and the goal of the Runaway Quilt Project is not to prove or disprove a legend passed down through oral tradition (6). My objective is to understand the context of this legend while contributing empirical information to the field of African American and quilt scholarship that facilitates quantitative analysis for a variety of academic inquiries. At a minimum, quilting allowed African American women a form of artistic expression during a time of subjugation and dehumanization, and as Saidiya Hartman cautions, we should not “overestimate the subversiveness of everyday acts of resistance in the face of terror and cruelty suffered by slaves and the constraints placed on their agency” (quoted in Sharpe 2003, xv). However, the overwhelming appeal of the idea of a quilt code to the human psyche creates not only a hook for this research exercise but also an opportunity for future investigation into how slave myths and legends formed, as well as how they inform African American culture today. As Gordon (2008) writes:

[A]ny people who are not graciously permitted to amend the past, or control the often barely visible structuring forces of everyday life, or who do not even secure the moderate gains from routine amnesia, that state of temporary memory loss that feels permanent and that we all need in order to get through the days, is bound to develop a sophisticated consciousness of ghostly haunts and is bound to call for an “official inquiry” into them (151).

African American quilters have not been allowed to enshrine this everyday act of resistance into the historical record because of its foundation in oral tradition, a lack of concrete evidence, and details that may have become exaggerated over the passage of time. Yet, the persistence of this legend required me to address it, since it has become enshrined in the narrative of popular culture, and my opinion on this controversy is, and will continue to be, a question frequently asked by family, friends, peers, and colleagues. My answer has to be informed.

Oral history and digital annotation

For our first foray into digital humanities, we explored digital annotation, and I decided to take advantage of the Library of Congress’ online American Memory collection. During the 1930s, the Works Progress Administration conducted interviews with African Americans who were former slaves. Searching the more than 2,300 interviews digitized for Born in Slavery: Slave Narratives from the Federal Writers’ Project, 1936-1938 for the keyword “quilt,” I found that 156 (6.78%) mention quilting (Library of Congress Manuscript Division 2001). Over the course of a week, I read each of these interview transcripts, and themes began to emerge. As I transcribed, I divided the quotes into categories:

  • Quotes from (or about) specific female slaves who quilted
  • Quotes that explain how quilts were made
  • Quotes that describe how quilts were used
  • Quotes from anecdotal stories where quilts are mentioned
  • Quotes about a term that was new to me – the “quilting party”
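The keyword tally described above can be reproduced in a few lines of code. Below is a minimal sketch, where the `tally_keyword` helper and the toy transcripts are my own illustrations rather than any part of the Library of Congress interface:

```python
def tally_keyword(transcripts, keyword="quilt"):
    """Count how many transcripts contain the keyword (case-insensitive
    substring match, so "quilt" also catches "quilting" and "quiltin")."""
    hits = sum(1 for text in transcripts if keyword in text.lower())
    return hits, len(transcripts), hits / len(transcripts)

# Toy stand-in transcripts; the real corpus is the 2,300+ WPA interviews.
demo = ["We had a quiltin every winter.",
        "Corn shucking was the big day.",
        "Grandma pieced a Log Cabin quilt."]
hits, total, share = tally_keyword(demo)
print(f"{hits} of {total} transcripts ({share:.2%}) mention quilting")
# → 2 of 3 transcripts (66.67%) mention quilting
```

Run against the full set of downloaded transcripts, the same calculation yields the 156-of-2,300 (6.78%) figure reported above.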

To share this work in an interactive forum, I created a Digress.it website, where users can comment on each quote individually, engage the text, debate interpretations, and make their own meaning of it. These brief mentions of specific patterns, methods, and staying late into the night at a “quilting” show that the craft was economically, socially, and politically essential to the community. Unfortunately, Digress.it no longer appears to be a functional website, so I created pages on the blog that list the quotes by topic, though users can no longer comment at the paragraph level. One transcript that caught my immediate attention was that of Gracie Mitchell, interviewed in Pine Bluff, Arkansas. Like an Empire Quilter, Mitchell conducted a show-and-tell with her captive audience, interviewer Bernice Bowden. In the notes for Mitchell’s interview, Bowden included a list of the twenty-two quilt designs that Mitchell showed her that day (Mitchell 1938). This list, and Mitchell’s transcript, became central to my research experience, influencing the direction of further inquiry.

English literature and line graphs

The Google Ngrams tool measures how often phrases occur in books scanned as part of the Google Books project, and so I looked at terms related to large gatherings of slaves, such as “cornhuskings,” “log rollings,” and “quilting parties,” as well as quilt block names. These were analyzed in relation to relevant time periods and terms, such as “runaway slave” and “underground railroad,” and presented as graphs with observations.
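The (sm=…) values in the figure captions that follow are the Ngram Viewer’s smoothing setting: each plotted point is the average of the raw yearly frequencies within that many years on either side. A minimal sketch of that moving average, where the edge handling (truncating the window at the ends of the series) is my assumption rather than Google’s documented behavior:

```python
def smooth(series, sm):
    """Ngram-style smoothing: the value at position i becomes the mean of
    the raw values within sm years on either side (window truncated at
    the edges of the series)."""
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - sm), min(len(series), i + sm + 1)
        window = series[lo:hi]
        out.append(sum(window) / len(window))
    return out

raw = [0.0, 0.0, 3.0, 0.0, 0.0]  # a one-year spike in relative frequency
print(smooth(raw, 1))            # → [0.0, 1.0, 1.0, 1.0, 0.0]
```

With sm=0 the raw yearly values are plotted unchanged, which is why Figures 5 through 7 below, drawn with low smoothing, show sharper year-to-year spikes than Figures 2 through 4.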

To begin with, quiltings, candy pullings, log rollings, and corn shuckings were significant social gatherings identified by ex-slaves interviewed for the Federal Writers’ Project in the 1930s. Explaining “quiltins,” three interviewees state that, with permission from their masters, slaves were able to attend a quilting at another plantation, which presented a significant opportunity for socializing and the communication of ideas (Avery 1936, 3; Davis 1938, 9; Mullen 1936, 3). Regardless of whether they were intentionally subversive gatherings, the presence of alcohol and limited oversight is significant. The “quilting party” also figures significantly in 19th century American texts, as shown in Figure 2 below.

Figure 2: English quilt term variants 1800-1900 (sm=5)


This chart shows that the most popular way to refer to the event being discussed was “a quilting,” followed by the plural version “quiltings.” Both of these use the string of letters q-u-i-l-t-i-n-g as a noun, and its use as a noun is far more popular than its use as an adjective to modify “bee” or “party.” Investigating the sources scanned in Google Books that make up this data confirms that in the phrases “a quiltin” and “a quilting,” q-u-i-l-t-i-n-g is being used as a noun, not a verb or adjective. A quilting was its own significant categorical event in the American psyche. All of the terms occur less frequently in British English publications, and both “quilting party” and the informal variant “a quiltin” are absent altogether, as shown in Figure 3 below.

Figure 3: 19th century British English quilt gathering term variants (sm=5)


This chart indicates that “quilting parties” and “quiltins” were uniquely American phenomena. The “quilting party” is mentioned more in nineteenth-century American print publications than any of the terms in British print. Returning to Figure 1, these American phenomena were first mentioned in print in 1820.

  • The phrase “quiltin” peaks in American print in the 1840s (~0.000000850%) and “quilting party” in the 1850s (~0.000001750%).
  • The phrase “quiltings,” the most frequently occurring variant in British print, was most popular in the 1830s (~0.000001600%).

Filtering out British publications, I analyzed the terms to examine the American experience (Figure 4).

Figure 4: American English quilt gathering term variants (sm=3)


  • Most of these variants show a rapid increase in American print usage of the terms between 1820-1850.
  • The phrases “quiltin,” “quiltings,” and “quilting party” slowly decrease in occurrence in American print throughout the twentieth century.
  • The “quilting bee” replaces “quiltin,” “quiltings,” and “quilting party.”

The phrase “a quilting” sees revival and growth beginning in the 1970s, but investigating the Google Books used for the data set shows that, for the latter half of the twentieth century, quilting is used as an adjective in these occurrences, not as a noun. Falling out of usage first is “a quiltin,” hitting a low during the 1920s and never recovering. The phrase “quilting party” occurs about as often in the 1940s as it does in the 1850s, holding consistent for roughly one hundred years before a steep decline in the 1950s. The pluralized noun “quiltings” begins to decrease in popularity in the 1880s and continues to decline throughout the twentieth century. Social quilting gatherings by these names have a life span of about 140 years, and likely their audience did as well – meaning that the quilters who used this specific term were a distinct group of women born in the mid- to late-nineteenth century, who died in the early to mid-twentieth century without passing on the habits related to this term to the next generation of quilters. Within the span of one generation of these particular nineteenth-century American quilters, a phenomenon arose that did not exist before, and it has since faded from collective discussion. If they did not pass down the term or practice of “quiltings” to their daughters, then we can infer that related information was also lost.

Searching for quilt needles in a haystack

While the correlation between English literature and African American quilters is slim, curiosity required that I look at the frequency of phrases associated with the Underground Railroad quilt code legend. Figure 5 shows the introduction of related phrases into American English texts.

Figure 5: American English URR Myth (sm=1)


  • The uniquely American social gatherings to finish quilts predate the popular phrase “Underground Railroad,” but not the act of resistance embodied in the phrase “runaway slave.”
  • The two Google Books showing “Underground Railroad” mentioned in 1800 are incorrectly dated and are actually from 1860 and 1890.

As Figure 6 indicates, of the social gatherings for slaves with the potential for planning such an endeavor, a “quiltin” or “quilting party” is the most significant.

Figure 6: Terms for slave gatherings in American English during the URR (sm=0).


Looking at the events surrounding significant bumps on this chart:

  • The Fugitive Slave Act of 1850 requires that runaway slaves be returned to their owners.
  • In 1854 the Republican Party forms.
  • In 1857 the Dred Scott decision states that the Bill of Rights does not apply to slaves.
  • In 1858 Abraham Lincoln, nominated by the newly formed Republican Party, runs for Senate.
  • In 1860 Abraham Lincoln is elected President of the United States.
  • January 1, 1863, President Lincoln signs the Emancipation Proclamation.

After looking at significant terms for slave gatherings, I examined phrases used to describe quilt gatherings, the act of resistance, and pieced quilt block patterns that were included in Bowden’s interview notes for Gracie Mitchell. In his WPA slave narrative interview, Walter Rimm describes a quilting party where a runaway slave escapes a trap set for him by patrollers, yelling “Bird in de air!” into the night as he makes his getaway (Rimm 1936-1938, 2). This is of particular note because the “bird in the air” or “birds in the air” pattern is one of the quilt block patterns most frequently associated with the Underground Railroad quilt code myth. It is theorized that this pattern (aligned triangles) was hung on a railing, porch, or clothesline, with the triangles aiming north, south, east, or west, so that escaped slaves could determine which direction to go. It does not prove anything, but the fact that a slave yells this particular phrase while running away from a quilting party is both interesting and significant.

Figure 7: URR terms and quilt patterns in American English (sm=1)


The theory of a quilt code has many enthusiasts, but no concrete evidence; quilt historians and Underground Railroad scholars have disputed and criticized the idea since it came to light. An examination of the relative occurrence of all social slave events and acts of escape shows that quilt gatherings had the most potential for the correlation of activities, and looking at the quilt patterns associated with the myth demonstrates that the phrases “railroad crossing,” “breakfast dish,” “birds in the air,” and “half an orange” were used significantly in the decade prior to Emancipation. However, Figure 7 does not offer any compelling circumstantial evidence that would support or refute claims that these patterns were used for a quilt code.

Gracie Mitchell: Snapshot of an African American Quilter

I created a timeline for Gracie Mitchell, using Timeline JS, in order to present an interactive object that gives context for the era during which she lived. In addition to dates from her life that she described during the interview, I researched the significant dates for the quilt patterns that she showed her interviewer in 1938. Copying the template provided by the Timeline JS site, I created a Google Sheet with time-series data drawn from Gracie Mitchell’s interview transcript, to which Bernice Bowden appended two pages of notes. While one notes page lists all of the quilts that Mitchell showed Bowden that day, another lists details such as the exact duration of Mitchell’s residence in each state and the years that she moved. I then researched relevant historical items (including still images, audio, and video) for each time span, topic mentioned, or location lived in, and placed each in its appropriate place chronologically. I set the timeline to begin with the date she was interviewed, so that the user reads it by moving backwards in time from 1938. This software is easy to use and produces digital objects that are simple to navigate.

Figure 8: Timeline for Gracie Mitchell


Geospatial mapping – part one

For our first foray into mapping I used the Leaflet Maps Marker tool (a WordPress plug-in) to create interactive maps of 19th century quilt pattern occurrences, relying on data from the online collection of the International Quilt Study Center and Museum. Gracie Mitchell’s interview transcript provided me with a thematic grouping of quilt patterns to work with, and each map uses one of Leaflet Maps Marker’s symbols to represent a quilt pattern. The symbols are placed where the quilt was made. Clicking on the symbol provides a pop-up window with metadata for the particular quilt, and the aggregated map, shown in figure 9, depicts all of the maps layered together.

Figure 9: Aggregate map of 5 Gracie Mitchell designs


One constraint of this activity at the time that I worked on it was that only five of Gracie Mitchell’s quilt patterns were mapped. Of those that were mapped, most of the occurrences are concentrated in the Northeastern United States.

  • Broken Dishes: The earliest example of this design is from New England, circa 1860-1880.
  • Sawtooth: This is the second pattern design listed in the 1938 interview record. There is one early quilt (circa 1830-1850) that was produced in Alabama, but most of the pattern occurrences (including the earliest, circa 1820-1840) are in modern-day Pennsylvania.
  • Tulip Appliqué: Gracie Mitchell referred to appliqué as “laid work.” This design made it to Indiana by 1860.
  • Cactus: Because Gracie Mitchell called the design she completed “Prickle Pear,” it is possible that she appliquéd the design. However, the interviewer did not list “appliqué” or “laid work” as she did for other pieces that used that method.
  • Birds in the Air: This pattern has a strong affiliation with the quilt code myth, and the phrase is used significantly in an ex-slave’s interview, as noted earlier. Ms. Bowden lists it as “Birds All Over the Elements” in the interview transcript.

This mapping exercise was both intriguing and frustrating, as it had incredible potential for analysis, but I was not working with the ideal tool. One year later I would return to this problem.

Network analysis

For my final Digital Humanities class project in the spring of 2012, I decided to preserve the data I curated during the project (Jan-May 2012) on a quilt top. Using Cytoscape, open source software for visualizing networks, I created a frequency map of the quilt block patterns that were mentioned by Gracie Mitchell in 1938 and also existed in the 19th century. The size of the block shows how often the quilt pattern occurs in the IQSCM collection. The colors of the log cabin block, the largest square (one side of the quilt), represent the five types of quilts. The quilt is framed with a border that provides the source code for the home page of the project website in May 2012, and the hanging loops around the edges represent the ethnicities of the women working together on an object at a quilting party. The network of quilt blocks is constructed by hand and by machine, using multicolored thread.

Figure 10. Planning notes, quilt front (log cabin block with quotes), and quilt back (remaining blocks)

To finish the quilt, I held a quilting party with some of my female friends, and after presenting the quilt in class and at the Pratt SILS Student Showcase, I entered it in the show-and-tell at the Empire Quilters’ meeting in May 2012.


I created a video to capture the process of creating this quilt, titled Maker Unknown because 81% (141 out of 175) of the nineteenth-century quilts used for research in this project were listed in the IQSCM catalog as “Maker Unknown.” The quilt is dedicated to the slave women who labored as quilters, creating art for everyday use, and to all post-Emancipation quilters and quilt enthusiasts who keep their legacy alive.

  • The quilt was constructed in April and May 2012 in New York City (Brooklyn and Queens).
  • The quilt blocks were pieced by Deimosa Webber-Bey (Log Cabin, Tulip Appliqué, Orange Peel, Reel, Railroad Crossing, Tree of Life, Sunflower, Birds in the Air, Bird’s Nest, Ocean Wave, Drunkard’s Path, Leaf) and her grandmother, Marian Webber (Feathered Star, Carolina Lily, Sawtooth Star, Broken Dishes, Cactus, Whirligig).
  • The Log Cabin block was made by printing the data and images aggregated during the project, as well as the QR and source code for runawayquiltproject.org, onto fabric treated with Bubble Jet Set.
  • Quilting around the data strips on the Log Cabin block was done by hand in variegated thread by Tiana Grimes, Erica Schwartz, and Deimosa Webber-Bey at a quilting party, in keeping with nineteenth-century construction techniques; the hanging loops represent enslaved nineteenth-century quilters sitting around a quilt, and they are made from African (15/16) and Native American (1/16) fabric (in 9 out of 156 interviews used for this project, the ex-slave identified Native American heritage).
  • The information on the Log Cabin block faces out from the center in all directions so that it is best engaged when placed on a table, with observers seated around it making their own meaning of the information presented.
  • There is a suggested citation on the quilt top.

Ego network

During the spring of 2013, I took Dr. Sula’s Information Visualization course, and I returned to the problem of network analysis. For this visualization experiment I retrieved a dataset from the Quilt Index by searching for quilts made in each geographic location where Gracie Mitchell lived, during the years she lived there. Her interviewer, Bernice Bowden, recorded the places where Mitchell lived during her life and the years that she lived in each. The result is a sample dataset of her contemporaries and the patterns being made around her throughout her life. While Gracie Mitchell’s quilts may not have physically survived, their occurrence – or instantiation – was documented by an authority, so the goal of this ego-centric network was to create a digital object that shows the context in which Gracie Mitchell quilted and chose the twenty-two patterns that she executed.

First, using Cytoscape, I uploaded the column with locations as the source, the quilter column as the interaction, the pattern column as the target, and the year column as an edge attribute. Then, in the Custom Graphics Manager pane I added a Sawtooth Star icon to the list. I changed all of the nodes to Sawtooth Stars, except for the locations, for which I found public domain state icons, and I changed the font to Courier, which resembles the typeface used in the WPA narratives. Then I selected the degree-sorted circle layout, so that Cytoscape would run a calculation that would allow me to use degree as a node attribute.

Next I chose to use the date attribute for continuous mapping of the edge color. This allowed me to create a gradient that identifies the period when each quilt was made; I chose to leave the edges black for quilts made during the era of slavery. Then I changed the node sizes so that each pattern node is 10x its in-degree and each location node is 10x the number of years that Gracie Mitchell lived in that place. I also changed the edge line style for Gracie Mitchell to dashed lines and the edge line style for all of the other quilters to dotted lines. At this point I made minute manual adjustments to the positions of a few nodes in order to minimize overlap from the node labels. I shared my network with a few friends to get a sense of its readability and what story it was telling, and I got a suggestion to position the locations relative to each other as they are geographically.
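The degree-sorted layout and the 10x node sizing both depend on each pattern node’s in-degree: the number of quilter-pattern edges pointing at it. A minimal sketch of that calculation outside Cytoscape, where the sample rows are invented but mirror the uploaded columns (source = location, interaction = quilter, target = pattern, edge attribute = year):

```python
from collections import Counter

# Invented sample rows mirroring the uploaded edge table:
# (location, quilter, pattern, year)
edges = [
    ("Texas", "Gracie Mitchell", "Sawtooth Star", 1938),
    ("Texas", "Quilter A", "Sawtooth Star", 1885),
    ("Arkansas", "Quilter B", "Orange Peel", 1900),
]

# In-degree of a pattern node = number of edges pointing at it;
# pattern node sizes in the network are 10x this value.
in_degree = Counter(pattern for _, _, pattern, _ in edges)
node_size = {p: 10 * d for p, d in in_degree.items()}
print(node_size)  # → {'Sawtooth Star': 20, 'Orange Peel': 10}
```

Cytoscape computes the same in-degree values internally once the degree-sorted layout is run; the sketch simply makes the sizing rule explicit.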

Figure 11: Edge weighted force directed layout


I was concerned that the network gives the impression that all of Gracie Mitchell’s quilts were made in 1938, so after manipulating the nodes awhile, I settled on placing the geographic place names in the center, Gracie’s quilts in an inner circle (degree sorted), and the quilts made around her in an outer circle (sorted alphabetically). This places Gracie in the center of the network and shows how the quilt patterns she chose fit into the larger context of where she was living and how she was influenced by the quilters around her. The network visualization depicts the patterns created in the states where Gracie Mitchell lived while she was there. The lines represent each quilter that executed the pattern and the line colors indicate the year the quilt was made.

Figure 12: Ego centric quilt network


With my final project for Information Visualization increasingly resembling a Reconstruction-era American flag, I created an ego network with a circular layout, placing the Quilt Index data on the outer ring, Gracie Mitchell’s twenty-two designs in an inner ring, and the three residences in the center. The visualization suggests the following:

  • Gracie Mitchell probably became familiar with certain patterns, such as the Feathered Star, Sawtooth Star, Log Cabin, and Sunflower, while living in Texas, where she resided until she was almost 40.
  • She possibly learned how to do the Tree of Life pattern while she was living in Chicago for eight years.
  • Only the Orange Peel overlaps with work that has survived from Arkansas.

In comparison with other quilters of her location and era, one can infer that her pattern selection was influenced by her experiences in other states. The fact that very few of her designs overlap with those of her contemporaries shows that she was experimental, and perhaps one of the first in her area to purchase a book of patterns; she states in her interview that she had a book of patterns, lent to a friend and never returned (Mitchell 1938, 2).

Big Data and Its Affordances

At the beginning of the 2013 spring semester, I decided to use a more comprehensive database, the Quilt Index, to continue the project. The Quilt Index is somewhat comparable to WorldCat, in that it is a compilation of records from many different quilt collections. It is a free, open-access project of Matrix, Michigan State University Museum, and the Quilt Alliance. Its metadata is less consistent, but the pattern name and year are almost always present, which was essential to the interactive map problem I was working on. Using the search tool, I was able to construct large comparison tables for item records and then copy and paste them into Excel. After bringing the file into Google Refine, a software program for cleaning messy data, I was able to enforce a controlled vocabulary for the quilt pattern names in about 750 item records and format all of the dates consistently.

Using Google Refine, I clustered/merged the patterns and quilter names and then clustered/merged the dates as text facets in order to delete the “c” for circa in front of some dates and turn date ranges like “1860-1890” into the earliest potential occurrence (“1860”). After that I transformed the column into dates. After some additional cleaning, I was able to make several linked datasets available openly through Google Sheets.
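Those Refine transforms — stripping a leading “c” for circa and collapsing a range such as “1860-1890” to its earliest year — can be expressed as a small function. A minimal Python equivalent, where the `earliest_year` helper is my own illustration rather than part of Google Refine:

```python
import re

def earliest_year(raw):
    """Normalize a Quilt Index-style date string to its earliest year:
    strip a leading 'c'/'circa' and collapse a range to its start,
    e.g. 'c1860', 'circa 1860', and '1860-1890' all become 1860."""
    years = re.findall(r"\d{4}", raw)
    return int(years[0]) if years else None

print(earliest_year("c1860"))      # → 1860
print(earliest_year("1860-1890"))  # → 1860
```

Applying a rule like this to every record is what Refine’s text facets and transforms did in bulk before the column was converted to a date type.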

Geospatial mapping – part two

Now that I had a significant dataset, occurrences of the twenty-two patterns Gracie Mitchell used during the years 1800-1849, I transferred it to a Google Fusion Table. With this brand new tool, I was able to view the quilt pattern occurrences on a full screen map and share a link where others could zoom in and out, and they would finally be able to filter by pattern! This was an improvement over the WordPress plug-in that I used during spring 2012, which only allowed me to add markers one at a time (and I never finished building the layers for all twenty-two patterns). The map shows all of the patterns with a different icon, and the user can zoom in and filter. Regarding icons, I chose images that somewhat related to the name of the pattern (for example, “drunkard’s path” is represented with a martini glass), but the best imagery for this map would be the actual quilt blocks. As this map is very cluttered, I knew that in the end I would be making small multiples in order to facilitate analysis of the data.

Geospatial mapping – part three

Returning to my mapping challenge once again, I used Tableau Public to create an individual map for each block pattern. Initially, I had a complex spreadsheet and cluttered maps, but I decided to only show the oldest instances of the quilt pattern documented in the quilt index (roughly ten or fewer data points) so that users can consider the possible geographical origin for each design. They can also compare the designs to each other, getting a sense of what patterns are the oldest in the dataset.

Before uploading my dataset of 171 records into Tableau, I had to clean it. If the location where a quilt was made was not clear in the full item record (in the “location,” “provenance,” “quilt history,” or “quiltmaker address” fields), then I entered a location based on the address of the owner (usually a relative or descendant of the quiltmaker), the person who brought the quilt in for donation or documentation. There were about a dozen instances where I instead entered the location of the contributor (to the Quilt Index) or the quilt collection. Next, I uploaded the file into Tableau and created a map visualization using the pattern field, which gave me horizontal maps. I used the pattern field again to create columns, which gave me a grid of square maps. The ones that I needed ran in a diagonal from the top left of the visualization to the bottom right, so, using Microsoft Paint, I cropped screen shots of each pattern map for small multiples:


Overall I was pleased with the way that the maps turned out. Some of them are dense, and in order to fit them in the same size square you lose readability (such as with the Log Cabin, Sunflower, Tulip, and Reel), but it is a worthy sacrifice for the overall effect. I was able to save my visualization to the web with Tableau, and embed the maps into my blog.
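The filtering step described above — keeping roughly the ten oldest documented instances of each pattern — can also be done before upload. A minimal Python sketch with invented records, where the `oldest_per_pattern` helper is hypothetical rather than a Tableau feature:

```python
from collections import defaultdict

def oldest_per_pattern(records, n=10):
    """Keep only the n earliest-dated records for each pattern.
    records: iterable of (pattern, year, location) tuples."""
    by_pattern = defaultdict(list)
    for rec in records:
        by_pattern[rec[0]].append(rec)
    return {p: sorted(rs, key=lambda r: r[1])[:n]
            for p, rs in by_pattern.items()}

# Invented sample records:
records = [("Log Cabin", 1870, "PA"), ("Log Cabin", 1845, "NY"),
           ("Sawtooth", 1830, "AL")]
print(oldest_per_pattern(records, n=1))
# → {'Log Cabin': [('Log Cabin', 1845, 'NY')], 'Sawtooth': [('Sawtooth', 1830, 'AL')]}
```

Thinning each map to its earliest instances is what keeps the small multiples legible while still suggesting a possible geographical origin for each design.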

Heat map

In order to get a sense of the popularity of each of the twenty-two patterns over time, I retrieved a dataset from the Quilt Index that spanned 1840-1940 – the 100 years prior to Gracie Mitchell’s WPA interview. Using Google Refine, I made authoritative choices for pattern names and dates (changing circa spans to a specific year), and then, using Tableau, I created a heat map with the data presented in five year “buckets,” where color codes frequency in the matrix.
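The five-year “buckets” behind the heat map amount to rounding each year down to a multiple of five and counting pattern-year pairs. A minimal sketch with invented records, where the `five_year_bucket` helper is my own illustration rather than a Tableau function:

```python
from collections import Counter

def five_year_bucket(year):
    """Map a year to the start of its five-year bucket: 1843 -> 1840."""
    return year - (year % 5)

# Invented (pattern, year) pairs; the heat map color-codes these counts.
quilts = [("Log Cabin", 1843), ("Log Cabin", 1844), ("Sunflower", 1851)]
counts = Counter((pattern, five_year_bucket(y)) for pattern, y in quilts)
print(dict(counts))  # → {('Log Cabin', 1840): 2, ('Sunflower', 1850): 1}
```

Each (pattern, bucket) count then becomes one cell in the pattern-by-period matrix that Tableau renders as the heat map.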


At the end of the spring 2013 semester, I aggregated final versions of the information visualizations into a whole cloth design for a quilt, a second draft of the “Maker Unknown” quilt that I constructed the previous spring, and had the design printed onto fabric. This textile object and infographic is a more thoughtful execution of my research into Gracie Mitchell, and so it is named “Maker Known.” I decided to approach this infographic quilt with three questions:

  • How was Gracie Mitchell influenced by the quilt(er)s around her?
  • What is the geographical origin for each pattern?
  • How popular were each of these designs during the final decades of slavery in the U.S. and during Gracie Mitchell’s lifetime?

Figure 13: Data quilt 2.0 - "Maker Known"


The overall data quilt is designed to resemble a Reconstruction-era flag, and the colors used are hues of red, white, and blue; red is consistently used to emphasize significant data, and the white background is a word cloud generated with Wordle from the RQP digital annotation exercise. The heat map runs across the bottom of the data quilt top, adding to the flag impression with more horizontal stripes. It shows the frequency with which twenty-one of the twenty-two patterns were made; the Log Cabin pattern occurs most frequently, making up almost half of the items retrieved for the data set, so it is highlighted with its own chart, separated from the heat map because it is an outlier that heavily skews the visualization when included. In the heat map, vibrant red codes high frequency and dark blue codes low frequency. In the Log Cabin chart, as in the small multiple maps, red codes an item as older, blue as newer, and size codes frequency.

Overall, this infographic quilt represents an effort to use data to place the historical figure Gracie Mitchell in context. The visualizations show that she was experimental, executing several established patterns that were not frequently made by her contemporaries. She also created a few quilt tops that demonstrate her familiarity with traditional patterns that existed during the era of slavery and continued to be popular throughout her lifetime. I quilted the final tactile object at home, by machine, and then submitted this quilt and “Maker Unknown” to the International Quilt Festival.


Ultimately, “Maker Known” was accepted into the International Quilt Festival, and the quilt traveled from August 2013 to August 2014. In the fall of 2013, I traveled with my sister and grandmother to the show in Houston, Texas, where I saw my quilt hanging in one of the most highly esteemed quilt shows in the field. While I still feel that I am an amateur quilter, I know that the piece earned its place because of the eighteen months of research and design that went into it. It was important as well, as a personal accomplishment, to share the experience of traveling to Houston to see my work in the show with my grandmother in her eighty-fifth year. This is my proudest achievement to date.

Figure 14: "Maker Known" at the International Quilt Festival in Houston, TX

Now that the quilt has been returned, it is my hope that traffic to the blog will continue and that readers will comment on and annotate the digital objects. These second, third, and fourth sets of eyes may identify ghosts that I could not see in the data. Ideally, this research will engage quilters as well as digital humanists and African American Studies scholars, whose knowledge of the craft can piece together the past and unleash the imaginative force of what might have been (Sharpe 2003, xii). These conversations will inform my future research.


Avery, Celestia, interview by Minnie B. Ross. 1936. Born in Slavery: A Few Facts of Slavery (November 30). OCLC #47265597

Brackman, Barbara. 1993. Encyclopedia of Pieced Quilt Patterns. Paducah, KY: American Quilter’s Society. OCLC #27812938

Davis, Minnie, interview by Sadie B. Hornsby. 1938. Plantation Life as Viewed by an Ex-slave. OCLC #47265597

Fellner, Leigh. 2006. “Betsy Ross redux: the Underground Railroad ‘Quilt Code.’” Hart Cottage Quilts. http://ugrrquilt.hartcottagequilts.com/ (accessed September 20, 2014).

Gordon, Avery. 2008. Ghostly Matters: Haunting and the Sociological Imagination. Minneapolis: University of Minnesota Press. OCLC #232663637

Library of Congress Manuscript Division. 2001. Born in Slavery: Slave Narratives from the Federal Writers’ Project, 1936-1938. Washington, DC. OCLC #47265597

Mitchell, Gracie, interview by Bernice Bowden. 1938. Born in Slavery: Arkansas Narratives, Volume II, Part 5 (November 1). OCLC #47265597

Mullen, Mack, interview by J.M. Johnson. 1936. Born in Slavery: Mack Mullen (September 8). OCLC #47265597

Price, Janet, interview by Deimosa Webber-Bey. 2011. Collection Manager, International Quilt Study Center & Museum (November 1).

Rimm, Walter, interview by WPA. 1936-1938. Born in Slavery: Ex-slave stories (Texas). OCLC #47265597

Sharpe, Jenny. 2003. Ghosts of Slavery: A Literary Archaeology of Black Women’s Lives. Minneapolis: University of Minnesota Press. OCLC #50479199



About the Author

Deimosa Webber-Bey, MSEd MSLIS, is a librarian and educator with a passion for young adult literature, graphic novels, postcolonial subjects, and quilting. An undergraduate English and African & African American studies major from Dartmouth College, she was a New York City Teaching Fellow and, in addition to several years in the classroom, she has worked in the public library system as a teen librarian. She is the Associate Librarian at Scholastic Inc. and has worked as an adjunct reference and special projects librarian at CUNY Brooklyn College. A regular contributor to Scholastic’s On Our Minds blog, she can also be found on Goodreads or Twitter @dataquilter.
