Short Form Pieces

Visualizing Essay Elements: A Color-Coding Approach to Teaching First-year Writing

In this piece, I share a strategy for teaching first-year writing in which students color-code and annotate sample rhetorical analysis and research-based essays for elements including citations, quotations, transition words, vocabulary, and structure.

Introduction

Digital platforms such as Google Docs offer spaces for students to visualize, conceptualize, and collaborate on learning. While teaching writing, I have observed that first-year students often struggle with academic writing skills including developing ideas, incorporating and citing sources, and organizing essays. In this article, I share a Google Docs activity I designed and implemented in a first-year writing class that asks students to color-code and annotate sample rhetorical analysis and research-based essays for elements such as citations, quotations, transition words, vocabulary, and structure. The assignment enhances students’ metacognitive awareness of the characteristics of essay writing in various genres and supports the social construction of learning. In addition, this activity fosters first-year writing learning goals including teaching students to develop strategies for composing and to “create complex, analytic, well-supported arguments that matter in academic contexts and beyond” (UM English 124/125 Learning Goals).

Color-Coding Rhetorical Analysis Essays

While teaching the rhetorical analysis genre, I encourage students to interpret rhetorical appeals such as logos, ethos, and pathos. I ask students to select an article of their choice that is targeted toward an audience within a particular discourse community that they are familiar with or interested in examining. As I state in the essay prompt (included below), I instruct students to “analyze and evaluate the effectiveness of the rhetorical strategies that the author employs in order to communicate his/her purpose. In addition, interpret the text in relation to the larger context.”

To scaffold the essay writing process, I share sample essays with the class and have students work in small groups to evaluate each essay based on the rubric criteria, which include development/argument, structure/organization, and language/craft. In addition, as a way to facilitate students’ awareness of the language-level elements of essay writing, such as citations and transition words, I developed color-coding guidelines that ask students to color-code the following elements in each body paragraph of a sample essay and, optionally, in their own essay drafts. As a way to enhance accessibility for students with varying visual needs, I included options for identifying the elements using bold, italic, underline, and highlight:

  • Citations to sources in red/bold
  • Quotations from sources in green/italic
  • Transition words/phrases in blue/underline
  • Precise vocabulary/word choice in purple/highlight

In one class activity, students worked in small groups to color-code the sample essay “A Lopsided Pyramid: An Analysis on Michael Greger’s Call to Ditch the Dairy.” I shared the essay with the class as a Google Doc, and each student accessed the doc on their individual laptops, so that each member of the class could edit the doc simultaneously. I assigned each group to highlight a different body paragraph.

This image shows a screenshot of a color-coded paragraph from a rhetorical analysis essay, with the citations in red, quotations in green, transition words in underline, and vocabulary in purple
Figure 1. Screenshot of a color-coded rhetorical analysis essay paragraph.

As illustrated in the screenshot, one group highlighted the name of the article’s author (Greger) in red, a quotation in green/italics (“accelerated aging, being overweight, canker sores, kidney stones, childhood asthma, constipation, prediabetes and diabetes, prostate and other cancers, heart disease, imbalanced hormones, mucus, Parkinson’s disease, rheumatoid arthritis, rising blood pressure, skin wrinkling, sudden infant death syndrome, ulcerative colitis, bacterial vaginosis, and Multiple Sclerosis”), transition words and phrases in blue/underline (“in addition,” “though the article is short,” and “also”), and precise vocabulary in purple (“methodically,” “shear,” “extremely,” “sophisticated,” “compromised,” and “bombarded”). Following the activity, I asked each group to share their observations about the paragraphs they highlighted.

Color-coding Research-based Essays

This image shows a screenshot of instructions for annotating the research essay; each group is assigned a different section to identify and annotate.
Figure 2. Screenshot of color-coding instructions for the sample research-based essay.

In addition to engaging students in this rhetorical analysis activity, I have adapted this color-coding exercise for research-based essays. In the research-based essay assignment, I ask students to examine a topic of their choice by conducting secondary as well as primary research, including observations, surveys, or interviews. The assignment encourages students to contribute to a conversation that holds interest or significance to them. As students often investigate issues that arise from their personal interests, this genre can include traditionally narrative elements such as scene-setting and description as well as typical elements of research articles, including secondary sources, primary data collection, findings, analysis, and discussion. In this sense, this genre is hybrid in form, interweaving aspects of narrative, analytical, and research-based writing. Observing that students sometimes struggle with conceptualizing ways to structure their essays, especially when approaching unfamiliar genres such as this one, I developed an activity in which students highlight the sections of a sample essay entitled “A Tale of Two Ice Cream Stores.”

As shown in the instructions, each group is assigned to identify, highlight, and annotate a different section of the essay (exposition and context, driving questions/hypothesis, secondary source evidence, data collection and results, analysis and discussion, and conclusion).

This image shows a screenshot of a highlighted section of the research essay; students’ annotations comment on the driving question and data collection.
Figure 3. Screenshot of students’ highlights and annotations of the sample research-based essay.

As shown in the screenshot above, for example, a student highlighted the driving question in purple (“Did locals, like my best friend, support Stucchi’s more than Ben & Jerry’s because of its connection to Ann Arbor?”) and commented, “Driving question of how a local business can compete with a large corporation lead to the hypothesis regarding the connection with Ann Arbor.” Another group highlighted the description of the primary data collection in green, while a student commented, “Data Collection through a survey posted to Facebook.”

This image shows a screenshot of the concluding section of the research essay; students’ annotations comment on the conclusion.
Figure 4. Screenshot of students’ highlights and annotations of the sample research-based essay.

As shown in the second screenshot above, four students annotated the final paragraphs of the essay with their comments. As seen in the highlighted sections, the analysis (“what they stand for, and the quality of their products”) leads naturally into a conclusion (“But convenience comes with a price…”), while a further rhetorical question is posed within the conclusion (“Do we truly want everywhere to look the same? Have the same familiar stores, the same familiar logos?”). This screenshot illustrates the multilayered, intersecting nature of writing purposes and structures, showing that the sections are not necessarily linear, sequential, or mutually exclusive. Although this sample essay serves as one possible model or guide for approaching the assignment, I encourage students to structure their essays in flexible ways, according to their intended purposes for writing. For instance, I note the way the writer of the sample essay recursively generates new driving questions throughout the essay as opposed to only presenting initial questions in the opening.

Discussion

As illustrated, these color-coding activities render visible and legible the discrete elements of writing and enhance students’ metacognitive awareness of academic argumentation in various rhetorical situations. While collaborating on a Google Doc, students co-construct meaning by identifying key characteristics of written genres and by annotating the essays with their own commentary. Even so, one possible limitation of this approach might be that conceptions of writing are reduced to the surface-level features of a writing sample. In the future, I seek to encourage critical discussion of the shifting, evolving nature of academic writing as constructed by continual interactions across generic and discursive contexts, as writers shape and become shaped by the discourses they create. In this sense, students can become inspired not only to form their written compositions, but to transform their conceptions of writing as well.

Appendixes

Appendix A – Color-Coding Guidelines
Appendix B – Rhetorical Analysis Essay Prompt
Appendix C – Research-Based Argument Essay Prompt

About the Author

Ruth Li is a PhD student in the Joint Program in English and Education at the University of Michigan, Ann Arbor. Her research examines first-year students’ writing about literature, with attention to the reading-writing connection. More broadly, she is interested in literacy studies, writing development, applied linguistics, and digital tools and technologies for supporting writing research, pedagogy, and assessment. She serves as an instructor in the English Department Writing Program, where she has taught first-year and upper-level writing classes.

A banner for a student website, featuring a menu below a 19th century painting of many women in Graeco-Roman antiquity.

Ushering “Women in Antiquity” into the Modern Classroom

This assignment was created for a 200-level course cross-listed between Classics and Women & Gender Studies, entitled “Women in Mediterranean Antiquity.” The website is a work-in-progress, and any questions or collaboration inquiries can be sent to chelsea.gardner@acadiau.ca.

Introduction

This article introduces a project that I developed for an undergraduate course on the subject of Women in Mediterranean Antiquity at Mount Allison University in the winter 2017 semester. The aim of this project was to provide undergraduate students with an introduction to digital platforms in a historically archaic field and provide said students with skills that would impart digital literacy and valuable knowledge to benefit them regardless of their future career endeavors (see Macauley-Lewis 2015). The concept of the 21st-century university student as a “digital native” is problematic, and I echo Brandon Locke’s argument that although “many in higher education generalize their undergraduate students as being well acquainted with technology and approach their studies through a digital lens, students often struggle when it comes to critical content creation and mediation” (Locke 2017). I present here the methodology used in creating the assignment, its successes and failures, and future directions. The syllabus used for the course and instructions for implementation and evaluation are included as downloadable supplementary materials.

Description of the Assignment & Methodology

In lieu of a traditional research essay, students were asked to participate in what I termed a digital research assignment that required each student to submit 8–10 pages of research on a topic of their choosing (related to Women in Mediterranean Antiquity) with a comprehensive bibliography, and then to populate their own webpage within a larger website that I built using WordPress. This project was particularly well suited to Women in Antiquity because this was a course that seemed to have obvious topic selections for the student research that would then be placed into general headings on the website.

Once the traditional research was submitted, graded, and returned, students embarked upon the “digital” portion of this assignment. As mentioned above, I designed and set up a website through the WordPress.com platform (using the Gateway Theme). I chose WordPress for three primary reasons: 1) it is free to operate; 2) I had prior experience with the platform; and (most importantly) 3) WordPress is a widely used platform with a lot of user support: as of May 2018, WordPress “runs 28.9% of the entire internet” (Karol, 2018; 31.6% on W3Techs, n.d.). WordPress is user-friendly but also allows students to explore HTML options and coding language. Any party interested in adopting a version of this project for their own classroom would not be limited to WordPress but could instead explore the many options available online and select an alternative platform for a collaborative website (Kick, 2013). That said, as I repeated to my students throughout the semester, chances are that if they are facing a technical problem, someone else has already found the answer to it, and the resources available to WordPress users are extensive.

On the WordPress.com platform, the students each created a “blog post” that I later nested under parent headings so that it appeared instead as a static “page” to the website visitor. Within their posts, students had full creative license for design & media (including coding and CSS). I provided step-by-step instructions as to how to initially set up their pages, which are included here as Appendix B and are also freely available on the website under the Resources tab for any instructor interested in embarking upon a similar project. The students were strongly encouraged to set up their websites for a general audience and to make their pages as visually appealing as possible, but they were required to use only open-access images and media.

Final Product & Results

There were 47 original student contributions to the website, which I arranged into thematic groupings that appeared as drop-down menus on the main page (Greek Women, the Female Body, Roman Myth, etc.). I encouraged the use of real names and emphasized the merit of their contribution to this publicly available online resource; however, participants could remain anonymous in their authorship of the page through a chosen pseudonym, and any students who did not want their page to remain public could let me know and I’d remove it immediately following the end of the semester. My initial goal for this assignment was to include it every time I taught “Women in Antiquity” or a similar course, thus creating an ever-expanding resource on the subject of women in the ancient world. Anticipating a range in the quality of submissions, I informed students at the beginning of the course that their contributions would not necessarily be permanent, but that pages might be taken down and/or altered in future versions of the class. Due to the outstanding nature of several of the contributions, I decided that with the permission of the student authors, the best contributions would remain on the site long-term, with the ultimate goal that with each iteration of the course, further outstanding contributions would be added to the permanent version of the site, thus creating a growing open-access resource for the study of women in the ancient world.

Student Reactions & Feedback

The student feedback for this project was primarily—but not universally—positive. Anonymous feedback from course evaluations included comments such as, “The digital research project is fun and helped me hone applicable skills in website construction”; “Loved the digital component of the course and that we were able to choose from several topics, or our own”; “The website idea was great as it was a less stressful assignment”; and “Website making was so cool!” I asked a few students to provide more extensive feedback for this publication:

The Women in Antiquity website project is enjoyable as it allows students to write about a topic that they are interested in and be free from the usual boundaries of the traditional essay. I enjoyed being able to write in a less formal way and I felt like being able to share my research in this way helped me to gain a deeper understanding of my topics… this project not only benefits students within the class but also allows for others to access this wealth of information. —Caitlin McGowan

I really enjoyed the process of editing my research paper to see it evolve from one with a distinctive academic tone to a piece that was user friendly, engaging, and tailored to a blog platform, yet still maintained the credibility of academic writing. —Caroline Chamandy

I found the Women in Antiquity Digital Project component of Dr. Gardner’s course one of the most engaging projects of my undergraduate career. It taught me how to reconceptualize the way in which I thought about and approached my research. Being able to share my work with family, friends, and peers as a highly accessible and informative tool was very rewarding. —Janan Assaly

Negative feedback from students is also worth sharing, since it provides important perspectives to consider for future improvement. One student enrolled in my course told me at the end of the semester that her friend had dropped this course as soon as he’d seen the digital project on the syllabus. In her words, he “just wanted to write a traditional essay and be done with it.” This student was an exception, but this reaction and the pushback against digital media do, in some cases, exist. Another student, who remained in the class and produced an excellent webpage, was the only individual (out of 47 students in my initial iteration of this course) who wanted their page taken down. This student generously provided feedback about their decision to remove their content, and I received their permission to share it here:

Personally, I am a private person, who tries to have as minimum a presence online as possible, therefore, I prefer not to have my name associated with something published on the internet. Overall, I felt that I was not in the right position to be educating the masses on my chosen topic because I was unsure if I truly believed what I had written. —Anonymous

This student’s insightful concerns are valid, because they bring up the question of whether there is any benefit to providing even more online content when students already have difficulties in evaluating the legitimacy of existing online sources (Basulto, 2017; Fleming 2018). Ultimately, I believe that researched, appealing academic content—even at the undergraduate level—is beneficial and valuable, especially in light of the inevitability that a large percentage of the population is accessing information online. By encouraging my students to provide reliable content with active hyperlinks to additional reliable, academic content (such as JSTOR, Diotima, and Lacus Curtius), we can strengthen the network of reputable online information.

Outreach & Impact

A recurring theme in this student feedback—both positive and negative—is the exposure of the site, which has far surpassed initial expectations. Students were encouraged to share their pages on social media platforms and to include many tags for their pages in order to generate search engine hits. In 2017 the site had 12,597 views; by 2018 the site had 63,290 views and as of November 2019 there have been 75,347 visitors. There is currently an average of 238 visitors per day. One of our most successful student contributions is the page Spinning and Weaving in Ancient Greece by Marion Blight, which was featured in the Seattle Weaver’s Guild May 2018 monthly bulletin and currently has 11,204 views. Links to our site are included in TedEd lessons (The myth of Arachne which links to Blight’s page and the legend of Medusa which links to Cheryl MacKinnon’s Medusa and her Sisters: The Gorgons); a post on Grunge (that links to Pregnancy and Childbirth by Keelin Howe); and an article on the Conversation (Barker, 2018; on Hetairai: The Ancient Athenian Courtesan by Samuelle Saindon). Adrienne Mayor, author of The Amazons: Lives and Legends of Warrior Women across the Ancient World (2014), found the site and offered permission to use some of her images for the page The Amazons by Dexter Fennell. Our top referrers for website traffic are search engines (105,214 views) while Facebook is a distant second (1,756 views); the site is also included among the online resources of courses at universities including Colorado State University, Kansas University, The Open University, Sewanee: The University of the South, Fashion Institute of Technology in New York (SUNY), Vassar College, Charleston College, and Memorial University. I was not aware that these institutions were using the site for their courses until I consulted the online statistics, which means that we are gradually achieving our desire to be an open-access resource for general public and scholarly audiences alike.

Future Directions & Collaborations

I decided to transform this assignment into an international, collaborative project, inviting instructors from any institution to incorporate this assignment into their undergraduate or graduate courses. The motivation to do so was twofold: first, this would provide a way for the website to expand continuously with new contributions from institutions all over North America, thereby bolstering the content and availability of resources for the study of Women in Antiquity; and second, this project offers instructors and students alike a viable means to begin engaging with Digital Humanities and alternative scholarship. For those without the vast quantities of time required to master even basic DH skills such as website-building, digitization, and database creation, there is a continued need for introductory-style pedagogical projects that can provide a viable solution for all scholars who want or need to embrace digital applications in the university classroom (Boss and Kraus 2007; The Pedagogy Project).

A different instructor at a different institution can teach this course and assign this project each semester, in order to continually add to and improve upon the existing content and to strengthen the collaborative networks that are fundamental to the Digital Humanities (Griffin & Hayler 2018). It is my personal hope that this website continues to grow and improve with the contributions of the next generation of scholars, encouraging the study of Women in Antiquity and the production of open-access information for a global audience, ultimately creating a comprehensive and collaborative resource for the foreseeable future.

Appendices

Appendix A – Original Course Syllabus
Appendix B – Instruction Slides
Appendix C – Digital Research Project Instructions

Bibliography

Basulto, Dominic. 2017. “Information Overload? There Has Always Been Too Much to Know.” BigThink. Accessed September, 2018.
https://bigthink.com/endless-innovation/information-overload-there-has-always-been-too-much-to-know.

Boss, Suzie, and Jane Kraus. 2007. Reinventing Project-Based Learning: Your Field Guide to Real-World Projects in the Digital Age. Eugene: ISTE.

Fleming, Grace. 2018. “Bad Sources for your Research Project.” ThoughtCo. Accessed September, 2018.
https://www.thoughtco.com/bad-research-sources-1857257.

Griffin, Gabriele, and Matt Steven Hayler. 2018. “Collaboration in Digital Humanities Research: Persisting Silences.” Digital Humanities Quarterly 12, no. 1. www.digitalhumanities.org/dhq/vol/12/1/000351/000351.html.

Karol, K. 2018. “WordPress Stats: Your Ultimate List of WordPress Statistics (Data, Studies, Facts—Even the Little-Known).” CodeinWP blog. Accessed September, 2018. https://www.codeinwp.com/blog/wordpress-statistics/.

Kick, Verena. 2013. “02. Collaboratively blogging / authoring a website in the foreign-language classroom.” HASTAC Online. Accessed September, 2018.
https://www.hastac.org/blogs/vkick/2013/11/01/02-collaboratively-blogging-authoring-website-foreign-language-classroom.

Locke, Brandon T. 2017. “Digital Humanities Pedagogy as Essential Liberal Education: A Framework for Curriculum Development.” Digital Humanities Quarterly 11, no. 3.
http://www.digitalhumanities.org/dhq/vol/11/3/000303/000303.html#brake2014.

Macauley-Lewis, Elizabeth. 2015. “Transforming the Site and Object Reports for a Digital Age: Mentoring Students to Use Digital Technologies in Archaeology and Art History.” The Journal of Interactive Technology and Pedagogy 7.
https://jitp.commons.gc.cuny.edu/transforming-the-site-and-object-reports-for-a-digital-age-mentoring-students-to-use-digital-technologies-in-archaeology-and-art-history/.

W3Techs Web Technology Surveys. n.d. “Usage of content management systems for websites.” Accessed September, 2018.
https://w3techs.com/technologies/overview/content_management/all.

About the Author

Chelsea A.M. Gardner is an Assistant Professor of Ancient History in the Department of History & Classics at Acadia University in Nova Scotia, Canada. She is a Classical Archaeologist who specializes in the history and material culture of the ancient Mediterranean, and her research focuses on archaeological exploration in southern Greece. She currently works in the Mani peninsula, and is the co-director of The CARTography Project, a DH mapping project that analyzes and recreates the routes of early modern travellers. Her other interests include ancient and modern cultural identity, ancient religious space, the history of travel, archaeological survey, women in the ancient world, animals and nature in antiquity, landscape studies, and—of course—Digital Humanities.

Computational Thinking–Centered Pedagogy: A Collecting Data with Web Scraping Workshop

A library workshop introduces participants to using Python-based web scraping for data collection and raises important questions for how we think about teaching computational thinking and prepare users to consider the ethical implications of the tool.

Introduction: The Challenges of Library Instruction

Library-based instruction can be a tricky thing. We usually only have one chance to give a workshop or to visit a class, so there is a pressure to get it right the first time. This is hard enough when teaching first-year students the basics of information literacy, and it presents an additional set of challenges for technology-based instruction. New technical skills are rarely acquired in 60- or 90-minute sessions. They more often require longer periods of study and build on a foundation of other technical skills that one cannot assume all participants will have (Shorish 2015; Locke 2017).

NYU Libraries offers a range of technical workshops designed to provide this technical foundation, and sessions cover topics such as quantitative and qualitative software, GIS and data visualization, research data management, and digital humanities approaches. While these workshops can be taken as one-off sessions, they are designed as part of an interwoven curriculum that introduces technical skills and concepts in an incremental way. The Collecting Data with Web Scraping workshop discussed here is offered as a digital humanities course and, while there are no prerequisites and it is open to all, participants are encouraged to take the Introduction to Python and Text as Data in the Humanities workshops in advance (NYU Libraries 2019).

The workshop introduces web scraping techniques and methods using Python’s Beautiful Soup library, with a focus on developing participants’ computational thinking skills. I always emphasize that no one becomes an expert on web scraping in this 90-minute workshop, especially given that some have no previous programming experience. However, participants still learn valuable skills and concepts and through this process develop a more foundational understanding of computational logic and its affordances when applied to digital research. I call this computational thinking and it is the primary learning outcome of the workshop.

Agenda and Learning Outcomes

The workshop is divided into four sections, with the agenda as follows:

  1. Why use web scraping?
  2. What are the legal and ethical implications?
  3. Technical introduction and setup
  4. Hands-on web scraping exercises

The sections are designed to fulfill the workshop’s learning outcomes:

  1. Strengthen computational thinking skills
  2. Learn the concepts and basic approaches to web scraping
  3. Understand how web scraping relates to academic research
  4. Understand the broader legal and ethical context of web scraping

A Computational Thinking Centered Pedagogy

The primary learning objective of this workshop is to help participants strengthen their computational thinking skills. A basic working definition of computational thinking is understanding the logic of computers. It seems obvious, yet worth stating, that computers prioritize different patterns of logic than humans. There are multiple layers of complexity to understanding computational logic and then applying it in real-world research and teaching environments, and my approach is to reveal and make explicit some of these layers. For example, one of the core activities of the workshop is an in-depth look at how websites are packaged and how data, broadly defined, is structured within them using HTML and CSS. This close look at one of the building blocks of the web then allows us to identify patterns in this structured data in order to extract the useful pieces of information and build the collection. More importantly, these lessons are applicable to contexts beyond web scraping and are transferable to our other workshops or to any activity involving data work. This gives the workshop added value and empowers participants to be more confident and comfortable using technology in their research (Taylor et al. 2018).
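
As a minimal sketch of what identifying patterns in structured HTML can look like, the example below pulls every book title out of a small page by relying on one regularity: each title sits inside a span element with the class "title". It uses only Python's standard-library html.parser module, a stand-in chosen so the example runs without installing anything; it is not the Beautiful Soup library the workshop teaches. The sample page, the class names, and the TitleCollector class are hypothetical and invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page: each book title sits in a <span class="title">.
SAMPLE_HTML = """
<ul id="books">
  <li><span class="title">Common Sense</span> <span class="year">1776</span></li>
  <li><span class="title">The Federalist</span> <span class="year">1788</span></li>
</ul>
"""


class TitleCollector(HTMLParser):
    """Collect the text of every <span class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while we are inside a title span
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs from the opening tag.
        if tag == "span" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_title = False

    def handle_data(self, data):
        # Keep text only when it appears inside a title span.
        if self.in_title:
            self.titles.append(data.strip())


collector = TitleCollector()
collector.feed(SAMPLE_HTML)
titles = collector.titles
print(titles)
```

The point of the sketch is the pattern rather than the parser: once the repeated structure (tag plus class) has been spotted in the page source, extraction becomes a matter of describing that structure to the program.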

In addition to identifying patterns in structured data, there are countless other opportunities to provide insights that give participants a deeper understanding of how technology works. For instance, when introducing the Beautiful Soup library, I describe how programming libraries are just blocks of code that allow us to write our program with 10 lines of code instead of 100. There are several web scraping programs written in other languages, but I chose Python because it has a robust developer community. That is, there are people contributing to a whole network of libraries, like Beautiful Soup, that serve to expand the functionality so that once you have extracted data from a website and are ready to analyze it, you can simply import another library, such as spaCy or the Natural Language Toolkit (NLTK) to do your next phase of work (Explosion AI 2019; NLTK Project 2019). When built into the curriculum in a thoughtful way, these parenthetical notes make it easier to learn the material at hand and also to establish a wider technical context for the work.

Why Use Web Scraping?

In addition to inserting computational thinking vignettes throughout the workshop, I find it helpful to begin with a discussion of why one might use web scraping. Since the workshop’s primary audience is humanists, this discussion of when web scraping is (and is not) appropriate and how it can be used in research is particularly useful. For example, as more and more primary and secondary source materials are appearing on/as websites, it is increasingly common for scholars to need to gather this material. Within libraries, archives and museums, initiatives such as Collections as Data underscore a shifting approach whereby library collections are conceptualized and provided as data (Always Already Computational 2019). Projects such as OPenn demonstrate how a library’s digitized special collections can be made accessible as machine-readable data ready for large-scale analysis (University of Pennsylvania Libraries 2019). An additional example, the New York Society Library’s City Readers project, presents the Library’s early circulation records as data, allowing users to, for example, compare whether John Jay or John Jacob Astor read more books in a given year (The New York Society Library 2019). Such examples help participants envision how they could use web scraping in their work.

A graph showing a comparison of circulation statistics for various authors.
Figure 1. A data visualization from the New York Society Library comparing circulation statistics among patrons.

Another core concept of the workshop is that web scraping will become one of many skills in participants’ “digital toolbox” and can connect with other technical skills used in the research lifecycle. For example, data gathered from web scraping is often messy and needs additional scrubbing in a program like OpenRefine (Metaweb Technologies, Inc. 2019). Or web scraping might be just one step in a text analysis project, and you might next use a named entity recognition (NER) package to extract names of people or places from the scraped dataset.

What are the Legal and Ethical Implications?

Next is a conversation about the legal and ethical implications of web scraping. The key lesson here is that just because you can scrape a website, it doesn’t mean you should. It is important to first check a site’s terms of use policy to understand whether there are rate limitations or if scraping is outright prohibited. Collecting certain types of online data on human subjects (e.g. some types of social media data) will require IRB approval. After collecting data, scholars will also need to consider how the data will be stored or archived and whether this has the potential to put others at risk. This is a particularly pertinent concern for materials dealing with controversial subject matters or underrepresented groups. The Documenting the Now project has many great resources to help navigate these often complex issues (Documenting the Now Project 2019).
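Alongside reading the terms of use, many sites also publish a robots.txt file that states, in machine-readable form, which paths automated programs are asked to avoid and how long to pause between requests. It is a separate convention and does not replace the terms of use, but it is a quick first check. The sketch below uses Python's standard urllib.robotparser module; the robots.txt content, the example.org URLs, and the program name workshop-scraper are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# hypothetical robots.txt content, invented for illustration
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

USER_AGENT = "workshop-scraper"  # hypothetical name for our program


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = load_rules(SAMPLE_ROBOTS_TXT)
# ask whether our program is permitted to request each address
public_ok = rules.can_fetch(USER_AGENT, "https://example.org/listings/")
private_ok = rules.can_fetch(USER_AGENT, "https://example.org/private/data")
# seconds the site asks us to wait between requests, or None if unstated
delay = rules.crawl_delay(USER_AGENT)

print(public_ok, private_ok, delay)
```

In a real script, the requested delay (or a conservative default when none is given) could be passed to time.sleep between page requests.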

In terms of research best practices, it also takes some data literacy basics to evaluate your target source. There is a lot of garbage online, so how do you know the data is what it claims to be? Is it representative, and what biases does it contain? Research projects using digital sources or methods are no different from more traditional approaches in that getting the data or producing a visualization of it is often not the end of a project. In most cases, the data must then be analyzed in a theoretical framework of the scholar’s discipline in order to form a scholarly argument. The earlier cited example of the New York Society Library illustrates this well – the circulation record visualization shown above is an interesting anecdote, but the image is a relatively simple data visualization and does not actually tell us anything meaningful about, say, the American Revolution or eighteenth-century reading patterns.

Using Beautiful Soup

While asking participants to bring their own laptops and setting them up with their own Python environments provides rich opportunities for moments of computational thinking, it is time intensive, demanding on the instructor, and requires a longer workshop. A simpler approach is to use an already existing environment such as JupyterHub, PythonAnywhere, or a computer lab with Jupyter Notebook installed (Project Jupyter team 2019; PythonAnywhere LLP 2019; Project Jupyter 2019).

Beautiful Soup is a Python library for extracting textual data from web pages (Richardson 2019). This data could be dates, addresses, news stories, or other such information. Beautiful Soup allows you to target specific data within a page, extract the data, and remove the HTML markup surrounding it. This is where computational thinking skills are needed. Webpages are intended to be machine readable via HTML. The goal is to write a program, in machine readable form, that extracts this data in a more human readable form. This requires that we “see” as our computers “see” in order to understand that if, for example, we want the text of an article, we need to write a program that extracts the data between the paragraph tags.

<p></p>

Once we understand the underlying rules for how pages are displayed – i.e. using HTML and CSS – we can start to see the patterns in how content creators decide to present different types of information on pages. And that is the computational thinking logic behind web scraping: identifying these patterns so that you can efficiently extract the data you need.
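To make the idea of “seeing” as the computer sees more concrete, here is a minimal sketch of the logic of extracting the text between paragraph tags. It deliberately uses Python's standard html.parser module rather than Beautiful Soup, so that each step is visible; Beautiful Soup handles this bookkeeping for you. The class name ParagraphExtractor and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found between <p> and </p> tags."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while we are inside a <p> element
        self.paragraphs = []       # one string per paragraph found

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # keep text only while inside a paragraph; skip headings, menus, etc.
        if self.in_paragraph:
            self.paragraphs[-1] += data


# hypothetical page, invented for illustration
SAMPLE_HTML = """
<html><body>
  <h1>Site name</h1>
  <p>First paragraph of the <em>article</em>.</p>
  <div class="menu">Home | About</div>
  <p>Second paragraph.</p>
</body></html>
"""

extractor = ParagraphExtractor()
extractor.feed(SAMPLE_HTML)
for paragraph in extractor.paragraphs:
    print(paragraph)
```

The heading and the menu are skipped because they sit outside the paragraph tags, which is the same pattern-based selection the workshop asks participants to reason about.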

Computational Thinking in Action

The examples used in the workshop are available online (Coble 2019), and working through the first example – collecting the titles from the Craigslist page for writing, editing, and translation – will illustrate some of these concepts. While the research value of this data is rather limited, it is a straightforward example to introduce basic techniques that are built upon in subsequent examples.

 

A screenshot from the Writing / Editing / Translation section of Craigslist, showing various offers and prices.
Figure 2. Screenshot of Craigslist page for writing / editing / translation.

The first step is to use the browser’s View Source feature to look at the page’s HTML code. Not only do we get a quick glimpse into how the data is structured, we can also begin to identify the parts of the code that uniquely mark the title of these posts.

 

A page with the HTML source code for the previous figure.
Figure 3. Screenshot of Craigslist page source code for writing / editing / translation.

For example, here is the source code for the first post on our page:

<a href="https://newyork.craigslist.org/mnh/wet/d/brightwaters-college-tasks-essay-exams/6998844623.html" data-id="6998844623" class="result-title hdrlnk">🎼 🎼 College Tasks | Essay | Exams | Course Help 🎼 🎼</a>

Let’s start by breaking this into parts:

a href="https://newyork.craigslist.org/mnh/wet/d/brightwaters-college-tasks-essay-exams/6998844623.html"

The above part is the link to the full post. We don’t want this because it’s not the title.

data-id="6998844623"

This looks better, but data-id appears to be a unique identifier for a specific post. If we write a program to search for this, it will only return one title. This won’t work because we want all titles of posts on our page.

class="result-title hdrlnk"

This looks much better. But there are actually two class names here, “result-title” and “hdrlnk,” combined in a single class attribute and separated by a space, so which one is best? We can do a quick check by searching on the View Source page – using Cmd+F or Ctrl+F – for “result-title.” There are 120 posts displaying on my page, and the search for “result-title” returns 120 results. Bingo!

 

Another image of source code being searched.
Figure 4. Screenshot of source code for Craigslist and the browser’s search feature.

We can repeat this process for “hdrlnk,” which, in this case, also returns 120 results. So we can comfortably use either “result-title” or “hdrlnk” for our program. To be safe, I would also do a quick manual check of other links on the page – both links for posts and other links (My Account, Save Search, etc.) – to confirm that “result-title” and “hdrlnk” are unique strings that will return the post’s title and only the post’s title.
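The same check can be done programmatically instead of with the browser’s search box. The sketch below counts how many links carry each class name, again using only the standard library so the logic is visible. The class name ClassCounter and the shortened sample markup (three posts and one navigation link) are invented for illustration and stand in for the full page source.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how many <a> tags carry each class name."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; one class attribute
        # may hold several space-separated class names
        class_value = dict(attrs).get("class") or ""
        self.counts.update(class_value.split())


# hypothetical, shortened stand-in for the page source
SAMPLE_HTML = """
<a href="/account" class="header-link">My Account</a>
<a href="/post/1" data-id="1" class="result-title hdrlnk">First post</a>
<a href="/post/2" data-id="2" class="result-title hdrlnk">Second post</a>
<a href="/post/3" data-id="3" class="result-title hdrlnk">Third post</a>
"""

counter = ClassCounter()
counter.feed(SAMPLE_HTML)
print(counter.counts["result-title"], counter.counts["hdrlnk"])
```

If the count for a class name matches the number of posts on the page, and no navigation link shares that name, it is a reasonable candidate for the selector.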

And this is the computational thinking the workshop helps to build. By understanding how web pages use HTML and CSS to structure their contents, we are able to isolate patterns unique to our target data and to use these patterns to extract the target data. Once we have these pieces in place, we can write a program that looks like this:

# import the urllib library to get the HTML
import urllib.request
# import the Beautiful Soup library to parse the HTML
from bs4 import BeautifulSoup

# define a variable with our web page
# ("wet" is the writing / editing / translation section seen in the post link above)
start_url = 'https://newyork.craigslist.org/search/wet'
# ask urllib to get the HTML from our web page
html = urllib.request.urlopen(start_url).read()
# ask Beautiful Soup to parse the web page as HTML
soup = BeautifulSoup(html, 'html.parser')
# ask Beautiful Soup to extract every element with the class "hdrlnk"
titles = soup.select('.hdrlnk')

# for loop to print the text of each title, without its HTML markup
for title in titles:
    print(title.text)

And get something back that looks like this:

🎼 🎼 College Tasks | Essay | Exams | Course Help 🎼 🎼
Writing and English tutor. NYU and Columbia graduate.
Writing/Essay Assistance, $80. NYU and Columbia graduate.
Versatile Content Writer Provides Top Notch Business-Related Material
Experienced Proofreader At Your Service
Screenplay Solutions! Writing, Edits, Formatting, Etc.
Thesis, Research, Dissertations, Publications, Presentations.  Ivy
--> A Special Speech for a Special Event? |  Hire a Professional
Need a Bio? For profiles, websites, expert collateral, exec resumes
School and college coursework & essays w r i t i n g Service
$25 resume editing & consulting for students and young professionals
Don't Just Talk! Communicate - Medical School Intervew
Grad/law/MBA/med school personal statements due?
FOR HIRE: AWARD-WINNING, IVY-EDUCATED EDITOR/SCRIPT CONSULTANT
Pay me write your essay, edit your work, take an classes fully online
FAST Affordable Dissertation and Academic EDITING-NonNative English OK
Versatile Content Writer Provides Top Notch Business-Related Material
Winning Resume, Cover Letter and LinkedIn Package For $30
French writer and translator
Writers for FrontPage.nyc
Academic Intervention & Paper Writing
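The output above already shows why scraped data usually needs scrubbing: one title appears twice, and some titles contain irregular spacing. As a small illustration of that next step, the sketch below collapses runs of whitespace and drops exact repeats while keeping the original order. It is a stand-in for the kind of cleanup one might do in a tool like OpenRefine, not a replacement for it; the function name scrub_titles is invented for illustration.

```python
import re


def scrub_titles(titles):
    """Trim and collapse whitespace, then drop exact repeats, keeping order."""
    seen = set()
    cleaned = []
    for title in titles:
        # replace any run of whitespace with a single space
        normalized = re.sub(r"\s+", " ", title).strip()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


# a few titles copied from the output above
raw_titles = [
    "Versatile Content Writer Provides Top Notch Business-Related Material",
    "Thesis, Research, Dissertations, Publications, Presentations.  Ivy",
    "Experienced Proofreader At Your Service",
    "Versatile Content Writer Provides Top Notch Business-Related Material",
]

for title in scrub_titles(raw_titles):
    print(title)
```

Whether a repeated title is a true duplicate or two separate posts is a research decision, so in practice it is worth keeping the raw data alongside the cleaned version.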

Conclusion

Bringing computational thinking concepts to the forefront of the workshops has been successful and resulted in more engaging sessions. Participant feedback has indicated that having a greater contextual understanding of web scraping and learning about its underlying principles has helped them better understand its potential applications and to feel more confident in doing their work. Given the nature of library-offered technical workshops, focusing on a computational thinking–centered pedagogy has been successful in helping participants to meet their specific need to pick up a new skill as well as to meet a less often stated need to understand how and why a particular tool or approach is situated within larger research and technology ecosystems.

Bibliography

Always Already Computational – Collections as Data. 2019. “Always Already Computational – Collections as Data.” https://collectionsasdata.github.io/.

Coble, Zach. 2019. Code examples from Collecting Textual Data with Web Scraping workshop. https://github.com/coblezc/webscraping-workshop.

Documenting the Now Project. 2019. “Documenting the Now.” https://www.docnow.io/.

Explosion AI. 2019. “SpaCy – Industrial-Strength Natural Language Processing in Python.” https://spacy.io/.

Locke, Brandon T. 2017. “Digital Humanities Pedagogy as Essential Liberal Education: A Framework for Curriculum Development.” Digital Humanities Quarterly 11, no. 3. http://www.digitalhumanities.org/dhq/vol/11/3/000303/000303.html.

Metaweb Technologies, Inc. 2019. “OpenRefine.” http://openrefine.org/.

NLTK Project. 2019. “NLTK 3.4.5 documentation.” https://www.nltk.org/.

NYU Libraries. 2019. “NYU Libraries Classes.” New York, NY: New York University. https://nyu.libcal.com/.

Project Jupyter. 2019. “Project Jupyter.” https://jupyter.org/.

Project Jupyter team. 2019. “JupyterHub.” https://jupyterhub.readthedocs.io/.

PythonAnywhere LLP. 2019. “Host, run, and code Python in the cloud: PythonAnywhere.” https://www.pythonanywhere.com/.

Richardson, Leonard. 2019. “Beautiful Soup.” https://www.crummy.com/software/BeautifulSoup/.

Shorish, Yasmeen. 2015. “Data Information Literacy and Undergraduates: A Critical Competency.” College & Undergraduate Libraries 22, no. 1: 97–106. https://doi.org/10.1080/10691316.2015.1001246.

Taylor, Natalie G., J. Moore, M. Visser, and C. Drouillard. 2018. “Incorporating Computational Thinking into Library Graduate Course Goals and Objectives.” School Library Research 21. http://www.ala.org/aasl/sites/ala.org.aasl/files/content/aaslpubsandjournals/slr/vol21/SLR_IncorporatingComputationalThinking_V21.pdf.

The New York Society Library. 2019. “City Readers.” https://cityreaders.nysoclib.org/.

University of Pennsylvania Libraries. 2019. “OPenn.” http://openn.library.upenn.edu/.

About the Author

Zach Coble is the Head of Digital Scholarship Services at NYU Libraries.

A map of the landmass claimed by the United States, annotated with Indigenous territories and insets about religious practices.

Digital Building Blocks for Original Research

This submission details a digital and collaborative encyclopedia entry assignment as a building block for creating research literacy and confidence. Using Padlet, an online platform that enables students and faculty to create digital bulletin boards, students compile and visualize diverse sources into one interactive project.

For many college students, tackling original research can feel a bit like climbing the insurmountable. For a variety of reasons, ranging from teacher time constraints to increased focus on testing, many students arrive at college never having written a research paper (Wood 2010; Carter and Harper 2013). And as a report from Primary Research Group in 2018 noted, colleges are requiring fewer long-form writing assignments and students are OK with that (Whitford 2018).

During fall 2018 (and again this fall), I taught Religion in Native North America, a 200-level undergraduate Religious Studies course. Taking it as a truth that research and writing remain integral skills, but recognizing my students’ concerns and probable lack of experience, I designed building block assignments throughout the semester to increase research literacy and build analytical confidence. One such assignment was a collaborative encyclopedia entry created using the free online platform Padlet, a digital bulletin board that enables teachers and students to share images, links, videos, and text in an easily created and manipulated format.

An encyclopedia entry assignment offered students an opportunity to combine multiple sources in a single project, but remained a step away from requiring the development of an original research question or argument. I also explicitly introduced the project as a building block toward their final research papers, helping to alleviate some of the anxieties about performing research. Padlet’s easy drag-and-drop format enabled students to visualize and manipulate the data in ways akin to laying out flash cards. Content could be easily moved around the screen, color coded, and connected by arrows. The easy inclusion of videos and images made for richer storytelling and broadened students’ perspectives on the types of sources available for research. Students also enjoyed playing with the aesthetics of their projects, which helped them move beyond seeing each source as its own island and physically create and visualize connections. The digital format was highly accessible, easy to learn, and, simply, more fun than a traditional annotated bibliography.

Blocks of text, images, and videos are easily integrated on Padlet bulletin boards
Figure 1. An encyclopedia entry example for the Northern Arapaho. Text boxes can be colored to correspond to various types of information; videos and photos can easily be dropped and moved around as new information is added.

The Assignment

Students were asked to work in groups of two or three to research a Native American community through the lens of religion. A month before the assignment was due, we met in the library with our Research & Instruction Librarian to explore available research databases, discuss academic sources, and begin to learn how to use the digital platform. Using a minimum of five sources, they had to find information on the Tribe, including language, geographic movement, arts, history, interaction with other Tribes, Anglo settlers, missionaries, and soldiers, as well as details about religious, political, and social organization and practices. Throughout, students highlighted the connections between religious beliefs and practices and other aspects of life. For example, a creation story often directly ties the Tribe to a specific location, helps to explain social patterns and gender relations, and is represented in Tribal art and storytelling.

The Platform

In an earlier version of this course, I used Moodle’s in-house wiki platform to create a similar assignment. Ultimately, I found the wiki format was too static and students struggled with the intricacies of Wikitext. Thus, in the second iteration I turned to Padlet.

I set up the homepage or base bulletin board using the canvas feature, which enables posts to be scattered across the page. Students created their own free accounts (just an email is required) and began their entries, to which they could easily add members and link to the homepage. This format brought all of the students’ work to one space, which allowed me to check in on progress and troubleshoot issues outside of office hours, and gave students the opportunity to see the types of things their peers had uncovered.

Clickable images connect all student work to a central Padlet homepage
Figure 2. The home bulletin board on Padlet, with linked student encyclopedia entries. For this specific project I used a map of the United States that lists Native American homelands, which enabled students to locate their entries relative to the map.

Content could be easily dragged and dropped into place, moved around the screen, and color coded. Students did not need to wait to add information until the end of the project, but could post and edit in real time. In addition, Padlet boards can be easily produced and manipulated on a smartphone. Students at my institution come from a wide range of socio-economic backgrounds and many do not own personal computers, but the vast majority do own smartphones. Programs such as Padlet that can be accessed successfully in a variety of formats help bridge the digital divide created by unequal access to and use of technology (Dolan 2016).

Data can be visually connected with arrows and colors
Figure 3. The Canvas feature allows posts to be dragged around the page and connected to each other by color or arrow.

In addition to the encyclopedia entry, students were individually responsible for turning in a 250-word reflection on the research process. What was difficult or surprising about finding sources? How did you determine which sources to use? What research skills or strategies have you learned? What did you learn about your own research process and style? What do you still need to learn?

Challenges

As with all new tools, a degree of failure is expected. In two cases, students accidentally deleted posts. The platform does not have a history feature or a way to retrieve deleted information, so if it’s gone, it’s gone. In addition, some students worked directly within the entry, rather than taking a middle step to analyze and gather research. This meant some of the posts on their bulletin boards were the result of a single paraphrased source, rather than achieving the greater goal of the assignment to compile information from multiple sources. To help students avoid losing material if posts are deleted, and to encourage better note taking and personal analysis, in future iterations of the assignment I will recommend students keep all of their research and transcripts in Google Drive and Docs as backups.

Outcomes

On due day, students brought their devices to class and we spent the hour looking through each other’s work. Students reflected on the act of research itself, as well as the content of the entries.

For my non-Indigenous, predominantly Christian students, the content of this assignment helped make non-Abrahamic traditions more legible. With 573 federally recognized Tribes in the US, a focus on one community enabled students to see the more nuanced and Tribally specific frames of religious life. Significantly, Native history and religions cannot be studied without attention to settler colonialism, which has inescapably altered and harmed the communities under study (Avalos 2018). By focusing on one community, students were able to see not only the historical realities of colonialism but also its contemporary ramifications in a smaller, easier-to-digest case study of their own making.

As a form, the assignment gave students the opportunity to find and compile academic sources using campus databases in a lower stakes project. The collaborative bulletin board format enabled students to play with how the information was presented and helped students visualize how different sources and ideas could be connected. In addition to enjoying the project, students’ reflections noted that the following two assignments, an annotated bibliography and final research paper, felt more achievable after completing their encyclopedia entries.

Bibliography

Avalos, Natalie. 2018. “Decolonial Approaches to the Study of Religion: Teaching Native American and Indigenous Religious Traditions.” Teaching Religion as Anti-Racism Education, Spotlight on Teaching, Religious Studies News (October). http://rsn.aarweb.org/spotlight-on/teaching/anti-racism/decolonial-approaches

Carter, Michael J., and Heather Harper. 2013. “Student Writing: Strategies to Reverse Ongoing Decline.” Academic Questions 26: 285–295.

Dolan, Jennifer E. 2016. “Splicing the Divide: A Review of Research on the Evolving Digital Divide Among K–12 Students.” Journal of Research on Technology in Education 48.1: 16–37. https://doi.org/10.1080/15391523.2015.1103147

Whitford, Emma. 2018. “Minimal Writing? No problem.” InsideHigherEd (July 31, 2018): https://www.insidehighered.com/news/2018/07/31/new-study-shows-few-students-see-need-more-writing-instruction

Wood, Peter. 2010. “‘It Messes Up My Fishing Time’: Why American High School Teachers Don’t Assign Research Papers.” National Association of Scholars. (October 14, 2010) https://www.nas.org/blogs/dicta/it_messes_up_my_fishing_time_why_american_high_school_teachers_dont_assign_

About the Author

Brennan Keegan is the Ainsworth Visiting Scholar of American Culture at Randolph College, where she teaches Native American and Religious Studies courses. She holds a PhD from Duke University.
