Corpus-Based Vocabulary Analysis of English Podcasts

Like movies, television programs, and TED Talks presentations, podcasts are an increasingly popular form of media that promotes authentic public discourse for diverse audiences, including university professors and students. However, English language teachers in English as a second language/English as a foreign language contexts might wonder: “How do I know that my students can handle the vocabulary demands of podcasts?” To answer that question, we analyzed a 1,137,163-word corpus comprising transcripts of 170 episodes from the following popular podcasts: Freakonomics; Fresh Air; Invisibilia; Hidden Brain; How I Built This; Radiolab; TED Radio Hour; This American Life; and Today Explained. The results showed that knowledge of the most frequent 3,000 word families plus proper nouns (PN), marginal words (MW), transparent compounds (TC), and acronyms (AC) provided 96.75% coverage, and knowledge of the most frequent 5,000 word families plus PN, MW, TC, and AC provided 98.26% coverage. The analysis also showed some variation in coverage among podcast types. The pedagogical implications for teaching and learning vocabulary via podcasts are discussed.
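
The coverage figures above are cumulative percentages of running words (tokens) accounted for by successive word lists. As a rough illustration of that arithmetic only, the following minimal Python sketch assigns each token to a band and accumulates coverage band by band. The function name, the band labels ("1K", "3K"), the word-to-band mapping, and the ten-token sample are all hypothetical and are not taken from the study; a real analysis would map inflected and derived forms to word families using established frequency lists.

```python
from collections import Counter


def cumulative_coverage(tokens, band_of, bands):
    """Return the cumulative percent of tokens covered after each band.

    tokens  -- list of running words from a transcript
    band_of -- dict mapping a lowercased word to its band label (toy data here)
    bands   -- band labels in the order they should be accumulated
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    # Words missing from every list are counted as "off-list".
    counts = Counter(band_of.get(tok.lower(), "off-list") for tok in tokens)
    total = len(tokens)
    covered = 0
    result = {}
    for band in bands:
        covered += counts[band]
        result[band] = 100.0 * covered / total
    return result


# Hypothetical mapping: PN = proper noun, MW = marginal word, AC = acronym,
# "1K"/"3K" = illustrative frequency bands of word families.
toy_bands = {
    "the": "1K", "host": "1K", "asked": "1K", "about": "1K",
    "economist": "3K", "nasa": "AC", "chicago": "PN", "um": "MW",
}
toy_tokens = ["The", "host", "asked", "the", "economist",
              "about", "NASA", "Chicago", "um", "zeitgeist"]

coverage = cumulative_coverage(toy_tokens, toy_bands,
                               ["PN", "MW", "AC", "1K", "3K"])
for band, pct in coverage.items():
    print(f"{band}: {pct:.1f}%")
```

In this toy sample, nine of the ten tokens fall on a list, so cumulative coverage reaches 90% at the last band; the remaining token ("zeitgeist") is off-list.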



Publication Title

RELC Journal


