State of the Union
Political Prose Over Time

About

The spectacle of the president's speech (George W. Bush's, in 2006) was the occasion for considering the relationship between language and politics. This project examines changes in the language of the State of the Union address over more than 200 years.

State of the Union was authored by Brad Borevitz. Originally developed in 2006 using Java for the analysis and Processing for the graphical user interface, the project was updated in 2013, when the interface was ported to Processing.js so that Java was no longer required for viewing. In 2025, the project was completely modernized using contemporary web technologies while preserving the core analytical approach.

A project such as this, although executed directly by a single person, is a collaborative effort. The work would not be possible without the open resources currently available on the web. It required a good deal of research on hundreds of websites to build the knowledge base that produced this project.

The Visualization

State of the Union (SOTU) provides access to the corpus of all the State of the Union addresses from 1790 to 2025.

The Words

SOTU maps the significant content of each State of the Union address so that users can appreciate its key terms and their relative importance.

The horizontal axis shows the average position of a word in the document. The vertical axis displays the word's relative frequency, determined by comparing how frequently the word occurs in the document to how frequently it appears throughout the entire body of SOTU addresses (see appendix for details).
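The two axes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the function names `tokenize` and `word_coordinates` are hypothetical, position is normalized to the range 0 to 1, and relative frequency is taken as a simple ratio of the word's rate in the address to its rate in the whole corpus (the appendix may define it differently).

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens (simple regex, an assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_coordinates(doc_tokens, corpus_counts, corpus_total):
    """Return {word: (x, y, count)} for one address.

    x     = average position of the word in the address, from 0.0 (start) to 1.0 (end)
    y     = rate of the word in the address divided by its rate in the corpus
    count = number of times the word occurs in the address

    Assumes the address is part of the corpus, so every word has a
    non-zero corpus count.
    """
    n = len(doc_tokens)
    positions = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        positions[tok].append(i / (n - 1) if n > 1 else 0.0)

    coords = {}
    for word, pos in positions.items():
        x = sum(pos) / len(pos)
        doc_rate = len(pos) / n
        corpus_rate = corpus_counts[word] / corpus_total
        coords[word] = (x, doc_rate / corpus_rate, len(pos))
    return coords

# Tiny made-up corpus of two "addresses", for illustration only.
address = tokenize("peace war peace")
corpus = address + tokenize("war war war")
coords = word_coordinates(address, Counter(corpus), len(corpus))
```

In this toy example "peace" is over-represented in the address relative to the corpus (ratio above 1), while "war" is under-represented (ratio below 1).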

Common words ("and," "the," etc.) and words that occur frequently in the entire corpus ("states") are largely filtered out; what remains are words that are especially characteristic of a given address. The size of a word indicates how many times it was used in the document. Click a word to view the full text of the address with the word highlighted. Roll over a word to get detailed frequency data.

The Data

The data underneath the map of significant words shows trends in the language of the State of the Union addresses. On the graph, black bars indicate the length of each address in words. The grey dots indicate readability as measured by the address's Flesch-Kincaid score, which estimates the grade level in an American school at which the text is comprehensible. The actual scores are displayed in the bottom right corner of the interface (for more information on Flesch-Kincaid, see the appendix).
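For readers curious about how such a score is computed, here is a minimal sketch of the standard Flesch-Kincaid grade level formula. The sentence splitter and the syllable counter are deliberately naive stand-ins (the syllable counter counts vowel groups and is not the algorithm credited below), and the function names are hypothetical, so scores from this sketch will differ somewhat from those shown in the interface.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on runs of ., ! or ? (abbreviations will confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Two three-word sentences of one-syllable words score below grade 0.
simple = flesch_kincaid_grade("The cat sat. The dog ran.")
```

Longer sentences and longer words both push the score up, which is why the measure is read as a rough proxy for the difficulty of the prose.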

The current corpus contains 240 documents. There are 1,824,300 words in the corpus, and 24,471 unique words.

Technologies Used

The current version (2025) is built with:

  • Python - Text analysis and statistical processing, replacing the original Java pipeline. Calculates TF-IDF, log-likelihood statistics, and Flesch-Kincaid readability scores.
  • Astro - Static site generator that builds the entire site as pre-rendered HTML, eliminating the need for server-side processing.
  • D3.js - Data visualization library used for rendering the interactive word cloud to Canvas, replacing the deprecated Processing.js.
  • Tailwind CSS - Utility-first CSS framework providing responsive design and mobile support.
  • Pagefind - Client-side search engine, replacing the server-side Sphider PHP search.
  • Claude Code - The agentic coding tool was used to help transform legacy code so that the site now uses these contemporary tools.
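As an illustration of the two word-scoring statistics named in the Python item above, the sketch below shows one common formulation of each. It is not the project's code: the function names and argument layout are hypothetical, TF-IDF is shown in its plain form (relative term frequency times the log of inverse document frequency), and log-likelihood is shown as the two-term G2 statistic often used in corpus linguistics to compare a word's frequency in one text against a reference corpus.

```python
import math

def tf_idf(term, doc_tokens, all_docs):
    """Relative term frequency in one document times log inverse document frequency.

    doc_tokens is a non-empty list of tokens; all_docs is a list of token lists.
    """
    tf = doc_tokens.count(term) / len(doc_tokens)
    docs_with_term = sum(1 for d in all_docs if term in d)
    if docs_with_term == 0:
        return 0.0
    return tf * math.log(len(all_docs) / docs_with_term)

def log_likelihood(a, b, c, d):
    """G2 statistic for one word.

    a = count of the word in the document, b = count in the reference corpus,
    c = total words in the document,      d = total words in the reference corpus.
    """
    # Expected counts if the word were used at the same rate in both texts.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Made-up token lists standing in for three addresses.
docs = [["war", "peace", "war"], ["peace", "trade"], ["peace", "law"]]
war_score = tf_idf("war", docs[0], docs)      # appears in one address only
peace_score = tf_idf("peace", docs[0], docs)  # appears in every address
```

A word that appears in every address, like "peace" here, gets a TF-IDF of zero, which matches the idea that corpus-wide words are not characteristic of any single address. The G2 statistic is zero when a word is used at the same rate in the document and the reference corpus, and grows as the rates diverge.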

Resources and Credits

I have relied directly on the following resources:

  • The original interface was written using Processing and later Processing.js.
  • Most of the text came from Project Gutenberg and has been updated yearly from various sources, including the White House website, the Congressional Record, C-SPAN, and other news outlets.
  • Methods for building frequency word lists were based on code in Andrew Roberts' aConCorde.
  • The syllable-counting algorithm is by Daniel Shiffman.
  • Thanks to Christiane Paul & Martin Wattenberg for feedback on the 2007 version.

Use this contact form to send me any feedback or questions.