Appendices
- The State of the Union Historically
- Flesch-Kincaid Scores Comparatively
- Statistical Methods Used in this Project
- Source Code
The State of the Union Historically
The following chart shows the timing of the messages and the distribution of oral and written delivery as well as indicating the increasing publicity of the address starting in the 20th Century.
Key: oral deliveries in red; written deliveries in black; radio delivery after a written message marked with an asterisk (*).
Increase in publicity:
President | Term | Time of Message: 1st | 2nd | 3rd | 4th | End |
---|---|---|---|---|---|---|
George Washington | 1789-1793 | 1790 | 1790 | 1791 | 1792 | |
1793-1797 | 1793 | 1794 | 1795 | 1796 | ||
John Adams | 1797-1801 | 1797 | 1798 | 1799 | 1800 | |
Thomas Jefferson | 1801-1805 | 1801 | 1802 | 1803 | 1804 | |
1805-1809 | 1805 | 1806 | 1807 | 1808 | ||
James Madison | 1809-1813 | 1809 | 1810 | 1811 | 1812 | |
1813-1817 | 1813 | 1814 | 1815 | 1816 | ||
James Monroe | 1817-1821 | 1817 | 1818 | 1819 | 1820 | |
1821-1825 | 1821 | 1822 | 1823 | 1824 | ||
John Quincy Adams | 1825-1829 | 1825 | 1826 | 1827 | 1828 | |
Andrew Jackson | 1829-1833 | 1829 | 1830 | 1831 | 1832 | |
1833-1837 | 1833 | 1834 | 1835 | 1836 | ||
Martin Van Buren | 1837-1841 | 1837 | 1838 | 1839 | 1840 | |
William Henry Harrison | 1841 | |||||
John Tyler | 1841-1845 | 1841 | 1842 | 1843 | 1844 | |
James K. Polk | 1845-1849 | 1845 | 1846 | 1847 | 1848 | |
Zachary Taylor | 1849-1850 | 1849 | ||||
Millard Fillmore | 1850-1853 | 1850 | 1851 | 1852 | ||
Franklin Pierce | 1853-1857 | 1853 | 1854 | 1855 | 1856 | |
James Buchanan | 1857-1861 | 1857 | 1858 | 1859 | 1860 | |
Abraham Lincoln | 1861-1865 | 1861 | 1862 | 1863 | 1864 | |
Andrew Johnson | 1865-1869 | 1865 | 1866 | 1867 | 1868 | |
Ulysses S. Grant | 1869-1873 | 1869 | 1870 | 1871 | 1872 | |
1873-1877 | 1873 | 1874 | 1875 | 1876 | ||
Rutherford B. Hayes | 1877-1881 | 1877 | 1878 | 1879 | 1880 | |
James A. Garfield | 1881 | |||||
Chester A. Arthur | 1881-1885 | 1881 | 1882 | 1883 | 1884 | |
Grover Cleveland | 1885-1889 | 1885 | 1886 | 1887 | 1888 | |
Benjamin Harrison | 1889-1893 | 1889 | 1890 | 1891 | 1892 | |
Grover Cleveland | 1893-1897 | 1893 | 1894 | 1895 | 1896 | |
William McKinley | 1897-1901 | 1897 | 1898 | 1899 | 1900 | |
Theodore Roosevelt | 1901-1905 | 1901 | 1902 | 1903 | 1904 | |
1905-1909 | 1905 | 1906 | 1907 | 1908 | ||
William Howard Taft | 1909-1913 | 1909 | 1910 | 1911 | 1912 | |
Woodrow Wilson | 1913-1917 | 1913 | 1914 | 1915 | 1916 | |
1917-1921 | 1917 | 1918 | 1919 | 1920 | ||
Warren G. Harding | 1921-1923 | 1921 | 1922 | |||
Calvin Coolidge | 1923-1925 | 1923 | 1924 | |||
1925-1929 | 1925 | 1926 | 1927 | 1928 | ||
Herbert Hoover | 1929-1933 | 1929 | 1930 | 1931 | 1932 | |
Franklin D. Roosevelt | 1933-1937 | 1934 | 1935 | 1936 | ||
1937-1941 | 1937 | 1938 | 1939 | 1940 | ||
1941-1945 | 1941 | 1942 | 1943 | 1944 | ||
1945 | 1945* | |||||
Harry S Truman | 1945-1949 | 1946 | 1947 | 1948 | ||
1949-1953 | 1949 | 1950 | 1951 | 1952 | 1953 | |
Dwight D. Eisenhower | 1953-1957 | 1953 | 1954 | 1955 | 1956* | |
1957-1961 | 1957 | 1958 | 1959 | 1960 | 1961 | |
John F. Kennedy | 1961-1963 | 1961 | 1962 | 1963 | ||
Lyndon B. Johnson | 1963-1965 | 1964 | ||||
1965-1969 | 1965 | 1966 | 1967 | 1968 | 1969 | |
Richard M. Nixon | 1969-1973 | 1970 | 1971 | 1972 | ||
1973-1974 | 1973 | 1974 | ||||
Gerald R. Ford | 1974-1977 | 1975 | 1976 | 1977 | ||
Jimmy Carter | 1977-1981 | 1978 | 1979 | 1980 | 1981 | |
Ronald Reagan | 1981-1985 | 1982 | 1983 | 1984 | ||
1985-1989 | 1985 | 1986 | 1987 | 1988 | ||
George Bush | 1989-1993 | 1990 | 1991 | 1992 | ||
William J. Clinton | 1993-1997 | 1994 | 1995 | 1996 | ||
1997-2001 | 1997 | 1998 | 1999 | 2000 | ||
George W. Bush | 2001-2005 | 2002 | 2003 | 2004 | ||
2005-2009 | 2005 | 2006 | 2007 | 2008 | ||
Barack Obama | 2009-2013 | 2009 | 2010 | 2011 | 2012 | |
2013-2017 | 2013 | 2014 | 2015 | 2016 | ||
Donald J. Trump | 2017-2021 | 2017 | 2018 | 2019 | 2020 | |
Joseph R. Biden | 2021-2025 | 2021 |
(Source: https://www.presidency.ucsb.edu/sou.php)
Flesch-Kincaid Scores Comparatively
The Flesch-Kincaid score is commonly used for determining the age-appropriateness of reading material. It was developed originally in the 1940s by Rudolf Flesch, who wrote Why Johnny Can't Read. The current version of the formula was developed together with J.P. Kincaid for the Navy in the 1970s:
(0.39 * average_words_per_sentence)
+ (11.8 * average_syllables_per_word) - 15.59
It is a United States Government Department of Defense standard (DOD MIL-M-38784B). The score indicates that the text would be at the limit of comprehension for a person with the equivalent of that number of years of education; in a comprehension test, that person would answer 50 per cent of the questions correctly. It is estimated that the population of the U.S. has an average reading ability at the eighth-grade level.
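As a minimal sketch of how the formula can be applied, the following Python function computes the grade level from raw text. The syllable counter is a crude vowel-group heuristic assumed here for illustration; it is not the counter used in this project, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic ("score" -> 1, "table" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Apply 0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Very short, simple sentences can produce a score below zero, which is one reason the result is best read as a relative measure rather than a literal school grade.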
The use of this metric is controversial. It does not account for sentence structure, vocabulary, style or context. This implementation is fraught with potential inaccuracies in the determination of the number of syllables in a word (possibly as much as ± 10%). It is probably best at comparing similar types of text. There are several newer, potentially more accurate measures of readability. The accuracy and applicability of the metric over 200 years is doubtful given changes in both language and education. So the use here of Flesch-Kincaid requires some explanation.
Flesch-Kincaid remains one of the best-known and most frequently used metrics of readability. Its correlation with grade levels gives a simple and accessible sense of the metric's meaning. It is a convenient quantitative marker of style that has broad current acceptance. In examining the historical corpus of the State of the Union, a set of documents with exceptional continuity over time, it provides a measure of the gradual changes in language use.
One interesting finding of the project is in fact that the trend of the score is so consistent and seemingly uncorrelated with particular presidents. From this study it is not possible to determine whether this is a change in language generally or one specific to political language.
The following chart shows Flesch-Kincaid Scores of a variety of texts:
Date | Title | Author | Type | Score |
---|---|---|---|---|
1611 | King James Bible | | Literature | 11.0 |
1775 | Give Me Liberty or Give Me Death | Patrick Henry | Speech | 7.0 |
1776 | Declaration of Independence | | Document | 15.1 |
1787 | US Constitution | | Document | 17.8 |
1788 | The Federalist Papers | Alexander Hamilton et al. | Essays | 17.1 |
1850 | The Scarlet Letter | Nathaniel Hawthorne | Literature | 11.0 |
1863 | Gettysburg Address | Abraham Lincoln | Speech | 11.17 |
1865 | Alice's Adventures in Wonderland | Charles Dodgson | Literature | 6.3 |
1906 | What It Means to be Colored | Mary Church Terrell | Speech | 15.0 |
1906 | Taxes and Morals | Mark Twain | Speech | 9.1 |
1914 | Tarzan of the Apes | Edgar Rice Burroughs | Literature | 9.4 |
1921 | The Morality of Birth Control | Margaret Sanger | Speech | 11.5 |
1922 | Ulysses | James Joyce | Literature | 6.8 |
1929 | A Room of One's Own | Virginia Woolf | Literature | 11.8 |
1932 | Brave New World | Aldous Huxley | Literature | 7.4 |
1950 | Declaration of Conscience | Margaret Chase Smith | | 13.7 |
1954 | Lord of the Flies | William Golding | Literature | 4.8 |
1960 | To Kill a Mockingbird | Harper Lee | Literature | 6.0 |
1963 | I Have a Dream | Martin Luther King Jr. | Speech | 9.4 |
1964 | The Ballot or the Bullet | Malcolm X | Speech | 7.8 |
1966 | In Cold Blood | Truman Capote | Literature | 7.9 |
1973 | New International Bible | | Literature | 13.5 |
1973 | Gravity’s Rainbow | Thomas Pynchon | Literature | 9.5 |
1991 | Statement to the Senate Judiciary Committee on Clarence Thomas | Anita Faye Hill | Speech | 8.9 |
1993 | New York Times | | Newspaper | 14* |
1993 | LA Times | | Newspaper | 14* |
1993 | Washington Post | | Newspaper | 14* |
1993 | Associated Press | | Newspaper | 13* |
1993 | Wall Street Journal | | Newspaper | 11* |
1993 | Newsweek | | Newspaper | 11* |
2004 | Commencement at the U of Penn | Bono | Speech | 5.9 |
2006 | The (Sorry) State We Are In | Brad Borevitz | Essay | 13.7 |
* Average Score
(Note: There is a rumor that USA Today is written at an 8th-grade level, but I have found no documentation of that, and spot checks seem to indicate that it is probably in the same range as other papers, around 12.)
(Sources: for Literature Amazon.com, for Newspapers Jack Hart, Editor & Publisher,
November 6, 1993 (v126 i45 p.5) quoted at https://answers.google.com/answers/threadview?id=301734,
for others original analysis.)
Statistical Methods Used in this Project
The methods used in this project all rely on frequency counts of words. The size of a displayed word is determined simply by the frequency of its occurrence in the document. The x (horizontal) position within the interface is determined by the average position of the word in the document. The y (vertical) position is determined by the word's Relative Frequency Score, S.
The S score is an attempt to quantify significance as a combination of frequency and the uniqueness of the word's use within an individual document compared to its use in the corpus as a whole. S is the average of two statistics: the Log Likelihood Statistic (LLS) and the Term Frequency–Inverse Document Frequency (TF-IDF). Each of these scores has been normalized so that the minimum and maximum values of each component over the corpus are comparable (between 0 and 100).
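The count and average position of each word can be sketched in a few lines of Python. This is an illustration only: it assumes the document has already been split into tokens and that position is measured as a token index scaled to the range 0 to 1, which is a guess at the approach rather than the project's actual code. The function name is hypothetical.

```python
from collections import defaultdict


def word_layout_stats(tokens):
    """Return {word: (count, mean_relative_position)} for a list of tokens."""
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    # Scale by the last index so positions run from 0.0 (start) to 1.0 (end).
    last = max(len(tokens) - 1, 1)
    return {
        word: (len(idx), sum(idx) / len(idx) / last)
        for word, idx in positions.items()
    }
```

A word used evenly throughout an address lands near the middle (0.5), while a word concentrated in the opening or closing passages lands toward one edge.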
The formula for the LLS is:
LLS = 2 * ((freqInCorp * log(freqInCorp / E1)) + (freqInDoc * log(freqInDoc / E2)))
where
E1 = wordsInCorp * (freqInCorp + freqInDoc) / (wordsInCorp + wordsInDoc)
E2 = wordsInDoc * (freqInCorp + freqInDoc) / (wordsInCorp + wordsInDoc)
To even out the scale of the LLS, the log of LLS + 1 (L1LLS) is used:
L1LLS = log(LLS+1)
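The LLS and L1LLS formulas translate directly into Python. The sketch below assumes the natural logarithm and treats a zero count as contributing nothing to the sum; both choices are assumptions, since the formulas above do not state them. The snake_case names mirror the variables in the formulas.

```python
import math


def log_likelihood(freq_in_corp, freq_in_doc, words_in_corp, words_in_doc):
    """LLS for one word, following the formula given above."""
    total_freq = freq_in_corp + freq_in_doc
    total_words = words_in_corp + words_in_doc
    e1 = words_in_corp * total_freq / total_words  # expected count in corpus
    e2 = words_in_doc * total_freq / total_words   # expected count in document
    # Assumption: a zero count contributes 0 (x * log x tends to 0 as x -> 0).
    term_corp = freq_in_corp * math.log(freq_in_corp / e1) if freq_in_corp > 0 else 0.0
    term_doc = freq_in_doc * math.log(freq_in_doc / e2) if freq_in_doc > 0 else 0.0
    return 2 * (term_corp + term_doc)


def l1lls(lls):
    """Compress the scale of the LLS: log(LLS + 1)."""
    return math.log(lls + 1)
```

When a word occurs in the document at the same rate as in the corpus, the observed and expected counts coincide and the LLS is zero; the further the two rates diverge, the larger the statistic.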
The formula for TF:
TF = freqInDoc/wordsInDoc
The formula for IDF:
IDF = log( docsInCorpus / docsWhereWordOccurs)
The formula for TF-IDF:
TF-IDF = TF * IDF
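A short Python sketch of the same calculation follows. It assumes the natural logarithm (the base only rescales the result) and uses a hypothetical function name.

```python
import math


def tf_idf(freq_in_doc, words_in_doc, docs_in_corpus, docs_where_word_occurs):
    """TF-IDF for one word in one document, per the formulas above."""
    tf = freq_in_doc / words_in_doc
    idf = math.log(docs_in_corpus / docs_where_word_occurs)
    return tf * idf
```

A word that appears in every document has an IDF of log(1) = 0, so its TF-IDF is zero no matter how often it is used.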
The formula for S is as follows:
S = (L1LLS + TF-IDF) / 2
(values are normalized before being averaged)
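One way to read the normalization step is as min-max scaling of each component to the range 0 to 100 before averaging. The sketch below takes that reading as an assumption; the project may have normalized differently, and the function names are hypothetical.

```python
def normalize_0_100(values):
    """Min-max scale a list of numbers to the range 0..100 (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def s_scores(l1lls_values, tfidf_values):
    """Average the two normalized components, one S score per word."""
    norm_lls = normalize_0_100(l1lls_values)
    norm_tfidf = normalize_0_100(tfidf_values)
    return [(a + b) / 2 for a, b in zip(norm_lls, norm_tfidf)]
```

Scaling both components to a common range keeps either statistic from dominating the average simply because its raw values are larger.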
Words are filtered to eliminate the most commonly used words.
For each document, the words with the top 40 S scores are selected. Depending on the length of the address, words with too few occurrences are also filtered out.
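The selection step might look like the following Python sketch. The stop-word set, the `min_count` threshold, and the order in which the filters are applied are all assumptions made for illustration; the text above does not specify them.

```python
def top_words(scores, counts, stop_words, top_n=40, min_count=2):
    """Pick the top_n words by S score after filtering.

    scores: {word: S score}; counts: {word: occurrences in the document}.
    min_count is a hypothetical threshold for "too few occurrences".
    """
    candidates = [
        (word, score)
        for word, score in scores.items()
        if word not in stop_words and counts.get(word, 0) >= min_count
    ]
    # Highest S score first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in candidates[:top_n]]
```

For example, with `stop_words={"the"}` and `min_count=2`, a high-scoring word that occurs only once is dropped along with "the", and the remaining words are returned in descending order of S.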
Source Code
Interface source code (Processing.js):
SotuDisplayJS.pde and jsStuff.js
SotuGraph source code (Processing.js):
SotuGraphJS.pde
Data:
Word frequency data by year for all words, normalized as frequency per 10,000 words (i.e. 10000*count/length, rounded down to an integer), in the tab-delimited plain-text file history.txt.gz
Analyzed word data for top words by document in JSON format documentsData.json.gz
Text of addresses:
stateoftheunion1790-2021.txt.zip
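The per-10,000-word normalization described for the word frequency data can be sketched in Python as follows. The function name is hypothetical, and integer floor division stands in for "rounding down".

```python
def per_ten_thousand(count, length):
    """Frequency per 10,000 words, rounded down to an integer."""
    return (10000 * count) // length
```

One consequence of rounding down is that a word used once in an address longer than 10,000 words is recorded with a frequency of 0.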