Today I am publishing two new types of statistics for understanding the behavioral relationships between Members of Congress. The first is a new approach to the leader-follower scores, based on the same algorithm Google uses to rank pages on the web. The second statistic is an update to my political spectrum graph. New charts are presented at the end.

UPDATE 1/22/2011: These images are now posted in zoomable form here.

Introduction

Bulk access to legislative information makes large-scale statistical analyses possible. GovTrack has shown over the last six years that many millions of Americans are interested in getting a deeper understanding of what laws are coming down the pipe and what their elected representatives are doing. Though statistical analyses are normally the domain of political science and economics research, when presented in a form useful to the public they become a valuable resource, among many, for citizens to be engaged with what is happening here in Washington, DC.

Background

The first large-scale statistical analysis I did on legislative data, my 2004 political spectrum, was in the language of statistics a principal components analysis (PCA) of something like a term-document matrix. The idea is that Members of Congress (“terms”) who cosponsor similar sets of bills (“documents”) should be grouped together, while Members of Congress who don’t cosponsor any of the same bills should be grouped far apart. I got the idea after my undergraduate advisor suggested I write a paper on latent semantic indexing, which is based on the same idea. A similar analysis by Professor Keith Poole using voting records rather than cosponsorship produces similar results; as far as I know, I was the first to apply PCA to congressional (UPDATE:) cosponsorship behavior.

The process doesn’t look at the content of the bills or the party affiliation or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like left-right ideology. If you follow the link above, you’ll see that the political spectrum analysis does a good job of separating the Dems from the GOP, and within each party the moderates from the extremes. If you want to know where your representatives stand ideologically in relation to their peers, the political spectrum is a good place to start.

The second novel analysis I published was a leader-follower score. This came directly out of a suggestion from Joseph Barillari (whom I knew in college). The idea behind a leader-follower score is that if I cosponsor your bills but you do not cosponsor my bills, then I am a follower and you are a leader, relative to each other. (I formalized this as follows: To compute a leader-follower score for representative X, make a table that lists all other representatives. On each row put the following: the number of bills sponsored by X and cosponsored by the representative in that row, divided by the number of bills sponsored by the representative in that row and cosponsored by X. The higher the number, the more times others are cosponsoring X’s bills without X returning the favor. Then take the logarithm of each number, and then the mean.)
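
To make the recipe above concrete, here is a minimal Python sketch. The input format (a list of sponsor and cosponsor-list pairs), the function name, and the toy data are assumptions for illustration, not the code behind the published scores. The description above also doesn't say what to do when one of the two counts is zero, so this sketch simply skips those pairs; that choice is an assumption too.

```python
import math
from collections import Counter


def leader_follower_scores(bills):
    """bills: iterable of (sponsor, [cosponsors]) pairs (hypothetical format)."""
    # counts[(s, c)] = number of bills sponsored by s that c cosponsored
    counts = Counter()
    members = set()
    for sponsor, cosponsors in bills:
        members.add(sponsor)
        for cosponsor in cosponsors:
            members.add(cosponsor)
            if cosponsor != sponsor:
                counts[(sponsor, cosponsor)] += 1

    scores = {}
    for x in members:
        logs = []
        for other in members:
            if other == x:
                continue
            received = counts[(x, other)]  # other cosponsored x's bills
            given = counts[(other, x)]     # x cosponsored other's bills
            # Assumption: pairs with a zero count are skipped, because the
            # ratio or its logarithm would be undefined.
            if received == 0 or given == 0:
                continue
            logs.append(math.log(received / given))
        # Mean of the log-ratios; None when no pair could be compared.
        scores[x] = sum(logs) / len(logs) if logs else None
    return scores


# Toy data: A, B, and C are made-up members.
bills = [
    ("A", ["B", "C"]),
    ("A", ["B"]),
    ("B", ["A"]),
    ("C", ["A"]),
    ("C", ["A"]),
]
scores = leader_follower_scores(bills)
```

In the toy data, C gets a positive score (A cosponsors C's bills more often than C returns the favor), B gets a negative one, and A's two relationships cancel out.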

New Leadership Scores

The first new statistic I am publishing today involves a completely new type of analysis of congressional behavior. The inspiration for this analysis comes from Google’s PageRank algorithm, which governs how Google ranks the order of pages in its search results. Google’s method is widely known: the more links you get, the higher your page ranks, but links from highly ranked pages count for even more. Determining a site’s ranking isn’t trivial because you need to know the ranking of all of the sites linking in, and to get their ranking you need the ranking of the sites linking to them, and on and on. Fortunately there is an elegant mathematical solution that now makes the Web go round.

Google’s PageRank works because it learns which pages are, let’s say, useful by the implicit votes of usefulness found on the web in the form of links. A link is a vote of confidence that the target website is probably useful. This idea can be adapted to any domain that we can view as a network (or “graph”).

In Congress, we can look at the network of who is cosponsoring whose bills. When a representative cosponsors a bill, it is a vote of confidence not only in that bill but also in, or of loyalty to, the bill’s sponsor. If we imagine each Member of Congress as a “web page” and each cosponsorship of another Member’s bill as a link from one “web page” to the other, then the PageRank algorithm will reveal a ranking of these implicit loyalties directly from the public, official behavior of the Members of Congress.
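
For readers who want to see the mechanics, here is a minimal Python sketch of a PageRank-style calculation on a cosponsorship network. It is not the exact computation behind the published scores. The damping factor of 0.85, the weighting of links by the number of cosponsored bills, the handling of members who cosponsor nothing, and the toy data are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping (cosponsor, sponsor) -> number of cosponsored bills.

    Each entry is treated as a weighted link from the cosponsor to the sponsor.
    """
    nodes = sorted({member for pair in links for member in pair})
    n = len(nodes)

    # Total outgoing weight per member, used to split that member's rank.
    out_weight = {m: 0.0 for m in nodes}
    for (src, _dst), weight in links.items():
        out_weight[src] += weight

    rank = {m: 1.0 / n for m in nodes}
    for _ in range(iterations):
        # Assumption: rank held by members with no outgoing links is
        # spread evenly over everyone.
        dangling = sum(rank[m] for m in nodes if out_weight[m] == 0)
        new_rank = {m: (1 - damping) / n + damping * dangling / n for m in nodes}
        for (src, dst), weight in links.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Toy data: B and C mostly cosponsor A's bills; nobody cosponsors C's.
links = {("B", "A"): 3, ("C", "A"): 2, ("C", "B"): 1, ("A", "B"): 1}
ranks = pagerank(links)
```

Here A ends up with the highest value and C with the lowest, since C receives no cosponsorships at all. The repeated loop is the simplest way to resolve the circularity described above: start everyone at the same value and keep redistributing until the values settle.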

The results of this Congressional PageRank-style Leadership Analysis, run over the last two years of sponsorship data, look roughly right. In the Senate, the highest value is given to Harry Reid, the Majority Leader. The Minority Leader, Mitch McConnell, has nearly the highest value among the Republicans. In the House, the leadership values are overall relatively low for the Speaker, party leaders, and party whips. I can only guess why the Senate and House differ in this way. One of the lowest values in the House was given to little-known Rep. Chaka Fattah (PA2), my former congressman, though he became famous recently for his unique idea of replacing the income tax with a transaction tax.

The results are similar to the old leader-follower scores.

New Political Spectrum

I am also presenting an update to the political spectrum using the same PCA method but based on a different underlying term-document matrix. In the original version, the terms were Members of Congress and the documents were bills. Basically, you form a matrix (a grid of numbers) with columns representing the representatives and rows representing the bills and put a 1 in each cell where the representative (co)sponsored the bill (and zeros everywhere else). Then you do the PCA magic (UPDATE: singular value decomposition). In the new version, the documents are also Members of Congress. Here the matrix’s rows are also members, and I put a 1 in each cell where the representative for the column cosponsored any bill of the representative for the row (and zeros everywhere else).
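
Here is a minimal Python sketch of the new member-by-member formulation. It is an illustration under stated assumptions, not the code that produced the charts. To keep it dependency-free it swaps in power iteration for a full singular value decomposition; this finds only the leading component (the first right singular vector of the centered matrix). Centering each row, the function names, and the toy data are all assumptions.

```python
import math


def cosponsorship_matrix(members, pairs):
    """Rows are sponsors, columns are cosponsors. A 1 means the column
    member cosponsored at least one bill by the row member."""
    index = {m: i for i, m in enumerate(members)}
    n = len(members)
    matrix = [[0.0] * n for _ in range(n)]
    for cosponsor, sponsor in pairs:
        matrix[index[sponsor]][index[cosponsor]] = 1.0
    return matrix


def first_component(matrix, iterations=200):
    """One score per column (member) along the leading component."""
    n_cols = len(matrix[0])
    # Assumption: center each row so scores reflect deviation from the
    # average member rather than overall activity.
    centered = []
    for row in matrix:
        mean = sum(row) / n_cols
        centered.append([value - mean for value in row])
    # gram[i][j] is the dot product of centered columns i and j.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n_cols)]
            for i in range(n_cols)]
    # Power iteration from a fixed, non-constant starting vector.
    vec = [float(i + 1) for i in range(n_cols)]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n_cols))
               for i in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec


# Toy data: two made-up parties whose members cosponsor only each other.
members = ["D1", "D2", "D3", "R1", "R2", "R3"]
pairs = [(c, s) for group in (members[:3], members[3:])
         for c in group for s in group if c != s]
scores = first_component(cosponsorship_matrix(members, pairs))
if scores[0] > 0:  # orient the axis so the first listed member is on the left
    scores = [-v for v in scores]
spectrum = dict(zip(members, scores))
```

In this deliberately clean toy case the leading component puts the three "D" members on one side of zero and the three "R" members on the other, without the code ever being told anyone's party.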

The results are similar to the old political spectrum. I don’t believe there are any particular benefits of this new method, except that its formulation is more parallel to the new Leadership scores than the old political spectrum formulation.

New Charts

Well, finally, here are some graphics. Each chart below is a scatterplot of Members of Congress. The x-axis is the political spectrum value from the new method, oriented with Democrats on the left; color indicates party for reference. The y-axis is the new Leadership score. In other words, we’d expect Democratic leaders to be in the top left, GOP leaders in the top right, GOP followers in the bottom right, and so on. The first chart is for the Senate, the second for the House.

I’ve additionally labeled in green the leadership positions in the Senate and House so you can easily locate those folks. Again, it seems to work well in the Senate, not so much in the House.