Analysis Methodology

Statistical analyses of legislation and legislators provide context for the legislative process. With 10,000+ bills pending at any given time, our analyses help GovTrack visitors know what is relevant and what to pay attention to.

Ideology

The Ideology Analysis compares the sponsorship and cosponsorship patterns of Members of Congress to put them on a scale roughly from liberal to conservative.

Prognosis

The Prognosis Analysis looks at the factors that help or hurt a bill’s chance of getting out of committee and being enacted. It is based on a regression model.

Leadership

The Leadership Analysis looks at who is cosponsoring whose bills to see who the legislative leaders are. It’s a little like asking: if you scratch my back, will I scratch yours? The analysis is based on PageRank, the algorithm Google uses to order search results.

Ideology Analysis of Members of Congress

The ideology analysis assigns a liberal–conservative score to each Member of Congress based on his or her pattern of cosponsorship.

In a nutshell, Members of Congress who cosponsor similar sets of bills will get scores close together, while Members of Congress who cosponsor different sets of bills will have scores far apart. Members of Congress with similar political views will tend to cosponsor the same set of bills, or bills by the same set of authors, and conversely Members of Congress with different political views will tend to cosponsor different bills.

You can find this analysis on the pages for current Members of Congress.

The charts to the right plot the ideology score on the horizontal axis and the leadership score on the vertical axis. Look at the extremes. For instance, Sen. Jim Inhofe appears as the most extreme Republican in the Senate chart and he is widely regarded as one of the most conservative senators.

Overview

The data that goes into this analysis is a list of who sponsored or cosponsored which bills. The process doesn’t look at the content of the bills or the party affiliation or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like left-right ideology.

You’ll see in the charts on the right that the ideology analysis does a good job at separating the Democrats from the Republicans, and within each party the moderates from the extremes. If you want to know how your representatives stand in relation to their peers ideologically, these charts are a good place to start.

We first began publishing this analysis in 2004, then calling it a political spectrum. A similar analysis by Professor Keith Poole using voting records rather than cosponsorship produces similar results: see voteview.com. (As far as we know, we were the first to apply this sort of analysis to cosponsorship behavior.)

Methodology

The statistical method behind this analysis is Principal Components Analysis, a form of dimensionality reduction. It is a statistical technique that reveals underlying patterns in data.

Here’s how it works: Form a matrix (a grid of numbers) with columns representing Members of Congress and rows also representing Members of Congress. Do this for the House and Senate separately. We include (co)sponsorship from the current and previous two Congresses, so between four and six years of data. For the Senate, you have a 100x100 table. In each cell of the table, put the number of times the senator for the row cosponsored a bill introduced by the senator for the column. Or if it's the same senator in the row and column, put in the number of bills he or she introduced. Then compute the singular value decomposition of the matrix (which is how Principal Components Analysis is often done).
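The matrix construction described above can be sketched as follows. This is only an illustration: the function name, the input format (a list of sponsor and cosponsor pairs), and the toy data are assumptions, not taken from GovTrack's source code.

```python
import numpy as np

def build_cosponsorship_matrix(bills, members):
    """Return a square matrix P where P[i, j] counts how often member i
    cosponsored a bill introduced by member j. The diagonal P[j, j]
    counts the bills member j introduced."""
    index = {m: k for k, m in enumerate(members)}
    P = np.zeros((len(members), len(members)))
    for sponsor, cosponsors in bills:
        j = index[sponsor]
        P[j, j] += 1  # same member in row and column: bills introduced
        for c in cosponsors:
            P[index[c], j] += 1  # row = cosponsor, column = sponsor
    return P

# Hypothetical toy data: (sponsor, list of cosponsors) for each bill.
members = ["A", "B", "C"]
bills = [("A", ["B"]), ("A", ["B", "C"]), ("B", ["A"])]
P = build_cosponsorship_matrix(bills, members)
```

For the Senate, the same construction would yield the 100x100 table, with one such matrix built per chamber.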

Every matrix has a singular value decomposition. The magic is in how you interpret it. The singular value decomposition takes one matrix and gives you back three: called u, s, and v-transpose. V-transpose can be interpreted as a set of scores for each Member of Congress on a new set of dimensions. The dimensions are ranked in order by how much of the original data they explain. We have found that the second dimension best corresponds with ideology. We use the scores from that dimension in our charts.

Each score is a number. It’s entirely arbitrary whether liberal or conservative is positive or negative: the original matrix is blind to actual information like that. In fact, there’s no guarantee that these numbers even have anything to do with liberal- and conservative-ness. All the decomposition tells us is how to separate Members of Congress into two groups, or more precisely how to spread them out along a spectrum in a way that explains their record of cosponsorship. But in practice it captures ideology very well.
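To illustrate why the sign is arbitrary, the sketch below negates one dimension of a decomposition and checks that the product still reproduces the original matrix. The toy matrix and the helper orient_scores, which flips the scores so that a chosen reference member comes out positive, are hypothetical; the source does not say how GovTrack fixes the sign.

```python
import numpy as np

def orient_scores(scores, reference_index):
    """Flip the sign of all scores, if needed, so the reference member is positive."""
    scores = np.asarray(scores, dtype=float)
    return -scores if scores[reference_index] < 0 else scores

# Hypothetical 4-member cosponsorship matrix.
P = np.array([[3., 2., 0., 0.],
              [2., 3., 0., 1.],
              [0., 0., 4., 2.],
              [0., 1., 2., 3.]])
u, s, vT = np.linalg.svd(P)

# Negating a matching column of u and row of vT leaves the product unchanged,
# so the sign of any one dimension carries no information.
u2, vT2 = u.copy(), vT.copy()
u2[:, 1] *= -1
vT2[1, :] *= -1
same = np.allclose(u2 @ np.diag(s) @ vT2, P)

# Pick a convention: the member at index 3 gets a non-negative score.
ideology = orient_scores(vT[1, :], reference_index=3)
```

Choosing a well-known member as the reference is one simple way to keep the direction of the scale stable from one run to the next.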

(In the original version of this analysis called the political spectrum, the rows were Members of Congress and the columns were bills. That is, form a matrix with a 1 in each cell where the Member of Congress corresponding to the row sponsored or cosponsored the bill corresponding to the column. The change was made only to reuse the source code with the leadership analysis, which needs a member-member matrix.)

Data

The ideology scores can be found in two CSV files sponsorshipanalysis_h.txt and sponsorshipanalysis_s.txt (House and Senate) over here.

Source Code

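Running this analysis is pretty simple in Python. The core computation is two lines. Assuming you have the cosponsorship matrix in P as a NumPy array:

import numpy
# Each row of vT holds the members' scores on one dimension, most explanatory first.
u, s, vT = numpy.linalg.svd(P)
ideology = vT[1,:]  # the second dimension, which best corresponds with ideology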

The full source code for this analysis can be found on github.

Citation

To cite our methodology and results, we recommend either of these:

GovTrack.us. 2013. Ideology Analysis of Members of Congress. Accessed at https://www.govtrack.us/about/analysis.

Tauberer, Joshua. 2012. Observing the Unobservables in the U.S. Congress, presented at Law Via the Internet 2012, Cornell Law School, October 2012.

References

For more on how to use singular value decomposition, check out:

Wall, Rechtsteiner, and Rocha. “Singular value decomposition and principal component analysis.” in A Practical Approach to Microarray Data Analysis. D.P. Berrar, W. Dubitzky, M. Granzow, eds. pp. 91-109, Kluwer: Norwell, MA (2003). LANL LA-UR-02-4001.

Leadership Analysis of Members of Congress

A leadership score is computed for each Member of Congress by looking at how often other Members of Congress cosponsor their bills — more or less. The analysis is based on PageRank, Google’s algorithm for ranking pages on the web.

The idea behind a leadership score is that if X cosponsors Y’s bills but Y does not cosponsor X’s bills, then X is a follower relative to Y being a leader.

You can find this analysis on the pages for current Members of Congress.

The charts to the right plot the leadership score on the vertical axis and the ideology score on the horizontal axis.

There are some interesting things in this chart. There’s a distinct V-shape. Congressional leaders appear to be more extreme. There are some confounding effects to consider here. Leaders tend to be more senior members of Congress, they tend to be older, and they have had more time to participate in legislating. But somewhere among those factors there’s an interesting correlation to having an extreme political ideology.

These leadership and ideology scores give us a view into Congress that is normally hidden from us. We can’t observe leadership. We’re not there, in Congress, to see it. We’re not in the meetings where you can see relationships form. But those relationships are known to the representatives and senators. It’s obvious to them. They know whether they lead or follow. Their staff know. This is a sort of social knowledge that is locked within the institution of Congress, unless we get a little creative with how we try to observe it.

Overview

The data that goes into this analysis is a list of who sponsored or cosponsored which bills. The process doesn’t look at the content of the bills or anything else about the Members of Congress, but it is able to infer underlying behavioral patterns, some of which correspond to real-world concepts like leadership.

We first began publishing leadership scores in 2010. As far as we know, this analysis is unique to GovTrack.

Methodology

The inspiration for this analysis comes from Google’s PageRank algorithm, which governs how Google ranks the order of pages in its search results. Google’s method is widely known: the more links you get to your website from other websites, and the more links those other websites have, the higher your PageRank and the higher up in search results you appear.

Here’s how we apply it to Congress: the more Members of Congress that cosponsor Member X’s bills, and the more cosponsors those other Members of Congress have, the higher X’s leadership score.

We start by forming a matrix (a grid of numbers) with cosponsorship data. It is the same matrix as in the ideology analysis, so see the methodology section there for details. Then we run the PageRank algorithm on the matrix, which yields a new number for each Member of Congress. That is the leadership score.
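A minimal PageRank sketch on such a matrix might look like the following. Several details are assumptions rather than GovTrack's confirmed method: the damping factor of 0.85, the row normalization, and the orientation, in which weight flows from each cosponsor to the sponsors whose bills they signed. The function name and toy matrix are hypothetical.

```python
import numpy as np

def pagerank(P, damping=0.85, tol=1e-11, max_iter=1000):
    """Power iteration where each member passes their score to the sponsors
    of the bills they cosponsored, in proportion to the counts in P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    row_sums = P.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Each row sums to 1; members with no activity spread weight uniformly.
    T = np.where(row_sums > 0, P / safe, 1.0 / n)
    M = T.T  # column-stochastic: M[j, i] is the share member i gives member j
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        y = damping * (M @ x) + (1.0 - damping) / n
        if np.abs(y - x).sum() < tol:
            return y
        x = y
    return x

# Hypothetical matrix: P[i, j] = times member i cosponsored member j's bills,
# with the number of bills each member introduced on the diagonal.
P = np.array([[2., 1., 0.],
              [2., 1., 0.],
              [1., 0., 0.]])
scores = pagerank(P)
```

In this toy example the first member, whose bills attract the most cosponsors, ends up with the highest score and the third member, who attracts none, with the lowest.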

This analysis came from a suggestion from Joseph Barillari (whom GovTrack’s creator knew in college). (The original formulation of the score for Member of Congress X was the mean, across all other Members of Congress Y, of the log of the ratio between the number of bills sponsored by X and cosponsored by Y and the number of bills sponsored by Y and cosponsored by X.)
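The original log-ratio formulation can be sketched as below, reading it as the log of the ratio of the two counts. The add-one smoothing, which avoids taking the log of zero or dividing by zero, is an assumption; the source does not say how empty counts were handled. The function name and toy matrix are hypothetical.

```python
import numpy as np

def log_ratio_leadership(P, smoothing=1.0):
    """Mean over other members Y of log(received / given), where P[i, j] is
    the number of times member i cosponsored member j's bills."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    received = P.T + smoothing  # received[x, y]: times y cosponsored x's bills
    given = P + smoothing       # given[x, y]: times x cosponsored y's bills
    log_ratio = np.log(received / given)
    # The diagonal compares a member with themselves and contributes log(1) = 0,
    # so summing each row and dividing by n - 1 averages over the other members.
    return log_ratio.sum(axis=1) / (n - 1)

# Hypothetical toy matrix for three members.
P = np.array([[2., 1., 0.],
              [2., 1., 0.],
              [1., 0., 0.]])
scores = log_ratio_leadership(P)
```

A positive score means the member receives more cosponsorships than they give, which matches the leader and follower intuition described earlier.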

Data

The leadership scores can be found in two CSV files sponsorshipanalysis_h.txt and sponsorshipanalysis_s.txt (House and Senate) over here.

Source Code

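Here is a sketch in Python. Assuming you have the cosponsorship matrix in P as an N-by-N NumPy array, normalized so that repeated multiplication settles on a fixed vector:

import numpy

def onenorm(u):
    # Sum of the absolute values of the entries of u.
    return numpy.abs(u).sum()

N = P.shape[0]
x = numpy.ones( (N, 1) ) / float(N)  # start every member with an equal score
while True:
    y = numpy.dot(P, x)  # one step of power iteration
    if onenorm(y-x) < 1e-11: break  # stop once the scores no longer change
    x = y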

The full source code for this analysis can be found on github.

Citation

To cite our methodology and results, we recommend either of these:

GovTrack.us. 2013. Leadership Analysis of Members of Congress. Accessed at https://www.govtrack.us/about/analysis.

Tauberer, Joshua. 2012. Observing the Unobservables in the U.S. Congress, presented at Law Via the Internet 2012, Cornell Law School, October 2012.

References

Kamvar, Sep. 2010. Numerical algorithms for personalized search in self-organizing information networks. Princeton University Press.

Bill Prognosis Analysis

GovTrack computes a prognosis for each bill, which is the probability that the bill will be enacted. Our computation is based on factors that are correlated with successful or failed bills in the past, such as whether the sponsor is a committee chair.

What is the point of this?

  • More than 10,000 bills will be considered by each Congress. About 4% will become law. Which bills should we focus on?
  • Representatives and senators, their staff, and lobbyists all know what bills are important because they have the institutional knowledge of what makes a bill important. The prognosis highlights the factors that make a bill successful.

The prognosis scores can be found on the pages for bills throughout the site.

Overview

The data that goes into this analysis is a set of factors that we compute for bills, such as whether the sponsor is a committee chair (see right for a full list), along with whether the bill was successful. We “train” the model on bills from the 113th Congress (2013-2015) to compute probabilities for bills in the current Congress.

We first began publishing prognosis scores in 2012. As far as we know, we were the first to apply this analysis to Congressional bills.

Methodology

This analysis is based on a logistic regression. Logistic regression is similar to simple linear regression but it is more appropriate when modeling probabilities. We create eight separate models: For each of the four types of legislative measures (bills, joint resolutions, concurrent resolutions, and simple resolutions), we compute one model that predicts whether the bill/resolution will get out of committee and a separate model that computes, for bills/resolutions out of committee, whether the bill/resolution will be enacted or agreed to.

The independent variables are the binary factors mentioned above and listed in the factors table at the right.

The dependent variable is how successful the bill or resolution was. When predicting whether a bill or resolution will make it out of committee, it is a binary variable. When predicting whether a bill will be enacted or a resolution agreed to, this is a continuous variable computed as the percentage of paragraphs in the bill that appear in any enacted bill (and similarly for resolutions). We do this because there are often identical bills in Congress (so-called companion bills) and often bills are incorporated into other bills (such as omnibus bills), and we want to give the original bills credit for being successful even if the original bill itself is not enacted per se.
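The continuous measure of success can be sketched as a paragraph overlap fraction. How GovTrack matches paragraphs is not stated here, so the exact-match comparison after normalizing case and whitespace, the function names, and the toy text are all assumptions.

```python
def normalize(paragraph):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(paragraph.lower().split())

def enacted_fraction(bill_paragraphs, enacted_bills):
    """Fraction of a bill's paragraphs that appear in any enacted bill."""
    if not bill_paragraphs:
        return 0.0
    enacted_text = set()
    for paragraphs in enacted_bills:
        enacted_text.update(normalize(p) for p in paragraphs)
    hits = sum(1 for p in bill_paragraphs if normalize(p) in enacted_text)
    return hits / len(bill_paragraphs)

# Hypothetical example: three of a bill's four paragraphs were folded
# into an enacted omnibus bill.
bill = ["Section 1. Short title.", "Sec. 2. Findings.",
        "Sec. 3. Grants.", "Sec. 4. Reporting."]
omnibus = ["Division A.", "Sec. 2.  Findings.", "SEC. 3. GRANTS.", "Sec. 4. Reporting."]
score = enacted_fraction(bill, [omnibus])
```

Under this measure a companion bill whose text was enacted through another vehicle still scores close to 1, which is the credit the paragraph above describes.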

The output of each logistic regression model is a set of weights assigned to the factors, called β in the table at the right. The prognosis score for a bill is computed by combining the weights of all of the factors that apply to the bill: more or less, the weights are added together and the sum is passed through the logistic function, which is equivalent to multiplying together the odds ratios of the factors (see logistic regression on Wikipedia for details). The result is a number that can be interpreted as a probability.
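As a rough illustration of how weights turn into a probability, the sketch below sums the β values of the factors that apply and passes the total through the logistic function. The two β values come from the first results table (1.8 for a sponsor who is a relevant committee chairman, 0.6 for cosponsors from both parties). The intercept is a made-up placeholder, because the model's intercept is not published here, so the resulting number is not a real prognosis.

```python
import math

def prognosis(betas, intercept):
    """Logistic function of the intercept plus the weights of the applicable factors."""
    z = intercept + sum(betas)
    return 1.0 / (1.0 + math.exp(-z))

def prognosis_from_odds(betas, intercept):
    """Same result, computed by multiplying odds ratios instead of adding weights."""
    odds = math.exp(intercept)
    for b in betas:
        odds *= math.exp(b)
    return odds / (1.0 + odds)

HYPOTHETICAL_INTERCEPT = -2.5   # placeholder, not a published value
applicable_betas = [1.8, 0.6]   # weights of the factors that apply to the bill

p_sum = prognosis(applicable_betas, HYPOTHETICAL_INTERCEPT)
p_odds = prognosis_from_odds(applicable_betas, HYPOTHETICAL_INTERCEPT)
```

The two functions agree, which is why the computation can be described either as adding weights or as multiplying factors together.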

In choosing the factors for the model, we select from a large set of plausible factors those which appear to be statistically significant on their own (using a binomial distribution). After the logistic regression, we remove factors that appear statistically non-significant and re-compute the model.
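One way such a screening step could work is an exact binomial test of a factor's success count against the overall success rate. The text does not spell out the actual test or threshold, so the two-sided test and the function names below are assumptions. The example counts are approximated from the first results table (67 bills with a 72% success rate, against an overall rate of about 15%).

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def factor_p_value(successes, n, baseline):
    """Two-sided p-value: how surprising is this success count if the factor
    had no effect and bills succeeded at the baseline rate?"""
    upper = binomial_tail(successes, n, baseline)              # P(X >= successes)
    lower = 1.0 - binomial_tail(successes + 1, n, baseline)    # P(X <= successes)
    return min(1.0, 2.0 * min(upper, lower))

# Roughly 48 of 67 post office naming bills got out of committee,
# against an overall rate of about 15%.
p_strong = factor_p_value(48, 67, 0.15)

# A hypothetical factor whose bills succeed at exactly the baseline rate.
p_weak = factor_p_value(3, 20, 0.15)
```

A factor like the first would pass any conventional significance threshold, while the second would be dropped before the regression is fitted.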

Citation

To cite our methodology and results, we recommend either of these:

GovTrack.us. 2013. Bill Prognosis Analysis. Accessed at https://www.govtrack.us/about/analysis.

Tauberer, Joshua. 2012. Observing the Unobservables in the U.S. Congress, presented at Law Via the Internet 2012, Cornell Law School, October 2012.

References

Here is some academic work on the same subject:

Tae Yano, Noah A. Smith, and John D. Wilkerson. 2012. "Textual Predictors of Bill Survival in Congressional Committees," at New Directions in Analyzing Text as Data 2012, 5-6 October at Harvard.

John Wilkerson, David Smith, Nick Stramp, and Jeremy Dashiell. 2013. "Tracing the Flow of Policy Ideas in Legislatures: A Computational Approach".

Results

The following tables show how various factors help or hurt a bill or resolution’s chance of making it out of committee and getting enacted (or agreed to). Two tables are given for each of the four bill types.

In the tables, N is the number of bills/resolutions in the training corpus that had the indicated factor; %S is the percentage of those bills/resolutions that were successful (got past committee, or were enacted or agreed to); and β is the regression coefficient (weight) from the prognosis analysis. Higher weights increase the bill or resolution’s probability of success.

Bills sent out of committee to the floor

Overall, about 15% of the 8,905 bills in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N %S β Factor
67 72% 2.5 Title starts with "To designate the facility of the United States Postal".
27 48% 1.9 Title starts with "A bill to designate the".
534 55% 1.8 Sponsor is a relevant committee chairman.
286 60% 1.6 Got past committee in a previous Congress.
30 53% 1.6 Referred to Senate Appropriations (incl. companion).
799 46% 1.4 A cosponsor is a relevant committee chairman.
70 49% 1.2 Referred to Senate Indian Affairs (incl. companion).
158 22% 1.0 Referred to House Appropriations (incl. companion).
802 34% 0.9 Referred to House Natural Resources (incl. companion).
151 23% 0.9 On a companion bill: A cosponsor is a relevant committee chairman.
412 34% 0.7 Referred to Senate Energy and Natural Resources (incl. companion).
999 28% 0.7 A cosponsor is a relevant committee ranking member.
439 23% 0.6 Has a companion bill sponsored by a member of the other party.
2,312 20% 0.6 Has cosponsors from both parties.
725 28% 0.5 Sponsor is in majority party and 1/3rd+ of cosponsors are in minority party.
2,497 26% 0.5 Sponsor is on a relevant committee & in majority party.
1,650 24% 0.3 Cosponsor has high leadership score (majority party).
3,335 20% -0.2 2 or more cosponsors are on a relevant committee.
345 11% -0.4 Introduced in the last 90 days of the Congress (incl. companion bills).
4,045 8% -0.5 Sponsor is a member of the minority party.
229 8% -0.7 Referred to House House Administration (incl. companion).
1,501 6% -0.7 Referred to House Ways and Means (incl. companion).
391 11% -0.7 Referred to House Veterans' Affairs (incl. companion).
1,032 11% -0.8 Referred to House Judiciary (incl. companion).
387 11% -0.8 Referred to Senate Judiciary (incl. companion).
778 5% -0.8 Referred to House Education and the Workforce (incl. companion).
1,233 9% -0.8 Referred to House Energy and Commerce (incl. companion).
572 6% -0.9 Referred to Senate Health, Education, Labor, and Pensions (incl. companion).
2,228 6% -1.0 Is a bill reintroduced from a previous Congress.
401 6% -1.1 Referred to House Armed Services (incl. companion).
120 8% -1.1 Referred to House Rules (incl. companion).
726 4% -1.3 Referred to Senate Finance (incl. companion).
79 4% -1.8 Referred to Senate Agriculture, Nutrition, and Forestry (incl. companion).
121 2% -1.9 Referred to Senate Armed Services (incl. companion).
32 0% -30.1 Title starts with "A bill for the relief of".
29 0% -30.2 Title starts with "A bill to amend the Internal Revenue Code of".

Simple resolutions sent out of committee to the floor

Overall, about 46% of the 1,385 simple resolutions in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N %S β Factor
21 100% 33.2 Title starts with "A resolution to authorize".
99 98% 4.7 Title starts with "Providing for consideration of".
44 86% 2.9 Title starts with "A resolution recognizing the".
18 22% 2.2 On a companion bill: Sponsor has a high leadership score (majority party).
20 90% 2.1 Title starts with "A resolution commemorating the".
44 89% 1.8 Got past committee in a previous Congress.
97 56% 1.6 Sponsor is a relevant committee chairman.
32 84% 1.1 Title starts with "A resolution congratulating the".
111 33% 0.8 A cosponsor is a relevant committee ranking member.
276 35% 0.6 Cosponsor has high leadership score (majority party).
101 56% -0.7 Referred to Senate Foreign Relations (incl. companion).
86 57% -0.8 Referred to Senate Judiciary (incl. companion).
580 26% -0.9 Sponsor is a member of the minority party.
393 24% -0.9 2 or more cosponsors are on a relevant committee.
30 23% -1.2 Referred to House Ways and Means (incl. companion).
16 19% -1.4 Has a companion bill sponsored by a member of the other party.
26 19% -2.0 Title starts with "A resolution expressing the sense of the Senate that".
179 21% -2.4 Referred to House Foreign Affairs (incl. companion).
149 72% -2.6 Referred to House Rules (incl. companion).
31 16% -2.8 Referred to House Armed Services (incl. companion).
45 20% -3.0 Referred to Senate Health, Education, Labor, and Pensions (incl. companion).
63 6% -3.2 Referred to House Judiciary (incl. companion).
120 3% -3.4 Is a bill reintroduced from a previous Congress.
84 1% -3.7 Title starts with "Expressing the sense of the House of Representatives that".
52 13% -4.2 Referred to Senate Rules and Administration (incl. companion).
96 2% -4.4 Referred to House Energy and Commerce (incl. companion).
110 3% -4.5 Referred to House Oversight and Government Reform (incl. companion).
78 1% -4.8 Referred to House Education and the Workforce (incl. companion).
21 0% -33.2 Title starts with "Expressing support for designation of the".
20 0% -34.2 Title starts with "Supporting the goals and ideals of National".
21 0% -35.3 Title starts with "Expressing support for the".

Bills enacted

Overall, about 21% of the 1,333 bills that got past committee in 2013-2015 were enacted. The following factors help or hurt that:

N %S β Factor
48 72% 1.8 Title starts with "To designate the facility of the United States Postal".
29 53% 1.3 Referred to House Budget (incl. companion).
36 56% 1.2 Referred to Senate Health, Education, Labor, and Pensions (incl. companion).
205 44% 1.0 Sponsor is in majority party and 1/3rd+ of cosponsors are in minority party.
50 53% 1.0 Referred to Senate Homeland Security and Governmental Affairs (incl. companion).
34 46% 0.9 Referred to House Appropriations (incl. companion).
27 48% 0.9 Referred to Senate Finance (incl. companion).
25 48% 0.9 Referred to Senate Banking, Housing, and Urban Affairs (incl. companion).
28 42% 0.9 Referred to Senate Foreign Relations (incl. companion).
34 41% 0.8 Referred to Senate Indian Affairs (incl. companion).
140 40% 0.8 Referred to Senate Energy and Natural Resources (incl. companion).
50 37% 0.7 Referred to Senate Commerce, Science, and Transportation (incl. companion).
110 44% 0.6 Referred to House Energy and Commerce (incl. companion).
322 41% 0.6 Sponsor is a member of the minority party.
282 39% 0.5 A cosponsor is a relevant committee ranking member.
456 32% 0.3 Has cosponsors from both parties.
142 31% -0.4 Is a bill reintroduced from a previous Congress.
654 29% -0.8 2 or more cosponsors are on a relevant committee.

Simple resolutions agreed to

Overall, about 96% of the 634 simple resolutions that got past committee in 2013-2015 were agreed to. The following factors help or hurt that:

N %S β Factor
54 82% -1.9 Sponsor is a relevant committee chairman.
96 84% -2.3 2 or more cosponsors are on a relevant committee.

Joint resolutions sent out of committee to the floor

Overall, about 19% of the 178 joint resolutions in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N %S β Factor
21 57% 4.6 Sponsor is a relevant committee chairman.
38 50% 3.8 Sponsor is on a relevant committee & in majority party.
57 2% -4.2 Introduced in the first 90 days of the Congress (incl. companion bills).
21 0% -37.9 A cosponsor is a relevant committee ranking member.
56 0% -40.3 Title starts with "Proposing an amendment to the Constitution of the United".

Concurrent resolutions sent out of committee to the floor

Overall, about 39% of the 169 concurrent resolutions in 2013-2015 were sent out of committee to the floor. The following factors help or hurt that:

N %S β Factor
15 93% 3.1 Got past committee in a previous Congress.
65 18% -1.7 Sponsor is a member of the minority party.
18 11% -2.0 Has a companion bill in the other chamber.
17 0% -35.6 Referred to House Oversight and Government Reform (incl. companion).
25 0% -36.6 Title starts with "Expressing the sense of Congress that".

Concurrent resolutions agreed to

Overall, about 83% of the 66 concurrent resolutions that got past committee in 2013-2015 were agreed to. The following factors help or hurt that:

There were no statistically significant factors in the model.

Joint resolutions enacted or passed

Overall, about 42% of the 33 joint resolutions that got past committee in 2013-2015 were enacted or passed. The following factors help or hurt that:

There were no statistically significant factors in the model.

Did it work?

The following charts compare the prognoses computed for bills to their actual rate of success. The prognosis model for these charts was trained on the 113th Congress and tested on the 113th Congress.

For each regression model, the bills are divided into 10 bins by prognosis. The median prognosis is plotted on the horizontal axis and the percentage of successful bills in the bin is plotted on the vertical axis.
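The binning behind these charts can be sketched as follows. Whether the bins hold equal numbers of bills or span equal ranges of prognosis is not stated, so equal-count bins are an assumption here, and the function name and synthetic data are hypothetical.

```python
import statistics

def calibration_bins(prognoses, outcomes, n_bins=10):
    """Sort bills by prognosis, split them into bins of (nearly) equal size, and
    return (median prognosis, fraction successful) for each non-empty bin."""
    pairs = sorted(zip(prognoses, outcomes))
    bins = []
    for b in range(n_bins):
        lo = b * len(pairs) // n_bins
        hi = (b + 1) * len(pairs) // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        median = statistics.median(p for p, _ in chunk)
        success_rate = sum(o for _, o in chunk) / len(chunk)
        bins.append((median, success_rate))
    return bins

# Synthetic data: 100 bills with prognoses 0.00 to 0.99, where the upper
# half succeeded (outcome 1) and the lower half failed (outcome 0).
prognoses = [i / 100 for i in range(100)]
outcomes = [1 if i >= 50 else 0 for i in range(100)]
points = calibration_bins(prognoses, outcomes)
```

Plotting the first element of each pair on the horizontal axis and the second on the vertical axis gives one chart of the kind described above. A well-calibrated model would trace the diagonal.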

The prognosis closely estimates the actual chances of a bill getting out of committee. Though the accuracy is much less for other predictions, the rough upward slope in most of the charts shows that the prognosis was often predictive of a bill’s future.

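[Charts, one per model: Bills sent out of committee to the floor; Simple resolutions sent out of committee to the floor; Bills enacted; Simple resolutions agreed to; Joint resolutions sent out of committee to the floor; Concurrent resolutions sent out of committee to the floor. Horizontal axis: median prognosis of each bin. Vertical axis: percentage of successful bills in the bin.]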

Here are some additional charts for machine learning researchers.

The charts below show precision vs. recall plotted parametrically for various values of a success-fail threshold t. Bills with prognosis above t are predicted successes for the purposes of these charts. The prognosis model for these charts was trained on the 113th Congress and tested on the 113th Congress.
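The precision and recall points for such a chart can be computed as in this sketch. The function name and toy data are hypothetical, and reporting a precision of 1.0 when no bill clears the threshold is a convention chosen here, not something the text specifies.

```python
def precision_recall_curve(prognoses, outcomes, thresholds):
    """Return (t, precision, recall) for each threshold t, treating bills
    with prognosis above t as predicted successes."""
    points = []
    for t in thresholds:
        predicted = [p > t for p in prognoses]
        tp = sum(1 for pred, o in zip(predicted, outcomes) if pred and o)
        fp = sum(1 for pred, o in zip(predicted, outcomes) if pred and not o)
        fn = sum(1 for pred, o in zip(predicted, outcomes) if not pred and o)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        points.append((t, precision, recall))
    return points

# Toy data: five bills, two of which succeeded (outcome 1).
prognoses = [0.9, 0.8, 0.4, 0.3, 0.1]
outcomes = [1, 0, 1, 0, 0]
curve = precision_recall_curve(prognoses, outcomes, [0.2, 0.5, 0.85])
```

Raising the threshold generally trades recall for precision, and each chart traces that trade-off across many values of t.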

[Precision vs. recall charts, one per model: Bills sent out of committee to the floor; Simple resolutions sent out of committee to the floor; Bills enacted; Simple resolutions agreed to; Joint resolutions sent out of committee to the floor; Concurrent resolutions sent out of committee to the floor.]