Bulk Raw Data Documentation

GovTrack’s bulk raw data can be used to build other tools around Congressional information.

Our bulk data rsync server provides access to most of the information found on GovTrack. Data from the 113th Congress onward (2013 and later) comes from github:unitedstates/congress-legislators and github:unitedstates/congress, community projects that we helped create and now help maintain. Historical data comes from our legacy scrapers.

Consider using our API if you only need a small slice of the data.

What Data Is Available

  • Bills and Resolutions: Our database contains metadata about bills and resolutions (title, status, actions, etc.) from the 82nd Congress through the present, but consult the coverage table for details about what is available when.
  • Roll Call Votes: Our database contains roll call votes from the founding of the nation. See below.
  • Other Data: We have some other data as well. See below, but also consult the two github projects listed at the top!

Getting Started

Consider joining our mailing list for notices about any data format changes. If you have any ideas for making GovTrack data better, please post a message to the list to let others know.

You must agree to the license terms before accessing the bulk data or API. The terms are about as unrestrictive as terms go. Note that we do not make a distinction between commercial and noncommercial use.

If you are building a serious product and want a contractual guarantee that the data will remain available, such as an SLA, you’ll need to contact us at the address at the bottom of the page. That entails a nominal fee.

We’d appreciate it if you let us know how you are using the data, just to satisfy our curiosity. Email us at the address at the bottom of the page.

Getting the Data

The raw data is 43 gigabytes in all. As a result, we have a few different methods for helping you get the data you want:

  • You can browse the bulk data at http://www.govtrack.us/data. Start here to get an idea of the directory structure and what the files look like. We discourage using HTTP to actually download the data in bulk, though. (Also see the API for HTTP-based access.)
  • “Rsync” is the preferred method for obtaining the bulk data. You can use rsync to choose just the directories you want and it efficiently keeps your files up to date by downloading only changes since your last update. Rsync Instructions >
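
As a rough illustration of mirroring a single directory, the sketch below builds an rsync command line in Python. The remote address (RSYNC_REMOTE) is a placeholder and the remote path layout is an assumption, neither taken from this page; use the address and paths given in the Rsync Instructions. The flags shown (-a, -z, --delete) are standard rsync options.

```python
import shlex

# Placeholder only: this is NOT the real server address.
# Take the real address from the Rsync Instructions.
RSYNC_REMOTE = "rsync.example.invalid::bulkdata"  # hypothetical


def rsync_command(subdir: str, dest: str, remote: str = RSYNC_REMOTE) -> list[str]:
    """Build an rsync argument list that mirrors one directory of the bulk data."""
    return [
        "rsync",
        "-a",        # archive mode: recurse and preserve timestamps
        "-z",        # compress data in transit
        "--delete",  # remove local files that no longer exist on the server
        f"{remote}/{subdir.strip('/')}/",  # assumed remote layout
        f"{dest.rstrip('/')}/",
    ]


cmd = rsync_command("us/112/rolls", "data/us/112/rolls")
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Running the same command again later transfers only the files that changed since the last run, which is the reason rsync is preferred over HTTP for bulk downloads.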

Bulk Data Schema

General Structure

Most files are organized by “Congress.” A “Congress” is a two-year term of activity, starting in the year after an election year. Many things in Congress reset after each two-year term, such as bill numbers. In GovTrack, a “Congress” is called a “session.” This is a misnomer, because each “Congress” is actually made up of two “sessions,” which follow the calendar years.

The 112th Congress (session = 112) roughly covers the period 2011-2012, although technically the Congress usually starts a few days into January of the first year and extends a few days into January of the year following the last full year of the Congress. Each Congress is in its own directory: data/us/112, data/us/111, data/us/110, etc. We have roll call data going back to the first Congress, so we have data going back to the directory data/us/1.
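
The numbering above can be turned into a small helper. This is a minimal sketch: the function names are made up for illustration, and the year arithmetic relies on the 1st Congress having convened in 1789. It ignores the few days in January that technically belong to the previous Congress.

```python
def congress_years(congress: int) -> tuple[int, int]:
    """Return the two calendar years that a Congress roughly covers."""
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    first = 1789 + 2 * (congress - 1)  # the 1st Congress convened in 1789
    return first, first + 1


def congress_for_year(year: int) -> int:
    """Return the Congress (GovTrack 'session') that roughly covers a year."""
    if year < 1789:
        raise ValueError("There was no Congress before 1789")
    return (year - 1789) // 2 + 1


def congress_dir(congress: int) -> str:
    """Return the bulk data directory for a Congress, e.g. data/us/112."""
    return f"data/us/{congress}"


print(congress_years(112))      # (2011, 2012)
print(congress_for_year(2013))  # 113
print(congress_dir(1))          # data/us/1
```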

Schemas

  • Bills and resolutions are stored in data/us/112/bills for the current Congress and similarly named directories for other Congresses. Bill XML Schema >
  • Roll call votes are stored in data/us/112/rolls for the current Congress and similarly named directories for other Congresses. Votes by unanimous consent, for example, are not included here because they are not recorded votes. Roll Call Vote XML Schema >
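
Once a copy of the data is on disk, the two directories above can be located with a few lines of Python. This is a sketch under assumptions: the local mirror root ("mirror") is hypothetical, and it assumes the files sit directly inside each directory with an .xml extension. Check the directory listing and the schema pages for the actual file names.

```python
from pathlib import Path


def bills_dir(root: str, congress: int) -> Path:
    """Directory of bill and resolution files for one Congress."""
    return Path(root) / "data" / "us" / str(congress) / "bills"


def rolls_dir(root: str, congress: int) -> Path:
    """Directory of roll call vote files for one Congress."""
    return Path(root) / "data" / "us" / str(congress) / "rolls"


def xml_files(directory: Path) -> list[Path]:
    """List the XML files in a directory, or nothing if it is not mirrored yet."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.xml"))


# "mirror" is a hypothetical local copy of the bulk data.
for path in xml_files(rolls_dir("mirror", 112))[:5]:
    print(path.name)
```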

Other Files

  • data/us/sessions.tsv gives the start and end date of each session (typically one year) and Congress (typically two years).
  • The data/photos directory contains JPEG images of Members of Congress, past and present. Not all Members have photos. The name of each photo is the person's GovTrack numeric identifier followed by a size suffix (nothing for the largest original image available; 200px, 100px, or 50px for three sizes of the photo, by width) and then .jpeg. The -credit.txt files give a tab-delimited source URL and source description for each photo.
  • The data/us/bills.text directory includes the text of legislation in PDF, XML, and text format (from GPO) and in HTML format (from THOMAS). It is organized by Congress, bill type, and GPO bill status code. See GPO's documentation on bill text status.
  • DEPRECATED AS OF 1/3/2013: data/us/people.xml. This file contains everyone who has ever served in Congress, plus U.S. presidents, with their party affiliation, terms in Congress, birthdays, etc. This file is quite large, so it is best not to open it in your browser. It has been put together from a variety of sources and is maintained by hand. People.xml Schema >
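
The photo naming scheme described for data/photos can be sketched as follows. Two things here are assumptions rather than documented facts: the hyphen between the identifier and the size (inferred from the -credit.txt suffix) and the identifier 12345, which is invented for the example. Compare with the actual directory listing before relying on it.

```python
from typing import Optional

PHOTO_WIDTHS = (200, 100, 50)  # the three resized widths, in pixels


def photo_filename(person_id: int, width: Optional[int] = None) -> str:
    """File name of a photo; width=None means the largest original image."""
    if width is None:
        return f"{person_id}.jpeg"
    if width not in PHOTO_WIDTHS:
        raise ValueError(f"width must be one of {PHOTO_WIDTHS}")
    return f"{person_id}-{width}px.jpeg"  # hyphen separator is an assumption


def credit_filename(person_id: int) -> str:
    """File name of the credit file that accompanies a person's photo."""
    return f"{person_id}-credit.txt"


def parse_credit(text: str) -> tuple[str, str]:
    """Split a credit file's content into (source URL, source description)."""
    url, _, description = text.strip().partition("\t")
    return url, description


print(photo_filename(12345))       # 12345.jpeg
print(photo_filename(12345, 200))  # 12345-200px.jpeg
print(credit_filename(12345))      # 12345-credit.txt
```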