Bulk Raw Data Documentation

GovTrack’s bulk raw data can be used to build other tools around Congressional information.

Our bulk data rsync server provides access to most of the information found on GovTrack. Data are from github:unitedstates/congress-legislators and github:unitedstates/congress, community projects that we helped create and now help maintain.

Consider using our API if you only need a small slice of the data.

Getting Started

Consider joining our mailing list for notices about data format changes. If you have ideas for making GovTrack data better, please post a message to let others know. We’d also appreciate hearing how you are using the data, just to satisfy our curiosity: email us at the address at the bottom of the page.

You must agree to the license terms before accessing the bulk data or API. The terms are about as unrestrictive as terms go. Note that we do not make a distinction between commercial and noncommercial use. If you are looking for an SLA, you’ll need to contact us at the address at the bottom of the page.

Overview

You can familiarize yourself with the contents of our bulk data by browsing http://www.govtrack.us/data/congress-legislators/ and http://www.govtrack.us/data/congress/ to get a sense of the directory structure and what the files look like.

An explanation of the directory layout of the files and their formats is given below.

Getting the Data

We distribute the data over rsync, a common Unix/Mac tool for efficiently fetching files and keeping them updated as they change. The root of our rsync tree is govtrack.us::govtrackdata, which corresponds exactly to what you see at http://www.govtrack.us/data/.

To download bill data for the 113th Congress into a local directory named bills, run:

rsync -avz --delete --delete-excluded --exclude '**/text-versions/' \
    govtrack.us::govtrackdata/congress/113/bills .

(Note the double colon in the middle and the period at the end. The command is long, so the line continuation is marked with a backslash.)

This directory will grow as bills are introduced, and files will be updated regularly as we pull new information from Congress. To keep your files fresh, just run the command again. It will only download new and updated files!

The complete data directory is around 100 gigabytes in all, so keep your rsync command as narrowly focused as possible.
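One way to keep a sync narrowly focused is to generate the rsync arguments for a single Congress and subdirectory. The Python sketch below rebuilds the command shown above as an argument list. The names `build_rsync_command` and `sync` and their parameters are illustrative assumptions, not part of any GovTrack tooling, and nothing touches the network unless `sync` is called.

```python
import subprocess

RSYNC_ROOT = "govtrack.us::govtrackdata"  # rsync root named in this document


def build_rsync_command(congress, subdir="bills", dest=".", skip_text=True):
    """Return the rsync argument list for one Congress subdirectory.

    Hypothetical helper: it only reuses the flags from the command above.
    """
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    cmd = ["rsync", "-avz", "--delete", "--delete-excluded"]
    if skip_text:
        # Passed as a single argument, so no shell expands the pattern.
        cmd += ["--exclude", "**/text-versions/"]
    cmd += [f"{RSYNC_ROOT}/congress/{congress}/{subdir}", dest]
    return cmd


def sync(congress, **kwargs):
    """Run the sync. Requires rsync on PATH and network access."""
    subprocess.run(build_rsync_command(congress, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(" ".join(build_rsync_command(113)))
```

Building the list first makes it easy to review exactly which part of the tree a scheduled job will mirror before it runs.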

Although you can also see these files over HTTP, we discourage using HTTP to actually download the data in bulk. Use it for a few files, but don’t hammer our server with tens of thousands of HTTP requests.

For more info about rsync, see our rsync notes and the rsync documentation.

Terminology

A “Congress”

Most files are organized by “Congress.” A “Congress” is a two-year term of activity, starting in the year after an election year. Many things in Congress, such as bill numbers, reset after each two-year term. Each year is called a “session.” (In historical data, the durations of sessions and Congresses were more arbitrary.)

Congresses start and end on January 3 of odd-numbered years. The 113th Congress started on Jan 3, 2013 at noon and will end on Jan 3, 2015 at noon. (Again, in historical data the start and end dates of Congresses were more arbitrary.)

Congresses are divided into two “sessions,” which correspond roughly to calendar years. Because they are not exactly calendar years, we also call them legislative years. In our data, sessions are identified by the calendar year in which they start. For example, 2014 is the name for the second session (or legislative year) of the 113th Congress: it began in 2014 but will end on Jan 3, 2015.
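The arithmetic behind Congress numbers and session names can be sketched in a few lines of Python. This is a minimal sketch, not GovTrack code: the function names are hypothetical, and it assumes the 1st Congress began in 1789 and that every Congress has exactly two sessions named after consecutive calendar years, which, as noted above, does not hold reliably for historical data.

```python
FIRST_CONGRESS_START_YEAR = 1789  # assumption: the 1st Congress convened in 1789


def congress_start_year(congress):
    """Calendar year in which the given Congress began."""
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    return FIRST_CONGRESS_START_YEAR + 2 * (congress - 1)


def session_names(congress):
    """Session names (start years) for a modern two-session Congress."""
    start = congress_start_year(congress)
    return [str(start), str(start + 1)]


def congress_for_session(session):
    """Congress number that a session name such as '2014' belongs to."""
    year = int(session)
    if year < FIRST_CONGRESS_START_YEAR:
        raise ValueError("no Congress before 1789")
    return (year - FIRST_CONGRESS_START_YEAR) // 2 + 1


if __name__ == "__main__":
    print(congress_start_year(113))      # 2013
    print(session_names(113))            # ['2013', '2014']
    print(congress_for_session("2014"))  # 113
```

Note that `congress_for_session` takes a session name, not a date: a vote held on Jan 2, 2015 still belongs to session 2014.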

Bulk Data Files

Members of Congress (and Presidents/Vice Presidents)

  • /data/congress-legislators/
    This directory contains YAML and CSV files with information on Members of Congress, presidents, and vice presidents from 1789 to the present. It is basically a mirror of github:unitedstates/congress-legislators; see that project for documentation. The CSV file contains only a subset of the fields in the YAML files.
  • /data/photos
    This directory contains JPEG images of Members of Congress, past and present. The name of the photo is the GovTrack numeric identifier for the person followed by: .jpeg, for the largest original image available; -200px.jpeg, -100px.jpeg, and -50px.jpeg, for three standard sizes of the photo by width; or -credit.txt, which is a tab-delimited file containing first the URL where the image was acquired and second the name of the source, both intended to be used in credit links. The photos are sourced from various locations, and many come via the github:unitedstates/images project.
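The photo naming scheme described above can be expressed as a small Python helper. This is a sketch under stated assumptions: the function names are hypothetical, the identifier 400000 is a placeholder rather than a known person, and the credit parser assumes the URL and source name sit on the first line of the file.

```python
# Suffixes listed in the description of /data/photos.
PHOTO_SUFFIXES = {
    "original": ".jpeg",
    "200px": "-200px.jpeg",
    "100px": "-100px.jpeg",
    "50px": "-50px.jpeg",
    "credit": "-credit.txt",
}


def photo_path(person_id, kind="original"):
    """Path under /data/photos for a GovTrack numeric person identifier."""
    return f"/data/photos/{person_id}{PHOTO_SUFFIXES[kind]}"


def parse_credit(text):
    """Split the contents of a -credit.txt file into (url, source_name).

    Assumes the first line holds both tab-separated fields.
    """
    first_line = text.splitlines()[0]
    url, source = first_line.split("\t", 1)
    return url.strip(), source.strip()


if __name__ == "__main__":
    # 400000 is a placeholder identifier used only for illustration.
    print(photo_path(400000, "200px"))
    print(parse_credit("http://example.com/photo.jpg\tExample Source\n"))
```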

Committees and Committee Assignments

Bills, Amendments, Votes

Data files for bills, amendments, and votes are contained in directories named /data/congress/{congress}, where {congress} is the number of the Congress in which the bill or amendment was introduced or the vote took place. The files are the output of the scrapers developed in the github:unitedstates/congress project.

Within these directories you will find...

  • bills/{bill-type}/{bill-type}{number}/data.json
    example: /data/congress/113/bills/hr/hr4015/data.json
    Bill status in JSON format for the 93rd Congress (1973) forward, plus limited data for the 82nd-92nd Congresses (1951-1972; statutes and enrolled concurrent resolutions only) and the 6th-42nd Congresses (1799-1873). See the documentation at the github:unitedstates/congress project for details of the format, and our coverage table for what is available. You’ll also see XML files here, which are for legacy applications and are no longer supported.
  • bills/{bill-type}/{bill-type}{number}/text-versions/{version}/...
    example: /data/congress/113/bills/sconres/sconres14/text-versions/is
    Bill text and associated metadata, comprehensively since the 103rd Congress, plus OCR'd text of statutes for the 82nd-92nd Congresses (1951-1972), and links to scans for bills in the 6th-42nd Congresses (1799-1873). Bills change during their life cycle, and each “print” from the Government Publishing Office has a version code, but use the publication date and version name in the metadata rather than the version code. The text itself is stored in multiple formats. See the documentation for what formats are available and our coverage table for other details.
  • amendments/{amdt-type}/{amdt-type}{amdt-number}/data.json
    The metadata and status of floor amendments. See the documentation for details of the JSON format. Amendment information is available starting with the 97th Congress.
  • votes/{session}/{chamber}{vote-number}/data.json
    example: /data/congress/113/votes/2014/h108/data.json
    Roll call vote results. See the documentation for details of the JSON format. Note that not every vote in Congress is a roll call vote.
  • /data/us/sessions.tsv
    A TSV file containing the start and adjournment dates of each session of Congress.
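The path templates in the list above translate directly into helper functions. The Python sketch below builds bill and vote paths and parses a bill path back into its parts. The function names are hypothetical, and the chamber prefix is passed through as given (the example above uses "h") rather than validated.

```python
import re


def bill_data_path(congress, bill_type, number):
    """bills/{bill-type}/{bill-type}{number}/data.json under /data/congress."""
    return f"/data/congress/{congress}/bills/{bill_type}/{bill_type}{number}/data.json"


def vote_data_path(congress, session, chamber, number):
    """votes/{session}/{chamber}{vote-number}/data.json under /data/congress."""
    return f"/data/congress/{congress}/votes/{session}/{chamber}{number}/data.json"


# The backreference \2 requires the bill type to repeat in the last directory.
_BILL_PATH = re.compile(
    r"/data/congress/(\d+)/bills/([a-z]+)/\2(\d+)/data\.json$"
)


def parse_bill_path(path):
    """Return congress, bill type, and number for a bill path, or None."""
    match = _BILL_PATH.search(path)
    if match is None:
        return None
    return {
        "congress": int(match.group(1)),
        "bill_type": match.group(2),
        "number": int(match.group(3)),
    }


if __name__ == "__main__":
    print(bill_data_path(113, "hr", 4015))
    print(vote_data_path(113, 2014, "h", 108))
    print(parse_bill_path("/data/congress/113/bills/sconres/sconres14/data.json"))
```

With the arguments shown, the first two calls reproduce the example paths given in the list.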

Other files that you may see in the /data directory are unsupported. Use at your own risk.
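Once a bills directory has been mirrored locally, a loader can restrict itself to the documented files and ignore everything else. The Python sketch below walks only bills/{bill-type}/{bill-type}{number}/data.json. The function name `iter_bill_records` is hypothetical, and the code deliberately makes no assumption about the fields inside each JSON file; see the github:unitedstates/congress documentation for those.

```python
import json
from pathlib import Path


def iter_bill_records(bills_dir):
    """Yield (path, parsed_json) for each {type}/{type}{number}/data.json."""
    root = Path(bills_dir)
    if not root.is_dir():
        return
    for path in sorted(root.glob("*/*/data.json")):
        bill_type = path.parent.parent.name
        # Skip directories that do not follow the documented naming scheme.
        if not path.parent.name.startswith(bill_type):
            continue
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)


if __name__ == "__main__":
    # "bills" is the local directory created by the rsync command above.
    count = sum(1 for _ in iter_bill_records("bills"))
    print(f"loaded {count} bill records")
```

Because the generator yields one record at a time, it avoids holding every bill in memory at once.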