GovTrack is an open source project, which means that the inner workings of this site are completely transparent and open for revision --- just like we want the government to be! Although it is just maintained by one person at the moment (see the about page), I hope others will become involved.
The rest of this page is intended for programmers.
There are three components to GovTrack:
- The "front-end": This is the website that you see. It is based on ASP.NET, though it uses a very custom page generation system based on XSLT.
- The "legislative database": The legislative database is primarily a very large collection of XML files that is derived from a variety of government websites. The data files are used to power a variety of other websites (some are listed on this page).
- The "back-end": This is a collection of Perl scripts that access government websites to create and update the legislative database.
I would love to have you get involved. If you have ideas that you'd like to implement on this site, for instance, please see below for how you can hack on the site, and drop me an email!
If you want to get involved or are using the legislative database, I strongly encourage you to join the GovTrack mailing list.
There are also some widgets & APIs that you might be interested in.
The Front-end
To get involved with the development of the front-end of GovTrack (i.e. the website that you see), you can run a copy of the GovTrack website on your own computer, provided you are running Linux (Mac OS X will probably work too). You will be able to modify the source code of the website, and if you make changes you can send me a patch.
The pages and source code of the site are licensed under the GNU AGPL. In short, you may only make modifications to the code if you make your modifications publicly available.
You will need Mono (including "mcs" and "xsp") and Subversion installed.
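Before starting, it can help to confirm that the tools used in the steps below are on your PATH. The following is a minimal sketch and not part of GovTrack itself: the function name missing_tools is made up for illustration, and the tool list is an assumption assembled from the commands that appear on this page (xsp2 is the command used later to start the sandbox).

```shell
#!/bin/sh
# Print the name of each given tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not something GovTrack ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list, taken from the commands shown on this page.
missing_tools mcs xsp2 svn wget tar rsync
```

If the script prints nothing, every listed tool was found. Any name it prints needs to be installed before continuing.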
Make a directory for GovTrack files.
Check out the website "page" files from the source repository. These files are the XSLT templates (.xpd) that generate the pages of the site. This will create a "www" directory.
svn co svn://occams.info/govtrack/website/www
cd www
Download the website "code" binary .NET DLLs. These are some helper routines for the front-end files.
wget http://www.govtrack.us/frontend_bin.tgz
tar -zxf frontend_bin.tgz
rm frontend_bin.tgz
At this point you can start the website in sandbox mode running locally on your system. You can visit the site by visiting http://localhost:8080/index.xpd. The website will download data files from GovTrack's web server as it needs them (and will store them on disk for later), and will connect to GovTrack's MySQL database to access other information.
SANDBOX=1 xsp2
Once data files are downloaded, they won't be updated from GovTrack's server, so your files will go out of date. To update them efficiently, use this command:
rsync -az --existing govtrack.us::govtrackdata/us/110 data/us/
The sandbox won't download files needed by the web browser only, so PDFs for bills and automatically generated images like vote maps will not appear. If you really want all of the files, you can download them for a current session of Congress with the command below. It will download almost 500 megabytes, so for both your and my sake, don't do this unless you specifically want the missing files:
rsync -az govtrack.us::govtrackdata/us/110 data/us/
Additionally, the sandbox does not have access to the user profiles database, which means you cannot "log in" in the sandbox.
Some backend .NET code is used as helper functions to generate the pages of the website. The code is compiled to www/bin/GovTrackWeb.dll. To edit this code, check out the backend source files.
(cd out of the www directory)
svn co svn://occams.info/govtrack/website/src
After editing files, recompile the binary by running make:
cd src
make
The Legislative Database

The underlying data about the U.S. Congress that powers this site is the only such database made freely available for others to reuse. To the extent that I have any copyright claims to any of the data described here, I am releasing it into the public domain.
The data is primarily in XML format. You can browse the underlying source data for this website here. These are the very same data files that GovTrack uses to make itself go, so pretty much anything you see on the site is in one of those files.
Some documentation of the structure of the data files is in the Data Directory page on the wiki.
The source data is 16 gigabytes in all, so don't think about downloading the whole thing in one shot. And be nice on my bandwidth. The data covers the activity of bills, PDFs of bill texts, roll call votes, indexing for fast searches (meant for me not for you), and photos of members of Congress. Almost all of the data files are in XML format.
Please contact me if you would like to start using the data, just because I'm curious and like to know what it's being used for. You can download the data efficiently using rsync with the following Linux command:
rsync -az govtrack.us::govtrackdata/us/110/bills .
This will download the 110th Congress bill data into a directory called bills in the current directory. The first download should be roughly 75MB. Subsequent updates will be much smaller. The directory structure exposed by rsync mirrors the HTTP-browsable data directory (but, again, please don't do massive downloading by HTTP). If you're using Windows, please look for a Windows rsync client.
The XML files are updated roughly daily (a good time for you to rsync them is 4PM Eastern time, daily). The directories for roll call votes (e.g. .../110/rolls) are updated much more frequently. If you need almost-real-time roll call vote data, you can rsync that directory hourly.
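The update schedule described above could be wrapped in a small script along these lines. This is only a sketch under stated assumptions: the function names govtrack_src and sync_govtrack and the DRY_RUN switch are invented for illustration, and the rsync path layout is copied from the commands on this page. With DRY_RUN=1 the script only prints the commands it would run, so nothing is transferred.

```shell
#!/bin/sh
# Build the rsync source for one part of the data, following the layout
# shown on this page: govtrack.us::govtrackdata/us/<congress>/<subdir>
# (govtrack_src and sync_govtrack are hypothetical helper names.)
govtrack_src() {
    printf 'govtrack.us::govtrackdata/us/%s/%s\n' "$1" "$2"
}

# Mirror one subdirectory into a local destination directory.
# When DRY_RUN is 1, print the rsync command instead of running it.
sync_govtrack() {
    src=$(govtrack_src "$1" "$2")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'rsync -az %s %s\n' "$src" "$3"
    else
        rsync -az "$src" "$3"
    fi
}

# Remove this line to perform real transfers.
DRY_RUN=1

sync_govtrack 110 bills .   # suited to a daily job
sync_govtrack 110 rolls .   # suited to an hourly job, if you need it
```

Scheduling is left to whatever job scheduler you already use; the bills sync fits a daily run and the rolls sync an hourly one. Please keep the bandwidth request above in mind when picking an interval.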
These files you might find most interesting:
- people.xml: Everyone that has ever served in Congress, with their party affiliation, terms in Congress, birthdays, etc. (This file is quite large... best not to open it in your browser.)
- bills.index.xml: A summary of the bills introduced this session of Congress. (This file's format has been completely revised as of 2007-01-14.)
RDF Data for the Semantic Web
Most of the data that powers this site is archived in RDF, the data format of the Semantic Web. It's around 13 million triples, covering about eight years of legislative information. The vocabularies used in the data include FOAF, vCard, and several schemas I created.
There are a few ways that you can access the data. The first is by downloading the RDF files, which are in a mix of XML and N3 formats. You can also interactively browse the web of information. Lastly, you can query the data store using SPARQL.
The RDF files aren't regularly updated at the moment, and the structure is subject to change.
The Back-end
The back-end is a collection of Perl scripts that basically screen-scrape a handful of government websites to build the legislative database described above.
The scripts are in the process of being made publicly available.
The source code of the back-end is licensed under the GNU AGPL. In short, you may only make modifications to the code if you make your modifications publicly available. I am very serious about these terms. Again, you can download and use, but you cannot enhance the code without sharing your enhancements.
You can check out some of the back-end files with Subversion:
svn co svn://occams.info/govtrack/gather/us
If you checked out the files into the us directory, the scripts expect that there is a data directory alongside the us directory. It's the same data directory as referenced above.
You will need a whole bunch of Perl modules to run the scripts. The best way to figure out which ones you need is to look at the use directives in the scripts, or to just run them and see what's missing.
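One way to collect the use directives in a single pass is sketched below. It rests on a few assumptions: the function name list_perl_modules is made up, the scripts are assumed to end in .pl or .pm, and the short list of pragmas to ignore is hand-picked rather than complete. It only catches use lines that start at the beginning of a line (after optional whitespace), so modules loaded in other ways will not show up.

```shell
#!/bin/sh
# Print the distinct module names found in "use" directives of Perl files
# under a directory, one per line. Version requirements such as "use 5.010;"
# are skipped because the name must start with a letter or underscore.
# (list_perl_modules is a hypothetical helper; the pragma filter is a
# small hand-picked list, not an exhaustive one.)
list_perl_modules() {
    find "$1" -type f \( -name '*.pl' -o -name '*.pm' \) -exec cat {} + |
        sed -n 's/^[[:space:]]*use[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_:]*\).*/\1/p' |
        grep -v -x -E 'strict|warnings|utf8|vars|lib|constant' |
        sort -u
}

# Run against the checkout if it is present in the current directory.
if [ -d us ]; then
    list_perl_modules us
fi
```

Each name printed is a candidate to install from CPAN or your distribution's packages; modules that ship with Perl itself will also appear in the list.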
Some scripts have some dependencies on files that I haven't made available yet.
And I haven't yet written any documentation on how to use the scripts. Sorry!



