Getting Bulk Raw Data with Rsync
GovTrack’s bulk raw data can be used to build other tools around Congressional information.
To download GovTrack’s raw data files in bulk, or if you plan to regularly update the files, you should use the rsync tool. Rsync lets you select which directory of data you want and keeps your files up to date by downloading only what has changed since the last update.
Using rsync is pretty easy on Linux and Mac if you are comfortable with the command line. It is harder on Windows. Windows users may prefer the GovTrack API.
Rsync On Linux and Mac
Once rsync is installed, type on the command line:
rsync -avz --delete --delete-excluded govtrack.us::govtrackdata/us/112/bills .
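That is all one line. Note the double colon between the host name and govtrackdata, and the period at the end, which tells rsync to put the files in the current directory.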
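If you fetch more than one directory, the command can be wrapped in a small shell helper. The following is only a sketch: the function names govtrack_source and govtrack_sync are made up for illustration, and it assumes that the path always has the form us/NUMBER/DIRECTORY, as in the example above. Setting DRY_RUN=1 adds rsync's --dry-run option, which reports what would be transferred or deleted without changing anything.

```shell
#!/bin/sh
# Sketch only: govtrack_source and govtrack_sync are hypothetical helper names.

# Print the rsync source for a given number (e.g. 112) and directory (e.g. bills).
# Assumes the us/NUMBER/DIRECTORY layout seen in the example command.
govtrack_source() {
    printf 'govtrack.us::govtrackdata/us/%s/%s\n' "$1" "$2"
}

# Mirror one directory into the current directory.
# With DRY_RUN=1, rsync only reports what it would do.
govtrack_sync() {
    src=$(govtrack_source "$1" "$2")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        rsync -avz --dry-run --delete --delete-excluded "$src" .
    else
        rsync -avz --delete --delete-excluded "$src" .
    fi
}
```

After loading these functions into your shell, DRY_RUN=1 govtrack_sync 112 bills previews the transfer, and govtrack_sync 112 bills performs it.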
Rsync on Windows
On Windows, install DeltaCopy, which contains rsync for Windows. Then on a command line type:
mkdir C:\GovTrackData
cd "\Program Files\Synametrics Technologies\DeltaCopy"
rsync -avz --delete govtrack.us::govtrackdata/us/112/bills /GovTrackData
Note that you have to give the path to your GovTrackData directory without the drive letter. There are no drive letters in the Unix world, so rsync will interpret "C:" as the name of a remote host rather than a drive. Watch out for the double colons in the middle.
This will put bill XML files in either C:\GovTrackData\bills or C:\cygwin\GovTrackData\bills. Cygwin is a common Windows wrapper around Unix tools. Which of the two locations you get depends on DeltaCopy, not GovTrack.
File Structure / Getting Updates
The Linux and Mac command above downloads the current bill data into a directory called bills in the current directory. The download is about 50-100 MB. Run the same command again to fetch updated files. You can run this command daily around 4pm.
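A daily update can be automated with a short script run from cron. This is a rough sketch under several assumptions: the data directory, the log file name update.log, and the RSYNC override variable are all hypothetical, and because the time zone for "4pm" is not stated here, the sample crontab line simply uses 16:00 in the machine's local time.

```shell
#!/bin/sh
# Sketch of a daily update job. DATA_DIR, update.log and the RSYNC
# override are assumptions made for illustration.

RSYNC=${RSYNC:-rsync}                      # can be overridden for testing
DATA_DIR=${DATA_DIR:-$HOME/govtrackdata}   # hypothetical local directory

# Fetch the bills directory into DATA_DIR and append rsync's output to a log.
# Re-running transfers only the files that changed since the last run.
update_bills() {
    mkdir -p "$DATA_DIR" || return 1
    (
        cd "$DATA_DIR" || exit 1
        "$RSYNC" -avz --delete --delete-excluded \
            govtrack.us::govtrackdata/us/112/bills . >> update.log 2>&1
    )
}

# Only run the update when the script is called with the argument "run".
if [ "${1:-}" = "run" ]; then
    update_bills
fi

# Example crontab entry (16:00 local time; the script path is a placeholder):
# 0 16 * * * /path/to/update-govtrack.sh run
```

Running the update in a subshell keeps the change of directory from affecting whatever called the function.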
The directory path after govtrackdata mirrors the structure you will see if you browse the data directory over HTTP. Use your web browser and consult the raw data schema to figure out which directories you want.
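The directories can also be explored from the command line, since rsync lists the contents of a source when it is given no destination. The sketch below assumes the GovTrack server permits such listings. The function name govtrack_ls and the RSYNC override variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: govtrack_ls is a hypothetical helper name.

RSYNC=${RSYNC:-rsync}   # can be overridden for testing

# List the contents of a directory under the govtrackdata module.
# Given a source and no destination, rsync lists files instead of copying.
# The trailing slash asks for the directory's contents, not the directory itself.
govtrack_ls() {
    "$RSYNC" "govtrack.us::govtrackdata/$1/"
}
```

For example, govtrack_ls us/112 would show what is available next to the bills directory used above.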