Ben Hancock

Computational Journalism, Python, and Linux

Introducing CalScrape for Judicial Calendars


The U.S. District Court for the Northern District of California has 31 judges, each with a separate court calendar. Each is juggling hundreds of cases at a time, at courthouses in San Francisco, Oakland, San Jose, and as far north as Eureka. For journalists who cover the judiciary, it can be difficult to keep track of every newsworthy hearing -- let alone all the intermediate steps in a case that happen before a judge. And that's just at one federal court.

This is the problem that I set out to tackle in creating CalScrape, a command-line web scraping tool written in Python that lets users quickly search across judicial calendars by keyword (or a set of keywords). It's been a work in progress for a while now, but last weekend I released the project on GitHub under an open-source license. The project has already gotten its first code contribution (thanks, Robert!), and while it currently supports only the Northern District of California, I'm excited to add more courts in the near future.
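To give a rough sense of what searching calendars by keyword can look like, here is a minimal sketch. It is not CalScrape's actual code: the `CalendarEntry` class, the `search_entries` function, and the sample hearings are all made up for illustration, and the sketch skips the scraping step entirely, assuming the calendar entries have already been collected.

```python
from dataclasses import dataclass


@dataclass
class CalendarEntry:
    """One hearing on a judge's calendar (hypothetical structure)."""
    judge: str
    when: str
    description: str


def search_entries(entries, keywords):
    """Return entries whose description contains any keyword, ignoring case."""
    # Normalize keywords once and drop blank ones so they never match everything.
    needles = [k.strip().lower() for k in keywords if k.strip()]
    return [
        e for e in entries
        if any(n in e.description.lower() for n in needles)
    ]


# Made-up sample data, for illustration only.
SAMPLE = [
    CalendarEntry("Judge A", "Mon 9:00 AM", "Acme Corp. v. Widget Inc. - Motion to Dismiss"),
    CalendarEntry("Judge B", "Tue 2:00 PM", "USA v. Doe - Sentencing"),
    CalendarEntry("Judge A", "Wed 10:00 AM", "Roe v. Acme Corp. - Case Management Conference"),
]

if __name__ == "__main__":
    # Search for either of two keywords across every judge's entries.
    for entry in search_entries(SAMPLE, ["acme", "sentencing"]):
        print(f"{entry.judge} | {entry.when} | {entry.description}")
```

Matching on any keyword, rather than requiring all of them, is just one possible choice here; a reporter tracking several unrelated cases would probably want the former.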

Although I started this project primarily as a time-saving tool for fellow reporters, I've come to realize in the process how daunting it can be for ordinary members of the public to get information about the day-to-day goings-on in the judicial system. Journalists can at least sign up for email alerts on cases they know about through the electronic filing systems at some federal courts. I don't think the same level of access is available to most people. The court records system, PACER, is certainly a walled garden -- which is why the work being done by the Free Law Project is so important. This unequal access seems anathema to what is supposed to be a public, transparent system. I'm hoping that CalScrape can start to help bridge the information gap, even if only in a small way.

I have yet to lay out the full road map for this project, but in addition to adding support for other high-profile courts, I'm hoping to make the tool more user-friendly for folks who aren't as comfortable in the terminal. So check out the project on GitHub and stay tuned for more updates.