While it is thus quite simple to design a filter that does not overblock, and equally simple to design a filter that does not underblock, it is currently impossible, given the Internet’s size, rate of growth, rate of change, and architecture, and given the state of the art of automated classification systems, to develop a filter that neither underblocks nor overblocks a substantial amount of speech. The more effective a filter is at blocking Web sites in a given category, the more the filter will necessarily overblock. Any filter that is reasonably effective in preventing users from accessing sexually explicit content on the Web will necessarily block substantial amounts of non-sexually explicit speech.

4. Attempts to Quantify Filtering Programs’ Rates of Over- and Underblocking

The government presented three studies that attempt to quantify the over- and underblocking rates of five different filtering programs: two from expert witnesses, and one from a librarian fact witness who analyzed Internet use logs from his own library. The plaintiffs presented one expert witness who attempted to quantify the rates of over- and underblocking for various programs. Each of these attempts to quantify rates of over- and underblocking suffers from methodological flaws.
The fundamental problem with calculating over- and underblocking rates is selecting a universe of Web sites or Web pages to serve as the set to be tested. The studies that the parties submitted in this case took two different approaches to this problem. Two of the studies, one prepared by the plaintiffs’ expert witness Chris Hunter, a graduate student at the University of Pennsylvania, and the other prepared by the defendants’ expert, Chris Lemmons of eTesting Laboratories, in Research Triangle Park, North Carolina, approached this problem by compiling two separate lists of Web sites. The first list contained URLs that they deemed should be blocked according to the filters’ criteria; the second contained URLs that they deemed should not be blocked according to those criteria. They compiled these lists by choosing Web sites from the results of certain key word searches. The problem with this selection method is that the resulting sample is not random and does not necessarily approximate the universe of Web pages that library patrons visit.
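As an illustration only, the two-list approach can be reduced to two ratios. The Python sketch below assumes that underblocking is measured as the share of “should block” URLs that the filter lets through, and overblocking as the share of “should not block” URLs that the filter blocks; other definitions are possible (for example, wrongly blocked pages as a share of all blocked pages). The function name `blocking_rates` and the example URLs are hypothetical and do not come from any study in the record. Note that the arithmetic cannot cure the selection problem: the rates are only as representative as the lists fed into them.

```python
def blocking_rates(should_block, should_not_block, blocked):
    """Hypothetical helper: return (underblocking_rate, overblocking_rate).

    should_block: URLs a tester judged to fall within the filter's category.
    should_not_block: URLs a tester judged to fall outside that category.
    blocked: URLs the filter actually blocked when tested.
    Definitions of both rates are assumptions made for this sketch.
    """
    should_block = set(should_block)
    should_not_block = set(should_not_block)
    blocked = set(blocked)
    if not should_block or not should_not_block:
        raise ValueError("both test lists must be non-empty")

    # Underblocking: share of "should block" URLs the filter let through.
    missed = should_block - blocked
    under = len(missed) / len(should_block)

    # Overblocking: share of "should not block" URLs the filter blocked anyway.
    wrongly_blocked = should_not_block & blocked
    over = len(wrongly_blocked) / len(should_not_block)
    return under, over


# Hypothetical lists, for illustration only.
sb = ["a.example", "b.example", "c.example", "d.example"]
snb = ["e.example", "f.example", "g.example", "h.example", "i.example"]
filt = ["a.example", "b.example", "c.example", "e.example"]
print(blocking_rates(sb, snb, filt))
```

Here the filter misses one of four “should block” URLs and wrongly blocks one of five “should not block” URLs, giving an underblocking rate of 0.25 and an overblocking rate of 0.2 under the assumed definitions.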
The two other studies, one by David Biek, head librarian at the Tacoma Public Library’s main branch, and one by Cory Finnell of Certus Consulting Group, of Seattle, Washington, chose actual logs of Web pages visited by library patrons during specific time periods as the universe of Web pages to analyze. This method, while surely not as accurate as a truly random sample of the indexed Web would be (assuming it would be possible to take such a sample), has the virtue of using the actual Web sites that library patrons visited during a specific period. Because library patrons selected the universe of Web sites that Biek


