This phenomenon occurs for a number of reasons explicated in the more detailed findings of fact supra. These include limitations on filtering companies’ ability to: (1) harvest Web pages for review; (2) review and categorize the Web pages that they have harvested; and (3) engage in regular re-review of the Web pages that they have previously reviewed.
The primary limitations on filtering companies’ ability to harvest Web pages for review are that a substantial majority of pages on the Web are not indexable using the spidering technology that Web search engines use, and that together, search engines have indexed only around half of the Web pages that are theoretically indexable. The fast rate of growth in the number of Web pages also limits filtering companies’ ability to harvest pages for review. These shortcomings necessarily result in significant underblocking.
Several limitations on filtering companies’ ability to review and categorize the Web pages that they have harvested also contribute to over- and underblocking. First, automated review processes, even those based on “artificial intelligence,” are unable with any consistency to distinguish accurately material that falls within a category definition from material that does not. Moreover, human review of URLs is hampered by filtering companies’ limited staff sizes, and by human error or misjudgment.
In order to deal with the vast size of the Web and its rapid rates of growth and change, filtering companies engage in several practices that are necessary to reduce underblocking, but inevitably result in overblocking.
These include: (1) blocking whole Web sites even when only a small minority of their pages contain material that would fit under one of the filtering company’s categories (e.g., blocking the Salon.com site because it contains a sex column); (2) blocking by IP address (because a single IP address may contain many different Web sites and many thousands of pages of heterogeneous content); and (3) blocking loophole sites such as translator sites and cache sites, which archive Web pages that have been removed from the Web by their original publisher.
Finally, filtering companies’ failure to engage in regular re-review of Web pages that they have already categorized (or that they have determined do not fall into any category) results in a substantial amount of over- and underblocking. For example, Web publishers change the contents of Web pages frequently. The problem also arises when a Web site goes out of existence and its domain name or IP address is reassigned to a new Web site publisher. In that case, a filtering company’s previous categorization of the IP address or domain name would likely be incorrect, potentially resulting in the over- or underblocking of many thousands of pages.
The inaccuracies that result from these limitations of filtering technology are quite substantial. At least tens of thousands of pages of the indexable Web are overblocked by each of the filtering programs evaluated by experts in this case, even when considered against the filtering companies’ own category definitions. Many erroneously blocked pages contain content that is completely innocuous for both adults and minors, and that no rational person could conclude matches the filtering companies’ category definitions, such as “pornography” or “sex.”


