Children's Internet Protection Act (CIPA) Ruling eBook

United States District Court for the Eastern District of Pennsylvania
This eBook from the Gutenberg Project consists of approximately 196 pages of information about Children's Internet Protection Act (CIPA) Ruling.

The first method, entering certain keywords into commercial search engines, suffers from several limitations. First, the Web pages that may be “harvested” through this method are limited to those pages that search engines have already identified. However, as noted above, a substantial portion of the Web is not even theoretically indexable (because it is not linked to by any previously known page), and only approximately 50% of the pages that are theoretically indexable have actually been indexed by search engines. We are satisfied that the remainder of the indexable Web, and the vast “Deep Web,” which cannot currently be indexed, includes materials that meet CIPA’s categories of visual depictions that are obscene, child pornography, and harmful to minors. These portions of the Web cannot presently be harvested through the methods that filtering software companies use (except through reporting by customers or by observing users’ log files), because they are not linked to other known pages. A user can, however, gain access to a Web site in the unindexed Web or the Deep Web if the Web site’s proprietor or some other third party informs the user of the site’s URL. Some Web sites, for example, send out mass email advertisements containing the site’s URL, the spamming process we have described above.

Second, the search engines that software companies use for harvesting are able to search text only, not images. This is of critical importance, because CIPA, by its own terms, covers only “visual depictions.” 20 U.S.C. Sec. 9134(f)(1)(A)(i); 47 U.S.C. Sec. 254(h)(5)(B)(i). Image recognition technology is immature, ineffective, and unlikely to improve substantially in the near future. None of the filtering software companies deposed in this case employs image recognition technology when harvesting or categorizing URLs. Due to the reliance on automated text analysis and the absence of image recognition technology, a Web page with sexually explicit images and no text cannot be harvested using a search engine. This problem is complicated by the fact that Web site publishers may use image files rather than text to represent words, i.e., they may use a file that computers understand to be a picture, like a photograph of a printed word, rather than regular text, making automated review of their textual content impossible. For example, if the Playboy Web site displays its name using a logo rather than regular text, a search engine would not see or recognize the Playboy name in that logo.
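
The text-only limitation described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a description of any filtering company's actual software: the function names, the keyword list, and the two sample pages are hypothetical, and the sketch assumes the harvester examines only the character data of a page, ignoring image contents, file names, and tag attributes.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects only character data, as a text-only indexer is assumed to do."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; image files never reach this method.
        self.chunks.append(data)


def extract_text(html):
    """Return the page's character data with whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def matches_keyword(html, keywords):
    """True if any keyword appears in the page's extracted text."""
    text = extract_text(html).lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical keyword list and sample pages, for illustration only.
KEYWORDS = ["playboy"]

# The site name appears as regular text.
text_page = "<html><body><h1>Playboy</h1><p>Welcome.</p></body></html>"

# The site name appears only inside a logo image; the page has no text.
logo_page = '<html><body><img src="logo.gif"><img src="photo1.jpg"></body></html>'

print(matches_keyword(text_page, KEYWORDS))  # keyword found in the text
print(matches_keyword(logo_page, KEYWORDS))  # nothing for a text search to match
```

Under these assumptions the first page is matched and the second is not, because the second page contains no character data at all. A harvester that also read file names or alternate-text attributes could catch some such pages, but only where the publisher supplied descriptive text there.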

In addition to collecting URLs through search engines and Web directories (particularly those specializing in sexually explicit sites or other categories relevant to one of the filtering companies’ category definitions), and by mining user logs and collecting URLs submitted by users, the filtering companies expand their list of harvested URLs by using “spidering” software that can “crawl” the lists of pages produced by the previous four methods, following

Copyrights
Project Gutenberg
Children's Internet Protection Act (CIPA) Ruling from Project Gutenberg. Public domain.