Skip to content
Home
About Us
Resources
Profiles Metrics
Authors Directory
Institutions Directory
Top Authors
Top Institutions
Top Sponsors
AI Digest
Contact Us
Menu
Home
About Us
Resources
Profiles Metrics
Authors Directory
Institutions Directory
Top Authors
Top Institutions
Top Sponsors
AI Digest
Contact Us
Home
About Us
Resources
Profiles Metrics
Authors Directory
Institutions Directory
Top Authors
Top Institutions
Top Sponsors
AI Digest
Contact Us
Menu
Home
About Us
Resources
Profiles Metrics
Authors Directory
Institutions Directory
Top Authors
Top Institutions
Top Sponsors
AI Digest
Contact Us
Publication Details
AFRICAN RESEARCH NEXUS
SHINING A SPOTLIGHT ON AFRICAN RESEARCH
computer science
WebGuard: A Web Filtering Engine Combining Textual, Structural, and Visual Content-Based Analysis
IEEE Transactions on Knowledge and Data Engineering, Volume 18, No. 2, Year 2006
Notification
URL copied to clipboard!
Description
Along with the ever-growing Web comes the proliferation of objectionable content, such as sex, violence, racism, etc. We need efficient tools for classifying and filtering undesirable Web content. In this paper, we investigate this problem and describe WebGuard, an automatic machine learning-based pornographic Web site classification and filtering system. Unlike most commercial filtering products, which are mainly based on textual content-based analysis such as indicative keywords detection or manually collected black list checking, WebGuard relies on several major data mining techniques associated with textual, structural content-based analysis, and skin color related visual content-based analysis as well. Experiments conducted on a testbed of 400 Web sites including 200 adult sites and 200 nonpornographic ones showed WebGuard's filtering effectiveness, reaching a 97.4 percent classification accuracy rate when textual and structural content-based analysis was combined with visual content-based analysis. Further experiments on a black list of 12,311 adult Web sites manually collected and classified by the French Ministry of Education showed that WebGuard scored a 95.62 percent classification accuracy rate. The basic framework of WebGuard can apply to other categorization problems of Web sites which combine, as most of them do today, textual and visual content. © 2006 IEEE
Authors & Co-Authors
Hammami, Mohamed Ali
France, Villeurbanne
Laboratoire D'informatique en Image et Systèmes D'information
France, Paris
Cnrs Centre National de la Recherche Scientifique
Chen, Liming
France, Villeurbanne
Laboratoire D'informatique en Image et Systèmes D'information
Statistics
Citations: 97
Authors: 2
Affiliations: 2
Identifiers
Doi:
10.1109/TKDE.2006.34
ISSN:
10414347
Research Areas
Environmental