The amount of image data available on the World Wide Web has grown exponentially since the early development of the Internet. With so many images available, an effective image retrieval system is greatly needed. In this paper, we present an approach that combines image matching and indexing techniques and improves on existing integrated image retrieval methods. The technique follows a two-phase approach, integrating query-by-topic and query-by-example specification methods. In the first phase, topic-based image retrieval is performed using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. This phase relies on a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which to find the images. In the second phase, we use query-by-example specification to perform a low-level content-based image match that narrows the results to a smaller set of images more closely resembling the example image. For this match, information related to the image features is automatically extracted from the query image. The main objective of our approach is to develop a functional image search and indexing technique and to demonstrate that better retrieval results can be achieved.
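As an illustration of how the structured format of HTML can inform topic-based image retrieval, the following is a minimal sketch, not the authors' implementation. It assumes that text in an image's `alt` attribute, the page title, and the nearest preceding heading are useful signals, and the field weights in `FIELD_WEIGHTS` are hypothetical values chosen only for the example.

```python
from html.parser import HTMLParser

# Hypothetical field weights (an assumption): text closer to the image counts more.
FIELD_WEIGHTS = {"alt": 3.0, "title": 2.0, "heading": 1.5}


class ImageContextParser(HTMLParser):
    """Collects each <img> together with the page title and latest heading."""

    def __init__(self):
        super().__init__()
        self.page_title = ""
        self.last_heading = ""
        self.images = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = tag
        elif tag in ("h1", "h2", "h3"):
            self._current = tag
            self.last_heading = ""  # a new heading replaces the previous one
        elif tag == "img":
            a = dict(attrs)
            self.images.append({
                "src": a.get("src") or "",
                "alt": a.get("alt") or "",
                "title": self.page_title.strip(),
                "heading": self.last_heading.strip(),
            })

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.page_title += data
        elif self._current is not None:
            self.last_heading += data


def score_image(image, keyword):
    """Sum the weights of every field that mentions the keyword."""
    kw = keyword.lower()
    return sum(w for field, w in FIELD_WEIGHTS.items() if kw in image[field].lower())


def rank_images(html, keyword):
    """Return image URLs with a non-zero score, best match first."""
    parser = ImageContextParser()
    parser.feed(html)
    scored = [(score_image(img, keyword), img["src"]) for img in parser.images]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps page order on ties
    return [src for score, src in scored if score > 0]
```

Calling `rank_images(page_html, "tiger")` returns the images whose surrounding structure mentions the keyword, ordered by the weighted score.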
Since the advent of the Internet, the amount of image data available on the World Wide Web has grown immensely. With images available on this scale, an efficient and effective image retrieval system is required to make use of this information. This research presents an image matching and indexing technique that improves on existing integrated image retrieval methods. The proposed technique follows a two-phase approach, integrating query-by-topic and query-by-example specification methods. The first phase consists of topic-based image retrieval using an improved text information retrieval (IR) technique that makes use of the structured format of HTML documents. It relies on a focused crawler that lets the user enter not only the keyword for the topic-based search but also the scope in which to find the images. The second phase uses query-by-example specification to perform a low-level content-based image match that narrows the results to a smaller set of images more closely resembling the example image. Information related to the image features is automatically extracted from the query image by the image processing system. Content-based matching is performed with a color-feature technique that is not computationally intensive. The main goal is to develop a functional image search and indexing system and to demonstrate that better retrieval results can be achieved with the proposed hybrid search technique.
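The abstract does not say which color feature is used. As one inexpensive possibility, the sketch below compares coarsely quantized RGB histograms by histogram intersection. The choice of feature, the bin count, and all function names are assumptions made for illustration, and images are represented simply as lists of `(r, g, b)` tuples.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over coarse RGB bins (channel values 0-255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(value, h2.get(bin_, 0.0)) for bin_, value in h1.items())


def rank_by_color(query_pixels, candidates, top_k=3):
    """Order candidate images (name -> pixel list) by similarity to the query."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in candidates.items()
    ]
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored[:top_k]]
```

In a two-phase setup like the one described, `candidates` would be the images returned by the topic-based phase, so the color comparison only runs over a small set.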
The enormous growth of the World Wide Web has made it important to perform resource discovery efficiently for any given topic. Several new techniques have been proposed in recent years for this kind of topic-specific web mining. A key one is focused crawling, which crawls topic-specific portions of the web without having to explore all pages. Most existing research on focused crawling considers a simple topic definition that typically consists of one or more keywords connected by an OR operator. However, this kind of simple topic definition may return too many irrelevant pages in which the same keyword appears in the wrong context.
In this research, we explore new strategies for crawling topic-specific portions of the web using complex and precise predicates. A complex predicate allows the user to specify a topic precisely using Boolean operators such as "AND", "OR" and "NOT". Our work concentrates first on defining a format for this kind of complex topic definition and second on devising a crawl strategy that covers the topic-specific portions of the web defined by the complex predicate efficiently and with minimal overhead. Our new crawl strategy will improve the performance of topic-specific web crawling by reducing the number of irrelevant pages crawled.
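To make the idea of a complex topic predicate concrete, here is a minimal sketch of how a crawler might test a page against one. The nested-tuple format and the names `tokenize`, `matches`, and `is_relevant` are hypothetical. The abstract does not describe the format actually defined in this work.

```python
import re


def tokenize(text):
    """Lowercase word set of a page's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(predicate, words):
    """Evaluate a nested predicate such as ("AND", ("TERM", "x"), ("NOT", ...))."""
    op, *args = predicate
    if op == "TERM":
        return args[0].lower() in words
    if op == "NOT":
        return not matches(args[0], words)
    if op == "AND":
        return all(matches(arg, words) for arg in args)
    if op == "OR":
        return any(matches(arg, words) for arg in args)
    raise ValueError(f"unknown operator: {op}")


def is_relevant(predicate, page_text):
    """True if the page satisfies the topic predicate."""
    return matches(predicate, tokenize(page_text))


# Example topic: jaguar AND (habitat OR wildlife) AND NOT car
topic = (
    "AND",
    ("TERM", "jaguar"),
    ("OR", ("TERM", "habitat"), ("TERM", "wildlife")),
    ("NOT", ("TERM", "car")),
)
```

With a simple OR-of-keywords topic, a page about the Jaguar car would match the keyword "jaguar". The `NOT` clause in `topic` lets the crawler reject it, so its links need not be followed.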
In order to demonstrate the effectiveness of the above approach, we have built a complete focused crawler called "Eureka" with complex predicate support, and a search engine that indexes and supports end-user searches on the crawled pages.
As Web technology progresses, XML (Extensible Markup Language) has become a new data exchange format for future Web mining and applications. Various XML middleware systems have been developed for transferring Web data stored in relational databases to XML documents, enabling a uniform data searching technique. However, these systems are inefficient because they use extra memory and database resources, which leads to poor scalability in Web mining development. In this paper, we explore how to build an efficient XML middleware for large-scale Web mining applications. Our approach is based on the idea that if the XML structure is properly embedded in the relational content when the XML middleware tables are constructed, the data in the relational database can be retrieved with a minimal amount of memory and database resources. The results of our approach will be analyzed and compared with related research.
Conference Committee Involvement (3)
Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006
17 April 2006 | Orlando (Kissimmee), Florida, United States
Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2005
28 March 2005 | Orlando, Florida, United States
Data Mining and Knowledge Discovery: Theory, Tools, and Technology VI