Tuesday, July 14, 2009
What do we mean by "Semantics" in Semantic Search?
I have often been asked this question in the last few months by clients and others who want to know more about what is happening in the world of search. Is it the same as natural language search? Or is it applicable only to semantic data expressed in RDF and OWL? Are the companies developing semantic search basically trying to throw Google out? How relevant is it for an enterprise? These are the standard questions I get asked.

At a high level, semantic search is about interpreting the meaning of the query in a smart way. Most of today's search engines are still primarily keyword based, though the trend has been changing since early 2008. Suppose you ask a question:

Q: What is the time difference between the US and Tanzania (a country in Africa)?

Average search engine: Will give you all documents containing the keywords 'time difference', 'US' and 'Tanzania'. This is the case with millions of web sites that use "plain vanilla" search.

Good search engine (having some semantic capabilities): Will probably understand the intention of the query and give you links to websites that talk about time differences between countries. There are still various inconsistencies in the answers if you test any of the popular search engines out there. You just need to ask the same question in a different way to test the "semanticity" of the engine.

Real semantic search engine: Will give you the time difference itself by getting the right content, or it will tell you that it doesn't know the answer. This is very difficult for any technology out there to achieve on a consistent basis.

So most of this discussion will focus on good semantic search engines. Even though popular search engines like Google, Yahoo, Bing (Microsoft) and Ask.com claim to have semantic abilities, they still fall into the category of a good search engine. They are a hybrid of keyword-based and semantic search.
None of these engines is purely semantic in nature. Google is very clear in its messaging: it is not going to replace keyword search with semantic search capabilities. Most search engines today still depend heavily on how the query is phrased. They do a poor job of understanding the meaning of the query if it is phrased in a different way. Semantic search engines should disambiguate very cleverly: if someone is asking about "furniture", the engine should also consider documents that mention "tables" and "chairs". Basically, it should understand the context. A good semantic search engine takes that burden off the user and answers a query even if it is asked in a very different way.

It is also very important to understand that these engines are not getting their content from the semantic web. They are still working with the same millions of documents on the regular web. They haven't touched the semantic web for your query needs. The "semantic" aspect of these search engines relates only to their ability to interpret the query. Swoogle is the notable exception: it fetches its content only from the semantic web, basically data written in RDF format.

The need to improve the "semanticity" of search engines will grow exponentially. It is very difficult to measure, though. There are primarily two categories of engines. The first is pure-play semantic search engines like Hakia, SenseBot, Cognition Search, Exalead, Powerset and many others. The second is the Googles, Yahoos, Microsofts and Ask.coms of the world, which are integrating semantic search algorithms into their core search technology. The line between these two categories is blurring.

There is no single approach to semantic search. There are at least four, and most semantic search engines mix and match them in various ways to yield a unique search experience for their users.
The point of semantic search is to use meaning to improve the user's search experience. One approach is to use contextual analysis to help disambiguate queries.

Another approach focuses on reasoning. Given a set of facts that are represented in the system, additional facts can be inferred from them.

A third approach emphasizes natural language understanding. These engines process the content they index and the queries people submit to try to identify the intent behind them. They use the syntax of the sentence, together with rules, to identify people, places, organizations, and so forth. Powerset makes extensive use of natural language understanding.

The fourth approach uses an ontology to represent knowledge about a domain and to expand queries. In this approach, when a user enters a query for a term like "sofa sets", the system adds terms from its ontology (e.g., "furniture", because a sofa set is a kind of furniture) to make the search more focused as well as broader. This approach is used by a large number of semantic search systems.

Google and Yahoo will continue to have an edge because of the existing visitors to their sites and their continued investment in these technologies. Despite claims by various vendors, semantic search engines have a long way to go. There is a big gap between the information that exists out there and the tools that can actually retrieve it.
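
To make the ontology-based approach described above a little more concrete, here is a minimal sketch of query expansion in Python. It is only an illustration, not how any of the engines mentioned here actually work: the tiny BROADER ontology, the function names and the crude plural handling are all assumptions made up for this example.

```python
# Hypothetical toy ontology (an assumption for this sketch):
# each term maps to its broader "is a kind of" term.
BROADER = {
    "sofa set": "furniture",
    "table": "furniture",
    "chair": "furniture",
}


def narrower_terms(term, broader=BROADER):
    """Return the terms whose broader term is `term`, in sorted order."""
    return sorted(child for child, parent in broader.items() if parent == term)


def expand_query(query, broader=BROADER):
    """Return the normalized query term followed by related ontology terms."""
    term = query.strip().lower()
    # Crude plural handling, for this sketch only: "sofa sets" -> "sofa set".
    if term not in broader and term.endswith("s") and term[:-1] in broader:
        term = term[:-1]
    expanded = [term]
    if term in broader:
        # Broaden: a sofa set is a kind of furniture.
        expanded.append(broader[term])
    # Narrow: furniture also covers chairs, sofa sets and tables.
    expanded.extend(narrower_terms(term, broader))
    return expanded


print(expand_query("sofa sets"))   # ['sofa set', 'furniture']
print(expand_query("furniture"))   # ['furniture', 'chair', 'sofa set', 'table']
```

A real system would feed the expanded terms into its normal keyword index, probably with lower weights for the added terms, so that a document about "chairs" can still match a query about "furniture".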