Tuesday, July 14, 2009

What do we mean by "Semantics" in Semantic Search?

Over the last few months, clients and other people curious about the world of search have often asked me this question. Is it the same as natural language search? Or does it apply only to semantic data expressed in RDF and OWL? Are the companies developing semantic search basically trying to push Google out? How relevant is it for an enterprise?
At a high level, a semantic search engine is about interpreting the meaning of the query in a smart way. Most of today's search engines are still primarily keyword based, though the trend has been changing since early 2008. Suppose you ask: "What is the time difference between the US and Tanzania (a country in Africa)?"
  • An average search engine will give you all documents containing the keywords 'time difference', 'US' and 'Tanzania'. This is the case with the millions of web sites that use "plain vanilla" search.
  • A good search engine (one with some semantic capabilities) will probably understand the intention of the query and give you links to websites that talk about time differences between countries. There are still various inconsistencies in the answers if you test any of the popular search engines out there. You just need to ask the same question in a different way to test the "semanticity" of the engine.
  • A real semantic search engine will give you the time difference itself by getting the right content, or it will tell you that it doesn't know the answer. This is very difficult to attain on a consistent basis with any technology out there.
So most of the discussion will focus on good semantic search engines. Even though popular search engines like Google, Yahoo, Bing (Microsoft) and Ask.com claim to have semantic abilities, they still fall into the category of a good search engine. They are a hybrid of keyword-based and semantic search.
None of these engines is purely semantic in nature. Google is very clear in its messaging: it is not going to replace keyword search with semantic search capabilities. Most search engines today still depend heavily on how the query is phrased. They do a poor job of understanding the meaning of the query if it is phrased in a different way. A semantic search engine should disambiguate cleverly: if someone is asking about "furniture", the engine should also consider documents that mention "tables" and "chairs". Basically, it should understand the context. A good semantic search engine takes that burden off the user and answers a query even if it is asked in a very different way.
It is also very important to understand that these engines are not getting their content from the semantic web. They are still working with the same millions of documents on the regular web. They haven't touched the semantic web to answer your query. The "semantic" aspect of these search engines relates only to their ability to interpret the query. Swoogle is one of the few search engines that fetches its content only from the semantic web, basically data written in RDF format.
The need to improve the "semanticity" of search engines will keep growing, although semanticity itself is very difficult to measure. There are primarily two categories of engine. The first is pure-play semantic search engines like Hakia, Sensebot, Cognition Search, Exalead, Powerset and many others. The second is the Googles, Yahoos, Microsofts and Ask.coms of the world, who are integrating semantic search algorithms into their core search technology. The line between these two categories is blurring.
There is no single approach to semantic search. There are at least four, and most semantic search engines mix and match them in various ways to yield a unique search experience for their users.
The point of semantic search is to use meaning to improve the user's search experience. The main approaches are:
  • Contextual analysis, which uses the context of a query to help disambiguate it.
  • Reasoning. Given a set of facts that are represented in the system, additional facts can be inferred from them.
  • Natural language understanding. These engines process the content they index and the queries people submit to try to identify the intent behind them. They use the syntax of the sentence and rules to identify people, places, organizations, and so forth. Powerset makes extensive use of natural language understanding.
  • Ontology-based query expansion, which uses an ontology to represent knowledge about a domain. With this approach, when a user enters a query for a term like "sofa sets", the system adds terms from its ontology (e.g., "furniture", because a sofa set is a kind of furniture) to make the search more focused as well as broader. This approach is used by a large number of semantic search systems.
Google and Yahoo will continue to have an edge because of the existing visitors to their sites and their continued investment in these technologies. Despite the claims of various vendors, semantic search engines have a long way to go. There is a big gap between the information that exists out there and the tools that can actually retrieve it.
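To make the ontology idea concrete, here is a minimal sketch of query expansion. It assumes a tiny hand-written "is a kind of" hierarchy; the terms, the `IS_A` table and the function names are all made up for illustration and do not come from any real search engine or ontology. Walking up the hierarchy gives broader terms ("sofa set" leads to "furniture"), and walking down gives narrower ones ("furniture" leads to "table" and "chair").

```python
# Toy ontology: each term maps to its direct parent ("is a kind of").
# Every term and relation here is illustrative, not taken from a real ontology.
IS_A = {
    "sofa set": "seating",
    "chair": "seating",
    "seating": "furniture",
    "table": "furniture",
}


def broader_terms(term):
    """Follow is-a links upward, e.g. sofa set -> seating -> furniture."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def narrower_terms(term):
    """Collect every term that is, directly or indirectly, a kind of `term`."""
    return sorted(t for t in IS_A if term in broader_terms(t))


def expand_query(term):
    """Return the original term plus its broader and narrower terms."""
    return [term] + broader_terms(term) + narrower_terms(term)


print(expand_query("sofa set"))   # ['sofa set', 'seating', 'furniture']
print(expand_query("furniture"))  # ['furniture', 'chair', 'seating', 'sofa set', 'table']
```

A real engine would presumably weight the added terms lower than the original one and draw them from a much larger ontology, but the basic move is the same: the hierarchy, not the user, supplies the related words.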

Monday, July 13, 2009

How do you explain Semantic Technology to someone who has never heard of it?

“Any intelligent fool can make things bigger and more complex. It takes a touch of genius - and a lot of courage to move in the opposite direction.” Albert Einstein

This post is not a subtle attempt to look like a genius. It is about a question which has been bothering me for some time. How do you explain a concept like semantic technology to someone higher up in your organization who will never have the time to go to a conference or read a book on OWL? The idea is to explain it in simple words while still conveying its value. Explaining it as Web 3.0, the read-write-execute web or the rebirth of AI won't really help.

IMHO, semantic technology can be explained as a way of describing data models so that they can be linked with other data models as if they were all part of one gigantic database.
Semantic technology solves the data integration problem in a big way. It simplifies the way you connect and exchange data with many systems. The value is that the result is much easier to maintain. Also, with the introduction of new semantic tools in the marketplace, we will increasingly see it maintained by business users with minimal code changes. The W3C, the standards body behind the semantic web, has done a phenomenal job of coming up with languages like RDF and OWL which make all of this possible.
Businesses have already made substantial investments in databases, data warehouses, BI, ILM, ERP, CMS and enterprise search. Even so, forming a complete picture of enterprise data is very hard to achieve. Tracking data on a timely basis is one of the biggest business and technical challenges! At the end of the day, we need reliable information about business performance! Semantic web technologies, if applied correctly, can solve this problem without scrapping your existing investments.
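The "one gigantic database" idea can be sketched in a few lines. RDF describes data as (subject, predicate, object) triples, and the sketch below imitates that with plain Python tuples rather than a real RDF library. The two source systems, the `ex:` identifiers and the `describe` helper are all hypothetical, and the example assumes both systems already use the same identifier for the same customer, which in practice is the hard part.

```python
# Two hypothetical systems describe the same customer as
# (subject, predicate, object) triples, the basic shape of RDF data.
# The identifiers (ex:cust42 and so on) are made up for this sketch.
crm_triples = {
    ("ex:cust42", "ex:name", "Acme Corp"),
    ("ex:cust42", "ex:accountManager", "ex:emp7"),
}
billing_triples = {
    ("ex:cust42", "ex:outstandingInvoice", "ex:inv1001"),
    ("ex:inv1001", "ex:amountUSD", 2500),
}

# Because both sources use the same identifier for the customer,
# combining them is a plain set union with no table-to-table mapping.
graph = crm_triples | billing_triples


def describe(subject):
    """Return everything the merged graph says about one subject."""
    return {p: o for s, p, o in graph if s == subject}


print(describe("ex:cust42"))
```

After the union, a single lookup on `ex:cust42` returns facts that originally lived in two separate systems, and following `ex:inv1001` leads on to the invoice amount. Neither source system had to be changed or replaced to get there.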

This primer http://www.semantic-conference.com/primer.html might be useful to the more inquisitive.

Sunday, July 12, 2009

Semantic Conference 2009

I was fortunate to be one of the attendees at the Semantic Conference 2009, held at the Fairmont hotel in San Jose, CA. I have to admit that I was very impressed to witness a huge crowd, probably 1300+, during these recessionary times. I was told by one of the organizers that the number of attendees has been growing considerably since the inception of this conference back in 2005. It was also very encouraging to see that more than 20% of the attendees came from the business side, which validates the fact that semantic technology is no longer confined to R&D labs and intellectual discussions. There is a strong interest in applying this technology to derive maximum business value from it. Some notable sessions were:
  • Keynote from Reuters about the state of semantic technology and where the money is
  • Semantic Search discussion between C-level executives from Microsoft, Google and Yahoo
  • At least two sessions from the VC community expressing strong interest in the business side of semantic technologies
I had no doubt, even before going to this conference, that semantic technology can solve some problems which are very difficult to solve using the traditional or well-accepted technologies in the enterprise. I was very disappointed to see almost no representation from the well-known names on Wall Street. It is a well-known fact that the financial industry has been an early adopter of most new technologies, but that doesn't seem to be the case here. Maybe the turmoil in the financial industry is taking its toll on everything. On the brighter side, I saw a large number of attendees from health care and pharma companies, who presented many use cases and applications of this technology to solve real business problems. I was also surprised to see a large contingent from Europe, who presented many government-sponsored projects in semantic technology. It seems that Europe has finally decided to take a very proactive leadership role in advocating semantic technology!
To sum it up, I left the conference with a smile on my face and with added enthusiasm for semantic technology. There is no doubt left in my mind that it is just a question of time before this technology becomes mainstream and its business benefits become very real.
More to follow in the next post...