Priyank Mohan: September 2009

Wednesday, September 30, 2009

SQL versus XQuery versus SPARQL!

There is no argument that SQL is the most widely used query language and has been in use for decades. It is a long-established standard and has been used primarily for tabular data. Millions of developers already know it, but it always tends to get complex when more tables are involved. I will avoid talking more about it, as I am assuming that people who are reading this blog will know about the limitations and pain points of SQL.

Let's focus more on the newer query languages, XQuery and SPARQL:

XQuery is great for finding data in tree representations. You will find almost all relational database vendors mixing XQuery with SQL, though it tends to get very complex. XQuery is even becoming part of the SQL standard now. XQuery works great with native XML databases like MarkLogic and X-Hive (now Documentum). XQuery is most popular with content publishers. It is a standalone programming language for information-intensive applications. To go deeper into this, please see this brilliant presentation. XQuery is still a relatively new standard, so its developer community is nowhere near the league of SQL's.
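To make "finding data in tree representations" a little more concrete, here is a minimal sketch in Python. It is not XQuery itself: it uses the small XPath subset supported by the standard library's ElementTree, and the catalog document, element names and the helper function are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document; element and attribute names are invented.
CATALOG_XML = """
<catalog>
  <book year="2007">
    <title>XQuery Basics</title>
  </book>
  <book year="2009">
    <title>Semantic Web Primer</title>
  </book>
  <journal year="2009">
    <title>Data Quarterly</title>
  </journal>
</catalog>
"""


def book_titles_for_year(xml_text, year):
    """Return the titles of all <book> children of the root with the given year."""
    root = ET.fromstring(xml_text)
    # Roughly what a path such as /catalog/book[@year = $year]/title selects.
    path = "./book[@year='{}']/title".format(year)
    return [title.text for title in root.findall(path)]


print(book_titles_for_year(CATALOG_XML, "2009"))
```

The point is that the query follows the shape of the tree (catalog, then book, then title) instead of joining tables, which is the kind of navigation XQuery is designed for.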

SPARQL is more useful in a pattern-matching paradigm, when you have to traverse many relationships. It makes federated queries possible; federated queries can be thought of as queries made simultaneously to multiple, disparate databases that could be located anywhere on a network. This is a key concept in making disparate and separate data stores seem like a single repository. Those who want to dive deeper into SPARQL should use this tutorial.
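The pattern-matching idea can be sketched in a few lines of Python. This is a toy illustration only, not a SPARQL engine: the two in-memory "stores", the predicate names and the helper functions are all assumptions made for the example, and a real federated query would talk to remote endpoints rather than Python lists.

```python
# Two toy "stores" of (subject, predicate, object) triples. All names are hypothetical.
HR_STORE = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "globex"),
]
CRM_STORE = [
    ("acme", "locatedIn", "Boston"),
    ("globex", "locatedIn", "Springfield"),
]


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings, or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Constant: must match exactly.
            return None
    return result


def query(patterns, stores):
    """Evaluate the patterns in order against the union of all stores."""
    triples = [t for store in stores for t in store]
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Roughly: SELECT ?person ?city WHERE { ?person worksFor ?org . ?org locatedIn ?city }
rows = query(
    [("?person", "worksFor", "?org"), ("?org", "locatedIn", "?city")],
    [HR_STORE, CRM_STORE],
)
for row in rows:
    print(row["?person"], row["?city"])
```

Note that the shared variable ?org is what joins facts held in two different stores, so the caller sees them as if they were a single repository.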

Why Cambridge Semantics can make a difference during these times?

There is no doubt that these are hard times for smaller software companies to get onto the radar of a CIO/CFO of any enterprise - small, medium or big. All of us know that in any case only 20% of the budget is allocated for new projects or products. It is harder still for a smaller company offering a relatively new technology that only a very small percentage of technology folks have even heard of. I really like the practical approach and focus of Cambridge Semantics, a Boston-based semantic technology startup.

They are not asking the end users to write big ontologies or think about any other paradigm shift. They use semantic technology to solve some very common problems, like consolidating various spreadsheets. The Anzo suite of products enables you to liberate your data, allowing it to be exchanged freely between proprietary applications, relational databases or even Microsoft Excel spreadsheets. They simply leave the data where it is, read and write data between multiple data sources including spreadsheets, and let the user create the view they want. The application can be up and running in a few days.

Recently, UCB, a leading biopharma company, selected Cambridge Semantics to combine and analyze assay data from dozens of partners around the world. You can get the full news here.

Monday, September 28, 2009

Interesting research from European Semantics researchers!

The research, called "Triple Space", is about making machines talk to each other in a more asynchronous way using web services. The idea is analogous to cloud computing: just as computational resources there are distributed and provided as a service over the internet, the Triple Space deals with data, offering a simple, scalable way for machines to share information asynchronously.
To create the Triple Space, the TripCom researchers worked on making web services and the data they use understandable by computers, using semantic web technologies to communicate machine-readable knowledge rather than raw data.

Just as humans can access the same webpage with different web browsers and different operating systems, computers are able to publish and read information in the Triple Space without format, process or technical constraints. One of the use cases cited is e-Health record systems. For the full story, please go to:

http://www.sciencedaily.com/releases/2009/09/090923105631.htm

Is Bing taking market share from Google?

The story posted on cnnmoney.com reads like this: "Microsoft stages a big comeback." The story talks about new offerings from Microsoft like Windows 7, cloud computing and Bing - their new semantic search engine. Microsoft, which is viewed as a laggard behind Google in the search/advertising business, seems to be gaining market share from its rival. Its share of online business has grown to 9% (some sources even say 10%) since it released Bing.com. The 9% takes MSN into account as well. If they manage to reach the 15%-20% range, it will translate into billions of dollars.

For full story, you can go to:
http://money.cnn.com/2009/09/28/technology/Microsoft_stock.moneymag/index.htm?postversion=2009092804

Friday, September 25, 2009

Google adds semantic web support for video search!

Google, which has been behind Yahoo in semantic web efforts, is trying to catch up fast. Google announced support for enhanced markup for video search. This will allow webmasters to include important information, such as titles and descriptions, in machine-readable HTML along with the JavaScript or Flash videos themselves. It will use open structured data standards such as microformats and RDFa to give users more detailed previews of the information contained on the web page. Yahoo! SearchMonkey has done something very similar for content.

Tim Berners Lee video about his journey and vision!

This interesting video was published this year.

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

He talks about his journey over the years from the regular web to the semantic web.

Thursday, September 24, 2009

How Semantic Technology helps Data Strategy? The discipline of Business Semantics!

Data Strategy has again become a "hot topic" in the enterprise. The two things which are of utmost importance in this area or require painstaking work are:

1) Data Standardization - Agreeing what to call things, which things are equivalent

2) Data Governance - Who "owns" the data, who is the steward, who gets to change things, who is responsible for quality

If you really think about it, more time is spent on analyzing the quality of data and doing reconciliation than on anything else. The bottom line is that there has to be a common language between business and IT. Business semantics, or developing a dictionary-based solution, is the key here. You can call this business dictionary a knowledge base or an ontology-based solution.

In my view, an “Enterprise Ontology” in the Semantic world is equivalent to an Enterprise Data Model in the Relational world. Developing an ontology is similar to creating a data model. However, one important difference is that ontologies allow you to seamlessly move from logical design to physical implementation. Also if you have established a business dictionary using the Web Ontology Language (OWL), you have encoded it in machine-processable as well as human-readable form.

Relational databases (and the applications that use them) behave as if every row in a table is unique. So when a customer database contains “Microsoft” and “Microsoft Inc.,” the system behaves as if they are two different customers. In the world of ontologies, we should be able to simply declare the equivalence of the two instances and be done with it.
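As a rough sketch of what "declare the equivalence and be done with it" buys you, the Python below keeps a small index of declared equivalences (in OWL this kind of statement is made with owl:sameAs) and resolves every name to one canonical instance before aggregating. The class name, the order figures and the customer names other than the Microsoft pair are assumptions for illustration; this is not how any particular ontology tool implements it.

```python
class EquivalenceIndex:
    """Tracks which instance names have been declared to be the same thing."""

    def __init__(self):
        self._parent = {}

    def canonical(self, name):
        """Follow declared equivalences until a representative name is reached."""
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            name = self._parent[name]
        return name

    def declare_same(self, a, b):
        """Declare that a and b denote the same instance; a's representative wins."""
        root_a = self.canonical(a)
        root_b = self.canonical(b)
        self._parent[root_b] = root_a


# Hypothetical order rows: (customer name as entered, order amount).
orders = [("Microsoft", 100), ("Microsoft Inc.", 250), ("Oracle", 75)]

index = EquivalenceIndex()
index.declare_same("Microsoft", "Microsoft Inc.")

totals = {}
for customer, amount in orders:
    key = index.canonical(customer)
    totals[key] = totals.get(key, 0) + amount

print(totals)
```

The rows themselves are never rewritten; the single declaration is enough for every later query to treat the two spellings as one customer.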

Business Semantics is a fresh market and there are very few case studies out there. This market will only grow in the long run, as a semantic technology approach to ensuring data lineage and data quality is the best possible approach from a flexibility and maintainability perspective. There is a startup in Belgium called Collibra which is actively working on it using a purely semantic approach.

Web 2.0 versus Web 3.0! Which is real and has more long-term value?

Recently, someone forwarded me this interesting video - http://www.youtube.com/watch?v=oalBUgzKaLw&hl . The video asks whether social media (Web 2.0 to some) is a fad or not. In my opinion, this video is powerful! Social media is a great concept and it works - it has been exploited very cleverly by early adopters. But there is not much you can do with it from an entrepreneurship or innovation perspective. I see three categories:

1) Facebook, LinkedIn, MySpace, YouTube, Wikipedia etc. - Basically user-generated content! The game is pretty much limited, as there is opportunity for very few players here. All these companies are real and are going to stay for long. The only way to make money here is through advertising.

2) Software companies who are making wikis, blogs and other interesting ways to collaborate - SharePoint is a good example. This is very horizontal in nature. The focus is productivity. There is not much scope for innovation here either, and it is a very saturated market. In the end, it is all about configuration, governance and best practices.

3) Wikinomics examples - Top companies are inviting people/customers/clients to give design/product ideas, e.g. Nissan or P&G inviting their loyal customers for new ideas.

In the end, social media is not about software but about the power to attract a certain segment of people who have common interests and are interested in contributing content. Category 1 is the only one where you might see some limited innovation in a verticalized domain. It is also the only category where a very few will ever make any money, through advertising. Social media is really about people to people. Tim Berners-Lee, the father of the web, goes on to say that "if Web 2.0 for you is blogs and wikis, then that is people to people. But that was what the web was supposed to be all along." There is just no new technology here!

The video mentioned above doesn't compare the business value of TV and radio with that of social media. When TV and radio were invented, there were two kinds of people who benefitted tremendously: a) those who designed, manufactured and sold them; b) the people who created content/channels/programs - this is an endless cycle, as you always need to create new content, and a vast industry mushroomed around it. Social media is all about user-generated content, but the people who create the content gain little monetarily, whereas in the case of TV and radio the people who created content benefitted tremendously.

So where does this leave the Semantic Web, or Web 3.0 as many call it? The key concepts have been explained in earlier blog posts and I will continue to explain them in future ones. But Web 3.0 is really a data web or executable web. It is about the reassembly of data and the reorganization of data pieces. It can help in solving problems from drug discovery to national security. Even a semantic wiki has a lot more to offer than a regular wiki. We are going to see an unprecedented level of innovation in this space.

Wednesday, September 23, 2009

How Open Calais initiative is helping Semantic Technology!

The Open Calais initiative, started by Thomson Reuters, is one of the most interesting things to have happened in favour of the semantic web vision. Reuters acquired this technology through its acquisition of ClearForest, one of the leading vendors in the text analytics space. It is stated that the service could quickly become the largest repository of metadata (in the form of named entities and facts) on the Web if it stored the resulting metadata from each request. Open Calais is a "metadata extraction service"; it is a Web service that allows you to automatically annotate content and extract information like facts and named entities (people, places, organizations and much more) from unstructured text. Calais uses linguistic parsing (also known as entity extraction) in a service-enabled way to produce RDF triples and Semantic Web data models.

Open Calais opens the door to the possibility of lowering the barrier enough for everyday users to publish semantic content. It finally addresses what critics say is the greatest obstacle to the Semantic Web: it takes the metadata burden off the end user by providing an automatic meta-tagging tool. The Open Calais initiative will also be one of the biggest enablers of the Linked Data initiative.

Recently, CNET has joined the OpenCalais initiative as one of the first commercial media companies to publish core data assets for public, programmatic use on the open semantic Web. CNET will leverage OpenCalais' connection to the rapidly expanding 'Linked Data cloud' to allow its original content -- such as tech product reviews on laptops, TVs, smart phones, and digital cameras; news articles and blog posts from its CNET News editorial staff; and parts of its core technology product catalog -- to be available for public use.

Tuesday, September 22, 2009

Example of Semantic Technology in action - Yahoo Search Monkey

It is a misconception when people question the viability of semantic technologies by saying that "it isn't possible to convert all the data to RDF format." In reality, this objection does not hold at all. Again, it is important to keep in mind that we are not talking about the semantic web, we are talking about semantic technologies. This is best explained by looking at how Yahoo Search Monkey is using RDFa (RDF in Attributes) to make search results more useful and visually appealing. It is just one simple example, but it can hopefully answer the sceptics. It can be a boon to small businesses, who can drive more traffic to their web sites.

SearchMonkey looks for special data inside websites, based on a standard called RDFa. Your website should include this data so it is available to Yahoo as they crawl and index your site. This way your business information is available to any developers who build SearchMonkey apps, and you will show up with enhanced results as this gets adopted over time. The SearchMonkey platform has three main components:

"- Site owners share structured data with Yahoo!, using semantic markup (microformats, RDF), standardized XML feeds, APIs (OpenSearch or other web services), and page extraction.
- Third party developers build SearchMonkey applications.
- Consumers customize their search experience."

RDFa is a way to embed structured data within HTML and XHTML pages so that both people and machines can make use of it. The underlying representation of RDFa is RDF because it is flexible enough to let publishers build and evolve their own vocabularies. You can see Search Monkey in action by clicking on this link http://www.yelp.com/search?find_desc=nobu&ns=1&rpp=10&find_loc=San+Francisco%2C+CA#find_loc=new%20york

You can see the enhanced quality of the result "nobu new york restaurant". You will probably find that you have been using semantic technologies without even realizing it.
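To give a feel for what "special data inside websites" means, here is a deliberately simplified Python sketch that pulls property/value pairs out of RDFa-style markup using only the standard library. The sample page, the "v:" vocabulary prefix and the collector class are assumptions for illustration; a real RDFa processor also handles subjects, nested resources, prefixes and the content attribute, none of which is attempted here, and this is not how SearchMonkey itself is implemented.

```python
from html.parser import HTMLParser

# Hypothetical business listing with RDFa-style "property" attributes.
SAMPLE_PAGE = """
<div typeof="v:Organization">
  <span property="v:name">Example Bistro</span>
  <span property="v:tel">555-0100</span>
  <p>Open daily.</p>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects the text of every element that carries a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # property name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        self._current = dict(attrs).get("property")

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.properties[self._current] = text

    def handle_endtag(self, tag):
        self._current = None


collector = PropertyCollector()
collector.feed(SAMPLE_PAGE)
print(collector.properties)
```

The visible page is unchanged for human readers; the extra attributes are what let a crawler pick out the name and phone number reliably instead of guessing from the layout.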