The Bill and Melinda Gates Foundation is undoubtedly one of the largest and most transparent privately funded philanthropic organizations in the world. The foundation's work ranges from providing vaccines that prevent childhood diseases to funding cutting-edge research to improve agricultural yields and prevent malaria, among other things. Of course, philanthropy is philanthropy, and there is no big or small in that world; anybody who does it genuinely is a unique entity - you only need to travel outside the US, Western Europe and a few other developed nations to understand how desperately it is needed in the rest of the world. Still, if you follow this particular foundation closely, I believe it is visionary in all its pursuits and focus areas. Today, I was pleasantly surprised to be contacted by Fenton Communications, which handles communications for the foundation, to write about an interesting press release on my blog. This is the bare minimum I can do, as I understand the importance of what they are trying to accomplish - I have also been fortunate to work with a few philanthropic organizations, and I understand how valuable this concept can be for them.
The Bill and Melinda Gates Foundation is funding a new digital-media hub called ViewChange.org. The hub will use semantic technology to create a platform that combines the video-sharing power of YouTube with the open information of Wikipedia and the mission of your favorite advocacy organization.
ViewChange.org is being created by a social change organization, making it one of the first times a non-profit has been on the leading edge of technological innovation. They are partnering with Zemanta, Freebase and OpenCalais - three well-known names in the semantics world.
Actor Danny Glover announced the launch of the project today via email and in a video.
ViewChange.org is using the power of semantic technology to make videos, articles, blogs, and actions readily available to people working in global development. While watching high-impact video stories on the site, viewers can choose to dig deeper by exploring up-to-date details on which organizations are involved, links to related content, and lists of relevant actions they can take.
For example, imagine you are watching a short documentary about clean water issues in India. As the video plays, adjacent windows will dynamically generate links to actions and media directly related to each scene. These could include organizations involved in clean water and sanitation, action campaigns related to water issues, relevant videos from YouTube, articles from research organizations, and the latest updates from news services and blogs.
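Under the hood, this kind of experience usually amounts to matching the topics or entities associated with a video segment against a catalog of related resources. Here is a deliberately simplified sketch of that idea in Python; the topic sets and the catalog are invented for illustration, and this is not ViewChange.org's actual architecture.

```python
# Hypothetical catalog of related resources, each annotated with the topics it covers
CATALOG = [
    {"title": "Charity: water campaign", "type": "action", "topics": {"clean water", "sanitation"}},
    {"title": "WHO report on waterborne disease", "type": "article", "topics": {"clean water", "health"}},
    {"title": "Microfinance in rural India", "type": "video", "topics": {"india", "microfinance"}},
]

def related_links(scene_topics: set, limit: int = 5) -> list:
    """Rank catalog items by how many topics they share with the current video scene."""
    scored = [(len(scene_topics & item["topics"]), item) for item in CATALOG]
    scored = [pair for pair in scored if pair[0] > 0]          # drop items with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)        # most overlapping first
    return [item for _, item in scored[:limit]]

# A scene in a documentary about clean water issues in India
for item in related_links({"clean water", "india", "sanitation"}):
    print(item["type"], "-", item["title"])
```

The real system presumably uses extracted entities and a much richer knowledge base, but the ranking-by-shared-meaning idea is the same.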
This is a great cause, and it will be an immense help to anybody doing noble work for any society. Please spread the word in any way you can.
Friday, February 26, 2010
Thursday, February 25, 2010
Where does my money go? A beginning for open government in the US, UK and the rest of the world
James Madison, the political philosopher and fourth president of the United States, once said, "If men were angels, no government would be necessary." There are no clear historical records of the first official government, democracy or parliament, despite various claims by a few old civilizations, but I guess men figured out long before James Madison that they can never be angels and will always need a government to live happily and prosper. I am sure people didn't want just any government; they also hoped and craved for a smarter, open and transparent government. But it seems nobody in all these generations could define clearly what openness and transparency really mean for a government. The good news is that all of this is changing! Surprisingly, for the first time "data" is taking the lead in defining an open government - maybe because it is measurable and never lies.
The two big initiatives, data.gov and data.gov.uk (still in beta), were launched by the US and UK governments in May 2009 and January 2010 respectively. In fact, UK Prime Minister Gordon Brown asked Berners-Lee to look at access to government data in June, after Barack Obama's administration launched its open data site. The UK already ranks number three in the OECD (Organisation for Economic Co-operation and Development) study, behind Austria and Portugal, in the sophistication of its e-services, so making its data available was the next logical step. At a high level, the goals of both initiatives are the same: make government data available online to the general public for improved access; encourage creative use of that data outside the walls of government; invite public participation, collaboration and feedback; and identify unexpected and insightful data relationships - insights that would normally take several decades and hundreds or thousands of brilliant social scientists, statisticians, psychologists, focus groups and public policy experts to even suspect. Both initiatives are great because of the intention behind them, but let's take a closer look at their current state.
Data.gov started with forty-seven datasets but already has thousands of datasets from eighty-one US agencies. It links directly to data files in various formats including CSV, XML, Excel, and KML (a minimal example of consuming one of these files follows the list below). A lot seems to be lacking though:
- It makes little effort to highlight or promote projects that use the data from the site
- The focus is more on being a repository
- What you can do with the data is not very clear
- The website needs a lot of work in terms of clarity and user experience
- It is still not developer friendly and needs to develop an ecosystem
- There's no basic demographic data like population from the Census Bureau
- Browse and search functionality seems to be missing
- One of the applications shows the amount of money the government expects to receive from corporate and personal income taxes, projected through 2014 - click on the link to access it.
- If you want to know how knowledgeable your state is - click on the link to access it.
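As promised above, here is a minimal Python sketch of consuming one of the catalog's CSV files and summarizing a column. The URL and the column name are placeholders, not a reference to a real dataset.

```python
import csv
import io
import urllib.request
from collections import Counter

# Placeholder URL - substitute the direct CSV link of any dataset listed on data.gov
DATASET_URL = "https://example.gov/path/to/dataset.csv"

def summarize_column(url: str, column: str) -> Counter:
    """Download a CSV file and count how often each value appears in one column."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8", errors="replace")
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row[column] for row in reader if row.get(column))

if __name__ == "__main__":
    counts = summarize_column(DATASET_URL, "agency")  # "agency" is a hypothetical column name
    for value, count in counts.most_common(10):
        print(f"{value}: {count}")
```

The point is that the raw files are easy to read; what is missing is the linking and the ecosystem around them.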
I am still not sure why a linked data (semantic technology) approach was not taken from the beginning. Overall, data.gov in the US still has a long way to go before its goals are met. Ideally, it would be great to see more impressive applications that use data from different sources and give you insight into a specific problem. Nevertheless, it is moving forward - and it is understandable that managing and simplifying the process of publishing humongous amounts of data from so many agencies is a herculean task.
Now, when I look at data.gov.uk, I have to say that I am simply impressed considering the progress they have made in six or seven months. Kudos to the team, along with Sir Tim Berners-Lee, that has been working on it. They used the semantic technology - basically linked data - approach from the beginning. It also has a modern design with a very developer-friendly approach. Combining data and creating mashups from different sources in this context is not an easy task, but semantic technologies have made it possible. Overall, data.gov.uk's approach is simple and clear - they have used open standards, open source and open data. The website has quality and elegance written all over it even though it is in beta. They still need to figure out many things, like modelling the various datasets behind the scenes, encouraging more participation, and all the other things you can think of in a project of this complexity. But the results are showing! The top ten applications as rated in this Telegraph article are impressive. You can see a screenshot of one of the applications, called "Where Does My Money Go?", or click here to access the prototype.
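Because the UK data is published as linked data, you can query it directly with SPARQL instead of downloading files. A rough sketch in Python using the SPARQLWrapper library is below; the endpoint URL is an assumption for illustration, not the site's documented address, and the query is deliberately generic.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint for illustration - check data.gov.uk for the SPARQL endpoints it actually exposes
ENDPOINT = "http://services.data.gov.uk/example/sparql"

# A generic query that works against any RDF store: list a few resources and their labels
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE {
  ?resource rdfs:label ?label .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["resource"]["value"], "-", binding["label"]["value"])
```

That ability to join datasets with a query language, rather than stitching CSV files together by hand, is what makes the linked data approach so developer-friendly.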
If you are really keen to go deeper into the UK government's approach to this implementation, I encourage you to read this document - Putting the Frontline First: Smarter Government. The UK might have followed the US, Australia and New Zealand in implementing open and smarter government, but it now seems that its template will be followed by the rest of the world. It will be interesting to see if and when countries like India, China and Russia follow this trend!
"This is very much the beginning. Hopefully, this is the tip of the iceberg. There is a whole lot more to do." But what a beginning! These were the words of Sir Tim Berners Lee when he launched the beta site for data.gov.uk.
Monday, February 22, 2010
After Best Buy, now Tesco adopts Semantic Technology!
It wasn't long ago that I wrote about how Best Buy is using semantic technology to set a new trend. Now it is Tesco, the largest British retailer, that is following similar steps. Simply put, Tesco is to the UK what Walmart is to the US. The two do get into each other's territory, though: almost a year ago Tesco launched Fresh & Easy, a chain of 10,000-square-foot convenience stores in the US, to compete with Walmart.
Tesco has also been known as a company with a unique ability to manage vast reams of data and translate it into sales. It also uses information gathered from Dunnhumby, a British data mining firm of which it has majority control, to manage every aspect of its business, from creating new shop formats to arranging store layouts to developing private-label products and targeted sales promotions.
Tesco has also always been known as a pioneer in e-commerce, and it runs one of the most visited online supermarket sites in the UK. They have started experimenting with RDFa on their website, which seems like a logical next step. I won't be surprised if they adopt the GoodRelations ontology in the near future as well.
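To make that concrete, here is a small sketch, using Python's rdflib, of the kind of GoodRelations-style product description that RDFa markup on a retail page boils down to; the product and URIs are invented for illustration and do not come from Tesco's pages.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

GR = Namespace("http://purl.org/goodrelations/v1#")  # the GoodRelations vocabulary

g = Graph()
g.bind("gr", GR)

# Hypothetical offering - a retailer page marked up with RDFa would yield triples like these
offering = URIRef("http://example.com/products/tea-bags-80#offering")
g.add((offering, RDF.type, GR.Offering))
g.add((offering, GR.name, Literal("Tea bags, 80 pack")))

price = URIRef("http://example.com/products/tea-bags-80#price")
g.add((price, RDF.type, GR.UnitPriceSpecification))
g.add((price, GR.hasCurrency, Literal("GBP")))
g.add((price, GR.hasCurrencyValue, Literal("2.15", datatype=XSD.float)))
g.add((offering, GR.hasPriceSpecification, price))

print(g.serialize(format="turtle"))
```

Once offers are described this way, search engines and price-comparison tools can consume them directly instead of scraping HTML.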
Thursday, February 18, 2010
Cognition Technologies to power Microsoft's Bing now!
According to this news, Bing will also be powered by Cognition Technologies, as Microsoft has licensed its technology for its search application. It is interesting news because Cognition Technologies was always compared with Powerset, which Microsoft acquired for more than $100 million in June 2008. It is not replacing the Powerset technology but adding more power to its capabilities. So what is the relevance of this non-exclusive licensing?
Though Bing has made a lot of improvements in the last year to become more relevant in the search space and grab more share of the search market, it still needed to add more semantic capabilities to its arsenal. Bing is perceived to be good at the following things:
- Clean interface and a good user experience
- Excellent information aggregation capability
- Extracting concepts and summarization
- Very useful for advanced search on Wikipedia and Freebase

Precision and recall (the two criteria used to measure the accuracy and relevancy of search results) are always a debatable topic if you ask any search company. Everybody claims different results! The capability to understand the semantics or meaning of a query is also debatable, since you get mixed results if you test different queries across different search engines. All semantic engines use different techniques to really understand the meaning of a query. In Google's case, it gets its context from the humongous amount of data it indexes - they are processing so much data that they have a lot of context around things like acronyms. Suddenly the search engine seems smart, as if it has achieved semantic understanding, but it hasn't really. There have been instances where I have seen Google nail it quite accurately! So what is left to measure the effectiveness of a search engine? Probably this is why site traffic ends up becoming the only criterion for effectiveness or success. But it takes years to build traffic, and every improvement in the search engine counts.
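For readers who want the definitions pinned down, here is a tiny sketch of how precision and recall are computed for a single query, using made-up relevance judgments rather than any real engine's results.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = fraction of retrieved documents that are relevant;
    recall = fraction of relevant documents that were retrieved."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the engine returns 5 documents, 3 of which are among the 4 truly relevant ones
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d3", "d5", "d9"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.60, recall=0.75
```

The hard part, of course, is not the arithmetic but agreeing on which documents are "relevant" - which is exactly why every vendor can claim different numbers.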
I believe Cognition's real value proposition in this licensing is its advanced semantic map (think of it as a combination of dictionary, thesaurus, ontology, etc.), which has millions of semantic connections comprising semantic contexts, meaning representations, taxonomy and word-meaning distinctions. Bing should benefit from this in making its query parsing more powerful and in improving relevance!
Tuesday, February 16, 2010
HP lists Semantic Technology among the top 10 BI trends for 2010!
It is always good to know that HP has listed semantic technology among the top 10 BI trends for 2010 in this new white paper. Ideally, technology enthusiasts would have liked more detail about semantic technology as a trend, but keep in mind that this is a business white paper, meant for executives who will probably just scan through it. It is great to have it at least mentioned there! The key to business acceptance of any technology is always a simplified marketing message and validation by a brand like HP.
It is also good to have it categorized under the BI umbrella by a global leader! HP has become a serious player in the IT services sector after its $13.2 billion acquisition of EDS. So far, the acquisition has been working out very well: HP grew its fourth-quarter 2009 revenue by 8% to $8.9 billion. Even though HP services represent only 30% of total company revenue, services created 40% of total operating profit dollars. When technology markets mature, revenue and margins start becoming more service-centric. Software is another high-margin business for HP. The semantic technology market is not mature, so it will offer both product and services opportunities in the long run. Hopefully, HP will proactively help in its widespread acceptance. Maybe there will be more opportunity for tools and product vendors in the semantic technology space who can develop alliances with HP.
Semantic technology is not new to HP - the most significant thing HP has done is provide Jena, an open source framework for building semantic web applications. It incorporates RDF and OWL APIs, a rules-based inference engine, in-memory and persistent storage for data, and a SPARQL query engine, and it is by far the most widely used framework for developing such applications.
In the end, it is always difficult to say whether semantic technology will become a trend in 2010 or later, because most companies buy and adopt products and services based on their own business cycles, not on vendors' product and services roadmaps.
Tuesday, February 9, 2010
New Patent from IBM for Semantic Web!
IBM has always been a leader in patent filing. Did you know that in 2008 alone it filed more than 4,000 patents, three times more than its nearest rival HP? Recently, IBM filed a patent to improve traditional tag clouds by using semantic technology. Basically, the idea is that since tags are single words and users can't attach description and context to a tag, tags are very limiting and their value diminishes as the tag space grows. For example, a picture tagged as "dog" will not show up when the user searches for content associated with the tag "puppy." You can think of hundreds of similar examples. So, with the help of ontologies and associations, you can have more meaningful, descriptive and understandable tags. For example, once this tag cloud is represented in ontology form, a "German Shepherd" can be classified as a type of dog with attributes like eye color and fur color and relationships like "owned by." You can also specify that a puppy is a young dog in this context.
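To see why even a tiny ontology helps, here is a minimal Python sketch of tag expansion. The hand-made hypernym map is my own illustration, not IBM's patented method; it simply lets a search for "puppy" reach content that was only tagged "dog".

```python
# Tiny hand-made ontology: each tag maps to the broader terms it implies
BROADER = {
    "puppy": ["dog"],
    "german shepherd": ["dog"],
    "dog": ["animal"],
}

# Hypothetical tagged content
CONTENT_TAGS = {
    "photo_123.jpg": {"dog", "park"},
    "photo_456.jpg": {"german shepherd"},
}

def expand(tag: str) -> set:
    """Return the tag plus everything it transitively implies via the ontology."""
    expanded, stack = set(), [tag.lower()]
    while stack:
        t = stack.pop()
        if t not in expanded:
            expanded.add(t)
            stack.extend(BROADER.get(t, []))
    return expanded

def search(query_tag: str) -> list:
    """Match content whose expanded tags share a concept with the expanded query tag."""
    wanted = expand(query_tag)
    return [item for item, tags in CONTENT_TAGS.items()
            if wanted & {x for t in tags for x in expand(t)}]

print(search("puppy"))  # both photos match, because both imply the broader concept "dog"
```

A plain keyword match on "puppy" would have returned nothing; the shared concepts do the work.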
The method, explained in the attached filing, comprises receiving a tag cloud that includes tags hyperlinking to web content, separating the tags into different linguistic categories, assigning a weight to each tag, and grouping the tags into clusters, where the tags in a cluster are associated with a context. The server will have components like a linguistic analyzer, semantic domain analyzer, taxonomy builder, attribute analyzer, relationship analyzer and ontology generator. Like any ontology generator, the process will be iterative in nature. As a result, it should eventually lead to more accurate searches for the content you are looking for.
All of it makes sense to me, but I wonder about the risk this poses for companies that might, in the future, try to accomplish similar goals using different flavors of technology without infringing on this patent. I will let the patent lawyers figure that out.
Saturday, February 6, 2010
Your new virtual assistant on the iPhone: Can it deliver?
It always surprises me how little is known about Siri, the virtual assistant, outside the semantic technology circle, which is still a very small community. Intuitively, everyone understands what a virtual assistant can do or deliver, but most people still think these technologies are not really ready for serious use. For those of you who don't know, Siri was born out of SRI's CALO Project, the largest artificial intelligence project in U.S. history (CALO stands for Cognitive Assistant that Learns and Organizes). Made possible by a $150 million DARPA (Defense Advanced Research Projects Agency) investment, the CALO Project included 25 research organizations and institutions and spanned five years. Siri is bringing the benefits of this technology to the public in the first mainstream consumer application of a virtual personal assistant. The Siri application has just been released for the iPhone, and you can download it for free.
Siri - The Personal Assistant in your Phone from Tom Gruber on Vimeo.
It uses semantic technology for intelligent mash-ups that automatically make connections, take action and communicate information based on dimensions such as personal data, theme or task awareness, and time and location awareness - much the same way a real live assistant working the Internet would. It is almost like a very personalized semantic search focused on concierge-oriented tasks, but it can also take voice as input.
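As a thought experiment only (none of this reflects Siri's actual internals), combining a recognized intent with situational context like time and location might look roughly like this:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Context:
    """Signals a virtual assistant might combine; the fields here are purely illustrative."""
    location: str
    time: datetime

def suggest(intent: str, ctx: Context) -> str:
    """Pick an action by joining the user's intent with situational context."""
    if intent == "eat" and ctx.time.hour >= 18:
        return f"Find dinner reservations near {ctx.location}"
    if intent == "movie":
        return f"List showtimes this evening near {ctx.location}"
    return "Ask a clarifying question"

print(suggest("eat", Context(location="San Francisco", time=datetime(2010, 2, 6, 19, 30))))
```

The hard problems - understanding speech, modelling the user, and knowing which services to call - sit behind that simple join, which is where the CALO research comes in.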
I downloaded the application on my iPhone, and it did manage to pleasantly surprise me. This technology has come a long way! I did face a few issues, as it sometimes struggled to understand what I was trying to say. It actually uses Nuance technology for voice recognition. I don't know how it will work outside the United States - if you are in India, China or Russia, can this virtual assistant deliver the same help as it does in the US?
All of us need a virtual assistant at some point in our professional and personal lives - especially when we feel the time spent on certain tasks is not worth it. Of course, we won't mind as long as this virtual assistant works for free and gives us some extra input. But will we trust it enough to do high-value transactions just based on its advice? Maybe buying a movie ticket is not a big risk, and the prices are fairly standard. But beyond that? It really comes down to the confidence we gradually develop in our virtual assistant, so that we can start delegating more and more tasks to Siri or similar virtual assistants in the future. It is no different from the level of complexity of tasks we assign to our real-life assistants - the trust needs to be earned every day.
The release of Siri on the iPhone is still a very important milestone in a new era of AI and semantic technology-based applications. Can Siri build a formidable user base and become synonymous with the consumer Internet? It still has a long way to go, and the company will have to learn many things from its user base. Can it make enough money by charging only its affiliate network while giving the application to consumers for free? Time will tell. I still think the company can eventually make more money by focusing the product on business-oriented professionals. It just needs to think more about applying its technology to solve unique use cases. In the business scenario, it will still have to compete with other flavors of virtual assistants like Timesvr (pronounced "Time Saver"). That is a virtual assistant of a different kind - an offshore-based online service that provides a task-based (as opposed to assistant-based) solution, where each task goes into a queue and is handled by whichever assistant (a real person) is available and qualified. It works surprisingly well and economically for some. You can argue that it is not really fair to compare a top-notch AI/semantic tech startup with an offshore-based service whose business model is basically labor arbitrage. But this is the reality of this flat world! You have to compete at every level and on all fronts to make business sense!
Wednesday, February 3, 2010
Measuring Semantics!
A lot has been written and discussed about the power and benefits of semantic technology. Still, when it comes down to quantifying those benefits, you will find very few case studies where they are measured or where some kind of ROI analysis has been done. So it is good to know about this case study from Telefonica, one of the largest fixed-line and mobile telecommunications companies in the world: third largest in number of customers, behind only China Mobile and Vodafone, and in the top five by market value. They used semantic technology for their tariff calculation, which is a very complex activity for an operator of their size. They achieved an 80% reduction in working hours and a 75% reduction in errors for activities related to tariff calculation. If semantic technology is to become mainstream in the enterprise, we will continue to need more case studies like this.
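To put those percentages in perspective, here is a back-of-the-envelope sketch with entirely hypothetical baseline numbers; the case study itself does not publish absolute figures, so everything below except the two reduction rates is an assumption.

```python
# Hypothetical baseline, not taken from the Telefonica case study
baseline_hours_per_month = 1000      # analyst hours spent on tariff calculation
baseline_errors_per_month = 200      # errors requiring rework
cost_per_hour = 50                   # assumed fully loaded cost in EUR

# Reductions reported in the case study
hours_saved = baseline_hours_per_month * 0.80
errors_avoided = baseline_errors_per_month * 0.75

monthly_savings = hours_saved * cost_per_hour
print(f"Hours saved per month: {hours_saved:.0f}")               # 800
print(f"Errors avoided per month: {errors_avoided:.0f}")         # 150
print(f"Labour savings per month: EUR {monthly_savings:,.0f}")   # EUR 40,000
```

The real numbers will obviously differ, but the point stands: reductions of that magnitude compound month after month, which is exactly the kind of evidence enterprises need before adopting a new technology.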