Thursday, November 11, 2010

Predictive Analytics for social data: Is there a role for Semantics?

Recently, Gartner, the leading technology analyst company, came out with its predictions of the key technologies for 2011. The list includes cloud computing, mobile applications, social collaboration, next generation analytics, social analytics and many others, which shouldn't surprise you if you work in information technology. The only thing that was not clear to me is why next generation analytics and social analytics are in two different categories when social collaboration is already emphasized as part of the roadmap for large enterprises. I am sure they have their own valid reasons for doing so. But the point is that the various terms being used in the context of analytics are getting a bit confusing. What is the difference between analytics, predictive analytics, forecasting, predictive modeling, optimization, data mining and advanced analytics? To many people, they all sound the same! The answer also depends on whom you ask. If you are a vendor or a consultant, try explaining the distinctions to a decision maker in an enterprise; better still, try not to get into that conversation at all. Many of these disciplines are more than a decade old; predictive modeling, for example, has been used in credit scoring for years. Academically, too, there is not much difference between the classic techniques used in data mining and those used in statistics, although data mining has evolved to deal better with messy real-life data. Unfortunately, unlike analytics, statistics never became a hot topic, but maybe that is about to change. In short, analytics or predictive analytics is the umbrella term, the new term, maybe the buzzword. There seems to be a new surge of interest in predictive analytics because it is about future outcomes in the context of business intelligence. There is a difference between gaining insight and gaining foresight!

You could always ask: companies have always worried about future outcomes, so what has changed now if many of these methods were available before? Probably more data is available now, and tools and techniques have advanced and become simpler; you can hire a good business analyst to do the job instead of someone with a doctorate in statistics. Predictive analytics enables you to develop mathematical models that help you better understand the variables driving success. It relies on formulas that compare past successes and failures, and then uses those formulas to predict future outcomes. And if you consider that IBM has spent almost eleven billion dollars in the last five years acquiring software companies such as SPSS and Unica for its analytics consulting organization, it starts to make more sense.
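
To make "formulas that compare past successes and failures" a little more concrete, here is a minimal sketch of one classic technique, logistic regression, fitted by plain gradient descent in Python. The feature values, and the idea that they stand for something like scaled campaign spend and discount level, are my own hypothetical example rather than data from any real company; a real project would use a statistics package instead of hand-rolled code.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Learn one weight per variable (plus a bias) from past outcomes,
    where each label is 1 for a success and 0 for a failure."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Error for this past case: predicted probability minus actual outcome.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

def predict_proba(model, x):
    """Probability of success for a new case x."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical, pre-scaled history of past campaigns and their outcomes.
past_campaigns = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
outcomes = [0, 0, 1, 1]  # 0 = failure, 1 = success
model = fit_logistic(past_campaigns, outcomes)
print(predict_proba(model, [0.85, 0.8]))  # estimated chance a new campaign succeeds
```

The learned weights are the "formula": each one says how strongly a variable pushes a case toward success or failure.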

On top of that, social analytics is the new kid on the block, and there is new buzz that it is going to play a significant role in predictive analytics. I do believe this will gradually come true, but not as quickly as we are being led to believe. What are the challenges? From a process perspective, predictive analytics is about understanding the variables that predict the outcome of the business problem, selecting the relevant statistical technique, validating the model with test data, and finally applying and adjusting the model iteratively with production data. Do you think this should be very different for social data? First of all, just because a company does brand monitoring (and there are far too many of them), that doesn't make it a social analytics company, especially if it just follows conversations about an entity and doesn't have much semantic intelligence in its software. If you look at the most common examples of social media supposedly helping with prediction, they are about topics such as elections (as recently claimed by the Facebook political team) or how a new product like the iPad is perceived, and the outcome is usually boolean, i.e. success or failure. In my opinion, these are interesting examples, but much more is expected of predictive analytics on social data. If you treat social analytics and its associated predictions as a separate, standalone discipline, maybe that is good enough, but then you don't have an integrated view from an enterprise perspective. In some cases you may not need to integrate social data with enterprise data and can still get some value. But you still need to build a repeatable predictive model using social media data, and before you build it, you need to do true semantic analysis of the social data.
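
The four process steps above (choose predictor variables, pick a technique, validate on test data, then refit as production data arrives) can be sketched in a few lines. This is a minimal illustration under my own assumptions: the two predictor variables (mention volume and average sentiment), the toy data, and the choice of a one-variable threshold rule (a "decision stump") as the technique are all hypothetical stand-ins for whatever a real project would use.

```python
import random

def fit_stump(rows, labels):
    """Pick the (feature index, threshold) pair whose rule
    'row[j] >= threshold means success' best matches the past outcomes."""
    best_feature, best_threshold, best_acc = 0, 0.0, -1.0
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            hits = sum((row[j] >= threshold) == bool(y) for row, y in zip(rows, labels))
            acc = hits / len(rows)
            if acc > best_acc:
                best_feature, best_threshold, best_acc = j, threshold, acc
    return best_feature, best_threshold

def predict(model, row):
    feature, threshold = model
    return int(row[feature] >= threshold)

def accuracy(model, rows, labels):
    return sum(predict(model, row) == y for row, y in zip(rows, labels)) / len(rows)

def holdout_split(rows, labels, test_fraction=0.25, seed=7):
    """Shuffle once, then hold back a fraction of past cases as test data."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - test_fraction))

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(order[:cut]), pick(order[cut:])

# Hypothetical history: [mention volume, average sentiment] -> 1 = success, 0 = failure
rows = [[120, 0.8], [40, 0.7], [300, 0.2], [90, 0.9],
        [200, 0.1], [60, 0.3], [150, 0.6], [30, 0.4]]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

(train_x, train_y), (test_x, test_y) = holdout_split(rows, labels)
model = fit_stump(train_x, train_y)                          # steps 1-2: variables + technique
print("holdout accuracy:", accuracy(model, test_x, test_y))  # step 3: validate on test data
model = fit_stump(rows, labels)                              # step 4: refit as labelled data grows
```

Nothing here is specific to social data; the hard part the post is pointing at is producing trustworthy columns like "average sentiment" in the first place.
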
You also need to build some kind of normalized social data model that works with enterprise data for predictive analytics. IBM has come out with a new offering in which it claims to have enhanced its SPSS software with social analytics, so that it can do predictive analytics for your business needs. It also offers semantic network analysis of social data. I don't know how well it works, or whether it can work across different business domains without too much business analysis or customization.
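
As a rough illustration of what a "normalized social data model" could look like, the sketch below maps raw posts onto records keyed by the same product ID and calendar day that enterprise systems use, so that mention counts can be joined with, say, daily sales. The alias table, the SKU, the field names and the sales figure are all hypothetical; simple substring matching is a crude stand-in for the semantic analysis discussed above and would not resolve real ambiguity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical alias table: surface forms seen in posts -> enterprise product ID.
PRODUCT_ALIASES = {
    "ipad": "SKU-1001",
    "i-pad": "SKU-1001",
    "apple tablet": "SKU-1001",
}

@dataclass(frozen=True)
class SocialMention:
    source: str      # e.g. "twitter", "blog", "forum"
    day: date        # normalized to a calendar day so it can join with enterprise data
    product_id: str  # enterprise key, not the raw text
    text: str

def normalize(raw: dict) -> Optional[SocialMention]:
    """Turn a raw post into a SocialMention, or None if no known product is mentioned."""
    text = raw["text"].lower()
    for alias, product_id in PRODUCT_ALIASES.items():
        if alias in text:
            return SocialMention(
                source=raw["source"].lower(),
                day=datetime.fromisoformat(raw["posted_at"]).date(),
                product_id=product_id,
                text=text,
            )
    return None

def daily_mention_counts(mentions):
    counts = defaultdict(int)
    for m in mentions:
        counts[(m.product_id, m.day)] += 1
    return dict(counts)

def join_with_sales(counts, sales):
    """Inner-join mention counts with enterprise sales on (product_id, day)."""
    return {key: (counts[key], sales[key]) for key in counts.keys() & sales.keys()}

posts = [
    {"source": "Twitter", "posted_at": "2010-11-11T09:30:00", "text": "Loving my new iPad"},
    {"source": "Blog", "posted_at": "2010-11-11T18:05:00", "text": "The I-Pad is overpriced"},
    {"source": "Twitter", "posted_at": "2010-11-11T20:00:00", "text": "Great weather today"},
]
mentions = [m for m in map(normalize, posts) if m is not None]
counts = daily_mention_counts(mentions)
sales = {("SKU-1001", date(2010, 11, 11)): 250}  # hypothetical enterprise figure
print(join_with_sales(counts, sales))
```

The design choice that matters is the shared key: once social records carry the enterprise's own product ID and date grain, they can feed the same predictive models as any other enterprise data.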

It is not an easy task to build repeatable predictive analytics on an ever-changing, large volume of social data. The quality of the meaningful data is also very important here. I see the ambiguity of social data as one of the biggest hurdles. You will have to do a lot of preprocessing and deal with many new attributes being added in the social context. Every company is unique, and we will see predictive analytics manifest itself differently in each of them. Still, I do believe we will see many companies building a very large number of predictive models that take social data into account for their business needs, maybe one predictive model per product, with a total turnaround time of a week or less from problem definition to scoring. That raises another question: even if real-time social data is available, can you take real-time actions? If not, what is the true value of real-time data for predictive analytics? You really need a very different level of infrastructure to take advantage of it.
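
To give a flavour of the preprocessing I mean, here is a minimal cleaning pass for short social posts: lowercase, strip URLs, retweet markers and @-mentions, shorten elongated words, expand slang, and drop exact duplicates such as retweets. The slang dictionary and regular expressions are my own illustrative assumptions. None of this resolves genuine ambiguity (is "apple" the company or the fruit?), which still needs real semantic analysis.

```python
import re

# Hypothetical slang dictionary; a real one would be domain- and language-specific.
SLANG = {"gr8": "great", "luv": "love", "u": "you"}

URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^rt\s+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")  # three or more repeats of the same letter
TOKEN_RE = re.compile(r"[a-z0-9']+")

def clean(text: str) -> str:
    text = text.lower()                     # lowercase first so "RT" matches RETWEET_RE
    text = URL_RE.sub(" ", text)
    text = RETWEET_RE.sub("", text)
    text = MENTION_RE.sub(" ", text)
    text = ELONGATED_RE.sub(r"\1\1", text)  # "sooooo" -> "soo"
    tokens = [SLANG.get(tok, tok) for tok in TOKEN_RE.findall(text)]
    return " ".join(tokens)

def preprocess(posts):
    """Clean every post and keep only the first copy of each cleaned text."""
    seen = set()
    cleaned = []
    for post in posts:
        c = clean(post)
        if c and c not in seen:
            seen.add(c)
            cleaned.append(c)
    return cleaned

print(preprocess([
    "RT @bob: luv my new iPad sooooo much http://t.co/x",
    "luv my new iPad sooooo much",
    "gr8 battery",
]))
```

Dropping duplicates after cleaning, rather than before, is what lets a retweet collapse onto its original post.
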
This whole integration of social media analytics with predictive analytics should be owned by the business, not by IT. In fact, in most cases it should be owned by the marketing department, because marketing values and understands social data more than any other department, and it is also the most qualified to define prediction goals in a social context, identify predictive behaviors, make pragmatic tradeoffs and evaluate results.

There seem to be some good opportunities in this space, but we still have a long way to go, and a lot needs to be understood before anyone in the industry makes claims. There are definitely some good examples in the research community, such as the SOMA Terror Organization Portal, a forecasting model developed by a researcher at the University of Maryland. It analyses a wide range of information about politics, business and society in Lebanon to predict, with surprising accuracy, rocket attacks by the country's Hizbullah militia on Israel. By the middle of 2010, SOMA was sucking up data from more than 200 sources, many of them newspaper websites. In another example, two researchers at HP Labs established that they can use tweets to predict how well a movie will do, and the results turned out to be fairly accurate. What we don't yet understand is whether these examples can be generalized for industry adoption. It would be good to know of more examples from industry.
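
For readers wondering what "use tweets to predict how well a movie will do" might look like in its simplest form, here is a one-variable least-squares fit of an outcome (for example opening revenue) against tweet volume. This is only my illustration, assuming tweet rate is the sole predictor; it is not a description of the HP Labs researchers' actual method, and the numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Made-up history: average tweets per hour before release vs. opening revenue (in $M).
tweet_rate = [100, 400, 900, 1500]
revenue = [5.0, 12.0, 30.0, 48.0]
model = fit_line(tweet_rate, revenue)
print(predict(model, 1000))  # predicted opening revenue for a new movie
```

Whether a single signal like this generalizes beyond movies is exactly the open question raised above.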
