My Search For The Business Intelligence Chupacabra
In popular culture, the Chupacabra is an almost mythical creature that has been sighted many times in the Americas but never confirmed as an actual member of our fauna. Many have sighted, photographed, or even dreamed about the Chupacabra, but we still don’t have any proof that this creature really exists or what it is. Social media is similar: many of us have amassed large amounts of raw textual data, yet little has been done so far to gain insight from it. Many of us want to use business intelligence to analyze social media text and get accurate insights. Decision makers want to ask questions like, “What is my product or company’s social media sentiment?” So how do we measure and score sentiment? How do we separate fact from fiction, or truth from sarcasm, in aggregate? Thus we have a digital “Chupacabra” aptly named “Sentiment Scoring” or “Social Media Sentiment Score.” When you look at all the raw data, it just seems like a cruel riddle wrapped in an enigma of digital noise! But we know there is value to be had.
So How Did We Get Here?
A few months back I was looking at doing a complete refresh of SWC’s Business Intelligence demo environment. We use this environment to showcase BI tools and architecture to our customers and prospects. It was time to start over and go beyond rebranding the same old tired BI use cases we had grown all too comfortable showing in the past. There’s only so much you can do with a generic general ledger cube demo or Bike Sales data mining model. It was time to break out of the traditional BI mold and do something fun and insightful, with a message any analytics-hungry BI prospect could get excited about.
And so we invented “Hyena Power.” Hyena Power Inc., or HPI, is a fictitious manufacturer of home hydrogen generators that consumers use to produce hydrogen for fueling cars, water heaters, home generators and the like. Fun concept, right? We’ve had a lot of fun inventing this company, and sometimes I’m almost bummed our product is fake, but boy does it make for a really interesting BI technology demo. So HPI is a new company with a great product and a hunger to grow their customer base by orders of magnitude so that GE buys them out in a leveraged buyout. HPI has a very diverse marketing engine combining social media, search, online webinars, a call center, and seminars/luncheons. On the social media landscape, HPI uses Yammer. The Yammer feed is where the sentiment requirement comes into play. For our demo environment, we chose Yammer as a way to simulate what a real Facebook or LinkedIn feed might look like and how that data might be scored. So Yammer is a good alternative for social media prototyping and is similar to what we would actually do for a real company on Facebook. Our goal at the outset of this demo exercise was to determine if we could put the Yammer data through a scoring algorithm and get insight into how our message is being perceived in the marketplace.
Putting It All Together
Before we get into the scoring, I thought I’d show you what we are trying to do on the visualization side. In the dashboard above, we’re trying to visualize all the wonderful data we’re collecting from Yammer and various systems. On the bottom right, we have a heat map that shows a set of popular bloggers for HPI and what their sentiment scoring looks like in aggregate. In this chart, red is bad (neutral or negative scores). At the top there is a very simple line chart over calendar weeks showing the count of positive, negative and neutral social media updates. What was interesting here is that HPI made a conscious effort to reduce neutral content and drive more positive scores, and we see that correlation nicely. Finally, on the bottom left we used a packed bubble chart to visualize all of HPI’s marketing efforts, including but not limited to social media.
How Did We Do It?
I wish I could tell you that we mastered the challenges of machine learning. But we did not! However, we learned a lot from this project and got some great insights that helped us build the 1.0 version rather quickly.
After spending several evenings on my tablet trying to educate myself on the concept of social media scoring, it became abundantly clear that this was way above my pay grade and that I really should have taken more math when I was getting my undergrad at North Dakota State University (Go Bison!). So I was kind of stuck, but then I remembered that my brother, Kevin Dotzenrod, works for Rupert Murdoch. OK, he doesn’t directly report to him, but he has met him a few times at work functions. Kevin has a colleague at Dow Jones, Douglas Esanbock, who is their resident expert on sentiment scoring pertaining to news stories and all things social media. Doug and I traded a few emails and got on the phone to discuss the concept. What we arrived at was something very similar to the SentiWordNet research.
Essentially, the scoring works as follows:
- Build vectors of positive words and negative words
- Cleanse your social media input and remove non-word characters and punctuation
- Parse the social media update into a list and score each word against the vectors
- Calculate a score from the matching results: a high number indicates positive sentiment, a negative number indicates negative sentiment, and somewhere in between is neutral
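To make those four steps concrete, here is a minimal sketch in Python. It is an illustration only, not the code we ran: the tiny word lists, the function names, and the zero cutoff for "neutral" are all made up for the example, and it assumes the score is simply the count of positive matches minus the count of negative matches.

```python
import re

# Hypothetical, tiny word vectors for illustration only; a real
# lexicon would contain thousands of entries.
POSITIVE_WORDS = {"great", "love", "clean", "reliable", "awesome"}
NEGATIVE_WORDS = {"bad", "hate", "broken", "noisy", "expensive"}


def clean_update(text):
    """Lowercase the update and replace anything that is not a letter or whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def score_update(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Assumed scoring rule: positive matches minus negative matches."""
    words = clean_update(text).split()  # parse the update into a list of words
    pos_hits = sum(1 for w in words if w in positive)
    neg_hits = sum(1 for w in words if w in negative)
    return pos_hits - neg_hits


def label(score):
    """Map a numeric score to a label (assumed cutoff: zero is neutral)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    updates = [
        "Love my new Hyena Power generator. Clean and reliable!",
        "The unit arrived broken, and it's noisy.",
        "Installed the generator in the garage today.",
    ]
    for update in updates:
        s = score_update(update)
        print(s, label(s), "|", update)
```

A plain word count like this cannot detect sarcasm or negation ("not great" still counts as a positive match), which is part of why sentiment scoring remains such an elusive beast.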
I know some of you gearheads want more and I aim to please, so here’s a snippet of code in “R” that we used for the scoring. I will point out that much of this code was inspired by Jeffrey Breen.
After playing with R and then getting serious about building the ETL prototype in SSIS, we decided to forgo R for the time being and use C# in an SSIS Script Data Flow component. That will be an upcoming blog topic that we dig into more deeply next month, when one of my colleagues, Patricia Motta, will tell us about her prototype scoring SSIS package. Until then, Happy Holidays!
To learn more about BI and all of the exciting new features SWC has for the business intelligence community, please join us for our next informative Business Intelligence event.
If you enjoyed this post from Chad, please check out a few of our past posts on business intelligence:
Ask SWC: What’s So Great About Tableau?
An Agile Approach to Business Intelligence
How to Fast Track Business Intelligence
Can’t afford BI? Try the BI Analytics Tools in Everyday Software
How to Break Business Intelligence Users’ Excel Addiction
Ask SWC: What Is A New Technology That You Find Interesting?