Walking on clouds – IBM and Twitter
New tooling built into BigInsights for Hadoop simplifies landing and processing raw Twitter data.
If you blinked you might have missed it, but back in November 2014, IBM announced BigInsights for Hadoop, our Hadoop-in-the-cloud service. Developers can get started with free versions of BigInsights, including IBM Analytics for Hadoop on bluemix.net and the downloadable quick-start edition. When they are ready for production, they can deploy their applications to the new BigInsights for Hadoop cloud service.
As we know, much of the interest in cloud is driven by virtualization. Virtualization changes the economics of provisioning and managing systems, especially when average utilization is low. From a provider’s perspective, why sell a server once when you can load it up with VMs and sell the same server to many clients?
Hadoop workloads are more challenging. Data volumes are vast, there are specialized hardware requirements, and jobs can run for hours, if not days, fully consuming all resources. In these production environments, the benefits of virtualization are often lost and are superseded by more pressing requirements: security isolation among tenants, quality of service, avoiding “noisy neighbor” effects, and maximizing system performance.
High-performance computing sites have long recognized that when it comes to compute- and data-intensive workloads, “bare-metal rules”. This recent benchmark from my colleagues at IBM Platform Computing proves the point. Workloads vary, but for network- and data-intensive frameworks like Spark and Big SQL, network latency and performance matter.
BigInsights for Hadoop is an easy-to-deploy, bare-metal, enterprise Hadoop-as-a-service (eHaaS) offering that runs on IBM’s world-class SoftLayer infrastructure. It includes a 100% standard Hadoop implementation with features that include IBM Big SQL, BigSheets, Big R and Text Analytics. The service provides:
- Exceptional performance
- Security and tenant isolation
- Predictable quality-of-service
- Dedicated dev-ops team
- Global access via IBM SoftLayer
Early in 2015, IBM’s BigInsights for Hadoop cloud service became more compelling with the introduction of a free Twitter Decahose service bundled with select configurations of the service. While the wisdom of using a cloud service for processing data originating inside the firewall might be a point of debate, when the data is “born in the cloud” (as is the case with Twitter and other social media data), analyzing data in the cloud makes sense.
Developers can build applications to land Twitter data themselves using open-source libraries like Twitter4J and contract separately for access to the data, but BigInsights for Hadoop saves the hassle by delivering a pre-integrated and supported capability. Landing Twitter data into HDFS is as simple as clicking the “Run” button in the integrated Twitter application that is provisioned along with BigInsights for Hadoop.
Once the raw Twitter data is landed in HDFS, clients can access it using Hive or IBM Big SQL (Big SQL uses the Hive metastore). They can also analyze, filter, transform and visualize the data with BigSheets or Big R, using standard tools like RStudio or Eclipse.
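As a rough illustration of what working with landed tweets can look like, here is a minimal Python sketch that counts hashtags. It assumes the data is stored as one JSON record per line with a `text` field; that layout, the `count_hashtags` helper and the sample records are all assumptions made for illustration, not a description of how the BigInsights Twitter application actually writes its output.

```python
import json
from collections import Counter


def count_hashtags(lines):
    """Count hashtags in an iterable of JSON-encoded tweet records.

    Assumes one JSON object per line with a "text" field (hypothetical layout).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than fail the whole run
        if not isinstance(record, dict):
            continue
        text = record.get("text") or ""
        for token in str(text).split():
            # Normalize case and trim trailing punctuation before counting
            tag = token.lower().rstrip(".,!?:;")
            if tag.startswith("#") and len(tag) > 1:
                counts[tag] += 1
    return counts


# Made-up sample records standing in for a file pulled from HDFS
sample = [
    '{"text": "Trying out #Hadoop in the cloud"}',
    '{"text": "#hadoop and #BigData at Strata"}',
    "not json",
]
hashtag_counts = count_hashtags(sample)
```

In practice the same function could be fed an open file handle instead of a list, since it only needs an iterable of lines.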
Sentiment analysis is a core requirement across multiple industries. Whether you are a bank, an insurer, a retailer or a telco, you probably care what your clients are saying about you and your services, and Twitter can be a gold mine of useful information. By understanding and analyzing sentiment, organizations can develop more compelling offers, reduce undesirable customer churn, maximize margin, and generally gain a leg up on the competition.
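To make the idea of sentiment scoring concrete, the sketch below uses a deliberately tiny word-list approach in Python. It is a toy, not the Text Analytics capability in BigInsights: the word lists, the `sentiment_score` and `sentiment_label` functions and the scoring rule are all hypothetical, and real sentiment analysis has to handle negation, sarcasm and context that this ignores.

```python
import re

# Hypothetical, hand-picked word lists for illustration only
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "terrible", "hate", "broken"}


def sentiment_score(text):
    """Return (# positive words) minus (# negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for word in words if word in POSITIVE)
    negatives = sum(1 for word in words if word in NEGATIVE)
    return positives - negatives


def sentiment_label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up example tweets
praise = "Love the new app, support was great"
complaint = "The website is slow and the login is broken"
neutral = "Opened an account today"
```

Here `sentiment_label(praise)` yields `"positive"`, `sentiment_label(complaint)` yields `"negative"`, and the third example is `"neutral"` because it contains no words from either list.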
To learn more about BigInsights for Hadoop on the IBM SoftLayer cloud, join us at Strata + Hadoop World in San Jose at IBM’s booth (booth #1115) and see these technologies in action. Live demonstrations will be conducted at the IBM booth on Thursday, Feb 19th and Friday, Feb 20th.
This article is also posted at IBM’s Hadoop.Dev – IBM’s site for the Hadoop developer community – https://developer.ibm.com/hadoop/blog/2015/02/11/take-walk-clouds-twitter-ibm/